OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

📄 Abstract - OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To address these limitations, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on heuristic trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR, via atomic retriever selection and composition, as a two-stage mixed-integer programming problem, and mathematically derive optimal trajectories that maximize ground-truth coverage for training samples via rigorous boolean set operations. These trajectories are then stored in a golden library and serve as in-context demonstrations that steer a VLM planner at online inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms SOTA baselines. Notably, it achieves superior performance using only 10% of the training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.
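To make the offline phase more concrete, the following is a minimal toy sketch of the two-stage idea: first select a subset of atomic retrievers, then compose their result sets with boolean set operations, keeping the combination that best covers the ground truth. It is an illustration under stated assumptions, not the paper's formulation: the retriever names, the image IDs, the `best_trajectory` helper, and the tie-breaking rule are all hypothetical, and exhaustive enumeration stands in for the mixed-integer programming solver described in the abstract.

```python
from itertools import combinations, product


def best_trajectory(results, ground_truth, max_tools=3):
    """Brute-force search over retriever selections and boolean compositions.

    results: dict mapping a (hypothetical) retriever name to a set of image IDs.
    ground_truth: set of target image IDs for one training sample.
    """
    best = None
    names = sorted(results)
    for k in range(1, max_tools + 1):
        # Stage 1 (selection): choose which atomic retrievers to use.
        for chosen in combinations(names, k):
            # Stage 2 (composition): choose AND/OR between consecutive results.
            for ops in product(("and", "or"), repeat=k - 1):
                combined = set(results[chosen[0]])  # copy, inputs stay untouched
                for name, op in zip(chosen[1:], ops):
                    if op == "and":
                        combined &= results[name]
                    else:
                        combined |= results[name]
                covered = len(combined & ground_truth)
                # Assumed objective: more covered targets first, then a
                # smaller candidate set, then fewer retrievers.
                key = (covered, -len(combined), -k)
                if best is None or key > best[0]:
                    best = (key, chosen, ops, combined)
    _, chosen, ops, combined = best
    return chosen, ops, combined


# Hypothetical outputs of three atomic retrievers for one training query.
results = {
    "caption_search": {1, 2, 3, 4},
    "attribute_filter": {2, 3, 5},
    "reference_similarity": {3, 4, 6},
}
ground_truth = {3}

chosen, ops, combined = best_trajectory(results, ground_truth)
print(chosen, ops, combined)
```

On this toy input, the search settles on intersecting `attribute_filter` with `reference_similarity`, which isolates the single target image. In the spirit of the abstract, a trajectory found this way for a training sample is the kind of record that could be stored in the golden library and later shown to the planner as an in-context demonstration.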


1️⃣ One-Sentence Summary

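This paper proposes OSCAR, a new framework that recasts composed image retrieval from a trial-and-error heuristic search process into a principled trajectory optimization problem. By computing optimal retrieval trajectories offline and using them to steer the model online, it achieves more accurate retrieval that generalizes better while using less data.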
