CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates
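1️⃣ One-sentence summary
This paper introduces CoSPlan, a new benchmark that tests large vision-language models on error-prone visual sequential planning tasks. To address the shortcomings of existing models, it also proposes a training-free method that uses incremental scene graph updates to help models reason through intermediate steps, which improves planning performance.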
Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Additionally, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose the Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs in error-prone, vision-based sequential planning tasks across 4 domains: maze navigation, block rearrangement, image reconstruction, and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying a non-optimal action) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g., Intern-VLM and Qwen2) struggle on CoSPlan, failing to leverage contextual cues to reach goals. Addressing this, we propose a novel training-free method, Scene Graph Incremental updates (SGI), which introduces intermediate reasoning steps between the initial and goal states. SGI helps VLMs reason about sequences, yielding an average performance gain of 5.2%. In addition to enhancing reliability in corrective sequential planning, SGI generalizes to traditional planning tasks such as Plan-Bench and VQA.
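To make the idea of intermediate states between the initial and goal scene concrete, here is a minimal symbolic sketch for a toy block-rearrangement scene. It is not the paper's implementation: in SGI the updates are produced by a VLM, whereas here they are computed by hand-written rules. The triple format, the `(block, destination)` action encoding, and the names `apply_move` and `incremental_graphs` are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]          # (subject, relation, object)
SceneGraph = FrozenSet[Triple]
Action = Tuple[str, str]               # hypothetical encoding: (block, destination)


def apply_move(graph: SceneGraph, action: Action) -> SceneGraph:
    """Return a new graph in which `block` rests on `destination`."""
    block, dest = action
    # Drop the block's previous "on" relation and keep every other triple.
    kept = {t for t in graph if not (t[0] == block and t[1] == "on")}
    kept.add((block, "on", dest))
    return frozenset(kept)


def incremental_graphs(initial: SceneGraph, actions: List[Action]) -> List[SceneGraph]:
    """Build the chain of scene graphs: the initial one plus one per action."""
    graphs = [initial]
    for action in actions:
        graphs.append(apply_move(graphs[-1], action))
    return graphs


initial = frozenset({("A", "on", "table"), ("B", "on", "A"), ("C", "on", "table")})
goal = frozenset({("A", "on", "table"), ("B", "on", "C"), ("C", "on", "table")})

# The first move is a non-optimal step; the second one completes the plan.
plan = [("B", "table"), ("B", "C")]
trace = incremental_graphs(initial, plan)
reached_goal = trace[-1] == goal
```

Each entry in `trace` differs from its predecessor by a single relation, so a detour such as the first move shows up as an intermediate graph that gets no closer to `goal`. Comparing consecutive graphs in this way is one plausible reading of how step-by-step updates could support both error detection and step completion.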
Source: arXiv: 2512.10342