arXiv submission date: 2026-03-24
📄 Abstract - VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents

Combining Large Language Models (LLMs) with Reinforcement Learning (RL) enables agents to interpret language instructions more effectively for task execution. However, LLMs typically lack direct perception of the physical environment, which limits their understanding of environmental dynamics and their ability to generalize to unseen tasks. To address this limitation, we propose Visual-Language Knowledge-Guided Offline Reinforcement Learning (VLGOR), a framework that integrates visual and language knowledge to generate imaginary rollouts, thereby enriching the interaction data. The core premise of VLGOR is to fine-tune a vision-language model to predict future states and actions conditioned on an initial visual observation and high-level instructions, ensuring that the generated rollouts remain temporally coherent and spatially plausible. Furthermore, we employ counterfactual prompts to produce more diverse rollouts for offline RL training, enabling the agent to acquire knowledge that facilitates following language instructions while grounding in environments based on visual cues. Experiments on robotic manipulation benchmarks demonstrate that VLGOR significantly improves performance on unseen tasks requiring novel optimal policies, achieving a success rate over 24% higher than the baseline methods.
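The rollout-generation idea in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's implementation: `predict_step` stands in for the fine-tuned vision-language model that predicts the next action and observation, and the toy dynamics, function names, and prompt list are all assumptions made for the sake of a runnable example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of VLGOR-style imaginary rollout generation.
# A fine-tuned VLM (stood in for by `predict_step`) is rolled forward
# from one real observation under the original instruction plus
# counterfactual prompt variants, yielding imaginary trajectories
# that augment the offline RL dataset.

@dataclass
class Rollout:
    instruction: str
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def generate_rollouts(initial_obs,
                      instruction: str,
                      predict_step: Callable,
                      counterfactuals: list,
                      horizon: int = 5) -> list:
    """Generate one imaginary rollout per prompt (original + counterfactuals)."""
    rollouts = []
    for prompt in [instruction, *counterfactuals]:
        r = Rollout(instruction=prompt, observations=[initial_obs])
        obs = initial_obs
        for _ in range(horizon):
            # One-step VLM prediction: (action, next observation).
            action, obs = predict_step(obs, prompt)
            r.actions.append(action)
            r.observations.append(obs)
        rollouts.append(r)
    return rollouts

# Toy stand-in dynamics: drift a scalar "state" toward a prompt-dependent goal.
def toy_predict_step(obs: float, prompt: str):
    goal = float(len(prompt) % 10)  # fake goal derived from the prompt text
    action = 1.0 if goal > obs else -1.0
    return action, obs + 0.5 * action

rollouts = generate_rollouts(
    initial_obs=0.0,
    instruction="pick up the red block",
    predict_step=toy_predict_step,
    counterfactuals=["pick up the blue block", "push the red block left"],
)
print(len(rollouts), len(rollouts[0].actions))  # → 3 5
```

In this sketch the counterfactual prompts simply multiply the number of trajectories obtained from a single real observation, which mirrors the abstract's claim that counterfactual prompting diversifies the offline training data.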

Top-level tags: multi-modal reinforcement learning agents
Detailed tags: offline RL, vision-language model, knowledge-guided, generalization, imaginary rollouts

VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents


1️⃣ One-Sentence Summary

The paper proposes a new framework, VLGOR, which combines visual and language knowledge to generate high-quality imaginary training data, significantly improving an agent's ability to understand and execute language instructions on unseen tasks.

Source: arXiv:2603.22892