arXiv submission date: 2026-02-10
📄 Abstract - SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL

Current one-pass 3D scene synthesis methods often suffer from spatial hallucinations, such as collisions, due to a lack of deliberative reasoning. To bridge this gap, we introduce SceneReVis, a vision-grounded self-reflection framework that employs an iterative "diagnose-and-act" loop to explicitly intercept and resolve spatial conflicts using multi-modal feedback. To support this step-wise paradigm, we construct SceneChain-12k, a large-scale dataset of causal construction trajectories derived through a novel reverse engineering pipeline. We further propose a two-stage training recipe that transitions from Supervised Fine-Tuning to Agentic Reinforcement Learning, evolving the model into an active spatial planner. Extensive experiments demonstrate that SceneReVis achieves state-of-the-art performance in high-fidelity generation and goal-oriented optimization, with robust generalization to long-tail domains.
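The iterative "diagnose-and-act" loop described above can be pictured with a short, self-contained sketch. This is only an illustration of the general control flow, not the paper's implementation: every name here (`Scene`, `render_scene`, `diagnose_conflicts`, `propose_fix`, `diagnose_and_act`) is a hypothetical placeholder, and the conflict check is a toy axis-aligned bounding-box overlap test standing in for the paper's vision-grounded, multi-modal feedback.

```python
# Minimal sketch of a "diagnose-and-act" loop (assumed structure, not the paper's code).
from dataclasses import dataclass, field


@dataclass
class Scene:
    """A toy scene: each object is (name, x, y, width, depth)."""
    objects: list = field(default_factory=list)


def render_scene(scene: Scene):
    """Placeholder for rendering a top-down view used as visual feedback."""
    return [(o[0], o[1], o[2]) for o in scene.objects]  # name and position only


def diagnose_conflicts(scene: Scene):
    """Detect pairwise bounding-box overlaps (a stand-in for spatial hallucination checks)."""
    conflicts = []
    objs = scene.objects
    for i in range(len(objs)):
        for j in range(i + 1, len(objs)):
            n1, x1, y1, w1, d1 = objs[i]
            n2, x2, y2, w2, d2 = objs[j]
            if abs(x1 - x2) < (w1 + w2) / 2 and abs(y1 - y2) < (d1 + d2) / 2:
                conflicts.append((n1, n2))
    return conflicts


def propose_fix(scene: Scene, conflict):
    """Toy 'act' step: nudge the second object away from the first along x."""
    _, n2 = conflict
    for idx, (name, x, y, w, d) in enumerate(scene.objects):
        if name == n2:
            scene.objects[idx] = (name, x + w, y, w, d)
    return scene


def diagnose_and_act(scene: Scene, max_turns: int = 5):
    """Iterate: render -> diagnose -> act, stopping once no conflicts remain."""
    for turn in range(max_turns):
        _view = render_scene(scene)          # visual feedback a real agent would inspect
        conflicts = diagnose_conflicts(scene)
        if not conflicts:
            return scene, turn
        scene = propose_fix(scene, conflicts[0])
    return scene, max_turns


if __name__ == "__main__":
    # Two overlapping objects: a bed and a nightstand placed on top of it.
    room = Scene(objects=[("bed", 0.0, 0.0, 2.0, 1.6),
                          ("nightstand", 0.5, 0.2, 0.5, 0.5)])
    fixed, turns = diagnose_and_act(room)
    print(f"resolved after {turns} turn(s):", fixed.objects)
```

In the framework described by the abstract, both the diagnosis and the corrective action would come from the vision-grounded model rather than hand-written rules; the sketch only shows the loop structure that the multi-turn RL training would optimize over.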

Top-level tags: computer vision, agents, reinforcement learning
Detailed tags: 3d scene synthesis, spatial reasoning, multi-turn rl, vision-grounded planning, self-reflection

SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL


1️⃣ One-Sentence Summary

This paper proposes a new framework called SceneReVis, which uses a "diagnose-and-act" loop and multi-turn reinforcement learning to let the model repeatedly check and correct object-placement errors (such as collisions) in a 3D scene, much like a human would, producing more realistic and plausible indoor scenes.

Source: arXiv: 2602.09432