arXiv submission date: 2026-03-03
📄 Abstract - Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. However, maintaining multi-view consistency in edited results remains challenging, and the extreme scarcity of 3D-consistent editing paired data renders supervised fine-tuning (SFT), the most effective training strategy for editing tasks, infeasible. In this paper, we observe that, while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable, naturally positioning reinforcement learning (RL) as a feasible solution. Motivated by this, we propose **RL3DEdit**, a single-pass framework driven by RL optimization with novel rewards derived from the 3D foundation model VGGT. Specifically, we leverage VGGT's robust priors learned from massive real-world data, feed it the edited images, and use the resulting confidence maps and pose-estimation errors as reward signals, effectively anchoring the 2D editing priors onto a 3D-consistent manifold via RL. Extensive experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency. To promote the development of 3D editing, we will release the code and model.
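The abstract describes turning VGGT's outputs (per-view confidence maps and pose-estimation errors) into a scalar RL reward. The paper does not give the exact formula, so the sketch below is a hypothetical combination: mean confidence rewarded, mean pose error penalized, with assumed weights `alpha` and `beta`. The function name and array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def consistency_reward(confidence_maps, pose_errors, alpha=1.0, beta=1.0):
    """Hypothetical reward: high geometric confidence and low pose error
    across edited views should indicate multi-view-consistent edits.

    confidence_maps: (V, H, W) array in [0, 1], one map per edited view
                     (as would be produced by a geometry model like VGGT).
    pose_errors:     (V,) array of per-view pose-estimation errors
                     (e.g. deviation from the known camera poses).
    alpha, beta:     assumed weighting coefficients.
    """
    conf_term = float(np.mean(confidence_maps))  # in [0, 1]; higher is better
    pose_term = float(np.mean(pose_errors))      # >= 0; lower is better
    return alpha * conf_term - beta * pose_term
```

With perfect confidence and zero pose error this reward attains its maximum of `alpha`; any pose drift across views lowers it, giving the RL loop a consistency-sensitive signal.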

Top-level tags: computer vision, reinforcement learning, multi-modal
Detailed tags: 3d scene editing, multi-view consistency, geometry guidance, diffusion models, reward design

Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing


1️⃣ One-Sentence Summary

This paper proposes a new method called RL3DEdit, which uses reinforcement learning with feedback signals from a 3D foundation model to guide a 2D diffusion model in 3D scene editing, efficiently producing edits that look consistent and high-quality across multiple views and addressing the difficulty existing methods have in maintaining 3D consistency.

Source: arXiv:2603.03143