arXiv submission date: 2025-12-19
📄 Abstract - InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Recent advances in diffusion-based video generation have opened new possibilities for controllable video editing, yet realistic video object insertion (VOI) remains challenging due to limited 4D scene understanding and inadequate handling of occlusion and lighting effects. We present InsertAnywhere, a new VOI framework that achieves geometrically consistent object placement and appearance-faithful video synthesis. Our method begins with a 4D-aware mask generation module that reconstructs the scene geometry and propagates the user-specified object placement across frames while maintaining temporal coherence and occlusion consistency. Building upon this spatial foundation, we extend a diffusion-based video generation model to jointly synthesize the inserted object and its surrounding local appearance variations, such as illumination and shading. To enable supervised training, we introduce ROSE++, an illumination-aware synthetic dataset constructed by transforming the ROSE object removal dataset into triplets of an object-removed video, the corresponding object-present video, and a VLM-generated reference image. Through extensive experiments, we demonstrate that our framework produces geometrically plausible and visually coherent object insertions across diverse real-world scenarios, significantly outperforming existing research and commercial models.
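To make the two-stage design concrete, here is a minimal Python sketch of the pipeline as the abstract describes it. All function names, signatures, and bodies are hypothetical placeholders (the abstract does not specify any interfaces); the stubs only illustrate the data flow: 4D-aware mask generation first, diffusion-based synthesis second.

```python
import numpy as np

# --- Stage 1: 4D-aware mask generation (placeholder implementations) ---

def reconstruct_4d_scene(video: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for dynamic scene reconstruction.
    Returns dummy per-frame depth; the real module recovers 4D geometry."""
    t, h, w, _ = video.shape
    return np.ones((t, h, w), dtype=np.float32)

def propagate_placement(depth: np.ndarray, first_mask: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for temporally coherent, occlusion-aware
    mask propagation; here the user's first-frame mask is simply repeated."""
    return np.repeat(first_mask[None], depth.shape[0], axis=0)

# --- Stage 2: diffusion-based synthesis (placeholder implementation) ---

def diffusion_synthesize(video: np.ndarray, masks: np.ndarray,
                         reference: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the diffusion model that jointly renders
    the object and its local lighting/shading; here, a naive paste."""
    out = video.copy()
    for frame, mask in zip(out, masks):
        frame[mask] = reference[mask]
    return out

def insert_object(video: np.ndarray, reference: np.ndarray,
                  first_frame_mask: np.ndarray) -> np.ndarray:
    """End-to-end flow: geometry -> per-frame masks -> synthesis."""
    depth = reconstruct_4d_scene(video)
    masks = propagate_placement(depth, first_frame_mask)
    return diffusion_synthesize(video, masks, reference)

# Usage on dummy data: a 16-frame 64x64 clip with a square placement mask.
video = np.zeros((16, 64, 64, 3), dtype=np.float32)
reference = np.ones((64, 64, 3), dtype=np.float32)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
edited = insert_object(video, reference, mask)
assert edited.shape == video.shape
```

Note that the ROSE++ triplets (object-removed video, object-present video, reference image) map directly onto the inputs and target of `insert_object`, which is presumably what makes them usable as supervised training pairs.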

Top-level tags: computer vision, video generation, multi-modal
Detailed tags: video object insertion, 4D scene understanding, diffusion models, geometric consistency, illumination synthesis

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion


1️⃣ One-sentence summary

This work proposes a new framework, InsertAnywhere, that combines 4D scene-geometry understanding with diffusion models to address the placement, occlusion, and lighting-consistency problems of realistic video object insertion, producing more natural and visually coherent video edits than existing methods.

Source: arXiv:2512.17504