EgoX: Egocentric Video Generation from a Single Exocentric Video
1️⃣ One-sentence summary
This paper proposes a new method called EgoX, which takes an existing third-person (exocentric) video and automatically generates a realistic, coherent first-person (egocentric) video, giving the viewer an immersive, as-if-present experience.
Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio-temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width- and channel-wise concatenation. Additionally, a geometry-guided self-attention mechanism selectively attends to spatially relevant regions, ensuring geometric coherence and high visual fidelity. Our approach achieves coherent and realistic egocentric video generation while demonstrating strong scalability and robustness across unseen and in-the-wild videos.
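The abstract's "width- and channel-wise concatenation" conditioning can be illustrated with a minimal sketch. The shapes and variable names below are hypothetical (the paper's actual latent dimensions and priors are not specified here); the sketch only shows the two tensor-joining operations the text names: placing the exocentric latent next to the egocentric latent along the width axis, and stacking an extra conditioning signal along the channel axis.

```python
import numpy as np

# Hypothetical latent layout: (frames, channels, height, width).
F, C, H, W = 4, 8, 16, 16
exo_latent = np.random.randn(F, C, H, W).astype(np.float32)  # exocentric prior
ego_latent = np.random.randn(F, C, H, W).astype(np.float32)  # egocentric latent

# Width-wise concatenation: exo and ego side by side along the width axis,
# so self-attention over the joint sequence can relate tokens from both views.
width_concat = np.concatenate([exo_latent, ego_latent], axis=-1)  # (F, C, H, 2W)

# Channel-wise concatenation: stack an additional egocentric prior (hypothetical
# here) onto the input channels of the diffusion backbone.
ego_prior = np.random.randn(F, C, H, W).astype(np.float32)
channel_concat = np.concatenate([ego_latent, ego_prior], axis=1)  # (F, 2C, H, W)

print(width_concat.shape, channel_concat.shape)
```

Both operations feed the same backbone: the width-wise join exposes cross-view context to attention, while the channel-wise join injects per-pixel conditioning without lengthening the token sequence.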
Source: arXiv:2512.08269