arXiv submission date: 2025-12-18
📄 Abstract - FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering

Neural rendering for interactive applications requires translating geometric and material properties (G-buffer) to photorealistic images with realistic lighting on a frame-by-frame basis. While recent diffusion-based approaches show promise for G-buffer-conditioned image synthesis, they face critical limitations: single-image models like RGBX generate frames independently without temporal consistency, while video models like DiffusionRenderer are too computationally expensive for most consumer gaming setups and require complete sequences upfront, making them unsuitable for interactive applications where future frames depend on user input. We introduce FrameDiffuser, an autoregressive neural rendering framework that generates temporally consistent, photorealistic frames by conditioning on G-buffer data and the model's own previous output. After an initial frame, FrameDiffuser operates purely on incoming G-buffer data, comprising geometry, materials, and surface properties, while using its previously generated frame for temporal guidance, maintaining stable, temporally consistent generation over hundreds to thousands of frames. Our dual-conditioning architecture combines ControlNet for structural guidance with ControlLoRA for temporal coherence. A three-stage training strategy enables stable autoregressive generation. We specialize our model to individual environments, prioritizing consistency and inference speed over broad generalization, demonstrating that environment-specific training achieves superior photorealistic quality with accurate lighting, shadows, and reflections compared to generalized approaches.
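To make the autoregressive setup concrete, below is a minimal sketch (not the authors' code) of the inference loop the abstract describes: each frame is denoised from noise while being conditioned on the current G-buffer (structural guidance, ControlNet in the paper) and on the previously generated frame (temporal guidance, ControlLoRA in the paper). All names here (`DenoiserStub`, `generate_frame`, the channel counts, and the toy sampler update) are hypothetical placeholders, not the paper's architecture or sampler.

```python
# Hedged sketch of G-buffer-conditioned autoregressive frame generation.
# Everything below is illustrative; the real model, sampler, and channel
# layout are defined in the paper, not here.
import torch


class DenoiserStub(torch.nn.Module):
    """Hypothetical stand-in for a diffusion denoiser with two conditions:
    the current G-buffer (structure) and the previous frame (temporal)."""

    def forward(self, noisy_rgb, t, gbuffer, prev_frame):
        # A real model would predict noise/velocity from all inputs;
        # here we just blend the conditions so the loop runs end to end.
        return 0.5 * gbuffer[:, :3] + 0.5 * prev_frame


@torch.no_grad()
def generate_frame(model, gbuffer, prev_frame, steps=4):
    """One autoregressive step: denoise from Gaussian noise, conditioned on
    the current G-buffer and the previously generated frame."""
    x = torch.randn_like(prev_frame)
    for t in reversed(range(steps)):
        pred = model(x, t, gbuffer, prev_frame)
        x = x + (pred - x) / (t + 1)  # toy update, not a real diffusion sampler
    return x.clamp(0, 1)


# Usage: after an initial frame, roll out a sequence purely from incoming
# G-buffers, feeding each generated frame back in as temporal guidance.
model = DenoiserStub()
frame = torch.zeros(1, 3, 64, 64)  # initial frame
for gbuffer in (torch.rand(1, 9, 64, 64) for _ in range(5)):  # geometry/material/surface channels
    frame = generate_frame(model, gbuffer, frame)
```

The key point the sketch illustrates is that, after the first frame, the only per-frame input from the application is the G-buffer; temporal consistency comes from reusing the model's own previous output rather than from access to a full future sequence.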

Top-level tags: computer vision, model training, AIGC
Detailed tags: neural rendering, diffusion models, temporal consistency, G-buffer, autoregressive generation

FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering


1️⃣ One-Sentence Summary

This paper proposes a method called FrameDiffuser that uses a game scene's geometry and material information (the G-buffer) to generate photorealistic, temporally coherent images one frame after another, like playing back a film, addressing the flickering and excessive computational cost that existing techniques suffer from in interactive applications such as games.


Source: arXiv:2512.16670