DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

📄 Abstract - DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

Controllable medical video generation has achieved remarkable progress, but it still lacks interpretability, which requires the alignment of generated contents with physical priors and faithful clinical manifestations. To push the boundaries from mere controllability to interpretability, we propose DepthPilot, the first interpretable framework for colonoscopy video generation. This work takes a step toward trustworthy generation through two synergistic paradigms. To achieve explicit geometric grounding, DepthPilot devises a prior distribution alignment strategy, injecting depth constraints into the diffusion backbone via parameter-efficient fine-tuning to ensure anatomical fidelity. To enhance intrinsic nonlinear modeling under these geometric constraints, DepthPilot employs an adaptive spline denoising module, replacing fixed linear weights with learnable spline functions to capture complex spatio-temporal dynamics. Extensive evaluations across three public datasets and in-house clinical data confirm DepthPilot's robust ability to produce physically consistent videos. It achieves FID scores below 15 across all benchmarks and ranks first in clinician assessments, bridging the gap between "visually realistic" and "clinically interpretable". Moreover, DepthPilot-generated videos are expected to enable reliable 3D reconstruction, facilitating surgical navigation and blind region identification, and serve as a foundation toward the colorectal world model.

DepthPilot：从可控性到可解释性的结肠镜检查视频生成 / DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

1️⃣ 一句话总结

本文提出了一种名为DepthPilot的可解释框架，通过将深度信息约束与自适应样条去噪模块相结合，使得生成的结肠镜视频不仅视觉逼真，还能符合真实的解剖结构，从而在临床评估中超越现有方法，并为手术导航和三维重建提供了可靠基础。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要