Self-Improving 4D Perception via Self-Distillation
1️⃣ One-Sentence Summary
This paper proposes SelfEvo, a self-improving framework that lets an existing multi-view 3D reconstruction model continually sharpen its perception of dynamic scenes simply by watching unlabeled videos, much like self-study, without relying on expensive human annotations.
Large-scale multi-view reconstruction models have made remarkable progress, but most existing approaches still rely on fully supervised training with ground-truth 3D/4D annotations. Such annotations are expensive and particularly scarce for dynamic scenes, limiting scalability. We propose SelfEvo, a self-improving framework that continually improves pretrained multi-view reconstruction models using unlabeled videos. SelfEvo introduces a self-distillation scheme using spatiotemporal context asymmetry, enabling self-improvement for learning-based 4D perception without external annotations. We systematically study design choices that make self-improvement effective, including loss signals, forms of asymmetry, and other training strategies. Across eight benchmarks spanning diverse datasets and domains, SelfEvo consistently improves pretrained baselines and generalizes across base models (e.g. VGGT and $\pi^3$), with significant gains on dynamic scenes. Overall, SelfEvo achieves up to 36.5% relative improvement in video depth estimation and 20.1% in camera estimation, without using any labeled data. Project Page: this https URL.
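The core idea of the abstract, self-distillation driven by spatiotemporal context asymmetry, can be sketched in a toy form: a frozen teacher (the pretrained model) sees the full clip, while the student sees only a temporally subsampled context and is trained to match the teacher's output. The model, weights, and gradient update below are hypothetical stand-ins for illustration, not the paper's actual architecture or training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(frames, weights):
    # Toy stand-in for a multi-view reconstruction model:
    # maps a stack of frame features to one pooled "4D" prediction.
    return np.tanh(frames @ weights).mean(axis=0, keepdims=True)

# Unlabeled "video": 8 frames, each a 16-dim feature (placeholder for pixels).
video = rng.normal(size=(8, 16))

teacher_w = rng.normal(size=(16, 4))   # frozen pretrained weights
student_w = teacher_w.copy()           # student starts from the teacher

# Context asymmetry: student only sees every other frame.
sub_ctx = video[::2]
init_loss = float(np.mean((model(sub_ctx, teacher_w)
                           - model(video, teacher_w)) ** 2))

lr = 0.1
for step in range(200):
    target = model(video, teacher_w)   # pseudo-label from full context
    pred = model(sub_ctx, student_w)   # prediction from reduced context
    err = pred - target                # self-distillation residual

    # Manual gradient of the MSE loss w.r.t. the student weights.
    n = sub_ctx.shape[0]
    pre = sub_ctx @ student_w
    grad = sub_ctx.T @ ((1 - np.tanh(pre) ** 2) * err * (2 / err.size) / n)
    student_w -= lr * grad

loss = float(np.mean((model(sub_ctx, student_w)
                      - model(video, teacher_w)) ** 2))
print(f"distillation loss: {init_loss:.4f} -> {loss:.4f}")
```

The point of the asymmetry is that matching a richer-context teacher from a poorer context is a non-trivial target, so the student improves without any external labels; the paper's systematic study of loss signals and forms of asymmetry refines exactly this kind of setup.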
Source: arXiv: 2604.08532