arXiv submission date: 2026-01-26
📄 Abstract - DeFM: Learning Foundation Representations from Depth for Robotics

Depth sensors are widely deployed across robotic platforms, and advances in fast, high-fidelity depth simulation have enabled robotic policies trained on depth observations to achieve robust sim-to-real transfer for a wide range of tasks. Despite this, representation learning for the depth modality remains underexplored compared to RGB, where large-scale foundation models now define the state of the art. To address this gap, we present DeFM, a self-supervised foundation model trained entirely on depth images for robotic applications. Using a DINO-style self-distillation objective on a curated dataset of 60M depth images, DeFM learns geometric and semantic representations that generalize to diverse environments, tasks, and sensors. To retain metric awareness across multiple scales, we introduce a novel input normalization strategy. We further distill DeFM into compact models suitable for resource-constrained robotic systems. When evaluated on depth-based classification, segmentation, navigation, locomotion, and manipulation benchmarks, DeFM achieves state-of-the-art performance and demonstrates strong generalization from simulation to real-world environments. We release all our pretrained models, which can be adopted off-the-shelf for depth-based robotic learning without task-specific fine-tuning. Webpage: this https URL
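The abstract mentions a DINO-style self-distillation objective on normalized depth inputs but gives no implementation details. Below is a minimal, hypothetical Python sketch of what such a pipeline could look like: a toy encoder stands in for the actual backbone, `normalize_depth` is an illustrative median-based scale normalization (an assumption, not the paper's strategy), and the loss/EMA update follow the generic DINO recipe rather than DeFM's specific configuration.

```python
# Hypothetical sketch of DINO-style self-distillation on depth images.
# All names (normalize_depth, Encoder, hyperparameters) are illustrative;
# DeFM's actual normalization, architecture, and training setup may differ.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_depth(depth: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Illustrative per-image scale normalization (assumption): rescale each
    depth map by its median so inputs from different sensors and ranges
    share a comparable scale."""
    b = depth.shape[0]
    med = depth.view(b, -1).median(dim=1).values.view(b, 1, 1, 1)
    return depth / (med + eps)


class Encoder(nn.Module):
    """Tiny stand-in for a ViT backbone plus projection head."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):
        return self.net(x)


def dino_loss(student_out, teacher_out, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between sharpened teacher and student distributions."""
    t = F.softmax(teacher_out / tau_t, dim=-1).detach()
    s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()


student = Encoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

# Two "views" of the same depth image stand in for DINO's augmented crops.
depth = torch.rand(8, 1, 96, 96) * 5.0  # synthetic metric depth in metres
view_a = normalize_depth(depth)
view_b = normalize_depth(depth + 0.01 * torch.randn_like(depth))

loss = dino_loss(student(view_a), teacher(view_b))
loss.backward()
opt.step()

# Exponential-moving-average update of the teacher, as in DINO.
with torch.no_grad():
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(0.996).add_(ps, alpha=0.004)
```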

Top-level tags: robotics, computer vision, model training
Detailed tags: depth representation, self-supervised learning, sim-to-real, foundation model, robotic perception

DeFM: Learning Foundation Representations from Depth for Robotics


1️⃣ One-Sentence Summary

This paper presents DeFM, a self-supervised foundation model that learns geometric and semantic representations purely from large collections of depth images. Without task-specific fine-tuning, it improves robot performance across classification, navigation, manipulation, and other tasks, and transfers well from simulated environments to the real world.

Source: arXiv:2601.18923