arXiv submission date: 2026-03-01
📄 Abstract - Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving

3D semantic occupancy prediction is crucial for autonomous driving perception, offering comprehensive geometric scene understanding and semantic recognition. However, existing methods struggle with geometric misalignment in view transformation due to the lack of pixel-level accurate depth estimation, and with severe spatial class imbalance, where semantic categories exhibit strong spatial anisotropy. To address these challenges, we propose Dr.Occ, a depth- and region-guided occupancy prediction framework. Specifically, we introduce a depth-guided 2D-to-3D View Transformer (D$^2$-VFormer) that effectively leverages high-quality dense depth cues from MoGe-2 to construct reliable geometric priors, thereby enabling precise geometric alignment of voxel features. Moreover, inspired by the Mixture-of-Experts (MoE) framework, we propose a region-guided Expert Transformer (R$^2$-EFormer) that adaptively allocates region-specific experts to focus on different spatial regions, effectively addressing spatial semantic variations. The two components thus make complementary contributions: depth guidance ensures geometric alignment, while region experts enhance semantic learning. Experiments on the Occ3D-nuScenes benchmark demonstrate that Dr.Occ improves the strong baseline BEVDet4D by 7.43% mIoU and 3.09% IoU under the full vision-only setting.
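The region-guided expert idea in the abstract can be illustrated with a toy mixture-of-experts over voxel features: a gating network scores each voxel against a small set of experts (e.g. road-, object-, and background-like regions), and the output is the gate-weighted sum of per-expert projections. This is a minimal numpy sketch under assumed shapes and names, not the paper's actual R$^2$-EFormer implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class RegionMoE:
    """Toy region-guided mixture-of-experts over voxel features.

    All class, weight, and dimension names here are illustrative
    assumptions, chosen only to show the routing mechanism.
    """
    def __init__(self, dim, num_experts):
        self.gate_w = rng.normal(size=(dim, num_experts))        # gating weights
        self.expert_w = rng.normal(size=(num_experts, dim, dim)) # one linear map per expert

    def __call__(self, voxels):                   # voxels: (N, dim)
        gates = softmax(voxels @ self.gate_w)     # (N, K): soft region assignment per voxel
        # Apply every expert to every voxel: (N, K, dim)
        expert_out = np.einsum('nd,kde->nke', voxels, self.expert_w)
        # Blend expert outputs with the gates: (N, dim)
        return np.einsum('nk,nke->ne', gates, expert_out)

moe = RegionMoE(dim=8, num_experts=3)
out = moe(rng.normal(size=(5, 8)))
print(out.shape)  # (5, 8)
```

A real model would learn the gating and expert weights end-to-end and typically route sparsely (top-k experts per voxel) for efficiency; the dense soft blend above keeps the sketch short.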

Top-level tags: computer vision, autonomous driving, multi-modal
Detailed tags: 3d occupancy prediction, view transformation, depth estimation, semantic segmentation, autonomous perception

Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving


1️⃣ One-Sentence Summary

This paper proposes a new method called Dr.Occ that improves the accuracy of 3D geometric modeling by exploiting high-quality depth information, and adopts a Mixture-of-Experts-style strategy to better handle the semantic differences between spatial regions (such as sky, road, and vehicles), significantly improving an autonomous driving system's ability to perceive 3D semantic scenes from vision alone.

From arXiv: 2603.01007