arXiv submission date: 2026-04-06
📄 Abstract - NAIMA: Semantics Aware RGB Guided Depth Super-Resolution

Guided depth super-resolution (GDSR) is a multi-modal approach to depth map super-resolution that relies on a low-resolution depth map and a high-resolution RGB image to restore finer structural details. However, color and texture cues in RGB images can falsely suggest depth discontinuities, often producing artifacts and blurred depth boundaries in the generated depth map. We propose a solution that introduces global contextual semantic priors generated from pretrained vision transformer token embeddings. Our approach to distilling semantic knowledge from pretrained token embeddings is motivated by their demonstrated effectiveness in the related task of monocular depth estimation. We introduce a Guided Token Attention (GTA) module, which iteratively aligns encoded RGB spatial features with depth encodings, using cross-attention to selectively inject global semantic context extracted from different layers of a pretrained vision transformer. Additionally, we present an architecture called Neural Attention for Implicit Multi-token Alignment (NAIMA), which integrates DINOv2 with GTA blocks for semantics-aware GDSR. Our proposed architecture, with its ability to distill semantic knowledge, achieves significant improvements over existing methods across multiple scaling factors and datasets.
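The core mechanism the abstract describes, depth encodings attending to semantic tokens from a pretrained ViT via cross-attention, can be illustrated with a minimal single-head sketch. This is not the paper's implementation: the function name, shapes, and random projections standing in for learned weights are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_cross_attention(depth_feats, semantic_tokens, d_k=32, rng=None):
    """Hypothetical sketch of cross-attention injecting semantic context.

    depth_feats:     (N, C) flattened depth encodings (queries).
    semantic_tokens: (M, C) token embeddings from a pretrained ViT
                     (e.g. one layer of DINOv2) acting as keys/values.
    Returns depth features enriched with attended semantic context.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    C = depth_feats.shape[1]
    # Random projections stand in for the learned Q/K/V weights.
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q = depth_feats @ Wq        # queries come from the depth branch
    K = semantic_tokens @ Wk    # keys come from the ViT tokens
    V = semantic_tokens @ Wv    # values carry the semantic context
    attn = softmax(Q @ K.T / np.sqrt(d_k))      # (N, M) attention map
    # Residual add: semantic context is injected without discarding
    # the original depth features.
    return depth_feats + attn @ V
```

In the paper this step is applied iteratively inside GTA blocks, with tokens drawn from different transformer layers; the sketch shows only a single injection.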

Top-level tags: computer vision, multi-modal, model training
Detailed tags: depth super-resolution, guided upsampling, vision transformers, cross-attention, semantic priors

NAIMA: Semantics Aware RGB Guided Depth Super-Resolution


1️⃣ One-sentence summary

This paper proposes a method called NAIMA that introduces global semantic information extracted from a pretrained vision model. It addresses the blurred boundaries caused by misleading color and texture cues when a high-resolution RGB image is used to enhance a low-resolution depth map, and thereby significantly improves the accuracy of depth map super-resolution.

Source: arXiv 2604.04407