📄 Paper Summary
NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering
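1️⃣ One-Sentence Summary
This paper proposes NAF, a zero-shot feature upsampling method that learns adaptive weights to upsample the low-resolution feature maps produced by any vision foundation model. Without retraining, it reaches state-of-the-art performance on multiple tasks while remaining efficient.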
Vision Foundation Models (VFMs) extract spatially downsampled representations, posing challenges for pixel-level tasks. Existing upsampling approaches face a fundamental trade-off: classical filters are fast and broadly applicable but rely on fixed forms, while modern upsamplers achieve superior accuracy through learnable, VFM-specific forms at the cost of retraining for each VFM. We introduce Neighborhood Attention Filtering (NAF), which bridges this gap by learning adaptive spatial-and-content weights through Cross-Scale Neighborhood Attention and Rotary Position Embeddings (RoPE), guided solely by the high-resolution input image. NAF operates zero-shot: it upsamples features from any VFM without retraining, making it the first VFM-agnostic architecture to outperform VFM-specific upsamplers and achieve state-of-the-art performance across multiple downstream tasks. It maintains high efficiency, scaling to 2K feature maps and reconstructing intermediate-resolution maps at 18 FPS. Beyond feature upsampling, NAF demonstrates strong performance on image restoration, highlighting its versatility. Code and checkpoints are available at this https URL.
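To make the idea of image-guided neighborhood filtering concrete, here is a minimal NumPy sketch. It is not the authors' implementation: NAF learns its weights with Cross-Scale Neighborhood Attention and RoPE, whereas this sketch swaps in a hand-crafted color-similarity score with no learned projections and no positional encoding. The function name `neighborhood_attention_upsample` and the parameters `radius` and `tau` are hypothetical, chosen only for illustration. Each high-resolution pixel acts as a query, attends over a small window of low-resolution cells, and receives a softmax-weighted average of their features.

```python
import numpy as np

def neighborhood_attention_upsample(image, feats, radius=1, tau=0.1):
    """Illustrative stand-in for image-guided neighborhood filtering.

    image: (H, W, C) high-resolution guidance image (float).
    feats: (h, w, D) low-resolution features, with H = s*h and W = s*w.
    Returns upsampled features of shape (H, W, D).
    """
    H, W, _ = image.shape
    h, w, _ = feats.shape
    assert H % h == 0 and W % w == 0 and H // h == W // w
    s = H // h

    # Keys: the guidance image average-pooled to the feature resolution.
    keys = image.reshape(h, s, w, s, -1).mean(axis=(1, 3))  # (h, w, C)

    # Index of the low-resolution cell that contains each high-res pixel.
    ci = np.arange(H) // s
    cj = np.arange(W) // s

    logits, values = [], []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            # Neighbor cell indices, clamped at the borders.
            ni = np.clip(ci + di, 0, h - 1)
            nj = np.clip(cj + dj, 0, w - 1)
            k = keys[ni][:, nj]    # (H, W, C) neighbor keys
            v = feats[ni][:, nj]   # (H, W, D) neighbor values
            # Hand-crafted score (assumption): negative squared color distance.
            logits.append(-((image - k) ** 2).sum(-1) / tau)
            values.append(v)

    logits = np.stack(logits, axis=0)               # (K, H, W)
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over window
    values = np.stack(values, axis=0)               # (K, H, W, D)
    return (weights[..., None] * values).sum(axis=0)

# Usage: upsample a 4x4 feature map to 8x8 under a random guidance image.
rng = np.random.default_rng(0)
guide = rng.random((8, 8, 3))
low_res = rng.random((4, 4, 5))
high_res = neighborhood_attention_upsample(guide, low_res)
```

Because the weights depend only on the guidance image and not on the features, the same filter can be applied to features from any backbone, which is the property the abstract calls VFM-agnostic. In this sketch the weights are fixed by a formula; in NAF they are learned.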