arXiv submission date: 2026-03-31
📄 Abstract - Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras

Always-on edge cameras generate continuous video streams where redundant frames degrade cross-modal retrieval by crowding correct results out of top-k search. This paper presents a streaming retrieval architecture: an on-device epsilon-net filter retains only semantically novel frames, building a denoised embedding index; a cross-modal adapter and cloud re-ranker compensate for the compact encoder's weak alignment. A single-pass streaming filter outperforms offline alternatives (k-means, farthest-point, uniform, random) across eight vision-language models (8M-632M) on two egocentric datasets (AEA, EPIC-KITCHENS). Combined, the architecture reaches 45.6% Hit@5 on held-out data using an 8M on-device encoder at an estimated 2.7 mW.

Top-level tags: computer vision, systems, model evaluation
Detailed tags: edge computing, cross-modal retrieval, novelty filtering, video streams, efficient inference

Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras


1️⃣ One-Sentence Summary

This paper proposes a streaming retrieval architecture for edge cameras that filters out semantically redundant video frames on-device, improving cross-modal retrieval efficiency and allowing a small encoder to approach the retrieval accuracy of much larger models.
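The core on-device mechanism is a single-pass epsilon-net filter over frame embeddings: a frame is retained only if it is sufficiently far from everything already kept. The sketch below is a minimal illustration of that idea using cosine distance; the paper's exact distance metric, threshold, and any eviction policy are not specified here, so those details are assumptions.

```python
import numpy as np

def epsilon_net_filter(embeddings, eps):
    """Single-pass epsilon-net novelty filter (illustrative sketch).

    Keeps a frame only if its cosine distance to every previously
    retained frame exceeds eps. Runs in one streaming pass, so it
    never needs the full video in memory at once.
    """
    kept = []      # indices of retained (novel) frames
    centers = []   # unit-normalized embeddings of retained frames
    for i, e in enumerate(embeddings):
        e = np.asarray(e, dtype=float)
        e = e / np.linalg.norm(e)
        # Novel iff no existing center is within eps (cosine distance).
        if all(1.0 - float(e @ c) > eps for c in centers):
            kept.append(i)
            centers.append(e)
    return kept

# Example: two near-duplicate frames followed by a novel one.
frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
print(epsilon_net_filter(frames, eps=0.5))  # → [0, 2]
```

Because each incoming frame is compared only against the retained set (which the filter keeps small by construction), the per-frame cost stays low, which is what makes the approach plausible for an always-on, milliwatt-scale edge device.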

Source: arXiv:2603.29631