arXiv submission date: 2026-03-31
📄 Abstract - Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras

Always-on edge cameras generate continuous video streams where redundant frames degrade cross-modal retrieval by crowding correct results out of top-k search. This paper presents a streaming retrieval architecture: an on-device epsilon-net filter retains only semantically novel frames, building a denoised embedding index; a cross-modal adapter and cloud re-ranker compensate for the compact encoder's weak alignment. A single-pass streaming filter outperforms offline alternatives (k-means, farthest-point, uniform, random) across eight vision-language models (8M-632M) on two egocentric datasets (AEA, EPIC-KITCHENS). Combined, the architecture reaches 45.6% Hit@5 on held-out data using an 8M on-device encoder at an estimated 2.7 mW.

Top-level tags: computer vision, systems, model evaluation
Detailed tags: edge computing, cross-modal retrieval, novelty filtering, video streams, efficient inference

Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras


1️⃣ One-Sentence Summary

This paper proposes a streaming retrieval architecture for edge cameras that filters out semantically redundant video frames on-device, improving cross-modal retrieval efficiency and allowing a small encoder to approach the retrieval accuracy of much larger models.
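The core on-device mechanism is a single-pass epsilon-net filter over frame embeddings: a frame is retained only if it is sufficiently far from everything already kept. The sketch below is a minimal illustration of that idea using cosine distance; the paper's exact distance metric, threshold, and any eviction policy are not specified here, so those details are assumptions.

```python
import numpy as np

def epsilon_net_filter(embeddings, eps):
    """Single-pass epsilon-net novelty filter (illustrative sketch).

    Keeps a frame only if its cosine distance to every previously
    retained frame exceeds eps. Runs in one streaming pass, so it
    never needs the full video in memory at once.
    """
    kept = []      # indices of retained (novel) frames
    centers = []   # unit-normalized embeddings of retained frames
    for i, e in enumerate(embeddings):
        e = np.asarray(e, dtype=float)
        e = e / np.linalg.norm(e)
        # Novel iff no existing center is within eps (cosine distance).
        if all(1.0 - float(e @ c) > eps for c in centers):
            kept.append(i)
            centers.append(e)
    return kept

# Example: two near-duplicate frames followed by a novel one.
frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
print(epsilon_net_filter(frames, eps=0.5))  # → [0, 2]
```

Because each incoming frame is compared only against the retained set (which the filter keeps small by construction), the per-frame cost stays low, which is what makes the approach plausible for an always-on, milliwatt-scale edge device.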

Source: arXiv:2603.29631