arXiv submission date: 2026-04-22
📄 Abstract - Exploring High-Order Self-Similarity for Video Understanding

Space-time self-similarity (STSS), which captures visual correspondences across frames, provides an effective way to represent temporal dynamics for video understanding. In this work, we explore higher-order STSS and demonstrate how STSSs at different orders reveal distinct aspects of these dynamics. We then introduce the Multi-Order Self-Similarity (MOSS) module, a lightweight neural module designed to learn and integrate multi-order STSS features. It can be applied to diverse video tasks to enhance motion modeling capabilities while consuming only marginal computational cost and memory usage. Extensive experiments on video action recognition, motion-centric video VQA, and real-world robotic tasks consistently demonstrate substantial improvements, validating the broad applicability of MOSS as a general temporal modeling module. The source code and checkpoints will be publicly available.
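
To make the idea of "orders" of self-similarity concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the MOSS module: the abstract does not define how higher-order STSS is computed, so this sketch assumes that a higher order is obtained by taking the self-similarity of the previous order's similarity maps. It also uses dense cosine similarity over all frames and positions for simplicity. The function names `stss` and `higher_order_stss` are hypothetical.

```python
import numpy as np


def stss(feats, eps=1e-8):
    """Dense first-order space-time self-similarity (illustrative).

    feats: array of shape (T, N, C) with T frames, N spatial positions
           per frame, and C feature channels.
    Returns an array of shape (T, N, T, N) whose entry [t, n, s, m] is the
    cosine similarity between position n in frame t and position m in frame s.
    """
    norm = np.linalg.norm(feats, axis=-1, keepdims=True)
    unit = feats / (norm + eps)  # L2-normalize each feature vector
    return np.einsum("tnc,smc->tnsm", unit, unit)


def higher_order_stss(feats, order):
    """Assumed construction of order-k STSS (not taken from the paper).

    Each position's similarity map from the previous order is flattened
    and reused as that position's feature vector for the next order.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    T, N, _ = feats.shape
    current = feats
    for _ in range(order):
        sim = stss(current)                 # (T, N, T, N)
        current = sim.reshape(T, N, T * N)  # similarity map becomes the feature
    return current.reshape(T, N, T, N)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video_feats = rng.normal(size=(4, 6, 16))  # 4 frames, 6 positions, 16 channels
    first = higher_order_stss(video_feats, order=1)
    second = higher_order_stss(video_feats, order=2)
    print(first.shape, second.shape)
```

Under this reading, the first order compares appearance features across frames, while the second order compares how each position relates to the rest of the clip. A practical module would more likely restrict the comparison to a local space-time neighborhood to keep cost low.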

Top-level tags: computer vision, machine learning
Detailed tags: video understanding, self-similarity, temporal modeling, action recognition, motion analysis

Exploring High-Order Self-Similarity for Video Understanding


1️⃣ One-sentence summary

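This paper proposes MOSS, a lightweight neural network module that extracts and integrates space-time self-similarity features of different orders from video. At very low computational cost, it substantially improves performance on a range of video understanding tasks, including action recognition, video question answering, and robotic tasks.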

Source: arXiv: 2604.20760