arXiv submission date: 2026-03-15
📄 Abstract - GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos

Existing text-to-video retrieval benchmarks are dominated by real-world footage where much of the semantics can be inferred from a single frame, leaving temporal reasoning and explicit end-state grounding under-evaluated. We introduce GenState-AI, an AI-generated benchmark centered on controlled state transitions, where each query is paired with a main video, a temporal hard negative that differs only in the decisive end-state, and a semantic hard negative with content substitution, enabling fine-grained diagnosis of temporal vs. semantic confusions beyond appearance matching. Using Wan2.2-TI2V-5B, we generate short clips whose meaning depends on precise changes in position, quantity, and object relations, providing controllable evaluation conditions for state-aware retrieval. We evaluate two representative MLLM-based baselines, and observe consistent and interpretable failure patterns: both frequently confuse the main video with the temporal hard negative and over-prefer temporally plausible but end-state-incorrect clips, indicating insufficient grounding to decisive end-state evidence, while being comparatively less sensitive to semantic substitutions. We further introduce triplet-based diagnostic analyses, including relative-order statistics and breakdowns across transition categories, to make temporal vs. semantic failure sources explicit. GenState-AI provides a focused testbed for state-aware, temporally and semantically sensitive text-to-video retrieval, and will be released on this http URL.
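
The abstract mentions triplet-based diagnostics with relative-order statistics but does not say how they are computed. The following is a minimal sketch of one plausible reading, assuming a retrieval model has already produced three query-to-video similarity scores per triplet (main video, temporal hard negative, semantic hard negative). The function names `relative_order` and `triplet_diagnostics`, the metric names, and the example scores are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One triplet = (score of main video, score of temporal hard negative,
#                score of semantic hard negative) for a single text query.
Triplet = Tuple[float, float, float]


def relative_order(main: float, temporal: float, semantic: float) -> str:
    """Rank the three clips by descending score, e.g. 'M>T>S'.

    Ties keep the input order M, T, S because Python's sort is stable.
    """
    scored = [("M", main), ("T", temporal), ("S", semantic)]
    ranked = sorted(scored, key=lambda pair: -pair[1])
    return ">".join(label for label, _ in ranked)


def triplet_diagnostics(scores: List[Triplet]) -> Dict[str, object]:
    """Summarise temporal vs. semantic confusions over a list of triplets.

    These metric definitions are assumptions made for illustration only.
    """
    if not scores:
        raise ValueError("scores must contain at least one triplet")
    n = len(scores)
    return {
        # How often each of the six possible orderings occurs.
        "order_counts": Counter(relative_order(m, t, s) for m, t, s in scores),
        # Temporal hard negative scored above the main video.
        "temporal_confusion_rate": sum(t > m for m, t, _ in scores) / n,
        # Semantic hard negative scored above the main video.
        "semantic_confusion_rate": sum(s > m for m, _, s in scores) / n,
        # Main video scored strictly above both negatives.
        "triplet_accuracy": sum(m > t and m > s for m, t, s in scores) / n,
    }


# Made-up scores for four queries, purely to exercise the functions.
example_scores = [
    (0.90, 0.80, 0.30),
    (0.60, 0.70, 0.20),
    (0.50, 0.40, 0.60),
    (0.70, 0.75, 0.10),
]
report = triplet_diagnostics(example_scores)
```

Under this reading, a temporal confusion rate well above the semantic confusion rate would correspond to the failure pattern the abstract describes, where models over-prefer clips that are temporally plausible but end in the wrong state.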

Top-level tags: benchmark, multi-modal, aigc
Detailed tags: text-to-video retrieval, ai-generated video, state transitions, temporal reasoning, evaluation benchmark

GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos


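1️⃣ One-sentence summary

This paper introduces GenState-AI, a dataset of AI-generated videos designed to test and diagnose whether text-to-video retrieval models truly understand precise changes in object state (such as position and quantity) within a video, rather than merely matching visual content.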

Source: arXiv: 2603.14426