arXiv submission date: 2026-03-16
📄 Abstract - Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

Following major advances in text and image generation, the video domain has surged, producing highly realistic and controllable sequences. Along with this progress, these models also raise serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce \emph{STALL}, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at this https URL.

Top tags: video generation, model evaluation, multi-modal
Detailed tags: synthetic video detection, zero-shot detection, spatial-temporal modeling, likelihood scoring, training-free detection

Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods


1️⃣ One-sentence summary

This paper proposes STALL, a training-free detection method that identifies AI-generated videos by jointly analyzing a video's spatial and temporal characteristics within a single probabilistic framework. Because it scores content against real-data statistics rather than relying on data from any specific generator, it remains effective against newly emerging generative models, and it outperforms existing image- and video-based baselines across multiple benchmarks.
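The page does not describe the concrete features or densities STALL uses, but the general zero-shot recipe it summarizes (fit statistics on real data only, then score a candidate video's spatial and temporal evidence as a joint log-likelihood) can be sketched as follows. Everything in this sketch is an illustrative assumption: the toy per-frame and frame-difference features, the Gaussian models, and the name `stall_like_score` are not taken from the paper.

```python
# Hypothetical sketch of joint spatial-temporal likelihood scoring.
# The features and Gaussian densities below are illustrative assumptions,
# not STALL's actual method.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

def video_features(video):
    """Toy features: per-frame channel means (spatial) and
    frame-difference channel means (temporal)."""
    spatial = video.mean(axis=(1, 2))                     # (T, C)
    temporal = np.diff(video, axis=0).mean(axis=(1, 2))   # (T-1, C)
    return spatial, temporal

# "Real-data statistics": fit simple Gaussians on features of real videos only,
# so no synthetic training data is ever needed (the zero-shot property).
real_videos = rng.normal(0.5, 0.1, size=(20, 8, 16, 16, 3))  # (N, T, H, W, C)
sp = np.concatenate([video_features(v)[0] for v in real_videos])
tp = np.concatenate([video_features(v)[1] for v in real_videos])
reg = 1e-6 * np.eye(3)  # small ridge for numerical stability
spatial_model = multivariate_normal(sp.mean(0), np.cov(sp.T) + reg)
temporal_model = multivariate_normal(tp.mean(0), np.cov(tp.T) + reg)

def stall_like_score(video):
    """Joint score: higher means more consistent with real-data statistics.
    A detector would threshold this to flag likely generated videos."""
    s, t = video_features(video)
    return spatial_model.logpdf(s).mean() + temporal_model.logpdf(t).mean()

# A clip matching the real statistics vs. an off-distribution "generated" clip.
real_clip = rng.normal(0.5, 0.1, size=(8, 16, 16, 3))
fake_clip = rng.normal(0.5, 0.4, size=(8, 16, 16, 3))
print(stall_like_score(real_clip), stall_like_score(fake_clip))
```

The key design point the abstract emphasizes is that both terms of the score are computed against real-data statistics alone, which is what makes the approach training-free and agnostic to the generator that produced the video.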

Source: arXiv 2603.15026