Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

📄 Abstract - Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors. However, most existing methods are limited to binary classification and lack the necessary explanations for human interpretation. In this paper, we present Skyra, a specialized multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos and leverages them as grounded evidence for both detection and explanation. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video artifact dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy. To comprehensively evaluate Skyra, we introduce ViF-Bench, a benchmark comprising 3K high-quality samples generated by over ten state-of-the-art video generators. Extensive experiments demonstrate that Skyra surpasses existing methods across multiple benchmarks, while our evaluation yields valuable insights for advancing explainable AI-generated video detection.


1️⃣ One-Sentence Summary

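This paper proposes Skyra, a new multimodal large language model that identifies and explains human-perceivable visual artifacts in AI-generated videos, so it can both detect fake videos with high accuracy and give easy-to-understand evidence for its decisions, addressing the limitation that existing methods only classify videos without explaining why.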
