arXiv submission date: 2026-05-26
📄 Abstract - When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection

Recent generative models have largely closed the gap on low-level artifacts (pixel fingerprints, frequency anomalies, upsampling traces), particularly in person-centric and partial-edit settings where the manipulated region is small and surrounded by photometrically authentic content. We introduce Social Gaze Consistency, a high-level semantic cue defined as the mutual coherence of gaze direction, head-eye alignment, and pupil placement between interacting individuals, and show that it constitutes a previously underutilized detection axis orthogonal to existing low-level paradigms. We instantiate this insight through three coupled mechanisms: (i) a controlled diagnostic dataset of gaze-consistent images with region-specific perturbations, in which strict pair-level grouping, rather than augmentation, prevents the model from memorizing generator fingerprints as an optimization-time shortcut; (ii) Block-Compositional Caption Supervision, which holds a single 5-block reasoning skeleton invariant across 1,250 macro-combined captions, decoupling reasoning consistency from surface diversity; and (iii) cross-architecture validation showing that the same supervision improves a vision-language backbone (FakeVLM) by +3.7 pp on the COCOAI Interaction subset (balanced accuracy 67.8 → 71.5) and by +1.3 pp on the COCOAI Person subset (83.0 → 84.3), with consistent gains on a vision-only backbone (Effort), evidence that the cue is backbone-agnostic. Real- and fake-class recalls rise simultaneously, ruling out a "predict-all-fake" artifact. A four-step mechanistic account (paired-edit shortcut blocking, hard-to-easy difficulty transfer, CLIP prior preservation, and a spectral weakness in periocular structure shared across diffusion-family generators) explains why training on a single inpainter (FLUX.1-Fill) transfers to multi-generator suites. We will release the code upon acceptance to facilitate reproducibility.
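The abstract reports balanced accuracy and stresses that real- and fake-class recalls rise together. The minimal sketch below shows why that matters. It assumes the standard definition of balanced accuracy as the mean of per-class recall (the abstract does not spell out its formula), and the labels and predictions are toy values, not data from the paper.

```python
def recall(y_true, y_pred, cls):
    """Fraction of samples whose true label is `cls` that were predicted as `cls`."""
    idx = [i for i, t in enumerate(y_true) if t == cls]
    return sum(1 for i in idx if y_pred[i] == cls) / len(idx)


def accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the real (0) and fake (1) classes."""
    return (recall(y_true, y_pred, 0) + recall(y_true, y_pred, 1)) / 2


# Toy, deliberately imbalanced labels (hypothetical): 0 = real, 1 = fake.
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# A degenerate detector that calls everything fake.
all_fake = [1] * len(y_true)
print(accuracy(y_true, all_fake))           # 0.8: looks good on plain accuracy
print(balanced_accuracy(y_true, all_fake))  # 0.5: chance level, real recall is 0

# A detector that recovers some real images as well as most fakes.
mixed = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(balanced_accuracy(y_true, mixed))     # 0.6875
```

Shifting predictions toward "fake" raises fake-class recall only at the expense of real-class recall, so a balanced-accuracy gain in which both recalls increase cannot come from that bias alone.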

Top-level tags: computer vision, machine learning, model evaluation
Detailed tags: AI-generated image detection, gaze consistency, semantic cue, benchmark, vision-language model

When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection


1️⃣ One-Sentence Summary

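The paper proposes the mutual consistency of gaze direction, head-eye alignment, and pupil placement between interacting people ("social gaze consistency") as a new cue for detecting AI-generated images. Using a dedicated dataset and a new caption-supervision method, it improves recognition of both real and fake images on existing detectors (the vision-language FakeVLM and the vision-only Effort), and the cue transfers across generative models rather than depending on any specific one.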
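The abstract defines the cue only at a high level, and, per the abstract, the paper learns it through caption supervision of detector backbones rather than through a hand-built formula. Purely as a hypothetical illustration of what "mutual coherence of gaze direction" and "head-eye alignment" could mean geometrically, the sketch below scores two people's gaze. It assumes eye positions, gaze directions, and head directions are already available (for example from an off-the-shelf gaze estimator). The functions `mutual_gaze_score` and `head_eye_alignment` are invented for this example and are not from the paper.

```python
import math


def _unit(v):
    """Normalize a non-zero vector (assumption: inputs are never zero-length)."""
    n = math.hypot(*v)
    return tuple(c / n for c in v)


def _cos(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u, v = _unit(u), _unit(v)
    return sum(a * b for a, b in zip(u, v))


def mutual_gaze_score(eye_a, gaze_a, eye_b, gaze_b):
    """Hypothetical score in [-1, 1]: how well each person's gaze points at the other.

    Takes the weaker of the two directions, so one averted gaze lowers the score.
    """
    a_to_b = tuple(b - a for a, b in zip(eye_a, eye_b))
    b_to_a = tuple(-c for c in a_to_b)
    return min(_cos(gaze_a, a_to_b), _cos(gaze_b, b_to_a))


def head_eye_alignment(head_dir, gaze_dir):
    """Hypothetical cosine between head orientation and gaze direction."""
    return _cos(head_dir, gaze_dir)


# Two people facing each other along the x-axis (image-plane coordinates).
print(mutual_gaze_score((0, 0), (1, 0), (2, 0), (-1, 0)))  # 1.0

# Same scene, but B's eyes were "inpainted" to look perpendicular to A.
print(mutual_gaze_score((0, 0), (1, 0), (2, 0), (0, 1)))   # 0.0
```

In a real-versus-edited pair of the kind the dataset contains, an inpainted eye region that no longer points at the interaction partner would lower such a score even when the pixels themselves look photometrically clean. That is the intuition behind treating gaze as a semantic, rather than low-level, signal.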

Source: arXiv 2605.27348