arXiv submission date: 2026-05-07
📄 Abstract - Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text

The ability to reliably distinguish human-written text from that generated by large language models is of profound societal importance. The dominant approach to this problem exploits the likelihood hypothesis: that machine-generated text should appear more probable to a detector language model than human-written text. However, we demonstrate that the token-level signal distinguishing human and machine text is non-uniform across the hidden space of the detector model, and naively averaging likelihood-based token scores across regions with fundamentally different statistical structure, as most detectors do, causes a form of Simpson's paradox: a strong local signal is destroyed by inappropriate aggregation. To correct for this, we introduce a learned local calibration step grounded in Bayesian decision theory. Rather than aggregating raw token scores, we first learn lightweight predictors of the score distributions conditioned on position in hidden space, and aggregate calibrated log-likelihood ratios instead. This single intervention dramatically and consistently improves detection performance across all baseline detectors and all datasets we consider. For example, our calibrated variant of Fast-DetectGPT improves AUROC from $0.63$ to $0.85$ on GPT-5.4 text, and a locally-calibrated DMAP detector we introduce achieves state-of-the-art performance across the board. That said, our central contribution is not a new detector, but a precise diagnosis of a significant cause of under-performance of existing detectors and a principled, modular remedy compatible with any token-averaging pipeline. This will serve as a foundation for the community to build upon, with natural avenues including richer distributional models, improved calibration strategies, and principled ensembling with hidden-space geometry signals via the full Bayes-optimal decision rule.
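The abstract's core claim — that averaging raw token scores across hidden-space regions with different statistics inverts a strong local signal, while averaging locally calibrated log-likelihood ratios recovers it — can be illustrated with a small synthetic sketch. All names and parameters below are ours, not from the paper: we assume two hidden-space regions with shifted score distributions and use known per-region Gaussians in place of the paper's learned lightweight predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative region-dependent token-score distributions:
# region 0: human ~ N(0, 1), machine ~ N(1, 1)
# region 1: human ~ N(5, 1), machine ~ N(6, 1)
# Within each region machine scores are higher by 1, but the regions
# have very different offsets.
mu_h = np.array([0.0, 5.0])  # human means per region
mu_m = np.array([1.0, 6.0])  # machine means per region

def sample_doc(is_machine, n_tokens=200):
    """Sample per-token (region, score) pairs for one document."""
    # Machine docs land mostly in region 0, human docs mostly in region 1,
    # so a global average of raw scores inverts the within-region ordering
    # (Simpson's paradox).
    p_region0 = 0.8 if is_machine else 0.2
    regions = np.where(rng.random(n_tokens) < p_region0, 0, 1)
    mu = (mu_m if is_machine else mu_h)[regions]
    return regions, mu + rng.standard_normal(n_tokens)

def naive_score(regions, scores):
    # What most detectors do: average raw token scores across regions.
    return scores.mean()

def calibrated_score(regions, scores):
    # Per-region Gaussian log-likelihood ratio log p_machine / p_human
    # (unit variance), then average the calibrated ratios instead.
    llr = -0.5 * (scores - mu_m[regions]) ** 2 + 0.5 * (scores - mu_h[regions]) ** 2
    return llr.mean()

human = [sample_doc(False) for _ in range(200)]
machine = [sample_doc(True) for _ in range(200)]

def auroc(score_fn):
    # AUROC = probability a random machine doc outscores a random human doc.
    h = np.array([score_fn(*d) for d in human])
    m = np.array([score_fn(*d) for d in machine])
    return (m[:, None] > h[None, :]).mean()

print(f"naive AUROC:      {auroc(naive_score):.2f}")
print(f"calibrated AUROC: {auroc(calibrated_score):.2f}")
```

With these assumed parameters the naive average ranks human documents *above* machine ones (AUROC far below 0.5), while averaging per-region log-likelihood ratios separates the two classes almost perfectly — the aggregation, not the token-level signal, was the bottleneck.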

Top-level tags: llm natural language processing model evaluation
Detailed tags: machine-generated text detection simpson's paradox likelihood hypothesis calibration bayesian decision theory

Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text


1️⃣ One-sentence summary

This paper finds that existing AI-text detection methods fall into Simpson's paradox by naively averaging token-probability scores across regions with different statistical structure. By introducing a local calibration technique grounded in Bayesian decision theory, it markedly improves detection accuracy, diagnosing and repairing a performance bottleneck common to mainstream detectors.

Source: arXiv: 2605.06294