arXiv submission date: 2026-06-11
📄 Abstract - Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend this analysis to all six decoder layers of the Fairseq DE-EN model ($N=3{,}414$), showing that Wass-to-Unif and Wass-to-Data are complementary detectors specialised across hallucination types, that detection is concentrated in layers L1 to L4 with L5 anti-predictive for subtler types, and that hallucinated translations lack the exploratory attention phase present in correct translations from the first decoding step. We further evaluate whether the geometric signal transfers to abstractive summarization faithfulness detection: our unsupervised OT detector on AggreFact ($N=1{,}116$) achieves $57.2\%$/$57.6\%$ balanced accuracy on CNN/XSum, above chance but substantially below supervised MiniCheck-Flan-T5-L ($69.9\%$/$74.3\%$). This gap is principled: unlike NMT hallucinations, unfaithful summaries can attend correctly to source tokens while misrepresenting their content, a failure mode invisible to concentration-based OT metrics by construction. Structural experiments on T5-base confirm consistent decoder organisation across depth, with Layer 3 showing peak concentration and Layer 12 being most critical for generation quality. Together, the results establish OT on cross-attention as a reliable detector when the failure mode is source disengagement, a principled interpretability tool regardless of task, and fundamentally limited when faithfulness failures occur downstream of attention.
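
To make the idea of a distance between a cross-attention distribution and a uniform reference concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes a 0/1 ground cost between source positions, under which the OT distance reduces to the total variation distance, and it assumes attention has already been averaged over heads. The function names `source_attention_mass`, `wass_to_unif`, and `per_layer_scores` are hypothetical and chosen only for illustration.

```python
import numpy as np


def source_attention_mass(cross_attention):
    """Collapse a (target_len, source_len) attention map into one
    distribution over source tokens by averaging over target steps."""
    attn = np.asarray(cross_attention, dtype=float)
    mass = attn.mean(axis=0)
    return mass / mass.sum()  # renormalise to guard against rounding drift


def wass_to_unif(cross_attention):
    """OT distance from the source attention mass to the uniform distribution.

    Assumption: 0/1 ground cost, so the distance equals total variation,
    i.e. half the L1 distance between the two distributions.
    """
    p = source_attention_mass(cross_attention)
    u = np.full_like(p, 1.0 / p.size)
    return 0.5 * np.abs(p - u).sum()


def per_layer_scores(layer_attentions):
    """Score each decoder layer separately (one attention map per layer)."""
    return [wass_to_unif(a) for a in layer_attentions]


# Toy maps: 2 target steps attending over 4 source tokens.
engaged = np.array([[0.5, 0.5, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.5]])    # mass spread over the whole source
collapsed = np.array([[0.0, 0.0, 0.0, 1.0],
                      [0.0, 0.0, 0.0, 1.0]])  # all mass on a single source token

scores = per_layer_scores([engaged, collapsed])
print(scores)
```

Under these assumptions the spread-out map scores 0 and the collapsed map scores 0.75, so a larger value indicates attention concentrated on few source tokens. A score of this kind only sees where attention mass goes, which is consistent with the abstract's point that it cannot flag outputs that attend to the right tokens but misstate their content.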

Top-level tags: natural language processing llm model evaluation
Detailed tags: hallucination detection optimal transport neural machine translation abstractive summarization cross-attention analysis

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization


1️⃣ One-sentence summary

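By analysing the attention distributions at each decoder layer of neural machine translation and summarization models, this paper finds that optimal transport distances can effectively detect when a model disengages from the source text and produces hallucinated content. The method only applies when attention is detached from the source, however, and cannot catch summary hallucinations in which attention is correct but the meaning is distorted.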

Source: arXiv: 2606.13216