When Verification Fails: How Compositionally Infeasible Claims Escape Rejection
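1️⃣ One-Sentence Summary
This paper finds that existing scientific claim verification models share a common flaw: they attend only to the most salient evidence and skip holistic verification of compositional evidence, so many claims that look plausible but are in fact contradicted are wrongly accepted.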
Scientific claim verification, the task of determining whether claims are entailed by scientific evidence, is fundamental to grounding discoveries in evidence while preventing misinformation. This process involves evaluating each asserted constraint against validated evidence. Under the Closed-World Assumption (CWA), a claim is accepted if and only if all asserted constraints are positively supported. We show that existing verification benchmarks cannot distinguish models enforcing this standard from models applying a simpler shortcut called salient-constraint checking, which applies CWA's rejection criterion only to the most salient constraint and accepts when that constraint is supported. Because existing benchmarks construct infeasible claims by perturbing a single salient element, they cannot separate rigorous claim verification from reliance on the salient constraint alone. To separate the two, we construct compositionally infeasible claims where the salient constraint is supported but a non-salient constraint is contradicted. Across model families and modalities, models that otherwise saturate existing benchmarks consistently over-accept these claims, confirming the prevalence of such shortcut reasoning. Via model context interventions, we show that different models and prompting strategies occupy distinct positions on a shared ROC curve. This indicates that the gap between model families reflects differences in verification threshold rather than underlying reasoning ability, and that the compositional inference bottleneck is a structural property of current verification behavior that strategy guidance alone cannot overcome.
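The difference between the two acceptance rules described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Constraint` type, the numeric `salience` score, the `Status` labels, and the toy claims are all hypothetical, chosen only to show why a single-element perturbation cannot tell the two rules apart while a compositionally infeasible claim can.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNVERIFIED = "unverified"


@dataclass(frozen=True)
class Constraint:
    text: str
    salience: float  # hypothetical score: higher means more prominent in the claim
    status: Status   # verdict of the evidence on this constraint alone


def cwa_accepts(constraints):
    """Closed-World Assumption: accept iff every constraint is positively supported."""
    return all(c.status is Status.SUPPORTED for c in constraints)


def salient_accepts(constraints):
    """Shortcut: look only at the most salient constraint (assumes a non-empty list)."""
    top = max(constraints, key=lambda c: c.salience)
    return top.status is Status.SUPPORTED


# Invented toy claims, for illustration only.
# Benchmark-style infeasible claim: the salient element itself is perturbed.
single_perturbation = [
    Constraint("compound reduces tumor size", 0.9, Status.CONTRADICTED),
    Constraint("effect observed in mice", 0.3, Status.SUPPORTED),
]

# Compositionally infeasible claim: salient constraint holds, a non-salient one fails.
compositional = [
    Constraint("compound reduces tumor size", 0.9, Status.SUPPORTED),
    Constraint("effect observed in mice", 0.3, Status.CONTRADICTED),
]

for name, claim in [("single", single_perturbation), ("compositional", compositional)]:
    print(name, "CWA:", cwa_accepts(claim), "shortcut:", salient_accepts(claim))
```

On the first claim both rules reject, so a benchmark built only from such claims gives them identical scores. On the second claim the CWA rule rejects while the shortcut accepts, which is the over-acceptance pattern the abstract reports.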
Source: arXiv: 2604.10990