arXiv submission date: 2026-05-18
📄 Abstract - GRASP: Deterministic argument ranking in interaction graphs

Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging, a common LLM-as-a-Judge practice where a model provides a global verdict on a debate, suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack-defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency, a defense-aware notion of argument robustness over the explicit interaction graph. Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.

Top-level tags: llm agents natural language processing
Detailed tags: argument ranking interaction graph llm-as-a-judge deterministic sociotechnical

GRASP: Deterministic argument ranking in interaction graphs


1️⃣ One-sentence summary

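Global scores from large language models acting as judges are unstable and hard to interpret. To address this, the paper proposes the GRASP framework, which uses a deterministic propagation algorithm to aggregate the local support and attack relations in a debate into a global ranking. The result is an evaluation method that is more consistent and more transparent than conventional holistic scoring, and that targets the structural robustness of arguments rather than their persuasiveness.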
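To make the idea of aggregating local attack and support judgments into a global ranking concrete, here is a minimal Python sketch. It is not the GRASP operator, which this summary does not specify. The update rule `(1 + support) / (1 + support + attack)`, the function names `propagate` and `rank`, the edge format, and the tolerance and iteration cap are all assumptions made for illustration. The sketch does not prove convergence; it simply stops at a tolerance or after a fixed number of rounds.

```python
from typing import Dict, List, Tuple

# Hypothetical edge format: (source, target, kind, weight), kind is "attack" or "support".
Edge = Tuple[str, str, str, float]


def propagate(nodes: List[str], edges: List[Edge],
              tol: float = 1e-9, max_iters: int = 1000) -> Dict[str, float]:
    """Iterate an illustrative attack/support update until scores stop changing.

    Assumed rule (not taken from the paper):
        score = (1 + support) / (1 + support + attack)
    where support and attack are weighted sums of the current scores of the
    supporting and attacking arguments.
    """
    scores = {n: 1.0 for n in nodes}  # every argument starts at full strength
    for _ in range(max_iters):
        attack = {n: 0.0 for n in nodes}
        support = {n: 0.0 for n in nodes}
        for src, dst, kind, weight in edges:
            if kind == "attack":
                attack[dst] += weight * scores[src]
            elif kind == "support":
                support[dst] += weight * scores[src]
            else:
                raise ValueError(f"unknown edge kind: {kind!r}")
        new_scores = {
            n: (1.0 + support[n]) / (1.0 + support[n] + attack[n]) for n in nodes
        }
        delta = max((abs(new_scores[n] - scores[n]) for n in nodes), default=0.0)
        scores = new_scores
        if delta < tol:
            break
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Order arguments by descending score; ties are broken by name so the
    output is deterministic."""
    return sorted(scores, key=lambda n: (-scores[n], n))
```

Under this assumed rule, take a debate where b attacks a, c attacks b, and d attacks e, all with weight 1.0. The defended argument a settles at 2/3, while the undefended argument e settles at 1/2. This is the kind of defense-aware distinction that a single holistic score cannot expose.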

Source: arXiv: 2605.19141