arXiv submission date: 2026-02-18
📄 Abstract - AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models

Evaluating the social intelligence of Large Language Models (LLMs) increasingly requires moving beyond static text generation toward dynamic, adversarial interaction. We introduce the Adversarial Resource Extraction Game (AREG), a benchmark that operationalizes persuasion and resistance as a multi-turn, zero-sum negotiation over financial resources. Using a round-robin tournament across frontier models, AREG enables joint evaluation of offensive (persuasion) and defensive (resistance) capabilities within a single interactional framework. Our analysis provides evidence that these capabilities are weakly correlated ($\rho = 0.33$) and empirically dissociated: strong persuasive performance does not reliably predict strong resistance, and vice versa. Across all evaluated models, resistance scores exceed persuasion scores, indicating a systematic defensive advantage in adversarial dialogue settings. Further linguistic analysis suggests that interaction structure plays a central role in these outcomes. Incremental commitment-seeking strategies are associated with higher extraction success, while verification-seeking responses are more prevalent in successful defenses than explicit refusal. Together, these findings indicate that social influence in LLMs is not a monolithic capability and that evaluation frameworks focusing on persuasion alone may overlook asymmetric behavioral vulnerabilities.
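The round-robin scoring and the correlation between persuasion and resistance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the score definitions (persuasion as the mean fraction of resources extracted when attacking, resistance as the mean fraction retained when defending), the function names, and the choice of Spearman rank correlation for $\rho$, since the abstract does not say which coefficient is reported.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))


def score_tournament(models, extracted):
    """Aggregate a round-robin into per-model scores.

    extracted[(attacker, defender)] is the fraction of the defender's
    resources the attacker obtained, in [0, 1] (hypothetical encoding).
    Because the game is zero-sum, the defender keeps 1 - extracted.
    """
    persuasion = {
        m: mean(extracted[(m, d)] for d in models if d != m) for m in models
    }
    resistance = {
        m: mean(1.0 - extracted[(a, m)] for a in models if a != m) for m in models
    }
    return persuasion, resistance


if __name__ == "__main__":
    # Toy values for three placeholder models; NOT results from the paper.
    models = ["A", "B", "C", "D"]
    extracted = {
        ("A", "B"): 0.2, ("A", "C"): 0.4, ("A", "D"): 0.1,
        ("B", "A"): 0.1, ("B", "C"): 0.3, ("B", "D"): 0.2,
        ("C", "A"): 0.3, ("C", "B"): 0.1, ("C", "D"): 0.0,
        ("D", "A"): 0.2, ("D", "B"): 0.3, ("D", "C"): 0.5,
    }
    persuasion, resistance = score_tournament(models, extracted)
    rho = spearman(
        [persuasion[m] for m in models], [resistance[m] for m in models]
    )
    print(persuasion, resistance, rho)
```

Under these assumed definitions every model plays both roles against every other model, so each one receives a persuasion score and a resistance score from the same set of games, which is what allows the two to be compared directly.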

Top-level tags: llm benchmark agents
Detailed tags: adversarial interaction social intelligence persuasion resistance evaluation framework

AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models


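1️⃣ One-sentence summary

This paper introduces AREG, an adversarial negotiation game benchmark that evaluates the persuasion and resistance capabilities of large language models together, and finds that the two capabilities are only weakly correlated and that models are generally better at defense, which suggests that evaluating persuasion alone overlooks asymmetric weaknesses in model behavior.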

Source: arXiv: 2602.16639