大语言模型辩论中的网络效应与共识漂移 / Network Effects and Agreement Drift in LLM Debates
1️⃣ One-Sentence Summary
This study finds that when simulating group debates, LLM agents exhibit a tendency toward "agreement drift": they are more easily persuaded and tend to converge toward particular positions. This suggests that before using AI populations as proxies for human behavior in research, one must carefully distinguish whether their behavior stems from genuine social-structural effects or from the model's own biases.
Large Language Models (LLMs) have demonstrated an unprecedented ability to mimic human-like social behaviors, making them useful tools for simulating complex social systems. However, it remains unclear to what extent these simulations can be trusted to accurately capture key social mechanisms, particularly in highly unbalanced contexts involving minority groups. This paper uses a network generation model with controlled homophily and class sizes to examine how LLM agents behave collectively in multi-round debates. Our findings highlight a particular directional susceptibility that we term "agreement drift", in which agents are more likely to shift toward specific positions on the opinion scale. Overall, the results underscore the need to disentangle structural effects from model biases before treating LLM populations as behavioral proxies for human groups.
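The abstract mentions a network generation model with controlled homophily and class (group) sizes. The paper's exact generator is not specified here, so the following is a minimal illustrative sketch of one common approach: preferential attachment biased by a homophily parameter, with a tunable minority fraction. All parameter names (`minority_frac`, `homophily`, `m`) are assumptions for illustration, not taken from the paper.

```python
import random

def homophilic_network(n=100, minority_frac=0.2, homophily=0.8, m=2, seed=42):
    """Grow a network via degree-preferential attachment biased by homophily.
    Illustrative sketch only; the paper's actual generator may differ.
    homophily in [0, 1]: probability weight for same-group ties."""
    rng = random.Random(seed)
    # Assign binary group labels: 1 = minority, 0 = majority.
    labels = [1 if rng.random() < minority_frac else 0 for _ in range(n)]
    edges = set()
    degree = [0] * n
    # Seed the growth process with a small clique of m+1 nodes.
    for i in range(m + 1):
        for j in range(i):
            edges.add((j, i))
            degree[i] += 1
            degree[j] += 1
    # Each new node attaches to m existing nodes, chosen by
    # rejection sampling with weight = degree x homophily factor.
    for new in range(m + 1, n):
        targets = set()
        max_deg = max(degree[:new])
        while len(targets) < m:
            cand = rng.randrange(new)
            same_group = labels[cand] == labels[new]
            w = (degree[cand] + 1) * (homophily if same_group else 1 - homophily)
            if rng.random() < w / (max_deg + 1):
                targets.add(cand)
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
    return labels, edges
```

Controlling `homophily` and `minority_frac` independently is what lets a study like this one separate structural effects (who talks to whom) from the agents' intrinsic persuadability.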
Source: arXiv: 2604.11312