The Mask of Civility: Benchmarking Chinese Mock Politeness Comprehension in Large Language Models
1️⃣ One-sentence summary
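This study builds a dataset of authentic and simulated Chinese discourse and systematically evaluates how six mainstream large language models, including GPT-5.1 and DeepSeek, differ in recognizing politeness, impoliteness, and mock politeness in Chinese, offering a new approach to interdisciplinary work between linguistic theory and artificial intelligence.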
From a pragmatic perspective, this study systematically evaluates how representative large language models (LLMs) differ in recognizing politeness, impoliteness, and mock politeness in Chinese. To address existing gaps in pragmatic comprehension, it draws on Rapport Management Theory and the Model of Mock Politeness to construct a three-category dataset that combines authentic and simulated Chinese discourse. Six representative models, including GPT-5.1 and DeepSeek, were evaluated under four prompting conditions: zero-shot, few-shot, knowledge-enhanced, and hybrid. The study is an attempt within the paradigm of "Great Linguistics" to apply pragmatic theory in an age of technological transformation. It also responds to the contemporary question of how technology and the humanities may coexist, as an interdisciplinary effort that bridges linguistic technology and humanistic reflection.
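The four prompting conditions named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `build_prompt`, the English instruction wording, the label strings, and in particular the reading of "hybrid" as few-shot examples combined with background knowledge.

```python
from typing import Sequence, Tuple

# Hypothetical label names for the three-category task (assumed wording).
LABELS = ("polite", "impolite", "mock polite")
CONDITIONS = ("zero-shot", "few-shot", "knowledge-enhanced", "hybrid")


def build_prompt(
    utterance: str,
    condition: str,
    examples: Sequence[Tuple[str, str]] = (),
    knowledge: str = "",
) -> str:
    """Assemble a classification prompt for one prompting condition.

    Assumption: "hybrid" means background knowledge plus few-shot examples.
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")

    parts = ["Classify the Chinese utterance as one of: " + ", ".join(LABELS) + "."]

    # Knowledge-enhanced and hybrid prompts carry a theory summary.
    if condition in ("knowledge-enhanced", "hybrid"):
        parts.append("Background: " + knowledge)

    # Few-shot and hybrid prompts carry labeled demonstrations.
    if condition in ("few-shot", "hybrid"):
        for text, label in examples:
            parts.append(f"Utterance: {text}\nLabel: {label}")

    # The item to classify always comes last, with the label left open.
    parts.append(f"Utterance: {utterance}\nLabel:")
    return "\n\n".join(parts)
```

Calling `build_prompt(text, "zero-shot")` yields only the instruction and the target utterance, while `"hybrid"` adds both the background passage and the demonstrations, so the same item can be scored under all four conditions with otherwise identical wording.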
Source: arXiv: 2602.03107