arXiv submission date: 2026-03-12
📄 Abstract - CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?

Theory of Mind (ToM), the ability to reason about the mental states of oneself and others, is a cornerstone of human social intelligence. As Large Language Models (LLMs) become ubiquitous in real-world applications, validating their capacity for this level of social reasoning is essential for effective and natural interactions. However, existing benchmarks for assessing ToM in LLMs are limited; most rely solely on text inputs and focus narrowly on belief-related tasks. In this paper, we propose a new multimodal benchmark dataset, CoMMET, a Comprehensive Mental states and Moral Evaluation Task inspired by the Theory of Mind Booklet Task. CoMMET expands the scope of evaluation by covering a broader range of mental states and introducing multi-turn testing. To the best of our knowledge, this is the first multimodal dataset to evaluate ToM in a multi-turn conversational setting. Through a comprehensive assessment of LLMs across different families and sizes, we analyze the strengths and limitations of current models and identify directions for future improvement. Our work offers a deeper understanding of the social cognitive capabilities of modern LLMs.

Top-level tags: llm, natural language processing, benchmark
Detailed tags: theory of mind, multimodal evaluation, social reasoning, cognitive capabilities, conversational agents

CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?


1️⃣ One-sentence summary

This paper introduces CoMMET, a new multimodal evaluation dataset for comprehensively testing large language models' ability to understand and infer the mental states of others. The evaluation finds that current models still have limitations on such social reasoning tasks and identifies directions for future improvement.

Source: arXiv 2603.11915