LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

📄 Abstract - LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

Large language models (LLMs) have made significant progress in Emotional Intelligence (EI) and long-context understanding. However, existing benchmarks tend to overlook certain aspects of EI in long-context scenarios, especially in realistic, practical settings where interactions are lengthy, diverse, and often noisy. To move toward such settings, we present LongEmotion, a benchmark specifically designed for long-context EI tasks. It covers a diverse set of tasks: Emotion Classification, Emotion Detection, Emotion QA, Emotion Conversation, Emotion Summary, and Emotion Expression. The average input length across these tasks is 8,777 tokens, and Emotion Expression additionally requires long-form generation. To enhance performance under realistic constraints, we incorporate Retrieval-Augmented Generation (RAG) and Collaborative Emotional Modeling (CoEM), and compare them with standard prompt-based methods. Unlike conventional approaches, our RAG method uses both the conversation context and the large language model itself as retrieval sources, avoiding reliance on external knowledge bases. The CoEM method further improves performance by decomposing the task into five stages that integrate both retrieval augmentation and limited knowledge injection. Experimental results show that both RAG and CoEM consistently improve EI-related performance across most long-context tasks, advancing LLMs toward more practical, real-world EI applications. Furthermore, we conduct a comparative case study on the GPT series to illustrate how different models vary in EI. Code is available on GitHub at this https URL, and the project page can be found at this https URL.
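The abstract says the RAG variant retrieves from the conversation context itself rather than from an external knowledge base. As a rough illustration only, the sketch below splits a long input into fixed-size word chunks and ranks them against a query by bag-of-words cosine similarity. The chunk size, the scoring function, and the names `chunk_context`, `retrieve`, and `build_prompt` are hypothetical choices, not the authors' implementation. The paper's second retrieval source (the LLM itself) is not modeled, and the actual LLM call is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a stand-in for a real tokenizer (assumption).
    return re.findall(r"[a-z']+", text.lower())


def chunk_context(text: str, chunk_words: int = 50) -> list[str]:
    # Hypothetical helper: split the long input into consecutive fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(context: str, query: str, k: int = 2, chunk_words: int = 50) -> list[str]:
    # Hypothetical retriever: rank chunks of the conversation context itself
    # (no external knowledge base) by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    chunks = chunk_context(context, chunk_words)
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(context: str, query: str, k: int = 2, chunk_words: int = 50) -> str:
    # Prepend the retrieved excerpts to the question; the LLM call itself is omitted.
    snippets = retrieve(context, query, k, chunk_words)
    evidence = "\n".join(f"- {s}" for s in snippets)
    return f"Relevant excerpts:\n{evidence}\n\nQuestion: {query}"


# Usage on a toy dialogue (real LongEmotion inputs average about 8,777 tokens).
dialogue = ("I went to the market today and bought some bread. "
            "Honestly I feel anxious and lonely since my friend moved away. "
            "The weather was cloudy.")
print(retrieve(dialogue, "Why does the speaker feel lonely?", k=1, chunk_words=10))
# → ['Honestly I feel anxious and lonely since my friend moved']
```

Retrieving only from the input keeps the example self-contained. A dense embedding model would usually replace the bag-of-words scorer in practice, but that substitution is likewise an assumption rather than something the abstract specifies.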

📄 Paper Summary

LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

1️⃣ One-Sentence Summary

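This paper introduces LongEmotion, a benchmark designed specifically to evaluate the emotional intelligence of large language models in long-text interaction, and shows that introducing retrieval-augmented generation and collaborative emotional modeling effectively improves the models' emotional understanding and expression in realistic, complex scenarios.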
