arXiv submission date: 2026-04-20
📄 Abstract - FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs

Empathy is essential for fostering natural interactions in spoken dialogue systems, as it enables machines to recognize the emotional tone of human speech and deliver empathetic responses. Recent research has made significant progress in developing empathetic spoken chatbots based on large language models (LLMs). However, several challenges still exist when training such models, including reliance on costly empathetic speech instruction data and a lack of emotional expressiveness in the generated speech. Fine-tuning an LLM with cross-modal empathetic instruction data may also lead to catastrophic forgetting and degrade its general capability. To address these challenges, we propose FreezeEmpath, an end-to-end empathetic spoken chatbot trained in a simple and efficient manner. The entire training process relies solely on existing speech instruction data and speech emotion recognition (SER) data, while keeping the LLM's parameters frozen. Experiments show that FreezeEmpath is able to generate emotionally expressive speech and outperforms other empathetic models in empathetic dialogue, SER, and SpokenQA tasks, demonstrating the effectiveness of our training strategy.

Top-level tags: llm, audio, multi-modal
Detailed tags: empathetic spoken chatbot, speech emotion recognition, frozen llm, efficient training, spokendialogue

FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs


1️⃣ One-sentence summary

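This paper proposes FreezeEmpath, a method that keeps the LLM's parameters frozen and uses only existing speech instruction data and speech emotion recognition data to train, simply and efficiently, an end-to-end empathetic spoken chatbot that generates emotionally expressive speech while retaining its general capability, addressing earlier methods' reliance on costly data and the model degradation they cause.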

Source: arXiv: 2604.18159