ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing
1️⃣ One-Sentence Summary
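This paper proposes a reasoning framework called ActorMind and a companion benchmark, ActorMindBench, so that AI models can role-play naturally and fluently like human actors, speaking with personalized, emotionally expressive voices based on the role, the scene, and the dialogue. It addresses a gap in current role-playing research, which is largely confined to text and neglects speech.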
Role-playing has garnered rising attention as it provides a strong foundation for human-machine interaction and facilitates sociological research. However, current work is confined to textual modalities, neglecting speech, which plays a predominant role in daily life, thus limiting genuine role-playing. To bridge this gap, we conceptualize and benchmark speech role-playing through ActorMindBench, and we present a corresponding reasoning framework called ActorMind. Specifically, (1) Speech Role-Playing enables models to deliver spontaneous responses with personalized verbal traits based on their role, the scene, and the spoken dialogue. (2) ActorMindBench is a hierarchical benchmark comprising Utterance-Level content with 7,653 utterances, Scene-Level content with 313 scenes, and Role-Level content with 6 roles. (3) ActorMind is an off-the-shelf, multi-agent, chain-of-thought-style reasoning framework that emulates how human actors perform in theaters. Concretely, ActorMind first reads its assigned role description via the Eye Agent, then comprehends emotional cues within the contextual spoken dialogue through the Ear Agent. Subsequently, the Brain Agent generates a descriptive emotional state, and finally, the Mouth Agent delivers the script infused with the corresponding emotional state. Experimental results demonstrate the effectiveness of ActorMind in enhancing speech role-playing.
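The four-stage flow described in the abstract (Eye, Ear, Brain, Mouth) can be pictured as a simple sequential pipeline. The sketch below is only an illustration of that ordering, not the authors' implementation: every name in it (`Turn`, `Performance`, `eye_agent`, `ear_agent`, `brain_agent`, `mouth_agent`, `actor_mind`) is hypothetical, and each agent is a plain-Python stub standing in for whatever text, audio, or speech-synthesis model a real system would call. In particular, the `emotion_cue` field is an assumed stand-in for what an audio model would infer from the spoken context.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One turn of contextual dialogue (hypothetical structure)."""
    speaker: str
    text: str
    emotion_cue: str  # assumption: a label an audio model would infer from speech


@dataclass
class Performance:
    """What the final stage hands to a speech synthesizer (hypothetical)."""
    script: str
    emotion_state: str


def eye_agent(role_description: str) -> str:
    # Stub: "reads" the role description by normalizing its whitespace.
    return " ".join(role_description.split())


def ear_agent(dialogue: List[Turn]) -> List[str]:
    # Stub: collects the emotional cues heard in the spoken context.
    return [turn.emotion_cue for turn in dialogue]


def brain_agent(role: str, cues: List[str]) -> str:
    # Stub: combines role and the most recent cue into a descriptive state.
    latest = cues[-1] if cues else "neutral"
    return f"{role}; partner sounds {latest}"


def mouth_agent(script: str, emotion_state: str) -> Performance:
    # Stub: pairs the script with the emotional state instead of synthesizing audio.
    return Performance(script=script, emotion_state=emotion_state)


def actor_mind(role_description: str, dialogue: List[Turn], script: str) -> Performance:
    """Run the four stages in the order given in the abstract."""
    role = eye_agent(role_description)
    cues = ear_agent(dialogue)
    state = brain_agent(role, cues)
    return mouth_agent(script, state)
```

For example, calling `actor_mind("A sarcastic office worker", [Turn("B", "You forgot again!", "angry")], "I know, I know.")` returns a `Performance` whose `emotion_state` describes the role together with the partner's latest cue. Keeping each stage as a separate function mirrors the multi-agent framing: any stub could be swapped for a real model without changing the order of the chain.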
Source: arXiv: 2604.11103