arXiv submission date: 2026-06-11

📄 Abstract - Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing voice agent that conditions turn-taking behavior on an explicitly assigned role in multi-party settings. The system is built on a speech large language model operating in a chunk-wise streaming manner. We further introduce a reasoning-augmented variant that incorporates chain-of-thought reasoning over conversational context and the assigned role. We construct RolePlayConv, a large-scale synthetic dataset of spoken multi-party conversations with diverse assistant roles. Experiments on real-world meeting data and RolePlayConv show that turn-taking precision improves by over 40% and recall by more than 70%, while false-positive interruptions are substantially reduced compared to non-role-conditioned baselines.

Top-level tag: multi-modal agents

Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

1️⃣ One-sentence summary

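This paper proposes ModeratorLM, a voice agent system that is assigned an explicit role (such as moderator or participant) and combines a streaming speech large language model with reasoning, significantly improving the precision and recall of turn-taking in multi-party conversations while reducing unnecessary interruptions.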

Source: arXiv: 2606.13544