From Speech-to-Spatial: Grounding Utterances on A Live Shared View with Augmented Reality
1️⃣ One-Sentence Summary
This paper presents a system called Speech-to-Spatial, which automatically converts voice-only instructions in remote assistance into visual spatial guidance in augmented reality (AR), reducing back-and-forth clarification during communication and improving task efficiency and user experience.
We introduce Speech-to-Spatial, a referent disambiguation framework that converts verbal remote-assistance instructions into spatially grounded AR guidance. Unlike prior systems that rely on additional cues (e.g., gesture, gaze) or manual expert annotations, Speech-to-Spatial infers the intended target solely from spoken references (speech input). Motivated by our formative study of speech referencing patterns, we characterize recurring ways people specify targets (Direct Attribute, Relational, Remembrance, and Chained) and ground them in our object-centric relational graph. Given an utterance, referent cues are parsed and rendered as persistent in-situ AR visual guidance, reducing iterative micro-guidance ("a bit more to the right", "now, stop") during remote guidance. We demonstrate our system's use cases in remote guided-assistance and intent-disambiguation scenarios. Our evaluation shows that Speech-to-Spatial improves task efficiency, reduces cognitive load, and enhances usability compared to a conventional voice-only baseline, transforming disembodied verbal instruction into visually explainable, actionable guidance on a live shared view.
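To make the referent-grounding idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of what an object-centric relational graph could look like, with toy resolvers for the Direct Attribute and Relational referencing patterns named in the abstract. All class names, fields, and matching heuristics here are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class SceneObject:
    obj_id: str
    label: str                                            # e.g. "mug"
    attributes: Set[str] = field(default_factory=set)     # e.g. {"red"}
    relations: Dict[str, str] = field(default_factory=dict)  # e.g. {"left_of": "o2"}

class RelationalGraph:
    """Illustrative object-centric relational graph over objects in the shared view."""

    def __init__(self, objects: List[SceneObject]):
        self.objects = {o.obj_id: o for o in objects}

    def direct_attribute(self, label: str, attribute: str) -> List[SceneObject]:
        # Direct Attribute reference, e.g. "the red mug"
        return [o for o in self.objects.values()
                if o.label == label and attribute in o.attributes]

    def relational(self, label: str, relation: str, anchor_id: str) -> List[SceneObject]:
        # Relational reference, e.g. "the mug to the left of the laptop"
        return [o for o in self.objects.values()
                if o.label == label and o.relations.get(relation) == anchor_id]

# Remembrance ("the one you picked up earlier") and Chained references would
# additionally require an interaction history and iterative application of
# the two lookups above; they are omitted from this sketch.

scene = RelationalGraph([
    SceneObject("o1", "mug", {"red"}, {"left_of": "o2"}),
    SceneObject("o2", "laptop"),
])
print([o.obj_id for o in scene.direct_attribute("mug", "red")])      # ['o1']
print([o.obj_id for o in scene.relational("mug", "left_of", "o2")])  # ['o1']
```

In a full pipeline, the matched object would then anchor the persistent in-situ AR visualization described in the abstract, rather than being printed.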
Source: arXiv: 2602.03059