A Synchronized Audio-Visual Multi-View Capture System
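1️⃣ One-sentence summary
This paper presents a multi-view capture system that treats synchronized audio and synchronized video as equally central signals, addressing the weak support for strict audio-video alignment in earlier systems and providing a reliable, scalable data-capture workflow for fine-grained study of conversational behavior such as turn-taking and prosody.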
Multi-view capture systems have been an important research tool for recording human motion under controlled conditions. Most existing systems are designed around video streams and provide little or no support for audio acquisition or rigorous audio-video alignment, even though both are essential for studying conversational interaction, where timing at the level of turn-taking, overlap, and prosody matters. In this technical report, we describe an audio-visual multi-view capture system that addresses this gap by treating synchronized audio and synchronized video as first-class signals. The system combines a multi-camera pipeline with multi-channel microphone recording under a unified timing architecture, and it provides a practical workflow for calibration, acquisition, and quality control that supports repeatable recordings at scale. We quantify synchronization performance in deployment and show that the resulting recordings are temporally consistent enough to support fine-grained analysis and data-driven modeling of conversational behavior.
Source: arXiv: 2603.23089