arXiv submission date: 2026-03-30
📄 Abstract - BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose BiFormer3D, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase pre-processing is unnecessary.
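The abstract names sinusoidal spatial features as the mechanism that lets the model condition on arbitrary directions instead of a fixed grid. A minimal sketch of what such an encoding could look like, assuming a multi-frequency sine/cosine expansion in the spirit of Transformer positional encodings (the function name, frequency schedule, and feature layout are illustrative assumptions, not the paper's exact formulation):

```python
import math

def sinusoidal_spatial_features(azimuth, elevation, num_freqs=4):
    """Encode a direction (azimuth, elevation, in radians) as a fixed-length
    vector of sine/cosine features at several frequencies.

    Hypothetical sketch: the powers-of-two frequency schedule and the
    interleaved feature layout are assumptions for illustration only.
    """
    feats = []
    for k in range(num_freqs):
        freq = 2.0 ** k
        for angle in (azimuth, elevation):
            feats.append(math.sin(freq * angle))
            feats.append(math.cos(freq * angle))
    return feats

# Any continuous direction maps to a fixed-length, smoothly varying
# feature vector, so no fixed measurement grid is required.
vec = sinusoidal_spatial_features(math.pi / 4, 0.0)
print(len(vec))  # 16
```

Because the encoding is a smooth function of the angles, nearby directions get nearby feature vectors, which is one way a model can achieve the spatial continuity the abstract claims fixed-grid methods lack.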

Top-level tags: audio systems, model training
Detailed tags: head-related impulse responses, spatial audio, transformer, binaural rendering, time-domain reconstruction

BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer


1️⃣ One-Sentence Summary

This paper proposes BiFormer3D, a method that, from only a few measurements per listener, reconstructs high-quality personalized 3D sound-localization signals (HRIRs) with full temporal detail at arbitrary directions, making sound localization in virtual and augmented reality more realistic and spatially continuous.

Source: arXiv:2603.27998