How Contrastive Decoding Enhances Large Audio Language Models?
1️⃣ One-Sentence Summary
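Through systematic evaluation, this study finds that contrastive decoding effectively corrects errors in which large audio language models "deny that audio is present" or "rely on guessing", but cannot fix logical reasoning errors. This gives clear guidance for choosing a suitable enhancement strategy based on a model's own error profile.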
While Contrastive Decoding (CD) has proven effective at enhancing Large Audio Language Models (LALMs), the underlying mechanisms driving its success and the comparative efficacy of different strategies remain unclear. This study systematically evaluates four distinct CD strategies across diverse LALM architectures. We identify Audio-Aware Decoding and Audio Contrastive Decoding as the most effective methods. However, their impact varies significantly by model. To explain this variability, we introduce a Transition Matrix framework to map error pattern shifts during inference. Our analysis demonstrates that CD reliably rectifies errors in which models falsely claim an absence of audio or resort to uncertainty-driven guessing. Conversely, it fails to correct flawed reasoning or confident misassertions. Ultimately, these findings provide a clear guideline for determining which LALM architectures are most suitable for CD enhancement based on their baseline error profiles.
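The abstract does not spell out how the Transition Matrix framework is built, so the following is only a minimal sketch of one plausible reading: label each example's output once under baseline decoding and once under contrastive decoding, then count how examples move between categories. The category names, the `transition_matrix` and `row_normalize` helpers, and the toy labels are all hypothetical, chosen to mirror the error types named in the abstract rather than taken from the paper.

```python
from collections import Counter

# Hypothetical category labels, paraphrased from the abstract; the paper's
# actual taxonomy and label names may differ.
CATEGORIES = [
    "correct",
    "claims_no_audio",
    "uncertain_guess",
    "flawed_reasoning",
    "confident_misassertion",
]


def transition_matrix(baseline, contrastive):
    """Count how each example's category changes from baseline to CD decoding.

    Both arguments are equal-length lists of category labels, one per example.
    Entry [i][j] is the number of examples labeled CATEGORIES[i] under baseline
    decoding and CATEGORIES[j] under contrastive decoding.
    """
    if len(baseline) != len(contrastive):
        raise ValueError("label lists must have the same length")
    counts = Counter(zip(baseline, contrastive))
    return [[counts[(src, dst)] for dst in CATEGORIES] for src in CATEGORIES]


def row_normalize(matrix):
    """Convert counts to per-row fractions; empty rows stay all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Toy labels, invented purely for illustration (not results from the paper).
baseline = ["claims_no_audio", "claims_no_audio", "uncertain_guess",
            "flawed_reasoning", "correct"]
contrastive = ["correct", "correct", "correct",
               "flawed_reasoning", "correct"]

matrix = transition_matrix(baseline, contrastive)
rates = row_normalize(matrix)
```

Under this reading, a row whose mass moves into the `correct` column marks an error type that contrastive decoding tends to fix, while a row that stays on the diagonal marks one it leaves untouched. Row normalization makes rows comparable across models whose baseline error counts differ.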
Source: arXiv: 2603.09232