arXiv submission date: 2026-06-17
📄 Abstract - IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages

AudioLLMs enable speech recognition conditioned on textual prompts such as domain descriptions or entity lists. However, it remains unclear whether these models genuinely utilise such context or rely on parametric knowledge learned during pretraining. Existing benchmarks cannot answer this question because they evaluate transcription under fixed prompting conditions and rarely include explicit contextual inputs. We introduce IndicContextEval, a 56-hour multilingual benchmark of natural speech from 555 speakers across 8 Indian languages and 23 professional domains. We design a 7-level prompting framework that progressively introduces contextual signals, including metadata, natural-language descriptions, entity lists in English and native script, and adversarial prompts with incorrect entities. Evaluating five models reveals substantial differences in context utilisation behaviour, highlighting the need for explicit evaluation of contextual grounding in AudioLLMs.
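
The idea of progressively adding context to a transcription prompt can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract names the kinds of context (metadata, natural-language descriptions, entity lists in English and native script, adversarial prompts with incorrect entities) but does not spell out the seven levels or their wording, so the `UtteranceContext` class, the `build_prompts` function, the level labels, and the prompt text below are all hypothetical and not taken from the benchmark.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UtteranceContext:
    """Hypothetical container for the context attached to one utterance."""
    language: str
    domain: str
    description: str
    entities_en: List[str] = field(default_factory=list)
    entities_native: List[str] = field(default_factory=list)
    # Plausible but wrong entities, used only for the adversarial prompt.
    distractor_entities: List[str] = field(default_factory=list)


BASE_INSTRUCTION = "Transcribe the audio."


def build_prompts(ctx: UtteranceContext) -> List[Tuple[str, str]]:
    """Return (label, prompt) pairs ordered from least to most context.

    The labels and wording are illustrative assumptions, not the
    benchmark's actual seven levels.
    """
    metadata = f"{BASE_INSTRUCTION} Language: {ctx.language}. Domain: {ctx.domain}."
    described = f"{metadata} Description: {ctx.description}"

    def with_entities(entities: List[str]) -> str:
        return f"{described} Entities: {', '.join(entities)}."

    return [
        ("no_context", BASE_INSTRUCTION),
        ("metadata", metadata),
        ("description", described),
        ("entities_english", with_entities(ctx.entities_en)),
        ("entities_native", with_entities(ctx.entities_native)),
        # Same structure as the entity prompts, but with incorrect entities.
        ("adversarial", with_entities(ctx.distractor_entities)),
    ]


if __name__ == "__main__":
    example = UtteranceContext(
        language="Hindi",
        domain="healthcare",
        description="A doctor discusses a prescription with a patient.",
        entities_en=["paracetamol"],
        entities_native=["पैरासिटामोल"],
        distractor_entities=["ibuprofen"],
    )
    for label, prompt in build_prompts(example):
        print(f"{label}: {prompt}")
```

Transcribing the same audio under each prompt and comparing error rates across levels would then show whether a model benefits from correct context and whether it is misled by the adversarial entities.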

Top-level tags: audio benchmark natural language processing
Detailed tags: audio llms context utilisation indic languages speech recognition evaluation benchmark

IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages


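1️⃣ One-sentence summary

To test whether audio large language models genuinely use the context supplied in text prompts (such as domain descriptions or entity lists) to improve speech recognition, rather than relying only on what the model has memorised, the authors built a 56-hour multilingual benchmark covering 8 Indian languages, 555 speakers, and 23 professional domains, and designed a 7-level progressive prompting framework. Their evaluation shows that models differ substantially in how well they use context.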

Source: arXiv: 2606.19157