IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

📄 Abstract - IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

While large language models excel on high-resource multilingual tasks, low- and extremely low-resource Indic languages remain severely under-evaluated. We present IndicParam, a human-curated benchmark of over 13,000 multiple-choice questions covering 11 such languages (Nepali, Gujarati, Marathi, Odia as low-resource; Dogri, Maithili, Rajasthani, Sanskrit, Bodo, Santali, Konkani as extremely low-resource) plus a Sanskrit-English code-mixed set. We evaluate 19 LLMs, both proprietary and open-weights, and find that even the top-performing GPT-5 reaches only 45.0% average accuracy, followed by DeepSeek-3.2 (43.1%) and Claude-4.5 (42.7%). We additionally label each question as knowledge-oriented or purely linguistic to separate factual recall from grammatical proficiency. Further, we assess the ability of LLMs to handle diverse question formats, such as list-based matching, assertion-reason pairs, and sequence ordering, alongside conventional multiple-choice questions. IndicParam provides insights into the limitations of cross-lingual transfer and establishes a challenging benchmark for Indic languages. The dataset is available at this https URL. Scripts to run the benchmark are available at this https URL.
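To make the reported metrics concrete, the following is a minimal sketch of how per-language accuracy, the knowledge-versus-linguistic split, and an average score could be computed from model predictions. The record fields (`language`, `category`, `answer`, `prediction`), the helper names, and the toy data are assumptions for illustration and are not the dataset's actual schema. The abstract does not say whether the average is taken over languages or over questions, so the unweighted mean over languages used here is also an assumption.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy in percent} for the given grouping key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        correct[group] += int(r["prediction"] == r["answer"])
    return {g: 100.0 * correct[g] / total[g] for g in total}


def macro_average(per_group):
    """Unweighted mean over groups, so each language counts equally."""
    return sum(per_group.values()) / len(per_group)


# Hypothetical toy records; field names are assumptions, not the real schema.
records = [
    {"language": "Dogri", "category": "knowledge", "answer": "A", "prediction": "A"},
    {"language": "Dogri", "category": "linguistic", "answer": "C", "prediction": "B"},
    {"language": "Bodo", "category": "knowledge", "answer": "D", "prediction": "D"},
    {"language": "Bodo", "category": "linguistic", "answer": "B", "prediction": "B"},
]

per_language = accuracy_by_group(records, "language")
per_category = accuracy_by_group(records, "category")
average = macro_average(per_language)

print(per_language, per_category, average)
```

Grouping by `category` instead of `language` gives the factual-recall versus grammatical-proficiency comparison described above without changing the scoring code.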

IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

1️⃣ One-sentence summary

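The paper introduces IndicParam, a human-curated benchmark of over 13,000 multiple-choice questions for systematically evaluating large language models on 11 low-resource Indic languages. Even the top models average below 50% accuracy on these languages, which exposes the limits of cross-lingual transfer.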
