arXiv submission date: 2026-06-01
📄 Abstract - CARTE: A Benchmark for Mapping Language Model Knowledge Across France

We introduce CARTE (Culturally Anchored Regional-Territorial Evaluation), a multiple-choice benchmark for evaluating the ability of large language models (LLMs) to perform fine-grained reasoning over geographically grounded and regionally differentiated knowledge within France. While prior benchmarks focus on national-level cultural understanding, they largely overlook intra-country variation and the need to distinguish between closely related regional contexts. CARTE addresses this gap with 2,431 questions spanning the 13 metropolitan regions of France and covering 14 thematic domains, including culture, language, demographics, economy, environment, and mobility. We further introduce CARTE-LV, a subset targeting Linguistic Variation across French regions, enabling focused evaluation of language-related differences. We evaluate 27 LLMs ranging from 1B to 12B parameters under few-shot settings. Our experiments reveal performance disparities across regions and model scales, suggesting systematic gaps in pretraining coverage and limited robustness to intra-national variation.
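The abstract reports results broken down by region and domain. As an illustration only, the sketch below shows one straightforward way to aggregate multiple-choice predictions into per-region (or per-domain) accuracy and a simple best-minus-worst gap. The `Item` schema, the `accuracy_by` and `regional_gap` helpers, and the sample data are all hypothetical; the summary does not describe CARTE's released data format or the paper's exact scoring protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One multiple-choice question (hypothetical schema, not CARTE's actual format)."""
    region: str  # e.g. one of the 13 metropolitan regions
    domain: str  # e.g. "culture", "language", "economy"
    answer: str  # gold option letter, e.g. "B"


def accuracy_by(items, predictions, key):
    """Group items by attribute `key` ("region" or "domain"); return accuracy per group."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = getattr(item, key)
        total[group] += 1
        correct[group] += int(pred == item.answer)
    return {group: correct[group] / total[group] for group in total}


def regional_gap(per_group):
    """Best-scoring group minus worst-scoring group: one simple disparity measure (an assumption)."""
    return max(per_group.values()) - min(per_group.values())


# Toy data (invented for illustration): gold answers vs. a model's predicted letters.
items = [
    Item("Bretagne", "culture", "A"),
    Item("Bretagne", "language", "C"),
    Item("Occitanie", "culture", "B"),
    Item("Occitanie", "economy", "D"),
]
predictions = ["A", "C", "A", "D"]

per_region = accuracy_by(items, predictions, "region")
per_domain = accuracy_by(items, predictions, "domain")
print(per_region)                # {'Bretagne': 1.0, 'Occitanie': 0.5}
print(regional_gap(per_region))  # 0.5
```

Reporting the accuracy gap alongside per-group scores is one way to make regional disparities visible at a glance; a real analysis would likely also account for unequal question counts per region.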

Top-level tags: llm benchmark
Detailed tags: regional knowledge, cultural evaluation, geographical reasoning, linguistic variation, france

CARTE: A Benchmark for Mapping Language Model Knowledge Across France


1️⃣ One-sentence summary

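This paper introduces CARTE, a benchmark of 2,431 multiple-choice questions designed to evaluate large language models' fine-grained knowledge of France's 13 metropolitan regions across 14 domains such as culture, language, and economy; the results show marked performance differences between regions and limited robustness to intra-national regional variation.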

Source: arXiv: 2606.01995