arXiv submission date: 2025-12-12
📄 Abstract - CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare

Integrating language models (LMs) in healthcare systems holds great promise for improving medical workflows and decision-making. However, a critical barrier to their real-world adoption is the lack of reliable evaluation of their trustworthiness, especially in multilingual healthcare settings. Existing LMs are predominantly trained in high-resource languages, making them ill-equipped to handle the complexity and diversity of healthcare queries in mid- and low-resource languages, posing significant challenges for deploying them in global healthcare contexts where linguistic diversity is key. In this work, we present CLINIC, a Comprehensive Multilingual Benchmark to evaluate the trustworthiness of language models in healthcare. CLINIC systematically benchmarks LMs across five key dimensions of trustworthiness: truthfulness, fairness, safety, robustness, and privacy, operationalized through 18 diverse tasks, spanning 15 languages (covering all the major continents), and encompassing a wide array of critical healthcare topics like disease conditions, preventive actions, diagnostic tests, treatments, surgeries, and medications. Our extensive evaluation reveals that LMs struggle with factual correctness, demonstrate bias across demographic and linguistic groups, and are susceptible to privacy breaches and adversarial attacks. By highlighting these shortcomings, CLINIC lays the foundation for enhancing the global reach and safety of LMs in healthcare across diverse languages.

Top-level tags: llm medical benchmark
Detailed tags: multilingual evaluation trustworthiness healthcare safety fairness

CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare


1️⃣ One-Sentence Summary

This paper introduces CLINIC, a multilingual healthcare benchmark that systematically evaluates the trustworthiness of language models across five key dimensions, including truthfulness, fairness, and safety. The evaluation finds that existing models exhibit factual errors, bias, and privacy leakage in multilingual healthcare settings, laying the groundwork for improving the safety and applicability of medical AI worldwide.


Source: arXiv 2512.11437