CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
1️⃣ One-sentence summary
This paper proposes a reinforcement learning framework that incorporates curriculum learning. By constructing a multilingual medical reasoning dataset and refining the model training procedure, it significantly improves the logical correctness and language consistency of large language models performing medical reasoning across many languages, including low-resource ones.
While large language models (LLMs) have been shown to perform well on monolingual mathematical and commonsense reasoning, they remain unreliable for multilingual medical reasoning, hindering their deployment in multilingual healthcare settings. We address this by first introducing CUREMED-BENCH, a high-quality multilingual medical reasoning dataset of open-ended reasoning queries, each with a single verifiable answer, spanning thirteen languages, including underrepresented languages such as Amharic, Yoruba, and Swahili. Building on this dataset, we propose CURE-MED, a curriculum-informed reinforcement learning framework that integrates code-switching-aware supervised fine-tuning and Group Relative Policy Optimization to jointly improve logical correctness and language stability. Across thirteen languages, our approach consistently outperforms strong baselines and scales effectively, achieving 85.21% language consistency and 54.35% logical correctness at 7B parameters, and 94.96% language consistency and 70.04% logical correctness at 32B parameters. These results support reliable and equitable multilingual medical reasoning in LLMs. The code and dataset are available at this https URL.
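To make the abstract's training signal concrete, below is a minimal sketch of a GRPO-style update signal with a joint reward over logical correctness and language consistency. The reward weighting `alpha`, the helper names `is_answer_correct` and `language_consistency`, and the exact combination rule are illustrative assumptions, not the paper's formulation; only the group-relative normalization reflects the general GRPO idea.

```python
# Hedged sketch: joint reward + group-relative advantages (GRPO-style).
# Helper functions and the reward mix are assumptions for illustration only.
from dataclasses import dataclass
from typing import List
import statistics


@dataclass
class Completion:
    text: str              # full model response
    answer: str             # extracted final answer
    target_language: str     # language the response should be written in


def is_answer_correct(pred: str, gold: str) -> float:
    """1.0 if the extracted answer matches the single verifiable gold answer."""
    return 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0


def language_consistency(text: str, target_language: str) -> float:
    """Stand-in for a language-stability score in [0, 1]; a real implementation
    would use a language-identification model over the response."""
    return 1.0  # placeholder value


def joint_reward(c: Completion, gold: str, alpha: float = 0.5) -> float:
    """Assumed convex combination of correctness and language consistency."""
    return alpha * is_answer_correct(c.answer, gold) + \
        (1.0 - alpha) * language_consistency(c.text, c.target_language)


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Core GRPO idea: normalize each sampled completion's reward against the
    mean and standard deviation of its own group, so no value critic is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    group = [
        Completion("...", "sepsis", "sw"),
        Completion("...", "pneumonia", "sw"),
    ]
    rewards = [joint_reward(c, gold="sepsis") for c in group]
    print(group_relative_advantages(rewards))
```

The group-relative normalization rewards completions that beat their siblings sampled from the same prompt, which is what lets such a framework optimize both objectives without training a separate value model.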
Source: arXiv:2601.13262