arXiv submission date: 2026-03-17
📄 Abstract - MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

Medical language models must be updated as evidence and terminology evolve, yet sequential updating can trigger catastrophic forgetting. Although biomedical NLP has many static benchmarks, no unified, task-diverse benchmark exists for evaluating continual learning under standardized protocols, robustness to task order, and compute-aware reporting. We introduce MedCL-Bench, which streams ten biomedical NLP datasets spanning five task families and evaluates eleven continual learning strategies across eight task orders, reporting retention, transfer, and GPU-hour cost. Across backbones and task orders, direct sequential fine-tuning on incoming tasks induces catastrophic forgetting, causing update-induced performance regressions on prior tasks. Continual learning methods occupy distinct retention-compute frontiers: parameter isolation provides the best retention per GPU-hour, replay offers strong protection at higher cost, and regularization yields limited benefit. Forgetting is task-dependent, with multi-label topic classification most vulnerable and constrained-output tasks more robust. MedCL-Bench provides a reproducible framework for auditing model updates before deployment.
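The retention and transfer metrics the abstract reports can be illustrated with the accuracy-matrix convention common in continual learning evaluation. Below is a minimal, hypothetical sketch (the matrix values are toy numbers, not MedCL-Bench results; the metric names follow common usage, not necessarily the paper's exact definitions):

```python
# R[i][j]: accuracy on task j, measured right after training on task i.
# Toy numbers for illustration only.
R = [
    [0.80, 0.10, 0.05],  # after task 0
    [0.60, 0.85, 0.12],  # after task 1: task-0 accuracy dropped (forgetting)
    [0.55, 0.70, 0.90],  # after task 2
]
T = len(R)

# Average accuracy after the final task (overall retention).
avg_acc = sum(R[T - 1][j] for j in range(T)) / T

# Backward transfer: negative values indicate catastrophic forgetting,
# i.e. update-induced regressions on earlier tasks.
bwt = sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)

print(f"average accuracy = {avg_acc:.3f}")
print(f"backward transfer = {bwt:.3f}")
```

Running the evaluation over multiple task orders (eight in MedCL-Bench) and averaging these metrics guards against conclusions that hold only for one ordering.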

Top-level tags: medical natural language processing, model evaluation
Detailed tags: continual learning benchmark, catastrophic forgetting, biomedical NLP, model updating

MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning


1️⃣ One-sentence summary

This paper introduces MedCL-Bench, a standardized benchmark for systematically evaluating how biomedical AI models, as they continually learn new knowledge, balance retaining old knowledge against the efficiency and cost of learning new knowledge. It finds that direct sequential updating causes severe forgetting, and that different continual learning methods differ markedly in both retention and compute cost.

Source: arXiv 2603.16738