VOLMO: Versatile and Open Large Models for Ophthalmology
1️⃣ One-sentence summary
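This study presents VOLMO, an open framework for building multimodal large language models for ophthalmology; trained in stages, the resulting model outperforms existing general-purpose and medical large models on a range of ophthalmic disease diagnosis and clinical reasoning tasks.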
Vision impairment affects millions globally, and early detection is critical to preventing irreversible vision loss. Ophthalmology workflows require clinicians to integrate medical images, structured clinical data, and free-text notes to determine disease severity and management, which is time-consuming and burdensome. Recent multimodal large language models (MLLMs) show promise, but existing general and medical MLLMs perform poorly in ophthalmology, and few ophthalmology-specific MLLMs are openly available. We present VOLMO (Versatile and Open Large Models for Ophthalmology), a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. VOLMO includes three stages: ophthalmology knowledge pretraining on 86,965 image-text pairs from 26,569 articles across 82 journals; domain task fine-tuning on 26,929 annotated instances spanning 12 eye conditions for disease screening and severity classification; and multi-step clinical reasoning on 913 patient case reports for assessment, planning, and follow-up care. Using this framework, we trained a compact 2B-parameter MLLM and compared it with strong baselines, including InternVL-2B, LLaVA-Med-7B, MedGemma-4B, MedGemma-27B, and RETFound. We evaluated these models on image description generation, disease screening and staging classification, and assessment-and-management generation, with additional manual review by two healthcare professionals and external validation on three independent cohorts for age-related macular degeneration and diabetic retinopathy. Across settings, VOLMO-2B consistently outperformed baselines, achieving stronger image description performance, an average F1 of 87.4% across 12 eye conditions, and higher scores in external validation.
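The abstract reports "an average F1 of 87.4% across 12 eye conditions" without saying how the average is formed. The sketch below shows one common reading, an unweighted mean of per-condition binary F1 scores. That reading is an assumption, not something the abstract confirms, and the function names (`binary_f1`, `average_f1`) and the toy labels are hypothetical illustrations rather than the authors' evaluation code or data.

```python
from typing import Dict, Sequence, Tuple


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one condition, with 1 = disease present and 0 = absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_f1(per_condition: Dict[str, Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Unweighted mean of per-condition F1 scores (assumed averaging scheme)."""
    scores = [binary_f1(y_true, y_pred) for y_true, y_pred in per_condition.values()]
    return sum(scores) / len(scores)


# Toy labels for two conditions, invented purely for illustration.
toy = {
    "AMD": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "DR": ([0, 1, 1, 0], [0, 1, 1, 1]),
}
print(f"average F1 = {average_f1(toy):.3f}")
```

Under this scheme every condition counts equally regardless of how many test instances it has; a sample-weighted average would give a different number when the conditions are imbalanced.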
Source: arXiv: 2603.23953