arXiv submission date: 2026-06-03
📄 Abstract - BreastGPT: A Multimodal Large Language Model for the Full Spectrum of Breast Cancer Clinical Routine

Breast cancer remains a leading cause of cancer-related mortality among women. Its clinical management requires multimodal reasoning across a clinical workflow that spans screening, diagnosis and treatment planning, where each stage involves distinct imaging modalities, task objectives, and reasoning patterns. However, constrained by data scarcity and model versatility, existing medical MLLMs are typically evaluated on isolated modalities or narrow task families, limiting their ability to support workflow-level clinical reasoning. In this work, we first introduce BreastStage, a workflow-aligned breast imaging instruction corpus comprising 1.86M instruction-following pairs curated from 17 sub-datasets across 5 imaging modalities and 136 task templates. Its held-out split, BreastStage-Bench, provides a comprehensive benchmark for evaluating multimodal reasoning across the breast cancer care continuum. Building on this corpus, we propose BreastGPT, a unified MLLM equipped with a dual-branch visual encoder and concept-preserving token compression to bridge the scale gap between standard radiology and gigapixel pathology. On BreastStage-Bench, BreastGPT achieves 75.66% closed-ended accuracy and 89.92% open-ended score, outperforming both general-purpose and medical-specific MLLMs across clinical stages and task formats. These results suggest that workflow-aligned data and cross-scale visual modeling are critical for clinically grounded medical MLLMs. All data, code, and model checkpoints are released at this https URL.

Top-level tags: medical multi-modal llm
Detailed tags: breast cancer benchmark clinical reasoning instruction tuning pathology

BreastGPT: A Multimodal Large Language Model for the Full Spectrum of Breast Cancer Clinical Routine


1️⃣ One-sentence summary

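This paper presents BreastGPT, a multimodal large language model covering the full breast cancer workflow of screening, diagnosis, and treatment planning; it is trained on BreastStage, a workflow-aligned dataset of 1.86M instruction pairs across 136 task templates, uses a dual-branch visual encoder with concept-preserving token compression to bridge the scale gap between standard radiology images and gigapixel pathology images, and outperforms existing general-purpose and medical-specific MLLMs on BreastStage-Bench.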

Source: arXiv: 2606.04911