arXiv submission date: 2025-12-03
📄 Abstract - Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning

In this study, we present Colon-X, an open initiative aimed at advancing multimodal intelligence in colonoscopy. We begin by constructing ColonVQA, the most comprehensive multimodal dataset ever built for colonoscopy, featuring over 1.1M visual question answering entries across 76 clinical findings and 18 multimodal tasks. Beyond serving as a community-wide data foundation, we further investigate a critical yet underexplored transition in colonoscopy: evolving from multimodal understanding to clinical reasoning. (a) To capture the current landscape of multimodal understanding behaviors, we systematically assess the generalizability of 22 multimodal large language models (MLLMs) and examine their reliability under human-induced perturbations. The results reveal that clinical outputs from leading MLLMs remain far from robust and trustworthy. (b) To narrow this gap, we further explore reasoning-centric intelligence tailored for colonoscopy. Specifically, we curate ColonReason, a clinically grounded reasoning dataset annotated through a multi-expert debating pipeline, and develop ColonR1, the first R1-styled model incorporating task-adaptive rewarding and gradient-stable optimization techniques. Under data-scarce conditions, our ColonR1 achieves 56.61% overall accuracy, outperforming supervised fine-tuning by 25.22%, and sets a new reasoning-enabled baseline for multimodal colonoscopy analysis. All data and model resources are publicly available at this https URL.
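As a quick sanity check on the reported numbers, the supervised fine-tuning (SFT) baseline implied by the abstract can be recovered by simple subtraction. This assumes the 25.22% improvement is stated in absolute percentage points rather than as a relative gain, which the abstract does not make explicit:

```python
# Reported ColonR1 results under data-scarce conditions (from the abstract).
colonr1_acc = 56.61    # overall accuracy, in percent
gain_over_sft = 25.22  # improvement over SFT, assumed to be absolute percentage points

# Implied SFT baseline accuracy under that assumption.
sft_acc = colonr1_acc - gain_over_sft
print(f"Implied SFT baseline: {sft_acc:.2f}%")  # → 31.39%
```

If the gain were instead relative, the implied baseline would differ, so the distinction matters when comparing against other reported baselines.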

Top-level tags: medical multi-modal model evaluation
Detailed tags: colonoscopy multimodal vqa clinical reasoning medical ai dataset

Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning


1️⃣ One-sentence summary

This study presents an open initiative named Colon-X, which builds a large-scale dataset and develops the first reasoning-dedicated model to address the reliability gap that current intelligent colonoscopy systems face in moving from image recognition to clinical decision reasoning, and markedly improves analysis accuracy under data-scarce conditions.


Source: arXiv:2512.03667