📄 Abstract - Boosting Medical Visual Understanding From Multi-Granular Language Learning

Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple high-level labels (e.g., disease categories) across different annotation granularities (e.g., diagnostic description, clinical explanation). To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. MGLL leverages structured multi-label supervision, integrates textual descriptions across granularities, and introduces soft-label supervision with point-wise constraints to enhance alignment. MGLL employs smooth Kullback-Leibler (KL) divergence to ensure cross-granularity consistency while maintaining computational efficiency as a plug-and-play module for vision-language models. Pretrained on our constructed large-scale multi-granular datasets and evaluated across multiple datasets, MGLL outperforms other state-of-the-art methods in downstream tasks. The code is available at this https URL.
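The paper does not include implementation details here, but the core idea it names (replacing CLIP's one-hot contrastive targets with soft targets derived from multi-label overlap, matched via KL divergence) can be sketched as follows. This is a minimal illustration under assumed conventions: the function names, the label-overlap target construction, and the temperature value are all hypothetical, not taken from the MGLL code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_label_targets(labels):
    # labels: (N, C) multi-hot disease labels. Instead of CLIP's identity
    # (one-hot) targets, pairs sharing labels attract each other: the target
    # row is the normalized pairwise label overlap (hypothetical construction).
    overlap = labels @ labels.T                          # (N, N) shared-label counts
    overlap = np.maximum(overlap, np.eye(len(labels)))   # ensure self-match survives
    return overlap / overlap.sum(axis=1, keepdims=True)

def kl_alignment_loss(img_emb, txt_emb, labels, temperature=0.07):
    # CLIP-style cosine-similarity logits between image and text embeddings,
    # but supervised with a KL divergence to the soft targets above.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                   # (N, N)
    p = softmax(logits, axis=1)                          # predicted match distribution
    q = soft_label_targets(labels)                       # soft targets from labels
    eps = 1e-8                                           # numerical floor
    return float(np.mean(np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=1)))
```

In this sketch the same loss could be applied once per text granularity (e.g., diagnostic description vs. clinical explanation) and summed, which is one plausible reading of the cross-granularity consistency term; the paper's exact formulation may differ.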

Top-level tags: medical multi-modal model training
Detailed tags: medical imaging, contrastive learning, multi-granular alignment, vision-language models, multi-label classification

📄 Paper Summary

Boosting Medical Visual Understanding From Multi-Granular Language Learning


1️⃣ One-Sentence Summary

This work proposes MGLL, a multi-granular language learning framework that integrates textual descriptions of different granularities with soft-label supervision to improve multi-label and cross-granularity alignment in medical imaging, outperforming existing state-of-the-art methods on multiple downstream tasks.
