菜单

🤖 系统
📄 Abstract - Medal S: Spatio-Textual Prompt Model for Medical Segmentation

We introduce Medal S, a medical segmentation foundation model that supports native-resolution spatial and textual prompts within an end-to-end trainable framework. Unlike text-only methods lacking spatial awareness, Medal S achieves channel-wise alignment between volumetric prompts and text embeddings, mitigating inaccuracies from resolution mismatches. By preserving full 3D context, it efficiently processes multiple native-resolution masks in parallel, enhancing multi-class segmentation performance. A lightweight 3D convolutional module enables precise voxel-space refinement guided by both prompt types, supporting up to 243 classes across CT, MRI, PET, ultrasound, and microscopy modalities in the BiomedSegFM dataset. Medal S offers two prompting modes: a text-only mode, where model predictions serve as spatial prompts for self-refinement without human input, and a hybrid mode, incorporating manual annotations for enhanced flexibility. For 24-class segmentation, parallel spatial prompting reduces inference time by more than 90% compared to sequential prompting. We propose dynamic resampling to address target-patch ratio imbalance, extending SAT and nnU-Net for data augmentation. Furthermore, we develop optimized text preprocessing, a two-stage inference strategy, and post-processing techniques to improve memory efficiency, precision, and inference speed. On the five-modality average on the validation set, Medal S outperforms SAT with a DSC of 75.44 (vs. 69.83), NSD of 77.34 (vs. 71.06), F1 of 38.24 (vs. 24.88), and DSC TP of 65.46 (vs. 46.97). Medal S achieves excellent performance by harmonizing spatial precision with semantic textual guidance, demonstrating superior efficiency and accuracy in multi-class medical segmentation tasks compared to sequential prompt-based approaches. Medal S will be publicly available at this https URL.

顶级标签: medical computer vision model training
详细标签: medical segmentation spatio-textual prompting 3d vision foundation model multi-modal 或 搜索:

📄 论文总结

Medal S:用于医学分割的时空文本提示模型 / Medal S: Spatio-Textual Prompt Model for Medical Segmentation


1️⃣ 一句话总结

这篇论文提出了一个名为Medal S的医学图像分割基础模型,它能够同时利用空间位置和文本描述作为输入提示,在保持高分辨率3D图像上下文的同时,显著提升了多类别分割的精度和效率,并在多种医学影像模态上验证了其优越性能。


📄 打开原文 PDF