arXiv submission date: 2026-03-17
📄 Abstract - IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

3D intraoral scans (IOS) are increasingly adopted in routine dentistry due to abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. While recent works introduce dental vision-language models (VLMs) to enable unified diagnosis and report generation on 2D images or multi-view images rendered from IOS, they do not fully leverage native 3D geometry. Such work is necessary and also challenging, due to: (i) heterogeneous scan forms and the complex IOS topology, (ii) multi-disease co-occurrence with class imbalance and fine-grained morphological ambiguity, (iii) limited paired 3D IOS-text data. Thus, we present IOSVLM, an end-to-end 3D VLM that represents scans as point clouds and follows a 3D encoder-projector-LLM design for unified diagnosis and generative visual question-answering (VQA), together with IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset comprising 19,002 cases and 249,055 VQA pairs over 23 oral diseases and heterogeneous scan types. To address the distribution gap between color-free IOS data and color-dependent 3D pre-training, we propose a geometry-to-chromatic proxy that stabilizes fine-grained geometric perception and cross-modal alignment. A two-stage curriculum training strategy further enhances robustness. IOSVLM consistently outperforms strong baselines, achieving gains of at least +9.58% macro accuracy and +1.46% macro F1, indicating the effectiveness of direct 3D geometry modeling for IOS-based diagnosis.
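
The abstract reports its gains as macro accuracy and macro F1. As a minimal sketch of how such macro-averaged scores are commonly computed in a multi-disease setting, the snippet below treats each disease as an independent binary label, scores each one separately, and averages with equal weight so that rare diseases count as much as common ones. This is an assumption about the evaluation protocol, not the paper's confirmed procedure; the function names and the disease labels in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def per_label_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for one binary disease label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def macro_metrics(
    truth: Dict[str, List[int]], pred: Dict[str, List[int]]
) -> Tuple[float, float]:
    """Average per-disease accuracy and F1 with equal weight per disease.

    Assumption: each disease is an independent present/absent label.
    """
    accs: List[float] = []
    f1s: List[float] = []
    for label in truth:
        tp, fp, fn, tn = per_label_counts(truth[label], pred[label])
        accs.append((tp + tn) / (tp + fp + fn + tn))
        denom = 2 * tp + fp + fn
        # F1 is defined as 0 here when a label has no positives at all.
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(accs) / len(accs), sum(f1s) / len(f1s)


# Hypothetical toy data: two diseases, four scans each.
truth = {"caries": [1, 0, 1, 0], "gingivitis": [0, 0, 0, 1]}
pred = {"caries": [1, 0, 0, 0], "gingivitis": [0, 0, 1, 1]}
macro_acc, macro_f1 = macro_metrics(truth, pred)
print(f"macro accuracy={macro_acc:.4f}, macro F1={macro_f1:.4f}")
```

Because every disease contributes equally to the average, a model that ignores rare conditions is penalized under macro F1 even when its overall accuracy looks high, which is why macro scores suit the class-imbalanced setting the abstract describes.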

Top-level tags: medical, computer vision, multi-modal
Detailed tags: 3D vision-language model, dental diagnosis, intraoral scans, visual question answering, point clouds

IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans


1️⃣ One-sentence summary

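This paper proposes IOSVLM, a 3D vision-language model that works directly on the geometric data of 3D intraoral scans to deliver unified diagnosis of multiple dental diseases and visual question answering, and that uses novel training strategies to address data scarcity and the difficulty of exploiting geometric features.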

Source: arXiv: 2603.16781