IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

📄 Abstract - IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

3D intraoral scans (IOS) are increasingly adopted in routine dentistry because they provide abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. Recent works introduce dental vision-language models (VLMs) that enable unified diagnosis and report generation on 2D images or on multi-view images rendered from IOS, but they do not fully leverage native 3D geometry. Modeling that geometry directly is necessary but challenging because of (i) heterogeneous scan forms and complex IOS topology, (ii) multi-disease co-occurrence with class imbalance and fine-grained morphological ambiguity, and (iii) limited paired 3D IOS-text data. We therefore present IOSVLM, an end-to-end 3D VLM that represents scans as point clouds and follows a 3D encoder-projector-LLM design for unified diagnosis and generative visual question answering (VQA). We also present IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset comprising 19,002 cases and 249,055 VQA pairs that cover 23 oral diseases and heterogeneous scan types. To address the distribution gap between color-free IOS data and color-dependent 3D pre-training, we propose a geometry-to-chromatic proxy that stabilizes fine-grained geometric perception and cross-modal alignment. A two-stage curriculum training strategy further enhances robustness. IOSVLM consistently outperforms strong baselines, with gains of at least +9.58% in macro accuracy and +1.46% in macro F1, indicating the effectiveness of direct 3D geometry modeling for IOS-based diagnosis.
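The abstract does not say how the geometry-to-chromatic proxy is built. One plausible reading is to compute a per-point geometric attribute and write it into the RGB channels that a color-pretrained point encoder expects. The sketch below treats that reading as an assumption rather than the paper's method. It estimates surface normals by local PCA and maps each normal component from [-1, 1] to [0, 1]. The function names `estimate_normals` and `geometry_to_rgb`, the neighborhood size `k`, and the centroid-based orientation heuristic are all hypothetical choices.

```python
import numpy as np


def estimate_normals(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Estimate unit surface normals by local PCA over each point's k nearest neighbors."""
    k = min(k, points.shape[0])
    # Brute-force pairwise squared distances: O(N^2) memory, fine for a small demo.
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    nbr_idx = np.argsort(d2, axis=1)[:, :k]                   # (N, k), includes the point itself
    nbrs = points[nbr_idx]                                    # (N, k, 3)
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum("nki,nkj->nij", centered, centered) / k   # (N, 3, 3) local covariances
    # eigh sorts eigenvalues in ascending order; the eigenvector of the smallest
    # one (column 0) is the direction of least spread, i.e. the surface normal.
    _, eigvecs = np.linalg.eigh(cov)
    normals = eigvecs[:, :, 0]
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)


def geometry_to_rgb(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Hypothetical geometry-to-chromatic proxy (an assumption, not the paper's method).

    Encodes normals as pseudo-colors and returns an (N, 6) xyzrgb array.
    """
    normals = estimate_normals(points, k)
    # PCA normals have arbitrary sign; orient them away from the cloud centroid
    # (a crude heuristic that suits roughly convex shapes).
    outward = points - points.mean(axis=0)
    flip = np.einsum("ij,ij->i", normals, outward) < 0
    normals[flip] *= -1.0
    # Map each normal component from [-1, 1] to [0, 1]; clip guards against rounding.
    rgb = np.clip((normals + 1.0) / 2.0, 0.0, 1.0)
    return np.concatenate([points, rgb], axis=1)


# Usage: a unit sphere sampled with a golden-angle (Fibonacci) spiral.
n = 500
i = np.arange(n) + 0.5
polar = np.arccos(1.0 - 2.0 * i / n)
azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
sphere_points = np.stack(
    [np.cos(azimuth) * np.sin(polar), np.sin(azimuth) * np.sin(polar), np.cos(polar)],
    axis=1,
)
xyzrgb = geometry_to_rgb(sphere_points)
print(xyzrgb.shape)  # (500, 6)
```

A unit sphere is a convenient check because its true normals are the radial directions, so the pseudo-color at point p should be close to (p + 1)/2. A real intraoral scan with hundreds of thousands of points would need a spatial index such as a k-d tree in place of the O(N²) distance matrix. Concave regions like occlusal fissures would also need a better orientation rule than "away from the centroid".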


1️⃣ One-Sentence Summary

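This paper proposes IOSVLM, a 3D vision-language model that works directly on the geometric data of 3D intraoral scans to provide unified diagnosis and visual question answering for multiple dental diseases. It uses novel training strategies to address data scarcity and the difficulty of exploiting geometric features.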
