A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI

📄 Abstract

Recent Artificial Intelligence (AI) models have matched or exceeded human experts on several benchmarks of biomedical task performance but have lagged behind on surgical image-analysis benchmarks. Because surgery requires integrating disparate tasks (including multimodal data integration, human interaction, and physical effects), generally capable AI models could be particularly attractive as collaborative tools if their performance could be improved. On the one hand, the canonical approach of scaling architecture size and training data is appealing, especially since millions of hours of surgical video are generated every year. On the other hand, preparing surgical data for AI training requires significantly more professional expertise, and training on that data requires expensive computational resources. These trade-offs leave it unclear whether, and to what extent, modern AI could aid surgical practice. In this paper, we explore this question through a case study of surgical tool detection using state-of-the-art AI methods available in 2026. We demonstrate that even with multi-billion-parameter models and extensive training, current Vision Language Models fall short on the seemingly simple task of tool detection in neurosurgery. Additionally, we present scaling experiments indicating that increasing model size and training time yields only diminishing improvements in relevant performance metrics. Thus, our experiments suggest that current models could still face significant obstacles in surgical use cases. Moreover, some obstacles cannot simply be "scaled away" with additional compute and persist across diverse model architectures, raising the question of whether data and label availability are the only limiting factors. We discuss the main contributors to these constraints and propose potential solutions.

1️⃣ One-Sentence Summary

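Through a case study of surgical tool detection in neurosurgery, this paper finds that, despite massive amounts of data and enormous computing power, current state-of-the-art vision language models still perform poorly on seemingly simple surgical tasks. This indicates that scaling up model size alone cannot overcome the core obstacles facing surgical AI, and the paper examines deeper limiting factors such as data annotation and domain expertise.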
