arXiv submission date: 2026-05-07
📄 Abstract - iTRIALSPACE: Programmable Virtual Lesion Trials for Controlled Evaluation of Lung CT Models

We introduce iTRIALSPACE, a programmable evaluation framework for controlled assessment of lung CT models. Standard benchmarks are static retrospective collections that entangle lesion size, lobe prevalence, anatomy, and acquisition context, making it difficult to determine what structurally drives model accuracy. iTRIALSPACE addresses this limitation by composing real clinical CTs and lesion profiles into controlled virtual lesion trials through a four-stage pipeline: multi-dataset nodule profiling, explicit trial specification, anatomy-aware mask insertion, and ControlNet-conditioned CT synthesis. The framework is built on a unified 54-attribute nodule-profile dataset spanning 13,140 annotated nodules from seven public CT sources and instantiated as 13 trial modes. We evaluate iTRIALSPACE in a 55,469-sample Virtual Lesion Study spanning three medical VLMs, four spatial-guidance conditions, and three clinical tasks. Across all 13 modes, the synthetic substrate remains within the real-to-real FID baseline, and synthetic performance rankings transfer strongly to real clinical data ($\rho = 0.93$, $p < 10^{-15}$). Controlled trial modes expose findings unavailable to fixed-distribution benchmarks, including shortcut-driven size prediction collapse under lobe-equalized sampling and host-to-donor variance ratios of 8.9x and 3.3x in twin-cross analysis. These results position iTRIALSPACE as an auditable evaluation infrastructure for controlled, falsifiable testing beyond static retrospective benchmarks.
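The ranking-transfer claim in the abstract can be illustrated with a small sketch. It assumes that $\rho$ denotes Spearman's rank correlation, which the abstract does not state explicitly, and it uses made-up toy scores rather than any result from the paper. The function names `average_ranks`, `pearson`, and `spearman_rho` are hypothetical helpers written for this illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy, invented scores for four hypothetical model configurations.
synthetic_scores = [0.62, 0.71, 0.55, 0.80]
real_scores = [0.58, 0.69, 0.60, 0.77]
rho = spearman_rho(synthetic_scores, real_scores)
```

A value of `rho` near 1 means configurations that rank highly on synthetic trials also rank highly on real data, which is the sense in which the abstract says rankings transfer. In practice a library routine such as `scipy.stats.spearmanr` would also return the accompanying p-value.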

Top-level tags: medical model evaluation data
Detailed tags: lung ct virtual trials benchmark controlled evaluation tissue segmentation

iTRIALSPACE: Programmable Virtual Lesion Trials for Controlled Evaluation of Lung CT Models


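1️⃣ One-sentence summary

The paper proposes iTRIALSPACE, a new evaluation framework that composes real CT images with lesion profiles in a controlled way to simulate a large number of virtual trial scenarios, so that lung CT models can be tested more rigorously across lesion sizes, locations, and anatomical conditions, revealing model flaws that conventional fixed-dataset evaluation hides.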

Source: arXiv: 2605.05761