arXiv submission date: 2026-03-10
📄 Abstract - Tracking Cancer Through Text: Longitudinal Extraction From Radiology Reports Using Open-Source Large Language Models

Radiology reports capture crucial longitudinal information on tumor burden, treatment response, and disease progression, yet their unstructured narrative format complicates automated analysis. While large language models (LLMs) have advanced clinical text processing, most state-of-the-art systems remain proprietary, limiting their applicability in privacy-sensitive healthcare environments. We present a fully open-source, locally deployable pipeline for longitudinal information extraction from radiology reports, implemented using the llm_extractinator framework. The system applies the qwen2.5-72b model to extract and link target, non-target, and new lesion data across time points in accordance with RECIST criteria. Evaluation on 50 Dutch CT Thorax/Abdomen report pairs yielded high extraction performance, with attribute-level accuracies of 93.7% for target lesions, 94.9% for non-target lesions, and 94.0% for new lesions. The approach demonstrates that open-source LLMs can achieve clinically meaningful performance in multi-timepoint oncology tasks while ensuring data privacy and reproducibility. These results highlight the potential of locally deployable LLMs for scalable extraction of structured longitudinal data from routine clinical text.
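As a rough illustration of what an attribute-level accuracy could measure, the sketch below compares extracted lesion records against reference annotations field by field. The record fields (location, size_mm_baseline, size_mm_followup), the attribute_accuracy function, and the example values are all hypothetical; they are not taken from the paper or from llm_extractinator, and the paper's actual scoring protocol may differ.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class TargetLesion:
    # Hypothetical schema: field names are illustrative, not the paper's.
    location: str
    size_mm_baseline: Optional[float]
    size_mm_followup: Optional[float]


def attribute_accuracy(predicted, reference):
    """Fraction of attribute values that match across paired records."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be paired one-to-one")
    total = 0
    correct = 0
    for pred, ref in zip(predicted, reference):
        for f in fields(ref):
            total += 1
            if getattr(pred, f.name) == getattr(ref, f.name):
                correct += 1
    return correct / total if total else 0.0


# Made-up example data: two lesions, three attributes each.
reference = [
    TargetLesion("right upper lobe", 24.0, 18.0),
    TargetLesion("liver segment 7", 31.0, 35.0),
]
predicted = [
    TargetLesion("right upper lobe", 24.0, 18.0),
    TargetLesion("liver segment 7", 31.0, None),  # follow-up size missed
]

accuracy = attribute_accuracy(predicted, reference)
print(f"{accuracy:.3f}")  # 5 of 6 attributes match
```

Scoring per attribute rather than per report means a single missed measurement lowers the score only slightly, which is one plausible reason to report accuracy at this granularity.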

Top-level tags: medical llm natural language processing
Detailed tags: clinical text extraction radiology reports longitudinal data open-source llms oncology

Tracking Cancer Through Text: Longitudinal Extraction From Radiology Reports Using Open-Source Large Language Models


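1️⃣ One-sentence summary

This paper develops a fully open-source, locally deployable system that uses a large language model to automatically extract longitudinal data on how cancer lesions change over time from radiology reports, reaching accuracy close to what clinical use requires while protecting patient privacy.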

Source: arXiv: 2603.09638