arXiv submission date: 2026-03-25
📄 Abstract - Can we generate portable representations for clinical time series data using LLMs?

Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shift at the next. In this work, we study a simple question: can large language models (LLMs) create portable patient embeddings, i.e., representations of patients that enable a downstream predictor built at one hospital to be used elsewhere with minimal-to-no retraining or fine-tuning? To do so, we map irregular ICU time series onto concise natural language summaries using a frozen LLM, then embed each summary with a frozen text embedding model to obtain a fixed-length vector that can serve as input to a variety of downstream predictors. Across three cohorts (MIMIC-IV, HIRID, PPICU), on multiple clinically grounded forecasting and classification tasks, we find that our approach is simple, easy to use, and competitive in-distribution with grid imputation, self-supervised representation learning, and time series foundation models, while exhibiting smaller relative performance drops when transferring to new hospitals. We study the variation in performance across prompt designs and find that structured prompts are crucial to reducing the variance of the predictive models without altering mean accuracy. We find that using these portable representations improves few-shot learning and does not increase demographic recoverability of age or sex relative to baselines, suggesting little additional privacy risk. Our work points to the potential of LLMs as tools for the scalable deployment of production-grade predictive models by reducing engineering overhead.
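The pipeline described in the abstract (irregular time series, then a text summary, then a fixed-length embedding) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names, the prompt layout, and the variable names are hypothetical; `summarize_with_frozen_llm` is a placeholder that simply returns its input instead of calling an LLM; and `embed_text` swaps in a hashed bag-of-tokens vector in place of a real frozen text embedding model.

```python
import hashlib
import math
from typing import Dict, List, Tuple

# (hours since admission, variable name, value); sampling times are irregular
Observation = Tuple[float, str, float]


def render_structured_prompt(observations: List[Observation]) -> str:
    """Group irregular observations by variable into a fixed-layout text block.

    The layout is a hypothetical example of a "structured prompt".
    """
    by_variable: Dict[str, List[Tuple[float, float]]] = {}
    for hours, name, value in observations:
        by_variable.setdefault(name, []).append((hours, value))
    lines = []
    for name in sorted(by_variable):
        series = sorted(by_variable[name])  # order each series by time
        values = [v for _, v in series]
        lines.append(
            f"{name}: n={len(values)} first={values[0]:.1f} last={values[-1]:.1f} "
            f"min={min(values):.1f} max={max(values):.1f}"
        )
    return "\n".join(lines)


def summarize_with_frozen_llm(prompt: str) -> str:
    """Placeholder for the frozen LLM step (assumption: returns the prompt unchanged)."""
    return prompt


def embed_text(text: str, dim: int = 64) -> List[float]:
    """Stand-in for a frozen text embedding model: hashed token counts, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def patient_embedding(observations: List[Observation], dim: int = 64) -> List[float]:
    """Time series -> text summary -> fixed-length vector, with no trainable parts."""
    prompt = render_structured_prompt(observations)
    summary = summarize_with_frozen_llm(prompt)
    return embed_text(summary, dim)


# Hypothetical ICU stay with irregularly timed measurements
stay = [(0.5, "heart_rate", 88.0), (2.0, "lactate", 2.1), (3.25, "heart_rate", 104.0)]
vector = patient_embedding(stay)
```

Because both stages are frozen, the output length depends only on `dim`, not on how many measurements a stay contains, so the same downstream predictor can consume vectors produced from differently sampled hospital data. In a real setting the two placeholder functions would be replaced by calls to an actual LLM and an actual text embedding model.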

Top-level tags: medical, llm, natural language processing
Detailed tags: clinical time series, patient embeddings, distribution shift, portable representations, icu forecasting

Can we generate portable representations for clinical time series data using LLMs?


1️⃣ One-sentence summary

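This paper proposes a simple and effective method that uses a frozen large language model to convert irregular time series from ICU patients into concise natural language summaries, which are then embedded as fixed-length vector representations. These representations show good portability and stability on prediction tasks across hospitals, which helps lower the cost of deploying clinical machine learning models.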

Source: arXiv: 2603.23987