Abstract - From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model
We investigate whether information about time-to-event risk estimated by a Cox proportional hazards model can be transferred into a generative large language model. We propose a text-based survival modelling pipeline in which structured clinical covariates are converted into text prompts and a Qwen-based large language model is fine-tuned to generate patient-specific survival risk using Cox model predictions as a training target. Across GBSG2, ACTG320, and WHAS500, the model achieves competitive held-out discrimination and calibration despite being trained as a text-generation task rather than with a conventional survival-analysis loss. We further analyse the geometry of the model's hidden states, where t-SNE visualisations reveal smooth risk gradients in latent space, suggesting that the model represents survival risk as a continuous structure rather than isolated risk categories. Together, these findings suggest that large language models can internalise survival-risk structure while supporting calibrated prediction, providing a route towards time-to-event reasoning in language models.
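As a rough illustration of the pipeline described above, the sketch below serialises structured covariates into a text prompt and attaches a Cox-style risk score as the generation target. Under a Cox model the hazard is h(t | x) = h0(t) exp(β·x), so the linear predictor β·x is one natural choice of target; whether the authors use this quantity, a hazard ratio, or a survival probability is not stated in the abstract. The coefficient values, covariate names, and prompt template here are all hypothetical and are not taken from the paper or from any dataset.

```python
# Hypothetical Cox coefficients (log hazard ratios). Illustrative values only;
# in practice these would come from a Cox model fitted on the training split.
COX_COEFS = {"age": 0.02, "tumor_size_mm": 0.01, "positive_nodes": 0.05}


def linear_predictor(covariates, coefs=COX_COEFS):
    """Return the Cox linear predictor beta . x for one patient."""
    return sum(coefs[name] * covariates[name] for name in coefs)


def to_prompt(covariates):
    """Serialise covariates into a plain-text prompt (assumed template)."""
    parts = [f"{name.replace('_', ' ')}: {value}" for name, value in covariates.items()]
    return "Patient record. " + "; ".join(parts) + ". Predicted risk score:"


def make_example(covariates):
    """Build one (prompt, target) pair for supervised fine-tuning."""
    risk = linear_predictor(covariates)
    # The target is the risk score rendered as text, since the model is
    # trained as a text generator rather than with a survival loss.
    return {"prompt": to_prompt(covariates), "target": f"{risk:.3f}"}


patient = {"age": 50, "tumor_size_mm": 20, "positive_nodes": 3}
example = make_example(patient)
print(example["prompt"])
print(example["target"])
```

Pairs built this way could then be passed to any standard supervised fine-tuning loop for a causal language model; the fixed three-decimal rendering of the target is a design choice made for this sketch, not something the abstract specifies.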
1️⃣ One-sentence summary
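This study proposes a new method that converts clinical data into text prompts and uses a Cox model's risk predictions as the training target, teaching a large language model to output patient-specific survival risk. The model performs well on several datasets, indicating that language models can not only understand text but also internalise complex survival-risk structure.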