arXiv submission date: 2026-05-27
📄 Abstract - FedEHR-Gen: Federated Synthetic Time-Series EHR Generation via Latent Space Alignment and Distribution-Aware Aggregation

Synthetic Electronic Health Record (EHR) generation provides a promising avenue for data augmentation and cross-hospital modeling in privacy-constrained healthcare settings. However, most existing EHR generative models are centralized and require pooling data across hospitals, which is often infeasible when real-world data sharing is restricted. While federated EHR generation offers a natural solution, direct federated modeling often collapses or diverges due to the high dimensionality, sparsity, and cross-hospital heterogeneity of EHR data. In this work, we propose FedEHR-Gen, the first federated framework for synthetic time-series EHR generation across distributed hospitals. FedEHR-Gen uses a two-stage learning paradigm. First, we introduce a federated autoencoder that projects high-dimensional and sparse EHR features onto a compact latent space. To ensure semantic consistency across hospitals, we develop a layer-wise matching aggregation mechanism that aligns local encoders into a unified global latent space. Second, operating on this aligned latent space, we train a federated temporal conditional variational autoencoder (TCVAE) with distribution-aware aggregation, enabling stable temporal generative modeling under severe cross-hospital heterogeneity. Extensive experiments on the eICU and MIMIC-III datasets demonstrate that FedEHR-Gen achieves generation fidelity, downstream utility, and privacy risk comparable to centralized training, while consistently outperforming the standard federated baseline.
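The abstract does not spell out how the layer-wise matching aggregation works. To show why matching before averaging matters, here is a minimal sketch in the spirit of matched averaging (as in FedMA). This is a swapped-in technique and not necessarily the authors' mechanism. Each hospital's hidden units are permuted to best match a reference client, and only then are the weights averaged. The function names, the two-layer encoder shape, the use of client 0 as the reference, and the brute-force matching are all hypothetical choices made for illustration.

```python
from itertools import permutations


def sq_dist(u, v):
    """Squared Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_permutation(ref_rows, rows):
    """Return perm such that rows[perm[i]] best matches ref_rows[i].

    Brute force over all orderings: exact, but only feasible for tiny
    hidden layers. A practical system would use the Hungarian algorithm.
    """
    n = len(ref_rows)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(sq_dist(ref_rows[i], rows[perm[i]]) for i in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)


def permute_layer(w1, w2, perm):
    """Reorder hidden units: rows of w1 and the matching columns of w2.

    Permuting both consistently leaves the layer's input-output function
    unchanged, so each client's model is only relabeled, not altered.
    """
    new_w1 = [list(w1[p]) for p in perm]
    new_w2 = [[row[p] for p in perm] for row in w2]
    return new_w1, new_w2


def elementwise_mean(mats):
    """Average equally shaped matrices entry by entry."""
    k = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / k for c in range(cols)]
            for r in range(rows)]


def matched_average(clients):
    """clients: list of (w1, w2) pairs; client 0 is the (assumed) reference."""
    ref_w1, _ = clients[0]
    aligned_w1, aligned_w2 = [], []
    for w1, w2 in clients:
        perm = best_permutation(ref_w1, w1)
        a1, a2 = permute_layer(w1, w2, perm)
        aligned_w1.append(a1)
        aligned_w2.append(a2)
    return elementwise_mean(aligned_w1), elementwise_mean(aligned_w2)


# Toy example: hospital B learned the same hidden units as A, in another order.
w1_a = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
w2_a = [[1.0, 2.0, 3.0]]
order = [2, 0, 1]
w1_b = [w1_a[p] for p in order]
w2_b = [[w2_a[0][p] for p in order]]

print(matched_average([(w1_a, w2_a), (w1_b, w2_b)]))
# → ([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]], [[1.0, 2.0, 3.0]])
```

Plain coordinate-wise averaging of the same two clients would blend unrelated hidden units. For example, the first row of `w1` would become `[2.0, 1.5]`. This gives one intuition for why unaligned federated encoders may fail to share a consistent latent space.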

Top-level tags: medical machine learning systems
Detailed tags: ehr generation, federated learning, time-series, latent space alignment, distribution-aware aggregation

FedEHR-Gen: Federated Synthetic Time-Series EHR Generation via Latent Space Alignment and Distribution-Aware Aggregation


1️⃣ One-sentence summary

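The paper proposes FedEHR-Gen, a federated learning framework that lets multiple hospitals jointly generate high-quality time-series electronic health record (EHR) data without sharing raw data. It uses two-stage training: it first aligns the latent feature space across hospitals, then stably trains a temporal generative model in that space. This design addresses the collapse or divergence that direct federated modeling suffers because EHR data are high-dimensional, sparse, and differ substantially across hospitals.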

Source: arXiv 2605.27892