DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System


Abstract: Accurate disease trajectory prediction is critical for early intervention, resource allocation, and improving long-term outcomes. While electronic health records (EHRs) provide a rich longitudinal view of patient health in clinical environments, models trained on curated research cohorts may not reflect routine deployment settings, and those trained on single-hospital datasets capture only fragments of each patient's trajectory. This highlights the importance of leveraging large, multi-hospital health systems for training and validation to better reflect real-world clinical complexity. In this work, we develop DT-Transformer, a foundation model trained on 57.1M structured EHR entries over 1.7M patients from Mass General Brigham (MGB), spanning 11 hospitals and a broad network of outpatient clinics. DT-Transformer achieves strong discrimination in both held-out and prospective validation settings. Next-event prediction achieves a median age- and sex-stratified AUC of 0.871 across 896 disease categories, with all categories exceeding AUC 0.5. These results support health system-scale training as a path toward foundation models suited to real-world clinical forecasting.
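The abstract reports a median age- and sex-stratified AUC but does not spell out how the stratification is done. The sketch below shows one common convention and is only an illustration under stated assumptions. For each disease category, it computes the ROC AUC separately within each (age bin, sex) stratum that contains both cases and controls, then averages those per-stratum AUCs. Finally, it takes the median across categories. The 10-year age bins, the unweighted mean over strata, and the function names `auc`, `stratified_auc`, and `median_stratified_auc` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, median


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic; tied scores count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_auc(scores, labels, ages, sexes, age_bin=10):
    """Unweighted mean of per-stratum AUCs over (age bin, sex) strata.

    Assumption: age_bin=10 (decade bins) and equal weighting of strata are
    illustrative choices, not the paper's documented protocol.
    """
    groups = defaultdict(lambda: ([], []))  # stratum -> (scores, labels)
    for s, y, a, x in zip(scores, labels, ages, sexes):
        key = (int(a // age_bin), x)
        groups[key][0].append(s)
        groups[key][1].append(y)
    per_stratum = [auc(s, y) for s, y in groups.values()]
    valid = [v for v in per_stratum if v is not None]  # skip one-class strata
    return mean(valid) if valid else None


def median_stratified_auc(per_category):
    """Median stratified AUC across categories.

    per_category maps a disease category to (scores, labels, ages, sexes).
    """
    values = [stratified_auc(*data) for data in per_category.values()]
    return median([v for v in values if v is not None])
```

Stratifying matters because age and sex alone are strong predictors of many diagnoses. Consider two strata: one with scores `[0.9, 0.1]` and labels `[1, 0]`, and one with scores `[0.3, 0.7]` and labels `[1, 0]`. Pooling all four records gives an AUC of 0.75. The stratified version averages 1.0 and 0.0 to 0.5, because it only credits the model for ranking patients within the same demographic group.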

Comments:	Work in Progress
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2605.14227 [cs.LG]
	(or arXiv:2605.14227v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.14227 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yunying Zhu
[v1] Thu, 14 May 2026 00:45:04 UTC (2,150 KB)