Abstract: Irregularly sampled multivariate event streams remain a stubbornly difficult modality for generative modeling: tokenization-based approaches break down when inter-event intervals vary by orders of magnitude, and neural temporal point processes are bottlenecked by window-level numerical quadrature. We propose (i) SurF, a generative model that uses the Time Rescaling Theorem (TRT) as a learnable bijection between event sequences and i.i.d. unit-rate exponential noise, enabling a single model to be trained across heterogeneous event-stream datasets; (ii) three efficient parameterizations of the cumulative intensity that scale to long sequences; and (iii) a Transformer-based encoder for multi-dataset pretraining. On six real-world benchmarks, SurF achieves the best reported time RMSE on Earthquake, Retweet, and Taobao, and is within trial-level noise of the strongest specialist on the remaining three. Under a strict leave-one-out protocol, the held-out checkpoint beats every classical and neural-autoregressive baseline on 5/6 datasets and beats every baseline on Amazon and Earthquake, an initial step toward foundation models over asynchronous event streams.
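
The bijection that the Time Rescaling Theorem provides can be illustrated with a minimal sketch. It is not SurF's parameterization: it assumes a fixed, hand-picked linear intensity λ(t) = a + b·t, whose cumulative intensity Λ(t) = a·t + b·t²/2 has a closed-form inverse, and every function name and parameter value below is hypothetical. Mapping event times through Λ and taking successive differences yields the noise variables; accumulating noise and applying Λ⁻¹ recovers the event times.

```python
import math
import random


def cumulative_intensity(t, a=1.0, b=0.5):
    """Lambda(t) for the assumed intensity lambda(t) = a + b * t."""
    return a * t + 0.5 * b * t * t


def inverse_cumulative_intensity(y, a=1.0, b=0.5):
    """Solve Lambda(t) = y for t >= 0 (positive root of the quadratic)."""
    if b == 0.0:
        return y / a
    return (-a + math.sqrt(a * a + 2.0 * b * y)) / b


def events_to_noise(times, a=1.0, b=0.5):
    """Forward map: event times -> rescaled inter-event intervals."""
    noise, prev = [], 0.0
    for t in times:
        cur = cumulative_intensity(t, a, b)
        noise.append(cur - prev)
        prev = cur
    return noise


def noise_to_events(noise, a=1.0, b=0.5):
    """Inverse map: unit-rate exponential noise -> event times."""
    times, total = [], 0.0
    for z in noise:
        total += z
        times.append(inverse_cumulative_intensity(total, a, b))
    return times


# Sample unit-rate exponential noise, generate events, then map back.
rng = random.Random(0)
noise = [rng.expovariate(1.0) for _ in range(5)]
times = noise_to_events(noise)
recovered = events_to_noise(times)
```

In this toy setting the round trip is exact up to floating-point error. A learned model would replace the fixed `a` and `b` with a trainable, history-dependent cumulative intensity.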
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.14069 [cs.LG] |
| (or arXiv:2605.14069v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2605.14069 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Mohammad R. Rezaei
[v1]
Wed, 13 May 2026 19:46:48 UTC (4,040 KB)
