Abstract: Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. Yet most existing pre-trained transformers lack a principled mechanism for propagating uncertainty through their feature transformation stack. In this work, we propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Composing these probabilistic mappings reveals a probability path that mimics the structure of a diffusion process, transporting data mass from the input distribution to the pre-trained feature distribution. This probability path can then be recompiled on a diffusion process with a unified transition model, enabling principled propagation of representation uncertainty throughout the pre-trained model's architecture while maintaining its original predictive performance. Empirical results across a variety of vision and language benchmarks demonstrate that our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2602.08920 [cs.LG] |
| | (or arXiv:2602.08920v2 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2602.08920 (arXiv-issued DOI via DataCite) |
Submission history
From: Manh Cuong Dao
[v1] Mon, 9 Feb 2026 17:24:47 UTC (1,700 KB)
[v2] Wed, 13 May 2026 03:23:51 UTC (1,683 KB)
