Abstract: We construct a new kind of encoder, leveraging the expressive power of diffusion models. In a traditional variational autoencoder, the encoder and decoder jointly negotiate a latent representation of the input. This is made possible by the reparameterization trick, which simplifies training at the cost of restricting the encoder to a simple family of distributions. Replacing this encoder with a diffusion model requires rethinking how the decoder pressure can be transmitted back to the encoder, given that they tend to update their internal estimates of the latent in opposing directions. We solve this problem with an alternating training scheme, inspired by the expectation-maximization algorithm. Our method enables more reliable synchronization between encoder and decoder, while preserving the simple and efficient training objective of standard diffusion models.
| Comments: | 22 pages + references, 10 figures |
| Subjects: | Machine Learning (cs.LG); Information Theory (cs.IT) |
| Cite as: | arXiv:2605.13399 [cs.LG] |
| | (or arXiv:2605.13399v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.13399 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Akhil Premkumar
[v1] Wed, 13 May 2026 11:54:43 UTC (6,614 KB)
