Abstract: Continuous diffusion is a natural framework for non-autoregressive generation but has generally lagged behind masked discrete diffusion models (MDMs) on discrete sequence generation. We argue that the bottleneck is not continuity itself, but a representation in which denoising depends on timestep-indexed noise regimes. We introduce \emph{Discrete Stochastic Localization} (DSL), a continuous-state framework with unit-sphere token embeddings whose Bayes-optimal denoiser is invariant to the nominal signal-to-noise ratio (SNR) under the localization channel. One trained network then supports an entire family of per-token SNR paths, with endpoint masked-diffusion paths as a special case. Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText across all step budgets from $T{=}128$ to $T{=}1024$. The same checkpoint also supports random-order autoregressive sampling and a hybrid continuous-then-discrete sampler that uses as few as $T{=}48$ total steps, in both cases without distillation or retraining.
| Comments: | arXiv admin note: substantial text overlap with arXiv:2602.16169 |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.12836 [cs.LG] (or arXiv:2605.12836v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.12836 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yunshu Wu
[v1] Wed, 13 May 2026 00:12:24 UTC (1,399 KB)
