DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum

Abstract: We study differentially private (DP) training with Muon, a matrix-valued optimizer that updates hidden-layer weights using momentum followed by Newton--Schulz orthogonalization. While DP-SGD is well understood, the interaction between per-example clipping, Gaussian noise, momentum, and nonlinear orthogonalization in Muon has not been systematically analyzed. We formulate DP-Muon, a private Muon procedure that clips per-example matrix gradients, adds Gaussian noise to the clipped lot average, and then applies momentum and Newton--Schulz orthogonalization as post-processing. We prove that DP-Muon inherits the privacy guarantee certified by the corresponding same-lot subsampled Gaussian accountant, with no additional privacy cost from Muon-specific post-processing. On the optimization side, we establish finite-horizon and vanishing stationarity guarantees under per-matrix clipping, with bounds that separate optimization error, clipping residual, privacy noise, and Newton--Schulz approximation error. We further show that the DP-induced bias in Muon arises not in the linear momentum buffer itself, but after the nonlinear Newton--Schulz map, where Gaussian noise induces a matrix-valued heat-smoothing bias. This motivates DP-MuonBC, a bias-corrected variant that removes the leading output-level bias term while preserving the same privacy guarantee. Experiments on E2E and DART show that Muon-style matrix updates improve private fine-tuning, and that DP-MuonBC further improves utility without increasing the privacy budget.
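
The DP-Muon step described in the abstract (per-example clipping, Gaussian noise on the lot average, then momentum and Newton--Schulz orthogonalization as post-processing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the hyperparameters (`lr`, `beta`, `clip_norm`, `noise_multiplier`, `ns_steps`) and the plain heavy-ball momentum are assumptions made for illustration. The orthogonalization here uses the classic cubic Newton--Schulz iteration; the coefficients and iteration count used in the paper may differ. The bias correction of DP-MuonBC is not shown.

```python
import numpy as np


def newton_schulz(M, steps=5, eps=1e-7):
    """Approximate the orthogonal polar factor of M (classic cubic iteration)."""
    # Frobenius normalization puts every singular value in (0, 1],
    # which lies inside the convergence region of the cubic iteration.
    X = M / (np.linalg.norm(M) + eps)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def clip_per_example(grads, clip_norm):
    """Rescale each per-example matrix gradient to Frobenius norm <= clip_norm."""
    # grads has shape (lot_size, rows, cols); norms are taken per example.
    norms = np.linalg.norm(grads, axis=(1, 2), keepdims=True)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    return grads * scale


def dp_muon_step(W, grads, momentum_buf, rng, lr=0.02, beta=0.9,
                 clip_norm=1.0, noise_multiplier=1.0, ns_steps=5):
    """One illustrative DP-Muon update for a single weight matrix W."""
    lot_size = grads.shape[0]
    clipped = clip_per_example(grads, clip_norm)
    # Gaussian mechanism: noise std is noise_multiplier * clip_norm on the sum,
    # i.e. noise_multiplier * clip_norm / lot_size on the lot average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
    private_grad = (clipped.sum(axis=0) + noise) / lot_size
    # Everything below touches the data only through private_grad,
    # so it is post-processing and adds no privacy cost.
    momentum_buf = beta * momentum_buf + private_grad
    update = newton_schulz(momentum_buf, ns_steps)
    return W - lr * update, momentum_buf
```

A call such as `dp_muon_step(W, grads, buf, np.random.default_rng(0))` takes `grads` of shape `(lot_size, rows, cols)` and returns the updated weights together with the new momentum buffer. The privacy accounting itself (the subsampled Gaussian accountant) is outside the scope of this sketch.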

Comments:	26 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2605.12994 [cs.LG]
	(or arXiv:2605.12994v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.12994 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chenglin Fan
[v1] Wed, 13 May 2026 04:52:24 UTC (213 KB)