L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts


Abstract: Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization. In this work, we propose Low-rank & Lipschitz-controlled Routing (L2R), a unified routing framework that reshapes both the routing space and scoring geometry. L2R performs expert assignment in a shared low-rank latent routing space and introduces Saturated Inner-Product Scoring (SIPS) to explicitly control the Lipschitz behavior of routing functions, yielding smoother and more stable routing geometry. In addition, L2R incorporates a parameter-efficient multi-anchor routing mechanism to enhance expert expressiveness. Extensive experiments on an OLMoE-based language MoE model and a vision MoE setting on ImageNet demonstrate that L2R consistently improves routing geometry, expert discrimination, and overall model performance. Code will be released.
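The routing pipeline described in the abstract (project to a shared low-rank space, score against expert anchors with a saturated inner product, pick the top experts) can be illustrated with a minimal pure-Python sketch. The abstract does not give the SIPS formula or the multi-anchor aggregation rule, so everything specific here is an assumption made for illustration: the `tau * tanh(dot / tau)` saturation, the max over anchors, the softmax gate over selected experts, and all names (`route`, `saturated_score`, `down_proj`, `anchors`, `tau`) are hypothetical and may differ from the authors' method.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) list-of-lists matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def saturated_score(z, anchor, tau=1.0):
    """Hypothetical saturated inner product (an assumption, not the paper's SIPS).

    tau * tanh(dot / tau) is bounded in [-tau, tau] and has slope at most 1
    with respect to the raw dot product, so large input norms cannot blow
    up the routing score.
    """
    dot = sum(a * b for a, b in zip(z, anchor))
    return tau * math.tanh(dot / tau)


def route(x, down_proj, anchors, top_k=2, tau=1.0):
    """Route one token: low-rank projection, anchor scoring, top-k gating."""
    z = matvec(down_proj, x)  # shared low-rank latent routing vector
    # Assumed multi-anchor rule: an expert's score is its best anchor match.
    scores = [
        max(saturated_score(z, a, tau) for a in expert_anchors)
        for expert_anchors in anchors
    ]
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    # Softmax over the selected experts only (a common MoE convention).
    peak = max(scores[e] for e in top)
    exps = [math.exp(scores[e] - peak) for e in top]
    total = sum(exps)
    gates = [v / total for v in exps]
    return top, gates, scores


random.seed(0)
d_model, rank, n_experts, n_anchors = 16, 4, 8, 2
down_proj = [
    [random.gauss(0, d_model ** -0.5) for _ in range(d_model)]
    for _ in range(rank)
]
anchors = [
    [[random.gauss(0, 1) for _ in range(rank)] for _ in range(n_anchors)]
    for _ in range(n_experts)
]
x = [random.gauss(0, 1) for _ in range(d_model)]

experts, gates, scores = route(x, down_proj, anchors)
print("selected experts:", experts)
print("gate weights:", gates)
```

In this sketch the router stores `rank * d_model` projection weights plus `n_experts * n_anchors * rank` anchor weights, rather than `n_experts * d_model` weights for a plain linear router, which is one way a low-rank design can stay parameter-efficient while giving each expert several anchors.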

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2601.21349 [cs.LG]
	(or arXiv:2601.21349v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2601.21349 arXiv-issued DOI via DataCite

Submission history

From: Guang Li
[v1] Thu, 29 Jan 2026 07:18:33 UTC (5,111 KB)
[v2] Thu, 14 May 2026 07:59:04 UTC (3,975 KB)