Abstract: Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to privacy constraints, making unified MoE training challenging. We propose MetaMoE, a privacy-preserving framework that unifies independently trained, domain-specialized experts into a single MoE using public proxy data as surrogates for inaccessible private data. Central to MetaMoE is diversity-aware proxy selection, which selects client-domain-relevant and diverse samples from public data to effectively approximate private data distributions and supervise router learning. These proxies are further used to align expert training, improving expert coordination at unification time, while a context-aware router enhances expert selection across heterogeneous inputs. Experiments on computer vision and natural language processing benchmarks demonstrate that MetaMoE consistently outperforms recent privacy-preserving MoE unification methods. Code is available at this https URL.
| Comments: | Accepted by ICML 2026 |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR) |
| Cite as: | arXiv:2605.14289 [cs.LG] |
| | (or arXiv:2605.14289v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.14289 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Weisen Jiang
[v1] Thu, 14 May 2026 02:48:23 UTC (2,440 KB)
