Distance-Matrix Wasserstein Statistics for Scalable Gromov--Wasserstein Learning


Abstract: Gromov--Wasserstein (GW) distances compare graphs, shapes, and point clouds through internal distances, without requiring a common coordinate system. This invariance is powerful, but discrete GW is a nonconvex quadratic optimal transport problem and is difficult to estimate at scale. We propose Distance-Matrix Wasserstein (DMW), a hierarchy of Wasserstein statistics comparing laws of random finite distance matrices. Rather than optimizing a global point-level alignment, DMW samples $n$ points from each space, records their pairwise distances, and transports the resulting matrix laws. We prove that DMW is a relaxation and lower bound of GW, and establish a reverse approximation inequality: the GW--DMW gap is controlled by the Wasserstein error of approximating each original measure with $n$ samples. Hence population DMW converges to GW as sampled subspaces become dense. We further give finite-sample bounds, including intrinsic-dimensional rates that depend on the data manifold rather than the ambient matrix dimension $\binom{n}{2}$. For scalable computation, we introduce sliced and multi-scale DMW; for $p=1$, the sliced multi-scale dissimilarity yields positive-definite exponential kernels. Experiments on synthetic metric spaces, scalability benchmarks, graph classification, and two-sample testing validate the theory and demonstrate an interpretable GW-style proxy for structural comparison.
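
The sampling-and-transport idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `sample_distance_vectors` and `sliced_w1`, the use of Euclidean point clouds, the choice to sample with replacement, and the Monte Carlo slicing over Gaussian directions are all assumptions made for illustration. Each draw picks $n$ points, records the $\binom{n}{2}$ pairwise distances as a vector, and the two resulting collections of vectors are compared with a sliced 1-Wasserstein distance.

```python
import numpy as np


def sample_distance_vectors(points, n, num_draws, rng):
    """Draw `num_draws` random n-point subsamples of `points` and return the
    upper-triangular pairwise distances of each, shape (num_draws, n*(n-1)/2).

    Assumption: the space is a Euclidean point cloud with uniform weights.
    """
    iu = np.triu_indices(n, k=1)
    out = np.empty((num_draws, len(iu[0])))
    for t in range(num_draws):
        # i.i.d. draws (with replacement) from the empirical measure
        idx = rng.integers(0, len(points), size=n)
        sub = points[idx]
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        out[t] = dist[iu]
    return out


def sliced_w1(a, b, num_proj, rng):
    """Monte Carlo sliced 1-Wasserstein distance between two samples with the
    same number of rows. In one dimension, W1 between equal-size empirical
    measures is the mean absolute difference of the sorted values."""
    dirs = rng.normal(size=(num_proj, a.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_a = np.sort(a @ dirs.T, axis=0)
    proj_b = np.sort(b @ dirs.T, axis=0)
    return float(np.mean(np.abs(proj_a - proj_b)))


# Usage: a rotated copy of a cloud has the same internal distances,
# while a rescaled copy does not.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Y = X @ Q.T          # isometric copy of X
Z = 3.0 * X          # rescaled copy of X

# Reusing the same seed selects the same index subsets in each cloud.
vx = sample_distance_vectors(X, 4, 200, np.random.default_rng(1))
vy = sample_distance_vectors(Y, 4, 200, np.random.default_rng(1))
vz = sample_distance_vectors(Z, 4, 200, np.random.default_rng(1))

d_rot = sliced_w1(vx, vy, 50, np.random.default_rng(2))
d_scale = sliced_w1(vx, vz, 50, np.random.default_rng(2))
print(d_rot, d_scale)
```

Because only internal distances enter the statistic, no alignment between the two clouds is optimized here, which is the property the abstract contrasts with point-level GW. The vector length grows as $\binom{n}{2}$, so small $n$ keeps each projection cheap.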

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2605.14981 [cs.LG]
	(or arXiv:2605.14981v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.14981 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ao Xu
[v1] Thu, 14 May 2026 15:45:48 UTC (304 KB)