MathAtlas: A Benchmark for Autoformalization in the Wild


Abstract: Current autoformalization benchmarks focus largely on olympiad or undergraduate mathematics, while graduate and research-level mathematics remains underexplored. In this paper, we introduce MathAtlas, the first large-scale autoformalization benchmark of in-the-wild graduate-level mathematics, containing ~52k theorems, definitions, exercises, examples, and proofs extracted from 103 graduate mathematics textbooks. MathAtlas is enriched with a mathematical dependency graph containing ~178k relations and is the first autoformalization benchmark to include such relations, facilitating the evaluation and development of dependency-aware autoformalization systems. Our extensive experiments show that MathAtlas is high quality but extremely challenging: strong baselines achieve at most 9.8% correctness on theorem statements and 16.7% on definitions. Furthermore, we find that the performance of state-of-the-art models degrades substantially with dependency depth: on MA-Hard, a subset of 700 entities with the deepest dependency trees, the best model achieves only 2.6% correctness. We release MathAtlas to the community as a benchmark for large-scale autoformalization of graduate-level mathematics in the wild.

Comments:	In submission at NeurIPS 2026
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2605.14061 [cs.AI]
	(or arXiv:2605.14061v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.14061 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Nilay Patel
[v1] Wed, 13 May 2026 19:35:46 UTC (452 KB)