Authors: Dadi Guo, Yuejin Xie, Qingyu Liu, Weixian Huang, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Jianjie Feng, Wenze Su, Yujiu Yang, Dongrui Liu, Yi R. Fung
Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation and self-evolution of LLMs. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical experimentation. In this paper, we investigate the potential of code agents to autonomously evolve existing math problems into more complex variations. We introduce a multi-agent framework designed to perform problem evolution while validating the solvability and increased difficulty of the generated problems. Our experiments demonstrate that, given sufficient test-time exploration, code agents can synthesize new, solvable problems that are structurally distinct from and more challenging than the originals. This work provides empirical evidence that code-driven agents can serve as a viable mechanism for synthesizing high-difficulty mathematical reasoning problems within scalable computational environments. Code and data are available at this https URL.
| Comments: | 38 pages |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2603.03202 [cs.CL] |
| | (or arXiv:2603.03202v3 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2603.03202 (arXiv-issued DOI via DataCite) |
Submission history
From: Dadi Guo
[v1] Tue, 3 Mar 2026 17:55:10 UTC (740 KB)
[v2] Wed, 4 Mar 2026 04:22:14 UTC (740 KB)
[v3] Mon, 1 Jun 2026 06:25:08 UTC (734 KB)
