Abstract: Automatic heuristic design (AHD) has emerged as a promising paradigm for solving NP-hard combinatorial optimization problems (COPs). Recent works show that large language models (LLMs), when integrated into well-designed frameworks (i.e., LLM-AHD), can autonomously discover high-performing heuristics. However, existing LLM-AHD frameworks typically treat LLMs as passive generators within fixed workflows, where the model generates heuristics from manually designed, limited context. Such context may fail to capture state-dependent information (e.g., specific failure modes), leading to inefficient trial-and-error exploration. To overcome these limitations, we propose AHD Agent, a novel tool-integrated, multi-turn framework that empowers LLMs to proactively decide whether to generate heuristics or invoke tools to retrieve targeted evidence from the solving environment. To effectively train such a dynamic decision-making agent, we introduce an agentic reinforcement learning (RL) system, which leverages a novel environment synthesis pipeline to optimize a compact model's generalizable AHD capabilities. Experiments across eight diverse domains, including four held-out tasks, demonstrate that our 4B-parameter agent matches or surpasses state-of-the-art baselines using much larger models, while requiring significantly fewer evaluations. Model and inference scaling analysis further reveals that AHD Agent offers an effective trajectory toward truly autonomous heuristic design.
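The multi-turn decision described in the abstract (either generate a heuristic or invoke a tool to gather evidence from the solving environment) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's implementation: the bin-packing environment, the `worst_instance` tool, and `stub_policy` (a scripted stand-in for the LLM) are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A "heuristic" here is just an item-ordering rule applied before first-fit packing.
Heuristic = Callable[[List[int]], List[int]]


def first_fit(items: List[int], capacity: int) -> int:
    """Pack items in the given order with first-fit; return the number of bins used."""
    free_space: List[int] = []
    for size in items:
        for i, free in enumerate(free_space):
            if size <= free:
                free_space[i] = free - size
                break
        else:
            free_space.append(capacity - size)
    return len(free_space)


@dataclass
class ToyEnv:
    """Hypothetical solving environment: scores heuristics and exposes one tool."""
    instances: List[List[int]]
    capacity: int = 10
    evaluations: int = 0

    def evaluate(self, h: Heuristic) -> int:
        """Total bins used over all instances (lower is better)."""
        self.evaluations += 1
        return sum(first_fit(h(inst), self.capacity) for inst in self.instances)

    def worst_instance(self, h: Heuristic) -> List[int]:
        """Tool call: return the instance on which h uses the most bins."""
        return max(self.instances, key=lambda inst: first_fit(h(inst), self.capacity))


def stub_policy(history: List[Tuple[str, object]]) -> Tuple[str, Optional[Heuristic]]:
    """Scripted stand-in for the LLM (assumption): generate, inspect, then revise."""
    if not history:
        return "generate", lambda items: list(items)            # keep given order
    if len(history) == 1:
        return "tool", None                                      # ask for evidence
    return "generate", lambda items: sorted(items, reverse=True)  # largest first


def run_agent(env: ToyEnv, policy, max_turns: int = 3):
    """Each turn the policy chooses between a tool call and a new heuristic."""
    history: List[Tuple[str, object]] = []
    best_score: Optional[int] = None
    best_h: Optional[Heuristic] = None
    for _ in range(max_turns):
        action, payload = policy(history)
        if action == "tool" and best_h is not None:
            history.append(("tool", env.worst_instance(best_h)))
        elif action == "generate":
            score = env.evaluate(payload)
            history.append(("generate", score))
            if best_score is None or score < best_score:
                best_score, best_h = score, payload
    return best_score, history
```

In this sketch the tool observation is appended to the history that the policy sees on the next turn, which is the point of the design: the context is built from evidence the agent chose to retrieve rather than from a fixed template. A real agent would condition its next heuristic on that observation; the scripted policy above only mimics the control flow.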
| Comments: | 10 pages, 7 figures for main content |
| Subjects: | Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE) |
| Cite as: | arXiv:2605.08756 [cs.AI] (or arXiv:2605.08756v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.08756 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Ning Lu
[v1] Sat, 9 May 2026 07:36:45 UTC (1,005 KB)
