ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

Authors: Jianghao Lin, Yuanyuan Shi, Xin Peng, Renjie Ding, Hairui Wang, Yuxuan Peng, Bizhe Bai, Weixi Song, Fengshuo Bai, Huacan Chai, Weinan Zhang, Fei Huang, Ying Wen


Abstract: Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with ToolPRM, a process reward model scoring each intra-call decision (function name and argument filling). We build the first fine-grained intra-call supervision dataset via function masking, rollout collection, and step-level annotation. ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy and yields consistent test-time gains on multiple function-calling benchmarks. We further show that structured generation follows "explore more but retain less", since early JSON errors are unrecoverable.
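To make the search procedure in the abstract concrete, the following is a minimal sketch of step-level beam search over the decisions inside a single function call. It is not the authors' implementation. The names `fine_grained_beam_search`, `propose`, `score`, `explore`, and `retain` are hypothetical, the candidate table is invented toy data, and `toy_score` is only a stand-in for a learned process reward model such as ToolPRM. The sketch assumes one decision per slot (function name first, then each argument), with `explore` candidates sampled per beam and only `retain` beams kept after scoring.

```python
from typing import Callable, List, Tuple

# A partial call is an ordered tuple of (slot, value) decisions made so far.
Partial = Tuple[Tuple[str, str], ...]


def fine_grained_beam_search(
    slots: List[str],
    propose: Callable[[Partial, str], List[str]],
    score: Callable[[Partial], float],
    explore: int,
    retain: int,
) -> List[Tuple[float, Partial]]:
    """Step-level beam search: expand widely, then keep only a few beams."""
    beams: List[Partial] = [()]
    for slot in slots:
        candidates: List[Partial] = []
        for partial in beams:
            # "Explore more": take up to `explore` candidate values per beam.
            for value in propose(partial, slot)[:explore]:
                candidates.append(partial + ((slot, value),))
        # Score every partial call and sort from best to worst.
        scored = sorted(
            ((score(c), c) for c in candidates),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # "Retain less": keep only the top `retain` partial calls.
        beams = [c for _, c in scored[:retain]]
    return [(score(b), b) for b in beams]


# Toy data (assumed for illustration only).
CANDIDATES = {
    "name": ["get_weather", "get_news"],
    "city": ["Paris", "London"],
    "unit": ["celsius", "kelvin"],
}
REFERENCE = {"name": "get_weather", "city": "Paris", "unit": "celsius"}


def toy_propose(partial: Partial, slot: str) -> List[str]:
    # Stand-in for sampling candidate values from an LLM.
    return CANDIDATES[slot]


def toy_score(partial: Partial) -> float:
    # Stand-in for a process reward model: fraction of correct decisions.
    correct = sum(1.0 for slot, value in partial if REFERENCE[slot] == value)
    return correct / max(len(partial), 1)


result = fine_grained_beam_search(
    ["name", "city", "unit"], toy_propose, toy_score, explore=2, retain=1
)
```

With `explore=2` and `retain=1`, a wrong function name is pruned at the first step, so no later step spends budget on a call that cannot be repaired. This is the intuition behind the unrecoverable early-error argument in the abstract, shown here only on toy data.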

Comments:	ACL 2026 (main)
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2510.14703 [cs.AI]
	(or arXiv:2510.14703v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2510.14703 arXiv-issued DOI via DataCite

Submission history

From: Jianghao Lin
[v1] Thu, 16 Oct 2025 14:06:03 UTC (282 KB)
[v2] Tue, 28 Apr 2026 18:17:43 UTC (220 KB)