Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents


Abstract: Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search load benchmark, LQM-ContextRoute improves F1 by +2.18 pp over SW-UCB while staying on the latency-quality frontier. In a high-heterogeneity StrategyQA setting, LQM-ContextRoute avoids additive-reward collapse and improves accuracy by up to +18 pp over SW-UCB; on heterogeneous retriever pools, it improves NDCG by +2.91 to +3.22 pp over SW-UCB. These results show that same-function tool routing benefits from treating latency as service capacity, especially when runtime pressure and provider-quality heterogeneity coexist.
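The contrast between an additive reward and a quality-per-service-cycle ranking can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the `Provider` fields, the `latency_weight` parameter, the use of quality divided by latency as a stand-in for "quality per service cycle", and the numbers themselves. It is not the LQM-ContextRoute implementation, and it omits the contextual bandit, the query-specific quality estimation, and the LLM-as-judge feedback.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    quality: float    # hypothetical estimated answer quality in [0, 1]
    latency_s: float  # hypothetical estimated seconds per request (one service cycle)


def additive_score(p: Provider, latency_weight: float = 1.0) -> float:
    # Additive reward: a low latency penalty can compensate for poor quality.
    return p.quality - latency_weight * p.latency_s


def ratio_score(p: Provider) -> float:
    # Assumed reading of "quality per service cycle": quality delivered per
    # second of service time. A near-zero-quality provider scores near zero
    # no matter how fast it responds.
    return p.quality / p.latency_s


def best(providers: list[Provider], score) -> Provider:
    # Select the provider with the highest score under the given rule.
    return max(providers, key=score)


# Illustrative values only, not taken from the paper.
providers = [
    Provider("fast_weak", quality=0.05, latency_s=0.1),
    Provider("slow_strong", quality=0.90, latency_s=1.5),
]

print("additive picks:", best(providers, additive_score).name)
print("ratio picks:   ", best(providers, ratio_score).name)
```

With these made-up values the additive rule selects the fast but nearly useless provider, while the ratio rule selects the slower, higher-quality one. This mirrors the "additive-reward collapse" the abstract describes, though the actual score used in the paper may differ.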

Comments:	12 pages, 1 figure, 14 tables
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2605.14241 [cs.LG]
	(or arXiv:2605.14241v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.14241 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kexin Chu
[v1] Thu, 14 May 2026 01:14:13 UTC (51 KB)