Abstract: Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search load benchmark, LQM-ContextRoute improves F1 by +2.18 pp over SW-UCB while staying on the latency-quality frontier. In a high-heterogeneity StrategyQA setting, LQM-ContextRoute avoids additive-reward collapse and improves accuracy by up to +18 pp over SW-UCB; on heterogeneous retriever pools, it improves NDCG by +2.91 to +3.22 pp over SW-UCB. These results show that same-function tool routing benefits from treating latency as service capacity, especially when runtime pressure and provider-quality heterogeneity coexist.
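The contrast between an additive reward and a capacity-aware score can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not give the scoring formula, so the sketch assumes that "expected answer quality per service cycle" can be approximated as estimated quality divided by estimated service time. The `Provider` class, the function names, the `latency_weight` parameter, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    quality: float    # estimated answer quality in [0, 1] (made-up value)
    latency_s: float  # estimated service time per request in seconds (made-up value)


def additive_score(p: Provider, latency_weight: float = 1.0) -> float:
    # Additive reward: low latency can offset poor answer quality.
    return p.quality - latency_weight * p.latency_s


def quality_per_cycle(p: Provider) -> float:
    # Assumed reading of latency-quality matching: quality delivered
    # per unit of service time, treating latency as capacity.
    return p.quality / p.latency_s


providers = [
    Provider("fast_but_weak", quality=0.10, latency_s=0.2),
    Provider("slow_but_strong", quality=0.90, latency_s=1.5),
]

best_additive = max(providers, key=additive_score)
best_matched = max(providers, key=quality_per_cycle)

print("additive reward picks:", best_additive.name)
print("quality per cycle picks:", best_matched.name)
```

With these illustrative numbers, the additive score favors the fast, low-quality provider because its latency advantage outweighs its quality deficit, while the ratio score favors the slower, higher-quality one. A full router would also need online estimation of both quantities per query, which this sketch omits.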
| Comments: | 12 pages, 1 figure, 14 tables |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.14241 [cs.LG] |
| | (or arXiv:2605.14241v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.14241 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Kexin Chu
[v1] Thu, 14 May 2026 01:14:13 UTC (51 KB)
