Abstract: Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, these judgments are noisy, order-sensitive, and sometimes intransitive, so the assumptions behind sorting do not hold in this setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We therefore reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
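The randomized-direction idea can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the logistic judge, the additive first-slot bonus `position_bias`, and every function name below (`simulated_llm_judge`, `fixed_order_oracle`, `randomized_direction_oracle`, `win_rate`) are hypothetical stand-ins for a real LLM call. Under this assumed model, always showing item i first inflates i's win rate. Flipping a fair coin for the presentation order, still with one call per pair, makes the first-slot bonus cancel in expectation: a tie comes out 50/50, and a genuinely better item still wins more often than not.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulated_llm_judge(score_first: float, score_second: float,
                        position_bias: float, rng: random.Random) -> bool:
    """Hypothetical stand-in for ONE LLM call: True if the item shown first
    is preferred. The first slot gets an additive logit bonus
    `position_bias` (an assumed model of systematic position bias)."""
    p_first = sigmoid(score_first - score_second + position_bias)
    return rng.random() < p_first


def fixed_order_oracle(score_i: float, score_j: float,
                       position_bias: float, rng: random.Random) -> bool:
    """Always shows i first: one call per pair, but the bias is systematic."""
    return simulated_llm_judge(score_i, score_j, position_bias, rng)


def randomized_direction_oracle(score_i: float, score_j: float,
                                position_bias: float,
                                rng: random.Random) -> bool:
    """One call per pair: a fair coin decides which item is shown first,
    and the answer is mapped back to 'does i beat j?'."""
    if rng.random() < 0.5:
        # i shown first: i wins iff the judge prefers the first slot.
        return simulated_llm_judge(score_i, score_j, position_bias, rng)
    # j shown first: i wins iff the judge does NOT prefer the first slot.
    return not simulated_llm_judge(score_j, score_i, position_bias, rng)


def win_rate(oracle, score_i: float, score_j: float, position_bias: float,
             trials: int, seed: int = 0) -> float:
    """Empirical fraction of trials in which the oracle says i beats j."""
    rng = random.Random(seed)
    wins = sum(oracle(score_i, score_j, position_bias, rng)
               for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    # Two items of equal (hypothetical) quality, strong first-slot bias.
    print("fixed order:", win_rate(fixed_order_oracle, 0.0, 0.0, 1.0, 20000))
    print("randomized: ", win_rate(randomized_direction_oracle, 0.0, 0.0, 1.0, 20000))
```

With equal latent scores and `position_bias=1.0`, the fixed-order win rate lands near sigmoid(1) ≈ 0.73, whereas the randomized-direction rate lands near 0.5. In this toy model the randomized oracle's win probability for unequal items, 0.5·(σ(Δ+b) + σ(Δ−b)), is pulled toward 0.5 relative to σ(Δ): the bias turns into extra label noise rather than vanishing, which is why such labels still call for a noise-robust aggregator.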
| Comments: | 13 pages, 7 figures. Preprint |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.14236 [cs.LG] |
| | (or arXiv:2605.14236v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.14236 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Jeremías Figueiredo Paschmann
[v1] Thu, 14 May 2026 01:03:53 UTC (1,799 KB)
