Abstract:The 1-identification problem is a fundamental pure-exploration problem in multi-armed bandits. An agent aims to determine whether there exists an arm whose mean reward exceeds a known threshold $\mu_0$, or to output \textsf{None} otherwise. The agent must guarantee correctness with probability at least $1-\delta$, while minimizing the expected number of arm pulls $\mathbb{E}[\tau]$. We study the 1-identification problem and make two main contributions. First, for instances with at least one qualified arm, we derive a new lower bound on $\mathbb{E}[\tau]$ via a novel optimization formulation. Second, we propose a new algorithm and establish upper bounds that match the lower bounds up to polynomial logarithmic factors uniformly over all instances. Our result complements the analysis of $\mathbb{E}\tau$ when there are multiple qualified arms, which is an open problem in the literature.
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2601.15620 [cs.LG] |
| (or arXiv:2601.15620v2 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2601.15620 arXiv-issued DOI via DataCite |
Submission history
From: Zitian Li [view email]
[v1]
Thu, 22 Jan 2026 03:50:31 UTC (113 KB)
[v2]
Thu, 14 May 2026 06:31:00 UTC (63 KB)
