Embedding Perturbation may Better Reflect Intermediate-Step Uncertainty in LLM Reasoning

Abstract: Large Language Models (LLMs) have achieved significant breakthroughs across diverse domains; however, they can still produce unreliable or misleading outputs. For responsible LLM application, Uncertainty Quantification (UQ) techniques are used to estimate a model's uncertainty about its outputs, indicating the likelihood that those outputs are problematic. For LLM reasoning tasks, it is essential to estimate uncertainty not only for the final answer but also for the intermediate reasoning steps, as this enables more fine-grained and targeted interventions. In this study, we explore which UQ metrics better reflect an LLM's "intermediate uncertainty" during reasoning. Our study reveals that an LLM's incorrect reasoning steps tend to contain tokens that are highly sensitive to perturbations of the preceding token embeddings, indicating the model's uncertainty among multiple competing continuations. Uncertain (and possibly incorrect) intermediate steps can therefore be identified in practice by using this sensitivity score as guidance. In our experiments, we show that such perturbation-based metrics achieve stronger uncertainty quantification performance than baselines including probability-based, sampling-based, and Bayesian methods. These metrics are also simple and efficient.
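The abstract does not give the exact formula for the sensitivity score, so the following is only a minimal toy sketch of the general idea, under stated assumptions: Gaussian noise is added to the preceding embedding, and sensitivity is measured as the mean KL divergence between the original and perturbed next-token distributions. The linear-plus-softmax "model", the function names, and the parameters `sigma` and `n_samples` are all hypothetical stand-ins, not the authors' method or any library's API.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(embedding, weights):
    """Toy stand-in for an LLM: one linear layer followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sensitivity_score(embedding, weights, sigma=0.1, n_samples=64, seed=0):
    """Mean KL shift of the next-token distribution under Gaussian noise.

    Assumption: this particular score is illustrative only; the paper may
    define its perturbation and its metric differently.
    """
    rng = random.Random(seed)
    base = next_token_probs(embedding, weights)
    total = 0.0
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in embedding]
        total += kl_divergence(base, next_token_probs(noisy, weights))
    return total / n_samples


# Hypothetical 3-token vocabulary over a 2-dimensional embedding space.
WEIGHTS = [[10.0, 0.0], [0.0, 10.0], [-10.0, -10.0]]

# One continuation clearly dominates: logits are [10, 0, -10].
confident = sensitivity_score([1.0, 0.0], WEIGHTS)

# Two continuations are tied: logits are [5, 5, -10].
uncertain = sensitivity_score([0.5, 0.5], WEIGHTS)
```

In this toy setting the tied case yields a much larger score than the dominant case, which mirrors the abstract's claim that tokens with several competing continuations react strongly to small embedding perturbations. With a real model, the same loop would perturb the input embeddings of the preceding tokens and compare the resulting output distributions.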

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2602.02427 [cs.LG]
	(or arXiv:2602.02427v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2602.02427 arXiv-issued DOI via DataCite

Submission history

From: Qihao Wen
[v1] Mon, 2 Feb 2026 18:27:26 UTC (1,172 KB)
[v2] Wed, 13 May 2026 18:26:04 UTC (898 KB)