Abstract: Accurate uncertainty quantification in large language models (LLMs) is essential for reliable confidence estimation, yet fine-tuned LLMs often become overconfident under limited adaptation data. Existing uncertainty methods for PEFT-based LLMs are largely post hoc, estimating uncertainty after fine-tuning rather than improving how adapters specialize to task-specific input-output relationships. We propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which calibrates uncertainty over the functional space induced by prompt-dependent mixtures of LoRA experts. UQ4CT implements this perspective through a mixture-of-experts fine-tuning framework, where a calibration loss aligns functional-level confidence with predictive correctness during training. Across four multiple-choice benchmarks and two open-ended generative QA tasks, UQ4CT reduces Expected Calibration Error (ECE) by over $25\%$ while preserving high accuracy. Under distribution shift, UQ4CT maintains superior calibration and competitive accuracy, demonstrating improved reliability and generalization for fine-tuned LLMs.
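For readers unfamiliar with the metric reported above, the following is a minimal sketch of the standard equal-width-bin Expected Calibration Error. It is not taken from the paper: the function name `expected_calibration_error`, the default of 10 bins, and the half-open binning convention are assumptions made for illustration, and the authors' evaluation code may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative sketch, not the paper's code).

    confidences: predicted probability of the chosen answer, each in [0, 1].
    correct:     1 (or True) if the prediction was right, else 0 (or False).
    Returns the sum over bins of (bin share) * |accuracy - mean confidence|.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assumed convention: bins are [k/n_bins, (k+1)/n_bins); 1.0 joins the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(hit)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Four predictions at 90% confidence, three of them correct:
# accuracy is 0.75, so the model is overconfident by about 0.15.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A lower value means stated confidence tracks observed accuracy more closely; an overconfident model shows mean confidence above accuracy in its high-confidence bins.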
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2410.06431 [cs.LG] |
| | (or arXiv:2410.06431v5 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2410.06431 (arXiv-issued DOI via DataCite) |
Submission history
From: Ruijia Niu
[v1] Wed, 9 Oct 2024 00:09:15 UTC (179 KB)
[v2] Fri, 31 Jan 2025 19:12:30 UTC (180 KB)
[v3] Sun, 25 May 2025 08:11:40 UTC (280 KB)
[v4] Mon, 29 Sep 2025 17:32:40 UTC (249 KB)
[v5] Wed, 13 May 2026 22:22:18 UTC (250 KB)
