FragileFlow: Spectral Control of Correct-but-Fragile Predictions for Foundation Model Robustness

Abstract: Robust adaptation of LLMs and VLMs is often evaluated by average accuracy or average consistency under perturbations. However, these averages can hide a structured failure mode: a prediction may remain correct while probability mass already flows from particular true classes toward systematic wrong competitors near the decision boundary. In this paper, we formalize this phenomenon as margin-aware error flow and introduce FragileFlow, a plug-in regularizer that uses a calibrated margin buffer to identify correct-but-fragile predictions and organize their off-class probability mass into a class-wise vulnerable-risk matrix. Theoretically, we provide the first PAC-Bayes upper bound for this margin-aware error-flow object, showing how empirical spectral control yields a conservative route to deterministic worst-class robustness under a stability condition. Experiments on multiple-choice LLM benchmarks and few-shot CLIP adaptation show that FragileFlow consistently improves the proposed theory-facing risk measures over matched baselines, yields perturbed worst-class accuracy gains in most settings, and preserves clean accuracy across comparisons.
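
The abstract describes the vulnerable-risk matrix only in words, so the following is a minimal illustrative sketch rather than the paper's definition. It assumes that the margin is the gap between the true-class probability and the strongest competitor, that the buffer is a fixed scalar (the paper calibrates it; how is not stated here), that each row is normalized by the number of samples of that true class, and that "spectral control" refers to the largest singular value of the matrix. The names `vulnerable_risk_matrix` and `spectral_norm` are hypothetical.

```python
import math
from typing import List, Sequence


def vulnerable_risk_matrix(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    margin_buffer: float,
) -> List[List[float]]:
    """Entry [y][k]: off-class mass on class k from correct-but-fragile
    samples of true class y, averaged over all samples of class y.
    The normalization is an assumption, not taken from the paper."""
    risk = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for p, y in zip(probs, labels):
        counts[y] += 1
        strongest_competitor = max(p[k] for k in range(num_classes) if k != y)
        margin = p[y] - strongest_competitor
        # Correct (positive margin) but inside the buffer: fragile.
        if 0.0 < margin < margin_buffer:
            for k in range(num_classes):
                if k != y:
                    risk[y][k] += p[k]
    for y in range(num_classes):
        if counts[y] > 0:
            risk[y] = [value / counts[y] for value in risk[y]]
    return risk


def spectral_norm(matrix: Sequence[Sequence[float]], iters: int = 200) -> float:
    """Largest singular value, estimated by power iteration on M^T M."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    for _ in range(iters):
        mv = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * mv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in mv))


# Toy data: four 3-class predictions, all of them correct.
example_probs = [
    [0.55, 0.40, 0.05],  # class 0, narrow margin: fragile
    [0.90, 0.05, 0.05],  # class 0, wide margin: not fragile
    [0.10, 0.50, 0.40],  # class 1, narrow margin: fragile
    [0.10, 0.20, 0.70],  # class 2, wide margin: not fragile
]
example_labels = [0, 0, 1, 2]
example_risk = vulnerable_risk_matrix(example_probs, example_labels, 3, 0.2)
example_norm = spectral_norm(example_risk)
```

In this toy example average accuracy is 100%, yet the matrix still records mass flowing from class 0 toward class 1 and from class 1 toward class 2, which is the kind of structure the abstract says averages can hide. A regularizer in this spirit would penalize `example_norm` during adaptation; the exact penalty used by FragileFlow is not given in the abstract.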

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2605.08896 [cs.CL]
	(or arXiv:2605.08896v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.08896 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zhuoyun Li
[v1] Sat, 9 May 2026 11:25:07 UTC (575 KB)