Abstract: Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detection (ADD), moving beyond \textit{black-box} classifiers by making their predictions transparent through reasoning traces. However, such reasoning may fail to support the model's prediction, reflecting poor coherence, or, worse, may rationalize an incorrect prediction with a plausible but misleading explanation. Moreover, the behavior of ALM reasoning under adversarial attacks remains underexplored, raising questions about the practical reliability of these explanation capabilities. To address this gap, this study introduces \textbf{SARA} (\textbf{S}hift \textbf{A}nalysis of \textbf{R}easoning in \textbf{A}udio), a diagnostic framework that evaluates ALM reasoning along three dimensions: acoustic perception, reasoning-verdict coherence, and dissonance. We test five open-source ALMs against both acoustic and linguistic adversarial attacks. We show that acoustic attacks significantly degrade reasoning-verdict coherence (average decrease of 14.20\%), frequently inducing internal logical conflicts. Conversely, linguistic attacks achieve higher attack success rates while leaving reasoning coherence intact. We further demonstrate that the textual coherence of generated reasoning traces serves as a latent indicator of adversarial inputs, enabling effective detection of perturbed audio (F1 of 0.78) \textit{without accessing the raw acoustic signal}. These findings suggest that reasoning traces provide diagnostic utility that persists even when final classification outputs are compromised.
| Comments: | Preprint for ACL 2026 submission |
| Subjects: | Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS) |
| MSC classes: | 68T50 |
| ACM classes: | I.2 |
| Cite as: | arXiv:2601.03615 [cs.CL] |
| | (or arXiv:2601.03615v2 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2601.03615 (arXiv-issued DOI via DataCite) |
Submission history
From: Binh Nguyen Quoc
[v1] Wed, 7 Jan 2026 05:46:45 UTC (1,398 KB)
[v2] Sun, 31 May 2026 06:28:59 UTC (256 KB)
