Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities


Abstract: Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis, a critical area for data security that is also connected to LLMs' generalization abilities, remains underexplored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on ciphertexts produced by a range of cryptographic algorithms. We introduce a benchmark dataset of diverse plaintexts, spanning multiple domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings along with chain-of-thought prompting, we assess LLMs' decryption success rate and discuss their comprehension abilities. Our findings reveal key insights into LLMs' strengths and limitations in side-channel scenarios and raise concerns about their susceptibility to under-generalization-related attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.

Comments:	EMNLP'25 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2505.24621 [cs.CL]
	(or arXiv:2505.24621v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.24621 arXiv-issued DOI via DataCite

Submission history

From: Utsav Maskey
[v1] Fri, 30 May 2025 14:12:07 UTC (1,581 KB)
[v2] Wed, 17 Sep 2025 15:53:19 UTC (1,302 KB)
[v3] Sun, 31 May 2026 10:26:22 UTC (1,296 KB)