Abstract: Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis, which is critical to data security and closely tied to LLMs' generalization abilities, remains underexplored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on ciphertexts produced by a range of cryptographic algorithms. We introduce a benchmark dataset of diverse plaintexts, spanning multiple domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings along with chain-of-thought prompting, we assess LLMs' decryption success rate and discuss their comprehension abilities. Our findings reveal key insights into LLMs' strengths and limitations in side-channel scenarios and raise concerns about their susceptibility to under-generalization-related attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
| Comments: | EMNLP'25 Findings |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2505.24621 [cs.CL] (or arXiv:2505.24621v3 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2505.24621 (arXiv-issued DOI via DataCite) |
Submission history
From: Utsav Maskey
[v1] Fri, 30 May 2025 14:12:07 UTC (1,581 KB)
[v2] Wed, 17 Sep 2025 15:53:19 UTC (1,302 KB)
[v3] Sun, 31 May 2026 10:26:22 UTC (1,296 KB)
