A Transformer-Based Deep Learning Approach for Cache Side-Channel Attack Detection on AES
Abstract
1. Introduction
- Dataset Construction: To address data scarcity in side-channel attack (SCA) research, we constructed a dataset of 10,000 timing traces. It covers the Flush+Reload and Prime+Probe attack vectors as well as benign system noise, giving a balanced foundation for model training.
- Model Optimization: We propose a Transformer-based detection framework optimized for time-series analysis. Its multi-head self-attention mechanism lets every time step attend to every other, which overcomes the local receptive field limitation of CNNs and allows the model to fuse multi-dimensional features and identify complex attack patterns that traditional models miss.
- Performance Validation: We compare the proposed Transformer against optimized CNN baselines. In mixed-attack scenarios our method achieves a classification accuracy of 94.00%, while the CNN baseline drops to 66.73%, an absolute margin of 27.27 percentage points. We also integrate the detection results into a visualization interface to support real-time security assessment by system administrators.
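To make the receptive-field argument concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is illustrative only and is not the model used in this work: the projections are the identity (Q = K = V = input), the names `softmax`, `self_attention`, and `toy_trace` are assumptions introduced here, and a trained Transformer would use learned projections and several heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of T feature vectors (one per time step). Identity
    projections are assumed, so queries, keys, and values all equal x.
    Returns the attended outputs and the T x T attention weights.
    """
    dim = len(x[0])
    scale = math.sqrt(dim)
    weights = []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in x]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, x)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical normalized trace: six time steps, two features per step.
toy_trace = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1],
             [0.15, 0.85], [0.95, 0.05], [0.1, 0.9]]
outputs, weights = self_attention(toy_trace)
```

Every row of `weights` assigns a non-zero weight to all six time steps, so the first step can draw on the last one directly. A convolution with a small kernel would need several stacked layers to connect the same two positions.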
2. Background
2.1. Cache Side-Channel Attack
2.2. Neural Network
2.2.1. Convolutional Neural Networks
2.2.2. Transformer
3. Methods
3.1. Overview of Flush+Reload
- Flush Phase: The attacker uses the clflush instruction to evict a specific monitored memory line (e.g., a T-table entry in AES) from the entire cache hierarchy.
- Wait Phase: The attacker waits for a predefined interval. During this window, the victim process executes encryption operations. If the victim accesses the monitored line, the CPU reloads it into the cache; otherwise, the line stays out of the cache and resides only in main memory.
- Reload Phase: The attacker re-accesses the memory line and measures the access latency using the rdtsc instruction. A short reload time implies a cache hit (victim access), while a long reload time indicates a cache miss (no victim access).
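The three phases above can be sketched as a small software simulation. This is not the attack code used in the experiments and it touches no real hardware: `SimulatedLine` stands in for the monitored cache line, and the latency constants and the threshold are assumed values chosen only to make the hit/miss gap visible.

```python
import random

HIT_CYCLES = 80     # assumed latency of a reload served from cache
MISS_CYCLES = 300   # assumed latency of a reload served from DRAM
THRESHOLD = 180     # assumed decision threshold between hit and miss


class SimulatedLine:
    """Software stand-in for one monitored cache line."""

    def __init__(self):
        self.cached = False

    def flush(self):
        # Models clflush: the line leaves the whole cache hierarchy.
        self.cached = False

    def access(self):
        # Models a victim load that brings the line back into the cache.
        self.cached = True

    def timed_reload(self, rng):
        # Models an rdtsc-timed reload; jitter imitates measurement noise.
        base = HIT_CYCLES if self.cached else MISS_CYCLES
        self.cached = True
        return base + rng.randint(-20, 20)


def flush_reload_round(line, victim_accesses, rng):
    """Run one Flush, Wait, Reload round and classify the result."""
    line.flush()                       # Flush phase
    if victim_accesses:                # Wait phase: victim may touch the line
        line.access()
    latency = line.timed_reload(rng)   # Reload phase
    return latency, latency < THRESHOLD


rng = random.Random(0)
line = SimulatedLine()
victim_pattern = [True, False, True, True, False]
trace = [flush_reload_round(line, v, rng) for v in victim_pattern]
detected = [hit for _, hit in trace]
```

In this toy setting `detected` reproduces `victim_pattern` exactly because the assumed hit and miss latencies never overlap. On real hardware the threshold has to be calibrated per machine.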
Algorithm 1: Flush+Reload Attack
Algorithm 2: Create Data
3.2. Overview of Prime+Probe
Specific Implementation Details
- Prime Phase: The attacker traverses a linked list data structure that occupies all lines in a specific cache set. This fills the set with the attacker’s data (represented as curr_head in our implementation).
- Wait Phase: The attacker yields the CPU to the victim process. In our experiment, the victim performs OpenSSL AES encryption (specifically, EVP_EncryptUpdate). If the encryption involves a table lookup that maps to the primed cache set, the attacker’s cache lines are evicted to lower-level memory (L3 or DRAM).
- Probe Phase: The attacker traverses the linked list again. We measure the time required for this traversal. If the victim accessed the set, the attacker encounters cache misses, resulting in a significantly longer traversal time. If the victim did not access the set, the data remains in the cache, resulting in a fast traversal.
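The Prime, Wait, and Probe phases can be illustrated with a simulated cache set. This is a sketch under stated assumptions rather than the implementation described above: the set uses a strict LRU policy, the associativity and the latencies are assumed values, and the eviction set is a plain Python list instead of a linked list in memory.

```python
from collections import OrderedDict

WAYS = 8            # assumed associativity of the targeted set
HIT_CYCLES = 40     # assumed latency of a cached access
MISS_CYCLES = 200   # assumed latency of an access that misses


class CacheSet:
    """One set of a simulated cache with LRU replacement."""

    def __init__(self, ways=WAYS):
        self.ways = ways
        self.lines = OrderedDict()   # oldest entry first

    def access(self, tag):
        """Access a line, update the LRU order, return its latency."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return HIT_CYCLES
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[tag] = True
        return MISS_CYCLES


def prime(cache_set, eviction_set):
    """Prime phase: fill the set with the attacker's lines."""
    for tag in eviction_set:
        cache_set.access(tag)


def probe(cache_set, eviction_set):
    """Probe phase: total traversal time over the eviction set."""
    return sum(cache_set.access(tag) for tag in eviction_set)


eviction_set = [f"attacker_{i}" for i in range(WAYS)]
cache_set = CacheSet()

prime(cache_set, eviction_set)
idle_time = probe(cache_set, eviction_set)    # victim did not run

prime(cache_set, eviction_set)
cache_set.access("victim_table_line")         # Wait phase: victim lookup
busy_time = probe(cache_set, eviction_set)    # victim touched the set
```

With these assumptions `busy_time` is larger than `idle_time`, which is the timing difference the detector consumes. Real caches use approximate replacement policies, so the gap is noisier in practice.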
Algorithm 3: Prime+Probe Attack
3.3. Deep Learning Model Application
4. Evaluation
4.1. Experimental Setup
- Scale and Balance: The dataset consists of 10,000 samples, balanced between positive samples (attack present) and negative samples (benign system noise). This scale was determined empirically to satisfy the convergence requirements of the Transformer model.
- Attack Vectors: The positive samples include traces from both Flush+Reload and Prime+Probe attacks. For Flush+Reload, traces were captured by monitoring the access latency of specific cache lines corresponding to the AES T-tables. For Prime+Probe, traces were collected by measuring the access time of an eviction set after the victim’s execution.
- Labeling Strategy:
  - For Attack Detection, samples are labeled with binary targets: 0 for benign noise and 1 for attack activity.
  - For Key Recovery, labels were generated using the Differential Deep Learning Attack methodology. Based on the AES inverse S-box property, we calculated labels for the 14th byte of the key from the recorded ciphertext bytes.
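The two labeling rules can be sketched as follows. The binary rule follows the text directly. The key-recovery rule is an assumption: it uses the commonly seen last-round form, inverse S-box of ciphertext byte XOR key-byte guess, which may differ from the exact equation used in this work. The helper names `gf_mul`, `build_sbox`, `detection_label`, and `key_recovery_label` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Derive the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(
            y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            # XOR in the inverse rotated left by 1, 2, 3, and 4 bits.
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value


def detection_label(is_attack):
    """Binary target: 0 for benign noise, 1 for attack activity."""
    return 1 if is_attack else 0


def key_recovery_label(ciphertext_byte, key_byte_guess):
    """Assumed label form: inverse S-box of (ciphertext XOR key guess)."""
    return INV_SBOX[ciphertext_byte ^ key_byte_guess]
```

For example, `key_recovery_label(0x63 ^ 0x2B, 0x2B)` looks up `INV_SBOX[0x63]`, the byte whose S-box output is 0x63.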
4.2. Flush+Reload
4.3. Prime+Probe
4.4. Classification Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest



| Model | Epoch | Time | Loss | Accuracy |
|---|---|---|---|---|
| CNN | 96 | 1 s 3 ms | 0.4029 | 0.9246 |
| CNN | 97 | 1 s 3 ms | 0.4337 | 0.9244 |
| CNN | 98 | 1 s 3 ms | 0.4315 | 0.9123 |
| CNN | 99 | 1 s 4 ms | 0.4135 | 0.9144 |
| CNN | 100 | 1 s 3 ms | 0.4080 | 0.9190 |

| Model | Flush+Reload: Attack | Flush+Reload: No Attack | Prime+Probe: Attack | Prime+Probe: No Attack | Combined: Attack + No Attack |
|---|---|---|---|---|---|
| CNN | 92.17% | 92.33% | 92.11% | 91.67% | 66.73% |
| Transformer | 97.27% | 97.37% | 97.64% | 97.66% | 94.00% |
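The per-scenario gap between the two models can be recomputed from the accuracy table with a short script. The numbers are copied from the table; the dictionary layout and the name `margins` are assumptions made for this sketch.

```python
# Accuracies in percent, copied from the classification results table.
accuracy = {
    "CNN": {
        "Flush+Reload: Attack": 92.17,
        "Flush+Reload: No Attack": 92.33,
        "Prime+Probe: Attack": 92.11,
        "Prime+Probe: No Attack": 91.67,
        "Combined": 66.73,
    },
    "Transformer": {
        "Flush+Reload: Attack": 97.27,
        "Flush+Reload: No Attack": 97.37,
        "Prime+Probe: Attack": 97.64,
        "Prime+Probe: No Attack": 97.66,
        "Combined": 94.00,
    },
}

# Absolute margin in percentage points, Transformer minus CNN.
margins = {
    scenario: round(accuracy["Transformer"][scenario] - cnn_value, 2)
    for scenario, cnn_value in accuracy["CNN"].items()
}

for scenario, margin in margins.items():
    print(f"{scenario}: +{margin:.2f} percentage points")
```

The single-attack scenarios differ by roughly five to six percentage points, while the combined scenario accounts for the 27.27-point gap.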
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, Q.; Yang, X.; Ren, S. A Transformer-Based Deep Learning Approach for Cache Side-Channel Attack Detection on AES. Electronics 2026, 15, 148. https://doi.org/10.3390/electronics15010148