Inferring TLB Configuration with Performance Tools
Abstract
1. Introduction
- A detailed process to reverse engineer the TLB configuration, including code, heatmap plots, gradient plots, and integrative analysis;
- A methodology that employs plotting the event gradient to provide additional support for determining the TLB configuration;
- An analysis of the methodology’s transferability to the IBM Power9 ISA.
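The event gradient mentioned in the contributions can be sketched as a simple finite difference of the counter value across consecutive 'ways'. The snippet below is a minimal illustration, not the authors' analysis code: the function names, the threshold, and the sample counts are hypothetical and chosen only to show the shape of the calculation.

```python
from typing import Optional, Sequence


def event_gradient(counts: Sequence[float]) -> list:
    """Forward difference of event counts between consecutive 'ways' values."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def last_flat_ways(ways: Sequence[int], counts: Sequence[float],
                   threshold: float) -> Optional[int]:
    """Return the 'ways' value after which the count first rises sharply.

    The gradient at position i describes the step from ways[i] to ways[i + 1],
    so the value returned is the last 'ways' before the jump.
    """
    for w, g in zip(ways, event_gradient(counts)):
        if g > threshold:
            return w
    return None


# Illustrative numbers only (not measured data): counts stay flat up to
# 'ways' 4 and then climb steeply.
ways = list(range(1, 9))
illustrative_counts = [10, 12, 11, 13, 950, 1900, 2800, 3700]

print(event_gradient(illustrative_counts))
print(last_flat_ways(ways, illustrative_counts, threshold=100))
```

The threshold is an assumption of this sketch; in practice it would have to be chosen from the noise level of the counter being sampled.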
1.1. Related Work
1.1.1. TLB General Operation
1.1.2. TLB Attacks
Translation Leak-Aside Buffer: Defeating Cache Side-Channel Protections with TLB Attacks
Not Lost in Translation: Implementing Side Channel Attacks Through the Translation Lookaside Buffer
TLB;DR: Enhancing TLB-Based Attacks with TLB Desynchronized Reverse Engineering
1.1.3. TLB Defense
Secure TLBs
Risky Translations: Securing TLBs Against Timing Side Channels
1.1.4. Related Works Limitations
2. Materials and Methods
2.1. Research Design
2.2. Materials and Instrumentation
2.2.1. Intel Xeon and i9
2.2.2. Power9 IBM ISA
2.3. Data Description
2.4. Procedure
Listing 1. Xeon DTLB C program.
Listing 2. Xeon ITLB C program.
Listing 3. Xeon STLB C program.
2.5. Data Collection Method
2.6. Limitations
3. Results
3.1. Data Presentation
3.2. Analysis Results
- Intel Xeon CPU

  DTLB: The gradient plot for the event dtlb_load_misses.stlb_hit aligns with our expectations. As illustrated in Figure 2b, the series for ‘sets’ 16 rises after ‘ways’ 4 with an increasing slope. The red, purple, and brown lines all increase at the same ‘ways’, but the red line is the first to show an event increase and has the highest rate of change. Although this plot could be used on its own to deduce the DTLB configuration, the technique needs support from Table 1, Table 2 and Table 3 to yield consistent results across other TLB levels or CPU ISAs.

  The heatmap results also align with our expectations and with [14,15]. In Figure 2a, the event dtlb_load_misses.stlb_hit shows a clear pattern: it starts with a low number of misses (dark blue), and as ‘ways’ increase, the number of misses rises (lighter colors). Without prior knowledge of the TLB configuration, one would examine the heatmap to identify the lowest ‘ways’ that exhibits an increase in misses. The lowest such ‘ways’ is 4, occurring at ‘sets’ 16, 32, and 64. Since we aim to identify the smallest combination, we select ‘sets’ 16 and ‘ways’ 4, which agrees with Table 1.

  ITLB: The gradient plot for itlb_load_misses.stlb_hit, shown in Figure 3b, reveals that ‘sets’ 16 (red line) increases more quickly than the other data series in the plot, specifically the purple and brown lines, which suggests a way to identify the configuration. Following the rule that the lowest ‘sets’-‘ways’ combination is the correct configuration, the red line indicates that ‘sets’ 16 and ‘ways’ 8 constitute the ITLB configuration. This contradicts Table 1, which lists 8 ‘sets’ and 8 ‘ways’; the heatmap shows the same discrepancy.

  The heatmap in Figure 3a reveals a discernible pattern, yet it does not align with the data presented in Table 1. We expected the lowest ‘ways’ showing an increase in events, namely 8, to indicate the correct configuration. The increase, however, is observed at ‘sets’ 16 rather than at the 8 ‘sets’ listed in Table 1. Manufacturer specifications have been shown to be inaccurate in the past [15], and this study concludes that the correct configuration is ‘sets’ 16 and ‘ways’ 8.

  STLB: The second-level TLB is more challenging, primarily because it does not use a linear mapping function to select ‘sets’, which makes probing addresses significantly more difficult. This study concludes that the STLB employs an XOR-7 mapping with 128 ‘sets’. To infer the STLB configuration, the gradient method must be combined with the size of the STLB, known to be 1536 entries. Figure 4b shows a change in the gradient only when the ‘sets’ count is 128 with XOR-7 mapping, and it offers no clear indicators or patterns that would help determine the number of ‘ways’. In this instance, the gradient plot is one part of the puzzle, providing the ‘sets’ count of 128. The other part is the size of 1536 entries. Dividing the size by the ‘sets’ gives 1536 / 128 = 12 ‘ways’. Thus, an XOR-7 mapping with 1536 entries suggests a 12-way configuration. The official data indicates 256 ‘sets’ and 6 ‘ways’, but the heatmap in Figure 4a shows activity only for the XOR-7 case.
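The STLB reasoning above combines a non-linear set mapping with the arithmetic ways = entries / sets. The sketch below illustrates both steps. It assumes that "XOR-7" means the set index is the XOR of the two lowest 7-bit groups of the virtual page number, which is how the term is commonly used in the TLB attack literature; the source text does not spell out the function, so treat this definition, the function names, and the 4 KiB page shift as assumptions of the example.

```python
PAGE_SHIFT = 12  # 4 KiB pages, matching the page size listed in Table 1


def xor7_set_index(vaddr: int) -> int:
    """Assumed XOR-7 mapping: XOR the low 7 bits of the VPN with the next 7."""
    vpn = vaddr >> PAGE_SHIFT
    return (vpn & 0x7F) ^ ((vpn >> 7) & 0x7F)


def congruent_pages(target_set: int, count: int) -> list:
    """Build up to 128 page addresses that the assumed mapping sends to one set."""
    if not 0 <= target_set < 128 or not 0 < count <= 128:
        raise ValueError("target_set must be in 0..127 and count in 1..128")
    pages = []
    for high in range(count):
        low = target_set ^ high          # so that low ^ high == target_set
        pages.append(((high << 7) | low) << PAGE_SHIFT)
    return pages


def ways_from_capacity(entries: int, sets: int) -> int:
    """Derive associativity from total entries and the inferred set count."""
    if sets <= 0 or entries % sets != 0:
        raise ValueError("entries must be a positive multiple of sets")
    return entries // sets


probe = congruent_pages(target_set=5, count=13)
print(all(xor7_set_index(p) == 5 for p in probe))
print(ways_from_capacity(1536, 128))
```

With 1536 entries and 128 sets the division yields 12, the associativity concluded in the text for the Xeon STLB.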
- Intel i9 CPU

  Load-only: The Intel Core i9 CPU does not include a single DTLB; it incorporates separate TLBs for load-only and store-only operations. This variation shows how ISAs can change even within the products of a single manufacturer, which makes it harder to standardize experiments. In this scenario, the gradient plot shows the expected pattern for ‘sets’ 16, as shown in Figure 5b. However, ‘sets’ 32 and 64 also emerge as potential candidates, which highlights a limitation of the gradient method in this context. It confirms that ‘ways’ 4 is the minimum at which event counts increase, but it does not unequivocally pinpoint ‘sets’ 16 as the corresponding configuration. Nevertheless, ‘sets’ 16 and ‘ways’ 4 is the lowest configuration.

  The heatmap results for the event dtlb_load_misses.stlb_hit align with our expectations, as shown in Figure 5a. The lowest ‘ways’ showing an increase is 4, at ‘sets’ 16.

  Store-only: The gradient plot reveals an increase after ‘ways’ 12 across all ‘sets’, as shown in Figure 6b. Two observations emerge: first, an increase at ‘ways’ 12, and second, a deceleration in the gradient at ‘ways’ 16. Here again, the gradient is one piece of the puzzle and the known size of the TLB is the other. Dividing the known size, 16, by the ‘sets’, 1, gives 16 ‘ways’. ‘Sets’ 1 is selected because all ‘sets’ values show similar behavior, so the lowest value is taken.

  Similarly, the heatmap results are consistent across all ‘sets’, as shown in Figure 6a. Because ‘ways’ is the same for all ‘sets’, the number of ‘sets’ appears to be 1. However, the exact number of ‘ways’ remains unclear from the heatmap; based solely on the data, we might infer it to be 11, 12, or 13. This is the first instance where the heatmap did not yield a specific ‘sets’-‘ways’ combination. Given this ambiguity, this study relies on the specifications listed in Table 2 as correct, and the ‘sets’-‘ways’ combination is determined to be 1-16.

  ITLB: The gradient plot indicates that ‘sets’ 16 (red), 32 (purple), and 64 (brown) exhibit an increase at around ‘ways’ 8, as shown in Figure 7b. The lowest ‘sets’-‘ways’ combination is 16-8. Another way to reach the same conclusion is to choose ‘sets’ 16, since it is the lowest among the possible candidates (16, 32, and 64), and use the size of the TLB from Table 2. Dividing the TLB size of 128 entries by ‘sets’ 16 gives ‘ways’ 8.

  The heatmap results for itlb_load_misses.stlb_hit show a discernible pattern, as illustrated in Figure 7a, which corresponds with the data outlined in Table 2.

  STLB: The gradient plot indicates that significant activity is only observed when the ‘sets’ count is 128 under XOR-7 mapping, as shown in Figure 8b. However, it provides no distinct indicators or patterns that would help determine the number of ‘ways’. The size is known from Table 2 to be 1024 entries, and it was also determined experimentally. With ‘sets’ 128 from Figure 8b, dividing the TLB size by the ‘sets’ gives 1024 / 128 = 8 ‘ways’.

  In the heatmap in Figure 8a, the event dtlb_load_misses.walk_active shows that the STLB employs an XOR-7 mapping with 128 ‘sets’. Although the precise size of the TLB was not provided, it was determined through testing with an increasing number of ‘sets’ and ‘ways’. While that heatmap result is not detailed in this study, the capacity of the STLB was found to be 1024 entries. Therefore, an XOR-7 mapping with 1024 entries implies an 8-way associative configuration.
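The heatmap rule used throughout this section (take the lowest 'ways' that shows an increase, then the lowest 'sets' at which it occurs) can be written down as a short selection function. This is a sketch under stated assumptions: a cell counts as "hot" when its event count reaches a fixed threshold, the heatmap is stored as a dictionary keyed by (sets, ways), and all numbers are made up to mimic the load-only pattern described in the text rather than taken from measurements.

```python
from typing import Dict, Optional, Tuple

Heatmap = Dict[Tuple[int, int], int]  # (sets, ways) -> event count


def smallest_hot_configuration(heatmap: Heatmap,
                               threshold: int) -> Optional[Tuple[int, int]]:
    """Pick the hot cell with the lowest 'ways', breaking ties by lowest 'sets'."""
    hot = [(ways, sets) for (sets, ways), count in heatmap.items()
           if count >= threshold]
    if not hot:
        return None
    ways, sets = min(hot)  # tuple ordering: ways first, then sets
    return sets, ways


def make_row(sets: int, first_hot_ways: int, max_ways: int = 8) -> Heatmap:
    """Hypothetical row: low counts until first_hot_ways, high counts after."""
    return {(sets, w): (5000 if w >= first_hot_ways else 20)
            for w in range(1, max_ways + 1)}


# Illustrative heatmap: 'sets' 16, 32 and 64 light up at 'ways' 4,
# while 'sets' 8 only lights up at 'ways' 8.
heatmap: Heatmap = {}
for sets, first_hot in [(8, 8), (16, 4), (32, 4), (64, 4)]:
    heatmap.update(make_row(sets, first_hot))

print(smallest_hot_configuration(heatmap, threshold=1000))
```

When every 'sets' value becomes hot at the same 'ways', as described for the store-only TLB, the same rule returns the lowest 'sets' probed, which is why the text settles on 'sets' 1 there.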
- IBM Power9

  DERAT, IRAT, TLB: This study aimed to assess whether the methodology applied to Intel CPUs could be transferred to another ISA. The objective was to execute the C programs and shell scripts designed to collect and analyze data. The first challenge was identifying the HPCs equivalent to those on Intel that provide insights into the TLB configuration. To determine the configurations of DERAT and IRAT, we hypothesized that the data collected by pm_tlb_hit could serve as the equivalent of Intel’s dtlb_load_misses.stlb_hit. Similarly, for the TLB configuration, we assumed that pm_tablewalk_cyc could be the counterpart to Intel’s event dtlb_load_misses.walk_active.

  In Figure 9a, the pm_tlb_hit event was used to determine the size of the TLB. The capacity of the TLB appears to be a combination of ‘ways’ and ‘sets’ that totals 1024. Given that the unit under test operates in Radix mode, it can be inferred that the actual size of the TLB is half of this number, that is, 512 entries. However, from the heatmap alone, we can only deduce that the total capacity is 1024, as indicated by the red squares.

  The gradient plot, Figure 9b, does not offer additional insights. It confirms the same combinations that total 1024, which allows us to ascertain the capacity of the TLB, but it does not provide specific details about the configuration of ‘ways’ and ‘sets’.

  In Figure 10a, pm_tablewalk_cyc shows a pattern at ‘sets’ 512 and 1024. However, the TLB configuration of ‘sets’ 512 and ‘ways’ 4 cannot be deduced from this heatmap alone. The gradient plot shown in Figure 10b does not provide any further insight into the correct configuration.

  Even though this study did not successfully reverse engineer the Power9 TLB configuration, the patterns observed in the heatmaps suggest that further modifications to the code, along with the exploration of additional events, could yield improved results.
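The Power9 result illustrates why a capacity figure alone does not fix the configuration: many 'sets'-'ways' pairs multiply to the same total. The short sketch below enumerates the power-of-two candidates for a given capacity. The restriction to power-of-two 'sets' and the function name are assumptions of this example, not something stated in the study.

```python
def candidate_configurations(entries: int) -> list:
    """List (sets, ways) pairs with power-of-two sets whose product is entries."""
    if entries <= 0:
        raise ValueError("entries must be positive")
    candidates = []
    sets = 1
    while sets <= entries:
        if entries % sets == 0:
            candidates.append((sets, entries // sets))
        sets *= 2
    return candidates


# Capacity observed in the heatmap, and the halved value inferred for Radix mode.
for capacity in (1024, 512):
    pairs = candidate_configurations(capacity)
    print(capacity, len(pairs), pairs)
```

Each pair printed is equally compatible with a capacity-only observation, so an additional signal (a distinct knee in the gradient, or a different event) is needed to choose among them.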
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Disclaimer
References
- Disselkoen, C.; Kohlbrenner, D.; Porter, L.; Tullsen, D. Prime+Abort: A Timer-Free High-Precision L3 Cache Attack Using Intel TSX. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2017.
- Kocher, P.; Genkin, D.; Gruss, D.; Haas, W.; Hamburg, M.; Lipp, M.; Mangard, S.; Prescher, T.; Schwarz, M.; Yarom, Y. Spectre attacks: Exploiting speculative execution. Commun. ACM 2020, 63, 93–101.
- Lipp, M.; Schwarz, M.; Gruss, D.; Prescher, T.; Haas, W.; Horn, J.; Mangard, S.; Kocher, P.; Genkin, D.; Yarom, Y.; et al. Meltdown: Reading kernel memory from user space. Commun. ACM 2020, 63, 46–56.
- Liu, F.; Yarom, Y.; Ge, Q.; Heiser, G.; Lee, R.B. Last-level cache side-channel attacks are practical. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 17–21 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 605–622.
- Yarom, Y.; Falkner, K. Flush+Reload: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In Proceedings of the USENIX Security Symposium, San Diego, CA, USA, 20–22 August 2014; pp. 719–732.
- Percival, C. Cache missing for fun and profit. In Proceedings of the Free BSD Presentations and Papers (2005), Ottawa, ON, Canada, 13–14 May 2005.
- Osvik, D.A.; Shamir, A.; Tromer, E. Cache Attacks and Countermeasures: The Case of AES. In Topics in Cryptology-CT-RSA 2006; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3860, pp. 1–20.
- Gullasch, D.; Bangerter, E.; Krenn, S. Cache Games–Bringing Access-Based Cache Attacks on AES to Practice. In Proceedings of the Security and Privacy (SP), 2011 IEEE Symposium On, Oakland, CA, USA, 22–25 May 2011; pp. 490–505.
- Braun, B.A.; Jana, S.; Boneh, D. Robust and efficient elimination of cache and timing side channels. arXiv 2015, arXiv:1506.00189.
- Gruss, D.; Schuster, F.; Ohrimenko, O.; Haller, I.; Lettner, J.; Costa, M. Strong and efficient cache side-channel protection using hardware transactional memory. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2017.
- Liu, F.; Ge, Q.; Yarom, Y.; Mckeen, F.; Rozas, C.; Heiser, G.; Lee, R.B. Catalyst: Defeating last-level cache side channel attacks in cloud computing. In Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), Barcelona, Spain, 12–16 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 406–418.
- Sprabery, R.; Evchenko, K.; Raj, A.; Bobba, R.B.; Mohan, S.; Campbell, R.H. A novel scheduling framework leveraging hardware cache partitioning for cache-side-channel elimination in clouds. arXiv 2017, arXiv:1708.09538.
- Demme, J.; Maycock, M.; Schmitz, J.; Tang, A.; Waksman, A.; Sethumadhavan, S.; Stolfo, S. On the Feasibility of Online Malware Detection with Performance Counters. ACM SIGARCH Comput. Archit. News 2013, 41, 559–570.
- Gras, B.; Razavi, K.; Bos, H.; Giuffrida, C. Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks. In Proceedings of the USENIX Security Symposium, USENIX, Baltimore, MD, USA, 15–17 August 2018; pp. 955–972.
- Holmes, N. Not Lost in Translation: Implementing Side Channel Attacks Through the Translation Lookaside Buffer. Master’s Thesis, University of Warwick, Department of Computer Science, Coventry, UK, 2023.
- Deng, S.; Xiong, W.; Szefer, J. Secure TLBs. In Proceedings of the 46th International Symposium on Computer Architecture. Association for Computing Machinery, Phoenix, AZ, USA, 22–26 June 2019; pp. 346–359.
- Tatar, A.; Trujillo, D.; Giuffrida, C.; Bos, H. TLB;DR: Enhancing TLB-based Attacks with TLB Desynchronized Reverse Engineering. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–20 May 2020; pp. 1273–1290.
- Hennessy, J.L.; Patterson, D.A. Computer Architecture, Sixth Edition: A Quantitative Approach, 6th ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2017.
- Stallings, W. Operating Systems: Internals and Design Principles; Pearson: New York, NY, USA, 2014.
- Stolz, F.; Thoma, J.P.; Güneysu, T.; Sasdrich, P. Risky Translations: Securing TLBs against Timing Side Channels. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Bochum, Germany, 2024.
- Power ISA Version 3.1B. 2021. Available online: http://www.openpowerfoundation.org (accessed on 15 April 2023).
- Linux Kernel Organization. Perf-A Performance Counting Tool. 2024. Available online: https://perf.wiki.kernel.org/index.php/Main_Page (accessed on 11 January 2024).
- IBM Corporation. POWER9 Processor User Manual, OpenPOWER v2.0GA; IBM Corporation: Armonk, NY, USA, 2018. Available online: https://openpowerfoundation.org/?resource_lib=power9-processor-user-manual (accessed on 29 October 2024).
- IBM Corporation. POWER9 Performance Monitoring Unit User Guide, v1.2; IBM Corporation: Armonk, NY, USA, 2018; Available online: https://openpowerfoundation.org/?resource_lib=power9-performance-monitoring-unit-user-guide (accessed on 29 October 2024).
Table 1. Intel Xeon TLB configuration.
TLB | DTLB | ITLB | STLB |
---|---|---|---|
Page Size | 4 K | 4 K | 4 K |
Ways | 4 | 8 | 6 |
Sets | 16 | 8 | 256 |
Entries | 64 | 64 | 1536 |

Table 2. Intel i9 TLB configuration.
TLB | Load | Store | ITLB | STLB |
---|---|---|---|---|
Page Size | 4 K | 4 K | 4 K | 4 K |
Ways | 4 | 16 | 8 | 8 |
Sets | 16 | 1 | 16 | 128 |
Entries | 64 | 16 | 128 | 1024 |

Table 3. IBM Power9 TLB configuration.
TLB | DERAT | IRAT | TLB |
---|---|---|---|
Page Size | 16 K | 16 K | 16 K |
Ways | 64 | 64 | 4 |
Sets | 1 | 1 | 128 |
Entries | 64 | 64 | 512 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Agredo, C.; Langehaug, T.J.; Graham, S.R. Inferring TLB Configuration with Performance Tools. J. Cybersecur. Priv. 2024, 4, 951-971. https://doi.org/10.3390/jcp4040044