Article

Inferring TLB Configuration with Performance Tools

Air Force Institute of Technology, 2950 Hobson Way, Dayton, OH 45433, USA
*
Author to whom correspondence should be addressed.
J. Cybersecur. Priv. 2024, 4(4), 951-971; https://doi.org/10.3390/jcp4040044
Submission received: 19 September 2024 / Revised: 16 October 2024 / Accepted: 21 October 2024 / Published: 12 November 2024

Abstract

Modern computing systems are primarily designed for maximum performance, which inadvertently introduces vulnerabilities at the micro-architecture level. While cache side-channel analysis has received significant attention, other Central Processing Unit (CPU) components like the Translation Lookaside Buffer (TLB) can also be exploited to leak sensitive information. This paper focuses on the TLB, a micro-architecture component that is vulnerable to side-channel attacks. Despite the coarse granularity at the page level, advancements in tools and techniques have made TLB information leakage feasible. The primary goal of this study is not to demonstrate the potential for information leakage from the TLB but to establish a comprehensive framework to reverse engineer the TLB configuration, a critical aspect of side-channel analysis attacks that have previously succeeded in extracting sensitive data. The methodology involves detailed reverse engineering efforts on Intel CPUs, complemented by analytical tools to support TLB reverse engineering. This study successfully reverse-engineered the TLB configurations for Intel CPUs and introduced visual tools for further analysis. These results can be used to explore TLB vulnerabilities in greater depth. However, when attempting to apply the same methodology to the IBM Power9, it became clear that the methodology was not transferable, as mapping functions and performance counters vary across different vendors.

1. Introduction

Traditional side-channel attacks have primarily targeted CPU caches [1,2,3,4,5,6,7,8]. In this study, the term ‘cache’ refers to all caches in the memory architecture except the TLB; the TLB will always be referred to by its specific name. These attacks exploit information leakage through shared CPU data and instruction caches. Many defenses have been developed to protect systems from these attacks [9,10,11,12,13]; however, other hardware-shared resources, such as TLBs, also pose a security risk [14].
In general, successful side-channel attacks require concurrent execution of two processes on the same CPU core. For an attacker to facilitate this concurrent execution, a thorough understanding of the Instruction Set Architecture (ISA) is essential. The ISA defines the hardware-software interface within the CPU, including components like the TLB. Although vendors do not open-source the hardware implementation of TLBs, and each vendor has its own unique design, a common characteristic across different vendors is the presence of a level 1 data TLB, an instruction TLB, and a level 2 shared TLB [14]. In this paper, they will be referred to as DTLB, ITLB, and STLB, respectively. Another common characteristic is that data TLBs are typically shared resources within CPU cores. This shared nature makes them vulnerable to exploitation in timing side-channel attacks.
Studies focused on the TLB for security purposes [14,15] have exclusively used Intel CPUs. This study presents a detailed methodology that works for Intel CPUs, includes gradient plots for additional support, and confirms that the methodology cannot be transferred to other ISAs due to inconsistencies in performance counters and mapping functions.
The initial step in these attacks is to determine the configuration of the set-associative, hardware-managed TLB, which is characterized by ‘sets’ and ‘ways’. The TLB is partitioned into ‘s’ sets, and each set contains ‘w’ ways. This ‘sets’–‘ways’ configuration defines the cache-like structure of the TLB.
The contributions of the paper are as follows:
  • A detailed process to reverse engineer the TLB configuration, including code, heatmap plots, gradient plots, and integrative analysis;
  • A methodology that employs plotting the event gradient to provide additional support for determining the TLB configuration;
  • An analysis of the methodology’s transferability to the IBM Power9 ISA.
The paper is organized as follows. Section 1 is dedicated to reviewing the existing literature. Section 2 describes the methodology, including research design, instrumentation used, procedures, data collection methods, and limitations. Section 3 presents the results of the system under test, including plots and their interpretation. Section 4 concludes the paper and explores future directions for TLB research.

1.1. Related Work

This section starts with an overview of TLB operation and its interactions with other systems such as the Memory Management Unit (MMU), page tables, and main memory. It then presents key research papers that emphasize the utilization of TLB side-channel analysis. The significant papers related to TLB side-channel attacks and defenses include “Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks” [14], “Not Lost in Translation: Implementing Side Channel Attacks Through the Translation Lookaside Buffer” [15], “Secure TLBs” [16], and “Enhancing TLB-based Attacks with TLB Desynchronized Reverse Engineering” [17].

1.1.1. TLB General Operation

In computing systems, CPUs must efficiently process data and instructions. The challenge lies in accessing the main memory and transferring the data to the CPU. To improve memory access performance, CPUs utilize a hierarchy of caches, with lower levels consisting of small, fast caches, and higher levels being larger and slower. To further enhance memory access, CPUs employ virtual addresses. Virtual addresses represent locations in an abstract address space, enabling processes to operate as though each has its own isolated memory space, despite sharing physical memory with other processes. Virtual Address (VA) to Physical Address (PA) translations are managed by the MMU. The TLB is a module internal to the MMU that caches address translations. These mappings from VAs to PAs are usually based on the most recently used address pairs. Accessing address translations from TLBs is more efficient than retrieving them from the main memory’s page tables, as it avoids the time-consuming process of a page table walk to find the corresponding PA [18,19].
Figure 1 shows the main modules involved in the address translation process. The data flow begins when a process requests access using a VA. This VA comprises a Virtual Page Number (VPN) and an offset. The VPN is then checked against the TLB. If the VPN matches an entry in the TLB (a ‘hit’), the VPN is translated to a Physical Page Number (PPN); the VPN also helps in identifying the specific ‘set’ and ‘way’ within the TLB where the PPN is located. This PPN corresponds to a page in physical memory, and the offset directs to the precise byte within that page.
In cases where the VPN does not match any entry in the TLB, a ‘miss’ occurs and the MMU initiates a page table walk to locate the corresponding PA. Once the correct translation is found, the PA is returned to the MMU. Next, the TLB is updated with this new information. The MMU then accesses the data at the PA and relays it back to the CPU. If the MMU’s page table walk fails to locate the requested page in Random Access Memory (RAM), a page fault occurs. In response to a page fault, the MMU signals the CPU, triggering an interrupt. This interrupt activates the Operating System (OS)’s page fault handler to resolve the fault. The OS may then fetch the needed information from the disk, or it might determine that the page is not allocated.
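The hit/miss flow described above can be modeled in a few lines of Python. This is a toy model with a hypothetical 16-set, 4-way configuration and a simplified FIFO replacement stand-in; real hardware replacement policies and mapping functions differ:

```python
# Minimal model of the TLB lookup flow (hypothetical 16-set, 4-way
# configuration; a real MMU implements this in hardware).
PAGE_SIZE = 4096
SETS, WAYS = 16, 4

tlb = {s: [] for s in range(SETS)}  # each set holds up to WAYS (vpn, ppn) pairs

def translate(va, page_table):
    vpn, offset = va // PAGE_SIZE, va % PAGE_SIZE
    s = vpn % SETS                      # simple linear set-mapping function
    for entry_vpn, ppn in tlb[s]:
        if entry_vpn == vpn:            # TLB hit: translation cached
            return ppn * PAGE_SIZE + offset, "hit"
    ppn = page_table[vpn]               # TLB miss: page-table walk
    if len(tlb[s]) == WAYS:             # set full: evict (FIFO stand-in)
        tlb[s].pop(0)
    tlb[s].append((vpn, ppn))           # install the new translation
    return ppn * PAGE_SIZE + offset, "miss"

page_table = {vpn: vpn + 100 for vpn in range(64)}   # toy page table
pa1, r1 = translate(3 * PAGE_SIZE + 40, page_table)  # first access misses
pa2, r2 = translate(3 * PAGE_SIZE + 40, page_table)  # repeat access hits
```

The second access to the same page hits because the first walk installed the (VPN, PPN) pair in the set selected by the VPN.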

1.1.2. TLB Attacks

Translation Leak-Aside Buffer: Defeating Cache Side-Channel Protections with TLB Attacks

As a primary contribution, ref. [14] highlights that caches are not the only shared resources vulnerable to exploitation. TLBs also share this susceptibility and can be utilized for side-channel analysis. While ref. [14] presents a framework for executing such an attack, the described steps are general, making replication effectively an entirely independent effort.
For instance, when discussing the reverse engineering of the TLB configuration, the paper highlights the use of performance tools. It describes a method that involves accessing memory addresses to uncover the hardware implementation’s underlying ‘ways’ and ‘sets’. Subsequent graphs of the results demonstrate that certain combinations of ways and sets lead to an increase in the event count. However, critical details, such as the specific code used or the exact invocation of the perf tool, are not provided. (Perf is a performance analysis tool for Linux that collects and analyzes performance and tracing data.) This omission leaves readers with only broad directions, needing significant effort to piece together the information and accurately determine the TLB configuration.

Not Lost in Translation: Implementing Side Channel Attacks Through the Translation Lookaside Buffer

The work in [15] primarily replicates the findings in [14] and further explores additional TLB attacks, such as the timing of network pings and identifying the execution of Linux commands. However, like the referenced paper, the methodology for reverse engineering the TLB configuration in this thesis is also broadly outlined, providing only general steps and results. Consequently, replicating the study involves considerable effort due to the lack of detailed procedural guidance.

TLB;DR: Enhancing TLB-Based Attacks with TLB Desynchronized Reverse Engineering

Unlike previous methods that depended on timing or performance counters, the technique presented in this paper is based on the fundamental properties of TLBs [17]. This technique, known as desynchronization, provides detailed insights into TLB behaviors such as replacement policies and the handling of process context identifiers (PCIDs) on commodity Intel processors. The authors also claim that this knowledge enables more attacks. Moreover, they use their insights to design adversarial access patterns that manipulate the TLB state into evicting a target entry in the minimum number of steps, then examine their impact on several classes of prior TLB-based attacks. Ref. [17] is included to demonstrate that this methodology is known; however, this study focuses on using Hardware Performance Counters (HPCs).

1.1.3. TLB Defense

Secure TLBs

The authors of [16] discussed a hardware-based defense strategy against TLB side-channel analysis. The paper introduces two innovative designs aimed at mitigating these attacks: the Static-Partition (SP) TLB and the Random-Fill (RF) TLB. The SP TLB successfully mitigates 14 out of the 24 identified vulnerabilities, while the RF TLB protects against all known vulnerabilities, achieving this with less than a 10% performance overhead. This paper presents the first hardware defenses against TLB attacks [16].

Risky Translations: Securing TLB Against Timing Side Channels

The focus of this paper is on defeating side-channel attacks based on page translations [20]. The authors analyzed proposals for side-channel secure cache architectures and investigated their applicability to TLB side-channels. They concluded that those cache countermeasures do not apply to TLBs. Finally, they proposed TLBCoat, a side-channel secure TLB architecture [20]. While TLBCoat provides security for TLB-based side-channel attacks, the paper does not address whether these measures are applicable to counter side-channel attacks that exploit HPCs.

1.1.4. Related Works Limitations

Starting with TLB attacks, papers present various techniques such as timing-based methods [14,15] and desynchronization [17]. These authors used these techniques to exploit TLB vulnerabilities. However, all these studies share the same limitation: their methodologies were applied exclusively to Intel CPUs. There is no intent to develop a generalized methodology, and perhaps one cannot be developed. This is clearly a complex task, and our study demonstrates that the methodology is not directly transferable. It also highlights the primary reasons for this, which are the differences in performance counters and mapping functions across different ISAs.
The TLB defense papers [16,20] offer protection against timing attacks, which are the most well-known. However, the main limitation of these defenses is their vulnerability to desynchronization attacks and other non-timing attacks. These studies highlight the lack of literature focused on TLB countermeasures.

2. Materials and Methods

2.1. Research Design

The data for this analysis, which are quantitative in nature, were collected by monitoring the TLBs of the system under test. This capability is accessible on Linux-based operating systems, but the specific TLB events that can be monitored vary depending on the operating system’s kernel version and CPU architecture. Therefore, our analysis depends on the system running a Linux OS and having access to HPC data that records information about TLB events.
After collection, the data are organized in two ways. First, they are plotted on a heatmap where the color of each block represents the number of events, with dark colors, such as dark blue, indicating a low number of occurrences, and lighter colors, like yellow, indicating a high number of occurrences. Additionally, in this study, the data are plotted in a gradient plot, providing a different perspective for analysis. For instance, in a specific ‘set’, it can be determined how many ‘ways’ show an increase in a particular event. The results section provides a detailed interpretation of the specific heatmap and gradient plot.
In the first part of our study, we focus exclusively on events that are available on both the Intel Xeon and i9 CPUs. The second part evaluates the applicability of the same procedure to a different architecture; thus, we replicate the methodology on the IBM Power9.
The Intel VTune Profiler, a tool compatible with Windows and Linux, could be considered for future analysis. However, it was not included in this paper primarily because it is better suited for an independent assessment rather than for a direct comparison with the perf tool.

2.2. Materials and Instrumentation

This study considers two different ISAs using data collected by executing a software probe. Data were collected for Intel Xeon and Intel Core i9 processors, both supporting the x86_64 ISA, and for IBM Power9, which supports the Power ISA [21]. We utilized the Linux Performance Tool [22] to monitor specific performance events associated with these CPUs. Additionally, VSCode was employed for the development and execution of C, shell, and Python programs. These programs were used in conjunction with the Linux Performance Tool to facilitate the collection and analysis of data. Specifically, this integration meant using the shell script to run a C program and iterate through multiple perf events.

2.2.1. Intel Xeon and i9

The truth data for the Intel processors were obtained by leveraging the getconf Linux tool. By using the getconf command with the PAGESIZE argument, we can reveal the system’s memory page size. This knowledge serves two purposes. First, along with the additional commands discussed next, it helps us match the proper TLB configuration (configurations vary based on PAGESIZE), and second, it informs us how to appropriately adjust the stride size in our memory writing scripts. This ensures that our data collection is optimized for the system’s memory architecture.
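The same page-size value that `getconf PAGESIZE` reports is also exposed to Python through `os.sysconf`, which is convenient when sizing probe strides programmatically. A minimal sketch (the stride choice here is the simplest one-page step; the probes in this study derive strides from the candidate set counts):

```python
import os

# Query the system page size, as `getconf PAGESIZE` does on the shell.
page_size = os.sysconf("SC_PAGE_SIZE")  # e.g., 4096 on most x86_64 Linux systems

# A baseline stride that advances one page per access within a mapped region;
# probe code scales this by the candidate number of sets.
stride = page_size
```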
Furthermore, we utilize other Linux commands, such as cpuid and lscpu, to gather detailed CPU information. While these commands do not directly provide detailed information about the TLB—such as the L1 DTLB and ITLB mapping functions or the L2 STLB mapping functions—they do offer insights into the TLB configuration based on page size. The absence of specific information about the mapping functions poses a significant challenge in this research. The correct mapping function is largely derived from the findings in [14,15], requiring numerous iterations and adjustments to work with our systems under test. This information is crucial for hypothesizing TLB setups and adjusting our analysis accordingly.
One question needs to be answered before proceeding with this study: given that cpuid and lscpu provide potential TLB configurations, is it necessary to reverse engineer them? The answer is yes, for three reasons. First, using these commands requires superuser access, which is not always available to an attacker. Second, experimental results have shown discrepancies with the specifications, as indicated by [15] and by this study. Third, as mentioned above, mapping functions are not open source, so reverse engineering the TLB configuration is necessary to enable accurate probing and inference of these mapping functions.
Based on the data gathered, we have identified several TLB configurations for both the Intel Xeon and Intel Core i9 CPUs, with variations depending on the page size. Specifically, the configuration for the Xeon with a 4 KB page size is detailed in Table 1. Similarly, for the Intel Core i9, we have determined multiple TLB configurations based on the page size, with the configuration for a 4 KB page size presented in Table 2.
These configurations are inferred based on the available data and the system’s responses to our Linux command queries. While they provide a strong basis for our analysis, they are not definitive specifications as the reverse engineering effort will show later. Nevertheless, these inferred configurations allow us to conduct a targeted and relevant analysis of the TLB behavior in these CPUs.

2.2.2. Power9 IBM ISA

The truth data for the Power9 were obtained from references [23,24]. The Power9 system, which runs Linux, supports the getconf command, and the PAGESIZE argument was utilized to determine the page size. The system under test employed a 16 KB page size. Table 3 summarizes the TLB configuration for this system, including a Data Effective-to-Real Address Translation (DERAT) and Instruction Effective-to-Real Address Translation (IERAT) at the first level, with a TLB at the second level. The perf tool was also considered, though the events available differ from those in the Intel architecture. The primary challenge with Power9, or any ISA, involves the TLB mapping function. One approach to exploring a new TLB configuration is to test various standard mapping functions. Investigating common mapping functions used by specific vendors and experimenting with them could be effective. However, without proper documentation, it would be extremely difficult to infer the TLB configuration, especially since some vendors implement complex mapping functions involving multiple operations (e.g., multiple N-XOR operations).
Another challenge is the availability of event counters. For security reasons, some systems disable access to event counters. Additionally, each ISA has its own set of counters, so a performance counter that works in Intel may not be available in IBM. In such cases, an equivalent counter needs to be identified, if one exists.

2.3. Data Description

The dataset contains performance metrics for TLB micro-architecture, focusing on various configurations defined by the number of sets and ways. For each configuration, measurements were taken over 10 runs, sampling each run 10 times per event. The data from these 10 runs, sampled 10 times per run, are averaged for use in the heatmap and gradient analysis. The key metrics recorded include the time taken, event count, mean value, standard deviation, skewness, and kurtosis for each type of TLB-related event, such as dtlb_load_misses.miss_causes_a_walk, dtlb_load_misses.stlb_hit, itlb_misses.stlb_hit, and itlb_misses.miss_causes_a_walk. The dataset provides a comprehensive statistical overview of the TLB’s performance characteristics under different configurations, allowing for in-depth analysis of its behavior and efficiency.
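The per-event skewness and kurtosis recorded in the dataset can be computed as below. This is a sketch using one common population estimator; the paper does not specify which estimator its post-processing used:

```python
import statistics

def skewness(xs):
    # Fisher-Pearson skewness (population form): third standardized moment.
    m, n = statistics.fmean(xs), len(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def kurtosis(xs):
    # Excess kurtosis (population form): 0 for a normal distribution.
    m, n = statistics.fmean(xs), len(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3
```

For a symmetric sample such as [1, 2, 3, 4, 5], skewness is 0; heavy-tailed event-count samples (occasional large miss spikes) show positive kurtosis.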

2.4. Procedure

C programs implement the software probes used in this study. For each system under test, we utilize three C programs, each designed to access memory in ways that allow us to infer the configurations of the DTLB, ITLB, and STLB. The first C program, aimed at inferring the DTLB configuration, accepts two key inputs: the number of sets and the number of ways. Using the malloc function, it allocates a region of memory sized according to the formula REGION_SIZE = PAGESIZE × 256 × 1024 × 1024. This size ensures adequate memory coverage and facilitates effective monitoring of its impact on the TLB. The program iterates through the specified numbers of ‘sets’ and ‘ways’, and it uses a mapping function to access VAs that map to the same ‘set’ in the TLB. It is also designed to run for a predetermined duration during each execution. The crucial aspect of this DTLB C program is accessing VAs that map to the same TLB set, following the linear function s = VPN mod S, where s is the target set and S is the total number of sets, as indicated in [15]. The code’s strategy involves setting a stride that, when combined with this linear function, maps to a specific set. As the ‘ways’ and ‘sets’ numbers increase, we identify points where a minimal increase in ways leads to a rise in events. The implementation of the mapping function is illustrated in Listing 1.
Listing 1. Xeon DTLB C program.
Jcp 04 00044 i001
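The core idea of Listing 1—choosing a stride so that successive addresses collide in one TLB set under the linear mapping s = VPN mod S—can be sketched as follows. This is a Python stand-in for the C probe; the function name, the 16-set example, and the parameters are illustrative:

```python
PAGE_SIZE = 4096

def colliding_addresses(base, target_set, total_sets, ways):
    """Generate `ways` virtual addresses that all map to `target_set`
    under the linear function set = VPN mod total_sets."""
    stride = total_sets * PAGE_SIZE  # advancing one stride keeps VPN mod S fixed
    base_vpn = base // PAGE_SIZE
    # Align the first address so its VPN falls in the target set.
    first = base + ((target_set - base_vpn) % total_sets) * PAGE_SIZE
    return [first + w * stride for w in range(ways)]

# Four addresses contending for set 5 of a hypothetical 16-set DTLB:
addrs = colliding_addresses(0, target_set=5, total_sets=16, ways=4)
sets = {(a // PAGE_SIZE) % 16 for a in addrs}
```

Touching one more such address than the set holds ways forces an eviction, which is the knee in the event counts the probe is looking for.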
The second C program is designed to infer the configuration of the ITLB. It accepts two inputs: the number of ways and sets. Similar to the first program, it accesses memory linearly. The main distinction, however, lies in how this program populates the allocated memory. Specifically, it fills the memory with NOP (no-operation) instructions, followed by a RET (return) instruction, as shown in Listing 2. This pattern, in combination with the linear mapping and the incrementally increasing number of ways and sets, aids in inferring the ITLB configuration.
Listing 2. Xeon ITLB C program.
Jcp 04 00044 i002
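The fill pattern of Listing 2 can be sketched as below: x86 encodes NOP as the single byte 0x90 and a near RET as 0xC3, so each page becomes a NOP sled ending in a return. Actually calling into the pages additionally requires an executable mapping (e.g., via mmap), which the C program handles and this sketch omits:

```python
PAGE_SIZE = 4096
NOP, RET = 0x90, 0xC3  # x86 one-byte opcodes: no-operation and near return

def nop_sled_page(page_size=PAGE_SIZE):
    """One page of NOPs terminated by RET: calling the page's start
    address slides through the NOPs and returns immediately, touching
    exactly one ITLB entry per page."""
    return bytes([NOP] * (page_size - 1) + [RET])

page = nop_sled_page()
```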
The third C program is designed to infer the shared TLB. After testing the linear mapping to infer the STLB, and not obtaining an output with a pattern indicating a TLB configuration, a different mapping function was needed. As indicated by [14,15], if a linear mapping function does not work, then a more complex XOR-N function is used by the system. The XOR-N function basically takes subsets of bits, XORs them, and the result indicates the set. The C program implementation is shown in Listing 3. The piece of code shows how the virtual address is built based on N bits that are XORed together.
Listing 3. Xeon STLB C program.
Jcp 04 00044 i003
A shell script is designed to execute each of the previously mentioned C programs, specifically to pass values for ways and sets as arguments. The script is configured to handle a range of candidate ‘ways’ from 0 to 20 (and from 0 to 50 for XOR-N), aligning with the parameters of this study. The ‘sets’ are fixed at predetermined values: 4, 8, 16, 32, and 64 (with 128 and 256 for XOR-N). Additionally, the script is structured to sequentially monitor specific events of interest in our study. It uses a loop structure, where each iteration executes the C program with a fixed ‘sets’ value while systematically varying ‘ways’ within the specified range. A distinctive feature of the script is its ability to initiate event tracking exclusively when the C program is running, thereby minimizing the recording of unrelated processes. The results from each unique ‘set’-and-‘ways’ combination are compiled and stored in Comma Separated Values (CSV) files, enabling detailed subsequent analysis.
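The script’s inner loop amounts to building and running one perf invocation per ‘sets’–‘ways’ pair. A Python sketch of the command construction is shown below; the probe path and argument order are illustrative, while the event name is one of those listed in Section 2.3 (perf’s `-x,` flag selects CSV output, and everything after `--` is the profiled command):

```python
def perf_command(event, probe, sets, ways):
    """Build a perf invocation that counts `event` only while the probe
    runs, emitting CSV via -x,. The probe path/args are illustrative."""
    return ["perf", "stat", "-e", event, "-x", ",",
            "--", probe, str(sets), str(ways)]

# Sweep the fixed set counts against the candidate range of ways, as the
# shell script does; subprocess.run(cmd, capture_output=True) would then
# execute each command and append its CSV output to a results file.
cmds = [perf_command("dtlb_load_misses.miss_causes_a_walk",
                     "./dtlb_probe", s, w)
        for s in (4, 8, 16, 32, 64) for w in range(21)]
```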
Two Python scripts are used to analyze the data. The Python A program, heatmap_tlb.py, is designed to process the CSV files generated by the shell script. Its primary role is to remove log messages so that the data for a specific performance event produce a consistent CSV file structure across all program executions. In this context, ‘command’ denotes the event that was measured. The script extracts data for ‘ways’ in the predetermined range, along with the corresponding values of the event. Additionally, it incorporates the ‘set’ value into this newly formed dataset. Consequently, we obtain a comprehensive dataset that encapsulates a specific ‘set’ for the entire range of ‘ways’, along with their respective event values. Building upon these data, the script then employs Pandas, a Python data analysis library, to create a pivot table that maps the ‘set’-and-‘ways’ combinations to the event. The final product of this script is a heatmap, which visually represents these combinations on a single plot. This heatmap offers a clear and concise visual representation of the data, facilitating easier interpretation and analysis of the patterns and trends within the dataset.
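The pivot step can be sketched with Pandas as follows. The rows here are toy stand-ins for the cleaned CSV records, and the column names are illustrative; the resulting matrix is exactly what a heatmap routine (e.g., matplotlib’s imshow or seaborn’s heatmap) renders:

```python
import pandas as pd

# Toy stand-in for the cleaned CSV rows: one averaged event count per
# (sets, ways) combination (values are illustrative, not measurements).
rows = [{"sets": s, "ways": w, "count": (s * w) % 7}
        for s in (4, 8, 16) for w in range(3)]
df = pd.DataFrame(rows)

# Pivot so rows are 'ways' and columns are 'sets': the 2-D matrix that
# the heatmap plots, one cell per configuration.
pivot = df.pivot_table(index="ways", columns="sets", values="count")
```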
The Python B program reads CSV files containing performance data for each candidate TLB configuration, filters the data by a specific performance event, and then isolates data within a specified range of ‘ways’. The script calculates the gradient (first derivative) of the mean values with respect to ‘ways’ for each ‘sets’ configuration, indicating how the performance metric changes as the number of ‘ways’ increases. These derivatives can be used to identify trends or significant changes in the performance metric. The results are plotted, displaying gradients against ‘ways’ for each ‘sets’ configuration, providing a visual analysis of the performance data. This approach enables a detailed examination of how different TLB configurations affect CPU performance. In this study, we want to verify if the gradient plot can help us determine the TLB configuration, or if it at least helps to complement the heatmaps generated by Python A.
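The gradient computation can be sketched as below. The numbers are illustrative: the mean event count stays flat until the candidate ‘ways’ exceed the set’s capacity, then jumps, and numpy.gradient makes that knee stand out as a spike in the first derivative:

```python
import numpy as np

# Mean event counts for one 'sets' configuration as 'ways' increases
# (illustrative numbers): flat, then a jump once the ways capacity of
# the set is exceeded and evictions begin.
ways = np.arange(8)
means = np.array([10, 10, 10, 10, 900, 905, 910, 915], dtype=float)

grad = np.gradient(means, ways)  # first derivative w.r.t. ways
knee = int(np.argmax(grad))      # candidate associativity boundary
```

In the real plots, one such curve is drawn per ‘sets’ value, and the first curve to spike suggests the set count and associativity jointly.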

2.5. Data Collection Method

The data for this study were gathered from computer systems operating under the Linux OS. This OS is particularly advantageous as it provides a system facility to aggregate data from multiple HPC reads, which are crucial for our analysis. Utilizing the perf tool, we were able to monitor and record events related to the TLBs.
To capture the data, we employed a combination of C programs and shell scripts. These scripts were specifically designed to extract precise values from the TLB events monitored by the perf tool. We placed emphasis on collecting a broad spectrum of TLB event counts for a variety of candidate ‘ways’ and ‘sets’. This comprehensive approach ensures that our dataset is robust and diverse, providing us with a large amount of information. Such a dataset provides insights into the system’s performance and behavior under various configurations of ‘ways’ and ‘sets’. This level of detailed data collection paves the way for a more detailed understanding and analysis of the TLB’s functioning in Linux-operated computer systems.
In this study, HPCs played a key role in the collection of the heatmap dataset. These HPCs enable the precise collection and utilization of the dataset. The heatmap plot, which is generated from the data, serves as a crucial visualization tool. It illustrates how the values of specific events fluctuate as we vary the ‘sets’ and ‘ways’ parameters. This visual representation is not just an analytical convenience but an instrument for identifying patterns, discerning differences, and understanding relationships within the data.
Gradient plots are generated as an alternative methodology for determining the TLB configuration. These plots are analyzed to identify patterns indicating how gradient values correlate with an increase in events, directly linked to the TLB’s ‘ways’ and ‘sets’ combination. Alternatively, the gradient plot could also serve to complement the heatmap analysis, offering a multifaceted view of the data.
To ensure comprehensive data collection and analysis, we integrated several tools and platforms. The Linux operating system provided the necessary environment to access and effectively utilize the HPCs. The perf tool was employed to monitor relevant TLB events, capturing the data required for our analysis. Furthermore, Visual Studio Code (VSCode) served as the development platform for our C, Shell, and Python scripts. These scripts were crucial in both collecting and analyzing the data, ensuring that our methodology was rigorous and precise.
We replicated these methodologies on different CPU platforms, specifically the Intel Xeon, the Core i9, and the IBM Power9. By applying the same tools and techniques to collect and analyze data on these CPUs, we aim to test the transferability of the methodologies across different CPU architectures. This step is crucial to determine whether our findings and insights are specific to a particular CPU model or if they hold broader applicability.
The final phase of our study involves a comparative analysis between the results obtained from the Intel Xeon and Intel Core i9 CPUs. This comparison not only provides insights into the differences and similarities in TLB configurations across these CPUs but also enhances our understanding of how these configurations influence system performance. By contrasting the findings from two distinct CPU models, we aim to draw more comprehensive conclusions that can inform future research and practical applications.

2.6. Limitations

The use of performance analysis tools, specifically perf, presents a set of constraints within this study. Perf is dependent on hardware counters, which are subject to variability based on several factors, such as the operating system, the ISA, and user permissions. This variability makes the counters challenging to interpret; in some instances, they may not be usable at all. Consequently, making comparisons between different computer systems becomes problematic due to the fluctuating nature of these factors.
The most advantageous operating system for our purposes is Linux. In comparison to other operating systems, Linux stands out for its ease of access to HPCs. Many alternative operating systems either present significant hurdles in accessing HPCs or, in some cases, do not provide access at all. This accessibility is a crucial factor in our study, as HPCs are integral to our data collection and analysis processes.
Another notable limitation pertains to the variability of ISAs. This variability poses a challenge in establishing a standardized framework for TLB analysis. The diverse nature of ISAs means that methodologies and tools optimized for one may not be directly applicable or as effective for another. For example, the TLB mapping function varies between vendors, and even within the same vendor, it can differ across different levels of the memory hierarchy. This lack of standardization across different ISAs can complicate the process of developing a unified approach for TLB analysis, potentially impacting the efficiency and comparability of results across different computing environments.

3. Results

3.1. Data Presentation

In the analysis of the Intel Xeon CPU, we focused on creating a gradient plot and a heatmap for TLB-specific events.
The gradient plot shows the rate of change of each event, for a specific ‘set’ count, with respect to ‘ways’. The graphs include a legend with the respective color for each ‘set’. This plot provides another perspective for the analysis of HPCs, enabling a quick visual interpretation of which line is the first to increase and how steep its rate of change is, visually representing the various ‘ways’ and ‘sets’. In combination with Table 1, Table 2 and Table 3, this study infers the TLB configuration. Although the data from the tables are useful, they do not represent the final configuration; this paper demonstrates how, in combination with the gradient plot, the correct configurations are determined. Finally, it is assumed that the size of the TLB is known. This is a reasonable assumption since the size can be determined, as it was here, by iterating through increasing ‘ways’–‘sets’ combinations. To confirm the results of the gradient plot analysis, this study includes heatmaps.
In our heatmap layout, the ‘sets’ are plotted along the x-axis, while the ‘ways’ are plotted on the y-axis. This arrangement allows for an intuitive understanding of how changes in ‘ways’ and ‘sets’ correlate with the output of these events. Additionally, the heatmap includes a vertical legend on the right side. This legend uses a color gradient to denote the intensity of the event’s output. The colors range from dark blue, indicating a zero value, to yellow, which signifies the highest recorded value. This color coding provides a quick and effective means to gauge the relative magnitudes of the events across different combinations of ‘ways’ and ‘sets’.
For the Intel Core i9 CPU, we applied the same methodologies as described previously, including the use of identical tools for data collection and analysis. This approach ensured that the only variable in our comparative study was the CPU model itself. By maintaining a consistent methodology, we aimed to isolate the CPU as the primary factor influencing any observed differences in the heatmap data.
For the IBM Power9, we attempted to apply the same methodologies. We utilized C programs and shell scripts for data collection and analysis. The main difference lies in executing these programs directly from the command line, rather than using VSCode, for data collection and analysis. A significant distinction was also found in the availability and interpretability of HPC events. While the HPC events for the Intel CPUs are well-known and yield interpretable values, provided the rest of the experiment is set up correctly, this is not necessarily the case for the Power9 HPCs.

3.2. Analysis Results

  • Intel Xeon CPU
    DTLB: The gradient plot analysis for the event dtlb_load_misses.stlb_hit aligns with our expectations. As illustrated in Figure 2b, the 16-‘sets’ line experiences an increase after ‘ways’ 4, characterized by an increasing slope. The red, purple, and brown lines increase at the same ‘ways’, but the red line is the first to show an event increase and has the highest rate of change. Although this plot could be employed independently to deduce the DTLB configuration, the technique needs support from Table 1, Table 2 and Table 3 to yield consistent results across other TLB levels or CPU ISAs.
    The analysis of the heatmap results also aligns with our expectations, as anticipated in [14,15]. In Figure 2a, observing dtlb_load_misses.stlb_hit, we discern a clear pattern: it starts with a low number of misses (indicated by dark blue), and as ‘ways’ increase, the number of misses rises (shown by lighter colors). Without prior knowledge of the TLB configuration, one would examine the heatmap to identify the lowest ‘ways’ that exhibit an increase in misses. The lowest such ‘ways’ is 4, occurring at ‘sets’ 16, 32, and 64. Since we aim to identify the smallest combination, we select ‘sets’ 16 and ‘ways’ 4, which aligns with Table 1.
    ITLB: The gradient analysis for itlb_load_misses.stlb_hit, shown in Figure 3b, reveals that the 16-‘sets’ series, represented by the red line, experiences a faster increase than the other data series in the plot, specifically the purple and brown lines, suggesting a method to identify the configuration. Following the rule that the lowest ‘sets’–‘ways’ combination is the correct configuration, the red line indicates that ‘sets’ 16 and ‘ways’ 8 constitute the ITLB configuration. This contradicts Table 1, as can also be observed in the heatmap.
    Employing the heatmap reveals a discernible pattern, as shown in Figure 3a, yet it does not align with the data presented in Table 1. We expected the minimal ‘ways’ showing an increase in events, specifically 8, to occur at the documented ‘sets’ count; instead, the increase is observed at ‘sets’ 16 and ‘ways’ 8. Contrary to the specifications provided by the manufacturer, which have been shown to be inaccurate in the past, as indicated in [15], this study concludes that the correct configuration is indeed ‘sets’ 16 and ‘ways’ 8.
    STLB: The second stage is more challenging, primarily because it does not utilize a linear mapping function for selecting ‘sets’, making the probing of addresses significantly more difficult. This study concludes that the STLB employs an XOR-7 mapping with 128 ‘sets’. To infer the STLB configuration, the gradient method must be used in conjunction with the size of the STLB, known to be 1536. Figure 4b reveals that a change in the gradient is noted only when the ‘set’ count is 128 under XOR-7 mapping, offering no clear indicators or patterns that would assist in determining the number of ‘ways’. In this instance, the gradient plot serves as one piece of the puzzle, providing the ‘sets’ count of 128; the other piece is the size of 1536. The ‘ways’ can be found by dividing the size by the ‘sets’, which yields 12. Thus, an XOR-7 mapping with 1536 entries suggests a 12-way configuration. Contrary to the official data, which indicate 256 ‘sets’ and 6 ‘ways’, the heatmap distinctly shows activity only in the XOR-7 scenario, as shown in Figure 4a.
  • Intel i9 CPU
    Load-Only: The configuration of the Intel Core i9 CPU does not include a unified DTLB; it incorporates separate TLBs for load-only and store-only operations. This variation exemplifies how TLB organizations can differ, even within the products of a single manufacturer, posing challenges for standardizing experiments. In this scenario, the gradient plot accurately indicates the expected pattern for ‘sets’ 16, as shown in Figure 5b. However, the data for ‘sets’ 32 and 64 also emerge as potential candidates, highlighting a limitation of the gradient method in this context. It confirms that ‘ways’ 4 is the minimum at which event counts increase, but it does not unequivocally pinpoint ‘sets’ 16 as the corresponding configuration. Nevertheless, ‘sets’ 16 and ‘ways’ 4 is the lowest configuration.
    The heatmap results for the event dtlb_load_misses.stlb_hit align with our expectations, as shown in Figure 5a. It is evident that the lowest ‘ways’ count showing an increase is 4, occurring at ‘sets’ 16.
    Store-only: The gradient plot reveals an increase after ‘ways’ 12 across all sets, as shown in Figure 6b. Two key observations emerge: first, the presence of an increase at ‘ways’ 12, and second, a deceleration in the gradient at ‘ways’ 16. Here again, the gradient is one piece of the puzzle and the known size of the TLB is the other. By dividing the known size, 16, by the number of ‘sets’, 1, the ‘ways’ count of 16 is obtained. A ‘sets’ count of 1 is selected because all ‘sets’ exhibit similar behavior, so the lowest, 1, is chosen.
    Similarly, the heatmap results demonstrate consistency across all sets, as shown in Figure 6a. Given that the behavior over ‘ways’ is consistent for all ‘sets’, it suggests that the number of ‘sets’ is 1. However, the exact number of ‘ways’ remains unclear from the heatmap; based solely on the data, we might infer it to be 11, 12, or 13. This represents the first instance where the heatmap did not yield a specific ‘sets’–‘ways’ combination. Given this ambiguity, this study relies on the specifications listed in Table 2 as correct, and the ‘sets’–‘ways’ combination is determined to be 1–16.
    ITLB: The gradient methodology indicates that ‘sets’ 16 (red), 32 (purple), and 64 (brown) exhibit an increase at around ‘ways’ 8, as shown in Figure 7b. The lowest ‘sets’-‘ways’ combination is 16-8. Another way to reach the same conclusion is by choosing ‘sets’ 16, since it is the lowest among the possible candidates (16, 32, and 64), and using the size of the TLB from Table 2. Dividing the TLB size by ‘sets’ 16 gives ‘ways’ 8.
    The analysis of the heatmap results for itlb_load_misses.stlb_hit shows a discernible pattern, as illustrated in Figure 7a, which corresponds with the data outlined in Table 2.
    STLB: The gradient plot indicates that significant activity is only observed when the ‘set’ count is 128 under XOR-7 mapping, as shown in Figure 8b. However, it provides no distinct indicators or patterns that would aid in concluding the number of ‘ways’. From Table 2, the size is known to be 1024 (and was also determined experimentally), and Figure 8b indicates 128 ‘sets’; dividing the TLB size by 128 ‘sets’ yields 8 ‘ways’.
    In the heatmap in Figure 8a, it can be observed that dtlb_load_misses.walk_active follows an XOR-7 mapping with 128 ‘sets’. Although the precise size of the TLB was not provided, it was determined through testing with an increasing number of ‘sets’ and ‘ways’. While the heatmap result is not detailed in this study, the capacity of the STLB was noted to be 1024 entries. Therefore, an XOR-7 mapping with 1024 entries implies an 8-way associativity.
  • IBM Power9
    DERAT, IRAT, TLB: This study aimed to assess whether the methodology applied to Intel CPUs could be transferred to another ISA. The objective was to execute the C programs and shell scripts designed to collect and analyze data. The first challenge involved identifying the HPCs equivalent to those on Intel that provide insights into the TLB configuration. To determine the configurations of DERAT and IRAT, we hypothesized that the data collected by pm_tlb_hit could serve as the equivalent of Intel’s dtlb_load_misses.stlb_hit. Similarly, for TLB configurations, we assumed that pm_tablewalk_cyc could be the counterpart of Intel’s event dtlb_load_misses.walk_active.
    In Figure 9a, the pm_tlb_hit event was used to determine the size of the TLB. The capacity of the TLB appears to be a combination of ‘ways’ and ‘sets’ that totals 1024. Given that the unit under test operates in Radix mode, it can be inferred that the actual size of the TLB is half of this number, equating to 512 entries. However, from the heatmap alone, we can only deduce that the total capacity is 1024, as indicated by the red squares.
    The gradient plot, Figure 9b, does not offer additional insights. It confirms the same combinations resulting in a total of 1024, allowing us to ascertain the capacity of the TLB, but it does not provide specific details regarding the configuration of ‘ways’ and ‘sets’.
    In Figure 10a, pm_tablewalk_cyc shows a pattern at ‘sets’ 512 and 1024. However, the TLB configuration of ‘sets’ 512 and ‘ways’ 4 cannot be deduced from this heatmap alone. Additionally, the gradient plot shown in Figure 10b does not provide any further insights into the correct configuration.
    Even though this study did not successfully reverse engineer the Power9 TLB configuration, the patterns observed in the heatmaps suggest that further modifications in the code, along with the exploration of additional events, could yield improved results.

4. Conclusions

In this study, we explored how the drive for efficiency in computer systems can inadvertently create vulnerabilities. While cache side-channel analysis has traditionally been the focus of both academic and industry research, other shared hardware components, such as the TLB, also present opportunities for extracting information from computer systems. The cybersecurity community understands that micro-architecture is susceptible to attacks, and in recent years, there has been a small but growing interest in TLBs. This study contributes to raising awareness within the security community by collecting microarchitectural data, analyzing it with different techniques, inferring TLB configurations, and identifying the main obstacles to transferring the methodology across different systems. A critical aspect of implementing a TLB side-channel analysis involves understanding its architecture. However, since detailed TLB configurations are typically not open-sourced, our study developed a new methodology using gradient plots to infer these configurations. Our findings indicate that HPCs are effective in monitoring CPU performance and analyzing the resulting data to deduce TLB configurations. We used methodologies from [14,15] to validate our gradient plot method. This methodology provided additional insights for Intel CPUs and proved reliable as a standalone technique; however, in some cases, it needed supporting information, specifically the TLB size. The study also examined whether the methodology used for Intel CPUs could be applied to the IBM Power9. Although the methodology did not succeed in reverse engineering the TLB configuration for the IBM Power9, it revealed some patterns that hint at the possibility of success with further adjustments to the experiment, particularly the code for data collection and the events used.
Looking ahead, several avenues for future research appear promising. One direction is to extend the IBM Power9 analysis and explore other architectures. The mapping function remains a significant obstacle when working with a new ISA. A process for addressing this with a new ISA would involve experimenting with common mapping functions used by specific vendors and determining their effectiveness. However, this approach can be hit or miss, and without proper documentation, it would be extremely difficult to infer the TLB configuration, especially since some vendors implement complex mapping functions involving multiple operations (e.g., multiple XOR-N operations).
Another challenge is the availability of event counters. Their accessibility must be assessed, and it is necessary to identify equivalent counters—similar to those in Intel systems, where this methodology has proven effective—if they exist.
Implementing this methodology on a Field Programmable Gate Array (FPGA), for example with a RISC-V ISA, is an intriguing endeavor. The main challenge in this effort is gaining user access to the counters; since TLB events are disabled in such environments, an alternative technique would need to be employed.
Additionally, once the TLB configuration is ascertained, the next logical step in side-channel analysis would be to explore methods for accessing the TLB in the manner of a potential attacker targeting another process (referred to as ‘the victim’ in side-channel parlance). This could provide further insights into potential vulnerabilities and defensive strategies. Ultimately, this study emphasizes the importance of considering security implications in the design of efficient systems. While enhancing efficiency is a key goal in computer system design, our research highlights the need for a balanced approach that also prioritizes security to protect against potential vulnerabilities in shared hardware components like the TLB.

Author Contributions

Conceptualization, C.A., T.J.L. and S.R.G.; methodology, C.A.; software, C.A. and T.J.L.; validation, C.A., T.J.L. and S.R.G.; formal analysis, C.A., T.J.L. and S.R.G.; investigation, C.A. and T.J.L.; resources, C.A., T.J.L. and S.R.G.; data curation, C.A. and T.J.L.; writing—original draft preparation, C.A.; writing—review and editing, T.J.L. and S.R.G.; visualization, C.A., T.J.L. and S.R.G.; supervision, T.J.L. and S.R.G.; project administration, C.A., T.J.L. and S.R.G.; funding acquisition, C.A., T.J.L. and S.R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data supporting the reported results are available upon request from the corresponding author, Cristian Agredo, cristian.agredo@afit.edu.

Conflicts of Interest

The authors declare no conflicts of interest.

Disclaimer

The views expressed in this paper are those of the authors, and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government. This document has been approved for public release; distribution unlimited, case number 88ABW-2024-0397.

References

  1. Disselkoen, C.; Kohlbrenner, D.; Porter, L.; Tullsen, D. Prime+Abort: A Timer-Free High-Precision L3 Cache Attack Using Intel TSX. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2017. [Google Scholar]
  2. Kocher, P.; Genkin, D.; Gruss, D.; Haas, W.; Hamburg, M.; Lipp, M.; Mangard, S.; Prescher, T.; Schwarz, M.; Yarom, Y. Spectre attacks: Exploiting speculative execution. Commun. ACM 2020, 63, 93–101. [Google Scholar] [CrossRef]
  3. Lipp, M.; Schwarz, M.; Gruss, D.; Prescher, T.; Haas, W.; Horn, J.; Mangard, S.; Kocher, P.; Genkin, D.; Yarom, Y.; et al. Meltdown: Reading kernel memory from user space. Commun. ACM 2020, 63, 46–56. [Google Scholar] [CrossRef]
  4. Liu, F.; Yarom, Y.; Ge, Q.; Heiser, G.; Lee, R.B. Last-level cache side-channel attacks are practical. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 17–21 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 605–622. [Google Scholar]
  5. Yarom, Y.; Falkner, K. Flush+ Reload: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In Proceedings of the USENIX Security Symposium, San Diego, CA, USA, 20–22 August 2014; pp. 719–732. [Google Scholar]
  6. Percival, C. Cache missing for fun and profit. In Proceedings of the FreeBSD Presentations and Papers (2005), Ottawa, ON, Canada, 13–14 May 2005. [Google Scholar]
  7. Osvik, D.A.; Shamir, A.; Tromer, E. Cache Attacks and Countermeasures: The Case of AES. In Topics in Cryptology-CT-RSA 2006; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3860, pp. 1–20. [Google Scholar] [CrossRef]
  8. Gullasch, D.; Bangerter, E.; Krenn, S. Cache Games–Bringing Access-Based Cache Attacks on AES to Practice. In Proceedings of the Security and Privacy (SP), 2011 IEEE Symposium On, Oakland, CA, USA, 22–25 May 2011; pp. 490–505. [Google Scholar]
  9. Braun, B.A.; Jana, S.; Boneh, D. Robust and efficient elimination of cache and timing side channels. arXiv 2015, arXiv:1506.00189. [Google Scholar]
  10. Gruss, D.; Schuster, F.; Ohrimenko, O.; Haller, I.; Lettner, J.; Costa, M. Strong and efficient cache side-channel protection using hardware transactional memory. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2017. [Google Scholar]
  11. Liu, F.; Ge, Q.; Yarom, Y.; Mckeen, F.; Rozas, C.; Heiser, G.; Lee, R.B. Catalyst: Defeating last-level cache side channel attacks in cloud computing. In Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), Barcelona, Spain, 12–16 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 406–418. [Google Scholar]
  12. Sprabery, R.; Evchenko, K.; Raj, A.; Bobba, R.B.; Mohan, S.; Campbell, R.H. A novel scheduling framework leveraging hardware cache partitioning for cache-side-channel elimination in clouds. arXiv 2017, arXiv:1708.09538. [Google Scholar]
  13. Demme, J.; Maycock, M.; Schmitz, J.; Tang, A.; Waksman, A.; Sethumadhavan, S.; Stolfo, S. On the Feasibility of Online Malware Detection with Performance Counters. ACM SIGARCH Comput. Archit. News 2013, 41, 559–570. [Google Scholar] [CrossRef]
  14. Gras, B.; Razavi, K.; Bos, H.; Giuffrida, C. Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks. In Proceedings of the USENIX Security Symposium, USENIX, Baltimore, MD, USA, 15–17 August 2018; pp. 955–972. [Google Scholar]
  15. Holmes, N. Not Lost in Translation: Implementing Side Channel Attacks Through the Translation Lookaside Buffer. Master’s Thesis, University of Warwick, Department of Computer Science, Coventry, UK, 2023. [Google Scholar]
  16. Deng, S.; Xiong, W.; Szefer, J. Secure TLBs. In Proceedings of the 46th International Symposium on Computer Architecture. Association for Computing Machinery, Phoenix, AZ, USA, 22–26 June 2019; pp. 346–359. [Google Scholar] [CrossRef]
  17. Tatar, A.; Trujillo, D.; Giuffrida, C.; Bos, H. TLB;DR: Enhancing TLB-based Attacks with TLB Desynchronized Reverse Engineering. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–20 May 2020; pp. 1273–1290. [Google Scholar] [CrossRef]
  18. Hennessy, J.L.; Patterson, D.A. Computer Architecture, Sixth Edition: A Quantitative Approach, 6th ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2017. [Google Scholar]
  19. Stallings, W. Operating Systems: Internals and Design Principles; Pearson: New York, NY, USA, 2014. [Google Scholar]
  20. Stolz, F.; Thoma, J.P.; Güneysu, T.; Sasdrich, P. Risky Translations: Securing TLBs against Timing Side Channels. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Bochum, Germany, 2024. [Google Scholar]
  21. Power ISA Version 3.1B. 2021. Available online: http://www.openpowerfoundation.org (accessed on 15 April 2023).
  22. Linux Kernel Organization. Perf-A Performance Counting Tool. 2024. Available online: https://perf.wiki.kernel.org/index.php/Main_Page (accessed on 11 January 2024).
  23. IBM Corporation. POWER9 Processor User Manual, OpenPOWER v2.0GA; IBM Corporation: Armonk, NY, USA, 2018. Available online: https://openpowerfoundation.org/?resource_lib=power9-processor-user-manual (accessed on 29 October 2024).
  24. IBM Corporation. POWER9 Performance Monitoring Unit User Guide, v1.2; IBM Corporation: Armonk, NY, USA, 2018; Available online: https://openpowerfoundation.org/?resource_lib=power9-performance-monitoring-unit-user-guide (accessed on 29 October 2024).
Figure 1. Main modules in the address translation process.
Figure 2. Intel Xeon DTLB heatmaps and gradient plots are shown. In Figure (a), the red square highlights the configuration of 16 ‘sets’ and 4 ‘ways’. In Figure (b), the red line for 16 ‘sets’ distinctly shows an increase at 4 ‘ways’.
Figure 3. Intel Xeon ITLB heatmaps and gradient plots are shown. In Figure (a), the red squares highlight two potential configurations. The methodology confirms that the configuration of 16 ‘sets’ and 8 ‘ways’ is the correct one. In Figure (b), the red line for 16 ‘sets’ distinctly shows an increase of around 8 ‘ways’.
Figure 4. Intel Xeon STLB heatmaps and gradient plots. Figure (a,b) indicate that the STLB employs an XOR-7 mapping function.
Figure 5. Intel i9 load-only heatmaps and gradient plots. In Figure (a), the red square highlights the configuration of 16 ‘sets’ and 4 ‘ways’. In Figure (b), the red line for 16 ‘sets’ shows an increase at 4 ‘ways’; however, ‘32’ and ‘64’ also show increases in the same area. This is an example where the gradient plot supports the heatmap but does not independently provide the exact configuration.
Figure 6. The i9 store-only heatmaps and gradient plots. Figures (a,b) rely on knowing the size of the TLB to infer the configuration, as explained in detail below.
Figure 7. The i9 ITLB heatmaps and gradient plots. In Figure (a), the red square highlights the configuration of 16 ‘sets’ and 8 ‘ways’. In Figure (b), the red line for 16 ‘sets’ shows an increase at 4 ‘ways’; however, ‘32’ and ‘64’ also show increases in the same area. This is another example where the gradient plot supports the heatmap but does not independently provide the exact configuration.
Figure 8. The i9 STLB heatmaps and gradient plots. Figure (a,b) indicate that the STLB employs an XOR-7 mapping function.
Figure 9. Power 9 tlb hit heatmap and gradient plots. From Figure (a), we can only deduce that the total capacity is 1024, suggesting that the linear mapping function is not the one used by the system. Figure (b) does not provide the same insights as those observed with the Xeon or i9.
Figure 10. Power9 tablewalk heatmap and gradient plots. Figures (a,b) do not provide the specific configuration. However, the patterns observed in the heatmaps suggest that further modifications to the mapping function and counter-selection could yield improved results.
Table 1. The Xeon TLB configuration for a 4 K page size.

                DTLB    ITLB    STLB
    Page Size   4 K     4 K     4 K
    Ways        4       8       6
    Sets        16      8       256
    Entries     64      64      1536
Table 2. The i9 TLB configuration for a 4 K page size.

                Load    Store   ITLB    STLB
    Page Size   4 K     4 K     4 K     4 K
    Ways        4       16      8       8
    Sets        16      1       16      128
    Entries     64      16      128     1024
Table 3. The Power9 configuration for a 16 K page size.

                DERAT   IRAT    TLB
    Page Size   16 K    16 K    16 K
    Ways        64      64      4
    Sets        1       1       128
    Entries     64      64      512
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
