BranchCloak: Mitigating Side-Channel Attacks on Directional Branch Predictors

Kim, Jihoon; Jang, Hyerean; Shin, Youngjoo

doi:10.3390/electronics14091758


by Jihoon Kim, Hyerean Jang and Youngjoo Shin *

School of Cybersecurity, Korea University, Seoul 02841, Republic of Korea

* Author to whom correspondence should be addressed.

Electronics 2025, 14(9), 1758; https://doi.org/10.3390/electronics14091758

Submission received: 22 March 2025 / Revised: 20 April 2025 / Accepted: 22 April 2025 / Published: 25 April 2025

(This article belongs to the Section Computer Science & Engineering)


Abstract

The emerging threat of side-channel attacks targeting branch predictors on recent Intel processors has become a growing concern. These attacks rely on exploiting a pattern history table (PHT) as a source of side-channel information. Since the PHT is shared among logical cores, attackers can observe a state in the PHT entry that collides with the victim, enabling them to leak the control flow information of a victim process. Any state changes caused by the victim will reveal whether the victim’s target branch has been taken or not. In this paper, we present BranchCloak, a novel software-based mitigation technique for PHT-based side-channel attacks. The main idea of BranchCloak is to obfuscate the PHT state by augmenting the victim’s program with some r-branches near the target branch. The r-branch is a conditional branch instruction that has the following properties: (1) it collides with the target branch in the PHT, and (2) its branching decision is made uniformly at random. BranchCloak can successfully mitigate the attack without hardware modification of the vulnerable processors. By performing extensive experiments with practical applications, we show that the performance overhead of BranchCloak is negligible.

Keywords: branch predictor; microarchitectural attack; pattern history table; side-channel attack

1. Introduction

Modern microprocessors employ various optimization techniques to improve processing performance. However, these techniques are prone to microarchitectural side-channel attacks [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], as the underlying components are designed to temporarily store information about the executing program. Recent studies have emphasized that the exposure of such temporal states can create vulnerability windows for side-channel exploitation [19,20]. The branch predictor unit (BPU) is one of the vulnerable microarchitectural components that creates side channels. However, the BPU is essential for maintaining high performance in modern processors. To optimize pipeline utilization, processors must continuously fetch and execute instructions. Given that the control flow of a program can depend on the outcome of branch instructions, processors would otherwise need to stall until the branch condition is resolved. To circumvent these stalls, the BPU predicts the outcome and allows the processor to speculatively execute instructions along the predicted path. BPUs typically comprise three kinds of predictors: a directional predictor using a pattern history table (PHT), a target address predictor using a branch target buffer (BTB), and a return address predictor using a return stack buffer (RSB). Every conditional branch instruction refers to the PHT for branch prediction.

PHT-based side-channel attacks [21,22,23,24] exploit the PHT to reveal a victim’s control flow information concerning conditional branches, thereby enabling the inference of the victim’s secret. The primary objective of control flow leakage attacks is to extract sensitive data from already existing programs, necessitating the identification of attack gadgets. In cases where there is another leakage source (e.g., branch or cache access) at the two destinations of a conditional branch, it is possible to carry out attacks using hardware units other than the PHT state [25,26]. However, such attacks require more complex code gadgets. Compared to attacks that need to find complex gadgets where a conditional branch and another leakage source exist at their destinations, PHT-based side-channel attacks can be executed with no such restrictions, requiring only a single conditional branch, making them exceptionally powerful.

Many research efforts have been made to devise countermeasures against PHT-based attacks. The straightforward solution is to substitute conditional branch instructions in the victim’s code that rely on the susceptible PHT with other branch instructions, such as an indirect jump [27,28,29,30,31]. Although these techniques can be effective in preventing attacks, they may increase the attack surface of another side-channel attack that exploits the BTB instead. On the other hand, hardware-based mitigating approaches aim to isolate microarchitectural components such as the PHT and caches [32,33,34]. Indeed, some hardware-based techniques have proved to be effective in mitigating microarchitectural attacks [35,36,37,38,39]. However, they are restrictive due to the requirement of hardware modification in the vulnerable processors.

In this paper, we propose BranchCloak, a novel software-based mitigation technique for PHT-based side-channel attacks. The main idea of BranchCloak originates from the fact that the attacker needs to observe the PHT state changes caused by the victim to know whether the target branch has been taken or not. BranchCloak attempts to obfuscate the PHT state by augmenting the victim’s program with additional branch instructions, named r-branches, near the target branch. The r-branch is a conditional branch instruction with the following properties: (1) it occupies the same PHT entry as the target branch instruction, and (2) its branch direction is determined uniformly at random. As the r-branch randomizes the state of the PHT entry of interest, the attacker cannot infer the exact state change caused by the target branch. We prove that BranchCloak offers perfect security through rigorous security analysis.

BranchCloak is a software-based security measure. It can be implemented on a target application without any hardware modifications to vulnerable processors, only by modifying the program’s source code. There exist several software-based techniques [27,30,31,40] that are applicable for mitigating PHT-based attacks. As shown in our performance evaluation, the performance overhead of BranchCloak is minimal compared to previous software-based solutions. We show the effectiveness of BranchCloak by conducting experiments under practical application settings that use cryptographic libraries such as OpenSSL, MbedTLS, and Libgcrypt.

Scope of BranchCloak. The primary objective of BranchCloak is to mitigate PHT-based side-channel attacks. It is important to note that the scope of BranchCloak does not extend to microarchitectural side-channel attacks that target the cache or other hardware components. Additionally, Spectre-PHT-type attacks [1] are considered outside the scope of this study, despite their use of the PHT, as these attacks leverage the PHT to mistrain the branch predictor, not to transmit sensitive data over the PHT. Specifically, BranchCloak addresses the attacks where secret-dependent microarchitectural footprints are only observable in the PHT, but not in other components such as the cache. For instance, let us consider the following code snippet, in which all the instructions and data (including A and B) fit in a single cache line. This condition on A and B can be relaxed such that they occupy the same cache lines in a balanced fashion. In this code, the cache does not reveal the secret, but the PHT does.

if (secret) { A; } else { B; }

Indeed, this is one of the common implementation practices in constant-time programming (e.g., a scatter–gather technique in OpenSSL [41]). However, the above code is not secure against PHT-based side-channel attacks. As many previous studies have shown, even secure (constant-time) implementations have difficulty mitigating all microarchitectural side-channel attacks. Notably, BranchCloak’s approach involves only adjusting the lower address bits of the target branch and inserting dummy branches, which leaves open the possibility of applying other mitigation techniques against a wide range of microarchitectural side-channel attacks in parallel. By substantially decreasing the remaining attack surface, BranchCloak has the potential to play a vital role in strengthening the security of current solutions.

Contributions. The main contributions of this paper are as follows:

We propose BranchCloak, a novel software-based mitigation technique against PHT-based side-channel attacks. BranchCloak hinders attackers from inferring the branch direction of a target branch by randomizing the corresponding PHT-state with r-branches.
We reverse-engineer the PHT structure of Intel processors to learn how to collide a target branch with r-branches in the PHT, which is necessary to implement BranchCloak.
We implement BranchCloak, and evaluate its performance regarding the execution and storage overhead by performing extensive experiments with real-world cryptographic applications.

The paper is organized as follows. Section 2 provides some preliminary knowledge about branch predictor units and PHT-based side-channel attacks. Section 3 draws up the attack model and presents BranchCloak, the proposed mitigation method to protect against PHT-based side-channel attacks. Section 4 and Section 5 evaluate the effectiveness of BranchCloak by conducting security and performance analysis. Section 6 discusses the practicality, scalability, and limitations associated with the application of BranchCloak. Section 7 presents related work. Finally, we conclude the paper in Section 8.

2. Background

2.1. Branch Predictor Unit

The branch predictor unit is a microarchitectural element that allows modern processors to efficiently predict branches and minimize instruction pipeline stalls. In this paper, we focus on the directional branch predictor. Figure 1 shows the hybrid structure of the directional predictor. It supports two prediction modes: a one-level predictor, also known as a local predictor, and a two-level predictor, commonly referred to as a global predictor [21,22,23,42]. Each predictor is composed of multiple PHT entries of n-bit saturating counters. For instance, a previous study has experimentally revealed that the Intel processor has $2^{14}$ entries in a PHT for a one-level predictor, where each PHT entry has a two-bit saturating counter with four states: Strongly Taken (ST), Weakly Taken (WT), Weakly Not-taken (WN), and Strongly Not-taken (SN) [21]. In the one-level predictor, the address of a branch instruction is the only factor that determines the index of the PHT entry that it occupies. The transition of the PHT state takes place according to the execution result of a branch instruction. For instance, if a branch instruction in the WN state is executed as Taken (i.e., the branch prediction fails), then the PHT state moves to WT.
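The two-bit saturating counter described above can be modeled in a few lines of C. The following is a minimal sketch for illustration only: the numeric encoding of the states (SN = 0 through ST = 3) and the names `pht_state`, `pht_predict_taken`, and `pht_update` are assumptions made here and do not correspond to any processor interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of the four counter states, ordered from SN to ST. */
typedef enum { SN = 0, WN = 1, WT = 2, ST = 3 } pht_state;

/* The entry predicts "taken" in the two upper states (WT and ST). */
bool pht_predict_taken(pht_state s) {
    return s == WT || s == ST;
}

/* Move one step toward ST on a taken branch and one step toward SN on a
   not-taken branch, saturating at both ends. */
pht_state pht_update(pht_state s, bool taken) {
    assert((int)s >= SN && (int)s <= ST);
    if (taken) {
        return (s == ST) ? ST : (pht_state)(s + 1);
    }
    return (s == SN) ? SN : (pht_state)(s - 1);
}
```

In this model, `pht_update(WN, true)` returns `WT`, which corresponds to the mispredicted taken branch in the example above.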

Unlike a one-level predictor, a two-level predictor uses a global branch history to determine the index of a PHT entry in addition to the address of a branch instruction. The global branch history is maintained in a special register called a global history register (GHR). It has been demonstrated that Intel’s two-level predictor employs a three-bit saturating counter for each PHT entry [22]. This history-based method is suitable for predicting periodically repeated patterns in complex code flows, but it requires a relatively long training time compared to the one-level predictor.

The Tagged Geometric History Length (TAGE) predictor [43] is a state-of-the-art conditional branch predictor that is widely utilized in various processor models. Research has revealed that TAGE is employed in the latest CPUs from Intel [26,40] and ARM [24], and AMD has announced the use of TAGE predictors in their processors starting with the Zen 2 architecture [44]. The TAGE predictor is composed of a base predictor and a multi-level tagged component, with each entry consisting of a saturating counter. The base predictor indexes entries solely based on the branch address, similar to a one-level predictor, while each tagged component indexes entries based on the branch address and increasing GHR lengths.

2.2. PHT-Based Side-Channel Attacks

PHT-based side-channel attacks leverage collisions of PHT entries between the attacker and the victim. This technique is similar to Spectre-type attacks [1], which also exploit the PHT, but with a different objective. While Spectre attacks aim to create transient execution by mistraining the PHT, PHT-based attacks seek to infer the victim’s branch prediction result by monitoring the PHT’s state changes. In this section, we provide an introduction to PHT-based side-channel attacks, which includes a PHT collision and a general description of the attack.

2.2.1. PHT Entry Collision

The initial phase of the PHT-based side-channel attack involves identifying the PHT entry that corresponds to the victim’s target branch. The approach for determining the specific PHT entry depends on the prediction mode used.

If a one-level directional branch predictor is used, only the address of a branch instruction is used to determine the index of the PHT. As a result, two branch instructions in different processes that have the same virtual address will refer to the same entry in the PHT. Figure 2 illustrates a PHT entry collision on the one-level predictor. The conditional branch instructions of the victim and attacker processes have the same virtual address (0xff00) and refer to the same PHT entry. The control flow of the victim’s conditional branch varies depending on the one-bit secret value secret, and the state change of the PHT entry caused by the victim’s branch can be observed through the attacker’s branch at the same virtual address. For example, BranchScope [21] generates a PHT entry collision by placing a branch in the attacker’s process with the same virtual address as the victim’s secret-dependent branch.

When using a two-level predictor, the global branch history is another factor that determines the index of the PHT in addition to the virtual address of a branch instruction. However, if a technique proposed by Bluethunder [22] and BranchSpectre [23] is used to fix the global branch history, the attacker can focus only on the virtual address. This enables the attacker to create a collision in the PHT entry with the victim in a two-level predictor, similar to the way in the one-level predictor.

By exploiting the PHT entry collision, the attacker can infer whether the secret-dependent branch of the victim’s process has been taken or not through the branch placed within the attacker’s process. Details about the attack process are outlined in the following section.

2.2.2. General Attack Description

In general, PHT-based side-channel attacks proceed in three steps:

Step 1: (Initialize) The attacker forces a victim to use either a one-level or two-level directional predictor and initializes a target PHT entry into the attacker’s desired state.
Step 2: (Wait) The attacker waits for the victim to execute the target branch. The PHT state will change according to the execution result. For instance, the state changes one step toward Strongly Taken if the target branch was executed as taken. Otherwise, the state changes toward Strongly Not-taken.
Step 3: (Probe) After the victim’s execution, the attacker probes to see whether the target branch was actually taken or not. For this, the attacker uses his/her own branch instruction that collides with the target branch to infer the PHT state.

Figure 3 illustrates two instances of PHT-based side-channel attacks on a one-level predictor with different initial PHT states; one is a case where the PHT is initialized with Weakly Taken (Figure 3a), and the other is when the PHT is initialized with Strongly Taken (Figure 3b). Suppose that the victim’s target branch is executed as taken (i.e., the secret is true) in Step 2. In the first attack instance, the attacker’s probe branch, executed as taken, then results in a prediction hit in Step 3. In contrast, when the secret is false, the same probe branch executed as taken results in a prediction miss.

The second attack instance is actually the case of BranchScope [21] and BranchSpectre [23]. If the PHT is initialized as Strongly Taken and uses an n-bit counter, the result of the $2^{n-1}$-th branch prediction serves as a reference point for determining the secret value [23]. For instance, a one-level predictor that employs a two-bit counter for a PHT entry can enable inference of the secret value based on the second branch prediction result.
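Both attack instances can be replayed on a software model of a two-bit counter. The sketch below is illustrative only and simulates the three steps rather than performing any measurement: the state encoding and the helper names (`pht_update`, `pht_predict_taken`, `kth_probe_mispredicts`) are assumptions, and the attacker's probe direction is passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of the two-bit counter states. */
typedef enum { SN = 0, WN = 1, WT = 2, ST = 3 } pht_state;

bool pht_predict_taken(pht_state s) {
    return s == WT || s == ST;
}

pht_state pht_update(pht_state s, bool taken) {
    if (taken) {
        return (s == ST) ? ST : (pht_state)(s + 1);
    }
    return (s == SN) ? SN : (pht_state)(s - 1);
}

/* Step 1: the entry starts in `init`.
   Step 2: the victim executes the target branch once (taken iff `secret`).
   Step 3: the attacker executes `k` colliding probe branches, all in
           direction `probe_taken`.
   Returns true if the k-th probe (1-based) is mispredicted. */
bool kth_probe_mispredicts(pht_state init, bool secret,
                           bool probe_taken, int k) {
    assert(k >= 1);
    pht_state s = pht_update(init, secret);
    bool miss = false;
    for (int i = 0; i < k; i++) {
        miss = (pht_predict_taken(s) != probe_taken);
        s = pht_update(s, probe_taken);
    }
    return miss;
}
```

With a Weakly Taken entry and a taken probe, the first probe already separates the two secret values. With a Strongly Taken entry and not-taken probes, the first probe mispredicts in both cases and only the second one differs, in line with the $2^{n-1}$-th prediction rule for n = 2.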

For a successful attack, it is necessary to slow down the victim’s execution. Otherwise, the attacker might not be able to correctly probe the victim’s target branch. Moreover, the attacker must be able to force the victim to use a one-level or a two-level predictor depending on their attack needs. There are well-established techniques to achieve this. For instance, it has been experimentally proved that the attacker can force the use of a one-level predictor by executing at least 100,000 conditional jumps prior to the attack [21]. Another work [40] further validated this notion by illustrating the attacker’s ability to manipulate the local predictor of the TAGE predictor. Similarly, the victim can be forced to use a two-level predictor by repeating a predetermined sequence of conditional jumps [22,23].

3. The Proposed Method

In this section, we present the proposed software-based mitigation technique against PHT-based side-channel attacks. We first describe our attack model and then present the proposed method and its implementation in detail.

3.1. Attack Model

The attacker’s goal is to determine the direction of a target branch in the victim process, i.e., whether the target branch has been executed as taken or not. While PHT-based side-channel attacks are possible on the target system, it is possible that other microarchitectural side-channel vulnerabilities [45,46,47,48,49] may exist simultaneously. For instance, different branches of a conditional statement may be mapped to separate cache lines. However, this can be mitigated through constant-time programming techniques that ensure a secret-dependent conditional statement is packed within a single cache line. Additionally, cache-based side-channel attacks can be mitigated through the detection of destructive cache flush operations [50,51,52] or deploying dynamic software-diversity techniques [53]. In our attack model, we consider a victim with a side-channel vulnerability exclusively in the PHT, and an attacker who attempts to leak secrets solely through PHT-based side channels.

Our attack model basically follows the model used in PHT-based side-channel attacks [21,22,23]. First, we assume that the attacker has the ability to locate their process to the same physical core as that of the victim, such that the BPU is shared between the attacker and victim’s process. The attacker can achieve co-residency with the victim using some techniques, such as manipulating an OS scheduler [54].

Second, we assume that the attacker can introduce a collision in a PHT entry. That is, the attacker can identify the address of a target branch instruction in the victim’s process because the attacker has access to the source code and program binary of the victim’s program. Once identified, the PHT entry collision can be made and the PHT state can be initialized to a desired state using well-established techniques [21,22,23]. Address space layout randomization (ASLR) may hinder the attacker from identifying the address of the victim’s target branch. However, the attacker can de-randomize ASLR by exploiting data disclosure or side-channel vulnerabilities [21].

Third, we consider an attacker with the ability to slow down the victim process, which is necessary for the attack to investigate the PHT state of a target branch instruction. This can be achieved by exploiting the OS scheduler [55], conducting performance degradation attacks [56], or periodically interrupting the victim process [57,58]. A multi-stepping attacker can slow down the victim’s execution enough to read the target branch’s PHT state meaningfully but cannot examine the PHT state at the granularity of a single instruction. In contrast, a single-stepping attacker is capable of fine-grained, cycle-level interruptions of the victim process to implement precise single-stepping. Specifically, we classify the attacker’s capabilities into multi-stepping and single-stepping attacks, with BranchCloak’s defense mechanisms varying accordingly. For multi-stepping attacks, PHT-level obfuscation is applied, whereas for single-stepping attacks, branch-level obfuscation is applied. However, we assume the attacker does not have the capability to retrieve the victim’s current instruction pointer. For instance, SGX-Step [57] enables the precise retrieval of the instruction pointer within Intel’s SGX environment, which we consider out of scope in our threat model.

Finally, we assume that the attacker has access to a high-resolution timestamp counter (e.g., a rdtsc instruction) to determine whether the branch prediction was hit or missed.

3.2. BranchCloak

3.2.1. Overview

The main idea of BranchCloak is to obfuscate the PHT state so as to hinder the attacker from inferring the direction of the target branch. For this, BranchCloak inserts additional branches, which are referred to as r-branches, in the victim’s program. The r-branch is a conditional branch instruction that has the following properties: (1) its branch direction is determined uniformly at random, and (2) it occupies the same PHT entry as the target (secret-dependent) branch instruction.

Figure 4 illustrates how BranchCloak mitigates an attack. Figure 4a shows an original attack where BranchCloak is not applied to a victim. In this case, the direction of the victim’s target branch (in line 3) is easily inferred by the attacker. The attacker can manipulate the virtual address of the attacker branch such that it maps to the same PHT entry as the target branch, thereby enabling a PHT-based side-channel attack. With BranchCloak applied (in Figure 4b), two r-branches (in lines 2 and 7) are inserted near the target branch in the victim’s process. The attacker will try to leak the control flow information of the secret-dependent branch by probing the changed PHT state. However, the r-branch is deliberately constructed to reference the same PHT entry as the target branch, thereby obfuscating the PHT state before the attacker can extract changes caused by the target branch. Unless the attacker is able to distinguish the branch instruction that actually affected the state, he/she cannot infer the direction of the target branch.

Implementing BranchCloak is not straightforward, and we need to tackle two problems for a successful implementation. First, in order to make the r-branch and the target branch refer to the same PHT entry, we need to figure out the indexing structure of the PHT. How many lower bits of the address of the branch instruction are used to determine the PHT index? In the case of BTB, it was discovered that partial address bits of the indirect branch instructions are used in BTB indexing [1,31,54]. Similarly, we hypothesize that partial address bits of the conditional branch are used in PHT indexing. To verify the hypothesis, we need to reverse-engineer the PHT structure in Intel processors. Details of our analysis of the PHT structure are presented in Section 3.2.2.

Second, we need to determine how many r-branches BranchCloak needs to insert and in which direction they should be executed to protect against PHT-based side-channel attacks. For this, we perform a probabilistic analysis to determine the direction and the optimal number of r-branches. Details of our analysis are presented in Section 3.2.3.

3.2.2. The Number of Bits to Align in PHT

We verify our hypothesis on Intel Core CPUs (Skylake, Comet Lake, Rocket Lake), confirming that partial address bits of the conditional branch instructions actually determine the PHT index. From this verification, we can be sure that two branches whose partial address bits are aligned will refer to the same PHT entry. Furthermore, we determine the number of aligned bits required to make a PHT collision.

Listing 1 shows a code snippet of our program used for the experiment. In order to align the addresses of two conditional branch instructions (i.e., lines 5 and 14), we use .p2align directives. This directive aligns the subsequent instruction to a boundary of $2^{k}$ bytes, where k is the first argument (e.g., 15), and the padding created by the alignment is filled with the value given in the second argument (e.g., 0x90, the nop opcode). For instance, in the code snippet in Listing 1, two conditional jump instructions are 15-bit aligned (i.e., the lower 15 bits of their addresses are identical) and the padding is filled with nop instructions.

Listing 1. Two branches referencing the same PHT entry.

We perform the experiment by changing the alignment size (i.e., the first argument of the .p2align) to check whether the two branches collide in the PHT. Our experiment procedure is as follows:

Step 1: Initialize the branch prediction unit so as to activate a one-level predictor. We use a PHT randomization code [21] to achieve this.
Step 2: Run the branch_one() function with the argument a = 1 twice to make the state of the PHT entry change to taken. As the one-level predictor uses a two-bit saturating counter, the PHT state moves to taken if at least two executions of the branch instruction are taken, regardless of the initialized state.
Step 3: Run the branch_two() function with the argument b = 1 and measure the branch prediction result. If the two branches share the same PHT entry, the execution of branch_two() will result in a prediction hit.

We utilized the rdtsc instruction to measure the time taken to execute each branch, thereby identifying branch prediction hits and misses. To ensure that the branch instruction was executed precisely between two rdtsc instructions, we inserted mfence instructions before and after the branch.

From these experimental results, we learn that the prediction hit, measured in Step 3 above, occurs with at least a 95% probability, provided that the alignment size (i.e., the first argument of .p2align) is at least 15. On the other hand, the prediction accuracy falls below 50% if the alignment size is less than 15, which indicates that the 15th address bit is critical for PHT indexing. As a result, we discover that a PHT entry collision occurs if the lower 15 bits of the addresses of the two conditional branch instructions match.
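The collision condition found in this experiment amounts to comparing the low address bits of two branch instructions. The helper below is a small sketch of that check: the function name `low_bits_match` is hypothetical, and the number of compared bits is left as a parameter so that the value 15 reported above appears only at the call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the lowest `bits` bits of the two addresses are identical.
   With bits = 15 this mirrors the collision condition reported in the text
   for the one-level predictor; it is a model, not a hardware query. */
bool low_bits_match(uintptr_t addr_a, uintptr_t addr_b, unsigned bits) {
    assert(bits > 0 && bits < 8 * sizeof(uintptr_t));
    const uintptr_t mask = ((uintptr_t)1 << bits) - 1;
    return ((addr_a ^ addr_b) & mask) == 0;
}
```

For example, `low_bits_match(0x10000, 0x18000, 15)` holds because the two addresses differ only in bit 15 and above, whereas `low_bits_match(0x10000, 0x14000, 15)` does not because bit 14 differs.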

3.2.3. The Randomizing Branch

BranchCloak attempts to hinder the attacker from inferring the PHT state by employing r-branches, which share the same PHT entry as the target branch, thus obfuscating the PHT state. Hence, for BranchCloak to be effective, we need to answer the following two questions:

Q1. How many r-branches are necessary to obfuscate the PHT state?
Q2. In which direction should the r-branches be executed?

In order to answer these questions, we perform a probabilistic analysis. We define $E_{n}^{S}$ to be an event where the PHT state finally moves to $S$ after the n-th execution of the branch instruction, where $S \in \{ST, WT, SN, WN\}$. Then, the probability $P_{n}^{S}$ is defined as

$$P_{n}^{S} = \Pr[E_{n}^{S}].$$

For brevity, we also define a probability $P_{n}$ as

$$P_{n} = \Pr[E_{n}^{WT} \lor E_{n}^{ST}].$$

According to the definition, the following equation holds when each branch is taken with a probability of 1/2:

$$\begin{aligned} P_{n} &= P_{n}^{ST} + P_{n}^{WT} \\ &= \frac{1}{2} (P_{n-1}^{ST} + P_{n-1}^{WT}) + \frac{1}{2} (P_{n-1}^{ST} + P_{n-1}^{WN}) \\ &= \frac{1}{2} (P_{n-2}^{ST} + P_{n-2}^{WT}) + \frac{1}{4} (P_{n-2}^{ST} + P_{n-2}^{WN} + P_{n-2}^{WT} + P_{n-2}^{SN}) \\ &= \frac{1}{2} P_{n-2} + \frac{1}{4} \end{aligned} \tag{1}$$

The general term of the recursion Formula (1) can be derived as follows:

$$\begin{aligned} P_{n} &= \frac{1}{2} P_{n-2} + \frac{1}{4} \\ \Leftrightarrow \quad P_{n} - \beta &= \alpha (P_{n-2} - \beta) \end{aligned} \tag{2}$$

The values of $\alpha$ and $\beta$ that satisfy the above equation are as follows:

$$\alpha = \frac{1}{2}, \quad \beta = \frac{1}{2}$$

Substituting these into Equation (2) and denoting the left-hand side as $Q_{n}$ (i.e., $Q_{n} = P_{n} - \frac{1}{2}$) results in the following geometric sequence:

$$Q_{n} = \frac{1}{2} Q_{n-2} \tag{3}$$

For $n = 2k$ and $n = 2k + 1$, the general terms of the geometric sequence are as follows:

$$Q_{2k+1} = \frac{1}{2^{k}} Q_{1} \quad (n = 2k + 1) \tag{4}$$

$$Q_{2k} = \frac{1}{2^{k}} Q_{0} \quad (n = 2k) \tag{5}$$

Substituting $P_{n}$ back into $Q_{n}$ and simplifying the equation, we obtain the following general term:

$$P_{2k+1} = \frac{1}{2} + \frac{1}{2^{k}} \left( P_{1} - \frac{1}{2} \right), \qquad P_{2k} = \frac{1}{2} + \frac{1}{2^{k}} \left( P_{0} - \frac{1}{2} \right)$$

Both of the above equations converge to $\frac{1}{2}$ when n goes to infinity.

$$\lim_{n \to \infty} P_{n} = \frac{1}{2}$$
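Recursion (1) and its limit can be checked numerically by propagating the state distribution of a two-bit counter through n independent branches, each taken with probability 1/2. The code below is a sketch under that assumption; the function name `p_taken_after_random` and the integer state encoding are illustrative choices, not part of the original analysis.

```c
#include <assert.h>

/* Assumed encoding of the two-bit counter states. */
enum { SN = 0, WN = 1, WT = 2, ST = 3, NUM_STATES = 4 };

/* Returns P_n: the probability that the entry is in WT or ST after n
   branches, each taken independently with probability 1/2, when the
   entry starts in state `init`. */
double p_taken_after_random(int init, int n) {
    assert(init >= SN && init <= ST && n >= 0);
    double dist[NUM_STATES] = {0.0, 0.0, 0.0, 0.0};
    dist[init] = 1.0;
    for (int step = 0; step < n; step++) {
        double next[NUM_STATES] = {0.0, 0.0, 0.0, 0.0};
        for (int s = 0; s < NUM_STATES; s++) {
            int up = (s == ST) ? ST : s + 1;    /* branch taken */
            int down = (s == SN) ? SN : s - 1;  /* branch not taken */
            next[up] += 0.5 * dist[s];
            next[down] += 0.5 * dist[s];
        }
        for (int s = 0; s < NUM_STATES; s++) {
            dist[s] = next[s];
        }
    }
    return dist[WT] + dist[ST];
}
```

Starting from SN, the function gives 1/4 for n = 2 and 3/8 for n = 4, and successive values satisfy $P_{n} = \frac{1}{2} P_{n-2} + \frac{1}{4}$ while moving toward 1/2 only gradually.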

Since $P_{n}$ is defined as the probability of ending in a Taken state ($\{WT, ST\}$), we only need to consider the probability of the state changing from a Not-taken state ($\{WN, SN\}$) to Taken. Table 1 shows the probabilities of the PHT state finally moving to Taken right after executing the target branch n times ($n = 0, 1, \dots, 5$) with respect to the initial PHT state. In the table, $P_{n}$ refers to the total probability when n branches all have a probability of 50% each to be taken or not taken, rather than n branches taken in the same direction.

As the goal of BranchCloak is to deviate the PHT state from the execution result of the victim’s target branch instruction, it would be more effective if BranchCloak could create a significant change in the PHT state. If r-branches inserted by BranchCloak are taken with a probability of 50% each, the probability that the PHT state moves from Not-taken to Taken and vice versa does not significantly increase. As shown in Table 1, if two branches are inserted, then the probability $P_{2}$ becomes $P_{2} = 1/4$, so the attack success rate is reduced by only 25%. However, if the r-branches follow the same direction, the probability of a PHT state change increases significantly. We define $\tilde{E}_{n}^{S}$ to be an event where the PHT state finally moves to $S \in \{ST, WT, SN, WN\}$ after the n-th execution of the branch instruction, given that all the branches have been executed in the same direction. We also define the probabilities $\tilde{P}_{n}^{S}$ and $\tilde{P}_{n}$ as

$$\tilde{P}_{n}^{S} = \Pr[\tilde{E}_{n}^{S}] \quad \text{and} \quad \tilde{P}_{n} = \Pr[\tilde{E}_{n}^{WT} \lor \tilde{E}_{n}^{ST}],$$

respectively. The values of $\tilde{P}_{n}$ are shown in Table 2. The probability $\tilde{P}_{2}$ becomes $\tilde{P}_{2} = 1/2$, and the attack success rate is reduced by 50%, which fully randomizes the PHT states. Since $\tilde{P}_{n}$ approaches $1/2$ if $n \geq 2$, BranchCloak inserts two r-branches following the same direction with one random value.
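The effect of running all r-branches in one shared random direction can be illustrated on the same two-bit counter model. The following sketch assumes a single fair random bit decides the common direction of the n branches; the function name `p_taken_after_same_direction` and the state encoding are hypothetical.

```c
#include <assert.h>

/* Assumed encoding of the two-bit counter states. */
enum { SN = 0, WN = 1, WT = 2, ST = 3 };

/* Returns the probability that the entry is in WT or ST after n branches
   that all follow one shared direction, chosen as taken or not taken with
   probability 1/2 each, when the entry starts in state `init`. */
double p_taken_after_same_direction(int init, int n) {
    assert(init >= SN && init <= ST && n >= 0);
    /* State reached if all n branches are taken (saturates at ST). */
    int all_taken = (init + n > ST) ? ST : init + n;
    /* State reached if all n branches are not taken (saturates at SN). */
    int none_taken = (init - n < SN) ? SN : init - n;
    double p = 0.0;
    if (all_taken >= WT) {
        p += 0.5;
    }
    if (none_taken >= WT) {
        p += 0.5;
    }
    return p;
}
```

In this model, two same-direction branches yield a probability of 1/2 from every initial state, whereas one such branch starting from SN or ST leaves the entry on its original side of the prediction boundary.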

3.2.4. BranchCloak with Two-Level Predictor

In the previous sections, we presented BranchCloak regarding the attack targeting the one-level predictor. Essentially, two-level predictor attacks also utilize the collided PHT entry as a leakage source. However, if both the target branch and r-branches are executed concurrently between the attacker’s initialization phase and the probing phase, the GHR is altered by the target branch, preventing PHT entry collisions between the target branch and the r-branches. Consequently, the state obfuscation-based defense mechanism of BranchCloak is rendered ineffective for multi-stepping attackers. However, the effectiveness of BranchCloak against single-stepping attackers targeting the two-level predictor remains valid and is analyzed in Section 4.3.

3.3. Implementation

BranchCloak is designed to be easily integrated into applications without hardware modification. With this design principle in mind, we customize a GCC toolchain to implement BranchCloak. In particular, we extend the C language with two new directives, SECURE_start and SECURE_end. Software developers use these directives to mark a target conditional branch statement in the source code. Listing 2 is an example of a code snippet that has a secret-dependent branch statement surrounded by the directives.

Listing 2. Code snippet with a secret-dependent conditional branch.

As shown in Listing 3, our customized compiler then implicitly inserts r-branches (Lines 8, 9, 12, and 13) after the target branch statement. These r-branches have the same branch direction, which is determined uniformly at random. We emphasize that an actual implementation should draw this random value from a cryptographically secure pseudorandom number generator (CSPRNG).

Listing 3. Code snippet with two r-branches augmented.

The compiler then aligns the inserted r-branches so that they occupy the same PHT entry as the target branch instruction. For simplicity, we use the .p2align directive to align all the branch instructions in memory. Listing 4 shows the code snippet in assembly after the r-branches (Lines 7, 12, 20, and 25) have been aligned with the target branch instruction (Line 4).

Listing 4. Code snippet in assembly with r-branches aligned.

The location of r-branches. The r-branches may be located right after or before the target conditional branch statement. It is noteworthy that the security of BranchCloak depends on the location of the r-branches. That is, putting r-branches before the target branch degrades the security level of BranchCloak as we discuss in detail in the next section. Thus, in our implementation, we decide to locate the r-branches right after the target branch as shown in Listing 3.

4. Security Analysis

4.1. Security Regarding the Location of r-Branches

PHT-based side-channel attacks generally leak the secret value one bit at a time. As the branch decisions of r-branches in BranchCloak are made uniformly at random, the PHT state obfuscated by r-branches will reveal no information regarding the secret to attackers, achieving perfect security. Nevertheless, the security of BranchCloak may be affected by the placement of r-branches, i.e., whether they are positioned before or after the target branch.

We analyze the security of BranchCloak with respect to the placement of r-branches. PHT-based side-channel attacks have variants according to the initial PHT state configured by the attacker; several instances of the variant are illustrated in Figure 3. Although the attack also can be initiated with other states such as WN (Weakly Not-taken) and SN (Strongly Not-taken), we do not consider them in the security analysis as they are symmetrical to the variants.

Security with r-branch after target. We analyze the security of BranchCloak in the case that r-branches are placed after the target branch instructions (see Figure 5a,b). As per the following theorem, in this case, BranchCloak does not disclose any information about the original PHT state of the target branch.

Theorem 1

(Perfect security). The PHT state obfuscated by BranchCloak gives no information about the original PHT state if two r-branches are placed after the target branch.

Proof.

In line with prior quantitative side-channel analyses [59], we analyze the mutual information between the direction of the target branch,

X \in \{T, NT\}

, and the final PHT state after executing the r-branches (if it is at Taken or Not-taken),

O \in \{T, NT\}

. The mutual information

I (X; O)

is defined as

I (X; O) = H (X) - H (X | O)

where

H (X)

denotes the entropy of the target branch direction and

H (X | O)

is the conditional entropy of the target branch direction given the final PHT state.

Define the initial PHT state

S_{0} \in \{SN, WN, WT, ST\}

and let the possible PHT states after the execution of the target branch be

S_{X}

where X stands for the direction of the target branch (T for Taken and NT for Not-taken). Then, the following equation holds:

\begin{matrix} S_{T} & = \{WN, WT, ST\}, \\ S_{NT} & = \{WT, WN, SN\} . \end{matrix}

Furthermore, let the final state after executing the r-branches be

\begin{matrix} S_{X}^{Y} = \{S_{T}^{T}, S_{T}^{NT}, S_{NT}^{T}, S_{NT}^{NT}\}, \end{matrix}

where X stands for the direction of the target branch, and Y stands for the direction of the r-branches. The attacker will probe the target branch with the PHT in state

S_{X}^{Y}

. Since the two r-branches are executed in the same direction, the possible PHT states after the execution of the target branch and the r-branches are shown in Table 3:

Then, we have the following assignments for the propositional variables:

\left\{\begin{array}{l} p_{1} : S_{X}^{Y} \in \{WT, ST\} \\ q_{1} : Y = T \end{array}\right.

\left\{\begin{array}{l} p_{2} : S_{X}^{Y} \in \{WN, SN\} \\ q_{2} : Y = NT \end{array}\right.

Based on Table 3 expressing all possible cases, the statements

p_{1} \leftrightarrow q_{1}

and

p_{2} \leftrightarrow q_{2}

hold. Furthermore, noting that

O = T

corresponds to the case where

S_{X}^{Y} \in \{WT, ST\}

, the following conditional probabilities hold if

Y \sim \mathrm{Uniform}(\{T, NT\})

:

\begin{matrix} P (O = T | X = T) & = P (Y = T) = 0.5, \\ P (O = T | X = N T) & = P (Y = T) = 0.5, \\ P (O = N T | X = T) & = P (Y = N T) = 0.5, \\ P (O = N T | X = N T) & = P (Y = N T) = 0.5 . \end{matrix}

Given this symmetry, the conditional distribution

P (X | O)

is identical to the prior distribution

P (X)

. Specifically, using Bayes’ rule,

\begin{matrix} P (X = x | O = o) & = \frac{P (O = o | X = x) P (X = x)}{P (O = o)} . \end{matrix}

Since

P (O = o | X = x) = 0.5

for all

o \in \{T, NT\}

, the numerator becomes

0.5 \cdot P (X = x)

, and the denominator is

\begin{matrix} P (O = o) = \sum_{x} P (O = o | X = x) \cdot P (X = x) = 0.5 \cdot \sum_{x} P (X = x) = 0.5 . \end{matrix}

Thus,

\begin{matrix} P (X = x | O = o) = P (X = x), \end{matrix}

which confirms that X and O are statistically independent. As a result, the conditional entropy remains unchanged:

\begin{matrix} H (X | O) = \sum_{o} P (O = o) \cdot H (X | O = o) = \sum_{o} P (O = o) \cdot H (X) = H (X), \end{matrix}

since

P (X | O = o) = P (X)

for all o. Therefore, the mutual information reduces to

\begin{matrix} I (X; O) = H (X) - H (X | O) = H (X) - H (X) = 0 . \end{matrix}

This confirms that the final PHT state O does not reduce the uncertainty of the direction of the target branch X. Consequently, no information is leaked to the attacker through the branch prediction result by the attacker’s probing (if it is predicted as Taken or Not-taken).

□
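Theorem 1 can also be checked numerically. The sketch below assumes a textbook 2-bit saturating counter and a uniformly distributed target direction X. It enumerates every target direction and r-branch direction for a given initial state and computes I(X; O) from the resulting joint distribution. The counter encoding and helper names are illustrative assumptions.

```python
from collections import Counter
from math import log2

SN, WN, WT, ST = 0, 1, 2, 3  # assumed encoding of the 2-bit counter


def step(state, taken):
    """Advance a saturating counter by one branch outcome."""
    return min(state + 1, ST) if taken else max(state - 1, SN)


def mutual_information_after(initial_state, n_rbranches=2):
    """I(X;O) in bits when same-direction r-branches run after the target."""
    joint = Counter()
    for x in (True, False):          # target branch direction X
        for y in (True, False):      # shared r-branch direction Y
            state = step(initial_state, x)
            for _ in range(n_rbranches):
                state = step(state, y)
            o = state >= WT          # attacker observes Taken / Not-taken
            joint[(x, o)] += 0.25    # X and Y are independent fair coins
    px, po = Counter(), Counter()
    for (x, o), p in joint.items():
        px[x] += p
        po[o] += p
    return sum(p * log2(p / (px[x] * po[o])) for (x, o), p in joint.items())


for s0, name in ((SN, "SN"), (WN, "WN"), (WT, "WT"), (ST, "ST")):
    print(name, mutual_information_after(s0))
```

With two r-branches the computed mutual information is zero for every initial state. Calling `mutual_information_after(SN, n_rbranches=1)` gives a positive value in this model, which illustrates why a single r-branch would not suffice.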

Figure 5a,b illustrate how BranchCloak obfuscates the target PHT state. First, we consider the case of Figure 5a, where the victim’s secret is false. If the attack was successful, the PHT state moves to Weakly Not-taken, and a branch prediction miss should occur when the attacker’s branch is executed as taken. If both r-branches are executed as taken, the PHT entry, of which the previous state was Weakly Not-taken, changes its state to Strongly Taken. This will cause a prediction hit for the attacker’s branch being executed as taken, which makes the attacker infer an incorrect secret value.

On the other hand, in the opposite case, where the secret value is false and the r-branches are executed as not-taken, the PHT state moves from Weakly Not-taken to Strongly Not-taken. A branch prediction miss still occurs, allowing an attacker to infer the correct secret value. In Figure 5 and Figure 6, the terms ‘hit’ and ‘miss’ colored in blue indicate that the prediction occurred as expected by the attacker, and the same terms in red indicate that it did not. In other words, as shown in Figure 5a, the attacker is unable to accurately determine the secret value. The probability of the attacker successfully guessing the secret value is equivalent to the probability of the attacker correctly guessing the direction of the r-branches. Therefore, BranchCloak achieves perfect security on the PHT state of the target branch.

Security with r-branch before target. We analyze the security of BranchCloak in the case of r-branches being placed in front of the target branch. Figure 6 illustrates the defense scenario in which r-branches are inserted before the target branch with respect to the initial PHT state of the attack. If the attacker sets the PHT to Weakly Taken (as depicted in Figure 6a), the PHT’s final state only relies on the direction of the r-branches.

However, it becomes vulnerable to bypassing if the attacker sets the PHT to Strongly Taken (as depicted in Figure 6b). When the r-branches are executed as not-taken, the PHT state changes from Strongly Taken to Weakly Not-taken, effectively obfuscating the attacker’s probing. Conversely, if the r-branches are executed as taken, the PHT state remains unchanged. This implies that the attacker has the same likelihood of retrieving the secret value as in the case where no r-branches are executed. Consequently, the attacker can predict three out of four cases, resulting in a 75% success rate. Therefore, BranchCloak may be vulnerable to statistical attacks if r-branches are located before the target branch.
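The 75% figure can be reproduced by enumeration. The following sketch assumes a textbook 2-bit saturating counter, a single Taken/Not-taken observation per attack round, a uniformly random secret, and an attacker who guesses the most likely target direction for each observation. These modelling choices and all names are illustrative assumptions.

```python
from collections import defaultdict

SN, WN, WT, ST = 0, 1, 2, 3  # assumed encoding of the 2-bit counter


def step(state, taken):
    """Advance a saturating counter by one branch outcome."""
    return min(state + 1, ST) if taken else max(state - 1, SN)


def attacker_success_before(initial_state, n_rbranches=2):
    """Best-guess success rate when r-branches run before the target branch."""
    # counts[o][x] = number of equally likely (X, Y) cases yielding observation o
    counts = defaultdict(lambda: defaultdict(int))
    for x in (True, False):          # target branch direction X
        for y in (True, False):      # shared r-branch direction Y
            state = initial_state
            for _ in range(n_rbranches):
                state = step(state, y)
            state = step(state, x)
            counts[state >= WT][x] += 1
    # For each observation the attacker picks the more frequent direction.
    correct = sum(max(by_x.values()) for by_x in counts.values())
    return correct / 4


print(attacker_success_before(WT))  # scenario of Figure 6a
print(attacker_success_before(ST))  # scenario of Figure 6b
```

In this model the Weakly Taken start yields a success rate of 0.5, while the Strongly Taken start yields 0.75, in line with the analysis above.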

Discussion. In summary, if r-branches are located right after the target branch, the attacker’s success rate becomes 50% in every case, giving adversaries no information about the original PHT state. However, this placement requires more r-branches, as they have to be inserted at all possible branch targets to ensure that r-branches are always executed regardless of the branching decision of the target branch. Specifically, twice as many r-branches are necessary as when they are placed before the target branch, because a conditional branch instruction has two branch targets (i.e., taken and not-taken destinations). This increases the code size compared to placement before the target branch. However, it introduces no additional execution overhead, because only one of the two sets of r-branches is executed, just as with placement before the target branch.

4.2. Security Against Single-Stepping Attackers

BranchCloak provides protection against multi-stepping attacks through PHT-level obfuscation mechanisms. However, for single-stepping attacks, BranchCloak employs branch-level obfuscation. If the attacker is able to measure the PHT state at the granularity of a single instruction, they may be able to read the PHT state change of the target branch and the r-branch separately in different repetitions of the attack. However, even if these branches are identified as separate state changes, the attacker cannot determine whether the observed PHT state change is caused by the target branch or the r-branch. It is important to note that single-stepping attackers with the capability to retrieve the current instruction pointer are considered outside the scope of this work.

4.3. Security Against Various Attacker Capabilities

In the previous section, we demonstrated how BranchCloak defends against attacks utilizing the one-level predictor. We now direct our attention to the capabilities of BranchCloak as they pertain to other components of the hybrid conditional branch predictor, specifically the two-level predictor, and extend this analysis to the state-of-the-art TAGE predictor. Table 4 presents the effectiveness of BranchCloak based on the attacker’s stepping resolution and the type of predictor targeted. The one-level predictor in the hybrid predictor and the base predictor in the TAGE predictor are local predictors that index the PHT using only the branch address. In contrast, the two-level predictor in the hybrid predictor and the history-based predictor in the TAGE predictor index the PHT using both the branch address and the GHR.

Security against attacks on two-level predictor. We discuss the security of BranchCloak against attacks that use a two-level predictor. One constraint is that for r-branches and the target branch to collide, the GHR must be the same when each branch is executed. This is only valid for attackers who are capable of performing precise single-stepping and unaware of the current instruction pointer.

Notably, attackers who are unaware of the current instruction pointer must fix the GHR for each instruction to successfully cause PHT entry collisions with the target branch. Consequently, defenders do not need to consider the state of the GHR. In this case, the explanation presented in Section 4.2 remains applicable. Similarly, since attackers cannot distinguish between PHT state changes caused by the target branch and those caused by the r-branch, they cannot accurately extract the control flow of the target branch.

For attackers with lower stepping resolution, if the target branch and r-branches are executed together between the attacker’s initialization and probing phases, the target branch can alter the GHR and prevent collision with the r-branches. In this case, BranchCloak is ineffective at preventing PHT-based side-channel attacks.

Security against attacks on TAGE predictor. BranchCloak provides the same level of security with the advanced conditional branch predictor TAGE as it does with hybrid conditional branch predictors. The internal structure of the BPU is not officially documented and has been primarily inferred from reverse engineering studies. For instance, the BranchScope [21] and Bluethunder [22] attacks assumed that Intel processors (e.g., Sandy Bridge, Haswell, Skylake) implement a hybrid branch predictor and designed their experiments accordingly. However, recent studies [26,40] have empirically validated that Intel processors from Ivy Bridge to Raptor Lake use the TAGE predictor with three PHTs. In particular, BranchScope leveraged the fact that executing a large number of random branches can force the predictor to fall back to a one-level predictor. Similarly, Yavarzadeh et al. [40] demonstrated that by executing adversarial branches, it is possible to fill the PHT entirely and ensure that no matching tagged component exists, thereby forcing the TAGE predictor to reference only the base predictor. This fallback behavior is consistent regardless of whether the underlying branch predictor is a traditional hybrid predictor or a TAGE-based design. Furthermore, Yavarzadeh et al. [40] revealed that the base predictor uses the lower 13 bits of the branch address for indexing. Given this behavior, BranchCloak’s obfuscation can be directly applied to the base predictor of Intel’s TAGE predictor.

The history-based TAGE predictor behaves similarly to a two-level predictor. In the context of a single-stepping adversary, a successful attack on TAGE’s history-based prediction requires the attacker to maintain consistent settings for both the GHR and the PHT across each execution step. However, when the r-branches share the same GHR and are aligned such that they index the same PHT entry as the target branch, the attacker can no longer distinguish whether a predictor state change originated from the target branch or an r-branch. This ambiguity enables branch-level obfuscation of the predictor state. As demonstrated by Yavarzadeh et al. [40], the lower 12 bits of the branch address are used for indexing into the tagged components of Intel processors. BranchCloak leverages this insight by employing 15-bit alignment, ensuring that r-branches can effectively collide with the target branch, thereby achieving branch-level obfuscation. For multi-stepping attackers, the defense remains ineffective, as it does with a two-level predictor.

5. Performance Analysis

Applying BranchCloak may introduce some overhead when it comes to software performance. We evaluate the performance overhead of BranchCloak by conducting experiments with real-world applications. In particular, we use popular open-source cryptographic libraries, such as OpenSSL, MbedTLS, and Libgcrypt, as our victim applications using BranchCloak. The experimental environments are presented in Table 5. We use the latest version of each cryptographic library on various machines running 64-bit Ubuntu 18.04, equipped with Intel processors of various generations: Kaby Lake, Comet Lake, and Rocket Lake.

To conduct the experiment, we manually identify vulnerable functions in cryptographic libraries that contain conditional branch statements dependent on secret values. Specifically, we focus on RSA modular exponentiation and ECC scalar multiplication functions, where conditional branches are executed based on individual bits of a private key. These identified conditional branch instructions are the targets for r-branch placement by BranchCloak.

For instance, Listing 5 shows a code snippet of the BN_mod_exp_mont() function in the OpenSSL library. The function implements a sliding-window modular exponentiation, where line 9 is identified as a target branch. Likewise, we identify vulnerable functions that have secret-dependent conditional branches in MbedTLS and Libgcrypt.

Listing 5. Sliding-window modular exponentiation in OpenSSL.

The mbedtls_mpi_exp_mod() function in Listing 6 is another implementation of sliding-window modular exponentiation. The ith bit of the exponent is stored in a variable ei at line 5, and the branch dependent on ei is located in lines 7 and 10, allowing the attackers to determine the value of ei.

Listing 6. Sliding-window modular exponentiation in MbedTLS.

Listing 7 shows an implementation of scalar multiplication on elliptic curves. The _gcry_mpi_ec_mul_point() function is implemented as left-to-right binary multiplication. If the jth-bit value of a secret variable scalar is one, then double-and-add operations are performed; otherwise, only a double operation is performed. Thus, we identify that line 7 in Listing 7 is a secret-dependent branch.

Listing 7. Left-to-right scalar multiplication in Libgcrypt.

In the experiment, BranchCloak is applied to all the secret-dependent branches identified in the libraries.

5.1. Execution Overhead

We evaluated BranchCloak in terms of its execution overhead. Table 6 shows the results of our experiments. We used the Linux perf tool to measure the execution time of the BranchCloak-protected cryptographic libraries. For comparison, we also measured the execution time of the original (unmodified) libraries. The term ‘Iterations per second’ in Table 6 refers to the number of executions per second of a function identified to contain a secret-dependent branch. We ran the test 100 times for each cryptographic library to obtain the average and standard deviation of the execution time. From the experiment, we found that the geometric mean of the execution overhead was 3.51% (median = 5.97%), with a maximum overhead of 9.1%. Note that the large range in these figures depends on the proportion of each secret-dependent branch in each function. For functions that already had a long execution time, the performance overhead of BranchCloak was relatively small (e.g., the case of MbedTLS).

To contextualize our work within the broader landscape of side-channel defenses, we present a comparative analysis of existing methodologies. Table 7 summarizes the techniques, protection scopes, and performance overheads of software-based defenses relevant to PHT-based side-channel attacks. Among these, the work by Coppens et al. [28] implements selective if-conversion for key-dependent branches in modular exponentiation of RSA encryption, aligning in essence with BranchCloak. However, their approach incurs a significantly higher performance overhead, reaching up to 24×.

Furthermore, previous research shows that aligning code has a minimal impact on performance [60]. Specifically, according to [60], mitigating Spectre attacks by aligning all basic blocks in a program, with a maximum alignment of 5 bits per block, results in a performance overhead of less than 1%. Based on this finding, it can be assumed that aligning a limited number of branches using BranchCloak will similarly result in a small execution overhead.

5.2. Program Size Overhead

We measured the overhead in terms of the size of the object file to which BranchCloak was applied. The experimental result is presented in Table 8. As OpenSSL and Libgcrypt had one secret-dependent branch in the implementations, a total of five branches (including four r-branches) were aligned in memory. MbedTLS has two secret-dependent branches; thus, its increase in program size was greater than that of the other libraries, as more branch instructions had to be aligned in memory. It is noteworthy that the increase in program size caused by the branch alignments applies equally to the text segment in memory when the program executes.
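A rough upper bound on the padding introduced by alignment follows from simple arithmetic. The sketch below assumes that each aligned branch is placed on a 2^15-byte boundary (the 15-bit alignment mentioned in Section 4.3) and that, in the worst case, almost a full boundary of padding precedes each aligned instruction. The function names and the sample address are illustrative.

```python
def padding_to_align(address, bits):
    """Bytes of padding needed to move `address` up to a 2**bits boundary."""
    boundary = 1 << bits
    return (-address) % boundary


def worst_case_padding(n_branches, bits):
    """Upper bound on total padding when n_branches are each aligned."""
    return n_branches * ((1 << bits) - 1)


# Padding needed for an arbitrary example address with 15-bit alignment.
print(padding_to_align(0x401234, 15))

# One target branch plus four r-branches, each aligned to 2**15 bytes (in KiB).
print(worst_case_padding(5, 15) / 1024)
```

Under these assumptions, five aligned branches add at most about 160 KiB of padding. This is only an upper bound derived from the alignment size; the measured values are those reported in Table 8.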

5.3. Microarchitectural Side-Effects

We also evaluated the microarchitectural performance overhead of BranchCloak by utilizing hardware performance counters. Table 9 shows our measurement results for several performance counters, namely page faults, the branch misprediction rate, and the LLC miss rate (including instruction prefetch misses), alongside those of the original (unmodified) libraries. Since the counters are read while the entire test file executes, we focused on the amount of change rather than the absolute value of each performance counter. We observed that as the code size increased, the number of page faults could quadruple. This overhead occurs when the function is executed for the first time, and its impact on performance diminishes when the function is executed repeatedly. Furthermore, we determined that BranchCloak had no discernible impact on the cache miss rate, which is a critical factor influencing the execution speed of software applications.

Because the virtual addresses of the branches are used for indexing the PHT, aligning adjacent branches to refer to the same PHT entry can affect the performance of the branch predictor. If the number of branches with BranchCloak applied increases, particularly when it is applied to repeatedly executed branches (e.g., conditions in for and while loops), it can result in substantial performance overhead. Figure 7 illustrates the latency of branch prediction hits and misses for the CPUs used in the experiments. On average, there is a latency difference of 14.9 cycles between a branch prediction hit and a miss, resulting in an additional overhead of 12.9% when a misprediction occurs compared to a hit. The branch execution time was measured in an environment where noise from other programs was minimized. The absolute execution time may vary depending on the environmental context, but the cycle difference between hits and misses remains relatively consistent.
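To get a feel for the cost that random r-branches may add, the following sketch counts the expected number of mispredicted r-branches per protected execution. It assumes a textbook 2-bit saturating counter that is updated only by the colliding branches, two same-direction r-branches with a fair random direction, and the 14.9-cycle average latency gap quoted above. Real predictors are considerably more complex, so the output is indicative only, and all names are illustrative.

```python
SN, WN, WT, ST = 0, 1, 2, 3   # assumed encoding of the 2-bit counter
MISS_PENALTY_CYCLES = 14.9    # average hit/miss latency gap reported above


def step(state, taken):
    """Advance a saturating counter by one branch outcome."""
    return min(state + 1, ST) if taken else max(state - 1, SN)


def expected_rbranch_misses(state_after_target, n_rbranches=2):
    """Expected mispredictions among the r-branches for a fair random direction."""
    total = 0
    for direction in (True, False):
        state = state_after_target
        for _ in range(n_rbranches):
            predicted_taken = state >= WT
            total += predicted_taken != direction
            state = step(state, direction)
    return total / 2


for s, name in ((SN, "SN"), (WN, "WN"), (WT, "WT"), (ST, "ST")):
    misses = expected_rbranch_misses(s)
    print(name, misses, misses * MISS_PENALTY_CYCLES)
```

In this simplified model, each protected execution incurs between 0.5 and 1 expected r-branch mispredictions, depending on the counter state left by the target branch.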

Prior research has also empirically validated that branch mispredictions increase branch latency on other Intel CPU architectures, including Skylake, Coffee Lake, and Cascade Lake [21,23]. Nevertheless, secret-dependent branches are likely to represent a small portion of the program’s source code. Specifically, the secret-dependent branches we identified are limited to just one or two branches in cryptographic functions. While an attacker with advanced code analysis capabilities could identify additional vulnerable branches that might leak sensitive data, this risk is limited, and the overall impact of BranchCloak is expected to be minimal.

6. Limitations

6.1. Applying BranchCloak to Various Architectures

Before applying BranchCloak, it is essential to have a precise understanding of the structure of the BPU in the target processor. This requirement may introduce new overhead for defenders, as they must ascertain the internal architecture of the processor in use. However, we anticipate that BranchCloak will be effective on any processor utilizing a saturating counter-based PHT.

Although the specific indexing function may differ across processor architectures, it is generally expected that branch addresses and some form of branch history, such as the GHR, are consistently used in PHT indexing. Xu et al. [24] demonstrated that the TAGE predictor in ARM Cortex-A72 and A76 utilizes PC[14:4] for indexing, while Cortex-A53 uses PC[13:4]. For these ARM processors, BranchCloak can be directly applied with a 15-bit alignment. In the case of Cortex-A53, adjusting the alignment size to 14 bits achieves equivalent protection. For AMD processors, it has been officially reported that the TAGE predictor is being adopted starting from the Zen 2 architecture [44]. However, the exact indexing function, including the number of bits used, has not yet been publicly analyzed. For processors where the indexing logic remains unknown, further reverse engineering is necessary. If indexing involves bits beyond the lower 15, BranchCloak’s alignment size must be modified accordingly. Thus, the defender must possess accurate knowledge of the target execution environment to ensure correct application of the defense. Developing compiler support that dynamically detects the underlying microarchitecture and adjusts the alignment strategy represents a promising direction for future research.
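The alignment sizes mentioned above follow directly from the highest program-counter bit used for indexing. The sketch below assumes that the PHT index is exactly the bit range PC[hi:lo] with no history bits mixed in. The function names are illustrative and the sample addresses are arbitrary.

```python
def required_alignment_bits(hi):
    """Alignment (in bits) that zeroes every PC bit up to and including `hi`."""
    return hi + 1


def pht_index(pc, hi, lo):
    """Extract PC[hi:lo] as the assumed PHT index."""
    width = hi - lo + 1
    return (pc >> lo) & ((1 << width) - 1)


def collide(pc_a, pc_b, hi, lo):
    """True if both addresses map to the same assumed PHT index."""
    return pht_index(pc_a, hi, lo) == pht_index(pc_b, hi, lo)


# PC[14:4] (Cortex-A72/A76 per the text) and PC[13:4] (Cortex-A53).
print(required_alignment_bits(14), required_alignment_bits(13))

target = 0x408000      # aligned to 2**15
r_branch = 0x410000    # also aligned to 2**15
print(collide(target, r_branch, hi=14, lo=4))
print(collide(target, 0x404000, hi=14, lo=4))  # only 2**14-aligned
print(collide(target, 0x404000, hi=13, lo=4))
```

In this model, two branches aligned to 2^15 bytes collide under PC[14:4] indexing, whereas a branch aligned only to 2^14 bytes collides under PC[13:4] but not under PC[14:4].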

6.2. Identifying Vulnerable Branches

The primary advantage of BranchCloak is its ease of implementation, requiring only straightforward code modifications once vulnerable branches have been identified. However, the process of identifying vulnerable branches must first be undertaken, which can introduce additional overhead for defenders. Furthermore, it is crucial for developers to recognize that sensitive branches are not limited to those related to secret-dependent cryptographic operations. Modern cryptographic libraries often employ constant-time implementations to fundamentally prevent side-channel attacks that exploit caches, BPUs, and other resources.

In addition, attacks can also target branches in other functional code to leak sensitive data. Recently, the TrustZoneTunnel attack [24] has been demonstrated to successfully carry out a model extraction attack by targeting branches in the activation functions of AI models. This highlights how even non-intuitive, non-cryptographic branches can be vulnerable to attack. It is our contention that the efficacy of BranchCloak could be enhanced by its integration with taint analysis of sensitive variables. Taint analysis is a technique used to track the propagation and utilization of untrusted data, such as user input. This approach could be employed to effectively track sensitive values and identify conditional branches dependent on them. We thus put forth this proposition as a potential avenue for future research.

7. Related Work

As the PHT is an element of the BPU, protection methods for the entire BPU are also applied to the PHT. Many hardware-based approaches have been proposed to secure the BPU. Zhao et al. [38] proposed a method to reduce the coherence of the branch address and the BPU entry by XORing the BPU entry with a high-privileged private key that is periodically re-randomized. Zhang et al. [61] presented STBPU, which encrypts the data in the BPU and remaps the BPU entry based on the secret token to mitigate collision-based attacks on the BPU. Vougioukas et al. [34] proposed a new unit called Branch Retention Buffer, which reserves the partial branch predictor state to enable isolation and maintenance of the branch predictor state depending on the individual context. Similarly, Chen et al. [37] proposed a method to secure BPU entries by XORing the full branch address and the PID in order to index the BPU entry. Zhao et al. [35] proposed a probabilistic saturating counter, which reduces the attacker’s ability to probe the saturating counter status of the PHT entry.

Evtyushkin et al. [36] proposed a software-based mitigation mechanism that randomizes the PHT state during the context switch. The authors executed 300,000 branch instructions to flush the entire PHT, which resulted in 1.2 ms of overhead per context switch. In order to reduce this performance overhead, the authors proposed a method that reduced the number of flushes through OS scheduling, but the performance overhead still reached 20%, depending on the number of simultaneous processes.

Rane et al. [27] proposed Raccoon, a source code-level obfuscation tool for programs that executes a decoy path. Raccoon uses a transaction buffer to buffer intermediate results along the real and the decoy paths. Both paths of the conditional branch are executed once and the transaction buffer is updated in each execution. But non-transactional memory updates occur only along the real path, enabling a pathway for defense against PHT-based side-channel attacks. Raccoon’s performance overhead shows a large deviation from 1× to 1000×, depending on the type of benchmark program used for testing, and has a geometric mean of 16.1×.

Yavarzadeh et al. [40] proposed Half&Half, a software-based mitigation method that isolates the PHT for different processes. Assuming that Intel’s branch predictor had a structure similar to TAGE, the authors conducted reverse engineering on Intel’s branch predictor and discovered that the sixth bit of the branch address plays a pivotal role in determining the PHT index. Consequently, by exerting control over the sixth bit of the branches in each process, they were able to isolate the PHTs of two distinct processes. Additionally, binary level modification is necessary to apply Half&Half to binaries before their execution, resulting in overhead for all unknown binaries. In contrast, BranchCloak ensures security by modifying the source code of the vulnerable executable itself, thereby eliminating the need to modify the source code or binaries of other unknown executables and avoiding any additional overhead.

Several studies also proposed methods to remove the conditional branch itself using the if-conversion [28,29,30,31]. Molnar et al. [62] developed source-to-source program transformation frameworks to prevent the program counter from being affected by sensitive values. Furthermore, Coppens et al. [28] advanced a compiler back-end transformation method that further developed the method revealed in [62].

In recent studies [30,31], conditional branches are replaced with conditional move (CMOV) and indirect jump instructions, which prevents changes to the PHT state and can thereby mitigate PHT-based side-channel attacks. However, the use of indirect jumps also creates a vulnerability to BTB-based side-channel attacks. To mitigate such attacks, Lee et al. [31] proposed Zigzagger, a method for inserting a set of trampoline branch instructions. Similar to Zigzagger, Hosseinzadeh et al. [30] presented a run-time randomization algorithm that obfuscates the address of trampoline branches. The authors periodically randomize the location of the trampoline branch and proposed a mitigation method for high-resolution attacks using a single-stepping method such as SGX-Step [57]. Regardless of the periodic re-randomization of the trampoline location, the performance and code size overheads of this method reach 64%, depending on the number of trampolines.

Retpoline [63] is similar to the methods described above in that it replaces the indirect jump and call with return instructions, but we consider this technique out-of-scope for this paper because it does not apply to conditional jumps. In addition, it is not only practically difficult to remove all conditional branches and apply this method in real-world applications, but it can also cause a massive slowdown in branches receiving performance advantages due to branch prediction.

8. Conclusions

In this paper, we proposed a new mitigation method against PHT-based side-channel attacks. These attacks are particularly severe because they do not require complex attack gadgets and can exploit a broad attack surface. We presented and evaluated our novel methodology, BranchCloak, which obscures the control flow of the victim process. BranchCloak is designed to mitigate attacks that exploit the PHT state to leak the direction of secret-dependent conditional branches. The key idea behind BranchCloak is to align vulnerable target conditional branches with r-branches in memory so that they share the same entry in the PHT. As these branches refer to the same PHT entry, an attacker cannot distinguish which branch has caused the change in PHT state, thus making a PHT-based side-channel attack infeasible. We implemented BranchCloak by customizing a C compiler that inserts r-branches pointing to the same PHT entry. We proved the effectiveness of BranchCloak through probabilistic analysis and evaluated its performance overhead with real-world cryptographic libraries. Our results showed that BranchCloak has an average performance loss of 3.51% and a code size overhead of less than 188 KB per secret-dependent branch with BranchCloak applied.

Author Contributions

Conceptualization, J.K.; methodology, J.K. and H.J.; software, J.K.; validation and visualization, J.K. and H.J.; writing—original draft, J.K.; writing—review and editing, H.J. and Y.S.; supervision Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a National Research Foundation of Korea (NRF) grant, funded by the Korean government (MSIT) (No. 2023R1A2C2006862).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kocher, P.; Horn, J.; Fogh, A.; Genkin, D.; Gruss, D.; Haas, W.; Hamburg, M.; Lipp, M.; Mangard, S.; Prescher, T.; et al. Spectre attacks: Exploiting speculative execution. In Proceedings of the 2019 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–19.
2. Lipp, M.; Schwarz, M.; Gruss, D.; Prescher, T.; Haas, W.; Fogh, A.; Horn, J.; Mangard, S.; Kocher, P.; Genkin, D.; et al. Meltdown: Reading kernel memory from user space. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 973–990.
3. Maisuradze, G.; Rossow, C. ret2spec: Speculative execution using return stack buffers. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 2109–2122.
4. Wikner, J.; Razavi, K. RETBLEED: Arbitrary Speculative Code Execution with Return Instructions. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 3825–3842.
5. Bhattacharyya, A.; Sandulescu, A.; Neugschwandtner, M.; Sorniotti, A.; Falsafi, B.; Payer, M.; Kurmus, A. Smotherspectre: Exploiting speculative execution through port contention. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 785–800.
6. Horn, J. Speculative Execution, Variant 4: Speculative Store Bypass. 2018. Available online: https://bugs.chromium.org/p/project-zero/issues/detail?id=1528 (accessed on 12 February 2025).
7. Xu, Y.; Cui, W.; Peinado, M. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 17–20 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 640–656.
8. Van Bulck, J.; Weichbrodt, N.; Kapitza, R.; Piessens, F.; Strackx, R. Telling Your Secrets without Page Faults: Stealthy Page Table-Based Attacks on Enclaved Execution. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 1041–1056.
9. Gras, B.; Razavi, K.; Bos, H.; Giuffrida, C. Translation leak-aside buffer: Defeating cache side-channel protections with TLB attacks. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 955–972.
10. Schwarz, M.; Lipp, M.; Moghimi, D.; Van Bulck, J.; Stecklina, J.; Prescher, T.; Gruss, D. ZombieLoad: Cross-privilege-boundary data sampling. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 753–768.
11. Van Schaik, S.; Milburn, A.; Österlund, S.; Frigo, P.; Maisuradze, G.; Razavi, K.; Bos, H.; Giuffrida, C. RIDL: Rogue in-flight data load. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 88–105.
12. Koruyeh, E.M.; Khasawneh, K.N.; Song, C.; Abu-Ghazaleh, N.B. Spectre Returns! Speculation Attacks using the Return Stack Buffer. In Proceedings of the WOOT@ USENIX Security Symposium, Baltimore, MD, USA, 13–14 August 2018.
13. Barberis, E.; Frigo, P.; Muench, M.; Bos, H.; Giuffrida, C. Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 971–988.
14. Behnia, M.; Sahu, P.; Paccagnella, R.; Yu, J.; Zhao, Z.N.; Zou, X.; Unterluggauer, T.; Torrellas, J.; Rozas, C.; Morrison, A.; et al. Speculative interference attacks: Breaking invisible speculation schemes. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 19–23 April 2021; pp. 1046–1060.
15. Li, L.; Yavarzadeh, H.; Tullsen, D. Indirector: High-Precision Branch Target Injection Attacks Exploiting the Indirect Branch Predictor. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, USA, 14–16 August 2024; pp. 2137–2154.
16. Cheng, S.H.W.; Chuengsatiansup, C.; Genkin, D.; McNeil, D.; Murray, T.; Yarom, Y.; Zhang, Z. Evict+Spec+Time: Exploiting Out-of-Order Execution to Improve Cache-Timing Attacks. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024, 2024, 224–248.
17. Ragab, H.; Mambretti, A.; Kurmus, A.; Giuffrida, C. GhostRace: Exploiting and Mitigating Speculative Race Conditions. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, USA, 14–16 August 2024; pp. 6185–6202. Available online: https://www.vusec.net/projects/ghostrace (accessed on 16 February 2025).
18. Jang, H.; Kim, T.; Shin, Y. SysBumps: Exploiting Speculative Execution in System Calls for Breaking KASLR in macOS for Apple Silicon. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, Salt Lake City, UT, USA, 14–18 October 2024; pp. 64–78.
19. Zhang, J.; Chen, C.; Cui, J.; Li, K. Timing side-channel attacks and countermeasures in CPU microarchitectures. ACM Comput. Surv. 2024, 56, 1–40.
20. Chowdhuryy, M.H.I.; Zheng, H.; Yao, F. MetaLeak: Uncovering Side Channels in Secure Processor Architectures Exploiting Metadata. In Proceedings of the 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, 29 June–3 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 693–707.
21. Evtyushkin, D.; Riley, R.; Abu-Ghazaleh, N.C.; Ponomarev, D. BranchScope: A New Side-Channel Attack on Directional Branch Predictor. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, Williamsburg, VA, USA, 4–11 August 2018; pp. 693–707.
22. Huo, T.; Meng, X.; Wang, W.; Hao, C.; Zhao, P.; Zhai, J.; Li, M. Bluethunder: A 2-level Directional Predictor Based Side-Channel Attack against SGX. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2020, 321–347.
23. Chowdhuryy, M.H.I.; Yao, F. Leaking Secrets through Modern Branch Predictor in the Speculative World. IEEE Trans. Comput. 2021, 71, 2059–2072.
24. Xu, T.; Ding, A.A.; Fei, Y. TrustZoneTunnel: A Cross-World Pattern History Table-Based Microarchitectural Side-Channel Attack. In Proceedings of the 2024 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), Washington, DC, USA, 6–9 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–11.
25. Ronen, E.; Gillham, R.; Genkin, D.; Shamir, A.; Wong, D.; Yarom, Y. The 9 lives of Bleichenbacher’s CAT: New cache attacks on TLS implementations. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 435–452.
26. Yavarzadeh, H.; Agarwal, A.; Christman, M.; Garman, C.; Genkin, D.; Kwong, A.; Moghimi, D.; Stefan, D.; Taram, K.; Tullsen, D. Pathfinder: High-resolution control-flow attacks exploiting the conditional branch predictor. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, San Diego, CA, USA, 27 April–1 May 2024; Volume 3, pp. 770–784.
27. Rane, A.; Lin, C.; Tiwari, M. Raccoon: Closing Digital Side-Channels through Obfuscated Execution. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15), Washington, DC, USA, 10–12 August 2015; pp. 431–446.
28. Coppens, B.; Verbauwhede, I.; De Bosschere, K.; De Sutter, B. Practical mitigations for timing-based side-channel attacks on modern x86 processors. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, Oakland, CA, USA, 17–20 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 45–60.
29. Choi, Y.; Knies, A.; Gerke, L.; Ngai, T.F. The impact of if-conversion and branch prediction on program execution on the intel itanium processor. In Proceedings of the 34th ACM/IEEE International Symposium on Microarchitecture, Austin, TX, USA, 1–5 December 2001; p. 182.
30. Hosseinzadeh, S.; Liljestrand, H.; Leppänen, V.; Paverd, A. Mitigating branch-shadowing attacks on intel sgx using control flow randomization. In Proceedings of the 3rd Workshop on System Software for Trusted Execution, Toronto, ON, Canada, 15 October 2018; pp. 42–47.
31. Lee, S.; Shih, M.W.; Gera, P.; Kim, T.; Kim, H.; Peinado, M. Inferring fine-grained control flow inside SGX enclaves with branch shadowing. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 557–574.
32. Gruss, D.; Lipp, M.; Schwarz, M.; Fellner, R.; Maurice, C.; Mangard, S. Kaslr is dead: Long live kaslr. In Proceedings of the International Symposium on Engineering Secure Software and Systems, Bonn, Germany, 3–5 July 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 161–176.
33. Ainsworth, S.; Jones, T.M. Muontrap: Preventing cross-domain spectre-like attacks by capturing speculative state. In Proceedings of the 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Virtual, 29 May–3 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 132–144.
34. Vougioukas, I.; Nikoleris, N.; Sandberg, A.; Diestelhorst, S.; Al-Hashimi, B.M.; Merrett, G.V. BRB: Mitigating Branch Predictor Side-Channels. In Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), Washington, DC, USA, 16–20 February 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 466–477.
35. Zhao, L.T.; Hou, R.; Wang, K.; Su, Y.L.; Li, P.N.; Meng, D. A Novel Probabilistic Saturating Counter Design for Secure Branch Predictor. J. Comput. Sci. Technol. 2021, 36, 1022–1036.
36. Evtyushkin, D.; Ponomarev, D.; Abu-Ghazaleh, N. Understanding and mitigating covert channels through branch predictors. ACM Trans. Archit. Code Optim. (TACO) 2016, 13, 1–23.
37. Chen, C.; Shen, C.; Zhang, J. Lightweight and Secure Branch Predictors against Spectre Attacks. In Proceedings of the 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), Virtual, 17–20 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 25–30.
38. Zhao, L.; Li, P.; Hou, R.; Huang, M.C.; Li, J.; Zhang, L.; Qian, X.; Meng, D. A lightweight isolation mechanism for secure branch predictors. In Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC), Virtual, 5–9 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1267–1272.
39. Sakalis, C.; Kaxiras, S.; Ros, A.; Jimborean, A.; Själander, M. Efficient invisible speculative execution through selective delay and value prediction. In Proceedings of the 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), Phoenix, AZ, USA, 22–26 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 723–735.
40. Yavarzadeh, H.; Taram, M.; Narayan, S.; Stefan, D.; Tullsen, D. Half&Half: Demystifying Intel’s Directional Branch Predictors for Fast, Secure Partitioned Execution. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–24 May 2023; IEEE Computer Society: Piscataway, NJ, USA, 2023; pp. 1220–1237.
41. Yarom, Y.; Genkin, D.; Heninger, N. CacheBleed: A timing attack on OpenSSL constant-time RSA. J. Cryptogr. Eng. 2017, 7, 99–112.
42. Mittal, S. A survey of techniques for dynamic branch prediction. Concurr. Comput. Pract. Exp. 2019, 31, e4666.
43. Seznec, A. Tage-sc-l branch predictors again. In Proceedings of the 5th JILP Workshop on Computer Architecture Competitions (JWAC-5): Championship Branch Prediction (CBP-5), Seoul, Korea, 18 June 2016.
44. Suggs, D.; Subramony, M.; Bouvier, D. The AMD “Zen 2” Processor. IEEE Micro 2020, 40, 45–52.
45. Yarom, Y.; Falkner, K. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, USA, 20–22 August 2014; pp. 719–732.
46. Gruss, D.; Maurice, C.; Wagner, K.; Mangard, S. Flush+Flush: A fast and stealthy cache attack. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Donostia-San Sebastián, Spain, 7–8 July 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 279–299.
47. Gruss, D.; Spreitzer, R.; Mangard, S. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15), Washington, DC, USA, 10–12 August 2015; pp. 897–912.
48. Irazoqui, G.; Eisenbarth, T.; Sunar, B. S$A: A shared cache attack that works across cores and defies VM sandboxing–and its application to AES. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 18–20 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 591–604.
49. Liu, F.; Yarom, Y.; Ge, Q.; Heiser, G.; Lee, R.B. Last-level cache side-channel attacks are practical. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 18–20 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 605–622.
50. Hunger, C.; Kazdagli, M.; Rawat, A.; Dimakis, A.; Vishwanath, S.; Tiwari, M. Understanding contention-based channels and using them for defense. In Proceedings of the 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Burlingame, CA, USA, 7–11 February 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 639–650.
51. Chen, S.; Zhang, X.; Reiter, M.K.; Zhang, Y. Detecting privileged side-channel attacks in shielded execution with Déjà Vu. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi, United Arab Emirates, 2–6 April 2017; pp. 7–18.
52. Shih, M.W.; Lee, S.; Kim, T.; Peinado, M. T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs. In Proceedings of the NDSS, San Diego, CA, USA, 26 February–1 March 2017.
53. Crane, S.; Homescu, A.; Brunthaler, S.; Larsen, P.; Franz, M. Thwarting cache side-channel attacks through dynamic software diversity. In Proceedings of the NDSS, San Diego, CA, USA, 8–11 February 2015; pp. 8–11.
54. Evtyushkin, D.; Ponomarev, D.; Abu-Ghazaleh, N. Jump over ASLR: Attacking branch predictors to bypass ASLR. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, 15–19 October 2016; pp. 1–13.
55. Gullasch, D.; Bangerter, E.; Krenn, S. Cache games–bringing access-based cache attacks on AES to practice. In Proceedings of the 2011 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 22–25 May 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 490–505.
56. Allan, T.; Brumley, B.B.; Falkner, K.; Van de Pol, J.; Yarom, Y. Amplifying side channels through performance degradation. In Proceedings of the 32nd Annual Conference on Computer Security Applications, Los Angeles, CA, USA, 5–9 December 2016; pp. 422–435.
57. Van Bulck, J.; Piessens, F.; Strackx, R. SGX-Step: A practical attack framework for precise enclave execution control. In Proceedings of the 2nd Workshop on System Software for Trusted Execution, Shanghai, China, 28 October 2017; pp. 1–6.
58. Kou, Z.; He, W.; Sinha, S.; Zhang, W. Load-step: A precise trustzone execution control framework for exploring new side-channel attacks like flush+evict. In Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC), Virtual, 5–9 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 979–984.
59. Ito, A.; Ueno, R.; Homma, N. On the success rate of side-channel attacks on masked implementations: Information-theoretical bounds and their practical usage. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, 7–11 November 2022; pp. 1521–1535.
60. Jang, H.; Shin, Y. MicroCFI: Microarchitecture-Level Control-Flow Restrictions for Spectre Mitigation. IEEE Access 2023, 11, 138699–138711.
61. Zhang, T.; Lesch, T.; Koltermann, K.; Evtyushkin, D. STBPU: A Reasonably Safe Branch Predictor Unit. arXiv 2021, arXiv:2108.02156.
62. Molnar, D.; Piotrowski, M.; Schultz, D.; Wagner, D. The program counter security model: Automatic detection and removal of control-flow side channel attacks. In Proceedings of the International Conference on Information Security and Cryptology, Seoul, Republic of Korea, 1–2 December 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 156–168.
63. Intel. Retpoline, A Branch Target Injection Mitigation. 2018. Available online: https://www.intel.com/content/dam/develop/external/us/en/documents/retpoline-a-branch-target-injection-mitigation.pdf (accessed on 26 January 2025).

Figure 1. Directional branch predictor.

Figure 2. Illustration of a PHT entry collision.

Figure 3. Two example scenarios of PHT-based side-channel attacks: (a) PHT initialized to Weakly Taken, (b) PHT initialized to Strongly Taken. Reference point for determining secret value is highlighted in bold.

Figure 4. Comparison between attacks without BranchCloak (a) and with BranchCloak (b). The attacker’s branch (line 3 of (a)) and the victim’s secret-dependent branch (line 6 of (a)) refer to the same PHT entry. With BranchCloak, the two r-branches (lines 2 and 7 of (b)) also refer to the same PHT entry.

Figure 5. Illustration of the defense mechanism of BranchCloak when the r-branches are inserted after the target branch: (a) PHT initialized to Weakly Taken, (b) PHT initialized to Strongly Taken.

Figure 6. Illustration of the defense mechanism of BranchCloak when the r-branches are inserted before the target branch: (a) PHT initialized to Weakly Taken, (b) PHT initialized to Strongly Taken.

Figure 7. Average branch execution time based on branch prediction result.

Table 1. Probability of the PHT state ending at Taken after the n-th branch execution. Branches are executed as Taken or Not-taken with a probability of 50% each.

Initial PHT State	$P_{0}$	$P_{1}$	$P_{2}$	$P_{3}$	$P_{4}$	$P_{5}$
WN	0	$1 / 2$	$1 / 4$	$1 / 2$	$3 / 8$	$1 / 2$
SN	0	0	$1 / 4$	$1 / 4$	$3 / 8$	$3 / 8$

Table 2. Probability of the PHT state ending at Taken after the n-th branch execution. All the branches are executed in the same direction.

Initial PHT State	${\tilde{P}}_{0}$	${\tilde{P}}_{1}$	${\tilde{P}}_{2}$	${\tilde{P}}_{3}$	${\tilde{P}}_{4}$	${\tilde{P}}_{5}$
WN	0	$1 / 2$	$1 / 2$	$1 / 2$	$1 / 2$	$1 / 2$
SN	0	0	$1 / 2$	$1 / 2$	$1 / 2$	$1 / 2$
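The entries of Tables 1 and 2 can be recomputed with a short exact calculation. The sketch below assumes a standard 2-bit saturating counter (SN, WN, WT, ST) that moves one step toward ST on a taken branch and one step toward SN on a not-taken branch, and it counts WT and ST as "Taken". The function names are illustrative only.

```python
from fractions import Fraction

# Assumed model: 2-bit saturating counter with states SN < WN < WT < ST.
SN, WN, WT, ST = range(4)


def step(state: int, taken: bool) -> int:
    """Move one step toward ST if taken, toward SN otherwise (saturating)."""
    return min(state + 1, ST) if taken else max(state - 1, SN)


def p_taken_random(initial: int, n: int) -> Fraction:
    """Table 1 setting: each of n branches is independently Taken with 1/2."""
    dist = {initial: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for state, prob in dist.items():
            for taken in (True, False):
                t = step(state, taken)
                nxt[t] = nxt.get(t, Fraction(0)) + prob / 2
        dist = nxt
    return sum((p for s, p in dist.items() if s >= WT), Fraction(0))


def p_taken_same_direction(initial: int, n: int) -> Fraction:
    """Table 2 setting: all n branches share one direction, chosen with 1/2."""
    total = Fraction(0)
    for taken in (True, False):
        state = initial
        for _ in range(n):
            state = step(state, taken)
        if state >= WT:
            total += Fraction(1, 2)
    return total


table1 = {name: [p_taken_random(s, n) for n in range(6)]
          for name, s in (("WN", WN), ("SN", SN))}
table2 = {name: [p_taken_same_direction(s, n) for n in range(6)]
          for name, s in (("WN", WN), ("SN", SN))}

for name in ("WN", "SN"):
    print(name, [str(p) for p in table1[name]], [str(p) for p in table2[name]])
```

Under this model, the probabilities for independent random directions oscillate or lag below 1/2, whereas same-direction branches settle at exactly 1/2 after at most two executions.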

Table 3. All possible outcomes of $S_{X}^{Y}$ depending on the initial PHT state $S_{0}$.

(a) $S_{X}^{Y} (S_{0} = SN)$				(b) $S_{X}^{Y} (S_{0} = WN)$				(c) $S_{X}^{Y} (S_{0} = WT)$				(d) $S_{X}^{Y} (S_{0} = ST)$
Y \ X	T	NT		Y \ X	T	NT		Y \ X	T	NT		Y \ X	T	NT
T	ST	WT		T	ST	WT		T	ST	ST		T	ST	ST
NT	SN	SN		NT	SN	SN		NT	WN	SN		NT	WN	SN
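Table 3 can be regenerated from a simple state-machine model. The sketch below rests on an interpretation that is an assumption rather than something stated with the table: X is the direction of the target branch, Y is the common direction of two r-branches executed after it, and the PHT entry is a 2-bit saturating counter. Under that reading every entry of the table is reproduced.

```python
# Assumed reading of Table 3: the target branch (direction X) runs first,
# followed by two r-branches that both take direction Y. The PHT entry is
# modeled as a 2-bit saturating counter.

NAMES = ["SN", "WN", "WT", "ST"]


def step(state: int, taken: bool) -> int:
    """Saturating update: +1 on taken, -1 on not-taken, clamped to [0, 3]."""
    return min(state + 1, 3) if taken else max(state - 1, 0)


def final_state(s0: str, x_taken: bool, y_taken: bool, r_branches: int = 2) -> str:
    """PHT state after the target branch and `r_branches` r-branches."""
    state = step(NAMES.index(s0), x_taken)
    for _ in range(r_branches):
        state = step(state, y_taken)
    return NAMES[state]


# table[s0][(Y, X)] with directions written as "T" or "NT".
table = {
    s0: {
        (y, x): final_state(s0, x == "T", y == "T")
        for y in ("T", "NT")
        for x in ("T", "NT")
    }
    for s0 in NAMES
}

for s0 in NAMES:
    print(s0, table[s0])
```

In every panel the predicted direction of the final state (Taken for WT and ST, Not-taken for SN and WN) follows Y and not X, which illustrates why an attacker who only probes the predicted direction learns the r-branch direction rather than the secret.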

Table 4. Security of BranchCloak against various predictors.

Predictor Type	Attacker Type	BranchCloak’s Defense Mechanism
Hybrid (1-level)	multi-stepping	PHT-level obfuscation
Hybrid (1-level)	single-stepping	Branch-level obfuscation
Hybrid (2-level)	multi-stepping	N/A †
Hybrid (2-level)	single-stepping	Branch-level obfuscation
TAGE (base predictor)	multi-stepping	PHT-level obfuscation
TAGE (base predictor)	single-stepping	Branch-level obfuscation
TAGE (history-based)	multi-stepping	N/A †
TAGE (history-based)	single-stepping	Branch-level obfuscation

† Outside BranchCloak’s defense boundary.

Table 5. Experimental settings.

Category	Setting
OS	Ubuntu 18.04 64-bit
CPU model	Intel i5-7500 (Kaby Lake); Intel i5-10600 (Comet Lake); Intel i9-11900 (Rocket Lake)
Cryptographic libraries	OpenSSL 3.1.0; MbedTLS 3.1.0; Libgcrypt 1.9.4

Table 6. Execution overhead.

Cryptographic Library	CPU	Iterations per Second w/o BranchCloak (σ)	Iterations per Second w/ BranchCloak (σ)	Overhead (%)
OpenSSL	i5-7500	1,179,533 (1.62)	1,078,476 (10,048)	8.57
	i5-10600	1,499,467 (2.16)	1,409,985 (12,512)	5.97
	i9-11900	1,687,880 (1.94)	1,582,077 (23,834)	6.27
MbedTLS	i5-7500	11,975 (1.21)	11,830 (34.5)	1.21
	i5-10600	15,160 (1.48)	15,037 (44.8)	0.81
	i9-11900	15,510 (1.11)	15,405 (116)	0.68
Libgcrypt	i5-7500	904,426 (1.18)	822,143 (1911)	9.10
	i5-10600	1,175,296 (1.56)	1,116,076 (1410)	5.04
	i9-11900	1,430,408 (0.80)	1,312,709 (8883)	8.23

σ: standard deviation.
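The overhead column of Table 6 is consistent with the relative drop in iterations per second. The short check below assumes the overhead is computed as (baseline − protected) / baseline; the throughput figures are copied from the table, while the helper name and dictionary layout are illustrative.

```python
# Iterations per second from Table 6: (w/o BranchCloak, w/ BranchCloak).
ITERATIONS_PER_SECOND = {
    ("OpenSSL", "i5-7500"): (1_179_533, 1_078_476),
    ("OpenSSL", "i5-10600"): (1_499_467, 1_409_985),
    ("OpenSSL", "i9-11900"): (1_687_880, 1_582_077),
    ("MbedTLS", "i5-7500"): (11_975, 11_830),
    ("MbedTLS", "i5-10600"): (15_160, 15_037),
    ("MbedTLS", "i9-11900"): (15_510, 15_405),
    ("Libgcrypt", "i5-7500"): (904_426, 822_143),
    ("Libgcrypt", "i5-10600"): (1_175_296, 1_116_076),
    ("Libgcrypt", "i9-11900"): (1_430_408, 1_312_709),
}


def overhead_percent(baseline: int, protected: int) -> float:
    """Relative throughput loss in percent (assumed definition of overhead)."""
    return (baseline - protected) / baseline * 100.0


overheads = {
    key: round(overhead_percent(*pair), 2)
    for key, pair in ITERATIONS_PER_SECOND.items()
}

for (library, cpu), pct in overheads.items():
    print(f"{library:10s} {cpu:9s} {pct:5.2f}%")
```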

Table 7. Software-based mitigations for PHT-based side-channel attacks.

Mitigation	Technique	Scope of Protection	Performance Overhead
Raccoon [27]	Executing decoy path	All conditional branches	<1000×
Zigzagger [31]	Obfuscating the control flow through trampoline branches	All conditional branches	<2.19×
Hosseinzadeh et al. [30]	Converting conditional branches into conditional move and indirect branch	All conditional branches	<1.64×
Evtyushkin et al. [36]	Randomizing the PHT in context switch	Conditional branches across processes	<1.2×
Half&Half [40]	Software-based PHT partitioning	Conditional branches across two domains	<1.07×
Coppens et al. [28]	Converting conditional branches into conditional moves	User-informed secret-dependent branches	<24×
BranchCloak	Obfuscating the PHT state through insertion of colliding branches	User-informed secret-dependent branches	<1.09×

Table 8. Program size overhead.

(Unit: Bytes)
Cryptographic Library	w/o BranchCloak	w/ BranchCloak	Overhead per Aligned Branch
OpenSSL (bn_exp.o)	25,504	213,920 (5) †	37,863
MbedTLS (bignum.o)	45,448	401,800 (11)	32,395
Libgcrypt (ec.o)	215,000	399,344 (5)	36,868

† Number of aligned branches.

Table 9. Effect of BranchCloak on page faults, branch misprediction rate, and cache miss rate.

Performance Counter	Cryptographic Library	Architecture	Not Applied	Applied
Page Faults	OpenSSL	i5-7500	288	290
		i5-10600	288	291
		i9-11900	308	310
	MbedTLS	i5-7500	84	88
		i5-10600	84	88
		i9-11900	84	88
	Libgcrypt	i5-7500	152	154
		i5-10600	152	153
		i9-11900	149	150
LLC Miss Rate (%)	OpenSSL	i5-7500	20.98	20.58
		i5-10600	15.95	14.01
		i9-11900	23.01	22.72
	MbedTLS	i5-7500	2.17	2.03
		i5-10600	1.47	1.23
		i9-11900	2.58	2.87
	Libgcrypt	i5-7500	1.33	0.97
		i5-10600	0.89	0.71
		i9-11900	20.17	18.86
Branch Misprediction Rate (%)	OpenSSL	i5-7500	0.207	0.204
		i5-10600	0.207	0.207
		i9-11900	0.244	0.237
	MbedTLS	i5-7500	0.288	0.274
		i5-10600	0.282	0.272
		i9-11900	0.225	0.219
	Libgcrypt	i5-7500	0.109	0.108
		i5-10600	0.108	0.108
		i9-11900	0.115	0.107

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
