1. Introduction
FPGAs provide an efficient platform for the realization of streaming-based applications making them a suitable choice for hosting implementations of cryptographic algorithms. Moreover, from a security perspective, FPGAs offer the advantage of reconfiguration, which allows a design update if a security flaw is detected. Encryption algorithms such as AES are proven to be algorithimically secure. However, the same cannot be guaranteed for the hardware running these cryptographic implementations if it is not located in a controlled and secure environment. If an FPGA-based cryptographic implementation is exposed to an environment which is fully under the control of an attacker, physical attacks such as: Side-Channel Analysis (SCA) or Fault Injection Attacks (FIA) can be launched to extract sensitive information.
SCA is a relevant threat for hardware cryptographic implementations. A successful SCA results in the loss of system’s confidentiality as secret information (usually key) can be extracted. The principle of SCAs involves the dependency between processed data and some measurable leakage parameter. The attacker may try to exploit the information leakages from various cryptographic operations running on an electronic device to extract the secret key. Some well-known side-channels are power [
1], timing [
2], and electromagnetic radiation [
3].
Among SCAs, power analysis attacks have gained the most attention. Introduced by Kocher et al. in 1998 [
1], they have now been successfully used against various encryption algorithms [
4]. Power analysis attacks are based on the fact that different operation cause variations in the activities on the signal lines, which result in different power consumption. The power consumption can be easily measured by observing the current or the voltage on a shunt resistor or the EM radiations using a magnetic probe. The relative ease with which these attacks can be applied makes them extremely powerful.
The power analysis attacks can either exploit the relationship between the measured power and the operations executed by the device known as
Simple Power Analysis (SPA) [
1] or by understanding the correlation between the processed data and the measured power called the
Correlation Power Analysis (CPA) [
1]. A closer inspection of physical attacks reveal that the basic attribute of security, i.e., Confidentiality cannot be transferred to the hardware realization of a cryptographic algorithm. Once under the control/possession of an attacker, there are often no technical or time limitations to characterize the device, underlying security circuits, and eventually extracting the secret key. Moreover, the static nature of cryptographic hardware implementations further simplifies the attack. However, as mentioned earlier, the reconfigurable nature of FPGAs allows them to switch between different hardware configurations. This dynamic behavior would undermine the static attack principles of SCA and FIA, consequently making the breach of confidentiality difficult and in most cases impossible. To continually reconfigure parts of a cryptographic circuit, the
Partial Reconfiguration (PR) of modern FPGAs can be used. PR allows parts of a circuit to be modified dynamically, without interrupting the remaining logic.
The implementation of these dynamic circuits requires a two-step approach. First there is a requirement to create a set of polymorphic realizations of the entire (or parts of) cryptographic algorithm. These realizations which are functionally identical but structurally different can then be dynamically shuffled using Partial Reconfiguration. A lot of recently published articles focus on the first part, several strategies at different stages of the CAD flow have been proposed to generate these polymorphic realizations of the AES called `variants’. However, there is not much literature available which discusses the influence of individual components of PR-based countermeasures in regard to resistance against SCA. This work attempts to fill this gap and focuses on the evaluation of a complete PR system to analyze and quantify the contribution of components normally present in such a PR system. Especially the influence of (a) placement, (b) noise from the static part of the system and (c) PR noise is examined. This allows a comparison of the influence of variants against often neglected contributors. Furthermore, we report the costs of such a partially reconfigurable system in terms of area overhead as well as the influence of cryptographic throughput compared to a non-PR system.
The remainder of the paper is organized as follows:
Section 2 covers the literature related to countermeasures against power attacks.
Section 3 explains the concept of dynamic circuits and how they can protect a cryptographic implementation.
Section 4 provides the details for generating netlist-level variants.
Section 5 describes the measurement setup. In
Section 6, we present a discussion on the results, while
Section 7 highlights the conclusion and the future work.
2. Background and Previous Work
Countermeasures for power-based SCA attempt to reduce the dependence between the processed data at an intermediate stage of the algorithm (which could be used in an adversarial model), and the actual power consumption of the device.
The first proposal advocating the usage of PR as a countermeasure against power-based SCA was presented in [
5]. The work utilizes a serial-based AES architecture with registers to store intermediate results. The registers are connected to a switch-matrix which can be dynamically reconfigured to randomly change the position of registers between different AES rounds. The additional delay (at different stages) introduced by the registers reduces the correlation between intermediate values and power consumption.
The work in [
6] proposes a number of countermeasures which are especially suited to FPGA-based cryptographic implementations. The presented countermeasures utilize some specialized resources on an FPGA fabric, such as Shift-Register LUT (SRL), BRAM, and Digital Clock Manager. A similar idea was presented by Sasdrich et al. [
7] using CFGLUTs for the randomization of AES switch boxes.
Recently, an alternative method to create variants of a cryptographic circuit was proposed in [
8]. The work explores the idea of implementation diversity to generate a set of AES variants. The variants are generated by constraining a certain percentage of cells to a particular region of FPGA fabric. The approach results in a number of diversified implementations (configurations) which are functionally similar but physically different, i.e, having varying placement and routing and consequently, varying dynamic power profiles. The authors propose that shuffling between these variants using PR would result in different observed power values for the same hypothetical intermediate values thus reducing the correlation between processed data and measured power.
Protecting a cryptographic implementation by varying its location during run time (so-called moving target approach) appears to be a promising approach. However, depending upon the available space on the fabric, the method can generate only a limited number of unique power profiles. Another recently published article [
9] combines implementation diversity with the moving target approach. The variants consist of four realizations of the Sbox function of the AES and 64 random noise sources. The Sbox variants can be mapped randomly to any of the 16 partially reconfigurable regions. Recently, the work in [
10] presented the idea of implementation diversity from a different perspective. Instead of generating variants at the placement or routing stages of the FPGA CAD flow, the authors suggest modifications at the synthesis level. Compared to the approach in [
8] where the circuit structure is fixed, there is much more freedom during synthesis to change the structure of the circuit. The work in [
11] also explores synthesis level changes by using sub-sets of a standard cell library while synthesizing the Sbox, resulting in different structural realizations of the Sbox. However, unlike the work in [
10], this approach lacks flexibility to generate a large number of distinct variants. Some recent studies [
12,
13] have also demonstrated the feasibility of launching remote power side-channel attacks on FPGAs in a cloud computing environment, where the fabric can be shared among multiple users or programs. A detailed summary of the PR-based countermeasures has been provided in [
4].
To the best of our knowledge, only the work in [
8,
9] present findings for an actual PR system implemented on FPGA. However, the influence of PR system in the overall resistance against SCA have not been studied.
3. Concept of Dynamic Circuits
Section 1 presents the feasibility of the dynamic circuits as an effective countermeasure against the SCA. This dynamic realization of a cryptographic algorithm involves two steps, (a) generation of variants, circuits which are functionally identical but structurally modified and (b) a reconfigurable system (PR system), to load variants during runtime at different locations. In the proposed PR system, the variants are partial bitfiles which are generated from one of the netlist randomization techniques (Flip Flop-Inverter Chains) presented in [
10]. To manage the continuous switching between different variants, a hardware-based reconfiguration manager (see
Figure 1) is needed. It allows reconfiguring one variant while another is in use.
A very simple realization of a PR system would only contain a single partial area. Such an architecture would be sufficient to evaluate the effectiveness of individual variants. However, switching from one configuration to another would require halting the encryption process resulting in a serious throughput penalty. To avoid such stalls, a straightforward solution is to use a double buffer system with two partial areas. Now as one partial area encrypts the incoming data, the other partial area can be configured simultaneously with the next variant. The switch from one partial area to another can then be performed within a single clock cycle incurring no additional overhead in terms of latency. This improved throughput comes at the expense of high area overhead. However, the resources allocated to the different partial areas may not be a strict overhead if the encryption algorithm is part of a much larger system with several subsystems composed of different modules. In such a system, the non-occupied partial areas can be used to map the modules of other subsystems. One can imagine a streaming hardware accelerator with en- and decryption (see [
14] for an example) or a multi-tenant cloud-based FPGA server with the reconfigurable fabric shared between several users.
An example of the proposed PR system is presented on the right-hand side of
Figure 1. The TRNG randomly selects a variant from the secure storage and loads it a random time point over the ICAP interface on a random location on the FPGA. Each of the four partial areas in
Figure 1 can hold the implementation of a variant. Since the outputs of these PAs are multiplexed, only the output of a single PA can selected at a time. However, as mentioned earlier the other PAs could be used to map the modules of other subsystems, increasing the background noise which increases the difficulty of a side-channel attack.
All the prior studies related to implementation diversity focus on quantifying the contribution of variants in providing resistance against SCA. However, in an actual PR system, there are multiple sources of background noise which can complicate the power analysis. Therefore, actual influence of variants can only be evaluated if the measurement setup allows isolating power dissipation of individual components.
In this paper, we extend the work presented in [
10] which focused on the novel aspect of the generation of netlist-level variants. We provide an in-depth evaluation of the degree of protection provided by netlist randomization against power side-channel analysis. With the implementation of a self-configurable system, we analyze the effect of individual components in the resistance against SCA. This helps us quantify the actual contribution of netlist randomization as a countermeasure. Although there are multiple parameters which could influence the resistance against SCA, we have focused on some of the most important components expected to have a strong influence. Some of the components which have been evaluated include background noise sources such as additional clock or PR noise, the effect of a variant’s placement, and the number of variants exchanged during runtime (for further details see
Section 6.2). To the best of our knowledge, this is the first work which isolates the influence of implementation diversity as a SCA countermeasure in a PR-based system.
4. Variants Generation
As mentioned in
Section 2, variants of a cryptographic algorithm can be generated at the different stages of a CAD flow. For the evaluation of the proposed PR system, we have selected the synthesis level variants proposed in [
10] as they offer a greater degree of flexibility compared to the constrained placement approach presented in [
8]. However, generation of synthesis level variants requires access to the underlying details of a tool such as ability to perform changes at the netlist level, which unfortunately is not the case with most of the commercial tools. Therefore, the work in [
10] bypasses this problem by making use of an open-source synthesis tool called Yosys [
15]. Yosys allows fine-grained changes in the design at the netlist level. The tool transforms the HDL description of a circuit into an internal representation which allows easy modifications of the design at both coarse and fine level. Once all transformations are completed, ABC [
16] is used for mapping and final logic optimizations. The design can then be exported as an EDIF or BLIF netlist and used for placement and routing flow.
4.1. AES Architecture
As a case study, we have used the reference AES architecture provided by the
ChipWhisperer [
17]. The implementation requires 4547 Look-up Tables (LUTs) and 785 registers. The implementation consists of a Round Function module and Key Scheduling module. Both of these modules are instantiated in a
Core entity as shown in
Figure 2. A finite state machine determines the next mode of operation, initialization state, intermediate or the last round, output available, etc. Key Schedule is a module that expands and generates keys for each round based on the given secret key. The Round Function module consists of the main subfunctions of AES—AddRoundKey, SubBytes, ShiftRows, and MixColumns.
It should be noted that a different (serialized) AES implementation was used for experimentation in [
10]. Unfortunately, we were not able to launch a successful CPA attack against it using the ChipWhisperer’s measurement setup.
4.2. Synthesis Level Changes Using Yosys
To enhance the resistance of AES against SCA, three classes of variations of the serial-based AES implementation were presented in [
10]. These variant classes include data hiding, dummy logic generation, and sub-par mapping to influence the power consumption by introducing background noise or data manipulation. Since the focus of this paper is to determine the influence of PR on the overall resistance offered by netlist level manipulations against SCA, we have selected variant class 2, which inserts dummy chains of cascaded inverters and flip-flops in different AES modules (see
Figure 3). From the resource utilization and CPA values reported in [
10], we found this class to be particularly effective against SCA, compared to the other two variations. These chains run in parallel to the main design and contribute to the noise in power consumption. The chain can be inserted into any of the three main modules of the AES design, and its length can also be randomized when the design is modified in Yosys. For this work, we have inserted the chain into the
round module, which is mainly targeted by different attack models for leakage analysis. The chain length has been varied by generating Yosys scripts using a parametrized python program which can generate scripts for chains of different lengths in a selected range.
5. Measurement Setup
As discussed in
Section 1, to launch a power analysis attack, a power consumption profile of the cryptographic implementation is needed. For collecting power traces and later performing the SCA, we have selected the ChipWhisperer [
17] tool-chain. The ChipWhisperer is an open-source tool-chain developed by NEWAE Technology Inc. It is dedicated to hardware research and incorporates a variety of components to carry out physical attacks on embedded hardware. To execute an attack, the ChipWhisperer requires: a
Capture board and
Target board(s). For this work, we use the CW305 target board which enables security analysis on FPGAs. The board features an Artix-7 (xc7a100t) target device and has been interfaced with ChipWhisperer-Lite (capture board) for performing a power-based SCA against the AES algorithm implemented on the target board. The communication between the two boards has been established via a 20-pin connector and the power fluctuations during encryption are passed to the capture board via a SMA connector.
The details of the measurement setup are as follows:
- 1.
Variants of the AES are implemented on a Artix-7 (xc7a100t) device. The operating frequency of AES is 10 MHz.
- 2.
The power traces for encryption have been collected by measuring the voltage-drop (amplified by 20-dB) across a shunt resistor. The choice was made keeping in mind the reproducibility of the results, as the standard ChipWhisperer-Lite package does not include an EM-probe.
- 3.
A single encryption run corresponds to one measured trace.
- 4.
The traces are passed to the capture board via a SMA connector and are then sampled at ~40 MS s−1.
- 5.
A total of 10,000 traces has been recorded for each variant. For the system with active PR, the number of traces depend upon the reconfiguration time (see Equation (
1)).
The physical realization of netlist randomization has been accomplished by implementing a self-configurable system (
Figure 4) based on the concepts presented in
Section 3. The proposed system consists of two partial regions (PA0 and PA1) for hosting any of the generated variants. For partial reconfiguration, the
Internal Configuration Access Port (ICAP) [
18] is used to configure the regions PA0 and PA1 with partial bitstreams. The 32-bit wide ICAP interface can operate at a maximum frequency of 100 MHz. Therefore, the theoretical throughput of the ICAP interface is around 400 MS s
−1. The ICAP is operated with the ChipWhisperer’s USB clock running at 96 MHz. The Chipwhisperer has two clock domains: encryption and USB. The reason behind having two separate clock domains is to have a dedicated clock for encryption while the other clock (USB) takes care of the rest. This allows for the USB clock to be switched off during encryption which reduces the noise in the power traces.
As mentioned before, most of the features of the implemented self-configurable system have been borrowed from the system shown in
Figure 1, however, for the sake of simplicity some features such as the TRNG have been implemented in the ChipWhisperer software. For experimentation, we generate a total of 10 variants (v1, v2, …, v10), which are realizations of the Flip-Flop Inverter chain (see
Section 4.2) of different sizes. With the baseline implementation (v0) included, a total of 11 designs were used for experimentation, while encryption is running on one partial area (PA0 or PA1), the TRNG randomly selects one of the 10 remaining designs to be loaded on the other partial area. As only 11 variants are in use and our results show no major influence of the frequency of exchanging the variants (please refer to the experiments in
Section 6.2.6), we think the entropy of the selection is of secondary importance for this work.
The compressed size of the bitfile for the largest variant was found to be ~ 465 kB, which results in a reconfiguration time (
) of ~
ms. The AES implementation used in this work requires 12 clock cycles (latency
) for a single encryption. Therefore, for a operating frequency (
), the number of encryption operations that can be performed in parallel to a single reconfiguration (
Npr) can be calculated as:
For our measurement system (
10 MHz), this amounts to
Npr encryption operations. It should be noted that the upper limit on the operating frequency is defined by the measurement setup. The maximum sampling frequency of ChipWhisperer is 105 MS/s. This limits the operating frequency of AES to ~25 MHz when
oversampling is used. To have comparable results, we set the
value to 10 MHz, which is the default value for the CW305 measurement setup. No additional latency is incurred by the modifications of AES core to generate variants. According to the timing analysis, the default AES core and all the variants function properly over 100 MHz. The resource overhead (see
Table 1) introduced by the proposed self-configurable system consists of one partial region, reconfiguration interface (including the logic to communicate with ChipWhisperer), and BRAM Tiles. The BRAM tiles are used for storing the partial bitstreams of variants.
7. Conclusions and Future Work
This work provides a comprehensive evaluation of netlist randomization as a countermeasure against side-channel analysis. The efficacy of the countermeasure has been analyzed under various scenarios, quantifying the influence of individual components on the overall resistance achieved against SCA. A group of AES implementations (variants) were generated by modifying the netlist using an open source synthesis tool Yosys. The power measurements for variants were carried out on the ChipWhisperer’s CW305 board with an Artix-7 target. Dynamic shuffling of the variants was realized using the proposed self-configurable system.
The results of
Section 6.2 reveal some interesting insights. Firstly, the netlist-based variants did not offer the expected level (based on [
10]) of resilience against SCA. Increasing the number of variants or varying their intervals of active encryption did not have a significant impact on the overall resistance. All of these observations point to the fact that the
quality of individual variants is critical. Variants which introduce varying amount of background noise do influence the power consumption. However, this change does not thwart power-based side channels as the correlation peaks are still aligned at the same instant in time and hence can be easily distinguished by collecting a sufficient number of traces. Therefore, shuffling between such variants will only provide a marginal improvement in the overall security and hence cannot be relied on as a sole countermeasure to diffuse power-base attacks.
On the other hand, changing the placement of variants and noise introduced by the USB clock did have a meaningful effect. The largest contribution in the resistance against SCA came from the PR noise. As seen in
Figure 14, the
values with PR enabled stay above 30 even for 100,000 traces. A possible future research direction would be to explore variants which can shift the CPA peaks in time (by misaligning the power traces) or can mask the intermediate values exploited in the leakage models. Another possibility is to load the vacant partial areas with special noise generator modules. The combination of such variants with the noise introduced by placement and PR would likely harden the implementation diversity based countermeasures against a large variety of SCAs. For independent verification of our results and potential future research, the modified ChipWhisperer design and related tooling to collect measurements are made publicly available and can be accessed at
https://www.tu-ilmenau.de/ra, accessed on 19 July 2022.