Using simple instruction sequences to characterize an embedded processor’s fault response has been shown to be an effective technique during physical device testing [10,11,12]. SimpliFI supports this technique and provides benefits over physical testing by enabling rapid evaluation through simulation and by collecting hardware state information during execution. The goal of this type of evaluation is to build a knowledge base or model of how any type of instruction may be vulnerable to fault attacks. Therefore, all aspects of an instruction are of interest: opcode, addressing mode, operands, data, etc. An evaluator generally wants to know how each of these instruction components contributes to the overall fault response. Acquiring this information provides insight into how the hardware that implements each component is vulnerable to fault attacks. To fully characterize a device, tests are designed to isolate microarchitectural components to evaluate their fault responses, and to generalize the common fault vulnerabilities and behavior exhibited by instructions with similar parameters.
To demonstrate SimpliFI’s instruction characterization capabilities, we characterize ADD, LW (load word), and JALR (jump-and-link with register) instructions from the RISC-V 32-bit integer ISA (RV32I). These three instructions together represent the range of capabilities on a load-store architecture, where most instructions are categorized as performing arithmetic and logic, memory access, or control flow functions. Although these three instructions are representative of the full ISA, the same analysis can be performed for all instructions. Within each instruction, the effects of different destination registers, source registers, source orderings, register values, and immediate values are evaluated. To do this, a separate test is created for each instruction component that isolates the behavior of the component of interest. To test fault effects on the destination register, we simulate faults on multiple copies of the same instruction with varying destinations. The same approach is used for the other components of interest. For the purposes of this evaluation, the following terms apply:
For each instruction component test (e.g., destination and source 1), four instances of the same instruction with varying selections of the target component were evaluated. To evaluate each of these components using SimpliFI, a unique test program was written for each instruction component. Listing 2 provides the specific test program for evaluating the impact of the first source register in ADD instructions. The target instructions are padded with NOPs to ensure that preceding and following test setup instructions do not interact with the target instruction execution. Furthermore, each subtest was conducted with faults applied at each execution stage, with clock glitch widths ranging from 12 to 2 nanoseconds in 250 picosecond increments.
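The test layout and glitch sweep described above can be sketched as follows. This is a minimal illustration, not SimpliFI's own interface: the function names, the register selections, the number of padding NOPs, and the assumption that the pipeline stages are numbered 0 to 6 are all hypothetical choices made for the example.

```python
from itertools import product

def glitch_widths_ps(start_ps=12_000, stop_ps=2_000, step_ps=250):
    """Glitch widths from 12 ns down to 2 ns (inclusive), in picoseconds."""
    return list(range(start_ps, stop_ps - 1, -step_ps))

def make_subtest(rd, rs1, rs2, nops=4):
    """Assembly text for one target ADD padded with NOPs on both sides.

    The NOP count is an assumption; the padding only needs to be long
    enough to keep setup instructions out of the pipeline.
    """
    pad = ["nop"] * nops
    return "\n".join(pad + [f"add x{rd}, x{rs1}, x{rs2}"] + pad)

# Hypothetical source 1 selections; destination and source 2 stay fixed
# so that only the component under test varies between subtests.
SOURCE1_CHOICES = [5, 10, 15, 20]
STAGES = range(7)  # assumption: stages numbered 0 (fetch) to 6 (writeback)

programs = {rs1: make_subtest(rd=1, rs1=rs1, rs2=2) for rs1 in SOURCE1_CHOICES}

# One fault injection trial per (subtest, stage, glitch width) combination.
campaign = list(product(SOURCE1_CHOICES, STAGES, glitch_widths_ps()))
```

Under these assumptions the sweep contains 41 glitch widths, so a four-instance component test with seven stages yields 1148 trials.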
ADD Instruction
Figure 8 shows the results from injecting clock glitches during different stages of an ADD instruction. The fault evaluation analysis calculates the Hamming distance (HD) between the results of clean simulations with no faults and the results of fault injection trials. The top graphs in Figure 8 and subsequent figures plot the number of faulty bits in the output register in response to multiple clock glitch widths. The bottom heatmaps depict how the number of faulty bits in the entire hardware register state changes as the instruction executes. In each heatmap, the lowest point on the vertical axis shows the number of faulty bits in the cycle immediately following fault injection, and time proceeds up the vertical axis until the instruction finishes execution. Since the state only contains faulty bits after the fault is injected, previous cycles of execution are omitted from the heatmaps. For example, the hardware state fault heatmap in Figure 8c starts at execution cycle 3 since the fault was injected during stage 2.
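The Hamming distance metric and the per-cycle state profile behind each heatmap column can be illustrated with a short sketch. The trace format (one list of register values per cycle) and the function names are assumptions made for this example and are not taken from SimpliFI.

```python
def hamming_distance(clean, faulty, width=32):
    """Number of bit positions that differ between two register values."""
    mask = (1 << width) - 1
    return bin((clean ^ faulty) & mask).count("1")

def state_fault_profile(clean_trace, faulty_trace, inject_cycle, width=32):
    """Faulty-bit count over the whole register state, per cycle.

    Each trace is a list with one entry per cycle, and each entry is a
    list of register values (hypothetical format). Cycles up to and
    including the injection cycle are omitted, as in the heatmaps.
    """
    profile = []
    pairs = zip(clean_trace[inject_cycle + 1:], faulty_trace[inject_cycle + 1:])
    for clean_state, faulty_state in pairs:
        profile.append(sum(hamming_distance(c, f, width)
                           for c, f in zip(clean_state, faulty_state)))
    return profile

# Synthetic two-register traces, for illustration only.
clean_trace = [[0, 0], [1, 0], [1, 2], [1, 3]]
faulty_trace = [[0, 0], [1, 0], [3, 2], [1, 0xF]]
profile = state_fault_profile(clean_trace, faulty_trace, inject_cycle=1)
```

Stacking one such profile per glitch width gives a matrix with the same layout as the heatmaps: glitch width on one axis and cycles after injection on the other.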
Listing 2.
An example instruction sequence test program that focuses on evaluation of the ADD source 1 instruction component.
The instruction output plots in Figure 8 show that the impact of identical faults on the instruction output varies depending on when the fault is injected. When faults are injected at the beginning of execution in stage 0, only clock glitch widths between 7.25 and 5.5 ns result in corrupted output data. However, faults injected in stage 2 affect instruction output as long as their glitch widths are shorter than 10.5 ns. The instruction fault responses give an indication of how different faults will affect software execution since the instruction outputs are ultimately relevant to software behavior.
The hardware state fault propagation heatmaps instead give insight into how faults affect processor execution even when they fail to affect instruction outputs. The hardware state fault propagation plots in Figure 8 always exhibit erroneous bits remaining in the circuitry up until the end of execution where corrupted output data appears in the registers. For example, a high volume of state corruption can be seen throughout execution in Figure 8a in response to faults that cause corrupted outputs. Glitches in the range of 7.75 to 5.75 ns induce hardware state errors that propagate through to the output, although only glitches shorter than 7.25 ns affect the data registers and glitches between 7.75 and 7.25 ns only affect the program counter. There is also significant error propagation from glitches shorter than 3 ns, which only affect the program counter by the end of execution.
However, attacks during stages 1 and 2 have significantly different effects on the hardware. Figure 8b,c show that every tested glitch width resulted in at least 50 corrupted state bits within the third subsequent execution cycle, and that the glitches applied in stage 1 result in up to 85 state errors. However, none of these errors propagate to the data outputs, and only 7.75 to 5.25 ns glitches in stage 2 resulted in program counter errors. Another fault behavior in these results is that certain faults induce few errors during injection, while others induce significantly more. One trend shown by all of the hardware fault propagation heatmaps is that glitches resulting in few immediate errors tend to cause amplified state corruption a few cycles later. These gradually amplifying faults seem to have the greatest impact on instruction outputs. Conversely, the errors from faults that immediately cause significant state corruption suddenly disappear two clock cycles before the end of execution.
In most of the tests, only the targeted destination register and the program counter are affected by fault injection. However, clock glitches with 4.25 ns widths applied in stage 1 affect not only the target destination register, but other registers as well. This is shown in Figure 9, where faults were injected on the instruction decode stage of multiple ADD instructions each targeting a different destination register. The instruction targeting register R15 also modifies R11, and the instruction targeting R5 also modifies R1. This behavior was only observed in response to faults during stage 1. Furthermore, the final erroneous bits in the destination registers vary slightly from one instruction to another. For 4.25 ns glitches injected in stage 1, instructions targeting destinations R2 and R25 result in slightly more erroneous bits. These anomalous behaviors for stage 1 faults call attention to the underlying circuitry for the pipeline stage. It may or may not be a coincidence that targeting registers R5 and R15 results in corrupted values in registers R(5−4=1) and R(15−4=11), while targeting R2 and R25 results in additional corruption in the intended destinations for the exact same fault attack. The important observations here are that (i) data can be written to the wrong destination register, and (ii) the selected destination register has a minimal but non-zero impact on the faulty bits that propagate to the end of execution.
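A check for observation (i) can be sketched as a comparison of the final register files from a clean run and a faulted run. The register-file representation (a list of 32 integers) and the function names are assumptions for illustration, and the values below are synthetic rather than measured data.

```python
def corrupted_registers(clean_regs, faulty_regs):
    """Indices of registers whose final value differs from the clean run."""
    return [i for i, (c, f) in enumerate(zip(clean_regs, faulty_regs)) if c != f]

def unintended_writes(clean_regs, faulty_regs, rd):
    """Corrupted registers other than the intended destination rd.

    Returns (register index, offset from rd) pairs so that patterns such
    as a constant index offset become visible across subtests.
    """
    return [(i, i - rd)
            for i in corrupted_registers(clean_regs, faulty_regs)
            if i != rd]

# Synthetic example: the intended destination is register 15, and the
# faulted run also changes register 11.
clean = [0] * 32
clean[15] = 0x1234
faulty = list(clean)
faulty[15] = 0x1230
faulty[11] = 0x1234

extra = unintended_writes(clean, faulty, rd=15)
```

Reporting the offset alongside the index makes it easy to see whether misdirected writes share a common index difference, such as the offset of 4 noted above for R5 and R15.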
When testing different source registers, SimpliFI revealed that the selected source register has a greater impact on faulty instruction outputs than the destination register. Faulting the instruction decode stage of the ADD instruction induced the response shown in Figure 8c for all destination register selections. However, the responses to the exact same faults vary when selecting different source registers, as shown in Figure 10. While the responses have similar monotonic structures, the exact number of faulty output bits changes depending on the first source register. The impact of the second source register is smaller than that of the first. However, the instruction using R2 as the second source register was not affected by 10 to 8.25 ns glitches (Figure 10b).
Furthermore, the first and second source registers’ impacts on the fault response exhibit opposite behavior at the extreme ends of their fault sensitivity ranges. In Figure 10a, fault responses to glitches longer than 4 ns are influenced by the first source register, while Figure 10b shows that responses to glitches shorter than 3 ns are influenced by the second source register. The different fault responses observed when testing different source 1 and source 2 operands indicate that the hardware that either pipelines the source selections or propagates values from the source register to the ALU is unbalanced. The hardware for source 1 appears to be more sensitive since fault responses fluctuate significantly more when targeting the first source operand compared to the second. However, the logic paths that select R2 for the second source may be unbalanced, as indicated by the different fault sensitivity shown in Figure 10b.
LW Instruction
Results from testing the LW instruction showed that different classes of instructions have unique responses to fault attacks. Unlike the ADD instruction, faults during the instruction fetch did not result in faulty outputs for the LW instruction, and the fault responses to attacks during the instruction decode varied between the two classes. Even though the same destination and source registers were selected for both classes, the way that the microarchitecture handles decoding of the two instruction types leads to different instruction fault responses. In particular, faults on the decode stage of LW instructions corrupted more than one unintended register in some cases, whereas faults on the ADD instruction corrupted at most one additional register. Additionally, the LW instruction exhibited vulnerabilities to faults during the memory receive stage (stage 5) of the BRISC-V pipeline, which was not a vulnerable stage for the ADD instruction. This is expected since arithmetic instructions do not require further memory access after the instruction fetch. One unexpected microarchitectural effect was revealed in the LW results, where the first instruction had a different output fault response than subsequent instructions. When store instructions were placed in between the target LW instructions, all tested instances of the LW instruction exhibited nearly identical fault responses.
JALR Instruction
Testing the JALR instruction demonstrated further impacts of the microarchitecture on instruction fault responses. The JALR instruction computes the target program execution address by adding an immediate value to an address obtained from a source register, and also stores the return address in a selected destination register. The fault response of the program counter matched closely with the fault response of ADD instruction destination registers for faults injected during the instruction decode and execute stages. This is likely because the ALU is used in similar ways during ADD and JALR instructions. For example, both of these instructions obtain their first operand from a source register during decode, and use a similar add operation to compute their final value. On the other hand, the fault response of the return address destination register in JALR instructions was similar to that of the ADD destination register for faults during the instruction receive stage, but not during the decode or execute stage. This behavior is also attributable to the microarchitecture, where the ALU used in the execute stage has logic for computing the return address that is separate from the logic for adding two full operands together. Finally, the destination registers for both of these instructions share the exact same fault response for faults during the writeback stage (stage 6), since the only event that happens is the final value being written into the destination register.
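For reference, the two values that JALR produces can be modeled in a few lines. This sketch follows the RV32I definition of the instruction (sign-extended 12-bit immediate, least significant bit of the target cleared, return address equal to the address of the following instruction); the function names are illustrative and do not model the BRISC-V datapath.

```python
MASK32 = 0xFFFFFFFF

def sign_extend_12(imm):
    """Interpret the low 12 bits of imm as a signed two's complement value."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm

def jalr(pc, rs1_value, imm12):
    """Return (target, return_address) for a JALR at address pc.

    The target is a full add of the source register and the immediate,
    while the return address is only pc + 4, which is why the two
    results can rely on separate logic.
    """
    target = (rs1_value + sign_extend_12(imm12)) & MASK32 & ~1
    return_address = (pc + 4) & MASK32
    return target, return_address

target, return_address = jalr(pc=0x100, rs1_value=0x2000, imm12=0x011)
```

The asymmetry between the two computations is consistent with the observation above that the program counter tracks the ADD destination response while the return address register does not.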
The raw program counter values collected from the JALR test results indicated exactly how control flow could be violated using fault attacks. Faults during the instruction fetch induce faulty bits in the hardware state, but they do not propagate to the new program counter value. In general, the precise timing of a fault during the instruction decode stage determined the type of effect that the fault had on control flow. Some faults resulted in skipped jumps, which is in line with traditional models. Other faults reset the program execution to start at address 0, after which the processor would sometimes successfully complete the original jump, but other times remain at the program start. The former case is a previously unobserved behavior that would enable an attacker to execute instructions from the start of the program, but then allow the program to continue executing from the originally intended function address. Finally, the remaining instruction decode faults resulted in other non-zero corruptions of either the target address, the return address, or both.
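The outcome categories described above can be expressed as a simple classifier over the observed program counter and return address. The category labels, the function name, and the assumption that a skipped jump leaves the program counter at the fall-through address are choices made for this sketch, which also assumes the intended target is neither 0 nor the fall-through address.

```python
def classify_jalr_outcome(observed_pc, observed_ra,
                          expected_target, expected_ra, fallthrough_pc):
    """Label the control flow effect of one fault injection trial."""
    if observed_pc == expected_target and observed_ra == expected_ra:
        return "correct"
    if observed_pc == fallthrough_pc:
        return "skipped jump"
    if observed_pc == 0:
        return "reset to address 0"
    target_bad = observed_pc != expected_target
    ra_bad = observed_ra != expected_ra
    if target_bad and ra_bad:
        return "target and return address corrupted"
    if target_bad:
        return "target corrupted"
    return "return address corrupted"

# Synthetic trials for a JALR at 0x100 that should jump to 0x2010.
expected = dict(expected_target=0x2010, expected_ra=0x104, fallthrough_pc=0x104)
labels = [
    classify_jalr_outcome(0x2010, 0x104, **expected),
    classify_jalr_outcome(0x104, 0x104, **expected),
    classify_jalr_outcome(0x0, 0x104, **expected),
    classify_jalr_outcome(0x2018, 0x104, **expected),
]
```

Distinguishing a reset that later completes the original jump from one that remains at the program start would require inspecting the program counter over several subsequent cycles, which this single-sample sketch does not attempt.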