A Secure Architecture for Modular Division over a Prime Field against Fault Injection Attacks

Abstract: Fault injection attacks pose a serious threat to many cryptographic devices. The security of most cryptographic devices hinges on a key block called modular division (MD) over a prime field. Although much research has addressed the efficient hardware implementation of the MD over a prime field, few studies have considered secure architectures against fault injection attacks, and those few can only detect faults, not locate them. This paper therefore designs a novel secure architecture for the MD over a prime field that can not only detect faults but also locate the erroneous processing element. To seek the best performance, four word-oriented systolic structures of a main function module (MFM) were designed, and three error detection schemes were developed based on different linear arithmetic codes (LACs). The MFM structures were combined flexibly with the error detection schemes. The time and area overheads of our architecture were analyzed through implementation in an application-specific integrated circuit (ASIC), while its error detection and location capabilities were demonstrated by C++ simulation, in comparison with two existing methods. The results show that our architecture can detect single-bit errors (SBEs) with 100% accuracy and locate the erroneous processing element (PE), and can correctly identify most single-PE errors and almost all multi-PE errors (when there are more than three erroneous PEs). The only weakness of our architecture is its relatively high time and area overhead ratios.


Introduction
Currently, there are various important integrated circuit (IC) devices, ranging from pivotal calculators to security-sensitive devices. Many of these IC devices face the risk of fault injection attacks. The consequences of these attacks include sudden failure of the IC and leakage of key secret information [1][2][3][4]. Over the years, many fault injection methods have emerged, including heavy ion radiation, electromagnetic interference, and laser exposure, posing an increasingly serious threat to IC devices.
In the field of IC security, widespread attention has been paid to protection against fault injection attacks [5][6][7][8][9][10], resulting in multiple countermeasures. The typical measures include physical protection [11], the hardware/time redundancy (module duplication/re-computation) method [12][13][14][15], and the error detection codes (EDC)-based technique [10][16][17][18][19][20][21][22][23]. Among them, the EDC-based technique achieves the best tradeoff between fault coverage and hardware/time overheads [24]. Recently, Mustafa et al. [25] presented a novel differential fault attack (DFA)-aware floor-planning technique, which mitigates the threat from different fault [...]
Inspired by the concurrent error detection scheme for the MD in [40], this paper proposes a novel MD architecture capable of effectively detecting and locating erroneous processing elements, and applicable to related cryptographic implementations, laying the basis for the prevention of natural faults and fault attacks. The contributions of this paper are as follows:
1. This paper extends our work in [40] to present a new secure MD architecture that can not only detect, but also locate, the error.
2. Twelve combinations of four word-oriented systolic implementations of the MFM and three error-detecting schemes with different LAC values were explored to seek the best tradeoff between area overhead, time overhead, and error detection capability. These combinations were modeled in Verilog and synthesized using Synopsys Design Compiler with the TSMC (Taiwan Semiconductor Manufacturing Company) 90 nm CMOS (complementary metal-oxide-semiconductor) standard cell library. Their functions were also verified using ModelSim.
3. Random fault injections were simulated using a C++ program, and the simulation results show that the proposed architecture can detect single-bit errors (SBEs) with 100% accuracy and locate the erroneous processing element (PE). The detection capability for single-PE errors (a multiple-bit error injected into one PE) and multi-PE errors varies with the value of the LAC. However, it reaches 99.898% when the number of erroneous processing elements is three or more.
4. In addition, the proposed architecture can greatly shorten the delay in error reporting.
The remainder of this paper is organized as follows: Section 2 briefly reviews the MD algorithm and the LAC; Section 3 sets up our architecture and describes its algorithm; Section 4 analyzes the error detection and location capabilities of our architecture; Section 5 presents the application-specific integrated circuit (ASIC) implementation results of our architecture and compares them with those of existing schemes; Section 6 puts forward the conclusions of this research.

The MD Algorithm
Let X, Y, and M be three integers, where the greatest common divisor GCD(Y, M) = 1. The MD problem aims to find an integer R satisfying RY ≡ X (mod M). Chen and Qin [31] optimized the extended binary GCD algorithm for the MD over a prime field, which requires fewer iterations than the other existing algorithms. Its equivalent description is given as Algorithm 1.
The LAC can be added and multiplied as normal integers.

Algorithm 1 Equivalent description of the modular division algorithm over a prime field in [31]
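For orientation, the MD problem RY ≡ X (mod M) defined above can be solved with a plain binary-GCD style iteration that uses only shifts, additions, and subtractions. The sketch below is a generic textbook-style variant written for illustration; it is not Algorithm 1 of [31], and the function name `mod_div` and the variable roles are assumptions of this sketch. It maintains the invariants r·Y ≡ a·X and s·Y ≡ b·X (mod M) while a and b shrink toward GCD(Y, M).

```python
def mod_div(x: int, y: int, m: int) -> int:
    """Return R with R*Y ≡ X (mod M), for an odd modulus M and gcd(Y, M) = 1.

    Illustrative binary-GCD style sketch (hypothetical helper, not Algorithm 1 of [31]).
    """
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be an odd modulus, e.g. an odd prime")
    a, b = y % m, m      # a and b shrink toward gcd(y, m); b stays odd
    r, s = x % m, 0      # invariants: r*y ≡ a*x and s*y ≡ b*x (mod m)
    if a == 0:
        raise ValueError("y is not invertible modulo m")
    while a != 0:
        if a % 2 == 0:
            # Halve a, and halve r modulo m (m is odd, so r or r + m is even).
            a //= 2
            r = r // 2 if r % 2 == 0 else (r + m) // 2
        else:
            # Both a and b are odd here: keep the larger one in a, then subtract.
            if a < b:
                a, b, r, s = b, a, s, r
            a -= b
            r = (r - s) % m
    if b != 1:
        raise ValueError("gcd(y, m) != 1")
    return s             # b == 1, so s*y ≡ x (mod m)


if __name__ == "__main__":
    print(mod_div(3, 5, 7))  # 5 * 2 = 10 ≡ 3 (mod 7)
```

The halving step is where a hardware design would use a right shift; the conditional addition of M keeps the shifted value an integer.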
In theory, p can be any relatively small prime number. In practice, however, the p value greatly affects the performance of different applications. This paper adjusts the p value for each specific MFM structure, provided that it satisfies p = 2^l − 1, where l = 2, 3, 5 (namely, p = 3, 7, 31).
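The statement that the code can be added and multiplied like normal integers, together with the choice p = 2^l − 1, can be illustrated with a small residue computation. The sketch below assumes that the check part of a value is its residue modulo p, which is the usual construction for arithmetic residue codes; the helper name `check_part` is hypothetical. Because 2^l ≡ 1 (mod p), the residue can be obtained by adding l-bit chunks, which is one reason moduli of this form are convenient in hardware.

```python
def check_part(x: int, l: int) -> int:
    """Residue of x >= 0 modulo p = 2**l - 1, computed by folding l-bit chunks.

    Assumption of this sketch: the check part of a value is its residue mod p.
    """
    p = (1 << l) - 1
    while x > p:
        # 2**l ≡ 1 (mod p), so the high part can simply be added to the low part.
        x = (x & p) + (x >> l)
    return 0 if x == p else x


if __name__ == "__main__":
    a, b = 0xC3A5, 0x1F7B
    for l in (2, 3, 5):  # p = 3, 7, 31
        p = (1 << l) - 1
        ca, cb = check_part(a, l), check_part(b, l)
        # The check part of a sum or product follows from the operands' check parts.
        assert check_part(a + b, l) == check_part(ca + cb, l)
        assert check_part(a * b, l) == check_part(ca * cb, l)
        print(f"p = {p}: |a|_p = {ca}, |b|_p = {cb}")
```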

Proposed Secure Architecture and Its Algorithm Description
This section first proposes a secure architecture for the MD and then gives its algorithm description. The relevant parameters are defined in Table 1 below.

Table 1
e: The number of words in operands, e = n/w
X^i: The value of X at the i-th iteration of Algorithm 2
X_k: The k-th word of X (X can be A, B, R, or S)
[...]: The concatenation of X and Y
CA_k: The 1-bit carry-out signal from the computation of the k-th word of A
CR_k: The 2-bit carry-out signal from the computation of the k-th word of R

As shown in Figure 2, the proposed secure architecture contains two modules, namely, the MFM (marked by the green line) and the EDM (marked by the red line). The former computes the MD and the latter detects errors in the MFM. The computing process of the secure architecture is summarized as Algorithm 2. In this algorithm, Lines 4 and 18-20 describe the error detection function, which is mapped onto the EDM in Figure 2, and the remaining lines describe the word-oriented computing process of the MD, which is mapped onto the MFM.
The MFM architecture in Figure 2 is similar to that in [40]. In this architecture, the n-bit operands A, B, R, and S are split into e w-bit words, each of which is processed by a PE. A total of e PEs are combined into a systolic array that processes the operands word by word, where e = n/w. Specifically, the control element (CE) executes the operations in Lines 9-15, producing the control variables (α, β, φ, λ, δ), the initial carry-in signals (1-bit CA_{-1} and 2-bit CR_{-1}), and the f signal that controls the initialization in Lines 2-4. PE_k, the physical mapping of the k-th step of the for-loop in Line 17, targets the k-th word of A, B, R, and S. From right to left, a total of e PEs are connected to complete the computation of one iteration (e = 4 in Figure 2). In addition, the testing element (TE) in the MFM executes the condition judgment in Line 6. Finally, the output control element (OCE) executes the operations in Lines 23-24, ensuring that the MD results are output in parallel. This paper implements the secure architecture for four types of MFM structures, namely, Type-8, Type-16, Type-32, and Type-64; the number in each name equals w, the word size of the PE. The implementation of the MFM is not shown in this paper; the details can be found in [40].
Appl. Sci. 2020, 10, x FOR PEER REVIEW
Figure 2. The proposed secure systolic architecture for the MD.
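The word-oriented organization described above, in which an n-bit operand is cut into e words of w bits and the words are processed from the least significant one upward with a carry passed between neighbouring PEs, can be sketched in software as follows. This is only an illustration of the data layout using a plain addition; it is not the PE logic of [40], and the helper names `split_words`, `join_words`, and `word_serial_add` are assumptions of this sketch.

```python
def split_words(x: int, w: int, e: int) -> list[int]:
    """Split x into e words of w bits; index 0 is the least significant word."""
    mask = (1 << w) - 1
    return [(x >> (w * k)) & mask for k in range(e)]


def join_words(words: list[int], w: int) -> int:
    """Inverse of split_words."""
    return sum(word << (w * k) for k, word in enumerate(words))


def word_serial_add(a: int, b: int, n: int, w: int) -> tuple[list[int], int]:
    """Add two n-bit operands word by word, as a chain of e processing steps.

    Each step stands in for one PE: it consumes the k-th words and a carry-in,
    and produces the k-th result word and a carry-out for the next step.
    """
    e = -(-n // w)            # ceil(n / w); equals n / w when w divides n
    mask = (1 << w) - 1
    carry = 0                 # initial carry-in supplied to the first step
    out = []
    for a_k, b_k in zip(split_words(a, w, e), split_words(b, w, e)):
        t = a_k + b_k + carry
        out.append(t & mask)  # k-th word of the result
        carry = t >> w        # 1-bit carry-out handed to the next word
    return out, carry


if __name__ == "__main__":
    n, w = 32, 8              # four words, matching e = 4 in Figure 2
    a, b = 0xDEADBEEF, 0x12345678
    words, carry = word_serial_add(a, b, n, w)
    assert join_words(words, w) + (carry << n) == a + b
    print([hex(word) for word in words], carry)
```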

Algorithm 2 Proposed word-oriented MD Algorithm with concurrent error detection
The EDM encompasses e sub-ED elements (ED_{e−1}, ED_{e−2}, ..., ED_1, ED_0), each of which detects the errors in a specific PE. Once an error is detected in one of the PEs, the EDM issues an alarm immediately and terminates the MD process. The structure of ED_k, shown in Figure 3, contains three components: ACPG_k, CPP_k, and CMP_k. ACPG_k executes the operation of Line 18 to generate the actual check part, CPP_k executes the operation of Line 19 to predict the check part, and CMP_k compares the actual and predicted check parts. If the two parts are unequal, CMP_k issues an alarm Ef_k indicating a high possibility of a fault injection attack on PE_k, and the system terminates the MD process; otherwise, the MD process continues. The logic diagrams of ACPG_k and CPP_k are shown in Figures 4 and 5, respectively. Note that our architecture provides an ED for each PE, so the assertion of any Ef_k raises an error alarm and terminates the MD. This overcomes a defect of the EDM in [40], which takes the entire n-bit iterative result as the detection unit and therefore cannot report the detection result of an iteration until the last word of the iteration result is valid.
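The per-PE detection described above can be illustrated with a small software model in which every word-level step is paired with its own checker. The sketch below uses a word-serial addition as a stand-in for the PE operation and a residue modulo p as the check part; both are assumptions made for illustration and do not reproduce Lines 17-19 of Algorithm 2 or the circuits in Figures 3-5. The names `pe_add`, `predict_check`, and `run_iteration` are hypothetical.

```python
W, P = 8, 7                      # word size and check factor (Type-8 MFM, p = 7)
MASK = (1 << W) - 1


def pe_add(a_k: int, b_k: int, c_in: int) -> tuple[int, int]:
    """Stand-in for one PE: returns (result word, carry-out)."""
    t = a_k + b_k + c_in
    return t & MASK, t >> W


def predict_check(a_k: int, b_k: int, c_in: int, c_out: int) -> int:
    """Predict the check part of the PE output from the operands' check parts.

    Uses word = a_k + b_k + c_in - c_out * 2**W, reduced modulo P.
    """
    return (a_k % P + b_k % P + c_in - c_out * ((1 << W) % P)) % P


def run_iteration(a_words, b_words, fault=None):
    """Process the words in order; fault = (k, xor_mask) corrupts PE_k's output.

    Returns (words produced before any alarm, index of the alarmed PE or None).
    """
    carry, out = 0, []
    for k, (a_k, b_k) in enumerate(zip(a_words, b_words)):
        c_in = carry
        word, carry = pe_add(a_k, b_k, c_in)
        if fault is not None and fault[0] == k:
            word ^= fault[1]                    # inject the fault into PE_k's output
        actual = word % P                       # actual check part
        if actual != predict_check(a_k, b_k, c_in, carry):
            return out, k                       # alarm: stop at once, PE_k is located
        out.append(word)
    return out, None


if __name__ == "__main__":
    a_words = [0x12, 0xF0, 0x7E, 0x03]
    b_words = [0xEF, 0x11, 0x01, 0xA0]
    print(run_iteration(a_words, b_words))                      # no fault, no alarm
    print(run_iteration(a_words, b_words, fault=(2, 1 << 3)))   # alarm raised by PE 2
```

Because the alarm is raised as soon as the faulty word is checked, the model stops before the remaining words are processed, which mirrors the shortened error-reporting delay claimed for the per-PE scheme.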

Attacker Model
Since the emergence of fault injection, various attack methods have been created to inject faults into semiconductors [41], which makes it hard to define the fault model. Following the relevant literature, this paper adopts the following common hypotheses on the capability of attackers:
(1) Attackers are incapable of directly invading, modifying, or rebuilding the circuit structure with powerful fault injection techniques such as a focused ion beam [42]. This is the strongest assumption on the attacker's capability: if attackers can rebuild the circuit, our secure architecture cannot resist the attack. However, such an attack is costly and hard to carry out in practice, since it needs very expensive consumables and a strong technical background [24].
(2) Attackers are incapable of tampering with the clock signal [10]. Although tampering with the clock signal is a viable option for an attacker, this is a common assumption in secure architecture design, because the designer usually focuses on protecting the data path of the chip rather than the clock.
(3) Attackers are capable of injecting a fault into either the MFM or the EDM, but not both at the same time. In practice, the probability that faults are injected into the MFM and the EDM simultaneously and still escape the comparison is very small.
(4) Attackers are incapable of controlling the error pattern in the MFM output. This is also a strong assumption: if an attacker can inject a fault into the MFM that causes an error pattern our architecture cannot detect, our secure scheme will not work.

Five Types of Error Models
In this paper, five types of error models were adopted to verify the error detection capability of our secure architecture.
(1) Single-bit error (SBE): The injected fault causes a one-bit error in the PE output.
The five error models are visualized in Figure 6, where each black rectangle represents a PE struck by the injected fault, and each black dot represents an erroneous bit in the PE output.

Simulation and Comparison of Error Detection Capabilities
To analyze the error detection capability of the proposed systolic MD architecture, we first verified the proposed secure architecture using a C++ program, then simulated fault injection and measured the error detection capability of the architecture over 100,000 test cases. The results are compared in Table 2 with those of the error detection scheme in [40]. In addition, we applied Mozaffari Kermani's multi-column parity prediction scheme to our MFM and investigated its error detection capability in the same way. Note that this paper borrows Mozaffari Kermani's multi-column parity prediction idea and applies it to the MD over a prime field, whereas in [36] it was applied to the MD over GF(2^m). The MD operations over a prime field and over GF(2^m) are different, so the cost of error detection is different. For brevity, Parity, Style-I, and Style-II denote the error detection schemes in [36], in [40], and in this paper, respectively. Table 2 shows the simulation results of all three schemes.
As shown in Table 2, for the SBE model, all three methods detected 100% of the errors. For the other error models, Style-II with p = 3 exhibited a poorer error detection capability than Style-I and Parity, due to its extremely small p value. However, the Style-II architectures with p = 7 and p = 31 were similar to Style-I and Parity in error detection performance. When there were three or more erroneous PEs, Style-II could detect 99.898% or more of the errors, slightly behind Style-I and Parity. Despite this lag in detection ability, Style-II has an advantage over the two reference methods: once an error is detected in a PE, the erroneous PE can be located with 100% accuracy. Table 2 also shows that the error detection capability of Style-I varied with the MFM structure: the larger the word size of the PE in the MFM, the better the error detection capability. By contrast, Style-II's error detection capability changed only slightly with the MFM structure. For Style-II, the error detection capability increases with the check factor p under the same MFM structure.
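The qualitative trends reported above (every single-bit error is caught, and detection of multi-bit errors in one PE improves with the check factor p) can be reproduced with a much simpler Monte Carlo experiment than the paper's C++ simulation. The sketch below is not that simulation and does not regenerate Table 2: it assumes a residue-mod-p check on a single w-bit word, models a fault as an XOR mask on that word, and uses a hypothetical helper `detection_rate` with an arbitrary seed and trial count.

```python
import random


def detection_rate(w: int, p: int, trials: int, single_bit: bool,
                   rng: random.Random) -> float:
    """Fraction of injected errors that change the word's residue modulo p."""
    detected = 0
    for _ in range(trials):
        word = rng.getrandbits(w)
        if single_bit:
            mask = 1 << rng.randrange(w)          # flip exactly one bit
        else:
            mask = rng.randrange(1, 1 << w)       # random nonzero error pattern
        if (word ^ mask) % p != word % p:         # check parts differ: detected
            detected += 1
    return detected / trials


TRIALS = 20_000                                    # illustrative; the paper used 100,000 cases
rng = random.Random(2020)                          # fixed seed for repeatability
rates = {}
for p in (3, 7, 31):
    sbe = detection_rate(8, p, TRIALS, True, rng)
    multi = detection_rate(8, p, TRIALS, False, rng)
    rates[p] = (sbe, multi)
    print(f"p = {p:2d}: single-bit {sbe:.3%}, random pattern in one word {multi:.3%}")
```

A single flipped bit changes the word by ±2^j, which is never a multiple of an odd p, so the single-bit rate is exactly 100% in this model; random patterns escape only when the induced change happens to be a multiple of p.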

Analysis on Time and Area Overheads
This section presents the time and area overheads of our architecture (Style-II) with three different check factors (p = 3, 7, and 31) under each MFM structure. To obtain the time and area overheads, we first modeled the proposed architecture in Verilog, then verified it using ModelSim, and finally synthesized the circuit with Synopsys Design Vision and the TSMC 90 nm CMOS standard cell library. For comparison, the Style-I and Parity methods were also modeled and synthesized under the same conditions. The synthesis results are given in Table 3, where the time (area) overhead ratio is the ratio of the extra time (area) overhead to the MFM time (area).
Firstly, as shown in Table 3, the time and area overhead ratios of Style-II always surpassed those of Style-I, regardless of the check factor, but they were lower than those of Parity at p = 3 or 7. For example, when the MFM structure was Type-8, the mean time and area overhead ratios of Style-I were 0.51% and 41.31%, respectively; those of Style-II with p = 3 were 11.86% and 45.67%, respectively; those of Style-II with p = 7 were 20.70% and 77.84%, respectively; those of Parity were 30.07% and 72.23%, respectively; and those of Style-II with p = 31 were 49.64% and 123.22%, respectively.
Secondly, when the p value was fixed, the time overhead ratio increased with the operand size n, the area overhead ratio decreased as n grew, and the product of the time and area overhead ratios decreased with n. For example, for Style-II with p = 7, the mean time and area overhead ratios were 18.50% and 104.80%, respectively, when n = 128, and 23.64% and 46.80%, respectively, when n = 1024. Judging by time and area overheads alone, Style-II's performance is negatively correlated with the p value. However, considering overall performance, including time overheads, area overheads, and error detection capability, Style-II with p = 7 is a better choice than Style-II with p = 3 or 31.
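The claim that the product of the two overhead ratios shrinks as n grows can be checked directly from the figures quoted above for Style-II with p = 7. The short computation below uses only those four quoted percentages; the variable names are illustrative.

```python
# (time overhead ratio, area overhead ratio) for Style-II with p = 7,
# as quoted in the text for n = 128 and n = 1024.
ratios = {
    128: (0.1850, 1.0480),
    1024: (0.2364, 0.4680),
}

# Product of the time and area overhead ratios for each operand size.
products = {n: t * a for n, (t, a) in ratios.items()}

for n, prod in products.items():
    print(f"n = {n:4d}: time ratio x area ratio = {prod:.4f}")
```

The product is roughly 0.194 at n = 128 and roughly 0.111 at n = 1024, so the growth in the time overhead ratio is more than offset by the drop in the area overhead ratio.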
Thirdly, for a fixed p value, Style-II's performance varied with the MFM structure: the larger the word size of the PE in the MFM, the greater the time and area overhead ratios. For example, when Style-II with p = 3 was applied to the Type-8 MFM, the mean time and area overhead ratios were 11.86% and 45.67%, respectively; when it was applied to the Type-16 MFM, the two ratios were 33.98% and 51.02%, respectively. Hence, Style-II works better for error detection in short-word systolic implementations of the MFM, whereas Style-I is more suitable for long-word systolic implementations. In both methods, the time and area overheads are negatively correlated with the operand size n. In other words, the two methods are more efficient for large-integer MD problems.
Finally, Style-II showed a much shorter delay in error reporting than Style-I and Parity. The delay of Style-II increased with w and p, but remained essentially constant when the MFM structure was fixed and the operand size varied. Overall, Style-II with p = 7 and the Type-8 MFM strikes a good balance between time overheads, area overheads, and error detection capability.

Conclusions
This paper extends the work in [40] to put forward a new LAC-based secure architecture for the MD over a prime field against fault injection attacks. Instead of taking the long n-bit iteration result as the detection cell, this paper takes the short w-bit word as the detection cell, which makes it possible to locate the erroneous processing element. Four word-based MFM systolic structures with different word sizes and three error detection schemes with different values of the linear arithmetic code were explored to seek an optimal tradeoff between the performance indexes. These combined architectures were modeled in Verilog and synthesized with Synopsys Design Vision and the TSMC 90 nm CMOS standard cell library to obtain the time and area overheads. The error detection and location capabilities of the proposed architectures were also investigated by C++ simulation. The same methods were used to test the performance of the architectures based on the Style-I and Parity schemes. The simulation results show that the proposed architecture with p = 7 and the Type-8 MFM strikes a good balance between time overheads, area overheads, and error detection capability. Despite having greater area overheads than Style-I, the architecture has unique advantages in locating the erroneous processing element and in timely error reporting. The research results help to detect and locate fault attacks quickly. However, the large time and area overheads of this architecture may limit its application; its implementation needs to be optimized in future research to expand its application range.