Reliable Delay Based Algorithm to Boost PUF Security Against Modeling Attacks

Silicon Physical Unclonable Functions (sPUFs) are security primitives and a state-of-the-art topic in hardware-oriented security and trust research. This paper presents an efficient, dynamic ring oscillator PUF (d-ROPUF) technique to improve sPUF security against modeling attacks. In addition to enhancing the entropy of the weak ROPUF design, experimental results show that the proposed d-ROPUF technique allows the generation of a larger, updatable challenge-response pair (CRP) space compared with the simple ROPUF. Additionally, an innovative hardware-oriented security algorithm, the Optimal Time Delay Algorithm (OTDA), is proposed. It is demonstrated that the OTDA significantly improves PUF reliability under varying operating conditions. It is further shown that the OTDA efficiently enhances the d-ROPUF's capability to generate a considerably large set of reliable secret keys that protect the PUF structure from new cyber-attacks, including machine learning and modeling attacks.

One of the most popular and highly efficient hardware security primitives is based on silicon ring oscillator PUFs (ROPUFs) [15-21]. In simple terms, ROPUFs exploit the intrinsic manufacturing process variations (MPV) of semiconductor integrated circuits (ICs) to generate chip-unique identifiers (IDs) for several security applications, including low-cost device and system authentication, true random cryptographic key generation, and protection of deeply embedded systems against new cyber-attacks. ROPUF design components are based on the linear entropy of physical primitives or blocks (CLBs, LUTs, MUXs, flip-flops, etc.) with inherently random behaviors that are used to construct the complete structure. In this regard, the relation between challenges and responses is also linear and directly proportional to the number of these random primitives. Compared to other sPUFs, including APUFs, BPUFs, and SRAM PUFs, ROPUFs have weaker entropy in terms of the CRP space (fewer challenge-response pairs), and thus only a limited number of cryptographic keys can be generated by their structures [19]. Due to this limitation, an adversary may try to emulate the design by applying all challenges and uncovering the corresponding responses within a limited time that is proportional to the size of the CRP space. Given their weak entropy, ROPUFs are highly vulnerable to Machine Learning (ML) and other modeling attacks that ultimately aim to clone their challenge-response behavior. One way to overcome this drawback is to integrate more components into the PUF design without degrading its intended performance in terms of uniqueness and reliability (ROPUF reliability, in particular, may be negatively impacted).
For example, the configurable ROPUF (c-ROPUF) incorporates more design blocks for mapping more ring oscillators (ROs) in a small area, to improve reliability, area overhead, power consumption, and the PUF's capability to generate a larger CRP space [3]. Unfortunately, only a limited number of these components can be integrated, taking into consideration the performance of the generated response bits at different operating conditions [20,21]. Additionally, just like the simple ROPUF, the c-ROPUF extracts non-updated secret keys, and both are based on a static structure that has a fixed CRP behavior [3].
In this paper, a dynamic, multi-stage ROPUF design (d-ROPUF) is introduced [22]. It increases the CRP space to enhance ROPUF secret key unclonability and updatability by means of a dynamic PUF structure with multiple CRP behaviors. This makes the PUF design less vulnerable to machine learning and other modeling attacks. The proposed d-ROPUF is an area-efficient design that leverages an appropriate automatic mechanism and dedicated, reconfigurable FPGA resources to build a dynamic multi-stage ROPUF structure inside a single CLB and to offer updated secret keys that enhance ROPUF security against new cyber attacks. This also enhances the ability of the ROPUF to generate a larger and updated CRP space with the help of dynamic ring oscillators. Data samples, in the form of RO sample frequencies and the corresponding response bits, are extracted from four different PUF structures and are used by an Optimal Time Delay Algorithm (OTDA) to evaluate the performance of the d-ROPUF design. Experimental results show that employing dynamic ring oscillators coupled with the OTDA increases the number of possible CRPs and enhances the security of the ROPUF against modeling attacks. The results also show that the proposed OTDA improves the CRP space, which enhances the security of silicon PUFs and protects them against machine learning attacks. The main contributions of this paper can be summarized as follows:

• A dynamic RO-based PUF primitive architecture is proposed and demonstrated to be reliable over temperature and voltage (VT) variations.

• The proposed PUF primitive can be automatically configured to output updated secret keys based on the updated behavior of the CRP space.

• Based on the process variability of the dynamic RO structures, the proposed technique provides a large number of reliable challenge-response pairs (CRPs) to protect the PUF entity against modeling attacks.

• The proposed PUF primitive can further generate a larger number of reliable secret keys using the proposed Optimal Time Delay Algorithm (OTDA).

Silicon Physical Unclonable Functions
Silicon Physical Unclonable Functions (sPUFs) are one of the promising hardware-based security primitives and a state-of-the-art topic in the emerging hardware-oriented security and trust (HOST) research. They have recently evolved to support the rapid growth in cyber security applications, including protection against physical tampering, hardware Trojans, intellectual property (IP) theft/piracy, and machine learning attacks, as well as the protection of Internet of Things (IoT) and smart system devices and the security of deeply embedded architectures and real-time computing systems [8,9,22-27]. Due to the uncontrolled manufacturing imperfections of CMOS integrated circuits (ICs) that are fabricated at the nanoscale level with naturally random behavior, it is impossible for two silicon devices to be identical, even when they are fabricated using the same fabrication tools at the same foundry. Silicon PUFs (sPUFs) take advantage of these minor differences in IC fabrication to extract unique identifiers for security applications. Theoretically, a silicon PUF is an implicitly introduced, instance-specific function embodied in a silicon device to extract the inherently unique features of its physical characteristics. In this regard, a delay-based PUF can be easily implemented on a silicon device (micro-controller, ASIC, and/or FPGA) and take advantage of the manufacturing process variation of its ICs to produce truly random, inherently unique, and highly reliable silicon signatures, known as silicon fingerprints [9,15-18,20-22]. Mathematically, PUFs are irreversible (one-way) probabilistic challenge-response functions, as expressed in Equation (1). Figure 1 shows the concept of a one-way (irreversible) physical unclonable function. As seen in the figure, a silicon PUF is designed with n binary input bits, known as the input challenges.
The challenge-response relationship of a PUF instance is easy to compute; however, it is very hard (almost impossible) to reverse or retrieve. For the generation of a binary response bit, an input challenge is required. For example, in a ROPUF each input challenge is applied to select a specific pair of RO frequencies (two RO frequencies). Arbiter PUFs (APUFs) are among the popular delay-based PUFs. The design of an APUF circuit is presented in Figure 2 [8,10,11]. As shown in the figure, APUFs use a switch-box structure to create a race between two delay paths with an arbiter at the end. Basically, two identical delay paths are formed to produce one response bit based on the delay of these paths. For that, an arbiter (D-latch) is placed at the end of the APUF circuit to decide which signal arrives first. In 2002, the notion of ring oscillator PUFs (ROPUFs) was introduced, as shown in Figure 3 [9]. As seen in the figure, a ROPUF mainly consists of ring oscillators (ROs), each built from an odd number of serially connected inverters. The output signal of each RO is fed back to its input to form the delay loop. Theoretically, delay components (inverters, MUXs, routing, etc.) are assumed to be identical, as shown by different simulation software (Xilinx, Altera, NI Multisim, etc.). For example, such simulation software reports exactly the same frequency for every RO and cannot show the real RO frequencies.
Practically, this is not the case when these frequencies are measured using physically mapped hardware designs. Minor manufacturing process variations in ICs result in random differences in the frequencies generated by ROs that are mapped at different locations of a silicon device. Thus, an RO generates a non-uniform clock signal (continually generating "1" and "0") with a unique frequency, leveraging the manufacturing process variations of the inverters and interconnects of its delay loop. ROPUFs exploit these minor differences in frequencies to extract chip-unique signatures. A total of n RO comparison frequency pairs (two RO frequencies for each pair) are used to generate n binary response bits. As shown in Figure 3, the associated logic (MUXs) is used to select two ROs based on an applied challenge (Ci). Two binary counters count the number of cycles generated by each RO. These counters can be started and stopped simultaneously before the hardware measurements are fed to the comparator. This circuit compares the measured frequencies obtained from both counters to generate one response, "1" or "0", based on a specific algorithm. The selected RO pair is known as a challenge-response frequency pair and is used for the generation of the challenge-response space, or CRPs (the total number of possible challenge-response pairs). The notions of weak PUFs and strong PUFs were first introduced by Guajardo, followed by Ruhrmair et al. [19,22-26]. These expressions are not meant to show that one PUF type is superior or inferior to another. For example, Arbiter PUFs (APUFs) offer a high number of CRPs, hence they are known as strong physical random functions [8]. In contrast, ring oscillator PUFs (ROPUFs) are a typical example of weak silicon PUFs, since they only offer a limited number of CRPs [3,9].
For this, the frequencies of a selected RO pair are compared to generate one response bit of the secret key. To generate a secret key of n binary response bits (length = n bits), n input challenges are needed. It is extremely hard (almost impossible) to fully or partially retrieve a subset of input challenges c_i ∈ C, where C is the set of all possible input challenges, given the corresponding response bits r_i ∈ R, where R is the set of all possible generated responses, as shown in Equation (1). For example, for two subsets of input challenges c_i = c_j, the corresponding binary responses are probabilistic, can be the same or different, and cannot be used to recover the applied challenges. Similarly, two silicon devices (m, n) will never generate the same signature (binary response bits) for a specific set of input challenges, i.e., Signature_m[c_i] ≠ Signature_n[c_i].
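The pairwise comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ropuf_response`, the example frequencies, and the convention that the bit is 1 when the first RO of the pair is faster are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ropuf_response(freqs_mhz: Sequence[float],
                   challenges: Sequence[Tuple[int, int]]) -> List[int]:
    """Return one response bit per challenge.

    Each challenge selects a pair of ROs (i, j). Assumed convention:
    the bit is 1 when RO i is faster than RO j, otherwise 0.
    """
    bits = []
    for i, j in challenges:
        bits.append(1 if freqs_mhz[i] > freqs_mhz[j] else 0)
    return bits


# Hypothetical measured RO frequencies (MHz) on two different chips.
chip_a = [201.3, 199.8, 200.6, 198.9]
chip_b = [199.1, 200.4, 198.7, 201.0]

# The same four challenges (RO index pairs) are applied to both chips.
challenges = [(0, 1), (2, 3), (1, 2), (0, 3)]
key_a = ropuf_response(chip_a, challenges)
key_b = ropuf_response(chip_b, challenges)
```

With these made-up frequencies the two chips yield different 4-bit signatures for identical challenges, which is the uniqueness property the text relies on.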

Cyber Attacks on Silicon PUFs
A silicon PUF can be accessible to an adversary who has unlimited physical access to the device that implements the PUF design for security purposes. In such a scenario, the adversary may try to compromise the PUF structure by retrieving a subset of the CRP space, studying the behavior of the entire CRP space from it, and modeling the structure. Modeling attacks that aim to clone the behavior of the CRP space of certain silicon PUFs can be driven by different Machine Learning (ML) algorithms. ML algorithms are usually used to learn the behavior of the CRP space in order to model the physical structure of silicon PUFs. To launch a modeling attack, ML provides a parameter-based model of the PUF structure. For example, some of the standard parametric models for APUFs are known as linear additive delay models [8,19,22-26]. This type of modeling attack sums up the delays of the elements in a certain path of an APUF to estimate the total path delay and the corresponding response bit at the end of the path. Other techniques have also been used to predict the CRP space, such as solving integer equations [8] and linear programming [27]. For instance, the work in IEEE T-IFS 2013 [22] suggests that the Slender PUF protocol is resilient against all known machine learning attacks and performs well when tested with ML algorithms.
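To make the linear additive delay model concrete, the following Python sketch checks that a stage-by-stage simulation of the arbiter race equals a dot product between a weight vector and a parity feature vector of the challenge. This is a common textbook parameterization and may differ from the one used in the cited works; the per-stage delays are randomly generated, and all names (`race_delay`, `parity_features`, `linear_weights`) are hypothetical.

```python
import itertools
import random


def race_delay(challenge, d_straight, d_crossed):
    """Accumulate the delay difference between the two paths stage by stage."""
    delta = 0.0
    for c, d0, d1 in zip(challenge, d_straight, d_crossed):
        # A crossed switch (c == 1) swaps the paths, flipping the sign so far.
        delta = (delta + d0) if c == 0 else (-delta + d1)
    return delta


def parity_features(challenge):
    """phi[i] = product of (1 - 2*c_l) for l >= i, plus a constant 1 at the end."""
    n = len(challenge)
    phi = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi


def linear_weights(d_straight, d_crossed):
    """Fold the per-stage delays into one weight vector of length n + 1."""
    n = len(d_straight)
    w = [0.0] * (n + 1)
    for i, (d0, d1) in enumerate(zip(d_straight, d_crossed)):
        w[i] += (d0 - d1) / 2      # term multiplied by phi[i]
        w[i + 1] += (d0 + d1) / 2  # term multiplied by phi[i + 1]
    return w


random.seed(0)
n = 6  # number of switch stages (illustrative)
d_straight = [random.gauss(0, 1) for _ in range(n)]
d_crossed = [random.gauss(0, 1) for _ in range(n)]
w = linear_weights(d_straight, d_crossed)

# Compare both formulations over every possible challenge.
max_err = 0.0
for challenge in itertools.product((0, 1), repeat=n):
    direct = race_delay(challenge, d_straight, d_crossed)
    linear = sum(wi * pi for wi, pi in zip(w, parity_features(challenge)))
    max_err = max(max_err, abs(direct - linear))
```

Because the response bit is just the sign of a linear function of the features, an attacker who collects enough CRPs can fit the weights with any linear classifier, which is why such models are the usual starting point for ML attacks on APUFs.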

Proposed Dynamic Ring Oscillator Technique
Ring oscillator (RO) loops are a well-known and efficient hardware architecture for measuring FPGA/VLSI circuit delay. As explained earlier, an RO loop can be simply constructed from an odd number of serially connected inverters. A ROPUF circuit is fundamentally based on ring oscillator loops, in addition to the associated hardware circuits (MUXs, counters, comparator, etc.). Compared to other sPUFs, ROPUFs are easy to realize on real hardware and can provide higher performance in terms of reliability, randomness, uniqueness, etc. However, due to their weak entropy, ROPUFs may only offer a limited number of challenge-response pairs (CRP space), and thus they can generate fewer and shorter secret keys. For these reasons, ROPUFs are categorized as weak sPUFs and are more vulnerable to modeling attacks [19]. The extraction of inherently unique secret keys from silicon device parameters, which depend directly on manufacturing process variations, is fundamental to ROPUF research. As a drawback, prior research on ROPUFs has mainly focused on the static ROPUF structure. Static ROPUFs, including the simple ROPUF design, are designed with one structure that has a limited number of CRPs and a fixed CRP space behavior. Configurable ROPUFs (c-ROPUFs) were proposed to improve the CRP space and provide better reliability [3]. Both simple and configurable ROPUFs are based on a static (fixed) ROPUF structure and generate non-updated secret keys. Dynamic ROPUFs can overcome these weaknesses by enhancing the CRP space, which allows the generation of larger and updated secret keys with dynamic behaviors. Figure 4 shows a general FPGA-based scheme for the proposed dynamic ROPUF (d-ROPUF). As seen in the figure, the d-ROPUF implements different RO structures (different numbers of RO stages) at different locations of the silicon device for updated CRP behaviors.
For this, the d-ROPUF design should have the ability to reconfigure itself to a new structure with a different number of inverter stages that generates updated RO frequencies within new frequency ranges. The use of the d-ROPUF increases the CRP space and ensures the generation of updated secret keys based on new CRP behaviors. Further, the d-ROPUF can be used to randomly extract complex, large, and highly unclonable (unpredictable) cryptographic keys from different areas of a silicon device, which makes it harder for an adversary to model the PUF behavior or correlate input challenges with the corresponding response bits. Figure 5 shows the architecture of the hardware-oriented security technique (d-ROPUF) proposed to improve ROPUF security against new cyber attacks. The figure shows a design of dynamic, multi-stage ROPUF structures on a small area (a single CLB) of an FPGA device that can be altered automatically using a configuration mechanism. As shown in the figure, the d-ROPUF design is mapped inside a single CLB of a Spartan-3E FPGA that contains eight LUTs inside four slices (two LUTs per slice). This ensures high design efficiency in terms of low cost, low area overhead, and low power consumption when the proposed PUF is implemented on real hardware. The LUTs are programmable XOR (PXOR) gates that are configured as either inverters or buffers. To construct RO loops with an odd number of inverters (one, three, five, and seven inverters), seven of the eight LUTs are configured as inverters. The eighth LUT is configured as a buffer that adds extra delay to keep the generated frequency of the RO structure below 300 MHz (less than the maximum operating frequency of the Spartan-3E FPGA). As seen in the figure, the LUTs are connected using internal CLB routing to form four RO structures inside a single CLB. The four d-ROPUF structures (LUTs and local routing) are distinguished by different colors in Figure 5. Each structure contains n LUTs with n − 1 inverters and one buffer.
These four RO structures are named the one-, three-, five-, and seven-stage ROs. As shown in the figure, the one-stage structure is drawn in green (two LUTs: one inverter and one buffer), and the three-stage structure is drawn in green and blue (four LUTs: three inverters and one buffer). The five-stage structure is drawn in green, blue, and purple (six LUTs: five inverters and one buffer). Finally, the seven-stage structure is drawn in green, blue, purple, and brown (eight LUTs: seven inverters and one buffer). As seen in Figure 5, the enable signal (VDD), shown by the red lines, is used to activate the selected CLB and configure the PXOR LUTs as buffers or inverters. The creation of identical ring oscillator loops is important to eliminate the delay caused by the dynamic routing of FPGA devices.

To implement the design on the entire area of the FPGA, each FPGA area is divided into top and bottom regions with 120 CLBs in each region. The hard macro design is instantiated at all the CLBs in each FPGA area separately. Figure 8 shows the full architecture of the proposed d-ROPUF technique for an FPGA area with 120 CLBs. The figure shows a timing micro-controller that controls the activation and deactivation periods, a challenge generator, 120 decoders and encoder, the hard macro of the individual ROs, the reconfiguration mechanism, and the main and reference counters connected to a logic analyzer and PC for the collection of data samples. A short delay period (stabilization period) is needed prior to the activation of the main counter so that the signal stabilizes before the counter starts to measure the actual frequency of the activated RO.
As shown in Figure 8, as soon as the T1 signal is received by the 16-bit binary counter, the counter starts counting for a stabilization period of 0.1 ms (prior to the activation of any RO). At the end of the stabilization period, a T2 signal is sent to the challenge generator, the 120 decoders, and a 14-bit reference counter for the activation of a selected RO. The main counter counts the number of clock cycles of the active RO for 0.1 ms (activation period). During the same period, the reference counter counts the number of clock cycles of the reference clock (50 MHz) with a step size of 5000 clock cycles. At the end of the activation period, the timing controller receives a T3 signal from the reference counter and sends a deactivation signal to deactivate the running RO and stop the counters. As soon as the deactivation T2 signal is received, the decoder disconnects the active RO, the challenge generator automatically generates the proper challenge for the next RO to be activated, and the reference counter resets its value and stops counting. When the deactivation T1 signal is received, the main counter stops counting and forwards the counted number of clock cycles to the Agilent logic analyzer through a 16-bit data bus connection. To avoid self-heating noise from neighboring ROs, each RO is activated for a short time of 0.1 ms (activation period), and another period of 0.1 ms is allowed before the activation of the next RO. Since a total of 10 sample frequencies per individual RO is considered for data analysis, the total time of the 10 RO runs in a d-ROPUF structure is 10 × 0.3 ms = 0.003 s. For the collection of the data samples from a d-ROPUF structure (120 ROs), a total time of 120 ROs × 0.003 s = 0.36 s is needed.
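The timing figures above can be cross-checked with a short calculation. The Python sketch below is illustrative only: it assumes that the 0.3 ms per run is the sum of the 0.1 ms stabilization, 0.1 ms activation, and 0.1 ms idle periods, and that an RO frequency is recovered as the main counter value divided by the activation window. The constant and function names are hypothetical.

```python
import math

REF_CLOCK_HZ = 50e6      # reference clock from the text
REF_STEP_CYCLES = 5000   # reference counter step from the text

# 5000 cycles of a 50 MHz clock define the 0.1 ms activation window.
WINDOW_S = REF_STEP_CYCLES / REF_CLOCK_HZ


def count_to_freq_mhz(count: int) -> float:
    """Convert a main-counter value to an RO frequency in MHz (assumed method)."""
    return count / WINDOW_S / 1e6


PHASES_PER_RUN = 3       # assumption: stabilization + activation + idle
RUNS_PER_RO = 10         # sample frequencies per RO
ROS_PER_STRUCTURE = 120  # ROs per d-ROPUF structure

per_run_s = PHASES_PER_RUN * WINDOW_S           # 0.3 ms
per_ro_s = RUNS_PER_RO * per_run_s              # 0.003 s
per_structure_s = ROS_PER_STRUCTURE * per_ro_s  # 0.36 s

# Largest count expected at the 300 MHz ceiling; it must fit in 16 bits.
max_count_at_300mhz = round(300e6 * WINDOW_S)
```

Under these assumptions a 300 MHz oscillator produces 30,000 counts per window, comfortably below the 65,535 limit of the 16-bit main counter.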
As shown in Figure 8, when the reconfiguration mechanism receives T4 (after every 0.36 s), the R1 and R2 signals are automatically generated to reconfigure the d-ROPUF design to a new structure with a different number of RO stages. These signals are generated separately (as part of the input challenge) and are controlled by the reconfiguration mechanism and the timing controller. This iterative process continues, and the generated RO sample frequencies for the d-ROPUF structures are collected in the form of data sheets and stored at the logic analyzer before they are analyzed using a PC. Figure 9a,b shows the concept behind bit flip occurrence in the PUF output at varying temperatures due to the change in the RO frequencies [9]. As seen in the figures, as the temperature increases, the frequencies represented by the blue and green lines (the two RO frequencies of a selected frequency pair) decrease. As shown in Figure 9a, as the temperature increases, the blue line drops faster than the green one, so that its frequency eventually falls below that of the green line (blue RO frequency < green RO frequency).

Reliability and Bit Flips
This results in a bit flip in the generated PUF output at different temperatures. However, as illustrated in Figure 9b, even as the temperature increases, the blue line does not cross the green line, and thus there is no bit flip in this case. From the same figures, it is observed that the frequency difference between the blue and green RO frequency lines in Figure 9a is smaller than the same difference in Figure 9b. For the selection of reliable RO frequency pairs, a larger frequency difference plays the key role in preventing bit flips in the generated PUF responses. In other words, bit flips can be prevented by selecting RO frequency pairs with reliable frequency differences.
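The two cases of Figure 9 can be mimicked with a small Python sketch. The frequency values below are invented for illustration and are not measurements from the paper; the helper names `has_bit_flip` and `min_gap_mhz` are hypothetical, and the bit convention (1 when the first RO is faster) is an assumption.

```python
def response_bit(f_a: float, f_b: float) -> int:
    """Assumed convention: 1 if the first RO is faster, else 0."""
    return 1 if f_a > f_b else 0


def has_bit_flip(freqs_a, freqs_b) -> bool:
    """True if the response bit changes across the operating points."""
    bits = {response_bit(a, b) for a, b in zip(freqs_a, freqs_b)}
    return len(bits) > 1


def min_gap_mhz(freqs_a, freqs_b) -> float:
    """Smallest frequency difference observed across the operating points."""
    return min(abs(a - b) for a, b in zip(freqs_a, freqs_b))


# Hypothetical frequencies (MHz) at three increasing temperatures.
# Case (a): a small gap, and the "blue" RO drops faster and crosses "green".
blue_a, green_a = [200.4, 199.6, 198.7], [200.0, 199.7, 199.3]
# Case (b): a wide gap, so the ordering of the pair never changes.
blue_b, green_b = [202.0, 201.2, 200.3], [200.0, 199.4, 198.7]

flip_a = has_bit_flip(blue_a, green_a)
flip_b = has_bit_flip(blue_b, green_b)
gap_b = min_gap_mhz(blue_b, green_b)
```

In this toy data the narrowly separated pair flips while the widely separated pair keeps its bit at every temperature, which is the motivation for selecting pairs by frequency difference.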

Optimal Time Delay Algorithm (OTDA)
The proposed OTDA is an exhaustive comparison-based algorithm that compares all sample frequencies pairwise, with O(n^2) complexity. Figure 10 shows the results of the experimental study to determine the reliable frequency difference threshold used to prevent bit flip occurrences [6]. This threshold is used to generate a reliable PUF output at varying operating conditions, using data samples collected from a population of 30 Spartan-3E FPGAs. From the figure, it is observed that RO frequency pairs with frequency differences of 1.5 MHz or higher can be considered reliable in avoiding bit flip occurrences. This value (1.5 MHz) is the reliable threshold at which the probability of a bit flip in the generated response bits is 0%. The OTDA is divided into two main steps. Step 1 is the generation of reliable CRPs of the different d-ROPUF structures, which is represented by Algorithm 1 [6]. The initialization part of Algorithm 1 reads:

i, j (loop counters for the location of reliable RO frequency pairs)
k ← 1 (k is the row counter of the reliable challenge-response array CRPs[k,5])
Td_i, Td_j, ∆Td_ij ← 0 (initialize the RO time delays)
Threshold ← 0.67 ms (assign the reliable threshold value)
f[RO_num] = xlsread(RO frequencies) (store the RO frequencies of the d-ROPUF in an array f)

The OTDA step 1 flowchart is also shown in Figure 11. As seen in the figure, the sample frequencies of the individual ROs generated at certain operating conditions (temperature and/or supply voltage) are the main input data to the OTDA. The goal of this step is to store reliable frequency pairs with a frequency difference of 1.5 MHz or higher. To accomplish this, each RO frequency freq(i) is compared with all other RO frequencies, an operation having O(n^2) complexity, using the time delays (Td_i, Td_j) of the RO frequency pairs. These time delays are computed from the measured RO frequencies. The absolute difference between the time delays (∆Td) is then computed and used to generate reliable responses. The reliable threshold (R.T.) value is empirically calculated from data samples (RO frequencies) that are collected under varying operating conditions. A reliable response bit is generated only for pairs whose ∆Td passes this threshold. The input and output of the OTDA are as follows:

• Inputs: 240 frequencies extracted for the one-, three-, five-, and seven-stage d-ROPUF structures, represented as f_i.

• Output: List of the possible reliable RO frequency pairs stored in an array, named CRPs(k:5), with k rows and five columns, where k ≤ n(n − 1) and n is the number of ROs.
Specifically, the first two values of each row are the frequencies (f(i), f(j)) of the reliable pair RO_i and RO_j (a pair that passes the optimal time delay threshold). The next two values are the indexes of these ROs (the i and j values). The last value is the reliable response bit r (0 or 1) generated by comparing the frequencies of the two ROs. As previously mentioned, the OTD algorithm aims to enhance the capability of the d-ROPUF to obtain a larger CRP space, computed using the statistical combinations formula shown in Figure 12. For this, sample RO frequencies are digitized using the proposed OTD algorithm and the neighbor coding algorithm used by previous researchers [3,9]. Response bits are generated from the top and bottom regions (120 CLBs each) of 30 FPGAs, and the possible number of reliable bits is obtained with the help of the OTDA. After the selection of reliable response bits from the different d-ROPUF structures, step 2 of the OTDA is used to determine the possible cryptographic keys of length 64, 128, and 256 as follows:

• Inputs: CRP[i,j], the set of reliable response bits for the 1-, 3-, 5-, and 7-stage ROs that results from Algorithm 1.

The numbers of possible reliable RO comparison challenge-response pairs (CRPs) for all d-ROPUF stages (one, three, five, seven) that pass the reliability threshold (OTD > 0.67 ms) for 30 different FPGA chips (S3E100) are shown in Table 1. Table 1 shows the number of reliable d-ROPUF response bits generated at varying temperatures. The table also shows all possible reliable RO comparison CRPs for the three-stage d-ROPUF that pass the optimal time delay threshold (OTD > 0.67 ms) for 30 FPGAs. It is observed that the OTDA can enlarge the CRP space, as computed with the statistical combinations formula.
Table 1 shows a substantial increase in the number of possible reliable cryptographic keys of different lengths (64 bits, 128 bits, 256 bits, and 512 bits), which considerably improves the security of the ROPUF against modeling attacks.
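Step 1 of the OTDA, as described in this section, can be sketched in Python as follows. This is a hedged illustration rather than the authors' Algorithm 1: it applies the 1.5 MHz reliability threshold directly to the frequency difference instead of to time delays (the delay equations are not reproduced here), it uses 0-based RO indexes, and the function name `reliable_crps`, the example frequencies, and the bit convention are assumptions.

```python
from typing import List, Sequence, Tuple

THRESHOLD_MHZ = 1.5  # reliable frequency-difference threshold from the text

# One row per reliable ordered pair: (f_i, f_j, i, j, response bit).
Row = Tuple[float, float, int, int, int]


def reliable_crps(freqs_mhz: Sequence[float],
                  threshold_mhz: float = THRESHOLD_MHZ) -> List[Row]:
    """Exhaustively compare every ordered RO pair, an O(n^2) operation."""
    rows: List[Row] = []
    n = len(freqs_mhz)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # an RO is never compared with itself
            f_i, f_j = freqs_mhz[i], freqs_mhz[j]
            if abs(f_i - f_j) >= threshold_mhz:
                # Assumed convention: bit is 1 when RO i is faster than RO j.
                rows.append((f_i, f_j, i, j, 1 if f_i > f_j else 0))
    return rows


# Hypothetical frequencies (MHz) for four ROs of one d-ROPUF structure.
freqs = [200.0, 201.0, 203.0, 198.2]
rows = reliable_crps(freqs)
```

For these four made-up frequencies, only the pair (0, 1) falls below the threshold, so 10 of the 12 possible ordered pairs are kept, consistent with the bound of at most n(n − 1) rows.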

Conclusions
This paper presents a dynamic ROPUF structure (d-ROPUF) with updated challenge/response behavior to boost the security of silicon PUFs against modeling and machine learning attacks. The design is implemented and evaluated on 30 Spartan-3E FPGA devices under varying operating conditions. An Optimal Time Delay Algorithm (OTDA) is proposed to improve the reliability and enlarge the CRP space of silicon PUFs. Experimental results show that the proposed algorithm generates a considerable number of reliable comparison pairs with no bit flips at varying environmental conditions.