cryptography

Abstract: In this work we propose a novel implementation, on recent Xilinx FPGA platforms, of a PUF architecture based on the NAND SR-latch (referred to as NAND-PUF in the following), which achieves extremely low resource usage with very good overall performance. More specifically, a 4-bit NAND-PUF macro occupying only 2 slices has been designed for the Artix-7 platform. The optimum excitation sequence has been determined by analysing the reliability versus the excitation time of the PUF cells under supply voltage variations. A 128-bit NAND-PUF has been tested on 16 FPGA boards under supply voltage and temperature variations, and the measured performance has been compared against state-of-the-art PUFs from the literature. The comparison has shown that the proposed PUF implementation exhibits the best reliability performance while requiring the lowest FPGA resource usage reported in the PUF literature.


Introduction
Nowadays the computing community has become increasingly focused on hardware security threats, due to the growing effectiveness of hardware attacks and tampering methods [1,2]. Indeed, although our smartphones, laptops and tablets are protected by secure software protocols, they remain exposed to hardware attacks. The most common approach of modern cryptographic algorithms is to exploit secret keys stored in the device's memory to hide sensitive data. However, these security protocols rest on the assumption that the cryptographic key stored in the device memory (and used by the cryptographic algorithms) cannot be accessed by malicious attackers. This assumption is now too optimistic: over the last twenty years, researchers have investigated several tampering techniques, such as micro-probing, focused ion beam, glitch attacks and side-channel attacks, demonstrating a fundamental weakness of IoT devices from the hardware security perspective [3–7]. Over the years, researchers have therefore focused on novel techniques to store secret keys in order to withstand hardware attacks and new tampering methods. In this context, Physical Unclonable Functions (PUFs) have been introduced and have proven to be a secure mechanism to store cryptographic keys [8]. A PUF is a physical entity that produces an ideally unclonable output which depends on the physical implementation and characteristics of the device itself [9]. Indeed, PUFs are based on physical phenomena which exploit the peculiarities of the device's silicon (given by mismatch and process variations) to extract a "fingerprint" that replaces the stored cryptographic key [10]. PUF-generated keys offer an advantage over traditional storage methods, like non-volatile memories (NVMs), as they are generated dynamically upon each usage, rather than being physically stored within the device [11].
Additionally, PUF-based devices can be challenging to reverse-engineer due to the unpredictable nature of manufacturing variations. This has led to the adoption of innovative security solutions utilizing PUFs, such as protection
The basic SR-Latch implemented by means of two cross-coupled NOR gates was first exploited to implement a PUF in [37], where a full-custom implementation in a 130 nm CMOS process is presented. Then, in [38,39], two NAND-based SR-Latch PUFs were proposed. More recently, a revisited version of the NAND-based SR-Latch has been exploited to implement a PUF whose performance has been assessed through measurements on a single Altera Stratix III FPGA board [40]. Another improved version of the NAND-based SR-Latch has then been presented in [41]. Both PUF implementations in [40,41] require additional gates to guarantee reliable operation of the NAND-PUF, resulting in relatively high FPGA resource usage.
In this work, we introduce a new implementation of the NAND-PUF which exploits a completely novel excitation strategy to guarantee high reliability while keeping the FPGA resource usage extremely low. The performance of the proposed implementation has been evaluated on several Xilinx Artix-7 (28 nm process) FPGA boards. The proposed PUF requires minimal resources (only two Look Up Tables per bit) and demonstrates remarkable stability under voltage and temperature variations. To the best of our knowledge, this design is one of the most compact FPGA-compatible PUFs reported in the literature, offering a density of 2 bits/slice.
The paper is organized as follows: Section 2 outlines the metrics used to evaluate PUF performance; Section 3 analyzes the architecture of the NAND-PUF and presents its FPGA implementation; Section 4 presents the results of the experimental validation; Section 5 compares the proposed NAND-PUF implementation with the state-of-the-art; finally, conclusions are drawn in Section 6.

Review of Main Performance Metrics for PUFs Evaluation and Comparison
Since PUFs are used in several authentication protocols, such as Challenge-Response authentication, they have to be characterized through a rigorous analysis, and their performance has to be evaluated through standard metrics, which also allow different PUF architectures to be compared. In the following we review the most commonly adopted metrics for PUF performance evaluation: Uniformity, Randomness, Uniqueness, and Reliability.
Since PUFs are exploited to generate a key, high entropy (e.g., Shannon entropy) has to be guaranteed in order to make the PUF-generated keys suitable for cryptographic purposes. Indeed, if the number of 0s of a PUF-generated key is exactly equal to the number of 1s, the entropy per bit of the key reaches its maximum value of 1 and no masking technique is required. The proportion of 1s (or, equivalently, of 0s) in the response is defined as the Uniformity, also referred to as the Bias, of the response.
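As an illustration of the link between Uniformity and entropy, the following minimal Python sketch computes the percentage of 1s in a response and the corresponding binary Shannon entropy. The two example responses are hypothetical and are not measured data; the entropy computed from the bias alone ignores any correlation between bits, so it should be read as an upper bound.

```python
import math

def uniformity(bits):
    """Percentage of 1s in a response (ideal value: 50%)."""
    return 100.0 * sum(bits) / len(bits)

def shannon_entropy_per_bit(bits):
    """Binary Shannon entropy H(p), with p the fraction of 1s in the response."""
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 0.0  # a constant response carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical 128-bit responses, used only for illustration.
balanced = [0, 1] * 64          # 64 ones, 64 zeros
skewed = [1] * 96 + [0] * 32    # 75% ones

print(uniformity(balanced), shannon_entropy_per_bit(balanced))
print(uniformity(skewed), shannon_entropy_per_bit(skewed))
```

A perfectly balanced response reaches one bit of entropy per cell, whereas a biased response falls below it, which is why masking or debiasing may be needed in that case.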
Though the Uniformity of a PUF gives information about the quality of the key of a given PUF implementation, it does not give any information about how the 0s and 1s are distributed within the bitstream. The Randomness of a PUF is a measure of the quality of the extracted bitstream in terms of statistical performance. Indeed, each PUF generates an n-bit response based on a specific challenge string. To guarantee the unpredictability of the key, Randomness must be assessed as specified in [42]. More specifically, according to [33,43–46], the Randomness of a given PUF realization can be evaluated through a subset of the NIST randomness tests [47]. Each test of the NIST suite produces a value p in the interval [0, 1] (the p-value). The closer the p-value is to 1, the better the bitstream performs. The PUF bitstream passes a given test if the p-value is greater than 0.01 [35,43].
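To make the pass criterion concrete, the sketch below implements the simplest test of the NIST suite, the frequency (monobit) test, and applies the 0.01 threshold mentioned above. It is only an illustrative example of how a p-value is obtained; it is not the test subset used in the measurements, and the input bitstreams are hypothetical.

```python
import math

def monobit_p_value(bits):
    """NIST frequency (monobit) test: p-value for the balance of 0s and 1s."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum the sequence.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def passes(bits, threshold=0.01):
    """A bitstream passes the test when its p-value exceeds the threshold."""
    return monobit_p_value(bits) > threshold

# Hypothetical 128-bit streams, for illustration only.
balanced = [0, 1] * 64
constant = [1] * 128

print(monobit_p_value(balanced), passes(balanced))
print(monobit_p_value(constant), passes(constant))
```

Note that a stream such as the alternating one above passes the monobit test despite being fully predictable, which is why several complementary tests are applied in practice.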
The Uniqueness of a PUF relies on its inherent randomness, generated by the manufacturing variability of the underlying physical structure. The response of a PUF instance on silicon is unique to each device, being given by the combination of mismatch and process variations associated with the manufacturing process of the integrated circuit. Therefore Uniqueness has to be quantified on different implementations of the same PUF circuit over different devices (FPGAs or chips). More specifically, the same design has to be physically implemented on different devices and each of these devices has to be excited with the same stimuli in the same environmental conditions. Then, the unique identifier (i.e., the collected response to a given challenge) has to be extracted from each device, and the average inter-class Hamming Distance (HD inter ) over all possible pairs of responses has to be computed. According to [9], the inter-class HD is defined as:

$$\mathrm{HD}_{inter} = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{\mathrm{HD}(R_i, R_j)}{n} \times 100\% \qquad (1)$$

where k is the number of realizations of the PUF (i.e., the number of devices under test), n is the number of bits of the response, and R i is the response taken from the i-th implementation of the PUF. To ensure that each PUF fingerprint is unpredictable, the same challenge applied to different PUF instances should yield different responses. Therefore, the ideal inter-class Hamming Distance is 50%.
The Reliability of a PUF is determined by how consistently it generates the same response to a given stimulus across different sessions and environmental conditions (i.e., different temperatures and different supply voltages). Indeed, some of the bit-cells of the PUF array could produce bits which vary when noise, voltage or temperature variations occur. These cells are called Unstable Cells and directly affect the Reliability. It has to be noted that a given cell is considered unstable even if it generates just one output different from the others in a set of 1000 measurements.
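The average pairwise inter-class Hamming Distance used to quantify Uniqueness can be sketched in Python as follows. The function names and the three 4-bit example responses are illustrative assumptions, not data from the measured boards.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b))

def inter_class_hd(responses):
    """Average pairwise Hamming Distance over k devices, as a percentage of n bits."""
    n = len(responses[0])
    pairs = list(combinations(responses, 2))  # k(k-1)/2 device pairs
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n)

# Hypothetical 4-bit responses from three devices, for illustration only.
devices = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
print(inter_class_hd(devices))  # every pair differs in 2 of 4 bits
```

For 16 devices the same function would average over 120 pairs, which is the population behind a histogram such as the one used to report uniqueness.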
The Reliability is evaluated by selecting a reference challenge-response pair and comparing it with new responses generated using the same stimuli in different conditions. More specifically, the intra-class Hamming Distance (HD intra ) between the Golden-Key (GK), extracted in nominal conditions, and k responses, collected at their respective power supply voltages and working temperatures (typically ±10% of the nominal V DD and T ∈ [0 °C, 75 °C]), is evaluated. In detail, the Reliability is defined as follows [9]:

$$\mathrm{Reliability} = 100\% - \mathrm{HD}_{intra} = 100\% - \frac{1}{k} \sum_{i=1}^{k} \frac{\mathrm{HD}(R_{ref}, R_i)}{n} \times 100\% \qquad (2)$$

where R i represents the i-th response generated at a given power supply voltage and temperature, and R ref is the reference GK response. The evaluation of these metrics allows the PUF to be characterized over a broad spectrum of possible working conditions. Another widely adopted parameter to characterize a PUF is the Bit-Error-Rate (BER), defined as:

$$\mathrm{BER} = \frac{1}{k} \sum_{i=1}^{k} \frac{\mathrm{HD}(R_{ref}, R_i)}{n} \times 100\% = 100\% - \mathrm{Reliability} \qquad (3)$$

Since the value of the Reliability in a given condition can be obtained from the BER through Equation (2), the BER is often used in PUF evaluation instead of the Reliability. It also has to be remarked that each PUF is characterized by a nominal BER (BER Typ ) due to transient noise, which perturbs the excitation sequence of the PUF.
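A minimal Python sketch of these reliability metrics is given below: it computes the intra-class Hamming Distance (taken here as the BER), the Reliability, and the list of Unstable Cells from a set of repeated read-outs. The golden key and the read-outs are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b))

def intra_class_hd(golden, responses):
    """Average distance from the golden key, as a percentage of n bits (the BER)."""
    n = len(golden)
    total = sum(hamming_distance(golden, r) for r in responses)
    return 100.0 * total / (len(responses) * n)

def reliability(golden, responses):
    """Reliability as 100% minus the intra-class Hamming Distance."""
    return 100.0 - intra_class_hd(golden, responses)

def unstable_cells(responses):
    """Indices of cells that produced more than one value across the read-outs."""
    return [i for i, column in enumerate(zip(*responses)) if len(set(column)) > 1]

# Hypothetical 8-bit golden key and four read-outs; bit 2 flips once.
golden = [1, 0, 1, 1, 0, 0, 1, 0]
flipped = [1, 0, 0, 1, 0, 0, 1, 0]
reads = [golden, flipped, golden, golden]

print(intra_class_hd(golden, reads))  # 1 wrong bit out of 32 read bits
print(reliability(golden, reads))
print(unstable_cells(reads))
```

The example also shows why the unstable-cell count is a stricter indicator than the BER: a single flip in many read-outs barely moves the BER but still marks the cell as unstable.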

NAND-PUF Architecture and Design
The NAND-PUF architecture adopted in this work is depicted in Figure 1. It is composed of two NAND gates, denoted as I 1 and I 2 , arranged to implement a NAND SR-Latch circuit. The NAND-PUF exploits the forbidden input state of the SR-Latch (i.e., both the Set and Reset inputs of the NAND latch equal to 0) to excite a fully symmetric circuit in order to generate a unique key response whose outcome ideally depends only on technology mismatch variations.
The main novelty of this paper is the excitation strategy adopted to increase the reliability of the basic SR-Latch. In order to gain insight into the circuit behaviour and to better explain the proposed excitation sequence, the transistor-level scheme depicted in Figure 2 can be used. Referring to the value of the excitation signal Start, the excitation sequence can be split into two main intervals:

1.
When the Start signal is low (i.e., Start = Gnd), the outputs O 1 , X, Y and O 2 are set to V DD and no current flows in Mn 1,1 and Mn 1,2 . In more detail, this excitation state forces the sources of Mn 2,1 and Mn 2,2 to V DD , turning off transistors Mn 2,1 , Mn 2,2 , Mp 2,1 , and Mp 2,2 . The equivalent circuit in this phase is depicted in Figure 3a.

2.
When the Start signal is high (i.e., Start = V DD ), the operation of the circuit can be further divided into three different phases: (a) In the first phase, when Start goes high, transistors Mn 1,1 and Mn 1,2 are activated, while transistors Mn 2,1 , Mn 2,2 , Mp 2,1 , and Mp 2,2 are turned off. During this phase, the two transistors Mn 1,1 and Mn 1,2 discharge the parasitic capacitances at the source of Mn 2,1 and Mn 2,2 , which were previously charged to V DD , until the gate-source voltage V gs of Mn 2,1 and Mn 2,2 is greater than the threshold voltage V th n . The equivalent circuit in this phase is depicted in Figure 3b.
differential output voltage whose sign depends on the mismatch between the three transistors Mn 1,1 , Mn 2,1 , and Mp 2,1 and the three transistors Mn 1,2 , Mn 2,2 , and Mp 2,2 . The equivalent circuit in this phase is depicted in Figure 3d. Thus, the working principle of the NAND-PUF relies on converting the small current difference between transistors Mn 1,1 and Mn 1,2 , caused by mismatch variations, into a differential output voltage by means of positive feedback.
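The working principle can be illustrated with a toy behavioural model, which is an assumption made here for illustration and not a transistor-level description of the circuit: each read-out resolves to the sign of a fixed mismatch term plus a random noise term, since positive feedback amplifies whichever imbalance is present. The parameter names, the Gaussian noise model and the numeric values are all hypothetical.

```python
import random

def read_cell(mismatch, noise_sigma, rng):
    """One evaluation: the latch resolves to the sign of mismatch plus noise."""
    return 1 if mismatch + rng.gauss(0.0, noise_sigma) > 0 else 0

def flip_count(mismatch, noise_sigma=1.0, reads=1000, seed=0):
    """Number of read-outs that disagree with the majority value of the cell."""
    rng = random.Random(seed)
    outputs = [read_cell(mismatch, noise_sigma, rng) for _ in range(reads)]
    majority = 1 if 2 * sum(outputs) >= reads else 0
    return sum(o != majority for o in outputs)

# Mismatch expressed in units of the noise standard deviation (hypothetical values).
for mismatch in (10.0, 1.0, 0.0):
    print(mismatch, flip_count(mismatch))
```

In this simplified picture, a cell whose mismatch is large compared with the noise always resolves to the same value, while a nearly matched cell behaves as an unstable cell.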

Architecture on FPGA
When referring to a semi-custom, standard-cell implementation of an ASIC, the NAND SR-Latch in Figure 1 can be easily implemented by using digital NAND gates taken from the standard-cell library of the technology. However, this kind of straightforward implementation is not possible on FPGA platforms, where single digital gates are not accessible and mux-based look-up tables are used to implement logic functions. Indeed, FPGAs are arranged as a matrix of Configurable Logic Blocks (CLBs). Each CLB contains programmable logic elements (such as look-up tables and flip-flops) and routing resources that can be configured by the user to implement custom digital logic circuits. CLBs are connected together through the programmable interconnect fabric of the FPGA. They can be connected in various ways to create complex digital circuits, and the number and arrangement of CLBs in an FPGA determine its size and capacity. Each CLB contains two Slices, which are complex blocks including the configurable digital elements (synchronous and asynchronous) needed to implement digital operations. Each Slice contains 8 Flip-Flops (4 of which can be configured as Latches) and 4 Look Up Tables (LUTs), each of which can be configured either as a 6-input, 1-output function or as a 5-input, 2-output function. LUTs can be configured to perform the NAND function, thus just two LUTs are required for each NAND-PUF cell. However, one of the most complex and, at the same time, critical aspects in the design of a PUF on FPGA is the symmetry of the interconnections. If the interconnections are not balanced in terms of path delay, the response will be degraded, worsening the bias and the uniqueness of the PUF. To address this issue, balanced interconnections must be selected manually. In this work, a novel and original implementation of the NAND-PUF on an FPGA is presented, carefully designed to balance the interconnection delays and the NAND elements.
The macro of 4 NAND-PUF bit-cells implemented using 8 LUTs is shown in Figure 4. The NAND gates and interconnections belonging to each of the 4 PUF cells have been highlighted with the same color, to show visually that the interconnections belonging to the same PUF bit exhibit a similar path length. To quantify the path delays in more detail, a delay analysis on the single paths and on the delay differences between paired paths has been carried out. The propagation delay of each of the 8 interconnections has been denoted as tp i,j , where i assumes the values A, B, C, D and j assumes the values 1, 2. The propagation delay values obtained from the Xilinx Vivado design tool after place and route on the FPGA device are summarized in Table 1. From that analysis it is clear that the difference in terms of propagation delay (denoted as |Delta t p |) is always lower than 20 ps; thus the paths can be considered nominally well balanced, since mismatch variations are expected to dominate over the systematic difference in the propagation delay. It should be emphasized that integrating the architecture on an FPGA introduces a routing delay which alters the behavior of the NAND-PUF. As a result, a phenomenon similar to the one described in [48–50] is observed. Nevertheless, as shown in [34], the delay difference caused by mismatch still determines the output value (either 0 or 1).
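The balance criterion can be expressed as a simple check on the post-route delays of each cell. In the sketch below the delay values are hypothetical placeholders (they are NOT the values of Table 1) and the function names are illustrative; only the 20 ps bound comes from the text.

```python
# Hypothetical post-route propagation delays in picoseconds, one pair per PUF cell.
# These numbers are placeholders for illustration and are not taken from Table 1.
tp = {
    "A": (412.0, 405.0),
    "B": (388.0, 399.0),
    "C": (430.0, 447.0),
    "D": (401.0, 396.0),
}

def delay_skews(delays):
    """Absolute delay difference |tp_i,1 - tp_i,2| for every cell i."""
    return {cell: abs(t1 - t2) for cell, (t1, t2) in delays.items()}

def is_balanced(delays, limit_ps=20.0):
    """True when every cell has a delay difference below the given bound."""
    return all(skew < limit_ps for skew in delay_skews(delays).values())

print(delay_skews(tp))
print(is_balanced(tp))
```

A check of this kind could be run on the delays reported by the place-and-route tool after each manual routing iteration, rejecting placements in which any cell exceeds the bound.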

Comparison with Respect to Previous SR-NAND Latch Implementations
The literature already contains various implementations of NAND-based SR-latches, such as those presented in [38,39]. In [38], a new method for utilizing unstable bits from SR-latches was proposed. By exploiting the position of these unstable bits, the variety of responses was increased from 2^N to an ideal value of 3^N. However, the FPGA implementation of the PUF presented here differs from the one in [38]. In the Spartan-3E implementation of [38], the focus was on guaranteeing a maximum of 43 unstable cells to maximize the possible challenge-response pair set. To achieve this, a flip-flop was added before the NAND latch, increasing the number of unstable bits in the response and reducing the skew between the two NAND gates. This implementation required two LUTs and one flip-flop, and was integrated into a single CLB on the Spartan-3E. The authors of [38] also presented an implementation on the Spartan-6 in which two NAND cells and two flip-flops were used to reproduce the same architecture as in [51]. This implementation has a high impact on the hardware resources, requiring four different Slices for each instance.
Another NAND-based SR-latch was designed on a Spartan-6 in [39]. The architecture of [39] was integrated with a resource consumption of half a Slice, very similar to [33–36,52]. Such an implementation has already been reported in [38,51]. However, in [39] the authors inserted an explicit reset condition through two Flip-Flops positioned in front of the latch. It has to be remarked that, even if the FPGA implementations of [38,39] are similar to the one presented in this work, the key extraction relies on different sources of entropy. Indeed, in [38] the authors extract the key directly from the final state of the NAND latch, while selecting the most reliable 128 PUF-cells among the 512 instantiated, whereas in [39] the information comes from the number of oscillations that each SR latch makes before it reaches the steady state.
The proposed design exhibits several differences with respect to previous approaches. First of all, we exploit only the SR NAND latch and, through the custom routing strategy, we guarantee a good matching between the two delay paths, providing good uniqueness and ensuring the metastability of the SR latch without the use of two additional flip-flops. In terms of hardware resources, we implement 4 instances in a single CLB, and each instance uses just two LUTs. Thus, the 4 LUTs of each Slice are occupied by 4 NAND gates. Furthermore, we implement the 128-bit macro using 32 CLBs, arranged in an 8 × 4 matrix. In Figure 5 the FPGA implementations of [38,39] and of this work are depicted. The enabled LUTs and Flip-Flops of the CLBs have been highlighted in gray.
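The resource figures quoted above follow from simple arithmetic, sketched below in Python. The constants reflect the Slice and CLB organization described in the text (4 LUTs per Slice, 2 Slices per CLB, 2 LUTs per PUF bit); the function name is an illustrative choice.

```python
import math

LUTS_PER_SLICE = 4   # LUTs available in one Slice
SLICES_PER_CLB = 2   # Slices contained in one CLB
LUTS_PER_BIT = 2     # one NAND-PUF cell uses two LUTs configured as NAND gates

def puf_resources(bits):
    """LUTs, Slices and CLBs needed by a NAND-PUF array with the given bit count."""
    luts = bits * LUTS_PER_BIT
    slices = math.ceil(luts / LUTS_PER_SLICE)
    clbs = math.ceil(slices / SLICES_PER_CLB)
    return {"luts": luts, "slices": slices, "clbs": clbs,
            "bits_per_slice": bits / slices}

print(puf_resources(4))    # one 4-bit macro
print(puf_resources(128))  # the 128-bit array
```

The 128-bit case yields 32 CLBs, which matches the 8 × 4 arrangement of 4-bit macros, and a density of 2 bits/Slice (equivalently 0.5 Slice/bit).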

Experimental Results
In this Section, we report experimental results on the evaluation of the proposed NAND-PUF. As a case study, we have considered a 128-bit NAND-PUF cell array, meaning that 32 4-bit macros have been placed in an 8 × 4 array. Adopting a macro-based design allowed us to preserve the balanced internal routing and structure of the NAND-PUF cells. The analysis of the proposed implementation has been carried out by considering a broad range of experimental aspects, in order to fully cover its behavior under PVT variations.

Testbed of the NAND-PUF
Our evaluation campaign has involved 16 boards mounting Xilinx Artix-7 100t FPGAs. The FPGAs' core voltage has been supplied by means of a Teledyne T3PS43203P programmable power supply unit. We have considered a supply voltage range of ±10% around the nominal value, which is 0.9 V (model-2e of the Xilinx Artix-7). The working temperature of the FPGA has been accurately set by using an Espec SH-621 climate chamber, in the range [5 °C, 80 °C]. A system clock of 50 MHz has been used for all the measurements. Hereafter, we refer to the duration of the Start signal, expressed in number of clock cycles, as N CLK , taking the system clock period as the timing reference.
All boards are supervised through a daughter board mounting an FT232H chip, which serves as a USB-SPI interface and is driven by custom Python scripts. The duration of the Start signal can be changed through the SPI. By means of this simple but effective testbed, we have been able to evaluate the stability of the proposed PUF under different stimulus conditions, revealing some interesting features when power supply voltage variations occur. The block scheme of the adopted testbed is depicted in Figure 6.
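For reference, the test conditions can be converted into absolute quantities with a few lines of Python. The sketch below derives the duration of the Start pulse from N CLK at the 50 MHz system clock and the two ±10% supply corners around 0.9 V; the function and constant names are illustrative.

```python
SYSTEM_CLOCK_HZ = 50e6   # system clock used for all measurements
NOMINAL_VDD = 0.9        # nominal core voltage in volts

def start_duration_ns(n_clk, clock_hz=SYSTEM_CLOCK_HZ):
    """Duration of the Start signal in nanoseconds for a given number of clock cycles."""
    return n_clk * 1e9 / clock_hz

def supply_corners(vdd=NOMINAL_VDD, tolerance=0.10):
    """Lower and upper supply voltage corners for a symmetric relative tolerance."""
    return vdd * (1 - tolerance), vdd * (1 + tolerance)

print(start_duration_ns(1))     # one clock period
print(start_duration_ns(256))   # longest Start pulse in the sweep
print(supply_corners())
```

With a 20 ns clock period, the sweep from 1 to 256 clock cycles corresponds to Start pulses between 20 ns and 5.12 µs, and the supply corners are 0.81 V and 0.99 V.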

Reliability, Bias, Unstable Cells and Uniqueness
In order to evaluate the performance of the proposed NAND-PUF and the impact of the Start stimulus on the main performance parameters, we have measured 1000 responses for different durations (N CLK ) of the Start signal, ranging from 1 up to 256 clock cycles. The BER in terms of intra-class Hamming Distance (in orange), the Bias of the response (in green) and the Unstable Cells (UCs, in blue) are reported in Figure 7 as a function of N CLK . As can be observed, the time at which the output is sampled has an impact on both the UCs and the BER. The longer the PUF array cells are left to run, the lower the number of UCs will be and, as a consequence, a lower BER and a higher Reliability will be achieved. It has to be pointed out, however, that this reasoning neglects a key parameter, namely the BER when supply voltage variations occur. In order to take this point into account, we tested the reliability of the PUF for 30 different values of N CLK when the supply voltage varies by ±10%. The results of this measurement campaign on a reference board (the one also used for the analysis of Figure 7) are reported in Figure 8 as a heatmap whose values are written in each box.
The inter-class HD over 16 Artix-7 boards is depicted in Figure 9, with the values on the x-axis reported as a percentage of the 128-bit responses. The mean value of the inter-class HD has been found to be 49.50%, with a standard deviation of 4.59%. As can be seen, the obtained values are very close to the ideal value of 50%, confirming the effectiveness of the proposed implementation and, in particular, of the routing strategy. Indeed, these results demonstrate that the proposed NAND-PUF implementation is able to efficiently extract entropy from the manufacturing process, and our statistical analysis of the responses from different devices implementing the same circuit did not reveal any structured artifacts (unbalanced routing or significantly biased cells).
In typical conditions (i.e., V DD = 0.9 V) the HD intra is always good (see Figure 7), but under supply voltage variations the number of unstable cells increases for low values of N CLK and the Reliability worsens. However, for N CLK greater than 128 the Reliability is also good at the two ±10% supply voltage corners. According to this analysis, we have chosen N CLK = 256 to implement an excitation sequence which guarantees high performance even under supply voltage variations. Once the number of clock cycles during which the Start signal is held high had been selected, we tested the reliability performance of the NAND-PUF with respect to voltage and temperature variations. Experimental results are reported in Figures 10 and 11. As can be observed, in both tests the Reliability is always greater than 93%, and it is evident that the impact of voltage variations is stronger than that of temperature variations. Indeed, the worst-case corner for temperature variations is met at approximately 70 °C, where the Reliability is about 96%. This value is only about 2% worse than in the nominal corner, which is a very good result considering that the architecture is implemented on an FPGA and not on an ASIC.

Enhancing the Set of Possible Responses
In [38] a technique to enhance the set of responses by exploiting the positions of the unstable cells has been proposed. Denoting with UC the number of unstable cells, the set of possible responses can be expanded to:

$${}_{128}C_{UC} \times 2^{\,128-UC}$$

where 128 C UC denotes the number of combinations of 128 elements taken UC at a time, as in [38]. This expression reaches its maximum when approximately 128/3 unstable cells are generated. We investigated the performance of this technique when N CLK is changed, and the experimental results are depicted in Figure 12. These results are extracted from 1000 repeated read-out responses from each location, in accordance with [38]. As can be observed, the best number of unstable cells is reached at N CLK = 6.
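The dependence of the response-set size on the number of unstable cells can be checked numerically. The sketch below assumes that the size of the expanded set is the number of ways of choosing the unstable positions multiplied by the number of binary patterns of the remaining stable cells, which is consistent with the 128/3 optimum and the 3^N ideal value quoted from [38]; the function name is illustrative.

```python
import math

N_CELLS = 128

def response_set_size(uc, n=N_CELLS):
    """Responses distinguishable with uc unstable cells among n cells:
    C(n, uc) choices of unstable positions times 2^(n - uc) stable patterns."""
    return math.comb(n, uc) * 2 ** (n - uc)

best_uc = max(range(N_CELLS + 1), key=response_set_size)
print(best_uc, N_CELLS / 3)
print(math.log2(response_set_size(best_uc)))  # size of the best case, in bits
print(math.log2(response_set_size(0)))        # no unstable cells: 128 bits
```

For 128 cells the maximum is shared by 42 and 43 unstable cells, both close to 128/3, and every single term stays below 3^128 because summing the expression over all possible UC values gives exactly 3^128.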

Randomness and Uniformity Results
In this section, the experimental results of a thorough measurement campaign over 16 FPGA boards are reported, with the aim of evaluating the Randomness of the proposed NAND-PUF implementation. NIST tests have been carried out according to [53,54]. The results of these tests are presented in Table 2. Figure 13 gives a visual representation of the 128 × 16 bits extracted from the 16 devices; a visual inspection of that figure shows no apparent correlation between the generated sequences. We found that the responses produced over the 16 boards met the minimum requirements to pass all tests. We further calculated the average bias of the response by taking the mean over the responses of the 16 devices. Results have shown that the bias of the NAND-PUF is approximately 46.85%, with a standard deviation of 4.06%. Experiments revealed that the best uniformity was achieved in 3 of the 16 tested devices, with 1/0 bias values of 50.00%. These results suggest that the proposed NAND-PUF design effectively exploits process and mismatch variations, and there is no evidence of systematic bias. It is therefore reasonable to expect that, with a larger number of devices, the bias would converge to a mean value closer to the ideal one of 50%. It is important to note that the responses used in this analysis were taken without any post-processing or elaboration techniques.

Comparison with State-of-the-Art
In order to compare the performance of different PUF designs we introduce two figures of merit (FOMs), FOM HD and FOM BER V,T , together with their versions normalized with respect to the resource consumption (referred to in the following as the normalized FOM HD and the normalized FOM BER V,T ). In their definitions, BER typ is the BER measured in typical conditions by multiple repeated measurements (HD intra ), ∆ V and ∆ T denote the maximum range of voltage and temperature variations assumed in the measurements, respectively, and BER wc V,T is the worst-case BER measured under voltage and temperature variations, respectively. Finally, bits/Slice is the number of PUF bits which can be implemented in a single FPGA Slice. According to these definitions, FOM HD evaluates how good both the Reliability and the Uniqueness of a given PUF are, whereas the normalized FOM HD also takes into account the resource consumption in terms of bits/Slice. The higher FOM HD and its normalized version are, the better the PUF performs with respect to Uniqueness and Reliability. FOM BER V,T evaluates how good the Reliability is with respect to Voltage (BER wc V ), Temperature (BER wc T ) and Transient Noise (BER Typ ) variations in the selected variation ranges (e.g., ∆ V,T ) for a given PUF implementation. The normalized FOM BER V,T also takes into account the resource consumption. The lower the BER, the higher FOM BER is. Thus, a higher FOM BER implies a lower impact of environmental variations on the PUF-generated key.
The proposed NAND-PUF implementation has then been compared with other FPGA-integrated PUF architectures, and the main metrics are summarized in Table 3. As far as the Uniqueness and the Reliability in nominal conditions are concerned, the proposed PUF is comparable with the state-of-the-art. Considering the required resources in terms of Slices and CLBs, the proposed design is comparable to [34,36] and is among the most compact architectures deployed on FPGA.
However, the PUF primitives employed in [24,33,34,52] occupy two LUTs and two flip-flops, while the design proposed here demands just two LUTs, thus outperforming the above-mentioned implementations. Indeed, with respect to the normalized FOM HD , which also accounts for the resource consumption, it is clear that [36] outperforms the other works, followed by the proposed work.
On the other hand, among the most compact architectures on FPGA (i.e., the ones which occupy 0.5 Slice/bit), the proposed architecture has proven to be the most reliable one, reaching the best FOM BER V,T . If compared with works which employ more hardware resources, it is surpassed only by [39], which however has 4 times the resource consumption and also requires some post-processing to select the most reliable cells among the ones instantiated on the FPGA.

Conclusions
In this work the NAND-PUF architecture has been successfully implemented on Xilinx FPGAs for the first time, with a focus on the Artix-7 platform, achieving a resource usage as low as 0.5 slices per bit. The macro of 4 NAND-PUF bit-cells, implemented using 2 slices, has been optimized with custom place and route scripts to achieve delay balancing with propagation delay differences lower than 20 ps, so that mismatch variations are expected to dominate over the systematic difference in the propagation delay. The reliability of the proposed implementation has been strongly improved with respect to previous works dealing with SR-Latch-based PUFs by using a novel excitation approach. The optimum excitation sequence has been determined through a thorough measurement campaign considering supply voltage variations. A 128-bit PUF cell array, implemented as 32 balanced 4-bit macros, has been tested on 16 Artix-7 FPGA boards, and the results have been compared to state-of-the-art PUFs. The comparison has demonstrated that the proposed implementation exhibits performance comparable with state-of-the-art PUFs in terms of Uniqueness and Reliability in nominal conditions, while occupying the minimum FPGA resource count achieved in the literature. In addition, the proposed NAND-PUF reaches the best trade-off between resource consumption and Reliability, achieving the best normalized FOM BER V,T of about 5.961 while generating 2 bits/Slice.