Analysis of Entropy in a Hardware-Embedded Delay PUF

Abstract: The magnitude of the information content associated with a particular implementation of a Physical Unclonable Function (PUF) is critically important for security and trust in emerging Internet of Things (IoT) applications. Authentication, in particular, requires the PUF to produce a very large number of challenge-response-pairs (CRPs) and, of even greater importance, requires the PUF to be resistant to adversarial attacks that attempt to model and clone the PUF (model-building attacks). Entropy is critically important to the model-building resistance of the PUF. A variety of metrics have been proposed for reporting Entropy, each measuring the randomness of information embedded within PUF-generated bitstrings. In this paper, we report the Entropy, MinEntropy, conditional MinEntropy, Interchip Hamming distance and National Institute of Standards and Technology (NIST) statistical test results using bitstrings generated by a Hardware-Embedded Delay PUF called HELP. The bitstrings are generated from data collected in hardware experiments on 500 copies of HELP implemented on a set of Xilinx Zynq 7020 SoC Field Programmable Gate Arrays (FPGAs) subjected to industrial-level temperature and voltage conditions. Special test cases are constructed which purposely create worst case correlations for bitstring generation. Our results show that the processes proposed within HELP to generate bitstrings add significantly to their Entropy, and show that classical re-use of PUF components, e.g., path delays, does not result in large Entropy losses commonly reported for other PUF architectures.


Introduction
The number of independent sources of information used to distinguish a system is a measure of its complexity, and relates to the amount of effort required to copy or clone it. The relationship between complexity and effort can be exponential, particularly for systems designed to conceal or mask the information and only provide controlled access to it. A physical unclonable function (PUF) is an information system that can meet these criteria under certain conditions. The information embedded in a PUF is random, enabling it to serve hardware security and trust roles related to key generation, key management, tamper detection and authentication [1]. PUFs represent an alternative to storing keys in non-volatile-memory (NVM), thereby reducing cost and hardening the embedding system against key-extraction-based attacks. PUFs are widely recognized as next-generation security and trust primitives that are ideally suited for authentication in industrial, automotive, consumer and military IoT-based systems, and for dealing with many of the challenges related to counterfeits in the supply chain.
PUFs enable access to their stored random information using a challenge-response-pair (CRP) mechanism, whereby a server or adversary 'asks a question' usually in the form of a digital bitstring and the PUF produces a digital response after measuring a set of circuit parameters within the chip. The nanometer size of the integrated circuit (IC) features and the analog nature of stored information makes it extremely difficult to read out the information using alternative access mechanisms. The circuit parameters that are measured vary from one copy of the chip to another, and can only be controlled to a small, but non-zero, level of tolerance by the chip manufacturer. This feature of the PUF makes it unclonable and provides each copy of the chip with a distinct 'personality', in the spirit of fingerprints or DNA for biological systems.
Strong PUFs are a special class of PUFs that are distinguished from weak PUFs by the amount of information content they possess. The traditional definition for distinguishing between weak and strong PUFs is to consider only the number of CRPs that can be applied. For weak PUFs, the number of CRPs is polynomial while strong PUFs have an exponential number, e.g., the number of challenges for an n-binary-input weak PUF can be n^2 while a strong PUF typically has 2^n. Unfortunately, this traditional definition leads to a misnomer as to the true strength of the PUF against adversarial attacks. For example, the original Arbiter PUF [2,3] is classified as strong even though machine-learning-based model-building attacks have shown that only a small, polynomial, number of CRPs are needed to predict its complete behavior.
Therefore, a truly strong PUF must have both an exponential number of CRPs and an exponential number of unique, uncorrelated responses, i.e., a large input challenge space is necessary but is not a sufficient condition. This requires the PUF to have access to a large source of entropy, either in the form of IC features from which random information is extracted, or in an artificial form using a cryptographic primitive, such as a secure hash function. Either mechanism makes the PUF resilient to machine learning attacks. However, using a secure hash for expanding the CRP space of the PUF and for obfuscating its responses consumes additional area and increases the required reliability of the PUF. Therefore, the former scenario, i.e., a large source of entropy, is more attractive but more difficult to achieve.
In this paper, we present results that support this more attractive alternative using a hardware-embedded delay PUF called HELP. HELP generates bitstrings from delay variations that occur along paths in an on-chip macro, i.e., the source of entropy for HELP is within-die manufacturing process variations that cause path delays to be slightly different in each copy of the chip. Macros or functional units that implement cryptographic algorithms and common data path operators such as multipliers typically possess at least 32 inputs and therefore, HELP meets the large input space requirement of a strong PUF.
Moreover, the wire interconnectivity within the macro used by HELP provides a large number of testable paths, on the order of 2^n for n inputs, satisfying the large output space requirement of a strong PUF. Unlike other PUFs that meet these conditions, the task of generating input test sequences (challenges) that test all of the testable paths is an NP-complete problem. Although this may appear to be a drawback, it, in fact, makes the task of model-building HELP much more difficult. For example, the adversary not only must devise a machine learning strategy that is able to predict output responses, but he/she must also expend a large effort on generating the challenges, which is typically accomplished using automatic test pattern generation (ATPG) algorithms. Note that these characteristics of HELP, namely, the use of a functional unit as a source of Entropy, paths of arbitrary length and the ATPG requirement, distinguish HELP from other delay-based PUFs such as the Arbiter and Ring Oscillator (RO) PUFs.
This paper investigates the entropy of HELP using 500 instances of a functional unit (the entropy source) embedded on a set of 20 Xilinx Zynq 7020 Field Programmable Gate Arrays (FPGAs). The specific contributions of this paper include the following:
• Strong experimental evidence that HELP leverages within-die variations (WDV) almost exclusively as its source of entropy.
• A statistical evaluation of Entropy, MinEntropy, conditional MinEntropy, Interchip Hamming distance and NIST statistical test results on hardware-generated bitstrings.
• A special worst-case analysis that maximizes correlations and dependencies introduced by (1) full path reuse and (2) partial path reuse, where the same paths in different combinations or paths with many common segments are used to generate distinct bits.
The rest of this paper is organized as follows. Related work is presented in Section 2 and an overview of HELP is given in Section 3. Statistical results are described in Section 4 using FPGA-based path delay data and bitstrings. A worst-case correlation analysis is presented in Section 5 and conclusions in Section 6.
Related Work

One of the earliest delay-based PUFs, called the Arbiter PUF, uses n-bit differential delay lines and a latch to generate a 1-bit PUF response [19,20]. Because of the limited amount of entropy, model-building attacks are effective against Arbiter PUFs [21]. Ring Oscillator (RO) PUFs [22] measure the frequency difference between two identical ring oscillators by counting the transitions on the output of each RO and then comparing counter values to generate a PUF bit. The number of challenges is limited to the number of pairings (n^2) and therefore the RO PUF is a weak PUF. The authors of [23] analyze RO frequency differences, selecting those pairings where the frequency difference is large enough to avoid any bit flip errors caused by environmental variations. The authors of [24] propose a scheme to produce (n − 1) reliable bits, and Ref. [25] proposes a longest increasing subsequence-based grouping algorithm (LISA) for FPGAs that sequentially pairs RO-PUF bits and can generate n/2 reliable bits out of n ring oscillators. In [26], the authors propose a regression-based distiller to remove systematic variations. PUF responses are affected by environmental variations such as temperature and voltage variations, thus processing is required to extract the entropy from the noise. Several schemes, including helper data and fuzzy extractor schemes, have been proposed to improve the reliability of bitstring regeneration and improve randomness [27]. Helper data is generated during the enrollment phase, which is carried out in a secure environment, and is later used with the noisy responses during regeneration to reconstruct the key. Bosch et al. [28] demonstrated a hardware implementation of concatenated-code-based fuzzy extractors that produce bitstrings with high reliability.
Reference [29] discusses a fuzzy extractor scheme based on repetition codes that can limit the usable entropy, and shows that such a scheme is not applicable to PUFs with small entropy. Dodis et al. [30] provided a formal definition and analysis of entropy loss in fuzzy extractors. The authors of [31] evaluated the reliability and unpredictability properties of five different types of PUFs (Arbiter, RO, SRAM, flip-flop and latch PUFs) from an Application-specific integrated circuit (ASIC) implementation.

HELP Overview
HELP attaches to an on-chip functional unit, such as a portion of the Advanced Encryption Standard (AES) labeled sbox-mixedcol on the left side of Figure 1. The logic gate structure of the functional unit defines a complex interconnected network of wires and transistors. This combinational data path component includes 64 primary inputs (PIs) and 64 primary outputs (POs) and is implemented in Wave Dynamic Differential Logic (WDDL) logic-style [32] on a Xilinx Zynq FPGA using approx. 2900 LUTs and 30 K wire segments. Path delay is defined as the amount of time (∆t) it takes for a set of 0-to-1 and 1-to-0 bit transitions introduced on the PIs of the functional unit (input challenge) to propagate through the logic gate network and emerge on a PO. HELP uses a clock-strobing technique to obtain high resolution measurements of path delays as shown on the left side of Figure 1. A series of launch-capture operations are applied in which the vector sequence that defines the input challenge is applied repeatedly to the PIs using the Launch row flip-flops (FFs) and the output responses are measured on the POs using the Capture row FFs. On each application, the phase of the capture clock, Clk2, is incremented forward with respect to Clk1, by small ∆ts (approx. 18 ps), until the emerging signal transition on a PO is successfully captured in the Capture row FFs. A set of XOR gates connected to the Capture row FF inputs and outputs (not shown) provide a simple means of determining when this occurs. When an XOR gate value becomes 0, then the input and output of the FF are the same (indicating a successful capture). The first occurrence in which this occurs during the clock strobe sweep causes the current phase shift value to be recorded as the digitized delay value for this path. The current phase shift value is referred to as the launch-capture-interval (LCI). 
The Clock strobe module, shown in the center portion of Figure 1, utilizes features of the Xilinx Digital Clock Manager (DCM).
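The clock-strobe digitization described above can be sketched in software. The following is a minimal, idealized model assuming a noise-free path and the approximate 18 ps phase-shift increment; the function and variable names are illustrative and do not come from the HELP RTL:

```python
# Hypothetical sketch of the clock-strobe sweep used to digitize one path delay.
# 'true_delay_ps' stands in for the unknown analog path delay.

STEP_PS = 18.0  # approximate phase-shift increment of the capture clock, Clk2

def digitize_path_delay(true_delay_ps, max_lci=1024):
    """Sweep the launch-capture interval (LCI) until the transition is captured.

    Returns the first LCI at which the capture FF latches the settled value,
    i.e., the smallest phase shift that meets or exceeds the path delay.
    """
    for lci in range(max_lci):
        interval_ps = lci * STEP_PS
        # The XOR of the capture-FF input and output becomes 0 once the
        # emerging transition has arrived before the capture edge.
        captured = interval_ps >= true_delay_ps
        if captured:
            return lci
    return None  # path did not settle within the sweep range

# A 900 ps path digitizes to LCI 50 (50 * 18 ps = 900 ps).
lci = digitize_path_delay(900.0)
```

In hardware the sweep stops at the first successful capture, so the recorded LCI is a quantized upper bound on the true delay, which is why multiple samples are later averaged.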
The digitized path delays are collected by a storage module and stored in an on-chip block RAM (BRAM) as shown in the center of Figure 1. Each digitized timing value is stored as a 16-bit value, with 12 binary digits serving to cover a signed range of +/−2048 and 4 binary digits of fixed-point precision to enable up to 16 samples of each path delay to be measured and averaged. The digitized path delays are stored in the upper half of the 16 KByte BRAM. We configure the applied challenges to test 2048 paths with rising transitions and 2048 paths with falling transitions. The digitized path delays are referred to as PUFNums, or PN, with PNR used to refer to rising path delays and PNF for falling. Once a set of 4096 PN are collected, a sequence of operations implemented in VHDL is started to produce the bitstring and helper data, as shown on the far right of Figure 1. These operations are described below.
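The 12.4 fixed-point format lends itself to a small worked example. The sketch below shows why 4 fractional bits suffice for averaging up to 16 samples: dividing an accumulated sum of 16 samples by 16 is exactly a 4-bit fractional shift. The helper names are hypothetical:

```python
# Illustrative sketch of the 16-bit PN storage format: 12 signed integer bits
# plus 4 bits of fixed-point fraction, enough to average up to 16 samples.

def average_to_fixed_point(samples):
    """Average up to 16 integer LCI samples into a 12.4 fixed-point PN."""
    assert 1 <= len(samples) <= 16
    total = sum(samples)
    # Scaling by 16 before the integer division retains 4 fractional bits;
    # for exactly 16 samples this reduces to total << 0, i.e., the sum itself
    # reinterpreted with the binary point shifted 4 places.
    return (total * 16) // len(samples)

def to_float(fixed):
    """Interpret a 12.4 fixed-point value as a real number."""
    return fixed / 16.0

pn = average_to_fixed_point([100, 101, 100, 101])  # mean is 100.5
```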

Implementation Details
We created 25 instances of sbox-mixedcol on each of 20 chips, for a total of 500 implementations (25 separate programming bitstreams are generated). Figure 2 shows a screen snapshot of the Xilinx Vivado Implementation view, which depicts a completed instance of the functional unit in the lower right corner (labeled as instance1). The VHDL code for sbox-mixedcol is synthesized and implemented into a pblock, which is shown as a magenta rectangle surrounding instance1. Once completed, tcl commands are issued that save a set of constraints for the wire and LUT components of the functional unit to a file called a check-point. The base y coordinate of the pblock is then incremented by 3 to create a sequence of pblock implementations, each of which is synthesized into a separate bitstream. In this fashion, a sequence of identical and overlapping pblock instances of the functional unit are created and tested, one at a time. The rationale for doing this is two-fold. First, it increases the statistical significance of the analysis without requiring a corresponding increase in the number of chips. Second, data from overlapping instances on the same chip implicitly eliminate chip-to-chip process variations, and provide a basis on which we can prove experimentally that HELP leverages within-die variations almost exclusively.

PN, PND and PNDc Processing Steps
The PN processing operations shown on the far right in Figure 1 are designed to eliminate both chip-to-chip performance differences and environmental variations, while leaving only within-die variations as a source of entropy for HELP. In order to accomplish this, the following modules and operations are defined. The PNDiff module creates unique, pseudo-random pairings between elements of the PNR and PNF groups using two seeded linear feedback shift registers (LFSR). The LFSRs are used to generate 11-bit addresses to access any of the 2048 PNR and PNF values. The two 11-bit LFSR seeds are configuration parameters. The PN differences are referred to as PND. The primary reason for creating PND is to increase the magnitude of within-die variations, i.e., path delay variations are doubled (in the best case) over those available in the PNR and PNF. Figure 3a shows an example of this process using a pairing of paths from the PNR and PNF sets. The graph contains curves for 500 PNR and 500 PNF, one for each of the 500 chip-instances. Although it is difficult to distinguish between the two groups in the figure, the PNF have a larger delay and are displayed above the PNR. The 13 line-connected points in each curve represent the PN measured under a range of environmental conditions, called temperature-voltage (TV) corners. The PN at the x-axis position given by 0 are those measured under nominal conditions (referred to as enrollment values below), i.e., at 25 °C, 1.00 V. The PN at positions 1, 2 and 3 are also measured at 25 °C but at supply voltages of 0.95, 1.00 and 1.05 V. Similarly, the other groups of three consecutive points along the x-axis are measured at these supply voltages but at temperatures 0 °C, −40 °C and 85 °C. The PN measured under TV corners numbered 1 to 12 are referred to as regeneration PN. Figure 3b plots the PND defined by subtracting pointwise, each PNF from a PNR for each chip-instance.
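The LFSR-based pairing step can be illustrated with a software model. The sketch below uses a maximal-length 11-bit Fibonacci LFSR with taps at bit positions 11 and 9 as a stand-in; the actual tap polynomial and seed handling of the HELP hardware are not specified here, so both are assumptions for illustration:

```python
# Hypothetical sketch of the PNDiff pairing: two seeded 11-bit LFSRs generate
# addresses into the rising (PNR) and falling (PNF) delay arrays, and each
# pairing produces one difference PND = PNR - PNF.

def lfsr11(seed):
    """Maximal-length 11-bit Fibonacci LFSR (illustrative taps at bits 11, 9)."""
    state = seed & 0x7FF
    while True:
        yield state
        bit = ((state >> 10) ^ (state >> 8)) & 1   # feedback from bits 11 and 9
        state = ((state << 1) | bit) & 0x7FF

def make_pnd(pnr, pnf, seed_r, seed_f, count):
    """Pair PNR and PNF values pseudo-randomly and return their differences."""
    gen_r, gen_f = lfsr11(seed_r), lfsr11(seed_f)
    pnd = []
    for _ in range(count):
        addr_r = next(gen_r) % len(pnr)   # 11-bit address into the PNR set
        addr_f = next(gen_f) % len(pnf)   # 11-bit address into the PNF set
        pnd.append(pnr[addr_r] - pnf[addr_f])
    return pnd

# Illustrative delay arrays; the two seeds are configuration parameters.
pnr = list(range(1000, 1000 + 2048))
pnf = list(range(1100, 1100 + 2048))
pnd = make_pnd(pnr, pnf, seed_r=0x155, seed_f=0x2AA, count=16)
```

Because a maximal-length LFSR visits every non-zero 11-bit state exactly once per period, the pairings do not repeat until all 2047 addresses have been used.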

TV-related effects on delay negatively impact bitstring reproducibility. It is clear that subtraction alone, which is used to create the PND, is not effective at removing all of the variations introduced by different environmental conditions (if it was, the curves would be horizontal lines).
We propose a TV compensation (TVCOMP) process that is applied to the PND as a mechanism to eliminate most of the remaining temperature-voltage variations (called TV-noise).
TVCOMP is applied to the entire set of 2048 PND measured for each chip-instance at each of the 13 TV corners separately (note, Figure 3b shows only one of the PND from the larger set of 2048 that exist for each chip-instance and TV corner). The TVCOMP procedure first converts the PND to 'standardized' values. Equation (1) represents the first transformation, which makes use of two constants, i.e., µchip (mean) and Rngchip (range), obtained by measuring the mean and range of the distribution defined by the PND:

zvali = (PNDi − µchip)/Rngchip (1)

The second transformation, represented by Equation (2), rescales the standardized values using two reference constants, µref and Rngref, that are shared by all chip-instances:

PNDci = zvali × Rngref + µref (2)

Figure 3c illustrates the effect of TVCOMP under these conditions. The PNDc ('c' for compensated) plotted in the graph are obtained by applying the TVCOMP procedure to the 2048 PND measured under each of the 13 TV corners for each chip, i.e., 13 TV corners × 500 chip-instances = 6500 separate applications. Several features of TVCOMP are evident. First, the transformation significantly reduces TV-noise, which is evident from the flatter curves (note that the scale used on the y-axis is amplified over that shown in Figure 3b). Second, global (chip-wide) performance differences are also nearly eliminated between the chip-instances, leaving only within-die variations. This is illustrated nicely by the highlighted red curves (25 instances) for chip 20. The curves shown in Figure 3a,b for the 25 instances on chip 20 are grouped together, illustrating that these instances have similar performance characteristics as expected, since they are obtained from the same chip. However, the corresponding curves in Figure 3c are distributed across most of the y-range, and are indistinguishable from the 475 curves from the other 19 chips. The dispersion of the chip 20 curves across the entire range illustrates that the random information leveraged by HELP is based on within-die variations (WDV), and not on global performance differences that occur from chip-to-chip.
The differences that remain in the PNDc are those introduced by WDV and uncompensated TV noise (TVN). The range of TVN for the bottom-most curve in Figure 3c is labeled and is approx. 3, which translates to approx. 90 ps. In general, PNDc with larger amounts of TVN are more likely to introduce bit flip errors. Therefore, it is desirable to make TVN as small as possible, which is the main driver for using the TVCOMP process.
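A minimal software reading of the TVCOMP transformation of Equations (1) and (2), using the sample mean and max-min range as simple stand-ins for µchip and Rngchip, shows how two chip-instances with different global speeds collapse onto the same PNDc values. The reference constants here are placeholders:

```python
# Sketch of the two-step TVCOMP transformation: standardize each
# chip-instance's PND distribution with its own mean and range, then
# rescale with shared reference constants so all instances share a scale.

def tvcomp(pnd, mu_ref, rng_ref):
    """Apply TVCOMP to one chip-instance's set of PND at one TV corner."""
    mu_chip = sum(pnd) / len(pnd)        # measured mean of the distribution
    rng_chip = max(pnd) - min(pnd)       # measured range of the distribution
    zvals = [(x - mu_chip) / rng_chip for x in pnd]    # Equation (1)
    return [z * rng_ref + mu_ref for z in zvals]       # Equation (2)

# Two 'chips' with different global speed and offset but the same relative
# structure produce identical PNDc after compensation.
fast = [10.0, 12.0, 14.0, 20.0]
slow = [x * 1.5 + 7.0 for x in fast]     # globally slower and shifted
assert tvcomp(fast, 0.0, 1.0) == tvcomp(slow, 0.0, 1.0)
```

The assertion demonstrates the point made about Figure 3c: chip-wide performance differences vanish, leaving only the relative (within-die) structure.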
The last operation applied to the PNs is represented by the Modulus operation shown on the right side of Figure 1. Modulus is a standard mathematical operation that computes the positive remainder after dividing by the modulus. The Modulus operation is required by HELP to eliminate the path-length bias that exists in the PNDc, which acts to reduce randomness and uniqueness in the generated bitstrings. The value of the Modulus is also a configuration parameter, similar to the LFSR seed, µref and Rngref parameters, and is discussed further in the following. The term modPNDc is used to refer to the values used in the bitstring generation process.
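In software terms, the Modulus step reduces to a positive-remainder operation. A one-line sketch with an illustrative modulus of 20 (not a recommended setting) shows how a constant path-length bias disappears:

```python
# Minimal sketch of the Modulus step: the positive remainder removes the
# path-length bias that longer paths impose on the PNDc values.

def apply_modulus(pndc, modulus=20):
    # Python's % already returns a non-negative remainder for a positive
    # modulus, matching the 'positive remainder' definition in the text.
    return [x % modulus for x in pndc]

# Three PNDc that differ only by large length biases (or sign) map to the
# same modPNDc value.
assert apply_modulus([3, 103, -17]) == [3, 3, 3]
```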

Offset Method
An optional offset can also be applied to the PNDc values prior to the application of the Modulus to further improve the statistical quality of the bitstrings. An offset is computed for each PNDc separately in a characterization process. The offset is simply the median value of the PNDc, derived using PNs from a sample of chips or from a nominal simulation. The offsets are transmitted to the token and are therefore a second component of the challenges. The token adds the individual offsets to each of the PNDc as they are generated. The offset shifts the PNDc upwards and centers the population over the 0-1 line associated with the Modulus. We use the term PNDco to refer to the PNDc with offsets applied. Since the offset is a population-based value, it leaks no information regarding the bit values generated from the modPNDco (to be discussed).
As an example, three randomly selected PNDc are shown in Figure 4. The PNDc from the 500 chip-instances are given on the left in the same format as that used in Figure 3c. The shift amounts are shown between the two sets of waveforms. The centering of the population over the 0-1 lines ensures that nearly equal numbers of chips produce 0 s and 1 s for each of the corresponding PNDco. We restrict the offset encoding to 4 bits, making it possible to shift the population by Modulus/(2 × 16). The additional factor of 2 in the denominator accounts for the fact that the maximum shift required to reach one of the 0-1 lines is half the Modulus.
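The offset computation can be sketched as follows, under the assumption that bits are decided by which half of the modulus range a modPNDco falls in, so the 0-1 boundaries sit at 0 and Modulus/2. The function name and example values are illustrative:

```python
# Hedged sketch of the Offset method: compute the population median of one
# PNDc, find the distance from (median mod Modulus) to the nearest 0-1
# boundary, and encode that shift in 4 bits with granularity Modulus/32.

def compute_offset(population_pndc, modulus=20):
    vals = sorted(population_pndc)
    median = vals[len(vals) // 2]
    pos = median % modulus
    # 0-1 boundaries are assumed at 0 and modulus/2; pick the nearest one.
    boundaries = [0, modulus / 2, modulus]
    target = min(boundaries, key=lambda b: abs(b - pos))
    shift = (target - pos) % modulus
    step = modulus / 32.0          # 4-bit encoding: 16 steps of Modulus/32
    code = int(round(shift / step)) % 16
    return code, code * step       # (helper-data code, applied shift)

code, shift = compute_offset([7, 8, 9, 9, 10], modulus=20)
```

The 16 available steps of Modulus/32 cover shifts up to roughly Modulus/2, which matches the text: the farthest a population center can sit from a 0-1 line is half the Modulus.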


Margining
A Margin technique is used to improve reliability by identifying and excluding bits that have the highest probability of 'flipping' from 0 to 1 or 1 to 0. As an illustration, Figure 5 plots 18 of the 2048 modPNDco from Chip 1 along the x-axis. The red curve line-connects the data points obtained under enrollment conditions while the black curves line-connect data points under the 12 regeneration TV corners. A set of margins are shown of size 2 surrounding two strong bit regions of size 8. Designators along the top given as 's0', 's1', 'w0' and 'w1' classify each of the enrollment data points as either a strong 0 or 1, or a weak 0 or 1, resp. Data points that fall on or within the hatched areas are classified as weak as a mechanism to avoid bit flip errors introduced by uncompensated TV noise (TVN) that occurs during regeneration. The Margin method improves bitstring reproducibility by eliminating data points classified as 'weak' in the bitstring generation process. For example, the data points at indexes 4, 6, 7, 8, 10 and 14 would introduce bit flip errors at one or more of the TV corners during regeneration because at least one of the regeneration data points is in the opposite bit value region, i.e., they cross one of the annotated 0-1 lines, from the corresponding enrollment value. A helper data string is constructed during enrollment that records the strong/weak status of each modPNDco, which is used during regeneration to identify which modPNDco generate bits (strong) and which are skipped (weak).
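The strong/weak classification can be modeled as a distance-to-boundary test. The sketch below assumes a modulus of 24, so that a margin of size 2 on each side of the 0-1 boundaries leaves strong regions of size 8 in each half-range, matching the proportions in the Figure 5 example; the names and modulus value are illustrative:

```python
# Sketch of the Margin classification: each modPNDco is assigned a bit by
# which half-range it falls in, and marked weak if it lies within the margin
# around a 0-1 boundary.

def classify(modpndco, modulus=24, margin=2):
    half = modulus // 2
    pos = modpndco % modulus
    bit = 0 if pos < half else 1              # which half-range it falls in
    # distance to the nearest 0-1 boundary (boundaries at 0, half, modulus)
    dist = min(pos % half, half - (pos % half))
    strength = 'strong' if dist >= margin else 'weak'
    return bit, strength

# Helper data records strong/weak; only strong positions contribute bits.
points = [1, 6, 13, 18]
helper = [classify(p)[1] for p in points]     # ['weak','strong','weak','strong']
strong_bits = [classify(p)[0] for p in points if classify(p)[1] == 'strong']
```

During regeneration, the helper-data string tells the token to skip the weak positions entirely, so TVN-induced drift near the boundaries cannot flip bits.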


Entropy Analysis
The statistical analysis is carried out using the bitstrings generated from the 500 chip-instances. Entropy is defined by Equation (3) and MinEntropy by Equation (4). The frequency pij of '0's and '1's is computed at each bit position i across the 500 chip-instance bitstrings of size 2048 bits, i.e., no Margin is used in this analysis. Figure 6 plots incremental Entropy and MinEntropy for both the original modPNDco and the 4-bit offset technique using black and blue curves, respectively, as chip-instances are added, one at a time, to the analysis (a similar analysis is presented in [33]). The x-axis gives the index of the chip-instance, starting with two chip-instances on the left and ending with 500 chip-instances on the right. The 4-bit offset technique shifts and centers the population of chip-instances associated with each modPNDc over a 0-1 line as discussed in Section 3.3. The centering has a significant impact on Entropy and MinEntropy, which is reflected in the larger values and the gradual approach of the curves to the ideal value of 2048 as chip-instances are added. Figure 7a,b depict bar graphs of Entropy and MinEntropy for Moduli 10 through 30 (x-axis). The height of the bars represents the average values computed using the 2048-bit bitstrings from the 500 chip-instances, averaged across 10 separate LFSR seeds. Entropy varies from 2037 to 2043, and is therefore close to the ideal value of 2048 independent of the Modulus. MinEntropy varies between 1862 (at Modulus 12) and 1919, which indicates that, in the worst case, each bit contributes between 91% and 93.7% of a full bit of Entropy.
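Assuming Equations (3) and (4) are the standard per-bit-position Shannon entropy and min-entropy sums, the computation over a population of bitstrings can be sketched as:

```python
import math

def entropy_and_minentropy(bitstrings):
    """Per-bit-position Entropy and MinEntropy, summed over all positions.

    bitstrings: list of equal-length '0'/'1' strings, one per chip-instance.
    """
    n_chips = len(bitstrings)
    n_bits = len(bitstrings[0])
    H, H_min = 0.0, 0.0
    for i in range(n_bits):
        p1 = sum(bs[i] == '1' for bs in bitstrings) / n_chips
        p0 = 1.0 - p1
        if 0.0 < p1 < 1.0:
            H += -(p0 * math.log2(p0) + p1 * math.log2(p1))
        # MinEntropy counts only the most likely value at each position.
        H_min += -math.log2(max(p0, p1))
    return H, H_min
```

With perfectly unbiased positions (pij = 0.5 everywhere), both sums reach one bit per position, i.e., the ideal of 2048 for 2048-bit bitstrings.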

Uniqueness
The InterChip hamming distance (InterChipHD) results are shown in Figure 7c, again computed using the bitstrings from 500 chip-instances, averaged across 10 separate LFSR seed pairs. Hamming distance is computed between all possible pairings of bitstrings, i.e., 500 × 499/2 = 124,750 pairings for each seed and then averaged.
The values for a set of Margins of size 2 through 4 (y-axis) are shown for each of the Moduli. Figure 8 provides an illustration of the process used for dealing with weak and strong bits under the Margin scheme in the InterchipHD calculation. The helper data bitstrings HelpD and raw bitstrings BitStr for two chips Cx and Cy are shown along the top and bottom of the figure, respectively. The HelpD bitstrings classify the corresponding raw bit as weak using a '0' and as strong using a '1'. The InterchipHD is computed by XOR'ing only those BitStr bits from Cx and Cy that have both HelpD bits set to '1', i.e., both raw bits are classified as strong. This process maintains alignment in the two bitstrings and ensures the same modPNDc from Cx and Cy are being used in the InterchipHD calculation.
InterChip HD, HDInter, is computed using Equation (5). The symbols NC, NBa and NCC represent 'number of chips', 'number of bits' and 'number of chip combinations', respectively (NCC is 124,750 as indicated above). This equation simply sums all the bitwise differences between each of the possible pairings of chip-instance bitstrings BS as described above and then converts the sum into a percentage by dividing by the total number of bits that were examined. Bit cnter from the center of Figure 8 counts the number of bits that are used for NBa in Equation (5), which varies for each pairing of chip-instances a. The HDInter is computed separately for each of the 10 seeds and the average value is given in Figure 7c. The HDInter values vary from 49.4% to 51.2%, and are therefore close to the ideal value of 50%.
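A minimal sketch of the Equation (5) computation with the strong-bit masking of Figure 8 (variable names are illustrative, not from the HELP sources):

```python
from itertools import combinations

def interchip_hd(bitstrs, helpds):
    """Average InterChip HD (%) over all chip pairings, Equation (5) style.

    bitstrs: raw bitstrings (strings of '0'/'1'), one per chip-instance.
    helpds:  helper-data strings; '1' marks a strong bit, '0' a weak bit.
    Only positions where BOTH chips report a strong bit are compared, which
    keeps the two bitstrings aligned on the same modPNDc.
    """
    total_diff = 0
    total_bits = 0  # sum of NBa over all pairings a
    for (bx, hx), (by, hy) in combinations(zip(bitstrs, helpds), 2):
        for i in range(len(bx)):
            if hx[i] == '1' and hy[i] == '1':    # both strong
                total_bits += 1
                total_diff += bx[i] != by[i]     # XOR of the raw bits
    return 100.0 * total_diff / total_bits

# Three toy chip-instances; prints 60.0 for this small example.
print(interchip_hd(['0110', '1100', '0011'],
                   ['1111', '1101', '1111']))
```

For real data (500 chips, 10 seeds) the same computation is repeated per seed and the per-seed results averaged.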

NIST Test Evaluation
The NIST statistical test suite is used to evaluate the randomness of the bitstrings [34]. The bitstrings are constructed as described above for InterChip HD. All tests are passed, with at least 488 of the 500 bitstrings passing as required by NIST, except for CumulativeSums (NIST test #4) under two Moduli. The two failing cases had 487 and 482 bitstrings passing, respectively, so the failures missed the NIST threshold by at most six chips in the worst case.



Correlation Analysis
Correlation analysis measures whether a relationship exists between modPNDco in which the bit response from one allows the response from a second to be predicted with probability greater than 50%. All strong PUF architectures to date have the potential to exhibit correlation because the 2^n response bits are generated from a much smaller set of m components, with the m components representing the underlying random variables. For the case of a 64-stage Arbiter PUF, the 256 path segments are all reused in every challenge, and therefore, the potential for correlation introduced by path segment reuse is very high. HELP also reuses path segments, but the probability of two paths sharing a large number of path segments is very small. The following analysis focuses on the reuse of path segments within HELP despite the fact that, in practice, it is statistically rare.
Our correlation analysis of path segment reuse (called Partial Reuse) is carried out using a set of 'unique' paths, and therefore, it ensures that at least one path segment is different in any pairing of PN used to create PND, PNDc, PNDco and modPNDc (note: we refer to PNDc in the following because the analysis focuses on how the Offset and Modulus operations affect the results). An example of partial reuse is shown in Figure 9. The highlighted red wire on the left indicates that the two paths, labeled 'path #1' and 'path #2', share all of the initial path segments, and differ only at the fanout point where they diverge into LUTa and LUTb. The two paths then reconverge at the next gate and form a 'bubble' structure.
It is also possible to pair the same PNs in different combinations to produce a much larger set of PNDc (on the order of n^2 with n PNs). We refer to this as Full Reuse. Full path reuse can result in dependent bits, i.e., bits that are completely determined by other bits. Reference [25] investigates these dependencies for ROs and proposes schemes designed to eliminate and/or reduce the number of dependent bits.
Figure 9. Reuse worst-case example of two paths forming a 'bubble'. The path segments which define the bubble are unique to each path while the remaining components are common to both paths.
We show in the following that the Offset and Modulus operations break the correlations found in classic dependency analysis, typically exemplified using RO frequencies as f(ROA) > f(ROB) and f(ROB) > f(ROC) implies f(ROA) > f(ROC). Therefore, partial reuse and full reuse of paths have a smaller penalty in terms of Entropy and MinEntropy when they occur within HELP.

Cryptography 2017, 1, 8

Preliminaries
As indicated earlier, the HELP algorithm creates differences (PND) between PNR and PNF using a pair of LFSR seeds, which are then compensated using TVComp to produce PNDc. A key objective of our analysis is to purposely create worst case conditions for correlations by crafting the PND such that partial reuse and full reuse test cases are created. The analysis of correlations requires that the constructed PND be adjacent to each other in the arrays on which the analysis is performed. Therefore, the LFSRs used in the HELP algorithm are not used to create the PND; instead, a linear, sequential pairing strategy is used.
The Offset and Modulus operations in the HELP algorithm are the key components to improving Entropy. As an aid to the discussion that follows, Figure 10 illustrates how these two operators modify the PNDc. The figure shows four groups of 10 vertical line graphs, with each line graph containing 500 PNDc data points corresponding to the 500 chip-instances. The line graph on the bottom left illustrates that the vertical spread in the line-connected points is caused by within-die delay variations.
The Reference PNDc shown on the left are the compensated differences before the Offset and Modulus operations are applied. The DC bias introduced by differences in the lengths of the paths changes the vertical positions of the line graphs, which span a range from −72 to +40 launch-capture intervals (LCIs). (Recall that 1 LCI = 18 ps, and represents the phase adjustment resolution of the Xilinx DCM.) The Offset and Modulus operations are designed to increase the Entropy in the PNDc by eliminating this bias. For example, the No Offset, Mod group shows the PNDc from the Reference PNDc group after a Modulus of 24 is applied. Similarly, the Offset, No Mod group shows the Reference PNDc after subtracting the median value from each line graph, which effectively centers the populations of 500 PNDco over the 0 horizontal line. Finally, the Offset, Mod group shows the PNDc with both operations applied, and represents the values used in the HELP algorithm. Here, an Offset is first applied to center the populations over the closest multiple of 12 and then a Modulus of 24 is applied (the boundaries used to separate the '0' and '1' bit values are 12 and 24 for a Modulus of 24, see Figure 5). We analyze the change in Entropy and MinEntropy as each of the operations is applied. Note that HELP processes 2048 PNDc at a time during bitstring generation, of which only 10 are shown in Figure 10.
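A sketch of the two operations for a Modulus of 24 follows; the median-based centering and the function name are our illustrative reading of the description above.

```python
import statistics

MODULUS = 24
HALF = MODULUS // 2  # 12: the offset target spacing and the '0'/'1' boundary

def offset_and_mod(pnd_c):
    """Apply the Offset then the Modulus operation to one PNDc population.

    pnd_c: compensated differences for one path pairing across chip-instances.
    The Offset shifts the population so its median sits on the closest
    multiple of HALF (a 0-1 line); the Modulus then folds the values into
    [0, MODULUS). Bit assignment: [0, 12) -> 0, [12, 24) -> 1.
    """
    med = statistics.median(pnd_c)
    target = round(med / HALF) * HALF   # closest multiple of 12 to the median
    offset = target - med
    folded = [(v + offset) % MODULUS for v in pnd_c]
    bits = [0 if v < HALF else 1 for v in folded]
    return folded, bits
```

Centering the median on a 0-1 line before folding is what yields roughly equal numbers of 0s and 1s across the chip population for each PNDc.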

Partial Reuse
Although we defined path segment reuse above as a pair of PNs with at least one path segment that is different in a given PNDc, we do not want to restrict our analysis to these types of specific physical characteristics but instead want to analyze the actual worst case. The Xilinx Vivado implementation view does not provide information that directly reflects the chip layout, and therefore, a broader approach to correlation analysis is required to ensure the worst case correlations are found.
We use Pearson's correlation coefficient (PCC) [35] to measure the degree of correlation that exists among PNDc and then select a subset of the most highly correlated for Entropy and MinEntropy analyses. Figure 11 depicts the construction process used to create an exhaustive set of PNDc, from which the most highly correlated are identified. In order to simplify the construction process, the TVComp operation is applied to a set of 2048 PNR and 2048 PNF separately for each of the 500 chip-instances (HELP normally applies TVComp only once, and to the PND as discussed in Section 3.2, for processing efficiency reasons, but the results using either method are nearly identical). Note that the 'c' subscript is not used in the PNR/PNF designation for clarity. TVComp eliminates chip-to-chip delay variations and makes it possible to compare data from all chips directly in the following analysis.
Figure 11. PND pairing creation process for partial reuse analysis using Pearson's correlation coefficient. Note: all PN are TVComp'ed but the subscript 'c' is removed for clarity.
Only one of the PNR, PNR0, is used to create a set of 2048 PNDc by pairing it as shown with each of the PNF. Correlations that occur in the generated bitstring are rooted in correlations among the PNDc. Therefore, the 2048 PNDc are themselves paired, this time with each other, under all combinations for 2048 × 2047/2 = 2,096,128 pairing combinations. The same process is carried out using the first PNF, PNF0, with all of the PNR (not shown) to create a second set of PNDc, which are again paired under all combinations. We use only one rising reference PN, PNR0, and one falling reference PN, PNF0, because the value of the PCC is identical for other choices of these references.
For each of the 2 million+ PNDc pairings, the Pearson correlation coefficient (PCC) given by Equation (6) is computed using enrollment data from the 500 chip-instances. The PCC can vary from highly correlated (−1.0 and 1.0) to no correlation (0.0). The absolute values of the PCC in each group of 2 million+ rising and falling PNDc pairings are then sorted from high to low. Scatterplots of the most highly and least correlated PNDc pairings are shown in Figure 12, drawn from the larger set of more than 4 million pairings. The most highly correlated 1024 PNDc pairings (for a total of 2048 PNDc, since each pairing contains two PNDc) are used in the bitstring generation process for the Entropy and Conditional MinEntropy (CmE) evaluation below. Highly correlated PNDc are stored as adjacent values to facilitate analysis of the corresponding 2-bit sequences.
The 2048 PNDc are processed into bitstrings under four different scenarios as shown in Figure 10. For example, the PNDc are compared to a global mean under the Reference scenario (see annotation in figure). The global mean is the average PNDc across all chip-instances and all 2048 PNDc (500 × 2048). A '0' is assigned to the bitstring for cases in which the PNDc for a chip-instance falls below the global mean and a '1' otherwise. Given the large DC bias associated with the PNDc under the Reference scenario, the Entropy and CmE statistics are expected to be very poor.
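Taking Equation (6) to be the standard sample Pearson correlation, the scoring and selection step can be sketched as follows (helper names are ours):

```python
# Sketch of the PCC-based selection: score every PNDc pairing by |PCC|
# across the chip-instances and keep the most highly correlated.

def pcc(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def most_correlated_pairs(pnd_c_list, keep):
    """Return the 'keep' pairings with |PCC| closest to 1.0.

    pnd_c_list: one list of enrollment values per PNDc, indexed by chip.
    """
    scored = []
    for i in range(len(pnd_c_list)):
        for j in range(i + 1, len(pnd_c_list)):
            scored.append((abs(pcc(pnd_c_list[i], pnd_c_list[j])), i, j))
    scored.sort(reverse=True)  # most correlated first
    return scored[:keep]
```

In the paper's setting, pnd_c_list holds 2048 PNDc, each with 500 enrollment values, and the top 1024 pairings are retained.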
The No Offset, Mod and Offset, Mod bitstring generation scenarios use the value 12 as the boundary between '0' and '1' (for Modulus 24 as shown in the figure), i.e., PNDc ≥ 0 and < 12 produce a '0' and those ≥ 12 and < 24 produce a '1'. The '0'-'1' boundary for the Offset, No Mod scenario is 0 and the sign bit is used to assign '0' (for negative PNDc) and '1' (for positive PNDc). The Offset, Mod scenario represents the operations performed by the HELP algorithm. The analysis is extended for this scenario by evaluating Entropy and CmE over Moduli between 14 and 30 to fully illustrate the impact of the Modulus operation.
The PNDc from a normal use case are also analyzed using these four bitstring generation scenarios to determine how much Entropy/CmE is lost when compared to the highly correlated case analysis. For the normal use case, no attempt is made to correlate PNDc; instead, random pairings of PNR and PNF are used to construct the PNDc. Table 1 provides a summary of the eight scenarios investigated.
Figure 13 provides a graphic that depicts the process used to compute Entropy and Conditional MinEntropy (CmE) (modeled after the technique proposed in [31]). As indicated earlier, highly correlated PNDc and the corresponding bits that they generate are kept in adjacent positions in the array. The bitstrings are of length 2048. Therefore, each chip-instance provides 1024 sets of 2-bit sequences. Equation (7) is used to compute the Entropy of the 1024 2-bit sequences for each chip-instance, which is then divided by 1024 to convert into Entropy/bit. The pi represent the frequencies of the four 2-bit patterns as given in Figure 13. The Entropy/bit value reported below is the average of the 500 chip-instance values. CmE is computed using Equation (8) (also from [31]). The expression max(pX/pW) represents the maximum conditional probability among the four values computed for each 2-bit sequence, i.e., Equation (8) takes the form Hmin(X|W) = −log2(max(pX/pW)). Again, the sum over the 1024 2-bit sequences is converted to CmE/bit for each chip-instance and the average across all 500 chip-instances is reported.
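One plausible reading of Equations (7) and (8), sketched for a single chip-instance bitstring (the pairing of adjacent positions and the helper names are our assumptions):

```python
import math
from collections import Counter

def two_bit_entropy_and_cme(bitstring):
    """Entropy (max 2) and CmE (max 1) of one chip's 2-bit sequences.

    Adjacent positions (2k, 2k+1) hold the potentially correlated bit pairs;
    the four pattern frequencies p00..p11 are estimated over all pairs.
    """
    pairs = [bitstring[i:i + 2] for i in range(0, len(bitstring) - 1, 2)]
    n = len(pairs)
    p = {pat: cnt / n for pat, cnt in Counter(pairs).items()}
    # Equation (7) reading: Shannon entropy over the four 2-bit patterns.
    H = -sum(q * math.log2(q) for q in p.values())
    # Equation (8) reading: conditional min-entropy of the second bit given
    # the first, CmE = -log2(max p(pattern) / p(first bit of pattern)).
    p_w = {'0': 0.0, '1': 0.0}
    for pat, q in p.items():
        p_w[pat[0]] += q
    max_cond = max(q / p_w[pat[0]] for pat, q in p.items())
    return H, -math.log2(max_cond)
```

Fully dependent pairs (e.g., the second bit always equal to the first) drive CmE to 0 even when the pattern entropy is nonzero, which is exactly the correlation the metric is meant to expose.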
The Entropy and CmE results are plotted in Figure 14 for both the highly correlated and normal use scenarios. The x-axis represents the experiment, with 0 plotting the results using the Reference bitstring generation method (from Figure 10), 1 representing the No Offset, Mod, 2 representing Offset, No Mod and 3 through 11 representing the Offset, Mod method for Moduli between 30 and 14, respectively. The maximum Entropy/bit is 2 while the maximum CmE is 1. From the trends, it is clear that both Offset and Modulus improve the statistical quality of the bitstrings over the Reference. However, Modulus appears to provide the biggest benefit, which is captured by the drops in Entropy and CmE for experiment 2 in which the Modulus is not applied. Moreover, the loss in Entropy is almost zero between the normal use and highly correlated scenarios and CmE drops on average by only 0.2 bits for experiments 3 through 11 for the Offset, Mod method. Therefore, partial reuse under worst case conditions introduces only a small penalty on the quality of the bitstrings generated by the HELP algorithm.
The x-axis represents the experiment, with 0 plotting the results using the Reference bitstring generation method (from Figure 10), 1 representing the No Offset, Mod, 2 representing Offset, No Mod and 3 through 11 representing the Offset, Mod method for Moduli between 30 and 14, respectively. The maximum Entropy/bit is 2 while the maximum CmE is 1. From the trends, it is clear that both Offset and Modulus improve the statistical quality of the bitstrings over the Reference. However, Modulus appears to provide the biggest benefit, which is captured by the drops in Entropy and CmE for experiment 2 in which the Modulus is not applied. Moreover, the loss in Entropy is almost zero between the normal use and highly correlated scenarios and CmE drops on average by only 0.2 bits for experiments 3 through 11 for the Offset, Mod method. Therefore, partial reuse under worst case conditions introduces only a small penalty on the quality of the bitstrings generated by the HELP algorithm.
bitstrings over the Reference. However, Modulus appears to provide the biggest benefit, which is captured by the drops in Entropy and CmE for experiment 2 in which the Modulus is not applied. Moreover, the loss in Entropy is almost zero between the normal use and highly correlated scenarios and CmE drops on average by only 0.2 bits for experiments 3 through 11 for the Offset, Mod method. Therefore, partial reuse under worst case conditions introduces only a small penalty on the quality of the bitstrings generated by the HELP algorithm.   Figure 10 and cases as given in Table 1.

Full Reuse
Full reuse refers to the repeated use of the PN in multiple PNDc, as shown for the 2-PN reuse example in Figure 15. Here, two rise PN, PNR0 and PNR1, are paired in all combinations with two fall PN, PNF0 and PNF1. The traditional analysis predicts that, because of correlation, only a subset of the 16 possible bit patterns can be generated when using PNDA through PNDD to generate a 4-bit response. In particular, patterns "0110" and "1001" are not possible. However, as indicated earlier, the Modulus and Offset operations break the classical dependencies and allow all patterns to be generated, as we show below.
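The classical dependency can be checked numerically. In the sketch below, each PND is formed from the four pairings PNDA = PNR0 − PNF0, PNDB = PNR0 − PNF1, PNDC = PNR1 − PNF0 and PNDD = PNR1 − PNF1, so PNDA + PNDD = PNDB + PNDC, which rules out "0110" and "1001" under a simple threshold. The sign-threshold bit rule and the example modulus value are illustrative assumptions, not the exact HELP bit-generation operations:

```python
import random

def four_bit_pattern(r0, r1, f0, f1, modulus=None):
    """Derive one bit per PND from its sign (illustrative threshold rule).
    An optional modulus (with a centering offset) breaks the linear
    dependency PND_A + PND_D == PND_B + PND_C."""
    pnds = [r0 - f0, r0 - f1, r1 - f0, r1 - f1]   # PND_A .. PND_D
    if modulus is not None:
        pnds = [p % modulus - modulus / 2 for p in pnds]
    return "".join("1" if p >= 0 else "0" for p in pnds)

random.seed(1)
raw, modded = set(), set()
for _ in range(200_000):
    pn = [random.uniform(-50.0, 50.0) for _ in range(4)]
    raw.add(four_bit_pattern(*pn))
    modded.add(four_bit_pattern(*pn, modulus=20.0))
# raw never contains "0110" or "1001"; modded reaches all 16 patterns
```

The contradiction is direct: "0110" requires PNDA + PNDD < 0 while PNDB + PNDC >= 0, yet the two sums are equal; after the modulus only a congruence constraint survives, which every pattern can satisfy.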

The frequencies of the 16 patterns for the 2-PN experiment are shown in Figure 16. Here, PNDc are created for each of the 500 chip-instances according to the illustration in Figure 15. With 2048 bits/chip-instance, there are 512 4-bit columns, each with 500 instances. The graph plots the percentage of each pattern across this set of 500 × 512 = 256,000 samples under each of the PNDc scenarios described earlier with reference to Figure 10. The ideal distribution is uniform, with a percentage of 1/16 × 100 = 6.25% for each 'Pattern Bin' along the x-axis, as annotated in the figure.
The distributions associated with the Reference (black) and Offset, No Mod (red) experiments are clearly not uniform. Pattern bins 6 and 9 are zero for Reference, as predicted by the classical dependency analysis. Although the differences are small, the Offset, No Mod distribution is slightly better, with non-zero values in pattern bins 6 and 9 and most of the other pattern bins closer to the ideal value of 6.25%. The Modulus operation, particularly in combination with the Offset operation, produces much better results. The percentages for the Offset, Mod experiment (yellow curve) vary by at most 1.2% from the ideal value of 6.25%.
Figure 16. Frequency of 4-bit patterns, from bin 0 with pattern "0000" through bin 15 with pattern "1111", using 2-PN reuse data under the four scenarios from Figure 10. The ideal frequency value is 1/16 = 6.25%. The Reference PNDc exhibits the worst case behavior, with frequencies of 0% for patterns "0110" and "1001", while Offset, Mod exhibits the best behavior.
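The pattern-bin bookkeeping behind Figure 16 can be sketched as follows; this is illustrative accounting only, assuming each bitstring is simply split into consecutive k-bit columns:

```python
from collections import Counter

def pattern_bin_percentages(bitstrings, k=4):
    """Tally k-bit column patterns across chip-instances: every k-bit
    column of every bitstring contributes one sample to its pattern bin.
    Returns {pattern: percentage of all samples}."""
    counts, total = Counter(), 0
    for bs in bitstrings:
        usable = len(bs) - len(bs) % k    # drop any incomplete trailing column
        for i in range(0, usable, k):
            counts[bs[i:i + k]] += 1
            total += 1
    return {format(b, "0%db" % k): 100.0 * counts[format(b, "0%db" % k)] / total
            for b in range(2 ** k)}
```

With 500 chip-instances of 2048 bits each, this yields the 500 × 512 = 256,000 samples described above, against the ideal uniform value of 6.25% per bin.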
The positive impact of the Offset and Modulus operations on Entropy is further supported by an analysis carried out in a 3-PN experiment, where 3 rise and 3 fall PN are combined under all combinations to produce a 9-bit column (analogous to the 2-PN illustration in Figure 15). With 9-bit columns, there are 512 possible pattern bins. Using the 2048-bit bitstrings from 500 chip-instances, we were able to construct 227 full 9-bit columns (leftover columns were discarded), for a total sample size of 113,500.

A scatterplot showing the results for the 3-PN experiment is given in Figure 17 using Offset, Mod PNDc bitstring data (black dots). The ideal percentage is 1/512 × 100 = 0.195%. As a reference, the results using PNDc constructed without reusing any rising or falling PN (referred to as the normal use scenario above) are superimposed in blue. The smaller variation of the frequencies under the normal use scenario, when compared with the 3-PN full reuse scenario, clearly shows that there is a penalty associated with reuse, but none of the pattern bins are empty in either case and most of the frequency values are within 0.1% of the ideal value of 0.195%.

Figure 17. Frequency of 9-bit patterns, from bin 0 with pattern "000,000,000" through bin 511 with pattern "111,111,111", using 3-PN reuse data (black) and normal data (blue). The distribution should be uniform, with each bin percentage at 1/512 = 0.195%, as shown by the dotted line.

Table 2 presents the MinEntropy computed using Equation (4). In all cases, except for row 3, column 2, the MinEntropy values in the last row are larger than those in the first three rows. Moreover, the drop in MinEntropy over the normal use case scenario in the last row is 0.19, 1.45 and 1.9 bits, respectively, illustrating that the penalty associated with reuse is very modest.

Conclusions
An analysis of the statistical characteristics of a Hardware-Embedded Delay PUF (HELP) is presented in this paper, with emphasis on Interchip Hamming Distance, Entropy, MinEntropy, conditional MinEntropy and NIST statistical test results. The bitstrings generated by the HELP algorithm are shown to exhibit excellent statistical quality. An experiment focused on purposely constructing worst case correlations among path delays is also described as a means of demonstrating the Entropy-enhancing benefit of the Offset and Modulus operations carried out by the HELP algorithm. Special data sets are constructed which maximize the physical correlations and dependencies introduced by reusing components of the underlying source of Entropy. Although statistical quality is reduced under these worst case conditions, the reduction is modest. Therefore, the Modulus and Offset operations harden the HELP algorithm against model-building attacks.
A quantitative analysis of the relationship between Entropy as presented in this paper and the level of effort required to carry out model-building attacks on HELP is the subject of future work. Developing a formal quantitative framework that expresses the relationship between Entropy and model-building effort is inherently difficult because of the vastly different mathematical domains on which each is based. Best practice for relating Entropy to security properties that predict attack resilience is to correlate results from separate analyses of Entropy and model-building resistance. A thorough treatment of model-building resistance requires a wide range of machine-learning experiments. Work on this topic is ongoing and will be reported in a separate paper in the near future.