An Optimality Summary: Secret Key Agreement with Physical Unclonable Functions

We address security and privacy problems for digital devices and biometrics from an information-theoretic optimality perspective to conduct authentication, message encryption/decryption, identification or secure and private computations by using a secret key. A physical unclonable function (PUF) provides local security to digital devices and this review gives the most relevant summary for information theorists, coding theorists, and signal processing community members who are interested in optimal PUF constructions. Low-complexity signal processing methods are applied to simplify information-theoretic analyses. The best trade-offs between the privacy-leakage, secret-key, and storage rates are discussed. Proposed optimal constructions that jointly design the vector quantizer and error-correction code parameters are listed. These constructions include modern and algebraic codes such as polar codes and convolutional codes, both of which can achieve small block-error probabilities at short block lengths, corresponding to a small number of PUF circuits. Open problems in the PUF literature from signal processing, information theory, coding theory, and hardware complexity perspectives and their combinations are listed to stimulate further advancements in the research on local privacy and security.


Motivations
Fundamental advances in cryptography were made in secret during the 20th century. One exception was Claude E. Shannon's paper "Communication Theory of Secrecy Systems" [1]. Until 1967, the literature on security was not extensive, but a book [2] with a historical review of cryptography changed this trend [3]. Since then, the amount of sensitive data to be protected against attackers has increased significantly. Continuous improvements in security are needed, and every improvement creates new possibilities for attacks [4].
Recent hardware-intrinsic security systems, biometric secrecy systems, 5th generation of cellular mobile communication networks (5G) and beyond, as well as the emerging internet of things (IoT) networks, have several salient characteristics that differentiate them from existing architectures. These include the deployment of large numbers of possibly low-complexity terminals with light or no infrastructure, stringent constraints on latency, and primary applications of data gathering, inference, and control. These unique characteristics call for a rethinking of some of the fundamentals of data communications and storage. For instance, these characteristics make it very challenging to provide adequate secrecy and privacy primitives. In particular, traditional cryptographic protocols, which require key distribution or certificate management, might not be suitable to support the diverse applications enabled by these technologies and might not be able to assure the privacy of personal information intrinsic in the data collected by such applications. Furthermore, low-complexity terminals may not have the processing power to implement such protocols, and even if they do, latency tolerances may not permit the processing time needed for cryptographic operations. Similarly, traditional methods of storing a secret key in a secure non-volatile memory (NVM) can be shown to be insecure due to possible invasive attacks on the hardware. Thus, secrecy and privacy for information systems are issues that need to be rethought in the context of recent networks, digital circuits, and database storage.
Information-theoretic security is an emerging approach to provide secrecy and privacy for, e.g., wireless communication systems and networks by exploiting the unique characteristics of the wireless communication channel. Information-theoretic security methods such as physical layer security (PLS) use signal processing, advanced coding, and communication techniques to secure wireless communications at the physical layer. There are two key advantages of PLS. First, it enables the use of resources available at the physical layer, such as multiple measurements, channel training mechanisms, power, and rate control, which cannot be utilized by the upper layers of the protocol stack. Second, it is based on an information-theoretic foundation for secrecy and privacy that does not make assumptions on the computational capabilities of adversaries, unlike cryptographic primitives. Considering the security and privacy requirements of recent digital systems and the potential benefits of information-theoretic security and privacy methods, it can be seen that information-theoretic methods can complement or even replace conventional cryptographic protocols for wireless networks, databases, and user authentication and identification. Since information-theoretic methods do not generally require pre-shared secret keys, they might considerably simplify key management in complicated networks. Thus, these methods might be able to fulfill the stringent hardware area constraints of digital devices and latency constraints in certain 5G/6G applications, or to save computations and, hence, battery life for low-power devices such as IoT devices. Furthermore, information-theoretic methods offer "built-in" secrecy and privacy, which are generally agnostic to the network infrastructure and provide better scalability as the size of a network or database increases.
A promising local solution to information-theoretic security and privacy problems is a physical unclonable function (PUF) [5]. PUFs generate "fingerprints" for physical devices by using their intrinsic and unclonable properties. For instance, consider ring oscillators (ROs) with a logic circuit of multiple serially connected inverters and a feedback of the output of the last inverter into the input of the first inverter, as depicted in Figure 1. RO outputs are oscillation frequencies 1/x, where x is the oscillation period, that are unique and uncontrollable since the difference between different RO outputs is caused by submicron random manufacturing variations that cannot be controlled. One can use RO outputs as a source of randomness, called a PUF circuit, to extract secret keys that are unique to the digital device that embodies these ROs. The complete method that puts out a unique secret key by using RO outputs is called an RO PUF. Similarly, binary static random access memory (SRAM) outputs can be used as a source of randomness to implement SRAM PUFs in almost all digital devices because most digital devices have embedded SRAMs used for data storage. The logic circuit of an SRAM is depicted in Figure 2, and the logically stable states of an SRAM cell are (Q, Q̄) = (1, 0) and (0, 1). During power-up, the state is undefined if the manufacturer did not fix it. The undefined power-up state of an SRAM cell converges to one of the stable states due to the random and uncontrollable mismatch of the inverter parameters, fixed when the SRAM cell is manufactured [6]. There is also random noise in the cell that affects the cell at every power-up. Since the physical mismatch of the cross-coupled inverters is a manufacturing variation, the power-up state of an SRAM cell is considered as a PUF response with one challenge, which is the address of the SRAM cell [6].
PUFs resemble biometric features of human beings. In this review, we will list state-of-the-art methods that bridge the gap between the practical secrecy systems that use PUFs and the information-theoretic security limits by
• Modeling real PUF outputs to solve security problems with valid assumptions;
• Analyzing methods that make information-theoretic analysis tractable, e.g., by transforming PUF symbols so that the transform-domain outputs are almost independent and identically distributed (i.i.d.), and that result in a smaller hardware area than benchmark designs in the literature;
• Stating the information-theoretic limits for realistic PUF output models and providing optimal and practical (i.e., low-complexity and finite-length) code constructions that achieve these limits;
• Illustrating best-in-class nested codes for realistic PUF output models.
In short, we start with real PUF outputs to obtain mathematically tractable models of their behaviour and then list optimal code constructions for these models. Since we discuss methods developed from the fundamentals of signal processing and information theory, any further improvements in this topic are likely to follow the steps listed in this review.

Organization and Main Insights
This paper is organized as follows. In Section 2, we define a PUF, list its existing and potential applications, and analyze the most promising PUF types. The PUF output models and design challenges faced when manufacturing reliable, low-complexity, and secure PUFs are listed in Section 3. The main security challenge in designing PUFs, i.e., output correlations, is tackled in Section 4 mainly by using a transform coding method, which can provably protect PUFs against various machine learning attacks. The reliability and secrecy performance metrics (e.g., the number of authenticated users) used for PUF designs are defined and jointly optimized in Section 5. PUF security and complexity performance evaluations for the defined transform coding method are given in Section 6. Performance results for error-correction codes used in combination with previous code constructions for key extraction with PUFs are shown in Section 7 in order to illustrate that previous key extraction methods are strictly suboptimal. We next define the information-theoretic metrics and the ultimate key-leakage-storage rate regions for the key agreement with PUFs problem, and compare available code constructions for the key agreement problem, in Section 8. Optimal code constructions for key extraction with PUFs are implemented in Section 9 by using nested polar codes, which are used in the control channel of 5G networks, to illustrate significant gains from using optimal code constructions. In Section 10, we provide a list of open PUF problems that might be interesting for information theorists, coding theorists, and signal processing researchers in addition to the PUF community.

PUF Basics
We give a brief review of the literature on PUFs and discuss the problems with previous PUF designs that can be tackled by using signal processing and coding-theoretic methods.
A PUF is a function that is embodied in a physical device and is unclonable. In the literature, there are alternative expansions of the term PUF, such as "physically unclonable function", suggesting that it is a function that is only physically unclonable. Such PUFs may provide a weaker security guarantee since they allow their functions to be digitally cloned. For any practical application of a PUF, we need the property of unclonability both physically and digitally. We therefore consider a function as a PUF only if it is a physical function, embodied in a physical device, that is unclonable both digitally and physically.
Physical identifiers such as PUFs are heuristically defined to be complex challenge-response mappings that depend on the random variations in a physical object. Secret sequences derived from this complex mapping can be used as secret keys. One important feature of PUFs is that the generated secret sequence is not required to be stored and can be regenerated on demand. This property makes PUFs cheaper (no memory is required for secret storage) and safer (the secret sequence is regenerated only on demand) alternatives to other secret generation and storage techniques, such as storing the secret in an NVM [5].
There are an immense number of PUF types, which makes it practically impossible to give a single definition of PUFs that covers all of them. We provide the following definition, which includes all PUF types of interest for this review.

Definition 1 ([5]). A PUF is a challenge-response mapping embodied by a physical device such that it is fast and easy for the physical device to put out the PUF response and hard for an attacker, who does not have access to the PUF circuits, to determine the PUF response to a randomly chosen challenge, even if the attacker has access to a set of challenge-response pairs.

The terms used in Definition 1, i.e., fast, easy, and hard, are relative terms that should be quantified for each PUF application separately. There are physical functions in the literature, called physical one-way functions (POWFs), that are closely related to PUFs. Such functions are obtained by applying the cryptographic concept of "one-way functions", i.e., functions that are easy to evaluate but (on average) difficult to invert [7], to physical systems. As the first example of POWFs, the speckle pattern obtained from coherent waves propagating through a disordered medium is a one-way function of both the physical randomness in the medium and the angle of the beam used to generate the optical waves [8].
Similar to POWFs, biometric identifiers such as the iris, retina, and fingerprints are closely related to PUFs. Most of the assumptions made for biometric identifiers are also satisfied by PUFs, so we can apply almost all of the results in the literature for biometric identifiers to PUFs. However, it is common practice to assume that PUFs can resist invasive (physical) attacks, which are considered to be the most powerful attacks used to obtain information about a secret in a system, unlike biometric identifiers that are constantly available for attacks. The reason for this assumption is that invasive attacks permanently destroy the fragile PUF outputs [5]. This assumption will be the basis for the PUF system models used throughout this review. We therefore assume that the attacker does not observe a sequence that is correlated with the PUF outputs, unlike for biometric identifiers, since physical attacks applied to obtain such a sequence permanently change the PUF outputs.

Applications of PUFs
A PUF can be seen as a source of random sequences hidden from an attacker who does not have access to the PUF outputs. Therefore, any application that takes a secret sequence as input can theoretically use PUFs. We list some scenarios where PUFs fit well practically:
• Security of information in wireless networks with an eavesdropper, i.e., a passive attacker, is a PLS problem. Consider Wyner's wiretap channel model introduced in [9], where a transmitter sends a message through a broadcast channel so that a legitimate receiver can reliably reconstruct the message, while the message should be kept secret from an eavesdropper. This model is the most common PLS model; it is a channel coding problem, unlike the secret key agreement problem we consider below, which is a source coding problem. A randomized encoder helps the transmitter keep the message secret by confusing the eavesdropper. Therefore, one can use PUFs at the transmitter as the source of local randomness when a message should be sent securely through the wiretap channel.
• Consider a 5G/6G mobile device that uses a set of SRAM outputs, which are available in mobile devices, as PUF circuits to extract secret keys so that the messages to be sent are encrypted with these secret keys before sending the data over the wireless channel.Thus, the receiver (e.g., a base station) that previously obtained the secret keys (sent by mobile devices, e.g., via public key cryptography) can decrypt the data, while an eavesdropper who only overhears the data broadcast over the wireless channel cannot easily learn the message sent.
• The controller area network (CAN) bus standard used in modern vehicles is illustrated in [10] to be susceptible to denial-of-service attacks, which shows that safety-critical inputs of the internal vehicle network such as brakes and throttle can be controlled by an attacker.One countermeasure is to encrypt the transmitted CAN frames by using block ciphers with secret keys generated from PUF outputs used as inputs.
• IoT devices such as wearable or e-health devices may carry sensitive data and use a PUF to store secret keys so that only a mobile device with access to the secret keys can control the IoT devices.
One common example of such applications is when PUFs are used to authenticate wireless body sensor network devices [11].
• Cloud storage requires security to protect users' sensitive data. However, securing the cloud is expensive, and users do not necessarily trust cloud service providers. A PUF in a universal serial bus (USB) token, trademarked as Saturnus, has been used to encrypt user data before uploading the data to the cloud; the data are later decrypted locally by reconstructing the same secret from the same PUF.
• System developers want to mutually authenticate a field programmable gate array (FPGA) chip and the intellectual property (IP) components in the chip, and IP developers want to protect the IP.In [12], a protocol is described to achieve these goals with a small hardware area that uses one symmetric cipher and one PUF.
Other applications of PUFs include providing non-repudiation (i.e., undeniable transmission or reception of data), proof of execution on a specific processor, and remote integrated circuit (IC) enabling. Every application of PUFs makes different assumptions about the PUF properties, computational complexity, and the specific system models. Therefore, there are different constraints and system parameters for each application. We focus mainly on the application where a secret key is generated from a PUF for user, or device, authentication with privacy and secrecy guarantees, and low complexity.

Main PUF Types
We review four PUF types, i.e., silicon, arbiter, RO, and SRAM PUFs. We consider mainly the last two PUF types for algorithm and code designs due to their common use in practice and because signal processing techniques can tackle the problems arising in designing these PUFs. For a review of other PUF types that are mostly considered in the hardware design and computer science literatures, and various classifications of PUFs, see, e.g., [4,13,14]. The four PUF types considered below can be shown to satisfy the assumption that invasive attacks permanently change PUF outputs, since the digital circuit outputs used as the source of randomness in these PUF types change permanently under invasive attacks due to their dependence on nano-scale alterations in the hardware.

Silicon and Arbiter PUFs
Common complementary metal-oxide-semiconductor (CMOS) manufacturing processes are used to build silicon PUFs, where the response of the PUF depends on the circuit delays, which vary across integrated circuits (ICs) [5]. Due to the high sensitivity of the circuit delays to environmental changes (e.g., ambient temperature and power supply voltage), arbiter PUFs are proposed in [15], for which an arbiter (i.e., a simple transparent data latch) is added to the silicon PUFs so that the delay comparison result is a single bit. The difference of the path delays is mapped to, e.g., the bit 0 if the first path is faster, and the bit 1 otherwise. The difference between the delays can be small, causing meta-stable outputs. Since the output of the mapper is generally preset to 0, the incoming signals must satisfy the setup time (t_setup) of the latch to switch the output to 1, resulting in a bias in the arbiter PUF outputs. Symmetrically implementable latches (e.g., set-reset latches) should be used to overcome this problem, which is difficult because FPGA routing does not allow the user to enforce symmetry in the hardware implementation. We discuss below that PUFs without symmetry requirements, e.g., RO PUFs, provide better results.

Ring Oscillator PUFs
An RO is a logic circuit of an odd number of serially connected inverters, where the output of the last inverter is fed back as input to the first inverter, as depicted in Figure 1. The first logic gate in Figure 1 is a NAND gate, giving the same logic output as an inverter gate when the ENABLE signal is 1 (ON), used to enable/disable the RO circuit. The manufacturing-dependent and uncontrollable component in an RO is the total propagation delay of an input signal flowing through the RO, which determines the oscillation frequency 1/x of the RO that is used as the source of randomness. A self-sustained oscillation is possible when the ring provides a 2π phase shift and has unit voltage gain at the oscillation frequency 1/x.
Consider an RO with m ≥ 3 inverters. Each inverter should provide a phase shift of π/m, with an additional phase shift of π due to the feedback. Therefore, the signal should flow through the RO twice to provide the necessary phase shift [16]. Suppose a propagation delay of τ_d for each inverter, so the oscillation frequency of an RO is 1/x = 1/(2mτ_d). We remark that since RO outputs are generally measured by using 32-bit counters, it is realistic to assume that a measured RO output 1/x is a realization of a continuous distribution that can be modeled by using the histogram of a family of RO outputs with the same circuit design, as assumed below.
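As a numeric sanity check, the relation between the number of inverters, the per-inverter delay, and the oscillation frequency can be evaluated directly (the inverter count and delay below are hypothetical example values, not measurements from a real RO):

```python
def ro_frequency(m, tau_d):
    """Oscillation frequency of an RO: the signal traverses the m
    inverters twice per period, so the period is x = 2 * m * tau_d."""
    return 1.0 / (2 * m * tau_d)

# Hypothetical example: 13 inverters, 100 ps propagation delay each,
# giving a period of x = 2.6 ns.
f = ro_frequency(13, 100e-12)
print(f"{f / 1e6:.1f} MHz")  # prints "384.6 MHz"
```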
The propagation delay τ_d is affected by nonlinearities in the digital circuit. Furthermore, there are deterministic noise sources, such as the cross-talk between adjacent signal traces, and additional random noise sources, such as thermal noise and flicker noise [16]. Such effects should be eliminated to obtain a reliable RO output. Rather than improving the standard RO designs, which would impose the condition that manufacturers change their RO designs, the first proposal to fix the reliability problem was to make hard bit decisions by comparing RO pairs [17], as illustrated in Figure 3.
In Figure 3, the multiplexers are challenged by a bit sequence of length at most ⌈log_2 N⌉ so that an RO pair is selected among N ROs. The counters put out the number of rising edges from each RO for a fixed time duration. A logic bit decision is made by comparing the counter values, which can be bijectively mapped to the oscillation frequencies. For instance, when the upper RO has a greater counter value, the bit 0 is generated; otherwise, the bit 1. Given that ROs are identically laid out in the hardware, the differences in the oscillation frequencies are determined mainly by uncontrollable manufacturing variations. Furthermore, it is not necessary to have a symmetric layout when hard-macro hardware designs are used for different ROs, unlike arbiter PUFs.
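A minimal sketch of this comparison-based bit extraction, assuming counter values are already available for non-overlapping RO pairs (the counter values below are made-up illustrations):

```python
import numpy as np

def ro_pair_bits(counters):
    """One bit per non-overlapping RO pair: compare the counter values
    (bijective with the oscillation frequencies); bit 0 if the first RO
    of the pair oscillates faster, bit 1 otherwise."""
    pairs = np.asarray(counters).reshape(-1, 2)
    return (pairs[:, 0] <= pairs[:, 1]).astype(int)

# Hypothetical counter values for N = 8 ROs over a fixed time window.
print(ro_pair_bits([503, 498, 511, 519, 487, 502, 530, 529]))  # [0 1 1 0]
```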
The key extraction method illustrated in Figure 3 gives an output of N(N−1)/2 bits, which are correlated due to overlapping RO comparisons. This causes a security threat and makes the RO PUF vulnerable to various attacks, including machine learning attacks. Thus, non-overlapping pairs of ROs are used in [17] to extract each bit. However, there are systematic variations in neighboring ROs due to the surrounding logic, which also should be eliminated to extract sequences with full entropy. Furthermore, ambient temperature and supply voltage variations are the most important effects that reduce the reliability of RO PUF outputs. A scheme called 1-out-of-k masking is proposed as a countermeasure to these effects, which compares the RO pairs that have the maximum oscillation frequency differences over a range of voltages and temperatures to extract bits [17]. The bits extracted by such a comparison are more reliable than the bits extracted by using previous methods. The main disadvantages of this scheme are that it is inefficient due to unused RO pairs and that only a single bit is extracted from the (semi-)continuous RO outputs. We review transform-coding based RO PUF methods below that significantly improve on these methods without changing the standard RO hardware designs.
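The 1-out-of-k selection rule can be sketched as follows; this is a simplified single-measurement stand-in for the scheme in [17], which evaluates the frequency differences over a range of voltages and temperatures:

```python
import numpy as np

def mask_select(freqs, k):
    """1-out-of-k masking sketch: among k candidate non-overlapping RO
    pairs per extracted bit, keep the pair with the largest frequency
    difference, whose comparison bit is the most robust to noise."""
    groups = np.asarray(freqs).reshape(-1, k, 2)  # k candidate pairs per bit
    diffs = np.abs(groups[..., 0] - groups[..., 1])
    chosen = diffs.argmax(axis=1)                 # most reliable pair index
    sel = groups[np.arange(len(groups)), chosen]
    return (sel[:, 0] < sel[:, 1]).astype(int), chosen

# Hypothetical frequencies (MHz) for 8 ROs grouped as 2 x 2 candidate pairs.
bits, chosen = mask_select([100, 105, 100, 101, 200, 190, 200, 199], k=2)
```

The k − 1 discarded candidate pairs per bit are exactly the unused-RO inefficiency mentioned above.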

SRAM PUFs
There are multiple memory-based PUFs, such as SRAM, flip-flop, DRAM, and butterfly PUFs. Their common feature is to possess a small number of challenge-response pairs with respect to their sizes. As the most promising memory-based PUF type that is already used in industry, we consider SRAM PUFs, which use the uncontrollable settling state of bi-stable circuits [18]. In the standard SRAM design, there are four transistors used to form the logic of two cross-coupled inverters, as depicted in Figure 2, and two other transistors to access the inverters. The power-up state, i.e., (Q, Q̄) = (1, 0) or (0, 1), of an SRAM cell provides one secret bit. Concatenating many such bits makes it possible to generate a secret key from SRAM PUFs on demand. We provide an open problem about SRAM PUFs in Section 10.
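A toy behavioural model of the SRAM power-up state (an illustrative assumption for simulation, not a circuit-level model): each cell has a fixed Gaussian inverter mismatch set at manufacturing time, and fresh noise is added at every power-up:

```python
import numpy as np

rng = np.random.default_rng(1)

def sram_powerup(mismatch, noise_std=0.05):
    """Settle each cell to Q = 1 if its fixed manufacturing mismatch
    plus fresh power-up noise is positive, and to Q = 0 otherwise."""
    noise = rng.normal(0.0, noise_std, size=mismatch.shape)
    return (mismatch + noise > 0).astype(int)

mismatch = rng.normal(0.0, 1.0, size=128)  # fixed when the cell is made
x = sram_powerup(mismatch)                 # enrollment measurement
y = sram_powerup(mismatch)                 # later, noisy re-measurement
print("fraction of flipped bits:", np.mean(x != y))
```

Cells with mismatch near zero are the ones that flip between power-ups; this is the noise that the error-correction codes discussed below must handle.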
Correlated, Biased, and Noisy PUF Outputs
PUF circuit outputs are biased (nonuniform), correlated (dependent), and noisy (erroneous). We review a transform-coding algorithm that extracts an almost i.i.d. uniform bit sequence from each PUF, so that a helper-data generation algorithm can correct the bit errors in the sequence generated from noisy PUF outputs. Using this transform-coding algorithm, we also obtain memoryless PUF measurement-channel models, so standard information-theoretic tools, which cannot be easily applied to correlated sequences, can be used.
Remark 1. The bias in the PUF circuit outputs is considered in the PUF literature to be a significant threat against the security of keys generated from PUFs, since the bias enables, e.g., machine learning attacks. However, it is illustrated in [19, Figure 6] that the output bias does not change the information-theoretic rate regions significantly, illustrating that there exist code constructions that do not require PUF outputs to be uniformly distributed.
There are multiple key-generation, i.e., generated-secret (GS), and key-binding, i.e., chosen-secret (CS), methods to reconstruct secret keys from noisy PUF outputs, where the key is generated from the PUF outputs or bound to them, respectively. A code-offset fuzzy extractor (COFE) [20] is an example of key-generation methods, and the fuzzy-commitment scheme (FCS) [21] is a key-binding method. Since a secret key should be stored in a secure database for both models, it might be practical to allow a trusted entity to choose the secret key that is bound to a PUF output. Thus, we first analyse a method that significantly improves reliability, privacy, secrecy, and hardware cost performance by using a transform-coding algorithm applied to PUF outputs in combination with the FCS. We remark that the information-theoretic analysis of the CS model follows directly from the analysis of the GS model [22], so one can use either model for comparisons.
Correlation in PUF outputs might leak information about the secret key, called secrecy leakage, and about the PUF output, called privacy leakage [22,23,24]. Moreover, noise reduces the reliability of PUF outputs, and error-correction codes are needed to satisfy the stringent reliability constraints. The transform-coding approach proposed in [25], in combination with a set of scalar quantizers, has made its way into secret-key binding with continuous-output identifiers. This approach makes it possible to significantly reduce the output correlation and to adjust the effective noise at the PUF output with reliability guarantees; see [26] for a similar decorrelation approach applied to provide differential privacy to temporally-correlated eye movement measurements.

PUF Output Model
Consider a (semi-)continuous-output physical function, such as an RO output, as a source that puts out a real-valued symbol x. Systematic variations in RO outputs in a two-dimensional array are smaller than the systematic variations in one-dimensional RO arrays, since in a two-dimensional array the maximum distance between RO hardware logic circuits is smaller, which decreases the variations in the RO outputs caused by surrounding hardware logic circuits [27]. Thus, consider a two-dimensional RO array of size l = r × c and represent the array as a vector random variable X^l. Suppose there is a single two-dimensional RO array with the same circuit design in each device, and the RO array emits an output X^l according to a probability density f_{X^l}. Each RO output is disturbed by mutually-independent additive Gaussian noise, and the vector noise is denoted as Z^l. Define the noisy RO outputs as Y^l = X^l + Z^l. Observe that X^l and Y^l are correlated. A secret key can thus be agreed by using these outputs [28,29].
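This source model is easy to simulate; the parameters below are illustrative assumptions rather than values fitted to the RO dataset discussed later:

```python
import numpy as np

rng = np.random.default_rng(0)
r, c = 8, 8                      # two-dimensional RO array, l = r * c
l = r * c

# X^l: RO outputs with a crude neighbour correlation standing in for the
# systematic variations caused by the surrounding logic.
base = rng.normal(0.0, 1.0, size=l)
X = base + 0.5 * np.roll(base, 1)

# Z^l: mutually independent additive Gaussian noise; Y^l = X^l + Z^l.
Z = rng.normal(0.0, 0.2, size=l)
Y = X + Z

rho = np.corrcoef(X, Y)[0, 1]    # X^l and Y^l are strongly correlated
print(f"corr(X, Y) = {rho:.3f}")
```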
Remark 2. PUF outputs are noisy, as discussed above in this section. However, the first PUF outputs are used by, e.g., a manufacturer to generate or embed a secret key, which is called the enrollment procedure. Since a manufacturer can measure multiple noisy outputs of the same RO to estimate the noiseless RO output, we can consider the PUF outputs measured during enrollment to be noiseless. However, during the reconstruction step, e.g., an IoT device observes a noisy RO output, which can be the case because the IoT device cannot measure the RO outputs multiple times due to delay and complexity constraints. Therefore, we consider a key-agreement model where the first measurement sequence (during enrollment) is noiseless and the second measurement sequence (during reconstruction) is noisy; see also Figure 8 below. Extensions to key agreement models with two noisy sequences, where the noise components can be correlated, are discussed in [24,30,31].
One needs to extract random sequences with i.i.d. symbols from X^l and Y^l to employ the available information-theoretic results in [32] for secret-key binding with identifiers by using the FCS. An algorithm is proposed in [25] that extracts almost i.i.d. binary and uniformly distributed random vectors X^n and Y^n from X^l and Y^l, respectively. For such X^n and Y^n, we can define a binary error vector as E^n = X^n ⊕ Y^n, where ⊕ is the modulo-2 sum. The random sequence E^n corresponds to a sequence of i.i.d. Bernoulli random variables with parameter p, i.e., E^n ∼ Bern^n(p). The channel P_{Y|X} is thus a binary symmetric channel (BSC) with crossover probability p, i.e., a BSC(p). We discuss a transform-coding method below, which further provides reliability guarantees for each generated bit.
The FCS can reconstruct a secret key by using correlated random variables without leaking any information about the secret key [21]. The FCS is depicted in Figure 4, where an encoder Enc(·) maps a secret key S ∈ S, which is uniformly distributed over the set {1, 2, . . ., |S|}, into a binary codeword C^n that is added modulo-2 to the binary PUF-output sequence X^n during enrollment. The resulting sequence, called helper data W, is sent through a public and noiseless communication link to a database. The modulo-2 sum of the helper data W and Y^n gives a result that can later be mapped to an estimate Ŝ of the secret key S by the decoder Dec(·) during reconstruction.
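A minimal end-to-end sketch of the FCS, using a 5-fold repetition code as a stand-in for the stronger error-correction codes discussed later in this review (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
REP = 5  # rate-1/5 repetition code; any linear ECC could be used instead

def enc(secret_bits):
    """Map the secret key S to a binary codeword C^n (5-fold repetition)."""
    return np.repeat(np.asarray(secret_bits), REP)

def dec(noisy_codeword):
    """Majority-vote decoding of the repetition code."""
    return (noisy_codeword.reshape(-1, REP).sum(axis=1) > REP // 2).astype(int)

# Enrollment: helper data W = C^n xor X^n is stored publicly.
s = rng.integers(0, 2, 16)                   # secret key S
x = rng.integers(0, 2, 16 * REP)             # binary PUF output X^n
w = enc(s) ^ x                               # helper data W

# Reconstruction: W xor Y^n = C^n xor E^n, then decode to obtain S-hat.
e = (rng.random(x.size) < 0.1).astype(int)   # BSC(0.1) error vector E^n
s_hat = dec(w ^ (x ^ e))
print("key recovered:", np.array_equal(s, s_hat))
```

Note that W reveals nothing about S when X^n is uniform, but its n bits leak information about X^n; this privacy leakage is quantified next.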
We next give information-theoretic rate regions for the FCS; see [33] for information-theoretic notation and basics.

Definition 2. A secret-key vs. privacy-leakage rate pair (R_s, R_ℓ) is achievable by the FCS with perfect secrecy, i.e., zero secrecy leakage, if, given any δ > 0, there is some n ≥ 1, and an encoder and decoder for which R_s = (log |S|)/n and

Pr[Ŝ ≠ S] ≤ δ,   (2)
I(S; W) = 0,   (3)
(1/n) I(X^n; W) ≤ R_ℓ + δ,   (4)

where (3) suggests that S and W are independent and (4) suggests that the rate of dependency between X^n and W is bounded. The achievable secret-key vs. privacy-leakage rate, or key-leakage, region R_FCS for the FCS is the union of all achievable pairs.

Theorem 1 ([32]). The key-leakage region R_FCS for the FCS with a channel P_{Y|X} that is a BSC(p), uniformly distributed X and Y, and zero secrecy leakage is

R_FCS = {(R_s, R_ℓ) : 0 ≤ R_s ≤ 1 − H_b(p), R_ℓ ≥ 1 − R_s},

where H_b(p) = −p log_2 p − (1 − p) log_2(1 − p) is the binary entropy function.
The region R_FCS suggests that any (secret-key, privacy-leakage) rate pair that sums to 1 bit/source-bit is achievable, with the constraint that the secret-key rate is at most the channel capacity of the BSC(p), i.e., max R_s = 1 − H_b(p). Furthermore, smaller secret-key rates and greater privacy-leakage rates than these rates are also achievable.
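These boundary rates are straightforward to evaluate; p = 0.11 below is an illustrative crossover probability, chosen because it makes the BSC capacity almost exactly 1/2 bit:

```python
import numpy as np

def h_b(p):
    """Binary entropy function H_b(p) in bits per source-bit."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = 0.11
max_key_rate = 1 - h_b(p)        # BSC(p) capacity: largest achievable R_s
min_leakage = 1 - max_key_rate   # leakage on the sum-rate-one boundary
print(f"R_s <= {max_key_rate:.3f}, R_l >= {min_leakage:.3f} bits/source-bit")
```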
The FCS is a particular realization of the CS model. The region R of all achievable (secret-key, privacy-leakage) rate pairs for the CS model with a negligible secrecy-leakage rate, where a generic encoder is used to confidentially transmit an embedded secret key to a decoder that observes Y^n and the helper data W, is given in [22] as

R = ∪_{P_{U|X}} {(R_s, R_ℓ) : 0 ≤ R_s ≤ I(U; Y), R_ℓ ≥ I(U; X) − I(U; Y)},

where the union is over all P_{U|X} such that U − X − Y form a Markov chain. The FCS achieves a boundary point of R only at (R_s, R_ℓ) = (I(X; Y), H(X|Y)) [32]. This point corresponds to the highest achievable secret-key rate; see Figure 7 below. We remark that the region R gives an outer bound for the perfect-secrecy case considered to obtain R_FCS.

Transform Coding Steps
The main aim of transform coding is to reduce correlations between outputs of the ROs in the same two-dimensional array by using, e.g., a linear transformation. We discuss a transform-coding algorithm proposed in [34] as an extension of [25] that provides reliability guarantees for each generated bit. Its main steps are a joint optimization of the quantizer and error-correction code parameters to maximize the security and reliability performance, and a simple method to decrease hardware storage. The output of these post-processing steps is a bit sequence X^n (or its noisy version Y^n) used in the FCS. One applies the same post-processing steps for the enrollment and reconstruction. The difference is that during enrollment the design parameters are chosen as a function of the PUF-circuit output statistics by the device manufacturer.
It thus suffices to discuss only the enrollment steps. Figure 5 shows the post-processing steps that include transformation, histogram equalization, quantization, bit allocation, and bit-sequence concatenation. RO outputs X^l are correlated due to, e.g., the surrounding logic in the hardware. A transform T_{r×c}(·) of size r × c is applied to an array of RO outputs to reduce correlations. The decorrelation performance of a transform depends on the source statistics. We model each real-valued output T in the transform domain, called a transform coefficient, obtained from an RO-output dataset in [35] by using the corrected Akaike's information criterion (AICc) [36] and the Bayesian information criterion (BIC) [37]. These criteria suggest that a Gaussian distribution can be fitted to each transform coefficient T for the DCT, discrete Walsh-Hadamard transform (DWHT), discrete Haar transform (DHT), and Karhunen-Loève transform (KLT), which are common orthogonal transforms considered in the literature for image processing, digital watermarking, etc. Use maximum-likelihood estimation to derive unbiased estimates for the parameters of the Gaussian distributions.
The histogram equalization step in Figure 5 converts the probability density of the i-th coefficient T_i into a standard normal distribution such that T̃_i = (T_i − µ_i)/σ_i, where µ_i is the mean and σ_i is the standard deviation of the i-th transform coefficient for all i = 1, 2, . . ., l. Quantization steps for all transform coefficients are thus the same. Without histogram equalization, we would need a different quantizer for each transform coefficient. Therefore, the histogram equalization step reduces the storage for the quantization steps. Transformed and equalized coefficients T̃_i are independent if the transform T_{r×c}(·) decorrelates the RO outputs perfectly and the transform coefficients T_i are jointly Gaussian. One can thus use a scalar quantizer for all coefficients without a performance loss for this case. Scalar quantizers and bit extraction methods are given below to satisfy the security and reliability requirements of the FCS under the independence assumption, which can be combined with a correlation-thresholding approach in practice.
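The transformation and equalization steps can be sketched as follows; the RO readings are synthetic stand-ins for the measured dataset, and the 2D DWHT is built from a Kronecker product for clarity rather than speed:

```python
import numpy as np

# Sketch of the transformation and histogram-equalization steps for a
# hypothetical 8x8 array of RO frequency readings.
rng = np.random.default_rng(0)

def dwht_2d(a):
    """2D discrete Walsh-Hadamard transform via a Kronecker-built matrix."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]])
    H = h
    while H.shape[0] < a.shape[0]:
        H = np.kron(H, h)
    H /= np.sqrt(H.shape[0])  # orthonormal scaling
    return H @ a @ H.T

ro_outputs = 100.0 + rng.normal(size=(8, 8))  # synthetic RO readings
T = dwht_2d(ro_outputs)

# Histogram equalization: standardize each coefficient with the mean and
# standard deviation estimated over many devices (synthetic stand-ins).
many = np.stack([dwht_2d(100.0 + rng.normal(size=(8, 8)))
                 for _ in range(500)])
mu, sigma = many.mean(axis=0), many.std(axis=0)
T_eq = (T - mu) / sigma  # approximately standard normal per coefficient
```

After this step a single scalar quantizer can serve every coefficient, which is the storage saving mentioned above.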

Joint Quantizer and Error-Correction Code Design
The aim of the post-processing steps in Figure 5 is to extract a uniformly-random bit sequence X^n. Use a quantizer Δ(·) with quantization-interval indices k = 1, 2, . . ., 2^{K_i}, where K_i is the number of bits we extract from the i-th coefficient T̃_i for i = 1, 2, . . ., l. Let

b_k = Φ^{−1}(k / 2^{K_i}),   k = 0, 1, . . ., 2^{K_i}   (7)

where Φ^{−1}(·) is the quantile function of the standard normal distribution.
The output k is assigned to a bit sequence of length K_i. The most likely error event when we quantize T̃_i is a jump to a neighboring quantization interval due to the zero-mean noise model assumed. Thus, apply a Gray mapping when assigning bit sequences of length K_i to the integers k = 1, 2, . . ., 2^{K_i}, so that neighboring bit sequences differ in only one bit.
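A minimal sketch of this equiprobable quantizer with Gray mapping, assuming a standard-normal equalized coefficient (interval indices run from 0 here for programming convenience):

```python
from statistics import NormalDist

# K-bit quantizer with equiprobable intervals b_k = Phi^{-1}(k / 2^K)
# and Gray-coded interval labels.
def boundaries(K):
    inv = NormalDist().inv_cdf
    return [inv(k / 2 ** K) for k in range(1, 2 ** K)]  # finite boundaries

def gray(k):
    return k ^ (k >> 1)

def quantize(t, K):
    """Return the Gray-coded bit sequence for an equalized coefficient t."""
    b = boundaries(K)
    k = sum(t > x for x in b)  # interval index 0 .. 2^K - 1
    return format(gray(k), f"0{K}b")

# Neighboring intervals differ in exactly one bit under Gray mapping.
codes = [quantize(t, 3) for t in (-2.0, -1.0, -0.4, -0.1, 0.1, 0.4, 1.0, 2.0)]
```

A small noise-induced jump to a neighboring interval therefore flips only a single extracted bit.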
We next discuss a reliability metric proposed for a joint quantizer and code design by fixing the maximum number of erroneous transform coefficients and considering an error-correction code that can correct all error patterns with up to a fixed number of errors.

Quantizer Design with Fixed Maximum Number of Errors
We discuss a conservative approach that assumes that either all bits extracted from a transform coefficient are correct or they all flip. The correctness probability P_c of a transform coefficient is defined to be the probability that all bits associated with this coefficient are correct. This metric is used to determine the number of bits extracted from each coefficient such that there is a channel encoder and a bounded minimum distance decoder (BMDD) that satisfy the block-error probability constraint P_B ≤ 10^{−9}, which is a common block-error probability considered for PUFs that consist of CMOS circuits [17]. This approach results in reliability guarantees for the random-output RO arrays.
Let Q(·) be the Q-function, σ_n² the noise variance, and f_T the probability density of the standard Gaussian distribution. For a K-bit quantizer and the quantization boundaries b_k as in (7) for an equalized Gaussian transform coefficient T, the correctness probability is

P_c(K) = Σ_{k=1}^{2^K} ∫_{b_{k−1}}^{b_k} f_T(t) [ Q((b_{k−1} − t)/σ_n) − Q((b_k − t)/σ_n) ] dt   (8)

which calculates the probability that the additive noise does not change the quantization interval assigned to the transform coefficient, i.e., all bits associated with the transform coefficient stay the same after adding noise.

Suppose the channel decoder can correct all errors in up to C_max transform coefficients. Suppose further that coefficient errors occur independently, i.e., noise terms on different transform coefficients are mutually independent, and that the correctness probability P_{c,i}(K) of the i-th coefficient T_i for i = 1, 2, . . ., l is at least P_c(C_max). A sufficient condition for satisfying the block-error probability constraint P_B ≤ 10^{−9} is that P_c(C_max) satisfies the inequality

Σ_{j=C_max+1}^{l−1} binom(l−1, j) (1 − P_c(C_max))^j P_c(C_max)^{l−1−j} ≤ 10^{−9}.   (9)

Determine the number K_i of bits extracted from the i-th transform coefficient as the maximum value K such that P_{c,i}(K) ≥ P_c(C_max). The first coefficient, i.e., the DC coefficient T_1, is not used since its value is a scaled version of the mean of the RO outputs in the same array, which is generally known by an attacker. Ambient-temperature and supply-voltage variations have a highly-linear effect on the RO outputs, so the DC coefficient is the most affected coefficient, which is another reason not to use it [38]. Therefore, choose K_1 = 0 so that the total number n(C_max) of extracted bits is

n(C_max) = Σ_{i=2}^{l} K_i.   (10)

In the worst case, the coefficients in error are the coefficients from which the largest numbers of bits are extracted. Sort the numbers K_i of bits extracted from all coefficients in descending order such that K′_1 ≥ K′_2 ≥ . . . ≥ K′_{l−1}. The channel decoder thus must be able to correct up to

e(C_max) = Σ_{i=1}^{C_max} K′_i   (11)

bit errors, which can be satisfied by using a block code with minimum distance d_min ≥ 2e(C_max) + 1 since a BMDD can correct all
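The correctness probability in (8) can be evaluated numerically; a minimal sketch assuming a standard-normal equalized coefficient, zero-mean Gaussian noise with an illustrative standard deviation σ_n, and equiprobable intervals:

```python
from statistics import NormalDist

def p_correct(K, sigma_n, steps=2000):
    """Numerically evaluate the correctness probability P_c(K) in (8):
    the probability that additive zero-mean Gaussian noise keeps an
    equalized coefficient inside its assigned quantization interval."""
    nd = NormalDist()
    # Finite integration limits +-8 truncate negligible Gaussian mass.
    b = [-8.0] + [nd.inv_cdf(k / 2 ** K) for k in range(1, 2 ** K)] + [8.0]
    total = 0.0
    for k in range(2 ** K):
        lo, hi = b[k], b[k + 1]
        dt = (hi - lo) / steps
        for j in range(steps):
            t = lo + (j + 0.5) * dt  # midpoint rule
            stay = nd.cdf((hi - t) / sigma_n) - nd.cdf((lo - t) / sigma_n)
            total += nd.pdf(t) * stay * dt
    return total

# More extracted bits -> narrower intervals -> lower correctness probability.
p1 = p_correct(1, sigma_n=0.1)
p2 = p_correct(2, sigma_n=0.1)
```

Sweeping K for each coefficient's noise level reproduces the bit-allocation rule K_i = max{K : P_{c,i}(K) ≥ P_c(C_max)}.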
error patterns with up to e = ⌊(d_min − 1)/2⌋ errors [39]. Suppose a key bound to physical identifiers in a device is used in the advanced encryption standard (AES) with a uniformly-distributed secret key of length 128 bits. The block code used in the FCS should thus have a code length of at most n(C_max) bits, a code dimension of at least 128 bits, and a minimum distance d_min ≥ 2e(C_max) + 1 for a fixed C_max. The code rate should be as high as possible to operate close to the optimal (secret-key, privacy-leakage) rate point of the FCS. This optimization problem is hard to solve. One can illustrate by an exhaustive search over a set of C_max values and over a selection of algebraic codes that there is a channel code that satisfies these constraints with a reliability guarantee for each extracted bit. Restricting the search to codes that admit low-complexity encoders and decoders is desirable for, e.g., IoT applications, for which complexity is the bottleneck.
The conditions listed above are conservative. First, for a given transform coefficient, the correctness probability can be significantly greater than the correctness threshold P_c(C_max). Secondly, due to the Gray mapping, it is more likely that fewer than K_i bits are in error when the i-th coefficient is erroneous. It is also unlikely that the bit errors always occur in the transform coefficients from which the largest numbers of bits are extracted. Thus, even if a channel code cannot correct all error patterns with up to e(C_max) errors, it can still be the case that the block-error probability constraint is satisfied. We next illustrate such a case.

PUF Performance Evaluations
Suppose the RO outputs X^l are represented as a vector random variable with an autocovariance matrix C_XX. Consider RO arrays of sizes 8 × 8 and 16 × 16. The autocovariance matrix of such RO-array outputs and the noise parameters can be estimated from the RO output dataset in [35]. Using the dataset, we compare the performance of the DCT, DWHT, DHT, and KLT in terms of their decorrelation efficiency, complexity, uniqueness, and security.

Decorrelation Performance
One should eliminate correlations between the RO outputs and make them independent to extract uniform bit sequences by quantizing each transform coefficient separately. Use the decorrelation efficiency η_c [40] as a decorrelation performance metric. Consider the autocovariance matrix C_TT of the transform coefficients, so the η_c of a transform is

η_c = 1 − ( Σ_{a,b} (1 − 1{a = b}) |C_TT(a, b)| ) / ( Σ_{a,b} (1 − 1{a = b}) |C_XX(a, b)| )

where the indicator function 1{a = b} takes on the value 1 if a = b and 0 otherwise. The decorrelation efficiency of the KLT is 1, which is optimal [40]. We list the average decorrelation efficiency results of the other transforms in Table 1. All transforms have similar and good decorrelation efficiency performance for the RO outputs in the dataset [35]. The DCT and DHT have the highest efficiency for 8 × 8 RO arrays, whereas for 16 × 16 RO arrays the best transform is the DWHT. Table 1 indicates that increasing the array size improves η_c.
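The metric compares the off-diagonal autocovariance mass before and after the transform; a minimal sketch with a toy AR(1)-style covariance standing in for the measured RO statistics:

```python
import numpy as np

# Sketch of the decorrelation efficiency eta_c: the fractional reduction
# of off-diagonal autocovariance mass achieved by a linear transform.
def offdiag_mass(C):
    return np.abs(C).sum() - np.abs(np.diag(C)).sum()

def decorrelation_efficiency(C_xx, A):
    """eta_c for a transform with matrix A applied to the source."""
    C_tt = A @ C_xx @ A.T  # covariance of the transform coefficients
    return 1.0 - offdiag_mass(C_tt) / offdiag_mass(C_xx)

# Toy AR(1)-style covariance as a stand-in for measured RO statistics.
n = 8
C_xx = np.array([[0.9 ** abs(i - j) for j in range(n)] for i in range(n)])

# KLT: the eigenvectors of C_xx diagonalize it exactly, so eta_c = 1.
_, V = np.linalg.eigh(C_xx)
eta_klt = decorrelation_efficiency(C_xx, V.T)
```

The same function applied to a DCT, DWHT, or DHT matrix yields the Table 1 style comparison for a given source covariance.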

Transform Complexity
We measure the complexity of a transform in terms of the number of operations required to compute it. We are interested in a computational-complexity comparison for RO arrays of sizes r = c = 8 and r = c = 16, which are powers of 2, so that fast algorithms are available for the DCT, DWHT, and DHT. The computational complexity of the KLT for r = c = n is O(n³), while it is O(n² log₂ n) for the DCT and DWHT, and O(n²) for the DHT [41]. There are efficient implementations of the DWHT without multiplications [34], which can be applied also to the transforms proposed in [42]. The DWHT is thus a good candidate for RO PUF designs for IoT applications. For instance, a hardware implementation of the two-dimensional (2D) DWHT in a Xilinx ZC706 evaluation board with a Zynq-7000 XC7Z045 system-on-chip is illustrated in [34] to require approximately 11% smaller hardware area and 64% less processing time than the benchmark RO PUF hardware implementation in [43].

Uniqueness and Security
The bit sequence extracted from a physical identifier should be uniformly distributed to make the rate region R_FCS in (5) achievable. Uniqueness is commonly measured by the average fractional Hamming distance between the bit sequences extracted from different devices, which ideally is 0.5. The considered transforms provide optimal uniqueness results due to their high decorrelation efficiencies and equipartitioned quantization intervals, which are better than previous RO PUF results with mean values of 0.462 [17] and 0.473 [35].
The National Institute of Standards and Technology (NIST) provides a set of randomness tests that check whether a bit sequence can be differentiated from a uniformly random bit sequence [44]. Apply these tests to measure the randomness of the generated sequences. The bit sequences generated from ROs in the dataset [35] with the DWHT pass most of the applicable tests for short lengths, which is considered to be an acceptable result [44]. The KLT performs best due to its optimal decorrelation performance. One can apply a thresholding approach such that the reliable transform coefficients from which the bits are extracted do not have high correlations, which further improves the security performance.

Error-Correction Codes for PUFs with Transform Coding
Suppose that bit sequences extracted by using the transform-coding method are i.i.d. and uniformly distributed so that the secrecy leakage is zero. We assume that the signal processing steps mentioned above perform well, so we can conduct standard information- and coding-theoretic analyses. We list different codes designed for the transform-coding algorithm according to the reliability metric considered above.
Select a channel code for the quantizer designed above for a fixed maximum number of errors to store a secret key of length 128 bits. The correctness probabilities defined in (8) for the transform coefficients with the three highest and three smallest probabilities are plotted in Figure 6. The indices of the 16 × 16 transform coefficients follow the order in the dataset [35], where the coefficient index at the first row and first column is 1, and it increases columnwise up to 16 so that the second row starts with the index 17, the third row with the index 33, etc. The most reliable transform coefficients are the low-frequency coefficients, which are in our case at the upper-left corner of the 2D transform-coefficient array, with indices such as 1, 2, 3, 17, 18, 19, 33, 34, and 35. The low-frequency transform coefficients therefore have the highest signal-to-noise ratios (SNRs) for the source and noise statistics obtained from the RO dataset [35]. The least reliable coefficients are observed to be spatially away from the transform coefficients at the upper-left or lower-right corners of the 2D transform-coefficient array. These results indicate that the SNR-packing efficiency, which can be defined similarly to the energy-packing efficiency, of a transform follows a more complicated scan order than the classic zig-zag scan order used for the energy-packing efficiency.
Observe from Figure 6 that increasing the number of extracted bits decreases the correctness probability for all coefficients since the quantization boundaries get closer, so that errors due to noise become more likely, i.e., the probability P_c(K) defined in (8) decreases with increasing K.
Fix the maximum number C_max of transform coefficients allowed to be in error and calculate the correctness threshold P_c(C_max) using (9), the total number n(C_max) of extracted bits using (10), and the number e(C_max) of errors the block code should be able to correct using (11). Observe that if C_max ≤ 10, P_c(C_max) is so large that P_{c,i}(K = 1) ≤ P_c(C_max) for all i = 2, . . ., l. If 11 ≤ C_max ≤ 15, n(C_max) is less than the required code dimension of 128 bits. Furthermore, increasing C_max results in a smaller correctness threshold P_c(C_max), so that the maximum number K_max(C_max) = K′_1(C_max) of bits extracted among the l − 1 used coefficients increases. This result can increase hardware complexity. Therefore, consider only the cases where C_max ≤ 20. Table 2 shows P_c(C_max), n(C_max), and e(C_max) for a range of C_max values used for channel-code selection.
Consider binary (extended) Bose-Chaudhuri-Hocquenghem (BCH) and Reed-Solomon (RS) codes, which have good minimum-distance d_min properties. An exhaustive search does not provide a code with a dimension of at least 128 bits and with parameters satisfying any of the (n(C_max), e(C_max)) pairs in Table 2. However, the correctness-threshold analysis leading to Table 2 is conservative. Thus, choose a BCH code with parameters as close as possible to an (n(C_max), e(C_max)) pair, for which one can prove that even if the number e_BCH of errors the chosen BCH code can correct is less than e(C_max), the block-error probability constraint is satisfied. Consider the binary BCH code with block length 255, code dimension 131, and the capability of correcting all error patterns with up to e_BCH = 18 errors.
First, impose the condition that exactly one bit is extracted from each coefficient, i.e., K_i = 1 for all i = 2, 3, . . ., l, so that in total n = l − 1 = 255 bits are obtained, which results in mutually independent bit errors E_i. Therefore, the chosen block code should be able to correct all error patterns with up to e = 20 bit errors rather than e(20) = 25 bit errors, which is still greater than the error-correction capability e_BCH = 18 of the considered BCH code.
The block-error probability P_B for the BCH code C(255, 131, 37) with a BMDD corresponds to the probability of having more than 18 errors in the codeword, i.e., we obtain

P_B = Σ_{j=19}^{255} Σ_{A∈F_j} Π_{i∈A} (1 − P_{c,i}) Π_{i∈A^c} P_{c,i}

where P_{c,i} is the correctness probability of the i-th transform coefficient T_i defined in (8) for i = 2, 3, . . ., 256, F_j is the set of all size-j subsets of the set {2, 3, . . ., 256}, and A^c denotes the complement of the set A. The correctness probabilities P_{c,i} are different and they represent probabilities of independent events due to the independence assumption for the transform coefficients. We use the discrete Fourier transform characteristic function method [45] to calculate the block-error probability and obtain the result P_B ≈ 1.26 × 10^{−11} < 10^{−9}. The block-error probability constraint is thus satisfied by using the BCH code C(255, 131, 37) with a BMDD although the conservative analysis suggests that it would not be satisfied.
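The tail probability of such a sum of independent, non-identical coefficient errors (a Poisson-binomial distribution) can be computed exactly with the characteristic-function method; the per-coefficient error probabilities below are illustrative stand-ins, not the measured dataset values:

```python
import cmath
import math

def poisson_binomial_pmf(probs):
    """Exact PMF of the number of coefficient errors (a sum of independent
    Bernoulli(p_i) variables) via the DFT characteristic-function method."""
    n = len(probs)
    # Characteristic function evaluated on the (n+1)-th roots of unity.
    chi = []
    for k in range(n + 1):
        root = cmath.exp(2j * math.pi * k / (n + 1))
        v = 1 + 0j
        for p in probs:
            v *= (1 - p) + p * root
        chi.append(v)
    # Inverse DFT recovers the PMF of the error count.
    return [
        sum(chi[k] * cmath.exp(-2j * math.pi * k * m / (n + 1))
            for k in range(n + 1)).real / (n + 1)
        for m in range(n + 1)
    ]

# Illustrative per-coefficient error probabilities 1 - P_ci.
probs = [0.01 + 0.0001 * i for i in range(255)]
pmf = poisson_binomial_pmf(probs)
p_block_error = sum(pmf[19:])  # more than e_BCH = 18 coefficient errors
```

Replacing the illustrative probabilities with the measured 1 − P_{c,i} values reproduces the kind of computation that yields P_B ≈ 1.26 × 10^{−11} in the text.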
We next compare the BCH code C(255, 131, 37) with previous codes proposed for binding keys to physical identifiers with the FCS and a secret-key length of 128 bits such that P_B ≤ 10^{−9} is satisfied. The (secret-key, privacy-leakage) rate pair for this code is (R_s, R_ℓ) = (131/255, 1 − 131/255) ≈ (0.514, 0.486) bits/source-bit. This pair is significantly better than previous results. The main reason for obtaining a better (secret-key, privacy-leakage) rate pair is that the quantizer defined above allows us to obtain a higher identifier-output reliability by decreasing the number of bits extracted from a transform coefficient.
Compare the secret-key rate R_s and privacy-leakage rate R_ℓ of the BCH code C(255, 131, 37) with the region of all achievable rate pairs for the CS model and the FCS for a BSC P_{Y|X} with crossover probability p_b = 1 − (1/(l−1)) Σ_{i=2}^{l} P_{c,i}(K_i = 1) ≈ 0.0097, i.e., the probability of being in error averaged over all used transform coefficients with the quantizer defined above. Compute the boundary points of the region R by finding the optimal auxiliary random variable U in (6) when P_{Y|X} is a BSC. The regions of all rate pairs achievable by the FCS and CS model, the maximum secret-key rate point, the (secret-key, privacy-leakage) rate pair of the BCH code, and a finite-length (non-asymptotic) bound [46] for the block length of n = 255 bits and P_B = 10^{−9} are plotted in Figure 7.
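For a BSC, the boundary of R in (6) can be traced by sweeping a BSC(α) test channel from U to X, which is optimal for the BSC by Mrs. Gerber's Lemma; a minimal numerical sketch:

```python
import math

def hb(x):
    """Binary entropy function in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def conv(a, p):
    """Binary convolution a * p of two cascaded BSC crossover probabilities."""
    return a * (1 - p) + p * (1 - a)

p = 0.0097  # average crossover probability from the text
# Each alpha gives a boundary point (R_s, R_l) = (I(U;Y), I(U;X) - I(U;Y)).
boundary = [(1 - hb(conv(a, p)), hb(conv(a, p)) - hb(a))
            for a in [i / 1000 for i in range(500)]]

# alpha = 0 (U = X) recovers the maximum secret-key rate point.
rs_max, rl_min = boundary[0]
```

At α = 0 this yields (R*_s, R*_ℓ) ≈ (1 − H_b(p), H_b(p)), matching the maximum secret-key rate point discussed next.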
The maximum secret-key rate is R*_s ≈ 0.922 bits/source-bit with a corresponding minimum privacy-leakage rate of R*_ℓ ≈ 0.079 bits/source-bit. There is a gap between the rate tuple achieved by the BCH code and the only operating point where the FCS is optimal, i.e., (R*_ℓ, R*_s). Part of this rate loss can be explained by the short block length of the code and the small block-error probability constraint. The finite-length bound given in [46, Theorem 52] establishes that the rate pair (R_s, R_ℓ) = (0.691, 0.309) bits/source-bit is achievable by using the FCS, as depicted in Figure 7. One can therefore further improve the rate pairs by using better channel codes and decoders with higher hardware complexity, but this may not be possible for IoT applications. Figure 7 also illustrates that there exist other code constructions (other than standard channel codes) that reduce the privacy-leakage rate as well as the storage rate for each fixed secret-key rate, which we consider below.

Code Constructions for PUFs
Consider the two-terminal key agreement problem, where the identifier outputs during enrollment are noiseless. We mention two optimal linear code constructions from [47] that are based on Wyner-Ziv (WZ) coding [48]. The first construction uses random linear codes and achieves all points of the key-leakage-storage regions of the GS and CS models. The second construction uses nested polar codes for vector quantization during enrollment and for error correction during reconstruction. Simulations show that nested polar codes achieve privacy-leakage and storage rates that improve on existing code designs, and one designed code achieves a rate tuple that cannot be achieved by existing methods. Several practical code constructions for key agreement with identifiers have been proposed in the literature. For instance, the COFE and the FCS both require a standard error-correction code to satisfy the constraints of, respectively, the key generation (GS model) and key embedding (CS model) problems, as discussed above. Similarly, a polar code construction is proposed in [49] for the GS model. These constructions are shown to be suboptimal in terms of the privacy-leakage and storage rates.
The binary Golay code is used in [22] as a vector quantizer (VQ) in combination with Slepian-Wolf (SW) codes [50] to illustrate that the key vs. storage (or key vs. leakage) rate ratio can be increased via quantization. This observation motivates the use of a VQ to improve the performance of previous constructions. We next consider VQ by using WZ coding to decrease the storage rates. The WZ-coding construction turns out to be optimal, which is not coincidental. For instance, the bounds on the storage rate of the GS model and on the WZ rate (storage rate) have the same mutual information terms optimized over the same conditional probability distribution. This similarity suggests an equivalence that is closely related to the concept of formula duality. In fact, the optimal random code construction, encoding, and decoding operations are identical for both problems. One can therefore call the GS model and the WZ problem functionally equivalent. Such a strong connection suggests that there might exist constructive methods that are optimal for both problems for all channels, which is closely related to the operational duality concept.
Consider the GS model in Figure 8(a), where a secret key is generated from a biometric or physical source. During enrollment, the encoder observes an i.i.d. noiseless sequence X^n, generated by the source according to some P_X, and computes a secret key S and public helper data W as (S, W) = Enc(X^n). During reconstruction, the decoder observes a noisy measurement Y^n of the source X^n through a memoryless channel P_{Y|X} together with the helper data W. The decoder estimates the secret key as Ŝ = Dec(Y^n, W). Similarly, Figure 8(b) shows the CS model, where a secret key S that is independent of (X^n, Y^n) is embedded into the helper data as W = Enc(X^n, S). The decoder for the CS model estimates the secret key as Ŝ = Dec(Y^n, W). The source, measurement, secret-key, and storage alphabets X, Y, S, and W are finite sets, which can be achieved if, e.g., the transform-coding algorithm discussed above is applied.

Definition 3.
A key-leakage-storage tuple (R_s, R_ℓ, R_w) is achievable for the GS and CS models if, given any δ > 0, there is some n ≥ 1, an encoder, and a decoder such that R_s = log|S|/n and

Pr[Ŝ ≠ S] ≤ δ (reliability) (14)
(1/n) I(S; W) ≤ δ (weak secrecy) (15)
(1/n) I(X^n; W) ≤ R_ℓ + δ (privacy) (16)
(1/n) H(S) ≥ R_s − δ (key uniformity) (17)
log |W| ≤ n(R_w + δ) (storage) (18)

are satisfied. The key-leakage-storage regions R_gs and R_cs for the GS and CS models, respectively, are the closures of the sets of achievable tuples for the corresponding models.

Theorem 2 ([22]). The key-leakage-storage regions for the GS and CS models, respectively, are

R_gs = ∪ {(R_s, R_ℓ, R_w) : 0 ≤ R_s ≤ I(U; Y), R_ℓ ≥ I(U; X) − I(U; Y), R_w ≥ I(U; X) − I(U; Y)}

and

R_cs = ∪ {(R_s, R_ℓ, R_w) : 0 ≤ R_s ≤ I(U; Y), R_ℓ ≥ I(U; X) − I(U; Y), R_w ≥ R_s + I(U; X) − I(U; Y)}

where the unions are over all P_{U|X} such that U − X − Y form a Markov chain. These regions are convex sets. The alphabet U of the auxiliary random variable U can be limited to have size |U| ≤ |X| + 1 for both regions.
Remark 3. One can improve the weak secrecy to strong secrecy, i.e., we can replace (15) with I(S; W ) ≤ δ by applying information reconciliation and privacy amplification steps to multiple blocks of identifier outputs as described in [51], e.g., by using multiple PUFs in a device for key agreement.

Optimal Code Constructions with Polar Codes
Polar codes [52] have low encoding/decoding complexity, asymptotic optimality for various information-theoretic problems, and good finite-length performance if a list decoder is used in combination with an outer code. Furthermore, they have a structure that allows a simple nested code design, and they can be used for WZ coding [53]. Polar codes rely on the channel polarization phenomenon, where a channel is converted into polarized bit channels by a polar transform. This transform converts an input sequence U^n with frozen and unfrozen bits into a codeword of the same length n. A polar decoder processes a noisy observation of the codeword together with the frozen bits to estimate U^n.

Let C(n, F, G_{|F|}) denote a polar code of block length n, where F is the set of indices of the frozen bits and G_{|F|} is the sequence of frozen bits. In the following, we use the nested polar code construction proposed in [53] for WZ coding.

Polar Code Construction for the GS Model
Consider two polar codes C_1(n, F_1, V) and C(n, F, [V, W]) with F = F_1 ∪ F_w, where V has length m_1 and W has length m_2 such that m_1 and m_2 satisfy

m_1 ≤ n(H_b(q) − δ),    m_1 + m_2 ≥ n(H_b(q ∗ p_A) + δ)

for some distortion q ∈ [0, 0.5] and δ > 0, where q ∗ p_A = q(1 − p_A) + p_A(1 − q). The indices in F_1 represent frozen channels with assigned values V for both codes, and C has additional frozen channels with assigned values W denoted by the set of indices F_w, so the codes are nested.
The code C_1 serves as a VQ with a desired distortion q since its rate is greater than the lossy source coding capacity with average distortion q. The code C serves as the error-correction code for a BSC(q ∗ p_A) since its rate is less than the channel capacity of this channel. The idea is to obtain W during enrollment and store it as public helper data. For reconstruction, (W, V, Y^n) are used by the decoder to estimate the secret key S of length n − m_1 − m_2. Figure 9 shows the block diagram of this construction. Suppose V is the all-zero vector so that no additional storage is necessary. This choice has no effect on the average distortion E[q] between X^n and X_q^n defined below; see [53, Lemma 10].

Enrollment: The uniform binary sequence X^n generated by a PUF during enrollment is treated as the noisy observation of a BSC(q). X^n is quantized by a polar decoder of C_1. Extract from the decoder output U^n the bits at the indices F_w and store them as the helper data W. The bits at the indices j ∈ {1, 2, . . ., n} \ F are used as the secret key. Note that applying the polar transform to U^n generates X_q^n, which is a distorted version of X^n. The distortion between X^n and X_q^n is modeled as a BSC(q) because the error sequence E_q^n = X^n ⊕ X_q^n resembles an i.i.d. sequence ∼ Bern^n(q) when n → ∞ [53, Lemma 11].

Reconstruction: During reconstruction, the polar decoder of C observes the binary sequence Y^n, which is a noisy measurement of X^n through a BSC(p_A). The frozen bits [V, W] at the indices F are given to the polar decoder. The output Û^n of the polar decoder is the estimate of U^n and contains the estimate Ŝ of the secret key at the unfrozen indices of C, i.e., j ∈ {1, 2, . . ., n} \ F.
We next summarize a method to design practical nested polar codes for the GS model.

Construction of C and C_1: Since C ⊆ C_1 are nested codes, they must be constructed jointly. F and F_1 should be chosen such that the reliability and security constraints are satisfied. For a given secret-key size:

1. Design the channel code C such that it satisfies the block-error probability constraint P_B for a BSC with a crossover probability p_c greater than p_A.
2. Determine via simulations the crossover probability p_c at which C achieves P_B, and set the target average distortion E[q] such that q ∗ p_A = p_c.
3. Find an F_1 ⊂ F that results in an average distortion of E[q] with a minimum possible amount of helper data. Use F_1 as the frozen set of C_1.

Step 1 is a conventional channel code design task, similar to the codes designed for the transform-coding algorithm above, and step 2 is applied by Monte-Carlo simulations. For step 3, we start with F′_1 = F and compute the resulting average distortion E[q′] via Monte-Carlo simulations. If E[q′] is not less than E[q], remove elements from F′_1 according to the reliabilities of the polarized bit channels and repeat the procedure until we obtain the desired average distortion E[q].

The distortion level introduced by the VQ is an additional degree of freedom in choosing the code design parameters. For instance, different values of P_B can be targeted with the same code by changing the distortion level. Alternatively, devices with different p_A values can be supported by using the same code. This additional degree of freedom makes the mentioned code design suitable for a wide range of applications.

Designed Polar Codes for the GS Model
Consider, e.g., the GS model where S is used in the AES and log|S| = n − m_1 − m_2 = 128 bits, as considered above. If we use PUFs in an FPGA as the randomness source, we must satisfy a block-error probability P_B of at most 10^{−6}. Consider a BSC P_{Y|X} with crossover probability p_A = 0.15, which is a common value for SRAM PUFs under ideal environmental conditions [6] and for RO PUFs under varying environmental conditions [38]. Consider nested polar codes for these parameters to illustrate that one can achieve better key-leakage-storage rate tuples than previously proposed codes.
Code 1: Consider n = 1024 bits and recall that n − m_1 − m_2 = 128 bits, P_B = 10^{−6}, and p_A = 0.15. Polar successive cancellation list (SCL) decoders with list size 8 are used as the VQ and the channel decoder. First, design the code C of rate 128/1024 and evaluate its performance with the SCL decoder for a BSC with a range of crossover probabilities. A block-error probability of 10^{−6} is observed at a crossover probability of p_c = 0.1819. Since p_A = 0.15, this corresponds to an average distortion of E[q] = 0.0456, and the target average distortion is obtained at n − m_1 = 778 bits. Thus, m_2 = 650 bits of helper data suffice to obtain a block-error probability of P_B = 10^{−6}.
Code 2: Consider the same parameters as in Code 1, except n = 2048 bits. Apply the same steps as above. A crossover probability of p_c = 0.2682 is required to obtain a block-error probability of 10^{−6}, which gives an average distortion of E[q] = 0.1689. The target average distortion is achieved with helper data of length 611 bits.
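The distortion values for Codes 1 and 2 can be checked from the BSC cascade relation q ∗ p_A = q(1 − p_A) + p_A(1 − q), solved for q:

```python
# Sanity check of the target-distortion computation in the code designs.
def distortion_from_crossover(p_c, p_a):
    """Invert the binary convolution q * p_a = p_c for the VQ distortion q."""
    return (p_c - p_a) / (1 - 2 * p_a)

q1 = distortion_from_crossover(0.1819, 0.15)  # Code 1 target distortion
q2 = distortion_from_crossover(0.2682, 0.15)  # Code 2 target distortion
```

Both values agree with the E[q] = 0.0456 and E[q] = 0.1689 figures quoted above.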
The error probability P_B is calculated as an average over a large number of PUF realizations, i.e., over a large number of PUF devices with the same circuit design. To satisfy the block-error probability requirement for each PUF realization, one could consider using the maximum distortion instead of E[q] in step 3 of the design procedure given above. This would increase the amount of helper data. One can guarantee for the considered parameters a block-error probability of at most 10^{−6} for 99.99% of all realizations x^n of X^n by adding 32 bits to the helper data for Code 1 and 33 bits for Code 2. The numbers of extra helper-data bits required are small since the variance of the distortion q over all PUF realizations is small for the blocklengths considered. For comparisons, we consider the helper-data sizes required to guarantee P_B = 10^{−6} for 99.99% of all PUF realizations.

Code Comparisons
Figure 10 depicts the storage-key (R_w, R_s) projection of the boundary points of the region R_gs,bin for p_A = 0.15. Furthermore, we show the point with the maximum secret-key rate R*_s and the minimum storage rate R*_w needed to achieve R*_s. For the FCS and COFE, use the random coding union bound [46, Thm. 16] to confirm that the plotted rate pairs are achievable for a secret-key length of 128 bits, an error probability of P_B = 10^{−6}, and blocklengths of n = 1024 and n = 2048. These rate pairs are shown in Figure 10 to the right of the dashed line representing R_w + R_s = 1 bit/source-bit. Similarly, the rate pairs achieved by the previous polar code design in [49] and by Codes 1 and 2 are shown in Figure 10.
Storage rates of the FCS and COFE are 1 bit/source-bit, which is suboptimal. The previous polar code construction in [49] achieves a rate point with R_s + R_w = 1 bit/source-bit, which is expected since it is a SW-coding construction. The previous polar code construction improves on the rate pairs achieved by the FCS and COFE in terms of the key vs. storage ratio. Nested polar codes achieve key-leakage-storage rates of approximately (0.125, 0.666, 0.666) bits/source-bit by Code 1 and (0.063, 0.315, 0.315) bits/source-bit by Code 2, projections of which are depicted in Figure 10; the block-error probability satisfies P_B ≤ 10^{−6} and the key length is 128 bits for all code points. These rates are significantly better than all previous code constructions for the same parameters and without any private-key assumption. The designed nested polar codes increase the ratio R_s/R_w from approximately 0.188 for Code 1 to 0.199 for Code 2, suggesting that increasing the blocklength yields better ratios. Code 2 achieves privacy-leakage and storage rates that cannot be achieved by previous methods without applying the method of time sharing, since Code 2 achieves privacy-leakage and storage rates of 0.315 bits/source-bit, significantly less than the minimum privacy-leakage and storage rates R*_w = R*_ℓ = H_b(p_A) ≈ 0.610 bits/source-bit that can be asymptotically achieved by previous methods.
We apply the sphere packing bound [54, Eq. (5.8.19)] to upper bound the key vs. storage rate ratio that SW-coding constructions can achieve at the maximum secret-key rate point. Consider p_A = 0.15, n = 1024, and P_B = 10^-6, for which the sphere packing bound requires that the rate of the code C satisfies R_C ≤ 0.273. Assuming the key rate takes its maximal value R_s = R_C and the storage rate its minimal value R_w = 1 - R_C, we arrive at R_s/R_w ≤ 0.375. A similar calculation for n = 2048 yields R_s/R_w ≤ 0.437. These results indicate that there are still gaps between the maximum key vs. storage rate ratios achievable by WZ-coding constructions and the ratios achieved by Codes 1 and 2. The gaps can be reduced by using different nested polar codes that improve the minimum-distance properties, as in [55], or by using nested algebraic codes for which design methods are available in the literature, as in [56]. We remark again that such optimality-seeking approaches, e.g., based on information-theoretic security, provide the right insights into the best solutions for the digital era's security and privacy problems.
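The arithmetic behind these ratio bounds is a one-liner once the sphere-packing code rate R_C is known; a small sketch, using the R_C value quoted above for n = 1024:

```python
def ratio_bound(r_c):
    """Key vs. storage ratio for a SW-coding construction:
    with R_s = R_C and R_w = 1 - R_C, we get R_s / R_w = R_C / (1 - R_C)."""
    return r_c / (1 - r_c)

# Code rate permitted by the sphere packing bound for p_A = 0.15, n = 1024, P_B = 1e-6
r_c_1024 = 0.273
print(f"n = 1024: R_s/R_w <= {ratio_bound(r_c_1024):.3f}")
```

The printed bound agrees with the value of roughly 0.375 stated above up to rounding of R_C; the n = 2048 bound of 0.437 follows from the corresponding (larger) sphere-packing code rate in the same way.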

Discussions and Open Problems
• We want to use low-complexity scalar quantizers after the transformation without extra secrecy leakage; however, the decorrelation-efficiency metric does not fully represent the dependency between transform coefficients. What is the right metric for choosing the transform to use in combination with scalar quantizers? Is the mutual information between transform coefficients an appropriate metric for this purpose? The choice of the transform should also depend on a reliability metric, such as SNR-packing efficiency, so that the transform, quantizers, and error-correction codes can be designed jointly. What is the right reliability metric for this purpose?
• It is shown in [38] that the ambient temperature and supply voltage affect the RO outputs deterministically rather than adding extra random noise, as was assumed in the RO PUF literature. What are the right output models for common PUF types, i.e., what are the deterministic and random components, and how are they related?
• SRAM PUFs are already used in products. The literature contains no extensive analysis of the output correlations between different SRAMs in the same device, possibly because SRAM outputs are binary and it is difficult to model the correlation between binary symbols. However, SRAM outputs are modeled in [6] as binary-quantized sums of independent Gaussian random variables. Is it possible to determine or approximate the correlations between the Gaussian random variables of different SRAMs? If so, an attacker might use these correlations to obtain information about the secret sequence generated from the SRAM PUF output, causing extra secrecy leakage.
• The transform-coding approach discussed above provides reliability guarantees for the random-output RO arrays that consider an average over all manufactured ROs. The worst case occurs when the transform-coefficient value lies on a quantization boundary, for which the secret-key capacity is 0 bits. If one replaces the average reliability metric used above with a lower bound on the reliability of each RO, i.e., a worst-case metric, how would this change the rate of the error-correction code used? For a fixed code, what is the optimal bound on the reliability of each RO that maximizes the yield, i.e., the percentage of all manufactured ROs for which the worst-case reliability guarantee is satisfied?
• Are the WZ problem and the GS model operationally equivalent (cf. operational duality)?
• The linear block-code constructions discussed above are for uniformly-distributed PUF outputs. Can one construct other (random) linear block codes that are asymptotically optimal for non-uniform PUF outputs? Is it necessary to use an extension of the COFE for this purpose?
• Consider the nested polar code design procedure given above. It is not possible to construct a code with this procedure for n ≤ 512 since q ∗ p_A is an increasing function of q for any q ∈ [0, 0.5]. Is a nested polar code construction possible for n = 512 if one improves the code design procedure and the decoder?
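For the last point, the monotonicity claim can be checked numerically: q ∗ p denotes the binary convolution q(1 − p) + (1 − q)p, i.e., the crossover probability of two cascaded binary symmetric channels. A minimal sketch with p_A = 0.15:

```python
def binary_convolution(q, p):
    """Binary convolution q * p = q(1 - p) + (1 - q)p."""
    return q * (1 - p) + (1 - q) * p

p_A = 0.15
qs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
vals = [binary_convolution(q, p_A) for q in qs]

# Monotonically increasing in q on [0, 0.5] whenever p_A < 0.5,
# since d/dq (q * p_A) = 1 - 2*p_A > 0.
assert all(a < b for a, b in zip(vals, vals[1:]))
print(round(vals[0], 3), round(vals[-1], 3))  # 0.15 0.5
```

The sweep confirms that q ∗ p_A grows from p_A at q = 0 to 0.5 at q = 0.5, consistent with the derivative 1 − 2p_A being positive for p_A < 0.5.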

Figure 6: The correctness probabilities for transform coefficients.

Figure 10: Storage-key rates for the GS model with p_A = 0.15. The (R_w^*, R_s^*) point is the best possible point achieved by SW-coding constructions, which lies on the dashed line representing R_w + R_s = H(X). The block error probability satisfies P_B ≤ 10^-6 and the key length is 128 bits for all code points.

Table 1: The average RO output decorrelation-efficiency results.
valid. A common measure of the randomness of bit sequences, called uniqueness, is the average fractional Hamming distance between the bit sequences extracted from different RO PUFs. Similar uniqueness results are obtained for all transforms, where the mean Hamming distance is 0.500 and the Hamming-distance variance is approximately 7 × 10^-4. All transforms thus provide close to