1. Motivations
Fundamental advances in cryptography were made in secret during the 20th century. One exception was Claude E. Shannon’s paper “Communication Theory of Secrecy Systems” [
1]. Until 1967, the literature on security was not extensive, but a book [
2] with a historical review of cryptography changed this trend [
3]. Since then, the amount of sensitive data to be protected against attackers has increased significantly. Continuous improvements in security are needed and every improvement creates new possibilities for attacks [
4].
Recent hardware-intrinsic security systems, biometric secrecy systems, 5th generation of cellular mobile communication networks (5G) and beyond, as well as the internet of things (IoT) networks, have numerous noticeable characteristics that differentiate them from existing mechanisms. These include large numbers of low-complexity terminals with light or no infrastructure, stringent constraints on latency, and primary applications of inference, data gathering, and control. Such characteristics make it difficult to achieve a sufficient level of secrecy and privacy. Traditional cryptographic protocols, requiring certificate management or key distribution, might not be able to handle various applications supported by such technologies and might not be able to assure the privacy of personal information in the data collected. Similarly, low-complexity terminals might not have the necessary processing power to handle such protocols, or latency constraints might not permit the processing time required for cryptographic operations. Moreover, traditional methods that store a secret key in a secure nonvolatile memory (NVM) can be shown to be insecure because of possible invasive attacks on the hardware. Thus, secrecy and privacy for information systems are issues that need to be rethought in the context of recent networks, digital circuits, and database storage.
Information-theoretic security is an emerging approach to provide secrecy and privacy, for example, for wireless communication systems and networks by exploiting the unique characteristics of the wireless communication channel. Information-theoretic security methods such as physical layer security (PLS) use signal processing, advanced coding, and communication techniques to secure wireless communications at the physical layer. There are two key advantages of PLS. Firstly, it enables the use of resources available at the physical layer such as multiple measurements, channel training mechanisms, power, and rate control, which cannot be utilized by the upper layers of the protocol stack. Secondly, it is based on an information-theoretic foundation for secrecy and privacy that does not make assumptions on the computational capabilities of adversaries, unlike cryptographic primitives. By considering the security and privacy requirements of recent digital systems and the potential benefits from information-theoretic security and privacy methods, it can be seen that information-theoretic methods can complement or even replace conventional cryptographic protocols for wireless networks, databases, and user authentication and identification. Since information-theoretic methods do not generally require pre-shared secret keys, they might considerably simplify the key management in complicated networks. Thus, these methods might be able to fulfill the stringent hardware area constraints of digital devices and delay constraints in 5G/6G applications, or to avoid unnecessary computations, increasing the battery life of low-power devices. Information-theoretic methods offer “built-in” secrecy and privacy, generally independent of the network infrastructure, providing better scalability with respect to an increase in the network or data size.
A promising local solution to information-theoretic security and privacy problems is a physical unclonable function (PUF) [
5]. PUFs generate “fingerprints” for physical devices by using their intrinsic and unclonable properties. For instance, consider ring oscillators (ROs) with a logic circuit of multiple inverters serially connected with a feedback of the output of the last inverter into the input of the first inverter, as depicted in 
Figure 1. RO outputs are oscillation frequencies $f=1/T$, where $T$ is the oscillation period, that are unique and uncontrollable since the difference between different RO outputs is caused by submicron random manufacturing variations that cannot be controlled. One can use RO outputs as a source of randomness, called a PUF circuit, to extract secret keys that are unique to the digital device that embodies these ROs. The complete method that puts out a unique secret key by using RO outputs is called an RO PUF. Similarly, binary static random access memory (SRAM) outputs are utilized as a source of randomness to implement SRAM PUFs in almost all digital devices because most digital devices have embedded SRAMs used for data storage. The logic circuit of an SRAM is depicted in 
Figure 2 and the logically stable states of an SRAM cell are $(0,1)$ and $(1,0)$. During the power-up, the state is undefined if the manufacturer did not fix it. The undefined power-up state of an SRAM cell converges to one of the stable states due to random and uncontrollable mismatch of the inverter parameters, fixed when the SRAM cell is manufactured [
6]. There is also random noise in the cell that affects the cell at every power-up. Since the physical mismatch of the cross-coupled inverters is due to manufacturing variations, an SRAM cell output during power-up is a PUF output that is a response with one challenge, where the challenge is the address of the SRAM cell [
6].
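The SRAM power-up behavior described above can be sketched with a toy model, assuming (as an illustration only, not the actual circuit physics) a fixed Gaussian inverter mismatch per cell plus fresh Gaussian noise at every power-up; the sign of their sum decides the settling state:

```python
import random

def make_sram_cell(rng):
    # Fixed inverter mismatch, set once at manufacturing time.
    mismatch = rng.gauss(0.0, 1.0)
    def power_up():
        # Fresh noise at every power-up; the cell settles to a stable state.
        return 1 if mismatch + rng.gauss(0.0, 0.1) > 0 else 0
    return power_up

rng = random.Random(42)
cells = [make_sram_cell(rng) for _ in range(128)]  # challenge = cell address
response1 = [cell() for cell in cells]  # first power-up (e.g., enrollment)
response2 = [cell() for cell in cells]  # second power-up (e.g., reconstruction)
# Most cells reproduce their bit; a few with small mismatch flip due to noise.
errors = sum(a != b for a, b in zip(response1, response2))
```

Cells with a large mismatch reproduce their bit reliably, while the few cells with near-zero mismatch account for almost all bit errors.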
PUFs resemble biometric features of human beings. In this review, we will list state-of-the-art methods that bridge the gap between the practical secrecy systems that use PUFs and the information-theoretic security limits by
      
Modeling real PUF outputs to solve security problems with valid assumptions;
Analyzing methods that make information-theoretic analysis tractable, for example, by transforming PUF symbols so that the transform-domain outputs are almost independent and identically distributed (i.i.d.), and that result in smaller hardware area than benchmark designs in the literature;
Stating the information-theoretic limits for realistic PUF output models and providing optimal and practical (i.e., low-complexity and finite-length) code constructions that achieve these limits;
Illustrating best-in-class nested codes for realistic PUF output models.
In short, we start with real PUF outputs to obtain mathematically-tractable models of their behavior and then list optimal code constructions for these models. Since we discuss methods developed from the fundamentals of signal processing and information theory, any further improvements in this topic are likely to follow the listed steps in this review.
  Organization and Main Insights
In 
Section 2, we provide a definition of a PUF, list its existing and potential applications, and analyze the most promising PUF types. The PUF output models and design challenges faced when manufacturing reliable, low-complexity, and secure PUFs are listed in 
Section 3. The main security challenge in designing PUFs, i.e., output correlations, is tackled in 
Section 4 mainly by using a transform coding method, which can provably protect PUFs against various machine learning attacks. The reliability and secrecy performance (e.g., the number of authenticated users) metrics used for PUF designs are defined and jointly optimized in 
Section 5. PUF security and complexity performance evaluations for the defined transform coding method are given in 
Section 6. Performance results for error-correction codes used in combination with previous code constructions that are used for key extraction with PUFs, are shown in 
Section 7 in order to illustrate that previous key extraction methods are strictly suboptimal. We next define the information-theoretic metrics and the ultimate key-leakage-storage rate regions for the key agreement with PUFs problem, and compare available code constructions for the key agreement problem in 
Section 8. Optimal code constructions for the key extraction with PUFs are implemented in 
Section 9 by using nested polar codes, which are used in 5G networks in the control channel, to illustrate significant gains from using optimal code constructions. In 
Section 10, we provide a list of open PUF problems that might be interesting for information theorists, coding theorists, and signal processing researchers in addition to the PUF community.
  2. PUF Basics
We give a brief review of the literature on PUFs and discuss the problems with previous PUF designs that can be tackled by using signal processing and coding-theoretic methods.
A PUF is defined as an unclonable function embodied in a device. In the literature, there are alternative expansions of the term PUF such as “physically unclonable function”, suggesting that it is a function that is only physically unclonable. Such PUFs may provide a weaker security guarantee since they allow their functions to be digitally cloned. For any practical application of a PUF, we need the property of unclonability both physically and digitally. We therefore consider a function as a PUF only when the function is a physical function, i.e., it is embodied in a device, and it is not possible to clone it physically or digitally.
Physical identifiers such as PUFs are heuristically defined to be complex challenge-response mappings that depend on the random variations in a physical object. Secret sequences, which can be used as secret keys, are derived from this complex mapping. One important feature of PUFs is that the secret sequence generated is not required to be stored and it can be regenerated on demand. This property makes PUFs cheaper (no requirement for a memory for secret storage) and safer (the secret sequence is regenerated only on demand) alternatives to other secret generation and storage techniques such as storing the secret in an NVM [
5].
There is an immense number of PUF types, which makes it practically impossible to give a single definition of PUFs that covers all types. We provide the following definition of PUFs that includes all PUF types of interest for this review.
Definition 1 ([
5])
. We define a PUF as a challenge-response mapping embodied by a device such that it is fast and easy for the device to put out the PUF response and hard for an attacker, who does not have access to the PUF circuits, to determine the PUF output to a randomly chosen input, given that a set of challenge-response (or input-output) pairs is accessible to the attacker. The terms used in Definition 1, i.e., fast, easy, and hard, are relative terms that should be quantified for each PUF application separately. There are physical functions, called physical one-way functions (POWFs), in the literature that are closely related to PUFs. Such functions are obtained by applying the cryptographic method of “one-way functions”, which refers to functions that are easy to evaluate and (on average) difficult to invert [
7], to physical systems. As the first example of POWFs, the pattern of the speckle obtained from waves that propagate through a disordered medium is a one-way function of both the physical randomness in the medium and the angle of the beam used to generate the optical waves [
8].
Similar to POWFs, biometric identifiers such as the iris, retina, and fingerprints are closely related to PUFs. Most of the assumptions made for biometric identifiers are satisfied also by PUFs, so we can apply almost all of the results in the literature for biometric identifiers to PUFs. However, it is common practice to assume that PUFs can resist invasive (physical) attacks, which are considered to be the most powerful attacks used to obtain information about a secret in a system, unlike biometric identifiers that are constantly available for attacks. The reason for this assumption is that invasive attacks permanently destroy the fragile PUF outputs [
5]. This assumption will be the basis for the PUF system models used throughout this review. We, therefore, assume that the attacker does not observe a sequence that is correlated with the PUF outputs, unlike for biometric identifiers, since physical attacks applied to obtain such a sequence permanently change the PUF outputs.
  2.1. Applications of PUFs
A PUF can be seen as a source of random sequences hidden from an attacker who does not have access to the PUF outputs. Therefore, any application that takes a secret sequence as input can theoretically use PUFs. We list some scenarios where PUFs fit well practically:
Security of information in wireless networks with an eavesdropper, i.e., a passive attacker, is a PLS problem. Consider Wyner’s wiretap channel (WTC) model introduced in [
9]. This model is the most common PLS model, which is a channel coding problem unlike the secret key agreement problem we consider below that is a source coding problem. A randomized encoder helps the transmitter in keeping the message secret by confusing the eavesdropper. Therefore, at the WTC transmitter, PUFs can be used as the local randomness source when a message should be sent securely through the wiretap channel.
Consider a 5G/6G mobile device that uses a set of SRAM outputs, which are available in mobile devices, as PUF circuits to extract secret keys so that the messages to be sent are encrypted with these secret keys before sending the data over the wireless channel. Thus, the receiver (e.g., a base station) that previously obtained the secret keys (sent by mobile devices, e.g., via public key cryptography) can decrypt the data, while an eavesdropper who only overhears the data broadcast over the wireless channel cannot easily learn the message sent.
The controller area network (CAN) bus standard used in modern vehicles is illustrated in [
10] to be susceptible to denial-of-service attacks, which shows that safety-critical inputs of the internal vehicle network such as brakes and throttle can be controlled by an attacker. One countermeasure is to encrypt the transmitted CAN frames by using block ciphers with secret keys generated from PUF outputs used as inputs.
IoT devices such as wearable or e-health devices may carry sensitive data and use a PUF to store secret keys in such a way that only a device to which the secret keys are accessible can command the IoT devices. One common example of such applications is when PUFs are used to authenticate wireless body sensor network devices [
11].
Cloud storage requires security to protect users’ sensitive data. However, securing the cloud is expensive and users do not necessarily trust the cloud service providers. A PUF in a universal serial bus (USB) token, e.g., the trademarked Saturnus®, can be used to encrypt user data before uploading the data to the cloud; the data are decrypted locally by reconstructing the same secret from the same PUF.
System developers want to mutually authenticate a field programmable gate array (FPGA) chip and the intellectual property (IP) components in the chip, and IP developers want to protect the IP. In [
12], a protocol is described to achieve these goals with a small hardware area that uses one symmetric cipher and one PUF.
Other applications of PUFs include providing non-repudiation (i.e., undeniable transmission or reception of data), proof of execution on a specific processor, and remote integrated circuit (IC) enabling. Every application of PUFs has different assumptions about the PUF properties, computational complexity, and the specific system models. Therefore, there are different constraints and system parameters for each application. We focus mainly on the application where a secret key is generated from a PUF for user, or device, authentication with privacy and secrecy guarantees, and low complexity.
  2.2. Main PUF Types
We review four PUF types, i.e., silicon, arbiter, RO, and SRAM PUFs. We consider mainly the last two PUF types for algorithm and code designs due to their common use in practice and because signal processing techniques can tackle the problems arising in designing these PUFs. For a review of other PUF types that are mostly considered in the hardware design and computer science literature, and various classifications of PUFs, see, for example, [
4,
13,
14]. The four PUF types considered below can be shown to satisfy the assumption that invasive attacks permanently change PUF outputs, since digital circuit outputs used as the source of randomness in these PUF types change permanently under invasive attacks due to their dependence on nano-scale alterations in the hardware.
  2.3. Silicon and Arbiter PUFs
Common complementary metal-oxide-semiconductor (CMOS) manufacturing processes are used to build silicon PUFs, where the response of the PUF depends on the circuit delays, which vary across integrated circuits (ICs) [
5]. Due to high sensitivity of the circuit delays to environmental changes (e.g., ambient temperature and power supply voltage), arbiter PUFs are proposed in [
15], for which an arbiter (i.e., a simple transparent data latch) is added to the silicon PUFs so that the delay comparison result is a single bit. The difference of the path delays is mapped to, for example, the bit 0 if the first path is faster, and the bit 1 otherwise. The difference between the delays can be small, causing meta-stable outputs. Since the output of the mapper is generally pre-assigned to the bit 0, incoming signals are required to satisfy a setup time, required by the latch to change the output to the bit 1, resulting in a bias in the arbiter PUF outputs. Symmetrically implementable latches (e.g., set-reset latches) should be used to overcome this problem, which is difficult because FPGA routing does not allow the user to enforce symmetry in the hardware implementation. We discuss below that PUFs without symmetry requirements, for example, RO PUFs, provide better results.
  2.4. RO PUFs
The RO logic circuit is depicted in 
Figure 1, where an odd number of inverters are connected serially with feedback. The first logic gate in 
Figure 1 is a NAND gate, giving the same logic output as an inverter gate when the ENABLE signal is 1 (ON), to enable/disable the RO circuit. The manufacturing-dependent and uncontrollable component in an RO is the total propagation delay of an input signal through the RO, determining the oscillation frequency $f$ of the RO that is used as the source of randomness. A self-sustained oscillation is possible when the ring that oscillates at the oscillation frequency $f$ provides a phase shift of $2\pi$ with a voltage gain of 1.
Consider an RO with $m$ inverters. Each inverter should provide a phase shift of $\pi/m$, with an additional phase shift of $\pi$ due to the feedback. Therefore, the signal should flow through the RO twice to provide the necessary phase shift [16]. Suppose a propagation delay of $t_{pd}$ for each inverter, so the oscillation frequency of an RO is $f=1/(2m\,t_{pd})$. We remark that since RO outputs are generally measured by using 32-bit counters, it is realistic to assume that a measured RO output is a realization of a continuous distribution that can be modeled by using the histogram of a family of RO outputs with the same circuit design, as assumed below.
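As a numerical sanity check of this delay-frequency relation, the RO period equals twice the total propagation delay through the ring; the inverter count and delay value below are hypothetical:

```python
def ro_frequency(m_inverters, t_pd):
    """Oscillation frequency of a ring oscillator: the signal must
    traverse the ring twice per period, so T = 2 * m * t_pd."""
    return 1.0 / (2 * m_inverters * t_pd)

# Hypothetical example: 5 inverters with 100 ps propagation delay each,
# giving a period of T = 2 * 5 * 100 ps = 1 ns, i.e., f = 1 GHz.
f = ro_frequency(5, 100e-12)
```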
The propagation delay $t_{pd}$ is affected by nonlinearities in the digital circuit. Furthermore, there are deterministic and additional random noise sources [
16]. Such effects should be eliminated to have a reliable RO output. Rather than improving the standard RO designs, which would impose the condition that manufacturers should change their RO designs, the first proposal to fix the reliability problem was to make hard bit decisions by comparing RO pairs [
17], as illustrated in 
Figure 3.
In 
Figure 3, the multiplexers are challenged by a bit sequence of length at most $2\lceil\log_2 N\rceil$ so that an RO pair out of $N$ ROs is selected. The counters count the number of times a rising edge is observed for each RO during a fixed time. A logic bit decision is made by comparing the counter values, which can be bijectively mapped to the oscillation frequencies. For instance, when the upper RO has a greater counter value, then the bit 0 is generated; otherwise, the bit 1. Given that ROs are identically laid out in the hardware, the differences in the oscillation frequencies are determined mainly by uncontrollable manufacturing variations. Furthermore, it is not necessary to have a symmetric layout when hard-macro hardware designs are used for different ROs, unlike arbiter PUFs.
The key extraction method illustrated in 
Figure 3 gives an output of $\binom{N}{2}$ bits, which are correlated due to overlapping RO comparisons. This causes a security threat and makes the RO PUF vulnerable to various attacks, including machine learning attacks. Thus, non-overlapping pairs of ROs are used in [
17] to extract each bit. However, there are systematic variations in the neighboring ROs due to the surrounding logic, which also should be eliminated to extract sequences with full entropy. Furthermore, ambient temperature and supply voltage variations are the most important effects that reduce the reliability of RO PUF outputs. A scheme called 
1-out-of-k masking is proposed as a countermeasure to these effects, which compares the RO pairs that have the maximum difference between their oscillation frequencies for a wide range of temperatures and voltages to extract bits [
17]. The bits extracted by such a comparison are more reliable than the bits extracted by using previous methods. The main disadvantages of this scheme are that it is inefficient due to unused RO pairs, and only a single bit is extracted from the (semi-) continuous RO outputs. We review transform-coding based RO PUF methods below that significantly improve on these methods without changing the standard RO hardware designs.
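The pairwise comparison and the 1-out-of-k masking idea can be sketched as follows; the counter values are simulated, and the parameters (16 ROs, k = 2) are hypothetical:

```python
import random

rng = random.Random(1)

# Hypothetical noiseless RO counter values (one per RO), e.g., read
# from 32-bit counters after a fixed measurement interval.
counters = [200_000 + rng.gauss(0, 500) for _ in range(16)]

def compare_bit(f_upper, f_lower):
    # Hard decision: bit 0 if the upper RO is faster, bit 1 otherwise.
    return 0 if f_upper > f_lower else 1

# Non-overlapping pairs: (RO0, RO1), (RO2, RO3), ...
pairs = [(counters[i], counters[i + 1]) for i in range(0, len(counters), 2)]

# 1-out-of-k masking with k = 2: from every group of k pairs, keep the pair
# with the largest frequency difference, since it flips least under noise.
k = 2
kept = [max(pairs[i:i + k], key=lambda p: abs(p[0] - p[1]))
        for i in range(0, len(pairs), k)]
bits = [compare_bit(a, b) for a, b in kept]
```

The masking step trades away the unused pairs for reliability, which matches the inefficiency noted above: only one bit is kept per group of k pairs.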
  2.5. SRAM PUFs
There are multiple memory-based PUFs such as SRAM, Flip-flop, DRAM, and Butterfly PUFs. Their common feature is to possess a small number of challenge-response pairs with respect to their sizes. As the most promising memory-based PUF type that is already used in the industry, we consider SRAM PUFs that use the uncontrollable settling state of bi-stable circuits [
18]. In the standard SRAM design, there are four transistors used to form the logic of two cross-coupled inverters, as depicted in 
Figure 2, and two other transistors to access the inverters. The power-up state, i.e., $(0,1)$ or $(1,0)$, of an SRAM cell provides one secret bit. Concatenating many such bits allows one to generate a secret key from SRAM PUFs on demand. We provide an open problem about SRAM PUFs in 
Section 10.
  3. Correlated, Biased, and Noisy PUF Outputs
PUF circuit outputs are biased (nonuniform), correlated (dependent), and noisy (erroneous). We review a transform-coding algorithm that extracts an almost i.i.d. uniform bit sequence from each PUF so that a helper-data generation algorithm can correct the bit errors in the sequence generated from noisy PUF outputs. Using this transform-coding algorithm, we also obtain memoryless PUF measurement-channel models, so standard information-theoretic tools, which cannot be easily applied to correlated sequences, can be used.
Remark 1. The bias in the PUF circuit outputs is considered in the PUF literature to be a significant threat against the security of the key generated from PUFs since the bias makes it possible to apply, for example, machine learning attacks. However, it is illustrated in [19] (Figure 6) that the output bias does not change the information-theoretic rate regions significantly, showing that there exist code constructions that do not require PUF outputs to be uniformly distributed.  We consider two scenarios, where a secret key is either generated from PUF outputs (i.e., the generated secret [GS] model) or bound to PUF outputs (the chosen secret [CS] model). An example of GS methods is code-offset fuzzy extractors (COFE) [
20], and an example of the CS methods is the fuzzy-commitment scheme (FCS) [
21]. We first analyze a method that significantly improves the privacy, reliability, hardware cost, and secrecy performance by transforming the PUF outputs into a frequency domain; the transformed outputs are later used in the FCS. We remark that the information-theoretic analysis of the CS model follows directly from the analysis of the GS model [
22], so one can use either model for comparisons.
PUF output correlations might cause information leakage about the PUF outputs (i.e., privacy leakage) and about the secret key (i.e., secrecy leakage) [
22,
23]. Furthermore, channel codes are required to satisfy the constraint on the reliability due to output noise. The transform coding method proposed in [
24] adjusts the PUF output noise to satisfy the reliability constraint in addition to reducing the PUF output correlations.
  3.1. PUF Output Model
Consider a (semi-)continuous output physical function such as an RO output as a source with real-valued outputs $\widetilde{X}$. Since in a two-dimensional (2D) array the maximum distance between RO hardware logic circuits is less than in a one-dimensional array, decreasing the variations in the RO outputs caused by surrounding hardware logic circuits [25], we consider a 2D RO array of size $r\times c$ that can be represented as a vector random variable $\widetilde{X}^{rc}$. Each device embodies a single 2D RO array that has the same circuit design and we have $\widetilde{X}^{rc}\sim f_{\widetilde{X}^{rc}}$, where $f_{\widetilde{X}^{rc}}$ is a probability density function. Mutually independent and additive Gaussian noise denoted as $\widetilde{Z}^{rc}$ disturbs the RO outputs, i.e., we have noisy RO outputs $\widetilde{Y}^{rc}=\widetilde{X}^{rc}+\widetilde{Z}^{rc}$. Since $\widetilde{X}^{rc}$ and $\widetilde{Y}^{rc}$ are dependent, a secret key can be agreed by using these outputs [26,27].
Remark 2. PUF outputs are noisy, as discussed above in this section. However, the first PUF outputs are used by, for example, a manufacturer to generate or embed a secret key, which is called the enrollment procedure. Since a manufacturer can measure multiple noisy outputs of the same RO to estimate the noiseless RO output, we can consider that the PUF outputs measured during enrollment are noiseless. However, during the reconstruction step, for example, an IoT device observes a noisy RO output, which can be the case because the IoT device cannot measure the RO outputs multiple times due to delay and complexity constraints. Therefore, we consider a key-agreement model where the first measurement sequence (during enrollment) is noiseless and the second measurement sequence (during reconstruction) is noisy; see also Section 8. Extensions to key agreement models with two noisy sequences, where the noise components can be correlated, are discussed in [23,28,29].  We extract i.i.d. symbols from $\widetilde{X}^{rc}$ and $\widetilde{Y}^{rc}$ such that information-theoretic tools used in [30] for the FCS can be applied. An algorithm is proposed in [24] to obtain almost i.i.d. uniformly-distributed and binary vectors $X^n$ and $Y^n$ from $\widetilde{X}^{rc}$ and $\widetilde{Y}^{rc}$, respectively. For such $X^n$ and $Y^n$, we can define a binary error vector as $E^n=X^n\oplus Y^n$, where $\oplus$ is the modulo-2 sum. We then obtain the random sequence $Y^n=X^n\oplus E^n$, so the channel $P_{Y|X}$ is a binary symmetric channel (BSC) with crossover probability $p$. We discuss a transform-coding method below, which further provides reliability guarantees for each bit generated.
The FCS can reconstruct a secret key from dependent random variables with zero secrecy leakage [21]. For the FCS, depicted in Figure 4, an encoder maps a secret key $S$, which is uniformly distributed in the set $\{1,2,\ldots,|\mathcal{S}|\}$, into a codeword $C^n$ with binary symbols that is later added to the PUF-output sequence $X^n$ in modulo-2 during enrollment. The output is called helper data $W=C^n\oplus X^n$, sent to a database via a noiseless, public, and authenticated communication link. The sum of $W$ and $Y^n$ in modulo-2 is $C^n\oplus E^n$, mapped to a secret key estimate $\widehat{S}$ during reconstruction by the decoder.
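A minimal sketch of the FCS, with a rate-1/3 repetition code standing in for the error-correction codes discussed later (all parameters here are hypothetical toy choices):

```python
import random

REP = 3  # repetition factor; corrects 1 bit error per 3-bit block

def encode(key_bits):
    # Map the secret key S to a codeword C^n.
    return [b for b in key_bits for _ in range(REP)]

def decode(codeword):
    # Majority vote within each repetition block.
    return [1 if sum(codeword[i:i + REP]) > REP // 2 else 0
            for i in range(0, len(codeword), REP)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

rng = random.Random(7)
key = [rng.randrange(2) for _ in range(8)]       # secret key S
x = [rng.randrange(2) for _ in range(8 * REP)]   # enrollment PUF output X^n
w = xor(encode(key), x)                          # helper data W = C^n xor X^n

# Reconstruction: noisy PUF output Y^n = X^n xor E^n, with at most one
# error per repetition block so the toy code can correct it.
e = [0] * len(x)
e[4] = 1
y = xor(x, e)
key_hat = decode(xor(w, y))                      # W xor Y^n = C^n xor E^n
```

Note that the database stores only W; the key is regenerated on demand from the noisy PUF output and the helper data.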
We next give information-theoretic rate regions for the FCS; see [
31] for information-theoretic notation and basics.
Definition 2. The FCS can achieve a secret-key vs. privacy-leakage rate pair $(R_s,R_\ell)$ with zero secrecy leakage (i.e., perfect secrecy) if, given any $\epsilon>0$, there is some $n\geq 1$, and an encoder and decoder pair for which we have $\Pr[\widehat{S}\neq S]\leq\epsilon$ and
$$\frac{1}{n}\log|\mathcal{S}|\geq R_s-\epsilon,\qquad (2)$$
$$I(S;W)=0,\qquad (3)$$
$$\frac{1}{n}I(X^n;W)\leq R_\ell+\epsilon,\qquad (4)$$
where (3) suggests that $S$ and $W$ are independent and (4) suggests that the rate of dependency between $X^n$ and $W$ is bounded. The achievable secret-key vs. privacy-leakage rate, or key-leakage, region $\mathcal{R}_{\text{FCS}}$ for the FCS is the union of all achievable pairs.  Theorem 1 ([
30])
. The key-leakage region $\mathcal{R}_{\text{FCS}}$ for the FCS with perfect secrecy, uniformly-distributed $X$ and $Y$, and a channel $P_{Y|X}=\text{BSC}(p)$ is
$$\mathcal{R}_{\text{FCS}}=\left\{(R_s,R_\ell):\ 0\leq R_s\leq 1-H_b(p),\ \ R_\ell\geq H_b(p)\right\},$$
where $H_b(p)=-p\log p-(1-p)\log(1-p)$ is defined as the binary entropy function. The region 
$\mathcal{R}_{\text{cs}}$ of all achievable (secret-key, privacy-leakage) rate pairs for the CS model with a negligible secrecy-leakage rate is [22]
$$\mathcal{R}_{\text{cs}}=\bigcup_{P_{U|X}}\left\{(R_s,R_\ell):\ 0\leq R_s\leq I(U;Y),\ \ R_\ell\geq I(U;X)-I(U;Y)\right\}$$
such that $U-X-Y$ forms a Markov chain and it suffices to have $|\mathcal{U}|\leq|\mathcal{X}|+1$. The auxiliary random variable $U$ represents a distorted version of $X$ through a channel $P_{U|X}$. The FCS is optimal only at the point $(R_s,R_\ell)=\big(1-H_b(p),\,H_b(p)\big)$ [30], corresponding to the maximum secret-key rate.
  4. Transformation Steps
Transform coding methods decrease RO output correlations for ROs that are in the same 2D array by using, for example, a linear transformation. We discuss a transform-coding algorithm proposed in [
32] as an extension of [
24] to provide reliability guarantees to each generated bit. The main steps are the joint optimization of the error-correction code and the quantizer to maximize reliability and secrecy. The output of these post-processing steps is a bit sequence $X^n$ (or its noisy version $Y^n$) utilized in the FCS. It suffices to discuss only the enrollment steps, depicted in 
Figure 5, since the same steps are used also for reconstruction.
The RO outputs $\widetilde{X}^{rc}$ are correlated, where the cause of correlations is, for example, the surrounding logic in the hardware. A transform of size $r\times c$ is applied to the RO outputs to decrease output correlations. We model each output $T$ in the transform domain, i.e., each transform coefficient, calculated by transforming the RO outputs given in the dataset [33], by using the Bayesian information criterion (BIC) [34] and the corrected Akaike’s information criterion (AICc) [35]; both criteria suggest a Gaussian distribution as a good fit for the discrete Haar transform (DHT), discrete Walsh–Hadamard transform (DWHT), discrete cosine transform (DCT), and Karhunen–Loève transform (KLT).
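As an illustration of one of the listed transforms, the following is a minimal fast Walsh–Hadamard transform (the DWHT for power-of-two lengths) applied to simulated correlated RO outputs; the correlated common component concentrates in the first (DC) coefficient. The simulation parameters are hypothetical:

```python
import random

def fwht(a):
    """In-place fast Walsh-Hadamard transform; len(a) must be a power of 2."""
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

rng = random.Random(3)
# Simulated RO outputs: a slowly varying (correlated) row component plus
# i.i.d. manufacturing variation, for a 4x4 array flattened row-wise.
ro = [10.0 + 0.5 * (i // 4) + rng.gauss(0, 0.1) for i in range(16)]
coeffs = fwht(ro[:])
# The DC coefficient (index 0) absorbs the large common component; the
# remaining coefficients carry the decorrelated variation.
```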
In 
Figure 5, the histogram equalization changes the probability density of the $i$-th coefficient $T_i$ into a standard normal distribution so that the quantizers are the same for all transform coefficients, decreasing the storage. The obtained coefficients $\widehat{T}_i$ are independent when the transform coefficients $T_i$ are jointly Gaussian and the transform decorrelates the RO outputs perfectly. For such a case, scalar quantizers do not introduce any performance loss. Bit extraction methods and scalar quantizers are given below for the FCS with the independence assumption, which can be combined with a correlation-thresholding approach in practice.
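The histogram equalization step can be sketched via the probability integral transform: each coefficient is mapped through its empirical CDF and then through the standard normal quantile function (a sketch only; the exact equalization used in [32] may differ):

```python
import random
from statistics import NormalDist

def equalize(samples):
    """Map samples to approximately standard normal values via their
    empirical CDF followed by the standard normal quantile function."""
    n = len(samples)
    order = sorted(range(n), key=lambda i: samples[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the empirical CDF away from 0 and 1.
        z[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return z

rng = random.Random(5)
coeffs = [rng.expovariate(1.0) for _ in range(1000)]  # skewed toy coefficients
z = equalize(coeffs)  # approximately standard normal, order-preserving
```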
  5. Joint Quantizer and Error-Correction Code Design
The steps in 
Figure 5 are applied to obtain a uniform binary sequence $X^n$. We utilize a quantizer $Q_i(\cdot)$ that assigns quantization-interval values of $k\in\{1,2,\ldots,2^{K_i}\}$, where $K_i$ represents the number of bits obtained from the $i$-th coefficient. We have
$$Q_i(t)=k \ \ \text{if}\ \ b_{k-1}\leq t<b_k, \qquad\text{with}\qquad b_k=\Phi^{-1}\!\left(\frac{k}{2^{K_i}}\right),\qquad (7)$$
where we have $b_0=-\infty$ and $b_{2^{K_i}}=\infty$, and $\Phi^{-1}(\cdot)$ is the standard Gaussian distribution’s quantile function. A length-$K_i$ bit sequence represents the output $k$. Since the noise has zero mean, we use a Gray mapping to determine the sequences assigned to each $k$, so neighboring sequences differ only in one bit.
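The equal-probability quantizer and the Gray mapping can be sketched as follows (interval indices are 0-based in the code, and the helper names are ours):

```python
from statistics import NormalDist

def boundaries(K):
    """Finite quantization boundaries b_1, ..., b_{2^K - 1} for
    equal-probability intervals of the standard normal distribution
    (b_0 and b_{2^K} are -inf and +inf)."""
    return [NormalDist().inv_cdf(k / 2**K) for k in range(1, 2**K)]

def quantize(t, K):
    """Interval index in {0, ..., 2^K - 1} of an equalized coefficient t."""
    return sum(t >= b for b in boundaries(K))

def gray(k, K):
    """Binary-reflected Gray code of k as a K-bit string, so neighboring
    intervals differ in exactly one bit."""
    return format(k ^ (k >> 1), f'0{K}b')

K = 2
bits = gray(quantize(0.1, K), K)  # 0.1 lies in the third of four intervals
```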
  Quantizers with Given Maximum Number of Errors
We discuss a conservative approach that assumes the bits assigned to a quantized transform coefficient either all flip or are all correct. Let the correctness probability $P_{c,i}$ of the $i$-th coefficient be the probability that all bits assigned to that transform coefficient are correct. This probability is used to choose the number of bits extracted from a coefficient in such a way that one can design a channel encoder with a bounded minimum distance decoder (BMDD) to satisfy the reliability constraint on the block-error probability $P_B$, with a target value that is common for PUFs that use CMOS circuits [17].
Let 
 be the Q-function, 
 the probability density of the standard Gaussian distribution, and 
 the noise variance. The correctness probability can be calculated as
        
        where 
K is the length of the bit sequence assigned to a quantizer with quantization boundaries 
 from (
7) for an equalized Gaussian transform coefficient 
. In (
8), we calculate the probability that the additive noise will not change the quantization interval assigned to the transform coefficient, i.e., all bits associated with the transform coefficient stay the same after adding noise.
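As a numerical illustration of (8), the probability that additive Gaussian noise does not move an equalized coefficient out of its quantization interval can be evaluated with a simple midpoint-rule integration. The noise standard deviation, truncation range, and grid size below are our illustrative choices, not values from the text.

```python
# Numerical sketch of the correctness probability in (8).
from math import inf
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf phi and cdf Phi

def correctness_probability(bounds, sigma_n: float, steps: int = 2000) -> float:
    """Approximate, per interval, the integral of
    phi(x) * [Phi((hi - x)/sigma_n) - Phi((lo - x)/sigma_n)] dx
    with a midpoint rule on a truncated range, then sum over intervals."""
    edges = [-inf] + list(bounds) + [inf]
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        a = max(lo, -8.0)  # truncate the infinite tails
        b = min(hi, 8.0)
        h = (b - a) / steps
        for i in range(steps):
            x = a + (i + 0.5) * h
            stay = (STD.cdf((hi - x) / sigma_n) if hi != inf else 1.0) \
                 - (STD.cdf((lo - x) / sigma_n) if lo != -inf else 0.0)
            total += STD.pdf(x) * stay * h
    return total
```

As expected, a smaller noise variance or fewer quantization intervals yields a correctness probability closer to one.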
Assume that all errors in up to 
 coefficients can be corrected by a channel decoder, that the correctness probability 
 of the 
i-th coefficient 
 is greater than or equal to 
, and that errors occur independently. We first find the minimum correctness probability that satisfies 
, denoted as 
, by solving
        
        which allows us to find the maximum bit-sequence length 
 for the 
i-th transform coefficient such that 
The first transform coefficient, i.e., the DC coefficient, 
 can in general be estimated by an attacker, which is the first reason it is not used for key extraction. Second, temperature and voltage variations affect RO outputs nearly linearly, which affects the DC coefficient the most [
36]. Thus, we fix 
, so the total number of extracted bits can be calculated as
        
We first sort 
 values in descending order such that 
 for all 
. Thus, up to
        
        bit errors must be corrected in the worst case. Using a BMDD, a block code with minimum distance 
 can satisfy this requirement [
37].
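The search in (9)-(11) can be sketched as follows, assuming independent coefficient errors; the values of N, Cmax, and the example bit allocation are illustrative, not taken from the text.

```python
# Hedged sketch of the conservative design in (9)-(11): find the minimum
# per-coefficient correctness probability for a target block-error
# probability, then the worst-case number of bit errors a BMDD must handle.
from math import comb

def block_error_prob(p_err: float, N: int, Cmax: int) -> float:
    """P(more than Cmax of N independent coefficients are erroneous)."""
    return 1.0 - sum(comb(N, j) * p_err**j * (1 - p_err)**(N - j)
                     for j in range(Cmax + 1))

def min_correctness(N: int, Cmax: int, target: float = 1e-6) -> float:
    """Bisection for the minimum correctness probability in (9)."""
    lo, hi = 0.0, 1.0  # bounds on the per-coefficient error probability
    for _ in range(60):
        mid = (lo + hi) / 2
        if block_error_prob(mid, N, Cmax) > target:
            hi = mid
        else:
            lo = mid
    return 1.0 - lo

def worst_case_bit_errors(K: list, Cmax: int) -> int:
    """(11): the Cmax largest bit-sequence lengths give the worst case;
    a BMDD then needs minimum distance >= 2 * worst_case_bit_errors + 1."""
    return sum(sorted(K, reverse=True)[:Cmax])
```

For example, with an illustrative allocation of `[3, 3, 2, 2, 1]` bits and `Cmax = 2`, the worst case is the two 3-bit coefficients flipping, i.e., 6 bit errors.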
The advanced encryption standard (AES) requires a seed, e.g., a secret key of length 128 bits. If the FCS is applied to PUFs to extract such a secret key for the AES, the designed block code should have a code length of  bits, a code dimension of ≥128 bits, and a minimum distance of , given a . Such an optimization problem is generally hard to solve but, using an exhaustive search over different  values and over different algebraic codes, one can show the existence of a channel code that satisfies all constraints. Codes with low-complexity implementations are preferred for, e.g., IoT applications. We remark that the correctness probability might be significantly greater than , that the probability that fewer than  bits are actually in error when the i-th coefficient is erroneous is high, and that the bit errors do not necessarily occur in the coefficients from which the maximum-length bit sequences are obtained. Therefore, we next illustrate that even though  errors cannot be corrected, the constraint  is satisfied.
  7. Error-Correction Codes for PUFs with Transform Coding
Suppose that the bit sequences extracted by using the transform-coding method are i.i.d. and uniformly distributed, so perfect secrecy is satisfied. We assume that the signal-processing steps mentioned above perform well, so we can conduct standard information- and coding-theoretic analysis. We provide a list of codes designed for the transform-coding algorithm by using the reliability metric considered above.
We select a channel code for the quantizer designed above with a fixed maximum number of errors for a secret key of size 128 bits. The correctness probabilities for the coefficients with the smallest and highest probabilities are depicted in 
Figure 6. Transform coefficients that represent low frequencies are the most reliable; these lie at the upper-left corner of the 2D transform-coefficient array with indices such as 
. These coefficients thus have the highest signal-to-noise ratios (SNRs). Conversely, the least reliable coefficients are observed to be those that represent intermediate frequencies. One can therefore define a metric called the SNR-packing efficiency, defined similarly to the energy-packing efficiency, and show that it follows a more complicated scan order than the classic zig-zag scan order used for the energy-packing efficiency.
Fix 
, defined above, and calculate 
 via (
9), 
 via (
10), and 
 via (
11). If 
, 
 is large and 
 for all 
. In addition, if 
, then 
 bits. Furthermore, if 
 increases, 
 decreases, so the maximum of the number 
 of bits extracted among all used coefficients increases, increasing the hardware complexity. Thus, consider only the cases where 
. 
Table 2 shows 
, 
, and 
 for a range of 
 values used for channel-code selection.
Consider Reed–Solomon (RS) and binary (extended) Bose–Chaudhuri–Hocquenghem (BCH) codes, whose minimum-distance 
 is high. There is no BCH or RS code with parameters satisfying any of the 
 pairs in 
Table 2 such that its dimension is ≥128 bits. However, the analysis leading to 
Table 2 is conservative. Thus, we next find a BCH code whose parameters are as close as possible to an 
 pair in 
Table 2. Consider the binary BCH code that can correct all error patterns with up to 
 errors with the block length of 255 and code dimension of 131 bits.
First, extract exactly one bit from each transform coefficient, i.e.,  for all , so  bits are extracted, resulting in mutually independent bit errors . Thus, all error patterns with up to  bit errors should be corrected by the chosen code, rather than  bit errors. However, this value is still greater than .
The block error probability 
 for the BCH code 
 with a BMDD is equal to the probability of encountering more than 18 errors, i.e., we have
      
	  where 
 is the correctness probability of the 
i-th coefficient 
 as in (
8) for 
, 
 denotes the complement of the set 
A, and 
 is the set of all size-
j subsets of the set 
. 
 values are different and they represent probabilities of independent events because we assume that the transform coefficients are independent. We apply the discrete Fourier transform characteristic function method [
43] to evaluate the block-error probability with the result 
. The block-error probability (i.e., reliability) constraint is therefore satisfied by the BCH code 
, although the conservative analysis suggested otherwise. This code achieves a (secret-key, privacy-leakage) rate pair of 
 bits/source-bit, which is significantly better than previous results. We next consider the region of all achievable rate pairs for the CS model and the FCS for a BSC 
 with crossover probability 
, i.e., probability of being in error averaged over all used coefficients with the above defined quantizer. The (secret-key, privacy-leakage) rate pair of the BCH code, regions of all rate pairs achievable by the FCS and CS model, the maximum secret-key rate point, and a finite-length bound [
44] for the block length of 
 bits and 
 are depicted in 
Figure 7 for comparisons.
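The block-error probability above is the tail of a Poisson-binomial distribution: each used coefficient errs independently with its own probability. As an alternative to the DFT characteristic-function method of [43], it can be computed exactly with a short dynamic program; the error probabilities in the example are made up for illustration.

```python
# Exact Poisson-binomial tail via dynamic programming: an alternative to
# the DFT characteristic-function method for the block-error probability.
def poisson_binomial_tail(err_probs, t: int) -> float:
    """P(more than t of the independent error events occur)."""
    dist = [1.0]  # dist[j] = P(exactly j errors among coefficients so far)
    for p in err_probs:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1 - p)   # this coefficient is correct
            new[j + 1] += pj * p     # this coefficient errs
        dist = new
    return sum(dist[t + 1:])

# Illustrative example: 255 coefficients, each with error probability 0.01,
# and a BMDD correcting up to 18 errors.
tail = poisson_binomial_tail([0.01] * 255, 18)
```

With identical per-coefficient probabilities this reduces to the binomial tail; the dynamic program additionally handles the unequal probabilities of the different coefficients.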
Denote the maximum secret-key rate as 
 bits/source-bit and the corresponding minimum privacy-leakage rate as 
 bits/source-bit. The gap between 
 at which the FCS is optimal and the rate tuple achieved by the BCH code can be explained by the short block length and small block-error probability. However, the finite-length bound given in [
44] (Theorem 52) suggests that the FCS can achieve the rate tuple 
 bits/source-bit, shown in 
Figure 7. Better channel code designs and decoders (possibly with higher hardware implementation complexity) can improve the performance, but they might not be feasible for IoT applications. 
Figure 7 shows that there are other code constructions (that are not standard error-correcting codes) that can achieve smaller privacy-leakage and storage rates for a fixed secret-key rate, illustrated below.
  8. Code Constructions for PUFs
Consider the two-terminal key agreement problem, where the identifier outputs during enrollment are noiseless. We mention two optimal linear code constructions from [
45] that are based on distributed lossy source coding (or Wyner–Ziv [WZ] coding) [
46]. The random linear code construction achieves the GS and CS models’ key-leakage-storage regions and the nested polar code construction jointly designs vector quantization (during enrollment) and error correction (during reconstruction) codes. Designed nested polar codes improve on existing code designs in terms of privacy-leakage and storage rates, and one code achieves a rate tuple that existing methods cannot achieve.
Several practical code constructions for key agreement with identifiers have been proposed in the literature. For instance, the COFE and the FCS both require a standard error-correction code to satisfy the constraints of, respectively, the key generation (GS model) and key embedding (CS model) problems, as discussed above. Similarly, a polar code construction is proposed for the GS model in [
47]. These constructions are sub-optimal in terms of storage and privacy-leakage rates.
A Golay code is used as a vector quantizer (VQ) in [
22] in combination with distributed lossless source codes (or Slepian–Wolf [SW] codes) [
48] to increase the ratio of key vs. storage rates (or key vs. leakage rates). Thus, we next consider VQ by using WZ coding to decrease storage rates. The WZ-coding construction turns out to be optimal, which is not coincidental. For instance, the bounds on the storage rate of the GS model and on the WZ rate (storage rate) have the same mutual information terms optimized over the same conditional probability distribution. This similarity suggests an equivalence that is closely related to the concept of formula duality. In fact, the optimal random code construction, encoding, and decoding operations are identical for both problems. One can therefore call the GS model and WZ problem functionally equivalent. Such a strong connection suggests that there might exist constructive methods that are optimal for both problems for all channels, which is closely related to the operational duality concept.
Consider the GS model, where a secret key is generated from a physical or biometric source, depicted in 
Figure 8(
a). The encoder 
 observes during enrollment the noiseless i.i.d. sequence 
 to generate public helper data 
W and a secret key 
S, i.e., 
. The decoder 
 observes during reconstruction the helper data 
W and a noisy measurement 
 of 
 through a memoryless channel 
 to estimate the secret key, i.e., 
. Similarly, the CS model is shown in 
Figure 8(
b), where a secret key 
S independent of 
 is chosen and embedded into the helper data, i.e., 
. The alphabets 
, 
, 
, and 
 are finite sets, which can be achieved if, for example, the transform-coding algorithm discussed above is applied.
Definition 3. For the GS and CS models, a key-leakage-storage tuple  is achievable if, given any , there is an encoder, a decoder, and some  such that  and  are satisfied. The key-leakage-storage regions  for the GS model and  for the CS model are the closures of the sets of achievable tuples for these models.
Theorem 2 ([
22])
. The key-leakage-storage regions  and  for the GS and CS models, respectively, are  where  form a Markov chain.  and  are convex sets and  suffices for both rate regions.
Remark 3. Improvement of the weak secrecy to strong secrecy, where (15) is replaced with , is possible by using multiple identifier output blocks as described in [49], e.g., by using multiple PUFs in the same device.
Assume, as above, that 
 and the channel 
 for 
. Define the star-operation as p ∗ q = p(1 − q) + (1 − p)q. The key-leakage-storage region of this GS model is
      
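The star-operation and the binary entropy function appear throughout the rate expressions above; a minimal numeric sketch follows, where the example crossover values are ours.

```python
# Star-operation (cascade of two BSCs) and binary entropy function.
from math import log2

def star(p: float, q: float) -> float:
    """p * q: overall crossover probability of two cascaded BSCs."""
    return p * (1 - q) + (1 - p) * q

def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Cascading an assumed BSC(0.15) with a quantization BSC(q):
assert star(0.15, 0.0) == 0.15              # q = 0: channel unchanged
assert abs(star(0.15, 0.5) - 0.5) < 1e-12   # q = 0.5 destroys all information
assert abs(h2(0.5) - 1.0) < 1e-12           # maximum entropy at p = 0.5
```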
  Comparisons Between Code Constructions for PUFs
We consider three best code constructions proposed for the GS and CS models, which are COFE and the polar code construction in [
47] for the GS model, and FCS for the CS model, in order to compare them with the WZ-coding constructions. The FCS and COFE achieve only a single point on the key-leakage rate region boundary, i.e., 
 and 
.
By adding a VQ step, one can improve these two methods. During enrollment, rather than , its quantized version  can be used for this purpose; asymptotically, this can be represented as summing the original helper data and another independent random variable , i.e.,  is the (new) helper data. The modified FCS and COFE can achieve the key-leakage region when a union of all achieved rate tuples is taken over all . Nevertheless, the helper data of the modified FCS and COFE have length n bits, i.e., the storage rate is 1 bit/source-bit, which is suboptimal.
The storage rate of 1 bit/source-bit is decreased by using the polar code construction proposed in [
47]. Nevertheless, this construction cannot achieve the key-leakage-storage region. In addition, in [
47] it is assumed that a “private” key shared between the encoder and decoder is available, which is not realistic because hardware protection against invasive attacks would be required to keep such a private key. If such hardware protection were feasible, there would be no need to utilize an on-demand key reconstruction and storage method like a PUF. The previous methods cannot, therefore, achieve the key-leakage-storage region for a BSC, unlike the distributed lossy source coding constructions proposed in [
45]. To compare such WZ-coding constructions, we use the ratio of key vs. storage rates as the metric, which determines the design procedures to control the storage and privacy leakage.
  9. Optimal Nested Polar Code Constructions
The first channel codes with asymptotic information-theoretic optimality and low decoding complexity are polar codes [
50], whose finite-length performance is good when a list decoder is utilized. Nesting two codes is simple with polar codes due to their simple matrix representation; therefore, one can use them for distributed lossy source coding [
51]. The 
channel polarization phenomenon, i.e., converting a channel into polarized binary channels by using a polar transform, is the core of polar codes. The polar transform takes a sequence 
 with unfrozen and frozen bits as input and converts it into a codeword that also has length 
n. The decoder then observes a noisy codeword in addition to the fixed frozen bits of 
 in order to estimate the bit sequence 
. A polar code with block length 
n and frozen bit sequence 
 at indices 
 is denoted as 
. We next utilize nested polar codes that are proposed for WZ coding in [
51].
  9.1. The GS Model Polar Code Construction
Consider two nested polar codes, one with frozen-bit values [W,V] and one with frozen-bit values V, with message lengths m1 and m2, whose rates satisfy
		
		 for a 
δ > 0 and some distortion 
q ∈ [0,0.5]. Two polar codes 
(
n,
, 
) and 
(
n,
,
V) are nested since the set of indices 
 refers to frozen channels with values 
V, which are common to both polar codes, and the code 
 has further frozen channels with values 
W at indices 
.
Since the rate of 
 is greater than the capacity of the lossy source coding problem for an average distortion 
q, it functions as a VQ with distortion 
q. Furthermore, since the rate of 
 is less than the channel capacity of the BSC(
), it functions as an error-correcting code. We want to calculate the values 
W during enrollment, stored as the public helper data, such that 
 can be used during reconstruction to estimate the key 
S with length 
, which is depicted in 
Figure 9. We assign the all-zero vector to 
V so as not to increase the storage, which does not affect the average distortion 
 between 
 and 
 defined below; see [
51] (Lemma 10) for a proof.
During enrollment, the PUF outputs 
 are observed by a polar decoder of 
 and considered as noisy measurements of a sequence 
 measured through a BSC
, i.e., 
 is quantized into 
 by a polar decoder of 
. The polar decoder puts out the sequence 
 and the bit values 
W at its indices 
 are publicly stored as the helper data. Furthermore, the bit values at indices 
 are assigned as the secret key 
S. We remark that the polar transform of 
 is the sequence 
 that is the quantized (or distorted) version of 
. Consider the error sequence 
, which also models the distortion between 
 and 
. The error sequence is shown in [
51] (Lemma 11) to resemble a sequence that is distributed according to 
 when 
n tends to 
∞.
During reconstruction, a polar decoder of  then observes , a noisy version of  measured through a BSC. Using the frozen bits [V,W], the decoder estimates the sequence , and the estimated bits at the indices j ∈ {1,2,…,n} outside the frozen set form the secret-key estimate.
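For concreteness, the polar transform underlying these encoders and decoders can be sketched recursively. This is the generic textbook construction (the butterfly network behind channel polarization, including the bit-reversal ordering), not the specific implementation used in the cited designs.

```python
# Minimal recursive sketch of the polar transform over GF(2).
def polar_transform(u: list) -> list:
    """Apply the n x n polar transform (Kronecker powers of F = [[1,0],[1,1]]
    with bit-reversal ordering); n = len(u) must be a power of two."""
    n = len(u)
    if n == 1:
        return u[:]
    # combine pairs as (u1 XOR u2, u2), then recurse on the two halves
    top = polar_transform([u[2 * i] ^ u[2 * i + 1] for i in range(n // 2)])
    bot = polar_transform([u[2 * i + 1] for i in range(n // 2)])
    return top + bot
```

A convenient property is that this transform is its own inverse over GF(2), so the same routine maps a bit sequence with frozen bits to a codeword and back.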
Next, a design procedure to implement practical nested polar codes that satisfy these properties is summarized.
Nested polar codes 
 must be constructed jointly such that the sets of indices 
 and 
 result in codes that satisfy the security and reliability constraints simultaneously. Suppose the block length 
n, key length 
, target block-error probability 
, and BSC crossover probability 
 are given, which depend on the PUF application considered. Then we have the following design procedure [
45]:
Design a polar code  with rate , corresponding to fixing its indices  that determine the frozen bits. This step is a conventional error-correcting code design task.
Find the maximum BSC crossover probability  for which the code  achieves the target block-error probability , which can be done by evaluating the performance of  for a BSC over a crossover probability range. Using the inverse of the star-operation , the target distortion averaged over a large number of realizations of  that should be achieved by  is . This step can be applied via Monte-Carlo simulations.
Find an index set , representing the frozen set of , such that  and the target distortion  is achieved with a minimal amount of helper data. This step can be applied by starting with  and then computing the resulting average distortion  obtained from Monte-Carlo simulations. If  is greater than , we remove elements from  according to polarized bit channel reliabilities. This step is repeated until the resulting average distortion  is less than the target (or desired) distortion .
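Step 2 of the procedure above inverts the star-operation to convert the maximum tolerable crossover probability into a target quantization distortion. A minimal sketch follows; the numeric values for the PUF channel and the tolerable crossover probability are our illustrative assumptions.

```python
# Inverting the star-operation: p_star = q(1 - p_a) + (1 - q)p_a,
# solved for the target distortion q.
def star(p: float, q: float) -> float:
    """Overall crossover probability of two cascaded BSCs."""
    return p * (1 - q) + (1 - p) * q

def inverse_star(p_star: float, p_a: float) -> float:
    """Solve p_star = star(p_a, q) for the distortion q."""
    assert p_a != 0.5, "a BSC(0.5) cannot be inverted"
    return (p_star - p_a) / (1 - 2 * p_a)

p_a = 0.15      # assumed PUF measurement channel BSC(0.15)
p_star = 0.2    # assumed maximum crossover probability the code tolerates
q_target = inverse_star(p_star, p_a)
assert abs(star(p_a, q_target) - p_star) < 1e-12  # round trip
```

In the full procedure this target distortion is then matched against Monte-Carlo estimates while trimming the helper-data index set.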
An additional degree of freedom is provided by varying the distortion level in the design procedure above, making the design procedure suitable for numerous applications. Using this degree of freedom, PUFs with different BSC crossover probabilities  can be supported by using the same nested polar codes with different distortion levels. Similarly, different PUF applications with different target block-error probabilities  can also be supported by using the same nested codes with different distortion levels.
  9.2. Designed GS Model Nested Polar Codes
We design nested polar codes to generate a secret key 
S of length 
 bits, used in the AES. Furthermore, the common target block-error probability for PUFs used in an FPGA is 
 and the common BSC 
 crossover probability for SRAM and RO PUFs is 
 [
6,
36]. We consider these PUF applications and parameters to design nested polar codes that improve on previously proposed codes.
Code 1: Suppose a block length of  bits and a fixed list size of 8 for polar successive cancellation list (SCL) decoders are used for nested codes. First, the code  with rate  is designed to determine , which is defined in the design procedure steps above and is obtained by using the SCL decoder. We obtain the crossover probability value , corresponding to a target distortion of . This target distortion is obtained with a minimal helper data W length of  bits.
Code 2: Suppose a block length of  bits. Applying the design procedure steps given above, we obtain for Code 2 the value , resulting in a target distortion of . This target distortion is obtained with a minimal helper data W length of  bits.
For these nested polar code designs, the error probability  is considered as the average error probability over a large number of input realizations, corresponding to a large number of PUF circuits that have the same circuit design. This result can be improved by satisfying the target error probability for each input realization, which can be implemented by using the maximum distortion rather than  in the design procedure discussed above. A block-error probability that is ≤ can be guaranteed for  of all realizations of input  by including an additional 32 bits for the helper data W for Code 1 and an additional 33 bits for Code 2. The numbers of additional bits included are small because the distortion q has a small variance for the block lengths considered. For code comparisons below, we depict the sizes of helper data needed to guarantee the target block-error probability of  for  of all PUF realizations.
  9.3. Comparisons of Codes
The boundary points of 
 for 
 are projected onto the storage-key 
 plane and depicted in 
Figure 10. The point 
, defined in 
Section 3.1, is also depicted. Furthermore, we use the random coding union bound from [
44] (Theorem 16) to obtain the rate pairs that can be achieved by using the FCS or COFE. These points are shown in 
Figure 10 in addition to the rate tuples achieved by the previous SW-coding based polar code design from [
47], and Codes 1 and 2 discussed above.
The COFE and FCS result in a storage rate of 1 bit/source-bit, which is strictly suboptimal. The previous SW-coding based polar code construction in [
47] achieves a rate tuple such that 
 bit/source-bit, as expected because it is an SW-coding construction that corresponds to a syndrome coding method in the binary case. The previous SW-coding based polar code construction improves the rate tuples achieved by the COFE and FCS in terms of the ratio of key vs. storage rates. Code 1 achieves the key-leakage-storage tuple of 
 bits/source-bit and Code 2 of 
 bits/source-bit, which significantly improve on all previous code constructions without any private-key assumption. Thus, the results for Codes 1 and 2 also suggest that, for these parameters, increasing the block length increases the 
 ratio, which is 
 for Code 1 and 
 for Code 2. Furthermore, the privacy-leakage and storage rate tuple achieved by Code 2 cannot be achieved by using previous constructions without applying the time sharing method, because Code 2 achieves the privacy-leakage (and storage) rate of 
 bits/source-bit that is less than the minimal privacy-leakage (and storage) rates 
 bits/source-bit that can be achieved by using previous code constructions.
To find an upper bound on the ratio of key vs. storage rates for the maximum secret-key rate point, we apply the sphere packing bound from [
52] (Equation (5.8.19)) for the channel 
 and code parameters 
, and 
. The sphere packing bound shows that the rate of 
, as depicted in 
Figure 9, must satisfy 
 bits/source-bit. Suppose the key rate is fixed to its maximum value 
 and the storage rate is fixed to its minimum value 
, so we have the ratio of 
. Similarly, for 
 we obtain the ratio of 
. The two finite-length results that are valid for WZ-coding constructions with nested codes indicate that the ratio of key vs. storage rates achieved by Codes 1 and 2 can be further increased. Using different nested polar codes that improve the minimum-distance properties, as in [
53], or using nested algebraic codes for which design methods are available in the literature, as in [
54], one can reduce the gaps to the finite-length bounds calculated for nested code constructions. We remark again that such optimality-seeking approaches, for example, based on information-theoretic security, provide the right insights into the best solutions for the digital era’s security and privacy problems.