1. Introduction
Physical unclonable functions or PUFs [1,2] utilize inherent manufacturing variations to produce hardware tokens that can be used as building blocks for authentication protocols. Many different physical phenomena have been proposed to provide PUF behavior, including Ring Oscillators (ROs) [2,3,4], cross-coupled latches or flip-flops [5], capacitive particles in a coating [6], and beads in an optical card [1]. PUFs can be categorized by the type of functionality they provide: Weak PUFs, also called Physically Obfuscated Keys (POKs), are used for secure key storage. Strong PUFs implement (pseudo)random functions based on exploiting manufacturing variations that vary from PUF instance to PUF instance; strong PUFs are used for device identification and authentication. The primary advantage of using a strong PUF is that the communication between the authenticator and the physical device need not be protected. This is because each authentication uses a distinct input to the “pseudorandom function”.
In this work, we deal with the design and implementation of strong PUFs. A strong PUF is a physical token that implements a function translating challenges into responses. The reliability property of a strong PUF says that if the same challenge is provided multiple times, the responses should be close (according to some distance metric) but not necessarily the same. The security property is that an adversary with access to the PUF (and the ability to see responses for chosen queries) should not be able to predict the response for an unqueried challenge. Recently, attacks in this model have allowed the creation of accurate models for many proposed strong PUFs [7,8,9,10,11]. We only consider attacks using the input and output behavior. An additional required security property is that the adversary should not be able to physically clone the device to create a device with similar behavior (see [12] for more on required properties).
These attacks can be mitigated by designing new PUFs or adding protections to the PUF challenge-response protocol. Our focus is on improving the challenge-response protocol. A natural way to prevent learning attacks is to apply a one-way hash to the output. However, the PUF reliability property only says that responses are noisy: they are close, not identical. Fuzzy extractors [13] can remove noise at the cost of leaking information about the response. This information leakage prevents fuzzy extractors from being reused across multiple challenges [14,15,16,17]. Computational fuzzy extractors do not necessarily leak information [18] and some constructions may be reusable [19]. Herder et al. designed a new computational fuzzy extractor and PUF challenge-response protocol intended to defeat modeling attacks [20]. Their construction is based on the learning parity with noise (LPN) problem [21], a well-studied cryptographic problem. Essentially, the LPN problem states that the product of a random matrix and a secret vector, plus noise, is very hard to invert. However, their work did not implement the protocol, leaving its viability unclear.
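To make the LPN structure concrete, the following toy Python sketch builds an LPN instance $\mathbf{b}=\mathbf{A}\cdot \mathbf{s}\oplus \mathbf{e}$ over GF(2). All sizes and the noise rate are illustrative stand-ins, far below cryptographic parameters; they are not the paper's values.

```python
# Toy LPN instance b = A·s ⊕ e over GF(2). Sizes and noise rate tau are
# illustrative only; real parameters (e.g., n = 128) are far larger.
import random

random.seed(1)
n, m, tau = 16, 48, 0.1            # secret length, number of equations, noise rate

s = [random.randrange(2) for _ in range(n)]                      # secret vector
A = [[random.randrange(2) for _ in range(n)] for _ in range(m)]  # public matrix
e = [1 if random.random() < tau else 0 for _ in range(m)]        # Bernoulli noise

# Each output bit is the inner product <A_i, s> mod 2, XORed with a noise bit.
b = [(sum(a * x for a, x in zip(row, s)) % 2) ^ ei for row, ei in zip(A, e)]
```

Without the noise vector $\mathbf{e}$, the secret could be recovered from $(\mathbf{A},\mathbf{b})$ by Gaussian elimination; it is the noise that makes inversion conjecturally hard.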
Our Contribution: We provide the first implementation of a challenge-response protocol based on computational fuzzy extractors. Our implementation allows for arbitrary interleaving of challenge creation, called $\mathrm{Gen}$, and the challenge-response protocol, called $\mathrm{Ver}$, and is stateless. Our implementation builds on a Ring Oscillator (RO) PUF on a Xilinx Zynq All Programmable SoC (System on Chip) [22], which has an FPGA (Field Programmable Gate Array) and a coprocessor communicating with each other.
Our approach is based on the LPN problem, like the construction of Herder et al. [20], with the following fundamental differences:
In order to minimize area overhead so that all control logic and other components fit well inside the FPGA, we reduce storage (by not storing the whole multiplication matrix of the LPN problem, but only storing its hash) and, most importantly, we outsource Gaussian elimination to the coprocessor.
We keep the same adversarial model: we only trust the hardware implementation of $\mathrm{Gen}$ and $\mathrm{Ver}$ in the FPGA, i.e., we do not trust the coprocessor. In fact, we assume that the adversary controls the coprocessor and its interactions with the trusted FPGA logic; in particular, the adversary observes exactly what the coprocessor receives from the trusted FPGA logic, and the adversary may experiment with the trusted FPGA logic through the communication interface. As in Herder et al. [20], we assume that an adversary cannot succeed by physically attacking the trusted FPGA logic with its underlying RO PUF. We note that, in order to resist side-channel attacks on the FPGA logic itself, the logic needs to be implemented with side-channel countermeasures in mind. This paper gives a first proof-of-concept without such countermeasures added to its implementation.
In order to outsource Gaussian elimination to the coprocessor, the trusted FPGA logic reveals so-called “confidence information” about the RO PUF to the coprocessor, which is controlled by the adversary. This fundamentally differs from Herder et al. [20], where confidence information is kept private.
To prove security of our proposal, we introduce a mild assumption on the RO PUF output bits: we need to assume that the RO PUF output bits follow an “LPN-admissible” distribution for which the LPN problem is still conjectured to be hard. We argue that the RO PUF output distribution is LPN-admissible for two reasons (see Definition 1 and the discussion after Theorem 1):
Since we reveal the “confidence information”, we can implement a masking trick which, as a result, reveals only a small number of equations, making the corresponding LPN problem very short; we know that very short LPN problems are harder to solve than long ones.
Studies have shown that the RO pairs which define the RO PUF on the FPGA can be implemented in such a way that correlation among RO pairs is extremely small; this means that the RO PUF creates almost independent and identically distributed (i.i.d.) noise bits in the LPN problem, and, for i.i.d. noise bits, the LPN problem has been well studied and conjectured to be hard for decades.
Our main insight is that “confidence information” does not have to be kept private; this is opposed to what has been suggested in Herder et al. [20]. As a result of our design decisions:
The area of our implementation on FPGA is 49.1 K LUTs (Lookup Tables) and 58.0 K registers in total. This improves over the estimated area overhead of 65.7 K LUTs and 107.3 K registers for the original construction of Herder et al. [20].
The throughput of our implementation is 1.52 K $\mathrm{Gen}$ executions per second and 73.9 $\mathrm{Ver}$ executions per second.
According to our experiments on real devices, even though the underlying RO PUF has a measured maximum error rate (i.e., the fraction of RO pairs producing a wrong response bit) of 8% for temperatures from 0 to 70 degrees Celsius, $\mathrm{Ver}$ correctly reconstructed responses in all 1000 measurements.
Organization: This paper is organized as follows: In Section 2, we introduce the necessary background and the original LPN-based PUF design of [20]. Our simplified and implementation-friendly LPN-based PUF construction is described in Section 3. The implementation details and security analysis, with a discussion of admissible LPN distributions, are provided in Section 4 and Section 5, respectively. We compare our work with related work in Section 6. The paper concludes in Section 7.
3. Our Construction
Performing Gaussian elimination in hardware incurs a significant area overhead. For this reason, we propose a hardware-software co-design of LPN-based PUFs in which Gaussian elimination is pushed to the untrusted software; i.e., only the hardware components are assumed to be trusted in that they are not tampered with, and the adversary can only observe the interactions between hardware and software.
As in the original construction, our LPN-based PUF has a Physical Obfuscated Key (POK), which always measures the same response, at its core. It also has two modes of operation: $\mathrm{Gen}$ (Figure 3) and $\mathrm{Ver}$ (Figure 4). Each instance of $\mathrm{Gen}$ creates a challenge-response pair. If a correctly created challenge is given to the LPN-based PUF operating in $\mathrm{Ver}$ mode, then the corresponding response is regenerated. In Figure 3 and Figure 4, the functions in white boxes are implemented in hardware and are trusted. The processor (in the grey box) and the software running on it are considered untrusted; therefore, all of the computation executed by the software is verified by the trusted hardware in the diamond-shaped boxes.
Generation $\mathbf{Gen}$: Matrix $\mathbf{A}$ is selected at random by the manufacturer, and it can be the same for all devices. Therefore, its hash value can be hard-coded into the circuit to prevent adversaries from manipulating matrix $\mathbf{A}$. In our construction, a POK is implemented by $m$ Ring Oscillator pairs (RO pairs), where $m$ is the number of rows in matrix $\mathbf{A}$. Note that we use a POK, but our protocol supports a large number of challenge-response pairs, yielding a strong PUF overall.
The POK generates a vector $\mathbf{d}$ of count differences coming from the RO pairs. The sign of each entry in $\mathbf{d}$ gives a binary vector $\mathbf{e}$, and the absolute value of each entry in $\mathbf{d}$ represents confidence information $\mathbf{co}$. $\mathrm{Gen}$ selects a set $I$ of $2n$ bits from $\mathbf{e}$ for which the corresponding confidence information is at least some predefined threshold ${T}_{min}$.
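This selection step can be sketched as follows (Python; the count-difference range, the concrete threshold value, and the derivation of $\mathbf{e}$ from the sign are illustrative assumptions, not measured device behavior):

```python
# Hypothetical sketch of deriving e, co, and the index set I from RO-pair
# count differences d. The value range and threshold are illustrative.
import random

random.seed(7)
m, n, T_min = 450, 128, 30          # RO pairs, secret length, confidence threshold

d = [random.randint(-100, 100) for _ in range(m)]   # RO-pair count differences
e = [1 if di < 0 else 0 for di in d]                # sign of d_i -> bit e_i
co = [abs(di) for di in d]                          # |d_i| -> confidence co_i

# I: 2n positions whose confidence reaches the threshold T_min
reliable = [i for i in range(m) if co[i] >= T_min]
I = reliable[:2 * n]
```

With the paper's parameters ($m=450$, $n=128$), $m$ is chosen large enough that at least $2n=256$ positions exceed ${T}_{min}$ with high probability.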
The processor feeds the rows of $\mathbf{A}$ one by one into the hardware. By using a bit select module based on set $I$, the hardware extracts and feeds the rows of ${\mathbf{A}}_{I}$ to a hardware matrix-vector multiplier. The hardware matrix-vector multiplier takes input matrix ${\mathbf{A}}_{I}$, an input vector $\mathbf{s}$ generated by a True Random Number Generator (TRNG), and ${\mathbf{e}}_{I}$, and computes ${\mathbf{b}}_{I}={\mathbf{A}}_{I}\cdot \mathbf{s}\oplus {\mathbf{e}}_{I}$. We set all bits ${\mathbf{b}}_{i}=0$ for $i\notin I$ to generate the complete vector $\mathbf{b}$ of $m$ bits. This masking trick has the advantage that no unnecessary information is given to the adversary (and will allow us to have a reduction to a very short LPN problem). After $\mathbf{b}$ is constructed, hash values ${h}_{1}=\mathrm{H}(\mathbf{b},\mathbf{s},1)$ and ${h}_{0}=\mathrm{H}(\mathbf{b},\mathbf{s},0)$ are computed. The challenge is $({h}_{1},I,\mathbf{b})$ with response ${h}_{0}$.
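A scaled-down software model of this $\mathrm{Gen}$ data path might look as follows (the toy sizes and the byte-level encoding fed to SHA-256 are our own assumptions; the paper does not fix a byte-level hash format):

```python
# Scaled-down sketch of the Gen data path: b_I = A_I·s ⊕ e_I, the masking
# trick (b is all-zero outside I), and the two hashes h1 and h0.
import hashlib
import random

random.seed(3)
n, m = 8, 30                        # toy sizes (the paper uses n = 128, m = 450)
A = [[random.randrange(2) for _ in range(n)] for _ in range(m)]
e = [random.randrange(2) for _ in range(m)]
I = list(range(2 * n))              # pretend the first 2n positions are reliable
s = [random.randrange(2) for _ in range(n)]   # stand-in for the TRNG output

b = [0] * m                         # masking: bits outside I stay zero
for i in I:
    b[i] = (sum(a * x for a, x in zip(A[i], s)) % 2) ^ e[i]

def H(bits_b, bits_s, flag):
    """Hash of (b, s, flag); one byte per bit is an illustrative encoding."""
    return hashlib.sha256(bytes(bits_b) + bytes(bits_s) + bytes([flag])).hexdigest()

h1, h0 = H(b, s, 1), H(b, s, 0)     # challenge is (h1, I, b); response is h0
```

The flag byte separates the two hash evaluations, so the published ${h}_{1}$ reveals nothing about the response ${h}_{0}$.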
Since any software computation is considered untrusted, the hardware needs to verify the hash of $\mathbf{A}$. This circuitry (with hash $\mathrm{H}(\mathbf{A})$ embedded) is used in $\mathrm{Gen}$ mode to verify that the (untrusted) processor provides the correct $\mathbf{A}$ specified by the manufacturer, instead of an adversarially designed matrix. This is needed because the underlying LPN problem is only hard if $\mathbf{A}$ is a randomly chosen matrix; an adversarially designed $\mathbf{A}$ could leak the POK behavior. Similarly, matrix $\mathbf{A}$ also needs to be verified in $\mathrm{Ver}$ for the same reason.
Verification $\mathbf{Ver}$: In $\mathrm{Ver}$ mode, the adversary inputs a possibly corrupted or maliciously generated challenge $({I}^{\prime},{\mathbf{b}}^{\prime},{h}_{1}^{\prime})$ (in $\mathrm{Ver}$, all variables that should have the same values as the corresponding variables used in $\mathrm{Gen}$ are denoted by the original variable followed by a prime (${}^{\prime}$); e.g., $\mathbf{b}$ in $\mathrm{Gen}$ should be equal to ${\mathbf{b}}^{\prime}$ in $\mathrm{Ver}$ if it has not been maliciously manipulated). Before a corresponding response is reconstructed, the hardware in $\mathrm{Ver}$ mode needs to check whether $({I}^{\prime},{\mathbf{b}}^{\prime},{h}_{1}^{\prime})$ was previously generated in $\mathrm{Gen}$ mode.
The POK again measures count differences from its RO pairs and obtains ${\mathbf{e}}^{\prime}$ and ${\mathbf{co}}^{\prime}$. There are some errors in ${\mathbf{e}}^{\prime}$, so $\mathbf{e}\ne {\mathbf{e}}^{\prime}$. For indices $i$ corresponding to high confidence, we expect ${\mathbf{e}}_{i}^{\prime}\ne {\mathbf{e}}_{i}$ with (very) small probability. We use this fact to remove (almost all of) the noise from the linear system.
The POK observes which indices $i$ correspond to bits ${\mathbf{e}}_{i}^{\prime}$ with high confidence values; we call this set of reliable bit positions ${I}^{*}$, which should be of similar size to $I$. The POK then sends ${I}^{*}$ to the processor. The processor takes input challenge $c=({h}_{1}^{\prime},{I}^{\prime},{\mathbf{b}}^{\prime})$ and picks a subset ${I}^{\prime \prime}\subset {I}^{\prime}\cap {I}^{*}$ with $|{I}^{\prime \prime}|=n$ such that matrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ has full rank. We note that, by using a subset ${I}^{\prime \prime}$ of both ${I}^{\prime}$ and ${I}^{*}$, the probability that ${\mathbf{e}}_{i}\ne {\mathbf{e}}_{i}^{\prime}$ for $i\in {I}^{\prime \prime}$ is much smaller than for $i\in {I}^{\prime}$ or $i\in {I}^{*}$.
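The processor's subset-selection task can be sketched as follows (Python; the candidate pool standing in for ${I}^{\prime}\cap {I}^{*}$ and the retry strategy are illustrative assumptions, and rows are packed as integer bitmasks for compactness):

```python
# Illustrative sketch: pick I'' of size n from a candidate pool so that the
# chosen rows of A form a full-rank (invertible) matrix over GF(2).
import random

def gf2_rank(rows):
    """Rank over GF(2); rows are int bitmasks, reduced against a basis
    indexed by highest set bit."""
    basis = {}                       # highest set bit -> reduced row
    for row in rows:
        cur = row
        while cur:
            hb = cur.bit_length() - 1
            if hb not in basis:
                basis[hb] = cur
                break
            cur ^= basis[hb]
    return len(basis)

random.seed(5)
n = 32
pool = list(range(2 * n))            # stand-in for I' ∩ I* (size 2n here)
rows = {i: random.getrandbits(n) for i in pool}

# Retry random n-subsets until the submatrix has full rank; a random n×n
# GF(2) matrix is invertible with probability ≈ 0.29, so few tries suffice.
while True:
    I2 = sorted(random.sample(pool, n))
    if gf2_rank([rows[i] for i in I2]) == n:
        break
```

The retry loop terminates quickly in expectation, which is why this step is cheap for the processor but would be costly to implement in hardware.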
The processor computes and transmits to the hardware matrix-vector multiplier the inverse matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ of matrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$. Next, the rows of matrix ${\mathbf{A}}^{\prime}$ are fed one by one into the hardware and its hash is computed and verified against $\mathrm{H}(\mathbf{A})$. The rows corresponding to submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ are extracted (using a bit select functionality based on ${I}^{\prime \prime}$) and the columns of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ are fed into the hardware matrix-vector multiplier one by one. This verifies that ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ and ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ are inverses of one another (the equal identity matrix box in Figure 4 verifies that the columns of the identity matrix are produced one after another). The correctness of matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ is guaranteed by checking whether the hash of matrix ${\mathbf{A}}^{\prime}$ matches the hash value $\mathrm{H}(\mathbf{A})$ (as was done in $\mathrm{Gen}$). The correctness of ${\mathbf{A}}^{\prime}$ implies the correctness of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ (since ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ was fed as part of ${\mathbf{A}}^{\prime}$ into the hardware). Therefore, if all checks pass, then ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ is indeed the inverse of a properly corresponding submatrix of the matrix $\mathbf{A}$ used in $\mathrm{Gen}$. (The reason why this conclusion is important is explained in Section 5.)
Next, the hardware computes the vector ${\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ and multiplies this vector with matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ using the hardware matrix-vector multiplier:
$${\mathbf{s}}^{\prime}={({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}\cdot ({\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}).\phantom{\rule{2em}{0ex}}(1)$$
The (non-malleable) hash value ${h}_{1}^{\prime \prime}=\mathrm{H}({\mathbf{b}}^{\prime},{\mathbf{s}}^{\prime},1)$ is compared with the input value ${h}_{1}^{\prime}$ from the challenge.
Suppose that input ${h}_{1}^{\prime}$ was generated as ${h}_{1}^{\prime}=\mathrm{H}(\mathbf{b},\mathbf{s},1)$ for some $\mathbf{b}$ and $\mathbf{s}$. If ${h}_{1}^{\prime \prime}={h}_{1}^{\prime}$, we conclude that ${\mathbf{s}}^{\prime}$ is correct in that it is equal to the input $\mathbf{s}$ which was hashed into ${h}_{1}^{\prime}$ when ${h}_{1}^{\prime}$ was generated, and that ${\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ is correct in that it is equal to the input ${\mathbf{b}}_{{I}^{\prime \prime}}$ which was hashed into ${h}_{1}^{\prime}$ when ${h}_{1}^{\prime}$ was generated.
Since the adversary is not able to solve the LPN problem, the check ${\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}={\mathbf{b}}_{{I}^{\prime \prime}}$, together with the conclusion that ${\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ led to a proper solution ${\mathbf{s}}^{\prime}$ of the LPN problem by using the bits in the POK-generated vector ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$, implies that only the LPN-based PUF itself could have generated ${h}_{1}^{\prime}$ and, hence, the challenge. This means that the LPN-based PUF must have selected $\mathbf{s}$ and produced the input challenge with ${h}_{1}^{\prime}$ during an execution of $\mathrm{Gen}$. We note that vector ${\mathbf{s}}^{\prime}=\mathbf{s}$ can only be recovered if ${\mathbf{e}}_{{I}^{\prime \prime}}$ in the execution of $\mathrm{Gen}$ equals ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ in (1). We conclude that $\mathrm{Ver}$ is now able to generate the correct response ${h}_{0}^{\prime \prime}=\mathrm{H}({\mathbf{b}}^{\prime},{\mathbf{s}}^{\prime},0)=\mathrm{H}(\mathbf{b},\mathbf{s},0)$ (since ${h}_{0}^{\prime \prime}$ must have been the response that was generated by the LPN-based PUF when it computed the challenge with ${h}_{1}^{\prime}={h}_{1}^{\prime \prime}$).
If ${h}_{1}^{\prime \prime}\ne {h}_{1}^{\prime}$, then ${\mathbf{e}}_{{I}^{\prime \prime}}$ and ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ likely differ in only a few positions (if there is no adversary). By flipping up to $t$ bits in ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ (in a fixed pattern: first all single bit flips, next all double bit flips, etc.), new candidate vectors ${\mathbf{s}}^{\prime}$ can be computed. When the hash ${h}_{1}^{\prime \prime}$ verifies, the above reasoning applies and ${\mathbf{s}}^{\prime}=\mathbf{s}$ with the correct response ${h}_{0}^{\prime \prime}$. Essentially, since the bits we use for verification are known to be reliable, if the system is not under physical attack, then very likely there are no more than $t$ bit errors in ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$. Internally, we can try all possible error patterns on ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ with at most $t$ bit flips, and check the resulting ${h}_{1}^{\prime \prime}$ against ${h}_{1}^{\prime}$ to tell whether the current ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ was used in $\mathrm{Gen}$ or not.
If none of the $t$-bit flip combinations yields the correct hash value ${h}_{1}^{\prime}$, then the exception ⊥ is output. This decoding failure can be caused by attackers who feed invalid CRPs, or by a very large environmental change that results in more than $t$ bit errors in ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$, which can also be considered a physical attack. We note that allowing a $t$-bit error reduces the security parameter by approximately $t$ bits, since an adversary only needs to find an ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ within $t$ bit errors of ${\mathbf{e}}_{{I}^{\prime \prime}}$. In order to speed up the search for the correct ${\mathbf{s}}^{\prime}$, we use a pipelined structure which keeps injecting possible ${\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ (with at most $t$ bit flips) into the hardware matrix-vector multiplier.
Being able to recover ${\mathbf{s}}^{\prime}=\mathbf{s}$ is only possible if ${\mathbf{e}}_{{I}^{\prime \prime}}$ in the execution of $\mathrm{Gen}$ and ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$ in Equation (1) are equal up to $t$ bit flips. This is true with high probability if ${T}_{min}$ is large enough and ${I}^{\prime \prime}$ was properly selected as a subset of ${I}^{\prime}\cap {I}^{*}$. As explained by Herder et al. [20], $m$ should be large enough so that an appropriate ${T}_{min}$ can be selected with the property that, when the RO pairs are measured again, there will be a set ${I}^{\prime \prime}\subseteq I\cap {I}^{*}$ of $n$ reliable bits: in particular, according to the theoretical analysis in [20], for $n=128$, this corresponds to $m$ at most 450 RO pairs for operating temperatures from 0 to 70 degrees Celsius and an error probability (of not being able to reconstruct $\mathbf{s}$) less than ${10}^{-6}$. Readers are referred to Equation (2) in [20] for an estimation of ${T}_{min}$ given the distribution of RO outputs and the desired error rate.
Comparison with previous work: Our approach differs from the construction of Herder et al. in [20] in the following ways:
We output the indices of reliable bits to the untrusted processor, instead of keeping the positions of these reliable bits private inside the hardware. In Section 5, we argue that the distributions of ${\mathbf{e}}_{I}$ and, in general, $(\mathbf{e},\mathbf{co})$ are still LPN-admissible.
By masking $\mathbf{b}$ (i.e., making $\mathbf{b}$ all-zero outside ${\mathbf{b}}_{I}$), we can reduce the security to a very short LPN problem with $2n$ equations (corresponding to set $I$).
By revealing $I$ and ${I}^{*}$ to the processor, the processor can select a full-rank submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}$ with ${I}^{\prime \prime}\subseteq I\cap {I}^{*}$. This would consume more area if done in hardware.
Since the processor knows the selected submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}$, the processor can compute the inverse matrix. Hence, we do not need a complex Gaussian eliminator in hardware and we reuse the matrix-vector multiplier used in $\mathrm{Gen}$ mode.
Because the processor and the software executing on it are considered untrusted, we add mechanisms to detect manipulation of ${\mathbf{A}}_{{I}^{\prime \prime}}$ and ${\mathbf{b}}_{{I}^{\prime \prime}}$.
Matrix $\mathbf{A}$ does not need to be hard-coded in circuitry. Instead, only circuitry checking the hash of $\mathbf{A}$ is hard-coded in hardware, giving less area overhead.
4. Implementation
We implemented our construction on a Xilinx Zynq All Programmable SoC [22]. The Zynq platform contains an ARM-based processing system with programmable logic around it. Having a hard core embedded, this platform makes our hardware-software co-design implementation easier and more efficient in software execution. We implemented the software units on the ARM core and the hardware units in the programmable logic. The communication between these two parts is over an AXI Stream Interface in order to maximize the communication throughput. The FPGA layout of the implemented LPN-based PUF is shown in Figure 5.
We have 450 RO pairs for generating $\mathbf{e}$ with confidence information $\mathbf{co}$ as depicted in Figure 3 and Figure 4. Each RO has five stages, and the two ROs in each RO pair are manually placed in adjacent columns on the FPGA to minimize systematic effects [33] (see Figure 6). We measure the toggles at the output of each RO for 24 clock cycles to generate $\mathbf{e}$ and $\mathbf{co}$. In $\mathrm{Gen}$ mode, the module Index Selection compares vector $\mathbf{co}$ with a threshold ${T}_{min}$ to produce an index vector, which indicates the positions of all the reliable bits. This is used in the module Bit Selection to condense the 450-bit vector $\mathbf{e}$ to a 256-bit vector ${\mathbf{e}}_{I}$ by selecting 256 bits out of all the reliable bits. The set $I$, restricted to these 256 bits, is sent to the processor as part of the generated challenge.
Next, the processor sends matrix $\mathbf{A}$ (450 rows times 128 columns) to the hardware row by row. All the rows are fed into the hash function to compute $\mathrm{H}(\mathbf{A})$ to verify the correctness of $\mathbf{A}$. Only the rows in ${\mathbf{A}}_{I}$, which will be used later in a matrix multiplication, are selected and stored by the Row Selection module. Since we implemented a pipelined matrix-vector multiplier for multiplying a $128\times 128$ matrix with a 128-bit vector, $\mathrm{Gen}$ multiplies the $256\times 128$ submatrix ${\mathbf{A}}_{I}$ of $\mathbf{A}$ with a randomly generated vector $\mathbf{s}$ by loading this submatrix in two parts. After XORing ${\mathbf{e}}_{I}$, we obtain a 256-bit vector ${\mathbf{b}}_{I}$. The module Bit Expand adds zeroes to create the full 450-bit vector $\mathbf{b}$. After Bit Expand, we feed the 450 bits of $\mathbf{b}$ and the 128 bits of $\mathbf{s}$ to the hash module to compute ${h}_{1}=\mathrm{H}(\mathbf{b},\mathbf{s},1)$ and ${h}_{0}=\mathrm{H}(\mathbf{b},\mathbf{s},0)$. We implemented a multiple-ring-oscillator-based true random number generator [34] to generate the 128-bit vector $\mathbf{s}$, and SHA-256 [35] is implemented as the hash function.
In $\mathrm{Ver}$ mode, the 450 RO pairs are evaluated in the same way as in $\mathrm{Gen}$. Now, the module Index Selection generates index set ${I}^{*}$, which is sent to the processor. A correctly and non-maliciously functioning processor should take the intersection of ${I}^{*}$ and the correct set ${I}^{\prime}=I$, which was produced by $\mathrm{Gen}$. From this intersection, the processor selects an index set ${I}^{\prime \prime}$ such that the submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}={\mathbf{A}}_{{I}^{\prime \prime}}$ is invertible. Since a randomly selected 128 × 128 submatrix may not be invertible, the processor may need to try a few times before finding an invertible matrix. Matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ is streamed to the hardware row by row and stored in registers, which form the matrix input for the matrix-vector multiplier.
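The number of retries is small in expectation. If we approximate the candidate submatrix as uniformly random over GF(2) (an assumption, since the rows come from a fixed $\mathbf{A}$), it is invertible with probability $\prod_{k=1}^{128}(1-2^{-k})\approx 0.289$, so the expected number of independent tries is its reciprocal, about 3.46. A quick numerical check of that product:

```python
# Probability that a uniformly random 128×128 GF(2) matrix is invertible,
# and the expected number of independent draws until one is found.
p = 1.0
for k in range(1, 129):
    p *= 1.0 - 2.0 ** -k             # factor (1 - 2^{-k}) for rank growth
expected_tries = 1.0 / p             # geometric distribution mean
```

This matches the measured average of about 3.47 Gaussian eliminations per $\mathrm{Ver}$ reported with the throughput figures.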
Next, the processor streams ${\mathbf{A}}^{\prime}=\mathbf{A}$ into the hardware row by row. All the rows are fed into the hash function to compute $\mathrm{H}(\mathbf{A})$ to verify the correctness of $\mathbf{A}$. At the same time, the rows of ${\mathbf{A}}^{\prime}=\mathbf{A}$ are fed into the Row Selection module, which selects the 128 × 128 submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$. All the rows in ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ are temporarily saved in an array of shift registers. After all the rows of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ have been streamed in, the array of shift registers shifts out ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ column by column and reuses the pipelined matrix-vector multiplier of $\mathrm{Gen}$ to check whether the product of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ and each column of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ is the corresponding column of the identity matrix.
If the above two checks pass, then the inverse matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ is multiplied with ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ to recover ${\mathbf{s}}^{\prime}$. Here, the processor should have given ${\mathbf{b}}^{\prime}=\mathbf{b}$ to the hardware so that the Bit Selection module can be used to obtain ${\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}={\mathbf{b}}_{{I}^{\prime \prime}}$ (with ${I}^{\prime \prime}\subseteq {I}^{\prime}=I$). The recovered ${\mathbf{s}}^{\prime}$ is further verified by computing ${h}_{1}^{\prime \prime}=\mathrm{H}({\mathbf{b}}^{\prime},{\mathbf{s}}^{\prime},1)$. If ${h}_{1}^{\prime \prime}={h}_{1}^{\prime}$, then the hardware computes and outputs ${h}_{0}^{\prime}=\mathrm{H}({\mathbf{b}}^{\prime},{\mathbf{s}}^{\prime},0)$. According to the calculation in [20], we set $t=1$. This means that we need to exhaustively try all single-bit flips, so there are 129 possible ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ in total (these can be fed one by one into the pipelined matrix-vector multiplier). If none of these yields a correct ${h}_{1}^{\prime \prime}$, then the hardware outputs an all-zero vector to indicate a ⊥ response. Similarly, if any of the above checks (of the hash of $\mathbf{A}$ and of the inverse matrix) fails, then the hardware outputs the all-zero vector as well.
Our implementation results. The area of our full design on FPGA is 49.1 K LUTs and 58.0 K registers in total. The area utilization of each part is shown in Table 1. The three most costly components are two 128 × 128 register arrays and the 450 RO pairs, which form the underlying RO PUF. The dynamic power consumption of the complete implementation is 1.753 W, and its static power consumption is 0.163 W.
The throughput of our implementation is measured as 1.52 K $\mathrm{Gen}$ executions per second and 73.9 $\mathrm{Ver}$ executions per second. The execution time of $\mathrm{Gen}$ is dominated by the matrix transmission, which takes 91% of the overall execution time. However, $\mathrm{Ver}$ is dominated by the software Gaussian elimination, where each Gaussian elimination takes about 3880 $\mathsf{\mu}$s to finish, and each $\mathrm{Ver}$ requires 3.47 Gaussian eliminations on average.
Comparison. The original construction in [20] would need a hardware Gaussian eliminator in an implementation. The most efficient implementation of a pipelined Gaussian eliminator takes 16.6 K LUTs and 32.9 K registers on FPGA for a 128-row matrix [36]. In our design, we save this area by pushing the computation to the software.
One may argue that, in order to push Gaussian elimination to untrusted software, we have to add extra hardware to check the correctness of the inverse matrix. Notice, however, that for checking this inverse matrix we reuse the matrix-vector multiplier of $\mathrm{Gen}$. Therefore, the only additional hardware overhead is one 128 × 128 register array. If we were to do Gaussian elimination in hardware, then we would need registers to store the whole matrix ${\mathbf{A}}_{{I}^{*}\cap {I}^{\prime}}^{\prime}$ of size 128 × 256 and the output matrix of the Gaussian elimination, which is another 128 × 128 bits: this is because a random matrix constructed from 128 rows of ${\mathbf{A}}_{{I}^{*}\cap {I}^{\prime}}^{\prime}$ may not have full rank. As a result, the hardware might need to try a few times in order to find an invertible submatrix of ${\mathbf{A}}_{{I}^{*}\cap {I}^{\prime}}^{\prime}$. For these reasons, compared to our implementation in this paper, Gaussian elimination in hardware would cost an additional 128 × 128 register utilization together with the control logic for randomly selecting 128 rows out of a 256-row matrix.
If we were to implement the original construction in hardware, then its area overhead without additional control logic is estimated at 65.7 K LUTs (49.1 K + 16.6 K) and 107.3 K registers (58.0 K + 32.9 K + 16.4 K). This resource utilization would be larger than the available resources on our FPGA (53.2 K LUTs and 106.4 K registers).
Experimental Results. We characterized the error rate of RO pairs, defined as the percentage of error bits generated in one 450-bit vector $\mathbf{e}$. The error rate of the implemented 450 RO pairs at room temperature is $2.7\%$, which is in the range (2% to $4\%$) reported in a large-scale characterization of ring oscillators [4]. We measured the error rate of the 450 RO pairs under different temperatures from 0 to 70 degrees Celsius, where the output of the 450 RO pairs is compared to a reference output vector $\mathbf{e}$ generated at 25 degrees Celsius. We observed a maximum error rate of 8% (36 out of 450) over 1000 repeated measurements. This error rate is within the error correction bound (9%) estimated in [20]. We also characterized the bias of all the RO pair outputs: $\tau =0.47$ for our implementation.
We experimented with the whole system at different temperatures (0 °C, 25 °C and 70 °C). In all cases, $\mathrm{Ver}$ was able to reconstruct the proper response; no failure was observed over 1000 measurements at these temperatures. We did not test voltage variation and aging because the overall error rate is only affected by the error rate of the RO-pair output bits: as long as the error rate of the RO outputs stays below 9% [20], the current implementation regenerates the correct response with high probability. The overall error rate does not depend on how the errors in the RO pairs are introduced.
If a TRNG has already been implemented in the system, it can be reused for the LPN-based PUF as well. As this is a proof-of-concept implementation of the LPN-based PUF, we did not perform a comprehensive evaluation of our implemented TRNG.
Future Direction. In our implementation, the area of the LPN core mainly consists of two 128 × 128 bit register arrays for storing two matrices. It is possible to eliminate the storage of these two matrices and thereby significantly reduce the area, at the cost of a performance penalty.
The proposed alternative implementation works as follows: instead of storing two matrices for checking (in $\mathrm{Ver}$) whether ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ and ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ are indeed inverses of one another, we store at most one row or one column in the hardware at any time. In $\mathrm{Ver}$, we first feed in matrix ${\mathbf{A}}^{\prime}$ and let the hardware check its hash. At the same time, the rows of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ are selected and fed into a second hash engine to compute $\mathrm{H}({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})$, which is stored separately. The hardware, however, does not store any of the rows of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ (this avoids the need for a 128 × 128 bit register array). Notice that after this process the authenticity of matrix ${\mathbf{A}}^{\prime}$ has been verified; as a result, we know that the rows of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ equal the rows of ${\mathbf{A}}_{{I}^{\prime \prime}}$, hence the stored hash $\mathrm{H}({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})=\mathrm{H}({\mathbf{A}}_{{I}^{\prime \prime}})$, which can now be used to verify the submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ whenever it is loaded into the hardware again.
Next, matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ is fed into the hardware column by column. When a column is fed in, say the $i$th column, it is stored in the hardware temporarily. Then, the processor sends the whole matrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ to the hardware row by row. Its hash is computed on the fly and compared at the end with the stored hash $\mathrm{H}({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})=\mathrm{H}({\mathbf{A}}_{{I}^{\prime \prime}})$. At the same time, each received row of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ is multiplied (inner product) with the currently stored $i$th column of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$, and the hardware checks whether the product equals the corresponding bit of the $i$th column of the identity matrix. If this check passes for all rows of ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ and the hash verifies as well, then this column is added to the intermediate value register of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}\cdot ({\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime})$, depending on whether the $i$th bit of ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ equals 1.
In the above protocol, we also hash all received columns of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ and store the hash in a separate register. If the above checks pass for all columns, then we know that this hash must correspond to ${({\mathbf{A}}_{{I}^{\prime \prime}})}^{-1}$. This facilitates trying other possible versions of ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ in the future (where the processor again feeds matrix ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ column by column so that its hash and, at the same time, ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}\cdot ({\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime})$ can be computed).
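The streaming verification described above can be sketched in Python as follows (the function names are ours, SHA-256 stands in for the implementation's hash engine $\mathrm{H}$, and the toy dimensions replace the 128-bit rows of the real hardware):

```python
import hashlib
import numpy as np

def stream_verify(A_sub, inv_claim, y):
    """Ver-side check of a claimed inverse without storing a full matrix:
    for each column j of inv_claim, re-stream the rows of A_sub, hash
    them on the fly, and compare each inner product <row_i, col_j> with
    the identity matrix, while accumulating inv_claim . y over GF(2)."""
    n = A_sub.shape[0]
    # Stored reference hash H(A'_{I''}) = H(A_{I''}) from the first pass.
    expected = hashlib.sha256(A_sub.tobytes()).digest()
    acc = np.zeros(n, dtype=np.uint8)       # intermediate value register
    for j in range(n):
        col = inv_claim[:, j]               # only one column held at a time
        h = hashlib.sha256()
        for i in range(n):                  # rows of A_sub arrive one by one
            row = A_sub[i]
            h.update(row.tobytes())
            bit = int(row.astype(int) @ col.astype(int)) % 2
            if bit != (1 if i == j else 0):
                return None                 # inner product != identity bit
        if h.digest() != expected:
            return None                     # streamed matrix was tampered
        if y[j]:                            # j-th bit of e' xor b'
            acc ^= col                      # add this column to the result
    return acc                              # = inv_claim . y over GF(2)
```

The only persistent state is one column, one hash digest and the accumulator vector, which is what allows the alternative implementation to drop both 128 × 128 register arrays.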
The first trial/computation of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}\cdot ({\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime})$ requires the processor to feed in matrix ${\mathbf{A}}^{\prime}$ (450 × 128 bits) once, matrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ (128 × 128 bits) 128 times (once after each column of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ is fed in), and ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ (128 × 128 bits) once. If the first trial on ${\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime}$ fails, ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ must be fed in a few more times until the correct $\mathbf{s}$ is recovered; at that point only $\mathrm{H}({({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1})$ needs to be checked, so ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ does not need to be retransmitted each time. Therefore, we can estimate a throughput upper bound for this new implementation from the time measurements of our current implementation. Since the hardware computation time in our implementation does not dominate the overall computation time, we can estimate the performance of the alternative implementation by counting only software computation time and data transmission time. In $\mathrm{Gen}$, the alternative implementation has a similar execution time because the matrix can be multiplied with vector $\mathbf{s}$ row by row and output bit by bit. $\mathrm{Ver}$ in the alternative implementation requires transmitting 2,171,136 bits for the first trial/computation of ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}\cdot ({\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}\oplus {\mathbf{b}}_{{I}^{\prime \prime}}^{\prime})$.
Knowing that it takes about 600 $\mathsf{\mu}$s to send 57,600 bits in our implementation, transmitting 2,171,136 bits will require about 22,626 $\mathsf{\mu}$s; on average, Gaussian elimination must be tried 3.472 times to find an invertible matrix, which takes about 13,471 $\mathsf{\mu}$s. The throughput upper bound of this alternative implementation would thus be 27.0 $\mathrm{Ver}$ executions per second.
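The bit count above can be reproduced with a short back-of-the-envelope script. For the trial count, the 3.472 figure comes from the implementation's row-selection procedure; the idealized estimate below, for a uniformly random 128 × 128 matrix over GF(2), lands close to it at about 3.46:

```python
from math import prod

n = 128

# Bits streamed for the first trial of (A'_{I''})^{-1} . (e' xor b'):
bits = (450 * n       # matrix A' (450 x 128 bits), sent once
        + n * n * n   # A'_{I''} (128 x 128), re-sent after each column
        + n * n)      # (A'_{I''})^{-1} (128 x 128), sent once
# bits = 2,171,136, matching the count in the text.

# At ~600 us per 57,600 bits, the transfer alone takes ~22,616 us.
transfer_us = bits / 57_600 * 600

# Probability that a uniformly random n x n matrix over GF(2) is
# invertible; its reciprocal estimates the expected number of
# Gaussian-elimination trials.
p_inv = prod(1.0 - 2.0 ** -k for k in range(1, n + 1))
expected_trials = 1.0 / p_inv   # ~3.46 under the uniform-matrix model
```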
To implement this alternative solution, we can reuse some of the current components: Bit Select, Index Select, Bit Expand, ROs, TRNG and Communication. (Technically, Row Select can be reused as well, but in our current implementation Row Select is highly integrated with the matrix registers, so we cannot obtain a separate area utilization figure for Row Select without the matrix registers.) We need to double the hash circuitry because two hash engines must run at the same time. The area of the LPN core can be reduced significantly; a lower bound (without the state machine controlling all the components) on its area utilization is 2 K LUTs and 4.8 K registers. Adding the utilization of the other components, the total size would be at least 23.4 K LUTs and 24.8 K registers.
If the area needs to be reduced further, we recommend implementing this alternative solution at the cost of a 1/3 lower $\mathrm{Ver}$ throughput.
The comparison between our implementation, the estimate for the original construction, and the alternative implementation is summarized in Table 2.
5. Security Analysis
We adopt the following security definition from Herder et al. [
20]:
Definition 2. A PUF defined by two modes $\mathrm{Gen}$ and $\mathrm{Ver}$ is ϵ-secure with error δ if $\mathrm{Ver}$ fails to reconstruct the response for a challenge generated by $\mathrm{Gen}$ with probability at most δ, and for all probabilistic polynomial time (PPT) adversaries $A$, ${\mathrm{Adv}}_{\mathit{PUF}}^{\mathrm{s}\mathrm{uprd}}(A)\le \epsilon$, which is defined in terms of the following experiment [20].

Algorithm 1 Stateless PUF Strong Security

1: procedure ${\mathrm{Exp}}_{\mathbf{PUF}}^{\mathrm{s}\mathrm{uprd}}(A)$
2: $A$ makes polynomially many queries to $\mathrm{Gen}$ and $\mathrm{Ver}$.
3: When this query phase is over, $A$ returns a pair $(\mathbf{c},\mathbf{r})$.
4: if $A$ returns $(\mathbf{c},\mathbf{r})$ such that $\mathbf{r}$ is a valid response for challenge $\mathbf{c}$ and $(\mathbf{c},\mathbf{r})$ was not the result of an earlier query
5: then return 1
6: else return 0
7: end procedure
The $\mathrm{s}\mathrm{uprd}$ advantage of $A$ is defined as ${\mathrm{Adv}}_{\mathit{PUF}}^{\mathrm{s}\mathrm{uprd}}(A)=\mathrm{Pr}[{\mathrm{Exp}}_{\mathbf{PUF}}^{\mathrm{s}\mathrm{uprd}}(A)=1]$. For our construction (reusing the proof in [20]), the security game in Definition 2 is equivalent to one in which the adversary $A$ does not make any queries to $\mathrm{Ver}$.
The adversary in control of the processor can repeatedly execute $\mathrm{Gen}$ and $\mathrm{Ver}$ and receive various instances of the sets $I$ and ${I}^{*}$. This information can be used to estimate confidence information, which in turn gives the adversary information about $\mathbf{co}$. Therefore, in the following theorem we assume the strongest adversary, who has full knowledge of $\mathbf{co}$.
Theorem 1. Let ${\chi}_{2n}$ be the conditional distribution of ${\mathbf{e}}_{I}=\mathrm{sign}({\mathbf{d}}_{I})$ given $\mathbf{co}=|\mathbf{d}|$ and given an index set $I$ with $|I|=2n$ and $I\subseteq \{i:{T}_{min}<{\mathbf{co}}_{i}=|{\mathbf{d}}_{i}|\}$. If the distribution ${\chi}_{2n}$ is LPN-admissible, then the proposed PUF construction has ${\mathrm{Adv}}_{\mathit{PUF}}^{\mathrm{s}\mathrm{uprd}}(A)$ negligible in n in the random oracle model.
The proof uses similar arguments to those found in Sections VIII.A and VII.C.2 of [20], with three differences: (1) the adversary who wants to break LPN takes an LPN instance $(\mathbf{A},\mathbf{b})$, where $\mathbf{A}\in {\{0,1\}}^{2n\times n}$ and $\mathbf{b}\in {\{0,1\}}^{2n}$; thus, the related LPN problem is restricted to only $2n$ equations, instead of m equations (m > 2n) in [20]; (2) the distribution of $\mathbf{e}$ is ${\chi}_{2n}$, which is conditioned on $\mathbf{co}$, instead of $\chi$ in [20]; (3) in our construction, the queries to the random oracle are $(\mathbf{b},\mathbf{s},0)$ and $(\mathbf{b},\mathbf{s},1)$, which differ from the $(\mathbf{b},\mathbf{s})$ and $\mathbf{s}$ used in the original construction; this does not affect the ability to recover $\mathbf{s}$ from the lookup table constructed in the original proof in [20].
The above theorem concerns LPN-admissible distributions for very short LPN problems (i.e., the number of linear equations is $2n$, twice the length of $\mathbf{s}$). Short LPN problems are harder to solve than longer LPN problems [32]. Thus, we expect a larger class of LPN-admissible distributions in this setting.
In our implementation, $\mathbf{d}$ is generated by RO pairs on the FPGA. It has been shown in [33] that, across FPGAs, the behavior of ROs correlates: if one depicts the oscillating frequency as a function of the spatial location of the RO, this mapping looks the same across FPGAs. This means that RO pairs at the same spatial location behave in a unique way across different FPGAs: an adversary may program its own FPGAs from the same vendor and measure how the output of RO pairs depends on spatial locality on its own FPGAs (which is expected to be similar across FPGAs).
The spatial locality of one RO pair does not influence the behavior of another RO pair on the same FPGA. However, if the output of one RO pair is known, the adversary can refine its knowledge about the spatial-locality dependence; in this sense, RO pairs at different spatial locations on the same FPGA become correlated. Since two neighboring ROs are affected almost identically by systematic variations, the correlation (even with knowledge of confidence information) between RO-pair outputs generated by physically adjacent ROs is conjectured to be very small. This claim is also verified experimentally in [33]. We conclude that different RO pairs show almost i.i.d. behavior if all bits are generated by comparing neighboring ROs. However, even though the larger part of the spatial locality is canceled out, conditioned on the adversary's knowledge of how spatial locality influences RO pairs, an RO pair's output does not look completely unbiased with $\tau =0.5$. In general, however, $\tau >0.4$ (this corresponds to the inter-Hamming distance between RO PUFs) [20,33]. Hence, the 450 RO pairs appear to output independent random bits, i.e., Bernoulli distributed with bias $\tau >0.4$. Since the conjectured hardness of LPN states that Bernoulli distributions (with much smaller bias) are LPN-admissible, it is very likely that our implementation generates an LPN-admissible distribution.
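Under this idealized model, the noise distribution fed into the LPN problem can be sampled as follows (a Python sketch with our own parameter names; the bias τ = 0.47 matches our measured value, and only $2n$ equations are generated, as in the short-LPN setting above):

```python
import numpy as np

def lpn_instance(n=128, tau=0.47, seed=None):
    """Sample a short LPN instance (A, b) with b = A s xor e over GF(2),
    where e is i.i.d. Bernoulli(tau): the idealized model of the 2n
    RO-pair output bits discussed above."""
    rng = np.random.default_rng(seed)
    m = 2 * n                                    # only 2n equations
    A = rng.integers(0, 2, size=(m, n), dtype=np.uint8)
    s = rng.integers(0, 2, size=n, dtype=np.uint8)
    e = (rng.random(m) < tau).astype(np.uint8)   # Bernoulli(tau) noise
    b = (A.astype(np.int64) @ s + e) % 2
    return A, b, s, e
```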
As a final warning, we stress that replacing the 450 RO pairs by, e.g., a much smaller (in area) ring-oscillating arbiter PUF introduces correlation, induced by how such a smaller PUF algorithmically combines a small pool of manufacturing variations into a larger set of challenge-response pairs. This type of correlation will likely not give rise to an LPN-admissible distribution (the confidence information may be used by the attacker to derive an accurate software model of the smaller PUF, which makes ${\chi}_{2n}$, as perceived by the adversary, a low-entropy source distribution).
Including $\mathbf{b}$ and $\mathbf{s}$ in the hash computation. We analyze why $\mathbf{s}$ and $\mathbf{b}$ must be included in ${h}_{1}$:
Parameter $\mathbf{s}$. $\mathbf{s}$ is the only dynamic variable in the design, so it ensures that challenge-response pairs are unpredictable. (We cannot directly use $\mathbf{s}$ as part of the challenge or response itself, as this would give the adversary information about $\mathbf{e}$.)
Parameter $\mathbf{b}$. This inclusion is due to a technicality in Definition 2 (one bit flip in $\mathbf{b}$ likely yields a new valid challenge-response pair).
Checking the hash of $\mathbf{A}$. We note that $\mathrm{Gen}$ checks whether the adversary provides the correct matrix $\mathbf{A}$ as input by verifying the hash of $\mathbf{A}$. If this check is not done, then the adversary can manipulate matrix $\mathbf{A}$ and, in particular, submatrix ${\mathbf{A}}_{{I}^{\prime \prime}}^{\prime}$ and its inverse ${({\mathbf{A}}_{{I}^{\prime \prime}}^{\prime})}^{-1}$ in $\mathrm{Ver}$. This leads to the following attack: suppose the inverse of the manipulated matrix is close to the original inverse ${({\mathbf{A}}_{{I}^{\prime \prime}})}^{-1}$, with only one bit flipped in column j. Let $\mathbf{C}$ be the all-zero matrix with only this one bit flipped in the jth column. Then, $\mathrm{Ver}$ computes
$({\mathbf{A}}_{{I}^{\prime \prime}}^{-1}\oplus \mathbf{C})\cdot ({\mathbf{b}}_{{I}^{\prime \prime}}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime})=\mathbf{s}\oplus \mathbf{C}{\mathbf{A}}_{{I}^{\prime \prime}}\mathbf{s}\oplus ({\mathbf{A}}_{{I}^{\prime \prime}}^{-1}\oplus \mathbf{C})({\mathbf{e}}_{{I}^{\prime \prime}}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}).$
Since $\mathrm{Ver}$ repeats this computation by flipping at most t bits in ${\mathbf{b}}_{{I}^{\prime \prime}}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime}$, we may assume that the term $({\mathbf{A}}_{{I}^{\prime \prime}}^{-1}\oplus \mathbf{C})({\mathbf{e}}_{{I}^{\prime \prime}}\oplus {\mathbf{e}}_{{I}^{\prime \prime}}^{\prime})$ equals zero for one of these computations. This means that $\mathrm{Ver}$ outputs the correct response based on ${\mathbf{s}}^{\prime}=\mathbf{s}$ only if $\mathbf{C}{\mathbf{A}}_{{I}^{\prime \prime}}\mathbf{s}=\mathbf{0}$. Due to the specific choice of $\mathbf{C}$, this happens if and only if the jth row of ${\mathbf{A}}_{I}$ has inner product zero with $\mathbf{s}$. By observing whether $\mathrm{Ver}$ outputs a valid response or ⊥, the adversary can learn whether this inner product equals 0 or 1, which leaks information about $\mathbf{s}$.
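To illustrate the leak, the toy Python experiment below (our own code, with small dimensions and no noise, i.e., $\mathbf{e}={\mathbf{e}}^{\prime}$) flips one bit in column j of the inverse and confirms that the recovered secret equals $\mathbf{s}$ exactly when the jth row of the matrix has inner product 0 with $\mathbf{s}$:

```python
import numpy as np

def gf2_inv(A):
    """Invert A over GF(2) by Gaussian elimination on [A | I]."""
    n = A.shape[0]
    M = np.concatenate([A % 2, np.eye(n, dtype=np.uint8)], axis=1)
    r = 0
    for c in range(n):
        piv = next((k for k in range(r, n) if M[k, c]), None)
        if piv is None:
            raise ValueError("singular")
        M[[r, piv]] = M[[piv, r]]
        for k in range(n):
            if k != r and M[k, c]:
                M[k] ^= M[r]
        r += 1
    return M[:, n:]

def attack_observation(A, s, j):
    """Adversary flips one bit in column j of the claimed inverse and
    watches whether the secret is still recovered. With matching noise
    vectors, Ver computes s xor C.A.s, so recovery succeeds iff
    <row_j(A), s> = 0 over GF(2)."""
    n = A.shape[0]
    e = np.zeros(n, dtype=np.uint8)        # e = e' for simplicity
    b = (A.astype(int) @ s + e) % 2        # b = A.s xor e
    C = np.zeros((n, n), dtype=np.uint8)
    C[0, j] = 1                            # single flipped bit, column j
    inv_bad = gf2_inv(A) ^ C               # manipulated inverse
    s_rec = (inv_bad.astype(int) @ (b ^ e)) % 2
    return np.array_equal(s_rec, s)        # True <=> <row_j(A), s> = 0
```

Running this over all columns j recovers one inner-product equation about $\mathbf{s}$ per manipulated query, which is exactly the leak the hash check on $\mathbf{A}$ prevents.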
Machine Learning Resistance. Current strong PUF designs are delay-based and vulnerable to Machine Learning (ML) attacks. In order to obtain a strong PUF design with provable security, we bootstrap a strong PUF from a weak PUF (also called a POK) by including a digital interface within the trusted computing base. This makes the new LPN-based strong PUF from [20] provably secure, as its security reduces to the hardness of the LPN problem (at the price of not being lightweight, due to the larger digital interface). In other words, no ML attack can succeed against the LPN-based PUF unless the ML approach can solve LPN.