Key Distribution for Post Quantum Cryptography using Physical Unclonable Functions

Lattice and code cryptography can replace existing schemes such as Elliptic Curve Cryptography because of their resistance to quantum computers. In support of public key infrastructures, the distribution, validation, and storage of the cryptographic keys then become more complex because the keys are longer. This paper describes practical ways to generate keys from physical unclonable functions, for both lattice and code based cryptography. Handshakes between client devices containing the PUFs and a server are used to select sets of addressable positions in the PUFs, from which streams of bits called seeds are generated on demand. The public and private cryptographic key pairs are computed from these seeds together with additional streams of random numbers. The method allows the server to independently validate the public key generated by the PUF, and to act as a certificate authority in the network. Technologies such as high-performance computing and graphics processing units can further enhance security by preventing attackers from performing this independent validation when equipped with less powerful computers.


Introduction
In most public key infrastructure (PKI) schemes for applications such as cryptographic currencies, financial transactions, secure mail, and wireless communications, the public keys are generated from private keys with RSA and elliptic curve cryptography (ECC). These private keys are natural numbers, typically 3,000 bits long for RSA and 256 bits long for ECC. For example, in the case of ECC, the primitive element of the elliptic curve cyclic group is multiplied by the private key to find the public key. It is now anticipated that Quantum Computers (QC) will be able to break both RSA and ECC when the technology to manufacture enough quantum nodes becomes available. The paper entitled "A Riddle Wrapped in an Enigma" by N. Koblitz and A. J. Menezes suggested that the ban of RSA and ECC by the National Security Agency is unavoidable, and that the risk posed by QCs is only one element of the problem [1]. Plans to develop post quantum cryptographic (PQC) schemes have been proposed to secure blockchains by Kiktenko et al. [2], and for cryptocurrency security by Semmouni et al. [3], even if the timeline for availability of powerful QCs is highly speculative. Recently Campbell et al. [4], and Kampanakis et al. [5], have proposed distributed ledger cryptography and digital signatures with PQC.
Lattice-based algorithms exploit the hardness of problems such as the Closest Vector Problem (CVP), learning with errors (LWE), and learning with rounding (LWR), and share some similarities with the knapsack cryptographic problem.

Learning with Error cryptography
The LWE problem, a variant of the CVP, was first introduced by Regev [16]: the knowledge of the integer-based vector t and the matrix A with t = A.s1 cannot hide the vector s1; however, the addition of a "small" vector of error s2 with t = A.s1 + s2 makes it hard to distinguish the vectors s1 and s2 from t. The vector s2 needs to be small enough for the encryption/decryption cycles, but large enough to prevent a third party from uncovering the private key (s1; s2) from the public information (t; A). The public-private cryptographic key pair generation for client device i can be based on polynomial computations in a lattice ring, and is described in Fig. 1:
1-The generation of a first data stream called seed a(i) that is used for the key generation; in the case of LWE, the seed a(i) is shared openly in the network.
2-The generation of a second data stream called seed b(i) that is used to compute the private key Sk(i); the seed b(i) is kept secret.
3-The public key Pk(i) is computed from both data streams and is openly shared.
4-The matrix A(i) is generated from seed a(i).
5-The two vectors s1(i) and s2(i) are generated from seed b(i).
6-The vector t(i) is computed: t(i) ← A(i) . s1(i) + s2(i); the public key Pk(i) is {seed a(i); t(i)}.
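As an illustration, the key generation steps above can be sketched with toy parameters; the dimensions, modulus, and error range below are illustrative stand-ins, far smaller than those of any real LWE scheme:

```python
import numpy as np

# Toy LWE key generation following the steps above.
q, n, m = 3329, 16, 64
rng_a = np.random.default_rng(1)   # plays the role of seed a(i) (public)
rng_b = np.random.default_rng(2)   # plays the role of seed b(i) (secret)

A = rng_a.integers(0, q, size=(m, n))   # matrix A(i) expanded from seed a(i)
s1 = rng_b.integers(0, q, size=n)       # secret vector s1(i) from seed b(i)
s2 = rng_b.integers(-1, 2, size=m)      # "small" error vector s2(i)
t = (A @ s1 + s2) % q                   # public vector t(i)
# Public key: {seed a(i); t}; private key: (s1, s2)
```

A third party seeing only (A, t) cannot simply solve for s1 with Gaussian elimination because of the added error s2.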

Figure 1: Example of public-private key generation for LWE based cryptography.
A digital signature algorithm (DSA) can be realized from the LWE instance by first generating a public-private key pair as in Figure 1. The secret key is then used to sign a message, and the public key is used to verify this signed message. In CRYSTALS-Dilithium [8], the authors use a Fiat-Shamir with Aborts approach [17] for their signing and verification procedure. The outline of the signing procedure is as follows:
1-Generate a masking vector of polynomials y.
2-Compute the vector A.y and set w1 to be the high-order bits of the coefficients in this vector.
3-Create the challenge c as the hash of the message and w1.
4-Compute the intermediate signature z = y + c.s1.
5-Set the parameter β to be the maximum coefficient of c.s1.
6-If any coefficient of z is larger than γ1 − β, then reject and restart at step 1.
7-If any coefficient of the low-order bits of A.z − c.t is greater than γ2 − β, then reject and restart at step 1.
Note: γ1, γ2, and β are set such that the expected number of repetitions is between 4 and 7.
The general outline of the verification procedure is given by the following: Compute w1' to be the high-order bits of A.z -c.t and accept if all coefficients of z are less than γ1 -β and if c is the hash of the message and w1'.
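The rejection ("abort") logic of the signing loop can be illustrated with a deliberately simplified, non-cryptographic sketch; the bounds gamma1 and beta, the scalar challenge c, and the coefficient count below are hypothetical stand-ins for Dilithium's real polynomial arithmetic:

```python
import random

random.seed(2024)
gamma1, beta, n = 2**17, 275, 8                  # toy bounds, not Dilithium's
s1 = [random.randint(-2, 2) for _ in range(n)]   # small secret coefficients

def sign_attempt(c):
    # Step 1: masking coefficients y drawn from (-gamma1, gamma1)
    y = [random.randint(-(gamma1 - 1), gamma1 - 1) for _ in range(n)]
    # Step 4: intermediate signature z = y + c.s1
    z = [yi + c * si for yi, si in zip(y, s1)]
    # Step 6: abort whenever a coefficient of z could leak s1
    if any(abs(zi) >= gamma1 - beta for zi in z):
        return None
    return z

attempts, z = 0, None
while z is None:                                 # restart on every abort
    attempts += 1
    z = sign_attempt(c=random.choice([-1, 0, 1]))
```

The abort bound makes the distribution of the accepted z independent of s1, which is the point of the Fiat-Shamir with Aborts technique.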
Encapsulation allows two parties to securely share a symmetric key by encapsulating the key in a ciphertext. When both parties have the symmetric key, they are then able to use a symmetric-key encryption algorithm (e.g., AES) to communicate. These algorithms are known as key encapsulation mechanisms (KEM); a few examples from NIST are SABER [18], Classic McEliece [14][15], CRYSTALS-KYBER [19], and NTRU [20]. The process of using encapsulation with LWE/LWR is described below:
• The public and private keys of both parties are constructed as described in Figure 1.
• Person A sends Person B their public key.
• Person B randomly generates a symmetric key and encapsulates it in a ciphertext with the public key of Person A.
• Person B sends the ciphertext to Person A.
• Person A decapsulates the ciphertext with their private key. Both parties now have the symmetric key in their possession.
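The encapsulation flow above can be demonstrated end to end with a toy Regev-style LWE scheme; all parameters below are illustrative and offer no real security. Person B encapsulates a 16-bit symmetric key one bit at a time against Person A's public key:

```python
import numpy as np

rng = np.random.default_rng(7)
q, n, m = 3329, 16, 64

# Person A's key pair (constructed as in Figure 1)
A = rng.integers(0, q, size=(m, n))
s = rng.integers(0, q, size=n)               # secret vector
e = rng.integers(-1, 2, size=m)              # small error vector
b = (A @ s + e) % q                          # public key: (A, b); private key: s

def encapsulate_bit(bit):
    # Person B hides one key bit using only A's public key
    r = rng.integers(0, 2, size=m)           # random row-subset selector
    return (r @ A) % q, (r @ b + bit * (q // 2)) % q

def decapsulate_bit(c1, c2):
    # Person A recovers the bit with the private key s
    v = (c2 - c1 @ s) % q
    return int(min(v, q - v) > q // 4)       # closer to q/2 means bit = 1

sym_key = rng.integers(0, 2, size=16)        # Person B's fresh symmetric key
ciphertexts = [encapsulate_bit(int(kb)) for kb in sym_key]
recovered = [decapsulate_bit(c1, c2) for c1, c2 in ciphertexts]
assert recovered == list(sym_key)            # both parties now share the key
```

Decapsulation works because c2 − c1.s = r.e + bit·⌊q/2⌋, and the accumulated error r.e (at most 64 here) stays well below q/4.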

Learning with Rounding cryptography
The learning with rounding problem was first introduced by Banerjee [21]. It is the derandomized version of learning with errors, which deterministically generates the noise of the LWE by rounding coefficients. This eliminates the noise sampling and significantly reduces the bandwidth [22]. LWR is proven to be as hard to solve as LWE; hence, it remains secure for use in cryptographic applications. In schemes such as "Saber", a constant vector h is added so that the rounding operation can be simulated by bit shifting, therefore playing a similar protective role to the error vectors of LWE [18]. Saber, which is one of the NIST finalists in the key encapsulation category, uses LWR for key generation in public key encryption and key encapsulation. All three steps of the PKE and the KEM are described below:

Saber PKE Key Generation
1-Similar to LWE, seed a(i) is used to generate the matrix A(i).
2-Seed b(i) is used to generate the secret vector s(i).
3-The vector t(i) is computed: t(i) ← A(i) . s(i) + h(i).
4-Both seed a(i) and t(i) become the public key Pk(i).
5-s(i) becomes the private key Sk(i).

Saber PKE Encryption
1-The seed a(i) and the vector t(i) are extracted from the public key to encrypt the message m.
5-v'(i) is used to encrypt the message m, which is denoted cm.

Saber PKE Decryption
2-The message m' is decrypted by reversing the computations with v(i) and cm.

The Saber key encapsulation mechanism has three steps: Saber KEM Key Generation, Saber KEM Encapsulation, and Saber KEM Decapsulation.
Saber KEM Key Generation
1-Saber PKE key generation is used to return seed a(i), t(i), and s(i).
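Saber's power-of-2 moduli make the rounding that replaces LWE's error sampling a simple bit shift. A minimal sketch, with hypothetical toy moduli q = 2^13 and p = 2^10 in place of Saber's actual parameter set:

```python
import numpy as np

eps_q, eps_p = 13, 10
q, p = 1 << eps_q, 1 << eps_p
n = 8

rng = np.random.default_rng(3)
A = rng.integers(0, q, size=(n, n))
s = rng.integers(-4, 5, size=n)              # small secret vector
h = 1 << (eps_q - eps_p - 1)                 # the rounding constant h

As = (A @ s) % q
# Adding h then shifting right rounds A.s from mod q down to mod p:
# no error is sampled; the "noise" comes from the dropped low bits.
t = ((As + h) >> (eps_q - eps_p)) % p
```

Because q and p are powers of 2, the modular reductions are masks and shifts, which is exactly why rejection sampling is unnecessary in Saber.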

NTRU cryptography
Cryptographic algorithms such as FALCON, which uses NTRU (N-th degree TRUncated polynomial ring) arithmetic, are also based on lattice cryptography. The parameters of the scheme include a large prime number N, a large number q, and a small number p, the latter two being used for modulo arithmetic. Two numbers df and dg are used to truncate the polynomials f(i) and g(i). The key generation cycle for client device (i), as shown in Fig. 2, is the following:
1-Generation of the two truncated polynomials f(i) and g(i) from seed a(i) and seed b(i). The polynomials f(i) and g(i) are not always usable; they are subject to pre-conditions such as being invertible modulo p and q. The client device needs to try several possible random numbers and select the ones giving acceptable private keys.
2-Computation of the inverses Fp(i) and Fq(i), and of the public key h(i) ← p . Fq(i) . g(i).
Once the public and private keys are available, the encryption of the plaintext message m, m ∈ {-1, 0, 1}^N, is done by finding a random polynomial r, r ∈ {-1, 0, 1}^N, which uses a corresponding parameter dr, and calculating the ciphertext with the equation e ≡ r.h + m (mod q). To retrieve m from e, we first calculate a ≡ f.e (mod q) and lift the coefficients of a to be between -q/2 and q/2. Then a (mod p) is equal to m [23].
NTRU lattices can also be applied to DSA. This was originally introduced in NTRUSign, but NIST submissions such as Falcon expand on these algorithms [9]. Falcon utilizes the GPV framework applied to NTRU lattices; that is, the public key is a long basis for an NTRU lattice while the private key is a short basis. From here, the message m is mapped to a non-lattice point c utilizing a random value salt and a hash function H. Using the short basis, a user signs by finding the closest vector v to c. The signature is (salt, s = c − v), verified by checking that s is short and that H(msg ‖ salt) − s is a point on the lattice (verified using the long basis) [24].
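The polynomial arithmetic underlying these NTRU operations is cyclic convolution in the truncated ring, followed by a center lift of the coefficients. A minimal sketch with toy parameters (N, q, and p below are illustrative, not secure NTRU values):

```python
# Toy arithmetic in the NTRU ring Z[X]/(X^N - 1)
N, q, p = 7, 41, 3

def conv(a, b):
    # Cyclic convolution: multiplication in the truncated ring, X^N = 1
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += a[i] * b[j]
    return c

def center_lift(a, modulus):
    # Lift coefficients mod `modulus` so they are centered around 0
    return [((x + modulus // 2) % modulus) - modulus // 2 for x in a]

# X * X^(N-1) wraps around to X^N = 1
x1 = [0, 1, 0, 0, 0, 0, 0]
x6 = [0, 0, 0, 0, 0, 0, 1]
assert conv(x1, x6) == [1, 0, 0, 0, 0, 0, 0]
# 40 becomes -1 once centered, as required before the final mod-p step
assert center_lift([40], q) == [-1]
```

The center lift is the step that keeps the small combination p.r.g + f.m intact before reducing modulo p during decryption.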

Code-based cryptography
Code-based algorithms such as Classic McEliece are implemented with binary Goppa codes, that is, Goppa codes with underlying computations in finite Galois fields GF(2^m). The parameters are an irreducible polynomial of degree t, the field exponent m, and the code length n. The resulting code has an error-correction capability of t errors; the information-containing part of the code word has a size of k = n − m·t, and the code has a generator matrix G of size k × n [14][15].
The block diagram of Fig. 3 shows an example of public-private key generation for code-based cryptography and client device i.
1-Seed a(i) is used to create a random invertible binary k × k scrambling matrix S(i).
2-Seed b(i) is used to create a random n × n permutation matrix P(i).
3-The public key, the k × n matrix Ĝ(i) ← S(i) . G . P(i), is computed.

Figure 3: Example of public-private key generation for code-based cryptography.
Given a generator matrix G of a binary Goppa code, an irreducible polynomial of degree t, the field exponent m, and the code length n, the encryption process involves the following steps:
1-Create the public key Ĝ(i) as described above.
2-Multiply the message m by Ĝ(i), creating the ciphertext message m.Ĝ(i).
3-Add a random error vector e of Hamming weight t to m.Ĝ(i) to obtain the ciphertext c.
Given a ciphertext c, a decoding algorithm, and the private key {G; S(i)^-1; P(i)^-1}, decryption involves the following steps:
1-Multiply c by P(i)^-1.
2-Use the decoding algorithm to correct the errors and obtain m.S(i).
3-Multiply by S(i)^-1 to recover the message m.
One example of a decoding algorithm is Patterson's algorithm. This algorithm calculates the error-locator polynomial, whose roots correspond to the locations of the error bits added to the encrypted message. Its first step, given the syndrome polynomial s and the Goppa polynomial g of degree t, computes t = s^-1 mod g [25]. Once the error-locator polynomial is found, the Berlekamp Trace Algorithm can be used to find the roots of the polynomial via factorization; these roots correspond to the locations of the error bits added to the message [26].
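The scramble-encode-permute structure of the public key, and the corresponding unscrambling during decryption, can be demonstrated by substituting a tiny Hamming(7,4) code (which corrects t = 1 error) for a real binary Goppa code; all matrices below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), A])        # systematic generator, k x n
H = np.hstack([A.T, np.eye(3, dtype=int)])      # parity-check matrix

# Secret scrambler S (invertible over GF(2)) with a precomputed inverse
S = np.array([[1,1,0,0], [0,1,1,0], [0,0,1,1], [0,0,0,1]])
S_inv = np.array([[1,1,1,1], [0,1,1,1], [0,0,1,1], [0,0,0,1]])

perm = rng.permutation(7)                       # secret permutation P
inv_perm = np.argsort(perm)

G_hat = ((S @ G) % 2)[:, perm]                  # public key: G_hat = S.G.P

def encrypt(m):
    e = np.zeros(7, dtype=int)
    e[rng.integers(7)] = 1                      # error vector of weight t = 1
    return ((m @ G_hat) % 2 + e) % 2

def decrypt(c):
    c1 = c[inv_perm]                            # undo the permutation P
    s = (H @ c1) % 2                            # syndrome locates the error
    if s.any():
        c1[np.where((H.T == s).all(axis=1))[0][0]] ^= 1
    return (c1[:4] @ S_inv) % 2                 # unscramble m.S with S^-1

m = np.array([1, 0, 1, 1])
assert (decrypt(encrypt(m)) == m).all()
```

Syndrome decoding here stands in for Patterson's algorithm: with one error, the syndrome equals the column of H at the error position.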

Public-private key pairs
As part of a PKI, the public-private key pairs can be used to securely transmit shared secret keys through KEM, and to digitally sign messages with DSA; see Fig. 4. The public key Pk(2) of client 2 encapsulates the shared secret key of client 1, which can only be viewed by client 2 thanks to its private key Sk(2) that reverses the encapsulation. Client 1 uses its private key Sk(1) to digitally sign a message that is verified with the public key Pk(1), providing non-alteration and non-repudiation in the transaction. The trust and integrity of such an architecture rely on the following: i. The secure generation and distribution of the public-private key pairs to the client devices that are participating in the PKI. ii. The identification of the client devices, and trust in their public keys.
iii. The sharing of the public keys among participants.
Most PKIs rely on certificate authorities (CA) and registration authorities (RA) to offer such an environment of trust and integrity. The architecture is vulnerable to several threats, including loss of identity, man-in-the-middle attacks, and side-channel attacks in which the private keys are exposed during KEM and DSA.

PKI with network of PUFs
The use of networks of PUFs can mitigate the vulnerabilities of PKIs. PUF technology exploits the variations created during fabrication to differentiate each device from all other devices, acting as a hardware "fingerprint" [27][28][29]. Solutions based on PUFs embedded in the hardware of each node can mitigate the risk of an opponent reading the keys stored in non-volatile memories. The keys for the PKI can be generated on demand, with one-time use; stealing a key becomes useless, as new keys are needed for each transaction. During enrollment cycles, the images of the PUFs are stored in look-up tables in the CA, see Fig. 5; enrollment has to be done only once, in a secure environment. Handshake protocols [30] can select a portion of the PUFs, and of their images stored in the CA, to extract a data stream that generates the key pairs. The PUFs can be erratic; therefore the generation of cryptographic keys, the focus of this work, is challenging. A single-bit mismatch in a cryptographic key is not acceptable for most encryption protocols. Therefore, the use of error-correcting code (ECC) methods, helper data, and fuzzy extractors can minimize the levels of errors [31][32][33]. An alternative method is one in which the CA has search engines, such as response-based cryptography (RBC), that can handle the validation of erratic keys [34][35][36][37][38].
The RBC engine that validates the public keys shown in Figure 5 generates public/private key pairs until the generated public key matches the client's provided key. The server searches over a seed (e.g., a 256-bit seed) and uses that seed for key generation. If the generated public key matches the client's public key, then the client is authenticated. If the public keys do not match, then the server flips one bit of the seed at a time (increasing the Hamming distance) until the public keys match. Thus, the search is carried out by generating public/private key pairs, iterating over the seed and increasing the Hamming distance, until a seed is found that reproduces the client's public key. The search space for a 256-bit key is 2^256, and it would be nearly impossible to authenticate a user in a fixed time without the use of parallel computing. HPC and GPU technologies are valuable for enhancing the ability of the CA to validate the public keys generated by the client devices. For instance, graphics processing units (GPUs) can be employed to parallelize and accelerate the authentication process. By using a GPU, the server can search over multiple keys in parallel.
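The RBC search can be sketched as follows, with SHA-256 standing in (purely for illustration) for a PQC scheme's deterministic seed-to-public-key generation:

```python
import hashlib
from itertools import combinations

def keygen(seed: bytes) -> bytes:
    # Stand-in for deterministic PQC key generation from a seed
    return hashlib.sha256(seed).digest()

def flip(seed: bytes, bit_positions) -> bytes:
    out = bytearray(seed)
    for pos in bit_positions:
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

def rbc_search(server_seed: bytes, client_pub: bytes, max_dist: int = 3):
    # Try seeds at increasing Hamming distance from the server's copy
    nbits = len(server_seed) * 8
    for d in range(max_dist + 1):
        for positions in combinations(range(nbits), d):
            candidate = flip(server_seed, positions)
            if keygen(candidate) == client_pub:
                return candidate, d          # match: client authenticated
    return None, None                        # give up: reject the key

server_seed = bytes(32)                      # server-side image of the seed
client_seed = flip(server_seed, (5, 77))     # client's PUF drifted by 2 bits
found, dist = rbc_search(server_seed, keygen(client_seed))
assert found == client_seed and dist == 2
```

Each Hamming distance d adds C(256, d) candidates, which is why the search parallelizes naturally across GPU threads.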

Implementation of PQC algorithms for PKI.
The CRYSTALS-Dilithium digital signature algorithm consists of the following procedures: key generation, signing, and verification. These procedures are computationally bounded by two operations: multiplication in the polynomial ring ℤq[X]/(X^n + 1), and matrix/vector expansion via an eXtendable Output Function (XOF). Therefore, any attempt to optimize Dilithium should target these operations. We describe below literature that focuses on such optimizations.
The operation of polynomial multiplication has a quasi-linear time complexity bounded by the Number Theoretic Transform (NTT) implementation, and the operation of expansion via XOF is bounded by the SHAKE-128 implementation. Using the AVX2 instruction set, matrix and vector expansion is optimized by using a vectorized SHAKE-128 implementation that operates on 4 sponges that can absorb and squeeze blocks in parallel. Additionally, Ducas et al. [8] use the AVX2 instruction set to optimize the NTT thus speeding up the polynomial ring multiplication by about a factor of 2. This optimization is achieved by interleaving the vector multiplications and Montgomery reductions so that parts of the multiplication latencies are hidden.
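For reference, the ring multiplication that the NTT accelerates can be written naively as a schoolbook negacyclic convolution, using the reduction X^n ≡ −1 (toy n with a Kyber-like q below, for illustration only):

```python
q, n = 3329, 8

def ring_mul(a, b):
    # Negacyclic convolution in Zq[X]/(X^n + 1): wrap-around uses X^n = -1
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

# X * X^(n-1) = X^n = -1 in this ring
x1 = [0, 1, 0, 0, 0, 0, 0, 0]
x7 = [0, 0, 0, 0, 0, 0, 0, 1]
assert ring_mul(x1, x7) == [q - 1, 0, 0, 0, 0, 0, 0, 0]
```

This O(n^2) loop is what the quasi-linear NTT-based multiplication replaces.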
Nejatollahi et al. [39] outline two different works that optimize the NTT using an Nvidia GPU. The first reports higher throughput polynomial multiplication [40] and the second is a performance evaluation between several versions of the NTT, including iterative NTT, parallel NTT, and CUDA-based FFT (cuFFT) for different polynomial sizes [41]. Strictly algorithmic optimizations of the NTT are presented in other works [42][43]. Longa et al. [42] show that limiting the coefficient length in polynomials to 32 bits yields an efficient modular reduction technique. By employing this new technique in NTT, reduction is only required after multiplication, and significant performance gains are achieved when compared to a baseline implementation. Additionally, the authors use signed integer arithmetic which decreases the number of add operations necessary in both sampling and polynomial multiplication. Greconici et al. [43] use signed integer arithmetic to decrease the number of add operations which leads to performance gains in several functions including NTT and SHAKE-128. The authors also employ a merging layers technique in NTT that reduces the number of loads and stores by about a factor of 2.
The SABER KEM algorithm is similarly computationally bounded by polynomial multiplication and hashing functions. As mentioned by D'Anvers et al. [18], since SABER uses power-of-2 moduli, the need for rejection sampling is eliminated and modular reduction is fast, using bit shift operations. However, one drawback of using power-of-2 moduli is the inability to take advantage of faster NTT multiplication, since the moduli are not prime. As described above, Akleylek et al. [41] examine the performance of different multiplication techniques. By implementing a version of cuFFT in a similar fashion for SABER, we may observe a speedup in polynomial multiplication. In addition, SABER is computationally bounded by hashing and extendable-output functions. SABER uses the SHA3-256 and SHA3-512 functions for hashing and SHAKE128 as an XOF. Roy et al. [44] demonstrate parallelizing SHAKE128 using AVX2 and batching four operations, thus achieving a 38% increase in throughput for SABER's key generation. Optimizing the hashing functions and SHAKE128 in a different way, the SABER technical documentation describes replacing the SHA3 functions with SHA2 and replacing SHAKE128 with AES in counter mode [18].
Focusing on three PQC algorithms, SABER, CRYSTALS-Dilithium, and NTRU, a breakdown of the fraction of time spent (as a percentage) in the hashing/XOF and polynomial multiplication components of the algorithms is reported in Table 1. NTRU spends the majority of its time doing polynomial multiplication first and hashing second [45], but no benchmarks have been calculated for it thus far. The times spent in the hashing and polynomial multiplication components of CRYSTALS-Dilithium and SABER are reported as percentages of the total execution time of the key pair generation procedure, where the percentages are an average of 10 time trials.

PUF-based key distribution for LWE lattice cryptography.
The proposed generic protocol to generate public-private key pairs with PUFs for LWE lattice cryptography is shown in Fig. 6. The random number generator (a) is used for the generation of seed a(i), which is public information. However, Seed K, which is needed for the generation of the private key Sk(i), is generated from the PUF. The outline of a protocol generating a key pair for LWE cryptography is the following:
1-The CA uses a random number generator and a hash function to point at a set of addresses in the image of the PUF-i.
2-From these addresses, a stream of bits called Seed K' is generated by the CA.
3-The CA communicates to Client (i), through a handshake, the instructions needed to find the same set of addresses in the PUF.
4-Client (i) uses the PUF to generate the stream of bits called Seed K. The two data streams Seed K and Seed K' are similar, but slightly differ from each other due to natural physical variations and drifts occurring in PUFs.
[If needed, Client (i) applies error-correcting codes (ECC) to reduce the difference between Seed K and Seed K'; the corrected, or partially corrected, data stream is used to generate the vectors s1(i) and s2(i).]
5-Client (i) independently uses a random number generator (a) to generate a second data stream, Seed a(i), which is used for the computation of the matrix A(i).

6-The matrix A(i) is generated from Seed a(i).
7-The vector t(i) is computed: t(i) ← A(i) . s1(i) + s2(i).
8-The public key Pk(i) is {Seed a(i); t(i)}.
9-Client (i) communicates the public key Pk(i) to the CA through the network.
10-The CA uses a search engine to verify that Pk(i) is correct. The search engine initiates the validation by generating a public key from Seed a(i) and Seed K' with the lattice cryptography code. If the resulting public key is not Pk(i), an iterative process gradually injects errors into Seed K' and computes the corresponding public keys. The search converges when a match in the resulting public key is found, or when the CA concludes that the public key is bad.
11-If the validation is positive, the public key Pk(i) is posted online by the RA.
This protocol is applicable for single-use key pairs that are generated for each transaction. The random number generators of the first step of the protocol can generate new data streams, which point at different portions of the PUFs, and can thereby trigger the generation of new key pairs. The search engine described above can benefit from noise injection and high-performance computing. The injection of noise in Seed K will make the search too difficult for a CA unless it is equipped with HPCs or GPUs. This can preclude hostile CAs from participating.
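The mismatch between Seed K and Seed K' in step 4, and the bracketed error-correction step, can be illustrated with a simple temporal majority vote over repeated PUF reads; this is a deliberately simplified stand-in for the ECC, helper-data, and fuzzy-extractor methods cited earlier, with the flipped positions hard-coded for reproducibility:

```python
def flip_bits(bits, positions):
    out = list(bits)
    for pos in positions:
        out[pos] ^= 1
    return out

master = [1, 0, 1, 1, 0, 0, 1, 0] * 32   # enrolled 256-bit PUF image (Seed K')
# Three noisy client reads; here no cell happens to err in two reads at once
reads = [flip_bits(master, [3, 50]),
         flip_bits(master, [7, 99]),
         flip_bits(master, [123])]
seed_k = [int(sum(col) >= 2) for col in zip(*reads)]  # per-cell majority vote
assert seed_k == master   # Seed K now matches the enrolled Seed K'
```

When residual errors survive the vote, the RBC search engine on the CA side absorbs them, as described in step 10.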

PKI architecture with PUF-based key distribution and LW.
The PUF-based key pair generation scheme with LWE cryptography, as presented in the previous section, can be integrated in a PKI securing a network of i clients. The example of Fig. 7 shows two client devices communicating directly, either by exchanging secret keys through KEM or by using DSA. The client devices independently generate the seed a(i), while the PUFs and their images are used for the independent generation of the vectors s1(i) and s2(i). The role of the CA is to check the validity of the vectors t(i), and to transmit both the seeds a(i) and the vectors t(i) to the RA, which maintains a ledger with the valid public keys. Such an architecture is secure assuming the following conditions: i. The enrollment process in which the PUFs are characterized to generate their images is accurate and was not compromised by an opponent. ii. The database stored in the CA that contains the images of the PUFs for the i client devices is protected from external and internal attacks. iii. The PUFs embedded in each client device are reliable, unclonable, and tamper resistant. iv. The key generation process, KEM, and DSA are protected from side-channel analysis.
As we experimentally verified that the latencies of the key generation process from the PUFs are low enough, such a protocol can be used to change the key pairs after each encryption cycle. Therefore, the potential loss of the secret keys during an encryption/decryption cycle has minimal impact, as different keys will be used during the subsequent cycles.

PUF-based key distribution for LWR lattice cryptography.
There are some similarities between the LWE and LWR implementations. The Seed K of the PUF is used to generate only one vector, s1(i), while a constant vector h(i) can be generated independently. The public vector t(i) is computed in a similar way: t(i) ← A(i) . s1(i) + h(i).

PUF-based key distribution for NTRU lattice cryptography.
The protocol to generate key pairs from PUFs for NTRU cryptography is similar to the one presented above in section 4.1 for LWE; see Fig. 8. We suggest a method where the only source of randomness used to compute both the public key Pk(i) and the private key Sk(i) is the PUF, through Seed K. In our implementation, Seed K feeds the hash functions SHA-3 and SHAKE to generate a long stream of bits, from which the two polynomials f(i) and g(i) are computed.
As previously discussed in section 2.3, the polynomials f(i) and g(i) are not always usable due to the pre-conditions; therefore, a scheme to try several possible addressings of the PUF has to be developed. One way is to implement a deterministic method that is known by both the client device and the CA, which can have a negative impact on the latencies. We preferred the solution driven by the client device, which asks the CA to initiate new handshakes. The summary of the method used to generate the key pairs for NTRU cryptography is the following:
1-The CA uses random numbers to point at a set of addresses in the image of the PUF-i.
2-From these addresses, a stream of bits called Seed K' is generated by the CA.
3-The CA sends the handshake to Client (i) to find the same addresses.
4-Client (i) uses the PUF to generate Seed K.
5-Client (i) applies error correction (ECC) on Seed K and generates the truncated polynomials f(i) and g(i).
6-Computation of Fp(i) and Fq(i), and verification that the pre-conditions are fulfilled.
7-If needed, ask for a new handshake and iterate.
8-Computation of the public key: h(i) ← p . Fq(i) . g(i).

PUF-based key distribution for code-based cryptography.
An example of a protocol to generate the key pairs with PUFs for Code-based cryptography is shown in Fig. 9. The overall protocol is similar to the one presented above for lattice cryptography.
As is done with NTRU, the only source of randomness is Seed K, which is generated from the PUF to compute the two matrices S(i) and P(i).
The brief outline of a protocol generating key pairs for code-based cryptography is the following:
1-The CA uses random numbers to point at a set of addresses in the image of the PUF-i.
2-From these addresses, a stream of bits called Seed K' is generated by the CA.
3-The CA sends the handshake to Client (i) to find the same addresses.
4-Client (i) uses the PUF to generate Seed K.
5-Client (i) applies error correction (ECC) on Seed K and generates the matrices S(i) and P(i).

Experimental section
To demonstrate the principles and protocols discussed in sections 3 and 4, we designed a simple, small-scale experiment. Five cryptosystems were selected as candidates for consideration: AES, ECC, qTESLA, CRYSTALS-Dilithium, and SABER. However, each of these cryptosystems has a list of parameter sets as part of its specification. To narrow down the scope, we chose parameter sets that were inherently compatible with a 256-bit output from a hypothetical PUF, as well as ones that best balanced speed, size, and security for IoT devices. PQC PKI cryptosystems that remained through NIST's round 3 were heavily favored, and a strong enough NIST security level was also an important consideration. For these reasons, the parameter sets AES256, ECC Secp256r1, qTESLA-p-I, CRYSTALS-Dilithium 2, and LightSABER were used as a performance comparison.

Experimental Methodology.
As of the time of writing, there are few implementations of RBC engines. A previously established algorithmic approach targeted at HPC was chosen and scaled down to work using OpenMP instead of MPI on a single machine. This engine does not have to rely on the scalability of distributed memory nor deal with the complications of MPI communication, so a flag in shared memory protected by critical sections was used instead. All implementations utilized the same overall structure and key iteration mechanism proposed in the distributed memory variant. Compilation with AVX was therefore carried over and maintained for a fair comparison between all variants; however, further optimizations can be made by taking advantage of AVX2 or other wide vector technologies. The AES256 implementation takes advantage of the AES-NI instruction set, whereas all other cryptosystems tested do not use any extra instruction sets beyond the vector-based ones such as AVX and SSE.
RBC engines targeting purely CPU platforms were considered for demonstrative purposes only. The purpose of this experiment is to compare the relative performance of all five chosen cryptosystems. The ease of porting one cryptosystem to another entirely on the CPU influenced the scope of the experiments. Future experimental evaluations focusing on GPUs will require more dedicated, specialized programming for each cryptosystem. The CPU used for the experiments was an AMD Ryzen 9 3900X 12-core CPU, with SMT (hyperthreading) and PBO (an opportunistic auto-overclocking feature) enabled.

Evaluation of the effective key throughput
For our performance metric used to compare the RBC cryptosystem implementations, we chose what we coin as the "effective key throughput." Given a small enough number of mismatches between the keys, overhead of the setup and clean up phases takes over. Thus, when computing the key throughput (the number of keys searched per second), an insufficient number of mismatches will reflect inaccurately on maximum effective throughput possible. To combat this issue, we took each cryptosystem and iteratively found the minimum Hamming distance before the key throughput levels off. This is what we refer to as the "effective key throughput." For AES256, the minimum Hamming distance is 4, while for ECC Secp256r1, qTESLA-p-I, CRYSTALS-Dilithium 2, and LightSABER the minimum Hamming distance is 3. Unfortunately, due to the intractable nature of the problem, the single bit error jump from a Hamming distance of 3 to 4 makes it infeasible to run a statistically sufficient number of runs for ECC and qTESLA-p-I. For this reason, the AES256 benchmarks ran at a Hamming distance of 4, and the remaining cryptosystems ran at a Hamming distance of 3.
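The size of the search space at each Hamming distance explains why distance 4 becomes intractable for the slower cryptosystems; a quick calculation, using qTESLA-p-I's measured throughput as a hypothetical example:

```python
from math import comb

# Number of candidate seeds an RBC engine must test at each Hamming
# distance d from a 256-bit reference seed
counts = {d: comb(256, d) for d in range(5)}
assert counts[3] == 2_763_520
assert counts[4] == 174_792_640          # ~63x jump from distance 3 to 4

# Hypothetical wall-clock estimate for an exhaustive search up to d = 4
# at qTESLA-p-I's measured ~2.24e4 keys per second
seconds = sum(counts.values()) / 2.24e4  # on the order of two hours per run
```

At that rate, repeating the distance-4 search enough times for statistical significance is impractical, which motivated fixing the benchmarks at the distances described above.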
Similar to the distributed memory experiments, the experiments were run by randomly generating a key, then choosing the combination of N-bit difference that sat approximately at the middle point of a thread's workload. This was done to reduce the number of iterations needed to reach a statistical central point. Thus, 10 iterations were performed for each cryptosystem, and the median was taken from the set of 10 iterations as the statistic of interest, to reduce the influence of background processes. In Figure 10, the medians of the RBC cryptosystems' effective key throughputs are plotted against each other. The AES256 implementation, aided by AES-NI, runs several orders of magnitude more efficiently than the public key cryptography variants, at 2.17 × 10^8 keys per second. ECC Secp256r1 performed second slowest at 4.77 × 10^4 keys per second. The post-quantum cryptosystems largely performed better than ECC, with 1.97 × 10^5 and 6.83 × 10^5 keys per second for CRYSTALS-Dilithium 2 and LightSABER respectively. qTESLA-p-I was the worst-performing PQC, and the worst cryptosystem overall out of all five, at 2.24 × 10^4 keys per second. To better get a sense of the relative scaling, we set ECC Secp256r1's effective key throughput as the reference point, since we are interested in how the PQC algorithms perform when replacing it in future PKI cryptosystems. This is plotted in Figure 11, where the response variable is displayed as a percentage of throughput relative to ECC Secp256r1's. As shown there, AES256 is roughly 4550 times more performant than ECC Secp256r1. CRYSTALS-Dilithium 2 is over 4.14 times more efficient than ECC Secp256r1. The most efficient PQC was LightSABER at 14.3 times faster, and the worst overall cryptosystem was qTESLA-p-I at 0.469 times ECC's throughput.
From these results, we strongly advise against qTESLA in an RBC environment, a conclusion only reinforced by NIST's round 3 decision to set this scheme aside. Out of what was tested, this leaves CRYSTALS-Dilithium as the strongest candidate for DSA in a PQC environment. For key encapsulation, our results show that SABER is a strong candidate thanks to its relatively fast key generation. Future testing might consider comparing FALCON against CRYSTALS-Dilithium for DSA, and CRYSTALS-KYBER, NTRU, and Classic McEliece against SABER for KEM.
Figure 11: Maximum effective throughput relative to the performance of ECC Secp256r1 (as a percentage) achieved for each RBC cryptosystem implementation with an AMD Ryzen 9 3900X.

Conclusion and future work
The PQC algorithms under standardization are encouraging: the latencies are reasonable, making the protocols suitable for PKIs securing networks of client devices and IoTs. The generation, distribution, and storage of the public-private key pairs for PQC can be complex because the keys are usually very long. This paper proposes to generate the public-private key pairs by replacing the random number generators with data streams generated from addressable PUFs to obtain the seeds needed by the PQC algorithms. Unlike the key pairs computed by the PQC algorithms, the seeds are relatively short, typically 256 bits long. The use of PUFs as a source of randomness is applicable to all five lattice-based codes under consideration in round 3 of the NIST investigation, and to the code-based Classic McEliece scheme. In order to simultaneously generate key pairs from a server acting as the certificate authority and a client device having access to its PUF, it is critical to handle the bit error rates (BERs) that are frequent with physical elements. We verified in the experimental section that the RBC can find the erratic seeds by testing in excess of 10^5 seeds per second with CRYSTALS-Dilithium 2 and LightSABER, which is faster than what we measured with mature algorithms such as those based on elliptic curves.
In this work we have not yet studied the multivariate-based RAINBOW code, which is also an important scheme under consideration for standardization; we are currently studying ways to use