1. Introduction
A computing system is built from multiple layers of abstraction, from applications and operating systems down to the instruction set architecture and transistor-level design. Generally speaking, a higher layer of abstraction, such as an application or the OS, is built atop a stack of abstracted layers beneath it, which it must trust. The initial sources of this trust are known as roots of trust, which are typically hardware features. Such features must be small and secure by design to perform one or more security-related functions, such as verifying software, authenticating a device, and protecting cryptographic keys. Silicon physically unclonable functions (PUFs) have been proposed as such roots of trust [1].
At the circuit level, a PUF harnesses device parametric variations to create a binary digital mapping from an m-bit input challenge, c, to an n-bit response, r. The tuple (c, r) is called a challenge–response pair (CRP) of the PUF. PUFs with a limited challenge set (low values of m) are called Weak PUFs [2] and are used for key generation. PUFs with an exponential number of CRPs are called Strong PUFs [2] and primarily target authentication applications. Various circuits have been proposed to harness manufacturing process variations [3] to create Strong or Weak PUFs. SRAM-based PUFs, which exploit start-up values in SRAM cells [4,5], and Arbiter PUFs [6], which exploit signal races, have received the most attention.
The salient properties that make a PUF attractive for security applications are its tamper-proof nature and uniqueness. It has been claimed that PUFs are tamper-proof because they exploit inherent disorder in the manufacturing process, and any attempt to break the package or delayer the metal interconnections will change that physical disorder [2].
Uniqueness, on the other hand, is obtained by exploiting inherent variations of the semiconductor manufacturing process to create unique identifiers/keys. Thus, PUFs cannot be cloned even by the manufacturer or the designer. These properties provide a major advantage over non-volatile memory based key storage, which has been shown to be vulnerable to physical attacks [7]. SRAM PUFs are the most popular Weak PUFs. An SRAM array usually powers up in the same initial state [4], and this state varies from chip to chip. This is the basis of the SRAM PUF function [8].
Unfortunately, PUF circuit characteristics are affected by environmental variations, noise, and aging. These effects impact the repeatability of the response, which is called reliability in the context of PUFs. Reliability is a key design concern in Weak PUFs, as their responses are typically used for cryptographic key generation (or identification) and must be robust to prevent data corruption downstream.
The primary focus of this work is improving the reliability of Weak PUFs. First, we propose a new voter-based technique that reduces the error rate of SRAM-based PUF responses by harnessing their statistical bias. Next, we present analytical results on the error rate of the voter design. To enable a complete solution, we then propose a Design for Test (DFT)/testing-based approach that capitalizes on the voter-based characterization to improve the overall reliability of the system. We have simulated example designs to measure the error rate, area, performance, and yield of the proposed method, and we compare it against prior approaches to demonstrate its benefits.
The paper is arranged as follows. In Section 2, we present the related background and literature. In Section 3, we present the proposed voter method along with the related circuit design. In Section 4, we analyze the proposed method in greater detail using random walk models and present DFT techniques to improve overall reliability. In Section 5, we discuss the results and design examples and differentiate our scheme from related schemes. Finally, we conclude the paper in Section 6 with insight into future work.
3. Proposed Technique
In this section, we describe the preliminaries for the proposed method. Then, we present our approach and the design of associated circuits. We conclude this section with observations on the simulated reliability improvement.
3.1. Harnessing Statistical Bias for Improving Reliability
An SRAM PUF cell is expected to produce a reliable value for key generation. If the difference in drive strength between the cross-coupled inverters is small, any noise present in the cell can determine the outcome of the PUF, creating errors. When an SRAM cell exhibits errors during start-up, it may still exhibit a statistical bias (towards 0 or 1) if it is powered up several times. That is, if we assume that the mean value of the noise coupled to the PUF cell is close to zero, multiple evaluations of the PUF's response reveal its true statistical bias. Temporal Majority Voting (TMV) is one of the well-known techniques for extracting such a bias in the presence of noise. If the statistical bias is strong, it may be detected using only a limited number of experiments. Conversely, if the bias is weak, a much larger number of start-up experiments may be necessary to detect the true bias. The problem with a larger number of start-up experiments is that the bookkeeping circuits grow in size. Our proposed technique avoids this problem while still extracting such a weak bias. Through simulation and analysis, we show that this technique is superior to traditional TMV.
3.2. Temporal Majority Voting
A simple way to reduce the error rate is to use a Temporal Majority Voting scheme [10,23]. For example, a simple 4-bit counter-based TMV counts from 0 to 15 and hence can be used for 15-way voting. If the count of ones after 15 evaluations of the cell's response is 8 or more (a majority), the final value is classified as 1; otherwise, it is classified as 0. The concept of TMV has been discussed extensively in previous works by Mathew et al. and Xiao et al. [10,23]. The mathematical model of TMV is a binomial counting process and, hence, the reduction in error rate can be calculated analytically. For example, when an N-way TMV (N odd) is used [24], a PUF cell whose error rate is $\varepsilon$ has a voted error rate of

$$P_{e}^{\mathrm{TMV}} = \sum_{i=(N+1)/2}^{N} \binom{N}{i}\, \varepsilon^{i}\,(1-\varepsilon)^{N-i}, \tag{1}$$

that is, the voted output is wrong only when at least $(N+1)/2$ of the $N$ evaluations are wrong. The circuit implementation of TMV typically consists of an n-bit counter, where $N = 2^n - 1$. The counter counts the number of ones; it is incremented by 1 if and only if the response from the SRAM cell is 1.
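The binomial model of TMV can be sketched in a few lines of Python. This is a minimal illustration under the assumption that evaluations err independently with a fixed rate `eps`; it is not the circuit of [10,23]. The function names `tmv_error_rate` and `tmv_vote` are hypothetical: the first evaluates the binomial tail (at least (N+1)/2 wrong votes out of N), and the second mimics the counter-based decision.

```python
from math import comb


def tmv_error_rate(eps: float, n_votes: int) -> float:
    """Voted error rate of N-way TMV, assuming independent errors with rate eps."""
    if n_votes % 2 == 0:
        raise ValueError("N must be odd so that a strict majority always exists")
    threshold = (n_votes + 1) // 2  # number of wrong votes needed to flip the result
    return sum(
        comb(n_votes, i) * eps**i * (1 - eps) ** (n_votes - i)
        for i in range(threshold, n_votes + 1)
    )


def tmv_vote(bits: list[int]) -> int:
    """Counter-based TMV: count the ones and compare with the majority threshold."""
    ones = sum(bits)  # the counter increments only when the cell outputs 1
    return 1 if ones >= (len(bits) + 1) // 2 else 0


n_bits = 4
n_votes = 2**n_bits - 1  # a 4-bit counter supports 15-way voting
print(tmv_error_rate(0.1, n_votes))
print(tmv_vote([1] * 8 + [0] * 7))  # 8 ones out of 15 is a majority -> 1
```

Note that the counter only needs to reach the majority threshold, so every additional counter bit roughly doubles N and the number of evaluations.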
A major disadvantage of TMV is its high cost when the statistical bias of an SRAM PUF cell is weak. This will be explored further in Section 4.
3.3. New Voter Design
In this subsection, we present our voter design. Even though we demonstrate the proposed technique using SRAM-based PUFs, the technique is generic and can be used with other classes of Weak PUFs to improve their reliability. The proposed voter is based on an UP/DOWN counter (UDC). A simple counter starts at an initial value of 0 and counts upwards. By contrast, the n-bit UP/DOWN counter used in this design starts at an initial value of $2^{n-1}$. The counter value is incremented if the response from the current trial is 1 and decremented if it is 0. When the counter reaches a terminal value of 0 (or $2^n - 1$), it saturates and retains the terminal value.
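A behavioral model of the saturating UP/DOWN counter may make the update rule concrete. The sketch below assumes the mid-range initial value $2^{n-1}$; the names `udc_init` and `udc_step` are hypothetical and describe behavior only, not the flip-flop implementation.

```python
def udc_init(n_bits: int) -> int:
    """Initial value of the n-bit UP/DOWN counter (assumed mid-range: 2**(n-1))."""
    return 1 << (n_bits - 1)


def udc_step(count: int, bit: int, n_bits: int) -> int:
    """One trial: count up on a 1, down on a 0, and hold once 0 or 2**n - 1 is reached."""
    top = (1 << n_bits) - 1
    if count in (0, top):  # terminal value reached: the counter saturates
        return count
    return count + 1 if bit else count - 1


c = udc_init(4)  # starts at 8
for b in [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]:
    c = udc_step(c, b, 4)
print(c)  # 15: the counter saturated at the logic-1 terminal value
```

Because a 0 undoes a 1, only the net excess of ones over zeros matters, which is what lets a small counter track a weak bias.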
The complete setup of the proposed voter scheme along with the PUFs is shown in Figure 3. The output of the PUF cell is used as the input to the n-bit UP/DOWN counter. As an example, the figure shows a setup where one UP/DOWN counter is shared by four PUF cells. Starting at an initial counter value of $2^{n-1}$, multiple trials are conducted until the n-bit UP/DOWN counter reaches a terminal value of 0 or $2^n - 1$. Unlike TMV, where the number of trials is fixed, trials with an UP/DOWN counter may in principle continue indefinitely until a terminal value is reached. When the counter reaches a terminal value of 0 (or $2^n - 1$), the value of the PUF cell is resolved as logic-0 (or logic-1), and the trials are terminated. In practice, the trials are capped at a predetermined number, so it is possible that no decision has been reached when the trials stop. The optimal values of n for varying error rates will be discussed in greater detail in Section 4.3. Multiple PUF cells can share a single UP/DOWN counter, or each cell can be assigned its own UP/DOWN counter, as done in previous designs [10]. The multiplexer is sized accordingly (4:1 in Figure 3). As expected, sharing the counter across multiple cells increases the run time but reduces the area overhead.
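The control flow of Figure 3 can be sketched behaviorally in Python. This is a minimal sketch, not the authors' implementation: it assumes the mid-range initial value $2^{n-1}$, represents each cell by a hypothetical callable that returns one noisy evaluation, and models counter sharing by resolving the multiplexed cells one after another with the counter re-initialized for each cell.

```python
import random


def resolve_cell(read_bit, n_bits: int, max_trials: int):
    """Run trials until the counter saturates or the trial cap is hit.

    Returns (decision, trials_used); decision is 1, 0, or None (unresolved).
    """
    top = (1 << n_bits) - 1
    count = 1 << (n_bits - 1)  # assumed mid-range initial value
    for t in range(1, max_trials + 1):
        count += 1 if read_bit() else -1
        if count == top:
            return 1, t  # resolved as logic-1
        if count == 0:
            return 0, t  # resolved as logic-0
    return None, max_trials  # no decision within the predetermined number of trials


def resolve_array_shared(cells, n_bits: int, max_trials: int):
    """One counter shared through a multiplexer: cells are resolved sequentially."""
    return [resolve_cell(cell, n_bits, max_trials) for cell in cells]


# Usage with four hypothetical cells, given as P(read = 1) per cell.
rng = random.Random(1)
probs = [0.95, 0.05, 0.8, 0.5]
cells = [lambda p=p: rng.random() < p for p in probs]
print(resolve_array_shared(cells, n_bits=4, max_trials=200))
```

Sharing keeps one counter but makes the total run time the sum of the per-cell trial counts, which matches the area/run-time trade-off described above.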
3.4. Circuit Design
The voting process described above needs multiple evaluations of the PUF output, but repeatedly powering up the circuit is inefficient. Instead, the SRAM cell can be modified with minimal changes to implement the scheme: rather than using start-up values during power-up, the circuit is converted to a pre-charge/discharge circuit. This modified circuit is shown in Figure 4a. The clock signal first enables the paths from the supply to both output nodes of the cell and pre-charges these nodes to the supply voltage. During evaluation, the output settles to logic-1 or logic-0 depending on process variation, that is, on the mismatch in the strengths of the discharge paths. The timing diagrams are shown in Figure 4b. The cell output is registered by the counter, as shown in Figure 3, on the rising edge of the delayed clock signal; the clock delay allows the cell to settle to a stable value during evaluation. This design is similar to the changes required for enabling TMV in related work [10], Sense-Amplifier PUFs [25], and metastability-based true random number generators (TRNGs) [26]. These circuit changes have minimal impact on the cell area. The UP/DOWN counter circuit can be implemented using flip-flops along with the required logic for initialization and saturation detection.
3.5. Error Rate from Simulation
We defer the details of the simulation settings to Section 5. Here, we describe the methodology for obtaining the error rate from simulation. We simulate a noisy PUF cell in SPICE using a 45 nm technology [27] by randomly varying the supply noise. The power supply noise distribution is varied to control the error rate of the SRAM cell. One million samples were collected from the cell. The outputs were fed into a 4-bit UP/DOWN counter to obtain new, more reliable output bits, and the error rate of these bits was calculated to quantify the reliability improvement. As SPICE simulations are expensive, for initial error rates below 0.24 we generated bits with the required initial error rate using a Python script and fed them to the UP/DOWN counter. As shown in Figure 5, the error rate of the proposed technique obtained through simulation falls to a very low level for initial error rates of ≤0.16. We defer the comparison with traditional TMV to the next section. Because such low error rates require a very large number of simulations to estimate accurately, an analysis of the proposed technique, which gives the error rate with higher accuracy, is presented in the next section.
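A script in the spirit of the bit-generation step described above might look as follows. It is a hypothetical sketch, not the authors' script: it assumes the cell's true value is 1, that each raw evaluation is flipped independently with probability `p_err`, and that the counter is re-initialized to $2^{n-1}$ before every decision.

```python
import random


def simulate_udc_error_rate(p_err, n_bits=4, n_decisions=20_000,
                            max_trials=1_000, seed=0):
    """Monte Carlo estimate of the error rate after the UP/DOWN counter voter.

    Assumptions: true cell value is 1, raw evaluations are flipped independently
    with probability p_err, and the counter restarts at 2**(n-1) for each decision.
    Returns (error_rate, unresolved_rate).
    """
    rng = random.Random(seed)
    top = (1 << n_bits) - 1
    errors = unresolved = 0
    for _ in range(n_decisions):
        count = 1 << (n_bits - 1)
        for _ in range(max_trials):
            bit = 0 if rng.random() < p_err else 1  # one noisy evaluation
            count += 1 if bit else -1
            if count in (0, top):  # saturated: a decision has been made
                break
        if count == 0:
            errors += 1  # resolved to the wrong value
        elif count != top:
            unresolved += 1  # trial budget exhausted without a decision
    return errors / n_decisions, unresolved / n_decisions


for p_err in (0.05, 0.10, 0.16, 0.24):
    print(p_err, simulate_udc_error_rate(p_err))
```

With n_decisions samples, error rates much smaller than 1/n_decisions usually produce zero observed errors, which illustrates why the analytical treatment is needed for very low error rates.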
4. Analysis of the Proposed Counter Based Design
In the following subsections, we describe the basics of the random walk model and use it to analytically evaluate the UP/DOWN counter scheme. In addition, we compare our approach to Temporal Majority Voting (TMV) and present a DFT scheme to improve overall reliability.
4.1. Operation of the Proposed Voter as Random Walk
Let $X_1, X_2, \ldots$ be a sequence of independent random variables taking values in the set $\{+1, -1\}$. Let the probability of value $+1$ in a trial be $p$, where $0 < p < 1$. Then, the probability of value $-1$ is $q = 1 - p$. If $S_N$ represents the sum of such a sequence after $N$ trials, then

$$S_N = \sum_{i=1}^{N} X_i. \tag{2}$$

The path traced by $S_N$ is called a simple random walk [28]. This is an elementary one-dimensional random walk on the integer number line. The properties of random walks and associated problems are well studied and are related to Markov processes.
The UP/DOWN counter based scheme proposed above can be modeled and analyzed using random walk models. For the purpose of analysis, and without loss of generality, let us assume that, in the absence of noise, the PUF cell settles at logic-1 due to process variation. In the presence of noise, there is a possibility that the cell will settle at logic-0, which is opposite to the inherent process variation of the cell.
In Table 1, we describe the notation used in our analysis of the UP/DOWN counter. In this analysis, we assume the noise experienced by the cell has zero mean; hence, for a statistical bias towards logic-1, p must be greater than 0.5. If the cell has a strong bias towards logic-1, then p is close to 1; otherwise, for a weak bias, p has a value slightly above 0.5.
As mentioned earlier, the n-bit UP/DOWN counter is initialized at $2^{n-1}$ and multiple trials are conducted. The state transition diagram in Figure 6 illustrates this process. If the PUF cell produces logic-1, it is symbolized by $X_i = +1$ in Equation (2); otherwise, by $X_i = -1$. However, due to the absorbing saturation (decision) barriers at the ends, as shown in Figure 6, Equation (2) cannot be used directly to model the UP/DOWN counter. Nevertheless, the UP/DOWN counter is related to the well-studied Gambler's ruin problem [28] and, hence, the metrics of interest can be determined. For the proposed voter scheme, we are concerned with three probabilities: (i) the probability of reaching the correct state, $P_s$ (probability of success); (ii) the probability of reaching the wrong state, $P_e$ (probability of error); and (iii) the probability of an unresolved output, $P_u$. In our working example, as the cell is biased towards logic-1, the probability of reaching the end state $2^n - 1$ is the probability of success $P_s$. Similarly, the probability of the trials leading to the end state 0 is the probability of error $P_e$. The last case arises when the counter has not reached a saturating state after a given number of trials. As the circuit is designed to resolve to a decision, we are concerned with $P_s$ and $P_e$ under a limited number of trials.
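For a finite trial budget, the three outcome probabilities can be computed exactly by propagating the probability distribution over the states of Figure 6, with 0 and $2^n - 1$ treated as absorbing. The sketch assumes the mid-range start $2^{n-1}$ and takes p as the probability of reading the correct value; the function name `udc_outcome_probs` is hypothetical.

```python
def udc_outcome_probs(p, n_bits, trials):
    """Exact (P_s, P_e, P_u) after a fixed number of trials.

    States 0..2**n - 1 form a Markov chain; 0 and 2**n - 1 are absorbing
    (saturating). p is the probability of reading the correct value (logic-1).
    """
    q = 1.0 - p
    top = (1 << n_bits) - 1
    dist = [0.0] * (top + 1)
    dist[1 << (n_bits - 1)] = 1.0  # assumed mid-range initial value
    for _ in range(trials):
        nxt = [0.0] * (top + 1)
        nxt[0] = dist[0]  # mass already absorbed at 0 (wrong decision) stays
        nxt[top] = dist[top]  # mass already absorbed at the top (correct decision) stays
        for s in range(1, top):  # transient states move up with p, down with q
            nxt[s + 1] += p * dist[s]
            nxt[s - 1] += q * dist[s]
        dist = nxt
    p_s, p_e = dist[top], dist[0]
    return p_s, p_e, 1.0 - p_s - p_e


for t in (10, 20, 50, 100):
    print(t, udc_outcome_probs(0.7, 4, t))
```

As the trial budget grows, $P_u$ shrinks and $P_e$ increases towards its infinite-trial limit, so a finite budget never yields more errors than the unlimited case.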
4.2. Error Rate
A random walk with absorbing barriers is akin to the random walk in the Gambler's ruin problem, as mentioned earlier. Instead of deriving the probability of error within T trials, the probability of error over an infinite number of trials can be derived [28]. With $k = 2^{n-1}$ the initial counter value, $M = 2^n - 1$ the upper saturating value, and $q = 1 - p$, it is given by

$$P_e^{\infty} = \frac{(q/p)^{k} - (q/p)^{M}}{1 - (q/p)^{M}}, \qquad p \neq q. \tag{3}$$

This value serves as an upper bound on the error rate obtained with the UP/DOWN counter. The above expression is the ruin probability of the Gambler's ruin problem [28], and its derivation is omitted for brevity. As Equation (3) gives the probability of error under infinite trials, the probability of error $P_e$ under a fixed number of trials is lower. Nevertheless, the expression gives meaningful bounds for the design, signifying that the resultant error rate never exceeds the value given by Equation (3). This is shown in Figure 5, where the error rates of our scheme obtained from simulation and from analysis are plotted. Thus, the equation gives insight into the benefits of the proposed voting scheme.
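Because the derivation of the infinite-trial error probability is omitted, a short sketch of the standard first-step argument for the Gambler's ruin may help. It assumes, as in the analysis, a counter starting at $k = 2^{n-1}$ with absorbing values 0 and $M = 2^n - 1$, a step of +1 with probability p, and a step of -1 with probability $q = 1 - p$, $p \neq q$.

```latex
% e_j: probability of being absorbed at 0 (wrong decision) from counter value j
e_j = p\,e_{j+1} + q\,e_{j-1}, \quad 1 \le j \le M-1, \qquad e_0 = 1,\; e_M = 0.
% Trial solution e_j = x^j gives p x^2 - x + q = 0, with roots x = 1 and x = q/p
e_j = A + B\,(q/p)^j \qquad (p \neq q).
% Boundary conditions: A + B = 1 and A + B (q/p)^M = 0
B = \frac{1}{1 - (q/p)^M}, \qquad A = -\frac{(q/p)^M}{1 - (q/p)^M}.
% Evaluating at the initial counter value j = k = 2^{n-1}
P_e^{\infty} = e_k = \frac{(q/p)^{k} - (q/p)^{M}}{1 - (q/p)^{M}}.
```

The same recurrence with boundary values $e_0 = 0$ and $e_M = 1$ gives the probability of absorption at the correct end.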
Similar to the probability of error, the probability of reaching the correct state (probability of success) under infinite trials can be derived as

$$P_s^{\infty} = \frac{1 - (q/p)^{k}}{1 - (q/p)^{M}}, \qquad p \neq q, \tag{4}$$

with $k = 2^{n-1}$, $M = 2^n - 1$, and $q = 1 - p$. From Equations (3) and (4), we can infer that, under infinite trials, the UP/DOWN counter saturates at one of the two saturating ends, since $P_e^{\infty} + P_s^{\infty} = 1$. In reality, due to the limited number of trials, the counter value can be stuck at a value between the saturating values. As this primarily occurs in cells with high error rates, such cells can be excluded from key generation if the design has redundant cells. A testing/DFT scheme based on this observation is discussed in Section 4.4.
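The infinite-trial probabilities of Equations (3) and (4) are easy to evaluate numerically. The sketch below assumes $k = 2^{n-1}$, $M = 2^n - 1$, and $0 < p < 1$ as the probability of reading the correct value; `udc_infinite_probs` is a hypothetical name. It computes $P_e$ directly rather than as $1 - P_s$ so that very small error rates keep their precision, and it uses the linear solution $P_s = k/M$ in the unbiased case $p = q$.

```python
def udc_infinite_probs(p, n_bits):
    """(P_s, P_e) under unlimited trials, from the gambler's ruin absorption probabilities."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    k, m = 1 << (n_bits - 1), (1 << n_bits) - 1  # initial value and upper saturating value
    if p == q:  # unbiased walk: absorption probabilities are linear in k
        return k / m, (m - k) / m
    r = q / p
    p_s = (1 - r**k) / (1 - r**m)
    p_e = (r**k - r**m) / (1 - r**m)  # computed directly to keep tiny values accurate
    return p_s, p_e


for n in (4, 5, 6):
    print(n, [udc_infinite_probs(1 - err, n)[1] for err in (0.05, 0.1, 0.2, 0.3)])
```

For a strong bias, $(q/p)^M$ is negligible and $P_e^{\infty}$ behaves like $(q/p)^{2^{n-1}}$, so each extra counter bit roughly squares the bound.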
4.3. UP/DOWN Counter vs. TMV
In Figure 7, the error rates after applying the TMV and UP/DOWN counter schemes are plotted against the initial error rate of the cell. Analytical results were used for both schemes; the expression for the error rate of TMV can be found in related publications [10]. For comparison, the error rate reductions obtained with 4-, 5-, and 6-bit TMV counters and UP/DOWN counters are plotted. For low target error rates, the new voter design offers a significant advantage over traditional TMV. For the same target final error rate, the 4-bit UP/DOWN counter can handle a substantially higher initial error rate than a 4-bit, 15-way TMV, and the 5-bit and 6-bit UP/DOWN counters offer similar improvements over their TMV counterparts. We also note that a 5-bit UP/DOWN counter still outperforms a 6-bit TMV counter. Hence, we expect an n-bit UP/DOWN counter to do better than an (n+1)-bit TMV counter. Note also that the improvement in reliability when using an UP/DOWN counter is a conservative estimate, as Equation (3) is an upper bound; in reality, better gains are expected. The UP/DOWN counter has an area penalty of ∼10% to ∼15% compared to a similar simple counter used by TMV, in the 45 nm Nangate open cell library [29]. This area increase is acceptable given the significant reduction in error rate.
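The comparison can be reproduced in spirit by asking, for each counter width, which initial error rate each scheme can tolerate for a given target final error rate. In the sketch below, the target of 1e-6 is a hypothetical value chosen for illustration. TMV uses the binomial tail with $N = 2^n - 1$, and the UP/DOWN counter uses the infinite-trial bound of Equation (3) with the assumed mid-range start. Bisection works because both error rates increase monotonically with the initial error rate below 0.5.

```python
from math import comb


def tmv_error(p_err, n_bits):
    """Binomial tail for N-way TMV with N = 2**n - 1."""
    n = (1 << n_bits) - 1
    return sum(comb(n, i) * p_err**i * (1 - p_err) ** (n - i)
               for i in range((n + 1) // 2, n + 1))


def udc_error(p_err, n_bits):
    """Infinite-trial error bound of the UP/DOWN counter (assumed start 2**(n-1))."""
    if p_err == 0:
        return 0.0
    r = p_err / (1 - p_err)  # q / p
    k, m = 1 << (n_bits - 1), (1 << n_bits) - 1
    return (r**k - r**m) / (1 - r**m)


def max_tolerable_error(err_fn, n_bits, target, lo=0.0, hi=0.499):
    """Largest initial error rate whose voted error rate stays <= target (bisection)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if err_fn(mid, n_bits) <= target:
            lo = mid
        else:
            hi = mid
    return lo


TARGET = 1e-6  # hypothetical target final error rate
for n in (4, 5, 6):
    print(n, max_tolerable_error(tmv_error, n, TARGET),
          max_tolerable_error(udc_error, n, TARGET))
```

Because Equation (3) is an upper bound, the tolerable error rates reported for the UP/DOWN counter are themselves conservative.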
4.4. DFT Based on Trials to Settlement
Equation (4) quantifies the probability of success when infinite trials are applied to each cell. In reality, the UP/DOWN counter may settle at a value between the end states due to the limited number of trials. This is related to the inherent error rate of each cell. The expected number of trials needed by the UP/DOWN counter to reach a decision state [28] can be derived as

$$E[T] = \frac{k}{q - p} - \frac{M}{q - p}\cdot\frac{1 - (q/p)^{k}}{1 - (q/p)^{M}}, \tag{5}$$

where $p \neq q$, $k = 2^{n-1}$, and $M = 2^n - 1$ (for $p = q = 1/2$, the expectation is $k(M - k)$). This expected value for a 4-bit UP/DOWN counter is plotted in Figure 8 for different initial error rates. As illustrated, the average number of trials needed to reach saturation increases with the error rate. The exact distribution of the probability of reaching the correct value within a given number of trials can be derived using probability generating functions. Such explicit expressions for the number of trials and the probability of not reaching any end state can be used to improve the design, as they allow designers to estimate how much provision should be made for masked bits.
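Equation (5) can be evaluated directly to see how the expected decision time grows as the bias weakens. The sketch assumes $k = 2^{n-1}$, $M = 2^n - 1$, and $0 < p \le 1$ as the probability of reading the correct value; `udc_expected_trials` is a hypothetical name, and the unbiased case uses $k(M - k)$.

```python
def udc_expected_trials(p, n_bits):
    """Expected number of trials until the counter saturates (gambler's ruin duration)."""
    q = 1.0 - p
    k, m = 1 << (n_bits - 1), (1 << n_bits) - 1  # initial value and upper saturating value
    if p == q:  # unbiased walk
        return float(k * (m - k))
    r = q / p
    return k / (q - p) - (m / (q - p)) * (1 - r**k) / (1 - r**m)


for err in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(err, round(udc_expected_trials(1 - err, 4), 1))
```

A noise-free cell (p = 1) needs exactly $M - k = 7$ trials with a 4-bit counter, while an unbiased cell needs 56 on average, which is the spread the DFT scheme below relies on.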
As the average number of trials needed to reach an end state is related to the inherent error rate of the cell, this information can be used during the trials to filter out cells with high error rates. For example, consider the test setup shown in Figure 9. The PUF array has redundant cells so that cells with high error rates can be discarded during the trials. Hence, the aim of the test is to generate mask information that indicates which cells of the PUF array should be used in real-time operation. One simple way to achieve this is to set an empirical or analytical threshold on the number of trials applied to the PUF array, based on Equation (5) (or on tighter bound expressions). If a PUF cell does not reach an end state within the target number of trials, the cell is marked as invalid. As this mask does not reveal any information about the PUF values, it can be made public. During real-time operation, the mask is combined with the raw PUF response to filter out the responses of cells with high error rates, and the resulting identifier is used as a key. This technique may also be combined with ECC or other post-processing techniques to reduce the probability of error further. It can also be used to determine whether a PUF chip is reliable enough for operation: for example, if the mask indicates that the number of reliable cells in the array is less than the desired key length, the chip can be marked as unreliable and rejected during testing. Bhargava et al. have proposed a similar mask generation scheme, but based on adjusting the supply voltage [30].
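The mask generation and its use during operation can be sketched as follows. This is a hypothetical illustration, not the test flow of Figure 9. It assumes the mid-range initial value $2^{n-1}$, represents cells as callables returning one noisy read, and uses an arbitrary trial threshold (in practice derived from Equation (5) or a tighter bound). The mask records only whether a cell resolved, not its value.

```python
import random


def generate_mask(cells, n_bits: int, max_trials: int) -> list[bool]:
    """Mark a cell valid only if its UP/DOWN counter saturates within max_trials."""
    top = (1 << n_bits) - 1
    mask = []
    for read_bit in cells:
        count = 1 << (n_bits - 1)  # assumed mid-range initial value
        for _ in range(max_trials):
            count += 1 if read_bit() else -1
            if count in (0, top):
                break
        mask.append(count in (0, top))  # validity only: the value is not recorded
    return mask


def masked_key(response_bits, mask, key_len: int):
    """Keep responses of valid cells only; return None to reject the chip."""
    kept = [bit for bit, valid in zip(response_bits, mask) if valid]
    return kept[:key_len] if len(kept) >= key_len else None


# Usage with hypothetical cells (P(read = 1) per cell) and a threshold of 40 trials.
rng = random.Random(7)
p_one = [0.97, 0.03, 0.9, 0.55, 0.5, 0.08]
cells = [lambda p=p: rng.random() < p for p in p_one]
mask = generate_mask(cells, n_bits=4, max_trials=40)
raw_response = [1, 0, 1, 1, 0, 0]  # one hypothetical read-out during operation
print(mask, masked_key(raw_response, mask, key_len=3))
```

Returning None when too few cells survive mirrors the test-time rejection of chips whose reliable cells cannot cover the desired key length.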
Hence, even though the UP/DOWN scheme differs only subtly from the TMV scheme, it offers a significant improvement in error rate and also provides a means to identify high-error-rate cells.