Optimally Fortifying Logic Reliability through Criticality Ranking

With CMOS technology aggressively scaling towards the 22-nm node, modern FPGA devices face tremendous aging-induced reliability challenges due to bias temperature instability (BTI) and hot carrier injection (HCI). This paper presents a novel anti-aging technique at the logic level that is both scalable and applicable for VLSI digital circuits implemented with FPGA devices. The key idea is to prolong the lifetime of FPGA-mapped designs by strategically elevating the VDD values of some LUTs based on their modular criticality values. Although the idea of scaling VDD in order to improve either energy efficiency or circuit reliability has been explored extensively, our study distinguishes itself by approaching this challenge through an analytical procedure, therefore being able to maximize the overall reliability of the target FPGA design by rigorously modeling the BTI-induced device reliability and optimally solving the VDD assignment problem. Specifically, we first develop a systematic framework to analytically model the reliability of an FPGA LUT (look-up table), which consists of both RAM memory bits and associated switching circuit. We also, for the first time, establish the relationship between signal transition density and a LUT’s reliability in an analytical way. This key observation further motivates us to define the modular criticality as the product of signal transition density and the logic observability of each LUT. Finally, we analytically prove, for the first time, that the optimal way to improve the overall reliability of a whole FPGA device is to fortify individual LUTs according to their modular criticality. To the best of our knowledge, this work is the first to draw such a conclusion. Electronics 2015, 4 151


Introduction
As electronic device technology aggressively scales towards the 22-nm node, especially with the recent introduction of high-k material to avoid the gate tunneling effect, the aging-induced reliability issue will be exacerbated greatly [1,2]. As such, structural degradation in modern Complementary Metal Oxide Semiconductor (CMOS) devices can potentially accelerate, therefore resulting in hard faults at a much faster pace [3]. Because these hard faults cannot be rectified to restore an IC chip's reliability, it is imperative to develop effective anti-aging techniques at the circuit, logic and architecture levels, especially for applications that require high field reliability. Such applications include automobiles, aircraft, medical equipment and power plants, where performance degradation and circuit failure can potentially be life-threatening.
Major aging mechanisms of CMOS technology include bias temperature instability (BTI), hot carrier injection (HCI), electro-migration (EM), stress migration (SM) and time-dependent dielectric breakdown (TDDB) [4]. All of these mechanisms are responsible for the gradual oxide wear-out or the interconnect failure that causes circuit performance degradation and transistor failure. Furthermore, all of these mechanisms can be worsened by a high switching rate of the circuit, excess supply voltage or high operational temperature. Among these transistor aging mechanisms, the most prominent ones are the negative bias-temperature instability (NBTI), which affects PMOS transistors, and the positive one (PBTI), which affects NMOS transistors [1,2,5,6]. The major effect of NBTI and PBTI is that they increase the magnitude of the transistor's threshold voltage and reduce the effective carrier mobility over time, therefore reducing the operational reliability of the CMOS transistor. Ultimately, such aging mechanisms shorten the lifetime of CMOS devices. In the past, the effect of PBTI was negligible in comparison to NBTI. However, since the introduction of high-k/metal gate materials, its effect has become comparable.
Historically, Field Programmable Gate Array (FPGA) technology has always been at the forefront of exploiting the latest advancements in CMOS technology. This is because FPGA devices typically have regular and highly-scalable structures, as well as stringent demands on high performance and energy efficiency. For example, FPGAs that use a 22-nm high-k/metal gate process technology and operate at frequencies up to 1.5 GHz have been announced [7]. Unfortunately, CMOS technology scaling also poses several technical challenges to FPGA device reliability. Specifically, these issues include manufacturing variability, sub-threshold leakage, power dissipation, increased circuit noise sensitivity and reliability concerns due to transient (e.g., radiation-induced soft errors) and permanent (e.g., transistor aging) failures [8,9]. In this paper, we present a novel technique at the logic level, specifically designed to mitigate the aging effect of FPGA devices. Our proposed method is both scalable and applicable for Very Large Scale Integration (VLSI) digital circuits implemented with modern FPGA devices.

Research Objective and Key Contribution
Fundamentally, there are two approaches to mitigate the reliability issues in FPGA devices. The first approach takes a bottom-up strategy, which involves analyzing failure mechanisms at the level of device physics, therefore improving the overall reliability of FPGA devices through transistor engineering or circuit optimization. The second approach attempts to improve FPGA device reliability in a top-down direction, i.e., establishing the relationship between the reliability of individual circuit logic components and the reliability of the whole device. In other words, the second approach formulates the FPGA device reliability problem as a system engineering problem and solves it at the logic and architecture design levels.
This paper focuses on mitigating the negative impacts of FPGA transistor aging at the logic level. Specifically, we aim at developing a systematic approach for discriminatively scaling the $V_{DD}$ values within an FPGA device in order to optimally improve its overall reliability. As will be shown later, our proposed criticality-based approach is totally independent of the specific way in which the reliability of individual FPGA components is enhanced. Besides elevating the $V_{DD}$ values of LUTs, we can also use device engineering or even modular redundancy. As such, we first develop a systematic framework to analytically model the reliability of an FPGA LUT (look-up table), which consists of both Static Random-Access Memory (SRAM) bits and associated switching circuits. While the majority of existing work has focused on studying the timing degradation due to BTI effects, we concentrate on investigating the BTI-induced switching degradation in FPGAs. We also, for the first time, establish the relationship between signal transition density and a LUT's reliability in an analytical way. This key observation further motivates us to define the modular criticality as the product of signal transition density and the logic observability of each LUT. Finally, we analytically prove that the optimal way to improve the overall reliability of a whole FPGA device is to fortify individual LUTs according to their modular criticality. To the best of our knowledge, this work is the first to draw such a conclusion.
The rest of the paper is organized as follows. Section 2 reviews existing results on CMOS technology aging. We then delve into more detailed descriptions of the analysis procedure for FPGA aging due to BTI in Section 4. In Sections 5 and 6, we outline our modeling strategy of FPGA reliability, our proposed strategy to maximize its overall reliability and the optimality proof of our proposed approach, respectively. Subsequently, Section 7 describes the reliability improvement results that we obtained using benchmarks from the Altera benchmark suite of the Quartus University Interface Program (QUIP). In these results, we aim to illustrate both the effectiveness and the computational efficiency of our proposed approach. Afterwards, we present and analyze the usefulness of modular criticality values by applying discriminative logic fortification to several circuits. As we will show, the knowledge of modular criticality values for a given circuit can significantly increase the cost-effectiveness of hardware redundancy. Finally, Section 9 concludes the paper.

Modeling BTI-Induced CMOS Device Aging
Several predictive models for BTI have been developed based on reaction-diffusion (R-D) models [10,11]. In particular, several studies analyzed the BTI effect on threshold voltage changes.
Traditionally, although BTI can be categorized into two different effects on the transistor model, the NBTI, which affects PMOS transistors, has been far more important than the PBTI, which affects NMOS transistors. However, with the better understanding of high-k/metal gate transistors in sub-45-nm technology, the PBTI effect becomes more important and comparable to NBTI. In this paper, we adopt the most recent results and combine both the NBTI and PBTI effects when estimating BTI's impact on the transistor threshold voltage $V_{th}$.
For brevity, we omit the detailed description of the physical mechanisms of NBTI and PBTI. Instead, we refer interested readers to the many existing studies [11][12][13] for further information.
Fundamentally, there are two types of BTI effects: static BTI and dynamic BTI. The static NBTI/PBTI corresponds to the case when the PMOS/NMOS transistor is under constant stress. In this case, according to [14], $\Delta V_{th}$ due to NBTI/PBTI at time $t$ follows a power law in time, $\Delta V_{th} \propto t^{n}$, where $n$ is the time exponent, with $n = 1/6$ to $1/4$ depending on the diffusion type used in the physics modeling. The prefactor $A$ is a constant that depends on the hole density, the temperature $T$ and the electrical field; its expression involves the electron charge $q$, the Boltzmann constant $k$ and the oxide capacitance per unit area $C_{ox}$.
Dynamic BTI corresponds to the case where the PMOS/NMOS transistor undergoes alternating stress ($V_{gs} = (-/+)V_{DD}$) and recovery ($V_{gs} = (+/-)V_{DD}$) periods. Fundamentally, both NBTI and PBTI have two phases: (1) the stress phase, in which the gate-source voltage is negatively (positively) biased ($V_{GS} = -(+)V_{DD}$); and (2) the relaxation phase ($V_{GS} = 0$). As shown in Figure 1, during the stress phase, interface traps are created at the interface between the channel and the gate oxide, and these traps increase the magnitude of the threshold voltage $V_{th}$. During the relaxation phase, on the other hand, some of the interface traps may be removed, and as a result, the $V_{th}$ of the transistor decreases. Due to the widely different diffusivity of H$_2$ in the oxide and in poly-Si, the recovery becomes a two-step process, with fast recovery driven by the H$_2$ in the oxide followed by slow recovery driven by back diffusion of H$_2$ from the poly-Si. Thus, $\Delta V_{th}$ can be expressed separately for the stress and recovery periods.
In the recovery-period expression, $t_e$ either equals $t_{ox}$ or the diffusion distance of hydrogen in the initial stage of recovery. This parameter captures the fast drop of $V_{th}$ at the beginning of the recovery phase. This effect is verified with estimated measurement data from [11]. This model also accurately captures the dependence of the fractional recovery on $t_{ox}$: thicker dielectrics have higher fractional recovery.
Figure 1. Stress and recovery (R) phases of BTI. The dashed line represents the overall aging process, i.e., the increasing trend of $V_{th}$.
In order to predict the long-term threshold voltage degradation ($\Delta V_{th}$) due to NBTI at a time $t$, the stress and recovery cycles can be simulated for $m = t/T_{clk}$ cycles. However, for high-performance circuits, $m$ can be very large, even for $t$ = 1 month, so it becomes impractical to predict $\Delta V_{th}$ by simulation. Fortunately, various recent studies have shown that it is possible to obtain a closed form for the upper bound on $\Delta V_{th}$ as a function of the duty cycle $\alpha$, $T_{clk}$ and $t$ [15]. In fact, the models of PBTI and NBTI are similar to each other. The BTI effect on $V_{th}$ can be calculated following [15], using a closed-form model in which $A$ is a temperature-dependent factor, $n$ is a constant depending on the fabrication process ($n = 1/6$ or $n = 1/4$ based on the diffusion type [14]), $Y$ is the duty cycle and $t$ is the total time (transistor age). In this paper, we define the duty cycle of a transistor as the ratio of the stress time to the total time, which can also be defined by the signal probability (SP). To further verify the accuracy of this model, we have compared the results of our modeling with the experimental data collected by [15] for the TSMC 45-nm technology node. The two agree very well.
The R-D based $V_{th}$ model discussed above assumes nominal degradation without considering the statistical variation in the underlying degradation process. In reality, due to the finite number of Si-H bonds in the channel, breaking and re-passivation of these bonds experience stochastic fluctuations [16]. This phenomenon is similar to the random $V_{th}$ variation induced by the number and the placement of dopant atoms in the channel, known as the random dopant fluctuation (RDF) effect. The general framework of BTI variation has been proposed by Stewart in [17], where the number of broken bonds $N_{IT}$ in the channel has been modeled as a Poisson random variable. Under this assumption, $N_{IT}$ satisfies the following:
$$\sigma(N_{IT}) = \sqrt{\mu(N_{IT})}, \qquad \mu(N_{IT}) = \frac{C_{ox} A_G \, \mu(\Delta V_{th})}{q}$$
where $\sigma(N_{IT})$ and $\mu(N_{IT})$ represent the standard deviation (SD) and the mean of $N_{IT}$, $\mu(\Delta V_{th})$ is the nominal (mean) $V_{th}$ degradation due to BTI and $A_G$ is the effective channel area. We can further derive the SD of $V_{th}$ as:
$$\sigma(\Delta V_{th}) = \frac{q \, \sigma(N_{IT})}{C_{ox} A_G} = \sqrt{\frac{q \, \mu(\Delta V_{th})}{C_{ox} A_G}}$$
This equation shows that, since the nominal $V_{th}$ degradation follows a fractional power law, $\sigma(\Delta V_{th})$ also maintains a power relationship with respect to time, with a fixed exponent of 1/12 (for $n = 1/6$). Note that unlike the nominal $V_{th}$ degradation, the BTI-induced $V_{th}$ SD depends on the transistor dimension $A_G$, with an inverse square-root relationship.
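As a minimal sketch of how the nominal degradation and its statistical spread could be evaluated numerically, the following Python fragment assumes that the closed-form model takes the power-law form $\Delta V_{th} = A (Y t)^{n}$ (our assumption for illustration, not a form stated above) and that the SD follows from the Poisson trap-count argument. The function names and all parameter values are hypothetical.

```python
import math

Q = 1.602176634e-19  # electron charge in coulombs


def mean_dvth(a, duty, t, n=1.0 / 6.0):
    """Nominal V_th shift, ASSUMING the power-law form A * (Y * t)**n.

    a    : temperature-dependent prefactor (hypothetical value supplied by caller)
    duty : duty cycle Y, i.e., stress time divided by total time
    t    : transistor age in seconds
    n    : time exponent (1/6 or 1/4 depending on the diffusion type)
    """
    return a * (duty * t) ** n


def sigma_dvth(mu, c_ox, area):
    """SD of the V_th shift for a Poisson-distributed trap count.

    Uses sigma = sqrt(q * mu / (C_ox * A_G)), with C_ox in F/m^2 and A_G in m^2.
    """
    return math.sqrt(Q * mu / (c_ox * area))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    mu = mean_dvth(a=0.005, duty=0.5, t=3.15e7)
    sd = sigma_dvth(mu, c_ox=0.02, area=1e-15)
    print(f"mean shift = {mu:.4f} V, SD = {sd:.4f} V")
```

Under these assumptions, doubling the age scales the mean by $2^{n}$ and the SD by $2^{n/2}$, which is the 1/12 exponent noted above for $n = 1/6$.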

Aging-Induced Error Probability in CMOS
In Section 2, we analyzed the temporal degradation of $V_{th}$ in CMOS transistors due to BTI. In this section, we show that, given the threshold voltage degradation of a single transistor due to BTI, one can predict the degradation of CMOS transistor switching performance and SRAM read/write performance with a high degree of accuracy.

BTI-Induced Error Probability in CMOS Switches
As $V_{th}$ increases due to transistor aging, the voltage drive ($V_{DD} - V_{th}$) decreases, thus gradually degrading the digital switching behavior of a CMOS transistor. However, quantifying such a negative impact on digital switching is very challenging for two fundamental reasons [18]. First, the switching failure of a given logic circuit is almost always caused by a group of several transistors that gradually experience increases in $V_{th}$; therefore, it is very difficult to attribute the overall logic failure of a circuit to a single gate. Second, it is very challenging to define a clean cut-off point of the voltage drive ($V_{DD} - V_{th}$), beyond which the transistor switching will stop functioning correctly. In fact, for a given logic gate, as long as its output voltage level can be correctly interpreted by its receiving circuit, any input voltage level is theoretically acceptable.
To overcome these issues, we use a modeling approach based on the voltage transfer curve (VTC) analysis proposed in [19] and further developed in [18]. In this method, the amount of headroom to a switching failure is measured with the worst-case static noise margin (SNM) present in the gate pair, which can be determined using the DC noise source configuration shown in Figure 2a,b, or equivalently from the butterfly plot shown in Figure 2c. Here, the VTC of the first gate is plotted together with the inverse VTC of the second one. A positive SNM corresponds to the existence of two areas entirely enclosed by the VTCs, corresponding to the two stable states of the gate pair. Intuitively, the larger the values of $W_1$ and $W_2$, the better the switching performance is. As in both [18] and [19], the exact threshold values of $W_1$ and $W_2$, as well as the corresponding threshold value $V_{th}^{*}$, can be determined empirically. In this paper, we define that the switching failure happens when the VTC asymmetry ratio $\gamma$ exceeds 0.1, where $\gamma$ quantifies the asymmetry of the VTCs in the butterfly plot. In many aspects, this SNM-based quantifying method is conceptually very similar to the well-known "eye diagram" used in analog circuit analysis.
Figure 3. VTC curves of the 22-nm predictive CMOS device model [20] for five different $V_{th}$ values.
To further validate the SNM-based method, we have used the 22-nm predictive CMOS device model [20], and our SPICE simulation results are presented in Figure 3. We define the SNM values to be the side lengths of the largest squares that can be inscribed into the enclosed areas (Figure 2c). For gates with more than one input, multiple possible VTCs exist depending on the input configuration. As suggested in [18], we solve this problem by considering only gate combinations that are expected to be critical due to their topology. For example, given common logic gates such as NAND2 and NOR2, an obvious choice for the critical VTC is the input combination of a weak low and a weak high value, due to the stacked transistors in the corresponding output path. In Figure 3, we have plotted the VTC curves of the 22-nm predictive CMOS device model [20] for five different $V_{th}$ values. It shows that as the $V_{th}$ value increases from 20 mV to 100 mV, the VTC asymmetry ratio $\gamma$ defined above increases from 0.0 to 0.5. Clearly, when $V_{th}$ reaches 50 mV, the VTC asymmetry ratio $\gamma \geq 0.1$, thus signaling the switching failure.
As discussed in Section 2, both analytical modeling and empirical study have shown that $\Delta V_{th}$ follows a Gaussian distribution. Furthermore, its mean $\mu(\Delta V_{th})$ and standard deviation $\sigma(\Delta V_{th})$ can be obtained by Equations (1) and (2). For any given CMOS device, digital switching fails when $\Delta V_{th} > \Delta V_{th}^{*}$, where the threshold value $\Delta V_{th}^{*}$ can be obtained through SPICE simulations. Therefore, the error probability of digital switching for a particular gate can be computed as:
$$P_{err} = P(\Delta V_{th} > \Delta V_{th}^{*}) = 1 - \Phi\left(\frac{\Delta V_{th}^{*} - \mu(\Delta V_{th})}{\sigma(\Delta V_{th})}\right)$$
where $\Phi$ is the standard normal cumulative distribution function. The relationship between $P_{err}$ and the normal distribution of $\Delta V_{th}$ is depicted in Figure 4.
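The Gaussian tail probability described above can be evaluated with the complementary error function. The following is a minimal sketch; the function name and the numbers used in the example are hypothetical and serve only to illustrate the calculation.

```python
import math


def switching_error_probability(mu, sigma, dvth_crit):
    """Return P(dVth > dvth_crit) for dVth ~ Normal(mu, sigma**2).

    mu        : mean V_th shift (volts)
    sigma     : standard deviation of the V_th shift (volts), must be > 0
    dvth_crit : critical shift beyond which switching fails (volts)
    """
    z = (dvth_crit - mu) / (sigma * math.sqrt(2.0))
    # 0.5 * erfc(z) is the upper-tail probability 1 - Phi((x - mu) / sigma).
    return 0.5 * math.erfc(z)


if __name__ == "__main__":
    # Illustrative values only: 30 mV mean shift, 10 mV SD, 50 mV threshold.
    p = switching_error_probability(mu=0.030, sigma=0.010, dvth_crit=0.050)
    print(f"P_err = {p:.4e}")
```

Using `erfc` directly, rather than computing one minus a cumulative probability, avoids loss of precision when the error probability is very small.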

BTI-Induced Error Probability in SRAM Cells
The majority of FPGA devices are SRAM-based, i.e., they store logic cell configuration data in static memory organized as an array of latches. Figure 5 illustrates a standard logic design for a SRAM cell consisting of six transistors. Unfortunately, in a SRAM cell, a mismatch in strength between neighboring transistors, caused by BTI-induced aging, can result in the failure of the cell [21], therefore causing the FPGA logic to malfunction. Specifically, there are mainly three causes of SRAM failure.
Read failure: An increase in the cell access time beyond the delay requirement can cause SRAM cell read failure. The cell access time is defined as the time needed to generate a pre-specified voltage difference between the two bit-lines. An increase in the threshold voltage $V_{th}$ of the access transistor $AX_R$ and the pull-down NMOS $N_R$ may significantly increase the access time.
Write failure: The inability to write data into a SRAM cell is called write failure. For example, suppose the SRAM cell currently stores the value "1". When writing "0" into this cell, the node $V_L$ gets discharged through the bit line BL in Figure 5 to the low value $V_{WR}$, determined by the voltage division between the PMOS $P_L$ and the access transistor $AX_L$ [21]. If $V_L$ cannot be reduced in time below the trip point of the inverter $P_R - N_R$ ($V_{TRIPWR}$), the write failure occurs.

Hold failure: Hold failure happens when the content of a SRAM cell cannot be preserved because a lowered supply voltage $V_{DD}$ is applied to reduce leakage power consumption. For example, in Figure 5, if the voltage of node L is lower than the trip point of the inverter ($P_R - N_R$), hold failure occurs. Additionally, flipping of the cell data when a supply voltage lower than the nominal one is applied can also cause data-holding failure in a SRAM cell in standby mode.
All of these failure modes can be caused by the BTI-induced $V_{th}$ changes in CMOS transistors. In this paper, we adopt the probabilistic SRAM failure model first proposed in [21] in order to analyze and quantify the failure probabilities (access-time failure, read/write failure and hold failure) of static random-access memory (SRAM) cells. Unfortunately, there is no closed-form solution for the overall error probability of a given SRAM cell. Instead, we rely on a numerical method to obtain the error probability solutions. In Figure 6, we present such error probability results. Note that it is $\sigma(V_{th})$, not $V_{th}$ itself, that determines the combined error probability of a SRAM cell. Later, in Section 4.1, we will use these results to compute the aggregated error probability of FPGA LUTs due to BTI effects.

Modeling FPGA Device Aging
An FPGA is a logic device that contains a two-dimensional array of generic logic elements (LEs) and programmable switches, as shown in Figure 7a. A logic element, depicted in Figure 7b, can be configured (i.e., programmed) to perform a simple function, and a programmable switch can be customized to provide interconnections among the logic elements. A custom design can be implemented by specifying the function of each logic element and selectively setting the connection of each programmable switch. A logic element usually contains a programmable look-up table (LUT), programmable interconnects and flip-flops (FFs). An n-input look-up table is typically implemented by a static random access memory (SRAM) and can implement any n-input combinational function. The flip-flops can be selectively used to implement sequential circuits. Most FPGA devices also embed certain macro cells, such as block RAMs, dedicated multipliers, clock managers and I/O interface circuits. Logic elements are usually grouped into logic array blocks (LABs). Since the LUT is the basic element used to implement logic functions, in this work, we analytically quantify the effect of transistor aging on the reliability of FPGA LUTs.
In FPGAs, LUTs are considered the basic blocks for mapping Boolean functions. Modern FPGAs allow modifications to the mapped function of LUTs through reconfiguration, partial or full, online or offline. The logic structure of a typical FPGA logic block consists of SRAM configuration bits and a switching network. Figure 8 depicts a small two-input LUT, whereas modern FPGA devices typically use six- or eight-input LUTs. In the following, we derive the error probability of a LUT analytically based on the error probability results for the SRAM cell and switching transistors developed in Sections 3.1 and 3.2.
Ideally, we should also incorporate flip-flops into our analytical framework. There are two reasons why we did not do so. First, this paper mainly focuses on the logic correctness of a placed and routed circuit implemented with an FPGA device. Flip-flops are clocked circuits whose outputs may change on an active edge of the clock signal based on their inputs. Flip-flops normally would not change the output upon input change, even when the clock signal is asserted. Therefore, the logic correctness of an FF mostly depends on the timing violations due to device aging, which can be more effectively addressed by reducing the clock rate or allowing more generous timing margins at the design stage. Second, purely combinational circuits contain no FFs, and even for sequential circuits, the number of FFs is much smaller than the number of switches in a modern FPGA. Finally, although we did not include FFs in our theoretical analysis, we include every gate in our experiments of extracting modular criticality through simulations.
As discussed in Section 3, the device aging effect can cause read, write and hold failures in SRAM cells. In fact, the error probability of a SRAM cell is determined by $\sigma(V_{th})$, which can be described as a function of the device age $t$, the technology node $G_{device}$ and the signal probability $\alpha$. Therefore, the error probability of a SRAM cell is $e_{SRAM} = f(t, G_{device}, \alpha)$.
After configuring an FPGA device, the SRAM cells in each of the activated LUTs store different logic values, "0" and "1", that determine the functionality of each LUT and hence the overall functionality of the complete implemented logic design. During the operation of an FPGA device, for any given LUT used, different combinations of input signals will switch on different transistor paths. The switched-on path establishes the connection to a specific SRAM cell, whose stored logic bit becomes the output.
Assume that the LUT has $N$ inputs; the total number of bits in the $N$-LUT is then $2^N$. Furthermore, assume the access probability and the error probability of each SRAM cell to be $P_i$ and $e_i$, respectively. Because the error probability of a memory cell also depends on its content [21], we denote the error probability of a SRAM cell that stores "0" and "1" as $e_{i,0}$ and $e_{i,1}$, respectively. Finally, we suppose that the error probabilities of the memory cells are totally independent. Therefore, the error probability of the $2^N$-bit configuration memory block in an $N$-LUT can be written as
$$P_{SRAM} = \sum_{i=1}^{2^N} P_i e_i.$$
Now, we assume that all SRAM cells are designed and manufactured with the same error characteristics; therefore, $\forall i$, $e_{i,0} = e_0$ and $e_{i,1} = e_1$. As a result,
$$P_{SRAM} = \sum_{i \in S_0} P_i e_0 + \sum_{i \in S_1} P_i e_1,$$
where $S_0$ and $S_1$ denote the sets of all memory cells that store "0" and "1", respectively. Because the signal probability $\alpha$ equals $\sum_{i \in S_1} P_i$, the final result is:
$$P_{SRAM} = (1 - \alpha) e_0 + \alpha e_1$$
Because the "1" cell is much more error-prone than the "0" cell, increasing the signal probability $\alpha$ of the LUT output increases the weight of the "1"-cell error probability and therefore the total error probability of the SRAM block.
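To make the summation over memory cells concrete, the following sketch enumerates all $2^N$ input patterns of a LUT and accumulates both the SRAM block error probability and the output signal probability. It assumes, purely for illustration, that the LUT inputs are statistically independent so that the access probability $P_i$ is a product of per-input probabilities; the function name and the numeric values are hypothetical.

```python
from itertools import product


def lut_sram_error(truth_table, input_probs, e0, e1):
    """Return (P_SRAM, alpha) for an N-input LUT.

    truth_table : list of 2**N stored bits; index 0 is the all-zero input
                  pattern and the first input is the most significant bit
    input_probs : P(input_k == 1) for each input, ASSUMED independent
    e0, e1      : error probability of a cell storing "0" / "1"
    """
    n = len(input_probs)
    assert len(truth_table) == 2 ** n
    p_sram = 0.0
    alpha = 0.0
    for idx, bits in enumerate(product((0, 1), repeat=n)):
        # Access probability P_i of the cell selected by this input pattern.
        p_access = 1.0
        for bit, p_one in zip(bits, input_probs):
            p_access *= p_one if bit else 1.0 - p_one
        stored = truth_table[idx]
        p_sram += p_access * (e1 if stored else e0)
        alpha += p_access * stored  # signal probability of the LUT output
    return p_sram, alpha


if __name__ == "__main__":
    # Two-input AND with uniformly random inputs; error rates are made up.
    p_sram, alpha = lut_sram_error([0, 0, 0, 1], [0.5, 0.5], e0=1e-4, e1=4e-4)
    print(p_sram, alpha, (1 - alpha) * 1e-4 + alpha * 4e-4)
```

The enumerated sum and the closed form $(1 - \alpha) e_0 + \alpha e_1$ agree, which is the point of the derivation above.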

Modeling Error Probability of Switching Network
As illustrated in Figure 8, the LUT of modern FPGA devices typically uses NMOS pass transistors as the switching elements. Different input signal combinations turn on some of these switching transistors and route the bit content of one of the LUT memory cells to the output $y$. Therefore, the error probability for the switching network can be written as:
$$P_{switch} = \sum_{i=1}^{2^N} P_i \, e_{path,i} \qquad (5)$$
where $P_i$ and $e_{path,i}$ denote the probability of taking switching path $i$ and the probability that path $i$ malfunctions, respectively.
A faulty switching path can be caused by the switching degradation of the NMOS transistor, as discussed in Section 3.1. Fundamentally, a long stress time on the NMOS transistor may increase the threshold voltage $V_{th}$, which, in turn, results in degraded switching strength. Figure 9 illustrates a switching path consisting of $N$ NMOS transistor switches. Assuming the aging-induced error probability of an individual NMOS transistor $M_i$ to be $e_{M_i}$, the probability for the switching path to be faulty can be written as
$$e_{path} = 1 - \prod_{i=1}^{N} (1 - e_{M_i}) = 1 - (1 - e_M)^N,$$
when $\forall i$, $e_{M_i} = e_M$. Substituting this into Equation (5), and using $\sum_i P_i = 1$, results in:
$$P_{switch} = 1 - (1 - e_M)^N$$
Each LUT in an FPGA device consists of two parts: the SRAM cell array and the multiplexer switching network. Assuming that these two components malfunction independently, the error probability of a LUT can be formalized as follows:
$$P_{err,LUT} = 1 - (1 - P_{SRAM})(1 - P_{switch}) = 1 - \left[1 - \alpha e_1 - (1 - \alpha) e_0\right](1 - e_M)^N$$
where $e_1$, $e_0$ and $e_M$ denote the error probability of a SRAM cell that stores "1", the error probability of a SRAM cell that stores "0" and the aging-induced error probability of an individual NMOS transistor (M), respectively. Furthermore,
$$\frac{dP_{err,LUT}}{d\alpha} = (e_1 - e_0)(1 - e_M)^N.$$
Because $e_1 > e_0$, as discussed in Section 4.1, obviously $dP_{err,LUT}/d\alpha > 0$, which means that the overall error probability of a LUT increases monotonically with the output signal probability $\alpha$.
It will become clear that this result is critical in our optimal solution of improving the overall reliability of a placed and routed logic design with an FPGA device.
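The monotonic dependence of the LUT error probability on $\alpha$ can be checked numerically. The sketch below assumes that a switching path fails when any of its $N$ series transistors fails and that a LUT fails when either its SRAM block or its switching network fails; both are our modeling assumptions for illustration. The function names and the error rates are hypothetical.

```python
def path_error(e_m, n_inputs):
    """Error probability of a path of n_inputs series pass transistors,
    ASSUMING independent transistor failures with equal probability e_m."""
    return 1.0 - (1.0 - e_m) ** n_inputs


def lut_error(alpha, e0, e1, e_m, n_inputs):
    """LUT error probability, ASSUMING the SRAM block and the switching
    network fail independently and either failure corrupts the output."""
    p_sram = alpha * e1 + (1.0 - alpha) * e0
    p_switch = path_error(e_m, n_inputs)
    return 1.0 - (1.0 - p_sram) * (1.0 - p_switch)


if __name__ == "__main__":
    # Sweep the output signal probability for a 6-input LUT (made-up rates).
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(alpha, lut_error(alpha, e0=1e-4, e1=4e-4, e_m=1e-5, n_inputs=6))
```

Under these assumptions the printed values grow linearly with $\alpha$, with slope $(e_1 - e_0)(1 - e_M)^N$.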

Analyzing FPGA Device Reliability
In this section, an intuitive approach to reliability analysis is described. It is based on the observation that a failure at a gate close to the primary output has a greater probability of propagating to the primary output than a failure at a gate several levels of logic away from the primary outputs. This is because a failure that has to propagate through several levels of logic has a higher probability of being logically masked. This can be quantified by applying the concept of observability, which has historically found use in the testing and logic synthesis domains.
In reliability analysis, the logic observability of any logic node can be defined as the probability that a logic value upset error (0 → 1 or 1 → 0) at the logic node under consideration will change the circuit outputs. As stated in [22], logic observability can be computed with Boolean differences, symbolic techniques based on binary decision diagrams (BDDs) or simulation. In this study, we will attempt to derive a closed-form expression for the logic reliability, $P_{correct}(e)$, where $e$ denotes the error probability of each logic gate.
To the best of our knowledge, there has not been any systematic study on accurately measuring modular criticality values within a large-scale VLSI digital circuit. The works most closely related to this paper are several recent studies that explored various analytical ways of computing the overall logic reliability of VLSI logic circuits [23][24][25][26]. Reliability analysis of logic circuits refers to the problem of evaluating the effects of errors due to noise at individual transistors, gates or logic blocks on the outputs of the circuit. The models for noise range from highly specific decompositions of the sources, e.g., single-event upsets, to highly abstract models that combine the effects of different failure mechanisms [27,28]. For example, in [22], the authors developed an observability-based approach that can compute a closed-form expression for circuit reliability as a function of the failure probabilities and observability of the gates. Unfortunately, all of these analytical studies, although mathematically concise, have to make some key assumptions, which seriously limits their applicability and accuracy. For example, the method in [22] needs to approximate the multi-gate correlations in order to handle reconvergent fan-out. In addition, it is not clear how the existing analytical approaches can handle unspecified probabilistic input vector distributions or more complicated correlation patterns within a VLSI logic circuit.
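As a small illustration of computing observability by simulation, the following sketch exhaustively enumerates the input vectors of a toy three-input circuit, $y = (a \wedge b) \vee c$, and counts how often flipping the internal node $g = a \wedge b$ changes the output. The circuit, the function names and the uniform input distribution are hypothetical choices made for this example only.

```python
from itertools import product


def circuit(a, b, c, flip_g=False):
    """Toy circuit y = (a AND b) OR c, with optional upset at internal node g."""
    g = a & b
    if flip_g:
        g ^= 1  # inject a logic value upset (0 -> 1 or 1 -> 0) at node g
    return g | c


def observability_of_g():
    """Fraction of (uniformly weighted) input vectors for which an upset
    at node g propagates to the output y."""
    vectors = list(product((0, 1), repeat=3))
    changed = sum(circuit(*v) != circuit(*v, flip_g=True) for v in vectors)
    return changed / len(vectors)


if __name__ == "__main__":
    print(observability_of_g())
```

In this toy circuit the upset is masked whenever $c = 1$, so the node is observable for exactly half of the input vectors. For realistic circuits, exhaustive enumeration would be replaced by random sampling of input vectors.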

Optimally Improving Reliability via Discriminative V DD Scaling
In a typical FPGA CAD flow, after logic synthesis and technology mapping, any given logic circuit will be converted into a network of LUTs ($G$). Without loss of generality, we assume that the circuit under consideration consists of $N$ LUTs, and each LUT has $k$ inputs and one output. Furthermore, we assume that $G$ has $M$ signal nets, each of which connects the output port of exactly one LUT to the input ports of a number of LUTs. We define the signal probability and error observability of signal net $i$ as $\alpha_i$ and $\beta_i$, respectively. Finally, in this study, we define the product of $\alpha_i$ and $\beta_i$ as the logic criticality $\gamma_i$ of LUT $i$.
$G$'s output reliability $R(G, \{e_i\}_{i=1}^{N})$ is its probability of being correct in all its output ports when a large ensemble of identically and independently distributed (i.i.d.) random inputs is applied. Here, $\{e_i\}_{i=1}^{N}$ denotes the vector of error probabilities of all $N$ gates. Note that the input vector distribution need not be uniform i.i.d.; instead, it can take any general form. Intuitively, the larger $\gamma_i$ is, the more critical LUT $i$ is to the correctness of the whole circuit $G$. In other words, the larger the logic criticality $\gamma_i$ is, the more sensitive the overall output reliability is towards LUT $i$'s error.
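The criticality definition lends itself to a simple ranking procedure. The sketch below computes $\gamma_i = \alpha_i \times \beta_i$ for each LUT and selects the most critical ones up to a fixed budget; the selection rule, the function names and the sample values are hypothetical and only illustrate how such a ranking could be used.

```python
def rank_by_criticality(alpha, beta):
    """Return (LUT names sorted by decreasing criticality, criticality map).

    alpha : dict mapping LUT name -> alpha_i
    beta  : dict mapping LUT name -> beta_i (logic observability)
    """
    gamma = {lut: alpha[lut] * beta[lut] for lut in alpha}
    order = sorted(gamma, key=gamma.get, reverse=True)
    return order, gamma


def pick_luts_to_fortify(alpha, beta, budget):
    """Select the `budget` most critical LUTs (hypothetical selection rule)."""
    order, _ = rank_by_criticality(alpha, beta)
    return order[:budget]


if __name__ == "__main__":
    # Made-up values for a three-LUT network.
    alpha = {"lut0": 0.5, "lut1": 0.2, "lut2": 0.9}
    beta = {"lut0": 0.4, "lut1": 0.8, "lut2": 0.1}
    print(pick_luts_to_fortify(alpha, beta, budget=2))
```

Note that a LUT with a large $\alpha_i$ but a small $\beta_i$ (such as `lut2` here) ranks low, because its errors rarely reach the output.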
The intuitive explanation of our definition of γ i = α i × β i is straightforward.First, for any LUT i, α i represents the frequency of its output switching, which is directly related to the transistor aging and shows how likely a switching error will occur.Second, β i shows how sensitive the final output of G will be to the output error of LUT i. Essentially, γ i reflects the combined effect of both α i and β i towards G's overall correctness.In the following, we will show that our definition is not only intuitive, but also optimal in the sense that, using the ranking of logic criticality γ i as the guidance, we can optimally maximize the overall reliability improvement of G given a fixed amount of extra resources, such as additional chip area or extra power budget.Thus, observability-based reliability analysis makes two simplifying assumptions for estimating the effect of multiple gate failures.
1. The effects of LUT failures at the primary output are decoupled from each other, i.e., a failure at each LUT $i$ is assumed to affect the output with a probability $\beta_i$ regardless of other LUT failures. This assumption allows the joint observability to be replaced by the simultaneous observability, which is computationally less demanding, when computing the effect of multiple gate failures at the output.
2. The observabilities of the LUTs are assumed to be independent of each other. Using this assumption, the simultaneous observability of two LUTs simplifies to the product of the individual LUT observabilities. For instance, the probability that LUT 1 is observable and LUT 2 is not observable is given by $\beta_1 (1 - \beta_2)$, and the probability that LUT 1 and LUT 2 are both not observable is given by $(1 - \beta_1)(1 - \beta_2)$.

With this background, we derive the expression for the probability of error at the output for a general circuit network $G$ with $N$ LUTs. Without loss of generality, we assume that the circuit has a single output $y$. Denote the error probability and logic observability of the $i$th LUT by $e_i$ and $\beta_i$, respectively. Using the first assumption, the output $y$ will be in error when an odd number of faulty LUTs in $G$ are simultaneously observable. Using the second assumption, the simultaneous observability of a set of LUTs can be computed by simply multiplying the individual observabilities of the LUTs.
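With a slight abuse of notation, let $G$ below also denote a set of failed LUTs, and let $F \subseteq G$. In general, the probability that only the LUTs in $F$ are observable is given by

$$A = \prod_{i \in F} \beta_i \prod_{j \in G \setminus F} (1 - \beta_j).$$

The expression

$$B = \prod_{i \in F} (-\beta_i) \prod_{j \in G \setminus F} (1 - \beta_j)$$

has the same magnitude as $A$, the same sign as $A$ when $F$ has an even number of LUTs, and the opposite sign when $F$ has an odd number of LUTs. Thus, when $F$ has an odd number of LUTs, the expression $\frac{1}{2}(A - B)$ gives the probability that only the LUTs in $F$ are observable, and when $F$ has an even number of LUTs, $\frac{1}{2}(A - B)$ is equal to zero. Thus, the probability that an odd number of LUTs in $G$ is observable is given by:

$$\frac{1}{2} \sum_{F \subseteq G} \left[ \prod_{i \in F} \beta_i \prod_{j \in G \setminus F} (1 - \beta_j) - \prod_{i \in F} (-\beta_i) \prod_{j \in G \setminus F} (1 - \beta_j) \right] = \frac{1}{2} \left[ 1 - \prod_{i \in G} (1 - 2\beta_i) \right].$$

By the first simplifying assumption, this is also the probability of error at the output $y$ given that the LUTs in $G$ have failed. Thus,

$$\Pr(y_{err} \mid G) = \frac{1}{2} \left[ 1 - \prod_{i \in G} (1 - 2\beta_i) \right].$$

The probability that the LUTs in $G$ are in error and the LUTs in $G^c$ are error-free is given by $\prod_{i \in G} e_i \prod_{j \in G^c} (1 - e_j)$. Summing over all possible sets of failed LUTs, the probability of error at the output $y$ is given by

$$\Pr(y_{err}) = \sum_{G} \Pr(y_{err} \mid G) \prod_{i \in G} e_i \prod_{j \in G^c} (1 - e_j) = \frac{1}{2} \left[ 1 - \prod_{i=1}^{N} (1 - 2 e_i \beta_i) \right].$$

This result clearly shows that, in order to minimize the overall error probability $\Pr(y_{err})$, we should always remove the largest $e_i \beta_i$ terms: fortifying LUT $i$ replaces its factor $(1 - 2 e_i \beta_i)$ by 1, so the product grows most when the smallest factors are eliminated. Since the error probability $e_i$ of a LUT is tied to its switching activity $\alpha_i$, as argued above, it follows that, given $N$ LUTs in an FPGA design, if only $K$ of them can be fortified, then to maximize the overall design reliability we should always choose the $K$ LUTs with the largest criticality values $\gamma_i$, where $\gamma_i = \alpha_i \times \beta_i$.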
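As a sanity check on the derivation, the following minimal sketch compares the closed form $\Pr(y_{err}) = \frac{1}{2}[1 - \prod_i (1 - 2 e_i \beta_i)]$ with a brute-force enumeration over every set of failed LUTs and every odd-sized observable subset, under the two independence assumptions stated above. The function names and the numeric values of `e` and `beta` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def output_error_closed_form(e, beta):
    """Closed form: 1/2 * (1 - prod_i (1 - 2 * e_i * beta_i))."""
    return 0.5 * (1.0 - prod(1.0 - 2.0 * ei * bi for ei, bi in zip(e, beta)))


def prob_odd_observable(beta, failed):
    """Probability that an odd number of the failed LUTs is observable."""
    total = 0.0
    for size in range(1, len(failed) + 1, 2):  # odd subset sizes only
        for observable in combinations(failed, size):
            obs = set(observable)
            total += prod(beta[i] if i in obs else 1.0 - beta[i] for i in failed)
    return total


def output_error_enumerated(e, beta):
    """Sum over all failed sets: Pr(exactly this set fails) * Pr(odd observable)."""
    n = len(e)
    total = 0.0
    for size in range(n + 1):
        for failed in combinations(range(n), size):
            failed_set = set(failed)
            p_failed = prod(e[i] if i in failed_set else 1.0 - e[i] for i in range(n))
            total += p_failed * prob_odd_observable(beta, failed)
    return total


if __name__ == "__main__":
    # Hypothetical per-LUT error probabilities and observabilities.
    e = [0.01, 0.02, 0.005, 0.03]
    beta = [0.9, 0.4, 0.7, 0.1]
    print(output_error_closed_form(e, beta))
    print(output_error_enumerated(e, beta))
```

The enumeration is exponential in the number of LUTs and is only meant for small examples; the closed form is what makes the ranking argument practical for large designs.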

Results and Analysis
To validate our error probability model and our discriminative assignment strategy, we have chosen 10 circuits from the Altera benchmark suite of the Quartus University Interface Program (QUIP). The overall procedure of our experiments is depicted in Figure 10. All of our test circuits are in the form of Verilog source files. We rely on the commercial Altera Quartus II software to perform all FPGA logic synthesis, logic optimization and technology mapping. The resulting .QVM files from Quartus contain both the LUT netlist and the encoded logic truth table for each LUT. We then use our in-house logic simulation tools to read in the .QVM file and perform logic simulation.

As in many other studies, we use extensive Monte Carlo logic simulation to obtain the error probability of any given circuit design. For each benchmark circuit, we cover all possible input combinations. For each input combination, we run many simulation iterations in order to obtain an accurate output reliability. The number of simulation iterations needed for any given input vector depends heavily on the specific topology and complexity of the targeted logic circuit. We continue the logic simulations until the output error probability saturates. Our results have shown that typically 5,000 logic simulations per input vector are sufficient.

For logic observability, we use a similar approach, the only difference being that we invert the logic value only at the logic node under consideration, while keeping the logic values at all other nodes unchanged. The observability is measured as the probability that any output changes its value. These measurements therefore also take into consideration the dependency of logic observability on internal logic values. To deal with the intensive computations required for the above logic simulations, we employ the STOKES computing cluster at UCF (University of Central Florida), which consists of 3,450 compute cores (Intel Xeon 64-bit processors) and over 7.5 TB of RAM. The total simulation took about one week to complete.
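The node-inversion procedure for measuring observability can be sketched on a toy LUT network as follows. This is only an illustration, not the in-house simulator: the three-LUT netlist, the names `NETLIST`, `simulate` and `observability`, and the choice to enumerate all inputs exhaustively (rather than sample them, as one would for a large circuit) are assumptions made for this example.

```python
from itertools import product

# Hypothetical toy netlist: LUT name -> (fan-in names, truth table).
# The truth table is indexed by the fan-in bits, first fan-in as the high bit.
NETLIST = {
    "n1": (("a", "b"), (0, 0, 0, 1)),    # n1 = a AND b
    "n2": (("b", "c"), (0, 1, 1, 1)),    # n2 = b OR c
    "y": (("n1", "n2"), (0, 0, 0, 1)),   # y = n1 AND n2
}
ORDER = ("n1", "n2", "y")   # topological evaluation order
INPUTS = ("a", "b", "c")    # primary inputs
OUTPUTS = ("y",)            # primary outputs


def simulate(inputs, flipped=None):
    """Evaluate the network; optionally invert the value at one LUT output."""
    values = dict(inputs)
    for name in ORDER:
        fanin, table = NETLIST[name]
        index = 0
        for src in fanin:
            index = (index << 1) | values[src]
        values[name] = table[index]
        if name == flipped:
            values[name] ^= 1  # inject the inversion at this node only
    return tuple(values[o] for o in OUTPUTS)


def observability(node):
    """Fraction of input vectors for which inverting `node` changes any output."""
    changed = 0
    total = 0
    for bits in product((0, 1), repeat=len(INPUTS)):
        inputs = dict(zip(INPUTS, bits))
        if simulate(inputs) != simulate(inputs, flipped=node):
            changed += 1
        total += 1
    return changed / total


if __name__ == "__main__":
    for lut in ORDER:
        print(lut, observability(lut))
```

Because the inversion is applied while the network is evaluated with real input values, the measured observability automatically reflects the dependence on internal logic values, which is the point made in the text.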
We use the 45-nm predictive technology model (PTM) to model all CMOS devices (http://ptm.asu.edu). At the nominal $V_{DD} = 1$ V and $V_{th} = 0.18$ V, we assume the error probability of all transistors to be zero. We then set the on-time to be $C = 3$ years and obtain the duty cycle values $Y$ from our logic simulations. Next, using Equation (1), we compute $\Delta V_{th}$, which can then be used to obtain $\sigma^2(\Delta V_{th})$ using Equation (2). Using Equation (3), we then obtain the error probability of a single transistor, $P_{err}$. Finally, we calculate the error probability of any single LUT by the method discussed in Section 4.

Note that the above methodology of computing the error probability caused by device aging is only applicable to pass-transistor switches. Because the SRAM elements store constant values, we use a different approach to evaluate the aging effect on their error probability. Specifically, as discussed in Section 4.1, after obtaining $\sigma^2(\Delta V_{th})$ using Equation (2), we utilize the empirically measured data shown in Figure 6 to read out the three main components of the error probability of an SRAM memory cell [21], which can be readily combined to obtain the total memory error probability due to device aging. Our results have shown that, for the 45-nm CMOS technology, after three years of switch-on time, $\Delta V_{th}$ is 0.063 V, which induces a LUT error probability of about $1.34 \times 10^{-4}$. This error probability can be completely eliminated by elevating $V_{DD}$ to about 1.1 V.

All results in Table 1 have been obtained under the above assumptions. For each of these ten benchmarks, we conduct four sets of experiments denoted by U, A, B and C. Type U experiments serve as the baseline, in which no circuit fortification is done. In Type A experiments, we use the optimal fortification strategy that we developed in Section 6, i.e., we choose the $K$ LUTs with the largest criticality values to fortify. In Type B experiments, we randomly pick $K$ LUTs to fortify, while in Type C experiments, we do the opposite of our optimal fortification strategy: we choose the $K$ LUTs with the smallest criticality values to fortify. Finally, we have tried three different $K$ values, which are 10%, 20% and 30% of $N$.

As shown in Table 1, for all benchmark circuits, our optimal discriminative voltage scaling method has significantly improved the overall logic circuit reliability. The improvement ranges from approximately three times to five times. Not surprisingly, the opposite voltage scaling (Type C) has performed poorly, with reliability improvements ranging from merely 10% to 30% for $K = 10\%\,N$. Also as expected, when $K$ increases from 10% to 30%, the improvement in overall circuit reliability steadily increases for every benchmark circuit and every voltage scaling method. Somewhat surprisingly, very few differences can be found between the Type B and Type C experiments. This essentially shows that, without using the LUT criticality values as guidance for discriminative voltage scaling, the reliability improvement is almost as poor as in the worst-case scenario. This finding clearly shows the significant advantage of our proposed discriminative voltage scaling scheme based on the LUT criticality ranking.
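The four experiment types can be mimicked on synthetic data with the short sketch below. It is not the experimental setup of this paper: the per-LUT values are random, the error probability is assumed (hypothetically) to be proportional to $\alpha_i$ with an arbitrary scale of $10^{-3}$, fortification is modeled as setting a LUT's error probability to zero, and the output error uses the closed form $\frac{1}{2}[1 - \prod_i (1 - 2 e_i \beta_i)]$. The names `pick`, `fortify` and `output_error` are illustrative.

```python
import random
from math import prod


def output_error(e, beta):
    """Closed-form output error: 1/2 * (1 - prod_i (1 - 2 * e_i * beta_i))."""
    return 0.5 * (1.0 - prod(1.0 - 2.0 * ei * bi for ei, bi in zip(e, beta)))


def fortify(e, chosen):
    """Return a copy of e in which the chosen LUTs have zero error probability."""
    return [0.0 if i in chosen else ei for i, ei in enumerate(e)]


def pick(criticality, k, strategy, rng):
    """Select k LUT indices: 'A' = most critical, 'B' = random, 'C' = least critical."""
    n = len(criticality)
    order = sorted(range(n), key=lambda i: criticality[i], reverse=True)
    if strategy == "A":
        return set(order[:k])
    if strategy == "B":
        return set(rng.sample(range(n), k))
    if strategy == "C":
        return set(order[n - k:])
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    alpha = [rng.random() for _ in range(n)]      # synthetic switching activity
    beta = [rng.random() for _ in range(n)]       # synthetic observability
    e = [1e-3 * a for a in alpha]                 # assumption: e_i proportional to alpha_i
    gamma = [a * b for a, b in zip(alpha, beta)]  # criticality gamma_i = alpha_i * beta_i

    print("U", output_error(e, beta))
    for fraction in (0.1, 0.2, 0.3):
        k = int(fraction * n)
        for strategy in "ABC":
            chosen = pick(gamma, k, strategy, rng)
            print(strategy, fraction, output_error(fortify(e, chosen), beta))
```

Under these assumptions, Type A can never do worse than any other choice of $K$ LUTs, and Type C can never do better; how close Type B lands to either extreme depends on the distribution of the criticality values.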
When examining the results in Table 1 more carefully, one can find that the effectiveness of our discriminative voltage scaling method varies widely. For example, after fortification, the reliability of FLIP_RISKS has been improved by about 3.22 times, while the reliability of Ex1010 has only been improved by 1.58 times, although both circuits are of almost the same size. To better understand this phenomenon, in Figure 11, we have plotted the profiles of LUT criticality and LUT error probability values for both circuits. In each circuit, we first sort all of the LUTs in decreasing order of criticality. We then plot the LUT error probability values according to this sorted order. Comparing Figure 11a,b, one can easily observe that, for the circuit FLIP_RISKS, the sorted orders of LUT criticality and error probability match quite closely. In contrast, for Ex1010, these two orderings differ greatly. In other words, for FLIP_RISKS, the most critical LUT is often the one with the highest error probability, while for Ex1010, the opposite is true. Therefore, in the case of Ex1010, we may have fortified many LUTs with very low error probability, hence the relatively low effectiveness of our discriminative voltage scaling.
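One simple way to put a number on how well the two orderings agree is the overlap between the $K$ most critical LUTs and the $K$ LUTs with the highest error probability. The sketch below is a hypothetical diagnostic, not a metric used in this paper; the function names and the example values are made up for illustration.

```python
def top_k(values, k):
    """Indices of the k largest entries of values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def top_k_overlap(criticality, error_prob, k):
    """Fraction of the k most critical LUTs that are also among the k most error-prone."""
    if k < 1 or k > len(criticality) or len(criticality) != len(error_prob):
        raise ValueError("k must be in [1, N] and both lists must have length N")
    return len(top_k(criticality, k) & top_k(error_prob, k)) / k


if __name__ == "__main__":
    # Hypothetical profiles: the first pair agrees, the second pair does not.
    criticality = [0.9, 0.8, 0.1, 0.2]
    print(top_k_overlap(criticality, [0.5, 0.4, 0.1, 0.2], k=2))
    print(top_k_overlap(criticality, [0.1, 0.2, 0.5, 0.4], k=2))
```

A value near 1 would correspond to the FLIP_RISKS-like situation, in which the fortified LUTs are also the error-prone ones, while a value near 0 would correspond to the Ex1010-like situation.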

Related Work
Criticality analysis has been extensively studied in software [29], but is quite rare in error-resilient computing device research. Only recently has the general area of criticality analysis (CA), which provides relative measures of significance for the effects of individual components on the overall correctness of system operation, been investigated in digital circuit design. For example, in [30], a novel approach is presented to optimize digital integrated circuit yield with regard to speed and area/power for aggressively scaled technologies. The technique is intended to reduce the effects of intra-die variations using redundancy applied only to critical parts of the circuit. In [31], the researchers explored the idea of discriminatively fortifying a large H.264 circuit design with FPGA fabric. They recognize that: (1) different system components contribute differently to the overall correctness of a target application and therefore should be treated distinctively; and (2) abundant error resilience exists inherently in many practical algorithms, such as signal processing, visual perception and artificial learning. Such error resilience can be significantly improved with effective hardware support. However, in [31], the authors used Monte Carlo-based fault injection, and therefore, the resulting algorithm cannot be efficiently applied to large-scale circuits. Furthermore, their definition of modular criticality was quite ad hoc and therefore lacked analytical justification.
More relevant to our study, [32] introduced a logic-level soft error mitigation methodology for combinational circuits. Their key idea is to exploit the existence of logic implications in a design and to selectively add pertinent functionally redundant wires to the circuit. They have demonstrated that the addition of functionally redundant wires reduces the probability that a single-event transient (SET) error will reach a primary output and, by extension, the soft error rate (SER) of the circuit. Obviously, the proposed circuit techniques can be readily applied together with our proposed criticality estimation method, especially for large-scale circuits. More importantly, however, the method used in [32] determines circuit criticality mostly by assessing the SET sensitization probability reduction achieved by candidate functionally redundant wires and then selecting an appropriate subset that, when added to the design, minimizes its SER. Consequently, their overall method of criticality analysis is rather heuristic and utilizes largely "local" information. In addition, it is not clear how this method scales to very large circuits.
Samudrala et al. [33] also targeted hardening combinational circuits, but focused on hardening digital designs mapped onto Xilinx Virtex FPGAs against single-event upsets (SEUs). They do not perform detailed criticality analysis. Instead, their method uses the signal probabilities of the lines to detect SEU-sensitive sub-circuits of a given combinational circuit. Afterwards, the sub-circuits deemed to be sensitive are hardened against SEUs by applying selective triple modular redundancy (STMR) to them. More recently, in [34], a new methodology to insert selective TMR automatically for SEU mitigation has been presented. Again, the criticality was determined based on empirical data. Because the overall method is cast as a multi-variable optimization problem, it is not clear how this method scales with circuit size, and it provides few insights as to which part of the circuit is more critical than others, and by how much.
Finally, another related study [6] also examined transistor aging, mostly due to NBTI and PBTI, for FPGA technology. However, they only investigated the effect of this aging in LUTs by considering different implementations through detailed SPICE simulations. In contrast, our study involves both analytical and empirical studies. More importantly, we study how to improve the overall logic reliability of logic circuits implemented with an FPGA device, without modifying any logic structure in the FPGA circuit implementations. Our main approach is to strategically elevate $V_{DD}$ at various critical components to maximize the reliability benefits.

Conclusions
There are two fundamental contributions in this work. First, to the best of our knowledge, this study is the first to reveal the analytical relationship between BTI-induced device aging and device reliability through a probabilistic argument. Building upon this finding, we were able to derive analytical models for the circuit reliability of LUTs in an FPGA device. Second, for the first time, we show that, given a fixed amount of extra resources, the optimal way to allocate them, so that the overall reliability of the circuit design is maximized, is to use the criticality to prioritize the resource allocation. This solution is quite general in its applicability. Moreover, the extra resource considered can take many forms. In this work, we chose to use elevated $V_{DD}$, but this can also be replaced with hardware redundancy, transistor device engineering or transistor sizing.

Figure 1. Illustration of Dynamic BTI. Each clock cycle consists of two phases: (1) Stress (D) and (2) Recovery (R). The dashed line represents the overall aging process, i.e., the increasing trend of $V_{th}$.

Figure 2. (a) Using a looped gate chain with feedback configuration to model a gate chain with infinite length in (b); (c) the voltage transfer curves (VTCs) of the gate pair are used in butterfly plots to determine the static noise margins (SNMs) [18].

Figure 4. Probability density function of $\Delta V_{th}$ (mV). $\Delta V_{th}^{*}$ denotes the cut-off point, beyond which the transistor stops switching correctly. The shaded area represents the total error probability that the transistor malfunctions.

Figure 5. Transistor network of a standard SRAM cell.

Figure 7. (a) Sketch of the FPGA architecture; (b) diagram of a simple logic block. FF, flip-flop.

Figure 8. Logic diagram of a two-input LUT.

Figure 10. CAD flow of our circuit design experiments.

Figure 11. Profile comparison between LUT criticality and LUT error probability values. (a) Results of circuit FLIP_RISKS. (b) Results of circuit Ex1010.

Table 1. Results of the overall error probability $P_{err}$ for all 10 Quartus University Interface Program (QUIP) benchmarks.