Low Power Electronics and Applications Stochastically Estimating Modular Criticality in Large-scale Logic Circuits Using Sparsity Regularization and Compressive Sensing

This paper considers the problem of how to efficiently measure a large and complex information field with optimally few observations. Specifically, we investigate how to stochastically estimate modular criticality values in a large-scale digital circuit with a very limited number of measurements in order to minimize the total measurement effort and time. We prove that, through sparsity-promoting transform domain regularization and by strategically integrating compressive sensing with Bayesian learning, more than 98% of the overall measurement accuracy can be achieved with fewer than 10% of the measurements required by a conventional approach that uses exhaustive measurements. Furthermore, we illustrate that the obtained criticality results can be utilized to selectively fortify large-scale digital circuits for operation with narrow voltage headrooms and in the presence of soft errors arising at near-threshold voltage levels, without excessive hardware overhead. Our numerical simulation results show that, by optimally allocating only 10% circuit redundancy, we can achieve more than a three-fold reduction in the overall error probability of some large-scale benchmark circuits, whereas randomly distributing the same 10% hardware resource improves the target circuit's overall robustness by less than 2%. Finally, we conjecture that our proposed approach can be readily applied to estimate other essential properties of digital circuits that are critical to designing and analyzing them, such as the observability measure in reliability analysis and the path delay in stochastic timing analysis. The only key requirement of our proposed methodology is that these global information fields exhibit a certain degree of smoothness, which holds for almost any physical phenomenon.


I. INTRODUCTION
As electronic device technology aggressively scales, there will likely be a sharp increase in manufacturing defect levels and transient fault rates, e.g., [2], [3], [4], [5], which will undoubtedly degrade the performance and limit the reliability of emerging VLSI circuit systems. To overcome these challenges, digital circuit designers frequently measure electrical parameters, discover logic patterns, and analyze structures within very large-scale gate networks [6], [7], [8]. Unfortunately, given the exponential increase in both size and complexity of modern VLSI digital circuits, the total computational cost and effort of an exhaustive measurement approach can be prohibitive, especially considering the more pronounced delay variations for minimum transistor sizes caused by statistical deviations of the doping concentration [9], [10]. For example, because the yield of low-voltage digital circuits is sensitive to local gate delay variations due to uncorrelated intra-die and inter-die parameter deviations [11], [12], extensive path delay measurements, which typically take days to complete, have to be made in order to evaluate the influence of process variations on path delays in VLSI digital circuits. Such long measurement latency has been shown to seriously impede fast design space exploration in VLSI digital circuit design [13], [14], sometimes even forcing circuit designers to resort to suboptimal design solutions. As such, we believe that it is imperative to find an accurate, robust, scalable, and computationally efficient method to minimize the total measurement effort in VLSI digital circuit design.
On the other hand, the area of stochastic signal estimation and learning has developed many paradigm-shifting techniques that focus on maximizing computational efficiency in numerical analysis [15]. For example, compressive sensing, a novel sensing paradigm that goes against the common wisdom in data acquisition, has recently produced a series of powerful results about exactly recovering a finite signal $x_0 \in \mathbb{R}^N$ from a very limited number of observations [16], [17], [18], [19]. In particular, suppose we are interested in completely recovering an unknown sparse 1-D signal $x = \{x_1, x_2, x_3, \ldots, x_N\} \in \mathbb{R}^N$. If the support of $x$, $T(x) = \{t_i : x_i \neq 0\}$, $i \in [1, N]$, has small cardinality, and we have $K$ linear measurements of $x$ in the form of

$$y_k = \langle a_k, x \rangle = \sum_{i=1}^{N} a_{i,k}\, x_i, \qquad k = 1, \ldots, K, \tag{1}$$

or $y = Ax$ in matrix form, where the $a_k = (a_{1,k}, a_{2,k}, \cdots, a_{N,k})$ are pre-determined test vectors forming the rows of $A$, then we can achieve extremely high estimation accuracy with very computationally efficient $\ell_1$ minimization algorithms [20]. We believe that there is a strong linkage between stochastic signal processing techniques and parameter measurements in logic design. In order to bridge the conceptual gap between them, we make the key observation that many interesting properties of logic circuits can be treated mathematically as 2-D random fields of stochastic signal distributions. As a direct result, physical parameters in VLSI digital circuits can be measured or recovered extremely accurately, provided that these parameters possess a certain form of smoothness prior, which holds for almost any physics-based phenomenon. In this paper, we attempt to develop an efficient method to stochastically estimate the modular criticality of a very large-scale digital circuit. The importance of estimating modular criticality stems from two facts. First, with the advent of next-generation CMOS technology, the reliability of logic circuits emerges as a major concern. Therefore, the value distribution of modular criticality within a modern VLSI circuit can provide invaluable guidance in robust circuit design. Second, because some logic gates within a digital circuit may be more critical than others, intuitively, more hardware resources, i.e., stronger computing "fortification", should be allocated to these more critical gates in order to achieve higher overall error resilience of the digital circuit under consideration.
Unfortunately, despite the theoretical elegance of the compressive sensing principle, directly applying it to estimating modular criticality in a digital circuit proves to be difficult for several reasons. First, unlike most signal measurements, which are uniformly distributed on a 1-D or 2-D lattice, the netlist of a digital circuit typically exhibits the interconnect structure of a random graph. Intuitively, to apply compressive sensing, we need to somehow map our circuit structure G to a 2-D regular grid system such that the planarity of G is maximized and the neighborhood of each gate is largely preserved. Second, we normally can only measure the modular criticality value of each gate one by one, so it is not obvious how we can make linear measurements of all gates in the conventional way. Third, because our objective is to accurately estimate the modular criticality of each gate while minimizing the total number of measurements, how we choose the measurement locations is essential to the final performance. Finally, the choice of the transform basis A in Eqn. (1) is also important. Not surprisingly, we desire it to be efficiently computable and yet still satisfy the uniform uncertainty principle required by compressive sensing, which requires that every set of columns in A with cardinality less than s approximately behaves like an orthonormal system [16].

Contributions and Outline
To the best of our knowledge, there has not been any systematic study on accurately measuring modular criticality values within a large-scale VLSI digital circuit. The work most related to this paper consists of several recent studies that explored various analytical ways of computing the overall logic reliability of VLSI logic circuits [6], [7], [8], [21], [22], [23], [24], [25], [26], [27], [28], [29]. Reliability analysis of logic circuits refers to the problem of evaluating the effects of errors due to noise at individual transistors, gates, or logic blocks on the outputs of the circuit. The models for noise range from highly specific decompositions of the sources, e.g., single-event upsets, to highly abstract models that combine the effects of different failure mechanisms [30], [31], [32], [33], [34]. For example, in [35], the authors developed an observability-based approach that computes a closed-form expression for circuit reliability as a function of the failure probabilities and observabilities of the gates. Unfortunately, all of these analytical studies, although mathematically concise, have to make some key assumptions, which seriously limits their applicability and accuracy. For example, the method in [35] needs to approximate multi-gate correlations in order to handle reconvergent fan-out. In addition, it is not clear how the existing analytical approaches can handle arbitrary probabilistic input vector distributions or more complicated correlation patterns within a VLSI logic circuit.
To overcome the limitations of analytical approaches, digital circuit designers have to resort to other standard techniques for reliability analysis, such as fault injection and simulation in a Monte Carlo framework. However, although parallelizable, these brute-force techniques are too inefficient for use on large-scale circuits. In this paper, we aim to strategically integrate both empirical and analytical means by applying the recently developed compressive sensing principle. Our main objective is to improve the measurement accuracy of modular criticality values in large-scale digital circuits while minimizing the overall computational effort and time. Most importantly, we attempt to develop an accurate measurement approach that is general in its applicability, i.e., it can not only handle any input distribution but also deal with any kind of gate correlation pattern. Our ultimate goal is to provide a general framework that can also be used to tackle other engineering problems of a similar nature, i.e., accurately extracting global information from a very small number of local measurements.
We first show that there are several technical obstacles to directly applying the classic compressive sensing technique. Specifically: How can we promote the signal sparsity of the value distribution of modular criticality? How can we effectively acquire linear measurements across the whole circuit? And how can we optimally determine the best measurement locations? In this paper, we inversely estimate the DCT coefficients of the modular criticality values, using two domain regularization techniques to promote signal sparsity. More importantly, we develop a new adaptive strategy based on Bayesian learning to select optimal locations for additional measurements, which provides us with both a confidence level and a termination criterion for our estimation scheme. Our adaptive algorithm greatly enhances the capability of the classic compressive sensing technique, which considers only pre-chosen measurement locations and therefore works statically. Finally, to illustrate the value of knowing the modular criticality values in a VLSI digital circuit, we propose a new concept of discriminative circuit fortification that can significantly improve the overall circuit robustness with very limited hardware resource overhead. We show that the obtained modular criticality values can greatly help the designer decide where to optimally allocate extra hardware resources in order to improve the overall robustness of the target circuit.
The rest of the paper is organized as follows. Section II states our target problem in a mathematically formal way, and Section III overviews our proposed estimation framework. We then delve into more detailed descriptions of the stochastic criticality analysis procedure. Specifically, in Section IV, we describe the techniques we utilize to promote signal sparsity through transform domain regularization. In Sections V and VI, we outline the detailed algorithmic steps of our basic estimation strategy and of its more advanced adaptive version based on Bayesian learning, respectively. Subsequently, Section VII describes the estimation results we obtained using benchmarks from the ISCAS89 suite. In these results, we aim to illustrate both the effectiveness and the computational efficiency of our proposed approach. Afterwards, we present and analyze the usefulness of modular criticality values by applying discriminative logic fortification to several circuits. As we will show, knowledge of the modular criticality values of a given circuit can significantly increase the cost-effectiveness of hardware redundancy. Finally, Section X concludes the paper.

II. PROBLEM STATEMENT
Given a digital circuit G consisting of N gates, we define G's output reliability $R(G, \{e_i\}_{i=1}^N)$ as its probability of being correct at all of its output ports when a large ensemble of independent and identically distributed (i.i.d.) random inputs is applied. Here, $\{e_i\}_{i=1}^N$ denotes the vector of error probabilities of all N gates. We then define the modular criticality $\Phi_i$ of gate i as

$$\Phi_i = \frac{\partial \left[ 1 - R(G, \{e_i\}_{i=1}^N) \right]}{\partial e_i}. \tag{2}$$

Intuitively, the larger $\Phi_i$ is, the more critical gate i is to the correctness of the whole circuit G. Note that the input vector distribution need not be uniform i.i.d.; instead, it can take any general form. Fig. 1 sketches the idea of defining $\Phi_i$. Basically, we define the modular criticality value of any gate i to be the slope of the circuit's output error probability with respect to the error probability of gate i. Intuitively, the larger the modular criticality (or the error slope) is, the more sensitive the overall output reliability is to gate i's error.
According to our definition in Equation (2), $\Phi_i$ is a function of the specific value of $e_i$. However, we observed that, for all benchmark circuits we used, this dependency is fairly weak. As a result, in this paper, we use the expected value of $\Phi_i(e)$ as the final modular criticality of gate i. As an inherent property of the digital circuit under consideration, $\Phi_i$ depends only on the target circuit's topology and logic function. To show the weak relationship between $\Phi_i$ and $e_i$, we chose the benchmark circuit c7552.bench from the ISCAS89 benchmark suite as an example. It consists of 3512 logic gates. We randomly picked two gates, G5723 and G776, and used Monte Carlo simulations to obtain $\Phi_i$ values at different values of $e_i$. As shown in Fig. 2, for each gate, the average criticality value can be computed through the well-known least-squares fitting method. Using our definition of modular criticality in Equation (2), we performed extensive numerical measurements on all 11 benchmark circuits. For c7552.bench, we plotted its criticality distribution at various gate error probability values in Fig. 3. We make two interesting observations. First, the locations of all critical gates ("hot spots") do not change noticeably as the gate error probability δ changes from 0.05 to 0.5, which again validates the point above that modular criticality is an inherent circuit property depending only on circuit topology and gate functions (see Fig. 2). Second, there are very large variations in modular criticality values across different gates in the same digital circuit. As shown in Fig. 3(e), for all 11 benchmark circuits, the modular criticality values can differ by a factor of more than 17.
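The error-slope definition in Equation (2) and the least-squares fit of Fig. 2 can be illustrated on a toy circuit. The sketch below is hypothetical and is not the benchmark setup used here: it assumes a three-gate circuit out = AND(AND(a, b), OR(c, d)), uniform i.i.d. input bits, gates that flip their output independently with probability $e_k$, and, as a simplifying choice, all gates other than the one being probed held error-free. The function names `output_error_rate` and `modular_criticality` are illustrative only.

```python
import numpy as np

def output_error_rate(e, n_samples, rng):
    """Monte Carlo estimate of 1 - R for the toy circuit out = AND(AND(a, b), OR(c, d)).

    e[k] is the probability that gate k flips its output
    (gate 0: AND(a, b), gate 1: OR(c, d), gate 2: output AND).
    """
    a, b, c, d = rng.random((4, n_samples)) < 0.5         # i.i.d. uniform input bits
    golden = (a & b) & (c | d)                            # fault-free output
    flip = rng.random((3, n_samples)) < np.asarray(e)[:, None]
    g0 = (a & b) ^ flip[0]
    g1 = (c | d) ^ flip[1]
    out = (g0 & g1) ^ flip[2]
    return np.mean(out != golden)

def modular_criticality(gate, e_values, n_samples=50_000, seed=0):
    """Least-squares slope of the output error rate versus e_gate (other gates error-free)."""
    rng = np.random.default_rng(seed)
    rates = []
    for ev in e_values:
        e = np.zeros(3)
        e[gate] = ev
        rates.append(output_error_rate(e, n_samples, rng))
    slope, _intercept = np.polyfit(e_values, rates, 1)    # linear least-squares fit
    return slope

e_values = np.linspace(0.05, 0.5, 10)
phi = [modular_criticality(g, e_values) for g in range(3)]
print(phi)
```

For this toy circuit the slopes can be checked by hand: an error at the first AND gate reaches the output only when OR(c, d) = 1 (probability 3/4), an error at the OR gate only when AND(a, b) = 1 (probability 1/4), and an error at the output gate always does, so the estimates should be close to 0.75, 0.25, and 1.0, a small-scale version of the large spread in criticality noted above.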
Unfortunately, despite the simplicity of our definition of modular criticality, accurately measuring it within a large-scale digital circuit turns out to be very computationally intensive. It should be clear that, according to our definition of modular criticality in Eqn. (2), measuring the reliability of a given logic circuit is an essential step in obtaining the modular criticality values of its logic gates. Although there are both analytical and empirical ways to obtain reliability values for a given circuit, reliability analysis of logic circuits is computationally complex because of the exponential number of inputs, combinations, and correlations in gate failures. For large-scale logic circuits, analytical solutions are challenging to derive. Despite some successes in recent work, most of these analytical techniques need to make some key assumptions, which compromises their accuracy.
Given a digital circuit G consisting of N gates, suppose we need T seconds to evaluate G's reliability for a fixed configuration of e. According to our definition of modular criticality in Eqn. (2), we need a total execution time of N × T × k to obtain all modular criticality values within G, where k denotes the number of error-probability values $e_i$ required to compute each error slope. Note that the runtime of Monte Carlo simulation for a single round of reliability analysis grows exponentially with the circuit size N. As a result, for very large N, the straightforward Monte Carlo-based strategy for modular criticality is extremely time-consuming. The key objective of this paper is to develop a highly robust and scalable approach to accurately estimate the modular criticality of a target digital circuit.

III. OUTLINE OF OUR APPROACH
The backbone of our proposed methodology is the well-developed compressive sensing theory, which exploits two principles: sparsity and incoherence [16], [20]. While sparsity pertains to the signals of interest, incoherence has to do with the sensing modality. Sparsity expresses the fact that many natural signals are sparse or compressible in the sense that they have concise representations when expressed in the proper basis. Incoherence extends the duality between time and frequency and expresses the idea that objects having a sparse representation in one domain must be spread out in the domain in which they are acquired, just as a Dirac delta, or spike, in the time domain is spread out in the frequency domain. Put differently, incoherence says that, unlike the signal of interest, the sampling/sensing waveforms have an extremely dense representation in the sparsity basis.
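Incoherence can be quantified with the mutual coherence $\mu(\Phi, \Psi) = \sqrt{n}\,\max_{k,j} |\langle \phi_k, \psi_j \rangle|$ between a sensing basis $\Phi$ and a sparsity basis $\Psi$, a standard definition from the compressive sensing literature rather than one introduced in this paper. For orthonormal bases it lies between 1 (maximally incoherent) and $\sqrt{n}$. The hypothetical sketch below assumes that measuring one gate at a time corresponds to the spike (identity) basis and that the sparsity basis is the orthonormal DCT-II used in Section IV-A; it shows that a single spike spreads over every DCT term.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is alpha_k * cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def mutual_coherence(Phi, Psi):
    """sqrt(n) * max |<phi_k, psi_j>| for two orthonormal bases stored as rows."""
    n = Phi.shape[1]
    return np.sqrt(n) * np.max(np.abs(Phi @ Psi.T))

n = 64
spikes = np.eye(n)                  # sensing basis: observe one gate at a time
C = dct_matrix(n)                   # sparsity basis: DCT-II
mu = mutual_coherence(spikes, C)    # close to sqrt(2): spikes and DCT are highly incoherent
spectrum = C @ spikes[10]           # DCT of a single spike at position 10
print(round(mu, 3), np.count_nonzero(np.abs(spectrum) > 1e-12))
```

Because every DCT basis vector has entries of magnitude at most $\sqrt{2/n}$, the coherence stays near its lower bound, which is why sampling individual gates still collects "a bit of information" about every DCT term.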
Fig. 4 illustrates our overall methodology. Given a VLSI digital circuit consisting of N logic gates, we first convert it into a netlist graph, in which each node represents one logic gate and the edges represent the interconnects between gates (Step 1). To facilitate our estimation approach, we then map this circuit graph to a 2-D regular grid system according to the procedure defined in Section IV (Step 2). As will be shown in Section IV-B, a random mapping of the circuit to a 2-D grid produces a 2-D signal field with very little sparsity, which is therefore unsuitable for compressive sensing. To enable the application of compressive sensing to our criticality estimation problem, we developed two techniques, graph mapping and spectral transform, to enhance and expose the hidden sparsity in the information field of interest. Finally, we developed an adaptive strategy to optimally select our compressive sensing measurements, both to increase estimation accuracy and to reduce the total number of measurements required.

IV. SPARSITY-PROMOTING DOMAIN REGULARIZATION
For compressive sensing to be effective, two requirements have to be met. First, the target information field must allow efficient measurements in a sparse signal form. Second, there must exist a numerically efficient optimization method to reconstruct the full-length signal from the small amount of collected data. Mathematically, the first requirement means that, by projecting the signal $X \in \mathbb{R}^{M\times N}$ onto a new $M \times N$ coordinate system, the resulting signal $Y \in \mathbb{R}^{M\times N}$ will contain relatively few non-zero terms. Under such constraints, the worst possible signal would be a 2-D field of independent random variables, for which there is no stochastic prior information to be utilized. Fortunately, almost all physically realistic signals possess a certain degree of sparsity, especially when the signal under consideration is transformed with some domain regularization. To illustrate this, we accurately measured the modular criticality distribution of a real-world digital circuit, c7552.bench, consisting of 3412 gates. Figs. 5a and 5b show its Φ distribution in 2-D and its corresponding 1-D representation. As can be clearly seen, the signal energy projected by the DCT concentrates on very few terms, while the rest of its DCT terms are relatively small.
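The contrast between a smooth field and the worst case of independent random values can be reproduced numerically. The sketch below uses two hypothetical fields on a 32 × 32 grid (two broad Gaussian "hot spots" versus i.i.d. Gaussian noise, not measured criticality data) and reports the share of the mean-removed energy captured by the largest 5% of 2-D DCT-II terms; the helper names and the 5% cutoff are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is alpha_k * cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def dct2(x):
    """Separable 2-D DCT-II: along columns, then along rows."""
    M, N = x.shape
    return dct_matrix(M) @ x @ dct_matrix(N).T

def top_energy_fraction(field, frac=0.05):
    """Share of the mean-removed energy held by the largest `frac` of the 2-D DCT terms."""
    coeffs = dct2(field - field.mean())
    energy = np.sort(coeffs.ravel() ** 2)[::-1]
    k = max(1, int(frac * energy.size))
    return energy[:k].sum() / energy.sum()

M = N = 32
m, n = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
# Hypothetical smooth field: two broad "hot spots"
smooth = (np.exp(-((m - 10) ** 2 + (n - 20) ** 2) / 50.0)
          + 0.5 * np.exp(-((m - 25) ** 2 + (n - 8) ** 2) / 80.0))
# Worst case from the text: independent random values with no prior structure
rough = np.random.default_rng(0).standard_normal((M, N))
print(top_energy_fraction(smooth), top_energy_fraction(rough))
```

The smooth field keeps nearly all of its energy in a few dozen low-frequency terms, whereas the i.i.d. field spreads its energy almost uniformly, so the same 5% of terms captures only a small fraction of it.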
Because the sparsity of the signal field is so critical to the effectiveness of compressive sensing, we borrow the idea of domain regularization from stochastic optimization to promote or enhance the signal sparsity of the modular criticality values under consideration. Conceptually, regularization involves introducing additional prior knowledge in order to enhance or expose the inherent sparsity of a target signal field, which facilitates solving an ill-posed problem and prevents overfitting. Such additional constraints often penalize complexity and promote smoothness or sparsity (bounds on the vector space norm). Theoretically, regularization is justified by the principle underlying Occam's razor. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters. The complexity of regularization varies greatly, ranging from the least-squares method to much more sophisticated statistical learning methods such as ridge regression, the lasso, and the L2-norm penalty in support vector machines.
Intuitively, the choice of regularization technique ultimately depends on the physics generating the observed quantities. In other words, the form of the regularization constraint should be consistent with, and promote, the expected properties of the physical system under investigation. In this study, we exploit the fact that logic gates located close to each other exhibit stronger correlations between their logic values. Specifically, we propose 1) using graph mapping to regularize the spatial information of the target logic circuit, and 2) using the DCT to encode modular criticality information into a compressive form.

A. DCT Transform
The classic compressive sensing methodology requires acquiring multiple linear measurements, each involving all signal entries. This is clearly infeasible in our modular criticality measurements, where each measurement yields the criticality of a single gate. Therefore, in this work, instead of directly computing the target Φ field $x \in \mathbb{R}^{M\times N}$, we estimate its DCT terms $y \in \mathbb{R}^{M\times N}$, so that the limited number of Φ measurements can be regarded as linear measurements of y. This indirection is important because the essence of compressive sensing is that, instead of measuring a lot of information at each location, each measurement captures a bit of information from all locations. Once y is accurately estimated, the 2-D signal x can be obtained through the inverse 2-D DCT.
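The discrete cosine transform (DCT) is a linear transform that is widely used for image coding and spectral analysis because of its compression power for smooth and correlated data. Mathematically, the DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but it uses only real numbers. A DCT expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies. The use of cosine rather than sine functions is critical: for compression, cosine functions turn out to be much more efficient, whereas for differential equations the cosines express a particular choice of boundary conditions. The DCT basis that we use in this paper, the type-II DCT, is the most commonly used form. Multidimensional variants of the various DCT types follow straightforwardly from the one-dimensional definitions: they are simply a separable product (equivalently, a composition) of DCTs along each dimension. For example, a two-dimensional DCT-II of an image or a matrix is simply the one-dimensional DCT-II performed along the rows and then along the columns (or vice versa). That is, the 2-D DCT-II of $x \in \mathbb{R}^{M\times N}$ is given by

$$y(p,q) = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m,n) \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le p \le M-1,\ 0 \le q \le N-1. \tag{3}$$

Inversely, we have

$$x(m,n) = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \alpha_p \alpha_q\, y(p,q) \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le m \le M-1,\ 0 \le n \le N-1. \tag{4}$$

In both Equations (3) and (4),

$$\alpha_p = \begin{cases} 1/\sqrt{M}, & p = 0,\\ \sqrt{2/M}, & 1 \le p \le M-1, \end{cases} \qquad \alpha_q = \begin{cases} 1/\sqrt{N}, & q = 0,\\ \sqrt{2/N}, & 1 \le q \le N-1. \end{cases}$$

We now select K measurements of the criticality field. Let $(m_i, n_i)$ denote the grid location of the i-th measured gate, $0 \le i \le K-1$. By Eqn. (4), each measured value is a linear combination of all MN DCT terms. Following the standard compressive sensing notation of Eqn. (1), we collect the K measured values into the observation vector y and the MN vectorized DCT terms into the unknown vector x (note that this swaps the roles of x and y relative to Eqns. (3) and (4)). The formulation can then be written as y = Ax, where the transform matrix A is defined as

$$A_{i,j} = \alpha_p \alpha_q \cos\frac{\pi(2m_i+1)p}{2M} \cos\frac{\pi(2n_i+1)q}{2N}, \qquad p = \lfloor j/N \rfloor,\quad q = j \bmod N, \tag{7}$$

where $0 \le i \le K-1$ indexes the measurements, whose locations determine the two constants $m_i$ and $n_i$, and $0 \le j \le MN-1$ indexes the DCT terms. Therefore, we now have K linear equations defined by Eqn. (7).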
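The construction of the measurement matrix from Eqns. (3), (4), and (7) can be sketched in a few lines. The sketch assumes the orthonormal DCT-II scaling and a row-major vectorization of the DCT terms ($j = pN + q$); the grid size, the sampled locations, and the helper names are hypothetical. It checks that each row of A reproduces the field value at its measured location from the DCT terms, and that sampling every location yields the full (orthogonal) inverse 2-D DCT.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: C[k, i] = alpha_k * cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def dct2(x):
    """2-D DCT-II as in Eqn. (3): y = C_M x C_N^T."""
    M, N = x.shape
    return dct_matrix(M) @ x @ dct_matrix(N).T

def measurement_matrix(locs, M, N):
    """One row of A per measured location (m_i, n_i), following Eqn. (7)."""
    CM, CN = dct_matrix(M), dct_matrix(N)
    j = np.arange(M * N)
    p, q = j // N, j % N                 # p = floor(j / N), q = j mod N
    return np.array([CM[p, m] * CN[q, n] for m, n in locs])

rng = np.random.default_rng(1)
M, N = 6, 8
x = rng.standard_normal((M, N))          # hypothetical criticality field
y = dct2(x)                              # its DCT terms
locs = [(0, 0), (2, 5), (5, 7), (3, 1)]  # K = 4 hypothetical measured gates
A = measurement_matrix(locs, M, N)
print(A.shape, np.allclose(A @ y.ravel(), [x[m, n] for m, n in locs]))
```

With K much smaller than MN, A is a short, wide matrix, which is exactly the under-determined setting that the $\ell_1$ recovery of Section V addresses.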
Because there are MN > K unknowns, these linear equations are under-determined. We discuss how to solve them in more detail in Section V.

B. Graph Mapping
The netlist of a typical digital circuit exhibits a random graph-like interconnect structure. However, compressive sensing is mostly designed to process 2-D signals uniformly distributed on a regular grid system. Therefore, before applying the compressive sensing technique, we need to first map the digital circuit under consideration to a 2-D grid system. The main criterion for such a mapping is to promote the hidden sparsity originally residing inside the target circuit. Fig. 6 illustrates the significant difference in sparsity between a random graph embedding and an optimized one for the same set of measured modular criticality values.
In Fig. 6(a), we plotted the 2-D modular criticality distribution for a random placement. As Fig. 6(b) shows, its 2-D DCT has very poor signal sparsity; in other words, its DCT terms span a wide range of values. For the same distribution of modular criticality values, we tried another placement method. Specifically, we sought a new placement in which the high-magnitude DCT terms concentrate in a small corner. To achieve this, we use the concentration ratio of the DCT values of the resulting mapping as our cost function. Because the solution space of possible placements is huge, we choose the well-known simulated annealing algorithm to obtain the best mapping solution for both methods. Specifically, the simulated annealing cost c is defined in terms of DCT(i, j), the DCT transform result at index (i, j), so as to reward the concentration of high-magnitude terms in that small corner. Because simulated annealing methods have been widely used in FPGA placement, we omit all implementation details of the simulated annealing. Instead, we focus on the choice of cost functions for both methods. In Fig. 6(c) and (d), we plotted the resulting placement and its DCT transform results. Obviously, the new placement has much better signal sparsity than the randomized placement in Fig. 6(a). Unfortunately, this simulated annealing-based approach is quite time-consuming. In practice, we found that a simple sorting method can achieve similar sparsity-promoting effects. To illustrate this phenomenon, we sorted the same criticality value distribution and plotted it, together with its corresponding DCT transform, in Fig. 6(e) and (f). The most interesting observation is that, by just sorting the criticality values, the resulting DCT transform terms are confined to a quite narrow range. Unfortunately, before our estimation phase, we do not know the final modular criticality values. As such, we have no prior information to guide our placement. We have tried two heuristic methods. In the first method, we follow the intuition of maintaining the often-assumed smoothness prior. Consequently, the chosen mapping must preserve the original spatial neighborhood system. In other words, if two gates are closely connected in the original digital circuit, they should be located close to each other in the resulting mapped circuit. In the second method, we first conduct a brief survey of modular criticality by performing a very small number of Monte Carlo simulations (e.g., iter = 100). We then use these roughly estimated criticality values as a starting point to iteratively improve our graph mapping. The basic idea is that, after each estimation round, we use the obtained estimation results to regularize the underlying modular criticality field and iteratively improve our estimation (see Algorithm 1 for more details).
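The sparsity-promoting effect of sorting can be illustrated with a small experiment. The sketch below assigns hypothetical criticality values (uniform on [0.05, 1.0], not measured data) to a 32 × 32 grid either by a random permutation or in row-major sorted order, and compares how much mean-removed energy the largest 5% of 2-D DCT-II terms capture. The row-major sorted layout is one simple assumed realization of "sorting"; the paper does not specify the exact layout.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is alpha_k * cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def dct2(x):
    """Separable 2-D DCT-II."""
    M, N = x.shape
    return dct_matrix(M) @ x @ dct_matrix(N).T

def top_energy_fraction(field, frac=0.05):
    """Share of the mean-removed energy held by the largest `frac` of the 2-D DCT terms."""
    coeffs = dct2(field - field.mean())
    energy = np.sort(coeffs.ravel() ** 2)[::-1]
    k = max(1, int(frac * energy.size))
    return energy[:k].sum() / energy.sum()

rng = np.random.default_rng(7)
M = N = 32
phi = rng.uniform(0.05, 1.0, M * N)                # hypothetical criticality values, one per gate
random_map = rng.permutation(phi).reshape(M, N)    # random gate-to-grid placement
sorted_map = np.sort(phi).reshape(M, N)            # gates placed in row-major order of criticality
print(top_energy_fraction(random_map), top_energy_fraction(sorted_map))
```

The sorted placement turns the values into a nearly monotone ramp whose DCT energy sits in a handful of low-frequency terms, while the random placement leaves the energy spread across the spectrum. As the text notes, this requires knowing the values in advance, which is why rough Monte Carlo estimates are used as a starting point in practice.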

V. ESTIMATION WITH STATIC COMPRESSIVE SENSING
In the standard compressive sensing notation of Eqn. (1), let $y \in \mathbb{R}^K$ collect the K accurately measured modular criticality values and $x \in \mathbb{R}^{MN}$ the unknown DCT terms of the criticality field, so that Equation (7) yields K linear equations $y = Ax$ with $A \in \mathbb{R}^{K \times MN}$. Because $K \ll MN$, these linear equations are seriously underdetermined. Following [20], we convert the compressive sensing problem in Equation (7) into a standard min-$\ell_1$ problem with quadratic constraints, which finds the vector with minimum $\ell_1$ norm that comes close to explaining the observations:

$$\min_x \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \epsilon,$$

where ε is a user-specified parameter. This approach was first introduced as basis pursuit in [36] and has recently been recast as the core of the compressive sensing methodology.
Fortunately, many studies have found that the basic recovery procedure based on compressive sensing is quite computationally efficient even for large-scale problems in which the number of data points is in the millions [20]. In this study, we focus on problem sizes that can be recast as linear programs (LPs). Should the benchmark circuit sizes exceed millions of gates, the same recovery problem can be solved as a second-order cone program (SOCP), for which more sophisticated numerical methods need to be employed. For all of our test cases, we solved the LPs using a standard primal-dual method outlined in Chapter 11 of [37].
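To make the recovery step concrete without an LP solver, the sketch below swaps in a different but closely related solver: FISTA (an accelerated proximal-gradient method) applied to the Lagrangian form $\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$, which has the same minimizer as the quadratically constrained problem for a suitable pairing of λ and ε. This is not the primal-dual LP method used in our experiments. The 16 × 16 field with six nonzero low-frequency DCT terms, the K = 80 random measurement locations, and λ = 1e-3 are all hypothetical choices for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is alpha_k * cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_l1(A, y, lam, n_iter=5000):
    """FISTA for min_x 0.5 * ||A x - y||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x

# Hypothetical 16 x 16 criticality field whose 2-D DCT has 6 nonzero low-frequency terms
M = N = 16
coeffs = np.zeros((M, N))
for (p, q), v in zip([(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (0, 3)],
                     [4.0, -1.5, 1.2, 0.8, -0.6, 0.5]):
    coeffs[p, q] = v
CM, CN = dct_matrix(M), dct_matrix(N)
field = CM.T @ coeffs @ CN                       # inverse 2-D DCT, Eqn. (4)

# Measure K random gates; row i of A is the Eqn. (7) row for location (m_i, n_i)
rng = np.random.default_rng(3)
K = 80
flat = rng.choice(M * N, size=K, replace=False)
locs = [(j // N, j % N) for j in flat]
A = np.array([np.kron(CM[:, m], CN[:, n]) for m, n in locs])
y = np.array([field[m, n] for m, n in locs])

x_hat = fista_l1(A, y, lam=1e-3).reshape(M, N)
field_hat = CM.T @ x_hat @ CN                    # reconstructed criticality field
print(np.linalg.norm(field_hat - field) / np.linalg.norm(field))
```

Here 80 of the 256 gates (about 31%) are measured, yet the full field is recovered because its DCT representation is sparse; the rows of A are rows of an orthogonal matrix, so the step size 1/L equals 1.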
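Primal-dual methods belong to the family of interior-point methods widely used for convex optimization problems that include inequality constraints, i.e., problems of the form

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m, \\ & Ax = b, \end{aligned} \tag{12}$$

where $f_0, \cdots, f_m : \mathbb{R}^n \to \mathbb{R}$ are convex and twice continuously differentiable, and $A \in \mathbb{R}^{p\times n}$ with $\operatorname{rank}(A) = p < n$.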
We assume that the problem is solvable, i.e., an optimal $x^*$ exists, and denote the optimal value $f_0(x^*)$ as $p^*$. We also assume that the problem is strictly feasible, i.e., there exists $x \in \mathcal{D}$ that satisfies $Ax = b$ and $f_i(x) < 0$, $i = 1, \ldots, m$. This means that Slater's constraint qualification holds, so there exist dual optimal $\lambda^* \in \mathbb{R}^m$, $\nu^* \in \mathbb{R}^p$, which together with $x^*$ satisfy the KKT conditions

$$\begin{aligned} & Ax^* = b, \quad f_i(x^*) \le 0, \quad i = 1, \ldots, m, \\ & \lambda^* \succeq 0, \\ & \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + A^T \nu^* = 0, \\ & \lambda_i^* f_i(x^*) = 0, \quad i = 1, \ldots, m. \end{aligned} \tag{13}$$

Interior-point methods solve the problem in Eqn. (12) (or the KKT conditions in Eqn. (13)) by applying Newton's method to a sequence of equality-constrained problems, or to a sequence of modified versions of the KKT conditions. We omit further details of these interior-point algorithms, which can be found in [37]. We outline the key steps of our proposed estimation algorithm based on static compressive sensing in Algorithm 1. Essentially, this algorithm is an iterative procedure that improves our estimation accuracy based on the solution of the previous round. For all of our benchmark cases, our iterative algorithm converges very quickly. Fig. 7 shows a typical convergence of our estimation procedure. In this particular case, we have 10 iterations. As illustrated, each iteration improves upon the previous one, and the whole procedure converges quickly.
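The idea of applying Newton's method to a sequence of equality-constrained problems can be illustrated with the basic log-barrier interior-point method, a simpler relative of the primal-dual method referenced above. The sketch below is a dense, textbook-style routine for a standard-form LP (minimize $c^T x$ subject to $Ax = b$, $x \ge 0$), not the solver used in our experiments; the toy LP, the helper name `barrier_lp`, and the parameter defaults are hypothetical.

```python
import numpy as np

def barrier_lp(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6, newton_tol=1e-10, max_newton=100):
    """Log-barrier sketch for: minimize c^T x subject to A x = b, x >= 0.

    For increasing t, minimize t * c^T x - sum(log x) subject to A x = b
    with feasible-start Newton steps. x0 must be strictly feasible.
    """
    x = np.asarray(x0, dtype=float).copy()
    assert np.allclose(A @ x, b) and np.all(x > 0), "x0 must be strictly feasible"
    m, t = x.size, t0
    while m / t >= eps:                          # m / t bounds the duality gap
        for _ in range(max_newton):              # centering step
            g = t * c - 1.0 / x                  # gradient of the barrier objective
            h_inv = x ** 2                       # inverse of the diagonal Hessian diag(1 / x^2)
            # Solve [[H, A^T], [A, 0]] [dx; w] = [-g; 0] by eliminating dx
            S = A @ (h_inv[:, None] * A.T)
            w = np.linalg.solve(S, -A @ (h_inv * g))
            dx = -h_inv * (g + A.T @ w)          # satisfies A dx = 0, so A x = b is kept
            lam2 = -(g @ dx)                     # squared Newton decrement
            if lam2 / 2 <= newton_tol:
                break
            s = 1.0
            while np.any(x + s * dx <= 0):       # stay in the domain x > 0
                s *= 0.5

            def change(step):
                # barrier-objective change, computed without large cancellations
                return t * step * (c @ dx) - np.sum(np.log1p(step * dx / x))

            while s > 1e-12 and change(s) > -0.25 * s * lam2:   # Armijo backtracking
                s *= 0.5
            if s <= 1e-12:                       # no measurable progress left at this t
                break
            x = x + s * dx
        t *= mu
    return x

# Toy LP: minimize x1 + 2 x2 + 3 x3 subject to x1 + x2 + x3 = 1, x >= 0
c = np.array([1.0, 2.0, 3.0])
A = np.ones((1, 3))
b = np.array([1.0])
x = barrier_lp(c, A, b, x0=np.full(3, 1.0 / 3.0))
print(x, c @ x)
```

The optimum of the toy LP is x = (1, 0, 0) with value 1; the barrier iterates approach it from the interior, and the final gap is bounded by m/t at the last centering step.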

VI. ESTIMATION WITH ADAPTIVE COMPRESSIVE SENSING
As shown above, the original form of compressive sensing can only statically solve the signal recovery problem, without providing any additional information on how accurate the resulting estimates are in a probabilistic sense. More importantly, the classical version of compressive sensing does not provide any guidance on how the samples should be drawn in order to maximize the overall measurement efficiency. Moreover, for a given accuracy requirement, how many measurements do we need? Mathematically, all these questions can be satisfactorily answered by computing the posterior probability P(x|y), where $x = \{x_i\}_{i=1}^N$ and $y = \{y_i\}_{i=1}^K$ denote the N parameters to be estimated and the K measurements, respectively.
Bayesian estimation, by contrast, computes the full posterior distribution P(x|y). Of all the x values made possible by this distribution, it is our job to select the value that we consider best in some sense. For example, we may choose the expected value of x, provided that its variance is small enough. The variance that we can calculate for x from its posterior distribution allows us to express our confidence in any specific value we may use as an estimate. If the variance is too large, we may declare that there does not exist a good estimate for x. By Bayes' rule, the posterior is

$$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}. \tag{14}$$

In this work, we devise an adaptive compressive sensing methodology based on the framework of Bayesian compressive sensing. We consider the inversion of compressive measurements from a Bayesian perspective. Specifically, from this standpoint, we have a prior belief that x should be sparse in the basis A, data are observed from compressive measurements, and the objective is to provide a posterior belief (density function) for the values of x. Besides the improved accuracy over the point estimate, the Bayesian formalism, more importantly, provides a new framework that allows us to address a variety of issues that previously have not been addressed. Specifically, rather than providing a point (single) estimate for the weights x, a full posterior density function is provided, which yields "error bars" on the estimated x. These error bars may be used to give a sense of confidence in the approximation to x, and they may also be used to guide the optimal design of additional compressive measurements, implemented with the goal of reducing the uncertainty in x. In addition, the Bayesian framework provides an estimate of the posterior density function of the additive noise encountered when implementing the compressive measurements. We assume x to be compressible in the basis A.
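Let x_s denote the 1-D vector that is identical to x in its K largest-magnitude elements, with all remaining elements set to zero. The difference vector δ = x − x_s then contains the N − K smallest-magnitude elements of x. The measurements can be written as

$$\mathbf{y} = \mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{x}_s + \mathbf{A}\boldsymbol{\delta} = \mathbf{A}\mathbf{x}_s + \mathbf{n}. \qquad (15)$$

Since y is formed from random compressive samples, the Central Limit Theorem lets us approximate the components of n = Aδ as zero-mean Gaussian noise for large N − K. Denoting the noise precision by β = 1/σ², we therefore have the Gaussian likelihood model

$$P(\mathbf{y}\mid\mathbf{x}) = \left(\frac{\beta}{2\pi}\right)^{K/2} \exp\!\left(-\frac{\beta}{2}\,\lVert \mathbf{y} - \mathbf{A}\mathbf{x}\rVert_2^2\right). \qquad (16)$$

In a Bayesian formulation, our knowledge that x is sparse is formalized by placing a sparsity-promoting prior on x. A widely used sparsity prior is the Laplace density function [38]:

$$P(\mathbf{x}) = \left(\frac{\lambda}{2}\right)^{N} \exp\!\left(-\lambda \sum_{i=1}^{N} \lvert x_i\rvert\right). \qquad (17)$$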
Given the compressive measurements y and the likelihood function P(y|x) in Equation (16), it is straightforward to show that the solution in (1) corresponds to a maximum a posteriori (MAP) estimate of x under the prior P(x) in Equation (17).
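The MAP correspondence can be made explicit with a short derivation. It assumes a Gaussian likelihood with noise precision β and a Laplace prior with parameter λ. The evidence P(y) does not depend on x, so it drops out of the maximization. The result is an ℓ1-regularized least-squares problem. Whether it matches Equation (1) term by term depends on how (1) is stated earlier in the paper, which lies outside this section.

```latex
\begin{aligned}
\hat{\mathbf{x}}_{\mathrm{MAP}}
  &= \arg\max_{\mathbf{x}}\; P(\mathbf{y}\mid\mathbf{x})\,P(\mathbf{x}) \\
  &= \arg\min_{\mathbf{x}}\; \bigl[-\log P(\mathbf{y}\mid\mathbf{x}) - \log P(\mathbf{x})\bigr] \\
  &= \arg\min_{\mathbf{x}}\; \Bigl[\tfrac{\beta}{2}\,\lVert \mathbf{y}-\mathbf{A}\mathbf{x}\rVert_2^2
       + \lambda\,\lVert \mathbf{x}\rVert_1\Bigr] \\
  &= \arg\min_{\mathbf{x}}\; \Bigl[\lVert \mathbf{y}-\mathbf{A}\mathbf{x}\rVert_2^2
       + \tfrac{2\lambda}{\beta}\,\lVert \mathbf{x}\rVert_1\Bigr]
\end{aligned}
```

The ratio 2λ/β plays the role of the usual ℓ1 regularization weight. A noisier model (smaller β) or a stronger sparsity belief (larger λ) both push the estimate toward sparser solutions.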
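According to Equation (14), to evaluate the posterior distribution P(x|y) we also need the evidence

$$P(\mathbf{y}) = \int P(\mathbf{y}\mid\mathbf{x})\,P(\mathbf{x})\,d\mathbf{x}. \qquad (18)$$

In practice, as in Bayesian compressive sensing, the sparsity-promoting prior is implemented hierarchically. Each x_i is given a zero-mean Gaussian prior with its own precision α_i:

$$P(\mathbf{x}\mid\boldsymbol{\alpha}) = \prod_{i=1}^{N} \mathcal{N}\!\left(x_i \mid 0, \alpha_i^{-1}\right).$$

Assume the hyperparameters β and α = {α_i}_{i=1}^N are known. Given the compressive measurements y and the projection matrix A, the posterior for x can then be expressed analytically as a multivariate Gaussian distribution:

$$P(\mathbf{x}\mid\mathbf{y},\boldsymbol{\alpha},\beta) = \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}). \qquad (19)$$

The mean µ and covariance Σ in Equation (19) are

$$\boldsymbol{\mu} = \beta\,\boldsymbol{\Sigma}\,\mathbf{A}^{T}\mathbf{y} \qquad (20)$$

and

$$\boldsymbol{\Sigma} = \left(\beta\,\mathbf{A}^{T}\mathbf{A} + \boldsymbol{\Lambda}\right)^{-1}, \qquad (21)$$

where Λ = diag(α_1, α_2, …, α_N).

It is useful to have a measure of uncertainty in the estimated values of x. The diagonal elements of the covariance matrix Σ in Equation (21) provide "error bars" on the accuracy of our estimation: the posterior standard deviation of x_i is √Σ_ii. Equation (21) therefore lets us compute the error bars of our compressive-sensing estimate efficiently. More importantly, it allows us to select the locations of our compressive measurements adaptively so as to minimize the estimation uncertainty. This idea of minimizing measurement variance to choose sampling locations optimally has been explored in the machine learning community under the names experimental design and active learning [39]. The error bars also provide a way to determine how many measurements are sufficient for compressive-sensing estimation.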
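The posterior computation can be sketched in a few lines of NumPy. The sketch assumes the hyperparameters α and β are already known, as the text does. It uses the standard Bayesian compressive sensing expressions Σ = (βAᵀA + diag(α))⁻¹ and µ = βΣAᵀy. The function name `bcs_posterior` and the synthetic data are hypothetical illustrations, not the authors' MATLAB implementation.

```python
import numpy as np

def bcs_posterior(A, y, alpha, beta):
    """Gaussian posterior N(x | mu, Sigma) for known hyperparameters.

    A     : (K, N) projection matrix
    y     : (K,)  compressive measurements
    alpha : (N,)  prior precisions, i.e., x_i ~ N(0, 1 / alpha_i)
    beta  : scalar noise precision, 1 / sigma^2
    """
    # Posterior covariance (Equation (21)): (beta * A^T A + diag(alpha))^{-1}
    precision = beta * A.T @ A + np.diag(alpha)
    Sigma = np.linalg.inv(precision)
    # Posterior mean: beta * Sigma * A^T y
    mu = beta * Sigma @ A.T @ y
    # Error bars: posterior standard deviation of each x_i
    error_bars = np.sqrt(np.diag(Sigma))
    return mu, Sigma, error_bars

# Usage with synthetic data (all values hypothetical)
rng = np.random.default_rng(0)
N, K = 40, 30
A = rng.standard_normal((K, N))
x_true = np.zeros(N)
x_true[[3, 17]] = [1.0, -2.0]
y = A @ x_true
mu, Sigma, error_bars = bcs_posterior(A, y, alpha=np.ones(N), beta=100.0)
```

Adding rows to A can only shrink the posterior covariance in the positive-semidefinite sense. Error bars computed from fewer measurements are therefore never smaller than those from more. This monotonicity is what makes error bars usable as a stopping criterion.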
A convenient scalar measure of the overall estimation uncertainty is the differential entropy of the posterior:

$$H(\mathbf{x}\mid\mathbf{y}) = -\int P(\mathbf{x}\mid\mathbf{y}) \log P(\mathbf{x}\mid\mathbf{y})\,d\mathbf{x}. \qquad (22)$$

Because the posterior in Equation (19) is Gaussian, the differential entropy satisfies

$$H(\mathbf{x}\mid\mathbf{y}) = \frac{1}{2}\log\lvert\boldsymbol{\Sigma}\rvert + \mathrm{const} = -\frac{1}{2}\log\left\lvert \beta\,\mathbf{A}^{T}\mathbf{A} + \boldsymbol{\Lambda}\right\rvert + \mathrm{const}. \qquad (23)$$

Clearly, the next optimal measurement is the one that minimizes the differential entropy in Equation (23). Suppose we add one more compressive measurement, the (K + 1)th, with projection vector r_{K+1} (a new row of A). Let H_new(x|y) denote the differential entropy after this measurement is added. The entropy difference is then

$$\Delta H = H_{\mathrm{new}}(\mathbf{x}\mid\mathbf{y}) - H(\mathbf{x}\mid\mathbf{y}) = -\frac{1}{2}\log\left(1 + \beta\,\mathbf{r}_{K+1}^{T}\boldsymbol{\Sigma}\,\mathbf{r}_{K+1}\right). \qquad (24)$$

Hence, to minimize the overall estimation uncertainty, all we need to do is maximize r_{K+1}^T Σ r_{K+1}. Algorithm 2 outlines our estimation procedure based on adaptive compressive sensing. It contains two nested while loops. The outer loop controls how the number of compressive samples grows and where these samples are optimally placed. The inner loop, for a fixed number of compressive samples, performs an estimation similar to Algorithm 1, i.e., the method based on static compressive sensing. Note also that each time we add a compressive sample, we use the existing estimation results to perform domain regularization.
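The greedy selection step can be sketched as follows. The entropy change for one added measurement r is −½ log(1 + β rᵀΣr), so the best candidate is the one with the largest rᵀΣr. The sketch makes three assumptions not in the text:

* Measurements are picked from a finite, hypothetical pool of candidate projection vectors.
* Σ is refreshed with a Sherman–Morrison rank-one update rather than a full re-inversion.
* The hyperparameters α and β stay fixed.

It illustrates the outer loop of Algorithm 2 only and is not the authors' code.

```python
import numpy as np

def entropy_change(Sigma, r, beta):
    """H_new - H = -0.5 * log(1 + beta * r^T Sigma r) for one added row r."""
    return -0.5 * np.log1p(beta * (r @ Sigma @ r))

def select_next_projection(Sigma, candidates):
    """Index of the candidate row r maximizing r^T Sigma r,
    i.e., the one giving the largest entropy reduction."""
    scores = np.einsum("ij,jk,ik->i", candidates, Sigma, candidates)
    return int(np.argmax(scores))

def add_measurement(Sigma, r, beta):
    """Sherman-Morrison update of Sigma after appending row r to A;
    equivalent to inverting (Sigma^{-1} + beta * r r^T)."""
    Sr = Sigma @ r
    return Sigma - beta * np.outer(Sr, Sr) / (1.0 + beta * (r @ Sr))

# Greedy outer loop on hypothetical data: prior Sigma = I (alpha_i = 1)
rng = np.random.default_rng(1)
N, beta = 8, 50.0
Sigma = np.eye(N)
pool = rng.standard_normal((20, N))   # hypothetical candidate projection vectors
chosen = []
for _ in range(3):
    k = select_next_projection(Sigma, pool)
    chosen.append(k)
    Sigma = add_measurement(Sigma, pool[k], beta)
```

If the candidate vectors are unit vectors e_i, so that each measurement observes a single gate directly (again an assumption), rᵀΣr reduces to Σ_ii. The rule then simply measures the gate with the largest current error bar.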

VII. RESULTS AND ANALYSIS
To validate the effectiveness of our proposed estimation methodology, we chose eleven benchmark logic circuits from the ISCAS85 suite, ranging in size from 6 to 3512 gates. We implemented both the static and adaptive versions of our estimation algorithm in MATLAB. All benchmark circuits, as well as the source code of our software implementation, can be downloaded at http://www.cs.ucf.edu/~mingjie/crit-proj.
In this paper, we focus on two aspects of the performance of our estimation approach: estimation accuracy and execution time. For estimation accuracy, we consider both value correctness and ordinal correctness. Formally, let Φ⁰ and Φ (both of size M × N) denote the precisely measured and the estimated modular criticalities, respectively. We then define the value correctness C_val by Equation (25). In most empirical studies, however, circuit designers are more interested in the relative ranking of modular criticality values among all logic gates than in their absolute values. We therefore define a new concept called ordinal correctness C_ord:

$$C_{\mathrm{ord}}(\delta) = \frac{\lvert A^{0}_{\delta} \cap A_{\delta}\rvert}{\lvert A^{0}_{\delta}\rvert}, \qquad (26)$$

where A⁰_δ and A_δ denote the sets of the δ% most critical logic gates according to the complete measurements and the estimated results, respectively, and |·| denotes the cardinality of a set.
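Ordinal correctness can be computed directly from the two criticality maps. The sketch assumes C_ord(δ) = |A⁰_δ ∩ A_δ| / |A⁰_δ|, our reading of the definition. It also assumes "δ% most critical" means the top round(δ% · gate count) entries, with ties broken by sort order. The function name is hypothetical.

```python
import numpy as np

def ordinal_correctness(phi_true, phi_est, delta_pct):
    """Fraction of the delta% most critical gates (by exhaustive measurement)
    that also appear among the delta% most critical gates of the estimate."""
    phi_true = np.ravel(phi_true)   # an M x N criticality map is flattened
    phi_est = np.ravel(phi_est)
    m = max(1, int(round(len(phi_true) * delta_pct / 100.0)))
    top_true = set(np.argsort(-phi_true)[:m].tolist())   # A^0_delta
    top_est = set(np.argsort(-phi_est)[:m].tolist())     # A_delta
    return len(top_true & top_est) / len(top_true)

# Usage (hypothetical values): ten gates, top 20% = two gates
exact = [5, 4, 3, 2, 1, 0, 0, 0, 0, 0]
estimate = [5, 3, 4, 0, 0, 0, 0, 0, 0, 1]
c_ord = ordinal_correctness(exact, estimate, 20)
```

Because only set membership matters, the estimate above can get the second-ranked gate wrong while every other gate keeps its exact value. This is why C_ord is easier to satisfy than a value-based metric.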
Figs. 9(a)-(f) illustrate the overall effectiveness of our estimation strategy based on compressive sensing. Fig. 9(a) displays the complete modular criticality measurements of circuit c7552.bench, obtained through extensive Monte Carlo simulations. To make the criticality differences among the measurements in Figs. 9(b)-(f) easier to discern, we normalize all modular criticality measurements and sort them in descending order. As can be clearly observed, once the number of compressive measurements K exceeds 10% of the total number of gates, both the value and ordinal correctness, C_val and C_ord, exceed 95%. As shown in Section VIII, we found that for most empirical applications an ordinal correctness above 90% is sufficient.

A. Static vs. Dynamic Compressive Sensing
As detailed in Sections V and VI, our proposed modular criticality estimation can be performed with either static or adaptive compressive sensing. In general, our adaptive compressive sensing strategy based on Bayesian learning is computationally more demanding: its total execution time is about 50% longer than that of the static version. However, the adaptive approach is much more informative than the static one, because it provides detailed "error bar" information on the numerical stability of a given estimate. More importantly, adaptive compressive sensing also helps the estimation algorithm choose the best compressive measurement locations, so that it converges faster and minimizes the variance of the results (see Section VI for details).
To identify the performance benefits of our adaptive methodology relative to the static one, we plotted the convergence history of both for the benchmark circuit c7552.bench. We used two performance metrics, the value correctness (C_val) and the ordinal correctness (C_ord), as defined in Equations (25) and (26). Figs. 10 and 11 both clearly show the superior performance of the adaptive approach: for both C_val and C_ord, it converges much faster than the static one. This difference in convergence rate is much more pronounced for C_val than for C_ord, and the reason is intuitive. For C_ord(δ) to be correct, all that matters is the identity of the gates with the top δ% modular criticality, whereas C_val depends on the absolute criticality values. In other words, a high C_val is harder to achieve than a high C_ord. Depending on the application, one of these two metrics may be preferred.

A careful study of Figs. 10 and 11 reveals another important distinction between the two methodologies: on average, the adaptive approach is much more numerically stable than the static approach. To demonstrate this difference, we ran 100 trials with different random seeds for the same problem configuration. We then computed the variances of C_val and C_ord as functions of the number of compressive measurements. Figs. 10 and 11 both show that, once K exceeds about 100, the variances of C_val and C_ord for the adaptive methodology quickly approach zero. In contrast, the results from the static methodology exhibit a very high variance in both correctness metrics.
For a more comprehensive comparison, Tables I and II list the test results for all eleven benchmark circuits. These two tables support several observations. First, in both cases, both estimation schemes work well whenever the number of compressive measurements exceeds 10% of the total gate count. Second, the larger the target circuit, the more accurate our estimation. This somewhat surprising result has a plausible explanation. As the target circuit grows, each gate's neighborhood of surrounding gates becomes much larger, so the gradient of change in modular criticality decreases. In other words, our compressive-sensing estimation has more prior smoothness to exploit. As discussed in Section III, an ample smoothness prior is key to the success of compressive sensing.

B. Execution Time Comparisons
The standard algorithm for reliability analysis under a gate failure model is based on fault injection and random-pattern simulation in a Monte Carlo framework. For each gate, a random number uniformly distributed over [0, 1] is generated and compared with the gate's failure probability to determine whether the gate has failed. The set of all failed gates thus constitutes one sample of failed gates. At the same time, we independently generate i.i.d. random input vectors. Unfortunately, such a Monte Carlo-based methodology is known to be quite time consuming. A logic circuit with N_in input ports has 2^{N_in} different input vectors. For any realistic value of N_in, this astronomically large number of input patterns renders exhaustive simulation infeasible. The problem is even more serious for estimating modular criticality values: in theory, measuring them exhaustively requires performing the reliability analysis N times for a logic circuit with N logic gates. A very large number of runs is therefore needed for the solution to converge, which makes the approach computationally infeasible for large-scale circuits.
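The fault-injection loop described above can be sketched for a tiny circuit. The sketch rests on three assumptions:

* The four-gate netlist and the names `NETLIST`, `simulate`, and `mc_error_probability` are hypothetical.
* A failed gate is modeled as inverting its output, a common gate-failure model that the text does not specify.
* An error is counted whenever any output bit differs from the fault-free evaluation.

```python
import numpy as np

# Hypothetical toy netlist in topological order: (output_net, gate_type, input_nets)
NETLIST = [
    ("n1", "NAND", ("a", "b")),
    ("n2", "NAND", ("b", "c")),
    ("o1", "NAND", ("n1", "n2")),
    ("o2", "NOR", ("n2", "c")),
]
INPUTS = ("a", "b", "c")
OUTPUTS = ("o1", "o2")
GATE_FUNCS = {
    "NAND": lambda u, v: 1 - (u & v),
    "NOR": lambda u, v: 1 - (u | v),
}

def simulate(inputs, failed=frozenset()):
    """Evaluate the netlist; every gate listed in `failed` inverts its output."""
    nets = dict(zip(INPUTS, inputs))
    for out, gate, ins in NETLIST:
        value = GATE_FUNCS[gate](*(nets[i] for i in ins))
        nets[out] = 1 - value if out in failed else value
    return tuple(nets[o] for o in OUTPUTS)

def mc_error_probability(eps, trials, rng):
    """Monte Carlo estimate of the probability that at least one output is
    wrong when each gate fails independently with probability eps."""
    errors = 0
    for _ in range(trials):
        # i.i.d. random input vector
        x = tuple(int(b) for b in rng.integers(0, 2, size=len(INPUTS)))
        # A gate fails when its uniform draw on [0, 1) falls below eps
        failed = {out for out, _, _ in NETLIST if rng.random() < eps}
        if simulate(x, failed) != simulate(x):
            errors += 1
    return errors / trials

p_err = mc_error_probability(0.1, 500, np.random.default_rng(0))
```

Each call to `mc_error_probability` is one full reliability analysis. Repeating it once per gate, as exhaustive criticality measurement requires, multiplies this cost by N.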
The main motivation of our compressive-sensing methodology is to significantly reduce the overall runtime needed to measure modular criticality accurately within a large-scale logic circuit. Consider a digital circuit G consisting of N logic gates, and let T denote the run time of one complete logic simulation of G. With exhaustive measurement of modular criticality values, the total run time is, in theory, T_tot,BL = N · T, which can be exceedingly long. With our compressive estimation algorithm, however, we need only K rounds of logic simulation to achieve a predetermined accuracy. The total run time is therefore T_tot,CS = K · T + T_cs, where T_cs denotes the run time of the compressive estimation itself. The speedup of our compressive estimation methodology is then

$$S = \frac{T_{\mathrm{tot,BL}}}{T_{\mathrm{tot,CS}}} = \frac{N \cdot T}{K \cdot T + T_{\mathrm{cs}}}. \qquad (27)$$

Because K ≪ N and T_cs ≪ T, it follows that S ≈ N/K. In our case, all these speedups are approximately 10. We performed both Monte Carlo-based and compressive-sensing experiments for all eleven benchmark circuits. For the Monte Carlo-based measurements of modular criticality, we use 1e-6 as the termination criterion for each simulation run. For all compressive-sensing experiments, we chose the sample size needed to reach 98% accuracy relative to the exhaustive measurements. Our results show that, on average, a sample size of about 10% of the total number of gates is sufficient. As shown in Fig. 12, all of the speedup values are close to 10. In particular, circuit c880 achieves the largest speedup, about 12, because it requires only approximately 8% of the circuit size to achieve 98% estimation accuracy.
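The speedup model S = N·T / (K·T + T_cs) can be evaluated directly. The helper below is a hypothetical illustration. The sample inputs, one time unit per logic simulation and a negligible T_cs, are assumptions chosen to mirror the 10% and 8% sampling ratios quoted above, not measured values.

```python
def speedup(N, K, T, T_cs):
    """S = (N * T) / (K * T + T_cs): exhaustive run time over compressive run time.

    N    : number of logic gates (exhaustive runs)
    K    : number of compressive measurements (simulation rounds)
    T    : run time of one complete logic simulation
    T_cs : run time of the compressive estimation itself
    """
    return (N * T) / (K * T + T_cs)

# Hypothetical inputs: 10% and 8% sampling ratios, negligible estimation time
s_10pct = speedup(1000, 100, 1.0, 0.0)   # approximately N / K = 10
s_8pct = speedup(1000, 80, 1.0, 0.0)     # approximately N / K = 12.5
```

A nonzero T_cs only lowers S below N/K. The approximation S ≈ N/K therefore holds as long as the estimation time stays small relative to the K logic simulations.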

VIII. ILLUSTRATIVE APPLICATIONS: DISCRIMINATIVE FORTIFICATION
To illustrate the value of knowing accurate modular criticality values, we now propose a novel system-level approach to error resilience, Discriminative Circuit Fortification (DCF). DCF robustly and efficiently preserves, with high probability, the expected performance and accurate results, despite unexpected faulty components at random locations. The key idea of DCF is to judiciously allocate hardware redundancy according to each component's criticality with respect to the device's overall target error resilience. This idea is motivated by two emerging trends in modern computing. First, while ASIC design and manufacturing costs are soaring with each new technology node, the computing power and logic capacity of modern CMOS devices steadily advance. Second, many newly emerging applications, such as data mining, market analysis, cognitive systems, and computational biology, are expected to dominate modern computing demands. Unlike conventional computing applications, they are often tolerant of imprecision and approximation. In other words, for these applications, computation results need not always be perfect as long as their accuracy is "acceptable" to human users.
We use the same eleven benchmark circuits. We show that, for the same amount of extra circuit resources, allocating them by the ranking of modular criticality achieves much larger reliability improvements (four to five times larger) than allocating them obliviously, i.e., without knowing modular criticality.
We study two error models for digital circuits. The first model assumes a constant failure probability for each gate, with all gates failing independently. Under this model, there is a nonzero probability that a large number of gates fail simultaneously. For an N-gate digital circuit in which each gate has a constant error probability e, the probability that exactly k gates are faulty is

$$P(k) = \binom{N}{k}\, e^{k} (1 - e)^{N-k}. \qquad (28)$$

For example, in a circuit with 10 000 gates in which each gate has a failure probability of 0.1%, the probability that exactly ten gates fail simultaneously is about 0.125. Such high failure rates are not unreasonable for emerging technologies such as carbon nanotube transistors or single-electron transistors. However, they are far beyond what has been predicted for future semiconductor technologies. Moreover, the independent gate failure model has an inherent disadvantage: the average number of gate failures and the fraction of clock cycles in which gate failures occur are dependent and cannot be varied independently. Limiting the maximum number of gate failures resolves this problem. We therefore consider a second error model, called constant-k.
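The binomial probability P(k) = C(N, k) eᵏ(1 − e)^(N−k) can be checked numerically. It is evaluated in log space via `math.lgamma`, because C(10000, k) and (1 − e)^(N−k) over- and underflow in ordinary floating point. The sketch shows that a probability of about 0.125 for exactly ten failures out of 10 000 gates corresponds to a per-gate failure probability of 0.1%, i.e., an expected count of ten failed gates. At 10% per gate, the same event is vanishingly unlikely. The function name is hypothetical.

```python
import math

def prob_k_faulty(N, k, e):
    """Binomial probability that exactly k of N independent gates fail,
    each with failure probability e, evaluated in log space."""
    log_p = (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
             + k * math.log(e) + (N - k) * math.log1p(-e))
    return math.exp(log_p)

p_ten_at_0_1pct = prob_k_faulty(10_000, 10, 0.001)   # about 0.125
p_ten_at_10pct = prob_k_faulty(10_000, 10, 0.1)      # underflows to ~0
```

With e = 0.001 the binomial distribution is close to a Poisson distribution with mean Ne = 10. That is why the ten-failure event is the most likely single outcome.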
We quantify the output error probability of a given digital circuit in two ways, called whole and fractional error probability. Assume that the target circuit G has m input ports and n output ports. Let x = (x_1, …, x_m) and y = (y_1, …, y_n) denote its input and output bit vectors, respectively. We write x_i^t and y_j^t for the ith input bit and the jth output bit in the tth Monte Carlo simulation, so that x^t and y^t are the corresponding bit vectors, with t ∈ [1, T], i ∈ [1, m], and j ∈ [1, n]. To calculate the overall output error probability, we apply a large set of randomly generated input bit vectors drawn according to the Monte Carlo principle. For each random input bit vector x^t, we evaluate the target circuit G twice. In the first evaluation, all gates are assumed to be perfect, and the resulting output vector is y^t. In the second evaluation, some logic gates are made faulty according to one of the two gate error models discussed above, and the resulting output vector is ŷ^t. To facilitate our discussion, we define two kinds of indicator variables:

$$e_j^t = \begin{cases} 1, & \hat{y}_j^t \neq y_j^t \\ 0, & \text{otherwise} \end{cases} \qquad \text{and} \qquad E^t = \begin{cases} 1, & \hat{\mathbf{y}}^t \neq \mathbf{y}^t \\ 0, & \text{otherwise.} \end{cases}$$

We then define the whole and fractional output error probabilities by the following two equations:

$$P_{\mathrm{whole}} = \frac{1}{T}\sum_{t=1}^{T} E^t, \qquad (30)$$

$$P_{\mathrm{frac}} = \frac{1}{T}\sum_{t=1}^{T} \frac{1}{n}\sum_{j=1}^{n} e_j^t. \qquad (31)$$
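Both error probabilities can be computed from the recorded fault-free and faulty output vectors. The whole error probability counts a simulation round as one error if any output bit differs. The fractional error probability averages the fraction of differing output bits per round. The function name and the (T, n) array layout are assumptions made for illustration.

```python
import numpy as np

def output_error_probabilities(golden, faulty):
    """golden, faulty: (T, n) arrays of fault-free outputs y^t and faulty
    outputs y_hat^t, one row per Monte Carlo round.
    Returns (P_whole, P_frac)."""
    golden = np.asarray(golden, dtype=bool)
    faulty = np.asarray(faulty, dtype=bool)
    bit_errors = golden != faulty                # per-bit indicators e_j^t
    p_whole = bit_errors.any(axis=1).mean()      # E^t: any differing bit counts as one error
    p_frac = bit_errors.mean(axis=1).mean()      # fraction of differing bits per round
    return float(p_whole), float(p_frac)

# Hypothetical record: 4 rounds, 4 output bits; round 0 has 1 wrong bit, round 1 has 4
golden = np.zeros((4, 4), dtype=int)
faulty = golden.copy()
faulty[0, 1] = 1
faulty[1, :] = 1
p_whole, p_frac = output_error_probabilities(golden, faulty)
```

Here P_whole is 0.5 while P_frac is only 0.3125, because the single-bit error in round 0 contributes just a quarter of an error to the fractional measure.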
Clearly, Equation (30) counts a round as one error even if only one bit differs between y^t and ŷ^t. Equation (31), by contrast, uses a fraction to account for the number of differing bits. Intuitively, the fractional error probability is more suitable for applications with inherent error resilience. For example, in multimedia applications such as video decoding, a small number of output bit errors is probably less critical than a large number. Our claim is that, for the same amount of extra circuit resources, allocating them by the ranking of modular criticality yields much larger reliability improvements (four to five times larger) than allocating them obliviously. To demonstrate this, we compare three ways of distributing the circuit fortification. Method A fortifies the 10% of logic gates with the highest modular criticality values. Method B fortifies the 10% with the lowest values. Method C fortifies 10% of all gates chosen at random. In Fig. 13, we plot the overall circuit reliability against the gate error probability. Not surprisingly, as the gate error probability δ increases, the overall reliability of c7552.bench decreases. However, for any fixed δ, Method A yields a significant improvement in overall circuit reliability over the baseline, whereas Method B yields very little improvement. Perhaps most interestingly, fortifying 10% of the gates chosen at random (Method C) yields improvements very similar to those of Method B.
This clearly shows that, without knowledge of modular criticality values, logic circuit fortification is much less effective. Table III presents the circuit reliability results with the error probability defined by Equation (30), assuming each logic gate fails independently with a fixed error probability δ. As expected, the overall circuit reliability decreases as δ increases, and fortification strategy A always performs best. Another important observation is that increasing the fortification ratio from 10% to 30% improves reliability steadily but with diminishing returns. Even when the fortification ratio is doubled from 10% to 20%, the improvement is, on average, less than 20% for most benchmark circuits.

As discussed above, we also investigated a different error model in which, instead of fixing each gate's error probability, we fix the number of faulty gates and choose their identities randomly. This fixed-k model clearly differs from the fixed-p model, in which the number of faulty gates varies probabilistically. Accurately modeling fault behavior in a digital circuit is rather difficult, so we hypothesize that the actual error behavior lies somewhere between the two models. Table IV presents the circuit reliability results under the fixed-k model, again using the error probability defined by Equation (30). Note that in the fixed-k model the errors of individual gates are no longer independent but correlated. Surprisingly, we observe reliability improvements very similar to those in Table III. The overall circuit reliability decreases as δ increases, and fortification strategy A always performs best. Increasing the fortification ratio from 10% to 30% again yields steady improvements with diminishing returns: doubling the ratio from 10% to 20% gives, on average, less than a 20% improvement for most benchmark circuits.

As mentioned above, a growing number of emerging applications, such as data mining, market analysis, cognitive systems, and computational biology, process massive amounts of data and attempt to model complex real-world systems. These applications have been found to possess a certain degree of inherent tolerance to imprecision and approximation. Their computation results need not always be bit-wise accurate, as long as their accuracy is acceptable to end users. We therefore consider the second way of quantifying the error probability of a digital circuit. This method, defined in Equation (31), accounts for the number of erroneous bits among all output bits. Under the first definition, in Equation (30), any difference between the output bit vector and the reference vector counts as a complete error. Under the second, we distinguish between rounds with many wrong bits and rounds with few. We repeated the numerical experiments above with this new definition of error probability. Tables V and VI present the results for the fixed-p and fixed-k models, respectively. In general, the reliability improvements in Tables V and VI are very similar to those in Tables III and IV, but the absolute reliability improvements are much smaller. This comparison is somewhat misleading, because the reduction in error probability is the more telling metric. To explain this phenomenon, we recorded the number of erroneous output bits in each logic simulation round and plotted its histogram in Fig. 14. As the figure shows, in the majority of simulation rounds fewer than 5 of the 108 output bits are in error. In other words, on average, the absolute value of the error probability is reduced by a factor of about 108/5 ≈ 22.

IX. RELATED WORK
Criticality analysis has been studied extensively for software [40], but it is quite rare in research on error-resilient computing devices. Criticality analysis (CA) provides relative measures of how significantly individual components affect the overall correctness of system operation. Only recently has it been investigated in digital circuit design. For example, [41] presents a novel approach to optimizing the yield of digital integrated circuits with regard to speed and area/power in aggressively scaled technologies. The technique reduces the effects of intra-die variations by applying redundancy only to critical parts of the circuit. In [42], we explored the idea of discriminatively fortifying a large H.264 circuit design on FPGA fabric. We recognized two points. First, different system components contribute differently to the overall correctness of a target application and should therefore be treated differently. Second, abundant error resilience exists inherently in many practical algorithms, such as signal processing, visual perception, and artificial learning. Such error resilience can be significantly improved with effective hardware support. However, in [42] we used Monte Carlo-based fault injection, so the resulting algorithm cannot be applied efficiently to large-scale circuits. Furthermore, our definition of modular criticality there was quite primitive.
More relevant to our study, [43] introduced a logic-level soft error mitigation methodology for combinational circuits. Its key idea is to exploit logic implications in a design and to selectively add pertinent functionally redundant wires to the circuit. The authors demonstrated that adding such wires reduces the probability that a Single-Event Transient (SET) error reaches a primary output and, by extension, reduces the Soft Error Rate (SER) of the circuit. These circuit techniques could readily be combined with our proposed criticality estimation method, especially for large-scale circuits. More importantly, [43] determines circuit criticality mostly by assessing the reduction in SET sensitization probability achieved by candidate functionally redundant wires. It then selects a subset that, when added to the design, minimizes its SER. Consequently, their criticality analysis is rather heuristic and relies largely on "local" information. In addition, it is not clear how this method scales to very large circuits.
Study [44] also targets hardening combinational circuits, but it focuses on designs mapped onto Xilinx Virtex FPGAs, hardening them against single-event upsets (SEUs). It does not perform a detailed criticality analysis. Instead, it uses the signal probabilities of the lines to detect SEU-sensitive subcircuits of a given combinational circuit. The components deemed sensitive are then hardened against SEUs by selectively applying triple modular redundancy (STMR) to these subcircuits. More recently, [45] presented a new methodology for automatically inserting selective TMR for SEU mitigation. Again, criticality was determined from empirical data. Because the overall method is cast as a multi-variable optimization problem, it is not clear how it scales with circuit size. It also offers little insight into which parts of the circuit are more critical than others, and by how much.
Unlike all these studies, we aim to rank the significance of each potential failure of each gate in the target VLSI circuit design, based on a failure rate and a severity ranking. The main focus of our study is to perform such criticality analysis accurately for very large-scale circuits. The key insight underlying our approach is that criticality analysis can be recast as a combined problem of uncertainty analysis and sensitivity analysis. Our sensitivity analysis on spatial models is based on computationally expensive Monte Carlo simulation. However, we take advantage of recent developments in stochastic signal processing and can therefore significantly reduce the overall measurement effort. Fundamentally, we believe our method of assessing the criticality of logic circuits, although complex at the algorithmic level, can be much more robust given the exponential number of inputs, input combinations, and correlations in gate failures.

X. CONCLUSIONS
Despite their elegance, stochastic signal processing techniques have rarely been applied to digital circuit design and analysis. This is largely because of the significant conceptual gap between continuous signal processing and discrete Boolean algebra. This paper presents our first attempt to bridge this gap by estimating the modular criticality inside a very large-scale digital circuit. In addition to achieving high accuracy with very limited measurements, we have shown how to choose sampling locations adaptively so as to obtain the desired results while minimizing overall effort.
Besides modular criticality, we hypothesize that many other important properties of digital circuits, such as signal path delay and logic observability, also possess a certain degree of inherent sparsity. If so, very efficient and effective methods may exist for extracting this information with significantly reduced computational effort. We are currently investigating other stochastic learning algorithms and identifying other problems in digital circuit design and analysis that are amenable to such stochastic and Bayesian treatment.

Fig. 9: (a) Accurate Φ measurements. (b)-(f) Compressive estimates for different numbers of compressive measurements. K: number of compressive measurements; N: total number of logic gates. All these results are obtained through dynamic compressive sensing.

Fig. 12: Execution time speedup of our proposed method over the conventional Monte Carlo-based method.

TABLE II: Estimation Accuracy vs. Number of Measurements for the Dynamic Estimation Scheme.