Gate Sizing Methodology with a Novel Accurate Metric to Improve Circuit Timing Performance under Process Variations

Abstract: The impact of process variations on circuit performance has become more critical with technology scaling and the increasing level of integration of integrated circuits. Degraded circuit performance translates into economic losses. In this paper, we propose an efficient statistical gate-sizing methodology for improving circuit speed in the presence of independent intra-die process variations. The methodology comprises a path selection method, a heuristic, two coarse selection metrics, and one fine selection metric. The fine metric combines essential concepts such as the derivative of the delay standard deviation, a path segment analysis, the criticality, the slack time, and the area. The methodology is applied to ISCAS benchmark circuits. On average, the delay improves by 12% and the delay standard deviation by 27.8%, the area increase is less than 5%, and the computing time is up to ten times lower than with analytical methods such as Lagrange multipliers.


Introduction
Continuous technology scaling and the increasing level of integration of nanometer circuits have made process variations a main concern in the design of integrated circuits [1,2]. Intra-die process variations, which can be spatially independent or correlated, are increasing in new technologies [2]. Random Dopant Fluctuations (RDF), body and gate line edge roughness, work function, and channel stress are, among others, the main causes of intra-die local variations in advanced technology nodes [3–6]. Process variations affect the performance parameters of circuits [1,7–14], including delay, power, noise, ageing, soft errors, and leakage, among others. Degraded circuit performance due to process variations reduces chip revenue [7].
Gate-sizing optimization techniques have been widely used to improve circuit performance. The optimization can be carried out with analytical methods such as Lagrange multipliers [15–21], or with heuristics and metrics [19,22–31]. Gate-sizing techniques based on heuristics and metrics consume less computing time than analytical methods such as Lagrange multipliers or geometric programming. In heuristic and metric-based methods, the gate selection metric and the optimization methodology determine the results of the optimization process. In [22], the objective is to minimize the leakage of the circuit. Two metrics are used: the first, based on the yield slack ($YS_i$), identifies the gates with more timing resources, and the second, $S_i = (\Delta L/\Delta TY_i)\,YS_i$, computes the timing yield and the leakage caused by changing the $V_{th}$ of gate $i$. The gates with the highest metric scores are selected for resizing. This methodology is quite accurate but computationally expensive. In [24], the objective is to maximize a profit function. One heuristic and two metrics are used. The first metric is based on the slack and preselects a set of critical gates, and the second selection metric, $S_i = \Delta(p\text{-percentile})/\Delta W$, measures the change in delay at the $p$-percentile after resizing gate $i$. The gates with the highest metric scores are resized. This methodology is accurate but complex and computationally expensive. In [25], the objective is to minimize the delay ($\mu + \sigma$) of the circuit. The twenty percent of paths with the highest $\mu + \sigma$ are selected; then, a recursive formula is used to increase and decrease the size of the gates until the delay converges to an acceptable value. In this case, the resizing metrics need more elements to select the gates that benefit delay optimization while taking care of the area.
In [27], the objective is to minimize $\sigma^2$. A heuristic, a metric, and a cost function are used. The critical paths are selected using metrics based on statistical slack. The cost function ($\mathrm{Cost}_i = \mu_i + \lambda\sigma_i$) is used to optimize $\mu$ and $\sigma$ of the critical gates; its $\lambda$ factor places more emphasis on optimizing the standard deviation of the delay. This methodology considers the fan-in and fan-out of gate $i$. It achieves good results in the percentage optimization of the delay standard deviation $\sigma$, but delay and area are not its main concern. In [29], one heuristic and two metrics are used to optimize the timing yield of the circuit. The first metric is based on the concept of criticality and selects the most critical gates, which reduces the number of gates analyzed by the second metric. The second metric is computationally more expensive, as it is based on the effective yield gradient ($EYG_i$), and it considers the fan-in and fan-out cones of the analyzed gate $i$. This methodology is accurate but costly in computing time. In [31], a heuristic and a metric are used to optimize the timing yield of the circuit. The metric is the adjacent criticality, which takes into account the criticality of gate $i$ and the criticality of its fan-out gates. This method is accurate but costly in computing time.
This paper proposes a new methodology that includes a critical path selection method, a heuristic, two coarse selection metrics to preselect critical gates, and a fine metric to select the final set of gates to resize. The fine metric includes important concepts such as the derivative of the delay standard deviation, the criticality, the slack time, and the area of the analyzed gate. The metric also incorporates the concept of the path segment and the variations in the input transition time. The methodology is applied to ISCAS benchmark circuits, and it offers more benefit in delay reduction at lower area cost and computing time. This work focuses on independent intra-die process variations in the transistor threshold voltage ($V_{th}$) [13,32–35]. Moreover, extending the results to other types of independent process variations is straightforward.
The rest of the paper is organized as follows. Section 2 presents the optimization methodology. Section 3 presents coarse strategies for pruning candidate gates. Section 4 presents the proposed accurate metric, named the fine metric, which selects the best candidate gates to improve circuit performance. Section 5 presents the heuristic sizing methodology. Section 6 presents simulation results on ISCAS benchmark circuits and a comparison with previous works. Finally, Section 7 presents the conclusions of this work.

Optimization Methodology
Figure 1 shows a flow diagram of our proposal, which is oriented to optimizing circuit yield within a statistical framework. The first step is to read the circuit information. The second step is to obtain a set of critical paths using Deterministic Static Timing Analysis (DSTA) based on corner analysis. Then, the obtained set is pruned using Statistical Static Timing Analysis (SSTA). Next, candidate gates in the critical paths are selected using coarse strategies with a low computational cost. The first coarse strategy is based on a simple metric, and the second on the concept of gate criticality. Then, the fine selection metric is used to prune the set of candidate gates to be sized-up. Note that the more expensive but more accurate fine selection metric evaluates only the smaller set of gates left by the low-cost coarse pruning strategies. A sizing heuristic is applied to a subset of the ranked candidate gates obtained after the fine metric selection. Then, the circuit information is updated. If the area of the optimized circuit ($A_c$) is smaller than the area constraint ($A_t$), or the derivative of the mean delay is negative ($\partial\mu/\partial K < 0$), or the derivative of the delay standard deviation is negative ($\partial\sigma/\partial K < 0$), the process continues; otherwise, the result is the optimized circuit. The main constraint in our optimization methodology is the area constraint ($A_t$). The other two constraints prevent sizing-up the gates of the circuit when the benefit is limited or nonexistent. DSTA and SSTA are applied to the optimized circuit to obtain the final timing information.

Coarse Strategies for Pruning Candidate Gates
Two low-cost strategies are used for pruning candidate gates.

Coarse Selection Using a Simple Metric
The first coarse metric for pruning candidate gates is based on the nominal gate delay. This coarse metric avoids statistical evaluations and thus saves computing time. Using the alpha-power law model [36], the nominal delay of a logic inverter can be expressed as in Equation (1), where $C_L$ is the load capacitance, $V_{DD}$ is the supply voltage, $L$ is the transistor channel length, $T_{ox}$ is the gate oxide thickness, $\mu$ is the carrier mobility in the transistor channel, $\varepsilon_{ox}$ is the dielectric constant, $V_{th}$ is the transistor threshold voltage, and $\alpha$ is a constant that depends on the technology. The transistor width is $W = K W_{min}$, where $W_{min}$ is the transistor channel width of a minimum-sized inverter and $K$ is a scaling factor of the transistor size. Differentiating the delay ($d$) in Equation (1) with respect to $K$ gives the sensitivity of the inverter delay to small changes in the inverter size ($S_{D,W}$). For a given technology, this delay sensitivity is given by Equation (2). From Equation (2), the impact of a small change of $K$ on the nominal inverter delay depends only on $C_L$ and the inverter size $K$. Thus, Equation (2) can be used as an initial coarse metric for pruning the set of candidate gates. Note that only those logic gates with small delay sensitivities are discarded.
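The coarse pruning step can be illustrated with a minimal sketch. It assumes that, with all technology parameters lumped into a constant, the nominal delay behaves as $d \propto C_L/K$, so that the sensitivity magnitude is proportional to $C_L/K^2$; this matches the statement that the sensitivity depends only on $C_L$ and $K$, but it is not a reproduction of Equation (2). The `Gate` class, the function names, and the `keep_fraction` knob are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    c_load: float  # load capacitance C_L (arbitrary units)
    k: float       # size factor K, with W = K * W_min


def coarse_sensitivity(gate: Gate) -> float:
    """Magnitude of d(delay)/dK up to a technology constant: C_L / K^2 (assumed form)."""
    return gate.c_load / gate.k ** 2


def prune_by_sensitivity(gates, keep_fraction=0.5):
    """Keep the gates with the largest sensitivity magnitude.

    keep_fraction is a hypothetical knob; the text does not state a threshold.
    """
    ranked = sorted(gates, key=coarse_sensitivity, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


gates = [
    Gate("g1", c_load=4.0, k=1.0),
    Gate("g2", c_load=4.0, k=4.0),
    Gate("g3", c_load=1.0, k=1.0),
    Gate("g4", c_load=8.0, k=2.0),
]
kept = prune_by_sensitivity(gates, keep_fraction=0.5)
print([g.name for g in kept])
```

Gates that are already large or lightly loaded score low and are discarded first, which is the behavior the coarse metric is meant to capture.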

Coarse Selection Using the Gate Criticality
The gate criticality ($N_i$) is the number of critical paths that pass through a gate [17,31,37]. A gate with high criticality is shared by a significant number of critical paths.
The gate criticality is usually used in metrics to select the best candidate gates [17,31,37], and we use it in the same way, as shown later on. In addition, this work proposes using the gate criticality for coarsely pruning candidate gates. Gates with a low gate criticality are removed from the set of candidate gates.
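As an illustration, the gate criticality can be computed by counting how many critical paths contain each gate. The sketch below assumes that each path is given as a list of gate names; the function names and the `min_criticality` threshold are hypothetical, since the text does not state a cut-off value.

```python
from collections import Counter


def gate_criticality(critical_paths):
    """Return N_i for every gate: the number of critical paths that contain it."""
    counts = Counter()
    for path in critical_paths:
        counts.update(set(path))  # a gate is counted once per path
    return counts


def prune_by_criticality(candidates, criticality, min_criticality=2):
    """Drop candidate gates whose criticality is below the (hypothetical) threshold."""
    return [g for g in candidates if criticality[g] >= min_criticality]


paths = [["a", "b", "c"], ["a", "b", "d"], ["e", "b", "d"]]
crit = gate_criticality(paths)
kept = prune_by_criticality(["a", "b", "c", "d", "e"], crit)
print(dict(crit), kept)
```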

Metric Fundamentals
The proposed fine metric is based on the evaluation of a path segment and modelling the input transition time as a normal distribution.

Path Segment Evaluation
For simplicity, a 5-inverter chain with different load capacitances (see Figure 2) is used to analyze the path segment behavior. This circuit allows analyzing the relative impact on the delay standard deviation of each gate when one gate is sized-up. A path segment of a logic path is composed of (a) the gate to size-up, (b) the preceding gate, and (c) the gate driven by the sized-up gate. For instance, the path segment when gate $G_i$ is sized-up (see Figure 2) is composed of gates $G_{i-1}$, $G_i$, and $G_{i+1}$. Figure 3a shows the behavior of the delay standard deviation of each gate in the 5-inverter chain as gate $G_i$ is sized-up. In addition, Figure 3b shows the relative impact on the delay variance of each gate as gate $G_i$ is sized-up by an amount $\Delta K_i$. Figure 3a,b were obtained with SPICE. The following occurs when gate $G_i$ is sized-up:

- The delay standard deviation of gate $G_i$ decreases because $\sigma_{V_{th}} \propto 1/\sqrt{WL}$, as indicated in [38].
- At the same time, the output driving current ($I_{ds} \propto W/L$) of gate $G_i$ increases, leading to faster output transitions and smaller output variations [39,40]. Consequently, the delay standard deviation of gate $G_{i+1}$ decreases.
- The load capacitance of the preceding gate $G_{i-1}$ increases and, as a consequence, the delay variance of gate $G_{i-1}$ increases.
In addition, sizing-up inverter $G_i$ does not cause significant changes in the delay standard deviation of gates $G_{i-2}$ and $G_{i+2}$. Similar behavior has been found for other gate sizes and loading conditions [29,31].
This behavior is consistent with Figure 3b, which shows the relative impact on the delay variance of each gate as gate $G_i$ is sized-up by an amount $\Delta K_i$. In Figure 3b, it can also be observed that the sum of the changes in the delay variances of the individual gates in the path segment (Sum) is lower than the change in the delay variance of the path segment (PS). The difference between Sum and PS is due to the impact of the input transition time, as explained next.
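The first effect in the list above, $\sigma_{V_{th}} \propto 1/\sqrt{WL}$, can be illustrated numerically. The sketch assumes that sizing scales only the width ($W = K W_{min}$) while $L$ stays fixed; the 30 mV value for the minimum-sized device is a hypothetical number chosen for illustration, not a figure from the paper.

```python
import math


def sigma_vth(k, sigma_min=30e-3):
    """sigma_Vth (in volts) of a gate sized K times minimum, with fixed L.

    sigma_min is the value for K = 1 and is a hypothetical placeholder.
    """
    return sigma_min / math.sqrt(k)


for k in (1, 2, 4):
    print(f"K={k}: sigma_Vth = {sigma_vth(k) * 1e3:.1f} mV")
```

Quadrupling the gate size halves $\sigma_{V_{th}}$, so the benefit per unit of added area shrinks as the gate grows.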

Modelling the Input Transition Time as a Normal Distribution
The second important issue considered in the proposed fine metric is that the variations in the input transition time [39–45] are modelled with a normal distribution. As a first consequence, the delay distribution ($D$) of a gate $G_i$ depends on both the normal distribution due to the independent variations in the transistor threshold voltage at gate $G_i$ and the variations in its input transition time ($D = D_{V_{th}} + D_{s_{in}}$). As a second consequence, a covariance appears between the delays of two consecutive gates (e.g., $\mathrm{Cov}(D_i, D_{i+1})$). The correlation for this delay covariance is almost one, as the output transition time of a gate is the input transition time of the next gate [41–43].

Derivation of the Basic Fine Metric
The SPICE simulations (see Figure 3b) clearly show that the change in the delay variance of the path segment (PS) closely tracks the change in the delay variance of the entire logic path (Path) when a gate is sized-up. Next, the fine metric is obtained. Let us first express the gate delay variance and the variance of the output transition time in terms of delay sensitivities to small changes in the $V_{th}$ of the transistors and in the gate input transition time ($s_{in}$), as in Equations (3) and (4), where $S^{D_i}_{V_{th_i}}$ is the delay sensitivity of gate $i$ to variations of its $V_{th}$, $\sigma_{V_{th_i}}$ is the variation of the transistor threshold voltage at gate $i$ due to the manufacturing process, $S^{D_i}_{s_{in_i}}$ is the delay sensitivity of gate $i$ to variations in its input transition time, and $\sigma^2_{D_i,s_{in}}$ is the variation of the input transition time at gate $i$. $S^{s_{out_i}}_{V_{th_i}}$ is the sensitivity of the output transition time to changes in the $V_{th}$ of gate $i$.
The term $\sigma^2_{D_i,s_{in}}$ depends on the previous gates ($\sigma^2_{D_i,s_{in}} = \sigma^2_{D_{i-1},s_{out}}$). Then, using (3) and (4), the standard deviation of the delay of gate $i$ can be expressed as in Equation (5). Expressing the covariance between two consecutive gates in terms of the delay sensitivities first requires the delay distributions of the two gates. The delay distribution of gate $i-1$ is given by Equation (6), and the delay distribution of gate $i$ is expressed in a similar way in Equation (7). Since the variations in $V_{th}$ are independent intra-die process variations, the covariance between the delay distributions in (6) and (7) is given by Equation (8). Equations (5) and (8) are used next. The change in the delay variance of a logic path when gate $i$ is sized-up by a small increment $\Delta K$ can be obtained by differentiating the delay variance of the logic path with respect to $K_i$ (Equation (9)). Equations (5) and (8) can be substituted into each corresponding term on the right side of (9). The first term in Equation (9) is zero, as it does not depend on a change in $K_i$. The second, third, and fourth terms are different from zero because they are affected by the resizing of the central gate, as explained before. The fifth and sixth terms deserve a particular analysis.

Analysis of the fifth term in Equation (9)
Let us first analyze the fifth term of Equation (9). The delay variance of gate $i+2$ can be obtained using (5). The change in the delay variance of gate $i+2$ when gate $i$ is sized-up by a small increment $\Delta K$ can be expressed by Equation (10). The first term on the right side in Equation (10) [...] do change due to variations in $K_i$, but the variation of the product of the squared sensitivities is small enough not to be considered. As a result, the fifth term of Equation (9) can be neglected.

Analysis of the sixth term in Equation (9)
Let us now analyze the sixth term in Equation (9). The sixth term is composed of four covariance terms, one for each pair of consecutive gates: $\partial\mathrm{Cov}(D_{i-2}, D_{i-1})/\partial K_i$, $\partial\mathrm{Cov}(D_{i-1}, D_i)/\partial K_i$, $\partial\mathrm{Cov}(D_i, D_{i+1})/\partial K_i$, and $\partial\mathrm{Cov}(D_{i+1}, D_{i+2})/\partial K_i$. Using Equation (8), the term $\partial\mathrm{Cov}(D_{i-2}, D_{i-1})/\partial K_i$ can be expressed by Equation (11). This term can be neglected because the sensitivities $S$ in it do not depend on the variations in $K_i$.
Using Equation (8), the term $\partial\mathrm{Cov}(D_{i+1}, D_{i+2})/\partial K_i$ can be expressed by Equation (12). This term can be neglected because the sensitivities $S$ in it do not change due to variations in $K_i$.
The terms $\partial\mathrm{Cov}(D_{i-1}, D_i)/\partial K_i$ and $\partial\mathrm{Cov}(D_i, D_{i+1})/\partial K_i$ have a strong dependence on the sized-up gate and, hence, should not be neglected.
Based on the previous analysis, the change in the delay variance of a logic path when gate $i$ is sized-up can be approximated by the change in the delay variance of the path segment, as expressed in Equation (13).
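A form of this approximation that is consistent with the terms retained in the analysis above can be sketched as follows. It keeps the three gate variances of the path segment and the two adjacent covariances; the factor 2 on the covariance terms is an assumption that follows from the variance of a sum of random variables, and the grouping used in the paper may differ.

```latex
\frac{\partial \sigma^{2}_{Path}}{\partial K_i}
\;\approx\;
\frac{\partial \sigma^{2}_{PS}}{\partial K_i}
=
\frac{\partial \sigma^{2}_{D_{i-1}}}{\partial K_i}
+ \frac{\partial \sigma^{2}_{D_{i}}}{\partial K_i}
+ \frac{\partial \sigma^{2}_{D_{i+1}}}{\partial K_i}
+ 2\,\frac{\partial\,\mathrm{Cov}(D_{i-1}, D_{i})}{\partial K_i}
+ 2\,\frac{\partial\,\mathrm{Cov}(D_{i}, D_{i+1})}{\partial K_i}
```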

Basic Fine Metric
The covariance between adjacent gates can be expressed in terms of the product of their standard deviations and their correlation ($\rho \approx 1$ between adjacent gates [41,45]), as in Equation (14). The proposed metric is based on Equation (13). Substituting Equation (14) into Equation (13), carrying out the operations, and reordering the terms gives Equation (15), which represents the variations in the path segment due to size changes at gate $G_i$ [45].
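The substitution described above can be checked numerically. The sketch below writes the path segment variance with $\mathrm{Cov} = \rho\,\sigma_a\sigma_b$ for adjacent gates only and differentiates it with respect to $K_i$; it is an illustration under the stated $\rho \approx 1$ assumption, not a reproduction of Equation (15). The function names and the toy $\sigma(K)$ models are hypothetical, chosen only to mimic the trends described earlier (the preceding gate gets worse, the sized gate and the driven gate get better).

```python
import math


def segment_variance(s_prev, s_i, s_next, rho=1.0):
    """Variance of D_{i-1} + D_i + D_{i+1}, correlating adjacent gates only."""
    return (s_prev ** 2 + s_i ** 2 + s_next ** 2
            + 2.0 * rho * (s_prev * s_i + s_i * s_next))


def segment_variance_derivative(s, ds, rho=1.0):
    """d(segment variance)/dK_i from the sigmas s and their derivatives ds."""
    (sp, si, sn), (dp, di, dn) = s, ds
    return 2.0 * (sp * dp + si * di + sn * dn
                  + rho * (dp * si + sp * di + di * sn + si * dn))


# Hypothetical toy models of the three standard deviations versus K_i.
def sigmas(k):
    return (1.0 + 0.2 * k, 2.0 / math.sqrt(k), 1.5 + 0.5 / k)


def dsigmas(k):
    return (0.2, -1.0 * k ** -1.5, -0.5 / k ** 2)


k = 2.0
analytic = segment_variance_derivative(sigmas(k), dsigmas(k))
h = 1e-6  # central finite difference as an independent check
numeric = (segment_variance(*sigmas(k + h))
           - segment_variance(*sigmas(k - h))) / (2.0 * h)
print(analytic, numeric)
```

A negative value means that sizing-up the central gate reduces the variance of the segment; with these toy models the gains at gates $i$ and $i+1$ outweigh the penalty at gate $i-1$.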

Including Area, Gate Criticality, and Slack Time
The relative area cost of the gates is also considered. For instance, increasing the size of an inverter by a small increment $\Delta K_i$ has a different area cost than increasing a 3-input NAND gate by the same amount $\Delta K_i$. The gate area is computed as $A_i = A_{min} K_i$, where $A_{min}$ is the gate area of a minimum-sized symmetrical inverter allowed by the technology. In addition, gates with higher criticality are preferred.
When gates with high criticality are resized, all the gates in their fan-out cone improve their delay standard deviation at the same time, and the area increases only in the critical gates, as in [17,31,37]. A gate belonging to a path with higher slack time is a better candidate gate [22,46]. The final fine selection metric ($M_i$) extends the basic fine metric with the area cost, the gate criticality, and the slack time.

Sizing Heuristic
The sizing algorithm is composed of two parts. The first part evaluates the metrics, and the second part sizes up the candidate gates selected by the metrics. A set of critical paths (set2) is the input to the metrics. At the beginning of the process, the two coarse selection metrics are applied. Then, the fine selection metric ($M_i$) is applied to the remaining gates. The gates are ranked according to their metric score, and one-quarter of the gates ($n$) selected by the fine metric form the final set of candidate gates for resizing (set($g_c$, $M_c$)). In the second part of the algorithm, the gates are sized-up in proportion to their metric value ($\Delta K = \mathrm{step} \cdot M_c[i]/M_{cmax}$), where step is the maximum size change that a gate can take in an iteration and $M_{cmax}$ is the highest metric score. The gate sizes obtained in the optimization process are adjusted to comply with the design rules of the technology. The maximum size of a gate is restricted to ten times its original size. The timing information and load capacitances are then updated, and the process repeats until the restrictions are fulfilled.
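One iteration of the second part of the algorithm can be sketched as follows. The sketch assumes that the metric scores are positive and that a higher score is better; the function name, the dictionary-based interface, and the default `step` are hypothetical. The adjustment of sizes to the design rules of the technology is omitted.

```python
def sizing_step(scores, sizes, original_sizes, step=0.5, max_ratio=10.0):
    """Size up the top quarter of the ranked gates in proportion to their score.

    scores: {gate: M_i}, sizes: {gate: current K}, original_sizes: {gate: initial K}.
    Returns a new {gate: K} dictionary; the inputs are not modified.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = ranked[:max(1, len(ranked) // 4)]  # one-quarter of the ranked gates
    new_sizes = dict(sizes)
    if not chosen or scores[chosen[0]] <= 0.0:
        return new_sizes  # nothing worth sizing under the positive-score assumption
    m_max = scores[chosen[0]]
    for gate in chosen:
        delta_k = max(0.0, step * scores[gate] / m_max)
        # A gate may grow to at most max_ratio times its original size.
        new_sizes[gate] = min(sizes[gate] + delta_k,
                              max_ratio * original_sizes[gate])
    return new_sizes


scores = {"a": 8.0, "b": 4.0, "c": 2.0, "d": 1.0,
          "e": 0.5, "f": 0.4, "g": 0.2, "h": 0.1}
sizes = {g: 1.0 for g in scores}
new_sizes = sizing_step(scores, sizes, original_sizes=sizes)
print(new_sizes)
```

With eight ranked gates, only the two best candidates are resized: the top gate receives the full step and the second one half of it.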
The main target of our optimization process is to minimize the standard deviation of the delay ($\sigma$) under an area restriction. However, during the optimization process, both the standard deviation and the mean delay ($\mu$) decrease as the gates are sized-up. Figure 4a shows the change in the mean and standard deviation of the delay when resizing gate $G_i$ in the logic path shown in Figure 2. A reduction in $\sigma$ ($\mu$) as gate $G_i$ is sized-up means that $d\sigma/dK$ ($d\mu/dK$) is smaller than zero (see Figure 4b). Hence, in Algorithm 1, the optimization process ends when the area restriction is reached or when the derivative of $\sigma$ ($\mu$) is greater than zero. This ensures that none of the area, delay, or $\sigma$ variables deteriorates at the expense of the others.
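The termination rule described in this paragraph can be sketched with discrete differences between consecutive iterations standing in for the derivatives; that substitution, the function name, and the data layout are assumptions of this sketch.

```python
def last_accepted_iteration(history, area_limit):
    """Return the index of the last iteration that satisfies the stop rule.

    history: list of (area, mu, sigma) tuples, entry 0 being the initial circuit.
    The run stops when the area exceeds area_limit or when mu or sigma increases
    with respect to the previous iteration (discrete stand-in for a derivative > 0).
    """
    last = 0
    for i in range(1, len(history)):
        area, mu, sigma = history[i]
        _, mu_prev, sigma_prev = history[i - 1]
        if area > area_limit or mu > mu_prev or sigma > sigma_prev:
            break
        last = i
    return last


history = [
    (100.0, 10.00, 1.00),
    (102.0, 9.50, 0.90),
    (104.0, 9.20, 0.85),
    (106.0, 9.25, 0.84),  # mean delay starts to increase here
    (108.0, 9.00, 0.80),
]
print(last_accepted_iteration(history, area_limit=105.0))
print(last_accepted_iteration(history, area_limit=103.0))
```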

Simulation Results on the ISCAS Benchmark Circuits
An in-house tool implementing the proposed flow in Figure 1 was developed. The algorithms were written in C++. The effectiveness of our proposal has been validated on ISCAS 85 benchmark circuits implemented in a 65-nm technology. The layouts of the minimum-sized benchmark circuits were obtained using the Mentor Graphics suite of synthesis and layout tools. Equation (15) is computed by obtaining the delay sensitivities of each gate as a function of gate size, load capacitance, and input transition time. The gate delay sensitivities are obtained with SPICE, and MATLAB is used to fit a polynomial to each set of sensitivity data. Polynomial expressions of the gate delay, output transition time, and delay sensitivities to changes in $V_{th}$ and $s_{in}$ are obtained. Note that this process is carried out only once for the entire digital library of a given technology.
[...] Figure 2). For each area combination of the gates in the path, the standard deviation of the path delay is measured. The obtained standard deviation of the path delay for each simulated area is plotted as a circle in Figure 5. On the other hand, the metric is applied repeatedly to the circuit in Figure 2, resizing the gate with the highest metric value in each iteration. For each iteration, the area and the standard deviation of the path delay are measured. The path area and its respective standard deviation of the path delay are plotted on the blue line in Figure 5. The solid line corresponds to the optimized path with the lowest area cost. A close agreement between the results obtained with the fine metric and SPICE is observed.

Benefit of the Low-Cost Pruning Strategies
The benefit of using the low-cost pruning strategies is shown in Figures 6 and 7. Figure 6 compares the optimization results obtained with the simple coarse metric against those obtained with only the fine metric selection (without the simple coarse metric). The percentage of optimization in $\sigma$, $\mu + 3\sigma$, and mean delay $\mu$ using the simple coarse metric follows the results obtained without it. A small decrease in the optimization results appears in some cases, but it is not significant. Moreover, a reduction in computing time can be observed when the simple coarse selection metric is used (see Figure 6d).
Figure 7 compares the optimization results obtained with the criticality coarse metric against those obtained with only the fine metric selection (without the criticality coarse metric). The percentage of optimization in $\sigma$, $\mu + 3\sigma$, and $\mu$ using the criticality coarse metric closely follows the results obtained without it, but computing time is saved when the criticality coarse metric is used (see Figure 7d).
Table 1 shows a step-by-step example of Algorithm 1 for some ISCAS benchmark circuits. First, the timing information of the circuits before the optimization is given. Table 1 also shows the benefit of path pruning using DSTA and SSTA. It can also be observed that the coarse metrics reduce the number of gates analyzed by the fine metric. The reduction in the number of gates depends on the circuit topology. The results of the optimization process are given at the right end of Table 1. The circuit data are updated after the optimization. Using DSTA and SSTA, the $\mu$ and $\sigma$ of the longest critical path of the circuit are obtained.
Table 2 shows the optimization results for an area constraint of 5%. In addition, optimization results for the complete derivative of the delay variance of the logic path (DVP) and for the Lagrange method (L) are presented. The Lagrange methodology optimizes the delay standard deviation subject to the area restriction, and the Lagrangian is solved using a gradient method. After optimization, the percentage optimization in delay standard deviation, delay, and area obtained with our proposal closely follows the results obtained with the full derivative (DVP) of the logic path (see Table 2). In addition, our results approach those obtained with the Lagrange method, while our proposal saves a significant amount of computing time.

Comparison with Previous Works
Our proposal was compared with results from other authors. Because their papers present algorithms and implementations with specific strategies, this comparison is intended to put the benefits of our proposal in perspective with other works.
The results of our proposal have been compared against those in [27]. That work uses the delay variance as the objective function and selects gates with a cost metric function that maximizes the reduction in $\sigma$. In [27] (see Table 3), the change in $\mu$ is positive. In our proposal, the change in $\mu$ is negative in all cases, which means that the delay always decreases after optimization. The reduction of the delay standard deviation in our proposal is slightly lower than in [27]. The increase in area is considerably smaller in our proposal. Finally, our proposal has a lower cost in computing time.
The results of our proposal have also been compared against those in [25], where the objective of the methodology is to minimize $\mu + \sigma$ using a heuristic and a metric. The average percentage of optimization in the mean delay is 36.9% in [25] and 11.12% with our proposal. The average percentage of optimization in the variability ($\sigma/\mu$) is 19.8% in [25] and 19.04% with our proposal. The area increase is 50.88% in [25] and only 3.05% with our proposal. Our proposal therefore offers a good trade-off between the optimization benefit and the area penalty.

Conclusions
A statistical design methodology for circuit timing optimization has been proposed. A method to select critical paths is presented. The proposed methodology uses a heuristic, two low-cost coarse selection metrics, and a fine metric for selecting the best candidate gates to size-up. The coarse selection metrics reduce the computing time. The basic fine metric selects the gates that provide the highest benefit in the reduction of the delay standard deviation at the lowest area cost. In addition, the criticality and the slack time are considered in the final fine metric. The use of path segment evaluation in the fine metric saves computing time. The proposed statistical design methodology has been validated on ISCAS benchmark circuits. The average percentage optimization in the delay standard deviation ($\Delta\sigma$) is 27.8%, the average percentage optimization in the delay ($\Delta(\mu + 3\sigma)$) is 12%, and the computing time is up to ten times lower than with Lagrange optimization methods. It should also be noted that the optimization results of the proposal are close to those obtained with the Lagrange optimization method. The proposed statistical sizing methodology is suitable for modern complex circuits.