A Learning-Based Framework for Circuit Path Level NBTI Degradation Prediction

Abstract: Negative bias temperature instability (NBTI) has become one of the major causes of temporal reliability degradation in nanoscale circuits. Because of its complex dependence on operating conditions, it poses a tremendous challenge to the existing timing analysis flow. To obtain the accurate aged delay of a circuit, previous research has mainly focused on the gate level or lower. This paper proposes, for the first time, a low-runtime and high-accuracy machine learning framework at the circuit path level, which can be formulated as a multi-input-multioutput problem and solved using a linear regression model. A large number of worst-case path candidates from the ISCAS'85, ISCAS'89, and ITC'99 benchmarks were used for training and inference in the experiment. The results show that the proposed approach achieves significant runtime speed-up with minimal loss of accuracy.


Introduction
The aggressive, uninterrupted scaling of CMOS and fin field-effect transistor (FinFET) technologies to the nanoscale level leads to various side effects such as process parameter variability and aging [1]. Fabrication-induced geometric and electrical parameter variations, e.g., changes in device effective channel length and threshold voltage, have introduced large-scale variability of circuit performance. Meanwhile, runtime aging effects, such as electromigration, thermal cycling, and negative bias temperature instability (NBTI), have become another serious concern in nanoscale integrated circuit design [2].
NBTI is known to be the most critical reliability issue that can affect circuit lifetime [3,4]. NBTI occurs when a pMOS device is under negative bias conditions ($V_{gs} = -V_{dd}$), especially at high temperatures. Due to NBTI, the threshold voltage ($|V_{th}|$) of the transistor increases with time, resulting in a reduction in drive current. This reduction in turn causes temporal degradation in circuit performance, reducing reliability over time, and may eventually cause the circuit to fail. Conversely, when the pMOS is off ($V_{gs} = 0$), $|V_{th}|$ gradually decreases until stress is applied again, and the pMOS degradation is partially relaxed. This condition is defined as the recovery phase of NBTI. Degradation caused by NBTI increases gradually over repeated stress and recovery cycles [5].
An early estimation of reliability is therefore necessary in the design phase and should be treated as one of the design parameters to ensure reliable circuit operation for a desired period of time. To facilitate the reliability analysis process, considerable effort has been put into estimating circuit performance degradation. Many researchers have analyzed these effects at the transistor and gate levels in order to enable the evaluation of larger circuits. Nevertheless, an approach that analyzes temporal degradation directly at the circuit level with low runtime and high accuracy has yet to emerge. The central objective of this paper is to propose a regression-based machine learning approach for predicting NBTI-induced degradation at this higher level, which is, to the best of the authors' knowledge, the first such work in the literature.
The rest of the article is organized as follows: Section 2 introduces the related research on NBTI degradation comprehensively from physical to circuit level. Section 3 introduces the general idea of the work. Section 4 describes in detail the proposed framework. Section 5 reports the experimental results. The paper is concluded in Section 6.

Literature Review on NBTI Degradation Research
The growing concern about device failure due to NBTI has prompted significant effort from the research community, including but not limited to the topics shown in Figure 1. For example, aging measurement methods [6][7][8][9], which also belong to the aging domain, do not appear in Figure 1.

Physical Level
Two prevalent physical mechanisms, reaction-diffusion (R-D) [10,11] and trapping-detrapping (T-D) [12], have been proposed in the literature to explain NBTI. Because T-D has not been fully proven, the R-D model is widely used to interpret the NBTI mechanism [13]. According to the R-D model, in a conventional pMOS, crystal mismatch at the Si-SiO$_2$ interface leaves traps in the form of Si dangling bonds after the growth of the gate oxide. Under negative bias conditions, silicon-hydrogen (Si-H) bonds break and the released H atoms diffuse away from the interface, generating positive interface traps, as shown in Figure 2a. As a result, the threshold voltage increases, and the pMOS transistor becomes slower and may fail to meet timing constraints. Once the stress is removed, the H near the Si-SiO$_2$ interface anneals the broken Si bonds, as shown in Figure 2b, where two interface traps disappear, leading to a partial recovery of the degradation. The amount of recovery is strongly related to the frequency, the duty cycle, the magnitude of the bias change, etc.

NBTI Analytical Model
The NBTI degradation models, which form the basis for higher-level research, were developed from the numerical solution of the standard reaction-diffusion (R-D) model.
In [14], the NBTI-related increase in the threshold voltage of a pMOS transistor in the continuous stress phase from $t_0$ to $t$, i.e., static NBTI, was evaluated using Equation (1), where $\Delta V_{th1}$ is the change in threshold voltage that the pMOS already exhibits at time $t_0$, $t_{ox}$ is the oxide thickness, and $C_{ox}$ is the gate capacitance per unit area. $E_0$ and $E_a$ are device-dependent constants, $A_{NBTI}$ is a technology-dependent constant, $k$ is the Boltzmann constant, and $T$ is the temperature in K. $\delta_v$ is a constant added to include the impact of oxide traps and other charge residues. In a realistic working circuit, the gate voltage of the pMOS changes periodically. When $V_{gs} = 0$, the pMOS transistor is in the recovery phase, and the threshold-voltage drift is partially recovered. Equation (3) shows the final change in the threshold voltage of a pMOS transistor [14], assuming the recovery starts at $t_0$, i.e., dynamic NBTI.
where $\eta$ is a constant equal to 0.35. In order to predict the long-term dynamic NBTI effect, an updated model was proposed [15], which includes the recovery effect and is useful for estimating degradation over years. In this model, the $V_{th}$ degradation after time $t$ has passed ($\Delta V_{th,t}$) is expressed as Equation (4).
where $T_{clk}$ is the clock period, and $\alpha$ is the stress probability of the pMOS. Here, stress probability is defined as the fraction of one clock period spent in the stress phase, i.e., (time of stress)/(time of stress + time of recovery). Stress probability is sometimes also called the duty cycle. $\beta_t$ is a parameter that depends on temperature, $T_{clk}$, $\alpha$, and $t$; its specific formula is detailed in [15]. The exponent $n$ is equal to 0.25. When the operating frequency is higher than 10 kHz, $\Delta V_{th,t}$ depends only weakly on $T_{clk}$ [15]. In this case, the relationship between $\Delta V_{th,t}$ and $\alpha$ can be formulated as Equation (5).
where parameter C is dependent on temperature. The specific formula of C is detailed in the literature [15].
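The definition of stress probability given above can be made concrete with a small sketch. The function names and the sampled-waveform representation are assumptions introduced for illustration only; they are not part of the model in [15].

```python
def stress_probability(stress_time, recovery_time):
    """Fraction of one clock period that the pMOS spends under stress."""
    period = stress_time + recovery_time
    if period <= 0:
        raise ValueError("clock period must be positive")
    return stress_time / period


def stress_probability_from_samples(vgs_samples, vdd, tol=1e-9):
    """Estimate alpha from uniformly spaced V_gs samples covering whole periods.

    A sample counts as stress when V_gs is approximately -V_dd (assumption:
    the gate only switches between 0 and -V_dd).
    """
    if not vgs_samples:
        raise ValueError("need at least one sample")
    stressed = sum(1 for v in vgs_samples if abs(v + vdd) <= tol)
    return stressed / len(vgs_samples)


# Example: 3 ns of stress and 1 ns of recovery in a 4 ns clock period.
alpha = stress_probability(3e-9, 1e-9)
alpha_sampled = stress_probability_from_samples([-1.1, -1.1, 0.0, 0.0], vdd=1.1)
```

With these definitions, a pMOS that is always on has a stress probability of 1 (static NBTI), and any value below 1 corresponds to a dynamic NBTI condition.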

MOSFET Model Reliability Analysis (MOSRA) Aging Model
To simulate the effect of NBTI, an efficient simulation analysis and an accurate aging model are required to translate the amount of electrical stress into device parameter degradation. There are three major industry-standard aging simulators, namely, Hewlett simulation program with integrated circuit emphasis (HSPICE) MOSRA from Synopsys, RelXpert from Cadence, and Eldo from Mentor Graphics [16]. In this work, we used MOSRA to model the circuit degradation behavior in the experiment.
MOSRA is the built-in aging model integrated into Synopsys HSPICE, which can be used to predict the long-term reliability and performance of a circuit. Recently, MOSRA has been deployed in many studies for reliability simulation and circuit design on the Synopsys HSPICE platform [17][18][19]. It should be noted that the MOSRA model itself is also constructed on the basis of a specific NBTI analytical model.
Here, it is useful to briefly review how degradation is calculated by the MOSRA model. MOSRA works in two phases, i.e., the pre- and post-stress simulation phases [20,21]. The pre-stress simulation phase is also known as the fresh simulation phase, in which the electrical stress of every user-selected metal oxide semiconductor field effect transistor (MOSFET) is computed by HSPICE on the basis of the circuit behavior and the built-in MOSRA models. Then, the post-stress simulation phase is executed, which simulates the effect of degradation on circuit performance on the basis of the results of the pre-stress simulation. Finally, circuit performance results, such as the circuit delay, are obtained.
A simplified procedure for HSPICE MOSRA is shown in Figure 3. When SIMmode = 0, only the pre-stress simulation is selected. In addition to some fresh results, a $\Delta V_{th}$ file with the suffix radeg0 is also generated. When SIMmode = 1, only the post-stress simulation is selected. In this phase, HSPICE uses the $\Delta V_{th}$ information in the radeg0 file and updates the pMOS device model for reliability analysis. It should be noted that SIMmode is 2 by default, in which case the pre- and post-stress simulations are executed sequentially. Even though HSPICE can automate the analysis process and is usually regarded as a gold standard for reliability analysis, the simulations are extremely time-consuming when applied to circuits of a certain scale. As for the MOSRA model, an intrinsic regressive calculation process is carried out during aging analysis to achieve high accuracy.

Gate Level
NBTI aging breaks the traditional two-dimensional assumption in modern-day static timing analysis (STA), introducing a high-dimensional correlation problem. For accurate STA under NBTI, one common method involves adding extra dimensions to the traditional two-dimensional (2D) look-up table (LUT). For example, in [22], the extracted delay information, such as gate delay and output transition time, was stored in (n + 4)-dimensional LUTs. The n + 4 dimensions are defined as (1) the $V_{th}$ shifts of the different transistors inside the cell (n transistors corresponding to n dimensions in the LUT), (2) input slope, (3) output load, (4) temperature, and (5) voltage. For dimensionality reduction, [23] proposed three-dimensional (3D) LUTs with three input variables: input slew, capacitive load, and NBTI stress probability.
As discussed above, aging adds new challenges to the existing timing analysis flow, as it complicates the simple variation model assumed by modern timing libraries. As a possible solution, learning-based timing characterization is under active research. The authors of [24] proposed a learning-based method for predicting the NBTI-induced delay degradation in large designs such as processors. The training design contains tens of thousands of training samples, and each sample is a gate delay associated with a particular set of predictor parameters. Although the work in [24] used a machine learning method, it essentially constructed an aging-aware 3D LUT for each cell as proposed in [23]; thus, we also classify it as an aging-aware LUT approach.
In general, each cell in the technology library file is characterized using accurate HSPICE simulations, with the threshold-voltage drift of each pMOS transistor calculated by the NBTI analytical model. This also applies to [24], because each training sample requires an HSPICE simulation to obtain the gate delay. Hence, these methods have a high library characterization overhead. In addition, even the traditional two-dimensional LUT yields inherently inaccurate estimates because of interpolation and extrapolation, and this problem becomes more serious with higher dimensionality.
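To illustrate what a lookup in an aging-aware 3D LUT involves, the following sketch performs trilinear interpolation over input slew, capacitive load, and stress probability. It is not the implementation of [22], [23], or [24]: the axis values, the table contents, and the function names are all hypothetical, and values outside the table are clamped to the table edges rather than extrapolated.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (lower index, weight of the upper point) for x on a sorted axis.

    Values outside the axis are clamped to its end points (no extrapolation).
    """
    if x <= axis[0]:
        return 0, 0.0
    if x >= axis[-1]:
        return len(axis) - 2, 1.0
    i = bisect_right(axis, x) - 1
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


def lookup_delay(lut, slews, loads, alphas, slew, load, alpha):
    """Trilinear interpolation in lut[i][j][k] over (slew, load, stress probability)."""
    i, wi = _bracket(slews, slew)
    j, wj = _bracket(loads, load)
    k, wk = _bracket(alphas, alpha)
    delay = 0.0
    # Weighted sum over the eight corners of the enclosing grid cell.
    for di, ci in ((0, 1 - wi), (1, wi)):
        for dj, cj in ((0, 1 - wj), (1, wj)):
            for dk, ck in ((0, 1 - wk), (1, wk)):
                delay += ci * cj * ck * lut[i + di][j + dj][k + dk]
    return delay


# Hypothetical axes and a hypothetical delay table (ps) for one cell.
slews = [0.01, 0.1]      # input slew (ns)
loads = [1.0, 4.0]       # capacitive load (fF)
alphas = [0.0, 1.0]      # NBTI stress probability
lut = [[[10 + 100 * s + 2 * c + 5 * a for a in alphas] for c in loads] for s in slews]

mid_cell_delay = lookup_delay(lut, slews, loads, alphas, 0.055, 2.5, 0.5)
```

Because the hypothetical table was filled from a function that is linear in each variable, the interpolation is exact in this example. For real cell delays, which are nonlinear in these variables, the result is only an approximation, and each added dimension multiplies the number of grid points that must be characterized.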

Path Level
The performance degradation result of the circuit can be obtained through integrated software, such as STA tools and in-house tools, which essentially encapsulate the lower level technology as discussed above. Once the netlist of the circuit is entered, an increase in path delay can be obtained. The specific implementation process is not described in detail here.
The above approaches and tools are either inaccurate or extremely time-consuming for long-range aging simulations. In searching for a low-runtime and high-accuracy solution for aging-aware circuit path delay estimation, we applied regression-based machine learning algorithms at the path level for the first time. Compared with the traditional methods, the solution proposed in this paper is directly oriented toward the circuit path and does not need to consider the implementation details of the gate level and below.

Main Idea
In order to study the aging characteristics of circuits and find the underlying regularities, extensive experiments were carried out on the ISCAS'85, ISCAS'89, and ITC'99 benchmark circuits. The MOSRA model was applied to a number of worst-case path candidates from c7552 within ISCAS'85 to obtain the aging-aware delay degradation over 10 years in time steps of 1 year at a temperature of 400 K. The HSPICE simulation results are shown in Figure 4. For each curve, 11 time points, including the fresh point at time = 0, are marked on the horizontal axis. The vertical axis represents the aged delay corresponding to each time point. The detailed experimental process can be found in Section 4. It can be observed that the path delay changed considerably in the first year, while the degradation slowed down in the subsequent years. In addition, the trends of different benchmark circuit paths were highly similar, which can be exploited by machine learning. The characterized path delays, collected from HSPICE simulations or real chip measurements, can be used as the training sample set. For a new path, inference can then be made on the basis of a small amount of known data, thereby reducing the cost of the entire path characterization.
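The observation that most of the change occurs in the first year is consistent with a power-law time dependence. The sketch below assumes that the threshold-voltage shift grows as $t^n$ with $n = 0.25$, the exponent quoted for the long-term model; the normalization, and the idea that the path delay follows the same trend, are illustrative assumptions and not results from the HSPICE data.

```python
N_EXPONENT = 0.25  # time exponent n quoted for the long-term NBTI model


def normalized_shift(t_years, n=N_EXPONENT):
    """Threshold-voltage shift relative to its value after 1 year (assumed power law)."""
    return t_years ** n


years = list(range(0, 11))                      # fresh point plus years 1..10
shift = [normalized_shift(t) for t in years]

# Increase of the shift during each individual year.
yearly_increase = [later - earlier for earlier, later in zip(shift, shift[1:])]

# Share of the 10-year shift that is already reached after the first year.
first_year_share = yearly_increase[0] / shift[-1]
```

Under this assumption, more than half of the 10-year shift is already present after the first year, and each subsequent year adds less than the one before it.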

Proposed Learning-Based Framework
An overview of the proposed framework is shown in Figure 5, where ML is the abbreviation of machine learning. Similar to the traditional machine learning architecture, the framework consists of two stages: training and inference. The training sample set contains two parts, input and output, and each training input corresponds to a training output. A usable machine learning model is obtained by training; the inference process then predicts the unknown results. Taking Figure 4 as an example, we first divided the curves into two categories: $TR = \{tr_i, 1 \le i \le o\}$ for training, i.e., curves for which the delay at all time points is already known, and $PR = \{pr_j, 1 \le j \le p\}$, i.e., curves for which the aged delay is known at only a few time points and must be predicted at most of the others. Therefore, we needed to distinguish the time points on the horizontal axis.
Suppose there are $n$ time points on the horizontal axis; the set of time points is denoted as $X$ and is divided into two categories: $XI = \{xi_x, 1 \le x \le m\}$ and $XO = \{xo_y, 1 \le y \le n-m\}$. The difference between $xi_x$ and $xo_y$ is that, for any curve $pr_j$, the aged delay corresponding to $xi_x$, denoted as $d^{(pr_j)}_{xi_x}$, is known, while the aged delay corresponding to $xo_y$, denoted as $d^{(pr_j)}_{xo_y}$, is to be predicted. As for the curve $tr_i$, the aged delays at $xi_x$ and $xo_y$, denoted as $d^{(tr_i)}_{xi_x}$ and $d^{(tr_i)}_{xo_y}$, respectively, are all known. It should be noted that $xi_x$ and $xo_y$ may be interleaved on the horizontal axis. The problem to be solved is then as follows: given a curve $pr_j$ to be predicted, the value $d^{(pr_j)}_{xi_x}$ at every time point $xi_x \in XI$ is known, and we need to predict $d^{(pr_j)}_{xo_y}$ for all other time points $xo_y \in XO$. The corresponding machine learning model takes the known delays at $XI$ as input and returns the predicted delays at $XO$, where $\hat{d}^{(pr_j)}_{xo_y}$ is the predicted output corresponding to $xo_y$ on the curve $pr_j$, and ML represents a machine learning model.
To exploit the similarity between the curves $tr_i$ and $pr_j$ and to reduce the training set space, this article draws on the results of [25]. To build this model, the training sample sets were constructed as shown in Equation (7), where $TR_{in}$ and $TR_{out}$ are the training input and training output blocks in Figure 5.
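The construction of the training input and output blocks can be sketched as follows. This is an assumed layout, not a reproduction of Equation (7): each row corresponds to one training curve, the input block gathers the delays at the XI time points, and the output block gathers the delays at the XO time points. The function name and all delay values are hypothetical.

```python
def split_curves(curves, xi_idx, xo_idx):
    """Split delay curves into an input block and an output block.

    curves : list of delay lists, one per path, all on the same time points
    xi_idx : indices of the time points whose delay is known at inference (XI)
    xo_idx : indices of the time points whose delay is to be predicted (XO)
    """
    if set(xi_idx) & set(xo_idx):
        raise ValueError("XI and XO must not share time points")
    inputs = [[curve[i] for i in xi_idx] for curve in curves]
    outputs = [[curve[i] for i in xo_idx] for curve in curves]
    return inputs, outputs


# Hypothetical aged delays (ps) at years 0..4 for three training paths.
training_curves = [
    [100.0, 104.0, 105.0, 105.8, 106.4],
    [200.0, 208.0, 210.0, 211.5, 212.8],
    [150.0, 156.0, 157.5, 158.7, 159.6],
]
xi_idx = [0, 1, 4]   # known time points; XI and XO may interleave on the time axis
xo_idx = [2, 3]      # time points to predict
tr_in, tr_out = split_curves(training_curves, xi_idx, xo_idx)
```

The same split applied to a curve to be predicted yields its inference input; its output block is what the trained model has to supply.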
Unlike the traditional regression model, this is a multi-input-multioutput regression model, which was implemented in Python with the scikit-learn library using linear regression, k-nearest neighbor regression, and random forest regression. After the model was trained, inference could be performed. The prediction sets were constructed as shown in Equation (8), where $INF_{in}$ and $INF_{out}$ are the predicted input and predicted output blocks in Figure 5. During inference, $INF_{in}$ could contain one row or several rows of data at a time.
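A minimal sketch of the training and inference stages with scikit-learn is given below. It is not the authors' code: the curves are synthetic and assume a power-law time dependence, the choice of known time points (0, 1, and 5 years) mirrors the experiment setup, and all variable names are hypothetical. `LinearRegression` accepts a two-dimensional target, so a single fitted model predicts all output time points at once.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

YEARS = np.arange(0, 11)                 # fresh point plus years 1..10
XI_IDX = [0, 1, 5]                       # known time points (0, 1, and 5 years)
XO_IDX = [2, 3, 4, 6, 7, 8, 9, 10]       # time points to predict


def synthetic_curve(fresh_delay, aging_gain):
    """Hypothetical aged-delay curve (ps); a power law in time is assumed."""
    return fresh_delay * (1.0 + aging_gain * YEARS ** 0.25)


rng = np.random.default_rng(0)
train_curves = np.array([
    synthetic_curve(f, g)
    for f, g in zip(rng.uniform(100.0, 500.0, 10), rng.uniform(0.02, 0.06, 10))
])

# Rows are paths; columns are the delays at the XI (input) or XO (output) points.
tr_in, tr_out = train_curves[:, XI_IDX], train_curves[:, XO_IDX]
model = LinearRegression().fit(tr_in, tr_out)

new_curve = synthetic_curve(320.0, 0.04)         # path unseen during training
inf_in = new_curve[XI_IDX].reshape(1, -1)        # one row of known delays
inf_out = model.predict(inf_in)                  # shape (1, len(XO_IDX))
max_rel_error = np.max(np.abs(inf_out[0] - new_curve[XO_IDX]) / new_curve[XO_IDX])
```

Because these synthetic curves are exact linear combinations of their known points, the fit is nearly exact here; this says nothing about the accuracy on real HSPICE data. scikit-learn's `KNeighborsRegressor` and `RandomForestRegressor` also accept a two-dimensional target and could be substituted for `LinearRegression` in the same sketch.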
Taking Table 1 as an example, there were four training samples, $TR = \{tr_i, 1 \le i \le 4\}$, and one curve to be predicted, with eight time points on each curve, where $XI = \{xi_x, 1 \le x \le 3\}$ and $XO = \{xo_y, 1 \le y \le 5\}$. The data in the grid represent the delay of each circuit path at time points $xi_1$-$xi_3$ and $xo_1$-$xo_5$. According to the previous description, $TR_{in}$, $TR_{out}$, and $INF_{in}$ were defined as shown in Equations (9) and (10), while $INF_{out}$ was composed of five predicted values, represented as "ˆ" in Table 1.

Numerical Experiment

Circuit Level Experiment Setup
We evaluated the learning-based aging prediction model on a subset of benchmark circuits. The circuits were synthesized using Synopsys Design Compiler with the Nangate 45 nm Open Cell Library. Using the PrimeTime tool, we selected 30 worst-case path candidates from c499, c6288, and c7552 within the ISCAS'85 benchmark, s13207, s15850, and s38584 within the ISCAS'89 benchmark, and b04, b08, and b14 within the ITC'99 benchmark. These candidates constituted the training sample sets and prediction sets. Other parameters were as follows: $XI = \{xi_x, 1 \le x \le 3\}$ and $XO = \{xo_y, 1 \le y \le 8\}$, where $xi_1$, $xi_2$, and $xi_3$ correspond to 0 years, 1 year, and 5 years.
As discussed above, circuit aging depends on the operating conditions, such as temperature, voltage, and stress probability. For simplicity, we fixed the aging time and temperature at 10 years and 400 K for NBTI during the experiment. The input signal source is described in Table 2, covering both the static and dynamic NBTI effects. For the dynamic NBTI effect, different stress probabilities are listed. We then used Synopsys HSPICE to capture the aging effects of the critical path candidates. The prediction accuracy was measured according to two types of error, the relative root-mean-square error (rRMSE) and the mean absolute error (MAE), computed from $\hat{d}_{xo_y}$ and $d_{xo_y}$, the predicted and real outputs at $xo_y$ of the curve. In this article, each predicted circuit path corresponds to one rRMSE value and one MAE value. A lower error denotes better prediction and fitting performance of the learning method.
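The two error measures can be sketched as below. The exact normalization used in the experiments is not reproduced here, so the definitions are assumptions: MAE is taken as the mean of the absolute errors over the predicted time points, and rRMSE as the root of the mean squared relative error. The function names are hypothetical.

```python
import math


def _check(predicted, actual):
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def mae(predicted, actual):
    """Mean absolute error over the predicted time points of one path."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rrmse(predicted, actual):
    """Root of the mean squared relative error (assumed definition of rRMSE)."""
    _check(predicted, actual)
    squared = sum(((p - a) / a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical predicted and simulated delays (ps) at two time points.
path_mae = mae([101.0, 198.0], [100.0, 200.0])
path_rrmse = rrmse([101.0, 198.0], [100.0, 200.0])
```

MAE carries the unit of the delay, whereas this form of rRMSE is dimensionless, which makes it easier to compare across paths of different length.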

Static NBTI Condition
We considered the static NBTI first, i.e., case 1 shown in Table 2. During the experiment, the accuracy of the machine learning prediction and the running performance were the two key evaluation points. The experimental results are listed in Tables 3-5, where rRMSE and MAE are the average values over all $pr_j$ ($1 \le j \le 20$) using the linear, k-nearest neighbor, and random forest regression methods, while runtime is the time spent on the critical path by the three regression methods and by HSPICE. Each evaluation was applied to the benchmark circuits mentioned above. Here, we list three circuits from each of the ISCAS'85, ISCAS'89, and ITC'99 benchmarks. It should be noted that the runtime in each case fluctuated within a small range between runs; thus, the runtime here can be regarded as an average value.
It can be seen from Tables 3-5 that the error obtained using linear regression, whether for rRMSE or MAE, was relatively concentrated, while that obtained using the other two methods was divergent. As for MAE, linear regression provided the lowest error level in each circuit case, improved by one or two orders of magnitude compared with the other two regression models. The rRMSE result showed a similar improvement.
In terms of runtime, the computation overhead could be significantly reduced by adopting any of the regression methods. The time spent in HSPICE simulation depended strongly on the complexity of the critical path, while the time spent in the regression methods remained basically unchanged. Consequently, the runtime ratio between HSPICE and linear regression varied significantly across critical paths: the longer the critical path, the greater the runtime improvement offered by the regression method.
Considering both prediction accuracy and central processing unit (CPU) runtime, it can be concluded that linear regression, which achieved the highest prediction accuracy with minimal computation overhead, was more suitable for this problem than the k-nearest neighbor and random forest regression methods. This result also reflects the high similarity among the curves.
In order to observe the accuracy of the linear regression prediction more intuitively, MAE could be used. Taking the critical paths from c499, c6288, and c7552 as examples, the values predicted by the model and the actual values are shown in Figure 6. Excellent agreement was observed between the predicted and the real data.

Dynamic NBTI Condition
Next, we focused on the linear regression and checked its performance for dynamic NBTI corresponding to case 2, case 3, and case 4, as presented in Table 2.
Taking the critical path from c499 as an example, the relationship among the $\Delta V_{th,t}$ of one pMOS, the path delay, and the NBTI stress probability is shown in Figure 7. It can be seen that $\Delta V_{th,t}$ and the path delay increased with the stress probability, because the overall stress time increased. This conclusion also applied to the critical paths from c6288 and c7552. The prediction accuracy and runtime were verified using different NBTI stress probabilities, and the experimental results are shown in Tables 6-8. It can be seen that the accuracy and runtime of linear regression showed good consistency under dynamic NBTI conditions. Compared to the static NBTI condition, rRMSE and MAE were at almost the same level. However, the HSPICE runtime was significantly longer, while the linear regression runtime varied little, so linear regression achieved an even greater speed-up here.

Comparison with Other Studies
Similarly to [23,24], our work also used the Nangate 45 nm Open Cell Library, assuming 400 K and 10 years of operation during the experiment, and the HSPICE simulation results were used to verify the accuracy and runtime performance. In [18,19], the aged delay of each circuit path in the designated 10th year was measured (corresponding to $\hat{d}_{xo_8}$ in this paper) and then compared with the HSPICE data (corresponding to $d_{xo_8}$). The obvious difference is that [23,24] calculated the aged path delay using the LUT method, while our proposed method is a machine learning framework at the circuit path level.
In order to verify our proposed method, we compared our experimental results with [23] and [24], as shown in Table 9. In [23], the numerical experiments were conducted on a single circuit design, while [24] and this paper used a set of designs, of which we only considered the worst case. The technique in [23] took a total of 14 s to calculate all path delays, i.e., an average of approximately 0.00055 s per path. From the experimental data, it can be concluded that our proposed method performed better in terms of prediction accuracy and runtime.

Conclusions
Traditional aged-delay estimation for circuits is mostly based on the LUT scheme. Each cell in the library is characterized by accurate HSPICE simulations, which clearly results in high computation overhead. This paper proposed a novel learning-based aged delay prediction method at the circuit path level. The experimental results showed that the framework with linear regression could greatly reduce the runtime while maintaining accuracy.
It should be noted here that the proposed method has a precondition, i.e., the input signal is very regular, whether under static or dynamic NBTI conditions. However, for some very special workloads which cannot be expressed in the manner shown in Table 2, even if the usable training sample sets are available, the application effect of the method remains to be verified.
During the experiment, we found that, when the model trained on one benchmark circuit was directly used for inference on another benchmark circuit, the prediction accuracy decreased. Furthermore, this study was not extended to experiments using real silicon. We will address these issues in future research.