1. Introduction
The demand for ultra-low-power circuits has increased significantly in recent years. Operating with a supply voltage below the transistor’s threshold voltage offers great potential for energy savings and has thus received special attention. However, this approach introduces a major challenge. In the subthreshold region, the drive current changes exponentially with the threshold voltage (Vth). Consequently, circuits become highly sensitive to process variations. This sensitivity can lead to cell delay changes of up to 300% [1], posing significant challenges to circuit reliability.
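The exponential dependence mentioned above can be made concrete with the textbook first-order subthreshold current expression. The following is a sketch under stated assumptions rather than the model derived in this paper: the symbols I_0 (a process-dependent prefactor), n (the subthreshold slope factor), V_T (the thermal voltage kT/q), and C_L (the load capacitance) are introduced here for illustration only.

```latex
\begin{align}
I_{\mathrm{sub}} &\approx I_0 \exp\!\left(\frac{V_{GS} - V_{th}}{n V_T}\right), \\
t_d &\propto \frac{C_L V_{DD}}{I_{\mathrm{sub}}} \;\propto\; \exp\!\left(\frac{V_{th}}{n V_T}\right), \\
V_{th} \sim \mathcal{N}\!\left(\mu_V, \sigma_V^2\right)
  &\;\Rightarrow\; \sigma_{\ln t_d} = \frac{\sigma_V}{n V_T}.
\end{align}
```

As an illustrative calculation with assumed values n = 1.5, V_T ≈ 26 mV (room temperature), and σ_V = 30 mV, the log-domain spread is σ ≈ 30/(1.5 × 26) ≈ 0.77, so a +1σ shift in Vth multiplies the delay by roughly e^0.77 ≈ 2.2. Under this simplified view, a normally distributed Vth maps directly to a lognormally distributed delay.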
Statistical Static Timing Analysis (SSTA) is a powerful method for accurately estimating circuit delay variations caused by on-chip process fluctuations. The most fundamental SSTA technique is the Monte Carlo (MC) simulation. MC simulation is highly accurate because it does not require simplifying assumptions about parameter distributions or gate delay models. Its precision is also independent of the number of variables or circuit size. For these reasons, MC simulation is considered the “gold standard” for timing sign-off. However, its significant computational cost makes it too time consuming for iterative timing closure.
A key aspect of SSTA is the choice of statistical distribution to model gate delay. While various distributions, such as skew-normal and gamma, have been used to model the non-Gaussian nature of delay, the lognormal distribution is often favored in analytical models. This preference is rooted in its physical basis, as the total delay can be seen as a product of multiple independent random factors, which, by the central limit theorem in the log domain, leads to a lognormal distribution. It also offers significant analytical convenience for tasks like variance propagation. For a simple illustration, consider a delay D proportional to L²/μ, where mobility μ and channel length L have small, independent normal variations around their nominal values. The resulting delay D is a product of powers of these random variables, making its distribution skewed and well approximated by a lognormal distribution. In this work, we leverage this well-established lognormal model to develop an analytical framework for the challenging subthreshold region.
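The illustration above can be checked numerically. The following is a minimal sketch, not part of the proposed model: the nominal values (normalized to 1), the relative spreads `sigma_mu` and `sigma_len`, and the helper names are all assumptions chosen for illustration. It samples μ and L, forms the delay, and compares the skewness of the delay with the skewness of its logarithm.

```python
import numpy as np


def skewness(x):
    """Sample skewness: the third standardized central moment."""
    centered = x - x.mean()
    return float((centered**3).mean() / (centered**2).mean() ** 1.5)


def simulate_delay(n_samples=200_000, sigma_mu=0.10, sigma_len=0.05, seed=0):
    """Draw delays D = L^2 / mu with normalized nominal values of 1.

    sigma_mu and sigma_len are assumed relative standard deviations.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(1.0, sigma_mu, n_samples)       # mobility
    length = rng.normal(1.0, sigma_len, n_samples)  # channel length
    return length**2 / mu


delay = simulate_delay()
skew_linear = skewness(delay)       # skewness of D itself
skew_log = skewness(np.log(delay))  # skewness of ln(D)
print(f"skewness of D: {skew_linear:.3f}, skewness of ln D: {skew_log:.3f}")
```

With these assumed spreads, the delay is clearly right-skewed while its logarithm is much closer to symmetric, which is the behavior a lognormal approximation relies on.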
To reduce the high cost of MC simulation, several faster methods have been proposed, such as Latin hypercube sampling [2,3] and Quasi-MC sampling [4,5]. However, as technology scales and the number of process variables increases, the accuracy of these sampling-based methods diminishes. Other approaches focus on improving the delay model itself. For instance, V. Khandelwal et al. developed a second-order delay model to capture the nonlinear relationship between delay and process parameters [6,7,8,9]. Similarly, M. Vijaykumar and V. Vasudevan proposed a skew-normal delay model [10], and H. Yan et al. used an adaptive polynomial method for wide-voltage circuits [11]. Concurrently, recent algorithmic approaches continue to address skewed distributions by employing techniques like Gaussian Mixture Models (GMMs) to more accurately model non-Gaussian effects [12]. While these methods improve simulation speed and maintain accuracy, they are primarily mathematical models and offer limited circuit-level design insights.
In contrast to purely mathematical approaches, design-oriented SSTA models are derived from fundamental device physics. This approach was explored by J. Shiomi et al., who modeled the correlation between adjacent gates with a constant factor [13,14]. M. Alioto et al. proposed an analytical model based on the Fanout-of-4 (FO4) inverter metric [15], which was later refined by Y. Cui et al. [16]. P. Cao et al. further advanced this area by describing the path delay as a linear combination of correlated gate delays in the near-threshold region [17]. A key advantage of these methods is their ability to estimate delay variation with minimal simulation. However, a significant limitation is that they have primarily focused on combinational logic, neglecting sequential cells.
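To see why the correlation between gate delays matters when a path delay is written as a sum of gate delays, the sketch below computes the standard deviation of a sum of correlated variables from its covariance matrix. It is a generic variance-propagation example, not the formulation of any cited work: the function name `path_sigma`, the single uniform correlation coefficient `rho`, and the example sigmas (arbitrary units) are assumptions for illustration.

```python
import numpy as np


def path_sigma(gate_sigmas, rho):
    """Standard deviation of a sum of gate delays.

    Assumes every pair of gates shares the same correlation
    coefficient rho (a simplification made for this example).
    Var(sum) = sum_i sum_j rho_ij * sigma_i * sigma_j.
    """
    sigmas = np.asarray(gate_sigmas, dtype=float)
    n = sigmas.size
    corr = np.full((n, n), float(rho))
    np.fill_diagonal(corr, 1.0)
    cov = corr * np.outer(sigmas, sigmas)
    return float(np.sqrt(cov.sum()))


sigmas = [1.0, 2.0, 2.0]  # hypothetical gate delay sigmas
sigma_independent = path_sigma(sigmas, rho=0.0)  # root-sum-square
sigma_correlated = path_sigma(sigmas, rho=1.0)   # plain sum of sigmas
print(sigma_independent, sigma_correlated)
```

The two limiting cases bracket the path sigma: ignoring correlation gives the root-sum-square value, whereas fully correlated gates add their sigmas linearly, so neglecting inter-stage correlation can noticeably underestimate the spread.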
The statistical modeling of sequential cells remains an open challenge. S. Fisher et al. proposed a delay model for sequential cells but did not consider the impact of the clock path on the propagation delay (Tcq) [18]. Other researchers have used machine learning and analytical methods to study the correlation between setup and hold times [19,20], the correlation between propagation delay and clock slope, or the interdependencies among setup, hold, and propagation delay [21,22]. The application of advanced machine learning, particularly Graph Neural Networks (GNNs), continues to be an active area of research for SSTA, with recent work focusing on balancing accuracy and runtime for non-Gaussian analysis [23]. While this body of work effectively reduces timing pessimism, it focuses on the correlation between timing parameters rather than the statistical characteristics of the delays themselves. Consequently, these models have not been extended to full sequential path analysis.
This paper addresses this gap by deriving analytical statistical delay models for both sequential cells and full sequential paths in the subthreshold region. The main contributions of this work are summarized as follows:
- (1) Stack Structure Model: We propose a statistical delay model for stack structures that considers multi-transistor threshold voltage variations, decomposing their combined effect on gate delay.
- (2) Flip-Flop Model: We model the flip-flop’s non-step input delay as a linear combination of the step delays of the current and previous gates. This approach effectively decouples the correlation between adjacent gates within the flip-flop.
- (3) Efficient Path Model: We propose a method to compute sequential path delay variance based on the PVT-independent nature of relative delay sensitivity. This allows the model to be extended to multiple PVT corners using only a single nominal SPICE simulation, significantly reducing characterization time.
The remainder of this paper is organized as follows: Section 2 describes the master-slave flip-flop structure and defines its key timing parameters. Section 3 and Section 4 derive the analytical statistical delay models for the flip-flop and the full sequential path, respectively, and present experimental results. Finally, Section 5 provides a summary of our findings.
2. Preliminaries
The master-slave flip-flop is the most commonly used flip-flop in digital circuits. As shown in Figure 1, to verify the reliability of the flip-flop, STA checks two types of timing constraints: setup time and hold time. After the logical value is correctly captured, there is a delay from the clock to the output, during which the captured value propagates to the flip-flop output, as shown in Figure 2.
The setup time is the minimum time that the input D must remain stable before the clock rising edge. For the master-slave flip-flop, the input must propagate through I1 and I2 before the clock rising edge arrives, so as to ensure that the voltage at both ends is equal when I8 is turned on; otherwise, the cross-coupled inverters I2 and I8 will stay at an incorrect value. Thus, the expression for the setup time of the flip-flop is:
The hold time is the minimum time that the input D must remain stable after the clock rising edge. It ensures that the data on the D pin cannot affect the data sampled by the flip-flop after the clock rising edge. Thus, I1 must be closed when the rising edge of the clock is approaching. However, since the delay from the D pin to I1 and the delay from CLK to I1 are different, the data at the D pin must remain unchanged until the CLK signal reaches I1. Therefore, the expression for the hold time of the flip-flop is:
The propagation delay Tcq is the delay from the start of the clock rising edge to the output Qn of the flip-flop. The expression for Tcq can be written as:
5. Experiments and Discussion
To verify the accuracy of the proposed statistical delay model, this section evaluates the model developed in the previous sections on ISCAS’89 benchmark circuits operating at subthreshold supply voltages. MC simulation results serve as the gold standard, against which the proposed model and other methods are compared.
5.1. Experimental Setup
In this section, the proposed statistical timing model for sequential paths is validated using a TSMC 28 nm process in the subthreshold region. A standard digital design flow is adopted to verify the model on multiple ISCAS’89 benchmark circuits. The RTL of each benchmark circuit is synthesized into a gate-level netlist, and the path netlists are then extracted with PrimeTime. The critical paths selected for validation from these benchmarks feature diverse logic depths, various cell types, and realistic fan-out conditions, ensuring a comprehensive test of our model’s performance on complex sequential paths. First, timing models of the inverter and the stack structure are built, and the statistical propagation delay of the flip-flop is obtained using the inter-stage correlation. Then, the lookup table of the relative delay sensitivity Xcell0 of each combinational logic cell is obtained analytically, and the statistical delay of the combinational logic path is obtained using the inter-stage correlation. Finally, the statistical delay model of the sequential path is obtained by combining the previous two steps.
5.2. Experimental Result and Analysis
In this section, the proposed model is verified and analyzed using three metrics: the path delay standard deviation, the maximum path delay, and the minimum path delay.
As shown in Table 8, Table 9 and Table 10, the statistical delay value of one path is verified under different processes, voltages, and temperatures. Under the process corners FF, TT, and SS, the mean error of the standard deviation of the path is 4%, the mean error of the maximum delay value is 5.3%, and the mean error of the minimum delay value is 7.5%. Under 0.3 V, 0.27 V, and 0.25 V supply voltages, the mean error of standard deviation is 3.4%, the mean error of maximum delay value is 4.7%, and the mean error of minimum delay value is 8.8%. At −25 °C, 25 °C, and 75 °C, the mean error of standard deviation is 1.4%, the mean error of maximum delay is 5.2%, and the mean error of minimum delay is 8.1%.
In addition to process and voltage variations, the model’s performance was also evaluated across a range of operating temperatures. As presented in Table 11, the model maintains high accuracy at −25 °C, 25 °C, and 75 °C, with an average error of only 1.4% for standard deviation, demonstrating its thermal robustness.
To further verify the accuracy of the model, five circuits are selected from the ISCAS’89 benchmark suite, namely s27, s1196, s13207, s15850, and s1423. Among them, s1423 has the largest number of cells, which is 56. The errors of the three metrics with respect to 10,000 runs of MC simulation are listed in Table 12.
The percentage error for each metric is calculated using the following formulas, where Predicted refers to the value from our model, and MC refers to the value from Monte Carlo simulation:
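The exact formulas are not reproduced in this text. As a hedged sketch, the helper below assumes the conventional absolute relative error with the MC value as the reference; the function name `percent_error` and the sample numbers are hypothetical and are not results from the paper.

```python
def percent_error(predicted, mc):
    """Absolute relative error in percent, using the MC value as reference.

    Assumed definition: |predicted - mc| / |mc| * 100.
    """
    if mc == 0:
        raise ValueError("MC reference value must be non-zero")
    return abs(predicted - mc) / abs(mc) * 100.0


# Hypothetical values for illustration only (arbitrary delay units).
sigma_model, sigma_mc = 105.0, 100.0
error_sigma = percent_error(sigma_model, sigma_mc)
print(f"standard deviation error: {error_sigma:.2f}%")
```

The same helper would apply unchanged to the maximum and minimum delay metrics, given the corresponding predicted and MC values.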
From the table, the average errors are 5.23%, 5.62%, and 5.92% for standard deviation, maximum delay, and minimum delay, respectively. Compared with the other two methods, the proposed model achieves up to 6.0× (10.2×), 4.3× (4.2×), and 2.7× (2.8×) precision improvements in standard deviation, maximum delay, and minimum delay prediction, respectively. This indicates that our proposed model is considerably more accurate than the method in [15], which does not consider the inter-stage correlation, and the method in [18], which does not consider the influence of the clock path on the propagation delay Tcq.
Beyond its high accuracy, our model’s primary advantage is its computational efficiency. This efficiency arises from a fundamental methodological difference with Monte Carlo simulations. MC analysis is computationally expensive, as it relies on thousands of repetitive circuit simulations to achieve statistical convergence.
In contrast, our analytical model requires only a single nominal SPICE simulation to extract key parameters. The statistical moments are then computed directly via analytical expressions, a step with negligible computational cost. Essentially, our method replaces thousands of transient simulations with one nominal simulation and a simple calculation.
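The contrast between sampling and closed-form moment computation can be illustrated with a toy path. The sketch below is not the proposed model: it assumes independent lognormal gate delays (no inter-stage correlation), and the log-domain parameters `log_means` and `log_sigmas` as well as the function names are hypothetical. It compares the path mean and standard deviation from closed-form lognormal moments with those from random sampling.

```python
import numpy as np


def lognormal_moments(m, s):
    """Mean and variance of a lognormal with log-domain mean m and sigma s."""
    mean = np.exp(m + 0.5 * s**2)
    var = (np.exp(s**2) - 1.0) * np.exp(2.0 * m + s**2)
    return mean, var


def analytical_path_stats(log_means, log_sigmas):
    """Path mean and sigma, assuming independent gate delays."""
    means, variances = lognormal_moments(
        np.asarray(log_means, dtype=float), np.asarray(log_sigmas, dtype=float)
    )
    return float(means.sum()), float(np.sqrt(variances.sum()))


def monte_carlo_path_stats(log_means, log_sigmas, n_samples=200_000, seed=0):
    """Same statistics estimated by sampling every gate delay."""
    rng = np.random.default_rng(seed)
    gate_delays = rng.lognormal(
        mean=log_means, sigma=log_sigmas, size=(n_samples, len(log_means))
    )
    path_delays = gate_delays.sum(axis=1)
    return float(path_delays.mean()), float(path_delays.std())


log_means = [0.0, 0.2, -0.1, 0.1]     # hypothetical, normalized units
log_sigmas = [0.30, 0.25, 0.35, 0.30]

mean_an, sigma_an = analytical_path_stats(log_means, log_sigmas)
mean_mc, sigma_mc = monte_carlo_path_stats(log_means, log_sigmas)
print(f"analytical: mean={mean_an:.4f}, sigma={sigma_an:.4f}")
print(f"sampled:    mean={mean_mc:.4f}, sigma={sigma_mc:.4f}")
```

In this toy setting the closed-form path costs a handful of arithmetic operations, while the sampled estimate needs hundreds of thousands of draws to converge to a comparable value; in a real flow each draw would be a transient SPICE simulation rather than a random number.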
This approach provides a highly favorable time–accuracy trade-off. By maintaining high fidelity to the MC results while drastically reducing runtime, our model enables the rapid, iterative timing analysis essential for modern design cycles, a task where MC simulation is prohibitively slow.
The practical impact of this efficiency in industrial scenarios is significant. By providing fast and accurate statistical timing feedback, our model can be integrated into standard CAD flows to enable early, variation-aware design optimization. This can help reduce the number of costly design iterations and shorten the overall time to market. Furthermore, its high speed makes it a candidate for reducing the reliance on lengthy MC simulations during the final timing sign-off, potentially saving days of computation on large designs.
6. Conclusions
This paper presented an analytical statistical model for sequential path delay variation in low-voltage circuits, based on the lognormal distribution. We introduced a novel method to characterize the delay of complex structures, including stacked transistors and flip-flops, by decomposing the impact of multi-threshold voltage variations and resolving inter-stage correlations through a linear delay model. The use of relative delay sensitivity further enhances the model’s efficiency, allowing for rapid characterization across multiple PVT corners from a single nominal simulation.
Compared to purely mathematical or machine learning-based approaches, our analytical model offers distinct advantages in terms of interpretability and design insight. While ML models may achieve high accuracy after extensive training, our physics-based model provides a clear, transparent relationship between process parameters and timing variations without requiring a large dataset. This makes it particularly valuable for early-stage design exploration and optimization. A qualitative comparison summarizing these trade-offs against other state-of-the-art methods is provided in Table 13.
While the proposed framework demonstrates high accuracy and efficiency, several avenues for future research are identified to further enhance its robustness and applicability. The model’s scalability to advanced nodes like FinFETs is a promising direction; although this would require adapting the underlying device equations, the core methodology of linear decomposition and variance propagation remains applicable. The framework could also be extended to incorporate time-dependent aging effects, such as NBTI and HCI, by modeling their impact as a long-term shift in the threshold voltage distribution. Furthermore, a comprehensive evaluation using additional industrial metrics, such as Mean Absolute Error (MAE) and 95th-percentile delay, would further validate its practical utility for timing sign-off. Finally, while the model shows excellent agreement with extensive SPICE simulations, validation against silicon measurements from a test chip is the ultimate goal to confirm its real-world performance. Addressing these aspects will be the focus of our future investigations.