Electronics
  • Article
  • Open Access

11 September 2022

Design of Light-Weight Timing Error Detection and Correction Circuits for Energy-Efficient Near-Threshold Voltage Operation

1 National ASIC System Engineering Technology Research Center, Southeast University, Nanjing 210096, China
2 Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
* Authors to whom correspondence should be addressed.
This article belongs to the Special Issue VLSI Circuits & Systems Design

Abstract

Near-threshold voltage (NTV) operation has the potential to improve the energy efficiency of digital integrated circuits. However, the use of a conservative timing guard band to avoid the timing errors introduces excessive timing margins, thus causing larger energy dissipation in the NTV region. An error-tolerant design based on timing error detection and correction circuits has been shown to be a promising solution to mitigate these issues. This paper presents a light-weight timing error-tolerant flip-flop (ETFF) design. This design detects timing errors using a node transition signal detector with only nine transistors and corrects these errors during the same clock cycle. Moreover, transistor sizing is explored to optimize the trade-off between performance and area overhead. The proposed ETFFs are inserted into a monitored circuit by replacing original flip-flops at timing-monitored points. To further reduce the overhead, we develop a mean-time-to-failure-aware method to select the monitored points by simultaneously considering the critical path coverage and activation rates of flip-flops. The simulation results show that a CNN accelerator using the proposed timing error-tolerant design implemented in the SMIC CMOS 40 nm process can robustly work at 1.1–0.3 V with only 3.5% area overhead. Furthermore, this design reduces the area overhead by 54.68% and improves the energy efficiency by 53.69% at 0.6 V, compared with the Razor flip-flop design. The advantage of the proposed design lies in that it requires smaller circuit overheads and can work reliably in a wider range of supply voltages.

1. Introduction

Lowering the supply voltage to the near-threshold voltage (NTV) region is an effective technique for achieving higher energy efficiency in energy-constrained circuits [1,2,3]. However, NTV operation also poses new challenges due to the increased delay caused by process, voltage and temperature (PVT) variations at scaled voltages [2]. These challenges manifest as: (1) a performance loss of over 10×, (2) a 5× increase in performance variation, and (3) a five-orders-of-magnitude increase in the functional failure rate of memory and logic circuits [3]. Moreover, the PVT-induced variations affect both the clock signals and data paths, so the critical paths may fail to deliver the output data within the given clock period [4]. Furthermore, timing errors in data paths cannot be tolerated by masking, because the effect of delayed bit flips accumulates recurrently in circuits such as the multiply-accumulate (MAC) units in a neural network (NN) processor [2]. Thus, the propagation of timing errors incurs a significant accuracy loss, especially in deep neural network (DNN) accelerators containing a large number of MACs [5].
Conventional integrated circuit designs avoid PVT-induced timing errors by reserving voltage and timing margins as a timing guard band. However, the conservative guard band reduces throughput and wastes energy [5], because a circuit does not always operate under worst-case conditions. Timing error-tolerant techniques based on error detection and correction (EDAC) circuits have emerged as a promising solution [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. EDAC designs use timing error detection (TED) circuits to monitor the timing conditions of circuits at run time. Timing error correction (TEC) circuits are designed to recover from the timing errors resulting from delay violations. Thus, a high operation frequency can be retained under lower supply voltages. Moreover, an EDAC design can be used with the adaptive voltage frequency scaling technique to eliminate the excessive voltage and timing margins, further reducing energy consumption [3,25,26].
The EDAC designs have been researched for many NN accelerators [2,3,5,6,7,8,9,10,11,12,19,20,27] and the circuits of microprocessors [13,14,15,16,17,18,21,22,23,24,25,26]. One prominent EDAC design is the Razor flip-flop (RFF) [13]. An RFF detects timing errors by comparing the outputs of a shadow latch and the main master-slave flip-flop (MSFF). It corrects timing errors by refreshing the instruction and redoing operations. However, the TED design of the RFF causes considerable circuit costs in power and area. The TEC design increases the constraint of the hold time, which makes the TEC design of the RFF unsuitable for NTV operations [1].
In order to expand the operating voltage range and achieve a higher energy efficiency, we improved and extended our previous work [27]. In this paper, a timing error tolerant flip-flop (ETFF) is proposed and applied in the processing element (PE) circuits of a convolutional NN (CNN) accelerator, as shown in Figure 1. An ETFF consists of a node transition signal detector (NTSD) using only nine transistors and a data selection error correction unit (DSEC). The NTSD monitors the timing conditions by detecting the wrong transitions of nodes, which are caused by delay violations. Once the delay of a circuit violates the timing constraints as the supply voltage reduces, the NTSD will immediately generate an error signal. The DSEC is designed based on a conventional transmission-gate flip-flop (TGFF) [28] and two extra transmission gates. The DSEC driven by error signals from the NTSD will then select valid input data to recover the timing errors during the same clock cycle.
Figure 1. A PE circuit using the proposed ETFF design in a CNN accelerator.
Compared to our previous work [27], the improvement and extension introduces the following two novelties: (1) Transistor sizing for the proposed ETFF is explored to further improve the trade-off between power, delay and area. The lowest supply voltage that the proposed ETFF steadily works at is extended to 0.3 V. (2) The proposed TEC design of the ETFF is simplified and improved to retain the robust edge-sampling characteristic of a master-slave flip-flop with only two extra transmission gates. The main contributions of this work are as follows:
  • A light-weight timing error-tolerant circuit, namely, the ETFF, is designed to extend the lowest operation voltage to 0.3 V with a 25.63% area reduction compared with the RFF design [13].
  • Transistor sizing is used to improve the power-delay product (PDP) of the proposed ETFF by 9.16–99.84% at supply voltages of 1.1–0.3 V.
  • Benefiting from the proposed EDAC design, a CNN accelerator implemented in the SMIC CMOS 40 nm process can reliably perform the classification at the supply voltages in the NTV region with an energy saving of up to 55.29%.

3. Proposed Timing Error-Tolerant Flip-Flop

In this section, the structure and principle of the proposed light-weight EDAC design ETFF are illustrated. The ETFF uses a node transition signal detector (NTSD) with only nine extra transistors to detect the timing errors. These errors are corrected in the same clock cycle by the proposed data selection error correction (DSEC) unit. Moreover, in order to use fewer EDAC circuits to realize more effective timing detection, a mean-time-to-failure-aware hybrid selection (MAHS) method is proposed, considering the variability in noncritical paths.

3.1. Node Transition Signal Detector (NTSD)

The proposed NTSD circuit consists of seven transistors, denoted by M1–M7, and one skewed inverter, denoted by I8. The schematic and operation of the NTSD are presented in Figure 3 and Table 1. M7, controlled by the clock signal denoted by CK, serves as the detection window regulator: it confines the detection of the NTSD to the high clock phase. During the low clock phase, M7 is switched on, keeping the FVDD node at logic-high. When the clock is high and M7 is switched off, FVDD becomes a floating node. Once a transition of the input signal denoted by D occurs, the voltage at the FVDD node immediately drops and I8 generates a timing error signal.
Figure 3. The schematic and operation of the proposed NTSD: (a) the input data D transitions from logic “0” to “1”; (b) the input data D transitions from logic “1” to “0”.
Table 1. Operation of the proposed NTSD design.
The timing error detection principle of the NTSD, under two input data transition scenarios, is explained in detail below. As shown in Figure 3a and Table 1, when D is logic “0”, M2 and M3 are switched on, so the FVDD node is in the logic-high state, the same as the internal node denoted by n1. The internal node denoted by n2 stays in logic-low under normal transmission without a timing error. Once D transitions from logic “0” to “1”, M2 is abruptly switched off, n1 is discharged to the logic-low state and M4 is switched on. As a result, the floating node FVDD is discharged to logic-low, because n2, discharged by M6, stays in logic-low for a short time.
When the input D is logic “1”, M1 and M4 are switched on, as shown in Figure 3b, so the FVDD node and n2 stay in logic-high, while n1 stays in logic-low under normal transmission. Once D transitions from logic “1” to “0”, M4 is abruptly switched off and the floating node FVDD is discharged to the logic-low state, because n1, discharged by M5, stays in logic-low for a short time. Then, I8 connected to FVDD promptly captures the voltage change and generates the timing error signal.
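The detection behavior described above can be summarized in a small behavioral model. The following Python sketch is not the transistor-level circuit; it only assumes that an error is flagged whenever D changes while CK is high (the detection window) and that nothing is flagged while CK is low. The function name `ntsd_errors` and the sampled-waveform representation are hypothetical.

```python
from typing import List


def ntsd_errors(ck: List[int], d: List[int]) -> List[int]:
    """Return a per-sample error flag for sampled CK and D waveforms.

    Assumption (behavioral abstraction, not the transistor circuit):
    FVDD is held high while CK is low, and any transition of D while
    CK is high discharges FVDD so that the error signal is raised.
    A transition that coincides with the rising clock edge itself is
    treated here as being outside the window.
    """
    if len(ck) != len(d):
        raise ValueError("ck and d must have the same length")
    err = [0] * len(d)
    for i in range(1, len(d)):
        in_window = ck[i] == 1 and ck[i - 1] == 1  # inside the high clock phase
        transition = d[i] != d[i - 1]              # 0->1 or 1->0 on D
        if in_window and transition:
            err[i] = 1
    return err


# D changes while CK is low in the first cycle (on time) and while
# CK is high in the second cycle (late arrival).
ck = [0, 0, 1, 1, 0, 0, 1, 1]
d = [0, 1, 1, 1, 1, 0, 0, 1]
flags = ntsd_errors(ck, d)
```

In this example only the last sample is flagged, because it is the only transition of D that falls inside a high clock phase.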
To ensure that these abrupt transitions are detected immediately under NTV, the NMOS transistors M5 and M6 are used as discharge transistors that keep node n1 or n2 in logic-low for a sufficient time. Otherwise, the floating node FVDD will not be discharged far enough to activate I8 and generate a timing error signal. Consequently, M5 and M6 require a larger width-to-length ratio to ensure adequate discharge characteristics.
The sizing issues in the proposed transistor level design are analyzed as follows. During the detection phase, the voltage at the floating node FVDD will drop due to the charge-sharing effect. These charges will flow from node FVDD and n2 through M4 and M6 to VSS when the input signal D changes from logic “0” to “1”, or from node FVDD and n1 through M2 and M5 to node VSS in another case, as shown in Figure 3. The proposed design detects delay errors by capturing the discharge state of the floating node FVDD. Thus, three techniques can be applied to improve the functionality and robustness of the proposed design.
  • The inverter I8 requires skewed transistor sizing to ensure that it has a sufficiently high logic threshold voltage regardless of process corners.
  • The node capacitance at n1 and n2 must be increased through the transistor sizing to support sufficient charges.
  • The transistor sizes of M5 and M6 must be enlarged to ensure a fast and sufficient voltage reduction at the floating node FVDD and a successful logic switch at the node denoted by ERR.
Notably, all of these design techniques must consider the effects of extra area consumption and delay exacerbation under serious NTV PVT variations. Moreover, a limited and varying voltage swing leads to a small noise margin and large delay penalty in the skewed inverter I8. These concerns render the design of this NTSD challenging. The transistor sizing process for I8, M5 and M6 is explored to improve the energy efficiency and enable the proposed EDAC design to robustly work at NTV, as discussed in Section 3.3.

3.2. Data Selection Error Correction (DSEC)

The DSEC circuit based on the structure of the conventional TGFF [28] is composed of two latches and two transmission gates denoted by G1 and G2, as shown in Figure 4. G1 and G2 are driven by error signals from the NTSD to select the valid inputs. Under the nominal timing conditions, when the system circuits work without timing errors, G1 stays switched on and G2 stays switched off.
Figure 4. The schematic of the proposed ETFF.
Once the transition of input signal caused by delay violations occurs, the NTSD will generate an error signal and transmit it to G1 and G2. G1 will be promptly switched off and G2 will become transparent to select the valid input signal after late transition. Then, the output of the slave latch denoted as Q will follow the valid input signal through G2.
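The selection between G1 and G2 can be illustrated with a one-cycle behavioral model. This is a sketch under simplifying assumptions, not the authors' circuit: the error signal is assumed to rise exactly when D changes after the sampling edge, and the names `etff_cycle`, `d_at_edge` and `d_late` are hypothetical.

```python
from typing import Optional, Tuple


def etff_cycle(d_at_edge: int, d_late: Optional[int] = None) -> Tuple[int, int]:
    """Model one clock cycle of the ETFF at a behavioral level.

    d_at_edge: value of D sampled at the rising clock edge.
    d_late:    value of D after a late transition inside the high clock
               phase, or None if D stays stable (no timing error).
    Returns (q, err), where err is the error signal from the detector.
    """
    err = 1 if (d_late is not None and d_late != d_at_edge) else 0
    g1_on = err == 0  # nominal path: G1 conducts, G2 is off
    if g1_on:
        q = d_at_edge
    else:
        # Error case: G1 is switched off and G2 becomes transparent,
        # so Q follows the valid (late) input within the same cycle.
        assert d_late is not None
        q = d_late
    return q, err


on_time = etff_cycle(1)        # D stable, no error
corrected = etff_cycle(0, 1)   # D arrives late, Q is corrected to 1
```

The point of the model is that the corrected value appears on Q in the same cycle in which the error is detected, without a replay step.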
The ETFF is obtained by combining the DSEC circuit with the NTSD; its schematic is shown in Figure 4. As a direct result, the proposed ETFF retains the edge-sampling characteristic of a master-slave flip-flop while being able to detect and correct timing errors. The characteristics of the proposed ETFF compared to the RFF design [13] and the standard TGFF cell [28] working at 0.6 V are shown in Table 2. With merely nine extra transistors, the ETFF incurs only 1.7× area overhead and 1.59× switching energy relative to the standard TGFF, whereas the RFF design incurs 2.3× area overhead and 2.12× switching energy. Moreover, compared with the RFF design, this design has a shorter average error detection delay and does not need one extra clock cycle to reload valid data from memory circuits. This further improves the efficiency of application circuits.
Table 2. Characteristics of the proposed ETFF compared to the Razor [13] and TGFF [28] at 0.6 V under TT process corner @ 25 °C.

3.3. Transistor Sizing

To ensure that the inverter I8 captures a subtle voltage drop at the floating node FVDD, I8 requires skewed transistor sizing so that it has a sufficiently high logic threshold voltage. We investigate the impact of the inverse narrow PMOS width effect [31] on the threshold voltage at different supply voltages with SMIC 40 nm HVT process technology. The results are shown in Figure 5a, indicating that the variation of the threshold voltage increases as the supply voltage decreases. The threshold voltage of the inverter remains nearly flat for transistor widths larger than 400 nm but decreases quickly as the transistor width approaches the minimum width (W = 120 nm). To minimize the area overhead, we set the width of the PMOS transistor in the skewed inverter I8 to 400 nm.
Figure 5. (a) VT of the inverter with different PMOS transistor sizing. (b) The lowest operating voltage of the NTSD with different sizing of M5 and M6.
The lowest operating voltage of the NTSD decreases as the width of M5 and M6 increases, as shown in Figure 5b. At the operation frequency of 10 MHz, the lowest operating voltage remains nearly flat when the width of M5 and M6 transistors increases to larger than 500 nm. At the operation frequency of 100 MHz, the lowest operating voltage remains nearly flat when the width of M5 and M6 increases to larger than 800 nm.
Figure 6 indicates that the delay of timing detection decreases at the supply voltages of 1.1–0.3 V as the width of transistors M5 and M6 increases. The change in the delay of the NTSD is insignificant when the width of the transistors is larger than 500 nm. As the supply voltage decreases from the standard voltage of 1.1 V to the NTV region, the delay of the NTSD increases much more quickly because the drain current decreases.
Figure 6. The delay of timing detection with various transistor sizing and supply voltages.
The simulation results in Table 3 present the average power, worst-case delay and PDP of the proposed ETFF with different sizes of M5 and M6 at supply voltages of 1.1–0.3 V. The 9.16–99.84% reduction in the PDP indicates the effectiveness of the transistor sizing method. Although the worst-case delay increases with voltage scaling, the PDP reduces, and the reduction becomes gradually smaller as the width of M5 and M6 increases. These trends change precipitously at a supply voltage of 0.5 V (almost NTV). Thus, the proposed ETFF achieves the lowest PDP at the supply voltage of 0.5 V, although the power saving is 5× smaller than at the lowest supply voltage of 0.3 V.
Table 3. The power, delay and PDP of the proposed ETFF with transistor sizing.
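The power-delay product used as the figure of merit here is the average power multiplied by the delay. A minimal sketch of how a PDP reduction percentage is obtained follows; the numbers are hypothetical placeholders chosen for illustration and are not taken from Table 3.

```python
def pdp(avg_power_w: float, delay_s: float) -> float:
    """Power-delay product in joules (average power times delay)."""
    return avg_power_w * delay_s


def pdp_reduction_percent(pdp_before: float, pdp_after: float) -> float:
    """Relative PDP reduction of the sized design versus the unsized one."""
    return 100.0 * (pdp_before - pdp_after) / pdp_before


# Hypothetical values, NOT from Table 3: widening M5 and M6 raises the
# power slightly but shortens the worst-case delay.
before = pdp(2.0e-6, 5.0e-9)   # unsized: 2.0 uW, 5.0 ns
after = pdp(2.2e-6, 3.0e-9)    # sized:   2.2 uW, 3.0 ns
reduction = pdp_reduction_percent(before, after)
```

With these placeholder values the PDP falls by about 34% even though the power rises, which is the kind of trade-off that transistor sizing exploits.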

3.4. Proposed MTTF-Aware Hybrid Selection (MAHS) Method

Considering the variability in noncritical paths, we introduce the mean-time-to-failure (MTTF) constraint [32] to propose an MTTF-aware hybrid selection (MAHS) method. This method simultaneously considers the coverage and activation rates of all FFs instead of only circuit paths. The constraints of the MTTF and the circuit cost in area (the number of the monitored points) are also considered to select the final monitored registers in application circuits.
The automatic flow using the proposed MAHS method is presented in Figure 7. Static timing analysis (STA) and VCS dynamic simulations are performed to extract the information on the FFs, data paths and timing conditions of the monitored circuit. Then, the FFs are sorted by the number of covered paths and by activation rate using a Python script. All of the FFs on the data paths are scanned to find the FFi with the maximal coverage rate, and this is repeated until the number of data paths covered by the selected FFs is not smaller than 60% of all data paths. After the activation rates of the FFs are scanned, an FF with an activation rate larger than 60% is selected even if it has a path coverage rate less than 60%.
Figure 7. An automatic design flow of the proposed MAHS algorithm.
As shown in Figure 8, the node B with the same path coverage as node A is selected as the candidate FF, because it has a higher activation rate than node A. The node D with a smaller activation rate will not be chosen, although its path coverage rate is larger than 60%. The coverage-rate-based and activation-rate-based selections are iteratively performed to obtain all candidate FFs to be replaced. In the processing element (PE) array circuits of the baseline CNN accelerator, we select 28 FFs covering 874 paths and 59 FFs with 60% activation rates among a total of 831 FFs on 874 paths. Finally, the proposed MAHS method chooses 39 FFs, thus reducing the count by 25 FFs with 3.5% area and 2.17% power savings, compared with the common method that chooses endpoints of critical paths with a timing slack smaller than 10% of the clock period.
Figure 8. Illustration of the MAHS sifting monitor points.
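The two selection passes can be sketched in Python as follows. This is one possible reading of the flow, not the authors' script: the coverage pass is assumed to be greedy on newly covered paths with ties broken by activation rate, and the MTTF and area constraints, as well as any further filtering shown in Figure 8, are omitted. The names `FlipFlop` and `mahs_select` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class FlipFlop:
    name: str
    paths: FrozenSet[str]     # data paths covered by this FF
    activation_rate: float    # fraction of cycles with activity, 0..1


def mahs_select(ffs: List[FlipFlop], all_paths: Set[str],
                coverage_target: float = 0.6,
                activation_threshold: float = 0.6) -> List[str]:
    """Pick monitored FFs by path coverage first, then by activation rate."""
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = list(ffs)

    # Coverage pass (assumed greedy): take the FF that adds the most
    # uncovered paths until the target share of all paths is covered.
    while remaining and len(covered) < coverage_target * len(all_paths):
        best = max(remaining,
                   key=lambda f: (len(f.paths - covered), f.activation_rate))
        if not best.paths - covered:
            break  # nothing left that adds coverage
        selected.append(best.name)
        covered |= best.paths
        remaining.remove(best)

    # Activation pass: add frequently activated FFs regardless of coverage.
    for f in remaining:
        if f.activation_rate > activation_threshold:
            selected.append(f.name)
    return selected


paths = {"p1", "p2", "p3", "p4", "p5"}
candidates = [
    FlipFlop("A", frozenset({"p1", "p2"}), 0.2),
    FlipFlop("B", frozenset({"p1", "p2"}), 0.5),
    FlipFlop("C", frozenset({"p3"}), 0.7),
    FlipFlop("D", frozenset({"p4"}), 0.1),
    FlipFlop("F", frozenset({"p1"}), 0.9),
]
chosen = mahs_select(candidates, paths)
```

In this toy input, B is preferred over A because both cover the same paths and B is activated more often, C completes the 60% coverage target, and F is added by the activation pass alone.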
Iizuka et al. proposed a stochastic framework to estimate the MTTF constraint by modeling the circuit operation as a continuous-time Markov process [32]. The state transition probability $p_{i,j}(s,t)$ that the circuit is in state $i$ at time $s$ and will be in state $j$ at time $t$ is given by:
$$p_{i,j}(s,t) = P\left(X(t) = j \mid X(s) = i\right)$$
In the case of a stationary Markov process, $p_{i,j}(s,t)$ can be simply expressed as $p_{i,j}(t)$. The Q-matrix, built from $q_{i,j}$ (the transition rate of leaving state $i$), is expressed by:
$$Q = \begin{bmatrix} -q_{1,1}(t) & q_{1,2}(t) & \cdots & q_{1,N_{state}}(t) \\ q_{2,1}(t) & -q_{2,2}(t) & \cdots & q_{2,N_{state}}(t) \\ \vdots & \vdots & \ddots & \vdots \\ q_{N_{state},1}(t) & q_{N_{state},2}(t) & \cdots & -q_{N_{state},N_{state}}(t) \end{bmatrix}$$
Let $\Lambda(t)$ denote the eigenvalue matrix of the Q-matrix, and let $U$ denote the corresponding eigenvector matrix. Then, the matrix of state transition probabilities can be expressed by:
$$P(t) = \begin{bmatrix} p_{1,1}(t) & p_{1,2}(t) & \cdots & p_{1,N_{state}}(t) \\ p_{2,1}(t) & p_{2,2}(t) & \cdots & p_{2,N_{state}}(t) \\ \vdots & \vdots & \ddots & \vdots \\ p_{N_{state},1}(t) & p_{N_{state},2}(t) & \cdots & p_{N_{state},N_{state}}(t) \end{bmatrix} = U \Lambda(t) U^{-1}$$
The state transition probability of being in the state fail at time $t$ starting from the state valid, denoted by $p_{valid,fail}(t)$, is computed by (6), so the MTTF of a circuit can be calculated by:
$$MTTF = \int_{0}^{\infty} t \, \frac{d\,p_{valid,fail}(t)}{dt} \, dt$$
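As a sanity check of the MTTF integral, consider the simplest possible case, which is an assumption made here for illustration and not a model from the paper: a two-state chain in which the circuit moves from valid to fail at a constant rate $\lambda$ and never recovers. Then $p_{valid,fail}(t) = 1 - e^{-\lambda t}$ and the integral evaluates to $1/\lambda$. The sketch below confirms this numerically with the trapezoidal rule; the function name `mttf_numeric` is hypothetical.

```python
import math


def mttf_numeric(rate: float, steps: int = 200_000) -> float:
    """Approximate the integral of t * dp/dt over [0, inf) for a two-state chain.

    Assumes p_valid_fail(t) = 1 - exp(-rate * t), so that
    dp/dt = rate * exp(-rate * t). The upper limit is truncated at
    50 / rate, beyond which the integrand is negligible.
    """
    t_max = 50.0 / rate
    h = t_max / steps

    def integrand(t: float) -> float:
        return t * rate * math.exp(-rate * t)

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h


failure_rate = 2.0                      # hypothetical rate, failures per unit time
estimate = mttf_numeric(failure_rate)   # close to 1 / failure_rate
```

For chains with more states, the same integral applies once $p_{valid,fail}(t)$ has been obtained from the transition probability matrix.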
To further verify the effectiveness of the MAHS algorithm, we also applied it to the ISCAS’89 benchmark circuits [33], in addition to the PE array of the baseline 40 nm CNN accelerator. The comparison results are listed in Table 4, where the common selection method selects FF endpoints of critical paths with a timing slack smaller than 10% of the clock period. The comparison results indicate that the proposed selection method achieves larger area overhead savings in larger test circuits with intricately interlaced data paths. Furthermore, the proposed ETFFs inserted in circuits using the MAHS method can obtain an area reduction of 2.7–29.8% and save 5.65% power, compared with the RFF design [13] using the common selection method.
Table 4. The number of monitored points selected by different methods in ISCAS’89 benchmark circuits [33] and the circuit of a CNN accelerator.
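For reference, the common selection method used as the baseline in this comparison can be sketched in a few lines. The dictionary of endpoint slacks stands in for a timing report and is hypothetical, as is the function name `select_by_slack`.

```python
from typing import Dict, List


def select_by_slack(endpoint_slack_ns: Dict[str, float],
                    clock_period_ns: float,
                    fraction: float = 0.10) -> List[str]:
    """Return the endpoint FFs whose slack is below a fraction of the period."""
    threshold = fraction * clock_period_ns
    return sorted(name for name, slack in endpoint_slack_ns.items()
                  if slack < threshold)


# Hypothetical slacks for a 10 ns clock; the 10% threshold is 1 ns.
slacks = {"ff_a": 0.4, "ff_b": 1.2, "ff_c": 2.5, "ff_d": 0.9}
monitored = select_by_slack(slacks, clock_period_ns=10.0)
```

Unlike the MAHS flow, this baseline looks only at static slack and ignores how often each endpoint is actually activated.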

4. Application and Performance Analysis

The structure design and operating principle details of the proposed light-weight timing error-tolerant design, namely, ETFF, have been described in Section 3. To verify the effectiveness of area and power savings, we applied the proposed ETFF design in a CNN accelerator. Moreover, the circuit-level comparison details with other EDAC designs are discussed.

4.1. Experiment Setup

The circuit of a CNN accelerator based on the classic LeNet-5 model [34] for digit classification is implemented as a baseline circuit using the SMIC 40 nm process. This baseline circuit consists of a 4 × 4 processing element (PE) array, external and internal memory units (input and output FIFO and weight buffers), a data transfer bus and a parameter configuration unit. Each PE circuit is composed of a 16-bit fixed-point multiplier and adder (1/3/12 fixed) and the input and output registers built based on the structure of the TGFF. The proposed ETFF has been inserted in the circuit of data paths by replacing an original TGFF, as shown in Figure 1. The parameters of this baseline CNN model are trained in Python with 10,000 images in the MNIST dataset. The accuracy of classification inferred by using accurate adders is 98.73%.
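To make the 16-bit arithmetic concrete, the sketch below models a multiply-accumulate step in a 16-bit fixed-point format. It assumes that "1/3/12" denotes 1 sign bit, 3 integer bits and 12 fractional bits, and that results saturate on overflow; neither the rounding nor the overflow behavior of the actual PE is stated in the text, so both are illustrative choices. All names are hypothetical.

```python
FRAC_BITS = 12                                  # assumed fractional bits
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1        # 16-bit two's complement range


def saturate(x: int) -> int:
    """Clamp an integer to the 16-bit signed range (assumed overflow policy)."""
    return max(Q_MIN, min(Q_MAX, x))


def to_fixed(x: float) -> int:
    """Quantize a real value to the assumed 1/3/12 format."""
    return saturate(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a 1/3/12 value back to a real number."""
    return q / (1 << FRAC_BITS)


def mac(acc: int, a: int, b: int) -> int:
    """Return acc + a * b, rescaling the product with an arithmetic shift."""
    product = (a * b) >> FRAC_BITS
    return saturate(acc + product)


acc = mac(to_fixed(0.5), to_fixed(1.5), to_fixed(-0.25))
result = to_float(acc)   # 0.5 + 1.5 * (-0.25)
```

Because every partial sum feeds the next accumulation, a single wrong bit captured by a register in this loop propagates into all later results, which is why the data path registers are the points being monitored.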
The hardware prototype of the baseline accelerator is implemented in RTL Verilog and synthesized using the Synopsys Design Compiler. The layout of the proposed ETFF design is generated by using the Cadence Virtuoso, following the standard cell design rules defined by the SMIC 40 nm process technology, as shown in Figure 9. Moreover, buffers are added for input signals and a load of a fanout-of-4 inverter (FO4) is used at the output, to simulate a real environment. The output load of the FO4 is also considered for power and delay evaluation. The parasitic parameters netlist is extracted by the Mentor Graphics Calibre. The ETFF cell has been inserted into the standard cell library, after the post-layout simulation has been conducted. The STA and VCS simulations are performed to analyze the static and dynamic timing.
Figure 9. Layouts for the proposed (a) NTSD and (b) ETFF.

4.2. Performance Analysis

The EDAC functions and the delay, switching energy and average power of the proposed ETFF design are evaluated using the HSPICE simulator under scaled supply voltages, as discussed in Section 3. Furthermore, to verify the robustness, exhaustive 10 K Monte Carlo (MC) simulations with 3-sigma process variation are performed for a wide voltage range of 0.2–1.1 V and the frequency range of 0.5–10 K MHz. The timing waveforms of the main signals are displayed in Figure 10, where transitions of the input signal D both from logic “0” to “1” and from logic “1” to “0” are introduced. Figure 10 presents the 10 K MC results for the voltages of 1.1, 0.6, 0.4 and 0.3 V at frequencies of 500, 100, 5, and 1 MHz, respectively. When the voltage is scaled to 0.3 V at the frequency of 1 MHz, there is enough timing margin to allow a further increase in operation frequency or throughput. However, significant noise appears in the FVDD signal and the error signal, as shown in Figure 10d. This noise will affect the EDAC function and the output signal if the supply voltage is further reduced. The simulation results indicate that the lowest operating voltage of the ETFF can be scaled to 0.3–0.6 V.
Figure 10. The results of 10 K MC simulations: (a) at 1.1 V, 500 MHz; (b) at 0.6 V, 100 MHz; (c) at 0.4 V, 5 MHz; (d) at 0.3 V, 1 MHz.
By replacing original FFs at monitored points selected by using the proposed MAHS method, 39 ETFFs are inserted into the PE array circuits of a CNN accelerator. Voltage scaling is also performed on CNN accelerator circuits to estimate the effectiveness and efficiency of the proposed ETFF design. The energy saving of up to 55.27% compared with the baseline circuit has been obtained without any loss in classification accuracy, when the operation voltage is scaled down to 0.5 V at the operating frequency of 100 MHz.
Table 5 shows the characteristics of the proposed ETFF design and other EDAC designs applied in NN accelerators. In comparison with other EDAC designs, the proposed ETFF causes a small area overhead of only 3.5%, because it uses only nine extra transistors and fewer monitored points. Although the design in [16] based on the TEC method of the DSTB [15] and TB [22] has less area overhead than ours, the proposed design achieves the largest energy saving (55.27% overall energy saving at 0.5 V), benefiting from the light-weight design and voltage scaling. Moreover, the proposed design reduces area overhead by 54.68% and improves energy efficiency by 53.69% at 0.6 V, compared with the design in [13], as discussed in Section 3.2.
Table 5. The characteristics of the proposed ETFF design and other EDAC designs applied in NN accelerators.

5. Conclusions

In this paper, a light-weight timing error detection and correction circuit design, namely, ETFF, is proposed to increase energy efficiency by scaling supply voltages down to the near-threshold voltage region. This transistor-level design utilizes a node transition signal detector with only nine transistors to detect timing errors. These errors are recovered during the same clock cycle by data selection in the proposed error correction design. Moreover, transistor sizing is used to optimize the trade-off between performance and overheads and to enable the ETFF to work stably in a wider voltage range of 1.1–0.3 V. Furthermore, monitored points are selected by using the proposed MAHS method, which simultaneously considers the coverage and activation rates of all flip-flops instead of only those on circuit paths with a timing slack smaller than 10–20% of the clock period. A baseline CNN accelerator using the SMIC 40 nm process can reliably operate under near-threshold voltages, benefiting from the proposed design and leading to 55.27% overall energy saving at 0.5 V. Additionally, the power overhead of timing error-tolerant circuits can also be considered in the selection of monitored points in future work. The proposed light-weight design can be more efficient in saving energy for larger circuits. As an example, deep neural network accelerators with a massive number of layers and weights that have to be recurrently calculated would benefit from the proposed design and will be considered in future work.

Author Contributions

Conceptualization, X.F. and H.L. (Hao Liu); methodology, X.F.; software, X.F. and H.L. (Hongwei Li); validation, J.H., X.F. and H.L. (Hongwei Li); formal analysis, J.H., X.F. and H.L. (Hongwei Li); investigation, X.F.; resources, H.L. (Hao Liu) and S.L.; data curation, X.F.; writing—original draft preparation, X.F.; writing—review and editing, X.F., H.L. (Hongwei Li); S.L. and J.H.; visualization, X.F.; supervision, J.H. and H.L. (Hao Liu); project administration, H.L. (Hao Liu) and X.F.; funding acquisition, H.L. (Hao Liu) and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the fundamental research funds for the central universities, grant number 3206002204C3 and the Natural Sciences and Engineering Research Council (NSERC) of Canada, grant number RES0048688. The APC was funded by Project 8506006040.

Acknowledgments

This work is supported by the fundamental research funds for the central universities under Project 3206002204C3 and the Natural Sciences and Engineering Research Council (NSERC) of Canada under Project RES0048688. Xuemei Fan is supported financially by the state-sponsored scholarship program administered by the China Scholarship Council (CSC).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dreslinski, R.G.; Wieckowski, M.; Blaauw, D.; Sylvester, D.; Mudge, T. Near-threshold computing: Reclaiming Moore’s Law Through Energy Efficient Integrated Circuits. Proc. IEEE 2010, 98, 253–266.
  2. Whatmough, P.N.; Lee, S.K.; Brooks, D.; Wei, G.-Y. DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications. IEEE J. Solid-State Circuits 2018, 53, 2722–2731.
  3. Kim, S.; Cerqueira, J.P.; Seok, M. A Near-Threshold Spiking Neural Network Accelerator with a Body-Swapping-Based In Situ Error Detection and Correction Technique. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2019, 27, 1886–1896.
  4. Agwa, S.; Yahya, E.; Ismail, Y. ERSUT: A Self-Healing Architecture for Mitigating PVT Variations without Pipeline Flushing. IEEE Trans. Circuits Syst. II Express Briefs 2016, 63, 1069–1073.
  5. Shin, D.; Choi, W.; Park, J.; Ghosh, S. Sensitivity-Based Error Resilient Techniques with Heterogeneous Multiply–Accumulate Unit for Voltage Scalable Deep Neural Network Accelerators. IEEE J. Emerg. Sel. Top. Circuits Syst. 2019, 9, 520–531.
  6. Zhang, J.; Rangineni, K.; Ghodsi, Z.; Garg, S. Thundervolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators. In Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA, 24–28 June 2018.
  7. Pandey, P.; Basu, P.; Chakraborty, K.; Roy, S. GreenTPU: Improving Timing Error Resilience of a Near-Threshold Tensor Processing Unit. In Proceedings of the 56th ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA, 2–6 June 2019.
  8. Zhang, J.; Ghodsi, Z.; Garg, S.; Rangineni, K. Enabling Timing Error Resilience for Low-Power Systolic-Array Based Deep Learning Accelerators. IEEE Des. Test 2019, 37, 93–102.
  9. Whatmough, P.N.; Lee, S.K.; Lee, H.; Rama, S.; Brooks, D.; Wei, G. A 28 nm SoC with a 1.2 GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 5–9 February 2017; pp. 242–243.
  10. Ghosh, A.; Naseem, M.S.; Kumar, C.I. Time-Borrowing Flip-Flop Architecture for Multi-Stage Timing Error Resilience in DVFS Processors. In Proceedings of the 2021 International Conference on Intelligent Technologies (CONIT), Hubli, India, 25–27 June 2021.
  11. Fan, X.; Wang, R.; Zeng, Q.; Liu, H.; Lu, S. A Simple Steady Timing Resilient Sample Based on Delay Data Sense Detection. In Proceedings of the 2019 IEEE 13th International Conference on ASIC (ASICON), Chongqing, China, 29 October–1 November 2019.
  12. Bull, D.; Das, S.; Shivashankar, K.; Dasika, G.S.; Flautner, K.; Blaauw, D. A Power-Efficient 32 bit ARM Processor Using Timing-Error Detection and Correction for Transient-Error Tolerance and Adaptation to PVT Variation. IEEE J. Solid-State Circuits 2010, 46, 18–31.
  13. Das, S.; Roberts, D.; Lee, S.; Pant, S.; Blaauw, D.; Austin, T.; Flautner, K.; Mudge, T. A Self-Tuning DVS Processor using Delay-error Detection and Correction. IEEE J. Solid-State Circuits 2006, 41, 792–804.
  14. Sharma, P.; Das, B.P. Design and Analysis of Leakage-Induced False Error Tolerant Error Detecting Latch for Sub/Near-Threshold Applications. IEEE Trans. Device Mater. Reliab. 2020, 20, 366–375.
  15. Bowman, K.A.; Tschanz, J.W.; Kim, N.S.; Lee, J.C.; Wilkerson, C.B.; Lu, S.-L.L.; Karnik, T.; De, V.K. Energy-Efficient and Metastability-Immune Resilient Circuits for Dynamic Variation Tolerance. IEEE J. Solid-State Circuits 2008, 44, 49–63.
  16. Lee, S.K.; Whatmough, P.N.; Brooks, D.; Wei, G.Y. A 16-nm Always-On DNN Processor with Adaptive Clocking and Multi-Cycle Banked SRAMs. IEEE J. Solid-State Circuits 2019, 54, 1982–1992.
  17. Zhang, H.; He, W.; Sun, Y.; Seok, M. An Area-Efficient Scannable In Situ Timing Error Detection Technique Featuring Low Test Overhead for Resilient Circuits. In Proceedings of the 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Munich, Germany, 1–4 November 2021.
  18. Sato, T.; Kunitake, Y. A Simple Flip-Flop Circuit for Typical-Case Designs for DFM. In Proceedings of the 8th International Symposium on Quality Electronic Design (ISQED’07), San Jose, CA, USA, 26–28 March 2007; pp. 539–544.
  19. Zhang, J.; Garg, S. FATE: Fast and Accurate Timing Error Prediction Framework for Low Power DNN Accelerator Design. In Proceedings of the 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 5–8 November 2018.
  20. Jain, A.; Veggetti, A.M.; Crippa, D.; Benfante, A.; Gerardin, S.; Bagatin, M. Radiation Tolerant Multi-Bit Flip-Flop System with Embedded Timing Pre-Error Sensing. IEEE J. Solid-State Circuits 2022, 57, 2878–2890.
  21. Uytterhoeven, R.; Dehaene, W. Design Margin Reduction Through Completion Detection in a 28-nm Near-Threshold DSP Processor. IEEE J. Solid-State Circuits 2021, 57, 651–660. [Google Scholar] [CrossRef]
  22. Choudhury, M.; Chandra, V.; Mohanram, K.; Aitken, R. TIMBER: Time Borrowing and Error Relaying for Online Timing Error Resilience. In Proceedings of the DATE, Dresden, Germany, 8–12 March 2010. [Google Scholar] [CrossRef]
  23. Hao, Z.; Xiang, X.; Chen, C.; Meng, J.; Ding, Y.; Yan, X. EDSU: Error Detection and Sampling Unified Flip-Flop with Ultra-Low Overhead. IEICE Electron. Express 2016, 13, 20160682. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Khayatzadeh, M.; Yang, K.; Saligane, M.; Pinckney, N.; Alioto, M.; Blaauw, D.; Sylvester, D. iRazor: Current-Based Error Detection and Correction Scheme for PVT Variation in 40-nm ARM Cortex-R4 Processor. IEEE J. Solid-State Circuits 2017, 53, 619–631. [Google Scholar] [CrossRef]
  25. Zhou, J.; Liu, X.; Lam, Y.H.; Wang, C.; Chang, K.H.; Lan, J.; Je, M. HEPP: A New In-Situ Timing-Error Prediction and Prevention Technique for Variation-Tolerant Ultra-Low-Voltage Designs. In Proceedings of the IEEE Asian Solid-State Circuits Conference (A-SSCC), Singapore, 11–13 November 2013; pp. 129–132. [Google Scholar]
  26. Shan, W.; Shang, X.; Shi, L.; Dai, W.; Yang, J. Timing Error Prediction AVFS With Detection Window Tuning for Wide-Operating-Range ICs. IEEE Trans. Circuits Syst. II Express Briefs 2017, 65, 933–937. [Google Scholar] [CrossRef]
  27. Fan, X.; Li, H.; Li, Q.; Wang, R.; Liu, H.; Lu, S. A Light-Weight Timing Resilient Scheme for Near-Threshold Efficient Digital ICs. In Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Ha Long, Vietnam, 8–10 December 2020; pp. 133–136. [Google Scholar]
  28. Markovic, D.; Nikolic, B.; Brodersen, R.W. Analysis and Design of Low-Energy Flip-Flops. In Proceedings of the IEEE International Symposium on Low Power Electronics and Design (ISLPED), Huntington Beach, CA, USA, 6–7 August 2001; pp. 52–55. [Google Scholar]
  29. Markovic, D.; Wang, C.C.; Alarcon, L.P.; Liu, T.-T.; Rabaey, J.M. Ultralow-Power Design in Near-Threshold Region. Proc. IEEE 2010, 98, 237–252. [Google Scholar] [CrossRef]
  30. Maheshwari, N.; Sapatnekar, S. Timing Analysis and Optimization of Sequential Circuits; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  31. Zhou, J.; Jayapal, S.; Busze, B.; Huang, L.; Stuyt, J. A 40 nm Dual-Width Standard Cell Library for Near/Sub-Threshold Operation. IEEE Trans. Circuits Syst. I Regul. Pap. 2012, 59, 2569–2577. [Google Scholar] [CrossRef]
  32. Iizuka, S.; Masuda, Y.; Hashimoto, M.; Onoye, T. Stochastic Timing Error Rate Estimation Under Process and Temporal Variations. In Proceedings of the IEEE International Test Conference (ITC), Anaheim, CA, USA, 6–8 October 2015. [Google Scholar]
  33. Brglez, F.; Bryan, D.; Kozminski, K. Combinational Profiles of Sequential Benchmark Circuits. In Proceedings of the IEEE International Symposium on Circuits and Systems, Portland, OR, USA, 8–11 May 1989; pp. 1929–1934. [Google Scholar]
  34. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
