Path Specific Register Design to Reduce Standby Power Consumption

Salman, Emre; Qi, Qi

doi:10.3390/jlpea1010131


by Emre Salman * and Qi Qi

Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794, USA

* Author to whom correspondence should be addressed.

J. Low Power Electron. Appl. 2011, 1(1), 131-149; https://doi.org/10.3390/jlpea1010131

Submission received: 25 November 2010 / Revised: 11 April 2011 / Accepted: 13 April 2011 / Published: 15 April 2011

(This article belongs to the Special Issue Selected Topics in Low Power Design - From Circuits to Applications)


Abstract

A methodology is proposed to design low leakage registers by considering the type of timing path, i.e., short or long, and type of register, i.e., launching or capturing. Three different dual threshold voltage registers are developed where each register trades, depending upon the timing path, a different timing constraint for reducing the leakage current. For example, the first proposed register is used as a launching register in a noncritical path, trading clock-to-Q delay for leakage current. Other timing constraints such as setup and hold times are kept the same so as not to introduce any timing violations. Alternatively, the second and third registers trade, respectively, setup time and hold time for leakage current while maintaining a constant clock-to-Q delay. The effect of the proposed methodology on leakage current is investigated for four technology nodes. The overall reduction in the leakage current of a register can exceed 90% while maintaining the clock frequency and other design parameters such as area and dynamic power the same. Three ISCAS 89 benchmark circuits are utilized to evaluate the methodology, demonstrating, on average, 23% reduction in the overall leakage current.

Keywords:

leakage current; low leakage register design; power consumption; static power; timing constraints; timing paths

1. Introduction

Power dissipation is a primary obstacle to further expanding the capabilities of modern CMOS integrated circuits. Miniaturization of the physical dimensions and advanced manufacturing technologies such as 3-D integration [1] and system-in-package [2] have tremendously increased the integration capability, to the point where power consumption has become the primary design barrier. A wide range of applications such as high performance microprocessors, ASICs, and systems-on-chip suffer from this limitation.

Multicore architectures have been proposed to maintain the clock frequency constant, thereby preventing the increase in power consumption [3,4]. Unfortunately, only the dynamic power is affected by the clock frequency whereas the overall static power continues to increase due to higher leakage current.

Traditionally, technology scaling has relied on enhancing the drive current capability by reducing the channel length and gate oxide thickness. Power supply voltage has also been reduced to satisfy reliability constraints. Decreasing the power supply voltage requires the threshold voltage to also be reduced to maintain high drive current capability. The reduction of the threshold voltage, however, exponentially increases the subthreshold leakage current [5]. Similarly, a reduction in the gate oxide thickness exponentially increases the quantum mechanical tunneling of the carriers through the oxide, producing significant gate leakage current [6].
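The exponential dependence on threshold voltage can be made concrete with the standard first-order subthreshold model, in which the leakage current is proportional to exp(-V_th / (n v_T)). The sketch below is illustrative only: the function name, the slope factor n = 1.5, the thermal voltage of about 25.85 mV (room temperature), and the 100 mV threshold shift are assumed values, not figures taken from this paper.

```python
import math


def leakage_ratio(delta_vth, n=1.5, thermal_voltage=0.02585):
    """Factor by which subthreshold leakage grows when V_th drops by delta_vth (volts).

    Assumes the first-order model I_sub ~ exp(-V_th / (n * v_T)); n and the
    thermal voltage are illustrative assumptions, not values from the paper.
    """
    return math.exp(delta_vth / (n * thermal_voltage))


# A hypothetical 100 mV reduction in threshold voltage.
print(round(leakage_ratio(0.100), 1))
```

Under these assumed parameters, a 100 mV reduction in threshold voltage raises the subthreshold leakage by roughly an order of magnitude, which is why replacing selected devices with high-V_th transistors is so effective.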

More than 40% of the total energy in the active mode can be dissipated due to idle transistors in modern systems-on-chip [7–9]. Furthermore, leakage current is the dominant source of energy consumption when the IC is in the idle mode, significantly degrading the battery life in portable devices.

ITRS identifies leakage power consumption as “a clear long term threat and a focus topic for design technology in the next 15 years” [10]. Projections of the overall power dissipation within an IC are plotted in Figure 1 based on ITRS predictions.

The contributions of the static and dynamic power are separately highlighted, assuming a switching activity of 0.5 and constant clock frequency in each technology node. As illustrated in this figure, overall static power dominates dynamic power in deep submicrometer CMOS technologies. High variability of the leakage current due to process variations further exacerbates this issue [11].

The development of alternative gate dielectric materials with higher permittivity, i.e., high-K dielectric, and metal gate transistors permit thicker dielectric layers, significantly reducing the gate leakage current [12,13]. The continuation of technology scaling below 45 nm has been possible partly due to this progress at the device level. As the gate leakage current has been significantly reduced, subthreshold leakage has become the dominant component of static power dissipation.

Various methodologies have been proposed to alleviate subthreshold leakage current consumption such as multi-threshold voltage CMOS (MTCMOS), also referred to as power gating [14], dynamic adjustment of the threshold voltage through body biasing [15], and multi-threshold voltage transistors, also referred to as dual threshold voltage (dual-V_th) partitioning [16]. These existing approaches have several limitations, particularly for low leakage register design, as further described in Section 2.

A comprehensive methodology is proposed in this paper to design path specific dual-V_th, low leakage registers while simultaneously considering clock-to-Q delay, setup time, hold time, type of timing path (short or long), and type of register (launching or capturing). Existing dual-V_th based registers reduce the leakage current only along the feedback path to not affect the timing constraints [17–19]. This traditional approach significantly limits the amount of leakage that can be reduced, particularly in sub 22 nm CMOS technologies. Furthermore, in conventional approaches, the hold time of the register may be affected which may produce a timing violation depending upon the type of timing path and register. These limitations of the existing approaches are overcome with the proposed design methodology while significantly increasing the amount of leakage current that is reduced.

The rest of the paper is organized as follows. Existing multi-threshold voltage based leakage reduction techniques are summarized in Section 2. Background material reviewing different types of timing paths and timing constraints of a register is provided in Section 3. A methodology is described in Section 4 to design path specific registers with low leakage current. The results are discussed in Section 5. Finally, the paper is concluded in Section 6.

2. Previous Work

Existing techniques to reduce leakage current are summarized in this section with an emphasis on multi-threshold voltage design. Related limitations of these techniques are also discussed.

MTCMOS is a commonly used leakage reduction technique where a high threshold voltage (high-V_th) sleep transistor is placed between the circuit and power supply and/or ground node, as shown in Figure 2.

When the circuit operates in the idle mode, the high-V_th sleep transistor is cut off, disconnecting the circuit from the power supply voltage and/or ground node. During the active mode, the sleep transistor is on and the combinational circuit consisting of low threshold voltage (low-V_th) transistors operates normally. The drain of the sleep transistor is referred to as virtual power (if the sleep transistor is placed between the circuit and power supply) and virtual ground (if the sleep transistor is placed between the circuit and ground node). Subthreshold leakage current is reduced during the idle mode since the sleep transistor behaves as a large resistance between the combinational circuit and power supply and/or ground node.

There are, however, several limitations of MTCMOS. When the mode of operation changes from idle to active, the circuit requires a specific amount of time to charge the virtual power node or discharge the virtual ground node. This required time is referred to as wake up latency [20]. Several clock cycles are typically required for the virtual ground or power to stabilize. Furthermore, the circuit may experience ground bounce during this time, affecting the reliable operation of nearby logic circuits.

Another limitation of MTCMOS that is more related to this paper is its application to memory elements such as a register. MTCMOS cannot be directly applied to a register since the state of the register should be preserved even when the register is in the idle mode. In conventional MTCMOS, however, the idle circuit is disconnected from the power supply voltage and the state of the circuit is lost. Several different versions of MTCMOS have been developed specifically for register design to alleviate this issue [8,14,21–23]. These techniques, however, require additional inverters and transmission gates, decreasing the amount of power that can be reduced while also increasing the overall area.

Exploiting the dependence of the threshold voltage on bulk potential has also been proposed to dynamically adjust the threshold voltage, referred to as adaptive body biasing [15]. During idle mode, the substrate of the circuit is reverse biased to increase the threshold voltage, thereby reducing the leakage current. The primary drawback of this methodology is the difficulty of generating the bias voltage for the substrate in a power efficient way. Control circuitry is also required, further decreasing the power efficiency.

Another technique to reduce the leakage current is based on utilizing the multi-threshold voltage transistors that are provided by the manufacturing technology. This technique is also referred to as dual-V_th partitioning [24]. Those logic gates that are not part of the critical path are replaced with high-V_th transistors to reduce the leakage current by exploiting the excessive slack. Alternatively, those gates along the critical path are implemented with low-V_th transistors to satisfy the timing constraints, as depicted in Figure 3.

A similar approach has been developed to design the registers. Those transistors that are not located along the clock-to-Q delay path have been replaced with high-V_th devices to reduce the leakage current within a register [17–19]. Unfortunately, in these existing approaches, the number of high-V_th transistors is relatively small, limiting the overall reduction in the leakage current. Furthermore, since these transistors are not located along the clock-to-Q delay path, the size of these transistors is typically small. Alternatively, those transistors that are located along the clock-to-Q delay path are typically sized larger, making leakage current more significant in these transistors. Another important limitation of the existing approaches is the inability to consider important timing constraints such as setup and hold times. The type of timing path, i.e., short or long, and the type of register, i.e., launching or capturing, significantly affect the design process of low leakage registers, as demonstrated in this paper. Ignoring these effects not only decreases the amount of leakage current that can be reduced, but may also affect reliable circuit operation since the timing constraints may be violated. Thus, application of dual-V_th partitioning to the design process of a register requires additional attention. A methodology is proposed in this paper to design dual-V_th, low leakage registers by simultaneously considering the clock-to-Q delay, setup time, hold time, and the type of register and timing path. The simultaneous consideration of these parameters is critical to exploit multi-threshold voltage transistors and to guarantee system functionality and timing in deep submicrometer CMOS technologies.

3. Background

Timing characteristics of synchronous systems are briefly introduced in Section 3.1. The timing constraints of a register, i.e., setup and hold times, are reviewed in Section 3.2.

3.1. Timing Characteristics of Synchronous Systems

A simple synchronous digital circuit consisting of two sequentially-adjacent registers with a combinational circuit between these registers is shown in Figure 4.

The first register is referred to as the launching register whereas the second register is called the capturing register.

Two inequalities should be satisfied for this circuit to function properly [25]. Referring to Figure 4, the first inequality is

T_{Cf} + T_{CP} \geq T_{Ci} + T_{D} + T_{S} \qquad (1)

where T_Ci and T_Cf are the delay for the clock signals to arrive, respectively, at the launching and capturing registers. Note that T_Ci and T_Cf are also referred to as, respectively, the delay of the clock launch path and clock capture path. T_CP is the clock period. T_D is the data path delay consisting of the clock-to-Q delay of the launching register, logic delay of the combinational circuit, and the interconnect delay. T_S is the setup time of the capturing register. Note that (1) determines the maximum speed of the circuit, making this inequality important for critical paths.

The second inequality that needs to be satisfied is

T_{Ci} + T_{D} \geq T_{Cf} + T_{H} \qquad (2)

where T_H is the hold time of the capturing register. This inequality guarantees that no race condition exists, i.e., the data is not latched to the final register within the same clock edge. Note that (2) is relatively more important for those timing paths where the data path delay is small, i.e., short paths, such as a shift register or counter.

These inequalities, the type of data path (short versus long), and the type of register (launching and capturing) play an important role in the design of low leakage, dual-V_th registers, as described in Section 4. The timing constraints of a register and related circuit level issues are described in the following section.

3.2. Timing Constraints of a Register

Inequalities (1) and (2) require a difference called a skew to be larger than or equal to a timing constraint. These inequalities, therefore, can be rewritten as [25]

\text{Setup skew} \geq T_{S} \qquad (3)

\text{Hold skew} \geq T_{H} \qquad (4)

where the setup skew and hold skew are, respectively

\text{Setup skew} = T_{Cf} + T_{CP} - (T_{Ci} + T_{D}) \qquad (5)

\text{Hold skew} = T_{Ci} + T_{D} - T_{Cf} \qquad (6)

Note the important difference between setup-hold skews and setup-hold times: Setup and hold skews refer to any time difference between the data and clock signals whereas the setup and hold times refer to the minimum required time difference to reliably capture and store the data.
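As a minimal sketch of how (3) through (6) are evaluated for a single timing path, the Python fragment below computes both skews and compares them against the setup and hold times. The class name, function names, and all numerical values (in picoseconds) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TimingPath:
    """Delays of one register-to-register path, all in picoseconds."""
    t_ci: int  # clock delay to the launching register
    t_cf: int  # clock delay to the capturing register
    t_cp: int  # clock period
    t_d: int   # data path delay (clock-to-Q + logic + interconnect)
    t_s: int   # setup time of the capturing register
    t_h: int   # hold time of the capturing register


def setup_skew(p: TimingPath) -> int:
    # Equation (5)
    return p.t_cf + p.t_cp - (p.t_ci + p.t_d)


def hold_skew(p: TimingPath) -> int:
    # Equation (6)
    return p.t_ci + p.t_d - p.t_cf


def meets_timing(p: TimingPath) -> Tuple[bool, bool]:
    """Return (setup satisfied, hold satisfied) according to (3) and (4)."""
    return setup_skew(p) >= p.t_s, hold_skew(p) >= p.t_h


# Hypothetical long and short paths; the numbers are illustrative only.
long_path = TimingPath(t_ci=100, t_cf=120, t_cp=1000, t_d=400, t_s=50, t_h=20)
short_path = TimingPath(t_ci=100, t_cf=120, t_cp=1000, t_d=30, t_s=50, t_h=20)

print(meets_timing(long_path))   # (True, True)
print(meets_timing(short_path))  # (True, False): hold skew of 10 ps < 20 ps
```

In this example the short path has ample setup skew but violates the hold constraint, which is the situation that makes (2) the governing inequality for short paths.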

Transistor level realization of a widely used master slave type, edge triggered register is illustrated in Figure 5.

According to the setup time constraint, the data signal should be stable at the input of a register for a sufficient amount of time before the active edge of the clock signal. In the example shown in Figure 5, the active edge is a low-to-high transition of the clock signal since the data propagates to the output after this transition. Setup time guarantees that the data is reliably latched to the master before the rising edge of the clock signal arrives. Ideally, the data signal should propagate through TG1 and INV1, arriving at the output of INV1 before the rising edge of the clock signal. According to this condition, the path that determines the setup time consists of TG1 and INV1, as depicted in Figure 5. This condition, however, may require a relatively large setup time. A conventional technique to characterize the setup time constraint of a register is to examine the setup skew versus clock-to-Q delay relationship, as shown in Figure 6(a) [25–27].

The smallest setup skew that corresponds to the nominal clock-to-Q delay is approximately equal to the summation of the two delays: TG1 and INV1. As the setup skew is further reduced, clock-to-Q delay gradually increases since for smaller setup skews, the data signal cannot reach the output of INV1. After a specific point, the clock-to-Q delay starts to exponentially increase due to a race condition at node r since this node is simultaneously driven by two gates: TG1 and TG2. The race condition occurs between the new data driven by TG1 and old data driven by TG2. This region is referred to as metastable and therefore avoided during the characterization process. Typically, a 10% degradation in clock-to-Q delay is allowed while characterizing the setup time, as shown in Figure 6(a).

According to the hold time constraint, the data signal should be stable at the input of a register for a sufficient amount of time after the active edge of the clock signal. This constraint is due to non-ideal characteristics of TG1 as a switch. If the hold time constraint is not satisfied, the new data can be latched into the register and overwrite the previous valid data during the same clock cycle. Note that hold time can sometimes be smaller than zero. In this case, even if the new data propagates through TG1, a race condition exists at node r between the new and old data. If the old data succeeds over the new data, the register works correctly and the negative hold time is valid. The hold time constraint is therefore partly determined by the relative drive strengths of TG1 and TG2. Note that, if the hold skew is further reduced, the clock-to-Q delay exponentially increases, as shown in Figure 6(b). Similar to setup time characterization, a 10% degradation in clock-to-Q delay is allowed while characterizing the hold time.
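The 10% degradation criterion used for both constraints can be sketched as a simple search over a simulated skew sweep. The function below is an illustrative assumption about how such a characterization might be scripted, not the authors' procedure: it takes the nominal clock-to-Q delay at the largest skew in the sweep, walks toward smaller skews, and returns the last skew for which the delay stays within the allowed degradation. No interpolation between sweep points is performed, and the sample data are invented.

```python
def characterize_constraint(sweep, degradation=0.10):
    """Return the smallest skew whose clock-to-Q delay stays within the limit.

    sweep: iterable of (skew, clock_to_q_delay) pairs from a simulation sweep.
    The nominal delay is taken at the largest skew in the sweep.
    """
    points = sorted(sweep, reverse=True)  # largest skew first
    if not points:
        raise ValueError("sweep must contain at least one point")
    nominal = points[0][1]
    limit = nominal * (1.0 + degradation)
    constraint = points[0][0]
    for skew, delay in points:
        if delay > limit:
            break  # entering the region of sharply rising delay
        constraint = skew
    return constraint


# Hypothetical sweep (skew in ps, clock-to-Q delay in ps).
sweep = [(100, 50.0), (80, 50.0), (60, 50.5), (40, 52.0),
         (30, 54.9), (20, 60.0), (10, 90.0)]
print(characterize_constraint(sweep))  # 30
```

The same routine applies to hold time characterization if the sweep contains hold skews instead of setup skews.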

These timing constraints (setup and hold times) and clock-to-Q delay play an important role in the design process of low leakage, dual-V_th registers. When specific transistors within a register are replaced with high-V_th devices to reduce leakage current, the timing constraints may change. Ignoring this effect may produce timing violations, causing a degradation in clock frequency or functional failure. The proposed methodology overcomes this limitation, as described in the following section.

4. Proposed Methodology

As described in Section 2, existing work on dual-V_th based register design does not consider different types of data paths and registers. Referring to Figure 5, a typical approach is to design TG1, INV1, TG3, and INV3 with low-V_th transistors to improve the setup time and clock-to-Q delay. The remaining inverters and transmission gates that are located along the feedback path are designed with high-V_th devices to minimize the leakage current. This approach, however, is not practical for all of the timing paths. For example, in a short path, reduced clock-to-Q delay may not be desirable according to the second inequality given by (2). The amount of leakage current that can be reduced is also limited since all of the transistors located along the forward signal path, i.e., within TG1, INV1, TG3, and INV3, are low-V_th devices. Note that these transistors are typically sized larger to minimize clock-to-Q delay and setup time. The leakage current is therefore relatively more important for these transistors as compared to those that are located along the feedback paths.

The design process of a dual-V_th, low leakage register is therefore strongly dependent upon the type of data path, i.e., long (critical), noncritical, and short; and type of register, i.e., launching or capturing, as illustrated in Figure 4. Three different types of dual-V_th registers that consider these dependencies are proposed in this paper, as described in Section 4.1. Assigning the proper threshold voltage to each transistor within these registers is discussed in Section 4.2. The amount of leakage that can be reduced by utilizing the proposed registers is evaluated in Section 4.3. Finally, simulation results based on three ISCAS 89 benchmark circuits are provided in Section 4.4.

4.1. Path Specific Dual-V_th Register Design

The type of timing path and register should be considered during the design process of a dual-V_th, low leakage register. Consider, for example, a launching register in a noncritical or short path. In this case, clock-to-Q delay of the register is not critical and therefore can be traded to reduce leakage current. Similarly, for a capturing register in a noncritical or short path, (2) is the important inequality and the setup time of this register is not critical. Setup time therefore can be traded to achieve low leakage in a capturing register of a noncritical or short path. Existing techniques cannot utilize this opportunity since the transistors located along the clock-to-Q delay and setup path are realized with low-V_th devices. Finally, consider a capturing register in a critical path. In this case, the hold time is not critical since (1) is the important constraint. Hold time therefore can be traded to achieve low leakage in a capturing register of a critical or long path. Additional constraints, however, exist for each of these three cases to guarantee that both (1) and (2) are satisfied after specific transistors are replaced with high-V_th devices.

Three different types of dual-V_th registers are proposed depending on the type of data path and register, as summarized in Table 1 and described in the following:

Register 1

This register is designed to replace launching registers in noncritical or short paths. Since there is excessive setup slack in noncritical paths, the primary objective is to trade clock-to-Q delay for leakage current. Both setup and hold times of the register, however, should remain the same (or be reduced) since this register behaves as a capturing register for the previous data path, which may be a critical or short path. Thus, to guarantee that the timing characteristics of the previous path are not affected, the setup and hold times of the register should not increase.

Register 2

This register is designed to replace capturing registers in noncritical or short paths. Due to excessive setup slack, the primary objective is to trade setup time for leakage current. The clock-to-Q delay of the register, however, should remain the same (or be reduced) since this register behaves as a launching register for the following data path, which may be a critical path. Furthermore, the hold time should also remain the same (or be reduced) since for a short data path, (2) is critical. Note that this second register is effective in reducing leakage current since the setup time is relatively more important in advanced technologies, as shown in Figure 7. According to this figure, starting from the 22 nm technology node, the setup time of the register is higher than the clock-to-Q delay. Thus, the opportunity to trade setup time for leakage current should not be overlooked. Note that the setup time has been characterized using the procedure described in Section 3.2.

Register 3

The third register is designed to replace capturing registers in critical paths. The primary objective is to trade hold time for leakage current since in a critical path, (1) is important and hold slack is typically large. The clock-to-Q delay should remain the same (or be reduced) since the register behaves as a launching register for the following data path, which may also be a critical path. Furthermore, the setup time should also remain the same (or be reduced) since for a critical path, (1) is important.
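The selection rules for the three registers can be summarized as a small lookup. The following Python sketch is only an illustration of the mapping described above: the names are hypothetical, and falling back to the original all low-V_th register for a launching register in a critical path is an assumption made here, since that case is not assigned one of the three proposed registers.

```python
ROLES = ("launching", "capturing")
PATH_TYPES = ("critical", "noncritical", "short")

# (register role, path type) -> proposed register
_SELECTION = {
    ("launching", "noncritical"): "Register 1",
    ("launching", "short"): "Register 1",
    ("capturing", "noncritical"): "Register 2",
    ("capturing", "short"): "Register 2",
    ("capturing", "critical"): "Register 3",
}

# Timing parameter that each proposed register trades for leakage current.
TRADED_PARAMETER = {
    "Register 1": "clock-to-Q delay",
    "Register 2": "setup time",
    "Register 3": "hold time",
}


def select_register(role, path_type):
    """Pick a register type for a given role and timing path type."""
    if role not in ROLES or path_type not in PATH_TYPES:
        raise ValueError("unknown role or path type")
    # Assumption: cases not covered by the three registers keep the original cell.
    return _SELECTION.get((role, path_type), "original low-V_th register")


choice = select_register("capturing", "critical")
print(choice, "trades", TRADED_PARAMETER[choice])  # Register 3 trades hold time
```

Because every register launches data into one path and captures data from another, the parameters that are not traded must stay the same or improve, as stated for each register above.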

4.2. Threshold Voltage Assignment

An edge triggered D type flip-flop with 2X drive capability is chosen from an industrial standard cell library. The transistor level schematic of the register is illustrated in Figure 8, including the W/L ratios of each transistor.

Note that in the master latch, a tristate inverter is used that combines the TG1 and INV1 of Figure 5. Similarly, the feedback of the master latch also utilizes a tristate inverter. This schematic and W/L ratios are used in the simulations without any modification.

In the original version, the register shown in Figure 8 is designed using only low-V_th transistors. To design Register 1, high-V_th devices are used for those transistors located along the clock-to-Q delay path, i.e., M13, M14, M17, M18, M19, M20, M21, and M22. Clock-to-Q delay is therefore traded to reduce leakage current. Note that the setup and hold times of the register remain the same since these transistors do not affect the timing constraints of the register.

To design Register 2, high-V_th transistors are used only for M2 and M3 to trade setup time for leakage current. Note that M5 and M6 are designed using low-V_th transistors even though this inverter is along the setup path, as illustrated in Figure 5. However, as described in the previous section, clock-to-Q delay and hold time of the register should remain the same. Replacing M5 and M6 with high-V_th transistors affects the clock-to-Q delay since this inverter drives the input of the slave latch.

Finally, to design Register 3, high-V_th transistors are used for M7, M8, M9, and M10 to trade hold time for leakage current. Note that the feedback path becomes weaker due to high-V_th transistors. As such, hold time increases since it is more difficult for the old data to overwrite the new data at the output of the first gate, thereby requiring a larger hold time constraint. Low-V_th devices are used for the remaining transistors to guarantee that the clock-to-Q delay and setup time remain the same. For example, M1, M2, M3, and M4 directly affect the setup time constraint and are therefore designed with low-V_th transistors. The threshold voltage assignments of all of the transistors are listed in Table 2 for each register.
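The assignments described in this section can be captured in a compact data structure. The sketch below records only the high-V_th transistors named in the text for each register and treats every other transistor as low-V_th; the dictionary and function names are hypothetical, and transistor names are not validated against the full schematic of Figure 8.

```python
# High-V_th transistors per proposed register, as named in the text.
HIGH_VTH = {
    "Register 1": {"M13", "M14", "M17", "M18", "M19", "M20", "M21", "M22"},
    "Register 2": {"M2", "M3"},
    "Register 3": {"M7", "M8", "M9", "M10"},
}


def threshold_type(register, transistor):
    """Return 'high' or 'low' for a transistor in one of the proposed registers."""
    return "high" if transistor in HIGH_VTH[register] else "low"


print(threshold_type("Register 2", "M2"))  # high
print(threshold_type("Register 2", "M5"))  # low, to preserve clock-to-Q delay
print({name: len(devices) for name, devices in HIGH_VTH.items()})
```

Counting the entries shows that Register 1 uses eight high-V_th devices, compared with two and four for Registers 2 and 3.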

4.3. Reduction in the Leakage Current

The amount of reduction in the leakage current achieved by utilizing the proposed three registers is evaluated in this section. Four CMOS technology generations, 45 nm, 32 nm, 22 nm, and 16 nm, are considered using a predictive technology model [28,29].

The register illustrated in Figure 8 is simulated for each technology node where the W/L ratios of the transistors are maintained constant. The leakage current drawn from the power supply is evaluated for the three registers and the results are compared with the leakage current of the original register where only low-V_th transistors are used.

The results are illustrated in Figure 9. Note that for the first register, the state of the clock signal does not change the results since all of the high-V_th transistors are within the slave latch. For the second and third registers, however, high-V_th transistors exist within the tristate inverters. The state of the clock signal is therefore important in evaluating the results. For example, for the second register, clock signal should be at V_SS to guarantee that the initial tristate inverter is not in the high impedance state. Similarly, for the third register, clock signal should be at V_DD so that the second tristate inverter located along the feedback path is not in the high impedance state. The leakage current of the original register is therefore compared with the first two registers and third register when the clock signal is, respectively, at V_SS and V_DD.

The leakage current increases with technology scaling, exhibiting a large jump at the 16 nm node. A significant amount of reduction in the leakage current, 79% on average, is achieved by the first register since the number of high-V_th transistors is higher, as listed in Table 2. The second register also achieves a considerable amount of reduction in the leakage current, 13% on average and higher for technology nodes below 32 nm, since the importance of setup time has been increasing with technology scaling, as depicted in Figure 7. The reduction in the leakage current obtained by the third register is relatively smaller, as further discussed in Section 5. All of the results are listed in Table 3 where the absolute reduction in the leakage current is also provided for each case.

The timing constraints (setup and hold times) and clock-to-Q delay of the three registers are characterized as described in Section 3.2. As listed in Table 4, all of the three registers satisfy the required timing constraints listed previously in Table 1.

Specifically, for the first register, setup and hold times are slightly reduced as compared to the original register whereas clock-to-Q delay increases, on average, by 24.6 ps to improve the leakage current. The required condition is therefore satisfied since the setup and hold times do not increase. For the second register, setup time increases, on average, by 13.3 ps to reduce the leakage current. Alternatively, clock-to-Q delay remains the same whereas hold time is reduced, thereby satisfying the required condition. Note that the hold time is reduced since M2 and M3 are high-V_th transistors in this register. It is therefore more difficult for the input data to propagate to the output of the first tristate inverter, requiring a shorter hold time. For the third register, setup time and clock-to-Q delay remain approximately the same whereas hold time increases, on average, by 1.7 ps to reduce the leakage current. The last register therefore also satisfies the required timing constraints.

4.4. Simulation Results

Three ISCAS 89 benchmark circuits, s27, s526, and s1423, are utilized in this section to better evaluate the efficacy of the proposed methodology on functional circuits rather than only on a register [30]. The total number of gates in these sequential circuits is, respectively, 8, 141, and 490 whereas the total number of registers is, respectively, 3, 21, and 74.

First, the leakage current of the circuits is analyzed when the registers are designed only with low-V_th transistors. In the second step, the registers within each sequential circuit are replaced with the proposed registers based on the type of timing path. Since the critical paths are typically a small percentage of the overall circuit, Register 1 and Register 2 can be effectively utilized to trade, respectively, clock-to-Q delay and setup time for leakage power. In the last step, the methodology proposed in [17–19] is evaluated by replacing the low-V_th transistors along the feedback path of a register (M7 to M10, M15, and M16 in Figure 8) with high-V_th transistors. The overall reduction in leakage current is compared for each case in four different technologies. Note that the register illustrated in Figure 8 is used for all of the circuits. Predictive device models are used for each technology [28,29]. The analysis is performed using H-SPICE [31].

The results of the analysis are listed in Table 5.

As summarized in this table, the proposed methodology achieves a significant reduction in the overall leakage current. Average reduction over three circuits and four technologies is approximately 23%. Note that the overall reduction in the leakage current increases as the size of the circuit grows and the ratio of the number of registers to the overall number of gates increases. Also note that according to these results, the reduction achieved by the methodology described in [17] is negligible due to two reasons: (1) As illustrated in Figure 8, the feedback path of the master latch consists of a tristate inverter. Leakage current in a tristate inverter is significantly less than in a regular inverter due to increased impedance between the power supply and ground; (2) The feedback path of the slave latch consists of only a transmission gate. The results provided in [17] assume a different register architecture, as shown in Figure 5. For this architecture, there is an inverter along the feedback path of both master and slave latches, thereby increasing the overall reduction in leakage. In this work, the register is chosen from an industrial cell library without any modification. Note that the proposed methodology achieves a higher reduction in leakage current as compared to [17] even for the register shown in Figure 5 since the number of high-V_th transistors is higher in the proposed dual-V_th registers. Also note that the effect of high-V_th transistors on setup and hold times is not considered in [17]. This effect can be significant since an unexpected increase in the setup or hold times can produce a timing violation, as described in Section 3.2.

5. Discussion and Future Study

According to the results presented in the previous section, the first register achieves the highest reduction for two reasons: (1) the greatest number of high-V_th transistors is used in this register and (2) the width of these transistors is relatively large to reduce the clock-to-Q delay. The second register also achieves a reasonable amount of reduction whereas the reduction achieved by the third register is small (2.5% on average) for two reasons: (1) the stack effect within the tristate inverter increases the standby impedance between the power supply voltage and ground node and (2) since this tristate inverter is located along the feedback path, the width of the transistors is smaller, decreasing the leakage current. Note however that this leakage reduction is achieved without degrading the clock frequency. Area and dynamic power also remain the same. Furthermore, the absolute leakage reduction achieved by the third register is 20 nA in the 16 nm technology node. Even though the percent reduction is small, when a large number of registers is considered, the absolute reduction can reach the milliampere range. When the first two registers are also considered, the overall savings in the standby power consumption of a register significantly increase.
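The two figures quoted for the third register can be reproduced from Table 3 with simple arithmetic. The sketch below averages the four percent reductions and scales the 20 nA per-register saving at 16 nm by a register count; the register counts themselves are hypothetical and serve only to show when the total reaches one milliampere.

```python
# Third register, transcribed from Table 3.
REG3_REDUCTION_PCT = (5.7, 2.7, 0.85, 0.6)  # 45, 32, 22, and 16 nm
REG3_REDUCTION_16NM_NA = 20                 # absolute reduction per register, in nA

ONE_MILLIAMPERE_NA = 1_000_000


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def total_reduction_na(register_count, per_register_na=REG3_REDUCTION_16NM_NA):
    """Total leakage reduction in nA for a given number of registers."""
    return register_count * per_register_na


def registers_for_target(target_na, per_register_na=REG3_REDUCTION_16NM_NA):
    """Smallest register count whose total reduction reaches target_na."""
    return -(-target_na // per_register_na)  # integer ceiling division


if __name__ == "__main__":
    print(f"mean reduction: {mean(REG3_REDUCTION_PCT):.2f}%")
    print(f"registers for 1 mA: {registers_for_target(ONE_MILLIAMPERE_NA)}")
```

Under these numbers, a design needs on the order of fifty thousand such registers before the saving from the third register alone reaches one milliampere.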

Note also that three dual-V_th registers have been proposed, each for a specific type of timing path (critical or noncritical) and register (launching or capturing), as listed in Table 1. Two additional registers that achieve enhanced reduction in the leakage current can be designed based on the proposed registers. Consider, for example, the first proposed register (launching in a noncritical path), which behaves as a capturing register for the previous path. If the previous path is also noncritical, as depicted in Figure 10, not only clock-to-Q delay, but also setup time can be traded to reduce the leakage current within this register.

In this case, the number of high-V_th transistors becomes higher, increasing the overall reduction in the leakage current. According to Table 3, the overall reduction, which corresponds to the summation of the reduction achieved by the first and second registers, exceeds 90% for sub-45 nm technology nodes. Alternatively, if the previous path is a critical path, not only clock-to-Q delay, but also hold time can be traded to reduce the leakage current. The overall reduction in this case is approximately equal to the summation of the reduction achieved by the first and third registers.
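The summation described above can be reproduced directly from the percent reductions in Table 3. The sketch below follows the text in simply adding the per-register percentages at each technology node; the names used are illustrative assumptions.

```python
# Percent leakage reduction per register, transcribed from Table 3.
NODES_NM = (45, 32, 22, 16)
REDUCTION_PCT = {
    "register_1": (79.2, 82.9, 76.6, 77.0),  # trades clock-to-Q delay
    "register_2": (5.3, 11.4, 18.5, 17.8),   # trades setup time
    "register_3": (5.7, 2.7, 0.85, 0.6),     # trades hold time
}


def combined_reduction(first, second):
    """Sum the per-node percent reductions of two registers."""
    return {
        node: a + b
        for node, a, b in zip(NODES_NM, REDUCTION_PCT[first], REDUCTION_PCT[second])
    }


# Previous path noncritical: clock-to-Q delay and setup time are both traded.
clk_to_q_and_setup = combined_reduction("register_1", "register_2")
# Previous path critical: clock-to-Q delay and hold time are both traded.
clk_to_q_and_hold = combined_reduction("register_1", "register_3")

nodes_above_90 = [node for node, pct in clk_to_q_and_setup.items() if pct > 90.0]

if __name__ == "__main__":
    print("clock-to-Q + setup:", clk_to_q_and_setup)
    print("clock-to-Q + hold: ", clk_to_q_and_hold)
    print("nodes above 90%:   ", nodes_above_90)
```

The first combination exceeds 90% at the 32, 22, and 16 nm nodes but not at 45 nm, which matches the sub-45 nm qualifier in the text.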

The primary disadvantage of the proposed methodology is the degradation in the robustness of a circuit. For example, the clock-to-Q delay of a launching register in a noncritical path is traded for the leakage current. Thus, the available timing slack of this data path is reduced. A reduced timing slack typically corresponds to a higher sensitivity to variations. The overall robustness is therefore degraded. Note however that this disadvantage is a common limitation in a large number of low power design techniques that rely on exploiting excessive slack.
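The loss of slack can be illustrated with the standard setup constraint of a register-to-register path, slack = T_clk - (t_clk-to-Q + t_logic + t_setup), assuming zero clock skew. In the sketch below the clock-to-Q and setup values are taken from Table 4 (45 nm), while the clock period and the combinational logic delay are hypothetical values chosen only for illustration.

```python
def setup_slack_ps(clock_period_ps, clk_to_q_ps, logic_delay_ps, setup_time_ps):
    """Setup slack of a register-to-register path, assuming zero clock skew."""
    return clock_period_ps - (clk_to_q_ps + logic_delay_ps + setup_time_ps)


# Hypothetical noncritical path (not from the paper).
CLOCK_PERIOD_PS = 500.0
LOGIC_DELAY_PS = 300.0

# Table 4, 45 nm technology node.
ORIGINAL_CLK_TO_Q_PS = 20.0
REGISTER_1_CLK_TO_Q_PS = 45.0
ORIGINAL_SETUP_PS = 16.5  # the capturing register is left unchanged here

slack_original = setup_slack_ps(
    CLOCK_PERIOD_PS, ORIGINAL_CLK_TO_Q_PS, LOGIC_DELAY_PS, ORIGINAL_SETUP_PS
)
slack_register_1 = setup_slack_ps(
    CLOCK_PERIOD_PS, REGISTER_1_CLK_TO_Q_PS, LOGIC_DELAY_PS, ORIGINAL_SETUP_PS
)

if __name__ == "__main__":
    print(f"slack with original launching register: {slack_original} ps")
    print(f"slack with Register 1 as launcher:      {slack_register_1} ps")
    print(f"slack given up for leakage:             {slack_original - slack_register_1} ps")
```

Whatever clock period and logic delay are assumed, the slack given up equals the increase in clock-to-Q delay, 25 ps at the 45 nm node, so the path remains functional only if its original slack exceeds that amount with some margin for variations.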

Finally, note that the results presented in this paper are based on a specific type of register. A similar methodology can be applied to other types of registers where clock-to-Q delay, setup, and hold times are traded to reduce the leakage current without affecting the clock frequency. The numerical results may change depending upon the transistor level design of a register. The effect of different register architectures on leakage reduction can therefore be investigated as future work. Application of the proposed methodology to pulsed latches also remains a topic for future study.

6. Conclusions

A methodology has been proposed to design low leakage registers, minimizing standby power dissipation. Traditional dual-V_th registers utilize high-V_th transistors only along the feedback path of the master and slave latches where the overall reduction in leakage current is limited. As opposed to existing techniques, a register design methodology that considers the type of timing path (short or long) and register (launching or capturing) is developed. Three different dual-V_th registers are introduced where the first register trades clock-to-Q delay for leakage current, achieving, on average, 79% reduction in leakage current. The second and third registers trade, respectively, setup time and hold time to further reduce the leakage current. Depending on the type of timing paths, the overall reduction in the leakage current of a register can exceed 90%. Furthermore, an average reduction of 23% in leakage current is demonstrated for three ISCAS 89 benchmark circuits. Clock frequency and other design parameters such as area and dynamic power remain the same.

Figure 1. Projections of the IC overall power dissipation normalized to 45 nm technology node, highlighting the dominance of static power over dynamic power.

Figure 2. Multi-threshold voltage CMOS (MTCMOS) design to reduce leakage current: (a) sleep transistor is placed between the circuit and power supply; (b) sleep transistor is placed between the circuit and ground node.

Figure 3. Dual threshold voltage partitioning to reduce leakage current while maintaining clock frequency the same.

Figure 4. Simple synchronous circuit consisting of a combinational logic and two types of registers: Launching and capturing.

Figure 5. Transistor level schematic of a widely used master slave type edge triggered register, illustrating the paths for clock-to-Q delay and setup time.

Figure 6. Timing constraint characterization for sequential cells: (a) setup skew versus clock-to-Q delay for setup time characterization, (b) hold skew versus clock-to-Q delay for hold time characterization.

Figure 7. Dependence of clock-to-Q delay and setup time of a register on technology.

Figure 8. Transistor level schematic of a master slave type, edge triggered register where the numbers represent the W/L ratio for each transistor. Three different dual-V_th, low leakage registers are designed based on this schematic.

Figure 9. Comparison of leakage current obtained from the original and proposed registers for four technology nodes: (a) absolute leakage current; (b) percent reduction in the leakage current.

Figure 10. Illustration of a register (R₂) that simultaneously behaves as a launching register of a noncritical path and a capturing register of the previous noncritical path.

Table 1. Timing characteristics of the proposed dual-V_th registers.

	Timing Path	Register Type	Clock-to-Q Delay	Setup Time	Hold Time
Register 1	Noncritical	Launching	Larger	Same or less	Same or less
Register 2	Noncritical	Capturing	Same or less	Larger	Same or less
Register 3	Critical	Capturing	Same or less	Same or less	Larger

Table 2. Threshold voltage assignment of the three proposed registers.

	Register 1	Register 2	Register 3
M1	low-V_th	low-V_th	low-V_th
M2	low-V_th	high-V_th	low-V_th
M3	low-V_th	high-V_th	low-V_th
M4	low-V_th	low-V_th	low-V_th
M7	low-V_th	low-V_th	high-V_th
M8	low-V_th	low-V_th	high-V_th
M9	low-V_th	low-V_th	high-V_th
M10	low-V_th	low-V_th	high-V_th
M13	high-V_th	low-V_th	low-V_th
M14	high-V_th	low-V_th	low-V_th
M17	high-V_th	low-V_th	low-V_th
M18	high-V_th	low-V_th	low-V_th
M19	high-V_th	low-V_th	low-V_th
M20	high-V_th	low-V_th	low-V_th
M21	high-V_th	low-V_th	low-V_th
M22	high-V_th	low-V_th	low-V_th

Table 3. Leakage current of the original and proposed registers for four technology nodes.

	Technology (nm)
	45	32	22	16
Original register (CLK = V_SS)	57 nA	123 nA	658 nA	3813 nA
Original register (CLK = V_DD)	53 nA	111 nA	585 nA	3413 nA

1st register	11 nA	19 nA	137 nA	786 nA
Reduction (%)	79.2	82.9	76.6	77
Reduction (abs)	42 nA	92 nA	448 nA	2627 nA

2nd register	54 nA	109 nA	536 nA	3133 nA
Reduction (%)	5.3	11.4	18.5	17.8
Reduction (abs)	3 nA	14 nA	122 nA	680 nA

3rd register	50 nA	108 nA	580 nA	3393 nA
Reduction (%)	5.7	2.7	0.85	0.6
Reduction (abs)	3 nA	3 nA	5 nA	20 nA

Table 4. Clock-to-Q delay, and setup and hold times of the original and proposed registers for four technologies.

		Technology (nm)
		45	32	22	16
Original register	Clk to Q Delay (ps)	20	18.2	14.8	11.9
	Setup time (ps)	16.5	16.2	15.2	13.4
	Hold time (ps)	−10	−8.8	−6.3	−4.8

1st register	Clk to Q Delay (ps)	45	41	41.2	36
	Setup time (ps)	15	14.7	13	11.3
	Hold time (ps)	−11	−10.2	−8	−5.8

2nd register	Clk to Q Delay (ps)	20	18	14.8	11.9
	Setup time (ps)	29	28	29	28.6
	Hold time (ps)	−18	−16.6	−16.6	−14.7

3rd register	Clk to Q Delay (ps)	20	18.2	14.8	11.9
	Setup time (ps)	17	15	15	13.6
	Hold time (ps)	−7.8	−8	−4.7	−2.5

Table 5. Analysis and comparison of leakage current in three ISCAS 89 benchmark circuits.

Circuit	Technology (nm)	Original	This Work	[17]
s27	45	270.6 nA	224.2 nA	262.3 nA
	32	585.3 nA	488.1 nA	576.9 nA
	22	3 μA	2.6 μA	2.9 μA
	16	17.5 μA	14.8 μA	17.4 μA

s526	45	2.4 μA	1.8 μA	2.3 μA
	32	5.1 μA	3.7 μA	5 μA
	22	26.3 μA	19.6 μA	26.2 μA
	16	151 μA	111.1 μA	150.6 μA

s1423	45	8.5 μA	6.2 μA	8.3 μA
	32	18.2 μA	13.2 μA	17.9 μA
	22	93.1 μA	68.8 μA	92.7 μA
	16	535.1 μA	391.8 μA	534 μA

References

1. Pavlidis, V.F.; Friedman, E.G. Three-Dimensional Integrated Circuit Design; Morgan Kaufmann: Boston, MA, USA, 2009.
2. Tai, K.L. System-in-Package (SIP): Challenges and Opportunities. Proceedings of the ASP-DAC 2000, Asia and South Pacific, Yokohama, Japan, 25–28 January 2000; pp. 191–196.
3. Konstadinidis, G.K.; Tremblay, M.; Chaudhry, S.; Rashid, M.; Lai, P.F.; Otaguro, Y.; Orginos, Y.; Parampalli, S.; Steigerwald, M.; Gundala, S.; et al. Implementation of a Third-Generation 16-Core 32-Thread Chip-Multithreading SPARC Processor. Proceedings of the IEEE International Solid-State Circuits Conference, Lille, France, 30 December 2008; pp. 84–85.
4. Rusu, S.; Tam, S.; Muljono, H.; Stinson, J.; Ayers, D.; Chang, J.; Varada, R.; Ratta, M.; Kottapalli, S.; Vora, S. A 45 nm 8-Core Enterprise Xeon Processor. Proceedings of the IEEE International Solid-State Circuits Conference, Taipei, Taiwan, 22 December 2009; pp. 56–57.
5. Ferre, A.; Figueras, J. Characterization of Leakage Power in CMOS Technologies. Proceedings of the Electronics, Circuits and Systems 1998 IEEE International Conference, Lisboa, Portugal, 7–10 September 1998; pp. 185–188.
6. Taur, Y.; Wann, C.H.; Frank, D.J. 25 nm CMOS Design Considerations. Proceedings of the Electron Devices Meeting, 1998, IEDM '98 Technical Digest., International, San Francisco, CA, USA, 6–9 December 1998; pp. 789–792.
7. Kursun, V.; Friedman, E.G. Multi-Voltage CMOS Circuit Design; John Wiley & Sons: Hoboken, NJ, USA, 2006.
8. Jiao, H.; Kursun, V. Low-leakage and compact registers with easy-sleep mode. J. Low Power Electron. 2010, 6, 1–17.
9. Sery, G.; Borkar, S.; De, V. Life is CMOS: Why Chase the Life After? Proceedings of the 39th Design Automation Conference, New Orleans, LA, USA, 2002; pp. 78–83.
10. The ITRS Technology Working Groups. Homepage of International Technology Roadmap for Semiconductors (ITRS). 2009. Available online: http://www.itrs.net/ (accessed on 15 April 2011).
11. Chang, H.; Sapatnekar, S.S. Prediction of leakage power under process uncertainties. ACM Trans. Design Autom. Electron. Syst. 2007, 12, 1–27.
12. Chandrakasan, A.; Bowhill, W.J.; Fox, F. Design of High-Performance Microprocessor Circuits; Wiley-IEEE Press: Hoboken, NJ, USA, 2000.
13. Plummer, J.D.; Griffin, P.B. Material and process limits in silicon VLSI technology. Proc. IEEE 2001, 89, 240–258.
14. Kao, J.; Chandrakasan, A. MTCMOS Sequential Circuits. Proceedings of the 27th European Solid State Circuits Conference, Villach, Austria, 2001; pp. 317–320.
15. Tschanz, J.W.; Kao, J.T.; Narendra, S.G.; Nair, R.; Antoniadis, D.A.; Chandrakasan, A.P.; De, V. Adaptive body bias for reducing impacts of die-to-die and within die parameter variations on microprocessor frequency and leakage. IEEE J. Solid-State Circuits 2002, 37, 1396–1402.
16. Srivastava, A.; Sylvester, D.; Blaauw, D. Statistical Optimization of Leakage Power Considering Process Variations Using Dual-Vth and Sizing. Proceedings of the 41st IEEE/ACM Design Automation Conference, San Diego, CA, USA, 2004; pp. 773–778.
17. Ko, U.; Pua, A.; Hill, A.; Sivastava, P. Hybrid Dual-Threshold Design Techniques for High-Performance Processors with Low-Power Features. Proceedings of International Symposium on Low Power Electronics and Design, Monterey, CA, USA, 1997; pp. 307–311.
18. Ko, U.; Hill, A.; Balsara, P.T. Design Techniques for High-Performance, Energy-Efficient Control Logic. Proceedings of International Symposium on Low Power Electronics and Design, Monterey, CA, USA, 12–14 August 1996; pp. 307–311.
19. Ko, U.; Balsara, P.T. High performance, Energy Efficient Master-Slave Flip-Flop circuits. Proceedings of International Symposium on Low Power Electronics and Design, San Jose, CA, 9–11 October 1995; pp. 16–17.
20. Singh, H.; Agarwal, K.; Sylvester, D.; Nowka, K.J. Enhanced leakage reduction techniques using intermediate strength power gating. IEEE Trans. Very Large Scale Integr. 2007, 15, 1215–1224.
21. Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Shigematsu, S.; Yamada, J. 1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS. IEEE J. Solid-State Circuits 1995, 30, 847–854.
22. Shigematsu, S.; Mutoh, S.; Matsuya, Y.; Yamada, J. A 1 V High-Speed MTCMOS Circuit Scheme for Power-Down Applications. Proceedings of the IEEE International Symposium on VLSI Circuits, Kyoto, Japan, 8–10 June 1995; pp. 125–126.
23. Shigematsu, S.; Mutoh, S.; Matsuya, Y.; Tanabe, Y.; Yamada, J. A 1V High-Speed MTCMOS Circuit Scheme for Power-Down Application Circuits. IEEE J. Solid-State Circuits 1997, 32, 861–869.
24. Kao, J.; Narendra, S.; Chandrakasan, A. Subthreshold Leakage Modeling and Reduction Techniques. Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, USA, 2002; pp. 141–148.
25. Salman, E.; Dasdan, A.; Taraporevala, F.; Kucukcakar, K.; Friedman, E.G. Exploiting setup-hold time interdependence in static timing analysis. IEEE Trans. Comput.-Aid. Des. Integr. Circuits Syst. 2007, 26, 1114–1125.
26. Stojanovic, V.; Oklobdzija, V.G. Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems. IEEE J. Solid-State Circuits 1999, 34, 536–548.
27. Weste, N.; Harris, D. CMOS VLSI Design; Addison Wesley: White Plains, NY, USA, 2004.
28. Predictive Technology Model (PTM). Available online: http://www.eas.asu.edu/~ptm (accessed on 1 September 2010).
29. Cao, Y.; Sato, T.; Orshansky, M.; Sylvester, D.; Hu, C. New Paradigm of Predictive MOSFET and Interconnect Modeling for Early Circuit Design. Proceedings of the IEEE Custom Integrated Circuits Conference, Orlando, FL, USA, 21–24 May 2000; pp. 201–204.
30. Brglez, F.; Bryan, D.; Kozminski, K. Combinational Profiles of Sequential Benchmark Circuits. Proceedings of the IEEE International Symposium on Circuits and Systems, Portland, OR, USA, 8–11 May 1989; pp. 1929–1934.
31. Homepage of H-SPICE™. Available online: http://www.synopsys.com (accessed on 1 September 2010).

© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
