Path Specific Register Design to Reduce Standby Power Consumption

A methodology is proposed to design low leakage registers by considering the type of timing path, i.e., short or long, and the type of register, i.e., launching or capturing. Three different dual threshold voltage registers are developed where each register trades, depending upon the timing path, a different timing constraint for reduced leakage current. For example, the first proposed register is used as a launching register in a noncritical path, trading clock-to-Q delay for leakage current. Other timing constraints such as setup and hold times are kept the same so that no timing violations are introduced. Alternatively, the second and third registers trade, respectively, setup time and hold time for leakage current while maintaining a constant clock-to-Q delay. The effect of the proposed methodology on leakage current is investigated for four technology nodes. The overall reduction in the leakage current of a register can exceed 90% while maintaining the clock frequency and other design parameters such as area and dynamic power the same. Three ISCAS 89 benchmark circuits are utilized to evaluate the methodology, demonstrating, on average, a 23% reduction in the overall leakage current.


Introduction
Power dissipation is a primary limitation to further expanding the capabilities of modern CMOS integrated circuits. Miniaturization of the physical dimensions and advanced manufacturing technologies such as 3-D integration [1] and system-in-package [2] have tremendously increased the integration capability, to the point where power consumption has become the primary design barrier. A wide range of applications such as high performance microprocessors, ASICs, and systems-on-chip suffer from this limitation.
Multicore architectures have been proposed to keep the clock frequency constant, thereby preventing an increase in power consumption [3,4]. Unfortunately, only the dynamic power is affected by the clock frequency whereas the overall static power continues to increase due to higher leakage current.
Traditionally, technology scaling has relied on enhancing the drive current capability by reducing the channel length and gate oxide thickness. The power supply voltage has also been reduced to satisfy reliability constraints. Decreasing the power supply voltage requires the threshold voltage to also be reduced to maintain high drive current capability. The reduction of the threshold voltage, however, exponentially increases the subthreshold leakage current [5]. Similarly, a reduction in the gate oxide thickness exponentially increases the quantum mechanical tunneling of carriers through the oxide, producing significant gate leakage current [6].
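The exponential dependence of subthreshold leakage on the threshold voltage can be illustrated with a short numerical sketch. The code below assumes the common first-order model in which the subthreshold current is proportional to exp(-V th / (n v T)), where v T = kT/q is the thermal voltage. The slope factor n = 1.5 and the temperature of 300 K are illustrative assumptions and are not taken from this paper.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant
ELECTRON_CHARGE_C = 1.602176634e-19    # elementary charge


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage v_T = kT/q in volts."""
    return BOLTZMANN_J_PER_K * temp_kelvin / ELECTRON_CHARGE_C


def leakage_ratio(delta_vth: float, n: float = 1.5,
                  temp_kelvin: float = 300.0) -> float:
    """Factor by which the modeled subthreshold leakage changes when the
    threshold voltage shifts by delta_vth volts (positive = higher V_th).

    First-order model only: I_sub ~ exp(-V_th / (n * v_T)).
    The default n and temperature are assumed values for illustration.
    """
    return math.exp(-delta_vth / (n * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    for shift_mv in (-100, -50, 0, 50, 100):
        ratio = leakage_ratio(shift_mv / 1000.0)
        print(f"V_th shift {shift_mv:+4d} mV -> leakage x {ratio:.3g}")
```

Under these assumed values, raising the threshold voltage by 100 mV lowers the modeled leakage by more than an order of magnitude, which is the effect that multi-threshold techniques exploit.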
More than 40% of the total energy in the active mode can be dissipated due to idle transistors in modern systems-on-chip [7][8][9]. Furthermore, leakage current is the dominant source of energy consumption when the IC is in the idle mode, significantly degrading the battery life in portable devices.
ITRS identifies leakage power consumption as "a clear long term threat and a focus topic for design technology in the next 15 years" [10]. Projections of the overall power dissipation within an IC are plotted in Figure 1 based on ITRS predictions. The contributions of the static and dynamic power are separately highlighted, assuming a switching activity of 0.5 and a constant clock frequency in each technology node. As illustrated in this figure, overall static power dominates dynamic power in deep submicrometer CMOS technologies. High variability of the leakage current due to process variations further exacerbates this issue [11].
The development of alternative gate dielectric materials with higher permittivity, i.e., high-K dielectrics, and metal gate transistors permits thicker dielectric layers, significantly reducing the gate leakage current [12,13]. The continuation of technology scaling below 45 nm has been possible partly due to this progress at the device level. As the gate leakage current has been significantly reduced, subthreshold leakage has become the dominant component of static power dissipation.
Various methodologies have been proposed to alleviate subthreshold leakage current such as multi-threshold voltage CMOS (MTCMOS), also referred to as power gating [14], dynamic adjustment of the threshold voltage through body biasing [15], and multi-threshold voltage transistors, also referred to as dual threshold voltage (dual-V th) partitioning [16]. These existing approaches have several limitations, particularly for low leakage register design, as further described in Section 2.
A comprehensive methodology is proposed in this paper to design path specific dual-V th, low leakage registers while simultaneously considering clock-to-Q delay, setup time, hold time, type of timing path (short or long), and type of register (launching or capturing). Existing dual-V th based registers reduce the leakage current only along the feedback path so as not to affect the timing constraints [17][18][19]. This traditional approach significantly limits the amount of leakage that can be reduced, particularly in sub 22 nm CMOS technologies. Furthermore, in conventional approaches, the hold time of the register may be affected, which may produce a timing violation depending upon the type of timing path and register. These limitations of the existing approaches are overcome with the proposed design methodology while significantly increasing the amount of leakage current that is reduced.
The rest of the paper is organized as follows. Existing multi-threshold voltage based leakage reduction techniques are summarized in Section 2. Background material reviewing different types of timing paths and the timing constraints of a register is provided in Section 3. A methodology is described in Section 4 to design path specific registers with low leakage current. The results are discussed in Section 5. Finally, the paper is concluded in Section 6.

Previous Work
Existing techniques to reduce leakage current are summarized in this section with an emphasis on multi-threshold voltage design. Related limitations of these techniques are also discussed.
MTCMOS is a commonly used leakage reduction technique where a high threshold voltage (high-V th) sleep transistor is placed between the circuit and the power supply and/or ground node, as shown in Figure 2.
When the circuit operates in the idle mode, the high-V th sleep transistor is cut off, disconnecting the circuit from the power supply voltage and/or ground node. During the active mode, the sleep transistor is on and the combinational circuit consisting of low threshold voltage (low-V th) transistors operates normally. The drain of the sleep transistor is referred to as virtual power (if the sleep transistor is placed between the circuit and power supply) or virtual ground (if the sleep transistor is placed between the circuit and ground node). Subthreshold leakage current is reduced during the idle mode since the sleep transistor behaves as a large resistance between the combinational circuit and the power supply and/or ground node.
There are, however, several limitations of MTCMOS. When the mode of operation changes from idle to active, the circuit requires a specific amount of time to charge the virtual power node or discharge the virtual ground node. This required time is referred to as wake up latency [20]. Several clock cycles are typically required for the virtual ground or power to stabilize. Furthermore, the circuit may experience ground bounce during this time, affecting the reliable operation of nearby logic circuits. Another limitation of MTCMOS that is more closely related to this paper is its application to memory elements such as a register. MTCMOS cannot be directly applied to a register since the state of the register should be preserved even when the register is in the idle mode. In conventional MTCMOS, however, the idle circuit is disconnected from the power supply voltage and the state of the circuit is lost. Several different versions of MTCMOS have been developed specifically for register design to alleviate this issue [8,14,21-23]. These techniques, however, require additional inverters and transmission gates, reducing the achievable power savings while also increasing the overall area.
Exploiting the dependence of the threshold voltage on bulk potential has also been proposed to dynamically adjust the threshold voltage, referred to as adaptive body biasing [15]. During idle mode, the substrate of the circuit is reverse biased to increase the threshold voltage, thereby reducing the leakage current. The primary drawback of this methodology is the difficulty of generating the bias voltage for the substrate in a power efficient way. Control circuitry is also required, further decreasing the power efficiency.
Another technique to reduce the leakage current is based on utilizing the multi-threshold voltage transistors that are provided by the manufacturing technology. This technique is also referred to as dual-V th partitioning [24]. Those logic gates that are not part of the critical path are replaced with high-V th transistors to reduce the leakage current by exploiting the excessive slack. Alternatively, those gates along the critical path are implemented with low-V th transistors to satisfy the timing constraints, as depicted in Figure 3.
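The idea behind dual-V th partitioning can be sketched as a simple greedy pass over the gates. The function below is only an illustration of the principle and not the algorithm of [24]: the path lists, per-gate delays, and the delay budget `max_delay` are hypothetical inputs, and the gates are visited in alphabetical order rather than by any leakage-aware priority.

```python
def assign_dual_vth(paths, delay_low, delay_high, max_delay):
    """Greedy dual-V_th partitioning sketch (illustrative only).

    paths      : list of combinational paths, each a list of gate names
    delay_low  : dict gate -> delay when built with low-V_th devices
    delay_high : dict gate -> delay when built with high-V_th devices
    max_delay  : delay budget that every path must satisfy

    Returns a dict gate -> True if the gate can use high-V_th devices.
    """
    use_high = {gate: False for gate in delay_low}

    def path_delay(path):
        return sum(delay_high[g] if use_high[g] else delay_low[g]
                   for g in path)

    for gate in sorted(delay_low):        # deterministic visiting order
        use_high[gate] = True             # tentatively slow the gate down
        affected = [p for p in paths if gate in p]
        if any(path_delay(p) > max_delay for p in affected):
            use_high[gate] = False        # no slack left: keep low-V_th
    return use_high


if __name__ == "__main__":
    paths = [["g1", "g2", "g3"], ["g4"]]  # one critical path, one short path
    low = {g: 1.0 for g in ("g1", "g2", "g3", "g4")}
    high = {g: 1.5 for g in low}
    print(assign_dual_vth(paths, low, high, max_delay=3.0))
```

In this toy example the three-gate path already uses the entire delay budget, so only the gate on the short path is moved to high-V th.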
A similar approach has been developed to design registers. Those transistors that are not located along the clock-to-Q delay path have been replaced with high-V th devices to reduce the leakage current within a register [17][18][19]. Unfortunately, in these existing approaches, the number of high-V th transistors is small, limiting the overall reduction in the leakage current. Furthermore, since these transistors are not located along the clock-to-Q delay path, the size of these transistors is typically small. Alternatively, those transistors that are located along the clock-to-Q delay path are typically sized larger, making leakage current more significant in these transistors. Another important limitation of the existing approaches is the inability to consider important timing constraints such as setup and hold times. The type of timing path, i.e., short or long, and the type of register, i.e., launching or capturing, significantly affect the design process of low leakage registers, as demonstrated in this paper. Ignoring these effects not only decreases the amount of leakage current that can be reduced, but may also affect reliable circuit operation since the timing constraints may be violated. Thus, application of dual-V th partitioning to the design process of a register requires additional attention. A methodology is proposed in this paper to design dual-V th, low leakage registers by simultaneously considering the clock-to-Q delay, setup time, hold time, and the type of register and timing path. The simultaneous consideration of these parameters is critical to exploit multi-threshold voltage transistors and to guarantee system functionality and timing in deep submicrometer CMOS technologies.

Background
Timing characteristics of synchronous systems are briefly introduced in Section 3.1. The timing constraints of a register, i.e., setup and hold times, are reviewed in Section 3.2.

Timing Characteristics of Synchronous Systems
A simple synchronous digital circuit consisting of two sequentially-adjacent registers with a combinational circuit between these registers is shown in Figure 4.
The first register is referred to as the launching register whereas the second register is called the capturing register.
Two inequalities should be satisfied for this circuit to function properly [25]. Referring to Figure 4, the first inequality is
$$T_{CP} + T_{Cf} - (T_{Ci} + T_D) \geq T_S \tag{1}$$
where $T_{Ci}$ and $T_{Cf}$ are the delays for the clock signal to arrive, respectively, at the launching and capturing registers. Note that $T_{Ci}$ and $T_{Cf}$ are also referred to as, respectively, the delay of the clock launch path and the clock capture path. $T_{CP}$ is the clock period. $T_D$ is the data path delay consisting of the clock-to-Q delay of the launching register, the logic delay of the combinational circuit, and the interconnect delay. $T_S$ is the setup time of the capturing register. Note that (1) determines the maximum speed of the circuit, making this inequality important for critical paths. The second inequality that needs to be satisfied is
$$T_{Ci} + T_D - T_{Cf} \geq T_H \tag{2}$$
where $T_H$ is the hold time of the capturing register. This inequality guarantees that no race condition exists, i.e., the data is not latched into the final (capturing) register within the same clock edge. Note that (2) is relatively more important for those timing paths where the data path delay is small, i.e., short paths, such as a shift register or counter. These inequalities, the type of data path (short versus long), and the type of register (launching or capturing) play an important role in the design of low leakage, dual-V th registers, as described in Section 4. The timing constraints of a register and related circuit level issues are described in the following section.
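The two conditions can be checked numerically for a single register-to-register path. The sketch below assumes the standard form of the constraints: the data launched at time T Ci + T D must arrive at least T S before the next capturing edge at T CP + T Cf (setup), and no earlier than T H after the current capturing edge at T Cf (hold). The class and function names are hypothetical, and the numbers in the usage example are illustrative rather than taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    t_ci: int  # clock delay to the launching register (ps)
    t_cf: int  # clock delay to the capturing register (ps)
    t_d: int   # data path delay: clock-to-Q + logic + interconnect (ps)
    t_s: int   # setup time of the capturing register (ps)
    t_h: int   # hold time of the capturing register (ps)


def setup_slack(path: TimingPath, t_cp: int) -> int:
    """Margin on the setup condition; negative means a violation."""
    return (t_cp + path.t_cf) - (path.t_ci + path.t_d) - path.t_s


def hold_slack(path: TimingPath) -> int:
    """Margin on the hold condition; negative means a race."""
    return (path.t_ci + path.t_d) - path.t_cf - path.t_h


def meets_timing(path: TimingPath, t_cp: int) -> bool:
    return setup_slack(path, t_cp) >= 0 and hold_slack(path) >= 0


if __name__ == "__main__":
    long_path = TimingPath(t_ci=0, t_cf=0, t_d=900, t_s=50, t_h=20)
    short_path = TimingPath(t_ci=0, t_cf=0, t_d=10, t_s=50, t_h=20)
    for name, p in (("long", long_path), ("short", short_path)):
        print(name, setup_slack(p, 1000), hold_slack(p), meets_timing(p, 1000))
```

The long path in this example is limited by its setup slack, whereas the short path fails the hold condition, which mirrors the distinction between critical and short paths made in the text.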

Timing Constraints of a Register
Inequalities (1) and (2) require a difference called a skew to be larger than or equal to a timing constraint. These inequalities, therefore, can be rewritten as [25]
$$\text{Setup skew} \geq T_S \tag{3}$$
$$\text{Hold skew} \geq T_H \tag{4}$$
where the setup skew and hold skew are, respectively,
$$\text{Setup skew} = T_{CP} + T_{Cf} - (T_{Ci} + T_D)$$
$$\text{Hold skew} = T_{Ci} + T_D - T_{Cf}$$
Note the important difference between setup-hold skews and setup-hold times: Setup and hold skews refer to any time difference between the data and clock signals whereas the setup and hold times refer to the minimum required time difference to reliably capture and store the data.
Transistor level realization of a widely used master slave type, edge triggered register is illustrated in Figure 5. According to the setup time constraint, the data signal should be stable at the input of a register for a sufficient amount of time before the active edge of the clock signal. In the example shown in Figure 5, the active edge is a low-to-high transition of the clock signal since the data propagates to the output after this transition. Setup time guarantees that the data is reliably latched to the master before the rising edge of the clock signal arrives. Ideally, the data signal should propagate through TG1 and INV1, arriving at the output of INV1 before the rising edge of the clock signal. According to this condition, the path that determines the setup time consists of TG1 and INV1, as depicted in Figure 5. This condition, however, may require a relatively large setup time. A conventional technique to characterize the setup time constraint of a register is to examine the setup skew versus clock-to-Q delay relationship, as shown in Figure 6(a) [25][26][27].
The smallest setup skew that corresponds to the nominal clock-to-Q delay is approximately equal to the sum of the delays of TG1 and INV1. As the setup skew is further reduced, the clock-to-Q delay gradually increases since, for smaller setup skews, the data signal cannot reach the output of INV1. After a specific point, the clock-to-Q delay starts to exponentially increase due to a race condition at node r since this node is simultaneously driven by two gates: TG1 and TG2. The race condition occurs between the new data driven by TG1 and the old data driven by TG2. This region is referred to as metastable and is therefore avoided during the characterization process. Typically, a 10% degradation in clock-to-Q delay is allowed while characterizing the setup time, as shown in Figure 6(a).
According to the hold time constraint, the data signal should be stable at the input of a register for a sufficient amount of time after the active edge of the clock signal. This constraint is due to the non-ideal characteristics of TG1 as a switch. If the hold time constraint is not satisfied, the new data can be latched into the register and overwrite the previous valid data during the same clock cycle. Note that the hold time can sometimes be smaller than zero. In this case, even if the new data propagates through TG1, a race condition exists at node r between the new and old data. If the old data succeeds over the new data, the register works correctly and the negative hold time is valid. The hold time constraint is therefore partly determined by the relative drive strengths of TG1 and TG2. Note that, if the hold skew is further reduced, the clock-to-Q delay exponentially increases, as shown in Figure 6(b). Similar to setup time characterization, a 10% degradation in clock-to-Q delay is allowed while characterizing the hold time. These timing constraints (setup and hold times) and clock-to-Q delay play an important role in the design process of low leakage, dual-V th registers. When specific transistors within a register are replaced with high-V th devices to reduce leakage current, the timing constraints may change. Ignoring this effect may produce timing violations, causing a degradation in clock frequency or functional failure. The proposed methodology overcomes this limitation, as described in the following section.
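The 10% degradation criterion used for both setup and hold characterization can be expressed as a small post-processing step on simulated data. The sketch below assumes that a set of (skew, clock-to-Q delay) samples is already available, for example from a circuit simulator sweep, and that the delay at the largest sampled skew represents the nominal clock-to-Q delay. The function name and the sample values are hypothetical, and no interpolation between samples is attempted.

```python
import math


def characterize_constraint(skews, clk_to_q, degradation=0.10):
    """Return the smallest sampled skew whose clock-to-Q delay stays within
    (1 + degradation) times the nominal delay.

    skews    : sampled setup (or hold) skews
    clk_to_q : clock-to-Q delay measured at each skew; a failed capture can
               be represented as math.inf
    The nominal delay is assumed to be the delay at the largest skew.
    Returns None if even the largest skew violates the limit.
    """
    samples = sorted(zip(skews, clk_to_q), key=lambda s: s[0])
    nominal = samples[-1][1]
    limit = (1.0 + degradation) * nominal

    constraint = None
    # Walk from the relaxed end of the curve toward smaller skews and stop
    # at the first sample that exceeds the allowed degradation.
    for skew, delay in reversed(samples):
        if delay > limit:
            break
        constraint = skew
    return constraint


if __name__ == "__main__":
    skews_ps = [100, 80, 60, 40, 30, 20, 10]
    delays_ps = [50, 50, 51, 54, 60, 90, math.inf]
    print(characterize_constraint(skews_ps, delays_ps))
```

A finer skew sweep, or a bisection search driven by the simulator, would give a tighter estimate than the coarse grid used in this example.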

Proposed Methodology
As described in Section 2, existing work on dual-V th based register design does not consider different types of data paths and registers. Referring to Figure 5, a typical approach is to design TG1, INV1, TG3, and INV3 with low-V th transistors to improve the setup time and clock-to-Q delay. The remaining inverters and transmission gates that are located along the feedback path are designed with high-V th devices to minimize the leakage current. This approach, however, is not practical for all of the timing paths. For example, in a short path, a reduced clock-to-Q delay may not be desirable according to inequality (2). The amount of leakage current that can be reduced is also limited since all of the transistors located along the forward signal path, i.e., within TG1, INV1, TG3, and INV3, are low-V th devices. Note that these transistors are typically sized larger to minimize clock-to-Q delay and setup time. The leakage current is therefore relatively more important for these transistors as compared to those that are located along the feedback paths.
The design process of a dual-V th, low leakage register is therefore strongly dependent upon the type of data path, i.e., long (critical), noncritical, and short; and the type of register, i.e., launching or capturing, as illustrated in Figure 4. Three different types of dual-V th registers that consider these dependencies are proposed in this paper, as described in Section 4.1. Assigning the proper threshold voltage to each transistor within these registers is discussed in Section 4.2. The amount of leakage that can be reduced by utilizing the proposed registers is evaluated in Section 4.3. Finally, simulation results based on three ISCAS 89 benchmark circuits are provided in Section 4.4.

Path Specific Dual-V th Register Design
The type of timing path and register should be considered during the design process of a dual-V th, low leakage register. Consider, for example, a launching register in a noncritical or short path. In this case, the clock-to-Q delay of the register is not critical and can therefore be traded to reduce leakage current. Similarly, for a capturing register in a noncritical or short path, (2) is the important inequality and the setup time of this register is not critical. Setup time can therefore be traded to achieve low leakage in a capturing register of a noncritical or short path. Existing techniques cannot utilize this opportunity since the transistors located along the clock-to-Q delay and setup paths are realized with low-V th devices. Finally, consider a capturing register in a critical path. In this case, the hold time is not critical since (1) is the important constraint. Hold time can therefore be traded to achieve low leakage in a capturing register of a critical or long path. Additional constraints, however, exist for each of these three cases to guarantee that both (1) and (2) are satisfied after specific transistors are replaced with high-V th devices.
Three different types of dual-V th registers are proposed depending on the type of data path and register, as summarized in Table 1 and described in the following.

Register 1
This register is designed to replace launching registers in noncritical or short paths. Since there is excessive setup slack in noncritical paths, the primary objective is to trade clock-to-Q delay for leakage current. Both setup and hold times of the register, however, should remain the same (or be reduced) since this register behaves as a capturing register for the previous data path, which may be a critical or short path. Thus, to guarantee that the timing characteristics of the previous path are not affected, the setup and hold times of the register should not increase.

Register 2
This register is designed to replace capturing registers in noncritical or short paths. Due to excessive setup slack, the primary objective is to trade setup time for leakage current. The clock-to-Q delay of the register, however, should remain the same (or be reduced) since this register behaves as a launching register for the following data path, which may be a critical path. Furthermore, the hold time should also remain the same (or be reduced) since for a short data path, (2) is critical. Note that this second register is effective in reducing leakage current since the setup time is relatively more important in advanced technologies, as shown in Figure 7. According to this figure, starting from the 22 nm technology node, the setup time of the register is higher than the clock-to-Q delay. Thus, the opportunity to trade setup time for leakage current should not be overlooked. Note that the setup time has been characterized using the procedure described in Section 3.2.

Register 3
The third register is designed to replace capturing registers in critical paths. The primary objective is to trade hold time for leakage current since in a critical path, (1) is important and hold slack is typically large. The clock-to-Q delay should remain the same (or be reduced) since the register behaves as a launching register for the following data path, which may also be a critical path. Furthermore, the setup time should also remain the same (or be reduced) since for a critical path, (1) is important.
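The mapping from register role and path type to the three proposed registers can be written as a small lookup table. The sketch below encodes only the three cases described in this section; the enum and function names are hypothetical, and returning the original all low-V th register for the remaining case (a launching register in a critical path) is an assumption made here for completeness.

```python
from enum import Enum


class Role(Enum):
    LAUNCHING = "launching"
    CAPTURING = "capturing"


class PathType(Enum):
    CRITICAL = "critical"
    NONCRITICAL = "noncritical"
    SHORT = "short"


# (role, path type) -> (register, traded parameter, parameters to preserve)
REGISTER_TABLE = {
    (Role.LAUNCHING, PathType.NONCRITICAL): ("Register 1", "clock-to-Q", ("setup", "hold")),
    (Role.LAUNCHING, PathType.SHORT): ("Register 1", "clock-to-Q", ("setup", "hold")),
    (Role.CAPTURING, PathType.NONCRITICAL): ("Register 2", "setup", ("clock-to-Q", "hold")),
    (Role.CAPTURING, PathType.SHORT): ("Register 2", "setup", ("clock-to-Q", "hold")),
    (Role.CAPTURING, PathType.CRITICAL): ("Register 3", "hold", ("clock-to-Q", "setup")),
}

# Assumed fallback for cases not covered by the three proposed registers.
ORIGINAL = ("Original (all low-V_th)", None, ("clock-to-Q", "setup", "hold"))


def select_register(role: Role, path_type: PathType):
    """Return (register name, traded parameter, preserved parameters)."""
    return REGISTER_TABLE.get((role, path_type), ORIGINAL)


if __name__ == "__main__":
    for role in Role:
        for path_type in PathType:
            name, traded, kept = select_register(role, path_type)
            print(f"{role.value:9s} {path_type.value:11s} -> {name}, trades {traded}")
```

In a complete flow each physical register is a capturing register for one path and a launching register for the next, so both roles would need to be examined before a replacement is made.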

Threshold Voltage Assignment
An edge triggered D type flip-flop with 2X drive capability is chosen from an industrial standard cell library. The transistor level schematic of the register is illustrated in Figure 8, including the W/L ratios of each transistor.
Note that in the master latch, a tristate inverter is used that combines the TG1 and INV1 of Figure 5. Similarly, the feedback of the master latch also utilizes a tristate inverter. This schematic and the W/L ratios are used in the simulations without any modification.
In the original version, the register shown in Figure 8 is designed using only low-V th transistors. To design Register 1, high-V th devices are used for those transistors located along the clock-to-Q delay path, i.e., M13, M14, M17, M18, M19, M20, M21, and M22. Clock-to-Q delay is therefore traded to reduce leakage current. Note that the setup and hold times of the register remain the same since these transistors do not affect the timing constraints of the register. To design Register 2, high-V th transistors are used only for M2 and M3 to trade setup time for leakage current. Note that M5 and M6 are designed using low-V th transistors even though this inverter is along the setup path, as illustrated in Figure 5. The reason is that, as described in the previous section, the clock-to-Q delay and hold time of the register should remain the same. Replacing M5 and M6 with high-V th transistors would affect the clock-to-Q delay since this inverter drives the input of the slave latch.
Finally, to design Register 3, high-V th transistors are used for M7, M8, M9, and M10 to trade hold time for leakage current. Note that the feedback path becomes weaker due to the high-V th transistors. As such, the hold time increases since it is more difficult for the old data to overwrite the new data at the output of the first gate, thereby requiring a larger hold time constraint. Low-V th devices are used for the remaining transistors to guarantee that the clock-to-Q delay and setup time remain the same. For example, M1, M2, M3, and M4 directly affect the setup time constraint and are therefore designed with low-V th transistors. The threshold voltage assignments of all of the transistors are listed in Table 2 for each register.
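The assignments described above can be captured in a small data structure, which is convenient when generating netlists for each register variant. In the sketch below, the high-V th sets follow the transistor names given in the text, whereas the complete transistor list (M1 through M22) is an assumption based on the highest index mentioned, and the function name is hypothetical.

```python
# High-V_th transistors of each proposed register, as named in the text.
HIGH_VTH = {
    "Register 1": {"M13", "M14", "M17", "M18", "M19", "M20", "M21", "M22"},
    "Register 2": {"M2", "M3"},
    "Register 3": {"M7", "M8", "M9", "M10"},
}

# Assumed transistor list: M1..M22 (M22 is the highest index in the text).
ALL_TRANSISTORS = [f"M{i}" for i in range(1, 23)]


def vth_assignment(register: str, transistors=ALL_TRANSISTORS) -> dict:
    """Map each transistor to 'high' or 'low' for the given register.

    Any name other than the three proposed registers is treated as the
    original register, in which every transistor is low-V_th.
    """
    high = HIGH_VTH.get(register, set())
    return {t: ("high" if t in high else "low") for t in transistors}


if __name__ == "__main__":
    for name in ("Original", "Register 1", "Register 2", "Register 3"):
        assignment = vth_assignment(name)
        high = [t for t, v in assignment.items() if v == "high"]
        print(f"{name}: {len(high)} high-V_th transistors {high}")
```

Because the three high-V th sets do not overlap, they can in principle be merged when more than one timing parameter may be traded in the same register.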

Reduction in the Leakage Current
The reduction in the leakage current achieved by utilizing the proposed three registers is evaluated in this section. Four CMOS technology generations, 45 nm, 32 nm, 22 nm, and 16 nm, are considered using a predictive technology model [28,29].
The register illustrated in Figure 8 is simulated for each technology node where the W/L ratios of the transistors are maintained constant. The leakage current drawn from the power supply is evaluated for the three registers and the results are compared with the leakage current of the original register where only low-V th transistors are used.
The results are illustrated in Figure 9. Note that for the first register, the state of the clock signal does not change the results since all of the high-V th transistors are within the slave latch. For the second and third registers, however, high-V th transistors exist within the tristate inverters. The state of the clock signal is therefore important in evaluating the results. For example, for the second register, the clock signal should be at V SS to guarantee that the initial tristate inverter is not in the high impedance state. Similarly, for the third register, the clock signal should be at V DD so that the second tristate inverter located along the feedback path is not in the high impedance state. The leakage current of the original register is therefore compared with the first two registers and the third register when the clock signal is, respectively, at V SS and V DD.
The leakage current increases with technology scaling, exhibiting a large jump in the 16 nm node. A significant reduction in the leakage current, 79% on average, is achieved by the first register since the number of high-V th transistors is higher, as listed in Table 2. The second register also achieves a considerable reduction in the leakage current, 13% on average and higher for technology nodes below 32 nm, since the importance of setup time has been increasing with technology scaling, as depicted in Figure 7. The reduction in the leakage current obtained by the third register is relatively smaller, as further discussed in Section 5. All of the results are listed in Table 3 where the absolute reduction in the leakage current is also provided for each case. The timing constraints (setup and hold times) and clock-to-Q delay of the three registers are characterized as described in Section 3.2. As listed in Table 4, all three registers satisfy the required timing constraints listed previously in Table 1.
Specifically, for the first register, setup and hold times are slightly reduced as compared to the original register whereas clock-to-Q delay increases, on average, by 24.6 ps to improve the leakage current. The required condition is therefore satisfied since the setup and hold times do not increase. For the second register, setup time increases, on average, by 13.3 ps to reduce the leakage current. Alternatively, clock-to-Q delay remains the same whereas hold time is reduced, thereby satisfying the required condition. Note that the hold time is reduced since M2 and M3 are high-V th transistors in this register. It is therefore more difficult for the input data to propagate to the output of the first tristate inverter, requiring a shorter hold time. For the third register, setup time and clock-to-Q delay remain approximately the same whereas hold time increases, on average, by 1.7 ps to reduce the leakage current. The last register therefore also satisfies the required timing constraints.

Simulation Results
Three ISCAS 89 benchmark circuits, s27, s526, and s1423, are utilized in this section to better evaluate the efficacy of the proposed methodology on functional circuits rather than only on a register [30]. The total number of gates in these sequential circuits is, respectively, 8, 141, and 490 whereas the total number of registers is, respectively, 3, 21, and 74.
First, the leakage current of the circuits is analyzed when the registers are designed only with low-V th transistors. In the second step, the registers within each sequential circuit are replaced with the proposed registers based on the type of timing path. Since the critical paths are typically a small percentage of the overall circuit, Register 1 and Register 2 can be effectively utilized to trade, respectively, clock-to-Q delay and setup time for leakage power. In the last step, the methodology proposed in [17][18][19] is evaluated by replacing the low-V th transistors along the feedback path of a register (M7 to M10, M15, and M16 in Figure 8) with high-V th transistors. The overall reduction in leakage current is compared for each case in four different technologies. Note that the register illustrated in Figure 8 is used for all of the circuits. Predictive device models are used for each technology [28,29]. The analysis is performed using H-SPICE [31].
The results of the analysis are listed in Table 5. As summarized in this table, the proposed methodology achieves a significant reduction in the overall leakage current. The average reduction over three circuits and four technologies is approximately 23%. Note that the overall reduction in the leakage current increases as the size of the circuit grows and the ratio of the number of registers to the overall number of gates increases. Also note that according to these results, the reduction achieved by the methodology described in [17] is negligible for two reasons: (1) As illustrated in Figure 8, the feedback path of the master latch consists of a tristate inverter. Leakage current in a tristate inverter is significantly less than in a regular inverter due to the increased impedance between the power supply and ground; (2) The feedback path of the slave latch consists of only a transmission gate. The results provided in [17] assume a different register architecture, as shown in Figure 5. For this architecture, there is an inverter along the feedback path of both master and slave latches, thereby increasing the overall reduction in leakage. In this work, the register is chosen from an industrial cell library without any modification. Note that the proposed methodology achieves a higher reduction in leakage current as compared to [17] even for the register shown in Figure 5 since the number of high-V th transistors is higher in the proposed dual-V th registers. Also note that the effect of high-V th transistors on setup and hold times is not considered in [17]. This effect can be significant since an unexpected increase in the setup or hold times can produce a timing violation, as described in Section 4.2.

Discussion and Future Study
According to the results presented in the previous section, the first register achieves the highest reduction for two reasons: (1) this register uses the greatest number of high-Vth transistors and (2) these transistors are relatively wide, since they are sized to reduce the clock-to-Q delay. The second register also achieves a reasonable reduction, whereas the reduction achieved by the third register is small (2.5% on average) for two reasons: (1) the stack effect within the tristate inverter increases the standby impedance between the power supply and ground nodes and (2) since this tristate inverter is located along the feedback path, its transistors are narrower, decreasing the leakage current. The leakage current of this inverter is therefore already low in the original register.
Note, however, that this leakage reduction is achieved without degrading the clock frequency. Area and dynamic power also remain the same. Furthermore, the absolute leakage reduction achieved by the third register is 20 nA in the 16 nm technology node. Even though the percent reduction is small, the absolute reduction can reach the range of milliamperes when a large number of registers is considered (for example, 50,000 such registers at 20 nA each correspond to 1 mA). When the first two registers are also considered, the overall savings in standby power consumption increase significantly.
Note also that three dual-Vth registers have been proposed, each for a specific type of timing path (critical or noncritical) and register (launching or capturing), as listed in Table 1. Two additional registers that achieve a greater reduction in the leakage current can be designed based on the proposed registers. Consider, for example, the first proposed register (launching in a noncritical path), which behaves as a capturing register for the previous path. If the previous path is also noncritical, as depicted in Figure 10, not only clock-to-Q delay but also setup time can be traded to reduce the leakage current within this register. In this case, the number of high-Vth transistors is higher, increasing the overall reduction in the leakage current. According to Table 3, the overall reduction, which corresponds to the sum of the reductions achieved by the first and second registers, exceeds 90% for sub-45 nm technology nodes. Alternatively, if the previous path is a critical path, not only clock-to-Q delay but also hold time can be traded to reduce the leakage current. The overall reduction in this case is approximately equal to the sum of the reductions achieved by the first and third registers.
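The combinations discussed above can be summarized as a small decision rule. The following Python sketch is only an illustration under simplifying assumptions: the names `PathType` and `tradable_constraints` are hypothetical and do not appear in the paper, every register is assumed to have exactly one incoming (captured) and one outgoing (launched) path, and each path is classified only as critical or noncritical.

```python
from enum import Enum


class PathType(Enum):
    """Coarse classification of a timing path (assumption: two classes only)."""
    CRITICAL = "critical"
    NONCRITICAL = "noncritical"


def tradable_constraints(captured_path, launched_path):
    """Return the timing constraints a register may trade for leakage.

    captured_path: type of the path that ends at this register
    launched_path: type of the path that starts at this register
    """
    traded = set()
    # Launching register of a noncritical path: clock-to-Q delay can be traded.
    if launched_path is PathType.NONCRITICAL:
        traded.add("clock_to_q")
    # Capturing register of a noncritical path: setup time can be traded.
    if captured_path is PathType.NONCRITICAL:
        traded.add("setup")
    # Capturing register of a critical path: hold time can be traded.
    elif captured_path is PathType.CRITICAL:
        traded.add("hold")
    return traded


if __name__ == "__main__":
    # Register R2 of Figure 10: both neighboring paths are noncritical.
    print(sorted(tradable_constraints(PathType.NONCRITICAL, PathType.NONCRITICAL)))
```

Under these assumptions, a register between two noncritical paths trades both clock-to-Q delay and setup time, which corresponds to the first additional register described above, whereas a register that captures a critical path and launches a noncritical one trades clock-to-Q delay and hold time.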
The primary disadvantage of the proposed methodology is the degradation in the robustness of a circuit. For example, the clock-to-Q delay of a launching register in a noncritical path is traded for leakage current, which reduces the available timing slack of this data path. A reduced timing slack typically corresponds to a higher sensitivity to variations, so the overall robustness is degraded. Note, however, that this disadvantage is common to many low power design techniques that exploit excess slack.
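The slack reduction can be illustrated with a short calculation. The sketch below assumes the standard setup relation with zero clock skew, in which the slack equals the clock period minus the sum of the clock-to-Q delay, the combinational logic delay, and the setup time. The function name `setup_slack` and all numerical values are hypothetical and are chosen only for illustration; they are not taken from the tables of this paper.

```python
def setup_slack(t_clk, t_cq, t_logic, t_setup):
    """Setup slack of a data path in picoseconds, assuming zero clock skew."""
    return t_clk - (t_cq + t_logic + t_setup)


if __name__ == "__main__":
    # Hypothetical noncritical path, all values in picoseconds.
    T_CLK, T_LOGIC, T_SETUP = 1000, 500, 50

    # Launching register before and after trading clock-to-Q delay for leakage.
    slack_original = setup_slack(T_CLK, 100, T_LOGIC, T_SETUP)
    slack_low_leakage = setup_slack(T_CLK, 300, T_LOGIC, T_SETUP)

    print("slack with original register:", slack_original, "ps")
    print("slack with low leakage register:", slack_low_leakage, "ps")
```

In this hypothetical case the path still satisfies the setup constraint, but the margin available to absorb process, voltage, and temperature variations is smaller, which is the robustness penalty described above.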
Finally, note that the results presented in this paper are based on a specific type of register. A similar methodology can be applied to other types of registers, where clock-to-Q delay, setup time, and hold time are traded to reduce the leakage current without affecting the clock frequency. The numerical results may change depending upon the transistor level design of a register. The effect of different register architectures on leakage reduction can therefore be investigated as future work. Application of the proposed methodology to pulsed latches also remains a topic for future study.

Conclusions
A methodology has been proposed to design low leakage registers that reduce standby power dissipation. Traditional dual-Vth registers utilize high-Vth transistors only along the feedback path of the master and slave latches, which limits the overall reduction in leakage current. As opposed to existing techniques, a register design methodology that considers the type of timing path (short or long) and register (launching or capturing) is developed. Three different dual-Vth registers are introduced, where the first register trades clock-to-Q delay for leakage current, achieving, on average, a 79% reduction in leakage current. The second and third registers trade, respectively, setup time and hold time to further reduce the leakage current. Depending on the type of timing paths, the overall reduction in the leakage current of a register can exceed 90%. Furthermore, an average reduction of 23% in leakage current is demonstrated for three ISCAS 89 benchmark circuits. Clock frequency and other design parameters such as area and dynamic power remain the same.

Figure 1. Projections of the overall IC power dissipation normalized to the 45 nm technology node, highlighting the dominance of static power over dynamic power.

Figure 2. Multi-threshold voltage CMOS (MTCMOS) design to reduce leakage current: (a) sleep transistor is placed between the circuit and power supply; (b) sleep transistor is placed between the circuit and ground node.

Figure 3. Dual threshold voltage partitioning to reduce leakage current while maintaining the same clock frequency.

Figure 4. Simple synchronous circuit consisting of combinational logic and two types of registers: launching and capturing.

Figure 5. Transistor level schematic of a widely used master slave type edge triggered register, illustrating the paths for clock-to-Q delay and setup time.

Figure 6. Timing constraint characterization for sequential cells: (a) setup skew versus clock-to-Q delay for setup time characterization; (b) hold skew versus clock-to-Q delay for hold time characterization.

Figure 7. Dependence of clock-to-Q delay and setup time of a register on technology.

Figure 8. Transistor level schematic of a master slave type, edge triggered register where the numbers represent the W/L ratio for each transistor. Three different dual-Vth, low leakage registers are designed based on this schematic.

Figure 9. Comparison of leakage current obtained from the original and proposed registers for four technology nodes: (a) absolute leakage current; (b) percent reduction in the leakage current.

Figure 10. Illustration of a register (R2) that simultaneously behaves as a launching register of a noncritical path and a capturing register of the previous noncritical path.

Table 1. Timing characteristics of the proposed dual-Vth registers.

Table 2. Threshold voltage assignment of the three proposed registers.

Table 3. Leakage current of the original and proposed registers for four technology nodes.

Table 4. Clock-to-Q delay, setup time, and hold time of the original and proposed registers for four technology nodes.

Table 5. Analysis and comparison of leakage current in three ISCAS 89 benchmark circuits.