A Low-Power High-Speed Sense-Amplifier-Based Flip-Flop in 55 nm MTCMOS

Abstract: In this paper, a sense-amplifier-based flip-flop (SAFF) suitable for low-power, high-speed operation is proposed. By employing a new sense-amplifier stage and a new single-ended latch stage, the power and delay of the flip-flop are greatly reduced. A conditional cut-off strategy is applied to the latch to achieve glitch-free and contention-free operation. Furthermore, the proposed SAFF supports low-voltage operation through MTCMOS optimization. Post-layout simulation results based on a SMIC 55 nm MTCMOS process show that the proposed SAFF achieves a 41.3% reduction in CK-to-Q delay and a 36.99% reduction in power (at a 25% input data toggle rate) compared with the conventional SAFF. Additionally, both the delay and the power are smaller than those of the master-slave flip-flop (MSFF). The power-delay product of the proposed SAFF shows 2.7× and 3.55× improvements over the conventional SAFF and the MSFF, respectively. The area of the proposed flip-flop is 8.12 µm² (5.8 µm × 1.4 µm), similar to that of the conventional SAFF. With MTCMOS optimization, the proposed SAFF provides robust operation at supply voltages as low as 0.4 V.


Introduction
High speed and low power are the central goals of digital circuit design. Flip-flops are the basic storage elements, so their delay and power directly determine the performance and power of digital systems. As described in [1], flip-flops contribute a significant portion of the power consumption of a digital system. Moreover, the setup time and CK-to-Q delay of the flip-flop directly affect the maximum clock frequency of the system. Therefore, optimizing the delay and power of flip-flops can directly improve the performance and reduce the power consumption of digital systems.
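The link between flip-flop timing and clock frequency can be made concrete with the standard single-cycle constraint T_clk ≥ t_CK-to-Q + t_logic + t_setup. The following is a minimal sketch of that relation; the function name and all picosecond values are illustrative assumptions and are not taken from this paper.

```python
def max_clock_frequency_ghz(t_ck_to_q_ps, t_logic_ps, t_setup_ps):
    """Upper bound on clock frequency (GHz) for one register-to-register path.

    Standard constraint: T_clk >= t_CK-to-Q + t_logic + t_setup.
    A negative setup time shortens the required period.
    """
    min_period_ps = t_ck_to_q_ps + t_logic_ps + t_setup_ps
    if min_period_ps <= 0:
        raise ValueError("minimum period must be positive")
    return 1000.0 / min_period_ps  # a 1000 ps period corresponds to 1 GHz


# Hypothetical numbers, chosen only to illustrate the trend.
slow_ff = max_clock_frequency_ghz(t_ck_to_q_ps=100, t_logic_ps=800, t_setup_ps=100)
fast_ff = max_clock_frequency_ghz(t_ck_to_q_ps=50, t_logic_ps=800, t_setup_ps=-50)
print(f"slow flip-flop: {slow_ff:.2f} GHz, fast flip-flop: {fast_ff:.2f} GHz")
```

With the same combinational logic delay, the flip-flop with the shorter CK-to-Q delay and negative setup time allows the higher clock frequency.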
The most commonly used flip-flop in digital systems is the master-slave flip-flop (MSFF). The schematic of the C²MOS [2] master-slave flip-flop in the SMIC 55 nm standard cell library provided by the foundry is shown in Figure 1. As Figure 1 shows, the data must pass through the first latch before the rising edge of CK, which ensures that the flip-flop captures the correct data at the rising edge of CK. Therefore, the setup time of the MSFF is relatively long. At the same time, the CK-to-Q path involves several logic stages, so the CK-to-Q delay is also relatively large. The pulse-triggered flip-flop (PFF) is regarded as a fast flip-flop, and several PFFs have been proposed in previous work [3][4][5][6]. A PFF is composed of a single latch and a clock pulse generator. The data in the PFF can be captured right after the rising edge of CK, so the setup time of the PFF is near zero or negative. The main difficulty with the PFF is determining the clock pulse width. A pulse that is too narrow cannot guarantee that the data are captured correctly, while a wide pulse increases the hold time. Since the PFF must work correctly across temperatures and process corners, the pulse width must be set to the longest value required, which increases the hold time. This so-called sizing problem limits the application of the PFF.
The sense-amplifier-based flip-flop (SAFF), first introduced in [7], is another fast flip-flop with a near-zero or negative setup time. The SAFF is composed of a sense-amplifier (SA) stage and a slave latch. The SA stage captures the data right after the rising edge of CK, and its output is maintained during the positive half cycle of CK. Thus, the sizing problem of the PFF is removed. With a near-zero or negative setup time and a reduced hold time, the SAFF is a good candidate to replace the MSFF in the standard cell library for high-speed design. Although these features are attractive, the SAFF has several problems. The pre-charge operation of the SAFF increases power consumption, and a fast latch structure is needed to reduce the CK-to-Q delay. Moreover, the low-voltage operation problem of the conventional SAFF must be resolved before the SAFF can be applied to low-voltage designs.
In this paper, a low-power, high-speed SAFF is proposed. A new sense-amplifier stage with a smaller pre-charge load is applied to reduce the power consumption of the pre-charge operation. A new single-ended latch is employed to achieve fast, low-power and glitch-free operation. Furthermore, MTCMOS optimization is employed in the proposed SAFF to achieve robust low-voltage operation. The rest of this paper is organized as follows. Section 2 gives a brief introduction to previous SAFFs. In Section 3, the structure of the proposed SAFF is described in detail. Section 4 shows the simulation results and comparisons with previous SAFFs. Finally, Section 5 draws conclusions.

Overview of Existing SAFF Architectures
The schematic of the conventional SAFF [8], which is composed of a SA and a NAND2-based set-reset (S-R) latch, is shown in Figure 2a. The SAFF operates as follows. While CK is low, SN and RN are pre-charged to VDD and the output data are maintained by the latch. At the rising edge of CK, the pre-charge transistors MP1 and MP4 are turned off and MN5 is turned on. Depending on the input data, one of the pre-charge nodes (SN or RN) is discharged to 0 while the other remains at VDD. The latch then captures the new data from the SA stage. The always-on transistor MN6 maintains the output of the SA while CK is high. For example, SN is discharged to 0 in response to D = 1 at the rising edge of CK, and SN must remain at 0 during the positive half cycle of CK. Because D may change to 0 during the positive half cycle, another path to 0 must be provided for SN, and MN6 provides this path. The main drawbacks of the conventional SAFF are the unbalanced delay of the S-R latch and the large power of the pre-charge operation. Moreover, the always-on transistor decreases the robustness of the SAFF at low supply voltages.
Nikolic et al. proposed a latch for the SAFF in [9], composed of two inverters and complex logic gates, which eliminates the delay dependence between Q and QN in the conventional SAFF and thereby decreases the delay. The schematic of the latch is shown in Figure 2b. The two inverters generate S and R as the inversions of SN and RN, and the outputs Q and QN are generated directly from the four signals SN, RN, S and R. The dependence between Q and QN is removed and the CK-to-Q delay is decreased. However, since the delay of the inverters and the complex logic cannot be ignored, the delay improvement obtained in this way may fall short of expectations.
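The operation of the conventional SAFF described above can be sketched as a small behavioral model. This is a logic-level illustration only, not the authors' netlist: the class name is hypothetical, and the cross-coupled wiring Q = NAND(SN, QN), QN = NAND(RN, Q) is assumed as the usual form of a NAND2-based S-R latch.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


class ConventionalSAFF:
    """Behavioral sketch: pre-charged SA stage followed by a NAND2 S-R latch."""

    def __init__(self, q=0):
        self.ck = 0            # previous clock value, used to detect rising edges
        self.sn = 1            # pre-charge nodes start high
        self.rn = 1
        self.q = q
        self.qn = 1 - q

    def step(self, ck, d):
        if ck == 0:
            # CK low: SN and RN are pre-charged to VDD.
            self.sn, self.rn = 1, 1
        elif self.ck == 0:
            # Rising edge: one node is discharged, depending on D.
            self.sn, self.rn = (0, 1) if d else (1, 0)
        # CK still high: SN/RN are held even if D changes (role of the
        # always-on transistor), so nothing is updated here.
        self.ck = ck
        # Let the cross-coupled NAND latch settle (two passes suffice).
        for _ in range(2):
            self.q = nand(self.sn, self.qn)
            self.qn = nand(self.rn, self.q)
        return self.q


ff = ConventionalSAFF(q=0)
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (CK, D) pairs
trace = [ff.step(ck, d) for ck, d in stimulus]
print(trace)
```

In this model a falling Q has to wait for QN to rise first, which needs the second settling pass; this mirrors the dependence between Q and QN that makes the S-R latch delay unbalanced.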
In [10], Kim et al. proposed a SAFF with a latch composed of two N-C²MOS circuits and two pairs of inverters, as shown in Figure 3a. Lin et al. modified the latch in [10] into a single-ended structure, which reduced the power consumption substantially [11]. The schematic of the latch in Lin's SAFF is shown in Figure 3b. The delay of this kind of SAFF is greatly reduced compared with that of the conventional SAFF since there are few logic stages between SN and the output Q. However, a large glitch occurs when the output Q and the next input data are both high, and this glitch increases the power consumption of the SAFF. Furthermore, the current contention of the back-to-back inverters also increases the power consumption. In [12], Strollo et al. proposed a SAFF that combines the conventional SAFF and Kim's SAFF to achieve both fast and glitch-free operation. The schematic of the latch in Strollo's SAFF is shown in Figure 3c.
All of the SAFFs above suffer from the low-voltage operation problem caused by the always-on transistor in the SA stage, as described in [13]. To address this problem, the SAFF in [13] employs a detection signal to gate the always-on transistor, thus overcoming the current contention of previous SAFFs. The schematic of Jeong's SAFF [13] is shown in Figure 4. The always-on transistor in the SA stage is controlled by the detection signal instead. The main concern with Jeong's SAFF is that the transition-completion detection logic increases the propagation delay of the flip-flop.

Structure of the Proposed SAFF
The schematic of the proposed SAFF is shown in Figure 5. Like the previous SAFFs, it is composed of a SA stage and a slave latch. As described in the previous sections, the SA stage captures the data right after the rising edge of CK, and the slave latch maintains the output during the negative half cycle of CK. The SA stage in the conventional SAFF needs to charge all the internal nodes during the pre-charge operation, and some of these nodes, such as n1, n2 and n3 in Figure 2a, are discharged to VSS during the data-capturing operation regardless of the input data. The pre-charge of n1, n2 and n3 therefore has no effect on the function of the SA and wastes power. As shown in Figure 6a, n1, n2 and n3 are charged close to the supply voltage during the pre-charge operation, and transistors MN3 and MN4 are sized large to decrease the propagation delay, so pre-charging these nodes wastes a considerable amount of power. In this paper, the structure of the SA is changed: the NMOS controlled by CK (MN5 in Figure 2a) is split into two (MN5 and MN6 in Figure 5) and moved to connect directly to the back-to-back inverters, as shown in Figure 5. With this change, the nodes related to MN3 and MN4 no longer need to be charged during the pre-charge operation, since MN5 and MN6 are off when CK is low. As shown in Figure 6b, the voltages of n1 and n2 in Figure 5 remain low throughout the operation. Since pre-charge power is an important part of the power consumption of the SAFF, the power consumption of the proposed SAFF is greatly reduced.
The proposed SA structure also improves the hold time of the proposed SAFF. The new SA stage captures the input data faster at the rising edge of CK, mainly because the internal nodes n1 and n2 remain low during the operation, which reduces the discharge time of the internal nodes. Thus, the hold time of the proposed SAFF is reduced. Even though faster data capture increases the setup time of the proposed SAFF, the increase is very small because the discharge time of the internal nodes is short.
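The cost of pre-charging an internal node every cycle can be estimated with the standard switching-power relation P = C · VDD · ΔV · f. The sketch below is only an order-of-magnitude illustration: the 1 fF node capacitance and the 1.2 V supply are assumed values, and only the 500 MHz clock frequency matches the simulation setup reported in this paper.

```python
def precharge_power_uw(c_node_ff, vdd_v, f_clk_mhz, swing_fraction=1.0):
    """Average supply power (in µW) spent charging one node once per clock cycle.

    The energy drawn from the supply per charge event is C * VDD * dV,
    where dV = swing_fraction * VDD is the voltage swing of the node.
    """
    c_farad = c_node_ff * 1e-15
    f_hz = f_clk_mhz * 1e6
    delta_v = swing_fraction * vdd_v
    return c_farad * vdd_v * delta_v * f_hz * 1e6  # W -> µW


# Hypothetical node: 1 fF, charged to the full supply every cycle.
full_swing = precharge_power_uw(c_node_ff=1.0, vdd_v=1.2, f_clk_mhz=500)
# Same node if it stays low throughout the operation (no swing).
no_swing = precharge_power_uw(c_node_ff=1.0, vdd_v=1.2, f_clk_mhz=500, swing_fraction=0.0)
print(f"pre-charged every cycle: {full_swing:.2f} µW, kept low: {no_swing:.2f} µW")
```

The estimate scales linearly with node capacitance, so large transistors attached to a node that swings every cycle make its pre-charge proportionally more expensive, whereas a node that stays low contributes no switching power.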
A new single-ended latch is applied to the proposed SAFF. The proposed latch combines the advantages of Strollo's latch and Lin's latch to achieve fast and energy-efficient operation. The first stage of the latch shown in Figure 5 is similar to that of Strollo's latch to achieve glitch-free operation. As shown in Figure 7b, the glitch of Lin's latch shown in Figure 7a is completely removed. This is mainly due to the insertion of MN9: when D is high, DN is low and the pull-down path is cut off by MN9. The back-to-back inverters used for data storage are modified to overcome the current contention. For a low-to-high transition of the output Q, which means the voltage of SN is low, the feedback inverter is cut off by MN11. Similarly, for a high-to-low transition of Q, the feedback inverter is cut off by MP7. As a result, the effect of the feedback inverter on the output transition is eliminated. Since the latch does not use RN, the transistors that generate RN in the SA stage can be made smaller to reduce power consumption. The 1× inverter INV1 in the latch can provide the complementary output QN when necessary, and the delay difference between Q and QN is one inverter delay, the same as in the MSFF.
As described in [13], the always-on transistor leads to function failures at low supply voltages. Even though the detection logic in [13] resolves these failures, the complex logic increases the delay and power consumption. In this paper, MTCMOS optimization is employed to overcome the problem. To avoid low-voltage function failures, the driving capability of the always-on transistor should be weaker than that of the pull-down transistors. When the driving capability of two stacked LVT-NMOSs plus the always-on transistor exceeds that of two stacked LVT-NMOSs due to process variation, the SAFF suffers function failures. When the always-on transistor is an LVT-NMOS, three LVT-NMOSs are stacked. As shown in Figure 8a, the current of three stacked LVT-NMOSs (I2) can be larger than that of two stacked LVT-NMOSs (I1) at low supply voltages (I1 / I2 < 1), and function failures occur. When the always-on transistor is changed to an HVT-NMOS, so that two LVT-NMOSs and one HVT-NMOS are stacked, the function failures no longer occur, since the current of this stack (I3) is always smaller than that of two stacked LVT-NMOSs (I1 / I3 > 1 at all times). Furthermore, the current of two stacked LVT-NMOSs and the always-on transistor must be larger than the leakage current to ensure correct operation. As shown in Figure 8c, this condition is still satisfied when the always-on transistor is an HVT-NMOS. Therefore, the problem of low-voltage function failures is resolved by multi-threshold optimization. In the proposed design, the always-on transistor is high-threshold, while the others are low-threshold, as shown in Figure 5.
Figure 9 shows the transient waveforms of the proposed SAFF. As shown in Figure 9a, the proposed SAFF operates as follows:
High-to-low transition: The input data D completes its high-to-low transition before the rising edge of CK; at the same time, DN completes its low-to-high transition since it is the inverse of D. SN and RN are pre-charged high during the negative half cycle of CK. At the rising edge of CK, RN is discharged low through MN2, MN6 and MN4. SN remains high, and the output Q is discharged low through MN8, MN9 and MN11. The feedback inverter is gated by MP7 until QN finishes its low-to-high transition (that is, until Q finishes its high-to-low transition). After QN goes high, the feedback inverter keeps Q low through MN10 and MN11.
Low-to-high transition: The input data D completes its low-to-high transition before the rising edge of CK, and DN completes its high-to-low transition. SN and RN are pre-charged high during the negative half cycle of CK. At the rising edge of CK, SN is discharged low through MN1, MN5 and MN3. The output Q is then charged high through MP5; since MN10 is cut off by MN11 when SN is low, this operation is also contention-free. The output Q is maintained by MP5 during the positive half cycle of CK. During the negative half cycle of CK, SN is pre-charged high and MP5 is off, so the output Q is maintained by MP6 and MP7.
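The two transitions above, together with the glitch-removal role described for MN9, can be summarized in a logic-level sketch of the latch. This is an illustrative abstraction rather than the circuit of Figure 5: the function name and the `gate_with_dn` switch are hypothetical, and gating the pull-down path with CK is an assumption made here (the text does not state which signal drives MN8).

```python
def latch_next_q(q, ck, sn, dn, gate_with_dn=True):
    """Next value of Q for a single-ended latch driven by the SA output SN.

    Pull-up:   SN low forces Q high (the role of MP5 in the text).
    Pull-down: requires CK high and SN high; with gate_with_dn=True it also
               requires DN high (the role the text assigns to MN9).
    Otherwise: the feedback inverter holds the previous value of Q.
    """
    if sn == 0:
        return 1
    pull_down = ck == 1 and (dn == 1 or not gate_with_dn)
    return 0 if pull_down else q


# Instant just after a rising edge with Q = 1 and D = 1 (so DN = 0):
# CK is already high, but SN has not been discharged yet.
q_gated = latch_next_q(q=1, ck=1, sn=1, dn=0, gate_with_dn=True)
q_ungated = latch_next_q(q=1, ck=1, sn=1, dn=0, gate_with_dn=False)

# Regular transitions once the SA stage has resolved.
q_fall = latch_next_q(q=1, ck=1, sn=1, dn=1)   # D = 0 captured, Q goes low
q_rise = latch_next_q(q=0, ck=1, sn=0, dn=0)   # D = 1 captured, Q goes high
q_hold = latch_next_q(q=1, ck=0, sn=1, dn=1)   # CK low, Q is held

print(q_gated, q_ungated, q_fall, q_rise, q_hold)
```

Without the DN condition, Q is pulled low for the short interval before SN falls and then restored, which corresponds to the glitch the text attributes to latches lacking this cut-off when Q and the next input are both high.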
The transistor sizes of the proposed SAFF are shown in Table 1. Since the sizes of MN1, MN5 and MN3 directly determine the pull-down speed of SN, and the pull-down speed of SN determines the performance of the proposed SAFF, MN1, MN5 and MN3 are sized larger. The latch stage in the proposed SAFF is single-ended, so MN2, MN6 and MN4 can be smaller to reduce power consumption. Furthermore, reducing the sizes of MN2, MN6 and MN4 balances the pull-down speeds of SN and RN, since the load of RN is smaller than that of SN. The balanced speed leads to a better setup time and hold time for the proposed SAFF. The transistor MN7 only provides a path to ground when the data changes during the positive half cycle of CK. MN7 reduces the voltage difference between n1 and n2, which affects the setup time of the SAFF. Therefore, provided that the current through the two stacked NMOSs and MN7 remains greater than the leakage current, the driving capability of MN7 should be as small as possible. The transistors in the latch stage are sized similarly to those in the standard cell library, except for the feedback inverter. Because the feedback inverter only maintains the output Q, its drive capability is not important, and its transistors are set to minimum size to reduce power consumption.
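The sizing rule for MN7 and the MTCMOS argument in this section both reduce to a window condition on currents: the path through the always-on transistor must carry more than the leakage current but less than the two-transistor pull-down path. A minimal sketch of that check follows; the function name and all current values are hypothetical and do not come from Figure 8.

```python
def always_on_path_ok(i_pull_down, i_always_on_path, i_leak):
    """Check the operating window described in the text.

    i_pull_down      : current of the two stacked pull-down NMOSs
    i_always_on_path : current of the stack that includes the always-on transistor
    i_leak           : leakage current that the always-on path must overcome
    """
    return i_leak < i_always_on_path < i_pull_down


# Hypothetical low-voltage samples (arbitrary units), for illustration only.
lvt_case = always_on_path_ok(i_pull_down=1.0, i_always_on_path=1.1, i_leak=0.01)
hvt_case = always_on_path_ok(i_pull_down=1.0, i_always_on_path=0.2, i_leak=0.01)
too_weak = always_on_path_ok(i_pull_down=1.0, i_always_on_path=0.005, i_leak=0.01)
print(lvt_case, hvt_case, too_weak)
```

The first sample models the failure case in which variation makes the three-transistor LVT stack stronger than the two-transistor stack, the second models the high-threshold always-on transistor, and the third shows why the always-on path cannot be made arbitrarily weak.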

Simulation Results and Comparisons
The proposed SAFF has been designed in SMIC 55 nm technology. To verify the proposed SAFF, the MSFF, the conventional SAFF, Nikolic's SAFF, Lin's SAFF and Jeong's SAFF have also been designed in the same technology for comparison. All post-layout simulations are performed in HSPICE with identical settings. The area, power consumption, CK-to-Q delay, setup time and hold time of the various flip-flops are compared in detail below.
Figure 10 shows the layouts of these flip-flops. The proposed SAFF has the smallest area among the five SAFFs due to the simplified single-ended latch. The areas of Nikolic's SAFF and Jeong's SAFF are considerably larger than that of the conventional SAFF since the slave latches of these two SAFFs are much more complex. The area of the MSFF is smaller than that of every SAFF even though the MSFF does not have the fewest transistors. This is because the MSFF has equal numbers of PMOS and NMOS transistors, which maximizes area utilization. The area of the proposed SAFF is only 11.5% larger than that of the MSFF, which is the smallest area cost among the SAFFs when replacing MSFFs in digital systems.
To analyze the power consumption of the flip-flops, different input data toggle rates have been applied at a clock frequency of 500 MHz at the typical corner. As shown in Table 2, with the proposed SA stage and the single-ended latch, the proposed SAFF shows a clear power advantage over the previous SAFFs as well as the MSFF at all toggle rates. The power of Nikolic's SAFF and Jeong's SAFF is much higher than that of the conventional SAFF, mainly because some gates (the inverters in Nikolic's SAFF and the NAND2 in Jeong's SAFF) toggle at the clock frequency. The power consumption of Lin's SAFF reflects the impact of the glitch and the current contention: it is higher than that of the conventional SAFF even though a single-ended latch is employed. In the proposed SAFF, there is no additional logic attached to the pre-charge nodes (SN and RN), and the glitch and current contention of the previous slave latches are removed. Furthermore, the pre-charge power of the SA stage is greatly reduced by the proposed SA stage. Together, these changes account for the power advantage of the proposed SAFF.
Table 3 shows the CK-to-Q delay of the flip-flops. The proposed SAFF has the lowest CK-to-Q delay among the flip-flops across all PVT corners. The main reason is that the signal passes from the SA stage to the output Q through very little logic. In addition, the proposed SA stage captures the input data faster at the rising edge of CK. The delay of the conventional SAFF is relatively large due to the dependence between Q and QN in the S-R latch. The delay of Nikolic's SAFF and Jeong's SAFF is slightly larger than that of the conventional SAFF even though the dependence between Q and QN is removed, mainly because of the increased delay of the complex latches. All of the SAFFs show speed advantages over the MSFF, since the MSFF has the longest path from CK to Q. The delay of the proposed SAFF is reduced by 56.91% compared with that of the MSFF at the typical corner. Therefore, replacing the MSFF with the proposed SAFF can yield a large speed increase.
Table 3. CK-to-Q delay (ps) of the flip-flops across PVT corners.
The setup time is defined as the minimum delay from input data D to CK that guarantees successful data capture by the flip-flop [14]. If D must arrive before the rising edge of CK, the setup time is positive. Conversely, if D arrives after the rising edge of CK and the flip-flop can still capture the correct data, the setup time is negative. Table 4 shows the setup times of these flip-flops. The setup time of every SAFF is negative, which is a significant performance advantage over the MSFF. The setup time of the proposed SAFF is larger than that of the previous SAFFs, mainly due to the new structure of the SA stage, but the increase relative to the conventional SAFF is very small, only 4 ps at the typical corner as shown in Table 4. Unlike that of the MSFF, the setup time of these SAFFs increases at the best corner. For the MSFF, input data pass through the master latch faster at the best corner, so the setup time is reduced. For the SAFFs, the SA stage captures the input data faster at the best corner, so it is more likely to capture the previous data, and the setup time must increase to capture the correct data. Conversely, the setup time of these SAFFs decreases at the worst corner. Although the setup time varies between PVT corners, the difference is small and does not have a major impact on circuit performance.
The hold time is defined as the minimum delay from CK to a change of input D that guarantees a successful data hold by the flip-flop [14]. If the input data must be held past the rising edge of CK, the hold time is positive. If the input data can change before the rising edge of CK and the flip-flop can still hold the correct data, the hold time is negative. As shown in Table 5, the hold time of every SAFF is positive, which is the cost of obtaining a negative setup time. The proposed SAFF has a lower hold time than the other SAFFs, since the new SA stage captures the input data faster than the conventional SA stage. Most previous SAFFs have similar hold times due to the similar structure of the SA stage; the exception is Jeong's SAFF, which has the worst hold time due to the turn-off strategy used to improve its setup time. The hold time of the SAFFs decreases at the best corner because the SA stage captures the input data faster, and increases at the worst corner. As with the setup time, the difference in the hold time between PVT corners is small and does not have a major impact on circuit performance.
The power-delay product (PDP) is employed as a comprehensive performance index to evaluate each flip-flop. Figure 11 shows the normalized PDP under different input data toggle rates. Since the delay and power of the proposed SAFF are smaller than those of the other flip-flops, the PDP of the proposed SAFF is the lowest at each toggle rate. Compared with the MSFF, the PDP of the proposed SAFF improves by at least 3× across the input data toggle rates, which shows a significant speed and power advantage.
To evaluate the robustness of the proposed SAFF at low supply voltages, a 500-point Monte Carlo simulation assuming die-to-die global variations and within-die random mismatch has been performed. As shown in Figure 12a, the proposed SAFF provides robust operation at a supply voltage as low as 0.4 V. In contrast, the conventional SAFF suffers function failures at a supply voltage of 0.4 V, as shown in Figure 12b. Figure 13 shows the CK-to-Q delay of the proposed SAFF, the MSFF and the conventional SAFF as the supply voltage varies from 0.4 V to 1.2 V. The proposed SAFF shows delay advantages over the MSFF and the conventional SAFF at all supply voltages.
Table 6 summarizes the performance of the various flip-flops. As shown in Table 6, with the proposed SA stage and the new glitch-free, contention-free single-ended latch, the power of the proposed SAFF has a significant advantage over that of the other SAFFs. The leakage of the proposed SAFF is the smallest among all the flip-flops, mainly due to the simplified single-ended latch. Furthermore, the modified latch gives the proposed SAFF the lowest CK-to-Q delay of all the SAFFs. Even though the setup time of the proposed SAFF is slightly larger than that of the other SAFFs due to the new SA stage, the same SA stage gives it the smallest hold time among the SAFFs. The area of the proposed SAFF is similar to that of the conventional SAFF, indicating that these improvements are obtained without additional area overhead. The clock loading of the proposed SAFF is similar to that of the other SAFFs, because the CK-controlled transistors can be made smaller when the original transistor is split, so the clock loading does not increase significantly. Compared with those of the MSFF, the setup time, CK-to-Q delay and power consumption of the proposed SAFF are greatly improved at the cost of a small increase in area and hold time. The power-delay product of the proposed SAFF is much smaller than that of the other flip-flops, indicating that the proposed SAFF can provide high-speed and low-power operation.
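The headline figures for the comparison with the conventional SAFF can be cross-checked against one another. The short calculation below combines the reported 41.3% delay reduction and 36.99% power reduction (25% toggle rate) into a PDP ratio, and multiplies out the reported cell dimensions; the variable names are illustrative, and the only inputs are numbers stated in this paper.

```python
# Reported reductions relative to the conventional SAFF (25% toggle rate).
delay_reduction = 0.413
power_reduction = 0.3699

# PDP = power * delay, so the remaining fractions multiply.
pdp_ratio = (1 - delay_reduction) * (1 - power_reduction)
pdp_improvement = 1 / pdp_ratio

# Reported cell dimensions in micrometres.
cell_width_um = 5.8
cell_height_um = 1.4
cell_area_um2 = cell_width_um * cell_height_um

print(f"PDP improvement over conventional SAFF: {pdp_improvement:.2f}x")
print(f"cell area: {cell_area_um2:.2f} um^2")
```

Both results agree with the values quoted in the text: roughly a 2.7× PDP improvement and an area of 8.12 µm².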

Conclusions
A low-power, high-speed SAFF is proposed in this paper. A new structure for the SA stage is proposed to minimize the pre-charge power of the SAFF. Additionally, a glitch-free and contention-free single-ended latch is proposed. With the new SA stage and the single-ended latch, the delay and power of the SAFF are greatly reduced. The power-delay product of the proposed SAFF shows a 2.7× improvement over the conventional SAFF at a 25% input data toggle rate. The improvement over the MSFF is 3.55×, which indicates that the proposed SAFF is a good choice for replacing MSFFs in digital systems to provide low-power, high-speed operation.