A Novel Cross-Latch Shift Register Scheme for Low Power Applications

The conventional shift register consists of master and slave (MS) latches, with each latch receiving data from the previous stage. As a result, the same data are stored in two separate latches, which consumes more power and occupies more layout area than necessary. To solve this issue, a novel cross-latch shift register (CLSR) scheme is proposed. It reduces the number of transistors needed for a 256-bit shift register by 48.33% compared with the conventional MS latch design. To verify its function, the CLSR was implemented in a TSMC 40 nm standard CMOS process. The simulation results show that, compared with the MS latch design, the proposed CLSR reduces the average power consumption by 36%, the leakage power by 60.53%, and the layout area by 34.76% at a supply voltage of 0.9 V and an operating frequency of 250 MHz.


Introduction
The shift register has been widely used in digital circuits for decades. It can convert between serial and parallel interfaces during data transmission, and it can also serve as a delay circuit. The shift register is a key component in digital circuits and is generally used in active-matrix displays, sensors, memories, communication receivers, and real-time image processing chips [1][2][3][4][5][6][7][8]. The shrinking feature size of process technology has enhanced the potential performance of electronic devices, and the capacity of shift registers has increased significantly. As the capacity increases, reducing the layout area [9][10][11][12] and the power consumption [9][11][13][14][15][16][17][18][19][20][21][22][23][24][25][26] of shift register designs becomes more important [27].
The conventional architecture of a shift register consists of N flip-flops connected in series. Circuit implementation with this architecture is fast and efficient. However, it cannot meet the demand for less layout area and less power consumption. In the master and slave (MS) latch architecture, the same data are stored in two latches during one clock cycle, so the MS latch tends to occupy redundant area and dissipate more power. Moreover, if shift registers are implemented by MS latches without a compensation circuit, the data will be transmitted to multiple outputs at the same time and may cause a malfunction. Recently, B. D. Yang proposed a shift register that employs a pulsed-latch methodology to solve this issue [27]. The concept behind this shift register is to use a sub-shift register with N-bit latches together with an (N + 1)-bit clock pulse generator that generates N + 1 non-overlapping clock pulses in one clock cycle to avoid data duplication. The author added one latch to each sub-shift register to store 1 bit of temporary data and thereby solve the timing problem between pulsed latches; however, this also increased the circuit complexity. The adoption of multiple non-overlapping delayed pulsed clock signals limits the operating clock frequency. Moreover, the use of N sub-shift registers increases the number of latches. The latch in [27] is controlled by a pulse generator, which makes it behave similarly to an edge-triggered flip-flop; the area can be decreased by sharing the pulse generator, which makes the design suitable for high-speed applications. However, the pulse width is difficult to control because of process variations. In summary, the reported work could not simultaneously reduce the layout area and the power consumption.
To solve that issue, in this study, a cross-latch scheme combined with a flip-flop design is proposed to achieve area saving under low operation voltage.
To overcome the current drawbacks of the conventional MS latch design, in this paper, we propose a novel cross-latch shift register (CLSR) scheme. The proposed CLSR requires only one latch to perform the same functions as those of the MS latches in a conventional shift register. Thus, the proposed CLSR consumes less power and occupies less layout area by reducing the number of transistors.
Details of the proposed configuration are explained in the next section. In Section 2, the principle of the proposed cross-latch shift register (CLSR) is discussed. Experimental validation and a description of the mechanism of the proposed CLSR are presented in Section 3.
Figure 1 shows the process of data transmission for a conventional shift register under different clock signals (CLK = 1 and CLK = 0). As shown in Figure 1a, for a shift register with positive-edge-triggered flip-flops, when CLK = 0 at the first-stage flip-flop (FF0), the master latch (M0) is open and receives the input data, while the slave latch (S0) is closed and stores the previous data. At the same time, in the second-stage flip-flop (FF1), the master latch (M1) tracks the data stored in the slave latch (S0) of FF0. Likewise, as shown in Figure 1b, when CLK = 1, the slave latch of each flip-flop tracks the data in the master latch and sends it to the output. This analysis shows that when CLK = 0, the slave latch (S0) of FF0 and the master latch (M1) of FF1 store the same data, and when CLK = 1, the master and slave latches of each flip-flop hold the same data. Redundant data are therefore stored in many latches, which may consume extra power and occupy more layout area.
Note that in addition to the traditional transmission gate flip-flop (TGFF), Figure 2 also includes three recently proposed low-power FF designs [28][29][30]. Figure 2b shows the adaptive coupling flip-flop (ACFF) design [28]. Unlike the traditional MS latch-based design, this design uses a differential latch structure with pass transistor logic to achieve true single-phase clock (TSPC) operation. To overcome the effects of process variation on the master latch, a pair of level recovery circuits is inserted into the cross-coupled path.
In this design, the clock drives only 4 transistors, and the total number of transistors is 22. When the FF operates at low data-switching activity, the lighter clock load and reduced circuitry can significantly reduce dynamic power consumption. However, several nodes inside this FF design can float, which limits the applications of the ACFF design [28]. Figure 2c shows the true single-phase clock flip-flop, named TSPCFF [29]. It is composed of a conventional dynamic TSPC-based FF design with 9 transistors (colored in blue) and an additional 9 transistors that ensure static operation and sufficient output drive capability. This FF design provides better power performance than the traditional TGFF design. Like the ACFF design, it also suffers from floating internal nodes. The charge sensing flip-flop (CSFF) design incorporates XOR logic in its master latch to compare inputs D/DN with outputs QI/QN, as shown in Figure 2d [30]. In this architecture, the input data are captured only when a discrepancy occurs. This gives it an advantage in power consumption over conventional MS latch-based FF designs when the data switching activity is low. The simulation results indicate that this design also has a floating internal node, as shown in Figure 3. According to the simulation waveforms, if the input data D change when CK = 1, internal nodes CS and DN are left floating, which leads to extra power consumption in the following inverter.
These circuits use a true single-phase clock scheme to reduce the clock load and the dynamic power consumption, but they contain internal floating nodes; Figure 3 shows the simulation waveforms of these FF designs. The floating nodes dissipate additional static power. Although this issue does not affect the function of the flip-flops, the additional static power consumption is confirmed, especially at low operating speeds or in standby mode. For this reason, these FF designs are excluded from the performance comparison in this paper. Table 1 compares the total number of transistors and the layout areas of the 2-bit shifted flip-flop designs, including the transistors used to generate differential clock signals and pulsed clock signals [27].

The Proposed Cross-Latch Shift Register (CLSR) Design
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11

Principle of the Proposed Circuit Architecture
To solve the issues of large layout area and high power consumption, the proposed CLSR scheme uses a cross-latch architecture. Figure 4 shows the architectures of the MS design (TGFF) and the proposed cross-latch shift register (CLSR) design. The proposed CLSR scheme consists of one cross-latch that performs the same functions as the master latch and slave latch of the conventional shift register. When CLK = 0, the slave latch S0 and master latch M1 store the same data, so these two latches can be replaced by one cross-latch in the proposed CLSR design. When CLK = 1, the proposed CLSR uses one latch to hold the data that the master latch and slave latch of the same flip-flop would both store. In this CLSR scheme, the conventional master-slave (MS) latches, which have the drawback that two latches hold the same data in each clock phase, are replaced with one cross-latch.
In the following, we use a 3-bit shift register to explain the difference between the proposed CLSR scheme and the conventional MS latch architecture. Figure 5 shows the schematic of the proposed cross-latch shift register (CLSR) design. When CLK = 1, the master and slave latches of each flip-flop (FF0, FF1, FF2) in the conventional MS latch architecture store the same data. Thus, the two latches can be replaced by one latch (green box) that still holds the necessary data. Likewise, when CLK = 0, the slave latch of one flip-flop and the master latch of the next flip-flop store the same data, so the data can be stored by the path shown in the blue box. The data are conducted through the latch in the blue box when CLK = 0 and through the latch in the green box when CLK = 1. The proposed CLSR scheme therefore does not store redundant data during data transmission. Because the number of transistors is reduced dramatically, the design goals of low power consumption and small layout area can be achieved.
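The redundancy argument above can be checked with a small behavioral model. The following is a minimal sketch, assuming ideal, zero-delay latches in a conventional positive-edge-triggered MS shift register; the function names and the test stimulus are hypothetical and are not taken from the paper. It models only the conventional MS design, not the CLSR circuit itself.

```python
def simulate_ms_shift_register(bits, n_stages=3):
    """Behavioral model of a positive-edge-triggered master-slave shift register.

    Returns a list of (clk, masters, slaves) snapshots, one per clock phase.
    """
    masters = [0] * n_stages
    slaves = [0] * n_stages
    snapshots = []
    for d in bits:
        # CLK = 0: masters are transparent, slaves hold their data.
        # M0 tracks the serial input; M(i) tracks S(i-1).
        masters = [d] + slaves[:-1]
        snapshots.append((0, masters[:], slaves[:]))
        # CLK = 1: slaves are transparent, masters hold their data.
        slaves = masters[:]
        snapshots.append((1, masters[:], slaves[:]))
    return snapshots


def redundant_pairs(clk, masters, slaves):
    """Count latch pairs that hold identical data in the given clock phase."""
    if clk == 0:
        # S(i) and M(i+1) hold the same bit.
        return sum(slaves[i] == masters[i + 1] for i in range(len(slaves) - 1))
    # CLK = 1: M(i) and S(i) hold the same bit.
    return sum(m == s for m, s in zip(masters, slaves))


if __name__ == "__main__":
    for clk, m, s in simulate_ms_shift_register([1, 0, 1, 1]):
        pairs = redundant_pairs(clk, m, s)
        print(f"CLK={clk} masters={m} slaves={s} redundant pairs={pairs}")
```

In this idealized model, every S(i)/M(i+1) pair matches while CLK = 0 and every M(i)/S(i) pair matches while CLK = 1, which is the duplication that a single shared latch per pair is meant to remove.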

To implement the proposed CLSR scheme, an additional inverter is added in flip-flop FF0 to ensure that the input data and output Q are at the same level. In the last stage, flip-flop FF2, an independent latch is adopted to ensure that static operation is maintained in the output stage. Accordingly, to build an N-bit shift register, FF1 is replicated N − 2 times, while FF0 and FF2 remain the first and last stages, respectively. In sum, the CLSR design fulfills the fundamental functions of the MS design and removes its drawbacks.
Figure 6b shows the post-layout simulation waveforms of the 256-bit shift register implemented using the proposed CLSR scheme when operating at 0.9 V/1.0 GHz/TT corner. The industry uses a two-letter designation to describe the different corners, where the first letter refers to the NMOS device and the second refers to the PMOS device. The TT corner is the center corner where wafers are normally produced (i.e., typical process parameters). The proposed CLSR scheme realizes the shift register function while reducing circuit complexity and maintaining fully static operation.
Although the proposed CLSR scheme reduces both the power consumption and the layout area of the circuit, it must deal with data conflicts. Figure 6a shows a specific situation caused by the rising edge of the clock. As the clock driver changes its phase from low to high, a short phase difference arises between the rising edge and the falling edge, so that CLKB and CLKI briefly reach equal potential. The previous data feedback path (red) is stronger than the current input data path (green). Therefore, a data conflict exists between nodes X and Y. The voltage of the resulting glitch is about 0.167 V. This issue causes additional power consumption and might reduce the reliability of the circuit [31].

Delay Clock Circuit
To solve this issue, we propose a circuit that outputs CLKBI and CLKI synchronously, as shown in Figure 6c. The delay circuit I2 uses only two transistors (one pMOS and one nMOS). Since I1 and I2 receive the signal from the CLKB node at the same time, the VGS of I1 and I2 change at the same time. In other words, I1 and I2 provide the CLKBI and CLKI signals synchronously to the proposed CLSR design, so that the transmission gates are turned on and off at the same time [32]. The simulation waveforms are shown in Figure 6d. In the master-slave latch, there is one inverter delay between CLKB and CLKI; thus, a short contention exists and the power consumption increases (red circle). With the delay clock circuit, CLKBI and CLKI are output synchronously (blue circle). Thus, the proposed design not only solves the power problem but also has better race characteristics than a conventional master-slave-based design.
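The effect of the one-inverter skew can be illustrated with a sampled logic-level model. This is only a sketch under stated assumptions: waveforms are ideal 0/1 samples, the inverter delay `T_INV` and the clock period are hypothetical numbers, and the delay circuit is assumed to match the inverter delay exactly. It does not model the transistor-level circuit in Figure 6c.

```python
def delayed(signal, delay, invert=False):
    """Delay a sampled logic waveform by `delay` samples, optionally inverting it."""
    first = signal[0]
    shifted = [first] * delay + signal[:len(signal) - delay]
    return [1 - v for v in shifted] if invert else shifted


def overlap_samples(a, b):
    """Count samples where two nominally complementary clocks are at the same level."""
    return sum(x == y for x, y in zip(a, b))


# Hypothetical numbers: 20 samples per clock period, inverter delay of 1 sample.
PERIOD, T_INV, CYCLES = 20, 1, 4
clkb = [1 if (t % PERIOD) < PERIOD // 2 else 0 for t in range(PERIOD * CYCLES)]

# Conventional clocking: CLKI is CLKB passed through one inverter,
# so it lags CLKB by one inverter delay.
clki_conventional = delayed(clkb, T_INV, invert=True)

# Delay-clock idea (assumed matched delays): both outputs are derived from CLKB,
# one non-inverting (CLKBI) and one inverting (CLKI).
clkbi = delayed(clkb, T_INV, invert=False)
clki_matched = delayed(clkb, T_INV, invert=True)

print("conventional overlap samples:", overlap_samples(clkb, clki_conventional))
print("matched overlap samples:", overlap_samples(clkbi, clki_matched))
```

In this model the conventional pair is at the same level for `T_INV` samples after every CLKB transition, while the matched pair never is. In silicon the matching is only approximate, so the overlap would shrink rather than vanish.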

Analysis of Layout Area and Power Consumption of the Proposed CLSR Design
To achieve a small layout area and low power consumption, the proposed flip-flop is designed with a cross-latch architecture. Since the conventional master-slave (MS) latch transmission gate flip-flop (TGFF) is the most widely used flip-flop, the proposed CLSR scheme is compared with it. The proposed CLSR scheme was implemented in a TSMC 40 nm standard CMOS process [33]. Figure 7 shows the post-layout simulation results of the proposed 256-bit CLSR design at 1 GHz in the TT corner with a supply voltage of 0.9 V. The output QN is generated by shifting the input data according to the clock signal. After 16 rising clock edges, the data have been shifted by 16 bits, and the signal Q15 is generated. After 256 rising clock edges, the data have been shifted by 256 bits, and the signal Q255 is generated. The waveforms show that after the delay clock circuit (Figure 6c) is added, the above-mentioned glitch is eliminated.
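The relation between the number of rising edges and the output index can be sketched with a purely behavioral shift register. The code below is an illustration only, assuming ideal edge-triggered stages that reset to 0; the function names and the single-pulse stimulus are hypothetical.

```python
def shift_register_outputs(serial_in, n_bits):
    """Clock `serial_in` into an n-bit shift register, one bit per rising edge.

    Returns the register state after each rising edge; state[k] models Qk.
    """
    state = [0] * n_bits
    history = []
    for d in serial_in:
        state = [d] + state[:-1]  # every stage takes its predecessor's value
        history.append(state[:])
    return history


def first_edge_with_one(history, k):
    """1-based index of the first rising edge after which Qk is 1, or None."""
    for edge, state in enumerate(history, start=1):
        if state[k] == 1:
            return edge
    return None


# A single 1 followed by zeros, clocked through a 256-bit register.
stimulus = [1] + [0] * 299
history = shift_register_outputs(stimulus, 256)
print("Q15 first high after edge:", first_edge_with_one(history, 15))
print("Q255 first high after edge:", first_edge_with_one(history, 255))
```

With this stimulus, output Qk first goes high after k + 1 rising edges, which matches the description of Q15 after 16 edges and Q255 after 256 edges.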

Figure 8 shows the layout comparison between 256-bit shift registers constructed with the proposed CLSR scheme and with the conventional master-slave latch architecture. The number of transistors in the conventional MS latch shift register design is 6144, and the number in the proposed CLSR scheme design is 3174. Compared with the conventional MS latch design, the proposed CLSR scheme thus reduces the number of transistors by 48.33%.
Figure 9a shows the power consumption of the proposed CLSR scheme design at different switching activities (0-100%) at 0.9 V/250 MHz. Because the required number of transistors is decreased dramatically, the proposed CLSR scheme reduces the average power consumption by 36% compared with the conventional MS latch design. In addition, the gate capacitances are reduced, so the dynamic power dissipation is suppressed as well. Figure 9b compares the leakage power of the MS latch and CLSR architectures for different word lengths. The leakage current of an MS latch shift register is mainly contributed by the inverters and is proportional to the number of inverters. In the proposed CLSR scheme, the number of inverters is reduced significantly. Thus, the proposed 256-bit CLSR scheme design reduces the leakage current by 60.53% compared with the conventional design.
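The dependence on switching activity, capacitance, and the number of leaking devices can be summarized with the standard first-order CMOS power model. The expressions below are textbook approximations rather than formulas from this paper. The symbols are assumptions introduced here: $\alpha$ is the switching activity, $C_{\mathrm{sw}}$ the total switched capacitance, $V_{DD}$ the supply voltage, $f_{\mathrm{clk}}$ the clock frequency, and $I_{\mathrm{leak}}$ the total leakage current.

```latex
P_{\mathrm{total}} \approx P_{\mathrm{dyn}} + P_{\mathrm{leak}},
\qquad
P_{\mathrm{dyn}} \approx \alpha \, C_{\mathrm{sw}} \, V_{DD}^{2} \, f_{\mathrm{clk}},
\qquad
P_{\mathrm{leak}} \approx I_{\mathrm{leak}} \, V_{DD}
```

Under this model, removing transistors lowers $C_{\mathrm{sw}}$ and therefore $P_{\mathrm{dyn}}$ at a given activity and frequency, and removing inverters lowers $I_{\mathrm{leak}}$ and therefore $P_{\mathrm{leak}}$.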
Figure 9c shows the power consumption at different operating frequencies (1 MHz to 1 GHz). The results show that the proposed CLSR scheme achieves an average power saving of 35% compared with the conventional MS latch design. Figure 9d shows the power consumption of the proposed CLSR scheme design under different supply voltages. To ensure that both designs operate normally, the operating frequency is set at 25 MHz. The post-layout simulation results show that when the proposed CLSR scheme operates from 0.4 V to 0.9 V, it saves 31% power on average compared with the traditional design. Figure 10 shows its optimal power delay product (PDP) performance. The PDP was employed as a composite performance index in this work. For each supply voltage, the CQ delay was scanned to obtain the best PDP number. Both shift register designs were found to function properly under process variations, and the proposed CLSR scheme functioned correctly in all simulation trials. According to these results, the proposed CLSR scheme, when applied to shift registers, achieves better area, power, and PDP than the master-slave architecture. Moreover, to achieve low-power design, recently reported works aggressively reduce the number of transistors [28][29][30]. However, these flip-flops have floating internal nodes. Thus, the reported flip-flops must be selected carefully for different applications.
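For reference, the PDP used as the composite index can be written in its usual form. This is a sketch of the common definition, assuming that the delay term is the clock-to-Q (CQ) delay mentioned above; the symbols $P_{\mathrm{avg}}$ and $t_{\mathrm{CQ}}$ are introduced here and are not defined in the source.

```latex
\mathrm{PDP}(V_{DD}) = P_{\mathrm{avg}}(V_{DD}) \times t_{\mathrm{CQ}}(V_{DD})
```

Because the index is a product, a percentage saving in PDP can differ from the percentage saving in power when the two designs have different delays.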
Finally, Table 2 shows a detailed performance comparison of different 256-bit shift registers. The area of the proposed CLSR scheme design is 1932.47 µm². Compared with the conventional MS latch shift register (2961.88 µm²), the proposed CLSR scheme design achieves a 34.76% reduction in layout area. Table 2 also records the power consumption, leakage power, and PDP at 0.9 V and 0.4 V. When operating at 0.9 V/250 MHz, the proposed design saves up to 36% in power consumption and 16.90% in PDP. At the same time, because of the large reduction in the number of transistors, the proposed CLSR scheme effectively suppresses leakage power consumption [34].
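The relative savings quoted in this section follow from the absolute numbers by simple arithmetic. The short sketch below recomputes them from the transistor counts and layout areas given in the text; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values quoted in the text for the 256-bit shift registers.
transistors = percent_reduction(6144, 3174)
area = percent_reduction(2961.88, 1932.47)  # layout areas in square micrometres

print(f"transistor reduction: {transistors:.2f}%")
print(f"area reduction: {area:.2f}%")
print(f"transistors per bit, MS design: {6144 / 256:.0f}")
```

The area figure reproduces the reported 34.76%. The transistor figure evaluates to roughly 48.34%, within 0.01 percentage point of the reported 48.33%; the small difference presumably comes from truncation rather than rounding.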

Figure 10. Power delay product performance at different supply voltages (horizontal axis: operation voltage (V); curves: master-slave and CLSR).

Conclusions
This paper proposed a novel cross-latch shift register (CLSR) scheme to reduce power consumption and layout area. By using one cross-latch with a delay clock circuit topology to replace the master and slave latches, we reduced the number of transistors by 48.33% compared with the conventional MS latch shift register. At the same time, the total gate capacitance was reduced, and the static and dynamic power dissipation were suppressed as well. The proposed CLSR scheme was verified in a TSMC 40 nm standard CMOS process. Comparisons with the conventional MS latch design showed that the proposed CLSR scheme not only reduces the average power consumption and leakage power but also reduces the layout area. The proposed CLSR scheme is a potential candidate for digital circuit designs that rely on shift registers.