QCA-Based PIPO and SIPO Shift Registers Using Cost-Optimized and Energy-Efﬁcient D Flip Flop

Abstract: With the growing use of quantum-dot cellular automata (QCA) nanotechnology, digital circuits designed at the nanoscale have a number of advantages over CMOS devices, including lower power consumption, higher processing speed, and higher density. Several flip flop designs realized in QCA technology have been proposed in the literature. However, the majority of these designs suffer from large cell counts, large area utilization, and high latency, which lead to a high circuit cost. To address this, this work performed a literature survey of D flip flop (DFF) designs and of the complex sequential circuits that can be built from them. A new D flip flop design was proposed, and an in-depth comparison with existing designs was performed to assess its performance. Further, sequential circuits such as parallel-in-parallel-out (PIPO) and serial-in-parallel-out (SIPO) shift registers were designed using the proposed flip flop. A comprehensive evaluation of the energy dissipation of all presented flip flop circuits and other sequential circuits was also performed using the QCAPro tool, and their energy dissipation maps were obtained. The suggested designs showed lower power dissipation and were cost-efficient, making them suitable for designing higher-power circuits.


Introduction
CMOS technology may be severely limited by small-geometry device effects, which can restrict transistor operation. Sub-threshold conduction, drain-induced barrier lowering (DIBL), the punch-through effect, and the hot-carrier effect are some of the small-geometry conditions that degrade transistor performance and can damage the device permanently. Interconnect damage caused by electromigration, electrostatic discharge, and electrical overstress is also a considerable concern for the reliability of small-geometry devices [1,2]. Researchers are therefore moving towards alternative technologies, and QCA nanotechnology is one of the promising technologies in this area [3,4]. The fundamental principle of information flow in QCA technology is quantum mechanical tunneling [5][6][7][8][9]. The repulsion between the electrons is Coulombic in nature, and this provides the foundation for the information flow in QCA circuits. The QCA paradigm has some basic building blocks that are incorporated in every circuit designed. A QCA cell is one such building block: it contains quantum dots at four positions and two itinerant electrons [10][11][12][13][14][15][16]. The quantum dots form four tunnel junctions among them, which allow the easy passage of electrons. These electrons try to occupy antipodal sites due to their mutual Coulombic repulsion. A QCA cell is defined by its polarization state, and two polarization states exist that represent either logic '0' or logic '1' [17][18][19][20]. A basic QCA cell and its two polarization states are depicted in Figure 1. Such cells form other basic structures, such as the QCA wire. The first cell acts as a driver cell that sets the polarization of the cell next to it, and so on.
Cells arrange their electrons so that the Coulombic repulsion between them is minimized, and information is transmitted down the QCA wire in this way [21][22][23][24]. Figure 2a,b shows the two types of QCA wires, i.e., the 90° wire and the 45° wire, respectively. From a 45° wire, both the complemented and the uncomplemented values of a signal can be obtained. The majority voter gate and the inverter gate are the two basic blocks for building circuits in QCA. A three-input majority voter gate in QCA is presented in Figure 3 [23,25,26].
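The logical role of the two basic blocks can be illustrated with a short behavioral sketch. It models only the Boolean function of the three-input majority voter and the inverter, not the cell-level physics. The derivation of AND and OR by fixing one input to 0 or 1 is a standard property of majority logic rather than something stated in this paper, and all function names are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote over bits: 1 when at least two inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0


def inverter(a: int) -> int:
    """Logical complement of a single bit."""
    return 1 - a


def and_gate(a: int, b: int) -> int:
    """AND obtained by fixing the third majority input to 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """OR obtained by fixing the third majority input to 1."""
    return majority(a, b, 1)


if __name__ == "__main__":
    # Print the truth table of the two derived gates.
    for a in (0, 1):
        for b in (0, 1):
            print(f"a={a} b={b} AND={and_gate(a, b)} OR={or_gate(a, b)}")
```

Together with the inverter, these two derived gates are sufficient to express any Boolean function, which is why the majority voter and the inverter are treated as the basic blocks.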
In QCA, clocking plays a substantial part in the proper working of the circuits. It controls the data flow in a circuit and, in addition, acts as a supply that delivers power to the cells [6,10,14]. The clock is applied to a QCA circuit in four zones, each represented by a different color: green for clock 0, magenta for clock 1, blue for clock 2, and white for clock 3. Each clock zone has four distinct phases, i.e., switch, hold, release, and relax. Figure 4 depicts these clocking zones in QCA [1,26].
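The relationship between the four zones and the four phases can be sketched as follows. The sketch assumes the usual QCA convention that each zone lags the previous one by a quarter of a clock cycle; that lag is not stated explicitly in the text above, and the names `PHASES`, `ZONE_COLORS`, and `phase_of` are illustrative.

```python
PHASES = ("switch", "hold", "release", "relax")
ZONE_COLORS = {0: "green", 1: "magenta", 2: "blue", 3: "white"}


def phase_of(zone: int, step: int) -> str:
    """Phase of a clock zone at a given quarter-cycle step.

    Assumption: zone k lags zone 0 by k quarter cycles.
    """
    return PHASES[(step - zone) % 4]


if __name__ == "__main__":
    # One full clock cycle is four quarter-cycle steps.
    for step in range(4):
        row = ", ".join(
            f"clock {z} ({ZONE_COLORS[z]}): {phase_of(z, step)}" for z in range(4)
        )
        print(f"step {step}: {row}")
```

Under this assumption, a zone enters its switch phase exactly when the preceding zone is in its hold phase, so a stable value is always available to drive the cells that are switching.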
The remainder of the paper is organized as follows: Section 2 presents the literature review of D flip flop designs and shift registers, highlighting their drawbacks. Section 3 presents the proposed design of the D flip flop along with its QCA implementation. Section 4 presents the proposed shift register designs along with their QCA implementation, followed by the energy dissipation analysis of all designs in Section 5. Section 6 presents the performance comparison of all designs, followed by the conclusion in Section 7.

Literature Review
A flip flop is a sequential circuit whose output depends on its present input and its past output. It is a memory unit capable of storing 1 bit of binary information [6]. Flip flops and memory cells act as the characteristic building blocks for realizing more complex sequential circuits. This section presents the D flip flop designs existing in the literature [28][29][30][31]. The drawbacks of these designs were highlighted, and a performance comparison was drawn based on the various QCA performance parameters. The design discussed in [28] used 59 cells with an area of 0.075 µm² and showed a delay of 1.75. The design proposed in [29] used 56 cells and had a latency of 2.5. Hence, both the cell count and the latency of this design were large, and it utilized more area, which increased the overall quantum cost. The D flip flop design proposed in [30] utilized 48 QCA cells but showed a lower latency of 0.75. This design is shown in Figure 5. Further, a four-bit PIPO shift register was designed using this D flip flop; it consisted of 260 cells in its QCA design and showed a latency of 1. This shift register design is shown in Figure 6.

The design proposed in [31] used 44 cells and showed a latency of 1. The design discussed in [13] used 54 cells in its QCA design and had an increased delay of 1.25. It used a larger cell area and a larger total area, which increased the total quantum cost. The design proposed in [19] utilized 24 cells in its QCA design. It occupied a cell area of about 0.007776 µm² and showed a latency of four clock phases, i.e., one clock cycle. Further, a four-bit shift register was designed using that flip flop design. It utilized 136 cells in its QCA design, with a total area of about 0.454 µm². The latency for this design was 1, and the quantum cost was 0.454.

Proposed QCA Design of D Flip Flop
From the broad literature review of flip flops designed in QCA and of the more complex sequential circuits realized with them, it is found that there is a need for optimized designs of the shift register and other sequential circuits in QCA nanotechnology. From Equation (1), it is observed that when the clock CLK is 1, the output of the flip flop equals the input D, and when CLK is 0, the output does not change and equals the previous state Qn−1. The proposed DFF design utilizes 21 cells in its QCA implementation and occupies a total area of about 0.04 µm². It has a latency of 1 and a cell area of 0.0068 µm². It responds to the positive level of the input clock signal C; the QCA implementation of this design and its simulated waveform are presented in Figures 8 and 9, respectively. The output Q follows the input D with a delay of 1. When clock C is 1, the input sequence is 01, and therefore the output is also 01 for that duration. The output then retains its previous value while the clock is 0. When the next clock cycle appears, the output again follows the input. The proposed flip flop was designed using the QCADesigner tool [32].
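The level-sensitive behavior described above can be checked with a small behavioral model. Equation (1) is not reproduced in this text, so the sketch assumes the form Q = D·CLK + (NOT CLK)·Qn−1, which matches the verbal description. The majority-gate decomposition shown is one textbook realization and is not claimed to be the 21-cell layout of the proposed design. The model also ignores the QCA latency of 1, and all names are illustrative.

```python
from typing import List, Sequence


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote over bits."""
    return 1 if (a + b + c) >= 2 else 0


def d_latch(d: int, clk: int, q_prev: int) -> int:
    """Level-sensitive D storage: follow D while CLK is 1, otherwise hold."""
    return d if clk == 1 else q_prev


def d_latch_majority(d: int, clk: int, q_prev: int) -> int:
    """Same function written with majority gates (assumed decomposition)."""
    load = majority(d, clk, 0)            # D AND CLK
    hold = majority(1 - clk, q_prev, 0)   # (NOT CLK) AND Q_prev
    return majority(load, hold, 1)        # load OR hold


def simulate(d_seq: Sequence[int], clk_seq: Sequence[int], q0: int = 0) -> List[int]:
    """Apply paired D/CLK samples in order and record Q after each sample."""
    q = q0
    outputs = []
    for d, clk in zip(d_seq, clk_seq):
        q = d_latch(d, clk, q)
        outputs.append(q)
    return outputs


if __name__ == "__main__":
    # CLK is high for the first two samples (D = 0, 1) and low afterwards.
    print(simulate([0, 1, 1, 0], [1, 1, 0, 0]))
```

With the clock high for the input sequence 01 and low afterwards, the model outputs 0, 1 and then holds 1, mirroring the waveform description: the output follows the input while the clock is 1 and retains its value while the clock is 0.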

Proposed Shift Register Designs
Flip flops and registers are essential constituents of many digital circuits, including special-purpose processors and memory chips. DFFs and shift registers can further be used for designing counters. In this work, new designs for different types of registers were developed using the proposed DFF.

PIPO Designs
When the inputs are applied to the register in parallel and the outputs are obtained in parallel, the register is called a parallel-input-parallel-output (PIPO) shift register. The block diagram of a simple PIPO register formed using D flip flops is shown in Figure 10. All inputs enter their respective flip flops in parallel, and all the outputs are obtained in parallel too.
The two-bit PIPO shift register was constructed using two of the proposed DFFs. It utilized 55 cells in its QCA design. The two parallel inputs were D1 and D2 for stage 1 and stage 2, respectively, and CLK is the clock signal applied to the circuit. Q1 and Q2 were the two parallel outputs from stages 1 and 2, respectively. Each stage had a latency of 1 because the QCA clock zone changes four times from the respective input to the output, with a delay of 0.25 for each zone. Figure 11 shows the two-bit PIPO shift register design as implemented in QCA, and Figure 12 shows its simulation graph. When the positive level of the clock appears, the outputs start to follow their respective inputs with a delay of 1 for each stage. For the first clock, the input D1 is 0011, and it can be seen that output Q1 reflects the same sequence 01 with the delay. The second input D2 is 0101 for this duration, and output Q2 reflects the same sequence with a delay of 1.
The four-bit PIPO shift register is built from four of the proposed DFFs. It utilizes 114 cells in its QCA implementation. The four parallel inputs to the four stages are D1, D2, D3, and D4, and Q1, Q2, Q3, and Q4 are the respective parallel outputs from these stages. The latency for each stage is 1. Figures 13 and 14 show the four-bit PIPO shift register design as implemented in QCA and its simulation graph, respectively.
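The parallel-load behavior of the PIPO register can be sketched as a list of independent level-sensitive stages sharing one clock. This is a purely logical model under the assumption that each stage follows its input while CLK is 1 and holds otherwise. It does not model cell layout or the one-cycle latency of each stage, and `pipo_step` is an illustrative name.

```python
from typing import List, Sequence


def pipo_step(inputs: Sequence[int], clk: int, state: Sequence[int]) -> List[int]:
    """Return the next outputs Q1..Qn of an n-bit PIPO register.

    Every stage loads its own input when clk is 1 and keeps its stored
    bit when clk is 0; the stages do not interact with one another.
    """
    if len(inputs) != len(state):
        raise ValueError("inputs and state must have the same width")
    return [d if clk == 1 else q for d, q in zip(inputs, state)]


if __name__ == "__main__":
    state = [0, 0, 0, 0]
    state = pipo_step([1, 0, 1, 1], 1, state)   # clock high: parallel load
    print(state)
    state = pipo_step([0, 0, 0, 0], 0, state)   # clock low: contents held
    print(state)
```

The same function covers the two-bit and four-bit registers, since the register width is taken from the length of the input list.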
The performance parameters of the proposed two-bit and four-bit PIPO shift registers are given in Table 1. The PIPO designs are efficient in terms of the total number of cells utilized, the area occupied by the cells, the total area, the cost, and the latency.
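As a rough illustration of the cell-count saving, the figures quoted in this paper for four-bit registers can be compared directly. The sketch uses only the cell counts stated in the text (260 cells for the PIPO register of [30], 136 cells for the shift register of [19], and 114 cells for the proposed four-bit PIPO register). It does not reproduce Table 1, and cell count is only one of the listed performance parameters.

```python
def reduction_percent(reference_cells: int, proposed_cells: int) -> float:
    """Percentage reduction in cell count relative to a reference design."""
    return 100.0 * (reference_cells - proposed_cells) / reference_cells


# Cell counts as quoted in the text for four-bit registers.
REFERENCE_FOUR_BIT = {
    "[30] four-bit PIPO shift register": 260,
    "[19] four-bit shift register": 136,
}
PROPOSED_FOUR_BIT_PIPO = 114

if __name__ == "__main__":
    for name, cells in REFERENCE_FOUR_BIT.items():
        saving = reduction_percent(cells, PROPOSED_FOUR_BIT_PIPO)
        print(f"{name}: {saving:.1f}% fewer cells in the proposed design")
```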

SIPO Designs
When the input is applied serially and the output is obtained in parallel, the shift register is termed serial-input-parallel-output (SIPO). Figure 15 is the block diagram of a four-bit SIPO shift register. All the DFFs are joined in cascade: the serial input enters the first DFF at the left, and the output of each DFF connects to the input of the DFF following it. Since the same CLK pulse is applied to each DFF, the design is synchronous, and the output of each DFF is obtained in parallel.

Figure 15. Block diagram representation of SIPO shift register.
The two-bit SIPO shift register is designed with the proposed DFF. It utilizes 73 cells in its QCA design. D is the serial input applied to the flip flop of the first stage. Q1 is the first-stage output, which connects to the input of the second stage of the SIPO shift register, and Q2 is the second-stage output. The latency for stage 1 is 1, as the QCA clock zone undergoes four changes from input D to output Q1. These changes are clock 1 shown in magenta to clock 2 shown in blue and to clock 3 shown in white. The clock zone undergoes another four changes from Q1 to output Q2, making a total of eight changes in clock zones from serial input D to output Q2, with each change giving rise to a delay of 0.25. Therefore, the delay for the second stage is 2. Figure 16 shows the implementation of the two-bit SIPO shift register in QCA, and Figure 17 shows its simulation graph. When the first clock pulse appears, D is 0; therefore, Q1 follows the serial input and remains 0 until the next clock pulse appears. It can be seen from the simulation graph that the output appears at Q1 after a delay of 1. Then, at the second clock pulse, Q2 follows the input reaching it through Q1, i.e., it becomes 0 with a delay of 2, and at the same time Q1 goes from 0 to 1 because the serial input D is 1; Q1 keeps this value until the next clock pulse appears. This process continues, and the input sequence 0101 appears at the parallel outputs as it keeps shifting with each clock pulse.
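The shifting described above can be reproduced with a pulse-by-pulse logical model. The sketch assumes that all stages start at 0 and that one serial bit is accepted per clock pulse. It abstracts away the clock-zone delays, so it shows which bit sits in which stage after each pulse rather than the exact timing of the waveform. The function name `sipo_shift` is illustrative.

```python
from typing import List, Sequence


def sipo_shift(serial_bits: Sequence[int], n_stages: int) -> List[List[int]]:
    """Return the parallel outputs [Q1, ..., Qn] after each clock pulse.

    Assumption: every stage initially holds 0. On each pulse the serial
    bit enters stage 1 and each stored bit moves one stage to the right.
    """
    state = [0] * n_stages
    history = []
    for bit in serial_bits:
        state = [bit] + state[:-1]
        history.append(list(state))
    return history


if __name__ == "__main__":
    # Serial input 0, 1, 0, 1 applied to a two-bit register.
    for pulse, outputs in enumerate(sipo_shift([0, 1, 0, 1], 2), start=1):
        print(f"pulse {pulse}: Q1={outputs[0]} Q2={outputs[1]}")
```

For the two-bit register, the second pulse gives Q1 = 1 and Q2 = 0, which agrees with the description that Q2 takes the first bit while Q1 moves from 0 to 1. Passing a larger `n_stages` gives the same model for wider registers.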
The two-bit SIPO shift register is built from the proposed DFF and uses 73 cells in its QCA design. D is the serial input applied to the flip flop of the first stage. Q1 is the first-stage output, which connects to the input of the second stage, and Q2 is the second-stage output. The latency of stage 1 is 1 because the QCA clock zone changes four times between input D and output Q1. These changes run from clock 1 (shown in magenta) to clock 2 (shown in blue) and on to clock 3 (shown in white). The clock zone changes four more times from Q1 to output Q2, giving a total of eight clock-zone changes from serial input D to output Q2. Each change contributes a delay of 0.25, so the delay of the second stage is 2. Figure 16 shows the implementation of the two-bit SIPO shift register in QCA, and Figure 17 shows its simulation graph. When the first clock pulse appears, D is 0; Q1 therefore follows the serial input and remains 0 until the next clock pulse appears. The simulation graph shows that this output appears at Q1 after a delay of 1. At the second clock pulse, Q2 follows the value passed to it through Q1, i.e., becomes 0 with a delay of 2; at the same time, Q1 goes from 0 to 1 because serial input D is 1, and it holds that value until the next clock pulse appears. This process continues, and the input sequence 0101 appears at the parallel outputs as it shifts with each clock pulse.

The three-bit SIPO shift register is built from three proposed DFFs forming three stages and uses 133 cells in its QCA design. The latency is 1, 2, and 4 for stage 1, stage 2, and stage 3, respectively. Serial input D is coupled to the input of the first stage. Q1, Q2, and Q3 are the parallel outputs; Q1 is connected to the second-stage input, and Q2 is connected to the third-stage input. Figure 18 shows the three-bit SIPO shift register as implemented in QCA, and Figure 19 shows its simulation graph.

Figure 20 shows the QCA implementation of the four-bit SIPO shift register, which is built from four proposed DFFs and uses 199 cells. D is the serial input that connects to the input of the DFF in the first stage. Q1, Q2, Q3, and Q4 are the parallel outputs of the four stages. Output Q1 of the first stage connects to the input of the second stage, output Q2 connects to the input of the third stage, and output Q3 connects to the input of the fourth stage. The delay is 1, 2, 4, and 6 for stages 1, 2, 3, and 4, respectively. Figure 21 shows the simulation graph of the four-bit SIPO shift register. The waveform shows that when the first clock pulse appears, serial input D is 0; output Q1 therefore follows the input and attains the value 0 after a delay of 1. It holds that value until the next clock pulse appears, at which point input D is 1 and Q1 goes from 0 to 1. At the second clock pulse, the first bit, 0, is shifted to stage 2 and appears in the graph after a delay of 2. It holds that value until the third clock pulse appears and then follows the next value of Q1, i.e., 1. Similarly, bit 0 shifts to stages 3 and 4 at clock pulses 3 and 4. While the first bit keeps shifting from stage 1 to stage 4, Q1 alternates from 1 to 0 and back to 1 at successive clock pulses, following the serial input D.
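The shifting behaviour described above can be illustrated with a minimal behavioural sketch. It assumes an idealized register whose stages all start at 0 and update together on each clock pulse, and it ignores the per-stage QCA latencies quoted in the text; the function name `sipo_shift` is hypothetical and is not part of the QCA design flow.

```python
from typing import List


def sipo_shift(serial_bits: List[int], n_stages: int, initial: int = 0) -> List[List[int]]:
    """Return the parallel outputs [Q1, ..., Qn] after each clock pulse.

    Idealized model (assumption): every stage is cleared to `initial`
    and all stages update simultaneously on each pulse.
    """
    state = [initial] * n_stages
    history = []
    for bit in serial_bits:
        # Stage 1 takes the serial input D; stage k takes the old output of stage k-1.
        state = [bit] + state[:-1]
        history.append(list(state))
    return history


if __name__ == "__main__":
    # Serial input sequence 0101 applied to a four-stage register.
    for pulse, outputs in enumerate(sipo_shift([0, 1, 0, 1], n_stages=4), start=1):
        print(f"clock pulse {pulse}: Q1..Q4 = {outputs}")
```

In this model the first bit, 0, reaches Q4 on the fourth pulse while Q1 alternates with the serial input, which matches the qualitative behaviour read from the waveform.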
The performance parameters for the proposed 2-bit, 3-bit, and 4-bit SIPO shift registers are given in Table 2. These SIPO shift registers are efficient in terms of QCA performance metrics such as the number of QCA cells employed in the design, cell area utilization, total area utilized, and cost.
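The latency figures quoted for the two-bit SIPO shift register follow from counting clock-zone changes, each contributing a delay of 0.25. The short sketch below reproduces that arithmetic; the helper name `latency_from_zone_changes` is an illustrative assumption, and only the two-bit counts (four changes per stage) are taken from the text.

```python
ZONE_DELAY = 0.25  # delay contributed by one clock-zone change, as stated in the text


def latency_from_zone_changes(zone_changes: int) -> float:
    """Latency accumulated over a given number of clock-zone changes."""
    return zone_changes * ZONE_DELAY


# Two-bit SIPO: four changes from D to Q1, then four more from Q1 to Q2.
stage1_latency = latency_from_zone_changes(4)
stage2_latency = latency_from_zone_changes(4 + 4)

if __name__ == "__main__":
    print(stage1_latency, stage2_latency)
```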

Energy Dissipation Analysis
QCAPro, a probabilistic modelling tool [33], was used to identify the cells that contribute most to error in the QCA circuit designs. This tool uses the fast approximation technique. It was used to evaluate three energies: the average leakage energy, the average switching energy, and the total leakage energy. The analysis was performed at three tunneling energy levels, 0.5 Ek, 1.0 Ek, and 1.5 Ek, at T = 2 K, which is the default operating temperature selected for the energy dissipation analysis of the proposed designs [34][35][36]. The thermal maps for the proposed DFF design at all three tunneling levels are shown in Figure 22. The thermal maps for the proposed DFF-based 2-bit PIPO shift register at the same levels are shown in Figure 23, and those for the DFF-based four-bit PIPO shift register are shown in Figure 24. In these maps, darker cells dissipate more energy than lighter cells. The maps can also be used to optimize designs: minimizing the number of dark cells reduces overall power dissipation. Tables 3-5 show the energies dissipated by the proposed DFF, two-bit PIPO, and four-bit PIPO shift registers, respectively.

In addition, the energy dissipation of the two-bit SIPO, four-bit SIPO, and other QCA designs was analyzed using the QCADesigner-E tool [36]. It approximates the dissipated energy of circuits designed in QCA according to the method presented in [34]. Ebath_total is estimated as the sum of the energy transferred to the bath (Ebath) by each cell in the design over every clock cycle, and it gives the total energy dissipated. Eclk is calculated as the sum of the energy transfers between the cells of the circuit and the clock, taken over each clock cycle.
The proposed DFF dissipates 1.16 × 10⁻² eV in total, with an error of about ±1.34 × 10⁻³ eV, and 1.05 × 10⁻³ eV per cycle on average, with an error of ±1.22 × 10⁻³ eV. The proposed 2-bit PIPO shift register dissipates approximately 1.77 × 10⁻² eV in total, with an error of about ±1.75 × 10⁻³ eV, and 1.61 × 10⁻³ eV per cycle on average, with an error of ±1.59 × 10⁻⁴ eV. The proposed 4-bit PIPO shift register dissipates 3.96 × 10⁻² eV in total, with an error of ±3.93 × 10⁻³ eV, and 3.60 × 10⁻³ eV per cycle on average, with a deviation of ±3.57 × 10⁻⁴ eV.

The proposed 2-bit SIPO shift register dissipates 2.32 × 10⁻² eV in total, with an error of ±2.10 × 10⁻³ eV, and 2.11 × 10⁻³ eV per cycle on average, with an error of ±1.91 × 10⁻⁴ eV. The proposed 3-bit SIPO shift register dissipates 3.97 × 10⁻² eV in total, with an error of ±3.52 × 10⁻³ eV, and 3.61 × 10⁻³ eV per cycle on average, with an error of ±3.20 × 10⁻⁴ eV. The proposed 4-bit SIPO shift register dissipates 5.02 × 10⁻² eV in total, with an error of ±4.13 × 10⁻³ eV, and 4.56 × 10⁻³ eV per cycle on average, with an error of ±3.76 × 10⁻⁴ eV. All of these values are calculated from the energy dissipation of the whole circuit.
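As a quick consistency check on the figures quoted above, the ratio of total to per-cycle average energy can be computed for each design. The sketch below uses only the values stated in the text; the interpretation of the ratio as the number of simulated clock cycles is an assumption, since the text does not state the cycle count.

```python
# (total energy in eV, average energy per cycle in eV), as quoted in the text
REPORTED = {
    "DFF": (1.16e-2, 1.05e-3),
    "2-bit PIPO": (1.77e-2, 1.61e-3),
    "4-bit PIPO": (3.96e-2, 3.60e-3),
    "2-bit SIPO": (2.32e-2, 2.11e-3),
    "3-bit SIPO": (3.97e-2, 3.61e-3),
    "4-bit SIPO": (5.02e-2, 4.56e-3),
}


def implied_cycles(total_ev: float, avg_per_cycle_ev: float) -> float:
    """Ratio of total to per-cycle energy (assumed to reflect the cycle count)."""
    return total_ev / avg_per_cycle_ev


if __name__ == "__main__":
    for name, (total, avg) in REPORTED.items():
        print(f"{name}: total / average = {implied_cycles(total, avg):.2f}")
```

For every design the ratio comes out close to 11, which suggests the totals and per-cycle averages were obtained over the same simulation length.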

Performance Comparison
Based on the parameters that determine the performance of circuits designed in QCA, the proposed QCA designs were compared with existing designs, and the overall improvement in performance was evaluated. The QCA cost parameter was also compared; it is the product of the squared latency and the total area utilized. The comparison of the proposed DFF design is given in Table 6, and the percentage improvement over existing designs is presented in Table 7. The proposed DFF design had a 43.24% to 64.41% lower cell count than existing designs, with a reduction of about 42.86% to 64.4% in cell area. The total area utilized was reduced by up to 78.67%. The latency of the proposed design was lowered by up to 60%, with an overall enhancement in performance of about 18.78% to 95.8% due to the reduction in QCA cost.

Table 8 presents the comparison of the proposed DFF-based four-bit PIPO shift register, and Table 9 presents the percentage improvement in performance over existing designs. The proposed DFF-based 4-bit PIPO had a 16.18% to 56.15% lower cell count than existing designs, with a reduction of about 16.25% to 56.19% in cell area. The total area utilized was reduced by up to 94.27%, with an overall enhancement in performance of about 78.92% to 94.27% due to the reduction in QCA cost.
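The two quantities used in this comparison, QCA cost and percentage improvement, can be sketched as follows. The cost formula is the one stated in the text (squared latency times total area). The area value and the reference cell counts of 37 and 59 are illustrative assumptions: the cell counts are back-calculated from the quoted percentages and the 21-cell proposed DFF, not read from Table 6.

```python
import math


def qca_cost(latency: float, total_area: float) -> float:
    """QCA cost as defined in the text: latency squared times total area."""
    return latency ** 2 * total_area


def percent_improvement(existing: float, proposed: float) -> float:
    """Percentage reduction of `proposed` relative to `existing`."""
    return (existing - proposed) / existing * 100.0


if __name__ == "__main__":
    # Hypothetical area (um^2): doubling the latency quadruples the cost.
    area = 0.02
    print(qca_cost(1, area), qca_cost(2, area))

    # Proposed DFF uses 21 cells; 37 and 59 are assumed reference cell counts.
    print(round(percent_improvement(37, 21), 2))
    print(round(percent_improvement(59, 21), 2))
```

Because latency enters the cost quadratically, a design that halves the latency at equal area reduces its QCA cost by a factor of four, which is why latency reductions weigh heavily in the reported cost improvements.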

Conclusions
Numerous solutions and alternative technologies are being researched for the design and implementation of nanoscale circuits that are comparable to CMOS technology. QCA, one of those technologies, was presented in this work. Numerous flip flop circuits and their implementation in QCA have been proposed in the existing literature. However, the literature review showed that many of these designs perform less efficiently because of their higher cell count, larger latency, and larger area utilization. To address this issue, a new D flip flop design was proposed in this work and implemented in QCA technology using the QCADesigner tool. The proposed design uses 21 cells and has a latency of 1 clock cycle, which effectively reduces the QCA cost. The proposed DFF design was then used to design other sequential circuits such as shift registers. The performance evaluation revealed that the proposed DFF uses 43.24% to 64.41% fewer cells and 54.29% to 78.67% less total area, and has up to 95.8% lower QCA cost, than existing DFF circuits. PIPO and SIPO shift registers were proposed using this DFF, and the proposed 4-bit PIPO shift register used up to 56.15% fewer cells than the existing designs. The area utilization and QCA cost of the 4-bit PIPO shift register improved by up to 94.27%. Further, the energy dissipation of all the designs proposed in this work was analysed using the QCAPro and QCADesigner-E tools, and almost all the proposed designs were efficient in terms of energy utilization.