Design and Evaluation of Cell Interaction Based Vedic Multiplier Using Quantum-Dot Cellular Automata

: A multiplier is one of the main units for digital signal processing and communication systems. In this paper, a high speed and low complexity multiplier is designed on the basis of quantum-dot cellular automata (QCA), which is considered promising nanotechnology. We focus on Vedic multiplier architectures according to Vedic mathematics from ancient Indian sculptures. In fact, an adder is an important block in the design of almost all types of multipliers and a ripple carry adder is used to design simple multiplier implementations. However, a high-speed multi-bit multiplier requires high-speed adder owing to carry propagation. Cell-interaction-based QCA adders have better improvements over conventional majority-gate-based adders. Therefore, a two-bit Vedic multiplier is proposed in QCA and it is used to implement a four-bit form of the multiplier. The proposed architecture has a lower cell count and area compared to other existing structures. Moreover, simulation results demonstrate that the proposed design is sustainable and can be used to realize complex circuit designs for QCA communication networks.


Introduction
High energy consumption is one of the major limitations for transistor-based complementary metal-oxide-semiconductor (CMOS) technology. This limitation cannot be solved completely by scaling down the size of the transistor. Quantum-dot cellular automata (QCA) [1], a type of self-reproduction cellular automata [2][3][4], is one remarkable nanotechnology that can be a possible solution to solve this problem. The energy efficiency and high-speed switching features of this technology is great motivation for developing this paradigm [5][6][7][8][9].
In this technology, logical states can be represented by two antipodal polarization levels in a QCA cell configuration. The square-shaped cell includes a pair of electrons with four quantum dots. QCA offers a new computation method and is based on the fundamental principle of quantum confinement [10][11][12].
There are some challenges for the physical implementation of a QCA cell at room temperature. In previous experiments, the cells were found to operate only at cryogenic temperatures. Recently, the fabrication of a QCA cell using the nano-damascene process has been successful [13]. This potentially enables the room temperature operation of QCA-based circuits.
Array multipliers have been extensively investigated in QCA technology. However, there are also mathematical operations using the Vedic method that are very fast and require less hardware architecture. This can be used to improve the computational speed of processors. A huge number of adders are used to perform partial product addition for higher order multiplications. This method of

Basic QCA
The basic element of QCA is a cell that holds quantum dots and two excess electrons. The positions of the electrons in a cell are a logical state. The cell has a square-shaped nanostructure that contains four quantum dots and two electrons, as shown in Figure 1. The electrons can move through a tunnel junction owing to their Coulombic repulsion. The existing Coulombic interaction between the electrons forces them to occupy positions at opposite corners of the cell by the concept of polarization. As a result, two stable states (binary zero and one) are represented in the cell [18][19][20]. Polarization, P, is determined by one of two diagonals depending on the distribution of charges, and the value is measured [18]. If the cell specifies ρ i as the electron density of the i-th point for the four points i = 1 to 4 clockwise from the top right, then P is defined as follows. P = ((ρ 1 + ρ 3 ) − (ρ 2 + ρ 4 ))/(ρ 1 + ρ 2 + ρ 3 + ρ 4 ) (1) Electronics 2020, 9, x FOR PEER REVIEW 3 of 15 was applied well, and the blue short two arrows indicate the parts that were not applied due to the surrounding repulsive forces.   The binary information is propagated by the neighboring cell interaction along a line of QCA cells owing to the high nonlinearity. This is called a QCA standard wire as shown in Figure 2a. Coulombic interaction acts not only on one cell, but also on neighboring cells, making it easy to make QCA wires by sequentially arranging cells. Therefore, information is transmitted without current flow. In this technology, the basic logic components, an inverter and a majority gate, can be implemented on the basis of the mutual interaction between cells. The majority gate can be three-and five-input majority gates. The three-input majority gate is defined as follows.

QCA Clocking
The QCA frequency plays a crucial role in signal energy recovery and solving the thermodynamic problem in large-scale QCA circuits based on four clearly defined and π/2 interval clock pulses, as shown in Figure 3. The clock is used as an inter-dot barrier controller.
In each QCA clock zone, there are four synchronization phases: Switch, hold, release, and relax. When the inter-dot barrier is high, the cells are held in the hold phase, and when the barrier is low, the cells remain in the relax phases. When the inter-dot barrier is changed, e.g., from low to high or from high to low, the cells are stored in the switch or release phase, respectively. A signal transition occurs during the switching phase. QCA synchronization also serves as an energy source [22][23][24]. As shown in Figure 2b, the three-input majority gate is composed of five cells, and logical two-input AND and OR can be implemented using this majority gate, by setting one arbitrary input cell to binary "0" and "1", respectively as follows.
As an innovative structure, the five-input majority gate consists of 10 cells, and designs based on this majority gate are efficient in terms of the area compared to traditional ones [21]. It can be a multi-functional structure and expressed as follows and shown in Figure 2c.
Two types of inversion operation are actualized by placing the cells diagonally, as illustrated in Figure 2d,e. In Figure 2, the red long two-way arrows indicate the parts where Coulombic repulsion was applied well, and the blue short two arrows indicate the parts that were not applied due to the surrounding repulsive forces.

QCA Clocking
The QCA frequency plays a crucial role in signal energy recovery and solving the thermodynamic problem in large-scale QCA circuits based on four clearly defined and π/2 interval clock pulses, as shown in Figure 3. The clock is used as an inter-dot barrier controller.
In each QCA clock zone, there are four synchronization phases: Switch, hold, release, and relax. When the inter-dot barrier is high, the cells are held in the hold phase, and when the barrier is low, the cells remain in the relax phases. When the inter-dot barrier is changed, e.g., from low to high or from high to low, the cells are stored in the switch or release phase, respectively. A signal transition occurs during the switching phase. QCA synchronization also serves as an energy source [22][23][24].

Wire Crossing in QCA
Four types of implementations can be used for the physical fabrication of QCA, such as metal-dot, semiconductor, molecular, and magnetic QCA implementations. If material production can be efficiently addressed, the next practical problem for a QCA circuit is its operating temperature (7K). This temperature is considered to be extremely low, and a QCA circuit needs to be optimized for roomtemperature operation [25].
Cross-over technology is also considered as one of the problems in QCA technology. According to this concept, there are three types of cross methods: Coplanar, multi-layered, and logical crossover. The coplanar intersection of a wire is carried out using an inverter chain. The inverter chain represents a line of rotating (45˚) cells. A multi-layered crossover is a stereoscopic structure, which is built of several layers. The logical intersection of the wires is updated on the basis of QCA synchronization; namely, the cells in the switch and hold phases can cross cells in the release and relax phases, respectively [26][27][28]. The logical method of crossing wires is more reliable and tolerant than other crossover methods in the modelling process.

Half Adder for the Multiplier
As mentioned, most half-adder designs in QCA are equation-based structures, namely, a combination of inverter and majority gates. It is also possible to implement a QCA half adder using an XOR gate. Several QCA XOR designs have been proposed [17,29,30].
We use cell-interaction-based XOR [17]. The QCA design of the proposed half adder is presented in Figure 4. The proposed structure consists of one XOR gate and one three-input majority gate with two fixed cells to a value of zero which are drawn in orange color. Consequently, the proposed half adder is also considered as a structure based on cell interaction.

Wire Crossing in QCA
Four types of implementations can be used for the physical fabrication of QCA, such as metal-dot, semiconductor, molecular, and magnetic QCA implementations. If material production can be efficiently addressed, the next practical problem for a QCA circuit is its operating temperature (7K). This temperature is considered to be extremely low, and a QCA circuit needs to be optimized for room-temperature operation [25].
Cross-over technology is also considered as one of the problems in QCA technology. According to this concept, there are three types of cross methods: Coplanar, multi-layered, and logical crossover. The coplanar intersection of a wire is carried out using an inverter chain. The inverter chain represents a line of rotating (45 • ) cells. A multi-layered crossover is a stereoscopic structure, which is built of several layers. The logical intersection of the wires is updated on the basis of QCA synchronization; namely, the cells in the switch and hold phases can cross cells in the release and relax phases, respectively [26][27][28]. The logical method of crossing wires is more reliable and tolerant than other crossover methods in the modelling process.

Half Adder for the Multiplier
As mentioned, most half-adder designs in QCA are equation-based structures, namely, a combination of inverter and majority gates. It is also possible to implement a QCA half adder using an XOR gate. Several QCA XOR designs have been proposed [17,29,30].
We use cell-interaction-based XOR [17]. The QCA design of the proposed half adder is presented in Figure 4. The proposed structure consists of one XOR gate and one three-input majority gate with two fixed cells to a value of zero which are drawn in orange color. Consequently, the proposed half adder is also considered as a structure based on cell interaction.  The design is composed of 39 cells and the output is generated after three clock phases. The truth table of the half adder is summarized in Table 1, and it is defined as follows:

The 2 × 2 Bit Vedic Multiplier
This section focuses on a method of Vedic multiplication called, Urdhva-Tiryakbhyam (UT)vertically and crosswise, which can be used for both decimal and binary multiplication [14][15][16]. This method is an algorithm that can be used to multiply the number N × N. The Vedic multiplier does not depend on the processor's clock frequency since the sums and their partial products are calculated in parallel. This means that it is possible to reduce the energy dissipation using a Vedic multiplier and increase the speed of multiplication.
The multiplication of ax + b and cx + d in Vedic mathematics is based on the UT sutra exactly; the resulting expression is derived as acx 2 + (ad + bc)x + bd. The principles of the UT sutra are as follows: • The coefficient of x 2 indicates the vertical multiplication of a and c.

•
The coefficient of x expresses cross-wise multiplication of a with d and b with c, along with the addition of both outputs.

•
The constant is the vertical multiplication of b and d.
It is easy to explain the algorithm by multiplying 2 × 2 bits. The two-bit binary numbers A = [A1A0] and B = [B1B0] are obtained, as shown in Figure 5. First, the least significant bit (LSB) of the first binary code 'A', namely (A0), is multiplied by the LBS 'B', namely (B0). The generated product (A0B0) is stored for the next as the LSB of final result. Table 2 is for the simulation of 2 × 2 Vedic multiplier. The design is composed of 39 cells and the output is generated after three clock phases. The truth table of the half adder is summarized in Table 1, and it is defined as follows:

The 2 × 2 Bit Vedic Multiplier
This section focuses on a method of Vedic multiplication called, Urdhva-Tiryakbhyam (UT)-vertically and crosswise, which can be used for both decimal and binary multiplication [14][15][16]. This method is an algorithm that can be used to multiply the number N × N. The Vedic multiplier does not depend on the processor's clock frequency since the sums and their partial products are calculated in parallel. This means that it is possible to reduce the energy dissipation using a Vedic multiplier and increase the speed of multiplication.
The multiplication of ax + b and cx + d in Vedic mathematics is based on the UT sutra exactly; the resulting expression is derived as acx 2 + (ad + bc)x + bd. The principles of the UT sutra are as follows: • The coefficient of x 2 indicates the vertical multiplication of a and c.

•
The coefficient of x expresses cross-wise multiplication of a with d and b with c, along with the addition of both outputs.

•
The constant is the vertical multiplication of b and d.
It is easy to explain the algorithm by multiplying 2 × 2 bits. The two-bit binary numbers A = [A 1 A 0 ] and B = [B 1 B 0 ] are obtained, as shown in Figure 5. First, the least significant bit (LSB) of the first binary code 'A', namely (A 0 ), is multiplied by the LBS 'B', namely (B 0 ). The generated product (A 0 B 0 ) is stored for the next as the LSB of final result. Table 2 is for the simulation of 2 × 2 Vedic multiplier.  The multiplication of A0 and B0 requires one AND gate. The first step is the crosswise multiplication of the numbers A and B, and this requires two AND gates. The second step is actualized as follows: The LSB of the number A is multiplied with the most significant bit (MSB) of the number B, namely (A0 × B1), and at the same time the MSB of the number A is multiplied with the LSB of the number B, namely (A1 × B0). These generated products are added [(A0 × B1) + (A1 × B0)] using a half adder. The generated two-bit result [C1S1] from the first half adder is considered as that (S1) for the second bit of the final product, and (C1) is saved as a pre-carry for the next step. In the last step, the MSB of the number A (A1) is multiplied with the MSB of the number B(B1) and the result (A1 × B1) is added to the pre-carry (C1) via a second half adder. The two-bit result [C2S2] is obtained from the second half adder; namely (S2) and (C2) are the third and fourth bits, respectively, of the final product. Finally, the four-bit number [C2S2S1S0] is formed as an output. The block diagram of the 2 × 2 bit Vedic multiplier is shown in Figure 6 and the definition is as follows: The 2 × 2 bit Vedic multiplier has been designed in QCA on the basis of the block diagram in Figure 7. Logical wire-crossing is extensively used to cross the wires, as illustrated in Figure 8. There is a total of two half adders and four AND gates in the proposed structure. As mentioned in the previous section, the half adder is designed as a low complexity structure. Hence, the proposed 2 × 2 Vedic multiplier is also a coplanar design.  The multiplication of A 0 and B 0 requires one AND gate. The first step is the crosswise multiplication of the numbers A and B, and this requires two AND gates. The second step is actualized as follows: The LSB of the number A is multiplied with the most significant bit (MSB) of the number B, namely (A 0 × B 1 ), and at the same time the MSB of the number A is multiplied with the LSB of the number B, namely (A 1 × B 0 ). These generated products are added [(A 0 × B 1 ) + (A 1 × B 0 )] using a half adder. The generated two-bit result [C 1 S 1 ] from the first half adder is considered as that (S 1 ) for the second bit of the final product, and (C 1 ) is saved as a pre-carry for the next step. In the last step, the MSB of the number A (A 1 ) is multiplied with the MSB of the number B(B 1 ) and the result (A 1 × B 1 ) is added to the pre-carry (C 1 ) via a second half adder. The two-bit result [C 2 S 2 ] is obtained from the second half adder; namely (S 2 ) and (C 2 ) are the third and fourth bits, respectively, of the final product. Finally, the four-bit number [C 2 S 2 S 1 S 0 ] is formed as an output. The block diagram of the 2 × 2 bit Vedic multiplier is shown in Figure 6 and the definition is as follows: The 2 × 2 bit Vedic multiplier has been designed in QCA on the basis of the block diagram in Figure 7. Logical wire-crossing is extensively used to cross the wires, as illustrated in Figure 8. There is a total of two half adders and four AND gates in the proposed structure. As mentioned in the previous section, the half adder is designed as a low complexity structure. Hence, the proposed 2 × 2 Vedic multiplier is also a coplanar design.

The 4 × 4 Bit Vedic Multiplier
A Vedic multiplier applies bit-wise multiplication with simultaneous product term finding and its column-wise addition. This is one of the best standards for the algorithm of fast multiplication. The main advantage of this Vedic multiplier is that we can calculate partial products and sums, which must be fulfilled simultaneously.

The 4 × 4 Bit Vedic Multiplier
A Vedic multiplier applies bit-wise multiplication with simultaneous product term finding and its column-wise addition. This is one of the best standards for the algorithm of fast multiplication. The main advantage of this Vedic multiplier is that we can calculate partial products and sums, which must be fulfilled simultaneously.
This section focuses on a 4 × 4 Vedic multiplier architecture. In this algorithm, two four-digit numbers: A = [A3A2A1A0] and B = [B3B2B1B0] are obtained for clear explanation. In the first step, the LSB of number A is multiplied with the LSB of number B to generate the LSB of the final product. The rest of the steps are similar to the procedure of the 2 × 2 multiplier. The procedure of the 4 × 4 Vedic multiplier steps is explained using line diagram, as illustrated in Figure 9, and RCA full adder in Figure 10.
By the general concept, the generated carry (Cn) is forwarded to the next step in each step. The process goes on until the last step. The outputs C6 and S6 are taken at the last step. Finally, all of the generated results are obtained as form of [C6 S6 S5 S4 S3 S2 S1 S0]. The process is defined as follows: S0 = A0B0 C1S1 = A1B0 + A0B1 C2S2 = C1 + A0 B2 + A2 B0 + A1 B1 C3S3 = C2 + A0 B3 + A3 B0 + A1B2 + A2B1 C4S4 = C3 + A1B3 + A3B1 + A2B2 C5S5 = C4 + A3B2 + A2B3 C6S6 = C5 + A3B3 The design of the 4 × 4 Vedic multiplier architecture using the 2 × 2 Vedic multiplier improves the complexity of the operation. Four 2 × 2 Vedic multipliers (M1, M2, M3, and M4) perform multiplication operation between the defined inputs and produce sixteen products (S00, S01…S32, S33). The products obtained from the generated unit are placed in three particular positions, i.e., part-one, part-two, and part-three. Part-one includes the last two bits of the M1 result (S01 S00) and part-three contains the primary two bits of the M4 result (S33 S32). However, part-two is a multi-operative part of the summation unit that performs the main cross-wise multiplication for the 4 × 4 architecture of the Vedic multiplier. There are '0' values in part-two in some cases wherever required, as shown in Figure 11. The final result is obtained after calculating the summation unit in the form [R7R6R5R4R3R2R1R0]. This section focuses on a 4 × 4 Vedic multiplier architecture. In this algorithm, two four-digit numbers: are obtained for clear explanation. In the first step, the LSB of number A is multiplied with the LSB of number B to generate the LSB of the final product. The rest of the steps are similar to the procedure of the 2 × 2 multiplier. The procedure of the 4 × 4 Vedic multiplier steps is explained using line diagram, as illustrated in Figure 9, and RCA full adder in Figure 10.     By the general concept, the generated carry (C n ) is forwarded to the next step in each step. The process goes on until the last step. The outputs C 6 and S 6 are taken at the last step. Finally, all of the generated results are obtained as form of [C 6 S 6 S 5 S 4 S 3 S 2 S 1 S 0 ]. The process is defined as follows: The design of the 4 × 4 Vedic multiplier architecture using the 2 × 2 Vedic multiplier improves the complexity of the operation. Four 2 × 2 Vedic multipliers (M1, M2, M3, and M4) perform multiplication operation between the defined inputs and produce sixteen products (S 00 , S 01 . . . S 32 , S 33 ). The products obtained from the generated unit are placed in three particular positions, i.e., part-one, part-two, and part-three. Part-one includes the last two bits of the M1 result (S 01 S 00 ) and part-three contains the primary two bits of the M4 result (S 33 S 32 ). However, part-two is a multi-operative part of the summation unit that performs the main cross-wise multiplication for the 4 × 4 architecture of the Vedic multiplier. There are '0' values in part-two in some cases wherever required, as shown in Figure 11. The final result is obtained after calculating the summation unit in the form [R 7 R 6 R 5 R 4 R 3 R 2 R 1 R 0 ].
A block diagram of the 4 × 4 Vedic multiplier is shown in Figure 9. The basic building block of the 4 × 4 Vedic multiplier is a 2 × 2 Vedic multiplier. The architecture is composed of four 2 × 2 multipliers and three 4-bit ripple carry adder (RCA) designs. The partial products generated from the 2 × 2 multipliers are forwarded to the adders to perform addition. The results obtained from the first RCA are transmitted to the next RCA, and there exist two zero inputs for the second RCA. In some cases, zero inputs are given to some RCAs wherever required. On the basis of the block diagram in Figure 9, the 4 × 4 Vedic multiplier is developed in QCA. As shown in Figure 12, the proposed multiplier design is constructed using logical and multi-layered wire-crossing techniques. The structure is well pipelined within 5.25 clock cycle.   A block diagram of the 4 × 4 Vedic multiplier is shown in Figure 9. The basic building block of the 4 × 4 Vedic multiplier is a 2 × 2 Vedic multiplier. The architecture is composed of four 2 × 2 multipliers and three 4-bit ripple carry adder (RCA) designs. The partial products generated from the 2 × 2 multipliers are forwarded to the adders to perform addition. The results obtained from the first RCA are transmitted to the next RCA, and there exist two zero inputs for the second RCA. In some cases, zero inputs are given to some RCAs wherever required. On the basis of the block diagram in Figure 9, the 4 × 4 Vedic multiplier is developed in QCA. As shown in Figure 12, the proposed multiplier design is constructed using logical and multi-layered wire-crossing techniques. The structure is well pipelined within 5.25 clock cycle. The existing one-bit full adder [31] is used to create RCA designs. This is a multi-layered structure consisting of a majority gate that is placed at the layer of the main cells and an XOR gate that is located on the top layer of the cell in the QCADesigner tool. This means that summation by the full adder is performed using an XOR gate, along with the carry which is obtained serially from the majority gates, as shown in Figure 10. The four-bit RCA design is implemented for the proposed 4 × 4 Vedic multiplier by sequentially connecting the one-bit full adders. Note that other types of adders can be used to help to improve the speed of the addition operation.
In this hierarchical design, a multi-layered technique is used for only the full adder structures. The rest of the structure has been designed as a coplanar architecture. As crossover type in the design, logical crossover is extensively used. The whole architecture would be a coplanar layout if the full adder is changed with a coplanar one.

Simulation Results and Analyses
In this section, we report the results of a structural analysis of the proposed designs, and energy dissipation analysis for the proposed half adder.

Structural Analysis
The proposed designs are simulated with the QCADesigner tool version 2.0.3 [32], which is a well-aimed simulation tool for QCA circuits. QCA circuits are simulated using the bistable The existing one-bit full adder [31] is used to create RCA designs. This is a multi-layered structure consisting of a majority gate that is placed at the layer of the main cells and an XOR gate that is located on the top layer of the cell in the QCADesigner tool. This means that summation by the full adder is performed using an XOR gate, along with the carry which is obtained serially from the majority gates, as shown in Figure 10. The four-bit RCA design is implemented for the proposed 4 × 4 Vedic multiplier by sequentially connecting the one-bit full adders. Note that other types of adders can be used to help to improve the speed of the addition operation.
In this hierarchical design, a multi-layered technique is used for only the full adder structures. The rest of the structure has been designed as a coplanar architecture. As crossover type in the design, logical crossover is extensively used. The whole architecture would be a coplanar layout if the full adder is changed with a coplanar one.

Simulation Results and Analyses
In this section, we report the results of a structural analysis of the proposed designs, and energy dissipation analysis for the proposed half adder.

Structural Analysis
The proposed designs are simulated with the QCADesigner tool version 2.0.3 [32], which is a well-aimed simulation tool for QCA circuits. QCA circuits are simulated using the bistable approximation and coherence vector simulation engines. The simulation parameters are listed in Table 3.  Figure 13 shows a simulation result of the proposed 2 × 2 Vedic multiplier. The simulation results confirm that the binary numbers obtained from the simulation match the values in Table 2. This is explained as follows; the input-output value is shown in the red rectangles in Figure 13; namely, multiplication of the binary number '11' (A 1 A 0 ) by '11' (B 1 B 0 ) is equal to the binary number '1001' (C 2 S 2 S 1 S 0 ). It is clearly seen that the first dashed-line-based rectangle in the input is generated correctly below the first dashed-line-based rectangle in the output, and so on, as shown in Figure 13.
The 4 × 4 Vedic multiplier architecture was designed on the basis of the proposed Vedic 2 × 2 multiplier. In the designs, we use a logical intersection scheme, because of the reliable and stable signal propagation throughout the QCA wires.
A multi-layered cross-over technique is used only in the full adder. The proposed designs have been compared with previous multiplier designs. In this comparison, QCA parallel 4 × 4 multipliers have been also chosen for comparison with the Vedic multipliers. The comparison results are summarized in Table 4, in terms of the cell count, total area, and latency.
According to the comparison, the proposed designs have achieved significant improvements. The chart in Figure 14 indicates that the proposed 4 × 4 Vedic multiplier has great achievements in terms of the area in comparison to previously designed QCA multipliers [33][34][35].  Figure 13 shows a simulation result of the proposed 2 × 2 Vedic multiplier. The simulation results confirm that the binary numbers obtained from the simulation match the values in Table 2. This is explained as follows; the input-output value is shown in the red rectangles in Figure 13; namely, multiplication of the binary number '11' (A1 A0) by '11' (B1 B0) is equal to the binary number '1001' (C2 S2 S1 S0). It is clearly seen that the first dashed-line-based rectangle in the input is generated correctly below the first dashed-line-based rectangle in the output, and so on, as shown in Figure 13. The 4 × 4 Vedic multiplier architecture was designed on the basis of the proposed Vedic 2 × 2 multiplier. In the designs, we use a logical intersection scheme, because of the reliable and stable signal propagation throughout the QCA wires. A multi-layered cross-over technique is used only in the full adder. The proposed designs have been compared with previous multiplier designs. In this comparison, QCA parallel 4 × 4 multipliers have been also chosen for comparison with the Vedic multipliers. The comparison results are summarized in Table 4, in terms of the cell count, total area, and latency.
According to the comparison, the proposed designs have achieved significant improvements. The chart in Figure 14 indicates that the proposed 4 × 4 Vedic multiplier has great achievements in terms of the area in comparison to previously designed QCA multipliers [33][34][35].
Note that the QCA multipliers have been implemented using the cell-interaction XOR structures to attain reliable results in the simulation, because, the cell-interaction based XOR gate has improvements over conventional AND-OR-INVERTER-based ones in all aspects.

Energy Dissipation Analysis
In order to estimate the energy dissipation, the QCAPro [36] tool has been utilized. The energy dissipation is analyzed by considering three different energy levels (0.5 Ek, 1.0 Ek, and 1.5 Ek) at a temperature of 2 K. The dissipated energy is measured on the basis of the Hamiltonian matrix using the HartreeFock approximation concerning the Columbic interaction between the cells and defined as follows.
In the above equation, polarization of i-th juxtaposed cell is represented by and the geometrical factor identifying electrostatic interaction between cells i and j is represented by , due to geometrical distance where is the kink energy between the cells. This kink energy is related to the energy cost of two cells with the opposite polarization γ-the electron tunneling energy inside Note that the QCA multipliers have been implemented using the cell-interaction XOR structures to attain reliable results in the simulation, because, the cell-interaction based XOR gate has improvements over conventional AND-OR-INVERTER-based ones in all aspects.

Energy Dissipation Analysis
In order to estimate the energy dissipation, the QCAPro [36] tool has been utilized. The energy dissipation is analyzed by considering three different energy levels (0.5 E k , 1.0 E k , and 1.5 E k ) at a temperature of 2 K. The dissipated energy is measured on the basis of the Hamiltonian matrix using the HartreeFock approximation concerning the Columbic interaction between the cells and defined as follows.
In the above equation, polarization of i-th juxtaposed cell is represented by C i and the geometrical factor identifying electrostatic interaction between cells i and j is represented by f i,j due to geometrical distance where E k is the kink energy between the cells. This kink energy is related to the energy cost of two cells with the opposite polarization γ-the electron tunneling energy inside the cell, which is controlled by the clock. We have evaluated the energy dissipation of the proposed half adder using the adequate energy estimation tool QCAPro. As given in Tables 5-7, the average leakage energy, switching energy, and circuit energy dissipation, respectively, for the three different tunneling energy levels are calculated at a temperature of 2 K.  [7] 0.09 0.08 0.06 [15] 0.08 0.07 0.06 [29] 0.06 0.05 0.04 Proposed 0.04 0.04 0.03 Table 7. Energy dissipation analysis at T = 2 K.
Designs 0.5 E k (eV) 1.0 E k (eV) 1.5 E k (eV) [7] 0.12 0.15 0.20 [15] 0.10 0.12 0.15 [29] 0.07 0.09 0.12 Proposed 0.06 0.08 0.10 The obtained results are compared to the previous designs in [7,15,29] according to the metrics. Figure 15 shows the considerable optimization of the proposed design with regards to the average energy dissipation. According to the chart in Figure 15, the proposed half adder design consumes the lowest amount of energy compared to previous designs.
Electronics 2020, 9, x FOR PEER REVIEW 13 of 15 the cell, which is controlled by the clock. We have evaluated the energy dissipation of the proposed half adder using the adequate energy estimation tool QCAPro. As given in Tables 5-7, the average leakage energy, switching energy, and circuit energy dissipation, respectively, for the three different tunneling energy levels are calculated at a temperature of 2 K. The obtained results are compared to the previous designs in [7], [15], [29] according to the metrics. Figure 15 shows the considerable optimization of the proposed design with regards to the average energy dissipation. According to the chart in Figure 15, the proposed half adder design consumes the lowest amount of energy compared to previous designs. Designs 0.5 Ek (eV) 1.0 Ek (eV) 1.5 Ek (eV) [7] 0.03 0.08 0.14 [15] 0.02 0.06 0.10 [29] 0.01 0.04 0.07 Proposed 0.01 0.03 0.06 Table 6. Switching energy analysis at T = 2 K.

Conclusions
In this paper, a technique for designing Vedic multiplier with cell-interaction QCA structures based on QCA technology is presented. First, a coplanar 2 × 2 Vedic multiplier has been designed and simulated with QCADesigner. The energy dissipation was also calculated for the circuit by QCAPro. Then, the architecture of a 4 × 4 Vedic multiplier has been designed by applying the 2 × 2 Vedic multiplier. The proposed QCA layouts have been compared with prior works in terms of the cell Figure 15. The average energy dissipation of the half adders for various tunneling energy levels.

Conclusions
In this paper, a technique for designing Vedic multiplier with cell-interaction QCA structures based on QCA technology is presented. First, a coplanar 2 × 2 Vedic multiplier has been designed and simulated with QCADesigner. The energy dissipation was also calculated for the circuit by QCAPro. Then, the architecture of a 4 × 4 Vedic multiplier has been designed by applying the 2 × 2 Vedic multiplier. The proposed QCA layouts have been compared with prior works in terms of the cell count, area, and latency. A comparison demonstrated that the proposed structures are implemented with significant improvements. According to this study, the use of the cell-interaction technique and clock-zone-based crossover can considerably improve the performance of the final structure.