Design of a High-Speed, Low-Power PTL-CMOS Hybrid Multiplier Using Critical-Path Evaluation Model

Yu, Yihe; Pan, Wanyuan; Tang, Chengcheng; Yin, Ningyuan; Yu, Zhiyi

doi:10.3390/electronics13071284


by Yihe Yu, Wanyuan Pan, Chengcheng Tang, Ningyuan Yin * and Zhiyi Yu *

School of Microelectronics Science and Technology, Sun Yat-sen University, Guangzhou 528478, China

* Authors to whom correspondence should be addressed.

Electronics 2024, 13(7), 1284; https://doi.org/10.3390/electronics13071284

Submission received: 5 March 2024 / Revised: 19 March 2024 / Accepted: 29 March 2024 / Published: 29 March 2024

(This article belongs to the Special Issue Advances in Low Powered Circuits Design and Their Application)


Abstract

The multiplier is a fundamental component of many computing modules, and the full adder (FA), its most important building block, has a significant impact on its overall performance. Full adders based on pass transistor logic (PTL) have been a popular research topic in recent years, but their uneven delay makes it difficult to analyze the critical path of multipliers built from PTL full adders. In this paper, we propose a model to evaluate the critical path of the carry save array (CSA) multiplier; it reduces the simulation input set from 4 G to 93 K input transitions while still yielding the maximum delay of the multiplier. Using this critical-path evaluation model, we propose a novel low-power, high-speed CSA multiplier based on both PTL full adders and CMOS full adders. The proposed design is implemented in a 28 nm process. Using the model, we reduce the worst-case delay by 14.5%. The proposed multiplier improves the power-delay product by 9.4% over the conventional full CMOS multiplier.

Keywords:

CSA multiplier; low power; pass transistor logic; worst-case delay

1. Introduction

Addition, subtraction, and multiplication are fundamental arithmetic operations in digital signal processing (DSP) applications, such as convolution [1,2], finite impulse response (FIR) filtering [3,4], and the fast Fourier transform (FFT) [5,6]. Of the corresponding arithmetic units, the multiplier is the most complex, time-consuming, and energy-hungry, and it is increasingly becoming a performance bottleneck of DSP systems as the input bit width grows [7].

Multipliers can be classified into several types, such as array multipliers, CSA multipliers, Wallace tree multipliers, and Booth multipliers. Most of these multipliers are implemented in conventional complementary metal oxide semiconductor (CMOS) logic [8]. Pass-transistor logic, which consists of pass transistors or pass gates, has attracted great interest because it requires fewer transistors than CMOS logic. Since [9] first proposed a PTL-based FA circuit in 1992, a variety of PTL-based full adders have been proposed [10,11,12,13].

Ref. [10] proposed an FA based on PTL and gate diffusion input (GDI) technology that requires only 10 transistors. This 10T-FA showed notable improvements in both area and power efficiency. However, when it was used in a 4-bit adder, the maximum delay reached 1.095 ns. A diode-connected FinFET restorer (D-FinFET) was proposed in [11] to overcome the lack of full-swing output in PTL-based FAs. Although the D-FinFETs effectively mitigated some drawbacks of PTL, they introduced substantial delay and additional power consumption. Ref. [12] incorporated carbon nanotube field-effect transistor (CNFET) technology into a PTL-based FA design that requires only 18 transistors. This CNFET-based FA was integrated into an 8-bit adder; however, specific performance metrics for the 8-bit adder were not reported. In addition, ref. [13] devised a hybrid FA architecture that requires only 13 transistors, but its application in more complex circuits remains unexplored.

Although pass-transistor logic requires fewer transistors and has achieved some delay and power advantages in full adder circuits, it also has inherent drawbacks. When PTL is applied to larger and more complex circuits, such as multipliers, these drawbacks become more apparent and degrade performance [10]. As discussed in [14], PTL circuits may suffer from insufficient driving capability and threshold voltage loss because of their structure. Moreover, cascaded PTL-based circuits form an RC chain, which reduces the output swing and causes a sharp, nonlinear increase in delay [15]. Solutions to these problems are rarely discussed in the existing literature, especially for complex circuits. Therefore, it is critical to explore how PTL can be applied to complex circuits while achieving good performance.

In this paper, we propose a critical path analysis model and design a PTL/CMOS hybrid CSA multiplier with optimized performance. The main contributions of our work are summarized as follows.

We propose a critical path evaluation model that reduces the simulation input set from 4 G to 93 K input transitions. In delay simulations of 2500 input transitions, the average and maximum delays of the filtered transitions are much greater than those of random transitions, and the model ultimately yields the worst-case delay of the multiplier and guides its delay optimization.
Using the proposed model, we design a hybrid CSA multiplier with superior performance based on both PTL-based FAs and conventional CMOS FAs, and we use the model to reduce its worst-case delay by 14.5%.
We take further measures to improve power and speed, obtaining a 9.4% reduction in power-delay product over the conventional full CMOS multiplier.

2. Analysis of Full Adders

The conventional CMOS full adder, shown in Figure 1, consists of 28 transistors and is referred to as the 28T-FA.

Because of their structure, PTL-based full adders usually need fewer than 28 transistors, which gives them notable advantages in area and power consumption. However, in PTL-based circuits, some input signals serve both as data inputs and as the source for the transmission gates, which causes insufficient driving capability and voltage loss. This issue affects most PTL-based full adders and makes them non-full-swing FAs [16,17].

2.1. Analysis of 2+14T-FA

In contrast, FAs based on complementary pass-gate logic (CPL) avoid these issues and provide full-swing operation. The CPL-based FA proposed in [9] consists of 16 transistors and is referred to as the 2+14T-FA, where the 2 denotes the inverter at the B input. The structure of the 2+14T-FA is depicted in Figure 2. For a single transmission gate, threshold loss occurs when A = B. However, in the 2+14T-FA, the PTL-XOR is composed of two transmission gates with opposite structures, so the complementary transmission gate compensates for the threshold loss of the other. In other words, the two complementary transmission gates together provide sufficient driving capability and a full-rail output voltage. Therefore, we use the 2+14T-FA in our multiplier.

For a full adder with three inputs, there are

2^{3} \times (2^{3} - 1) = 56

input transitions in total. In the TSMC 28 nm process, we ran exhaustive post-layout delay and power simulations of the 2+14T-FA and the 28T-FA over all 56 input transitions, as depicted in Figure 3.
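The count of 56 and the grouping of transitions by the ports that change can be reproduced with a short enumeration. This is a minimal Python sketch; the port order (A, B, Ci) and the function names are illustrative choices, and Table 1 may merge symmetric groups (for example, A and B) differently.

```python
from collections import Counter
from itertools import product

PORTS = ("A", "B", "Ci")

def fa_transitions():
    """Yield every (before, after) pair of distinct 3-bit FA input vectors."""
    states = list(product((0, 1), repeat=3))
    for before in states:
        for after in states:
            if before != after:
                yield before, after

def changed_ports(before, after):
    """Return the FA inputs that toggle in this transition, e.g. 'A+Ci'."""
    return "+".join(p for p, b, a in zip(PORTS, before, after) if b != a)

transitions = list(fa_transitions())
groups = Counter(changed_ports(b, a) for b, a in transitions)

print(len(transitions))  # 56
for ports, count in sorted(groups.items()):
    print(f"{ports:8s} {count}")
```

Each of the seven non-empty sets of changed ports occurs for all eight starting states, which is another way to see that 7 × 8 = 56.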

The simulation results are organized by the changed input ports in Table 1. The delay of the 2+14T-FA is unevenly distributed: propagation delays are small for some paths but large for others. In particular, the Ci-to-Co and Ci-to-Sum propagation delays of the 2+14T-FA (the third row of Table 1) are significantly smaller than those of the 28T-FA. Table 1 also indicates that, overall, the 2+14T-FA is more energy-efficient than the 28T-FA. However, this energy advantage becomes less pronounced when two inputs change simultaneously.

2.2. The Issue of PTL Cascading

According to [9], cascaded PTL-based FAs can be simplified into a chain of transmission gates, as shown in Figure 4a, where the parasitic effects of each node are also considered. This chain can be further simplified into the equivalent RC network shown in Figure 4b for analysis.

Based on this analysis, the delay of a network of n cascaded transmission gates, each with equivalent resistance R and capacitance C, can be estimated using the Elmore approximation as follows:

\text{Delay} = 0.69 \times RC \times \frac{n(n + 1)}{2} \tag{1}

This implies that the propagation delay is proportional to $n^{2}$, meaning that the delay increases nonlinearly and sharply as the number of cascaded transmission gates increases.
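Equation (1) follows from the Elmore sum over the ladder in Figure 4b: node k sees k series resistors between itself and the source, so it contributes k·R·C. The minimal Python check below assumes identical R and C per stage (the same simplification as Equation (1)) and confirms both the closed form and its quadratic growth.

```python
def elmore_ladder(n, r=1.0, c=1.0):
    """Elmore delay (50% point, factor 0.69) of a uniform n-stage RC ladder.

    Node k (1-based) shares k series resistors with the output path,
    so its contribution is k * r * c.
    """
    return 0.69 * sum(k * r * c for k in range(1, n + 1))

def closed_form(n, r=1.0, c=1.0):
    """Equation (1): 0.69 * RC * n(n + 1) / 2."""
    return 0.69 * r * c * n * (n + 1) / 2

for n in (1, 2, 4, 8):
    print(n, round(elmore_ladder(n), 3), round(closed_form(n), 3))
```

Doubling the chain from 4 to 8 stages multiplies the delay by 36/10 = 3.6 rather than 2, which is the sharp nonlinear increase described above.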

3. The Critical Path Evaluation Model for CSA Multipliers

3.1. The Challenge of Multiplier’s Delay Analysis

The analysis in Section 2 shows that delay is a key constraint on the scalability of PTL circuits. On the one hand, the uneven delay distribution of PTL circuits makes delay analysis difficult; on the other hand, the large delays caused by cascading PTL circuits must be mitigated. Addressing these challenges requires a systematic evaluation of the multiplier’s delay characteristics to determine its maximum delay and identify its critical paths.

However, when PTL is extended from an individual FA to a more complex multiplier, the conventional approach of analyzing performance by exhaustive simulation becomes impractical. For an 8 × 8-bit multiplier, there are

2^{16} \times (2^{16} - 1) \approx 2^{32}

(about 4 G) input transitions in total. Moreover, because the design includes full-custom circuits, the performance analysis of PTL-based multipliers cannot rely on EDA tools. Consequently, analyzing and evaluating the performance of PTL-based multipliers, and devising strategies for further optimization, is very challenging.

Therefore, we propose a critical path evaluation model to analyze the critical path of the multiplier and accurately determine its maximum delay, as shown in Figure 5. First, we perform an exhaustive critical path analysis of the CSA multiplier to identify the potential critical paths. Then, we propose an algorithm that reduces the size of the input transition set. Finally, by simulating the reduced input set, we obtain the maximum delay of the multiplier.

This study explores how PTL full adders can improve the performance of 8 × 8-bit multipliers and verifies the proposed critical path evaluation model. Multipliers can be implemented in many ways, for example, with carry ripple adder (CRA), CSA, or Wallace tree structures. Among these structures, the CSA has the simplest and most regular interconnections, which makes it the clearest for performance analysis. Therefore, this study focuses on unsigned CSA multipliers.

3.2. Analysis of the Potential Critical Paths

Figure 6 shows several potential critical paths in the 8 × 8-bit unsigned CSA multiplier. In the CSA multiplier, the critical path consists of vertical sub-paths followed by a leftward horizontal sub-path in the last row. To produce the maximum delay, the horizontal sub-path must traverse a total of seven FAs (possibly including one half adder (HA)). Consequently, the endpoint of each vertical propagation path is the first FA or the HA in the last row. In addition, a vertical propagation path can pass through at most six FAs. As shown in Figure 6, there are

2^{6} - 1 = 63

possible vertical propagation paths that end at the first FA in the last row. Similarly, as shown in Figure 7, there are

2^{5} = 32

paths that end at the HA. Because an HA has a smaller delay than an FA, we do not consider vertical propagation paths that pass through HAs. All 95 (63 + 32) vertical sub-paths can potentially produce the maximum delay.

Moreover, every FA and HA on the last-row horizontal sub-path must produce a Co transition, but none except the last FA may produce a Sum transition. As shown in Figure 8, Path 1 is one of the potential critical paths described above. At the same time, there may be a Path 2 that runs downward from the fourth row to the fourth FA (from either side) in the last row, where it converges with Path 1 and turns left. In a typical CSA multiplier, the A and B input ports of the “converging” FA lie on Path 2, while its Ci lies on Path 1, and only its Co continues leftward along the critical path. If the “converging” FA produces transitions on both Sum and Co, the shorter (and likely faster) Path 2 must flip both A and B, which is sufficient to produce the leftward Co before Path 1 arrives. In this case, the delay accumulated along Path 1 does not contribute to the delay of this carry output, so the resulting delay is smaller. Therefore, for the critical path to be realized, the seventh FA in the last row must exhibit an output transition, while the other six FAs must produce no Sum transitions.

In summary, the selection conditions for the potential critical paths in the 8 × 8-bit unsigned CSA multiplier are as follows.

The vertical propagation paths go through at most six FAs, and their endpoint is the first FA or the HA in the last row.
Every FA and the HA in the last row produce a Co transition; the FA in the leftmost column of the last row produces a Sum/Co transition, while the other six FAs in the last row produce no Sum transition.

3.3. An Algorithm for the Critical Path Evaluation Model

A MATLAB algorithm is proposed to identify the input transitions corresponding to the potential critical paths. The algorithm, shown in Algorithm 1, systematically analyzes the propagation paths through the FAs for varying input transitions. Because the model is used only for path analysis, an HA is represented directly by an FA with Ci = 0. First, a model of the FA is constructed with inputs A, B, and Ci and outputs Sum and Co. This model is reused 56 times as the basic computational unit of the CSA multiplier, and the inputs and outputs of all 56 FAs are initialized to 0. Second, a 32-bit incrementing counter temp (i.e., $temp < 2^{32}$) supplies the successive values of x and y. By iterating $2^{32}$ times, the algorithm explores all possible transitions of x and y. In each iteration, the 56 FAs compute their outputs (Sum and Co) row by row according to the multiplier structure. Finally, if the Sum/Co of an FA differs from its previous value, a transition has occurred. If the resulting transitions match a potential critical path, the current x, y and the previous x_last, y_last are recorded as an input transition.

Our algorithm uses a detailed model of the CSA multiplier and filters out the input transitions that satisfy the conditions for potential critical paths. This reduces the number of input transitions from 4 G to 93 K, significantly improving simulation efficiency. To validate the model’s accuracy, 2500 input transitions were chosen from the 93 K filtered set and simulated on the proposed multiplier. For comparison, another 2500 input transitions were selected randomly from the full 4 G set. Figure 9 shows the probability distributions of the delays for these two sets: filtered transitions are more likely than random transitions to exhibit large delays, and the maximum delay observed among the filtered transitions exceeds that among the random transitions.
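Algorithm 1 packs one input transition into the 32-bit counter temp: the first 16 bits give the initial operands (x_last, y_last) and the last 16 bits give the final operands (x, y). Assuming d2bin behaves like MATLAB's dec2bin(temp, 32), that is, a most-significant-bit-first string, the Python sketch below decodes temp and draws a random comparison set of the kind described above. The function names and the seed are illustrative, not part of the original script.

```python
import random

def decode(temp):
    """Split a 32-bit counter into (x_last, y_last, x, y), MSB first.

    Mirrors the slices d2bin(temp)[1:8], [9:16], [17:24], [25:32]
    of Algorithm 1, assuming a 32-character MSB-first binary string.
    """
    bits = format(temp, "032b")
    return tuple(int(bits[i:i + 8], 2) for i in range(0, 32, 8))

def encode(x_last, y_last, x, y):
    """Inverse of decode()."""
    return (x_last << 24) | (y_last << 16) | (x << 8) | y

def random_transitions(k, seed=0):
    """Draw k random input transitions whose operands actually change."""
    rng = random.Random(seed)
    picked = []
    while len(picked) < k:
        x_last, y_last, x, y = decode(rng.getrandbits(32))
        if (x_last, y_last) != (x, y):
            picked.append((x_last, y_last, x, y))
    return picked

# Worst-case transition reported in Section 3.3:
# 00010000 x 10100010 -> 11010011 x 11101001
worst = encode(0b00010000, 0b10100010, 0b11010011, 0b11101001)
print(decode(worst))  # (16, 162, 211, 233)
sample = random_transitions(2500)
```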

Algorithm 1 Filter input transitions.

for i = 1 to 56
  FA(i).A ← FA(i).B ← FA(i).Ci ← 0
  FA(i).Sum ← FA(i).Co ← 0
end
x[1:8] ← y[1:8] ← 0; temp ← 0; flag ← 0
while temp < 2^32
  x_last ← x; y_last ← y; FA_last ← FA
  if flag = 0 then
    flag ← 1
    x[1:8] ← d2bin(temp)[1:8]
    y[1:8] ← d2bin(temp)[9:16]
  else
    flag ← 0
    x[1:8] ← d2bin(temp)[17:24]
    y[1:8] ← d2bin(temp)[25:32]
    temp ← temp + 1
  end
  for i = 1 to 56 (row by row, following the multiplier structure)
    assign FA(i).A, FA(i).B, and FA(i).Ci from the partial products of x and y
      and the Sum/Co of the previously evaluated FAs
    FA(i).Sum ← FA(i).A ⊕ FA(i).B ⊕ FA(i).Ci
    FA(i).Co ← (FA(i).A + FA(i).B + FA(i).Ci ≥ 2)
  end
  if FA.Sum (Co) ≠ FA_last.Sum (Co) for all FAs on the potential critical path then
    record x_last, y_last, x, y as an input transition
  end
end
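To make the filtering step concrete, here is a minimal Python sketch of the same idea on a small multiplier. It is not the authors' MATLAB script, and several choices are our assumptions: the array is modeled as a Braun-style unsigned carry-save array with n − 1 carry-save rows and one ripple-carry row (n(n − 1) cells, i.e., 56 cells for n = 8), half adders are full adders with Ci = 0 as in the text, and only the last-row conditions of Section 3.2 are checked (every last-row cell toggles Co, and only the leftmost one may toggle Sum). The vertical-path conditions are not modeled. Exhaustive enumeration is shown for n = 4 (256 × 255 transitions); the 8 × 8 case would require the full 4 G sweep.

```python
from itertools import product

def fa(a, b, ci):
    """Bit-level full adder model: returns (Sum, Co)."""
    return a ^ b ^ ci, int(a + b + ci >= 2)

def csa_cells(x, y, n):
    """Evaluate an assumed n x n unsigned Braun-style carry-save array multiplier.

    Returns (product, cells), where cells maps (row, col) -> (Sum, Co).
    Rows 1..n-1 are carry-save rows and row n is the final ripple-carry row.
    Half adders are modeled as full adders with Ci = 0.
    """
    a = [(x >> j) & 1 for j in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    pp = [[a[j] & b[i] for j in range(n)] for i in range(n)]  # weight i + j
    cells = {}
    for j in range(n - 1):  # row 1: half adders on partial-product rows 0 and 1
        cells[(1, j)] = fa(pp[1][j], pp[0][j + 1], 0)
    for r in range(2, n):  # rows 2..n-1: carry-save full adders
        for j in range(n - 1):
            s_in = cells[(r - 1, j + 1)][0] if j < n - 2 else pp[r - 1][n - 1]
            cells[(r, j)] = fa(pp[r][j], s_in, cells[(r - 1, j)][1])
    ripple = 0
    for j in range(n - 1):  # row n: ripple-carry row, first cell is a half adder
        s_in = cells[(n - 1, j + 1)][0] if j < n - 2 else pp[n - 1][n - 1]
        cells[(n, j)] = fa(s_in, cells[(n - 1, j)][1], ripple)
        ripple = cells[(n, j)][1]
    bits = [pp[0][0]] + [cells[(r, 0)][0] for r in range(1, n)]
    bits += [cells[(n, j)][0] for j in range(n - 1)] + [ripple]
    return sum(bit << k for k, bit in enumerate(bits)), cells

def last_row_filter(prev, curr, n):
    """Simplified last-row conditions (our reading of Section 3.2):
    every last-row cell toggles Co, and no last-row cell except the
    leftmost one toggles Sum."""
    row = [(prev[(n, j)], curr[(n, j)]) for j in range(n - 1)]
    co_all_toggle = all(p[1] != c[1] for p, c in row)
    no_early_sum = all(p[0] == c[0] for p, c in row[:-1])
    return co_all_toggle and no_early_sum

def filter_transitions(n):
    """Enumerate every (x_last, y_last) -> (x, y) and keep the candidates."""
    states = {(x, y): csa_cells(x, y, n)[1]
              for x, y in product(range(1 << n), repeat=2)}
    kept = []
    for (x_last, y_last), prev in states.items():
        for (x, y), curr in states.items():
            if (x_last, y_last) != (x, y) and last_row_filter(prev, curr, n):
                kept.append((x_last, y_last, x, y))
    return kept

kept = filter_transitions(4)
print(len(kept), "of", 256 * 255, "transitions kept for n = 4")
```

For example, the transition 0 × 0 → 11 × 12 is kept: in this assumed array every last-row cell switches its Co from 0 to 1 while the last-row Sum outputs stay at 0. Adding vertical-path checks or a different cell netlist would only change csa_cells() and last_row_filter(); the enumeration loop stays the same.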

The MATLAB script can also directly generate a simulation script that can be embedded in Virtuoso to simulate the multiplier with the selected input transitions and find the worst-case delay. Applying this model, we identified the input transition from “00010000 × 10100010” to “11010011 × 11101001” as the one causing the maximum delay in the proposed multiplier. The corresponding critical path is shown in Figure 10.

To further validate the accuracy of the proposed model, we automatically synthesized an 8-bit signed multiplier with Design Compiler. Its structure is similar to that of a Wallace tree multiplier, and we refer to it as the DC multiplier. We applied the model to the DC multiplier and ran delay simulations on 2500 filtered and 2500 random input transitions. As shown in Figure 11, the filtered transitions again show a higher probability of large delays.

4. Hybrid CSA Multiplier

4.1. Analysis of the 2+14T-FA Applied in Multiplier

Figure 12 shows the structure of the CMOS-based CSA multiplier. A straightforward way to build a CSA multiplier with 2+14T-FAs is to replace all the full adders in Figure 12 with 2+14T-FAs. This PTL-based multiplier functions correctly and produces accurate results. However, as shown in Table 2, its maximum delay is as much as twice that of the CMOS multiplier. This is because, as analyzed in Section 2.2, the delay of cascaded PTL circuits increases sharply and nonlinearly with the number of cascaded stages.

The cascade of PTL circuits can be interrupted by inserting CMOS circuits between them. We tried two approaches to solve the cascading problem of the 2+14T-FA:

Insert a buffer or inverter every other row;
Interleave 2+14T-FAs and 28T-FAs, as shown in Figure 13.

As shown by the pre-layout simulation results in Table 2, inserting buffers or inverters proved ineffective because of the additional delay and power consumption. We also tried inserting a buffer or inverter every two or four full adders, but the delay remained poor. In contrast, interleaving 2+14T-FAs and 28T-FAs adds no extra circuit components and achieves a better power-delay product (PDP) than the full CMOS multiplier.
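Equation (1) also suggests why breaking a PTL chain can help, and why it can backfire: splitting n cascaded stages into short segments replaces one quadratic term with several small ones, but each restoring CMOS stage adds delay of its own. The sketch below illustrates this trade-off in normalized RC units. The restoring-stage delay t_restore is a hypothetical parameter, and the model ignores the extra load and power that the measured results in Table 2 capture.

```python
def chain_delay(n, rc=1.0):
    """Equation (1): Elmore delay of n cascaded transmission gates."""
    return 0.69 * rc * n * (n + 1) / 2

def segmented_delay(n, k, t_restore, rc=1.0):
    """Delay of n PTL stages split into segments of at most k stages,
    with one restoring stage (hypothetical delay t_restore) between segments."""
    full, rest = divmod(n, k)
    segments = [k] * full + ([rest] if rest else [])
    restores = len(segments) - 1
    return sum(chain_delay(s, rc) for s in segments) + restores * t_restore

print(round(chain_delay(8), 2))              # 24.84
print(round(segmented_delay(8, 2, 1.0), 2))  # 11.28
print(round(segmented_delay(8, 2, 6.0), 2))  # 26.28
```

With a fast restoring stage the segmented chain is much faster than the uninterrupted one, but a slow restoring stage (t_restore = 6 here) makes it worse, consistent with the observation that inserted buffers and inverters did not pay off.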

4.2. Optimize The Hybrid Multipliers through the Proposed Evaluation Model

Figure 13 shows the initial interleaved structure of the hybrid CMOS/PTL multiplier. According to the proposed critical path evaluation model, the maximum delay of this multiplier is 701 ps. Along the vertical propagation paths, the delay of the 2+14T-FAs in the first row is significantly larger than that in the other rows, and several consecutive B-to-Sum propagation paths also have particularly large delays. Along the horizontal propagation path, although the inserted 28T-FAs interrupt the cascade of 2+14T-FAs, the delay of the last row of ripple-carry FAs remains large. In the following, we propose optimization measures for the hybrid multiplier based on the critical path identified by the evaluation model and the analysis of the 2+14T-FA in Section 2.1.

Optimization measure 1: Exchange the connection between Sum and Co with the next row.

According to the previous analysis, the A/B-to-Sum delay is generally larger, so a significant delay accumulates when such stages appear consecutively on a vertical propagation path. Therefore, we change the connection of the vertical FAs from that in Figure 14a to that in Figure 14b. After this optimization, we re-evaluate the critical path of the multiplier using the proposed evaluation model. As shown in Table 3, optimization measure 1 slightly reduces the delay and significantly reduces the power consumption.

Optimization measure 2: Replace the PP-input FAs (the FAs in the first row and the first column) with 28T-FAs.

When multiple inputs change simultaneously, the 2+14T-FA has larger delay and power consumption. The inputs of the FAs in the first row and the first column are partial products (PPs), which often change simultaneously. Therefore, we further optimize the multiplier by replacing the adders in the first row and the first column with 28T-FAs. After this optimization, we again use the proposed evaluation model to assess the critical path of the multiplier. As shown in Table 3, optimization measure 2 further reduces the PDP by approximately 7%.

Optimization measure 3: PTL-HA.

After optimization measure 2, the portion of the multiplier implemented in PTL is relatively small, so the power advantage is limited. Therefore, we designed a PTL-HA and replaced all half adders in the multiplier with it, further reducing the power consumption of the multiplier.

Optimization measure 4: Optimize the delay in the last row.

According to the analysis in Section 3.2, the leftward horizontal sub-path in the last row is an important part of the CSA multiplier’s critical path. Therefore, reducing the delay of the last row is essential for reducing the maximum delay of our multiplier. First, to break the horizontal cascade of the last-row FAs, a 28T-FA is inserted after every two 2+14T-FAs. Second, the input inverter of the 2+14T-FA can be removed, turning it into a 14T-FA; the function of the removed inverter is then provided by the 28T-FA in the sixth row, as shown in Figure 15. Overall, the last row consists of one PTL-HA, five 14T-FAs, and two 28T-FAs.

The final multiplier structure with four optimization measures is shown in Figure 16. After optimization, we once again use the proposed evaluation model to assess the critical path of the multiplier. As shown in Table 3, the final power-delay product of the multiplier has been reduced by approximately 25% compared to the initial value.

5. Experimental Results

This paper presents multipliers based on custom 2+14T-FAs, PTL-HAs, and standard cells, implemented in the TSMC 28 nm process with IC Compiler used for placement and routing. The layout of the proposed hybrid CSA multiplier is shown in Figure 17. Several points must be considered when creating the layout.

We manually drew the layouts of the 2+14T-FA and PTL-HA in Virtuoso and used IC Compiler for automated placement and routing of the hybrid multiplier design.
To enable the IC Compiler automation flow, the height of the PTL-FA layout must match that of the CMOS standard cells used. For instance, because we used the 7-track TSMC 28 nm library, the height of the PTL-FA layout had to be 0.7 μm.
During automated placement and routing in IC Compiler, we intervened manually to minimize wire length and reduce the maximum delay. In the placement stage, because the last row of adders forms the critical path and has complex interconnections, we placed the FAs of this row as close together as possible and fixed their positions before proceeding with automated placement and routing.

Simulations were conducted in Virtuoso at 0.9 V and 27 °C. Table 3 presents the post-layout simulation results of the four PTL-CMOS hybrid multipliers at a simulation frequency of 50 MHz. Guided by the proposed evaluation model, the successive optimizations progressively reduce the power consumption and maximum delay of the PTL-CMOS hybrid multiplier, by approximately 25% overall.

Figure 18a,b show the post-layout simulation results for the power consumption of the proposed multiplier and the CMOS multiplier at various supply voltages. As the voltage decreases, the power consumption of both multipliers decreases while their delay increases; because of this trade-off, the PDP of both multipliers reaches its minimum at 0.9 V. At every voltage tested, the PDP of the proposed multiplier is lower than that of the CMOS multiplier. Figure 18c,d show post-layout simulation results at different process corners, where the proposed multiplier also maintains a lower PDP. Table 4 presents the comparison at 0.9 V and 27 °C in Virtuoso, with a simulation frequency of 500 MHz. Compared with the traditional CMOS-based multiplier, the proposed multiplier achieves a 9.4% improvement in PDP and a 14.3% improvement in area-power product (APP).
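The percentages above are relative reductions of lower-is-better metrics. Assuming the usual definitions (PDP = power × worst-case delay, APP = area × power, improvement = reduction relative to the CMOS baseline), the helpers below state them explicitly. The numbers in the usage lines are placeholders chosen only to illustrate a 9.4% reduction; they are not values from Table 4.

```python
def pdp(power_w, delay_s):
    """Power-delay product (energy per operation, in joules)."""
    return power_w * delay_s

def app(area_um2, power_w):
    """Area-power product."""
    return area_um2 * power_w

def improvement_pct(baseline, proposed):
    """Relative reduction of a lower-is-better metric, in percent."""
    return (baseline - proposed) / baseline * 100.0

# Placeholder numbers only (not measured data).
base = pdp(power_w=100e-6, delay_s=500e-12)
new = pdp(power_w=90.6e-6, delay_s=500e-12)
print(f"{improvement_pct(base, new):.1f}%")  # 9.4%
```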

Furthermore, we ran pre-layout simulations of the proposed hybrid multiplier and the traditional CMOS multiplier in a 40 nm process at a simulation frequency of 50 MHz. As shown in Table 5, the proposed hybrid multiplier still exhibits performance advantages in the 40 nm process.

All results in Table 4 are post-layout simulation results. Compared with 8-bit unsigned multipliers in the 45 nm process, both our PDP and APP are better than those of [20], and our APP is better than that of [21]. Because our process is more advanced than those of [20,21], we also compared the proposed multiplier with works implemented in the same or a closer process node. Compared with [18] in the 32 nm process, our design has significant advantages in both PDP and APP. Compared with [19] in the 28 nm process, our APP is better, although our delay is slightly larger. This is because [19] is an approximate multiplier, which allows a very small delay. Although our multiplier performs exact computation, it still achieves smaller area and lower power consumption. Therefore, the proposed multiplier demonstrates clear advantages in low power consumption and small area.

6. Conclusions

In this paper, we present a model for evaluating the critical path of a CSA multiplier. The model identifies the critical path of the CSA multiplier, obtains its maximum delay, and reduces the simulation input set from 4 G to 93 K input transitions. It has also been applied to a multiplier with a different structure, where it effectively screened the paths. We proposed a PTL-CMOS hybrid CSA multiplier and, with the assistance of the model, reduced its maximum delay by approximately 25%. Compared with traditional CMOS-based CSA multipliers, the proposed multiplier achieves a 9.4% improvement in power-delay product and a 14.3% improvement in area-power product. Simulations comparing the two multipliers under different processes, voltages, and corners show that the proposed hybrid multiplier maintains its performance advantage. This indicates that integrating PTL into multipliers can significantly reduce circuit power consumption and area. Furthermore, by carefully designing the drive and load of PTL circuits, the issues of insufficient drive capability and large delay can be addressed.

Author Contributions

Conceptualization, Y.Y.; data curation, W.P. and C.T.; methodology, Y.Y. and N.Y.; project administration, Z.Y. and N.Y.; software, Y.Y. and W.P.; validation, Y.Y., W.P. and C.T.; writing—original draft, Y.Y.; writing—review and editing, N.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2021B1101270005 and 2021B0101410004; in part by the National Key Research and Development Program of China under Grant 2017YFA0206200 and 2018YFB2202601; in part by the National Natural Science Foundation of China (NSFC) under Grant 61834005 and Grant 61902443; in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515011708; in part by the Zhuhai Industry-Academic collaboration program ZH22017001200097PWC.

Data Availability Statement

Due to privacy restrictions, the research data for this manuscript cannot be disclosed.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FA	full adder
HA	half adder
PTL	pass transistor logic
CPL	complementary pass-gate logic
CMOS	complementary metal oxide semiconductor
CSA	carry save array
DSP	digital signal processing
FIR	finite impulse response
FFT	Fast Fourier transform
GDI	gate diffusion input

References

Ganesh, K.T.; Kumar, B.V.S.; Mihiraamsh, B.S.; Akhil, G.; Ravitej, V.; Murugan, S. Low power and single multiplier design for 2D convolutions. In Proceedings of the 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 4–6 August 2021; pp. 1957–1964.
Kamdi, R.; Thakre, P.; Pathade, A.; Tiwari, S.K.; Kalbande, K. 4 Bit and 8 Bit Convolution Using Vedic Multiplier. In Proceedings of the 2022 International Conference on Emerging Trends in Engineering and Medical Sciences (ICETEMS), Nagpur, India, 18–19 November 2022; pp. 352–357.
Patel, S.; Khare, K.; Yadav, J.; Yadav, P. High Performance Robust FIR Filter Design Using Radix-8 Based Improved Booth Multiplier For Signal Processing Application. In Proceedings of the 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 26–27 August 2021; pp. 82–87.
Sravani, K.; Saisri, M.; Sivani, U.V.; Kumar, A.R. Design and Implementation of Optimized FIR Filter using CSA and Booth Multiplier for High Speed Signal Processing. In Proceedings of the 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, 26–28 May 2023; pp. 1–6.
Pang, L.; Chan, K.; Wong, S.; Tan, C.W. VHDL Modeling of Booth Radix-4 Floating Point Multiplier for VLSI Designer’s Library. WSEAS Trans. Syst. 2013, 12, 1–11.
Mahesh, B.V.; Srivasarao, T. Performance Evaluation of FFT through Adaptive Hold Logic (AHL) Booth Multiplier. In Proceedings of the 2023 International Conference for Advancement in Technology (ICONAT), Goa, India, 24–26 January 2023; pp. 1–6.
Abraham, S.; Kaur, S.S.S. Study of various high speed multipliers. In Proceedings of the 2015 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 8–10 January 2015.
Kandregula, P.C.V.G.M.G.B. Design of Area Efficient, Low Power, High Speed and Full Swing Hybrid Multipliers. In Proceedings of the 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Coimbatore, India, 27–29 January 2021.
Zhuang, N.; Wu, H. A new design of the CMOS full adder. IEEE J. Solid-State Circuits 1992, 27, 840–844.
Nirmalraj, T.; Pandiyan, S.K.; Karan, R.K.; Sivaraman, R.; Amirtharajan, R. Design of Low-Power 10-Transistor Full Adder Using GDI Technique for Energy-Efficient Arithmetic Applications. Circuits Syst. Signal Process. 2023, 42, 6.
Saraswathi, C.; Rani, N.U.; Nagateja, T. High performance and energy efficient FinFET based 1-bit PT full adders. In Proceedings of the 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Chennai, India, 15–17 December 2016; pp. 1–4.
Jitendra, K.S.; Srinivasulu, A.; Singh, B.P. A new low-power full-adder cell for low voltage using CNTFETs. In Proceedings of the 2017 9th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Targoviste, Romania, 29 June–1 July 2017; pp. 1–5.
Singh, S.S.; Leishangthem, D.; Shah, M.N.; Shougaijam, B. A Unique Design of Hybrid Full Adder for the Application of Low Power VLSI Circuits. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; pp. 260–264.
Naseri, H.; Timarchi, S. Low-Power and Fast Full Adder by Exploring New XOR and XNOR Gates. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2018, 26, 1481–1493.
Yin, N.; Pan, W.; Yu, Y.; Tang, C.; Yu, Z. Low-Power Pass-Transistor Logic-Based Full Adder and 8-Bit Multiplier. Electronics 2023, 12, 3209.
Bui, H.T.; Wang, Y.; Jiang, Y. Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 2002, 49, 25–30.
Vigneswaran, T.; Mukundhan, B.P.S. A novel low power and high performance 14 transistor CMOS full adder cell. J. Appl. Sci. 2007, 6, 1978–1981.
Boppana Krishna, N.V.V.; Kommareddy, J.R.S. Low-Cost and High-Performance 8 × 8 Booth Multiplier. Circuits Syst. Signal Process. 2019, 38, 4357–4368.
Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Meo, G.D. Comparison and Extension of Approximate 4-2 Compressors for Low-Power Approximate Multipliers. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 3021–3034.
Waris, H.; Wang, C.; Liu, W.; Han, J.; Lombardi, F. Hybrid Partial Product-Based High-Performance Approximate Recursive Multipliers. IEEE Trans. Emerg. Top. Comput. 2022, 10, 507–513. [Google Scholar] [CrossRef]
Amirafshar, N.; Baroughi, A.S.; Shahhoseini, H.S.; TaheriNejad, N. Carry Disregard Approximate Multipliers. IEEE Trans. Circuits Syst. Regul. Pap. 2023, 70, 4840–4853. [Google Scholar] [CrossRef]

Figure 1. Traditional complementary metal oxide semiconductor (CMOS) full adder (28T-FA).

Figure 2. The pass transistor logic (PTL) full adder proposed in [9] (2+14T-FA).

Figure 3. Traversal simulations on the 2+14T-FA and 28T-FA with 56 input transitions. (a) Delay of Co. (b) Delay of Sum. (c) Energy consumption.

Figure 4. The equivalent circuit of cascaded pass transistor logic (PTL) full adders (FAs). (a) Transmission gate chain. (b) Equivalent resistance-capacitance (RC) network.
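Figure 4(b) reduces a chain of cascaded PTL stages to an RC network. As a rough first-order illustration (the standard Elmore approximation, not necessarily the delay analysis used in this paper), the delay at the far end of an RC ladder is the sum, over every node, of that node's capacitance times the total resistance between the driver and the node. The function name and the per-stage values below are hypothetical; resistance is given in kΩ and capacitance in fF, so the result comes out in ps.

```python
def elmore_delay_ps(r_kohm: list[float], c_ff: list[float]) -> float:
    """Elmore delay (ps) at the far end of an RC ladder.

    Stage i has series resistance r_kohm[i] (kOhm) followed by a capacitance
    c_ff[i] (fF) to ground at its output node; 1 kOhm x 1 fF = 1 ps.
    """
    if len(r_kohm) != len(c_ff):
        raise ValueError("r_kohm and c_ff must have the same length")
    delay, upstream_r = 0.0, 0.0
    for r, c in zip(r_kohm, c_ff):
        upstream_r += r          # total resistance between the driver and this node
        delay += upstream_r * c  # this node's contribution to the Elmore sum
    return delay


if __name__ == "__main__":
    # Hypothetical uniform stages: R = 2 kOhm and C = 1.5 fF per transmission gate.
    for n in (1, 2, 4, 8):
        print(n, "stages:", elmore_delay_ps([2.0] * n, [1.5] * n), "ps")
```

With identical stages the sum collapses to n(n + 1)/2 · RC, so doubling the number of unbuffered stages roughly quadruples the delay. This is one common reason for breaking long pass-transistor chains with restoring buffers or inverters, the kind of fix compared in Table 2.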

Figure 5. The critical path evaluation model for carry save array (CSA) multipliers.

Figure 6. Potential critical paths in the carry save array (CSA) multiplier, where the endpoints of the vertical propagation paths are the first full adder (FA) in the last row.

Figure 7. Potential critical paths in the CSA multiplier, where the endpoints of the vertical propagation paths are the half adder (HA) in the last row.

Figure 8. An early Sum output in the last row.
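Figures 6–8 refer to the row structure of a carry save array (CSA) multiplier. The bit-level functional sketch below models a generic unsigned n × n CSA multiplier (not the authors' netlist, and with no timing) and may help when reading them: in the carry-save rows each cell passes its carry down to the next row instead of rippling it sideways, and only the last row is a ripple-carry adder, so vertical paths through the array end in horizontal paths along the last row. For simplicity every cell is modeled as a full adder; cells with a constant-0 input (for example, every cell of the first carry-save row and the first cell of the last row) would be half adders or plain wires in hardware. The function names are hypothetical.

```python
def bit(x: int, k: int) -> int:
    """Return bit k of the non-negative integer x."""
    return (x >> k) & 1


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def csa_array_multiply(a: int, b: int, n: int = 8) -> int:
    """Functional model of an unsigned n x n carry save array multiplier."""
    p = [0] * (2 * n)  # product bits, LSB first

    # Row 0: partial products a_j * b_0; s[n] is a constant-0 pad for the top cell.
    s = [bit(a, j) & bit(b, 0) for j in range(n)] + [0]
    c = [0] * n
    p[0] = s[0]

    # Carry-save rows 1..n-1: cell (i, j) adds a_j * b_i to the sum bit and the
    # carry bit of equal weight coming from the row above.
    for i in range(1, n):
        new_s, new_c = [0] * (n + 1), [0] * n
        for j in range(n):
            new_s[j], new_c[j] = full_adder(bit(a, j) & bit(b, i), s[j + 1], c[j])
        s, c = new_s, new_c
        p[i] = s[0]  # the least significant sum bit of each row is a final product bit

    # Last row: a ripple-carry adder merges the remaining sum and carry vectors.
    carry = 0
    for j in range(n):
        p[n + j], carry = full_adder(s[j + 1], c[j], carry)

    return sum(bit_k << k for k, bit_k in enumerate(p))


if __name__ == "__main__":
    print(csa_array_multiply(200, 123), 200 * 123)
```

Because it only checks logic, a model like this is useful for confirming that an input transition set is applied to a correct array; delays still have to come from circuit simulation.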

Figure 9. The probability distribution of delay results for filtered transitions and random transitions.

Figure 10. Critical path of the proposed multiplier.

Figure 11. The probability distribution of delay results for filtered transitions and random transitions in the Design Compiler multiplier.

Figure 12. The structure of the complementary metal oxide semiconductor (CMOS) multiplier.

Figure 13. Initial hybrid structure of the multiplier.

Figure 14. Exchanging the connection. (a) Initial connection. (b) New connection.

Figure 15. Shared inverter between 28T-full adder (FA) and 14T-FA.

Figure 16. Final hybrid structure of the multiplier.

Figure 17. The layout of proposed hybrid carry save array (CSA) multiplier.

Figure 18. Simulation results of multipliers versus voltage and corner. (a) Power simulation results versus voltage. (b) Power-delay product (PDP) simulation results versus voltage. (c) Power simulation results versus corner. (d) PDP simulation results versus corner.

Table 1. Traversal simulations on the 2+14T-FA and 28T-FA.

Changed Port	Delay_Co, 28T (ps)	Delay_Co, 2+14T (ps)	Delay_Sum, 28T (ps)	Delay_Sum, 2+14T (ps)	Energy, 28T (fJ)	Energy, 2+14T (fJ)
A	42.00	74.50	51.75	73.38	3.00	2.99
B	40.75	67.75	51.63	83.88	2.94	2.88
Ci	38.25	24.50	50.75	29.00	2.81	1.70
A,B	37.25	39.50	N/A	N/A	3.31	2.40
A,Ci	36.75	63.25	N/A	N/A	3.19	3.50
B,Ci	35.50	53.50	N/A	N/A	3.22	4.73
A,B,Ci	45.00	52.25	68.38	55.57	5.21	4.00
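The 56 input transitions behind Figure 3 and Table 1 are all ordered pairs of distinct states of the three inputs (8 × 7 = 56). A short sketch, with hypothetical helper names, groups them by which ports change. It reproduces the seven rows of Table 1 with eight transitions each, and shows why the two-port rows have no Sum delay: Sum = A ⊕ B ⊕ Ci is unchanged whenever an even number of inputs flips, while Co can still toggle.

```python
from itertools import product

PORTS = ("A", "B", "Ci")


def fa_outputs(state: tuple[int, ...]) -> tuple[int, int]:
    """Return (Sum, Co) for a 1-bit full adder with inputs (A, B, Ci)."""
    a, b, ci = state
    return a ^ b ^ ci, (a & b) | (a & ci) | (b & ci)


def group_transitions() -> dict[tuple[str, ...], list[tuple[tuple[int, ...], tuple[int, ...]]]]:
    """Group every old -> new input transition by the set of ports that change."""
    states = list(product((0, 1), repeat=3))
    groups: dict[tuple[str, ...], list[tuple[tuple[int, ...], tuple[int, ...]]]] = {}
    for old in states:
        for new in states:
            if old == new:
                continue  # no input change, so nothing to measure
            changed = tuple(p for p, x, y in zip(PORTS, old, new) if x != y)
            groups.setdefault(changed, []).append((old, new))
    return groups


if __name__ == "__main__":
    groups = group_transitions()
    print("total transitions:", sum(len(t) for t in groups.values()))
    for ports, trans in groups.items():
        sum_flips = sum(fa_outputs(o)[0] != fa_outputs(n)[0] for o, n in trans)
        co_flips = sum(fa_outputs(o)[1] != fa_outputs(n)[1] for o, n in trans)
        print(",".join(ports), len(trans), "Sum toggles:", sum_flips, "Co toggles:", co_flips)
```

Each single-port and two-port group toggles Co in only half of its eight transitions, whereas flipping all three inputs always toggles both Sum and Co.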

Table 2. Comparison of solutions to the cascading issue of the 2+14T-full adder (FA).

Solution	Power (μW)	Delay_max (ps) ¹	PDP (%)
28T	7.70	314	47.55%
2+14T	6.51	781	100%
2+14T with buffer	7.64	430	64.76%
2+14T with inverter	7.30	333	47.44%
2+14T-28T	7.11	310	43.31%

¹ All worst-case delays were obtained using the evaluation model proposed in Section 3.2.

Table 3. Comparison of the pass transistor logic (PTL)-complementary metal oxide semiconductor (CMOS) hybrid multiplier across the optimization steps.

Hybrid Multiplier	Power (μW)	Delay_max (ps) ²	PDP (%)
Initial multiplier	8.550	701	100%
Optimization 1	7.490	699	87.35%
Optimization 1,2	7.080	683	80.68%
Optimization 1,2,3	6.857	684	78.25%
Optimization 1,2,3,4	6.585	668	75.59%

² The worst-case delay was obtained using the evaluation model proposed in Section 3.2.

Table 4. Comparison with the complementary metal oxide semiconductor (CMOS)-based multiplier and other multipliers.

Multiplier	Proposed	CMOS-Based	[18] 2019	[19] 2020	[20] 2022	[21] 2023
Process	28 nm	28 nm	32 nm	28 nm	45 nm	45 nm
Bit width	8 bit	8 bit	8 bit	8 bit	8 bit	8 bit
Voltage (V)	0.9	0.9	1.05	0.9	N/A	N/A
Power (μW)	97.68 @500 MHz	113.02 @500 MHz	342 @500 MHz	162	181.65	85.61
Delay_max (ps)	797	757	1177	249	840	760
Area (μm²)	117.99	121.44	1190	175	419.23	300.6
Transistors	1610	1872	N/A	N/A	N/A	N/A
PDP (pJ)	0.078	0.086	0.407	0.040	0.153	0.065
APP (μm² × W)	0.012	0.014	0.403	0.028	0.076	0.026
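The PDP and APP rows of Table 4 can be recomputed from the power, delay, and area rows, assuming PDP = power × Delay_max and APP = area × power. The helper names below are hypothetical. For the two 28 nm designs in the first two columns, this gives back the listed 0.078 pJ and 0.012 μm² × W (proposed) and 0.086 pJ and 0.014 μm² × W (CMOS-based).

```python
def pdp_pj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in pJ: 1 uW x 1 ps = 1e-18 J = 1e-6 pJ."""
    return power_uw * delay_ps * 1e-6


def app_um2_w(area_um2: float, power_uw: float) -> float:
    """Area-power product in um^2 x W: 1 uW = 1e-6 W."""
    return area_um2 * power_uw * 1e-6


# (power in uW at 500 MHz, Delay_max in ps, area in um^2), taken from Table 4.
designs = {
    "Proposed": (97.68, 797, 117.99),
    "CMOS-Based": (113.02, 757, 121.44),
}

if __name__ == "__main__":
    for name, (p, d, a) in designs.items():
        print(f"{name}: PDP = {pdp_pj(p, d):.3f} pJ, APP = {app_um2_w(a, p):.3f} um^2 x W")
```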

Table 5. Comparison of the pass transistor logic (PTL)-complementary metal oxide semiconductor (CMOS) hybrid multiplier and the CMOS multiplier in the 40 nm process.

Multiplier	Power (μW)	Delay_max (ps)	PDP (%)
Hybrid multiplier	3.73	793	86.95%
CMOS multiplier	4.47	761	100%
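The PDP (%) columns in Tables 2, 3, and 5 are normalized to a reference design, the row listed at 100%. Assuming PDP is taken as power × Delay_max, a minimal sketch of that normalization with the Table 5 values recovers the 86.95% figure for the hybrid multiplier; the function name is hypothetical.

```python
def normalized_pdp(rows: dict[str, tuple[float, float]], reference: str) -> dict[str, float]:
    """PDP of each design as a percentage of the reference design's PDP.

    rows maps a design name to (power in uW, Delay_max in ps). The units cancel
    in the ratio, so no unit conversion is needed.
    """
    ref_p, ref_d = rows[reference]
    ref_pdp = ref_p * ref_d
    return {name: 100.0 * (p * d / ref_pdp) for name, (p, d) in rows.items()}


table5 = {
    "Hybrid multiplier": (3.73, 793),
    "CMOS multiplier": (4.47, 761),
}

if __name__ == "__main__":
    for name, pct in normalized_pdp(table5, "CMOS multiplier").items():
        print(f"{name}: {pct:.2f}%")
```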

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Yu, Y.; Pan, W.; Tang, C.; Yin, N.; Yu, Z. Design of a High-Speed, Low-Power PTL-CMOS Hybrid Multiplier Using Critical-Path Evaluation Model. Electronics 2024, 13, 1284. https://doi.org/10.3390/electronics13071284
