Design of Complementary Metal–Oxide–Semiconductor Encoder/Decoder with Compact Circuit Structure for Booth Multiplier

Wang, Yu-Nsin; Hung, Yu-Cherng

doi:10.3390/engproc2025103021

Open Access Proceeding Paper

Design of Complementary Metal–Oxide–Semiconductor Encoder/Decoder with Compact Circuit Structure for Booth Multiplier^†

by Yu-Nsin Wang and Yu-Cherng Hung ^*

Department of Electronic Engineering, National Chin-Yi University of Technology, Taichung 411030, Taiwan

^* Author to whom correspondence should be addressed.

^† Presented at the 8th Eurasian Conference on Educational Innovation 2025, Bali, Indonesia, 7–9 February 2025.

Eng. Proc. 2025, 103(1), 21; https://doi.org/10.3390/engproc2025103021

Published: 1 September 2025

(This article belongs to the Proceedings of The 8th Eurasian Conference on Educational Innovation 2025)


Abstract

Multipliers are crucial components in digital processing and the arithmetic logic unit (ALU) of central processing unit (CPU) design. As the data bit length increases, the number of partial products in the multiplication process increases, resulting in an increased summation time for the partial products. Consequently, the speed of the multiplier circuit is adversely affected by increased time delays. In this article, we present a combined radix-4 Booth encoding module that employs metal–oxide–semiconductor (MOS) transistors that share common control signals to reduce the transistor count. In HSPICE simulations, the functionality of the proposed circuit architecture was verified, and the number of transistors used was successfully reduced.

Keywords: multipliers; CMOS circuits; Booth multipliers; Booth encoder; Booth decoder

1. Introduction

Multiplication is a fundamental arithmetic operation and is critical in various microprocessors and digital signal processing systems. The traditional multiplication process consists of two main operations: partial product generation and the addition of these partial products. In the digital multiplication of two binary numbers, the multiplicand and the multiplier are used to generate the related partial products. Once all partial products are generated, a matrix of partial products is formed. After this process, related partial products are added, from the least significant bit (LSB) to the most significant bit (MSB), by using a digital adder array. In a multiplier chip design, the adder array used for partial product accumulation is a major source of the chip area, time delay, and power consumption. Furthermore, as the data bit length increases, the traditional adder array needs more time to calculate the corresponding sum and carry values. The operating time of a multiplier in a traditional circuit architecture increases with the bit length of input data.

Booth’s multiplication algorithm was invented in the 1950s by Andrew Donald Booth, a British electrical engineer and computer scientist. Radix-4 Booth encoding recodes the multiplier so that fewer partial product rows are generated, which in turn decreases the number of accumulation operations. Fewer partial product rows require fewer adders, leading to significant reductions in the multiplier’s area, delay, and power consumption compared with conventional multiplier circuits [1,2]. Therefore, radix-4 Booth encoding is often employed in advanced multiplier designs [3,4,5,6,7].

In this study, we explored the design of the corresponding complementary metal–oxide–semiconductor (CMOS) circuit for the radix-4 Booth encoder/decoder. Furthermore, we simplified the circuit design and reduced power consumption at the cost of a slight time delay penalty.

2. Radix-4 Booth Encoding

The Booth algorithm, first proposed by Andrew D. Booth in 1951 [8], recodes the multiplier. When a run of consecutive 1s is detected in the multiplier, the lowest-order 1 of the run is replaced by −1, the bit immediately above the run is set to 1, and the bits in between are set to 0. The recoded multiplier therefore contains more 0s, and because a 0 digit generates no partial product, fewer partial product rows need to be added, as in the following example:

{(14)}_{10} = {(0 0 1 1 1 0)}_{2}

(1)

X \times {(0 0 1 1 1 0)}_{2} = X \times (2^{3} + 2^{2} + 2^{1}) = X \times 14

(2)

X \times {(0 1 0 0 - 1 0)}_{2} = X \times (2^{4} - 2^{1}) = X \times 14

(3)

where X is the multiplicand and 14 is the multiplier. Equation (1) shows the binary representation of the decimal number 14. In traditional multiplication, as shown in Equation (2), three partial products are generated. However, if (0 0 1 1 1 0)₂ is recoded as (0 1 0 0 −1 0)₂, as in Equation (3), only two partial products are generated.
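The recoding in Equation (3) can be reproduced mechanically. The following Python sketch assumes the digit rule d_i = b_{i−1} − b_i with b_{−1} = 0 (the radix-2 rule written out in Equation (6)); the function names are illustrative and do not come from the original work.

```python
def booth_radix2_digits(value: int, width: int) -> list[int]:
    """Signed digits d_i = b_{i-1} - b_i (LSB first) of a two's-complement value."""
    digits = []
    prev = 0  # b_{-1} = 0 by convention
    for i in range(width):
        bit = (value >> i) & 1  # works for negative ints as well (two's complement)
        digits.append(prev - bit)
        prev = bit
    return digits


def digits_to_int(digits: list[int]) -> int:
    """Value of a signed-digit string given LSB first, with weights 2**i."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def nonzero_count(digits: list[int]) -> int:
    """Number of nonzero digits, i.e. partial products that must be added."""
    return sum(1 for d in digits if d != 0)


if __name__ == "__main__":
    recoded = booth_radix2_digits(14, 6)
    plain = [(14 >> i) & 1 for i in range(6)]
    print("recoded, MSB first:", recoded[::-1])
    print("nonzero digits:", nonzero_count(plain), "->", nonzero_count(recoded))
```

For the multiplier 14 with a width of 6 bits, the recoded string read MSB first is 0 1 0 0 −1 0, which matches Equation (3).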

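The traditional Booth encoding proposed in Ref. [8] is specifically referred to as radix-2 Booth encoding. For an n-bit multiplier B with bits b_{n−1} … b_0, Equation (4) gives its value as an unsigned number and Equation (5) gives its value as a two’s-complement number; Equation (6) is the radix-2 Booth form of Equation (5), with b_{−1} = 0. Because Equation (6) still contains n terms, one per multiplier bit, radix-2 encoding is not effective in reducing the number of partial product rows.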

{(B)}_{10} = \sum_{i = 0}^{n - 1} b_{i} \times 2^{i}

(4)

{(B)}_{10} = - b_{n - 1} \times 2^{n - 1} + \sum_{i = 0}^{n - 2} b_{i} \times 2^{i}

(5)

B = (b_{n - 2} - b_{n - 1}) \times 2^{n - 1} + (b_{n - 3} - b_{n - 2}) \times 2^{n - 2} + (b_{n - 4} - b_{n - 3}) \times 2^{n - 3} + \dots + (b_{0} - b_{1}) \times 2^{1} + (b_{- 1} - b_{0}) \times 2^{0}

(6)
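The step from Equation (5) to Equation (6) can be made explicit. The following is a short sketch of the derivation; it uses the identity 2^{i} = 2^{i+1} − 2^{i} and assumes the usual convention b_{−1} = 0.

```latex
\begin{aligned}
B &= -b_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} b_{i}\,\bigl(2^{i+1} - 2^{i}\bigr) \\
  &= \sum_{i=1}^{n-1} b_{i-1}\,2^{i} \;-\; \sum_{i=0}^{n-1} b_{i}\,2^{i} \\
  &= \sum_{i=0}^{n-1} \bigl(b_{i-1} - b_{i}\bigr)\,2^{i}, \qquad b_{-1} = 0 .
\end{aligned}
```

Writing out the last sum term by term, from i = n − 1 down to i = 0, gives Equation (6).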

Radix-4 Booth encoding [1] is an improvement over radix-2 Booth encoding and is also called modified Booth encoding. Its encoding method, as shown in (7), reduces the number of partial product rows from n to either n/2 or (n/2) + 1, depending on whether the multiplication is signed or unsigned. The coefficients for radix-4 Booth encoding, with a digit set of −2, −1, 0, 1, and 2, are shown in Table 1, where Y represents the multiplier, X represents the multiplicand, and PP represents the coefficient of the partial product.

B = (- 2 b_{n - 1} + b_{n - 2} + b_{n - 3}) \times 2^{n - 2} + (- 2 b_{n - 3} + b_{n - 4} + b_{n - 5}) \times 2^{n - 4} + \dots + (- 2 b_{3} + b_{2} + b_{1}) \times 2^{2} + (- 2 b_{1} + b_{0} + b_{- 1}) \times 2^{0}

(7)
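Equation (7) can be checked numerically. The sketch below, under the assumptions of an even bit width, a two's-complement multiplier, and b_{−1} = 0, computes the radix-4 digits −2b_{2i+1} + b_{2i} + b_{2i−1} and sums them back with weights 4^i. The function names are hypothetical.

```python
def booth_radix4_digits(value: int, width: int) -> list[int]:
    """Radix-4 Booth digits (least significant group first) per Equation (7)."""
    if width % 2 != 0:
        raise ValueError("width must be even")

    def bit(k: int) -> int:
        # b_{-1} is taken as 0; other bits come from the two's-complement value
        return 0 if k < 0 else (value >> k) & 1

    return [
        -2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
        for i in range(width // 2)
    ]


def radix4_value(digits: list[int]) -> int:
    """Value of a radix-4 signed-digit string: sum of d_i * 4**i."""
    return sum(d * 4**i for i, d in enumerate(digits))


if __name__ == "__main__":
    digits = booth_radix4_digits(14, 6)
    print(digits, radix4_value(digits))
```

For a signed 6-bit multiplier, this yields three digits instead of six, each drawn from the set −2, −1, 0, 1, 2.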

3. Proposed Radix-4 Booth Encoding Module

3.1. Encoder

The radix-4 Booth encoding module is illustrated in Figure 1a, with its generated signals presented in Table 2. The signal neg_i indicates a negative multiple and is taken directly from the signal Y_2i+1. The signal Y_iS represents a multiple of 1 and is generated by a CMOS-based XOR gate, with its inverted signal Y_iSI produced through an inverter. The signal Y_iC represents a multiple of 2 and is generated by a 3-3 OAI (OR-AND-Invert) structure, with its inverted signal Y_iCI also produced through an inverter. The signal zero_i represents a multiple of 0 and is also generated by a 3-3 OAI structure. The signal S_i is a correction bit required by the modified sign extension technique described in Ref. [9] and is used in the adder array. It is generated by a 2-1 AOI (AND-OR-Invert) structure, with its inverted signal S_iI produced through an inverter.
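The encoder outputs can be modeled at the logic level. The Boolean expressions in the following sketch were read off Table 2 and are an assumption; they are not necessarily the gate decomposition (XOR, OAI, AOI) used in Figure 1. The function and argument names are illustrative.

```python
def booth_encoder(y_hi: int, y_mid: int, y_lo: int) -> dict[str, int]:
    """Logic-level model of one encoder group.

    y_hi, y_mid, y_lo stand for Y_2i+1, Y_2i, Y_2i-1 (each 0 or 1).
    """
    neg = y_hi                                   # sign taken directly from Y_2i+1
    y_s = y_mid ^ y_lo                           # multiple of 1
    y_c = ((1 - y_hi) & y_mid & y_lo) | (y_hi & (1 - y_mid) & (1 - y_lo))  # multiple of 2
    zero = int(y_hi == y_mid == y_lo)            # multiple of 0
    s = y_hi & (1 - (y_mid & y_lo))              # correction bit: negative and nonzero
    return {"neg": neg, "Y_iC": y_c, "Y_iS": y_s, "zero": zero, "S": s}


if __name__ == "__main__":
    for code in range(8):
        bits = ((code >> 2) & 1, (code >> 1) & 1, code & 1)
        print(bits, booth_encoder(*bits))
```

Running the loop prints one line per input combination, which can be compared row by row with Table 2.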

3.2. Decoder

The transistor-level implementation of the demux-type 4-bit radix-4 Booth decoder circuit is illustrated in Figure 2a. Each stage of the demux is controlled by transmission gates (TGs), which are driven by the Y_iS and Y_iC signals. The first stage of the decoder circuit, illustrated in Figure 2b, demonstrates the basic principle. As Table 2 shows, the signals Y_iS, Y_iC, and zero_i are mutually exclusive: exactly one of them is 1 for any input combination. When Y_iS = 1, the TG controlled by Y_iS is turned on while the TG controlled by Y_iC remains off, and the signal NX_j is passed to PP_ij. When Y_iC = 1, the TG controlled by Y_iS remains off, while the TG controlled by Y_iC and the NMOS connected to PP_ij turn on; the signal NX_j is passed to PP_ij+1, and the NMOS outputs 0 to PP_ij. When zero_i = 1, both TGs remain off, and the NMOS transistors controlled by zero_i turn on, outputting 0 to all PP outputs. The demux-type decoder used to output the correction bit S_i is illustrated in Figure 2c. Its operation is the same, except for an additional NMOS at out2 that is controlled by the signal Y_iS. When Y_iS = 1, this NMOS conducts and outputs 0 to out2.
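The routing behavior described above can be summarized in a small behavioral model. This is only a sketch of the text, not of the transistor netlist in Figure 2. It assumes that the lowest output receives 0 when Y_iC = 1, that the extra top output is 0 when Y_iS = 1 (the text does not specify this), and it ignores sign extension and the correction-bit path of Figure 2c. All names are illustrative.

```python
def booth_decode_row(nx: list[int], y_s: int, y_c: int, zero: int) -> list[int]:
    """Behavioral model of the demux-type decoder for one partial product row.

    nx   : conditionally inverted multiplicand bits NX_j, LSB first
    y_s  : select "times 1"  (NX_j routed to PP_ij)
    y_c  : select "times 2"  (NX_j routed to PP_i,j+1, lowest output forced to 0)
    zero : select "times 0"  (all outputs forced to 0)
    Returns the PP bits, LSB first, one bit longer than nx.
    """
    if y_s + y_c + zero != 1:
        raise ValueError("exactly one of y_s, y_c, zero must be 1")
    n = len(nx)
    pp = [0] * (n + 1)          # assumption: outputs not driven by NX are 0
    if y_s:
        pp[:n] = nx             # pass-through path
    elif y_c:
        pp[1:] = nx             # shifted path; pp[0] stays 0
    return pp                   # zero case: all outputs remain 0


if __name__ == "__main__":
    nx = [1, 0, 1, 1]  # LSB first
    print(booth_decode_row(nx, 1, 0, 0))
    print(booth_decode_row(nx, 0, 1, 0))
    print(booth_decode_row(nx, 0, 0, 1))
```

The mutual exclusivity of the three select signals is what allows a demux built from pass elements to work without contention, so the model rejects any other combination.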

4. Circuit Improvement and Simulation Results

4.1. OAI_zero_i and OAIT_zero_i

The OAI structure used to generate the signal zero_i is illustrated in Figure 3a. Because its two delay values were imbalanced (80.8 ps and 120 ps in Table 3), the NMOS part was modified, resulting in the architecture illustrated in Figure 3b. As shown in Table 3, the delay of the modified architecture is more balanced (91.1 ps and 86.4 ps), and both values are close to the better of the two delays of the original structure, at the cost of slightly higher average power.

4.2. X_j_2 and X_j2

In the decoder, the signal NX_j is obtained by performing an XOR operation between the multiplicand bit X_j and the signal Y_2i+1, which indicates a negative value in Booth encoding. When Y_2i+1 = 0, the value is positive, and NX_j is the same as X_j. Conversely, when Y_2i+1 = 1, the value is negative, and NX_j is the inverse of X_j. The corresponding truth table is shown in Table 4.
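The conditional inversion is a bitwise XOR with the sign signal. The following minimal sketch (the function name is hypothetical) reproduces the rows of Table 4 for the bit pair X_j+1, X_j.

```python
from itertools import product


def conditional_invert(x_bits: list[int], y_sign: int) -> list[int]:
    """Return NX bits: each X bit XORed with Y_2i+1 (0 keeps, 1 inverts)."""
    return [x ^ y_sign for x in x_bits]


if __name__ == "__main__":
    print("Y_2i+1 X_j+1 X_j -> NX_j+1 NX_j")
    for y_sign, x_hi, x_lo in product((0, 1), repeat=3):
        nx_hi, nx_lo = conditional_invert([x_hi, x_lo], y_sign)
        print(y_sign, x_hi, x_lo, "->", nx_hi, nx_lo)
```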

Figure 4a illustrates two CMOS-type XOR gates, which share the signal Y_2i+1 from the same Booth encoding group. To optimize the design, they were merged into a single structure by sharing MOS transistors controlled by the common signal, resulting in the architecture shown in Figure 4b. Table 5 presents a comparison of the original and revised architectures, showcasing the proposed design’s advantages in terms of reduced power consumption, lower delay, and a reduced number of transistors.

4.3. z_c_s, zc_s, and zcs

The original architecture for generating the Booth encoding signals Y_iC, Y_iS, and zero_i is illustrated in Figure 5a. In this design, many MOS transistors are controlled by the same signals. To optimize the design, we implemented a shared MOS approach. The combined architecture for generating the zero_i and Y_iC signals is shown in Figure 5b, while Figure 5c presents the fully integrated architecture, which also incorporates Y_iS signal generation. Table 6 provides a comparison of these three architectures. Although the fully merged design reduces the transistor count compared to the other two, it exhibits greater signal delay.
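The savings reported in Tables 5 and 6 are easier to compare as relative changes. The sketch below only restates the tabulated transistor counts (PMOS plus NMOS) and average power values as percentages; the helper name and dictionary layout are illustrative.

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Total transistor counts (PMOS + NMOS) taken from Tables 5 and 6
transistors = {"X_j_2": 22, "X_j2": 18, "z_c_s": 46, "zc_s": 38, "zcs": 34}
# Average power in microwatts taken from Tables 5 and 6
power_uw = {"X_j_2": 12.1443, "X_j2": 10.9709, "z_c_s": 23.8827, "zcs": 23.2156}

if __name__ == "__main__":
    pairs = [("X_j_2", "X_j2"), ("z_c_s", "zc_s"), ("z_c_s", "zcs")]
    for base, merged in pairs:
        pct = reduction_percent(transistors[base], transistors[merged])
        print(f"{base} -> {merged}: {pct:.1f}% fewer transistors")
    pct = reduction_percent(power_uw["X_j_2"], power_uw["X_j2"])
    print(f"X_j_2 -> X_j2: {pct:.1f}% lower average power")
```

On these numbers, the fully merged zcs design uses roughly a quarter fewer transistors than z_c_s, while its average power changes by only a few percent.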

5. Conclusions

We examined a circuit design for shared MOS transistors to reduce the transistor count. By sharing MOS transistors controlled by the same signals, different signal generation architectures are merged into a single design. The proposed architecture in this study maintains correct functionality. Although it does not yield significant improvements in power consumption or delay time, it successfully reduces the number of transistors used.

Author Contributions

Conceptualization, Y.-N.W. and Y.-C.H.; experiments and data validation, Y.-N.W.; writing—original draft preparation, Y.-N.W.; writing—review and editing, Y.-C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All simulation results were obtained using the HSPICE simulator, an electronic design automation (EDA) software tool for VLSI design.

Acknowledgments

We would like to express our gratitude to the United Microelectronics Corporation (UMC), Taiwan, and the Taiwan Semiconductor Research Institute (TSRI) for providing the EDA software that allows us to perform IC layout and simulations at our laboratory.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] MacSorley, O. High-speed arithmetic in binary computers. Proc. IRE 1961, 49, 67–91.
[2] Rubinfield, L.P. A proof of the modified Booth’s algorithm for multiplication. IEEE Trans. Comput. 1975, 24, 1014–1015.
[3] Cheng, Q.; Dai, L.; Huang, M.; Shen, A.; Mao, W.; Hashimoto, M.; Yu, H. A low-power sparse convolutional neural network accelerator with pre-encoding radix-4 Booth multiplier. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 2246–2250.
[4] Wang, H.; Liu, Y.; Han, J. The design of multipliers based on radix-4 Booth coding. In Proceedings of the 2022 4th International Academic Exchange Conference on Science and Technology Innovation (IAECST), Guangzhou, China, 9–11 December 2022; pp. 1471–1475.
[5] Cui, X.; Liu, W.; Chen, X.; Swartzlander, E.E.; Lombardi, F. A modified partial product generator for redundant binary multipliers. IEEE Trans. Comput. 2016, 65, 1165–1171.
[6] Park, J.; Kim, S.; Lee, Y.-S. A low-power Booth multiplier using novel data partition method. In Proceedings of the 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Fukuoka, Japan, 4–5 August 2004; pp. 54–57.
[7] Abu-Khater, I.S.; Bellaouar, A.; Elmasry, M.I. Circuit techniques for CMOS low-power high-performance multipliers. IEEE J. Solid-State Circuits 1996, 31, 1535–1546.
[8] Booth, A.D. A signed binary multiplication technique. Quart. J. Mech. Appl. Math. 1951, 4, 236–240.
[9] Ercegovac, M.D.; Lang, T. Digital Arithmetic; Morgan Kaufmann: Los Altos, CA, USA, 2003.

Figure 1. (a) The gate-level encoder generating the signals neg_i, Y_iC, Y_iS, zero_i, and S_i; (b) the transistor-level encoder for the Y_iS generator; (c) the transistor-level encoder for the Y_iC generator; (d) the transistor-level encoder for the S_i generator.

Figure 2. (a) The transistor-level implementation of the demux-type 4-bit radix-4 Booth decoder circuit; (b) the first stage of the decoder circuit; (c) the demux-type decoder used to output the correction bit S_i.

Figure 3. (a) OAI_zero_i; (b) OAIT_zero_i.

Figure 4. (a) X_j_2; (b) X_j2.

Figure 5. (a) z_c_s; (b) zc_s; (c) zcs.

Table 1. Coefficients of radix-4 Booth encoding.

Y_2i+1	Y_2i	Y_2i−1	PP Coef. ¹
0	0	0	$0 \times$ X_j
0	0	1	$+ 1 \times$ X_j
0	1	0	$+ 1 \times$ X_j
0	1	1	$+ 2 \times$ X_j
1	0	0	$- 2 \times$ X_j
1	0	1	$- 1 \times$ X_j
1	1	0	$- 1 \times$ X_j
1	1	1	$0 \times$ X_j

¹ Coefficient of the partial product.

Table 2. Coefficients of the radix-4 Booth encoding module.

Y_2i+1	Y_2i	Y_2i−1	PP Coef.	neg_i	Y_iC	Y_iS	zero_i	S_i
0	0	0	$0 \times$ X_j	0	0	0	1	0
0	0	1	$+ 1 \times$ X_j	0	0	1	0	0
0	1	0	$+ 1 \times$ X_j	0	0	1	0	0
0	1	1	$+ 2 \times$ X_j	0	1	0	0	0
1	0	0	$- 2 \times$ X_j	1	1	0	0	1
1	0	1	$- 1 \times$ X_j	1	0	1	0	1
1	1	0	$- 1 \times$ X_j	1	0	1	0	1
1	1	1	$0 \times$ X_j	1	0	0	1	0

Table 3. Simulation results of OAI_zero_i and OAIT_zero_i.

	OAI_zero_i (Figure 3a)	OAIT_zero_i (Figure 3b)
Avg. power (µW)	9.2957	9.4973
Delay (ps)	80.8/120	91.1/86.4

Table 4. Truth table of the generator for the signals NX_j and NX_j+1.

Y_2i+1	X_j+1	X_j	NX_j+1	NX_j
0	0	0	0	0
0	0	1	0	1
0	1	0	1	0
0	1	1	1	1
1	0	0	1	1
1	0	1	1	0
1	1	0	0	1
1	1	1	0	0

Table 5. Simulation results of X_j_2 and X_j2.

	Signal/Type	X_j_2 (Figure 4a)	X_j2 (Figure 4b)
Avg. power (µW)		12.1443	10.9709
Delay time (ps)	NX_j	66.4/32.7	60.7/28.4
Delay time (ps)	NX_j+1	66.3/32.8	45.2/39.2
Number of transistors	PMOS	11	9
Number of transistors	NMOS	11	9

Table 6. Simulation results of z_c_s, zc_s, and zcs.

	Signal/Type	z_c_s (Figure 5a)	zc_s (Figure 5b)	zcs (Figure 5c)
Avg. power (µW)		23.8827	23.0814	23.2156
Delay time (ps)	Y_iC	123/117	182/116	164/127
Delay time (ps)	Y_iCI	141/143	199/143	180/155
Delay time (ps)	Y_iS	88.5/115	79.5/105	123/158
Delay time (ps)	Y_iSI	107/142	98.8/131	95.3/141
Delay time (ps)	zero_i	105/86.4	102/144	113/137
Number of transistors	PMOS	23	19	17
Number of transistors	NMOS	23	19	17

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
