MDPI - Publisher of Open Access Journals

15 pages, 5053 KB

Open AccessArticle

Enhanced Dual Carry Approximate Adder with Error Reduction Unit for High-Performance Multiplier and In-Memory Computing

by Kaeun Lim, Jinhyun Kim, Eunsu Kim and Youngmin Kim

Electronics 2025, 14(9), 1702; https://doi.org/10.3390/electronics14091702 - 22 Apr 2025

Cited by 1 | Viewed by 1621

The Dual Carry Approximate Adder (DCAA) is proposed as an advanced 8-bit approximate adder featuring dual carry-out and carry-in full adders (FAs) along with an Error Reduction Unit (ERU) to enhance accuracy. The 8-bit adder is partitioned into upper and lower 4-bit blocks, [...] Read more.

The Dual Carry Approximate Adder (DCAA) is proposed as an advanced 8-bit approximate adder featuring dual carry-out and carry-in full adders (FAs) along with an Error Reduction Unit (ERU) to enhance accuracy. The 8-bit adder is partitioned into upper and lower 4-bit blocks, connected via a dual carry-out full adder and a dual carry-in full adder. To minimize impact on the critical path, an ERU is designed for efficient error correction. Four variants of the DCAA are provided, allowing users to select the most suitable design based on their specific power, area, and accuracy requirements. The DCAA achieves a 78% reduction in Mean Error Distance (MED) while maintaining high computational speed and efficiency. When applied to Wallace Tree multipliers, it reduces delay by 32% compared to ripple carry adders (RCAs), and in in-memory computing (IMC) architectures, it significantly improves accuracy with minimal delay overhead. Experimental results demonstrate that the DCAA offers a well-balanced trade-off between accuracy, speed, and resource efficiency, making it suitable for high-performance, error-tolerant applications. Compared to existing approximate adders, DCAA exhibits superior error correction capabilities while achieving significantly lower delay. Furthermore, its efficient hardware implementation enables seamless integration into various computing paradigms, including AI accelerators and neuromorphic processors. Additionally, the scalability of the design allows for flexible adaptation to different bit-widths, making it a versatile solution for next-generation computing architectures. Full article

(This article belongs to the Special Issue CMOS Integrated Circuits Design)

► Show Figures

Figure 1

18 pages, 3584 KB

Open AccessArticle

A New Carry Look-Ahead Adder Architecture Optimized for Speed and Energy

by Padmanabhan Balasubramanian and Douglas L. Maskell

Electronics 2024, 13(18), 3668; https://doi.org/10.3390/electronics13183668 - 15 Sep 2024

Cited by 5 | Viewed by 4985

Abstract

We introduce a new carry look-ahead adder (NCLA) architecture that employs non-uniform-size carry look-ahead adder (CLA) modules, in contrast to the conventional CLA (CCLA) architecture, which utilizes uniform-size CLA modules. We adopted two strategies for the implementation of the NCLA. Our novel approach [...] Read more.

We introduce a new carry look-ahead adder (NCLA) architecture that employs non-uniform-size carry look-ahead adder (CLA) modules, in contrast to the conventional CLA (CCLA) architecture, which utilizes uniform-size CLA modules. We adopted two strategies for the implementation of the NCLA. Our novel approach enables improved speed and energy efficiency for the NCLA architecture compared to the CCLA architecture without incurring significant area and power penalties. Various adders were implemented to demonstrate the advantages of NCLA, ranging from the slower ripple carry adder to the widely regarded fastest parallel-prefix adder viz. the Kogge–Stone adder, and their performance metrics were compared. The 32-bit addition was used as an example, with the adders implemented using a semi-custom design method and a 28 nm CMOS standard cell library. Synthesis results show that the NCLA architecture offers substantial improvements in design metrics compared to its high-speed counterparts. Specifically, an NCLA achieved (i) a 14.7% reduction in delay and a 13.4% reduction in energy compared to an optimized CCLA, while occupying slightly more area; (ii) a 42.1% reduction in delay and a 58.3% reduction in energy compared to a conditional sum adder, with an 8% increase in the area; (iii) a 14.7% reduction in delay and a 37.7% reduction in energy compared to an optimized carry select adder, while requiring 37% less area; and (iv) a 20.2% reduction in energy and a 55.4% reduction in area compared to the Kogge–Stone adder. Full article

► Show Figures

Figure 1

10 pages, 691 KB

Open AccessArticle

Bitwise Logical Operations in VCMA-MRAM

by Gulafshan Gulafshan, Selma Amara, Rajat Kumar, Danial Khan, Hossein Fariborzi and Yehia Massoud

Electronics 2022, 11(18), 2805; https://doi.org/10.3390/electronics11182805 - 6 Sep 2022

Cited by 8 | Viewed by 3084

Abstract

Today’s technology demands compact, portable, fast, and energy-efficient devices. One approach to making energy-efficient devices is an in-memory computation that addresses the memory bottleneck issues of the present computing system by utilizing a spintronic device viz. magnetic tunnel junction (MTJ). Further, area and [...] Read more.

Today’s technology demands compact, portable, fast, and energy-efficient devices. One approach to making energy-efficient devices is an in-memory computation that addresses the memory bottleneck issues of the present computing system by utilizing a spintronic device viz. magnetic tunnel junction (MTJ). Further, area and energy can be reduced through approximate computation. We present a circuit design based on the logic-in-memory computing paradigm on voltage-controlled magnetic anisotropy magnetoresistive random access memory (VCMA-MRAM). During the computation, multiple bit cells within the memory array are selected that are in parallel by activating multiple word lines. The designed circuit performs all logic operations-Read/NOT, AND/NAND, OR/NOR, and arithmetic SUM operation (1-bit approximate adder with 75% accuracy for SUM and accurate carry out) by slight modification using control signals. All the simulations have been performed at a 45 nm CMOS technology node with VCMA-MTJ compact model by using the HSPICE simulator. Simulation results show that the proposed circuit’s approximate adder consumes about 300% less energy and 2.3 times faster than its counterpart exact adder. Full article

(This article belongs to the Section Computer Science & Engineering)

► Show Figures

Figure 1

11 pages, 10992 KB

Open AccessArticle

High-Speed and Energy-Efficient Carry Look-Ahead Adder

by Padmanabhan Balasubramanian and Nikos E. Mastorakis

J. Low Power Electron. Appl. 2022, 12(3), 46; https://doi.org/10.3390/jlpea12030046 - 10 Aug 2022

Cited by 29 | Viewed by 9601

Abstract

The carry look-ahead adder (CLA) is well known among the family of high-speed adders. However, a conventional CLA is not faster than other high-speed adders such as a conditional sum adder (CSA), a carry-select adder (CSLA), and the Kogge–Stone adder (KSA), which is [...] Read more.

The carry look-ahead adder (CLA) is well known among the family of high-speed adders. However, a conventional CLA is not faster than other high-speed adders such as a conditional sum adder (CSA), a carry-select adder (CSLA), and the Kogge–Stone adder (KSA), which is the fastest parallel-prefix adder. Further, in terms of power-delay product (PDP) that characterizes the energy of digital circuits, the conventional CLA is not efficient compared to CSLA and KSA. In this context, this paper presents a high-speed and energy-efficient architecture for the CLA. Many adders ranging from ripple carry to parallel-prefix adders were implemented using a 32-28 nm CMOS standard digital cell library by considering a 32-bit addition. The adders were structurally described in Verilog and synthesized using Synopsys Design Compiler. From the results obtained, it is observed that the proposed CLA achieves a reduction in critical path delay by 55.3% and a reduction in PDP by 45% compared to the conventional CLA. Compared to the CSA, the proposed CLA achieves a reduction in critical path delay by 33.9%, a reduction in power by 26.1%, and a reduction in PDP by 51.1%. Compared to an optimized CSLA, the proposed CLA achieves a reduction in power by 35.4%, a reduction in area by 37.3%, and a reduction in PDP by 37.1% without sacrificing the speed. Although the KSA is faster, the proposed CLA achieves a reduction in power by 39.6%, a reduction in PDP by 6.5%, and a reduction in area by 55.6% in comparison. Full article

► Show Figures

Figure 1

21 pages, 5633 KB

Open AccessArticle

Performance Metric Evaluation of Error-Tolerant Adders for 2D Image Blending

by Tanya Mendez, Subramanya G. Nayak, Vasanth Kumar P., Vijay S. R. and Vishnumurthy Kedlaya K.

Electronics 2022, 11(15), 2461; https://doi.org/10.3390/electronics11152461 - 8 Aug 2022

Cited by 10 | Viewed by 3128

Abstract

The hardware implementation of error-tolerant adders using the paradigm of approximate computing has considerably influenced the performance metrics, especially in applications that can compromise accuracy. The foundation for approximate processing is the inclusion of errors in the design to enhance the effectiveness and [...] Read more.

The hardware implementation of error-tolerant adders using the paradigm of approximate computing has considerably influenced the performance metrics, especially in applications that can compromise accuracy. The foundation for approximate processing is the inclusion of errors in the design to enhance the effectiveness and reduce the complexity. This work presents three base adders using the novel concept of error tolerance in digital VLSI design. The research is extended to construct nine variants of power and delay-efficient 16 and 32-bit error-tolerant carry select adders (CSLA). To attain optimization in power and delay, conventional CSLA is refined by substituting ripple carry adders (RCA) with the newly proposed selector unit to minimize the switching activity. The research work includes the power, area, and delay estimates of the design from synthesis using the gpdk-90 nm and gpdk-45 nm standard cell libraries. The proposed adders exhibit reduced delay, power dissipation, area, power delay product (PDP), energy delay product (EDP), and area delay product (ADP) compared to the existing approximate adders. The proposed adder is used in an image blending application. There is a significant improvement in the peak-signal-to-noise ratio (PSNR) in the blended image compared to the standard designs. Full article

(This article belongs to the Special Issue VLSI Circuits & Systems Design)

► Show Figures

Figure 1

4 pages, 395 KB

Open AccessProceeding Paper

Investigation on Performance of Single Precision Floating Point Multiplier (SPFPM) Using CSA Multiplier and Different Types of Adders

by Hasaan Amjad, Zeeshan Ahmad, Muneeb Abrar and Hina Rasheed

Eng. Proc. 2021, 12(1), 107; https://doi.org/10.3390/engproc2021012107 - 22 Mar 2022

Cited by 2 | Viewed by 2478

Abstract

Nowadays, floating point multiplier (FPM) plays an essential role in computers. The IEEE 754 norm for floating point numbers is the most widely recognized portrayal for real numbers on today’s PCs. Addition, multiplication, subtraction, and division are the four important functions of single [...] Read more.

Nowadays, floating point multiplier (FPM) plays an essential role in computers. The IEEE 754 norm for floating point numbers is the most widely recognized portrayal for real numbers on today’s PCs. Addition, multiplication, subtraction, and division are the four important functions of single precision floating arithmetic, amongst which multiplication has the most extensive use in every algorithm. Fast multipliers are of critical need in modern high-performance applications, especially in digital signal processing, because DSP involves many important multiplication-based operations, e.g., fast Fourier transform (FFT) and convolution. These speedy computations can be implemented on field programmable gate arrays (FPGAs), because they can provide a high speed and a large number of on-board digital resources. FPGAs are involved in many modern applications such as cryptography and communication computations, arithmetic and scientific computation, digital image and signal processing, etc. There are many forms of FPM available. This paper describes an efficient way to implement single precision FPM in IEEE 754 standard format, where Verilog hardware description language (VHDL) is used to implement the design for Xilinx Spartan 6 FPGA. Here, the 32-bit number will be divided into three parts: sign bit, exponent, and mantissa. This paper is implemented by using different types of adders, which includes carry increment adder (CIA), carry select adder (CSA), ripple carry adder (RCA), and carry look-ahead adder (CLA). Carry save array (CSA) multiplication is used for performing the mantissa multiplication. Full article

(This article belongs to the Proceedings of The 1st International Conference on Energy, Power and Environment)

► Show Figures

Figure 1

10 pages, 3884 KB

Open AccessArticle

An Energy and Area Efficient Carry Select Adder with Dual Carry Adder Cell

by Heng You, Jia Yuan, Weidi Tang and Shushan Qiao

Electronics 2019, 8(10), 1129; https://doi.org/10.3390/electronics8101129 - 7 Oct 2019

Cited by 18 | Viewed by 4431

Abstract

In this paper, an energy and area efficient carry select adder (CSLA) is proposed. To minimize the redundant logic operation of a regular CSLA, a dual carry adder cell is proposed. The proposed dual carry adder is composed of an XOR/XNOR cell and [...] Read more.

In this paper, an energy and area efficient carry select adder (CSLA) is proposed. To minimize the redundant logic operation of a regular CSLA, a dual carry adder cell is proposed. The proposed dual carry adder is composed of an XOR/XNOR cell and two pairs of sum-carry cells. Both CMOS logic and a transmission gate were applied to the dual carry adder cell to achieve fast and energy efficient operation. Eight-bit, 16b, and 32b square-root (SQRT) CSLAs based on the proposed dual carry adder were developed. The post-layout simulation based on a SMIC 55 nm process demonstrated that the proposed CSLAs reduced power consumption by 68.4–72.2% with a slight delay increase for different bit widths. As the dual carry adder had much fewer transistors than the two regular full adders, the area of the proposed CSLAs was reduced by 45.8–51.1%. The area-power-delay product of the proposed CSLA improved 5.1×–6.73× compared with the regular CSLA. Full article

(This article belongs to the Section Microelectronics)

► Show Figures

Figure 1

12 pages, 1835 KB

Open AccessCommunication

Performance Comparison of Carry-Lookahead and Carry-Select Adders Based on Accurate and Approximate Additions

by Padmanabhan Balasubramanian and Nikos Mastorakis

Electronics 2018, 7(12), 369; https://doi.org/10.3390/electronics7120369 - 2 Dec 2018

Cited by 38 | Viewed by 8142

Abstract

Addition is a fundamental operation in microprocessing and digital signal processing hardware, which is physically realized using an adder. The carry-lookahead adder (CLA) and the carry-select adder (CSLA) are two popular high-speed, low-power adder architectures. The speed performance of a CLA architecture can [...] Read more.

Addition is a fundamental operation in microprocessing and digital signal processing hardware, which is physically realized using an adder. The carry-lookahead adder (CLA) and the carry-select adder (CSLA) are two popular high-speed, low-power adder architectures. The speed performance of a CLA architecture can be improved by adopting a hybrid CLA architecture which employs a small-size ripple-carry adder (RCA) to replace a sub-CLA in the least significant bit positions. On the other hand, the power dissipation of a CSLA employing full adders and 2:1 multiplexers can be reduced by utilizing binary-to-excess-1 code (BEC) converters. In the literature, the designs of many CLAs and CSLAs were described separately. It would be useful to have a direct comparison of their performances based on the design metrics. Hence, we implemented homogeneous and hybrid CLAs, and CSLAs with and without the BEC converters by considering 32-bit accurate and approximate additions to facilitate a comparison. For the gate-level implementations, we considered a 32/28 nm complementary metal-oxide-semiconductor (CMOS) process targeting a typical-case process–voltage–temperature (PVT) specification. The results show that the hybrid CLA/RCA architecture is preferable among the CLA and CSLA architectures from the speed and power perspectives to perform accurate and approximate additions. Full article

(This article belongs to the Section Computer Science & Engineering)

► Show Figures

Figure 1

21 pages, 4094 KB

Open AccessArticle

Low Power Robust Early Output Asynchronous Block Carry Lookahead Adder with Redundant Carry Logic

by Padmanabhan Balasubramanian, Douglas Maskell and Nikos Mastorakis

Electronics 2018, 7(10), 243; https://doi.org/10.3390/electronics7100243 - 9 Oct 2018

Cited by 10 | Viewed by 3759

Abstract

Adder is an important datapath unit of a general-purpose microprocessor or a digital signal processor. In the nanoelectronics era, the design of an adder that is modular and which can withstand variations in process, voltage and temperature are of interest. In this context, [...] Read more.

Adder is an important datapath unit of a general-purpose microprocessor or a digital signal processor. In the nanoelectronics era, the design of an adder that is modular and which can withstand variations in process, voltage and temperature are of interest. In this context, this article presents a new robust early output asynchronous block carry lookahead adder (BCLA) with redundant carry logic (BCLARC) that has a reduced power-cycle time product (PCTP) and is a low power design. The proposed asynchronous BCLARC is implemented using the delay-insensitive dual-rail code and adheres to the 4-phase return-to-zero (RTZ) and the 4-phase return-to-one (RTO) handshaking. Many existing asynchronous ripple-carry adders (RCAs), carry lookahead adders (CLAs) and carry select adders (CSLAs) were implemented alongside to perform a comparison based on a 32/28 nm complementary metal-oxide-semiconductor (CMOS) technology. The 32-bit addition was considered for an example. For implementation using the delay-insensitive dual-rail code and subject to the 4-phase RTZ handshaking (4-phase RTO handshaking), the proposed BCLARC which is robust and of early output type achieves: (i) 8% (5.7%) reduction in PCTP compared to the optimum RCA, (ii) 14.9% (15.5%) reduction in PCTP compared to the optimum BCLARC, and (iii) 26% (25.5%) reduction in PCTP compared to the optimum CSLA. Full article

(This article belongs to the Section Microelectronics)

► Show Figures

Figure 1

Search Results (9)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Saved Queries

Search Filter Reset All

Years

Feature Papers

Subjects

Journals

Article Types

Countries / Regions

Search Results (9)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI