Hardware Optimized and Error Reduced Approximate Adder

: This paper presents a new hardware optimized and error reduced approximate adder (HOERAA), which is suitable for ﬁeld programmable gate array (FPGA)- and application speciﬁc integrated circuit (ASIC)-based implementations. In this work, we consider a FPGA-based implementation using Xilinx Vivado 2018.3, targeting an Artix-7 FPGA. The ASIC-based realizations are based on a 32 / 28nm complementary metal oxide semiconductor (CMOS) process. Based on FPGA implementations, we note the following: (i) For 32-bit addition involving a 8-bit least signiﬁcant inaccurate sub-adder, HOERAA requires 22% fewer look-up tables (LUTs) and 18.6% fewer registers while reducing the minimum clock period by 7.1% and reducing the power-delay product (PDP) by 14.7%, compared to the native accurate FPGA adder, and (ii) for 64-bit addition involving a 8-bit least signiﬁcant inaccurate sub-adder, HOERAA requires 11% fewer LUTs and 9.3% fewer registers while reducing the minimum clock period by 8.3% and reducing the PDP by 9.3%, compared to the native accurate FPGA adder. Based on ASIC-style implementations, HOERAA is found to achieve the following reductions in design metrics compared to an optimum accurate carry-lookahead adder: (i) A 15.7% reduction in critical path delay, a 21.4% reduction in area, and a 35% reduction in PDP for 32-bit addition involving a 8-bit least signiﬁcant inaccurate sub-adder, and (ii) a 15.3% reduction in critical path delay, a 10.7% reduction in area, and a 20% reduction in PDP for 64-bit addition involving a 8-bit least signiﬁcant inaccurate sub-adder. Moreover, comparisons with other approximate adders show that HOERAA has a signiﬁcantly reduced average error, mean average error, and root mean square error, while reporting near optimum design metrics.


Introduction
Approximate computing is emerging as a viable low power alternative to conventional accurate computing [1], especially for practical digital signal processing applications which underlie modern electronics, computer, and communication engineering. Whether it be big data analytics [2], software engineering [3], neuromorphic computing [4], hardware realization of deep neural networks for machine learning and artificial intelligence [5], memory systems for multicore processors [6,7], low power graphics processing units [8], and ultra-low power electronic design involving sub-threshold operation of devices [9], approximate computing is being resorted to in the quest for achieving greater efficiency in computing [10]. Approximate computing takes advantage of the inherent error resilience of practical multimedia applications [11].
Approximate computing spans both hardware and software, and in this work, the focus is on the design of approximate hardware. With respect to approximate hardware, the research focus has been A and B represent the adder inputs in Figure 1, while SUM denotes the sum output, which includes a carry overflow bit. The subscripts associated with the adder inputs and outputs denote the corresponding bit positions. The (N-K) sum bits of the accurate sub-adder are combined with the K sum bits of the inaccurate sub-adder to produce the N sum bits of approximate adders, which includes the carry overflow.
We shall use some legends for the approximate adders for the ease of referencing; some of the legends are defined by the authors in their respective works while the remaining are defined by us in this work for referencing.
The accurate adder is shown in Figure 1a. The approximate adder of Reference [19], called the lower-part OR adder (LOA), is shown in Figure 1b. In the LOA, the K input bit-pairs of the inaccurate sub-adder are respectively OR-ed to produce the K sum bits. The most significant bit-pair of the inaccurate sub-adder viz. A K-1 and B K-1 is alone AND-ed and given as the carry input for the accurate sub-adder.
The approximate adder of Reference [20] shall be referred to as LOAWA (i.e., LOA without the 2-input AND function). LOAWA is shown in Figure 1c, which is nearly identical to LOA of Reference [19]; the only exception being that the carry input of the accurate sub-adder is set to a constant 0.
The approximate adder of [18], which uses the type 5 approximate full adder for realizing the inaccurate sub-adder is shown in Figure 1d. This approximate adder shall be referred to as APPROX5. The sum bits of the inaccurate sub-adder are the same as one set of the corresponding input bits, i.e., SUM i = B i , where i denotes a bit position. The carry input for the accurate sub-adder is a most significant input bit belonging to the inaccurate sub-adder viz. A K-1 .
The hardware-efficient approximate adder presented in Reference [27] shall be called HEAA, which is shown in Figure 1e. We observe that HEAA consumes less FPGA resources compared to the accurate FPGA adder and consumes either the same FPGA resources as that of some approximate adders or more FPGA resources in comparison with other approximate adders. HEAA is almost structurally similar to LOA; however, there exists a difference. A 2-to-1 multiplexer (MUX21) is used in the inaccurate sub-adder in addition to the 2-input OR functions. The OR-ed output of the most significant input bit-pair in the inaccurate sub-adder is given to the 0-input of MUX21, and a constant 0 is given to the 1-input of MUX21. The AND of the most significant input bit-pair corresponding to the inaccurate sub-adder serves as the select input of MUX21, besides serving as the carry input for the accurate sub-adder.
An optimized version of LOA presented in Reference [21], called OLOCA, is shown in Figure 1f. OLOCA is in fact a slight modification of LOA. The accurate sub-adder and the sum output bits corresponding to the two most significant bit positions in the inaccurate sub-adder of OLOCA are the same as that of LOA. However, the remaining (K-2) less significant sum output bits in the inaccurate sub-adder of OLOCA are tied to a constant 1. In the LOA, however, all the K sum output bits are produced by OR-ing the respective input bit-pairs. A constant 1 for the (K-2) sum bits in OLOCA implies the elimination of look-up tables (LUTs) and flip-flops (FFs) for realizing those bits of the inaccurate sub-adder for a FPGA-based implementation. Hence, OLOCA is likely to consume less FPGA resources compared to the accurate FPGA adder, LOA, LOAWA, APPROX5, and HEAA, which is substantiated by the simulation results given in the next section.
Our new hardware optimized and error reduced approximate adder (HOERAA) is portrayed by Figure 1g. Just like in OLOCA, the (K-2) less significant sum output bits in the inaccurate sub-adder of HOERAA are set to a constant 1 (i.e., V dd ). The accurate sub-adder of LOA, LOAWA, APPROX5, HEAA, OLOCA, and HOERAA are the same. Given these, HOERAA would potentially consume less FPGA resources for physical implementation, like OLOCA. However, there is some custom logic introduced in HOERAA, especially with respect to the most significant bit position in the inaccurate sub-adder. A MUX21 is used in the inaccurate sub-adder in addition to the 2-input OR functions. The OR-ed output of the most significant input bit-pair in the inaccurate sub-adder, i.e., (A K-1 | B K-1 ) is given to the 0-input of MUX21, while the logical AND of A K-2 and B K-2 , (i.e., A K-2 & B K-2 ) is given to the 1-input of MUX21. The AND of the most significant input bit-pair in the inaccurate sub-adder (i.e., A K-1 & B K-1 ) serves as the select input of MUX21, besides serving as the carry input for the accurate

Implementation Results and Error Characteristics
We first present the FPGA-based implementation results for the accurate and approximate adders. This is followed by a presentation of the ASIC-based implementation results. Then, the error characteristics of the approximate adders are discussed.

FPGA-Based Implementation Results
Accurate and approximate 32-bit and 64-bit adders, whose architectures were shown in Figure 1, were coded in Verilog HDL in a behavioral style and synthesized and implemented on an Artix-7 FPGA (part: xc7a100tcsg324-3) using Xilinx Vivado 2018.3. All the adders were implemented with a pair of registers on the adder inputs and a register following the adder output. These registers are not considered part of the adder hardware and are only included to isolate the adders from the physical inputs and outputs so that the input-output (I/O) delay and I/O routing delay do not influence the adder speed. Placing a circuit between registers to determine its speed is normal practice in FPGA design.
The approximate adders were implemented assuming K = 4 and K = 8. With K = 4, 4-bits are allotted to the least significant inaccurate sub-adder and the remaining bits (28 bits in the case of 32-bit addition and 60 bits in the case of 64-bit addition) are allotted to the more significant accurate sub-adder. Hence, K = 4 implies a 28-4 input partition with respect to the 32-bit approximate adder and a 60-4 input partition in the case of the 64-bit approximate adder. With K = 8, 8-bits are allotted to the least significant inaccurate sub-adder and the remaining bits (24 bits in the case of 32-bit addition and 56 bits in the case of 64-bit addition) are allotted to the more significant accurate sub-adder. Hence, K = 8 implies a 24-8 input partition with respect to the 32-bit approximate adder and a 56-8 input partition in the case of the 64-bit approximate adder.
The choice of 8-bits for the inaccurate sub-adders is based on the premise that, for practical digital image processing and video encoding applications, the approximation size is recommended to be confined to the range of 7 to 9 bits in [18,28]. Nevertheless, the number of bits to be allotted to the inaccurate sub-adders can be decided commensurate with a target application depending upon its error resilience. The consideration of 4-bits for the inaccurate sub-adders here is mainly meant to illustrate how the design metrics of the approximate adders become better optimized when the approximation size is increased (from 4-bits to 8-bits) in the inaccurate sub-adders.
For the FPGA implementations, the synthesis strategy was set to Flow_AreaOptimized_high and the default implementation strategy was used. The adders were successfully synthesized and placed and routed. The timing summaries were analyzed to ensure that all the adders have a positive timing slack post-place and route. The implementation results for 32-and 64-bit additions covering various FPGA design metrics, such as minimum clock period, maximum operating frequency, total on-chip power consumption, number of LUTs utilized, and the number of FFs utilized as registers, are given in Table 1. The default total on-chip power consumption estimates provided by Vivado are also given in Table 1, which includes the power consumption of clocks, signals, logic, and I/O. It was noted that the variation in the total power reflects the variation in the adder logic power. An important general inference drawn from Table 1 is that whether it is a 32-bit addition or a 64-bit addition, as the approximation size is increased (from 4-bits to 8-bits), the size of the inaccurate sub-adder will increase and the size of the accurate sub-adder will decrease, and hence the speed (i.e., maximum frequency) of the approximate adders will increase while their power consumption and resource utilization will simultaneously decrease, compared to the corresponding accurate adders. For example, considering the 32-bit addition, the proposed approximate adder (HOERAA) reports a 3% increase in speed for a 28-4 input partition and a 7.7% improvement in speed for a 24-8 input partition compared to the accurate FPGA adder. Further, HOERAA consumes 3 LUTs and 6 FFs less compared to the accurate 32-bit FPGA adder for a 28-4 input partition and consumes 7 LUTs and 18 FFs less for a 24-8 input partition. Moreover, HOERAA consumes 3.3% less on-chip power for a 28-4 input partition and 8.1% less on-chip power for a 24-8 input partition compared to the accurate FPGA adder. These demonstrate the trade-off between accuracy and optimization of the design metrics. In general, the design metrics of the approximate adders are better optimized as the approximation size is increased.
The FPGA FF resources exactly reflect the latched inputs and outputs used by the various adders. For example, for the 32-bit adders, with a 24-8 input partition, LOA, LOAWA, and HEAA requires two sets of 32-bit inputs and one 33-bit output, resulting in 97 FFs. APPROX5 does not use input bits A 2 to A 0 for K = 4 and input bits A 6 to A 0 for K = 8, saving 3 FFs and 7 FFs, respectively, while OLOCA and HOERAA do not use the lower 2-bits of A, B, and SUM for K = 4 and the lower 6-bits of A, B, and SUM for K = 8, resulting in the savings of 6 FFs and 18 FFs, respectively. However, these FFs are not part of the adder structure and are just reported here for completeness.
It is seen from Table 1 that APPROX5 consumes fewer LUTs compared to the rest of the adders for 32-and 64-bit additions. HOERAA and OLOCA consume just one more FPGA LUT resource compared to APPROX5 for 32-and 64-bit additions. However, APPROX5, with a 28-4 input partition, consumes 3 FPGA FFs more compared to OLOCA and HOERAA for 32-bit addition, and 15 FPGA FFs more compared to OLOCA and HOERAA with a 24-8 input partition. In the case of 64-bit addition, a similar trend is observed for a comparison between APPROX5 and OLOCA and HOERAA, with respect to the utilization of FPGA LUT and FF resources.
In terms of the maximum frequency, different approximate adders report higher speeds for different addition sizes based on the extent of approximation. For example, in the case of 32-bit addition, APPROX5 reports a higher speed compared to the accurate and other approximate adders for both 28-4 and 24-8 input partitions. On the other hand, in the case of 64-bit addition, LOAWA reports a higher speed compared to the accurate and other approximate adders for a 60-4 input partition, while OLOCA and HOERAA report higher speed than the accurate and other approximate adders for a 56-8 input partition.
The power-delay products (PDPs) of accurate and approximate adders were calculated by multiplying the minimum clock period with the total on-chip power. The PDPs of the adders were subsequently normalized. To perform the normalization, the maximum PDP (corresponding to the accurate FPGA adder) was considered as the reference and this PDP was used to divide the PDPs of all the approximate adders corresponding to a particular addition operation. It is desirable for power and delay to be at a minimum for a digital circuit and, hence, the PDP is also desirable to be at a minimum. Hence, the least value of PDP corresponding to an adder is reflective of a low power/energy-efficient design. The normalized PDPs of FPGA-based accurate and approximate adders, with respect to 32and 64-bit additions, are plotted in Figure 2a,b respectively. resource utilization will simultaneously decrease, compared to the corresponding accurate adders. For example, considering the 32-bit addition, the proposed approximate adder (HOERAA) reports a 3% increase in speed for a 28-4 input partition and a 7.7% improvement in speed for a 24-8 input partition compared to the accurate FPGA adder. Further, HOERAA consumes 3 LUTs and 6 FFs less compared to the accurate 32-bit FPGA adder for a 28-4 input partition and consumes 7 LUTs and 18 FFs less for a 24-8 input partition. Moreover, HOERAA consumes 3.3% less on-chip power for a 28-4 input partition and 8.1% less on-chip power for a 24-8 input partition compared to the accurate FPGA adder. These demonstrate the trade-off between accuracy and optimization of the design metrics. In general, the design metrics of the approximate adders are better optimized as the approximation size is increased.
The FPGA FF resources exactly reflect the latched inputs and outputs used by the various adders. For example, for the 32-bit adders, with a 24-8 input partition, LOA, LOAWA, and HEAA requires two sets of 32-bit inputs and one 33-bit output, resulting in 97 FFs. APPROX5 does not use input bits A2 to A0 for K = 4 and input bits A6 to A0 for K = 8, saving 3 FFs and 7 FFs, respectively, while OLOCA and HOERAA do not use the lower 2-bits of A, B, and SUM for K = 4 and the lower 6bits of A, B, and SUM for K = 8, resulting in the savings of 6 FFs and 18 FFs, respectively. However, these FFs are not part of the adder structure and are just reported here for completeness.
It is seen from Table 1 that APPROX5 consumes fewer LUTs compared to the rest of the adders for 32-and 64-bit additions. HOERAA and OLOCA consume just one more FPGA LUT resource compared to APPROX5 for 32-and 64-bit additions. However, APPROX5, with a 28-4 input partition, consumes 3 FPGA FFs more compared to OLOCA and HOERAA for 32-bit addition, and 15 FPGA FFs more compared to OLOCA and HOERAA with a 24-8 input partition. In the case of 64-bit addition, a similar trend is observed for a comparison between APPROX5 and OLOCA and HOERAA, with respect to the utilization of FPGA LUT and FF resources.
In terms of the maximum frequency, different approximate adders report higher speeds for different addition sizes based on the extent of approximation. For example, in the case of 32-bit addition, APPROX5 reports a higher speed compared to the accurate and other approximate adders for both 28-4 and 24-8 input partitions. On the other hand, in the case of 64-bit addition, LOAWA reports a higher speed compared to the accurate and other approximate adders for a 60-4 input partition, while OLOCA and HOERAA report higher speed than the accurate and other approximate adders for a 56-8 input partition.
The power-delay products (PDPs) of accurate and approximate adders were calculated by multiplying the minimum clock period with the total on-chip power. The PDPs of the adders were subsequently normalized. To perform the normalization, the maximum PDP (corresponding to the accurate FPGA adder) was considered as the reference and this PDP was used to divide the PDPs of all the approximate adders corresponding to a particular addition operation. It is desirable for power and delay to be at a minimum for a digital circuit and, hence, the PDP is also desirable to be at a minimum. Hence, the least value of PDP corresponding to an adder is reflective of a low power/energy-efficient design. The normalized PDPs of FPGA-based accurate and approximate adders, with respect to 32-and 64-bit additions, are plotted in Figure 2a,b respectively.  least PDP for a 60-4 input partition while OLOCA and HOERAA achieve the least PDP for a 56-8 input partition. Compared to the accurate 32-bit FPGA adder, HOERAA reports a 6.1% reduction in energy for a 28-4 input partition and a 14.7% reduction in energy for a 24-8 input partition. On the other hand, in comparison with the accurate 64-bit FPGA adder, HOERAA reports a 3% reduction in energy with a 60-4 input partition and a 9.3% reduction in energy with a 56-8 input partition. Further, HOERAA achieves better energy efficiency compared to the accurate FPGA adder while consuming less FPGA resources (LUTs and FFs).
The percentage of energy reduction achieved is relatively less for 64-bit addition compared to 32-bit addition. This is because the size of the accurate sub-adder is greater in the case of the former compared to the latter. Hence, another general inference is that a relatively small approximate adder will exhibit an enhanced design optimization compared to a larger approximate adder for the same approximation. This is because the ratio of bits allotted to the accurate sub-adder vis-à-vis the inaccurate sub-adder would be greater in a larger adder (for example, 15:1 in a 64-bit approximate adder with K = 4) compared to a smaller adder (for example, 7:1 in a 32-bit approximate adder with K = 4). The design metrics corresponding to the accurate sub-adder would dominate the design metrics of the approximate adder when increasing the size of the accurate sub-adder while keeping the size of the inaccurate sub-adder as a constant. Thus, as the size of the accurate sub-adder in an approximate adder increases, the savings in design metrics obtained through approximation are reduced.

ASIC-Based Implementation Results
For an ASIC-based implementation, a 32/28nm CMOS-based standard digital cell library [29] was considered for the implementations. An efficient accurate carry-lookahead adder (CLA) recently presented in Reference [30] was considered. The new CLA of Reference [30] is called the factorized recursive CLA, i.e., FRCLA, which is used to realize the accurate adder of Figure 1a. The approximate adders shown in Figure 1b-g were implemented with the FRCLA used to realize the accurate sub-adders. The accurate and approximate adders were implemented using a high-V t 32/28 nm CMOS technology. The simulation environment corresponds to a typical-case specification of the standard digital cell library with a recommended supply voltage of 1.05 V and an operating junction temperature of 25 • C. The simulation set-up was maintained the same as in Reference [29]. The critical path delay, silicon area, and average power dissipation of the adders were estimated using Synopsys tools. To estimate the average power dissipation, about 1000 random input vectors were identically supplied to the adders at time intervals of 5 ns (200 MHz), as done in Reference [29]. The switching activities captured through the functional simulations were subsequently used to estimate the average power dissipation. The critical path delays and area occupancies were estimated with default wire loads included while performing the simulations. The time-based power analysis mode of Synopsys PrimeTime was invoked to estimate the average power dissipation. The estimated ASIC-based design metrics are given in Table 2. Table 2. ASIC-based design metrics corresponding to accurate and approximate 32-bit and 64-bit adders estimated using a 32/28nm CMOS process. Referring to Table 2, it is seen that the critical path delays of the approximate adders corresponding to a particular addition are almost the same for a similar approximation. For example, all the 32-bit approximate adders with a 24-8 input partition have the same critical path delay of 0.96ns. This is because the critical path delays of the approximate adders are underpinned by the critical path delays of the accurate sub-adders, which are realized using the FRCLA architecture. In terms of area, LOAWA and APPROX5 consume less silicon compared to the accurate and other approximate adders for 32and 64-bit additions. OLOCA dissipates less power compared to the accurate and other approximate adders for 32-and 64-bit additions.

32-bit addition
A perusal of Table 2 confirms the general observation made on the basis of Table 1 that increasing the approximation size (from 4-bits to 8-bits) in an approximate adder helps to better optimize the design metrics by reducing the critical path delay, silicon area, and average power simultaneously. This is found to be true for 32-bit and 64-bit approximate adders employing two approximations with K = 4 and K = 8. For example, a 32-bit HOERAA with a 28-4 input partition reports a 7.8% reduction in delay, a 10% reduction in area, and an 8.8% reduction in power compared to the accurate 32-bit FRCLA, and a 32-bit HOERAA with a 24-8 input partition reports enhanced reductions in delay, area, and power by 15.7%, 21.4%, and 23%, respectively, compared to the accurate 32-bit FRCLA. On the other hand, a 64-bit HOERAA with a 60-4 input partition reports a 4.8% reduction in delay, a 5% reduction in area, and a 4.4% reduction in power compared to the accurate 32-bit FRCLA, while a 64-bit HOERAA with a 56-8 input partition reports enhanced reductions in delay, area, and power by 9.6%, 10.7%, and 11.4%, respectively, compared to the accurate 64-bit FRCLA. As noted earlier, the relatively less percentage reductions in design metrics achieved for 64-bit approximate adders compared to 32-bit approximate adders are attributed to the increase in the size of the accurate sub-adders for the former compared to the latter. The PDPs of accurate and approximate adders were calculated from the delay and power values given in Table 2. The normalized PDP plots corresponding to the ASIC-based designs of accurate and approximate adders for 32-and 64-bit additions are shown in Figure 3a,b.
The normalization was performed in the same manner as discussed earlier in the context of Figure 2. Figure 3a,b show that OLOCA has the least PDP, which is closely followed by HOERAA. accurate and approximate adders were calculated from the delay and power values given in Table 2.
The normalized PDP plots corresponding to the ASIC-based designs of accurate and approximate adders for 32-and 64-bit additions are shown in Figure 3a,b. The normalization was performed in the same manner as discussed earlier in the context of Figure 2. Figure 3a,b show that OLOCA has the least PDP, which is closely followed by HOERAA.

Error Characteristics of Approximate Adders
It is important to ascertain the error characteristics of the approximate adders to determine which of them would have a less error while providing a maximum implementation performance (i.e., better optimized design metrics). This an important focus of this work, where the error characteristic is given an equal weightage as the quality-of-results (design metrics).
Many error metrics have been proposed in the literature for approximate computing. In this paper, we used the standard error metrics such as the average error (AE), the mean average error (MAE, also referred to as mean error distance (MED) in the literature), and the root mean square error (RMSE) [31] for the approximate adders, as they are more widely used in error analysis. The error metrics and the error bounds of the approximate adders are given in Table 3. The error metrics, such as AE, MAE, and RMSE, have again been calculated by using a 4-bit and an 8-bit inaccurate subadder for the approximate adders shown in Figure 1b-g, experimentally, by considering all the distinct inputs corresponding to the inaccurate sub-adders. The generalized error range for the approximate adders having a K-bit inaccurate sub-adder is also given in Table 3 (last column). No error is applicable for the accurate adder shown in Figure 1a. Table 3. Error metrics of the approximate adders shown in Figure 1b-g. Average error (AE), mean average error (MAE), and root mean square error (RMSE) were calculated by considering a 4-bit and an 8-bit inaccurate sub-adder (i.e., K = 4 and K = 8) in the approximate adders. The generalized error range of the approximate adders using a K-bit inaccurate sub-adder is also specified.

Error Characteristics of Approximate Adders
It is important to ascertain the error characteristics of the approximate adders to determine which of them would have a less error while providing a maximum implementation performance (i.e., better optimized design metrics). This an important focus of this work, where the error characteristic is given an equal weightage as the quality-of-results (design metrics).
Many error metrics have been proposed in the literature for approximate computing. In this paper, we used the standard error metrics such as the average error (AE), the mean average error (MAE, also referred to as mean error distance (MED) in the literature), and the root mean square error (RMSE) [31] for the approximate adders, as they are more widely used in error analysis. The error metrics and the error bounds of the approximate adders are given in Table 3. The error metrics, such as AE, MAE, and RMSE, have again been calculated by using a 4-bit and an 8-bit inaccurate sub-adder for the approximate adders shown in Figure 1b-g, experimentally, by considering all the distinct inputs corresponding to the inaccurate sub-adders. The generalized error range for the approximate adders having a K-bit inaccurate sub-adder is also given in Table 3 (last column). No error is applicable for the accurate adder shown in Figure 1a. Table 3. Error metrics of the approximate adders shown in Figure 1b-g. Average error (AE), mean average error (MAE), and root mean square error (RMSE) were calculated by considering a 4-bit and an 8-bit inaccurate sub-adder (i.e., K = 4 and K = 8) in the approximate adders. The generalized error range of the approximate adders using a K-bit inaccurate sub-adder is also specified. We consider RMSE as a more important error metric in this work since it has been observed in Reference [27] that RMSE gives a relatively higher weight to larger errors and is considered more likely to impact an application employing approximate arithmetic. AE, MAE, and RMSE are expressed as follows:

Type of Adder
(1) In Equations (1), (2), and (3), 'N' represents the adder size, 'e' is the error distance (i.e., the numerical difference between the accurate and approximate adder output), 'P' denotes the probability of an error value occurrence, and 'δ' is the set of all error values.
From Table 3, it is seen that HOERAA has a reduced RMSE compared to the other approximate adders for K = 4 and K = 8. While comparing with OLOCA, which has the same speed as HOERAA, evident from Tables 1 and 2, HOERAA reports a 50% reduction in AE, a 39.5% reduction in MAE, and a 40.7% reduction in RMSE for K = 4, and a 50% reduction in AE, a 38.5% reduction in MAE, and a 40.2% reduction in RMSE for K = 8.
It would also be useful to study the distribution of errors of the approximate adders about the zero error. The results of this error distribution analysis, which assumes a uniform distribution of inputs, are captured in Figure 4. The percentage occurrence of errors corresponding to the specified error bounds (range), given in Table 3, are plotted in the Y-axis and the error bounds are plotted in the X-axis in Figure 4. It can be seen from Figure 4 that HOERAA has a better near normal distribution about the zero error. APPROX5 also has a good error distribution about the zero error. However, APPROX5 reports substantial increases in MAE and RMSE by 103.2% and 80.4% on average, compared to HOERAA, when considering 4-bit and 8-bit inaccurate sub-adders in the approximate adders, which is undesirable.

Conclusions
This article has presented a new approximate adder called HOERAA, which is suitable for FPGA-and ASIC-based implementations. In this work, we considered a subset of approximate adders for comparison, which are also suitable for FPGA-and ASIC-based implementations. It has

Conclusions
This article has presented a new approximate adder called HOERAA, which is suitable for FPGAand ASIC-based implementations. In this work, we considered a subset of approximate adders for comparison, which are also suitable for FPGA-and ASIC-based implementations. It has been observed that HOERAA achieves a good optimization in the design metrics compared to the other approximate adders whilst reporting reduced error characteristics.
Considering a FPGA-based implementation with a 8-bit inaccurate sub-adder, HOERAA requires 22% fewer LUTs and 18.6% fewer registers for 32-bit addition while improving the maximum frequency by 7.7% and reducing the PDP by 14.7% compared to the native accurate FPGA adder, and for 64-bit addition, HOERAA requires 11% fewer LUTs and 9.3% fewer registers while increasing the maximum frequency by 9.1% and reducing the PDP by 9.3%.
Considering an ASIC-based implementation using a standard digital cell library, and compared to the accurate FRCLA, HOERAA with an 8-bit inaccurate sub-adder achieves a 15.7% reduction in critical path delay, a 21.4% reduction in area, and a 35% reduction in PDP for 32-bit addition, and a 15.3% reduction in critical path delay, a 10.7% reduction in area, and a 20% reduction in PDP for 64-bit addition.
We also considered FPGA-and ASIC-based implementations of 32-bit and 64-bit approximate adders using a 4-bit inaccurate sub-adder. This was done mainly to analyze the trends in the design metrics when the approximation size is increased from 4-bits to 8-bits in the least significant adder bit positions. It was inferred that, for both FPGA-and ASIC-based implementations, increasing the size of the inaccurate sub-adder has a beneficial impact on the optimization of design metrics. However, this would be accompanied by some decrease in the accuracy. This points to a potential trade-off between accuracy and optimization of the design metrics that is inherent in approximate computing. AE, MAE, and RMSE are standard error metrics, which are usually quantified when performing approximate computer arithmetic [31]. It is mentioned in Reference [27] that RMSE gives a relatively greater weight to larger errors and it is mentioned in Reference [32] that MAE (also called MED) is an effective metric for measuring the implementation accuracy of an approximate adder. Overall, HOERAA has reduced error attributes compared to the other approximate adders. Notably, in comparison with OLOCA, which has nearly similar design metrics as HOERAA, the latter achieves significant reductions in the error metrics viz. a 39% reduction in MAE and a 40.5% reduction in RMSE, on average, when considering the use of 4-bit and 8-bit inaccurate sub-adders.
Overall, from the perspectives of hardware design optimization and error reduction, HOERAA is preferable to the rest, including the recently proposed OLOCA. In the future, we would consider utilizing HOERAA for constructing an approximate multiplier. These could then be considered for realizing low power/energy-efficient computation units for use in practical digital signal processing applications such as image/audio/video processing.