# Quasi-Delay-Insensitive Implementation of Approximate Addition


## Abstract


## 1. Introduction

- Asynchronous QDI implementations of approximate addition based on different approximate adder architectures, such as LOA, LOAWA, APPROX5, HEAA, OLOCA, and HOERAA, are described alongside accurate addition.
- Design metrics of the accurate QDI adder and the different approximate QDI adders are estimated and compared by considering delay-insensitive dual-rail data encoding with return-to-zero (RTZ) and return-to-one (RTO) handshaking. The implementations use a 32/28-nm complementary metal oxide semiconductor (CMOS) process technology.
- Popular error metrics widely used in approximate computing are calculated for the approximate adders and provided for a comparison. The error distribution of the approximate adders is also portrayed. The implementation of the adders and their error metrics correspond to a digital image processing application.
- A digital image processing application is considered to demonstrate the practical usefulness of approximate addition vis-à-vis the accurate addition.

## 2. QDI Circuit Design—Fundamentals

#### 2.1. Delay-Insensitive Data Encoding and Four-Phase Handshaking

With RTZ handshaking, a single-rail input I is represented by two encoded rails, $I_1$ and $I_0$, where I = 1 is encoded as $I_1 = 1$ and $I_0 = 0$, and I = 0 is encoded as $I_0 = 1$ and $I_1 = 0$. These two assignments are called data. $I_1 = I_0 = 0$ is called the spacer, and $I_1 = I_0 = 1$ is considered to be indeterminate or invalid.

With RTO handshaking, a single-rail input I is represented by two encoded rails, $I_1$ and $I_0$, where I = 1 is encoded as $I_1 = 0$ and $I_0 = 1$, and I = 0 is encoded as $I_0 = 0$ and $I_1 = 1$. These two assignments are called data. $I_1 = I_0 = 1$ is called the spacer, and $I_1 = I_0 = 0$ is considered to be indeterminate or invalid.
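The two dual-rail codes described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function names `encode` and `decode`, and the use of `None` to stand for the spacer, are assumptions made here and do not come from the article.

```python
from typing import Optional, Tuple

Rails = Tuple[int, int]  # ordered as (I1, I0)


def encode(bit: Optional[int], handshake: str = "RTZ") -> Rails:
    """Return the rails (I1, I0) for a data bit, or the spacer when bit is None."""
    if handshake not in ("RTZ", "RTO"):
        raise ValueError("handshake must be 'RTZ' or 'RTO'")
    if bit is None:
        # Spacer: all zeros for RTZ, all ones for RTO.
        return (0, 0) if handshake == "RTZ" else (1, 1)
    if bit not in (0, 1):
        raise ValueError("bit must be 0, 1 or None")
    rails = (bit, 1 - bit)  # RTZ: I=1 -> (1, 0), I=0 -> (0, 1)
    if handshake == "RTO":
        # The RTO code is the rail-wise complement of the RTZ code.
        rails = (1 - rails[0], 1 - rails[1])
    return rails


def decode(rails: Rails, handshake: str = "RTZ") -> Optional[int]:
    """Return the data bit, None for the spacer; reject the invalid codeword."""
    if handshake not in ("RTZ", "RTO"):
        raise ValueError("handshake must be 'RTZ' or 'RTO'")
    spacer = (0, 0) if handshake == "RTZ" else (1, 1)
    invalid = (1, 1) if handshake == "RTZ" else (0, 0)
    if rails == spacer:
        return None
    if rails == invalid:
        raise ValueError("indeterminate (invalid) codeword")
    i1 = rails[0]
    return i1 if handshake == "RTZ" else 1 - i1
```

For example, `encode(1, "RTO")` yields `(0, 1)`, and decoding it under RTO returns 1 again, whereas the same pair of rails decodes to 0 under RTZ.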

#### 2.2. Kinds of QDI Circuits

## 3. QDI Implementation of Accurate and Approximate Adders

$B_{L-1}$ up to $B_0$ are directly forwarded as the respective sum bits, viz. $\mathrm{SUM}_{L-1}$ up to $\mathrm{SUM}_0$, and $A_{L-2}$ up to $A_0$ are discarded. Only $A_{L-1}$ is given as the carry input to the accurate part in APPROX5. Since a 22–10 input partition has been used for all the approximate adders considering a digital image processing application, $B_9$ up to $B_0$ are forwarded as the respective sum bits $\mathrm{SUM}_9$ up to $\mathrm{SUM}_0$. With dual-rail encoding, this means that $B_{91}$ up to $B_{01}$ are forwarded as $\mathrm{SUM}_{91}$ up to $\mathrm{SUM}_{01}$, and $B_{90}$ up to $B_{00}$ are forwarded as $\mathrm{SUM}_{90}$ up to $\mathrm{SUM}_{00}$, by using a non-inverting buffer for each encoded sum rail. ($A_{81}$, $A_{80}$) up to ($A_{01}$, $A_{00}$) are, however, discarded.
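At the bit level, the APPROX5 behavior just described can be sketched as a single-rail functional model. The function name `approx5_add` and its parameters are hypothetical, and the accurate part is assumed here to be an ordinary binary addition that keeps its carry output; the dual-rail rails and handshaking are deliberately not modeled.

```python
def approx5_add(a: int, b: int, width: int = 32, low: int = 10) -> int:
    """Single-rail functional model of APPROX5 (illustrative assumption).

    The `low` least significant bits of b are forwarded as sum bits, the
    low-1 least significant bits of a are discarded, and bit low-1 of a
    is the carry input of the accurate upper part.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    low_mask = (1 << low) - 1
    carry_in = (a >> (low - 1)) & 1                 # A_{L-1}
    high_sum = (a >> low) + (b >> low) + carry_in   # accurate part
    return (high_sum << low) | (b & low_mask)       # B_{L-1}..B_0 forwarded
```

With the default 22–10 partition, `approx5_add(0x200, 0)` returns 1024 rather than 512, because $A_9$ is promoted to a carry into bit 10, while `approx5_add(0x1FF, 0)` returns 0 because $A_8$ to $A_0$ are dropped.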

$\mathrm{SUM}_9$ of HEAA, i.e., $\mathrm{SUM}_{91}$ and $\mathrm{SUM}_{90}$, is realized in early output style after logic simplification, as shown in Figure 7a,b, which correspond to RTZ and RTO handshaking, respectively. The equations for $\mathrm{SUM}_{91}$ and $\mathrm{SUM}_{90}$ are given below.

$\mathrm{SUM}_{71}$ to $\mathrm{SUM}_{01}$ should assume a constant 1 for the application of data and should assume 0 for the application of the spacer. This is realized by using a dedicated non-inverting buffer to connect each sum rail, viz. $\mathrm{SUM}_{71}$ to $\mathrm{SUM}_{01}$, to the ACKIN signal. This ensures that when ACKIN is 1 during the application of data, $\mathrm{SUM}_{71}$ to $\mathrm{SUM}_{01}$ will assume 1, and when ACKIN is 0 during the application of the spacer, $\mathrm{SUM}_{71}$ to $\mathrm{SUM}_{01}$ will assume 0. $\mathrm{SUM}_{70}$ to $\mathrm{SUM}_{00}$, on the other hand, should always be 0. This is ensured by connecting each of these sum rails individually to TIEL (i.e., tie-to-logic low) standard cells so that $\mathrm{SUM}_{70}$ to $\mathrm{SUM}_{00}$ will always remain 0.

Considering RTO handshaking, $\mathrm{SUM}_{71}$ to $\mathrm{SUM}_{01}$ are individually connected to the ACKIN signal through dedicated non-inverting buffers, as done for RTZ handshaking. However, $\mathrm{SUM}_{70}$ to $\mathrm{SUM}_{00}$ are individually connected to TIEH (i.e., tie-to-logic high) standard cells so that $\mathrm{SUM}_{70}$ to $\mathrm{SUM}_{00}$ will always remain 1.
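The wiring of these eight constant sum bits can be mimicked with a small model. The function name `oloca_constant_rails` is hypothetical. The text states the ACKIN polarity only for RTZ; for RTO the sketch assumes ACKIN is 0 during data and 1 during the spacer, which is what the RTO code of Section 2.1 would require for a constant sum bit of 1.

```python
from typing import List, Tuple


def oloca_constant_rails(ackin: int, handshake: str = "RTZ",
                         n_bits: int = 8) -> List[Tuple[int, int]]:
    """Return [(SUM_i1, SUM_i0)] for the constant sum bits SUM_7..SUM_0.

    SUM_i1 follows ACKIN through a non-inverting buffer; SUM_i0 is tied
    low (TIEL) for RTZ and tied high (TIEH) for RTO.
    """
    if handshake not in ("RTZ", "RTO"):
        raise ValueError("handshake must be 'RTZ' or 'RTO'")
    if ackin not in (0, 1):
        raise ValueError("ackin must be 0 or 1")
    tie = 0 if handshake == "RTZ" else 1  # TIEL or TIEH cell
    return [(ackin, tie) for _ in range(n_bits)]
```

Under RTZ, `ackin=1` gives the pair (1, 0) on every bit, i.e., data 1, and `ackin=0` gives the all-zero spacer. Under RTO, with the assumed polarity, `ackin=0` gives (0, 1), i.e., data 1, and `ackin=1` gives the all-one spacer.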

... $\mathrm{SUM}_7$ to $\mathrm{SUM}_0$. Hence, the circuits used for OLOCA with respect to these eight less significant sum bits are maintained the same for HOERAA for RTZ and RTO handshaking. Considering the most significant sum bit in the inaccurate part of HOERAA, viz. $\mathrm{SUM}_9$, again a 2:1 multiplexer is involved, as is the case with HEAA discussed earlier. Therefore, $\mathrm{SUM}_9$ of HOERAA, i.e., $\mathrm{SUM}_{91}$ and $\mathrm{SUM}_{90}$ after dual-rail encoding, was realized in an early output style after logic simplification, as described for HEAA, and this is shown in Figure 8a,b with respect to RTZ and RTO handshaking.

## 4. Implementation Results

## 5. Error Metrics of Approximate Adders

Two 32-bit operands give $2^{64}$ distinct input vectors, which is computationally intensive to consider exhaustively. Hence, we generated one million random input vectors and supplied them to the accurate and approximate adders. For each input supplied, the difference between the sum produced by an approximate adder and the sum produced by the accurate adder was calculated, which is called the error. The error may be positive, negative, or nil. The absolute error was computed individually for each approximate adder for each input vector supplied, and the mean of the absolute errors (MAE) corresponding to a million random input vectors was calculated using Equation (3). The root mean square error (RMSE) was calculated using Equation (4). MAE and RMSE calculated for the approximate adders are given in Table 2. From Table 2, it is seen that HOERAA has almost the same MAE as HEAA (127.99 versus 127.64). However, HOERAA has an 8.6% lower RMSE than HEAA (165.32 versus 180.80). Hence, in terms of the error metrics, HOERAA is preferable to its counterparts. The reduced MAE and RMSE of HOERAA lead to better digital image processing results, with an improved signal-to-noise ratio compared to the other approximate adders, and this shall be discussed in the next section.
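The random-vector procedure described above can be sketched as follows. The sketch assumes the standard definitions MAE = (1/N) Σ|e_i| and RMSE = sqrt((1/N) Σ e_i²), which are taken to correspond to Equations (3) and (4). The helper names `approx5_add` and `error_metrics` are hypothetical, APPROX5 is used only because its behavior is fully described in Section 3, and 100,000 samples are drawn instead of one million to keep the run short.

```python
import math
import random
from typing import Callable, Tuple


def approx5_add(a: int, b: int, low: int = 10) -> int:
    """Assumed single-rail model of APPROX5 with a 22-10 partition."""
    carry_in = (a >> (low - 1)) & 1
    high_sum = (a >> low) + (b >> low) + carry_in
    return (high_sum << low) | (b & ((1 << low) - 1))


def error_metrics(adder: Callable[[int, int], int], samples: int = 100_000,
                  width: int = 32, seed: int = 0) -> Tuple[float, float]:
    """Estimate (MAE, RMSE) of `adder` against exact addition."""
    rng = random.Random(seed)  # fixed seed for repeatability
    abs_sum = 0
    sq_sum = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        err = adder(a, b) - (a + b)  # signed error
        abs_sum += abs(err)
        sq_sum += err * err
    return abs_sum / samples, math.sqrt(sq_sum / samples)


mae, rmse = error_metrics(approx5_add)
print(f"APPROX5 model: MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Under this model the error depends only on the ten least significant bits of the first operand, so the estimates should come out at roughly 256 for MAE and 296 for RMSE, in the neighborhood of the APPROX5 row of Table 2. The same `error_metrics` helper accepts any other adder model passed in as a function.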

## 6. Digital Image Processing Results

## 7. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Block schematic of a quasi-delay-insensitive (QDI) circuit stage. The critical data path is shown via a dashed red line.

**Figure 2.** (**a**) Logic symbol, (**b**) gate level realization, and (**c**) transistor level realization of C-element.

**Figure 4.** Early output half adder corresponding to: (**a**) return-to-zero (RTZ) handshaking; and (**b**) return-to-one (RTO) handshaking.

**Figure 6.** Early output realization of a 2-input AND function corresponding to: (**a**) RTZ handshaking; and (**b**) RTO handshaking, where (P1, P0) and (Q1, Q0) are the dual-rail inputs and (R1, R0) is the dual-rail output. Early output realization of a 2-input OR function corresponding to: (**c**) RTZ handshaking; and (**d**) RTO handshaking, where (S1, S0) and (T1, T0) represent the dual-rail inputs and (U1, U0) represents the dual-rail output.

**Figure 7.** Early output logic realization of the most significant sum bit in the inaccurate part of HEAA corresponding to: (**a**) RTZ handshaking, and (**b**) RTO handshaking.

**Figure 8.** Early output logic realization of the most significant sum bit in the inaccurate part of HOERAA corresponding to: (**a**) RTZ handshaking, and (**b**) RTO handshaking.

**Figure 9.** Error distribution plots of the approximate adders: (**a**) LOA; (**b**) LOAWA; (**c**) APPROX5; (**d**) HEAA; (**e**) OLOCA; and (**f**) HOERAA.

**Figure 10.** Lena image processed using accurate and approximate addition based inverse fast Fourier transform (IFFT) and fast Fourier transform (FFT).

**Table 1.** Design metrics of accurate and approximate 32-bit QDI adders, implemented using a 32/28-nm complementary metal oxide semiconductor (CMOS) technology.

Handshake Scheme | Name of Adder | Forward Latency (ns) | Cycle Time (ns) | Area (µm²) | Energy, i.e., PCTP (fJ) |
---|---|---|---|---|---|
RTZ | Accurate | 3.11 | 3.81 | 1628.55 | 15,468.60 |
RTZ | LOA | 2.36 | 3.06 | 1417.11 | 12,344.04 |
RTZ | LOAWA | 2.28 | 2.98 | 1394.74 | 12,012.38 |
RTZ | APPROX5 | 2.36 | 3.06 | 1413.04 | 12,344.04 |
RTZ | HEAA | 2.36 | 3.06 | 1418.12 | 12,344.04 |
RTZ | OLOCA | 2.36 | 3.06 | 1411.01 | 12,729.60 |
RTZ | HOERAA | 2.36 | 3.06 | 1424.99 | 12,732.66 |
RTO | Accurate | 2.94 | 3.64 | 1628.55 | 14,771.12 |
RTO | LOA | 2.25 | 2.96 | 1417.11 | 11,937.68 |
RTO | LOAWA | 2.17 | 2.87 | 1394.74 | 11,566.10 |
RTO | APPROX5 | 2.25 | 2.96 | 1413.04 | 11,937.68 |
RTO | HEAA | 2.25 | 2.96 | 1418.12 | 11,937.68 |
RTO | OLOCA | 2.25 | 2.96 | 1411.01 | 12,310.64 |
RTO | HOERAA | 2.25 | 2.96 | 1427.02 | 12,316.56 |

**Table 2.** Mean absolute error (MAE) and root mean square error (RMSE) of the approximate adders.

Approximate Adder | MAE | RMSE |
---|---|---|
LOA | 191.75 | 255.95 |
LOAWA | 255.47 | 361.61 |
APPROX5 | 256.20 | 295.82 |
HEAA | 127.64 | 180.80 |
OLOCA | 207.92 | 276.46 |
HOERAA | 127.99 | 165.32 |

**Table 3.** Peak signal-to-noise ratio (PSNR) of various digital images obtained after processing by using different approximate adders. All table entries are PSNR values.

Approximate Adder Used | Woman_with_Dark_Hair | Barbara | Einstein | Boat | Lake | Peppers_Gray |
---|---|---|---|---|---|---|
LOA | 32.8121 | 32.4863 | 32.5567 | 32.5604 | 32.6313 | 32.6581 |
LOAWA | 25.2304 | 25.1106 | 25.7325 | 24.8022 | 25.2703 | 25.1460 |
APPROX5 | 32.1200 | 31.6881 | 31.8320 | 31.8445 | 31.7789 | 31.8853 |
HEAA | 30.8507 | 30.6490 | 31.0126 | 30.5959 | 30.6447 | 30.7053 |
OLOCA | 32.3729 | 32.0496 | 32.1424 | 32.1698 | 32.1815 | 32.2262 |
HOERAA | 33.2847 | 32.9709 | 33.1791 | 33.0211 | 32.9155 | 33.0998 |

**Table 4.** Structural similarity index metric (SSIM) of various digital images obtained after processing by using different approximate adders. All table entries are SSIM values.

Approximate Adder Used | Woman_with_Dark_Hair | Barbara | Einstein | Boat | Lake | Peppers_Gray |
---|---|---|---|---|---|---|
LOA | 0.8150 | 0.8527 | 0.8440 | 0.8602 | 0.8666 | 0.8447 |
LOAWA | 0.7884 | 0.8396 | 0.8198 | 0.8464 | 0.8514 | 0.8302 |
APPROX5 | 0.8063 | 0.8450 | 0.8318 | 0.8462 | 0.8537 | 0.8284 |
HEAA | 0.9174 | 0.9426 | 0.9370 | 0.9480 | 0.9485 | 0.9471 |
OLOCA | 0.8096 | 0.8463 | 0.8373 | 0.8517 | 0.8587 | 0.8359 |
HOERAA | 0.9028 | 0.9297 | 0.9226 | 0.9358 | 0.9394 | 0.9279 |


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Balasubramanian, P.; Mastorakis, N.E.
Quasi-Delay-Insensitive Implementation of Approximate Addition. *Symmetry* **2020**, *12*, 1919.
https://doi.org/10.3390/sym12111919
