A Multiplierless Architecture for Image Convolution in Memory
Abstract
1. Introduction
- (1) Image convolution is arithmetically intensive: assuming zero padding at the edges, convolving an image with a 3 × 3 kernel yields one inner product (IP) per pixel, and each IP requires nine multiplications and eight additions.
- (2) The kernel (filter) is usually fixed for a given image-processing operation. Standard filters such as sharpen, edge detection, Sobel, blur and Laplacian are applied for tasks such as smoothing, segmentation and feature extraction.
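
To make the operation count in item (1) concrete, the following minimal Python sketch performs direct 3 × 3 convolution with zero padding, assuming stride 1 so that every pixel produces one IP; the function name is hypothetical. Each IP costs nine multiplications and eight additions, so an H × W image needs 9HW multiplications and 8HW additions. The kernel is passed in once and reused unchanged at every position, which is the property item (2) describes.

```python
# Minimal sketch (assumption: stride 1, zero padding, 3x3 kernel):
# direct convolution that also counts multiplications and additions.

def convolve3x3_zero_pad(image, kernel):
    """Hypothetical helper. Returns (output, multiplications, additions).

    The kernel is slid without flipping (cross-correlation, as in most CNN code);
    for symmetric kernels such as blur filters the result is identical.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    mults = adds = 0
    for r in range(h):
        for c in range(w):
            acc = 0
            for kr in range(3):
                for kc in range(3):
                    rr, cc = r + kr - 1, c + kc - 1
                    # Zero padding: taps that fall outside the image read 0.
                    x = image[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
                    prod = kernel[kr][kc] * x
                    mults += 1
                    # The first product starts the sum; the other eight are additions.
                    if kr or kc:
                        adds += 1
                    acc += prod
            out[r][c] = acc  # one inner product (IP) per output pixel
    return out, mults, adds


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    blur = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    out, m, a = convolve3x3_zero_pad(img, blur)
    ips = len(img) * len(img[0])
    print(f"{ips} IPs, {m} multiplications, {a} additions")
    # prints: 16 IPs, 144 multiplications, 128 additions
```

Even this 4 × 4 example needs 144 multiplications; the count grows linearly with the number of pixels.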

2. Proposed Architecture for Image Convolution: Distributed Arithmetic Approach

3. Image Convolution in Memory: Functional Verification
3.1. SystemC Tool to Evaluate In-Memory Architectures
3.2. Negative Filter Coefficients and Adder Bit-Width
3.3. Customization of the Tool and Functional Verification of Image Convolution in Memory
4. Image Convolution in Memory: Circuit Implementation and Validation
4.1. Pre-Convolution Procedure
4.2. In-Memory Image Convolution Architecture
4.3. Circuit Design and Simulation Results
4.4. Performance Metrics for Single Stride of Convolution
4.4.1. Energy
4.4.2. Latency
4.4.3. Area
5. Comparison with Conventional Image Convolution
5.1. Energy and Latency Estimate of Convolution in Google’s TPU
5.2. Energy and Latency Estimate of Convolution Using DA
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest






| Step | Address | MR | IS | LSIS |
|---|---|---|---|---|
| 1 | 1 0011 1111 | 13 | 13 | 26 |
| 2 | 0 0110 0111 | 7 | 33 | 66 |
| 3 | 0 1010 0111 | 8 | 74 | 148 |
| 4 | 0 0100 1110 | 6 | 154 | 308 |
| 5 | 1 1110 0110 | 9 | 317 | 634 |
| 6 | 0 1000 0110 | 5 | 639 | 1278 |
| 7 | 0 0100 1110 | 6 | 1284 | 2568 |
| 8 | 0 0000 0110 | 3 | 2571 | – |

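
The trace in the table above appears to follow a most-significant-bit-first distributed-arithmetic (DA) evaluation of one 3 × 3 inner product: at each of the eight steps, one bit from each of the nine 8-bit pixels forms a 9-bit address, the addressed entry of a 2^9 = 512-entry look-up table (LUT) of coefficient partial sums is read (the MR column), added to the previous left-shifted sum (IS = previous LSIS + MR), and shifted left by one bit (LSIS = 2 × IS). The Python sketch below reproduces the trace. Its inputs are our inference, not stated in the source: reading the address bits column by column gives the pixels 136, 44, 90, 232, 128, 146, 255, 255 and 224, and the MR values are consistent with the 3 × 3 Gaussian-blur kernel [1 2 1; 2 4 2; 1 2 1]. The function names are hypothetical.

```python
# Minimal sketch of MSB-first distributed arithmetic (DA) for one 3x3 inner product.
# Assumption: KERNEL and PIXELS are inferred from the trace table, not given in the text.

KERNEL = [1, 2, 1, 2, 4, 2, 1, 2, 1]                  # assumed Gaussian-blur kernel, row-major
PIXELS = [136, 44, 90, 232, 128, 146, 255, 255, 224]  # inferred 8-bit pixels; PIXELS[0] -> leftmost address bit


def build_lut(coeffs):
    """Hypothetical helper: LUT[addr] = sum of coeffs[i] whose address bit is set.

    Bit (n - 1 - i) of the address selects coeffs[i], so coeffs[0] maps to the MSB.
    """
    n = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> (n - 1 - i)) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(pixels, lut, bits=8):
    """Hypothetical helper: evaluate sum(k_i * x_i) using only LUT reads and shift-and-add.

    Returns the result and a per-step trace of (address, MR, IS, LSIS).
    """
    n = len(pixels)
    trace = []
    lsis = 0  # left-shifted intermediate sum carried into the next step
    for bitpos in range(bits - 1, -1, -1):  # MSB first
        addr = 0
        for i, p in enumerate(pixels):
            addr |= ((p >> bitpos) & 1) << (n - 1 - i)
        mr = lut[addr]      # value read from the LUT
        is_ = lsis + mr     # intermediate sum
        lsis = is_ << 1 if bitpos > 0 else None  # no shift after the LSB step
        trace.append((addr, mr, is_, lsis))
    return is_, trace


if __name__ == "__main__":
    lut = build_lut(KERNEL)
    result, trace = da_inner_product(PIXELS, lut)
    for step, (addr, mr, is_, lsis) in enumerate(trace, start=1):
        a = f"{addr:09b}"
        print(step, f"{a[0]} {a[1:5]} {a[5:]}", mr, is_, "" if lsis is None else lsis)
    # Cross-check against the direct multiply-accumulate result.
    assert result == sum(k * x for k, x in zip(KERNEL, PIXELS))
```

Because the kernel is fixed, the LUT is computed once and reused at every pixel position, so in this sketch each IP takes eight LUT reads with shift-and-add steps and no multiplications.
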

| Metric | ASIC (Google TPU v1) | In-Memory Image Convolution (IMIC) | IMIC relative to TPU |
|---|---|---|---|
| Energy | 8421 nJ | 150 nJ | 56× less |
| Latency | 250.9 μs | 6021 μs | 24× more |
| Area/Hardware | 256 × 256 array of multiply-and-accumulate (MAC) units | Two ReRAM arrays with peripheral circuitry of 4748 transistors | – |
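
The relative column of the comparison table is obtained by dividing the two measured values in each row. A short Python check using only the table's figures (the variable names are ours):

```python
# Figures copied from the comparison table (TPU v1 vs. in-memory image convolution).
TPU_ENERGY_NJ, IMIC_ENERGY_NJ = 8421.0, 150.0
TPU_LATENCY_US, IMIC_LATENCY_US = 250.9, 6021.0

energy_ratio = TPU_ENERGY_NJ / IMIC_ENERGY_NJ      # > 1 means IMIC uses less energy
latency_ratio = IMIC_LATENCY_US / TPU_LATENCY_US   # > 1 means IMIC takes longer

print(f"IMIC uses {energy_ratio:.1f}x less energy and takes {latency_ratio:.1f}x longer")
# prints: IMIC uses 56.1x less energy and takes 24.0x longer
```

The rounded ratios agree with the 56× and 24× entries in the table.
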

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Reuben, J.; Zeller, F.; Seiler, B.; Fey, D. A Multiplierless Architecture for Image Convolution in Memory. J. Low Power Electron. Appl. 2025, 15, 63. https://doi.org/10.3390/jlpea15040063

