FPGA-Based Implementation of Artificial Neural Network for Accelerated Handwritten Digit Recognition
Abstract
1. Introduction
2. Analysis of Related Work
Problem Statement and Motivation
3. Architecture of Proposed Multi-Layer Perceptron Model
1. Starting with the input layer, we propagate data forward to the output layer (forward propagation). We calculate the activation unit of each hidden-layer neuron $i$ (for $i$ from 1 to 32) according to Equation System (1) [7], where X is the input data, W is the weight unit, and B is the bias unit. Note that we use the default Keras dense layer, which initializes the weights with the Glorot uniform (Xavier) initializer.
2.
3. We backpropagate the error: we find its derivative with respect to each weight in the network and update the model.
4. We optimize using the Adam optimizer.
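The forward-propagation step can be illustrated with a minimal pure-Python sketch. It assumes a 196-input, 32-neuron hidden layer (the sizes used in the FPGA design), ReLU as the hidden activation, and zero-initialized biases (the Keras dense-layer default); the helper names `glorot_uniform` and `dense_forward` are hypothetical and are not taken from the authors' code.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Draw a fan_in x fan_out matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out)) (Glorot/Xavier uniform)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def dense_forward(x, weights, biases, activation):
    """Return activation(sum_j x[j] * weights[j][i] + biases[i]) for each unit i."""
    return [
        activation(sum(x[j] * weights[j][i] for j in range(len(x))) + biases[i])
        for i in range(len(biases))
    ]


rng = random.Random(0)                   # fixed seed, for reproducibility only
W1 = glorot_uniform(196, 32, rng)        # input-to-hidden weights
B1 = [0.0] * 32                          # assumed: biases start at zero
x = [rng.random() for _ in range(196)]   # placeholder input vector
hidden = dense_forward(x, W1, B1, relu)  # 32 hidden activations
```

In a training run, these activations would feed the output layer, and the resulting error would drive the backpropagation and Adam update steps listed above.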
4. FPGA Design of Proposed Multi-Layer Perceptron Model
- Image load module: this module loads the 28 × 28 input image from the internal memory into a two-dimensional (row and column) buffer of the same size. We process the data in groups of four pixels over a 32-bit bus (8 bits per pixel).
- Average pooling module: from the buffer, we launch seven clusters in parallel, each of which calculates the average of four neighboring pixels (positions are given as (column, row)). The first cluster averages the pixels at positions (0,0), (0,1), (1,0), (1,1), the second those at (2,0), (2,1), (3,0), (3,1), and so on up to the seventh, which averages the pixels at (12,0), (12,1), (13,0), (13,1). In the next round, we proceed with columns 14 to 27. We then move on to the next two rows, 2 and 3, and so on until we reach the last group of pixels. The architecture is shown in Figure 6. The processing of each pair of rows takes clock cycles. Therefore, the whole process takes clock cycles.
- Reshape module: this intermediate module only reorganizes the data into a one-dimensional vector to simplify the calculations and the connection to the hidden layer. For a 28 × 28 input image, average pooling yields a 14 × 14 map, so the module outputs a vector of 196 values.
- Neurons calculation module: as mentioned above, the trained weights are saved in the internal memory of the FPGA. To connect the 1-D input vector to the hidden layer of 32 neurons, we need 196 × 32 = 6272 weight values and 32 bias values; to connect the hidden layer to the output layer of 10 neurons, we need 32 × 10 = 320 weight values and 10 bias values. As a result, we need 6634 parameters (6272 + 32 + 320 + 10). Note that all the learned parameters are integers encoded in an 8-bit signed format (values in the range −128 to 127, two's complement representation) in the VHDL description. To accelerate the design, we run all the hidden-layer neurons in parallel, which results in 32 multiplications in parallel: in each clock cycle, every neuron multiplies one input value P by its corresponding weight W and adds the product to its running sum. W and P are 8-bit VHDL signed values, and the accumulated sum is a 32-bit VHDL signed value. This represents parallel stage 2, as illustrated in Figure 7. The hidden layer is therefore generated in 197 clock cycles (196 cycles for the cumulative weight multiplications and 1 cycle for the bias addition). Similarly, during the output layer calculation, we run 10 neurons in parallel, which results in 10 multiplications in parallel. W is an 8-bit, the layer input a 16-bit, and the accumulated sum a 32-bit VHDL signed value. This represents parallel stage 3, as illustrated in Figure 8. The output layer is therefore generated in 11 clock cycles (10 cycles for the cumulative weight multiplications and 1 cycle for the bias addition). Note that more parallel calculations are possible, but this increases the required hardware logic resources.
- ReLU calculation module: to implement the ReLU function defined by Equation (2), we pipeline the output of the previous module into a 32-bit intermediate register R and compare its content with 0. If it is negative, we output 0; otherwise, we pass the content of R to the output port. Note that we connect a ReLU module to every neuron and run them in parallel at each of the two layers.
- Maximum classification module: this module receives the 10 neurons of the output layer and identifies the one with the maximum value, which corresponds to the predicted digit for the input image. Although the expected outputs lie in the range 0 to 9, we encode the result in an 8-bit format to allow a possible extension of the architecture to alphabetic characters. In our FPGA implementation, we display this value on the LEDs of the PYNQ-Z2 board.
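The average pooling and reshape modules can be mirrored by a short software reference model. This is a sketch under stated assumptions: the division by four is modeled as a truncating right shift by two bits, and the flattening order is row-major; neither detail is confirmed by the text. The image is indexed as `image[row][column]`, and the function names are hypothetical.

```python
def average_pool_2x2(image):
    """Average each non-overlapping 2x2 block of an image given as image[row][col].

    Assumption: the average is a truncating integer division by 4,
    written here as a right shift by 2 bits."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows, 2):          # one pair of rows at a time
        out_row = []
        for c in range(0, cols, 2):      # one 2x2 cluster per step
            total = (image[r][c] + image[r][c + 1]
                     + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(total >> 2)
        pooled.append(out_row)
    return pooled


def reshape_to_vector(feature_map):
    """Flatten a 2-D map into a 1-D vector (assumed row-major order)."""
    return [value for row in feature_map for value in row]


# Synthetic 28 x 28 test image with 8-bit pixel values (not MNIST data).
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]
pooled = average_pool_2x2(image)      # 14 x 14 map
vector = reshape_to_vector(pooled)    # 196 values
```

The hardware processes seven clusters at once, so a pair of rows is covered in two rounds; the loops here visit the same clusters sequentially and produce the same 14 × 14 result.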

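The neurons calculation module described above can be sketched as an integer multiply-accumulate loop that also counts clock cycles. The model assumes one input value is consumed per cycle while all neurons of the layer accumulate in parallel, followed by one cycle for the bias addition; the range checks stand in for the 8-bit and 32-bit VHDL signed types. The names `check_signed` and `dense_layer_mac` are hypothetical.

```python
def check_signed(value, bits):
    """Raise if value does not fit in a two's complement integer of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits}-bit signed")
    return value


def dense_layer_mac(inputs, weights, biases, in_bits=8, w_bits=8, acc_bits=32):
    """Accumulate weights[i][j] * inputs[i] for every neuron j.

    Returns (sums, cycles). The inner loop over neurons runs in parallel
    in hardware, so only the outer loop and the bias step cost cycles."""
    sums = [0] * len(biases)
    cycles = 0
    for i, p in enumerate(inputs):
        check_signed(p, in_bits)
        for j in range(len(biases)):
            w = check_signed(weights[i][j], w_bits)
            sums[j] = check_signed(sums[j] + w * p, acc_bits)
        cycles += 1                      # one input value per clock cycle
    sums = [check_signed(s + b, acc_bits) for s, b in zip(sums, biases)]
    cycles += 1                          # bias addition
    return sums, cycles


# Toy data: 196 inputs, 32 hidden neurons (values chosen for easy checking).
inputs = [2] * 196
weights = [[1] * 32 for _ in range(196)]
biases = [3] * 32
sums, cycles = dense_layer_mac(inputs, weights, biases)

# Parameter count: hidden weights + hidden biases + output weights + output biases.
n_params = 196 * 32 + 32 + 32 * 10 + 10
```

With these toy values, each hidden sum is 196 × 2 + 3 = 395, the loop reports 197 cycles for the hidden layer, and `n_params` evaluates to 6634, matching the figures quoted in the text.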

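The ReLU and maximum classification modules reduce to a comparison against zero and a search for the largest output. The sketch below assumes that ties are resolved in favor of the lowest index and that the predicted digit is emitted as an 8-bit value; both are illustrative choices, and the function names are hypothetical.

```python
def relu_register(r):
    """Return 0 if the register content is negative, otherwise pass it through."""
    return 0 if r < 0 else r


def max_classification(outputs):
    """Return the index of the largest output value (first index wins ties)."""
    best_index = 0
    for k in range(1, len(outputs)):
        if outputs[k] > outputs[best_index]:
            best_index = k
    return best_index


# Toy output-layer sums for the 10 digit classes.
raw_outputs = [-40, 12, 95, -3, 0, 7, 60, 95, -1, 5]
activated = [relu_register(v) for v in raw_outputs]
predicted_digit = max_classification(activated)
led_pattern = format(predicted_digit & 0xFF, "08b")  # 8-bit encoding for display
```

For these toy values the predicted digit is 2, since index 2 is the first position holding the maximum value 95.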

5. Comparison and Discussion
| Work | Device | LUT | FF | DSP | BRAM | Exec. Time | Freq. (MHz) | Power |
|---|---|---|---|---|---|---|---|---|
| Al-Khaleel et al. [36] | Spartan-6 | 56,864 | 1596 | - | - | - | 160.7 | - |
| Ahn [40] | KC705 | 42,616 | - | 326 | 31.5 | - | - | - |
| Yilmaz et al. [44] | Virtex 7 | 79,322 | 9243 | 134 | - | - | 200 | - |
| Yu et al. [13] | Alveo U50 | 72,813 | 36,663 | 0 | 234 | 34.24 µs | 300.3 | 18.59 W |
| Wang et al. [38] | Cyclone IV | 24,245 | 16,356 | 93 | - | - | - | - |
| Moradi et al. [33] | Spartan 3AN | - | - | - | - | - | 112 | - |
| Khan et al. [31] | Max+ II | - | - | - | - | 72.96 µs | 4.36 | - |
| Giardino et al. [37] | XC7A100T | 15,796 | 106,400 | - | 73 | 41 µs | 300 | 0.975 W |
| This work | Zynq 7 | 20,758 | 4426 | 42 | 3.5 | 2.192 µs | 125 | 0.364 W |
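A few derived figures can help when reading the table. The sketch below uses only rows that report both execution time and power, and it assumes that the reported power is the average draw during inference and that the execution time refers to a single image. Since the designs target different devices and networks, the ratios are indicative only.

```python
# Execution time (µs) and power (W) copied from the comparison table.
results = {
    "Yu et al. [13]": {"time_us": 34.24, "power_w": 18.59},
    "Giardino et al. [37]": {"time_us": 41.0, "power_w": 0.975},
    "This work": {"time_us": 2.192, "power_w": 0.364},
}
FREQ_MHZ = 125.0  # clock frequency reported for this work

ours = results["This work"]

# Time (µs) x frequency (MHz) gives the number of clock cycles.
cycles = ours["time_us"] * FREQ_MHZ

for name, row in results.items():
    # Time (µs) x power (W) gives energy in µJ (assumes constant average power).
    row["energy_uj"] = row["time_us"] * row["power_w"]
    # How many times longer this design takes than the proposed one.
    row["time_ratio"] = row["time_us"] / ours["time_us"]

for name, row in results.items():
    print(f"{name}: {row['energy_uj']:.3f} µJ, time ratio {row['time_ratio']:.2f}x")
print(f"This work: {cycles:.0f} clock cycles at {FREQ_MHZ:.0f} MHz")
```

Under these assumptions, the proposed design completes in 274 clock cycles and uses roughly 0.8 µJ per inference.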
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ANN | Artificial Neural Networks |
| CPU | Central Processing Unit |
| FPGA | Field Programmable Gate Arrays |
| DSP | Digital Signal Processing |
| MLP | Multi-Layer Perceptron |
| VHDL | VHSIC Hardware Description Language |
| AI | Artificial Intelligence |
| GPU | Graphics Processing Units |
| ROM | Read-Only Memory |
| CNN | Convolutional Neural Network |
| OCR | Optical Character Recognition |
| ICFHR | International Conference on Frontiers of Handwriting Recognition |
| ICDAR | International Conference on Document Analysis and Recognition |
| SVM | Support Vector Machines |
| AOCR | Arabic Optical Character Recognition |
| DBN | Deep Belief Network |
| ReLU | Rectified Linear Unit |
| MSE | Mean Squared Error |
| HDL | Hardware Description Language |
References
1. Morris, G. Central processing unit (CPU). In Encyclopedia of Computer Science 2003; John Wiley & Sons: Chichester, UK, 2003; pp. 199–200.
2. Kehoe, P.; Smeaton, A.F. Using Graphics Processor Units (GPUs) for Automatic Video Structuring. In Proceedings of the Eighth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS ’07), Santorini, Greece, 6–8 June 2007; p. 18.
3. Wang, Y.; Gao, L.; Yang, H. FPGA Programmable Logic Block Architecture with High-Density MAC for Deep Learning Inference. Electronics 2026, 15, 801.
4. Madani, M.; Assad, S.E.; Dridi, F.; Lozi, R. Enhanced design and hardware implementation of a chaos-based block cipher for image protection. J. Differ. Equ. Appl. 2022, 29, 1408–1428.
5. Sadeghi, S.; Cpi, P. A Comprehensive Review of Digital Signal Processing (DSP) Algorithms and Their Applications in Telecommunication and Wireless Communication Systems. Int. J. Eng. Technol. Sci. 2025, 2025, 60.
6. Norgbe, C.; Madani, M.; Bourennane, E.B. Privacy-Preserving for Medical Images Using Cryptosystem and Convolutional Autoencoder. In Proceedings of the 2025 9th International Conference on Computer, Software and Modeling (ICCSM), Rome, Italy, 3–5 July 2025; pp. 40–45.
7. Ramchoun, H.; Idrissi, M.; Ghanou, Y.; Ettaouil, M. Multilayer Perceptron: Architecture Optimization and Training. Int. J. Interact. Multimed. Artif. Intell. 2016, 4, 26–30.
8. Hojjat, K. MNIST Dataset. Available online: https://www.kaggle.com/datasets/hojjatk/mnist-dataset (accessed on 18 May 2026).
9. TUL Embedded. PYNQ-Z2 Development Board—Product Specification. Available online: https://www.tulembedded.com/fpga/ProductsPYNQ-Z2.html (accessed on 18 May 2026).
10. Madani, M.; Benkhaddra, I.; Tanougast, C.; Chitroub, S.; Sieler, L. Digital Implementation of an Improved LTE Stream Cipher SNOW 3G based on Hyperchaotic PRNG. In Security and Communication Networks; John Wiley & Sons: Hoboken, NJ, USA, 2017; Volume 2017, 15p.
11. Cherifi, R.; Madani, M. Secure and Efficient Tele-Radiography Based on the Fusion of a Convolutional Autoencoder and Chaotic Latent Encryption. J. Image Graph. 2026, 14, 49–57.
12. Preetha, S.; Afrid, I.M.; Karthik Hebbar, P.; Nishchay, S.K. Machine Learning for Handwriting Recognition. Int. J. Comput. 2020, 38, 93–101.
13. Yu, K.; Kim, M.; Choi, J.R. Memory-Tree Based Design of Optical Character Recognition in FPGA. Electronics 2023, 12, 754.
14. AlKendi, W.; Gechter, F.; Heyberger, L.; Guyeux, C. Advancements and Challenges in Handwritten Text Recognition: A Comprehensive Survey. J. Imaging 2024, 10, 18.
15. Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A. Reading Digits in Natural Images with Unsupervised Feature Learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning; Springer: Berlin/Heidelberg, Germany, 2011.
16. Surya, N.R.; Afseena, S. Handwritten Character Recognition—A Review. Int. J. Sci. Res. Publ. 2015, 5, 3.
17. Aljarrah, I.A.; Al-Khaleel, O.D.; Mhaidat, K.M.; Alrefai, M.; Alzu’bi, A.; Rabab’ah, M. Automated System for Arabic Optical Character Recognition with Lookup Dictionary. J. Emerg. Technol. Web Intell. 2012, 4, 362–370.
18. Inad, A.; Osama, A.K.; Khaldoon, M.; Mu’ath, A.; Abdullah, A.; Mohammad, R. Automated system for Arabic optical character recognition. In Proceedings of the Association for Computing Machinery, New York, NY, USA, 11–14 November 2012.
19. Haidar, A.; John, G.; Hisham, A. A Real-time DSP-Based Optical Character Recognition System for Isolated Arabic characters using the TI TMS320C6416T. In Proceedings of the 2008 IAJC-IJME International Conference, Nashville, TN, USA, 17–19 November 2008.
20. Bhagat, M.S.S.; Joshi, M.A.R.; Gajbhiye, M.V.S.; Nandanwar, M.S.R.; Ingle, P.M. Handwritten Character Detection Using Optical Character Recognition Method. Int. J. Res. Appl. Sci. Eng. Technol. 2018, 6, 4724–4726.
21. Rajabi, M.; Nematbakhsh, N.; Monadjemi, A. A New Decision Tree for Recognition of Persian Handwritten Characters. Int. J. Comput. Appl. 2012, 44, 52–58.
22. Obaid, A.; El-Bakry, H.; Eldosuky, M.; Shehab, A. Handwritten Text Recognition System based on Neural Network. Int. J. Adv. Res. Comput. Sci. Technol. 2016, 4, 72–77.
23. Meenu, M.; Jyothi, R.L. Handwritten Character Recognition: A Comprehensive Review on Geometrical Analysis. IOSR J. Comput. Eng. (IOSR-JCE) 2015, 17, 83–88.
24. Rao, P.S.; Aditya, J.N.H.S. Handwriting Recognition—“Offline” Approach. In Proceedings of the Research School of Computer Science, Stockholm, Sweden, 21–25 June 2014.
25. Rosyda, S.S.; Purboyo, T.W. A Review of Various Handwriting Recognition Methods. Int. J. Appl. Eng. Res. 2018, 13, 1155–1164.
26. Zhu, W. Classification of MNIST Handwritten Digit Database using Neural Network. In Proceedings of the Research School of Computer Science 2018, Canberra, Australia, 20 July 2018; Available online: https://api.semanticscholar.org/CorpusID:202741200 (accessed on 18 May 2026).
27. Shamim, S.M.; Miah, M.B.; Sarker, A.; Rana, M.; Jobair, A. Handwritten Digit Recognition Using Machine Learning Algorithms. Glob. J. Sci. Technol. 2018, 18, 29–39.
28. Darmatasia; Fanany, M.I. Handwriting recognition on form document using convolutional neural network and support vector machines (CNN-SVM). In Proceedings of the 2017 5th International Conference on Information and Communication Technology (ICoIC7), Melaka, Malaysia, 17–19 May 2017; pp. 1–6.
29. Fahmy, M.M.M.; Ali, S.A. Automatic recognition of handwritten arabic characters using their geometrical features. Stud. Inform. Control 2001, 10, 81–98.
30. Ali, A.H.; Mohammed, M.A.; Ahmed, M.A. Character Recognition By Implementing FPGA-Based Artificial Neural Network. Mesopotamian J. Comput. Sci. 2021, 2021, 13–17.
31. Khan, F.; Uppal, M.; Song, W.C.; Kang, M.J.; Mirza, A. FPGA Implementation of a Neural Network for Character Recognition. Adv. Neural Netw. 2006, 2006, 1357–1365.
32. Rahardjo, P.M.; dan Nanang Sulistyanto, M.R. The Implementation of Feedforward Backpropagation Algorithm for Digit Handwritten Recognition in a Xilinx Spartan-3. J. EECCIS 2010, IV, 2.
33. Moradi, M.; Pourmina, M.A.; Razzazi, F. FPGA-Based Farsi Handwritten Digit Recognition System. Int. J. Simul. Syst. Sci. Technol. 2010, 11, 17–22.
34. Toosizadeh, N.; Eshghi, M. Design and implementation of a new persian digits ocr algorithm on fpga chips. In Proceedings of the 13th Conference, European Signal Processing (EUSIPCO2005), Antalya, Turkey, 4–8 September 2005.
35. Al-Marakeby, A.; Kimura, F.; Zaki, M.; Rashid, A. Design of an Embedded Arabic Optical Character Recognition. J. Signal Process. Syst. 2013, 70, 249–258.
36. Al-Khaleel, O.; Aljarrah, I.; Idries, A.; Mhaidat, K. Hardware Implementation of Web Based Arabic Optical Character Recognition Units. J. Emerg. Technol. Web Intell. 2014, 6, 210–219.
37. Giardino, D.; Matta, M.; Silvestri, F.; Spanò, S.; Trobiani, V. FPGA Implementation of Hand-written Number Recognition Based on CNN. Int. J. Adv. Sci. Eng. Inf. Technol. 2019, 9, 167–171.
38. Wang, L.; Yang, Z.; Xu, G.R.; lan Fu, M.; Wang, Y. Design of FPGA-based Handwriting Image Recognition System. Adv. Model. Anal. B 2017, 60, 426–437.
39. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
40. Ahn, B. Real-time video object recognition using convolutional neural network. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–7.
41. Kokkinis, A.; Siozios, K. Fast Resource Estimation of FPGA-Based MLP Accelerators for TinyML Applications. Electronics 2025, 14, 247.
42. Zhao, Y.; Li, M.; Zhang, Y.; Lin, Q.; Chen, Z. Research on FPGA timing optimization methods with large on-chip memory resource utilization in PCIe DMA. In Proceedings of the 2016 CIE International Conference on Radar (RADAR), Guangzhou, China, 10–13 October 2016; pp. 1–4.
43. Giasemis, F.I.; Lončar, V.; Granado, B.; Gligorov, V.V. Comparative Analysis of FPGA and GPU Performance for Machine Learning-Based Track Reconstruction at LHCb. In Proceedings of the 2025 23rd IEEE Interregional NEWCAS Conference (NEWCAS), Paris, France, 22–25 June 2025; Available online: http://arxiv.org/abs/2502.02304 (accessed on 26 May 2026).
44. Yilmaz, A.R.; Erkmen, B.; Yavuz, O. Accelerating handwritten signature recognition using intelligent algorithm based embedded system. Sigma J. Eng. Nat. Sci. Sigma Mühendislik ve Fen Bilim. Derg. 2016, 34, 393–405.







| Category | References | Strong Sides (Advantages) | Weak Sides (Limitations) |
|---|---|---|---|
| Literature surveys | [12,13,14] | Comprehensive data compilation | Absence of hardware metrics |
| | | Global methodology overviews | No direct algorithmic validation |
| | | Multi-lingual dataset indexing | Restricted to high-level analysis |
| Traditional OCR and structural pipelines | [17,18,20,23,24] | Low-complexity geometric logic | Manual feature engineering required |
| | | Effective background noise removal | Weak adaptation to custom styles |
| | | Deterministic and fast execution | Highly sensitive to font distortions |
| Embedded software | [19,22,29] | High algorithmic flexibility | Strict sequential code execution |
| | | Straightforward software patching | Limited computational throughput |
| | | Low development abstraction barrier | Hardware functional bottlenecks |
| Deep learning models | [15,16,25,26,27,28] | State-of-the-art classification accuracy | High computational footprint |
| | | Automated feature extraction | Extensive memory consumption |
| | | Robustness against input noise | Requires high-end CPU/GPU nodes |
| Early hardware and lightweight FPGA matrix models | [21,30,31,32,33] | Deterministic execution latency | Restricted to tiny binary grids |
| | | High internal parallel processing | Oversimplified neural topologies |
| | | Low operating energy consumption | Low generalization capabilities |
| Modern customized FPGA systems | [34,35,36,37,38] | Superior real-time processing throughput | Highly static hardware architecture |
| | | Tailored internal resource balancing | Time-consuming coding phases |
| | | Optimized power efficiency at the edge | Complex co-design implementation |

| VHDL Module | Input Width | Output Width |
|---|---|---|
| Image load | bit | bit |
| Average pooling | bit | bit |
| Reshape | bit | bit |
| Neurons calculation 1 | bit | bit |
| ReLU 1 | bit | bit |
| Neurons calculation 2 | bit | bit |
| ReLU 2 | bit | bit |
| Maximum classification | bit | bit |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Madani, M.; Bourennane, E.-B. FPGA-Based Implementation of Artificial Neural Network for Accelerated Handwritten Digit Recognition. Electronics 2026, 15, 2384. https://doi.org/10.3390/electronics15112384

