Multimodal Handwritten Exam Text Recognition Based on Deep Learning
Abstract
1. Introduction
- (1) A Multimodal Handwritten Text Adaptive Recognition algorithm, MHTR, is proposed for examination scenarios. By integrating a Handwritten Character Classification Module with a Handwritten Text Adaptive Recognition Module, the method enables effective recognition of mixed handwritten content, including Chinese characters, digits, and mathematical expressions.
- (2) A Context-Aware Recognition Optimization Module is designed to incorporate local semantic and structural information, effectively mitigating misrecognition caused by similar character shapes and diverse handwriting styles.
- (3) A heterogeneous integrated handwritten text dataset tailored to examination scenarios is constructed, covering character types such as Chinese characters, digits, and mathematical symbols, with high structural complexity and stylistic diversity.
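The two-stage design in contribution (1) can be sketched as a classify-then-dispatch pipeline: a classification module first predicts the content type of each text region, and each type is then routed to its dedicated recognizer. The sketch below is purely illustrative; the function and module names (`classify_region`, `RECOGNIZERS`, `mhtr_recognize`) are hypothetical stand-ins, not the paper's implementation.

```python
def classify_region(region):
    """Stand-in for the Handwritten Character Classification Module.
    Here the prediction is faked with a tag stored on the region."""
    return region["type"]  # one of: "chinese", "digit", "formula"

def recognize_chinese(region):
    return f"<chinese:{region['id']}>"

def recognize_digit(region):
    return f"<digit:{region['id']}>"

def recognize_formula(region):
    return f"<formula:{region['id']}>"

# Each content type is routed to its dedicated recognition module.
RECOGNIZERS = {
    "chinese": recognize_chinese,
    "digit": recognize_digit,
    "formula": recognize_formula,
}

def mhtr_recognize(regions):
    """Classify each region, then apply the matching recognizer."""
    return [RECOGNIZERS[classify_region(r)](r) for r in regions]

regions = [
    {"id": 0, "type": "chinese"},
    {"id": 1, "type": "formula"},
    {"id": 2, "type": "digit"},
]
print(mhtr_recognize(regions))
```

The dispatch-table pattern keeps the classifier and the per-type recognizers decoupled, which mirrors how the paper's adaptive recognition module can swap in specialized sub-models per character type.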
2. Materials and Methods
2.1. Dataset Construction
2.1.1. Self-Constructed Handwritten Exam Paper Dataset
2.1.2. Public MNIST Dataset
2.1.3. Public CASIA-HWDB Dataset
2.1.4. Public CROHME Dataset
2.2. Multimodal Handwritten Text Adaptive Recognition Algorithm
2.2.1. Handwritten Character Classification Module
- (1) Embedding Layer
- (2) Transformer Encoder
- (3) MLP Head
2.2.2. Handwritten Text Adaptive Recognition Module
Handwritten Chinese Character Recognition Module
- (1) Backbone Network
- (2) Detection and Recognition Module
- (3) Reading Order Prediction Module
- (4) Graph-Based Decoding Algorithm
Handwritten Math Formula Recognition Module
- (1) Multi-Scale Counting Module
- (2) Counting-Combined Attention Decoder
- (3) Loss Function
Handwritten Digit Recognition Module
- (1) Proposed Convolutional Token Embedding Mechanism
- (2) Proposed Convolutional Transformer Block Mechanism
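A convolutional token embedding replaces the fixed patch splitting of a plain Vision Transformer with a strided convolution, so the resulting token grid follows the standard convolution output-size formula. The sketch below shows that arithmetic; the kernel, stride, and padding values are illustrative defaults, not the paper's configuration.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution along one axis:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def token_grid(h, w, kernel=7, stride=4, padding=2):
    """Token grid produced by a strided convolutional embedding.
    kernel/stride/padding here are illustrative, not the paper's values."""
    return (conv_output_size(h, kernel, stride, padding),
            conv_output_size(w, kernel, stride, padding))

# A 28x28 digit image (MNIST-sized) embedded with a 7x7 kernel,
# stride 4, padding 2 yields a 7x7 grid, i.e. 49 tokens.
print(token_grid(28, 28))  # -> (7, 7)
```

Because the stride controls the downsampling, the same formula shows why convolutional embeddings produce far fewer tokens than pixel-level attention would, keeping the Transformer blocks tractable.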
2.2.3. Context-Aware Handwritten Text Recognition Optimization Module
Context Feature Encoding Based on Bi-LSTM
LSTM Context Feature Decoding Based on Attention Mechanism
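The two components above combine per-timestep context features from the Bi-LSTM encoder with an attention-weighted decoder. A minimal pure-Python sketch of the attention step follows; dot-product scoring is an assumption here (the paper's exact scoring function is not restated in this outline), and all names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, context_features):
    """Dot-product attention over encoder context features.
    Returns the attention weights and the weighted context vector."""
    scores = [sum(d * c for d, c in zip(decoder_state, feat))
              for feat in context_features]
    weights = softmax(scores)
    dim = len(context_features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, context_features))
               for i in range(dim)]
    return weights, context

# Three 2-D context features; the decoder state aligns with the second one,
# so it should receive the largest attention weight.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([0.0, 2.0], features)
print([round(w, 3) for w in weights])
```

At each decoding step the weights re-focus on the encoder timesteps most relevant to the current output character, which is how local context helps disambiguate similar character shapes.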
3. Experiment and Performance Analysis
3.1. Experimental Configuration
3.2. Experimental Indicators
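The comparison tables in this section report Character Recognition Accuracy. One common way to compute such a metric is one minus the normalized edit distance between the predicted and ground-truth strings; the paper's exact definition may differ, so the sketch below is an assumption, not the authors' evaluation code.

```python
def edit_distance(pred, truth):
    """Levenshtein distance via dynamic programming."""
    m, n = len(pred), len(truth)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def char_accuracy(pred, truth):
    """1 - normalized edit distance, clamped to [0, 1]."""
    if not truth:
        return 1.0 if not pred else 0.0
    return max(0.0, 1.0 - edit_distance(pred, truth) / len(truth))

print(char_accuracy("12a45", "12345"))  # one substitution out of five -> 0.8
```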
3.3. Comparison Experiments of Different Recognition Models
3.4. Comparison Experiments of Text Recognition Optimization Models
3.5. Visualization of Recognition Results for the MHTR
3.6. Visualization of Recognition Results for the Text Optimization Model
4. Discussion
4.1. Model Innovations and Advantages
4.2. Limitations and Future Work
4.2.1. Limitations
- (1) The experiments were conducted only on horizontally written text and did not address handwriting at arbitrary angles, which limits the model's applicability. In addition, recognition performance remains limited in extreme scenarios, such as severe writing distortion or heavy noise interference.
- (2) Current research on exam-paper handwriting recognition mainly relies on a single visual modality from offline papers (handwritten text in images), neglecting other modal information present in the papers, such as paper quality and pen pressure, that may affect recognition.
4.2.2. Future Work
- (1) Future research should incorporate multi-angle, multi-orientation handwritten text data to improve the model's adaptability to arbitrary writing directions, and should build a more diverse dataset covering extreme handwriting styles and complex scenarios to strengthen generalization under non-ideal conditions.
- (2) Future work on exam-paper handwriting recognition can explore multi-modal data fusion, combining image information with other sensor data (such as pressure and tilt sensors) to further improve recognition accuracy.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Name | Configuration |
---|---|
Operating system | Windows 11 |
CPU | Intel Xeon Platinum 8352V |
GPU | NVIDIA GeForce RTX 4090 (24 GB) |
CUDA | 11.8 |
Python | Python 3.10 |
PyTorch | PyTorch 2.1.2 |
Hyperparameter | Value |
---|---|
Image Size | |
Weight Decay | 0.005 |
Batch Size | 32 |
Learning Rate | 0.01 |
Number of Iterations | 200 |
Model | Character Recognition Accuracy |
---|---|
CRNN-CTC | 80.53% |
Transformer | 82.82% |
MHTR (Proposed) | 86.63% |
Model | Character Recognition Accuracy |
---|---|
CRNN | 84.63% |
SATR | 85.32% |
MFCNN | 85.58% |
MHTR (Proposed) | 86.63% |
Optimization Model (Proposed) | 88.64% |
Share and Cite
Shi, H.; Zhu, Z.; Zhang, C.; Feng, X.; Wang, Y. Multimodal Handwritten Exam Text Recognition Based on Deep Learning. Appl. Sci. 2025, 15, 8881. https://doi.org/10.3390/app15168881