Search Results (69)

Search Parameters:
Keywords = handwritten text

15 pages, 2124 KiB  
Article
Toward Building a Domain-Based Dataset for Arabic Handwritten Text Recognition
by Khawlah Alhefdhi, Abdulmalik Alsalman and Safi Faizullah
Electronics 2025, 14(12), 2461; https://doi.org/10.3390/electronics14122461 - 17 Jun 2025
Viewed by 446
Abstract
The problem of automatic recognition of handwritten text has recently been widely discussed in the research community. Handwritten text recognition is considered a challenging task for cursive scripts, such as Arabic-language scripts, due to their complex properties. Although the demand for automatic text recognition is growing, especially to assist in digitizing archival documents, few datasets are available for Arabic handwritten text compared to other languages. In this paper, we present novel work on building the Real Estate and Judicial Documents (REJD) dataset, which aims to facilitate the recognition of Arabic text in millions of archived documents. This paper also discusses the use of Optical Character Recognition and deep learning techniques; it is intended as the initial version in a series of experiments and enhancements designed to achieve optimal results.

16 pages, 3640 KiB  
Article
Advanced Dense Text Detection in Graded Examinations Leveraging Chinese Character Components
by Renyuan Liu, Yunyu Shi, Xian Tang and Xiang Liu
Appl. Sci. 2025, 15(4), 1818; https://doi.org/10.3390/app15041818 - 10 Feb 2025
Viewed by 1068
Abstract
The dense text detection and segmentation of Chinese characters has always been a research hotspot due to complex backgrounds and diverse scenarios. In the field of education, the detection of handwritten Chinese characters is affected by background noise, texture interference, and similar factors. Especially in low-quality handwriting, character overlap or occlusion blurs character boundaries, which increases the difficulty of detection and segmentation. In this paper, an improved EAST network, CEE (Components-ECA-EAST Network), which fuses an attention mechanism with a feature pyramid structure, is proposed based on an analysis of the structure of Chinese character mini-components. The ECA (Efficient Channel Attention) mechanism is incorporated during the feature extraction phase. In the feature fusion stage, convolutional features are extracted from a self-constructed mini-component dataset and fused with the feature pyramid in a cascade manner; finally, Dice Loss is used as the regression task loss function. These improvements comprehensively raise the network's performance in detecting and segmenting the mini-components and subtle strokes of handwritten Chinese characters. The CEE model was tested on the self-constructed dataset with an accuracy of 84.6% and a mini-component mAP of 77.6%, improvements of 7.4% and 8.4%, respectively, over the original model. The constructed dataset and improved model are well suited to applications such as writing grade examinations and represent an important exploration in the development of educational intelligence.
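The Dice loss used above for the regression task has a compact definition: one minus twice the overlap of prediction and target over their combined size. A minimal, framework-free sketch on flattened binary masks (real implementations operate on tensors of soft predictions):

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss for binary segmentation masks given as flat 0/1 lists.

    dice = 2*|P ∩ T| / (|P| + |T|); loss = 1 - dice.
    The eps term keeps the ratio defined for empty masks.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

# Perfect overlap -> loss ~ 0; disjoint masks -> loss ~ 1.
mask = [1, 1, 0, 0]
print(round(dice_loss(mask, mask), 4))                   # 0.0
print(round(dice_loss([1, 0, 0, 0], [0, 0, 0, 1]), 4))   # 1.0
```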

18 pages, 18456 KiB  
Article
iForal: Automated Handwritten Text Transcription for Historical Medieval Manuscripts
by Alexandre Matos, Pedro Almeida, Paulo L. Correia and Osvaldo Pacheco
J. Imaging 2025, 11(2), 36; https://doi.org/10.3390/jimaging11020036 - 25 Jan 2025
Cited by 1 | Viewed by 1662
Abstract
The transcription of historical manuscripts aims at making our cultural heritage more accessible to experts and also to the larger public, but it is a challenging and time-intensive task. This paper contributes an automated solution for text layout recognition, segmentation, and recognition to speed up the transcription process of historical manuscripts. The focus is on transcribing Portuguese municipal documents from the Middle Ages in the context of the iForal project, including the contribution of an annotated dataset of Portuguese medieval documents, notably a corpus of 67 Portuguese royal charters. The proposed system can accurately identify document layouts, isolate the text, and segment and transcribe it. The layout recognition model achieved 0.98 mAP@0.50 and 0.98 precision, while the text segmentation model achieved 0.91 mAP@0.50, detecting 95% of the lines. The text recognition model achieved an 8.1% character error rate (CER) and a 25.5% word error rate (WER) on the test set. These results can then be validated by palaeographers with less effort, contributing to achieving high-quality transcriptions faster. Moreover, the models developed can serve as a basis for models that perform well on other historical handwriting styles, notably via transfer learning. The contributed dataset has been made available in the HTR United catalogue, which includes training datasets to be used for automatic transcription or segmentation models. The models developed can be used, for instance, on the eScriptorium platform, which is used by a vast community of experts.
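The CER and WER figures reported above are both edit-distance rates: the Levenshtein distance between predicted and reference transcriptions, normalized by the reference length, computed over characters or words respectively. A minimal sketch (the example strings are invented, not drawn from the iForal corpus):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: char edits / reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word error rate: word edits / reference word count."""
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)

print(cer("carta", "corta"))                 # 1 substitution / 5 chars = 0.2
print(wer("el rei manda", "el rey manda"))   # 1 of 3 words wrong
```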
(This article belongs to the Section Document Analysis and Processing)

17 pages, 3986 KiB  
Article
Efficient Image Inpainting for Handwritten Text Removal Using CycleGAN Framework
by Somanka Maiti, Shabari Nath Panuganti, Gaurav Bhatnagar and Jonathan Wu
Mathematics 2025, 13(1), 176; https://doi.org/10.3390/math13010176 - 6 Jan 2025
Viewed by 2027
Abstract
With the recent rise of deep learning techniques, image inpainting, the process of restoring missing or corrupted regions in images, has witnessed significant advancements. Although state-of-the-art models are effective, they often fail to inpaint complex missing areas, especially when handwritten occlusions are present in the image. To address this issue, an image inpainting model based on a residual CycleGAN is proposed. The generator takes as input the image occluded by handwritten patches and generates a restored image, which the discriminator then compares with the ground-truth image to judge whether it is real or fake. This adversarial trade-off between generator and discriminator drives training and yields superior reconstructed images. Extensive experiments and analyses confirm that the proposed method generates inpainted images with superior visual quality and outperforms state-of-the-art deep learning approaches.
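For context, the standard CycleGAN objective combines an adversarial loss for each translation direction with a cycle-consistency term; the residual variant above presumably builds on this form. A sketch of the original formulation (not the authors' exact losses):

```latex
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) =
  \mathbb{E}_{y}\left[\log D_Y(y)\right] +
  \mathbb{E}_{x}\left[\log\bigl(1 - D_Y(G(x))\bigr)\right]

\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] +
  \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right]

\mathcal{L} = \mathcal{L}_{\mathrm{GAN}}(G, D_Y)
            + \mathcal{L}_{\mathrm{GAN}}(F, D_X)
            + \lambda\,\mathcal{L}_{\mathrm{cyc}}(G, F)
```

Here G maps occluded images toward clean ones, F maps back, and the cycle term ties the two generators together so that content survives the round trip.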

18 pages, 4420 KiB  
Article
Machine Learning Approach for Arabic Handwritten Recognition
by A. M. Mutawa, Mohammad Y. Allaho and Monirah Al-Hajeri
Appl. Sci. 2024, 14(19), 9020; https://doi.org/10.3390/app14199020 - 6 Oct 2024
Cited by 2 | Viewed by 3745
Abstract
Text recognition is an important area of the pattern recognition field. Natural language processing (NLP) and pattern recognition have been utilized efficiently in script recognition. Much research has been conducted on handwritten script recognition. However, research on handwritten text recognition for the Arabic language has received little attention compared with other languages. Therefore, it is crucial to develop a new model that can recognize Arabic handwritten text. Most of the existing models used to recognize Arabic text are based on traditional machine learning techniques. Therefore, we implemented a new model using deep learning techniques by integrating two deep neural networks. In the new model, the architecture of the Residual Network (ResNet) model is used to extract features from raw images. Then, Bidirectional Long Short-Term Memory (BiLSTM) and connectionist temporal classification (CTC) are used for sequence modeling. Our system improved the recognition rate of Arabic handwritten text compared to other models of a similar type, with a character error rate of 13.2% and a word error rate of 27.31%. In conclusion, the domain of Arabic handwriting recognition is advancing swiftly with the use of sophisticated deep learning methods.
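After the BiLSTM produces per-frame class scores, CTC decoding collapses the frame-wise predictions into a character sequence. The simplest (greedy, best-path) variant merges repeated classes and then drops the blank symbol; a sketch:

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse per-frame argmax class ids into an output sequence:
    merge adjacent repeats, then drop blanks (CTC best-path decoding)."""
    out = []
    prev = None
    for c in frame_ids:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return out

# Frames: blank, 5, 5, blank, 5, 7, 7  ->  [5, 5, 7]
# (the blank between the two 5s is what keeps them distinct)
print(ctc_greedy_decode([0, 5, 5, 0, 5, 7, 7]))  # [5, 5, 7]
```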
(This article belongs to the Special Issue Applied Intelligence in Natural Language Processing)

13 pages, 1585 KiB  
Article
Analyzing Arabic Handwriting Style through Hand Kinematics
by Vahan Babushkin, Haneen Alsuradi, Muhamed Osman Al-Khalil and Mohamad Eid
Sensors 2024, 24(19), 6357; https://doi.org/10.3390/s24196357 - 30 Sep 2024
Cited by 3 | Viewed by 1877
Abstract
Handwriting style is an important aspect affecting the quality of handwriting. Adhering to one style is crucial for languages that follow cursive orthography and possess multiple handwriting styles, such as Arabic. The majority of available studies analyze Arabic handwriting style from static documents, focusing only on pure styles. In this study, we analyze handwriting samples with mixed styles, pure styles (Ruq’ah and Naskh), and samples without a specific style, using dynamic features of the stylus and hand kinematics. We propose a model for classifying handwritten samples into four classes based on adherence to style. The stylus and hand kinematics data were collected from 50 participants who wrote an Arabic text containing all 28 letters and covering most Arabic orthography. A parameter search was conducted to find the best hyperparameters for the model, the optimal sliding window length, and the overlap. The proposed model for style classification achieves an accuracy of 88%. An explainability analysis with Shapley values revealed that hand speed, pressure, and pen slant are among the top 12 important features, with other features contributing nearly equally to style classification. Finally, we explore which features are important for Arabic handwriting style detection.
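Windowing the kinematics stream is straightforward: fixed-length windows slide over the samples with a fractional overlap. A sketch (the window length and overlap values here are illustrative, not the paper's tuned ones):

```python
def sliding_windows(samples, length, overlap):
    """Split a kinematics time series into fixed-length windows.

    `length` is the window size in samples; `overlap` is the fraction
    in [0, 1) shared between consecutive windows.
    """
    step = max(1, int(length * (1 - overlap)))
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, step)]

series = list(range(10))
print(sliding_windows(series, 4, 0.5))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```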
(This article belongs to the Special Issue Sensor-Based Behavioral Biometrics)

19 pages, 3640 KiB  
Article
Recognition of Chinese Electronic Medical Records for Rehabilitation Robots: Information Fusion Classification Strategy
by Jiawei Chu, Xiu Kan, Yan Che, Wanqing Song, Kudreyko Aleksey and Zhengyuan Dong
Sensors 2024, 24(17), 5624; https://doi.org/10.3390/s24175624 - 30 Aug 2024
Viewed by 1812
Abstract
Named entity recognition is a critical task in the electronic medical record management system for rehabilitation robots. Handwritten documents often contain spelling errors and illegible handwriting, and healthcare professionals frequently use different terminologies. These issues adversely affect the robot’s judgment and precise operations. Additionally, the same entity can have different meanings in various contexts, leading to category inconsistencies, which further increase the system’s complexity. To address these challenges, a novel medical entity recognition algorithm for Chinese electronic medical records is developed to enhance the processing and understanding capabilities of rehabilitation robots for patient data. This algorithm is based on a fusion classification strategy. Specifically, a preprocessing strategy is proposed according to clinical medical knowledge, which includes redefining entities, removing outliers, and eliminating invalid characters. Subsequently, a medical entity recognition model is developed to identify Chinese electronic medical records, thereby enhancing the data analysis capabilities of rehabilitation robots. To extract semantic information, the ALBERT network is utilized, and BiLSTM and multi-head attention (MHA) networks are combined to capture the dependency relationships between words, overcoming the problem of the same entity having different meanings in different contexts. The CRF network is employed to determine the boundaries of different entities. The research results indicate that the proposed model significantly enhances the recognition accuracy of electronic medical texts by rehabilitation robots, particularly in accurately identifying entities and handling terminology diversity and contextual differences. This model effectively addresses the key challenges faced by rehabilitation robots in processing Chinese electronic medical texts and holds important theoretical and practical value.
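The CRF layer mentioned above picks the highest-scoring tag sequence by combining per-token emission scores with tag-transition scores, decoded with the Viterbi algorithm. A minimal sketch with invented scores (a real CRF learns both score tables from data):

```python
def viterbi(emissions, transitions):
    """Best tag path for a linear-chain CRF.

    emissions: [T][K] per-token tag scores; transitions: [K][K] score
    for moving from tag i to tag j. Returns the argmax tag path.
    """
    K = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag
    back = []                           # backpointers per step
    for emit in emissions[1:]:
        ptr, new = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new.append(score[best_i] + transitions[best_i][j] + emit[j])
        back.append(ptr)
        score = new
    path = [max(range(K), key=lambda j: score[j])]
    for ptr in reversed(back):          # walk backpointers to the start
        path.append(ptr[path[-1]])
    return path[::-1]

# Two tags (0 = Outside, 1 = Entity); transitions favor staying in a span.
emis = [[2.0, 0.0], [0.0, 1.5], [0.0, 1.2]]
trans = [[0.5, -0.5], [-0.5, 1.0]]
print(viterbi(emis, trans))  # [0, 1, 1]
```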
(This article belongs to the Special Issue Dynamics and Control System Design for Robot Manipulation)

16 pages, 5331 KiB  
Article
A Gateway API-Based Data Fusion Architecture for Automated User Interaction with Historical Handwritten Manuscripts
by Christos Spandonidis, Fotis Giannopoulos and Kyriakoula Arvaniti
Heritage 2024, 7(9), 4631-4646; https://doi.org/10.3390/heritage7090218 - 27 Aug 2024
Viewed by 1227
Abstract
To preserve handwritten historical documents, libraries are choosing to digitize them, ensuring their longevity and accessibility. However, the true value of these digitized images lies in their transcription into a textual format. In recent years, various tools have been developed utilizing both traditional and AI-based models to address the challenges of deciphering handwritten texts. Despite their importance, there are still several obstacles to overcome, such as the need for scalable and modular solutions, as well as the ability to cater to a continuously growing user community autonomously. This study focuses on introducing a new information fusion architecture, specifically highlighting the Gateway API. Developed as part of the μDoc.tS research program, this architecture aims to convert digital images of manuscripts into electronic text, ensuring secure and efficient routing of requests from front-end applications to the back end of the information system. The validation of this architecture demonstrates its efficiency in handling a large volume of requests and effectively distributing the workload. One significant advantage of this proposed method is its compatibility with everyday devices, eliminating the need for extensive computational infrastructures. It is believed that the scalability and modularity of this architecture can pave the way for a unified multi-platform solution, connecting diverse user environments and databases.
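At its core, a gateway of this kind maps incoming request paths to back-end services. A toy longest-prefix dispatch sketch; the endpoint and service names are invented for illustration and do not come from the μDoc.tS system:

```python
# Hypothetical route table: every name here is illustrative only.
ROUTES = {
    "/transcribe": "htr-backend",
    "/layout":     "layout-backend",
    "/users":      "auth-backend",
}

def route(path):
    """Resolve a request path to a back end by longest matching prefix."""
    matches = [p for p in ROUTES if path.startswith(p)]
    if not matches:
        raise LookupError(f"no back end for {path}")
    return ROUTES[max(matches, key=len)]

print(route("/transcribe/page/17"))  # htr-backend
```

A production gateway adds authentication, load balancing across replicas of each back end, and request queueing, but the dispatch step is this lookup.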

26 pages, 12966 KiB  
Article
Optical Medieval Music Recognition—A Complete Pipeline for Historic Chants
by Alexander Hartelt, Tim Eipert and Frank Puppe
Appl. Sci. 2024, 14(16), 7355; https://doi.org/10.3390/app14167355 - 20 Aug 2024
Cited by 2 | Viewed by 1354
Abstract
Manual transcription of music is tedious work, which can be greatly facilitated by optical music recognition (OMR) software. However, OMR software is error-prone, in particular for older handwritten documents. This paper introduces and evaluates a pipeline that automates the entire OMR workflow in the context of the Corpus Monodicum project, enabling the transcription of historical chants. In addition to typical OMR tasks such as staff line detection, layout detection, and symbol recognition, the rarely addressed tasks of text and syllable recognition and the assignment of syllables to symbols are tackled. For quantitative and qualitative evaluation, we use documents written in the square notation developed in the 11th–12th centuries, but the methods apply to many other notations as well. Quantitative evaluation measures the number of interventions necessary for correction: about 0.4% for layout recognition including the division of text into chants, 2.4% for symbol recognition including pitch and reading order, and 2.3% for syllable alignment with correct text and symbols. Qualitative evaluation showed an efficiency gain over manual transcription with an elaborate tool by a factor of about 9. In a second use case with printed chants in similar notation from the “Graduale Synopticum”, the evaluation results for symbols are much better, except for syllable alignment, indicating the difficulty of this task.

15 pages, 5521 KiB  
Article
A Historical Handwritten French Manuscripts Text Detection Method in Full Pages
by Rui Sang, Shili Zhao, Yan Meng, Mingxian Zhang, Xuefei Li, Huijie Xia and Ran Zhao
Information 2024, 15(8), 483; https://doi.org/10.3390/info15080483 - 14 Aug 2024
Viewed by 1419
Abstract
Historical handwritten manuscripts pose challenges to automated recognition techniques due to their unique handwriting styles and cultural backgrounds. To address misdetection and omission of complex text words and insufficient detection of wide-pitch curved text, this study proposes a high-precision text detection method based on an improved YOLOv8s. First, the Swin Transformer replaces C2f at the end of the backbone network to counter the loss of fine-grained information and insufficient feature learning in text word detection. Second, the DySample (Dynamic Upsampling) method retains more detailed target features and overcomes the information loss of traditional upsampling, enabling text detection for dense targets. Then, the LSK (Large Selective Kernel) module is added to the detection head to dynamically adjust the receptive field of feature extraction, handling words with extreme aspect ratios, unfocused small text, and complex-shaped text. Finally, because the CIoU (Complete Intersection over Union) loss in bounding-box regression is ambiguous for aspect ratio, insensitive to size changes, and weakly correlated with target coordinates, the Gaussian Wasserstein Distance (GWD) is introduced into the regression loss to measure the similarity between two bounding boxes and obtain high-quality boxes. Compared with state-of-the-art methods, the proposed method achieves optimal performance in text detection, with precision and mAP@0.5 reaching 86.3% and 82.4%, which are 8.1% and 6.7% higher than the original method, respectively. The contribution of each module is verified by ablation experiments. The experimental results show that the proposed method can effectively handle complex text detection and provides a powerful technical means for historical manuscript reproduction.
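For axis-aligned boxes, modeling each box (cx, cy, w, h) as a 2D Gaussian with mean (cx, cy) and covariance diag((w/2)², (h/2)²) gives the squared 2-Wasserstein distance in closed form; unlike IoU-based losses, it stays informative when boxes do not overlap. A sketch (detection papers typically apply a further nonlinear transform before using this as a loss):

```python
def gwd(box1, box2):
    """Squared 2-Wasserstein distance between axis-aligned boxes
    (cx, cy, w, h) modeled as Gaussians N([cx, cy], diag((w/2)^2, (h/2)^2)).

    Closed form: center distance plus a width/height mismatch term.
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    center = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    shape = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0
    return center + shape

# Identical boxes -> 0; the distance grows smoothly as boxes drift
# apart, whereas IoU is flat at 0 for every non-overlapping pair.
print(gwd((0, 0, 4, 2), (0, 0, 4, 2)))  # 0.0
print(gwd((0, 0, 4, 2), (3, 0, 4, 2)))  # 9.0
```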

15 pages, 529 KiB  
Article
A Pix2Pix Architecture for Complete Offline Handwritten Text Normalization
by Alvaro Barreiro-Garrido, Victoria Ruiz-Parrado, A. Belen Moreno and Jose F. Velez
Sensors 2024, 24(12), 3892; https://doi.org/10.3390/s24123892 - 16 Jun 2024
Cited by 2 | Viewed by 1980
Abstract
In the realm of offline handwritten text recognition, numerous normalization algorithms have been developed over the years to serve as preprocessing steps prior to applying automatic recognition models to scanned handwritten text images. These algorithms have demonstrated effectiveness in enhancing the overall performance of recognition architectures. However, many of these methods rely heavily on heuristic strategies that are not seamlessly integrated with the recognition architecture itself. This paper introduces the use of a trainable Pix2Pix model, a specific type of conditional generative adversarial network, to normalize handwritten text images. This algorithm can also be seamlessly integrated as the initial stage of any deep learning architecture designed for handwriting recognition tasks. This facilitates training the normalization and recognition components as a unified whole, while still maintaining some interpretability of each module. Our proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of intra-personal handwriting variability among different writers. As a result, it achieves slope and slant normalization, alongside other conventional preprocessing objectives such as normalizing the size of text ascenders and descenders. We demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm across two metrics and when integrated as the first step of a deep recognition architecture.
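Slant normalization, one of the heuristic objectives the Pix2Pix model learns to replicate, is classically a horizontal shear. A sketch on pen coordinates (image-based deslanting shears pixel columns the same way; the angle here is illustrative):

```python
import math

def deslant(points, slant_deg):
    """Remove slant with a horizontal shear: x' = x - tan(slant) * y.

    `points` are (x, y) coordinates with y measured upward from the
    baseline; a positive slant leans the writing to the right.
    """
    k = math.tan(math.radians(slant_deg))
    return [(x - k * y, y) for x, y in points]

# A stroke leaning 45 degrees becomes (numerically) vertical:
# every x-coordinate collapses to ~0.
stroke = [(0, 0), (1, 1), (2, 2)]
print(deslant(stroke, 45.0))
```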

18 pages, 2977 KiB  
Article
CNN-Based Multi-Factor Authentication System for Mobile Devices Using Faces and Passwords
by Jinho Han
Appl. Sci. 2024, 14(12), 5019; https://doi.org/10.3390/app14125019 - 8 Jun 2024
Cited by 3 | Viewed by 2850
Abstract
Multi-factor authentication (MFA) is a system for authenticating an individual’s identity using two or more pieces of data (known as factors); using additional factors further strengthens security. Sequential MFA requires a number of steps to be followed in sequence for authentication; for example, with three factors, the system requires three authentication steps, and proceeding with a deep learning approach requires three artificial neural networks (ANNs). In contrast, in parallel MFA the authentication steps are processed simultaneously, so processing is possible with only one ANN. A convolutional neural network (CNN) is a method for learning images through the use of convolutional layers, and researchers have proposed several MFA systems using CNNs in which various modalities have been employed, such as images, handwritten text for authentication, and multi-image data for machine learning of facial emotion. This study proposes a CNN-based parallel MFA system that uses concatenation. The three factors used for learning are a face image, an image converted from a password, and a specific image designated by the user. In addition, a secure password image is created at different bit positions, enabling the user to securely hide their password information. Furthermore, users designate a specific image other than their face as an auxiliary image, such as a photo of their pet dog, a favorite fruit, or one of their possessions, such as a car. In this way, authentication is rendered possible by learning the three factors (face, password, and auxiliary image) with the CNN. The contribution this study makes to the existing body of knowledge is to demonstrate that an MFA system can be developed with a lightweight, mobile, multi-factor CNN (MMCNN), whose low parameter count allows it to run even on mobile devices. Furthermore, an algorithm that can securely transform a text password into an image is proposed, and it is demonstrated that the three factors carry equal weight for authentication, based on the false acceptance rate (FAR) values obtained experimentally with the proposed system.
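The FAR metric used in the comparison above is the fraction of impostor attempts that the system wrongly accepts at a given decision threshold. A sketch with invented match scores:

```python
def far(impostor_scores, threshold):
    """False acceptance rate: fraction of impostor attempts whose
    match score reaches the acceptance threshold."""
    accepted = sum(1 for s in impostor_scores if s >= threshold)
    return accepted / len(impostor_scores)

# Hypothetical impostor match scores in [0, 1].
scores = [0.10, 0.35, 0.62, 0.48, 0.91, 0.20, 0.55, 0.30]
print(far(scores, 0.6))  # 2 of 8 accepted -> 0.25
```

Raising the threshold lowers the FAR at the cost of rejecting more genuine users (a higher false rejection rate), which is the trade-off the per-factor comparison probes.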
(This article belongs to the Special Issue Integrating Artificial Intelligence in Renewable Energy Systems)

18 pages, 24722 KiB  
Article
Historical Text Line Segmentation Using Deep Learning Algorithms: Mask-RCNN against U-Net Networks
by Florian Côme Fizaine, Patrick Bard, Michel Paindavoine, Cécile Robin, Edouard Bouyé, Raphaël Lefèvre and Annie Vinter
J. Imaging 2024, 10(3), 65; https://doi.org/10.3390/jimaging10030065 - 5 Mar 2024
Cited by 5 | Viewed by 3985
Abstract
Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but follow the same concept, requiring a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text segmentation of historical documents, showing the superiority of the former over the latter. Three studies were conducted: one comparing these networks on different historical databases, another comparing Mask-RCNN with Doc-UFCN on a private historical database, and a third comparing the handwritten text recognition (HTR) performance of the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN on relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that light mask post-processing is an efficient and simple way to improve evaluation, and that Mask-RCNN leads to better HTR performance.
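The post-processing step that U-Net-style semantic outputs require, and that Mask-RCNN avoids by predicting instances directly, is essentially connected-component labeling of the binary mask. A minimal 4-connected flood-fill sketch:

```python
def connected_components(mask):
    """Label 4-connected components of a binary mask (list of lists).

    This is the post-processing a semantic (U-Net-style) mask needs to
    yield instances; Mask-RCNN predicts instances directly instead.
    Returns (component count, label grid).
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                n += 1                       # start a new component
                stack = [(r, c)]
                labels[r][c] = n
                while stack:                 # iterative flood fill
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and \
                                mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            stack.append((ny, nx))
    return n, labels

# Two separated "text lines" -> two instances.
mask = [[1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 1]]
n, _ = connected_components(mask)
print(n)  # 2
```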
(This article belongs to the Section Document Analysis and Processing)

18 pages, 3564 KiB  
Article
Offline Mongolian Handwriting Recognition Based on Data Augmentation and Improved ECA-Net
by Qing-Dao-Er-Ji Ren, Lele Wang, Zerui Ma and Saheya Barintag
Electronics 2024, 13(5), 835; https://doi.org/10.3390/electronics13050835 - 21 Feb 2024
Cited by 3 | Viewed by 1607
Abstract
Writing is an important carrier of cultural inheritance, and the digitization of handwritten texts is an effective means of protecting national culture. Compared to Chinese and English handwriting recognition, research on Mongolian handwriting recognition started relatively late and has achieved few results due to the characteristics of the script itself and the lack of corpora. First, according to the characteristics of Mongolian handwritten characters, the random erasing data augmentation algorithm was modified, and a dual data augmentation (DDA) algorithm was proposed by combining the improved algorithm with horizontal wave transformation (HWT) to augment the dataset for training Mongolian handwriting recognition. Second, the classical CRNN handwriting recognition model was improved: the structure of the encoder and decoder was adjusted according to the characteristics of the Mongolian script, and an attention mechanism was introduced in the feature extraction and decoding stages of the model, yielding an improved recognition model, named the EGA model, suited to the features of Mongolian handwriting. Finally, the effectiveness of the EGA model was verified by extensive data tests. Experimental results demonstrate that the proposed EGA model improves the recognition accuracy of Mongolian handwriting and that the structural modification of the encoder and decoder effectively balances the recognition accuracy and complexity of the model.
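The random-erasing augmentation that the DDA algorithm builds on blanks out a random rectangle of each training image; the paper adapts the idea to Mongolian script, but the baseline can be sketched as follows (the scale range and fill value are illustrative):

```python
import random

def random_erase(img, scale=(0.1, 0.3), value=255):
    """Blank out a random rectangle of a grayscale image (2D list),
    the standard random-erasing augmentation. `scale` bounds the
    erased area as a fraction of the image; `value` is the fill."""
    h, w = len(img), len(img[0])
    area = random.uniform(*scale) * h * w
    eh = max(1, min(h, int(area ** 0.5)))   # erased height
    ew = max(1, min(w, int(area / eh)))     # erased width
    top = random.randrange(h - eh + 1)
    left = random.randrange(w - ew + 1)
    out = [row[:] for row in img]           # leave the input untouched
    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r][c] = value
    return out

img = [[0] * 8 for _ in range(8)]
aug = random_erase(img)
print(sum(v == 255 for row in aug for v in row) > 0)  # True
```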
(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

30 pages, 3035 KiB  
Review
Advancements and Challenges in Handwritten Text Recognition: A Comprehensive Survey
by Wissam AlKendi, Franck Gechter, Laurent Heyberger and Christophe Guyeux
J. Imaging 2024, 10(1), 18; https://doi.org/10.3390/jimaging10010018 - 8 Jan 2024
Cited by 24 | Viewed by 12262
Abstract
Handwritten Text Recognition (HTR) is essential for digitizing historical documents held in different kinds of archives. In this study, we introduce a hybrid form archive written in French: the Belfort civil registers of births. The digitization of these historical documents is challenging due to their unique characteristics, such as variations in writing style, overlapping characters and words, and marginal annotations. The objective of this survey paper is to summarize research on handwritten text documents and to provide research directions toward effectively transcribing this French dataset. To achieve this goal, we present a brief survey of several modern and historical offline HTR systems for different international languages, together with the top state-of-the-art contributions reported for the French language specifically. The survey classifies the HTR systems based on the techniques employed, datasets used, publication years, and the level of recognition. Furthermore, an analysis of the systems’ accuracies is presented, highlighting the best-performing approach. We also showcase the performance of some commercial HTR systems. In addition, this paper summarizes the HTR datasets that are publicly available, especially those identified as benchmark datasets in the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR) competitions. This paper, therefore, presents updated state-of-the-art research in HTR and highlights new directions in the research field.
(This article belongs to the Section Computer Vision and Pattern Recognition)
