MDPI - Publisher of Open Access Journals

28 pages, 2784 KB

Open Access Article

A Statistically Validated and Decoding-Aware CNN–Transformer–CTC Framework for Multi-Font Printed Arabic Word Recognition

by Abderrahime Tabzaoui and Loqman Chakir

Appl. Sci. 2026, 16(9), 4071; https://doi.org/10.3390/app16094071 - 22 Apr 2026

Viewed by 328

Abstract

Printed Arabic Optical Character Recognition (OCR) remains challenging due to complex glyph morphology, typographic variability, and sensitivity to Unicode-preserved evaluation protocols. This work introduces a methodology that explicitly treats decoding strategy and orthographic normalization as primary experimental variables in multi-font Arabic OCR evaluation. A CNN–Transformer encoder trained with Connectionist Temporal Classification (CTC) is employed as a controlled backbone to isolate the effects of inference configuration and text normalization. Through systematic analysis on the APTI benchmark, we demonstrate that decoding policy and diacritic handling significantly influence reported recognition performance. In particular, language-model-guided decoding yields substantial improvements over greedy decoding, while Unicode-preserved evaluation introduces systematic orthographic inflation driven by deterministic diacritic mismatch. These effects are further amplified by strong cross-font variability. The proposed normalization-aware evaluation framework disentangles structural recognition errors from protocol-induced artifacts, providing a more controlled and reproducible basis for Arabic OCR benchmarking.

(This article belongs to the Special Issue Applied Computer Vision and Deep Learning)

15 pages, 2124 KB

Open Access Article

Toward Building a Domain-Based Dataset for Arabic Handwritten Text Recognition

by Khawlah Alhefdhi, Abdulmalik Alsalman and Safi Faizullah

Electronics 2025, 14(12), 2461; https://doi.org/10.3390/electronics14122461 - 17 Jun 2025

Viewed by 3879

Abstract

The problem of automatic recognition of handwritten text has recently been widely discussed in the research community. Handwritten text recognition is considered a challenging task for cursive scripts, such as Arabic-language scripts, due to their complex properties. Although the demand for automatic text recognition is growing, especially to assist in digitizing archival documents, limited datasets are available for Arabic handwritten text compared to other languages. In this paper, we present novel work on building the Real Estate and Judicial Documents dataset (REJD dataset), which aims to facilitate the recognition of Arabic text in millions of archived documents. This paper also discusses the use of Optical Character Recognition and deep learning techniques, aiming to serve as the initial version in a series of experiments and enhancements designed to achieve optimal results.

15 pages, 1403 KB

Open Access Editor’s Choice Article

BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions

by Asma Cheddak, Tarek Ait Baha, Youssef Es-Saady, Mohamed El Hajji and Mohamed Baslam

Information 2024, 15(6), 365; https://doi.org/10.3390/info15060365 - 20 Jun 2024

Cited by 23 | Viewed by 12531

Abstract

Brainstorming is an important part of the design thinking process since it encourages creativity and innovation through bringing together diverse viewpoints. However, traditional brainstorming practices face challenges such as the management of large volumes of ideas. To address this issue, this paper introduces a decision support system that employs the BERTopic model to automate the brainstorming process, which enhances the categorization of ideas and the generation of coherent topics from textual data. The dataset for our study was assembled from a brainstorming session on “scholar dropouts”, where ideas were captured on Post-it notes, digitized through an optical character recognition (OCR) model, and enhanced using data augmentation with a language model, GPT-3.5, to ensure robustness. To assess the performance of our system, we employed both quantitative and qualitative analyses. Quantitative evaluations were conducted independently across various parameters, while qualitative assessments focused on the relevance and alignment of keywords with human-classified topics during brainstorming sessions. Our findings demonstrate that BERTopic outperforms traditional LDA models in generating semantically coherent topics. These results demonstrate the usefulness of our system in managing the complex nature of Arabic language data and improving the efficiency of brainstorming sessions.

(This article belongs to the Special Issue Artificial Intelligence (AI) for Economics and Business Management)

24 pages, 4818 KB

Open Access Article

Recognition of Arabic Air-Written Letters: Machine Learning, Convolutional Neural Networks, and Optical Character Recognition (OCR) Techniques

by Khalid M. O. Nahar, Izzat Alsmadi, Rabia Emhamed Al Mamlook, Ahmad Nasayreh, Hasan Gharaibeh, Ali Saeed Almuflih and Fahad Alasim

Sensors 2023, 23(23), 9475; https://doi.org/10.3390/s23239475 - 28 Nov 2023

Cited by 35 | Viewed by 5769

Abstract

Air writing is one of the essential fields that the world is turning to, which can benefit from the world of the metaverse, as well as the ease of communication between humans and machines. The research literature on air writing and its applications shows significant work in English and Chinese, while little research is conducted in other languages, such as Arabic. To fill this gap, we propose a hybrid model that combines feature extraction with deep learning models and then uses machine learning (ML) and optical character recognition (OCR) methods and applies grid and random search optimization algorithms to obtain the best model parameters and outcomes. Several machine learning methods (e.g., neural networks (NNs), random forest (RF), K-nearest neighbours (KNN), and support vector machine (SVM)) are applied to deep features extracted from deep convolutional neural networks (CNNs), such as VGG16, VGG19, and SqueezeNet. Our study uses the AHAWP dataset, which consists of diverse writing styles and hand sign variations, to train and evaluate the models. Preprocessing schemes are applied to improve data quality by reducing bias. Furthermore, OCR methods are integrated into our model to isolate individual letters from continuous air-written gestures and improve recognition results. The results of this study showed that the proposed model achieved the best accuracy of 88.8% using NN with VGG16.

(This article belongs to the Section Optical Sensors)

17 pages, 4419 KB

Open Access Article

A Deep Learning Approach for Arabic Manuscripts Classification

by Lutfieh S. Al-homed, Kamal M. Jambi and Hassanin M. Al-Barhamtoshy

Sensors 2023, 23(19), 8133; https://doi.org/10.3390/s23198133 - 28 Sep 2023

Cited by 11 | Viewed by 4890

Abstract

For centuries, libraries worldwide have preserved ancient manuscripts due to their immense historical and cultural value. However, over time, both natural and human-made factors have led to the degradation of many ancient Arabic manuscripts, causing the loss of significant information, such as authorship, titles, or subjects, rendering them unknown manuscripts. Although catalog cards attached to these manuscripts might contain some of the missing details, these cards have degraded significantly in quality over the decades within libraries. This paper presents a framework for identifying these unknown ancient Arabic manuscripts by processing the catalog cards associated with them. Given the challenges posed by the degradation of these cards, simple optical character recognition (OCR) is often insufficient. The proposed framework uses a deep learning architecture to identify unknown manuscripts within a collection of ancient Arabic documents. This involves locating, extracting, and classifying the text from these catalog cards, along with implementing processes for region-of-interest identification, rotation correction, feature extraction, and classification. The results demonstrate the effectiveness of the proposed method, achieving an accuracy rate of 92.5%, compared to 83.5% with classical image classification and 81.5% with OCR alone.

(This article belongs to the Section Physical Sensors)

33 pages, 3518 KB

Open Access Review

Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction

by Rayyan Najam and Safiullah Faizullah

Appl. Sci. 2023, 13(13), 7568; https://doi.org/10.3390/app13137568 - 27 Jun 2023

Cited by 43 | Viewed by 14484

Abstract

Arabic handwritten-text recognition applies an OCR technique and then a text-correction technique to extract the text within an image correctly. Deep learning is a current paradigm utilized in OCR techniques. However, no study investigated or critically analyzed recent deep-learning techniques used for Arabic handwritten OCR and text correction during the period of 2020–2023. This analysis fills this noticeable gap in the literature, uncovering recent developments and their limitations for researchers, practitioners, and interested readers. The results reveal that CNN-LSTM-CTC is the most suitable architecture among Transformer and GANs for OCR because it is less complex and can hold long textual dependencies. For OCR text correction, applying DL models to generated errors in datasets improved accuracy in many works. In conclusion, Arabic OCR has the potential to further apply several text-embedding models to correct the resultant text from the OCR, and there is a significant gap in studies investigating this problem. In addition, there is a need for more high-quality and domain-specific OCR Arabic handwritten datasets. Moreover, we recommend the practical development of a space for future trends in Arabic OCR applications, derived from current limitations in Arabic OCR works and from applications in other languages; this will involve a plethora of possibilities that have not been effectively researched at the time of writing.

(This article belongs to the Special Issue Applications of Machine Learning to Image, Video, Text and Bioinformatic Analysis)

27 pages, 2712 KB

Open Access Review

A Survey of OCR in Arabic Language: Applications, Techniques, and Challenges

by Safiullah Faizullah, Muhammad Sohaib Ayub, Sajid Hussain and Muhammad Asad Khan

Appl. Sci. 2023, 13(7), 4584; https://doi.org/10.3390/app13074584 - 4 Apr 2023

Cited by 70 | Viewed by 24844

Abstract

Optical character recognition (OCR) is the process of extracting handwritten or printed text from a scanned or printed image and converting it to a machine-readable form for further data processing, such as searching or editing. Automatic text extraction using OCR helps to digitize documents for improved productivity and accessibility and for preservation of historical documents. This paper provides a survey of the current state-of-the-art applications, techniques, and challenges in Arabic OCR. We present the existing methods for each step of the complete OCR process to identify the best-performing approach for improved results. This paper follows the keyword-search method for reviewing the articles related to Arabic OCR, including the backward and forward citations of the article. In addition to state-of-the-art techniques, this paper identifies research gaps and presents future directions for Arabic OCR.

(This article belongs to the Special Issue Digital Image Processing: Advanced Technologies and Applications)

21 pages, 1415 KB

Open Access Article

Persian Optical Character Recognition Using Deep Bidirectional Long Short-Term Memory

by Zohreh Khosrobeigi, Hadi Veisi, Ehsan Hoseinzade and Hanieh Shabanian

Appl. Sci. 2022, 12(22), 11760; https://doi.org/10.3390/app122211760 - 19 Nov 2022

Cited by 15 | Viewed by 10636

Abstract

Optical Character Recognition (OCR) is a system of converting images, including text, into editable text and is applied to various languages such as English, Arabic, and Persian. While these languages have similarities, their fundamental differences can create unique challenges. In Persian, continuity between characters, the existence of semicircles, dots, oblique, and left-to-right characters such as English words in the context are some of the most important challenges in designing Persian OCR systems. Our proposed framework, Bina, is designed in a special way to address the issue of continuity by utilizing a Convolutional Neural Network (CNN) and deep bidirectional Long Short-Term Memory (BLSTM), a type of LSTM network that has access to both past and future context. A huge and diverse dataset, including about 2M samples of both Persian and English contexts, consisting of various fonts and sizes, is also generated to train and test the performance of the proposed model. Various configurations are tested to find the optimal structure of CNN and BLSTM. The results show that Bina successfully outperformed the state-of-the-art baseline algorithm by achieving about 96% accuracy in the Persian and 88% accuracy in the Persian and English contexts.

(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Deep Learning)

20 pages, 2854 KB

Open Access Article

Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach

by Sonia Bergamaschi, Stefania De Nardis, Riccardo Martoglia, Federico Ruozzi, Luca Sala, Matteo Vanzini and Riccardo Amerigo Vigliermo

Sensors 2022, 22(11), 3995; https://doi.org/10.3390/s22113995 - 25 May 2022

Cited by 19 | Viewed by 4418

Abstract

The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating an urgent need for systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state-of-the-art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view.

(This article belongs to the Collection Sensors and Communications for the Social Good)

20 pages, 6349 KB

Open Access Article

Exploiting Script Similarities to Compensate for the Large Amount of Data in Training Tesseract LSTM: Towards Kurdish OCR

by Saman Idrees and Hossein Hassani

Appl. Sci. 2021, 11(20), 9752; https://doi.org/10.3390/app11209752 - 19 Oct 2021

Cited by 6 | Viewed by 6204

Abstract

Applications based on Long Short-Term Memory (LSTM) require large amounts of data for their training. Tesseract LSTM is a popular Optical Character Recognition (OCR) engine that has been trained and used in various languages. However, its training is hindered when the target language is low-resourced. This research suggests a remedy for the problem of scant data in training Tesseract LSTM for a new language by exploiting a training dataset for a language with a similar script. The target of the experiment is Kurdish. It is a multi-dialect language and is considered less-resourced. We choose Sorani, one of the Kurdish dialects, that is mostly written in Persian-Arabic script. We train Tesseract using an Arabic dataset, and then we use a considerably small amount of texts in Persian-Arabic to train the engine to recognize Sorani texts. Our dataset is based on a series of court case documents in the Kurdistan Region of Iraq. We also fine-tune the engine using 10 Unikurd fonts. We use Lstmeval and Ocreval to evaluate the outputs. The result indicates the achievement of 95.45% accuracy. We also test the engine using texts outside the context of court cases. The accuracy of the system remains close to what was found earlier, indicating that the script similarity could be used to overcome the lack of large-scale data.

(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 10112 KB

Open Access Article

CNN-Based Page Segmentation and Object Classification for Counting Population in Ottoman Archival Documentation

by Yekta Said Can and M. Erdem Kabadayı

J. Imaging 2020, 6(5), 32; https://doi.org/10.3390/jimaging6050032 - 14 May 2020

Cited by 14 | Viewed by 5248

Abstract

Historical document analysis systems gain importance with the increasing efforts in the digitalization of archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps will affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increases the importance of the page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles are the issues that complicate the segmentation of historical documents. The properties of Arabic scripts such as connected letters, ligatures, diacritics, and different writing styles make it even more challenging to process Arabic script historical documents. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire held between the 1840s and 1860s. We achieved promising results for classifying different types of objects and counting the individuals and assigning them to populated places.

(This article belongs to the Special Issue Recent Advances in Historical Document Processing)

23 pages, 863 KB

Open Access Article

Generative vs. Discriminative Recognition Models for Off-Line Arabic Handwriting

by Moftah Elzobi and Ayoub Al-Hamadi

Sensors 2018, 18(9), 2786; https://doi.org/10.3390/s18092786 - 24 Aug 2018

Cited by 5 | Viewed by 3712

Abstract

The majority of handwritten word recognition strategies are constructed on learning-based generative frameworks from letter or word training samples. Theoretically, constructing recognition models through discriminative learning should be the more effective alternative. The primary goal of this research is to compare the performances of discriminative and generative recognition strategies, which are described by generatively-trained hidden Markov modeling (HMM), discriminatively-trained conditional random fields (CRF) and discriminatively-trained hidden-state CRF (HCRF). With learning samples obtained from two dissimilar databases, we initially trained and applied an HMM classification scheme. To enable HMM classifiers to effectively reject incorrect and out-of-vocabulary segmentation, we enhance the models with adaptive threshold schemes. Aside from proposing such schemes for HMM classifiers, this research introduces CRF and HCRF classifiers in the recognition of offline Arabic handwritten words. Furthermore, the efficiencies of all three strategies are fully assessed using two dissimilar databases. Recognition outcomes for both words and letters are presented, with the pros and cons of each strategy emphasized.

(This article belongs to the Special Issue Emerging Algorithms and Applications in Vision Sensors System based on Artificial Intelligence)

2 pages, 150 KB

Open Access Editorial

Document Image Processing

by Laurence Likforman-Sulem and Ergina Kavallieratou

J. Imaging 2018, 4(7), 84; https://doi.org/10.3390/jimaging4070084 - 22 Jun 2018

Cited by 2 | Viewed by 4788

Abstract

n/a

(This article belongs to the Special Issue Document Image Processing)

19 pages, 5686 KB

Open Access Article

Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames

by Oussama Zayene, Sameh Masmoudi Touj, Jean Hennebert, Rolf Ingold and Najoua Essoukri Ben Amara

J. Imaging 2018, 4(2), 32; https://doi.org/10.3390/jimaging4020032 - 31 Jan 2018

Cited by 12 | Viewed by 12915

Abstract

Recognizing texts in video is more complex than in other environments such as scanned documents. Video texts appear in various colors, unknown fonts and sizes, often affected by compression artifacts and low quality. In contrast to Latin texts, there are no publicly available datasets which cover all aspects of the Arabic Video OCR domain. This paper describes a new well-defined and annotated Arabic-Text-in-Video dataset called AcTiV 2.0. The dataset is dedicated especially to building and evaluating Arabic video text detection and recognition systems. AcTiV 2.0 contains 189 video clips serving as a raw material for creating 4063 key frames for the detection task and 10,415 cropped text images for the recognition task. AcTiV 2.0 is also distributed with its annotation and evaluation tools that are made open-source for standardization and validation purposes. This paper also reports on the evaluation of several systems tested under the proposed detection and recognition protocols.

(This article belongs to the Special Issue Document Image Processing)

11 pages, 1550 KB

Open Access Article

A Holistic Technique for an Arabic OCR System

by Farhan M. A. Nashwan, Mohsen A. A. Rashwan, Hassanin M. Al-Barhamtoshy, Sherif M. Abdou and Abdullah M. Moussa

J. Imaging 2018, 4(1), 6; https://doi.org/10.3390/jimaging4010006 - 27 Dec 2017

Cited by 24 | Viewed by 8695

Abstract

Analytical-based approaches in Optical Character Recognition (OCR) systems can endure a significant amount of segmentation errors, especially when dealing with cursive languages such as the Arabic language with frequent overlapping between characters. Holistic-based approaches that consider whole words as single units were introduced as an effective approach to avoid such segmentation errors. Still, the main challenge for these approaches is their computational complexity, especially when dealing with large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similar-shaped words is used to reduce recognition time. Using global word-level Discrete Cosine Transform (DCT) based features in combination with local block-based features, our proposed approach managed to generalize for new font sizes that were not included in the training data. Evaluation results for the approach using different test sets from modern and historical Arabic books are promising compared with state-of-the-art Arabic OCR systems.

(This article belongs to the Special Issue Document Image Processing)

Search Results (16)
