Open Access Article
J. Imaging 2018, 4(1), 6; https://doi.org/10.3390/jimaging4010006

A Holistic Technique for an Arabic OCR System

1 Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt
2 Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
3 Faculty of Computers & Information, Cairo University, Giza 12613, Egypt
* Author to whom correspondence should be addressed.
Received: 30 October 2017 / Revised: 18 December 2017 / Accepted: 22 December 2017 / Published: 27 December 2017
(This article belongs to the Special Issue Document Image Processing)

Abstract

Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, in which characters frequently overlap. Holistic approaches, which treat whole words as single units, were introduced as an effective way to avoid such segmentation errors. The main challenge for these approaches remains their computational complexity, especially in large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. By combining global word-level Discrete Cosine Transform (DCT) based features with local block-based features, the proposed approach generalizes to new font sizes that were not included in the training data. Evaluation results for the approach on different test sets from modern and historical Arabic books are promising compared with state-of-the-art Arabic OCR systems.
Keywords: Arabic OCR systems; holistic OCR approach; holistic OCR features; lexicon reduction
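
The abstract names three ingredients: global word-level DCT features, local block-based features, and lexicon reduction by clustering similarly shaped words. The following is a minimal Python sketch of how such pieces could fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the number of DCT coefficients kept, the 2 x 2 block grid, the use of mean ink density per block, Euclidean distance, mean-vector centroids, the hand-assigned clusters, and the names `word_features`, `make_cluster`, and `recognize`. The toy 4 x 4 "word images" and their labels are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(img):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [u][v]


def global_features(img, k=3):
    """Global descriptor: the k x k lowest-frequency DCT coefficients (assumed k)."""
    coeffs = dct_2d(img)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def block_features(img, rows=2, cols=2):
    """Local descriptor: mean ink density per cell of an assumed rows x cols grid."""
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            cell = [img[y][x]
                    for y in range(r * h // rows, (r + 1) * h // rows)
                    for x in range(c * w // cols, (c + 1) * w // cols)]
            feats.append(sum(cell) / len(cell))
    return feats


def word_features(img):
    """Concatenate global and local features into one vector."""
    return global_features(img) + block_features(img)


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_cluster(entries):
    """Build (centroid, members) from a list of (label, image) lexicon entries."""
    members = [(label, word_features(img)) for label, img in entries]
    dim = len(members[0][1])
    centroid = [sum(f[i] for _, f in members) / len(members) for i in range(dim)]
    return centroid, members


def recognize(img, clusters):
    """Pick the nearest cluster first, then search only that cluster's words."""
    f = word_features(img)
    _, members = min(clusters, key=lambda c: distance(f, c[0]))
    return min(members, key=lambda m: distance(f, m[1]))[0]


def columns(profile, height=4):
    """Toy binary 'word image' whose rows all equal the given column profile."""
    return [list(profile) for _ in range(height)]


# Hypothetical lexicon, grouped by hand into two clusters of similar shapes.
clusters = [
    make_cluster([("word_a", columns([1, 1, 0, 0])),
                  ("word_b", columns([1, 1, 1, 0]))]),
    make_cluster([("word_c", columns([0, 0, 1, 1])),
                  ("word_d", columns([0, 1, 1, 1]))]),
]

print(recognize(columns([1, 1, 0, 0]), clusters))  # prints "word_a"
```

The point of the two-stage lookup is that a query is compared against one centroid per cluster and then only against the members of the winning cluster, rather than against every word in the lexicon. In a real system the clusters would come from a clustering algorithm run over the full lexicon and the images would be size-normalized scans, neither of which this sketch attempts.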
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Nashwan, F.M.A.; Rashwan, M.A.A.; Al-Barhamtoshy, H.M.; Abdou, S.M.; Moussa, A.M. A Holistic Technique for an Arabic OCR System. J. Imaging 2018, 4, 6.

J. Imaging, EISSN 2313-433X, published by MDPI AG, Basel, Switzerland