Special Issue "Document Image Processing"

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: closed (15 December 2017)

Special Issue Editors

Guest Editor
Dr. Ergina Kavallieratou

University of the Aegean, Department of Information and Communication Systems Engineering, Samos, Greece
Interests: document image; document image processing; historical document images; document analysis; deep learning; machine learning
Guest Editor
Dr. Laurence Likforman-Sulem

Telecom ParisTech/TSI
Interests: handwriting recognition with Markovian methods (HMMs, Bayesian Networks) and Recurrent Neural Networks (BLSTMs); document analysis of historical documents; information extraction in degraded documents; Web documents; automatic detection of cognitive disorders from handwriting signal

Special Issue Information

Dear Colleagues,

Document image processing enables systems such as OCR, writer identification and recognition, check processing, and historical document processing to extract useful information from document images. To succeed, such systems often require many preprocessing tasks: document skew detection and correction, slant removal, binarization, segmentation, and other normalization steps.

The intent of this Special Issue is to collect the experiences of leading scientists in the field, and also to serve as an assessment tool for those who are new to document image processing.

This Special Issue intends to cover, but is not limited to, the following topics:

  • Document Image Analysis
  • Document Understanding
  • Document Analysis Systems
  • Document Processing
  • Camera-based Document Processing
  • Document Databases and Digital Libraries
  • Mining Document Image Collections
  • Document Forensics
  • Historical Documents
  • Segmentation and Restoration
  • Performance Evaluation
  • Camera and Scene Text Understanding
  • Machine Learning for Document Analysis
  • Human-Document Interaction
  • Novel Applications

Indeed, any work concerning the use of document image processing, as well as the development of new application procedures, may fall within the scope of this Special Issue. Of course, papers must present novel results, or advances over previously published work, and the subject matter must be treated with scientific rigor.

Dr. Ergina Kavallieratou
Dr. Laurence Likforman-Sulem
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) is waived for well-prepared manuscripts submitted to this issue. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • document image
  • document image processing
  • historical document images
  • document analysis
  • deep learning
  • machine learning

Published Papers (9 papers)


Research

Open Access Article: Benchmarking of Document Image Analysis Tasks for Palm Leaf Manuscripts from Southeast Asia
J. Imaging 2018, 4(2), 43; doi:10.3390/jimaging4020043
Received: 15 December 2017 / Revised: 10 February 2018 / Accepted: 18 February 2018 / Published: 22 February 2018
Abstract
This paper presents a comprehensive test of the principal tasks in document image analysis (DIA), starting with binarization, text line segmentation, and isolated character/glyph recognition, and continuing on to word recognition and transliteration, for a new and challenging collection of palm leaf manuscripts from Southeast Asia. The research is performed on a complete dataset collection of Southeast Asian palm leaf manuscripts covering three different scripts: Khmer script from Cambodia, and Balinese and Sundanese scripts from Indonesia. The binarization task is evaluated over many methods, including the most recent entries in binarization competitions. For the text line segmentation task, the seam carving method is evaluated and compared with a recently proposed text line segmentation method for palm leaf manuscripts. For the isolated character/glyph recognition task, the evaluation covers a handcrafted feature extraction method, a neural network with unsupervised feature learning, and a Convolutional Neural Network (CNN) based method. Finally, a Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM) based method is used to analyze the word recognition and transliteration task. The results of all experiments provide the latest findings and a quantitative benchmark of palm leaf manuscript analysis for researchers in the DIA community.
(This article belongs to the Special Issue Document Image Processing)
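
For readers new to the area, the sketch below shows the generic dynamic-programming seam computation that seam-carving-based text line segmentation builds on. It is a minimal illustration, not the implementation evaluated in the paper; the energy map, function name, and NumPy conventions are assumptions.

```python
import numpy as np

def min_energy_horizontal_seam(energy):
    """Find the left-to-right seam with minimal cumulative energy.

    energy: 2D array (rows x cols), e.g., ink density, so that the
    cheapest seam threads through the whitespace between text lines.
    Returns one row index per column.
    """
    rows, cols = energy.shape
    cost = energy.astype(float).copy()
    for c in range(1, cols):
        up = np.r_[np.inf, cost[:-1, c - 1]]      # predecessor one row above
        mid = cost[:, c - 1]                      # same row
        down = np.r_[cost[1:, c - 1], np.inf]     # one row below
        cost[:, c] += np.minimum(np.minimum(up, mid), down)
    # Backtrack from the cheapest endpoint in the last column.
    seam = [int(np.argmin(cost[:, -1]))]
    for c in range(cols - 1, 0, -1):
        r = seam[-1]
        lo, hi = max(r - 1, 0), min(r + 2, rows)
        seam.append(lo + int(np.argmin(cost[lo:hi, c - 1])))
    return seam[::-1]
```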

Open Access Article: Handwritten Devanagari Character Recognition Using Layer-Wise Training of Deep Convolutional Neural Networks and Adaptive Gradient Methods
J. Imaging 2018, 4(2), 41; doi:10.3390/jimaging4020041
Received: 6 December 2017 / Revised: 9 February 2018 / Accepted: 12 February 2018 / Published: 13 February 2018
Abstract
Handwritten character recognition is currently receiving the attention of researchers because of possible applications in assistive technology for blind and visually impaired users, human–robot interaction, automatic data entry for business documents, etc. In this work, we propose a technique to recognize handwritten Devanagari characters using deep convolutional neural networks (DCNNs), one of the recent techniques adopted from the deep learning community. We experimented with the ISIDCHAR database provided by ISI (Information Sharing Index), Kolkata, and the V2DMDCHAR database, using six different DCNN architectures to evaluate performance, and also investigated the use of six recently developed adaptive gradient methods. A layer-wise training technique for the DCNN has been employed, which helped to achieve the highest recognition accuracy and a faster convergence rate. The results of the layer-wise-trained DCNN compare favorably with those achieved by a shallow technique using handcrafted features and by a standard DCNN.
(This article belongs to the Special Issue Document Image Processing)
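
As a rough illustration of the layer-wise training idea described in the abstract, the PyTorch sketch below grows a small CNN one convolutional block at a time, so earlier blocks' weights seed the deeper configurations. The architecture, the 47-class output, the input size, and the choice of Adam as the adaptive gradient method are all illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class GrowingCNN(nn.Module):
    """Layer-wise training sketch: blocks are enabled one at a time."""
    def __init__(self, n_classes=47):                # class count: illustrative
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        ])
        self.heads = nn.ModuleList([                 # one classifier head per depth
            nn.Linear(16 * 16 * 16, n_classes),      # after block 1 (32x32 input)
            nn.Linear(32 * 8 * 8, n_classes),        # after block 2
        ])

    def forward(self, x, depth):
        for block in self.blocks[:depth]:
            x = block(x)
        return self.heads[depth - 1](x.flatten(1))

model = GrowingCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # one adaptive gradient method
x = torch.randn(4, 1, 32, 32)                        # dummy 32x32 character crops
y = torch.randint(0, 47, (4,))                       # dummy labels
for depth in (1, 2):                                 # train shallow first, then grow
    loss = nn.functional.cross_entropy(model(x, depth), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```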

Open Access Article: A Study of Different Classifier Combination Approaches for Handwritten Indic Script Recognition
J. Imaging 2018, 4(2), 39; doi:10.3390/jimaging4020039
Received: 15 December 2017 / Revised: 6 February 2018 / Accepted: 8 February 2018 / Published: 13 February 2018
Abstract
Script identification is an essential step in document image processing, especially when the environment is multi-script/multilingual. To date, researchers have developed several methods for this problem. For this kind of complex pattern recognition problem, it is always difficult to decide which classifier would be the best choice. Moreover, it is also true that different classifiers offer complementary information about the patterns to be classified. Therefore, combining classifiers in an intelligent way can be beneficial compared to using any single classifier. Keeping these facts in mind, in this paper, the information provided by one shape-based and two texture-based features is combined using classifier combination techniques for word-level script recognition in handwritten document images. CMATERdb8.4.1 contains 7200 handwritten word samples belonging to 12 Indic scripts (600 per script); the database is made freely available at https://code.google.com/p/cmaterdb/. The word samples from this database are classified based on the confidence scores provided by Multi-Layer Perceptron (MLP) classifiers. Major classifier combination techniques, including majority voting, Borda count, the sum rule, the product rule, the max rule, the Dempster-Shafer (DS) rule of combination, and secondary classifiers, are evaluated for this pattern recognition problem. A maximum accuracy of 98.45% is achieved on the validation set, an improvement of 7% over the best-performing individual classifier.
(This article belongs to the Special Issue Document Image Processing)
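
The fixed combination rules named in the abstract (majority voting, sum, product, and max) are simple to express over per-classifier confidence scores. Below is a minimal NumPy sketch under illustrative names; Borda count, the Dempster-Shafer rule, and secondary classifiers are omitted.

```python
import numpy as np

def combine_scores(score_list, rule="sum"):
    """Fuse class-confidence matrices from several classifiers.

    score_list: list of (n_samples, n_classes) arrays, one per classifier.
    Returns the predicted class index for each sample.
    """
    scores = np.stack(score_list)        # (n_classifiers, n_samples, n_classes)
    if rule == "sum":
        fused = scores.sum(axis=0)
    elif rule == "product":
        fused = scores.prod(axis=0)
    elif rule == "max":
        fused = scores.max(axis=0)
    elif rule == "majority":             # each classifier casts one vote
        votes = scores.argmax(axis=2)    # (n_classifiers, n_samples)
        n_classes = scores.shape[2]
        fused = np.apply_along_axis(np.bincount, 0, votes, None, n_classes).T
    else:
        raise ValueError(f"unknown rule: {rule}")
    return fused.argmax(axis=1)
```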

Open Access Article: Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary
J. Imaging 2018, 4(2), 37; doi:10.3390/jimaging4020037
Received: 31 October 2017 / Revised: 27 January 2018 / Accepted: 2 February 2018 / Published: 8 February 2018
Abstract
In this paper, we improve the performance of the recently proposed Direct Query Classifier (DQC). The DQC is a classifier-based retrieval method, and such methods have in general been shown to be superior to OCR-based solutions for retrieval in many practical document image datasets. In the DQC, classifiers are trained for a set of frequent queries and seamlessly extended to rare and arbitrary queries, which extends the classifier-based retrieval paradigm to the unlimited number of classes (words) present in a language. The DQC requires indexing cut-portions (n-grams) of the word image, and the DTW distance has been used for indexing. However, DTW is computationally slow and therefore limits the performance of the DQC. We introduce a query-specific DTW distance, which enables effective computation of global principal alignments for novel queries. Since the proposed query-specific DTW distance is a linear approximation of the DTW distance, it enhances the performance of the DQC. Unlike previous approaches, it uses both the class mean vectors and the query information for computing the global principal alignments for the query. Since the proposed method computes the global principal alignments using n-grams, it works well for both frequent and rare queries. We also use query expansion (QE) to further improve the performance of our query-specific DTW, which allows us to seamlessly adapt our solution to new fonts, styles, and collections. We have demonstrated the utility of the proposed technique on three different datasets, and the proposed query-specific DTW performs well compared to previous DTW approximations.
(This article belongs to the Special Issue Document Image Processing)
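
For context, the baseline DTW distance that the paper approximates is the classic quadratic-time dynamic program sketched below; the authors' query-specific linear approximation and the n-gram indexing are not reproduced here.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance between
    two feature sequences (one row per frame). This quadratic cost is
    exactly what faster approximations try to avoid."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],           # insertion
                                 D[i, j - 1],           # deletion
                                 D[i - 1, j - 1])       # match
    return D[n, m]
```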

Open Access Article: Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames
J. Imaging 2018, 4(2), 32; doi:10.3390/jimaging4020032
Received: 26 November 2017 / Revised: 23 January 2018 / Accepted: 26 January 2018 / Published: 31 January 2018
Abstract
Recognizing text in video is more complex than in other environments, such as scanned documents. Video text appears in various colors and in unknown fonts and sizes, and is often affected by compression artifacts and low quality. In contrast to Latin text, there are no publicly available datasets covering all aspects of the Arabic video OCR domain. This paper describes a new, well-defined and annotated Arabic-Text-in-Video dataset called AcTiV 2.0. The dataset is dedicated especially to building and evaluating Arabic video text detection and recognition systems. AcTiV 2.0 contains 189 video clips serving as raw material for creating 4063 key frames for the detection task and 10,415 cropped text images for the recognition task. AcTiV 2.0 is also distributed with its annotation and evaluation tools, which are made open source for standardization and validation purposes. This paper also reports on the evaluation of several systems tested under the proposed detection and recognition protocols.
(This article belongs to the Special Issue Document Image Processing)

Open Access Article: A New Binarization Algorithm for Historical Documents
J. Imaging 2018, 4(2), 27; doi:10.3390/jimaging4020027
Received: 31 October 2017 / Revised: 16 January 2018 / Accepted: 16 January 2018 / Published: 23 January 2018
Abstract
Monochromatic documents demand far less bandwidth for network transmission and far less storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, the digitization process, etc. are some of the factors that affect binarization. This article presents a new binarization algorithm for historical documents. The new global filter proposed is performed in four steps: filtering the image using a bilateral filter, splitting the image into its RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu's method with a choice of threshold level, and classification of the binarized images to decide which of the RGB components best preserves the document information in the foreground. The quantitative and qualitative assessment made against 23 binarization algorithms on three sets of "real world" documents showed very good results.
(This article belongs to the Special Issue Document Image Processing)
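
The first three steps of the four-step pipeline described above map naturally onto standard OpenCV calls. The sketch below is a minimal illustration: the bilateral filter parameters are arbitrary, plain Otsu thresholding stands in for the authors' Otsu-inspired threshold choice, and the final channel-selection step is left as a stub.

```python
import cv2

def binarize_per_channel(bgr_image):
    """Steps 1-3: bilateral filtering, RGB split, per-channel Otsu
    thresholding. Returns one binary image per channel; step 4 (picking
    the channel that best preserves the foreground) is not shown."""
    smoothed = cv2.bilateralFilter(bgr_image, 9, 75, 75)   # d, sigmaColor, sigmaSpace
    binarized = []
    for channel in cv2.split(smoothed):                    # B, G, R planes
        _, binary = cv2.threshold(channel, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        binarized.append(binary)
    return binarized
```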

Open Access Article: Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks
J. Imaging 2018, 4(1), 15; doi:10.3390/jimaging4010015
Received: 30 October 2017 / Revised: 22 December 2017 / Accepted: 2 January 2018 / Published: 11 January 2018
Abstract
The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of the text images obtained from digitization is necessary to provide efficient information access to the content of these documents. Handwritten Text Recognition (HTR), which allows us to obtain transcriptions from text images, has become an important research topic in the areas of image and computational language processing. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution that avoids this limitation. It consists of associating a powerful optical recognition system, which copes with image noise and variability, with a language model based on sub-lexical units, which models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing its coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER), and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bi-directional Long Short-Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Networks (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this dataset and significantly improving OOV recognition.
(This article belongs to the Special Issue Document Image Processing)
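
To see why sub-lexical units increase lexicon coverage, consider the toy sketch below, which segments words by greedy longest match against a small unit inventory: any OOV word still decomposes into known units (single characters in the worst case). The inventory, function, and example words are illustrative; they are not the authors' units or language model.

```python
def segment(word, units):
    """Greedy longest-match split of a word into sub-lexical units,
    falling back to single characters so every word is covered."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):          # try the longest unit first
            if word[i:j] in units or j == i + 1:   # single-char fallback
                parts.append(word[i:j])
                i = j
                break
    return parts

units = {"escri", "tura", "an", "ti", "gua"}       # toy inventory (illustrative)
print(segment("escritura", units))                 # ['escri', 'tura']
print(segment("antigua", units))                   # ['an', 'ti', 'gua']
```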

Open Access Article: A Holistic Technique for an Arabic OCR System
J. Imaging 2018, 4(1), 6; doi:10.3390/jimaging4010006
Received: 30 October 2017 / Revised: 18 December 2017 / Accepted: 22 December 2017 / Published: 27 December 2017
Abstract
Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, where characters frequently overlap. Holistic approaches, which consider whole words as single units, were introduced as an effective way to avoid such segmentation errors. Still, the main challenge for these approaches is their computational complexity, especially when dealing with large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. Using global word-level Discrete Cosine Transform (DCT) based features in combination with local block-based features, our proposed approach manages to generalize to new font sizes that were not included in the training data. Evaluation results for the approach, using different test sets from modern and historical Arabic books, are promising compared with state-of-the-art Arabic OCR systems.
(This article belongs to the Special Issue Document Image Processing)
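
The global word-level DCT features mentioned in the abstract can be sketched in a few lines with SciPy: take the 2D DCT of a grayscale word image and keep the low-frequency block, which captures the coarse word shape. The block size and the grayscale input are illustrative assumptions; the paper combines such global features with local block-based ones.

```python
import numpy as np
from scipy.fftpack import dct

def dct_features(word_image, k=8):
    """Global features for a grayscale word image (at least k x k pixels):
    2D DCT, keeping only the k x k low-frequency coefficients."""
    img = np.asarray(word_image, dtype=float)
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:k, :k].ravel()   # top-left block = coarse shape information
```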

Open Access Article: DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images
J. Imaging 2017, 3(4), 62; doi:10.3390/jimaging3040062
Received: 30 October 2017 / Revised: 29 November 2017 / Accepted: 5 December 2017 / Published: 11 December 2017
Abstract
Most digital libraries that provide user-friendly interfaces, enabling quick and intuitive access to their resources, are based on Document Image Analysis and Recognition (DIAR) methods. Such DIAR methods need ground-truthed document images to be evaluated/compared and, in some cases, trained. Especially with the advent of deep learning-based approaches, the required size of annotated document datasets seems to be ever-growing. Manually annotating real documents has many drawbacks, which often leads to small, if reliably annotated, datasets. In order to circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform, open-source software tool able to create many synthetic document images with controlled ground truth. DocCreator has been used in various experiments, showing the benefit of using such synthetic images to enrich the training stage of DIAR tools.
(This article belongs to the Special Issue Document Image Processing)
