
Open Access Article
J. Imaging 2017, 3(4), 62; https://doi.org/10.3390/jimaging3040062

DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images

1 Laboratoire Bordelais de Recherche en Informatique UMR 5800, Université de Bordeaux, CNRS, Bordeaux INP, 33400 Talence, France
2 Laboratoire Informatique, Image et Interaction (L3i), Université de La Rochelle, 17000 La Rochelle, France
3 LIPADE Laboratory, Paris Descartes University, 45, rue des Saints-Pères, 75270 Paris, CEDEX 6, France
These authors contributed equally to this work. Contributions of the other authors: Kieu Van-Cuong worked on the degradation models, and Antoine Billy worked on synthetic document reconstruction.
* Author to whom correspondence should be addressed.
Received: 30 October 2017 / Revised: 29 November 2017 / Accepted: 5 December 2017 / Published: 11 December 2017
(This article belongs to the Special Issue Document Image Processing)

Abstract

Most digital libraries that provide user-friendly interfaces, enabling quick and intuitive access to their resources, are based on Document Image Analysis and Recognition (DIAR) methods. Such DIAR methods need ground-truthed document images for evaluation and comparison and, in some cases, for training. Especially with the advent of deep learning-based approaches, the required size of annotated document datasets seems to be ever-growing. Manually annotating real documents has many drawbacks, which often result in only small, reliably annotated datasets. In order to circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform and open-source software able to create many synthetic document images with controlled ground truth. DocCreator has been used in various experiments, showing the value of using such synthetic images to enrich the training stage of DIAR tools.
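To illustrate what "controlled ground truth" means in practice: because degradations are applied programmatically, the transcription of every synthetic variant is known by construction, so no manual annotation is needed. The following is a minimal, generic Python sketch under stated assumptions. It represents a toy binary image as nested lists and uses simple random pixel-flip noise. The names `Sample`, `flip_noise` and `augment` are hypothetical, and this noise model is not one of DocCreator's own degradation models.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Sample:
    image: list[list[int]]  # toy binary image: 1 = ink, 0 = background (assumption)
    text: str               # ground-truth transcription, known by construction


def flip_noise(image: list[list[int]], p: float, rng: random.Random) -> list[list[int]]:
    """Return a new image where each pixel is flipped independently with probability p."""
    return [[1 - px if rng.random() < p else px for px in row] for row in image]


def augment(sample: Sample, n: int, p: float = 0.05, seed: int = 0) -> list[Sample]:
    """Create n degraded copies of `sample`; the ground-truth text is carried over unchanged."""
    rng = random.Random(seed)  # seeded, so the synthetic dataset is reproducible
    return [Sample(flip_noise(sample.image, p, rng), sample.text) for _ in range(n)]


clean = Sample([[0, 1, 1, 0], [0, 1, 0, 0]], "l")
variants = augment(clean, n=3, p=0.1, seed=42)
for v in variants:
    print(v.text, v.image)
```

The design choice worth noting is that the label never passes through the degradation function: only the pixels change, so the annotation stays exact no matter how strong the noise is.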
Keywords: synthetic image generation; document degradation models; performance evaluation; data augmentation for retraining and fine-tuning; DIAR
Figures

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

MDPI and ACS Style

Journet, N.; Visani, M.; Mansencal, B.; Van-Cuong, K.; Billy, A. DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images. J. Imaging 2017, 3, 62.




J. Imaging EISSN 2313-433X. Published by MDPI AG, Basel, Switzerland.