Open Access Article
J. Imaging 2018, 4(2), 37; https://doi.org/10.3390/jimaging4020037

Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary

1 Center for Visual Information Technology, IIIT Hyderabad, Hyderabad 500 032, India
2 CSE Department, Stony Brook University, Stony Brook, NY 11794, USA
3 Department of Computer Science and Engineering, IIT Jodhpur, Jodhpur 342037, India
* Author to whom correspondence should be addressed.
Received: 31 October 2017 / Revised: 27 January 2018 / Accepted: 2 February 2018 / Published: 8 February 2018
(This article belongs to the Special Issue Document Image Processing)

Abstract

In this paper, we improve the performance of the recently proposed Direct Query Classifier (DQC). The DQC is a classifier-based retrieval method; in general, such methods have been shown to be superior to OCR-based solutions for retrieval in many practical document image datasets. In the DQC, classifiers are trained for a set of frequent queries and seamlessly extended to rare and arbitrary queries. This extends the classifier-based retrieval paradigm to an unlimited number of classes (words) present in a language. The DQC requires indexing cut-portions (n-grams) of the word image, and the DTW distance has been used for this indexing. However, DTW is computationally slow and therefore limits the performance of the DQC. We introduce a query specific DTW distance, which enables effective computation of global principal alignments for novel queries. Since the proposed query specific DTW distance is a linear approximation of the DTW distance, it enhances the performance of the DQC. Unlike previous approaches, the proposed query specific DTW distance uses both the class mean vectors and the query information to compute the global principal alignments for the query. Since the proposed method computes the global principal alignments using n-grams, it works well for both frequent and rare queries. We also use query expansion (QE) to further improve the performance of our query specific DTW. This also allows us to seamlessly adapt our solution to new fonts, styles and collections. We demonstrate the utility of the proposed technique on three different datasets. The proposed query specific DTW performs well compared to previous DTW approximations.
Keywords: DTW distance; query classifiers; word spotting; indexing; retrieval
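
The abstract contrasts the cost of full DTW with a linear approximation built on precomputed alignments. The abstract does not say how the global principal alignments are obtained, so the following is only a minimal sketch of the general idea and not the authors' method: full DTW fills an n × m table, whereas scoring a pair of sequences along an alignment path that was computed once costs time linear in the path length. The function names `dtw` and `fixed_alignment_distance` are hypothetical, and the Euclidean local cost is an assumption.

```python
import numpy as np


def dtw(x, y):
    """Full DTW between x (n, d) and y (m, d).

    Returns the DTW distance and the optimal warping path as a list of
    (i, j) index pairs. Cost is O(n * m).
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean local costs (assumed; any local cost would do).
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last cell to recover the optimal path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


def fixed_alignment_distance(x, y, path):
    """Sum local costs along a precomputed path; linear in len(path).

    The path must be a valid warping path for sequences of these lengths.
    """
    return float(sum(np.linalg.norm(x[i] - y[j]) for i, j in path))
```

Because any valid warping path gives a cost no smaller than the optimal one, the fixed-alignment score is an upper bound on the true DTW distance, and it equals the DTW distance when the reused path happens to be optimal for that pair. In a retrieval setting, the path would be computed once (for example against a representative sequence of the same length) and then reused across many database items.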
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Nagendar, G.; Ranjan, V.; Harit, G.; Jawahar, C.V. Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary. J. Imaging 2018, 4, 37.

J. Imaging, EISSN 2313-433X, published by MDPI AG, Basel, Switzerland