Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary
Abstract: In this paper, we improve the performance of the recently proposed Direct Query Classifier (DQC). The DQC is a classifier-based retrieval method, and such methods have generally been shown to be superior to OCR-based solutions for retrieval in many practical document image datasets. In the DQC, classifiers are trained for a set of frequent queries and seamlessly extended to rare and arbitrary queries. This extends the classifier-based retrieval paradigm to the unlimited number of classes (words) present in a language. The DQC requires indexing cut-portions (n-grams) of the word image, and the DTW distance has been used for this indexing. However, DTW is computationally slow and therefore limits the performance of the DQC. We introduce a query specific DTW distance, which enables effective computation of global principal alignments for novel queries. Since the proposed query specific DTW distance is a linear approximation of the DTW distance, it enhances the performance of the DQC. Unlike previous approaches, the proposed query specific DTW distance uses both the class mean vectors and the query information to compute the global principal alignments for the query. Since the proposed method computes the global principal alignments using n-grams, it works well for both frequent and rare queries. We also use query expansion (QE) to further improve the performance of our query specific DTW. This also allows us to seamlessly adapt our solution to new fonts, styles and collections. We demonstrate the utility of the proposed technique on three different datasets. The proposed query specific DTW performs well compared to previous DTW approximations.
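To make the cost argument in the abstract concrete, the following is a minimal Python sketch of the classic DTW distance, which fills an n-by-m table and so takes time quadratic in the sequence lengths. It also shows that once an alignment path is fixed, the distance can be evaluated in time linear in the path length. This is only an illustration of why precomputed alignments are cheaper. It is not the paper's query specific DTW, and the function names `dtw` and `fixed_alignment_distance` are hypothetical.

```python
import math


def dtw(x, y):
    """Classic DTW between two non-empty sequences of feature vectors.

    Returns (distance, path), where path is a list of (i, j) index pairs.
    The table fill is O(n * m), which is the cost the abstract calls slow.
    """
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])  # Euclidean local cost
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the last cell to recover one optimal alignment path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
        i, j = min(steps, key=lambda c: acc[c[0]][c[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


def fixed_alignment_distance(x, y, path):
    """Distance along a precomputed alignment: linear in len(path).

    Illustrative assumption: the path is supplied from elsewhere, for example
    computed once and reused, instead of being recomputed for every pair.
    """
    return sum(math.dist(x[i], y[j]) for i, j in path)


if __name__ == "__main__":
    a = [[0.0], [1.0], [2.0]]
    b = [[0.0], [1.0], [1.0], [2.0]]
    distance, path = dtw(a, b)
    print(distance, path)
    print(fixed_alignment_distance(a, b, path))
```

Evaluating `fixed_alignment_distance` on the path returned by `dtw` for the same pair reproduces the DTW distance. Reusing a path across different pairs gives only an approximation, which is the general trade-off the abstract refers to.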
Nagendar, G.; Ranjan, V.; Harit, G.; Jawahar, C.V. Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary. J. Imaging 2018, 4, 37.