Enhancing Keyword Spotting via NLP-Based Re-Ranking: Leveraging Semantic Relevance Feedback in the Handwritten Domain
Abstract
1. Introduction
1.1. Motivation
1.2. Contribution
- We present a novel framework for semantic-aware re-ranking in handwritten document image KWS by integrating cutting-edge NLP techniques into the retrieval pipeline. To the best of our knowledge, this is the first attempt to incorporate transformer-based Large Language Models (LLMs) for uncovering semantic relationships between the query and its top-ranked retrieved instances, enabling post-retrieval re-ranking that enhances overall KWS performance.
- We explore two distinct decoding strategies for transcribing word image instances from the initial ranked list: (i) TrOCR [37], a transformer-based vision-to-text model that maps visual input directly to character sequences, and (ii) the character counting and Connectionist Temporal Classification (CTC)-based re-scoring mechanism introduced in [33], a compact Convolutional Neural Network (CNN)-based segmentation-free approach that scores query matches based on dense character probability maps and sequence alignment. These transcriptions are subsequently embedded using state-of-the-art language models (e.g., RoBERTa [38]), projecting each word into a semantically meaningful vector space for downstream re-ranking.
- We propose a new ranking scheme in which each word instance is assigned a composite score: a weighted sum of its verbatim (visual-based) similarity and its semantic similarity to the query (a minimal formalization is sketched after this list). This allows semantically relevant words, possibly missed in the initial visual ranking, to be ranked higher in the final ordering.
- We perform extensive ablation studies on the George Washington (GW) and IAM datasets using two state-of-the-art reference KWS systems that operate directly at the page level, namely KWS-Simplified [33] and WordRetrievalNet [29], and evaluate mean Average Precision (mAP) improvements across different embedding strategies and similarity metrics.
- Finally, our results show that NLP-based re-ranking not only improves standard KWS performance but also paves the way toward semantically aware information retrieval systems. These systems bridge the gap between visually dissimilar yet semantically related word instances while maintaining a plug-and-play design that is both dataset-agnostic and independent of baseline model retraining. This generalization capability is further supported by the low variability of re-ranking effectiveness across cross-validation folds: throughout our experiments, we observe consistently improved mAP with low standard deviation. Taken together, our approach pushes the frontier of recognition-free document understanding by integrating semantic reasoning into post-retrieval analysis.
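For concreteness, the composite score in the third contribution above can be written as a simple late-fusion rule. The sketch below uses illustrative notation (a semantic weight lambda, the decoded transcription of each candidate word image, and a generic embedding function) rather than the exact symbols introduced later in Section 3.3.

```latex
% Illustrative late-fusion score (notation assumed, not the paper's exact symbols).
% s_vis: verbatim (visual) similarity from the baseline KWS ranking.
% s_sem: semantic similarity between the query string q and the decoded
%        transcription \hat{t}_w of word instance w, under embedding e(.).
\[
  s_{\mathrm{final}}(q, w) = (1 - \lambda)\, s_{\mathrm{vis}}(q, w) + \lambda\, s_{\mathrm{sem}}(q, w),
  \qquad
  s_{\mathrm{sem}}(q, w) = \frac{\mathbf{e}(q)^{\top}\, \mathbf{e}(\hat{t}_w)}
                                 {\lVert \mathbf{e}(q) \rVert\, \lVert \mathbf{e}(\hat{t}_w) \rVert},
  \qquad \lambda \in [0, 1].
\]
```

Setting the semantic weight to 0 recovers the original visual ranking, while setting it to 1 ranks purely by semantic similarity; this is the sweep reported in the ablation tables of Section 4.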
2. Related Works
2.1. Segmentation-Free Keyword Spotting: Key Approaches and Trends
2.2. Retrieval Enhancement via Relevance Feedback and Re-Ranking
2.2.1. Supervised Relevance Feedback
2.2.2. Unsupervised Feedback and Re-Ranking
2.3. Semantic Knowledge Transfer
3. Materials and Methods
3.1. WordRetrievalNet
- Offline stage: A deep neural network (DNN) is trained to generate a database of candidate bounding boxes along with their representations in a latent space.
- Online stage: A query is matched against the database, returning a ranked list of bounding boxes with the highest cosine similarity to the query representation (a minimal sketch of this matching step follows the list of prediction heads below).
- A classification head that identifies pixels belonging to a positive word area;
- A regression head that predicts the offsets between each pixel of a positive word area and the corresponding bounding box containing it;
- An embedding head that maps word areas into the latent space (DCToW, PHOC, etc.).
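To make the online matching stage concrete, the following is a minimal sketch of cosine-similarity ranking over precomputed box embeddings. All names (e.g., `rank_boxes`, `box_embeddings`) are illustrative assumptions and do not correspond to WordRetrievalNet's released code.

```python
import numpy as np

def rank_boxes(query_embedding: np.ndarray, box_embeddings: np.ndarray, top_k: int = 50):
    """Rank candidate bounding boxes by cosine similarity to the query.

    query_embedding: (d,) embedding of the query string in the shared latent
                     space (e.g., a DCToW- or PHOC-style representation).
    box_embeddings:  (n, d) embeddings of all candidate boxes produced offline.
    Returns the indices of the top_k boxes and their similarity scores.
    """
    q = query_embedding / (np.linalg.norm(query_embedding) + 1e-12)
    b = box_embeddings / (np.linalg.norm(box_embeddings, axis=1, keepdims=True) + 1e-12)
    scores = b @ q                        # cosine similarity of each box to the query
    order = np.argsort(-scores)[:top_k]   # highest similarity first
    return order, scores[order]
```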
3.2. Segmentation-Free KWS Simplified
3.3. Proposed Framework
- Two distinct decoder architectures;
- Three alternative state-of-the-art semantic LLMs;
- Two late fusion strategies.
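As a rough illustration of how these components fit together, the sketch below re-scores a baseline top-k list with the weighted-combination fusion strategy. It assumes the `sentence-transformers` library and one of the three semantic models evaluated later (here all-MiniLM-L12-v2); `decode_word_image` is a hypothetical placeholder for either decoder (TrOCR or the CTC-based scheme of [33]), and the snippet is a sketch, not our released implementation.

```python
from sentence_transformers import SentenceTransformer, util

semantic_model = SentenceTransformer("all-MiniLM-L12-v2")  # any of the three semantic LLMs

def rerank(query: str, candidates, decode_word_image, semantic_weight: float = 0.2):
    """Weighted late fusion of verbatim (visual) and semantic scores.

    candidates: list of (word_image, visual_score) pairs from the baseline KWS
                system; visual_score is assumed to be normalized to [0, 1].
    decode_word_image: callable mapping a word image to a transcription string.
    """
    transcriptions = [decode_word_image(img) for img, _ in candidates]
    embeddings = semantic_model.encode([query] + transcriptions, convert_to_tensor=True)
    semantic_scores = util.cos_sim(embeddings[0], embeddings[1:]).squeeze(0).tolist()
    fused = [
        (1.0 - semantic_weight) * visual + semantic_weight * semantic
        for (_, visual), semantic in zip(candidates, semantic_scores)
    ]
    # Return candidates sorted by the fused score, highest first.
    return sorted(zip(candidates, fused), key=lambda pair: -pair[1])
```

The semantic pruning variant evaluated in Section 4.4.5 can be obtained from the same ingredients by discarding candidates whose semantic similarity falls below a threshold rather than re-weighting them.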
4. Experimental Evaluation
4.1. Datasets
4.2. Evaluation Protocol
4.3. Implementation Details
4.4. Ablation Experiments on Fusion Strategy and Decoder Choice
4.4.1. Impact of the Baseline KWS Model
4.4.2. Impact of the Decoder
4.4.3. Semantic Embedding Models
4.4.4. Fusion Strategy: Weighted Combination
4.4.5. Fusion Strategy: Semantic Pruning
4.4.6. Qualitative Analysis
4.5. Discussion
5. Conclusions
- Exploring transformer-based late fusion strategies that support end-to-end trainable re-ranking, inspired by cross-modal architectures such as CLIP-Rerankers and BLIP;
- Incorporating dynamic query expansion mechanisms via LLMs to improve semantic coverage and reduce reliance on exact phrasing;
- Integrating joint optimization objectives with vision–language models to enhance alignment between visual content and textual intent.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Giotis, A.P.; Sfikas, G.; Gatos, B.; Nikou, C. A survey of document image word spotting techniques. Pattern Recognit. 2017, 68, 310–332. [Google Scholar] [CrossRef]
- Almazán, J.; Gordo, A.; Fornés, A.; Valveny, E. Word Spotting and Recognition with Embedded Attributes. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2552–2566. [Google Scholar] [CrossRef] [PubMed]
- Sudholt, S.; Fink, G.A. Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 493–498. [Google Scholar] [CrossRef]
- Wei, H.; Zhang, J.; Liu, K. A Hybrid Representation of Word Images for Keyword Spotting. In Proceedings of the Neural Information Processing, ICONIP 2020, Bangkok, Thailand, 18–22 November 2020; Yang, H., Pasupa, K., Leung, A., Kwok, J., Chan, J., King, I., Eds.; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2020; Volume 1332, pp. 3–15. [Google Scholar] [CrossRef]
- Wolf, F.; Brandenbusch, K.; Fink, G.A. Improving Handwritten Word Synthesis for Annotation-free Word Spotting. In Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 61–66. [Google Scholar] [CrossRef]
- Marcelli, A.; De Gregorio, G.; Santoro, A. A Model for Evaluating the Performance of a Multiple Keywords Spotting System for the Transcription of Historical Handwritten Documents. J. Imaging 2020, 6, 117. [Google Scholar] [CrossRef] [PubMed]
- Parziale, A.; Capriolo, G.; Marcelli, A. One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document. J. Imaging 2020, 6, 109. [Google Scholar] [CrossRef] [PubMed]
- Cheikhrouhou, A.; Kessentini, Y.; Kanoun, S. Multi-task learning for simultaneous script identification and keyword spotting in document images. Pattern Recognit. 2021, 113, 107832. [Google Scholar] [CrossRef]
- Daraee, F.; Mozaffari, S.; Razavi, S.M. Handwritten keyword spotting using deep neural networks and certainty prediction. Comput. Electr. Eng. 2021, 92, 107–111. [Google Scholar] [CrossRef]
- Wolf, F.; Fischer, A.; Fink, G. Graph Convolutional Neural Networks for Learning Attribute Representations for Word Spotting. In Proceedings of the Document Analysis and Recognition–ICDAR 2021, Lausanne, Switzerland, 5–10 September 2021; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12821, pp. 51–66. [Google Scholar] [CrossRef]
- Retsinas, G.; Sfikas, G.; Nikou, C.; Maragos, P. From Seq2Seq Recognition to Handwritten Word Embeddings. In Proceedings of the 32nd British Machine Vision Conference (BMVC), Online, 22–25 November 2021; pp. 1–14. Available online: https://www.bmvc2021-virtualconference.com/assets/papers/1481.pdf (accessed on 4 May 2025).
- Kundu, S.; Malakar, S.; Geem, Z.; Moon, Y.; Singh, P.; Sarkar, R. Hough Transform-Based Angular Features for Learning-Free Handwritten Keyword Spotting. Sensors 2021, 21, 4648. [Google Scholar] [CrossRef]
- Majumder, S.; Ghosh, S.; Malakar, S.; Sarkar, R.; Nasipuri, M. A voting-based technique for word spotting in handwritten document images. Multimed. Tools Appl. 2021, 80, 12411–12434. [Google Scholar] [CrossRef]
- De Gregorio, G.; Biswas, S.; Souibgui, M.A.; Bensalah, A.; Lladós, J.; Fornés, A.; Marcelli, A. A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts. In Proceedings of the Frontiers in Handwriting Recognition, ICFHR 2022, Hyderabad, India, 4–7 December 2022; Porwal, U., Fornés, A., Shafait, F., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13639, pp. 3–17. [Google Scholar] [CrossRef]
- Ghilas, H.; Gagaoua, M.; Tari, A.; Cheriet, M. Spatial Distribution of Ink at Keypoints (SDIK): A Novel Feature for Word Spotting in Arabic Documents. Int. J. Image Graph. 2022, 22, 2250035. [Google Scholar] [CrossRef]
- Gongidi, S.; Jawahar, C. Handwritten Text Retrieval from Unlabeled Collections. In Computer Vision and Image Processing (CVIP 2021), Rupnagar, India, 3–5 December 2021; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2022; Volume 1568, pp. 3–13. [Google Scholar] [CrossRef]
- Giotis, A.P.; Sfikas, G.; Nikou, C. Adversarial Deep Features for Weakly Supervised Document Image Keyword Spotting. In Proceedings of the 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Nafplio, Greece, 26–29 June 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Banerjee, D.; Bhowal, P.; Malakar, S.; Cuevas, E.; Pérez-Cisneros, M.; Sarkar, R. Z-Transform-Based Profile Matching to Develop a Learning-Free Keyword Spotting Method for Handwritten Document Images. Int. J. Comput. Intell. Syst. 2022, 15, 93. [Google Scholar] [CrossRef]
- Krishnan, P.; Dutta, K.; Jawahar, C. HWNet v3: A joint embedding framework for recognition and retrieval of handwritten text. Int. J. Doc. Anal. Recognit. (IJDAR) 2023, 26, 401–417. [Google Scholar] [CrossRef]
- Vidal, E.; Toselli, A.; Puigcerver, J. Lexicon-based probabilistic indexing of handwritten text images. Neural Comput. Appl. 2023, 35, 17501–17520. [Google Scholar] [CrossRef]
- Wolf, F.; Fink, G.A. Self-training for handwritten word recognition and retrieval. Int. J. Doc. Anal. Recognit. (IJDAR) 2024, 27, 225–244. [Google Scholar] [CrossRef]
- Matos, A.; Almeida, P.; Correia, P.; Pacheco, O. iForal: Automated Handwritten Text Transcription for Historical Medieval Manuscripts. J. Imaging 2025, 11, 36. [Google Scholar] [CrossRef] [PubMed]
- Khamekhem Jemni, S.; Ammar, S.; Souibgui, M.A.; Kessentini, Y.; Cheddad, A. ST-KeyS: Self-supervised Transformer for Keyword Spotting in historical handwritten documents. Pattern Recognit. 2026, 170, 112036. [Google Scholar] [CrossRef]
- Rusiñol, M.; Aldavert, D.; Toledo, R.; Lladós, J. Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method. In Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 63–67. [Google Scholar]
- Almazán, J.; Gordo, A.; Fornés, A.; Valveny, E. Efficient Exemplar Word Spotting. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Guildford, UK, 3–7 September 2012; pp. 67.1–67.11. [Google Scholar]
- Kovalchuk, A.; Wolf, L.; Dershowitz, N. A Simple and Fast Word Spotting Method. In Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), Crete, Greece, 1–4 September 2014; pp. 3–8. [Google Scholar]
- Rothacker, L.; Fink, G.A. Segmentation-free Query-by-String Word Spotting with Bag-of-Features HMMs. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 661–665. [Google Scholar]
- Wilkinson, T.; Lindström, J.; Brun, A. Neural Ctrl-F: Segmentation-Free Query-by-String Word Spotting in Handwritten Manuscript Collections. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4443–4452. [Google Scholar] [CrossRef]
- Zhao, P.; Xue, W.; Li, Q.; Cai, S. Query by Strings and Return Ranking Word Regions with Only One Look. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan, 30 November–4 December 2020; Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J., Eds.; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2021; Volume 12622, pp. 3–18. [Google Scholar] [CrossRef]
- Wilkinson, T.; Nettelblad, C. Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment. In Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 49–54. [Google Scholar] [CrossRef]
- Prabhakar, C. Segmentation-Free Word Spotting in Handwritten Documents Using Scale Space Co-HoG Feature Descriptors. In Applications of Advanced Machine Intelligence in Computer Vision and Object Recognition: Emerging Research and Opportunities; Chakraborty, S., Mali, K., Eds.; IGI Global: Hershey, PA, USA, 2020; pp. 219–247. [Google Scholar] [CrossRef]
- Das, S.; Mandal, S. Segmentation-free word spotting in historical Bangla handwritten document using Wave Kernel Signature. Pattern Anal. Appl. 2020, 23, 593–610. [Google Scholar] [CrossRef]
- Retsinas, G.; Sfikas, G.; Nikou, C. Keyword Spotting Simplified: A Segmentation-Free Approach Using Character Counting and CTC Re-scoring. In Proceedings of the International Conference on Document Analysis and Recognition, San Jose, CA, USA, 21–26 August 2023; Springer: Cham, Switzerland, 2023; pp. 446–464. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the Workshop at the International Conference on Learning Representations (ICLR), Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
- Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
- Li, M.; Lv, T.; Chen, J.; Cui, L.; Lu, Y.; Florencio, D.; Zhang, C.; Li, Z.; Wei, F. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 13094–13102. [Google Scholar]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3982–3992. [Google Scholar] [CrossRef]
- Leydier, Y.; Ouji, A.; LeBourgeois, F.; Emptoz, H. Towards an Omnilingual Word Retrieval System for Ancient Manuscripts. Pattern Recognit. 2009, 42, 2089–2105. [Google Scholar] [CrossRef]
- Zhang, X.; Tan, C.L. Handwritten word image matching based on Heat Kernel Signature. Pattern Recognit. 2015, 48, 3346–3356. [Google Scholar] [CrossRef]
- Gatos, B.; Pratikakis, I. Segmentation-free word spotting in historical printed documents. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR), Barcelona, Spain, 26–29 July 2009; pp. 271–275. [Google Scholar]
- Rothacker, L.; Rusiñol, M.; Fink, G.A. Bag-of-Features HMMs for Segmentation-Free Word Spotting in Handwritten Documents. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; pp. 1305–1309. [Google Scholar]
- Almazán, J.; Gordo, A.; Fornés, A.; Valveny, E. Segmentation-free word spotting with exemplar SVMs. Pattern Recognit. 2014, 47, 3967–3978. [Google Scholar] [CrossRef]
- Riba, P.; Lladós, J.; Fornés, A. Handwritten Word Spotting by Inexact Matching of Grapheme Graphs. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 781–785. [Google Scholar]
- Zagoris, K.; Pratikakis, I.; Gatos, B. Unsupervised Word Spotting in Historical Handwritten Document Images Using Document-Oriented Local Features. IEEE Trans. Image Process. 2017, 26, 4032–4041. [Google Scholar] [CrossRef]
- Ghosh, S.K.; Valveny, E. R-PHOC: Segmentation-Free Word Spotting Using CNN. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 801–806. [Google Scholar] [CrossRef]
- Wilkinson, T.; Brun, A. Semantic and Verbatim Word Spotting Using Deep Neural Networks. In Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 307–312. [Google Scholar] [CrossRef]
- Rothacker, L.; Sudholt, S.; Rusakov, E.; Kasperidus, M.; Fink, G.A. Word Hypotheses for Segmentation-Free Word Spotting in Historic Document Images. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 1174–1179. [Google Scholar] [CrossRef]
- Tüselmann, O.; Fink, G.A. Exploring semantic word representations for recognition-free NLP on handwritten document images. In Proceedings of the International Conference on Document Analysis and Recognition, San Jose, CA, USA, 21–26 August 2023; Springer: Cham, Switzerland, 2023; pp. 85–100. [Google Scholar]
- He, B. Rocchio’s Formula. In Encyclopedia of Database Systems; Springer: New York, NY, USA, 2009; p. 2447. [Google Scholar]
- Rusiñol, M.; Lladós, J. Boosting the handwritten word spotting experience by including the user in the loop. Pattern Recognit. 2014, 47, 1063–1072. [Google Scholar] [CrossRef]
- Wolf, F.; Oberdiek, P.; Fink, G. Exploring Confidence Measures for Word Spotting in Heterogeneous Datasets. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; pp. 583–588. [Google Scholar] [CrossRef]
- Nara, R.; Yamaguchi, S.; Ito, K.; Yoshie, O. Revisiting Relevance Feedback for CLIP-Based Interactive Image Retrieval. In Proceedings of the Computer Vision–ECCV 2024 Workshops, Milan, Italy, 29 September–4 October 2024; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2025; Volume 15639, pp. 3–20. [Google Scholar] [CrossRef]
- Ghosh, S.; Valveny, E. A Sliding Window Framework for Word Spotting Based on Word Attributes. In Proceedings of the 7th Iberian Conference on Pattern Recognition and Image Analysis (PRAI), Santiago de Compostela, Spain, 17–19 June 2015; pp. 652–661. [Google Scholar]
- Shekhar, R.; Jawahar, C. Word Image Retrieval Using Bag of Visual Words. In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems (DAS), Gold Coast, QLD, Australia, 27–29 March 2012; pp. 297–301. [Google Scholar]
- Vats, E.; Hast, A.; Fornés, A. Training-Free and Segmentation-Free Word Spotting using Feature Matching and Query Expansion. In Proceedings of the 15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; pp. 1294–1299. [Google Scholar] [CrossRef]
- Chuang, Y.S.; Fang, W.; Li, S.W.; Yih, W.-t.; Glass, J. Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 12131–12147. [Google Scholar]
- Ou, W.; Huynh, V.N. Conditional variational autoencoder for query expansion in ad-hoc information retrieval. Inf. Sci. 2024, 652, 119764. [Google Scholar] [CrossRef]
- Djoudi, K.; Alimazighi, Z.; Hedjazi, B.D. Information retrieval with query expansion and re-ranking: A survey. In Proceedings of the 2nd International Conference on Emerging Trends and Applications in Artificial Intelligence (ICETAI 2024), Baghdad, Iraq, 2–3 October 2024; The Institution of Engineering and Technology: Stevenage, UK, 2025; Volume 2024, pp. 114–119. [Google Scholar] [CrossRef]
- Tüselmann, O.; Wolf, F.; Fink, G.A. Identifying and tackling key challenges in semantic word spotting. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 55–60. [Google Scholar]
- Krishnan, P.; Jawahar, C. Bringing Semantics in Word Image Retrieval. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; pp. 733–737. [Google Scholar]
- Gordo, A.; Almazán, J.; Murray, N.; Perronnin, F. LEWIS: Latent Embeddings for Word Images and Their Semantics. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1242–1250. [Google Scholar]
- Miller, G.A. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
- Sahlgren, M. The distributional hypothesis. Ital. J. Linguist. 2008, 20, 33–53. [Google Scholar]
- Krishnan, P.; Jawahar, C. Bringing semantics into word image representation. Pattern Recognit. 2020, 108, 107542. [Google Scholar] [CrossRef]
- Washington, G. The Writings of George Washington from the Original Manuscript Sources, 1745–1799; Fitzpatrick, J., Ed.; U.S. Government Printing Office: Washington, DC, USA, 1931.
- Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. MPNet: Masked and Permuted Pre-training for Language Understanding. Adv. Neural Inf. Process. Syst. 2020, 33, 16857–16867. [Google Scholar]
- Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Adv. Neural Inf. Process. Syst. 2020, 33, 5776–5788. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
- Wang, W.; Xie, E.; Li, X.; Hou, W.; Lu, T.; Yu, G.; Shao, S. Shape robust text detection with progressive scale expansion network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9336–9345. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
- Tian, J.; Yan, B.; Yu, J.; Weng, C.; Yu, D.; Watanabe, S. Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023; Available online: https://openreview.net/forum?id=Bd7GueaTxUz (accessed on 19 June 2025).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Graves, A.; Fernández, S.; Gomez, F.; Schmidhuber, J. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 369–376. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021; Available online: https://openreview.net/forum?id=YicbFdNTTy (accessed on 13 May 2025).
- Lavrenko, V.; Rath, T.M.; Manmatha, R. Holistic word recognition for handwritten historical documents. In Proceedings of the 1st International Workshop on Document Image Analysis for Libraries, Palo Alto, CA, USA, 23–24 January 2004; pp. 278–287. [Google Scholar]
- Marti, U.V.; Bunke, H. The IAM-database: An English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recognit. 2002, 5, 39–46. [Google Scholar] [CrossRef]
- Washington, G. George Washington Papers, Series 2, Letterbooks 1754–1799: Letterbook 1, Aug. 11, 1754–Dec. 25, 1755; 1755; pp. 270–279, 300–309. Manuscript/Mixed Material. Available online: https://www.loc.gov/item/mgw2.001/ (accessed on 30 April 2025).
- Model Card: TrOCR. Available online: https://huggingface.co/microsoft/trocr-base-handwritten (accessed on 28 May 2025).
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019; Available online: https://openreview.net/forum?id=Bkg6RiCqY7 (accessed on 15 May 2025).
- SentenceTransformers v5.0. Available online: https://www.sbert.net/ (accessed on 28 May 2025).
- Model Card: Stsb-Roberta-Base. Available online: https://huggingface.co/sentence-transformers/stsb-roberta-base (accessed on 28 May 2025).
- Cer, D.; Diab, M.; Agirre, E.; Lopez-Gazpio, I.; Specia, L. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, BC, Canada, 3–4 August 2017; Bethard, S., Carpuat, M., Apidianaki, M., Mohammad, S.M., Cer, D., Jurgens, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 1–14. [Google Scholar] [CrossRef]
- Model Card: All-Mpnet-Base-v2. Available online: https://huggingface.co/sentence-transformers/all-mpnet-base-v2 (accessed on 28 May 2025).
- Model Card: All-MiniLM-L12-v2. Available online: https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2 (accessed on 28 May 2025).
- Dey, S.; Nicolaou, A.; Lladós, J.; Pal, U. Evaluation of word spotting under improper segmentation scenario. Int. J. Doc. Anal. Recognit. (IJDAR) 2019, 22, 361–374. [Google Scholar] [CrossRef]
- Saad-Falcon, J.; Khattab, O.; Santhanam, K.; Florian, R.; Franz, M.; Roukos, S.; Sil, A.; Sultan, M.; Potts, C. UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 11265–11279. [Google Scholar] [CrossRef]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR, Virtual, 18–24 July 2021; Volume 139, pp. 8748–8763. [Google Scholar]
- Li, J.; Li, D.; Savarese, S.; Hoi, S. BLIP-2: Bootstrapping Language-Image Pre-Training with Frozen Image Encoders and Large Language Models. In Proceedings of the 40th International Conference on Machine Learning (ICML), ICML’23, Honolulu, HI, USA, 23–29 July 2023; Volume 202, pp. 19730–19742. [Google Scholar]
- Da, C.; Luo, C.; Zheng, Q.; Yao, C. Vision Grid Transformer for Document Layout Analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 19462–19471. [Google Scholar]
Fold No. | Page IDs Across Each Fold
---|---
1 | 274, 309, 276, 272, 303 |
2 | 306, 273, 301, 300, 278 |
3 | 270, 302, 277, 275, 308 |
4 | 307, 304, 279, 271, 305 |
Model | mAP@25 | mAP@50
---|---|---
WordRetrievalNet (Reproduced) | 94.31 ± 1.8 | 88.29 ± 4.0 |
WordRetrievalNet (Reported) | 96.46 | 94.06 |
KWS-Simplified (Reproduced) | 89.74 ± 0.7 | 72.29 ± 3.0 |
KWS-Simplified (Reported) | 91.6 | 66.4 |
Semantic Weight | Semantic LLM | GW: KWS-Simplified Decoder, mAP@25 | GW: KWS-Simplified Decoder, mAP@50 | GW: TrOCR Decoder, mAP@25 | GW: TrOCR Decoder, mAP@50 | IAM: KWS-Simplified Decoder, mAP@25 | IAM: KWS-Simplified Decoder, mAP@50 | IAM: TrOCR Decoder, mAP@25 | IAM: TrOCR Decoder, mAP@50
---|---|---|---|---|---|---|---|---|---
0.0 * | all-MiniLM-L12-v2 | 94.31 ± 1.8 | 88.29 ± 4.0 | 94.31 ± 1.8 | 88.29 ± 4.0 | 79.15 | 72.85 | 79.15 | 72.85
0.1 | 95.80 ± 1.5 | 89.51 ± 3.8 | 94.51 ± 1.8 | 88.43 ± 4.0 | 80.60 | 73.98 | 82.04 | 75.40 | |
0.2 | 96.10 ± 1.5 | 89.75 ± 3.8 | 93.96 ± 2.0 | 87.84 ± 4.0 | 79.29 | 72.62 | 80.59 | 73.77 | |
0.3 | 96.30 ± 1.3 | 89.79 ± 3.7 | 93.20 ± 2.1 | 87.07 ± 3.8 | 75.87 | 69.30 | 77.68 | 71.05 | |
0.4 | 96.04 ± 1.5 | 89.64 ± 3.6 | 91.71 ± 2.5 | 85.73 ± 3.9 | 71.36 | 65.13 | 74.04 | 67.66 | |
0.5 | 95.34 ± 1.4 | 89.01 ± 3.5 | 89.05 ± 2.9 | 83.19 ± 3.8 | 65.95 | 60.19 | 70.68 | 64.57 | |
0.6 | 94.39 ± 1.5 | 88.20 ± 3.5 | 84.82 ± 4.2 | 79.41 ± 4.2 | 60.26 | 54.99 | 67.17 | 61.33 | |
0.7 | 93.25 ± 1.6 | 87.37 ± 3.5 | 79.04 ± 4.8 | 74.03 ± 3.7 | 55.95 | 50.99 | 64.47 | 58.88 | |
0.8 | 91.92 ± 1.9 | 86.26 ± 3.5 | 73.12 ± 6.1 | 68.42 ± 4.5 | 53.12 | 48.38 | 62.88 | 57.42 | |
0.9 | 90.93 ± 2.0 | 85.42 ± 3.6 | 69.68 ± 6.6 | 65.20 ± 4.8 | 51.46 | 46.88 | 62.22 | 56.83 | |
1.0 | 89.83 ± 2.2 | 84.28 ± 3.7 | 64.58 ± 7.5 | 60.07 ± 6.0 | 47.96 | 43.71 | 54.34 | 49.48 | |
0.0 * | all-mpnet-base-v2 | 94.31 ± 1.8 | 88.29 ± 4.0 | 94.31 ± 1.8 | 88.29 ± 4.0 | 79.15 | 72.85 | 79.15 | 72.85
0.1 | 95.90 ± 1.4 | 89.67 ± 3.8 | 94.68 ± 1.6 | 88.62 ± 3.9 | 80.84 | 74.18 | 82.12 | 75.43 | |
0.2 | 96.27 ± 1.3 | 89.86 ± 3.8 | 94.09 ± 1.7 | 87.98 ± 3.8 | 80.01 | 73.21 | 80.77 | 74.02 | |
0.3 | 96.50 ± 1.4 | 89.97 ± 4.0 | 93.07 ± 2.1 | 87.02 ± 3.7 | 77.55 | 70.82 | 77.87 | 71.29 | |
0.4 | 96.56 ± 1.5 | 90.08 ± 3.9 | 91.78 ± 2.4 | 85.85 ± 3.7 | 73.86 | 67.22 | 74.73 | 68.33 | |
0.5 | 96.00 ± 1.3 | 89.62 ± 3.6 | 89.42 ± 2.9 | 83.85 ± 3.8 | 69.24 | 62.95 | 71.23 | 65.12 | |
0.6 | 95.04 ± 1.2 | 88.82 ± 3.5 | 85.13 ± 3.6 | 79.84 ± 3.8 | 63.70 | 57.97 | 67.84 | 61.99 | |
0.7 | 93.83 ± 1.6 | 87.86 ± 3.4 | 79.10 ± 5.3 | 74.10 ± 4.1 | 58.18 | 52.97 | 65.43 | 59.78 | |
0.8 | 92.57 ± 1.7 | 86.71 ± 3.4 | 74.06 ± 6.2 | 69.30 ± 4.6 | 53.92 | 49.10 | 63.80 | 58.25 | |
0.9 | 91.53 ± 1.8 | 85.84 ± 3.6 | 70.50 ± 6.4 | 65.90 ± 4.6 | 51.43 | 46.89 | 62.78 | 57.36 | |
1.0 | 90.21 ± 2.1 | 84.45 ± 3.7 | 64.91 ± 7.5 | 60.31 ± 6.0 | 49.26 | 44.84 | 54.99 | 50.11 | |
0.0 * | stsb-roberta-base | 94.31 ± 1.8 | 88.29 ± 4.0 | 94.31 ± 1.8 | 88.29 ± 4.0 | 79.15 | 72.85 | 79.15 | 72.85
0.1 | 95.83 ± 1.6 | 89.60 ± 3.9 | 94.58 ± 2.0 | 88.55 ± 4.2 | 81.16 | 74.49 | 81.88 | 75.18 | |
0.2 | 96.19 ± 1.3 | 89.82 ± 3.7 | 94.01 ± 2.1 | 87.91 ± 4.2 | 80.39 | 73.55 | 80.97 | 74.22 | |
0.3 | 96.59 ± 1.3 | 90.17 ± 3.7 | 92.87 ± 2.9 | 86.92 ± 4.5 | 77.40 | 70.68 | 77.51 | 70.90 | |
0.4 | 96.44 ± 1.6 | 89.99 ± 3.7 | 90.75 ± 3.2 | 84.96 ± 4.4 | 73.54 | 66.94 | 74.29 | 67.87 | |
0.5 | 96.32 ± 1.6 | 89.90 ± 3.7 | 87.91 ± 3.5 | 82.32 ± 4.3 | 69.45 | 63.25 | 71.28 | 65.11 | |
0.6 | 96.04 ± 1.7 | 89.57 ± 3.8 | 84.15 ± 4.3 | 78.83 ± 4.9 | 64.83 | 59.05 | 68.70 | 62.85 | |
0.7 | 95.27 ± 1.7 | 88.91 ± 4.0 | 80.37 ± 4.4 | 75.25 ± 4.7 | 61.28 | 55.73 | 66.71 | 61.03 | |
0.8 | 94.37 ± 1.8 | 88.08 ± 3.9 | 77.09 ± 4.4 | 72.16 ± 4.3 | 58.23 | 53.05 | 65.16 | 59.66 | |
0.9 | 93.54 ± 2.0 | 87.46 ± 4.0 | 74.05 ± 4.8 | 69.22 ± 4.1 | 56.22 | 51.22 | 64.13 | 58.74 | |
1.0 | 92.32 ± 2.2 | 86.19 ± 3.9 | 67.98 ± 5.4 | 63.34 ± 4.3 | 53.99 | 49.11 | 56.03 | 51.24 |
Semantic Weight | Semantic LLM | GW: KWS-Simplified Decoder, mAP@25 | GW: KWS-Simplified Decoder, mAP@50 | GW: TrOCR Decoder, mAP@25 | GW: TrOCR Decoder, mAP@50 | IAM: KWS-Simplified Decoder, mAP@25 | IAM: KWS-Simplified Decoder, mAP@50 | IAM: TrOCR Decoder, mAP@25 | IAM: TrOCR Decoder, mAP@50
---|---|---|---|---|---|---|---|---|---
0.0 * | all-MiniLM-L12-v2 | 89.74 ± 0.7 | 72.29 ± 3.0 | 89.74 ± 0.7 | 72.29 ± 3.0 | 86.40 | 63.73 | 86.40 | 63.73
0.1 | 90.62 ± 0.5 | 72.77 ± 3.0 | 90.27 ± 0.8 | 72.53 ± 3.1 | 86.71 | 63.86 | 87.49 | 64.22 | |
0.2 | 90.68 ± 0.4 | 72.82 ± 3.0 | 90.38 ± 0.7 | 72.57 ± 3.1 | 86.79 | 63.94 | 87.84 | 64.47 | |
0.3 | 90.70 ± 0.5 | 72.84 ± 3.0 | 90.35 ± 0.7 | 72.55 ± 3.2 | 86.57 | 63.72 | 88.01 | 64.58 | |
0.4 | 90.72 ± 0.5 | 72.84 ± 3.0 | 90.30 ± 0.7 | 72.51 ± 3.2 | 86.11 | 63.36 | 88.06 | 64.60 | |
0.5 | 90.79 ± 0.4 | 72.90 ± 3.0 | 90.32 ± 0.6 | 72.55 ± 3.2 | 85.65 | 63.01 | 87.74 | 64.43 | |
0.6 | 90.85 ± 0.5 | 72.93 ± 3.0 | 90.32 ± 0.6 | 72.54 ± 3.2 | 84.66 | 62.38 | 87.28 | 64.17 | |
0.7 | 90.91 ± 0.5 | 72.97 ± 3.0 | 90.26 ± 0.7 | 72.51 ± 3.2 | 83.32 | 61.47 | 86.30 | 63.56 | |
0.8 | 90.83 ± 0.4 | 72.90 ± 2.9 | 90.12 ± 0.6 | 72.42 ± 3.1 | 81.31 | 60.20 | 84.66 | 62.49 | |
0.9 | 90.82 ± 0.4 | 72.89 ± 2.9 | 89.93 ± 0.6 | 72.29 ± 3.0 | 77.61 | 57.85 | 81.51 | 60.47 | |
1.0 | 90.43 ± 0.7 | 72.73 ± 2.7 | 87.16 ± 0.6 | 70.39 ± 2.1 | 71.97 | 54.39 | 76.74 | 57.24 | |
0.0 * | all-mpnet-base-v2 | 89.74 ± 0.7 | 72.29 ± 3.0 | 89.74 ± 0.7 | 72.29 ± 3.0 | 86.40 | 63.73 | 86.40 | 63.73
0.1 | 90.63 ± 0.5 | 72.78 ± 3.0 | 90.22 ± 0.7 | 72.49 ± 3.1 | 86.69 | 63.88 | 87.53 | 64.25 | |
0.2 | 90.69 ± 0.5 | 72.83 ± 3.0 | 90.31 ± 0.7 | 72.51 ± 3.1 | 86.70 | 63.87 | 87.89 | 64.51 | |
0.3 | 90.72 ± 0.5 | 72.84 ± 3.0 | 90.29 ± 0.6 | 72.49 ± 3.1 | 86.56 | 63.75 | 88.16 | 64.80 | |
0.4 | 90.81 ± 0.5 | 72.91 ± 3.1 | 90.32 ± 0.7 | 72.54 ± 3.1 | 86.32 | 63.59 | 88.25 | 64.88 | |
0.5 | 90.86 ± 0.5 | 72.96 ± 3.1 | 90.29 ± 0.7 | 72.53 ± 3.2 | 85.91 | 63.33 | 87.97 | 64.69 | |
0.6 | 90.87 ± 0.4 | 72.94 ± 3.0 | 90.26 ± 0.6 | 72.49 ± 3.1 | 85.03 | 62.71 | 87.52 | 64.37 | |
0.7 | 90.94 ± 0.5 | 72.97 ± 3.1 | 90.10 ± 0.7 | 72.39 ± 3.1 | 84.15 | 62.10 | 86.60 | 63.77 | |
0.8 | 90.91 ± 0.5 | 72.96 ± 3.0 | 90.06 ± 0.6 | 72.35 ± 3.0 | 82.56 | 61.09 | 85.07 | 62.78 | |
0.9 | 90.85 ± 0.4 | 72.92 ± 3.0 | 89.74 ± 0.4 | 72.09 ± 2.9 | 79.47 | 59.19 | 81.97 | 60.86 | |
1.0 | 90.47 ± 0.5 | 72.78 ± 2.8 | 87.06 ± 0.7 | 70.23 ± 2.1 | 74.04 | 55.89 | 77.30 | 57.78 | |
0.0 * | stsb-roberta-base | 89.74 ± 0.7 | 72.29 ± 3.0 | 89.74 ± 0.7 | 72.29 ± 3.0 | 86.40 | 63.73 | 86.40 | 63.73
0.1 | 90.62 ± 0.5 | 72.78 ± 3.0 | 90.21 ± 0.6 | 72.46 ± 3.0 | 86.58 | 63.82 | 87.35 | 64.13 | |
0.2 | 90.63 ± 0.5 | 72.79 ± 3.0 | 90.27 ± 0.6 | 72.47 ± 3.0 | 86.78 | 63.85 | 87.78 | 64.52 | |
0.3 | 90.66 ± 0.5 | 72.80 ± 3.0 | 90.27 ± 0.6 | 72.47 ± 3.1 | 86.72 | 63.74 | 88.04 | 64.72 | |
0.4 | 90.76 ± 0.5 | 72.88 ± 3.0 | 90.32 ± 0.6 | 72.52 ± 3.1 | 86.45 | 63.59 | 88.12 | 64.84 | |
0.5 | 90.89 ± 0.5 | 72.97 ± 3.1 | 90.33 ± 0.5 | 72.54 ± 3.1 | 85.87 | 63.29 | 87.62 | 64.60 | |
0.6 | 90.90 ± 0.4 | 72.97 ± 3.1 | 90.30 ± 0.4 | 72.50 ± 3.0 | 85.09 | 62.79 | 86.78 | 64.16 | |
0.7 | 90.92 ± 0.4 | 72.97 ± 3.1 | 90.13 ± 0.4 | 72.41 ± 2.9 | 83.82 | 61.99 | 85.44 | 63.32 | |
0.8 | 90.86 ± 0.4 | 72.92 ± 3.0 | 89.96 ± 0.5 | 72.28 ± 2.9 | 81.96 | 60.79 | 83.06 | 61.79 | |
0.9 | 90.74 ± 0.4 | 72.86 ± 3.0 | 89.75 ± 0.5 | 72.12 ± 2.8 | 79.45 | 59.12 | 80.17 | 59.88 | |
1.0 | 90.33 ± 0.5 | 72.71 ± 2.8 | 87.08 ± 0.9 | 70.24 ± 2.0 | 76.10 | 57.18 | 76.49 | 57.40 |
Threshold | Semantic LLM | WordRetrievalNet: KWS-Simplified Decoder, mAP@25 | WordRetrievalNet: KWS-Simplified Decoder, mAP@50 | WordRetrievalNet: TrOCR Decoder, mAP@25 | WordRetrievalNet: TrOCR Decoder, mAP@50 | KWS-Simplified: KWS-Simplified Decoder, mAP@25 | KWS-Simplified: KWS-Simplified Decoder, mAP@50 | KWS-Simplified: TrOCR Decoder, mAP@25 | KWS-Simplified: TrOCR Decoder, mAP@50
---|---|---|---|---|---|---|---|---|---
0.1 | all-MiniLM-L12-v2 | 94.25 ± 1.7 | 88.24 ± 3.9 | 94.16 ± 1.7 | 88.20 ± 4.1 | 89.63 ± 0.8 | 72.18 ± 3.1 | 89.50 ± 0.9 | 72.05 ± 3.0
0.3 | 92.02 ± 2.3 | 86.32 ± 3.7 | 77.32 ± 6.3 | 71.77 ± 4.3 | 87.52 ± 1.2 | 70.26 ± 3.7 | 71.46 ± 5.1 | 56.03 ± 3.9 | |
0.5 | 89.87 ± 2.3 | 84.52 ± 3.7 | 67.59 ± 6.7 | 63.15 ± 4.8 | 86.53 ± 1.6 | 69.69 ± 4.1 | 63.25 ± 6.1 | 49.08 ± 4.3 | |
0.7 | 88.58 ± 2.6 | 83.68 ± 3.8 | 63.25 ± 6.3 | 59.34 ± 4.7 | 85.62 ± 2.2 | 69.01 ± 4.7 | 59.35 ± 5.9 | 45.98 ± 4.3 | |
0.9 | 87.81 ± 2.7 | 83.06 ± 3.8 | 60.10 ± 6.6 | 56.45 ± 5.3 | 85.24 ± 2.3 | 68.71 ± 4.7 | 56.53 ± 5.9 | 43.64 ± 4.4 | |
0.1 | all-mpnet-base-v2 | 94.29 ± 1.8 | 88.28 ± 4.0 | 93.96 ± 2.1 | 87.97 ± 4.2 | 89.72 ± 0.7 | 72.27 ± 3.0 | 89.34 ± 0.9 | 71.90 ± 2.9
0.3 | 91.74 ± 2.0 | 86.21 ± 3.4 | 76.40 ± 5.8 | 71.46 ± 4.0 | 87.91 ± 1.4 | 70.75 ± 3.8 | 71.68 ± 4.9 | 56.63 ± 3.4 | |
0.5 | 89.51 ± 2.2 | 84.27 ± 3.7 | 66.96 ± 7.0 | 62.76 ± 5.4 | 86.46 ± 2.0 | 69.72 ± 4.5 | 62.64 ± 5.7 | 48.51 ± 4.4 | |
0.7 | 88.50 ± 2.3 | 83.51 ± 3.8 | 63.11 ± 6.2 | 59.20 ± 4.7 | 85.72 ± 1.9 | 69.04 ± 4.5 | 59.04 ± 5.8 | 45.68 ± 4.0 | |
0.9 | 87.50 ± 2.9 | 82.76 ± 4.0 | 59.70 ± 6.8 | 56.05 ± 5.4 | 85.24 ± 2.5 | 68.71 ± 4.8 | 56.15 ± 6.0 | 43.26 ± 4.4 | |
0.1 | stsb-roberta-base | 94.33 ± 1.8 | 88.32 ± 4.0 | 93.49 ± 1.9 | 87.45 ± 4.2 | 89.74 ± 0.7 | 72.29 ± 3.0 | 88.96 ± 1.1 | 71.52 ± 3.2
0.3 | 94.32 ± 2.1 | 88.33 ± 4.2 | 85.81 ± 3.4 | 80.42 ± 4.3 | 89.05 ± 1.2 | 71.73 ± 3.4 | 81.33 ± 2.7 | 64.75 ± 3.0 | |
0.5 | 92.95 ± 2.0 | 87.08 ± 3.7 | 73.83 ± 4.6 | 69.09 ± 4.0 | 88.00 ± 1.6 | 70.90 ± 4.1 | 69.24 ± 3.8 | 54.77 ± 3.5 | |
0.7 | 90.72 ± 2.5 | 85.36 ± 3.7 | 65.99 ± 5.3 | 62.04 ± 4.0 | 86.28 ± 2.4 | 69.62 ± 4.7 | 62.33 ± 4.6 | 48.99 ± 3.3 | |
0.9 | 87.82 ± 2.9 | 83.02 ± 3.9 | 60.61 ± 6.2 | 56.93 ± 4.9 | 85.35 ± 2.4 | 68.83 ± 4.8 | 57.18 ± 5.4 | 44.19 ± 3.9 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).