Advances in Text Mining and Analytics

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 June 2026 | Viewed by 3451

Special Issue Editors

Department of Computer Science and Engineering, University of North Texas, Denton, TX 76205, USA
Interests: data mining and knowledge discovery; information retrieval and extraction; web mining and social network analysis; biomedical and healthcare applications

Guest Editor
The Anuradha and Vikas Sinha Department of Data Science, University of North Texas, Denton, TX 76203, USA
Interests: natural language processing; AIoT (Artificial Intelligence of Things); text mining; generative AI; recommender systems; web services

Special Issue Information

Dear Colleagues,

Text is the most traditional medium for recording information and representing knowledge. Today, textual information is growing at an astounding pace, making it increasingly difficult for analysts to discover the valuable information buried within it. Text mining focuses on extracting meaningful information from massive text collections. Text analytics applies computer algorithms and techniques that enable machines to understand and interpret human language, transforming unstructured text into usable formats and identifying patterns, trends, and insights in extensive textual data. Examples of such underlying knowledge include new non-trivial trends, patterns, and associations among entities of interest, such as relationships among genes, proteins, and diseases; connections between different places; or commonalities among people. However, many text mining and analytics tasks face great challenges because of the ever-increasing volume of text data and the difficulty of capturing the knowledge hidden within it. Efficient and effective text mining and analytics techniques are needed to address these challenges.

This Special Issue aims to present the latest research and developments in text mining and analytics, including new methods and techniques building on recent advances in machine learning, natural language processing, and artificial intelligence. Authors are invited to submit original, unpublished articles on the development of new text mining and analytics techniques, such as algorithms and software tools. Applications of text mining and analytics in different contexts are also welcome, such as sentiment analysis, web mining and social network analysis, topic modeling, named entity recognition, biomedical literature mining, and healthcare informatics. Topics of interest include but are not limited to the following:

  • Natural language understanding;
  • Information retrieval and extraction;
  • Generative AI;
  • Document topic modeling;
  • Knowledge discovery in text;
  • Language modeling;
  • Recommender systems;
  • Knowledge networks and graphs;
  • Named entity recognition and entity linking;
  • Document semantic extraction and relation extraction;
  • Sentiment analysis, opinion, and argument mining;
  • Text summarization;
  • Question answering systems;
  • Development of software for text mining and analytics;
  • Applications of text mining and analytics (such as education, healthcare, bioinformatics, finance, social media, computational life science, etc.).

Dr. Wei Jin
Dr. Yang Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • text mining
  • natural language processing
  • information retrieval
  • text analytics
  • relation extraction
  • entity recognition and linking
  • language models

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research

21 pages, 1991 KB  
Article
Zero-Shot Resume–Job Matching with LLMs via Structured Prompting and Semantic Embeddings
by Panagiotis Skondras, Panagiotis Zervas and Giannis Tzimas
Electronics 2025, 14(24), 4960; https://doi.org/10.3390/electronics14244960 - 17 Dec 2025
Viewed by 363
Abstract
In this article, we present a tool for matching resumes to job posts and vice versa (job posts to resumes). With minor modifications, it may also be adapted to other domains where text matching is necessary. This tool may help organizations save time during the hiring process and assist applicants by allowing them to match their resumes to job posts they have selected. To achieve text matching without any model training (zero-shot matching), we constructed dynamic structured prompts consisting of unstructured and semi-structured job posts and resumes based on specific criteria, and we applied the Chain-of-Thought (CoT) technique to the Mistral model (open-mistral-7b). In response, the model generated structured (segmented) job posts and resumes, which were then cleaned and preprocessed. We utilized state-of-the-art sentence similarity models hosted on Hugging Face (nomic-embed-text-v1-5 and google-embedding-gemma-300m) through inference endpoints to create sentence embeddings for each resume and job post segment. We used the cosine similarity metric to determine the optimal matching, and the matching operation was applied to eleven different occupations. The results reached up to 87% accuracy for some of the occupations, underscoring the potential of zero-shot LLM-based techniques for text matching. The dataset was sourced from indeed.com, and the tool was implemented with the Spring AI framework. Full article
(This article belongs to the Special Issue Advances in Text Mining and Analytics)
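The matching step this abstract describes, embedding each resume and job-post segment and pairing segments by highest cosine similarity, can be sketched as follows. This is a minimal illustration with hand-written two-dimensional toy vectors; the function names are hypothetical, and the paper itself obtains embeddings from Hugging Face sentence-similarity models rather than computing them locally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(resume_vecs, job_vecs):
    """For each resume-segment embedding, return the index of the
    most similar job-post segment under cosine similarity."""
    matches = []
    for r in resume_vecs:
        scores = [cosine_similarity(r, j) for j in job_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches
```

With real sentence embeddings the vectors would have hundreds of dimensions, but the pairing logic is the same.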

1764 KB
Article
A Domain-Finetuned Semantic Matching Framework Based on Dynamic Masking and Contrastive Learning for Specialized Text Retrieval
by Yiming Zhang, Yong Zhu, Zijie Zhu, Pengzhong Liu, Pengfei Xie and Cong Wu
Electronics 2025, 14(24), 4882; https://doi.org/10.3390/electronics14244882 - 11 Dec 2025
Viewed by 218
Abstract
Semantic matching is essential for understanding natural language, but traditional models like BERT face challenges with random masking strategies, limiting their ability to capture key information. Additionally, BERT’s sentence vectors may “collapse,” making it difficult to distinguish between different sentences. This paper introduces a domain-finetuned semantic matching framework that uses dynamic masking and contrastive learning techniques to address these issues. The dynamic masking strategy enhances the model’s ability to retain critical information, while contrastive learning improves sentence vector representations using a small amount of unlabeled text. This approach helps the model better align with the needs of various downstream tasks. Experimental results show that after private domain training, the model improves semantic similarity between entities by 16.9%, outperforming existing models. It also demonstrates an 8.0% average improvement in semantic matching for diverse text. Performance metrics such as A@1, A@3, and A@5 are at least 26.1% higher than those of competing models. For newly added entities, the model achieves a 44.3% average improvement, consistently surpassing other models by at least 30%. These results collectively validate the effectiveness and superiority of the proposed framework in domain-specific semantic matching tasks. Full article
(This article belongs to the Special Issue Advances in Text Mining and Analytics)
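The contrastive-learning component described above trains sentence vectors so that a positive pair scores higher than the negatives. A common formulation of this idea is the InfoNCE objective; the sketch below is a plain-Python illustration of that loss on toy vectors. The function name, temperature value, and vectors are hypothetical, and the paper's actual training setup may differ.

```python
import math

def info_nce_loss(anchor, candidates, temperature=0.05):
    """Contrastive (InfoNCE) loss: candidates[0] is the positive,
    the rest are negatives. Lower loss means the anchor is closer
    to the positive than to the negatives."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    logits = [cos(anchor, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

The loss shrinks as the positive moves closer to the anchor relative to the negatives, which is exactly the pressure that keeps sentence vectors from collapsing onto one another.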

19 pages, 1599 KB  
Article
Enhancing Clinical Named Entity Recognition via Fine-Tuned BERT and Dictionary-Infused Retrieval-Augmented Generation
by Soumya Challaru Sreenivas, Saqib Chowdhury and Mohammad Masum
Electronics 2025, 14(18), 3676; https://doi.org/10.3390/electronics14183676 - 17 Sep 2025
Viewed by 2448
Abstract
Clinical notes often contain unstructured text filled with abbreviations, non-standard terminology, and inconsistent phrasing, which pose significant challenges for automated medical information extraction. Named Entity Recognition (NER) plays a crucial role in structuring this data by identifying and categorizing key clinical entities such as symptoms, medications, and diagnoses. However, traditional and even transformer-based NER models often struggle with ambiguity and fail to produce clinically interpretable outputs. In this study, we present a hybrid two-stage framework that enhances medical NER by integrating a fine-tuned BERT model for initial entity extraction with a Dictionary-Infused Retrieval-Augmented Generation (DiRAG) module for terminology normalization. Our approach addresses two critical limitations in current clinical NER systems: lack of contextual clarity and inconsistent standardization of medical terms. The DiRAG module combines semantic retrieval from a UMLS-based vector database with lexical matching and prompt-based generation using a large language model, ensuring precise and explainable normalization of ambiguous entities. The fine-tuned BERT model achieved an F1 score of 0.708 on the MACCROBAT dataset, outperforming several domain-specific baselines, including BioBERT and ClinicalBERT. The integration of the DiRAG module further improved the interpretability and clinical relevance of the extracted entities. Through qualitative case studies, we demonstrate that our framework not only enhances clarity but also mitigates common issues such as abbreviation ambiguity and terminology inconsistency. Full article
(This article belongs to the Special Issue Advances in Text Mining and Analytics)
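The dictionary-infused normalization idea in this abstract, expanding an abbreviation and then mapping the surface form to a canonical term, can be illustrated with a minimal lexical-matching sketch. The function and dictionaries below are hypothetical stand-ins; the paper's DiRAG module additionally performs semantic retrieval from a UMLS-based vector database and prompt-based generation with a large language model.

```python
def normalize_entity(mention, abbreviation_dict, term_index):
    """Two-step lexical normalization: expand a known abbreviation,
    then look up the canonical term by lowercased exact match.
    Unknown mentions pass through unchanged."""
    expanded = abbreviation_dict.get(mention.upper(), mention)
    return term_index.get(expanded.lower(), expanded)

# Toy stand-ins for an abbreviation dictionary and a terminology index.
abbrevs = {"HTN": "hypertension", "MI": "myocardial infarction"}
terms = {
    "hypertension": "Hypertension (disorder)",
    "myocardial infarction": "Myocardial infarction (disorder)",
}
```

In the full system, a mention that fails lexical lookup would fall back to semantic retrieval over embedded terminology entries, with the LLM resolving residual ambiguity.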
