Emerging Theory and Applications in Natural Language Processing, 2nd Edition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 December 2025 | Viewed by 5059

Special Issue Editors


Guest Editor
School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Interests: knowledge graph; natural language processing; multimodal

Guest Editor
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: natural language processing; knowledge graph; machine learning

Guest Editor
School of Computer Science and Technology, Dalian University of Technology, Dalian 116081, China
Interests: information retrieval; question answering and dialogue; natural language processing; large language models

Special Issue Information

Dear Colleagues,

In recent years, natural language processing (NLP) has been transformed by groundbreaking advances in deep learning and the emergence of large language models (LLMs). The integration of LLMs with adaptation tuning methods has significantly increased the generalization capabilities of NLP models, potentially enabling the development of general artificial intelligence systems. Given the significance of this progress, it is crucial to explore the potential of LLMs and to understand their relationship with classical methods in shaping the future of NLP and its real-world applications. The aim of this Special Issue is to showcase cutting-edge research in NLP, highlighting novel theories, methods, and applications that advance the state of the art while also promoting interdisciplinary research.

The scope of this Special Issue includes, but is not limited to, the following topics:

  • Novel NLP theory, architectures, and algorithms;
  • Theoretical foundations of LLMs: emergent abilities, scaling effects, etc.;
  • Model training and utilization strategies;
  • Efficiency and scalability of language models;
  • Integration of NLP with other AI technologies;
  • Interpretability of NLP models and LLMs;
  • Evaluating large language models: capabilities and limitations;
  • Ethical considerations and fairness;
  • Safety and alignment in LLMs;
  • Domain-specific NLP applications;
  • Other emerging topics in NLP and LLM research.

Dr. Linmei Hu
Dr. Jian Liu
Dr. Bo Xu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • large language models
  • NLP theory and application

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (5 papers)


Research

35 pages, 8966 KB  
Article
Verified Language Processing with Hybrid Explainability
by Oliver Robert Fox, Giacomo Bergami and Graham Morgan
Electronics 2025, 14(17), 3490; https://doi.org/10.3390/electronics14173490 - 31 Aug 2025
Viewed by 521
Abstract
The volume and diversity of digital information have led to a growing reliance on Machine Learning (ML) techniques, such as Natural Language Processing (NLP), for interpreting and accessing appropriate data. While vector and graph embeddings represent data for similarity tasks, current state-of-the-art pipelines lack guaranteed explainability, failing to accurately determine similarity for given full texts. These considerations can also be applied to classifiers exploiting generative language models with logical prompts, which fail to correctly distinguish between logical implication, indifference, and inconsistency, despite being explicitly trained to recognise the first two classes. We present a novel pipeline designed for hybrid explainability to address this. Our methodology combines graphs and logic to produce First-Order Logic (FOL) representations, creating machine- and human-readable representations through Montague Grammar (MG). The preliminary results indicate the effectiveness of this approach in accurately capturing full text similarity. To the best of our knowledge, this is the first approach to differentiate between implication, inconsistency, and indifference for text classification tasks. To address the limitations of existing approaches, we use three self-contained datasets annotated for the former classification task to determine the suitability of these approaches in capturing sentence structure equivalence, logical connectives, and spatiotemporal reasoning. We also use these data to compare the proposed method with language models pre-trained for detecting sentence entailment. The results show that the proposed method outperforms state-of-the-art models, indicating that natural language understanding cannot be easily generalised by training over extensive document corpora. This work offers a step toward more transparent and reliable Information Retrieval (IR) from extensive textual data. Full article

31 pages, 855 KB  
Article
A Comparative Evaluation of Transformer-Based Language Models for Topic-Based Sentiment Analysis
by Spyridon Tzimiris, Stefanos Nikiforos, Maria Nefeli Nikiforos, Despoina Mouratidis and Katia Lida Kermanidis
Electronics 2025, 14(15), 2957; https://doi.org/10.3390/electronics14152957 - 24 Jul 2025
Viewed by 1313
Abstract
This research investigates topic-based sentiment classification in Greek educational-related data using transformer-based language models. A comparative evaluation is conducted on GreekBERT, XLM-r-Greek, mBERT, and Palobert using three original sentiment-annotated datasets representing parents of students with functional diversity, school directors, and teachers, each capturing diverse educational perspectives. The analysis examines both overall sentiment performance and topic-specific evaluations across four thematic classes: (i) Material and Technical Conditions, (ii) Educational Dimension, (iii) Psychological/Emotional Dimension, and (iv) Learning Difficulties and Emergency Remote Teaching. Results indicate that GreekBERT consistently outperforms other models, achieving the highest overall F1 score (0.91), particularly excelling in negative sentiment detection (F1 = 0.95) and showing robust performance for positive sentiment classification. The Psychological/Emotional Dimension emerged as the most reliably classified category, with GreekBERT and mBERT demonstrating notably high accuracy and F1 scores. Conversely, Learning Difficulties and Emergency Remote Teaching presented significant classification challenges, especially for Palobert. This study contributes significantly to the field of sentiment analysis with Greek-language data by introducing original annotated datasets, pioneering the application of topic-based sentiment analysis within the Greek educational context, and offering a comparative evaluation of transformer models. Additionally, it highlights the superior performance of Greek-pretrained models in capturing emotional detail, and provides empirical evidence of the negative emotional responses toward Emergency Remote Teaching. Full article
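The evaluation protocol described in this abstract, an overall weighted F1 score plus the same metric restricted to each thematic class, can be sketched with scikit-learn. The labels and topic tags below are hypothetical stand-ins, not the study's Greek datasets:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical gold and predicted sentiment labels (0 = negative, 1 = positive),
# each tagged with one of the four thematic classes from the study.
topics = np.array(["material", "educational", "psych", "ert",
                   "material", "educational", "psych", "ert"])
y_true = np.array([0, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# Overall weighted F1, the headline metric reported for each model.
overall = f1_score(y_true, y_pred, average="weighted")

# Topic-specific evaluation: the same metric restricted to one theme.
per_topic = {
    t: f1_score(y_true[topics == t], y_pred[topics == t], average="weighted")
    for t in np.unique(topics)
}
print(overall, per_topic)
```

Weighting by class support makes the overall score robust to the class imbalance that negative-heavy educational feedback tends to exhibit.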

29 pages, 1234 KB  
Article
Automatic Detection of the CaRS Framework in Scholarly Writing Using Natural Language Processing
by Olajide Omotola, Nonso Nnamoko, Charles Lam, Ioannis Korkontzelos, Callum Altham and Joseph Barrowclough
Electronics 2025, 14(14), 2799; https://doi.org/10.3390/electronics14142799 - 11 Jul 2025
Viewed by 563
Abstract
Many academic introductions suffer from inconsistencies and a lack of comprehensive structure, often failing to effectively outline the core elements of the research. This not only impacts the clarity and readability of the article but also hinders the communication of its significance and objectives to the intended audience. This study aims to automate the CaRS (Creating a Research Space) model using machine learning and natural language processing techniques. We conducted a series of experiments using a custom-developed corpus of 50 biology research article introductions, annotated with rhetorical moves and steps. The dataset was used to evaluate the performance of four classification algorithms: Prototypical Network (PN), Support Vector Machines (SVM), Naïve Bayes (NB), and Random Forest (RF), in combination with six embedding models: Word2Vec, GloVe, BERT, GPT-2, Llama-3.2-3B, and TEv3-small. Multiple experiments were carried out to assess performance at both the move and step levels using 5-fold cross-validation. Evaluation metrics included accuracy and weighted F1-score, with comprehensive results provided. Results show that the SVM classifier, when paired with Llama-3.2-3B embeddings, consistently achieved the highest performance across multiple tasks when trained on the preprocessed dataset, with 79% accuracy and weighted F1-score on rhetorical moves and strong results on M2 steps (75% accuracy and weighted F1-score). While other combinations showed promise, particularly NB and RF with newer embeddings, none matched the consistency of the SVM–Llama pairing. Compared to existing benchmarks, our model achieves similar or better performance; however, direct comparison is limited due to differences in datasets and experimental setups. Despite the unavailability of the benchmark dataset, our findings indicate that SVM is an effective choice for rhetorical classification, even in few-shot learning scenarios. Full article
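The best-performing setup reported here, an SVM over pretrained embeddings evaluated with 5-fold cross-validation and weighted F1, can be sketched as follows. The synthetic clusters standing in for two rhetorical-move classes are an assumption purely to keep the example self-contained; the paper derives its embeddings from models such as Llama-3.2-3B:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stand-in sentence embeddings for two rhetorical-move classes
# (two well-separated Gaussian clusters instead of real LLM embeddings).
rng = np.random.default_rng(0)
move1 = rng.normal(loc=0.0, scale=0.5, size=(20, 8))
move2 = rng.normal(loc=3.0, scale=0.5, size=(20, 8))
X = np.vstack([move1, move2])
y = np.array([0] * 20 + [1] * 20)

# 5-fold cross-validated SVM, scored with weighted F1 as in the study.
clf = SVC(kernel="rbf")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_weighted")
print(scores.mean())
```

Stratified folds keep the per-move class balance stable across splits, which matters when a 50-introduction corpus yields only a handful of examples of some moves.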

23 pages, 809 KB  
Article
Towards Smarter Assessments: Enhancing Bloom’s Taxonomy Classification with a Bayesian-Optimized Ensemble Model Using Deep Learning and TF-IDF Features
by Ali Alammary and Saeed Masoud
Electronics 2025, 14(12), 2312; https://doi.org/10.3390/electronics14122312 - 6 Jun 2025
Cited by 1 | Viewed by 1565
Abstract
Bloom’s taxonomy provides a well-established framework for categorizing the cognitive complexity of assessment questions, ensuring alignment with course learning outcomes (CLOs). Achieving this alignment is essential for constructing meaningful and valid assessments that accurately measure student learning. However, in higher education, the large volume of questions that instructors must develop each semester makes manual classification of cognitive levels a time-consuming and error-prone process. Despite various attempts to automate this classification, the highest accuracy reported in existing research has not exceeded 93.5%, highlighting the need for further advancements in this area. Furthermore, the best-performing deep learning models only reached an accuracy of 86%. These results emphasize the need for improvement, particularly in the application of deep learning models, which have not been fully exploited for this task. In response to these challenges, our study explores a novel approach to enhance the accuracy of cognitive level classification. We leverage a combination of augmentation through synonym substitution, advanced feature extraction techniques utilizing DistilBERT and TF-IDF, and a robust ensemble model incorporating soft voting. These methods were selected to capture both semantic meaning and term frequency, allowing the model to benefit from contextual depth and statistical relevance. Additionally, Bayesian optimization is employed for hyperparameter tuning to refine the model’s performance further. The novelty of our approach lies in the fusion of sparse TF-IDF features with dense DistilBERT embeddings, optimized through Bayesian search across multiple classifiers. This hybrid design captures both term-level salience and deep contextual semantics, something not fully exploited in prior models focused solely on transformer architectures. Our soft-voting ensemble capitalizes on classifier diversity, yielding more stable and accurate results. 
This integrated approach outperformed previous configurations with an accuracy of 96%, surpassing the current state-of-the-art results and setting a new benchmark for automated cognitive level classification. These findings have significant implications for the development of high-quality, scalable assessments in educational settings. Full article
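The feature-fusion and soft-voting design described in the abstract can be sketched with scikit-learn. Here the random dense vectors are a stand-in for DistilBERT embeddings, and the toy questions, labels, and base classifiers are illustrative assumptions, not the authors' configuration:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

questions = [
    "Define the term photosynthesis.",
    "Design an experiment to test enzyme activity.",
    "Compare aerobic and anaerobic respiration.",
    "List the stages of mitosis.",
]
labels = [0, 2, 1, 0]  # toy cognitive-level labels

# Sparse term-level features.
tfidf = TfidfVectorizer().fit_transform(questions)

# Stand-in for dense contextual embeddings (the paper uses DistilBERT;
# random vectors keep this sketch self-contained).
rng = np.random.default_rng(0)
dense = csr_matrix(rng.normal(size=(len(questions), 16)))

# Fuse the sparse and dense views into one feature matrix.
X = hstack([tfidf, dense]).tocsr()

# Soft voting averages predicted class probabilities across classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X, labels)
print(ensemble.predict(X))
```

In the paper, the classifier pool and all hyperparameters are chosen by Bayesian optimization rather than fixed as above.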

19 pages, 3091 KB  
Article
Efficient Data Reduction Through Maximum-Separation Vector Selection and Centroid Embedding Representation
by Sultan Alshamrani
Electronics 2025, 14(10), 1919; https://doi.org/10.3390/electronics14101919 - 9 May 2025
Viewed by 524
Abstract
This study introduces two novel data reduction approaches for efficient sentiment analysis: High-Distance Sentiment Vectors (HDSV) and Centroid Sentiment Embedding Vectors (CSEV). By leveraging embedding space characteristics from DistilBERT, HDSV selects maximally separated sample pairs, while CSEV computes representative centroids for each sentiment class. We evaluate these methods on three benchmark datasets: SST-2, Yelp, and Sentiment140. Our results demonstrate remarkable data efficiency, reducing training samples to just 100 with HDSV and two with CSEV while maintaining comparable performance to full dataset training. Notable findings include CSEV achieving 88.93% accuracy on SST-2 (compared to 90.14% with full data) and both methods showing improved cross-dataset generalization, with less than 2% accuracy drop in domain transfer tasks versus 11.94% for full dataset training. The proposed methods enable significant storage savings, with datasets compressed to less than 1% of their original size, making them particularly valuable for resource-constrained environments. Our findings advance the understanding of data requirements in sentiment analysis, demonstrating that strategically selected minimal training data can achieve robust and generalizable classification while promoting more sustainable machine learning practices. Full article
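The two reduction strategies can be sketched in a few lines of NumPy. The 2-D "embeddings" and labels below are toy stand-ins for the DistilBERT vectors used in the paper, and the nearest-centroid prediction step is an illustrative assumption about how CSEV centroids would be consumed:

```python
import numpy as np

def csev_centroids(embeddings, labels):
    """Collapse each sentiment class to a single centroid vector (CSEV-style)."""
    classes = np.unique(labels)
    return classes, np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )

def hdsv_pair(embeddings, labels, pos=1, neg=0):
    """Pick the maximally separated positive/negative pair (HDSV-style)."""
    pos_idx = np.flatnonzero(labels == pos)
    neg_idx = np.flatnonzero(labels == neg)
    # Pairwise distances between every positive and every negative sample.
    d = np.linalg.norm(
        embeddings[pos_idx][:, None, :] - embeddings[neg_idx][None, :, :], axis=-1
    )
    i, j = np.unravel_index(d.argmax(), d.shape)
    return pos_idx[i], neg_idx[j]

# Toy 2-D "embeddings" standing in for DistilBERT vectors.
emb = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.2]])
lab = np.array([0, 0, 1, 1])

classes, cents = csev_centroids(emb, lab)
i, j = hdsv_pair(emb, lab)
# Nearest-centroid prediction for a query near the positive cluster.
query = np.array([4.9, 5.1])
pred = classes[np.linalg.norm(cents - query, axis=1).argmin()]
print(pred)
```

Either selection shrinks the stored training set to a handful of vectors, which is the source of the sub-1% storage figure reported in the abstract.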
