Emerging Theory and Applications in Natural Language Processing, 2nd Edition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 December 2025 | Viewed by 1622

Special Issue Editors

Guest Editor
School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Interests: knowledge graph; natural language processing; multimodal

Guest Editor
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: natural language processing; knowledge graph; machine learning

Guest Editor
School of Computer Science and Technology, Dalian University of Technology, Dalian 116081, China
Interests: information retrieval; question answering and dialogue; natural language processing; large language models

Special Issue Information

Dear Colleagues,

In recent years, natural language processing (NLP) has been transformed by groundbreaking advances in deep learning and the emergence of large language models (LLMs). The integration of LLMs with adaptation tuning methods has significantly increased the generalization capabilities of NLP models, potentially enabling the development of general artificial intelligence systems. Given the significance of this progress, it is crucial to explore the potential of LLMs and to understand their relationship with classical methods in shaping the future of NLP and its real-world applications. The aim of this Special Issue is to showcase cutting-edge research in NLP, highlighting novel theories, methods, and applications that advance the state of the art, while also promoting interdisciplinary research.

The scope of this Special Issue includes, but is not limited to, the following topics:

  • Novel NLP theory, architectures, and algorithms;
  • Theoretical foundations of LLMs: emergent abilities, scaling effects, etc.;
  • Model training and utilization strategies;
  • Efficiency and scalability of language models;
  • Integration of NLP with other AI technologies;
  • Interpretability of NLP models and LLMs;
  • Evaluating large language models: capabilities and limitations;
  • Ethical considerations and fairness;
  • Safety and alignment in LLMs;
  • Domain-specific NLP applications;
  • Other emerging topics in NLP and LLM research.

Dr. Linmei Hu
Dr. Jian Liu
Dr. Bo Xu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • large language models
  • NLP theory and application

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (3 papers)


Research

29 pages, 1234 KiB  
Article
Automatic Detection of the CaRS Framework in Scholarly Writing Using Natural Language Processing
by Olajide Omotola, Nonso Nnamoko, Charles Lam, Ioannis Korkontzelos, Callum Altham and Joseph Barrowclough
Electronics 2025, 14(14), 2799; https://doi.org/10.3390/electronics14142799 - 11 Jul 2025
Viewed by 264
Abstract
Many academic introductions suffer from inconsistencies and a lack of comprehensive structure, often failing to effectively outline the core elements of the research. This not only impacts the clarity and readability of the article but also hinders the communication of its significance and objectives to the intended audience. This study aims to automate the CaRS (Creating a Research Space) model using machine learning and natural language processing techniques. We conducted a series of experiments using a custom-developed corpus of 50 biology research article introductions, annotated with rhetorical moves and steps. The dataset was used to evaluate the performance of four classification algorithms: Prototypical Network (PN), Support Vector Machines (SVM), Naïve Bayes (NB), and Random Forest (RF); in combination with six embedding models: Word2Vec, GloVe, BERT, GPT-2, Llama-3.2-3B, and TEv3-small. Multiple experiments were carried out to assess performance at both the move and step levels using 5-fold cross-validation. Evaluation metrics included accuracy and weighted F1-score, with comprehensive results provided. Results show that the SVM classifier, when paired with Llama-3.2-3B embeddings, consistently achieved the highest performance across multiple tasks when trained on the preprocessed dataset, with 79% accuracy and weighted F1-score on rhetorical moves and strong results on M2 steps (75% accuracy and weighted F1-score). While other combinations showed promise, particularly NB and RF with newer embeddings, none matched the consistency of the SVM–Llama pairing. Compared to existing benchmarks, our model achieves similar or better performance; however, direct comparison is limited due to differences in datasets and experimental setups. Despite the unavailability of the benchmark dataset, our findings indicate that SVM is an effective choice for rhetorical classification, even in few-shot learning scenarios. Full article
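The paper's best configuration, an SVM over Llama-3.2-3B sentence embeddings evaluated with 5-fold cross-validation, can be sketched as follows. This is a minimal illustration, not the authors' code: the class-shifted random vectors stand in for real sentence embeddings, and the move labels are placeholders for the CaRS annotation scheme.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Stand-in "embeddings": in the paper these would come from a model such as
# Llama-3.2-3B; here each rhetorical move gets a class-dependent offset so
# the classes are separable.
rng = np.random.default_rng(0)
n_per_class, dim = 30, 64
moves = ["M1", "M2", "M3"]  # placeholder labels for the three CaRS moves
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, dim))
               for i in range(len(moves))])
y = np.repeat(moves, n_per_class)

clf = SVC(kernel="linear")                    # SVM classifier
scores = cross_val_score(clf, X, y, cv=5)     # 5-fold CV, as in the paper
print(f"mean 5-fold accuracy: {scores.mean():.2f}")
```

Swapping in real embeddings only changes how `X` is built; the evaluation loop stays the same.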

23 pages, 809 KiB  
Article
Towards Smarter Assessments: Enhancing Bloom’s Taxonomy Classification with a Bayesian-Optimized Ensemble Model Using Deep Learning and TF-IDF Features
by Ali Alammary and Saeed Masoud
Electronics 2025, 14(12), 2312; https://doi.org/10.3390/electronics14122312 - 6 Jun 2025
Viewed by 602
Abstract
Bloom’s taxonomy provides a well-established framework for categorizing the cognitive complexity of assessment questions, ensuring alignment with course learning outcomes (CLOs). Achieving this alignment is essential for constructing meaningful and valid assessments that accurately measure student learning. However, in higher education, the large volume of questions that instructors must develop each semester makes manual classification of cognitive levels a time-consuming and error-prone process. Despite various attempts to automate this classification, the highest accuracy reported in existing research has not exceeded 93.5%, highlighting the need for further advancements in this area. Furthermore, the best-performing deep learning models only reached an accuracy of 86%. These results emphasize the need for improvement, particularly in the application of deep learning models, which have not been fully exploited for this task. In response to these challenges, our study explores a novel approach to enhance the accuracy of cognitive level classification. We leverage a combination of augmentation through synonym substitution, advanced feature extraction techniques utilizing DistilBERT and TF-IDF, and a robust ensemble model incorporating soft voting. These methods were selected to capture both semantic meaning and term frequency, allowing the model to benefit from contextual depth and statistical relevance. Additionally, Bayesian optimization is employed for hyperparameter tuning to refine the model’s performance further. The novelty of our approach lies in the fusion of sparse TF-IDF features with dense DistilBERT embeddings, optimized through Bayesian search across multiple classifiers. This hybrid design captures both term-level salience and deep contextual semantics, something not fully exploited in prior models focused solely on transformer architectures. Our soft-voting ensemble capitalizes on classifier diversity, yielding more stable and accurate results. 
This integrated approach outperformed previous configurations with an accuracy of 96%, surpassing the current state-of-the-art results and setting a new benchmark for automated cognitive level classification. These findings have significant implications for the development of high-quality, scalable assessments in educational settings. Full article
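The soft-voting ensemble described above can be sketched with scikit-learn. This is a hypothetical simplification: the paper fuses TF-IDF features with dense DistilBERT embeddings and tunes hyperparameters with Bayesian optimization, whereas here TF-IDF alone stands in for the fused representation, and the toy questions and Bloom's-level labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy assessment questions with Bloom's-taxonomy level labels (illustrative).
questions = [
    "List the stages of mitosis.",                    # remember
    "Explain why enzymes lower activation energy.",   # understand
    "Design an experiment to test osmosis.",          # create
] * 10
labels = ["remember", "understand", "create"] * 10

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(random_state=0))],
        voting="soft",  # average predicted probabilities across classifiers
    ),
)
ensemble.fit(questions, labels)
print(ensemble.predict(["List the stages of mitosis."])[0])
```

Soft voting averages each classifier's predicted class probabilities rather than taking a majority of hard labels, which is what lets a diverse ensemble smooth out individual classifiers' mistakes.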

19 pages, 3091 KiB  
Article
Efficient Data Reduction Through Maximum-Separation Vector Selection and Centroid Embedding Representation
by Sultan Alshamrani
Electronics 2025, 14(10), 1919; https://doi.org/10.3390/electronics14101919 - 9 May 2025
Viewed by 385
Abstract
This study introduces two novel data reduction approaches for efficient sentiment analysis: High-Distance Sentiment Vectors (HDSV) and Centroid Sentiment Embedding Vectors (CSEV). By leveraging embedding space characteristics from DistilBERT, HDSV selects maximally separated sample pairs, while CSEV computes representative centroids for each sentiment class. We evaluate these methods on three benchmark datasets: SST-2, Yelp, and Sentiment140. Our results demonstrate remarkable data efficiency, reducing training samples to just 100 with HDSV and two with CSEV while maintaining comparable performance to full dataset training. Notable findings include CSEV achieving 88.93% accuracy on SST-2 (compared to 90.14% with full data) and both methods showing improved cross-dataset generalization, with less than 2% accuracy drop in domain transfer tasks versus 11.94% for full dataset training. The proposed methods enable significant storage savings, with datasets compressed to less than 1% of their original size, making them particularly valuable for resource-constrained environments. Our findings advance the understanding of data requirements in sentiment analysis, demonstrating that strategically selected minimal training data can achieve robust and generalizable classification while promoting more sustainable machine learning practices. Full article
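The CSEV idea, replacing an entire sentiment class with a single centroid embedding and classifying by nearest centroid, can be sketched in a few lines. This is an assumption-laden illustration, not the paper's implementation: random class-shifted vectors stand in for DistilBERT embeddings, and the nearest-centroid rule is the simplest reading of how two stored vectors can replace a full training set.

```python
import numpy as np

# Stand-in embeddings: in the paper these come from DistilBERT; here each
# sentiment class is a Gaussian cloud around a class-specific mean.
rng = np.random.default_rng(1)
dim = 16
pos = rng.normal(loc=1.0, size=(100, dim))    # positive-class embeddings
neg = rng.normal(loc=-1.0, size=(100, dim))   # negative-class embeddings

# CSEV: the whole dataset is reduced to one centroid vector per class.
centroids = {"positive": pos.mean(axis=0), "negative": neg.mean(axis=0)}

def predict(embedding):
    # Nearest-centroid rule: compare against two stored vectors only.
    return min(centroids, key=lambda c: np.linalg.norm(embedding - centroids[c]))

print(predict(np.ones(dim)))
```

The storage saving follows directly: two `dim`-sized vectors replace hundreds of training samples, which is the "less than 1% of original size" compression the abstract reports.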
