Machine Learning Applications in Natural Language Processing

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289). This special issue belongs to the section "Data Mining and Machine Learning".

Deadline for manuscript submissions: 20 May 2026

Special Issue Editors

Dr. Ying Weng
Guest Editor
School of Computer Science, University of Nottingham Ningbo China, Ningbo 315100, China
Interests: machine learning; XAI; big data; Internet of Things; multimedia security and forensics; security and QoS in wireless networks

Prof. Dr. Kecheng Liu
Guest Editor
Henley Business School, University of Reading, Berkshire RG6 6UR, UK
Interests: big data; IoT and cloud computing in informatics and E-Business

Dr. Chao Li
Guest Editor
Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo 103-0027, Japan
Interests: AI4Sci; quantum machine learning; natural language processing

Special Issue Information

Dear Colleagues,

Natural language processing (NLP) has become one of the most important technologies in artificial intelligence. In the era of big data, the rapidly growing volume of text available online has accelerated progress in NLP, because large-scale datasets (e.g., text corpora and multilingual databases) enable the training of advanced models.

Machine learning is a branch of artificial intelligence that focuses on developing algorithms that learn from data and improve their performance over time. As these algorithms have developed, they have revolutionized NLP, enabling tasks such as language translation, large-scale text processing, document summarization, human-like response generation, and speech recognition to be performed with greater accuracy and efficiency. Furthermore, interdisciplinary research on machine learning-based NLP has expanded into healthcare, finance, education, entertainment, and other domains.

The combination of machine learning and NLP has led to significant advancements in the field, opening up new possibilities for applications.

This Special Issue aims to provide a comprehensive platform for exploring the latest advances in machine learning applications in NLP, with particular emphasis on the integration of big data, deep learning, and large language model technologies. Its themes fall within the scope of Big Data and Cognitive Computing, focusing specifically on NLP-driven advances that leverage data-intensive, AI-powered methods to enhance machine intelligence. Because cognitive computing relies on big data to simulate human-like reasoning, decision-making, and language understanding, research and innovative approaches in the design, development, and implementation of machine learning applications in NLP contribute directly to the evolution of cognitive computing systems that aim to mimic human cognitive abilities.

This Special Issue also places high value on interdisciplinary research that explores the practical applications of NLP in various domains.

Original research articles and reviews are welcome in this Special Issue. The scope includes, but is not limited to, the following topics:

  • Text classification and clustering;
  • Text mining;
  • Text summarization;
  • Information extraction;
  • Information retrieval;
  • Machine translation;
  • Question answering;
  • Sentiment analysis;
  • Large language models;
  • Multimodal NLP;
  • NLP applications in healthcare;
  • NLP applications in finance;
  • NLP applications in education;
  • NLP applications in entertainment;
  • Explainable artificial intelligence for NLP;
  • Semiotic-aware NLP;
  • Semiotics for NLP explainability.

We look forward to receiving your contributions.

Dr. Ying Weng
Prof. Dr. Kecheng Liu
Dr. Chao Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • deep learning
  • large language model
  • multimodal
  • XAI

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (2 papers)

Research

25 pages, 4657 KB  
Article
Identifying Methodological Language in Psychology Abstracts: A Machine Learning Approach Using NLP and Embedding-Based Clustering
by Konstantinos G. Stathakis, George Papageorgiou and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(9), 224; https://doi.org/10.3390/bdcc9090224 - 29 Aug 2025
Abstract
Research articles are valuable resources for Information Retrieval and Natural Language Processing (NLP) tasks, offering opportunities to analyze key components of scholarly content. This study investigates the presence of methodological terminology in psychology research over the past 30 years (1995–2024), as well as the extent to which this terminology forms distinct thematic groupings, by applying a novel NLP and Machine Learning pipeline to a large corpus of 85,452 abstracts. Combining glossary-based extraction, contextualized language model embeddings, and dual-mode clustering, this study offers a scalable framework for exploring methodological transparency in scientific text via deep semantic structures. A curated glossary of 365 method-related keywords served as a gold-standard reference for term identification, using direct and fuzzy string matching. Retrieved terms were encoded with SciBERT, and embeddings were averaged across contextual occurrences to produce unified vectors. These vectors were clustered using unsupervised and weighted unsupervised approaches, yielding six and ten clusters, respectively. Cluster composition was analyzed using weighted statistical measures to assess term importance within and across groups. A total of 78.16% of the examined abstracts contained glossary terms, with an average of 1.8 terms per abstract, highlighting an increasing presence of methodological terminology in psychology and reflecting a shift toward greater transparency in research reporting. This work goes beyond the use of static vectors by incorporating contextual understanding in the examination of methodological terminology, while offering a scalable and generalizable approach to semantic analysis in scientific texts, with implications for meta-research, domain-specific lexicon development, and automated scientific knowledge discovery.
(This article belongs to the Special Issue Machine Learning Applications in Natural Language Processing)
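The term-identification and vector-averaging steps summarized in this abstract can be illustrated with a short Python sketch. This is not the authors' implementation: it assumes a hypothetical three-term glossary in place of the curated 365-term list, uses the standard library's difflib.SequenceMatcher with an assumed similarity threshold of 0.9 as the fuzzy matcher (the abstract does not name one), and uses toy two-dimensional vectors in place of SciBERT embeddings. The clustering stage is not shown.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-glossary; the study itself used a curated list of 365 terms.
GLOSSARY = ["regression analysis", "randomized controlled trial", "factor analysis"]


def match_terms(text, glossary, threshold=0.9):
    """Return glossary terms found in text, via direct or fuzzy matching.

    A term matches directly if it occurs verbatim (case-insensitive).
    Otherwise every window with the same number of tokens is compared
    using difflib's similarity ratio; 0.9 is an assumed threshold.
    """
    lowered = text.lower()
    tokens = re.findall(r"[a-z0-9-]+", lowered)
    found = []
    for term in glossary:
        if term in lowered:  # direct string match
            found.append(term)
            continue
        n = len(term.split())
        for i in range(len(tokens) - n + 1):  # fuzzy match over n-token windows
            window = " ".join(tokens[i:i + n])
            if SequenceMatcher(None, term, window).ratio() >= threshold:
                found.append(term)
                break
    return found


def average_embeddings(occurrences):
    """Collapse each term's per-occurrence vectors into one mean vector."""
    unified = {}
    for term, vectors in occurrences.items():
        dim = len(vectors[0])
        unified[term] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return unified


if __name__ == "__main__":
    abstract = "We used a randomised controlled trial and regression analysis."
    print(match_terms(abstract, GLOSSARY))
    # ['regression analysis', 'randomized controlled trial']

    # Toy 2-D vectors stand in for SciBERT embeddings of two occurrences.
    print(average_embeddings({"regression analysis": [[1.0, 2.0], [3.0, 4.0]]}))
    # {'regression analysis': [2.0, 3.0]}
```

Direct matching is tried first because it is cheap; the fuzzy pass then catches spelling variants such as "randomised" versus "randomized". Averaging gives each glossary term a single vector regardless of how many contexts it appeared in, which is what makes term-level clustering possible.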

Review

17 pages, 698 KB  
Review
What Distinguishes AI-Generated from Human Writing? A Rapid Review of the Literature
by Georgios P. Georgiou
Big Data Cogn. Comput. 2026, 10(2), 55; https://doi.org/10.3390/bdcc10020055 - 8 Feb 2026
Abstract
Large language models (LLMs) are now routine writing tools across various domains, intensifying questions about when text should be treated as human-authored, artificial intelligence (AI)-generated, or collaboratively produced. This rapid review aims to identify cue families reported in empirical studies as distinguishing AI from human-authored text and to assess how stable these cues are across genres/tasks, text lengths, and revision conditions. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched four online databases for peer-reviewed empirical articles (1 January 2022–1 January 2026). After deduplication and screening, 40 studies were included. Evidence converged on five cue families: surface, discourse/pragmatic, epistemic/content, predictability/probabilistic, and provenance. Surface cues dominated the literature and were the most consistently operationalized. Discourse/pragmatic cues followed, particularly in discipline-bound academic genres where stance and metadiscourse differentiated AI from human writing. Predictability/probabilistic cues were central in detector-focused studies, while epistemic/content cues emerged primarily in tasks where grounding and authenticity were salient. Provenance cues were concentrated in watermarking research. Across studies, cue stability was consistently conditional rather than universal. Specifically, surface and discourse cues often remained discriminative within constrained genres, but shifted with register and discipline; probabilistic cues were powerful yet fragile under paraphrasing, post-editing, and evasion; and provenance signals required robustness to editing, mixing, and span localization. Overall, the literature indicates that the AI–human distinction emerges from layered and context-dependent cue profiles rather than from any single reliable marker.
High-stakes decisions, therefore, require condition-aware interpretation, triangulation across multiple cue families, and human oversight rather than automated classification in isolation.
(This article belongs to the Special Issue Machine Learning Applications in Natural Language Processing)