Natural Language Processing (NLP) and Large Language Modelling

A special issue of Computers (ISSN 2073-431X).

Deadline for manuscript submissions: 31 July 2025 | Viewed by 9561

Special Issue Editor


E-Mail Website
Guest Editor
School of Info Technology, Faculty of Science, Engineering and Built Environment, Geelong Waurn Ponds Campus, Deakin University, Geelong, VIC 3216, Australia
Interests: natural language processing; small efficient language modelling; continual learning; text generation; adversarial learning; scientific text mining; multimodality; conversational systems

Special Issue Information

Dear Colleagues,

NLP is a rapidly evolving field that plays a crucial role in shaping the future of human–computer interactions, with applications ranging from sentiment analysis and machine translation to question answering and dialogue systems.

We invite researchers, practitioners, and enthusiasts to submit original research articles, reviews, and case studies that contribute to the advancement of NLP. Extended conference papers are also welcome, but they should contain at least 50% of new material, e.g., in the form of technical extensions, more in-depth evaluations, or additional use cases. Topics of interest for this Special Issue include, but are not limited to, the following:

  • Large language modelling and its applications;
  • Sentiment analysis and opinion mining;
  • Machine translation and multilingual processing;
  • Question answering and information retrieval;
  • Dialogue systems and conversational agents;
  • Text summarization and generation;
  • Natural language understanding and generation;
  • NLP applications in healthcare, finance, education, and other domains.

Submissions should present novel research findings, innovative methodologies, and practical applications that demonstrate the current state of the art in NLP. We welcome interdisciplinary approaches and encourage submissions that explore the intersection of NLP with other fields, such as machine learning, artificial intelligence, and cognitive science.

Dr. Ming Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computers is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • small efficient language modelling
  • continual learning
  • text generation
  • adversarial learning
  • scientific text mining
  • multimodality
  • conversational systems
  • large language model

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue polices can be found here.

Published Papers (7 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

Jump to: Other

17 pages, 1448 KiB  
Article
LLaMA 3 vs. State-of-the-Art Large Language Models: Performance in Detecting Nuanced Fake News
by Stefan Emil Repede and Remus Brad
Computers 2024, 13(11), 292; https://doi.org/10.3390/computers13110292 - 11 Nov 2024
Viewed by 913
Abstract
This study investigates the effectiveness of a proposed version of Meta’s LLaMA 3 model in detecting fake claims across bilingual (English and Romanian) datasets, focusing on a multi-class approach beyond traditional binary classifications in order to better mimic real-world scenarios. The research employs [...] Read more.
This study investigates the effectiveness of a proposed version of Meta’s LLaMA 3 model in detecting fake claims across bilingual (English and Romanian) datasets, focusing on a multi-class approach beyond traditional binary classifications in order to better mimic real-world scenarios. The research employs a proposed version of the LLaMA 3 model, optimized for identifying nuanced categories such as “Mostly True” and “Mostly False”, and compares its performance against leading large language models (LLMs) including Open AI’s ChatGPT versions, Google’s Gemini, and similar LLaMA models. The analysis reveals that the proposed LLaMA 3 model consistently outperforms its base version and older LLaMA models, particularly in the Romanian dataset, achieving the highest accuracy of 39% and demonstrating superior capabilities in identifying nuanced claims, over all the compared large language models. However, the model’s performance across both languages highlights some challenges, with generally low accuracy and difficulties in handling ambiguous categories by all the LLMs. The study also underscores the impact of language and cultural context on model reliability, noting that even state-of-the-art models like ChatGPT 4.o and Gemini exhibit inconsistencies when applied to Romanian text and more than a binary true/false approach. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

13 pages, 853 KiB  
Article
Assessing Large Language Models Used for Extracting Table Information from Annual Financial Reports
by David Balsiger, Hans-Rudolf Dimmler, Samuel Egger-Horstmann and Thomas Hanne
Computers 2024, 13(10), 257; https://doi.org/10.3390/computers13100257 - 9 Oct 2024
Viewed by 1113
Abstract
The extraction of data from tables in PDF documents has been a longstanding challenge in the field of data processing and analysis. While traditional methods have been explored in depth, the rise of Large Language Models (LLMs) offers new possibilities. This article addresses [...] Read more.
The extraction of data from tables in PDF documents has been a longstanding challenge in the field of data processing and analysis. While traditional methods have been explored in depth, the rise of Large Language Models (LLMs) offers new possibilities. This article addresses the knowledge gaps regarding LLMs, specifically ChatGPT-4 and BARD, for extracting and interpreting data from financial tables in PDF format. This research is motivated by the real-world need to efficiently gather and analyze corporate financial information. The hypothesis is that LLMs—in this case, ChatGPT-4 and BARD—can accurately extract key financial data, such as balance sheets and income statements. The methodology involves selecting representative pages from 46 annual reports of large Swiss corporations listed in the SMI Expanded Index from 2022 and copy–pasting text from these into LLMs. Eight analytical questions were posed to the LLMs, and their responses were assessed for accuracy and for identifying potential error sources in data extraction. The findings revealed significant variance in the performance of ChatGPT-4 and another LLM, BARD, with ChatGPT-4 generally exhibiting superior accuracy. This research contributes to understanding the capabilities and limitations of LLMs in processing and interpreting complex financial data from corporate documents. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

20 pages, 2961 KiB  
Article
Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference
by Hao Zhen, Yucheng Shi, Yongcan Huang, Jidong J. Yang and Ninghao Liu
Computers 2024, 13(9), 232; https://doi.org/10.3390/computers13090232 - 14 Sep 2024
Viewed by 1616
Abstract
Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity analysis and inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular [...] Read more.
Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity analysis and inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular data using a pre-built template infused with domain knowledge. Additionally, we incorporated Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity. This study also examine the impact of prompt engineering specifically designed for crash severity inference. The LLMs were tasked with crash severity inference to: (1) evaluate the models’ capabilities in crash severity analysis, (2) assess the effectiveness of CoT and domain-informed prompt engineering, and (3) examine the reasoning abilities with the CoT framework. Our results showed that LLaMA3-70B consistently outperformed the other models, particularly in zero-shot settings. The CoT and Prompt Engineering techniques significantly enhanced performance, improving logical reasoning and addressing alignment issues. Notably, the CoT offers valuable insights into LLMs’ reasoning process, unleashing their capacity to consider diverse factors such as environmental conditions, driver behavior, and vehicle characteristics in severity analysis and inference. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

24 pages, 22050 KiB  
Article
SOD: A Corpus for Saudi Offensive Language Detection Classification
by Afefa Asiri and Mostafa Saleh
Computers 2024, 13(8), 211; https://doi.org/10.3390/computers13080211 - 20 Aug 2024
Viewed by 849
Abstract
Social media platforms like X (formerly known as Twitter) are integral to modern communication, enabling the sharing of news, emotions, and ideas. However, they also facilitate the spread of harmful content, and manual moderation of these platforms is impractical. Automated moderation tools, predominantly [...] Read more.
Social media platforms like X (formerly known as Twitter) are integral to modern communication, enabling the sharing of news, emotions, and ideas. However, they also facilitate the spread of harmful content, and manual moderation of these platforms is impractical. Automated moderation tools, predominantly developed for English, are insufficient for addressing online offensive language in Arabic, a language rich in dialects and informally used on social media. This gap underscores the need for dedicated, dialect-specific resources. This study introduces the Saudi Offensive Dialectal dataset (SOD), consisting of over 24,000 tweets annotated across three levels: offensive or non-offensive, with offensive tweets further categorized as general insults, hate speech, or sarcasm. A deeper analysis of hate speech identifies subtypes related to sports, religion, politics, race, and violence. A comprehensive descriptive analysis of the SOD is also provided to offer deeper insights into its composition. Using machine learning, traditional deep learning, and transformer-based deep learning models, particularly AraBERT, our research achieves a significant F1-Score of 87% in identifying offensive language. This score improves to 91% with data augmentation techniques addressing dataset imbalances. These results, which surpass many existing studies, demonstrate that a specialized dialectal dataset enhances detection efficacy compared to mixed-language datasets. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

31 pages, 2905 KiB  
Article
On Using GeoGebra and ChatGPT for Geometric Discovery
by Francisco Botana, Tomas Recio and María Pilar Vélez
Computers 2024, 13(8), 187; https://doi.org/10.3390/computers13080187 - 30 Jul 2024
Viewed by 2077
Abstract
This paper explores the performance of ChatGPT and GeoGebra Discovery when dealing with automatic geometric reasoning and discovery. The emergence of Large Language Models has attracted considerable attention in mathematics, among other fields where intelligence should be present. We revisit a couple of [...] Read more.
This paper explores the performance of ChatGPT and GeoGebra Discovery when dealing with automatic geometric reasoning and discovery. The emergence of Large Language Models has attracted considerable attention in mathematics, among other fields where intelligence should be present. We revisit a couple of elementary Euclidean geometry theorems discussed in the birth of Artificial Intelligence and a non-trivial inequality concerning triangles. GeoGebra succeeds in proving all these selected examples, while ChatGPT fails in one case. Our thesis is that both GeoGebra and ChatGPT could be used as complementary systems, where the natural language abilities of ChatGPT and the certified computer algebra methods in GeoGebra Discovery can cooperate in order to obtain sound and—more relevant—interesting results. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

23 pages, 578 KiB  
Article
An NLP-Based Exploration of Variance in Student Writing and Syntax: Implications for Automated Writing Evaluation
by Maria Goldshtein, Amin G. Alhashim and Rod D. Roscoe
Computers 2024, 13(7), 160; https://doi.org/10.3390/computers13070160 - 25 Jun 2024
Viewed by 1128
Abstract
In writing assessment, expert human evaluators ideally judge individual essays with attention to variance among writers’ syntactic patterns. There are many ways to compose text successfully or less successfully. For automated writing evaluation (AWE) systems to provide accurate assessment and relevant feedback, they [...] Read more.
In writing assessment, expert human evaluators ideally judge individual essays with attention to variance among writers’ syntactic patterns. There are many ways to compose text successfully or less successfully. For automated writing evaluation (AWE) systems to provide accurate assessment and relevant feedback, they must be able to consider similar kinds of variance. The current study employed natural language processing (NLP) to explore variance in syntactic complexity and sophistication across clusters characterized in a large corpus (n = 36,207) of middle school and high school argumentative essays. Using NLP tools, k-means clustering, and discriminant function analysis (DFA), we observed that student writers employed four distinct syntactic patterns: (1) familiar and descriptive language, (2) consistently simple noun phrases, (3) variably complex noun phrases, and (4) moderate complexity with less familiar language. Importantly, each pattern spanned the full range of writing quality; there were no syntactic patterns consistently evaluated as “good” or “bad”. These findings support the need for nuanced approaches in automated writing assessment while informing ways that AWE can participate in that process. Future AWE research can and should explore similar variability across other detectable elements of writing (e.g., vocabulary, cohesion, discursive cues, and sentiment) via diverse modeling methods. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

Other

Jump to: Research

34 pages, 5078 KiB  
Systematic Review
Context-Aware Embedding Techniques for Addressing Meaning Conflation Deficiency in Morphologically Rich Languages Word Embedding: A Systematic Review and Meta Analysis
by Mosima Anna Masethe, Hlaudi Daniel Masethe and Sunday O. Ojo
Computers 2024, 13(10), 271; https://doi.org/10.3390/computers13100271 - 17 Oct 2024
Viewed by 958
Abstract
This systematic literature review aims to evaluate and synthesize the effectiveness of various embedding techniques—word embeddings, contextual word embeddings, and context-aware embeddings—in addressing Meaning Conflation Deficiency (MCD). Using the PRISMA framework, this study assesses the current state of research and provides insights into [...] Read more.
This systematic literature review aims to evaluate and synthesize the effectiveness of various embedding techniques—word embeddings, contextual word embeddings, and context-aware embeddings—in addressing Meaning Conflation Deficiency (MCD). Using the PRISMA framework, this study assesses the current state of research and provides insights into the impact of these techniques on resolving meaning conflation issues. After a thorough literature search, 403 articles on the subject were found. A thorough screening and selection process resulted in the inclusion of 25 studies in the meta-analysis. The evaluation adhered to the PRISMA principles, guaranteeing a methodical and lucid process. To estimate effect sizes and evaluate heterogeneity and publication bias among the chosen papers, meta-analytic approaches were utilized such as the tau-squared (τ2) which represents a statistical parameter used in random-effects, H-squared (H2) is a statistic used to measure heterogeneity, and I-squared (I2) quantify the degree of heterogeneity. The meta-analysis demonstrated a high degree of variation in effect sizes among the studies, with a τ2 value of 8.8724. The significant degree of heterogeneity was further emphasized by the H2 score of 8.10 and the I2 value of 87.65%. A trim and fill analysis with a beta value of 5.95, a standard error of 4.767, a Z-value (or Z-score) of 1.25 which is a statistical term used to express the number of standard deviations a data point deviates from the established mean, and a p-value (probability value) of 0.2 was performed to account for publication bias which is one statistical tool that can be used to assess the importance of hypothesis test results. The results point to a sizable impact size, but the estimates are highly unclear, as evidenced by the huge standard error and non-significant p-value. The review concludes that although contextually aware embeddings have promise in treating Meaning Conflation Deficiency, there is a great deal of variability and uncertainty in the available data. The varied findings among studies are highlighted by the large τ2, I2, and H2 values, and the trim and fill analysis show that changes in publication bias do not alter the impact size’s non-significance. To generate more trustworthy insights, future research should concentrate on enhancing methodological consistency, investigating other embedding strategies, and extending analysis across various languages and contexts. Even though the results demonstrate a significant impact size in addressing MCD through sophisticated word embedding techniques, like context-aware embeddings, there is still a great deal of variability and uncertainty because of various factors, including the different languages studied, the sizes of the corpuses, and the embedding techniques used. These differences show how future research methods must be standardized to guarantee that study results can be compared to one another. The results emphasize how crucial it is to extend the linguistic scope to more morphologically rich and low-resource languages, where MCD is especially difficult. The creation of language-specific models for low-resource languages is one way to increase performance and consistency across Natural Language Processing (NLP) applications in a practical sense. By taking these actions, we can advance our understanding of MCD more thoroughly, which will ultimately improve the performance of NLP systems in a variety of language circumstances. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)
Show Figures

Figure 1

Back to TopTop