
Neural Network Technologies in Natural Language Processing and Data Mining

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 March 2025) | Viewed by 10368

Special Issue Editors


Guest Editor
Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
Interests: artificial intelligence; machine learning; interpretable machine learning; educational data mining; natural language processing; machine translation

Guest Editor
Faculty of Informatics and Digital Technologies, University of Rijeka, 51000 Rijeka, Croatia
Interests: artificial intelligence; data science; machine learning; explainable artificial intelligence; explainable machine learning; human-centric AI; trustworthy Internet of Things systems

Special Issue Information

Dear Colleagues,

Neural network technologies have revolutionized the fields of natural language processing (NLP) and data mining, transforming the way we process and extract hidden insights from vast amounts of textual data. Neural networks play a central role in uncovering patterns and trends in complex, unstructured data sources. They have been successfully applied to a wide range of classification and regression tasks, as well as to clustering, anomaly detection, association rule extraction, and more. Different types of neural networks are used depending on the application. BERT, GPT, and T5 are among the most notable transformer architectures applied in NLP tasks such as language understanding, generation, summarization, question answering, and translation; they all share the fundamental concept of leveraging self-attention mechanisms. Both NLP and data mining continue to evolve, exploring new variations and enhancements that address different tasks and challenges more efficiently and effectively.
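
As a brief illustration of the self-attention mechanism these architectures share, the following minimal NumPy sketch computes scaled dot-product self-attention for one token sequence; the dimensions and random weights are illustrative assumptions rather than the configuration of any particular model.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # (seq_len, d_k)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 16))                         # 5 tokens, d_model=16
    Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 8)

In full transformer architectures, this operation runs with multiple heads and learned projections at every layer.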

This Special Issue of Applied Sciences covers all application areas of neural network technologies in the fields of natural language processing (NLP) and data mining. It aims to show how neural network technologies have addressed long-standing challenges in these areas, as well as how they give rise to new challenges.

Both original research articles and comprehensive review articles are welcome.

Topics of interest in this Special Issue include various applications of neural networks such as:

  • Topic modeling;
  • Text classification;
  • Automatic language translation;
  • Text generation;
  • Profiling;
  • Language understanding;
  • Named entity recognition;
  • Information extraction;
  • Social media analysis;
  • Pattern recognition;
  • Classification;
  • Regression;
  • Anomaly detection;
  • Association rule extraction;
  • Feature extraction;
  • Dealing with imbalanced and biased data sets;
  • Biases in language models;
  • Emerging trends.

Prof. Dr. Marija Brkić Bakarić
Prof. Dr. Maja Matetić
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, you can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • neural network technologies
  • natural language processing (NLP)
  • data mining
  • classification
  • regression
  • clustering
  • anomaly detection
  • association rule extraction
  • transformer
  • language understanding
  • generation
  • summarization
  • question answering
  • translation
  • self-attention mechanisms

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (5 papers)


Research

22 pages, 6086 KiB  
Article
A Comparative Evaluation of Transformers and Deep Learning Models for Arabic Meter Classification
by A. M. Mutawa and Sai Sruthi
Appl. Sci. 2025, 15(9), 4941; https://doi.org/10.3390/app15094941 - 29 Apr 2025
Abstract
Arabic poetry follows intricate rhythmic patterns known as ‘arūḍ’ (prosody), which makes its automated categorization particularly challenging. While earlier studies primarily relied on conventional machine learning and recurrent neural networks, this work evaluates the effectiveness of transformer-based models—an area not extensively explored for this task. We investigate several pretrained transformer models, including Arabic Bidirectional Encoder Representations from Transformers (Arabic-BERT), BERT base Arabic (AraBERT), Arabic Efficiently Learning an Encoder that Classifies Token Replacements Accurately (AraELECTRA), Computational Approaches to Modeling Arabic BERT (CAMeLBERT), Multi-dialect Arabic BERT (MARBERT), and Modern Arabic BERT (ARBERT), alongside deep learning models such as Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Gated Recurrent Units (BiGRU). This study uses half-verse data across 14 meters. The CAMeLBERT model achieved the highest performance, with an accuracy of 90.62% and an F1-score of 0.91, outperforming other models. We further analyze feature significance and model behavior using the Local Interpretable Model-Agnostic Explanations (LIME) interpretability technique. The LIME-based analysis highlights key linguistic features that most influence model predictions. These findings demonstrate the strengths and limitations of each method and pave the way for further advancements in Arabic poetry analysis using deep learning.
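
For readers unfamiliar with LIME, the interpretability technique this paper applies, the following hedged Python sketch shows the general pattern of explaining a transformer text classifier with the lime package; the checkpoint is a public Arabic BERT base model standing in for the authors' fine-tuned meter classifiers, and the input string is a placeholder.

    from lime.lime_text import LimeTextExplainer
    from transformers import pipeline
    import numpy as np

    # Placeholder checkpoint: a public Arabic BERT base model, not the
    # paper's fine-tuned meter classifier.
    clf = pipeline("text-classification",
                   model="CAMeL-Lab/bert-base-arabic-camelbert-mix", top_k=None)

    def predict_proba(texts):
        # LIME expects an (n_samples, n_classes) probability matrix.
        outputs = clf(list(texts))
        labels = sorted({s["label"] for s in outputs[0]})
        return np.array([[next(s["score"] for s in out if s["label"] == lab)
                          for lab in labels] for out in outputs])

    explainer = LimeTextExplainer()
    exp = explainer.explain_instance("...half-verse text...", predict_proba,
                                     num_features=10, num_samples=500)
    print(exp.as_list())  # tokens weighted by their influence on the prediction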

29 pages, 976 KiB  
Article
Vegetarianism Discourse in Russian Social Media: A Case Study
by Nikita Gorduna and Natalia Vanetik
Appl. Sci. 2025, 15(1), 259; https://doi.org/10.3390/app15010259 - 30 Dec 2024
Viewed by 680
Abstract
Dietary choices, especially vegetarianism, have attracted much attention lately due to their potential effects on the environment, human health, and morality. Despite this, public discourse on vegetarianism in Russian-language contexts remains underexplored. This paper introduces VegRuCorpus, a novel, manually annotated dataset of Russian-language social media texts expressing opinions on vegetarianism. Through extensive experimentation, we demonstrate that contrastive learning significantly outperforms traditional machine learning and fine-tuned transformer models, achieving the best classification performance for distinguishing pro- and anti-vegetarian opinions. While traditional models perform competitively using syntactic and semantic representations and fine-tuned transformers show promise, our findings highlight the need for task-specific data to unlock their full potential. By providing a new dataset and insights into model performance, this work advances opinion mining and contributes to understanding nutritional health discourse in Russia.
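
As a rough sketch of the contrastive learning idea the paper finds most effective, the following PyTorch snippet implements a supervised contrastive loss over sentence embeddings, pulling same-stance texts together and pushing opposing ones apart; the temperature value and batch construction are illustrative assumptions, not the authors' setup.

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
        # embeddings: (batch, dim) sentence vectors; labels: (batch,) stance ids
        z = F.normalize(embeddings, dim=1)
        sim = z @ z.T / temperature                      # pairwise similarities
        pos_mask = labels.unsqueeze(0) == labels.unsqueeze(1)
        pos_mask.fill_diagonal_(False)                   # exclude self-pairs
        logits = sim - torch.eye(len(z), device=z.device) * 1e9  # mask diagonal
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        # average log-probability of same-label (positive) pairs per anchor
        pos_counts = pos_mask.sum(1).clamp(min=1)
        return -(log_prob * pos_mask).sum(1).div(pos_counts).mean()

An encoder trained with such a loss is typically paired with a lightweight classifier over the resulting embedding space.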

14 pages, 1558 KiB  
Article
Comparing Fine-Tuning and Prompt Engineering for Multi-Class Classification in Hospitality Review Analysis
by Ive Botunac, Marija Brkić Bakarić and Maja Matetić
Appl. Sci. 2024, 14(14), 6254; https://doi.org/10.3390/app14146254 - 18 Jul 2024
Cited by 2 | Viewed by 2572
Abstract
This study compares the effectiveness of fine-tuning Transformer models, specifically BERT, RoBERTa, DeBERTa, and GPT-2, against using prompt engineering in LLMs like ChatGPT and GPT-4 for multi-class classification of hotel reviews. As the hospitality industry increasingly relies on online customer feedback to improve services and strategize marketing, accurately analyzing this feedback is crucial. Our research employs a multi-task learning framework to simultaneously conduct sentiment analysis and categorize reviews into aspects such as service quality, ambiance, and food. We assess the capabilities of fine-tuned Transformer models and LLMs with prompt engineering in processing and understanding the complex user-generated content prevalent in the hospitality industry. The results show that fine-tuned models, particularly RoBERTa, are more adept at classification tasks due to their deep contextual processing abilities and faster execution times. In contrast, while ChatGPT and GPT-4 excel in sentiment analysis by better capturing the nuances of human emotions, they require more computational power and longer processing times. Our findings support the hypothesis that fine-tuning models can achieve better results and faster execution than using prompt engineering in LLMs for multi-class classification in hospitality reviews. This study suggests that selecting the appropriate NLP model depends on the task’s specific needs, balancing computational efficiency and the depth of sentiment analysis required for actionable insights in hospitality management.
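
To make the contrast between the two paradigms concrete, a schematic Python sketch follows: fine-tuned classification is a single forward pass through a trained head, while prompt engineering wraps the review in an instruction for a generative model. Both checkpoint names and the prompt template are placeholders, not the models or prompts used in the study.

    from transformers import pipeline

    review = "The room was spotless but the restaurant was overpriced."

    # Paradigm 1: fine-tuned encoder classifier (hypothetical checkpoint name).
    clf = pipeline("text-classification", model="your-org/roberta-hotel-aspects")
    print(clf(review))  # e.g., [{'label': 'service_quality', 'score': 0.93}]

    # Paradigm 2: prompt engineering with a generative model (gpt2 here only
    # as a small runnable stand-in for the instruction-following LLMs studied).
    prompt = ("Classify the hotel review into one of: service quality, "
              f"ambiance, food.\nReview: {review}\nCategory:")
    llm = pipeline("text-generation", model="gpt2")
    print(llm(prompt, max_new_tokens=5)[0]["generated_text"])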

34 pages, 11122 KiB  
Article
A Bibliometric Analysis of Text Mining: Exploring the Use of Natural Language Processing in Social Media Research
by Andra Sandu, Liviu-Adrian Cotfas, Aurelia Stănescu and Camelia Delcea
Appl. Sci. 2024, 14(8), 3144; https://doi.org/10.3390/app14083144 - 9 Apr 2024
Cited by 11 | Viewed by 3805
Abstract
Natural language processing (NLP) plays a pivotal role in modern life by enabling computers to comprehend, analyze, and respond to human language meaningfully, thereby offering exciting new opportunities. As social media platforms experience a surge in global usage, the imperative to capture and better understand the messages disseminated within these networks becomes increasingly crucial. Moreover, the occurrence of adverse events, such as the emergence of a pandemic or conflicts in various parts of the world, heightens social media users’ inclinations towards these platforms. In this context, this paper aims to explore the scientific literature dedicated to the utilization of NLP in social media research, with the goal of highlighting trends, keywords, and collaborative networks within the authorship that contribute to the proliferation of papers in this field. To achieve this objective, we extracted and analyzed 1852 papers from the ISI Web of Science database. An initial observation reveals a remarkable annual growth rate of 62.18%, underscoring the heightened interest of the academic community in this domain. This paper includes an n-gram analysis and a review of the most cited papers in the extracted database, offering a comprehensive bibliometric analysis. The insights gained from these efforts provide essential perspectives and contribute to identifying pertinent issues in social media analysis addressed through the application of NLP.
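
As a small illustration of the kind of n-gram analysis reported here, the following scikit-learn sketch counts the most frequent bigrams in a toy corpus standing in for the extracted abstracts; the documents are invented for demonstration.

    from sklearn.feature_extraction.text import CountVectorizer

    docs = [  # toy corpus standing in for the 1852 extracted papers
        "natural language processing in social media research",
        "deep learning for social media text mining",
    ]

    vec = CountVectorizer(ngram_range=(2, 2), stop_words="english")
    counts = vec.fit_transform(docs).sum(axis=0).A1
    top = sorted(zip(vec.get_feature_names_out(), counts), key=lambda t: -t[1])
    print(top[:5])  # most frequent bigrams across the corpus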

28 pages, 1581 KiB  
Article
Authorship Attribution in Less-Resourced Languages: A Hybrid Transformer Approach for Romanian
by Melania Nitu and Mihai Dascalu
Appl. Sci. 2024, 14(7), 2700; https://doi.org/10.3390/app14072700 - 23 Mar 2024
Viewed by 1836
Abstract
Authorship attribution for less-resourced languages like Romanian, characterized by the scarcity of large, annotated datasets and the limited number of available NLP tools, poses unique challenges. This study focuses on a hybrid Transformer combining handcrafted linguistic features, ranging from surface indices like word frequencies to syntax, semantics, and discourse markers, with contextualized embeddings from a Romanian BERT encoder. The methodology involves extracting contextualized representations from a pre-trained Romanian BERT model and concatenating them with linguistic features, selected using the Kruskal–Wallis mean rank, to create a hybrid input vector for a classification layer. We compare this approach with a baseline ensemble of seven machine learning classifiers for authorship attribution employing majority soft voting. We conduct studies on both long texts (full texts) and short texts (paragraphs), with 19 authors and a subset of 10. Our hybrid Transformer outperforms existing methods, achieving an F1 score of 0.87 on the full dataset of the 19-author set (an 11% enhancement) and an F1 score of 0.95 on the 10-author subset (an increase of 10% over previous research studies). We conduct linguistic analysis leveraging textual complexity indices and employ McNemar and Cochran’s Q statistical tests to evaluate the performance evolution across the best three models, while highlighting patterns in misclassifications. Our research contributes to diversifying methodologies for effective authorship attribution in resource-constrained linguistic environments. Furthermore, we publicly release the full dataset and the codebase associated with this study to encourage further exploration and development in this field.
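
To sketch the hybrid-input construction described in this abstract, the snippet below filters handcrafted features with the Kruskal–Wallis test and concatenates the survivors with BERT embeddings; the significance threshold and array shapes are illustrative assumptions rather than the authors' exact pipeline.

    import numpy as np
    from scipy.stats import kruskal

    def select_features(X_ling, y, alpha=0.05):
        # X_ling: (n_docs, n_features) handcrafted indices; y: author labels
        keep = []
        for j in range(X_ling.shape[1]):
            groups = [X_ling[y == a, j] for a in np.unique(y)]
            _, p = kruskal(*groups)
            if p < alpha:            # feature separates at least some authors
                keep.append(j)
        return keep

    def hybrid_inputs(X_bert, X_ling, keep):
        # X_bert: (n_docs, 768) contextualized embeddings from a BERT encoder
        return np.concatenate([X_bert, X_ling[:, keep]], axis=1)

The resulting hybrid vectors feed a standard classification layer, as the abstract describes.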
