Application of Machine Learning in Data Science and Computational Intelligence

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (31 May 2025) | Viewed by 22547

Special Issue Editors

Dr. Elias Dritsas

Guest Editor

Department of Informatics and Computer Engineering, University of West Attica, Egaleo Park Campus, 12243 Athens, Greece
Interests: artificial intelligence; big data; data analysis; databases; data mining; data structures; machine learning; privacy; security; trust

Dr. Maria Trigka

Guest Editor

Industrial Systems Institute (ISI), Athena Research and Innovation Center, 26504 Patras, Greece
Interests: 5G; 6G; artificial intelligence; deep learning; image processing; IoT; machine learning; MIMO; mmWave; signal processing; wireless communications

Special Issue Information

Dear Colleagues,

Data science is a field of study that focuses on extracting valuable information from noisy data and draws on various disciplines, such as data engineering, data preprocessing, visualization, predictive analytics, data mining, machine learning and statistics. In recent years, there has been rapidly growing interest in the mathematical and theoretical aspects of data science. This interest is reflected in deterministic and non-deterministic models (i.e., probabilistic models, including the family known as statistical models) that provide guaranteed performance, robustness, and reusable and interpretable results. The digital transformation of information systems has made it feasible to use data science techniques such as artificial intelligence (AI) and machine learning (ML) effectively in various applications. In addition, the application of sensor technology and AI/ML will inevitably lead to more objective and enhanced performance, lower costs and more effective system management overall. The aim of this Special Issue is to present high-quality innovative ideas and research solutions (for both theoretical and practical challenges) that facilitate data analysis and modelling with the aid of artificial intelligence and machine learning in various domains and applications.

Dr. Elias Dritsas
Dr. Maria Trigka
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

data science
data mining
artificial intelligence
machine learning
statistics
predictive modelling
monitoring
data analytics

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available in MDPI's Special Issue policies.

Published Papers (11 papers)

Research

24 pages, 1088 KB

Open Access Article

Multilingual Sentiment Analysis with Data Augmentation: A Cross-Language Evaluation in French, German, and Japanese

by Suboh Alkhushayni and Hyesu Lee

Information 2025, 16(9), 806; https://doi.org/10.3390/info16090806 - 17 Sep 2025

Viewed by 967

Abstract

Machine learning in natural language processing (NLP) analyzes datasets to make future predictions, but developing accurate models requires large, high-quality, and balanced datasets. However, collecting such datasets, especially for low-resource languages, is time-consuming and costly. As a solution, data augmentation can be used to increase the dataset size by generating synthetic samples from existing data. This study examines the effect of translation-based data augmentation on sentiment analysis using small datasets in three diverse languages: French, German, and Japanese. We use two neural machine translation (NMT) services—Google Translate and DeepL—to generate augmented datasets through intermediate language translation. Sentiment analysis models based on Support Vector Machine (SVM) are trained on both original and augmented datasets and evaluated using accuracy, precision, recall, and F1 score. Our results demonstrate that translation augmentation significantly enhances model performance in both French and Japanese. For example, using Google Translate, model accuracy improved from 62.50% to 83.55% in Japanese (+21.05%) and from 87.66% to 90.26% in French (+2.6%). In contrast, the German dataset showed a minor improvement or decline, depending on the translator used. Google-based augmentation generally outperformed DeepL, which yielded smaller or negative gains. To evaluate cross-lingual generalization, models trained on one language were tested on datasets in the other two. Notably, a model trained on augmented German data improved its accuracy on French test data from 81.17% to 85.71% and on Japanese test data from 71.71% to 79.61%. Similarly, a model trained on augmented Japanese data improved accuracy on German test data by up to 3.4%. These findings highlight that translation-based augmentation can enhance sentiment classification and cross-language adaptability, particularly in low-resource and multilingual NLP settings. Full article

(This article belongs to the Special Issue Application of Machine Learning in Data Science and Computational Intelligence)

20 pages, 4253 KB

Open Access Article

Data-Driven Structural Health Monitoring Through Echo State Network Regression

by Xiaoou Li, Yingqin Zhu and Wen Yu

Information 2025, 16(8), 678; https://doi.org/10.3390/info16080678 - 8 Aug 2025

Viewed by 660

Abstract

This paper presents a novel data-driven approach to structural health monitoring (SHM) that uses Echo State Network (ESN) regression for continuous damage assessment. In contrast to traditional classification methods that demand extensive labeled data on damaged states, our approach utilizes an ESN, a powerful recurrent neural network, to directly predict a continuous damage metric from sensor data. This regression-based methodology offers two key advantages relevant to data science applications in SHM: (1) Reduced Training Data Dependency: The ESN achieves high accuracy even with limited data on damaged structures, significantly alleviating the data acquisition burden compared to classification-based AI/ML techniques. (2) Enhanced Noise Resilience: The inherent reservoir computing property of ESNs, characterized by a fixed, high-dimensional recurrent layer, makes them more tolerant of sensor noise and environmental variations compared to classification methods, leading to more reliable and robust SHM predictions from noisy data. A comprehensive evaluation demonstrates the effectiveness of the proposed ESN in identifying structural damage, highlighting its potential for practical application in data-driven SHM systems. Full article

24 pages, 6025 KB

Open Access Article

Uniform Manifold Approximation and Projection Filtering and Explainable Artificial Intelligence to Detect Adversarial Machine Learning

by Achmed Samuel Koroma, Sara Narteni, Enrico Cambiaso and Maurizio Mongelli

Information 2025, 16(8), 647; https://doi.org/10.3390/info16080647 - 29 Jul 2025

Viewed by 977

Abstract

Adversarial machine learning exploits the vulnerabilities of artificial intelligence (AI) models by inducing malicious distortion in input data. Starting with the effect of adversarial methods on the well-known MNIST and CIFAR-10 open datasets, this paper investigates the ability of Uniform Manifold Approximation and Projection (UMAP) to provide useful representations of both legitimate and malicious images and analyzes the attacks’ behavior under various conditions. By enabling the extraction of decision rules and the ranking of important features from classifiers such as decision trees, eXplainable AI (XAI) achieves zero false positives and negatives in detection through very simple if-then rules over UMAP variables. Several examples are reported in order to highlight the attacks’ behavior. The data availability statement details all code and data, which are publicly available to support reproducibility. Full article

19 pages, 6555 KB

Open Access Article

Exploiting Structured Global and Neighbor Orders for Enhanced Ordinal Regression

by Imam Mustafa Kamal, Solichin Mochammad, Latifah Nurahmi, Azis Natawijaya and Muhammad Kalili

Information 2025, 16(8), 624; https://doi.org/10.3390/info16080624 - 22 Jul 2025

Viewed by 917

Abstract

Ordinal regression combines classification and regression techniques, constrained by the intrinsic order among categories. It has wide-ranging applications in real-world scenarios, such as product quality grading, medical diagnoses, and facial age recognition, where understanding ranked relationships is crucial. Existing models, which often employ a series of binary classifiers with ordinal consistency loss, effectively enforce global order consistency but frequently encounter misclassification errors between adjacent categories. Achieving both global and local (neighbor-level) ordinal consistency, however, remains a significant challenge. In this study, we propose a hybrid ordinal regression model that addresses global ordinal structure while enhancing local consistency between neighboring categories. Our approach leverages ordinal metric learning to generate embeddings that capture global ordinal relationships and extends consistent rank logits with a neighbor order penalty in the loss function to reduce adjacent category misclassifications. Experimental results on multiple benchmark ordinal datasets demonstrate that our model significantly minimizes neighboring misclassification errors and global order inconsistencies, outperforming existing ordinal regression models. Full article
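The "series of binary classifiers" formulation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes K ordered categories labelled 0 to K-1 and K-1 binary tasks of the form "is the rank greater than k?", and the function names are hypothetical. The neighbor order penalty described in the abstract is not implemented here.

```python
def encode_rank(rank: int, num_classes: int) -> list[int]:
    """Turn an ordinal label into K-1 binary targets: is rank > k?"""
    if not 0 <= rank < num_classes:
        raise ValueError("rank out of range")
    return [1 if rank > k else 0 for k in range(num_classes - 1)]


def decode_rank(binary_outputs: list[int]) -> int:
    """Predicted rank is the number of binary tasks answered 'yes'."""
    return sum(binary_outputs)


def is_rank_consistent(binary_outputs: list[int]) -> bool:
    """Consistent outputs never switch from 0 back to 1."""
    return all(a >= b for a, b in zip(binary_outputs, binary_outputs[1:]))


# Hypothetical example with 5 ordered categories.
targets = encode_rank(2, 5)
print(targets, decode_rank(targets), is_rank_consistent(targets))
```

Under this encoding, an output such as [1, 0, 1] is rank-inconsistent, which is the kind of global order violation a consistency constraint is meant to rule out.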

24 pages, 2522 KB

Open Access Article

Digital Requirements for Systems Analysts in Europe: A Sectoral Analysis of Online Job Advertisements and ESCO Skills

by Konstantinos Charmanas, Konstantinos Georgiou, Nikolaos Mittas and Lefteris Angelis

Information 2025, 16(5), 363; https://doi.org/10.3390/info16050363 - 29 Apr 2025

Viewed by 1034

Abstract

Systems analysts can be considered a valuable part of organizations, as their responsibilities and contributions concern the improvement of information systems, which constitute an irreplaceable part of organizations. Thus, by exploring the current labor market of systems analysts, researchers can gather valuable knowledge to understand some invaluable societal needs. In this context, the objectives of this study are to investigate the sets of digital skills from the European Skills, Competences, Qualifications, and Occupations (ESCO) taxonomy required by systems analysts in Europe and examine the key characteristics of various relevant sectors. For this purpose, a tool combining topic extraction, machine learning, and statistical analysis is utilized. The outcomes prove that systems analysts may indeed possess different types of digital skills, where 12 distinct topics are discovered, and that the professional, scientific, and technical activities demand the most unique sets of digital skills across 17 sectors. Ultimately, the findings show that the numerous sectors indeed have divergent requirements and should be approached accordingly. Overall, this study can offer valuable guidelines for identifying both the general duties of systems analysts and the specific needs of each sector. Also, the presented tool and methods may provide ideas for exploring different domains associated with content information and distinct groups. Full article

23 pages, 522 KB

Open Access Article

ORUD-Detect: A Comprehensive Approach to Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning–Deep Learning Models with Embedding Techniques

by Nisar Hussain, Amna Qasim, Gull Mehak, Olga Kolesnikova, Alexander Gelbukh and Grigori Sidorov

Information 2025, 16(2), 139; https://doi.org/10.3390/info16020139 - 13 Feb 2025

Cited by 4 | Viewed by 1970

Abstract

With the rapid expansion of social media, detecting offensive language has become critically important for healthy online interactions. This poses a considerable challenge for low-resource languages such as Roman Urdu, which is widely used on platforms like Facebook. In this paper, we perform a comprehensive study of offensive language detection models on Roman Urdu datasets using both Machine Learning (ML) and Deep Learning (DL) approaches. We present a dataset of 89,968 Facebook comments and extensive preprocessing techniques such as TF-IDF features, Word2Vec, and fastText embeddings to address linguistic idiosyncrasies and code-mixed aspects of Roman Urdu. Among the ML models, a linear kernel Support Vector Machine (SVM) model achieved the best performance, with an F1 score of 94.76, followed by SVM models with radial and polynomial kernels. Even the use of BoW uni-gram features with naive Bayes produced competitive results, with an F1 score of 94.26. The DL models performed well, with Bi-LSTM returning an F1 score of 98.00 with Word2Vec embeddings and fastText-based Bi-RNN performing at 97.00, showcasing the inference of contextual embeddings and soft similarity. The CNN model also gave a good result, with an F1 score of 96.00. This study presents hybrid ML and DL approaches to improve offensive language detection for low-resource languages. This research opens up new doors to providing safer online environments for the wide community of Roman Urdu users. Full article

23 pages, 929 KB

Open Access Article

Detection of Depression Severity in Social Media Text Using Transformer-Based Models

by Amna Qasim, Gull Mehak, Nisar Hussain, Alexander Gelbukh and Grigori Sidorov

Information 2025, 16(2), 114; https://doi.org/10.3390/info16020114 - 7 Feb 2025

Cited by 13 | Viewed by 5389

Abstract

Depression, a serious mental health disorder, requires accurate classification for effective intervention. Existing methods often fail to capture nuanced emotional and linguistic cues, leading to suboptimal classification of depression severity. This study bridges this gap by leveraging content-based approaches (N-grams) and context-based methods (Sentence Transformers), alongside advanced transformer-based models, to classify mild, moderate, and severe depression using text data sourced from Reddit. By demonstrating the effectiveness of modern NLP techniques in capturing subtle contextual variations, this research highlights the potential of transformer-based models to enhance depression severity detection. The proposed framework offers a scalable and adaptable solution for real-world mental health diagnostics and early intervention systems. Full article

19 pages, 389 KB

Open Access Article

Exploring the Behavior and Performance of Large Language Models: Can LLMs Infer Answers to Questions Involving Restricted Information?

by Ángel Cadena-Bautista, Francisco F. López-Ponce, Sergio Luis Ojeda-Trueba, Gerardo Sierra and Gemma Bel-Enguix

Information 2025, 16(2), 77; https://doi.org/10.3390/info16020077 - 22 Jan 2025

Cited by 1 | Viewed by 2156

Abstract

In this paper various LLMs are tested in a specific domain using a Retrieval-Augmented Generation (RAG) system. The study focuses on the performance and behavior of the models and was conducted in Spanish. A questionnaire based on The Bible, which consists of questions that vary in complexity of reasoning, was created in order to evaluate the reasoning capabilities of each model. The RAG system matches a question with the most similar passage from The Bible and feeds the pair to each LLM. The evaluation aims to determine whether each model can reason solely with the provided information or if it disregards the instructions given and makes use of its pretrained knowledge. Full article
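The retrieval step described in this abstract (matching a question with its most similar passage and passing the pair to a model) might be sketched as follows. This is an illustration under stated assumptions, not the authors' system: a bag-of-words cosine similarity stands in for whatever similarity measure the study used, the passages are invented placeholders rather than scriptural quotations, the prompt is in English although the study was conducted in Spanish, and `build_prompt` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str]) -> str:
    """Return the passage most similar to the question."""
    q = bag_of_words(question)
    return max(passages, key=lambda p: cosine_similarity(q, bag_of_words(p)))


def build_prompt(question: str, passage: str) -> str:
    """Hypothetical prompt restricting the model to the retrieved passage."""
    return (
        "Answer using only the passage below.\n"
        f"Passage: {passage}\n"
        f"Question: {question}"
    )


# Invented placeholder passages, for illustration only.
passages = [
    "The shepherd led the flock to the river.",
    "The king built a temple of stone.",
]
question = "Who built the temple?"
print(build_prompt(question, retrieve(question, passages)))
```

Whether a model then answers from the passage alone or falls back on pretrained knowledge is the behavior the study evaluates; the sketch covers only the retrieval and prompt assembly.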

17 pages, 1865 KB

Open Access Article

Improving Sentiment Analysis Performance on Imbalanced Moroccan Dialect Datasets Using Resample and Feature Extraction Techniques

by Zineb Nassr, Faouzia Benabbou, Nawal Sael and Touria Hamim

Information 2025, 16(1), 39; https://doi.org/10.3390/info16010039 - 10 Jan 2025

Cited by 1 | Viewed by 1897

Abstract

Sentiment analysis is a crucial component of text mining and natural language processing (NLP), involving the evaluation and classification of text data based on its emotional tone, typically categorized as positive, negative, or neutral. While significant research has focused on structured languages like English, unstructured languages, such as the Moroccan Dialect (MD), face substantial resource limitations and linguistic challenges, making effective sentiment analysis difficult. This study addresses this gap by exploring the integration of data-balancing techniques with machine learning (ML) methods, specifically investigating the impact of resampling techniques and feature extraction methods, including Term Frequency–Inverse Document Frequency (TF-IDF), Bag of Words (BOW), and N-grams. Through rigorous experimentation, we evaluate the effectiveness of these approaches in enhancing sentiment analysis accuracy for the Moroccan dialect. Our findings demonstrate that strategic resampling, combined with the TF-IDF method, significantly improves classification accuracy and robustness. We also explore the interaction between resampling strategies and feature extraction methods, revealing varying levels of effectiveness across different combinations. Notably, the Support Vector Machine (SVM) classifier, when paired with TF-IDF representation, achieves superior performance, with an accuracy of 90.24% and a precision of 90.34%. These results highlight the importance of tailored resampling techniques, appropriate feature extraction methods, and machine learning optimization in advancing sentiment analysis for under-resourced and dialect-heavy languages like the Moroccan dialect, providing a practical framework for future research and development in NLP for unstructured languages. Full article
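As an illustration of the kind of data balancing this abstract refers to, here is a minimal Python sketch of random oversampling. The abstract does not say which resampling techniques were used, so this choice is an assumption, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    if not by_class:
        return [], []
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy imbalanced data set (placeholder strings, not real comments).
toy_texts = ["a", "b", "c", "d"]
toy_labels = ["pos", "pos", "pos", "neg"]
balanced_texts, balanced_labels = random_oversample(toy_texts, toy_labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keep their original class distribution.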

14 pages, 405 KB

Open Access Article

Understanding Online Purchases with Explainable Machine Learning

by João A. Bastos and Maria Inês Bernardes

Information 2024, 15(10), 587; https://doi.org/10.3390/info15100587 - 26 Sep 2024

Cited by 2 | Viewed by 1717

Abstract

Customer profiling in e-commerce is a powerful tool that enables organizations to create personalized offers through direct marketing. One crucial objective of customer profiling is to predict whether a website visitor will make a purchase, thereby generating revenue. Machine learning models are the most accurate means to achieve this objective. However, the opaque nature of these models may deter companies from adopting them. Instead, they may prefer simpler models that allow for a clear understanding of the customer attributes that contribute to a purchase. In this study, we show that companies need not compromise on prediction accuracy to understand their online customers. By leveraging website data from a multinational communications service provider, we establish that the most pertinent customer attributes can be readily extracted from a black box model. Specifically, we show that the features that measure customer activity within the e-commerce platform are the most reliable predictors of conversions. Moreover, we uncover significant nonlinear relationships between customer features and the likelihood of conversion. Full article

24 pages, 667 KB

Open Access Article

Utilizing Multi-Class Classification Methods for Automated Sleep Disorder Prediction

by Elias Dritsas and Maria Trigka

Information 2024, 15(8), 426; https://doi.org/10.3390/info15080426 - 23 Jul 2024

Cited by 4 | Viewed by 3333

Abstract

Even from infancy, daily human life alternates between a period of wakefulness and a period of sleep at night over the 24-hour cycle. Sleep is a normal process necessary for human physical and mental health. A lack of sleep makes it difficult to control emotions and behaviour, reduces productivity at work, and can even increase stress or depression. In addition, poor sleep affects health; when sleep is insufficient, the chances of developing serious diseases greatly increase. Researchers in sleep medicine have identified an extensive list of sleep disorders, and thus leveraged Artificial Intelligence (AI) to automate their analysis and gain a deeper understanding of sleep patterns and related disorders. In this research, we seek a Machine Learning (ML) solution that will allow for efficient classification of unlabeled instances as being Sleep Apnea, Insomnia or Normal (subjects without a specific sleep disorder) by assessing the performance of two well-established strategies for multi-class classification tasks: One-Vs-All (OVA) and One-Vs-One (OVO). In the context of these strategies, two well-known binary classification models were considered, Logistic Regression (LR) and Support Vector Machines (SVMs). Both strategies’ validity was verified upon a dataset of diverse information related to the profiles (anthropometric data, sleep metrics, lifestyle and cardiovascular health factors) of potential patients or individuals not exhibiting any specific sleep disorder. Performance evaluation was carried out by comparing the weighted average results in all involved classes that represent these two specific sleep disorders and no-disorder occurrence; accuracy, kappa score, precision, recall, f-measure, and Area Under the ROC curve (AUC) were recorded and compared to identify an effective and robust model and strategy, both class-wise and on average. The experimental evaluation unveiled that after feature selection, 2-degree polynomial SVM under both strategies was the least complex and most efficient, recording an accuracy of 91.44%, a kappa score of 84.97%, precision, recall and f-measure equal to 0.914, and an AUC of 0.927. Full article
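The two decomposition strategies named in this abstract can be sketched in a few lines of Python. This is a generic illustration of One-Vs-All and One-Vs-One, not the authors' pipeline; the class names come from the abstract, while the function names and the toy label list are hypothetical.

```python
from itertools import combinations

CLASSES = ["Sleep Apnea", "Insomnia", "Normal"]


def one_vs_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def relabel_for_ova(labels, positive):
    """Binary targets for a single One-Vs-All task."""
    return [1 if y == positive else 0 for y in labels]


print(len(one_vs_all_tasks(CLASSES)), "OVA tasks")
print(len(one_vs_one_tasks(CLASSES)), "OVO tasks")
print(relabel_for_ova(["Insomnia", "Normal", "Sleep Apnea"], "Insomnia"))
```

With K classes, OVA trains K binary classifiers and OVO trains K(K-1)/2, so for the three classes considered here both strategies train three; the strategies differ in which samples each classifier sees and in how the binary decisions are combined.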
