Application of Machine Learning in Data Science and Computational Intelligence

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 May 2025 | Viewed by 7736

Special Issue Editors


Guest Editor
Industrial Systems Institute (ISI), Athena Research and Innovation Center, 26504 Patras, Greece
Interests: 5G; 6G; artificial intelligence; deep learning; image processing; IoT; machine learning; MIMO; mmWave; signal processing; wireless communications

Special Issue Information

Dear Colleagues,

Data science is a field of study that focuses on extracting valuable information from noisy data and incorporates various disciplines, such as data engineering, data preprocessing, visualization, predictive analytics, data mining, machine learning and statistics. In recent years, there has been rapidly growing interest in the mathematical and theoretical aspects of data science. This interest manifests in deterministic and non-deterministic models (i.e., probabilistic models and a family of probabilistic models known as statistical models) that provide guaranteed performance, robustness, and reusable, interpretable results. The digital transformation of information systems has made it feasible to use data science techniques such as artificial intelligence (AI) and machine learning (ML) effectively in various applications. In addition, combining sensor technology with AI/ML can lead to more objective and improved performance, lower costs and more effective overall system management. The aim of this Special Issue is to present high-quality, innovative ideas and research solutions (addressing both theoretical and practical challenges) that facilitate data analysis and modelling with the aid of artificial intelligence and machine learning across various domains and applications.

Dr. Elias Dritsas
Dr. Maria Trigka
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data science
  • data mining
  • artificial intelligence
  • machine learning
  • statistics
  • predictive modelling
  • monitoring
  • data analytics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research

24 pages, 2522 KiB  
Article
Digital Requirements for Systems Analysts in Europe: A Sectoral Analysis of Online Job Advertisements and ESCO Skills
by Konstantinos Charmanas, Konstantinos Georgiou, Nikolaos Mittas and Lefteris Angelis
Information 2025, 16(5), 363; https://doi.org/10.3390/info16050363 - 29 Apr 2025
Abstract
Systems analysts can be considered a valuable part of organizations, as their responsibilities and contributions concern the improvement of information systems, which are an indispensable part of organizations. By exploring the current labor market for systems analysts, researchers can therefore gain valuable insight into important societal needs. In this context, the objectives of this study are to investigate the sets of digital skills from the European Skills, Competences, Qualifications, and Occupations (ESCO) taxonomy required of systems analysts in Europe and to examine the key characteristics of the relevant sectors. For this purpose, a tool combining topic extraction, machine learning, and statistical analysis is used. The results show that systems analysts may indeed possess different types of digital skills, with 12 distinct topics discovered, and that, among 17 sectors, the professional, scientific, and technical activities sector demands the most distinctive sets of digital skills. Ultimately, the findings show that the sectors have divergent requirements and should be approached accordingly. Overall, this study offers guidelines for identifying both the general duties of systems analysts and the specific needs of each sector. The presented tool and methods may also provide ideas for exploring other domains associated with content information and distinct groups.

23 pages, 522 KiB  
Article
ORUD-Detect: A Comprehensive Approach to Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning–Deep Learning Models with Embedding Techniques
by Nisar Hussain, Amna Qasim, Gull Mehak, Olga Kolesnikova, Alexander Gelbukh and Grigori Sidorov
Information 2025, 16(2), 139; https://doi.org/10.3390/info16020139 - 13 Feb 2025
Viewed by 621
Abstract
With the rapid expansion of social media, detecting offensive language has become critically important for healthy online interactions. This poses a considerable challenge for low-resource languages such as Roman Urdu, which is widely used on platforms like Facebook. In this paper, we perform a comprehensive study of offensive language detection models on Roman Urdu datasets using both Machine Learning (ML) and Deep Learning (DL) approaches. We present a dataset of 89,968 Facebook comments and apply extensive preprocessing and feature representation techniques, such as TF-IDF features and Word2Vec and fastText embeddings, to address the linguistic idiosyncrasies and code-mixed nature of Roman Urdu. Among the ML models, a linear-kernel Support Vector Machine (SVM) achieved the best performance, with an F1 score of 94.76, followed by SVM models with radial and polynomial kernels. Even BoW unigram features with naive Bayes produced competitive results, with an F1 score of 94.26. The DL models performed well: Bi-LSTM achieved an F1 score of 98.00 with Word2Vec embeddings, and a fastText-based Bi-RNN achieved 97.00, showcasing the benefit of contextual embeddings and soft similarity. The CNN model also gave a good result, with an F1 score of 96.00. This study presents hybrid ML and DL approaches to improve offensive language detection for low-resource languages. This research opens new avenues for providing safer online environments for the many Roman Urdu users.
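As a hedged illustration of the TF-IDF representation mentioned above (not the authors' pipeline), the sketch below computes L2-normalized TF-IDF vectors in plain Python. It assumes whitespace tokenization and the smoothed IDF formula used by default in scikit-learn's TfidfVectorizer, idf(t) = ln((1 + n) / (1 + df(t))) + 1; the function name `tfidf` and the toy comments are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (sorted vocabulary, one L2-normalized TF-IDF dict per document).

    Hypothetical helper: whitespace tokenization, raw term counts as TF,
    and smoothed IDF in the same form as scikit-learn's default.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * idf[t] for t, count in tf.items()}
        # Scale each document vector to unit Euclidean length.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return sorted(idf), vectors


# Toy, hypothetical comments.
vocab, vectors = tfidf(["good post", "bad post", "good good day"])
```

Terms that occur in fewer comments receive larger weights; in practice, vectors like these would be passed to a classifier such as the linear SVM reported in the abstract.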

23 pages, 929 KiB  
Article
Detection of Depression Severity in Social Media Text Using Transformer-Based Models
by Amna Qasim, Gull Mehak, Nisar Hussain, Alexander Gelbukh and Grigori Sidorov
Information 2025, 16(2), 114; https://doi.org/10.3390/info16020114 - 7 Feb 2025
Cited by 1 | Viewed by 1248
Abstract
Depression, a serious mental health disorder, requires accurate classification for effective intervention. Existing methods often fail to capture nuanced emotional and linguistic cues, leading to suboptimal classification of depression severity. This study addresses this gap by leveraging content-based approaches (N-grams) and context-based methods (Sentence Transformers), alongside advanced transformer-based models, to classify mild, moderate, and severe depression using text data sourced from Reddit. By demonstrating the effectiveness of modern NLP techniques in capturing subtle contextual variations, this research highlights the potential of transformer-based models to enhance depression severity detection. The proposed framework offers a scalable and adaptable solution for real-world mental health diagnostics and early intervention systems.
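Content-based N-gram features, as named in the abstract, are contiguous runs of n words. A minimal sketch, assuming lowercase word tokenization with a simple regular expression; the helper `word_ngrams` and its parameters are hypothetical, not the authors' code.

```python
import re


def word_ngrams(text, n_min=1, n_max=2):
    """Return all contiguous word n-grams with n_min <= n <= n_max.

    Hypothetical helper: tokens are lowercase runs of word characters.
    """
    tokens = re.findall(r"\w+", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the text.
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


grams = word_ngrams("Feeling tired all day")
```

Counts of such unigrams and bigrams can then serve as sparse features for a classifier, alongside the context-based sentence embeddings the study also uses.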

19 pages, 389 KiB  
Article
Exploring the Behavior and Performance of Large Language Models: Can LLMs Infer Answers to Questions Involving Restricted Information?
by Ángel Cadena-Bautista, Francisco F. López-Ponce, Sergio Luis Ojeda-Trueba, Gerardo Sierra and Gemma Bel-Enguix
Information 2025, 16(2), 77; https://doi.org/10.3390/info16020077 - 22 Jan 2025
Viewed by 869
Abstract
In this paper, various LLMs are tested in a specific domain using a Retrieval-Augmented Generation (RAG) system. The study focuses on the performance and behavior of the models and was conducted in Spanish. To evaluate the reasoning capabilities of each model, a questionnaire based on The Bible was created, consisting of questions that vary in reasoning complexity. The RAG system matches each question with the most similar passage from The Bible and feeds the pair to each LLM. The evaluation aims to determine whether each model can reason solely with the provided information or whether it disregards the given instructions and draws on its pretrained knowledge.
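The retrieval step described above (matching a question with its most similar passage) can be sketched as follows. This is a hypothetical illustration, not the authors' system: it assumes bag-of-words cosine similarity in place of whatever embedding model the study used, and the instruction text in `build_prompt` and the toy passages are invented for the example.

```python
import math
import re
from collections import Counter


def bow(text):
    # Lowercase word counts; a stand-in for the study's actual representation.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, passages):
    # Return the single passage most similar to the question.
    q = bow(question)
    return max(passages, key=lambda p: cosine(q, bow(p)))


def build_prompt(question, passage):
    # Hypothetical instruction asking the model to rely only on the passage.
    return ("Answer using only the passage below. If the passage does not "
            "contain the answer, say so.\n\n"
            f"Passage: {passage}\n\nQuestion: {question}")


passages = [
    "The river flows north to the sea.",
    "The king built a temple of stone.",
    "Bread was shared among the people.",
]
question = "Who built the temple?"
prompt = build_prompt(question, retrieve(question, passages))
```

Because `\w` is Unicode-aware in Python 3, the same tokenizer also handles accented Spanish text.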

17 pages, 1865 KiB  
Article
Improving Sentiment Analysis Performance on Imbalanced Moroccan Dialect Datasets Using Resample and Feature Extraction Techniques
by Zineb Nassr, Faouzia Benabbou, Nawal Sael and Touria Hamim
Information 2025, 16(1), 39; https://doi.org/10.3390/info16010039 - 10 Jan 2025
Viewed by 819
Abstract
Sentiment analysis is a crucial component of text mining and natural language processing (NLP), involving the evaluation and classification of text data based on its emotional tone, typically categorized as positive, negative, or neutral. While significant research has focused on structured languages like English, unstructured languages, such as the Moroccan Dialect (MD), face substantial resource limitations and linguistic challenges, making effective sentiment analysis difficult. This study addresses this gap by exploring the integration of data-balancing techniques with machine learning (ML) methods, specifically investigating the impact of resampling techniques and feature extraction methods, including Term Frequency–Inverse Document Frequency (TF-IDF), Bag of Words (BOW), and N-grams. Through rigorous experimentation, we evaluate the effectiveness of these approaches in enhancing sentiment analysis accuracy for the Moroccan dialect. Our findings demonstrate that strategic resampling, combined with the TF-IDF method, significantly improves classification accuracy and robustness. We also explore the interaction between resampling strategies and feature extraction methods, revealing varying levels of effectiveness across different combinations. Notably, the Support Vector Machine (SVM) classifier, when paired with TF-IDF representation, achieves superior performance, with an accuracy of 90.24% and a precision of 90.34%. These results highlight the importance of tailored resampling techniques, appropriate feature extraction methods, and machine learning optimization in advancing sentiment analysis for under-resourced and dialect-heavy languages like the Moroccan dialect, providing a practical framework for future research and development in NLP for unstructured languages.
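The abstract does not name the specific resampling methods. As one common example, random oversampling duplicates minority-class examples until the classes are balanced. A minimal sketch under that assumption; the function `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every class
    has as many examples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement from the existing examples of this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


texts = [f"t{i}" for i in range(8)]
labels = ["pos"] * 5 + ["neg"] * 2 + ["neu"]
new_texts, new_labels = random_oversample(texts, labels)
```

Resampling is normally applied to the training split only, so that duplicated examples do not leak into the evaluation data.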

14 pages, 405 KiB  
Article
Understanding Online Purchases with Explainable Machine Learning
by João A. Bastos and Maria Inês Bernardes
Information 2024, 15(10), 587; https://doi.org/10.3390/info15100587 - 26 Sep 2024
Viewed by 1022
Abstract
Customer profiling in e-commerce is a powerful tool that enables organizations to create personalized offers through direct marketing. One crucial objective of customer profiling is to predict whether a website visitor will make a purchase, thereby generating revenue. Machine learning models are among the most accurate means of achieving this objective. However, the opaque nature of these models may deter companies from adopting them. Instead, they may prefer simpler models that allow for a clear understanding of the customer attributes that contribute to a purchase. In this study, we show that companies need not compromise on prediction accuracy to understand their online customers. By leveraging website data from a multinational communications service provider, we establish that the most pertinent customer attributes can be readily extracted from a black box model. Specifically, we show that the features that measure customer activity within the e-commerce platform are the most reliable predictors of conversions. Moreover, we uncover significant nonlinear relationships between customer features and the likelihood of conversion.
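The abstract does not state which explanation method was used. As one model-agnostic example (a swapped-in technique, not necessarily the authors'), permutation importance measures how much a black-box model's accuracy drops when a single feature column is shuffled. The sketch assumes the model is any callable mapping a feature row (a list) to a predicted label; all names and the toy data are hypothetical.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace feature j with its shuffled values, keeping other columns.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical black box that only looks at feature 0 (e.g., an activity count).
X = [[i, i % 3] for i in range(12)]
y = [int(i > 5) for i in range(12)]
black_box = lambda row: int(row[0] > 5)
importances = permutation_importance(black_box, X, y)
```

A feature the model ignores scores exactly zero, while a feature its predictions depend on scores above zero; this only needs predictions, so it works with any black-box classifier.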

24 pages, 667 KiB  
Article
Utilizing Multi-Class Classification Methods for Automated Sleep Disorder Prediction
by Elias Dritsas and Maria Trigka
Information 2024, 15(8), 426; https://doi.org/10.3390/info15080426 - 23 Jul 2024
Cited by 2 | Viewed by 2185
Abstract
From infancy onward, human daily life alternates between periods of wakefulness and periods of sleep at night over the 24-hour cycle. Sleep is a normal process necessary for human physical and mental health. A lack of sleep makes it difficult to control emotions and behaviour, reduces productivity at work, and can even increase stress or depression. In addition, poor sleep affects health; when sleep is insufficient, the chances of developing serious diseases greatly increase. Researchers in sleep medicine have identified an extensive list of sleep disorders and have leveraged Artificial Intelligence (AI) to automate their analysis and gain a deeper understanding of sleep patterns and related disorders. In this research, we seek a Machine Learning (ML) solution that allows for the efficient classification of unlabeled instances as Sleep Apnea, Insomnia or Normal (subjects without a specific sleep disorder) by assessing the performance of two well-established strategies for multi-class classification tasks: One-Vs-All (OVA) and One-Vs-One (OVO). Within these strategies, two well-known binary classification models were used: Logistic Regression (LR) and Support Vector Machines (SVMs). The validity of both strategies was verified on a dataset containing diverse information related to the profiles (anthropometric data, sleep metrics, lifestyle and cardiovascular health factors) of potential patients or individuals not exhibiting any specific sleep disorder. Performance was evaluated by comparing the weighted average results across all classes, which represent the two sleep disorders and the no-disorder case; accuracy, kappa score, precision, recall, f-measure, and Area Under the ROC curve (AUC) were recorded and compared to identify an effective and robust model and strategy, both class-wise and on average.
The experimental evaluation revealed that, after feature selection, the second-degree polynomial SVM under both strategies was the least complex and most efficient model, recording an accuracy of 91.44%, a kappa score of 84.97%, precision, recall and f-measure of 0.914, and an AUC of 0.927.
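The two decomposition strategies compared in this study can be sketched as follows. For K classes, OVA trains K binary models (each class against the rest) and OVO trains K(K-1)/2 models (one per pair of classes, combined by majority vote), so with the three classes here both strategies happen to train three models. To keep the sketch self-contained, a one-dimensional nearest-centroid scorer stands in for LR or SVM; it, the function names, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid_learner(X, y):
    """Hypothetical stand-in for LR/SVM: a 1-D nearest-centroid binary model.
    The returned scorer is > 0 when x is closer to the positive (label 1) mean."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: abs(x - mn) - abs(x - mp)


def fit_ova(X, y, learner):
    # One binary model per class: that class (1) vs. all other classes (0).
    return {c: learner(X, [int(t == c) for t in y]) for c in sorted(set(y))}


def predict_ova(models, x):
    # Choose the class whose one-vs-rest model gives the highest score.
    return max(models, key=lambda c: models[c](x))


def fit_ovo(X, y, learner):
    # One binary model per pair (a, b), trained only on examples of a and b.
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pairs = [(x, int(t == a)) for x, t in zip(X, y) if t in (a, b)]
        models[(a, b)] = learner([x for x, _ in pairs], [t for _, t in pairs])
    return models


def predict_ovo(models, x):
    # Each pairwise model casts one vote; the most-voted class wins.
    # Ties are broken arbitrarily in this sketch.
    votes = Counter(a if m(x) > 0 else b for (a, b), m in models.items())
    return votes.most_common(1)[0][0]


# Hypothetical single-feature toy data for the three classes.
X = [1, 2, 3, 10, 11, 12, 20, 21, 22]
y = ["Normal"] * 3 + ["Insomnia"] * 3 + ["Sleep Apnea"] * 3
ova = fit_ova(X, y, centroid_learner)
ovo = fit_ovo(X, y, centroid_learner)
```

With more classes the trade-off changes: OVO trains more models, but each on a smaller, two-class subset of the data.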
