Data-Related Challenges in Machine Learning: Theory and Application

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 30 November 2026 | Viewed by 4119

Special Issue Editors


Guest Editor
Department of Computer Engineering, Sunchon National University, Suncheon 57922, Republic of Korea
Interests: application of deep learning in the fields of virtual reality (VR) and computer graphics

Guest Editor
Department of Information and Communication Engineering, Wonkwang University, Iksan 54538, Republic of Korea
Interests: large language models; intrusion detection; Wi-Fi sensing; image processing

Guest Editor
School of Computer Science and Engineering, Kunsan National University, Gunsan 54150, Republic of Korea
Interests: artificial intelligence; multimedia; digital contents

Special Issue Information

Dear Colleagues,

Machine learning (ML) has transformed various industries, including healthcare, finance, cybersecurity, and autonomous systems. However, despite its success, ML still faces critical data-related challenges that impact its reliability, scalability, and fairness. This Special Issue, “Data-Related Challenges in Machine Learning: Theory and Application”, aims to explore fundamental issues, theoretical advancements, and innovative solutions addressing data quality, bias, privacy, efficiency, and interpretability in ML. 

This Special Issue will focus on, but will not be limited to, the following key challenges:

- Data Quality and Preprocessing: Handling noisy, missing, and inconsistent data remains a fundamental issue. Research on automated data cleaning, robust feature engineering, and data augmentation techniques is essential for improving ML model performance. 

- Bias and Fairness in ML: Many ML systems inherit biases from their training data, leading to unfair outcomes. Ensuring fairness through bias detection, fairness-aware learning, and ethical AI development is crucial. 

- Data Scarcity and Efficiency: In many real-world applications, collecting large, high-quality datasets is impractical. Methods such as few-shot learning, transfer learning, and self-supervised learning provide potential solutions. 

- Privacy and Security in ML: As ML models handle sensitive information, privacy-preserving techniques like federated learning, differential privacy, and encrypted machine learning are gaining attention. 

- Data Interpretability and Robustness: Understanding how ML models make decisions is essential for trust and adoption. Explainable AI (XAI) and adversarial robustness research contribute to making ML systems more transparent and reliable. 
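The preprocessing challenges listed above can be made concrete with a small example. The sketch below shows median imputation of missing numeric values, one of the simplest data-cleaning steps; it uses only the Python standard library, and production pipelines would typically use a dedicated tool such as scikit-learn's SimpleImputer. The function name and data are illustrative, not drawn from any paper in this issue.

```python
# Minimal sketch of one data-cleaning step: median imputation of
# missing numeric values (stdlib only, illustrative data).
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

print(impute_median([1.0, None, 3.0, None, 5.0]))  # → [1.0, 3.0, 3.0, 3.0, 5.0]
```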

Prof. Dr. Kwang-Seong Shin
Dr. Sungkwan Youm
Prof. Dr. Seong-Yoon Shin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data quality and preprocessing
  • bias and fairness in machine learning
  • few-shot learning and transfer learning
  • privacy-preserving machine learning
  • explainability and robustness in ML

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

42 pages, 2464 KB  
Article
Energy-Aware Multilingual Evaluation of Large Language Models
by I. de Zarzà, Mauro Liz, J. de Curtò and Carlos T. Calafate
Electronics 2026, 15(7), 1395; https://doi.org/10.3390/electronics15071395 - 27 Mar 2026
Viewed by 438
Abstract
The rapid deployment of Large Language Models (LLMs) in multilingual, production-scale systems has made inference-time energy consumption a critical yet systematically under-evaluated dimension of model quality. While accuracy-centric benchmarks dominate current evaluation practice, they fail to capture the energy cost of reasoning, particularly across languages and task complexities where consumption profiles diverge substantially. In this work, we present a comprehensive energy–performance evaluation of five instruction-tuned LLMs, spanning Transformer, Grouped-Query Attention, and State Space Model architectures, across thirteen typologically diverse languages and multiple task difficulty levels under controlled GPU-level energy measurement on NVIDIA H200 hardware. Our analysis encompasses 65 model–language configurations totaling over 5100 individual inference runs, supported by rigorous non-parametric statistical testing (Friedman tests, pairwise Wilcoxon signed-rank with Holm correction, and paired Cohen’s d effect sizes). We report four principal findings. First, energy consumption varies up to threefold across models under identical workloads (χ² = 49.42, p = 4.78 × 10⁻¹⁰, Friedman test), stratifying into three distinct energy regimes driven by architecture and generation dynamics rather than parameter count. Second, energy expenditure and reasoning performance are only weakly coupled, as confirmed by Spearman rank correlation analysis (r_s = 0.109, p = 0.386). Third, task category and difficulty level introduce substantial and model-dependent variation in both energy demand and performance, with cross-lingual performance variance amplifying at higher difficulty levels. Fourth, language choice acts as a measurable deployment parameter: Romance languages on average achieve lower energy consumption than English across multiple models, while model efficiency rankings shift across languages, yielding language-dependent Pareto-optimal frontiers.
We formalize these trade-offs through multi-objective Pareto analysis and introduce a composite AI Energy Score metric that captures reasoning quality per unit of energy. Of the 65 evaluated configurations, only four are Pareto-optimal: three Mistral-7B configurations at the low-energy extreme and one Phi-4-mini-instruct configuration at the high-performance end, while three of the five models are entirely dominated across all language configurations. These findings provide actionable guidelines for energy-aware model selection in multilingual deployments and support the integration of AI Energy Scores as a standard complementary criterion in LLM evaluation frameworks.
(This article belongs to the Special Issue Data-Related Challenges in Machine Learning: Theory and Application)
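The paper's multi-objective Pareto analysis can be illustrated with a short sketch. Assuming each model–language configuration is summarized by an (energy, accuracy) pair, the function below keeps only non-dominated configurations; the configuration names and numbers are hypothetical stand-ins, not the paper's measurements.

```python
# Sketch: identifying Pareto-optimal model-language configurations by
# (energy consumed, task accuracy). All values below are hypothetical.

def pareto_optimal(configs):
    """Return the configurations not dominated by any other.

    Config a dominates b when a uses no more energy, scores at least
    as high, and is strictly better on at least one of the two axes.
    """
    def dominates(a, b):
        return (a["energy_j"] <= b["energy_j"] and a["accuracy"] >= b["accuracy"]
                and (a["energy_j"] < b["energy_j"] or a["accuracy"] > b["accuracy"]))
    return [c for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]

configs = [
    {"name": "model-A/en", "energy_j": 120.0, "accuracy": 0.71},
    {"name": "model-A/es", "energy_j": 95.0,  "accuracy": 0.69},
    {"name": "model-B/en", "energy_j": 310.0, "accuracy": 0.83},
    {"name": "model-C/en", "energy_j": 300.0, "accuracy": 0.70},  # dominated by model-A/en
]

frontier = {c["name"] for c in pareto_optimal(configs)}
print(sorted(frontier))  # → ['model-A/en', 'model-A/es', 'model-B/en']
```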

27 pages, 1628 KB  
Article
Synthetic Data Augmentation for Imbalanced Tabular Data: A Comparative Study of Generation Methods
by Dong-Hyun Won, Kwang-Seong Shin and Sungkwan Youm
Electronics 2026, 15(4), 883; https://doi.org/10.3390/electronics15040883 - 20 Feb 2026
Viewed by 892
Abstract
Class imbalance in tabular datasets poses a challenge for machine learning classification tasks, often leading to biased models that underperform in predicting minority class instances. This study presents a comparative analysis of synthetic data generation methods for addressing class imbalance in tabular data. We evaluate four augmentation approaches—Synthetic Minority Over-sampling Technique (SMOTE), Gaussian Copula, Tabular Variational Autoencoder (TVAE), and Conditional Tabular Generative Adversarial Network (CTGAN)—using the University of California Irvine (UCI) Bank Marketing dataset, which exhibits a class imbalance ratio of approximately 7.88:1. Our experimental framework assesses each method across three dimensions: statistical fidelity to the original data distribution evaluated through four complementary metrics (marginal numerical similarity, categorical distribution similarity, correlation structure preservation, and Kolmogorov–Smirnov test), machine learning utility measured through classification performance, and minority class detection capability. Results indicate that all augmentation methods achieved statistically significant improvements over the baseline (p < 0.05). SMOTE achieved the highest recall (54.2%, a 117.6% relative improvement over the baseline) and F1-Score (0.437, +22.4% over the baseline) for minority class detection, while Gaussian Copula provided the highest composite fidelity score (0.930) with competitive predictive performance. A weak negative correlation (ρ = −0.30) between composite fidelity and classification performance was observed, suggesting that higher statistical fidelity does not necessarily translate to better downstream task performance.
Deep learning-based methods (TVAE, CTGAN) showed statistically significant improvements over the baseline (recall: +58% to +63%) but underperformed compared to simpler methods under default configurations, suggesting the need for larger training samples or more extensive hyperparameter tuning. These findings offer reference points for practitioners working with moderately imbalanced tabular data with limited minority class samples, supporting the selection of generation strategies based on specific requirements regarding data fidelity and classification objectives.
(This article belongs to the Special Issue Data-Related Challenges in Machine Learning: Theory and Application)
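The core of SMOTE, one of the four methods compared above, is linear interpolation between a minority sample and one of its nearest minority neighbours. The stdlib-only sketch below illustrates that step on toy data; real experiments would use a maintained implementation such as imbalanced-learn's SMOTE rather than this simplified version.

```python
# Sketch of SMOTE's core interpolation step (pure Python, illustrative).
# Each synthetic point lies on the segment between a minority sample
# and one of its k nearest minority neighbours.
import math
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic minority points by interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority)
print(len(new_points))  # → 4
```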

24 pages, 1740 KB  
Article
Unpacking Prediction: Contextualized and Interpretable Academic Risk Modeling with XAI for Small Cohorts
by Di Sun, Pengfei Xu, Gang Cheng and Ping Zhang
Electronics 2026, 15(3), 626; https://doi.org/10.3390/electronics15030626 - 2 Feb 2026
Viewed by 446
Abstract
Effective prediction of academic risk is vital in higher education to enable timely intervention and support student retention. While the introduction of Educational Data Mining (EDM) has enhanced prediction effectiveness, existing research often focuses only on single factors or large-scale samples and is notably deficient in providing transparent explanations for prediction results. To address these gaps, this study proposes an Explainable Artificial Intelligence (XAI) framework for predicting and interpreting academic risk within a high-dimensional, small-sample context. Based on a dataset from a specific student cohort, we employed an ML model combined with the SHapley Additive exPlanations (SHAP) method as the XAI framework. The findings provide two major contributions to the “Data-Related Challenges in ML” discussion. First, the framework successfully enhances data interpretability, revealing out-of-class peer support, a complex and often underestimated data dimension, as the feature most strongly associated with academic risk, surpassing traditional academic metrics. Specifically, learning support from peers is identified as the most critical feature in mitigating risk at both the group and individual levels. Second, methodologically, the framework validates a reliable approach for extracting meaningful, trustworthy, and interpretable knowledge from limited, cohort-specific data, offering a solution for applications requiring highly contextualized and precise interventions, where large, generalizable datasets are impractical. In conclusion, this study enhances the transparency and trustworthiness of ML in EDM, supporting responsible intervention strategies in academic risk prediction.
(This article belongs to the Special Issue Data-Related Challenges in Machine Learning: Theory and Application)
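The SHAP analysis described above belongs to a family of model-agnostic interpretability techniques. As a simpler relative that fits in a few lines, the sketch below computes permutation feature importance (the accuracy drop after shuffling one feature) on a toy model; the "peer support" feature and the model itself are illustrative stand-ins, not the study's data or method.

```python
# Sketch: permutation feature importance, a simple model-agnostic
# relative of SHAP (stdlib only; the study itself uses the shap library).
import random

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, seed=0):
    """Accuracy drop after shuffling one feature column."""
    rng = random.Random(seed)
    col = [x[feature] for x in X]
    rng.shuffle(col)
    X_perm = [list(x) for x in X]
    for row, v in zip(X_perm, col):
        row[feature] = v
    return accuracy(model, X, y) - accuracy(model, X_perm, y)

# Toy "model": predicts at-risk (1) when the peer-support score < 0.5;
# feature 0 is peer support, feature 1 is pure noise.
model = lambda x: 1 if x[0] < 0.5 else 0
X = [[0.1, 0.9], [0.2, 0.3], [0.7, 0.8], [0.9, 0.1], [0.4, 0.5], [0.6, 0.2]]
y = [model(x) for x in X]  # labels fully determined by feature 0

imp0 = permutation_importance(model, X, y, feature=0)
imp1 = permutation_importance(model, X, y, feature=1)
print(imp0, imp1)  # the noise feature always has zero importance
```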

19 pages, 1730 KB  
Article
Optimizing EV Battery Charging Using Fuzzy Logic in the Presence of Uncertainties and Unknown Parameters
by Minhaz Uddin Ahmed, Md Ohirul Qays, Stefan Lachowicz and Parvez Mahmud
Electronics 2026, 15(1), 177; https://doi.org/10.3390/electronics15010177 - 30 Dec 2025
Viewed by 670
Abstract
The growing use of electric vehicles (EVs) creates challenges in designing charging systems that are smart, dependable, and efficient, especially when environmental conditions change. This research proposes a fuzzy-logic-based PID control strategy integrated into a photovoltaic (PV) powered EV charging system to address uncertainties such as fluctuating solar irradiance, grid instability, and dynamic load demands. A MATLAB-R2023a/Simulink-R2023a model was developed to simulate the charging process using real-time adaptive control. The fuzzy logic controller (FLC) automatically updates the PID gains by evaluating the error and how quickly the error is changing. This adaptive approach enables efficient voltage regulation and improved system stability. Simulation results demonstrate that the proposed fuzzy–PID controller effectively maintains a steady charging voltage and minimizes power losses by modulating switching frequency. Additionally, the system shows resilience to rapid changes in irradiance and load, improving energy efficiency and extending battery life. This hybrid approach outperforms conventional PID and static control methods, offering enhanced adaptability for renewable-integrated EV infrastructure. The study contributes to sustainable mobility solutions by optimizing the interaction between solar energy and EV charging, paving the way for smarter, grid-friendly, and environmentally responsible charging networks. These findings support the potential for the real-world deployment of intelligent controllers in EV charging systems powered by renewable energy sources. This study is purely simulation-based; experimental validation via hardware-in-the-loop (HIL) or prototype development is reserved for future work.
(This article belongs to the Special Issue Data-Related Challenges in Machine Learning: Theory and Application)
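The fuzzy gain-adaptation idea above can be sketched compactly: fuzzy memberships over the error and its rate of change blend a low and a high proportional gain. The rule base, membership functions, and gain values below are illustrative assumptions, not the paper's Simulink design, and only the proportional term is shown.

```python
# Sketch: fuzzy gain scheduling for a PID loop (illustrative only; the
# paper's rule base, membership functions, and plant are Simulink-specific).

def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_kp(error, derror, kp_lo=1.0, kp_hi=4.0):
    """Blend a low and a high proportional gain from two fuzzy rules:
    small |e| and small |de| -> low gain (avoid overshoot near setpoint)
    large |e| or  large |de| -> high gain (react to big disturbances)."""
    e, de = abs(error), abs(derror)
    small_e = tri(e, -1.0, 0.0, 1.0)    # membership "error is small"
    small_de = tri(de, -1.0, 0.0, 1.0)  # membership "d(error) is small"
    w_low = min(small_e, small_de)                # rule 1 firing strength
    w_high = max(1.0 - small_e, 1.0 - small_de)   # rule 2 firing strength
    total = w_low + w_high
    return (w_low * kp_lo + w_high * kp_hi) / total if total else kp_lo

print(fuzzy_kp(0.0, 0.0))  # → 1.0 (near setpoint: low gain)
print(fuzzy_kp(2.0, 0.5))  # → 4.0 (large error: high gain)
```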

24 pages, 2843 KB  
Article
Classification of Maize Images Enhanced with Slot Attention Mechanism in Deep Learning Architectures
by Zafer Cömert, Alper Talha Karadeniz, Erdal Basaran and Yuksel Celik
Electronics 2025, 14(13), 2635; https://doi.org/10.3390/electronics14132635 - 30 Jun 2025
Cited by 2 | Viewed by 1094
Abstract
Maize is a vital global crop, serving as a fundamental component of global food security. To support sustainable maize production, the accurate classification of maize seeds—particularly distinguishing haploid from diploid types—is essential for enhancing breeding efficiency. Conventional methods relying on manual inspection or simple machine learning are prone to errors and unsuitable for large-scale data. To overcome these limitations, we propose Slot-Maize, a novel deep learning architecture that integrates Convolutional Neural Networks (CNN), Slot Attention, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM) layers. The Slot-Maize model was evaluated using two datasets: the Maize Seed Dataset and the Maize Variety Dataset. The Slot Attention module improves feature representation by focusing on object-centric regions within seed images. The GRU captures short-term sequential patterns in extracted features, while the LSTM models long-range dependencies, enhancing temporal understanding. Furthermore, Grad-CAM was utilized as an explainable AI technique to enhance the interpretability of the model’s decisions. The model achieved an accuracy of 96.97% on the Maize Seed Dataset and 92.30% on the Maize Variety Dataset, outperforming existing methods in both cases. These results demonstrate the model’s robustness, generalizability, and potential to accelerate automated maize breeding workflows. In conclusion, the Slot-Maize model provides a robust and interpretable solution for automated maize seed classification, representing a significant advancement in agricultural technology. By combining accuracy with explainability, Slot-Maize offers a reliable tool for precision agriculture.
(This article belongs to the Special Issue Data-Related Challenges in Machine Learning: Theory and Application)
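Slot attention's distinctive step, which the Slot-Maize model builds on, is that attention weights are normalized over the slots rather than over the inputs, so slots compete to explain each input feature. The stdlib-only toy below shows one such attention round; it omits the learned projections, GRU update, and iteration used in the actual module, and all vectors are illustrative.

```python
# Toy sketch of one slot-attention round: softmax over SLOTS (slots
# compete for each input), then slots update as weighted input means.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def slot_attention_step(slots, inputs):
    """Each input distributes its weight over slots; each slot then
    becomes the weighted mean of the inputs that claimed it."""
    # attn[i][k]: how strongly input i is claimed by slot k
    attn = [softmax([sum(a * b for a, b in zip(x, s)) for s in slots])
            for x in inputs]
    new_slots = []
    for k in range(len(slots)):
        w = [attn[i][k] for i in range(len(inputs))]
        total = sum(w) or 1.0
        new_slots.append([sum(wi * x[d] for wi, x in zip(w, inputs)) / total
                          for d in range(len(inputs[0]))])
    return new_slots

slots = [[1.0, 0.0], [0.0, 1.0]]                    # two initial slots
inputs = [[2.0, 0.1], [0.1, 2.0], [1.9, 0.0]]       # three input features
updated = slot_attention_step(slots, inputs)
print(updated)  # slot 0 binds to x-heavy inputs, slot 1 to the y-heavy one
```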
