MDPI - Publisher of Open Access Journals

28 pages, 2379 KiB

Open Access Article

FADEL: Ensemble Learning Enhanced by Feature Augmentation and Discretization

by Chuan-Sheng Hung, Chun-Hung Richard Lin, Shi-Huang Chen, You-Cheng Zheng, Cheng-Han Yu, Cheng-Wei Hung, Ting-Hsin Huang and Jui-Hsiu Tsai

Bioengineering 2025, 12(8), 827; https://doi.org/10.3390/bioengineering12080827 - 30 Jul 2025

Abstract

In recent years, data augmentation techniques have become the predominant approach for addressing highly imbalanced classification problems in machine learning. Algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) and the Conditional Tabular Generative Adversarial Network (CTGAN) have proven effective in synthesizing minority class samples. However, these methods often introduce distributional bias and noise, potentially leading to model overfitting, reduced predictive performance, increased computational costs, and elevated cybersecurity risks. To overcome these limitations, we propose a novel architecture, FADEL, which integrates feature-type awareness with a supervised discretization strategy. FADEL introduces a unique feature augmentation ensemble framework that preserves the original data distribution by concurrently processing continuous and discretized features. It dynamically routes these feature sets to their most compatible base models, thereby improving minority class recognition without the need for data-level balancing or augmentation techniques. Experimental results demonstrate that FADEL, solely leveraging feature augmentation without any data augmentation, achieves a recall of 90.8% and a G-mean of 94.5% on the internal test set from Kaohsiung Chang Gung Memorial Hospital in Taiwan. On the external validation set from Kaohsiung Medical University Chung-Ho Memorial Hospital, it maintains a recall of 91.9% and a G-mean of 86.7%. FADEL outperforms conventional ensemble methods trained on CTGAN-balanced datasets, confirming the superior stability, computational efficiency, and cross-institutional generalizability of the architecture. Altogether, FADEL uses feature augmentation to offer a robust and practical solution to extreme class imbalance, outperforming mainstream data augmentation-based approaches.
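The abstract does not specify FADEL's discretization rule. The sketch below illustrates only the general idea: keep each continuous feature and append a label-aware (supervised) discretized copy next to it. It picks a single entropy-minimizing cut point per feature. The single-cut rule and the helper names are assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return the binary threshold with the lowest weighted class entropy, or None."""
    pairs = sorted(zip(values, labels))
    best_t, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_t, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
    return best_t


def augment_with_bin(rows, labels, col):
    """Hypothetical helper: keep all original features, append a 0/1 copy of column `col`."""
    t = best_cut([r[col] for r in rows], labels)
    augmented = [r + [int(t is not None and r[col] > t)] for r in rows]
    return augmented, t
```

For example, `augment_with_bin([[1.0], [2.0], [3.0], [10.0], [11.0]], [0, 0, 0, 1, 1], 0)` places the cut at 6.5 and returns each row with its original value plus a 0/1 bin. Multi-cut supervised methods such as Fayyad and Irani's MDLP are a common alternative. The routing of feature sets to base models described above is not shown.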

(This article belongs to the Special Issue Artificial Intelligence for Better Healthcare and Precision Medicine, 2nd Edition)


25 pages, 1319 KiB

Open Access Article

Beyond Performance: Explaining and Ensuring Fairness in Student Academic Performance Prediction with Machine Learning

by Kadir Kesgin, Salih Kiraz, Selahattin Kosunalp and Bozhana Stoycheva

Appl. Sci. 2025, 15(15), 8409; https://doi.org/10.3390/app15158409 - 29 Jul 2025

Abstract

This study addresses fairness in machine learning for student academic performance prediction using the UCI Student Performance dataset. We compare logistic regression, Random Forest, and XGBoost, using the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance and 5-fold cross-validation for robust model training. A comprehensive fairness analysis considers sensitive attributes such as gender, school type, and socioeconomic factors, including parental education (Medu and Fedu), parental cohabitation status (Pstatus), and family size (famsize). Using the AIF360 library, we compute the Demographic Parity Difference (DP) and the Equalized Odds Difference (EO) to assess model bias across subgroups. Our results show that XGBoost achieves high predictive performance (accuracy: 0.789; F1 score: 0.803) while maintaining low bias for socioeconomic attributes, offering a balance between fairness and performance. A sensitivity analysis of bias mitigation strategies further strengthens the study, which advances equitable artificial intelligence in education by incorporating socially relevant factors.
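Both fairness metrics can be computed directly from a model's predictions. The minimal sketch below assumes a binary sensitive attribute coded 0/1 and uses the common definitions:

- DP is |P(ŷ=1 | A=0) − P(ŷ=1 | A=1)|.
- EO is the larger of the true-positive-rate gap and the false-positive-rate gap between the two groups.

The sketch does not call AIF360, and that library's own metric names and sign conventions may differ from these definitions.

```python
def positive_rate(y_pred, mask):
    """Share of predicted positives among the rows selected by `mask`."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def demographic_parity_difference(y_pred, group):
    """|P(y_hat=1 | A=0) - P(y_hat=1 | A=1)| for a binary sensitive attribute."""
    return abs(positive_rate(y_pred, [g == 0 for g in group])
               - positive_rate(y_pred, [g == 1 for g in group]))


def equalized_odds_difference(y_true, y_pred, group):
    """Max of the TPR gap (true label 1) and the FPR gap (true label 0) between groups."""
    gaps = []
    for label in (1, 0):
        rates = [positive_rate(y_pred, [g == a and t == label for g, t in zip(group, y_true)])
                 for a in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)
```

A DP or EO value of 0 means the two groups are treated identically on that criterion. Larger values indicate larger disparities.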

(This article belongs to the Special Issue Challenges and Trends in Technology-Enhanced Learning)


22 pages, 1896 KiB

Open Access Article

Physics-Constrained Diffusion-Based Scenario Expansion Method for Power System Transient Stability Assessment

by Wei Dong, Yue Yu, Lebing Zhao, Wen Hua, Ying Yang, Bowen Wang, Jiawen Cao and Changgang Li

Processes 2025, 13(8), 2344; https://doi.org/10.3390/pr13082344 - 23 Jul 2025

Abstract

In transient stability assessment (TSA) of power systems, the extreme scarcity of unstable scenario samples often leads assessment models to misjudge fault risks, an issue that is particularly pronounced in new-type power systems with a high penetration of renewable energy sources. To address this, this paper proposes a physics-constrained, diffusion-based scenario expansion method. It constructs a hierarchical conditional diffusion framework embedded with transient differential equations, combines it with a spatiotemporal decoupling analysis mechanism to capture grid topological and temporal features, and introduces a transient energy function as a stability boundary constraint to ensure the physical rationality of the generated scenarios. Verification on a modified IEEE 39-bus system with a high proportion of new energy sources shows that the proposed method achieves an unstable scenario recognition rate of 98.77%, which is 3.92 and 2.65 percentage points higher than those of the Synthetic Minority Oversampling Technique (SMOTE, 94.85%) and Generative Adversarial Networks (GANs, 96.12%), respectively. The geometric mean (G-mean) reaches 99.33%, significantly enhancing the accuracy and reliability of TSA and providing sufficient technical support for identifying the dynamic security boundaries of power systems.
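The abstract reports gains in percentage points and a G-mean. The sketch below assumes the usual binary definition of G-mean, √(sensitivity × specificity); the paper's exact definition is not stated. It also re-checks the quoted percentage-point gaps.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall) and specificity for a binary task."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def pp_gain(new_pct, old_pct):
    """Difference between two percentages, in percentage points."""
    return round(new_pct - old_pct, 2)


# Recognition rates quoted in the abstract
print(pp_gain(98.77, 94.85))  # 3.92 (vs. SMOTE)
print(pp_gain(98.77, 96.12))  # 2.65 (vs. GANs)
```

Because G-mean multiplies the per-class rates, a model that ignores the unstable class scores near 0 even when its overall accuracy is high.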

(This article belongs to the Section Energy Systems)


43 pages, 2108 KiB

Open Access Feature Paper Article

FIGS: A Realistic Intrusion-Detection Framework for Highly Imbalanced IoT Environments

by Zeynab Anbiaee, Sajjad Dadkhah and Ali A. Ghorbani

Electronics 2025, 14(14), 2917; https://doi.org/10.3390/electronics14142917 - 21 Jul 2025

Abstract

The rapid growth of Internet of Things (IoT) environments has increased security challenges due to heightened exposure to cyber threats and attacks. A key problem is class imbalance in attack traffic, where critical yet underrepresented attacks are often overlooked by intrusion-detection systems (IDS), compromising their reliability. To address this challenge, we propose Feature-Importance GAN SMOTE (FIGS), an innovative, realistic intrusion-detection framework designed for IoT environments. Unlike other works that rely only on traditional oversampling methods, FIGS integrates sensitivity-based feature-importance analysis, Generative Adversarial Network (GAN)-based augmentation, a novel imbalance ratio (GIR), and the Synthetic Minority Oversampling Technique (SMOTE) to generate high-quality synthetic data for minority classes. FIGS enhances minority class detection by focusing on the most important features identified by the sensitivity analysis, while minimizing computational overhead and reducing noise during data generation. Evaluations on the CICIoMT2024 and CICIDS2017 datasets demonstrate that FIGS improves detection accuracy and significantly lowers the false negative rate. FIGS achieved a 17% improvement over the baseline model on the CICIoMT2024 dataset while maintaining performance for the majority groups. The results show that FIGS is a highly effective solution for real-world IoT networks, with high detection accuracy across all classes and without unnecessary computational overhead.
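The abstract does not define the sensitivity analysis FIGS uses. Permutation importance is one widely used sensitivity-based measure, and it is sketched below as a stand-in: shuffle one feature column at a time and record how much accuracy drops. The `model` callable and the helper names are hypothetical. scikit-learn offers a maintained version as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where model(row) equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Features whose shuffling barely changes accuracy score near 0. In a FIGS-like pipeline, such features would be candidates to exclude from synthetic-sample generation.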

(This article belongs to the Special Issue Network Security and Cryptography Applications)


18 pages, 10564 KiB

Open Access Article

Handling Data Structure Issues with Machine Learning in a Connected and Autonomous Vehicle Communication System

by Pranav K. Jha and Manoj K. Jha

Vehicles 2025, 7(3), 73; https://doi.org/10.3390/vehicles7030073 - 11 Jul 2025

Abstract

Connected and Autonomous Vehicles (CAVs) remain vulnerable to cyberattacks due to inherent security gaps in the Controller Area Network (CAN) protocol. We present a structured Python 3.11.13 framework that repairs structural inconsistencies in a public CAV dataset to improve the reliability of machine learning-based intrusion detection. We assess the effect of training data volume and compare Random Forest (RF) and Extreme Gradient Boosting (XGBoost) classifiers across four attack types: DoS, Fuzzy, RPM spoofing, and GEAR spoofing. XGBoost outperforms RF, achieving 99.2% accuracy on the DoS dataset and 100% accuracy on the Fuzzy, RPM, and GEAR datasets. The Synthetic Minority Oversampling Technique (SMOTE) further enhances minority-class detection without compromising overall performance. This methodology provides a generalizable framework for anomaly detection in other connected systems, including smart grids, autonomous defense platforms, and industrial control networks.


14 pages, 574 KiB

Open Access Article

Ki-67 as a Predictor of Metastasis in Adrenocortical Carcinoma: Artificial Intelligence Insights from Retrospective Imaging Data

by Andrew J. Goulian and David S. Yee

J. Clin. Med. 2025, 14(14), 4829; https://doi.org/10.3390/jcm14144829 - 8 Jul 2025

Abstract

Background/Objectives: Adrenocortical carcinoma (ACC) is a rare, aggressive malignancy with a poor prognosis, particularly in metastatic cases. The Ki-67 proliferation index is a recognized marker of tumor aggressiveness, yet its role in guiding diagnostic imaging and surgical decision-making remains underexplored. This study evaluates Ki-67’s predictive value for metastasis at diagnosis, leveraging artificial intelligence (AI) to inform personalized, minimally invasive strategies for ACC management. Methods: We retrospectively analyzed 53 patients with histologically confirmed ACC from the Adrenal-ACC-Ki67-Seg dataset in The Cancer Imaging Archive. All patients had Ki-67 indices from surgical specimens and preoperative contrast-enhanced CT scans. Descriptive statistics, t-tests, ANOVA, and multivariable logistic regression were used to evaluate associations between Ki-67, tumor size, age, and metastasis. Random Forest classifiers, with and without the Synthetic Minority Oversampling Technique (SMOTE), were developed to predict metastasis, and a Ki-67-only model served as a baseline comparator. Model performance was assessed using the area under the curve (AUC) and DeLong’s test. Results: Patients with metastatic disease had significantly higher Ki-67 indices (mean 39.4% vs. 21.6%, p < 0.05). Logistic regression identified Ki-67 as the sole significant predictor (OR = 1.06, 95% CI: 1.01–1.12). The Ki-67-only model achieved an AUC of 0.637, while the SMOTE-enhanced Random Forest achieved an AUC of 0.994, significantly outperforming all other models (p < 0.001). Conclusions: Ki-67 is significantly associated with metastasis at ACC diagnosis and shows independent predictive value in regression analysis. However, integrating it into machine learning models that also incorporate tumor size and age significantly improves overall predictive accuracy, supporting AI-assisted risk stratification and precision imaging strategies in adrenal cancer care.
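Odds ratios from logistic regression are multiplicative per unit of the predictor. The abstract does not state the unit. Assuming the reported OR of 1.06 refers to a one-percentage-point increase in the Ki-67 index, the odds of metastasis scale as OR^Δ for a Δ-point increase. The 10-point step below is purely illustrative.

```python
import math


def scaled_odds_ratio(or_per_unit, delta):
    """Odds ratio for a `delta`-unit change, given the OR for a one-unit change."""
    return or_per_unit ** delta


beta = math.log(1.06)  # logistic-regression coefficient per unit (assumed: per Ki-67 point)
print(round(beta, 4))  # 0.0583
print(round(scaled_odds_ratio(1.06, 10), 2))  # 1.79
# The 95% CI bounds scale the same way (1.01**10 and 1.12**10)
print(round(scaled_odds_ratio(1.01, 10), 2), round(scaled_odds_ratio(1.12, 10), 2))  # 1.1 3.11
```

Under that assumption, a 10-point higher Ki-67 index corresponds to roughly 1.8 times the odds of metastasis. The confidence interval widens considerably once it is scaled.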

(This article belongs to the Special Issue Recent Advances in Imaging and Interventional Techniques for Renal and Adrenal Diseases)


21 pages, 3919 KiB

Open Access Article

Comparative Analysis of Resampling Techniques for Class Imbalance in Financial Distress Prediction Using XGBoost

by Guodong Hou, Dong Ling Tong, Soung Yue Liew and Peng Yin Choo

Mathematics 2025, 13(13), 2186; https://doi.org/10.3390/math13132186 - 4 Jul 2025

Abstract

One of the key challenges in financial distress data is class imbalance: distressed samples are far outnumbered by non-distressed samples. This study examines eight resampling techniques for improving distress prediction with the XGBoost algorithm. The dataset, acquired from the CSMAR database, contains 26,383 firm-quarter samples from 639 Chinese A-share listed companies (2007–2024), of which only 12.1% are distressed. Results show that the standard Synthetic Minority Oversampling Technique (SMOTE) improved the F1-score (up to 0.73) and the Matthews Correlation Coefficient (MCC, up to 0.70), while SMOTE-Tomek and Borderline-SMOTE further boosted recall at a slight cost in precision. These oversampling and hybrid methods also maintained reasonable computational efficiency. Random Undersampling (RUS) was the fastest method and yielded high recall (0.85), but it suffered from low precision (0.46) and weaker generalization. Among all techniques, Bagging-SMOTE achieved balanced performance (AUC 0.96, F1 0.72, PR-AUC 0.80, MCC 0.68) using a minority-to-majority ratio of 0.15. This demonstrates that ensemble-based resampling can improve robustness with minimal impact on the original class distribution, albeit at a higher computational cost. The comparison highlights that no single approach fits all use cases and that technique selection should align with specific goals. Techniques favoring recall (e.g., Bagging-SMOTE, SMOTE-Tomek) suit early warning, conservative techniques (e.g., Tomek Links) help reduce false positives in risk-sensitive applications, and efficient methods such as RUS are preferable when computational speed is a priority.
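With only 12.1% of cases distressed, accuracy alone is misleading, which is why the study reports F1, MCC, and PR-AUC. The sketch below computes MCC from confusion-matrix counts and applies it to a hypothetical set of 1,000 samples with the same 12.1% distress rate (the counts are illustrative, not from the paper). A classifier that always predicts "non-distressed" scores 87.9% accuracy but has zero recall and an MCC of 0.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 1,000 firm-quarters, 121 distressed (12.1%); the classifier always says "healthy"
tp, fn = 0, 121
tn, fp = 879, 0
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(accuracy, recall, mcc(tp, fp, tn, fn))  # 0.879 0.0 0.0
```

MCC ranges from −1 to 1 and is 0 for a predictor no better than chance. This makes it more informative than accuracy at this level of imbalance.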

(This article belongs to the Special Issue New Advances in Computational Finance and Computational Intelligence in Finance)


15 pages, 1869 KiB

Open Access Article

Application of Hybrid Model Based on LASSO-SMOTE-BO-SVM to Lithology Identification During Drilling

by Hui Yao, Manyu Liang, Shangxian Yin, Qing Zhang, Yunlei Tian, Guoan Wang, Enke Hou, Huiqing Lian, Jinfu Zhang and Chuanshi Wu

Processes 2025, 13(7), 2038; https://doi.org/10.3390/pr13072038 - 27 Jun 2025

Abstract

Lithology identification during drilling plays a vital role in geological and geotechnical exploration, as it facilitates the early detection of formation-related hazards and supports the development of optimized mining strategies. Traditional lithology identification research faces problems such as fuzzy indicator characteristics and unbalanced sample quantities, which reduce the accuracy and interpretability of identification models. To address these problems, this study took the Shanxi Guoqiang Coal Mine as its research object and applied a combined machine learning model to lithology identification during drilling. First, the least absolute shrinkage and selection operator (LASSO) algorithm was used to screen the independent variables and retain the parameters that contributed most to lithology identification. Next, the synthetic minority oversampling technique (SMOTE) was used to oversample the minority lithology classes until all classes were balanced at a 1:1 ratio. Then, the Bayesian optimization (BO) algorithm was used to tune the penalty factor C and the kernel hyperparameter γ (two key parameters of the support vector machine (SVM) model), yielding the BO-SVM lithology identification model. Finally, the results were compared with those of single models and of models trained on the unbalanced samples to evaluate the effect of each step. The results showed that during drilling, the four indicators of drilling speed, mud pressure, slurry flow rate, and torque are strongly correlated with lithology and can be used for lithology identification and classification.
After the data set was oversampled with SMOTE, every model showed better robustness and generalization ability, and the classification evaluation indicators improved considerably. The improvement was especially large for the random forest model, whose original evaluation results were poor. The combined model, built by optimizing the SVM parameters with the BO algorithm, correctly identified 95 of the 96 groups of test samples, an identification accuracy of 99%, which is better than that of the traditional machine learning models. Comparison with measured data confirmed the reliability of the combined model's classification and its potential to be extended to other lithology identification and classification work.
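The SMOTE step that brought every lithology class to a 1:1 ratio can be sketched as follows. This is plain SMOTE written in pure Python: each synthetic point is interpolated between a minority sample and one of its k nearest same-class neighbours. The function names, k=3, and the toy data are assumptions. A production pipeline would normally use a maintained implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import defaultdict


def smote_class(samples, n_new, k=3, rng=None):
    """Create n_new synthetic points for one class (needs at least 2 samples)."""
    rng = rng or random.Random(0)
    new_points = []
    for _ in range(n_new):
        x = rng.choice(samples)
        # k nearest neighbours of x within the same class (x itself excluded)
        others = [p for p in samples if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        new_points.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return new_points


def balance_one_to_one(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        synthetic = smote_class(rows, target - len(rows), k, rng)
        X_out += synthetic
        y_out += [label] * len(synthetic)
    return X_out, y_out
```

Because new points lie on segments between existing minority samples, SMOTE adds no information beyond the minority region it starts from. This is one reason why, as stressed by several abstracts on this page, it should be applied to training data only.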

(This article belongs to the Special Issue Data-Driven Analysis and Simulation of Coal Mining)


19 pages, 2410 KiB

Open Access Article

MAK-Net: A Multi-Scale Attentive Kolmogorov–Arnold Network with BiGRU for Imbalanced ECG Arrhythmia Classification

by Cong Zhao, Bingwei Lai, Yongzheng Xu, Yiping Wang and Haorong Dong

Sensors 2025, 25(13), 3928; https://doi.org/10.3390/s25133928 - 24 Jun 2025

Abstract

Accurate classification of electrocardiogram (ECG) signals is vital for reliable arrhythmia diagnosis and informed clinical decision-making, yet real-world datasets often suffer from severe class imbalance that degrades recall and F1-score. To address these limitations, we introduce MAK-Net, a hybrid deep learning framework that combines (1) a four-branch multiscale convolutional module for comprehensive feature extraction across diverse waveform morphologies; (2) an efficient channel attention mechanism for adaptive weighting of clinically salient segments; (3) bidirectional gated recurrent units (BiGRU) to capture long-range temporal dependencies; and (4) Kolmogorov–Arnold Network (KAN) layers with learnable spline activations for enhanced nonlinear representation and interpretability. We further mitigate imbalance by combining focal loss with the Synthetic Minority Oversampling Technique (SMOTE). On the MIT-BIH arrhythmia database, MAK-Net attains state-of-the-art performance (0.9980 accuracy, 0.9888 F1-score, 0.9871 recall, 0.9905 precision, and 0.9991 specificity), demonstrating superior robustness to imbalanced classes compared with existing methods. These findings validate the efficacy of multiscale feature fusion, attention-guided learning, and KAN-based nonlinear mapping for automated, clinically reliable arrhythmia detection.
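Focal loss, introduced for dense object detection by Lin et al. (2017), down-weights well-classified examples so that training concentrates on hard, often minority-class, samples. A minimal binary version is sketched below. MAK-Net's multi-class variant and its α and γ settings are not given in the abstract, so the defaults here (α = 0.25, γ = 2, the values commonly used in that original work) are assumptions.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t) for one prediction.

    p: predicted probability of the positive class; y: true label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.30, 1)  # positive example the model still misses
print(hard / easy > 100)  # True: the hard example dominates the total loss
```

With γ = 0 and α = 1, the expression reduces to ordinary binary cross-entropy. Raising γ progressively suppresses the contribution of easy, typically majority-class, beats.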

(This article belongs to the Section Biomedical Sensors)


15 pages, 1351 KiB

Open Access Article

A Machine Learning-Based Detection for Parameter Tampering Vulnerabilities in Web Applications Using BERT Embeddings

by Sun Young Yun and Nam-Wook Cho

Symmetry 2025, 17(7), 985; https://doi.org/10.3390/sym17070985 - 22 Jun 2025

Abstract

The widespread adoption of web applications has led to a significant increase in the number of automated cyberattacks. Parameter tampering attacks pose a substantial security threat, enabling privilege escalation and unauthorized data exfiltration. Traditional pattern-based detection tools are of limited effectiveness against such threats, because identical parameters may produce different responses depending on their processing context, including security filtering mechanisms. This study proposes a machine learning-based detection model that addresses these limitations by identifying parameter tampering vulnerabilities through contextual analysis. The training dataset aggregates real-world vulnerability cases collected from web crawls, public vulnerability databases, and penetration testing reports. The Synthetic Minority Over-sampling Technique (SMOTE) was employed to address data imbalance during training, and recall was adopted as the primary evaluation metric to prioritize the detection of true vulnerabilities. In a comparative analysis, the XGBoost model performed best and was selected as the detection model. Validation on web URLs with known parameter tampering vulnerabilities achieved a detection rate of 73.3%, outperforming existing open-source automated tools. The proposed model enhances vulnerability detection by incorporating semantic representations of parameters and their values using BERT embeddings, enabling the system to learn contextual characteristics beyond the reach of pattern-based methods. These findings suggest the potential of the proposed method for scalable, efficient, and automated security diagnostics in large-scale web environments.

(This article belongs to the Section Computer)


15 pages, 640 KiB

Open Access Article

Interpretable Machine Learning for Serum-Based Metabolomics in Breast Cancer Diagnostics: Insights from Multi-Objective Feature Selection-Driven LightGBM-SHAP Models

by Emek Guldogan, Fatma Hilal Yagin, Hasan Ucuzal, Sarah A. Alzakari, Amel Ali Alhussan and Luca Paolo Ardigò

Medicina 2025, 61(6), 1112; https://doi.org/10.3390/medicina61061112 - 19 Jun 2025

Abstract

Background and Objectives: Breast cancer accounts for 12.5% of all new cancer cases in women worldwide. Early detection significantly improves survival rates, but traditional biomarkers such as CA 15-3 and HER2 lack sensitivity and specificity, particularly for early-stage disease. Advances in metabolomics and machine learning, particularly explainable artificial intelligence (XAI), offer new opportunities for identifying robust biomarkers and improving diagnostic accuracy. This study aimed to identify and validate serum-based metabolic biomarkers for breast cancer using advanced metabolomic profiling techniques and a Light Gradient Boosting Machine (LightGBM) model. Additionally, SHapley Additive exPlanations (SHAP) were applied to enhance model interpretability and biological insight. Materials and Methods: The study included 103 breast cancer patients and 31 healthy controls. Serum samples underwent liquid and gas chromatography–time-of-flight mass spectrometry (LC-TOFMS and GC-TOFMS). Mutual Information (MI), Sparse Partial Least Squares (sPLS), Boruta, and Multi-Objective Feature Selection (MOFS) approaches were applied to the data for biomarker discovery. LightGBM, AdaBoost, and Random Forest were employed for classification, and class imbalance was addressed with the Synthetic Minority Oversampling Technique (SMOTE). SHAP analysis ranked metabolites by their contribution to model predictions. Results: Compared with the other feature selection approaches, MOFS yielded more robust predictive performance, so the metabolites it identified were used in subsequent analyses for biomarker discovery. LightGBM outperformed the AdaBoost and Random Forest models, achieving 86.6% accuracy, 89.1% sensitivity, 84.2% specificity, and an F1-score of 87.0%. SHAP analysis identified 2-aminobutyric acid, choline, and coproporphyrin as the most influential metabolites, and dysregulation of these markers was associated with breast cancer risk.
Conclusions: This study is among the first to integrate SHAP explainability with metabolomic profiling, bridging computational predictions and biological insights for improved clinical adoption. It demonstrates the effectiveness of combining metabolomics with XAI-driven machine learning for breast cancer diagnostics. The identified biomarkers not only improve diagnostic accuracy but also reveal critical metabolic dysregulations associated with disease progression.
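SHAP ranks features by their Shapley values. For a handful of features, these values can be computed exactly by enumerating every coalition of features, which makes the idea concrete. The sketch below replaces "absent" features with a fixed baseline value. That is an assumption: TreeSHAP, commonly used with LightGBM, handles absent features differently. The toy three-metabolite model is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical 3-metabolite score with one interaction term
model = lambda z: 2 * z[0] + z[1] * z[2]
print(shapley_values(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```

Here the additive term gives feature 0 a value of 2. The interaction worth 6 is split evenly between features 1 and 2 (3 each), and the values sum to f(x) − f(baseline) = 8. Exact enumeration costs 2ⁿ model calls per feature, which is why SHAP relies on model-specific algorithms in practice.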

(This article belongs to the Special Issue Recent Advances in Diagnosis and Therapy of Gynecologic and Breast Cancers)


22 pages, 1970 KiB

Open Access Article

Enhanced Intrusion Detection Using Conditional-Tabular-Generative-Adversarial-Network-Augmented Data and a Convolutional Neural Network: A Robust Approach to Addressing Imbalanced Cybersecurity Datasets

by Shridhar Allagi, Toralkar Pawan and Wai Yie Leong

Mathematics 2025, 13(12), 1923; https://doi.org/10.3390/math13121923 - 10 Jun 2025

Abstract

Intrusion prevention and classification are common research topics in cybersecurity. Models built from training data may fail to prevent or classify intrusions accurately if the dataset is imbalanced. Most researchers employ SMOTE to balance the dataset. However, SMOTE fails to address several constraints associated with such datasets, including diverse data types, preservation of the data distribution, non-linear relationships, and the noise introduced by oversampling. The novelty of this work lies in addressing these data-distribution and SMOTE-related issues by applying Conditional Tabular Generative Adversarial Networks (CTGANs) to the NSL_KDD and UNSW_NB15 datasets. The balanced input corpus is fed into a CNN model to predict intrusions. The CNN model comprises two convolution layers, max-pooling, ReLU activation, and a dense layer. Model performance is measured with accuracy, recall, precision, specificity, and F1-score. The study shows that CTGAN improves the intrusion detection rate: the high-quality synthetic samples it generates significantly enhance CNN-based intrusion detection on imbalanced datasets. This demonstrates the potential for deploying GAN-based oversampling techniques in real-world cybersecurity systems to improve detection accuracy and reduce false negatives.
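To make the layer sequence concrete, the sketch below runs one forward pass of a tiny 1-D CNN over a tabular feature vector: two convolutions with ReLU, then max-pooling and a dense output. Kernel sizes, weights, the layer ordering, and the single output unit are hypothetical, since the abstract does not give them. A real model would be trained in a framework such as PyTorch or Keras.

```python
def conv1d(x, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as in deep-learning libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


def max_pool(v, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def dense(v, weights, bias=0.0):
    """Single output unit: weighted sum of the pooled features."""
    return sum(a * w for a, w in zip(v, weights)) + bias


def forward(features, k1, k2, w_out):
    h = relu(conv1d(features, k1))  # convolution 1 + ReLU
    h = relu(conv1d(h, k2))  # convolution 2 + ReLU
    h = max_pool(h)  # max-pooling
    return dense(h, w_out)  # dense output (a logit, before any sigmoid)
```

For example, `forward([1, 2, 3, 4, 5, 6], [1, 1], [1, 0], [1, 1])` passes through feature maps `[3, 5, 7, 9, 11]`, `[3, 5, 7, 9]`, and `[5, 9]` before the dense layer returns 14.0.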

(This article belongs to the Special Issue Computer Vision, Image Processing Technologies and Machine Learning)


15 pages, 2035 KiB

Open Access Article

Evaluation of the Effect of Using Different Types of Clinker Grinding Aids on Grinding Performance by Numerical Analysis

by Yahya Kaya, Veysel Kobya, Murat Eser, Naz Mardani, Metin Bilgin and Ali Mardani

Materials 2025, 18(12), 2712; https://doi.org/10.3390/ma18122712 - 9 Jun 2025

Abstract

To develop more environmentally friendly and sustainable cementitious systems, the use of grinding aids (GAs) during clinker grinding has gained increasing attention. Although the mechanisms of action of GAs are known, selecting an effective GA can be difficult because the appropriate selection criteria are complex. It is therefore important to model the effect of GA properties on grinding performance. In this study, seven types of GAs were used at four dosages, and time-dependent grinding was performed. The Blaine fineness values of the cements were compared after each grinding process. In addition, the modeling of these parameters with machine learning and ensemble learning methods was discussed. The Synthetic Minority Over-sampling Technique (SMOTE) was used to generate artificial data and enlarge the grinding efficiency dataset. The data were modeled with Artificial Neural Networks (ANNs), Attentive Interpretable Tabular Learning (TabNet), Random Forests (RFs), and the XGBoost Regressor (XGBoost). The ranking of the parameters affecting Blaine fineness was determined with XGBoost. In the modeling studies with augmented data, XGBoost achieved the best MAE, RMSE, and LogCosh values: 21.0384, 33.7379, and 15.4846, respectively. This study contributes to a better understanding of the relationship between GA selection and grinding performance.
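The three error metrics reported for the Blaine-fineness models can each be written in a few lines. They are expressed in the units of the target variable. Log-cosh behaves like half the squared error for small errors and like the absolute error minus log 2 for large ones. The numerically stable rearrangement in the code is a standard identity, not something taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def log_cosh(y_true, y_pred):
    """Mean log(cosh(error)), via |e| + log1p(exp(-2|e|)) - log 2 to avoid overflow."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(p - t)
        total += e + math.log1p(math.exp(-2.0 * e)) - math.log(2.0)
    return total / len(y_true)
```

Calling `math.cosh` directly would overflow for errors above about 710, which is easily reached when the target is measured in thousands of units, as Blaine fineness typically is. The rearranged form stays finite.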

(This article belongs to the Special Issue Modeling and Optimization of Material Properties and Characteristics)

18 pages, 368 KiB

Open Access Article

Stacked Ensemble Learning for Classification of Parkinson’s Disease Using Telemonitoring Vocal Features

by Bolaji A. Omodunbi, David B. Olawade, Omosigho F. Awe, Afeez A. Soladoye, Nicholas Aderinto, Saak V. Ovsepian and Stergios Boussios

Diagnostics 2025, 15(12), 1467; https://doi.org/10.3390/diagnostics15121467 - 9 Jun 2025

Viewed by 770

Abstract

Background: Parkinson’s disease (PD) is a progressive neurodegenerative condition that impairs motor and non-motor functions. Early and accurate diagnosis is critical for effective management and care. Leveraging machine learning (ML) techniques, this study aimed to develop a robust prediction system for PD using a stacked ensemble learning approach, addressing challenges such as imbalanced datasets and feature optimization. Methods: An open-access PD dataset comprising 22 vocal attributes and 195 instances from 31 subjects was used. To prevent data leakage, subjects were divided into training (22 subjects) and testing (9 subjects) groups, ensuring that no subject appeared in both sets. Preprocessing included data cleaning and normalization via min–max scaling. The synthetic minority oversampling technique (SMOTE) was applied exclusively to the training set to address class imbalance. Feature selection techniques (forward search, gain ratio, and the Kruskal–Wallis test) were applied with subject-wise cross-validation to identify significant attributes. The system combined support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (DT) base classifiers with a logistic regression (LR) meta-classifier in a stacked ensemble learning framework. Performance was evaluated using both recording-wise and subject-wise metrics to ensure clinical relevance. Results: The stacked ensemble model achieved a recording-wise accuracy of 84.7% and a subject-wise accuracy of 77.8% on completely unseen subjects, outperforming the individual classifiers, including KNN (81.4%), RF (79.7%), and SVM (76.3%). Cross-validation within the training set yielded 89.2% accuracy; the gap between this figure and the held-out results highlights the importance of proper validation methodology. Using the top 10 features ranked by gain ratio provided the best balance between performance and clinical interpretability. Rigorous subject-wise evaluation confirmed the system’s methodological robustness and demonstrated the critical impact of validation methodology on reported performance. Conclusions: By implementing subject-wise validation and preventing data leakage, this study demonstrates that proper validation yields substantially different (and more realistic) results than flawed recording-wise approaches. The findings underscore the critical importance of validation methodology in healthcare ML applications and provide a template for methodologically sound PD classification research. Future research should validate the model on larger, multi-center datasets and implement standardized validation protocols to enhance clinical applicability.
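
The leakage-prevention steps described above (splitting by subject rather than by recording, and fitting min–max scaling on the training subjects only) can be illustrated with the minimal sketch below. It assumes records shaped as hypothetical `(subject_id, features, label)` tuples and uses made-up toy data; the function names are hypothetical, and this is not the authors' pipeline.

```python
import random


def subject_wise_split(records, n_test_subjects, seed=0):
    """Split (subject_id, features, label) records so that every subject's
    recordings land entirely in the training set or entirely in the test set.
    Hypothetical helper, not taken from the paper."""
    subjects = sorted({subject for subject, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    test_ids = set(subjects[:n_test_subjects])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


def fit_min_max(rows):
    """Learn per-feature (min, max) pairs from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, params):
    """Scale rows with previously fitted (min, max) pairs; constant features map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, params)]
        for row in rows
    ]


# Toy data (made-up): 31 hypothetical subjects with 6 recordings each.
records = [(f"S{s:02d}", [s + 0.1 * r, 2.0 * r], s % 2)
           for s in range(31) for r in range(6)]
train, test = subject_wise_split(records, n_test_subjects=9)
params = fit_min_max([features for _, features, _ in train])
X_train = apply_min_max([features for _, features, _ in train], params)
# Test rows reuse the training parameters, so values may fall outside [0, 1].
X_test = apply_min_max([features for _, features, _ in test], params)
# Any oversampling such as SMOTE would then be applied to X_train only.
```

Fitting the scaler (and any oversampler) on the training subjects alone keeps statistics from the held-out subjects out of the model, which is the leakage the abstract guards against.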

(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)

23 pages, 4049 KiB

Open Access Article

ROSE-BOX: A Lightweight and Efficient Intrusion Detection Framework for Resource-Constrained IIoT Environments

by Silin Peng, Yu Han, Ruonan Li, Lichen Liu, Jie Liu and Zhaoquan Gu

Appl. Sci. 2025, 15(12), 6448; https://doi.org/10.3390/app15126448 - 8 Jun 2025

Viewed by 509

Abstract

The rapid advancement of the Industrial Internet of Things (IIoT) has transformed industrial automation, enabling real-time monitoring and intelligent decision-making. However, increased connectivity exposes IIoT systems to sophisticated cyber threats, which can pose significant security risks, especially in resource-constrained IIoT environments where computational efficiency is critical. Existing intrusion detection solutions often suffer from high computational overhead and inadequate adaptability, making them impractical for real-time deployment in IIoT environments. To address these challenges, this study introduces a lightweight and efficient intrusion detection framework tailored to resource-constrained IIoT environments. First, an XGBoost-assisted Random Forest (XGB-RF) method is proposed to select the most important features and obtain an optimal feature subset. Next, SMOTE (Synthetic Minority Oversampling Technique) is used to balance the classes of the data restricted to this feature subset, improving detection precision. Then, to reduce computing resource requirements and latency while improving detection performance, Bayesian optimization is applied to tune the hyperparameters of XGBoost (BO-XGBoost). Finally, extensive experiments on the benchmark datasets CIC-IDS2017, CSE-CIC-IDS2018, and CIC-DDoS2019 demonstrate that the proposed method, which we call ROSE-BOX (Random Forest, Synthetic Minority Oversampling Technique, and BO-XGBoost), achieves a detection accuracy exceeding 99.85% while maintaining low latency and CPU occupancy. Our findings highlight the robustness, lightweight nature, and efficiency of ROSE-BOX, making it well suited for real-time intrusion detection in resource-constrained IIoT environments.
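
The SMOTE balancing step mentioned above works by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The minimal sketch below shows that idea in pure Python, assuming feature vectors already reduced to the selected feature subset. The abstract does not say which implementation or neighbour count ROSE-BOX used; `k=5` here follows the default of the original SMOTE paper and of imbalanced-learn. The functions `smote` and `balance_binary` are hypothetical, and the brute-force neighbour search is only practical for small inputs.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Hypothetical helper; not the ROSE-BOX authors' code."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    # Brute-force Euclidean k-nearest neighbours within the minority class.
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def balance_binary(majority, minority, k=5, seed=0):
    """Oversample the minority class until it matches the majority class size."""
    needed = max(0, len(majority) - len(minority))
    return minority + smote(minority, needed, k=k, seed=seed)
```

Each synthetic point lies on the line segment between two existing minority samples, so SMOTE densifies the minority region rather than duplicating rows. As with any oversampling, it is normally applied to the training split only, so that synthetic points never leak into evaluation data.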

(This article belongs to the Special Issue Advances in the Internet of Things (IoT): Attacks Detection and Privacy Protection)
