Search Results (64)

Search Parameters:
Keywords = borderline SMOTE

17 pages, 811 KB  
Article
A Hybrid Feature-Weighting and Resampling Model for Imbalanced Sentiment Analysis in User Game Reviews
by Thao-Trang Huynh-Cam, Long-Sheng Chen, Hsuan-Jung Huang and Hsiu-Chia Ko
Mathematics 2026, 14(8), 1273; https://doi.org/10.3390/math14081273 - 11 Apr 2026
Viewed by 269
Abstract
Sentiment analysis of online game reviews has become increasingly important for understanding player experiences and supporting data-driven game development. However, research in this domain continues to face two unresolved challenges: (1) the extreme imbalance between positive and negative feedback, and (2) the inefficiency of existing feature-weighting schemes in capturing sentiment signals embedded in informal gaming discourse. Prior work has shown that negative reviews, though few in number, are highly influential and usually contain richer emotional content and longer text; yet prevailing classification models often perform poorly on this minority class. Numerous studies have explored multimodal imbalance issues, class imbalance in cross-lingual ABSA (Aspect-Based Sentiment Analysis), reinforcement-learning-based architectures for imbalanced extraction tasks, and oversampling strategies such as SMOTE (Synthetic Minority Over-sampling Technique) variants. Few investigations, however, have specifically addressed imbalanced sentiment classification in online game reviews, where user-generated content exhibits unique lexical, structural, and emotional characteristics. To address these gaps, this study integrated TF-IDF (Term Frequency-Inverse Document Frequency), VADER (Valence Aware Dictionary and Sentiment Reasoner) lexicon features, and IGM (Inverse Gravity Moment) weightings with advanced oversampling methods such as ADASYN (Adaptive Synthetic Sampling Approach for Imbalanced Learning) and Borderline-SMOTE to improve the detection of minority sentiment classes. Ensemble models, including XGBoost (Extreme Gradient Boosting) and LightGBM (Light Gradient-Boosting Machine), were further employed to enhance robustness to class imbalance. Using a large-scale dataset of Steam game reviews, the proposed framework demonstrated substantial improvement in identifying negative sentiments, addressing a critical limitation in the existing computational game-analysis literature and advancing the modeling of emotion-rich but imbalance-prone user feedback. Full article

25 pages, 8863 KB  
Article
A Multi-Scale Residual Convolutional Neural Network for Fault Diagnosis of Progressive Cavity Pump Systems in Coalbed Methane Wells with Imbalanced and Differentiated Data
by Jiaojiao Yu, Yajie Ou, Ying Gao, Youwu Li, Feng Gu, Jinhuang You, Bin Liu, Xiaoyong Gao and Chaodong Tan
Processes 2026, 14(2), 383; https://doi.org/10.3390/pr14020383 - 22 Jan 2026
Viewed by 343
Abstract
Coalbed methane, an abundant clean energy resource in China, is gaining significant attention. Electric submersible progressive cavity pumps, ideal for downhole extraction with high solids content, are vital in coalbed methane operations. Current fault diagnosis research for these pumps mainly relies on machine learning algorithms to identify fault features, but complex working conditions and imbalanced sample distributions challenge these models’ ability to perceive multi-scale and multi-dimensional features. To enhance the model’s perception of deep abnormal data in complex multi-case industrial datasets, this study proposes a deep learning model based on a multi-scale extraction and residual module convolutional neural network. Innovatively, a cross-attention module using global autocorrelation and local cross-correlation is introduced to constrain the multi-scale feature extraction process, making the model better suited to specific and differentiated data environments. Post feature extraction, the model employs Borderline-SMOTE to augment minority class samples and uses Tomek Links for noise removal. These enhancements improve the comprehensive perception of fault types with significant differences in period, amplitude, and dimension, as well as the learning capability for rare faults. Based on field-collected fault data and using enhanced and cleaned features for classifier training, tests on a real industrial dataset show the proposed model achieves an F1 Measure of 90.7%—an improvement of 13.38% over the unimproved model and 9.15–31.64% over other common fault diagnosis models. Experimental results confirm the method’s effectiveness in adapting to extremely imbalanced sample distributions and complex, variable field data characteristics. Full article
(This article belongs to the Special Issue Coalbed Methane Development Process)
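The Borderline-SMOTE step named in the abstract above can be illustrated with a small, pure-Python sketch. It follows the general Borderline-SMOTE idea (oversample only minority points whose neighbourhood is dominated, but not entirely filled, by the majority class) and is not the authors' implementation. The function name `borderline_smote`, the Euclidean distance, the parameter defaults, and the toy data are all assumptions made for illustration; in practice a library class such as imbalanced-learn's `BorderlineSMOTE` would normally be used.

```python
import math
import random


def borderline_smote(X, y, minority=1, k=5, n_new=10, seed=0):
    """Return n_new synthetic minority points generated near the class boundary.

    A minority point is 'in danger' when at least half, but not all, of its
    k nearest neighbours (searched over the whole data set) are majority.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]

    danger = []
    for i in min_idx:
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        n_major = sum(1 for j in nearest if y[j] != minority)
        if k / 2 <= n_major < k:  # borderline, but not pure noise
            danger.append(i)

    synthetic = []
    if not danger or len(min_idx) < 2:
        return synthetic
    for _ in range(n_new):
        i = rng.choice(danger)
        peers = [j for j in min_idx if j != i]
        nearest = sorted(peers, key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(nearest)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1),
     (3, 0), (3, 1), (6, 0), (6, 1), (7, 0.5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
new_points = borderline_smote(X, y, minority=1, k=3, n_new=10)
```

In this toy set only the two minority points at x = 3 sit next to the majority cluster, so every synthetic point is interpolated from one of them toward another minority point.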

41 pages, 80556 KB  
Article
Why ROC-AUC Is Misleading for Highly Imbalanced Data: In-Depth Evaluation of MCC, F2-Score, H-Measure, and AUC-Based Metrics Across Diverse Classifiers
by Mehdi Imani, Majid Joudaki, Ayoub Bagheri and Hamid R. Arabnia
Technologies 2026, 14(1), 54; https://doi.org/10.3390/technologies14010054 - 10 Jan 2026
Cited by 3 | Viewed by 3717
Abstract
This study re-evaluates ROC-AUC for binary classification under severe class imbalance (<3% positives). Despite its widespread use, ROC-AUC can mask operationally salient differences among classifiers when the costs of false positives and false negatives are asymmetric. Using three benchmarks, credit-card fraud detection (0.17%), yeast protein localization (1.35%), and ozone level detection (2.9%), we compare ROC-AUC with Matthews Correlation Coefficient, F2-score, H-measure, and PR-AUC. Our empirical analyses span 20 classifier–sampler configurations per dataset, obtained by combining four classifiers (Logistic Regression, Random Forest, XGBoost, and CatBoost) with five sampling settings: a no-resampling baseline and four oversampling methods (SMOTE, Borderline-SMOTE, SVM-SMOTE, and ADASYN). ROC-AUC exhibits pronounced ceiling effects, yielding high scores even for underperforming models. In contrast, MCC and F2 align more closely with deployment-relevant costs and achieve the highest Kendall’s τ rank concordance across datasets; PR-AUC provides threshold-independent ranking, and H-measure integrates cost sensitivity. We quantify uncertainty and differences using stratified bootstrap confidence intervals, DeLong’s test for ROC-AUC, and Friedman–Nemenyi critical-difference diagrams, which collectively underscore the limited discriminative value of ROC-AUC in rare-event settings. The findings support a shift to a multi-metric evaluation framework: ROC-AUC should not be used as the primary metric in ultra-imbalanced settings; instead, MCC and F2 are recommended as primary indicators, supplemented by PR-AUC and H-measure where ranking granularity and principled cost integration are required. This evidence encourages researchers and practitioners to move beyond sole reliance on ROC-AUC when evaluating classifiers in highly imbalanced data. Full article
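As a rough illustration of why threshold metrics such as MCC and F2 react more strongly than ROC-style quantities under heavy imbalance, the sketch below computes both from a confusion matrix. The formulas are the standard definitions; the two confusion matrices are hypothetical numbers chosen for illustration (1% positives) and are not taken from the study.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical models on 10,000 samples with 100 positives (1%).
# Both find 80 positives; model B raises ten times more false alarms.
model_a = dict(tp=80, fp=50, fn=20, tn=9850)
model_b = dict(tp=80, fp=500, fn=20, tn=9400)

fpr_a = model_a["fp"] / (model_a["fp"] + model_a["tn"])
fpr_b = model_b["fp"] / (model_b["fp"] + model_b["tn"])
f2_a = f_beta(model_a["tp"], model_a["fp"], model_a["fn"])
f2_b = f_beta(model_b["tp"], model_b["fp"], model_b["fn"])
mcc_a, mcc_b = mcc(**model_a), mcc(**model_b)

print(f"FPR: {fpr_a:.4f} vs {fpr_b:.4f}")
print(f"F2:  {f2_a:.3f} vs {f2_b:.3f}")
print(f"MCC: {mcc_a:.3f} vs {mcc_b:.3f}")
```

The false-positive rate, which is what the ROC curve plots, differs by only a few percentage points between the two hypothetical models because the negative class is so large, while F2 and MCC separate them clearly.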

28 pages, 5227 KB  
Article
A BSMOTE-OOA-SuperLearner Hybrid Framework for Interpretable Prediction of Pillar Stability
by Weizhang Liang, Yu Liu, Pengpeng Lu and Zheng Li
Symmetry 2026, 18(1), 49; https://doi.org/10.3390/sym18010049 - 26 Dec 2025
Viewed by 376
Abstract
Pillar stability prediction is essential for underground mining safety, yet it remains challenging due to limited data, class imbalance, and insufficient interpretability. This study proposes an integrated Borderline-SMOTE-Osprey Optimization Algorithm-Super Learner framework (BSMOTE-OOA-SL) for hard-rock pillar stability prediction. The framework combines five heterogeneous base learners (ANN, GBDT, KNN, RF, and SVM), applies Borderline-SMOTE within training folds to alleviate class imbalance, and employs the Osprey Optimization Algorithm (OOA) for systematic hyperparameter optimization. The model is evaluated using a dataset of 241 pillar cases from seven underground mines. Statistical experiments based on multiple random train–test splits show that the proposed framework consistently outperforms individual base learners in terms of Accuracy, Macro-Precision, Macro-Recall, and Macro-F1, demonstrating improved robustness and generalization. Ablation results indicate that the joint use of Borderline-SMOTE and OOA leads to quantitative performance gains of 10.21%, 12.25%, 12.61%, and 12.86% in Accuracy, Macro-Precision, Macro-Recall, and Macro-F1, respectively. Under a representative data split, the model achieves an overall accuracy of 95.92%, with strong class-wise Precision, Recall, and F1-score across all stability categories, and AUC values exceeding 0.9 for all classes (reaching 1.0 for the Failed category). SHAP-based interpretability analysis identifies stress-related indicators—particularly average pillar stress, Stress/UCS ratio, and UCS—as the dominant factors governing pillar stability. Overall, the proposed BSMOTE-OOA-SL framework provides a robust, interpretable, and statistically reliable solution for hard-rock pillar stability prediction. Full article
(This article belongs to the Special Issue Feature Papers in Section "Engineering and Materials" 2025)

28 pages, 5110 KB  
Article
WISEST: Weighted Interpolation for Synthetic Enhancement Using SMOTE with Thresholds
by Ryotaro Matsui, Luis Guillen, Satoru Izumi and Takuo Suganuma
Sensors 2025, 25(24), 7417; https://doi.org/10.3390/s25247417 - 5 Dec 2025
Viewed by 714
Abstract
Imbalanced learning occurs when rare but critical events are missed because classifiers are trained primarily on majority-class samples. This paper introduces WISEST, a locality-aware weighted-interpolation algorithm that generates synthetic minority samples within a controlled threshold near class boundaries. Benchmarked on more than a hundred real-world imbalanced datasets, such as KEEL, with different imbalance ratios, noise levels, geometries, and other security and IoT sets (IoT-23 and BoT–IoT), WISEST consistently improved minority detection in at least one of the metrics on about half of those datasets, achieving up to a 25% relative recall increase and up to an 18% increase in F1 compared to the original training and other approaches. However, in most cases, WISEST’s trade-off gains are in accuracy and precision depending on the dataset and classifier. These results indicate that WISEST is a practical and robust option when minority support and borderline structure permit safe synthesis, although no single sampler uniformly outperforms others across all datasets. Full article
(This article belongs to the Special Issue Advances in Security of Mobile and Wireless Communications)

22 pages, 1734 KB  
Article
A Machine Learning Approach for Factor Analysis and Scenario-Based Prediction of Construction Accidents
by Ki-nam Kim, Dae-gu Cho and Min-jae Lee
Buildings 2025, 15(23), 4343; https://doi.org/10.3390/buildings15234343 - 28 Nov 2025
Cited by 1 | Viewed by 774
Abstract
The construction industry has persistently high accident rates, and major events continue despite strengthened safety management systems. This study analyzes 19,456 accident records from the national Construction Safety Management Integrated Information (CSI) system and applies a Light Gradient Boosting Machine (LightGBM) model to predict fatal versus injury outcomes. SHAP was used to identify influential factors and quantify each variable’s contribution. Fatal events represented about 5% of cases, reflecting substantial class imbalance. To address this, three oversampling methods—SMOTE, Borderline-SMOTE, and ADASYN—were tested. The ADASYN model showed the best performance (F1-score = 0.905, AUC = 0.879) and was selected as the final model. Oversampling was applied exclusively to the training folds during stratified 10-fold cross-validation on the training set. After identifying the optimal number of iterations, the model was retrained on the full training data and its final performance was evaluated on the independent test set. SHAP results indicated that Type of Accident, Accident Object, and Work Process were primary drivers of fatal outcomes, whereas Safety Management Plan and Public/Private Ownership helped lessen severity. Project Cost, Progress Rate, and Number of Workers moderated prediction strength through interactions with key variables. This study clarifies structural relationships among factors affecting accident outcomes using a LightGBM–SHAP framework that captures nonlinear interactions, supporting explainable artificial intelligence (AI)–based safety management and risk monitoring. Full article
(This article belongs to the Section Construction Management, and Computers & Digitization)
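The abstract above notes that oversampling was applied only to the training folds during stratified 10-fold cross-validation. A minimal sketch of that protocol follows, using only the standard library. Random duplication stands in for SMOTE, Borderline-SMOTE, or ADASYN; the helper names `stratified_folds` and `oversample` and the label vector are hypothetical and do not come from the study.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(y, n_splits, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_splits].append(i)
    return folds


def oversample(indices, y, seed=0):
    """Stand-in oversampler: duplicate minority indices until classes match."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[y[i]].append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


y = [1] * 10 + [0] * 190  # hypothetical labels, 5% positives
folds = stratified_folds(y, n_splits=10)

splits = []
for k, val_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    train_bal = oversample(train_idx, y, seed=k)  # resample training part only
    splits.append((train_bal, val_idx))           # validation keeps real ratio

train_bal, val_idx = splits[0]
print(Counter(y[i] for i in train_bal), Counter(y[i] for i in val_idx))
```

Resampling after the split means no duplicated or synthetic copy of a validation sample can leak into training, and each validation fold keeps the original class ratio.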

13 pages, 545 KB  
Article
Factors Influencing Stroke Severity Based on Collateral Circulation, Clinical Markers and Machine Learning
by Jia-Lang Xu
Diagnostics 2025, 15(23), 2983; https://doi.org/10.3390/diagnostics15232983 - 24 Nov 2025
Viewed by 920
Abstract
Background/Objectives: Stroke is a serious neurological disorder that significantly affects patients’ quality of life and overall health. The severity of a stroke can vary widely and is influenced by multiple factors, such as clinical presentation, diagnostic findings, and the site of onset. This study aimed to identify and analyze key variables that contribute to stroke severity, with a particular focus on the role of collateral circulation. Methods: This study analyzed clinical, imaging, and biochemical variables—ipsilateral collateral flow on MRA, MRI unilateral–bilateral stroke, systolic blood pressure (SBP), fasting plasma glucose (FPG), and blood urea nitrogen (BUN). Group differences used chi-square and Mann–Whitney U tests. Class imbalance was addressed with SMOTE; Logistic Regression, Random Forest, XGBoost, and SVM were cross-validated, reporting accuracy, precision, recall, and F1 with 95% CIs. Results: Reduced or absent ipsilateral collateral flow and unilateral–bilateral stroke were strongly associated with greater severity (p < 0.001). SBP was significant (p = 0.034), FPG was significant (p = 0.023), and BUN was borderline (p = 0.059). SMOTE improved prediction: Random Forest achieved accuracy 83.3% (CI: 79.1–87.6) and F1 84.0% (CI: 79.1–88.9); XGBoost reached accuracy 80.2% (CI: 71.5–89.0) and F1 81.4% (CI: 73.8–89.0). Logistic Regression improved to F1 70.8% (CI: 55.4–86.2), whereas SVM declined to accuracy 52.2% (CI: 37.5–67.0). Conclusions: Collateral status and unilateral–bilateral stroke are key determinants of severity; SBP and FPG add prognostic value, with BUN borderline. Tree-based ensembles trained on SMOTE-balanced data provide the most reliable predictions for risk stratification. These findings suggest that future work may focus on integrating such predictive models into Clinical Decision Support Systems (CDSSs) to enhance early risk identification, strengthen CDSSs, and enable more personalized care planning for stroke patients. 
Full article

20 pages, 4551 KB  
Article
Explainable Learning Framework for the Assessment and Prediction of Wind Shear-Induced Aviation Turbulence
by Afaq Khattak, Pak-wai Chan, Feng Chen, Adil A. M. Elhassan and Badr T. Alsulami
Atmosphere 2025, 16(12), 1318; https://doi.org/10.3390/atmos16121318 - 22 Nov 2025
Viewed by 828
Abstract
Wind shear-induced aviation turbulence (WSAT) remains a major safety concern during approach and takeoff phases at complex terrain airports. This study develops an interpretable Explainable Boosting Machine (EBM) framework to classify WSAT events at Hong Kong International Airport (HKIA). The framework integrates Differential Evolution with HyperBand (DEHB) for hyperparameter tuning and applies multiple data balance methods such as SMOTE, Borderline SMOTE, Safe-Level SMOTE, and G-SMOTE. The dataset consists of Pilot Reports (PIREPs) collected between 1 January 2007 and 31 July 2023, with 6838 wind shear events that include variables that relate to wind shear magnitude, altitude, runway distance, rainfall condition, and causal factors. Among all configurations, the EBM tuned via DEHB and trained with SMOTE-treated data achieved the highest predictive performance with BA = 0.710, MCC = 0.321, and G-Mean = 0.708, higher than untreated and other balance variants. EBM-based interpretation showed that wind shear altitude and wind shear magnitude were key predictors, and their interaction reflected a nonlinear pattern where WSAT probability rose under moderate-to-high shear conditions (wind shear altitude ≈ 0.5–2.5 and magnitude ≈ 30–35 knots). The DEHB-optimized EBM–SMOTE framework provides a transparent interpretive foundation for WSAT risk assessment and advances quantitative evaluation in aviation meteorology. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
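The BA (balanced accuracy) and G-Mean figures reported in the abstract above are both built from sensitivity and specificity. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical and were chosen only to show how these metrics differ from plain accuracy on imbalanced data.

```python
import math


def sensitivity_specificity(tp, fp, fn, tn):
    """Return (true positive rate, true negative rate)."""
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(tp, fp, fn, tn):
    """Arithmetic mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(tp, fp, fn, tn)
    return (sens + spec) / 2


def g_mean(tp, fp, fn, tn):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(tp, fp, fn, tn)
    return math.sqrt(sens * spec)


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: 100 positives, 1000 negatives.
trained = dict(tp=70, fp=280, fn=30, tn=720)       # imperfect but informative
always_negative = dict(tp=0, fp=0, fn=100, tn=1000)  # ignores the minority

for name, cm in [("trained", trained), ("always negative", always_negative)]:
    print(name, round(accuracy(**cm), 3),
          round(balanced_accuracy(**cm), 3), round(g_mean(**cm), 3))
```

The classifier that always predicts the majority class has the higher plain accuracy here, yet its balanced accuracy falls to 0.5 and its G-Mean to 0, which is why such metrics are preferred when one class is rare.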

8 pages, 4309 KB  
Proceeding Paper
Evaluation of Boosting Algorithms for Skin Cancer Classification Using the PAD-UFES-20 Dataset and Custom CNN Feature Extraction
by Danish Javed, Usama Arshad, Haider Irfan, Raja Hashim Ali and Talha Ali Khan
Eng. Proc. 2025, 87(1), 115; https://doi.org/10.3390/engproc2025087115 - 13 Nov 2025
Cited by 4 | Viewed by 1244
Abstract
Early and reliable detection of skin cancer is critical for improving patient outcomes and minimizing diagnostic uncertainty in dermatological practice. This study proposes an interpretable hybrid framework that integrates ConvMixer-based deep feature extraction with gradient boosting classifiers to perform multi-class skin lesion classification on the publicly available PAD-UFES-20 dataset. The dataset contains 2298 dermoscopic and clinical images with associated patient metadata (age, gender, and anatomical site), enabling a joint evaluation of demographic and anatomical factors influencing model performance. After data augmentation, normalization, and class balancing using Borderline-SMOTE, image embeddings extracted via ConvMixer were integrated with patient metadata and subsequently classified using CatBoost, XGBoost, and LightGBM. Among these, CatBoost achieved the highest macro-AUC of 0.94 and macro-F1 of 0.88, with a melanoma sensitivity of 0.91, while maintaining good calibration (Brier score = 0.06). Grad-CAM and SHAP analyses confirmed that the model’s attention and feature importance correspond to clinically relevant lesion regions and attributes. The results highlight that age and body-region imbalances in the PAD-UFES-20 dataset modestly influence predictive behavior, emphasizing the importance of balanced sampling and stratified validation. Overall, the proposed ConvMixer–CatBoost framework provides a compact, explainable, and generalizable solution for AI-assisted skin cancer classification. Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)

27 pages, 2953 KB  
Article
A Machine Learning Approach to Valve Plate Failure Prediction in Piston Pumps Under Imbalanced Data Conditions: Comparison of Data Balancing Methods
by Marcin Rojek and Marcin Blachnik
Appl. Sci. 2025, 15(21), 11542; https://doi.org/10.3390/app152111542 - 29 Oct 2025
Viewed by 847
Abstract
This article focuses on the problem of building a real-world predictive maintenance system for hydraulic piston pumps. Particular attention is given to the limited availability of data on the failure state of systems with a damaged valve plate. The main objective of this work was to analyze the impact of imbalanced data on the quality of the failure prediction system. Several data balancing techniques, including oversampling, undersampling, and combined methods, were evaluated to overcome this limitation. The dataset used for evaluation includes recordings from eleven sensors, such as pressure, flow, and temperature, registered at various points in the hydraulic system. It also includes data from three additional vibration sensors. The experiments were conducted with imbalance ratios ranging from 0.5% to a fully balanced dataset. The results indicate that two methods, Borderline SMOTE and SMOTE+Tomek Links, dominate. These methods allowed the system to achieve the highest performance on a completely new dataset with different levels of valve plate damage, for balance rates above three percent. Furthermore, for balance rates below one percent, the use of data balancing methods may adversely affect the model. Finally, our results indicate the limitations of cross-validation procedures when assessing data balancing methods. Full article
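The Tomek Links part of the SMOTE+Tomek Links method mentioned above can be sketched in a few lines. A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbour; cleaning usually removes the majority member of each pair, or both. The function name `tomek_links`, the Euclidean distance, and the toy data are assumptions for illustration, not the authors' code; imbalanced-learn provides `TomekLinks` and `SMOTETomek` for real use.

```python
import math


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


# Hypothetical 2-D toy data: label 0 is the majority class, label 1 the minority.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 0.0), (5.2, 0.0)]
y = [0, 0, 0, 1, 1, 1]

links = tomek_links(X, y)

# Clean by dropping the majority-class member of every link.
drop = {i if y[i] == 0 else j for i, j in links}
X_clean = [x for k, x in enumerate(X) if k not in drop]
y_clean = [label for k, label in enumerate(y) if k not in drop]
```

In the toy data only the pair at x = 1.0 and x = 1.1 is a mutual nearest-neighbour pair with opposite labels, so only the majority point at x = 1.0 is removed.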

35 pages, 10688 KB  
Article
Multi-Armed Bandit Optimization for Explainable AI Models in Chronic Kidney Disease Risk Evaluation
by Jianbo Huang, Long Li and Jia Chen
Symmetry 2025, 17(11), 1808; https://doi.org/10.3390/sym17111808 - 27 Oct 2025
Cited by 1 | Viewed by 1087
Abstract
Chronic kidney disease (CKD) impacts over 850 million people globally, representing a critical public health issue, yet existing risk assessment methodologies inadequately address the complexity of disease progression trajectories. Traditional machine learning approaches encounter critical limitations including inefficient hyperparameter selection and lack of clinical transparency, hindering their deployment in healthcare settings. This study introduces an innovative computational framework that integrates adaptive Multi-Armed Bandit (MAB) strategies with BorderlineSMOTE sampling techniques to improve CKD risk assessment. The proposed methodology leverages XGBoost within an ensemble learning paradigm enhanced by Upper Confidence Bound exploration strategy, coupled with a comprehensive interpretability system incorporating SHAP and LIME analytical tools to ensure model transparency. To address the challenge of algorithmic interpretability while maintaining clinical utility, a four-level risk categorization framework was developed, employing cross-validated stratification methods and balanced performance evaluation metrics, thereby ensuring fair predictive accuracy across diverse patient populations and minimizing bias toward dominant risk categories. Through rigorous empirical evaluation on clinical datasets, we performed extensive comparative analysis against sixteen established algorithms using paired statistical testing with Bonferroni correction. The MAB-optimized framework achieved superior predictive performance with accuracy of 91.8%, F1-score of 91.0%, and ROC-AUC of 97.8%, demonstrating superior performance within the evaluated cohort of reference algorithms (p-value < 0.001). Remarkably, our optimized framework delivered nearly ten-fold computational efficiency gains relative to conventional grid search methods while preserving robust classification performance. 
Feature importance analysis identified albumin-to-creatinine ratio, eGFR measurements, and CKD staging as dominant prognostic factors, demonstrating concordance with established clinical nephrology practice. This research addresses three core limitations in healthcare artificial intelligence: optimization computational cost, model interpretability, and consistent performance across heterogeneous clinical populations, offering a practical solution for improved CKD risk stratification in clinical practice. Full article

37 pages, 2286 KB  
Article
Parameterised Quantum SVM with Data-Driven Entanglement for Zero-Day Exploit Detection
by Steven Jabulani Nhlapo, Elodie Ngoie Mutombo and Mike Nkongolo Wa Nkongolo
Computers 2025, 14(8), 331; https://doi.org/10.3390/computers14080331 - 15 Aug 2025
Cited by 1 | Viewed by 3074
Abstract
Zero-day attacks pose a persistent threat to computing infrastructure by exploiting previously unknown software vulnerabilities that evade traditional signature-based network intrusion detection systems (NIDSs). To address this limitation, machine learning (ML) techniques offer a promising approach for enhancing anomaly detection in network traffic. This study evaluates several ML models on a labeled network traffic dataset, with a focus on zero-day attack detection. Ensemble learning methods, particularly eXtreme gradient boosting (XGBoost), achieved perfect classification, identifying all 6231 zero-day instances without false positives and maintaining efficient training and prediction times. While classical support vector machines (SVMs) performed modestly at 64% accuracy, their performance improved to 98% with the use of the borderline synthetic minority oversampling technique (SMOTE) and SMOTE + edited nearest neighbours (SMOTEENN). To explore quantum-enhanced alternatives, a quantum SVM (QSVM) is implemented using three-qubit and four-qubit quantum circuits simulated on the aer_simulator_statevector. The QSVM achieved high accuracy (99.89%) and strong F1-scores (98.95%), indicating that nonlinear quantum feature maps (QFMs) can increase sensitivity to zero-day exploit patterns. Unlike prior work that applies standard quantum kernels, this study introduces a parameterised quantum feature encoding scheme, where each classical feature is mapped using a nonlinear function tuned by a set of learnable parameters. Additionally, a sparse entanglement topology is derived from mutual information between features, ensuring a compact and data-adaptive quantum circuit that aligns with the resource constraints of noisy intermediate-scale quantum (NISQ) devices. Our contribution lies in formalising a quantum circuit design that enables scalable, expressive, and generalisable quantum architectures tailored for zero-day attack detection. 
This extends beyond conventional usage of QSVMs by offering a principled approach to quantum circuit construction for cybersecurity. While these findings are obtained via noiseless simulation, they provide a theoretical proof of concept for the viability of quantum ML (QML) in network security. Future work should target real quantum hardware execution and adaptive sampling techniques to assess robustness under decoherence, gate errors, and dynamic threat environments. Full article

21 pages, 3919 KB  
Article
Comparative Analysis of Resampling Techniques for Class Imbalance in Financial Distress Prediction Using XGBoost
by Guodong Hou, Dong Ling Tong, Soung Yue Liew and Peng Yin Choo
Mathematics 2025, 13(13), 2186; https://doi.org/10.3390/math13132186 - 4 Jul 2025
Cited by 8 | Viewed by 3801
Abstract
One of the key challenges in financial distress data is class imbalance, where the data are characterized by a highly imbalanced ratio between the number of distressed and non-distressed samples. This study examines eight resampling techniques for improving distress prediction using the XGBoost algorithm. The study was performed on a dataset acquired from the CSMAR database, containing 26,383 firm-quarter samples from 639 Chinese A-share listed companies (2007–2024), with only 12.1% of the cases being distressed. Results show that standard Synthetic Minority Oversampling Technique (SMOTE) enhanced F1-score (up to 0.73) and Matthews Correlation Coefficient (MCC, up to 0.70), while SMOTE-Tomek and Borderline-SMOTE further boosted recall, slightly sacrificing precision. These oversampling and hybrid methods also maintained reasonable computational efficiency. Random Undersampling (RUS) was the fastest method and yielded high recall (0.85), but it suffered from low precision (0.46) and weaker generalization. Among all techniques, Bagging-SMOTE achieved balanced performance (AUC 0.96, F1 0.72, PR-AUC 0.80, MCC 0.68) using a minority-to-majority ratio of 0.15, demonstrating that ensemble-based resampling can improve robustness with minimal impact on the original class distribution, albeit with higher computational cost. The comparative findings highlight that no single approach fits all use cases, and technique selection should align with specific goals. Techniques favoring recall (e.g., Bagging-SMOTE, SMOTE-Tomek) are suited for early warning, while conservative techniques (e.g., Tomek Links) help reduce false positives in risk-sensitive applications, and efficient methods such as RUS are preferable when computational speed is a priority. Full article

19 pages, 2124 KB  
Article
A Unified Deep Learning Ensemble Framework for Voice-Based Parkinson’s Disease Detection and Motor Severity Prediction
by Madjda Khedimi, Tao Zhang, Chaima Dehmani, Xin Zhao and Yanzhang Geng
Bioengineering 2025, 12(7), 699; https://doi.org/10.3390/bioengineering12070699 - 27 Jun 2025
Cited by 4 | Viewed by 3115
Abstract
This study presents a hybrid ensemble learning framework for the joint detection and motor severity prediction of Parkinson’s disease (PD) using biomedical voice features. The proposed architecture integrates a deep multimodal fusion model with dense expert pathways, multi-head self-attention, and multitask output branches to simultaneously perform binary classification and regression. To ensure data quality and improve model generalization, preprocessing steps included outlier removal via Isolation Forest, two-stage feature scaling (RobustScaler followed by MinMaxScaler), and augmentation through polynomial and interaction terms. Borderline-SMOTE was employed to address class imbalance in the classification task. To enhance prediction performance, ensemble learning strategies were applied by stacking outputs from the fusion model with tree-based regressors (Random Forest, Gradient Boosting, and XGBoost), using diverse meta-learners including XGBoost, Ridge Regression, and a deep neural network. Among these, the Stacking Ensemble with XGBoost (SE-XGB) achieved the best results, with an R² of 99.78% and RMSE of 0.3802 for UPDRS regression and 99.37% accuracy for PD classification. Comparative analysis with recent literature highlights the superior performance of our framework, particularly in regression settings. These findings demonstrate the effectiveness of combining advanced feature engineering, deep learning, and ensemble meta-modeling for building accurate and generalizable models in voice-based PD monitoring. This work provides a scalable foundation for future clinical decision support systems.
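For readers unfamiliar with Borderline-SMOTE, the following is a minimal sketch of its first step only: selecting the "danger" minority samples, i.e. those for which at least half, but not all, of the m nearest neighbours belong to the majority class. The function name `danger_points` and the toy data are hypothetical, and this is not the article's implementation; synthetic samples would then be generated only from the selected points.

```python
import math


def danger_points(X, y, minority=1, m=5):
    """Return indices of minority samples on the class border.

    A minority sample is 'danger' when m/2 <= (majority neighbours) < m
    among its m nearest neighbours; all-majority neighbourhoods are
    treated as noise and skipped.
    """
    danger = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # m nearest neighbours among all other samples (both classes)
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:m]
        n_major = sum(1 for j in nearest if y[j] != minority)
        if m / 2 <= n_major < m:
            danger.append(i)
    return danger


# Toy data (illustrative only): a safe minority cluster, two minority
# points next to a majority cluster, and one isolated minority point.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (8.0, 0.0), (9.0, 0.0),
     (10.0, 0.0), (11.0, 0.0), (12.0, 0.0),
     (30.0, 0.0), (32.0, 0.0), (34.0, 0.0), (31.0, 0.0)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(danger_points(X, y, minority=1, m=3))  # -> [3, 4]
```

The isolated minority point at index 11 has only majority neighbours, so it is treated as noise rather than as a border sample.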

44 pages, 13985 KB  
Article
Improving Transformer Health Index Prediction Performance Using Machine Learning Algorithms with a Synthetic Minority Oversampling Technique
by Muhammad Akmal A. Putra, Suwarno and Rahman Azis Prasojo
Energies 2025, 18(9), 2364; https://doi.org/10.3390/en18092364 - 6 May 2025
Cited by 1 | Viewed by 2207
Abstract
Machine learning (ML) has emerged as a powerful tool in transformer condition assessment, enabling more accurate diagnostics by leveraging historical test data. However, imbalanced datasets, often characterized by limited samples of transformers in poor condition, pose significant challenges to model performance. This study investigates the application of oversampling techniques to enhance ML model accuracy in predicting the Health Index of transformers. A dataset comprising 3850 transformer tests collected from utilities across Indonesia was used. Key parameters, including oil quality, dissolved gas analysis, and paper condition factors, were employed as inputs for ML modeling. To address the class imbalance, various oversampling methods, such as the Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, SMOTE-Tomek, and SMOTE-ENN, were implemented and compared. This study explores the impact of these techniques on model performance, focusing on classification accuracy, precision, recall, and F1-score. The results reveal that all SMOTE-based methods improved model performance, with SMOTE-ENN yielding the best outcomes. It significantly reduced classification errors, particularly for minority classes, ensuring better predictive reliability. These findings underscore the importance of advanced oversampling techniques in improving transformer diagnostics. By effectively addressing the challenges posed by imbalanced datasets, this research provides a robust framework for applying ML in transformer condition monitoring and other domains with similar data constraints.
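The core idea shared by the SMOTE variants named above is interpolation between a minority sample and one of its nearest minority neighbours. The sketch below shows that step in plain Python under simplifying assumptions (Euclidean distance, uniform random choice of base sample); `smote_samples` and its parameters are hypothetical names, not the article's code or any library's API.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen base sample
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        # new point lies on the segment between base and neighbour
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, nn)))
    return synthetic


# Toy minority class (illustrative only): corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_samples(minority, n_new=6, k=2, seed=0)
print(len(new_points))
```

Hybrid methods such as SMOTE-Tomek and SMOTE-ENN follow this oversampling step with a cleaning pass that removes samples judged ambiguous by their neighbours; that pass is not shown here.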