MDPI - Publisher of Open Access Journals

26 pages, 3656 KB

Open Access Article

Explainable Machine Learning for Predicting Dengue Recovery Duration: Insights from Multi-Center Clinical Data

by Adam Khan, Asad Ali, Fazal Hanan and Muhammad Ismail Mohmand

Healthcare 2026, 14(13), 1881; https://doi.org/10.3390/healthcare14131881 (registering DOI) - 27 Jun 2026

Viewed by 168

Background: Dengue fever remains a major public health challenge in endemic regions, where recovery duration varies considerably across patients due to a combination of clinical, demographic, and contextual factors. Although machine learning (ML) approaches have increasingly been applied to dengue-related prediction tasks, many existing models operate as black boxes, limiting their interpretability and practical usefulness in healthcare settings. This study presents an Explainable Artificial Intelligence (XAI)-based machine learning framework for analyzing dengue recovery duration using a multi-center clinical dataset collected from healthcare institutions across Khyber Pakhtunkhwa, Pakistan. Methods: Clinical records from 100 laboratory-confirmed dengue patients treated across multiple healthcare institutions were analyzed. The dataset included demographic, socio-economic, and clinical variables. Four machine learning models (Linear Regression, Decision Tree, Random Forest, and Neural Network) were developed and evaluated using 10-fold cross-validation. Explainability techniques, including Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE), and Local Interpretable Model-Agnostic Explanations (LIME), were employed to investigate global and patient-specific factors influencing recovery duration. Results: Among the evaluated models, Random Forest demonstrated the best overall predictive performance, achieving the lowest Root Mean Square Error (RMSE; 11.29 days) and Mean Absolute Error (MAE; 9.09 days), corresponding to a 40.4% reduction in prediction error compared with Linear Regression. Decision Tree also showed substantial improvement, reducing RMSE by 37%, whereas the Neural Network achieved a more modest improvement of 8.6%. Although all models exhibited relatively low coefficient of determination (R²) values (maximum R² = 0.026), the explainability analyses consistently identified age and platelet count as the most influential predictors of recovery duration. Older age and lower platelet counts were generally associated with longer recovery periods, while hospital type, education level, and blood group also contributed to prediction outcomes. ICE and LIME analyses further revealed considerable patient-level heterogeneity, indicating that recovery trajectories are shaped by complex interactions among clinical, demographic, and contextual factors rather than a single dominant predictor.
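
The error metrics quoted in this abstract can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the study; the sketch only shows how RMSE, MAE, and a relative error reduction against a baseline are typically computed.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical recovery durations in days (illustrative only, not study data).
observed = [10.0, 14.0, 21.0, 30.0]
predicted = [12.0, 13.0, 18.0, 34.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))          # 2.5
print(relative_reduction(20.0, 15.0))    # 25.0
```

Because RMSE squares each residual before averaging, it weights large misses more heavily than MAE, which is one reason the two are usually reported together.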

18 pages, 1548 KB

Open Access Article

Machine Learning-Based Diabetes Risk Prediction via DiaHealth Dataset with Explainable AI and Streamlit Deployment

by Samson Adeyemi, Muhammad Zahid Iqbal and Md Golam Muttaquee Talukder

Future Internet 2026, 18(6), 331; https://doi.org/10.3390/fi18060331 - 21 Jun 2026

Viewed by 323

Abstract

The growing worldwide prevalence of Diabetes Mellitus highlights the urgent need for effective early detection methods to enable prompt intervention. This study develops a machine learning-based decision-support prototype for predicting diabetes risk using health metrics from the DiaHealth dataset, a recently published Bangladeshi open-source dataset for Type 2 diabetes prediction. Five supervised learning algorithms were evaluated: Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Decision Tree (DT), and Random Forest (RF). Models were assessed across three stages: before feature scaling, after standardisation, and following hyperparameter optimisation via GridSearchCV, using accuracy, precision, recall, and F1-score as evaluation metrics. LR and SVM showed marked improvements after standardisation, consistent with their sensitivity to feature magnitude, whilst tree-based approaches such as DT and RF remained largely unchanged. KNN displayed minimal sensitivity to scaling, which is discussed in relation to the feature distributions of the dataset. Following hyperparameter tuning, RF achieved the highest accuracy of 95%, outperforming all other models. RF predictions were interpreted using Local Interpretable Model-agnostic Explanations (LIME) to promote transparency in model decision-making. The best-performing model was subsequently deployed as an interactive web-based prototype application using Streamlit, providing real-time prediction outputs. These findings demonstrate how preprocessing choices and hyperparameter tuning can differentially affect algorithm performance and illustrate the potential of combining explainable AI with practical deployment for diabetes risk assessment in a research context.
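
The standardisation step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming z-score scaling with statistics estimated on the training split only; the function names and feature values are hypothetical and do not come from the DiaHealth dataset or the authors' pipeline.

```python
from statistics import mean, pstdev

def fit_standardiser(column):
    """Estimate the mean and population standard deviation from training values."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        sigma = 1.0  # constant feature: avoid division by zero
    return mu, sigma

def standardise(column, mu, sigma):
    """Apply z-score scaling with previously fitted statistics."""
    return [(x - mu) / sigma for x in column]

# Hypothetical feature values (illustrative only).
train_feature = [4.0, 6.0, 8.0]
test_feature = [10.0]

mu, sigma = fit_standardiser(train_feature)
scaled_train = standardise(train_feature, mu, sigma)
scaled_test = standardise(test_feature, mu, sigma)  # reuses training statistics

print(scaled_train)
print(scaled_test)
```

Fitting the statistics on the training split and reusing them for the test split keeps test information out of preprocessing. Tree-based models split on thresholds, so a monotonic rescaling of a feature leaves their partitions unchanged, which is consistent with the behaviour the abstract reports for DT and RF.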

(This article belongs to the Special Issue The Future Internet of Medical Things, 3rd Edition)

38 pages, 3705 KB

Open Access Article

Is the Visual Explanation of Deep Learning Robust? Statistical Evaluation of Popular Visual Explanation Methods on State-of-the-Art Convolutional Neural Networks in Classification Tasks

by Justyna Golec and Tomasz Hachaj

Electronics 2026, 15(12), 2526; https://doi.org/10.3390/electronics15122526 - 8 Jun 2026

Viewed by 287

Abstract

Many methods have been proposed for visualizing and interpreting the results of artificial intelligence (AI) algorithms. AI explainability (XAI) methods vary in mathematical basis, effectiveness, and scope of application. This raises an important question: how do their results differ from a statistical point of view, and are some of them more useful than others in certain scenarios? Our article aims to assess the robustness of the most popular visualization methods for explaining AI models and to identify differences in the results obtained. We did this by analyzing fundamental convolutional neural network models that classified 598 cat images from the Oxford-IIIT Pet database and 580 filtered pictures of Boeing planes from the Aircraft Images Dataset. We performed a comparative analysis of the similarities between methods based on Class Activation Mapping (CAM), gradients, and Local Interpretable Model-agnostic Explanations (LIME). To evaluate them, we used the Pearson Correlation Coefficient (CC), Matthews Correlation Coefficient (MCC), Spearman’s rank correlation, Structural Similarity Index Measure (SSIM), Kullback–Leibler divergence, Intersection over Union (IoU), and Soft IoU. To check the fidelity and robustness of the XAI methods, we used RandomCAM and ran an ablation test, checking for a decrease in prediction confidence as we gradually removed the least significant regions. Our results provide an up-to-date and broad comparative analysis of this field. They can serve as a reference point for machine learning scientists and engineers.
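
Two of the similarity measures listed in this abstract, IoU and Soft IoU, can be sketched for a pair of saliency maps. The abstract does not give the exact definitions used, so the Soft IoU below is one common formulation (sum of element-wise minima over sum of element-wise maxima) and should be read as an assumption; the maps and the 0.5 threshold are hypothetical.

```python
def binarise(saliency, threshold):
    """Turn a saliency map into a 0/1 mask at the given threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]

def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks of equal shape."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 1.0

def soft_iou(map_a, map_b):
    """Soft IoU on non-negative maps: sum of minima over sum of maxima."""
    num = den = 0.0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            num += min(a, b)
            den += max(a, b)
    return num / den if den else 1.0

# Hypothetical 2x2 saliency maps from two explanation methods.
map_a = [[0.9, 0.2], [0.6, 0.0]]
map_b = [[0.8, 0.7], [0.1, 0.0]]

print(iou(binarise(map_a, 0.5), binarise(map_b, 0.5)))
print(soft_iou(map_a, map_b))
```

The hard IoU depends on the chosen threshold, whereas the soft variant compares the continuous activations directly, which may explain why both are reported side by side.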

(This article belongs to the Special Issue Artificial Intelligence in Computer Vision: Advances and Applications)

26 pages, 3065 KB

Open Access Article

ML-BUSMetab: Machine Learning-Based Metabolomic Profiling for Predicting Aspirin Response in Colorectal Cancer Chemoprevention: A Multi-Model Explainable Artificial Intelligence Approach with External Validation

by Abdulvahap Pınar, Ahmet Kadir Arslan and Cemil Çolak

J. Clin. Med. 2026, 15(11), 4287; https://doi.org/10.3390/jcm15114287 - 1 Jun 2026

Viewed by 318

Abstract

Background/Objectives: Aspirin-based colorectal cancer (CRC) chemoprevention remains a promising yet individually variable strategy. As a proof-of-concept toward future personalized chemoprevention frameworks, we aimed to develop and validate machine learning (ML) models capable of distinguishing aspirin-exposed from placebo-exposed participants based on their plasma metabolomic signatures, thereby characterizing the metabolomic footprint of aspirin administration rather than directly predicting clinical chemoprevention benefit. Methods: Training was performed on the Aspirin/Folate Polyp Prevention Study (AFPPS) dataset ST001422 (n = 300) and external validation on ST001423 (n = 223). After multi-method consensus feature selection, which reduced 19,433 features to 300, sixteen ML and deep learning (DL) architectures were benchmarked under nested cross-validation. Model interpretability was assessed using SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) analyses. Results: GBM_sklearn achieved the highest cross-validation Precision–Recall AUC (PR-AUC) of 0.945, while ensemble stacking (Stack_LGB) offered superior calibration (Brier = 0.117). DL models consistently underperformed traditional ML (PR-AUC: 0.673–0.843 vs. 0.881–0.945), attributable to limited sample size. SHAP and LIME analyses independently identified m/z 196.0604 (C18, RT 89.4 s) as the top metabolic biomarker, consistent with aspirin-induced glycerophospholipid pathway alterations. External validation performance degraded substantially (PR-AUC: 0.945 → 0.711), attributable to inter-study analytical batch effects. Conclusions: This framework demonstrates the feasibility of metabolomics-driven personalized chemoprevention, although the high feature-to-sample ratio (300:300) and the substantial drop between internal and external performance indicate that the cross-validation estimates likely include dataset-specific noise in addition to the true biological signal. The findings highlight batch harmonization and aggressive feature reduction (e.g., LASSO/RFE-based selection of 10–20 high-impact metabolites) as prerequisites for clinical translation.
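
The calibration figure quoted in this abstract is a Brier score. As a minimal sketch, the Brier score for a binary classifier is the mean squared difference between predicted probabilities and observed 0/1 labels; the probabilities and labels below are hypothetical and unrelated to the AFPPS data.

```python
def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)

# Hypothetical predicted probabilities of aspirin exposure and true labels.
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 1, 0]

print(brier_score(probs, labels))

# A model that always predicts 0.5 scores 0.25 regardless of the labels.
print(brier_score([0.5] * 4, labels))  # 0.25
```

Lower is better, with 0 for perfect probabilistic predictions; the constant-0.5 value of 0.25 is a convenient reference point when reading a reported score such as 0.117.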

(This article belongs to the Section Gastroenterology & Hepatopancreatobiliary Medicine)

18 pages, 1448 KB

Open Access Article

Trustworthy Assessment of University Competitiveness Using a Neural Network Model

by Tadeusz A. Grzeszczyk

Information 2026, 17(6), 536; https://doi.org/10.3390/info17060536 - 1 Jun 2026

Viewed by 266

Abstract

Universities compete for funding, and their positions depend on the results of national assessments and rankings, which are expensive to produce and based on difficult-to-predict expert opinions. Assessment results have a significant impact on a university’s reputation, funding levels, attractiveness to faculty and staff, and success in recruiting top-tier students. Expert assessments and forecasts are widely used, but additional support from trusted AI tools is desirable. Several attempts have been made to use various machine learning methods, but confidence in such solutions is limited due to perceived difficulties in clearly and reliably justifying the resulting predictions. This research proposes using neural network models, accompanied by explanations of their predictions, to support trustworthy and sustainable assessment of university competitiveness. This methodological contribution enhances the transparency and interpretability of the assessment process and is further supported by empirical studies based on data from selected universities. A Fully Connected Neural Network (FCNN) is used for the calculations, and the Local Interpretable Model-agnostic Explanations (LIME) method is applied to explain the prediction results. The results confirm the usefulness of the proposed model and provide a solid foundation for improving evaluation systems and building trust in AI applications for assessing universities’ competitive position and the benefits of scientific research for society.

(This article belongs to the Section Artificial Intelligence)

44 pages, 17845 KB

Open Access Article

Explainable Machine Learning Framework for Automotive Fuel Efficiency and CO₂ Emission Estimation: A Comparative Study Toward Environmental Sustainability

by Md Monir Ahammod Bin Atique, Md Tareq Zaman, Salman Jahan, Masud Rana and Jeong-Hun Park

Energies 2026, 19(11), 2664; https://doi.org/10.3390/en19112664 - 31 May 2026

Viewed by 289

Abstract

The transportation sector is the primary consumer of vehicle fuel worldwide and is thus a major contributor to climate change via carbon dioxide (CO₂) emissions. In addition to severe environmental impacts, such as global warming, droughts, floods, and rising sea levels, these emissions have a negative effect on public health by increasing the prevalence of respiratory disease. Achieving environmental sustainability through regulatory oversight requires a strong understanding of vehicular fuel consumption and CO₂ emissions. However, accurate modeling of these remains challenging due to the complex non-linear relationships between various vehicular characteristics and the lack of interpretability of many predictive models. Traditional linear models often fail to capture high-dimensional data complexities, while black-box methods provide few actionable insights for policymaking. To address these gaps, we developed a robust and data-driven two-stage machine-learning (ML) framework designed to enhance model performance and reliability. First, we implemented standard data preprocessing, enhanced feature engineering, and hyperparameter tuning for 14 cutting-edge ML algorithms and three advanced modeling techniques to explore their predictive performance. Second, we introduced three explainable AI (XAI) approaches. These were evaluated on a publicly available Kaggle static dataset of 550 vehicles, dominated by gasoline-powered vehicles, with only two diesels and two electric vehicles. The tuned CatBoost model demonstrated strong predictive performance, achieving an R² of 0.9260, a root mean square error (RMSE) of 1.1759, and a mean absolute error (MAE) of 0.8147. In parallel, we deterministically estimated CO₂ emissions from fuel consumption, which provides direct estimates of tailpipe emissions. To ensure transparency and model interpretability, we employed Shapley additive explanations, local interpretable model-agnostic explanations, and permutation importance to identify the key factors contributing to the model predictions. Across the explainability analyses, cylinder count, front-wheel drive (drive_fwd), and the displacement–year interaction were the primary contributors to the predicted combined miles per gallon; in other words, they strongly affected fuel consumption. Collectively, these findings demonstrate the ability of the proposed model to capture complex feature relationships; thus, it offers a valuable tool for researchers and policymakers in sustainability planning and emission control. Future research should focus on real-time driving or dynamic measurement data and on enhancing practical applications to further reduce emissions and promote environmental sustainability.
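
The deterministic CO₂ estimate mentioned in this abstract can be sketched as a unit conversion from fuel economy. The abstract does not state which conversion factors the authors used; the sketch below assumes the U.S. EPA figures of 8,887 g CO₂ per gallon of gasoline and 10,180 g per gallon of diesel, and the function name is hypothetical.

```python
# Assumed tailpipe emission factors in grams of CO2 per gallon burned
# (U.S. EPA figures; the paper's own factors are not given in the abstract).
GRAMS_CO2_PER_GALLON = {"gasoline": 8887.0, "diesel": 10180.0}

def co2_grams_per_mile(combined_mpg, fuel="gasoline"):
    """Convert combined miles per gallon into tailpipe CO2 in grams per mile."""
    if combined_mpg <= 0:
        raise ValueError("combined_mpg must be positive")
    return GRAMS_CO2_PER_GALLON[fuel] / combined_mpg

# Hypothetical vehicle rated at 25 mpg combined.
print(co2_grams_per_mile(25.0))
print(co2_grams_per_mile(25.0, fuel="diesel"))
```

Because the conversion is a fixed ratio, any error in the predicted miles per gallon carries straight through to the emission estimate, so the estimate is only as reliable as the fuel-economy prediction behind it.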

(This article belongs to the Special Issue Waste-to-Energy Technologies for Circular Economy and Carbon Neutrality)

28 pages, 8906 KB

Open Access Article

Machine Learning-Based Prediction of Polymer Properties Using Structure–Property Relationship Modeling

by Mohammod Hafizur Rahman, Md Arifuzzaman, Md Ehtesamul Haque, Ramasamy Srinivasaga Naidu, Md Enamul Hoque and Muhammad Ali Martuza

Polymers 2026, 18(11), 1320; https://doi.org/10.3390/polym18111320 - 27 May 2026

Viewed by 652

Abstract

The rapid advancement of Machine Learning (ML) has significantly transformed polymer science by enabling efficient prediction and design of polymer properties through high-throughput screening. However, current methods still struggle with nonlinear Structure–Property Relationships (SPRs), limited dataset standardization, and computational inefficiency, which restrict prediction accuracy and interpretability. This study proposes a comprehensive ML-based framework for predicting polymer properties and identifying SPRs. The approach integrates data preprocessing, molecular descriptor and topological index–based feature extraction, iterative feature selection, and XGBoost predictive modeling. Model hyperparameters are optimized using the Starfish Optimization Algorithm (SOA) to enhance performance and efficiency. Model interpretability is achieved through SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), providing both global and local insights into the influence of molecular features on polymer properties. Experimental evaluation on the PolyOne dataset demonstrates strong predictive performance, with R² values exceeding 0.92, mean absolute error (MAE) below 0.08, and root mean square error (RMSE) under 0.12 for key physical and optical polymer properties. Overall, the proposed framework effectively balances accuracy, computational efficiency, and interpretability, offering a robust and practical tool for accelerating polymer design while enhancing understanding of molecular structure–property relationships.

(This article belongs to the Section Artificial Intelligence in Polymer Science)

23 pages, 21478 KB

Open Access Article

Explainable Split-Learning-Based Framework for Accurate Pulmonary Nodule Classification

by Amira Bouamrane, Makhlouf Derdour, Ahmed Alksas, Norah Saleh ALghamdi, Mohamed Ghazal and Ayman El-Baz

Bioengineering 2026, 13(5), 552; https://doi.org/10.3390/bioengineering13050552 - 13 May 2026

Viewed by 394

Abstract

Lung cancer rates are the highest among cancers, making it the leading cause of cancer death worldwide. With advances in new technologies and diverse diagnostic methods, Computer-Aided Diagnosis Systems (CADx) have improved pulmonary nodule classification with notable accuracy and speed. However, limited data availability and privacy concerns remain significant challenges, in addition to the reported rates of false negatives and false positives. This work aims to develop an approach based on collaborative feature extraction between multiple centers, thus achieving data efficiency and diversity while ensuring privacy and reducing false positives and false negatives. It proposes a new explainable feature-based split learning approach using diverse Computed Tomography (CT) scan datasets to evaluate data diversity and privacy. It adopts a split ResNet-50 architecture on the client side for feature extraction. On the server side, a hybrid 2D-CNN combined with an attention mechanism is used for final classification and decision-making. The architecture was evaluated using two ablation studies based on ConvNeXt-Tiny and EfficientNetB0. In addition, the model was tested on two external datasets to assess its robustness and generalizability, and with Local Interpretable Model-agnostic Explanations (LIME) and Grad-CAM to assess trustworthiness. The proposed approach achieved an accuracy and F1-score of 99.38%, with a 1.23% false negative rate and zero false positives. When tested on entirely unseen datasets, it achieved an accuracy and F1-score of 99.28% on the first dataset, with 1.24% false negatives and 0% false positives. On the second dataset, it reached 95.74% accuracy, with false negative and false positive rates of 7.07% and 1.41%, respectively, indicating an ability to generalize.
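
The false negative and false positive rates quoted in this abstract are derived from a confusion matrix. The sketch below shows that derivation. The counts are hypothetical, picked only because they yield rates close to the headline figures; the abstract does not report the actual confusion matrix, and the function name is an assumption.

```python
def error_rates(tp, fp, tn, fn):
    """Accuracy, false negative rate, and false positive rate from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly positive cases (malignant nodules) that were missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        # Share of truly negative cases that were flagged as positive.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Hypothetical counts (not from the paper): 81 positives, 81 negatives, one miss.
rates = error_rates(tp=80, fp=0, tn=81, fn=1)
for name, value in rates.items():
    print(f"{name}: {100 * value:.2f}%")
```

The example makes the asymmetry visible: a single missed positive among 81 already produces a false negative rate above 1%, even though overall accuracy stays above 99%.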

(This article belongs to the Section Biosignal Processing)

25 pages, 3031 KB

Open Access Article

Explainable Transformer-Based Framework for Suicide Risk Detection: Deep Learning with Interpretability for Mental Health Crisis Identification

by Muhammad Azhar, Muhammad Arman, Adeen Amjad, Deshinta Arrova Dewi, Muhammad Usman Ahmad and Shafiq Hussain

Information 2026, 17(5), 448; https://doi.org/10.3390/info17050448 - 6 May 2026

Viewed by 376

Abstract

Suicide is a growing public health concern, and suicide-related content is increasingly prevalent on social media. The severity of this issue highlights the need for improved methods for detecting suicide risk. Many current deep learning approaches do not possess the required level of explainability for application in clinical settings. This study proposes a transformer-based framework called “CrisisFormer,” which was trained on an imbalanced dataset containing 40,000 Reddit posts from the Suicide Watch subreddit and enhanced using DistilBERT. Additionally, the CrisisFormer framework uses three forms of explainable artificial intelligence for interpreting results: SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), and transformer attention visualizations. The CrisisFormer framework achieved superior results for detecting the risk of suicide, with 96.25% accuracy, 96.30% precision, 96.25% recall, 96.25% F1 score, and 0.9944 AUC, compared to traditional models such as CNN, LSTM, and BiLSTM. Furthermore, by including clinically relevant suicide terms in its results, CrisisFormer demonstrates a high potential for incorporation into real-world mental health systems for intervention during ongoing mental health crises.

(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence, 2nd Edition)

15 pages, 1448 KB

Open Access Article

Integrating Risk Factors and Symptoms for Urinary Tract Infection Diagnosis Using an Explainable AI Approach in Low-Resource Regions

by Kingsley Attai, Daniel Asuquo, Kingsley Akputu, Okure Obot, Cornelia Thomas, Faith-Valentine Uzoka, Ekerette Attai, Christie Akwaowo and Faith-Michael Uzoka

Information 2026, 17(5), 435; https://doi.org/10.3390/info17050435 - 1 May 2026

Viewed by 297

Abstract

Urinary Tract Infections (UTIs) represent one of the most prevalent bacterial infections globally, posing significant health burdens, especially in low- and middle-income countries (LMICs), due to delayed diagnoses, limited access to laboratory services, and rising antimicrobial resistance. This study presents a machine learning (ML)-based diagnostic support framework for early UTI detection, leveraging structured clinical data and explainable artificial intelligence (XAI) techniques to enhance interpretability and trust among healthcare providers. A dataset of 4865 patient records was used to train and test Extreme Gradient Boosting (XGBoost), Decision Tree (DT), and Random Forest (RF) classifiers, while class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). Model performance was evaluated using accuracy, precision, recall, F1-score, Log Loss, and AUC-ROC, and Random Forest showed the best results (accuracy: 86.43%, F1-score: 86.71%, AUC-ROC: 0.8695). To support adoption by stakeholders in the health sector, Local Interpretable Model-agnostic Explanations (LIME) were integrated, which identified painful urination, urinary frequency, and suprapubic pain as the primary predictors in the model. This study shows that interpretable ML models can help predict UTIs in resource-limited regions, offering a means to improve infection management there.

(This article belongs to the Section Artificial Intelligence)

35 pages, 14306 KB

Open Access Article

Enhancing SDN Intrusion Detection via Multi-Hybrid Deep Learning Fusion and Explainable AI

by Usman Ahmed and Muhammad Tariq Sadiq

Mathematics 2026, 14(9), 1498; https://doi.org/10.3390/math14091498 - 29 Apr 2026

Viewed by 442

Abstract

Software-defined networking (SDN) represents a paradigm shift in network management, but its centralized control plane introduces new and severe security vulnerabilities. Conventional intrusion detection systems, including signature- and rule-based methods, lack adaptability and interpretability in the face of evolving threats. This paper proposes a multi-hybrid deep learning fusion ensemble (MHDLFE) to enhance intrusion detection in SDN environments. The framework integrates Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) models via feature fusion and a meta-classifier, thereby improving both detection performance and robustness. To address the critical need for transparency in security systems, the proposed approach incorporates Explainable AI techniques, specifically Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), providing interpretable insights into model decisions. The proposed model achieves strong performance on the NSL-KDD and CIC-IDS2017 datasets, attaining near-perfect binary classification scores of 97.91% and 93.30%, and multiclass accuracies of 98.61% and 97.91%, respectively. These results demonstrate that the proposed framework delivers an effective and trustworthy SDN intrusion detection system by combining deep learning, ensemble fusion, and explainable AI to support accurate, transparent, and reliable cybersecurity decision-making.

(This article belongs to the Special Issue Advanced Applications of Deep Learning Methods: Interdisciplinary Perspectives)

20 pages, 1432 KB

Open Access Article

Towards Classifying Obesity Risk: A Cross-Validated XGBoost Model Optimized for Imbalanced Data

by Jamal Haggouni, Salma Azzouzi and Moulay El Hassan Charaf

Obesities 2026, 6(3), 27; https://doi.org/10.3390/obesities6030027 - 28 Apr 2026

Viewed by 647

Abstract

Obesity is ranked as one of the biggest health challenges facing humanity today. Globally, the number of obese people has almost tripled since 1975, and this lifestyle disease currently affects hundreds of millions of adults, who face major health problems such as heart disease, type 2 diabetes, and some cancers that weigh heavily on global health systems. To maintain methodological rigor, the anthropometric variables Height and Weight were intentionally excluded from the features: the obesity class labels are derived from these measurements, so including them would introduce target leakage. All models were individually tuned with Optuna (50 trials, TPE sampler), and class imbalance was handled with the synthetic minority over-sampling technique (SMOTE), applied only within training folds. The models were evaluated by stratified five-fold cross-validation, with the macro-averaged F1-score as the main evaluation metric. The best model was the fine-tuned XGBoost, which achieved a test macro F1-score of 0.872 and a macro-AUC of 0.977, outperforming Random Forest (F1 = 0.869), MLP (F1 = 0.777), and Logistic Regression (F1 = 0.605). This suggests that behavioral and lifestyle variables may carry sufficient signal to identify obesity status even when no direct anthropometric measurements are available. However, these results represent performance on a single public benchmark dataset only, so they cannot be taken as proof that the model would perform well in real-world clinical settings. This study highlights the importance of rigorous, leakage-free evaluation in machine learning-based obesity research. Future work should focus on external validation using independent clinical cohorts, the integration of interpretability techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), and subgroup analyses by sex and gender to assess model fairness and clinical equity.
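
The macro-averaged F1-score used as the main metric in this abstract can be sketched as follows. This is a minimal illustration with hypothetical labels and a hypothetical function name, not the authors' evaluation code; it shows why macro averaging is a common choice for imbalanced classes.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical imbalanced labels: class 0 is the majority, class 1 the minority.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)
print(macro_f1(y_true, y_pred))
```

In this toy case a single error on the minority class pulls the macro F1 below the plain accuracy, because the minority class contributes half of the average despite holding a third of the samples.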

(This article belongs to the Special Issue Obesity and Its Comorbidities: Prevention and Therapy 2026)

32 pages, 6911 KB

Open AccessArticle

Predicting the Strength of Sustainable Graphene-Enhanced Cementitious Composites Using Novel Machine Learning and Explainable AI Techniques

by Sanjog Chhetri Sapkota, Moinul Haq, Bipin Thapa, Sabin Adhikari, Anupam Dhakal, Roshan Paudel, Aashish Ghimire and Tushar Bansal

Infrastructures 2026, 11(5), 146; https://doi.org/10.3390/infrastructures11050146 - 24 Apr 2026

Cited by 1 | Viewed by 772

Abstract

Predicting the compressive strength (CS) of sustainable concrete reinforced with graphene nanoplatelets (GNPs) is difficult because of nonlinear interactions between chemical composition, dispersion state, and curing conditions. To address this, an interpretable ensemble machine learning framework is developed to provide accurate predictions of CS. The input parameters are sand content, graphene diameter, graphene thickness, GNP-to-sand percentage (GNP%; w/w), water-to-cement ratio (W/C), ultrasonication time (UST; s), and curing age (CA; days), while CS (in MPa) is the target output. The random forest (RF) and XGBoost (XGB) models are combined with two novel metaheuristic optimization techniques, the Drawer-based optimization algorithm (DOA) and the Giant Trevally Optimizer (GTO), to enhance hyperparameter tuning and generalization. Among all models, the DOA-XGB hybrids are the most predictive, with testing R² values up to 0.98, an RMSE of around 2.9 MPa, an MAE of approximately 2.0 MPa, and well over 97% of predictions within the ±20% prediction error boundaries. Explainable artificial intelligence methods, namely SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), partial dependence plots, and Individual Conditional Expectation plots, reveal curing age and graphene thickness as the dominant parameters. Strengths above 70 MPa are consistently achieved with higher curing age, a W/C ratio from 0.3 to 0.4, and a graphene dosage from 0.5 to 2.5%. A Python GUI is developed for efficient and accurate strength predictions suitable for practical applications. The proposed approach provides a robust, interpretable, and efficient alternative to extensive testing for GNP-reinforced concrete.
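
The error measures quoted above can be illustrated with a short sketch. It assumes that the ±20% boundary means the absolute error is at most 20% of the measured strength; the abstract does not define the boundary, so that reading, the helper name `regression_report`, and the sample values are all hypothetical.

```python
import math


def regression_report(y_true, y_pred, tolerance=0.20):
    """Return RMSE, MAE and the share of predictions inside a relative band."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumed definition: |error| <= tolerance * |measured value|.
    inside = sum(
        1 for t, e in zip(y_true, errors) if abs(e) <= tolerance * abs(t)
    )
    return {"rmse": rmse, "mae": mae, "within_tolerance": inside / n}


# Hypothetical measured and predicted strengths in MPa (not from the paper).
measured = [40.0, 50.0, 60.0, 80.0]
predicted = [42.0, 47.0, 75.0, 80.0]
report = regression_report(measured, predicted)
```

Because RMSE squares each error before averaging, the single 15 MPa miss in this toy data dominates it, whereas MAE weights all errors linearly; reporting both, as the abstract does, shows whether a few large misses drive the error.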

(This article belongs to the Special Issue AI in Sustainable and Resilient Infrastructures: Construction, Management, and Maintenance)

16 pages, 2289 KB

Open AccessProceeding Paper

An Efficient Hybrid Framework for Weld Defect Detection Using GAN, CNN and XGBoost

by Kalyanaraman Pattabiraman, Ashish Patil, Yash Gulavani, Ritik Malik and Atharva Gai

Eng. Proc. 2026, 130(1), 9; https://doi.org/10.3390/engproc2026130009 - 22 Apr 2026

Viewed by 614

Abstract

Automated detection of weld defects is essential for assuring structural integrity, but it faces serious challenges due to the microscopic scale of the discontinuities, low visual contrast, and the scarcity of defect samples. Conventional deep learning methods, while accurate, often lack interpretability and exhibit low recall for rare defects. This paper proposes a novel hybrid system combining a Generative Adversarial Network (GAN), a Convolutional Neural Network (CNN), and Extreme Gradient Boosting (XGBoost 2.0.0) to enhance weld defect classification performance and transparency. First, a Deep Convolutional GAN (DCGAN) generates synthetic images of the minority classes to address class imbalance. Then, a pretrained ResNet50V2 CNN extracts deep-layer features from both the original and the generated images. These features are fed into an XGBoost classifier, which uses tree-based learning to optimize classification results and make the process more understandable to the user. Interpretation is further supported by Grad-CAM visualization of the CNN regions of interest and by SHAP analysis, which quantifies the contribution of each feature in XGBoost. Experiments on the available LoHi-WELD datasets show that overall accuracy is significantly improved, and that per-class recall for rare defects and robustness are also enhanced. The proposed hybrid method not only achieves better results but also generates visual, explainable output, which is valuable when the system is deployed in industrial welding inspection. This paper bridges the latest AI technology and the practical interpretability requirements of the mechanical and welding engineering fields.

(This article belongs to the Proceedings of The 19th Global Congress on Manufacturing and Management (GCMM 2025))

14 pages, 730 KB

Open AccessProceeding Paper

Lightweight and Transparent Intrusion Detection in the Internet of Medical Things: The Role of Explainable AI

by Rawan Abdulaziz AlRumaih, Tarek Moulahi and Dina M. Ibrahim

Comput. Sci. Math. Forum 2026, 13(1), 5; https://doi.org/10.3390/cmsf2026013005 - 16 Apr 2026

Viewed by 557

Abstract

The rise of the Internet of Medical Things (IoMT) has transformed healthcare through real-time monitoring and improved outcomes but has also introduced critical security and privacy challenges. This paper presents a focused survey of Explainable AI (XAI) approaches for intrusion detection in IoMT, emphasizing methods that are lightweight, transparent, and deployable under resource constraints. We first clarify XAI terminology and taxonomy (global vs. local scope; ante hoc vs. post hoc; model-agnostic vs. model-specific) and then systematize works from the past five years across cybersecurity sub-domains relevant to eHealth. Representative pipelines span classical ML (e.g., LR, RF, SVM, and XGBoost) and deep models (e.g., DNNs and SRU/LSTM), with post hoc explainers, especially SHAP and LIME, dominating practice on benchmark datasets such as CICIDS2017, NSL-KDD, ToN-IoT, WUSTL-EHMS, and CICIoMT2024. Our comparative analysis highlights consistent gains from model ensembling and interpretable feature selection while uncovering key gaps: limited real-world validation, inconsistent explainability metrics, adversarial brittleness, and the computational cost of generating explanations at the edge.

(This article belongs to the Proceedings of The 1st International Conference on Emerging Tech & Innovation (ICETI))
