MDPI - Publisher of Open Access Journals

25 pages, 3685 KB

Open Access Article

Explainable Meta-Learning Ensemble Framework for Predicting Insulin Dose Adjustments in Diabetic Patients: A Comparative Machine Learning Approach with SHAP-Based Clinical Interpretability

by Emek Guldogan, Burak Yagin, Hasan Ucuzal, Abdulmohsen Algarni, Fahaid Al-Hashem and Mohammadreza Aghaei

Medicina 2026, 62(3), 502; https://doi.org/10.3390/medicina62030502 - 9 Mar 2026

Viewed by 143

Abstract

Background and Objectives: Diabetes mellitus represents one of the most prevalent chronic metabolic disorders worldwide, necessitating precise insulin dose management to prevent both acute and long-term complications. The optimization of insulin dosing remains a significant clinical challenge, as inappropriate dosing can lead to hypoglycemia or hyperglycemia, each carrying substantial morbidity risks. Machine learning approaches have emerged as promising tools for developing clinical decision support systems; however, their practical implementation requires both high predictive accuracy and model interpretability. This study aimed to develop and evaluate an explainable machine learning framework for predicting insulin dose adjustments in diabetic patients. We sought to compare multiple ensemble learning approaches and identify the optimal model configuration that balances predictive performance with clinical interpretability through comprehensive SHAP and LIME analyses. Materials and Methods: A comprehensive dataset comprising 10,000 patient records with 12 clinical and demographic features was utilized. We implemented and compared nine machine learning models, including gradient boosting variants (XGBoost, LightGBM, CatBoost, GradientBoosting), AdaBoost, and four ensemble strategies (Voting, Stacking, Blending, and Meta-Learning). Model interpretability was achieved through SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) analyses. Performance was evaluated using accuracy, weighted F1-score, area under the receiver operating characteristic curve (AUC-ROC), precision-recall AUC (PR-AUC), sensitivity, specificity, and cross-entropy loss. Results: The Meta-Learning Ensemble achieved superior performance across all evaluation metrics, attaining an accuracy of 81.35%, weighted F1-score of 0.8121, macro-averaged AUC-ROC of 0.9637, and PR-AUC of 0.9317. 
The model demonstrated exceptional sensitivity (86.61%) and specificity (91.79%), with particularly high performance in detecting dose reduction requirements (100% sensitivity for the ‘down’ class). SHAP analysis revealed insulin sensitivity, previous medications, sleep hours, weight, and body mass index as the most influential predictors across different insulin adjustment categories. The meta-model feature importance analysis indicated that LightGBM probability estimates contributed most significantly to the ensemble predictions. Conclusions: The proposed explainable Meta-Learning Ensemble framework demonstrates robust predictive capability for insulin dose adjustment recommendations while maintaining clinical interpretability. The integration of SHAP-based explanations facilitates clinician understanding of model predictions, supporting transparent and informed decision-making in diabetes management. This approach represents a significant advancement toward the clinical implementation of artificial intelligence in personalized insulin therapy.

(This article belongs to the Special Issue Application of Artificial Intelligence in Disease Diagnosis and Treatment)

30 pages, 2628 KB

Open Access Article

Predicting Bond Defaults in China: A Double-Ensemble Model Leveraging SMOTE for Class Imbalance

by Chongwen Tian and Rong Li

Big Data Cogn. Comput. 2026, 10(3), 81; https://doi.org/10.3390/bdcc10030081 - 6 Mar 2026

Viewed by 193

Abstract

This study proposes the Double-Ensemble Learning Classification with SMOTE (DELC-SMOTE), a novel hierarchical framework designed to address the critical challenge of severe class imbalance in financial bond default prediction. The model integrates the Synthetic Minority Over-sampling Technique (SMOTE) into a two-phase ensemble architecture. The first phase employs introspective stacking, where six heterogeneous base learners are individually enhanced through algorithm-specific balancing and meta-learning. The second phase fuses these optimized experts via performance-weighted voting. Empirical analysis utilizes a comprehensive dataset of 10,440 Chinese corporate bonds (522 defaults, ~5% default rate) sourced from Wind and CSMAR databases. Given the high cost of both false negatives and false positives in risk assessment, the Geometric Mean (G-mean) and Specificity are employed as primary evaluation metrics. Results demonstrate that the proposed DELC-SMOTE model significantly outperforms individual base classifiers and benchmark ensemble variants, achieving a G-mean of 0.9152 and a Specificity of 0.8715 under the primary experimental setting. The model exhibits robust performance across varying imbalance ratios (2%, 10%, 20%) and strong resilience against data noise, perturbations, and outliers. These findings indicate that the synergistic integration of data-level resampling within a diversified, two-tiered ensemble structure effectively mitigates class imbalance bias and enhances predictive reliability. The framework offers a robust and generalizable tool for actionable default risk assessment in imbalanced financial datasets. Full article

(This article belongs to the Section Data Mining and Machine Learning)
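The choice of G-mean and Specificity as primary metrics in the abstract above can be illustrated with a small sketch. It assumes the usual definition of G-mean as the square root of sensitivity times specificity; the function names and the toy labels (a 5% default rate, as in the abstract) are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (positive = default)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def g_mean_report(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and their geometric mean."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Toy labels at a 5% default rate (illustrative only, not the paper's data).
y_true = [1] * 5 + [0] * 95
always_safe = [0] * 100                          # never flags a default
screened = [1, 1, 1, 1, 0] + [1] * 5 + [0] * 90  # 4 hits, 1 miss, 5 false alarms

print(g_mean_report(y_true, always_safe))
print(g_mean_report(y_true, screened))
```

On these toy labels the classifier that never predicts a default has the higher accuracy but a G-mean of zero, which is the kind of imbalance bias that accuracy alone hides.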

17 pages, 424 KB

Open Access Article

SegFusion: A Lattice-Based Dynamic Ensemble Framework for Chinese Word Segmentation with Unsupervised Statistical Features

by Chengfeng Wen and Jiqiu Deng

Appl. Sci. 2026, 16(5), 2463; https://doi.org/10.3390/app16052463 - 4 Mar 2026

Viewed by 168

Abstract

Although existing Chinese word segmentation systems have achieved substantial progress on standard benchmarks, prediction disagreements among heterogeneous models remain prevalent when processing texts containing complex ambiguities and out-of-vocabulary words, and traditional static ensemble methods such as majority voting often fail to make reliable decisions in low-consensus scenarios. To address this issue, this paper proposes SegFusion, a stacked heterogeneous ensemble framework for Chinese word segmentation based on word lattice re-scoring. The framework first constructs a candidate word lattice to consolidate diverse outputs from heterogeneous segmenters into a unified lattice representation, and then incorporates unsupervised statistical features, including mutual information and branching entropy, as external discriminative evidence to perform dynamic arbitration at the word level, followed by global decoding to obtain the optimal segmentation path. Experimental results on multiple standard datasets demonstrate that SegFusion consistently outperforms individual models and mainstream ensemble baselines in terms of overall segmentation performance and out-of-vocabulary (OOV) recall. In particular, on the MSR dataset with severe ambiguity, SegFusion achieves improvements of 3.71% in F1 score and 4.10% in OOV recall. Further fine-grained analysis shows that the introduction of unsupervised statistical features effectively mitigates model consistency bias in low-support scenarios. These results indicate that integrating language statistical priors independent of training data into the ensemble arbitration stage is an effective way to enhance the robustness and consistency of Chinese word segmentation systems. Full article

(This article belongs to the Section Computing and Artificial Intelligence)
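The two unsupervised statistics named in the abstract above, mutual information and branching entropy, can be sketched from raw character counts. This is a minimal illustration under stated assumptions: it uses pointwise mutual information between adjacent characters and right-side branching entropy only, the function names are hypothetical, and Latin letters stand in for Chinese characters. It is not SegFusion's actual feature extractor.

```python
import math
from collections import Counter


def char_counts(corpus):
    """Count single characters and adjacent character pairs per sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def pmi(corpus, left, right):
    """Pointwise mutual information (bits) of two adjacent characters."""
    unigrams, bigrams = char_counts(corpus)
    pair = bigrams[(left, right)]
    if pair == 0:
        return float("-inf")
    p_xy = pair / sum(bigrams.values())
    p_x = unigrams[left] / sum(unigrams.values())
    p_y = unigrams[right] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))


def right_branching_entropy(corpus, word):
    """Entropy (bits) of the character that follows each occurrence of word."""
    followers = Counter()
    for sentence in corpus:
        start = sentence.find(word)
        while start != -1:
            nxt = start + len(word)
            if nxt < len(sentence):
                followers[sentence[nxt]] += 1
            start = sentence.find(word, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


# Toy corpus: "ab" always co-occurs and is followed by varied characters.
corpus = ["abc", "abd", "abe", "abf"]
print(pmi(corpus, "a", "b"))
print(right_branching_entropy(corpus, "ab"))
print(right_branching_entropy(corpus, "a"))
```

High internal PMI combined with high branching entropy at the boundary is the usual statistical signature of a word-like unit: here "ab" is cohesive and followed by many different characters, whereas "a" alone is always followed by "b".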

47 pages, 2578 KB

Open Access Article

Machine Learning-Based Prediction of Compressive Strength in Recycled Aggregate Self-Compacting Concrete: An Ensemble Modeling Approach with SHAP Interpretability Analysis

by Zhengyang Zhang, Biao Luo and Ya Su

Appl. Sci. 2026, 16(5), 2432; https://doi.org/10.3390/app16052432 - 3 Mar 2026

Viewed by 170

Abstract

The incorporation of recycled concrete aggregates (RCAs) into self-compacting concrete (SCC) represents a critical sustainable construction strategy addressing both construction waste management and natural resource conservation. However, predicting the compressive strength of recycled aggregate self-compacting concrete (RASCC) remains challenging due to complex nonlinear interactions among mixture parameters. This study develops a robust predictive framework using ensemble machine learning algorithms to accurately estimate RASCC compressive strength across diverse mixture compositions. A comprehensive database comprising 301 experimental specimens with 18 input variables—including curing age, binder components, water-to-binder ratio, recycled aggregate properties, and supplementary cementitious materials—was systematically analyzed. Four advanced modeling approaches were evaluated: Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Stacked Generalization with Ridge regression meta-learner, and Voting ensemble with Non-Negative Least Squares optimization. The Stacking ensemble model demonstrated superior predictive performance on the independent test set, with R² = 0.963, RMSE = 3.321 MPa, and MAE = 2.506 MPa. Rigorous residual analysis confirmed model validity through satisfaction of normality, homoscedasticity, and independence assumptions. SHAP interpretability analysis identified specimen age as the dominant predictor, followed by recycled aggregate density and water-to-binder ratio, while elucidating the complex nonlinear contributions of supplementary cementitious materials including fly ash and ground granulated blast furnace slag. 
The developed framework demonstrates practical applicability for predicting RASCC compressive strength across conventional to high-performance grades, facilitating sustainable mix design optimization while maintaining structural performance requirements, and advancing circular economy principles through confident integration of recycled aggregates in SCC applications.
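Stacked generalization with a ridge meta-learner, as named in the abstract above, can be sketched in plain Python. Everything here is an assumption made for illustration: the two base learners (a least-squares line and a k-nearest-neighbour average) are simple stand-ins for the boosting models in the paper, the input is one-dimensional, and the fold count, penalty `lam`, and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Base learner 1: ordinary least-squares line; returns a predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=2):
    """Base learner 2: mean target of the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def fit_ridge_meta(z1, z2, ys, lam=1e-3):
    """Meta-learner: ridge fit of y ~ a*z1 + b*z2 via 2x2 normal equations."""
    s11 = sum(a * a for a in z1) + lam
    s22 = sum(b * b for b in z2) + lam
    s12 = sum(a * b for a, b in zip(z1, z2))
    t1 = sum(a * y for a, y in zip(z1, ys))
    t2 = sum(b * y for b, y in zip(z2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


def fit_stack(xs, ys, n_folds=3, lam=1e-3):
    """Train the meta-learner on out-of-fold base predictions."""
    n = len(xs)
    oof_line, oof_knn = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        line, knn = fit_line(tx, ty), fit_knn(tx, ty)
        for i in range(fold, n, n_folds):  # held-out indices of this fold
            oof_line[i], oof_knn[i] = line(xs[i]), knn(xs[i])
    a, b = fit_ridge_meta(oof_line, oof_knn, ys, lam)
    line, knn = fit_line(xs, ys), fit_knn(xs, ys)  # refit on all data
    return lambda x: a * line(x) + b * knn(x)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy nonlinear data (illustrative only).
xs = [float(i) for i in range(12)]
ys = [x * x for x in xs]
model = fit_stack(xs, ys)
print(rmse(ys, [model(x) for x in xs]))
```

The point of the out-of-fold step is that the meta-learner only ever sees base predictions for samples the base learners were not trained on, which keeps it from simply rewarding the base model that overfits most.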

18 pages, 1324 KB

Open Access Article

A Stacking-Based Multi-Step LSTM and Policy-Enhanced SVR Method for Carbon Emission Prediction

by Bingtai Liu, Wanyi Zhang and Jianfei Huang

Sustainability 2026, 18(5), 2434; https://doi.org/10.3390/su18052434 - 3 Mar 2026

Viewed by 141

Abstract

China’s “dual-carbon” targets require more precise methods for carbon emission forecasting. Existing methods mainly rely on time series or regression models: the former capture temporal trends but lack interpretability, while the latter provide explanatory power but struggle with nonlinear patterns. To overcome these limitations, this paper applies a multi-step LSTM with transfer learning to capture the nonlinear temporal dynamics of carbon emissions, incorporates an SVR with added policy variables to improve accuracy, and finally employs a stacking model to integrate the advantages of both. The base predictions are aggregated via linear regression to leverage their complementary strengths. The proposed model is trained on 1960–2004 data and tested on data from 2005–2019, 2023, and 2024. Results show that the optimized LSTM and SVR improve prediction accuracy, while the stacking-based ensemble surpasses the individual models in accuracy and robustness. Based on the integrated model, predictions for 2023–2050 indicate that if policies are strengthened in 2025, China’s carbon emissions will peak in 2024 and subsequently decline to about 8175 Mt CO₂ by 2050; if policies are not strengthened in 2025, emissions will peak in 2026 and subsequently decline to about 6983 Mt CO₂.

32 pages, 19818 KB

Open Access Article

An Interpretable Ensemble Machine Learning Framework for Predicting the Ultimate Flexural Capacity of BFRP-Reinforced Concrete Beams

by Sebghatullah Jueyendah and Elif Ağcakoca

Polymers 2026, 18(5), 601; https://doi.org/10.3390/polym18050601 - 28 Feb 2026

Viewed by 265

Abstract

Prediction of the ultimate moment capacity (Mu) of BFRP-reinforced concrete beams is complicated by nonlinear parameter interactions and the linear-elastic response of BFRP, reducing the accuracy of conventional design models. This study develops an optimized machine learning (ML) framework incorporating random forest, extra trees, gradient boosting, adaboost, bagging, support vector regression, histogram-based gradient boosting, and ensemble voting and stacking strategies for reliable prediction of the Mu of BFRP-reinforced concrete beams. A comprehensive database of material, geometric, reinforcement, and BFRP mechanical parameters was analyzed, and model performance was evaluated using an 80/20 train–test split and 10-fold cross-validation based on R², RMSE, MAE, and MAPE. The stacking regressor demonstrated superior predictive performance, achieving an R² of 0.999 (RMSE = 0.590) in training and an R² of 0.988 (RMSE = 2.487) in testing, indicating excellent robustness and strong generalization capability in predicting Mu. Furthermore, interpretability analyses based on SHAP, PDP, ALE, and ICE demonstrate that span length (L) and beam depth (h) constitute the governing parameters in the prediction of Mu. Unlike prior studies focused mainly on predictive accuracy, this work proposes an optimized and interpretable stacking ensemble framework that integrates explainable AI with classical flexural mechanics for physically consistent and reliable prediction of the ultimate moment capacity of BFRP-reinforced concrete beams. Full article

(This article belongs to the Special Issue Fiber-Reinforced Polymer Composites: Progress and Prospects)

17 pages, 2074 KB

Open Access Article

Predicting ICU Readmission After Intracerebral Hemorrhage: A Deep Learning Framework Using MIMIC Time-Series Data

by Sergio Celada-Bernal, Alejandro Piñán-Roescher, Ruyman Hernández-López and Carlos M. Travieso-González

Appl. Sci. 2026, 16(5), 2235; https://doi.org/10.3390/app16052235 - 26 Feb 2026

Viewed by 234

Abstract

Intensive Care Unit (ICU) readmissions following Intracerebral Hemorrhage (ICH) are associated with increased mortality and resource burden. Current prediction models predominantly rely on static admission features, failing to capture the temporal evolution of physiological instability. This study proposes a novel deep learning framework to predict ICU readmission by leveraging high-resolution time-series data from the MIMIC-III and MIMIC-IV databases. We developed a Stacked Gated Recurrent Unit (GRU) Architecture Ensemble, integrated with Time-series Generative Adversarial Networks (TimeGAN) to address the inherent class imbalance of readmission events. Our model achieved a state-of-the-art Area Under the Receiver Operating Characteristic Curve (AUC) of 0.912, significantly outperforming traditional machine learning baselines and static feature models. The sensitivity of 88.1% highlights the model’s efficacy in minimizing unsafe premature discharges. Furthermore, interpretability analysis using SHAP values identified Length of Stay, MELD Score, and Monocytes as critical predictors, revealing that readmission risk is driven by a complex interplay between systemic organ dysfunction and inflammatory response. These findings demonstrate that incorporating temporal dynamics and generative data augmentation significantly enhances risk stratification, offering a robust clinical decision support tool to optimize discharge timing in neurocritical care. Full article

(This article belongs to the Special Issue Interdisciplinary Applications of Machine Learning and Intelligent Signal Prediction for Smart and Connected Environments)

24 pages, 5876 KB

Open Access Article

A Stacking-Based Ensemble Learning Method for Multispectral Reconstruction of Printed Halftone Images

by Lin Zhu, Jinghuan Ge, Dongwen Tian and Jie Yang

Symmetry 2026, 18(3), 406; https://doi.org/10.3390/sym18030406 - 25 Feb 2026

Viewed by 194

Abstract

Motivation: Accurate spectral reconstruction of printed halftone images is essential for achieving high-fidelity color reproduction and robust color management across modern printing systems. However, traditional physics-based models, such as the Yule–Nielsen and Clapper–Yule formulations, rely on simplified empirical assumptions and often fail to capture the complex nonlinear and asymmetric interactions induced by multi-ink overlays and substrate light scattering. Meanwhile, existing data-driven approaches based on single learning models exhibit limited capability in modeling the complementary and symmetrical characteristics inherent in halftone structures, resulting in suboptimal prediction accuracy and generalization performance. Method: To address these limitations, we propose a Stacking Ensemble Spectral Prediction (SESP) framework. The proposed method adopts a two-layer stacking architecture that integrates heterogeneous base regressors, including Support Vector Regression (SVR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost 3.0.3), with Ridge Regression employed as the meta-learner for optimal prediction aggregation. This ensemble design enables effective modeling of both halftone pattern symmetry and complex substrate scattering behavior. Results: Extensive experiments conducted on printed halftone image datasets demonstrate the superior performance of the proposed SESP framework. Compared with the best-performing reference method (PCA-IPSO-DNN), SESP achieves relative reductions in RMSE and CIEDE2000 of 12.8% and 6.8% under illuminant A, 9.5% and 6.9% under D50, and 12.2% and 7.2% under D65, respectively. In addition, SESP consistently outperforms traditional physics-based models, including Yule–Nielsen and Clapper–Yule, in terms of both spectral prediction accuracy and colorimetric fidelity. 
These results confirm the effectiveness of the proposed framework in modeling the intricate nonlinear and asymmetric relationships between CMYK halftone patterns and spectral reflectance.

(This article belongs to the Special Issue Computer Vision, Robotics, and Automation Engineering)

25 pages, 2562 KB

Open Access Article

Research on the Assessment of Dairy Cow Dry Matter Intake Using ITSO-Optimized Stacking Ensemble Learning

by Shuairan Wang, Ting Long, Xiaoli Wei, Qinzu Guo, Hongrui Guo, Weizheng Shen and Zhixin Gu

Animals 2026, 16(4), 625; https://doi.org/10.3390/ani16040625 - 16 Feb 2026

Viewed by 200

Abstract

Dry matter intake (DMI) in dairy cows is a critical indicator of nutrient intake from feed and the cornerstone of precision feeding practices, playing a key role in improving production efficiency and enhancing the quality of dairy products. To address the high costs of traditional measurement methods and the structural complexity and large parameter counts of neural network models, this study proposes a Stacking ensemble learning model to assess DMI, with model parameters optimized using the Tuna Swarm Optimization (TSO) algorithm to enhance assessment accuracy. The input variables are cow body weight, lying duration, lying times, rumination duration, foraging duration, walking steps, and the concentrate-to-roughage feed ratio. To further improve TSO’s search efficiency and spatial exploration, this study introduces Sine–Logistic chaotic mapping, Levy flight, and a Gaussian random walk strategy, yielding the improved Tuna Swarm Optimization (ITSO) algorithm. The ITSO-optimized Stacking model achieved superior performance in DMI assessment, with an accuracy of 95.84%, significantly outperforming the SVR, RF, DT, GBR, ETR, and AdaBoost models. This study provides a robust tool for precision feeding, contributing to optimizing cow feeding strategies, improving farm efficiency, and supporting sustainable dairy farming practices.

(This article belongs to the Section Cattle)

16 pages, 3373 KB

Open Access Article

Intelligent Assessment Framework of Unmanned Air Vehicle Health Status Based on Bayesian Stacking

by Junfu Qiao, Jinqin Guo, Yu Zhang and Yongwei Li

Batteries 2026, 12(2), 62; https://doi.org/10.3390/batteries12020062 - 14 Feb 2026

Viewed by 300

Abstract

This paper proposed a stacking-based ensemble model to replace the traditional single machine learning model prediction approach, significantly improving the evaluation efficiency of SoC and SoH of lithium batteries. Firstly, a dataset was constructed including three input variables (temperature, current, and voltage) and two output variables (SoC and SoH). Pearson correlation coefficients and histograms were used for preliminary analysis of the correlations and distributions of the dataset. The multi-layer perceptron (MLP), support vector machine (SVM), random forest (RF), and extreme gradient boosting tree (XGB) were used as base prediction models. Bayesian optimization (BO) was used to fine-tune the parameters of these models, then three statistical indicators were compared to assess the prediction accuracy of the four ML models. Furthermore, MLP, SVM, and RF were selected as base models, while XGB was used as the meta-model, enhancing the integrated performance of the prediction models. SHAP was used to quantify the influence of the output variables on SoC. Finally, linked measures for the prediction model were proposed to achieve autonomous monitoring of drones. The results showed that XGB exhibited superior prediction accuracy, with R² of 0.93 and RMSE of 0.14. The ensemble model obtained using stacking reduced the number of outliers by 89.4%. Current was identified as the key variable influencing both SoC and SoH. Furthermore, the intelligent prediction model proposed in this paper can be integrated with controllers, visualization web pages, and other systems to enable the health status assessment of drones. Full article

(This article belongs to the Section Battery Performance, Ageing, Reliability and Safety)

36 pages, 31133 KB

Open Access Article

SOBLE-Top5: A Stacking Ensemble Learning-Based Seasonal Downscaling Inversion Framework for Surface Soil Moisture Using Multi-Source Data

by Shengmin Zhu, Haiyang Yu, Bingqian Ji, Qi Liu and Deng Pan

Remote Sens. 2026, 18(4), 585; https://doi.org/10.3390/rs18040585 - 13 Feb 2026

Viewed by 270

Abstract

Surface soil moisture (SSM) serves as a critical indicator for regional water cycles, agricultural management, and drought monitoring. However, the existing SMAP data suffer from limited spatial resolution, making it challenging to meet the demands of large-scale, high-resolution applications. Taking Henan Province, located in east-central China with a continental monsoon climate and marked seasonal variability, as the study area, this research integrates multi-source data to develop a seasonal modeling strategy. Based on stacking ensemble learning, the SSM downscaling inversion model (SOBLE-Top5) is constructed. SHAP value attribution analysis is employed to reveal the primary drivers of seasonal dynamics. The results indicate: (1) SSM exhibits distinct seasonal characteristics. Compared to all-season modeling, the RMSE and R² metrics significantly improve during spring and summer. The winter ET and RF models show an approximately 9–14% higher R² and a 47–50% lower RMSE. (2) The SOBLE-Top5 strategy achieved up to a 4.65% higher R² and a 21.22% lower RMSE compared to the optimal single base model. (3) Spatial variations in SSM characteristics reveal stable performance during winter. Spring saw slight SSM declines in the northern regions due to rising temperatures. The study area reached its annual low (<0.08 m³/m³) in May–June. Driven by flood-season precipitation, July–August witnessed local increases exceeding 52%. Autumn exhibited a stable-then-rising trend with pronounced north–south gradient characteristics. (4) The SHAP analysis indicates that winter SSM is primarily controlled by bulk density and clay content. Spring SSM is most influenced by LST, followed by bulk density. Summer and autumn SSM are synergistically driven by multiple factors including elevation, temperature, and precipitation, with summer precipitation exerting the most significant impact on instantaneous SSM variations.

(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

11 pages, 1064 KB

Open Access Proceeding Paper

Ensemble-Based Imputation for Handling Missing Values in Healthcare Datasets: A Comparative Study of Machine Learning Models

by Bilal Ibrahim Maijamaa, Salim Ahmad, Aminu Musa, Abdullahi Ishaq and Abida Ayuba

Eng. Proc. 2026, 124(1), 21; https://doi.org/10.3390/engproc2026124021 - 9 Feb 2026

Viewed by 245

Abstract

This study addresses the challenge of missing numerical values in healthcare datasets by proposing a Particle Swarm Optimization (PSO)-optimized stacking ensemble model for data imputation. The framework combines Random Forest, XGBoost, and Linear Regression within a stacking architecture, with PSO used to optimize model selection and hyperparameters for improved accuracy. The approach was evaluated on the Breast Cancer Wisconsin and Heart Disease datasets under Missing Completely at Random (MCAR) conditions at 30%, 20%, and 10% missingness levels, using RMSE, MAE, R², and processing time as performance metrics. Experimental results show that the proposed model consistently outperforms individual learners across all missingness scenarios, achieving an RMSE of 0.0446, MAE of 0.0303, and R² of 86.56% on the Breast Cancer dataset at 10% MCAR, and an RMSE of 0.1388 with an R² of 75.19% on the Heart Disease dataset. Compared with a MissForest-based existing approach, the proposed framework demonstrates substantial reductions in imputation error, confirming the effectiveness of combining ensemble learning with evolutionary optimization. Although the PSO-based stacking model incurs higher computational cost, the findings indicate that it provides a robust, accurate, and generalizable solution for numerical data imputation in healthcare applications under varying levels of missingness. Full article

(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)
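The evaluation protocol in the abstract above (hide known values at a fixed rate, impute them, then score the imputations against the hidden truth) can be sketched as follows. Mean imputation is used here purely as a baseline stand-in and is not the PSO-optimized stacking model; the helper names are hypothetical, and the masking draws exactly `round(rate * n)` positions uniformly at random, independent of the values, as one simple way to simulate MCAR.

```python
import math
import random


def mask_mcar(values, rate, seed=0):
    """Hide a fixed share of entries, chosen independently of their values."""
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def mean_impute(masked):
    """Baseline imputer: fill every gap with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def imputation_rmse(truth, imputed, missing_idx):
    """RMSE computed only over the positions that were hidden."""
    squared = [(truth[i] - imputed[i]) ** 2 for i in missing_idx]
    return math.sqrt(sum(squared) / len(squared))


# Toy feature column (illustrative only); rates follow the abstract.
truth = [float(i) for i in range(20)]
for rate in (0.10, 0.20, 0.30):
    masked, idx = mask_mcar(truth, rate, seed=42)
    score = imputation_rmse(truth, mean_impute(masked), idx)
    print(f"{rate:.0%} missing -> {len(idx)} hidden, RMSE {score:.3f}")
```

Scoring only the hidden positions matters: including the observed entries, which are reproduced exactly, would make any imputer look better as the missingness rate falls.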

39 pages, 3699 KB

Open Access Article

Enhancing Decision Intelligence Using Hybrid Machine Learning Framework with Linear Programming for Enterprise Project Selection and Portfolio Optimization

by Abdullah, Nida Hafeez, Carlos Guzmán Sánchez-Mejorada, Miguel Jesús Torres Ruiz, Rolando Quintero Téllez, Eponon Anvi Alex, Grigori Sidorov and Alexander Gelbukh

AI 2026, 7(2), 52; https://doi.org/10.3390/ai7020052 - 1 Feb 2026

Viewed by 1087

Abstract

This study presents a hybrid analytical framework that enhances project selection by achieving reasonable predictive accuracy through the integration of expert judgment and modern artificial intelligence (AI) techniques. Using an enterprise-level dataset of 10,000 completed software projects with verified real-world statistical characteristics, we develop a three-step architecture for intelligent decision support. First, we introduce an extended Analytic Hierarchy Process (AHP) that incorporates organizational learning patterns to compute expert-validated criteria weights with a consistent level of reliability (CR = 0.04), and Linear Programming is used for portfolio optimization. Second, we propose a machine learning architecture that integrates expert knowledge derived from AHP into models such as Transformers, TabNet, and Neural Oblivious Decision Ensembles through mechanisms including attention modulation, split criterion weighting, and differentiable tree regularization. Third, the hybrid AHP-Stacking classifier generates a meta-ensemble that adaptively balances expert-derived information with data-driven patterns. The analysis shows that the model achieves 97.5% accuracy, a 96.9% F1-score, and a 0.989 AUC-ROC, representing a 25% improvement compared to baseline methods. The framework also indicates a projected 68.2% improvement in portfolio value (estimated incremental value of USD 83.5 M) based on post factum financial results from the enterprise’s ventures. This study is evaluated retrospectively using data from a single enterprise, and while the results demonstrate strong robustness, generalizability to other organizational contexts requires further validation. This research contributes a structured approach to hybrid intelligent systems and demonstrates that combining expert knowledge with machine learning can provide reliable, transparent, and high-performing decision-support capabilities for project portfolio management.

25 pages, 5911 KB

Open Access Article

Soil Moisture Inversion in Alfalfa via UAV with Feature Fusion and Ensemble Learning

by Jinxi Chen, Jianxin Yin, Yuanbo Jiang, Yanxia Kang, Yanlin Ma, Guangping Qi, Chungang Jin, Bojie Xie, Wenjing Yu, Yanbiao Wang, Junxian Chen, Jiapeng Zhu and Boda Li

Plants 2026, 15(3), 404; https://doi.org/10.3390/plants15030404 - 28 Jan 2026

Viewed by 239

Abstract

Timely access to soil moisture conditions in farmland crops is the foundation and key to achieving precise irrigation. Due to its high spatiotemporal resolution, unmanned aerial vehicle (UAV) remote sensing has become an important method for monitoring soil moisture. This study addresses soil moisture retrieval in alfalfa fields across different growth stages. Based on UAV multispectral images, a multi-source feature set was constructed by integrating spectral and texture features. The performance of three machine learning models, namely random forest regression (RFR), K-nearest neighbors regression (KNN), and XG-Boost, as well as two ensemble learning models, Voting and Stacking, was systematically compared. The results indicate the following: (1) The ensemble learning models generally outperform the individual machine learning models, with the Voting model performing best across all growth stages, achieving a maximum R² of 0.874 and an RMSE of 0.005; among the individual models, the optimal model varies with growth stage, with XG-Boost being the best during the branching and early flowering stages (maximum R² of 0.836), while RFR performs better during the budding stage (R² of 0.790). (2) The fusion of multi-source features significantly improved inversion accuracy. Taking the Voting model as an example, the R² with fused features (0.874) was 0.065 higher than with texture features alone (0.809), and the RMSE decreased from 0.012 to 0.005. (3) In terms of inversion depth, the optimal depth for the branching and budding stages is 40–60 cm, while the optimal depth for the early flowering stage is 20–40 cm. In summary, the method that integrates multi-source feature fusion and ensemble learning significantly improves the accuracy and stability of alfalfa soil moisture inversion, providing an effective technical approach for precise water management of artificial grasslands in arid regions.

(This article belongs to the Special Issue Water and Nutrient Management for Sustainable Crop Production)

31 pages, 5186 KB

Open Access Article

Simulating Daily Evapotranspiration of Summer Soybean in the North China Plain Using Four Machine Learning Models

by Liyuan Han, Fukui Gao, Shenghua Dong, Yinping Song, Hao Liu and Ni Song

Agronomy 2026, 16(3), 315; https://doi.org/10.3390/agronomy16030315 - 26 Jan 2026

Viewed by 493

Abstract

Accurate estimation of crop evapotranspiration (ET) is essential for achieving efficient agricultural water use in the North China Plain. Although machine learning techniques have demonstrated considerable potential for ET simulation, a systematic evaluation of model-architecture suitability and hyperparameter optimization strategies specifically for summer soybean ET estimation in this region is still lacking. To address this gap, we systematically compared several machine learning architectures and their hyperparameter optimization schemes to develop a high-accuracy daily ET model for summer soybean in the North China Plain. Synchronous observations from a large-scale weighing lysimeter and an automatic weather station were first used to characterize the day-to-day dynamics of soybean ET and to identify the key driving variables. Four algorithms, namely support vector regression (SVR), Random Forest (RF), extreme gradient boosting (XGBoost), and a stacking ensemble, were then trained for ET simulation, while Particle Swarm Optimization (PSO), Genetic Algorithms (GAs), and Randomized Grid Search (RGS) were employed for hyperparameter tuning. Results show that solar radiation (R_S), maximum air temperature (T_max), and leaf area index (LAI) are the dominant drivers of ET. The Stacking-PSO-F3 combination, forced with R_S, T_max, LAI, maximum relative humidity (RH_max), and minimum relative humidity (RH_min), achieved the highest accuracy, yielding R² values of 0.948 on the test set and 0.900 in interannual validation, thereby demonstrating excellent precision, stability, and generalizability. The proposed model provides a robust technical tool for precision irrigation and regional water resource optimization.

(This article belongs to the Special Issue Water and Fertilizer Regulation Theory and Technology in Crops)

