Search Results (896)

Search Parameters:
Keywords = gradient boosting regression tree

25 pages, 4104 KB  
Article
Prediction of Postoperative Stroke in Elderly Surgical ICU Patients Using Random Forest Model: Development on MIMIC-IV with Cross-Institutional and Temporal External Validation
by Houji Jin, Mohammadsaeed Haghi, Nausin Kudrot, Kamiar Alaei and Maryam Pishgar
BioMedInformatics 2026, 6(2), 16; https://doi.org/10.3390/biomedinformatics6020016 - 27 Mar 2026
Abstract
Postoperative stroke is a serious and fatal condition that often affects elderly surgical patients. This rare but severe complication arises from complex interactions between comorbidities, physiologic instability, and demographic disturbances that traditional risk tools often fail to capture. This study aims to develop and validate a machine learning model with an improved ability to predict the risk of postoperative stroke in elderly patients utilising the comprehensive clinical and demographic ICU data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. External validation was performed on MIMIC-III and the eICU Collaborative Research Database, with eICU being the primary validation set. We identified postoperative surgical intensive care unit (SICU) patients aged 55 years or older from all databases. A strict temporal window of the first 24 h of ICU admission was applied across all three datasets while extracting features such as laboratory measurements and vital sign summaries, ensuring that all predictor values were derived from a fixed observation period at the beginning of the ICU stay. After preprocessing, applying Multivariate Imputation by Chained Equations (MICE) and initial screening of 88 candidate variables, 20 clinically meaningful predictors were selected through a multistage feature selection pipeline incorporating RFECV and permutation importance. SHAP and LIME analyses were used for interpretability. We evaluated ten machine learning techniques: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM, RBF kernel), Gradient Boosting (GBDT), Neural Network, XGBoost, CatBoost, and Naive Bayes. Among them, Random Forest demonstrated strong predictive performance, achieving an AUROC of 0.8072 (95% CI [0.7890, 0.8253]) on the internal validation set.
The model also achieved AUROCs of 0.7557 (95% CI [0.7267, 0.7794]) and 0.9144 (95% CI [0.8893, 0.9378]) on the external validation sets eICU and MIMIC-III, respectively. Mean systolic blood pressure, Elixhauser score, minimum calcium, and minimum INR (PT) were consistently identified as the most influential predictors through both SHAP and LIME analyses, thus strengthening model interpretability. Our findings suggest that a Random Forest-based predictive model can provide accurate and generalisable prediction of postoperative stroke in elderly ICU patients using routinely collected physiologic and laboratory data, supporting early risk stratification and targeted postoperative monitoring. Full article
(This article belongs to the Section Applied Biomedical Data Science)
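The multistage feature selection the abstract describes (RFECV followed by permutation importance) can be sketched with scikit-learn. This is a minimal illustration on synthetic, imbalanced data; the feature counts, model settings, and screening stages are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for screened candidate ICU variables (class imbalance
# mimics the rarity of postoperative stroke)
X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stage 1: recursive feature elimination with cross-validation (RFECV)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
selector = RFECV(rf, step=2, cv=3, scoring="roc_auc").fit(X_tr, y_tr)

# Stage 2: permutation importance on the retained features
rf.fit(X_tr[:, selector.support_], y_tr)
perm = permutation_importance(rf, X_te[:, selector.support_], y_te,
                              scoring="roc_auc", n_repeats=5, random_state=0)
print(selector.n_features_, perm.importances_mean.round(3))
```

Features whose permutation importance is indistinguishable from zero would then be dropped before fitting the final model.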

32 pages, 6586 KB  
Article
Multidirectional Ultrasound Propagation Velocity as a Predictor of Open Porosity and Water Absorption in Volcanic Rocks: Traditional Regression and Machine Learning
by José A. Valido, José M. Cáceres and Luís Sousa
Appl. Sci. 2026, 16(7), 3225; https://doi.org/10.3390/app16073225 - 26 Mar 2026
Abstract
Ultrasound propagation velocity was investigated as a non-destructive predictor of open porosity (ρ0) and water absorption (Aw) in volcanic rocks (two ignimbrites, a trachyte, and a basalt). Six velocity measurements were obtained under dry and saturated conditions along three orthogonal directions, and the dry Z-axis velocity was selected as the reference univariate predictor because it provided the highest explanatory power and the best cross-validated performance among the tested ultrasound variables. Four univariate regressions (linear, exponential, power law, and second-order polynomial), parametric multivariable linear regression, and five machine learning regressors were compared using lithology-stratified 5-fold cross-validation, grouping both ignimbrites as a single lithology. Univariate models showed moderate predictive capability for ρ0 (cross-validated coefficient of determination R2 0.506 to 0.580), whereas Aw was captured more accurately, with the power law model reaching 0.923 ± 0.008. Multivariable linear regression improved ρ0 when lithology was included (0.803 ± 0.084), while changes for Aw were small. The highest accuracy was achieved by ensemble tree methods: extremely randomized trees with lithology yielded 0.949 ± 0.015 for ρ0 (root mean square error 2.16 ± 0.38 percentage points), and Gradient Boosting with lithology yielded 0.976 ± 0.006 for Aw (0.80 ± 0.12 percentage points). Full article
(This article belongs to the Special Issue Application of Ultrasonic Non-Destructive Testing—Second Edition)
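The lithology-stratified 5-fold cross-validation described above can be reproduced in scikit-learn by stratifying the folds on a lithology label rather than on the target. The velocities, porosity relation, and lithology coding below are synthetic assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 120
litho = rng.integers(0, 3, n)      # hypothetical coding: 0 ignimbrite, 1 trachyte, 2 basalt
v_dry = 2500 + 800 * rng.random(n) - 300 * litho      # dry Z-axis velocity, m/s (synthetic)
porosity = 40 - 0.01 * v_dry + rng.normal(0, 1, n)    # open porosity, % (synthetic)

# Velocity plus one-hot lithology, as in the multivariable models
X = np.column_stack([v_dry, litho == 1, litho == 2]).astype(float)

# Folds stratified on lithology, so every fold preserves the rock-type mix
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(ExtraTreesRegressor(random_state=0), X, porosity,
                         cv=cv.split(X, litho), scoring="r2")
print(scores.round(2))
```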

22 pages, 4755 KB  
Article
Comparative Assessment of Supervised Machine Learning Models for Predicting Water Uptake in Sorption-Based Thermal Energy Storage
by Milad Tajik Jamalabad, Elham Abohamzeh, Daud Mustafa Minhas, Seongbhin Kim, Dohyun Kim, Aejung Yoon and Georg Frey
Energies 2026, 19(7), 1619; https://doi.org/10.3390/en19071619 - 25 Mar 2026
Abstract
In this study, supervised machine learning (ML) regression models are employed to predict water uptake during the sorption process in a sorption reactor for thermal energy storage applications. Two main methods are used to study sorption storage systems: experimental studies and numerical simulations. Experimental studies involve physical testing and measurements but are often costly and time-consuming. Numerical simulations are more flexible and cost-effective, though they can require significant computational resources for large or complex systems. To address these challenges, researchers are increasingly employing various machine learning techniques, which offer strong potential for data analysis and predictive modeling. In this study, CFD-based sorption simulations are integrated with machine learning models to predict the spatiotemporal evolution of water uptake. Several ML techniques including support vector regression (SVR), Random Forest, XGBoost, CatBoost (gradient boosting decision trees), and multilayer perceptron neural networks (MLPs) are evaluated and compared. A fixed-bed reactor equipped with fins and tubes is considered within a closed adsorption thermal storage system. Numerical simulations are conducted for three different fin lengths (10 mm, 25 mm, and 35 mm) to generate a comprehensive dataset for training the ML models and capturing the complex temporal evolution of water uptake, thereby enabling predictions for unseen fin geometries. The results indicate that neural network-based models achieve superior predictive performance compared to the other methods. For water uptake training, the mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination R2 are approximately 2.83, 4.37, and 0.91, respectively. The predicted water uptake shows close agreement with the numerical simulation results. For the prediction cases, the MAE, MSE, and R2 values are approximately 1.13, 1.2, and 0.8, respectively. 
Overall, the study demonstrates that machine learning models can accurately predict water uptake beyond the training dataset, indicating strong generalization capability and significant potential for improving thermal management system design. Additionally, the proposed approach reduces simulation time and computational cost while providing an efficient and reliable framework for modeling complex sorption processes in thermal energy storage systems. Full article
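The core idea above, training a regressor on simulation outputs so it can interpolate water uptake across fin geometries, can be sketched as follows. The analytic uptake function, units, and network size are invented stand-ins for the CFD dataset.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600
t = rng.uniform(0, 10, n)                  # time (hypothetical units)
x = rng.uniform(0, 50, n)                  # distance into the bed, mm
fin = rng.choice([10.0, 25.0, 35.0], n)    # fin length, mm (as in the paper)

# Synthetic uptake: rises with time, decays away from the fin, grows with fin length
w = 20 * (1 - np.exp(-0.3 * t)) * np.exp(-x / (fin + 10)) + rng.normal(0, 0.5, n)

X = np.column_stack([t, x, fin])
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                                 random_state=0))
mlp.fit(X[:500], w[:500])
r2 = mlp.score(X[500:], w[500:])
print(round(r2, 3))
```

A model trained this way can then be queried at fin lengths absent from the training grid, which is the generalization test the abstract describes.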

22 pages, 5716 KB  
Article
Machine-Learning-Based Historical Reconstruction of Soil Organic Carbon Dynamics in Coastal Tidal Flats: Quantifying the Spatiotemporal Impacts of Reclamation
by Caiyao Kou, Yongbin Zhang, Weidong Man, Fuping Li, Chunyan Lu, Qingwen Zhang and Mingyue Liu
Remote Sens. 2026, 18(7), 978; https://doi.org/10.3390/rs18070978 - 25 Mar 2026
Abstract
Coastal tidal flat soil organic carbon (SOC) is significantly affected by reclamation activities. However, the limited availability of historical SOC data constrains the reconstruction of past SOC. Current-time-point SOC data were integrated with remote sensing data from the last two decades by applying machine learning (ML) methods, including random forest (RF), boosted regression trees (BRT), and extreme gradient boosting (XGBoost), to map the spatiotemporal distribution of tidal flat reclamation and the spatial distribution of SOC content in the western coastal region of the Bohai Rim, and to explore how the period and type of reclamation affect SOC content. The results show that: (1) The area of tidal flats decreased by 61.92% from 2000 to 2020 due to reclamation activities. (2) Among the ML methods, the XGBoost model demonstrated the best performance (R2 = 0.71, MAE = 0.93 g/kg, RMSE = 1.32 g/kg, d-Willmott = 0.98), with the modified normalized difference water index (MNDWI) being the most important predictor variable. (3) The SOC content of tidal flats decreased from 4.11 g/kg in 2000 to 3.33 g/kg in 2020, a reduction of 18.98%. (4) The reclamation of tidal flats into marshes, forest lands, grasslands, farmlands, and bare lands led to an increasing trend in SOC content, with the greatest increase observed in regions converted to farmlands. This study provides data support for the control of reclamation activities, creation of tidal flat conservation policies, and strategic decision-making for climate change mitigation. Full article
(This article belongs to the Special Issue Intelligent Remote Sensing for Wetland Mapping and Monitoring)
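A gradient-boosted mapping of SOC from spectral indices can be sketched as below. scikit-learn's GradientBoostingRegressor stands in for XGBoost, and the index-to-SOC relationship is entirely synthetic; only the dominant role of a water index mirrors the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
mndwi = rng.uniform(-0.5, 0.5, n)     # modified normalized difference water index
ndvi = rng.uniform(0.0, 0.8, n)       # vegetation index
band = rng.normal(0, 1, n)            # an uninformative extra band
soc = 3.5 + 2.0 * mndwi + 0.5 * ndvi + rng.normal(0, 0.2, n)   # g/kg, synthetic

X = np.column_stack([mndwi, ndvi, band])
X_tr, X_te, y_tr, y_te = train_test_split(X, soc, random_state=0)
gbr = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = gbr.score(X_te, y_te)
print(round(r2, 2), gbr.feature_importances_.argmax())  # importance peaks at MNDWI (index 0)
```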

20 pages, 2519 KB  
Article
Machine Learning Framework for Predicting Mechanical Properties of Heat-Treated Alloys: Computational Approach
by Saurabh Tiwari and Aman Gupta
Metals 2026, 16(3), 320; https://doi.org/10.3390/met16030320 - 13 Mar 2026
Abstract
Heat treatment critically controls microstructure and mechanical properties in engineering alloys, but experimental optimization is costly and time-intensive. Machine learning (ML) offers a data-driven alternative, though data scarcity and feature leakage often limit predictive reliability. A comprehensive ML framework was developed and validated using a physics-informed synthetic dataset of 332 heat-treated alloy samples covering carbon steels (AISI 4140, 1080, 4340, 5130), aluminum alloys (AlSi7Mg, AlSi10Mg, Al6061, Al2618), and stainless steels (304, 316L). Twenty-seven features describing chemical composition, heat-treatment parameters, and microstructural characteristics were initially included. Following strict data-leakage analysis, all six mechanical property features were fully removed, leaving 22 independent predictors. Five regression models—Extra Trees, Random Forest, Gradient Boosting, Ridge, and ElasticNet—were evaluated using a 70/15/15 train–validation–test split with randomized hyperparameter optimization and 3-fold cross-validation. The Random Forest model showed the best test performance for tensile strength prediction (R2 = 0.9282, RMSE = 37.24 MPa, MAE = 28.54 MPa, MAPE = 5.39%), with minimal overfitting. Tempering temperature, carbon content, and manganese content were the most influential features, aligning with established metallurgical principles. The proposed framework demonstrates robust, leakage-free prediction of mechanical properties from composition and processing parameters, offering a scalable approach for accelerated alloy design pending experimental validation. This study serves as a methodological framework demonstration; the reported performance metrics are benchmarks against the synthetic dataset, and experimental validation with real alloy data remains essential for industrial deployment. Full article
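The evaluation protocol described above, a 70/15/15 train-validation-test split with randomized hyperparameter optimization and 3-fold cross-validation, can be sketched as follows. The dataset is synthetic (sized loosely like the paper's 332 samples and 22 predictors) and the search space is a hypothetical example.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(n_samples=332, n_features=22, n_informative=8,
                       noise=10.0, random_state=0)

# 70/15/15 train / validation / test split
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.50,
                                            random_state=0)

# Randomized hyperparameter search with 3-fold CV on the training portion
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": randint(50, 300), "max_depth": randint(3, 15)},
    n_iter=8, cv=3, scoring="r2", random_state=0).fit(X_tr, y_tr)

print(round(search.best_score_, 2), round(search.score(X_te, y_te), 2))
```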

17 pages, 3074 KB  
Article
Predicting CO2 Solubility in Brine for Carbon Storage with a Hybrid Machine Learning Framework Optimized by Ant Colony Algorithm
by Seyed Hossein Hashemi, Farshid Torabi and Sepideh Palizdan
Water 2026, 18(6), 662; https://doi.org/10.3390/w18060662 - 11 Mar 2026
Abstract
Predicting carbon dioxide (CO2) solubility in brine is critical for carbon capture and storage. This study employs the Ant Colony Optimization (ACO) algorithm to enhance the predictive accuracy of four machine learning models: Neural Network (NN), Decision Tree (DT), Support Vector Regression (SVR), and Gradient Boosting Machine (GBM). The models were trained and validated on a mineral compound dataset. Performance was evaluated using the coefficient of determination (R2) and error metrics including RMSE and MAE. The GBM model achieved the highest test accuracy (R2 = 0.986) with low errors (RMSE = 0.0478, MAE = 0.0362), demonstrating superior ability to model complex, non-linear relationships with minimal overfitting. The optimized NN, featuring three layers and fifteen neurons, delivered strong performance (R2 = 0.930) with balanced errors across datasets. The DT model offered excellent interpretability and a strong test score (R2 = 0.912), while the SVR model provided robust generalization (R2 = 0.889). The results indicate that ACO is an effective tool for hyperparameter tuning across diverse model architectures. For maximum accuracy, GBM is recommended, whereas DT is ideal when interpretability is required. The NN presents a strong middle-ground option with competitive accuracy. This comparative framework assists in selecting the optimal model based on specific project priorities of accuracy, transparency, or computational efficiency for geochemical forecasting. Full article
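A drastically simplified ant-colony-style hyperparameter search can be sketched as below: each "ant" samples one option per hyperparameter with probability proportional to pheromone, and good configurations deposit pheromone for the next generation. The grid, evaporation rate, and deposit rule are assumptions for illustration, not the paper's ACO.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Discrete search space (hypothetical grid)
grid = {"n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.05, 0.1, 0.2]}
keys = list(grid)
pher = {k: np.ones(len(grid[k])) for k in keys}   # pheromone per option

rng = np.random.default_rng(0)
best_score, best_params = -np.inf, None
for _ in range(5):                                 # generations
    for _ in range(4):                             # ants per generation
        # Sample one option per hyperparameter, biased by pheromone
        idx = {k: rng.choice(len(grid[k]), p=pher[k] / pher[k].sum()) for k in keys}
        params = {k: grid[k][idx[k]] for k in keys}
        score = cross_val_score(GradientBoostingRegressor(random_state=0, **params),
                                X, y, cv=3, scoring="r2").mean()
        if score > best_score:
            best_score, best_params = score, params
        # Evaporate everywhere, then deposit on the options this ant used
        for k in keys:
            pher[k] *= 0.9
            pher[k][idx[k]] += max(score, 0)

print(round(best_score, 2), best_params)
```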

21 pages, 474 KB  
Article
Performance Evaluation of Machine Learning and Deep Learning Models for Credit Risk Prediction
by Irvine Mapfumo and Thokozani Shongwe
J. Risk Financial Manag. 2026, 19(3), 210; https://doi.org/10.3390/jrfm19030210 - 11 Mar 2026
Abstract
Credit risk prediction is essential for financial institutions to effectively assess the likelihood of borrower defaults and manage associated risks. This study presents a comparative analysis of deep learning architectures and traditional machine learning models on imbalanced credit risk datasets. To address class imbalance, we employ three resampling techniques: Synthetic Minority Over-sampling Technique (SMOTE), Edited Nearest Neighbors (ENN), and the hybrid SMOTE-ENN. We evaluate the performance of various models, including multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), gated recurrent unit (GRU), logistic regression, decision tree, support vector machine (SVM), random forest, adaptive boosting, and extreme gradient boosting. The analysis reveals that SMOTE-ENN combined with MLP achieves the highest F1-score of 0.928 (accuracy 95.4%) on the German dataset, while SMOTE-ENN with random forest attains the best F1-score of 0.789 (accuracy 82.1%) on the Taiwanese dataset. SHapley Additive exPlanations (SHAP) are employed to enhance model interpretability, identifying key drivers of credit default. These findings provide actionable guidance for developing transparent, high-performing, and robust credit risk assessment systems. Full article
(This article belongs to the Section Financial Technology and Innovation)
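The standard implementation of SMOTE-ENN is imbalanced-learn's SMOTEENN class. As a self-contained illustration of the oversampling half, the sketch below interpolates new minority samples between nearest minority neighbours (ENN would then prune ambiguous majority samples). All data here are synthetic.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style oversampling: interpolate between a minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is each point itself
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, k + 1)]   # a random true neighbour
        lam = rng.random()                   # interpolation weight in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_min = rng.normal(0, 1, size=(20, 4))       # synthetic minority class
X_new = smote_sketch(X_min, n_new=30)
print(X_new.shape)
```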

19 pages, 32031 KB  
Article
Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method
by Jiayi Chen, Kun Wang, Huaqing Xie, Kerong Ma and Kunlun Li
Energies 2026, 19(6), 1388; https://doi.org/10.3390/en19061388 - 10 Mar 2026
Abstract
Perovskite oxides show excellent catalytic performance for thermochemical CO2 splitting, with A/B-site cation substitution further enhancing redox activity. While traditional first-principles methods are computationally expensive, machine learning (ML) provides an efficient approach to perovskite optimization. In this paper, machine learning is employed to investigate and predict the performance of perovskite catalysts in CO2 decomposition reactions. Based on 227 perovskite compositions (A1A2)(B1B2)O3 curated from experimental literature, a total of five ML models are used, including Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression (GBR). The Random Forest model performed best. After hyperparameter optimization, the Random Forest model achieved an R2 of 0.910 and an MAE of 41.528 on an independent test set. SHAP analysis indicated that the thermal reduction temperature (T1) and the B1-site stoichiometric fraction (C_b1) are the most influential features governing the predicted CO yield. A higher CO yield is predicted when C_b1 ranges from 0.6 to 0.8, and T1 exceeds 1300 °C. This behavior can be attributed to the enhanced formation of oxygen vacancies at elevated temperatures and the optimized electronic structure induced by appropriate B-site stoichiometry. Full article
(This article belongs to the Special Issue Innovative Catalytic Approaches for Energy Conversion and Storage)

23 pages, 2333 KB  
Article
Measurement of Metal Surface Temperature Based on Visible Light Images: A Strategy for On-Site Image Acquisition
by Xingwang Li, Wenhua Wu, Chengxiang Lei, Yang Chen, Zheng Tian and Qizheng Ye
Appl. Sci. 2026, 16(5), 2556; https://doi.org/10.3390/app16052556 - 6 Mar 2026
Abstract
Based on the mechanism of thermally modulated reflected light, visible light images combined with machine learning methods can be used to estimate the surface temperature of metal equipment at ambient temperature under sunlight conditions. However, the surface conditions of on-site equipment and camera imaging parameters vary greatly across different scenarios, leading to poor generalization of models trained solely on laboratory image databases. To address this, the original laboratory database must be updated with on-site images and the model retrained accordingly. However, because most on-site equipment operates normally, images capturing fault-induced high temperatures are scarce, so even after updating and retraining with on-site images, data imbalance in the image database can still cause significant measurement errors for these high-temperature images. This study examines image database update schemes that address both the multi-scenario and data imbalance problems and demonstrates that retraining with as little as 5% scenario-specific images or 1% high-temperature images significantly improves temperature prediction accuracy, as validated through on-site experiments at a substation. By comparing four machine learning algorithms (random forest regression, gradient boosted regression trees, decision trees, and k-nearest neighbors), this study shows that RFR yields the best performance. These findings enhance the practical applicability of visible light image-based temperature measurement models in engineering contexts. Full article
(This article belongs to the Special Issue Applied Computer Vision and Deep Learning)
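The database-update idea, retraining after folding a small fraction of on-site images into the laboratory set, can be sketched as below. The image features, scene offset, scenario indicator, and the 5% figure follow the abstract only loosely; everything here is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def make_scene(flag, offset, n):
    """Synthetic image statistics -> surface temperature; `offset` stands in
    for scene-specific surface and illumination effects."""
    f = rng.uniform(0, 1, size=(n, 3))           # e.g. RGB channel means
    X = np.column_stack([f, np.full(n, flag)])   # scenario indicator feature
    y = 20 + 30 * f[:, 0] + 10 * f[:, 1] + offset + rng.normal(0, 1, n)
    return X, y

X_lab, y_lab = make_scene(0.0, 0.0, 500)     # laboratory database
X_site, y_site = make_scene(1.0, 8.0, 200)   # new on-site scenario

# Model trained only on the laboratory database misses the scene offset
rf = RandomForestRegressor(random_state=0).fit(X_lab, y_lab)
err_before = np.abs(rf.predict(X_site[25:]) - y_site[25:]).mean()

# Update the database with ~5% on-site images (25 of 525) and retrain
X_up = np.vstack([X_lab, X_site[:25]])
y_up = np.concatenate([y_lab, y_site[:25]])
rf_up = RandomForestRegressor(random_state=0).fit(X_up, y_up)
err_after = np.abs(rf_up.predict(X_site[25:]) - y_site[25:]).mean()
print(round(err_before, 1), round(err_after, 1))
```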

46 pages, 990 KB  
Review
Machine Learning for Outdoor Thermal Comfort Assessment and Optimization: Methods, Applications and Perspectives
by Giouli Mihalakakou, John A. Paravantis, Alexandros Romeos, Sonia Malefaki, Paraskevas N. Georgiou and Athanasios Giannadakis
Sustainability 2026, 18(5), 2600; https://doi.org/10.3390/su18052600 - 6 Mar 2026
Abstract
Urban environments face increasing thermal stress from climate change and the Urban Heat Island effect, with significant implications for livability, public health, and energy sustainability. Outdoor thermal comfort, defined as the state in which conditions are perceived as acceptable, depends on interactions among meteorological, morphological, physiological, and behavioral factors. This review synthesizes the application of machine learning (ML) to outdoor thermal comfort assessment into a practice-oriented taxonomy. Research spans diverse climates and urban forms, using inputs across environmental and human domains. Supervised learning dominates. Regression approaches (linear regression, support vector regression, random forest, gradient boosting) and classification algorithms (decision trees, support vector machines, K-nearest neighbors, Naïve Bayes, random forest classifiers) are widely used to predict thermal indices such as the Physiological Equivalent Temperature and Universal Thermal Climate Index, or to classify subjective responses including thermal sensation, comfort, and acceptability. Unsupervised learning (clustering, principal component analysis) supports identification of microclimatic zones and perceptual clusters, while deep learning (multilayer perceptrons, convolutional and recurrent neural networks, generative adversarial networks) achieves superior accuracy for complex, high-dimensional, and spatiotemporal data. Algorithms such as random forests, support vector machines, and gradient boosting consistently show strong performance for both indices and subjective responses when integrating multi-domain inputs. Semi-supervised and reinforcement learning remain underexplored but offer promise for leveraging large-scale sensor data and enabling adaptive, real-time comfort management.
The review concludes with a roadmap emphasizing explainable artificial intelligence, scalable surrogate modeling, and integration with simulation-based optimization and parametric design tools. Full article

21 pages, 5351 KB  
Article
PSO-Based Ensemble Learning Enhanced with Explainable Artificial Intelligence for Breast Glandular Dose Estimation in Mammography
by Sevgi Ünal and Remzi Gürfidan
Appl. Sci. 2026, 16(5), 2514; https://doi.org/10.3390/app16052514 - 5 Mar 2026
Abstract
Objectives: This study aims to predict patient-specific Average Glandular Dose (AGD) in mammography using machine learning-based models to support personalised radiation dose optimisation and reduce unnecessary exposure during breast cancer screening. Methods: A retrospective dataset of 671 female patients who underwent full-field digital mammography between 2020 and 2024 was analysed. Right craniocaudal (CC) images were used to construct a structured dataset including mAs, kVp, compressed breast thickness, air kerma (k_air), half-value layer (HVL), and breast pattern. Five regression-based machine learning models (CatBoost, Gradient Boosting, Random Forest, Extra Trees, and AdaBoost) and their Particle Swarm Optimisation (PSO)-enhanced versions were evaluated. Model performance was assessed using MSE, RMSE, MAE, MAPE, and R2. SHAP analysis was applied to interpret model predictions and determine variable importance. Results: PSO integration significantly reduced prediction errors, particularly in boosting-based models. The CatBoost + PSO model achieved the best performance (RMSE = 0.0100, MAPE ≈ 1.74%, R2 = 0.9846), followed by the Gradient Boosting + PSO model (R2 = 0.9787). PSO reduced RMSE and MAPE by approximately 55% and 52%, respectively. SHAP analysis identified k_air, breast thickness, and breast pattern as the most influential factors affecting AGD. Conclusions: Machine learning models enhanced with PSO, especially CatBoost + PSO, provide accurate and reliable patient-specific AGD predictions. The proposed approach enables rapid and clinically applicable dose estimation and highlights breast pattern as a critical parameter influencing glandular dose, supporting personalised radiation dose optimisation in mammography. Full article
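A minimal particle-swarm tuning loop over two gradient-boosting hyperparameters illustrates the PSO-enhanced models above; the search bounds, inertia and acceleration constants, and fitness data are all assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

def fitness(p):
    """Cross-validated R2 for a (learning_rate, max_depth) particle."""
    lr, depth = p
    model = GradientBoostingRegressor(learning_rate=float(lr),
                                      max_depth=int(round(depth)),
                                      n_estimators=80, random_state=0)
    return cross_val_score(model, X, y, cv=3, scoring="r2").mean()

rng = np.random.default_rng(0)
lo, hi = np.array([0.01, 1.0]), np.array([0.3, 5.0])   # hypothetical bounds
pos = rng.uniform(lo, hi, size=(6, 2))                 # 6 particles, 2 dims
vel = np.zeros_like(pos)
pbest, pscore = pos.copy(), np.array([fitness(p) for p in pos])
g = pbest[pscore.argmax()]                             # global best position

for _ in range(4):                                     # a few PSO iterations
    r1, r2 = rng.random((2, 6, 2))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
    pos = np.clip(pos + vel, lo, hi)
    score = np.array([fitness(p) for p in pos])
    improved = score > pscore
    pbest[improved], pscore[improved] = pos[improved], score[improved]
    g = pbest[pscore.argmax()]

print(round(pscore.max(), 2))
```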

26 pages, 407 KB  
Review
Machine Learning and Deep Learning for Dropout Prediction in Higher Education: A Review
by Beatriz Duro, Anabela Gomes, Fernanda Brito Correia, Ana Rosa Borges and Jorge Bernardino
Computers 2026, 15(3), 164; https://doi.org/10.3390/computers15030164 - 4 Mar 2026
Abstract
Student dropout in Higher Education remains a persistent challenge with significant academic, social and economic consequences. Predictive analytics using traditional Machine Learning and Deep Learning have been increasingly explored to support early identification of students at risk. This article presents a structured literature review of studies published between 2018 and 2025 that apply these techniques to predict dropout in Higher Education. Unlike previous reviews, we pay particular attention to model interpretability, practical deployment and ethical considerations when analysing data types, preprocessing strategies and modelling approaches. Results show that transparent traditional models, including Decision Trees, Logistic Regression, and ensemble methods such as Random Forest and Gradient Boosting remain dominant because they perform strongly on structured data and are easier to explain. Deep Learning approaches, although less prevalent, show promise for sequential and behavioural data but face challenges in data availability, explainability, and implementation complexity. Despite frequently high reported performance, most studies rely on single-institution datasets, limiting generalisability, and only a minority address fairness, bias, or real-world integration. This analysis concludes that we must transition from accuracy-focused evaluations to transparent, accountable and actionable predictive systems that facilitate data-driven and inclusive decision-making in Higher Education. Full article

16 pages, 8115 KB  
Article
Fusing Deep Learning and Gradient Boosting for Robust Minute-Level Atmospheric Visibility Nowcasting
by Yuguo Ni, Chenbo Xie, Zichen Zhang and Jianfeng Chen
Geosciences 2026, 16(3), 104; https://doi.org/10.3390/geosciences16030104 - 3 Mar 2026
Abstract
Atmospheric visibility nowcasting is vital for safety-critical operations but remains challenging due to complex atmospheric dynamics. We propose a compact stacking ensemble merging a multilayer perceptron (MLP) and gradient-boosted regression trees (GBRT). The model, trained on seven months of minute-scale resolution data with a variability-adaptive filter to suppress sensor noise, employs cross-validation. Results demonstrate that the ensemble achieves its peak performance in the operationally critical low-visibility regime (V < 5 km). This range is particularly significant as it encompasses the Category I and II (CAT I/II) operational thresholds defined by the World Meteorological Organization (WMO) for aviation and surface transportation safety. In this regime, the ensemble yields an R2 of 0.82 and an MAE≈385 m, significantly outperforming single learners during rapid weather transitions. Conversely, in the high-visibility regime (V > 20 km), the explanatory power decreases (R2 of 0.46) due to inherent forward-scattering sensor uncertainties and low aerosol concentrations. Despite these range-specific physical limitations, the model maintains high robustness with narrowly centered residuals. This efficient approach, utilizing cost-effective in situ sensors, is highly suitable for airport and road-weather applications and offers strong potential for multi-site scalability. Full article
(This article belongs to the Section Climate and Environment)
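The MLP + GBRT stacking design described in this abstract can be sketched with scikit-learn. This is a minimal toy illustration under assumed placeholder data and hyperparameters, not the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # stand-ins for minute-scale sensor features
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=500)  # toy visibility target

# Base learners: an MLP and gradient-boosted regression trees,
# combined by a simple linear meta-learner via cross-validated stacking.
ensemble = StackingRegressor(
    estimators=[
        ("mlp", MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)),
        ("gbrt", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),
    cv=5,
)
ensemble.fit(X, y)
print(f"train R^2: {ensemble.score(X, y):.3f}")
```

The `cv=5` argument makes the meta-learner fit on out-of-fold base predictions, which is what keeps a stacking ensemble from simply memorizing its base learners' training error.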

23 pages, 3889 KB  
Article
Enhanced Runoff Prediction in Zijiang River Basin Using Machine Learning and SHAP-Based Interpretability
by Kaiwen Ma, Changbo Jiang, Yuannan Long, Zhiyuan Wu and Shixiong Yan
Water 2026, 18(5), 601; https://doi.org/10.3390/w18050601 - 2 Mar 2026
Abstract
To address the limitations of traditional runoff prediction methods—namely, the oversimplification of meteorological factor selection, ambiguous interactions among core variables, and the disruptive influence of redundant inputs—this study focuses on the Zijiang River Basin as a representative case. A suite of machine learning models, including Long Short-Term Memory Neural Network (LSTM), Convolutional Neural Network (CNN)-LSTM, Temporal Convolutional Network (TCN), and Gradient Boosting Regression Tree (GBRT), was constructed and trained using 13 distinct combinations of meteorological variables. These configurations were systematically evaluated to assess their compatibility with each model in simulating daily runoff patterns. Additionally, the Shapley Additive Explanations (SHAP) algorithm was employed to quantitatively assess the contribution of each factor to predictive accuracy. Among the models tested, the TCN model consistently demonstrated superior performance, particularly in mitigating the effects of irrelevant or redundant features. The GBRT model showed distinctive strengths in accurately predicting peak flow timings. Of all input configurations, the combination of “runoff + precipitation + evaporation + temperature” emerged as the most effective. Findings indicate that the predictive value of individual meteorological variables hinges primarily on their direct correlation with runoff, while the effectiveness of multi-factor schemes depends on the degree of functional integration—specifically, the coupling of hydrological recharge, consumption, and regulatory processes. The presence of redundant variables was found to impair model performance unless they contributed to a meaningful synergistic relationship with core inputs. The SHAP analysis further reinforced these insights: precipitation-related variables proved to be the most critical to prediction accuracy, whereas temperature and evaporation served more complementary roles. Notably, the inclusion of relative humidity tended to suppress runoff responses and increased deviation in peak timing estimates. These findings shed light on the nuanced interplay between meteorological input design and model selection, offering a robust foundation for optimizing data-driven runoff prediction frameworks. Full article
(This article belongs to the Special Issue Application of Machine Learning in Hydrological Monitoring)
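The factor-contribution analysis in this abstract uses SHAP; as a dependency-light stand-in that conveys the same idea—ranking meteorological drivers of a GBRT runoff model—permutation importance can be sketched as follows. All variables, distributions, and coefficients here are hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 800
precip = rng.gamma(2.0, 2.0, n)     # hypothetical daily precipitation
evap = rng.normal(3.0, 1.0, n)      # hypothetical evaporation
temp = rng.normal(15.0, 5.0, n)     # hypothetical temperature
humidity = rng.uniform(40, 100, n)  # a redundant input in this toy setup
runoff = 1.5 * precip - 0.4 * evap + 0.1 * temp + rng.normal(0, 0.5, n)

X = np.column_stack([precip, evap, temp, humidity])
names = ["precipitation", "evaporation", "temperature", "humidity"]

gbrt = GradientBoostingRegressor(random_state=0).fit(X, runoff)
imp = permutation_importance(gbrt, X, runoff, n_repeats=10, random_state=0)
for name, score in sorted(zip(names, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>13}: {score:.3f}")
```

In this synthetic setup precipitation dominates the ranking, mirroring the paper's SHAP finding, while the humidity column—unconnected to the target—scores near zero.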

27 pages, 4803 KB  
Article
Enhancing Short-Term Wind Energy Forecasting with XGBoost and Conformal Prediction for Robust Uncertainty Quantification
by Rabelani Innocent Nthangeni, Caston Sigauke, Thakhani Ravele and Thinawanga Hangwani Tshisikhawe
Computation 2026, 14(3), 56; https://doi.org/10.3390/computation14030056 - 1 Mar 2026
Abstract
This paper presents probabilistic wind energy forecasting using quantile regression averaging combined with a conformal prediction modelling framework. The study uses data from Eskom, South Africa’s power utility company, spanning April 2019 to November 2023. A partial linear additive quantile regression (PLAQR) averaging method is used to combine forecasts from two competing forecasting models: eXtreme Gradient Boosting (XGBoost) and Principal Component Regression (PCR). To compare the predictive abilities of the models, two data splits are used for training, validation and testing: 80%/10%/10% for the first set and 85%/10%/5% for the second. Empirical results suggest that the combined predictions from PLAQR perform better than the individual models, significantly improving calibration and accuracy. The proposed combination has the smallest root mean square error (RMSE) and the highest probability of change in direction (POCID). The combination captures nonlinearities and produces well-calibrated probabilistic results, as validated by probability integral transform histograms. This performance gain reflects the importance of data volume, and it reinforces that the PLAQR model, which combines the benefits of tree-based approaches and linear models, is a robust approach for reliable renewable energy forecasting. Future research directions should consider more varied ensembles. Full article
(This article belongs to the Section Computational Engineering)
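The split-conformal interval construction underlying the framework in this abstract can be sketched as follows. This toy uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost on synthetic data; it illustrates the generic conformal recipe, not the paper's PLAQR averaging method:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))  # stand-ins for wind/weather features
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=1.0, size=1000)

# Split conformal prediction: a proper training set for the point forecaster
# and a held-out calibration set for the residual quantile.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

alpha = 0.1  # target 90% coverage
scores = np.abs(y_cal - model.predict(X_cal))
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

pred = model.predict(X_te)
lower, upper = pred - q, pred + q
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"empirical coverage: {coverage:.2f}")
```

The appeal of the conformal step is that the roughly 90% coverage holds regardless of which point forecaster produced `pred`, which is why it pairs naturally with combined models like the paper's PLAQR average.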
