MDPI - Publisher of Open Access Journals

29 pages, 3451 KiB

Open Access Article

A Dual-Variable Selection Framework for Enhancing Forest Aboveground Biomass Estimation via Multi-Source Remote Sensing

by Dapeng Chen, Hongbin Luo, Zhi Liu, Jie Pan, Yong Wu, Er Wang, Chi Lu, Lei Wang, Weibin Wang and Guanglong Ou

Remote Sens. 2025, 17(14), 2493; https://doi.org/10.3390/rs17142493 - 17 Jul 2025

Abstract

Integrating multi-source remote sensing can improve the accuracy of forest aboveground biomass (AGB) estimation. However, the accuracy and stability of AGB estimates depend on the choice of remote sensing feature variables and on the parameter tuning of machine learning algorithms. To address this, the study used six types of remote sensing data: Landsat 8 OLI, Sentinel-2A, GEDI, ICESat-2, ALOS-2, and SAOCOM. A dual-variable selection strategy based on SHapley Additive exPlanations (SHAP) was developed, and a genetic algorithm (GA) was used to optimize the parameters of five machine learning models, namely elastic net (EN), least absolute shrinkage and selection operator (Lasso), support vector regression (SVR), Random Forest (RF), and Categorical Boosting (CatBoost), to estimate the AGB of Pinus kesiya var. langbianensis forest in Wuyi Village, Zhenyuan County. The dual-variable selection strategy combines SHAP with the Pearson correlation coefficient (PC), RF, EN, and Lasso to make feature screening more robust and interpretable. The results showed that Lasso-SHAP dual-variable screening was more stable than screening with SHAP alone. In particular, the Lasso-SHAP strategy raised the average R² from 0.59 (using SHAP alone) to above 0.70, a gain of about 0.11 (11 percentage points). Among the GA-optimized models, the linear GA-Lasso achieved the best performance, with an R² of 0.91 and an RMSE of 12.94 Mg/ha, followed by the GA-EN model (R² = 0.89, RMSE = 14.46 Mg/ha). Among the nonlinear models, GA-SVR performed best (R² = 0.74, RMSE = 22.07 Mg/ha), surpassing the GA-CatBoost model (R² = 0.64, RMSE = 25.88 Mg/ha). In summary, the Lasso-SHAP dual-variable selection strategy improves the accuracy of AGB estimation for Pinus kesiya var. langbianensis forests, and the GA-optimized machine learning models perform well, supporting regional-scale forest resource monitoring and carbon stock assessment.

(This article belongs to the Section Forest Remote Sensing)
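To make concrete what a SHAP-based screening step ranks, the following is a minimal sketch of exact Shapley values for a toy model, computed by enumerating feature orderings. It is not the authors' pipeline: the linear model, its weights, the feature names (`ndvi`, `canopy_height`, `backscatter`) and the baseline values are all hypothetical, and practical SHAP libraries rely on faster approximations than this brute-force enumeration.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features "absent" from a coalition take their baseline value.
    Cost grows as n!, so this is only suitable for tiny illustrative cases.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)        # start with every feature absent
        prev = predict(current)
        for name in order:
            current[name] = x[name]     # add this feature to the coalition
            new = predict(current)
            phi[name] += new - prev     # marginal contribution in this ordering
            prev = new
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}


# Hypothetical linear AGB model; the weights are chosen only for illustration.
WEIGHTS = {"ndvi": 40.0, "canopy_height": 5.0, "backscatter": -2.0}


def toy_model(features):
    return 10.0 + sum(WEIGHTS[k] * v for k, v in features.items())


baseline = {"ndvi": 0.5, "canopy_height": 10.0, "backscatter": -8.0}
sample = {"ndvi": 0.8, "canopy_height": 14.0, "backscatter": -6.0}

phi = shapley_values(toy_model, sample, baseline)

# Rank features by absolute contribution, as an importance-based screen would.
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
```

For a linear model each value reduces to weight × (feature value − baseline value), and the values sum to the difference between the prediction for the sample and the prediction for the baseline.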

23 pages, 7016 KiB

Open Access Article

SOC Estimation of Lithium-Ion Batteries Utilizing EIS Technology with SHAP–ASO–LightGBM

by Panpan Hu, Chun Yin Li and Chi Chung Lee

Batteries 2025, 11(7), 272; https://doi.org/10.3390/batteries11070272 - 17 Jul 2025

Abstract

Accurate State of Charge (SOC) estimation is critical for optimizing the performance and longevity of lithium-ion batteries (LIBs), which are widely used in applications ranging from electric vehicles to renewable energy storage. Traditional SOC estimation methods, such as Coulomb counting and open-circuit voltage measurement, suffer from cumulative errors and slow response times. This paper proposes a novel machine learning-based approach to SOC estimation that integrates Electrochemical Impedance Spectroscopy (EIS) with the SHapley Additive exPlanations (SHAP) method, Atom Search Optimization (ASO), and Light Gradient Boosting Machine (LightGBM). The study focuses on large-capacity lithium iron phosphate (LFP) batteries (3.2 V, 104 Ah), addressing a gap in existing research. EIS data collected at various SOC levels and temperatures were processed using SHAP for feature extraction (FE), and the ASO–LightGBM model was employed for SOC prediction. Experimental results show that the proposed SHAP–ASO–LightGBM method significantly improves estimation accuracy, achieving an RMSE of 3.3%, an MAE of 1.86%, and an R² of 0.99, and outperforming comparison methods such as LSTM and DNN. The findings highlight the potential of EIS and machine learning (ML) for robust SOC estimation in large-capacity LIBs.

24 pages, 2173 KiB

Open Access Article

A Novel Ensemble of Deep Learning Approach for Cybersecurity Intrusion Detection with Explainable Artificial Intelligence

by Abdullah Alabdulatif

Appl. Sci. 2025, 15(14), 7984; https://doi.org/10.3390/app15147984 - 17 Jul 2025

Abstract

In today’s increasingly interconnected digital world, cyber threats have grown in frequency and sophistication, making intrusion detection systems (IDS) a critical component of modern cybersecurity frameworks. Traditional IDS methods, often based on static signatures and rule-based systems, are no longer sufficient to detect and respond to complex and evolving attacks. To address these challenges, artificial intelligence and machine learning (ML) have emerged as powerful tools for enhancing the accuracy, adaptability, and automation of IDS solutions. This study presents a novel hybrid ensemble learning-based intrusion detection framework that integrates deep learning and traditional ML algorithms with explainable artificial intelligence for real-time cybersecurity applications. The proposed model combines an Artificial Neural Network and a Support Vector Machine as base classifiers and employs a Random Forest as a meta-classifier to fuse their predictions, improving detection performance. Recursive Feature Elimination is used for feature selection, while SHapley Additive exPlanations (SHAP) provide both global and local interpretability of the model’s decisions. The framework is deployed using a Flask-based web interface in the Amazon Elastic Compute Cloud environment, capturing live network traffic and offering sub-second inference with visual alerts. Experimental evaluations on the NSL-KDD dataset show that the ensemble model outperforms the individual classifiers, achieving an accuracy of 99.40% along with excellent precision, recall, and F1-score. This research not only enhances detection capabilities but also bridges the trust gap in AI-powered security systems through transparency. The solution shows strong potential for application in critical domains such as finance, healthcare, industrial IoT, and government networks, where real-time and interpretable threat detection is vital.

(This article belongs to the Special Issue Advanced Cybersecurity Applications: Solutions to Counteract Cyber Threats)

26 pages, 10906 KiB

Open Access Article

Explainable Machine Learning for Mapping Rainfall-Induced Landslide Thresholds in Italy

by Xiangyu Shao, Wenjun Yan, Chaoying Yan, Wen Zhao, Yixuan Wang, Xia Shi, Hongchang Dong, Tianjiang Li, Junpo Yu, Peng Zuo, Zeyu Zhou and Jiming Jin

Appl. Sci. 2025, 15(14), 7937; https://doi.org/10.3390/app15147937 - 16 Jul 2025

Abstract

Reliable rainfall thresholds are critical for effective early warning and for mitigating the risks of rainfall-induced landslides. Traditional statistical models have limitations in multi-variable modeling, while machine learning models face interpretability challenges. Explainable machine learning methods can address these challenges, but they are rarely applied to rainfall threshold modeling. In this study, we compared the performance of an empirical statistical model and machine learning models for predicting rainfall-induced landslides in Italy. Based on the optimal model, we visualized refined rainfall thresholds at three probability levels and employed SHAP (SHapley Additive exPlanations) to enhance model explainability by quantifying the contribution of each input variable to the predictions. The results showed that the XGBoost model achieved good performance (AUC = 0.917 ± 0.026) with well-balanced sensitivity (0.792 ± 0.075) and specificity (0.812 ± 0.033) in landslide susceptibility modeling. Hydrological factors, particularly total rainfall, were identified as the dominant triggering mechanisms, with SHAP analysis confirming their substantially greater contribution compared to environmental factors in rainfall threshold modeling. The threshold maps revealed distinct spatial variations in landslide-triggering rainfall thresholds across Italy, characterized by lower thresholds in gentle slope areas with moderate annual precipitation and higher thresholds in steep slope and mid-to-low-elevation regions, while these regional differences decreased under high-probability scenarios. This study offers a modeling approach for regional rainfall threshold assessment that integrates multi-variable modeling with explainable methods, contributing to the development of landslide early warning systems.

20 pages, 9405 KiB

Open Access Article

Developing a Hybrid Model to Enhance the Robustness of Interpretability for Landslide Susceptibility Assessment

by Xiao Yan, Dongshui Zhang, Yongshun Han, Tongsheng Li, Pin Zhong, Zhe Ning and Shirou Tan

ISPRS Int. J. Geo-Inf. 2025, 14(7), 277; https://doi.org/10.3390/ijgi14070277 - 16 Jul 2025

Abstract

Landslides are among the most damaging natural hazards, causing extensive damage to infrastructure and threatening human life. Although objective, explainable machine learning has advanced landslide susceptibility assessment, the interpretability of traditional single-model approaches is still not robust. The interpretable hybrid model proposed in this study addresses this limitation and aims to stabilize the interpretation of landslide susceptibility. The model integrates three base machine learning models (LightGBM, XGBoost, and Random Forest) using a heterogeneous category strategy, thereby enhancing the robustness of model interpretability. The hybrid model is interpreted using SHAP (SHapley Additive exPlanations) values, which quantify feature contributions. A 10-fold cross-validation with the coefficient of variation (CV) metric shows that the hybrid model outperforms the individual base models in interpretive robustness, yielding a lower CV of 0.175 compared with 0.208 for LightGBM, 0.240 for XGBoost, and 0.207 for Random Forest. Although predictive accuracy remains comparable to the baseline models, the hybrid model provides more stable and reliable interpretability results for landslide susceptibility. It identifies slope, elevation, and the LS factor as the three most important factors for landslide susceptibility in Xi’an city. Furthermore, the quantitative nonlinear relationships between these predisposing factors and susceptibility were identified, providing useful knowledge for landslide risk prevention and urban planning in landslide-prone regions.

(This article belongs to the Special Issue Advances in Remote Sensing and GIS for Natural Hazards Monitoring and Management)
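The coefficient of variation used above as a robustness metric can be illustrated with a short sketch. The abstract does not say exactly how the CV was aggregated, so this example assumes one plausible reading: compute the CV (standard deviation divided by mean) of each feature's importance across folds, then average over features. The function names and the importance numbers are made up for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def interpretability_cv(importance_by_fold):
    """Average per-feature CV of importance scores across folds.

    importance_by_fold: list of dicts, one per fold, mapping a feature name
    to its importance (e.g. mean |SHAP| value). Hypothetical input format.
    """
    features = list(importance_by_fold[0])
    per_feature = {
        f: coefficient_of_variation([fold[f] for fold in importance_by_fold])
        for f in features
    }
    return mean(per_feature.values()), per_feature


# Made-up importances for three folds (the study used ten).
stable = [
    {"slope": 0.30, "elevation": 0.20},
    {"slope": 0.32, "elevation": 0.21},
    {"slope": 0.28, "elevation": 0.19},
]
unstable = [
    {"slope": 0.30, "elevation": 0.20},
    {"slope": 0.45, "elevation": 0.10},
    {"slope": 0.15, "elevation": 0.30},
]

stable_cv, _ = interpretability_cv(stable)
unstable_cv, _ = interpretability_cv(unstable)
```

A lower value means the importance ranking changes less from fold to fold, which is the sense in which a CV of 0.175 is read as more robust than 0.208 or 0.240.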

13 pages, 830 KiB

Open Access Article

Machine Learning-Based Prediction of Postoperative Deep Vein Thrombosis Following Tibial Fracture Surgery

by Humam Baki and İsmail Bülent Özçelik

Diagnostics 2025, 15(14), 1787; https://doi.org/10.3390/diagnostics15141787 - 16 Jul 2025

Abstract

Background/Objectives: Postoperative deep vein thrombosis (DVT) is a common and serious complication after tibial fracture surgery. This study aimed to develop and evaluate machine learning (ML) models to predict the occurrence of DVT following tibial fracture surgery. Methods: A retrospective analysis was conducted on patients who had undergone surgery for isolated tibial fractures. A total of 42 predictive models were developed by combining six ML algorithms (logistic regression, support vector machine (SVM), random forest, extreme gradient boosting, Light Gradient Boosting Machine (LightGBM), and neural networks) with seven feature selection methods, including SHapley Additive exPlanations (SHAP), Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, recursive feature elimination (RFE), univariate filtering, and full-variable inclusion. Model performance was assessed based on discrimination, quantified by the area under the receiver operating characteristic curve (AUC-ROC), and calibration, measured using Brier scores, with internal validation performed via bootstrapping. Results: Of 471 patients, 80 (17.0%) developed postoperative DVT. The ML models achieved high overall accuracy in predicting DVT. Twenty-four models showed similarly excellent discrimination (pairwise AUC comparisons, p > 0.05). The top-performing model (random forest with RFE) attained an AUC of ~0.99, while several others (including LightGBM and SVM-based models) also reached AUC values in the 0.97–0.99 range. Notably, SVM models paired with Boruta or LASSO feature selection demonstrated the best calibration (lowest Brier scores), indicating reliable risk estimation. The final selected SVM models achieved high specificity (≥95%) with moderate sensitivity (~75–80%) for DVT detection. Conclusions: ML models demonstrated high accuracy in predicting postoperative DVT following tibial fracture surgery. SVM-based models showed particularly favorable discrimination and calibration. These results suggest the potential utility of ML-based risk stratification to guide individualized prophylaxis, warranting further validation in prospective clinical settings.

(This article belongs to the Special Issue Applications of Artificial Intelligence in Orthopedics)
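Since calibration above is measured with Brier scores, a small sketch may help show what that number represents. The predicted risks below are invented; only the 80-of-471 event rate comes from the abstract. The reference value is the score of a model that predicts the observed prevalence for every patient, which is a common point of comparison rather than something the study reports.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = DVT) and predicted risks from two imaginary models.
outcomes = [0, 0, 0, 1, 1]
well_calibrated = [0.1, 0.2, 0.1, 0.8, 0.9]
overconfident = [0.0, 0.9, 0.0, 1.0, 0.3]

score_a = brier_score(outcomes, well_calibrated)
score_b = brier_score(outcomes, overconfident)

# Event rate reported in the abstract: 80 of 471 patients developed DVT.
prevalence = 80 / 471
# Brier score of always predicting the prevalence: p * (1 - p).
reference = prevalence * (1 - prevalence)
```

Lower is better: a model is only informative on this metric if its Brier score falls below the prevalence-only reference, which is roughly 0.141 for a 17.0% event rate.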

22 pages, 1906 KiB

Open Access Article

Explainable and Optuna-Optimized Machine Learning for Battery Thermal Runaway Prediction Under Class Imbalance Conditions

by Abir El Abed, Ghalia Nassreddine, Obada Al-Khatib, Mohamad Nassereddine and Ali Hellany

Thermo 2025, 5(3), 23; https://doi.org/10.3390/thermo5030023 - 15 Jul 2025

Abstract

Modern energy storage systems for both power and transportation rely heavily on lithium-ion batteries (LIBs). However, their safety is threatened by a potentially hazardous failure mode known as thermal runaway (TR). Predicting and classifying the causes of TR can greatly enhance the safety of power and transportation systems. This paper presents a machine learning method for forecasting and classifying the causes of TR. A generative model for synthetic data generation was used to handle class imbalance in the dataset. Hyperparameter optimization was conducted using Optuna for four classifiers: Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), tabular network (TabNet), and Extreme Gradient Boosting (XGBoost). Three-fold cross-validation was used to support a robust evaluation. An open-source database of LIB failure events was used for model training and testing. The XGBoost model outperformed the other models across all TR categories, achieving 100% accuracy and a recall of 1.00. Model results were interpreted using SHapley Additive exPlanations analysis to identify the most significant predictors of TR. The findings show that important TR indicators include energy adjusted for heat and weight loss, heater power, average cell temperature upon activation, and heater duration. These findings can guide the design of safer battery systems and preventive monitoring systems for real applications. They can help experts develop more efficient battery management systems, thereby improving the performance and longevity of battery-operated devices. By improving the predictive understanding of temperature-driven failure mechanisms in LIBs, the study contributes to thermal analysis and energy storage safety.

24 pages, 4383 KiB

Open Access Article

Predicting Employee Attrition: XAI-Powered Models for Managerial Decision-Making

by İrem Tanyıldızı Baydili and Burak Tasci

Systems 2025, 13(7), 583; https://doi.org/10.3390/systems13070583 - 15 Jul 2025

Abstract

Background: Employee turnover poses a multi-faceted challenge to organizations by undermining productivity, morale, and financial stability while wasting investments in recruitment, onboarding, and training. Traditional machine learning approaches often struggle with class imbalance and lack transparency, limiting actionable insights. This study introduces an Explainable AI (XAI) framework to achieve both high predictive accuracy and interpretability in turnover forecasting. Methods: Two publicly available HR datasets (IBM HR Analytics, Kaggle HR Analytics) were preprocessed with label encoding and MinMax scaling. Class imbalance was addressed via synthetic data generation based on a Generative Adversarial Network (GAN). A three-layer Transformer encoder performed binary classification, and SHapley Additive exPlanations (SHAP) analysis provided both global and local feature attributions. Model performance was evaluated using accuracy, precision, recall, F1 score, and ROC AUC. Results: On the IBM dataset, the GAN Transformer model achieved 92.00% accuracy, 96.67% precision, 87.00% recall, 91.58% F1, and 96.32% ROC AUC. On the Kaggle dataset, it reached 96.95% accuracy, 97.28% precision, 96.60% recall, 96.94% F1, and 99.15% ROC AUC, substantially outperforming classical resampling methods (ROS, SMOTE, ADASYN) and recent literature benchmarks. SHAP explanations highlighted JobSatisfaction, Age, and YearsWithCurrManager as top predictors in IBM and number project, satisfaction level, and time spend company in Kaggle. Conclusion: The proposed GAN Transformer SHAP pipeline delivers state-of-the-art turnover prediction while furnishing transparent, actionable insights for HR decision-makers. Future work should validate generalizability across diverse industries and develop lightweight, real-time implementations.

(This article belongs to the Special Issue Decision-Making in Sustainable Business Models: Prediction and Modeling)
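The reported precision, recall, and F1 figures can be cross-checked, because F1 is the harmonic mean of precision and recall. The sketch below recomputes F1 from the percentages quoted in the abstract; the function and variable names are illustrative only.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in the abstract.
reported = {
    "IBM": {"precision": 96.67, "recall": 87.00, "f1": 91.58},
    "Kaggle": {"precision": 97.28, "recall": 96.60, "f1": 96.94},
}

# Recompute F1 from precision and recall and round to two decimals.
recomputed = {
    name: round(f1_from_precision_recall(m["precision"], m["recall"]), 2)
    for name, m in reported.items()
}

# True when the recomputed value matches the reported F1 to two decimals.
consistent = {
    name: abs(recomputed[name] - reported[name]["f1"]) < 0.005
    for name in reported
}
```

Both recomputed values agree with the reported F1 scores to two decimal places, so the three metrics are internally consistent for each dataset.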

30 pages, 1095 KiB

Open Access Article

Unraveling the Drivers of ESG Performance in Chinese Firms: An Explainable Machine-Learning Approach

by Hyojin Kim and Myounggu Lee

Systems 2025, 13(7), 578; https://doi.org/10.3390/systems13070578 - 14 Jul 2025

Abstract

As Chinese firms play pivotal roles in global supply chains, multinational corporations face increasing pressure to ensure ESG accountability across their sourcing networks. Current ESG rating systems lack transparency in how they incorporate China’s unique industrial, economic, and cultural factors, creating reliability concerns for stakeholders managing supply chain sustainability risks. This study develops an explainable artificial intelligence framework using SHAP and permutation feature importance (PFI) methods to predict the ESG performance of Chinese firms. We analyze comprehensive ESG data for 1608 Chinese listed companies over 13 years (2009–2021), integrating financial and non-financial determinants traditionally examined in isolation. Empirical findings show that random forest algorithms significantly outperform multivariate linear regression in capturing nonlinear ESG relationships. Key non-financial determinants include patent portfolios, CSR training initiatives, pollutant emissions, and charitable donations, while financial factors such as current assets and gearing ratios also prove influential. Sectoral analysis reveals that manufacturing firms are evaluated through pollutant emissions and technical capabilities, whereas non-manufacturing firms are assessed on business taxes and intangible assets. These insights give multinational corporations tools to anticipate supply chain sustainability conditions.

(This article belongs to the Special Issue Navigating Complexity in a Changing World: Challenges and Opportunities for Sustainable and Resilient Supply Chains)

27 pages, 9829 KiB

Open Access Article

An Advanced Ensemble Machine Learning Framework for Estimating Long-Term Average Discharge at Hydrological Stations Using Global Metadata

by Alexandr Neftissov, Andrii Biloshchytskyi, Ilyas Kazambayev, Serhii Dolhopolov and Tetyana Honcharenko

Water 2025, 17(14), 2097; https://doi.org/10.3390/w17142097 - 14 Jul 2025

Abstract

Accurate estimation of long-term average (LTA) discharge is fundamental for water resource assessment, infrastructure planning, and hydrological modeling, yet it remains a significant challenge, particularly in data-scarce or ungauged basins. This study introduces a machine learning framework to estimate LTA discharge using globally available hydrological station metadata from the Global Runoff Data Centre (GRDC). The methodology involved comprehensive data preprocessing, extensive feature engineering, log-transformation of the target variable, and the development of multiple predictive models, including a custom deep neural network with specialized pathways and gradient boosting machines (XGBoost, LightGBM, CatBoost). Hyperparameters were optimized using Bayesian techniques, and a weighted Meta Ensemble model, which combines predictions from the best individual models, was implemented. Performance was evaluated using R², RMSE, and MAE on an independent test set. The Meta Ensemble model performed best, achieving a coefficient of determination (R²) of 0.954 on the test data and significantly surpassing the baseline and individual advanced models. Model interpretability analysis using SHAP (SHapley Additive exPlanations) confirmed that catchment area and geographical attributes are the dominant predictors. The resulting model provides a robust, accurate, and scalable data-driven solution for estimating LTA discharge, enhancing water resource assessment capabilities and offering a tool for large-scale hydrological analysis.

(This article belongs to the Section New Sensors, New Technologies and Machine Learning in Water Sciences)
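As an illustration of how a weighted ensemble can be combined with a log-transformed target, the sketch below averages per-model predictions in log space and then inverts the transform. It is a sketch under stated assumptions, not the authors' implementation: the use of `log1p`/`expm1`, the weights, and the per-model predictions are all hypothetical, and the abstract does not say whether blending happens before or after back-transformation.

```python
import math


def weighted_ensemble_log(preds_log, weights):
    """Weighted average of log-space predictions, back-transformed.

    preds_log: dict mapping model name -> predicted log1p(discharge)
    weights:   dict mapping model name -> non-negative weight
    """
    total = sum(weights[m] for m in preds_log)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    blended_log = sum(weights[m] * preds_log[m] for m in preds_log) / total
    return math.expm1(blended_log)  # inverse of log1p


# Hypothetical per-model discharge predictions (m^3/s) and ensemble weights.
raw = {"xgboost": 120.0, "lightgbm": 100.0, "catboost": 110.0}
preds_log = {m: math.log1p(v) for m, v in raw.items()}
weights = {"xgboost": 0.5, "lightgbm": 0.2, "catboost": 0.3}

estimate = weighted_ensemble_log(preds_log, weights)

# Arithmetic weighted mean of the raw predictions, for comparison.
arithmetic = sum(weights[m] * raw[m] for m in raw) / sum(weights.values())
```

Averaging in log space amounts to a weighted geometric mean of (discharge + 1), so the blended estimate sits slightly below the arithmetic weighted mean of the raw predictions and is less sensitive to a single very large prediction.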

21 pages, 3168 KiB

Open Access Article

Prediction on Slip Modulus of Screwed Connection for Timber–Concrete Composite Structures Based on Machine Learning

by Wen-Wu Lu, Yu-Wei Chen, Ji-Gang Xu, Hui-Feng Yang, Hao-Tian Tao, Wei Zheng and Ben-Kai Shi

Buildings 2025, 15(14), 2458; https://doi.org/10.3390/buildings15142458 - 13 Jul 2025

Abstract

Screwed connections are widely adopted in timber–concrete composite (TCC) structures. Owing to the diverse connection configurations and complex shear mechanisms, existing empirical models and theoretical formulas cannot accurately and efficiently predict the slip modulus of a screwed connection. Therefore, this study develops machine learning (ML) algorithms to predict the slip modulus accurately. A data set of 222 test results was established by collecting slip modulus values and ten associated features. Four ML methods, namely decision tree (DT), random forest (RF), adaptive boosting machine (AdaBoost), and gradient boosting regression tree (GBRT), were adopted to develop the ML algorithm. The SHapley Additive exPlanations (SHAP) framework was employed to interpret the effects of the features on the slip modulus. GBRT was the most accurate of the four ML methods on four commonly used quantitative metrics. Moreover, all ML methods were clearly more accurate than existing analytical methods. The SHAP analysis showed that concrete strength, screw inclination, timber density, and timber type have a larger impact on the slip modulus of a screwed connection than the other input features.

(This article belongs to the Special Issue Performance Analysis of Timber Composite Structures)

21 pages, 2742 KiB

Open Access Article

Origin Traceability of Chinese Mitten Crab (Eriocheir sinensis) Using Multi-Stable Isotopes and Explainable Machine Learning

by Danhe Wang, Chunxia Yao, Yangyang Lu, Di Huang, Yameng Li, Xugan Wu, Weiguo Song and Qinxiong Rao

Foods 2025, 14(14), 2458; https://doi.org/10.3390/foods14142458 - 13 Jul 2025

Abstract

The Chinese mitten crab (Eriocheir sinensis) industry currently faces origin fraud, and existing traceability methods lack precision and interpretability. Here, we propose a high-precision origin traceability method that combines stable isotope analysis with interpretable machine learning. We sampled Chinese mitten crabs from six origins representing diverse aquatic environments and farming practices, and analyzed their δ¹³C, δ¹⁵N, δ²H, and δ¹⁸O stable isotope compositions in different sexes and tissues (hepatopancreas, muscle, and gonad). Comparing the classification performance of Random Forest, XGBoost, and Logistic Regression models, we found that the Random Forest model outperformed the others, achieving high accuracy (91.3%) in distinguishing samples from different origins. Interpretation of the optimal Random Forest model using SHAP (SHapley Additive exPlanations) analysis identified δ²H in male muscle, δ¹⁵N in female hepatopancreas, and δ¹³C in female hepatopancreas as the most influential features for discriminating geographic origin. This analysis highlighted the crucial role of environmental factors, such as water source, diet, and trophic level, in origin discrimination and showed that the isotopic characteristics of different tissues provide unique discriminatory information. This study offers a novel paradigm for stable isotope traceability based on explainable machine learning, significantly enhancing the identification capability and reliability of Chinese mitten crab origin traceability, with significant implications for food safety assurance.

(This article belongs to the Section Food Analytical Methods)

21 pages, 5361 KiB

Open Access Article

Inversion of County-Level Farmland Soil Moisture Based on SHAP and Stacking Models

by Hui Zhan, Peng Guo, Jiaxin Hao, Jiali Li and Zixu Wang

Agriculture 2025, 15(14), 1506; https://doi.org/10.3390/agriculture15141506 - 13 Jul 2025

Abstract

Accurate monitoring of soil moisture in arid agricultural regions is essential for improving crop production and managing water resources efficiently. This study focuses on Shihezi City in Xinjiang, China. We propose a novel method for soil moisture retrieval that integrates Sentinel-1 and Sentinel-2 remote sensing data. Dual-polarization parameters (VV + VH and VV × VH) were constructed and tested. Pearson correlation analysis showed that these polarization combinations carried the most useful information for soil moisture estimation. We then applied SHapley Additive exPlanations (SHAP) for feature selection, and a Stacking model was used to perform soil moisture inversion based on the selected features. SHAP values derived from the coupled support vector regression (SVR) and random forest regression (RFR) models were used to select an additional six key features for model construction. Building on this framework, we compared the predictive performance of multivariate linear regression (MLR), RFR, SVR, and a Stacking model that integrates these three models. The Stacking model outperformed the other approaches in soil moisture retrieval, achieving an R² of 0.70 compared with 0.52, 0.61, and 0.62 for MLR, RFR, and SVR, respectively. Finally, the Stacking model was used to generate a county-level farmland soil moisture distribution map, which provides an objective and practical basis for agricultural management and the optimized allocation of water resources in arid regions.

(This article belongs to the Section Digital Agriculture)
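
The dual-polarization step in this abstract can be illustrated with a minimal sketch. It builds the two combinations named in the abstract (VV + VH and VV × VH) and computes their Pearson correlation with soil moisture. The function names and all numeric values are hypothetical, and the units of the backscatter inputs are an assumption, since the abstract does not state them; this is not the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dual_pol_features(vv, vh):
    """Build the two dual-polarization combinations named in the abstract."""
    return {
        "VV+VH": [a + b for a, b in zip(vv, vh)],
        "VVxVH": [a * b for a, b in zip(vv, vh)],
    }


# Hypothetical backscatter samples (assumed to be in dB) and soil moisture values.
vv = [-10.2, -9.8, -11.5, -8.9, -12.1]
vh = [-17.0, -16.1, -18.4, -15.2, -19.0]
soil_moisture = [0.18, 0.21, 0.14, 0.25, 0.12]

features = dual_pol_features(vv, vh)
correlations = {
    name: pearson(values, soil_moisture) for name, values in features.items()
}

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

With negative dB inputs the product VV × VH is positive and grows as both channels weaken, so its correlation with soil moisture has the opposite sign to that of the sum.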

27 pages, 10631 KiB

Open Access Article

Sensor-Based Yield Prediction in Durum Wheat Under Semi-Arid Conditions Using Machine Learning Across Zadoks Growth Stages

by Süreyya Betül Rufaioğlu, Ali Volkan Bilgili, Erdinç Savaşlı, İrfan Özberk, Salih Aydemir, Amjad Mohamed Ismael, Yunus Kaya and João P. Matos-Carvalho

Remote Sens. 2025, 17(14), 2416; https://doi.org/10.3390/rs17142416 - 12 Jul 2025

Viewed by 309

Abstract

Yield prediction in wheat cultivated under semi-arid climatic conditions is gaining increasing importance for sustainable production strategies and decision support systems. In this study, a time-series-based modeling approach was implemented using sensor-based data (SPAD, NSPAD, NDVI, INSEY, and plant height measurements) collected at four different Zadoks growth stages (ZD24, ZD30, ZD31, and ZD32). Five machine learning algorithms (Random Forest, Gradient Boosting, AdaBoost, LightGBM, and XGBoost) were tested individually for each stage, and model performance was evaluated using R² (%), RMSE (t/ha), and MAE (t/ha). The ZD31 stage (first node detectable) gave the highest prediction accuracy, with the XGBoost model achieving the highest R² (81.0%). For the same model, the RMSE and MAE were 0.49 and 0.37 t/ha, respectively. The LightGBM model also performed well at the ZD30 stage, achieving an R² of 78.0%, an RMSE of 0.52 t/ha, and an MAE of 0.40 t/ha. The SHAP (SHapley Additive exPlanations) method, used to interpret feature importance, revealed that the NDVI and INSEY indices contributed the most to yield prediction. This study demonstrates that phenology-sensitive yield prediction approaches offer high potential for sensor-based digital applications. Furthermore, the integration of timing, model selection, and explainability provided valuable insights for the development of advanced decision support systems.

(This article belongs to the Special Issue Cropland and Yield Mapping with Multi-source Remote Sensing)
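
The three metrics reported in this abstract follow standard definitions, sketched below in plain Python. The function name and the yield values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R² as a percentage, RMSE, and MAE for paired observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2_percent": 100.0 * (1.0 - ss_res / ss_tot),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }


# Hypothetical yields in t/ha.
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 3.5, 5.5, 5.5]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE carry the units of the target (t/ha here), while R² is unitless and is expressed as a percentage to match the abstract's convention.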

14 pages, 1519 KiB

Open Access Article

Harnessing Radiomics and Explainable AI for the Classification of Usual and Nonspecific Interstitial Pneumonia

by Turkey Refaee, Ouf Aloofy, Khalid Alduraibi, Wael Ageeli, Ali Alyami, Rafat Mohtasib, Naif Majrashi and Philippe Lambin

J. Clin. Med. 2025, 14(14), 4934; https://doi.org/10.3390/jcm14144934 - 11 Jul 2025

Viewed by 304

Abstract

Objectives: Accurate differentiation between usual interstitial pneumonia (UIP) and nonspecific interstitial pneumonia (NSIP) is crucial for guiding treatment in interstitial lung diseases (ILDs). This study evaluates the efficacy of clinical, radiomic, and combined models in classifying UIP and NSIP using high-resolution computed tomography (HRCT) scans. Materials and Methods: A retrospective analysis was performed on 105 HRCT scans (UIP = 60, NSIP = 45) from Faisal Hospital and Research Center. Demographic and pulmonary function data formed the clinical model. Radiomic features, extracted using the pyRadiomics package, were refined using recursive feature elimination. A combined model was developed by integrating clinical and radiomic features to assess their complementary diagnostic value. Model performance was assessed via the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) analysis, including both global feature importance and individual-level explanations, was used to interpret the model predictions. Results: The clinical model achieved an AUC of 0.62, with a sensitivity of 54% and a specificity of 78%. The radiomic model outperformed it, with an AUC of 0.90 and both sensitivity and specificity above 85%. The combined model achieved an AUC of 0.86, with a sensitivity of 88% and a specificity of 78%. SHAP analysis identified texture-based features, such as GLCM_Idmn and NGTDM_Contrast, as influential for classification. Conclusions: Radiomic features enhance classification accuracy for UIP and NSIP compared to clinical models. Integrating HCR into clinical workflows may reduce variability and improve diagnostic accuracy in ILD. Future studies should validate findings using larger, multicenter datasets.

(This article belongs to the Section Nuclear Medicine & Radiology)
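
The evaluation metrics named in this abstract (AUC, sensitivity, specificity) can be sketched in plain Python. The example assumes a binary coding in which UIP is the positive class (1) and NSIP the negative class (0), plus a decision threshold of 0.5; both are illustrative assumptions, as are the function names and scores, and none of them come from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = UIP, 0 = NSIP, by assumption) and model scores.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={area:.2f}")
```

AUC is threshold-independent, whereas sensitivity and specificity depend on the chosen cut-off, which is why a model can have a high AUC and still show an uneven sensitivity/specificity pair.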

Search Results (1,043)