Search Results (7)

Search Parameters:
Keywords = Boruta-SHAP feature importance

16 pages, 2351 KiB  
Article
Associations Between Dietary Amino Acid Intake and Elevated High-Sensitivity C-Reactive Protein in Children: Insights from a Cross-Sectional Machine Learning Study
by Lianlong Yu, Xiaodong Zheng, Jilan Li, Changqing Liu, Yiya Liu, Meina Tian, Qianrang Zhu, Zhenchuang Tang and Maoyu Wu
Nutrients 2025, 17(13), 2235; https://doi.org/10.3390/nu17132235 - 5 Jul 2025
Viewed by 563
Abstract
Background: High-sensitivity C-reactive protein (hs-CRP) is a protein that indicates inflammation and the risk of cardiovascular diseases. The intake of dietary amino acids can influence immune and inflammatory reactions. However, studies on the relationship between dietary amino acids and hs-CRP, especially in children, remain scarce. Methods: This cross-sectional study analyzed data from the Nutrition and China Children and Lactating Women Nutrition and Health Survey (2016–2019), focusing on 3514 children (724 with elevated hs-CRP ≥ 3 mg/L and 2790 with normal levels). Dietary information was gathered via a food frequency questionnaire, and hs-CRP levels were obtained from blood samples. The Boruta algorithm and propensity scores were used to select dietary factors and match samples. Machine learning (ML) algorithms and logistic regression models assessed the link between amino acid intake and elevated hs-CRP risk, adjusting for age, sex, BMI, and lifestyle factors. Results: The odds ratios (ORs) for elevated hs-CRP were significant for several amino acids, including Ile, Leu, Lys, Ser, Cys, Tyr, His, Pro, SAA, and AAA, with values ranging from 1.10 to 2.07. The LightGBM algorithm was the most effective in predicting elevated hs-CRP risk, achieving an AUC of 0.927. Tyrosine, methionine, cysteine, and proline were identified as important features by SHAP analysis and logistic regression. The intake of Ser, Cys, Tyr, and Pro showed a linear increase in the risk of elevated hs-CRP, especially in individuals with low protein intake and normal weight (p < 0.1). Conclusions: Intake of amino acids such as Ser, Cys, Tyr, and Pro significantly impacts hs-CRP levels in children, indicating that regulating these could help prevent inflammation-related diseases. This study supports future dietary and health management strategies. This is the first large-scale ML study linking amino acids to pediatric inflammation in China. The main limitations are the cross-sectional design and the use of self-reported dietary data.

28 pages, 27464 KiB  
Article
Groundwater Potential Mapping Using Optimized Decision Tree-Based Ensemble Learning Model with Local and Global Explainability
by Fatemeh Sadat Hosseini, Ali Jafari, Iman Zandi, Ali Asghar Alesheikh and Fatemeh Rezaie
Water 2025, 17(10), 1520; https://doi.org/10.3390/w17101520 - 17 May 2025
Cited by 1 | Viewed by 893
Abstract
Identifying potential groundwater areas is of great importance for sustainable groundwater management. This study improves groundwater potential mapping in Fars province, Iran, by integrating Random Forest (RF) and Categorical gradient Boosting (CatBoost) models with a Bayesian optimization algorithm. The Boruta–XGBoost algorithm was used to select the most important features, and SHapley Additive exPlanations (SHAP) values increased the local and global interpretability of the models. The results showed that the optimized CatBoost model provided a more accurate and reliable groundwater potential map, with an Area Under the receiver operating characteristic Curve (AUC) of 0.8778 and a Root Mean Square Error (RMSE) of 0.3779, compared to the RF model with an AUC of 0.8396 and an RMSE of 0.4072. The CatBoost model also identified 80% of wells with potential 1 in the very high and high potential classes, as well as 60% of wells with potential 0 in the low and very low potential classes. SHAP analysis highlighted land use/land cover and the terrain roughness index as the most impactful features, while porosity and permeability had minimal influence. In addition, the contribution of individual features for each mapping unit in the study area was calculated using SHAP analysis, and a map of SHAP values was prepared. The proposed approach offers a comprehensive methodology for groundwater potential mapping, encompassing input data identification, key feature selection, machine learning model optimization, and output explanation. This procedure can be applied in other areas and regions, providing valuable insights for decision-makers to manage groundwater resources sustainably and ensure water security.

14 pages, 1243 KiB  
Article
The Prognostic Value of the CALLY Index in Sepsis: A Composite Biomarker Reflecting Inflammation, Nutrition, and Immunity
by Ali Sarıdaş and Remzi Çetinkaya
Diagnostics 2025, 15(8), 1026; https://doi.org/10.3390/diagnostics15081026 - 17 Apr 2025
Viewed by 845
Abstract
Background/Objectives: Sepsis remains a leading cause of mortality worldwide, necessitating the development of effective prognostic markers for early risk stratification. The C-reactive protein–albumin–lymphocyte (CALLY) index is a novel biomarker that integrates inflammatory, nutritional, and immunological parameters. This study aimed to evaluate the association between the CALLY index and 30-day all-cause mortality in sepsis patients. Methods: This retrospective cohort study included adult patients diagnosed with sepsis in the emergency department between 1 January 2022 and 1 January 2025. The CALLY index was calculated as (CRP × absolute lymphocyte count)/albumin. The primary outcome was 30-day all-cause mortality. Five machine learning models (extreme gradient boosting (XGBoost), multilayer perceptron, random forest, support vector machine, and generalized linear model) were developed for mortality prediction. Four feature selection strategies (gain score, SHAP values, Boruta, and LASSO regression) were used to evaluate predictor consistency. The clinical utility of the CALLY index was assessed using decision curve analysis (DCA). Results: A total of 1644 patients were included, of whom 345 (21.0%) died within 30 days. Among the five machine learning models, the XGBoost model achieved the highest performance (AUC: 0.995, R²: 0.867, MAE: 0.063, RMSE: 0.145). In gain-based feature selection, the CALLY index emerged as the top predictor (gain: 0.187), followed by serum lactate (0.185) and white blood cell count (0.117). The CALLY index also ranked second in SHAP analysis (mean value: 0.317) and first in Boruta importance (mean importance: 37.54). DCA showed the highest net clinical benefit of the CALLY index within the 0.10–0.15 risk threshold range. Conclusions: This study demonstrates that the CALLY index is a significant predictor of 30-day mortality in sepsis patients. Machine learning analysis further reinforced the prognostic value of the CALLY index.
(This article belongs to the Special Issue Diagnosis and Prognosis of Sepsis)

21 pages, 14071 KiB  
Article
Data Integration Based on UAV Multispectra and Proximal Hyperspectra Sensing for Maize Canopy Nitrogen Estimation
by Fuhao Lu, Haiming Sun, Lei Tao and Peng Wang
Remote Sens. 2025, 17(8), 1411; https://doi.org/10.3390/rs17081411 - 16 Apr 2025
Viewed by 678
Abstract
Nitrogen (N) is critical for maize (Zea mays L.) growth and yield, necessitating precise estimation of canopy nitrogen concentration (CNC) to optimize fertilization strategies. Remote sensing technologies, such as proximal hyperspectral sensors and unmanned aerial vehicle (UAV)-based multispectral imaging, offer promising solutions for non-destructive CNC monitoring. This study evaluates the effectiveness of integrating proximal hyperspectral sensor data and UAV-based multispectral data in estimating CNC for spring maize during key growth stages (from the 11th leaf stage, V11, to the silking stage, R1). Field experiments were conducted to collect multispectral data (20 vegetation indices [MVI] and 24 texture indices [MTI]) and hyperspectral data (24 vegetation indices [HVI] and 20 characteristic indices [HCI]), alongside laboratory analysis of 120 CNC samples. The Boruta algorithm identified important features from the integrated datasets, followed by correlation analysis between these features and CNC and by Random Forest (RF)-based modeling, with SHAP (SHapley Additive exPlanations) values interpreting feature contributions. Results demonstrated that the UAV-based multispectral model achieved high accuracy and computational efficiency (CE) (R² = 0.879, RMSE = 0.212, CE = 2.075), outperforming the hyperspectral HVI-HCI model (R² = 0.832, RMSE = 0.250, CE = 2.080). Integrating multispectral and hyperspectral features yielded a high-precision model for CNC estimation (R² = 0.903, RMSE = 0.190), outperforming the standalone multispectral and hyperspectral models by 2.73% and 8.53%, respectively. However, the CE of the integrated model decreased by 1.93% and 1.68%, respectively. Key features, including multispectral red-edge indices (NREI, NDRE, CI) and texture parameters (R1m) alongside hyperspectral indices (SR, PRI) and spectral parameters (SDy, Rg), exhibited varying directional impacts on CNC estimation using RF. Together, these findings highlight that the Boruta–RF–SHAP strategy demonstrates the synergistic value of integrating multi-source data from UAV-based multispectral and proximal hyperspectral sensing for enhancing precise nitrogen management in maize cultivation.
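The reported gains of 2.73% and 8.53% appear to be relative improvements in R² over the two standalone models. The abstract does not say so explicitly, so this reading is an inference, but a quick check with the quoted values is consistent with it:

```latex
\[
\frac{0.903 - 0.879}{0.879} \approx 0.0273 = 2.73\%,
\qquad
\frac{0.903 - 0.832}{0.832} \approx 0.0853 = 8.53\%
\]
```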

21 pages, 8305 KiB  
Article
Digital Mapping of Soil pH and Driving Factor Analysis Based on Environmental Variable Screening
by He Huang, Yaolin Liu, Yanfang Liu, Zhaomin Tong, Zhouqiao Ren and Yifan Xie
Sustainability 2025, 17(7), 3173; https://doi.org/10.3390/su17073173 - 3 Apr 2025
Cited by 1 | Viewed by 698
Abstract
This study comprehensively considers soil formation factors such as land use types, soil types, depths, and geographical conditions in Lanxi City, China. Using multi-source public data, three environmental variable screening methods (the Boruta algorithm, Recursive Feature Elimination (RFE), and Particle Swarm Optimization (PSO)) were used to optimize and combine 47 environmental variables for modeling soil pH, based on data collected from farmland in the study area in 2022, and their effects were evaluated. A Random Forest (RF) model was used to predict soil pH in the study area. At the same time, Pearson correlation analysis, an environmental variable importance assessment based on the RF model, and a SHAP explanatory model were used to explore the main controlling factors of soil pH and reveal its spatial differentiation mechanism. The results showed that, in the presence of a large number of environmental variables, the RF model with covariates selected by PSO had higher prediction accuracy than the Boruta–RF, RFE–RF, and all-variable RF models (MAE = 0.496, RMSE = 0.641, R² = 0.413, LCCC = 0.508). This indicates that PSO, as a covariate selection method, effectively optimized the input variables for the RF model, enhancing its performance. In addition, the results of the Pearson correlation analysis, the RF-model-based environmental variable importance assessment, and the SHAP explanatory model consistently indicate that Channel Network Base Level (CNBL), Elevation (DEM), Temperature mean (T_m), Evaporation (E_m), Land surface temperature mean (LST_m), and Humidity mean (H_m) are key factors affecting the spatial differentiation of soil pH. In summary, using PSO for covariate selection before applying the RF model yields high prediction accuracy and can serve as an effective method for predicting the spatial distribution of soil pH, providing an important reference for mapping soil attributes in hilly and basin areas.

28 pages, 7401 KiB  
Article
A Field-Scale Framework for Assessing the Influence of Measure-While-Drilling Variables on Geotechnical Characterization Using a Boruta-SHAP Approach
by Daniel Goldstein, Chris Aldrich, Quanxi Shao and Louisa O’Connor
Mining 2025, 5(1), 20; https://doi.org/10.3390/mining5010020 - 20 Mar 2025
Cited by 2 | Viewed by 500
Abstract
This study presents an application of Boruta-SHapley Additive ExPlanations (Boruta-SHAP) for geotechnical characterization using Measure-While-Drilling (MWD) data, enabling a more interpretable and statistically rigorous assessment of feature importance. MWD data collected at the scale of an open-pit mine were used to characterize geotechnical properties using regression-based machine learning models. In contrast to previous studies that used MWD data to recognize rock type with Principal Component Analysis (PCA), which only identifies the directions of maximum variance, the Boruta-SHAP method quantifies the individual contribution of each MWD variable. This method supports interpretable and reliable geotechnical characterization as well as robust feature selection by comparing predictors against randomized ‘shadow’ features. The Boruta-SHAP analysis revealed that bit air pressure and torque-to-penetration ratio were the most significant predictors of rock strength, contradicting previous assumptions that rate of penetration was the dominant factor. Moreover, feature importance analysis was conducted for fracture frequency and the Geological Strength Index (GSI), a rock mass classification system. A comparative analysis of prediction performance was also performed using a range of machine learning algorithms, which yielded strong coefficients of determination between predicted values and actual field or laboratory results. The results are plausible, confirming that MWD data could provide a high-resolution description of geotechnical conditions prior to mining, leading to a more confident prediction of subsurface geotechnical properties. Therefore, the fragmentation from blasting as well as downstream operational phases, such as digging, hauling, and crushing, could be improved effectively.
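The shadow-feature comparison mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: absolute Pearson correlation stands in for the SHAP-based importance that Boruta-SHAP would use, the data are synthetic, and the function and column names (`shadow_feature_hits`, `signal`, `noise`) are hypothetical. The idea shown is only that a real feature scores a "hit" when its importance exceeds the best importance achieved by any shuffled copy.

```python
import random


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov / (var_x * var_y) ** 0.5)


def shadow_feature_hits(columns, target, n_trials=50, seed=0):
    """Count, per feature, how often it beats the best shuffled 'shadow' copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_trials):
        # Shadows: each column shuffled, which destroys any link to the target.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs_pearson(shuffled, target))
        threshold = max(shadow_scores)
        for name, values in columns.items():
            if abs_pearson(values, target) > threshold:
                hits[name] += 1
    return hits


# Synthetic demo data (assumption: not real drilling measurements).
rng = random.Random(42)
signal = [float(i) for i in range(200)]                     # drives the target
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]  # unrelated pattern
target = [2.0 * x + rng.gauss(0.0, 5.0) for x in signal]

hits = shadow_feature_hits({"signal": signal, "noise": noise}, target)
print(hits)
```

In the full Boruta procedure the hit counts feed a statistical test that confirms or rejects each feature over many iterations; the sketch stops at the counting step.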

11 pages, 1609 KiB  
Article
Machine Learning-Based Identification of the Strongest Predictive Variables of Winning and Losing in Belgian Professional Soccer
by Youri Geurkink, Jan Boone, Steven Verstockt and Jan G. Bourgois
Appl. Sci. 2021, 11(5), 2378; https://doi.org/10.3390/app11052378 - 8 Mar 2021
Cited by 36 | Viewed by 7612
Abstract
This study aimed to identify the strongest predictive variables of winning and losing in the highest Belgian soccer division. A predictive machine learning model based on a broad range of variables (n = 100) was constructed, using a dataset consisting of 576 games. To avoid multicollinearity and reduce dimensionality, a Variance Inflation Factor filter (threshold of 5) and BorutaShap were applied, respectively. A total of 13 variables remained and were used to predict winning or losing using Extreme Gradient Boosting. TreeExplainer was applied to determine feature importance on a global and local level. The model showed an accuracy of 89.6% ± 3.1% (precision: 88.9%; recall: 90.1%; f1-score: 89.5%), correctly classifying 516 out of 576 games. Shots on target from the attacking penalty box proved to be the best predictor. Several physical indicators are amongst the best predictors, as well as contextual variables such as ELO ratings, the added transfer value of the benched players, and match location. The results show the added value of including a broad spectrum of variables when predicting and evaluating game outcomes. Similar modelling approaches can be used by clubs to identify the strongest predictive variables for their leagues, and to evaluate and improve their current quantitative analyses.
(This article belongs to the Section Computing and Artificial Intelligence)
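As an illustration of the Variance Inflation Factor screening described in this abstract, the following sketch computes VIFs from the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse correlation matrix, which is equivalent to 1/(1 − R_j²). The abstract does not say how the authors computed VIF or in what order variables were dropped, so the helper names, the toy values, and the column names (`shots`, `shots_on_target`, `distance_km`) are assumptions for illustration only.

```python
def correlation_matrix(columns):
    """Return (names, Pearson correlation matrix) for a dict of equal-length lists."""
    names = list(columns)
    n = len(columns[names[0]])
    centered = {}
    for name in names:
        mean = sum(columns[name]) / n
        centered[name] = [v - mean for v in columns[name]]

    def corr(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
        return num / den

    return names, [[corr(centered[p], centered[q]) for q in names] for p in names]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names, corr = correlation_matrix(columns)
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Toy data (made-up values): the first two columns are nearly collinear.
data = {
    "shots": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "shots_on_target": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8],
    "distance_km": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
vif = variance_inflation_factors(data)
flagged = [name for name, value in vif.items() if value > 5]  # threshold of 5
print(vif, flagged)
```

A common way to apply such a threshold is to drop the variable with the highest VIF, recompute, and repeat until every remaining VIF falls below the cutoff. Whether the study proceeded this way is not stated.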