MDPI - Publisher of Open Access Journals

13 pages, 830 KiB

Open Access Article

Machine Learning-Based Prediction of Postoperative Deep Vein Thrombosis Following Tibial Fracture Surgery

by Humam Baki and İsmail Bülent Özçelik

Diagnostics 2025, 15(14), 1787; https://doi.org/10.3390/diagnostics15141787 - 16 Jul 2025

Viewed by 313

Background/Objectives: Postoperative deep vein thrombosis (DVT) is a common and serious complication after tibial fracture surgery. This study aimed to develop and evaluate machine learning (ML) models to predict the occurrence of DVT following tibia fracture surgery. Methods: A retrospective analysis was conducted on patients who had undergone surgery for isolated tibial fractures. A total of 42 predictive models were developed using combinations of six ML algorithms—logistic regression, support vector machine, random forest, extreme gradient boosting, Light Gradient Boosting Machine (LightGBM), and neural networks—and seven feature selection methods, including SHapley Additive exPlanations (SHAP), Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, recursive feature elimination, univariate filtering, and full-variable inclusion. Model performance was assessed based on discrimination, quantified by the area under the receiver operating characteristic curve (AUC-ROC), and calibration, measured using Brier scores, with internal validation performed via bootstrapping. Results: Of 471 patients, 80 (17.0%) developed postoperative DVT. The ML models achieved high overall accuracy in predicting DVT. Twenty-four models showed similarly excellent discrimination (pairwise AUC comparisons, p > 0.05). The top-performing model (random forest with RFE) attained an AUC of ~0.99, while several others (including LightGBM and SVM-based models) also reached AUC values in the 0.97–0.99 range. Notably, support vector machine models paired with Boruta or LASSO feature selection demonstrated the best calibration (lowest Brier scores), indicating reliable risk estimation. The final selected SVM models achieved high specificity (≥95%) with moderate sensitivity (~75–80%) for DVT detection. Conclusions: ML models demonstrated high accuracy in predicting postoperative DVT following tibial fracture surgery. 
Support vector machine-based models showed particularly favorable discrimination and calibration. These results suggest the potential utility of ML-based risk stratification to guide individualized prophylaxis, warranting further validation in prospective clinical settings.

(This article belongs to the Special Issue Applications of Artificial Intelligence in Orthopedics)
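The two evaluation measures named in the abstract above, AUC-ROC for discrimination and the Brier score for calibration, can be illustrated with a minimal dependency-free sketch. The toy labels and probabilities are hypothetical and are not data from the study; the function names are illustrative only.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = DVT, 0 = no DVT), not taken from the study.
y_true = [0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.6, 0.7, 0.9]

brier = brier_score(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

A lower Brier score means predicted risks sit closer to observed outcomes, which is why a model can rank patients almost perfectly (high AUC) and still be less well calibrated than another.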

18 pages, 8113 KiB

Open Access Article

An Interpretable Machine Learning Model Based on Inflammatory–Nutritional Biomarkers for Predicting Metachronous Liver Metastases After Colorectal Cancer Surgery

by Hao Zhu, Danyang Shen, Xiaojie Gan and Ding Sun

Biomedicines 2025, 13(7), 1706; https://doi.org/10.3390/biomedicines13071706 - 12 Jul 2025

Viewed by 444

Abstract

Objective: Tumor progression is regulated by systemic immune status, nutritional metabolism, and the inflammatory microenvironment. This study aims to investigate inflammatory–nutritional biomarkers associated with metachronous liver metastasis (MLM) in colorectal cancer (CRC) and develop a machine learning model for accurate prediction. Methods: This study enrolled 680 patients with CRC who underwent curative resection, randomly allocated into a training set (n = 477) and a validation set (n = 203) in a 7:3 ratio. Feature selection was performed using Boruta and Lasso algorithms, identifying nine core prognostic factors through variable intersection. Seven machine learning (ML) models were constructed using the training set, with the optimal predictive model selected based on comprehensive evaluation metrics. An interactive visualization tool was developed to interpret the dynamic impact of key features on individual predictions. The partial dependence plots (PDPs) revealed a potential dose–response relationship between inflammatory–nutritional markers and MLM risk. Results: Among 680 patients with CRC, the cumulative incidence of MLM at 6 months postoperatively was 39.1%. Multimodal feature selection identified nine key predictors, including the N stage, vascular invasion, carcinoembryonic antigen (CEA), systemic immune–inflammation index (SII), albumin–bilirubin index (ALBI), differentiation grade, prognostic nutritional index (PNI), fatty liver, and T stage. The gradient boosting machine (GBM) demonstrated the best overall performance (AUROC: 0.916, sensitivity: 0.772, specificity: 0.871). The generalized additive model (GAM)-fitted SHAP analysis established, for the first time, risk thresholds for four continuous variables (CEA > 8.14 μg/L, PNI < 44.46, SII > 856.36, ALBI > −2.67), confirming their significant association with MLM development. 
Conclusions: This study developed a GBM model incorporating inflammatory-nutritional biomarkers and clinical features to accurately predict MLM in colorectal cancer. Integrated with dynamic visualization tools, the model enables real-time risk stratification via a freely accessible web calculator, guiding individualized surveillance planning and optimizing clinical decision-making for precision postoperative care.

(This article belongs to the Special Issue Advances in Hepatology)

19 pages, 5784 KiB

Open Access Article

Identification of Exosome-Associated Biomarkers in Diabetic Foot Ulcers: A Bioinformatics Analysis and Experimental Validation

by Tianbo Li, Lei Gao and Jiangning Wang

Biomedicines 2025, 13(7), 1687; https://doi.org/10.3390/biomedicines13071687 - 10 Jul 2025

Viewed by 450

Abstract

Background: Diabetic foot ulcers (DFUs) are a severe complication of diabetes and are characterized by impaired wound healing and a high amputation risk. Exosomes, nanovesicles carrying proteins, RNAs, and lipids, mediate intercellular communication in wound microenvironments, yet their biomarker potential in DFUs remains underexplored. Methods: We analyzed transcriptomic data from GSE134431 (13 DFU vs. 8 controls) as a training set and validated findings in GSE80178 (6 DFU vs. 3 controls). A total of 7901 differentially expressed genes (DEGs) in DFUs were detected and intersected with 125 literature-curated exosome-related genes (ERGs) to yield 51 candidates. This was followed by GO/KEGG analyses and PPI network construction. Support vector machine–recursive feature elimination (SVM-RFE) and the Boruta random forest algorithm distilled five biomarkers (DIS3L, EXOSC7, SDC1, STX11, SYT17). Expression trends were confirmed in both datasets. Analyses included nomogram construction, functional and correlation analyses, immune infiltration, GSEA, gene co-expression and regulatory network construction, drug prediction, molecular docking, and RT-qPCR validation in clinical samples. Results: A nomogram combining these markers achieved acceptable calibration (Hosmer–Lemeshow p = 0.0718, MAE = 0.044). Immune cell infiltration analysis (CIBERSORT) revealed associations between biomarker levels and NK cell and neutrophil subsets. Gene set enrichment analysis (GSEA) implicated IL-17 signaling, proteasome function, and microbial infection pathways. A GeneMANIA network highlighted RNA processing and vesicle trafficking. Transcription factor and miRNA predictions uncovered regulatory circuits, and DGIdb-driven drug repurposing followed by molecular docking identified Indatuximab ravtansine and heparin as high-affinity SDC1 binders. Finally, RT-qPCR validation in clinical DFU tissues (n = 5) recapitulated the bioinformatic expression patterns. 
Conclusions: We present five exosome-associated genes as novel DFU biomarkers with diagnostic potential and mechanistic links to immune modulation and vesicular transport. These findings lay the groundwork for exosome-based diagnostics and therapeutic targeting in DFU management.

(This article belongs to the Section Cell Biology and Pathology)

16 pages, 2351 KiB

Open Access Article

Associations Between Dietary Amino Acid Intake and Elevated High-Sensitivity C-Reactive Protein in Children: Insights from a Cross-Sectional Machine Learning Study

by Lianlong Yu, Xiaodong Zheng, Jilan Li, Changqing Liu, Yiya Liu, Meina Tian, Qianrang Zhu, Zhenchuang Tang and Maoyu Wu

Nutrients 2025, 17(13), 2235; https://doi.org/10.3390/nu17132235 - 5 Jul 2025

Viewed by 563

Abstract

Background: High-sensitivity C-reactive protein (hs-CRP) is a protein that indicates inflammation and the risk of cardiovascular diseases. The intake of dietary amino acids can influence immune and inflammatory reactions. However, studies on the relationship between dietary amino acids and hs-CRP, especially in children, remain scarce. Methods: This cross-sectional study analyzed data from the Nutrition and China Children and Lactating Women Nutrition and Health Survey (2016–2019), focusing on 3514 children (724 with elevated hs-CRP ≥ 3 mg/L and 2790 with normal levels). Dietary information was gathered via a food frequency questionnaire, and hs-CRP levels were obtained from blood samples. The Boruta algorithm was used to select dietary factors, and propensity scores were used to match samples. Machine learning (ML) algorithms and logistic regression models assessed the link between amino acid intake and elevated hs-CRP risk, adjusting for age, sex, BMI, and lifestyle factors. Results: The odds ratios (ORs) for elevated hs-CRP were significant for several amino acids, including Ile, Leu, Lys, Ser, Cys, Tyr, His, Pro, SAA, and AAA, with values ranging from 1.10 to 2.07. The LightGBM algorithm was the most effective in predicting elevated hs-CRP risk, achieving an AUC of 0.927. Tyrosine, methionine, cysteine, and proline were identified as important features by SHAP analysis and logistic regression. The intake of Ser, Cys, Tyr, and Pro showed a linear increase in the risk of elevated hs-CRP, especially in individuals with low protein intake and normal weight (p < 0.1). Conclusions: Intake of amino acids such as Ser, Cys, Tyr, and Pro is significantly associated with hs-CRP levels in children, indicating that regulating these could help prevent inflammation-related diseases. This study supports future dietary and health management strategies. This is the first large-scale ML study linking amino acids to pediatric inflammation in China. 
The main limitations are the cross-sectional design and the use of self-reported dietary data.

(This article belongs to the Special Issue Feature Papers in Proteins and Amino Acids in Relation to Human Health)

25 pages, 24212 KiB

Open Access Article

Spatial Prediction of Soil Organic Carbon Based on a Multivariate Feature Set and Stacking Ensemble Algorithm: A Case Study of Wei-Ku Oasis in China

by Zuming Cao, Xiaowei Luo, Xuemei Wang and Dun Li

Sustainability 2025, 17(13), 6168; https://doi.org/10.3390/su17136168 - 4 Jul 2025

Viewed by 300

Abstract

Accurate estimation of soil organic carbon (SOC) content is crucial for assessing terrestrial ecosystem carbon stocks. Although traditional methods offer relatively high estimation accuracy, they are limited by poor timeliness and high costs. Combining measured data, remote sensing technology, and machine learning (ML) algorithms enables rapid, efficient, and accurate large-scale prediction. However, single ML models often suffer from high feature variable redundancy and weak generalization ability. Integrated models can effectively overcome these problems. This study focuses on the Weigan–Kuqa River oasis (Wei-Ku Oasis), a typical arid oasis in northwest China. It integrates Sentinel-2A multispectral imagery, a digital elevation model, ERA5 meteorological reanalysis data, soil attribute data, and land use (LU) data to estimate SOC. The Boruta algorithm, Lasso regression, and their combination were used to screen feature variables, constructing a multidimensional feature space. Random Forest (RF), Gradient Boosting Machine (GBM), and Stacking ensemble models were built. Results show that the Stacking model, constructed by combining the screened variable sets, exhibited optimal prediction accuracy (test set R² = 0.61, RMSE = 2.17 g∙kg⁻¹, RPD = 1.61), reducing the prediction error by 9% compared to single-model prediction. Difference Vegetation Index (DVI), Bare Soil Evapotranspiration (BSE), and type of land use (TLU) have a substantial multidimensional synergistic influence on the spatial differentiation pattern of the SOC. Including TLU substantially improved the model's estimation performance, raising the test-set R² by 24%. Combining Boruta–Lasso screening with Stacking facilitated the construction of a high-precision SOC content estimation model. 
This model has the capacity to provide technical support for precision fertilization in oasis regions in arid zones and the management of regional carbon sinks.
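The three accuracy measures reported in the abstract above (R², RMSE, RPD) can be computed as in the following minimal sketch. It assumes the common definition of RPD as the standard deviation of the observed values divided by RMSE, here with the sample (n − 1) standard deviation; the abstract does not state which convention the authors used. The toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and RPD for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    rmse = math.sqrt(ss_res / n)
    sd_obs = math.sqrt(ss_tot / (n - 1))  # sample SD of observations (assumed convention)
    return {"r2": 1 - ss_res / ss_tot, "rmse": rmse, "rpd": sd_obs / rmse}


# Hypothetical SOC values in g/kg, not taken from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
metrics = regression_metrics(observed, predicted)
```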

26 pages, 2124 KiB

Open Access Article

Integrating Boruta, LASSO, and SHAP for Clinically Interpretable Glioma Classification Using Machine Learning

by Mohammad Najeh Samara and Kimberly D. Harry

BioMedInformatics 2025, 5(3), 34; https://doi.org/10.3390/biomedinformatics5030034 - 30 Jun 2025

Viewed by 924

Abstract

Background: Gliomas represent the most prevalent and aggressive primary brain tumors, requiring precise classification to guide treatment strategies and improve patient outcomes. Purpose: This study aimed to develop and evaluate a machine learning-driven approach for glioma classification by identifying the most relevant genetic and clinical biomarkers while demonstrating clinical utility. Methods: A dataset from The Cancer Genome Atlas (TCGA) containing 23 features was analyzed using an integrative approach combining Boruta, Least Absolute Shrinkage and Selection Operator (LASSO), and SHapley Additive exPlanations (SHAP) for feature selection. The refined feature set was used to train four machine learning models: Random Forest, Support Vector Machine, XGBoost, and Logistic Regression. Comprehensive evaluation included class distribution analysis, calibration assessment, and decision curve analysis. Results: The feature selection approach identified 13 key predictors, including IDH1, TP53, ATRX, PTEN, NF1, EGFR, NOTCH1, PIK3R1, MUC16, CIC mutations, along with Age at Diagnosis and race. XGBoost achieved the highest AUC (0.93), while Logistic Regression recorded the highest testing accuracy (88.09%). Class distribution analysis revealed excellent GBM detection (Average Precision 0.840–0.880) with minimal false negatives (5–7 cases). Calibration analysis demonstrated reliable probability estimates (Brier scores 0.103–0.124), and decision curve analysis confirmed substantial clinical utility with net benefit values of 0.36–0.39 across clinically relevant thresholds. Conclusions: The integration of feature selection techniques with machine learning models enhances diagnostic precision, interpretability, and clinical utility in glioma classification, providing a clinically ready framework that bridges computational predictions with evidence-based medical decision-making.
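The net benefit values quoted from the decision curve analysis above can be made concrete with a short sketch. It uses the standard decision-curve definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; the abstract does not show the authors' own code, and the toy data below are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data (1 = positive class), not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
# "Treat all" reference strategy: every case is flagged regardless of risk.
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.5)
```

A decision curve plots this quantity across a range of thresholds for the model, for "treat all", and for "treat none" (net benefit 0); a model is useful at a threshold where its curve lies above both references.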

23 pages, 7504 KiB

Open Access Article

Development and Validation of the Early Gastric Carcinoma Prediction Model in Post-Eradication Patients with Intestinal Metaplasia

by Wulian Lin, Guanpo Zhang, Hong Chen, Weidong Huang, Guilin Xu, Yunmeng Zheng, Chao Gao, Jin Zheng, Dazhou Li and Wen Wang

Cancers 2025, 17(13), 2158; https://doi.org/10.3390/cancers17132158 - 26 Jun 2025

Viewed by 381

Abstract

Background: Gastric cancer (GC) remains a major global health challenge, with rising incidence among patients post-Helicobacter pylori (H. pylori) eradication, particularly those with persistent intestinal metaplasia (IM). Current risk stratification tools are limited in this high-risk population. Aim: To develop, validate, and externally test a machine learning-based prediction model—termed the Early Gastric Cancer Model (EGCM)—for identifying early gastric cancer (EGC) risk in H. pylori-eradicated patients with IM, and to implement it as a web-based clinical tool. Methods: This retrospective, dual-center study enrolled 214 H. pylori-eradicated patients with histologically confirmed IM from 900 Hospital and Fujian Provincial People’s Hospital. The dataset was split into a training cohort (70%) and an internal validation cohort (30%), with an external test cohort from the second center. A total of 21 machine learning algorithms were screened using cross-validation and hyperparameter optimization. Boruta and SHAP analyses were employed for feature selection, and the final EGCM was constructed using the top five predictors: atrophy range, xanthoma, map-like redness (MLR), MLR range, and age. Model performance was evaluated via ROC curves, precision–recall curves, calibration plots, and decision curve analysis (DCA), and compared against conventional inflammatory biomarkers such as NLR and PLR. Results: The CatBoost algorithm demonstrated the best overall performance, achieving an AUC of 0.743 (95% CI: 0.70–0.80) in internal validation and 0.905 in the external test set. The EGCM exhibited superior discrimination compared to individual inflammatory markers (p < 0.01). Calibration analysis confirmed strong agreement between predicted and observed outcomes. DCA showed the EGCM yielded greater net clinical benefit. A web calculator was developed to facilitate clinical application. 
Conclusions: The EGCM is a validated, interpretable, and practical tool for stratifying EGC risk in H. pylori-eradicated IM patients across multiple centers. Its integration into clinical practice could improve surveillance precision and early cancer detection.

(This article belongs to the Section Cancer Causes, Screening and Diagnosis)

17 pages, 3203 KiB

Open Access Feature Paper Article

Performance Assessment of CCGT Integrated with PTSA-Based CO₂ Capture: Effect of Sorbent Type and Operating Conditions

by Karol Sztekler, Agata Mlonka-Mędrala, Piotr Boruta, Tomasz Bujok, Ewelina Radomska and Łukasz Mika

Energies 2025, 18(13), 3289; https://doi.org/10.3390/en18133289 - 23 Jun 2025

Viewed by 261

Abstract

Recognizing the growing importance of natural gas as a transition fuel in Poland’s energy mix and the necessity of reducing CO₂ emissions, this article assesses the use of carbon capture and storage (CCS) technology to reduce CO₂ emissions from a combined cycle gas turbine (CCGT). The research employs pressure–temperature swing adsorption (PTSA) to capture CO₂ from flue gases. Computer simulations in IPSEpro (SimTech) are used to calculate the heat and mass balances for the CCGT and PTSA units and to assess their performance. In the first part of the research, the effect of sorbent type (Na-A and 5A) and of the share of flue gas directed to the PTSA unit on the performance of the CCGT was investigated. In the second part, a parametric analysis of the adsorption and desorption pressures in the PTSA was carried out. The results showed that CO₂ emissions from the CCGT can be reduced by 1.1 Mt (megatons) per year, but the use of PTSA was associated with a reduction in the net electrical power and efficiency of the CCGT by up to 14.7% for the Na-A sorbent and 11.1% for the 5A sorbent. It was also found that the heat and electricity demand of the PTSA depends on the adsorption and desorption pressures.

(This article belongs to the Section B3: Carbon Emission and Utilization)

15 pages, 640 KiB

Open Access Article

Interpretable Machine Learning for Serum-Based Metabolomics in Breast Cancer Diagnostics: Insights from Multi-Objective Feature Selection-Driven LightGBM-SHAP Models

by Emek Guldogan, Fatma Hilal Yagin, Hasan Ucuzal, Sarah A. Alzakari, Amel Ali Alhussan and Luca Paolo Ardigò

Medicina 2025, 61(6), 1112; https://doi.org/10.3390/medicina61061112 - 19 Jun 2025

Viewed by 951

Abstract

Background and Objectives: Breast cancer accounts for 12.5% of all new cancer cases in women worldwide. Early detection significantly improves survival rates, but traditional biomarkers like CA 15-3 and HER2 lack sensitivity and specificity, particularly for early-stage disease. Advances in metabolomics and machine learning, particularly explainable artificial intelligence (XAI), offer new opportunities for identifying robust biomarkers and improving diagnostic accuracy. This study aimed to identify and validate serum-based metabolic biomarkers for breast cancer using advanced metabolomic profiling techniques and a Light Gradient Boosting Machine (LightGBM) model. Additionally, SHapley Additive exPlanations (SHAP) were applied to enhance model interpretability and biological insight. Materials and Methods: The study included 103 breast cancer patients and 31 healthy controls. Serum samples underwent liquid and gas chromatography–time-of-flight mass spectrometry (LC-TOFMS and GC-TOFMS). Mutual Information (MI), Sparse Partial Least Squares (sPLS), Boruta, and Multi-Objective Feature Selection (MOFS) approaches were applied to the data for biomarker discovery. LightGBM, AdaBoost, and Random Forest were employed for classification, and class imbalance was addressed with the Synthetic Minority Oversampling Technique (SMOTE). SHAP analysis ranked metabolites based on their contribution to model predictions. Results: Compared to other feature selection approaches, the MOFS approach was more robust in terms of predictive performance, and metabolites identified by this method were used in subsequent analyses for biomarker discovery. LightGBM outperformed the AdaBoost and Random Forest models, achieving 86.6% accuracy, 89.1% sensitivity, 84.2% specificity, and an F1-score of 87.0%. SHAP analysis identified 2-Aminobutyric acid, choline, and coproporphyrin as the most influential metabolites, with dysregulation of these markers associated with breast cancer risk. 
Conclusions: This study is among the first to integrate SHAP explainability with metabolomic profiling, bridging computational predictions and biological insights for improved clinical adoption. This study demonstrates the effectiveness of combining metabolomics with XAI-driven machine learning for breast cancer diagnostics. The identified biomarkers not only improve diagnostic accuracy but also reveal critical metabolic dysregulations associated with disease progression.

(This article belongs to the Special Issue Recent Advances in Diagnosis and Therapy of Gynecologic and Breast Cancers)

24 pages, 7335 KiB

Open Access Article

Soil Organic Matter Content Prediction Using Multi-Input Convolutional Neural Network Based on Multi-Source Information Fusion

by Li Guo, Qin Gao, Mengyi Zhang, Panting Cheng, Peng He, Lujun Li, Dong Ding, Changcheng Liu, Francis Collins Muga, Masroor Kamal and Jiangtao Qi

Agriculture 2025, 15(12), 1313; https://doi.org/10.3390/agriculture15121313 - 19 Jun 2025

Viewed by 475

Abstract

Soil organic matter (SOM) content is a key indicator for assessing soil health, carbon cycling, and soil degradation. Traditional SOM detection methods are complex and time-consuming and do not meet the modern agricultural demand for rapid, non-destructive analysis. While significant progress has been made in spectral inversion for SOM prediction, its accuracy still lags behind traditional chemical methods. This study proposes a novel approach to predict SOM content by integrating spectral, texture, and color features using a three-branch convolutional neural network (3B-CNN). Spectral reflectance data (400–1000 nm) were collected using a portable hyperspectral imaging device. The top 15 spectral bands with the highest correlation were selected from 260 spectral bands using the Correlation Coefficient Method (CCM), Boruta algorithm, and Successive Projections Algorithm (SPA). Compared to other methods, CCM demonstrated superior dimensionality reduction performance, retaining bands highly correlated with SOM, which laid a solid foundation for multi-source data fusion. Additionally, six soil texture features were extracted from soil images taken with a smartphone using the gray-level co-occurrence matrix (GLCM), and twelve color features were obtained through the color histogram. These multi-source features were fused via trilinear pooling. The results showed that the 3B-CNN model, integrating multi-source data, performed exceptionally well in SOM prediction, with an R² of 0.87 and an RMSE of 1.68, a 23% improvement in R² compared to the 1D-CNN model using only spectral data. Incorporating multi-source data into traditional machine learning models (SVM, RF, and PLS) also improved prediction accuracy, with R² improvements ranging from 4% to 11%. 
This study demonstrates the potential of multi-source data fusion in accurately predicting SOM content, enabling rapid assessment at the field scale and providing a scientific basis for precision fertilization and agricultural management.

(This article belongs to the Section Agricultural Soils)

20 pages, 1495 KiB

Open Access Article

Multi-Indicator Assessment of Heavy Metals Contamination and Ecological Risk Around the Landfills of the Boruta Zgierz Dye Industry Plant in Central Poland

by Wojciech Pietruszewski and Anna Podlasek

Sustainability 2025, 17(12), 5425; https://doi.org/10.3390/su17125425 - 12 Jun 2025

Viewed by 441

Abstract

This study assesses the extent of heavy metal (HM) contamination and the associated ecological risks in soils surrounding waste landfills at the former Boruta Dye Industry Plant in Zgierz, Poland. Soil samples were collected from 13 locations during two sampling campaigns (summer 2023 and winter 2024). Concentrations of Cu, Ni, Zn, Pb, and Cd were measured, and contamination levels were evaluated using several indices: the geoaccumulation index (I_geo), pollution index (PI), pollution load index (PLI), Nemerow integrated pollution index (NIPI), ecological risk factor for a single metal (E_rⁱ), and index of potential ecological risk (ERI). The highest I_geo value (10.95) was recorded for Cu in the area of the old landfill, which had been in operation for 90 years. The average PI values were Cu = 120.97, Pb = 52.46, Cd = 46.70, Zn = 22.19, and Ni = 5.38, indicating considerable (3 ≤ PI < 6) to high (PI ≥ 6) contamination levels. The NIPI values, in descending order, were Cu (2102.2) > Pb (270.7) > Zn (88.3) > Cd (62.8) > Ni (21.5), all reflecting high (NIPI > 3) contamination levels. The highest PLI was 5.10, with all remaining values exceeding the contamination threshold (PLI > 1). The E_rⁱ value for Cu reached 14,852.75, indicating an extremely high (E_rⁱ ≥ 320) ecological risk. The average ERI value across the study area was 1347.2, suggesting a severe (ERI ≥ 600) ecological threat. These findings confirm that the industrial landfills associated with the dye plant constitute a critical pollution hotspot. The results underscore the urgent need for ongoing environmental monitoring, risk mitigation, and site remediation to prevent further environmental degradation and potential contamination of nearby water bodies.

(This article belongs to the Section Pollution Prevention, Mitigation and Sustainability)
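The indices named in the abstract above follow widely used textbook definitions, sketched below: I_geo = log2(C / (1.5·B)), PI = C / B, PLI as the geometric mean of the PIs, NIPI = sqrt((PI_mean² + PI_max²) / 2), and E_r = T_r × PI. The abstract does not state the toxic response factors used, so the value T_r = 5 for Cu (Håkanson's commonly cited factor) is an assumption, as is reading the reported E_r for Cu as the site maximum. Under those assumptions the Cu figures quoted in the abstract are mutually consistent.

```python
import math


def geoaccumulation_index(pi):
    """I_geo = log2(C / (1.5 * B)), written in terms of PI = C / B."""
    return math.log2(pi / 1.5)


def pollution_load_index(pis):
    """PLI: geometric mean of the single-metal pollution indices."""
    return math.prod(pis) ** (1 / len(pis))


def nemerow_index(pi_mean, pi_max):
    """NIPI = sqrt((PI_mean^2 + PI_max^2) / 2)."""
    return math.sqrt((pi_mean ** 2 + pi_max ** 2) / 2)


def ecological_risk_factor(pi, toxic_response):
    """E_r = T_r * PI for a single metal."""
    return toxic_response * pi


TOXIC_RESPONSE_CU = 5      # assumed: not stated in the abstract
er_cu_reported = 14852.75  # E_r for Cu, from the abstract
pi_cu_mean = 120.97        # average PI for Cu, from the abstract

# Back-calculate the maximum PI for Cu from the reported E_r (assumption-based).
pi_cu_max = er_cu_reported / TOXIC_RESPONSE_CU

nipi_cu = nemerow_index(pi_cu_mean, pi_cu_max)      # close to the reported 2102.2
igeo_cu_max = geoaccumulation_index(pi_cu_max)      # close to the reported 10.95
```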

26 pages, 9671 KiB

Open Access Article

Fine Resolution Mapping of Forest Soil Organic Carbon Based on Feature Selection and Machine Learning Algorithm

by Yanan Li, Jing Li, Jun Tan, Tianyue Ma, Xingguang Yan, Zongyang Chen and Kunheng Li

Remote Sens. 2025, 17(12), 2000; https://doi.org/10.3390/rs17122000 - 10 Jun 2025

Viewed by 584

Abstract

An accurate forest soil organic carbon (SOC) assessment aids in the ecological restoration of forest mining areas, enabling dynamic monitoring of carbon sink accounting and informed land reclamation decisions. Digital soil mapping (DSM) has enhanced soil monitoring, with machine learning and environmental covariates becoming the keys to improving accuracy. This study utilized 32 environmental variables from multispectral, topographic, and soil data, along with 142 soil samples and six machine learning methods to construct a forest SOC model for the Huodong mining district. The performance of Boruta and SHAP (SHapley Additive exPlanations) in optimizing feature selection was evaluated. Ultimately, the optimal machine learning model and feature selection method were applied to map the SOC distribution, with variable contributions quantified using SHAP. The results showed that CatBoost performed best among the six algorithms in predicting the SOC content (R² = 0.70). Both Boruta and SHAP improved the prediction accuracy, with Boruta achieving the highest precision. Introducing the Boruta model increased R² by 8.57% (from 0.70 to 0.76) compared to models without feature selection. The spatial distribution mapping revealed higher SOC concentrations in the southern and northern regions and lower levels in the central area, indicating strong spatial heterogeneity. Key factors influencing the SOC distribution included pH, the nitrogen content, sand content, DEM, and B3 band.
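The shadow-feature idea behind Boruta, which the abstract above evaluates as a feature selection step, can be sketched in a few lines. This is a deliberately simplified illustration and not the Boruta algorithm itself: absolute Pearson correlation stands in for random-forest importance, and a raw hit count replaces Boruta's statistical test. All names and the synthetic data are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5


def shadow_hits(columns, target, n_rounds=20, seed=0):
    """Count the rounds in which each feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        shadow_scores = []
        for values in columns.values():
            shadow = values[:]   # copy the column...
            rng.shuffle(shadow)  # ...and shuffle it to break any link to the target
            shadow_scores.append(abs_corr(shadow, target))
        best_shadow = max(shadow_scores)
        for name, values in columns.items():
            if abs_corr(values, target) > best_shadow:
                hits[name] += 1
    return hits


# Hypothetical synthetic data: one informative column and one pure-noise column.
data_rng = random.Random(42)
target = [data_rng.gauss(0, 1) for _ in range(200)]
columns = {
    "signal": [t + data_rng.gauss(0, 0.1) for t in target],
    "noise": [data_rng.gauss(0, 1) for _ in range(200)],
}
hits = shadow_hits(columns, target)
```

A feature that beats the best shadow in nearly every round is a candidate to keep; one that rarely does is a candidate to drop. The real algorithm makes that decision with a formal test over many random-forest fits.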

18 pages, 1917 KiB

Open AccessArticle

In-Season Potato Nitrogen Prediction Using Multispectral Drone Data and Machine Learning

by Ehsan Chatraei Azizabadi, Mohamed El-Shetehy, Xiaodong Cheng, Ali Youssef and Nasem Badreldin

Remote Sens. 2025, 17(11), 1860; https://doi.org/10.3390/rs17111860 - 27 May 2025

Viewed by 766

Abstract

Assessing nitrogen (N) status in potato (Solanum tuberosum L.) during the growing season is crucial for optimizing fertilizer application, aligning it with crop demand, and improving N use efficiency, particularly in Western Canada, where extensive potato cultivation supports the agricultural industry. This study evaluated the performance of three machine learning (ML) models, Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosting Regression (GBR), for predicting potato N status and examined the impact of feature selection techniques, including Partial Least Squares Regression (PLSR), Boruta, and Recursive Feature Elimination (RFE). A field experiment was conducted in 2023 and 2024 near Carberry, Manitoba, Canada, with plots receiving different N rates from various fertilizer sources. Multispectral drone imagery was collected throughout the growing seasons, and key vegetation indices (VIs) related to plant N concentration were extracted for model training. Among the VIs, CI green exhibited the highest correlation with petiole NO₃-N concentration (PNC). The results indicate that RF outperformed SVM and GBR, achieving the highest coefficient of determination (R² = 0.571) and the lowest mean absolute error (MAE = 0.365%) using the RFE feature selection method. Feature selection enhanced model performance in specific cases, notably RF with RFE, and both SVM and GBR with Boruta. These findings highlight the potential of ML-based approaches for in-season potato N monitoring and emphasize the importance of feature selection in enhancing predictive accuracy.

(This article belongs to the Special Issue Remote and Proximal Sensing for Precision Agriculture and Viticulture (2nd Edition))

19 pages, 1617 KiB

Open AccessArticle

A Short-Term Risk Prediction Method Based on In-Vehicle Perception Data

by Xinpeng Yao, Nengchao Lyu and Mengfei Liu

Sensors 2025, 25(10), 3213; https://doi.org/10.3390/s25103213 - 20 May 2025

Viewed by 389

Abstract

Advanced driving assistance systems (ADASs) provide rich data on vehicles and their surroundings, enabling early detection and warning of driving risks. This study proposes a short-term risk prediction method based on in-vehicle perception data, aiming to support real-time risk identification in ADAS environments. A variable sliding window approach is employed to determine the optimal prediction window lead length and duration. The method incorporates Monte Carlo simulation for threshold calibration, Boruta-based feature selection, and multiple machine learning models, including the light gradient-boosting machine (LGBM), with performance interpretation via SHAP analysis. Validation is conducted using data from 90 real-world driving sessions. Results show that the optimal prediction lead time and window length are 1.6 s and 1.2 s, respectively, with LGBM achieving the best predictive performance. Risk prediction effectiveness is enhanced when integrating information across the human–vehicle–road environment system. Key features influencing prediction include vehicle speed, accelerator operation, braking deceleration, and the reciprocal of time to collision (TTCi). The proposed approach provides an effective solution for short-term risk prediction and offers algorithmic support for future ADAS applications.

(This article belongs to the Special Issue Intelligent Traffic Safety and Security)

22 pages, 1088 KiB

Open AccessArticle

Intelligent Feature Selection Ensemble Model for Price Prediction in Real Estate Markets

by Daniel Cristóbal Andrade-Girón, William Joel Marin-Rodriguez and Marcelo Gumercindo Zuñiga-Rojas

Informatics 2025, 12(2), 52; https://doi.org/10.3390/informatics12020052 - 20 May 2025

Viewed by 2040

Abstract

Real estate is crucial to the global economy, propelling economic and social development. This study examines the effects of dimensionality reduction through Recursive Feature Elimination (RFE), Random Forest (RF), and Boruta on real estate price prediction, assessing ensemble models like Bagging, Random Forest, Gradient Boosting, AdaBoost, Stacking, Voting, and Extra Trees. The results indicate that the Stacking model achieved the best performance with an MAE (mean absolute error) of 14,090, MSE (mean squared error) of 5.338 × 10⁸, RMSE (root mean square error) of 23,100, R² of 0.924, and a Concordance Correlation Coefficient (CCC) of 0.960, also demonstrating notable computational efficiency with a time of 67.23 s. Gradient Boosting closely followed, with an MAE of 14,540, R² of 0.920, and a CCC of 0.958, requiring 1.76 s for computation. Variable reduction through RFE in both Gradient Boosting and Stacking led to an increase in MAE of 16.9% and 14.6%, respectively, along with slight reductions in R² and CCC. The application of Boruta reduced the variables to 16, maintaining performance in Stacking, with an increase in MAE of 9.8% and an R² of 0.908. These dimensionality reduction techniques enhanced computational efficiency and proved effective for practical applications without significantly compromising accuracy. Future research should explore automatic hyperparameter optimization and hybrid approaches to improve the adaptability and robustness of models in complex contexts.

(This article belongs to the Section Machine Learning)
