MDPI - Publisher of Open Access Journals

30 pages, 9245 KB

Open AccessArticle

Soil Organic Carbon Modelling with Different Input Variables: The Case of the Western Lowlands of Eritrea

by Tumuzghi Tesfay, Elsayed Said Mohamed, Igor Yu. Savin, Dmitry E. Kucher, Nazih Y. Rebouh and Woldeselassie Ogbazghi

Sustainability 2025, 17(21), 9884; https://doi.org/10.3390/su17219884 - 5 Nov 2025

Viewed by 325

Abstract

In Eritrea, efforts are being made to tackle widespread land degradation and to promote natural resources and the agricultural sector. However, these efforts lack digital resource assessment, mapping, planning, and monitoring. Thus, we developed soil organic carbon (SOC) predictor models for the Western Lowlands of the country, employing 6 machine learning models with different numbers of input variables (36, 27, 15, and 8) obtained through the following variable selection strategies: (1) all proposed SOC predictor variables; (2) very high multicollinearity (≥0.900 **) reduction; (3) high multicollinearity (≥0.700 **) reduction; (4) the Boruta feature selection algorithm. The results revealed that SOC levels were generally low (mean = 0.43%). Grazing lands, rainfed croplands, and irrigated farmlands all exhibited similarly low SOC values, attributed to unsustainable land management practices that deplete soil nutrients. In contrast, natural forestlands exhibited significantly higher SOC concentrations, highlighting their potential for soil carbon sequestration. Among the tested models, the XGBoost algorithm using 27 covariates achieved the highest predictive performance (RMSE = 0.118, R² = 0.758, RPD = 2.252), whereas the multiple linear regression (MLR) model with 8 variables yielded the lowest performance (RMSE = 0.141, R² = 0.742, RPD = 1.883). Compared to the Boruta-based feature selection, the MLR, PLS, XGBoost, Cubist, and GB models showed performance improvements of 10.41%, 10.06%, 6.72%, 6.50%, and 3.15%, respectively. Rainfall emerged as the most influential predictor of SOC spatial variability in the study area. Other important predictors included temperature, soil taxonomy, SWIR2 and NIR bands from Landsat 8 imagery, as well as sand and clay contents. We conclude that reducing very high multicollinearity is essential for improving model performance across all tested algorithms, while reducing moderate multicollinearity is not consistently necessary. The developed SOC prediction models demonstrate robust predictive capabilities and can serve as effective tools for supporting soil fertility management, land restoration planning, and climate change mitigation strategies in the Western Lowlands of Eritrea.
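The multicollinearity-reduction strategies in this abstract can be illustrated with a small sketch. The abstract does not say how the authors decided which member of a correlated pair to drop, so the greedy "keep the first, drop later correlated columns" rule below is an assumption, and the covariate names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_collinear(columns, threshold):
    """Greedy filter (an assumed rule, not the paper's stated procedure):
    keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical covariates for six sampling sites (illustrative values only).
covariates = {
    "rainfall": [200, 250, 300, 350, 400, 450],
    "rainfall_wet_season": [180, 226, 269, 316, 361, 404],  # near-duplicate of rainfall
    "temperature": [29.0, 27.5, 28.5, 27.5, 26.0, 26.5],    # |r| with rainfall is about 0.84
    "sand": [62, 40, 71, 55, 48, 66],
}

print(drop_collinear(covariates, 0.900))  # very high multicollinearity reduction
print(drop_collinear(covariates, 0.700))  # high multicollinearity reduction
```

With these made-up values, the 0.900 threshold removes only the near-duplicate rainfall column, while the 0.700 threshold also removes temperature. That is the kind of trade-off the abstract describes: a stricter threshold discards more covariates, some of which may still carry useful signal.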

(This article belongs to the Special Issue Adoption of New Technologies and Practices for Sustainable and Smart Agriculture)

24 pages, 3499 KB

Open AccessArticle

Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data

by Mehmet Kivrak, Hatice Sevim Nalkiran, Oguzhan Kesen and Ihsan Nalkiran

Biology 2025, 14(11), 1539; https://doi.org/10.3390/biology14111539 - 3 Nov 2025

Viewed by 401

Abstract

Breast cancer is the most common malignancy in women, with the Luminal A subtype generally associated with favorable survival. However, age and menopausal status may influence tumor biology and prognosis. To improve prediction beyond conventional models, we analyzed transcriptomic and clinical data from the METABRIC cohort. Patients with Luminal A breast cancer were stratified into premenopausal, postmenopausal–nongeriatric, and geriatric (≥70 years) groups. Differentially expressed genes (DEGs) were identified, and Boruta feature selection identified 27 clinical and genomic variables. Random Forest, Logistic Regression, Multilayer Perceptron, and ensemble XGBoost models were trained with stratified 5-fold cross-validation, using SMOTE to correct class imbalance. Principal component analysis showed distinct clustering across age groups, while DEG analysis revealed 41 genes associated with age and survival. Key predictors included clinical variables (age, tumor size, NPI, radiotherapy) and molecular markers (ATM, HERC2, AKT2, FOXO3, CYP3A43). Among ML models, XGBoost demonstrated the highest performance (accuracy 98%, sensitivity 98%, specificity 97%, F1-score 0.99, AUC 0.86), outperforming other algorithms. These findings indicate that age-related transcriptomic changes impact survival in Luminal A breast cancer and that an ML-based integrative approach combining clinical and molecular variables provides superior prognostic accuracy, supporting its potential for clinical application.

21 pages, 2727 KB

Open AccessArticle

Explainable Artificial Intelligence for Ovarian Cancer: Biomarker Contributions in Ensemble Models

by Hasan Ucuzal and Mehmet Kıvrak

Biology 2025, 14(11), 1487; https://doi.org/10.3390/biology14111487 - 24 Oct 2025

Viewed by 387

Abstract

Ovarian cancer’s high mortality is primarily due to late-stage diagnosis, underscoring the critical need for improved early detection tools. This study develops and validates explainable artificial intelligence (XAI) models to discriminate malignant from benign ovarian masses using readily available demographic and laboratory data. A dataset of 309 patients (140 malignant, 169 benign) with 47 clinical parameters was analyzed. The Boruta algorithm selected 19 significant features, including tumor markers (CA125, HE4, CEA, CA19-9, AFP), hematological indices, liver function tests, and electrolytes. Five ensemble machine learning algorithms were optimized and evaluated using repeated stratified 5-fold cross-validation. The Gradient Boosting model achieved the highest performance with 88.99% (±3.2%) accuracy, 0.934 AUC-ROC, and 0.782 Matthews correlation coefficient. SHAP analysis identified HE4, CEA, globulin, CA125, and age as the most globally important features. Unlike black-box approaches, our XAI framework provides clinically interpretable decision pathways through LIME and SHAP visualizations, revealing how feature values push predictions toward malignancy or benignity. Partial dependence plots illustrated non-linear risk relationships, such as a sharp increase in malignancy probability with CA125 > 35 U/mL. This explainable approach demonstrates that ensemble models can achieve high diagnostic accuracy using routine lab data alone, performing comparably to established clinical indices while ensuring transparency and clinical plausibility. The integration of state-of-the-art XAI techniques highlights established biomarkers and reveals potential novel contributors like inflammatory and hepatic indices, offering a pragmatic, scalable triage tool to augment existing diagnostic pathways, particularly in resource-constrained settings. Full article
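Because this abstract reports a Matthews correlation coefficient (MCC) alongside accuracy, a short sketch of how MCC is computed from a binary confusion matrix may help. The counts below are hypothetical and are not taken from the study.

```python
from math import sqrt

def matthews_corrcoef(tp, fp, fn, tn):
    """MCC from binary confusion-matrix counts; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, fp, fn, tn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical counts: 50 malignant and 50 benign cases.
tp, fp, fn, tn = 40, 5, 10, 45
print(round(accuracy(tp, fp, fn, tn), 3))           # 0.85
print(round(matthews_corrcoef(tp, fp, fn, tn), 3))  # 0.704
```

MCC uses all four cells of the confusion matrix, so it is generally lower than accuracy for the same predictions and is less flattering when one class dominates.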

(This article belongs to the Special Issue AI Deep Learning Approach to Study Biological Questions (2nd Edition))

24 pages, 2652 KB

Open AccessArticle

Diabetes Prediction Using Feature Selection Algorithms and Boosting-Based Machine Learning Classifiers

by Fatima Rahman, Sheyum Hossain, Jun-Jiat Tiang and Abdullah-Al Nahid

Diagnostics 2025, 15(20), 2622; https://doi.org/10.3390/diagnostics15202622 - 17 Oct 2025

Viewed by 1218

Abstract

Background: Diabetes mellitus is a major global health concern that requires accurate early-stage diagnosis to prevent severe complications. However, accurate prediction remains challenging due to limited, noisy, and imbalanced datasets. This study proposes a novel machine learning framework for improved diabetes prediction, addressing key challenges such as inadequate feature selection, class imbalance, and data preprocessing. Methods: This work systematically evaluates five feature selection algorithms (Recursive Feature Elimination, Grey Wolf Optimizer, Particle Swarm Optimizer, Genetic Algorithm, and Boruta) using cross-validation and SHAP analysis to enhance feature interpretability. Classification is performed using two boosting algorithms: the light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost). Results: The proposed framework, using the five most important features selected by the Boruta feature selection algorithm, outperformed other configurations with the LightGBM classifier, achieving an accuracy of 85.16%, an F1-score of 85.41%, and a 54.96% reduction in training time. Conclusions: Additionally, we have benchmarked our approach against recent studies and validated its effectiveness on both the Pima Indian Diabetes Dataset and the newly released DiaHealth dataset, demonstrating robust and accurate early diabetes detection across diverse clinical datasets. This approach offers a cost-effective, interpretable, and clinically relevant solution for early diabetes detection by reducing the number of input features, providing transparent feature importance, and achieving high predictive accuracy with efficient model training.

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

24 pages, 1582 KB

Open AccessArticle

Future Internet Applications in Healthcare: Big Data-Driven Fraud Detection with Machine Learning

by Konstantinos P. Fourkiotis and Athanasios Tsadiras

Future Internet 2025, 17(10), 460; https://doi.org/10.3390/fi17100460 - 8 Oct 2025

Viewed by 634

Abstract

Hospital fraud detection has often relied on periodic audits that miss evolving, internet-mediated patterns in electronic claims. An artificial intelligence and machine learning pipeline is being developed that is leakage-safe, imbalance-aware, and aligned with operational capacity for large healthcare datasets. The preprocessing stack integrates four tables, engineers 13 features, applies imputation, categorical encoding, power transformation, Boruta selection, and denoising autoencoder representations, with class balancing via SMOTE-ENN evaluated inside cross-validation folds. Eight algorithms are compared under a fraud-oriented composite productivity index that weighs recall, precision, MCC, F1, ROC-AUC, and G-Mean, with per-fold threshold calibration and explicit reporting of Type I and Type II errors. Multilayer perceptron attains the highest composite index, while CatBoost offers the strongest control of false positives with high accuracy. SMOTE-ENN provides limited gains once representations regularize class geometry. The calibrated scores support prepayment triage, postpayment audit, and provider-level profiling, linking alert volume to expected recovery and protecting investigator workload. Situated in the Future Internet context, this work targets internet-mediated claim flows and web-accessible provider registries. Governance procedures for drift monitoring, fairness assessment, and change control complete an internet-ready deployment path. The results indicate that disciplined preprocessing and evaluation, more than classifier choice alone, translate AI improvements into measurable economic value and sustainable fraud prevention in digital health ecosystems.

(This article belongs to the Special Issue Information and Future Internet Security, Trust and Privacy—4th Edition)

18 pages, 7125 KB

Open AccessArticle

Development of Fruit-Specific Spectral Indices and Endmember-Based Analysis for Apple Cultivar Classification Using Hyperspectral Imaging

by Ye-Jin Lee, HwangWeon Jeong, Seoyeon Lee, Eunji Ga, JeongHo Baek, Song Lim Kim, Sang-Ho Kang, Youn-Il Park, Kyung-Hwan Kim and Jae Il Lyu

Horticulturae 2025, 11(10), 1177; https://doi.org/10.3390/horticulturae11101177 - 2 Oct 2025

Viewed by 473

Abstract

Hyperspectral imaging (HSI) has emerged as a powerful tool for non-destructive phenotyping, yet fruit crop applications remain underexplored. We propose a methodological framework to enhance the spectral characterization of apple fruits by identifying robust vegetation indices (VIs) and interpretable endmembers. We screened 284 VIs, which were evaluated using four feature selection algorithms (Boruta, MI+Lasso, RFE, and ensemble voting), generalizing across red, yellow, green, and purple apple cultivars. An ensemble criterion (≥2 algorithms) yielded 50 selected VIs from the NDSI/DSI/RSI families, preserving > 95% classification accuracy and capturing cultivar-specific variation. Pigment-sensitive wavelength bands were identified via PLS-DA VIP scores and one-vs-rest ANOVA. Using these bands, we formulated new normalized-difference, ratio, and difference spectral indices tailored to cultivar-specific pigmentation. Several indices achieved >89% classification accuracy and showed patterns consistent with those of anthocyanin, carotenoid, and chlorophyll. A two-stage spectral unmixing pipeline (K-Means → N-FINDR) achieved the lowest reconstruction RMSE (0.043%). This multi-level strategy provides a scalable, interpretable framework for enhancing phenotypic resolution in apple hyperspectral data, contributing to fruit index development and generalized spectral analysis methods for horticultural applications.
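The three index families named in this abstract (NDSI, RSI, DSI) are usually defined from the reflectance at two wavelengths. The sketch below uses those common two-band forms. The abstract does not give the authors' exact formulas or band choices, so the function definitions, the wavelengths, and the reflectance values here are all assumptions for illustration.

```python
def ndsi(r_a, r_b):
    """Normalized-difference spectral index of two band reflectances."""
    return (r_a - r_b) / (r_a + r_b)

def rsi(r_a, r_b):
    """Ratio spectral index of two band reflectances."""
    return r_a / r_b

def dsi(r_a, r_b):
    """Difference spectral index of two band reflectances."""
    return r_a - r_b

# Hypothetical mean reflectance of one fruit, keyed by wavelength in nm.
spectrum = {550: 0.12, 680: 0.08, 800: 0.40}

a, b = spectrum[800], spectrum[680]
print(round(ndsi(a, b), 4))  # 0.6667
print(round(rsi(a, b), 4))   # 5.0
print(round(dsi(a, b), 4))   # 0.32
```

A screening of many candidate indices, as described in the abstract, amounts to evaluating functions like these over many wavelength pairs and then passing the resulting columns to a feature selection step. The functions above do not guard against zero reflectance, which a real pipeline would need to handle.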

(This article belongs to the Section Fruit Production Systems)

68 pages, 8643 KB

Open AccessArticle

From Sensors to Insights: Interpretable Audio-Based Machine Learning for Real-Time Vehicle Fault and Emergency Sound Classification

by Mahmoud Badawy, Amr Rashed, Amna Bamaqa, Hanaa A. Sayed, Rasha Elagamy, Malik Almaliki, Tamer Ahmed Farrag and Mostafa A. Elhosseini

Machines 2025, 13(10), 888; https://doi.org/10.3390/machines13100888 - 28 Sep 2025

Viewed by 977

Abstract

Unrecognized mechanical faults and emergency sounds in vehicles can compromise safety, particularly for individuals with hearing impairments and in sound-insulated or autonomous driving environments. As intelligent transportation systems (ITSs) evolve, there is a growing need for inclusive, non-intrusive, and real-time diagnostic solutions that enhance situational awareness and accessibility. This study introduces an interpretable, sound-based machine learning framework to detect vehicle faults and emergency sound events using acoustic signals as a scalable diagnostic source. Three purpose-built datasets were developed: one for vehicular fault detection, another for emergency and environmental sounds, and a third integrating both to reflect real-world ITS acoustic scenarios. Audio data were preprocessed through normalization, resampling, and segmentation and transformed into numerical vectors using Mel-Frequency Cepstral Coefficients (MFCCs), Mel spectrograms, and Chroma features. To ensure performance and interpretability, feature selection was conducted using SHAP (explainability), Boruta (relevance), and ANOVA (statistical significance). A two-phase experimental workflow was implemented: Phase 1 evaluated 15 classical models, identifying ensemble classifiers and multi-layer perceptrons (MLPs) as top performers; Phase 2 applied advanced feature selection to refine model accuracy and transparency. Ensemble models such as Extra Trees, LightGBM, and XGBoost achieved over 91% accuracy and AUC scores exceeding 0.99. SHAP provided model transparency without performance loss, while ANOVA achieved high accuracy with fewer features. The proposed framework enhances accessibility by translating auditory alarms into visual/haptic alerts for hearing-impaired drivers and can be integrated into smart city ITS platforms via roadside monitoring systems. Full article

(This article belongs to the Section Vehicle Engineering)

13 pages, 1587 KB

Open AccessArticle

Glioma Grading by Integrating Radiomic Features from Peritumoral Edema in Fused MRI Images and Automated Machine Learning

by Amir Khorasani

J. Imaging 2025, 11(10), 336; https://doi.org/10.3390/jimaging11100336 - 27 Sep 2025

Cited by 1 | Viewed by 640

Abstract

We aimed to investigate the utility of peritumoral edema-derived radiomic features from magnetic resonance imaging (MRI) image weights and fused MRI sequences for enhancing the performance of machine learning-based glioma grading. The present study utilized the Multimodal Brain Tumor Segmentation Challenge 2023 (BraTS 2023) dataset. Laplacian Re-decomposition (LRD) was employed to fuse multimodal MRI sequences. The fused image quality was evaluated using the Entropy, standard deviation (STD), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) metrics. A comprehensive set of radiomic features was subsequently extracted from peritumoral edema regions using PyRadiomics. The Boruta algorithm was applied for feature selection, and an optimized classification pipeline was developed using the Tree-based Pipeline Optimization Tool (TPOT). Model performance for glioma grade classification was evaluated based on accuracy, precision, recall, F1-score, and area under the curve (AUC) parameters. Analysis of fused image quality metrics confirmed that the LRD method produces high-quality fused images. From 851 radiomic features extracted from peritumoral edema regions, the Boruta algorithm selected different sets of informative features in both standard MRI and fused images. Subsequent TPOT automated machine learning optimization analysis identified a fine-tuned Stochastic Gradient Descent (SGD) classifier, trained on features from T₁Gd+FLAIR fused images, as the top-performing model. This model achieved superior performance in glioma grade classification (Accuracy = 0.96, Precision = 1.0, Recall = 0.94, F1-Score = 0.96, AUC = 1.0). Radiomic features derived from peritumoral edema in fused MRI images using the LRD method demonstrated distinct, grade-specific patterns and can be utilized as a non-invasive, accurate, and rapid glioma grade classification method. Full article

(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)

16 pages, 1286 KB

Open AccessArticle

Integrating Feature Selection, Machine Learning, and SHAP Explainability to Predict Severe Acute Pancreatitis

by İzzet Ustaalioğlu and Rohat Ak

Diagnostics 2025, 15(19), 2473; https://doi.org/10.3390/diagnostics15192473 - 27 Sep 2025

Viewed by 674

Abstract

Background/Objectives: Severe acute pancreatitis (SAP) carries substantial morbidity and resource burden, and early risk stratification remains challenging with conventional scores that require serial observations. The aim of this study was to develop and compare supervised machine-learning (ML) pipelines, integrating feature selection and SHAP-based explainability, for early prediction of SAP at emergency department (ED) presentation. Methods: This retrospective, single-center cohort was conducted in a tertiary-care ED between 1 January 2022 and 1 January 2025. Adult patients with acute pancreatitis were identified from electronic records; SAP was classified per the Revised Atlanta criteria (persistent organ failure ≥ 48 h). Six feature-selection methods (univariate AUROC filter, RFE, mRMR, LASSO, elastic net, Boruta) were paired with six classifiers (kNN, elastic-net logistic regression, MARS, random forest, SVM-RBF, XGBoost) to yield 36 pipelines. Discrimination, calibration, and error metrics were estimated with bootstrapping; SHAP was used for model interpretability. Results: Of 743 patients (non-SAP 676; SAP 67), SAP prevalence was 9.0%. Compared with non-SAP, SAP patients more often had hypertension (38.8% vs. 27.1%) and malignancy (19.4% vs. 7.2%); they presented with lower GCS, higher heart and respiratory rates, lower systolic blood pressure, and more frequent peripancreatic fluid (31.3% vs. 16.9%) and pleural effusion (43.3% vs. 17.5%). Albumin was lower by 4.18 g/L, with broader renal–electrolyte and inflammatory derangements. Across the best-performing models, AUROC spanned 0.750–0.826; the top pipeline (RFE–RF features + kNN) reached 0.826, while random-forest-based pipelines showed favorable calibration. SHAP confirmed clinically plausible contributions from routinely available variables. Conclusions: In this study, integrating feature selection with ML produced accurate and interpretable early prediction of SAP using data available at ED arrival. The approach highlights actionable predictors and may support earlier triage and resource allocation; external validation is warranted.

(This article belongs to the Special Issue Artificial Intelligence for Clinical Diagnostic Decision Making)

30 pages, 8824 KB

Open AccessArticle

Modeling Urban-Vegetation Aboveground Carbon by Integrating Spectral–Textural Features with Tree Height and Canopy Cover Ratio Using Machine Learning

by Yuhao Fang, Yuning Cheng and Yilun Cao

Forests 2025, 16(9), 1381; https://doi.org/10.3390/f16091381 - 28 Aug 2025

Cited by 1 | Viewed by 837

Abstract

Accurately estimating aboveground carbon storage (AGC) of urban vegetation remains a major challenge, due to the heterogeneity and vertical complexity of urban environments, where traditional forest-based remote sensing models often perform poorly. This study integrates multimodal remote sensing data and incorporates two three-dimensional structural features, mean tree height ($H_{\mathrm{mean}}$) and canopy cover ratio (CCR), in addition to conventional spectral and textural variables. To minimize redundancy, the Boruta algorithm was applied for feature selection, and four machine learning models (SVR, RF, XGBoost, and CatBoost) were evaluated. Results demonstrate that under multimodal data fusion, three-dimensional features emerge as the dominant predictors, with XGBoost using Boruta-selected variables achieving the highest accuracy (R² = 0.701, RMSE = 0.894 tC/400 m²). Spatial mapping of AGC revealed a “high-aggregation, low-dispersion” pattern, with the model performing best in large, continuous green spaces, while accuracy declined in fragmented or small-scale vegetation patches. Overall, this study highlights the potential of machine learning with multi-source variable inputs for fine-scale urban AGC estimation, emphasizes the importance of three-dimensional vegetation indicators, and provides practical insights for urban carbon assessment and green infrastructure planning.
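For readers comparing the R² and RMSE figures quoted in regression abstracts such as this one, the sketch below shows the usual definitions of both metrics. The observed and predicted values are hypothetical and are not taken from the study.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the target."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical carbon values for four plots (illustrative units).
observed = [2.0, 3.0, 5.0, 6.0]
predicted = [2.5, 2.5, 5.5, 5.5]

print(rmse(observed, predicted))                 # 0.5
print(round(r_squared(observed, predicted), 3))  # 0.9
```

RMSE carries the units of the target variable (tC per 400 m² in the abstract above), whereas R² is unitless, so the two are best read together.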

(This article belongs to the Section Urban Forestry)

21 pages, 12855 KB

Open AccessArticle

Identification of Novel Lactylation-Related Biomarkers for COPD Diagnosis Through Machine Learning and Experimental Validation

by Chundi Hu, Weiliang Qian, Runling Wei, Gengluan Liu, Qin Jiang, Zhenglong Sun and Hui Li

Biomedicines 2025, 13(8), 2006; https://doi.org/10.3390/biomedicines13082006 - 18 Aug 2025

Viewed by 1262

Abstract

Objective: This study aims to identify clinically relevant lactylation-related biomarkers in chronic obstructive pulmonary disease (COPD) and investigate their potential mechanistic roles in COPD pathogenesis. Methods: Differentially expressed genes (DEGs) were identified from the GSE21359 dataset, followed by weighted gene co-expression network analysis (WGCNA) to detect COPD-associated modules. Least absolute shrinkage and selection operator (LASSO) regression and support vector machine–recursive feature elimination (SVM–RFE) algorithms were applied to screen lactylation-related biomarkers, with diagnostic performance evaluated using ROC curves. Candidates were validated in the GSE76925 dataset for expression and diagnostic robustness. Immune cell infiltration patterns were characterized using EPIC deconvolution. Single-cell transcriptomics (from GSE173896) were processed via the ‘Seurat’ package, encompassing quality control, dimensionality reduction, and cell type annotation. Cell-type-specific markers and intercellular communication networks were delineated using the Seurat ‘FindAllMarkers’ function and the ‘CellChat’ R package, respectively. In vitro validation was conducted using a cigarette smoke extract (CSE)-induced COPD model. Results: Integrated transcriptomic approaches and multi-algorithm screening (LASSO/Boruta/SVM–RFE) revealed carbonyl reductase 1 (CBR1) and peroxiredoxin 1 (PRDX1) as core COPD biomarkers enriched in oxidation–reduction and inflammatory pathways, with high diagnostic accuracy (AUC > 0.85). Immune profiling and scRNA-seq delineated macrophage and cancer-associated fibroblast (CAF) infiltration with oxidative-redox transcriptional dominance in COPD. CBR1 was significantly upregulated in T cells, neutrophils, and mast cells, and PRDX1 was significantly upregulated in endothelial cells, macrophages, and ciliated cells. Experimental validation in CSE-induced models confirmed significant upregulation of both biomarkers via quantitative reverse transcription PCR (qRT-PCR) and immunofluorescence. Conclusions: CBR1 and PRDX1 are lactylation-associated diagnostic markers, with lactylation-driven redox imbalance implicated in COPD progression.

(This article belongs to the Section Molecular and Translational Medicine)

24 pages, 2794 KB

Open AccessArticle

Algorithmic Modeling of Generation Z’s Therapeutic Toys Consumption Behavior in an Emotional Economy Context

by Xinyi Ma, Xu Qin and Li Lv

Algorithms 2025, 18(8), 506; https://doi.org/10.3390/a18080506 - 13 Aug 2025

Viewed by 1149

Abstract

The quantification of emotional value and accurate prediction of purchase intention have emerged as a critical interdisciplinary challenge in the evolving emotional economy. Focusing on Generation Z (born 1995–2009), this study proposes a hybrid algorithmic framework integrating text-based sentiment computation, feature selection, and random forest modeling to forecast purchase intention for therapeutic toys and interpret its underlying drivers. First, 856 customer reviews were scraped from Jellycat’s official website and subjected to polarity classification using a fine-tuned RoBERTa-wwm-ext model (F1 = 0.92), with generated sentiment scores and high-frequency keywords mapped as interpretable features. Next, Boruta–SHAP feature selection was applied to 35 structured variables from 336 survey records, retaining 17 significant predictors. The core module employed a random forest (RF) model to estimate continuous “purchase intention” scores, achieving R² = 0.83 and MSE = 0.14 under 10-fold cross-validation. To enhance interpretability, the RF model was also used to evaluate feature importance, quantifying each feature’s contribution to the model outputs and revealing Social Ostracism (β = 0.307) and Task Overload (β = 0.207) as dominant predictors. Finally, k-means clustering with gap statistics segmented consumers based on emotional relevance, value rationality, and interest level, with model performance compared across clusters. Experimental results demonstrate that our integrated predictive model achieves a balance between forecasting accuracy and decision interpretability in emotional value computation, offering actionable insights for targeted product development and precision marketing in the therapeutic goods sector.

(This article belongs to the Section Algorithms for Multidisciplinary Applications)

13 pages, 830 KB

Open AccessArticle

Machine Learning-Based Prediction of Postoperative Deep Vein Thrombosis Following Tibial Fracture Surgery

by Humam Baki and İsmail Bülent Özçelik

Diagnostics 2025, 15(14), 1787; https://doi.org/10.3390/diagnostics15141787 - 16 Jul 2025

Viewed by 830

Abstract

Background/Objectives: Postoperative deep vein thrombosis (DVT) is a common and serious complication after tibial fracture surgery. This study aimed to develop and evaluate machine learning (ML) models to predict the occurrence of DVT following tibial fracture surgery. Methods: A retrospective analysis was conducted on patients who had undergone surgery for isolated tibial fractures. A total of 42 predictive models were developed using combinations of six ML algorithms, namely logistic regression, support vector machine (SVM), random forest, extreme gradient boosting, Light Gradient Boosting Machine (LightGBM), and neural networks, and seven feature selection methods, including SHapley Additive exPlanations (SHAP), Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, recursive feature elimination (RFE), univariate filtering, and full-variable inclusion. Model performance was assessed based on discrimination, quantified by the area under the receiver operating characteristic curve (AUC-ROC), and calibration, measured using Brier scores, with internal validation performed via bootstrapping. Results: Of 471 patients, 80 (17.0%) developed postoperative DVT. The ML models achieved high overall accuracy in predicting DVT. Twenty-four models showed similarly excellent discrimination (pairwise AUC comparisons, p > 0.05). The top-performing model (random forest with RFE) attained an AUC of ~0.99, while several others (including LightGBM and SVM-based models) also reached AUC values in the 0.97–0.99 range. Notably, SVM models paired with Boruta or LASSO feature selection demonstrated the best calibration (lowest Brier scores), indicating reliable risk estimation. The final selected SVM models achieved high specificity (≥95%) with moderate sensitivity (~75–80%) for DVT detection. Conclusions: ML models demonstrated high accuracy in predicting postoperative DVT following tibial fracture surgery. SVM-based models showed particularly favorable discrimination and calibration. These results suggest the potential utility of ML-based risk stratification to guide individualized prophylaxis, warranting further validation in prospective clinical settings.
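As a rough illustration of the two evaluation metrics named in this abstract, the sketch below computes a Brier score (calibration) and an AUC-ROC (discrimination) from scratch. The labels and probabilities are invented toy values, not study data, and the function names `brier_score` and `auc_roc` are hypothetical helpers rather than anything taken from the paper.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome.
    Lower values indicate better-calibrated risk estimates."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case receives the higher predicted probability; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = DVT, 0 = no DVT, with made-up predicted risks.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(f"Brier score: {brier_score(y_true, y_prob):.5f}")
print(f"AUC-ROC:     {auc_roc(y_true, y_prob):.3f}")
```

The pairwise AUC definition is quadratic in the number of cases, which is fine for a demonstration but would normally be replaced by a rank-based computation on larger cohorts.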

(This article belongs to the Special Issue Applications of Artificial Intelligence in Orthopedics)

18 pages, 8113 KB

Open Access Article

An Interpretable Machine Learning Model Based on Inflammatory–Nutritional Biomarkers for Predicting Metachronous Liver Metastases After Colorectal Cancer Surgery

by Hao Zhu, Danyang Shen, Xiaojie Gan and Ding Sun

Biomedicines 2025, 13(7), 1706; https://doi.org/10.3390/biomedicines13071706 - 12 Jul 2025

Cited by 1 | Viewed by 1098

Abstract

Objective: Tumor progression is regulated by systemic immune status, nutritional metabolism, and the inflammatory microenvironment. This study aims to investigate inflammatory–nutritional biomarkers associated with metachronous liver metastasis (MLM) in colorectal cancer (CRC) and develop a machine learning model for accurate prediction. Methods: This study enrolled 680 patients with CRC who underwent curative resection, randomly allocated into a training set (n = 477) and a validation set (n = 203) in a 7:3 ratio. Feature selection was performed using Boruta and Lasso algorithms, identifying nine core prognostic factors through variable intersection. Seven machine learning (ML) models were constructed using the training set, with the optimal predictive model selected based on comprehensive evaluation metrics. An interactive visualization tool was developed to interpret the dynamic impact of key features on individual predictions. The partial dependence plots (PDPs) revealed a potential dose–response relationship between inflammatory–nutritional markers and MLM risk. Results: Among 680 patients with CRC, the cumulative incidence of MLM at 6 months postoperatively was 39.1%. Multimodal feature selection identified nine key predictors, including the N stage, vascular invasion, carcinoembryonic antigen (CEA), systemic immune–inflammation index (SII), albumin–bilirubin index (ALBI), differentiation grade, prognostic nutritional index (PNI), fatty liver, and T stage. The gradient boosting machine (GBM) demonstrated the best overall performance (AUROC: 0.916, sensitivity: 0.772, specificity: 0.871). The generalized additive model (GAM)-fitted SHAP analysis established, for the first time, risk thresholds for four continuous variables (CEA > 8.14 μg/L, PNI < 44.46, SII > 856.36, ALBI > −2.67), confirming their significant association with MLM development. Conclusions: This study developed a GBM model incorporating inflammatory–nutritional biomarkers and clinical features to accurately predict MLM in colorectal cancer. Integrated with dynamic visualization tools, the model enables real-time risk stratification via a freely accessible web calculator, guiding individualized surveillance planning and optimizing clinical decision-making for precision postoperative care.
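The "variable intersection" step mentioned in this abstract can be pictured with a minimal sketch: keep only the features that both selectors retain. The two input lists below are hypothetical selector outputs. The nine shared names follow the predictors listed in the abstract, while the extra names (`age`, `tumor_size`, `sex`) are invented purely for illustration, and `intersect_features` is an assumed helper name.

```python
# Hypothetical outputs of two feature selectors. The extras ("age",
# "tumor_size", "sex") are invented and are not reported in the abstract.
boruta_selected = [
    "N_stage", "vascular_invasion", "CEA", "SII", "ALBI",
    "differentiation_grade", "PNI", "fatty_liver", "T_stage", "age",
]
lasso_selected = [
    "T_stage", "N_stage", "CEA", "PNI", "SII", "ALBI", "fatty_liver",
    "vascular_invasion", "differentiation_grade", "tumor_size", "sex",
]


def intersect_features(first, second):
    """Return features present in both lists, preserving the order of `first`."""
    second_set = set(second)
    return [name for name in first if name in second_set]


core_features = intersect_features(boruta_selected, lasso_selected)
print(len(core_features), core_features)
```

Taking the intersection is a conservative choice: a feature survives only if two different selection criteria agree on it, at the cost of possibly discarding predictors that a single method would have kept.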

(This article belongs to the Special Issue Advances in Hepatology)

16 pages, 2351 KB

Open Access Article

Associations Between Dietary Amino Acid Intake and Elevated High-Sensitivity C-Reactive Protein in Children: Insights from a Cross-Sectional Machine Learning Study

by Lianlong Yu, Xiaodong Zheng, Jilan Li, Changqing Liu, Yiya Liu, Meina Tian, Qianrang Zhu, Zhenchuang Tang and Maoyu Wu

Nutrients 2025, 17(13), 2235; https://doi.org/10.3390/nu17132235 - 5 Jul 2025

Viewed by 1234

Abstract

Background: High-sensitivity C-reactive protein (hs-CRP) is a protein that indicates inflammation and the risk of cardiovascular diseases. The intake of dietary amino acids can influence immune and inflammatory reactions. However, studies on the relationship between dietary amino acids and hs-CRP, especially in children, remain scarce. Methods: This cross-sectional study analyzed data from the Nutrition and China Children and Lactating Women Nutrition and Health Survey (2016–2019), focusing on 3514 children (724 with elevated hs-CRP ≥ 3 mg/L and 2790 with normal levels). Dietary information was gathered via a food frequency questionnaire, and hs-CRP levels were obtained from blood samples. The Boruta algorithm was used to select dietary factors, and propensity scores were used to match samples. Machine learning (ML) algorithms and logistic regression models assessed the link between amino acid intake and elevated hs-CRP risk, adjusting for age, sex, BMI, and lifestyle factors. Results: The odds ratios (ORs) for elevated hs-CRP were significant for several amino acids, including Ile, Leu, Lys, Ser, Cys, Tyr, His, Pro, SAA, and AAA, with values ranging from 1.10 to 2.07. The LightGBM algorithm was the most effective in predicting elevated hs-CRP risk, achieving an AUC of 0.927. Tyrosine, methionine, cysteine, and proline were identified as important features by SHAP analysis and logistic regression. The intake of Ser, Cys, Tyr, and Pro showed a linear increase in the risk of elevated hs-CRP, especially in individuals with low protein intake and normal weight (p < 0.1). Conclusions: Intake of amino acids such as Ser, Cys, Tyr, and Pro significantly impacts hs-CRP levels in children, indicating that regulating these could help prevent inflammation-related diseases. This study supports future dietary and health management strategies. This is the first large-scale ML study linking amino acids to pediatric inflammation in China. The main limitations are the cross-sectional design and the use of self-reported dietary data.

(This article belongs to the Special Issue Feature Papers in Proteins and Amino Acids in Relation to Human Health)

