MDPI - Publisher of Open Access Journals

25 pages, 5013 KB

Open Access Article

Machine Learning Approaches for Fatigue Life Prediction of Steel and Feature Importance Analyses

by Babak Naeim, Ali Javadzade Khiavi, Erfan Khajavi, Amir Reza Taghavi Khanghah, Ali Asgari, Reza Taghipour and Mohsen Bagheri

Infrastructures 2025, 10(11), 295; https://doi.org/10.3390/infrastructures10110295 - 6 Nov 2025

Abstract

Predicting fatigue behavior in steel components is highly challenging due to the nonlinear and uncertain nature of material degradation under cyclic loading. In this study, four hybrid machine learning models were developed—Histogram Gradient Boosting optimized with Prairie Dog Optimization (HGPD), Histogram Gradient Boosting optimized with Wild Geese Algorithm (HGGW), Categorical Gradient Boosting optimized with Prairie Dog Optimization (CAPD), and Categorical Gradient Boosting optimized with Wild Geese Algorithm (CAGW)—by coupling two advanced ensemble learning frameworks, Histogram Gradient Boosting (HGB) and Categorical Gradient Boosting (CAT), with two emerging metaheuristic optimization algorithms, Prairie Dog Optimization (PDO) and Wild Geese Algorithm (WGA). This integrated approach aims to enhance the accuracy, generalization, and robustness of predictive modeling for steel fatigue life assessment. Shapley Additive Explanations (SHAP) were employed to quantify feature importance and enhance interpretability. Results revealed that reduction ratio (RedRatio) and total heat treatment time (THT) exhibited the highest variability, with RedRatio emerging as the dominant factor due to its wide range and significant influence on model outcomes. The SHAP-driven analysis provided clear insights into complex interactions among processing parameters and fatigue behavior, enabling effective feature selection without loss of accuracy. Overall, integrating gradient boosting with novel optimization algorithms substantially improved predictive accuracy and robustness, advancing decision-making in materials science. Full article

24 pages, 3499 KB

Open Access Article

Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data

by Mehmet Kivrak, Hatice Sevim Nalkiran, Oguzhan Kesen and Ihsan Nalkiran

Biology 2025, 14(11), 1539; https://doi.org/10.3390/biology14111539 - 3 Nov 2025

Viewed by 202

Abstract

Breast cancer is the most common malignancy in women, with the Luminal A subtype generally associated with favorable survival. However, age and menopausal status may influence tumor biology and prognosis. To improve prediction beyond conventional models, we analyzed transcriptomic and clinical data from the METABRIC cohort. Patients with Luminal A breast cancer were stratified into premenopausal, postmenopausal–nongeriatric, and geriatric (≥70 years) groups. Differentially expressed genes (DEGs) were identified, and Boruta feature selection revealed 27 clinical and genomic variables. Random Forest, Logistic Regression, Multilayer Perceptron, and ensemble XGBoost models were trained with stratified 5-fold cross-validation, using SMOTE to correct class imbalance. Principal component analysis showed distinct clustering across age groups, while DEG analysis revealed 41 genes associated with age and survival. Key predictors included clinical variables (age, tumor size, NPI, radiotherapy) and molecular markers (ATM, HERC2, AKT2, FOXO3, CYP3A43). Among ML models, XGBoost demonstrated the highest performance (accuracy 98%, sensitivity 98%, specificity 97%, F1-score 0.99, AUC 0.86), outperforming other algorithms. These findings indicate that age-related transcriptomic changes impact survival in Luminal A breast cancer and that an ML-based integrative approach combining clinical and molecular variables provides superior prognostic accuracy, supporting its potential for clinical application. Full article

36 pages, 1090 KB

Open Access Article

Integrating Linguistic and Eye Movements Features for Arabic Text Readability Assessment Using ML and DL Models

by Ibtehal Baazeem, Hend Al-Khalifa and Abdulmalik Al-Salman

Computation 2025, 13(11), 258; https://doi.org/10.3390/computation13110258 - 3 Nov 2025

Viewed by 258

Abstract

Evaluating text readability is crucial for supporting both language learners and native readers in selecting appropriate materials. Cognitive psychology research, leveraging behavioral data such as eye-tracking and electroencephalogram (EEG) signals, has demonstrated effectiveness in identifying cognitive activities associated with text difficulty during reading. However, the distinctive linguistic characteristics of Arabic present unique challenges for applying such data in readability assessments. While behavioral signals have been explored for this purpose, their potential for Arabic remains underutilized. This study aims to advance Arabic readability assessments by integrating eye-tracking features into computational models. It presents a series of experiments that utilize both text-based and gaze-based features within machine learning (ML) and deep learning (DL) frameworks. The gaze-based features were extracted from the AraEyebility corpus, which contains eye-tracking data collected from 15 native Arabic speakers. The experimental results show that ensemble ML models, particularly AdaBoost with linguistic and eye-tracking handcrafted features, outperform ML models using TF-IDF and DL models employing word embedding vectorization. Among the DL models, convolutional neural networks (CNNs) achieved the best performance with combined linguistic and eye-tracking features. These findings underscore the value of cognitive data and emphasize the need for exploration to fully realize its potential in Arabic readability assessment. Full article

(This article belongs to the Special Issue Recent Advances on Computational Linguistics and Natural Language Processing)

23 pages, 5331 KB

Open Access Article

Training and Optimization of a Rice Disease Detection Model Based on Ensemble Learning

by Jihong Sun, Peng Tian, Jiawei Zhao, Haokai Zhang and Ye Qian

Agriculture 2025, 15(21), 2283; https://doi.org/10.3390/agriculture15212283 - 2 Nov 2025

Viewed by 240

Abstract

Accurate and reliable detection of rice diseases and pests is crucial for ensuring food security. However, traditional deep learning methods often suffer from high rates of missed and false detections when dealing with complex field environments, especially in the presence of tiny disease spots, due to insufficient feature extraction capabilities. To address this issue, this study proposes a high-precision rice disease detection method based on ensemble learning and conducts experiments on a self-built dataset of 12,572 images containing five types of diseases and one type of pest. The ensemble learning model is optimized and constructed through a phased approach: First, using YOLOv8s as the baseline, transfer learning is performed with the agriculture-related dataset PlantDoc. Subsequently, a P2 small-object detection head, an EMA mechanism, and the Focal Loss function are introduced to build an optimized single model, which achieves an mAP_0.5 of 0.899, an absolute improvement of 5.5% compared to the baseline YOLOv8s. Then, three high-performance YOLO object detection models, including the improved model mentioned above, are selected, and the Weighted Box Fusion technique is used to integrate their prediction results to construct the final Ensemble-WBF model. Finally, the AP_0.5 and AR_0.5:0.95 of the model reach 0.922 and 0.648, respectively, with absolute improvements of 2.2% and 3.2% compared to the improved single model, further reducing the false and missed detection rates. The experimental results show that the ensemble learning method proposed in this study can effectively overcome the interference of complex backgrounds, significantly improve the detection accuracy and robustness for tiny and similar diseases, and reduce the missed detection rate, providing an efficient technical solution for the accurate and automated monitoring of rice diseases in real agricultural scenarios. Full article
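The box-fusion step named in the abstract can be sketched roughly as follows. This is a simplified, single-class illustration of Weighted Box Fusion and not the authors' implementation: the function names, the IoU threshold of 0.55, the first-match clustering rule, and the sample detections are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster, n_models):
    """Confidence-weighted average box; score is damped when few models agree."""
    total = sum(score for _, score in cluster)
    box = tuple(sum(b[i] * s for b, s in cluster) / total for i in range(4))
    score = total / len(cluster) * min(len(cluster), n_models) / n_models
    return box, score


def weighted_box_fusion(detections, n_models, iou_thr=0.55):
    """detections: list of ((x1, y1, x2, y2), confidence) pooled from all models."""
    clusters, fused = [], []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        for idx, (fused_box, _) in enumerate(fused):
            # Simplification (assumption): join the first cluster above the threshold.
            if iou(box, fused_box) > iou_thr:
                clusters[idx].append((box, score))
                fused[idx] = fuse_cluster(clusters[idx], n_models)
                break
        else:
            clusters.append([(box, score)])
            fused.append(fuse_cluster(clusters[-1], n_models))
    return fused


# Hypothetical detections from three models: one lesion seen by all, one by a single model.
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 11, 52, 49), 0.8),
    ((11, 12, 49, 51), 0.7),
    ((100, 100, 140, 140), 0.6),
]
fused = weighted_box_fusion(detections, n_models=3)
```

Unlike non-maximum suppression, which keeps one box and discards the rest, this kind of fusion averages the overlapping boxes, so a detection confirmed by several models keeps a high score while a box proposed by a single model is down-weighted.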

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

26 pages, 4723 KB

Open Access Article

Time-Frequency-Based Separation of Earthquake and Noise Signals on Real Seismic Data: EMD, DWT and Ensemble Classifier Approaches

by Yunus Emre Erdoğan and Ali Narin

Sensors 2025, 25(21), 6671; https://doi.org/10.3390/s25216671 - 1 Nov 2025

Viewed by 252

Abstract

Earthquakes are sudden and destructive natural events caused by tectonic movements in the Earth’s crust. Although they cannot be predicted with certainty, rapid and reliable detection is essential to reduce loss of life and property. This study aims to automatically distinguish earthquake and noise signals from real seismic data by analyzing time-frequency features. Signals were scaled using z-score normalization, and extracted with Empirical Mode Decomposition (EMD), Discrete Wavelet Transform (DWT), and combined EMD+DWT methods. Feature selection methods such as Lasso, ReliefF, and Student’s t-test were applied to identify the most discriminative features. Classification was performed with Ensemble Bagged Trees, Decision Trees, Random Forest, k-Nearest Neighbors (k-NN), and Support Vector Machines (SVM). The highest performance was achieved using the RF classifier with the Lasso-based EMD+DWT feature set, reaching 100% accuracy, specificity, and sensitivity. Overall, DWT and EMD+DWT features yielded higher performance than EMD alone. While k-NN and SVM were less effective, tree-based methods achieved superior results. Moreover, Lasso and ReliefF outperformed Student’s t-test. These findings show that time-frequency-based features are crucial for separating earthquake signals from noise and provide a basis for improving real-time detection. The study contributes to the academic literature and holds significant potential for integration into early warning and earthquake monitoring systems. Full article

(This article belongs to the Special Issue Advanced Environmental Sensing Towards Acoustic Monitoring and Modeling: Applications and Challenges)

12 pages, 1247 KB

Open Access Article

Artificial Intelligence-Assisted Wrist Radiography Analysis in Orthodontics: Classification of Maturation Stage

by Nursezen Kavasoglu, Omer Faruk Ertugrul, Seda Kotan, Yunus Hazar and Veysel Eratilla

Appl. Sci. 2025, 15(21), 11681; https://doi.org/10.3390/app152111681 - 31 Oct 2025

Viewed by 142

Abstract

This study aims to evaluate the ability of an artificial intelligence (AI) model developed for use in the field of orthodontics to accurately and reliably classify skeletal maturation stages of individuals using hand–wrist radiographs. A total of 809 grayscale hand–wrist radiographs (250 × 250 px; pre-peak n = 400, peak n = 100, post-peak n = 309) were analyzed using four complementary image-based feature extraction methods: Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), Zernike Moments (ZM), and Intensity Histogram (IH). These methods generated 2355 features per image, of which 2099 were retained after variance thresholding. The most informative 1250 features were selected using the ANOVA F-test and classified with a stacking-based machine learning (ML) architecture composed of Light Gradient Boosting Machine (LightGBM) and Logistic Regression (LR) as base learners, and Random Forest (RF) as the meta-learner. Across all evaluation folds, the average performance of the model was Accuracy = 83.42%, Precision = 84.48%, Recall = 83.42%, and F1 = 83.50%. The proposed model achieved 87.5% accuracy, 87.8% precision, 87.5% recall, and an F1-score of 87.6% in 10-fold cross-validation, with a macro-average area under the ROC curve (AUC) of 0.96. The pre-peak stage, corresponding to the period of maximum growth velocity, was identified with 92.5% accuracy. These findings indicate that integrating handcrafted radiographic features with ensemble learning can enhance diagnostic precision, reduce observer variability, and accelerate evaluation. The model provides an interpretable and clinically applicable AI-based decision-support tool for skeletal maturity assessment in orthodontic practice. Full article

32 pages, 2684 KB

Open Access Article

Hybrid Framework for Cartilage Damage Detection from Vibroacoustic Signals Using Ensemble Empirical Mode Decomposition and CNNs

by Anna Machrowska, Robert Karpiński, Marcin Maciejewski, Józef Jonak, Przemysław Krakowski and Arkadiusz Syta

Sensors 2025, 25(21), 6638; https://doi.org/10.3390/s25216638 - 29 Oct 2025

Viewed by 485

Abstract

This study proposes a hybrid analytical framework for detecting chondromalacia using vibroacoustic (VAG) signals from patients with knee osteoarthritis (OA) and healthy controls (HCs). The methodology combines nonlinear signal decomposition, feature extraction, and deep learning classification. Raw VAG signals, recorded with a custom multi-sensor system during open (OKC) and closed (CKC) kinetic chain knee flexion–extension, underwent preprocessing (denoising, segmentation, normalization). Ensemble Empirical Mode Decomposition (EEMD) was used to isolate Intrinsic Mode Functions (IMFs), and Detrended Fluctuation Analysis (DFA) computed local (α₁) and global (α₂) scaling exponents as well as breakpoint location. Frequency–energy features of IMFs were statistically assessed and selected via Neighborhood Component Analysis (NCA) for support vector machine (SVM) classification. Additionally, reconstructed α₁/α₂-based signals and raw signals were converted into continuous wavelet transform (CWT) scalograms, classified with convolutional neural networks (CNNs) at two resolutions. The SVM approach achieved the best performance in CKC conditions (accuracy 0.87, AUC 0.91). CNN classification on CWT scalograms also demonstrated robust OA/HC discrimination with acceptable computational times at higher resolutions. Results suggest that combining multiscale decomposition, nonlinear fluctuation analysis, and deep learning enables accurate, non-invasive detection of cartilage degeneration, with potential for early knee pathology diagnosis. Full article

(This article belongs to the Special Issue Biomedical Imaging, Sensing and Signal Processing)

28 pages, 1624 KB

Open Access Article

Domain-Constrained Stacking Framework for Credit Default Prediction

by Ming-Liang Ding, Yu-Liang Ma and Fu-Qiang You

Mathematics 2025, 13(21), 3451; https://doi.org/10.3390/math13213451 - 29 Oct 2025

Viewed by 312

Abstract

Accurate and reliable credit risk classification is fundamental to the stability of financial systems and the efficient allocation of capital. However, with the rapid expansion of customer information in both volume and complexity, traditional rule-based or purely statistical approaches have become increasingly inadequate. Motivated by these challenges, this study introduces a domain-constrained stacking ensemble framework that systematically integrates business knowledge with advanced machine learning techniques. First, domain heuristics are embedded at multiple stages of the pipeline: threshold-based outlier removal improves data quality, target variable redefinition ensures consistency with industry practice, and feature discretization with monotonicity verification enhances interpretability. Then, each variable is transformed through Weight-of-Evidence (WOE) encoding and evaluated via Information Value (IV), which enables robust feature selection and effective dimensionality reduction. Next, on this transformed feature space, we train logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and a two-layer stacking ensemble. Finally, the ensemble aggregates cross-validated out-of-fold predictions from LR, RF and XGBoost as meta-features, which are fused by a meta-level logistic regression, thereby capturing both linear and nonlinear relationships while mitigating overfitting. Experimental results across two credit datasets demonstrate that the proposed framework achieves superior predictive performance compared with single models, highlighting its potential as a practical solution for credit risk assessment in real-world financial applications. Full article
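As a rough illustration of the Weight-of-Evidence and Information Value step described above, the sketch below computes both from per-bin counts of non-defaulters ("good") and defaulters ("bad"). The sign convention (log of good share over bad share), the function names, and the sample counts are assumptions for the example; the abstract does not specify them.

```python
import math


def woe_iv(bins):
    """bins: list of (n_good, n_bad) counts, one pair per discretized bin.

    Returns the WOE value of each bin and the Information Value of the feature.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # log(0) is undefined; merging bins or smoothing counts is left to the caller.
            raise ValueError("each bin needs at least one good and one bad sample")
        share_good = good / total_good
        share_bad = bad / total_bad
        woe = math.log(share_good / share_bad)
        woes.append(woe)
        iv += (share_good - share_bad) * woe
    return woes, iv


def is_monotonic(values):
    """True if the sequence never changes direction (the monotonicity check)."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical counts for one feature split into four ordered bins.
bins = [(100, 40), (200, 30), (300, 20), (400, 10)]
woes, iv = woe_iv(bins)
monotonic = is_monotonic(woes)
```

Features whose IV falls below a chosen cutoff can then be dropped before model training, which is one common way to use IV for the feature selection the abstract mentions.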

35 pages, 10688 KB

Open Access Article

Multi-Armed Bandit Optimization for Explainable AI Models in Chronic Kidney Disease Risk Evaluation

by Jianbo Huang, Long Li and Jia Chen

Symmetry 2025, 17(11), 1808; https://doi.org/10.3390/sym17111808 - 27 Oct 2025

Viewed by 356

Abstract

Chronic kidney disease (CKD) impacts over 850 million people globally, representing a critical public health issue, yet existing risk assessment methodologies inadequately address the complexity of disease progression trajectories. Traditional machine learning approaches encounter critical limitations including inefficient hyperparameter selection and lack of clinical transparency, hindering their deployment in healthcare settings. This study introduces an innovative computational framework that integrates adaptive Multi-Armed Bandit (MAB) strategies with BorderlineSMOTE sampling techniques to improve CKD risk assessment. The proposed methodology leverages XGBoost within an ensemble learning paradigm enhanced by Upper Confidence Bound exploration strategy, coupled with a comprehensive interpretability system incorporating SHAP and LIME analytical tools to ensure model transparency. To address the challenge of algorithmic interpretability while maintaining clinical utility, a four-level risk categorization framework was developed, employing cross-validated stratification methods and balanced performance evaluation metrics, thereby ensuring fair predictive accuracy across diverse patient populations and minimizing bias toward dominant risk categories. Through rigorous empirical evaluation on clinical datasets, we performed extensive comparative analysis against sixteen established algorithms using paired statistical testing with Bonferroni correction. The MAB-optimized framework achieved superior predictive performance with accuracy of 91.8%, F1-score of 91.0%, and ROC-AUC of 97.8%, demonstrating superior performance within the evaluated cohort of reference algorithms (p-value < 0.001). Remarkably, our optimized framework delivered nearly ten-fold computational efficiency gains relative to conventional grid search methods while preserving robust classification performance. 
Feature importance analysis identified albumin-to-creatinine ratio, eGFR measurements, and CKD staging as dominant prognostic factors, demonstrating concordance with established clinical nephrology practice. This research addresses three core limitations in healthcare artificial intelligence: optimization computational cost, model interpretability, and consistent performance across heterogeneous clinical populations, offering a practical solution for improved CKD risk stratification in clinical practice. Full article
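The Upper Confidence Bound strategy mentioned above can be sketched as a bandit over candidate hyperparameter configurations. This is a minimal UCB1 illustration under stated assumptions, not the paper's framework: each arm stands for a hypothetical XGBoost configuration, and `evaluate` returns a fixed made-up validation score, where a real run would return a noisy cross-validation result.

```python
import math


def ucb1_select(counts, means, t):
    """Pick the arm with the highest mean reward plus exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every arm once before using the bound
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(evaluate, n_arms, budget):
    """Spend `budget` evaluations across arms, tracking running mean rewards."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, budget + 1):
        arm = ucb1_select(counts, means, t)
        reward = evaluate(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


# Hypothetical validation scores for three candidate configurations.
HYPOTHETICAL_SCORES = [0.70, 0.80, 0.90]


def evaluate(arm):
    return HYPOTHETICAL_SCORES[arm]


counts, means = run_bandit(evaluate, n_arms=3, budget=60)
best_arm = max(range(len(means)), key=means.__getitem__)
```

Compared with an exhaustive grid search, which evaluates every configuration equally often, the bandit never spends more evaluations on a weaker arm than on the best one in this deterministic setting, which is the intuition behind the efficiency gain the abstract reports.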

(This article belongs to the Special Issue Simulation and Modelling in Natural Sciences, Biomedicine and Engineering III)

20 pages, 3577 KB

Open Access Article

Hyperspectral Remote Sensing and Artificial Intelligence for High-Resolution Soil Moisture Prediction

by Ki-Sung Kim, Junwon Lee, Jeongjun Park, Gigwon Hong and Kicheol Lee

Water 2025, 17(21), 3069; https://doi.org/10.3390/w17213069 - 27 Oct 2025

Viewed by 396

Abstract

Reliable field estimation of soil moisture supports hydrology and water resources management. This study develops a drone-based hyperspectral approach in which visible and near-infrared reflectance is paired one-to-one with gravimetric water content measured by oven drying, yielding 1000 matched samples. After standardization, outlier control, ranked wavelength selection, and light feature engineering, several predictors were evaluated. Conventional machine learning methods, including simple and multiple regression and tree-based ensembles, were limited by band collinearity and piecewise approximations and therefore failed to meet the accuracy target. Gradient boosting reached the target but used different trade-offs in variable sensitivity. An artificial neural network with three hidden layers, rectified linear unit activations, and dropout was trained using a feature count sweep and early stopping. With ten predictors, the model achieved a coefficient of determination of 0.9557, demonstrating accurate mapping from hyperspectral reflectance to gravimetric water content and providing a reproducible framework suitable for larger, multi-date acquisitions and operational decision support. Full article

(This article belongs to the Special Issue Applications of Remote Sensing in Hydrology and Water Resource Management)

25 pages, 1928 KB

Open Access Article

A Methodological Comparison of Forecasting Models Using KZ Decomposition and Walk-Forward Validation

by Khawla Al-Saeedi, Diwei Zhou, Andrew Fish, Katerina Tsakiri and Antonios Marsellos

Mathematics 2025, 13(21), 3410; https://doi.org/10.3390/math13213410 - 26 Oct 2025

Viewed by 232

Abstract

The accurate forecasting of surface air temperature (T2M) is crucial for climate analysis, agricultural planning, and energy management. This study proposes a novel forecasting framework grounded in structured temporal decomposition. Using the Kolmogorov–Zurbenko (KZ) filter, all predictor variables are decomposed into three physically interpretable components: long-term, seasonal, and short-term variations, forming an expanded multi-scale feature space. A central innovation of this framework lies in training a single unified model on the decomposed feature set to predict the original target variable, thereby enabling the direct learning of scale-specific driver–response relationships. We present the first comprehensive benchmarking of this architecture, demonstrating that it consistently enhances the performance of both regularized linear models (Ridge and Lasso) and tree-based ensemble methods (Random Forest and XGBoost). Under rigorous walk-forward validation, the framework substantially outperforms conventional, non-decomposed approaches; for example, XGBoost improves the coefficient of determination ($R^{2}$) from 0.80 to 0.91. Furthermore, temporal decomposition enhances interpretability by enabling Ridge and Lasso models to achieve performance levels comparable to complex ensembles. Despite these promising results, we acknowledge several limitations: the analysis is restricted to a single geographic location and time span, and short-term components remain challenging to predict due to their stochastic nature and the weaker relevance of predictors. Additionally, the framework’s effectiveness may depend on the optimal selection of KZ parameters and the availability of sufficiently long historical datasets for stable walk-forward validation. Future research could extend this approach to multiple geographic regions, longer time series, adaptive KZ tuning, and specialized short-term modeling strategies. Overall, the proposed framework demonstrates that temporal decomposition of predictors offers a powerful inductive bias, establishing a robust and interpretable paradigm for surface air temperature forecasting. Full article
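A minimal sketch of the three-way split described in this abstract, assuming the usual definition of the KZ filter as an iterated centered moving average. The window lengths, the iteration count, the truncated-window edge handling, and the way the seasonal and short-term parts are formed by differencing are assumptions for illustration; the abstract does not give the authors' settings.

```python
def moving_average(series, window):
    """Centered moving average; windows are truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def kz_filter(series, window, iterations):
    """KZ(window, iterations): apply the moving average `iterations` times."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    smoothed = list(series)
    for _ in range(iterations):
        smoothed = moving_average(smoothed, window)
    return smoothed


def kz_decompose(series, long_window, short_window, iterations=3):
    """Split a series into long-term, seasonal, and short-term components."""
    long_term = kz_filter(series, long_window, iterations)
    baseline = kz_filter(series, short_window, iterations)  # long-term + seasonal
    seasonal = [b - l for b, l in zip(baseline, long_term)]
    short_term = [x - b for x, b in zip(series, baseline)]
    return long_term, seasonal, short_term


# Hypothetical predictor with a weekly pattern; window sizes are illustrative only.
series = [float(i % 7) for i in range(60)]
long_term, seasonal, short_term = kz_decompose(series, long_window=21, short_window=5)
```

Because the three components add back up to the original series, each predictor can be replaced by its three parts without losing information, which gives the expanded multi-scale feature space the abstract refers to.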

(This article belongs to the Special Issue Mathematical and Statistical Methods for Prediction and Optimisation in Artificial Intelligence)

21 pages, 2727 KB

Open Access Article

Explainable Artificial Intelligence for Ovarian Cancer: Biomarker Contributions in Ensemble Models

by Hasan Ucuzal and Mehmet Kıvrak

Biology 2025, 14(11), 1487; https://doi.org/10.3390/biology14111487 - 24 Oct 2025

Viewed by 349

Abstract

Ovarian cancer’s high mortality is primarily due to late-stage diagnosis, underscoring the critical need for improved early detection tools. This study develops and validates explainable artificial intelligence (XAI) models to discriminate malignant from benign ovarian masses using readily available demographic and laboratory data. A dataset of 309 patients (140 malignant, 169 benign) with 47 clinical parameters was analyzed. The Boruta algorithm selected 19 significant features, including tumor markers (CA125, HE4, CEA, CA19-9, AFP), hematological indices, liver function tests, and electrolytes. Five ensemble machine learning algorithms were optimized and evaluated using repeated stratified 5-fold cross-validation. The Gradient Boosting model achieved the highest performance with 88.99% (±3.2%) accuracy, 0.934 AUC-ROC, and 0.782 Matthews correlation coefficient. SHAP analysis identified HE4, CEA, globulin, CA125, and age as the most globally important features. Unlike black-box approaches, our XAI framework provides clinically interpretable decision pathways through LIME and SHAP visualizations, revealing how feature values push predictions toward malignancy or benignity. Partial dependence plots illustrated non-linear risk relationships, such as a sharp increase in malignancy probability with CA125 > 35 U/mL. This explainable approach demonstrates that ensemble models can achieve high diagnostic accuracy using routine lab data alone, performing comparably to established clinical indices while ensuring transparency and clinical plausibility. The integration of state-of-the-art XAI techniques highlights established biomarkers and reveals potential novel contributors like inflammatory and hepatic indices, offering a pragmatic, scalable triage tool to augment existing diagnostic pathways, particularly in resource-constrained settings. Full article
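For readers unfamiliar with the Matthews correlation coefficient reported above, the sketch below computes it from a binary confusion matrix. The class sizes (140 malignant, 169 benign) come from the abstract, but the individual confusion-matrix counts are hypothetical and do not reproduce the paper's results.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC for a binary confusion matrix; ranges from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical split of 140 malignant and 169 benign cases.
mcc = matthews_corrcoef(tp=120, fp=15, fn=20, tn=154)
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size, which is one reason it is often reported alongside accuracy and AUC.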

(This article belongs to the Special Issue AI Deep Learning Approach to Study Biological Questions (2nd Edition))

16 pages, 1300 KB

Open Access Article

Multi-Class Segmentation and Classification of Intestinal Organoids: YOLO Stand-Alone vs. Hybrid Machine Learning Pipelines

by Luana Conte, Giorgio De Nunzio, Giuseppe Raso and Donato Cascio

Appl. Sci. 2025, 15(21), 11311; https://doi.org/10.3390/app152111311 - 22 Oct 2025

Viewed by 269

Abstract

Background: The automated analysis of intestinal organoids in microscopy images is essential for high-throughput morphological studies, enabling precision and scalability. Traditional manual analysis is time-consuming and subject to observer bias, whereas Machine Learning (ML) approaches have recently demonstrated superior performance. Purpose: This study aims to evaluate YOLO (You Only Look Once) for organoid segmentation and classification, comparing its standalone performance with a hybrid pipeline that integrates DL-based feature extraction and ML classifiers. Methods: The dataset, consisting of 840 light microscopy images and over 23,000 annotated intestinal organoids, was divided into training (756 images) and validation (84 images) sets. Organoids were categorized into four morphological classes: cystic non-budding organoids (Org0), early organoids (Org1), late organoids (Org3), and Spheroids (Sph). YOLO version 10 (YOLOv10) was trained as a segmenter-classifier for the detection and classification of organoids. Performance metrics for YOLOv10 as a standalone model included Average Precision (AP), mean AP at 50% overlap (mAP50), and confusion matrix evaluated on the validation set. In the hybrid pipeline, trained YOLOv10 segmented bounding boxes, and features extracted from these regions using YOLOv10 and ResNet50 were classified with ML algorithms, including Logistic Regression, Naive Bayes, K-Nearest Neighbors (KNN), Random Forest, eXtreme Gradient Boosting (XGBoost), and Multi-Layer Perceptrons (MLP). The performance of these classifiers was assessed using the Receiver Operating Characteristic (ROC) curve and its corresponding Area Under the Curve (AUC), precision, F1 score, and confusion matrix metrics. Principal Component Analysis (PCA) was applied to reduce feature dimensionality while retaining 95% of cumulative variance.
To optimize the classification results, an ensemble approach based on AUC-weighted probability fusion was implemented to combine predictions across classifiers. Results: YOLOv10 as a standalone model achieved an overall mAP50 of 0.845, with high AP across all four classes (range 0.797–0.901). In the hybrid pipeline, features extracted with ResNet50 outperformed those extracted with YOLO, with multiple classifiers achieving AUC scores ranging from 0.71 to 0.98 on the validation set. Among all classifiers, Logistic Regression emerged as the best-performing model, achieving the highest AUC scores across multiple classes (range 0.93–0.98). Feature selection using PCA did not improve classification performance. The AUC-weighted ensemble method further enhanced performance, leveraging the strengths of multiple classifiers to optimize prediction, as demonstrated by improved ROC-AUC scores across all organoid classes (range 0.92–0.98). Conclusions: This study demonstrates the effectiveness of YOLOv10 as a standalone model and the robustness of hybrid pipelines combining ResNet50 feature extraction and ML classifiers. Logistic Regression emerged as the best-performing classifier, achieving the highest ROC-AUC across multiple classes. This approach ensures reproducible, automated, and precise morphological analysis, with significant potential for high-throughput organoid studies and live imaging applications. Full article
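
The AUC-weighted probability fusion mentioned in this abstract can be sketched roughly as follows. The abstract does not state the exact weighting rule, so this sketch assumes each classifier's weight is its AUC divided by the sum of all AUCs. The function name, classifier names, probabilities, and AUC values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def auc_weighted_fusion(probs: Dict[str, List[float]],
                        aucs: Dict[str, float]) -> List[float]:
    """Fuse per-classifier class probabilities for one sample.

    Assumption (not from the abstract): weights are proportional to AUC
    and normalized so that they sum to 1.
    """
    total_auc = sum(aucs[name] for name in probs)
    if total_auc <= 0:
        raise ValueError("AUC values must sum to a positive number")
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for name, class_probs in probs.items():
        if len(class_probs) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        weight = aucs[name] / total_auc
        for k, p in enumerate(class_probs):
            fused[k] += weight * p
    return fused


# Hypothetical scores for one organoid over the four classes in the abstract.
classes = ["Org0", "Org1", "Org3", "Sph"]
probs = {
    "logistic_regression": [0.7, 0.1, 0.1, 0.1],
    "random_forest": [0.4, 0.4, 0.1, 0.1],
}
aucs = {"logistic_regression": 0.9, "random_forest": 0.6}  # made-up values

fused = auc_weighted_fusion(probs, aucs)
predicted = classes[max(range(len(classes)), key=lambda k: fused[k])]
```

Because the weights sum to 1, the fused vector remains a probability distribution whenever each classifier's output is one. A classifier with a higher validation AUC simply pulls the fused result further toward its own prediction.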

(This article belongs to the Special Issue Deep Learning for Biomedical Image Analysis: Recent Advances and Future Trends)

21 pages, 8773 KB

Open Access Article

Engineering-Oriented Explainable Machine Learning and Digital Twin Framework for Sustainable Dairy Production and Environmental Impact Optimisation

by Ruiming Xing, Baihua Li, Shirin Dora, Michael Whittaker and Janette Mathie

Algorithms 2025, 18(10), 670; https://doi.org/10.3390/a18100670 - 21 Oct 2025

Viewed by 287

Abstract

Enhancing productivity while reducing environmental impact presents a major engineering challenge in sustainable dairy farming. This study proposes an engineering-oriented explainable machine learning and digital twin framework for multi-objective optimisation of milk yield and nitrogen-related emissions. Using the CowNflow dataset, which integrates individual-level nitrogen balance, feeding, and production data collected under controlled experimental conditions, the framework combines data analytics, feature selection, predictive modelling, and SHAP-based explainability to support decision-making in dairy production. The stacking ensemble model achieved the best predictive performance ($R^{2} = 0.85$ for milk yield and $R^{2} = 0.794$ for milk urea), providing reliable surrogates for downstream optimisation. Predicted milk urea values were further transformed using empirical equations to estimate urinary urea nitrogen (UUN) and ammonia (NH₃) emissions, offering an indirect yet practical approach to assess environmental sustainability. Furthermore, the predictive models are integrated into a digital twin platform that provides a dynamic, real-time simulation environment for scenario testing, continuous optimisation, and data-driven decision support, effectively bridging data analytics with sustainable dairy system management. This research demonstrates how explainable AI, machine learning, and digital twin engineering can jointly drive sustainable dairy production, offering actionable insights for improving productivity while minimising environmental impact.

(This article belongs to the Special Issue AI-Driven Engineering Optimization)

21 pages, 2684 KB

Open Access Article

Construction of Yunnan Flue-Cured Tobacco Yield Integrated Learning Prediction Model Driven by Meteorological Data

by Yunshuang Wang, Jinheng Zhang, Xiaoyi Bai, Mengyan Zhao, Xianjin Jin and Bing Zhou

Agronomy 2025, 15(10), 2436; https://doi.org/10.3390/agronomy15102436 - 21 Oct 2025

Viewed by 274

Abstract

Timely and accurate prediction of flue-cured tobacco yield is crucial for stable yields and income growth. Using yield data and meteorological data (from the NASA POWER database) for Yunnan Province from 2003 to 2023, this study constructed a coupled framework of polynomial regression and a Stacking ensemble model. Four trend yield separation methods were compared, and polynomial regression was selected as the best at capturing long-term trends. A total of 135 meteorological features were built from data covering the growth period of flue-cured tobacco, and 17 core features were screened via Pearson’s correlation analysis and Recursive Feature Elimination (RFE). With Random Forest (RF), Multi-Layer Perceptron (MLP), and Support Vector Regression (SVR) as base models, a ridge regression meta-model was developed to predict meteorological yield. The final results were obtained by integrating trend and meteorological yields, and core influencing factors were analyzed via SHapley Additive exPlanations (SHAP). The results showed that the Stacking model had the best predictive performance, significantly outperforming single models; August was the optimal prediction lead time; and the day–night temperature difference in the August maturity stage and the solar radiation in the April transplantation stage were core yield-influencing factors. This framework provides a practical yield prediction tool for Yunnan’s flue-cured tobacco areas and offers important empirical support for exploring meteorology–yield interactions in subtropical plateau crops.
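
The trend yield separation step in this abstract splits each year's yield into a long-term trend and a remaining "meteorological yield". A minimal sketch is given below, assuming an ordinary least-squares polynomial fit over the year index. The polynomial degree, the function names, and the synthetic yield series are assumptions for illustration and are not taken from the article.

```python
from typing import List, Sequence, Tuple


def polyfit(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde design matrix A.
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def separate_trend(years: Sequence[int], yields: Sequence[float],
                   degree: int = 2) -> Tuple[List[float], List[float]]:
    """Return (trend_yield, meteorological_yield) for each year."""
    t = [float(yr - years[0]) for yr in years]  # shift years to keep powers small
    coef = polyfit(t, yields, degree)
    trend = [sum(c * ti ** k for k, c in enumerate(coef)) for ti in t]
    met = [y - tr for y, tr in zip(yields, trend)]
    return trend, met


# Synthetic, purely quadratic yield series (hypothetical numbers).
years = list(range(2003, 2024))
yields = [100.0 + 2.0 * (yr - 2003) + 0.1 * (yr - 2003) ** 2 for yr in years]
trend, met = separate_trend(years, yields, degree=2)
```

In the framework described above, a model such as the Stacking ensemble would then be trained on the meteorological component. The final prediction adds the predicted meteorological yield back to the extrapolated trend.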

(This article belongs to the Section Precision and Digital Agriculture)

Search Results (970)
