Search Results (2,254)

Search Parameters:
Keywords = Gradient Boosting Regression

25 pages, 4672 KB  
Article
Data-Efficient and Explainable Multimodal Survival Prediction in NSCLC Using Deep Image Embeddings, Clinical Variables, and Gradient-Boosted Trees
by Sevim Sahin and Adil Gursel Karacor
Diagnostics 2026, 16(12), 1941; https://doi.org/10.3390/diagnostics16121941 (registering DOI) - 22 Jun 2026
Abstract
Background/Objectives: Survival prediction in non-small cell lung cancer (NSCLC) remains challenging, particularly in limited-sample settings where end-to-end deep learning models may suffer from limited generalization. This study aimed to develop a data-efficient, multimodal, and explainable framework integrating computed tomography (CT)-derived imaging information with clinical variables for NSCLC survival prediction. Methods: CT images, tumor segmentations, and clinical data from the publicly available NSCLC Radiomics (LUNG1) dataset (377 patients) were used. Tumor-focused regions were extracted using segmentation masks, and pretrained RadImageNet-InceptionV3 embeddings were obtained from the largest tumor-containing slice and neighboring-slice summaries. Deep imaging embeddings, engineered imaging features, and clinical variables were fused into a unified tabular representation. To improve robustness under limited-sample conditions, feature blocks were compressed using principal component analysis. CatBoost, XGBoost, and LightGBM models were trained on a development set and evaluated on a strictly held-out final validation set. Results: In three-class survival stratification, assigning censored/non-event patients to the upper survival group produced the strongest ordinal prognostic performance. Under the EX_PLUS_NON_EX_TOP setting, CatBoost achieved the best holdout score-based class C-index of 0.655. In continuous survival regression, LightGBM achieved the best holdout event-patient C-index of 0.576. Clinical variables provided the dominant prognostic signal, while compact deep image embeddings contributed complementary information, particularly in separating short- and long-survival groups. SHAP analysis confirmed contributions from both clinical and image-derived features. 
Conclusions: The proposed framework provides a proof-of-concept demonstration of a data-efficient and explainable image-to-tabular approach for NSCLC survival prediction under strict internal holdout validation. The results suggest that pretrained CT embeddings, clinical variables, gradient-boosted trees, and SHAP-based interpretation can be combined in a feasible, limited-sample survival modeling pipeline, while external validation remains necessary before clinical translation.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
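
The holdout results above are reported as concordance indices. As a hedged illustration only, the sketch below computes Harrell's C-index for right-censored data in plain Python. The toy times, event flags, and risk scores are hypothetical, and the abstract does not spell out how its "score-based class C-index" and "event-patient C-index" variants define comparable pairs, so this is the textbook form rather than the authors' exact metric.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. It is concordant when subject i also
    has the higher predicted risk; tied risk scores count as half-concordant.
    Pairs with tied survival times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: survival times (days), event flags (1 = death observed), risk scores.
times = [100, 250, 400, 600, 800]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.6, 0.3, 0.2]
print(harrell_c_index(times, events, risk))  # 1.0: every comparable pair is ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported holdout values should be read.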

22 pages, 1211 KB  
Article
CYP3A4, CYP3A5, and CYP4F2 Polymorphisms and Bleeding Risk in Ticagrelor-Based Dual Antiplatelet Therapy
by Sonja Dakić, Zoran Perišić, Svetlana Apostolović, Tomislav Kostić, Goran Koraćević, Tatjana Jevtović, Boris Đinđić, Nikola Stefanović, Danijela Đorđević-Radojković, Bojan Maričić, Dragana Stanojević, Maša Jović, Jelena Perišić and Tamara Filipović
Medicina 2026, 62(6), 1202; https://doi.org/10.3390/medicina62061202 (registering DOI) - 22 Jun 2026
Abstract
Background and Objectives: Ticagrelor reduces ischemic events in acute coronary syndrome (ACS) but increases bleeding risk. Clinical predictors of bleeding are well established; the contribution of cytochrome P450 polymorphisms involved in ticagrelor metabolism remains uncertain, with conflicting reports in the literature. We examined the association of CYP3A4*22 (rs35599367), CYP3A5*3 (rs776746), and CYP4F2 (rs3093135) with bleeding in a Serbian ACS cohort. Materials and Methods: This prospective, single-center observational study enrolled 105 consecutive ACS patients undergoing percutaneous coronary intervention (PCI) or medical management after coronary angiography and receiving dual antiplatelet therapy (DAPT) with acetylsalicylic acid and ticagrelor at the University Clinical Center Niš between January 2024 and the end of May 2025. Bleeding events occurring during the index hospitalization and the six-month follow-up were classified according to the Bleeding Academic Research Consortium (BARC) criteria. Genotyping used TaqMan assays. Associations with bleeding were assessed using Firth’s penalized logistic regression, with multivariable adjustment for age and renal function. Severity-stratified analyses and gradient-boosted machine learning (XGBoost with SHAP) were performed as exploratory analyses. Results: Thirteen patients (12.4%) experienced bleeding (nine minor [BARC 1/2], four major [BARC 3/5]). Age ≥ 75 years (univariable OR 7.62, p = 0.001) and eGFR < 60 mL/min/1.73 m² (OR 3.68, p = 0.006) were the strongest predictors. CYP3A5*1 carrier status was univariably associated with bleeding (OR 4.16, p = 0.043) but did not remain significant after adjustment for age and renal function, and *1 carriers were significantly older and more likely to have impaired renal function. No genotype was associated with major (BARC 3/5) bleeding. 
The apparent effect was concentrated in minor bleeding (BARC 1/2 rate: 30.8% versus 5.5%), with no major events among *1 carriers. CYP3A4*22 (OR 1.37, p = 0.109) and CYP4F2 (OR 1.17, p = 0.111) showed no association. Machine-learning analyses confirmed eGFR and age as the dominant predictors. Conclusions: In this Serbian ACS cohort, clinical factors, particularly advanced age and impaired renal function, dominated the prediction of bleeding risk. The CYP3A5 signal was largely explained by baseline imbalances in age and renal function. CYP3A4*22 and CYP4F2 polymorphisms did not contribute additional predictive information. Preemptive genotyping for these variants is unlikely to materially improve bleeding-risk assessment beyond standard clinical evaluation in patients of this type.
(This article belongs to the Special Issue Advances in Acute Myocardial Infarction)
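
Firth's penalized logistic regression keeps odds-ratio estimates finite in small samples, where ordinary maximum likelihood can diverge (for example, when one cell of a 2×2 table is zero). The following is a minimal univariable sketch, not the authors' adjusted model: it iterates with the Firth-modified score, in which each residual is augmented by h_i(0.5 − p_i), with h_i the diagonal of the weighted hat matrix. The counts are hypothetical. For a single binary predictor the Firth log odds ratio reduces to the log odds ratio after adding 0.5 to every cell, which gives a convenient check.

```python
import math

def firth_logistic(x, y, max_iter=100, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression for logit(p) = b0 + b1 * x.

    Uses the Firth-modified score U*(b) = sum_i z_i * (y_i - p_i + h_i * (0.5 - p_i)),
    where z_i = (1, x_i) and h_i is the i-th diagonal of the weighted hat matrix.
    Steps are capped at max_step; no step-halving is performed.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = Z^T W Z (2 x 2) and its inverse
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det
        # Hat diagonals h_i = w_i * z_i^T I^{-1} z_i
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0, u1 = sum(r), sum(ri * xi for ri, xi in zip(r, x))
        d0, d1 = v00 * u0 + v01 * u1, v01 * u0 + v11 * u1
        scale = max(abs(d0), abs(d1)) / max_step
        if scale > 1.0:
            d0, d1 = d0 / scale, d1 / scale
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

def table_to_rows(a, b, c, d):
    """Expand a 2x2 table into (x, y) rows: x = 1 carrier, y = 1 bleeding event."""
    x = [1] * (a + b) + [0] * (c + d)
    y = [1] * a + [0] * b + [1] * c + [0] * d
    return x, y

# Hypothetical counts with a zero cell, where ordinary ML would give an infinite odds ratio.
x, y = table_to_rows(a=3, b=7, c=0, d=20)
b0, b1 = firth_logistic(x, y)
haldane_or = (3 + 0.5) * (20 + 0.5) / ((7 + 0.5) * (0 + 0.5))
print(round(math.exp(b1), 4), round(haldane_or, 4))  # the two values should agree
```

Production implementations add step-halving on the penalized likelihood and support multiple covariates; the closed-form check applies only to this single-binary-predictor case.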

16 pages, 1520 KB  
Article
Paranasal Sinus Morphometry for Forensic Sex Estimation: A Computed Tomography Study of 499 Individuals with a Cross-Validated, Transparently Reported Machine Learning Model
by Muhammet Can, Cihangir Işık and Burcu Düzel Asıg
Diagnostics 2026, 16(12), 1928; https://doi.org/10.3390/diagnostics16121928 (registering DOI) - 22 Jun 2026
Abstract
Background/Objectives: Paranasal sinus morphometry on computed tomography (CT) is of interest for forensic sex estimation, but many published predictive models rely on in-sample formulas without cross-validation, external testing, or release of model parameters. We aimed to characterize sex differences, pneumatization patterns, asymmetry, and age relationships of the paranasal sinuses in a Turkish adult population, and to develop, cross-validate, and transparently report a predictive model for sex estimation, explicitly benchmarked against the single best morphometric feature. Methods: In this single-center, STROBE-compliant retrospective cross-sectional study, maxillary, frontal, and sphenoid sinus volumes were measured by semi-automated active-contour segmentation in ITK-SNAP on CT scans of 499 adults (282 male, 217 female; 18–65 years). Between-sex differences were tested with the Mann–Whitney U test with Bonferroni correction; effect sizes were quantified using Cliff’s delta and the probability of superiority. L1-regularized logistic regression, random forest, and gradient boosting were trained with 10-fold stratified cross-validation and a held-out 20% test set, and compared with a univariate frontal-volume benchmark. Results: All three sinus volumes were larger in males (all Bonferroni-adjusted p < 0.001), with the largest effect among the individual sinuses for the frontal sinus (Cliff’s delta = 0.53; probability of male superiority = 0.77). The best classifier was L1-regularized logistic regression (10-fold cross-validated AUC 0.79 ± 0.07; held-out test AUC 0.80; accuracy 70%). Because the area under the ROC curve of a single continuous marker equals its probability of superiority, frontal volume alone reached an AUC of approximately 0.77; the multivariable model therefore added little beyond this single feature. Age could not be reliably estimated (test mean absolute error ≈ 10.8 years; R² ≈ 0). 
Conclusions: Paranasal sinus volumes show robust sex dimorphism, concentrated in the frontal sinus, but provide only moderate sex discrimination; they are appropriate as one corroborating input in a forensic identification workflow rather than a stand-alone determinant. Age cannot be reliably estimated from sinus morphometry in this cohort. Full model coefficients are reported to permit independent replication.
(This article belongs to the Section Forensic Diagnostics)
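
The abstract relies on the identity that the ROC AUC of a single continuous marker equals its probability of superiority (PS, with ties counted as one half), and Cliff's delta then equals 2 × PS − 1. A minimal sketch that checks both numerically; the volumes are hypothetical, not study data, and males are treated as the positive class.

```python
def pair_counts(xs, ys):
    """Count cross-group pairs with x > y, x < y, and x == y."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    eq = len(xs) * len(ys) - gt - lt
    return gt, lt, eq

def prob_superiority(xs, ys):
    gt, _, eq = pair_counts(xs, ys)
    return (gt + 0.5 * eq) / (len(xs) * len(ys))

def cliffs_delta(xs, ys):
    gt, lt, _ = pair_counts(xs, ys)
    return (gt - lt) / (len(xs) * len(ys))

def roc_auc(scores, labels):
    """Trapezoidal ROC AUC, thresholding at every distinct score (label 1 = positive)."""
    pos = sum(labels)
    neg = len(labels) - pos
    area, prev_tpr, prev_fpr = 0.0, 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # tied scores give a diagonal, i.e. half credit
        prev_tpr, prev_fpr = tpr, fpr
    return area

# Hypothetical frontal sinus volumes (mL).
male = [12.0, 15.5, 9.0, 20.1, 9.0]
female = [8.2, 9.0, 11.3, 5.4]
scores = male + female
labels = [1] * len(male) + [0] * len(female)
ps = prob_superiority(male, female)
print(ps, roc_auc(scores, labels), cliffs_delta(male, female))
```

Because the single-feature AUC is fully determined by PS, a reported PS of 0.77 already fixes the benchmark that any multivariable model has to beat.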

24 pages, 21264 KB  
Article
Cluster-Based Interpretable Machine Learning for Landslide Susceptibility Mapping: A Case Study in Northern Guangdong
by Zhanhui Qing, Wenfeng Cui, Chuangeng Sun, Zhiwen Zheng, Wei Zhang, Jinxiang Li and Muhammad Zeeshan Ali
Sustainability 2026, 18(12), 6347; https://doi.org/10.3390/su18126347 (registering DOI) - 22 Jun 2026
Abstract
Operational landslide susceptibility mapping (LSM) remains challenging in regions with pronounced geo-environmental heterogeneity, where single global models often overlook spatially variable landslide-environment relationships. Northern Guangdong, China, is a typical humid mountainous region where steep terrain, diverse lithology, and highly variable rainfall produce non-stationary landslide controls. To address this challenge, we develop a cluster-informed LSM framework that integrates unsupervised consensus K-means sub-zoning with localized Random Forest (RF) models and SHapley Additive exPlanations (SHAP). We use a harmonized inventory of 1510 landslides (2011–2022), together with twelve 30 m conditioning factors, for model training and validation. Compared with logistic regression, Support Vector Machines (SVM), and Light Gradient Boosting Machine (LightGBM), RF consistently achieves higher accuracy across clusters, and the cluster-wise RF ensemble attains pooled ACC = 0.8212, F1 = 0.8176, and AUC = 0.8956. SHAP highlights both regionally consistent predictors (e.g., NDVI, distance to road) and distinct cluster-specific controls linked to geomorphic and hydrologic settings. The proposed framework enhances predictive accuracy, produces finer susceptibility gradients, and yields better-calibrated probability estimates than a single global model. These results demonstrate that explicitly accounting for geo-environmental heterogeneity can generate interpretable, spatially adaptive susceptibility outputs. By identifying high-risk zones for priority monitoring, land-use regulation, infrastructure protection, and mitigation planning, the proposed framework provides a practical decision-support tool for sustainable mountain development and disaster risk reduction in heterogeneous mountainous regions.
(This article belongs to the Special Issue Sustainable Assessment and Risk Analysis on Landslide Hazards)
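
The "sub-zone first, then fit one local model per zone" idea can be sketched as follows. This is a simplified stand-in, not the study's pipeline: it uses plain Lloyd's K-means (not consensus K-means) and, in place of each cluster's Random Forest, a hypothetical per-cluster landslide base rate. Feature values and labels are invented, and features are left unscaled for brevity, although they would normally be standardized first.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(p, centroids):
    """Index of the nearest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means, initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = [assign(p, centroids) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old centroid if a cluster becomes empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return centroids

def fit_local_models(points, y, centroids):
    """Hypothetical local 'model': the observed landslide rate inside each cluster."""
    rates = []
    for c in range(len(centroids)):
        ys = [yi for p, yi in zip(points, y) if assign(p, centroids) == c]
        rates.append(sum(ys) / len(ys) if ys else 0.0)
    return rates

def predict_cell(p, centroids, rates):
    """Route a map cell to its sub-zone, then apply that zone's local model."""
    return rates[assign(p, centroids)]

# Hypothetical map cells: (slope in degrees, elevation in km) with landslide labels.
cells = [(35, 0.9), (38, 1.1), (33, 1.0), (5, 0.1), (8, 0.2), (6, 0.15)]
slides = [1, 1, 0, 0, 0, 1]
centroids = kmeans(cells, k=2)
rates = fit_local_models(cells, slides, centroids)
print(predict_cell((36, 1.05), centroids, rates), predict_cell((7, 0.1), centroids, rates))
```

Swapping the base-rate placeholder for a classifier trained on each cluster's cells gives the general shape of a cluster-wise ensemble.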

19 pages, 18788 KB  
Article
Interpretable Machine Learning and Spatiotemporal Modeling of Meteorological and Environmental Drivers for Tuberculosis Incidence in China
by Zihao Wang, Siyuan Li, Xiaotong Jiang, Kang Hu and Yangzhou Wu
Toxics 2026, 14(6), 537; https://doi.org/10.3390/toxics14060537 (registering DOI) - 21 Jun 2026
Abstract
Tuberculosis (TB) remains a major public health burden in China. Although meteorological and environmental factors are recognized to influence TB transmission, their non-linear effects and spatiotemporal heterogeneity have not been fully elucidated. Based on monthly TB incidence data from 31 provinces in China during 2005–2020, this study systematically investigated these effects by integrating nine meteorological and air pollution variables within a combined machine learning and spatial statistical modeling framework. The results indicated that the Extreme Gradient Boosting (XGBoost) model effectively captured the complex non-linear relationships between environmental exposure and TB incidence. SHAP interpretability analysis identified surface pressure (SP), vegetation coverage, and PM2.5 as the key drivers and revealed pronounced nonlinear response patterns and threshold effects. In particular, the promoting effect of PM2.5 on TB incidence increased sharply at medium-to-high concentration levels. To further investigate spatial and temporal non-stationarity, Geographically and Temporally Weighted Regression (GTWR) was applied. The results demonstrated strong spatiotemporal heterogeneity in driver effects across provinces. The influence of PM2.5 showed a consistently positive association with TB incidence and exhibited a distinct temporal evolution characterized by an initial strengthening before 2015 followed by a weakening thereafter, closely aligning with China’s air pollution control process. These findings provide new insights into the nonlinear and spatiotemporally heterogeneous effects of meteorological and environmental factors on TB incidence and support the development of more targeted, region-specific TB prevention strategies.
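
GTWR fits a separate weighted least-squares regression at each space-time location, down-weighting observations that are far away in space or time, which is what lets coefficients such as the PM2.5 effect vary by province and year. A minimal one-predictor sketch, assuming a Gaussian kernel on a combined distance d² = d_space² + λ·d_time²; the bandwidth h, the scale factor λ, and all data are hypothetical, and in practice both are usually calibrated (for example by cross-validation) and many predictors are handled with matrix algebra.

```python
import math

def gtwr_local_fit(obs, u, v, t, h=1.0, lam=1.0):
    """Weighted least squares y ~ b0 + b1 * x, local to location (u, v) and time t.

    obs: list of (u_i, v_i, t_i, x_i, y_i). Weights come from an assumed Gaussian
    kernel exp(-d^2 / h^2) with d^2 = d_space^2 + lam * d_time^2.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for ui, vi, ti, xi, yi in obs:
        d2 = (ui - u) ** 2 + (vi - v) ** 2 + lam * (ti - t) ** 2
        w = math.exp(-d2 / (h * h))
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi
    # Closed-form weighted least-squares slope and intercept
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Hypothetical (u, v, t, x, y): region A at (0, 0) has slope 2, region B at (10, 0) has slope -1.
obs = [(0.0, 0.0, float(t), x, 2.0 * x + 1.0) for t, x in enumerate([1.0, 2.0, 3.0, 4.0])]
obs += [(10.0, 0.0, float(t), x, -1.0 * x + 5.0) for t, x in enumerate([1.0, 2.0, 3.0, 4.0])]
local_a = gtwr_local_fit(obs, 0.0, 0.0, 1.5)
local_b = gtwr_local_fit(obs, 10.0, 0.0, 1.5)
print(local_a, local_b)
```

A single global regression on the same data would blend the two regions into one averaged slope, which is exactly the non-stationarity GTWR is designed to expose.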

27 pages, 4601 KB  
Article
Few-Shot Learning–Based Water Quality Classification Under Limited Data Conditions for Smart Aquaculture Monitoring
by Ashikur Rahman, Gwo Chin Chung, Yin Hoe Ng, Kah Yoong Chan and Soo Fun Tan
Water 2026, 18(12), 1523; https://doi.org/10.3390/w18121523 (registering DOI) - 20 Jun 2026
Abstract
Water quality monitoring is a fundamental element of sustainable aquaculture management, as changes in physicochemical and biological parameters directly affect the health, growth performance, and productivity of aquaculture systems. Although traditional machine learning (ML) methods have demonstrated effectiveness in water quality classification, their performance often depends on large amounts of labeled data, which can be challenging and expensive to collect in real-world aquaculture environments. To address this limitation, this study explores a few-shot learning (FSL) framework for data-efficient water quality classification under limited supervision. Several FSL models, including prototypical networks (ProtoNet), Siamese Networks, and Matching Networks, were developed and evaluated in a comparative experimental framework against traditional machine learning classifiers (logistic regression, random forest, support vector machine, and extreme gradient boosting). Low-data learning scenarios were simulated using a structured episodic evaluation approach. Experimental results demonstrate that FSL techniques outperformed traditional machine learning methods across all evaluated scenarios. Among the tested methods, ProtoNet achieved the highest performance, attaining an accuracy of 94.46% and an ROC-AUC score of 98.65%, indicating superior discriminative capability and robustness. Siamese Networks also demonstrated competitive performance under highly constrained data conditions. Furthermore, latent-space visualization, confusion matrix analysis, paired t-test statistical analysis, and ablation studies confirmed that episodic meta-learning enables the learning of highly discriminative latent representations with strong generalization capability under limited labeled data conditions. 
The findings highlight that FSL provides a robust and scalable framework for intelligent water quality classification in aquaculture systems, particularly in scenarios where labeled data are scarce, offering significant potential for sustainable aquaculture monitoring applications.
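
At inference, a prototypical network embeds the few labeled "support" samples, averages them per class into prototypes, and assigns each query to the nearest prototype via a softmax over negative squared Euclidean distances. The sketch below shows only that classification step. As a labeled simplification, the embedding is the identity map on hypothetical (pH, dissolved oxygen) readings rather than a trained network, and the class names are invented.

```python
import math

def prototypes(support):
    """support: dict class -> list of embedding vectors. Returns the per-class mean vector."""
    return {c: [sum(col) / len(vecs) for col in zip(*vecs)] for c, vecs in support.items()}

def classify(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    d = {c: sum((q - p) ** 2 for q, p in zip(query, proto)) for c, proto in protos.items()}
    m = min(d.values())  # shift by the smallest distance for numerical stability
    exps = {c: math.exp(-(dist - m)) for c, dist in d.items()}
    z = sum(exps.values())
    probs = {c: e / z for c, e in exps.items()}
    return max(probs, key=probs.get), probs

# Hypothetical 2-way, 3-shot episode with identity "embeddings".
support = {
    "good": [[7.2, 7.8], [7.0, 8.1], [7.4, 7.5]],
    "poor": [[6.1, 3.9], [5.8, 4.2], [6.3, 3.5]],
}
protos = prototypes(support)
label, probs = classify([7.1, 7.6], protos)
print(label, probs)
```

During episodic training, the same distance-based softmax supplies the loss that shapes the embedding network, which is why class means in the learned space become good prototypes.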

17 pages, 1410 KB  
Article
Preoperative OCT Biomarkers as Predictors of Postoperative Functional Outcome Assessed by Microperimetry After Inverted ILM Flap Surgery
by Ovidiu Samoilă, Anca Mădălina Sere, Lăcrămioara Samoilă and Daniel-Corneliu Leucuța
Diagnostics 2026, 16(12), 1919; https://doi.org/10.3390/diagnostics16121919 (registering DOI) - 20 Jun 2026
Abstract
Background/Objectives: A macular hole represents a significant surgical condition in an increasingly aging population. Advances in surgical techniques, particularly pars plana vitrectomy with inverted internal limiting membrane (ILM) flap, have established high anatomical closure rates exceeding 90%. The prognostic factors influencing visual recovery remain incompletely understood, and it is unclear which patients can be expected to achieve optimal functional outcomes. Methods: This retrospective longitudinal study included 35 eyes of 32 patients followed for 3–12 months. Preoperative OCT parameters (minimum linear diameter, basal diameter, and hole height) and derived indices were correlated with functional outcomes, including best-corrected visual acuity (BCVA) and microperimetry, stratified as central macular sensitivity (CMS) and sensitivity at 4° and 20°. Postoperative ellipsoid zone (EZ) and external limiting membrane (ELM) integrity were also analyzed. Predictive performance was assessed using the root mean square error (RMSE) and the coefficient of determination (R²). A linear regression model based on BCVA served as the baseline, while Extreme Gradient Boosting (XGBoost) models incorporating OCT features were developed. Feature importance was evaluated using Shapley Additive Explanations (SHAP). Results: The overall closure rate was 100%, including 91.4% Type 1 and 8.6% Type 2 closure. Models incorporating OCT parameters outperformed BCVA-based models (lower RMSE and higher R²). Minimum linear diameter and hole height were the strongest predictors of postoperative outcomes. Microperimetry detected functional improvement beyond BCVA and correlated with EZ and ELM restoration. Conclusions: Preoperative macular hole morphology represents a key determinant of postoperative functional recovery. 
These structural parameters provide meaningful prognostic value beyond visual acuity alone, supporting the role of combined OCT and microperimetric assessment in predicting surgical outcomes.
(This article belongs to the Special Issue Clinical Prognostic and Predictive Biomarkers, 4th Edition)
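
RMSE and R² are the two metrics used to compare the BCVA-based linear baseline with the OCT-based models: lower RMSE and higher R² indicate better predictions. A minimal sketch of both definitions; the sensitivity values and predictions are hypothetical, not study data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the outcome."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical postoperative central macular sensitivity (dB) and two models' predictions.
observed = [22.0, 25.0, 18.0, 27.0]
baseline = [24.0, 23.0, 21.0, 24.0]
oct_model = [22.5, 24.0, 18.5, 26.0]
print(rmse(observed, baseline), r_squared(observed, baseline))
print(rmse(observed, oct_model), r_squared(observed, oct_model))
```

With only 35 eyes, these metrics are best computed on held-out folds rather than on the training data, since in-sample R² rewards overfitting.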

21 pages, 1295 KB  
Article
Machine Learning-Assisted Synthesis of Self-Organizing SISO Control Systems with Guaranteed Lyapunov Stability
by Nurgul Shazhdekeyeva, Beket Kenzhegulov, Kamka Uteuliyeva, Gulash Kochshanova, Gulmira Nigmetova, Lyailya Kurmangaziyeva, Raigul Tuleuova, Saya Kenzhegulova and Raushan Moldasheva
Computation 2026, 14(6), 142; https://doi.org/10.3390/computation14060142 - 19 Jun 2026
Viewed by 51
Abstract
The proposed methodology combines analytical control laws with adaptive mechanisms and machine-learning-assisted modules based on regression trees, random forests, and extreme gradient boosting (XGBoost). Machine learning models are employed to approximate unknown nonlinear dynamics, compensate for disturbances, and adjust controller parameters, while the overall control structure is constrained by Lyapunov stability conditions. This ensures that the inclusion of data-driven components does not violate the fundamental requirement of system stability. The effectiveness of the proposed approach is evaluated through simulation experiments across three operating modes with varying degrees of nonlinearity and dynamic complexity. The results show that hybrid models incorporating ensemble machine learning methods improved performance compared with the analytical and adaptive baselines examined. XGBoost-based control achieves the lowest error values and the highest level of Lyapunov stability compliance (up to 99.3%). The main contribution of this study lies in the development of a unified synthesis framework in which machine learning is used not as a standalone control strategy but as a support mechanism integrated into a theoretically grounded control architecture. The proposed approach provides a balance between adaptability, accuracy, and rigorous stability guarantees, suggesting potential applicability to simulation-based and offline-assisted control design tasks, while real-time embedded implementation requires additional computational optimization and validation.
(This article belongs to the Section Computational Engineering)
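
The abstract does not define "Lyapunov stability compliance". One plausible reading, stated here purely as an assumption, is the fraction of simulated time steps on which a Lyapunov candidate V(x) = x²/2 does not increase. The sketch below checks that condition for a hypothetical discrete-time first-order plant under proportional state feedback, with and without an added disturbance; the plant, gain, and disturbance are invented for illustration and are not the paper's models.

```python
import math

def lyapunov_compliance(a, b, k_gain, x0, disturbance, steps=200):
    """Fraction of steps on which V(x) = x^2 / 2 does not increase (assumed metric).

    Hypothetical plant: x[k+1] = a * x[k] + b * u[k] + d[k], with u[k] = -k_gain * x[k].
    """
    x = x0
    ok = 0
    for k in range(steps):
        v_now = 0.5 * x * x
        u = -k_gain * x
        x = a * x + b * u + disturbance(k)
        if 0.5 * x * x <= v_now:
            ok += 1
    return ok / steps

# Open loop is unstable (a = 1.05); the closed-loop pole is a - b*k_gain = 0.55.
no_dist = lyapunov_compliance(1.05, 0.5, 1.0, 1.0, lambda k: 0.0)
with_dist = lyapunov_compliance(1.05, 0.5, 1.0, 1.0, lambda k: 0.01 * math.sin(k))
print(no_dist, with_dist)  # no_dist is 1.0; the persistent disturbance pushes with_dist below 1.0
```

Under this reading, a compliance below 100% does not by itself imply instability: a bounded disturbance keeps the state in a small neighborhood of the origin where V can rise on individual steps.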
19 pages, 7719 KB  
Article
Predicting the Thermal Conductivity of Structural Materials Under Lead–Bismuth Corrosion Based on Machine Learning
by Xinxin Gao and Xian Zeng
Materials 2026, 19(12), 2639; https://doi.org/10.3390/ma19122639 - 18 Jun 2026
Viewed by 150
Abstract
316L stainless steel and T91 heat-resistant steel are key structural materials for lead-cooled fast reactors (LFRs). Lead–bismuth eutectic (LBE) corrosion induces oxide layer formation and markedly degrades thermal conductivity, endangering reactor safety and efficiency. Systematic experimental studies of, and predictive tools for, the thermal conductivity of stainless steels after LBE corrosion are currently scarce. To address this gap, this study experimentally obtained thermal conductivity data from stainless steels after lead–bismuth corrosion and developed machine learning models to predict thermal conductivity under multi-parameter coupled LBE corrosion conditions. Three machine learning models were established using material composition and corrosion parameters as inputs. Overall, the hyperparameter-optimized Gradient Boosting Regression model showed competitive predictive performance with low overall prediction error. The model therefore provides a preliminary data-driven tool for estimating the thermal conductivity of corroded 316L stainless steel and T91 heat-resistant steel, thereby providing technical support for material selection, thermal design, and safety assessment of LFRs, with further specimen-level validation required for broader engineering application.
(This article belongs to the Section Corrosion)
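
Gradient boosting regression with squared-error loss starts from the mean, then repeatedly fits a weak learner to the current residuals (the negative gradient of the loss) and adds a shrunken copy of it to the ensemble. A minimal pure-Python sketch with one input feature and depth-1 regression stumps; it is a generic illustration, not the study's hyperparameter-optimized model, and the exposure-time data are hypothetical.

```python
def fit_stump(x, r):
    """Depth-1 regression tree: the split on x that minimizes the residuals' squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]  # (threshold, left mean, right mean)

def fit_gbr(x, y, n_rounds=100, lr=0.1):
    f0 = sum(y) / len(y)  # the mean minimizes squared error for a constant model
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss (y - f)^2 / 2 the negative gradient is simply the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        thr, lm, rm = fit_stump(x, resid)
        stumps.append((thr, lm, rm))
        pred = [pi + lr * (lm if xi <= thr else rm) for xi, pi in zip(x, pred)]
    return f0, lr, stumps

def predict_gbr(model, xi):
    f0, lr, stumps = model
    return f0 + lr * sum(lm if xi <= thr else rm for thr, lm, rm in stumps)

# Hypothetical data: LBE exposure time (h) vs. thermal conductivity (W/(m*K)).
hours = [0, 100, 200, 500, 1000, 2000]
k_obs = [16.2, 15.9, 15.5, 15.0, 14.6, 14.4]
model = fit_gbr(hours, k_obs)
print([round(predict_gbr(model, h), 2) for h in hours])
```

The learning rate and number of rounds trade off against each other; these, together with tree depth, are the kind of hyperparameters that are usually tuned, and generalization should be judged on held-out specimens rather than the training fit shown here.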

24 pages, 9969 KB  
Article
Multisource Satellite Data-Driven Machine Learning Approach for Rice Yield Prediction
by Sudheer Kumar Tiwari, Vinay Kumar Srivastava and Sonam Agrawal
ISPRS Int. J. Geo-Inf. 2026, 15(6), 275; https://doi.org/10.3390/ijgi15060275 - 18 Jun 2026
Viewed by 191
Abstract
Estimation of rice crop yield at the village level is essential because the village is the Insurance Unit (IU) for rice in many regions of India, and timely, accurate yield information at this scale supports transparent claim settlements for farmers as well as local agricultural planning. To achieve this, a multi-source satellite data-based machine learning approach was used to estimate village-level rice yield in Kakinada, Andhra Pradesh, India, using optical and SAR data, climatic data, and land surface model-derived parameters. The predictors, namely seasonal cumulative rainfall, seasonal Normalized Difference Vegetation Index (NDVI)-Max, seasonal NDVI-Mean, seasonal Land Surface Water Index (LSWI)-Max, seasonal LSWI-Mean, season-total Fraction of Absorbed Photosynthetically Active Radiation (fAPAR), season-total Root Zone Soil Moisture (RZSM), and season-total Sentinel-1 VH-polarization backscatter, were used to represent crop greenness, moisture status, photosynthetic activity, soil water availability, canopy structure, and seasonal water supply. For model development and validation, village-level rice yield data from 2017 to 2023, collected through Crop Cutting Experiments (CCE) at the maturity stage of the Kharif season, were used. Four machine learning models, namely Random Forest (RF), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), and Gradient Boosting (GB), were evaluated. The multi-source satellite data and yield data for 2017–2021 were used to train the models, which were independently tested on 2022 data and then applied to predict rice yield in 2023. Leave-One-Year-Out (LOYO) cross-validation was also conducted on the 2017–2022 data to assess temporal robustness and generalization capability across years. Among the evaluated models, Random Forest exhibited the best overall performance. 
For the independent test year 2022, RF achieved an R² of 0.465, RMSE of 415.34 kg ha⁻¹, MAE of 322.22 kg ha⁻¹, and MAPE of 10.36%. For the prediction year 2023, RF achieved improved accuracy, with an R² of 0.838, RMSE of 325.75 kg ha⁻¹, MAE of 262.21 kg ha⁻¹, and MAPE of 7.68%. LOYO cross-validation also showed the robustness of RF, which achieved the highest mean R² (0.702) and a mean RMSE of 384.73 kg ha⁻¹. The results illustrate that multi-source satellite data combined with machine learning can be a reliable and operationally useful tool for predicting village-level rice yield, which can support crop insurance claim settlement.
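
Leave-One-Year-Out cross-validation holds out all samples from one year at a time, trains on the remaining years, and averages the held-out scores, so a model is never evaluated on a season it has already seen. A minimal sketch of the fold logic together with the MAE and MAPE definitions; the village records are hypothetical, and a "predict the training-years mean" rule stands in for the Random Forest.

```python
def loyo_folds(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct year."""
    for held in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held]
        test = [i for i, y in enumerate(years) if y == held]
        yield held, train, test

def mae(t, p):
    return sum(abs(a - b) for a, b in zip(t, p)) / len(t)

def mape(t, p):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(t, p)) / len(t)

# Hypothetical village records: (year, yield in kg/ha).
records = [(2017, 5200), (2017, 4800), (2018, 5600), (2018, 5000), (2019, 4400), (2019, 4900)]
years = [r[0] for r in records]
yields = [r[1] for r in records]

fold_mae = {}
for held, train, test in loyo_folds(years):
    mean_train = sum(yields[i] for i in train) / len(train)  # stand-in "model"
    truth = [yields[i] for i in test]
    fold_mae[held] = mae(truth, [mean_train] * len(truth))
print(fold_mae, sum(fold_mae.values()) / len(fold_mae))
```

Grouping folds by year, rather than shuffling villages randomly, prevents the same season's weather signal from appearing in both training and test data.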

40 pages, 5744 KB  
Article
Consolidating Access to Candidate Data for Recruitment Headhunting: Leveraging Explainable Machine Learning
by Mncedisi Mncwabe and Thulane Paepae
Informatics 2026, 13(6), 94; https://doi.org/10.3390/informatics13060094 - 18 Jun 2026
Viewed by 252
Abstract
The recruitment headhunting process is time-intensive due to manual candidate searches across multiple job platforms, creating inefficiencies in identifying suitable candidates. Current AI-driven recruitment platforms frequently prioritize accuracy over explainability, limiting transparency for non-technical users such as recruiters. This study streamlines recruitment headhunting by (1) consolidating publicly available candidate data from multiple job portals using a professional data aggregation Application Programming Interface (API), and (2) implementing explainable machine learning for transparent candidate–job matching. We utilized the Coresignal API (v1) to aggregate and standardize candidate profiles (N = 587) sourced from LinkedIn and Indeed, including skills, experience, certifications, and education. Using Term Frequency–Inverse Document Frequency (TF-IDF) feature vectors and regression models (Ridge, Gradient Boosting, Random Forest), we matched and ranked candidates against a standardized Data Scientist job description. Shapash was incorporated to provide interpretable feature importance explanations accessible to non-technical users. Model performance was evaluated using stratified 5-fold cross-validation with statistical significance testing. Ridge Regression achieved superior performance (cross-validated R2 = 0.935, bootstrap R2 = 0.954, 95% confidence interval [0.939, 0.965], RMSE = 0.025) compared with Gradient Boosting (R2 = 0.840) and Random Forest (R2 = 0.733). Paired t-tests confirmed significant differences between all model pairs (all ps ≤ 0.001, Bonferroni corrected) with large effect sizes (Cohen’s d ≥ 1.992). Shapash analysis revealed that top-contributing features, such as “engineering”, “data science”, “machine learning”, and “python”, aligned precisely with job description requirements, validating the model’s feature-learning capability. 
This approach reduces repetitive manual searches across job portals while providing interpretable insights into candidate–job rankings. The methodology’s originality lies in combining professional data aggregation APIs that access publicly available profile data with interpretable models enhanced by user-friendly visualization tools, creating a practical, potentially transferable solution for transparent AI-driven recruitment.
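The abstract names TF-IDF feature vectors as the representation fed to the regression models. As a minimal sketch of that representation step, the code below computes TF-IDF vectors in plain Python and, as a deliberately simplified stand-in for the trained Ridge, Gradient Boosting, and Random Forest regressors (which are not reproduced here), ranks candidate profiles by cosine similarity to a job description. The function names and example profiles are hypothetical; the smoothed idf formula is an assumption chosen to match scikit-learn's default weighting.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a production pipeline would also strip
    # punctuation and might use n-grams (e.g. "data science" as one feature).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Weight = raw term count * smoothed idf ln((1 + n) / (1 + df)) + 1,
    followed by L2 normalization of each document vector.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: count * idf[t] for t, count in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    # Both vectors are unit length, so cosine similarity is the sparse dot product.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def rank_candidates(job_description, profiles):
    """Return (profile_index, score) pairs, best match first.

    Hypothetical stand-in: scores by cosine similarity instead of a trained regressor.
    """
    vectors = tfidf_vectors([job_description] + profiles)
    job_vec, candidate_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine(job_vec, c)) for i, c in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


job = "data scientist python machine learning"
profiles = [
    "accountant excel reporting",
    "python machine learning engineer",
    "data scientist python machine learning statistics",
]
for index, score in rank_candidates(job, profiles):
    print(index, round(score, 3))
```

Here profile 2 ranks first and profile 0, which shares no terms with the job description, scores 0. In the study's setup, such vectors would serve as inputs to the regressors rather than being compared directly; scikit-learn's `TfidfVectorizer` applies the same smoothed, L2-normalized weighting by default.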

17 pages, 1761 KB  
Article
Development and Validation of a Machine Learning-Based Risk Assessment Tool for In-Hospital Mortality in Elderly Patients with Postoperative Hypoxemia Following Non-Cardiac Surgery
by Yuchen Zhou, Xinhe Zhou, Xiaozhu Liu, Chenghui Zhou and Yang Liu
J. Clin. Med. 2026, 15(12), 4725; https://doi.org/10.3390/jcm15124725 - 18 Jun 2026
Abstract
Background/Objectives: Postoperative hypoxemia is a frequent complication after non-cardiac surgery and is associated with elevated mortality in elderly patients. However, no dedicated tool for predicting mortality in this specific patient subgroup is currently available. This study aimed to construct and validate a machine learning (ML) model for predicting in-hospital mortality among elderly adults who develop hypoxemia after non-cardiac surgery. Methods: Data for this retrospective cohort study were obtained from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. The study included patients aged 65 years or older who developed hypoxemia, defined as a PaO₂/FiO₂ ratio below 300 mmHg, within the first 48 h of their intensive care unit (ICU) stay. LASSO (Least Absolute Shrinkage and Selection Operator) regression was applied for feature selection, after which six machine learning models were constructed and evaluated alongside five conventional scoring systems. SHapley Additive exPlanations (SHAP) were used to improve model interpretability. Results: Of 6051 eligible patients, 1838 (30.4%) died during hospitalization. The Extreme Gradient Boosting (XGBoost) model showed the best predictive performance, with an area under the curve (AUC) of 0.794, a specificity of 0.917, an accuracy of 0.769, and a positive predictive value of 0.693. Key predictors included vasopressor administration, advanced age, and the PaO₂/FiO₂ ratio. Conclusions: The XGBoost-based ML model provides accurate prediction of in-hospital mortality in elderly patients with postoperative hypoxemia after non-cardiac surgery, offering a valuable tool for early risk evaluation and potential intervention.
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
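The cohort definition in this abstract reduces to a simple screening rule: age of at least 65 years and a PaO₂/FiO₂ ratio below 300 mmHg within the first 48 h of the ICU stay. A minimal sketch, assuming blood-gas records carry a time offset from ICU admission and that FiO₂ may be stored either as a fraction or as a percentage; the `BloodGas` record layout and function names are hypothetical and are not MIMIC-IV column names, and the study's actual extraction queries are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BloodGas:
    """One arterial blood-gas measurement (hypothetical record layout)."""
    hours_since_icu_admission: float
    pao2_mmhg: float
    fio2: float  # fraction (0.21-1.0) or percentage (21-100)


def pf_ratio(pao2_mmhg, fio2):
    """PaO2/FiO2 ratio in mmHg; FiO2 values above 1 are treated as percentages."""
    if fio2 > 1:
        fio2 = fio2 / 100.0  # e.g. 40 (%) -> 0.40
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError(f"implausible FiO2: {fio2}")
    return pao2_mmhg / fio2


def meets_inclusion(age_years, gases, threshold_mmhg=300.0, window_hours=48.0):
    """Age >= 65 and at least one PaO2/FiO2 below the threshold within the ICU window."""
    if age_years < 65:
        return False
    return any(
        0.0 <= g.hours_since_icu_admission <= window_hours
        and pf_ratio(g.pao2_mmhg, g.fio2) < threshold_mmhg
        for g in gases
    )


# A 72-year-old with PaO2 80 mmHg on 40% oxygen at hour 10: ratio 200 mmHg, included.
print(meets_inclusion(72, [BloodGas(10, 80, 0.40)]))  # True
```

Normalizing FiO₂ inside `pf_ratio` is a design choice for this sketch: mixing percentages and fractions would otherwise shift the ratio by a factor of 100 and silently exclude hypoxemic patients.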

28 pages, 12454 KB  
Article
Forecasting and Enhancing Weight on Bit Through Machine Learning Methods in the Sudanese Oil and Gas Sector
by Asaad Mustafa, Guojun Wen, AL-Wesabi Ibrahim, Wahib Yahya and Abobaker Albabo
Appl. Sci. 2026, 16(12), 6149; https://doi.org/10.3390/app16126149 - 17 Jun 2026
Abstract
Drilling optimization seeks to improve the efficiency of drilling operations by fine-tuning adjustable factors such as weight on bit (WOB), with the goal of increasing the rate of penetration and reducing overall well costs. Efficient and precise management of WOB is crucial for adjusting drilling parameters promptly. Drilling optimization focuses on adjusting controllable variables, such as weight on bit and bit rotation speed, to achieve the highest possible drilling rate. A comparative analysis of ML models is therefore needed to help practitioners select an appropriate predictive model. This research employs four machine learning methods to forecast weight on bit: Random Forest (RF), K-Nearest Neighbors (KNN), Gradient Boosting Regression (GBR), and Decision Tree (DT). The models are evaluated using drilling data from wells in Western Sudan, the first time such data have been used for this purpose. The key accomplishment of this study is the automated prediction of weight on bit using machine learning techniques tailored to these datasets. Among the algorithms tested, Random Forest was the most dependable, with a prediction accuracy of 98% and the lowest RMSE (1.015). In contrast, KNN, GBR, and DT achieved accuracies of 91.40%, 80.66%, and 100.00%, respectively, with RMSE values of 2.008, 3.011, and 6.27 on the testing dataset. This research represents a pioneering effort in applying machine learning techniques to the prediction of weight on bit. In addition, this study presents a publicly available dataset containing details of wells drilled in the Sudanese oil and gas sector, intended for future experiments, algorithm validation, and analytical purposes.
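Of the four methods compared, K-Nearest Neighbors regression is the simplest to write out by hand, and RMSE is the metric the abstract uses to compare all four. A minimal sketch in plain Python; the function names, the two toy features, and the target values are hypothetical, and the study's actual feature set, choice of k, and preprocessing are not given in the abstract.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a continuous target (e.g. WOB) as the mean target of the k nearest training rows."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Pair each training target with its Euclidean distance to the query, nearest first.
    by_distance = sorted(zip((math.dist(x, query) for x in train_X), train_y))
    return sum(y for _, y in by_distance[:k]) / k


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy, already-scaled features (hypothetical, e.g. normalized depth and rotary speed).
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [10.0, 12.0, 14.0, 30.0]
X_test, y_test = [[0.1, 0.1]], [11.0]
preds = [knn_predict(X_train, y_train, x, k=3) for x in X_test]
print(rmse(y_test, preds))  # 1.0
```

Because KNN relies on raw distances, features measured on very different scales (depth in metres versus rotation speed in rpm) should be standardized with training-set statistics before fitting; otherwise the largest-scale feature dominates the neighbor search.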
19 pages, 3637 KB  
Article
Machine Learning-Based Classification of BI-RADS 4 and BI-RADS 5 Microcalcifications in Mammography Combined with DCE-MRI for Malignant–Benign Discrimination
by Sevgi Ünal and Enes Açıkgözoğlu
Tomography 2026, 12(6), 88; https://doi.org/10.3390/tomography12060088 - 17 Jun 2026
Abstract
Background/Objectives: Breast cancer remains one of the leading causes of cancer-related mortality among women worldwide. Early and accurate characterization of suspicious mammographic microcalcifications is essential for improving diagnostic decision-making and reducing unnecessary invasive procedures. Microcalcifications classified as BI-RADS 4 and BI-RADS 5 are clinically important radiological findings; however, differentiating benign from malignant lesions remains challenging because of overlapping morphological and distribution patterns. This study aimed to develop a structured feature-based machine learning model for predicting the pathological diagnosis of breast microcalcifications by integrating mammographic descriptors, patient age, and dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) contrast enhancement findings. Methods: The dataset included 53 biopsy-confirmed cases and consisted of clinical and radiological variables, including patient age, calcification morphology, calcification size, distribution pattern, DCE-MRI contrast enhancement status, and histopathological outcome. Several conventional machine learning algorithms were evaluated, including Logistic Regression, Support Vector Machine with radial basis function kernel, K-Nearest Neighbors, Decision Tree, Random Forest, Extra Trees, Gradient Boosting, AdaBoost, and CatBoost. Hyperparameter optimization was performed using grid search with five-fold cross-validation. Model performance was assessed using accuracy, precision, recall, F1-score, ROC-AUC, and log loss. Results: Logistic Regression achieved the highest overall performance, with an accuracy of 0.909 and an F1-score of 0.889, while AdaBoost achieved a recall of 1.000 in the internal evaluation. However, given the limited sample size and lack of external validation, these findings should be interpreted as preliminary. 
Conclusions: The results suggest that structured radiological descriptors combined with DCE-MRI enhancement information may support malignancy risk stratification of BI-RADS 4–5 microcalcifications, although larger multicenter studies are required before clinical implementation.
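The evaluation metrics listed in the Methods (accuracy, precision, recall, F1-score, and log loss) can all be computed from predicted malignancy probabilities. A minimal sketch in plain Python, assuming label 1 denotes a malignant lesion and a 0.5 decision threshold (both assumptions; the study's threshold is not stated); the function name and example values are hypothetical.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Accuracy, precision, recall, F1 and log loss for binary labels (1 = malignant)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Clip probabilities so that log(0) can never occur.
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6]))
```

In this example, accuracy, precision, recall, and F1 all equal 0.5. Log loss, unlike the threshold-based metrics, also penalizes confident wrong probabilities, which is why it complements them when comparing calibrated models such as Logistic Regression with tree ensembles.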

31 pages, 3577 KB  
Article
Machine Learning-Based Weather Classification over Morocco Using Multi-Station METAR Observations
by Samir Saadane, Lahcen Hassine, Hatim Kharraz Aroussi and Rachid Saadane
Earth 2026, 7(3), 104; https://doi.org/10.3390/earth7030104 - 17 Jun 2026
Abstract
Accurate weather-regime classification is increasingly important for climate-sensitive decision-making in agriculture, aviation, disaster preparedness, and territorial planning, particularly in regions where strong climatic heterogeneity complicates conventional operational workflows. This study proposes a machine learning-based framework for broad-regime weather classification over Morocco using hourly METAR observations collected from 22 meteorological stations between July 2022 and February 2024. The proposed workflow integrates data cleaning, missing-value imputation, feature transformation, categorical encoding, class-imbalance handling, and model optimization under a leakage-safe experimental protocol. To preserve temporal integrity, observations were chronologically split into training, validation, and independent test subsets; SMOTE and random undersampling were applied exclusively to the training subset, whereas the validation and test subsets retained their original class distributions. Seven classifiers were evaluated, including XGBoost, LightGBM, CatBoost, Random Forest, Gradient Boosting, Support Vector Machine, and Logistic Regression, with hyperparameters optimized using Optuna. The results show that optimized boosting models are particularly effective for Moroccan station-based weather classification. XGBoost achieved the highest test-set accuracy of 95.1%, followed by LightGBM at 94.7% and CatBoost at 93.8%, with optimization improving accuracy by approximately 8–12 percentage points compared with baseline configurations. Because the dataset exhibits class imbalance, macro-averaged precision, recall, and F1-score were emphasized alongside accuracy to provide a more reliable assessment across weather classes. 
Confusion-matrix analysis indicates improved recognition of underrepresented regimes, especially Dust/Sand events, while residual confusion between Fog/Haze and Rain/Storm reflects both physical overlap and the limits of a four-class METAR taxonomy. Overall, the findings demonstrate that optimized ensemble learning can provide a robust, computationally efficient, and operationally relevant classification layer for regional meteorological decision support in Morocco, while future work should extend the framework to longer time series, finer weather taxonomies, and external regional validation.
(This article belongs to the Section AI and Big Data in Earth Science)
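The leakage-safe protocol described in this abstract hinges on two details: the split follows time order, and class rebalancing touches only the training subset. A minimal sketch, assuming records are dicts with hypothetical `time` and `label` keys and a 70/15/15 split (the study's proportions are not stated in the abstract). Only random undersampling is shown; SMOTE, which the study also applied to the training subset, is omitted for brevity, and the "Clear" label is a hypothetical placeholder class.

```python
import random
from collections import Counter, defaultdict


def chronological_split(records, train_frac=0.70, val_frac=0.15):
    """Split records in time order (no shuffling), so validation and test data are strictly later."""
    ordered = sorted(records, key=lambda r: r["time"])
    n_train = round(len(ordered) * train_frac)
    n_val = round(len(ordered) * val_frac)
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


def random_undersample(train, label_key="label", seed=0):
    """Randomly downsample every class to the size of the smallest class.

    Call this on the training subset only; validation and test keep their original distribution.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in train:
        by_class[record[label_key]].append(record)
    target = min(len(rows) for rows in by_class.values())
    balanced = [r for rows in by_class.values() for r in rng.sample(rows, target)]
    rng.shuffle(balanced)
    return balanced


# Hypothetical hourly records: every fifth hour is labeled Dust/Sand.
records = [{"time": h, "label": "Dust/Sand" if h % 5 == 0 else "Clear"} for h in range(100)]
train, val, test = chronological_split(records)
print(Counter(r["label"] for r in random_undersample(train)))
print(Counter(r["label"] for r in test))
```

Resampling after the split, and only on the training portion, keeps the validation and test metrics representative of the real class imbalance; resampling before splitting would leak synthetic or duplicated information into the evaluation sets.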
