Search Results (1,244)

Search Parameters:
Keywords = gradient-boosting decision trees

24 pages, 3822 KB  
Article
Optimising Calculation Logic in Emergency Management: A Framework for Strategic Decision-Making
by Yuqi Hang and Kexi Wang
Systems 2026, 14(2), 139; https://doi.org/10.3390/systems14020139 - 29 Jan 2026
Viewed by 33
Abstract
Emergency management decision-making must be both timely and reliable: even slight delays can result in substantial human and economic losses. However, current systems and recent state-of-the-art work often use inflexible rule-based logic that cannot adapt to rapidly changing emergency conditions or dynamically optimise response allocation. Our study therefore presents the Calculation Logic Optimisation Framework (CLOF), a novel data-driven approach that enhances decision-making intelligently and strategically through learning-based prediction and multi-objective optimisation, utilising the 911 Emergency Calls data set, comprising more than half a million records from Montgomery County, Pennsylvania, USA. The CLOF examines patterns over space and time and uses optimised calculation logic to reduce response latency and increase decision reliability. The proposed framework outperforms the standard Decision Tree, Random Forest, Gradient Boosting, and XGBoost baselines, achieving 94.68% accuracy, a log-loss of 0.081, and a reliability score (R2) of 0.955, while reducing the mean response time error by 19%, illustrating robustness to real-world uncertainty. These results support the scalability, interpretability, and efficiency of modern emergency management (EM) frameworks, improving safety, risk awareness, and operational quality in large-scale emergency networks. Full article
(This article belongs to the Section Artificial Intelligence and Digital Systems Engineering)
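The baseline comparison the abstract reports (accuracy plus log-loss for tree models) can be sketched with scikit-learn. This is illustrative only: the 911-call features and the CLOF pipeline itself are not public here, so synthetic data and default models stand in for both.

```python
# Illustrative sketch: score a single tree vs. gradient boosting with the
# same metrics the abstract reports (accuracy, log-loss). Synthetic data
# stands in for the 911 Emergency Calls features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for name, model in [("decision_tree", DecisionTreeClassifier(random_state=0)),
                    ("gradient_boosting", GradientBoostingClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)
    scores[name] = (accuracy_score(y_te, proba.argmax(axis=1)),
                    log_loss(y_te, proba))
```

Reporting both metrics matters: accuracy only scores the hard label, while log-loss penalises overconfident wrong probabilities, which is why the abstract quotes them together.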

9 pages, 756 KB  
Proceeding Paper
Effect of Data Preparation on Machine Learning Models for Diabetes Prediction
by Goran Martinović, Ivan Ivković, Domen Verber and Tatjana Bačun
Eng. Proc. 2026, 125(1), 13; https://doi.org/10.3390/engproc2026125013 - 28 Jan 2026
Viewed by 114
Abstract
This paper examines how data preparation affects machine-learning classifiers for diabetes-risk prediction using the Pima Indians Diabetes Database. Three preprocessing methods are considered: imputing invalid zeros, handling outliers, and data scaling. Nine algorithms are evaluated on this dataset: linear/probabilistic baselines (Logistic Regression, Gaussian Naive Bayes), distance-based methods (KNN, Support Vector Machines), a single tree-based model (Decision Tree), and tree ensembles (Random Forest, Gradient Boosting, XGBClassifier, LightGBM). Median imputation of invalid zeros yields the largest and most consistent gains in accuracy and AUC. Outlier handling uses interquartile-range filtering, with Local Outlier Factor as an auxiliary indicator; effects are modest for accuracy and small, model-dependent for AUC. Scaling offers targeted benefits: for KNN, robust scaling can slightly alter performance and may reduce AUC relative to median-only imputation in this setup; SVM shows modest gains, while tree ensembles are comparatively insensitive overall. Ensembles achieve the highest performance and remain robust under minimal preparation, while simpler models benefit most from pipelines combining median imputation, careful outlier handling, and appropriate scaling. Hyperparameter tuning yields small to substantial gains—large for Decision Trees—while leaving ensemble rankings largely unchanged. Overall, results highlight the centrality of median imputation and the selective value of scaling for distance-based classifiers in diabetes-risk prediction. Full article
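The preprocessing step the abstract identifies as most impactful, replacing physiologically impossible zeros with column medians, is simple to sketch in numpy. The columns follow the Pima-style convention (glucose, blood pressure, BMI), but the values below are invented.

```python
import numpy as np

# Toy rows in Pima-style columns (glucose, blood pressure, BMI): zeros in
# these columns are invalid measurements, not true values.
X = np.array([[148.0, 72.0, 33.6],
              [85.0, 66.0, 26.6],
              [183.0, 64.0, 0.0],
              [89.0, 0.0, 28.1],
              [0.0, 40.0, 43.1]])

X_nan = np.where(X == 0, np.nan, X)        # mark invalid zeros as missing
col_median = np.nanmedian(X_nan, axis=0)   # per-column median of valid values
X_imputed = np.where(np.isnan(X_nan), col_median, X_nan)
```

The median is computed over valid entries only, so the imputed values are not dragged toward zero by the very artefacts being repaired.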

21 pages, 1645 KB  
Article
Machine Learning-Based Prediction of Optimum Design Parameters for Axially Symmetric Cylindrical Reinforced Concrete Walls
by Aylin Ece Kayabekir
Processes 2026, 14(3), 455; https://doi.org/10.3390/pr14030455 - 28 Jan 2026
Viewed by 127
Abstract
This study presents a hybrid approach integrating metaheuristic optimization and machine learning methods to quickly and reliably estimate the optimum design parameters of dome-shaped axially symmetric cylindrical reinforced concrete (RC) walls. A comprehensive dataset was created using the Jaya algorithm to minimize total material cost for hinged and fixed support conditions. For each optimized design case, total wall height (H), dome height (Hd), dome thickness (hd), and fluid unit weight (γ) were considered as input parameters; optimum wall thickness (hw) and total cost were determined as output parameters. Using the obtained dataset, a total of thirteen different regression-based machine learning algorithms, including linear regression-based models, tree-based ensemble methods, and neural network models, were trained and tested. Hyperparameter adjustments for all models were performed using the Optuna framework, and model performances were evaluated using a ten-fold cross-validation method and holdout dataset results. The results showed that machine learning models can learn the optimum design space obtained from metaheuristic optimization outputs with high accuracy. In optimum wall thickness estimation, Gradient Boosting-based models provided the highest accuracy under both hinged and fixed support conditions. In total cost estimation, the Gradient Boosting model stood out under hinged support conditions, while the XGBoost model yielded the most successful results for fixed support conditions. The findings clearly show that no single machine learning model exhibits the best performance for all output parameters and support conditions. The proposed approach offers significantly higher computational efficiency compared to traditional iterative optimization processes and allows for rapid estimation of optimum design parameters without the need for any iterations. 
In this respect, this study provides an effective decision support tool that can be used especially in the preliminary design phases and contributes to sustainable, cost-effective reinforced concrete structure design. Full article
(This article belongs to the Special Issue Machine Learning Models for Sustainable Composite Materials)
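The tuning-plus-validation loop described above (an Optuna search scored by cross-validation) can be sketched without Optuna itself. The snippet below is a dependency-light stand-in using plain random search over two gradient-boosting hyperparameters, on synthetic data rather than the paper's wall-design dataset; Optuna's samplers would propose the trial parameters more intelligently.

```python
# Stand-in for an Optuna study: random search over two hyperparameters,
# each trial scored by cross-validated R2 (the paper used Optuna's
# samplers and ten-fold CV; synthetic data replaces the RC-wall dataset).
import random

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)

rng = random.Random(0)
best_score, best_params = float("-inf"), None
for _ in range(10):  # each pass is one "trial"
    params = {"n_estimators": rng.choice([50, 100, 200]),
              "max_depth": rng.choice([2, 3, 4])}
    score = cross_val_score(GradientBoostingRegressor(random_state=0, **params),
                            X, y, cv=5, scoring="r2").mean()
    if score > best_score:
        best_score, best_params = score, params
```

Scoring every trial by cross-validation, as the paper does, keeps the hyperparameter choice from overfitting a single train/test split.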

23 pages, 3441 KB  
Article
Integrating Large Language Models with Deep Learning for Breast Cancer Treatment Decision Support
by Heeseung Park, Serin Ok, Taewoo Kang and Meeyoung Park
Diagnostics 2026, 16(3), 394; https://doi.org/10.3390/diagnostics16030394 - 26 Jan 2026
Viewed by 220
Abstract
Background/Objectives: Breast cancer is one of the most common malignancies, but its heterogeneous molecular subtypes make treatment decision-making complex and patient-specific. Both the pathology reports and the electronic medical record (EMR) play a critical role in appropriate treatment decisions. This study aimed to develop an integrated clinical decision support system (CDSS) that combines a large language model (LLM)-based pathology analysis with deep learning-based treatment prediction to support standardized and reliable decision-making. Methods: Real-world data (RWD) obtained from a cohort of 5015 patients diagnosed with breast cancer were analyzed. Meta-Llama-3-8B-Instruct automatically extracted the TNM stage and tumor size from the pathology reports, which were then integrated with EMR variables. A multi-label classification of 16 treatment combinations was performed using six models, including Decision Tree, Random Forest, GBM, XGBoost, DNN, and Transformer. Performance was evaluated using accuracy, macro/micro-averaged precision, recall, F1 score, and AUC. Results: Using combined LLM-extracted pathology and EMR features, GBM and XGBoost achieved the highest and most stable predictive performance across all feature subset configurations (macro-F1 ≈ 0.88–0.89; AUC = 0.867–0.868). Both models demonstrated strong discrimination ability and consistent recall and precision, highlighting their robustness for multi-label classification in real-world settings. Decision Tree and Random Forest showed moderate but reliable performance (macro-F1 = 0.84–0.86; AUC = 0.849–0.821), indicating their applicability despite lower predictive capability. By contrast, the DNN and Transformer models produced comparatively lower scores (macro-F1 = 0.74–0.82; AUC = 0.780–0.757), especially when using the full feature set, suggesting limited suitability for structured clinical data without strong contextual dependencies. 
These findings indicate that gradient-boosting ensemble approaches are better optimized for tabular medical data and generate more clinically reliable treatment recommendations. Conclusions: The proposed artificial intelligence-based CDSS improves accuracy and consistency in breast cancer treatment decision support by integrating automated pathology interpretation with deep learning, demonstrating its potential utility in real-world cancer care. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
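The macro/micro-averaged scoring used above is worth pinning down: macro-F1 averages the per-label F1 scores equally, while micro-F1 pools all label decisions before computing one F1. A small hand-checkable example (the labels and predictions are invented, not the paper's data):

```python
import numpy as np
from sklearn.metrics import f1_score

# 4 patients x 3 treatment labels, as multi-label indicator matrices
# (invented toy data; the paper uses 16 treatment combinations).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])

macro = f1_score(y_true, y_pred, average="macro")  # mean of per-label F1
micro = f1_score(y_true, y_pred, average="micro")  # F1 over pooled decisions
# Per-label F1 is (1.0, 2/3, 2/3), so macro = 7/9; pooled tp=4, fp=0, fn=2
# gives micro = 0.8.
```

Macro weights rare treatment combinations equally with common ones, which is why it is the stricter headline number for imbalanced multi-label tasks.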

24 pages, 5159 KB  
Article
Forest Age Estimation by Integrating Tree Species Identity and Multi-Source Remote Sensing: Validating Heterogeneous Growth Patterns Through the Plant Economic Spectrum Theory
by Xiyu Zhang, Chao Zhang, Li Zhou, Huan Liu, Lianjin Fu and Wenlong Yang
Remote Sens. 2026, 18(3), 407; https://doi.org/10.3390/rs18030407 - 26 Jan 2026
Viewed by 204
Abstract
Current mainstream remote sensing approaches to forest age estimation frequently neglect interspecific differences in functional traits, which may limit the accurate representation of species-specific tree growth strategies. This study develops and validates a technical framework that incorporates multi-source remote sensing and tree species functional trait heterogeneity to systematically improve the accuracy of plantation age mapping. We constructed a processing chain—“multi-source feature fusion–species identification–heterogeneity modeling”—for a typical karst plantation landscape in southeastern Yunnan. Using the Google Earth Engine (GEE) platform, we integrated Sentinel-1/2 and Landsat time-series data, implemented a Gradient Boosting Decision Tree (GBDT) algorithm for species classification, and built age estimation models that incorporate species identity as a proxy for the growth strategy heterogeneity delineated by the Plant Economic Spectrum (PES) theory. Key results indicate: (1) Species classification reached an overall accuracy of 89.34% under spatial block cross-validation, establishing a reliable basis for subsequent modeling. (2) The operational model incorporating species information achieved an R2 (coefficient of determination) of 0.84 (RMSE (Root Mean Square Error) = 6.52 years) on the test set, demonstrating a substantial improvement over the baseline model that ignored species heterogeneity (R2 = 0.62). This demonstrates that species identity serves as an effective proxy for capturing the growth strategy heterogeneity described by the Plant Economic Spectrum (PES) theory, which is both distinguishable and valuable for modeling within the remote sensing feature space. (3) Error propagation analysis demonstrated strong robustness to classification uncertainties (γ = 0.23). (4) Plantation structure in the region was predominantly young-aged, with forests aged 0–20 years covering over 70% of the area. 
Despite inherent uncertainties in ground-reference age data, the integrated framework exhibited clear relative superiority, improving R2 from 0.62 to 0.84. Both error propagation analysis (γ = 0.23) and Monte Carlo simulations affirmed the robustness of the tandem workflow and the stability of the findings, providing a reliable methodology for improved-accuracy plantation carbon sink quantification. Full article
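The spatial block cross-validation behind the 89.34% accuracy figure keeps all samples from one spatial block in the same fold, so spatial autocorrelation cannot leak between train and test. A minimal sketch with scikit-learn's GroupKFold (the block layout is invented):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# 20 samples in 5 spatial blocks of 4 samples each (invented layout
# standing in for the karst-plantation pixels).
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.tile([0, 1], 10)
blocks = np.repeat(np.arange(5), 4)

leak_free = True
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=blocks):
    # No block may contribute samples to both sides of a split.
    leak_free = leak_free and set(blocks[train_idx]).isdisjoint(blocks[test_idx])
```

Ordinary random K-fold would scatter neighbouring pixels across folds and inflate the accuracy estimate, which is the failure mode block CV is designed to avoid.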

21 pages, 3370 KB  
Article
Mapping Soil Erodibility Using Machine Learning and Remote Sensing Data Fusion in the Northern Adana Region, Türkiye
by Melek Işik, Mehmet Işik, Mert Acar, Taofeek Samuel Wahab, Yakup Kenan Koca and Cenk Şahin
Agronomy 2026, 16(3), 294; https://doi.org/10.3390/agronomy16030294 - 24 Jan 2026
Viewed by 224
Abstract
Soil erosion is a major threat to the sustainable productivity of arable lands, making the accurate prediction of soil erodibility essential for effective soil conservation planning. Soil erodibility is strongly controlled by intrinsic soil properties that regulate aggregate resistance and detachment processes under erosive forces. In this study, machine learning (ML) models, including the Multi-layer Perceptron Regressor (MLP), Random Forest (RF), Decision Tree (DT), and Extreme Gradient Boosting (XGBoost), were applied to predict the soil erodibility factor (K-factor). A comprehensive set of soil properties, including soil texture, clay ratio (CR), organic matter (OM), aggregate stability (AS), mean weight diameter (MWD), dispersion ratio (DR), modified clay ratio (MCR), and critical level of organic matter (CLOM), was analyzed using 110 soil samples collected from the northern part of Adana Province, Türkiye. The observed K-factor was calculated using the RUSLE equation, and ML-based predictions were spatially mapped using Geographic Information Systems (GISs). The mean K-factor values for arable, forest, and horticultural land uses were 0.065, 0.071, and 0.109 t h MJ−1 mm−1, respectively. Among the tested models, XGBoost showed the best predictive performance, with the lowest MAE (0.0051) and RMSE (0.0110) and the highest R2 (0.9458). Furthermore, the XGBoost algorithm identified the CR as the most influential variable, closely followed by clay and MCR content. These results highlight the potential of ML-based approaches to support erosion risk assessment and soil management strategies at the regional scale. Full article
(This article belongs to the Section Precision and Digital Agriculture)
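The regression-plus-importance workflow above (fit a boosted model on soil properties, report MAE/RMSE/R2, then rank predictors) can be sketched as follows. Scikit-learn's gradient boosting stands in for XGBoost to stay dependency-light, and synthetic features stand in for the soil-property table, so the ranking here is illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the soil-property table; the real predictors
# (clay ratio, organic matter, aggregate stability, ...) are not reproduced.
X, y = make_regression(n_samples=400, n_features=6, n_informative=3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

model = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)
pred = model.predict(X_te)
mae, r2 = mean_absolute_error(y_te, pred), r2_score(y_te, pred)

# Impurity-based importances rank the predictors, analogous to the paper
# identifying CR as the most influential variable.
ranking = np.argsort(model.feature_importances_)[::-1]
```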

18 pages, 1843 KB  
Article
Predicting Human and Environmental Risk Factors of Accidents in the Energy Sector Using Machine Learning
by Kawtar Benderouach, Idriss Bennis, Khalifa Mansouri and Ali Siadat
Appl. Sci. 2026, 16(3), 1203; https://doi.org/10.3390/app16031203 - 24 Jan 2026
Viewed by 132
Abstract
The aim of this article is to develop a machine learning (ML)-based predictive model for industrial accidents in the energy sector. The dataset used in this study was obtained from the Kaggle platform and consists of summaries derived from reports of occupational incidents resulting in injuries or deaths between 2015 and 2017. A total of 4739 accident cases were included, containing information on accident date, accident summary, degree and nature of injury, affected body part, event type, human factors, and environmental factors. Six supervised machine learning models—Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Gradient Boosting Decision Trees (GBDT), and Extreme Gradient Boosting (XGBoost)—were developed and compared to identify the most suitable model for the data. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC), which were selected to ensure reliable prediction in safety-critical accident scenarios. The results indicate that XGBoost and GBDT achieve superior performance in predicting human and environmental risk factors. These findings demonstrate the potential of machine learning for improving safety management in the energy sector by identifying risk mechanisms, enhancing safety awareness, and providing quantitative predictions of fatal and non-fatal accident occurrences for integration into safety management systems. Full article
(This article belongs to the Special Issue AI in Industry 4.0)

18 pages, 3659 KB  
Article
Grey Wolf Optimization-Optimized Ensemble Models for Predicting the Uniaxial Compressive Strength of Rocks
by Xigui Zheng, Arzoo Batool, Santosh Kumar and Niaz Muhammad Shahani
Appl. Sci. 2026, 16(2), 1130; https://doi.org/10.3390/app16021130 - 22 Jan 2026
Viewed by 53
Abstract
Reliable models for predicting the uniaxial compressive strength (UCS) of rocks are crucial for mining operations and rock engineering design. Empirical methods, including statistical methods, often face many limitations when generalizing across a wide range of lithological types. To address this limitation, this study investigates the capability of grey wolf optimization (GWO)-optimized ensemble machine learning models, including decision tree (DT), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), for predicting UCS using a small dataset of easily measurable and non-destructive rock index properties. The study's objective is to evaluate whether metaheuristic-based hyperparameter optimization can enhance model robustness and generalization performance under small-sample conditions. A unified experimental framework incorporating GWO-based optimization, three-fold cross-validation, sensitivity analysis, and multiple statistical performance indicators was implemented. The findings confirm that although the GWO-XGBoost model achieves the highest training accuracy, it exhibits signs of mild overfitting. In contrast, the GWO-AdaBoost model outperformed the alternatives, achieving a coefficient of determination (R2) of 0.993, a root mean square error (RMSE) of 2.2830, a mean absolute error (MAE) of 1.6853, and a mean absolute percentage error (MAPE) of 4.6974. GWO-AdaBoost has therefore proven the most effective predictor of UCS, with significant potential for adaptation owing to its effectively learned parameters. From a theoretical perspective, this study highlights the non-equivalence between training accuracy and predictive reliability in UCS modeling. Practically, the findings support the use of GWO-AdaBoost as a reliable decision-support tool for preliminary rock strength assessment in mining and geotechnical engineering, particularly when comprehensive laboratory testing is not feasible. 
Full article
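Grey wolf optimization itself is compact enough to sketch: the three best candidates (alpha, beta, delta) pull every other candidate toward them, with an exploration coefficient `a` decaying from 2 to 0 over the iterations. The toy below minimizes a sphere function rather than a real hyperparameter objective, and is a generic GWO sketch, not the paper's implementation.

```python
import numpy as np

def gwo_minimize(f, dim, bounds, n_wolves=20, n_iter=60, seed=0):
    """Minimal grey wolf optimizer (toy sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(n_iter):
        fitness = np.apply_along_axis(f, 1, wolves)
        # Alpha, beta, delta: the three best wolves lead the update.
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]
        a = 2 - 2 * t / n_iter  # exploration coefficient decays from 2 to 0
        new = np.zeros_like(wolves)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(wolves.shape), rng.random(wolves.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            new += leader - A * np.abs(C * leader - wolves)
        wolves = np.clip(new / 3, lo, hi)  # average the three pulls
    fitness = np.apply_along_axis(f, 1, wolves)
    return wolves[fitness.argmin()], fitness.min()

best_x, best_f = gwo_minimize(lambda x: float(np.sum(x ** 2)), dim=2, bounds=(-5, 5))
```

In a hyperparameter setting, `f` would wrap a cross-validated model score and each dimension would encode one hyperparameter, as in the GWO-XGBoost and GWO-AdaBoost variants above.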

22 pages, 1714 KB  
Article
Integrating Machine-Learning Methods with Importance–Performance Maps to Evaluate Drivers for the Acceptance of New Vaccines: Application to AstraZeneca COVID-19 Vaccine
by Jorge de Andrés-Sánchez, Mar Souto-Romero and Mario Arias-Oliva
AI 2026, 7(1), 34; https://doi.org/10.3390/ai7010034 - 21 Jan 2026
Viewed by 199
Abstract
Background: The acceptance of new vaccines under uncertainty—such as during the COVID-19 pandemic—poses a major public health challenge because efficacy and safety information is still evolving. Methods: We propose an integrative analytical framework that combines a theory-based model of vaccine acceptance—the cognitive–affective–normative (CAN) model—with machine-learning techniques (decision tree regression, random forest, and Extreme Gradient Boosting) and SHapley Additive exPlanations (SHAP) integrated into an importance–performance map (IPM) to prioritize determinants of vaccination intention. Using survey data collected in Spain in September 2020 (N = 600), when the AstraZeneca vaccine had not yet been approved, we examine the roles of perceived efficacy (EF), fear of COVID-19 (FC), fear of the vaccine (FV), and social influence (SI). Results: EF and SI consistently emerged as the most influential determinants across modelling approaches. Ensemble learners (random forest and Extreme Gradient Boosting) achieved stronger out-of-sample predictive performance than the single decision tree, while decision tree regression provided an interpretable, rule-based representation of the main decision pathways. Exploiting the local nature of SHAP values, we also constructed SHAP-based IPMs for the full sample and for the low-acceptance segment, enhancing the policy relevance of the prioritization exercise. Conclusions: By combining theory-driven structural modelling with predictive and explainable machine learning, the proposed framework offers a transparent and replicable tool to support the design of vaccination communication strategies and can be transferred to other settings involving emerging health technologies. Full article
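An importance–performance map (IPM) pairs, for each driver, a global importance score (x-axis) with its average observed level (y-axis). The paper derives importance from SHAP values; the sketch below substitutes scikit-learn's permutation importance to stay dependency-light, and the survey-style data, driver names, and effect sizes are invented to mirror the headline finding (EF and SI dominate).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Invented 1-7 Likert-style drivers in the order EF, FC, FV, SI; intention
# depends mostly on EF and SI, mirroring the paper's result.
X = rng.integers(1, 8, size=(400, 4)).astype(float)
y = 0.6 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(0, 0.3, 400)

model = RandomForestRegressor(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# One (importance, performance) coordinate per driver for the map:
# x = how much shuffling the driver hurts the model, y = its mean level.
ipm = list(zip(imp.importances_mean, X.mean(axis=0)))
```

On such a map, a high-importance but low-performance driver is the natural communication priority, which is the prioritization logic the paper builds on.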

13 pages, 6367 KB  
Article
Gene Expression-Based Colorectal Cancer Prediction Using Machine Learning and SHAP Analysis
by Yulai Yin, Zhen Yang, Xueqing Li, Shuo Gong and Chen Xu
Genes 2026, 17(1), 114; https://doi.org/10.3390/genes17010114 - 20 Jan 2026
Viewed by 287
Abstract
Objective: To develop and validate a genetic diagnostic model for colorectal cancer (CRC). Methods: First, differentially expressed genes (DEGs) between colorectal cancer and normal groups were screened using the TCGA database. Subsequently, a two-sample Mendelian randomization analysis was performed using the eQTL genomic data from the IEU OpenGWAS database and colorectal cancer outcomes from the R12 Finnish database to identify associated genes. The intersecting genes from both methods were selected for the development and validation of the CRC genetic diagnostic model using nine machine learning algorithms: Lasso Regression, XGBoost, Gradient Boosting Machine (GBM), Generalized Linear Model (GLM), Neural Network (NN), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Random Forest (RF), and Decision Tree (DT). Results: A total of 3716 DEGs were identified from the TCGA database, while 121 genes were associated with CRC based on the eQTL Mendelian randomization analysis. The intersection of these two methods yielded 27 genes. Among the nine machine learning methods, XGBoost achieved the highest AUC value of 0.990. The top five genes predicted by the XGBoost method—RIF1, GDPD5, DBNDD1, RCCD1, and CLDN5—along with the five most significantly differentially expressed genes (ASCL2, IFITM3, IFITM1, SMPDL3A, and SUCLG2) in the GSE87211 dataset, were selected for the construction of the final colorectal cancer (CRC) genetic diagnostic model. The ROC curve analysis revealed an AUC (95% CI) of 0.9875 (0.9737–0.9875) for the training set, and 0.9601 (0.9145–0.9601) for the validation set, indicating strong predictive performance of the model. SHAP model interpretation further identified IFITM1 and DBNDD1 as the most influential genes in the XGBoost model, with both making positive contributions to the model's predictions. 
Conclusions: The gene expression profile in colorectal cancer is characterized by enhanced cell proliferation, elevated metabolic activity, and immune evasion. A genetic diagnostic model constructed based on ten genes (RIF1, GDPD5, DBNDD1, RCCD1, CLDN5, ASCL2, IFITM3, IFITM1, SMPDL3A, and SUCLG2) demonstrates strong predictive performance. This model holds significant potential for the early diagnosis and intervention of colorectal cancer, contributing to the implementation of third-tier prevention strategies. Full article
(This article belongs to the Section Bioinformatics)

24 pages, 1576 KB  
Article
Non-Imaging Differential Diagnosis of Lower Limb Osteoarthritis: An Interpretable Machine Learning Framework
by Zhanel Baigarayeva, Assiya Boltaboyeva, Baglan Imanbek, Bibars Amangeldy, Nurdaulet Tasmurzayev, Kassymbek Ozhikenov, Assylbek Ozhiken, Zhadyra Alimbayeva and Naoya Maeda-Nishino
Algorithms 2026, 19(1), 87; https://doi.org/10.3390/a19010087 - 20 Jan 2026
Viewed by 204
Abstract
Background: Osteoarthritis (OA) is a prevalent chronic degenerative disorder, with coxarthrosis (hip OA) and gonarthrosis (knee OA) representing its most significant clinical manifestations. While diagnosis typically relies on imaging, such methods can be resource-intensive and insensitive to early disease trajectories. Objective: This study aims to achieve the differential diagnosis of coxarthrosis and gonarthrosis using solely routine preoperative clinical and laboratory data, benchmarking state-of-the-art machine learning algorithms. Methods: A retrospective analysis was conducted on 893 patients (617 with knee OA, 276 with hip OA) from a clinical hospital in Almaty, Kazakhstan. The study evaluated a diverse portfolio of models, including gradient boosting decision trees (LightGBM, XGBoost, CatBoost), deep learning architectures (RealMLP, TabDPT, TabM), and the pretrained tabular foundation model RealTabPFN v2.5. Results: The RealTabPFN v2.5 (Tuned) model achieved superior performance, recording a mean ROC–AUC of 0.9831, accuracy of 0.9485, and an F1-score of 0.9474. SHAP interpretability analysis identified heart rate (66.2%) and age (18.1%) as the dominant predictors driving the model’s decision-making process. Conclusion: Pretrained tabular foundation models demonstrate exceptional capability in distinguishing OA subtypes using limited clinical datasets, outperforming traditional ensemble methods. This approach offers a practical, high-performance triage tool for primary clinical assessment in resource-constrained settings. Full article

35 pages, 14165 KB  
Article
Spatiotemporal Patterns of Aboveground Carbon Storage in Hainan Mangroves Based on Machine Learning and Multi-Source Remote Sensing Data
by Zhikuan Liu, Zhaode Yin, Wenlu Zhao, Zhongke Feng, Huiqing Pei, Pietro Grimaldi and Zixuan Qiu
Forests 2026, 17(1), 131; https://doi.org/10.3390/f17010131 - 19 Jan 2026
Viewed by 190
Abstract
As an essential blue carbon ecosystem, mangroves play a vital role in coastal protection, biodiversity conservation, and climate regulation. However, their complex and variable growth environments pose challenges for precise monitoring. Hainan Island represents a region within China where mangrove forests are the most concentrated and diverse in type. In recent years, ecological restoration efforts have led to the recovery of their coverage areas. This study analyzed the spatial distribution, canopy height, and aboveground carbon storage variations in Hainan mangrove forests. Deep-learning and multiple machine-learning algorithms were used to integrate multitemporal Sentinel-2 remote sensing imagery from 2019 to 2023 with unmanned aerial vehicle observations and field survey data. Multi-rule image fusion and deep-learning techniques effectively enhanced mangrove identification accuracy. The mangrove classification achieved an overall accuracy exceeding 90%. The mangrove area in Hainan increased from 3948.83 ha in 2019 to 4304.29 ha in 2023. Gradient-boosted decision tree (GBDT) models estimated average canopy height with a high coefficient of determination (R2 = 0.89), and Random Forest (RF) models yielded the best estimations of total above-ground carbon stock with strong agreement to field observations. Integrating multisource remote sensing data with artificial intelligence algorithms enabled high-precision dynamic monitoring of mangrove distribution, structure, and carbon storage to provide scientific support for the assessment, management, and carbon sink accounting of Hainan mangrove ecosystems. Full article

20 pages, 3262 KB  
Article
Glass Fall-Offs Detection for Glass Insulated Terminals via a Coarse-to-Fine Machine-Learning Framework
by Weibo Li, Bingxun Zeng, Weibin Li, Nian Cai, Yinghong Zhou, Shuai Zhou and Hao Xia
Micromachines 2026, 17(1), 128; https://doi.org/10.3390/mi17010128 - 19 Jan 2026
Viewed by 163
Abstract
Glass-insulated terminals (GITs) are widely used in high-reliability microelectronic systems, where glass fall-offs in the sealing region can seriously degrade the reliability of the sealed component and, in turn, of the whole device. Automatic inspection of such defects is challenging due to strong light reflection, irregular defect appearances, and limited defective samples. To address these issues, a coarse-to-fine machine-learning framework is proposed for glass fall-off detection in GIT images. By exploiting the circular-ring geometric prior of GITs, an adaptive sector partition scheme is introduced to divide the region of interest into sectors. Four categories of sector features, including color statistics, gray-level variations, reflective properties, and gradient distributions, are designed for coarse classification using a gradient boosting decision tree (GBDT). Furthermore, a sector neighbor (SN) feature vector is constructed from adjacent sectors to enhance fine classification. Experiments on real industrial GIT images show that the proposed method outperforms several representative inspection approaches, achieving an average IoU of 96.85%, an F1-score of 0.984, a pixel-level false alarm rate of 0.55%, and a pixel-level missed alarm rate of 35.62% at a practical inspection speed of 32.18 s per image. Full article
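The sector-partition-plus-GBDT idea above can be sketched as follows. This is a toy illustration, not the authors' method: the `sector_index` mapping and the three per-sector features below are hypothetical simplifications of the partition scheme and of the four feature categories described in the abstract, and all labels are simulated.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

def sector_index(y, x, cy, cx, n_sectors):
    """Map a pixel (y, x) to one of n_sectors equal angular sectors
    around the ring center (cy, cx)."""
    theta = np.arctan2(y - cy, x - cx) % (2 * np.pi)
    return int(theta / (2 * np.pi / n_sectors))

# Toy per-sector features: [mean gray level, gray-level std, gradient energy].
# Simulated fall-off sectors (label 1) are darker and more textured.
n_sectors, n_images = 16, 40
X, y = [], []
for _ in range(n_images * n_sectors):
    defective = rng.random() < 0.2
    mean_gray = rng.normal(90 if defective else 140, 10)
    gray_std = rng.normal(30 if defective else 12, 4)
    grad_energy = rng.normal(8 if defective else 3, 1)
    X.append([mean_gray, gray_std, grad_energy])
    y.append(int(defective))

# Coarse classification stage: one GBDT decision per sector.
clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)
```

In the paper's fine stage, features from adjacent sectors (the SN vector) would be appended to each row before a second classification pass.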
(This article belongs to the Special Issue Emerging Technologies and Applications for Semiconductor Industry)

24 pages, 43005 KB  
Article
Accurate Estimation of Spring Maize Aboveground Biomass in Arid Regions Based on Integrated UAV Remote Sensing Feature Selection
by Fengxiu Li, Yanzhao Guo, Yingjie Ma, Ning Lv, Zhijian Gao, Guodong Wang, Zhitao Zhang, Lei Shi and Chongqi Zhao
Agronomy 2026, 16(2), 219; https://doi.org/10.3390/agronomy16020219 - 16 Jan 2026
Viewed by 250
Abstract
Maize is one of the top three crops globally, ranking only behind rice and wheat, making it an important crop of interest. Aboveground biomass is a key indicator for assessing maize growth and its yield potential. This study developed an efficient and stable biomass prediction model to estimate the aboveground biomass (AGB) of spring maize (Zea mays L.) under subsurface drip irrigation in arid regions, based on UAV multispectral remote sensing and machine learning techniques. Focusing on typical subsurface drip-irrigated spring maize in arid Xinjiang, multispectral images and field-measured AGB data were collected from 96 sample points (selected via stratified random sampling across 24 plots) over four key phenological stages in 2024 and 2025. Sixteen vegetation indices were calculated and 40 texture features were extracted using the gray-level co-occurrence matrix method, while an integrated feature-selection strategy combining Elastic Net and Random Forest was employed to effectively screen key predictor variables. Based on the selected features, six machine learning models were constructed, including Elastic Net Regression (ENR), Gradient Boosting Decision Trees (GBDT), Gaussian Process Regression (GPR), Partial Least Squares Regression (PLSR), Random Forest (RF), and Extreme Gradient Boosting (XGB). Results showed that the fused feature set comprised four vegetation indices (GRDVI, RERVI, GRVI, NDVI) and five texture features (R_Corr, NIR_Mean, NIR_Vari, B_Mean, B_Corr), thereby retaining red-edge and visible-light texture information highly sensitive to AGB. The GPR model based on the fused features exhibited the best performance (test set R2 = 0.852, RMSE = 2890.74 kg ha−1, MAE = 1676.70 kg ha−1), demonstrating high fitting accuracy and stable predictive ability across both the training and test sets. Spatial inversions over the two growing seasons of 2024 and 2025, derived from the fused-feature GPR optimal model at four key phenological stages, revealed pronounced spatiotemporal heterogeneity and stage-dependent dynamics of spring maize AGB: the biomass accumulates rapidly from jointing to grain filling, slows thereafter, and peaks at maturity. At a constant planting density, AGB increased markedly with nitrogen inputs from N0 to N3 (420 kg N ha−1), with the high-nitrogen N3 treatment producing the greatest biomass; this successfully captured the regulatory effect of the nitrogen gradient on maize growth, provided reliable data for variable-rate fertilization, and is highly relevant for optimizing water–fertilizer coordination in subsurface drip irrigation systems. Future research may extend this integrated feature selection and modeling framework to monitor the growth and estimate the yield of other crops, such as rice and cotton, thereby validating its generalizability and robustness in diverse agricultural scenarios. Full article
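The integrated Elastic Net + Random Forest feature-selection strategy above can be sketched in scikit-learn. The intersection rule at the end is one plausible way to fuse the two selectors; the paper's exact fusion criterion is not spelled out in the abstract, and the data below are synthetic (56 candidate predictors standing in for the 16 indices and 40 texture features):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)

# Synthetic stand-in: 56 candidate predictors, only a few drive the target.
n, p = 200, 56
X = rng.normal(size=(n, p))
agb = 3.0 * X[:, 0] + 2.0 * X[:, 5] - 1.5 * X[:, 20] + rng.normal(0.0, 0.5, n)

# Selector 1: Elastic Net keeps features with non-zero coefficients.
enet = ElasticNetCV(cv=5, random_state=0).fit(X, agb)
enet_keep = set(np.flatnonzero(enet.coef_))

# Selector 2: Random Forest keeps the top features by impurity importance.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, agb)
rf_keep = set(np.argsort(rf.feature_importances_)[-10:])

# Fused set: features supported by both selectors.
selected = sorted(enet_keep & rf_keep)
```

The fused set would then feed the six downstream regressors (ENR, GBDT, GPR, PLSR, RF, XGB) for comparison.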

25 pages, 1436 KB  
Article
Entropy-Augmented Forecasting and Portfolio Construction at the Industry-Group Level: A Causal Machine-Learning Approach Using Gradient-Boosted Decision Trees
by Gil Cohen, Avishay Aiche and Ron Eichel
Entropy 2026, 28(1), 108; https://doi.org/10.3390/e28010108 - 16 Jan 2026
Viewed by 256
Abstract
This paper examines whether information-theoretic complexity measures enhance industry-group return forecasting and portfolio construction within a machine-learning framework. Using daily data for 25 U.S. GICS industry groups spanning more than three decades, we augment gradient-boosted decision tree models with Shannon entropy and fuzzy entropy computed from recent return dynamics. Models are estimated at weekly, monthly, and quarterly horizons using a strictly causal rolling-window design and translated into two economically interpretable allocation rules, a maximum-profit strategy and a minimum-risk strategy. Results show that the top-performing strategy, the weekly maximum-profit model augmented with Shannon entropy, achieves an accumulated return exceeding 30,000%, substantially outperforming both the baseline model and the fuzzy-entropy variant. On monthly and quarterly horizons, entropy and fuzzy entropy generate smaller but robust improvements by maintaining lower volatility and better downside protection. Industry allocations display stable and economically interpretable patterns: profit-oriented strategies concentrate primarily in cyclical and growth-sensitive industries such as semiconductors, automobiles, technology hardware, banks, and energy, while minimum-risk strategies consistently favor defensive industries including utilities, food, beverage and tobacco, real estate, and consumer staples. Overall, the results demonstrate that entropy-based complexity measures improve both economic performance and interpretability, yielding industry-rotation strategies that are simultaneously more profitable, more stable, and more transparent. Full article
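A hedged sketch of the Shannon-entropy feature described above, computed from a histogram of a recent return window. The bin count and window length are illustrative choices, not the paper's specification, and the fuzzy-entropy variant is omitted:

```python
import numpy as np

def shannon_entropy(returns, n_bins=10):
    """Shannon entropy (in bits) of the histogram of a return window."""
    counts, _ = np.histogram(returns, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(7)
peaked = rng.normal(0.0, 1.0, 250)   # bell-shaped return distribution
flat = rng.uniform(-1.0, 1.0, 250)   # near-uniform return distribution

h_peaked = shannon_entropy(peaked)
h_flat = shannon_entropy(flat)
# A flatter (more complex) return distribution yields higher entropy.
```

In an entropy-augmented setup, the value for each industry group's trailing window would simply be appended as an extra column of the GBDT feature matrix alongside standard return and volatility features.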
(This article belongs to the Special Issue Entropy, Artificial Intelligence and the Financial Markets)
