Search Results (262)

Search Parameters:
Keywords = eXtreme Gradient Boosting (XGB)

15 pages, 2070 KiB  
Article
Machine Learning for Personalized Prediction of Electrocardiogram (EKG) Use in Emergency Care
by Hairong Wang and Xingyu Zhang
J. Pers. Med. 2025, 15(8), 358; https://doi.org/10.3390/jpm15080358 - 6 Aug 2025
Abstract
Background: Electrocardiograms (EKGs) are essential tools in emergency medicine, often used to evaluate chest pain, dyspnea, and other symptoms suggestive of cardiac dysfunction. Yet, EKGs are not universally administered to all emergency department (ED) patients. Understanding and predicting which patients receive an EKG may offer insights into clinical decision making, resource allocation, and potential disparities in care. This study examines whether integrating structured clinical data with free-text patient narratives can improve prediction of EKG utilization in the ED. Methods: We conducted a retrospective observational study to predict electrocardiogram (EKG) utilization using data from 13,115 adult emergency department (ED) visits in the nationally representative 2021 National Hospital Ambulatory Medical Care Survey–Emergency Department (NHAMCS-ED), leveraging both structured features—demographics, vital signs, comorbidities, arrival mode, and triage acuity, with the most influential selected via Lasso regression—and unstructured patient narratives transformed into numerical embeddings using Clinical-BERT. Four supervised learning models—Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF) and Extreme Gradient Boosting (XGB)—were trained on three inputs (structured data only, text embeddings only, and a late-fusion combined model); hyperparameters were optimized by grid search with 5-fold cross-validation; performance was evaluated via AUROC, accuracy, sensitivity, specificity and precision; and interpretability was assessed using SHAP values and Permutation Feature Importance. Results: EKGs were administered in 30.6% of adult ED visits. Patients who received EKGs were more likely to be older, White, Medicare-insured, and to present with abnormal vital signs or higher triage severity. Across all models, the combined data approach yielded superior predictive performance. 
The SVM and LR achieved the highest area under the ROC curve (AUC = 0.860 and 0.861) when using both structured and unstructured data, compared to 0.772 with structured data alone and 0.823 and 0.822 with unstructured data alone. Similar improvements were observed in accuracy, sensitivity, and specificity. Conclusions: Integrating structured clinical data with patient narratives significantly enhances the ability to predict EKG utilization in the emergency department. These findings support a personalized medicine framework by demonstrating how multimodal data integration can enable individualized, real-time decision support in the ED. Full article
(This article belongs to the Special Issue Machine Learning in Epidemiology)
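The abstract above reports accuracy, sensitivity, specificity, and precision. As a minimal sketch of how these follow from a binary confusion matrix, the snippet below uses a hypothetical helper `binary_metrics` and made-up counts; neither comes from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # positive predictive value
    }


# Hypothetical counts for illustration only (not data from the article).
metrics = binary_metrics(tp=80, fp=20, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUROC is not covered here because it needs the full ranking of predicted scores rather than a single confusion matrix.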

11 pages, 830 KiB  
Article
Machine Learning-Based Prediction of Shoulder Dystocia in Pregnancies Without Suspected Macrosomia Using Fetal Biometric Ratios
by Can Ozan Ulusoy, Ahmet Kurt, Ayşe Gizem Yıldız, Özgür Volkan Akbulut, Gonca Karataş Baran and Yaprak Engin Üstün
J. Clin. Med. 2025, 14(15), 5240; https://doi.org/10.3390/jcm14155240 - 24 Jul 2025
Viewed by 292
Abstract
Objective: Shoulder dystocia (ShD) is a rare but serious obstetric emergency associated with significant neonatal morbidity. This study aimed to evaluate the predictive performance of machine learning (ML) models based on fetal biometric ratios and clinical characteristics for the identification of ShD in pregnancies without clinical suspicion of macrosomia. Methods: We conducted a retrospective case-control study including 284 women (84 ShD cases and 200 controls) who underwent spontaneous vaginal delivery between 37 and 42 weeks of gestation. All participants had an estimated fetal weight (EFW) below the 90th percentile according to Hadlock reference curves. Univariate and multivariate logistic regression analyses were performed on maternal and neonatal parameters, and statistically significant variables (p < 0.05) were used to construct adjusted odds ratio (aOR) models. Three supervised ML models, namely Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB), were trained and tested to assess predictive accuracy. Performance metrics included AUC-ROC, sensitivity, specificity, accuracy, and F1-score. Results: The BPD/AC ratio and AC/FL ratio markedly enhanced the prediction of ShD. When added to other features in RF models, the BPD/AC ratio achieved an AUC of 0.884 (95% CI: 0.802–0.957), a sensitivity of 68%, and a specificity of 83%. The AC/FL ratio, along with other factors, yielded an AUC of 0.896 (95% CI: 0.805–0.972), a sensitivity of 68%, and a specificity of 90%. Conclusions: In pregnancies without clinical suspicion of macrosomia, ML models integrating fetal biometric ratios with maternal and labor-related factors significantly improved the prediction of ShD. These models may support clinical decision-making in low-risk deliveries where ShD is often unexpected. Full article
(This article belongs to the Section Obstetrics & Gynecology)

21 pages, 9989 KiB  
Article
Machine Learning-Based Comparative Analysis on Direct and Indirect Mapping of Soil Texture Types Through Soil Particle Size Fractions Using Multi-Source Remote Sensing
by Jia Liu, Yingcong Ye, Cui Wang, Songchao Chen, Yameng Jiang, Xi Guo and Yefeng Jiang
Agriculture 2025, 15(13), 1395; https://doi.org/10.3390/agriculture15131395 - 28 Jun 2025
Cited by 1 | Viewed by 827
Abstract
Soil texture, defined by the proportions of sand, silt, and clay particles in the soil, is one of the most essential physical properties of soil. High-resolution soil texture data can provide critical parameter support for soil hydrological modeling, agricultural production management, and ecosystem assessment. In digital soil mapping, previous studies often predicted the sand, silt, and clay contents in soil and then indirectly calculated soil texture. Currently, approaches that directly map soil texture by classification modeling are gaining increasing attention due to the decreased error from data conversion, but few studies have systematically compared the two methods. In this study, we comprehensively assessed the performance of direct and indirect soil texture prediction using four machine learning algorithms (extreme gradient boosting, random forest, gradient boosting decision tree, and extremely randomized tree) with 190 covariates from the Digital Elevation Model, Sentinel-1/2 satellite images, and classification maps, and generated a 10 m resolution soil texture map based on 405 topsoil (0–20 cm) sample data collected in Suichuan County, China. The results showed that compared with indirect predictions, direct predictions improved overall accuracy (OA) by 20.57–44.19% and the Kappa coefficient (Kappa) by 0.220–0.402. Among the models used, the XGB model achieved the highest accuracy (OA: 0.948; Kappa: 0.931) and the lowest uncertainty (confusion index: 0.052). The direct prediction map (nine classes recorded) exhibited more detailed and diverse spatial distribution patterns than the indirect prediction map (six classes recorded), aligning better with the actual environment. Based on accuracy validation and spatial distribution, the XGB model performed best in direct prediction. The Shapley additive explanation from the XGB model revealed that the normalized height and stream power indices were the most significant factors driving the soil texture in the study area. Our results provide a reference for future studies on soil texture mapping using machine learning models. Full article
(This article belongs to the Section Agricultural Soils)
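The abstract above compares models by overall accuracy (OA) and the Kappa coefficient. The following is a minimal sketch of how both can be derived from a class confusion matrix; the function name `overall_accuracy_and_kappa` and the 2×2 matrix are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are taken as reference classes and columns as predicted classes.
    """
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class matrix for illustration (not data from the article).
confusion = [[20, 5],
             [10, 15]]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.2f}, Kappa = {kappa:.2f}")
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than OA whenever the class margins alone would produce some correct predictions.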

18 pages, 2452 KiB  
Article
Exploring the Habitat Distribution of Decapterus macarellus in the South China Sea Under Varying Spatial Resolutions: A Combined Approach Using Multiple Machine Learning and the MaxEnt Model
by Qikun Shen, Peng Zhang, Xue Feng, Zuozhi Chen and Jiangtao Fan
Biology 2025, 14(7), 753; https://doi.org/10.3390/biology14070753 - 24 Jun 2025
Viewed by 395
Abstract
The selection of environmental variables with different spatial resolutions is a critical factor affecting the accuracy of machine learning-based fishery forecasting. In this study, spring-season survey data of Decapterus macarellus in the South China Sea from 2016 to 2024 were used to construct six machine learning models—decision tree (DT), extra trees (ETs), K-Nearest Neighbors (KNN), light gradient boosting machine (LGBM), random forest (RF), and extreme gradient boosting (XGB)—based on seven environmental variables (e.g., sea surface temperature (SST), chlorophyll-a concentration (CHL)) at four spatial resolutions (0.083°, 0.25°, 0.5°, and 1°), filtered using Pearson correlation analysis. Optimal models were selected under each resolution through performance comparison. SHapley Additive exPlanations (SHAP) values were employed to interpret the contribution of environmental predictors, and the maximum entropy (MaxEnt) model was used to perform habitat suitability mapping. Results showed that the XGB model at 0.083° resolution achieved the best performance, with the area under the receiver operating characteristic curve (ROC_AUC) = 0.836, accuracy = 0.793, and negative predictive value = 0.862, outperforming models at coarser resolutions. CHL was identified as the most influential variable, showing high importance in both the SHAP distribution and the cumulative area under the curve contribution. Predicted suitable habitats were mainly located in the northern and central-southern South China Sea, with the latter covering a broader area. This study is the first to systematically evaluate the impact of spatial resolution on environmental variable selection in machine learning models, integrating SHAP-based interpretability with MaxEnt modeling to achieve reliable habitat suitability prediction, offering valuable insights for fishery forecasting in the South China Sea. Full article
(This article belongs to the Section Marine Biology)

24 pages, 4652 KiB  
Article
A Machine Learning-Based Assessment of Proxies and Drivers of Harmful Algal Blooms in the Western Lake Erie Basin Using Satellite Remote Sensing
by Neha Joshi, Armeen Ghoorkhanian, Jongmin Park, Kaiguang Zhao and Sami Khanal
Remote Sens. 2025, 17(13), 2164; https://doi.org/10.3390/rs17132164 - 24 Jun 2025
Viewed by 412
Abstract
The western region of Lake Erie has been experiencing severe water-quality issues, mainly through the infestation of algal blooms, highlighting the urgent need for action. Understanding the drivers and the intricacies associated with algal bloom phenomena is important to develop effective water-quality remediation strategies. In this study, the influences of multiple bloom drivers were explored, together with Harmonized Landsat Sentinel-2 (HLS) images, using the datasets collected in Western Lake Erie from 2013 to 2022. Bloom drivers included a group of physicochemical and meteorological variables, and Chlorophyll-a (Chl-a) served as a proxy for algal blooms. Various combinations of these datasets were used as predictor variables for three machine learning models, including Support Vector Regression (SVR), Extreme Gradient Boosting (XGB), and Random Forest (RF). Each model is complemented with the SHapley Additive exPlanations (SHAP) model to understand the role of predictor variables in Chl-a estimation. A combination of physicochemical variables and optical spectral bands yielded the highest model performance (R2 up to 0.76, RMSE as low as 8.04 µg/L). The models using only meteorological data and spectral bands performed poorly (R2 < 0.40), indicating the limited standalone predictive power of meteorological variables. While satellite-only models achieved moderate performance (R2 up to 0.48), they could still be useful for preliminary monitoring where field data are unavailable. Furthermore, all 20 variables did not substantially improve model performance over models with only spectral and physicochemical inputs. While SVR achieved the highest R2 in individual runs, XGB provided the most stable and consistently strong performance across input configurations, which could be an important consideration for operational use. These findings are highly relevant for harmful algal bloom (HAB) monitoring, where Chl-a serves as a critical proxy. 
By clarifying the contribution of diverse variables to Chl-a prediction and identifying robust modeling approaches, this study provides actionable insights to support data-driven management decisions aimed at mitigating HAB impacts in freshwater systems. Full article

19 pages, 2832 KiB  
Article
High Spatial Resolution Soil Moisture Mapping over Agricultural Field Integrating SMAP, IMERG, and Sentinel-1 Data in Machine Learning Models
by Diego Tola, Lautaro Bustillos, Fanny Arragan, Rene Chipana, Renaud Hostache, Eléonore Resongles, Raúl Espinoza-Villar, Ramiro Pillco Zolá, Elvis Uscamayta, Mayra Perez-Flores and Frédéric Satgé
Remote Sens. 2025, 17(13), 2129; https://doi.org/10.3390/rs17132129 - 21 Jun 2025
Viewed by 1930
Abstract
Soil moisture content (SMC) is a critical parameter for agricultural productivity, particularly in semi-arid regions, where irrigation practices are extensively used to offset water deficits and ensure decent yields. Yet, the socio-economic and remote context of these regions prevents sufficiently dense SMC monitoring in space and time to support farmers in their work to avoid unsustainable irrigation practices and preserve water resource availability. In this context, our study addresses the challenge of high spatial resolution (i.e., 20 m) SMC estimation by integrating remote sensing datasets in machine learning models. For this purpose, a dataset made of 166 soil samples’ SMC along with corresponding SMC, precipitation, and radar signal derived from Soil Moisture Active Passive (SMAP), Integrated Multi-satellitE Retrievals for GPM (IMERG), and Sentinel-1 (S1), respectively, was used to assess four machine learning models’ (Decision Tree—DT, Random Forest—RF, Gradient Boosting—GB, Extreme Gradient Boosting—XGB) reliability for SMC mapping. First, each model was trained/validated using only the coarse spatial resolution (i.e., 10 km) SMAP SMC and IMERG precipitation estimates as independent features, and, second, S1 information (i.e., 20 m) derived from single scenes and/or composite images was added as independent features to highlight the benefit of information (i.e., S1 information) for SMC mapping at high spatial resolution (i.e., 20 m). Results show that integrating S1 information from both single scenes and composite images to SMAP SMC and IMERG precipitation data significantly improves model reliability, as R2 increased by 12% to 16%, while RMSE decreased by 10% to 18%, depending on the considered model (i.e., RF, XGB, DT, GB). Overall, all models provided reliable SMC estimates at 20 m spatial resolution, with the GB model performing the best (R2 = 0.86, RMSE = 2.55%). Full article
(This article belongs to the Special Issue Remote Sensing for Soil Properties and Plant Ecosystems)

27 pages, 4150 KiB  
Article
Improved Liquefaction Hazard Assessment via Deep Feature Extraction and Stacked Ensemble Learning on Microtremor Data
by Oussama Arab, Soufiana Mekouar, Mohamed Mastere, Roberto Cabieces and David Rodríguez Collantes
Appl. Sci. 2025, 15(12), 6614; https://doi.org/10.3390/app15126614 - 12 Jun 2025
Viewed by 408
Abstract
The reduction in disaster risk in urban regions due to natural hazards (e.g., earthquakes, landslides, floods, and tropical cyclones) is primarily a development matter that must be treated within the scope of a broader urban development framework. Natural hazard assessment is one of the turning points in mitigating disaster risk, which typically contributes to stronger urban resilience and more sustainable urban development. Regarding this challenge, our research proposes a new approach in the signal processing chain and feature extraction from microtremor data that focuses mainly on the Horizontal-to-Vertical Spectral Ratio (HVSR) so as to assess liquefaction potential as a natural hazard using AI. The key raw seismic features of site amplification and resonance are extracted from the data via bandpass filtering, Fourier Transformation (FT), the calculation of the HVSR, and smoothing through the use of moving averages. The main novelty is the integration of machine learning, particularly stacked ensemble learning, for liquefaction potential classification from imbalanced seismic datasets. For this approach, several models are used to consider class imbalance, enhancing classification performance and offering better insight into liquefaction risk based on microtremor data. Then, the paper proposes a liquefaction detection method based on deep learning with an autoencoder and stacked classifiers. The autoencoder compresses data into the latent space, underlining the liquefaction features classified by the multi-layer perceptron (MLP) classifier and eXtreme Gradient Boosting (XGB) classifier, and the meta-model combines these outputs to put special emphasis on rare liquefaction events. This proposed methodology improved the detection of an imbalanced dataset, although challenges remain in both interpretability and computational complexity. 
We created a synthetic dataset of 1000 samples using realistic feature ranges that mimic the Rif data region to test model performance and conduct sensitivity analysis. Key seismic and geotechnical variables were included, confirming the amplification factor (Af) and seismic vulnerability index (Kg) as dominant predictors and supporting model generalizability in data-scarce regions. Our proposed method for liquefaction potential classification achieves 100% classification accuracy, 100% precision, and 100% recall, providing a new baseline. Compared to existing models such as XGB and MLP, the proposed model performs better in all metrics. This new approach could become a critical component in assessing liquefaction hazard, contributing to disaster mitigation and urban planning. Full article

28 pages, 4269 KiB  
Article
XGB-BIF: An XGBoost-Driven Biomarker Identification Framework for Detecting Cancer Using Human Genomic Data
by Veena Ghuriani, Jyotsna Talreja Wassan, Priyal Tripathi and Anshika Chauhan
Int. J. Mol. Sci. 2025, 26(12), 5590; https://doi.org/10.3390/ijms26125590 - 11 Jun 2025
Viewed by 808
Abstract
The human genome has a profound impact on human health and disease detection. Carcinoma (cancer) is one of the prominent diseases that majorly affect human health and requires the development of different treatment strategies and targeted therapies based on effective disease detection. Therefore, our research aims to identify biomarkers associated with distinct cancer types (gastric, lung, and breast) using machine learning. In the current study, we have analyzed the human genomic data of gastric cancer, breast cancer, and lung cancer patients using XGB-BIF (i.e., XGBoost-Driven Biomarker Identification Framework for detecting cancer). The proposed framework utilizes feature selection via XGBoost (eXtreme Gradient Boosting), which captures feature interactions efficiently and takes care of the non-linear effects in the genomic data. The research progressed by training XGBoost on the full dataset, ranking the features based on the Gain measure (importance), followed by the classification phase, which employed support vector machines (SVM), logistic regression (LR), and random forest (RF) models for classifying cancer-diseased and non-diseased states. To ensure interpretability and transparency, we also applied SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), enabling the identification of high-impact biomarkers contributing to risk stratification. Biomarker significance is discussed primarily via pathway enrichment and by studying survival analysis (Kaplan–Meier curves, Cox regression) for identified biomarkers to strengthen translational value. Our models achieved high predictive performance, with an accuracy of more than 90%, to classify and link genomic data into diseased (cancer) and non-diseased states. Furthermore, we evaluated the models using Cohen’s Kappa statistic, which confirmed strong agreement between predicted and actual risk categories, with Kappa scores ranging from 0.80 to 0.99. 
Our proposed framework also achieved strong predictions on the METABRIC dataset during external validation, attaining an AUC-ROC of 93%, accuracy of 0.79%, and Kappa of 74%. Through extensive experimentation, XGB-BIF identified the top biomarker genes for different cancer datasets (gastric, lung, and breast). CBX2, CLDN1, SDC2, PGF, FOXS1, ADAMTS18, POLR1B, and PYCR3 were identified as important biomarkers to identify diseased and non-diseased states of gastric cancer; CAVIN2, ADAMTS5, SCARA5, CD300LG, and GIPC2 were identified as important biomarkers for breast cancer; and CLDN18, MYBL2, ASPA, AQP4, FOLR1, and SLC39A8 were identified as important biomarkers for lung cancer. XGB-BIF could be utilized for identifying biomarkers of different cancer types using genetic data, which can further help clinicians in developing targeted therapies for cancer patients. Full article

24 pages, 9236 KiB  
Article
Evaluating the Thermohydraulic Performance of Microchannel Gas Coolers: A Machine Learning Approach
by Shehryar Ishaque, Naveed Ullah, Sanghun Choi and Man-Hoe Kim
Energies 2025, 18(12), 3007; https://doi.org/10.3390/en18123007 - 6 Jun 2025
Viewed by 370
Abstract
In this study, a numerical model of a microchannel gas cooler was developed using a segment-by-segment approach for thermohydraulic performance evaluation. State-of-the-art heat transfer and pressure drop correlations were used to determine the air and refrigerant side heat transfer coefficients and friction factors. The developed model was validated against a wide range of experimental data and was found to accurately predict the gas cooler capacity (Q) and pressure drop (ΔP) within an acceptable margin of error. Furthermore, advanced machine learning algorithms such as extreme gradient boosting (XGB), random forest (RF), support vector regression (SVR), k-nearest neighbors (KNNs), and artificial neural networks (ANNs) were employed to analyze their predictive capability. Over 11,000 data points from the numerical model were used, with 80% of the data for training and 20% for testing. The evaluation metrics, such as the coefficient of determination (R2, 0.99841–0.99836) and mean squared error values (0.09918–0.10639), demonstrated high predictive efficacy and accuracy, with only slight variations among the models. All models accurately predict the Q, with the XGB and ANN models showing superior performance in ΔP prediction. Notably, the ANN model emerges as the most accurate method for refrigerant and air outlet temperatures predictions. These findings highlight the potential of machine learning as a robust tool for optimizing thermal system performance and guiding the design of energy-efficient heat exchange technologies. Full article
(This article belongs to the Special Issue Heat Transfer Analysis: Recent Challenges and Applications)

19 pages, 1865 KiB  
Article
Modeling Soil Temperature with Fuzzy Logic and Supervised Learning Methods
by Bilal Cemek, Yunus Kültürel, Emirhan Cemek, Erdem Küçüktopçu and Halis Simsek
Appl. Sci. 2025, 15(11), 6319; https://doi.org/10.3390/app15116319 - 4 Jun 2025
Viewed by 538
Abstract
Soil temperature is a critical environmental factor that affects plant development, physiological processes, and overall productivity. This study compares two modeling approaches for predicting soil temperature at various depths: (i) fuzzy logic-based systems, including the Mamdani fuzzy inference system (MFIS) and the adaptive neuro-fuzzy inference system (ANFIS); and (ii) supervised machine learning algorithms, such as multilayer perceptron (MLP), support vector regression (SVR), random forest (RF), extreme gradient boosting (XGB), and k-nearest neighbors (KNN), along with multiple linear regression (MLR) as a statistical benchmark. Soil temperature data were collected from Tokat, Türkiye, between 2016 and 2024 at depths of 5, 10, 20, 50, and 100 cm. The dataset was split into training (2016–2021) and testing (2022–2024) periods. Performance was evaluated using the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²). The ANFIS achieved the best prediction accuracy (MAE = 1.46 °C, RMSE = 1.89 °C, R² = 0.95), followed by RF, XGB, MLP, KNN, SVR, MLR, and MFIS. This study underscores the potential of integrating machine learning and fuzzy logic techniques for more accurate soil temperature modeling, contributing to precision agriculture and better resource management. Full article
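The abstract above evaluates models with RMSE, MAE, and the coefficient of determination. As a rough sketch of how these three metrics are computed from paired observations and predictions, the snippet below uses a hypothetical helper `regression_metrics` and invented values; it assumes the common definition R² = 1 − SS_res / SS_tot, which the article may or may not use.

```python
import math


def regression_metrics(observed, predicted):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical temperatures in degrees Celsius (not data from the article).
obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 11.0, 15.0, 15.0]
mae, rmse, r2 = regression_metrics(obs, pred)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, R2 = {r2:.2f}")
```

MAE and RMSE are in the units of the target variable, while R² is unitless, so the three are usually reported together as in the abstract.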

25 pages, 9716 KiB  
Article
Comparison of Neural Network, Ordinary Kriging, and Inverse Distance Weighting Algorithms for Seismic and Well-Derived Depth Data: A Case Study in the Bjelovar Subdepression, Croatia
by Ana Brcković, Tomislav Malvić, Jasna Orešković and Josipa Kapuralić
Geosciences 2025, 15(6), 206; https://doi.org/10.3390/geosciences15060206 - 2 Jun 2025
Viewed by 574
Abstract
In subsurface geological mapping, it is advisable to compare different solutions obtained with neural and other algorithms. Here, for such a comparison, we used the previously published and well-prepared dataset of subsurface data collected from the Bjelovar Subdepression, a 2900 km² regional macrounit in the Croatian part of the Pannonian Basin System. Data on depth were obtained for the youngest (the shallowest) Lonja Formation (Pliocene, Quaternary) and mapped using neural network (NN), inverse distance weighting (IDW), and ordinary kriging (OK) algorithms. The obtained maps were compared based on square error (using k-fold cross-validation) and the visual interpretation of isopachs. Two other algorithms were also tested, namely, random forest (RF) and extreme gradient boosting (XGB), but they were rejected as inappropriate for this purpose solely based on the visuals of the obtained maps, which did not follow any interpretable geological structures. The results showed that NN is a highly adjustable method for interpolation, with adjustment for numerous hyperparameters. IDW showed its strength as one of the classical interpolators, and its results are always located close to the top if several methods are compared. OK is the relative winner, showing the flexibility of variogram analysis regarding the number of data points and possible clustering. The presented variogram model, even with a relatively high sill and occasional nugget effect, can be well fitted into OK, giving better results than other methods when applied to the presented area and datasets. This was not surprising because kriging is a well-established method used exclusively for interpolation. In contrast, NN and machine learning algorithms are used in many fields, and these algorithms, particularly the fitting of hyperparameters in NN, simply cannot be the best solution for all. Full article
(This article belongs to the Section Geophysics)
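Inverse distance weighting, one of the interpolators compared in the abstract above, estimates a value at an unsampled location as a weighted average of nearby samples, with weights that decay with distance. The sketch below is a generic textbook form, not the authors' implementation; the function name `idw`, the default power of 2, and the sample points are all assumptions made for illustration.

```python
import math


def idw(points, x, y, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    `points` is a list of (px, py, value) tuples. The `power` parameter
    controls how quickly a sample's influence decays with distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exact hit on a sample point: return its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical depth samples (x, y, depth in m), not data from the article.
samples = [(0.0, 0.0, 100.0), (2.0, 0.0, 200.0)]
midpoint_estimate = idw(samples, 1.0, 0.0)
at_sample_estimate = idw(samples, 0.0, 0.0)
print(midpoint_estimate, at_sample_estimate)
```

Unlike kriging, this form uses no variogram, so the only tuning choice is the distance exponent (and, in practice, how many neighbors to include).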

25 pages, 7899 KiB  
Article
Machine Learning-Based Alfalfa Height Estimation Using Sentinel-2 Multispectral Imagery
by Hazhir Bahrami, Karem Chokmani, Saeid Homayouni, Viacheslav I. Adamchuk, Rami Albasha, Md Saifuzzaman and Maxime Leduc
Remote Sens. 2025, 17(10), 1759; https://doi.org/10.3390/rs17101759 - 18 May 2025
Cited by 1 | Viewed by 1542
Abstract
Climate change is threatening the sustainability of crop yields due to an increasing frequency of extreme weather conditions, requiring timely agricultural monitoring. Remote sensing facilitates consistent and continuous monitoring of field crops. This study aimed to estimate alfalfa crop height through satellite images and machine learning methods within the Google Earth Engine (GEE) Python API. Ground measurements for this study were collected over three years in four Canadian provinces. We utilized Sentinel-2 data to obtain satellite imagery corresponding to the same timeframe and location as the ground measurements. Three machine learning algorithms were employed to estimate plant height from satellite images: random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGB). The efficacy of these algorithms has been assessed and compared. Several widely used vegetation indices, for instance normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and normalized difference red-edge (NDRE), were selected and assessed in this study. RF feature importance was utilized to determine the ranking of features from most to least significant. Several feature selection strategies were utilized and compared with the situation where all features are used. We demonstrated that RF and XGB surpassed SVR when assessing test data performance. Our findings showed that XGB and RF could predict alfalfa crop height with an R² of 0.79 and a mean absolute error (MAE) of around 4 cm. SVR exhibited the lowest accuracy among the three algorithms tested, with an R² of 0.69 and an MAE of 4.63 cm. The analysis of important features showed that normalized difference red edge (NDRE) and normalized difference water index (NDWI) were the most important variables in determining alfalfa crop height. The results of this study also demonstrated that using RF and feature selection strategies, alfalfa crop height can be estimated with comparably high accuracy. Given that the models were fully trained and developed in Python (v. 3.10), they can be readily implemented in a decision support system and deliver near real-time estimations of alfalfa crop height for farmers throughout Canada.
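The indices and error metrics named in this abstract follow standard formulas, sketched below in plain Python. This is an illustration only, not the authors' pipeline: the reflectance values and heights are made up, and the mapping of NIR, red, and red-edge to Sentinel-2 bands B8, B4, and B5 is an assumption, since the abstract does not say which red-edge band was used.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized difference red-edge index."""
    return normalized_difference(nir, red_edge)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical surface reflectances for one pixel (assumed bands B8, B4, B5).
nir, red, red_edge = 0.5, 0.1, 0.3
print(f"NDVI = {ndvi(nir, red):.3f}, NDRE = {ndre(nir, red_edge):.3f}")

# Hypothetical measured vs. predicted crop heights in cm.
measured = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(f"MAE = {mae(measured, predicted):.2f} cm, R2 = {r2(measured, predicted):.3f}")
```

NDWI is left out on purpose: it has more than one common definition, and the abstract does not state which one the study applied.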

18 pages, 4841 KiB  
Article
Multi-Hazard Susceptibility Mapping Using Machine Learning Approaches: A Case Study of South Korea
by Changju Kim, Soonchan Park and Heechan Han
Remote Sens. 2025, 17(10), 1660; https://doi.org/10.3390/rs17101660 - 8 May 2025
Viewed by 893
Abstract
The frequency and magnitude of natural hazards have been steadily increasing, largely due to extreme weather events driven by climate change. These hazards pose significant global challenges, underscoring the need for accurate prediction models and systematic preparedness. This study aimed to predict multiple natural hazards in South Korea using various machine learning algorithms. The study area, South Korea (100,210 km²), was divided into a grid system with a 0.01° resolution. Meteorological, climatic, topographical, and remotely sensed data were interpolated into each grid cell for analysis. The study focused on three major natural hazards: drought, flood, and wildfire. Predictive models were developed using two machine learning algorithms: Random Forest (RF) and Extreme Gradient Boosting (XGB). The analysis showed that XGB performed exceptionally well in predicting droughts and floods, achieving ROC scores of 0.9998 and 0.9999, respectively. For wildfire prediction, RF achieved a high ROC score of 0.9583. The results were integrated to generate a multi-hazard susceptibility map. This study provides foundational data for the development of hazard management and response strategies in the context of climate change. Furthermore, it offers a basis for future research exploring the interaction effects of multi-hazards.

17 pages, 2046 KiB  
Article
Breast Lesion Detection Using Weakly Dependent Customized Features and Machine Learning Models with Explainable Artificial Intelligence
by Simona Moldovanu, Dan Munteanu, Keka C. Biswas and Luminita Moraru
J. Imaging 2025, 11(5), 135; https://doi.org/10.3390/jimaging11050135 - 28 Apr 2025
Viewed by 586
Abstract
This research proposes a novel strategy for accurate breast lesion classification that combines explainable artificial intelligence (XAI), machine learning (ML) classifiers, and customized weakly dependent features from breast ultrasound (BU) images. Two new weakly dependent feature classes are proposed to improve the diagnostic accuracy and diversify the training data. These are based on image intensity variations and the area of bounded partitions and provide complementary rather than overlapping information. ML classifiers such as Random Forest (RF), Extreme Gradient Boosting (XGB), Gradient Boosting Classifiers (GBC), and LASSO regression were trained with both customized feature classes. To validate the reliability of our study and the results obtained, we conducted a statistical analysis using the McNemar test. Later, an XAI model was combined with ML to tackle the influence of certain features, the constraints of feature selection, and the interpretability capabilities across various ML models. LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations) models were used in the XAI process to enhance the transparency and interpretation in clinical decision-making. The results revealed common relevant features for the malignant class, consistently identified by all of the classifiers, and for the benign class. However, we observed variations in the feature importance rankings across the different classifiers. Furthermore, our study demonstrates that the correlation between dependent features does not impact explainability.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)

22 pages, 10717 KiB  
Article
Interpretable Multi-Sensor Fusion of Optical and SAR Data for GEDI-Based Canopy Height Mapping in Southeastern North Carolina
by Chao Wang, Conghe Song, Todd A. Schroeder, Curtis E. Woodcock, Tamlin M. Pavelsky, Qianqian Han and Fangfang Yao
Remote Sens. 2025, 17(9), 1536; https://doi.org/10.3390/rs17091536 - 25 Apr 2025
Viewed by 1297
Abstract
Accurately monitoring forest canopy height is crucial for sustainable forest management, particularly in southeastern North Carolina, USA, where dense forests and limited accessibility pose substantial challenges. This study presents an explainable machine learning framework that integrates sparse GEDI LiDAR samples with multi-sensor remote sensing data to improve both the accuracy and interpretability of forest canopy height estimation. This framework incorporates multitemporal optical observations from Sentinel-2; C-band backscatter and InSAR coherence from Sentinel-1; quad-polarization L-Band backscatter and polarimetric decompositions from the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR); texture features from the National Agriculture Imagery Program (NAIP) aerial photography; and topographic data derived from an airborne LiDAR-based digital elevation model. We evaluated four machine learning algorithms, K-nearest neighbors (KNN), random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGB), and found consistent accuracy across all models. Our evaluation highlights our method’s robustness, evidenced by closely matched R² and RMSE values across models: KNN (R² of 0.496, RMSE of 5.13 m), RF (R² of 0.510, RMSE of 5.06 m), SVM (R² of 0.544, RMSE of 4.88 m), and XGB (R² of 0.548, RMSE of 4.85 m). The integration of comprehensive feature sets, as opposed to subsets, yielded better results, underscoring the value of using multisource remotely sensed data. Crucially, SHapley Additive exPlanations (SHAP) revealed the multi-seasonal red-edge spectral bands of Sentinel-2 as dominant predictors across models, while volume scattering from UAVSAR emerged as a key driver in tree-based algorithms. This study underscores the complementary nature of multi-sensor data and highlights the interpretability of our models. By offering spatially continuous, high-quality canopy height estimates, this cost-effective, data-driven approach advances large-scale forest management and environmental monitoring, paving the way for improved decision-making and conservation strategies.