MDPI - Publisher of Open Access Journals

18 pages, 4041 KiB

Open Access Article

A Deep Learning Approach to Alzheimer’s Diagnosis Using EEG Data: Dual-Attention and Optuna-Optimized SVM

by Funda Bulut Arikan, Dilber Cetintas, Aziz Aksoy and Muhammed Yildirim

Biomedicines 2025, 13(8), 2017; https://doi.org/10.3390/biomedicines13082017 - 19 Aug 2025

Background/Objectives: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder, pathologically defined by the accumulation of amyloid-β plaques and tau-related neurofibrillary tangles in the brain. It represents a principal driver of cognitive deterioration in middle-aged and elderly populations. Early diagnosis and pharmacological management of the disease markedly improve both the quality and duration of life. Methods: Electroencephalography (EEG) is critical in detecting and analyzing Alzheimer’s disease. The widespread use of mobile EEG devices in recent years has necessitated real-time and effective data processing. However, extracting disease-specific features from EEG data still poses a significant challenge, especially in cases that must be completed quickly. This study aims to determine the frequency bands associated with Alzheimer’s disease in EEG data obtained from multiple channels and to accelerate the detection methods. An accurate classification that requires little computation is the primary goal. Results: EEG recordings of 48 individuals (24 AD and 24 healthy controls (HC)) obtained from Florida State University were divided into Alpha, Beta, Delta, Gamma, and Theta frequency bands; scalograms and spectrograms were generated for each frequency band. The effectiveness of these bands was evaluated using the MobileNetV2 architecture. The results showed that Delta and Beta frequency bands were the most significant for Alzheimer’s detection. By analyzing the features obtained from the Delta and Beta bands using the MobileNetV2 model integrated with the Dual-Attention Mechanism, it was determined that the attention mechanisms improved model performance by 2%. In addition, the use of an SVM classifier with hyperparameters optimized via Optuna resulted in approximately 3% performance improvement, suggesting that hyperparameter tuning may contribute positively to classification accuracy. 
Furthermore, combining features obtained from these frequency bands increased the detection performance when evaluated with larger datasets. Conclusions: The study demonstrates the potential of frequency band-based analyses and feature fusion methods to increase the accuracy and efficiency of Alzheimer’s diagnosis using EEG data. The results are promising; however, they should be interpreted with caution regarding their generalizability.

(This article belongs to the Special Issue Neural Nexus: Interdisciplinary Perspectives on Neurological Disorders)

18 pages, 2291 KiB

Open Access Article

Forecasting Tibetan Plateau Lake Level Responses to Climate Change: An Explainable Deep Learning Approach Using Altimetry and Climate Models

by Atefeh Gholami and Wen Zhang

Water 2025, 17(16), 2434; https://doi.org/10.3390/w17162434 - 17 Aug 2025

Viewed by 109

Abstract

The Tibetan Plateau’s lakes, serving as critical water towers for over two billion people, exhibit divergent responses to climate change that remain poorly quantified. This study develops a deep learning framework integrating Synthetic Aperture Radar (SAR) altimetry from Sentinel-3A with bias-corrected CMIP6 (Coupled Model Intercomparison Project Phase 6) climate projections under Shared Socioeconomic Pathways (SSP) scenarios (SSP2-4.5 and SSP5-8.5, adjusted via quantile mapping) to predict lake-level changes across eight Tibetan Plateau (TP) lakes. Using a Feed-Forward Neural Network (FFNN) tuned by Bayesian optimization with the Optuna framework, we achieve robust water level projections (mean validation R² = 0.861) and attribute drivers through Shapley Additive exPlanations (SHAP) analysis. Results reveal a stark north–south divergence: glacier-fed northern lakes like Migriggyangzham will rise by 13.18 ± 0.56 m under SSP5-8.5 due to meltwater inputs (temperature SHAP value = 0.41), consistent with the early (melt-dominated) phase of the IPCC’s ‘peak water’ framework. In contrast, evaporation-dominated southern lakes such as Langacuo face irreversible desiccation (−4.96 ± 0.68 m by 2100) as evaporative demand surpasses precipitation gains. Transitional western lakes exhibit “peak water” inflection points (e.g., Lumajang Dong’s 2060 maximum) signaling cryospheric buffer loss. These projections, validated through rigorous quantile mapping and rolling-window cross-validation, provide the first process-aware assessment of TP lake vulnerabilities, informing adaptation strategies under the Sustainable Development Goals (SDGs) for water security (SDG 6) and climate action (SDG 13). The methodological framework establishes a transferable paradigm for monitoring high-altitude freshwater systems globally.
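
The abstract mentions that the climate projections were adjusted via quantile mapping but does not describe the procedure. The following is a minimal sketch of one common variant, empirical quantile mapping, written under that assumption; the function names and the toy numbers are hypothetical and are not taken from the paper. Each projected value is located within the model's historical distribution and replaced by the observed value at the same rank.

```python
from bisect import bisect_right


def rank_fraction(sorted_sample, x):
    """Interpolated position of x within sorted_sample, scaled to [0, 1]."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    hi = bisect_right(sorted_sample, x)  # first index whose value exceeds x
    lo = hi - 1
    frac = (x - sorted_sample[lo]) / (sorted_sample[hi] - sorted_sample[lo])
    return (lo + frac) / (n - 1)


def quantile(sorted_sample, q):
    """Linearly interpolated quantile of sorted_sample for q in [0, 1]."""
    pos = q * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(model_hist, obs_hist, model_future):
    """Map each projected value onto the observed distribution by rank."""
    m = sorted(model_hist)
    o = sorted(obs_hist)
    return [quantile(o, rank_fraction(m, x)) for x in model_future]


# Toy example with made-up temperatures (not data from the study).
model_hist = [10.0, 12.0, 14.0, 16.0, 18.0]
obs_hist = [11.0, 12.0, 13.0, 14.0, 15.0]
corrected = quantile_map(model_hist, obs_hist, [14.0, 17.0, 25.0])
print(corrected)
```

In this simple form, projected values outside the historical model range are clamped to the observed extremes, which is one reason practical bias-correction schemes often add an explicit treatment of the distribution tails.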

(This article belongs to the Special Issue Applications of Remote Sensing in Hydrology and Water Resource Management)

28 pages, 2570 KiB

Open Access Article

Efficient Hydrodynamic Shape Optimization of a Sea-Turtle-Inspired AUH Using an Optuna-Tuned NSGA-II

by Xintong Guo, Hongwu Huang, Chao Yuan, Xiujing Gao, Hao Zhong and Lijiao Wang

J. Mar. Sci. Eng. 2025, 13(8), 1541; https://doi.org/10.3390/jmse13081541 - 11 Aug 2025

Viewed by 190

Abstract

Disc-shaped Autonomous Underwater Helicopters (AUHs) offer superior maneuverability but suffer from high hydrodynamic drag, which limits their operational endurance. To address this challenge, this study proposes a robust optimization framework for a novel sea-turtle-inspired AUH. A parametric hull, governed by two dimensionless shape factors based on modified Myring equations, was established to facilitate systematic exploration. To reduce the high computational cost of direct CFD evaluations, a high-precision Gaussian Process Regression (GPR) surrogate model was constructed from a small dataset of 24 samples. The core methodological innovation is T-NSGA-II, an NSGA-II variant whose hyperparameters are systematically optimized by the Optuna framework. In comparative evaluations, the T-NSGA-II-generated Pareto front demonstrated clear superiority over the standard NSGA-II, identifying designs with significantly lower drag for an equivalent vertical force. A key scientific contribution of this research is the identification of a distinct performance gap on the Pareto front. This phenomenon is interpreted not as an algorithmic artifact but as a ‘natural gap’, reflecting a deep physical trade-off, with potential underlying causes including a critical transition in flow physics or a topological shift in the optimal hull geometries. This work not only delivers a suite of optimized, practical AUH designs but also presents a powerful, intelligent optimization methodology that is capable of revealing fundamental physical trade-offs in complex engineering problems.
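
As a rough illustration of what a Pareto front is, the sketch below filters a set of candidate designs down to the non-dominated ones. It is not the T-NSGA-II algorithm from the paper; it shows only the dominance test that NSGA-II-style methods build on. The objective pairs are made-up numbers, and treating both objectives as quantities to minimize is an assumption made for the example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized in this sketch)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) pairs for four candidate designs.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(pareto_front(candidates))  # (3.0, 4.0) is dominated by (2.0, 3.0)
```

A gap on such a front appears when no non-dominated design exists within some range of one objective, which is the kind of feature the abstract interprets physically.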

(This article belongs to the Section Ocean Engineering)

18 pages, 5052 KiB

Open Access Article

Slope Stability Assessment Using an Optuna-TPE-Optimized CatBoost Model

by Liangcheng Wang, Chengliang Zhang, Wei Wang, Tao Deng, Tao Ma and Pei Shuai

Eng 2025, 6(8), 185; https://doi.org/10.3390/eng6080185 - 4 Aug 2025

Viewed by 257

Abstract

Slope stability assessment is a critical component of engineering safety. Conventional analytical methods frequently struggle to integrate heterogeneous slope data and model intricate failure mechanisms, thereby constraining their efficacy in practical engineering scenarios. To tackle these issues, this study presents a novel slope stability classification model grounded in the Optuna-TPE-CatBoost framework. By leveraging the Tree-structured Parzen Estimator (TPE) within the Optuna framework, the model adaptively optimizes CatBoost hyperparameters, thus enhancing prediction accuracy and robustness. It incorporates six key features (slope height, slope angle, unit weight, cohesion, internal friction angle, and the pore pressure ratio) to establish a comprehensive and intelligent assessment system. Utilizing a dataset of 272 slope cases, the model was trained with k-fold cross-validation and dynamic class imbalance strategies to ensure its generalizability. The optimized model achieved impressive performance metrics: an area under the receiver operating characteristic curve (AUC) of 0.926, an accuracy of 0.901, a recall of 0.874, and an F1-score of 0.881, outperforming benchmark algorithms such as XGBoost, LightGBM, and the unoptimized CatBoost. Validation via engineering case studies confirms that the model accurately evaluates slope stability across diverse scenarios and effectively captures the complex interactions between key parameters. This model offers a reliable and interpretable solution for slope stability assessment under complex failure mechanisms.
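
The abstract reports recall and F1 but not precision. Since F1 is the harmonic mean of precision and recall, the precision implied by the two reported values can be back-calculated. The sketch below does this under the assumption that both figures refer to the same class and the same evaluation split; if they are averaged over folds or classes, the identity holds only approximately. The function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 and recall are inconsistent with any precision")
    return f1 * recall / denom


# Values reported in the abstract: F1 = 0.881, recall = 0.874.
p = implied_precision(0.881, 0.874)
print(round(p, 3))  # about 0.888
```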

24 pages, 6378 KiB

Open Access Article

Comparative Analysis of Ensemble Machine Learning Methods for Alumina Concentration Prediction

by Xiang Xia, Xiangquan Li, Yanhong Wang and Jianheng Li

Processes 2025, 13(8), 2365; https://doi.org/10.3390/pr13082365 - 25 Jul 2025

Viewed by 383

Abstract

In the aluminum electrolysis production process, the traditional cell control method based on cell voltage and series current can no longer meet the goals of energy conservation, consumption reduction, and digital-intelligent transformation. Therefore, a new digital cell control technology that is centrally dependent on various process parameters has become an urgent demand in the aluminum electrolysis industry. Among these parameters, the real-time online measurement of alumina concentration is one of the key inputs for implementing such technology. However, due to the harsh production environment and limitations of current sensor technologies, hardware-based detection of alumina concentration is difficult to achieve. To address this issue, this study proposes a soft-sensing model for alumina concentration based on a long short-term memory (LSTM) neural network optimized by a weighted average algorithm (WAA). The proposed method outperforms BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-LSTM-Attention, and CNN-BiLSTM-Attention models in terms of predictive accuracy. In comparison to LSTM models optimized using the Grey Wolf Optimizer (GWO), Harris Hawks Optimization (HHO), Optuna, Tornado Optimization Algorithm (TOC), and Whale Migration Algorithm (WMA), the WAA-enhanced LSTM model consistently achieves significantly better performance. This superiority is evidenced by lower MAE and RMSE values, along with higher R² and accuracy scores. The WAA-LSTM model remains stable throughout the training process and achieves the lowest final loss, further confirming the accuracy and superiority of the proposed approach.
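
The comparison above rests on MAE, RMSE, and R². For readers less familiar with these regression metrics, here is a minimal sketch of their standard definitions; the sample values are invented for illustration and are unrelated to the paper's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(measured, predicted), rmse(measured, predicted), r2(measured, predicted))
```

Lower MAE and RMSE and a higher R² all point in the same direction, but RMSE penalizes large individual errors more heavily than MAE does.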

(This article belongs to the Special Issue AI / Machine Learning Techniques as a Tool for Process Modeling and Product Design)

13 pages, 4726 KiB

Open Access Article

Interpretable Prediction and Analysis of PVA Hydrogel Mechanical Behavior Using Machine Learning

by Liying Xu, Siqi Liu, Anqi Lin, Zichuan Su and Daxin Liang

Gels 2025, 11(7), 550; https://doi.org/10.3390/gels11070550 - 16 Jul 2025

Viewed by 404

Abstract

Polyvinyl alcohol (PVA) hydrogels have emerged as versatile materials due to their exceptional biocompatibility and tunable mechanical properties, showing great promise for flexible sensors, smart wound dressings, and tissue engineering applications. However, rational design remains challenging due to complex structure–property relationships involving multiple formulation parameters. This study presents an interpretable machine learning framework for predicting PVA hydrogel tensile strain properties with emphasis on mechanistic understanding, based on a comprehensive dataset of 350 data points collected from a systematic literature review. XGBoost demonstrated superior performance after Optuna-based optimization, achieving R² values of 0.964 for training and 0.801 for testing. SHAP analysis provided unprecedented mechanistic insights, revealing that PVA molecular weight dominates mechanical performance (SHAP importance: 84.94) through chain entanglement and crystallization mechanisms, followed by degree of hydrolysis (72.46) and cross-linking parameters. The interpretability analysis identified optimal parameter ranges and critical feature interactions, elucidating complex non-linear relationships and reinforcement mechanisms. By addressing the “black box” limitation of machine learning, this approach enables rational design strategies and mechanistic understanding for next-generation multifunctional hydrogels.
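
SHAP attributions are based on Shapley values from cooperative game theory. To make the idea concrete, the sketch below computes exact Shapley values for a tiny hand-written function by averaging each feature's marginal contribution over all feature orderings. This is an illustration of the underlying definition only: the toy model, the single all-zero baseline, and the function names are assumptions, and practical SHAP tooling relies on faster approximations and a background dataset rather than brute-force enumeration.

```python
from itertools import permutations


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point.

    Features are switched from their baseline value to their actual value
    one at a time, in every possible order, and the resulting changes in
    f are averaged per feature.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(baseline)
        previous = f(z)
        for i in order:
            z[i] = x[i]
            current = f(z)
            phi[i] += current - previous
            previous = current
    return [value / len(orders) for value in phi]


def toy_model(z):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # the interaction term is split evenly between features 1 and 2
```

The attributions sum to the difference between the model output at the point of interest and at the baseline, which is the additivity property that makes Shapley-based importance scores comparable across features.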

(This article belongs to the Special Issue Research Progress and Application Prospects of Gel Electrolytes)

22 pages, 8891 KiB

Open Access Article

Mapping Soil Available Nitrogen Using Crop-Specific Growth Information and Remote Sensing

by Xinle Zhang, Yihan Ma, Shinai Ma, Chuan Qin, Yiang Wang, Huanjun Liu, Lu Chen and Xiaomeng Zhu

Agriculture 2025, 15(14), 1531; https://doi.org/10.3390/agriculture15141531 - 15 Jul 2025

Viewed by 487

Abstract

Soil available nitrogen (AN) is a critical nutrient for plant absorption and utilization. Accurately mapping its spatial distribution is essential for improving crop yields and advancing precision agriculture. In this study, 188 AN soil samples (0–20 cm) were collected at Heshan Farm, Nenjiang County, Heihe City, Heilongjiang Province, in 2023. The soil available nitrogen content ranged from 65.81 to 387.10 mg kg⁻¹, with a mean value of 213.85 ± 61.16 mg kg⁻¹. Sentinel-2 images and time series of the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI) were acquired on the Google Earth Engine (GEE) platform for the study area during the bare soil period (April, May, and October) and the growth period (June–September). These remote sensing variables were combined with soil sample data, crop type information, and crop growth period data as predictive factors and input into a Random Forest (RF) model tuned with the Optuna hyperparameter optimization framework. The accuracy of different strategies was evaluated using 5-fold cross-validation. The results indicate that (1) introducing growth information from different growth periods of soybean and maize has different effects on the accuracy of soil AN mapping. In soybean fields, introducing EVI data from the pod-setting period increased the mapping R² by 0.024–0.088 compared to other growth periods. In maize fields, introducing EVI data from the grain-filling period increased R² by 0.004–0.033 compared to other growth periods, which is closely related to the nitrogen absorption intensity and spectral response characteristics during the reproductive growth period of crops. (2) Combining crop type with growth information from each crop’s optimal period improved the mapping accuracy over using the bare soil period image alone (R² = 0.597): R² increased by 0.035 to 0.632, and the root mean square error (RMSE) decreased by 0.504%. (3) The mapping accuracy of bare soil period images differed significantly among months, with the spring data outperforming the fall data: R² improved by 0.106 and 0.100 relative to the fall image, and April was the optimal bare-soil window in the present study area. The study shows that when mapping the soil AN content in arable land, crop type, data collection time, and crop growth differences should be considered together, and that combining specific crop types with their optimal period growth information has considerable potential to improve the accuracy of soil AN mapping. This method not only opens up a new technological path to improve the accuracy of remote sensing mapping of soil attributes but also lays a solid foundation for the research and development of precision agriculture and sustainability.
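
The two vegetation indices used as predictors have standard closed-form definitions. The sketch below computes them from surface reflectance values in the range 0 to 1. The EVI coefficients (gain 2.5, aerosol terms 6 and 7.5, canopy adjustment 1) are the commonly used defaults, and the abstract does not state whether the authors used exactly these; the sample reflectances are invented. For Sentinel-2, the near-infrared, red, and blue inputs are typically taken from bands B8, B4, and B2.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced vegetation index with the commonly used default coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical reflectances for a vegetated pixel.
nir, red, blue = 0.5, 0.1, 0.05
print(round(ndvi(nir, red), 3), round(evi(nir, red, blue), 3))
```

EVI includes the blue band and a soil adjustment term, which tends to make it less prone to saturation over dense canopies than NDVI; this may be one reason growth-period EVI was informative in the study.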

(This article belongs to the Topic Advances in Smart Agriculture with Remote Sensing as the Core and Its Applications in Crops Field)

22 pages, 1906 KiB

Open Access Article

Explainable and Optuna-Optimized Machine Learning for Battery Thermal Runaway Prediction Under Class Imbalance Conditions

by Abir El Abed, Ghalia Nassreddine, Obada Al-Khatib, Mohamad Nassereddine and Ali Hellany

Thermo 2025, 5(3), 23; https://doi.org/10.3390/thermo5030023 - 15 Jul 2025

Viewed by 541

Abstract

Modern energy storage systems for both power and transportation rely heavily on lithium-ion batteries (LIBs). However, their safety is threatened by a potentially hazardous failure mode known as thermal runaway (TR). Predicting and classifying TR causes can greatly enhance the safety of power and transportation systems. This paper presents an advanced machine learning method for forecasting and classifying the causes of TR. A generative model for synthetic data generation was used to handle class imbalance in the dataset. Hyperparameter optimization was conducted using Optuna for four classifiers: Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), tabular network (TabNet), and Extreme Gradient Boosting (XGBoost). A three-fold cross-validation approach was used to guarantee a robust evaluation. An open-source database of LIB failure events is used for model training and testing. The XGBoost model outperforms the other models across all TR categories by achieving 100% accuracy and a high recall (1.00). Model results were interpreted using SHapley Additive exPlanations analysis to identify the most significant TR predictors. The findings show that important TR indicators include energy adjusted for heat and weight loss, heater power, average cell temperature upon activation, and heater duration. These findings guide the design of safer battery systems and preventive monitoring systems for real applications. They can help experts develop more efficient battery management systems, thereby improving the performance and longevity of battery-operated devices. By enhancing the predictive knowledge of temperature-driven failure mechanisms in LIBs, the study directly advances thermal analysis and energy storage safety domains.

23 pages, 6067 KiB

Open Access Article

Daily-Scale Fire Risk Assessment for Eastern Mongolian Grasslands by Integrating Multi-Source Remote Sensing and Machine Learning

by Risu Na, Byambakhuu Gantumur, Wala Du, Sainbuyan Bayarsaikhan, Yu Shan, Qier Mu, Yuhai Bao, Nyamaa Tegshjargal and Battsengel Vandansambuu

Fire 2025, 8(7), 273; https://doi.org/10.3390/fire8070273 - 11 Jul 2025

Viewed by 827

Abstract

Frequent wildfires in the eastern grasslands of Mongolia pose significant threats to the ecological environment and pastoral livelihoods, creating an urgent need for high-temporal-resolution and high-precision fire prediction. To address this, this study established a daily-scale grassland fire risk assessment framework integrating multi-source remote sensing data to enhance predictive capabilities in eastern Mongolia. Utilizing fire point data from eastern Mongolia (2012–2022), we fused multiple feature variables and developed and optimized three models: random forest (RF), XGBoost, and deep neural network (DNN). Model performance was enhanced using Bayesian hyperparameter optimization via Optuna. Results indicate that the Bayesian-optimized XGBoost model achieved the best generalization performance, with an overall accuracy of 92.3%. Shapley additive explanations (SHAP) interpretability analysis revealed that three daily-scale meteorological factors (daily average relative humidity, daily average wind speed, and daily maximum temperature) and the normalized difference vegetation index (NDVI) were consistently among the top four contributing variables across all three models, identifying them as key drivers of fire occurrence. Spatiotemporal validation using historical fire data from 2023 demonstrated that fire points recorded on 8 April and 1 May 2023 fell within areas predicted to have “extremely high” fire risk probability on those respective days. Moreover, points A (117.36° E, 46.70° N) and B (116.34° E, 49.57° N) exhibited the highest number of days classified as “high” or “extremely high” risk during the April/May and September/October periods, consistent with actual fire occurrences. In summary, the integration of multi-source data fusion and Bayesian-optimized machine learning has enabled the first high-precision daily-scale wildfire risk prediction for the eastern Mongolian grasslands, thus providing a scientific foundation and decision-making support for wildfire prevention and control in the region.

(This article belongs to the Special Issue Machine Learning (ML) and Deep Learning (DL) Applications in Wildfire Science: Principles, Progress and Prospects (2nd Edition))

25 pages, 7504 KiB

Open Access Article

Explainable Artificial Intelligence (XAI) for Flood Susceptibility Assessment in Seoul: Leveraging Evolutionary and Bayesian AutoML Optimization

by Kounghoon Nam, Youngkyu Lee, Sungsu Lee, Sungyoon Kim and Shuai Zhang

Remote Sens. 2025, 17(13), 2244; https://doi.org/10.3390/rs17132244 - 30 Jun 2025

Viewed by 594

Abstract

This study aims to enhance the accuracy and interpretability of flood susceptibility mapping (FSM) in Seoul, South Korea, by integrating automated machine learning (AutoML) with explainable artificial intelligence (XAI) techniques. Ten topographic and environmental conditioning factors were selected as model inputs. We first employed the Tree-based Pipeline Optimization Tool (TPOT), an evolutionary AutoML algorithm, to construct baseline ensemble models using Gradient Boosting (GB), Random Forest (RF), and XGBoost (XGB). These models were further fine-tuned using Bayesian optimization via Optuna. To interpret the model outcomes, SHAP (SHapley Additive exPlanations) was applied to analyze both the global and local contributions of each factor. The SHAP analysis revealed that lower elevation, slope, and stream distance, as well as higher stream density and built-up areas, were the most influential factors contributing to flood susceptibility. Moreover, interactions between these factors, such as built-up areas located on gentle slopes near streams, further intensified flood risk. The susceptibility maps were reclassified into five categories (very low to very high), and the GB model identified that approximately 15.047% of the study area falls under very-high-flood-risk zones. Among the models, the GB classifier achieved the highest performance, followed by XGB and RF. The proposed framework, which integrates TPOT, Optuna, and SHAP within an XAI pipeline, not only improves predictive capability but also offers transparent insights into feature behavior and model logic. These findings support more robust and interpretable flood risk assessments for effective disaster management in urban areas.

(This article belongs to the Special Issue Artificial Intelligence for Natural Hazards (AI4NH))

31 pages, 1127 KiB

Open Access Article

Optimizing Credit Risk Prediction for Peer-to-Peer Lending Using Machine Learning

by Lyne Imene Souadda, Ahmed Rami Halitim, Billel Benilles, José Manuel Oliveira and Patrícia Ramos

Forecasting 2025, 7(3), 35; https://doi.org/10.3390/forecast7030035 - 29 Jun 2025

Viewed by 956

Abstract

Hyperparameter optimization (HPO) is critical for enhancing the predictive performance of machine learning models in credit risk assessment for peer-to-peer (P2P) lending. This study evaluates four HPO methods, Grid Search, Random Search, Hyperopt, and Optuna, across four models, Logistic Regression, Random Forest, XGBoost, and LightGBM, using three real-world datasets (Lending Club, Australia, Taiwan). We assess predictive accuracy (AUC, Sensitivity, Specificity, G-Mean), computational efficiency, robustness, and interpretability. LightGBM achieves the highest AUC (e.g., 70.77% on Lending Club, 93.25% on Australia, 77.85% on Taiwan), with XGBoost performing comparably. Bayesian methods (Hyperopt, Optuna) match or approach Grid Search’s accuracy while reducing runtime by up to 75.7-fold (e.g., 3.19 vs. 241.47 min for LightGBM on Lending Club). A sensitivity analysis confirms robust hyperparameter configurations, with AUC variations typically below 0.4% under ±10% perturbations. A feature importance analysis, using gain and SHAP metrics, identifies debt-to-income ratio and employment title as key default predictors, with stable rankings (Spearman correlation > 0.95, p < 0.01) across tuning methods, enhancing model interpretability. Operational impact depends on data quality, scalable infrastructure, fairness audits for features like employment title, and stakeholder collaboration to ensure compliance with regulations like the EU AI Act and U.S. Equal Credit Opportunity Act. These findings advocate Bayesian HPO and ensemble models in P2P lending, offering scalable, transparent, and fair solutions for default prediction, with future research suggested to explore advanced resampling, cost-sensitive metrics, and feature interactions.
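
Two of the quantities in this abstract are simple to reproduce. The sketch below checks the reported runtime reduction from the two runtimes given for LightGBM on Lending Club, and shows the usual definition of G-Mean as the geometric mean of sensitivity and specificity. The sensitivity and specificity passed to `g_mean` are hypothetical placeholders, since the abstract does not report them.

```python
import math


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the quicker method is."""
    return slow_minutes / fast_minutes


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


# Runtimes quoted in the abstract (minutes).
print(round(speedup(241.47, 3.19), 1))  # 75.7, matching the reported figure

# Hypothetical values, for illustration only.
print(round(g_mean(0.64, 0.81), 2))
```

G-Mean is often preferred over plain accuracy for imbalanced problems such as default prediction, because it drops sharply when either class is classified poorly.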

(This article belongs to the Special Issue Feature Papers of Forecasting 2025)

17 pages, 4941 KiB

Open Access Article

Estimating Soil Cd Contamination in Wheat Farmland Using Hyperspectral Data and Interpretable Stacking Ensemble Learning

by Liang Zhong, Meng Ding, Shengjie Yang, Xindan Xu, Jianlong Li and Zhengguo Sun

Agronomy 2025, 15(7), 1574; https://doi.org/10.3390/agronomy15071574 - 27 Jun 2025

Viewed by 325

Abstract

Soil heavy metal pollution threatens agricultural safety and human health, with Cd exceeding standards being the most common problem in contaminated farmland. The development of hyperspectral remote sensing technology has provided a novel means of rapidly and non-destructively monitoring heavy metal contamination in soil. This study aims to explore the potential of an interpretable Stacking ensemble learning model for estimating soil Cd contamination from farmland hyperspectral data. We hypothesize that this method can improve the modeling accuracy. We chose Zhangjiagang City, Jiangsu Province, China, as the study area. We gathered soil samples from wheat fields and analyzed soil spectral data and Cd levels in the lab. First, we pre-processed the spectra utilizing fractional-order derivative (FOD) and standard normal variate (SNV) transforms to highlight the spectral features. Second, we applied the competitive adaptive reweighted sampling (CARS) feature selection algorithm to identify the significant wavelengths correlated with soil Cd content. Then, we constructed and compared the estimation accuracy of multiple machine learning models and a Stacking ensemble learning method and utilized the Optuna method for hyperparameter optimization. Ultimately, the SHAP method was used to shed light on the model’s decision-making process. The results show that (1) FOD can further highlight the spectral features, thereby strengthening the correlation between soil Cd content and wavelength; (2) the CARS algorithm extracted 3.4–6.8% of the feature wavelengths from the full spectrum, and most of them were the wavelengths with high correlation with soil Cd; (3) the optimal estimation precision was achieved using the FOD1.5-SNV spectral pre-processing combined with the Stacking model (R² = 0.77, RMSE = 0.05 mg/kg, RPD = 2.07), and the model effectively quantitatively estimated soil Cd contamination; and (4) SHAP further revealed the contribution of each base model and characteristic wavelengths in the Stacking modeling process. This research confirms the advantages of the interpretable Stacking model in hyperspectral estimation of Cd contamination in farmland wheat soil. Furthermore, it offers a foundational reference for the future implementation of quantitative and non-destructive regional monitoring of heavy metal contamination in farmland soil.
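
Two of the building blocks named in this abstract have compact definitions. The sketch below shows the standard normal variate (SNV) transform, which centers and scales each spectrum by its own mean and standard deviation, and the ratio of performance to deviation (RPD), defined as the standard deviation of the reference values divided by the RMSE. Using the sample standard deviation is an assumption here, as conventions differ, and the spectrum and reference values are invented.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: per-spectrum centering and scaling."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1)
    return [(value - mu) / sigma for value in spectrum]


def rpd(reference_values, rmse):
    """Ratio of performance to deviation: SD of reference values over RMSE."""
    return stdev(reference_values) / rmse


# Hypothetical three-band spectrum and reference measurements.
print(snv([1.0, 2.0, 3.0]))
print(rpd([1.0, 2.0, 3.0], 0.5))

# With the reported RPD = 2.07 and RMSE = 0.05 mg/kg, the implied standard
# deviation of the evaluation samples is roughly 2.07 * 0.05 mg/kg.
print(2.07 * 0.05)
```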

(This article belongs to the Special Issue Response of Agroecosystem Carbon–Water Cycling and Crop Spatial Patterns to Environmental Change: Mechanisms and Adaptive Management)

39 pages, 4528 KiB

Open AccessArticle

Prediction of Unconfined Compressive Strength in Cement-Treated Soils: A Machine Learning Approach

by Iancu-Bogdan Teodoru, Zakaria Owusu-Yeboah, Mircea Aniculăesi, Andreea Vasilica Dascălu, Florian Hörtkorn, Alessia Amelio and Irina Lungu

Appl. Sci. 2025, 15(13), 7022; https://doi.org/10.3390/app15137022 - 22 Jun 2025

Viewed by 1156

Abstract

This study integrates systematic laboratory testing with advanced machine learning techniques to predict the unconfined compressive strength (UCS) of cement-treated clayey silt from northwestern Iași, Romania. Laboratory experiments generated 185 UCS measurements, examining the effects of cement content, curing period, and compaction velocity on strength development. Fourteen regression algorithms were initially screened, with the top three performers subsequently evaluated using nested cross-validation and Bayesian hyperparameter optimization via the Optuna framework. Correlation analysis identified cement content as the primary factor, with curing period as moderately influential and compaction rate having minimal impact when target density was achieved. Random Forest emerged as the optimal algorithm, providing robust and accurate UCS predictions. Beyond standard predictions, a two-stage uncertainty quantification system was implemented, allowing for both central estimates and reliable confidence intervals. SHAP analysis confirmed the dominant roles of cement content and curing period and enabled mechanistic interpretation of parameter contributions. The complete predictive system is available as a public web application, enabling geotechnical engineers to obtain rapid UCS predictions with quantified uncertainty, supporting efficient ground improvement design and risk assessment.
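
The nested cross-validation mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: a one-feature k-nearest-neighbour regressor stands in for Random Forest, a plain grid search stands in for Optuna's Bayesian optimization, the data are synthetic, and all function names are hypothetical. The structure is the point: the inner loop selects the hyperparameter using only the outer training data, and the outer loop scores that choice on data it never saw.

```python
import random
from statistics import mean


def k_folds(indices, k):
    """Split a list of indices into k roughly equal folds."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def mse(train, test, k):
    """Mean squared error of the k-NN stand-in model on a test set."""
    return mean((knn_predict(train, x, k) - y) ** 2 for x, y in test)


def nested_cv(data, candidate_ks, outer_k=5, inner_k=3, seed=0):
    """Return one (chosen hyperparameter, outer test MSE) pair per outer fold."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    outer = k_folds(indices, outer_k)
    results = []
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(train_idx, inner_k)

        def inner_score(k):
            # Average validation error over the inner folds; the outer
            # test fold is never touched during hyperparameter selection.
            scores = []
            for m, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != m for j in fold]
                scores.append(mse([data[j] for j in fit_idx],
                                  [data[j] for j in val_idx], k))
            return mean(scores)

        best_k = min(candidate_ks, key=inner_score)
        outer_score = mse([data[j] for j in train_idx],
                          [data[j] for j in test_idx], best_k)
        results.append((best_k, outer_score))
    return results


# Synthetic data (NOT from the study): y = 2x plus Gaussian noise.
noise = random.Random(42)
data = [(float(x), 2.0 * x + noise.gauss(0.0, 1.0)) for x in range(30)]

results = nested_cv(data, candidate_ks=[1, 3, 5])
for fold, (best_k, score) in enumerate(results):
    print(f"outer fold {fold}: chosen k = {best_k}, test MSE = {score:.3f}")
```

Averaging the outer-fold errors gives a performance estimate that is not biased by the hyperparameter search, which is the usual reason to prefer nested over single-loop cross-validation on small datasets.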

(This article belongs to the Section Civil Engineering)

26 pages, 4550 KiB

Open AccessArticle

Optimization of Rockburst Grade Prediction Model Based on Multidimensional Feature Selection: Integrated Learning and Index System Correlation Analysis

by Jiayang Chen and Xuebin Xie

Appl. Sci. 2025, 15(12), 6466; https://doi.org/10.3390/app15126466 - 9 Jun 2025

Viewed by 451

Abstract

Rockburst is a major disaster in deep underground engineering, and its prediction is crucial for engineering safety. This study proposes an optimization method based on multidimensional feature selection and integrated learning. It systematically evaluates the impact of different indicator dimensions by constructing an indicator–indicator system and an indicator–rockburst hierarchy from combinations of seven-, six-, five-, four-, and three-dimensional indicators, in conjunction with six machine-learning models, including XGBoost, LightGBM, and CatBoost. The results show that tree models (e.g., CatBoost and LightGBM) are naturally resistant to multicollinearity, and that PCA preprocessing destroys their nonlinear feature relationships, leading to performance degradation. CatBoost has the best performance and strong resistance to overfitting; LightGBM ranks second and is efficient enough for real-time applications. The indicator–indicator system has better overall performance but less stability, and the indicator–rockburst system has slightly lower performance but a more stable downward trend. In both systems, the six-dimensional indicator set balances performance and complexity and is the optimal choice for engineering applications. This study provides theoretical support and a practical reference for rockburst prediction and for the selection of an evaluation index system.

19 pages, 840 KiB

Open AccessArticle

A Dual-Feature Framework for Enhanced Diagnosis of Myeloproliferative Neoplasm Subtypes Using Artificial Intelligence

by Amna Bamaqa, N. S. Labeeb, Eman M. El-Gendy, Hani M. Ibrahim, Mohamed Farsi, Hossam Magdy Balaha, Mahmoud Badawy and Mostafa A. Elhosseini

Bioengineering 2025, 12(6), 623; https://doi.org/10.3390/bioengineering12060623 - 7 Jun 2025

Viewed by 772

Abstract

Myeloproliferative neoplasms, particularly the Philadelphia chromosome-negative (Ph-negative) subtypes such as essential thrombocythemia, polycythemia vera, and primary myelofibrosis, present diagnostic challenges due to overlapping morphological features and clinical heterogeneity. Traditional diagnostic approaches, including imaging and histopathological analysis, are often limited by interobserver variability, delayed diagnosis, and subjective interpretations. To address these limitations, we propose a novel framework that integrates handcrafted and automatic feature extraction techniques for improved classification of Ph-negative myeloproliferative neoplasms. Handcrafted features capture interpretable morphological and textural characteristics. In contrast, automatic features utilize deep learning models to identify complex patterns in histopathological images. The extracted features were used to train machine learning models, with hyperparameter optimization performed using Optuna. Our framework achieved high performance across multiple metrics, including precision, recall, F1 score, accuracy, specificity, and weighted average. The concatenated probabilities, which combine both feature types, demonstrated the highest mean weighted average of 0.9969, surpassing the individual performances of handcrafted (0.9765) and embedded features (0.9686). Statistical analysis confirmed the robustness and reliability of the results. However, challenges remain in assuming normal distributions for certain feature types. This study highlights the potential of combining domain-specific knowledge with data-driven approaches to enhance diagnostic accuracy and support clinical decision-making.

(This article belongs to the Special Issue Deciphering Medicine: The Role of Explainable Artificial Intelligence in Healthcare Innovations, 2nd Edition)

Search Results (73)
