Search Results (3,115)

Search Parameters:
Keywords = ensembling algorithms

26 pages, 3911 KB  
Article
Parametric Optimization of VLM Panel Discretization Using Bio-Inspired Crayfish and Aquila Algorithms Coupled with Hybrid RSM-Based Ensemble Machine Learning Surrogate Models: A Case Study
by Yüksel Eraslan and Esmanur Şengün
Biomimetics 2026, 11(3), 204; https://doi.org/10.3390/biomimetics11030204 - 11 Mar 2026
Abstract
Fast and reliable aerodynamic predictions are crucial in the early phases of aircraft design, where a quick assessment of various configurations is required. In this context, the Vortex Lattice Method (VLM) is widely adopted due to its computational efficiency; however, its predictive accuracy is highly sensitive to panel discretization strategies, which are often determined heuristically. This study proposes a bio-inspired optimization framework for VLM panel discretization and evaluates it through a systematic case study on a representative wing geometry. A grid-convergence analysis was initially carried out to ensure solution independence across various spanwise-to-chordwise panel ratios. Subsequently, a novel Hybrid Response Surface Methodology (HRSM), integrating Box–Behnken and Central Composite experimental designs, was employed to enable a more comprehensive exploration of the factor space while quantifying the effects of clustering parameters at the leading-edge, trailing-edge, root, and tip regions of the wing. The HRSM dataset was further utilized to train Ensemble Machine-Learning surrogate models, which were coupled with bio-inspired Crayfish and Aquila optimization algorithms, alongside a classical Genetic Algorithm (GA) as a performance benchmark, to identify the optimal discretization strategy and to enable a comparative assessment of their convergence behavior and robustness against the numerical noise of the ensemble-based landscape. Compared to base (i.e., uniform) panel distribution, the optimally clustered discretization enhanced overall aerodynamic prediction accuracy by approximately 33%, particularly at low angles of attack, while maintaining robust performance at higher angles. Both algorithms converged to similar minima; however, the Aquila algorithm achieved higher solution consistency, whereas the Crayfish algorithm exhibited greater dispersion despite faster convergence, revealing a multimodal optimization landscape. 
The variance decomposition revealed that trailing-edge clustering dominated aerodynamic accuracy at low angles of attack, contributing up to 90% of the total variance, whereas tip clustering became increasingly influential at higher angles, exceeding 30%, highlighting the need for adaptive discretization strategies to ensure reliable VLM-based aerodynamic analyses.

25 pages, 3570 KB  
Article
A Context-Aware Flood Warning Framework Integrating Ensemble Learning and LLMs
by Adnan Ahmed Abi Sen, Fares Hamad Aljohani, Nour Mahmoud Bahbouh, Adel Ben Mnaouer, Omar Tayan and Ahmad B. Alkhodre
GeoHazards 2026, 7(1), 35; https://doi.org/10.3390/geohazards7010035 - 11 Mar 2026
Abstract
Smart cities require effective disaster management (like flooding, solar storms, sandstorms, or hurricanes), as it directly impacts people’s lives. The key challenges of disaster management are timely detection and effective notification during the crisis. This research presents a smart multi-layer framework for notification classification and management before and during flooding disasters. The framework includes an early detection module as the main phase in the alerting process. This step depends on an Ensemble Learning (EL) model based on a triad of the three best selected models (Deep Learning (DL), Random Forest (RF), and K-nearest Neighbor (KNN)) to analyze data collected continuously from the Internet of Things (IoT) layer. In the boosting phase, the framework utilizes Large Language Models (LLMs) with DL to analyze social textual crowdsourcing data. The results will enable the framework to identify the most affected areas during a flood. The framework adds a fog computing layer alongside a cloud layer to enable instantaneous processing of user responses and generate specialized alerts based on contextual factors such as location, time, risk level, alert type, and user characteristics. Through testing and implementation, the proposed algorithms demonstrated an accuracy rate of over 98% in detecting threats using a dataset of real, collected weather and flooding data. Additionally, the framework proposes a centralized control panel and a design of a smartphone application that offers essential services and facilitates communication among managed civil defense teams, citizens, and volunteers.

18 pages, 1664 KB  
Article
Forest Restoration Potential and Carbon-Stock Interface: Integration of Spectroscopy-Derived Biomass Maps with Machine-Learning Regression Models
by Varaprasad Anupoju, Boddeda Eswar Rao, Kare Satish, Adduri Sai Pavan Kalyan, Kondapalli Krishna Kavya and Venkata Ravi Sankar Cheela
Spectrosc. J. 2026, 4(1), 5; https://doi.org/10.3390/spectroscj4010005 - 10 Mar 2026
Abstract
Forests are vital regulators of global carbon balance, yet accelerating deforestation and land-use conversion continue to erode their capacity to sequester carbon. This research quantifies forest restoration and carbon sequestration potential across Visakhapatnam, India, by integrating imaging spectroscopy with machine learning at medium spatial resolution. Using 33 spectral and environmental predictors, an ensemble Random Forest model was developed and benchmarked against a K-Nearest Neighbors algorithm. The Random Forest approach demonstrated markedly higher predictive strength, explaining 87% of the spatial variability in tree cover, while maintaining low error margins. By excluding agricultural and urban areas, the analysis identified approximately 104,800 hectares of restorable land. The restorable area corresponds to an estimated carbon sequestration potential of about 0.12 petagrams, underscoring the district’s significant yet underutilized capacity to contribute to regional and national climate goals. The research highlights how integrating spectroscopy-derived vegetation metrics with ensemble learning enables spatially precise, policy-relevant restoration planning. By linking medium-resolution environmental data with carbon accounting, this framework advances a scalable pathway for data-driven forest recovery and nature-based climate mitigation, bridging the gap between site-specific ecological assessments and large-scale sustainability initiatives.

20 pages, 3757 KB  
Article
Short-Term Photovoltaic Power Forecasting Using a Hybrid RF-ICEEMDAN-SE-RWCE-GRU Model
by Chuang Li, Xiaohuang Huang, Mang Su, Huanhuan Duan, Weile Cao and Guomin Cui
Energies 2026, 19(6), 1386; https://doi.org/10.3390/en19061386 - 10 Mar 2026
Abstract
To enhance the accuracy of short-term photovoltaic (PV) power forecasting, this study proposes a novel hybrid model that integrates Random Forest (RF), Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN), Sample Entropy (SE), the Random Walk with Compulsory Evolution (RWCE) algorithm, and the Gated Recurrent Unit (GRU) network. Initially, RF is applied to select relevant meteorological features, minimizing redundancy and improving both training efficiency and predictive robustness under complex operating conditions. ICEEMDAN is then employed to decompose the PV power series into multiple quasi-stationary components, mitigating the adverse effects of non-stationarity on forecasting accuracy. Following this, SE is applied to quantify the complexity of each component and reconstruct the decomposed signals into high-, mid-, and low-frequency bands, simplifying the inputs to the forecasting model. To further improve performance, the RWCE algorithm optimizes GRU network hyperparameters through global exploration, individual evolution, and enforced evolution strategies. The optimized GRU network then predicts each reconstructed component, and the component-wise forecasts are aggregated to yield the final PV power output. Simulation results from several representative months indicate that the proposed approach reduces RMSE by an average of 9.02% compared to the comparison model and by 43.41% relative to the baseline model, demonstrating its superior forecasting capability. Additionally, the model demonstrated scalability across varying climate conditions, confirming its applicability in real-world scenarios.
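The role SE plays in this pipeline, scoring each decomposed component's complexity so components can be regrouped into high-, mid-, and low-frequency bands, can be illustrated with a minimal Sample Entropy sketch. This is a generic implementation, not the authors' code; the template length `m = 2` and tolerance `r = 0.2 * std` defaults are common conventions, not values from the paper:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample Entropy SampEn(m, r) of a 1-D series.

    Counts template pairs of length m whose Chebyshev distance is
    below r, repeats for length m + 1, and returns -ln(A / B).
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)  # conventional tolerance choice
    n = len(x)

    def count_matches(mm):
        # Overlapping templates of length mm.
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance to all later templates (no self-matches).
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d < r)
        return count

    B = count_matches(m)      # matches at length m
    A = count_matches(m + 1)  # matches at length m + 1
    return -np.log(A / B) if A > 0 and B > 0 else float("inf")
```

A regular signal (e.g. a sine wave) scores low, while white noise scores high, which is exactly the property used to sort components into frequency bands.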

20 pages, 508 KB  
Article
Predictive Modelling of Credit Default Risk Using Machine Learning and Ensemble Techniques
by Mofoka Rebuseditsoe Mathibela and Daniel Maposa
Math. Comput. Appl. 2026, 31(2), 45; https://doi.org/10.3390/mca31020045 - 10 Mar 2026
Abstract
This study develops a hybrid framework integrating ensemble learning with explainable artificial intelligence to address the methodological challenge of balancing predictive accuracy and interpretability in credit risk model comparison. Using the German Credit Dataset, we implemented a comprehensive preprocessing pipeline, including feature encoding, scaling, and SMOTE for class imbalance handling. Four base models (logistic regression, Random Forest, XGBoost, and Multilayer Perceptron) were combined through a Stacked Ensemble with a logistic regression meta-learner. The ensemble demonstrated strong performance, achieving an AUC of 0.761, precision of 0.783, recall of 0.806, and an F1 score of 0.794, which represented the highest scores among all models tested. Notably, Random Forest (AUC = 0.749) surpassed XGBoost (AUC = 0.733), challenging conventional algorithmic hierarchies. SHAP analysis provided transparent global and local interpretability, identifying Current Account status (SHAP = 0.153), Loan Duration (0.064), and Savings Account (0.063) as dominant predictor variables. Class-imbalance handling and threshold optimisation enhanced practical utility by reducing false positives from 39 to 16, thereby aligning with financial risk priorities. The framework provides a reproducible methodological pipeline for systematically comparing credit scoring approaches, demonstrating how predictive performance can be evaluated alongside interpretability considerations within a benchmark dataset context.
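SMOTE, used here for class-imbalance handling, synthesizes new minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal NumPy sketch of that idea follows; it is illustrative only (in practice one would reach for `imblearn.over_sampling.SMOTE`), and the function name and parameters are hypothetical:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples, SMOTE-style.

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbours per sample
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))     # random minority sample
        nb = X_min[rng.choice(nn[i])]    # one of its neighbours
        gap = rng.random()               # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + gap * (nb - X_min[i])
    return synthetic
```

Because each synthetic point is a convex combination of two real minority samples, the oversampled set stays inside the minority class's convex hull rather than duplicating points outright.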

27 pages, 5395 KB  
Article
ML-Driven Decision Support for Dynamic Modeling of Calcareous Sands
by Abdalla Y. Almarzooqi, Mohamed G. Arab, Maher Omar and Emran Alotaibi
Mach. Learn. Knowl. Extr. 2026, 8(3), 68; https://doi.org/10.3390/make8030068 - 9 Mar 2026
Abstract
Dynamic characterization of calcareous (carbonate) sands is essential for performance-based design of offshore foundations, coastal reclamation, and marine infrastructure in tropical and subtropical regions. In contrast to silica sands, carbonate sediments are biogenic and typically comprise angular, irregular grains with intra-particle voids and fragile skeletal microstructure. These traits promote grain crushing and fabric evolution at relatively low-to-moderate confinement, leading to pronounced stress dependency, strong nonlinearity with strain amplitude, and substantial scatter in laboratory stiffness and damping measurements. Consequently, empirical correlations calibrated primarily on quartz sands may yield biased estimates when transferred to carbonate environments. This study presents an ML-driven, leakage-aware benchmarking framework for predicting two key dynamic parameters of biogenic calcareous sands, damping ratio D and shear modulus G, using standard tabular descriptors commonly available in geotechnical practice. Two consolidated experimental databases were curated from resonant column and cyclic triaxial measurements (D: n=890; G: n=966), spanning mean effective confining stresses of 25 ≤ σm ≤ 1600 kPa and a wide range of density and gradation conditions. To emphasize transferability, explicit deposit/site labels were excluded, and missingness arising from heterogeneous reporting was handled through a consistent preprocessing pipeline (training-only imputation, categorical encoding, and scaling). Eleven regression algorithms were evaluated, covering linear baselines, regularized regression, neighborhood learning, single trees, bagging and boosting ensembles, kernel regression, and a feedforward neural network. Performance was assessed using R², RMSE, and MAE on training/validation/test splits, and engineering credibility was supported through explainability-based diagnostics to verify mechanically plausible sensitivities.
Results show that ensemble-tree models (Extra Trees and Random Forest) provide the most reliable accuracy–robustness balance across both targets, consistently outperforming linear models and the tested SVR configuration and exhibiting stable validation-to-test behavior. The explainability audit confirms physically meaningful separation of governing controls: stiffness is primarily stress-controlled (σm dominant for G), whereas damping is primarily strain-controlled (γ dominant for D). The proposed framework supports practical deployment as a fast surrogate for generating G–γ and D–γ curves within the training domain and for guiding targeted laboratory test planning in carbonate settings.

25 pages, 2908 KB  
Article
Data-Driven Prediction of Compressive Strength in Concrete with Lightweight Expanded Clay Aggregate Using Machine Learning Techniques
by Soorya M. Nair, Anand Nammalvar and Diana Andrushia
J. Compos. Sci. 2026, 10(3), 151; https://doi.org/10.3390/jcs10030151 - 9 Mar 2026
Abstract
The growing need for sustainable and lightweight building materials has accelerated research on alternatives to conventional concretes, out of which Lightweight Expanded Clay Aggregate (LECA) concrete has emerged as a promising solution. However, the high porosity and nonlinear mechanical behavior of LECA concrete complicate the accurate prediction of compressive strength through conventional empirical models. The main focus of the paper is on identifying a comprehensive machine learning-based framework for modeling and predicting the 28-day compressive strength of LECA-based lightweight concrete. The dataset was created and preprocessed by using statistical normalization and correlation analysis. In this study, five supervised machine learning models—Multiple Linear Regression (MLR), Support Vector Regression (SVR), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost)—were developed and fine-tuned using a grid-search strategy combined with ten-fold cross-validation. The quality of the prediction made by each model was evaluated by means of standard performance indicators, such as the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). After the evaluation, the models were subsequently compared and ranked according to the Gray Relational Analysis (GRA) method. The comparative assessment shows that CatBoost demonstrated the most reliable performance, achieving an R2 of 0.907, RMSE of 3.41 MPa, MAE of 2.47 MPa, and MAPE of 10.05%, outperforming the remaining algorithms. To interpret the significance of features, SHAP (Shapley Additive exPlanations) analysis was applied, which identified water and LECA content as the dominant factors influencing compressive strength, followed by the cement and fine aggregate proportions. 
The findings reveal that the ensemble-based gradient boosting model is capable of capturing intricate nonlinear interactions, as observed in the heterogeneous matrix of LECA concrete.
(This article belongs to the Section Composites Applications)
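The Gray Relational Analysis ranking step works by normalizing each performance metric toward an ideal reference sequence and averaging the resulting grey relational coefficients into a single grade per model. A hedged sketch of that procedure is below; the function name, the resolving coefficient ζ = 0.5, and min-max normalization are conventional GRA choices, not necessarily the paper's exact formulation:

```python
import numpy as np

def gra_grades(metrics, larger_better, zeta=0.5):
    """Gray Relational Analysis grades for a models-by-metrics matrix.

    metrics[i, j] is metric j for model i; larger_better[j] says whether
    higher values of metric j are preferable (e.g. R^2) or not (e.g. RMSE).
    """
    M = np.asarray(metrics, dtype=float)
    norm = np.empty_like(M)
    for j in range(M.shape[1]):
        lo, hi = M[:, j].min(), M[:, j].max()
        span = hi - lo if hi > lo else 1.0   # guard constant columns
        if larger_better[j]:
            norm[:, j] = (M[:, j] - lo) / span
        else:
            norm[:, j] = (hi - M[:, j]) / span
    # Deviation from the ideal (all-ones) reference sequence.
    delta = np.abs(1.0 - norm)
    dmin, dmax = delta.min(), delta.max()
    # Grey relational coefficients, averaged over metrics into one grade.
    xi = (dmin + zeta * dmax) / (delta + zeta * dmax)
    return xi.mean(axis=1)
```

The model whose metrics are uniformly closest to the per-metric optima receives the highest grade, which is how a multi-metric comparison (R², RMSE, MAE, MAPE) collapses into a single ranking.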

31 pages, 5209 KB  
Review
AI-Driven Fault Detection and O&M for Wind Turbine Drivetrains: A Review of SCADA, CMS and Digital Twin Integration
by Ning Jia, Jiangzhe Feng, Zongyou Zuo, Zhiyi Liu, Tengyuan Wang, Chang Cai and Qingan Li
Energies 2026, 19(5), 1370; https://doi.org/10.3390/en19051370 - 7 Mar 2026
Abstract
The rapid expansion of wind energy has increased the operational complexity of wind turbines, where component degradation, environmental variability, and maintenance decisions are tightly coupled. Artificial intelligence (AI) has been widely applied to support fault detection and operation and maintenance (O&M), yet many existing studies remain fragmented and insufficiently address practical challenges such as heterogeneous data, sparse fault labels, and cross-site generalization. This review provides an engineering-oriented synthesis of AI-based methods for wind turbine fault detection and O&M, focusing on drivetrain diagnostics as a representative application. The literature is organized along an end-to-end O&M workflow, including SCADA-based condition monitoring, component-level fault diagnosis, health assessment and remaining useful life estimation, multi-modal blade inspection, and DT (Digital Twin) integration. Traditional ML (machine learning), ensemble methods, deep learning, physics-informed learning, and transfer learning are reviewed with respect to their data requirements, operational assumptions, and deployment constraints. Beyond algorithmic performance, this review discusses data governance, alarm design, model updating, and interpretability, and summarizes public datasets and emerging data resources. The aim is to bridge methodological advances and practical O&M requirements, supporting reliable and deployable AI applications in wind energy systems.
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)

35 pages, 5289 KB  
Article
Sentiment Classification of Amazon Product Reviews Based on Machine and Deep Learning Techniques: A Comparative Study
by Eman Daraghmi and Noora Zyadeh
Future Internet 2026, 18(3), 138; https://doi.org/10.3390/fi18030138 - 7 Mar 2026
Abstract
Sentiment classification plays a crucial role in analyzing customer feedback to identify market trends, enhance product recommendations, and improve customer satisfaction. This study focuses on sentiment analysis of Amazon reviews using two major datasets—Fine Food Reviews and Unlocked Mobile Reviews—which exhibit label imbalance. To address this challenge, both oversampling and undersampling techniques were applied to balance the datasets. Various machine learning (ML) algorithms, including Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB), and Gradient Boosting Machine (GBM), as well as deep learning (DL) models such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and transformer-based models like RoBERTa, were implemented. After data cleaning and preprocessing, models were trained, and performance was evaluated. The results indicate that oversampling significantly enhances classification accuracy, particularly for the Fine Food dataset. Among ML models, Random Forest achieved the highest accuracy due to its ensemble approach and robustness in handling high-dimensional data. DL models, particularly RoBERTa, also demonstrated superior performance owing to their capacity to capture contextual dependencies. The findings emphasize the importance of data balancing for optimal sentiment analysis and contribute valuable insights toward advancing automated opinion classification in e-commerce applications.
(This article belongs to the Section Big Data and Augmented Intelligence)

28 pages, 5263 KB  
Article
Inversion of Soil Arsenic Concentration in Sanlisha’an Mining Area Based on ZY-1 02E Hyperspectral Satellite Images
by Yuqin Li, Dan Meng, Qi Yang, Mengru Zhang and Yue Zhao
Remote Sens. 2026, 18(5), 822; https://doi.org/10.3390/rs18050822 - 6 Mar 2026
Abstract
Soil heavy metal pollution caused by mineral resource extraction activities poses a serious threat to the ecological environment within and surrounding mining areas. As a highly concealed toxic heavy metal, arsenic (As) urgently requires the establishment of efficient pollution monitoring methods to achieve pollution prevention and control, as well as environmental remediation in mining areas. This study investigated the feasibility of hyperspectral remote sensing inversion for soil heavy metal arsenic based on ZY-1 02E hyperspectral satellite imagery, focusing on a mining area and its surrounding soils in Sanlisha’an, Wuxuan County, Guangxi. Full Constrained Least Squares (FCLS) was employed to separate mixed pixels and enhance soil spectral contributions in ZY-1 02E imagery, thereby mitigating vegetation interference. Six mathematical transformations, including RT, AT, FD, RTFD, ATFD, and SD, were applied to both the original and enhanced spectra to enhance spectral features. The correlations between the transformed spectra, as well as the original image spectra (S), and soil As concentration were analyzed; then the spectra strongly correlated with soil As concentration were selected to construct Ratio Spectral Index (RSI) and Normalized Difference Spectral Index (NDSI). Correlation matrices were calculated between RSI/NDSI indices and As concentration. Sensitive features were screened using an improved Successive Projection Algorithm (SPA). As concentration inversion was also performed with four models: traditional regression models, PLSR and MLR, and ensemble learning models (RF and XGBoost). In the soil contribution-enhanced spectral modeling results, the optimal transformation–index combination is ATFD-NDSI. The performance indicators of each model are as follows: MLR test set R2 = 0.65, PLSR test set R2 = 0.62, RF test set R2 = 0.7, and XGBoost test set R2 = 0.64. The results indicate that the ATFD-NDSI-RF ensemble model provides the best performance. 
By integrating multiple decision trees, RF effectively handles complex nonlinear relationships, thus enhancing the accuracy and generalization ability of prediction. The analysis of ATFD-NDSI-RF inversion results based on sampling points indicates that model error correlates with the pollution intensity gradient, showing greater errors, especially in high-concentration areas, but still maintaining strong correlations (tailings reservoir: r = 0.92, forested areas: r = 0.96, and cropland: r = 0.83). The spatial distribution reveals that the inversion results closely match the spatial distribution of IDW interpolation. Areas with high As concentrations are concentrated in the tailings reservoir and in the southeastern part of the study area. The correlation coefficient between the inversion results and IDW interpolation is 0.6, which further verifies that the inversion results effectively reproduce the spatial distribution trend of highly polluted areas.
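The NDSI construction described above, pairing spectral bands and keeping the pair whose index correlates most strongly with As concentration, can be sketched as an exhaustive pair screen. This is illustrative only: the paper's actual screening also builds RSI indices, applies six spectral transformations, and refines the selection with an improved SPA, none of which appear here.

```python
import numpy as np

def best_ndsi_pair(spectra, target):
    """Screen all band pairs for the NDSI most correlated with a target.

    spectra: (n_samples, n_bands) reflectances; target: (n_samples,).
    NDSI(i, j) = (b_i - b_j) / (b_i + b_j). Returns (i, j, |r|).
    """
    n_bands = spectra.shape[1]
    best = (0, 1, 0.0)
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            idx = (spectra[:, i] - spectra[:, j]) / (spectra[:, i] + spectra[:, j])
            r = abs(np.corrcoef(idx, target)[0, 1])  # Pearson correlation
            if r > best[2]:
                best = (i, j, r)
    return best
```

With hundreds of hyperspectral bands this brute-force screen is O(n_bands²) correlation computations, which is why index matrices of this kind are typically visualized as correlation heat maps before narrowing down sensitive band pairs.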

30 pages, 2628 KB  
Article
Predicting Bond Defaults in China: A Double-Ensemble Model Leveraging SMOTE for Class Imbalance
by Chongwen Tian and Rong Li
Big Data Cogn. Comput. 2026, 10(3), 81; https://doi.org/10.3390/bdcc10030081 - 6 Mar 2026
Abstract
This study proposes the Double-Ensemble Learning Classification with SMOTE (DELC-SMOTE), a novel hierarchical framework designed to address the critical challenge of severe class imbalance in financial bond default prediction. The model integrates the Synthetic Minority Over-sampling Technique (SMOTE) into a two-phase ensemble architecture. The first phase employs introspective stacking, where six heterogeneous base learners are individually enhanced through algorithm-specific balancing and meta-learning. The second phase fuses these optimized experts via performance-weighted voting. Empirical analysis utilizes a comprehensive dataset of 10,440 Chinese corporate bonds (522 defaults, ~5% default rate) sourced from Wind and CSMAR databases. Given the high cost of both false negatives and false positives in risk assessment, the Geometric Mean (G-mean) and Specificity are employed as primary evaluation metrics. Results demonstrate that the proposed DELC-SMOTE model significantly outperforms individual base classifiers and benchmark ensemble variants, achieving a G-mean of 0.9152 and a Specificity of 0.8715 under the primary experimental setting. The model exhibits robust performance across varying imbalance ratios (2%, 10%, 20%) and strong resilience against data noise, perturbations, and outliers. These findings indicate that the synergistic integration of data-level resampling within a diversified, two-tiered ensemble structure effectively mitigates class imbalance bias and enhances predictive reliability. The framework offers a robust and generalizable tool for actionable default risk assessment in imbalanced financial datasets.
(This article belongs to the Section Data Mining and Machine Learning)
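The second-phase fusion, performance-weighted voting over the optimized experts, and the G-mean metric can both be sketched in a few lines of NumPy. The names and the simple normalization of validation scores into weights are assumptions for illustration, not the paper's exact scheme:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    sens = tp / max(np.sum(y_true == 1), 1)  # recall on the default class
    spec = tn / max(np.sum(y_true == 0), 1)  # recall on the non-default class
    return np.sqrt(sens * spec)

def weighted_vote(probas, scores):
    """Fuse base-classifier probabilities by performance-weighted voting.

    probas: list of (n_samples, n_classes) probability arrays, one per
    base learner; scores: validation scores (e.g. G-mean) used as weights.
    """
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                            # normalize weights to sum to 1
    stacked = np.stack(probas)                 # (n_models, n_samples, n_classes)
    fused = np.tensordot(w, stacked, axes=1)   # weighted average of probabilities
    return fused.argmax(axis=1), fused
```

Weighting by a balanced metric such as G-mean, rather than raw accuracy, keeps a classifier that merely predicts the majority class from dominating the vote, which matters at a ~5% default rate.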

18 pages, 3617 KB  
Article
Adaptive Ensemble Weight Optimization for Natural Gas Consumption Forecasting: A Hybrid Stochastic–Deep Learning Framework Applied to the Czech Market
by Vojtěch Vávra and Josef Jablonsky
Mathematics 2026, 14(5), 900; https://doi.org/10.3390/math14050900 - 6 Mar 2026
Abstract
The transition towards data-driven energy management requires predictive frameworks capable of handling the nonlinear and non-stationary nature of natural gas consumption. Traditional static models often struggle to adapt to rapid regime shifts in liberalized markets. To address this forecasting problem, this study proposes a convex ensemble weight optimization framework. Moving beyond simple model averaging, we formulate the ensemble weighting problem as a constrained convex optimization task on the unit simplex. We utilize the Frank–Wolfe algorithm (Conditional Gradient) to dynamically optimize the weights of a heterogeneous set of base learners, including SARIMAX, XGBoost, N-HiTS, and Temporal Fusion Transformers (TFTs). Our results on the Czech gas market dataset demonstrate that this mathematically grounded approach achieves a Mean Absolute Percentage Error (MAPE) of 4.25%, which compares favorably to individual models such as N-HiTS (5.31%) and static averaging (6.74%). While the accuracy gain over greedy ensemble selection is marginal, the proposed convex formulation offers improved stability and interpretability, which are practical advantages for operational deployment.
(This article belongs to the Section D: Statistics and Operational Research)
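The core mechanism, Frank–Wolfe over the unit simplex, is attractive because the linear minimization oracle reduces to picking the vertex of the most negative gradient coordinate, so every iterate stays feasible without projection. A minimal sketch under an assumed squared-error objective (the step-size schedule and iteration count are generic textbook choices, not the paper's):

```python
import numpy as np

def frank_wolfe_weights(P, y, n_iter=200):
    """Ensemble weights on the unit simplex via the Frank-Wolfe algorithm.

    Minimizes ||P @ w - y||^2 over {w >= 0, sum(w) = 1}, where column k
    of P holds base-learner k's predictions for the target series y.
    """
    n_models = P.shape[1]
    w = np.full(n_models, 1.0 / n_models)   # start at the simplex centre
    for t in range(n_iter):
        grad = 2.0 * P.T @ (P @ w - y)      # gradient of the squared error
        s = np.zeros(n_models)
        s[np.argmin(grad)] = 1.0            # LMO: best simplex vertex
        gamma = 2.0 / (t + 2.0)             # classic diminishing step size
        w = (1.0 - gamma) * w + gamma * s   # convex update stays on the simplex
    return w
```

Because each update is a convex combination of the current iterate and a vertex, the returned weights are non-negative and sum to one by construction, which is what makes the fused forecast directly interpretable as a mixture of base learners.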

82 pages, 6468 KB  
Article
Correction Functions and Refinement Algorithms for Enhancing the Performance of Machine Learning Models
by Attila Kovács, Judit Kovácsné Molnár and Károly Jármai
Automation 2026, 7(2), 45; https://doi.org/10.3390/automation7020045 - 6 Mar 2026
Abstract
The aim of this study is to investigate and demonstrate the role of correction functions and optimisation-based refinement algorithms in enhancing the performance of machine learning models, particularly in predictive anomaly detection tasks applied in industrial environments. The performance of machine learning models is highly dependent on the quality of data preprocessing, model architecture, and post-processing methodology. In many practical applications—particularly in time-series forecasting and anomaly detection—the conventional training pipeline alone is insufficient, because model uncertainty, structural bias and the handling of rare events require specialised post hoc calibration and refinement mechanisms. This study provides a systematic overview of the role of correction functions (e.g., Principal Component Analysis (PCA), Squared Prediction Error (SPE)/Q-statistics, Hotelling's T², Bayesian calibration) and adaptive improvement algorithms (e.g., Genetic Algorithms (GA), Particle Swarm Optimisation (PSO), Simulated Annealing (SA), Gaussian Mixture Model (GMM) and ensemble-based techniques) in enhancing the performance of machine learning pipelines. The models were trained on a real industrial dataset compiled from power network analytics and harmonic-injection-based loading conditions. Model validation and equipment-level testing were performed using a large-scale harmonic measurement dataset collected over a five-year period. The reliability of the approach was confirmed by comparing predicted state transitions with actual fault occurrences, demonstrating its practical applicability and suitability for integration into predictive maintenance frameworks. The analysis demonstrates that correction functions introduce deterministic transformations in the data or error space, whereas improvement algorithms apply adaptive optimisation to fine-tune model parameters or decision boundaries. The combined use of these approaches significantly reduces overfitting, improves predictive accuracy and lowers false alarm rates. This work introduces the concept of an Organically Adaptive Predictive (OAP) ML model. The proposed model presents organic adaptivity, continuously adjusting its predictive behaviour in response to dynamic variations in network loading and harmonic spectrum composition. The introduced terminology characterises the organically emergent nature of the adaptive learning mechanism.
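As a hedged illustration of the PCA-based correction functions this abstract names, the sketch below computes Hotelling's T² and the SPE (Q) statistic from a plain PCA model fitted to normal-operation data. The SVD-based fit, the two-component default, and the function name are assumptions for the example, not the paper's exact pipeline.

```python
import numpy as np

def pca_monitoring_stats(X_train, X_new, n_components=2):
    """Hotelling's T^2 and SPE (Q) statistics from a PCA model.

    Fits PCA on normal-operation data X_train, then scores new
    samples X_new. Large T^2 flags unusual variation inside the
    retained subspace; large SPE flags variation outside it.
    """
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    # principal directions from the SVD of the centred training data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                              # loadings (d, k)
    lam = S[:n_components] ** 2 / (len(X_train) - 1)     # component variances

    Z = (X_new - mu) @ P                   # scores in the PCA subspace
    t2 = np.sum(Z ** 2 / lam, axis=1)      # Hotelling's T^2
    resid = (X_new - mu) - Z @ P.T         # part PCA cannot reconstruct
    spe = np.sum(resid ** 2, axis=1)       # SPE / Q statistic
    return t2, spe
```

In a monitoring loop, each statistic would be compared against a control limit estimated from the training data; exceeding the SPE limit is the usual trigger for an anomaly flag, since it indicates behaviour the PCA model cannot explain at all.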
25 pages, 3080 KB  
Review
Machine Learning for Alloy Design: A Property-Oriented Review
by Shamim Pourrahimi and Soroosh Hakimian
Alloys 2026, 5(1), 7; https://doi.org/10.3390/alloys5010007 - 6 Mar 2026
Abstract
Machine learning (ML) is becoming an established part of alloy research, offering new ways to link composition, processing routes, and microstructure with measured properties. In this work, recent studies using ML for predicting or optimizing alloy behavior are reviewed, covering mechanical, corrosion, phase-related, and physical properties. Unlike previous reviews organized by alloy system or modeling approach, this review is structured by target property (mechanical, corrosion, phase/structure, and physical), which helps identify the input features commonly used to model each property and highlights existing gaps in data and validation. For each study, the main property of interest, dataset features, model type, algorithm choice, use of hyperparameter tuning, and validation strategy were examined. Comparing these reports shows that ensemble models such as random forest and XGBoost, together with deep neural networks, usually perform better than linear approaches. At the same time, issues related to small datasets and inconsistent reporting remain major challenges. Attention is also drawn to new directions, particularly physics-based learning and multi-objective optimization, that are changing how ML is applied in materials design. Overall, this review summarizes current practices and outlines areas where closer integration of data-driven and experimental methods could accelerate the development of next-generation alloys.
26 pages, 3199 KB  
Article
XGBoost Ensemble Algorithm for Classifying Tomato Leaf Diseases Based on Texture Descriptors
by Alpamis Kutlimuratov, Baxodir Achilov, Kuanishbay Seitnazarov, Piratdin Allayarov, Islambek Saymanov, Rashid Oteniyazov and Jamshid Khamzaev
AgriEngineering 2026, 8(3), 98; https://doi.org/10.3390/agriengineering8030098 - 5 Mar 2026
Abstract
This article presents a simple and interpretable approach to automatically assessing the severity of late blight on tomato leaves. We collected a dataset of 5245 RGB images of healthy and diseased tomato leaves and defined five ordinal classes: healthy (0%) and four infection levels (0.1–10%, 11–25%, 26–50%, and ≥51% of the affected leaf area). Each image is segmented with a global Otsu threshold followed by morphological cleaning, after which seven textural and geometric features are extracted from the lesion contours: contrast, contour count, mean and standard deviation of contour area, mean and standard deviation of contour perimeter, and mean area-to-perimeter ratio. All features are normalized and used as input to an XGBoost classifier. The dataset is randomly split into 80% training and 20% test images, yielding an independent test set of 1049 images, on which the proposed model achieves an overall accuracy of 0.93 and a macro F1 score of 0.93, with per-class F1 scores ranging from 0.90 to 0.97. The confusion matrix shows that errors are concentrated between neighboring severity levels, while feature-importance analysis confirms the relevance of contour descriptors for characterizing lesion size and shape. The method runs on a CPU alone, requires little memory, and produces interpretable outputs, making it suitable for greenhouses and farms with limited computing resources. We also discuss limitations related to the boundaries between neighboring classes and potential domain shift, and we outline directions for extending the approach to multi-leaf scenes and explicit ordinal loss functions.
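The global Otsu threshold used in the segmentation step can be computed directly from the intensity histogram. The NumPy version below is an illustrative sketch (in practice a library routine such as OpenCV's would likely be used), not the authors' code.

```python
import numpy as np

def otsu_threshold(gray):
    """Global Otsu threshold for an 8-bit grayscale image.

    Returns the intensity level that maximizes the between-class
    variance of the histogram, separating foreground from background.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # class-0 probability up to t
    mu = np.cumsum(prob * np.arange(256))      # cumulative first moment
    mu_total = mu[-1]                          # overall mean intensity
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)           # empty classes contribute 0
    return int(np.argmax(sigma_b))
```

Pixels above the returned threshold form the candidate lesion (or leaf) mask; morphological opening and closing would then clean that mask before the contour features are extracted.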