Search Results (124)

Search Parameters:
Keywords = ensemble bagged trees method

25 pages, 4095 KB  
Article
Comparison of Machine Learning Methods for Marker Identification in GWAS
by Weverton Gomes da Costa, Hélcio Duarte Pereira, Gabi Nunes Silva, Aluizio Borém, Eveline Teixeira Caixeta, Antonio Carlos Baião de Oliveira, Cosme Damião Cruz and Moyses Nascimento
Int. J. Plant Biol. 2026, 17(1), 6; https://doi.org/10.3390/ijpb17010006 - 19 Jan 2026
Viewed by 144
Abstract
Genome-wide association studies (GWAS) are essential for identifying genomic regions associated with agronomic traits, but Linear Mixed Model (LMM)-based GWAS face challenges in capturing complex gene interactions. This study explores the potential of machine learning (ML) methodologies to enhance marker identification and association modeling in plant breeding. Unlike LMM-based GWAS, ML approaches do not require prior assumptions about marker–phenotype relationships, enabling the detection of epistatic effects and non-linear interactions. The research compared ML approaches (Decision Tree—DT; Bagging—BA; Random Forest—RF; Boosting—BO; and Multivariate Adaptive Regression Splines—MARS) against LMM-based GWAS. A simulated F2 population of 1000 individuals was analyzed using 4010 SNP markers and ten traits modeled with epistatic interactions. The simulation covered quantitative trait loci (QTL) counts ranging from 8 to 240, with heritability levels of 0.5 and 0.8. These settings mimic traits of major agronomic species, including cereal crops (e.g., maize and wheat) and leguminous crops (e.g., soybean): for example, yield, with moderate heritability and many QTLs, and plant height, with high heritability and a moderate number of QTLs. To validate the simulation findings, the methodologies were further applied to a real Coffea arabica population (n = 195) to identify genomic regions associated with yield, a complex polygenic trait. Results demonstrated a fundamental trade-off between sensitivity and precision. 
Specifically, for the most complex trait evaluated (240 QTLs under epistatic control), Ensemble methods (Bagging and Random Forest) maintained a Detection Power (DP) exceeding 90%, significantly outperforming state-of-the-art GWAS methods (FarmCPU), which dropped to approximately 30%, and traditional Linear Mixed Models, which failed to detect signals (0%). However, this sensitivity resulted in lower precision for ensembles. In contrast, MARS (Degree 1) and BLINK achieved exceptional Specificity (>99%) and Precision (>90%), effectively minimizing false positives. The real data analysis corroborated these trends: while standard GWAS models failed to detect significant associations, the ML framework successfully prioritized consensus genomic regions harboring functional candidates, such as SWEET sugar transporters and NAC transcription factors. In conclusion, ML Ensembles are recommended for broad exploratory screening to recover missing heritability, while MARS and BLINK are the most effective methods for precise candidate gene validation. Full article
(This article belongs to the Section Application of Artificial Intelligence in Plant Biology)
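As a rough illustration of the bagging approach compared above, the following sketch trains bootstrap-aggregated decision trees on synthetic data standing in for SNP marker matrices. It uses scikit-learn's BaggingClassifier (whose default base learner is a decision tree) and is not the paper's actual pipeline; the data dimensions are illustrative assumptions.

```python
# Minimal sketch: bagged decision trees on synthetic "marker" data.
# Synthetic features stand in for SNP genotypes; not the paper's pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data as a stand-in for a marker-by-trait problem
X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# BaggingClassifier defaults to decision-tree base learners, i.e. bagged trees:
# each tree sees a bootstrap resample, and predictions are majority-voted
bag = BaggingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = bag.score(X_te, y_te)
```

Because each tree is grown on a different bootstrap sample, variance is reduced without the prior assumptions an LMM requires, which is the property the article exploits for detection power.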
22 pages, 1021 KB  
Article
A Multiclass Machine Learning Framework for Detecting Routing Attacks in RPL-Based IoT Networks Using a Novel Simulation-Driven Dataset
by Niharika Panda and Supriya Muthuraman
Future Internet 2026, 18(1), 35; https://doi.org/10.3390/fi18010035 - 7 Jan 2026
Viewed by 308
Abstract
The use of resource-constrained Low-Power and Lossy Networks (LLNs), where the IPv6 Routing Protocol for LLNs (RPL) is the de facto routing standard, has increased with the Internet of Things' (IoT) explosive growth. Despite its lightweight architecture, RPL remains highly susceptible to routing-layer attacks such as Blackhole, Lowered Rank, version number manipulation, and Flooding, owing to the dynamic nature of IoT deployments and the lack of in-protocol security. Lightweight, data-driven intrusion detection methods are necessary since traditional cryptographic countermeasures are frequently unfeasible for LLNs. However, the lack of RPL-specific control-plane semantics in current cybersecurity datasets restricts the use of machine learning (ML) for practical anomaly identification. To close this gap, this work creates a novel, large-scale multiclass RPL attack dataset using Contiki-NG's Cooja simulator, modeling both static and mobile networks under benign and adversarial settings. A protocol-aware feature extraction pipeline is developed to record detailed packet-level and control-plane activity, including DODAG Information Object (DIO), DODAG Information Solicitation (DIS), and Destination Advertisement Object (DAO) message statistics, along with forwarding and dropping patterns and objective-function fluctuations. This dataset is used to evaluate fifteen classifiers, including Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), k-Nearest Neighbors (KNN), Random Forest (RF), Extra Trees (ET), Gradient Boosting (GB), AdaBoost (AB), and XGBoost (XGB), as well as several ensemble strategies such as soft/hard voting, stacking, and bagging, as part of a comprehensive ML-based detection system. Numerous tests show that ensemble approaches offer better generalization and prediction performance. 
With overfitting gaps less than 0.006 and low cross-validation variance, the Soft Voting Classifier obtains the greatest accuracy of 99.47%, closely followed by XGBoost with 99.45% and Random Forest with 99.44%. Full article
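The soft-voting scheme that performed best above averages the predicted class probabilities of several base learners. A minimal scikit-learn sketch on synthetic multiclass data (not the paper's RPL dataset; the choice and number of base learners here are illustrative assumptions):

```python
# Soft voting: average predict_proba across heterogeneous base learners
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for benign vs. attack-type traffic
X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=6, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# voting="soft" averages class probabilities instead of counting hard votes
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=1)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=1))],
    voting="soft",
).fit(X_tr, y_tr)
proba = vote.predict_proba(X_te)  # one probability row per test sample
```

Soft voting generally outperforms hard voting when the base learners produce well-calibrated probabilities, which is consistent with the narrow overfitting gaps reported above.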
29 pages, 8414 KB  
Article
Optimized Explainable Machine Learning Protocol for Battery State-of-Health Prediction Based on Electrochemical Impedance Spectra
by Lamia Akther, Md Shafiul Alam, Mohammad Ali, Mohammed A. AlAqil, Tahmida Khanam and Md. Feroz Ali
Electronics 2025, 14(24), 4869; https://doi.org/10.3390/electronics14244869 - 10 Dec 2025
Viewed by 581
Abstract
Monitoring the battery state of health (SOH) has become increasingly important for electric vehicles (EVs), renewable storage systems, and consumer gadgets. It indicates the residual usable capacity and performance of a battery in relation to its original specifications. This information is crucial for the safety and performance enhancement of the overall system. This paper develops an explainable machine learning protocol with Bayesian optimization techniques trained on electrochemical impedance spectroscopy (EIS) data to predict battery SOH. Various robust ensemble algorithms, including HistGradientBoosting (HGB), Random Forest, AdaBoost, Extra Trees, Bagging, CatBoost, Decision Tree, LightGBM, Gradient Boost, and XGB, have been developed and fine-tuned for predicting battery health. Eight comprehensive metrics are employed to estimate the model’s performance rigorously: coefficient of determination (R2), mean squared error (MSE), median absolute error (medae), mean absolute error (MAE), correlation coefficient (R), Nash–Sutcliffe efficiency (NSE), Kling–Gupta efficiency (KGE), and root mean squared error (RMSE). Bayesian optimization techniques were developed to optimize hyperparameters across all models, ensuring optimal implementation of each algorithm. Feature importance analysis was performed to thoroughly evaluate the models and assess the features with the most influence on battery health degradation. The comparison indicated that the GradientBoosting model outperformed others, achieving an MAE of 0.1041 and an R2 of 0.9996. The findings suggest that Bayesian-optimized tree-based ensemble methods, particularly gradient boosting, excel at forecasting battery health status from electrochemical impedance spectroscopy data. 
This result offers an excellent opportunity for practical use in battery management systems that employ diverse industrial state-of-health assessment techniques to enhance battery longevity, contributing to sustainability initiatives for second-life lithium-ion batteries. This capability enables the recycling of vehicle batteries for application in static storage systems, which is environmentally advantageous and ensures continuity. Full article
(This article belongs to the Special Issue Advanced Control and Power Electronics for Electric Vehicles)
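A minimal sketch of the gradient-boosting regression and two of the reported metrics (MAE and R2), on synthetic data standing in for EIS-derived features; the Bayesian hyperparameter optimization step is omitted, and all dimensions are illustrative assumptions.

```python
# Gradient boosting for a continuous health-style target, with MAE and R2
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression target stands in for SOH; features stand in for EIS data
X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=2)

gb = GradientBoostingRegressor(random_state=2).fit(X_tr, y_tr)
pred = gb.predict(X_te)
mae = mean_absolute_error(y_te, pred)  # average absolute deviation
r2 = r2_score(y_te, pred)              # fraction of variance explained
```

In the paper's protocol, each such model's hyperparameters would additionally be tuned with Bayesian optimization before the metrics are compared.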
25 pages, 1358 KB  
Article
Incorporating Uncertainty in Machine Learning Models to Improve Early Detection of Flavescence Dorée: A Demonstration of Applicability
by Cristina Nuzzi, Erica Saldi, Ilaria Negri and Simone Pasinetti
Sensors 2025, 25(24), 7493; https://doi.org/10.3390/s25247493 - 9 Dec 2025
Viewed by 416
Abstract
Early detection of Flavescence dorée leaf symptoms remains an open question for the research community. This work tries to fill this gap by proposing a methodology exploiting per-pixel data obtained from hyperspectral imaging to produce features suitable for machine learning training. However, since asymptomatic samples are similar to healthy samples, we propose "uncertainty-aware" models that account for the probability of the samples being similar, producing an "unclassified" output when the uncertainty between multiple classes is too high. The original dataset of leaf hypercubes was collected in a field of Pinot Noir in northern Italy during 2023 and 2024, for a total of 201 hypercubes equally divided into three classes ("healthy", "asymptomatic", "diseased"). Four feature predictors were computed for each of the 10 vegetation indices (the 25th, 50th, and 75th population quartiles and the population mean), for a total of 40 predictors per leaf. Due to the low number of samples, it was not possible to estimate the uncertainty of the input data reliably. Thus, we adopted a double Monte Carlo procedure: first, we generated 30,000 synthetic hypercubes and computed the per-class variance of each feature predictor; second, we used this variance (serving as the uncertainty of the input data) to generate 60,000 new predictors starting from the data in the test dataset. The trained models were then tested on these new data, and their predictions were further examined by a Bayesian test for validation purposes. Notably, the proposed method markedly improves recognition of "asymptomatic" samples with respect to the original models. The best model structure is the Decision Tree, achieving a prediction accuracy for "asymptomatic" samples of 75.7% against the original 49.3% for the Ensemble of Bagged Decision Trees (ML4), and of 44.6% against the original 13.2% for the Coarse Decision Tree (ML1). Full article
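The "unclassified" output described above can be emulated by rejecting predictions whose top two class probabilities are too close. A toy sketch with a decision tree; the 0.2 margin threshold and the data dimensions are illustrative assumptions, not the paper's values:

```python
# Uncertainty-aware rejection: emit "unclassified" when class margin is small
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Three synthetic classes stand in for healthy / asymptomatic / diseased
X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=5, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

clf = DecisionTreeClassifier(max_depth=4, random_state=3).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

# Margin between the two most probable classes; small margin = high uncertainty
top2 = np.sort(proba, axis=1)[:, -2:]
margin = top2[:, 1] - top2[:, 0]

labels = clf.classes_[np.argmax(proba, axis=1)].astype(object)
labels[margin < 0.2] = "unclassified"  # threshold is an illustrative assumption
```

In the article this idea is combined with Monte Carlo perturbation of the inputs, so the rejection reflects input uncertainty rather than only the classifier's leaf probabilities.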
23 pages, 2510 KB  
Article
MCH-Ensemble: Minority Class Highlighting Ensemble Method for Class Imbalance in Network Intrusion Detection
by Sumin Oh, Seoyoung Sohn, Chaewon Kim and Minseo Park
Appl. Sci. 2025, 15(23), 12647; https://doi.org/10.3390/app152312647 - 28 Nov 2025
Viewed by 461
Abstract
As cyber threats such as denial-of-service (DoS) attacks continue to rise, network intrusion detection systems (NIDS) have become essential components of cybersecurity defense. Although machine learning is widely applied to network intrusion detection, its performance often deteriorates due to the extreme class imbalance present in real-world data. This imbalance causes models to become biased and unable to detect critical attack instances. To address this issue, we propose MCH-Ensemble (Minority Class Highlighting Ensemble), an ensemble framework designed to improve the detection of minority attack classes. The method constructs multiple balanced subsets through random under-sampling and trains base learners, including decision tree, XGBoost, and LightGBM models. Features of correctly predicted attack samples are then amplified by adding a constant value, producing a boosting-like effect that enhances minority class representation. The highlighted subsets are subsequently combined to train a random forest meta-model, which leverages bagging to capture diverse and fine-grained decision boundaries. Experimental evaluations on the UNSW-NB15, CIC-IDS2017, and WSN-DS datasets demonstrate that MCH-Ensemble effectively mitigates class imbalance and achieves superior recognition of DoS attacks. The proposed method achieves enhanced performance compared with those reported previously. On the UNSW-NB15 and CIC-IDS2017 datasets, it achieves improvements in accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC) by ~1.2% and ~0.61%, ~9.8% and 0.77%, ~0.7% and ~0.56%, ~5.3% and 0.66%, and ~0.1% and ~0.06%, respectively. In addition, it achieves these improvements by ~0.17%, ~1.66%, ~0.11%, ~0.88%, and ~0.06%, respectively, on the WSN-DS dataset. 
These findings indicate that the proposed framework offers a robust and accurate approach to intrusion detection, contributing to the development of reliable cybersecurity systems in highly imbalanced network environments. Full article
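A loose sketch of the pipeline's shape on toy data: under-sample balanced subsets, amplify correctly predicted attack rows by a constant, then train a random-forest meta-model. The amplification constant, subset count, and data generation here are all illustrative assumptions, not the paper's settings.

```python
# Toy sketch of a minority-highlighting ensemble on imbalanced data
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
# Imbalanced toy data: 950 "benign" (0) rows vs 50 "attack" (1) rows
X = np.vstack([rng.normal(0, 1, (950, 5)), rng.normal(1.5, 1, (50, 5))])
y = np.array([0] * 950 + [1] * 50)

# Random under-sampling: each subset pairs all minority rows with an
# equal-sized random draw of majority rows
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
subsets = []
for _ in range(3):
    maj_draw = rng.choice(majority, size=len(minority), replace=False)
    subsets.append(np.concatenate([minority, maj_draw]))

# Train a base learner per subset; correctly predicted attack rows get a
# constant added to their features ("highlighting")
highlighted = []
for idx in subsets:
    base = DecisionTreeClassifier(random_state=4).fit(X[idx], y[idx])
    Xs = X[idx].copy()
    hit = (base.predict(Xs) == y[idx]) & (y[idx] == 1)
    Xs[hit] += 0.5  # amplification constant (illustrative value)
    highlighted.append((Xs, y[idx]))

# Combine highlighted subsets and fit the random-forest meta-model
X_meta = np.vstack([h[0] for h in highlighted])
y_meta = np.concatenate([h[1] for h in highlighted])
meta = RandomForestClassifier(n_estimators=50, random_state=4).fit(X_meta, y_meta)
```

The balanced subsets prevent the majority class from dominating each base learner, while the feature amplification pushes minority-class structure into the meta-model's training distribution.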
32 pages, 6248 KB  
Article
AI-Driven Resilient Fault Diagnosis of Bearings in Rotating Machinery
by Syed Muhammad Wasi ul Hassan Naqvi, Arsalan Arif, Asif Khan, Fazail Bangash, Ghulam Jawad Sirewal and Bin Huang
Sensors 2025, 25(22), 7092; https://doi.org/10.3390/s25227092 - 20 Nov 2025
Viewed by 1064
Abstract
Predictive maintenance is increasingly important in rotating machinery to prevent unexpected failures, reduce downtime, and improve operational efficiency. This study compares the efficacy of traditional machine learning (ML) and deep learning (DL) techniques in diagnosing bearing faults under varying load and speed conditions. Two classification tasks were conducted: a simpler three-class task that distinguishes healthy bearings, inner race faults, and outer race faults, and a more complex nine-class task that includes faults of varying severity in the inner and outer races. In this study, the ensemble bagged trees machine learning algorithm achieved maximum accuracies of 93.04% for the three-class and 87.13% for the nine-class classifications, followed by neural network, SVM, KNN, decision tree, and other algorithms. For deep learning, the CNN model, trained on scalograms (time–frequency images generated by continuous wavelet transform), demonstrated superior performance, reaching up to 100% accuracy in both classification tasks after six training epochs for the nine-class classification. While CNNs require longer training times, their superior accuracy and capability to automatically extract complex features make the investment worthwhile. Consequently, the results demonstrate that the CNN model trained on CWT-based scalogram images achieved remarkably high classification accuracy, confirming that deep learning methods can outperform traditional ML algorithms in handling complex, non-linear, and dynamic diagnostic scenarios. Full article
(This article belongs to the Special Issue AI-Assisted Condition Monitoring and Fault Diagnosis)
26 pages, 2975 KB  
Article
CTGAN-Augmented Ensemble Learning Models for Classifying Dementia and Heart Failure
by Pornthep Phanbua, Sujitra Arwatchananukul, Georgi Hristov and Punnarumol Temdee
Inventions 2025, 10(6), 101; https://doi.org/10.3390/inventions10060101 - 6 Nov 2025
Viewed by 1144
Abstract
Research shows that individuals with heart failure are 60% more likely to develop dementia because of their shared metabolic risk factors. Developing a classification model to differentiate between these two conditions effectively is crucial for improving diagnostic accuracy, guiding clinical decision-making, and supporting timely interventions in older adults. This study proposes a novel method for dementia classification, distinguishing it from its common comorbidity, heart failure, using blood testing and personal data. A dataset comprising 11,124 imbalanced electronic health records of older adults from hospitals in Chiang Rai, Thailand, was utilized. Conditional tabular generative adversarial networks (CTGANs) were employed to generate synthetic data while preserving key statistical relationships, diversity, and distributions of the original dataset. Two groups of ensemble models were analyzed: the boosting group—extreme gradient boosting, light gradient boosting machine—and the bagging group—random forest and extra trees. Performance metrics, including accuracy, precision, recall, F1-score, and area under the receiver-operating characteristic curve were evaluated. Compared with the synthetic minority oversampling technique, CTGAN-based synthetic data generation significantly enhanced the performance of ensemble learning models in classifying dementia and heart failure. Full article
(This article belongs to the Special Issue Machine Learning Applications in Healthcare and Disease Prediction)
24 pages, 3177 KB  
Article
National-Scale Electricity Consumption Forecasting in Turkey Using Ensemble Machine Learning Models: An Interpretability-Centered Approach
by Ahmet Sabri Öğütlü
Sustainability 2025, 17(21), 9829; https://doi.org/10.3390/su17219829 - 4 Nov 2025
Viewed by 851
Abstract
This study presents an advanced, interpretability-focused machine learning framework for forecasting electricity consumption in Turkey over the period 2016–2024. The proposed approach is based on a high-dimensional dataset that incorporates a diverse set of variables, including sector-specific electricity usage (residential, industrial, lighting, agricultural, and commercial), electricity production, trade metrics (imports and exports in USD), and macroeconomic indicators such as the Industrial Production Index (IPI). A comprehensive set of eight state-of-the-art regression algorithms—including ensemble models such as CatBoost, LightGBM, Random Forest, and Bagging Regressor—were developed and rigorously evaluated. Among these, CatBoost emerged as the most accurate model, achieving R2 values of 0.9144 for electricity production and 0.8247 for electricity consumption. Random Forest and LightGBM followed closely, further confirming the effectiveness of tree-based ensemble methods in capturing nonlinear relationships in complex datasets. To enhance model interpretability, SHAP (SHapley Additive exPlanations) and traditional feature importance analyses were applied, revealing that residential electricity consumption was the dominant predictor across all models, accounting for more than 70% of the variance explained in consumption forecasts. In contrast, macroeconomic indicators and temporal variables showed marginal contributions, suggesting that electricity demand in Turkey is predominantly driven by internal sectoral consumption trends rather than external economic or seasonal dynamics. In addition to historical evaluation, scenario-based forecasting was conducted for the 2025–2030 period, incorporating varying assumptions about economic growth and population trends. These scenarios demonstrated the model’s robustness and adaptability to different future trajectories, offering valuable foresight for strategic energy planning. 
The methodological contributions of this study lie in its integration of high-dimensional, multivariate data with transparent, interpretable machine learning models, making it a robust and scalable decision-support tool for policymakers, energy authorities, and infrastructure planners aiming to enhance national energy resilience and policy responsiveness. Full article
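Alongside SHAP (which requires the separate shap package), tree ensembles expose normalized impurity-based importances that support the same kind of dominant-predictor analysis. A sketch on synthetic regression data with a couple of dominant features, loosely mimicking the "residential consumption dominates" finding; the dimensions are illustrative assumptions:

```python
# Impurity-based feature importances from a tree ensemble
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Only 2 of 6 features carry signal, mimicking a dominant-predictor setting
X, y = make_regression(n_samples=300, n_features=6, n_informative=2,
                       random_state=5)
rf = RandomForestRegressor(n_estimators=100, random_state=5).fit(X, y)

imp = rf.feature_importances_        # normalized: sums to 1 across features
ranked = np.argsort(imp)[::-1]       # indices, most important feature first
```

Impurity-based importances are cheap but can be biased toward high-cardinality features, which is one reason the study cross-checks them against SHAP values.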
21 pages, 3338 KB  
Article
Enhancing Migraine Classification Through Machine Learning: A Comparative Study of Ensemble Methods
by Raniya R. Sarra, Ayad E. Korial, Ivan Isho Gorial and Amjad J. Humaidi
Technologies 2025, 13(11), 500; https://doi.org/10.3390/technologies13110500 - 1 Nov 2025
Viewed by 938
Abstract
A migraine is a common and complex neurological disorder affecting more than 90% of people globally. Traditional migraine diagnostic and classification methods are time-intensive and prone to error. In today's world, where health and technology are closely connected, there is an urgent need for more advanced tools to accurately predict and classify migraine types. Machine learning (ML) has shown promise in automating migraine diagnosis and classification. However, individual ML classifiers do not always perform well on their own, leaving room for improvement. In this paper, we used three ML classifiers, decision tree, naïve Bayes, and k-nearest neighbor, to classify seven different types of migraines. We also investigated ensemble classifiers, namely bagging, boosting, stacking, and majority voting, to obtain better results. All classifiers were trained on a migraine dataset of 400 patients with 24 features. Before training the classifiers, we pre-processed the data by balancing the classes, removing uninformative features, and checking for correlations. After evaluation, the results showed that majority voting achieved the highest accuracy improvement (7.59%), followed by boosting (6.55%), bagging (5.86%), and stacking (5.52%). These results indicate that ensemble methods are effective in improving the classification accuracy of individual ML classifiers for migraine classification. Full article
26 pages, 4723 KB  
Article
Time-Frequency-Based Separation of Earthquake and Noise Signals on Real Seismic Data: EMD, DWT and Ensemble Classifier Approaches
by Yunus Emre Erdoğan and Ali Narin
Sensors 2025, 25(21), 6671; https://doi.org/10.3390/s25216671 - 1 Nov 2025
Viewed by 666
Abstract
Earthquakes are sudden and destructive natural events caused by tectonic movements in the Earth's crust. Although they cannot be predicted with certainty, rapid and reliable detection is essential to reduce loss of life and property. This study aims to automatically distinguish earthquake and noise signals in real seismic data by analyzing time-frequency features. Signals were scaled using z-score normalization, and features were extracted with Empirical Mode Decomposition (EMD), Discrete Wavelet Transform (DWT), and combined EMD+DWT methods. Feature selection methods such as Lasso, ReliefF, and Student's t-test were applied to identify the most discriminative features. Classification was performed with Ensemble Bagged Trees, Decision Trees, Random Forest, k-Nearest Neighbors (k-NN), and Support Vector Machines (SVM). The highest performance was achieved using the RF classifier with the Lasso-based EMD+DWT feature set, reaching 100% accuracy, specificity, and sensitivity. Overall, DWT and EMD+DWT features yielded higher performance than EMD alone. While k-NN and SVM were less effective, tree-based methods achieved superior results. Moreover, Lasso and ReliefF outperformed Student's t-test. These findings show that time-frequency-based features are crucial for separating earthquake signals from noise and provide a basis for improving real-time detection. The study contributes to the academic literature and holds significant potential for integration into early warning and earthquake monitoring systems. Full article
30 pages, 379 KB  
Article
An Enhanced Discriminant Analysis Approach for Multi-Classification with Integrated Machine Learning-Based Missing Data Imputation
by Autcha Araveeporn and Atid Kangtunyakarn
Mathematics 2025, 13(21), 3392; https://doi.org/10.3390/math13213392 - 24 Oct 2025
Viewed by 729
Abstract
This study addresses the challenge of accurate classification under missing data conditions by integrating multiple imputation strategies with discriminant analysis frameworks. The proposed approach evaluates six imputation methods (Mean, Regression, KNN, Random Forest, Bagged Trees, MissRanger) across several discriminant techniques. Simulation scenarios varied in sample size, predictor dimensionality, and correlation structure, while the real-world application employed the Cirrhosis Prediction Dataset. The results consistently demonstrate that ensemble-based imputations, particularly regression, KNN, and MissRanger, outperform simpler approaches by preserving multivariate structure, especially in high-dimensional and highly correlated settings. MissRanger yielded the highest classification accuracy across most discriminant analysis methods in both simulated and real data, with performance gains most pronounced when combined with flexible or regularized classifiers. Regression imputation showed notable improvements under low correlation, aligning with the theoretical benefits of shrinkage-based covariance estimation. Across all methods, larger sample sizes and high correlation enhanced classification accuracy by improving parameter stability and imputation precision. Full article
(This article belongs to the Section D1: Probability and Statistics)
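Two of the compared imputation families, mean and KNN, can be sketched with scikit-learn's imputers on a toy dataset where one column is knocked out at random. The correlation structure and missingness rate are illustrative assumptions; the point is that KNN imputation can exploit correlated predictors, which mean imputation ignores.

```python
# Mean vs. KNN imputation on a toy dataset with a correlated column
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 4))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)  # correlated pair

# Knock out roughly 20% of column 1 completely at random
X_miss = X.copy()
mask = rng.random(100) < 0.2
X_miss[mask, 1] = np.nan

mean_fill = SimpleImputer(strategy="mean").fit_transform(X_miss)
knn_fill = KNNImputer(n_neighbors=5).fit_transform(X_miss)

# Reconstruction error on the masked cells: KNN can use the correlated
# column 0 via nearest neighbors; the mean fill cannot
err_mean = np.mean((mean_fill[mask, 1] - X[mask, 1]) ** 2)
err_knn = np.mean((knn_fill[mask, 1] - X[mask, 1]) ** 2)
```

This mirrors the study's finding that structure-preserving imputers (KNN, Random Forest, MissRanger) outperform simple mean substitution when predictors are highly correlated.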
24 pages, 2310 KB  
Article
Optimizing Mycophenolate Therapy in Renal Transplant Patients Using Machine Learning and Population Pharmacokinetic Modeling
by Anastasia Tsyplakova, Aleksandra Catic-Djorđevic, Nikola Stefanović and Vangelis D. Karalis
Med. Sci. 2025, 13(4), 235; https://doi.org/10.3390/medsci13040235 - 20 Oct 2025
Viewed by 1152
Abstract
Background/Objectives: Mycophenolic acid (MPA) is used as part of first-line combination immunosuppressive therapy for renal transplant recipients. Personalized dosing approaches are needed to balance efficacy and minimize toxicity due to the pharmacokinetic variability of the drug. In this study, population pharmacokinetic (PopPK) modeling and machine learning (ML) techniques are coupled to provide valuable insights into optimizing MPA therapy. Methods: Using data from 76 renal transplant patients, two PopPK models were developed to describe and predict MPA levels for two different formulations (enteric-coated mycophenolate sodium and mycophenolate mofetil). Covariate effects on drug clearance were assessed, and Monte Carlo simulations were used to evaluate exposure under normal and reduced clearance conditions. ML techniques, including principal component analysis (PCA) and ensemble tree models (bagging and boosting), were applied to identify predictive factors and explore associations between MPA plasma/saliva concentrations and the examined covariates. Results: Total daily dose and post-transplant time (PTP) were identified as key covariates affecting clearance. PCA highlighted MPA dose as the primary determinant of plasma levels, with urea and PTP also playing significant roles. Boosted tree analysis confirmed these findings, demonstrating strong predictive accuracy (R2 > 0.91). Incorporating saliva MPA levels improved predictive performance, suggesting that saliva may be a complementary monitoring tool, although plasma monitoring remained superior. Simulations allowed exploring potential dosing adjustments for patients with reduced clearance. Conclusions: This study demonstrates the potential of integrating machine learning with population pharmacokinetic modeling to improve the understanding of MPA variability and support individualized dosing strategies in renal transplant recipients. 
The developed PopPK/ML models provide a methodological foundation for future research toward more personalized immunosuppressive therapy. Full article
(This article belongs to the Section Translational Medicine)
18 pages, 4759 KB  
Article
Daily Peak Load Prediction Method Based on XGBoost and MLR
by Bin Cao, Yahui Chen, Sile Hu, Yu Guo, Xianglong Liu, Yuan Wang, Xiaolei Cheng, Qian Zhang and Jiaqiang Yang
Appl. Sci. 2025, 15(20), 11180; https://doi.org/10.3390/app152011180 - 18 Oct 2025
Cited by 1 | Viewed by 538
Abstract
During peak load periods, severe imbalance between power supply and demand has become a critical challenge, leading to higher operational costs for power grids. To improve the accuracy of peak load forecasting, this study introduces a novel approach based on Extreme Gradient Boosting (XGBoost) and Multiple Linear Regression (MLR) for daily peak load prediction. The proposed methodology first employs the Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) algorithm to decompose the raw load data, subsequently reconstructing the Intrinsic Mode Functions (IMFs) into high-frequency and stationary components. For the high-frequency components, XGBoost serves as the base predictor within a Bagging-based ensemble structure, while the Sparrow Search Algorithm (SSA) is employed to optimize hyperparameters automatically, ensuring efficient learning and accurate representation of complex peak load fluctuations. Meanwhile, the stationary components are modeled using MLR to provide fast and reliable estimations. The proposed framework was evaluated on actual daily peak load data from Western Inner Mongolia, China. The results indicate that the proposed method successfully captures the peak characteristics of the power grid, delivering both robust and precise predictions. Compared to the baseline model, RMSE and MAPE are reduced by 54.4% and 87.3%, respectively, underscoring its significant potential for practical applications in power system operation and planning. Full article
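The decompose-then-predict structure of this pipeline can be illustrated with stdlib stand-ins: a trailing moving average plays the role of ICEEMDAN, ordinary least squares the role of MLR, and last-value persistence the role of the SSA-tuned XGBoost ensemble. This is a structural sketch only; none of these stand-ins match the paper's actual components.

```python
def decompose(series, window=3):
    """Split a series into a smooth (low-frequency) part and a residual
    so that smooth[i] + resid[i] reconstructs the original exactly."""
    smooth = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        seg = series[lo:i + 1]
        smooth.append(sum(seg) / len(seg))
    resid = [x - s for x, s in zip(series, smooth)]
    return smooth, resid

def ols_forecast(series):
    """Fit y = a + b*t by least squares and extrapolate one step ahead."""
    n = len(series)
    t = list(range(n))
    mt, my = sum(t) / n, sum(series) / n
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, series)) / \
        sum((ti - mt) ** 2 for ti in t)
    a = my - b * mt
    return a + b * n

def forecast_peak(series, window=3):
    """Model each component separately, then recombine the forecasts."""
    smooth, resid = decompose(series, window)
    return ols_forecast(smooth) + resid[-1]  # persistence for the rough part

series = [1, 3, 5, 7]
smooth, resid = decompose(series)
peak = forecast_peak(series)
```

The key property carried over from ICEEMDAN is additivity: the components sum back to the raw signal, so each can be handed to the model family that suits it best.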
36 pages, 3174 KB  
Review
A Bibliometric-Systematic Literature Review (B-SLR) of Machine Learning-Based Water Quality Prediction: Trends, Gaps, and Future Directions
by Jeimmy Adriana Muñoz-Alegría, Jorge Núñez, Ricardo Oyarzún, Cristian Alfredo Chávez, José Luis Arumí and Lien Rodríguez-López
Water 2025, 17(20), 2994; https://doi.org/10.3390/w17202994 - 17 Oct 2025
Viewed by 2766
Abstract
Predicting the quality of freshwater, both surface and groundwater, is essential for the sustainable management of water resources. This study collected 1822 articles from the Scopus database (2000–2024) and filtered them using Topic Modeling to create the study corpus. The B-SLR analysis identified exponential growth in scientific publications since 2020, indicating that this field has reached a stage of maturity. The results showed that the predominant techniques for predicting water quality, both for surface and groundwater, fall into three main categories: (i) ensemble models, with Bagging and Boosting representing 43.07% and 25.91%, respectively, particularly random forest (RF), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGB), along with their optimized variants; (ii) deep neural networks such as long short-term memory (LSTM) and convolutional neural network (CNN), which excel at modeling complex temporal dynamics; and (iii) traditional algorithms like artificial neural network (ANN), support vector machines (SVMs), and decision tree (DT), which remain widely used. Current trends point towards the use of hybrid and explainable architectures, with increased application of interpretability techniques. Emerging approaches such as Generative Adversarial Network (GAN) and Group Method of Data Handling (GMDH) for data-scarce contexts, Transfer Learning for knowledge reuse, and Transformer architectures that outperform LSTM in time series prediction tasks were also identified. Furthermore, the most studied water bodies (e.g., rivers, aquifers) and the most commonly used water quality indicators (e.g., WQI, EWQI, dissolved oxygen, nitrates) were identified. The B-SLR and Topic Modeling methodology provided a more robust, reproducible, and comprehensive overview of AI/ML/DL models for freshwater quality prediction, facilitating the identification of thematic patterns and research opportunities. Full article
(This article belongs to the Special Issue Machine Learning Applications in the Water Domain)
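As a minimal illustration of the Bagging family that dominates this review's results, the sketch below bootstrap-aggregates single-split regression stumps. Real water-quality studies would use random forest or gradient boosting libraries; this toy only shows the bootstrap-and-average mechanism itself.

```python
import random

def fit_stump(xs, ys):
    """Best single-threshold split minimizing squared error; returns a predictor."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x identical: fall back to the global mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def fit_bagged(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample, average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

# Toy step function: low "concentration" below x = 5, high at or above it.
xs = list(range(10))
ys = [0] * 5 + [10] * 5
model = fit_bagged(xs, ys)
```

Averaging over resamples is what gives bagged ensembles like RF their variance reduction relative to a single tree.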
19 pages, 1951 KB  
Article
Enhancing Lemon Leaf Disease Detection: A Hybrid Approach Combining Deep Learning Feature Extraction and mRMR-Optimized SVM Classification
by Ahmet Saygılı
Appl. Sci. 2025, 15(20), 10988; https://doi.org/10.3390/app152010988 - 13 Oct 2025
Cited by 1 | Viewed by 1285
Abstract
This study presents a robust and extensible hybrid classification framework for accurately detecting diseases in citrus leaves by integrating transfer learning-based deep learning models with classical machine learning techniques. Features were extracted using advanced pretrained architectures—DenseNet201, ResNet50, MobileNetV2, and EfficientNet-B0—and refined via the minimum redundancy maximum relevance (mRMR) method to reduce redundancy while maximizing discriminative power. These features were classified using support vector machines (SVMs), ensemble bagged trees, k-nearest neighbors (kNNs), and neural networks under stratified 10-fold cross-validation. On the lemon dataset, the best configuration (DenseNet201 + SVM) achieved 94.1 ± 4.9% accuracy, 93.2 ± 5.7% F1 score, and a balanced accuracy of 93.4 ± 6.0%, demonstrating strong and stable performance. To assess external generalization, the same pipeline was applied to mango and pomegranate leaves, achieving 100.0 ± 0.0% and 98.7 ± 1.5% accuracy, respectively—confirming the model’s robustness across citrus and non-citrus domains. Beyond accuracy, lightweight models such as EfficientNet-B0 and MobileNetV2 provided significantly higher throughput and lower latency, underscoring their suitability for real-time agricultural applications. These findings highlight the importance of combining deep representations with efficient classical classifiers for precision agriculture, offering both high diagnostic accuracy and practical deployability in field conditions. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
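The mRMR selection step used in this pipeline can be sketched as a greedy search that rewards relevance to the target and penalizes redundancy with already-chosen features. Absolute Pearson correlation stands in here for the mutual-information scores mRMR normally uses, so this is an assumption-laden toy rather than the paper's implementation.

```python
import math

def pearson(a, b):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov / math.sqrt(va * vb))

def mrmr_select(features, target, k):
    """Greedy mRMR (difference form): maximize relevance to the target
    minus mean redundancy with the features selected so far."""
    selected, remaining = [], list(range(len(features)))
    while len(selected) < k and remaining:
        best, best_score = None, None
        for j in remaining:
            rel = pearson(features[j], target)
            red = (sum(pearson(features[j], features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            score = rel - red
            if best_score is None or score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 1 duplicates feature 0, so mRMR should skip it in
# favor of the weaker but non-redundant feature 2.
y = [0, 1, 2, 3, 4, 5, 6, 7]
feats = [
    [1, 0, 3, 2, 5, 4, 7, 6],  # informative (corr ~0.90 with y)
    [1, 0, 3, 2, 5, 4, 7, 6],  # exact duplicate of feature 0
    [0, 0, 1, 1, 0, 0, 1, 1],  # weaker but complementary signal
]
picked = mrmr_select(feats, y, 2)
```

This "reduce redundancy while maximizing discriminative power" behavior is exactly why mRMR helps before feeding deep features into an SVM.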