MDPI - Publisher of Open Access Journals

48 pages, 5217 KB

Open Access Article

AutoML-Based Prediction of Unconfined Compressive Strength of Stabilized Soils: A Multi-Dataset Evaluation on Worldwide Experimental Data

by Romulo Murucci Oliveira, Deivid Campos, Katia Vanessa Bicalho, Bruno da S. Macêdo, Matteo Bodini, Camila Martins Saporetti and Leonardo Goliatt

Forecasting 2025, 7(4), 80; https://doi.org/10.3390/forecast7040080 - 18 Dec 2025

Cited by 1 | Viewed by 1023

Abstract

Unconfined Compressive Strength (UCS) of stabilized soils is commonly used for evaluating the effectiveness of soil improvement techniques. Achieving target UCS values through conventional trial-and-error approaches requires extensive laboratory experiments, which are time-consuming and resource-intensive. Automated Machine Learning (AutoML) frameworks offer a promising alternative by enabling automated, reproducible, and accessible predictive modeling of UCS values from more readily obtainable index and physical soil and stabilizer properties, reducing the reliance on experimental testing and empirical relationships, and allowing systematic exploration of multiple models and configurations. This study evaluates the predictive performance of five state-of-the-art AutoML frameworks (i.e., AutoGluon, AutoKeras, FLAML, H2O, and TPOT) using analyses of results from 10 experimental datasets comprising 2083 samples from laboratory experiments spanning diverse soil types, stabilizers, and experimental conditions across many countries worldwide. Comparative analyses revealed that FLAML achieved the highest overall performance (average PI score of 0.7848), whereas AutoKeras exhibited lower accuracy on complex datasets; AutoGluon, H2O, and TPOT also demonstrated strong predictive capabilities, with performance varying with dataset characteristics. Despite the promising potential of AutoML, prior research has shown that fully automated frameworks have limited applicability to UCS prediction, highlighting a gap in end-to-end pipeline automation. The findings provide practical guidance for selecting AutoML tools based on dataset characteristics and research objectives, and suggest avenues for future studies, including expanding the range of AutoML frameworks and integrating interpretability techniques, such as feature importance analysis, to deepen understanding of soil–stabilizer interactions.
Overall, the results indicate that AutoML frameworks can effectively accelerate UCS prediction, reduce laboratory workload, and support data-driven decision-making in geotechnical engineering. Full article

32 pages, 28258 KB

Open Access Article

Machine Learning-Based Classification of ICU-Acquired Neuromuscular Weakness: A Comparative Study in Survivors of Critical Illness

by David Estévez-Freire, Ivan Cangas, Andrés Tirado-Espín, Johanna Pozo-Neira, Fernando Villalba-Meneses, Diego Almeida-Galárraga and Omar Alvarado-Cando

Life 2025, 15(12), 1802; https://doi.org/10.3390/life15121802 - 25 Nov 2025

Viewed by 924

Abstract

Classifying the severity of intensive-care-unit-acquired muscle atrophy (ICU-AW) is essential for early prognosis and individualized neurorehabilitation, improving functional outcomes in survivors of critical illness. This study evaluated and compared advanced machine learning (ML) algorithms for classifying neuromuscular atrophy in neurocritical patients. Clinical, biochemical, anthropometric, and morphometric data from 198 neuro-ICU patients were retrospectively analyzed. Six supervised ML models—Support Vector Machine (SVM), Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGBoost), TPOT AutoML, AdaBoost, and Multinomial Logistic Regression—were trained using stratified cross-validation, synthetic oversampling, and hyperparameter optimization. Among the most outstanding models, SVM achieved the best performance (accuracy = 93%, ROC-AUC = 0.95), followed by MLP (accuracy = 82.8%, ROC-AUC = 0.93) and XGBoost (accuracy = 80%, ROC-AUC = 0.94). Stability analyses across random seeds confirmed the robustness of SVM and TPOT, with the highest median AUPRC (>0.90). Explainable AI methods (LIME and SHAP) identified BMI, serum albumin, and body surface area as the most influential variables, showing physiologically consistent patterns associated with a classification of muscle loss. Full article

(This article belongs to the Section Biochemistry, Biophysics and Computational Biology)

15 pages, 2365 KB

Open Access Article

Leveraging Explainable Automated Machine Learning (AutoML) and Metabolomics for Robust Diagnosis and Pathophysiological Insights in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS)

by Fatma Hilal Yagin, Cemil Colak, Fahaid Al-Hashem, Sarah A. Alzakari, Amel Ali Alhussan and Mohammadreza Aghaei

Diagnostics 2025, 15(21), 2755; https://doi.org/10.3390/diagnostics15212755 - 30 Oct 2025

Cited by 1 | Viewed by 1203

Abstract

Background/Objectives: Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a debilitating complex disease with an elusive etiology, lacking objective diagnostic biomarkers. This study leverages advanced Automated Machine Learning (AutoML) to analyze plasma metabolomic and lipidomic profiles for the purpose of ME/CFS detection. Methods: We utilized a publicly available dataset comprising 888 metabolic features from 106 ME/CFS patients and 91 matched controls. Three AutoML frameworks—TPOT, Auto-Sklearn, and H2O AutoML—were benchmarked under identical time constraints. Univariate ROC and PLS-DA analyses with cross-validation, permutation testing, and VIP-based feature selection were applied to standardized, log-transformed omics data to identify significant discriminatory metabolites/lipids and assess their intercorrelations. Results: TPOT significantly outperformed its counterparts, achieving an area under the curve (AUC) of 92.1%, accuracy of 87.3%, sensitivity of 85.8%, and specificity of 89.0%. The PLS-DA model revealed a moderate but statistically significant discrimination between ME/CFS and controls. Explainable artificial intelligence (XAI) via SHAP analysis of the optimal TPOT model identified key metabolites implicating dysregulated pathways in mitochondrial energy metabolism (succinic acid, pyruvic acid, leucine), chronic inflammation (prostaglandin D₂, 11,12-EET), gut–brain axis communication (glycocholic acid), and cell membrane integrity (pc(35:2)a). Conclusions: Our results demonstrate that TPOT-derived models not only provide a highly accurate and robust diagnostic tool but also yield biologically interpretable insights into the pathophysiology of ME/CFS, highlighting its potential for clinical decision support and elucidating novel therapeutic targets. Full article

(This article belongs to the Special Issue The Future of Diagnostics: Exploring the Role of Artificial Intelligence in Medicine)

5 pages, 314 KB

Open Access Proceeding Paper

Bespoke Biomarker Combinations for Cancer Survival Prognosis Using Artificial Intelligence on Tumour Transcriptomics

by Ricardo Jorge Pais, Tiago Alexandre Pais and Uraquitan Lima Filho

Med. Sci. Forum 2025, 37(1), 18; https://doi.org/10.3390/msf2025037018 - 2 Sep 2025

Viewed by 461

Abstract

Accurate cancer prognosis remains a major challenge, as single gene expression biomarkers often lack clinical reliability, and most ML approaches fail even when considering large gene panels. In this study, we used a novel AutoML framework (O2Pmgen) benchmarked with a well-established framework (TPOT) on TCGA transcriptomic data for breast, lung, and renal cancers to identify small gene panels predictive of patient survival. From 58 EMT-related genes, we found models based on panels of 6–10 genes that outperformed single-marker models and ML models that considered the 58 EMT genes, with performance gains up to 21%. Further, the generated models achieved good predictive power with AUCs of 71–83%. Our results demonstrated that affordable and efficient prognostic tools using small, biologically relevant gene sets can provide better risk stratification in clinical oncology. Full article

(This article belongs to the Proceedings of 7th CiiEM International Congress 2025—Empowering One Health to Reduce Social Vulnerabilities)

21 pages, 3353 KB

Open Access Article

Automated Machine Learning-Based Significant Wave Height Prediction for Marine Operations

by Yuan Zhang, Hao Wang, Bo Wu, Jiajing Sun, Mingli Fan, Shu Dai, Hengyi Yang and Minyi Xu

J. Mar. Sci. Eng. 2025, 13(8), 1476; https://doi.org/10.3390/jmse13081476 - 31 Jul 2025

Cited by 2 | Viewed by 1366

Abstract

Determining/predicting the environment dominates a variety of marine operations, such as route planning and offshore installation. Significant wave height (Hs) is a critical parameter-defining wave, a dominating marine load. Data-driven machine learning methods have been increasingly applied to Hs prediction, but challenges remain in hyperparameter tuning and spatial generalization. This study explores a novel effective approach for intelligent Hs forecasting for marine operations. Multiple automated machine learning (AutoML) frameworks, namely H2O, PyCaret, AutoGluon, and TPOT, have been systematically evaluated on buoy-based Hs prediction tasks, which reveal their advantages and limitations under various forecast horizons and data quality scenarios. The results indicate that PyCaret achieves superior accuracy in short-term forecasts, while AutoGluon demonstrates better robustness in medium-term and long-term predictions. To address the limitations of single-point prediction models, which often exhibit high dependence on localized data and limited spatial generalization, a multi-point data fusion framework incorporating Principal Component Analysis (PCA) is proposed. The framework utilizes Hs data from two stations near the California coast to predict Hs at another adjacent station. The results indicate that it is possible to realize cross-station predictions based on the data from adjacent (high relevance) stations. Full article

(This article belongs to the Section Physical Oceanography)

24 pages, 5075 KB

Open Access Article

Automated Machine Learning-Based Prediction of the Effects of Physicochemical Properties and External Experimental Conditions on Cadmium Adsorption by Biochar

by Shuoyang Wang, Xiangyu Song, Jicheng Duan, Shuo Li, Dangdang Gao, Jia Liu, Fanjing Meng, Wen Yang, Shixin Yu, Fangshu Wang, Jie Xu, Siyi Luo, Fangchao Zhao and Dong Chen

Water 2025, 17(15), 2266; https://doi.org/10.3390/w17152266 - 30 Jul 2025

Cited by 3 | Viewed by 1702

Abstract

Biochar serves as an effective adsorbent for the heavy metal cadmium, with its performance significantly influenced by its physicochemical properties and various environmental features. Traditional machine learning models, though adept at managing complex multi-feature relationships, rely heavily on expertise in feature engineering and hyperparameter optimization. To address these issues, this study employs an automated machine learning (AutoML) approach, automating feature selection and model optimization, coupled with an intuitive online graphical user interface, enhancing accessibility and generalizability. Comparative analysis of four AutoML frameworks (TPOT, FLAML, AutoGluon, H₂O AutoML) demonstrated that H₂O AutoML achieved the highest prediction accuracy (R² = 0.918). Key features influencing adsorption performance were identified as initial cadmium concentration (23%), stirring rate (14.7%), and the biochar H/C ratio (9.7%). Additionally, the maximum adsorption capacity of the biochar was determined to be 105 mg/g. Optimal production conditions for biochar were determined to be a pyrolysis temperature of 570–800 °C, a residence time of ≥2 h, and a heating rate of 3–10 °C/min to achieve an H/C ratio of <0.2. An online graphical user interface was developed to facilitate user interaction with the model. This study not only provides practical guidelines for optimizing biochar but also introduces a novel approach to modeling using AutoML. Full article

(This article belongs to the Special Issue Advanced Adsorbent-Based Technologies for Efficient Wastewater Treatment)

19 pages, 1425 KB

Open Access Article

Early Detection of Autism Spectrum Disorder Through Automated Machine Learning

by Khafsa Ehsan, Kashif Sultan, Abreen Fatima, Muhammad Sheraz and Teong Chee Chuah

Diagnostics 2025, 15(15), 1859; https://doi.org/10.3390/diagnostics15151859 - 24 Jul 2025

Cited by 5 | Viewed by 6682

Abstract

Background/Objectives: Autism spectrum disorder (ASD) is a neurodevelopmental disorder distinguished by an extensive range of symptoms, including reduced social interaction, communication difficulties and tiresome behaviors. Early detection of ASD is important because it allows for timely intervention, which significantly improves developmental, behavioral, and communicative outcomes in children. However, traditional diagnostic procedures for identifying autism spectrum disorder (ASD) typically involve lengthy clinical examinations, which can be both time-consuming and costly. This research proposes leveraging automated machine learning (AUTOML) to streamline the diagnostic process and enhance its accuracy. Methods: In this study, by collecting data from various rehabilitation centers across Pakistan, we applied a specific AUTOML tool known as Tree-based Pipeline Optimization Tool (TPOT) for ASD detection. Notably, this study marks one of the initial explorations into utilizing AUTOML for ASD detection. The experimentations indicate that the TPOT provided the best pipeline for the dataset, which was verified using a manual machine learning method. Results: The study contributes to the field of ASD diagnosis by using AUTOML to determine the likelihood of ASD in children at prompt stages of evolution. The study also provides an evaluation of precision, recall, and F1-score metrics to confirm the correctness of the diagnosis. The proposed TPOT-based AUTOML framework attained an overall accuracy of 78%, with a precision of 83%, a recall of 90%, and an F1-score of 86% for the autistic class. Conclusions: In summary, this research offers an encouraging approach to improve the detection of autism spectrum disorders (ASD) in children, which could lead to better results for affected individuals and their families. Full article

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Diagnostics and Analysis 2024)
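
The precision, recall, and F1-score reported in the abstract above can be cross-checked against each other, since the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that check, not code from the article; the function name `f1_score` and the variable names are illustrative assumptions, and only the 83% and 90% figures come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Convention assumed here: F1 is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the autistic class in the abstract above.
reported_precision = 0.83
reported_recall = 0.90

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.86", matching the reported 86%
```

The computed value agrees with the F1-score stated in the abstract, so the three reported figures are mutually consistent.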

25 pages, 7504 KB

Open Access Article

Explainable Artificial Intelligence (XAI) for Flood Susceptibility Assessment in Seoul: Leveraging Evolutionary and Bayesian AutoML Optimization

by Kounghoon Nam, Youngkyu Lee, Sungsu Lee, Sungyoon Kim and Shuai Zhang

Remote Sens. 2025, 17(13), 2244; https://doi.org/10.3390/rs17132244 - 30 Jun 2025

Cited by 4 | Viewed by 2217

Abstract

This study aims to enhance the accuracy and interpretability of flood susceptibility mapping (FSM) in Seoul, South Korea, by integrating automated machine learning (AutoML) with explainable artificial intelligence (XAI) techniques. Ten topographic and environmental conditioning factors were selected as model inputs. We first employed the Tree-based Pipeline Optimization Tool (TPOT), an evolutionary AutoML algorithm, to construct baseline ensemble models using Gradient Boosting (GB), Random Forest (RF), and XGBoost (XGB). These models were further fine-tuned using Bayesian optimization via Optuna. To interpret the model outcomes, SHAP (SHapley Additive exPlanations) was applied to analyze both the global and local contributions of each factor. The SHAP analysis revealed that lower elevation, slope, and stream distance, as well as higher stream density and built-up areas, were the most influential factors contributing to flood susceptibility. Moreover, interactions between these factors, such as built-up areas located on gentle slopes near streams, further intensified flood risk. The susceptibility maps were reclassified into five categories (very low to very high), and the GB model identified that approximately 15.047% of the study area falls under very-high-flood-risk zones. Among the models, the GB classifier achieved the highest performance, followed by XGB and RF. The proposed framework, which integrates TPOT, Optuna, and SHAP within an XAI pipeline, not only improves predictive capability but also offers transparent insights into feature behavior and model logic. These findings support more robust and interpretable flood risk assessments for effective disaster management in urban areas. Full article

(This article belongs to the Special Issue Artificial Intelligence for Natural Hazards (AI4NH))

19 pages, 4395 KB

Open Access Article

Web-Based Baseflow Estimation in SWAT Considering Spatiotemporal Recession Characteristics Using Machine Learning

by Jimin Lee, Jeongho Han, Bernard Engel and Kyoung Jae Lim

Environments 2025, 12(3), 94; https://doi.org/10.3390/environments12030094 - 17 Mar 2025

Cited by 2 | Viewed by 1922

Abstract

The increasing frequency and severity of hydrological extremes due to climate change necessitate accurate baseflow estimation and effective watershed management for sustainable water resource use. The Soil and Water Assessment Tool (SWAT) is widely utilized for hydrological modeling but shows limitations in baseflow simulation due to its uniform application of the alpha factor across Hydrologic Response Units (HRUs), neglecting spatial and temporal variability. To address these challenges, this study integrated SWAT with the Tree-Based Pipeline Optimization Tool (TPOT), an automated machine learning (AutoML) framework, to predict HRU-specific alpha factors. Furthermore, a user-friendly web-based program was developed to improve the accessibility and practical application of these optimized alpha factors, supporting more accurate baseflow predictions, even in ungauged watersheds. The proposed HRU-specific alpha factor approach in the study area significantly enhanced the recession and baseflow predictions compared to the traditional uniform alpha factor method. This improvement was supported by key performance metrics, including the Nash–Sutcliffe Efficiency (NSE), the coefficient of determination (R²), the percent bias (PBIAS), and the mean absolute percentage error (MAPE). This integrated framework effectively improves the accuracy and practicality of hydrological modeling, offering scalable and innovative solutions for sustainable watershed management in the face of increasing water stress. Full article

(This article belongs to the Special Issue Hydrological Modeling and Sustainable Water Resources Management)
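
The abstract above names four performance metrics (NSE, R², PBIAS, and MAPE) without defining them. The sketch below shows one common way to compute them from paired observed and simulated series. It is not taken from the article: the function names, the toy data, the choice of R² as the squared Pearson correlation, and the PBIAS sign convention (positive meaning the model underestimates) are all assumptions, and the authors may have used different conventions.

```python
from math import fsum


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over variance of observations."""
    mean_obs = fsum(obs) / len(obs)
    squared_error = fsum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_spread = fsum((o - mean_obs) ** 2 for o in obs)
    return 1 - squared_error / obs_spread


def r_squared(obs, sim):
    """Coefficient of determination, taken here as the squared Pearson correlation."""
    mean_obs = fsum(obs) / len(obs)
    mean_sim = fsum(sim) / len(sim)
    cov = fsum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = fsum((o - mean_obs) ** 2 for o in obs)
    var_sim = fsum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def pbias(obs, sim):
    """Percent bias; with this (assumed) sign convention, positive means underestimation."""
    return 100 * fsum(o - s for o, s in zip(obs, sim)) / fsum(obs)


def mape(obs, sim):
    """Mean absolute percentage error; observations must be non-zero."""
    return 100 * fsum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical toy series for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

for name, metric in [("NSE", nse), ("R2", r_squared), ("PBIAS", pbias), ("MAPE", mape)]:
    print(f"{name}: {metric(observed, simulated):.3f}")
```

NSE and R² approach 1 for a good fit, while PBIAS and MAPE approach 0, which is why the four are often reported together.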

22 pages, 11145 KB

Open Access Article

Regional Soil Moisture Estimation Leveraging Multi-Source Data Fusion and Automated Machine Learning

by Shenglin Li, Pengyuan Zhu, Ni Song, Caixia Li and Jinglei Wang

Remote Sens. 2025, 17(5), 837; https://doi.org/10.3390/rs17050837 - 27 Feb 2025

Cited by 15 | Viewed by 3378

Abstract

Soil moisture (SM) monitoring in farmland at a regional scale is crucial for precision irrigation management and ensuring food security. However, existing methods for SM estimation encounter significant challenges related to accuracy, generalizability, and automation. This study proposes an integrated data fusion method to systematically assess the potential of three automated machine learning (AutoML) frameworks—tree-based pipeline optimization tool (TPOT), AutoGluon, and H2O AutoML—in retrieving SM. To evaluate the impact of input variables on estimation accuracy, six input scenarios were designed: multispectral data (MS), thermal infrared data (TIR), MS combined with TIR, MS with auxiliary data, TIR with auxiliary data, and a comprehensive combination of MS, TIR, and auxiliary data. The research was conducted in a winter wheat cultivation area within the People’s Victory Canal Irrigation Area, focusing on the 0–40 cm soil layer. The results revealed that the scenario incorporating all data types (MS + TIR + auxiliary) achieved the highest retrieval accuracy. Under this scenario, all three AutoML frameworks demonstrated optimal performance. AutoGluon demonstrated superior performance in most scenarios, particularly excelling in the MS + TIR + auxiliary data scenario. It achieved the highest retrieval accuracy with a Pearson correlation coefficient (R) value of 0.822, root mean square error (RMSE) of 0.038 cm³/cm³, and relative root mean square error (RRMSE) of 16.46%. This study underscores the critical role of input data types and fusion strategies in enhancing SM estimation accuracy and highlights the significant advantages of AutoML frameworks for regional-scale SM retrieval. The findings offer a robust technical foundation and theoretical guidance for advancing precision irrigation management and efficient SM monitoring. Full article

28 pages, 4440 KB

Open Access Article

Simplatab: An Automated Machine Learning Framework for Radiomics-Based Bi-Parametric MRI Detection of Clinically Significant Prostate Cancer

by Dimitrios I. Zaridis, Vasileios C. Pezoulas, Eugenia Mylona, Charalampos N. Kalantzopoulos, Nikolaos S. Tachos, Nikos Tsiknakis, George K. Matsopoulos, Daniele Regge, Nikolaos Papanikolaou, Manolis Tsiknakis, Kostas Marias and Dimitrios I. Fotiadis

Bioengineering 2025, 12(3), 242; https://doi.org/10.3390/bioengineering12030242 - 26 Feb 2025

Cited by 5 | Viewed by 2642

Abstract

Background: Prostate cancer (PCa) diagnosis using MRI is often challenged by lesion variability. Methods: This study introduces Simplatab, an open-source automated machine learning (AutoML) framework designed for, but not limited to, automating the entire machine learning pipeline to facilitate the detection of clinically significant prostate cancer (csPCa) using radiomics features. Unlike existing AutoML tools such as Auto-WEKA, Auto-Sklearn, ML-Plan, ATM, Google AutoML, and TPOT, Simplatab offers a comprehensive, user-friendly framework that integrates data bias detection, feature selection, model training with hyperparameter optimization, explainable AI (XAI) analysis, and post-training model vulnerabilities detection. Simplatab requires no coding expertise, provides detailed performance reports, and includes robust data bias detection, making it particularly suitable for clinical applications. Results: Evaluated on a large pan-European cohort of 4816 patients from 12 clinical centers, Simplatab supports multiple machine learning algorithms. The most notable features that differentiate Simplatab include ease of use, a user interface accessible to those with no coding experience, comprehensive reporting, XAI integration, and thorough bias assessment, all provided in a human-understandable format. Conclusions: Our findings indicate that Simplatab can significantly enhance the usability, accountability, and explainability of machine learning in clinical settings, thereby increasing trust and accessibility for AI non-experts. Full article

21 pages, 7635 KB

Open Access Article

Developing an Hourly Water Level Prediction Model for Small- and Medium-Sized Agricultural Reservoirs Using AutoML: Case Study of Baekhak Reservoir, South Korea

by Jeongho Han and Joo Hyun Bae

Agriculture 2025, 15(1), 71; https://doi.org/10.3390/agriculture15010071 - 30 Dec 2024

Cited by 3 | Viewed by 2304

Abstract

This study focuses on developing an hourly water level prediction model for small- and medium-sized agricultural reservoirs using the Tree-based Pipeline Optimization Tool (TPOT), an automated machine learning (AutoML) technique. The study area is the Baekhak Reservoir in South Korea, and various precipitation-related and reservoir water storage data were collected. Using these collected data, we compared widely used individual machine learning and deep learning models with the pipeline models generated by TPOT. The comparison showed that pipeline models, which included various preprocessing and ensemble techniques, exhibited higher predictive accuracy than individual machine learning and even deep learning models. The optimal pipeline model was evaluated for its performance in predicting water levels during an extreme rainfall event, demonstrating its effectiveness for hourly water level prediction. However, issues such as the overprediction of peak water levels and delays in predicting sudden water level changes were observed, likely due to inaccuracies in the ultra-short-term forecast precipitation data and the lack of information on reservoir operations (e.g., gate openings and drainage plans for agriculture). This study highlights the potential of AutoML techniques for use in hydrological modeling, and demonstrates their contribution to more efficient water management and flood prevention strategies in agricultural reservoirs. Full article

(This article belongs to the Special Issue Sustainable Water-Resource Strategies in Agriculture for Climate Change Adaptation)

14 pages, 8478 KB

Open Access Feature Paper Article

Estimating Rainfall Erosivity in North Korea Using Automated Machine Learning: Insights into Regional Soil Erosion Risks

by Jeongho Han and Seoro Lee

Land 2024, 13(12), 2038; https://doi.org/10.3390/land13122038 - 28 Nov 2024

Cited by 1 | Viewed by 1515

Abstract

Soil erosion due to rainfall is a critical environmental issue in North Korea, exacerbated by deforestation and climate change. This study aims to estimate rainfall erosivity (RE) in North Korea using automated machine learning (AutoML), with a particular focus on regional soil erosion risks. North Korean data were sourced from the European Centre for Medium-Range Weather Forecasts (ECMWF) ReAnalysis 5 dataset, while South Korean data were obtained from the Korea Meteorological Administration. Data from 50 stations in South Korea (2013–2019) and 27 stations in North Korea (1980–2020) were used. The GradientBoostingRegressor (GBR) model, optimized using the Tree-based Pipeline Optimization Tool (TPOT), was trained on South Korean data. The model’s performance was evaluated using metrics such as the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), achieving high predictive accuracy across eight stations in South Korea. Using the optimized model, RE in North Korea was estimated, and the spatial distribution of RE was analyzed using the Kriging interpolation. Results reveal significant regional variability, with the southern and western areas displaying the highest erosivity. These findings provide valuable insights into soil erosion management and the development of sustainable agricultural and environmental strategies in North Korea. Full article

(This article belongs to the Section Land, Soil and Water)

17 pages, 4057 KB

Open Access Article

A Comparative Analysis of Automated Machine Learning Tools: A Use Case for Autism Spectrum Disorder Detection

by Rana Tuqeer Abbas, Kashif Sultan, Muhammad Sheraz and Teong Chee Chuah

Information 2024, 15(10), 625; https://doi.org/10.3390/info15100625 - 11 Oct 2024

Cited by 7 | Viewed by 2500

Abstract

Automated Machine Learning (AutoML) enhances productivity and efficiency by automating the entire process of machine learning model development, from data preprocessing to model deployment. These tools are accessible to users with varying levels of expertise and enable efficient, scalable, and accurate classification across different applications. This paper evaluates two popular AutoML tools, the Tree-Based Pipeline Optimization Tool (TPOT) version 0.10.2 and Konstanz Information Miner (KNIME) version 5.2.5, comparing their performance in a classification task. Specifically, this work analyzes autism spectrum disorder (ASD) detection in toddlers as a use case. The dataset for ASD detection was collected from various rehabilitation centers in Pakistan. TPOT and KNIME were applied to the ASD dataset, with TPOT achieving an accuracy of 85.23% and KNIME achieving 83.89%. Evaluation metrics such as precision, recall, and F1-score validated the reliability of the models. After selecting the best models with optimal accuracy, the most important features for ASD detection were identified using these AutoML tools. The tools optimized the feature selection process and significantly reduced diagnosis time. This study demonstrates the potential of AutoML tools and feature selection techniques to improve early ASD detection and outcomes for affected children and their families.

(This article belongs to the Special Issue Real-World Applications of Machine Learning Techniques)

31 pages, 1004 KB

Open AccessArticle

Daily Streamflow Forecasting Using AutoML and Remote-Sensing-Estimated Rainfall Datasets in the Amazon Biomes

by Matteo Bodini

Signals 2024, 5(4), 659-689; https://doi.org/10.3390/signals5040037 - 10 Oct 2024

Cited by 5 | Viewed by 4085

Abstract

Reliable streamflow forecasting is crucial for several tasks related to water-resource management, including planning reservoir operations, power generation via Hydroelectric Power Plants (HPPs), and flood mitigation, thus resulting in relevant social implications. The present study is focused on the application of Automated Machine-Learning (AutoML) models to forecast daily streamflow in the area of the upper Teles Pires River basin, located in the region of the Amazon biomes. This area is characterized by extensive water-resource utilization, mostly for power generation through HPPs, and it has a limited hydrological data-monitoring network. Five different AutoML models, i.e., auto-sklearn, Tree-based Pipeline Optimization Tool (TPOT), H2O AutoML, AutoKeras, and MLBox, were employed to forecast the daily streamflow. The AutoML input features were set as the time-lagged streamflow and average rainfall data sourced from four rain gauge stations and one streamflow gauge station. To overcome the lack of training data, in addition to the previous features, products estimated via remote sensing were leveraged as training data, including PERSIANN, PERSIANN-CCS, PERSIANN-CDR, and PDIR-Now. The selected AutoML models proved their effectiveness in forecasting the streamflow in the considered basin. In particular, the reliability of streamflow predictions was high both when training data came from rain and streamflow gauge stations and when training data were collected by the four previously mentioned estimated remote-sensing products. Moreover, the selected AutoML models showed promising results in forecasting the streamflow up to a three-day horizon, relying on the two available kinds of input features. As a final result, the present research underscores the potential of employing AutoML models for reliable streamflow forecasting, which can significantly advance water-resource planning and management within the studied geographical area.

(This article belongs to the Special Issue Rainfall Estimation Using Signals)
