MDPI - Publisher of Open Access Journals

26 pages, 4563 KB

Open Access Article

Personalized Smart Home Automation Using Machine Learning: Predicting User Activities

by Mark M. Gad, Walaa Gad, Tamer Abdelkader and Kshirasagar Naik

Sensors 2025, 25(19), 6082; https://doi.org/10.3390/s25196082 - 2 Oct 2025

A personalized framework for smart home automation is introduced, utilizing machine learning to predict user activities and allow for the context-aware control of living spaces. Predicting user activities, such as ‘Watch_TV’, ‘Sleep’, ‘Work_On_Computer’, and ‘Cook_Dinner’, is essential for improving occupant comfort, optimizing energy consumption, and offering proactive support in smart home settings. The Edge Light Human Activity Recognition Predictor, or EL-HARP, is the main prediction model used in this framework to predict user behavior. The system combines open-source software for real-time sensing, facial recognition, and appliance control with affordable hardware, including the Raspberry Pi 5, ESP32-CAM, Tuya smart switches, NFC (Near Field Communication), and ultrasonic sensors. To predict daily user activities, three gradient-boosting models (XGBoost, CatBoost, and LightGBM) are trained for each household using engineered features and past behavior patterns. Using extended temporal features, LightGBM in particular achieves strong predictive performance within EL-HARP. The framework is optimized for edge deployment with efficient training, regularization, and class imbalance handling. A fully functional prototype demonstrates real-time performance and adaptability to individual behavior patterns. This work contributes a scalable, privacy-preserving, and user-centric approach to intelligent home automation. Full article

(This article belongs to the Special Issue Sensor-Based Human Activity Recognition)

32 pages, 4265 KB

Open Access Article

Machine Learning Approaches for Classification of Composite Materials

by Dmytro Tymoshchuk, Iryna Didych, Pavlo Maruschak, Oleh Yasniy, Andrii Mykytyshyn and Mykola Mytnyk

Modelling 2025, 6(4), 118; https://doi.org/10.3390/modelling6040118 - 1 Oct 2025

Abstract

The paper presents a comparative analysis of various machine learning algorithms for the classification of epoxy composites reinforced with basalt fiber and modified with inorganic fillers. The classification is based on key thermophysical characteristics, in particular, the mass fraction of the filler, temperature, and thermal conductivity coefficient. A dataset of 16,056 interpolated samples was used to train and evaluate more than a dozen models. Among the tested algorithms, the MLP neural network model showed the highest accuracy of 99.7% and balanced classification metrics F1-measure and G-Mean. Ensemble methods, including XGBoost, CatBoost, ExtraTrees, and HistGradientBoosting, also showed high classification accuracy. To interpret the results of the MLP model, SHAP analysis was applied, which confirmed the predominant influence of the mass fraction of the filler on decision-making for all classes. The results of the study confirm the high effectiveness of machine learning methods for recognizing filler type in composite materials, as well as the potential of interpretable AI in materials science tasks. Full article

(This article belongs to the Special Issue Machine Learning and Artificial Intelligence in Modelling)

31 pages, 3702 KB

Open Access Article

Optimized Intrusion Detection in the IoT Through Statistical Selection and Classification with CatBoost and SNN

by Brou Médard Kouassi, Abou Bakary Ballo, Kacoutchy Jean Ayikpa, Diarra Mamadou and Youssouf Diabagate

Technologies 2025, 13(10), 441; https://doi.org/10.3390/technologies13100441 - 30 Sep 2025

Abstract

With the rapid expansion of the Internet of Things (IoT), interconnected systems are becoming increasingly vulnerable to cyberattacks, making intrusion detection essential but difficult. The marked imbalance between regular traffic and attacks, as well as the redundancy of variables from multiple sensors and protocols, greatly complicates this task. The study aims to improve the robustness of IoT intrusion detection systems by reducing the risks of overfitting and false negatives through appropriate rebalancing and variable selection strategies. We combine two data rebalancing techniques, Synthetic Minority Over-sampling Technique (SMOTE) and Random Undersampling (RUS), with two feature selection methods, LASSO and Mutual Information, and then evaluate their performance on two classification models: CatBoost and a Simple Neural Network (SNN). The experiments show the superiority of CatBoost, which achieves an accuracy of 82% compared to 80% for SNN, and confirm the effectiveness of SMOTE over RUS, particularly for SNN. The CatBoost + SMOTE + LASSO configuration stands out with a recall of 82.43% and an F1-score of 85.08%, offering the best compromise between detection and reliability. These results demonstrate that combining rebalancing and variable selection techniques significantly enhances the performance and reliability of intrusion detection systems in the IoT, thereby strengthening cybersecurity in connected environments. Full article
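The rebalancing step named in this abstract can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. It is a simplified stand-in written in plain Python, not the authors' pipeline and not a library implementation; the function name `smote_sketch`, the parameter `k`, and the `attacks` feature vectors are hypothetical.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + lam * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical 2-D feature vectors for the minority (attack) class.
attacks = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(attacks, n_new=6)
```

Each synthetic point lies on the segment between two real minority samples, so oversampling adds variety without leaving the region the minority class already occupies. Random undersampling, by contrast, simply discards majority samples.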

(This article belongs to the Special Issue IoT-Enabling Technologies and Applications—2nd Edition)

15 pages, 855 KB

Open Access Article

Integrating Fitbit Wearables and Self-Reported Surveys for Machine Learning-Based State–Trait Anxiety Prediction

by Archana Velu, Jayroop Ramesh, Abdullah Ahmed, Sandipan Ganguly, Raafat Aburukba, Assim Sagahyroon and Fadi Aloul

Appl. Sci. 2025, 15(19), 10519; https://doi.org/10.3390/app151910519 - 28 Sep 2025

Abstract

Anxiety disorders represent a significant global health challenge, yet a substantial treatment gap persists, motivating the development of scalable digital health solutions. This study investigates the potential of integrating passive physiological data from consumer wearable devices with subjective self-reported surveys to predict state–trait anxiety. Leveraging the multi-modal, longitudinal LifeSnaps dataset, which captured “in the wild” data from 71 participants over four months, this research develops and evaluates a machine learning framework for this purpose. The methodology details a reproducible data curation pipeline, including participant-specific time zone harmonization, validated survey scoring, and comprehensive feature engineering from Fitbit Sense physiological data. A suite of machine learning models was trained to classify the presence of anxiety, defined by the State–Trait Anxiety Inventory (S-STAI). The CatBoost ensemble model achieved an accuracy of 77.6%, with high sensitivity (92.9%) but more modest specificity (48.9%). The positive predictive value (77.3%) and negative predictive value (78.6%) indicate balanced predictive utility across classes. The model obtained an F1-score of 84.3%, a Matthews correlation coefficient of 0.483, and an AUC of 0.709, suggesting good detection of anxious cases but more limited ability to correctly identify non-anxious cases. Post hoc explainability approaches (local and global) reveal that key predictors of state anxiety include measures of cardio-respiratory fitness (VO₂Max), calorie expenditure, duration of light activity, resting heart rate, thermal regulation and age. While additional sensitivity analysis and conformal prediction methods reveal that the size of the datasets contributes to overfitting, the features and the proposed approach are generally conducive to reasonable anxiety prediction. These findings underscore the use of machine learning and ubiquitous sensing modalities for a more holistic and accurate digital phenotyping of state anxiety. Full article
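The metrics quoted in this abstract all derive from one binary confusion matrix, which the short sketch below makes explicit. The counts passed in (`tp=78, fn=6, tn=22, fp=23`) are hypothetical: they were picked only because they land close to the reported percentages, and they are not the authors' confusion matrix.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard metrics computed from binary confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # recall on anxious cases
        "specificity": tn / (tn + fp),          # recall on non-anxious cases
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts, chosen for illustration only.
m = binary_metrics(tp=78, fn=6, tn=22, fp=23)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is about 0.929 and specificity about 0.489. The example shows how a model can score well on F1 while still misclassifying roughly half of the negative class, which is why the Matthews correlation coefficient is reported alongside it.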

(This article belongs to the Special Issue AI Technologies for eHealth and mHealth, 2nd Edition)

24 pages, 11488 KB

Open Access Article

An Innovative Approach for Forecasting Hydroelectricity Generation by Benchmarking Tree-Based Machine Learning Models

by Bektaş Aykut Atalay and Kasım Zor

Appl. Sci. 2025, 15(19), 10514; https://doi.org/10.3390/app151910514 - 28 Sep 2025

Abstract

Hydroelectricity, one of the oldest and most potent forms of renewable energy, not only provides low-cost electricity for the grid but also preserves nature through flood control and irrigation support. Forecasting hydroelectricity generation is vital for using resources effectively, optimizing energy production, and ensuring sustainability. This paper provides an innovative approach to hydroelectricity generation forecasting (HGF) for a 138 MW hydroelectric power plant (HPP) in the Eastern Mediterranean by taking into account the electricity production of the remaining upstream HPPs on the Ceyhan River within the same basin, unlike prior research focusing on individual HPPs. With hyperparameters such as the number of trees and the learning rate tuned, this paper presents a thorough benchmark of state-of-the-art tree-based machine learning models, namely categorical boosting (CatBoost), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). The comprehensive data set includes historical hydroelectricity generation, meteorological conditions, market pricing, and calendar variables at hourly resolution, acquired from the transparency platform of the Energy Exchange Istanbul (EXIST) and NASA's MERRA-2 reanalysis. Although all three models performed well, LightGBM emerged as the most accurate and efficient model, with the highest coefficient of determination (R²) (97.07%), the lowest root mean squared scaled error (RMSSE) (0.1217), and the shortest computational time (1.24 s). Consequently, the proposed methodology demonstrates significant potential for advancing HGF and will contribute to the operation of existing HPPs and the improvement of power dispatch planning. Full article
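The two accuracy measures quoted here can be computed as in the sketch below. It assumes the common definition of RMSSE, in which the forecast error is scaled by the in-sample error of a one-step naive forecast; the abstract does not state which scaling the authors used. The series `history`, `observed`, and `forecast` are made-up numbers.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmsse(train, actual, predicted):
    """Root mean squared scaled error, scaled by the mean squared error
    of a one-step naive forecast on the training series (assumed scaling)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    naive_mse = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    return math.sqrt(mse / naive_mse)

# Hypothetical hourly generation values (MWh), for illustration only.
history = [100.0, 104.0, 110.0, 108.0, 115.0]
observed = [118.0, 121.0, 119.0]
forecast = [118.5, 120.5, 119.5]

r2 = r_squared(observed, forecast)
scaled_error = rmsse(history, observed, forecast)
```

An RMSSE below 1 means the model beats the naive "repeat the last value" forecast on the same scale, which makes the measure comparable across plants with different capacities.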

23 pages, 17838 KB

Open Access Article

Integrating Multi-Temporal Sentinel-1/2 Vegetation Signatures with Machine Learning for Enhanced Soil Salinity Mapping Accuracy in Coastal Irrigation Zones: A Case Study of the Yellow River Delta

by Junyong Zhang, Tao Liu, Wenjie Feng, Lijing Han, Rui Gao, Fei Wang, Shuang Ma, Dongrui Han, Zhuoran Zhang, Shuai Yan, Jie Yang, Jianfei Wang and Meng Wang

Agronomy 2025, 15(10), 2292; https://doi.org/10.3390/agronomy15102292 - 27 Sep 2025

Abstract

Soil salinization poses a severe threat to agricultural sustainability in the Yellow River Delta, where conventional spectral indices are limited by vegetation interference and seasonal dynamics in coastal saline-alkali landscapes. To address this, we developed an inversion framework integrating spectral indices and vegetation temporal features, combining multi-temporal Sentinel-2 optical data (January 2024–March 2025), Sentinel-1 SAR data, and terrain covariates. The framework employs Savitzky–Golay (SG) filtering to extract vegetation temporal indices, including NDVI temporal extremum and principal component features, which capture salt stress response mechanisms beyond single-temporal spectral indices. Based on 119 field samples and Variable Importance in Projection (VIP) feature selection, three ensemble models (XGBoost, CatBoost, LightGBM) were constructed under two strategies: single spectral features versus fused spectral and vegetation temporal features. The key results demonstrate the following: (1) The LightGBM model with fused features achieved optimal validation accuracy (R² = 0.77, RMSE = 0.26 g/kg), outperforming single-feature models by 13% in R². (2) SHAP analysis identified vegetation-related factors as key predictors, revealing a negative correlation between peak biomass and salinity accumulation and showing that the summer crop growth process affects soil salinization in the following spring. (3) The fused strategy reduced overestimation in low-salinity zones, enhanced model robustness, and significantly improved spatial gradient continuity. This study confirms that vegetation phenological features effectively mitigate agricultural interference (e.g., tillage-induced signal noise) and achieve high-resolution salinity mapping in areas where traditional spectral indices fail. The multi-temporal integration framework provides a replicable methodology for monitoring coastal salinization under complex land cover conditions. Full article
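As an illustration of the SG filtering step, the sketch below smooths an NDVI time series with the classic 5-point quadratic Savitzky–Golay kernel and then reads off the temporal maximum. The window length, the polynomial order, the choice to leave the two end points on each side unsmoothed, and the `ndvi` values are all assumptions made for this example; the abstract does not give the authors' settings.

```python
# 5-point, quadratic Savitzky-Golay smoothing coefficients.
SG_5_QUADRATIC = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def savgol5(series):
    """Smooth interior points with the 5-point quadratic SG kernel.
    The first and last two samples are left unchanged (a simplification)."""
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2 : i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_5_QUADRATIC, window))
    return smoothed

# Hypothetical NDVI observations for one pixel across acquisition dates.
ndvi = [0.21, 0.25, 0.38, 0.35, 0.52, 0.66, 0.61, 0.70, 0.58, 0.41, 0.30]
ndvi_smooth = savgol5(ndvi)

# Temporal extremum feature: the smoothed seasonal peak and its position.
peak_index = max(range(len(ndvi_smooth)), key=ndvi_smooth.__getitem__)
ndvi_peak = ndvi_smooth[peak_index]
```

Because the kernel fits a local quadratic, it damps acquisition noise while preserving the height and timing of the seasonal peak better than a plain moving average would.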

(This article belongs to the Special Issue In-Field Detection and Monitoring Technology in Precision Agriculture—2nd Edition)

17 pages, 1454 KB

Open Access Article

Machine Learning Model for Predicting Multidrug Resistance in Clinical Escherichia coli Isolates: A Retrospective General Surgery Study

by Hüseyin Kerem Tolan, İrfan Aydın, Handan Tanyildizi-Kokkulunk, Mehmet Karakuş, Yüksel Akkaya, Osman Kaya and Ferruh Kemal İşman

Antibiotics 2025, 14(10), 969; https://doi.org/10.3390/antibiotics14100969 - 26 Sep 2025

Abstract

Background/Objectives: Escherichia coli is one of the leading causes of surgical site infections (SSIs) and poses a growing public health concern due to its increasing antimicrobial resistance. High rates of extended-spectrum beta-lactamase (ESBL) production among E. coli strains complicate treatment outcomes and emphasize the need for effective surveillance and control strategies. Methods: A total of 691 E. coli isolates from general surgery clinics (2020–2025) were identified using MALDI-TOF MS. Antibiotic susceptibility data and patient variables were cleaned, encoded, and used to predict resistance using the Random Forest, CatBoost, and Naive Bayes algorithms. SMOTE addressed class imbalance, and model performance was assessed through various validation methods. Results: Among the three machine learning models tested, Random Forest (RF) showed the best performance in predicting antibiotic resistance of E. coli, achieving median accuracy, precision, recall, and F1-scores of 0.90 and AUC values up to 0.99 for key antibiotics. CatBoost performed similarly but was less stable with imbalanced data, while Naive Bayes showed lower accuracy. Feature importance analysis highlighted strong inter-antibiotic resistance links, especially among β-lactams, and some influence of demographic factors. Conclusions: This study highlights the potential of simple, high-performing models using structured clinical data to predict antimicrobial resistance, especially in resource-limited clinical settings. By incorporating machine learning into antimicrobial resistance (AMR) surveillance systems, our goal is to support the advancement of rapid diagnostics and targeted antimicrobial stewardship approaches, which are essential in addressing the growing challenge of multidrug resistance. Full article

(This article belongs to the Section Antibiotics Use and Antimicrobial Stewardship)

16 pages, 2342 KB

Open Access Article

Modeling Pain Dynamics and Opioid Response in Oncology Inpatients: A Retrospective Study with Application to AI-Guided Analgesic Strategies in Colorectal Cancer

by Eliza-Maria Froicu (Armeanu), Oriana-Maria Onicescu (Oniciuc), Ioana Creangă-Murariu, Camelia Dascălu, Bogdan Gafton, Vlad-Adrian Afrăsânie, Teodora Alexa-Stratulat, Mihai-Vasile Marinca, Diana-Maria Pușcașu, Lucian Miron, Gema Bacoanu, Irina Afrăsânie and Vladimir Poroch

Medicina 2025, 61(10), 1741; https://doi.org/10.3390/medicina61101741 - 25 Sep 2025

Abstract

Background and Objectives: Cancer pain remains a major clinical problem. This study aims to evaluate the effectiveness of the World Health Organization (WHO) analgesic ladder in patients with colorectal cancer and to develop machine learning models that predict treatment response for precision pain management. Materials and Methods: In a retrospective observational study, a total of 107 oncological patients were analyzed, with a detailed subgroup analysis of 42 patients with colorectal cancer, hospitalized between July and September 2022. Pain was assessed using numerical rating scales at baseline and at 2–3 weeks of follow-up. Clinical variables included demographics, disease staging, metastatic patterns, analgesic progression, and medication usage. Machine learning algorithms (e.g., Random Forest, CatBoost, XGBoost, and Neural Network) were used to predict pain reduction outcomes. UMAP dimensionality reduction and clustering identified patient phenotypes. Results: Statistical analyses included descriptive methods and Chi-square and Mann–Whitney tests, and model performance was evaluated by AUC. Among patients with colorectal cancer, 73.8% achieved clinical improvement in pain, with a mean reduction of 2.62 points and a median improvement of 3.00 points. The metastatic site significantly affected outcomes: patients with visceral metastases showed a median improvement of 3.00 points with high variability, patients with bone metastases demonstrated heterogeneous responses (range: −2.00 to +8.00 points), while non-metastatic patients exhibited consistent improvement. Random Forest achieved the best predictive performance (AUC: 0.9167), identifying the baseline pain score, bone metastases, Fentanyl usage, anticonvulsants, and antispasmodics as key predictive features. The clustering analysis revealed two distinct phenotypes requiring different analgesic intensities. Conclusions: This study validates the effectiveness of the WHO analgesic ladder while demonstrating superior outcomes in patients with colorectal cancer. The machine learning models successfully predict the treatment response with excellent discriminative ability, supporting precision medicine implementation in cancer pain management. Full article

(This article belongs to the Section Oncology)

22 pages, 5879 KB

Open Access Article

Explainable Machine Learning for Multicomponent Concrete: Predictive Modeling and Feature Interaction Insights

by Jie Wang, Junqi Deng, Siyi Li, Weijie Du, Zengqi Zhang and Xiaoming Liu

Materials 2025, 18(19), 4456; https://doi.org/10.3390/ma18194456 - 24 Sep 2025

Abstract

Multicomponent concrete is a widely used industrial material, yet its performance evaluation still relies heavily on expert judgment and long-term monitoring. With the rapid development of artificial intelligence (AI), machine learning has emerged as a promising tool in building science for analyzing complex datasets and reducing uncertainties associated with human factors. This study applies a variety of machine learning techniques—including linear and polynomial regressions, tree-based algorithms (Decision Tree, Random Forest, ExtraTrees, AdaBoost, CatBoost, and XGBoost), and the TabPFN model—to investigate the key factors influencing concrete compressive strength. To enhance interpretability, SHAP analysis was employed to uncover feature importance and interactions, offering new insights into the underlying mechanisms of multicomponent concrete. The findings provide a data-driven approach to support engineering design, facilitate decision-making in construction practice, and contribute to the development of more efficient and sustainable building materials. Full article

(This article belongs to the Section Construction and Building Materials)

35 pages, 7791 KB

Open Access Article

Data-Driven Spatial Optimization of Elderly Care Facilities: A Study on Nonlinear Threshold Effects Based on XGBoost and SHAP—A Case Study of Xi’an, China

by Linggui Liu, Han Lyu, Jinghua Dai, Yuheng Tu and Taotao Gao

ISPRS Int. J. Geo-Inf. 2025, 14(10), 371; https://doi.org/10.3390/ijgi14100371 - 24 Sep 2025

Abstract

Under the accelerating demographic aging trend, the rational allocation of elderly care facilities has emerged as a critical challenge. Although existing studies have investigated elderly care facility planning using conventional methods, they frequently overlook the nonlinear interactions between built environment factors and heterogeneous demands across different elderly care facility types. This study addresses these gaps by proposing a data-driven framework that integrates machine learning with spatial analysis to optimize elderly care facility distribution in the central area of Xi’an, Shaanxi Province, China. Leveraging multi-source datasets encompassing points of interest (POIs), road networks, and demographic statistics, we classify facilities into three categories (service-oriented, activity-oriented, and care-oriented) and employ an XGBoost model with SHAP interpretability to evaluate spatial distributions and influencing factors. The results demonstrate that the XGBoost model outperforms comparative algorithms (Random Forest, CatBoost, LightGBM) with superior performance metrics (accuracy of 97%, precision of 95%, and F1-score of 90%), effectively capturing nonlinear threshold effects. Key findings reveal the following: (1) Accessibility and road density exert threshold effects on care-oriented facilities, with facility attractiveness saturating when these values exceed 6; (2) Land use intensity and medical resources positively correlate with activity-oriented facilities, while excessive retail density inhibits their distribution; (3) Service-oriented facilities thrive in areas with balanced accessibility and moderate commercial diversity. Spatial analysis identifies clustered distribution patterns in urban core areas contrasted with peripheral deficiencies, indicating a need for targeted interventions. This research contributes a scalable methodology for equitable facility planning, emphasizing the integration of dynamic built environment variations with model interpretability. The framework provides significant implications for formulating age-friendly urban policies applicable to global cities undergoing rapid urbanization and population aging. Full article

20 pages, 2119 KB

Open Access Article

Power Outage Prediction on Overhead Power Lines on the Basis of Their Technical Parameters: Machine Learning Approach

by Vadim Bol’shev, Dmitry Budnikov, Andrei Dzeikalo and Roman Korolev

Energies 2025, 18(18), 5034; https://doi.org/10.3390/en18185034 - 22 Sep 2025

Abstract

In this study, data on the characteristics of high-voltage overhead power lines were used in a classification task to predict power supply outages by means of supervised machine learning. To choose the most suitable features for power outage prediction, an Exploratory Data Analysis of power line parameters was carried out, including statistical and correlational methods. For the given task, five classifiers were considered: Support Vector Machine, Logistic Regression, Random Forest, and two gradient-boosting algorithms over decision trees, LightGBM Classifier and CatBoost Classifier. To automate the process of data conversion and eliminate the possibility of data leakage, Pipeline and Column Transformer (a builder of heterogeneous features) were applied; data for the models were prepared using One-Hot Encoding and standardization techniques. The data were divided into training and validation samples through cross-validation with stratified separation. The hyperparameters of the classifiers were tuned using randomized and exhaustive search over specified parameter values. The results of the study demonstrated the potential for predicting power failures on 110 kV overhead power lines from data on their parameters, as can be seen from the quality metrics of the tuned classifiers. The best outage prediction quality was achieved by the Logistic Regression model, with a ROC AUC of 0.78 and an AUC-PR of 0.68. In the final phase of the research, the influence of power line parameters on the failure probability was analyzed using the embedded feature importance methods of various models, including estimation of the vector of regression coefficients. This allowed the numerical impact of power line parameters on power supply outages to be evaluated. Full article
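The stratified separation mentioned in this abstract can be sketched in plain Python as follows. This is a simplified illustration of the idea, not the authors' code and not a library routine; the function name `stratified_folds` and the `outage` labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Assign sample indices to folds so that each fold keeps roughly
    the same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out to the folds in round-robin order.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

# Hypothetical labels: 1 = line with an outage, 0 = line without one.
outage = [1] * 6 + [0] * 14
folds = stratified_folds(outage, n_folds=2)

for k, fold in enumerate(folds):
    positives = sum(outage[i] for i in fold)
    print(f"fold {k}: {len(fold)} samples, {positives} outages")
```

Keeping the outage rate equal across folds matters when the positive class is rare, because an unstratified split can leave a validation fold with too few outages for ROC AUC or AUC-PR to be estimated reliably.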

(This article belongs to the Special Issue Future Multi-Energy Smart-Grids: Advances in Operation, Control, and Monitoring)

11 pages, 617 KB

Open Access Article

An Explainable AI Framework for Online Diabetes Risk Prediction with a Personalized Chatbot Assistant

by Ehesan Maimaitijiang, Muyesaier Aihaiti and Yasin Mamatjan

Electronics 2025, 14(18), 3738; https://doi.org/10.3390/electronics14183738 - 22 Sep 2025

Abstract

Background and Objective: Diabetes is a prevalent chronic disease that presents considerable health risks, making prompt diagnosis and treatment essential to avert complications. Traditional Artificial Intelligence (AI) models for diabetes prediction often operate as black boxes. Such models lack interpretability, which limits their effectiveness in clinical use cases. We introduce a novel online recommendation framework using explainable AI (XAI) to predict type II diabetes risk and provide clear, actionable analyses with a personalized chatbot assistant. Methods: To build the model, we chose the CatBoost classifier and SHapley Additive exPlanations (SHAP) for their ability to provide accurate, explainable predictions. Using those tools, we analyzed 16 individual risk factors from a dataset of 520 patients. We applied the Synthetic Minority Over-sampling Technique (SMOTE) to reduce the effect of data imbalance. We also developed an interactive interface that allows users to input data, visualize personalized risk profiles, and understand the driving factors behind predictions. Finally, large language models (LLMs) were integrated into the interface to give patient-specific recommendations for improving health and lifestyle through a personalized chatbot assistant. Results: The model demonstrated strong predictive performance, with an Area Under the ROC Curve (AUC) of 0.99, a Cohen Kappa score of 0.978, and an F1 score of 0.99. For the minority class, SMOTE application improved performance metrics, resulting in an AUC of 0.98 and an F1 score of 0.91 for female patients. Conclusions: This study proposes an explainable AI framework for predicting diabetes risk online and providing patient-specific advice through a personalized chatbot assistant. This will help to facilitate better decision-making and improved management of diabetes risk. Full article

(This article belongs to the Special Issue Machine Learning in Electronic and Biomedical Engineering, 3rd Edition)

25 pages, 5602 KB

Open Access Article

Machine Learning-Based Estimation of Tractor Performance in Tillage Operations Using Soil Physical Properties

by So-Yun Gong, Seung-Min Baek, Seung-Yun Baek, Yong-Joo Kim and Wan-Soo Kim

Agronomy 2025, 15(9), 2228; https://doi.org/10.3390/agronomy15092228 - 21 Sep 2025

Abstract

Accurate estimation of tractor performance under various soil conditions is essential for enhancing operational efficiency in precision agriculture. This study developed machine learning models to estimate tractor performance based on key soil physical properties. Three algorithms—decision tree (DT), CatBoost, and LightGBM—were employed to capture nonlinear relationships between soil parameters and tractor performance indicators. The input variables included soil moisture content, cone index, and particle composition, while the output variables were engine torque, power, slip ratio, and axle power. The models in this study were trained and validated using field data collected from eight paddy fields in Chungcheongnam-do (two in Seosan, two in Cheongyang, and four in Dangjin) and two paddy fields in Gyeonggi-do (Anseong), Republic of Korea. Results showed that models using multiple soil variables significantly outperformed those using single variables. In Model D, CatBoost demonstrated superior performance in predicting engine torque, engine power, slip ratio, and axle power, achieving R² values that were 7.0–14.2% higher than those of DT and 1.6–3.8% higher than those of LightGBM. These findings demonstrate the feasibility of using machine learning with minimal input data to estimate tractor performance, potentially reducing the reliance on extensive physical testing. Full article

(This article belongs to the Special Issue Harnessing Sensing, Artificial Intelligence, and Robotics for Digital Agriculture)

29 pages, 7187 KB

Open Access Article

A Novel Framework for Predicting Daily Reference Evapotranspiration Using Interpretable Machine Learning Techniques

by Elsayed Ahmed Elsadek, Mosaad Ali Hussein Ali, Clinton Williams, Kelly R. Thorp and Diaa Eldin M. Elshikha

Agriculture 2025, 15(18), 1985; https://doi.org/10.3390/agriculture15181985 - 20 Sep 2025

Abstract

Accurate estimation of daily reference evapotranspiration (ET_o) is crucial for sustainable water resource management and irrigation scheduling, especially in water-scarce regions like Arizona. The standardized Penman–Monteith (PM) method is costly and requires specialized instruments and expertise, making it generally impractical for commercial growers. This study developed 35 ET_o models to predict daily ET_o across Coolidge, Maricopa, and Queen Creek in Pinal County, Arizona. Seven input combinations of daily meteorological variables were used for training and testing five machine learning (ML) models: Artificial Neural Network (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Support Vector Machine (SVM). Four statistical indicators, the coefficient of determination (R²), normalized root-mean-squared error (RMSE_n), mean absolute error (MAE), and simulation error (S_e), were used to evaluate the ML models’ performance in comparison with the FAO-56 PM standardized method. The SHapley Additive exPlanations (SHAP) method was used to interpret each meteorological variable’s contribution to the model predictions. Overall, the 35 developed ET_o models showed excellent to fair performance in predicting daily ET_o at the three weather stations. ANN10, RF10, XGBoost10, CatBoost10, and SVM10, which incorporate all ten meteorological variables, yielded the highest accuracies during the training and testing periods (0.994 ≤ R² ≤ 1.0, 0.729 ≤ RMSE_n ≤ 3.662, 0.030 ≤ MAE ≤ 0.181 mm·day⁻¹, and 0.833 ≤ S_e ≤ 2.295). Excluding meteorological variables caused a gradual decline in the performance of the developed ET_o models across the stations. However, 3-variable models using only maximum, minimum, and average temperatures (T_max, T_min, and T_ave) predicted ET_o well across the three stations during testing (13.469 ≤ RMSE_n ≤ 17.655 and S_e ≤ 15.45%). Results highlighted that T_max, solar radiation (R_s), and wind speed at 2 m height (U₂) are the most influential factors affecting ET_o at the central Arizona sites, followed by extraterrestrial solar radiation (R_a) and T_ave. In contrast, humidity-related variables (RH_min, RH_max, and RH_ave), along with T_min and precipitation (P_r), had minimal impact on the models’ predictions. The results can assist growers and policymakers in developing effective water management strategies, especially for arid regions like central Arizona.
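
To make the evaluation indicators named in this abstract concrete, the following is a minimal sketch of how R², MAE, and a normalized RMSE could be computed for predicted versus reference ET_o. The abstract does not give the formulas, so two assumptions are made here: R² is taken as 1 − SS_res/SS_tot, and RMSE_n is taken as RMSE divided by the mean of the reference values, expressed in percent. The simulation error (S_e) is omitted because its definition is not stated. The sample values are hypothetical and are not data from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mae(observed, predicted):
    """Mean absolute error, in the same unit as the inputs (e.g. mm/day)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse_n(observed, predicted):
    """Normalized RMSE in percent.

    ASSUMPTION: normalization by the mean of the observed values; the
    abstract does not say which normalization the authors used.
    """
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Hypothetical daily ET_o values in mm/day (illustrative only).
eto_reference = [4.0, 5.0, 6.0, 7.0]   # e.g. from the FAO-56 PM method
eto_predicted = [4.5, 4.5, 6.5, 6.5]   # e.g. from an ML model

print(f"R2     = {r_squared(eto_reference, eto_predicted):.3f}")
print(f"MAE    = {mae(eto_reference, eto_predicted):.3f} mm/day")
print(f"RMSE_n = {rmse_n(eto_reference, eto_predicted):.2f} %")
```

Under a different normalization (for example, by the range of the reference values), the RMSE_n figures would change, so values computed this way are not directly comparable with those reported in the abstract unless the definitions match.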

(This article belongs to the Section Agricultural Water Management)

46 pages, 29512 KB

Open Access Article

From Research Trend to Performance Prediction: Metaheuristic-Driven Machine Learning Optimization for Cement Pastes Containing Bio-Based Phase Change Materials

by Leifa Li, Wangwen Sun, Lauren Y. Gómez-Zamorano, Zhuangzhuang Liu, Wenzhen Zhang and Haoran Ma

Polymers 2025, 17(18), 2541; https://doi.org/10.3390/polym17182541 - 19 Sep 2025

Abstract

This study presents an integrated approach combining bibliometric analysis and machine learning to explore research trends and predict the performance of cement pastes containing bio-based phase change materials. A bibliometric review of 5928 articles from the Web of Science Core Collection was conducted using CiteSpace (v.6.3.R1) to identify research hotspots. A dataset of 100 experimental samples was compiled, comprising nine input variables and three output properties: thermal conductivity (Tc), latent heat capacity (LH), and compressive strength (CS). Four machine learning algorithms (SVR, RF, XGBoost, and CatBoost) were optimized using five metaheuristic algorithms (GA, PSO, WOA, GWO, and FFA), resulting in 24 optimized hybrid models. Of all the models considered, CatBoost-WOA achieved the best overall performance, with R² values of 0.927, 0.955, and 0.944 and RMSEs of 0.0057 W/m·K, 1.84 J/g, and 2.91 MPa for Tc, LH, and CS, respectively. SVR-GWO and XGBoost-WOA also showed strong generalization and low error dispersion. The developed models provide a transferable, data-driven modeling pipeline for predicting the coupled thermal and mechanical behavior of cement pastes containing bio-based phase change materials.

(This article belongs to the Special Issue Application of Polymers in Cementitious Materials)
