Search Results (48)

Search Parameters: Keywords = tabular ML

33 pages, 3142 KB  
Article
Exploring Net Promoter Score with Machine Learning and Explainable Artificial Intelligence: Evidence from Brazilian Broadband Services
by Matheus Raphael Elero, Rafael Henrique Palma Lima, Bruno Samways dos Santos and Gislaine Camila Lapasini Leal
Computers 2026, 15(2), 96; https://doi.org/10.3390/computers15020096 - 2 Feb 2026
Viewed by 50
Abstract
Despite the growing use of machine learning (ML) for analyzing service quality and customer satisfaction, empirical studies based on Brazilian broadband telecommunications data remain scarce. This is especially true for those that leverage publicly available nationwide datasets. To address this gap, this study investigates customer satisfaction with broadband internet services in Brazil using supervised ML and explainable artificial intelligence (XAI) techniques applied to survey data collected by ANATEL between 2017 and 2020. Customer satisfaction was operationalized using the Net Promoter Score (NPS) reference scale, and three modifications of the scale were evaluated: (i) a binary model grouping ratings ≥8 as satisfied and ≤7 as dissatisfied (splitting the neutral range, with part treated as satisfied and part as dissatisfied); (ii) a binary model excluding neutral responses (ratings 7–8) and retaining only detractors (≤6) and promoters (≥9); and (iii) a multiclass model following the original NPS categories (detractors, neutrals, and promoters). Nine ML classifiers were trained and validated on tabular data for each formulation. Model interpretability was addressed through SHAP and feature importance analysis using tree-based models. The results indicate that Histogram Gradient Boosting and Random Forest achieve the most robust and stable performance, particularly in binary classification scenarios. The analysis of neutral customers reveals classification ambiguity, showing that scores of “7” tend toward dissatisfaction, while scores of “8” tend toward satisfaction. XAI analyses consistently identify browsing speed, billing accuracy, fulfillment of advertised service conditions, and connection stability as the most influential predictors of satisfaction. By combining predictive performance with model transparency, this study provides computational evidence for explainable satisfaction modeling and highlights the value of public regulatory datasets for reproducible ML research. Full article
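
A minimal sketch of the three NPS-based label formulations this abstract describes, assuming a pandas DataFrame with a 0–10 recommendation score and numeric features; the file and column names are hypothetical, and the classifier shown is scikit-learn's HistGradientBoostingClassifier, one of the two strongest performers reported:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("anatel_survey.csv")           # hypothetical file of survey responses
score = df["recommendation_score"]              # hypothetical 0-10 NPS-style rating column

# (i) Binary split inside the neutral zone: >=8 satisfied, <=7 dissatisfied.
y_split = (score >= 8).astype(int)

# (ii) Binary with neutrals (7-8) dropped: detractors (<=6) vs promoters (>=9).
mask = (score <= 6) | (score >= 9)
y_drop = (score[mask] >= 9).astype(int)

# (iii) Multiclass following the original NPS categories.
y_nps = pd.cut(score, bins=[-1, 6, 8, 10], labels=["detractor", "neutral", "promoter"])

X = df.drop(columns=["recommendation_score"])   # assumes numeric feature columns
X_tr, X_te, y_tr, y_te = train_test_split(X, y_split, stratify=y_split, random_state=42)
clf = HistGradientBoostingClassifier().fit(X_tr, y_tr)
print("hold-out accuracy:", clf.score(X_te, y_te))
```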

34 pages, 459 KB  
Article
Comparative Analysis and Optimisation of Machine Learning Models for Regression and Classification on Structured Tabular Datasets
by Siegfried Fredrich Stumpfe and Sandile Charles Shongwe
Mathematics 2026, 14(3), 473; https://doi.org/10.3390/math14030473 - 29 Jan 2026
Viewed by 144
Abstract
This research entails a comparative analysis and optimisation of machine learning models for regression and classification tasks on structured tabular datasets. The primary target audience for this analysis comprises researchers and practitioners working with structured tabular data. Common fields include biostatistics, insurance, and financial risk modelling, where computational efficiency and robust predictive performance are essential. Four machine learning techniques (i.e., linear/logistic regression, support vector machines (SVMs), Extreme Gradient Boosting (XGBoost), and Multi-Layered Perceptrons (MLPs)) were applied across 72 datasets sourced from OpenML and Kaggle. The datasets systematically varied by observation size, dimensionality, noise levels, linearity, and class balance. Based on extensive empirical analysis (72 datasets × 4 models × 2 configurations = 576 experiments), it is observed that understanding the dataset characteristics is more critical than extensive hyperparameter tuning for optimal model performance. Also, linear models are robust across various settings, while non-linear models such as XGBoost and MLP perform better in complex and noisy environments. In general, this study provides valuable insights for model selection and benchmarking in machine learning applications that involve structured tabular datasets. Full article
(This article belongs to the Special Issue Computational Statistics: Analysis and Applications for Mathematics)
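
A compact sketch of the kind of experiment grid the abstract reports (datasets × models × tuned/untuned configurations), using scikit-learn and XGBoost; `datasets` is an assumed dict mapping names to (X, y) pairs, and the parameter grids are illustrative rather than the paper's:

```python
from itertools import product
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

models = {
    "logistic": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "svm": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "xgboost": (XGBClassifier(eval_metric="logloss"), {"n_estimators": [100, 300], "max_depth": [3, 6]}),
    "mlp": (MLPClassifier(max_iter=500), {"hidden_layer_sizes": [(64,), (64, 32)]}),
}

results = []
# `datasets` is an assumed dict: name -> (X, y) arrays loaded from OpenML/Kaggle.
for (ds_name, (X, y)), (model_name, (estimator, grid)), tuned in product(
        datasets.items(), models.items(), [False, True]):
    est = GridSearchCV(estimator, grid, cv=3) if tuned else estimator
    score = cross_val_score(est, X, y, cv=5).mean()
    results.append({"dataset": ds_name, "model": model_name, "tuned": tuned, "score": score})
```
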
17 pages, 2326 KB  
Article
Explainable AutoML with Uncertainty Quantification for CO2-Cured Concrete Compressive Strength Prediction
by Liping Wang, Yuanfeng Wang, Chengcheng Shi, Baolong Ma, Yinshan Liu, Boqun Zhang, Shaoqin Xue, Xinlei Chang and Xiaodong Liu
Buildings 2026, 16(1), 89; https://doi.org/10.3390/buildings16010089 - 24 Dec 2025
Viewed by 338
Abstract
The cement and concrete industry is one of the primary sources of anthropogenic carbon dioxide (CO2) emissions globally, responsible for nearly 8% of total emissions, making the need for a low-carbon transition urgent. CO2 curing provides both strength enhancement and carbon sequestration, yet the compressive strength of such concrete remains challenging to predict due to limited and strongly coupled experimental factors. This study developed an explainable Automated Machine Learning (AutoML) framework with integrated uncertainty quantification to predict the 28-day compressive strength of CO2-cured concrete. The framework was built from 198 standardized experimental records and trained with four algorithms—Random Forest (RF), Support Vector Regression (SVR), eXtreme Gradient Boosting (XGBoost), and the transformer-based Tabular Prior-Data Fitted Network (TabPFN). To enhance model accuracy and efficiency, stratified cross-validation, hyperparameter optimization, and bootstrap-based uncertainty analysis were applied during training. The results show that TabPFN achieves the highest predictive accuracy (test R2 = 0.959) and maintains a stable 95% prediction interval. SHapley Additive exPlanations (SHAP) indicates that cement content, aggregate composition, water–binder (W/B) ratio, and CO2 curing time are the dominant factors, with an optimal W/B ratio near 0.40. Interaction analysis further reveals synergistic effects between cement content and W/B, and a strengthening coupling between curing time and CO2 concentration at longer durations. The framework enhances predictive reliability and explainability, supporting mixture design and curing optimization for low-carbon concrete development. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
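
The bootstrap-based uncertainty analysis mentioned above can be sketched generically; the snippet below uses a Random Forest stand-in rather than TabPFN and assumes numeric arrays, so it illustrates the interval construction only, not the paper's exact pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.utils import resample

def bootstrap_interval(X_train, y_train, X_test, n_boot=200, alpha=0.05):
    """Refit on bootstrap resamples and report a 95% prediction interval per test point."""
    preds = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        Xb, yb = resample(X_train, y_train, random_state=b)
        preds[b] = RandomForestRegressor(n_estimators=200, random_state=b).fit(Xb, yb).predict(X_test)
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return preds.mean(axis=0), lower, upper
```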

25 pages, 2290 KB  
Article
Machine Learning-Based Risk Stratification for Sudden Cardiac Death Using Clinical and Device-Derived Data
by Hana Ivandic, Branimir Pervan, Mislav Puljevic, Vedran Velagic and Alan Jovic
Sensors 2026, 26(1), 86; https://doi.org/10.3390/s26010086 - 22 Dec 2025
Viewed by 571
Abstract
Sudden cardiac death (SCD) remains a major clinical challenge, with implantable cardioverter-defibrillators (ICDs) serving as the primary preventive intervention. Current patient selection guidelines rely on limited and imperfect risk markers. This study explores the potential of machine learning (ML) models to improve SCD risk prediction using tabular clinical data that include features derived from medical sensing devices such as electrocardiograms (ECGs) and ICDs. Several ML models, including tree-based models, Naive Bayes (NB), logistic regression (LR), and voting classifiers (VC), were trained on demographic, clinical, laboratory, and device-derived variables from patients who underwent ICD implantation at a Croatian tertiary center. The target variable was the activation of the ICD device (appropriate or inappropriate/missed), serving as a surrogate for high-risk SCD detection. Models were optimized for the F2-score to prioritize high-risk patient detection, and interpretability was achieved with post hoc SHAP value analysis, which confirmed known and revealed additional potential SCD predictors. The random forest (RF) model achieved the highest F2-score (F2-score 0.74, AUC-ROC 0.73), demonstrating a recall of 97.30% and meeting the primary objective of high true positive detection, while the VC classifier achieved the highest overall discrimination (F2-score 0.71, AUC-ROC 0.76). The predictive performance of multiple ML models, particularly the high recall they achieved, demonstrates the promising potential of ML to refine ICD patient selection. Full article
(This article belongs to the Special Issue Machine Learning in Biomedical Signal Processing)
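
Optimizing for the F2-score, as described here, is straightforward with scikit-learn; a hedged sketch with a Random Forest and assumed `X_train`/`y_train` matrices of clinical and device-derived features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# F2 weights recall twice as heavily as precision, prioritizing detection of high-risk patients.
f2_scorer = make_scorer(fbeta_score, beta=2)

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [200, 500], "max_depth": [None, 5, 10]},
    scoring=f2_scorer,
    cv=5,
)
search.fit(X_train, y_train)   # X_train/y_train: assumed clinical + device-derived feature matrix
print(search.best_params_, search.best_score_)
```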

24 pages, 1239 KB  
Article
Privacy-Preserving Classification of Medical Tabular Data with Homomorphic Encryption
by Fairuz Haq, Chao Chen and Zesheng Chen
Algorithms 2025, 18(12), 731; https://doi.org/10.3390/a18120731 - 21 Nov 2025
Viewed by 633
Abstract
Machine learning (ML) offers significant potential for disease prediction, clinical decision support, and medical data classification, but its reliance on sensitive patient data raises privacy and security concerns, particularly under strict healthcare regulations. Traditional encryption methods require data to be decrypted prior to computation, such as in ML workflows, thereby introducing risks of exposure and undermining data confidentiality. Homomorphic Encryption (HE) addresses this challenge by enabling computations directly on encrypted data, ensuring end-to-end privacy. This paper explores the integration of the Cheon-Kim-Kim-Song (CKKS) HE scheme into the inference phase of medical tabular data classification. We evaluate the performance of Logistic Regression (LR), Support Vector Machine (SVM), and a lightweight multilayer perceptron (MLP) under HE-based inference, and compare their classification accuracy, computational overhead, and latency against plaintext counterparts. Additionally, we propose two hybrid models (LR-MLP and SVM-MLP) to accelerate training convergence and enhance inference performance. Experimental results demonstrate that while HE-based inference introduces moderate computational cost and data transmission overheads, it maintains accuracy comparable to plaintext inference. These outcomes affirm the practical feasibility of HE for privacy-preserving machine learning in healthcare, while also highlighting key implementation trade-offs. Furthermore, the findings support the advancement of secure AI systems and promote the adoption of cryptographic techniques in digital health and other privacy-critical fields. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
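
A minimal illustration of CKKS-encrypted inference for a linear model, using the TenSEAL library; the encryption parameters, model weights, and feature values below are illustrative assumptions, not taken from the paper:

```python
import numpy as np
import tenseal as ts

# CKKS context; parameter choices here are illustrative, not taken from the paper.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

weights = [0.8, -1.2, 0.3]        # assumed plaintext logistic-regression coefficients
bias = 0.1
patient_record = [5.2, 1.0, 0.7]  # sensitive features, encrypted on the client side

enc_x = ts.ckks_vector(ctx, patient_record)
enc_logit = enc_x.dot(weights) + bias        # linear part computed on ciphertext
logit = enc_logit.decrypt()[0]               # only the key holder can decrypt
prob = 1.0 / (1.0 + np.exp(-logit))          # sigmoid applied after decryption in this sketch
```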

15 pages, 2816 KB  
Article
Electron Density and Effective Atomic Number as Quantitative Biomarkers for Differentiating Malignant Brain Tumors: An Exploratory Study with Machine Learning
by Tsubasa Nakano, Daisuke Hirahara, Tomohito Hasegawa, Kiyohisa Kamimura, Masanori Nakajo, Junki Kamizono, Koji Takumi, Masatoyo Nakajo, Fumitaka Ejima, Ryota Nakanosono, Ryoji Yamagishi, Fumiko Kanzaki, Hiroki Muraoka, Nayuta Higa, Hajime Yonezawa, Ikumi Kitazono, Jihun Kwon, Gregor Pahn, Eran Langzam, Ko Higuchi and Takashi Yoshiura
Tomography 2025, 11(11), 120; https://doi.org/10.3390/tomography11110120 - 29 Oct 2025
Viewed by 804
Abstract
Objectives: The potential use of electron density (ED) and effective atomic number (Zeff) derived from dual-energy computed tomography (DECT) as novel quantitative imaging biomarkers for differentiating malignant brain tumors was investigated. Methods: Data pertaining to 136 patients with a pathological diagnosis of brain metastasis (BM), glioblastoma, and primary central nervous system lymphoma (PCNSL) were retrospectively reviewed. The 10th percentile, mean and 90th percentile values of conventional 120-kVp CT value (CTconv), ED, Zeff, and relative apparent diffusion coefficient derived from diffusion-weighted magnetic resonance imaging (rADC: ADC of lesion divided by ADC of normal-appearing white matter) within the contrast-enhanced tumor region were compared across the three groups. Furthermore, machine learning (ML)-based diagnostic models were developed to maximize diagnostic performance for each tumor classification using the indices of DECT parameters and rADC. Machine learning models were developed using the AutoGluon-Tabular framework with rigorous patient-level data splitting into training (60%), validation (20%), and independent test sets (20%). Results: The 10th percentile of Zeff was significantly higher in glioblastomas than in BMs (p = 0.02), and it was the only index with a significant difference between BMs and glioblastomas. In the comparisons including PCNSLs, all indices of CTconv, Zeff, and rADC exhibited significant differences (p < 0.001–0.02). DECT-based ML models exhibited high area under the receiver operating characteristic curves (AUC) for all pairwise differentiations (BMs vs. Glioblastomas: AUC = 0.83; BMs vs. PCNSLs: AUC = 0.91; Glioblastomas vs. PCNSLs: AUC = 0.82). Combined models of DECT and rADC demonstrated excellent diagnostic performance between BMs and PCNSLs (AUC = 1) and between Glioblastomas and PCNSLs (AUC = 0.93). Conclusion: This study suggested the potential of DECT-derived ED and Zeff as novel quantitative imaging biomarkers for differentiating malignant brain tumors. Full article
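
The AutoGluon-Tabular training with patient-level splitting described here can be sketched as follows; `df`, the `patient_id` column, and the `tumor_type` label are assumed names, and the single 80/20 split simplifies the paper's 60/20/20 scheme:

```python
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import GroupShuffleSplit

# Patient-level split: no patient contributes rows to both training and test sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))   # assumed ID column
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]

predictor = TabularPredictor(label="tumor_type").fit(train_df.drop(columns=["patient_id"]))
print(predictor.evaluate(test_df.drop(columns=["patient_id"])))
```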

25 pages, 3236 KB  
Article
A Wearable IoT-Based Measurement System for Real-Time Cardiovascular Risk Prediction Using Heart Rate Variability
by Nurdaulet Tasmurzayev, Bibars Amangeldy, Timur Imankulov, Baglan Imanbek, Octavian Adrian Postolache and Akzhan Konysbekova
Eng 2025, 6(10), 259; https://doi.org/10.3390/eng6100259 - 2 Oct 2025
Viewed by 3159
Abstract
Cardiovascular diseases (CVDs) remain the leading cause of global mortality, with ischemic heart disease (IHD) being the most prevalent and deadly subtype. The growing burden of IHD underscores the urgent need for effective early detection methods that are scalable and non-invasive. Heart Rate Variability (HRV), a non-invasive physiological marker influenced by the autonomic nervous system (ANS), has shown clinical relevance in predicting adverse cardiac events. This study presents Zhurek, a custom-developed photoplethysmography (PPG)-based Internet of Things (IoT) device for non-invasive HRV monitoring. The platform’s effectiveness was evaluated using HRV metrics from electrocardiography (ECG) and PPG signals, with machine learning (ML) models applied to the task of early IHD risk detection. ML classifiers were trained on HRV features, and the Random Forest (RF) model achieved the highest classification accuracy of 90.82%, precision of 92.11%, and recall of 91.00% when tested on real data. The model demonstrated excellent discriminative ability with an area under the ROC curve (AUC) of 0.98, reaching a sensitivity of 88% and specificity of 100% at its optimal threshold. The preliminary results suggest that data collected with the “Zhurek” IoT devices are promising for the further development of ML models for IHD risk detection. This study aimed to address the limitations of previous work, such as small datasets and a lack of validation, by utilizing real data and data synthetically augmented with a conditional tabular GAN (CTGAN), as well as multi-sensor input (ECG and PPG). The findings of this pilot study can serve as a starting point for developing scalable, remote, and cost-effective screening systems. The further integration of wearable devices and intelligent algorithms is a promising direction for improving routine monitoring and advancing preventative cardiology. Full article
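
A hedged sketch of CTGAN-based tabular augmentation feeding a Random Forest, in the spirit of the pipeline described; it uses the open-source `ctgan` package, and the file and column names are assumptions:

```python
import pandas as pd
from ctgan import CTGAN
from sklearn.ensemble import RandomForestClassifier

real = pd.read_csv("hrv_features.csv")       # assumed table of HRV features plus IHD label
discrete_cols = ["ihd_label"]                # assumed categorical columns

synth = CTGAN(epochs=300)
synth.fit(real, discrete_cols)
synthetic = synth.sample(len(real))          # generate synthetic rows to augment the dataset

augmented = pd.concat([real, synthetic], ignore_index=True)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(augmented.drop(columns=["ihd_label"]), augmented["ihd_label"])
```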

19 pages, 1318 KB  
Article
Hybrid Stochastic–Machine Learning Framework for Postprandial Glucose Prediction in Type 1 Diabetes
by Irina Naskinova, Mikhail Kolev, Dilyana Karova and Mariyan Milev
Algorithms 2025, 18(10), 623; https://doi.org/10.3390/a18100623 - 1 Oct 2025
Cited by 2 | Viewed by 871
Abstract
This research introduces a hybrid framework that integrates stochastic modeling and machine learning for predicting postprandial glucose levels in individuals with Type 1 Diabetes (T1D). The primary aim is to enhance the accuracy of glucose predictions by merging a biophysical Glucose–Insulin–Meal (GIM) model with advanced machine learning techniques. This framework is tailored to utilize the Kaggle BRIST1D dataset, which comprises real-world data from continuous glucose monitoring (CGM), insulin administration, and meal intake records. The methodology employs the GIM model as a physiological prior to generate simulated glucose and insulin trajectories, which are then utilized as input features for the machine learning (ML) component. For this component, the study leverages the Light Gradient Boosting Machine (LightGBM) due to its efficiency and strong performance with tabular data, while Long Short-Term Memory (LSTM) networks are applied to capture temporal dependencies. Additionally, Bayesian regression is integrated to assess prediction uncertainty. A key advancement of this research is the transition from a deterministic GIM formulation to a stochastic differential equation (SDE) framework, which allows the model to represent the probabilistic range of physiological responses and improves uncertainty management when working with real-world data. The findings reveal that this hybrid methodology enhances both the precision and applicability of glucose predictions by integrating the physiological insights of the Glucose–Insulin–Meal (GIM) model with the flexibility of data-driven machine learning techniques to accommodate real-world variability. This innovative framework facilitates the creation of robust, transparent, and personalized decision-support systems aimed at improving diabetes management. Full article
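
A toy Euler–Maruyama simulation illustrates how an SDE formulation can generate stochastic glucose trajectories whose summaries become tabular features for a model such as LightGBM; the drift term and parameters below are illustrative, not the paper's GIM:

```python
import numpy as np

def simulate_glucose_sde(g0, drift, sigma, dt=1.0, horizon=180, n_paths=100, seed=0):
    """Euler-Maruyama simulation of dG = drift(G, t) dt + sigma dW (toy GIM surrogate)."""
    rng = np.random.default_rng(seed)
    steps = int(horizon / dt)
    paths = np.full((n_paths, steps + 1), float(g0))
    for t in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        paths[:, t + 1] = paths[:, t] + drift(paths[:, t], t * dt) * dt + sigma * dW
    return paths

# Toy drift: relaxation toward a basal level plus a decaying meal impulse (not the paper's GIM).
drift = lambda g, t: -0.01 * (g - 110.0) + 0.5 * np.exp(-t / 45.0)
paths = simulate_glucose_sde(g0=120.0, drift=drift, sigma=2.0)
# Path summaries such as these could be appended to the CGM feature table fed to LightGBM.
features = {"sim_mean_60min": paths[:, :60].mean(), "sim_p90_60min": np.percentile(paths[:, 60], 90)}
```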

20 pages, 646 KB  
Article
Adversarial Attacks Detection Method for Tabular Data
by Łukasz Wawrowski, Piotr Biczyk, Dominik Ślęzak and Marek Sikora
Mach. Learn. Knowl. Extr. 2025, 7(4), 112; https://doi.org/10.3390/make7040112 - 1 Oct 2025
Viewed by 1591
Abstract
Adversarial attacks involve malicious actors introducing intentional perturbations to machine learning (ML) models, causing unintended behavior. This poses a significant threat to the integrity and trustworthiness of ML models, necessitating the development of robust detection techniques to protect systems from potential threats. The paper proposes a new approach for detecting adversarial attacks using a surrogate model and diagnostic attributes. The method was tested on 22 tabular datasets on which four different ML models were trained. Furthermore, various attacks were conducted, which led to obtaining perturbed data. The proposed approach is characterized by high efficiency in detecting known and unknown attacks—balanced accuracy was above 0.94, with very low false negative rates (0.02–0.10) for binary detection. Sensitivity analysis shows that classifiers trained based on diagnostic attributes can detect even very subtle adversarial attacks. Full article
(This article belongs to the Section Learning)
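
One plausible reading of a surrogate-model, diagnostic-attribute detector is sketched below; the specific attributes (confidence, margin, agreement with the protected model) are illustrative assumptions rather than the paper's exact attribute set, and `target_model`, `X_ref`, `X_clean`, and `X_perturbed` are assumed inputs:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Surrogate mimics the protected model; its outputs become diagnostic attributes.
surrogate = RandomForestClassifier(n_estimators=200, random_state=0)
surrogate.fit(X_ref, target_model.predict(X_ref))

def diagnostic_attributes(X):
    proba = np.sort(surrogate.predict_proba(X), axis=1)
    return np.column_stack([
        proba[:, -1],                                                      # surrogate confidence
        proba[:, -1] - proba[:, -2],                                       # top-two margin
        (surrogate.predict(X) == target_model.predict(X)).astype(float),   # agreement flag
    ])

# Detector: clean samples labeled 0, known perturbed samples labeled 1.
D = np.vstack([diagnostic_attributes(X_clean), diagnostic_attributes(X_perturbed)])
labels = np.concatenate([np.zeros(len(X_clean)), np.ones(len(X_perturbed))])
detector = GradientBoostingClassifier().fit(D, labels)   # flags suspected adversarial inputs
```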

29 pages, 1427 KB  
Article
Gallstone Classification Using Random Forest Optimized by Sand Cat Swarm Optimization Algorithm with SHAP and DiCE-Based Interpretability
by Proshenjit Sarker, Jun-Jiat Tiang and Abdullah-Al Nahid
Sensors 2025, 25(17), 5489; https://doi.org/10.3390/s25175489 - 3 Sep 2025
Cited by 1 | Viewed by 1742
Abstract
Gallstone disease affects approximately 10–20% of the global adult population, with early diagnosis being essential for effective treatment and management. While image-based machine learning (ML) models have shown high accuracy in gallstone detection, tabular data approaches remain less explored. In this study, we have proposed a Random Forest (RF) classifier optimized using the Sand Cat Swarm Optimization (SCSO) algorithm for gallstone prediction based on a tabular dataset. Our experiments have been conducted across four frameworks: RF alone without cross-validation (CV), RF alone with CV, RF-SCSO without CV, and RF-SCSO with CV. The RF-alone model without CV has achieved accuracy, F-score, precision, and recall of 81.25%, 79.07%, 85%, and 73.91%, respectively, using all 38 features, while RF alone with CV has obtained a 10-fold cross-validation accuracy of 78.42% using the same feature set. With SCSO-based feature reduction, the RF-SCSO without and with CV models have delivered a comparable accuracy of 79.17% and 78.32%, respectively, using only 13 features, indicating effective dimensionality reduction. SHAP analysis has identified CRP, Vitamin D, and AAST as the most influential features, and DiCE has further illustrated the model’s behavior by highlighting corrective counterfactuals for misclassified instances. These findings demonstrate the potential of interpretable, feature-optimized ML models for gallstone diagnosis using structured clinical data. Full article
(This article belongs to the Section Biomedical Sensors)
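
Two of the four evaluation frameworks (RF without CV and RF with 10-fold CV) reduce to a few lines of scikit-learn; `X` and `y` are assumed to hold the 38 tabular features and labels, and the SCSO feature-reduction step is not shown:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

rf = RandomForestClassifier(n_estimators=300, random_state=42)

# Framework 1: RF without cross-validation (single stratified hold-out split).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
y_hat = rf.fit(X_tr, y_tr).predict(X_te)
print({m.__name__: round(m(y_te, y_hat), 4)
       for m in (accuracy_score, f1_score, precision_score, recall_score)})

# Framework 2: RF with 10-fold cross-validation on the full 38-feature set.
print("10-fold CV accuracy:", cross_val_score(rf, X, y, cv=10).mean())
```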

23 pages, 3347 KB  
Article
Integrating Remote Sensing and Weather Time Series for Australian Irrigated Rice Phenology Prediction
by Sunil Kumar Jha, James Brinkhoff, Andrew J. Robson and Brian W. Dunn
Remote Sens. 2025, 17(17), 3050; https://doi.org/10.3390/rs17173050 - 2 Sep 2025
Cited by 1 | Viewed by 1835
Abstract
Phenology prediction is critical for optimizing the timing of rice crop management operations such as fertilization and irrigation, particularly in the face of increasing climate variability. This study aimed to estimate three key developmental stages in the temperate irrigated rice systems of Australia: panicle initiation (PI), flowering, and harvest maturity. Extensive and diverse field observations (n = 302) were collected over four consecutive seasons (2022–2025) from the rice-growing regions of the Murrumbidgee and Murray Valleys in southern New South Wales, encompassing six varieties and three sowing methods. The extent of data available allowed a number of traditional and emerging machine learning (ML) models to be directly compared to determine the most robust strategies to predict Australian rice crop phenology. Among all models, Tabular Prior-data Fitted Network (TabPFN), a pre-trained transformer model trained on large synthetic datasets, achieved the highest precision for PI and flowering predictions, with root mean square errors (RMSEs) of 4.9 and 6.5 days, respectively. Meanwhile, long short-term memory (LSTM) excelled in predicting harvest maturity with an RMSE of 5.9 days. Notably, TabPFN achieved strong results without the need for hyperparameter tuning, consistently outperforming other ML approaches. Across all stages, models that integrated remote sensing (RS) and weather variables consistently outperformed those relying on single-source input. These findings underscore the value of hybrid data fusion and modern time series modeling techniques for accurate and scalable phenology prediction, ultimately enabling more informed and adaptive agronomic decision-making. Full article
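
A simplified sketch of fusing remote-sensing and weather time series into tabular features for phenology regression; the NDVI summaries, growing-degree-day base temperature, and `field_records`/target names are illustrative assumptions, not the paper's feature set:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def field_features(ndvi, tmin, tmax, base_temp=10.0):
    """Fuse remote-sensing and weather time series into one tabular row (illustrative only)."""
    gdd = np.clip((np.asarray(tmin) + np.asarray(tmax)) / 2 - base_temp, 0, None).cumsum()
    ndvi = np.asarray(ndvi)
    return {
        "ndvi_mean": ndvi.mean(),
        "ndvi_max": ndvi.max(),
        "ndvi_slope": np.polyfit(np.arange(len(ndvi)), ndvi, 1)[0],
        "gdd_total": gdd[-1],
    }

rows = [field_features(f["ndvi"], f["tmin"], f["tmax"]) for f in field_records]  # assumed records
X = pd.DataFrame(rows)
model = GradientBoostingRegressor().fit(X, days_to_panicle_initiation)           # assumed target
```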

24 pages, 4428 KB  
Article
Average Voltage Prediction of Battery Electrodes Using Transformer Models with SHAP-Based Interpretability
by Mary Vinolisha Antony Dhason, Indranil Bhattacharya, Ernest Ozoemela Ezugwu and Adeloye Ifeoluwa Ayomide
Energies 2025, 18(17), 4587; https://doi.org/10.3390/en18174587 - 29 Aug 2025
Viewed by 1233
Abstract
Batteries are ubiquitous, with their presence ranging from electric vehicles to portable electronics. Research focused on increasing average voltage, improving stability, and extending cycle longevity of batteries is pivotal for the advancement of battery technology. These advancements can be accelerated through research into battery chemistries. The traditional approach, which examines each material combination individually, poses significant challenges in terms of resources and financial investment. Physics-based simulations, while detailed, are both time-consuming and resource-intensive. Researchers aim to mitigate these concerns by employing Machine Learning (ML) techniques. In this study, we propose a Transformer-based deep learning model for predicting the average voltage of battery electrodes. Transformers, known for their ability to capture complex dependencies and relationships, are adapted here for tabular data and regression tasks. The model was trained on data from the Materials Project database. The results demonstrated strong predictive performance, with lower mean absolute error (MAE) and mean squared error (MSE), and higher R2 values, indicating high accuracy in voltage prediction. Additionally, we conducted detailed per-ion performance analysis across ten working ions and applied sample-wise loss weighting to address data imbalance, significantly improving accuracy on rare-ion systems (e.g., Rb and Y) while preserving overall performance. Furthermore, we performed SHAP-based feature attribution to interpret model predictions, revealing that gravimetric energy and capacity dominate prediction influence, with architecture-specific differences in learned feature importance. This work highlights the potential of Transformer architectures in accelerating the discovery of advanced materials for sustainable energy storage. Full article
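
Sample-wise loss weighting of the kind described can be sketched in PyTorch; the ion counts, weighting scheme, and tensor names below are assumptions for illustration, not the paper's implementation:

```python
import torch

def weighted_mse(pred, target, weights):
    """Per-sample weighted MSE; rare-ion rows get larger weights to counter data imbalance."""
    return (weights * (pred - target) ** 2).mean()

# Illustrative weights: inverse frequency of each sample's working ion (counts are made up).
ion_counts = {"Li": 3000, "Na": 800, "Rb": 12, "Y": 9}
weights = torch.tensor([1.0 / ion_counts[ion] for ion in batch_ions])   # batch_ions assumed
weights = weights / weights.mean()                                       # keep the loss scale stable

# `model`, `batch_features`, and `batch_targets` are assumed training objects.
loss = weighted_mse(model(batch_features).squeeze(-1), batch_targets, weights)
loss.backward()
```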

12 pages, 811 KB  
Article
Determination of Malignancy Risk Factors Using Gallstone Data and Comparing Machine Learning Methods to Predict Malignancy
by Sirin Cetin, Ayse Ulgen, Ozge Pasin, Hakan Sıvgın and Meryem Cetin
J. Clin. Med. 2025, 14(17), 6091; https://doi.org/10.3390/jcm14176091 - 28 Aug 2025
Cited by 1 | Viewed by 1280
Abstract
Background/Objectives: Gallstone disease, a prevalent and costly digestive system disorder, is influenced by multifactorial risk factors, some of which may predispose to malignancy. This study aims to evaluate the association between gallstone disease and malignancy using advanced machine learning (ML) algorithms. Methods: A dataset comprising approximately 1000 patients was analyzed, employing six ML methods: random forests (RFs), support vector machines (SVMs), multi-layer perceptron (MLP), MLP with PyTorch 2.3.1 (MLP_PT), naive Bayes (NB), and Tabular Prior-data Fitted Network (TabPFN). Comparative performance was assessed using Pearson correlation, sensitivity, specificity, Kappa, receiver operating characteristic (ROC) curve, area under the curve (AUC), and accuracy metrics. Results: Our results revealed that age, body mass index (BMI), and history of hormone replacement therapy (HRT) were the most significant predictors of malignancy. Among the ML models, TabPFN emerged as the most effective, achieving superior performance across multiple evaluation criteria. Conclusions: This study highlights the potential of leveraging cutting-edge ML methodologies to uncover complex relationships in clinical datasets, offering a novel perspective on gallstone-related malignancy. By identifying critical risk factors and demonstrating the efficacy of TabPFN, this research provides actionable insights for predictive modeling and personalized patient management in clinical practice. Full article
(This article belongs to the Section General Surgery)
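
Applying TabPFN to a tabular clinical dataset typically requires only a scikit-learn-style fit/predict; a hedged sketch assuming `X`/`y` arrays of patient features and malignancy labels (the `tabpfn` package interface is assumed here):

```python
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

# X, y: assumed arrays of patient features and malignancy labels.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

clf = TabPFNClassifier()            # pre-trained prior-data fitted transformer, no tuning required
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))
```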

24 pages, 3133 KB  
Article
A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data
by Çiğdem İnan Acı, Gizen Mutlu, Murat Ozen, Esra Sarac and Vahide Nida Kılıç Uzel
Electronics 2025, 14(17), 3377; https://doi.org/10.3390/electronics14173377 - 25 Aug 2025
Cited by 1 | Viewed by 1292
Abstract
Predicting driver injury severity is critical for enhancing road safety, but it is complicated because fatal accidents inherently create class imbalance within datasets. This study conducts a comparative analysis of machine-learning (ML) and deep-learning (DL) models for multi-class driver injury severity prediction using a comprehensive dataset of 107,195 traffic accidents from the Adana, Mersin, and Antalya provinces in Turkey (2018–2023). To address the significant imbalance between fatal, injury, and non-injury classes, the hybrid SMOTE-ENN algorithm was employed for data balancing. Subsequently, feature selection techniques, including Relief-F, Extra Trees, and Recursive Feature Elimination (RFE), were utilized to identify the most influential predictors. Various ML models (K-Nearest Neighbors (KNN), XGBoost, Random Forest) and DL architectures (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN)) were developed and rigorously evaluated. The findings demonstrate that traditional ML models, particularly KNN (0.95 accuracy, 0.95 F1-macro) and XGBoost (0.92 accuracy, 0.92 F1-macro), significantly outperformed DL models. The SMOTE-ENN technique proved effective in managing class imbalance, and RFE identified a critical 25-feature subset including driver fault, speed limit, and road conditions. This research highlights the efficacy of well-preprocessed ML approaches for tabular crash data, offering valuable insights for developing robust predictive tools to improve traffic safety outcomes. Full article
(This article belongs to the Special Issue Machine Learning Approach for Prediction: Cross-Domain Applications)
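
The SMOTE-ENN balancing and RFE feature selection described map directly onto imbalanced-learn and scikit-learn; a sketch with assumed `X`/`y`, applying resampling to the training split only:

```python
from imblearn.combine import SMOTEENN
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)   # X, y assumed

# Balance fatal / injury / non-injury classes on the training split only.
X_bal, y_bal = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)

# Keep the 25 most informative features via recursive feature elimination.
rfe = RFE(RandomForestClassifier(n_estimators=200, random_state=0), n_features_to_select=25)
X_bal_sel = rfe.fit_transform(X_bal, y_bal)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal_sel, y_bal)
print("F1-macro:", f1_score(y_te, knn.predict(rfe.transform(X_te)), average="macro"))
```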

17 pages, 977 KB  
Article
Evaluation of Learning-Based Models for Crop Recommendation in Smart Agriculture
by Muhammad Abu Bakr, Ahmad Jaffar Khan, Sultan Daud Khan, Mohammad Haseeb Zafar, Mohib Ullah and Habib Ullah
Information 2025, 16(8), 632; https://doi.org/10.3390/info16080632 - 24 Jul 2025
Viewed by 5290
Abstract
The use of intelligent crop recommendation systems has become crucial in the era of smart agriculture to increase yield and enhance resource utilization. In this study, we compared different machine learning (ML) and deep learning (DL) models utilizing structured tabular data for crop recommendation. During our experimentation, both ML and DL models achieved decent performance. However, their architectures are not suited for setting up conversational systems. To overcome this limitation, we converted the structured tabular data to descriptive textual data and utilized it to fine-tune Large Language Models (LLMs), including BERT and GPT-2. In comprehensive experiments, we demonstrated that GPT-2 achieved an accuracy of 99.55%, higher than the best-performing ML and DL models, while maintaining precision of 99.58% and recall of 99.55%. We also demonstrated that GPT-2 not only maintains competitive accuracy but also offers natural language interaction capabilities, making it a viable option for real-time agricultural decision support systems. Full article
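
Converting structured rows into descriptive text for language-model fine-tuning can be sketched as below; the column names follow the common Kaggle crop-recommendation schema and are assumptions, and the resulting texts would then be tokenized for GPT-2 fine-tuning:

```python
import pandas as pd

def row_to_text(row):
    """Turn one structured record into a descriptive sentence for language-model fine-tuning."""
    return (f"Soil has N={row['N']}, P={row['P']}, K={row['K']}, pH={row['ph']:.1f}; "
            f"temperature {row['temperature']:.1f} C, humidity {row['humidity']:.0f}%, "
            f"rainfall {row['rainfall']:.0f} mm. Recommended crop: {row['label']}.")

df = pd.read_csv("crop_recommendation.csv")   # assumed file and column names (common Kaggle schema)
texts = df.apply(row_to_text, axis=1).tolist()
# `texts` would then be tokenized and used to fine-tune GPT-2, e.g., with the Hugging Face Trainer.
```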
