This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Comparative Analysis and Optimisation of Machine Learning Models for Regression and Classification on Structured Tabular Datasets

by Siegfried Fredrich Stumpfe and Sandile Charles Shongwe *
Department of Mathematical Statistics and Actuarial Science, Faculty of Natural and Agricultural Sciences, University of the Free State, Bloemfontein 9301, South Africa
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(3), 473; https://doi.org/10.3390/math14030473
Submission received: 9 December 2025 / Revised: 6 January 2026 / Accepted: 13 January 2026 / Published: 29 January 2026
(This article belongs to the Special Issue Computational Statistics: Analysis and Applications for Mathematics)

Abstract

This study presents a comparative analysis and optimisation of machine learning models for regression and classification tasks on structured tabular datasets. The primary target audience comprises researchers and practitioners who work with structured tabular data in fields such as biostatistics, insurance, and financial risk modelling, where computational efficiency and robust predictive performance are essential. Four machine learning techniques, namely linear/logistic regression, support vector machines (SVMs), Extreme Gradient Boosting (XGBoost), and multilayer perceptrons (MLPs), were applied across 72 datasets sourced from OpenML and Kaggle. The datasets vary systematically in number of observations, dimensionality, noise level, linearity, and class balance. Based on an extensive empirical analysis (72 datasets × 4 models × 2 configurations = 576 experiments), it is observed that understanding the characteristics of a dataset is more critical than extensive hyperparameter tuning for optimal model performance. In addition, linear models are robust across various settings, while non-linear models such as XGBoost and MLPs perform better in complex and noisy environments. Overall, this study provides insights for model selection and benchmarking in machine learning applications that involve structured tabular datasets.
Keywords: machine learning; regression; classification; noise; balance; linearity; benchmark; hyperparameter tuning
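
The experiment count quoted in the abstract (72 datasets × 4 models × 2 configurations = 576) can be checked by enumerating the full grid. The following is a minimal sketch only: the dataset names are placeholders, and the configuration labels "default" and "tuned" are an assumption about what the two configurations are, since the abstract gives only the counts.

```python
from itertools import product

# Placeholder dataset names; the abstract states only that there are 72.
datasets = [f"dataset_{i:02d}" for i in range(1, 73)]

# The four model families named in the abstract.
models = ["linear_or_logistic_regression", "svm", "xgboost", "mlp"]

# Assumed labels for the two configurations (not stated in the abstract).
configurations = ["default", "tuned"]

# One experiment per (dataset, model, configuration) combination.
experiments = list(product(datasets, models, configurations))

print(len(experiments))  # 72 * 4 * 2 = 576
```

Enumerating the grid explicitly, rather than multiplying the counts, also gives a list that could be iterated over to run or log each experiment.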
