A Support Vector Machine (SVM) for regression is a popular machine learning model that aims to solve nonlinear function approximation problems wherein explicit model equations are difficult to formulate. The performance of an SVM depends largely on the selection of its parameters. Choosing between an SVM that solves an optimization problem with inequality constrains and one that solves the least square of errors (LS-SVM) adds to the complexity. Various methods have been proposed for tuning parameters, but no article puts the SVM and LS-SVM side by side to discuss the issue using a large dataset from the real world, which could be problematic for existing parameter tuning methods. We investigated both the SVM and LS-SVM with an artificial dataset and a dataset of more than 200,000 points used for the reconstruction of the global surface ocean CO2
concentration. The results reveal that: (1) the two models are most sensitive to the parameter of the kernel function, which lies in a narrow range for scaled input data; (2) the optimal values of other parameters do not change much for different datasets; and (3) the LS-SVM performs better than the SVM in general. The LS-SVM is recommended, as it has less parameters to be tuned and yields a smaller bias. Nevertheless, the SVM has advantages of consuming less computer resources and taking less time to train. The results suggest initial parameter guesses for using the models.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited