3.1. Support Vector Machine for Regression
The support vector machine (SVM) was formally proposed by Cortes and Vapnik in 1995, a significant achievement in the field of machine learning. Vapnik et al. [16,17] introduced an $\varepsilon$-insensitive loss function based on SVM classification and obtained the support vector machine for regression (SVR) to solve regression problems. The structural diagram of the SVR is shown in Figure 1, in which the output, the number of allocated containers of the container ship for one voyage $y$, is a linear combination of intermediate nodes [33]. Each intermediate node corresponds to a support vector $x_i$; $x$ represents the input variable, $w_i$ is the network weight, and $K(x, x_i)$ is the inner-product kernel function [34].
The algorithm is as follows:
Step 1: Given a training set $T = \{(x_1, y_1), \ldots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, $i = 1, \ldots, l$.
Step 2: Select an appropriate kernel function $K(x, x')$, an appropriate precision $\varepsilon > 0$, and a penalty parameter $C > 0$. The kernel function effects the transformation from the input space $\mathbb{R}^n$ to a Hilbert space $H$, i.e., it replaces the inner product in the original space, $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$. The $\varepsilon$-insensitive loss function is as given below:
$$|y - f(x)|_{\varepsilon} = \max\{0,\; |y - f(x)| - \varepsilon\}$$
Here, $\varepsilon$ is a positive number selected in advance. When the difference between the observed value $y_i$ and the predicted value $f(x_i)$ at the point $x_i$ does not exceed the preset value $\varepsilon$, the prediction at that point is considered lossless, even though the predicted value $f(x_i)$ and the observed value $y_i$ may not be exactly equal. The $\varepsilon$-insensitive loss function is illustrated in Figure 2.
Step 3: Construct and solve the convex quadratic programming problem
$$\min_{\alpha^{(*)} \in \mathbb{R}^{2l}} \; \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j) + \varepsilon \sum_{i=1}^{l} (\alpha_i^* + \alpha_i) - \sum_{i=1}^{l} y_i (\alpha_i^* - \alpha_i)$$
$$\text{s.t.} \quad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, l.$$
The solution is given by the expression $\bar{\alpha}^{(*)} = (\bar{\alpha}_1, \bar{\alpha}_1^*, \ldots, \bar{\alpha}_l, \bar{\alpha}_l^*)^{T}$.
Step 4: Calculation of $\bar{b}$: Select a component $\bar{\alpha}_j$ or $\bar{\alpha}_k^*$ of $\bar{\alpha}^{(*)}$ in the open interval $(0, C)$. If $\bar{\alpha}_j$ is selected, then
$$\bar{b} = y_j - \sum_{i=1}^{l} (\bar{\alpha}_i^* - \bar{\alpha}_i) K(x_i, x_j) + \varepsilon,$$
and if $\bar{\alpha}_k^*$ is selected, then
$$\bar{b} = y_k - \sum_{i=1}^{l} (\bar{\alpha}_i^* - \bar{\alpha}_i) K(x_i, x_k) - \varepsilon.$$
Step 5: Construct the decision function
$$f(x) = \sum_{i=1}^{l} (\bar{\alpha}_i^* - \bar{\alpha}_i) K(x_i, x) + \bar{b}.$$
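For readers who wish to experiment with Steps 1 to 5, the following minimal Python sketch fits an $\varepsilon$-SVR with scikit-learn, whose SVR class solves the same convex quadratic program; the toy data are placeholders rather than the voyage allocation data used in this paper.

```python
# Minimal epsilon-SVR sketch; scikit-learn's SVR solves the dual QP of Step 3.
import numpy as np
from sklearn.svm import SVR

# Toy training set T = {(x_i, y_i)}; placeholder data only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=40)

# C is the penalty parameter, epsilon the insensitive-loss precision of Step 2.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

# dual_coef_ holds the signed dual coefficients of the support vectors and
# intercept_ holds b, so predictions follow the decision function of Step 5:
# f(x) = sum_i (alpha_i* - alpha_i) K(x_i, x) + b.
print(model.predict([[5.0]]))
```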
3.2. Construction of the Mixed Kernel Function
The learning performance and generalization performance of the SVM depend on the selection of the kernel function. Two types of kernel functions are widely used: (1) the polynomial kernel function, $K(x, x') = (\langle x, x' \rangle + 1)^d$, and (2) the Gaussian radial basis kernel function, $K(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma^2\right)$ [35].
The polynomial kernel function is a global kernel function with strong generalization ability but weak learning ability [36], whereas the Gaussian radial basis kernel function is a local kernel function with strong learning ability but weak generalization ability. It is therefore difficult to obtain good results in regression forecasting using only a single kernel function [37]. Moreover, an SVM with a single kernel function is limited in its ability to predict the non-linear variation in the number of allocated containers of the container ship for one voyage.
A mixed kernel function is a combination of single kernel functions that integrates their advantages while compensating for their drawbacks, achieving a performance that cannot be attained by any single kernel function. The mixed kernel function proposed in this study is based on a comprehensive consideration of the local and global kernel functions. According to Mercer's theorem, a convex combination of two Mercer kernel functions is itself a Mercer kernel function, and thus the kernel function given by Equation (11) is also a kernel function:
$$K_{mix}(x, x') = \lambda (\langle x, x' \rangle + 1)^d + (1 - \lambda) \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \quad (11)$$
where $0 \le \lambda \le 1$, and $\lambda$ is the weight adjustment factor.
In Equation (11), a flexible combination of the radial basis kernel function and the polynomial kernel function is obtained by adjusting the value of $\lambda$ [38]. When $\lambda$ is close to 1, the polynomial kernel function is dominant and the mixed kernel function shows strong generalization ability; when $\lambda$ is close to 0, the radial basis kernel function is dominant and the mixed kernel function shows strong learning ability. Therefore, the mixed kernel function SVM exhibits a better overall performance in predicting the number of allocated containers of the container ship for one voyage.
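As an illustration, Equation (11) can be written as a custom kernel and passed directly to an SVR implementation. The following Python sketch uses scikit-learn's pairwise kernel helpers; the names lam, d, and sigma stand in for the paper's $\lambda$, $d$, and $\sigma$, and the numeric values are placeholders.

```python
# Sketch of the mixed kernel of Equation (11): a convex combination of a
# polynomial kernel and a Gaussian RBF kernel.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

def mixed_kernel(X, Y, lam=0.5, d=2, sigma=1.0):
    """K_mix = lam * K_poly + (1 - lam) * K_rbf, with 0 <= lam <= 1."""
    K_poly = polynomial_kernel(X, Y, degree=d, gamma=1.0, coef0=1)  # (<x,x'>+1)^d
    K_rbf = rbf_kernel(X, Y, gamma=1.0 / (2.0 * sigma**2))          # exp(-||x-x'||^2 / 2 sigma^2)
    return lam * K_poly + (1.0 - lam) * K_rbf

# scikit-learn accepts a callable kernel, so the mixed kernel plugs into SVR.
model = SVR(kernel=lambda X, Y: mixed_kernel(X, Y, lam=0.3), C=10.0, epsilon=0.1)
```

A larger lam pushes the model toward the polynomial (global) behavior, a smaller lam toward the RBF (local) behavior, mirroring the discussion above.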
3.3. Parameter Optimization
The prediction accuracy of the mixed kernel function SVM depends on the insensitive loss parameter $\varepsilon$, the penalty parameter $C$, the polynomial kernel function parameter $d$, the width of the radial basis kernel function $\sigma$, and the weight adjustment factor $\lambda$. At present, when the SVM is used for regression fitting and prediction, the methods for determining the penalty parameter and kernel parameters mainly include the experimental method [39], the grid method [40], the ant colony algorithm [41], and the particle swarm algorithm [42]. Although suitable parameters can be obtained experimentally through a large number of calculations, the efficiency is low and the selected parameters are not necessarily globally optimal. The grid method sets a step size for each parameter within its range and sequentially evaluates and compares the results to obtain the optimal parameter values; if the parameter range is large and the step size is small, the optimization process takes too long, and the result obtained may be a local optimum. As a general stochastic optimization method, the ant colony algorithm has achieved good results in a series of combinatorial optimization problems; however, its parameter settings are usually determined experimentally, making its optimization performance closely dependent on human experience and the algorithm difficult to tune. The particle swarm algorithm suffers from premature convergence and poor local optimization ability owing to the loss of population diversity in the search space [43,44]. Therefore, it is of great importance to apply an appropriate optimization algorithm to find the optimal combination of the parameters of the mixed kernel function SVM and thereby obtain the best-performing model [45], which ensures an accurate prediction of the number of allocated containers of the container ship for one voyage.
The genetic algorithm (GA) [46] is one of the most widely used and successful algorithms in intelligent optimization. It is a general optimization algorithm with a relatively simple coding technique that operates through genetic operators; its optimization is not constrained by restrictive conditions, and its two most prominent features are implicit parallelism and global solution-space search. Therefore, the GA is used in this study to optimize the parameter combination $(\varepsilon, C, d, \sigma, \lambda)$ consisting of five parameters.
In the GA optimization of the mixed kernel function SVM parameters, each chromosome represents a set of parameters, and the chromosome population searches for the optimal solution through the genetic operations of the GA (including the crossover operation and the mutation operation) and the selection strategy. As the objective of optimizing the mixed kernel function SVM parameters is to obtain better prediction accuracy, the mixed kernel function SVM model is trained and then tested by 5-fold cross-validation. Proportional selection is the selection strategy adopted in this study: after the selection probability is obtained, the roulette wheel is used to perform the selection operation. Hence, the fitness function is defined as the reciprocal of the prediction error of the mixed kernel function SVM, as given below:
$$F = \left[ \frac{1}{NP} \sum_{i=1}^{NP} (\hat{y}_i - y_i)^2 \right]^{-1} \quad (12)$$
where NP is the number of data points in each sample subset, $\hat{y}_i$ is the predicted value, and $y_i$ is the actual value. Thus, the chromosome with the maximum fitness value (i.e., the minimum prediction error) in the whole population, as well as its index within the population, is determined.
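A minimal sketch of this fitness evaluation is given below. It assumes the mixed_kernel function from the previous sketch; params is a hypothetical tuple $(\varepsilon, C, d, \sigma, \lambda)$ as decoded from a chromosome, and the decoding step itself is omitted here.

```python
# Fitness = reciprocal of the 5-fold cross-validated MSE of the
# mixed-kernel SVR, matching Equation (12).
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def fitness(params, X, y):
    eps, C, d, sigma, lam = params
    model = SVR(kernel=lambda A, B: mixed_kernel(A, B, lam=lam, d=d, sigma=sigma),
                C=C, epsilon=eps)
    # scikit-learn reports negated MSE by convention; flip the sign back.
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    return 1.0 / mse  # larger fitness corresponds to smaller prediction error
```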
The step-wise process of optimizing the parameters $(\varepsilon, C, d, \sigma, \lambda)$ by using the GA is given below, and the flow diagram is illustrated in Figure 3.
Step 1: Data preprocessing, mainly including normalization and dividing the sample data into training data and test data.
Step 2: Initialize the parameters of the GA and determine the value ranges of the parameters of the mixed kernel function SVM. First, set the maximum number of generations (gen = 50), the population size (NP), the individual length, the generation gap (GGAP = 0.95), the crossover rate ($P_c$ = 0.7), and the mutation rate ($P_m$ = 0.01). Next, set the ranges of the parameters $(\varepsilon, C, d, \sigma, \lambda)$. Since the GA itself is not the focus of this paper, the criteria for selecting gen = 50, GGAP = 0.95, $P_c$ = 0.7, and $P_m$ = 0.01 are not detailed here; the selection of these values follows the empirical practice provided in reference [46], and these settings achieved good results in this study.
Step 3: Encode the chromosomes and generate the initial population. Encode the chromosomes in 7-bit binary and randomly generate NP individuals to form the initial population.
Step 4: Calculate the fitness of each individual and find the minimum mean squared error (MSE) within the population.
Step 5: If the termination condition is satisfied, the individual with the greatest fitness in the current population is the desired result; it is decoded to obtain the optimal parameters $(\varepsilon, C, d, \sigma, \lambda)$. The optimized parameters are used to train the SVM model, which generates the prediction result. This marks the end of the algorithm.
Step 6: Proportional selection is performed by the roulette wheel method, with the selection probability calculated using Equation (13). 95% of the individuals are selected from the parent population to form the progeny population (as GGAP = 0.95). Genetic operations are then performed on the new population: crossover is performed using single-point crossover, and mutation is performed using the basic bit mutation operation.
Step 7: After the genetic operations, a new population is obtained and the corresponding parameters are calculated. The SVM model is then trained with the new parameters.
Step 8: The new population is taken as the next-generation population, i.e., the old population is replaced by the new one, gen = gen + 1, and the process is repeated from Step 4.
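To make Steps 3 to 8 concrete, the following self-contained Python sketch implements the loop with roulette-wheel selection, single-point crossover, and basic bit mutation. The parameter ranges and the stand-in fitness function are illustrative assumptions only; in the paper, the fitness is the cross-validated reciprocal MSE of the mixed kernel function SVM (see the fitness sketch above).

```python
# Compact GA sketch for Steps 3-8: 7-bit binary encoding per parameter,
# roulette-wheel (proportional) selection, single-point crossover,
# basic bit mutation, and elitist reinsertion.
import numpy as np

rng = np.random.default_rng(0)
NP, BITS, GEN, GGAP, PC, PM = 20, 7, 50, 0.95, 0.7, 0.01
# Hypothetical ranges for (epsilon, C, d, sigma, lambda); placeholders only.
RANGES = [(0.01, 1.0), (0.1, 100.0), (1, 5), (0.1, 10.0), (0.0, 1.0)]
L = BITS * len(RANGES)  # chromosome length

def decode(chrom):
    """Map each 7-bit slice of a chromosome to a real value in its range."""
    vals = []
    for k, (lo, hi) in enumerate(RANGES):
        g = chrom[k * BITS:(k + 1) * BITS]
        vals.append(lo + (hi - lo) * g.dot(1 << np.arange(BITS)) / (2**BITS - 1))
    return vals

def fitness(params):
    # Stand-in for the paper's fitness (Equation (12)): 1 / cross-validated
    # MSE of the mixed-kernel SVR. A smooth toy function is used here so the
    # sketch runs on its own.
    eps, C, d, sigma, lam = params
    return 1.0 / (1.0 + (C - 10.0) ** 2 + (lam - 0.5) ** 2)

pop = rng.integers(0, 2, size=(NP, L))                  # Step 3: initial population
for gen in range(GEN):
    fit = np.array([fitness(decode(c)) for c in pop])   # Step 4: evaluate fitness
    best = pop[fit.argmax()].copy()                     # Step 5: keep the elite

    prob = fit / fit.sum()                              # Step 6: roulette wheel
    children = pop[rng.choice(NP, size=int(GGAP * NP), p=prob)].copy()
    for i in range(0, len(children) - 1, 2):            # single-point crossover
        if rng.random() < PC:
            pt = rng.integers(1, L)
            tmp = children[i, pt:].copy()
            children[i, pt:] = children[i + 1, pt:]
            children[i + 1, pt:] = tmp
    # basic bit mutation: flip each bit with probability PM
    children ^= (rng.random(children.shape) < PM).astype(children.dtype)

    # Steps 7-8: reinsert the elite and take the result as the next generation
    pop = np.vstack([children, best[None, :]])

print(decode(pop[np.argmax([fitness(decode(c)) for c in pop])]))
```

The elitist reinsertion (keeping the best individual of each generation) corresponds to the 95% generation gap: 19 of the 20 individuals are replaced by selected offspring while the best parent survives unchanged.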