A Forecast Model of the Number of Containers for Containership Voyage

Container ships pass through multiple ports of call during a voyage. Forecasting container volume at the port of origin and sending this information to subsequent ports is therefore crucial for container terminal management and container stowage personnel. Numerous factors influence container allocation to a container ship for a voyage, and the degree of influence varies, engendering a complex nonlinearity. This paper therefore proposes a model based on gray relational analysis (GRA) and a mixed kernel support vector machine (SVM) for predicting container allocation to a container ship for a voyage. In this model, the weights of the influencing factors are first determined through GRA. The weighted factors then serve as the input of the SVM model, whose parameters are optimized through a genetic algorithm. Numerical simulations revealed that the proposed model could effectively predict the number of containers for a container ship voyage and that it exhibited strong generalization ability and high accuracy. Accordingly, this model provides a new method for predicting container volume for a voyage.


Introduction
Container transportation is a highly complicated process and involves numerous parties, necessitating close cooperation between ports, ships, shipping companies, and other relevant departments. Therefore, container transportation management is characterized by extremely detailed planning [1,2]. For example, container terminals must formulate strategies such as berthing plans [3], container truck dispatch plans, yard planning systems, and yard stowage plans [4][5][6]. In addition, ships or shipping companies must formulate voyage stowage plans for container ships at the port of departure. The number of containers in the subsequent port must be predicted, and such prediction information forms a crucial basis for the subsequent plan. These processes must be completed before the development of a stowage system for full-route container ships [7].
Changes in the number of containers allocated to container ships are influenced by several factors characterized by uncertain information; this engenders a complex nonlinear relationship between the number of allocated containers and its influencing factors [8]. The number of allocated containers is influenced by port-related factors such as the port of call, local GDP, port industrial structure, and the collection and distribution system; by shipping-company-related factors such as the capacity of the company, the inland turnaround time of containers, seasonal changes in cargo volume, and the quantity of containers managed by the company; and by ship-related factors such as the transport capacity of a single ship and the full-container-loading rate of the ship. Each of these factors exerts a distinct effect on the number of containers allocated to a container ship for one voyage; gray relational analysis is therefore used to quantify these effects.

Gray Relational Analysis

Gray relation analysis is the serialization and patterning of the gray relation between an operating mechanism and its physical prototype, which is either not clear at all or lacks a physical prototype entirely. The essence of the analysis is an overall comparison of the measurements with a reference system [29]. The technical procedure of gray relation analysis is: (i) acquiring information about the differences between sequences and establishing a difference information space; (ii) calculating the differences relative to the reference measurements (the gray correlation degree); and (iii) ordering the relations among the factors to determine the weight of each influencing factor [30]. The calculation steps are as follows [31]:

Step 1: Set a sequence X_0 = (x_0(1), x_0(2), · · · , x_0(k), · · · , x_0(n)) as the reference sequence, i.e., the object of study, and X_i = (x_i(1), x_i(2), · · · , x_i(k), · · · , x_i(n)) (i = 1, 2, · · · , m) as the comparative sequences, i.e., the influencing factors.
Step 2: Perform data conversion or dimensionless processing. The initialization conversion is adopted in this study, wherein each sequence is divided by its first element to obtain the initial-value image Y_i(k):

Y_i(k) = x_i(k)/x_i(1), where i = 0, 1, 2, · · · , m and k = 1, 2, · · · , n.

Step 3: Calculate the gray relation coefficient:

γ(y_0(k), y_i(k)) = (min_i min_k |y_0(k) − y_i(k)| + ζ max_i max_k |y_0(k) − y_i(k)|) / (|y_0(k) − y_i(k)| + ζ max_i max_k |y_0(k) − y_i(k)|),

where ζ ∈ [0, 1] is the resolution coefficient, which determines the result of the correlation analysis. The literature [32] shows that when ζ ≤ 0.05, changes in the resolution of the correlation degree are more evident, so ζ = 0.05 is selected in this paper.
Step 4: Calculate the correlation degree: γ_i = (1/n) Σ_{k=1}^{n} γ(y_0(k), y_i(k)). If γ_i > γ_j, the ith factor has a greater impact on the results than the jth factor.
Step 5: Calculate the weight of each influencing factor: w_i = γ_i / Σ_{j=1}^{m} γ_j.
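The five steps above can be condensed into a few lines of NumPy. The following is an illustrative sketch (the function name and toy shapes are ours, not from the original study); the resolution coefficient defaults to ζ = 0.05 as selected in this paper.

```python
import numpy as np

def gray_relational_weights(x0, X, zeta=0.05):
    """Gray relational analysis (illustrative sketch).

    x0   : (n,) reference sequence (e.g., containers per voyage)
    X    : (m, n) comparative sequences (influencing factors)
    zeta : resolution coefficient
    Returns per-factor weights that sum to 1.
    """
    # Step 2: initialization transform -- divide each sequence by its first element
    y0 = x0 / x0[0]
    Y = X / X[:, [0]]
    # Step 3: gray relation coefficients over the difference information space
    delta = np.abs(Y - y0)
    dmin, dmax = delta.min(), delta.max()
    gamma = (dmin + zeta * dmax) / (delta + zeta * dmax)
    # Step 4: correlation degree = mean coefficient per factor
    degrees = gamma.mean(axis=1)
    # Step 5: normalize the degrees into weights
    return degrees / degrees.sum()
```

A factor whose trend matches the reference sequence receives a high correlation degree and hence a large weight.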

Support Vector Machine for Regression
The support vector machine (SVM) was formally proposed by Cortes and Vapnik in 1995, a significant achievement in the field of machine learning. Vapnik et al. [16,17] introduced the ε-insensitive loss function on the basis of SVM classification and obtained the support vector machine for regression (SVR) to solve regression problems. The structure of the SVR is shown in Figure 1, in which the output, the number of allocated containers of the container ship for one voyage g(x), is a linear combination of intermediate nodes [33]. Each intermediate node corresponds to a support vector; x_1, x_2, · · · , x_l represent the input variables, α*_i − α_i are the network weights, and K(x_i, x) is the inner-product kernel function [34].
The algorithm is as follows:
Step 1: Given a training set T = {(x_1, y_1), · · · , (x_l, y_l)}.
Step 2: Select an appropriate kernel function K(x, x'), an appropriate precision ε > 0, and a penalty parameter C > 0.
The kernel function effects the transformation from the space R^n to a Hilbert space, Φ : R^n → H, and replaces the inner product in the original space: K(x, x') = (Φ(x) · Φ(x')). The ε-insensitive loss function c is given by c(y, g(x)) = max(0, |y − g(x)| − ε), where ε is a positive number selected in advance. When the difference between the observed value y and the predicted value g(x) at a point does not exceed ε, the predicted value g(x) at that point is considered lossless, although g(x) and y may not be exactly equal. An image of the ε-insensitive loss function is shown in Figure 2.
Step 3: Construct and solve the convex quadratic programming problem. The solution is given by the expression a^(*) = (α_1, α*_1, · · · , α_l, α*_l)^T.
Step 4: Calculate b: select a component α_j or α*_k of a^(*) in the open interval (0, C). If α_j is selected, then b = y_j − Σ_{i=1}^{l} (α*_i − α_i)K(x_i, x_j) + ε, and if α*_k is selected, then b = y_k − Σ_{i=1}^{l} (α*_i − α_i)K(x_i, x_k) − ε.
Step 5: Construct the decision function: g(x) = Σ_{i=1}^{l} (α*_i − α_i)K(x_i, x) + b.
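Steps 1–5 are what an off-the-shelf ε-SVR solver carries out internally. The sketch below, using scikit-learn and synthetic data (both our assumptions, not part of the original study), shows where ε, C, the dual coefficients α*_i − α_i, and b surface in practice.

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + 0.05 * rng.normal(size=80)

# Steps 2-5 above correspond to fitting an epsilon-SVR:
# C is the penalty parameter, epsilon the insensitive-loss width.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X, y)

# model.dual_coef_ holds (alpha*_i - alpha_i) for the support vectors,
# model.intercept_ is b, and predict evaluates the decision function g(x).
pred = model.predict(X)
```

The dual quadratic program of Step 3 is solved by the library; only the kernel, ε, and C need to be chosen, which is exactly what the parameter optimization of later sections addresses.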

Construction of Mixed Kernel Function
The learning performance and generalization performance of the SVM depend on the selection of the kernel function. Two types of kernel functions are widely used: (1) the polynomial kernel function, K(x, x') = [(x · x') + 1]^q (q = 1, 2, · · ·), and (2) the Gaussian radial basis kernel function, K(x, x') = exp(−‖x − x'‖²/(2σ²)) [35].
The polynomial kernel function is a global kernel function with strong generalization ability but weak learning ability [36], whereas the Gaussian radial basis kernel function is a local kernel function with strong learning ability but weak generalization ability. It is difficult to obtain good results in regression forecasting [37] by using only a single kernel function. Moreover, there are certain limitations in using the SVM with a single kernel function to predict the non-linear change in the data of the number of allocated containers of the container ship for one voyage.
A mixed kernel function is a combination of single kernel functions that integrates their advantages while compensating for their drawbacks, yielding a performance that cannot be achieved by any single kernel function. The mixed kernel function proposed in this study is based on a comprehensive consideration of local and global kernel functions. According to Mercer's theorem, the convex combination of two Mercer kernel functions is itself a Mercer kernel function; thus, the kernel function given by Equation (11) is also a valid kernel function:

K_mix(x, x') = ρK_poly(x, x') + (1 − ρ)K_rbf(x, x'), (11)

where 0 < ρ < 1 and ρ is the weight adjustment factor. In Equation (11), a flexible combination of the radial basis kernel function and polynomial kernel function is obtained by adjusting the value of ρ [38]. When ρ > 0.5, the polynomial kernel function is dominant, and the mixed kernel function shows strong generalization ability; when ρ < 0.5, the radial basis kernel function is dominant, and the mixed kernel function shows strong learning ability. Therefore, the mixed kernel function SVM exhibits better overall performance in predicting the number of allocated containers of the container ship for one voyage.
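The convex combination of Equation (11) can be written directly as code. The sketch below (function names are ours) builds Gram matrices that can be passed to any kernel-based learner.

```python
import numpy as np

def poly_kernel(X1, X2, q=2):
    # Global polynomial kernel: K(x, x') = ((x . x') + 1)^q
    return (X1 @ X2.T + 1.0) ** q

def rbf_kernel(X1, X2, sigma=1.0):
    # Local Gaussian kernel: K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    sq = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-sq / (2.0 * sigma**2))

def mixed_kernel(X1, X2, rho=0.5, q=2, sigma=1.0):
    # Convex combination (Eq. 11): rho weights the polynomial (global) part,
    # 1 - rho the radial-basis (local) part; 0 < rho < 1.
    return rho * poly_kernel(X1, X2, q) + (1 - rho) * rbf_kernel(X1, X2, sigma)
```

Because both components are Mercer kernels, the resulting Gram matrix remains symmetric and positive semidefinite for any 0 < ρ < 1.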

Parameter Optimization
The prediction accuracy of the mixed kernel function SVM is related to the insensitive loss parameter ε, penalty parameter C, polynomial kernel function parameter q, width of the radial basis kernel function σ, and weight adjustment factor ρ. At present, when the SVM is used for regression fitting and prediction, the methods for determining the penalty and kernel parameters mainly include the experimental method [39], grid method [40], ant colony algorithm [41], and particle swarm algorithm [42]. Although suitable parameters can be obtained experimentally through a large number of calculations, the efficiency is low, and the selected parameters do not necessarily reach the global optimum. By setting a step size for the data within the parameter range, the grid method sequentially optimizes and compares the results to obtain the optimal parameter values; however, if the parameter range is large and the step size is small, the optimization process takes too long, and the result obtained may be a local optimum. As a general stochastic optimization method, the ant colony algorithm has achieved good results in a series of combinatorial optimization problems. However, the parameter settings in this algorithm are usually determined experimentally, making its optimization performance closely dependent on human experience and difficult to tune. Owing to the loss of species diversity in the search space, the particle swarm algorithm suffers from premature convergence and poor local optimization ability [43,44].
Therefore, it is of great importance to apply the appropriate optimization algorithm for optimal combinatorial results of the parameters of the support vector of the mixed kernel functions to obtain the SVM with the best performance [45], which will ensure an accurate prediction of the number of allocated containers of the container ship for one voyage.
The GA [46] is the most widely used and successful algorithm in intelligent optimization. It is a general optimization algorithm with a relatively simple coding technique using genetic operators. Its optimization is not restrained by restrictive conditions, and its two most prominent features are implicit parallelism and global solution-space search. Therefore, the GA is used in this study to optimize the parameter combination (ε, C, q, σ, ρ) consisting of five parameters.
In the optimization of the mixed kernel function SVM parameter combination by using the GA, each chromosome represents a set of parameters, and the chromosome population searches for the optimal solution through the genetic operators (crossover and mutation) and a selection strategy. As the objective of optimizing the SVM parameters of the mixed kernel function is to obtain better prediction accuracy, the mixed kernel function SVM model with parameters (ε_n, C_n, q_n, σ_n, ρ_n) is trained and then tested by fivefold cross-validation. Proportional selection is the selection strategy adopted in this study; after the selection probability is obtained, the roulette wheel is used to perform the selection operation. Hence, the fitness function is defined as the reciprocal of the prediction error of the mixed kernel function SVM, as given below, where NP is the number of data in each sample subset, P̂_ij is the predicted value, and P_ij is the actual value. Thus, the chromosome with the minimum prediction error (i.e., the maximum fitness) in the whole chromosome population, as well as its index, is determined. The step-wise process of optimizing the parameters (ε, C, q, σ, ρ) by using the GA is given below, and the flow diagram is illustrated in Figure 3.
Step 1: Data preprocessing, mainly including normalization processing and dividing the sample data into training data and test data.
Step 2: Initialize the parameters of the GA and determine the value ranges of the parameters of the mixed kernel function SVM. First, set the maximum number of generations (gen = 50), population size (NP), individual length, generation gap (GGAP = 0.95), crossover rate (P_x = 0.7), and mutation rate (P_m = 0.01). Next, set the ranges of the parameters (ε, C, q, σ, ρ). Because the GA itself is not the focus of this paper, the criteria for selecting gen = 50, GGAP = 0.95, P_x = 0.7, and P_m = 0.01 are not detailed here; the values follow the empirical practice in reference [46] and achieved good results in this paper.
Step 3: Encode the chromosomes and generate the initial population. Encode the chromosomes in a 7-bit binary and randomly generate NP individuals (s 1 , s 2 , · · · , s NP ) to form the initial population S (S = {s 1 , s 2 , · · · , s NP }).
Step 4: Calculate the fitness of each individual. Find the minimum mean squared error (MSE) among the GA swarm.
Step 5: If the termination condition is satisfied, the individual with the greatest fitness in S is the desired result, which is then decoded to obtain the optimal parameters (ε, C, q, σ, ρ). The optimized parameters are used to train the SVM model, which generates the prediction result. This marks the end of the algorithm.
Step 6: Proportional selection is performed by the roulette wheel method, and the selection probability is calculated using Equation (13). As GGAP = 0.95, 95% of the individuals are selected from the parent population S to form the progeny population S_1. Genetic operations are then performed on the new population: crossover operations use single cut points, and mutation operations use basic bit mutation.
Step 7: After the genetic operations, a new population S_3 is obtained, and the parameters (ε, C, q, σ, ρ) are calculated. The SVM model is then trained with the new parameters.
Step 8: S_3 is now considered the new generation population, i.e., S is replaced by S_3 and gen = gen + 1, and the process is repeated from Step 4.
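Steps 1–8 can be condensed into a short loop. The sketch below is a stand-in, not the paper's implementation: a simple error surface replaces the fivefold cross-validation error of the mixed-kernel SVM, the parameter bounds LO/HI and the population size NP = 30 are our assumptions, and the GA settings (gen = 50, GGAP = 0.95, P_x = 0.7, P_m = 0.01, 7-bit genes) follow the values given above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the fivefold cross-validation error of the
# mixed-kernel SVM at parameters p = (eps, C, q, sigma, rho).
def cv_error(p):
    target = np.array([0.01, 10.0, 2.0, 1.0, 0.5])
    return np.sum(((p - target) / target) ** 2)

BITS, NP, GEN = 7, 30, 50          # 7-bit genes, population size, generations
GGAP, PX, PM = 0.95, 0.7, 0.01     # generation gap, crossover, mutation rates
LO = np.array([0.001, 0.1, 1.0, 0.1, 0.0])   # assumed lower bounds
HI = np.array([0.1, 100.0, 5.0, 5.0, 1.0])   # assumed upper bounds

def decode(chrom):
    # 5 genes of BITS bits each -> real parameters scaled into [LO, HI]
    genes = chrom.reshape(5, BITS)
    ints = genes @ (2 ** np.arange(BITS)[::-1])
    return LO + (HI - LO) * ints / (2**BITS - 1)

pop = rng.integers(0, 2, size=(NP, 5 * BITS))
best, best_err = None, np.inf
for gen in range(GEN):
    errs = np.array([cv_error(decode(c)) for c in pop])
    if errs.min() < best_err:                    # track the best individual
        best_err, best = errs.min(), decode(pop[errs.argmin()])
    fit = 1.0 / (errs + 1e-12)                   # fitness = reciprocal of error
    prob = fit / fit.sum()
    n_child = int(GGAP * NP)
    idx = rng.choice(NP, size=n_child, p=prob)   # roulette-wheel selection
    children = pop[idx].copy()
    for i in range(0, n_child - 1, 2):           # single-point crossover
        if rng.random() < PX:
            cut = rng.integers(1, 5 * BITS)
            children[i, cut:], children[i + 1, cut:] = (
                children[i + 1, cut:].copy(), children[i, cut:].copy())
    flip = rng.random(children.shape) < PM       # bit-flip mutation
    children[flip] ^= 1
    keep = pop[np.argsort(errs)[: NP - n_child]] # elitism fills the gap
    pop = np.vstack([keep, children])
```

In the actual model, `cv_error` would train and test the mixed-kernel SVM at each decoded parameter set; everything else stays the same.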

 

Example Analysis
This study set X_0 = (x_0(1), x_0(2), · · · , x_0(k), · · · , x_0(n)) as the reference sequence (i.e., the number of containers for a voyage, the study object) and X_i = (x_i(1), x_i(2), · · · , x_i(k), · · · , x_i(n)) (i = 1, 2, · · · , m) as the comparative sequences (i.e., factors influencing the number of allocated containers for one voyage during the voyage period). Parameter n represents the number of samples, and m represents the number of influencing factors; in this study, m = 9. GRA was applied, and the weighted influencing factors were then used as the input of the mixed kernel function SVM.

Data Samples
To establish a model for forecasting the number of containers allocated to a container ship for one voyage, the factors influencing the number of allocated containers must be analyzed, and an index system for forecasting must be established. Numerous factors influence the number of containers allocated to a container ship for one voyage; such factors relate to the port of call, the company (fleet) to which the ship belongs, and the ship itself. The predictive indices for forecasting container allocation are outlined as follows (i represents the ith influencing factor, i = 1, 2, · · · , m): (1) X_1, local GDP of the region in which the port of call is located, calculated from the actual amount in units of 100 million yuan; (2) X_2, changes in the port industrial structure, calculated as the percentage occupied by the tertiary industry; (3) X_3, completeness of the collection and distribution system, calculated from the actual annual container throughput at the port of call in millions of twenty-foot equivalent units (TEU); (4) X_4, the company's capacity, calculated from the actual number of containers in units of 10,000 TEU; (5) X_5, inland turnaround time of containers, calculated from the actual number of days; (6) X_6, seasonal changes in cargo volume, calculated as a percentage; (7) X_7, quantity of containers handled by the company, calculated from the actual number of containers in units of 10,000 TEU; (8) X_8, transport capacity of a single ship, calculated from the actual number of containers in TEU; and (9) X_9, full-container-loading rate of the ship, calculated as a percentage.
For different shipping lines, ports, and container ships, collecting actual data pertaining to the nine aforementioned factors is difficult. Moreover, information on some of these factors is treated as confidential by company or ship management teams. To verify the practicality of the model, this study simulated a set of data. To ensure that the sample data were reasonable and approached real situations, this study sought information from the literature [8,9], in addition to consulting the department heads of shipping lines and stowage operators.
The selected training samples are presented in Table 1.

Determining the Weight of Influencing Factors
As indicated by the data in the table, the orders of magnitude of the sequences were quite different, so the sequences were standardized using Equation (1). The correlation between each influencing factor and the number of containers allocated to the container ship for one voyage was calculated using Equations (2)-(4), and the calculation results are presented in Table 2. As shown in Table 2, the correlation degrees of X_1, X_4, and X_7 were all approximately 0.17, indicating that these three influencing factors had the lowest effect on the number of allocated containers for one voyage and could be ignored. The correlation degree of X_5 was 0.3998, signifying that this factor had little effect on the number of allocated containers for one voyage. The correlation degrees of X_2, X_6, X_3, and X_9 were higher than 0.6, indicating that these four factors had a significant effect on the number of allocated containers for one voyage, and the correlation degree of X_8, 0.8345, was the highest, signifying that this factor had the greatest effect on the number of allocated containers for one voyage. The weight of each influencing factor was calculated using Equation (5), and Table 3 shows the results. As shown in Table 3, the weight values of X_1, X_4, and X_7 were relatively low (all lower than 0.091), and the weight values of the other influencing factors were higher than 0.14, with no significant differences among them. This is mainly because the other factors had greater effects on the number of allocated containers for one voyage, and their weight values were scattered.

Prediction of Number of Allocated Containers for One Voyage Using Mixed Kernel SVM
Weighted factors were derived by multiplying the influencing factors by their corresponding weights, where X_i is the weighted factor influencing the number of allocated containers for one voyage.
With X_i in the composition vector Q = (X_1, X_2, · · · , X_9)^T considered the input variables and X_0 considered the corresponding output variable, a mixed kernel SVM for predicting the number of allocated containers for one voyage was constructed. All data were normalized to the interval [0, 1]. The data presented in Table 1 served as training samples, whereas those presented in Table 4 served as test samples.
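Putting the pieces together, the weighted factor matrix Q can be fed to an SVR through a precomputed mixed-kernel Gram matrix. The data below are synthetic stand-ins for Tables 1 and 4, and the kernel parameters are illustrative, not the GA-optimized values.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical weighted training/test matrices (9 factors per sample)
rng = np.random.default_rng(2)
Q_train = rng.uniform(0, 1, size=(40, 9))
y_train = Q_train @ rng.uniform(0.5, 1.5, size=9) + 0.01 * rng.normal(size=40)
Q_test = rng.uniform(0, 1, size=(10, 9))

def mixed_kernel(A, B, rho=0.5, q=2, sigma=1.0):
    # Eq. (11): rho * polynomial + (1 - rho) * Gaussian radial basis
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return rho * (A @ B.T + 1.0) ** q + (1 - rho) * np.exp(-sq / (2 * sigma**2))

# Fit on the train-train Gram matrix; predict with the test-train Gram matrix
svr = SVR(kernel="precomputed", C=10.0, epsilon=0.01)
svr.fit(mixed_kernel(Q_train, Q_train), y_train)
pred = svr.predict(mixed_kernel(Q_test, Q_train))
```

Using `kernel="precomputed"` keeps ρ, q, and σ explicit, which is convenient when a GA is varying them.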
This study applied the MSE, mean absolute percentage error (e_MAPE), and correlation coefficient (R) to evaluate the predictive performance of the model; R falls within the interval [0, 1]. Lower MSE and e_MAPE values and R values approaching 1 were considered to indicate higher predictive performance.
where l is the number of samples, y_i (i = 1, 2, · · · , l) is the real value of the ith sample, and ŷ_i (i = 1, 2, · · · , l) is the predicted value of the ith sample.
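The three indicators can be computed as a short helper; this is a sketch (the function name is ours) using the standard definitions of MSE, MAPE, and Pearson's R.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the three test indicators: MSE, e_MAPE (%), and R."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # Mean absolute percentage error, in percent
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    # Correlation coefficient between actual and predicted series
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return mse, mape, r
```

For example, predictions of (110, 190, 310) against actual values of (100, 200, 300) give an MSE of 100 and an e_MAPE of about 6.11%.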

Simulation Results and Analysis
The parameters (ε̂, Ĉ, q̂, σ̂, ρ̂) of the mixed kernel SVM were obtained through GA optimization and used to establish the mixed kernel SVM model and predict the number of voyage containers in the test samples. The various input variables affected the predictive performance of the model, and the specific results are presented in Table 5 and Figure 4. The calculation results revealed that under the same model parameters, changing the input variables engendered different predictive performance levels. The input variables of the GRA-SVM-Mixed model were constituted by all the weighted influencing factors (see Table 3 for weight values); the input variables of the SVM-Mixed model were constituted by all the unweighted factors. For the GRA-SVM-Mixed-D model, influencing factors with correlation degrees lower than 0.6 were eliminated, and the remaining influencing factors were considered the model input variables. As presented in Table 5 and Figure 4, the maximum (minimum) error, MSE, and e_MAPE of the GRA-SVM-Mixed model were significantly lower than those of the GRA-SVM-Mixed-D and SVM-Mixed models; in addition, the GRA-SVM-Mixed model had the highest correlation coefficient R, indicating that it exhibited higher predictive performance than the other two models. As illustrated in Figures 4 and 5, the GRA-SVM-Mixed model provided predictions closer to the actual values in the test sample than did the other two models, and no large inflection point was observed.
Furthermore, the maximum relative error observed for the GRA-SVM-Mixed model was −4.2%, the minimum error was −0.11%, and the correlation coefficient was as high as 0.9993, showing higher predictive performance. This is because after the influencing factors were subjected to gray correlation analysis, different weights were assigned to the input variables, the intrinsic correlation characteristics between the influencing factors and the number of allocated containers for one voyage were fully explored, and the influencing factors with low correlation degrees were eliminated. The maximum relative error observed for the GRA-SVM-Mixed-D model was −13.22%, the minimum relative error was 0.93%, and the correlation coefficient was 0.9877, the smallest among the three models; this could be attributed to the elimination of influencing factors with low correlation degrees. Although eliminating influencing factors with low correlation degrees simplified the structure of the prediction model, the predictive performance of the model was relatively poor because it could not reflect the differences among the factors. On the basis of the same samples used in this study, the methods in [8,9] were used to construct models for predicting the number of allocated containers for one voyage, denoted as BP and SVM, respectively.
As illustrated in Figure 6, the GRA-SVM-Mixed model provided more stable and more accurate predictions than did the BP and SVM models. The test indicators in Table 6 further confirm these findings. As presented in Table 6, all three models could provide good prediction results and satisfactory MSE and e_MAPE values, but the GRA-SVM-Mixed model exhibited higher predictive performance. Moreover, the GRA-SVM-Mixed model was determined to have significant advantages over the other two models from a timesaving perspective. The maximum relative errors of the predictions of the three models were −13.6%, 2.53%, and 9.8%, and the minimum relative errors were −2.22%, −0.11%, and −0.97%. The predictive performance of a model can be expressed by the MSE and correlation coefficient R. As shown in Table 6, the MSE of the GRA-SVM-Mixed model was 197.6, and its R was 0.9993, closer to 1 than those of the other two models. This is because under small samples, the BP neural network model adopts empirical risk minimization, which cannot guarantee the minimum expected risk. Moreover, the BP neural network model can only guarantee convergence to a certain point in the optimization process and cannot derive a global optimal solution.
By contrast, the SVM model adopts structural risk minimization and VC dimension theory, which not only minimizes the structural risk but also minimizes the boundary of the VC dimension under a small sample, effectively narrowing the confidence interval, thus achieving the minimum expected risk and improving the generalization ability and promotion ability of the model. The GRA-SVM-Mixed model applies parameter ρ to adjust the flexible use of radial basis and polynomial kernel functions in order to improve its robustness and generalization ability. In addition, the model applies gray correlation analysis for weighting input variables, strengthening the internal feature space structure and reflecting the differences among influencing factors. In this study, this model exhibited good performance in predicting the number of containers allocated to a container ship for one voyage.

Conclusions
This paper proposes a model for predicting the number of containers allocated to a container ship for one voyage. First, GRA theory is applied to determine the correlation between influencing factors and the forecasting sequence. Subsequently, different weights are allocated to each influencing factor to reflect their differences and highlight their internal characteristics. The weighted influencing factors serve as the input variables of the SVM prediction model, and a radial basis kernel function and polynomial kernel function are applied to improve the generalization ability and promotion ability of the SVM model. Finally, a GA is used to optimize the SVM parameters, and samples are trained using the optimized parameters to improve the predictive performance of the model. Simulations revealed that compared with an SVM model with a single kernel function and without gray correlation processing, the proposed model exhibited higher performance, with the minimum relative error rates being −0.11% and −0.97%, respectively. Additionally, compared with a BP neural network model, the GRA-SVM-Mixed model exhibited superior generalization ability, according to a relative error analysis. Accordingly, the proposed model provides an effective method for predicting the number of containers allocated to a container ship for one voyage.