Vessel Trajectory Prediction Model Based on AIS Sensor Data and Adaptive Chaos Di ﬀ erential Evolution Support Vector Regression (ACDE-SVR)

. Abstract: There are di ﬃ culties in obtaining accurate modeling of ship trajectories with traditional prediction methods. For example, neural networks are prone to falling into local optima and there are a small number of Automatic Identiﬁcation System (AIS) information samples regarding target ships acquired in real time at sea. In order to improve the accuracy of ship trajectory predictions and solve these problems, a trajectory prediction model based on support vector regression (SVR) is proposed. Ship speed, course, time stamp, longitude and latitude from AIS data were selected as sample features and the wavelet threshold de-noising method was used to process the ship position data. The adaptive chaos di ﬀ erential evolution (ACDE) algorithm was used to optimize the internal model parameters to improve convergence speed and prediction accuracy. AIS sensor data corresponding to a certain section of the Tianjin Port ships were selected, on which SVR, Recurrent Neural Network (RNN) and Back Propagation (BP) neural network model trajectory prediction simulations were carried out. A comparison of the results shows that the trajectory prediction model based on ACDE-SVR has higher and more stable prediction accuracy, requires less time and is simple, feasible and e ﬃ cient.


Introduction
With the rapid development of economic globalization, shipping has become the main form of international trade.As an important means of transportation at sea, ships have made tremendous contributions to economic development and social progress.The large-scale and diversified development of ships has also brought great security risks to maritime navigation.According to statistics from relevant departments [1], in recent years, more than 200 ships have been involved in shipwrecks at sea each year and more than half of the losses were caused by collisions.Investigation [1] of these marine collision accidents has found that the actual situations encountered during maritime navigation are often highly complicated.Especially in multi-ship collision avoidance, the ambiguity of a target ship's intentions and inability to make real-time and effective collision avoidance decisions are two of the largest factors contributing to the frequent occurrence of collision accidents at sea.Considering these factors, collision avoidance decisions cannot be based only on the current navigational information but should instead be integrated into the target ship's future navigational behavior for a certain period of time.This allows for collision avoidance decisions to be made ahead of time to some extent, which effectively improves the reliability of the decisions and reduces the collision risk.Therefore, a ship trajectory prediction model with high precision and real-time prediction capability is an urgent necessity.
Current trajectory prediction methods usually adopt the traditional Markov model [2][3][4][5], particle filter algorithm [6,7], simulated annealing algorithm [8] and Kalman filter algorithm [9].These methods often have the following shortcomings: First, the ship kinematic equations must be established and consideration of hydrological environmental factors such as wind and current greatly increases the modeling complexity and difficulty; Second, according to the needs of marine collision avoidance decision-making, trajectory prediction must often occur in real time.It is often difficult to establish real-time and accurate mathematical models, as many are only suitable for ideal research.In contrast to traditional modeling methods, with the rapid rise of the artificial intelligence era, more machine learning algorithms are being gradually applied to trajectory prediction, among which back propagation (BP) neural networks are common [10,11].However, these algorithms fall easily into local optima and a problem often arises where the amount of data is too small for accurate BP neural network training [12].As a result, the accuracy is insufficient for real-time data training and modeling at sea.
Considering these limitations, a vessel trajectory prediction model based on adaptive chaos differential evolution algorithm [13] support vector regression (ACDE-SVR) is proposed herein.The SVR model ensures high prediction accuracy despite the limited samples; this solve the problem of insufficient accuracy of models such as neural network by the real-time acquisition of AIS data for track prediction at sea.The prediction accuracy of the support vector regression (SVR) is closely related to the selection of parameters in the model.To further improve the prediction accuracy, the heuristic algorithm is used to optimize the parameters of the SVR model.The improved differential evolution (DE) algorithm-ACDE algorithm is used for parameter optimization.The speed, course, ship position and Unix time from AIS information are used as model sample characteristics and a multi-dimensional trajectory time series during maritime navigation is successfully predicted.The results provide a better understanding of ship dynamics and contribute to improved determination of collision avoidance decisions.

SVM System and SVR
SVM (Support Vector Machine) is a novel supervised learning algorithm that was first proposed by Cortes and Vapnik [14] in 1995.It is based on the statistical theory concept of the VC (Vapnik Chervonenkis) dimension and can improve the generalization performance of learning machines by seeking structural risk minimization [15].The VC-dimension of the set of indicator functions f (x, a), a ∈ Λ, is the maximum number of a set of vectors x 1 , x 2 , x 3 , ..., x h that can be separated into two classes in all 2 h possible ways using functions of the set.If for any n there exists a set of n vectors that can be shattered by the set of the set f (x, a), a ∈ Λ, then the VC dimension is equal to infinity [16].VC dimension measures the capacity of a hypothesis space.Capacity is a measure of complexity and measures the expressive power, richness or flexibility of a set of functions by assessing how wiggly its members can be.Notably, SVM can obtain good statistical laws under the premise of limited samples [17].SVM is divided into support vector classification (SVC) and SVR, which are used to solve data classification and data regression problems, respectively.This paper uses an SVR model for vessel trajectory prediction.
Given a set of sample data (x i , y i )| i = 1, 2, ..., n for prediction, where x i ∈ R n are the input variables and y i ∈ R are the output variables.The core of the SVR model is the kernel function.By introducing a kernel function, SVM subtly solves the inner product operation in high-dimensional feature space, which solves the problem of nonlinear classification and prediction.Commonly used kernel functions mainly include linear kernel functions, polynomial kernel functions, Radial Basis Function (RBF) and sigmoid kernel functions.Among them, RBF is the most widely used kernel function, which can realize nonlinear mapping [18,19].It has better performance for both large samples and small samples and has fewer parameters than polynomial kernel functions.In this study, RBF function is employed as the kernel function for the SVR model [20]: where K stands for kernel function; x and x i are the vectors in initial low-dimensional feature space and g is the kernel function parameter; ϕ(x i ) is the nonlinear mapping function, which can map the input data into the high dimensional feature space; The operator • is the inner symbol.Through RBF kernel function mapping, each sample point (x i , y i ) is fitted as closely as possible to the following SVR linear model: where ω is the weight vector, with ω ∈ R n ; and b is the bias, with b ∈ R.
SVR takes an ε-insensitive function as a loss function.First, a constant ε is defined for a specific problem, where ε > 0. For a sample That is, the ε-insensitive loss function can be written as: As shown in Figure 1, an interval band with f (x) as the center and a width of 2ε is established.If the training sample falls into the interval band, the prediction is considered correct.
Appl.Sci.2019, 9, x FOR PEER REVIEW 3 of 24 function, which can realize nonlinear mapping [18,19].It has better performance for both large samples and small samples and has fewer parameters than polynomial kernel functions.In this study, RBF function is employed as the kernel function for the SVR model [20]: where K stands for kernel function; x and i x are the vectors in initial low-dimensional feature space and g is the kernel function parameter; ( ) i ϕ x is the nonlinear mapping function, which can map the input data into the high dimensional feature space; The operator is the inner symbol.
Through RBF kernel function mapping, each sample point ( ) x y is fitted as closely as possible to the following SVR linear model: where ω is the weight vector, with n R ∈ ω ; and b is the bias, with b R ∈ .SVR takes an ε -insensitive function as a loss function.First, a constant ε is defined for a specific problem, where 0 ε > .For a sample ( ) ( ) , there is no loss; otherwise, the corresponding loss is ( ) That is, the ε -insensitive loss function can be written as: As shown in Figure 1, an interval band with ( ) In order to minimize the loss function, based on the criterion of structural risk minimization [21], determining the constraints ω and b and optimizing the objective function, the following optimization criterion is constructed: Figure 1.SVR schematic diagram.The area between the two dashed lines represents the ε-interval band; the predicted loss of samples falling into this area is not calculated.
In order to minimize the loss function, based on the criterion of structural risk minimization [21], determining the constraints ω and b and optimizing the objective function, the following optimization criterion is constructed: Appl.Sci.2019, 9, 2983 This is the convex quadratic programming problem of the original problem of SVR, where C is the penalty factor and C > 0, which aims to find a compromise between generalization performance and training error; ε is the maximum tolerance value; and ξ i and ξ * i are slack variables to avoid over-training.Combining these variables allows the model a certain degree of fault tolerance [22].The Lagrange function can be obtained by introducing Lagrange multipliers: where α ( * ) i and µ ( * ) i are Lagrange multipliers and 0 ≤ α i ≤ C. The partial derivatives of ω, b, α i and e i are solved separately.Letting the partial derivatives be equal to zero [23] yields: Substituting Equation ( 7) into Equation ( 6) gives the SVR dual problem as: The above process must satisfy KKT conditions [24], that is: Select the component α j of α ( * ) in the open interval (0, C) to calculate the parameter b: Finally, substitute the values of ω and b into the formula (2): which is called the decision function of SVR.

ACDE Algorithm
Differential evolution (DE) is a stochastic parallel optimization algorithm [25,26] based on population differences proposed by Storn and Price in 1997 [27] on the basis of evolutionary ideas such as the genetic algorithm.DE solves optimization problems by imitating the heuristic swarm intelligence generated by competition and cooperation among organisms [28]; the algorithm has strong usability, global optimization capability and stability [29] and is widely used in clustering optimization calculations, neural network optimizations, constrained optimization calculations and filter designs.However, the DE algorithm also has two defects: (1) the fixed scaling factor hinders the full search performance of the algorithm when performing mutation operations [30]; (2) in the later stage of optimization, the population diversity is reduced and the algorithm falls easily into local optima [31,32].Among them, the first defect has been basically solved by Brest et al. 13 years ago.Brest et al. [33].They assign a separate scaling factor to each individual, so that it can be automatically adjusted according to two thresholds during the running of the algorithm but this will inevitably have some randomness and blindness [34].To solve these problems, this study uses an improved DE algorithm, ACDE, which improves upon DE in two aspects.First, in the mutation operation stage, the adaptive scaling factor can be changed with the number of iterations, thus addressing the insufficient searching ability caused by the fixed scaling factor.In the standard differential evolution algorithm, increasing the scaling factor can expand the search range and speed up the individual update in the population.The reduction of the scaling factor can enhance the ability of local search and improve the convergence of the algorithm.Based on this, in the early stage of algorithm evolution, the search range can be expanded by increasing the scaling factor.In the later stage of algorithm evolution, the scaling factor is reduced to speed up the convergence of the algorithm.At the same time, in order to avoid the destruction of the genetic structure of the dominant individual, the scaling factor of the individual with higher fitness should be reduced; in order to promote the improvement of the structure of the disadvantaged individual, the scaling factor of the individual with lower fitness should be increased.Second, chaotic motion has the characteristics of sensitivity, randomness, regularity and ergodicity and it can traverse the entire space state without repeating within the specified range.Based on chaos theory, a chaotic fine search strategy is proposed to improve the local search efficiency of the algorithm in order to avoid premature population.
For any optimization problem, where N is the dimension of the solution space and L j and U j represent the upper and lower bounds of the range of x j values, respectively.The specific process of the ACDE algorithm is as follows: Step 1: Population initialization.D individuals randomly satisfying the constraint condition are generated in the N-dimensional solution space as the i-th individual of the 0-th generation population: x i (0) x j,i (0) ∈ L j,i , U j,i , i = 1, 2, 3, ..., D; j = 1, 2, 3, ..., N .At the same time, the maximum number of iterations is assumed to be M.The value of the j-th dimension of the i-th individual is calculated according to the following formula: where rand(0, 1) represents a random number between [0,1] and obeys a uniform distribution.
Step 2: Chaos initialization [13].Chaotic motion has the characteristics of randomness, regularity and ergodicity and can traverse the entire space state without repetition within a specified range.Generally speaking, the chaotic variable is first mapped to the value interval of the optimization variable and then the chaotic variable is used for optimization.Here, we use the logistic mapping commonly used by chaotic systems, with the basic expression as follows: where x t ∈ [0, 1], t is a natural number and η ∈ [0, 4]; when η= 4, the system is completely chaotic.
Step 3: Mutation.Different from the genetic algorithm, the DE algorithm realizes individual mutation by differential means.That is, two different individuals in the population are selected and the vector difference is scaled to perform vector synthesis with the individuals to be mutated [35].The k-generation population generated by k iterations is denoted as: Three individuals x i1 (k), x i2 (k) and x i3 (k) are randomly selected from the population and the mutation operations are performed according to the following equations: where F(k) is the adaptive scaling factor, which controls the influence of the difference vector and changes as the number of iterations k changes; i1, i2, i3 ∈ [1, D] and i1 i2 i3 i; F min and F max are the upper and lower limits of the scaling factor, respectively; and F i1 (k), F i2 (k) and F i3 (k) are the optimal, suboptimal and worst-case fitness, respectively, with random selection for the current k-th generation sub-population.Through the mutation operations, an intermediate individual is created: In the process of evolution, in order to ensure the validity of the solution, it is necessary to judge whether the "genes" in the "chromosome" satisfy the boundary conditions.If the boundary condition is not met, the "gene" is generated in the same way as the initial population, that is, it is regenerated by a random method.
Step 4: Crossover.The inter-individual crossover operation is performed on the g-generation population x i (k) and the intermediate individual v i (k + 1) produced by the mutation operations [36].The specific method is as follows: where cr is the probability of crossover and cr ∈ [0, 1]; and j rand is a random integer on the interval [1, 2, 3, ..., N].
Step 5: Selection.Using the principle of greedy choice, the better individuals [37] are selected to form the next generation population according to the following evaluation function: Step 6: Chaotic fine search.Letting η = 4, a chaotic fine search is performed on the decision variable x k j,i and x k j,i is mapped to a chaotic variable according to the following formula: The next generation iterative chaotic variables s k+1 j,i are calculated according to Equation ( 15) and the chaotic variables s k+1 j,i are converted into decision variables x k+1 j,i according to Equation (22).Determine whether the chaotic search has reached the maximum number of iterations, if not, continue to iterate.

ACDE-SVR Prediction Model
One important factor affecting the prediction accuracy of the SVR model is the selection of model parameters.Based on its strong stability, superior global optimization performance and higher convergence speed, this study uses the ACDE algorithm to optimize the parameters of the SVR trajectory prediction model with the penalty factor C, insensitive factor ε and kernel function parameter g.

Setting of Parameters in ACDE Algorithm
For the ACDE algorithm, the parameters should be reasonably set to obtain better parameter optimization results.The main parameters of the ACDE algorithm are scaling factor F, population size D, maximum iteration number M and crossover probability cr.Among them, the selection of scaling factor is introduced in Part 2.2, which is related to the number of iterations k.The bases for setting the remaining three parameters are as follows: (1) Population size D. According to experience, D should be 5 to 10 times the dimension of the problem but not less than 4; otherwise, the mutation operation will not be possible.Generally speaking, as D increases, the population diversity will gradually increase along with the probability of reaching an optimal solution.However, this will also greatly increase the calculation time.In general, D ranges from 20 to 50.
(2) Maximum number of iterations M.This parameter is the termination condition of the evolutionary operation.A larger M increases running time of the program and yields a more accurate optimal solution.However, when M is too large, the accuracy of the optimal solution will not increase correspondingly, because the population has reached an extremely single state at this time and the algorithm cannot jump out of the local optimal solution [39].
(3) Crossover probability cr.The value range of this parameter is [0, 1] and the value range of this parameter is (0.0, 0.2) when the objective function is separable [40].If cr is large, the population convergence is gradually accelerated and precocity is prone to occur.Step 1: The trajectory prediction sample is input after de-noising and the current number of iterations is set to k.The ACDE parameters including population size D, maximum number of iterations M, adaptive scaling factor F(k) and crossover probability cr are set according to the actual situation.

ACDE Algorithm Superiority Verification
Step 2: The optimal range of parameters (C, ε, g) is set and the population is initialized randomly and uniformly.
Step 3: The SVR algorithm is used to predict each individual.The MSE (mean square error) between the predicted value y i and real value Y i is calculated according to: The MSE is used as the fitness function [38].The fitness value, individual extremum of each individual Y individual (i), global extremum Y global and global extremum points X global are recorded.
Step 4: If the number of iterations is less than the maximum number of iterations, that is, k < M, the algorithm proceeds to the next step.Otherwise, it proceeds to step 7 and outputs the global extreme point as the searched optimal parameter, (C * , ε * , g * ).
Step 5: According to Equations ( 17)-( 22), mutation, crossover, selection and chaotic fine search operations are carried out to generate new populations.The fitness of individuals within the population, individual extremum Y individual (i), global extremum Y global and global extremum X global are calculated.
Step 6: Iterations are set as k = k + 1 and the algorithm returns to step 4.
Step 7: The SVR prediction model is built based on the calculated parameters.

Setting of Parameters in ACDE Algorithm
For the ACDE algorithm, the parameters should be reasonably set to obtain better parameter optimization results.The main parameters of the ACDE algorithm are scaling factor F, population size D, maximum iteration number M and crossover probability cr.Among them, the selection of scaling factor is introduced in Part 2.2, which is related to the number of iterations k.The bases for setting the remaining three parameters are as follows: (1) Population size D. According to experience, D should be 5 to 10 times the dimension of the problem but not less than 4; otherwise, the mutation operation will not be possible.Generally speaking, as D increases, the population diversity will gradually increase along with the probability of reaching an optimal solution.However, this will also greatly increase the calculation time.In general, D ranges from 20 to 50.
(2) Maximum number of iterations M.This parameter is the termination condition of the evolutionary operation.A larger M increases running time of the program and yields a more accurate optimal solution.However, when M is too large, the accuracy of the optimal solution will not increase correspondingly, because the population has reached an extremely single state at this time and the algorithm cannot jump out of the local optimal solution [39].
(3) Crossover probability cr.The value range of this parameter is [0, 1] and the value range of this parameter is (0.0, 0.2) when the objective function is separable [40].If cr is large, the population convergence is gradually accelerated and precocity is prone to occur.

ACDE Algorithm Superiority Verification
In order to verify the excellent global optimization performance of the proposed ACDE algorithm, which is compared with the current state-of-the-art differential evolution algorithm, the LSHADE algorithm [41].The ACDE algorithm and LSHADE algorithm are evaluated using a set of problems presented in the CEC2017 competition on single objective bound constrained real-parameter optimization.This benchmark contains 30 test functions with multiple characteristics.T is the dimensionality of the problem and the functions are tested on 50T.In short, functions 1-3 are unimodal, functions 4-10 are multimodal, functions 11-20 are hybrid functions and 21-30 are composition functions.More details can be found in Reference [42].
The parameters values of ACDE algorithm are set as follows: population size D of the ACDE algorithm set to 50, maximum iteration number M set to 10,000*T, the upper and lower limits of the scaling factor are 1.2 and 0.3 and crossover probability cr set to 0.9.While the parameters of the LAHADE algorithm are set as shown in Reference [41].
The two algorithms perform 51 simulation tests on 30 benchmark problems, respectively.The statistical results of the comparisons on the benchmarks with ACDE algorithm and LSHADE algorithm are summarized in Table 1.It includes the mean and the standard deviations of error.The best results are marked in bold for all problems.As can be seen from Table 1, the ACDE algorithm is significantly better or flatter than the LSHADE algorithm on 18 functions.The ACDE algorithm is less accurate than the LSHADE algorithm on 12 functions.In general, the optimization performance of the ACDE algorithm is slightly higher than the LSHADE algorithm.

AIS Sensor Navigation Trajectory Data Extraction and Separation
In contrast to actual maritime navigation, the Automatic Identification System (AIS) trajectory data of a target ship can be directly obtained as sample data.In the simulation experimental stage, the route data from within the massive AIS sensor data from a certain sea area must be extracted to screen out ship trajectories that meets the model requirements.The shore-based AIS has navigational data for all ships passing through the sea and a ship may travel back and forth several times between the starting and end points of a certain area.Thus, the route data are numerous and complicated and it is difficult to directly extract ship routes to create a sample set for model training.
The trajectories of different ships are distinguished according to their unique maritime mobile service identification (MMSI) numbers.According to reference [43], the time intervals between two adjacent data points of the same ship were preliminarily calculated and the route trajectories were identified by the speed and time interval.Data points with long time intervals and instantaneous velocities of 0 or close to 0 were selected as the starting points of the route.Considering the abnormal recording of time data, the starting point of the route was determined by combining the speed and change in ship position based on a time difference judgment.An AIS sequence is denoted as T = (t 1 , v 1 , λ 1 , ...), (t 2 , v 2 , λ 2 , ...), ..., (t n , v n , λ n , ...) , where t n denotes the time interval from the next point, v n denotes the speed of ship and λ n denotes the longitude of the ship.The specific algorithm for extracting trajectory data is as follows: Step 1: In sequence T, t i is traversed until data points t i and t j satisfy t i − t j ≥ 12h.Then, the i and j data points in the time series are set as sequence tangent points and the steps are repeated until all data points in the sequence are traversed.
Step 2: If there is an abnormality in the time data, this step is used for assistance.In sequence T, v i is traversed until the data point satisfies v i ≈ 0; this point is then set to the tangent point in the sequence.
Step 1 is then repeated until all data points in the sequence have been traversed.

AIS Sensor Data Cleaning
AIS sensor data unavoidably contains data anomalies, missing data and other issues that necessitate cleaning of the data after trajectory extraction.After trajectory extraction, the original cluttered AIS data are processed into trajectory data sets.In each trajectory data set, if the acquisition time of two data points is quite different, this indicates that data in the sequence are missing and must be interpolated.According to the experimental error results of linear interpolation, Lagrange interpolation, median interpolation and mean interpolation discussed in reference [42], the Lagrange interpolation method has the lowest interpolation error for missing data.Therefore, the Lagrange interpolation method was used to interpolate missing values of ship latitude, longitude, heading and speed.

Wavelet Threshold De-Noising
The AIS system obtains ship trajectory information by accessing GPS signals.Owing to the mutual interference and signal attenuation of maritime communication equipment, ship position information may also be affected by noise interference and data distortion.
For a long time, people often choose Fourier transform to process signals.However, because the function constructed by Fourier transform is periodic sine wave and cosine wave, it can only be applied to those signals with periodic or approximate periodicity but the effect is not very good on those signals with non-periodic or strong local characteristics.
The wavelet transform developed from Fourier transform can solve the above problems well.It not only retains many advantages of Fourier transform but also improves and develops on the original basis, so that it can process signals in time-frequency domain.The remarkable advantage of wavelet transform is that it can process the signal more subtly and show some characteristics of the signal better.It realizes the requirement of localization and multi-scale analysis of the signal in time-frequency domain.The signal de-noising method developed on the basis of wavelet shows good de-noising effect, which is the perfection and development of Fourier transform in the field of signal processing.Based on this, we choose the wavelet threshold de-noising method based on wavelet analysis theory.The advantage of this method is that the noise is almost completely suppressed and the characteristic peaks of the original signal are well preserved.Moreover, de-noising using soft threshold can minimize the maximum mean square error of the estimated signal, that is, the estimated signal after de-noising is the approximate optimal estimate of the original signal and the estimated signal is at least as smooth as the original signal without additional oscillation.In addition, the method is fast in calculation and has wide adaptability and is the most widely used one of many wavelet de-noising methods.
The main theoretical basis of wavelet threshold de-noising is as follows.The wavelet transform has a strong decorrelation, which concentrates the energy of the signal in some small wavelet coefficients in the wavelet domain, while the energy of the noise is distributed in the entire wavelet domain.Therefore, after the wavelet decomposition, the amplitude of the wavelet coefficient of the signal is greater than the amplitude of the coefficient of the noise.Wavelet coefficients with larger amplitudes can be considered to be generally dominated by signals, while coefficients with smaller amplitudes are largely noise.Thus, the threshold can be used to preserve the signal coefficients, while reducing the noise to almost zero.

Data Normalization
To accelerate model convergence and solve errors caused by large differences in magnitude between model samples, the sample data were linearly normalized by the following formula to the interval [0, 1]: where x is the sample data and x ∈ R n ; x is the normalized data and x ∈ [0, 1]; and x max and x min are the maximum and minimum values in the sample points, respectively.

Trajectory Prediction Model
When a ship is sailing, it mainly determines its navigational behavior by receiving the AIS data of a target ship to make timely and accurate collision avoidance decisions under harsh sea conditions and complicated encounter situations.In an actual voyage, the navigational behavior of different ships is mainly reflected in the time series changes in ship position, heading and speed.The navigational behavior of a ship at time t can be characterized as where lon t , lat t , C t , v t and T t represent the longitude, latitude, course, speed over ground and Unix time of the ship, respectively, at time t.Aiming at the SVM characteristics of multivariable input and single variable output, the ship longitude and latitude parameters should be separately optimized.That is to say, in order to separately predict the longitude and latitude of the ship characterizing the nature of the ship, two SVR prediction models need to be established.The input characteristics of the two models are the same and the output characteristics are different.Generally speaking, the future navigational behavior of a ship is often the result of the interaction between current and previous navigational behaviors.Therefore, in order to improve the accuracy of trajectory prediction, the navigation behavioral of the past four moments, N t−3 , N t−2 , N t−1 and N t and the time stamp of the next moment, t + 1, were used as input variables and the longitude and latitude of the next moment were used as the model output variables respectively: where N t−3 , N t−2 , N t−1 and N t represent the navigational behaviors at times t − 3, t − 2, t − 1 and t, respectively, from Equation (24).The model uses the root mean square error (RMSE), mean absolute error (MAE) and maximum absolute error (E MAX ) to measure the prediction accuracy, which are calculated as: where S represents the total sample size and y t and Y t represent the predicted and actual values, respectively.

Analysis of Experimental Process
This study used the libsvm toolbox developed by Professor Lin Chih-Jen [44] of Taiwan University for modeling and simulations.AIS trajectories were extracted from 2,346,643 sets of AIS data from Tianjin Port waters in March 2015 using the algorithm described in Section 2.1.A certain trajectory of the ship with an MMSI of 538,004,758 was randomly selected as the research object; the trajectory was cleaned and missing values were interpolated.To satisfy the collision avoidance requirements, it was found that there may have been a collision risk with the target ship during maritime navigation.The AIS data of the target ship acquired in real time were used as the sample data to establish a prediction model, for which there may be a problem related to the small number of training samples.To simulate accurately, a part of the trajectory data was selected as the sample and 226 sets of AIS data were selected as the sample data.Figure 3 is ship trajectory with sample data sets.Among these, 200 groups were used as the training set and 26 groups were used as the test set to carry out simulation experiments.The sample data structure is summarized in Table 2.
Appl.Sci.2019, 9, x FOR PEER REVIEW 13 of 24 This study used the libsvm toolbox developed by Professor Lin Chih-Jen [44] of Taiwan University for modeling and simulations.AIS trajectories were extracted from 2,346,643 sets of AIS data from Tianjin Port waters in March 2015 using the algorithm described in Section 2.1.A certain trajectory of the ship with an MMSI of 538,004,758 was randomly selected as the research object; the trajectory was cleaned and missing values were interpolated.To satisfy the collision avoidance requirements, it was found that there may have been a collision risk with the target ship during maritime navigation.The AIS data of the target ship acquired in real time were used as the sample data to establish a prediction model, for which there may be a problem related to the small number of training samples.To simulate accurately, a part of the trajectory data was selected as the sample and 226 sets of AIS data were selected as the sample data.Figure 3 is ship trajectory with sample data sets.Among these, 200 groups were used as the training set and 26 groups were used as the test set to carry out simulation experiments.The sample data structure is summarized in Table 2.The original sequence was constructed from 226 sets of ship position data from AIS information and de-noised by wavelet threshold.Owing to the strong continuity of the ship sequence, the sym8 wavelet basis function, with strong continuity and symmetry, was used for threshold de-noising.The number of noise reduction times was set to 3, the heuristic threshold function was used to select the threshold and the soft threshold function was used for wavelet de-noising.Figure 4 shows a noise sequence and Figure 5 is a comparison between original and de-noised sequences.As can be seen from Figure 5, the presence of noise causes a slight disturbance in the position signal.To verify the effectiveness of wavelet de-noising, we used ship position sequences with and without de-noising to predict ship trajectories and compared the results.The comparison shows that the data predicted with de-noising are more accurate.The original sequence was constructed from 226 sets of ship position data from AIS information and de-noised by wavelet threshold.Owing to the strong continuity of the ship sequence, the sym8 wavelet basis function, with strong continuity and symmetry, was used for threshold de-noising.The number of noise reduction times was set to 3, the heuristic threshold function was used to select the threshold and the soft threshold function was used for wavelet de-noising.Figure 4 shows a noise sequence and Figure 5 is a comparison between original and de-noised sequences.As can be seen from Figure 5, the presence of noise causes a slight disturbance in the position signal.To verify the effectiveness of wavelet de-noising, we used ship position sequences with and without de-noising to predict ship trajectories and compared the results.The comparison shows that the data predicted with de-noising are more accurate.To reduce the influence of the order of magnitude on prediction accuracy, the training set and test set data were normalized to the interval [0, 1] and the RBF was selected as the kernel function.The ACDE algorithm was used to optimize the parameters of the longitude and latitude prediction models and the optimal penalty factor C, insensitive factor ε and kernel function parameter g were searched.According to Section 2.4, the parameters of the ACDE algorithm were set as follows: population size D of the ACDE algorithm set to 30, maximum iteration number M set to 200, the upper and lower limits of the scaling factor are 1.2 and 0.3 and crossover probability cr set to 0.9.The  The original sequence was constructed from 226 sets of ship position data from AIS information and de-noised by wavelet threshold.Owing to the strong continuity of the ship sequence, the sym8 wavelet basis function, with strong continuity and symmetry, was used for threshold de-noising.The number of noise reduction times was set to 3, the heuristic threshold function was used to select the threshold and the soft threshold function was used for wavelet de-noising.Figure 4 shows a noise sequence and Figure 5 is a comparison between original and de-noised sequences.As can be seen from Figure 5, the presence of noise causes a slight disturbance in the position signal.To verify the effectiveness of wavelet de-noising, we used ship position sequences with and without de-noising to predict ship trajectories and compared the results.The comparison shows that the data predicted with de-noising are more accurate.To reduce the influence of the order of magnitude on prediction accuracy, the training set and test set data were normalized to the interval [0, 1] and the RBF was selected as the kernel function.The ACDE algorithm was used to optimize the parameters of the longitude and latitude prediction models and the optimal penalty factor C, insensitive factor ε and kernel function parameter g were searched.According to Section 2.4, the parameters of the ACDE algorithm were set as follows: population size D of the ACDE algorithm set to 30, maximum iteration number M set to 200, the upper and lower limits of the scaling factor are 1.2 and 0.3 and crossover probability cr set to 0.9.The To reduce the influence of the order of magnitude on prediction accuracy, the training set and test set data were normalized to the interval [0, 1] and the RBF was selected as the kernel function.The ACDE algorithm was used to optimize the parameters of the longitude and latitude prediction models and the optimal penalty factor C, insensitive factor ε and kernel function parameter g were searched.According to Section 2.4, the parameters of the ACDE algorithm were set as follows: population size D of the ACDE algorithm set to 30, maximum iteration number M set to 200, the upper and lower limits of the scaling factor are 1.2 and 0.3 and crossover probability cr set to 0.9.The optimization results are summarized in Table 3.As can be seen from Table 3, the parameters of the two models are different.As shown in Equation ( 24), the input variables of the two models are the same but the output variables are different and the difference of the output variables determines the difference between the optimized parameters of the two models.To verify the de-noising effect, trajectory sequences with and without de-noising were used for trajectory prediction according to the above prediction steps and the errors were compared.The results are summarized in Table 4.

Comparison of Prediction Results from Different Optimization Algorithms
To verify the superiority of the ACDE algorithm for optimization of SVR trajectory prediction model parameters, parameters determined by other optimization methods were used for comparison.Parameters optimized by the standard DE algorithm, PSO (particle swarm algorithm, grid search algorithm and GA (genetic algorithm) and the default parameters of the libsvm toolbox were used for trajectory prediction simulations.The predicted results were compared and analyzed and the error indicator values are given in Table 5.First, the prediction results of the ACDE-SVR model are compared with the standard DE-SVR model.The ACDE-SVR algorithm converges faster than the DE-SVR algorithm and each error index is smaller than that of the DE-SVR, which indicates that the ACDE-SVR trajectory prediction model has the advantages of faster convergence and higher prediction accuracy.In addition, the ship trajectory prediction models based on the parameters optimized by grid search and default toolbox values require less run time but have low prediction accuracies.Using parameters optimized by GA and PSO algorithms to build the model greatly improves the prediction accuracy but also increases the convergence time.The model based on the parameters optimized by the ACDE algorithm has the highest prediction accuracy, lowest run time and the best overall performance.2 respectively verify that wavelet threshold de-noising and ACDE algorithm optimization yield an SVR model with a higher prediction accuracy; thus, the model was used to predict the trajectory sample data.A comparison between predicted results and actual results is shown in Figure 6 and the longitude and latitude prediction errors are shown in Figure 7, where the error value refers to the absolute error, that is, the difference between the predicted value and actual value.It can be seen from Figures 6 and 7  2 respectively verify that wavelet threshold de-noising and ACDE algorithm optimization yield an SVR model with a higher prediction accuracy; thus, the model was used to predict the trajectory sample data.A comparison between predicted results and actual results is shown in Figure 6 and the longitude and latitude prediction errors are shown in Figure 7, where the error value refers to the absolute error, that is, the difference between the predicted value and actual value.It can be seen from Figures 6 and 7 that the results of trajectory prediction using the ACDE-SVR algorithm are basically consistent with the actual values.The magnitude of the longitude and latitude prediction error is 10 −5 and the maximum absolute values of 20 points longitude and latitude, respectively, are 3.475414 × 10 −5 ° (3.86 m) and 3.957327 × 10 −5 ° (4.394 m).This error is within the acceptable range and thus the prediction accuracy satisfies the requirements for making collision avoidance decisions during actual maritime navigation.To verify the prediction accuracy of the model, the BP neural network, recurrent neural network (RNN) and SVR prediction models were also compared.As the preferred neural network for processing time series data, RNN has good timing prediction performance [45][46][47].Because the initial weights and thresholds of each BP neural network layer are set randomly [12], it has some problems  2 respectively verify that wavelet threshold de-noising and ACDE algorithm optimization yield an SVR model with a higher prediction accuracy; thus, the model was used to predict the trajectory sample data.A comparison between predicted results and actual results is shown in Figure 6 and the longitude and latitude prediction errors are shown in Figure 7, where the error value refers to the absolute error, that is, the difference between the predicted value and actual value.It can be seen from Figures 6 and 7 that the results of trajectory prediction using the ACDE-SVR algorithm are basically consistent with the actual values.The magnitude of the longitude and latitude prediction error is 10 −5 and the maximum absolute values of 20 points longitude and latitude, respectively, are 3.475414 × 10 −5 ° (3.86 m) and 3.957327 × 10 −5 ° (4.394 m).This error is within the acceptable range and thus the prediction accuracy satisfies the requirements for making collision avoidance decisions during actual maritime navigation.To verify the prediction accuracy of the model, the BP neural network, recurrent neural network (RNN) and SVR prediction models were also compared.As the preferred neural network for processing time series data, RNN has good timing prediction performance [45][46][47].Because the initial weights and thresholds of each BP neural network layer are set randomly [12], it has some problems To verify the prediction accuracy of the model, the BP neural network, recurrent neural network (RNN) and SVR prediction models were also compared.As the preferred neural network for processing time series data, RNN has good timing prediction performance [45][46][47].Because the initial weights and thresholds of each BP neural network layer are set randomly [12], it has some problems such as a low prediction accuracy and ease of falling into local optima.To solve the influence of initial parameter settings on the BP neural network prediction model, the ACDE optimization algorithm was used to optimize the weights and thresholds of the BP neural network.In summary, the RNN trajectory prediction model and ACDE-BP neural network model were established under the premise of limited data samples and the prediction errors were calculated and compared with the ACDE-SVR prediction results as shown in Table 6.Under the premise of limited sample data, the ACDE-SVR trajectory prediction model has the highest training accuracy and the shortest run time.Therefore, the ACDE-SVR trajectory prediction model proposed in this paper has good prediction results in regard to both prediction accuracy and operation time.To further analyze the prediction errors of the different trajectory sequences, the analysis in Table 6 was carried out, which clearly shows that the prediction error of the a-trajectory is the largest.The MAE of latitude and longitude is 3.007 × 10 −4• (33.41 m), which is within the allowable range and can thus meet the collision avoidance requirement.The other trajectory errors are smaller than that of the a-trajectory and thus can also meet the collision avoidance requirement.In summary, the trajectory prediction model based on ACDE-SVR has high generalization capability and can be applied for collision avoidance.
prediction results as shown in Table 6.Under the premise of limited sample data, the ACDE-SVR trajectory prediction model has the highest training accuracy and the shortest run time.Therefore, the ACDE-SVR trajectory prediction model proposed in this paper has good prediction results in regard to both prediction accuracy and operation time.

Conclusions
This paper analyzes and summarizes the problems existing in current ship trajectory prediction methods.Considering the good global optimal fitting performance of SVR and its characteristics suitable for small sample training, an offline trajectory prediction method based on SVR was proposed.The speed, course, longitude, latitude and Unix time from AIS information were selected as the sample characteristic variables in the model to predict the time series of a ship's trajectory toward a better understanding of current and future trends of other ships and generating collision avoidance decisions with a certain foresight.However, the model proposed in this study is an offline model, which assumes that all samples are acquired at one time and cannot be modified once the model has been trained.AIS information is collected intermittently based on speed and heading.If newly acquired AIS data are significantly different from the original sample data, the prediction error is large.Therefore, establishing an incremental SVR-based trajectory prediction model is the next step that should be explored.

Conclusions
This paper analyzes and summarizes the problems existing in current ship trajectory prediction methods.Considering the good global optimal fitting performance of SVR and its characteristics suitable for small sample training, an offline trajectory prediction method based on SVR was proposed.The speed, course, longitude, latitude and Unix time from AIS information were selected as the sample characteristic variables in the model to predict the time series of a ship's trajectory toward a better understanding of current and future trends of other ships and generating collision avoidance decisions with a certain foresight.However, the model proposed in this study is an offline model, which assumes that all samples are acquired at one time and cannot be modified once the model has been trained.AIS information is collected intermittently based on speed and heading.If newly acquired AIS data are significantly different from the original sample data, the prediction error is large.Therefore, establishing an incremental SVR-based trajectory prediction model is the next step that should be explored.

Figure 1 .
Figure 1.SVR schematic diagram.The area between the two dashed lines represents the ε -interval band; the predicted loss of samples falling into this area is not calculated.

Figure 3 .
Figure 3. Ship trajectory with sample data sets.The purple line refers to the Automatic Identification System (AIS) trajectory of ship.The purple hollow circle refers to the trajectory point of the selected trajectory segment and the arrow indicates the ship's navigation direction.Input ti refers to the trajectory point of the current ti time and the past three times (ti−1, ti−2, ti−3) as the input data of the model and output ti refers to the trajectory of the next time ti+1 The point is used as the output data of the model; Input ti+1 refers to the track point of the ti+1 time and the past three times (ti, ti−1, ti−2) as the input data of the model and the output ti+1 means The track point of ti+2 at the next moment is used as the output data of the model.

Figure 3 .
Figure 3. Ship trajectory with sample data sets.The purple line refers to the Automatic Identification System (AIS) trajectory of ship.The purple hollow circle refers to the trajectory point of the selected trajectory segment and the arrow indicates the ship's navigation direction.Input t i refers to the trajectory point of the current t i time and the past three times (t i−1 , t i−2 , t i−3 ) as the input data of the model and output t i refers to the trajectory of the next time t i+1 The point is used as the output data of the model; Input t i+1 refers to the track point of the t i+1 time and the past three times (t i , t i−1 , t i−2 ) as the input data of the model and the output t i+1 means The track point of t i+2 at the next moment is used as the output data of the model.

Figure 5 .
Figure 5.Comparison of original and de-noised sequences.

Figure 5 .
Figure 5.Comparison of original and de-noised sequences.

Figure 5 .
Figure 5.Comparison of original and de-noised sequences.

24 Latitude 2 . 4 4. 3 . 3 .
that the results of trajectory prediction using the ACDE-SVR algorithm are basically consistent with the actual values.The magnitude of the longitude and latitude prediction error is 10 −5 and the maximum absolute values of 20 points longitude and latitude, respectively, are 3.475414 × 10 −5• (3.86 m) and 3.957327 × 10 −5• (4.394 m).This error is within the acceptable range and thus the prediction accuracy satisfies the requirements for making collision avoidance decisions during actual maritime navigation.Appl.Sci.2019, 9, x FOR PEER REVIEW 16 of 545720 × 10 −4 2.541201 × 10 −4 3.948753 × 10 −Comparison of Prediction Results between ACDE-SVR Model and Neural Network Prediction Model Sections 4.3.1 and 4.3.

Figure 6 .
Figure 6.Comparison between predicted and original data.

Figure 7 .
Figure 7. Error values of trajectory prediction model based on ACDE-SVR.(a) Longitudinal error values; (b) Latitudinal error values.

Figure 6 .
Figure 6.Comparison between predicted and original data.

Figure 6 .
Figure 6.Comparison between predicted and original data.

Figure 7 .
Figure 7. Error values of trajectory prediction model based on ACDE-SVR.(a) Longitudinal error values; (b) Latitudinal error values.

Figure 7 .
Figure 7. Error values of trajectory prediction model based on ACDE-SVR.(a) Longitudinal error values; (b) Latitudinal error values.

Figure 8 .Figure 8 .Figure 9 .
Figure 8. Prediction results of a-trajectory using the ACDE-SVR model.(a) Comparison of predicted and actual values of a-trajectory; (b) Longitude and latitude absolute error values of a-trajectory.

Figure 9 .
Figure 9. Prediction results of b-trajectory using the ACDE-SVR model.(a) Comparison of predicted and actual values of b-trajectory; (b) Longitude and latitude absolute error values of b-trajectory.

Figure 10 .
Figure 10.Prediction results of c-trajectory using the ACDE-SVR model.(a) Comparison of predicted and actual values of c-trajectory; (b) Longitude and latitude absolute error values of c-trajectory.

Figure 10 .Figure 10 .Figure 11 .
Figure 10.Prediction results of c-trajectory using the ACDE-SVR model.(a) Comparison of predicted and actual values of c-trajectory; (b) Longitude and latitude absolute error values of c-trajectory.

Figure 11 .Figure 12 .Figure 12 .Figure 12 .
Figure 11.Prediction results of d-trajectory using the ACDE-SVR model.(a) Comparison of predicted and actual values of d-trajectory; (b) Longitude and latitude absolute error values of d-trajectory.

Figure 13 .
Figure 13.Prediction results of f-trajectory using the ACDE-SVR model.(a) Comparison of predicted and actual values of f-trajectory; (b) Longitude and latitude absolute error values of f-trajectory.

Table 1 .
Performance comparison of ACDE and LSHADE algorithms of the 50T benchmark.

Table 2 .
Sample data structure.

Table 2 .
Sample data structure.

Table 3 .
Parameter optimization results of the ACDE algorithm.

Table 4 .
Comparison of trajectory sequence prediction errors with and without de-noising.

Table 5 .
Influence of different parameter optimization methods on prediction accuracy of SVR model.

Table 6 .
Comparison of prediction error indicators between three trajectory prediction models.

Table 7 .
Comparison of prediction results for different ship trajectory sequences.

Table 7 .
Comparison of prediction results for different ship trajectory sequences.