A Multi-Feature Ensemble Learning Classification Method for Ship Classification with Space-Based AIS Data

Abstract: AIS (Automatic Identification System) is an effective navigation aid system aimed at ship monitoring and collision avoidance. Space-based AIS data, which are received by satellites, have become a popular and promising approach for providing ship information around the world. To recognize the types of ships from the massive space-based AIS data, we propose a multi-feature ensemble learning classification model (MFELCM). The method consists of three steps. Firstly, the static and dynamic information of the original data is preprocessed and features are then extracted in order to obtain static feature samples, dynamic feature distribution samples, time-series samples, and time-series feature samples. Secondly, four base classifiers, namely Random Forest, 1D-CNN (one-dimensional convolutional neural network), Bi-GRU (bidirectional gated recurrent unit), and XGBoost (extreme gradient boosting), are trained by the above four types of samples, respectively. Finally, the base classifiers are integrated by another Random Forest, and the final ship classification is outputted. In this paper, we use global space-based AIS data of passenger ships, cargo ships, fishing boats, and tankers. The model achieves a total accuracy of 0.9010 and an F1 score of 0.9019. The experiments show that MFELCM outperforms its base classifiers. In addition, MFELCM can achieve near real-time online classification, which has important applications in ship behavior anomaly detection and maritime supervision.


Introduction
Maritime transportation represents approximately 90% of global trade by volume [1], and more than 50,000 ships are sailing in the ocean every day [2]. As the number of ships continues to grow, the safety of maritime traffic is becoming an increasingly important issue. To strengthen maritime traffic supervision, the International Maritime Organization (IMO) has required the Automatic Identification System (AIS) to be fitted to all Class A ships [3]. AIS is a new type of navigation aid system that is used to achieve identification, positioning, and collision avoidance among ships, and has 27 types of messages (from message1 to message27) covering dynamic information, static information, voyage information, and safety information [4]. The traditional shore-based AIS covers about 40 nautical miles and the inter-ship communication range is about 20 nautical miles [5]. To achieve global coverage of AIS data, AIS receivers have been put onto satellites, creating space-based AIS.
Among all of the messages in AIS data, message5 contains the ship type field, but this field can be missing, either because it has been set to the default value or filled in incorrectly. There are many ships with unknown types in space-based AIS data. For instance, after matching the type field in the static data (i.e., message5, which contains the static information of ships) received by four satellites (i.e., the ocean satellites HY-1C/D and HY-2B/C) with the dynamic data received by HY-2B (i.e., message1, which contains the dynamic information of ships) from 1 November 2019 to 21 April 2020, approximately 33% of the ships in message1 have unknown types, the distribution of which is shown in Figure 1. Furthermore, these ships broadcast few message5 packets, which makes it difficult to identify their types through message5. Besides the problem of unknown ship types, the types of ships may be mislabeled for various reasons [6,7], which often relate to violations such as smuggling and illegal fishing [8]. These security and law enforcement issues place higher requirements on maritime traffic supervision. If ship types can be obtained from historical AIS data, the corresponding prior knowledge of a certain type of ship can be used for maritime traffic management. Thus, accurate identification of ship type helps enhance the maritime situational awareness of the related departments and is of great value in areas such as maritime surveillance, camouflage identification, ship behavioral pattern mining, and anomaly detection.
Existing ship classification methods based on AIS data generally consider static and dynamic information.
For static information, Damastuti et al. [9] use KNN (K-NearestNeighbor) to classify ships based on tonnage, length, and width in Indonesian waters and achieved an accuracy of up to 0.83 on six categories. Zhong et al. [10] use Random Forest for static information and achieve an accuracy of 0.865 on a three-classification task.
For dynamic information, Hong et al. [11] study the ships near the Ieodo Marine Research Station and infer ship types by comparing the flag states of ships and the distribution of their corresponding trajectories with those of type-unknown ships. David et al. [12] use decision trees to identify fishing boats and achieve an accuracy of 0.8 and an F1 score of 0.7; they also conduct comparative experiments on the extracted motion features. Sheng et al. [13] extract the COG (course over ground), ROT (rate of turn), and global features of vessels in the sea near Shantou, China, and use logistic regression to distinguish fishing and cargo vessels, achieving an accuracy of 0.923. Liang et al. [14] propose a multi-view feature fusion network that combines a CAE (convolutional auto-encoder) and a Bi-GRU (bidirectional gated recurrent unit) network to classify ships, realizing accuracies of 95.51% and 94.24% in the Luotou Channel and the Qiongzhou Strait, respectively. Ginoulhac et al. [15] extract statistical features from each temporal variable of AIS data and feed them into a Gradient Boosting classifier; their method has an accuracy of 0.86. Xiang et al. [16] use p-GRUs (Partition-wise Gated Recurrent Units) to recognize trawlers with an accuracy of 0.89. A similar approach is presented in [17], which uses an RNN (recurrent neural network) to classify five types of ships with an accuracy of 0.783. In addition, some methods combine dynamic and static features. Kraus et al. [18] extract geographical distribution features, motion features, start/stop times, and static shape features of vessels from AIS data of the German Bight and achieve an accuracy of 0.9751 on a five-class task; however, the method has a data leakage problem. Kim et al. [19] integrate the vessel's course change, speed, and environmental information (i.e., tide, light, and water temperature) to identify six types of fishing vessel activities in the waters around Jeju Island, achieving an accuracy of 0.963.
However, there still exist some disadvantages in the methods mentioned above, which are summarized as follows.

• In some studies, only dynamic or static features are used for ship classification. As such, the utilization of the multiple characteristics of ships is lacking, and the dynamic features are mainly set manually and empirically.

• Most of the existing studies use shore-based data, which are usually distributed over a small area, such that the ship trajectories and motion features are restricted. For example, in inland rivers or ports, the ships' position, speed, and direction are subject to limitations associated with the navigation channels, leading to an insufficient generalization ability of the classifier. Moreover, there is a lack of methods applicable to worldwide ship classification.

• The characteristics of space-based AIS data differ from those of shore-based data. Due to the limited number of satellites and AIS signal conflicts, global real-time coverage of AIS cannot currently be achieved. The continuity of space-based AIS data is weak, and there are few long-term ship trajectories with high continuity. Existing ship classification methods may therefore not be suitable for space-based AIS data.

• In some studies, the number of ship classes is small and the differences between the considered ship types are obvious; such binary classification methods have limited application value.

• When splitting sub-trajectory sets, some researchers do not distinguish the sources of the sub-trajectories, which causes data from the same ship to appear in the training, validation, and testing sets. This data leakage problem leads to the performance of the classifiers being overestimated.
To solve the problems outlined above, this paper proposes a multi-feature ensemble learning classification method (MFELCM) that integrates ships' static and dynamic information. The method is applicable to space-based AIS data on a global scale. The detailed process of MFELCM is shown in Figure 2 and consists of three steps. In the first step, the original data are preprocessed, and the cleaned static and dynamic data are then converted into static feature samples, dynamic feature distribution samples, time-series samples, and time-series feature samples. In the second step, four base classifiers, namely Random Forest [20], 1D-CNN (one-dimensional convolutional neural network) [21], Bi-GRU, and XGBoost (extreme gradient boosting) [22], are trained by the samples above. In the third step, another Random Forest is applied in order to integrate the base classifiers into MFELCM. The main contributions of this paper are as follows.

• A multi-perspective ship feature description method is proposed to extract the dynamic and static features of ships from space-based AIS data.

• We propose a method to segment trajectories and split the data set by MMSI (Maritime Mobile Service Identity). The latter avoids the data leakage problem during classifier training.

• The proposed MFELCM, fusing static and dynamic information, is suitable for global space-based AIS data; it can update the type prediction with the continuous input of AIS data and achieve near real-time online classification. MFELCM can be applied to detecting the abnormal behaviors of ships and, thus, can enhance the capability of maritime supervision.

• The model parameters of MFELCM are determined by experiments, and it is verified that MFELCM outperforms the base classifiers. Moreover, when there are insufficient samples for a certain base classifier (e.g., dynamic feature distribution samples), the degraded MFELCM, integrating the remaining base classifiers, can still achieve acceptable classification accuracy, which extends the application scope of MFELCM.
The rest of this paper is organized as follows. In Section 2, the data from the ocean satellite HY-2B are taken as an example to present a basic introduction to space-based AIS data, including data preprocessing, data volume, and ship type distribution. Section 3 illustrates the detailed implementation of MFELCM, including static and dynamic feature extraction, sample construction for the different base classifiers, data set splitting, and the implementation of the base classifiers. In Section 4, MFELCM is applied to real AIS data, and the performance of the model is evaluated. In addition, we discuss the effectiveness of a degraded MFELCM without a certain base classifier. Section 5 concludes the paper and presents expectations for future work.

Data Preprocessing
Some fields in AIS messages are key information for classification. Considering that the dynamic features (from message1) should fully reflect the kinematic information of the ship at a specific moment, and the static features (from message5) should reflect the ship's dimensions, draft, and type, we filter the key fields, as listed in Table 1, from the AIS message for ship classification. MMSI is the unique identification of a ship. The Time field in message1 is the time flag that is automatically injected by the space-based AIS receiver every minute [23], and it is accurate to a minute. Time Stamp is the UTC second when the AIS message is broadcasted. The exact time when the AIS message is sent can be obtained by combining the Time and Time Stamp. A, B, C, and D reflect the overall dimensions of the ship, which, respectively, represent the distances from the reference point O to the bow, stern, port side, and starboard of the ship, as shown in Figure 3. The ship length and width are calculated by Equation (1).
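Equation (1) combines the four reference-point distances following the standard AIS convention for ship dimensions; as a minimal sketch (the function name is ours, not the paper's):

```python
# Ship dimensions from the AIS reference-point distances A, B, C, D
# (distances from the reference point O to bow, stern, port, starboard),
# per the standard AIS convention assumed for Equation (1).
def ship_dimensions(A, B, C, D):
    length = A + B  # bow-to-stern extent
    width = C + D   # port-to-starboard extent
    return length, width

dims = ship_dimensions(A=180, B=20, C=15, D=15)  # a 200 m long, 30 m wide ship
```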
Raw AIS data may contain erroneous, duplicate, and missing data. In addition, the data format may not be convenient for analysis [24,25]. We perform the following preprocessing operations on the fields in Table 1.

• If a field does not conform to the standards in [4], the message to which the field belongs is defined as an error message and is removed.

Equation (2) defines the preprocessed static and dynamic data. For a ship whose MMSI is $m_i$, $d_j^{m_i}$ is the $j$th dynamic data of this ship in time order, and $s^{m_i}$ is the static data of this ship. Moreover, Equation (3) defines the trajectory of this ship as a time series $T^{m_i}$.

Data Volume and Ship Distribution
The dynamic data used in this paper are extracted from message1 (noted as DYM1) received by the ocean satellite HY-2B from 1 November 2019 to 21 April 2020, and the ship distribution is shown in Figure 4. The static data are extracted from message5 (noted as STM5) received by ocean satellite HY-1C/D and HY-2B/C. The detailed information of DYM1 and STM5 is shown in Table 2.
According to [4], the values in the Type field for passenger ships, cargo ships, tankers, fishing boats, and tugs are 60 to 69, 70 to 79, 80 to 89, 30, and 52, respectively. Ships with codes 0 and 90, or a code larger than 99, have no specific type definition, and these ships are not considered in the ship type statistics of space-based AIS data. Figure 5a,c show the number and cumulative percentage of the top 20 types of vessels in DYM1, counted by message quantity and ship (MMSI) quantity, which account for 96.62% and 95.37% of message1, respectively. In Figure 5b,d, four major categories of ships (i.e., passenger ships, tankers, fishing boats, and cargo ships) are used for the type statistics, accounting for 90.65% and 88.98% of message1 by message quantity and ship quantity, respectively. In this paper, these four kinds of ships are selected as the research objects, and their global distribution is shown in Figure 4.
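The code ranges above map directly to the study's four categories; a minimal sketch of this mapping (the function name is ours):

```python
# Map an AIS Type field code to the study's category, using the ranges
# cited from [4]: passenger 60-69, cargo 70-79, tanker 80-89, fishing 30.
# Tugs (52) are listed in the text but not among the four study categories.
def ship_category(type_code):
    if 60 <= type_code <= 69:
        return "passenger"
    if 70 <= type_code <= 79:
        return "cargo"
    if 80 <= type_code <= 89:
        return "tanker"
    if type_code == 30:
        return "fishing"
    return None  # e.g. codes 0, 90, or >99 have no specific type definition
```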


Methodology
MFELCM integrates dynamic and static features of AIS data for ship classification. To realize MFELCM, the static feature dataset SF and the dynamic feature datasets, i.e., DFD (dynamic feature distribution dataset), TS (time-series dataset), and TSF (time-series feature dataset) are firstly constructed. The four base classifiers (i.e., Random Forest, 1D-CNN, Bi-GRU, and XGBoost) are then trained by SF, DFD, TS, and TSF, respectively. Finally, the MFELCM model is obtained by integrating the output of base classifiers using another Random Forest.
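The integration step can be sketched as probability-level stacking; this is our illustrative sketch under assumed names, not the paper's code:

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): each base classifier
# outputs a 4-class probability vector for one ship; the second-level Random
# Forest is trained on the concatenation of these vectors.
def stack_base_outputs(proba_list):
    """Concatenate base-classifier probability vectors into one meta-feature row."""
    return np.concatenate(proba_list)

# Hypothetical outputs for one ship over (passenger, cargo, fishing, tanker):
p_rf  = np.array([0.70, 0.10, 0.10, 0.10])   # Random Forest on SF
p_cnn = np.array([0.60, 0.20, 0.10, 0.10])   # 1D-CNN on DFD
p_gru = np.array([0.50, 0.30, 0.10, 0.10])   # Bi-GRU on TS
p_xgb = np.array([0.80, 0.10, 0.05, 0.05])   # XGBoost on TSF

meta_row = stack_base_outputs([p_rf, p_cnn, p_gru, p_xgb])  # 16 meta-features
```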

Static Feature Samples Construction
For the static data $s^{m_i}$, five features, i.e., ship length, ship width, aspect ratio (ldivw), area, and ship girth, are added into $s^{m_i}$ according to Equations (1) and (4). The static data $s^{m_i}$ is redefined as in Equation (5). In addition, the missing features in $s^{m_i}$ are filled with 0.
$$ldivw = length/width, \quad area = length \times width, \quad girth = length + width \qquad (4)$$
As ship static information can be entered manually, we should remove unreasonable data before training the classifiers. The distribution of static data varies between different types of ships, which makes it difficult to identify unreasonable data with a uniform standard for all categories. To filter the outliers, we first calculate the upper quartile ($Q_u$), the lower quartile ($Q_l$), and the interquartile range (IQR) for a certain type of ship (e.g., passenger ships). If a feature in $s^{m_i}$ (e.g., where $s^{m_i}$ belongs to a passenger ship) lies outside $[Q_l - 3IQR, Q_u + 3IQR]$, then $s^{m_i}$ is recognized as an outlier and removed. We use this approach because it places no mandatory requirements on the data distribution and is robust for outlier identification. A violin plot is the combination of a box plot and KDE (kernel density estimation), and it can show the distribution of variables. To illustrate the changes in the static features before and after removing outliers, we use violin plots to visualize the static data, as shown in Figure 6. It should be noted that the data in Figure 6 are the static data after being standardized with respect to the whole data set. Figure 6a,b show the distribution of the original static data and of the data after removing outliers for passenger ships, respectively; Figure 6c,d show the corresponding distributions for all four types of ships. Taking Figure 6a as an example, the green part inside the red rectangle reflects part of the probability density function (PDF) of feature A, which takes its maximum near the point where standardized A equals −2. The black part inside the blue rectangle represents the potential outliers judged by the feature ldivw; the more the data are biased toward both ends of the feature (i.e., ldivw) value range, the more likely they are to be outliers.
A comparison of Figure 6a–d shows that the obvious outliers are removed effectively.
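The quartile rule described above can be sketched as follows; this is our minimal sketch (function name ours), with quartiles assumed to be computed within one ship type:

```python
import numpy as np

# Mark static feature values outside [Ql - 3*IQR, Qu + 3*IQR] as outliers,
# with the quartiles computed on one ship type's data, per the text.
def iqr_outlier_mask(values, k=3.0):
    q_l, q_u = np.percentile(values, [25, 75])
    iqr = q_u - q_l
    low, high = q_l - k * iqr, q_u + k * iqr
    values = np.asarray(values)
    return (values < low) | (values > high)

lengths = np.array([120, 130, 125, 128, 900])  # one absurd manual entry
mask = iqr_outlier_mask(lengths)               # True where outlier
```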
So far, we have obtained the static feature samples $s^{m_i}$. Let the set of all static feature samples be the static dataset SF, which is defined by Equation (6).

Dynamic Feature Samples Construction
For $d_{j+1}^{m_i}$ in $T^{m_i}$, we add the features in Equation (7) into $d_{j+1}^{m_i}$; $d_{j+1}^{m_i}$ is then redefined as in Equation (8). For $d_0^{m_i}$ in $T^{m_i}$, the supplementary features of $d_0^{m_i}$ defined by Equation (7) take the same values as those in $d_0^{m_i}$, except that $\delta t_0^{m_i}$, $\delta lng_0^{m_i}$, $\delta lat_0^{m_i}$, $\delta COG_0^{m_i}$, and $\delta SOG_0^{m_i}$ take the value zero.
We add the above features into $d_{j+1}^{m_i}$ for the following reasons. The time interval $\delta t$ is associated with a ship's motion state ([4], Table A1). Sang et al. [26] and Kim et al. [19] point out that the AIS equipment installed on different types of ships varies in cost and performance (e.g., fishing boats tend to install AIS equipment of low cost and accuracy), which may lead to deviations in COG, SOG, ROT, and other kinematic information. Considering this, we calculate ROT, accelerate, speedlng, speedlat, and speed. Although the supplementary dynamic features may be redundant for describing the ship's motion state, they can improve the anti-noise capability of the classifier.
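The per-point deltas and rates can be sketched with per-ship first differences; column and feature names here are illustrative, not the paper's exact definitions:

```python
import pandas as pd

# Sketch of the supplementary dynamic features: per-ship differences of
# consecutive AIS fixes. For the first point of each ship, the deltas are
# zero, as in the text.
def add_dynamic_deltas(df):
    df = df.sort_values(["mmsi", "t"]).copy()
    g = df.groupby("mmsi")
    df["dt"] = g["t"].diff().fillna(0.0)       # time interval between fixes
    df["dlng"] = g["lng"].diff().fillna(0.0)   # longitude change
    df["dlat"] = g["lat"].diff().fillna(0.0)   # latitude change
    df["dsog"] = g["sog"].diff().fillna(0.0)   # speed-over-ground change
    # rates; guarded so the first point (dt == 0) gets zero
    df["accelerate"] = (df["dsog"] / df["dt"]).where(df["dt"] > 0, 0.0)
    df["speedlng"] = (df["dlng"] / df["dt"]).where(df["dt"] > 0, 0.0)
    df["speedlat"] = (df["dlat"] / df["dt"]).where(df["dt"] > 0, 0.0)
    return df
```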
As shown in Figure 7, we describe the trajectory $T^{m_i}$ from three aspects: $DFD^{m_i}$, $TS^{m_i}$, and $TSF^{m_i}$. $DFD^{m_i}$ is the distribution of the dynamic features of $T^{m_i}$, which reflects the overall motion characteristics of $T^{m_i}$. $TS^{m_i}$ is the set of sub-trajectories (e.g., $TS_n^{m_i}$) obtained from $T^{m_i}$, which reflects the short-term time-series characteristics of the ship. $TSF^{m_i}$ is the set of feature vectors extracted from the sub-trajectories; e.g., $x_1$ can be the average longitude of $TS_n^{m_i}$. $TS^{m_i}$ and $TSF^{m_i}$ are defined in Equations (9) and (10), and DFD is defined by Equation (11).

Dynamic Feature Distribution Samples Dataset (DFD)
Let $DFD^{m_i}$ be the dynamic feature distribution of the ship whose MMSI is $m_i$; then DFD is the set of all $DFD^{m_i}$, as defined by Equation (11).
Limited by the number of satellites, signal conflicts, and AIS receiver performance [27,28], the dynamic data $d_j^{m_i}$ received by satellite are insufficient to describe $T^{m_i}$ completely. However, over a longer period, the feature distribution function of $d_j^{m_i}$ can describe the overall motion of the ship (e.g., the distribution of latitude and longitude in Figure 7 describes the area of the ship's activity). In addition, this description can reduce the impact of outliers. For the massive amount of space-based AIS data, it is impractical to fit a feature distribution function for each $T^{m_i}$, so we use frequency histograms to approximate the distribution functions, as shown in Figure 8. For a feature $f$ of trajectory $T^{m_i}$, the range of $f$ on the whole dataset is sliced uniformly into $n$ intervals, on which the frequency distribution of $f$ (i.e., $f_d$) is calculated. These frequency distributions form the matrix $DFD^{m_i}$ shown in Figure 8.
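The histogram approximation can be sketched as follows; a minimal sketch (names ours) in which bin edges are fixed from the whole dataset's range so every ship's row is comparable:

```python
import numpy as np

# Frequency histogram of one dynamic feature for one ship, with the bin
# edges fixed by the dataset-wide range, as described in the text.
def feature_histogram(values, global_min, global_max, n_bins=10):
    edges = np.linspace(global_min, global_max, n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return counts / max(len(values), 1)  # frequencies rather than counts

# One ship's SOG samples, binned over an assumed dataset-wide 0-30 kn range:
row = feature_histogram([10.2, 11.0, 10.8, 24.5], 0.0, 30.0, n_bins=10)
```

Stacking one such row per feature yields the matrix $DFD^{m_i}$.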

Time-Series Samples Dataset (TS)
TS is defined by Equation (12), where $TS^{m_i}$ denotes the set of all sub-trajectories of $T^{m_i}$, as shown in Figure 7. Space-based AIS data have weak continuity, and there are points in $T^{m_i}$ between which the time interval is large, which cannot correctly reflect the ship's movement. As such, we break $T^{m_i}$ into a series of sub-trajectories $TS_n^{m_i}$. For $T^{m_i}$, the steps for constructing $TS^{m_i}$ are as follows.
1. Calculate the upper quartile $Q_u^{\delta t}$, the lower quartile $Q_l^{\delta t}$, and the interquartile range $IQR^{\delta t}$ of the field $\delta t$ on the whole dynamic dataset.

2. Traverse the points on $T^{m_i}$ in time order, breaking $T^{m_i}$ at every point whose $\delta t$ exceeds the outlier threshold obtained in step 1. Applying this step to all trajectories, we obtain TS.
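The two steps above can be sketched as a simple gap-based segmentation; the threshold (in the paper, derived from the quartiles of $\delta t$) is passed in directly here, and names are ours:

```python
# Break a time-ordered trajectory wherever the gap between consecutive
# points exceeds max_gap (in the paper, an outlier threshold on dt).
def split_trajectory(times, max_gap):
    subs, current = [], [times[0]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > max_gap:   # gap too large: start a new sub-trajectory
            subs.append(current)
            current = [cur]
        else:
            current.append(cur)
    subs.append(current)
    return subs

# points at minutes 0..4, then a long gap, then 100..102:
subs = split_trajectory([0, 1, 2, 3, 4, 100, 101, 102], max_gap=10)
```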

Time-Series Feature Samples Dataset (TSF)
Denote $TSF^{m_i}$ as the set of all feature vectors of sub-trajectories extracted from $T^{m_i}$; the time-series feature samples dataset TSF is then defined by Equation (13), as shown in Figure 7. Since it demands professional knowledge to specify the relationship between motion characteristics and ship types, we use a Python toolkit named tsfresh (Time Series Feature extraction based on scalable hypothesis tests) [29] to extract and select the sub-trajectory features automatically.
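The paper delegates this step to tsfresh; purely as an illustration of what a per-sub-trajectory feature vector looks like (these particular features and names are our choice, not the paper's or tsfresh's output), a small vector can be built with a groupby-aggregate:

```python
import pandas as pd

# Illustrative stand-in for tsfresh: compute a few summary features per
# sub-trajectory with a named-aggregation groupby.
def subtrajectory_features(df):
    # df columns: sub_id (sub-trajectory id), sog, lng, lat
    return df.groupby("sub_id").agg(
        sog_mean=("sog", "mean"),
        sog_max=("sog", "max"),
        lng_mean=("lng", "mean"),
        lat_mean=("lat", "mean"),
    ).reset_index()
```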

Dataset Segmentation
After obtaining the static dataset SF and the dynamic datasets (i.e., DFD, TS, and TSF), the four datasets are split into training, validation, and testing sets. In the dynamic datasets, there are correlations between samples generated from the dynamic data of the same ship (e.g., $[x_1, x_2, \ldots, x_k]_1^{m_i}$ may be similar to $[x_1, x_2, \ldots, x_k]_n^{m_i}$). If we randomly split the dynamic datasets, samples from the same ship can simultaneously appear in the training, validation, and testing sets, which will cause data leakage and an overestimation of the classifiers' performance. In this paper, the datasets are split by MMSI. Taking TSF as an example, we divide TSF at the level of $TSF^{m_i}$ rather than $[x_1, x_2, \ldots, x_k]_n^{m_i}$, i.e., once $TSF^{m_i}$ is assigned to the training set, the feature vectors belonging to $T^{m_i}$ can only appear in the training set.
Ideally, the trajectory $T^{m_i}$ and the static feature sample $s^{m_i}$ are in one-to-one correspondence. Due to the removal of outliers from the static data in Section 3.1, there may be no $s^{m_i}$ corresponding to a $T^{m_i}$. Furthermore, the dynamic data used in this paper are only a part of the whole dataset, so there may be no $T^{m_i}$ corresponding to an $s^{m_i}$. To handle this, the MMSIs in the AIS data are divided into three parts: the MMSIs that only appear in the dynamic data (MMSI_D), the MMSIs that only appear in the static data (MMSI_S), and the MMSIs that exist in both (MMSI_C), as shown in Figure 9. MMSI_C is then divided into an MMSI training set (MMSI_TR), an MMSI validation set (MMSI_V), and an MMSI testing set (MMSI_T) by stratified sampling over the ship types. The static data whose MMSI is in MMSI_S or MMSI_TR form the training set of static feature samples, and the dynamic data whose MMSI is in MMSI_D or MMSI_TR form the training set of dynamic feature samples. The static data whose MMSI is in MMSI_V or MMSI_T form the validation and testing sets of static feature samples, respectively; likewise, the dynamic data whose MMSI is in MMSI_V or MMSI_T form the validation and testing sets of dynamic feature samples, respectively. For a trajectory $T^{m_i}$ with $m_i$ in MMSI_C, we can generate one $DFD^{m_i}$, $n$ sub-trajectories $TS_n^{m_i}$, and $n$ feature vectors $TSF_n^{m_i}$, all corresponding to one $s^{m_i}$; thus, the numbers of static and dynamic samples with the same MMSI differ.
To solve this problem, we copy $DFD^{m_i}$ and $s^{m_i}$ $n$ times in DFD and SF, respectively.
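The MMSI-level split can be sketched as follows; the ratios, seed, and names are illustrative (the paper additionally stratifies by ship type, which is omitted here):

```python
import random

# Assign each unique MMSI to exactly one of train/validation/test, so that
# all samples of a ship land in a single set and no leakage occurs.
def split_by_mmsi(mmsis, train=0.8, val=0.1, seed=42):
    uniq = sorted(set(mmsis))
    random.Random(seed).shuffle(uniq)
    n_tr = int(len(uniq) * train)
    n_v = int(len(uniq) * val)
    return (set(uniq[:n_tr]),
            set(uniq[n_tr:n_tr + n_v]),
            set(uniq[n_tr + n_v:]))

tr, va, te = split_by_mmsi([f"ship{i:03d}" for i in range(100)])
```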

Implementation of Random Forest
We use Random Forest to classify the static feature dataset SF. Random Forest is an ensemble learning algorithm based on decision trees, which has the advantages of low bias, low variance, and high generalization ability. The method to create a Random Forest is as follows.

1.
Randomly select n samples from the training set and use them to create a decision tree. The decision tree is trained by the CART (Classification and Regression Tree) algorithm. At each node of a decision tree, m features of the samples are randomly selected as an alternative set (AS) for node splitting. The feature k in AS and its threshold t_k, according to which the samples in the node are split into the left and right child nodes, are then determined by minimizing the cost function, as shown in Equation (14). In Equation (14), G_left/right and m_left/right are the Gini impurity and the number of samples of the left/right node, respectively. G_left/right is calculated by Equation (15), in which p_i,j is the proportion of class j samples in the ith node.
2.
When the decision tree reaches its maximum depth or the cost function cannot be reduced further, stop node splitting and terminate the creation of the tree.
3.
Repeat the first step to create a large number of decision trees, which together form the Random Forest. When classifying a ship, the decision trees vote on its class.
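The node-splitting rule in step 1 can be sketched as a direct search over the candidate feature set; Equations (14) and (15) are reproduced in the comments from their description above, and the exhaustive threshold search is a simplification of how CART implementations enumerate candidate thresholds.

```python
import numpy as np

def gini(labels):
    """Equation (15): G = 1 - sum_j p_j^2 over the classes in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feature_subset):
    """Search the alternative set AS for the (k, t_k) that minimizes
    Equation (14): J = (m_left/m) * G_left + (m_right/m) * G_right."""
    m = len(y)
    best = (None, None, np.inf)  # (feature k, threshold t_k, cost J)
    for k in feature_subset:
        for t_k in np.unique(X[:, k]):
            left = y[X[:, k] <= t_k]
            right = y[X[:, k] > t_k]
            if len(left) == 0 or len(right) == 0:
                continue
            J = len(left) / m * gini(left) + len(right) / m * gini(right)
            if J < best[2]:
                best = (k, t_k, J)
    return best
```

In a full Random Forest, this search is repeated at every node over a fresh random feature subset AS.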
Implementation of 1D-CNN
Figure 10 shows the network structure of 1D-CNN, and its details are given in Table 3.

Implementation of Bi-GRU
To solve the short-term memory problem of the original RNN, Hochreiter et al. [30] proposed the long short-term memory (LSTM) unit; the GRU [31] is a simplified version of LSTM. Figure 11 shows the structure of a GRU unit, in which g_t is the main layer and r_t controls the reset gate, which resets h_{t-1} based on h_{t-1} (from the previous time step) and x_t (from the current time step). The reset h_{t-1} is then submitted to g_t. z_t controls both the forgetting gate and the output gate, using a '1-' operation to ensure that the weight of the forgotten memory equals the weight of the memory added at the current time step. The weight of h_{t-1} to forget and the weight of g_t to input into h_t are determined by h_{t-1} and x_t. Equation (16) shows how the parameters of a GRU unit are updated, where W_xz, W_xr, W_xg and W_hz, W_hr, W_hg are the connection weight matrices of x_t and h_{t-1} to the three fully connected layers, and b_z, b_r, b_g are the bias terms of the three fully connected layers, respectively.
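A minimal single-time-step sketch of the GRU update just described, assuming the common formulation behind Equation (16) (sigmoid gates, tanh main layer); the exact sign conventions of the paper's Equation (16) may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU time step following the description of Equation (16).
    W = (W_xz, W_hz, W_xr, W_hr, W_xg, W_hg), b = (b_z, b_r, b_g).
    """
    W_xz, W_hz, W_xr, W_hr, W_xg, W_hg = W
    b_z, b_r, b_g = b
    z_t = sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)          # forget/output gate
    r_t = sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)          # reset gate
    g_t = np.tanh(x_t @ W_xg + (r_t * h_prev) @ W_hg + b_g)  # main layer
    # The '1-' coupling: the weight kept from h_{t-1} and the weight
    # given to g_t always sum to 1.
    return z_t * h_prev + (1.0 - z_t) * g_t
```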
Appl. Sci. 2021, 11, x FOR PEER REVIEW
Figure 11. A GRU unit.
Figure 12 shows the structure of Bi-GRU. The time-series samples in a batch (e.g., TS_{i_m,1}, TS_{i_m,2}, and TS_{i_m,3}) are first padded with a value v to the same length. After going through the mask layer, the network ignores the time steps that are filled with v. Bi-GRU is composed of a cyclic part and a fully connected part. The cyclic part has four layers, and the hidden state of each layer is 35-dimensional. The first layer of the cyclic part uses a bidirectional GRU, which enables the network to understand the behaviors of previous time steps with the help of subsequent time steps. In this layer, for the GRU unit with input d_{i_m,j}, its output y_j is obtained by concatenating h_j from the forward direction and h_j from the reverse direction. The second and third layers are the same unidirectional GRU network. The fourth layer outputs y_{i_m,j+k} of the last time step and feeds it to two fully connected layers. Finally, the network outputs a 4-dimensional vector, and the value (between 0 and 1) of each dimension represents the probability that TS_{i_m,n} belongs to a certain class of ships.
The trajectory T_{i_m} contains multiple time-series samples TS_{i_m,n}, and each TS_{i_m,n} has its own predicted type. As shown in Figure 13, to obtain the type of T_{i_m}, we first calculate the mean of the probability vectors of all TS_{i_m,n} belonging to T_{i_m}, and then take the category with the highest probability as the predicted type. Therefore, the classification performance of Bi-GRU is evaluated in two ways: the performance of ship classification according to a single TS_{i_m,n}, and the performance of ship classification by integrating all TS_{i_m,n} from T_{i_m}. These two standards also apply to the other base classifiers and to MFELCM.
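The trajectory-level rule of Figure 13 (average the per-sample probability vectors, then take the most probable class) reduces to a few lines; the function name is ours.

```python
import numpy as np

def classify_trajectory(sample_probs):
    """Aggregate the per-sample class-probability vectors (one 4-dim
    vector per TS_{i_m,n} of a trajectory) into a single predicted ship
    type: average the vectors, then take the argmax."""
    probs = np.asarray(sample_probs, dtype=float)
    mean_probs = probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs
```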

Implementation of XGBoost
XGBoost is an optimized implementation of Gradient Boosting Decision Tree (GBDT). The basic idea of GBDT is to build several Classification and Regression Trees (CARTs), and each CART fits the residual of the previous CART. The results of all CARTs are summed to obtain the prediction. The principle of XGBoost is explained in detail in the literature [21]. This article uses Python's XGBoost toolkit to implement the XGBoost method.
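The residual-fitting idea of GBDT can be illustrated with a toy regressor that uses depth-1 trees (stumps) instead of full CARTs; this is a didactic sketch, not the XGBoost implementation used in the paper.

```python
import numpy as np

def fit_stump(X, residual):
    """Fit a depth-1 regression tree (stump) to the current residual."""
    best = (0, 0.0, np.inf, 0.0, 0.0)  # (feature, threshold, error, pred_l, pred_r)
    for k in range(X.shape[1]):
        for t in np.unique(X[:, k]):
            left, right = residual[X[:, k] <= t], residual[X[:, k] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            pred_l, pred_r = left.mean(), right.mean()
            err = ((left - pred_l) ** 2).sum() + ((right - pred_r) ** 2).sum()
            if err < best[2]:
                best = (k, t, err, pred_l, pred_r)
    return best

def gbdt_fit_predict(X, y, n_trees=20, lr=0.3):
    """Each tree fits the residual of the ensemble built so far, and the
    (shrunken) outputs of all trees are summed to form the prediction."""
    pred = np.zeros_like(y, dtype=float)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                      # fit what is still unexplained
        k, t, _, pl, pr = fit_stump(X, residual)
        update = np.where(X[:, k] <= t, pl, pr)
        pred += lr * update                      # sum of all trees' outputs
        trees.append((k, t, pl, pr))
    return pred, trees
```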

Integration of the Base Classifiers
Since the base classifiers have already extracted features from the samples, we use the Random Forest method to integrate them, as it is relatively simple, has good interpretability, and helps to avoid overfitting. Figure 14 illustrates the integration process. To prevent confusion, the Random Forest among the base classifiers is denoted as Random Forest1, and the Random Forest in the integration procedure as Random Forest2. In Figure 14, the outputs of the four base classifiers on the validation set, together with the real labels of the validation set, are input into Random Forest2 to train the integration. The four base classifiers and Random Forest2 form the MFELCM model, whose performance is evaluated on the testing set.
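A minimal stacking sketch of the data flow in Figure 14 using scikit-learn, assuming the base classifiers' outputs are class-probability vectors; the function names and hyper-parameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def integrate_base_classifiers(base_val_probs, y_val, n_estimators=100, seed=0):
    """Train Random Forest2 on the base classifiers' validation-set
    outputs. base_val_probs: list of (n_val, n_classes) probability
    arrays, one per base classifier; y_val: true validation labels."""
    meta_X = np.hstack(base_val_probs)  # concatenate the base outputs
    rf2 = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    rf2.fit(meta_X, y_val)
    return rf2

def predict_ensemble(rf2, base_test_probs):
    """MFELCM prediction: feed the base outputs on new data to Random Forest2."""
    return rf2.predict(np.hstack(base_test_probs))
```

Training the meta-learner on the validation set rather than the training set keeps Random Forest2 from inheriting the base classifiers' training-set overfitting.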


Experimental Results and Analysis
The experiments on MFELCM are carried out on the AIS data of four types of ships (passenger ships, tankers, fishing boats, and cargo ships) received by the HY-1C/D and HY-2B/C satellites. The experimental environment is Windows 10, TensorFlow 2.5.0, and Keras 2.5.0; the CPU is an AMD R7-5800H (3.2 GHz), the GPU is an NVIDIA 3060 Laptop, and the memory is 32 GB.

Overview of Experimental Data
For dynamic data, to reduce category imbalance and obtain effective distributions of the ships' dynamic features, we select 300,000 messages from each of the four types of ships, and each ship should have more than 500 messages. Considering that the dynamic data should reflect the ships' motion features, the messages whose SOG is lower than 2 knots are removed. The amount of dynamic data used in the experiments is shown in Table 4, and Figure 15 shows the data distribution. Sub-trajectories that are too short are insufficient to reflect the short-term motion state of the ships, so when constructing the dynamic feature datasets (i.e., DFD, TS, and TSF), the sub-trajectories with fewer than 10 messages are ignored. The dynamic data used in the experiments are shown in Table 5, and the data distribution is shown in Figure 16.
For static data, all static data of the four types of ships are extracted from the database. Table 6 shows the amount of data processed according to the method in Section 3.1. The number of static messages is greater than the number of corresponding ships because the features in static messages may occasionally change (e.g., draught and position reference point), and there exists fraudulent use of MMSI. There are 2066 MMSIs in MMSI_C (see Figure 9), and the dynamic and static datasets are split according to the methods in Section 3.3.
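The two dynamic-data filters just described (messages with SOG below 2 knots removed, sub-trajectories with fewer than 10 messages ignored) can be sketched as follows; the message format is an assumption.

```python
def filter_dynamic_data(sub_trajectories, min_sog=2.0, min_len=10):
    """sub_trajectories: list of sub-trajectories, each a list of message
    dicts with at least an 'sog' field in knots. Keeps only messages of
    moving ships, then drops sub-trajectories too short to reflect the
    short-term motion state."""
    kept = []
    for sub_traj in sub_trajectories:
        moving = [m for m in sub_traj if m["sog"] >= min_sog]
        if len(moving) >= min_len:
            kept.append(moving)
    return kept
```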
It should be noted that if one MMSI in MMSI_C corresponds to several s_{i_m}, those s_{i_m} should be replaced by their average before the datasets are split. Table 7 shows the evaluation of the base classifiers and MFELCM on the testing set, in which class 0 to class 3 represent passenger ships, tankers, fishing boats, and cargo ships, respectively.

Base Classifiers and MFELCM
The F1 score is calculated by Equation (17). TP_i, FP_i, and FN_i are the numbers of true-positive, false-positive, and false-negative samples of type i ships, respectively. P_i and R_i are the precision and the recall of the model in classifying class i samples, respectively. The F1 score of a model is the weighted average of the F1 scores for each type of ship, where the weight w_i is the proportion of class i samples in the total number of samples. In addition, to reduce the effect of sample imbalance, the loss of samples is weighted during training, and the weight of the class i samples is set to w_i^-1.
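The weighted F1 score as described in Equation (17) can be computed from a confusion matrix as follows; the function name is ours.

```python
import numpy as np

def weighted_f1(conf):
    """conf[i, j]: number of samples of true class i predicted as class j.
    Per class i: TP_i = conf[i, i], FP_i = column sum - TP_i,
    FN_i = row sum - TP_i; P_i = TP_i/(TP_i+FP_i), R_i = TP_i/(TP_i+FN_i),
    F1_i = 2*P_i*R_i/(P_i+R_i). The model F1 is the w_i-weighted average,
    with w_i the class's share of all samples."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    p = tp / np.maximum(tp + fp, 1e-12)
    r = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * p * r / np.maximum(p + r, 1e-12)
    w = conf.sum(axis=1) / conf.sum()
    return float(np.sum(w * f1))
```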

Random Forest1 Experimental Results
The optimal hyper-parameters of Random Forest1 are determined by random search, as shown in Table 8. Figure 17 shows the confusion matrices of the classifier on the testing set. The model tends to confuse passenger ships with fishing boats, and tankers with cargo ships.
To explain the confusion matrices, we use t-SNE [32] to visualize the static features. t-SNE is a data visualization method that maps data from a high-dimensional space to a low-dimensional space; if two samples are similar in the high-dimensional space, their maps in the low-dimensional space will be close. Figure 18 shows the visualization of the static features. There are partial overlaps between the reduced-dimensional distributions of passenger ships and fishing boats, as well as between those of tankers and cargo ships, which implies that the static information of the misclassified ships is similar.
Figure 19 shows the importance of the static information features. The ships' dimensional characteristics, such as A, length, and length-width ratio, describe the ships better than draught does. In addition, although the features C and D contribute little to the classifier, these two parameters are already reflected in the length-width ratio, girth, area, and width, which shows that the static features constructed in this paper are effective for the ship classification task.

1D-CNN Experimental Results
The number and the width of the convolution kernels in the first convolution layer (Conv_1) of 1D-CNN are optimized using grid search, as shown in Table 9. The model uses an Adam optimizer with a learning rate of 1 × 10^-3, a batch size of 2500, and the cross-entropy loss function. Figure 10 shows the structure of 1D-CNN, which performs best when the parameter of Conv_1 is Conv (30,15). Figure 20 plots the learning curve of 1D-CNN, and Figure 21 shows its confusion matrices on the testing set, where the classifier tends to confuse passenger ships with fishing boats, as well as tankers with cargo ships.
Figure 22b shows the matrix Q obtained by summing the 30 convolution kernels in Conv_1, and Figure 22c shows the result of taking the absolute value after summing the matrix Q by columns. Convolution realizes dimension reduction and feature extraction from the original data. In the first convolution layer, the values of the convolution kernels in different columns reflect the response intensity of the network to different features; by summing Q and taking the absolute value, the importance of the features to 1D-CNN can be inferred.
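The importance estimate of Figure 22 (sum the Conv_1 kernels into Q, then take the absolute column sums) can be sketched as follows; the kernel array layout is an assumption.

```python
import numpy as np

def conv1_feature_importance(kernels):
    """kernels: array of shape (n_kernels, width, n_features) holding the
    Conv_1 weights. Sum the kernels into the matrix Q, sum Q over the
    kernel width, and take absolute values: larger values indicate input
    features the layer responds to more strongly."""
    Q = np.sum(kernels, axis=0)   # Figure 22b: sum of the 30 kernels
    return np.abs(Q.sum(axis=0))  # Figure 22c: |column sums of Q|
```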
In Figure 22c, the features from 0 to 12 correspond to the features of DFD_{i_m} in Figure 8. The four most important features to 1D-CNN are δlat, δt, δSOG, and δlon. The combination of δlat and δlon can reflect the directional information of ship motion. According to Table A1, δt is associated with the ship's motion state, and the combination of these four features (i.e., δlat, δt, δSOG, and δlon) can reflect the ship's speed, acceleration, and steering rate. In Figure 4, it is obvious that there are fixed routes for cargo ships and tankers around the world, and a ship's direction within a route is usually fixed, whereas the movements of fishing boats and passenger ships are more variable. Based on the above analysis, we speculate that the 1D-CNN network may learn the movement characteristics of different types of ships on the routes. In addition, the time feature contributes little to 1D-CNN, probably because most ships do not have such a feature, except for some offshore or inland-river ships with regular daily activity periods. The reason why the classifier confuses the samples (see Figure 21) may be that the features of interest to 1D-CNN are similar between tankers and cargo ships, and between passenger ships and fishing boats.

Bi-GRU Experimental Results
The number of cells in the hidden layers (NoC) of Bi-GRU is optimized using random search, as shown in Table 10; the model structure is shown in Figure 12. The model uses an Adam optimizer with a learning rate of 1 × 10^-3, a batch size of 1500, and the cross-entropy loss function. As we prefer the model with better performance on ship classification, the NoC of Bi-GRU is set to 35. Figure 23 plots the learning curve of Bi-GRU, and Figure 24 shows its confusion matrices on the testing set. In Figure 24a, some sub-trajectories of passenger ships are misclassified into the three other categories, there is confusion between tankers and cargo ships, and some sub-trajectories of fishing boats are misclassified as cargo ships. In Figure 24b, the classifier confuses passenger ships with fishing boats, as well as tankers with cargo ships, and some fishing boats are mislabeled as cargo ships.

XGBoost Experimental Results
The maximum depth of the trees in XGBoost is 20 and the learning rate is 0.03. The Python toolkit tsfresh can automatically generate a large number of features from time series, but it requires substantial computing resources. In addition, many of the features are useless for classification, and thus it is necessary to filter the features outputted by tsfresh, which is done as follows.
We select 50,000 dynamic messages from each of the four types of ships, and each ship should have more than 500 messages. After removing the sub-trajectories containing fewer than 10 messages, 106,169 dynamic messages are left, which form the dataset M. Table 11 gives the size of M, whose distribution is shown in Figure 25. XGBoost1 is trained on TS(M) with the 3765 features extracted from M by tsfresh, and Figure 26 shows the importance of the features outputted by XGBoost1. Table 12 shows the performance of XGBoost2 trained on TS with different numbers of features; XGBoost2 performs best when the top 40 most important features are used. The features' names and weights are shown in Table A2, and their detailed definitions can be obtained from [33].
Step 4. If n is not zero, return to Step 2 and decrease n by a certain interval.
Step 5. When n can no longer decrease, choose the XGBoost2 with the best performance obtained in Step 3. The features used by this model are the most important features generated by tsfresh.
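The filtering loop of Steps 2-5 can be sketched as follows; the `evaluate` callback stands in for training and validating a fresh XGBoost2 on the selected feature columns, which the paper does with the real datasets.

```python
import numpy as np

def select_top_n_features(importances, evaluate, n_start, step):
    """Greedy filtering of the tsfresh features. importances: per-feature
    weights from XGBoost1; evaluate(idx) returns a validation score for a
    model trained on the feature columns idx (a stand-in for retraining
    XGBoost2 each round)."""
    order = np.argsort(importances)[::-1]    # most important features first
    best_score, best_idx = -np.inf, None
    n = n_start
    while n >= 1:                            # Step 4: decrease n by `step`
        idx = order[:n]
        score = evaluate(idx)                # Steps 2-3: train and evaluate
        if score > best_score:
            best_score, best_idx = score, idx
        n -= step
    return best_idx, best_score              # Step 5: best configuration
```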
When 40 features are used for ship classification, the learning curve of XGBoost2 and its confusion matrices on the testing set are shown in Figure 27. In Figure 27b, some sub-trajectories of passenger ships are misclassified into the other three categories, and the classifier tends to confuse tankers with cargo ships, with tankers more likely to be mislabeled as cargo ships. In Figure 27c, the confusion between tankers and cargo ships remains significant. Figure 28 shows the importance scores of these 40 features, whose names and values are listed in Table A2. The features are mostly related to the location, speed, heading, and steering-rate information of the ships, and we infer that XGBoost2 tends to learn the ships' spatial distribution.
In addition, we carry out an experiment to illustrate the necessity of splitting the dataset by MMSI. The XGBoost trained on the time-series feature dataset TSF split by TSF_{i_m,n} instead of TSF_{i_m} is denoted as XGBoost3. Under the same parameters as XGBoost2, the learning curve and the confusion matrices of XGBoost3 are shown in Figure 29.
The F1 score and total accuracy of XGBoost3 evaluated on TSF_{i_m,n} are 0.8180 and 0.8186, respectively, and those evaluated on ships are 0.7996 and 0.7994. XGBoost3 seems to perform better than XGBoost2, but this is because samples (TSF_{i_m,n}) from the same ship (TSF_{i_m}) appear in the training set and the testing/validation sets at the same time. This data leakage leads to the performance of XGBoost3 being overestimated.

MFELCM Experimental Results
Random Forest2 is used to integrate the four base classifiers, and MFELCM is the combination of the four base classifiers and Random Forest2. The weight of the class i samples (i.e., the samples in Figure 14) is the ratio of the total number of samples to the number of class i samples. The best parameters of Random Forest2, obtained by random search, are shown in Table 13, and the weights of Random Forest2 for each base classifier are shown in Figure 30. Figure 31 shows the confusion matrices of Random Forest2 on the validation set (which are also the confusion matrices of MFELCM). Compared with the base classifiers in Table 7, MFELCM has a higher total accuracy and F1 score. Specifically, MFELCM reduces the confusion between passenger ships and fishing boats, as well as that between tankers and cargo ships. When evaluated by samples, MFELCM improves the total accuracy by 1.57% and the F1 score by 1.63%, which is equivalent to a 24.61% reduction in misclassification over the best base classifier (Random Forest1). When evaluated by ships, MFELCM improves the total accuracy by 3.14% and the F1 score by 3%, which is equivalent to a 24.08% reduction in misclassification over the best base classifier (Random Forest1). Different classifiers focus on different ship features and have different classification tendencies; by integrating multiple features, MFELCM reduces the bias effectively. Furthermore, the classification results can be refreshed by updating the dynamic information regularly (e.g., using satellites to transmit and update data regularly), which enables near real-time online classification.


The Degraded MFELCM
In practice, space-based AIS data may not always provide the features required by all four base classifiers at the same time. The classification performance of the degraded MFELCM when one base classifier is absent is discussed below.
When the static features are missing, the degraded MFELCM (denoted as MFELCM1) integrates 1D-CNN, Bi-GRU, and XGBoost. The parameters of Random Forest2 in MFELCM1 are shown in Table 14, and its weights for each base classifier are shown in Figure 32. Figure 33 shows the confusion matrices of MFELCM1 on the testing set. Table 15 compares the performance of MFELCM1 with the base classifiers; MFELCM1 outperforms them in terms of total accuracy and F1 score. In addition, MFELCM1 can either refresh the classification results by updating its inputs or switch to MFELCM when static features are received.
In the case of a missing dynamic feature distribution due to insufficient dynamic data, the degraded MFELCM (denoted as MFELCM2) integrates Random Forest1, Bi-GRU, and XGBoost. The parameters of Random Forest2 in MFELCM2 are shown in Table 16, and its weights for each base classifier are shown in Figure 34. Figure 35 shows the confusion matrices of MFELCM2 on the testing set. Table 17 compares the performance of MFELCM2 with the base classifiers; MFELCM2 outperforms them in terms of total accuracy and F1 score. In addition, MFELCM2 can either refresh the classification prediction by updating its inputs or switch to MFELCM after receiving a sufficient amount of dynamic data.
In addition, MFELCM2 can either refresh the classification prediction by updating the inputs or can switch to MFELCM after receiving a sufficient amount of dynamic data.

Parameters      Value
max_depth       37
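The integration step can be sketched as stacking: each available base classifier emits class probabilities, these are concatenated into a meta-feature vector, and Random Forest2 classifies that vector; a missing base classifier simply contributes no columns. The sketch below is a minimal illustration on synthetic data, using three small Random Forests as stand-ins for the real base classifiers (1D-CNN, Bi-GRU, XGBoost), not the paper's actual implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 4-class problem standing in for the four ship types.
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One "feature view" per base classifier (static features, time series, etc.).
views = [slice(0, 4), slice(4, 8), slice(8, 12)]
bases = [RandomForestClassifier(random_state=i).fit(X_tr[:, v], y_tr)
         for i, v in enumerate(views)]

def meta_features(X):
    # Concatenate each base classifier's class probabilities; dropping a
    # base classifier (degraded MFELCM) just removes its columns.
    return np.hstack([b.predict_proba(X[:, v]) for b, v in zip(bases, views)])

# "Random Forest2": the meta classifier trained on the stacked outputs.
rf2 = RandomForestClassifier(random_state=0).fit(meta_features(X_tr), y_tr)
acc = rf2.score(meta_features(X_te), y_te)
print(round(acc, 3))
```

Note that fitting the meta classifier on the same samples used to train the base classifiers is a simplification; in practice, out-of-fold base predictions are usually preferred to avoid optimistic meta-features.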

Conclusions and Future Work
In this paper, we propose a ship classification method named MFELCM, which is suitable for space-based AIS data worldwide. MFELCM integrates four base classifiers, i.e., Random Forest, 1D-CNN, Bi-GRU, and XGBoost. The dynamic and static data are first preprocessed and four datasets are constructed (i.e., the static feature dataset SF, the dynamic feature distribution dataset DFD, the time-series dataset TS, and the time-series feature dataset TSF), after which the datasets are split by MMSI to avoid the data leakage problem. Finally, the base classifiers are integrated by another Random Forest. Experiments show that MFELCM performs better than the four base classifiers and can effectively integrate the static and dynamic information of ships. Moreover, when one base classifier is missing, the degraded MFELCM, which integrates the remaining base classifiers, still outperforms them. As MFELCM integrates multiple features, it can achieve near real-time online classification, which can be applied to ship behavior anomaly detection as well as enhancing the supervision of maritime activities.
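Splitting by MMSI means that all samples from a given ship land entirely in either the training set or the testing set, so the model is never evaluated on tracks from ships it has already seen. A minimal sketch of such a group-aware split, using scikit-learn's `GroupShuffleSplit` on synthetic data (the MMSI values and sample counts here are illustrative, not from the paper):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
mmsi = rng.integers(100_000_000, 999_999_999, size=50)  # 50 ships
groups = np.repeat(mmsi, 20)                            # 20 samples per ship
X = rng.normal(size=(groups.size, 8))                   # dummy features
y = rng.integers(0, 4, size=groups.size)                # 4 ship types

# Hold out 20% of the ships (not 20% of the samples).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

# No MMSI appears on both sides of the split, so no ship leaks
# from training into testing.
overlap = set(groups[train_idx]) & set(groups[test_idx])
print(len(overlap))  # 0
```

A plain random split over samples would instead scatter each ship's highly correlated AIS records across both sets, inflating the measured accuracy.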
The methods used to generate the dynamic features are an important factor in classification performance. In addition, the parameters of the classifiers in this paper are obtained by experiments. In the future, to further improve the performance of MFELCM, we plan to refine the dynamic feature generation methods as well as develop an automatic classifier parameter optimization method.

Acknowledgments: HY-1C/D and HY-2B/C data were obtained from https://osdds.nsoas.org.cn (accessed on 5 March 2021). The authors would like to thank NSOAS for providing the data free of charge.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. Class A shipborne mobile equipment reporting intervals.

Ship's Dynamic Conditions Nominal Reporting Interval
Ship at anchor or moored and not moving faster than 3 knots    3 min
Ship at anchor or moored and moving faster than 3 knots        10 s
Ship 0-14 knots                                                10 s
Ship 0-14 knots and changing course                            3 1/3 s
Ship 14-23 knots                                               6 s
Ship 14-23 knots and changing course                           2 s
Ship > 23 knots                                                2 s
Ship > 23 knots and changing course                            2 s
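Table A1 implies that the density of dynamic AIS reports varies with a ship's state, which matters when judging whether "sufficient dynamic data" has accumulated. A small helper encoding the table as a lookup is sketched below; the function name and the handling of boundary speeds (e.g., exactly 14 knots) are our own assumptions, since the table does not specify which bracket the endpoints fall into.

```python
def reporting_interval_s(speed_knots: float,
                         anchored_or_moored: bool = False,
                         changing_course: bool = False) -> float:
    """Nominal Class A AIS reporting interval in seconds, per Table A1."""
    if anchored_or_moored:
        # 3 min when at anchor/moored and not moving faster than 3 knots.
        return 180.0 if speed_knots <= 3 else 10.0
    if speed_knots <= 14:
        return 10.0 / 3.0 if changing_course else 10.0  # 3 1/3 s when turning
    if speed_knots <= 23:
        return 2.0 if changing_course else 6.0
    return 2.0  # > 23 knots: 2 s whether or not the course is changing

# Example: a moored ship reports every 3 minutes,
# a fast ship changing course reports every 2 seconds.
print(reporting_interval_s(1.0, anchored_or_moored=True))   # 180.0
print(reporting_interval_s(25.0, changing_course=True))     # 2.0
```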