Ship Classification in High-Resolution SAR Images Using Deep Learning of Small Datasets

With the capability to automatically learn discriminative features, deep learning has achieved great success on natural images but has rarely been explored for ship classification in high-resolution SAR images due to the training bottleneck caused by small datasets. In this paper, convolutional neural networks (CNNs) are applied to ship classification using small datasets of high-resolution SAR images. First, ship chips are constructed from high-resolution SAR images and split into training and validation datasets. Second, a ship classification model is constructed based on very deep convolutional networks (VGG). Then, VGG is pretrained on ImageNet, and fine tuning is used to train our model. Six scenes of COSMO-SkyMed images are used to evaluate our proposed model with regard to classification accuracy. The experimental results reveal that (1) our proposed ship classification model trained by fine tuning achieves more than 95% average classification accuracy, even under 5-fold cross validation; (2) compared with other models, the ship classification model based on VGG16 achieves at least 2% higher classification accuracy. These experimental results reveal the effectiveness of our proposed method.


Introduction
Due to their all-weather, all-day, and high-resolution advantages, synthetic aperture radar (SAR) images have recently been used for ship classification in marine surveillance. Several satellites have provided high-resolution SAR images since 2007, such as ASI's COSMO-SkyMed, DLR's TerraSAR-X, Japan's ALOS-2, and China's Gaofen-3. These high-resolution SAR images, with resolutions finer than 3 m, contain rich information about targets, such as the geometry of ships, which makes discriminating different types of ships possible [1][2][3][4].
The methods used for ship classification with SAR images mainly focus on feature selection and optimized classifier techniques [1][2][3][4][5][6][7][8][9][10][11][12][13]. Commonly used features are (1) geometric features, such as ship length, ratio of length to width, distribution of scattering centers, covariance coefficient, contour features [11], and ship scale; and (2) scattering features, such as 2D comb features [7], local radar cross section (RCS) density [1], permanent symmetric scatterers [12], and polarimetric characteristics [13]. For classifiers, models from machine learning can be adapted, such as support vector machines [14] and artificial neural networks [15]. In addition, many researchers have proposed classifiers that aim for high classification accuracy given the particularities of ships in SAR images, such as the analytical hierarchy process [2] and a hierarchical scheme [3]. Since these methods are highly dependent on features and classifiers, researchers exploit several strategies to relieve the processes of feature selection and classifier design. The remainder of this paper is organized as follows. Section 2 describes the details of our experiments, especially our training scheme with small datasets. Section 3 introduces our experimental results and analysis. Sections 4 and 5 are the discussion and conclusions of this paper, respectively.

Background on VGG16
The building blocks of VGG16 are the convolutional operation, the pooling operation, and the activation functions, including the rectified linear unit (ReLU) and softmax. The convolutional operation is defined in Equation (1):

y = \sum_{c} \sum_{i=1}^{m} \sum_{j=1}^{n} W^{c}_{i,j} X^{c}_{i,j},    (1)

where m × n is the size of the kernel, X represents the input, and W represents the weights of the kernel in the c-th channel. For pooling, there are two widely used operations (i.e., maximum and average pooling), expressed by Equations (2) and (3), respectively:

y = \max_{1 \le i \le m,\ 1 \le j \le n} x_{i,j},    (2)

y = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} x_{i,j}.    (3)

Although there are many activation functions (e.g., ReLU, softmax, tanh, and leaky ReLU), the most commonly used are ReLU and leaky ReLU in the feature extraction phase and softmax in the fully connected layers. They are expressed by Equations (4)-(6), respectively; the others are described elsewhere [22]:

f(x) = \max(0, x),    (4)

f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases},    (5)

f(x_k) = \frac{e^{x_k}}{\sum_{j} e^{x_j}},    (6)

where α in Equation (5) is a hyperparameter [22]. ReLU is adopted in the original VGG16.
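As a concrete illustration, the operations in Equations (1)-(6) can be sketched in NumPy for a single window of the input; the function and variable names below are ours, not from the paper.

```python
# Illustrative NumPy sketches of the VGG16 building blocks above.
import numpy as np

def conv_unit(X, W, b=0.0):
    """Single output unit of a convolution: sum over channels and the
    m x n kernel window (Equation (1)); b is an optional bias."""
    return float(np.sum(W * X) + b)

def max_pool(X):
    """Maximum pooling over an m x n window (Equation (2))."""
    return float(np.max(X))

def avg_pool(X):
    """Average pooling over an m x n window (Equation (3))."""
    return float(np.mean(X))

def relu(x):
    """ReLU activation (Equation (4))."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU with hyperparameter alpha (Equation (5))."""
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    """Softmax over class scores (Equation (6)), shifted for stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

window = np.array([[1.0, -2.0], [3.0, 0.0]])   # one 2 x 2 window
print(max_pool(window))                        # 3.0
print(avg_pool(window))                        # 0.5
print(softmax(np.array([1.0, 1.0, 1.0])))      # uniform over 3 classes
```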
There are two architectures of VGG: VGG16 and VGG19. The main difference between them is that VGG19 has 3 more convolutional layers than VGG16; compared with VGG19, VGG16 is therefore relatively shallow. The building blocks of VGG16 can be divided into two groups: convolutional building blocks and fully connected (FC) layer blocks, as indicated by the green and brown rectangles in Figure 1, respectively. The convolutional building blocks distill image features ranging from low level to high level. They consist of five convolutional groups, each stacked from convolutional layers (Conv), rectified linear units (ReLU), and a pooling layer (Pool). Specifically, there are two Conv layers, two ReLU layers, and one Pool layer in each of the first two groups, and three Conv layers, three ReLU layers, and one Pool layer in each of the last three groups. Each Conv layer has a kernel size of 3, a stride of 1, and a padding of 1, and the five groups generate output feature maps with 64, 128, 256, 512, and 512 channels, respectively. For each Pool layer, the stride and kernel size are both 2. The four numbers after each layer in Figure 1 indicate the output number of feature maps, padding, kernel size, and stride. The fully connected layers act as the classifier. The outputs of the three fully connected layers are 4096, 4096, and 1000, where 4096 is the number of hidden units in the fully connected layers and 1000 is the number of classes. Dropout layers (Drop) are also used to avoid overfitting; the parameter after Drop indicates that neurons are randomly removed with a probability of 0.5.
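The shape bookkeeping described above can be traced with a few lines of Python; the standard 224 × 224 ImageNet input size is assumed here.

```python
# Trace feature-map shapes through the five VGG16 convolutional groups.
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolutional layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a pooling layer."""
    return (size - kernel) // stride + 1

size, shapes = 224, []
for channels in (64, 128, 256, 512, 512):  # the five conv groups
    size = conv_out(size)   # 3x3 conv, stride 1, pad 1: size unchanged
    size = pool_out(size)   # 2x2 pool, stride 2: size halved
    shapes.append((size, size, channels))

print(shapes)
# Each group halves the spatial size: 224 -> 112 -> 56 -> 28 -> 14 -> 7,
# so the last feature map is 7 x 7 x 512 before the fully connected layers.
```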

Proposed Method
The experimental workflow is shown in Figure 2. First, three categories of ship chips, namely Bulk Carrier, Container, and Oil Tanker, are obtained from the high-resolution COSMO-SkyMed images. Then, these ship chips are split into training and validation datasets, with percentages of 70% and 30%, respectively. After that, the ship classification model based on VGG16 is constructed and trained starting from VGG16 pretrained on the ImageNet dataset. Finally, the validation dataset is used to evaluate the performance of the trained model. The dataset is discussed in Section 3.1; the constructed ship classification model and training strategies are the focus of this section.

Figure 2. Experimental workflow. The black, blue, and red arrows represent the data processing, training, and testing phases, respectively.

Ship Classification Model
Our proposed ship classification model is based on VGG16. Unlike the ImageNet and COCO datasets, there are only three categories here. To better suit SAR ship classification, the fully connected layer blocks are replaced with two fully connected blocks, as shown in Figure 3. The settings for the convolutional blocks are the same as those of VGG16; the main difference from VGG16 is in the top layers. There are two fully connected layers with 4096 hidden units each in VGG16, whereas there is only one such layer, with 32 hidden units, in the ship classification model. For the fully connected layer blocks, the dropout value is 0.5, and the softmax output size is 3. Since the training dataset is small, the number of hidden units is treated as a hyperparameter and is discussed in Section 3.2.1.
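To see why a single 32-unit layer suits a small dataset, one can count the trainable fully connected parameters of the original VGG16 top against the modified top; this is a rough sketch, assuming the usual flattened 7 × 7 × 512 convolutional output.

```python
# Rough count of trainable fully connected weights (biases included) for
# the original VGG16 top versus the modified top described above.
def fc_params(sizes):
    """Weights + biases for a chain of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

flat = 7 * 7 * 512                          # 25088 inputs to the first FC layer
vgg16_top = fc_params([flat, 4096, 4096, 1000])
ship_top  = fc_params([flat, 32, 3])

print(vgg16_top)   # about 1.2e8 parameters
print(ship_top)    # about 8e5 parameters, over 100x fewer weights to learn
```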

Fine Tuning Ship Classification Model
There are two efficient ways to train deep-learning models with small datasets: fine tuning and transfer learning. Both first train the model on a large public dataset, such as ImageNet, and then modify the top layers to accommodate a specific application; here, the top layer is typically a fully connected layer. They share the advantage of making a deep learning model trainable with small datasets. Besides, as stated in Ref. [32], the low-level layers learned by deep learning models resemble Gabor filters, which are useful for extracting features such as corners and edges [33]. We therefore assume that the lower-level layers share common features. Both methods take the weights of the pretrained model for the low-level features as inputs instead of random weights. The difference between them is the training process: transfer learning trains only the newly added layers [32], whereas fine tuning trains both the new layers and the pretrained layers.
To better illustrate the difference between them, VGG16 is used as an example. First, VGG16 is trained on the public ImageNet dataset with weights initialized from a Gaussian distribution, as shown in Figure 4. The resulting weights (except for the top, fully connected layers) are then transferred to the ship classification model, as shown in Figures 5 and 6. Thus, the parameters for both fine tuning and transfer learning are initialized by those trained on the ImageNet dataset. Transfer learning keeps the transferred weights fixed (green arrows) and learns only the parameters of the newly added layers (red arrows), as shown in Figure 6, whereas fine tuning learns both parts (red arrows in Figure 5). Although transfer learning and fine tuning are flexible and robust and are widely used for natural images [22] and in remote sensing [19,20,33], the weights of the low-level and mid-level convolutional blocks are fixed in transfer learning, which may not be suitable for SAR images. Therefore, fine tuning is utilized in this paper.

Figure 5. Workflow of training using fine tuning on the ship dataset for ship classification based on VGG16. The red arrows indicate that the weights of these layers will be learned.

Figure 6. Workflow of training using transfer learning on the ship dataset for ship classification based on VGG16. The red arrows indicate that the weights of these layers will be learned, and the green arrows indicate that the parameters remain unchanged.
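The distinction can be illustrated with a toy two-layer linear model in NumPy: both schemes start from the same "pretrained" lower-layer weights, but transfer learning freezes them while fine tuning updates them along with the new top layer. The model and data here are invented purely for illustration.

```python
# Toy contrast of transfer learning (frozen lower layer) vs fine tuning.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))              # toy "images"
y = X @ rng.normal(size=8)                # toy targets

W_low = rng.normal(size=(8, 4))           # "pretrained" lower-layer weights
w_top = np.zeros(4)                       # newly added top layer

def train(W0, w0, freeze_low, lr=0.01, steps=200):
    """Gradient descent on a two-layer linear model X @ W @ w."""
    W, w = W0.copy(), w0.copy()
    for _ in range(steps):
        h = X @ W                          # lower-layer "features"
        err = h @ w - y                    # prediction error
        grad_w = h.T @ err / len(X)
        if not freeze_low:                 # fine tuning also updates W
            W = W - lr * (X.T @ np.outer(err, w)) / len(X)
        w = w - lr * grad_w                # the top layer always learns
    return W, w

def mse(W, w):
    return float(np.mean((X @ W @ w - y) ** 2))

W_t, w_t = train(W_low, w_top, freeze_low=True)    # transfer learning
W_f, w_f = train(W_low, w_top, freeze_low=False)   # fine tuning
print(mse(W_t, w_t), mse(W_f, w_f))   # both fall well below np.mean(y**2)
```

After training, the frozen weights are bit-for-bit unchanged under the transfer-learning run, while fine tuning has adapted them to the data.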

Training Details
Our experiments were performed on Ubuntu 14.04 (Canonical Ltd.) with an NVIDIA GTX 1070 GPU with 8 GB of memory. Our implementation is built on top of Keras [34], a deep-learning library that runs on TensorFlow, CNTK, or Theano. The pretrained models were downloaded from the Keras website. As stated in Section 2.2.2, transfer learning is also used to train the models so that its performance can be compared with that of fine tuning. Fine tuning uses stochastic gradient descent [22,35,36] with a learning rate of 0.0001 and a momentum of 0.99. Transfer learning uses RMSprop [22,37] with a learning rate of 0.001 and a decay rate of 0.9. Training is terminated when the classification accuracy or loss remains almost unchanged.
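The two update rules can be sketched as follows, using the learning rates and momentum/decay values quoted above; this is the textbook form of each rule, not necessarily the exact Keras internals.

```python
# Single parameter-update steps for the two optimizers mentioned above.
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=1e-4, momentum=0.99):
    """Stochastic gradient descent with classical momentum."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def rmsprop_step(w, grad, avg_sq, lr=1e-3, rho=0.9, eps=1e-7):
    """RMSprop: scale the step by a running average of squared gradients."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(avg_sq) + eps), avg_sq

w, v = sgd_momentum_step(np.array([1.0]), np.array([2.0]), np.zeros(1))
print(w)   # [0.9998], i.e., 1.0 - 1e-4 * 2.0 on the first step
```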

SAR Images
A total of 6 scenes of COSMO-SkyMed level 1A HIMAGE (single-look complex slant (SCS)) products in the X-band with single HH or VV polarization are used to evaluate our approach; detailed information is shown in Table 1. The data are calibrated and converted to greyscale in the range [0, 255]. The ships are located in the SAR images automatically or manually. Three kinds of ships are identified in these images by SAR experts and field experiments, yielding 146 bulk carriers, 156 containers, and 144 oil tankers for classification. Some ship chips are shown in Figure 7. Bulk carriers show repeating textures due to the box-like hatches along the longitudinal direction. Containers also show repeating patterns due to truck-size containers. Oil tankers show symmetric features because of the intense backscattering of pipelines along the centerline [3].

Pretrained Models
To evaluate the effectiveness of our proposed method, three other models, VGG19, Xception, and InceptionV3, are also used to construct ship classification models in these experiments. The four models can be divided into two groups: the first two stem from VGG, and the last two evolved from GoogLeNet [38]. GoogLeNet builds on the observation that deeper neural networks tend to yield higher performance, and its inception module increases both the width and depth of the network. There are five variations: GoogLeNet, InceptionV2, InceptionV3, InceptionV4, and Xception. Since InceptionV3 and Xception achieve top classification performance and are readily available, these two architectures are used in this paper. InceptionV2 adds batch normalization to GoogLeNet; InceptionV3 exploits factorization ideas in the inception module; and Xception introduces depth-wise separable convolutions into InceptionV3 [30]. The four models are pretrained on the ImageNet dataset, and the ship classification models based on them are constructed in the same way as the VGG16-based model in Section 2.2.1. To avoid the cost of training these models on the ImageNet dataset, pretrained models are downloaded from the Keras website; their information is shown in Table 2. VGG16 has a relatively shallow depth, where depth equals the sum of the input layer, Conv, Pool, and FC layers, and the output layer.

Table 2. Pretrained model information on the ImageNet dataset. During inference on an ImageNet image, a trained model outputs a probability for each class. Top-1 accuracy counts an image as correctly classified if the label with the maximum probability is correct; Top-5 accuracy counts it as correct if the true label is among the five highest-probability labels.

Influence of Units in the Fully Connected Layers
Considering that the number of hidden units in the fully connected layer may affect the classification results, comparison experiments are conducted with different numbers of units: 32, 64, 128, 256, and 4096. The training and validation accuracies are shown in Table 3. In addition, precision, recall, and F1 score are used to evaluate the influence of the number of units. Table 3 shows that the number of units has little impact on the classification results with regard to the five metrics (e.g., training accuracy, validation accuracy, and F1 score) when fine tuning is used to train the ship classification model. To compare these five models, McNemar's test is used [39,40]; there is no significant difference among them because the p-value is larger than the significance level of 0.05. Since the model with the fewest hidden units (32) has the fewest weights to learn, which helps avoid overfitting, it is used in our experiments.
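For reference, McNemar's test as used here reduces to the discordant counts of the two classifiers, i.e., the samples that one model classifies correctly and the other does not; the sketch below uses invented counts and compares the statistic against the chi-square critical value 3.841 at the 0.05 level.

```python
# McNemar's test with continuity correction from discordant pair counts.
def mcnemar(b, c):
    """Chi-square statistic (1 degree of freedom) for discordant counts:
    b = model A right / model B wrong, c = model A wrong / model B right."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

stat = mcnemar(b=6, c=4)        # hypothetical discordant counts
significant = stat > 3.841      # critical value at the 0.05 level, 1 dof
print(stat, significant)        # 0.1 False -> no significant difference
```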

Comparison of Fine Tuning and Transfer Learning
To evaluate the effectiveness of our proposed training method, the constructed ship classification model based on VGG16 is trained via fine tuning and transfer learning separately. The training and validation accuracies are shown in Table 4 and Figures 8 and 9. On the one hand, fine tuning is clearly more stable than transfer learning during the training process. On the other hand, even when the training accuracies of both methods are the same (i.e., 100%), the validation accuracy of the model trained by fine tuning is approximately 2% higher than that trained by transfer learning. The reason may be that the features of the deeper layers suit natural images rather than SAR images. Besides, training by transfer learning takes less time than by fine tuning, because fine tuning also updates the transferred weights, which costs time. For better clarification, the confusion matrices of the two training methods are shown in Table 5, and the precision, recall, and F1 score of each category are shown in Table 6.

McNemar's test is also used, and the contingency table is shown in Table 7. There is no significant difference between the two models. However, although the model trained via transfer learning classifies oil tankers exactly, it fails to recognize some containers, as shown in Table 5.
Therefore, the precision, recall, and F1 score of the model trained by fine tuning are higher than those of the model trained by transfer learning. In particular, the F1 score via fine tuning is 2% greater than that via transfer learning. The reason is that the features of the pretrained model may suit natural images rather than SAR images. Unlike transfer learning, fine tuning also modifies the weights of the lower layers of the pretrained model, so the learned features may be more suitable for SAR images. Since the weights of the convolutional layers are fixed in the transfer-learning ship classification model, transfer learning might benefit if the pretrained model were itself trained on SAR images.
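The per-class precision, recall, and F1 scores reported above are derived from a confusion matrix in the usual way; the sketch below uses an invented 3 × 3 matrix, not the actual counts of Table 5.

```python
# Per-class metrics from a confusion matrix (rows: true class, cols: predicted).
def per_class_metrics(cm, k):
    """Precision, recall, and F1 score of class k."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp   # predicted k, true other
    fn = sum(cm[k]) - tp                              # true k, predicted other
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = [[40, 3, 1],   # Bulk Carrier   (illustrative counts only)
      [2, 44, 1],   # Container
      [1, 2, 40]]   # Oil Tanker
print([per_class_metrics(cm, k) for k in range(3)])
```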

Comparison with Other Models
The three other models are trained via fine tuning under the same conditions (i.e., the fully connected block has one layer with 32 units). The training and validation accuracies are shown in Table 8 and Figure 10. McNemar's test is also used: except for the ship classification model based on Xception, the models are not significantly different according to the test. However, VGG16 performs better than the other three models with regard to F1 score. From Table 2, VGG16 has the lowest Top-1 and Top-5 accuracies and the shallowest depth among the four pretrained models, yet its ship classification accuracy is the highest (at least 2% higher F1 score). This may be because, owing to the differences between SAR and natural images, features in the higher layers of the deeper models pretrained on ImageNet suit natural images rather than SAR images. To further evaluate the four models, the confusion matrix, precision, recall, and F1 score are used. The confusion matrices are shown in Tables 5 and 9, and the metrics in Table 10. The ship classification model based on VGG16 has the highest F1 score. In addition, higher classification accuracy on the ImageNet dataset does not guarantee higher accuracy on SAR images. For the first group, VGG16, with a relatively shallow depth, achieves 5% higher accuracy than VGG19; for the second group, InceptionV3 improves by almost 5% with regard to F1 score. The reason may be that transfer learning trains only the newly added layers, whereas fine tuning trains both the new layers and the pretrained layers; thus, training the model by fine tuning may yield features more suitable for SAR images.
In addition, compared with the other three models, our proposed model has the fastest training speed. There are two reasons for this: the depth of the constructed model and its training time. Since VGG16 has both the smallest depth and the shortest training time among the four models, it trains the fastest.
It is also easy to note that bulk carriers and containers are misclassified as other kinds of ships in all four models. Examples of misclassified bulk carriers and containers are shown in Figure 11.

Figure 11. Examples of misclassified ships. (a) Bulk Carrier is misclassified as Container; (b) Container is misclassified as Oil Tanker. Their corresponding classification probabilities for the three categories are shown as royal blue histograms.

Discussion
Since the volume of the constructed ship datasets is small, cross validation is important for evaluating the performance of the classifier. In this paper, 5-cross validation is used. Table 11 lists the cross validation accuracies for transfer learning and fine tuning of VGG16 and for fine tuning of VGG19. Compared with transfer learning, fine tuning achieves a higher average classification accuracy and a lower standard deviation. This is because fine tuning also modifies the transferred weights, which may make the learned features suitable for SAR images.
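The cross-validation protocol itself can be sketched compactly. In the toy version below a nearest-centroid classifier stands in for the CNN (an assumption made purely so the example stays self-contained and fast); the k-fold splitting and the mean/standard-deviation reporting mirror what Table 11 summarizes:

```python
import numpy as np

def five_fold_accuracy(X, y, k=5, seed=0):
    """k-fold cross validation; returns mean and std of validation accuracy.

    A nearest-centroid classifier replaces the CNN here so that the
    cross-validation protocol itself remains the focus.
    """
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs = []
    for i in range(k):
        val = folds[i]                                   # held-out fold
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        classes = np.unique(y[tr])
        centroids = np.stack([X[tr][y[tr] == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(X[val][:, None, :] - centroids[None, :, :], axis=2)
        pred = classes[dists.argmin(axis=1)]
        accs.append(float((pred == y[val]).mean()))
    return float(np.mean(accs)), float(np.std(accs))
```

On well-separated synthetic data this yields a high mean accuracy with a small standard deviation; on a small real dataset, the spread across folds is precisely what reveals whether a high single-split accuracy is trustworthy.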
Through the above experiments and analysis, it can be clearly seen that each of the four models has its own advantages. The model based on VGG16 achieves the best F1 score and the best observations of Oil Tankers. The model based on VGG19 has the best observations of Bulk Carriers. The model based on InceptionV3 has the best precision for Oil Tankers and the best observations of Containers. The model based on Xception has the best precision for Containers and the best observations of Oil Tankers. It will be our future goal to adapt ensemble learning [41] to take advantage of these models and achieve the best performance. Moreover, among the four models, i.e., VGG16, VGG19, Xception, and InceptionV3, only Xception shows a significant change under McNemar's test; the other three show no significant change for ship classification. This may be a characteristic of this dataset, and more ship classification datasets are needed to verify it in the future. In addition, due to the differences between SAR images and optical images, large SAR image datasets will be constructed to better exploit the methodology of deep learning. Future research will be conducted on models pretrained with SAR images, which may reveal the benefits of transfer learning, and on combining the inherent characteristics of SAR images (e.g., statistical distributions) with convolutional neural networks to enhance ship classification results. Although fine tuning achieves promising results, it can also lead to overfitting of the present model; enlarging the datasets is likely to alleviate this problem.

Conclusions
In this paper, convolutional neural networks are used to classify ships in high-resolution SAR images with small datasets. First, a SAR ship dataset is constructed from six COSMO-SkyMed images and, considering the characteristics of SAR images, a ship classification model based on VGG16 is constructed. Second, fine tuning is utilized to train the model, which is pretrained on the ImageNet dataset. The experimental results reveal that (1) our proposed ship classification model achieves the best performance with regard to classification accuracy, even with 5-cross validation; (2) compared with the model trained via transfer learning, the model trained by fine tuning achieves better classification accuracy; (3) compared with other models, the ship classification model based on VGG16 achieves at least 2% higher classification accuracy. In future work, the inherent characteristics of SAR images will be combined with convolutional neural networks to enhance performance, and ensembles of different deep-learning models will be used to improve it further.