Ship Detection Using Deep Convolutional Neural Networks for PolSAR Images

Ship detection plays an important role in many remote sensing applications. However, the performance of PolSAR ship detection can be degraded by complicated scattering mechanisms, the multi-scale sizes of targets, and random speckle noise. In this paper, we propose a ship detection method for PolSAR images based on a modified faster region-based convolutional neural network (Faster R-CNN). The main improvements are proposal generation from multi-level features produced by the convolutional layers, which fits ships of different sizes, and the addition of a deep convolutional neural network (DCNN)-based classifier for training sample generation and coast mitigation. The proposed method has been validated on four measured datasets from the NASA/JPL airborne synthetic aperture radar (AIRSAR) and uninhabited aerial vehicle synthetic aperture radar (UAVSAR). Performance comparison with the modified constant false alarm rate (CFAR) detector and the Faster R-CNN demonstrates that the proposed method improves the detection probability while reducing false alarms and missed detections.


Introduction
As one of the most important applications in remote sensing, ship detection plays a significant role in commercial, fishery, vessel traffic service, and military applications [1][2][3]. In particular, polarimetric synthetic aperture radar (PolSAR) is vital in ship detection because it permits good target characterization and enables effective detection regardless of illumination and weather conditions by capturing rich structural and textural information. Therefore, ship detection from PolSAR images has drawn increasing attention in recent years. Featuring an adaptive detection threshold, the CFAR detector has become one of the most popular ship detectors [1][2][3][4][5][6]. However, its performance strongly depends on the statistical modeling of the local background clutter and on the empirically chosen sliding window sizes (i.e., the target window, protect window, and background window). In practice, heterogeneous clutter and interfering targets often lead to inaccurate estimates and biased modeling. To deal with these intrinsic drawbacks of the CFAR detector, modified CFAR detectors have been proposed, in which the effect of heterogeneous clutter is reduced by an advanced background estimation method [7] and the problem of statistical contamination, i.e., the sliding window containing one or more interfering targets, is solved by truncated statistics [8,9]. Additionally, other ship detectors have been developed by exploiting polarimetric scattering characteristics; e.g., polarization entropy, cross entropy, the degree
The major contributions are as follows: a modified Faster R-CNN is proposed to achieve fast and accurate detection of ships of differing sizes. Meanwhile, the proposed ship detector can mitigate the influence of azimuth ambiguities on ship detection, and exhibits superior detection performance on heterogeneous PolSAR images.
The remainder of this paper is organized as follows: Section 2 describes the proposed ship detection method, introducing the overall architecture and its components. Section 3 presents experimental results on measured AIRSAR and UAVSAR datasets, and Section 4 concludes the paper.

Theory and Methodology
The workflow of the proposed ship detector is shown in Figure 1. It involves four core processes: preprocessing, the DCNN-based sea-coast-ship classifier, the modified Faster R-CNN ship detector, and target fusion. First, in the preprocessing stage, we convert the covariance matrix of the PolSAR data to a real vector by Pauli decomposition. To enrich the training dataset and suppress overfitting of the DCNN, we segment the PolSAR image with sliding windows and apply multi-scale rotation to the segmented samples. We also apply box-car multi-look processing for speckle noise reduction. Second, we design a sea-coast-ship classifier based on a deep convolutional neural network to extract the samples containing ships. Third, we use the modified Faster R-CNN detector to detect ships in the samples selected by the sea-coast-ship classifier. Finally, we merge the detection results according to the geometrical relationship of the samples and output the final detection results. These processes are discussed in turn below.

Preprocessing
To feed PolSAR data into a real-valued DCNN, the Pauli decomposition is used to generate color-coded images. The Pauli decomposition expresses the Sinclair scattering matrix $S$ as a sum of elementary scattering mechanisms:
$$S = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} = \frac{a}{\sqrt{2}}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{b}{\sqrt{2}}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} + \frac{c}{\sqrt{2}}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \frac{d}{\sqrt{2}}\begin{bmatrix} 0 & -j \\ j & 0 \end{bmatrix} \qquad (1)$$
with $a = (S_{HH} + S_{VV})/\sqrt{2}$, $b = (S_{HH} - S_{VV})/\sqrt{2}$, $c = (S_{HV} + S_{VH})/\sqrt{2}$, and $d = j(S_{HV} - S_{VH})/\sqrt{2}$. Specifically, $a$ represents single scattering from a plane surface, $b$ and $c$ represent dihedral scattering with relative orientations of 0° and 45°, respectively, and the final term $d$ contains all the antisymmetric components of the scattering matrix $S$. The corresponding color-coded image of the Pauli decomposition uses Red = $|a|^2$, Green = $|b|^2$, and Blue = $|c|^2$. The input can then be expressed as
$$\mathrm{Input} = \left[\, |a|^2,\ |b|^2,\ |c|^2 \,\right] \qquad (2)$$
In the UAVSAR and AIRSAR datasets, the number of PolSAR images containing ships is usually limited. Moreover, the image resolution is low and most of the ships are small. To avoid severe overfitting and improve detection performance, sufficient samples must be generated by segmenting the PolSAR image with sliding windows. Based on prior knowledge of the ship sizes in the UAVSAR and AIRSAR datasets, we set the segmentation size to 64 × 64 pixels and the stride to 8 pixels in the training stage. The stride is set to 64 or 32 pixels in the testing stage. To further enrich the training data and suppress overfitting of the DCNN, we apply multi-scale rotation to the segmented samples, which mainly refers to rotating the samples by 180 degrees and flipping them from left to right.
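The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the four complex scattering channels are available as NumPy arrays of equal shape, and the function names `pauli_rgb`, `sliding_tiles`, and `augment` are hypothetical.

```python
import numpy as np

def pauli_rgb(s_hh, s_hv, s_vh, s_vv):
    """Return an (H, W, 3) array [|a|^2, |b|^2, |c|^2] from complex channels."""
    a = (s_hh + s_vv) / np.sqrt(2)  # single scattering from a plane surface
    b = (s_hh - s_vv) / np.sqrt(2)  # dihedral scattering, 0 degree orientation
    c = (s_hv + s_vh) / np.sqrt(2)  # dihedral scattering, 45 degree orientation
    return np.stack([np.abs(a) ** 2, np.abs(b) ** 2, np.abs(c) ** 2], axis=-1)

def sliding_tiles(image, size=64, stride=8):
    """Yield (row, col, tile) for every full size x size window."""
    height, width = image.shape[:2]
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            yield r, c, image[r:r + size, c:c + size]

def augment(tile):
    """Return the tile, its 180-degree rotation, and its left-right flip."""
    return [tile, np.rot90(tile, 2), np.fliplr(tile)]
```

With the training settings quoted above (64 × 64 windows, stride 8), each tile yields three training samples; in the testing stage the same `sliding_tiles` function would be called with a stride of 64 or 32.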
For illustration, Figure 2 shows Pauli images of the three types of targets, each 1090 × 5111 pixels in size. Figure 2a shows Pauli images of coast, Figure 2b shows Pauli images of sea, and Figure 2c shows Pauli images containing ships and sea. Some rotated Pauli images of sub-samples extracted from Figure 2 are presented in Figure 3, where the top row corresponds to ship samples, the middle row to sea samples, and the bottom row to coast samples. It is worth noting that the first image in Figure 2a and the second image in Figure 2b were collected by AIRSAR in L band, while the other images were collected by UAVSAR in L band. The training dataset includes samples with different sea states, resolutions, and azimuth ambiguities to make the sea-coast-ship classifier and the modified Faster R-CNN more robust.
Additionally, because of the scattering and imaging mechanism of SAR, severe speckle noise appears in PolSAR imagery and affects ships with weak scattering. Therefore, multi-look processing should be applied to suppress speckle noise before image segmentation. In this paper, we apply simple box-car multi-look processing for speckle noise reduction, which is shown in Section 3 to be necessary for detection from heterogeneous PolSAR data.
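As a rough illustration of box-car multi-look processing, the sketch below averages a real-valued intensity image over a square window, so that a 3 × 3 window corresponds to 9 looks. It assumes a moving average that keeps the image size (with edge replication at the borders) and an odd window size; whether the original processing keeps or reduces the image size, and on which channels it operates, is not stated in the text.

```python
import numpy as np

def boxcar(intensity, window=3):
    """Average each pixel over a window x window neighbourhood (same-size output)."""
    pad = window // 2
    padded = np.pad(intensity, pad, mode="edge")  # replicate border pixels
    rows, cols = intensity.shape
    out = np.zeros((rows, cols), dtype=float)
    # Sum the window x window shifted copies of the padded image.
    for dr in range(window):
        for dc in range(window):
            out += padded[dr:dr + rows, dc:dc + cols]
    return out / (window * window)
```

Under this assumption, 25 and 49 looks would correspond to `window=5` and `window=7`, respectively.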

Sea-Coast-Ship Segmentation
Compared to the sea surface, ships usually exhibit stronger backscattering due to their double-bounce structures and large radar cross sections. However, ships and coast have similar backscattering intensity in PolSAR images, which hinders effective detection. Therefore, coast samples should first be removed from the dataset before applying the modified Faster R-CNN. Common approaches to coast masking include manual operation or classifiers designed with domain knowledge, which usually perform poorly and carry a large computational burden. In this paper, we propose a DCNN-based sea-coast-ship classifier to distinguish ships, coast, and sea, which extracts ship samples automatically.
The network architecture, shown in Figure 4, consists of 8 convolutional layers, 3 max-pooling layers, and 2 fully connected layers. The architecture was selected for its classification performance on sea, coast, and ship samples [24,25]. The CNN-based sea-coast-ship classifier is trained on an NVIDIA Titan-X GPU using Caffe [26]. All the weights and biases are learned via the mini-batch stochastic gradient descent algorithm [26], where the mini-batch size is 64, the momentum is 0.9, the weight decay is 0.0005, and the learning rate is 0.001. We randomly initialize the layers by drawing weights from Gaussian distributions with zero mean and a standard deviation of 0.01, and initialize the biases with small constants; e.g., 0.1 [13].
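To make the role of these hyperparameters concrete, the following sketch shows one common formulation of the momentum SGD update with L2 weight decay, together with the Gaussian initialization described above. It is an illustration in plain NumPy, not the Caffe solver itself; the function names `init_layer` and `sgd_step` are hypothetical, and the assumption that weight decay enters as an extra gradient term is labeled in the code.

```python
import numpy as np

def init_layer(shape, rng, std=0.01, bias=0.1):
    """Gaussian weights (zero mean, std 0.01) and constant biases (0.1)."""
    weights = rng.normal(0.0, std, size=shape)
    biases = np.full(shape[0], bias)
    return weights, biases

def sgd_step(w, grad, velocity, lr=0.001, momentum=0.9, weight_decay=0.0005):
    """One momentum SGD step.

    Assumption: weight decay is applied as L2 regularization, i.e. it adds
    weight_decay * w to the mini-batch gradient before the update.
    """
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity
```

In a training loop, `grad` would be the loss gradient averaged over a mini-batch of 64 samples, and `velocity` would start as an array of zeros with the same shape as `w`.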

Modified Faster R-CNN
With sufficient samples, we perform ship detection with the modified Faster R-CNN, an improved version of Faster R-CNN specifically designed for ship detection. The original Faster R-CNN is a DCNN-based detection algorithm whose core is the region proposal network (RPN), which shares full-image convolutional features with the detection network. Figure 5 shows a typical architecture of the Faster R-CNN, which mainly includes two modules: (1) a fully convolutional DCNN that generates region proposals; and (2) the Fast R-CNN detector [15] that detects targets based on the region proposals generated by the first module. Such an architecture enables a unified, deep-learning-based object detection system that runs at near real-time frame rates and achieves satisfactory object detection performance.
The performance of state-of-the-art ship detection networks depends heavily on region proposal algorithms; e.g., the RPN in the Faster R-CNN. The RPN simultaneously predicts a set of rectangular object bounds together with object scores. To generate region proposals, a small network with a 3 × 3 spatial window slides over the feature maps produced by the last shared convolutional layer. The features inside each sliding window are mapped to a lower-dimensional feature, which is fed into two sibling fully connected layers: the proposal-regression layer (termed "Proposal_bbox_pred" in Figure 7) and the proposal-classification layer (termed "Proposal_cls_score" in Figure 7). At each position, the small network simultaneously predicts multiple region proposals, where the maximum number of proposals per position is denoted by k. Therefore, the proposal-regression layer has 4k outputs encoding the coordinates of k boxes, and the proposal-classification layer outputs 2k scores giving the probability of each proposal being a ship or not a ship. The k proposals are parameterized relative to k reference boxes, called anchors.
By default, the "proposals" are dense sliding windows of 4 scales (2, 4, 8, 16) and 3 aspect ratios (1:1, 1:2, 2:1), giving k = 12 anchors at each position.
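The anchor set implied by these scales and aspect ratios can be enumerated with a short sketch. Two points are assumptions made only for illustration: each scale is taken as a multiplier of a base size in pixels, and each ratio is read as height divided by width with the anchor area held fixed per scale. The function name `make_anchors` is hypothetical.

```python
import numpy as np

def make_anchors(base_size, scales=(2, 4, 8, 16), ratios=(1.0, 0.5, 2.0)):
    """Return (k, 4) anchors as (x_min, y_min, x_max, y_max) centred on the origin."""
    anchors = []
    for scale in scales:
        area = float(base_size * scale) ** 2  # anchor area is fixed per scale
        for ratio in ratios:                  # assumed: ratio = height / width
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = make_anchors(base_size=1)
k = len(anchors)            # 4 scales x 3 ratios
regression_outputs = 4 * k  # box coordinates per sliding-window position
classification_outputs = 2 * k  # ship / not-ship scores per position
```

With these defaults the sketch gives k = 12, so the proposal-regression layer has 48 outputs and the proposal-classification layer has 24 outputs at each position.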
The detection network takes the entire image and the set of object proposals as input, and processes the whole image with several convolutional and max-pooling layers to generate feature maps. It shares convolutional layers with the RPN. Because the fully connected layers demand fixed-length vectors as inputs, the region of interest (ROI) pooling layer uses max pooling to convert the features inside each valid region of interest into a small feature map with a fixed spatial extent. Figure 6 shows the structure of an ROI pooling layer, where each valid ROI feature map is converted into a vector of fixed length. Each feature vector is then fed into a sequence of fully connected layers, which finally branch into two sibling output layers; i.e., the box-regression layer (termed "Bbox_pred" in Figure 7) and the box-classification layer (termed "Cls_score" in Figure 7). The box-classification layer produces the softmax probability of the ships plus a catch-all "background" class, and the box-regression layer outputs four real-valued numbers to calibrate the coordinates of the boxes.
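The effect of ROI pooling can be illustrated on a single-channel feature map. The sketch below is a simplified version under stated assumptions: the ROI is given in integer feature-map cells with inclusive corners, it spans at least `out_size` cells in each direction, and the bins are formed by splitting the ROI as evenly as possible. The name `roi_max_pool` is hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool an ROI of a (H, W) feature map into an out_size x out_size grid.

    roi = (x_min, y_min, x_max, y_max), inclusive, in feature-map cells.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1 + 1, x0:x1 + 1]
    # Split row and column indices into out_size nearly equal bins.
    row_bins = np.array_split(np.arange(region.shape[0]), out_size)
    col_bins = np.array_split(np.arange(region.shape[1]), out_size)
    pooled = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i, rows in enumerate(row_bins):
        for j, cols in enumerate(col_bins):
            pooled[i, j] = region[np.ix_(rows, cols)].max()
    return pooled
```

Whatever the size of the ROI, flattening the result gives a vector of `out_size * out_size` values per channel, which is what allows proposals of different sizes to feed the same fully connected layers.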
Remote Sens. 2019, 11, x FOR PEER REVIEW 3 of 14
Figure 6. Structure of the region of interest (ROI) pooling layer. The proposals in the last feature map must be converted to the same size to fit the input of the fully connected layer.
According to the translation-invariant property, the base proposal size defined in (3) depends on the total stride of the last convolutional layer, where NumP is the total stride. Meanwhile, the architecture of the Faster R-CNN should be determined by the ship size; i.e., a large ship needs a large receptive field to learn distinctive features and generate large proposals, while a small ship needs a relatively small receptive field to create small proposals. Additionally, the detection resolution, also known as the minimum detectable distance between two ships, should be equal to the base proposal size. Consequently, the conventional Faster R-CNN cannot simultaneously detect large and small ships in PolSAR images. Therefore, we propose the modified Faster R-CNN, which is capable of detecting ships of varying size. The new architecture is shown in Figure 7, where we add an additional proposal-classification layer and proposal-regression layer after the lower-level feature map (Conv2) to generate small proposals. We also add an additional box-regression layer and box-classification layer to improve the detection accuracy for small ships. The RPN and the detection network share the first two convolutional layers, and the modified architecture can predict proposals of differing sizes from the last two shared convolutional layers. As shown in Figure 7, we apply a small feature extraction network instead of a classical network architecture (e.g., VGG-16 [13], ResNet-101 [27], and DenseNet [28]) to capture the low-level visual features of the ships in the PolSAR images.
These classical network architectures are designed for optical images, which are characterized by high resolution, large target size, and refined features. Compared with targets in optical images, targets in PolSAR images have only rough features (e.g., brightness and edges). Moreover, due to the low resolution of the AIRSAR and UAVSAR PolSAR images, ship sizes usually do not match the total stride of the classical network architectures. For example, the total stride of VGG-16 is 16, while the ship size in the PolSAR images is generally less than 16 × 16 pixels. It is worth noting that the proposals generated from Conv2 and Conv3 are passed to the Conv2-based and Conv3-based detection networks, respectively. The Conv2-based and Conv3-based detection networks then produce the softmax probabilities of the ships and the coordinates of the boxes.
To improve the quality of region proposals, we first segment the PolSAR image into sub-samples of 64 × 64 pixels, and set the smallest ship size to 2 × 2 pixels. In the training stage of the RPN, we randomly sample 64 anchors in an image to compute the loss function of a mini-batch, where the momentum parameter is set to 0.9, the decay parameter is set to 0.0005, and the learning rate is set to 0.01. The weights are initialized from Gaussian distributions with zero mean and a standard deviation of 0.01, and the biases are initialized with a small constant of 0.1 [16]. The parameter values for setting up the RPN are chosen for the best ship detection performance, following the references [24,25]. The modified Faster R-CNN is trained on an NVIDIA Titan-X GPU using Caffe [26]. It takes advantage of the GPU, which enables efficient prediction of object bounds and objectness scores at each position. The detection network shares the convolutional layers with the RPN, and the detailed architecture can be found in Figure 7.
We train the detection network by backpropagation and stochastic gradient descent. The detection network adopts the same parameter values as the RPN.
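The mismatch between ship size and total stride noted above can be checked with simple arithmetic. The sketch below assumes that the total stride of a feature map is the product of the strides of all preceding layers. The VGG-16 entry uses its four stride-2 pooling layers before the last convolutional feature map; the stride lists for Conv2 and Conv3 are hypothetical placeholders, since the layer strides of the small network in Figure 7 are not given in the text.

```python
from math import prod

def total_stride(layer_strides):
    """Total stride of a feature map = product of the preceding layer strides."""
    return prod(layer_strides)

def cells_spanned(ship_pixels, stride):
    """Number of feature-map cells a ship of the given extent covers along one axis."""
    return ship_pixels / stride

vgg16_stride = total_stride([2, 2, 2, 2])  # four stride-2 pooling layers
conv3_stride = total_stride([2, 2])        # hypothetical: two stride-2 layers
conv2_stride = total_stride([2])           # hypothetical: one stride-2 layer

# An 8-pixel ship covers half a cell at stride 16, but several cells at small strides.
span_vgg16 = cells_spanned(8, vgg16_stride)
span_conv2 = cells_spanned(8, conv2_stride)
```

Under these assumed strides, a ship smaller than 16 pixels falls within a single cell of the VGG-16 feature map, which is the motivation for generating proposals from shallower, lower-stride layers.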

Target Fusion and Localization
As discussed previously, we segment the PolSAR image into sub-samples by the sliding-window technique to fit the input size of the DCNN and improve the ship detection performance. The stepping window structure of the segmented PolSAR image is shown in Figure 8, where the dashed rectangle indicates the edge of the PolSAR image, and the solid rectangle indicates the non-overlapping block-shifting segmentation window. The black arrow denotes the shifting direction of the sliding window. For a target that occupies more than one sliding window, the detection window is split into the red rectangles A and B. In this scenario, we need to merge them according to the geometrical relationship of the two samples and draw a single bounding box around the ship. Additionally, the rectangles marked red and green are generated by two different detection networks shown in Figure 8; thus, these detection results should also be merged.
Suppose that the center coordinates (x, y), width w, and height h of the split detection windows A and B are known a priori and are denoted by $(x_A, y_A, w_A, h_A)$ and $(x_B, y_B, w_B, h_B)$, respectively. They should then satisfy the two principles (4) and (5) given below:
where $\|\cdot\|_1$ is the L1-norm. The first inequality constrains the two windows in the horizontal direction, while the second constrains them in the vertical direction. Threshold_l is the lower bound of the distance between two detection windows, which is determined by the detection resolution and is experimentally set to 2. Threshold_h is the upper bound of the distance between two detection windows, and is set to be
If the two detection windows satisfy the merge principles given in (4) and (5), then the new detection window is defined by
where $x_{min}$, $y_{min}$, $x_{max}$, and $y_{max}$ denote the left, top, right, and bottom coordinates of the bounding box, respectively.
After that, we define the new detection window to draw the bounds of ships in the PolSAR image,
where $x_{min\_c}$, $x_{max\_c}$, $y_{min\_c}$, and $y_{max\_c}$ denote the coordinates of the detection window in the PolSAR image; $x_{min}$, $x_{max}$, $y_{min}$, and $y_{max}$ denote the coordinates of the detection window in the segmented samples; $W_{index}$ and $H_{index}$ denote the width and height of the segmented samples, respectively; and $Ss$ is the stride.
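The fusion and localization steps can be sketched as below. This is only an illustration under explicit assumptions, not the formulas of the paper: the merged window is taken to be the smallest box enclosing both windows, and a window found in the tile at column index `col_index` and row index `row_index` is shifted by the index times the stride to obtain image coordinates. The merge test itself, inequalities (4) and (5), is left to the caller, and all function names are hypothetical.

```python
def to_corners(box):
    """Convert (x_center, y_center, w, h) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

def to_image_coords(corners, col_index, row_index, stride):
    """Shift tile-local corners by the tile origin (assumed: index * stride)."""
    x0, y0, x1, y1 = corners
    dx, dy = col_index * stride, row_index * stride
    return x0 + dx, y0 + dy, x1 + dx, y1 + dy

def merge_corners(a, b):
    """Smallest box enclosing both a and b (assumed merge rule)."""
    return min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])

# A ship split across two horizontally adjacent 64-pixel tiles (made-up values).
window_a = to_image_coords(to_corners((60, 10, 8, 4)), col_index=0, row_index=0, stride=64)
window_b = to_image_coords(to_corners((3, 10, 6, 4)), col_index=1, row_index=0, stride=64)
merged = merge_corners(window_a, window_b)
```

In this made-up example the two partial windows touch at x = 64, the tile boundary, and are fused into one box spanning both tiles.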

Experimental Results
In the following, experimental results on four measured datasets are presented to show the validity and generality of the proposed method. First, we illustrate the necessity of multi-look processing for heterogeneous PolSAR data before applying the detection algorithm by comparing the detection results for different numbers of looks. We also analyze the influence of azimuth ambiguity on ship detection performance. Because the ground truth provided by the automatic identification system (AIS) is unavailable, we discriminate between ships and their ambiguities by comparing the multi-frequency data [12], and assess the UAVSAR data by comparing images of the same scene acquired at different times.

Results of AIRSAR Japan Dataset
The L-band dataset covers an area of the inland sea named Kojimawan, close to Tamano City in Japan. It was acquired on 4 October 2000 using the NASA/JPL AIRSAR instrument. The extracted image has 797 × 741 pixels, with a resolution of 3.33 × 4.63 m (Range × Azimuth). Figure 9 shows the Pauli RGB image of this dataset, where 21 ships are marked by red rectangles. The PolSAR image is too large to be fed to the DCNN directly. Therefore, we segment the image into smaller slices by the sliding-window technique, where the stride is set to 64 pixels and the slice size is 64 × 64. The CNN-based classifier discussed in Section 2.2, which achieves state-of-the-art classification performance with 99.8% accuracy on the real measured data, is then applied to extract the ship sub-samples. After that, the modified Faster R-CNN is applied for ship detection. In the training stage, it takes about 3 h to train an accurate modified Faster R-CNN model.
For illustration, Figure 10 shows typical detection results with Faster R-CNN: Figure 10a-d show the detection results on the original PolSAR image, and Figure 10e-h show the detection results after multi-look processing with 9 looks. From Figure 10, it is observed that all the ships are precisely detected and marked with red and green rectangles, which are generated from the feature maps of the last and the second-to-last convolutional layers, respectively. The Faster R-CNN that generates proposals from Conv3 is termed the deep Faster R-CNN, and the one that generates proposals from Conv2 is termed the shallow Faster R-CNN (Conv2 and Conv3 are the convolutional layers shown in Figure 7). Figure 10a,e shows that all ships are clearly detected by the deep Faster R-CNN and can be individually distinguished, while the shallow Faster R-CNN detector misses the ship labeled A.
Meanwhile, the large ship in Figure 10c,g is detected by both the shallow and the deep Faster R-CNN, while the small ship on the right of Figure 10c,g is only detected by the shallow Faster R-CNN. The detection windows labeled A and B in Figure 10a,e and Figure 10b,f correspond to one ship, and thus they should be merged into a single window. Additionally, the detection proposals generated from the feature maps of different convolutional layers should also be merged. As shown in Figure 10, there are two missed detections when single-level features are used, while robust detection performance can be achieved by using multi-level features.
The red, pink, and green markers in Figure 11 indicate false alarms, missed ships, and roughly detected ships, respectively. In Figure 11a, there are four missed ships and no false alarms. Figure 11b shows that 20 ships are detected with two false alarms, and one small ship is missed. Note that the clutter is detected as a ship by the deep Faster R-CNN but is correctly rejected by the shallow Faster R-CNN; the false detection is probably due to the strong energy of the clutter and the lack of training samples containing clutter. The ship detection result of the modified Faster R-CNN is depicted in Figure 11c, where 21 ships are detected with two false alarms and no missed detections. Compared with the traditional Faster R-CNN, the modified Faster R-CNN achieves better detection performance for multi-size ships by combining different feature maps. Figure 11d shows the ship detection result of the modified CFAR, which applies the segmentation-based CFAR detector using truncated statistics for multi-look intensity (MLI) SAR imagery [8]. For the modified CFAR detector, the detection window, truncation ratio, confidence level, subsampling factor, and false alarm rate are set to 300 × 300, 5%, 99.9%, 4, and $10^{-8}$, respectively [8].
As shown in Figure 11d, most ships are correctly detected; however, the modified CFAR detector misses 2 ships and produces four false alarms due to strong clutter. Figure 11e presents the ship detection result of the SPWH detector. The SPWH detector is based on the assumption that sea clutter presents a homogeneous polarimetric behavior, and that there are only two kinds of scatterers (i.e., sea and ship) in the scene [12]. Most ships are clearly detected by this detector. However, there are many false alarms, marked with red dashed circles, which may be caused by severe clutter. This indicates that sea areas with abnormal polarimetric characteristics generate false alarms, and that we cannot simply assume that only two kinds of scatterers exist in the scene. Ships do not carry much appearance information in PolSAR images, and their useful visual features (e.g., brightness and edges) can be easily captured by Conv1. Figure 11f shows the ship detection performance of the shallow Faster R-CNN detector with the proposals generated from Conv1 in the RPN. Many partial structures of large ships are not detected. It is worth noting that generating proposals from Conv1 in the RPN consumes a large amount of memory, which seriously reduces the speed of training and testing of the detection model. Figure 11e shows the ship detection result of the fully convolutional network-based ship detector; there are 5 false alarms and no missed ships. Meanwhile, some clutter is wrongly detected as part of ships, marked with green dotted circles. The detection performance shown in Figure 11e demonstrates that the ship detector based on a fully convolutional network cannot mitigate the influence of the clutter. The fully convolutional network-based ship detector outputs pixel-wise segmentation results [22] (sea, coast, and ship).
The parameters of the fully convolutional network-based ship detector are adopted from reference [28], and it uses the same training data as the proposed ship detector. In total, there are 31,892 samples of size 64 × 64 for training the ship detector based on the fully convolutional network. Figure 11 shows that the proposed modified Faster R-CNN detector outperforms the traditional Faster R-CNN and the modified CFAR detector in terms of detection probability.
The training data were generated from the Pauli decomposition. To demonstrate the validity of this choice, we analyze the ship detection performance with training data generated from other decompositions. Figure 12a shows the ship detection result with training data generated from the Cloude decomposition [29]; there are 13 false alarms, marked with red dashed ellipses, and no missed ships. Figure 12b shows the ship detection result with training data generated from the Huynen decomposition [29]; there are 14 false alarms, marked with red dashed ellipses, and 1 missed ship, marked with a pink ellipse. By comparison, the modified Faster R-CNN trained with data generated by the Pauli decomposition gives the best ship detection results.
Additionally, we analyze the influence of the number of looks on the detection performance. Figure 13 shows the detection results of the proposed detector after multi-look processing with 9 looks, 25 looks, and 49 looks, respectively. Compared with the detection result on the original PolSAR image, the detection results after multi-look processing are more accurate. Figure 10d,h show partial ship detection results for the original and multi-look processed PolSAR images, respectively, which indicate that multi-look processing can suppress speckle noise and improve the detection performance. Compared with Figure 13, the bounding box marked by the green dashed ellipse in Figure 11 is rougher. With an increasing number of looks, however, some weak ships may be missed because multi-look processing makes them look more like clutter, as shown in Figure 13c. Therefore, the modified Faster R-CNN detector performs better after multi-look processing with 9 or 25 looks for heterogeneous PolSAR images.
To further validate the proposed method, the detection probability $P_d$ [12] and the figure of merit (FoM) [30] are introduced to evaluate the performance quantitatively; i.e.,
$$P_d = \frac{N_{td}}{N_{gt}}, \qquad FoM = \frac{N_{td}}{N_{fa} + N_{gt}}$$
where $N_{td}$ is the number of true detections, $N_{fa}$ is the number of false alarms, and $N_{gt}$ is the number of ground-truth targets. Table 1 shows that the proposed ship detector outperforms the traditional Faster R-CNN and the modified CFAR, achieving high accuracy with $P_d$ = 100% and FoM = 91.7% for the Japan dataset.
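The two metrics can be computed directly from the detection counts. The sketch below assumes the commonly used definitions, detection probability as true detections over ground-truth targets and figure of merit as true detections over the sum of false alarms and ground-truth targets; the counts in the example are made up for illustration and are not taken from the tables.

```python
def detection_probability(n_td, n_gt):
    """P_d = true detections / ground-truth targets."""
    return n_td / n_gt

def figure_of_merit(n_td, n_fa, n_gt):
    """FoM = true detections / (false alarms + ground-truth targets)."""
    return n_td / (n_fa + n_gt)

# Hypothetical counts: 20 ships in the scene, 18 detected, 2 false alarms.
p_d = detection_probability(n_td=18, n_gt=20)
fom = figure_of_merit(n_td=18, n_fa=2, n_gt=20)
```

Unlike the detection probability, the figure of merit is penalized by false alarms, so a detector that finds every ship can still score below 100%.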
The experiments were carried out on a personal computer with a 3.60-GHz Intel Core i7 processor and 64 GB of RAM. The processing times of the shallow Faster R-CNN detector, the deep Faster R-CNN detector, the proposed ship detector, the modified CFAR detector, and the shallow Faster R-CNN using Conv1 in the RPN are 2.53 s, 2.71 s, 3.03 s, 201.9 s, and 3.97 s, respectively. The modified CFAR detector is a segmentation-based CFAR detection algorithm based on truncated statistics for multi-look intensity (MLI) SAR imagery. It takes around 195.3 s for image segmentation due to the high computational complexity of the modified expectation maximization (EM) algorithm. In contrast, all the parameters of the proposed ship detector are trained off-line, and thus the test stage takes much less time than the CFAR detector. A comparison of the execution times in Table 1 demonstrates that the proposed ship detector has a faster detection speed in complex and challenging sea clutter situations.
In addition, we analyze the influence of azimuth ambiguity on the detection performance. For the C-band measured dataset of the same scene, the extracted image has 785 × 693 pixels, and the Pauli image is shown in Figure 14a, where the targets marked by red rectangles are ships and those marked by cyan rectangles are ambiguities. There are 20 ships and 8 ambiguities.
The detection results of the shallow Faster R-CNN, the deep Faster R-CNN, the proposed ship detector, the proposed ship detector after multi-look processing, the modified CFAR detector, and the ship detector based on a fully convolutional network are presented in Figure 14b-e, respectively. For the modified CFAR detector, the detection window, truncation ratio, confidence level, subsampling factor, and false alarm rate are set to 300 × 300, 5%, 99.9%, 4, and $10^{-8}$, respectively [8]. The same modified CFAR parameters are used for the other test datasets. In Figure 14b, there is one false alarm, marked with a red dotted ellipse, and one missed ship, marked with a pink circle. As shown in Figure 14c, there are 3 false alarms and one missed ship. In Figure 14d, the false alarms are marked by red dotted ellipses, and there are no missed detections. As can be seen from Figure 14e, the ship detection performance improves after multi-look processing, and there are two false alarms. Figure 14f shows that there are 17 false alarms and 2 missed ships. Meanwhile, 7 ambiguities, marked by green dotted circles, are misclassified as ships. Figure 14e shows that there are 13 false alarms and no missed detections. Although the ship detector based on a fully convolutional network can mitigate the influence of the azimuth ambiguities, it introduces more false alarms due to strong clutter. Compared with the modified CFAR detector, the proposed ship detector is better at distinguishing ambiguities from ships because it can accurately extract the different scattering-intensity features of ships and ambiguities. The quantitative analysis of the ship detection results is listed in Table 2, which demonstrates that the proposed ship detector outperforms the modified CFAR detector and the fully convolutional network-based ship detector, and that the proposed ship detector using multi-level features has better detection performance than the traditional Faster R-CNN.
Because ambiguities have an adverse impact on ship detection, it is recommended that ambiguities be removed from the dataset beforehand.

Results of UAVSAR Gulfco Area A Dataset
The detection results on the AIRSAR data, where the ships are dispersed, have demonstrated the effectiveness of the proposed detector. In this part, we further analyze the detection performance on the UAVSAR Gulfco area A dataset, where the ships are compactly distributed. The dataset was acquired by the NASA/JPL L-band, quad-pol UAVSAR over a bay area in Gulfco in 2010. The extracted image is 249 × 223 pixels in size with a resolution of 7.2 × 5 m (Azimuth × Range), and contains 39 ships. Figure 15 shows the Pauli image for reference, where the ships are marked by red rectangles. The ships marked by green solid circles in Figure 15 are closely spaced and are difficult to detect with the conventional Faster R-CNN.
Figure 16 shows the ship detection results for some of the samples. It is observed that the deep Faster R-CNN is prone to taking closely spaced ships as a single ship, while the shallow Faster R-CNN detects all of them, as shown in Figure 16a,b. Figure 17a-e shows the final ship detection results of the shallow Faster R-CNN, the deep Faster R-CNN, the proposed ship detector, the modified CFAR ship detector, and the fully convolutional network-based ship detector, respectively. All the ships are accurately detected, without false alarms or missed detections, by the shallow Faster R-CNN, the proposed ship detector, the modified CFAR ship detector, and the fully convolutional network-based ship detector. However, Figure 17b shows that there are three missed ships, marked with pink circles, and three small ships detected as one, marked with green dotted ellipses. The modified CFAR achieves satisfactory performance because the PolSAR data present a homogeneous polarimetric behavior. Meanwhile, the fully convolutional network-based ship detector performs well both when ships are compactly distributed and in homogeneous areas.
This experiment also verified that the proposed detector is capable of detecting compactly distributed ships accurately. Moreover, it proves that the deep Faster R-CNN is not suitable for small and compactly distributed ship detection. Detailed detection result of the four ship detectors can be found in Table 3.
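To give a rough sense of why a deeper feature map may merge compactly distributed ships, the sketch below computes how many feature-map cells cover the 249 × 223 pixel scene at two downsampling strides. The strides (4 and 16), the height/width assignment, and the function name are illustrative assumptions, not values reported in this paper.

```python
import math

def feature_map_size(height, width, stride):
    """Spatial size of a feature map for a given total downsampling stride."""
    return math.ceil(height / stride), math.ceil(width / stride)

# Scene size from the text; which axis is treated as height is an assumption.
HEIGHT, WIDTH = 249, 223

# Hypothetical strides for a shallow and a deep convolutional stage.
for name, stride in (("shallow", 4), ("deep", 16)):
    h, w = feature_map_size(HEIGHT, WIDTH, stride)
    print(f"{name}: stride {stride} -> {h} x {w} cells, "
          f"each covering {stride} x {stride} pixels")
```

Under these assumed strides, ships that are only a few pixels apart can fall into the same cell of the deeper map, which would be consistent with the merged detections in Figure 17b.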

Result of UAVSAR Gulfco Area B Dataset
The ship detection results on the AIRSAR and UAVSAR Gulfco area A datasets have demonstrated the validity and generality of the proposed ship detector. However, there are no coasts in those datasets. To verify that the proposed ship detector can suppress the negative impact of the coast on the detection result, we performed experiments on the UAVSAR Gulfco area B dataset. This L-band dataset was also acquired by the NASA/JPL UAVSAR system in Gulfco. The extracted image is 2021 × 1209 pixels in size with a resolution of 7.2 × 5 m (Azimuth × Range), and there are 22 ships in the scene. Figure 18 shows the Pauli image for reference, where ships are marked by red rectangles. The targets marked by yellow rectangles are not ships according to the reference ground truth shown in the image. After sub-samples are obtained through the sliding-window technique, the DCNN-based classifier is applied to extract the samples containing ships.

Figure 19a shows that most ships were detected, with two false alarms marked with red dotted circles and one missed ship marked with a pink circle. Figure 19b shows that most ships were detected, with two false alarms. Figure 19c shows the merged ship detection results of the proposed ship detector, where all the ships are detected with two false alarms marked by red dotted ellipses. Because the resolution of the PolSAR image is comparatively low, the proposed ship detector could not learn sufficiently fine features, and false alarms appeared in the detection result. Meanwhile, the detection results of the modified CFAR and the fully convolutional network-based ship detector shown in Figure 19d,e further demonstrate the superiority of the proposed ship detector. The segmentation result shown in Figure 19e demonstrates that the fully convolutional network performs well: the lakes and rivers are clearly delineated. The sea, ships, and coast are marked in blue, yellow, and green, respectively.
However, there are some false alarms, which occur because the characteristic differences between ships and some coastal areas are too small. Detailed ship detection performance comparisons of the five detectors are shown in Table 4. Note that when there is no clutter in the detected area and the ship size is moderate, the detection performance of the traditional Faster R-CNN is consistent with that of the proposed algorithm.
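The sub-sample extraction and the later mapping of per-sample detections back to the full image can be sketched as follows. This is a minimal sketch, not the authors' implementation: the window size (256), the step (128), the height/width assignment, and both function names are hypothetical, since the tiling parameters are not given in this section.

```python
def sliding_windows(height, width, win=256, step=128):
    """Yield (top, left) corners of win x win tiles covering the image.

    The last tile along each axis is shifted back so that it stays
    inside the image instead of running past the border.
    """
    def starts(size):
        if size <= win:
            return [0]
        positions = list(range(0, size - win + 1, step))
        if positions[-1] != size - win:
            positions.append(size - win)
        return positions

    for top in starts(height):
        for left in starts(width):
            yield top, left

def to_global(box, top, left):
    """Shift a tile-local box (x1, y1, x2, y2) into full-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + left, y1 + top, x2 + left, y2 + top)

# Area B scene size from the text; the height/width assignment is an assumption.
tiles = list(sliding_windows(2021, 1209))
print(len(tiles), "tiles; last tile corner:", tiles[-1])
```

Because neighboring tiles overlap in this sketch, the same ship can be reported by more than one tile, so a merge step would typically also need duplicate suppression, which is omitted here.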

Result of AIRSAR Taiwan Area Dataset
To further demonstrate the effectiveness of the proposed modified Faster R-CNN ship detector and to verify its performance in inshore scenery, we performed experiments on the L-band AIRSAR Taiwan area dataset. Figure 20 shows the Pauli RGB image of the dataset; the extracted image is 1527 × 1277 pixels in size with a resolution of 2.5720 × 1.6655 m (Azimuth × Range). The scene presents a complex and challenging sea situation for ship detection: multiple small, weak ships appear in the image, and many ships are spread along the shore. About 132 ships, identified by visual inspection, are marked with red rectangles.

Figure 21 demonstrates that the proposed ship detector performs better than the traditional Faster R-CNN, the modified CFAR, and the fully convolutional network-based ship detector. The proposed ship detector is sensitive enough to clearly detect and distinguish small and closely located ships. It is worth noting that 14 false alarms were observed around the coast edge because coast and ships share many similar image features in low-resolution PolSAR images. Moreover, the proposed ship detector missed 8 ships because some samples containing ships were wrongly classified as coast. Compared with the traditional Faster R-CNN and the proposed ship detector, the modified CFAR and the fully convolutional network-based ship detector performed poorly, producing many false alarms due to the complex inshore conditions. Meanwhile, the segmentation result shown in Figure 21e was worse than that shown in Figure 19e, especially for the coast. This may have occurred because the training samples contained no similar coastal samples. In contrast, the proposed ship detector performed well in such inshore areas. Detailed ship detection performance comparisons of the five detectors are shown in Table 5.
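The counts quoted above for the proposed detector (about 132 ships, 8 missed, 14 false alarms) can be turned into summary scores as sketched below. The metric definitions are commonly used ones and are an assumption here; they are not necessarily the definitions behind Table 5, and the function name is hypothetical.

```python
def detection_metrics(n_ground_truth, n_missed, n_false_alarms):
    """Summary scores from raw counts (assumed, commonly used definitions)."""
    n_detected = n_ground_truth - n_missed
    return {
        # Fraction of ground-truth ships that were detected.
        "detection_rate": n_detected / n_ground_truth,
        # Fraction of reported detections that are real ships.
        "precision": n_detected / (n_detected + n_false_alarms),
        # Penalizes both missed ships and false alarms in one number.
        "figure_of_merit": n_detected / (n_ground_truth + n_false_alarms),
    }

# Counts taken from the text for the AIRSAR Taiwan area experiment.
metrics = detection_metrics(n_ground_truth=132, n_missed=8, n_false_alarms=14)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since the ground-truth count comes from visual inspection and is approximate, scores derived this way should be read as indicative only.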

Discussion
In this paper, we developed a modified Faster R-CNN ship detector for PolSAR images. Compared with previous ship detection algorithms for PolSAR, the modified Faster R-CNN ship detector could capture the difference between ships and clutter. Therefore, it performed well and effectively detected ships of multiple sizes. Four ship detection experiments performed on measured PolSAR images under different conditions demonstrate the effectiveness of the proposed ship detector. The proposed ship detector was also able to mitigate the influence of azimuth ambiguities on ship detection performance. In addition, the modified Faster R-CNN ship detector performed well in inshore scenery, which expands the scope of application of the proposed algorithm. However, the ship detection result presented in the last experiment shows that some small ships near land were more easily missed, and some coastal areas were recognized as ships by the proposed ship detector. This might have occurred because the characteristic difference between ship and coast is very small in low-resolution PolSAR images. In the future, we will introduce additional features and use deep learning to improve the accuracy of ship detection for PolSAR.

Conclusions
In this paper, a DCNN-based ship detection algorithm for PolSAR images is proposed. To mitigate the influence of the coast on ship detection, a three-class classifier based on DCNN was utilized to accurately extract samples containing ships. Compared with the conventional Faster R-CNN, the modified Faster R-CNN could generate proposals of differing sizes from multi-level feature maps of the last two convolutional layers. Experimental results on four measured datasets have shown that the improvement in Faster R-CNN facilitates fast and accurate detection of ships of differing sizes, especially compactly distributed ships and small ships. Additionally, we investigated the influence of multi-look processing, polarimetric decomposition methods, and azimuth ambiguities on the detection performance for PolSAR images. Moreover, we performed an experiment on inshore scenery to demonstrate the effectiveness of the proposed ship detector.
Future work will focus on combining deep learning with Bayesian theory to extract the structure, texture, and statistical characteristics of ships and to further improve the detection performance.