1. Introduction
Synthetic aperture radar (SAR) is an active microwave sensor whose resolution in both range and azimuth can be improved through the pulse compression technique and the synthetic aperture principle, yielding high-resolution remote sensing images. Another advantage of SAR imaging is its ability to operate in all weather conditions, day and night [1]. SAR has been applied in a variety of fields [2,3]; e.g., SAR-based ocean remote sensing is widely used for environmental monitoring, search and rescue, and target recognition [4]. Spaceborne SAR can also operate over long periods for wide-area and real-time observations. In this context, it has become a fundamental system for ship target recognition [5,6,7,8].
Typical ship target recognition using SAR imagery involves land and sea segmentation, target detection, and target recognition. In a large SAR image, target detection can be based on the feature differences between targets and backgrounds. In this process, the minimum region of a target chip that contains the whole target is determined [9], and the remainder is considered background. Obvious feature differences, e.g., in grayscale, multi-resolution, polarization, and phase, normally exist between target and background regions, and they form the basis for the design of many target detection methods. Hu et al. analyzed multidimensional SAR information using a linear time-frequency (TF) decomposition approach [10]. Yuan et al. extracted the gradient ratio pattern of each pixel based on Weber’s law and used the local gradient ratio pattern histogram (LGRPH) for SAR target recognition [11]. In addition, the conventional constant false alarm rate (CFAR) technique is a typical detection method based on the grayscale feature. However, complicated and cluttered backgrounds severely degrade CFAR detection performance [12].
In recent years, ship target detection based on deep learning (DL) has been widely studied [13,14], typically using convolutional neural networks (CNNs) [15]. Liu et al. [16] presented a ship detection method, namely the sea-land segmentation-based convolutional neural network (SLS-CNN), which combines an SLS-CNN detector, saliency computation, and corner features. Furthermore, Zhao et al. [17] proposed a spaceborne SAR ship detection algorithm based on a low complexity CNN. Other well-known CNN-based target detection methods include the faster region-based CNN (Faster R-CNN) and the you only look once (YOLO) family of models. For example, Li et al. [18,19] improved detection performance using Faster R-CNN and proposed a densely connected multi-scale neural network [19] to solve multi-scale and multi-scene problems in SAR ship detection. Rather than relying on information from a single feature map, this network fuses feature maps by densely connecting different feature map layers in a top-down manner. The R-CNN method has also been used for target recognition in large-scene SAR images [20]. Furthermore, Hamza and Cai used YOLOv2, which introduced numerous enhancements to the original YOLO model, for ship detection [21].
However, these methods may no longer be effective when ghost replicas exist in an imaged scene. The ghost phenomenon is an intrinsic effect of SAR ambiguity in both azimuth and range [22,23]. Range ambiguity occurs when two backscattered echoes, one from the current transmitted pulse and the other from a previous transmission, overlap in time during reception [24]. Azimuth ambiguity, on the other hand, is caused by aliasing of each target’s Doppler phase history: Doppler frequencies higher than the pulse repetition frequency (PRF) may lead to azimuth ambiguity [25]. This phenomenon is particularly relevant for high-reflectivity targets, whose ghosts appear in low-reflectivity areas of SAR images [26]. Moreover, owing to the way ghosts are generated, a ghost is similar to its real target, which makes discrimination difficult. Azimuth ambiguity is especially prominent in spaceborne SAR because of the fast platform velocity and large azimuth Doppler bandwidth.
Based on the ghost-generating principle and characteristics, we propose a hierarchical CNN-based ship target detection method for spaceborne SAR imagery, termed H-CNN. The hierarchical processing includes two stages: coarse detection and fine detection. First, regions of interest (ROIs) are extracted from a large imaged scene in the coarse-detection stage. Although most land and sea background clutter is removed, ghost replicas remain in the ROIs. Therefore, the fine-detection stage is introduced to further refine target detection against ghost replicas. In the experiments, H-CNN was trained and tested using Sentinel-1 SAR data [27]. In the following sections, we first discuss the H-CNN parameter configuration for optimal detection results. Then, the feature extraction quality is analyzed: detailed texture and abstract semantic information are extracted by different convolutional layers. Finally, we conduct detection experiments to validate H-CNN and compare it with the conventional CFAR technique and other CNN models.
2. Ghost Phenomenon in Spaceborne SAR
Spaceborne SAR is SAR operated from a platform in space. It differs from airborne SAR in several characteristics [28,29,30,31]; e.g., spaceborne SAR images normally have a large data size because the antenna beam illuminates a large area.
A ghost is the image manifestation of SAR ambiguity in the range or azimuth direction. When the PRF is too high, echoes of successive pulses may be aliased within one pulse repetition period [32]. The distance between the target and its range ambiguity ghost can be calculated as follows [33,34]:
$$\Delta r_m = -\frac{\lambda}{4 f_R}\left[\left(f_{DC} + m\,\mathrm{PRF}\right)^2 - f_{DC}^2\right] \qquad (1)$$
where $m$ is the index of azimuth ambiguities, indicating the spatial location of ghost replicas in the azimuth direction, $\lambda$ is the radar wavelength, $\mathrm{PRF}$ is the pulse repetition frequency, $f_R$ is the Doppler rate, and $f_{DC}$ is the Doppler centroid.
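To give a sense of scale, the sketch below evaluates the range offset of the m-th ghost under an assumed standard hyperbolic range-migration model, Δr_m = -λ[(f_DC + m·PRF)² - f_DC²]/(4 f_R), with the Doppler rate taken as f_R = -2V²/(λR). The function name and every parameter value are hypothetical illustrations, not the Sentinel-1 values of Table 1.

```python
def ambiguity_range_offset(m: int, wavelength: float, prf: float,
                           doppler_rate: float, doppler_centroid: float) -> float:
    """Range offset (m) of the m-th azimuth-ambiguity ghost.

    Assumed model: dr_m = -wavelength * ((f_dc + m*PRF)**2 - f_dc**2) / (4 * f_r),
    where the Doppler rate f_r is negative (f_r = -2 V**2 / (wavelength * R)).
    """
    shifted = doppler_centroid + m * prf
    return -wavelength * (shifted**2 - doppler_centroid**2) / (4.0 * doppler_rate)


# Hypothetical C-band parameters, for illustration only.
wavelength = 0.0555    # m
velocity = 7600.0      # m/s
slant_range = 850e3    # m
prf = 1700.0           # Hz
f_r = -2.0 * velocity**2 / (wavelength * slant_range)  # Doppler rate, Hz/s

print(ambiguity_range_offset(1, wavelength, prf, f_r, doppler_centroid=0.0))  # ≈ 16.4 (m)
```

With f_DC = 0 the assumed expression reduces to λ²R(m·PRF)²/(8V²), so under these values the range offset is only tens of meters, far smaller than the kilometer-scale azimuth displacements discussed for Sentinel-1.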
If the PRF is excessively low, the part of the Doppler spectrum beyond the PRF is folded back into the azimuth spectrum, resulting in azimuth ambiguity.
Figure 1 illustrates how azimuth ambiguity forms from the azimuth antenna pattern and the PRF. $B_D$ is the Doppler bandwidth and $B_D = 2V/L_a$, where $V$ is the SAR platform velocity and $L_a$ is the antenna size in the azimuth direction. When $B_D$ is greater than the PRF, as shown in Figure 1, undersampling causes aliasing in the azimuth spectrum. Blue and red dashed curves denote the first left and right replicas caused by sampling, respectively.
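The aliasing condition above can be checked numerically. The sketch assumes the approximation B_D = 2V/L_a and uses hypothetical velocity, antenna-length, and PRF values; it only compares the main-lobe bandwidth with the PRF, whereas in practice energy from antenna sidelobes can also be folded back and produce ghosts.

```python
def doppler_bandwidth(velocity: float, antenna_length: float) -> float:
    """Approximate azimuth Doppler bandwidth B_D = 2V / L_a, in Hz."""
    return 2.0 * velocity / antenna_length


def azimuth_undersampled(velocity: float, antenna_length: float, prf: float) -> bool:
    """True when B_D exceeds the PRF, i.e. the main-lobe spectrum is aliased."""
    return doppler_bandwidth(velocity, antenna_length) > prf


# Hypothetical platform and antenna values, for illustration only.
print(round(doppler_bandwidth(7600.0, 12.0), 1))         # 1266.7
print(azimuth_undersampled(7600.0, 12.0, prf=1200.0))    # True
print(azimuth_undersampled(7600.0, 12.0, prf=1700.0))    # False
```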
The distance between the target and its azimuth ambiguity ghost can be calculated by Equation (2) [33,34]:
$$\Delta x_m = \frac{m\,\lambda\,R\,\mathrm{PRF}}{2V} \qquad (2)$$
where $R$ is the slant range and $V$ is the SAR platform velocity.
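Equation (2) is simple enough to evaluate directly. The sketch below uses hypothetical wavelength, slant-range, velocity, and PRF values (not those of Table 1); because the expression is linear in m and in the PRF, for fixed λ, R, and V the ghost distance scales in direct proportion to the PRF.

```python
def azimuth_ambiguity_distance(m: int, wavelength: float, slant_range: float,
                               prf: float, velocity: float) -> float:
    """Azimuth distance (m) between a target and its m-th ghost: m*lambda*R*PRF / (2V)."""
    return m * wavelength * slant_range * prf / (2.0 * velocity)


# Hypothetical parameters, for illustration only (not the values of Table 1).
for prf in (1450.0, 1700.0):
    d = azimuth_ambiguity_distance(1, wavelength=0.0555, slant_range=800e3,
                                   prf=prf, velocity=7600.0)
    print(f"PRF = {prf:.0f} Hz -> first ghost at {d:.1f} m")
```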
Moreover, when ships move on a smooth sea surface, they appear in the SAR image as bright targets against a dark background. In such cases, ghosts are clearly visible and may severely hinder ship target detection.
Given the parameters of a spaceborne SAR system, the theoretical range and azimuth ambiguity distances can be estimated by Equations (1) and (2), respectively. Taking Sentinel-1 SAR data as an example, we analyze azimuth ambiguity in several SAR images. The imaging geometry is shown in Figure 2a. Although Sentinel-1 has four imaging modes, we only show the interferometric wide (IW) swath mode. The Sentinel-1 SAR system parameters that play a significant role in imaging include the platform speed, the altitude of the satellite above the ground, the elevation angle, and the PRF.
Table 1a,b list the Sentinel-1 satellite SAR system parameters and the parameters of an example ship, respectively. Three different PRFs exist in one group of Sentinel-1 data. Furthermore, for spaceborne SAR, the slant range is influenced by the Earth’s curvature and the distance from the ground to the satellite; their relationship is shown in Figure 2b. The slant range can therefore be calculated from the satellite’s altitude above the ground, the radius of the Earth, the elevation angle, and the incidence angle.
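Under a spherical-Earth assumption, the slant range follows from the triangle formed by the Earth's center, the satellite, and the ground target: the law of sines gives the incidence angle from the elevation angle, and the law of cosines then gives the slant range. In the sketch below, the function name, the mean Earth radius of 6371 km, the reading of the elevation angle as the look angle measured from nadir at the satellite, and the example altitude and angle are all assumptions.

```python
import math


def slant_range(altitude: float, elevation_deg: float,
                earth_radius: float = 6_371_000.0) -> tuple[float, float]:
    """Slant range (m) and incidence angle (deg) for a spherical Earth.

    Triangle: Earth center O, satellite S, target T with |OS| = R_e + H,
    |OT| = R_e and angle theta (elevation/look angle) at S.
    Raises ValueError if the beam at this angle misses the Earth.
    """
    theta = math.radians(elevation_deg)
    r_sat = earth_radius + altitude
    eta = math.asin(r_sat * math.sin(theta) / earth_radius)   # law of sines: incidence angle
    gamma = eta - theta                                        # Earth-central angle
    rng = math.sqrt(earth_radius**2 + r_sat**2
                    - 2.0 * earth_radius * r_sat * math.cos(gamma))  # law of cosines
    return rng, math.degrees(eta)


rng, eta = slant_range(altitude=693e3, elevation_deg=30.0)   # assumed example values
print(f"slant range = {rng / 1e3:.1f} km, incidence angle = {eta:.1f} deg")
```

At a zero elevation angle the function returns the altitude itself, and the incidence angle always exceeds the elevation angle because of the Earth's curvature.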
The theoretical azimuth ambiguity distance can be obtained using Equation (2). For $m = 1$, the results for the three PRFs are approximately 5031.4 m, 4254.9 m, and 4940.6 m, respectively. The right graph of Figure 3 depicts the SAR image of the example ship and its ghost replicas. We extracted the azimuth sequence at one fixed range cell and expressed its amplitude in decibels to reduce its dynamic range; this azimuth sequence is shown in the left graph of Figure 3. The measured distances between the two ghosts and their target are approximately 4630 m and 4970 m, respectively, which are close to the theoretical values above.
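The profile extraction and distance measurement described above can be sketched as follows on a synthetic image. The array layout (rows as azimuth, columns as range), the 5 m azimuth pixel spacing, the threshold, and the simple local-maximum peak picker are assumptions, and the ghost offsets are chosen only so that the output mimics the measured distances; real profiles contain speckle and sidelobes that call for a more robust peak detector.

```python
import numpy as np


def azimuth_profile_db(image: np.ndarray, range_cell: int) -> np.ndarray:
    """Amplitude along azimuth (rows) at one fixed range cell (column), in dB."""
    amplitude = np.abs(image[:, range_cell])
    return 20.0 * np.log10(amplitude + 1e-12)   # small offset avoids log10(0)


def ghost_distances(profile_db: np.ndarray, pixel_spacing: float,
                    threshold_db: float, min_separation: int) -> np.ndarray:
    """Distances (m) from the strongest peak to the other local maxima above a threshold."""
    interior = profile_db[1:-1]
    is_peak = (interior > profile_db[:-2]) & (interior >= profile_db[2:])
    peaks = np.flatnonzero(is_peak & (interior > threshold_db)) + 1
    target = peaks[np.argmax(profile_db[peaks])]              # brightest peak = ship
    others = peaks[np.abs(peaks - target) >= min_separation]  # skip the target itself
    return np.abs(others - target) * pixel_spacing


# Synthetic amplitude image (hypothetical values).
img = np.full((3000, 5), 0.01)   # dark, smooth sea
img[1200, 2] = 10.0              # bright ship
img[274, 2] = 0.5                # ghost replica on one side
img[2194, 2] = 0.4               # ghost replica on the other side
profile = azimuth_profile_db(img, range_cell=2)
print(ghost_distances(profile, pixel_spacing=5.0, threshold_db=-30.0, min_separation=50))
# [4630. 4970.]
```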
Discrimination is difficult because some traditional characteristics of a target and its ghost are similar, e.g., length–width ratio, area, and shape complexity [35,36,37]. A dedicated discrimination between target and ghost is therefore needed to eliminate the negative effects of ghosts on detection performance.
4. Architecture of the H-CNN Model
A traditional CNN consists of convolutional, pooling, and fully connected layers. The convolutional layer is used for feature extraction. Each convolutional layer contains many convolutional kernels; each kernel element corresponds to one weight, and each kernel has one bias. Each neuron in a convolutional layer is connected to a local neighboring region of the previous layer, and the kernel size determines the size of this region. During the convolution operation, the kernels slide regularly over the whole feature map, and feature extraction is realized as:
$$y_j^{l} = f\Big(\sum_{i \in \mathcal{R}_j} w_i^{l}\, x_i^{l} + b^{l}\Big)$$
where $x_j^{l}$ and $y_j^{l}$ are the input and output values at the $j$th pixel of the $l$th convolutional layer, respectively; both are called feature maps. $\mathcal{R}_j$ is the kernel-sized input region associated with pixel $j$, and $w_i^{l}$ and $b^{l}$ are the weight and bias of the convolutional kernel in convolutional layer $l$, respectively. $f(\cdot)$ is an activation function, usually chosen as the sigmoid or the rectified linear unit (ReLU) [38]. In this paper, ReLU is selected and is defined by:
$$f(x) = \max(0, x)$$
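The convolution and ReLU activation described above can be sketched in NumPy for a single input feature map and a single kernel. This is a simplified assumption: real convolutional layers also sum over several input maps and use many kernels, and, as in most CNN libraries, the kernel is applied without flipping (cross-correlation). The function names are hypothetical.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """ReLU activation f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


def conv_layer_single(feature_map: np.ndarray, kernel: np.ndarray, bias: float) -> np.ndarray:
    """'Valid' 2-D convolution of one feature map with one kernel, plus bias, then ReLU."""
    kh, kw = kernel.shape
    out_h = feature_map.shape[0] - kh + 1
    out_w = feature_map.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            region = feature_map[r:r + kh, c:c + kw]     # receptive field of output pixel (r, c)
            out[r, c] = np.sum(kernel * region) + bias    # weighted sum plus one bias per kernel
    return relu(out)


x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[0.0, 1.0], [1.0, 0.0]])
print(conv_layer_single(x, k, bias=-10.0))
```

The negative bias pushes the first output row below zero, so ReLU sets it to zero while the other rows pass through unchanged.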
After feature extraction in the convolutional layer, the feature maps are passed to the next pooling layer. Pooling downsamples each feature map by representing each local region with a single value. Classic pooling methods include max pooling, which we apply in this paper, and mean pooling.
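Max pooling over non-overlapping windows can be sketched as follows. The 2 × 2 window with an equal stride is an assumption, since the pooling size used in H-CNN is not given in this section.

```python
import numpy as np


def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping size x size max pooling; incomplete edge windows are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))   # one maximum per window


print(max_pool2d(np.arange(16).reshape(4, 4)))
```

The reshape groups each window into axes 1 and 3, so a single `max` call reduces every window without explicit loops.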
Finally, the feature maps are fully connected in the last layer, which is similar to the hidden layer of a traditional feedforward neural network. In this layer, the multi-dimensional feature maps are reshaped into a vector.
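The reshaping and full connection can be sketched as a flatten followed by an affine map. The (channels, height, width) layout, the toy sizes, and the absence of an output activation are assumptions made for illustration.

```python
import numpy as np


def fully_connected(feature_maps: np.ndarray, weights: np.ndarray,
                    bias: np.ndarray) -> np.ndarray:
    """Flatten (channels, height, width) feature maps and apply W @ x + b."""
    x = feature_maps.reshape(-1)   # multi-dimensional maps -> 1-D vector
    if weights.shape[1] != x.size:
        raise ValueError(f"expected {x.size} input weights, got {weights.shape[1]}")
    return weights @ x + bias


# Toy example: two 3x3 feature maps feeding a 2-unit output layer.
maps = np.ones((2, 3, 3))
w = np.zeros((2, 18))
w[0, :] = 0.1
print(fully_connected(maps, w, bias=np.zeros(2)))
```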
A traditional CNN is a supervised network. It is usually optimized by the well-known stochastic gradient descent (SGD) algorithm [39,40], which is an improved version of the batch gradient descent (BGD) method. In BGD, all samples are used to compute the gradient in every iteration, which makes each update slow. To solve this slow-update problem, SGD stochastically selects a group of samples in each iteration to determine the gradient direction, and in the next iteration a new stochastically selected group of samples is used for the parameter update. When the loss function reaches its minimum and remains stable, all parameters, i.e., weights and biases, are fixed.
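A minimal sketch of the mini-batch update follows. To stay short and runnable it trains a logistic-regression classifier rather than a CNN, so the model, learning rate, batch size, epoch count, and synthetic data are illustrative assumptions; only the sampling and update pattern correspond to the text.

```python
import numpy as np


def minibatch_sgd(x: np.ndarray, y: np.ndarray, lr: float = 0.1, batch_size: int = 16,
                  epochs: int = 50, seed: int = 0) -> tuple[np.ndarray, float]:
    """Fit a logistic-regression classifier with mini-batch SGD.

    Each update uses a randomly selected group of samples (a mini-batch) to
    estimate the gradient, instead of all samples as in batch gradient descent.
    A fixed number of epochs stands in for a "loss has stabilized" stopping rule.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))            # new random grouping each epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-(x[idx] @ w + b)))   # sigmoid outputs
            err = p - y[idx]                              # cross-entropy gradient w.r.t. logits
            w -= lr * x[idx].T @ err / len(idx)
            b -= lr * float(err.mean())
    return w, b


# Synthetic two-class data (illustrative only): two Gaussian blobs.
data_rng = np.random.default_rng(1)
x = np.vstack([data_rng.normal(-2.0, 1.0, (100, 2)), data_rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
w, b = minibatch_sgd(x, y)
accuracy = np.mean(((x @ w + b) > 0) == y)
print(f"training accuracy: {accuracy:.2f}")
```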
In this paper, we propose the H-CNN method for ship target detection in spaceborne SAR imagery, with a hierarchical training pattern. The coarse-detection stage of H-CNN discriminates between ROIs and background, and the fine-detection stage then distinguishes ship targets from ghost replicas. In the test phase, the whole SAR image is cut into chips with a sliding window, and the chips are processed by the coarse- and fine-detection stages to extract ship targets from the whole SAR image. All SAR chips are input to the coarse-detection stage, and the chips that differ from the background are retained. To further mitigate ghost interference, the chips retained after the coarse-detection stage are discriminated during the fine-detection stage for ship target detection. Note that large numbers of sea chips are always present; the coarse-detection stage therefore eases the computational burden of the following step by removing many background chips, while the fine-detection stage focuses on eliminating ghost interference. However, because the sliding step is smaller than the chip size, adjacent chips overlap, and the same target may be detected in several chips. We used non-maximum suppression (NMS) [41] to further process the coarse-detection results. The architecture of the H-CNN model is shown in Figure 5.
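Greedy NMS, a common form of non-maximum suppression, can be sketched as follows for boxes from overlapping sliding-window chips. The box format (x1, y1, x2, y2), the IoU threshold of 0.5, the helper names, and the example scores are assumptions; the NMS variant and parameters used with H-CNN are not specified here.

```python
import numpy as np


def sliding_window_boxes(height: int, width: int, chip: int, step: int) -> np.ndarray:
    """Chip boxes (x1, y1, x2, y2); a step smaller than the chip size makes chips overlap."""
    return np.array([(c, r, c + chip, r + chip)
                     for r in range(0, height - chip + 1, step)
                     for c in range(0, width - chip + 1, step)], dtype=float)


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Intersection-over-union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep: list[int] = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


# Hypothetical coarse-stage detections: two overlapping chips on one ship, one elsewhere.
boxes = np.array([[0, 0, 32, 32], [8, 0, 40, 32], [100, 100, 132, 132]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```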
During the coarse-detection stage, the network is trained using target and background samples; this part of the network mainly focuses on ROI extraction from a large imaged scene. Since unwanted ghost replicas are the major interference sources that remain in the ROIs, the outputs of the coarse-detection stage are fed into the fine-detection stage network, which discriminates between real targets and ghosts. The fine-detection stage network is trained using target and ghost samples. NMS is applied to all ROIs extracted during the coarse-detection stage. Through this process, ship target detection in spaceborne SAR imagery is realized.
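The two-stage flow described above can be sketched as a simple cascade. This is a minimal sketch, not the H-CNN implementation: `coarse_score`, `fine_score`, and `suppress` are hypothetical callables standing in for the trained coarse-stage network, the trained fine-stage network, and NMS, the toy scorers are brightness rules used only to make the example run, and placing NMS between the two stages is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np


@dataclass
class Chip:
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in image pixels
    pixels: np.ndarray               # amplitude values of the chip


Scorer = Callable[[np.ndarray], float]
Suppressor = Callable[[List[Chip], List[float]], List[int]]


def hierarchical_detect(chips: Sequence[Chip], coarse_score: Scorer, fine_score: Scorer,
                        suppress: Suppressor, coarse_threshold: float = 0.5,
                        fine_threshold: float = 0.5) -> List[Chip]:
    """Two-stage cascade: ROI vs. background, then ship vs. ghost."""
    # Stage 1 (coarse): keep chips scored as ROI rather than sea/land background.
    scored = [(c, coarse_score(c.pixels)) for c in chips]
    rois = [c for c, s in scored if s >= coarse_threshold]
    roi_scores = [s for _, s in scored if s >= coarse_threshold]
    # Remove duplicate ROIs produced by overlapping sliding windows (e.g. NMS).
    rois = [rois[i] for i in suppress(rois, roi_scores)]
    # Stage 2 (fine): keep ROIs scored as real ships rather than ghost replicas.
    return [c for c in rois if fine_score(c.pixels) >= fine_threshold]


# Toy stand-ins for the trained networks (hypothetical, for illustration only).
def toy_coarse(p: np.ndarray) -> float:
    return float(p.mean() > 0.2)      # "brighter than background"


def toy_fine(p: np.ndarray) -> float:
    return float(p.max() > 0.9)       # "strong enough to be a real ship"


def keep_all(rois: List[Chip], scores: List[float]) -> List[int]:
    return list(range(len(rois)))     # placeholder for NMS


chips = [
    Chip((0, 0, 32, 32), np.zeros((32, 32))),         # sea background
    Chip((32, 0, 64, 32), np.full((32, 32), 1.0)),    # bright ship
    Chip((64, 0, 96, 32), np.full((32, 32), 0.3)),    # weaker ghost replica
]
print([c.box for c in hierarchical_detect(chips, toy_coarse, toy_fine, keep_all)])
# [(32, 0, 64, 32)]
```

The coarse stage discards the background chip, and the fine stage then rejects the ghost chip that survived it, mirroring the division of labor between the two stages.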
6. Conclusions
A ship target detection method based on a hierarchical CNN was proposed in this paper for spaceborne SAR imagery. Its major contributions are twofold. First, a hierarchical pattern was designed so that each stage focuses on a single type of interference in ship target detection, i.e., sea clutter or ghost replicas. Second, we statistically analyzed the feature maps in the last layer, which may facilitate understanding of the abstract features of ship targets and ghosts in spaceborne SAR images. Specifically, in the coarse-detection stage of H-CNN, ROIs are extracted from whole images, and in the fine-detection stage, ship targets are detected against ghosts. Based on spaceborne SAR characteristics, we analyzed the ghost-generating principle, which agrees with the actual data. H-CNN was designed based on the amplitude information of SAR image chips in the spatial dimension, and the amplitude distribution differences between target and ghost were then discussed: the differences in amplitude proportion were obvious, but the envelope forms of the two distributions were similar. In the experiments, we first discussed the parameter configuration of H-CNN and selected H-10-1 to obtain optimal detection results. Then, the feature extraction quality of H-CNN was studied. Detailed texture features and abstract semantic features were found to be extracted by different convolutional layers of H-CNN. Moreover, the feature map amplitude distributions of target and ghost had different envelopes, which improved their separability. Furthermore, the feature extraction quality during the fine-detection stage of H-CNN was quantitatively analyzed based on LDA theory. Finally, we compared the proposed method with the conventional CFAR technique, a traditional CNN model, and a low complexity CNN model using the same data.
H-CNN achieved the best detection results, with improvements of more than 13.83% and 4.57% compared with CFAR and the traditional CNN model, respectively. Additionally, compared with the low complexity CNN, H-CNN increased FoM, recall, and F-measure by 3.51%, 3.47%, and 2.54%, respectively. In other words, the proposed H-CNN can effectively resist the interference of sea clutter and ghost replicas. In future work, we plan to explore joint detection and classification of SAR ship targets based on DL methods. The influence of more factors in practical applications will also be considered and studied, such as multi-resolution, speckle noise interference, and images with defocused ROIs.