High-frequency surface wave radar (HFSWR) transmits high-frequency electromagnetic waves and receives the backscattered echo based on the mechanism of coastal surface diffraction propagation [1]. By analyzing the received echo, continuous monitoring over a wide range can be achieved, covering vessels [2], low-altitude flying objects over the sea [3], and ocean dynamics parameters [4]. HFSWR is widely used and has become the primary technical means in the field of maritime-state monitoring and target vessel detection.
Since HFSWR offers all-weather operability, lower cost, and a wider range of observation compared to other monitoring systems, it plays an important role in continuous monitoring of our exclusive economic zone and the sea state. However, strong clutter and interference, such as sea clutter, ionospheric clutter and radio frequency interference (RFI), severely limit the detection ability of HFSWR. Hence, the detection and suppression of clutter [7] and interference [8] are essential to guarantee the performance of HFSWR; automatic and accurate detection in particular is a prerequisite for clutter suppression. Although suppression of clutter and interference can be implemented without prior detection, both theoretical and simulation results show that the signal-to-noise ratio (SNR) of the target can also be noticeably reduced when clutter/interference suppression is applied [10]. Hence, it is of great importance to detect whether clutter/interference is present before initiating suppression, both to preserve the signal energy and to improve computational efficiency.
The most common approaches to identifying and detecting clutter and interference based on the Range-Doppler (RD) spectral image generally involve two steps: first, features are extracted from the images; then either traditional image segmentation techniques or machine learning, currently the cutting-edge technology in object identification and detection, is applied to achieve the goal. Chen et al. [11] used correspondence analysis and cluster analysis to classify interference and clutter in the time and range domains. In addition, Jin et al. [12] analyzed three typical features of sea clutter, namely the backward propagation coefficient, the Bragg resonance phenomenon, and the peak amplitude of the Bragg spectrum; these features were fed to support vector machines (SVMs) to be identified and classified as ground/sea clutter. Li et al. [13] proposed a method using the Otsu algorithm to adaptively choose the threshold and extract clutter features in the RD spectrum images according to regional characteristics. Li and Zeng [14] used the Gabor wavelet transform to extract textural features in the RD spectrum images, which performs better at extracting clutter edges. To obtain more effective features, Li et al. [15] integrated statistical properties, Gabor features, image characteristics, and wavelet-transform features into a feature library, applied a genetic algorithm to select the better features, and finally passed them to the SVM.
Whether based on traditional image segmentation or on machine learning, these methods relied heavily on artificial feature design and preselection; hence, human intervention could not be avoided. To overcome these difficulties, an automatic clutter and interference detection method based on a deep learning network is proposed.
In 2012, Krizhevsky designed a convolutional neural network (CNN) called AlexNet [16], which won the image classification competition of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Since AlexNet, a number of CNN-based architectures, such as the Zeiler and Fergus model (ZF) [17] and VGG-16 [18], have been proposed to solve regression and classification problems. In 2015, Ren et al. proposed Faster R-CNN [19], which won many titles, including target identification and detection in the ILSVRC and Common Objects in Context (COCO) challenges. As this model became popular, it was applied in many image and video processing fields. Compared with conventional image processing methods, one advantage of the CNN is that it can perform effective feature extraction from images via convolutional computation without threshold selection, and therefore has a wider scope of application. Through its use, the drawbacks of artificial feature design can be avoided entirely.
The deep learning method is conventionally designed for big data [20], while the specific data containing clutter and interference are limited in practice. Based on the distribution of the clutter/interference and the structure of R-CNN, we propose a novel algorithm to identify and detect clutter and interference based on Faster R-CNN, with a new CNN structure that has far fewer parameters to regulate for small datasets. Thanks to the end-to-end deep CNN structure, there is no need to design artificial features. Furthermore, the network is able to locate the clutter and interference, which is helpful for their extraction. Once clutter and interference are automatically detected, the suppression algorithm can be started.
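The detect-then-suppress pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: `detect_regions` stands in for the Faster R-CNN detector with a simple energy threshold, and `suppress` is a placeholder for a real suppression algorithm; all function names are assumptions.

```python
import numpy as np

def detect_regions(rd_spectrum):
    """Hypothetical stand-in for the trained detector: returns bounding boxes
    (row_min, col_min, row_max, col_max, label) on the RD map. Here a simple
    3-sigma energy threshold is used purely for illustration."""
    mask = rd_spectrum > rd_spectrum.mean() + 3 * rd_spectrum.std()
    rows, cols = np.where(mask)
    if rows.size == 0:
        return []
    return [(rows.min(), cols.min(), rows.max(), cols.max(), "clutter")]

def suppress(rd_spectrum, box):
    """Placeholder suppression applied only inside a detected region."""
    r0, c0, r1, c1, _ = box
    out = rd_spectrum.copy()
    out[r0:r1 + 1, c0:c1 + 1] = rd_spectrum.mean()
    return out

# Gate the (costly) suppression on a positive detection, so target energy in
# clutter-free maps is never touched and computation is saved.
rd = np.random.default_rng(0).normal(size=(64, 64))
rd[10:15, 20:25] += 10.0  # synthetic clutter patch for the demo
boxes = detect_regions(rd)
for b in boxes:
    rd = suppress(rd, b)
print(len(boxes))  # → 1
```

The key design point is the gating: suppression runs only on maps where the detector fires, which preserves target SNR elsewhere.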
This paper is organized as follows. In Section 2, by analyzing the characteristics of the RD spectrum of HFSWR and the deep learning network, the sea clutter and interference detection problem is formulated via Faster R-CNN in the absence of the large amount of data usually required. In Section 3, we propose a novel detection method based on Faster R-CNN by designing an appropriate training and scoring mechanism. In Section 4, field data experiment results are presented to show the effectiveness of the proposed method. In Section 5, the results are discussed and future work is highlighted. Finally, the paper is summarized in the concluding remarks in Section 6.
2. Problem Formulation
It is recognized that deep learning networks usually need a large quantity of data. Since HFSWR data are not so massive and require online, real-time processing, a Faster R-CNN with a simple architecture and fewer parameters is needed.
2.1. Faster R-CNN
Faster R-CNN, a real-time object detection network, has a composite architecture comprising the region proposal network (RPN) and Fast R-CNN. This composite architecture is logical: the former module generates region proposals, while the latter detects objects. Through an "attention"-like mechanism, the region proposals generated by the RPN tell the Fast R-CNN module where to detect. Furthermore, the two modules share their convolutional layers, which reduces the computational cost of generating region proposals as well as the burden on the GPU.
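The sharing of convolutional computation can be sketched as below. This is a conceptual toy, not Faster R-CNN itself: the "backbone" is a blur-and-downsample stand-in for the shared convolutional layers, and the RPN and detection heads are reduced to trivial functions; what matters is that the feature map is computed once and reused by both modules.

```python
import numpy as np

def backbone(image):
    """Shared convolutional feature extractor (stand-in: 3x3 mean filter
    followed by stride-2 downsampling)."""
    k = np.ones((3, 3)) / 9.0
    h, w = image.shape
    feat = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            feat[i, j] = (image[i:i + 3, j:j + 3] * k).sum()
    return feat[::2, ::2]

def rpn(features, n=3):
    """RPN stand-in: the top-n feature responses serve as 'proposals'."""
    flat = np.argsort(features.ravel())[::-1][:n]
    return [np.unravel_index(i, features.shape) for i in flat]

def detect(features, proposals):
    """Fast R-CNN head stand-in: classify each proposal from the SAME
    feature map, without recomputing any convolution."""
    return [("object" if features[p] > features.mean() else "background", p)
            for p in proposals]

img = np.zeros((32, 32))
img[8:12, 8:12] = 1.0            # a bright patch to be found
feats = backbone(img)            # convolution done once...
labels = detect(feats, rpn(feats))  # ...and reused by both RPN and detector
print(labels[0][0])  # → object
```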
For a better learning system, a two-stage, four-step training architecture is selected, which can be depicted as follows:
1. We use the pretrained model to fine-tune the RPN module for the region proposal task;
2. We use those region proposals to train the Fast R-CNN, which is pretrained by the same model, for the detection task;
3. We fix the parameters in the convolutional layers and modulate a second RPN after initializing it from the above detection module, whereby the sharing of convolutional computation is completed;
4. We repeat step 3 but fine-tune the parameters, especially those belonging to the second Fast R-CNN.
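The four alternating steps above can be outlined schematically as follows. The dictionaries and function names here are toy stand-ins, not the real training code; the point is the order of the four calls and when the shared convolutional weights are frozen.

```python
# Schematic of the four-step alternating training. Toy "parameter"
# dictionaries stand in for the real RPN / Fast R-CNN modules; the
# proposals argument is only carried along, not modeled.
pretrained = {"conv": "imagenet-init"}

def train_rpn(model, frozen=False):
    # Steps 1/3: fit the RPN; when `frozen`, shared conv weights stay fixed.
    return {"conv": model["conv"], "rpn_head": "fitted", "conv_frozen": frozen}

def train_fast_rcnn(model, proposals, frozen=False):
    # Steps 2/4: fit the detector on proposals from the current RPN.
    return {"conv": model["conv"], "det_head": "fitted", "conv_frozen": frozen}

# Step 1: fine-tune an RPN from the pretrained model.
rpn1 = train_rpn(pretrained)
# Step 2: train Fast R-CNN on rpn1's proposals (conv layers not yet shared).
det1 = train_fast_rcnn(pretrained, proposals="from rpn1")
# Step 3: re-train the RPN from det1 with the shared conv layers frozen.
rpn2 = train_rpn(det1, frozen=True)
# Step 4: fine-tune only the Fast R-CNN heads, completing the sharing.
det2 = train_fast_rcnn(det1, proposals="from rpn2", frozen=True)
print(det2["conv_frozen"])  # → True
```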
2.3. Creating a Convolutional Neural Network
When this composite architecture was first proposed, the ZF and VGG-16 networks were chosen as its pretrained models because these two networks contain five and 13 shared convolutional layers, respectively. Such deep CNN models are suitable for processing big datasets like Pascal VOC 2007/2012 or MS COCO, but they are too complicated for those specific fields where only a few image samples can be gathered. Therefore, as the RD spectrum dataset is not large, we create a deep CNN with a relatively small number of parameters to avoid the disadvantages arising from a mismatch between the complexity of the network and the small dataset.
The whole simple structure, shown in Table 1, includes two convolutional layers, a max pooling layer, and two fully connected layers. For detection tasks, the CNN needs to analyze smaller sections of the image, and we must also take into account the amount of spatial detail the CNN needs to resolve; therefore, we select an input size of 32 × 32. The choice of two convolutional layers with a small 3 × 3 kernel as the core building blocks of the CNN is motivated by two considerations: the dataset of RD spectrum images we collect is small, and the distinction between the RoIs and the background varies only slightly, so two convolutional layers are quite sufficient for our purpose. The first convolutional layer, close to the input layer, captures basic features such as borders, shades and simple stripes, while the second layer captures more abstract and discriminative features, which can decide whether the prediction is true or not. The output sizes of the two fully connected layers were set as small as possible to reduce the parameters of each fully connected layer, which may account for the majority of the parameters of the whole network.
According to the structure shown in Table 1, we can easily count all the parameters of the whole CNN, which total at most 3000 (320 + 320 + 64 + 2112 + 130 = 2946). Compared with the more than 60 million parameters of the pretrained ImageNet model of AlexNet, our CNN structure is well suited to processing the small dataset of RD spectrum images.
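The per-layer counts can be reproduced with the standard formulas (weights plus biases). The layer sizes below are an assumed reading consistent with the listed figures, not necessarily the exact Table 1 configuration:

```python
def conv2d_params(in_ch, out_ch, k):
    # k*k*in_ch weights per filter, plus one bias per filter
    return k * k * in_ch * out_ch + out_ch

def fc_params(n_in, n_out):
    # full weight matrix plus one bias per output neuron
    return n_in * n_out + n_out

# The two fully connected counts in the text are consistent with, e.g.,
# a 32 -> 64 layer and a 64 -> 2 (RoI vs. background) layer:
print(fc_params(32, 64))         # → 2112
print(fc_params(64, 2))          # → 130
# A first 3x3 conv with 32 filters over a single-channel input gives:
print(conv2d_params(1, 32, 3))   # → 320
# Total of the five figures listed in the text:
print(320 + 320 + 64 + 2112 + 130)  # → 2946
```

This confirms the claim in the text: the total stays under 3000, four orders of magnitude below AlexNet's roughly 60 million parameters.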
From the whole CNN structure of the R-CNN model shown in Table 2, we can see that the number of parameters in both models is small. However, the Faster R-CNN model spends much less time on region proposal and on the entire process than R-CNN, mainly because (1) the R-CNN model does not share convolutional weights between region proposal and SVM classification, and (2) Selective Search, its region proposal method, is computationally expensive and generates numerous proposals.
Of the above three experiments, the first two show how the Faster R-CNN model detects sea clutter and RFI. Notably, the high AP and the accurate region proposals both provide evidence that the deep learning method based on the Faster R-CNN model has good detection performance in the field of HFSWR. The last experiment compares the R-CNN and Faster R-CNN models on a more complex dataset containing all three RoIs, namely sea clutter, ionospheric clutter, and multiple forms of RFI. Compared with the AP of zero obtained by the R-CNN model, the results show the advantage of fewer but more accurate region proposals, which is a benefit of the RPN subnetwork of the Faster R-CNN model.
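For reference, the AP metric used above can be computed with the classic Pascal VOC 11-point interpolation; the sketch below assumes precision-recall pairs are already available from matching detections to ground truth:

```python
def voc11_ap(recalls, precisions):
    """11-point interpolated average precision (Pascal VOC style): average
    the maximum precision achieved at recall >= r, for r in {0, 0.1, ..., 1}."""
    ap = 0.0
    for t in [i / 10 for i in range(11)]:
        p = max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0)
        ap += p / 11
    return ap

# A perfect detector keeps precision 1.0 at every recall level:
print(round(voc11_ap([0.0, 0.5, 1.0], [1.0, 1.0, 1.0]), 3))  # → 1.0
# A detector producing no correct proposals scores zero:
print(voc11_ap([0.0], [0.0]))  # → 0.0
```

An AP of zero, as reported for R-CNN on the complex dataset, thus means no proposal overlapped any ground-truth RoI well enough to count as a true positive at any confidence threshold.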
It is clear that we still need to consider the distribution of the dataset, such as a roughly equal proportion of positive and negative samples, the subdivision of labels owing to the multiple forms a feature can take and their distribution, and the uniformity between the training and testing samples. There is no doubt that the deep-learning-based network achieves high detection performance, but for further application, more research is required on the accurate detection of ionospheric clutter and RFI, which are more complicated in terms of dataset collection, ground-truth labeling, and training of the deep neural network.