Adaptive Network Detector for Radar Target in Changing Scenes

Abstract: Data-driven deep learning has been widely applied in radar target detection. However, the performance of a detection network degrades severely when the detection scene changes, since a network trained with data from one scene is not suitable for another scene with a different data distribution. To address this problem, an adaptive network detector combined with scene classification is proposed in this paper. Aiming at maximizing the posterior probability of the feature vectors, the scene classification network is arranged to control the output ratio of a group of detection sub-networks. Because the classification error rate in traditional machine learning is uncertain, a classifier with a controllable false alarm rate is constructed. In addition, a new network training strategy, which freezes the parameters of the scene classification network and selectively fine-tunes the parameters of the detection sub-networks, is proposed for the adaptive network structure. Comprehensive experiments demonstrate that the proposed method maintains a high detection probability when the detection scene changes. Compared with some classical detectors, the adaptive network detector shows better performance.


Introduction
As the most basic task of radar signal processing, radar target detection is widely used in military and civil aviation activities. In traditional detection methods, statistical inference based on probability distributions is used to distinguish the target of interest from clutter [1][2][3]. However, due to the complexity of the clutter distribution, these methods cannot match the clutter and target characteristics accurately, resulting in performance degradation.
In this case, different features of the clutter and the target are usually extracted to realize a binary classification, which is called feature detection technology. These features usually include the radar echo amplitude, Doppler spectrum, time-frequency map, polarization information, and other aspects [4][5][6][7][8]. For example, based on the fractal features of the sea clutter and the target, some detectors have been proposed to achieve high performance [9][10][11]. Target detection has also been realized using three separable features from the time domain and the frequency domain [12,13]. Based on three polarization features, a three-dimensional feature detector has been proposed to detect small floating targets on the sea [14].
As a further development of feature detection technology, deep learning has been used to detect targets in a high-dimensional feature space, which greatly improves the detection performance [15][16][17][18][19]. The advantage of this method is that it automatically selects and fully extracts the features that differ most between the clutter and the target, and automatically obtains detection results. Deep networks have been applied to target detection in practice for some years [20][21][22][23]. The network detection methods can be divided into two categories: one is based on signal prediction, and the other is based on feature classification [24,25]. The former applies a threshold to the clutter prediction error to obtain the detection decision, and the latter uses the different features of the clutter and the target for binary detection. In the literature, the integrated output of a Radial Basis Function Neural Network (RBFNN) and a General Regression Neural Network (GRNN) is used to predict sea clutter and detect small targets floating on the sea surface [26]. The sequence predicted by a long short-term memory (LSTM) network is compared with subsequent signals to achieve target detection [27]. Generative Adversarial Networks (GAN) with a pre-processing stage based on range-Doppler analysis are applied to perform clutter suppression and target detection [28]. A deep symmetric positive definite (SPD) matrix learning network is designed to detect the target signal [29]. Several researchers [30][31][32] also suggest using a Convolutional Neural Network (CNN) to classify the time-frequency spectrum for target detection.
However, as the target detection method based on deep learning is a data-driven adaptive technology, it is limited by the quality of the dataset. When the detection scene changes, for example when the detection area lies between regions with different characteristics, the target detection performance degrades severely. Because a change of detection scene means a change of data distribution, a network trained on one scene is not suitable for another scene. Typical examples of such detection scenes include the boundary between the ground and the sea, the edge of a rainfall area, and so on [33][34][35].
To address the above problem, an adaptive network detector combined with scene classification is proposed in this paper. The scene classification network is designed to recognize the background clutter distribution, which represents a priori information about the detection scenes. Aiming at maximizing the posterior probability of the feature vectors, the output probability of the scene classification network is used to control the output ratio of a group of detection sub-networks. The detection sub-networks are integrated to tackle the performance degradation. A constant false alarm detection output module is then constructed to control the false alarm rate, which is an important index. In addition, a new network training strategy, which freezes the parameters of the scene classification network and selectively fine-tunes the parameters of the detection sub-networks, is devised for the network structure. Accordingly, by combining the target detection of each scene, the proposed adaptive network detector can achieve data adaptation, complete an intelligent and integrated detection process, and adapt to changing scenes.

Methods
The basic idea of the adaptive network detector is to use a CNN to extract features from the input time-frequency maps for clutter and target classification. At the same time, to account for changes in the detection scene, a network structure is designed that combines the scene classification network with the target detection network group. In addition, an optimized network training strategy is proposed to realize data adaptation.

The Detector Framework
As shown in Figure 1, the border area between the ground and the sea is a typical example of changing scenes. When an aircraft target flies over the coastline, the clutter background characteristics change significantly, and the performance of the network detection method degrades. Based on this, the adaptive network detector is designed as follows. First, the adaptive network detector uses a CNN to extract features from the input time-frequency maps for clutter and target classification. This is similar to feeding the time-frequency maps into a nonlinear system that transforms the data space into a new feature space. Through the processing of the nonlinear system, as shown in Equation (1), feature extraction is implemented.
$$\theta = f_{fc}(f_{cov}(\mathrm{image})) \qquad (1)$$
where $f_{fc}$ is the fully connected operation, $f_{cov}$ is the convolution operation, and image represents the input time-frequency map. Accordingly, the distribution of the input in the new feature space $S_\theta$ is evaluated. The feature vectors are denoted $\theta = [f_t, f_c]$, where $f_t$ is the target feature component and $f_c$ is the clutter feature component. Considering the changing scenes $D_1$ and $D_2$, the observation data $x$ satisfy $x \in D$, $D = D_1 \cup D_2$. With the events $E = \{x \in D\}$, $E_1 = \{x \in D_1\}$, and $E_2 = \{x \in D_2\}$, the posterior probability of the feature vectors can be expressed in terms of these events. Assuming that the mappings of the data $x \in D_1$ and $x \in D_2$ onto the new feature space $S_\theta$ are mutually exclusive, the posterior separates into one term per scene, and the objective can be written in matrix form. From the Bayesian perspective, the detection decision requires the minimum misclassification probability, i.e., the maximum posterior probability. Thus, the parameter training process of deep networks meets this requirement.
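As a hedged illustration of this argument (a sketch under stated assumptions, not necessarily the authors' exact formulation), the posterior can be expanded with the law of total probability over the scene events. The notation $P(\theta \mid x)$ and the per-scene symbols $p_{E_k t}$ and $p_{E_k c}$ (target and clutter probabilities produced for scene $k$) are assumptions chosen to match the network outputs described in the text; the expansion requires $E_1$ and $E_2$ to be mutually exclusive and to cover $E$.

```latex
\begin{align}
P(\theta \mid x) &= P(\theta, E_1 \mid x) + P(\theta, E_2 \mid x) \\
                 &= P(E_1 \mid x)\, P(\theta \mid E_1, x) + P(E_2 \mid x)\, P(\theta \mid E_2, x), \\
\begin{bmatrix} p_{ft} & p_{fc} \end{bmatrix}
                 &= \begin{bmatrix} p_{E_1} & p_{E_2} \end{bmatrix}
                    \begin{bmatrix} p_{E_1 t} & p_{E_1 c} \\ p_{E_2 t} & p_{E_2 c} \end{bmatrix}.
\end{align}
```

Read this way, the scene probabilities act as weights on the per-scene detection outputs, which is the role the multiplier plays in the network structure.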
Accordingly, the network structure proposed in this paper is shown in Figure 2. The output of the whole network is the probability vector $[p_{ft}, p_{fc}]$, which represents the final probability that a sample belongs to the target or to the clutter. The output of the scene classification network is the probability of the events $E_1$ and $E_2$, denoted $[p_{E_1}, p_{E_2}]$. The target detection sub-networks are arranged in parallel in the target detection network group, and their outputs form the matrix $\begin{bmatrix} p_{E_1 t} & p_{E_1 c} \\ p_{E_2 t} & p_{E_2 c} \end{bmatrix}$. The scene classification network and the target detection network group are connected by a multiplier.
The specific sub-network structures are shown in Figure 3. The scene classification network adopts the Residual Network (ResNet) as its basic model. Its skip connections let information from one residual block flow into the next without hindrance, which helps it recognize multiple types of detection scenes. The target detection sub-networks both use LeNet as the basic model. LeNet has fewer layers and fewer parameters, which is sufficient for simple data.
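The role of the multiplier can be illustrated with a minimal sketch, assuming it forms a scene-probability-weighted sum of the sub-network outputs. The function name `combine_outputs` and the example numbers are hypothetical, and the fully connected and softmax layers that follow the multiplier in the full network are omitted here.

```python
def combine_outputs(scene_probs, subnet_probs):
    """Weight each sub-network's [p_target, p_clutter] by its scene probability.

    scene_probs:  [p_E1, p_E2], output of the scene classification network.
    subnet_probs: [[p_E1t, p_E1c], [p_E2t, p_E2c]], one row per sub-network.
    Returns [p_ft, p_fc].
    """
    if len(scene_probs) != len(subnet_probs):
        raise ValueError("one scene probability is needed per sub-network")
    # Row vector times matrix: sum over scenes of p_Ek * p_Ekt (and p_Ekc).
    p_ft = sum(p_e * row[0] for p_e, row in zip(scene_probs, subnet_probs))
    p_fc = sum(p_e * row[1] for p_e, row in zip(scene_probs, subnet_probs))
    return [p_ft, p_fc]


# Hypothetical example: the sample most likely comes from scene 1.
scene_probs = [0.9, 0.1]
subnet_probs = [[0.8, 0.2],   # sub-network for scene 1
                [0.3, 0.7]]   # sub-network for scene 2
combined = combine_outputs(scene_probs, subnet_probs)
```

Because each row of `subnet_probs` and the vector `scene_probs` sum to one, the combined output is again a valid probability pair, dominated by the sub-network whose scene is most likely.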
Finally, a constant false alarm rate module is added to meet the detection requirements. The main idea is to control the sample prediction probability in the final classification stage. As the vector of the output layer is [p ft , p fc ], where p ft is the probability that the sample belongs to a target signal and p fc is the probability that the sample belongs to a clutter signal, the threshold is set by the second component of the array to meet a controllable false alarm rate.
where $P_{fa}$ is the false alarm rate to be controlled, $N_c$ is the number of clutter samples, and a rounding operation is applied. $o$ is the set of prediction probabilities sorted by the second component, and $T$ is the probability threshold for predicting samples as clutter.
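A minimal sketch of this thresholding step follows. It assumes the clutter probabilities $p_{fc}$ of the clutter samples are sorted in ascending order, that $T$ is the element at index round($N_c \cdot P_{fa}$), and that a sample is declared a target when its $p_{fc}$ falls below $T$. The sort direction, the index convention, and the function names are assumptions, not details confirmed by the text.

```python
import random


def cfar_threshold(clutter_p_fc, p_fa):
    """Pick T so that about p_fa of the clutter samples fall below it."""
    if not clutter_p_fc:
        raise ValueError("at least one clutter sample is required")
    o = sorted(clutter_p_fc)                 # ascending clutter probabilities
    n_c = len(o)
    k = int(round(n_c * p_fa))               # number of allowed false alarms
    k = min(max(k, 0), n_c - 1)              # keep the index inside the list
    return o[k]


def is_target(p_fc, threshold):
    """Declare a target when the clutter probability is below the threshold."""
    return p_fc < threshold


# Hypothetical clutter scores: 1000 distinct values in shuffled order.
rng = random.Random(0)
clutter_scores = [i / 1000 for i in range(1000)]
rng.shuffle(clutter_scores)

T = cfar_threshold(clutter_scores, p_fa=0.01)
false_alarms = sum(is_target(p, T) for p in clutter_scores)
```

With distinct scores, exactly round($N_c \cdot P_{fa}$) clutter samples lie strictly below $T$, so the empirical false alarm rate on the calibration set matches the requested value.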

The Training Strategy
An optimized training strategy is also proposed for the network structure. First, the scene classification network and the target detection sub-networks are pre-trained on part of the dataset and then used to build the detector. Then, the entire dataset D is used to retrain the network detector, so that its outputs approach the true values on the entire dataset. Specifically, the parameters of the scene classification network are frozen and no longer updated, while the parameters of the target detection network group are fine-tuned. This design reflects the practical situation: the data collected by the radar come from mixed scenes, with accurate records of clutter and target but no scene information.
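The freeze-and-fine-tune idea can be sketched in a framework-agnostic way, assuming a plain gradient-descent update without momentum. The parameter values, gradients, and the `sgd_step` helper are hypothetical placeholders; only the group names (the scene classifier and the two detection sub-networks) follow the text.

```python
def sgd_step(params, grads, trainable, lr=0.01):
    """Apply one gradient step to trainable groups; copy frozen groups unchanged."""
    updated = {}
    for name, values in params.items():
        if trainable.get(name, False):
            # Fine-tuned group: move against the gradient.
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Frozen group: gradients are ignored entirely.
            updated[name] = list(values)
    return updated


# Hypothetical pre-trained parameters and gradients from the retraining pass.
params = {"scene_classifier": [0.5, -0.2], "net1": [0.1, 0.3], "net2": [-0.4, 0.2]}
grads = {"scene_classifier": [1.0, 1.0], "net1": [1.0, -1.0], "net2": [0.5, 0.5]}

# Freeze the scene classification network, fine-tune the detection group.
trainable = {"scene_classifier": False, "net1": True, "net2": True}
new_params = sgd_step(params, grads, trainable, lr=0.01)
```

In a deep learning framework the same effect is usually obtained by excluding the frozen parameters from gradient computation or from the optimizer, rather than by filtering updates by hand.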

Experimental Dataset
The data used for model training and testing include two categories: real clutter and simulated target-plus-clutter. The real clutter comes from a dataset for the detection and tracking of dim aircraft targets through radar echo sequences [36], from which the data numbered 11 and 14 are selected for the experiments. Data11 is ground clutter data, and Data14 is sea clutter data. Table 1 shows the main information of the dataset. In this paper, the two kinds of data are mixed to simulate changing detection scenes. For the target-plus-clutter examples, the selected real clutter vector is added to a synthetically generated target signal, in which the phase $\varphi$ is uniformly distributed in $[0, 2\pi)$ and the Doppler frequency $f_d$ is uniformly distributed in $[0.29, 0.35]$. The frequency $f_d$ is normalized to the pulse repetition frequency (PRF), since echoes are recorded at the PRF sampling rate. SCR is the signal-to-clutter ratio and $P_c$ is the clutter power. These distribution parameters are statistics obtained from the real target information in the dataset. $c(i)$ is a complex sequence of clutter echoes and $c^*(i)$ is its conjugate. The length of each sample sequence, $l_c$, is set to 256, and samples do not overlap when they are cut.
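One way such target-plus-clutter samples could be generated is sketched below. It assumes the target is a constant-amplitude complex sinusoid with amplitude $\sqrt{\mathrm{SCR} \cdot P_c}$, that SCR is given in dB, and that $P_c$ is the mean of $c(i)c^*(i)$ over the sample. These modelling choices, the function names, and the Gaussian stand-in for real clutter are assumptions for illustration only.

```python
import cmath
import math
import random


def clutter_power(c):
    """Mean power of a complex clutter sequence: average of c(i) * conj(c(i))."""
    return sum((ci * ci.conjugate()).real for ci in c) / len(c)


def synth_target(c, scr_db, rng):
    """Complex sinusoid scaled to the requested SCR relative to the clutter power."""
    amp = math.sqrt(clutter_power(c) * 10 ** (scr_db / 10))
    phi = 2 * math.pi * rng.random()          # phase, uniform in [0, 2*pi)
    f_d = rng.uniform(0.29, 0.35)             # Doppler frequency normalized to PRF
    return [amp * cmath.exp(1j * (2 * math.pi * f_d * i + phi))
            for i in range(len(c))]


def target_plus_clutter(c, scr_db, rng):
    """Add the synthetic target to the clutter sequence sample by sample."""
    s = synth_target(c, scr_db, rng)
    return [si + ci for si, ci in zip(s, c)]


# Stand-in clutter (complex Gaussian noise) with the sample length l_c = 256.
rng = random.Random(1)
clutter = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
target = synth_target(clutter, scr_db=-10, rng=rng)
sample = target_plus_clutter(clutter, scr_db=-10, rng=rng)
```

Because the sinusoid has constant magnitude, the ratio of target power to clutter power equals the requested SCR exactly, independent of the random phase and Doppler draw.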

Data Preprocessing
The data are then preprocessed by time-frequency analysis along the slow-time dimension. The smoothed pseudo Wigner-Ville distribution (SPWVD) is used to obtain the time-frequency maps, which make further use of the echo information to extract clutter and target features. In its definition, $s(t)$ is the input signal, $h(t)$ is the time-domain window function, and $g(t)$ is the frequency-domain window function. As shown in Figure 4, it is the separability of the clutter and target samples under the SPWVD that makes detection based on time-frequency maps meaningful and successful. It can be observed that the energy of the target is partially concentrated around a certain intermediate frequency, while the time-frequency spectrum of the clutter is messier. Moreover, the input time-frequency maps of different detection scenes differ significantly, which is the main reason for the performance degradation of the detection network. Affected by strong scattering points, ground clutter has a larger amplitude than sea clutter. Owing to the detection environment, the space-time characteristics of sea clutter are more complex, as is its frequency spectrum.
The time-frequency maps are then normalized as follows, which speeds up model convergence and helps avoid the vanishing gradient problem.
where $D(t, f)$ is a pixel of the input time-frequency map and $\bar{D}(t, f)$ is its mean value.
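As an illustration only, the sketch below assumes a zero-mean, unit-variance standardization of each map. The text confirms only that the mean of the map enters the normalization, so the division by the standard deviation and the function name `normalize_map` are assumptions.

```python
import math


def normalize_map(tf_map):
    """Subtract the map mean and divide by the map standard deviation."""
    pixels = [v for row in tf_map for v in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))
    if std == 0.0:
        # A constant map carries no contrast; return all zeros.
        return [[0.0 for _ in row] for row in tf_map]
    return [[(v - mean) / std for v in row] for row in tf_map]


# Hypothetical 2 x 3 time-frequency map (rows: time, columns: frequency).
tf_map = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
normalized = normalize_map(tf_map)
```

Standardizing each map this way puts ground clutter and sea clutter inputs on a comparable numeric scale even though their raw amplitudes differ.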

Training Details
The above experimental data are used to test three classic models: LeNet, VGG16, and ResNet. The batch size is set to 128, and the learning rate is set to 0.01. The training criterion is the minimization of the cross-entropy loss, and the training algorithm is backpropagation with momentum. ResNet is selected as the baseline of the scene classification network. LeNet is selected as the baseline of each sub-network of the target detection network group, and a dropout layer is added to prevent overfitting. After the multiplier, a fully connected layer with two output nodes and a softmax layer are added. During training, more than 16,000 ground clutter and sea clutter samples are used to pre-train the scene classification network. About 16,000 clutter samples and target samples under the ground clutter and the sea clutter are used to pre-train the two target detection sub-networks separately. NET1 is trained with the ground observation data, and NET2 is trained with the sea observation data. Then, the detection network is constructed. Using the proposed training strategy, the whole network is retrained with all 32,000 clutter samples and target samples. Finally, the false alarm rate is held constant at 0.001. Table 2 shows the training information of the different models.

Result Analysis
To analyze the features extracted to distinguish the clutter from the target, we visualize the feature expression during the test. As shown in Figure 5, the numerical features of the three convolutional layers are displayed in sequence. The kernel of each layer is the weight of the convolution operation on the receptive field. The feature maps of each convolutional layer are the local features extracted by the kernel of each channel. Since the color change of the feature maps indicates the strength of the response to specific regional features, it can be seen that the feature maps retain the basic outline of the time-frequency ridge, and that the features extracted by the higher layers are more abstract.
To further analyze the final results of feature extraction, the data distribution in the last fully connected layer is studied. The data distribution can be regarded as two-dimensional feature components, which also form the data distribution in the new feature space. The subsequent softmax layer and the constant false alarm detection decision can be regarded as the classification surface. The results are shown in Figure 6, where the Signal-to-Clutter Ratio (SCR) is −10 dB. It can be seen that a change of detection scene leads to differences in how the observation data are mapped into the new feature space. This in turn affects the layout of the classification surface and hence the detection results. Using the prior information from scene classification, the proposed method unifies the data distribution in a new feature space and sets the decision surface jointly, which realizes data adaptation.
In Figure 6, (a,d) show the feature distribution when the test data are consistent with the scene of the detection network, (b,c) show the feature distribution when the test data are inconsistent with the scene of the detection network, and (e,f) show the feature distribution of the two test datasets in the adaptive network.
This shows that the proposed method can gather data of the same class together and separate clutter from target, realizing data-adaptive target detection.
Similarly, Table 3 shows the test results of the ground observation data and the sea observation data using NET1, NET2, and the adaptive network. It demonstrates that the proposed method can solve the problems of weak learning of common class features and poor network performance caused by scene changes. Switching to the IPIX dataset of real sea clutter and real targets [37], we use the data numbered #17 and #25 for further verification. The relevant data information is shown in Table 4; this setup simulates the edge of a rainfall area. In this case, the scene classification network is changed to classify the sea state level. The target detection network group connects in parallel the detection sub-networks for the 3rd and 4th sea states. The other parts of the network structure remain unchanged. Table 5 reports the detection performance. It further shows that the adaptive network detector can adapt to data changes and is suitable for practical application.

Comparison with Classical Detectors
To verify the advantages of the proposed method in changing scenes, this paper compares the proposed detector with some classical detectors. The classical detectors used for comparison include the tri-feature detector [38], the SVM detector [39,40], and the heterogeneous clutter estimate-constant false alarm rate (HCE-CFAR) detector [41,42].
The tri-feature detector uses the relative average amplitude (RAA), the relative Doppler peak height (RPH), and the relative vector entropy (RVE) to form a feature space, on which a convex hull optimization algorithm is used to distinguish the target signal from the clutter signal. The SVM detector first transforms the input time-frequency maps into histograms of oriented gradients (HOG), and then obtains the detection result with an SVM classifier. For HCE-CFAR, the type of clutter to which the detection unit belongs is determined in advance, and then traditional CFAR detection is performed.
In the comparison experiment, we splice range units 1-100 of Data11 and range units 100-200 of Data14 for testing. A single target is placed in the 50th unit. As shown in Figure 7, the Range-Doppler image indicates the experimental setup. The false alarm rate is set to 0.001, and the detection performance of the proposed detector is compared with that of the classical detectors under different SCRs. The results are presented in Figure 8. It can be seen that the detection performance of each detector increases with the SCR. In addition, the proposed detector is significantly better than the classical detectors, especially at low SCR. The proposed detector can better adapt to changing scenes and complex detection environments.
This result is closely related to the data processing method. Figure 9 shows the separability between clutter and target features, which explains the comparison results. Figure 9a shows the numerical characteristics of the input amplitude information of HCE-CFAR. It can be seen that the single time-domain feature is the weakest in distinguishing the target from the clutter, which leads to the weakest detection performance. Figure 9b shows the clutter and target distribution in the 3D feature space extracted by the tri-feature detector. Although the three features RAA, RPH, and RVE are suitable for extracting target and clutter information in heterogeneous environments, other useful information in the echo signal is still not utilized. It can be seen that the tri-feature detector is seriously affected by the SCR. At low SCR, the separability between clutter and target is weak, and the detection performance is low. Figure 9c shows the t-SNE visualization of the HOG features of the time-frequency maps used by the SVM detector.
It can be seen that the time-frequency characteristics represent the signal information in the time domain and the frequency domain at the same time, which separates the clutter and target features better. However, the SVM detector does not consider the influence of changing scenes: the clutter data are not gathered completely, and some clutter and target samples are mixed in the feature space, which affects the detection performance. Figure 9d shows the distribution of clutter and target in the new feature space extracted by the proposed detector. Through feature extraction by the combined network, the proposed method performs best in distinguishing the target from the clutter and shows excellent detection performance.

Conclusions
In this paper, an adaptive network detector combined with scene classification has been proposed. It addresses the problem of performance degradation when the detection scene changes. The detector uses a CNN to extract features from the input time-frequency maps. To maximize the posterior probability of the feature vectors, the scene classification network is designed to control the output ratio of a group of target detection sub-networks. A constant false alarm rate output module is also included. With the new training strategy, the detector achieves good results. Experiments carried out with real clutter data and simulated target data verify the advantages of the method in clutter feature extraction and common data feature learning. Compared with some classical detectors, the proposed detector has higher detection performance and adapts better to changing scenes. Because the proposed method is based on time-frequency analysis, it can be applied to both signal processing and image detection tasks.