During radar detection, experienced radar operators relying on visual interpretation often outperform many CFAR processing methods in detecting targets and managing false alarms. This can be attributed to the intricate visual observation mechanism of humans and their exceptional ability to extract image information; the effectiveness achieved in this manner cannot be replicated by any single CFAR method alone. Therefore, integrating image processing techniques and computer vision technology with CFAR processing holds immense research value and has promising application prospects.
2.2.1. CNN Model
A convolutional neural network (CNN) is a deep neural network characterized by localized connections, weight sharing, and other defining traits. It is widely used across computer vision and pattern recognition applications. The CNN draws inspiration from the biological receptive field mechanism, wherein a neuron receives signals only from the localized area it governs. In the visual nervous system, the receptive field of a neuron refers to a specific region on the retina; only when this region is stimulated can the neuron be activated [26]. A standard convolutional network consists of convolutional layers, pooling layers, and fully connected layers as its fundamental components. The network structure is depicted in Figure 2.
In the CNN, each convolutional block consists of $M$ convolutional layers and $b$ pooling layers, and a CNN can be stacked from $N$ convolutional blocks followed by $K$ fully connected layers. Convolution is the core of the CNN. The input data enter the convolution layer after preprocessing. Assuming that input data $\mathbf{X}$ and a convolution kernel weight $\mathbf{W} \in \mathbb{R}^{U \times V}$ are given, the convolution of the input data $\mathbf{X}$ and the convolution kernel weight $\mathbf{W}$ is defined as:

$$\mathbf{Y} = \mathbf{W} * \mathbf{X}$$

where $*$ represents the convolution operation. The specific operation process is:

$$y_{ij} = \sum_{u=1}^{U} \sum_{v=1}^{V} w_{uv}\, x_{i-u+1,\, j-v+1}$$
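As a concrete illustration of the convolution defined above, the following minimal NumPy sketch (ours, not from the paper) computes a valid 2D convolution; note the kernel flip, which distinguishes convolution from cross-correlation.

```python
import numpy as np

def conv2d(X, W):
    """Valid 2D convolution: y_ij = sum_u sum_v w_uv * x_{i-u+1, j-v+1}."""
    U, V = W.shape
    H, L = X.shape
    Y = np.zeros((H - U + 1, L - V + 1))
    W_flipped = W[::-1, ::-1]                      # kernel flip per the definition
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = np.sum(W_flipped * X[i:i + U, j:j + V])
    return Y
```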
The primary function of the convolutional layer is to extract features from a local region. By employing various convolutional kernels, it can act as a set of diverse feature extractors. The output of the convolutional layer can be represented by the following mathematical model:

$$\mathbf{Y} = f(\mathbf{W} * \mathbf{X} + b)$$

where $b$ represents the bias and $f(\cdot)$ represents the activation function. The activation function (activation layer) performs a nonlinear operation on the data. In the CNN in this article, the ReLU function is employed as the activation function.
The output of the convolutional layer then enters the pooling layer. The function of the pooling layer is to reduce the dimensionality of the convolutional layer's output feature vector, thereby reducing the parameter count. Assuming that the input of the pooling layer is divided into multiple regions $R_{m,n}$, the pooling operation downsamples each region to obtain a single value as a summary of that region. Commonly used pooling functions include maximum pooling and average pooling; the CNN in this article uses the maximum pooling method, defined as follows:

$$y_{m,n} = \max_{(i,j) \in R_{m,n}} x_{i,j}$$

where $x_{i,j}$ is the activation at position $(i,j)$ within region $R_{m,n}$.
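Putting the pieces together, a single forward pass through one convolutional layer under the model above (convolution, bias, ReLU activation, then 2 × 2 maximum pooling) could be sketched as follows, reusing conv2d from the previous snippet; the shapes and the 2 × 2 non-overlapping pooling regions are illustrative assumptions.

```python
def conv_layer_forward(X, W, b):
    """Y = ReLU(W * X + b), followed by 2x2 non-overlapping max pooling."""
    Y = np.maximum(conv2d(X, W) + b, 0.0)          # convolution + bias + ReLU
    H, L = Y.shape
    H2, L2 = H // 2, L // 2
    # each 2x2 region R_{m,n} is summarized by its maximum value
    return Y[:H2 * 2, :L2 * 2].reshape(H2, 2, L2, 2).max(axis=(1, 3))
```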
The output of the convolutional block will be fed into the fully connected layer, a basic feed-forward neural network (FNN) model. The fully connected layer takes the features extracted by the convolutional blocks as its input and performs the classification for the entire CNN.
2.2.2. Experimental Data Acquisition
When the radar system receives multiple pulses, the range–pulse spectrum can be converted into an RD spectrum according to the following formula:

$$S_r(f_d, t) = \mathrm{DFT}_k\!\left[\, s_r(k, t) \cdot \mathrm{win}(k) \,\right]$$

where $s_r(k, t)$ represents the single-frame range–pulse spectrum after pulse compression, $r$ represents the sequence number of the frame, $k$ represents the pulse number, and $t$ represents the fast time; $S_r(f_d, t)$ is the mathematical representation of the single-frame RD spectrum, where $f_d$ represents the Doppler frequency. The RD spectrum is obtained by multiplying by the window function $\mathrm{win}$ in the slow-time domain and then performing a discrete Fourier transform. Here, we use the Hanning window as the window function to reduce spectral leakage and spectral-peak side lobes, which helps to obtain the RD spectrum information of the target signal more accurately.
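A minimal sketch of this slow-time windowed DFT, assuming the pulse-compressed data are stored as a NumPy array of shape (num_pulses, num_range_bins); the function and variable names are illustrative, not from the source.

```python
import numpy as np

def range_doppler_spectrum(s_rt):
    """Convert a range-pulse map (pulses x range bins) to an RD spectrum.

    A Hanning window is applied along slow time (the pulse axis) to reduce
    spectral leakage and side lobes; a DFT over pulses then yields Doppler.
    """
    num_pulses = s_rt.shape[0]
    win = np.hanning(num_pulses)[:, np.newaxis]      # slow-time window
    rd = np.fft.fftshift(np.fft.fft(s_rt * win, axis=0), axes=0)
    return 20.0 * np.log10(np.abs(rd) + 1e-12)       # magnitude in dB for display
```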
Figure 3 displays an RD spectrum after the above processing. The echo signals received by an HFSWR system located in the Bohai Sea are processed with two rounds of FFT to obtain the RD spectrum. The radar system parameters are shown in
Table 2.
From this spectrum, we can extract information on the range and Doppler frequency of the targets [27]. The spectrogram describes the power spectral density of the electromagnetic wave signals received by the HFSWR within the radar detection range; the color scale, from blue through green to yellow, represents increasing echo energy. The spectrum shows that the detection environment of HFSWR is very complex. In actual sea detection with HFSWR, it is difficult to accurately and comprehensively obtain the true number, locations, and states of sea-surface targets. While it is feasible to use the correlation between ship automatic identification system (AIS) data and radar echo data as ground-truth information, vessels without AIS can still be detected by radar. Selecting only targets with AIS data is therefore not conducive to constructing an RD spectrum dataset with comprehensive position and velocity ground-truth values, and would also affect the training and testing of subsequent models. Therefore, in order to verify the performance of the proposed method and compare it with existing methods, this paper adopts the target embedding evaluation method: a certain number of simulated targets are embedded in the measured clutter background to construct a realistic HFSWR detection scene [28]. In addition, we obtained a small amount of association data between the AIS and radar echoes of targets to verify the feasibility of the proposed method on measured target data.
In the original radar RD spectrum, we manually extract the detection background. First, a range extent of 20–70 km and Doppler bands of −0.5 to −0.3 Hz and 0.3 to 0.5 Hz are selected from the fully measured RD spectrum as the background reference region, and the distribution of this region is calculated. Then, the two regions between the positive and negative Bragg peaks and the zero frequency are randomly assigned values drawn from the distribution of the background reference region, resulting in a spectrum representing the detection background, as shown in
Figure 4.
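The following sketch illustrates this background construction under stated assumptions: the RD spectrum is a NumPy array indexed by (range bin, Doppler bin), the axis vectors map bins to km and Hz, and the inter-Bragg regions are filled by resampling empirically from the reference region. All names, the empirical-resampling choice, and the placeholder Bragg frequency are ours, not the paper's.

```python
import numpy as np

def build_background(rd, ranges_km, dopplers_hz, bragg_hz=0.2, rng=None):
    """Replace the regions between the Bragg peaks and zero frequency with
    values resampled from the clutter-free background reference region."""
    rng = rng or np.random.default_rng()
    bg = rd.copy()
    # Background reference region: 20-70 km, |f_d| in [0.3, 0.5] Hz
    r_mask = (ranges_km >= 20) & (ranges_km <= 70)
    d_ref = (np.abs(dopplers_hz) >= 0.3) & (np.abs(dopplers_hz) <= 0.5)
    reference = rd[np.ix_(r_mask, d_ref)].ravel()
    # The two regions between the positive/negative Bragg peaks and zero frequency
    d_fill = (np.abs(dopplers_hz) > 0) & (np.abs(dopplers_hz) < bragg_hz)
    idx = np.ix_(r_mask, d_fill)
    bg[idx] = rng.choice(reference, size=bg[idx].shape)
    return bg
```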
In the detection background, we add a certain number of simulated targets. For each coherent integration period, we set the number of targets and their specific range and velocity information. We adopt a linear frequency interrupted continuous wave (LFICW) approach in which $N_D$ pulses are transmitted within a coherent integration period. Each pulse in the received signal is sampled, and after performing a single fast Fourier transform (FFT), $N_R$ range bins are obtained. Assuming there are $m$ targets in a specific range gate, the signal model of the sum of the $m$ target signals can be represented as:

$$s(k) = \sum_{i=1}^{m} A_i \exp\!\left[\, j 2 \pi \left( K \tau_i + f_{d_i} \right) k T \,\right]$$

where $T$ represents the sampling interval, $k$ is the index of the sampling point, $A_i$ and $f_{d_i}$, respectively, represent the amplitude and Doppler frequency of the $i$th target, $\tau_i$ represents the range time delay of the $i$th target signal, and $K$ represents the frequency modulation slope. We then obtain the RD data $\mathbf{X}_s \in \mathbb{C}^{N_R \times N_D}$ of the simulated targets according to Formula (17), where $N_R$ represents the number of range gates and $N_D$ represents the number of coherent accumulation cycles. The mathematical expression of $\mathbf{X}_s$ is as follows:

$$\mathbf{X}_s = \begin{bmatrix} x_s(1,1) & \cdots & x_s(1, N_D) \\ \vdots & \ddots & \vdots \\ x_s(N_R, 1) & \cdots & x_s(N_R, N_D) \end{bmatrix}$$

where $x_s(n_r, n_d)$ is the simulated target response in range gate $n_r$ and Doppler cell $n_d$.
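Rather than reproducing the full LFICW chain, the following sketch (our construction, not the paper's exact procedure) places each simulated target's slow-time complex exponential at its range gate and applies a windowed slow-time FFT to obtain the RD data; the parameter names, PRI input, and reuse of the Hanning window are assumptions.

```python
import numpy as np

def simulate_targets(targets, n_range, n_pulses, pri):
    """Build simulated target RD data X_s of shape (n_range, n_pulses).

    targets: list of (amplitude, range_gate, doppler_hz) tuples.
    Each target is a complex exponential in slow time at its range gate;
    a windowed slow-time FFT then yields its Doppler response.
    """
    x = np.zeros((n_range, n_pulses), dtype=complex)
    t_slow = np.arange(n_pulses) * pri                # slow-time sampling instants
    for amp, gate, f_d in targets:
        x[gate, :] += amp * np.exp(2j * np.pi * f_d * t_slow)
    win = np.hanning(n_pulses)[np.newaxis, :]
    return np.fft.fftshift(np.fft.fft(x * win, axis=1), axes=1)
```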
Finally, we superimpose the simulated target RD data $\mathbf{X}_s$ and the measured background data $\mathbf{X}_b$:

$$\mathbf{X} = \mathbf{X}_s + \mathbf{X}_b$$

In this way, we obtain the semi-measured, semi-simulated radar RD spectrum $\mathbf{X}$ that contains all the target ground-truth information.
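Combining the sketches above, the embedding itself reduces to an element-wise sum, assuming $\mathbf{X}_b$ has been converted to the same complex data domain as the simulated targets (a detail the paper does not specify):

```python
# X_b: measured background RD data; targets: list of (amplitude, gate, doppler_hz)
X_s = simulate_targets(targets, n_range, n_pulses, pri)
X = X_s + X_b                              # semi-measured, semi-simulated RD data
```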
Figure 5 and
Figure 6 show a frame of the RD spectrum containing 45 and 35 simulated targets, respectively. For ease of viewing, the simulated targets are marked in the left panel and unmarked in the right panel.
It is worth mentioning that the simulated targets lie within the 20–70 km range; in practice, the HFSWR system has a detection blind zone of approximately 20 km. Additionally, different targets are located in different range and Doppler cells. The range resolution of the spectrum is 1.5 km and the Doppler resolution is 0.004 Hz. However, because the signal extends in both the range and Doppler domains, two targets that are close in the spectrum can overlap, which is also observed in practical detection.
The HFSWR emits electromagnetic waves that interact with the rough sea surface. As a result, the echo signal exhibits sea clutter with a ridge-like structure, symmetrical about the Doppler zero frequency and extending along the range dimension of the RD spectrum. Furthermore, ionospheric clutter manifests as a band-like structure extending along the Doppler dimension. Additionally, ground clutter appears at the Doppler zero frequency with a ridged structure similar to that of sea clutter. In contrast, a target appears in the RD spectrum as an isolated peak with a specific amplitude [29]. In fact, these descriptions of the different types of clutter and targets on the RD spectrum come from human observation and perception of the different spectral structures; a CNN can automatically extract the distinguishing characteristics of targets and clutter and compare them in a single step, achieving end-to-end classification of clutter and targets.
We utilize (VI)CFAR for preliminary detection to obtain the first-level detection results for a single frame of RD spectrum data. Centered on each result's position in the RD spectrum, a 9 × 9 rectangular window is extracted to form a window slice, which contains the target or clutter in the form of two-dimensional image data. In selecting the window size, we take into account the range and Doppler extension of a target in the spectrum, which is generally limited to four cells in each direction. This ensures that the window slice contains all the information about the target while avoiding the redundancy of larger windows, which may incorporate interference from other targets or clutter. Subsequently, the CNN is employed to classify each slice as target or clutter.
Figure 7 presents an example of window slices showing clutter and targets.
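A sketch of the slice extraction, assuming detections are given as (range bin, Doppler bin) indices and the spectrum is zero-padded at the edges so that every slice is a full 9 × 9 (the paper does not state its edge handling):

```python
import numpy as np

def extract_slices(rd, detections, half=4):
    """Cut a (2*half+1) x (2*half+1) window around each CFAR detection."""
    padded = np.pad(rd, half, mode="constant")
    slices = []
    for r, d in detections:                # CFAR detection cell indices
        pr, pd = r + half, d + half        # indices in the padded spectrum
        slices.append(padded[pr - half:pr + half + 1, pd - half:pd + half + 1])
    return np.stack(slices)                # shape: (num_detections, 9, 9)
```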
2.2.3. Training and Testing Process of (VI)CFAR-CNN
When a deep learning network is introduced to achieve radar target detection, the training of the network model is crucial. A simple training strategy is to perform sliding-window processing on each frame of the RD spectrum to form window slice data that can be input to the network. Since the range and velocity information of the simulated targets is known, the "target" label is added only when a target is located in the detection cell; otherwise, a "non-target" label is added. The problem with this strategy is that if the signal-to-noise ratio (SNR) of a target is very low, even training on many such samples makes it difficult for the network to distinguish target from clutter during detection. In addition, the imbalanced ratio of positive examples (targets) to negative examples (clutter) within a frame's RD spectrum also affects network training.
The training strategy of this article is to first apply the CFAR detector to each frame of the RD spectrum; each detection result, centered on its position in the RD spectrum, is intercepted by a 9 × 9 window to form a window slice. Based on the known target range and velocity information of each frame, and the range and velocity detected by CFAR for each window slice, we can compare this information and determine whether each window slice contains a target. Using this approach, each window slice can be appropriately labeled as either target or non-target. This labeling process effectively utilizes the CFAR detector to supervise and guide the CNN in distinguishing between targets and non-targets.
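A sketch of this labeling step, assuming ground truth and detections are both expressed as (range bin, Doppler bin) cells and that a detection counts as a target only when it matches a truth cell exactly (the matching tolerance is our assumption):

```python
def label_slices(detections, truth_cells):
    """Assign 1 ('target') to detections matching a ground-truth cell, else 0."""
    truth = set(truth_cells)               # {(range_bin, doppler_bin), ...}
    return [1 if det in truth else 0 for det in detections]
```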
Table 3 shows the CNN structure parameters used in the proposed method. In the network, window slice data are used as the network input; the convolutional layers extract features, capturing the spatial structure within the window slice through convolution operations; a batch normalization (BN) layer is introduced between the convolutional layer and the pooling layer to optimize training and bolster network stability; and the output of the last pooling layer is flattened into a one-dimensional vector and sent to the fully connected layer to complete the classification.
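Since Table 3's exact parameters are not reproduced here, the following PyTorch sketch shows one plausible realization of the described structure (conv, BN, ReLU, max pool blocks on a 9 × 9 single-channel slice, flattened into a fully connected classifier); all layer sizes are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class SliceCNN(nn.Module):
    """Binary target/clutter classifier for 1 x 9 x 9 window slices."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # -> 16 x 9 x 9
            nn.BatchNorm2d(16),                            # BN between conv and pooling
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 16 x 4 x 4
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # -> 32 x 4 x 4
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 32 x 2 x 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # 32 * 2 * 2 = 128
            nn.Linear(128, 2),                             # target vs. non-target
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```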
Figure 8 shows the training and testing processes of the (VI)CFAR-CNN detector.
During the training process, the CFAR detector detects "suspected targets" to form window slice data and records their range and Doppler frequency information; these are compared with the true range and Doppler information of the targets, and if a match is found, the label "target" is added; otherwise, the label "non-target" is added. Multi-frame simulation constructs a training dataset consisting of window slices and their corresponding labels, which is then used to train the CNN model. During the testing process, CFAR detection is used to obtain preliminary detection results, window slice data are extracted based on these results, the trained CNN model then further classifies each window slice as clutter or target, and finally the detection results are updated to complete the detection.
In the (VI)CFAR-CNN detector, the training of the CNN model is supervised by the preceding CFAR detector, and different preset false alarm probabilities of the CFAR detector produce different window slice data. In the window slice data generated by a CFAR detector with a low preset false alarm probability, the SNR of the detected targets is generally high and the proportion of clutter data in the overall data is small; in the window slice data generated by a CFAR detector with a high preset false alarm probability, the SNR range of the detected targets is larger and clutter data account for a larger proportion of the overall data. Therefore, different preset false alarm probability settings will affect the training of the CNN model. Here, we compare the results of training and testing on window slice data generated by CFAR detectors with different preset false alarm probabilities.
We used CFAR detectors with different preset false alarm probabilities (0.001, 0.002, 0.005, 0.01, 0.02, and 0.05) to generate different window slice datasets, and fused the window slice data generated under all of these preset false alarm probabilities to obtain a fused dataset. We then trained and tested separate CNN models on the above seven datasets and calculated four evaluation metrics: classification accuracy, recall, precision, and F1 score. For fairness, each dataset contains 2600 clutter window slices and 1000 target window slices, of which 1/10 is used for testing and the rest for training. According to the performance metrics shown in Table 4, the CNN model trained on window slice data generated by CFAR detectors with multiple preset false alarm probabilities demonstrates superior performance. Therefore, in subsequent experiments, we used window slice data generated by the CFAR detectors with multiple preset false alarm probabilities to train the CNN model.
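For reference, the four metrics can be computed from the test-set predictions as follows (a generic scikit-learn sketch; the split ratio matches the 1/10 test fraction described above, while the seed and the model_predict placeholder are our assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

# X: window slices, y: labels (1 = target, 0 = clutter)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)

# ... train the CNN on (X_train, y_train), then predict on X_test ...
y_pred = model_predict(X_test)             # placeholder for the trained model

print("accuracy :", accuracy_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```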