Artificial Neural Networks and Deep Learning Techniques Applied to Radar Target Detection: A Review

Abstract: Radar target detection (RTD) is a fundamental process of the radar system, designed to differentiate and measure targets against a complex background. Deep learning methods have recently gained great attention and have proven to be feasible solutions in radar signal processing. Compared with conventional RTD methods, deep learning-based methods can extract features automatically and yield more accurate results. Applying deep learning to RTD is a relatively novel concept. In this paper, we review the applications of deep learning in the field of RTD and summarize the possible limitations. This work is timely given the increasing number of research works published in recent years. We hope that this survey will provide guidelines for future studies and applications of deep learning in RTD and related areas of radar signal processing.


Introduction
Radar target detection (RTD) is widely used to determine whether there is a signal present in noise. Since radar signals reflected from targets are often immersed in complex backgrounds (e.g., noise, clutter, even jamming), traditional signal processing methods are often used to boost signal-to-noise ratio (SNR) [1], while constant false alarm rate (CFAR) is a useful method for detection in a noise environment based on hypothesis testing [2]. Traditional CFAR-based detection methods consider the models of target or environment as a stochastic process which is usually based on statistical theory [3]. However, due to the complex detection environment and diverse target types, finding targets in a complex scene is an extremely challenging task, and therefore a reliable and robust RTD method has been one of the key pursuits of research [4].
Deep learning is a rapidly developing technology which has brought breakthroughs in many fields such as image classification, natural language processing, and speech recognition [5,6]. As a subset of machine learning, deep learning-based models attempt to extract features from large-scale raw data automatically. The success of deep learning is mainly due to the availability of big data, the improvement of computational power, and the ability of data processing [7]. Various deep learning architectures have been successfully used, including deep neural networks (DNN), convolutional neural networks (CNN), recursive neural networks (RNN), deep belief networks (DBN), etc. [8].
Although deep learning technology has demonstrated an exciting trend over the past few years, its full potential for radar applications has not yet been explored. In ref. [9], researchers grouped the radar application problems that can be solved by deep learning-based methods into three general categories: radar sensing, radar signal processing, and radar automatic target recognition (ATR), which are listed in Figure 1. Radar sensing and radar signal processing are the necessary prerequisites and procedures for radar ATR. Deep learning methods have been widely applied in radar sensing. Wang et al. [10] applied CNNs to radar waveform recognition. According to [11,12], deep learning methods can solve the problem of high computational complexity of antenna parameter optimization. The design of array antennas has been addressed by artificial neural networks (ANN) [13]. Cognitive radar antenna selection has also been solved by deep learning methods [14,15]. Gao et al. [16] explored a feature extractor based on CNN and a stacked autoencoder (SAE) to recognize modulated signals of radar, including LFM, NLFM, BPSK, FRANK, COSTAS, P1, P2, P3 and P4, etc.
Radar signal processing is an intermediate procedure between radar sensing and ATR. One of the most important purposes of radar signal processing is to detect targets. Machine learning-based classifiers, such as support vector machines (SVM) [42] and k-Nearest-Neighbour (kNN) [43], have been used for RTD [44,45]. As previously mentioned, RTD is a binary hypothesis test which can be regarded as a binary classification problem, namely whether the target is present or absent. On this basis, RTD can be regarded as an ATR application. As deep learning models have achieved good performance in ATR, it is feasible and reasonable to explore deep learning-based models in RTD. As one of the most reliable classifiers, ANN has been utilized to improve radar detection performance [46,47]. Many recent studies utilize DNNs to tackle RTD and report performance improvements. Figure 2 illustrates the major developments and applications of deep learning in RTD. Detailed descriptions of each algorithm will be presented later.
Applying deep learning technology to the field of RTD is a novel concept, yet to date there is no paper which comprehensively summarizes and introduces the application and its development status. In this paper, we review the applications of ANN and deep learning methods in the classical radar system problem of target detection. This review focuses on articles in online databases, e.g., IEEE Xplore, Open Science Elsevier, Scopus, Springer and ResearchGate. Particular attention is paid to recent articles (available by July 2021) published in major journals of radar signal processing and major international conferences on artificial intelligence; these include IEEE Signal Processing Magazine, IEEE Transactions on Antennas and Propagation, IEEE Transactions on Aerospace and Electronic Systems, IEEE Transactions on Geoscience and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, IEEE Journal of Selected Topics in Signal Processing, IEEE Transactions on Signal Processing, ISPRS Journal of Photogrammetry and Remote Sensing, IEEE Radar Conference, International Conference on Radar, IEEE Conference on Computer Vision and Pattern Recognition, and IEEE International Conference on Computer Vision. In addition, a number of research papers from other sources are related to this topic and are thus included in this review; most of them were published between 2000 and 2020. These papers were selected by searching the titles and keywords of the studies, and they represent a wide range of: (a) methods from theoretical derivation to application research, (b) detection backgrounds from noise to clutter, (c) applications from maritime target detection to human motion detection, (d) data forms from radar received echoes to PPI images, and (e) comparative methods from neural networks to deep learning models. Lastly, only studies in the English language are included in the review.
The rest of this review is organized as follows: Section 2 introduces related work on RTD, including traditional CFAR detectors, which are often used for performance comparisons. For completeness, in Section 3, we first recap the basic methodology of ANN before applying it to practical applications, and we then recap the recent deep learning-based methods for RTD. In Section 4, some open datasets are described, as well as how to construct synthetic datasets, while Section 5 summarizes the research challenges and opportunities. Finally, conclusions are presented in Section 6.

Traditional Processing Methods for RTD
In a complex scenario of RTD, radar echo signals are often immersed in noise, jamming and clutter, etc., as shown in Figure 3. During typical pulse Doppler radar signal processing, the reflected radar signals are processed by a series of methods such as matched filtering, coherent accumulation, clutter suppression, CFAR detection, etc.
The received radar signals are sampled at a certain rate, and the pulses are compressed by the matched filter to obtain high range resolution and narrow pulse width. Moving target indication (MTI) is performed to suppress static clutter. Then, Doppler processing or coherent accumulation is applied over multiple pulses at each range unit to obtain the Range-Doppler spectrum. The reflected signal amplitude for each Range-Doppler bin is stored in a separate cell, after which the CFAR detector is utilized to reveal the cells whose amplitude exceeds the threshold of detection. Thus, the corresponding information about velocity and position can be measured. The flowchart is presented in Figure 4.
Electronics 2022, 11, x FOR PEER REVIEW
One of the most important purposes of designing a radar detector is to distinguish targets from noise, clutter and jamming signals. A decision must be made at the end of the detector as to whether the radar echo contains the target or not. The classical method is to establish an adaptive detection threshold based on statistical models, which varies according to noise and clutter energy. In order to minimize the false alarm rate ($P_{fa}$) and maximize the probability of detection ($P_D$), the Neyman-Pearson criterion is utilized for decision-making [2]. The typical performance requirement of a radar system will require $P_D$ ≥ 0.8 and $P_{fa}$ ≤ 10e−4. This problem is commonly solved by applying a CFAR detector, which adaptively determines a local optimum threshold and keeps the $P_{fa}$ constant at a predetermined value [48].
Figure 5 depicts a general CFAR detector, which is described as a shift register of length 2n + 1. The input samples are sent into the detector cell by cell and the energy y in the cell under test (CUT) is estimated. The CFAR detector adjusts the statistic value z according to the variation of energy in the 2n reference cells. The energy of the CUT is compared to the statistical result z of the CFAR processor scaled by a constant scale factor α. Thus, the detection threshold is represented as αz. Whether there is a target is determined according to (1):
$$ y \underset{H_0}{\overset{H_1}{\gtrless}} \alpha z \qquad (1) $$
where $H_1$ denotes that a target is present and $H_0$ that it is absent.
In order to find an adaptive detection threshold, many CFAR-based methods have been studied, which can adapt the threshold to the background changes, keeping a constant $P_{fa}$. The diagram of typical CFAR processors is also shown in Figure 5 [49].
Compared with one-dimensional range detection, two-dimensional detection under phase-coherent accumulation includes range and Doppler detection, which is presented in Figure 6. x denotes the reference cells of the range dimension, and v denotes the reference cells of the Doppler dimension. The detection process of the Doppler dimension is the same as that of the range dimension.
CFAR-based detectors have been well studied over the past years; however, there still remains a trade-off between the various CFAR technologies. The cell-averaging CFAR (CA-CFAR) is the most widely used method and has the highest detectability in a homogeneous background [50]. However, it also exhibits severe performance degradation in the presence of an interfering target or an abrupt change in clutter background [51]. The greatest-of CFAR (GO-CFAR) and the smallest-of CFAR (SO-CFAR) detectors were developed to improve the detection performance under various non-ideal conditions such as nonhomogeneous clutter backgrounds and multiple target environments [52]. Rohling et al. [53] developed an ordered statistics CFAR (OS-CFAR) to control the detection threshold even when other interfering signals occur in the reference cells. CFAR methods are still under investigation; for example, the censored mean level detector CFAR (CMLD-CFAR) and trimmed mean CFAR (TM-CFAR) detectors [54] were introduced to improve the anti-interference performance. However, each of these CFAR algorithms is limited to some specific cases and lacks generality.
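The sliding-window test described in this section (a CUT with energy y, 2n reference cells, a statistic z and a threshold αz) can be illustrated with a minimal Python sketch. It is not taken from any of the cited works. The function names, the absence of guard cells, the default OS rank of 3/4 of the window, and the closed-form scale factor (which holds for CA-CFAR with exponentially distributed noise power and z taken as the mean of the reference cells) are assumptions made for illustration. In practice α has to be derived separately for each CFAR variant to reach the same false alarm rate.

```python
import numpy as np


def ca_cfar_scale_factor(n_ref_total, pfa):
    """Scale factor alpha for CA-CFAR, assuming exponentially distributed
    noise power (square-law detector) and z = mean of the reference cells."""
    return n_ref_total * (pfa ** (-1.0 / n_ref_total) - 1.0)


def cfar_1d(power, n, alpha, method="CA", k=None):
    """Slide a window of length 2n + 1 over `power` and compare each CUT
    with alpha * z. Edge cells without a full window are never flagged."""
    power = np.asarray(power, dtype=float)
    detections = np.zeros(power.shape, dtype=bool)
    for i in range(n, len(power) - n):
        lead = power[i - n:i]            # n reference cells before the CUT
        lag = power[i + 1:i + n + 1]     # n reference cells after the CUT
        if method == "CA":               # cell averaging over all 2n cells
            z = np.concatenate((lead, lag)).mean()
        elif method == "GO":             # greatest of the two half-window means
            z = max(lead.mean(), lag.mean())
        elif method == "SO":             # smallest of the two half-window means
            z = min(lead.mean(), lag.mean())
        elif method == "OS":             # k-th smallest reference cell
            ref = np.sort(np.concatenate((lead, lag)))
            rank = k if k is not None else (3 * 2 * n) // 4  # assumed default
            z = ref[rank - 1]
        else:
            raise ValueError(f"unknown method: {method}")
        detections[i] = power[i] >= alpha * z
    return detections


# Toy profile: flat background with two closely spaced targets, so that each
# target falls inside the reference window of the other.
power = np.ones(50)
power[25] = 100.0
power[27] = 100.0
ca_hits = cfar_1d(power, n=4, alpha=10.0, method="CA")
os_hits = cfar_1d(power, n=4, alpha=10.0, method="OS")
```

In this toy profile the interfering target raises the cell-averaged statistic enough that CA-CFAR misses both targets, while the ordered statistic ignores the single large reference cell and OS-CFAR flags both. This mirrors the trade-off between CFAR variants discussed above.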

Deficiencies and Challenges in Conventional Approaches
RTD is a rather complex problem in practical applications. Although conventional methods have been working well in some conditions, deficiencies and challenges still exist. Currently, the difficult problems of RTD still mainly lie in high-resolution processing of targets, clutter suppression, anti-jamming technology, and 'low-small-slow' target detection (low glancing angle, small size, slow or stationary), etc. [4].
Conventional CFAR-based methods in radar systems mainly depend on statistical hypotheses. (1) In actual RTD, good detection performance is obtained only for a specific type of target under a specific background, because the predefined parameters of the detector, such as the margin, the threshold, and the sizes of the guard and reference windows, determine the detection accuracy. However, radar always works in diverse scenes. (2) Furthermore, traditional methods are computationally expensive and inflexible because they process inputs cell by cell and require the window size to be changed manually to adapt to targets of different resolutions. (3) Most importantly, in most cases, neither the target nor the environment (noise, clutter, interference) has a known statistical model. It is difficult to find suitable parameters to design the radar detector, let alone predict its performance accurately.
In brief, traditional statistical methods are no longer applicable to complex scenes and the selection of the optimal parameter set is extremely challenging. It seems reasonable and inevitable to develop a data-driven deep learning approach for RTD.

Artificial Neural Networks and Deep Learning-Based Models for RTD
ANN is motivated by the biological structure of the human brain. Generally, a neural network consists of neurons, weight vectors, biases, activation functions, etc. The Multi-Layer Perceptron (MLP) is a type of neural network which consists of several hidden layers, with the neurons in the layers being interconnected [55]. Neural networks can be utilized in RTD due to their learning ability. In fact, the problem of RTD can be considered as a problem of pattern recognition, which fits well with the possibilities that an ANN provides [56]. Several approaches have suggested that considering ANNs as non-linear detectors could improve the detection performance. A typical ANN detector is shown in Figure 7. The output of each neuron can be given as (2):
$$ y = f\left(W_k^{T} X_k + b_k\right) \qquad (2) $$
where $X_k$ is the input vector of the $k$th layer, $W_k$ is the weight vector of the $k$th layer, $b_k$ is an element of the bias vector of the $k$th layer, and $f(\cdot)$ is the activation function. The ANN detector produces $y_1$ if the CUT contains a value larger than the sum of all the reference cells scaled by weights; otherwise, it outputs $y_2$. Usually, a more complex ANN detector with multiple hidden layers would give better performance.
By using multiple layers to achieve more powerful generalization and abstraction, DNNs can outperform conventional machine learning methods. The traditional shallow neural network requires an empirical feature extraction process to decrease the network's computational load. If the features have to be extracted automatically by the network, a deep network with powerful training capability is necessary. Currently, developments in computing power and memory have made it possible to train very large-scale networks. Various sizes of layers are used to provide different degrees of abstraction and generalization [57].
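As an illustration of the neuron output described above, the following Python sketch applies an affine map followed by an activation function to a whole layer at once, and chains two such layers into a small MLP detector fed with a CFAR-style window of 2n + 1 cells. The layer sizes, the sigmoid activation, the random untrained weights and the 0.5 decision level are illustrative assumptions, not the configuration of any reviewed paper.

```python
import numpy as np


def sigmoid(v):
    """Logistic activation, used here as an example of f(.)."""
    return 1.0 / (1.0 + np.exp(-v))


def layer_output(x, W, b, f=sigmoid):
    """Neuron outputs of one layer: each row of W and each element of b
    belong to one neuron, so entry j equals f(W[j] . x + b[j])."""
    return f(W @ x + b)


def mlp_detector(window, params):
    """Forward pass of a small MLP; `params` is a list of (W, b) pairs."""
    a = np.asarray(window, dtype=float)
    for W, b in params:
        a = layer_output(a, W, b)
    return a  # assumed layout: a[0] is the "target present" score


rng = np.random.default_rng(0)
n = 4                                    # reference cells on each side of the CUT
sizes = [2 * n + 1, 8, 1]                # input, hidden and output widths (assumed)
params = [(rng.normal(size=(m, k)), np.zeros(m))
          for k, m in zip(sizes[:-1], sizes[1:])]

window = rng.exponential(size=2 * n + 1)  # synthetic noise-only window
score = mlp_detector(window, params)
decision = bool(score[0] > 0.5)           # weights are untrained, so this is only a demo
```

A real detector would obtain `params` by training on labelled windows; the sketch only shows how the inputs flow through the layers.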
The increased popularity of deep learning has brought an increase in research publications related to RTD with deep learning models over the past few years. Specially designed network frameworks for RTD in noise environments and clutter backgrounds are included in this work; in addition, several different DNNs are reviewed, each of which performs well for RTD. The increase in the number of related publications confirms the valid and increasing motivation of the research community on RTD tasks.
Next, we will review the recent articles using deep learning that address RTD in online databases. Table 1 summarizes the related works on RTD where we present the type of tasks and main contributions of each study. A more detailed description of the detection methods is provided in Table 2, where we summarize all the related papers, particularly highlighting the architecture of networks and the type of input and dataset. Unfortunately, a direct comparison for all methods is not possible because they were evaluated on different datasets.

RTD in Noise Background
In the past decades, many approaches have been proposed to address the issue of RTD in diverse types of noise or clutter scenarios. Those approaches include complete descriptions of environmental statistics as well as statistical computing power. Recently, deep learning-based schemes were proposed to cope with the problem of RTD within noise backgrounds.
In practical applications, the likelihood ratio can be obtained by sufficient statistics, which mainly depend on the probability density function of the noise, for example, Gaussian white noise. However, in a modern radar system, noise usually follows a non-Gaussian distribution, thus the likelihood ratio has a complex non-linear relation which makes it difficult to implement sufficient statistics [83,84]. As early as 1988, Gandhi and Kassam [85] analyzed the theoretical principle of CFAR processors in non-homogeneous backgrounds. This kind of processing can also be handled by ANN due to its ability to realize complicated nonlinear mappings on the data directly. In 1997, Gandhi and Ramamurti [58] were likely the first to employ neural networks to detect signals in non-Gaussian noise at a specified $P_{fa}$. With a detailed theoretical derivation, it was noted that the ANN detector's performance did not rely on the SNR, but actually relied on signal strength and noise common variations during training. Across several non-Gaussian noise environments, the ANN detector was shown to outperform the matched filter as well as the locally optimum detectors under certain non-Gaussian noise conditions. However, the computational power and the storage requirements are generally higher for the ANN detector.
In order to improve the detection performance of a radar in a non-homogeneous environment, in 2015, Rohman et al. [59] presented a novel adaptive switch between the CA-CFAR and OS-CFAR detectors using an MLP with two hidden layers. The proposed architecture is illustrated in Figure 8. The inputs of the ANN were the calculated thresholds and the CUT value, and the output was a preliminary threshold. Then, whichever of the CA-CFAR and OS-CFAR thresholds was nearest to this raw threshold would be selected and utilized as the final threshold. The experimental results showed that the combined approach is capable of switching between CA-CFAR and OS-CFAR properly in homogeneous and non-homogeneous environments based on the best detection performance.
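The final selection step of this switching scheme can be sketched in a few lines of Python. The function name, the argument names and the tie-breaking rule (a tie goes to CA-CFAR) are hypothetical. The network that produces the preliminary threshold is not reproduced here, so its output is simply passed in as a number.

```python
def select_final_threshold(raw_threshold, ca_threshold, os_threshold):
    """Return whichever CFAR threshold lies closer to the preliminary
    threshold produced by the network (ties go to CA-CFAR, an assumption)."""
    if abs(raw_threshold - ca_threshold) <= abs(raw_threshold - os_threshold):
        return ca_threshold
    return os_threshold


# Example values chosen only to exercise both branches.
near_ca = select_final_threshold(raw_threshold=5.0, ca_threshold=4.0, os_threshold=9.0)
near_os = select_final_threshold(raw_threshold=8.0, ca_threshold=4.0, os_threshold=9.0)
```

In the first call the CA-CFAR threshold is selected, in the second the OS-CFAR threshold, so the network effectively acts as a switch between the two detectors.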
Additionally, a competent radar system must provide a high $P_D$ with a low $P_{fa}$, which is the major principle of employing standard CFAR detectors. Akhtar and Olsen [60] presented an ANN which is trained under a CA-CFAR detector for a fluctuating target detection procedure with a noise background. The ANN detector outputs a positive result only if a real target exists at the CUT and CA-CFAR also returns a positive detection; otherwise, the network does not return a positive result. A prominent benefit of the ANN detector is that the outcome may be regarded as a measure of $P_D$, and need not be either 0 or 1. It was also shown that such a scheme can obtain a slightly lower, or comparable, target detection performance, but with a noticeably lower $P_{fa}$ than that of the traditional CA-CFAR detector.
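One possible reading of this training rule is that the desired network output is the logical AND of the ground truth and the CA-CFAR decision. The Python sketch below builds such labels; the function and array names are hypothetical and the data are made up for illustration, so this should not be taken as the procedure of [60].

```python
import numpy as np


def training_labels(target_present, ca_cfar_detected):
    """Desired network output per cell: 1.0 only where a real target exists
    and CA-CFAR also declared a detection, 0.0 elsewhere."""
    truth = np.asarray(target_present, dtype=bool)
    cfar = np.asarray(ca_cfar_detected, dtype=bool)
    return np.logical_and(truth, cfar).astype(float)


# Four cells: hit, missed target, false alarm, empty cell.
labels = training_labels([True, True, False, False],
                         [True, False, True, False])
```

Under this reading, CA-CFAR false alarms are labelled as negatives, which is consistent with the lower false alarm rate reported for the trained network.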
Taking into consideration that ANNs are confirmed to be able to approximate the CFAR detector, Amores et al. [86] further argued that the ANN can improve the robustness of a radar detector. Results show that although the detection performance of the trained network tends to increase as the number of hidden neurons increases, MLPs with one hidden layer with 23 units can implement very robust detectors for SNR lower than 10 dB. For more than 23 hidden neurons, the performance improvement is trivial, while the associated computational cost continues growing.

RTD in Clutter Background
Compared with a noise background, target detection in a clutter background is a more common but challenging problem, because the signals returned from the targets are severely immersed in the backscatter from the clutter. According to theoretical analysis, the detection performance of CFAR detection increases as the number of reference cells increases; as the number approaches infinity, the CFAR detector approaches the optimal detector [87]. However, the $P_D$ of traditional CFAR detectors degrades seriously when the number of available reference cells is reduced. This reduction may be caused by high resolution, the presence of interference signals, or clutter patches.
In the CFAR scheme, the need for a larger reference window, or more reference cells, results from the statistical requirements for the parameters which are used for representing the clutter background and the target fluctuation. The information loss caused by the window size reduction can be compensated for by introducing some extra parameters. As early as 1994, Amoozegar et al. [61] presented an NN-based CFAR detection method that remains robust by compensating for the loss of reference cells. The input layer takes nine specific statistical parameters which represent the features of target and clutter. A multi-layer feedforward neural network with sigmoid activation function was proposed to extract the features of target and clutter fluctuations. Experiments under diverse scenarios indicated that the NN-CFAR scheme is consistently more robust in responding to new environments and outperforms CA-CFAR detectors with small reference windows.
To distinguish targets from clutter, another meaningful approach is to look for intrinsic features in the returned echoes that describe the difference between the targets and the clutter. Callaghan et al. [63] selected the magnitude of each Range-Doppler pixel as the feature from Range-Doppler maps and processed them with SVM and KNN algorithms to discriminate small maritime targets from sea clutter. Similarly, Li et al. [64] extracted three discriminative features, namely the frequency peak-to-average ratio, the temporal Hurst exponent, and the temporal information entropy, from radar echoes in the time and frequency domains to construct the feature space. An SVM-based detector which can flexibly control the P_fa was designed for target detection within sea clutter. Experimental results showed that the detection probability of the proposed detector is clearly higher than that of the classical detectors under low signal-to-clutter ratio (SCR) and low P_fa conditions. All these statistical parameters, which depict the target and clutter characteristics, may be used as inputs of the NN-based detector in [61].
However, because the characteristics of clutter depend heavily on the actual environment and the radar parameters, the parameters or features extracted above often become ineffective when the detection environment changes. Compared with machine learning methods which need empirically selected features, ANN is more suitable for extracting high-dimensional features, and has been adopted as a method of radar signal detection [88]. Following on from this work, Cheikh and Soltani [62] used different ANN architectures to assess the problem of RTD in K-distributed clutter with thermal noise. They considered the MLP architecture with the genetic algorithm and back-propagation as the training methods, and the radial basis function (RBF), which is widely used in signal processing, was also adopted in the neural detector. A training set that describes the clutter distribution well was used, and the architecture of the ANN-CFAR detector is depicted in Figure 9. The results show that the ANN-CFAR detector with MLP structure performs better than the classical OS-CFAR and CA-CFAR detectors.
Previous work, such as [58,61,62], attempted to improve the target detection performance on the basis of traditional CFAR methods. One objective in [59,60,69] is to replace the CFAR-based detector completely with neural networks to optimize the detection process. In 2019, Akhtar and Olsen [65] went on to propose a more general training strategy in which conventional GO-CFAR detectors work together to detect targets in K-distributed clutter. This process is then transferred into an ANN with four hidden layers and 19 nodes in each layer, with tanh as the activation function. The training strategy is related to [60] and the ANN structure is similar to that of [62], which is shown in Figure 9. The complete training data includes 2000 independent range-Doppler maps with the average SNR over all CPIs ranging from −40 dB to 75 dB, while the average SCR varied from −60 dB to 60 dB. The experimental results show that, at least for specifically trained scenes, the overall detection performance of the ANN can significantly outperform a GO-CFAR detector, namely a higher P_D with a reduced P_fa.
From the above work, these ANN-based methods distinguish targets from a noise or clutter background and deliver better and more robust results than conventional statistical approaches. In addition, the multilayer architecture shows better performance, particularly in a mixed clutter environment. However, classical ANNs also have a major limitation; that is, due to the small number of neurons and layers, satisfactory performance may not be achieved when dealing with classification and regression problems.

Deep Learning for RTD with Different Data Forms
DNNs, such as CNNs and RNNs, operate in a similar way to ANNs, but with more hidden layers and neurons. DNNs are capable of learning complex relationships from different types of data and are better suited to large datasets and complex training algorithms. A basic CNN architecture consists of an input layer, a convolution layer, a pooling layer, a fully connected layer, and an output layer. The convolution operation generates local feature maps, and the pooling operation provides translation invariance [9]. The connections between these layers are sparser than in classical ANN structures.
CNNs are favored for computer vision tasks due to the grid structure of the input data, and the sparseness and locality of interactions between layers [89]. Typical CNN-based target detection algorithms for optical images fall into two categories: region proposal-based and regression-based. The former includes R-CNN [90], SPP-Net [91], Fast R-CNN [92] and Faster R-CNN [93]; the latter includes YOLO [94], SSD [95], etc. Recently, these CNN-based methods have been widely used in RTD, where they offer faster speed and higher detection and localization accuracy than CFAR.
Radar echo is a one-dimensional discrete time sequence, but the radar input data usually include multiple range cells over a certain time, so diverse forms of data can be presented to the detection network. The accuracy of deep learning also scales with the amount of data. To exploit this, the received radar signals are reshaped into various images to fit the CNN input format. Therefore, in addition to the original received radar signals, most researchers have tried to detect and measure targets hidden in noise and clutter from multi-dimensional information, using multiple CNN models with range-Doppler spectra, pulse-range maps, time-frequency images, synthetic aperture radar (SAR) images and plan position indicator (PPI) images as the respective inputs, as listed in Figure 10.

Radar Received Echo
Many studies have concluded that traditional radar signal processing methods, serving as preprocessing of the training data, are beneficial for extracting features and improving detection performance. However, conventional processing methods such as matched filtering and coherent accumulation are essentially convolution computations, and DNN models can extract features from inputs automatically, so it is feasible to detect targets from original radar echoes with DNNs. Jiang et al. [49] proposed a CNN-based model for RTD which works on the radar echo signal directly and therefore avoids conventional signal processing. The proposed model, presented in Figure 11, explores the time and frequency domains of the radar echo. The echo signal is a one-dimensional discrete complex sequence, so the input data are constructed as a radar echo cube to fit the network. The main goal of RTD is not only to distinguish the target from noise, but also to predict position and velocity. The RD-Detection Net is used to measure range and velocity, while azimuth and elevation are predicted by the Angle-Detection Net. The CNN-based model achieved better detection accuracy and performance than traditional approaches.

As in previous analyses, it is feasible and reasonable to implement a complete "end-to-end" multi-task RTD learning scheme, and a DNN-based learning scheme can also be utilized for more complex RTD tasks. For specific or more complex tasks, the received radar signals are transformed from the raw echo data into diverse data forms using different transform methods.
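The remark that matched filtering is essentially a convolution can be made concrete with a short sketch. It is only an illustration under stated assumptions, not the processing chain of [49]: the transmitted pulse is assumed to be a 13-chip Barker code, the echo is a noise-free delayed copy, and the names `matched_filter`, `BARKER_13` and `echo` are hypothetical.

```python
def matched_filter(echo, pulse):
    """Cross-correlate the echo with the pulse over all fully overlapping lags.

    This equals a convolution of the echo with the time-reversed, conjugated
    pulse, which is why a convolutional layer can in principle learn it.
    """
    num_lags = len(echo) - len(pulse) + 1
    return [
        sum(echo[lag + k] * pulse[k].conjugate() for k in range(len(pulse)))
        for lag in range(num_lags)
    ]


# Assumed waveform: 13-chip Barker code (an illustrative choice).
BARKER_13 = [1.0, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0]

# Toy echo: the pulse returned from a target at a delay of 7 samples.
delay = 7
echo = [0.0] * 40
for k, chip in enumerate(BARKER_13):
    echo[delay + k] += chip

output = matched_filter(echo, BARKER_13)
peak_lag = max(range(len(output)), key=lambda i: abs(output[i]))
print(peak_lag, abs(output[peak_lag]))  # prints: 7 13.0
```

The output peaks at the target delay, where all 13 chips add coherently. A learned convolution kernel plays the same role as the fixed pulse replica, but its weights are fitted from data.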

Range-Doppler Spectrum
Following the idea of image processing with CNNs, a range-Doppler spectrum can be considered as an "image" which the detector classifies as target absent or target present; the detection task can thus be treated as a classification task. Based on this idea, Wang et al. [66] designed a CNN target detector based on the range-Doppler spectrum and compared it with traditional CFAR detectors. The proposed network architecture, presented in Figure 12, is an 8-layer CNN comprising 2 convolutional layers, 2 ReLU layers, 2 max-pooling layers, and 2 fully connected layers. Range-Doppler spectra with multiple SNR values are constructed as the input of the CNN detector. The detector is in effect a sliding-window detector: a fixed window slides over the range-Doppler spectrum, and the network decides whether each window contains targets or just noise.
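A rough sketch of this sliding-window idea is given below. It is not the network of [66]: the range-Doppler map is formed with a plain DFT over the pulses, the CNN classifier is replaced by a placeholder score (the peak magnitude in the window), and all names and sizes are hypothetical.

```python
import cmath


def range_doppler_map(pulses):
    """pulses[n][r] is the complex sample of pulse n at range bin r.

    Returns magnitudes rd[k][r] of the DFT taken over the pulse index n.
    """
    num_pulses, num_bins = len(pulses), len(pulses[0])
    rd = [[0.0] * num_bins for _ in range(num_pulses)]
    for k in range(num_pulses):
        for r in range(num_bins):
            acc = sum(
                pulses[n][r] * cmath.exp(-2j * cmath.pi * k * n / num_pulses)
                for n in range(num_pulses)
            )
            rd[k][r] = abs(acc)
    return rd


def sliding_windows(rd, size, stride):
    """Yield (top, left, patch) for each square window over the map."""
    for top in range(0, len(rd) - size + 1, stride):
        for left in range(0, len(rd[0]) - size + 1, stride):
            yield top, left, [row[left:left + size] for row in rd[top:top + size]]


def window_score(patch):
    """Placeholder for the CNN classifier: peak magnitude inside the window."""
    return max(max(row) for row in patch)


# Toy scene (hypothetical): one target in range bin 5 at Doppler bin 3, no noise.
NUM_PULSES, NUM_BINS = 16, 12
TARGET_BIN, TARGET_DOPPLER = 5, 3
pulses = [
    [
        cmath.exp(2j * cmath.pi * TARGET_DOPPLER * n / NUM_PULSES) if r == TARGET_BIN else 0j
        for r in range(NUM_BINS)
    ]
    for n in range(NUM_PULSES)
]
rd = range_doppler_map(pulses)
best = max(sliding_windows(rd, size=4, stride=4), key=lambda w: window_score(w[2]))
```

In this toy case the highest-scoring window is the one whose rows and columns cover Doppler bin 3 and range bin 5. In a CNN detector, `window_score` would be replaced by the trained network's target-present output.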
A deep learning method for automotive radar detection with non-image-like Range-Doppler data is proposed in [67], where Brodeski et al. described a CNN model to detect and localize targets in the Range-Doppler-Azimuth-Elevation space. The training data were collected during the calibration process and augmented with raw radar data. Radar signals are transformed to the Range-Doppler domain by Doppler processing and used as the input to the network. Compared with the conventional CA-CFAR method, the proposed approach achieved better detection performance while retaining real-time capability. Their method for data construction was inspired by a simulation-based method for generating synthetic automotive scenes in [68].

Pulse-Range Maps
Radar echoes can also be processed as two-dimensional pulse-range images for training and testing. The problem of RTD can therefore be transformed into a target detection and localization problem in pulse-range images. Pan et al. [69] introduced a CNN model for small marine target detection in strong sea clutter. A modified Faster R-CNN is utilized to extract the features of small targets and sea clutter, and the extracted features are then used to detect and locate the target in the pulse-range image. Figure 13 presents the DNN architecture for RTD based on Faster R-CNN with pulse-range images. This method proved to make it easier to locate small targets in sea clutter and to obtain a high P_D, which overcomes the weakness of the traditional CFAR detection methods.

Classic CNN-based object detection methods, especially Faster R-CNN and SSD, have been widely applied to RTD. Girshick et al. [92] transformed the detection task into a classification task and proposed the Fast R-CNN structure. Ren et al. [93] introduced a region proposal network (RPN) to implement end-to-end target detection, with Faster R-CNN and the RPN sharing convolutional features [96]. The target detection process of Faster R-CNN is performed in two steps: the RPN first proposes regions that are likely to contain a target, using anchors to provide a coarse position for each region; the network then classifies each proposed target and fine-tunes its position by regression. The structure of Faster R-CNN is presented in Figure 14, and this architecture has been widely used as the basic framework for RTD, not only in pulse-range images, but also in SAR images and PPI images.
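The role of anchors and box regression in the two-step process can be sketched as follows. This is a simplified illustration rather than the implementation used in any of the reviewed works: the scales and aspect ratios are the defaults reported for optical images in the Faster R-CNN paper and would likely need tuning for radar images, and the function names are hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) centred at (cx, cy); each has area scale**2 and height/width = ratio."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def apply_deltas(anchor, deltas):
    """Refine an anchor with regression offsets (dx, dy, dw, dh).

    Centre shifts are relative to the anchor size; size changes are log-scale.
    """
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * math.exp(dw), h * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Nine coarse candidate boxes at one feature-map location (hypothetical position).
anchors = make_anchors(300.0, 200.0)
# A predicted offset that shifts the square 256-pixel anchor to the right by 10% of its width.
refined = apply_deltas(anchors[4], (0.1, 0.0, 0.0, 0.0))
```

Each location thus contributes a small set of coarse boxes, and the regression branch only has to predict small corrections relative to the best-matching anchor.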

SAR Images
Some radar image signals, such as SAR data, can be inputted as images. Target detection based on SAR images is a key step in ATR, because SAR can provide high-resolution radar images of a wide range of scenarios in all weather conditions and at all times of day. However, as mentioned earlier, SAR imaging technology is another application branch of radar signal processing, and SAR images are widely used in ATR. In this review, RTD tasks that use SAR images as the input mainly focus on target feature extraction, target detection and localization.
CNN is a common deep learning architecture for target detection in SAR images. Wang et al. [74] presented a method and ideas for applying CNNs to target detection in SAR images, laying a foundation for future research. Yang et al. [75] adopted a DNN regression method for SAR image target detection based on an improved YOLO structure [94]. This network is effective for extracting features with low resolution and complex composition. Zheng et al. [76] proposed a multi-feature target detection method for SAR images aimed at obtaining the target's actual position, which is presented in Figure 15. A CNN model was applied to capture deep features while an ANN was adopted to analyze hand-crafted features. The features of the two sub-channels are then concatenated in the main channel, and the experimental results showed that the multi-feature method outperformed other methods.
In addition, Faster R-CNN [93] has been modified to improve detection performance in SAR images. Kang et al. [70] introduced a CNN-based detection method combining features and pixels, in which the Faster R-CNN framework was modified with the traditional CFAR detection algorithm for small-sized target detection, as presented in Figure 16. For targets with clearer shape and structure, Faster R-CNN gave higher classification scores, while for small-sized targets with smaller bounding boxes, the classification score was relatively low. CFAR was employed because this detector relies on the amplitude of the pixel rather than the shape or structure of the target. The combination of detectors based on deep features and pixels can improve the detection performance for multi-scale targets. An SAR image differs from an optical image in that it reflects the electromagnetic characteristics of the target. Making full use of electromagnetic characteristics in feature extraction for RTD would help improve detection performance. Zhang et al. [77] fused electromagnetic and geometrical characteristics and fed the fused features into Faster R-CNN; the architecture is presented in Figure 17. In their work, convolutional layers were utilized to extract geometric features of SAR images, as in the processing of optical images. Meanwhile, an SVM was applied to extract electromagnetic characteristics from the complex data.
SSD is a single DNN designed to detect targets in images, which consists of a base network and an auxiliary structure [95]. Zhao et al. [78] presented an SSD-based approach to deploy a maritime target detection network model for SAR on an embedded device. As shown in Figure 18, a truncated VGG-16 [97] is adopted as the backbone. The auxiliary structure generates the following key features for detection: convolution predictors, multi-scale feature maps, aspect ratios, and default boxes. Experiments based on the Gaofen-3 (GF-3) spaceborne SAR dataset showed that this approach is practical and extensible. Ma et al. [98] proposed a modified SSD model and designed a complete workflow for detecting different targets in large-scale GF-3 SAR images.

PPI Images
Although there are different visualizations of radar displays, such as the P-display and the A-display, they are all produced through radar signal processing and follow a similar process. Radar PPI images indicate all or part of the range, azimuth, elevation or height data. Mou et al. [71] proposed an improved Faster R-CNN method for marine target detection using radar PPI images. VGG16 and ResNet101 were used as backbone network models to extract target features. They modified Faster R-CNN in four aspects: (1) the focal loss [99] is used instead of the classification loss to overcome sample imbalance; (2) ROI Pooling is replaced by precise ROI Pooling [100] to reduce the precision loss in the process of scale unification and enhance the accuracy of pooling; (3) NMS is replaced by soft-NMS [101] to reduce missed detections; (4) ReLU is replaced by ELUs [102] to speed up convergence and avoid vanishing gradients. Experimental results showed that, compared with the traditional Faster R-CNN, the modified approach has better detection performance in terms of stability and accuracy.
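Two of the modifications listed above, the focal loss and soft-NMS, can be sketched in a few lines. This is a minimal illustration of the general techniques in [99] and [101], not the code of [71]: the binary form of the focal loss, the Gaussian score decay, and the parameter values (`gamma`, `alpha`, `sigma`, `score_min`) are assumptions chosen for the example.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted target probability p (0 < p < 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The factor (1 - p_t)**gamma down-weights samples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_min=0.001):
    """Gaussian soft-NMS: overlapping boxes have their scores decayed instead of being removed."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(remaining, key=lambda item: item[1])
        remaining.remove(best)
        kept.append(best)
        decayed = [(b, s * math.exp(-iou(best[0], b) ** 2 / sigma)) for b, s in remaining]
        remaining = [(b, s) for b, s in decayed if s >= score_min]
    return kept


# Toy detections (hypothetical): two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In the toy example the second box overlaps the first with an IoU of about 0.82. Hard NMS with a typical threshold of 0.5 would discard it, whereas soft-NMS keeps it with a reduced score, which is how closely spaced targets can avoid being missed.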


Time-Frequency Images
The development of micro-Doppler technology also provides a valid method for target detection [103,104]. Many targets exhibit micro-motion, which causes the amplitude and phase of the electromagnetic wave scattered by their moving parts to change periodically or irregularly. In other words, the micro-motion signature induced by the high-speed moving parts of a target is known as micro-Doppler.
Extracting target micro-motion features has always been one of the challenges in RTD. A common method for time-frequency analysis of signals is the Wigner-Ville distribution (WVD). Risueno et al. [72] introduced a WVD-CNN detector for RTD that uses fewer free weights than the conventional MLP scheme. Su et al. [73] investigated CNN-based methods (LeNet [105] and GoogLeNet [106] models) for maritime target detection under different polarizations and sea states. In this work, the short-time Fourier transform (STFT) is adopted to convert the radar signal (IPIX measured data) into two-dimensional time-frequency images of the target and the clutter. According to the experimental results, LeNet is more efficient in echo signal preprocessing, while GoogLeNet achieves better detection performance, in terms of $P_D$ and $P_{fa}$, under different polarization modes and sea states.

Summarization of Different Structures for RTD
ANN-based models can be used as detectors that differentiate targets from noise or clutter with high detection performance, especially in time-varying and nonhomogeneous environments where the parameters of the background change. The robustness of the detector is improved by training the network on different scenarios corresponding to different noise and clutter distribution parameters. Therefore, an ANN-based detector is more robust and can provide higher detection performance than the classical CFAR detector.
DNN-based models can be designed for different RTD tasks with different input forms. The purpose of RTD is to identify whether the radar echo under test contains a signal from the target or only noise and, more importantly, to obtain multidimensional information on position and motion. DNNs have proved well suited to RTD, and more complex architectures with different training schemes can be adopted for more complicated RTD scenes. In addition, various preprocessing methods from radar signal processing can extract features effectively, which helps to improve detection performance. Fusing traditional detection methods with the new concepts of deep learning has become a promising trend and solution in RTD applications.
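For reference, the classical baseline mentioned above can be sketched as a one-dimensional cell-averaging CFAR (CA-CFAR) detector. This is a minimal sketch under stated assumptions: square-law detected samples, exponentially distributed noise power, and the standard CA-CFAR threshold factor alpha = N (P_fa^(-1/N) - 1) for N training cells. The function and parameter names are hypothetical and do not come from any of the reviewed papers.

```python
import numpy as np

def ca_cfar(power, num_train=16, num_guard=2, pfa=1e-3):
    """1-D cell-averaging CFAR on square-law detected samples (sketch).

    power: non-negative power samples along range, shape (N,)
    num_train: total number of training cells (even; half on each side)
    num_guard: guard cells on each side of the cell under test
    pfa: design false alarm probability (assumes exponential noise power)
    """
    power = np.asarray(power, dtype=float)
    n = len(power)
    half = num_train // 2
    # Threshold factor for CA-CFAR with num_train reference cells
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    detections = np.zeros(n, dtype=bool)
    # Edge cells without a full reference window are left undetected
    for i in range(half + num_guard, n - half - num_guard):
        lead = power[i - num_guard - half:i - num_guard]
        lag = power[i + num_guard + 1:i + num_guard + 1 + half]
        noise_level = (lead.sum() + lag.sum()) / num_train
        detections[i] = power[i] > alpha * noise_level
    return detections
```

Because the threshold scales with the local noise estimate, the false alarm rate stays approximately constant when the noise level changes, but the estimate degrades in nonhomogeneous clutter, which is the situation in which the learned detectors discussed above are reported to help.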

Summary of Datasets and Performance Evaluation
The effectiveness of deep learning-based methods depends largely on the quantity of training data available. The availability of a labelled dataset is regarded as a prerequisite for applying deep learning methods to a given application. Although several public datasets are available for image processing, speech recognition, natural language processing, etc., there are very few for radar (except SAR images), let alone sampled radar data for deep learning-based RTD. In this review, we summarize the IPIX and CSIR experimental datasets, which have been used for RTD by some researchers. Although these two measured datasets have been used for RTD, they are not generally public because they are domain-specific. Therefore, we collect and summarize the related radar experimental parameters of these two datasets, which can help readers to understand the existing datasets or to construct simulation datasets.

IPIX Database
The IPIX database is widely used for sea-surface small target detection and for sea clutter characteristic analysis and modeling. It was collected and maintained by Prof. Haykin's research group at McMaster University. The IPIX database contains two data sets: one was collected by the popular intelligent pixel (IPIX) processing radar in staring mode in the city of Dartmouth, Canada, in 1993 [64,107], and the other came from the IPIX radar in the Grimsby area of Canada in 1998 [108]. The 1993 Dartmouth database contains 339 data sets that cover a wide range of wave and wind conditions. About 14 data files are particularly useful for testing algorithms that aim to detect small objects in sea clutter. Each of these target data files has a weak target in one of its range bins and is more than 2 min long; a subset is available in [109]. The Grimsby experiment upgraded the quantization bits and measured sea clutter data at different range resolutions; in this way, weak clutter signals and strong targets can be observed simultaneously without large quantization errors or clipping. About 222 datasets in the Grimsby database focus specifically on floating targets of various sizes, and the actual Grimsby data files are available in [110]. However, the related target information and auxiliary sea state information have yet to be released [111]. The parameters of the IPIX radar, the experimental parameters and the sea state information are summarized in Table 3.
There are limitations to using the IPIX database. The two datasets cover only restricted environmental conditions and limited relative positions, and important auxiliary information was not recorded in detail, especially for the measured data from 1998 [111]. The other limitation is that the radar echoes are reflected only from motionless floating objects, not moving ships, so researchers inevitably have to add simulated target echo data. Li et al. [64] adopted the IPIX database to extract features for surface small target detection. Chen et al. [112] validated the proposed micromotion target detection method with a simulated IPIX dataset. Su et al. [72] used IPIX measured target signals and sea clutter data to conduct CNN training and testing for maritime target detection. It is worth mentioning that the results obtained in [72] on measured data are significantly worse than those obtained on simulated data. Besides the diversity of the measurements themselves, the low speed and acceleration of the targets observed by the IPIX radar can also lead to this degradation.

CSIR Database
The CSIR dataset was collected in two sea clutter and ship target echo measurement trials conducted by the Defense, Peace, Safety, and Security Unit of the Council for Scientific and Industrial Research (CSIR) on the southwest coastline of South Africa. The first trial was conducted at the Overberg Test Range (OTR) near Arniston in July 2006 with the Fynment radar [113,114]. The second measurement trial was conducted on 4 November 2007 with an experimental monopulse radar deployed on top of Signal Hill in Cape Town [115,116]. The radar parameters and experimental parameters are listed in Table 4. The CSIR dataset contains a large amount of sea clutter and target echo data, covering multiple parameter combinations (various transmitted waveforms, azimuths and distances) under different environmental conditions. The radar operating parameters, marine environment parameters, GPS auxiliary data and the types of cooperative target (inflatable boat, motor yacht and fishing boat) can compensate for the limitations of the IPIX database. A detailed trial design scheme and trial data records are also maintained, which can provide reference and guidance for similar trials [108]. Currently, many research institutes are using the CSIR dataset in related research. In Pan's experiment [69], the dataset TFC15_008, collected with the Fynment radar, was used to demonstrate that, compared with traditional CFAR detection methods, the DNN approach can easily detect small targets in sea clutter with low SCR and accurately locate them. Chen et al. [112] employed the CSIR database to validate the detection performance for marine maneuvering targets with translational motion.
In fact, from Table 2 in Section 2, we can conclude that, apart from SAR images, most research teams construct and label radar datasets for deep RTD by simulation, because actual radar data are hard to obtain. The simulation parameters of the radar waveform can also be taken from Tables 3 and 4.

Data Preprocessing and Construction
To boost target detection performance, appropriate preprocessing is necessary. Across the papers reviewed, radar received echoes, Pulse-Range maps, Range-Doppler spectra and Time-Frequency images are the inputs most widely used in RTD. All of them are derived from the raw radar echo by a chain of signal processing methods or mathematical operations. In the signal processing chain of a pulsed coherent radar, the received signals undergo pulse compression, coherent accumulation, and clutter suppression. Therefore, to allow a better comparison, we discuss the inputs that the literature has used for deep learning-based RTD. Figure 19 shows the preprocessing flow of datasets for RTD.
Figure 19. Preprocessing diagram of the dataset for the radar detection system.

Radar Received Echoes and Radar Cube
A typical pulsed radar system is considered in which a transmitted waveform $S(t)$ is emitted at a certain interval. The received echoes $S_r(t, k)$, which include target signals and noise, are modulated in amplitude, time delay and Doppler frequency and then sampled at a given rate as (3):

$$S_r(t, k) = \sum_{n} \sigma_n S(t - \tau_n)\, e^{j\phi_{n,k}} + W(t) \tag{3}$$

where $t = 1, 2, \ldots, R$ (fast time). In the incoming radar echoes, $\sigma_n$ denotes the reflectivity amplitude, which can be calculated from the radar equation, $\tau_n$ is the time delay of target $n$, and $W(t)$ is white Gaussian noise, an independent complex Gaussian random variable with zero mean. $e^{j\phi_{n,k}}$ denotes the Doppler shift of each target; for a target with constant velocity, it can be defined as (4):

$$\phi_{n,k} = \phi_{n,k-1} + 2\pi \frac{2 \nu_n f_c}{c} T_r \tag{4}$$

where $k = 1, \ldots, M$ (slow time), $M$ is the total number of pulses in a coherent processing interval (CPI), $\phi_{n,0} = 0$ is assumed, $\nu_n$ is the radial velocity of target $n$, $f_c$ is the radar carrier frequency, $T_r$ is the pulse repetition interval and $c$ is the propagation velocity.
Simulations with a fluctuating target model and a noise background are carried out to construct the radar received echoes for RTD. An example of how different targets stand out at different SNRs is presented in Figure 20. The emitted pulse is assumed to be a unit vector that does not incorporate any compression gain or additional antenna gain. The targets are assumed to fluctuate slowly and follow the standard Swerling distribution, where the value of $\sigma_n$ varies randomly; this fluctuation is assumed to account indirectly for the impact of path propagation and other environmental effects. By time- and frequency-domain analysis of the radar received echoes, target or environment characteristics can be obtained. Using raw data as the input for RTD training without any preprocessing is a complete "end-to-end" task, and the radar echo cube, presented in Figure 21, enables the network to exploit temporal and spatial correlation simultaneously [49]. Prashant et al. [58] used radar received echoes to detect signals in a non-Gaussian noise environment. However, raw data may require a deeper network to extract target features; in other words, the increasingly complex detection environment makes "end-to-end" tasks more challenging.
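The simulation setting described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of any cited work: the pulse is a single unit sample, each target amplitude is drawn once per CPI from a Rayleigh distribution (a Swerling-I-like slow fluctuation), the noise is unit-power complex white Gaussian, and the pulse-to-pulse phase step is taken as the two-way Doppler phase 4 pi v f_c T_r / c. The function name, the default carrier frequency, pulse repetition interval and target list are all hypothetical.

```python
import numpy as np

def simulate_echo_cube(num_range=256, num_pulses=32,
                       targets=((100, 5.0), (180, -3.0)),
                       snr_db=10.0, fc=10e9, tr=1e-3, c=3e8, seed=None):
    """Simulate one CPI of echoes, shape (num_pulses, num_range) (sketch).

    targets: iterable of (range_bin, radial_velocity_m_per_s) pairs
    snr_db: mean single-pulse SNR of each target, noise power fixed to 1
    """
    rng = np.random.default_rng(seed)
    k = np.arange(num_pulses)  # slow-time (pulse) index
    cube = np.zeros((num_pulses, num_range), dtype=complex)
    mean_amp = 10.0 ** (snr_db / 20.0)
    for range_bin, velocity in targets:
        # One amplitude draw per CPI: slow fluctuation, E[sigma^2] = mean_amp^2
        sigma = mean_amp * rng.rayleigh(scale=1.0 / np.sqrt(2.0))
        # Linear phase progression across pulses for a constant velocity
        phase = 4.0 * np.pi * velocity * fc * tr / c * k
        cube[:, range_bin] += sigma * np.exp(1j * phase)
    # Complex white Gaussian noise with unit average power
    noise = (rng.standard_normal(cube.shape)
             + 1j * rng.standard_normal(cube.shape)) / np.sqrt(2.0)
    return cube + noise
```

Labelled training examples for a "target present" versus "target absent" classifier can then be produced by calling the function with and without entries in `targets`, and varying `snr_db` yields examples at different SNRs.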

Pulse-Range Maps
Radar emits multiple pulses in the CPI, and the reflected signals are integrated coherently or incoherently. The axes of the original echo data can be labelled 'fast' and 'slow' time, where fast time is used to calculate distance and slow time is used to calculate Doppler velocity (see Figure 22). Pulse compression is applied to the echo signal via a standard matched filtering operation, using the reference signal over fast time, which is calculated as (5):

$$\hat{S}_r(t, k) = \sum_{t'=1}^{R} S_r(t', k)\, S^{*}(t' - t) \tag{5}$$

where $S^{*}$ denotes the complex conjugate of the transmitted reference waveform. Pulse compression through a matched filter yields a narrow pulse width and a high-resolution range profile, but has no impact on the radar detection range. Therefore, the problem of RTD can be converted into target detection and location in Pulse-Range maps. The upper part of Figure 22 shows an example of a Pulse-Range map. Akhtar et al. [60] used Pulse-Range maps to detect targets in a noisy background, and in [69], Pulse-Range maps were applied to ranging and detecting targets in sea clutter.
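The matched filtering step can be sketched in NumPy as a correlation of every received pulse with the reference waveform along fast time. The linear frequency modulated (LFM) chirp used as the reference here, and the helper names, are assumptions made for illustration; the reviewed papers use their own waveforms.

```python
import numpy as np

def lfm_chirp(length=32, bandwidth=0.5):
    """Unit-amplitude discrete LFM chirp (hypothetical reference waveform)."""
    n = np.arange(length)
    return np.exp(1j * np.pi * bandwidth * (n - length / 2.0) ** 2 / length)

def pulse_compress(echoes, reference):
    """Matched-filter each pulse (row) along fast time (sketch).

    echoes: complex array, shape (num_pulses, num_range)
    reference: complex transmitted waveform, shape (L,)
    Returns a Pulse-Range map with the same shape as `echoes`.
    """
    echoes = np.atleast_2d(echoes)
    num_range = echoes.shape[1]
    start = len(reference) - 1  # index of zero lag in the 'full' output
    out = np.empty(echoes.shape, dtype=complex)
    for k, pulse in enumerate(echoes):
        # np.correlate conjugates its second argument, which is exactly
        # the matched filter for a complex reference waveform
        full = np.correlate(pulse, reference, mode="full")
        out[k] = full[start:start + num_range]
    return out
```

With this alignment, an echo whose first sample falls in range bin d produces a compressed peak at index d, so the output rows can be stacked directly into a Pulse-Range map.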

Range-Doppler Maps
After all pulses have been gathered, Doppler processing is performed by applying the fast Fourier transform (FFT) over slow time at each range. The slow-time dimension of $\hat{S}_r(t, k)$ is first multiplied by a window function, and the FFT is then used to yield a range-Doppler spectrum as (6):

$$RD(t, f) = \mathcal{F}_k\left\{ \hat{S}_r(t, k)\, \mathrm{win}(k) \right\} \tag{6}$$

where win is a windowing function, $\mathcal{F}_k$ denotes the discrete Fourier transform (DFT) taken over the slow-time index $k$, and $f$ indexes the Doppler bins. Following the DFT, targets moving at constant velocity appear concentrated in Doppler. An example of Doppler processing after matched filtering is shown in Figure 22. The Range-Doppler map in Figure 22 shows three targets at different Doppler bins: a close target at a range of 180 samples with a positive velocity, which indicates that the target is moving towards the radar (shown as the orange grid in Figure 22), and two moving targets at the same range of 250 samples but in different Doppler bins (shown as the blue and green grids, respectively). Multiple frames are extracted from the raw data, producing a sequence of Range-Doppler frames. Range-Doppler maps with targets absent and present are presented in Figure 23. The target echo occupies multiple cells, forming a mountain-like shape in the spectrum. If the SNR is low, the two targets at the range of 250 samples are not obvious in the spectrum. Researchers often treat the Range-Doppler map as an image and use a classifier to label the image as target absent or target present. Target-present examples with different SNRs can be generated by setting different noise powers. In Refs. [62,65-67], range-Doppler maps were utilized in various RTD tasks.
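The Doppler processing step described above amounts to a windowed FFT along the slow-time axis. The sketch below assumes the input is a Pulse-Range map stored as (pulses, range bins) and uses a Hann window as one possible choice of windowing function; the function name is hypothetical.

```python
import numpy as np

def range_doppler_map(pulse_range, window=np.hanning):
    """Windowed FFT over slow time (sketch).

    pulse_range: complex array, shape (num_pulses, num_range)
    window: callable returning a length-num_pulses window (assumed Hann)
    Returns a complex array, shape (num_pulses, num_range), with zero
    Doppler shifted to the centre row.
    """
    num_pulses = pulse_range.shape[0]
    win = window(num_pulses)[:, None]  # broadcast over range bins
    spectrum = np.fft.fft(pulse_range * win, axis=0)
    return np.fft.fftshift(spectrum, axes=0)
```

The magnitude (or log-magnitude) of the returned array is what is usually passed to an image classifier, and rows above the centre correspond to positive Doppler under this FFT sign convention.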

Time-Frequency Images
Because target micro-motion is time-varying, time-frequency analysis is an effective and powerful tool for analyzing the resulting non-stationary signals. After demodulation and pulse compression are applied to the radar echo, the STFT is often adopted to convert the radar echo into two-dimensional time-frequency images, as shown in (7):

$$\mathrm{STFT}(t, f) = \int_{-\infty}^{\infty} s(u)\, g(u - t)\, e^{-j 2\pi f u}\, du \tag{7}$$

where $s(u)$ is the radar echo after demodulation and pulse compression and $g(t)$ is a narrow windowing function such as a Hamming window. Since the micro-motion characteristics of targets differ from those of background noise or clutter, time-frequency images are used to build training and testing datasets. Su et al. [72] adopted a time-frequency analysis method to analyze micro-motion characteristics, thereby differentiating targets from clutter.
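A discrete version of this transform can be sketched with NumPy alone by sliding a Hamming window along the signal and taking an FFT of each frame. The frame length and hop size below are arbitrary assumptions, and the function name is hypothetical; each frame uses its own start as the phase reference, so only the magnitude is directly comparable with the continuous definition.

```python
import numpy as np

def stft(signal, win_len=64, hop=16):
    """Short-time Fourier transform with a Hamming window (sketch).

    signal: 1-D complex or real array
    Returns an array of shape (win_len, num_frames): frequency by time,
    with zero frequency shifted to the centre row.
    """
    signal = np.asarray(signal)
    window = np.hamming(win_len)
    starts = range(0, len(signal) - win_len + 1, hop)
    # Each row of `frames` is one windowed segment of the signal
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    spectrum = np.fft.fft(frames, axis=1)
    return np.fft.fftshift(spectrum, axes=1).T
```

Taking `np.abs` of the result gives the time-frequency image; a periodic micro-Doppler modulation would appear as a curve that oscillates around the Doppler row of the target body.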

Performance Evaluation
RTD is a fundamental process for separating targets of interest from background noise. A primary goal of RTD is to satisfy two contradictory requirements: achieving a high $P_D$ with a low $P_{fa}$. The common metric of detection performance is the $P_D$ achieved by the DNN-based detector and by the CFAR detector at the same $P_{fa}$ [59,70]. $P_D$ and $P_{fa}$ are defined as (8) and (9):

$$P_D = \frac{N_{td}}{N_{total\_targets}} \tag{8}$$

$$P_{fa} = \frac{N_{fd}}{N_{de\_targets}} \tag{9}$$

where $N_{td}$ is the total number of truly detected targets, $N_{total\_targets}$ denotes the total number of targets in the sample, $N_{fd}$ is the total number of falsely detected targets (non-targets mistaken for targets), and $N_{de\_targets}$ is the total number of non-targets in the sample.
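The two ratios defined above can be computed from per-sample detector decisions and ground-truth labels as in the following sketch. The function name and the boolean-array interface are assumptions for illustration; the sketch also assumes the sample contains at least one target and at least one non-target.

```python
import numpy as np

def detection_metrics(detections, truth):
    """Estimate P_D and P_fa from binary decisions (sketch).

    detections: detector output per sample (True means target declared)
    truth: ground-truth label per sample (True means target present)
    Assumes both classes occur in `truth`, otherwise a ratio is undefined.
    """
    detections = np.asarray(detections, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    n_td = np.sum(detections & truth)    # truly detected targets
    n_fd = np.sum(detections & ~truth)   # non-targets declared as targets
    p_d = n_td / truth.sum()             # divide by total number of targets
    p_fa = n_fd / (~truth).sum()         # divide by total number of non-targets
    return float(p_d), float(p_fa)
```

To compare a DNN-based detector with a CFAR detector at the same false alarm probability, the decision threshold of each detector is first adjusted until `p_fa` matches the target value, and the resulting `p_d` values are then compared.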

Summarization of Dataset and Preprocessing
Actual radar data are not widely accessible at present because of the particular nature of RTD tasks. Although a few existing datasets, such as IPIX and CSIR, have been used for sea-surface target detection and sea clutter characteristic analysis, most recent research on RTD still relies on synthetic or simulated data. In this section, we described how to construct radar received echoes and how to preprocess three types of data that are widely used in RTD. These three data types are applied to different target detection tasks according to their respective characteristics. Since pulse compression yields a high-resolution range profile, pulse-range maps are mainly used in radar ranging tasks. Range-Doppler maps are more commonly used for position and velocity measurement in complex scenarios with background noise or clutter, because Doppler processing improves the SNR effectively. The time-frequency analysis method is efficient and especially suitable for targets with distinct micro-motion characteristics. An example is maritime target detection and recognition, which can make full use of the micro-motion information of the target and the clutter.

Discussion
Although deep learning-based approaches have had some successful applications in the field of RTD, the following challenges remain:

Dataset Deficiency
One of the difficulties in applying deep learning-based methods to RTD is the lack of publicly available labelled data. Unlike in other applications, radar data are costly to collect. Currently, radar system simulation modeling is one way to address the problem, but generating substitute data is computationally demanding and extremely challenging because multiple factors need to be considered, e.g., multiple attenuation, discrete cells and multipath reflections. Moreover, even if all of the above issues are handled, the simulation inevitably relies on mathematical models, which may introduce the problem of model fidelity. Similar inaccuracies have emerged in [72].
We still hope to obtain enough training data from real-world radar systems, which would certainly make the detection model more reliable. To further advance RTD research in the absence of realistic radar data, the following approaches can be considered: (1) data augmentation, which has been used in [67,68]; (2) developing deep learning-based algorithms that are more robust to insufficient training data, such as generative adversarial networks (GAN) [117]; (3) establishing more advanced RTD frameworks, which seems to be a new trend; (4) developing learning-based methods, such as meta learning [118,119] and transfer learning [120,121], which can overcome the limitation of insufficient data and the insensitivity to changes in the radar detection environment.

Varied Models in Complex Tasks
Although the application of deep learning technology in RTD has made remarkable progress, the existing literature on ANNs and DNNs for RTD is still relatively sparse and not mature enough. For instance, a common aspect of many of the papers cited above is the modest size of the ANN; the networks proposed in [58,61] contain only one hidden layer. In addition, although many other DNN architectures have been proposed, only the CNN is widely used in RTD. Therefore, there is plenty of room to explore various DNNs in RTD. A signal processing system that can efficiently suppress target RCS fluctuations, noise, clutter, and jamming has always been considered one of the key directions of radar research and development. How to effectively distinguish targets from strong active jamming signals is still under investigation. Whether other deep learning architectures can perform better in RTD also needs to be verified. These points also imply that newer, more diverse, yet practical and powerful learning schemes for more complex tasks are urgently required.
Advances in computing power allow large-scale deep learning models to be trained on massive data. Remarkable progress in deep learning algorithms and great advances in radar systems can benefit each other. It is worth exploring whether the entire radar signal processing chain can be replaced by deep learning methods.

Integrated Training Methods
Much of the existing research shows that the deep fusion of traditional signal processing methods and deep learning-based schemes is an evident trend in RTD applications [122]. On the one hand, as data preprocessing methods, traditional radar signal processing techniques such as pulse compression, coherent accumulation and STFT help to enhance features, thus improving detection performance. On the other hand, according to the preceding review, combining neural networks with typical radar signal processing approaches (e.g., CFAR) as a learning strategy can help to improve detection performance. In addition, we believe that models based on deep learning can provide an "end-to-end" framework for integrating perception, processing and decision making. Furthermore, simulation process optimization, performance evaluation criteria and other basic problems, such as model interpretability and generalization ability, are still under investigation.

Conclusions
Research efforts on artificial neural network and deep learning models in RTD have been discussed in this review. Various network architectures for various application schemes have been investigated. The results obtained show that deep learning-based detectors perform better than traditional processing methods to a certain degree in some specific cases. Although the study of deep learning in the field of RTD is at the initial stage and still faces some challenges, there is no doubt that the research and usage of deep