1. Introduction
With the rapid advancement of unmanned aerial vehicle (UAV) technology, UAVs have been widely utilized in reconnaissance, surveillance, and aerial photography. However, the increasing frequency of unauthorized UAV flights highlights the urgent need for detection technologies specifically designed for low–slow–small targets. Passive radar (PR) is a radar system that uses non-cooperative ambient electromagnetic signals for the silent detection of targets in specific areas [1,2]. Compared to active radar, passive radar offers significant advantages [3,4,5], including strong anti-jamming capabilities, effectiveness against stealth targets, efficient spectrum utilization, and ease of networking and deployment. Given these advantages, passive radar shows significant potential for applications in military reconnaissance, civil aviation, and traffic monitoring [6,7].
As a bistatic radar system, passive radar typically uses two channels for signal reception: the reference channel captures the direct signal from the radiation source, while the surveillance channel receives the target’s reflected echoes [8]. Current target detection methods often enhance SNR through pulse compression or cross-ambiguity accumulation [9], focusing the target’s energy for improved detection. A statistical model is then constructed to generate detection statistics [10,11] based on the principle of ‘energy detection’. The traditional radar target detection process is shown in Figure 1. Traditional detection methods rely on expansion of the available signal dimensions [12]. Examples include one-dimensional CFAR detection for time-domain radar sequences [13] and two-dimensional techniques such as range-Doppler moving target detection, 2D CFAR in the range-azimuth domain, and space-time (spatial Doppler) processing. However, for extremely weak target echoes, insufficient accumulation time or model mismatch caused by complex target echoes may degrade the detection performance of these methods [14,15]. In addition to the traditional ‘energy accumulation and target detection’ approaches, other techniques also enable effective target detection. In [16], sparse decomposition of direct and echo signals was used, eliminating the need for energy accumulation. In [17], the strengths of RNNs for time-series processing were demonstrated, using LSTM for radar signal prediction and anomaly detection, enabling target detection in cluttered sea environments. Thus, while energy accumulation enhances SNR and aids target detection, it is not essential for effective detection.
With the improvement of computer performance and the development of neural networks, deep learning has made significant progress in the intelligent information processing of radar data, particularly in addressing classification problems that traditional methods struggle to solve. Target detection is typically framed as a binary hypothesis testing problem [12], where the goal is to determine whether the received signal contains a target echo or only noise. Deep learning’s powerful ability to model high-dimensional data enables comprehensive characterization of target echo features, such as RCS fluctuations, target shape, and Doppler frequency. This facilitates the extraction of critical features from target echoes, distinguishing them from background noise and enabling effective detection. The target detection process based on deep learning is shown in Figure 2. Several studies have demonstrated the potential of deep learning in this field. The combination of auto-encoders and convolutional neural networks enables the extraction of deeper feature information [18]. RNNs can capture the correlated features of potential sequences within the time-frequency distribution matrix [19]. By leveraging a dual-channel convolutional neural network to capture multi-dimensional features of targets and clutter in both time and frequency domains, robust target detection can be achieved [20]. In [21], a multi-frame RD spectrum was used, and a sliding window was applied to obtain observation vectors, with detection performed by an MLP detector. By modeling the target echo as a frequency-modulated signal and transforming it into a time-frequency image, GoogLeNet and LeNet can be applied for classification [22]. In [23], a complex-valued UNet network was used to extract features from both the amplitude and phase of the echo to achieve clutter suppression and target detection. In [24], deep neural networks utilized the time-frequency features extracted from the echo cells to achieve detection. Based on the constructed pulse Doppler radar R-D image dataset, a CNN-based detection model was proposed that can accurately detect and locate UAV targets [25]. To address the problem of multi-scale and multi-scene SAR ship detection, a densely connected multi-scale neural network was designed within the Faster R-CNN framework [26]. To address the fact that the features in the range-Doppler map vary over time, a 3D-CNN was applied to the 3D representation of time-varying features [27]. In [28], the spectrum of the echo sequence was used as the feature of graph nodes. The adjacency matrix of the graph was computed using the spatiotemporal and power information between the sequences. Classification of the graph nodes was then performed using a graph convolutional network (GCN). By using the Visibility Graph (VG) algorithm, the radar echo sequence was modeled as a visibility graph. The graph was then classified using a GCN to achieve detection of weak sea surface targets [29]. In [30], by processing the measured sea clutter data in the time domain, the Laplacian matrix of the graph was constructed, and its maximum eigenvalue was extracted as the detection feature, thereby achieving target detection. It can be observed that deep learning in radar target detection has primarily focused on two-dimensional image processing, where SNR enhancement is achieved through preprocessing techniques such as integral transforms. While these methods improve the target echo SNR before detection, they do not inherently enhance the detector’s capability. Consequently, the detection of low–slow–small targets remains a significant challenge.
The cross-ambiguity function (CAF) [4] is a classical tool for time-delay parameter estimation. However, for accurate estimation of time-frequency differences across various target types, longer detection ranges and faster target velocities expand the search space, significantly increasing the computational load of the CAF. Existing methods of calculating the CAF can be broadly categorized into two types. The first involves exhaustive search. The FFT-based cross-ambiguity function calculation method treats the echo signal and the conjugate of the reference signal, with applied time delay, as a whole and performs FFT processing to traverse the Doppler ambiguity function at specific time delays [31]. The cross-correlation FFT method can obtain the ambiguity function results corresponding to all time delays at the given Doppler frequency [32]. The time-frequency peak search method effectively reduces the search time but still incurs inevitable computational redundancy [33]. The second involves non-exhaustive search: a multi-population particle swarm optimization algorithm effectively reduces the number of frequency points to search but is prone to falling into local optima [34].
This paper proposes an intelligent detection method for low, slow, and small (LSS) targets. First, deep feature extraction of the radar echo signals is performed using SE-ResNet34 [35]. The channel attention mechanism effectively captures dependencies between the network’s channels, enabling the detection of subtle changes in the radar echoes and accurately determining target presence. The proposed detection method leverages the powerful feature extraction capability of deep networks to characterize the background environment and target features, effectively overcoming the issue of statistical model mismatch in traditional digital signal processing. Moreover, this method directly uses the echo signal for network training, without the need for integration or transformation processing. Next, based on the PSO algorithm [34], a gray wolf upper-layer individual guidance and encircling search strategy is introduced, along with a multi-target search mechanism, facilitating the rapid estimation of time-frequency difference parameters for multiple targets while further reducing computational redundancy. Simulation and experimental data demonstrate that the proposed method enables intelligent detection of LSS targets and facilitates the rapid extraction of target time-frequency difference parameters.
2. Passive Radar Signal Model
The passive radar system based on DTMB (Digital Terrestrial Multimedia Broadcasting) signals primarily consists of three components: the target, the receiving station, and the transmitting station [8]. The transmitting station is a terrestrial DTMB transmitter with a known position. The receiving station typically uses two channels for signal reception: a reference channel and a surveillance channel. Its structural diagram is shown in Figure 3.
The reference channel is directed toward the transmitting station, primarily receiving the direct path signal as the reference signal ($s_{\mathrm{ref}}(t)$), which can be expressed as
$$s_{\mathrm{ref}}(t) = A_{\mathrm{ref}}\, d(t) + n_{\mathrm{ref}}(t) \tag{1}$$
where $d(t)$ denotes the complex envelope of the DTMB signal, $A_{\mathrm{ref}}$ denotes the amplitude scaling factor after the DTMB signal propagates through space, and $n_{\mathrm{ref}}(t)$ represents complex Gaussian white noise. The surveillance channel is directed toward the target area, receiving the echo signal ($s_{\mathrm{echo}}(t)$) scattered by the target, which can be expressed as
$$s_{\mathrm{echo}}(t) = \sum_{n=1}^{N} A_n\, d(t-\tau_n)\, e^{j2\pi f_{d,n} t} + \sum_{i=1}^{M} c_i\, d(t-\tau_{c,i}) + n_{\mathrm{echo}}(t) \tag{2}$$
where $A_n$ represents the amplitude scaling factor of the DTMB signal after scattering and propagation through space by the $n$-th target, $\tau_n$ denotes the time delay of the echo from the $n$-th target, $f_{d,n}$ represents the Doppler frequency of the $n$-th target, $c_i$ represents the complex amplitude of the received clutters, $M$ represents the number of multipath clutters, $\tau_{c,i}$ represents the time delay of each multipath clutter, $N$ is the number of targets, and $n_{\mathrm{echo}}(t)$ denotes complex Gaussian white noise. By using the ECA clutter suppression algorithm [36], the clutter can be removed, and the echo signal can be expressed as
$$s_{\mathrm{echo}}(t) = \sum_{n=1}^{N} A_n\, d(t-\tau_n)\, e^{j2\pi f_{d,n} t} + n_{\mathrm{echo}}(t) \tag{3}$$
As shown in
Figure 4, before clutter suppression, the energy of multipath clutter is significantly stronger than that of the target echo, completely masking the target and making detection impossible. However, after applying the ECA algorithm for clutter suppression, the target becomes more prominent.
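The clutter-removal step can be illustrated with a minimal least-squares sketch. It assumes the basic batch form of ECA, in which the surveillance signal is projected onto the subspace orthogonal to delayed copies of the reference signal. The function name `eca_suppress`, the parameter `num_delays`, and the restriction to zero-Doppler clutter are illustrative assumptions, not details taken from [36].

```python
import numpy as np

def eca_suppress(s_surv, s_ref, num_delays):
    """Remove zero-Doppler multipath clutter by least-squares projection.

    s_surv, s_ref : 1-D complex arrays of equal length
    num_delays    : number of delayed reference copies spanning the clutter
                    subspace (hypothetical tuning parameter)
    """
    n = len(s_ref)
    # Each column is the reference signal delayed by m samples (zero-padded).
    X = np.zeros((n, num_delays), dtype=complex)
    for m in range(num_delays):
        X[m:, m] = s_ref[:n - m]
    # Least-squares estimate of the clutter amplitudes, then subtract the fit.
    coeffs, *_ = np.linalg.lstsq(X, s_surv, rcond=None)
    return s_surv - X @ coeffs
```

A target whose delay lies outside the first `num_delays` samples, or whose Doppler shift is non-zero, is largely preserved because it is nearly orthogonal to the clutter subspace.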
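The cross-ambiguity function is calculated using the received direct path signal and the echo signal and is defined as follows:
$$\chi(\tau, f_d) = \int_{0}^{T} s_{\mathrm{echo}}(t)\, s_{\mathrm{ref}}^{*}(t-\tau)\, e^{-j2\pi f_d t}\, \mathrm{d}t \tag{4}$$
where $s_{\mathrm{ref}}(t)$ and $s_{\mathrm{echo}}(t)$ are the reference and echo signals, $\tau$ denotes the time delay, $f_d$ denotes the Doppler frequency, $T$ is the coherent integration time, and $(\cdot)^{*}$ represents the complex conjugation operation.
In the ideal case (without noise), it can be observed that the time delay ($\hat{\tau}$) and Doppler frequency ($\hat{f}_d$) of the echo signal relative to the reference signal correspond to the position of the maximum value in the range-Doppler matrix. This indicates that the time delay and Doppler frequency of the target echo can be expressed as follows:
$$(\hat{\tau}, \hat{f}_d) = \arg\max_{\tau,\, f_d} \left|\chi(\tau, f_d)\right| \tag{5}$$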
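In modern radar signal processing, radar signals are typically stored in a discrete digital format. The discrete form of Equation (4) is given by
$$\chi[l, k] = \sum_{n=0}^{N_s-1} s_{\mathrm{echo}}[n]\, s_{\mathrm{ref}}^{*}[n-l]\, e^{-j2\pi kn/N_s} \tag{6}$$
where $l = 0, 1, \ldots, L-1$, $L$ represents the number of signal time delay units, $k = 0, 1, \ldots, K-1$, $K$ represents the number of Doppler frequency units, and $N_s$ represents the number of sampling points within one coherent integration interval.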
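As a rough illustration of the discrete cross-ambiguity computation, the sketch below forms the product of the echo and the conjugated, delayed reference for each delay and takes an FFT over it, which is the FFT-based idea described for [31]. The function name `cross_ambiguity`, the zero-padding of the delayed reference, and the use of integer Doppler bin indices are assumptions made for this example.

```python
import numpy as np

def cross_ambiguity(s_echo, s_ref, max_delay, doppler_bins):
    """Discrete cross-ambiguity surface, one FFT per delay.

    max_delay    : number of delay units to evaluate (l = 0 .. max_delay - 1)
    doppler_bins : iterable of integer Doppler bin indices k (may be negative)
    Returns a complex array of shape (max_delay, len(doppler_bins)).
    """
    n = len(s_ref)
    doppler_bins = list(doppler_bins)
    caf = np.zeros((max_delay, len(doppler_bins)), dtype=complex)
    for l in range(max_delay):
        # Reference delayed by l samples, zero-padded at the start.
        ref_delayed = np.zeros(n, dtype=complex)
        ref_delayed[l:] = s_ref[:n - l]
        # FFT of the product gives sum_n prod[n] * exp(-j*2*pi*k*n/n_samples).
        spectrum = np.fft.fft(s_echo * np.conj(ref_delayed))
        for j, k in enumerate(doppler_bins):
            caf[l, j] = spectrum[k % n]  # negative k wraps to the upper bins
    return caf
```

The delay and Doppler estimates then follow from the location of the largest magnitude, for example with `np.unravel_index(np.argmax(np.abs(caf)), caf.shape)`.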
3. Intelligent Detection Method for Low–Slow–Small Targets
As shown in Figure 5, this paper proposes a deep learning-based method for intelligent detection of low–slow–small targets in passive radar, along with a fast multi-target time-frequency difference extraction method. These two methods are sequential: first, target detection determines if the echo contains a target, providing a basis for parameter extraction; then, the time-delay and Doppler frequency information of the targets are extracted using the multi-target method. The workflow is described as follows:
First, a small number of samples from the received radar echo signal are extracted for the creation of training and testing datasets. Since the radar signal is a complex sequence in practice, its real and imaginary parts are extracted, and the one-dimensional signal is transformed into a two-dimensional form to meet the network input dimension requirements, serving as two input channels for the neural network.
Next, the network is trained using the training set. The SE module’s channel attention mechanism and the residual module’s strong feature extraction capabilities are utilized to perform the ‘binary classification’ task for target detection.
Then, the trained network model is used for testing. If a target is detected, the process moves to the fast time-frequency difference extraction step; if no target is detected, a new sample is tested, and the process is repeated.
Finally, the fast extraction of target time-frequency differences is performed. By adding upper-layer individual guidance and an enclosing search strategy, the rapid computation of the ambiguity function is achieved. Additionally, a multi-target search mechanism is incorporated, enabling the proposed algorithm to perform a multi-target search.
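The input preparation in the first step of the workflow can be sketched as follows. The row-major reshape and the channel-first layout are assumptions, since the text states only that the real and imaginary parts become two two-dimensional input channels; `echo_to_network_input`, `height`, and `width` are hypothetical names.

```python
import numpy as np

def echo_to_network_input(echo, height, width):
    """Turn a 1-D complex echo into a (2, height, width) real-valued array.

    Channel 0 holds the real part and channel 1 the imaginary part.
    """
    echo = np.asarray(echo)
    if echo.size != height * width:
        raise ValueError("echo length must equal height * width")
    real = echo.real.reshape(height, width)
    imag = echo.imag.reshape(height, width)
    return np.stack([real, imag], axis=0).astype(np.float32)
```

Keeping the two parts as separate channels preserves the phase information of the echo, which an amplitude-only representation would discard.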
3.1. Target Detection Model Based on Channel Attention Mechanism
3.1.1. Squeeze-and-Excitation Blocks
It can be seen from the signal model in Section 2 that when a target is present, the radar echo contains features such as the target’s motion information and RCS scattering variations, which are significantly different from the noise signal alone. In traditional convolutional neural networks, convolutional layers typically process the spatial information of input feature maps independently, while channel information is handled separately within each convolutional channel. Squeeze-and-excitation networks, in contrast, adaptively recalibrate channels, enabling the network to more effectively capture inter-channel dependencies [35].
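As shown in Figure 6, the key to this network lies in performing the squeeze and excitation operations. First, the network applies global average pooling to the feature map of each channel, and its mathematical formulation is expressed as follows:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \tag{7}$$
where $u_c(i, j)$ is the value of the feature map of channel $c$ at spatial position $(i, j)$ and $z_c$ is the squeezed output for that channel.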
This operation ensures that each channel produces a single value, aggregating the global information of each channel and capturing its global feature. Suppose the input feature map is a tensor of size H × W × C, where H and W are the spatial dimensions and C is the number of channels. By performing global average pooling, a vector of size 1 × 1 × C is obtained. In radar echoes, signals that contain targets usually exhibit specific features, such as time delay and Doppler frequency. These features can be represented through the aggregation of global information. This is particularly important for the detection of low, slow, and small targets in radar echoes, as these targets may exhibit weak and localized signal characteristics.
Next, fully connected layers are employed to learn the weight relationships between channels. The excitation operation is performed on the compressed vector to adaptively adjust and generate the final weight for each channel. The mathematical formulation of this process is expressed as follows:
$$s = \sigma\left(W_2\, \mathrm{ReLU}(W_1 z)\right) \tag{8}$$
where $W_1$ and $W_2$ are the weight matrices of the fully connected layers, $\sigma$ is the sigmoid activation function, and $z$ is the output from the squeeze operation. The result, $s$, is the final adaptive weight vector for each channel. In radar signal processing, certain channels may contain key target features (such as time delay and Doppler frequency), while others may only represent noise. Through training, the SE module can automatically identify and enhance the channels relevant to target detection while suppressing noise or irrelevant channels.
Finally, the feature map of each channel is multiplied by its corresponding weight, which ‘amplifies’ the important channels and ‘suppresses’ the less important ones. Mathematically, this can be expressed as
$$\tilde{x}_c = s_c \cdot u_c \tag{9}$$
where $u_c$ is the original feature map for channel $c$ and $s_c$ is the scaling factor for that channel.
In conclusion, the SE module helps the network better focus on key features in small target signals. Low, slow, and small targets are typically small, with weak echo signals. By enhancing critical channels, the SE module improves the network’s ability to perceive these signals and prevents interference from irrelevant information.
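The squeeze, excitation, and scaling steps described above can be sketched in a few lines of NumPy. This is a forward-pass illustration only: the weights are passed in rather than learned, bias terms are omitted, the ReLU between the two fully connected layers follows the standard SE formulation [35], and the channel-first layout and the name `se_block` are assumptions.

```python
import numpy as np

def se_block(u, w1, w2):
    """Forward pass of a squeeze-and-excitation block.

    u  : feature map of shape (C, H, W)
    w1 : first fully connected weight, shape (C // r, C)
    w2 : second fully connected weight, shape (C, C // r)
    Returns the recalibrated feature map and the channel weights.
    """
    z = u.mean(axis=(1, 2))                      # squeeze: one value per channel
    hidden = np.maximum(w1 @ z, 0.0)             # bottleneck layer with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # excitation: sigmoid gate in (0, 1)
    return u * s[:, None, None], s               # scale each channel by its weight
```

With all-zero weights every gate equals 0.5, so the block simply halves the input; training moves the gates toward 1 for informative channels and toward 0 for the others.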
3.1.2. A Brief Introduction to SE-ResNet
SE-ResNet [35] is a variant of the traditional ResNet [37] that incorporates the squeeze-and-excitation mechanism to enhance channel-wise feature recalibration. The SE-residual block is shown in Figure 7.
The residual module effectively mitigates the vanishing gradient problem that occurs with increasing model depth, ensuring performance improvement as the number of layers increases [37]. In this paper, ResNet34 is combined with a squeeze-and-excitation block.
In various image classification tasks, the combination of the SE block and ResNet has been shown to significantly enhance the performance of the ResNet model. The SE block enables the ResNet model to selectively focus on the most important features, allowing the network to learn more discriminative data representations, thereby improving the model’s prediction accuracy. The specific network structure parameters are presented in Table 1.
3.1.3. Fast Multi-Target Time-Frequency Difference Extraction Method
This paper presents an efficient method for the rapid extraction of multi-target time-frequency differences in passive radar. By integrating GWO [38] with PSO [34] and incorporating a multi-target iterative search mechanism, the proposed approach enables the accurate and efficient extraction of time delays and Doppler frequency shifts for multiple targets.
If the process of searching for the target’s time-frequency difference is compared to the hunting process of a wolf pack, the position of the prey is analogous to the target’s Doppler frequency, while each individual wolf’s position represents the current Doppler frequency value during the search. The formula for the distance between the target’s Doppler frequency and the current Doppler frequency value during this process is given by (10), and the Doppler frequency update formula is provided in (11).
$$D = \left| C \cdot f_p(t) - f(t) \right| \tag{10}$$
$$f(t+1) = f_p(t) - A \cdot D \tag{11}$$
where $f_p(t)$ is the target’s Doppler frequency (the prey position); $f(t)$ represents the current Doppler frequency value during the search; $t$ denotes the number of iterations; $f(t+1)$ indicates the Doppler frequency value after $t+1$ iterations; $D$ is the distance between the target’s Doppler frequency and the current Doppler frequency; and $A$ and $C$ are coefficient factors, which are calculated as follows:
$$A = 2a \cdot r_1 - a \tag{12}$$
$$C = 2 \cdot r_2 \tag{13}$$
where $r_1$ and $r_2$ are random numbers in the range of $[0, 1]$ and $a$ is the convergence factor, which, at iteration $t$, is defined as follows:
$$a = 2 - \frac{2t}{t_{\max}} \tag{14}$$
where $t_{\max}$ is the maximum number of iterations set by the algorithm.
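The encircling update described above can be sketched for a scalar Doppler frequency. The sketch assumes the standard GWO coefficients [38], with a convergence factor that decreases linearly from 2 to 0; the function names and the use of a NumPy random generator are illustrative choices.

```python
import numpy as np

def convergence_factor(t, t_max):
    """Linearly decreasing factor: 2 at t = 0, 0 at t = t_max (standard GWO)."""
    return 2.0 * (1.0 - t / t_max)

def encircle_step(f_current, f_prey, a, rng):
    """One encircling update of a Doppler frequency toward the prey estimate."""
    r1, r2 = rng.random(2)             # random numbers in [0, 1)
    A = 2.0 * a * r1 - a               # step coefficient, in [-a, a)
    C = 2.0 * r2                       # prey weighting, in [0, 2)
    D = abs(C * f_prey - f_current)    # distance to the (weighted) prey
    return f_prey - A * D
```

Early iterations (large `a`) allow steps that overshoot the prey and explore the Doppler axis, while late iterations (small `a`) contract the search around the current best estimate.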
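During each iteration, the algorithm updates $\alpha$, $\beta$, and $\delta$ while retaining their position information (current Doppler frequency), ensuring that each iteration is guided by superior individuals. The specific formulas for position updates during this phase are expressed as follows:
$$D_\alpha = \left| C_1 \cdot X_\alpha - X \right|, \quad D_\beta = \left| C_2 \cdot X_\beta - X \right|, \quad D_\delta = \left| C_3 \cdot X_\delta - X \right| \tag{15}$$
$$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta \tag{16}$$
$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{17}$$
where $D_\alpha$, $D_\beta$, and $D_\delta$ represent the distances between $\alpha$, $\beta$, and $\delta$ and other individuals; $C_1$, $C_2$, and $C_3$ are random vectors; $A_1$, $A_2$, and $A_3$ are coefficient factors of the form given in (12); $X_\alpha$, $X_\beta$, and $X_\delta$ denote the position of $\alpha$, $\beta$, and $\delta$, respectively; and $X$ represents the position of $\omega$.
Formula (16) defines the step length and direction of $\omega$ moving towards $\alpha$, $\beta$, and $\delta$, while Formula (17) defines the updated position of $\omega$.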
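A minimal sketch of the leader-guided position update follows, assuming the standard GWO rule in which an ordinary wolf moves to the average of three candidate positions, one per leading wolf. The function name `leader_guided_update` and the scalar (Doppler-only) position are assumptions for illustration.

```python
import numpy as np

def leader_guided_update(x, leaders, a, rng):
    """Move an ordinary wolf toward the alpha, beta, and delta wolves.

    x       : current position (Doppler frequency) of the wolf
    leaders : positions of the three best wolves (alpha, beta, delta)
    a       : convergence factor for the current iteration
    """
    candidates = []
    for x_lead in leaders:
        r1, r2 = rng.random(2)
        A = 2.0 * a * r1 - a               # step length and direction
        C = 2.0 * r2                       # random weighting of the leader
        D = abs(C * x_lead - x)            # distance to this leader
        candidates.append(x_lead - A * D)  # candidate guided by this leader
    return sum(candidates) / len(candidates)
```

When the convergence factor reaches zero, every candidate collapses onto its leader and the new position is simply the mean of the three leaders.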
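The particle velocity and position update formulas proposed in [34] are expressed as (18) and (19), respectively:
$$v_i(k+1) = \chi \left[ v_i(k) + c_1 r_1 \left( p_i - x_i(k) \right) + c_2 r_2 \left( g - x_i(k) \right) \right] \tag{18}$$
$$x_i(k+1) = x_i(k) + v_i(k+1) \tag{19}$$
where $\chi$ is the contraction factor, $v_i(k)$ and $x_i(k)$ are the velocity and position of particle $i$ at iteration $k$, $p_i$ is the individual optimal position, $g$ is the global optimal position, $c_1$ and $c_2$ are learning factors, and $r_1$ and $r_2$ are random numbers in the range of $[0, 1]$.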
Based on (
18), the velocity update formula for the method proposed in this paper is derived as follows:
The velocity update formula exhibits the following characteristics: it retains the contraction factor to accelerate the algorithm’s convergence speed, preserves the guiding mechanism of individual experience and the optimal particle from the original algorithm, and incorporates the upper-ranking wolf guidance strategy and encircling search strategy.
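One plausible reading of these characteristics is sketched below. It assumes the gray wolf guidance enters the velocity as a third attraction term toward the leader-guided position, alongside the usual individual-best and global-best terms. The exact form of the paper's formula is not recoverable from the text, so the term structure, the coefficient `c3`, and all function names are hypothetical. The contraction factor uses the well-known Clerc-Kennedy expression.

```python
import numpy as np

def constriction_factor(c1, c2):
    """Clerc-Kennedy contraction factor; requires c1 + c2 > 4."""
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - np.sqrt(phi ** 2 - 4.0 * phi))

def hybrid_velocity(v, x, p_best, g_best, x_gwo, chi, c1, c2, c3, rng):
    """Hypothetical PSO velocity update with an extra gray-wolf guidance term.

    x_gwo is the position suggested by the alpha/beta/delta guided update.
    """
    r1, r2, r3 = rng.random(3)
    return chi * (v
                  + c1 * r1 * (p_best - x)    # individual experience
                  + c2 * r2 * (g_best - x)    # optimal particle
                  + c3 * r3 * (x_gwo - x))    # upper-layer wolf guidance (assumed form)
```

For the common choice `c1 = c2 = 2.05`, the contraction factor is approximately 0.73.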
To enhance the multi-target search capability of the algorithm, this paper introduces an iterative peak search mechanism. Specifically, by establishing a historical particle set and an optimal particle set to store searched particles and peaks, the algorithm temporarily excludes previously searched target peaks to avoid interference in subsequent iterations. The specific steps are shown in Algorithm 1.
Algorithm 1 Fast multi-target time-frequency difference extraction method for passive radar
1: Divide the search area by defining the delay search range and the Doppler frequency search range. Initialize the historical particle set and the optimal particle set as empty sets;
2: Initialize the particle population by generating Q particles and dividing them into H equally sized subpopulations;
3: Calculate the fitness values of particles that do not intersect with the historical particle set, setting the fitness values of particles that intersect with the optimal particle set to zero. Obtain the fitness values of the remaining particles directly from the historical particle set. Subsequently, update each particle’s individual optimal value, the global optimal value for each subpopulation, and the global optimal position for each subpopulation;
4: Update each particle’s velocity and position according to (18) and (19) while also applying boundary condition handling;
5: Check whether the number of iterations has reached the maximum iteration count for the subpopulation. If so, proceed to step 6; otherwise, check if the target selection threshold has been met. If it has, proceed to step 12; if not, repeat steps 3 to 5;
6: Merge the subpopulations to obtain the global optimal fitness and the global optimal position (g);
7: Determine whether the target screening threshold has been reached. If so, proceed to step 12; otherwise, proceed to step 8;
8: Initialize all particles simultaneously as the initial gray wolf population;
9: Select the three particles with the highest fitness values as the three wolves, α, β, and δ. Then, update the position of particle (ω) according to Equations (15)–(17). Subsequently, update the velocity and position of each particle using Equations (20) and (19) while applying boundary condition handling;
10: Calculate the fitness values for particles that do not intersect with the historical particle set, while the remaining particles derive their fitness values directly from the historical particle set. Then, update the global best value and the global best position (g);
11: Determine whether the maximum number of iterations has been reached. If so, proceed to step 13; otherwise, check if the target selection threshold has been met. If it has, proceed to step 12; if not, repeat steps 8 to 11;
12: Save all currently discovered particles to the historical particle set and store the global best particle in the optimal particle set. Then, repeat steps 2 to 12;
13: Determine whether the target selection threshold has been met. If so, proceed to step 12; otherwise, output the optimal particle set as the result of the cross-ambiguity function calculation.
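The peak-exclusion idea behind the multi-target mechanism can be illustrated in isolation. The sketch below deliberately replaces the PSO and gray wolf search with an exhaustive argmax over a precomputed ambiguity magnitude surface, so it shows only how found peaks are stored and zeroed out before the next pass. The names `iterative_peak_search`, `guard`, and `max_targets` are hypothetical.

```python
import numpy as np

def iterative_peak_search(caf_mag, threshold, guard=1, max_targets=10):
    """Extract multiple (delay, Doppler) peaks by repeated search and exclusion.

    caf_mag   : 2-D array of ambiguity-function magnitudes
    threshold : minimum magnitude for a peak to count as a target
    guard     : half-width of the neighbourhood excluded around each found peak
    """
    work = np.array(caf_mag, dtype=float, copy=True)
    peaks = []                                  # plays the role of the optimal set
    while len(peaks) < max_targets:
        l, k = np.unravel_index(np.argmax(work), work.shape)
        if work[l, k] < threshold:
            break                               # no remaining peak passes the threshold
        peaks.append((int(l), int(k)))
        # Zero the fitness around the found peak so later passes ignore it.
        work[max(l - guard, 0):l + guard + 1, max(k - guard, 0):k + guard + 1] = 0.0
    return peaks
```

Zeroing a small neighbourhood rather than a single cell keeps the sidelobes of a strong target from being reported as separate targets, at the cost of possibly hiding a weaker target that lies within the guard region.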
3.1.4. Computational Complexity Analysis
As shown in
Table 2,
is the number of Doppler frequency bins in the RD spectrum obtained by the cross-correlation FFT method.
represents the number of algorithm iterations,
N is the signal length,
Q is the population size, and
is the repetition factor for multiple particles updating to the same Doppler frequency.
M is the number of targets,
is the number of iterations for the small population,
is the size of the small population,
is the number of iterations for the gray wolf population, and
is the size of the gray wolf population.
Since, in practical situations, the Doppler frequency range of the targets is unknown, the value of
K is set as large as possible to ensure that all targets can be searched. Therefore, based on the algorithm’s principles, we have
<
and
>
. As a result, the method proposed in this paper reduces the computational complexity compared to the cross-correlation FFT method. However, it increases computational complexity compared to the method proposed in reference [
34] due to the added iterative search mechanism, which ensures the multi-target search capability of the proposed method.
5. Discussion
In real-world applications, the proposed system may encounter several limitations that could impact its performance. Some of the most common factors include the following:
First, adverse weather conditions such as heavy rain, snow, or fog can significantly degrade the performance of radar and other sensing systems. Radar signals may be heavily attenuated by precipitation, resulting in reduced detection capabilities.
Next, due to its bistatic configuration, the passive radar system imposes strict terrain requirements. Obstructions such as mountains and tall buildings can block both the echo signal and even the direct signal reception. Additionally, during long-range detection, the Earth’s curvature may hinder the reception of reference signals.
Finally, in practical applications, environmental parameters evolve over time. As a result, the features extracted by the network may no longer be applicable to the current conditions. Therefore, periodic data collection and network retraining are necessary to maintain optimal performance.