Article

A Classification Algorithm of UAV and Bird Target Based on L/K Dual-Band Micro-Doppler and Mamba

1 School of Mechatronic Engineering, Xi’an Technological University, Xi’an 710021, China
2 School of Electronic and Information Engineering, Xi’an Technological University, Xi’an 710021, China
* Author to whom correspondence should be addressed.
Drones 2026, 10(4), 265; https://doi.org/10.3390/drones10040265
Submission received: 11 February 2026 / Revised: 1 April 2026 / Accepted: 3 April 2026 / Published: 6 April 2026
(This article belongs to the Section Drone Communications)

Highlights

What are the main findings?
  • We introduce a patch-tokenization mechanism to handle two-dimensional micro-Doppler spectrograms, unifying the input representation under a serialized modeling framework and benefiting the processing of radar echo signals.
  • We propose a micro-Doppler spectrogram classification architecture based on a state-space model, together with a classification framework for unmanned aerial vehicles (UAVs) and birds built on parallel L-band/K-band dual-branch encoding and late fusion (LF), which accomplishes classification and recognition of the target.
What are the implications of the main findings?
  • The proposed patch serialization mechanism for processing two-dimensional micro-Doppler spectrograms can effectively enhance the processing performance of radar echo signals.
  • The proposed dual-band algorithm for classifying UAV and bird targets fully exploits, at the level of physical mechanism, the complementary information of the two bands in Doppler scale and fine micro-motion texture, enabling highly accurate classification of radar echo signals.

Abstract

To address the challenge of accurately distinguishing UAVs from birds in low-altitude detection, this paper proposes a classification algorithm for UAVs and birds based on L/K dual-band micro-Doppler spectrograms and Mamba. We establish a dual-band radar detection model for unmanned aerial vehicles (UAVs) and birds, provide a method for characterizing the Doppler parameters of the echo signals, and design a UAV and bird target classification network that integrates micro-Doppler and Mamba. Based on a dual-branch encoding framework, we use patch decomposition to serialize the two-dimensional spectrogram of the echo signal, and introduce the Mamba state-space backbone network to extract long-term sequential features of the target’s micro-motion. The main breakthrough of the proposed algorithm lies in the feature fusion stage, where a late fusion strategy integrates the dual-path high-level representations, fully leveraging the sensitivity of the K-band to high-frequency textures and the scale complementarity of the L-band. A joint loss function combining mutual learning and contrastive learning further improves the model’s prediction consistency and representation discriminability. Experimental results show that the proposed method can effectively classify UAVs and birds, and compared with other algorithms, its accuracy reaches 97.5%.

1. Introduction

With the increasingly frequent activities of UAVs in low-altitude airspace, UAV detection for defense early warning and key-area protection has become one of the key components of a low-altitude security system [1]. Illegal intrusion, improper use, and the potential security risks of UAVs have continuously promoted the research and engineering application of UAV detection technologies. Radar detection technology, due to its advantages of a long detection range, its all-weather capability, and its strong adaptability to complex weather conditions, has become an important means for low-altitude UAV detection and surveillance [2]. In real low-altitude airspace, in addition to UAVs, there are also many typical non-UAV targets, among which flying birds are representative. Because birds are low, slow, and small, their echo features are easily confused with those of UAVs under radar detection, which leads to false alarms and reduces the recognition reliability of the radar detection system [3]. Therefore, accurately distinguishing UAVs from birds in complex low-altitude scenarios has become a key issue for improving the effectiveness of radar detection and reducing the false alarm rate.
To address the above problem, this paper proposes a UAV–bird classification method based on L/K dual-band radar. The proposed method exploits the complementary advantages of different bands in propagation characteristics and scattering responses, and jointly mines the micro-motion and scattering features of UAV and bird targets, thereby improving the classification accuracy of the two types of targets and enhancing the recognition capability in complex low-altitude environments. The main contributions and innovations of this paper are as follows:
(1)
A patch-tokenization mechanism is introduced to process two-dimensional micro-Doppler spectrograms. By mapping the time-frequency distribution into sequences of feature vectors, a unified input representation within the sequence modeling paradigm is achieved. This preserves local time-frequency correlations while providing a standardized input pipeline for isomorphic feature extraction and subsequent fusion of multi-band data.
(2)
A micro-Doppler spectrogram classification architecture based on a state-space model is proposed. The designed architecture utilizes the linear recurrence property of Mamba to replace the traditional self-attention (SA) mechanism. While maintaining the capability for modeling long-term micro-Doppler sequences, it effectively reduces computational complexity and memory overhead in large-scale time-frequency data processing.
(3)
A classification framework is constructed based on parallel L-band and K-band dual-branch encoding and late fusion (LF) to accomplish the classification and recognition of the target. From the perspective of physical mechanisms, the designed classification framework fully leverages the complementary information provided by the two bands regarding Doppler scale and micro-motion texture details. A combined loss function oriented towards dual-branch collaborative learning is introduced to jointly constrain the discriminative capability of individual branches and the consistency and complementarity of the fused representations, thereby enhancing the stability and generalization performance of cross-band joint discrimination.

2. Related Work

In the task of radar-based classification and recognition of UAVs and birds, traditional methods relying on handcrafted features are limited by feature representation capacity and human bias, making it difficult to accurately classify UAV and bird targets. In recent years, deep learning-based classification and recognition methods have leveraged the advantage of end-to-end automatic feature learning. Without introducing human subjective bias, these methods can filter and integrate high-dimensional features, significantly improving the recognition accuracy of UAV and bird targets. Current research on deep learning recognition methods has significantly promoted the detection of unmanned aerial vehicle (UAV) targets, achieving excellent recognition results in typical test environments. However, when disturbed by confusable targets such as birds, recognition performance degrades significantly, which is one of the urgent problems to be solved at present.
Quite a few research approaches currently exist for identifying unmanned aerial vehicle (UAV) targets with radar. Their main focus is on learning discriminative features directly from spectrograms, phase sequences, or RCS sequences to reduce dependence on manual feature engineering and enhance overall robustness. Passafiume Marco et al. [4] focused on small UAV recognition using FMCW radar, analyzing the micro-Doppler characteristics of UAVs and performing classification based on target features in time-frequency representations. Furthermore, Chen et al. [5] analyzed the differences in micro-Doppler signatures between birds and rotorcraft UAVs through measured radar data. Research on characteristic analysis and feature extraction for UAVs and other targets is abundant. Generally, UAVs and other targets such as birds exhibit certain differences in radar cross-section (RCS), motion characteristics, etc. [6,7,8,9,10]. For instance, the flight of a UAV is relatively more stable than that of a bird; especially when a UAV executes tasks according to instructions, its motion follows a certain regularity. In contrast, birds are highly maneuverable and their motion is far less regular. Kong et al. [11] extracted features such as the standard deviation of velocity, average velocity, velocity oscillation frequency, heading angle standard deviation, and heading oscillation frequency. These features effectively reflect the motion characteristics of the target, thereby enabling classification and recognition based on differences in target characteristics. Bao et al. [12] classified and recognized targets using the Fisher discrimination method. This method extracts features of the target's speed and radar cross-section (RCS), including the mean, standard deviation, coefficient of variation, mean difference, period, difference variance, and range, to reflect the motion ability and stability of different target postures.
To explore the differences between targets more comprehensively, Svante et al. [13] introduced micro-Doppler features on top of track characteristics, extracting rotor-UAV signatures from the micro-Doppler features of UAVs. Ai et al. [14] used classical signal processing methods to obtain a time-frequency diagram that intuitively presents the time-frequency characteristics of the signal. Liu et al. [15] proposed a radar signal processing framework based on complex-valued independent component analysis for the identification and quantification of groups of targets. Liu et al. [16] proposed a method for classifying drones and birds based on grayscale spectrograms; this method utilized a coordinate attention collaborative residual split-attention network (RCA-ResNeSt) and was implemented in a holographic targeting radar system. Liu et al. [17] proposed a clutter suppression method based on the Orthogonal Matching Pursuit (OMP) algorithm to process the echo signals obtained by a Linear Frequency-Modulated Continuous-Wave (LFMCW) radar. Ram M. Narayanan et al. [18] utilized the micro-Doppler spectrum characteristics of flying targets (such as unmanned aircraft and birds) for remote classification. Using a custom-designed 10 GHz continuous-wave (CW) radar system, measurement data of various targets in different scenarios were recorded to create a dataset for image classification.
Although existing research has developed effective methods for UAV and bird classification based on radar micro-Doppler signatures, limitations remain in utilizing cross-band complementary information and modeling long-term micro-motion textures. Consequently, real-world applications often face issues such as degraded classification accuracy, sensitivity to interference, and elevated false alarm rates. First, most existing methods rely on single-band radar spectrograms, RCS sequences, or phase sequences for recognition. While these approaches can capture micro-motion differences to some extent, they do not fully exploit the complementary scattering responses and micro-Doppler characteristics across different frequency bands. When the observation angle changes, the signal-to-noise ratio decreases, or background clutter becomes stronger, the discriminability of single-band textures tends to degrade, thereby limiting the robustness of models in complex real-world scenarios. Second, micro-Doppler spectrograms differ from natural images in that their discriminative information mainly arises from temporal periodic patterns and long-term micro-motion behaviors. Traditional CNN-based methods are sensitive to local textures but are insufficient for modeling long-term dependencies. Transformer-based and hybrid models improve global modeling capability; however, self-attention mechanisms typically incur high computational and memory costs as the token sequence length increases, and are more prone to training instability. In this paper, we jointly utilize L/K dual-band radar observations of UAVs and birds, fully leveraging the complementary advantages of different frequency bands to mine the micro-motion and scattering features of targets from multiple perspectives; the purpose of this research is to improve the target recognition rate.

3. The Overall Research Framework and Ideas

Based on the collected echo data from UAVs and birds under L-band and K-band radar, we propose a classification method based on L/K dual-band micro-Doppler spectrograms and Mamba. By jointly leveraging the complementary advantages of the L-band and K-band in propagation characteristics and texture resolution, and utilizing the Mamba state-space model for efficient encoding of long-term micro-motion features, the proposed method achieves accurate classification of drones and birds. The overall research framework and ideas are illustrated in Figure 1. Firstly, L-band and K-band Frequency-Modulated Continuous-Wave (FMCW) radars are deployed for multi-band radar data acquisition. Various models of UAVs and bird targets are detected from different observation angles, and their raw echo data are systematically collected to build a dataset. Secondly, preprocessing steps including synchronization, filtering, and motion compensation are applied to the collected dual-band echoes to suppress noise and clutter. Time-frequency analysis is then performed to extract micro-Doppler features from the L-band and K-band echoes, respectively, obtaining time-frequency spectrograms that reflect the target’s micro-motions. Finally, the extracted dual-band micro-Doppler features are effectively fused and fed into a Mamba-based classification network for training and recognition. This algorithm fully utilizes the strength of the Mamba model in processing long sequential data to deeply mine the dynamic evolution patterns of the target’s micro-Doppler features, ultimately achieving the classification and recognition of UAVs and birds.

4. Radar Echo Modeling for UAVs and Birds Based on Micro-Doppler Signatures

This section focuses on the micro-Doppler formation mechanism of unmanned aerial vehicle (UAV) and bird targets. Based on micro-Doppler theory, we establish radar echo models for rotor UAVs and flapping bird wings, respectively. We also analyze and discuss the composition, time-varying patterns, and dual-band characteristic differences of the main Doppler and micro-Doppler components of these models. Through this study, it can be clearly determined that there are significant differences in the micro-Doppler characteristics between these two types of targets. This lays a theoretical foundation for the design of subsequent classification algorithms based on L/K dual-band micro-Doppler and Mamba.

4.1. UAV Echo Model and Micro-Doppler Parametric Representation

Based on classical micro-Doppler theory, the radar echo of a UAV can be considered as the combined result of the body’s translation and the rotation of its rotors. The body translation determines the main Doppler component of the echo, while the high-speed periodic rotation of the rotor blades introduces time-varying phase modulation around the main Doppler, thereby generating micro-Doppler signatures [19]. To provide a unified model for the micro-Doppler signatures of rotorcraft UAVs, this paper establishes a mathematical expression for the target echo starting from the radar complex baseband echo. A radar coordinate system $OXYZ$ is established with the single radar station $S$ as the origin, and a target coordinate system $oxyz$ is established with the UAV or bird target’s center of mass $o$ as the origin. The observation coordinate systems of the radar and target are shown in Figure 2.
The UAV is considered as a combination of multiple scattering points composed of the main body and rotor blades. The complex baseband echo received by the radar can be expressed as the coherent superposition of echoes from multiple equivalent scattering points. The complex envelope echo component of the $q$-th scattering point is denoted by Formula (1).

$$s_{UAV}(t)=\sum_{q=1}^{Q} A_q \exp\!\left(j\,\frac{4\pi f_c}{c}\,R_q(t)\right)$$

where $s_{UAV}(t)$ represents the UAV echo signal; $A_q$ denotes the complex amplitude coefficient of the $q$-th scattering point; $Q$ is the number of equivalent scattering points of the target; $f_c$ is the carrier frequency; $c$ is the speed of light; and $R_q(t)$ is the instantaneous slant range from the radar to the $q$-th scattering point [20].
From Formula (1), it can be observed that the echo phase is determined by the instantaneous slant range $R_q(t)$ of the scattering point. Under the far-field line-of-sight (LOS) assumption, the variation in $R_q(t)$ is primarily determined by the projection of the scattering point’s motion onto the LOS direction. Therefore, when the target undergoes a translation or micro-motion, $R_q(t)$ changes over time, introducing time-varying modulation into the echo phase, which ultimately manifests as an instantaneous Doppler frequency shift.
The mapping relationship between the radial velocity $V_r(t)$ of the UAV along the LOS direction and the Doppler frequency shift $f_{D_{UAV}}(t)$ is expressed by Formula (2).
$$f_{D_{UAV}}(t)=\frac{2V_r(t)}{\lambda}$$

where $f_{D_{UAV}}(t)$ represents the main Doppler component induced by the translational motion of the UAV body; $V_r(t)$ is the radial velocity of the target relative to the radar; and $\lambda = c/f_c$ is the radar operating wavelength.
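As a quick numerical illustration of Formula (2), the sketch below evaluates the main Doppler shift for the same radial velocity at two assumed carrier frequencies (1.3 GHz for the L-band and 24 GHz for the K-band; the paper does not state its exact carriers, so these values are hypothetical):

```python
C = 3.0e8  # speed of light, m/s

def doppler_shift(v_r, f_c):
    """Main Doppler shift f_D = 2*V_r/lambda, with lambda = c/f_c (Formula (2))."""
    wavelength = C / f_c
    return 2.0 * v_r / wavelength

# Hypothetical carriers: 1.3 GHz (L-band) and 24 GHz (K-band)
f_L, f_K = 1.3e9, 24.0e9
v_r = 10.0  # radial velocity, m/s

fd_L = doppler_shift(v_r, f_L)  # ~86.7 Hz
fd_K = doppler_shift(v_r, f_K)  # 1600.0 Hz
```

The same motion thus produces a Doppler scale roughly 18 times larger in the K-band, which is exactly the scale complementarity the dual-branch design later exploits.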
To provide a unified description of the UAV translation and micro-motion, the overall radial velocity of the UAV can be extended to the instantaneous motion of any scattering point on the UAV. The instantaneous Doppler frequency shift of that scattering point can then be expressed as the projection of its velocity onto the LOS direction, as given in Formula (3).

$$f_{D_{UAV}}(t)=\frac{2 f_c}{c}\,\dot{p}(t)^{T} n$$

where $\dot{p}(t)$ is the instantaneous velocity of the scattering point in the radar coordinate system; $n$ is the unit LOS vector pointing from the radar to the target; and $\dot{p}(t)^{T} n$ corresponds to the component of the scattering point’s velocity along the LOS direction, thereby determining the instantaneous Doppler frequency shift. When the scattering point moves only with the overall translation of the target, Formula (3) simplifies to the main Doppler case of Formula (2).
In the UAV motion model, the instantaneous velocity of a scattering point is composed of the body translation velocity and the local velocity induced by rotor rotation. For a scattering point rotating with angular velocity $\omega$, its instantaneous Doppler frequency is obtained by taking the time derivative of the echo phase, and it can be approximated by Formula (4).

$$f_{D_{UAV}}(t)=\frac{2 f_c}{c}\left[V+\frac{d}{dt}\big(C(t)\,r_0\big)\right]^{T} n \approx \frac{2 f_c}{c}\big(V+\omega\times r(t)\big)^{T} n$$

where $V$ is the body translation velocity vector; $r_0$ is the initial position vector of the scattering point in the rotor coordinate system; $C(t)$ is the rotation matrix describing the rotor’s rotation; $r(t)=C(t)\,r_0$ is the position vector of the scattering point relative to the rotation center; and $\omega$ is the angular velocity vector of the rotor. As shown in Formula (4), $V$ corresponds to the main Doppler component, while $\omega\times r(t)$ introduces time-varying phase modulation on top of the main Doppler, thereby forming the micro-Doppler signature of the UAV target [21].
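To make Formula (4) concrete, the sketch below evaluates the instantaneous Doppler of a single blade-tip scatterer rotating about the vertical axis; the blade radius, rotation rate, carrier frequency, and viewing geometry are illustrative assumptions, not values from the paper:

```python
import math

def cross(a, b):
    """3-D cross product a x b."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def rotor_doppler(t, f_c, V, omega, r0, n, c=3.0e8):
    """Instantaneous Doppler f_D ~ (2 f_c/c)(V + omega x r(t))^T n (Formula (4)),
    assuming the rotation axis is the z-axis so C(t) is a plane rotation."""
    wz = omega[2]
    ct, st = math.cos(wz * t), math.sin(wz * t)
    r_t = (ct*r0[0] - st*r0[1], st*r0[0] + ct*r0[1], r0[2])  # r(t) = C(t) r0
    v_total = tuple(vi + wi for vi, wi in zip(V, cross(omega, r_t)))
    return 2.0 * f_c / c * sum(vi * ni for vi, ni in zip(v_total, n))

# Hypothetical hovering UAV (V = 0): 0.12 m blade tip, 6000 rpm rotor, 24 GHz
# carrier, radar line of sight along +x.
omega = (0.0, 0.0, 2.0 * math.pi * 100.0)  # 6000 rpm = 100 rev/s
fd = rotor_doppler(0.0025, 24.0e9, (0.0, 0.0, 0.0), omega,
                   (0.12, 0.0, 0.0), (1.0, 0.0, 0.0))
```

At $t=0$ the tip moves perpendicular to the line of sight, so the micro-Doppler is zero; a quarter revolution later it reaches its extreme of roughly 12 kHz, sweeping sinusoidally once per revolution, which is the periodic modulation that paints the blade-flash texture in the spectrogram.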

4.2. Flapping Bird Echo Model and Parametric Representation of Micro-Doppler

According to micro-Doppler theory, the flapping motion of a bird can be modeled as the micro-motion scattering effect produced by an articulated non-rigid target. Although the bird’s body as a whole is not strictly rigid, at the scale of radar observation, its appendages such as wings can be equivalently represented as a multi-segment rigid-body chain connected by joints. Therefore, the radar echo of a flying bird can be attributed to the combined effect of the main body movement and the flapping motion of its wings. The movement of the bird’s body corresponds to the main Doppler component of the echo, while the periodic flapping motion of the wings introduces time-varying phase modulation on top of the main Doppler, forming the micro-Doppler signature.
For a bird target, following the same modeling approach as for the UAV target, it can also be described parametrically within the coordinate system shown in Figure 2. Ignoring the width of the bird’s body and assuming the wing surfaces do not bend during motion, we take the target’s center as the reference point relative to the radar; that is, the midpoint of the line connecting the roots of the bird’s two wings is also $o$. A scattering point on one wing is denoted as $Q$, and the coordinate system of the bird target is still $oxyz$, where the $y$-axis aligns with the bird’s heading direction. The radar coordinate system and the bird target coordinate system are parallel. The distance from the radar origin $O$ to point $o$ is $R_0$, with an azimuth angle $\alpha$ and an elevation angle $\theta$. Assuming the bird flies along the positive $y$-axis with a flapping frequency $F_{bird}$, the movement of any point on its wing can be regarded as a composite of level flight and wing flapping. If the level-flight velocity is $v$, and the angle between the wing surface and the $xoy$ plane is the time-varying flapping angle $\varphi(t)$, it can be expressed by Formula (5).
$$\varphi(t)=\theta_{\max}\sin(2\pi F_{bird}\,t+\varphi_0)=\theta_{\max}\sin(\omega_0 t+\varphi_0)$$

where $\theta_{\max}$ is the flapping angle amplitude, i.e., the maximum angle between the wing surface and the $xoy$ plane; $\omega_0=2\pi F_{bird}$ is the angular frequency of flapping; and $\varphi_0$ denotes the initial phase of the flapping angle at time $t=0$, typically set to $\varphi_0=0$.
Let the distance from an arbitrary scattering point $Q$ on one side of the bird’s wing to the symmetry center $o$ be $l$. After time $t$, point $Q$ moves to point $Q'$, where $\overrightarrow{QQ'}=[0,\,vt,\,0]^T$. Because the observation time is short, both the flight distance and the target size are much smaller than the radar-to-target distance $R_0$, i.e., $R_0 \gg \|\overrightarrow{QQ'}\|_2 = vt$ and $R_0 \gg l$. At this time, the distance from the radar origin $O$ to point $Q'$ is given by Formula (6).

$$R(t)\approx R_0+vt\sin\theta\sin\alpha+l\sin\theta\cos\alpha\cos[\varphi(t)]+l\cos\theta\sin[\varphi(t)]$$
Assuming the radar transmits a narrowband coherent signal with wavelength $\lambda$, the normalized baseband signal returned from the scattering point is as follows:

$$s_{Bird}(t)=\exp\!\left(j\,\frac{4\pi R(t)}{\lambda}\right)$$

In Formula (7), $4\pi R(t)/\lambda$ represents the phase of the scattering point at time $t$.
Substituting Formula (6) into Formula (7) and differentiating, the Doppler frequency of the echo from the scattering point can be obtained by Formula (8).

$$f_{D_{Bird}}(t)=\frac{2v}{\lambda}\sin\theta\sin\alpha+\frac{2l}{\lambda}\,\dot{\varphi}(t)\big[\cos\theta\cos\varphi(t)-\sin\theta\cos\alpha\sin\varphi(t)\big]=f_{bulk_{Bird}}+f_{micro_{Bird}}(t)$$

where $f_{bulk_{Bird}}$ represents the Doppler frequency caused by translational motion; and $f_{micro_{Bird}}(t)$ represents the micro-Doppler frequency caused by flapping motion. If the radar is initially aligned radially with the bird target at the start of observation, i.e., $\alpha=0$, $f_{micro_{Bird}}(t)$ can be further simplified to Formula (9).

$$f_{micro_{Bird}}(t)=\frac{l}{\lambda}\,\theta_{\max}\,\omega_0\left\{\cos\!\big[\theta_{\max}\sin(\omega_0 t)+\theta+\omega_0 t\big]+\cos\!\big[\theta_{\max}\sin(\omega_0 t)+\theta-\omega_0 t\big]\right\}$$
From Formula (9), it can be observed that the micro-Doppler frequency of the bird target varies periodically with the wing-flapping period $2\pi/\omega_0$.
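Rather than re-deriving Formula (8) by hand, the sketch below obtains the bird's instantaneous Doppler numerically by differentiating the slant range of Formula (6) with a central difference, following the paper's sign convention $f_D=(2/\lambda)\,dR/dt$; the geometry and flapping parameters are illustrative assumptions:

```python
import math

def flap_angle(t, theta_max, omega0, phi0=0.0):
    """Flapping angle phi(t) = theta_max*sin(omega0*t + phi0), Formula (5)."""
    return theta_max * math.sin(omega0 * t + phi0)

def slant_range(t, R0, v, theta, alpha, l, theta_max, omega0):
    """Slant range R(t) to a wing scatterer at distance l from the body, Formula (6)."""
    phi = flap_angle(t, theta_max, omega0)
    return (R0 + v * t * math.sin(theta) * math.sin(alpha)
            + l * math.sin(theta) * math.cos(alpha) * math.cos(phi)
            + l * math.cos(theta) * math.sin(phi))

def bird_doppler(t, lam, dt=1e-6, **geom):
    """f_D(t) = (2/lam) * dR/dt via central difference (paper's sign convention)."""
    dR = (slant_range(t + dt, **geom) - slant_range(t - dt, **geom)) / (2.0 * dt)
    return 2.0 * dR / lam

# Illustrative geometry: ~0.23 m L-band wavelength, 12 m/s flight, 5 Hz flapping.
geom = dict(R0=1000.0, v=12.0, theta=math.radians(30.0), alpha=math.radians(45.0),
            l=0.25, theta_max=0.6, omega0=2.0 * math.pi * 5.0)
fd = bird_doppler(0.1, 0.23, **geom)  # bulk term plus a flapping-induced oscillation
```

Setting $l=0$ removes the wing offset and leaves only the bulk term $(2v/\lambda)\sin\theta\sin\alpha$, which is a convenient sanity check on the model.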
In summary, based on the motion models for UAVs and birds described above, their total Doppler frequency shift can be decomposed as follows:
$$\begin{cases} f_{D_{UAV}}(t)=f_{bulk_{UAV}}+f_{micro_{UAV}}(t)\\[2pt] f_{D_{Bird}}(t)=f_{bulk_{Bird}}+f_{micro_{Bird}}(t) \end{cases}$$

In Formula (10), the instantaneous Doppler frequency shift of a target scattering point can be decomposed into a main Doppler term ($f_{bulk_{UAV}}$, $f_{bulk_{Bird}}$) and a micro-Doppler term ($f_{micro_{UAV}}(t)$, $f_{micro_{Bird}}(t)$). Here, ($f_{bulk_{UAV}}$, $f_{bulk_{Bird}}$) is primarily determined by the overall translation (radial velocity) of the target; ($f_{micro_{UAV}}(t)$, $f_{micro_{Bird}}(t)$) is induced by micro-motions and varies with time. For rotorcraft UAVs, $f_{micro_{UAV}}(t)$ mainly originates from the periodic modulation caused by rotor blade rotation; for bird targets, $f_{micro_{Bird}}(t)$ primarily arises from the phase modulation induced by wing flapping. Therefore, although the instantaneous Doppler frequency shift of both types of targets can be characterized by $f_D(t)$, the physical origins and time-varying patterns of their micro-Doppler terms differ [22].
Furthermore, this paper employs a joint representation using L-band and K-band radars. Since the Doppler frequency shift $f_D(t)$ is proportional to the carrier frequency $f_c$, under identical motion conditions the K-band radar, due to its higher carrier frequency, yields a larger micro-Doppler frequency scale, enabling the resolution of finer rotor micro-motion structures. In contrast, the L-band radar exhibits a smaller frequency scale but offers complementary advantages in terms of propagation conditions and clutter environment.
The micro-Doppler modeling of UAVs and birds above provides the physical basis for the input representation and network design of this paper. On one hand, both rotor rotation and wing flapping introduce time-varying phase modulation near the main Doppler, causing the target echo to exhibit pronounced time-varying micro-Doppler textures. Therefore, this paper uses time-frequency analysis to map the original echo into a micro-Doppler spectrogram, retaining the structural information of the target’s micro-motion process in the two-dimensional time-frequency space. On the other hand, since the Doppler frequency shift is proportional to the carrier frequency, the L-band and K-band are naturally complementary in micro-Doppler scale and texture resolution, which provides physical support for the subsequent parallel dual-band encoding and fusion modeling. Moreover, the above model also indicates that the discriminative information between UAVs and birds lies not only in local texture differences, but also in periodic modulation across time and long-term micro-motion evolution patterns, which further motivates the sequential modeling and state-space encoding framework adopted in the subsequent algorithms.
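The time-frequency mapping described above can be sketched with a naive short-time DFT; this is a minimal illustration of how a complex baseband echo becomes a micro-Doppler spectrogram, not the paper's actual preprocessing chain (the window length, hop, and test tone are arbitrary choices):

```python
import cmath
import math

def stft_mag(x, win_len, hop):
    """Magnitude spectrogram of a complex baseband signal via a naive
    Hann-windowed short-time DFT. Returns a list of frames, each holding
    |X[k]| for frequency bins k = 0..win_len-1."""
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (win_len - 1))
            for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * hann[n] for n in range(win_len)]
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                               for n in range(win_len)))
                       for k in range(win_len)])
    return frames

# A pure complex tone at bin 8 should produce a ridge at frequency bin 8
# in every frame; a real echo would instead trace a time-varying Doppler.
x = [cmath.exp(2j * math.pi * 8.0 * n / 64.0) for n in range(256)]
spectrogram = stft_mag(x, win_len=64, hop=32)
```

In the real pipeline each frame's peak would follow the time-varying components of Formula (10), so the blade-flash and wing-flap modulations appear as periodic ridges around the bulk Doppler line.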

5. Classification Algorithm Based on L/K Dual-Band Micro-Doppler and Mamba

To enhance the classification robustness between UAVs and bird targets, this paper converts radar echoes into micro-Doppler time-frequency spectrograms and formulates the classification task as supervised learning on L-band and K-band spectrograms. First, the model adopts a dual-branch parallel encoding and late fusion (LF) architecture. The L-band and K-band spectrograms are initially processed by a sequentialization module, which employs a patching mechanism to map the two-dimensional time-frequency distributions into sequences of feature vectors (tokens). This preserves local time-frequency correlations while meeting the requirements of sequential modeling. Second, the token sequences are fed into an encoder based on a state-space model (SSM) to extract global representations. A Mamba backbone is utilized to maintain the capability for modeling long-term micro-Doppler sequences while reducing computational complexity and memory overhead compared to the self-attention (SA) mechanism. Finally, feature fusion is performed at the representation level to complete classification and prediction. This late fusion approach fully leverages the high sensitivity of the K-band to micro-motion features from UAV rotors or bird flapping, as well as the complementary observation capabilities of the L-band under different environmental backgrounds. It achieves more compact feature integration while preserving the independent discriminative information from both bands. The network architecture of the classification algorithm based on L/K dual-band micro-Doppler and Mamba is illustrated in Figure 3.
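The late-fusion step described above reduces to a simple operation on the two per-band global vectors; the toy linear head below is only a stand-in for the classifier head, and its dimensions and weights are made-up values for illustration:

```python
def late_fuse(h_L, h_K):
    """Late fusion (LF): concatenate the L-band and K-band global representations,
    preserving each branch's independent discriminative evidence."""
    return list(h_L) + list(h_K)

def linear_head(h, weights, bias):
    """Toy 2-class linear classifier head over the fused vector (illustrative only)."""
    return [sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(weights, bias)]

# Hypothetical 2-dim branch outputs -> 4-dim fused vector -> 2 class logits
h_L, h_K = [1.0, 2.0], [3.0, 4.0]
fused = late_fuse(h_L, h_K)  # [1.0, 2.0, 3.0, 4.0]
logits = linear_head(fused,
                     [[1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0]],
                     [0.0, 0.0])
```

Because fusion happens after each branch has produced its own high-level representation, a weak or clutter-corrupted band degrades only its half of the fused vector rather than contaminating the other branch's features.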

5.1. L/K Dual-Band Micro-Doppler Spectrogram Representation and Positional Encoding

For each observation sample, its micro-Doppler spectrograms from the L-band and K-band are acquired; they are expressed in Formula (11).
$$X_L\in\mathbb{R}^{F\times T},\qquad X_K\in\mathbb{R}^{F\times T}$$
where F is the frequency dimension, and T is the number of time frames.
To mitigate the impact of amplitude scale differences between different frequency bands on model training, this paper applies logarithmic compression to the original amplitude spectrograms and performs sample-level normalization. This ensures that the network focuses on extracting the texture and morphological features of the time-frequency distribution, reducing its reliance on absolute energy amplitudes [23].
To enable sequential modeling of the two-dimensional spectrograms, the input spectrogram $X^b\in\mathbb{R}^{F\times T}$, $b\in\{L,K\}$, is partitioned into non-overlapping patches of size $p_f\times p_t$. The total number of patches $N$ is calculated by Formula (12).

$$N=\left\lfloor \frac{F}{p_f}\right\rfloor\left\lfloor \frac{T}{p_t}\right\rfloor$$
Within this non-overlapping partition, the $i$-th patch is vectorized to obtain $p_i^b$, as expressed by Formula (13).

$$p_i^b=\operatorname{vec}\!\left(X^b\big[f_i:f_i+p_f,\;t_i:t_i+p_t\big]\right)$$

where $p_i^b\in\mathbb{R}^{p_f p_t}$ represents the flattened vector of the $i$-th patch from the spectrogram of band $b$, $i=1,\ldots,N$.
The vectorized patches are mapped to a $D$-dimensional feature vector (token) space via linear projection. To incorporate time-frequency positional information, learnable positional encodings (PEs) are added to the feature vectors. A classification token (CLS token) $z_{cls}$ is prepended to the sequence to aggregate global discriminative information. The construction of the token sequence $z_0^b$ can be expressed by Formula (14).

$$z_i^b=W_e\,p_i^b+b_e,\qquad z_0^b=\big[\,z_{cls};\;z_1^b;\;\ldots;\;z_N^b\,\big]+E_{pos}$$

where $W_e\in\mathbb{R}^{D\times(p_f p_t)}$ and $b_e\in\mathbb{R}^{D}$ are the weight matrix and bias term of the linear projection layer, respectively, which map patch vectors into the $D$-dimensional token embedding space. $z_i^b\in\mathbb{R}^{D}$ is the token representation corresponding to the $i$-th patch, and $z_{cls}\in\mathbb{R}^{D}$ is a learnable classification token designed to aggregate global discriminative information. $z_0^b\in\mathbb{R}^{(N+1)\times D}$ denotes the initial token sequence after incorporating positional encoding, where $N$ is the number of patches obtained from spectrogram partitioning, and $[\,z_{cls};\;z_1^b;\;\ldots;\;z_N^b\,]$ represents concatenation along the token dimension. Meanwhile, $E_{pos}\in\mathbb{R}^{(N+1)\times D}$ is a learnable positional encoding matrix used to inject the time-frequency positional information of each token [24].
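Formulas (12) and (13) amount to a simple non-overlapping tiling of the spectrogram. The sketch below patchifies a toy 6x8 "spectrogram"; the sizes $p_f=2$, $p_t=4$ are arbitrary, and the linear projection $W_e$, the CLS token, and $E_{pos}$ are omitted for brevity:

```python
def patchify(spec, p_f, p_t):
    """Split a 2-D spectrogram (F rows x T columns, as nested lists) into
    non-overlapping p_f x p_t patches, each flattened row-major (Formula (13))."""
    F, T = len(spec), len(spec[0])
    patches = []
    for fi in range(0, F - p_f + 1, p_f):
        for ti in range(0, T - p_t + 1, p_t):
            patches.append([spec[fi + df][ti + dt]
                            for df in range(p_f) for dt in range(p_t)])
    return patches

# Toy spectrogram: F = 6 frequency bins, T = 8 time frames, entries f*T + t.
F, T, p_f, p_t = 6, 8, 2, 4
spec = [[float(f * T + t) for t in range(T)] for f in range(F)]
tokens = patchify(spec, p_f, p_t)
# N = floor(F/p_f) * floor(T/p_t) = 3 * 2 = 6 patches of length p_f*p_t = 8
```

Each flattened patch would then be projected by $W_e$ to a $D$-dimensional token, giving the $N+1$-token sequence (with the CLS token) consumed by the encoder.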

5.2. Mamba-Based SSM Spectrogram Encoding

Because micro-Doppler spectrograms contain periodic modulation textures spanning long time windows and exhibit strong non-stationary characteristics under varying observation angles, signal-to-noise ratios (SNRs), and frequency band conditions, the traditional self-attention mechanism suffers from quadratic computational complexity when processing large-scale feature vector sequences. Moreover, it is prone to unstable training convergence in small-sample scenarios. Therefore, this paper employs a state-space model (SSM) architecture as the backbone of the spectrogram encoder to replace the self-attention mechanism with quadratic complexity. The SSM block achieves a linear recurrent modeling of feature sequences through a state-space mixer, reducing computational and memory overhead while maintaining feature discriminative performance. It also enhances the capability to capture long-range time-frequency information. The network structure of the Mamba Encoder is illustrated in Figure 4.
During the encoding process, residual connections and feed-forward networks (FFNs) are used to maintain feature representation capability. Let $\Phi_{\mathrm{SSM}}(\cdot)$ denote the spectrogram encoder formed by cascading $M$ SSM blocks [25]. For each frequency band $b \in \{L, K\}$, the encoder output is given by Formula (15).
$H^b = \Phi_{\mathrm{SSM}}\left( z_0^b \right)$
where $H^b \in \mathbb{R}^{(N+1) \times D}$ denotes the high-level token representation of the corresponding band.
Subsequently, the classification token at the beginning of the sequence is extracted as the global representation of that band; it can be expressed by Formula (16).
$h^b = H_0^b$
where $h^b \in \mathbb{R}^{D}$ is the global representation, and $H_0^b$ denotes the 0-th row of $H^b$, i.e., the output vector corresponding to $z_{\mathrm{cls}}$.
This design leverages the advantage of linear computational complexity in SSMs to effectively capture long-sequence dependencies in micro-Doppler features while reducing hardware resource consumption during training and inference. By using the classification token to aggregate global information, the model can extract the overall characteristics of the time-frequency texture within a single frequency band.
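As a minimal sketch of the linear-time recurrence underlying the SSM block, the following diagonal state-space scan is a heavily simplified stand-in for the selective scan inside a real Mamba block: it omits input-dependent parameters, gating, and the Conv1d branch, and assumes fixed diagonal matrices.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal diagonal state-space scan over a token sequence x (L x D).
    For each token x_k: s_k = A * s_{k-1} + B * x_k ; y_k = C * s_k.
    The recurrence is linear in sequence length L, unlike the O(L^2)
    cost of self-attention."""
    L, D = x.shape
    s = np.zeros(D)
    y = np.empty_like(x)
    for k in range(L):
        s = A * s + B * x[k]   # elementwise (diagonal) state update
        y[k] = C * s           # readout
    return y

L, D = 50, 256                 # 49 patch tokens + 1 CLS token, D = 256
x = np.random.default_rng(0).normal(size=(L, D))
A = np.full(D, 0.9)            # decay < 1 controls effective memory length
B = np.ones(D)
C = np.ones(D)
y = ssm_scan(x, A, B, C)
print(y.shape)  # (50, 256)
```

Each token is processed once against a constant-size state, so the cost grows linearly with sequence length; in Mamba, $A$, $B$, and $C$ additionally depend on the input at each step.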

5.3. L/K Dual-Branch Joint Classification Head

After parallel encoding of the L-band and K-band spectrograms, the global representation vectors $h^L$ and $h^K$ are extracted, respectively [26]. To integrate and complement information from multiple frequency bands, this paper employs a late fusion strategy by concatenating the two global representations along the feature dimension:
$h = [\, h^L;\ h^K \,]$
where $h \in \mathbb{R}^{2D}$ is the fused feature vector after concatenation.
Subsequently, the fused vector is fed into a Multi-Layer Perceptron (MLP) classification head, and the target classification prediction probability is computed via the softmax function; it can be expressed by Formula (18).
$\hat{y} = \mathrm{softmax}\left( W_c h + b_c \right)$
where $\hat{y} \in \mathbb{R}^{c_0}$ is the classification prediction probability vector for the target, $W_c \in \mathbb{R}^{c_0 \times 2D}$ and $b_c \in \mathbb{R}^{c_0}$ are the weight matrix and bias of the classification head, and $c_0$ is the number of classes.
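A minimal sketch of this late-fusion head, concatenation followed by the softmax classifier of Formula (18); the weights below are random placeholders, and the three-class output matches the experimental setting.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def fuse_and_classify(h_L, h_K, W_c, b_c):
    """Late fusion: concatenate the L- and K-band global representations,
    then apply a linear head with softmax (Formula (18))."""
    h = np.concatenate([h_L, h_K])   # fused feature of length 2D
    return softmax(W_c @ h + b_c)

rng = np.random.default_rng(0)
D, n_classes = 256, 3
y_hat = fuse_and_classify(rng.normal(size=D), rng.normal(size=D),
                          rng.normal(scale=0.02, size=(n_classes, 2 * D)),
                          np.zeros(n_classes))
print(round(float(y_hat.sum()), 6))  # 1.0 — the output is a probability vector
```

Keeping the two branches separate until this point preserves band-specific discriminative information before the complementary cues are combined.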

5.4. Joint Loss via Mutual Learning and Contrastive Learning

To simultaneously leverage the complementary information in the L-band and K-band micro-Doppler spectrograms, this paper adopts a multi-loss joint supervision objective to optimize the dual-branch encoding and late-fusion classification framework. First, the cross-entropy loss from the fused classification head is used as the main supervision term to ensure correct discrimination between UAVs and birds. Second, since the L/K dual branches are designed to classify spectrograms of the same target observed under different carrier frequencies, their posterior class distributions should remain consistent. Therefore, drawing on the concept of peer-teaching from Deep Mutual Learning, a mutual learning term based on Kullback–Leibler (KL) divergence is introduced to align the prediction distributions of the two branches during training, thereby enhancing robustness under cross-band conditions [27,28]. Finally, to further improve the intra-class compactness and inter-class separability of the micro-Doppler representations, a supervised contrastive learning (SupCon) loss is applied to the fused representation, mitigating the common issues of inter-class similarity and intra-class variation in micro-Doppler spectrograms.
For the L-band and K-band micro-Doppler spectrograms, let the classification logits of the $b$-th branch be $z^b$; its softmax probability is given by Formula (19).
$p^b = \mathrm{softmax}\left( z^b \right)$
where $p^b$ is the posterior probability vector obtained via standard softmax; $z^b \in \mathbb{R}^{C}$, with $C = 2$ being the number of classes; and the index $b \in \{L, K\}$ denotes the band, corresponding to the L-band and K-band branches, respectively.
To obtain smoother soft distributions for distribution matching, a temperature parameter $T > 1$ is introduced, defining the temperature-scaled softmax as follows:
$p_T^b = \mathrm{softmax}\left( z^b / T \right)$
where $p_T^b$ represents the temperature-scaled softmax probability of branch $b$.
For each UAV or bird sample, the ground-truth label is denoted as $y$. The Cross-Entropy (CE) loss function for the $b$-th band branch is defined by Formula (21).
$L_{\mathrm{CE}}^b = -\sum_{c=1}^{2} y_c \log \left( p_c^b \right)$
where $y_c$ is the one-hot ground-truth indicator for class $c$, and $p_c^b$ is the predicted probability for class $c$ from branch $b$.
To avoid the bias introduced by uni-directional knowledge distillation, symmetric KL divergence is adopted. It is defined as follows:
$L_{\mathrm{KL}} = \frac{1}{2} \left[ \mathrm{KL}\left( p_T^L \,\|\, p_T^K \right) + \mathrm{KL}\left( p_T^K \,\|\, p_T^L \right) \right]$
where KL denotes the Kullback–Leibler divergence, which constrains the consistency of the prediction distributions from the L and K branches for the same sample. The KL divergence is calculated by Formula (23).
$\mathrm{KL}\left( a \,\|\, b \right) = \sum_{c=1}^{C} a_c \log \frac{a_c}{b_c}$
where $a$ and $b$ are two probability distribution vectors, with $a_c$ and $b_c$ being their $c$-th components, respectively.
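A minimal numerical sketch of the temperature-scaled symmetric KL term, built directly from Formula (23); the two-class logits and the temperature value T = 2 below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax for a 1-D logit vector."""
    e = np.exp(z / T - (z / T).max())
    return e / e.sum()

def kl(a, b):
    """KL divergence between probability vectors (Formula (23))."""
    return float(np.sum(a * np.log(a / b)))

def symmetric_kl_loss(z_L, z_K, T=2.0):
    """Mutual-learning term: symmetric KL between the temperature-scaled
    predictions of the L- and K-band branches."""
    p_L, p_K = softmax(z_L, T), softmax(z_K, T)
    return 0.5 * (kl(p_L, p_K) + kl(p_K, p_L))

z_L = np.array([2.0, 0.5])   # illustrative L-band logits
z_K = np.array([1.5, 0.8])   # illustrative K-band logits
print(symmetric_kl_loss(z_L, z_K))  # >= 0; zero iff the two distributions match
```

Minimizing this quantity pulls the two branch posteriors toward each other without designating either band as the teacher.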
This term encourages the L-band and K-band branches to produce consistent class posterior distributions for the same sample, thereby achieving a collaborative constraint of discriminative information across different observation scales (carrier frequencies). To further enhance the discriminativeness of the spectrogram representations, a Supervised Contrastive Loss (SupCon) is introduced in addition to the cross-entropy supervision and the dual-band KL consistency constraint. Samples of the same class within a batch are treated as positive pairs, while samples from different classes are treated as negative pairs. This directly tightens the within-class distribution and enlarges the between-class separation in the feature space. Considering that the L and K bands provide complementary observations of the same target, the embeddings outputs from the two band branches are incorporated into the contrastive learning to simultaneously improve the consistency and discriminativeness of the cross-band representations.
Let the representation vector for sample $i$ from a single branch be $u_i \in \mathbb{R}^{d}$, and its $\ell_2$-normalized version be $v_i = u_i / \| u_i \|$. For an anchor point $i$, its set of positive samples is defined by Formula (24).
$P(i) = \{\, p \in A(i) \mid y_p = y_i \,\}$
where $P(i)$ represents the set of indices corresponding to positive samples that belong to the same class as the anchor point; $A(i)$ denotes the set of all sample indices participating in the comparison except for the anchor point itself; and $u_i$ refers to the feature vector of sample $i$ in the contrastive space [29].
The SupCon loss is defined by Formula (25).
$L_{\mathrm{SupCon}} = \sum_i \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp \left( v_i^{\mathsf{T}} v_p / \tau \right)}{\sum_{a \in A(i)} \exp \left( v_i^{\mathsf{T}} v_a / \tau \right)}$
where $v_i^{\mathsf{T}} v_p$ denotes the cosine similarity between normalized features, and $\tau$ is the contrastive temperature coefficient used to adjust the sharpness of the similarity distribution [29].
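Formula (25) can be sketched directly from the definitions above; the batch of embeddings and labels here is a synthetic placeholder, not data from this paper.

```python
import numpy as np

def supcon_loss(U, y, tau=0.07):
    """Supervised contrastive loss (Formula (25)): l2-normalize the
    embeddings, then for each anchor pull same-class samples together
    against all other samples in the batch."""
    V = U / np.linalg.norm(U, axis=1, keepdims=True)
    sim = V @ V.T / tau                 # pairwise cosine similarities / tau
    n = len(y)
    loss = 0.0
    for i in range(n):
        others = [a for a in range(n) if a != i]        # A(i)
        pos = [p for p in others if y[p] == y[i]]       # P(i), Formula (24)
        if not pos:
            continue                    # anchor with no positives contributes 0
        denom = np.sum(np.exp(sim[i, others]))
        loss += -np.mean([np.log(np.exp(sim[i, p]) / denom) for p in pos])
    return loss

rng = np.random.default_rng(0)
U = rng.normal(size=(8, 16))            # synthetic batch of 8 embeddings
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])  # synthetic class labels
print(supcon_loss(U, y) > 0)  # True: each log-ratio term is strictly positive
```

In the proposed framework, the embeddings fed to this loss come from both band branches, so the contrastive structure is enforced across bands as well as within them.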
Therefore, the total loss function is defined as the weighted sum of the aforementioned loss components; it can be expressed by Formula (26).
$L = L_{\mathrm{CE}}^{F} + \omega_{\mathrm{aux}} \left( L_{\mathrm{CE}}^{L} + L_{\mathrm{CE}}^{K} \right) + \omega_{\mathrm{KL}} T^2 L_{\mathrm{KL}} + \omega_{\mathrm{sup}} L_{\mathrm{SupCon}}$
where $L$ represents the total loss; $L_{\mathrm{CE}}^{F}$ denotes the cross-entropy loss of the fused classification head; $L_{\mathrm{CE}}^{L}$ and $L_{\mathrm{CE}}^{K}$ are the auxiliary supervision terms for the L-band and K-band branches, respectively, with $\omega_{\mathrm{aux}}$ as the weighting coefficient for this auxiliary supervision; $\omega_{\mathrm{KL}}$ is the weighting coefficient of the KL consistency term $L_{\mathrm{KL}}$, and the factor $T^2$ offsets the impact of temperature scaling on gradient magnitudes; and $\omega_{\mathrm{sup}}$ is the weighting coefficient for the SupCon loss term [30].
This paper employs a joint optimization strategy that combines the primary supervision from the fused head’s cross-entropy loss, the KL consistency constraint between the L/K dual branches for cross-band prediction alignment, and the discriminative learning via SupCon on the fused representations. This multi-loss design aligns with the L/K dual-band micro-Doppler modeling proposed in this work. It leverages the K-band branch to focus on the fine-grained micro-motion textures caused by rotors or flapping wings, while the L-band branch provides complementary observations under varying backgrounds and propagation conditions. The consistency term and the contrastive term respectively promote information synergy at the distribution level and the representation level between the two branches, thereby enhancing the robustness and generalization capability for UAV and bird classification.

6. Experiments and Analysis

To verify the effectiveness of the proposed method in the classification task of unmanned aircraft and birds, this section conducts systematic experiments covering dataset construction, experimental setup, and analysis of the experimental results. Combined with visualization results and comparative experiments, the classification performance, feature representation ability, and the contribution of each component module of the proposed method are verified.

6.1. Dataset Construction

This paper employs an L-band and K-band frequency-modulated continuous-wave (FMCW) radar platform to build a dual-band cooperative observation system, in which the center frequency of the L-band radar is 1.5 GHz and that of the K-band radar is 24 GHz. A total of 540 L/K dual-band synchronized radar echo sequences of UAVs, birds, and other targets were collected. To form standardized input samples suitable for deep learning, each dual-band synchronized radar echo sequence was subjected to a unified time-frequency analysis and normalization processing to generate the corresponding dual-band micro-Doppler spectrogram samples. In this way, we eventually obtained 3240 dual-band micro-Doppler spectrogram samples, including 1260 UAV samples, 1080 bird samples, and 900 samples in the other category.
The UAV samples cover three typical rotary-wing UAV types, including two-rotor, four-rotor, and six-rotor UAVs, so as to reflect the influence of different rotor numbers and micro-motion mechanisms on micro-Doppler textures. The bird samples are derived from four bird species. The radar system synchronously records the micro-Doppler echoes generated during their natural flapping flight, so as to obtain time-frequency characteristics under different relative postures and flight stages. The samples in the other category consist of insects, small animals, plastic bags, balloons, and ground clutter, so as to enhance the robustness of the model against interference in the other category under complex low-altitude backgrounds. Among them, ground clutter mainly originates from non-target echoes caused by vegetation disturbance, ground reflections, and background scattering. For plastic bags and balloons, the experiment simulates their displacement and flipping motion within the observation area through suspension, towing, and throwing by thin strings, and the radar system synchronously records the echo sequences. In addition, a small number of samples also include non-stationary scattering caused by small moving targets at the edge of the observation area.
Regarding data acquisition conditions, all experiments were conducted in outdoor, open, low-altitude scenarios, and the acquisition environment covered sunny, cloudy, and light-wind conditions. During the acquisition process, the targets were mainly located within an observation range of 12–150 m from the radar, the observation azimuth angles were mainly distributed within the range of −60° to 60°, and the pitch angles were mainly distributed within the range of 10° to 45°. For UAV targets, the acquisition process covered typical motion states such as hovering, level flight, slow turning, ascending, and descending. For bird targets, the samples covered natural flapping flight and echo variations under different relative postures as much as possible. For interference targets in the other category, their non-target echoes in low-altitude activity areas were obtained through actual observation and simulation. The dataset was then partitioned by taking the complete L/K dual-band synchronized radar echo sequence acquired for the same target during a single continuous observation as the basic unit. The training, validation, and test sets were divided in a ratio of 7:1:2, corresponding to 2268, 324, and 648 samples, respectively. Among them, each L/K dual-band synchronized radar echo sequence and all spectrogram samples generated from it belong to only one data subset and do not appear simultaneously in the training, validation, and test sets. Different target types within the same category may appear in different data subsets, but they correspond to data acquired at different time periods, under different flight processes, and from different original radar echo sequences.
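The sequence-level partitioning described above can be sketched as follows, assuming each of the 540 synchronized echo sequences carries an integer identifier (the seed and identifiers are illustrative). Splitting by sequence rather than by spectrogram guarantees that all samples derived from one sequence fall into exactly one subset, avoiding train/test leakage.

```python
import random

def sequence_level_split(seq_ids, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split by complete echo sequences (the basic unit of the dataset),
    so spectrograms from one sequence never span multiple subsets."""
    ids = sorted(set(seq_ids))
    random.Random(seed).shuffle(ids)          # deterministic shuffle
    n = len(ids)
    n_tr, n_va = int(ratios[0] * n), int(ratios[1] * n)
    return ids[:n_tr], ids[n_tr:n_tr + n_va], ids[n_tr + n_va:]

train, val, test = sequence_level_split(list(range(540)))
print(len(train), len(val), len(test))  # 378 54 108
```

All spectrogram samples would then be routed to the subset containing their parent sequence identifier.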

6.2. Experimental Setup

The experiments in this study were conducted on a Linux-based computing platform. The hardware configuration included an Intel Xeon Platinum 8481C CPU, an NVIDIA A100 (32 GB) GPU, and 128 GB of DDR5 RAM. The software environment was built using the PyTorch 2.5.1 framework, Python 3.9, and CUDA 12.1. The specific hyperparameter settings of the proposed network are as follows. The input L-band and K-band micro-Doppler spectrograms are of the size 128 × 128. For each band branch, the spectrogram is serialized using an overlapping patch partitioning strategy, with a patch size of 32 × 32, a stride of 16, 49 patch tokens, and an input sequence length of 50 after adding one learnable CLS token. Each patch is linearly projected into a 256-dimensional token embedding space. The Mamba Encoder in each band branch consists of six encoding blocks in cascade, with a hidden dimension of 256, a feed-forward network dimension of 512, a Conv1d kernel size of three, and a dropout rate of 0.1. After encoding, the CLS representations from the output sequences of the L-band and K-band branches are extracted respectively and linearly mapped to 256-dimensional global embedding vectors; then, a late fusion strategy is adopted for feature-level fusion to obtain a 512-dimensional fused feature, which is finally fed into a one-layer fully connected classification head to output the final three-class classification result. The model is trained using the AdamW optimizer, with momentum parameters set to $\beta_1 = 0.9$ and $\beta_2 = 0.999$, an initial learning rate of 0.001, a weight decay of 0.05, a batch size of eight, and 150 training epochs. The learning rate schedule adopts the Cosine Annealing strategy, and a warm-up scheme is used in the first 10 epochs to smoothly increase the learning rate; meanwhile, to enhance training stability, the gradient clipping threshold is set to 1.0.
In terms of loss function settings, the main classification task uses the cross-entropy loss, the auxiliary branch supervision loss weight $\omega_{\mathrm{aux}}$ is set to 0.3, the KL consistency loss weight $\omega_{\mathrm{KL}}$ is set to 0.5, the SupCon loss weight $\omega_{\mathrm{sup}}$ is set to 0.1, and the contrastive temperature coefficient $\tau$ is set to 0.07. These settings are jointly used to ensure the training stability and final performance of the proposed dual-band Mamba classification network.
To conduct comparative experiments, we select VGG16, ResNet50, Swin Transformer, and ConvNeXt as baseline models. Considering that these networks are standard image classification architectures, the L-band and K-band micro-Doppler spectrograms are used as inputs to two separate branches to adapt to the dual-band radar time-frequency classification task. In terms of model structure, all baseline models are constructed in a dual-branch manner, where the L-band and K-band spectrograms are fed into two homogeneous branches for feature extraction, followed by late fusion at the high-level feature stage and a shared classification head to produce the final three-class output. Specifically, VGG16, ResNet50, and ConvNeXt employ dual-branch convolutional feature extraction structures, while Swin Transformer adopts a dual-branch Transformer encoding structure. The input preprocessing pipeline, training strategy, and major hyperparameter settings of the baseline models are consistent with those of our proposed method.
To comprehensively evaluate the performance of the recognition network, this experiment adopts the following four metrics for measurement: Accuracy (ACC), macro-averaged Precision (mAP), macro-averaged F1-score (MF1), and the Kappa Coefficient (KC).
$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i}$
$\mathrm{MF1} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \times \frac{TP_i}{TP_i + FP_i} \times \frac{TP_i}{TP_i + FN_i}}{\frac{TP_i}{TP_i + FP_i} + \frac{TP_i}{TP_i + FN_i}}$
$p_e = \frac{(TP + FN)(TP + FP) + (FN + TN)(TN + FP)}{\tilde{N}^2}$
$\mathrm{KC} = \frac{\mathrm{ACC} - p_e}{1 - p_e}$
where $N$ represents the number of classes; $TP$ (True Positive) denotes the number of samples correctly predicted as positive; $TN$ (True Negative) denotes the number of samples correctly predicted as negative; $FP$ (False Positive) denotes the number of samples incorrectly predicted as positive; $FN$ (False Negative) denotes the number of samples incorrectly predicted as negative; $p_e$ represents the expected agreement by chance; and $\tilde{N}$ represents the total number of samples.
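These metrics can be computed from a confusion matrix as follows. The matrix below is an illustrative placeholder rather than a result from this paper, and the chance agreement $p_e$ is evaluated in its standard multi-class form, which reduces to the two-class expression above.

```python
import numpy as np

def metrics_from_confusion(M):
    """Compute ACC, macro Precision (mAP), macro F1 (MF1), and the Kappa
    coefficient (KC) from an N x N confusion matrix M
    (rows = true class, columns = predicted class)."""
    total = M.sum()
    acc = np.trace(M) / total
    tp = np.diag(M).astype(float)
    fp = M.sum(axis=0) - tp                  # column sums minus diagonal
    fn = M.sum(axis=1) - tp                  # row sums minus diagonal
    prec = tp / (tp + fp)                    # per-class precision
    rec = tp / (tp + fn)                     # per-class recall
    map_ = prec.mean()
    mf1 = (2 * prec * rec / (prec + rec)).mean()
    # multi-class chance agreement: sum over classes of (row sum * col sum)
    p_e = np.sum(M.sum(axis=0) * M.sum(axis=1)) / total**2
    kc = (acc - p_e) / (1 - p_e)
    return acc, map_, mf1, kc

M = np.array([[250,   5,   3],
              [  4, 210,   2],
              [  3,   4, 167]])              # illustrative 3-class matrix
acc, map_, mf1, kc = metrics_from_confusion(M)
print(round(acc, 3))  # 0.968
```

Per-class precision and recall are undefined when a class receives no predictions or has no true samples; the sketch assumes every class appears on both axes.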

6.3. Analysis of Experimental Results

The classification algorithm proposed in this paper, based on L/K dual-band micro-Doppler and Mamba, uses the textural differences in the micro-Doppler time-frequency spectrograms between UAVs and birds as the basis for discrimination, modeling the classification task as supervised learning on dual-band spectrograms. Therefore, this section presents the visualization results of micro-Doppler time-frequency spectrograms for some UAV and bird samples to intuitively display the time-frequency representation patterns of the experimental data. Figure 5 shows a comparison of micro-Doppler time-frequency spectrograms for some UAV and bird samples under L-band and K-band conditions. L/K dual-band observations can characterize the micro-Doppler texture differences arising from UAV rotor rotation and bird flapping motions at different scales. For rotorcraft UAVs, the modulation stripes generated by rotor blade rotation are clearer in the K-band spectrograms, with more prominent spectral peaks. This enhancement is more pronounced when the blades are longer, or the rotational speed is higher. Under L-band conditions, the time-frequency spectrograms exhibit relatively constrained broadening and smoother textures, yet they still preserve the overall periodic structure of the rotor modulation. For bird targets, the sidebands near the main Doppler caused by flapping are more readily apparent in the K-band spectrograms, with clearer texture fluctuations. The modulation features are more significant when the wingspan is larger, or the flapping frequency is higher. Under L-band conditions, the spectral details are relatively subdued, with narrower broadening, placing greater emphasis on the overall outline and low-frequency modulation trends. 
Based on the aforementioned dual-band and cross-category differences, the proposed classification algorithm based on L/K dual-band micro-Doppler and Mamba encodes the two spectrogram streams separately to extract long-term sequential texture representations. By fusing representations, it jointly utilizes the details from the K-band and the scale complementarity of the L-band. Simultaneously, cross-branch consistency constraints are introduced to align the discrimination results from both bands, thereby achieving stable differentiation between UAVs and birds.
To compare the convergence and stability of different models during the training process, this paper records the accuracy (ACC) and loss variation curves with respect to training epochs for each compared method under identical training configurations. The results are shown in Figure 6.
As shown in Figure 6, all models exhibit rapid improvement in Accuracy (ACC) and a reduction in loss during the initial training phase. At the same epoch, the ACC of the proposed algorithm increases more rapidly and reaches a stable plateau earlier, while its loss decreases more substantially and converges to a lower final value. This is attributed to the proposed method’s ability to jointly utilize the complementary discriminative information from the dual-band spectrograms, enabling the model to establish more stable decision boundaries early in training. Furthermore, the Mamba-based SSM spectrogram encoder possesses efficient long-range dependency modeling capabilities, effectively capturing the global structure of micro-Doppler textures, thereby accelerating feature convergence. Overall, the convergence curves validate the effectiveness of the proposed method from the perspective of the training process.
To further analyze the recognition performance of the proposed method in the UAV and bird classification task and to identify the main sources of confusion, the prediction results of the test set are summarized, and the confusion matrix is presented in Figure 7.
As shown in Figure 7, the proposed method achieves high classification accuracy for both UAVs and birds on the test set, although a small number of misclassifications still exist. Specifically, the real UAVs misclassified as birds indicate that when the rotor modulation textures of UAVs are not obvious, when the target posture changes significantly, or when the micro-Doppler sideband structures are weakened under low signal-to-noise ratio conditions due to noise and clutter interference, the spectrogram may exhibit features similar to bird flapping textures. The real birds misclassified as UAVs may be caused by the fact that, under specific flapping postures, relatively stable flapping frequencies, or low signal-to-noise ratio conditions, the local time-frequency textures of birds become similar to rotor modulation stripes, thereby increasing the risk of confusion. In addition, some “Other” samples are misclassified as birds, indicating that under specific motion states or noisy backgrounds, “Other” targets may also produce local time-frequency textures similar to those of birds; a small number of UAV samples are misclassified as “Other,” which reflects that the model still has a certain risk of missed detection under extremely low signal-to-noise ratio conditions or when the textures are incomplete. Overall, the confusion matrix shows that the proposed method can distinguish UAV and bird targets relatively well, but a small number of misclassifications may still occur under complex scenarios such as a low signal-to-noise ratio, rapid posture changes, and enhanced background interference. This indicates that the proposed method is generally stable, but its discriminative ability on difficult samples still has room for further improvement.
To intuitively demonstrate the discriminability of features learned by the model across different classes, t-SNE (t-distributed Stochastic Neighbor Embedding) is employed to project the high-dimensional feature embeddings output by the network into a two-dimensional plane for comparison. For each sample in the test set, the high-level feature representations extracted by the baseline network and the proposed algorithm are respectively obtained and visualized with different colors representing the three target classes: blue for UAV, red for Bird, and orange for Other. The comparative results are shown in Figure 8. Notably, the baseline network’s architecture does not include the proposed components such as Patch-Tokenization for sequence input, a Mamba-SSM encoder, L/K dual-branch late fusion, and the combined mutual learning and contrastive learning loss.
As shown in Figure 8a, the feature distribution of the baseline network exhibits a certain degree of inter-class overlap and blurred boundaries. UAVs and birds, as well as birds and the “Other” class, are more prone to local aliasing. Some UAV features resemble those of the “Other” class, indicating that, under the influence of background interference and non-rigid micro-motion patterns, the discriminative representations learned by the baseline network remain insufficiently compact. In contrast, in Figure 8b, the three-class samples from the proposed algorithm show more compact clustering and significantly increased inter-class separation, with only a few outliers remaining. UAVs, birds, and the “Other” class exhibit a more stable trend of separation in the embedding space. This demonstrates that the proposed Patch-Tokenization and Mamba-SSM can more effectively encode the local structure and long-term sequential relationships of micro-Doppler spectrograms. Combined with the integration of complementary cross-band information through L/K dual-branch late fusion, and the joint constraints on consistency and discriminability imposed by the combined mutual learning and contrastive learning loss, the proposed approach enhances the overall intra-class compactness and inter-class separability in the feature space. This is conducive to the effective classification of UAVs, birds, and the “Other” class. Meanwhile, the small number of samples still located near the class boundaries in Figure 8b also correspond to the small number of misclassification cases shown in the confusion matrix, indicating that these samples mainly belong to difficult cases affected by a low signal-to-noise ratio, large posture variations, or complex background interference. 
From the perspective of feature distribution, the t-SNE results further verify the overall feature representation ability of the proposed method and provide relatively intuitive support for the reliability analysis of the proposed method.
To verify the overall effectiveness of the proposed classification algorithm based on L/K dual-band micro-Doppler and Mamba, this paper selects VGG16, ResNet50, Swin Transformer, and ConvNeXt as comparative methods. Performance comparisons were conducted under identical training/testing splits and evaluation metrics. The results are shown in Table 1.
From Table 1, it can be observed that the proposed algorithm achieves optimal results across all evaluation metrics, with ACC, mAP, MF1, and KC reaching 0.975, 0.962, 0.956, and 0.952, respectively. Compared to the best-performing baseline, Swin Transformer, the proposed approach improves ACC by 1.7 percentage points, and mAP, MF1, and KC by 0.011, 0.008, and 0.009, respectively. This indicates that the proposed dual-band fusion modeling can more fully utilize the complementary discriminative information from the L-band and K-band spectrograms, thereby enhancing the overall classification performance for UAVs and birds. In terms of computational cost, the proposed method has 18.95 G FLOPs, which is within an acceptable complexity range, and it achieves better performance than comparable or higher-complexity models. Combined with the current network structure and experimental platform configuration, the model weight size of the proposed method is approximately 48.3 MB, and the average inference time for a single dual-band micro-Doppler spectrogram sample is approximately 6.8 ms. Considering the model size, computational cost, and inference time together, the proposed method has certain potential for real-time radar applications after the completion of the front-end time-frequency transformation.
In addition, to further evaluate the statistical robustness of the model performance and to verify that the performance improvement of the proposed method over the strongest baseline, Swin Transformer, is not caused by chance due to random initialization, we conducted independent repeated experiments with three different random seeds for both methods under the same data split and training configuration. The experimental results are shown in Table 2. As can be seen from the repeated experimental results in Table 2, the proposed method shows only small fluctuations across the three independent repeated runs. The mean values of ACC, mAP, MF1, and KC are 97.43%, 96.17%, 95.57%, and 95.17%, respectively, with corresponding standard deviations of 0.21%, 0.21%, 0.29%, and 0.29%. In comparison, the mean values of ACC, mAP, MF1, and KC for Swin Transformer are 95.73%, 95.10%, 94.80%, and 94.30%, respectively, with corresponding standard deviations of 0.17%, 0.33%, 0.16%, and 0.14%. These results indicate that the proposed method is able to maintain stable and superior classification performance under different random initializations. The performance fluctuations are small, and the observed performance gain is not caused by random variation during training, demonstrating good stability and robustness.
To verify the contribution of each innovation in the proposed method to the classification performance of UAVs and birds, ablation experiments were conducted by progressively introducing different modules, and performance metrics including ACC, mAP, mF1, and KC were evaluated on the test set. The results are shown in Table 3.
As can be seen from Table 3, the classification performance shows a steady improvement trend as the model structure and training strategy are progressively introduced. When only the L/K dual branches and the basic supervised loss are used, the model already exhibits a preliminary discriminative capability. After introducing Patching, ACC, mAP, and mF1 all improve, indicating that converting the two-dimensional micro-Doppler spectrogram into a token sequence can better preserve local time-frequency structures and provide a more effective representation basis for subsequent sequence modeling. Next, after adding the Mamba-SSM encoder, the improvements in mAP, mF1, and KC become more pronounced, demonstrating that the SSM-based long-range dependency modeling is more effective in capturing the global periodic structures and cross-temporal correlations of micro-Doppler textures, thereby improving classification performance and overall consistency. Furthermore, by applying Late Fusion to merge the two L/K branches at the high-level representation stage, ACC and KC continue to improve, indicating that late fusion can achieve complementary integration while preserving the independent discriminative information of the two bands, which can reduce cross-band information interference. Finally, after introducing the proposed joint loss, combining mutual learning and contrastive learning, ACC, mAP, mF1, and KC all achieve further improvement without changing the model structure, indicating that the collaborative constraint imposed by the joint loss on the outputs of the two branches can enhance feature distributions and thereby improve the final classification performance. In summary, the proposed method achieves consistent improvements in ACC, mAP, mF1, and KC through spectrogram tokenization, Mamba-SSM-based global modeling, L/K dual-branch late fusion, and joint optimization with the comprehensive loss.

7. Conclusions

To address the problem that UAVs and birds are easily confused in radar detection, this paper proposes a classification method based on L/K dual-band micro-Doppler spectrograms. A micro-Doppler motion model is constructed to generate time-frequency spectrogram data, and Patch-tokenization-based sequence modeling, a Mamba state-space backbone, and a dual-branch late-fusion strategy are combined into a spectrogram recognition network with multi-scale perception and temporal modeling capability. Experiments on measured data show that the proposed method achieves 97.5% ACC, 96.2% mAP, 95.6% mF1, and 95.2% KC on the test set. In future work, cross-modal feature alignment or self-supervised learning may be introduced to further improve the generalization ability of the model in complex scenarios.
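At the core of the Mamba backbone summarized above is a discretized linear state-space recurrence. Mamba's selective scan is considerably more involved (input-dependent parameters, hardware-aware scanning), but as a hedged illustration of the underlying mechanism, a plain non-selective SSM applied to a token sequence can be written as:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence over a token sequence:
    h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state).
    Returns the output sequence of shape (T, d_out)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t   # state update carries long-range context
        ys.append(C @ h)      # per-step readout
    return np.stack(ys)
```

Because the state h_t is carried across the whole sequence, the cost is linear in sequence length, which is the property that makes SSM encoders attractive for long micro-Doppler token sequences.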

Author Contributions

Conceptualization, methodology, software and validation, and writing—original draft, T.Z.; methodology and software, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Project of the Shaanxi Provincial Science and Technology Department (No. 2025CY-YBXM-096).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Overall research framework and ideas.
Figure 2. Radar and target observation coordinate systems.
Figure 3. Network architecture of the classification algorithm based on L/K dual-band micro-Doppler and Mamba.
Figure 4. Mamba Encoder network structure.
Figure 5. Comparison of L/K dual-band micro-Doppler time-frequency spectrograms for UAVs and birds.
Figure 6. Variation curves of accuracy and loss with epochs for different models. (a) ACC–epoch curves of different models; (b) loss–epoch curves of different models.
Figure 7. Classification confusion matrix.
Figure 8. t-SNE Visualization comparison of the proposed algorithm. (a) t-SNE visualization of baseline network features; (b) t-SNE visualization of the proposed algorithm’s features.
Table 1. Performance comparison of different backbone networks in the UAV and bird classification task.
| Model | ACC | mAP | mF1 | KC | FLOPs (G) |
|---|---|---|---|---|---|
| VGG16 | 93.1 | 91.8 | 92.5 | 92.2 | 15.51 |
| ResNet50 | 94.9 | 93.7 | 94.1 | 93.2 | 6.25 |
| Swin Transformer | 95.8 | 95.1 | 94.8 | 94.3 | 23.64 |
| ConvNext | 94.3 | 93.6 | 93.9 | 93.3 | 15.41 |
| Proposed Algorithm | 97.5 | 96.2 | 95.6 | 95.2 | 18.95 |
Table 2. Results of three independent, repeated experiments for the proposed method and Swin Transformer.
| Experiment No. | Method | ACC | mAP | mF1 | KC |
|---|---|---|---|---|---|
| 1st | Swin Transformer | 95.5 | 94.7 | 94.6 | 94.1 |
|  | Ours | 97.2 | 95.9 | 95.2 | 94.8 |
| 2nd | Swin Transformer | 95.8 | 95.1 | 94.8 | 94.4 |
|  | Ours | 97.4 | 96.2 | 95.6 | 95.2 |
| 3rd | Swin Transformer | 95.9 | 95.5 | 95.0 | 94.4 |
|  | Ours | 97.7 | 96.4 | 95.9 | 95.5 |
Table 3. Ablation study of the proposed classification algorithm based on L/K dual-band micro-Doppler and Mamba.
| L/K Dual-Branch | Patching | Mamba-SSM | Late Fusion | Joint Loss | ACC | mAP | mF1 | KC |
|---|---|---|---|---|---|---|---|---|
| √ | × | × | × | × | 92.8 | 91.4 | 92.0 | 90.9 |
| √ | √ | × | × | × | 94.0 | 92.7 | 93.2 | 92.1 |
| √ | √ | √ | × | × | 95.4 | 94.1 | 94.6 | 93.6 |
| √ | √ | √ | √ | × | 96.8 | 95.5 | 95.0 | 94.3 |
| √ | √ | √ | √ | √ | 97.5 | 96.2 | 95.6 | 95.2 |
In the table, “√” indicates that the proposed component is used, while “×” indicates that the proposed component is not used and is replaced by an alternative implementation. Specifically, when Patching is marked as “×”, two-dimensional convolutional feature extraction followed by flattening is adopted to generate the sequence representation; when Mamba-SSM is marked as “×”, a Transformer encoder is used to replace Mamba-SSM; when Late Fusion is marked as “×”, score-level averaging of the two branch outputs is adopted as the fusion strategy; and when the joint loss is marked as “×”, only the cross-entropy loss is used for supervised training.
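The contrast between the score-level averaging baseline and the proposed feature-level late fusion can be illustrated with a toy numpy sketch. All weights below are random placeholders standing in for trained parameters, and the 64-dimensional branch features and two-class head are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# hypothetical high-level features from the L-band and K-band branches
f_l = rng.normal(size=(4, 64))           # batch of 4 samples
f_k = rng.normal(size=(4, 64))
W_l = rng.normal(size=(64, 2)) * 0.1     # per-branch linear heads (UAV vs bird)
W_k = rng.normal(size=(64, 2)) * 0.1
W_f = rng.normal(size=(128, 2)) * 0.1    # head over the fused representation

# score-level averaging (the "x" alternative): each branch classifies
# independently and only the output probabilities are averaged
p_avg = 0.5 * (softmax(f_l @ W_l) + softmax(f_k @ W_k))

# late fusion (the proposed setting): concatenate the high-level features
# of both bands, then classify the joint representation once
p_late = softmax(np.concatenate([f_l, f_k], axis=-1) @ W_f)
```

The difference is that the fused head can learn cross-band interactions between the L- and K-band features, whereas score averaging can only reweight decisions that each branch has already made on its own.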

