Micro-Motion Classification of Flying Bird and Rotor Drones via Data Augmentation and Modified Multi-Scale CNN

Abstract: Aiming at the difficult problem of the classification between flying birds and rotary-wing drones by radar, a micro-motion feature classification method is proposed in this paper. Using K-band frequency modulated continuous wave (FMCW) radar, data acquisition of five types of rotor drones (SJRC S70 W, DJI Mavic Air 2, DJI Inspire 2, hexacopter, and single-propeller fixed-wing drone) and flying birds is carried out under indoor and outdoor scenes. Then, feature extraction and parameterization of the corresponding micro-Doppler (m-D) signals are performed using time-frequency (T-F) analysis. In order to increase the number of effective datasets and enhance m-D features, a data augmentation method is designed by setting the amplitude scope displayed in the T-F graph and adopting feature fusion of the range-time (modulation periods) graph and the T-F graph. A multi-scale convolutional neural network (CNN) is employed and modified, which can extract both the global and local information of the target's m-D features and reduce the parameter calculation burden. Validation with the measured dataset of different targets using FMCW radar shows that the average correct classification accuracy of drones and flying birds for short- and long-range experiments of the proposed algorithm is 9.4% and 4.6% higher than that of the AlexNet- and VGG16-based CNN methods, respectively. The results also show that the recognition accuracy with clim (10⁻⁴) on the same dataset is higher than that without clim (10⁻⁴), e.g., the recognition accuracy of C1 and C3 is higher than that of C2, which indicates that the spectrum with clim (10⁻⁴) has the best m-D characteristics for m-D classification.


Introduction
Bird strikes refer to incidents of aircraft colliding with birds during takeoff, landing or flight, and are a traditional security threat. Recently, "low altitude, slow speed and small size" aircraft, e.g., small rotor drones, have been developing rapidly [1][2][3]. There have been successive incidents of trespassing drones at many airports, which seriously threaten public safety. Monitoring the illegal flying of drones and the prevention of bird strikes have become challenging problems for several applications, e.g., airport clearance zone surveillance, important event or place security, etc. [4][5][6]. One of the key technologies is classification of the two kinds of targets, as the two targets require different precautions. Radar is an effective means of target surveillance; however, there is still a lack of effective methods for the identification of drones and flying birds via radar.
They are non-rigid targets and the rotation of the drone's rotor and the flapping of the bird's wings will introduce additional modulation sidebands near the Doppler frequency of the radar echo generated by the translation of the main body, which is called the micro-Doppler (m-D) effect [7][8][9]. The micro-motion characteristics are closely related to the type, motion state, radar observation parameter, environment and background, etc. [10,11]. Therefore, m-D is an effective characteristic for the classification of drones and flying birds, which can improve the ability of fine feature description [12][13][14][15].
Section 3.2 introduces the proposed data augmentation method; in Section 3.3, the micro-motion characteristics of different types of drones and flying birds are analyzed based on the collected data; and Section 3.4 describes the composition and quantity of the dataset. A description of the proposed multi-scale CNN model and a detailed flowchart of the m-D feature extraction and classification method are given in Section 4. Finally, in Section 5, experiments are carried out using the target dataset and the multi-scale model. The experimental results show that the proposed method has better classification accuracy and generalization ability compared with popular methods, e.g., AlexNet [27] and VGG16 [28]. The last section concludes the paper and presents future research directions.

M-D Signal of Flying Bird
A flying bird with flapping wings is a typical non-rigid target with joints [29]. For the kinematic model of the flapping wing of a bird, it is assumed that each wing has two interconnected parts, i.e., the elbow joint and the wrist joint. In Figure 1, the elbow joint connects the upper arm and the forearm, and the wrist joint connects the forearm and the hand. The elbow joint can only swing up and down on a fixed plane of one motion axis; the wrist joint can swing and circle around two vertical motion axes, respectively. The flapping angle and torsion angle of the wings are both expressed by a general sine and cosine function.
The relationship of the wingtip linear velocity with time is established. We ignore the influence of the upper arm on the angular velocity, and the upper arm and forearm are analyzed as a whole. The flapping angles of the upper arm and forearm are

ψ_1(t) = A_1 cos(2π f_flap t + ψ_1) (1)

ψ_2(t) = A_2 cos(2π f_flap t + ψ_2) (2)

where A_1 and A_2 are the swing amplitudes of the upper arm and forearm, f_flap is the flapping frequency and ψ_1 and ψ_2 represent the delays of the flapping angle. The torsion angle of the forearm is

φ_2(t) = C_2 cos(2π f_flap t + φ_2) (3)

where C_2 is the flapping amplitude of the forearm and φ_2 is the delay of the torsion angle. Furthermore, the angular velocity and the linear velocity of the wingtip can be obtained. The angular velocities are expressed as

ω_ψ(t) = −2π A_2 f_flap sin(2π f_flap t) (4)

ω_φ(t) = −2π C_2 f_flap sin(2π f_flap t) (5)

Therefore, the linear velocity of a bird's wingtip can be expressed as

v_ψ(t) = (L_1 + L_2) ω_ψ(t) = −2(L_1 + L_2)π A_2 f_flap sin(2π f_flap t) (6)

v_φ(t) = (L_1 + L_2) ω_φ(t) = −2(L_1 + L_2)π C_2 f_flap sin(2π f_flap t) (7)

where L_1 and L_2 are the lengths of the upper arm and forearm, respectively, and L_1 + L_2 is half the bird's wingspan. For ordinary birds, the flight speed, flapping angle and other factors are quite similar, but for different birds such as swallows and finches, the motion state and wingspan length are different. The flapping frequency and wingspan length of flying birds are important factors that affect the m-D signal.
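The wingtip-velocity model above can be sketched numerically. This is a minimal illustration of Equation (6); all parameter values (swing amplitude, flapping frequency, arm lengths) are assumptions for the example, not measured values from the paper.

```python
import numpy as np

def wingtip_velocity(t, A2, f_flap, L1, L2):
    """Eq. (6): v_psi(t) = -2*(L1+L2)*pi*A2*f_flap*sin(2*pi*f_flap*t)."""
    return -2.0 * (L1 + L2) * np.pi * A2 * f_flap * np.sin(2.0 * np.pi * f_flap * t)

# Illustrative, assumed parameters for a seagull-like bird
A2 = np.deg2rad(40.0)      # forearm swing amplitude (assumed)
f_flap = 4.0               # flapping frequency in Hz (assumed)
L1, L2 = 0.2, 0.3          # upper-arm and forearm lengths in m (assumed)

t = np.linspace(0.0, 1.0 / f_flap, 1000)   # one flapping period
v = wingtip_velocity(t, A2, f_flap, L1, L2)
v_max = np.max(np.abs(v))  # peak wingtip speed, which bounds the m-D extent
```

The peak value v_max = 2(L_1 + L_2)πA_2 f_flap is what sets the maximum Doppler spread observed in the T-F graph.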

M-D Signal of Rotor Drone
The echo signal of the rotary-wing drone is represented by the sum of the Doppler of the main body and the m-D of the rotor components. The main body motion is mainly modelled as uniform or accelerated motion. The significant difference in m-D characteristics between the rotation and the flapping motion provides the basis for the classification. A fixed space coordinate system (X, Y, Z) and a fixed object coordinate system (x, y, z) are established, which are parallel to each other, and the radar and rotor center positions are located at the origins of the two coordinate systems, respectively. In addition, the rotor blade is regarded as being composed of countless scattering points. During the movement of the rotor drone target, the scattering points in the rotor rotate around the center of the rotor at an angular velocity ω. The azimuth and pitch angles of the rotor relative to the radar are α and β, as shown in Figure 2.

Figure 2. The geometric diagram of radar and quadrotor drone.
The echo signal of the multi-rotor drone is composed of the main body of the drone and the m-D signal of the rotor components. The former does not contribute much to the classification features and can be removed by compensation methods, so it is not considered in this paper. The rotor echo reflecting the micro-motion characteristics can be regarded as the sum of the echoes of the individual rotors. Based on the helicopter single-rotor signal model, the echo of the multi-rotor drone can be represented as in [7], where M is the number of rotors; l_0 is the length of the rotor blades; R_0,m is the distance from the radar to the center of the mth rotor; Z_0,m is the height of the mth rotor blade; β_m is the pitch angle from the radar to the mth rotor; N is the number of blades of a single rotor; ω_m is the frequency of the rotation angle; and φ_0,m is the initial rotation angle of the mth rotor. Correspondingly, the m-D frequency of the kth blade of the mth rotor is modulated in the form of a sine function, and is also affected by the radar parameters, blade length, initial phase and pitch angle.
As the drone rotor rotates, the linear velocity at the tip of the blade is the largest, so the corresponding Doppler frequency is also the largest, and the maximum m-D frequency is

f_dmax = 2v_tip/λ = 2ωl_0/λ = 4πnl_0/λ (11)

where v_tip is the blade tip linear velocity, λ is the wavelength, ω is the rotational angular velocity (rad/s) and n is the rotational speed of the rotor blade (r/s, revolutions per second). Based on Equation (11), the length of the rotor blade can be estimated as follows:

l_0 = λf_dmax/(4πn) (12)
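The blade-length estimate of Equation (12) can be sketched as a round-trip computation. The 24 GHz carrier, 80 rev/s rotation rate and 0.12 m blade length below are assumed example values, not parameters of the radar or drones in the paper.

```python
import numpy as np

def max_micro_doppler(l0, n, wavelength):
    """Eq. (11): f_dmax = 4*pi*n*l0 / wavelength, with omega = 2*pi*n."""
    return 4.0 * np.pi * n * l0 / wavelength

def estimate_blade_length(f_dmax, n, wavelength):
    """Eq. (12): invert Eq. (11) for the blade length l0."""
    return wavelength * f_dmax / (4.0 * np.pi * n)

# Assumed example: 24 GHz K-band carrier, 80 rev/s rotor, 0.12 m blade
wavelength = 3e8 / 24e9
f_dmax = max_micro_doppler(0.12, 80.0, wavelength)      # forward model
l0_est = estimate_blade_length(f_dmax, 80.0, wavelength)  # recover blade length
```

Note that the estimate requires the rotation rate n, which in practice would be read off the modulation period of the m-D signature.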

Data Augmentation via Adjusting T-F Graph Display Scope and Feature Fusion
The main parameters of the K-band FMCW radar used in this paper are shown in Table 1. A higher working frequency results in a more obvious m-D signature. The modulation bandwidth is related to the range resolution, and a longer modulation period means more integration time with a longer observation range. The −3 dB beamwidth indicates the beam coverage. Figure 3 shows the data acquisition and signal processing flowchart of the K-band FMCW radar. The processing results obtained after data acquisition are shown in Figure 4, which are a one-dimensional range profile (after demodulation), range-period graph, range-period graph after stationary clutter suppression and T-F graph of the target's range unit. Figure 4c,d can accurately reflect the location and micro-motion information of the target, which are also the basis of the following m-D dataset.

The measurement principle of FMCW (triangular wave) radar for detecting a relatively moving target is shown in Figure 5. f_t is the transmitted modulated signal, f_r is the received reflected signal, B is the signal modulation bandwidth, f_0 is the initial frequency of the signal, f_d is the Doppler shift, T is the modulation period of the signal, and τ is the time delay. For FMCW radar (triangular wave modulation mode), the range information and speed information between the radar and the target can be measured by the difference frequency signals Δf_a and Δf_b of the triangular wave for two consecutive cycles [23].
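The range/velocity recovery from the two beat frequencies can be sketched as follows. This assumes one common triangular-wave convention (full period T split into an up-sweep and a down-sweep of T/2 each, so Δf_a = 4BR/(cT) − f_d and Δf_b = 4BR/(cT) + f_d); the bandwidth, period and target values are illustrative assumptions, not the Table 1 parameters.

```python
c = 3e8  # speed of light, m/s

def range_velocity(df_a, df_b, B, T, wavelength):
    """Recover range and radial velocity from up/down-sweep beat frequencies.

    R = c*T*(df_a + df_b) / (8*B),  v = wavelength*(df_b - df_a) / 4
    """
    R = c * T * (df_a + df_b) / (8.0 * B)
    v = wavelength * (df_b - df_a) / 4.0
    return R, v

# Round-trip check with assumed parameters
B, T, wavelength = 250e6, 1e-3, 3e8 / 24e9
R_true, v_true = 10.0, 3.0
f_d = 2.0 * v_true / wavelength        # Doppler shift of the moving target
f_r = 4.0 * B * R_true / (c * T)       # range-induced beat frequency
R_est, v_est = range_velocity(f_r - f_d, f_r + f_d, B, T, wavelength)
```

Summing the two beat frequencies cancels the Doppler term (leaving range), while differencing them cancels the range term (leaving velocity), which is why two consecutive sweeps are needed.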
Then, the received signal after demodulation can be written as a two-dimensional array S_{M×N}(u), where M represents the number of range units and N is the number of samplings during the modulation periods. The feature selection of the target's signal includes high-resolution range-period and m-D features. The former reflects the radar cross section (RCS) property and the range walk information; the latter captures the vibration and rotation characteristics of the target and its components. At present, moving target classification usually uses T-F information, while the characteristics of the range profile and range walk are ignored. Moreover, traditional image augmentation methods, such as image rotation, cropping, blurring, and adding noise, cannot bring in additional effective features: they only generate images similar to the originals, and as the expanded dataset grows, the similar data will lead to network overfitting and poor generalization. In this paper, we propose three methods for effective data enhancement.
In Method 1, the displayed range units in the range-time (period) graph are selected and focused on the target location to obtain the most obvious radar m-D characteristics of flying birds and rotor drones.
In Method 2, the amplitude scope displayed in the T-F graph is adjusted to enhance the m-D features, so that the detailed m-D features become more obvious in the spectrum.
In Method 3, the above two methods are combined, i.e., adjusting different range units and setting different amplitude scopes. The advantage of the proposed data augmentation is that more diverse data can be fed into the classification model, and more feature information can be learned from the m-D signals.
The detailed data augmentation is described as follows: Step 1: The target's range unit is selected from the range-time (period) graph, and the T-F transform is performed on the time-dimension data of that range unit to obtain the two-dimensional T-F graph, i.e.,

STFT_s(t, ω) = ∫ S_{M×N}(u) g(u − t) e^{−jωu} du
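Step 1 can be sketched with a plain-numpy short-time Fourier transform. The simulated m-D echo (a sinusoidally modulated Doppler tone) and all parameter values stand in for measured range-unit data and are assumptions for illustration.

```python
import numpy as np

def stft_mag(x, win_len, hop):
    """Magnitude STFT of a 1-D complex signal using a Hann window g(u - t).

    Returns an array with frequency along rows and time along columns;
    win_len is the tunable window-length parameter mentioned in the text.
    """
    g = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * g for i in range(n_frames)])
    return np.abs(np.fft.fft(frames, axis=1)).T

fs = 2000.0                       # slow-time sampling rate, Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)
# 100 Hz body Doppler with a 5 Hz sinusoidal m-D modulation (assumed)
x = np.exp(1j * 2 * np.pi * (100 * t + 40 * np.sin(2 * np.pi * 5 * t)))
tf_graph = stft_mag(x, win_len=128, hop=64)   # the two-dimensional T-F graph
```

A shorter window gives better time resolution of the micro-motion period at the cost of Doppler resolution, which is why the window length is treated as the variable parameter.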
where S M×N (u) is the demodulated signal or the signal after MTI processing, g(u − t) is a movable window function and the variable parameter is the window length of the STFT. Step 2: The amplitude is firstly normalized to [0, 1]. For the obtained range-time/period graph and T-F graph, the dataset is expanded by changing the display scope (spectrum amplitude). By controlling the color range display of the m-D feature in the T-F graph and the range-time/period graph, the m-D feature of the target and the range walk information can be enhanced or weakened.
The data in the range-time (period) graph and T-F graph can be regarded as an array C, which is displayed by imagesc(C, clims). The color range (spectrum amplitude) is specified as a two-element vector of the form clims = [cmin cmax]. According to the characteristic display of the graph, different color ranges can be set appropriately to obtain different numbers of datasets. Take the drone as an example: set the color display range for the range-periods graph to [A, 1] and specify the value of A as 0.01, 0.0001, and 0.00001. Then, the range-periods graph drawn in dB is shown in Figure 6. The drone target is located at 1 m, and different amplitude modulation features are given. Set the color display range for the T-F graph to [B, 1] and specify the values of B as 0.01, 0.0001 and 0.00001, respectively. The dataset augmentation of the T-F graph drawn in dB is shown in Figure 7.
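The clim-based augmentation can be sketched as a clipping operation: like MATLAB's imagesc(C, clims), each display maps values below cmin and above cmax to the ends of the colormap, so different [A, 1] settings yield differently enhanced images from a single spectrum. The random 64×64 array is a stand-in for a normalized T-F graph.

```python
import numpy as np

def apply_clim(C, cmin, cmax):
    """Map C into [0, 1] display levels, saturating outside [cmin, cmax],
    mimicking imagesc(C, [cmin cmax])."""
    return np.clip((C - cmin) / (cmax - cmin), 0.0, 1.0)

rng = np.random.default_rng(0)
C = rng.random((64, 64))   # stand-in for a normalized T-F graph (assumed data)
# Three augmented views of the same spectrum, as in the [A, 1] settings above
views = [apply_clim(C, a, 1.0) for a in (0.01, 1e-4, 1e-5)]
```

Each clipped view is a distinct training sample derived from the same underlying m-D measurement, which is how one recording expands into several dataset entries.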
Step 3: The dataset is processed by cropping, feature fusion and label calibration.
Based on the original range-period graph and T-F graph, an area containing the target's micro-motion features is selected for cropping, so that the cropped image contains the micro-motion features at a uniform size. Feature fusion merges the cropped range-period graph and T-F graph, which contain the target's micro-motion features. The widths of the range-period graph and T-F graph are kept consistent across the categories of different micro-motions. Then, label calibration of the dataset after feature fusion is performed for a unified size. The feature fusion processing is shown in Figure 8.
The data in the range-time/period graph and T-F graph can be regarded as an array C, which is displayed by ( ) imagesc , clims C . The color range (spectrum amplitude) is specified as a two-element vector of the form [ ] clims= cmin cmax . According to the characteristic display of the graph, different color ranges can be set appropriately to obtain different number of datasets. Take the drone as an example: set the color display for the range-periods graph to [A, 1] and specify the value of A as 0.01, 0.0001, and 0.00001. Then, the range-periods graph drawn in dB is shown in Figure 6. The drone target is located at 1 m, and different amplitude modulation features are given. Set the color display range for the T-F graph to [B, 1] and specify the values of B as 0.01, 0.0001 and 0.00001, respectively. The dataset augmentation of the T-F graph drawn in dB is shown in Figure 7. Step 3: The dataset is performed by cropping, feature fusion and label calibration. Based on the original range-period graph and T-F graph, select an area containing the target micro-motion feature for cropping, and the cropped image contains the target micro-movement feature in the uniform size. Feature fusion is to merge the range-period graph and T-F graph of the cropped image, which contains the target's micro-motion features. The width of the range-period graph and T-F graph are consistent with the categories of different micro-motions. Then, label calibration of the dataset after feature fusion is performed for unified size. The feature fusion processing is shown in Figure 8. Step 3: The dataset is performed by cropping, feature fusion and label calibration. Based on the original range-period graph and T-F graph, select an area containing the target micro-motion feature for cropping, and the cropped image contains the target micromovement feature in the uniform size. 
Feature fusion is to merge the range-period graph and T-F graph of the cropped image, which contains the target's micro-motion features. The width of the range-period graph and T-F graph are consistent with the categories of different micro-motions. Then, label calibration of the dataset after feature fusion is performed for unified size. The feature fusion processing is shown in Figure 8.

Datasets Construction
We selected civilian-grade large, medium and small drones as typical rotary drones, i.e., the SJRC S70 W, DJI Mavic Air 2, DJI Inspire 2, six-rotor drone (hexacopter) and single-propeller fixed-wing drone, and flying bird targets (bionic and seagull birds). Figure 9 and Table 2 show the photos and main parameters of the different targets. * Type "Q" is for quadrocopter, "P" is for plane, "B" is for bird. # The target parameters of the bird are expressed as the number of wings and half the length of the bird's wingspan.
The M-D dataset of different types of drones and flying birds is shown in Table 3 and Figure 10. The construction of the micro-motion dataset mainly includes two parts: data preprocessing and data augmentation. The preprocessing mainly performs stationary clutter suppression via MTI (sometimes MTI is not a necessary step). The dataset can be

expanded by setting the energy display amplitude of different m-D features. The effective information of the range-periods graph and the m-D spectrogram is cut separately, and then the time-range information and the time-Doppler information are combined to realize enhancement of the micro-motion features. In order to achieve the training of the CNN model, it is necessary to modify the dataset, remove the scales and colorbars of the horizontal and vertical axes of the input images, and normalize their sizes.

The experimental scene is shown in Figure 11. According to the measured m-D characteristic image of the rotor drone in Figure 12, it can be seen that the echo intensity located at zero frequency and its surroundings is very strong, which reflects the main body motion and micro-motion. The T-F graph of the "Inspire 2", i.e., Figure 12b, has clearer m-D features than that of the "Mavic Air 2" in Figure 12a.
Figure 13 shows the m-D characteristics of the Mavic Air 2 at different distances, i.e., 3 m, 6 m, 9 m and 12 m.
As the distance increases, the m-D characteristics caused by the rotor are gradually weakened and its maximum Doppler amplitude is decreased, which is due to the weakening of the rotor echo. Although the m-D signal has been weakened, the reflection characteristics of the root of the rotor are still obvious, which is shown as a small modulation characteristic near the frequency of the main body's motion.

M-D Analysis of Flying Birds
Due to the maneuverability of real flying birds, the amount of bird radar data is limited. This paper collects two types of bird data: one is from an indoor experiment using a bionic bird that closely simulates the flapping flight of a real bird, and the other is from an outdoor experiment with a real seagull. The subjects of the indoor experiment are a single bird and two birds with flapping wings, as shown in Figure 14. The m-D characteristics are shown in Figure 15a,b. It can be found that the m-D effect produced by the flapping wings of a bird can be effectively observed by the experimental radar. According to the waveform frequency and maximum Doppler of the m-D characteristics in the figure, we can further estimate the wingspan length and flapping frequency of the bird. When simulating the side-by-side flapping movement of two birds, the micro-motion characteristics in the T-F domain may overlap.
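The parameter estimation mentioned above can be sketched by combining the wingtip-velocity model with the Doppler relation f_dmax = 2·v_max/λ, giving L_1 + L_2 = λ·f_dmax/(4π·A_2·f_flap). Note the swing amplitude A_2 is not observable from f_dmax alone and must be assumed; all numerical values below are assumptions for illustration.

```python
import numpy as np

def half_wingspan(f_dmax, f_flap, wavelength, A2):
    """Estimate L1 + L2 (half wingspan) from the m-D signature.

    From v_max = 2*(L1+L2)*pi*A2*f_flap and f_dmax = 2*v_max/wavelength.
    """
    return wavelength * f_dmax / (4.0 * np.pi * A2 * f_flap)

wavelength = 3e8 / 24e9        # 24 GHz K-band carrier (assumed)
f_flap = 4.0                   # flapping frequency read off the T-F graph (assumed)
f_dmax = 500.0                 # maximum m-D frequency from the T-F graph (assumed)
A2 = np.deg2rad(40.0)          # forearm swing amplitude (assumed)
L_half = half_wingspan(f_dmax, f_flap, wavelength, A2)
```

In practice f_flap is read off the modulation period of the flapping signature and f_dmax off its Doppler extent, so the estimate degrades as the m-D signature weakens with distance.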


The outdoor experiment for real bird micro-motion signal acquisition is carried out. The observation distance is 3 m and the other parameters remain unchanged. The obtained m-D characteristics of real bird flight are shown in Figure 16. According to the T-F graph, the m-D effect of the flapping wings can be observed as well, and there are both similarities and differences in the m-D features of the real bird and the bionic bird.
The similarity lies in the obvious periodic characteristics of the micro-motions, with longer interval and stronger amplitude, which are also distinctive features from drone targets. The difference is that m-D signal of the real bird is relatively weak due to the far distance, and there are some subtle fluctuations. However, the overall micro-motion characteristics different from the rotary-wing drones are obvious, and therefore the indoor and outdoor bird data are combined together as the overall dataset for further training and testing.  Figure 17 shows the micro-motion characteristics of the fixed-wing drone. In order to better analyze the influence of different observation angles on the m-D, results of three observation angles, i.e., side view, front view, and up view, are provided respectively. The following conclusions can be drawn by comparison.

• When the radar is looking ahead, the echo is the strongest (Figure 17b), with very obvious periodic Doppler modulation characteristics. Due to the low rotation speed, the micro-motion period is significantly longer than that of the rotary-wing drone.
• Comparing Figure 17b,c, i.e., different observation elevation angles, it can be seen that there is an optimal observation angle for radar detection of the target. When the observation angle changes, the m-D characteristics are weakened to a large extent and the micro-motion information is partially missing. In addition, when the radar observes the target vertically (side view, Figure 17a), the radial velocity of the blade towards the radar is the largest, which results in the maximum Doppler effect compared with the other two angles.
• Due to the large size of the fixed-wing drone and its blades, the main-body echo and the m-D characteristics of the blades are significantly stronger than those of the rotary-wing drone and the flying bird.

• As the blade speed increases, the micro-motion modulation period becomes shorter and part of the micro-motion feature is missing.

M-D Analysis of Hexacopter Drone
Figure 18 shows the m-D characteristics of the six-axis rotor drone, i.e., the hexacopter, at two distances. Figure 18a,c shows the range versus modulation-periods graphs. Obvious echoes can be clearly seen around 11.5 m and 8.5 m, respectively, and the radar range resolution is 7.5 cm (the bandwidth is 2 GHz). Under such high-resolution conditions, the rotation of the rotors causes the drone's echo to occupy multiple range units and to exhibit obvious periodic changes. Because the rotors differ in length and size, various extended range units and modulation-period characteristics are reflected in the range-periods graph, which verifies the usefulness of the range-period information. Due to the multiple rotors and the fast rotation speed, the micro-motion components in the T-F graph (Figure 18b,d) overlap with each other. The positive-frequency micro-motion characteristics caused by the rotation of the rotors, i.e., motion towards the radar, are more obvious than the other part. Large rotors result in obvious periodic modulation characteristics near the main body, which is also helpful for subsequent classification.
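The 7.5 cm range resolution quoted above follows directly from the sweep bandwidth via dR = c/(2B). A minimal sketch in plain Python; the rotor span used at the end is an illustrative value, not a figure from the text:

```python
# FMCW radar range resolution: dR = c / (2 * B).
C = 3e8          # speed of light, m/s
B = 2e9          # sweep bandwidth from the text, Hz

range_resolution = C / (2 * B)   # metres
print(range_resolution)          # 0.075 m, i.e., 7.5 cm

# Range units spanned by a hypothetical 30 cm rotor at this resolution,
# illustrating why a large rotor extends over several range units:
rotor_span_m = 0.30
print(rotor_span_m / range_resolution)  # 4.0 range units
```

This is why, in the range-periods graphs, larger rotors occupy visibly more range units than small ones.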

The Modified Multi-Scale CNN Model
This paper proposes a target m-D feature classification method based on the modified multi-scale CNN [25], which uses multi-scale splitting with a hybrid connection structure. The output of the multi-scale module contains a combination of different receptive field sizes, which is conducive to extracting both the global and the local feature information of the target. Firstly, a single convolutional layer with a 7 × 7 kernel is used to extract features from the input image, and then a multi-scale network characterization module is employed. A 1 × 1 convolution kernel is used to adjust the number of input channels, so that the next multi-scale module can perform in-depth feature extraction. The structure of the multi-scale CNN model is shown in Figure 19, which is based on the residual network module (Res2Net). A bank of filters with 3 × 3 convolution kernels is used to replace the single 3 × 3 convolutional layer. The feature map obtained after the 1 × 1 convolution, containing n channels, is divided into s feature map subsets, each with n/s channels. Except for the first feature map subset, which is passed down directly, each remaining feature map subset is followed by a convolutional layer with a 3 × 3 kernel, and the convolution operation is performed.

The second feature map subset is convoluted, and a new feature subset is formed and passed down along two lines. One line is passed down directly; the other is combined with the third feature map subset using a hierarchical connection and sent to the convolution to form a new feature map subset. The new feature map subset is again divided into two lines: one is passed down directly, and the other is connected with the fourth feature map subset in the same hierarchical, progressive arrangement and sent to the convolutional layer to obtain further new feature map subsets. The above operations are repeated until all feature map subsets have been processed. Each feature map subset is combined with another feature map subset after passing through the convolutional layer. This operation gradually increases the equivalent receptive field of each convolutional layer, so as to complete the extraction of information at different scales.
Figure 19. The multi-scale CNN structure. The right picture is an enlarged view of the SE module.
Use Ki(·) to represent the output of the 3 × 3 convolution kernel, and let xi represent the divided feature map subsets, where i ∈ {1, 2, . . . , s} and s is the number of feature map subsets into which the feature map is divided. Then the output yi can be expressed as

    yi = xi,                i = 1
    yi = Ki(xi),            i = 2
    yi = Ki(xi + yi−1),     2 < i ≤ s

Based on the above network structure, the output of the multi-scale module includes a combination of different receptive field sizes via the split hybrid connection structure, which is conducive to extracting global and local information. Specifically, the feature map after the convolution operation with the 1 × 1 kernel is divided into four feature map subsets; after the multi-scale hybrid connection, the processed feature map subsets are combined by a splicing method, i.e., y1 + y2 + y3 + y4. A convolutional layer with a 1 × 1 kernel is applied to the spliced feature map subsets to realize the information fusion of the four divided feature map subsets. Then, the multi-scale residual module is combined with the identity mapping y = x.
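The split-and-hierarchical-connection rule above can be sketched in plain NumPy for the s = 4 case described in the text. Here `conv3x3` is a hypothetical stand-in for the trained 3 × 3 kernels Ki(·) (a simple scaling), so only the multi-scale data flow is illustrated, not a real network:

```python
import numpy as np

def conv3x3(x):
    # Hypothetical placeholder for the 3x3 convolution Ki(); a plain
    # scaling so that only the hierarchical data flow is shown.
    return x * 0.5

def res2net_split(feature_map, s=4):
    """Hierarchical split connection: y1 = x1, y2 = K2(x2),
    yi = Ki(xi + y_{i-1}) for 2 < i <= s; outputs are spliced back."""
    xs = np.split(feature_map, s, axis=0)  # s subsets of n/s channels each
    ys = [xs[0]]                           # first subset passed down directly
    ys.append(conv3x3(xs[1]))              # second subset convoluted
    for i in range(2, s):                  # remaining subsets: add previous
        ys.append(conv3x3(xs[i] + ys[-1])) # output, then convolute
    return np.concatenate(ys, axis=0)      # splice y1..ys together

x = np.ones((8, 5, 5))        # n = 8 channels, split into 4 subsets of 2
y = res2net_split(x, s=4)
print(y.shape)                # (8, 5, 5): channel count is preserved
```

Because each subset sees the accumulated output of the previous one, the effective receptive field grows with i, which is the multi-scale property the module exploits.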
The squeeze excitation (SE) module is added after the multi-scale residual module, completing the modified multi-scale neural network residual module. The structure of the SE module is given on the right of Figure 19. For a feature map with a shape of (h, w, c), i.e., (height, width, channel), the SE module first performs a squeeze operation: the feature map is globally averaged in the spatial dimension to obtain a feature vector representing global information, converting the h × w × c output of the multi-scale residual module into an output of 1 × 1 × c. The second step is the excitation operation, which can be expressed as

    s = σ(W2 · δ(W1 · z))

where δ(W1 · z) represents the first fully connected operation (FC) followed by the ReLU activation δ; the first-layer weight parameter is W1, whose dimension is c × c/r, and r is called the scaling factor.
Here, let r = 16; its function is to reduce the number of channels with fewer calculations, and z is the result of the previous squeeze operation, with dimension 1 × c. g(z, W) represents the output after the first fully connected operation. After the first fully connected layer, the dimension becomes 1 × 1 × c/16, and a ReLU activation function is then added to increase the nonlinearity of the network while the output dimension remains unchanged. The result is then multiplied by W2, i.e., the weight of the second fully connected layer. The output dimension becomes 1 × 1 × c, and the output of the SE module is calculated through the sigmoid activation function.
Finally, the re-weighting operation is performed: the feature weights s are multiplied into the input feature map channel by channel to complete the feature re-calibration. This learning method automatically obtains the importance of each feature channel, enhancing useful features and suppressing features useless for the current task. The multi-scale residual module and the SE module are combined, and with 18 such modules the modified multi-scale network is formed. This combination can learn different receptive field combinations, retain useful features and suppress invalid feature information, which greatly reduces the parameter calculation burden. Finally, we add three fully connected layers. They map the effective features learned by the multi-scale model to the label space of the samples; moreover, the depth of the network model is increased so that it can learn refined features. Compared with global average pooling, the fully connected layers achieve faster convergence and higher recognition accuracy.
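The squeeze, excitation, and re-weighting steps just described can be sketched with NumPy. The random weights below merely stand in for the learned parameters W1 and W2; r = 16 follows the text:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def se_module(feature_map, W1, W2):
    """Squeeze-and-excitation on an (h, w, c) feature map.
    W1: (c, c//r) reduction weights; W2: (c//r, c) expansion weights."""
    # Squeeze: global average pool over spatial dims -> (c,) vector z
    z = feature_map.mean(axis=(0, 1))
    # Excitation: s = sigmoid(W2 . relu(W1 . z))
    hidden = np.maximum(z @ W1, 0.0)        # FC + ReLU, dimension c/r
    s = sigmoid(hidden @ W2)                # FC + sigmoid, dimension c
    # Re-weighting: scale each channel of the input by its weight
    return feature_map * s                  # broadcasts over (h, w, c)

rng = np.random.default_rng(0)
h, w, c, r = 4, 4, 32, 16
x = rng.standard_normal((h, w, c))
W1 = rng.standard_normal((c, c // r))       # stand-ins for learned weights
W2 = rng.standard_normal((c // r, c))
out = se_module(x, W1, W2)
print(out.shape)    # (4, 4, 32): same shape, channels re-calibrated
```

Because s lies in (0, 1), each channel is attenuated according to its learned importance, which is exactly the feature re-calibration described above.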
The dataset is trained by setting the parameter-solving algorithm to Adam, the nonlinear activation function to ReLU, the initial learning rate to 0.0001, the number of training rounds (epochs) to 100, and other parameters accordingly. After each round, a verification is performed on the validation set until the correct recognition rate meets the requirements, and the network model parameters are saved to obtain a suitable network model. The final test inputs the test data, not involved in training or validation, into the trained network model to verify the effectiveness and generalization ability of the multi-scale CNN model. The classification accuracy is obtained as the ratio of the number of correctly classified samples in the test dataset to the total number of samples in the test set.

Algorithm Flowchart
The detailed flowchart of the classification method for flying birds and rotary-wing drones based on the data augmentation and the multi-scale CNN is shown in Figure 20, which consists of four parts: radar echo processing, m-D dataset construction, CNN model training and model testing.


Experimental Results
All models are implemented on a PC equipped with 16 GB of memory, a 2.5 GHz Intel(R) Core(TM) i5-8400 CPU and an NVIDIA GeForce GTX1050Ti GPU. In the experiments, we use the Adam algorithm as the optimizer, ReLU as the activation function and cross entropy as the cost function. The initial learning rate is set to 0.0001 and the number of iterations is 100. In addition, to prevent overfitting, dropout is applied to improve the model's generalization ability. The extracted spectrograms, of size 560 × 420, were normalized to 100 × 100 for training in order to increase the computational speed.
Using the proposed method and other popular CNNs, e.g., AlexNet and VGG16, the classification performance for the five types of drones and the flying bird is given by confusion matrices under the conditions of relatively short range (SNR > 0 dB) and relatively long range (SNR < 0 dB). According to the analysis of Section 5.4, the detection probability at SNR = 0 dB is about 86.3% (normal value); therefore, we choose 12 m as the boundary: when the range is farther than 12 m, it is called relatively long range, and vice versa. The K-fold cross-validation method is employed in order to obtain a reliable and stable model [30] (K = 5 in this paper). For the shorter range, Tables 4-6 show the confusion matrices of AlexNet, VGG16 and the proposed method for the six targets; the number of test samples for each type of target is 40. The true class is along the top. Target A judged as other types with no mutual judgment error is represented in gray; pink marks the samples judged correctly; green represents cases of wrong classification, and dark green means more than one mistake.
It can be seen from Tables 4 and 5 that it is difficult to distinguish the Mavic Air 2, SJRC S70 W and Hexacopter, and there are many wrong classification cases. Figure 21b-d shows the range-periods graphs and T-F graphs, which indicate that all three types of targets have the micro-motion characteristics of rotors. Because of the fast rotation speed, the T-F graph is densely distributed with short periods. However, there are also subtle differences. For example, the Hexacopter has a wide m-D spectrum and obvious micro-motion peaks, while the micro-motion spectrum of the Mavic Air 2 is concentrated around −200 Hz to 200 Hz due to its small rotor size. If better classification ability is needed, the network is required to learn both large-scale periodic modulation features and small-scale micro-motion features.
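The average classification accuracy reported from each confusion matrix is simply the trace (correct decisions) over the total sample count. A small helper; the 2 × 2 example values are illustrative and not taken from Tables 4-6:

```python
import numpy as np

def classification_accuracy(confusion):
    """Average accuracy = correctly classified samples / all samples,
    i.e., the trace of the confusion matrix over its grand total."""
    confusion = np.asarray(confusion, dtype=float)
    return np.trace(confusion) / confusion.sum()

# Illustrative 2-class example with 40 test samples per class:
cm = [[38, 3],
      [2, 37]]
print(classification_accuracy(cm))   # 0.9375
```

With 40 samples per class, as in the tables, a perfect 6-class matrix would be 40 on the diagonal and zeros elsewhere, giving an accuracy of 1.0.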
Compared with AlexNet, the classification accuracy of the CNN based on VGG16 is higher, i.e., 94.6% versus 89.6%, but for small drones there are still classification errors. Fixed-wing drones have a small number of rotors and exhibit obvious echo modulation characteristics. The Hexacopter drone has many rotors of larger size and therefore occupies more range units in the time domain (the white box in Figure 21b), which is also an effective feature distinguishing it from small drones. The modified multi-scale CNN is then used to split the hybrid connection structure at multiple scales, so that the output of the multi-scale module contains a combination of different receptive field sizes, which is conducive to extracting the global and local information of the target features. The classification accuracy is increased to 99.2%.
For longer ranges, the confusion matrices are shown in Tables 7-9. As the target echo gets weaker, the m-D characteristics also become inconspicuous. Therefore, for AlexNet it is difficult to distinguish the three smaller targets, i.e., the Mavic Air 2, SJRC S70 W and Inspire 2, as shown in Figure 22b,c. At the same time, it is easy to misjudge the Hexacopter as the Mavic Air 2, because the spectrum-broadening feature of the Hexacopter is not obvious, as shown in Figure 22a. The proposed method can learn the weak and broadened spectrum characteristics, thereby improving the classification probability to 97.5%, compared with 88.3% and 92.9% for the other two models. Based on the above analysis, the proposed method has a good classification ability for different types of rotary-wing drones, and can distinguish flying bird targets as well.
Table 9. Confusion matrix of the proposed method for relative long range (average classification accuracy 97.5%).

The Influence of Amplitude Display on Classification
The spectrum amplitude display is set from clim (10−2) to clim (10−6), and it is found that the m-D spectrum becomes more and more significant. When the amplitude of the spectrum is set to clim (10−6), the Doppler energy becomes more divergent and even some glitches appear. On the contrary, when the amplitude of the spectrum is at clim (10−2), the m-D feature disappears or is weakened. In addition, we construct datasets with different clim values to study the effect of the spectrum amplitude display on micro-motion classification. The comparison results for different clim values are shown in Table 10. One fold corresponds to the accuracy of each training. It can be observed that the classification accuracy improves as the size of the dataset increases, indicating that the expansion of the dataset plays a key role in improving the accuracy of target classification. One phenomenon we find is that the recognition accuracy with clim (10−4) on the same dataset is higher than that without clim (10−4); for example, the recognition accuracy of C1 and C3 is higher than that of C2. The result shows that the spectrum with clim (10−4) has the best m-D characteristics for m-D classification.
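Setting clim amounts to clipping the displayed dynamic range of the spectrogram magnitude before rendering. A minimal NumPy sketch, assuming clim sets the saturation point of the color scale (the magnitudes are illustrative values, and the thresholds mirror the clim (10−2) to clim (10−6) settings in the text):

```python
import numpy as np

def apply_clim(spectrogram_mag, clim):
    """Normalize magnitudes to [0, 1] against an upper display limit.
    A small clim (e.g., 1e-6) saturates early and emphasizes weak m-D
    components; a large clim (e.g., 1e-2) leaves them near zero."""
    return np.clip(spectrogram_mag / clim, 0.0, 1.0)

mag = np.array([1e-7, 1e-5, 1e-3])       # illustrative m-D magnitudes
for clim in (1e-2, 1e-4, 1e-6):
    print(clim, apply_clim(mag, clim))
```

Running this shows why clim (10−4) is the sweet spot in the text: the weak components become visible without saturating everything, whereas 10−6 saturates most bins (the "glitches") and 10−2 leaves the m-D features near zero.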


Classification Performance Using Feature Fusion Strategy
In order to further compare the impact of the feature fusion method proposed in Section 3.1 on the classification performance, the results of the single range-period graph dataset, the T-F graph dataset and the feature fusion dataset are compared, as shown in Table 11. Each dataset is trained three times and averaged as the final classification accuracy. Experiments show that the classification accuracy using feature fusion strategy is 31% and 6% higher than the classification accuracy of the single range-period graph and T-F graph datasets, respectively.
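One simple realization of the fusion strategy of Section 3.1 is to stack the range-period graph and the T-F graph as channels of a single training sample. This is a hedged sketch of one plausible fusion scheme, not necessarily the exact method of the paper; the 100 × 100 size matches the normalized training images mentioned earlier:

```python
import numpy as np

def fuse_features(range_period_img, tf_img):
    """Stack the two single-channel graphs into one two-channel sample
    so the CNN can learn from both representations jointly."""
    assert range_period_img.shape == tf_img.shape
    return np.stack([range_period_img, tf_img], axis=-1)

rp = np.random.rand(100, 100)   # normalized range-period graph (placeholder)
tf = np.random.rand(100, 100)   # normalized T-F graph (placeholder)
fused = fuse_features(rp, tf)
print(fused.shape)              # (100, 100, 2)
```

Channel stacking lets the first convolutional layer weigh the range-period and T-F information jointly, which is consistent with the accuracy gains reported in Table 11.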

Classification Performance via Different Network Models
The superiority of the modified multi-scale network model structure is verified by comparing it with the two typical ResNet18 and Res2Net18 network models under the same conditions (the network depth is the same, 18 layers, and the SE module is added). The result is shown in Figure 23. The classification accuracies of ResNet18, Res2Net18 and the modified multi-scale network model on the validation set are 93.9%, 95.06% and 96.9%, respectively, indicating that the modified model has better classification accuracy.


Detection Probability for Different Ranges and SNRs
Since the radar used in this paper is an FMCW radar, the detection range for such low-observable drones or birds is limited. In order to prove the universality of the proposed method and further generalize the conclusions, the detection performance at different signal-to-noise ratios (SNRs) of the returned signal is employed as a parameter indicating at what range the target can be detected, instead of the range parameter. The SNR is defined as the average power ratio of the target's range unit to the background units in the range-period data. Table 12 gives the detection probability of the Mavic Air 2 under different SNRs. Its maximum detection range is about 100 m according to the radar range equation. The data were recorded at different ranges, and we give the relation between the target's location and the SNR. The detection is carried out in the two-dimensional range-period data with CA-CFAR, and the false alarm rate is 10−2. It should be noted that this relation holds only for the Mavic Air 2 using the FMCW radar in this paper, and we assume the target can be detected for the following analysis of classification. It should also be noted that the proposed algorithm relies on correct detection of the target (exceeding the threshold) and on obvious m-D characteristics; therefore, the results of clutter suppression and m-D signal enhancement affect the classification results. Although the signal is weak and the detection range is relatively short due to the low power of the FMCW radar and the small RCS of the drone, the conclusions can be applied to other radars according to the SNR relations (Table 12).
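The SNR definition above (average power of the target's range unit over the remaining background units in the range-period data) can be written directly. The array sizes and the injected target bin below are illustrative, not measured values:

```python
import numpy as np

def snr_db(range_period_power, target_bin):
    """SNR = mean power in the target range unit / mean power of the
    remaining (background) range units, expressed in dB."""
    target = range_period_power[target_bin].mean()
    background = np.delete(range_period_power, target_bin, axis=0).mean()
    return 10.0 * np.log10(target / background)

rng = np.random.default_rng(1)
data = rng.exponential(1.0, size=(160, 128))   # range x modulation periods
data[53] *= 10.0                               # inject a target at bin 53
print(round(snr_db(data, 53), 1))              # roughly 10 dB
```

Under this definition, SNR = 0 dB corresponds to the target unit carrying the same average power as the background, which is the boundary the paper ties to the 12 m range split.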

Conclusions
A feature extraction and classification method for flying birds and rotor drones is proposed based on data augmentation and a modified multi-scale CNN. The m-D signal models of the rotary-wing drone and the flying bird are established. Multiple features, i.e., the range profile, range-time/periods and m-D features (T-F graph), are employed, and a data augmentation method is proposed that sets the color display amplitude of the T-F spectrum in order to enlarge the effective dataset and enhance the m-D features. Using a K-band FMCW radar, micro-motion signal measurement experiments were carried out for five drones of different sizes and one bionic bird, i.e., the SJRC S70 W, DJI Mavic Air 2, DJI Inspire 2, hexacopter and single-propeller fixed-wing drone. The impact of different observation conditions, e.g., angle, distance and rotating speed, on the m-D characteristics was analyzed and compared. The multi-scale CNN model was modified for better learning and distinguishing of different micro-motion features, i.e., the global and local information of the m-D features. The experimental results for different scenes (indoor, outdoor, high SNR and low SNR) show that the proposed method has better classification accuracy for the five types of drones and flying birds compared with popular methods, e.g., AlexNet and VGG16. In future research, in order to better analyze target classification performance at long distances and in clutter backgrounds, a coherent pulse-Doppler radar will be used and the target's characteristics, e.g., m-D, will be investigated. Further, the CNN may be combined with intelligent systems to enable more capable processing and recognition [31].

Patents
The methods described in this article are covered by Chinese invention patent applications: "A dataset expansion method and system for radar micro-motion target recognition", patent number 2020121101990640; and "A radar target micro-motion feature extraction and intelligent classification method and system", patent number 202110818621.0.