Research on a Sound Source Localization Method for UAV Detection Based on Improved Empirical Mode Decomposition

To address the challenge of accurately locating unmanned aerial vehicles (UAVs) in situations where radar tracking is not feasible and visual observation is difficult, this paper proposes an innovative acoustic source localization method based on improved Empirical Mode Decomposition (EMD) within an adaptive frequency window. In this study, the collected flight signals of UAVs undergo smoothing filtering. Additionally, Robust Empirical Mode Decomposition (REMD) is applied to decompose the signals into Intrinsic Mode Function (IMF) components for spectrum analysis. We introduce a sliding frequency window with adjustable bandwidth, which is automatically determined using a Grey Wolf Optimizer (GWO) with a sliding index. This window is used to lock and extract specific frequencies from the IMFs. Based on predefined criteria, the extracted IMF components are reconstructed, and trigger signal times are analyzed and recorded from these reconstructed IMFs. The time differences between sensor receptions are then calculated. Furthermore, this study introduces the Chan–Taylor localization algorithm based on weighted least squares. This advanced algorithm takes sensor time delay parameters as input and solves a set of nonlinear equations to determine the target’s location. Simulations and real-world signal tests are used to validate the robustness and performance of the proposed method. The results indicate that the localization error remains below 5% within a 15 m × 15 m measurement area. This provides an efficient and real-time method for detecting the location of small UAVs.


Introduction
In the military, civilian, and anti-terrorism sectors, precise detection of micro-UAV locations is crucial, yet traditional detection methods are costly and complex to operate. Passive acoustic detection systems, especially those employing the Time Difference of Arrival (TDOA) algorithm, offer a low-cost and efficient solution to this challenge [1]. However, there are three major obstacles in practical applications: accuracy issues in low signal-to-noise ratio (SNR) environments, difficulties in solving nonlinear equation systems, and limitations in long-range detection.
To address the low-SNR problem, this paper introduces the Robust Empirical Mode Decomposition (REMD) algorithm. It incorporates robustness criteria or regularization terms, such as the Self-Adaptive Sifting Stopping Criterion (SSSC). By mitigating the endpoint effects of traditional EMD [2], REMD significantly enhances the stability and accuracy of signal processing, especially in complex, noise-mixed signal environments. The REMD method can adapt to various complex signal characteristics and effectively avoid noise interference and outlier influences, resulting in more accurate and reliable reconstructed signals. This remedies the defect of EMD, which performs well on nonlinear and non-stationary signals but can degrade when facing complex signals or noise. To address the difficulty in solving nonlinear equation systems, we combine the advantages of the Chan algorithm and the Taylor algorithm, proposing a novel Chan-Taylor fusion algorithm [3]. This approach not only reduces the difficulty of localization but also significantly improves localization accuracy, especially in noisy environments. To expand the detection range and increase the significance of the time delay differences, we adopt a three-element discrete distributed array. By pre-measuring the specific coordinates of the sensors and combining them with the time delay estimates obtained through REMD processing, we can accurately determine the target location using the Chan-Taylor fusion algorithm.
The structure of this paper is arranged as follows: Firstly, we introduce the division between far and near sound fields and the sound source localization model of the three-element discrete array. Secondly, we elaborate on the REMD time delay analysis method and then delve into the specific mathematical methods of the Chan-Taylor fusion algorithm. Afterward, we present the simulation results of the REMD and Chan-Taylor algorithms. Finally, we summarize the full text. Through this structured arrangement, we aim to provide readers with a comprehensive and in-depth understanding of the applications and advantages of passive acoustic detection systems in micro-UAV localization. Figure 1 is a general flowchart of the methodology explained in this paper.
Sensors 2024, 24, x FOR PEER REVIEW

Delineation of Near-Field and Far-Field Models
Assume that d is the array spacing, λ_m is the minimum wavelength, L is the distance from the sound source to the microphone, and D = nd is the size of the receiver array. When L < 2D²/λ_m, the waveform of the sound source arriving at the microphone array is treated as a spherical wave, consistent with the near-field model; otherwise it is treated as a plane wave, consistent with the far-field model. The near-field model is suitable for close sound sources; because it accounts for both the amplitude difference and the phase difference in the sound waves, it describes sound wave propagation more accurately and thus improves the positioning accuracy, while the far-field model is the opposite [4,5]. Therefore, the near-field model is chosen in this paper, and its applicable area is enlarged by increasing the array aperture. The near-field model requires three or more microphones for localization; a three-microphone uniform linear microphone array is shown as an example in Figure 2. The first microphone m_1 is chosen as the reference microphone, and τ_12 and τ_13 denote the time delays between the acoustic signal arriving at the first microphone m_1 and the second microphone m_2 and the third microphone m_3, respectively. The time delays τ_12 and τ_13 can then be expressed as:

τ_12 = (r_2 − r_1)/c, τ_13 = (r_3 − r_1)/c

Note: r_i (i = 1, 2, 3) is the distance from the point source to the i-th microphone, and c is the propagation velocity of the acoustic signal in air (generally approximated as 340 m/s).
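As an illustration, the near-field criterion and the delay relations above can be sketched in Python; the helper names and coordinates below are illustrative, not from the paper:

```python
import math

C_AIR = 340.0  # approximate speed of sound in air (m/s)

def is_near_field(L, D, lam_min):
    """Near-field condition L < 2*D**2 / lambda_min."""
    return L < 2 * D**2 / lam_min

def time_delays(source, mics, c=C_AIR):
    """Delays tau_1i between the reference microphone m_1 (mics[0])
    and each other microphone, from 2-D coordinates."""
    r = [math.dist(source, m) for m in mics]  # r_i: source-to-mic distances
    return [(ri - r[0]) / c for ri in r[1:]]  # tau_12, tau_13, ...
```

For example, a source midway between two microphones yields a zero delay, and increasing the aperture D enlarges the region where the near-field condition holds.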

Discrete Distributed Microphone Arrays
The acoustic signals received by microphone arrays based on the near-field model are spherical waves with a high correlation among signals. By increasing the aperture of the array, the reception range of a spherical wave can be enlarged, i.e., the practical application area of the microphone array can be increased. Therefore, a discrete distributed microphone array is selected, as shown in Figure 3.

In the coordinate system Oxy, the positions of the N microphones are m_i(x_i, y_i) (i = 1, 2, ..., N); the array of N microphones can be flexibly arranged. T(X, Y) is the true position of the point source, the distance from T to the origin of the Oxy coordinate system is the drop-off distance R, the distance to each microphone m_i is r_i (i = 1, 2, ..., N), and the time at which the signal arrives at the reference microphone m_1 is t. According to the geometrical relationship, the following can be obtained [6]:

r_i = √((X − x_i)² + (Y − y_i)²), r_i − r_1 = cτ_i1 (i = 2, ..., N) (3)

From Formula (3), we can obtain the hyperbolic intersection equations for the discrete distributed microphone array. By solving this system of hyperbolic equations, we can obtain the location information of the target. This system of equations is nonlinear because it contains square roots and squared terms. To solve it, we usually need to use nonlinear optimization algorithms, such as Newton's method, the gradient descent method, or the least squares method. Specifically, we can define an objective function, which is the sum of the squared differences between the left and right sides of all equations, as shown in Formula (4) [7]:

F(X_s, Y_s) = Σ_{i=2}^{N} [√((X_s − x_i)² + (Y_s − y_i)²) − √((X_s − x_1)² + (Y_s − y_1)²) − cτ_i1]² (4)
Our goal is to find the value of (X_s, Y_s) that minimizes F(X_s, Y_s). Additionally, the selection of the reference microphone is not fixed: any i-th microphone can be chosen as the reference and combined with the other N − 1 microphones.
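The objective F(X_s, Y_s) and its minimization can be sketched as follows; the brute-force grid search merely stands in for the Newton, gradient-descent, or least-squares solvers mentioned above, and all names and values are illustrative:

```python
import math

C = 340.0  # approximate speed of sound in air (m/s)

def tdoa_objective(p, mics, taus, c=C):
    """F(Xs, Ys): sum of squared residuals of the hyperbolic equations
    r_i - r_1 = c * tau_i1 over all non-reference microphones."""
    r = [math.dist(p, m) for m in mics]
    return sum((r[i] - r[0] - c * t) ** 2 for i, t in enumerate(taus, start=1))

def grid_search(mics, taus, span=15.0, step=0.1):
    """Brute-force minimiser of F over a span x span area -- a simple
    stand-in for the iterative solvers named in the text."""
    n = int(span / step)
    best, best_f = None, float("inf")
    for ix in range(n + 1):
        for iy in range(n + 1):
            p = (ix * step, iy * step)
            f = tdoa_objective(p, mics, taus)
            if f < best_f:
                best, best_f = p, f
    return best
```

With noise-free delays, the minimum of F coincides with the true source position up to the grid resolution.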
Empirical Mode Decomposition

The Empirical Mode Decomposition (EMD) method decomposes a signal based on the time-scale characteristics of the data itself, without any predefined basis functions. Theoretically, it can achieve time resolution and frequency resolution in any dimension, providing a new method for analyzing non-stationary and nonlinear signals [8]. These characteristics make EMD theoretically applicable to the decomposition of any type of signal, and it has very obvious advantages and high signal-to-noise ratios when dealing with non-stationary and nonlinear data [9].
The traditional EMD decomposition steps can be categorized as follows: (1) Find all extreme points of the original signal.
(2) Fit the upper and lower envelopes e max (n) and e min (n) by using cubic spline curves.
(3) Find the mean of the upper and lower envelopes to obtain the mean envelope m_i(n) = (e_max(n) + e_min(n))/2.
(4) Obtain the intermediate signal h_i(n) as the original signal minus the mean envelope, h_i(n) = x(n) − m_i(n).
During the generation of intermediate signals, a conditional judgement on h_i(n) is required, as follows:
1. In the interval, the number of extreme points and the number of zero crossings must be equal or differ by at most one.

2. At any time, the average value of the upper envelope formed by the local maxima and the lower envelope formed by the local minima is zero.
When the above conditions are met, an IMF component can be generated, and the algorithm updates the original signal and repeats the process.
Note: c_1(n) is the first-order IMF component and r_1(n) = x(n) − c_1(n) denotes the updated signal.
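A simplified sketch of one sifting pass of the steps above; linear interpolation of the envelopes stands in for the cubic splines of true EMD, so this is illustrative only:

```python
import numpy as np

def local_extrema(x):
    """Indices of local maxima and minima of a 1-D signal (step 1)."""
    maxima = [i for i in range(1, len(x) - 1) if x[i] >= x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] <= x[i - 1] and x[i] < x[i + 1]]
    return maxima, minima

def sift_once(x):
    """One sifting pass (steps 2-4): subtract the envelope mean from x.
    Linear interpolation replaces the cubic splines of true EMD."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return x  # not enough extrema to build envelopes
    n = np.arange(len(x))
    e_max = np.interp(n, maxima, x[maxima])  # upper envelope e_max(n)
    e_min = np.interp(n, minima, x[minima])  # lower envelope e_min(n)
    m = (e_max + e_min) / 2.0                # mean envelope m_i(n)
    return x - m                             # candidate IMF h_i(n)
```

Repeating the pass until the two IMF conditions hold yields c_1(n), after which the residual r_1(n) is sifted in turn.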

Robust Empirical Mode Decomposition with Adaptive Frequency Windows
This paper proposes the use of adaptive frequency windows for locking and extraction in the signal spectrum, to solve the problem of missing spectrum boundaries when segmenting the original signal. The schematic process is shown in Figure 4. The frequency window shown in the figure is [f_a, f_b], where f_a and f_b are the center frequencies of the front and back stopbands of the window. The shaded area is the segmented-fragment transition region with a width of 2π [10]. After initially setting a sliding window width, the sliding window is used to traverse the decomposition frequencies in sequence. The bandwidth range of the frequency window is variable. While traversing, the frequency amplitude within the sliding window [f_a, f_b] is computed in parallel. When the calculated amplitude exceeds the set threshold, the sliding window locks onto the conforming frequency band and outputs the traversal result data.
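The traversal can be sketched as follows; the window width, step, and threshold below are illustrative parameters, not values from the paper:

```python
import numpy as np

def lock_frequency_bands(signal, fs, width_hz=50.0, step_hz=10.0, threshold=0.1):
    """Slide a window [f_a, f_b] over the magnitude spectrum and report
    every position whose in-window amplitude exceeds the threshold."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)  # normalised magnitudes
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    hits = []
    f_a = 0.0
    while f_a + width_hz <= freqs[-1]:
        f_b = f_a + width_hz
        mask = (freqs >= f_a) & (freqs < f_b)
        if mask.any() and spectrum[mask].max() > threshold:
            hits.append((f_a, f_b))  # the window locks onto this band
        f_a += step_hz
    return hits
```

Every reported window brackets the dominant spectral component, so the locked bands can then be passed on to the REMD stage.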
As the traditional EMD algorithm adopts cubic spline interpolation in the decomposition, it suffers from endpoint effects when decomposing the IMFs. This leads to signal distortion, and there is a large error along the time axis of the obtained IMFs. The principle of the algorithm also leads to the modal aliasing phenomenon in the decomposed IMFs. When the signal-to-noise ratio of the original signal is low, the signal-to-noise ratio of the decomposed IMFs is also low, and the effect grows with further decomposition. Since the UAV flight sound is still mixed with noise in different frequency intervals while being received by the sensors, the traditional EMD algorithm must be improved to suit the scenarios addressed in this paper.
Aiming at the phenomena present in traditional EMD, and combined with the application requirements of sound source localization, this paper adopts an improved Empirical Mode Decomposition algorithm, i.e., Robust Empirical Mode Decomposition (REMD). The main idea of REMD is to reduce the influence of noise in the process of signal decomposition and to improve the stability and reliability of the decomposition through the introduction of robustness criteria or regularization terms [11]. By adjusting the regularization parameter or robustness criteria in the algorithm, the influence of noise can be reduced while the signal characteristics are retained. When the adaptive frequency window locks and splits the original signal, the high- and low-frequency signals in the split signal are decomposed by weights, and the decomposed IMF components are reconstructed, so as to obtain a reconstructed signal with a high signal-to-noise ratio. The specific implementation process is shown in Figure 5.
An REMD decomposition of the segmented signal x(n) is performed, yielding n IMF components.
Note: c_i(n) is the improved robust intrinsic mode function and r_N(n) is the residual term.
1. Filter the components with a higher signal-to-noise ratio according to the following reconstruction criteria:
(1) Correlation coefficient. The correlation coefficient is a statistical indicator of the closeness of the connection between variables. In this paper, the Pearson correlation coefficient, which is commonly used in statistics, is used [12]. The degree of correlation between X and Y is described by a value in the interval [−1, 1]; the larger the absolute value of the coefficient, the higher the correlation, and vice versa. After performing the REMD decomposition, the algorithm automatically calculates the correlation coefficients between the IMF components and the segmented signals, as a way to differentiate the useful components and retain them. Let the two samples be X and Y. The Pearson correlation coefficient ρ_X,Y is equal to the covariance cov(X, Y) between them divided by the product of their standard deviations σ_X and σ_Y:

ρ_X,Y = cov(X, Y) / (σ_X σ_Y)

Components with a coefficient close to 0 are considered uncorrelated, and those close to 1 or −1 are strongly correlated.
(2) Amplitude ratio. The mathematical expression for the energy ratio coefficient is

ε = E_IMF(i) / E_x

where E_x is the total energy of the IMF components; E_IMF(i) is the energy of the i-th IMF; and ε is the energy ratio coefficient.

2. According to the proportional weighting method, take the component with the maximum correlation coefficient as IMF_max(n) and the component with the minimum correlation coefficient as IMF_min(n), subtract the two components and halve the difference, and add the result to the segmented signal x(n) to reconstruct a signal with a high signal-to-noise ratio:

x'(n) = x(n) + (IMF_max(n) − IMF_min(n))/2

3. Analyze the trigger time values in the reconstructed signal.
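The reconstruction criteria and the weighting step above can be sketched as follows; the helper names are hypothetical, and the half-difference reconstruction follows the description in step 2:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient rho = cov(X, Y) / (sigma_X * sigma_Y)."""
    return float(np.corrcoef(x, y)[0, 1])

def energy_ratio(imf, total_energy):
    """Energy ratio coefficient epsilon = E_IMF(i) / E_x."""
    return float(np.sum(imf ** 2) / total_energy)

def reconstruct(x, imfs):
    """Rank the IMFs by correlation with the segmented signal x, then add
    half the difference of the best- and worst-correlated components to x."""
    rho = [pearson(imf, x) for imf in imfs]
    imf_max = imfs[int(np.argmax(rho))]   # IMF_max(n)
    imf_min = imfs[int(np.argmin(rho))]   # IMF_min(n)
    return x + (imf_max - imf_min) / 2.0  # reconstructed high-SNR signal x'(n)
```

In practice the correlation and energy-ratio scores would first prune clearly noisy IMFs before the weighting step is applied.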

Time Delay Estimation Method Based on the Comparison of Amplitude Mean Values of Reconstructed Signals
The time delay estimation method based on the comparison of amplitude mean values of reconstructed signals involves calculating the amplitude α at each sampling point of a pure signal and then determining its average value ᾱ. By comparing the amplitude mean values of the reconstructed signal and the reference signal, the time delay of the signal can be inferred: if the amplitude mean of the reconstructed signal is shifted relative to that of the reference signal, this offset can serve as an estimate τ_ij of the time delay. The effectiveness of this approach depends on the characteristics of the signal and on the noise level; however, the improved REMD (Robust Empirical Mode Decomposition) raises the signal-to-noise ratio beforehand and thereby circumvents this issue.
To simplify the calculation, as Figure 5 illustrates, this study focuses only on the amplitude within the first peak of the reconstructed signal when calculating the mean amplitude value. One of the reconstructed REMD signals from a microphone is selected as the reference signal, and its mean amplitude is compared with that of the other microphone signals. The offset at which the amplitude values coincide is chosen as the time delay estimate: when the ratio of the mean amplitude value α_i at time T_i to the mean amplitude value α_{i−1} at time T_{i−1} reaches its maximum, we take τ_{i,i−1} = T_i − T_{i−1}.
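A simplified stand-in for this comparison can be sketched as follows; reading the time at which each channel's envelope first reaches a fixed fraction of its first-peak amplitude is an assumption of this sketch, not the paper's exact criterion:

```python
import numpy as np

def trigger_time(signal, fs, frac=0.5):
    """Time at which |signal| first reaches `frac` of its first-peak
    amplitude (or the mean amplitude up to that peak, if larger).
    `frac` is an illustrative parameter."""
    env = np.abs(signal)
    peak = int(np.argmax(env))
    alpha_bar = env[: peak + 1].mean()  # mean amplitude up to the first peak
    idx = int(np.argmax(env >= max(alpha_bar, frac * env[peak])))
    return idx / fs

def delay_estimate(ref, other, fs):
    """tau_ij as the offset between the trigger times of two channels."""
    return trigger_time(other, fs) - trigger_time(ref, fs)
```

For two identically shaped pulses, the estimate reduces to the true shift between them regardless of the chosen fraction.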

GWO-Based Adaptive Optimization
The Grey Wolf Optimization (GWO) algorithm is inspired by grey wolf packs in nature and simulates their social hierarchy and hunting mechanism [13]. GWO does not require complex parameter tuning and, at the same time, has a strong global search ability, so it can quickly find the optimal solution. This article seeks the optimal fitness function of the adaptive frequency window with the help of these advantages. GWO classifies the entire wolf population into alpha (α), beta (β), delta (δ), and omega (ω) wolves according to the fitness curve, with rank decreasing in that order. The update process is as follows:
(1) Calculate the distance between an individual grey wolf and its prey and update the grey wolf's position at each iteration:

D(t) = |C · X_p(t) − X(t)|
X(t + 1) = X_p(t) − A · D(t)

where t is the iteration number; D(t) is the distance between the individual grey wolf and the prey; X_p is the position vector of the prey; X is the position vector of the individual grey wolf; and A and C are coefficient vectors, calculated as

A = 2a · r_1 − a, C = 2 · r_2, a = 2(1 − t/t_max)

where r_1 and r_2 are random vectors in the interval [0, 1], t_max is the maximum number of iterations, and a is the convergence factor. Grey wolves are able to identify the location of their prey and surround it [14]. When a grey wolf identifies the location of a target, β and δ guide the pack to encircle the prey under the leadership of α. The wolves constantly update their positions using the following tracking model:

D_α = |C_1 · X_α − X|, D_β = |C_2 · X_β − X|, D_δ = |C_3 · X_δ − X|
X_1 = X_α − A_1 · D_α, X_2 = X_β − A_2 · D_β, X_3 = X_δ − A_3 · D_δ
X(t + 1) = (X_1 + X_2 + X_3)/3

where D_α, D_β, and D_δ denote the distances between α, β, and δ and the other individuals, respectively; X_α, X_β, and X_δ represent the current positions of α, β, and δ; C_1, C_2, and C_3 are random vectors; and X is the current position of the grey wolf. The first two groups of equations give the step length and direction of an individual ω wolf towards α, β, and δ, respectively, and the last equation gives its final position.
The coefficient vector A takes random values in the interval [−a, a]; during the attack process of the wolf pack, the value of a is gradually reduced, and the range of fluctuation in A shrinks with it. When |A| < 1, the wolf pack approaches and captures the prey, i.e., the optimal solution is obtained, and when |A| > 1, the grey wolf separates from the prey and searches for a more suitable target [15]. In this paper, the noise ratio in the frequency domain is used as the fitness function of the frequency window, and the GWO algorithm is used to adaptively determine the window position. Figure 6 shows the schematic of the GWO adaptive approach.
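A minimal, self-contained sketch of the GWO update described above; the population size, iteration count, elitism bookkeeping, and the sphere test function used below are illustrative choices, not values from the paper:

```python
import random

def gwo(fitness, dim, bounds, n_wolves=12, t_max=100, seed=0):
    """Minimise `fitness` with a basic Grey Wolf Optimizer: the three best
    wolves (alpha, beta, delta) steer the pack, and the convergence
    factor a decays linearly from 2 to 0."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]  # keep the best solution seen so far
    for t in range(t_max):
        wolves.sort(key=fitness)
        if fitness(wolves[0]) < fitness(best):
            best = wolves[0][:]
        alpha, beta, delta = (w[:] for w in wolves[:3])
        a = 2.0 * (1 - t / t_max)  # convergence factor a
        for w in wolves:
            for d in range(dim):
                x = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a   # coefficient A in [-a, a]
                    C = 2 * rng.random()           # coefficient C in [0, 2]
                    D = abs(C * leader[d] - w[d])  # distance to the leader
                    x += leader[d] - A * D         # step towards the leader
                w[d] = min(max(x / 3.0, lo), hi)   # mean of the three steps
    return min(wolves + [best], key=fitness)
```

In this paper's setting, the fitness would be the frequency-domain noise ratio of a candidate window position rather than the sphere function used here for testing.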


Improved Sound Source Localization Algorithms

TDOA Algorithm
The TDOA algorithm works by measuring the time differences between signals received by different receivers and uses these time differences to calculate the position of the transmitting source. Specifically, it is assumed that there are three or more receivers with known relative positions and that, when the source transmits a signal, each receiver records the time at which the signal is received. Since the velocity of signal propagation is known, this time difference and velocity information can be used to construct a system of localization hyperbolic equations to calculate the position of the signal source. However, this hyperbolic system of equations tends to be nonlinear, and the obtained solution is often imaginary, or there is no solution, so the nonlinear system must be transformed into a linear system of equations. The mainstream methods are the Fang algorithm, the Chan algorithm, and the Taylor algorithm.
In this paper, the Chan algorithm and the Taylor algorithm are selected, and five different levels of Gaussian white noise are used for simulation. The panels of Figure 7 show the localization results of the Chan algorithm, the Taylor algorithm, and the combined Chan and Taylor algorithms under different measurement noise standard deviations, respectively. The simulation results in Figure 7 show that the RMSE (Root Mean Square Error) of the Chan algorithm and that of the Taylor algorithm are basically the same when the noise is small. As the measurement noise increases, the Taylor algorithm's RMSE becomes clearly better than that of the Chan algorithm.


Improving the Chan-Taylor Localization Algorithm
As the Chan algorithm can make full use of the information of the TDOA measurement value of each base station, it can effectively reduce the negative impact of the large measurement error of individual base stations on the positioning, and the Taylor algorithm obtains higher positioning accuracy when the initial position selection accuracy is higher.Therefore, in this paper, the Chan algorithm and the Taylor algorithm are fused to locate the position of experimental micro-UAV, and five different sizes of Gaussian white noise are selected for simulation.
The principle of the Chan-Taylor joint positioning algorithm is to use the Chan algorithm to estimate the initial coordinates of the target (x_0, y_0). After the Chan algorithm, Taylor iterative calculations are carried out to repeatedly update the position estimate so that it approaches the true position (x, y) ever more closely. In this paper, the Chan-Taylor algorithm is combined with the weighted least squares method for solving. The detailed derivation is given in Appendix B; here, we simply show the algorithm processing steps.
(1) In the first step, apply WLS estimation to the nonlinear equations to obtain a quadratic equation in terms of r_1. Then solve the equation for its two roots, discard the invalid solution, and substitute the valid solution back into Equation (12) to obtain the estimated coordinates (x_0, y_0) of the location point T.
(2) Use the estimated coordinates (x_0, y_0) obtained by the Chan algorithm as the initial values of the Taylor series expansion method. Perform a Taylor expansion of f_i(x, y, x_i, y_i) at (x_0, y_0) and translate it into the matrix form h = G_t · δ, where δ = [Δx, Δy]ᵀ is the position correction. Finally, solve the weighted least squares solution of this equation,

δ = (G_tᵀ Q⁻¹ G_t)⁻¹ G_tᵀ Q⁻¹ h

and iterate until δ is sufficiently small, to obtain the final target coordinates T(x, y), the positioning coordinate of the target that approaches the true value after iterative calculation.
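The Taylor refinement step can be sketched as follows. For brevity, this sketch uses an unweighted least-squares correction in place of the paper's WLS with the measurement covariance Q, and takes a rough initial guess in place of the Chan closed-form estimate; all names are illustrative:

```python
import numpy as np

C = 340.0  # approximate speed of sound in air (m/s)

def taylor_refine(p0, mics, taus, c=C, iters=20):
    """Taylor-series TDOA refinement: linearise f_i(x, y) = r_i - r_1 - c*tau_i1
    around the current estimate and solve a least-squares correction each step."""
    p = np.asarray(p0, dtype=float)
    mics = np.asarray(mics, dtype=float)
    for _ in range(iters):
        r = np.linalg.norm(mics - p, axis=1)        # r_i at the current estimate
        f = (r[1:] - r[0]) - c * np.asarray(taus)   # residuals f_i
        # Jacobian of f_i with respect to (x, y)
        g = (p - mics[1:]) / r[1:, None] - (p - mics[0]) / r[0]
        delta, *_ = np.linalg.lstsq(g, -f, rcond=None)
        p = p + delta
        if np.linalg.norm(delta) < 1e-9:            # correction small: converged
            break
    return p
```

With noise-free delays and a reasonable initial point, the iteration converges to the true source position; in the full algorithm, the Chan estimate supplies that initial point and Q weights the correction.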

Verification and Analysis
In this section, we test the performance and robustness of the proposed localization algorithm. In Section 5.1, we compare EMD with the improved algorithm proposed in this paper by loading simulated signals under ideal conditions and comparing the IMF curves produced by the two algorithms. Section 5.2 verifies the practical performance of the proposed strategy and algorithm using signals collected in a real environment, demonstrating the accuracy and feasibility of the complete sound source localization algorithm in drone localization scenarios.

Algorithm Stability Validation
Based on an analysis of the flight characteristics of drones, the signal generated by the rotating blades during drone acceleration can be regarded as a short-term impulse signal: a peak appears after a period of time and then gradually weakens. Because of the complex environment surrounding the experiment, the drone flight signal may be disturbed by high-frequency vibration signals propagating through the air and soil. As a result, the received signal may contain high-energy high-frequency vibration components and low-amplitude random white noise. We therefore recreated a similar environment in the laboratory, where the hardware consists of a DAQ122 acquisition card and a discrete microphone array composed of three MEMS digital microphones. The signal collected by the host computer is shown in the figure. We compared the EMD algorithm with the proposed algorithm using this analog signal. The simulation was performed in MATLAB (R2022b, MathWorks, Natick, MA, USA), and the results are shown in Figures 8–11.
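The qualitative signal model described above — a delayed, decaying impulse from the rotor, high-frequency vibration interference, and low-amplitude white noise — can be synthesized as follows. All parameters (sample rate, onset time, decay rate, tone and vibration frequencies, noise level) are illustrative assumptions of ours, not the experiment's values.

```python
import numpy as np

fs = 48_000                    # assumed sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)  # 1 s of signal
rng = np.random.default_rng(0)

# Short-term impulse: a peak appears after a delay, then decays over time.
onset = 0.2                                       # s, illustrative
env = np.where(t >= onset, np.exp(-8.0 * (t - onset)), 0.0)
rotor = env * np.sin(2 * np.pi * 180 * t)         # assumed blade-rate tone

vibration = 0.5 * np.sin(2 * np.pi * 6_000 * t)   # high-frequency interference
noise = 0.05 * rng.standard_normal(t.size)        # low-amplitude white noise

signal = rotor + vibration + noise
```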
Sensors 2024, 24, x FOR PEER REVIEW 12 of 27

The IMF diagrams decomposed by the two algorithms are both listed in Appendix A.
By comparing the results in the appendix, it can be concluded that the orthogonality between the IMFs obtained by applying REMD to the collected signals is higher than that of the IMFs produced by EMD, making the decomposed IMFs more independent. At the same time, after the robustness improvement, the fluctuation of the IMFs decomposed by REMD is significantly reduced, and the low-frequency IMFs converge more stably. There is no signal aliasing in any IMF, so the local characteristics of the signal are captured more accurately, which greatly improves the performance of the subsequent signal reconstruction and adaptive sliding-window positioning.
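The orthogonality comparison can be made quantitative with the standard index of orthogonality over a set of IMFs; values near zero indicate nearly independent components. The metric is a generic one, and the function name is ours, not code from the paper.

```python
import numpy as np

def orthogonality_index(imfs):
    """Index of orthogonality for a set of IMFs (one IMF per row).

    IO = |sum over pairs j != k of <imf_j, imf_k>| / ||sum of IMFs||^2.
    Smaller values mean the components are closer to mutually orthogonal.
    """
    imfs = np.asarray(imfs, dtype=float)
    total = imfs.sum(axis=0)              # reconstructed signal
    gram = imfs @ imfs.T                  # pairwise inner products
    cross = gram.sum() - np.trace(gram)   # off-diagonal (cross) terms only
    return abs(cross) / np.dot(total, total)
```

Evaluating this index on the EMD and REMD IMF sets listed in Appendix A turns the qualitative comparison above into a single number per decomposition.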

Real Signal Simulation Experiment
Within the test area, to eliminate the influence of human manipulation on the drone's flight position, a system verification test was conducted using pre-recorded audio of a drone's accelerating ascent to simulate drone flight. In the experiment, the actual positions of the sensors were preset, and their coordinates were further corrected using differential GPS. The recorded drone flight sound was then amplified and played at a designated point, and the microphone array at the test site collected and processed these signals. After the time delay values were obtained, the TDOA algorithm was used for positioning. The system recorded each solution and compared the positioning outcomes with the preset real target points to evaluate positioning accuracy and system robustness. Figure 12 shows the microphone placement at the test site.
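The per-test accuracy bookkeeping reduces to comparing each estimated landing point with the surveyed ground truth. A minimal sketch follows, assuming the error δ is the Euclidean miss distance, optionally normalized by the 15 m side of the test area; the paper's exact definition of δ is not spelled out here, so the normalization is our assumption.

```python
import math

def positioning_error(true_pt, est_pt, ref_len=15.0):
    """Euclidean positioning error and a relative error.

    ref_len is an assumed normalization (the 15 m side of the test area);
    the paper's exact definition of delta may differ.
    """
    dx = est_pt[0] - true_pt[0]
    dy = est_pt[1] - true_pt[1]
    err = math.hypot(dx, dy)       # absolute miss distance, m
    return err, err / ref_len      # (delta in metres, relative delta)
```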

A total of 10 valid tests were conducted in the experiment. Figure 13 shows a schematic diagram of the target positions and sensor positions corresponding to the ten tests. The same drone flight sound signal was used at all playback points, and the playback level was unified. Figure 14 shows the pre-collected drone flight sound signals received by the three microphone sensors in the first experiment. According to pre-experiment tests, the signals collected by the sensors during the experiment were essentially consistent with the pre-collected drone flight signals; they could therefore be used as experimental simulation signals. Meanwhile, there was slight environmental noise during the experiment, such as wind, people walking, and vehicles driving nearby. Table 1 records the trigger times obtained from the system's analysis of the collected signals, and Table 2 records, for each of the ten tests, the real landing point (x, y) and the system's estimated landing point (x*, y*), which are used to calculate the positioning error δ.
As can be seen in Figure 14, because the microphones are spaced a limited distance apart, the times at which the individual microphones receive the signal are very close. For the positioning algorithm adopted in this paper, signals processed without any treatment, or with traditional signal decomposition algorithms, cannot guarantee the positioning accuracy of the drone. Next, we decompose the signals collected by the three microphones.
As shown in Figure 15, the amplitudes of the IMFs decomposed by the proposed algorithm are the same for all three microphones, indicating that the adopted filtering approach completely separates the mixed noise from the acoustic wave; it can also be seen that the filtering does not clip the useful signal at the top or bottom. The signal spectrum obtained by FFT is shown in Figure 16. A sliding frequency window with variable bandwidth is used for the corresponding frequency matching and extraction, and the reconstructed signal filtered by the screening criteria is shown in Figure 17. By analyzing the data and using the algorithm to calculate the time differences between the sensors' receptions, the trigger-signal moments in the extracted IMF components are determined and recorded. Following this process, the time delay results obtained from the remaining nine experiments are recorded in Table 1. Meanwhile, the estimated position coordinates are compared with the actual measured coordinates using the positioning algorithm presented in this paper, and the calculated error values are recorded in Table 2.
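A generic version of the trigger-moment and time-difference step can be sketched as follows, in the spirit of the amplitude-mean comparison of Figure 5: a channel "triggers" when its windowed mean amplitude first exceeds a multiple of the noise floor, and the pairwise differences of the trigger times give the TDOAs. The window length, threshold factor, and function names are our illustrative choices, not the paper's parameters.

```python
import numpy as np

def trigger_time(sig, fs, win=256, k=4.0):
    """First instant where the windowed mean |amplitude| exceeds k times
    the noise floor estimated from the opening of the record."""
    mag = np.abs(np.asarray(sig, dtype=float))
    # Running mean of |amplitude| over `win` samples.
    mean_amp = np.convolve(mag, np.ones(win) / win, mode="same")
    floor = mean_amp[: 10 * win].mean()        # noise floor from the start
    hits = np.flatnonzero(mean_amp > k * floor)
    return hits[0] / fs if hits.size else None

def tdoas(channels, fs):
    """Time differences of arrival relative to the first (reference) channel."""
    t = [trigger_time(c, fs) for c in channels]
    return [ti - t[0] for ti in t[1:]]
```

The resulting delay list is exactly the input the Chan–Taylor stage of Section 4 expects.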
From Tables 1 and 2, it can be seen that, in the same experimental environment, after analysis with the improved empirical mode decomposition sound source localization algorithm with an adaptive frequency window, the positioning estimation error is below 5% within the range where signals can be collected. This indicates that the time delay estimates resolved from the decomposed signals by the proposed algorithm greatly improve the success rate of position estimation. The algorithm performs well in both the near and far fields, even in the presence of random noise. However, during the experiment, it was found that the positioning accuracy was still not high enough at some locations. The specific reason is that, because of product packaging and other factors,

A modified REMD decomposition of the simulated signals.

2 m and 13 
and the third microphone 3 m , respectively.Then, the time delay 12 can be expressed as:

Figure 2. Near-field modeling of uniform line arrays.


Figure 5. Schematic diagram of the time delay estimation method based on amplitude mean value comparison.


Figure 6. Schematic diagram of the GWO adaptive algorithm.


Figure 7. Different algorithms used to measure standard deviation.

Figure 10. IMF decomposition of the EMD algorithm for simulating acoustic signals.


Figure 11. IMF decomposition of the REMD algorithm for simulating acoustic signals.


Figure 12. Map of sensor positions during the test.

Figure 13. Schematic diagram of the corresponding target points and sensor positions in the test.



Figure 14. Waveforms of the signals collected by the three array microphones.


Figure 15. REMD decomposition of the signals collected by the three array microphones.

Figure 17. Reconstruction of the IMF trigger-signal moment chart: (a–c) denote the stages of the IMF decomposition process of the signal, and (c) is the final IMF decomposition result.


Table 1. Time estimation results after REMD decomposition.

Table 2. Test drop monitoring results.
