High-Precision Joint TDOA and FDOA Location System

: Passive location based on TDOA (time difference of arrival) and FDOA (frequency difference of arrival) is the mainstream method for target localization. This paper proposes a fast time–frequency difference positioning method to address issues such as low accuracy, large computational resource utilization, and limited suitability for real-time signal processing in the conventional CAF (cross-ambiguity function)-based approach, aiming to complete the processing of the target radiation source to obtain the target parameters within a short timeframe. In the mixing product operation step of the CAF, a frequency-domain approach replaces the time-domain convolution operation in PW-ZFFT (pre-weighted Zoom-FFT) to reduce the computational load of the CAF. Additionally, a quadratic surface ﬁtting method is used to enhance the accuracy of TDOA and FDOA. The localization solution is obtained using Newton’s method, which can provide more accurate results compared to analytical methods. Next, a signal processing platform is designed with FPGA (ﬁeld-programmable gate array) and multi-core DSP (digital signal processor), and works by dividing and mapping the algorithm functional modules according to the hardware’s characteristics. We analyze the architectural advantages of multi-core DSP and design methods to improve program performance, such as EDMA transfer optimization, inline function optimization, and cache optimization. Finally, this paper constructs simulation tests in typical positioning scenarios and compares them to hardware measurement results, thus conﬁrming the correctness and real-time capability of the program.

the most commonly utilized methods is the joint time-frequency difference localization technology, which utilizes intercepted signals emitted by target radiation sources to acquire time difference of arrival (TDOA) and frequency difference of arrival (FDOA) information, enabling precise target location.This location system, characterized by minimal system equipment, extended range, and high concealment, is widely employed in UAV reconnaissance, satellite navigation, and various other fields.Leveraging these advantages, time-frequency difference localization has become a focal point of research in passive detection technology [4].
The effectiveness of joint time-frequency difference localization relies on the accuracy of estimating parameters such as time difference and Doppler frequency difference [5].Typically, a method employing the cross-ambiguity function (CAF) is utilized for the joint estimation of time difference and frequency difference [6,7].The cross-ambiguity function can be viewed as a two-dimensional correlation between the time difference and the Doppler frequency difference [8].
Considerable research has been conducted on the localization of target sources using time difference of arrival (TDOA) and frequency difference of arrival (FDOA).In [9], the optimal interpolation factors method was employed to reduce the computational load of mutual fuzzy functions.The work in [10] introduced an algorithm based on the higher-order statistics cross-ambiguity function (CAF-HOS) to reduce the unknown noise correlation effect.Addressing the issue of CAF performance degradation due to time compression from relative target motion, the authors of [11] investigated short-time CAF coherence.In [12], several weighted least-squares methods were employed to obtain the target location without requiring an initial solution.The work in [13] introduced the constrained weighted least squares (CWLS) algorithm for solving target source positioning and velocity.Both these methods can approach the Cramér-Rao lower bound before the threshold effect occurs, demonstrating enhanced positioning performance at high signal-tonoise ratios.The authors of [14] introduced the use of the total least squares (TLS) method for target localization; however, the accuracy of this method falls short of reaching the Cramér-Rao lower bound.In [15,16], time-frequency difference estimation was explored in the contexts of orthogonal frequency division multiplexing (OFDM) signals and linear frequency modulation (LFM) signals.However, these methods are tailored to signals in specific modes, lacking universality.To address the limited FDOA estimation performance of frequency-hopping signals, the authors of [17] proposed a coherent accumulation method, demonstrating superior FDOA estimation accuracy.In comparison to the existing research, this paper makes the following key contributions:

•
We improved the PW-ZFFT algorithm.In the process of calculating the mixing product, the time-domain convolution operation of the signal was replaced by frequencydomain multiplication to achieve a fast calculation of cross-ambiguity functions.For long signals, the frequency-domain method can save more calculations, which is especially important for real-time signal processing.

•
To balance the computational efficiency and accuracy of the joint time-frequency difference estimation algorithm, we investigated and adopted the quadratic surface fitting method.This method reduces errors caused by numerical discontinuities near the peak of the cross-ambiguity function without significantly increasing the computational load, thus achieving high precision and rapid estimation of timefrequency differences.

•
We constructed a heterogeneous signal processing platform based on FPGA and multicore DSP for implementing the time-frequency difference localization method.By allocating algorithms rationally on this platform and fully utilizing its architectural advantages, we achieved both software and hardware design implementation for joint time-frequency difference localization.

•
We proposed a development optimization method based on multi-core DSP, aimed at enhancing program performance and the real-time nature of data processing.This includes EDMA optimization, cache optimization, and inline functions.Through these optimization strategies, we were able to utilize the resources of the DSP multi-core system more effectively, thus enhancing the real-time processing capability of the time-frequency difference localization algorithm.
The rest of this paper is organized as follows: Section 2 introduces the principle of time-frequency difference localization, constructs a joint TDOA/FDOA localization model, and summarizes the improved cross-ambiguity function algorithm and related works in localization solution computation.In Section 3, we construct a signal processing platform for joint time-frequency difference localization, transplant the localization algorithm onto a hardware platform for real-time processing, detail the structure of the hardware platform, and outline the specific implementation methods of each algorithm module.Finally, based on the characteristics of the processor, we provide a development optimization method based on multi-core DSP to improve the real-time nature of signal processing.Section 4 presents a comparison of hardware and simulation localization results in typical localization scenarios, verifying the accuracy and real-time performance of the time-frequency difference joint localization processing platform.Lastly, Section 5 concludes this paper.

Related Works
Related works of the passive location theory and localization algorithm are introduced in this section.
There are many passive positioning systems, including one-point positioning, dynamic positioning, cross positioning, etc. Different positioning systems should be selected based on the relative motion status between the radiation source and the positioning station, the number of positioning stations, and other specific characteristics.Parameters that usually need to be measured include received signal strength (RSS), direction of arrival (DOA), time of arrival (TOA), time difference of arrival (TDOA), doppler frequency and doppler frequency change rate, etc. Positioning methods based on a single parameter have many shortcomings in practical applications.Hybrid positioning improves positioning performance by measuring multiple parameters and is simultaneously used to achieve complementary effects.The study of [18] proposes an RSS/TDOA hybrid positioning algorithm based on wireless sensor networks, which achieves high accuracy and convergence speed.Study [19] considers the impact of multipath noise on positioning performance, and proposes a high-resolution estimation technique to implement DOA/TOA positioning.In low signal-to-noise ratio scenarios, estimating the spatial position of the target source performs better than the multipath fingerprinting scheme.The DOA/TDOA joint positioning proposed in the study by [20] solves the limitations of traditional positioning methods when the range of the signal source is unknown, and at the same time improves the accuracy and applicability of near-field and far-field positioning.
The estimation of the target position often requires complex calculations, which generally require processing processes such as signal interception, signal sorting, and signal pairing.However, there are also technologies that use compressed sensing, deep neural networks, etc., to directly perform positioning, which greatly reduces the computational complexity and processing time.Study [21] proposes a precise indoor positioning system based on RSS, which uses the compressed sensing theory to recover sparse signals from a small number of noisy measurements.The system includes two steps-rough positioning and fine positioning, the cluster in which a node is located is first detected by comparing the received signal strengths of multiple clusters-and then the compressed sensing theory is used to further refine the position estimate.Study [22] proposes an improved recurrent neural network (RNN) method for passive signal source localization.Through spatial partitioning, a cost function is designed to convert the positioning of the transmitter into the output of a modified recurrent neural network.These neural networks are both chaotic and stable in terms of convergence.

Passive Localization System
This section will elucidate the principles of time-frequency difference positioning and build the echo model, detailing the estimation methods for TDOA and FDOA, as well as the positioning solution methods.Firstly, the principle of time-frequency difference positioning is summarized in Section 3.1.Then, in Section 3.2, the cross-ambiguity function is introduced.The improved PW-ZFFT algorithm is utilized for the rapid computation of the cross-ambiguity function, obtaining coarse estimates of the time-frequency differences.Section 3.3 discusses the method of quadratic surface fitting to obtain precise estimates of time-frequency differences.In Section 3.4, we build a set of positioning equations and solve these nonlinear equations using Newton's method, thus acquiring the position information of the source.

Time-Frequency Difference Location Theory
The time-frequency difference-based positioning principle relies on the distance difference between multiple receiving stations and the target source, leading to varying signal arrival times at each station, thereby resulting in a time difference of arrival.Additionally, the speed of multiple satellite-receiving stations relative to the target source differs, causing a frequency difference in the signals received at each station due to the Doppler effect [23][24][25].By intercepting the signals emitted by the target source and performing analysis and processing, the time difference and frequency difference of the signals arriving at multiple receiving stations can be measured.The time difference of arrival determines an iso-time-difference hyperbolic surface, and the frequency difference yields an iso-frequency-difference surface.Combined with the Earth's surface equation, they enable the determination of the target source's location [26].The key to joint time-frequency difference positioning is the estimation of time delay and Doppler frequency difference.The accuracy of parameter estimation directly affects the accuracy of positioning.

Cross-Ambiguity Function
The CAF method is employed for joint time-frequency difference estimation [27].This involves initially pre-delaying one of the two signals, and then conjugating and multiplying both signals to generate a mixing product signal.By adjusting the length of the pre-delay τ, multiple mixing product signals are combined to form a mixing product matrix.A Fourier transform is then performed on the frequency difference dimension of this mixing product matrix, resulting in the spectrum of the mixed product, which is the CAF.
Assume that the target emits an electromagnetic wave time-domain signal as follows: Then, the model of the primary star and secondary star sensors receiving signals can be expressed as where α is the attenuation coefficient, θ is the initial phase, D represents the time delay between two received signals, f D represents the Doppler frequency difference of the two received signals, and w 1 (t) and w 2 (t) are stationary, zero mean, independent Gaussian white noise.The cross-ambiguity function is defined as where '*' represents the complex conjugate of the signal, x(t) represents the signal received by the primary star, and y(t − τ) represents the signal received by the secondary star after a pre-delay of τ.
In practical applications, the sampling signal is discrete, and the corresponding discrete CAF is where N is the number of sampling points.For any time delay τ, the cross-ambiguity function can be regarded as the conjugate multiplication of x * (nT s ) and y(nT s − d∆τ) to obtain the mixing product r N (n; d), and then the N-point discrete Fourier transform on the mixed product is performed.
When the surface is formed by the CAF peaks, the corresponding peak points τ and f are the estimated values of TDOA and FDOA.
where D and f D represent TDOA and FDOA, respectively.

Fast Calculation for CAF
In actual systems, the duration of signals is usually quite long; using FFT directly to compute the cross-ambiguity function is therefore time-consuming and may not meet realtime requirements [28].In addition, the frequency interval of the cross-ambiguity function in Equation ( 7) is [− f s /2, f s /2], but the Doppler frequency shift caused by the relative motion between the source and the receiver is much smaller than the sampling frequency f s , leading to unnecessary computational effort.Considering these factors, we present an improvement on the pre-weighted Zoom-FFT method for calculating the cross-ambiguity function.This approach retains the advantages of the Zoom-FFT method, reduces the signal length and limits the signal's spectral range to the vicinity of its actual frequency difference [29][30][31][32], thus reducing the number of FFT points from N to M for each time delay.
The implementation of the Zoom-FFT method can be summarized as follows.Let the N-point mixing product pass a low-pass filter and then be down-sampled by a factor of D, which is enough to preserve useful component.We will obtain an M-point series, where M = N/D.Perform M-point FFT, with which we can obtain the CAF.The pre-weighted Zoom-FFT algorithm builds on the Zoom-FFT by dividing the filtering process into two steps: weighting and summation.Weighing is processed first because it is independent of the ambiguity function's time delay parameter.As a result, complex multiplication is reduced.The PW-ZFFT method is summarized as follows.Initially, two signals are divided into M segments, with each segment consisting of N h points.This results in x (m) (lT s ) and y (m) (lT s ), m = 1, 2, . . ., M and l = 1, 2, . . ., N h , where In the second step, weigh x (m) (lT s ) with h (l), as follows: h (l) represents the reverse-order coefficients of the filter.The third step involves performing mixed product operations.In this step, we improve the PW-ZFFT algorithm by converting the time-domain convolution into frequency-domain multiplication.Specifically, x ((mD + l)T S ) and y((mD + l)T s ) are first transformed into the frequency domain, the two signals are multiplied in the frequency domain, and are finally transformed back into the time domain.A total of 2M FFT operations, M complex multiplication operations, and M IFFT operations need to be performed.Multiplying in the frequency domain is more efficient than convolving in the time domain.When the length of two signals is N, the calculation of the mixing product using frequency-domain multiplication can be completed in 2 × O(Nlog(N)) + O(log(N)) time.In contrast, the computational complexity of timedomain convolution is O N 2 .For long signals, the frequency-domain approach is more advantageous, which is especially important for real-time signal processing.The specific steps involve performing FFT on two signals, multiplying them, and then using IFFT to transform them back to the time domain as follows: Finally, perform an M-point discrete Fourier transform on r M (n; d) to obtain the CAF.The frequency interval is Figure 1 shows the number of the computational complexity of ZFFT, pre-weighted ZFFT, and frequency-domain method for pre-weighted ZFFT.We use the number of complex multiplications (NCM) as a figure of merit.
Remote Sens. 2024, 16, x FOR PEER REVIEW 6 of 19 ℎ′() represents the reverse-order coefficients of the filter.The third step involves performing mixed product operations.In this step, we improve the PW-ZFFT algorithm by converting the time-domain convolution into frequency-domain multiplication.Specifically,  ( + ) and  ( + ) are first transformed into the frequency domain, the two signals are multiplied in the frequency domain, and are finally transformed back into the time domain.A total of 2M FFT operations, M complex multiplication operations, and M IFFT operations need to be performed.Multiplying in the frequency domain is more efficient than convolving in the time domain.When the length of two signals is , the calculation of the mixing product using frequency-domain multiplication can be completed in 2 × (log()) + (log()) time.In contrast, the computational complexity of time-domain convolution is ( ) .For long signals, the frequency-domain approach is more advantageous, which is especially important for real-time signal processing.The specific steps involve performing FFT on two signals, multiplying them, and then using IFFT to transform them back to the time domain as follows: Finally, perform an  -point discrete Fourier transform on  (; ) to obtain the CAF.The frequency interval is [− /2,  /2].
Figure 1 shows the number of the computational complexity of ZFFT, pre-weighted ZFFT, and frequency-domain method for pre-weighted ZFFT.We use the number of complex multiplications (NCM) as a figure of merit.

Quadratic Surface Fitting
In Section 3.2, we perform a peak search on the cross-ambiguity function to obtain estimates of time delay and frequency shift.Due to the continuous variation of the timefrequency differences between signals, this method can only provide coarse estimates of

Quadratic Surface Fitting
In Section 3.2, we perform a peak search on the cross-ambiguity function to obtain estimates of time delay and frequency shift.Due to the continuous variation of the timefrequency differences between signals, this method can only provide coarse estimates of TDOA and FDOA [33][34][35].Therefore, the quadratic surface fitting method is used to improve the accuracy of the time-frequency difference estimation.Assume that the coarse estimated value coordinates of time delay and frequency shift are The specific steps of quadratic surface fitting are as follows.First, establish a quadratic surface equation as where c 1 , c 2 , . . ., c 6 are the coefficients to be determined of the fitting function.The eight points around the peak value are recorded to determine these coefficients.By substituting the coordinates of the peak value and the surrounding eight points into the surface equation, the least squares solution for the coefficients c 1 , c 2 , . . ., c 6 can be obtained.By calculating the derivative of Equation ( 16), we can obtain that the peak coordinates satisfy the relationship, as follows: After solving the equation, we can obtain the offset of the precise estimation result of the time-frequency difference compared with the original peak value as follows: Thus, the precise TDOA and FDOA are "

Joint Time-Frequency Difference Localization Method
Based on TDOA, the iso-time-difference hyperbolic surface (a spatial hyperbolic surface with two receiving stations as foci) is determined; based on FDOA, the iso-frequencydifference surface (an ellipsoidal surface with two receiving stations as foci) is determined.Given the earth's surface equation, calculate its intersection with the iso-time-difference hyperbolic surface and the iso-frequency-difference surface.After removing one of the mirror ambiguity points, the remaining point is the location of the source.

Time-Frequency Difference Localization Model
In the geodetic coordinate system, assume that the primary star coordinate vector is s 1 = x 1 , y 1 , z 1 ] T , and its radial velocity is v 1 = v x1 , x y1 , v z1 ] T , and assume that the secondary star coordinate vector is s 2 = x 2 , y 2 , z 2 ] T , and its radial velocity is The coordinate vector of the target source is u = x, y, z] T ; thus, the distance between the two receiving stations and the source is Two receiving stations receive electromagnetic waves from a ground target source.Let the TDOA of the electromagnetic waves at the two receiving stations be ∆t, and the FDOA of the received signals be ∆ f D .The electromagnetic wave propagation speed is known to be c = 3 × 10 8 m/s.The distance difference between the two signals, represented by the TDOA, can be expressed as Formula ( 25) is the TDOA measurement equation.By differentiating it, we can obtain where ∆ . r represents the time difference change rate.If the frequency of the target source signal is f 0 , ∆ .r can be measured by the frequency difference ∆ f D , and the relationship between them satisfies the following equation: By substituting Equation ( 21) into Equation ( 20), the FDOA measurement equation can be obtained as where λ = c/ f 0 is the wavelength of the target source signal.Given the Earth's radius R, the coordinates of the target source located on the Earth's surface satisfy the following equation: By combining these three equations, a nonlinear set of equations is formed, as follows: Solving this system of equations allows for the precise localization of the target radiation source.By substituting the coordinates of the two receiving stations and the radiation source into this system, we obtain

Newton's Method
In this section, we will utilize Newton's method to solve the roots of nonlinear equations, i.e., the exact coordinates of the target source.The theoretical basis of Newton's method is to use the Taylor series expansion to approximate the roots of a function and iteratively approach the solution through linear approximation.During the iteration process, the coordinates are updated in the direction of the function's gradient to find a more precise root [36].This method is usually very effective when the derivative of the function is easy to calculate, converging rapidly to a solution near the root.The advantage of Newton's method over analytical methods lies in its more precise positioning results [37].
The accuracy of the positioning of Newton's method depends on the precision of the initial estimate.With a relatively accurate initial value, it can successfully converge through many iterations.Compared to analytical methods, Newton's method can achieve superior results.The positioning accuracy of Newton's iteration is within 10 KM, which is relatively similar to the constrained weighted least squares (CWLS) algorithm, and when the signal-to-noise ratio is above 15 dB, the positioning accuracy of Newton's iteration can reach 1 KM.
For the nonlinear equation F(X) = 0, suppose X k is the approximate solution, and the function F(X) is differentiable near X k .Then, F(X) at X k can be approximated as a linear equation, as follows: This equation has a unique solution, denoted as X k+1 , and Newton's iteration formula is According to Section 3.4.1, the TDOA/FDOA positioning equations can be expressed as Let X = [x, y, z] T ,F(X) = F 1 (x, y, z), F 2 (x, y, z), F 3 (x, y, z)] T .Then, the coefficient matrix of the differentiated nonlinear equation system F(X) can be expressed as the positioning accuracy), stop iterating.The last iteration result is the positioning result.To prevent the iteration from not stopping due to insufficient accuracy, set the maximum number of iterations as epoch.The coordinate point X k+1 at the stop of iteration is taken as the solution to the time-frequency difference equation group, i.e., the estimated coordinates of the target source.

Hardware Platform Design and Algorithm Implementation
This section introduces the specific engineering implementation process of the joint time-frequency difference positioning algorithm.Section 4.1 will build a signal processing platform and describe its specific structure.Section 4.2 elaborates on the implementation method of the joint time-frequency difference positioning algorithm, including how various modules of the algorithm map to the hardware.In Section 4.3, we present methods to improve the algorithm's execution efficiency on the DSP platform, which enhance the real-time performance of the time-frequency difference joint positioning system.

Signal Processing Platform Design
In this section, we build a signal processing platform suitable for the joint timefrequency difference positioning algorithm and complete its hardware implementation.The signal processing platform is constructed following the 6U-CPCI architecture standard.The board is equipped with a CPCI bus interface to establish connections with other functional sub-cards.Simultaneously, the on-board extensive resource FPGA, 8-core DSP, and high-performance SOC enable the transmission of processed results to the host computer via the Ethernet.The processed results are subsequently conveyed through three optical fiber connectors and network ports.The board offers three optical ports, one Gigabit Ethernet port, and two hundred Gigabit Ethernet ports.Given the complexity and parallel nature of the time-frequency difference positioning algorithm, the signal processing platform utilizes Xilinx-Virtex7 series high-performance FPGA and a multi-core DSP developed by the National University of Defense Technology, forming a heterogeneous processing architecture.FPGA is programmed in hardware language, suitable for high line rate transmission, block diagram programming, processing fixed, or repetitive tasks.DSP uses the C language, assembly language programming, floating-point operation ability, and a short development cycle, suitable for dealing with complex multi-thread algorithm tasks.TI provides a wealth of library functions, and provides corresponding function support for each peripheral interface, which is convenient for users to develop.The two high-performance DSPs can perform large-scale, high-precision digital signal processing and are equipped with gigabit Ethernet ports for data exchange with the host computer.The FPGA powerful, flexibly configurable programmable resources for implementing input/output interfaces, general digital logic, memory, and digital signal processing, which are responsible for unpacking instructions and data preprocessing.Each DSP signal processing node is equipped with four DDR3 SDRAM chips, forming a 64-bit width and 2 GB of storage space.The FPGA signal processing node is equipped with two DDR3 SDRAM chips, forming a group with 1GB capacity and 32-bit width storage space, which are used for storing a large amount of intermediate data in the algorithm.Four high-speed 4×SRIO interfaces are mounted between every two processing nodes, enabling high-speed data transmission between the FPGA and DSP, and between the two DSPs.The signal processing platform interacts with the host computer through an Ethernet interface for data exchange, establishing a TCP connection, through which the host computer sends and uploads instructions and data, as well as displays graphics.The picture of the digital signal processing platform is shown in Figure 2.
high-performance SOC enable the transmission of processed results to the host computer via the Ethernet.The processed results are subsequently conveyed through three optical fiber connectors and network ports.The board offers three optical ports, one Gigabit Ethernet port, and two hundred Gigabit Ethernet ports.Given the complexity and parallel nature of the time-frequency difference positioning algorithm, the signal processing platform utilizes Xilinx-Virtex7 series high-performance FPGA and a multi-core DSP developed by the National University of Defense Technology, forming a heterogeneous processing architecture.FPGA is programmed in hardware language, suitable for high line rate transmission, block diagram programming, processing fixed, or repetitive tasks.DSP uses the C language, assembly language programming, floating-point operation ability, and a short development cycle, suitable for dealing with complex multi-thread algorithm tasks.TI provides a wealth of library functions, and provides corresponding function support for each peripheral interface, which is convenient for users to develop.The two highperformance DSPs can perform large-scale, high-precision digital signal processing and are equipped with gigabit Ethernet ports for data exchange with the host computer.The FPGA integrates powerful, flexibly configurable programmable resources for implementing input/output interfaces, general digital logic, memory, and digital signal processing, which are responsible for unpacking instructions and data preprocessing.Each DSP signal processing node is equipped with four DDR3 SDRAM chips, forming a 64-bit width and 2 GB of storage space.The FPGA signal processing node is equipped with two DDR3 SDRAM chips, forming a group with 1GB capacity and 32-bit width storage space, which are used for storing a large amount of intermediate data in the algorithm.Four high-speed 4×SRIO interfaces are mounted between every two processing nodes, enabling high-speed data transmission between the FPGA and DSP, and between the two DSPs.The signal processing platform interacts with the host computer through an Ethernet interface for data exchange, establishing a TCP connection, through which the host computer sends and uploads instructions and data, as well as displays graphics.The picture of the digital signal processing platform is shown in Figure 2.

Algorithm Implementation
Based on the algorithm design discussed in Section 3, assume that the minimum and maximum time differences between two receiving data channels are t min and t max , respectively.If t min to t max is divided into H groups, then for any delay τ, the CAF can be considered as the result of processing the two received data streams s 1 (t) and s 2 (t), as shown in Figure 3.After obtaining the CAF, a quadratic surface fitting is performed to accurately estimate the time-frequency difference.Combining this with the parameters of the positioning scenario, the position information of the target source is obtained.

Algorithm Implementation
Based on the algorithm design discussed in Section 3, assume that the minimum and maximum time differences between two receiving data channels are  and  , respectively.If  to  is divided into  groups, then for any delay , the CAF can be considered as the result of processing the two received data streams  () and  (), as shown in Figure 3.After obtaining the CAF, a quadratic surface fitting is performed to accurately estimate the time-frequency difference.Combining this with the parameters of the positioning scenario, the position information of the target source is obtained.For the joint time-frequency difference positioning algorithm, the algorithm implementation tasks are divided according to the processing advantages of FPGA and DSP.FPGA is suitable for high data rate, parallel operations, and fixed processing tasks in development scenarios, and is therefore used to implement parallel data collection, preprocessing, and mixing product computation in the time-frequency difference positioning algorithm.DSP is suitable for lower data bandwidth, multi-condition operations, complex algorithm tasks, and high-precision digital signal processing scenarios.As a result, it is used to complete the computation of the cross-ambiguity function, peak search, quadratic surface fitting, and the subsequent positioning solution process.Additionally, a dual-DSP architecture is adopted to further divide the computation process to meet the algorithm's requirements for DSP processing capabilities.The data interaction between each processing node is carried out using a high-speed bus SRIO to achieve a high-speed transmission of data and instructions.
Figure 4 shows the specific implementation flow of the joint time-frequency difference positioning method in FPGA and DSP.FPGA receives two parallel data streams for For the joint time-frequency difference positioning algorithm, the algorithm implementation tasks are divided according to the processing advantages of FPGA and DSP.FPGA is suitable for high data rate, parallel operations, and fixed processing tasks in development scenarios, and is therefore used to implement parallel data collection, preprocessing, and mixing product computation in the time-frequency difference positioning algorithm.DSP is suitable for lower data bandwidth, multi-condition operations, complex algorithm tasks, and high-precision digital signal processing scenarios.As a result, it is used to complete the computation of the cross-ambiguity function, peak search, quadratic surface fitting, and the subsequent positioning solution process.Additionally, a dual-DSP architecture is adopted to further divide the computation process to meet the algorithm's requirements for DSP processing capabilities.The data interaction between each processing node is carried out using a high-speed bus SRIO to achieve a high-speed transmission of data and instructions.
Figure 4 shows the specific implementation flow of the joint time-frequency difference positioning method in FPGA and DSP.FPGA receives two parallel data streams for processing and transmits the mixed product results to DSP1 via SRIO.After signal processing using the DSP1 node, the time-frequency difference estimation results are transmitted to DSP2 to carry out a positioning solution, thus obtaining the target source's location.The processing results are then transmitted back to the host computer via the Ethernet and are displayed.
processing and transmits the mixed product results to DSP1 via SRIO.After signal processing using the DSP1 node, the time-frequency difference estimation results are transmitted to DSP2 to carry out a positioning solution, thus obtaining the target source's location.The processing results are then transmitted back to the host computer via the Ethernet and are displayed.In dual-path data parallel acquisition, data need to be delayed according to the time difference search range.Pre-delay is applied to  () in parallel data for  segments based on prior information, while the starting position of  () remains unchanged.The relative offset of  () to  () for each segment is given by  +  2 , ( = 2,3,4, . . ., ) The pre-delayed  () and  () are stored in DDR, and the pre-delay instructions and the subsequent compensatory instructions for segmentation are stored in BRAM.
Since  () is the delayed input, the excess part at the end of  () is truncated.Take the overlapping parts of  () and  () for subsequent data preprocessing, and write zeros before each piece of data  () to align it with  () to facilitate the subsequent division of  segments.Assuming a clock period of  , we calculate the number of zeros written in the  segment of  () as follows: Data are read subsequently by controlling the FIFO read and write commands.Two receiving FIFOs acquire  () and  () separately.The write clock and read clock of the two receiving FIFOs operate in different clock domains.In the process of selecting the FIFO read and write clocks, the write clock frequency is much lower than the read clock frequency, ensuring that the FIFO does not overflow in the case of discontinuous data reading.The valid signal identifies the effective length of each data segment.The valid signal of the  () serves as the write enable for both FIFOs.When the rising edge of the  () valid signal is detected, the write enable for both FIFOs is asserted, allowing the FIFO of the  () and  () to perform writing operations.At the same time, this valid signal serves as the flag for the counter to start counting.When the  () valid signal is high and the  () valid signal is low, zeros are written into the  ()'s FIFO.When the rising edge of the  () valid signal is detected, the counting ends, and the data written In dual-path data parallel acquisition, data need to be delayed according to the time difference search range.Pre-delay is applied to s 2 (t) in parallel data for H segments based on prior information, while the starting position of s 1 (t) remains unchanged.The relative offset of s 2 (t) to s 1 (t) for each segment is given by The pre-delayed s 1 (t) and s 2 (t) are stored in DDR, and the pre-delay instructions and the subsequent compensatory instructions for segmentation are stored in BRAM.
Since s 2 (t) is the delayed input, the excess part at the end of s 2 (t) is truncated.Take the overlapping parts of s 1 (t) and s 2 (t) for subsequent data preprocessing, and write zeros before each piece of data s 2 (t) to align it with s 1 (t) to facilitate the subsequent division of M segments.Assuming a clock period of t clk , we calculate the number of zeros written in the H segment of s 2 (t) as follows: Data are read subsequently by controlling the FIFO read and write commands.Two receiving FIFOs acquire s 1 (t) and s 2 (t) separately.The write clock and read clock of the two receiving FIFOs operate in different clock domains.In the process of selecting the FIFO read and write clocks, the write clock frequency is much lower than the read clock frequency, ensuring that the FIFO does not overflow in the case of discontinuous data reading.The valid signal identifies the effective length of each data segment.The valid signal of the s 1 (t) serves as the write enable for both FIFOs.When the rising edge of the s 1 (t) valid signal is detected, the write enable for both FIFOs is asserted, allowing the FIFO of the s 1 (t) and s 2 (t) to perform writing operations.At the same time, this valid signal serves as the flag for the counter to start counting.When the s 1 (t) valid signal is high and the s 2 (t) valid signal is low, zeros are written into the s 2 (t)'s FIFO.When the rising edge of the s 2 (t) valid signal is detected, the counting ends, and the data written into the s 2 (t)'s FIFO switch from zeros to the s 2 (t).This approach achieves truncation and zero padding of the two data paths.
Following the discussion in Section 3 on the fast calculation method for the CAF, we now divide the s 1 (t) and s 2 (t) into M segments, each with a length of D, and select N h points in each data segment.In the read clock domain of the FIFO, by properly designing the width of the read enable, fixed-length data are sequentially read from the receiving FIFO, achieving data segmentation.This process can be expressed as After calculating s 2 (t) are then subjected to FFT processing, complex multiplication processing, and IFFT processing in succession, requiring a total of 2M FFT processing, M complex multiplication processing, and M inverse FFT processing operations.
As shown in Figure 5, this process employs a pipelining approach to process the M segments of data.When the rising edge of the FIFO read command is detected, the data s    2 (t) are synchronized through the FIFO2 and multiplied, the read command of the FIFO1 is lowered for a period and then raised again.The calculation module detects the rising edge of the FIFO2 read command again, and continues to read the next segment of data.While the first segment of data undergoes subsequent IFFT processing, the second segment of data goes through FFT processing, and the processing of the two segments proceeds simultaneously without affecting each other.This process continues in sequence until all M segments of data are processed.This method significantly reduces the algorithm processing time in the FPGA, enhancing data processing efficiency.After processing, a mixing product matrix of size M * N h is obtained, as follows: now divide the  () and  () into  segments, each with a length of , and selec points in each data segment.In the read clock domain of the FIFO, by properly desig the width of the read enable, fixed-length data are sequentially read from the rece FIFO, achieving data segmentation.This process can be expressed as After calculating  ( ) () with weighted filtering,  ( ) () and  ( ) () are then jected to FFT processing, complex multiplication processing, and IFFT processing in cession, requiring a total of 2 FFT processing,  complex multiplication proces and  inverse FFT processing operations.As shown in Figure 5, this process employs a pipelining approach to process th segments of data.When the rising edge of the FIFO read command is detected, the  ( ) () and  ( ) () begin their respective multiplication and FFT calculation proc in sequence.When the FFT calculation results of  ( ) () and  ( ) () are synchron through the FIFO2 and multiplied, the read command of the FIFO1 is lowered for a p and then raised again.The calculation module detects the rising edge of the FIFO2 command again, and continues to read the next segment of data.While the first seg of data undergoes subsequent IFFT processing, the second segment of data goes thr FFT processing, and the processing of the two segments proceeds simultaneously wit affecting each other.This process continues in sequence until all  segments of dat processed.This method significantly reduces the algorithm processing time in the FP enhancing data processing efficiency.After processing, a mixing product matrix of  *  is obtained, as follows:

𝑟 (𝑚; 𝑑)
After completing the mixing product operation, the data are truncated accordi the algorithm's parameters.The processed data are packaged and transmitted to the of the DSP1 through the SRIO.After receiving the doorbell message from the SRIO DSP performs matrix transposition and rearrangement of the mixing product data t mitted pulse by pulse via the FPGA and converts the fixed-point data into single-prec floating-point data.To meet the high real-time requirements of the time-frequency difference positio algorithm, this paper employs multi-core parallel processing in the DSP.The 0-core o After completing the mixing product operation, the data are truncated according to the algorithm's parameters.The processed data are packaged and transmitted to the DDR of the DSP1 through the SRIO.After receiving the doorbell message from the SRIO, the DSP performs matrix transposition and rearrangement of the mixing product data transmitted pulse by pulse via the FPGA and converts the fixed-point data into single-precision floatingpoint data.

FIFO Write Enable
To meet the high real-time requirements of the time-frequency difference positioning algorithm, this paper employs multi-core parallel processing in the DSP.The 0-core of the DSP-M6678 serves as the "main core" responsible for system control and task allocation, while the remaining 7 cores act as computational units participating in calculation tasks.A "Fork-Join" working pattern is employed among the 8 cores.During the FFT processing of the mixing product matrix, a synchronous computation is used among the 8 cores.Data are evenly divided into 8 parts in the frequency-domain dimension, with each core determining its data processing address based on its core number.Since the core's access speed to L2 SRAM (Level 2 Static Random-Access Memory) is higher than that of DDR SDRAM, the data for each frequency-domain dimension is first moved to L2 SRAM using EDMA.As shown in Figure 6, the FFT calculation and absolute value computation are performed in L2 SRAM, and the results are stored in DDR via EDMA.
DSP-M6678 serves as the "main core" responsible for system control and task allocation, while the remaining 7 cores act as computational units participating in calculation tasks.A "Fork-Join" working pattern is employed among the 8 cores.During the FFT processing of the mixing product matrix, a synchronous computation is used among the 8 cores.Data are evenly divided into 8 parts in the frequency-domain dimension, with each core determining its data processing address based on its core number.Since the core's access speed to L2 SRAM (Level 2 Static Random-Access Memory) is higher than that of DDR SDRAM, the data for each frequency-domain dimension is first moved to L2 SRAM using EDMA.As shown in Figure 6, the FFT calculation and absolute value computation are performed in L2 SRAM, and the results are stored in DDR via EDMA.Perform quadratic surface fitting on the discrete CAF, and perform a peak search on the fitting results to obtain a coarse estimate of the time-frequency difference.First, we find the maximum value of each segment of the frequency difference data, resulting in  values.Then, we identify the peak value from these  values, corresponding to the precise estimates of TDOA and FDOA.Transmit TDOA and FDOA, as well as the coordinates of the receiving stations, to the DSP2 via 4×SRIO.According to the algorithm's design, the DSP2 calculates the location of the source.
We utilize the Semaphore mechanism provided by TI (Texas Instruments) for C66x CorePacs to implement a multi-core synchronization for shared resources.Semaphores are used to prevent conflicts in the access of shared resources among multiple cores in the data processing flow [38].Each core in the DSP can acquire semaphores resources in three ways: direct access, indirect access, and combined access.Cores read the SEM_DIRECT register to acquire semaphore values and check the semaphore states of each core.This allows it to determine whether each core's data processing task is completed.
During a multi-core synchronization operation, all 8 cores simultaneously call the CSL_semAcquireDirect() function to acquire semaphores.In the next synchronization step, all 8 cores simultaneously call the CSL_semReleaseSemaphore() function to release Perform FFT processing on the mixing product matrix to obtain the CAF.The specific implementation formula is Perform quadratic surface fitting on the discrete CAF, and perform a peak search on the fitting results to obtain a coarse estimate of the time-frequency difference.First, we find the maximum value of each segment of the frequency difference data, resulting in N values.Then, we identify the peak value from these N values, corresponding to the precise estimates of TDOA and FDOA.Transmit TDOA and FDOA, as well as the coordinates of the receiving stations, to the DSP2 via 4×SRIO.According to the algorithm's design, the DSP2 calculates the location of the source.
We utilize the Semaphore mechanism provided by TI (Texas Instruments) for C66x CorePacs to implement a multi-core synchronization for shared resources.Semaphores are used to prevent conflicts in the access of shared resources among multiple cores in the data processing flow [38].Each core in the DSP can acquire semaphores resources in three ways: direct access, indirect access, and combined access.Cores read the SEM_DIRECT register to acquire semaphore values and check the semaphore states of each core.This allows it to determine whether each core's data processing task is completed.
During a multi-core synchronization operation, all 8 cores simultaneously call the CSL_semAcquireDirect() function to acquire semaphores.In the next synchronization step, all 8 cores simultaneously call the CSL_semReleaseSemaphore() function to release the semaphores.Alternating semaphore acquisition and release achieves multi-core synchronization.

1.
EDMA transfer optimization: EDMA defines data transfers in three dimensions: ACNT, BCNT, and CCNT.In our program design, EDMA is employed for the continu-ous movement of mixing product matrices.We use the AB synchronous data transfer mode, with ACNT and BCNT playing crucial roles.As the size of ACNT determines the data block size for a single burst transfer in EDMA, we opt for a larger ACNT parameter to enhance the efficiency of EDMA data transfers.

2.
Inline function and loop unrolling: The "#pragma UNROLL(n)" directive is used to specify that a particular loop needs to be unrolled n times by the compiler.This allows multiple iterations of the loop body to be copied into the program, folding the loop and improving instruction-level parallelism in the code.In the calculation process of time-frequency difference positioning, operations such as fixed-point to floating-point conversion and spectrum shifting are optimized using inline functions and loop unrolling to reduce the overall execution time.

3.
Cache optimization: During the calculation of the CAF, data from each segment of frequency difference in the mixing product is sequentially moved into the L2 SRAM for FFT and modulus calculations.The DSP kernel needs to access the L2 SRAM multiple times, and the read and write performance of the L2 SRAM highly depends on L1D.Therefore, we configure the entire L1D as cache to improve the efficiency of data retrieval by the kernel.When the kernel accesses the L2 SRAM multiple times, instruction pipelining can help reduce additional overhead, and the cache coherence of the L2 SRAM is automatically maintained by hardware, eliminating the need for manual maintenance.

Time-Frequency Difference Estimation Results
To validate the accuracy of the time-frequency difference localization system's calculations, we conducted a simulation where two pseudo-random code-modulated BPSK signals were generated.Time delays and Doppler frequency shifts were introduced to BPSK1 and BPSK2.With a sampling rate of 200 kHz, we generated 10,000 data points for joint time-frequency difference estimation and computed the CAF for the two signals.The specific simulation parameters are shown in Table 1: MATLAB is used for algorithm simulation analysis.The TDOA estimation is 0.0002 s, and the FDOA estimation is −0.00009722598Hz.The simulation results of the CAF are shown in Figure 7: In the figure, the x-axis and y-axis, respectively, represent the time difference correction value and the frequency difference correction value.We validate the time-frequency difference localization algorithm on the signal processing platform.The time-frequency difference estimates calculated by the signal processing platform are compared with the simulated results, as shown in Table 2.In the figure, the x-axis and y-axis, respectively, represent the time difference correction value and the frequency difference correction value.We validate the time-frequency difference localization algorithm on the signal processing platform.The time-frequency difference estimates calculated by the signal processing platform are compared with the simulated results, as shown in Table 2.The difference between hardware results and simulation results mainly comes from the following reasons: 4. Data format: MATLAB's data format comprises double-precision floating-point numbers, while hardware (DSP and FPGA) uses fixed-point numbers and single-precision floating-point numbers. 5. Programming language characteristics: The underlying logic of the programming language used by DSP and FPGA is different from that used by a MATLAB simulation, which also affects their different rounding and truncation strategies when processing numerical values, resulting in cumulative errors.
The comparative results indicate that the accuracy of FDOA is between 10 −7 and 10 −8 , and the accuracy of TDOA can be maintained between 0.1 us and 0.01 us.The positioning results are in good agreement with the simulated calculations, which satisfies the precision requirements for practical applications, thus validating the correctness and effectiveness of the joint time-frequency difference localization processing platform presented in this paper.

Real-Time Performance Analysis
Since core 0 arranges the tasks of other cores, statistical time performance is based on core 0. The time measurement tools used comprise the timestamp registers (TSCL, TSCH) from the TI library, and the DSP operates with a clock frequency of 1.0 GHz.Table 3 provides the execution time statistics for each module before and after DSP program optimization.The difference between hardware results and simulation results mainly comes from the following reasons: 1.
Data format: MATLAB's data format comprises double-precision floating-point numbers, while hardware (DSP and FPGA) uses fixed-point numbers and single-precision floating-point numbers.

2.
Programming language characteristics: The underlying logic of the programming language used by DSP and FPGA is different from that used by a MATLAB simulation, which also affects their different rounding and truncation strategies when processing numerical values, resulting in cumulative errors.
The comparative results indicate that the accuracy of FDOA is between 10 −7 and 10 −8 , and the accuracy of TDOA can be maintained between 0.1 us and 0.01 us.The positioning results are in good agreement with the simulated calculations, which satisfies the precision requirements for practical applications, thus validating the correctness and effectiveness of the joint time-frequency difference localization processing platform presented in this paper.

Real-Time Performance Analysis
Since core 0 arranges the tasks of other cores, statistical time performance is based on core 0. The time measurement tools used comprise the timestamp registers (TSCL, TSCH) from the TI library, and the DSP operates with a clock frequency of 1.0 GHz.Table 3 provides the execution time statistics for each module before and after DSP program optimization.

Conclusions
This paper investigates issues related to passive localization methods and hardware platform implementation.First, we elaborate on the signal processing flow of timefrequency difference localization and enhance the PW-ZFFT method by using a frequencydomain approach to calculate the mixing product, thereby reducing the computation load of the CAF.Then, we construct a heterogeneous signal processing platform based on FPGA and DSP, and determine the engineering task division and implementation of the algorithm flow based on the signal processing capabilities and advantages of FPGA and multi-core According to the different sources of observation signals, target detection technology is categorized into active detection technology, which involves the active emission of signals, and passive detection technology, which relies on the passive capture of signals emitted by the target source.Passive detection technology is commonly referred to as an external radiation source radar.The satellite platform detects and processes signals from non-cooperative radiation sources, extracting the location parameters of the radiation source from the received signals.Compared with an active radar, a passive radar exhibits enhanced concealment and safety features [1-3].Passive detection technology has continuously evolved with respect to real-time performance, accuracy, and detection range.The realm of passive detection has given rise to numerous positioning systems and technology types being developed, each offering complementary advantages.One of Remote Sens. 2024, 16, 693 2 of 18

Figure 1 .
Figure 1.Comparison of computational complexity of different methods.

Figure 1 .
Figure 1.Comparison of computational complexity of different methods.

Figure 2 .
Figure 2. Picture of the signal processing platform.

Figure 2 .
Figure 2. Picture of the signal processing platform.

Figure 3 .
Figure 3. Calculation flow of CAF for  with arbitrary delay.

Figure 3 .
Figure 3. Calculation flow of CAF for τ with arbitrary delay.
with weighted filtering, s ) begin their respective multiplication and FFT calculation processes in sequence.When the FFT calculation results of s

Figure 5 .
Figure 5. Pipeline processing flow of  segment data.

Figure 5 .
Figure 5. Pipeline processing flow of M segment data.

Table 1 .
Simulation parameters for typical scenarios in TDOA/FDOA localization.

Table 2 .
Time-frequency difference positioning results.

Table 2 .
Time-frequency difference positioning results.

Table 3 .
Execution time of each calculation module.