Acoustic Source Tracking Based on Probabilistic Data Association and Distributed Cubature Kalman Filtering in Acoustic Sensor Networks

This paper proposes a probabilistic data association-based distributed cubature Kalman filter (PDA-DCKF) method and verifies its performance in tracking a single moving sound source in a distributed acoustic sensor network. In this method, the PDA algorithm is first used to sift the observations from neighboring nodes. Then, the sifted observations are fused to update the state vectors in the CKF. Since nodes in a sensor network have different reliabilities, the final tracking result integrates the estimates from the local nodes, weighted with parameters that depend on the mean square error of the estimate and the energy of the received signal. The experimental results show that the proposed PDA-DCKF method outperforms the other DCKF methods in tracking sound sources, even under severe noise and reverberation.


Introduction
Acoustic source localization and tracking has long been an active research topic in speech processing, with applications such as audio and video conferencing systems, human-computer interaction, and speech enhancement [1][2][3][4]. Traditional acoustic localization and tracking methods usually require the microphone array to have a regular geometric structure and generally rely on centralized data processing [5]. As technology has advanced, traditional microphone arrays have gradually shown some deficiencies. The distributed microphone network has attracted increasing research attention because it places no strict restrictions on the arrangement of microphones: it consists of multiple nodes arbitrarily distributed in space, each of which usually contains a set of microphones [6][7][8][9][10].
Many studies have addressed acoustic source localization using distributed microphone networks [11]. However, they locate the acoustic source using only the current observations of multiple microphones, which works when background noise and reverberation are low. In noisy and reverberant environments, spurious observations may even mask observations from real acoustic sources, degrading localization performance. To avoid this problem, a Bayesian filter [12] combines the current observation with a series of past observations to estimate the current position, which is more effective in dealing with the adverse effects of noise and reverberation. Theoretically, Bayesian filters describe the tracking problem with a state-space model that consists of a dynamic model describing the motion of the target and an observation model describing the relationship between the observations and the state of the acoustic source. When the state-space model is linear and Gaussian, the Bayesian filter reduces to the Kalman filter. However, in acoustic source tracking scenarios, the observation function is usually nonlinear, so some conditions and properties applicable to linear systems no longer hold and the performance of the Kalman filter may be severely degraded. Moreover, centralized processing is generally unreliable, as any failure of the central processor disables tracking for the entire network.
To overcome the unreliability of centralized methods, many distributed methods have been developed for sound source tracking. No central processor is needed in a distributed method, and all nodes estimate the global state only by exchanging data with their neighbors. In reference [18], a distributed extended Kalman particle filter (DEKPF) for speaker tracking was developed, which incorporated the current TDOA observations into an EKF to generate the proposal for the particle filter. In reference [19], a distributed particle filter (DPF) was proposed, which applied the modified iterative covariance intersection (MICI) algorithm and the interacting multiple model (IMM) to speaker tracking in distributed microphone networks. In reference [20], a distributed iterative EKF was proposed to estimate the time-varying speaker position in the microphone array. In reference [21], a distributed unscented Kalman filter (DUKF) was proposed to overcome the nonlinearity of the measurement model in speaker tracking. The time difference of arrival (TDOA) was used as the observation, and the distributed IMM-UKF was then used to track the location of the sound source.
Sensors 2022, 22, 7160
In real environments, noise and reverberation usually produce unreliable observations with false peaks, which may lead to serious performance degradation. In the methods above, the current observation is usually extracted only from the largest peak of a certain observation function. In adverse cases, the peak related to the real acoustic source may be masked by spurious sources. Therefore, it is more reasonable to extract multiple observations from the observation function, rather than a single one, and then incorporate them into the above tracking schemes. Probabilistic data association (PDA) [22] is an effective method for combining multiple observations into the Kalman filter state update and has been shown to be suitable for target tracking in cluttered environments. In reference [23], an improved distributed unscented Kalman particle filter (DUKPF) was proposed to track a single moving acoustic source using a distributed microphone network in noisy and reverberant environments. This method extracted multiple observations from the observation function of each node and incorporated them into the state update of the UKF through PDA, yielding a PDA-UKF that was then embedded in a particle filter. In reference [24], a distributed multi-speaker tracking method for microphone array networks based on the unscented particle filter and data association was proposed, in which the available observations were associated with each speaker at each node using data association. Reference [25] proposed a cubature information filter based on joint probabilistic data association (JPDA) for multiple acoustic source tracking with a distributed acoustic vector sensor (AVS) array, in which JPDA was used to handle the association between observations and targets. Issues related to multi-source tracking are beyond the scope of this article.
However, most particle filter-based methods incur excessive computational cost, which limits their use in real-time applications. In addition, in existing speaker tracking methods, the PDA algorithm is applied to sift the observations without considering the information from neighboring nodes.
This paper combines probabilistic data association with cubature Kalman filtering and applies the combination to single acoustic source tracking in noisy and reverberant environments with distributed acoustic sensor networks. The contributions of this paper are as follows:

• Combining the cubature Kalman filter (CKF) with PDA, the probabilistic data association-cubature Kalman filter (PDA-CKF) was developed. In PDA-CKF, multiple possible observations were merged into the state update of CKF by the PDA technique.

• In this paper, PDA-CKF was applied to the distributed acoustic sensor network, and the probabilistic data association-distributed cubature Kalman filter (PDA-DCKF) was developed by combining the observation information of each node's neighbor nodes in the network.

• Considering the reliability of the local state, it was proposed to combine the mean square error (MSE) of the position estimation of each node and the received signal energy to adjust the weighting coefficient of distributed acoustic sensor data fusion. In this way, the local state of high-quality nodes is enhanced, and each node can achieve global consistency and good speaker tracking performance.
The structure of this paper is as follows. Section 2 presents the problem formulation, background knowledge, and some prior knowledge of acoustic source tracking. Section 3 first introduces the single-node PDA-CKF and then details the distributed PDA-DCKF. Section 4 presents the experimental results and discussion. Section 5 summarizes some conclusions.

Problem Formulation
Consider a distributed sensor network with $N$ nodes deployed as shown in Figure 1. The positions of the nodes can be obtained in advance by calibration [26]. Each node consists of two microphones separated by a distance $L$. All nodes of the network are modeled as vertices of the graph $G_1 = (\varepsilon, \upsilon)$, where $\upsilon = \{1, 2, \ldots, N\}$ is the vertex set and $\varepsilon \subset \{(p, q) \mid p, q \in \upsilon\}$ is the edge set. An edge $(p, q) \in \varepsilon$ represents the network's communication constraints, i.e., node $p$ can send information to node $q$, and vice versa. Let $N_{p,k} = \{q \in \upsilon \mid (p, q) \in \varepsilon\} \cup \{p\}$ denote the set of neighbors of node $p$ at time $k$, so that each node is also a neighbor of itself.
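As an illustration of the neighbor sets defined above, the following minimal sketch builds them from node coordinates. It assumes that two nodes share an edge when their distance is within a communication radius; the node coordinates and the function name `neighbor_sets` are hypothetical and do not reproduce the layout of Figure 1.

```python
import math

def neighbor_sets(positions, radius):
    """Return {p: [q, ...]} with every node q within `radius` of p, plus p itself."""
    nodes = sorted(positions)
    return {
        p: [q for q in nodes
            if q == p or math.dist(positions[p], positions[q]) <= radius]
        for p in nodes
    }

# Hypothetical node coordinates in metres (not the layout of Figure 1).
positions = {1: (1.0, 1.0), 2: (2.5, 1.0), 3: (5.5, 1.0), 4: (2.5, 2.5)}
neighbors = neighbor_sets(positions, radius=2.5)
```

Because the distance test is symmetric, the resulting graph is undirected, which matches the statement that node p can send information to node q and vice versa.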

Signal Model and TDOA Estimation
In acoustic sensor networks, the discrete-time signal acquired by the $l$-th microphone ($l = 1, 2$) of node $p$ can be modeled as [23]

$$y_{p,l}(t) = h_{p,l}(t) * y(t) + e_{p,l}(t), \quad \forall p \in \upsilon \tag{1}$$

where $t$ is the discrete-time index, $h_{p,l}(t)$ is the room impulse response (RIR) between the microphone and the acoustic source, $*$ denotes the convolution operator, $y(t)$ is the source signal, and $e_{p,l}(t)$ is the additive noise.
Traditionally, the generalized cross-correlation (GCC) function [27] is used for TDOA estimation. Let $Y_1(k)$ and $Y_2(k)$ be the acoustic signals received by a microphone pair at time $k$, and let $Y_l(f) = \mathrm{FFT}\{Y_l(k)\}$, $l = 1, 2$, be the frequency-domain representation of the corresponding signal in a time frame. The GCC function $R_{12}(\tau)$ of the microphone pair is computed from $Y_1(f)$ and $Y_2^*(f)$, where $Y_1(f)$ and $Y_2(f)$ represent the frequency-domain microphone signals at the node and the superscript $*$ denotes complex conjugation. The delay estimate is then the lag that maximizes the GCC function [27]:

$$\hat{\tau} = \arg\max_{\tau \in [-\tau_{\max},\, \tau_{\max}]} R_{12}(\tau) \tag{3}$$

where $\tau_{\max}$ is the largest possible time delay. However, in a real indoor environment, reverberation and noise introduce false maxima of $R_{12}(\tau)$ and lead to invalid TDOA estimates. To address this problem, the delays of the $Q$ largest local peaks of $R_{12}(\tau)$ are taken as candidate TDOA measurements of node $p$ at time $k$. In this paper, multiple TDOA observations were extracted through a two-step selection process, taking node $p$ as an example [23].
(1) Select $Q$ delays according to the peak amplitude of the GCC, i.e.,

$$\left\{\tau^{(1)}_{p,k}, \tau^{(2)}_{p,k}, \ldots, \tau^{(Q)}_{p,k}\right\} \tag{4}$$

where $\tau^{(i)}_{p,k}$ is the delay of node $p$ related to the $i$-th largest peak of $R_{12}(\tau)$ at time $k$.
(2) Further, select $m_{p,k}$ observations from (4) as local observations; the selection rules are given in Section 3.
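The first selection step can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain time-domain cross-correlation without any frequency weighting as a stand-in for the GCC, and the synthetic signals, the echo strength, and the function names `cross_correlation` and `candidate_delays` are hypothetical.

```python
import random

def cross_correlation(y1, y2, max_lag):
    """Return {lag: sum_t y1[t] * y2[t - lag]} for lag in [-max_lag, max_lag]."""
    n = len(y1)
    return {
        lag: sum(y1[t] * y2[t - lag] for t in range(max(0, lag), min(n, n + lag)))
        for lag in range(-max_lag, max_lag + 1)
    }

def candidate_delays(r, q):
    """Lags of the q highest local maxima of r, highest peak first."""
    lags = sorted(r)
    peaks = [lags[i] for i in range(1, len(lags) - 1)
             if r[lags[i]] > r[lags[i - 1]] and r[lags[i]] >= r[lags[i + 1]]]
    return sorted(peaks, key=lambda lag: r[lag], reverse=True)[:q]

# Synthetic signals (an assumption, not the paper's data): microphone 1 hears the
# source 3 samples later than microphone 2, plus a weaker echo at 7 samples.
rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(400)]
y2 = s
y1 = [(s[t - 3] if t >= 3 else 0.0) + 0.5 * (s[t - 7] if t >= 7 else 0.0)
      for t in range(len(s))]

fs = 16000                        # sampling rate used in the simulations (Hz)
max_lag = int(0.5 / 342.0 * fs)   # microphone spacing / sound speed, in samples
r12 = cross_correlation(y1, y2, max_lag)
delays_in_samples = candidate_delays(r12, q=8)
delays_in_seconds = [lag / fs for lag in delays_in_samples]
```

The direct-path lag gives the highest peak and the echo the second highest; the remaining candidates are spurious peaks of the kind that the second selection step is meant to reject.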

Dynamic Model of Acoustic Source
Without loss of generality, two-dimensional tracking is considered herein, since the height of a moving acoustic source usually does not change significantly. The speaker moves in a room equipped with a distributed acoustic sensor network, and the Langevin model [24] describes the time-varying position of the speaker simply and accurately. At time $k$, the state of the speaker is defined as $\mathbf{x}_k = [x_k, y_k, \dot{x}_k, \dot{y}_k]^T$, where $(x_k, y_k)^T$ and $(\dot{x}_k, \dot{y}_k)^T$ represent the position and moving speed of the speaker, respectively. In this model, the speaker's motion along each Cartesian coordinate is considered to be independent and is modeled as [23]

$$\mathbf{x}_k = \begin{bmatrix} I_2 & a\Delta T \otimes I_2 \\ 0 & a \otimes I_2 \end{bmatrix} \mathbf{x}_{k-1} + \begin{bmatrix} b\Delta T \otimes I_2 \\ b \otimes I_2 \end{bmatrix} u_{k-1}$$

where $a = e^{-\beta \Delta T}$ and $b = \upsilon \sqrt{1 - a^2}$; $\beta$ and $\upsilon$ are the rate constant and the steady velocity parameter, respectively. $I_s$ denotes the identity matrix of order $s$, $\otimes$ stands for the Kronecker product, $\Delta T$ is the sampling period for position estimation, and $u_{k-1}$ is zero-mean white Gaussian noise with identity covariance matrix, which describes the uncertainty of the acoustic source motion.
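A minimal sketch of one step of the Langevin model above is given below. The values β = 10 s⁻¹ and υ = 1 m/s are those reported in the simulation setup; the sampling period of 0.032 s is an assumption (one non-overlapping frame of 512 samples at 16 kHz), and the function name `langevin_step` is hypothetical.

```python
import math

def langevin_step(state, u, beta=10.0, v_bar=1.0, dt=0.032):
    """Propagate [x, y, vx, vy] by one step, driven by the noise sample u = (ux, uy)."""
    a = math.exp(-beta * dt)
    b = v_bar * math.sqrt(1.0 - a * a)
    x, y, vx, vy = state
    ux, uy = u
    return [
        x + a * dt * vx + b * dt * ux,   # position rows: [I2, a*dt*I2] and b*dt*I2
        y + a * dt * vy + b * dt * uy,
        a * vx + b * ux,                 # velocity rows: [0, a*I2] and b*I2
        a * vy + b * uy,
    ]

# Noise-free propagation of the initial state used in Experiment 1.
x0 = [0.5, 0.8, 0.02, 0.02]
x1 = langevin_step(x0, u=(0.0, 0.0))
```

With zero driving noise the velocity decays by the factor a at every step, which is why the noise term is needed to keep the source moving.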

Bayesian Framework for Speaker Tracking
Bayesian filtering is the basis of Kalman filtering. This section briefly reviews the basic principles of the Bayesian filtering algorithm.
Assume that the state variable at time $k$ is $x_k \in \mathbb{R}^p$ and its observation is $y_k \in \mathbb{R}^q$, where $\mathbb{R}^n$ represents the $n$-dimensional real vector space. The state equation and observation equation follow the form given in [21], in which $f_k(\cdot)$ is the nonlinear state transfer function, $h_k(\cdot)$ is the nonlinear observation function, $\Gamma_k$ is the noise transfer matrix, $w_k$ is the process noise, and $v_k$ is the observation noise. The noise sequences satisfy [21]

$$E\{w_k w_l^T\} = Q_k \delta_{k,l}, \qquad E\{v_k v_l^T\} = R_k \delta_{k,l}$$

where the superscript $T$ represents the transpose of a matrix, $E\{\cdot\}$ represents the expectation operator, and $\delta_{k,l}$ represents the Kronecker delta function. $Q_k$ and $R_k$ are the covariance matrices of the noise $w_k$ and $v_k$, respectively, and both are assumed to be positive definite.
The Bayesian filtering problem is to infer the state variable $x_k$ at time $k$ given the observations $y_{1:k} = \{y_1, \ldots, y_k\}$ up to time $k$, i.e., to estimate the posterior probability density $p(x_k \mid y_{1:k})$. Assuming that the initial probability density function $p(x_0)$ of the state variable is known as prior knowledge, the posterior probability density $p(x_k \mid y_{1:k})$ can be obtained recursively by the following equations [20]:

$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1} \tag{10}$$

$$p(x_k \mid y_{1:k}) = \frac{p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})}{p(y_k \mid y_{1:k-1})} \tag{11}$$

In Equations (10) and (11), the state transition probability density function $p(x_k \mid x_{k-1})$ is defined by the state equation, and the observation likelihood probability density function $p(y_k \mid x_k)$ is defined by the observation equation.
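The recursion in Equations (10) and (11) can be made concrete with a one-dimensional grid approximation. The sketch below is only illustrative: the random-walk transition density, the Gaussian likelihood, the grid, and all numerical values are assumptions chosen for simplicity and are unrelated to the TDOA observation model of this paper.

```python
import math

def gaussian(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def predict(prior, grid, process_std):
    """Equation (10) on a grid: sum over x_{k-1} of p(x_k | x_{k-1}) p(x_{k-1} | y_{1:k-1})."""
    pred = [sum(gaussian(xk, xj, process_std) * pj for xj, pj in zip(grid, prior))
            for xk in grid]
    total = sum(pred)
    return [v / total for v in pred]

def update(pred, grid, y, obs_std):
    """Equation (11) on a grid: likelihood times prediction, divided by the evidence."""
    unnorm = [gaussian(y, xk, obs_std) * pk for xk, pk in zip(grid, pred)]
    evidence = sum(unnorm)
    return [v / evidence for v in unnorm]

grid = [0.1 * i for i in range(61)]            # positions 0.0 ... 6.0 (hypothetical)
posterior = [1.0 / len(grid)] * len(grid)      # flat prior p(x_0)
for y in [2.0, 2.1, 2.2]:                      # hypothetical noisy observations
    posterior = update(predict(posterior, grid, process_std=0.2), grid, y, obs_std=0.3)
peak = grid[max(range(len(grid)), key=lambda i: posterior[i])]
```

The Kalman-type filters discussed in this paper replace the grid with a Gaussian approximation of the same two densities.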

Improved Distributed Cubature Kalman Filter
In the CKF, the observation corresponding to the largest peak of the observation function is used for the state update. This approach works well under moderate acoustic environments, while its performance degrades in severe noise and reverberation conditions because the spurious peaks from noise or reverberation may cover up the peaks from real acoustic sources. To alleviate this problem, multiple observations are selected from the multiple local maxima of the observation function. A general framework for state updates that integrates multiple possible observations is provided by the probabilistic data association (PDA). Inspired by this idea, the probabilistic data association-cubature Kalman filter (PDA-CKF) was derived in this paper. Next, PDA-CKF was used for acoustic source tracking in distributed acoustic sensor networks, and an improved PDA-DCKF algorithm was developed. The observations of multiple nodes in the neighborhood are filtered by PDA and then merged into the state update of CKF to integrate the information of multiple nodes to realize distributed tracking.
The standard Gaussian weighted integral is calculated using the spherical-radial cubature rule [28], as given in Equation (12), where $f(\cdot)$ is the nonlinear state transfer function or observation function, $n$ is the dimension of the state variable, $\mathcal{N}(x; 0, P)$ is a Gaussian density with zero mean and covariance $P$, and $\xi_i$ are the cubature points.
$[1]_i$ denotes the $i$-th point of the point set $[1]$ in the $n$-dimensional state space.
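For orientation, the sketch below generates cubature points in the commonly used third-degree form: 2n equally weighted points placed at the mean plus or minus the square root of n times each column of a square-root factor of the covariance. This textbook form, the Cholesky factor as the square root, and the function names are assumptions; they are not guaranteed to match the notation of Equations (12) to (14).

```python
import math

def cholesky(P):
    """Lower-triangular L with L L^T = P, for a symmetric positive definite P."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(P[i][i] - s) if i == j else (P[i][j] - s) / L[j][j]
    return L

def cubature_points(mean, P):
    """Return the 2n points mean +/- sqrt(n) * (column i of L); each has weight 1/(2n)."""
    n = len(mean)
    L = cholesky(P)
    scale = math.sqrt(n)
    return [[mean[r] + sign * scale * L[r][i] for r in range(n)]
            for sign in (1.0, -1.0) for i in range(n)]

def cubature_expectation(f, mean, P):
    """Approximate E[f(x)] for x ~ N(mean, P) with a scalar-valued f."""
    pts = cubature_points(mean, P)
    return sum(f(p) for p in pts) / len(pts)

mean = [0.5, 0.8]
P = [[0.05, 0.01], [0.01, 0.04]]
points = cubature_points(mean, P)
second_moment = cubature_expectation(lambda x: (x[0] - 0.5) ** 2, mean, P)
```

The rule reproduces the mean and covariance of the Gaussian exactly, which is the property the prediction and update steps rely on.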

(a) Initialization
When $k = 0$, the initial state is assumed to follow the Gaussian distribution $\mathcal{N}(x_0, P_0)$, and the initial values of the process noise and observation noise matrices are set to $Q_0$ and $R_0$, respectively. Then, the optimal initialization of the filter is

$$\hat{x}_{p,0|0} = x_0, \qquad \hat{P}_{p,0|0} = P_0 \tag{16}$$

(b) State Prediction
For each node $p$, the state estimate $\hat{x}_{p,k-1|k-1}$ and covariance matrix $\hat{P}_{p,k-1|k-1}$ at time $k-1$ are given, together with the positive definite noise matrices $Q_{p,k-1}$ and $R_{p,k-1}$. Using Equations (13) and (14), the state-prediction cubature points $\chi^i_{p,k-1|k-1}$ are calculated and then propagated nonlinearly through the state transition model, where $n$ represents the dimension of the state variable and $N$ represents the number of nodes in the distributed acoustic sensor network. The state prediction $\hat{x}_{p,k|k-1}$ and its error covariance $\hat{P}_{p,k|k-1}$ are then calculated from the propagated points.
From the predicted $\hat{x}_{p,k|k-1}$ and $\hat{P}_{p,k|k-1}$ at time $k$, the state-update cubature points $\chi^i_{p,k|k-1}$ are calculated, from which the observation prediction $\hat{z}_{p,k|k-1}$ and the observation prediction error covariance $P^{zz}_{p,k|k-1}$ are obtained.
Then, according to probabilistic data association, the validation region of node $p$ is constructed as in [29] (Equation (27)), where $\gamma$ is the gate threshold. Suppose that $m_{p,k}$ ($m_{p,k} \ge 0$) observations fall into the validation region (27) at time $k$; these are the validated observations $z_{p,k}$. At most one of these observations is related to the real source; the others are due to noise or reverberation, and it is also possible that none of them is related to the real source. Correspondingly, for $m_{p,k}$ validated observations there are $m_{p,k} + 1$ possible hypotheses $H_{p,j}$, $j = 0, 1, \ldots, m_{p,k}$ (Equation (29)). According to Equation (29), the state estimate is computed from the conditional estimates $E\{x_{p,k} \mid H_{p,j}, z_{p,1:k}\}$, each of which is the updated estimate conditioned on the event $H_{p,j}$. Each conditional update uses the innovation $v^{(j)}_{p,k}$ and the Kalman gain $K_{p,k}$ of node $p$, which in turn depends on $P^{xz}_{p,k|k-1}$, the cross covariance between the state and the observation $z_{p,k}$ of node $p$. Given the innovation $v^{(j)}_{p,k}$ and its covariance $P^{zz}_{p,k|k-1}$, the association probability $\beta^{(j)}_{p,k}$ of each hypothesis is computed, where $\lambda$ is the spatial density of spurious observations, $P_{p,D}$ is the probability that the acoustic source is detected by sensor $p$, and $P_G$ is the gate probability.
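The gating and association steps described above can be sketched for a scalar observation such as a TDOA. The formulas used here are the textbook parametric PDA expressions (an ellipsoidal gate on the normalized squared innovation and likelihood ratios with a Poisson clutter density), which is an assumption about the form of the omitted equations. The gate threshold, gate probability, and detection probability are the values reported in the simulation setup, whereas the clutter density, the predicted observation, and the measurements are hypothetical values in normalized units.

```python
import math

def validate(measurements, z_pred, s, gamma):
    """Keep the scalar measurements whose normalized squared innovation is within the gate."""
    return [z for z in measurements if (z - z_pred) ** 2 / s <= gamma]

def association_probabilities(validated, z_pred, s, lam, p_d, p_g):
    """Return [beta_0, beta_1, ..., beta_m]; beta_0 means no validated measurement is correct."""
    likelihoods = [
        p_d * math.exp(-0.5 * (z - z_pred) ** 2 / s) / (lam * math.sqrt(2.0 * math.pi * s))
        for z in validated
    ]
    denom = (1.0 - p_d * p_g) + sum(likelihoods)
    return [(1.0 - p_d * p_g) / denom] + [l / denom for l in likelihoods]

# Hypothetical candidates around a predicted observation of 0 with innovation variance 1.
validated = validate([0.5, -1.0, 3.0], z_pred=0.0, s=1.0, gamma=4.0)
betas = association_probabilities(validated, z_pred=0.0, s=1.0,
                                  lam=0.1, p_d=0.95, p_g=0.93)
```

The candidate at 3.0 lies outside the gate and is discarded, and among the remaining candidates the one closer to the prediction receives the larger association probability.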
Finally, the state estimate $\hat{x}_{p,k|k}$ and the error covariance $\hat{P}_{p,k|k}$ are obtained from the probability-weighted innovation $v_{p,k}$ and the covariances $\dot{P}_{p,k|k}$ and $\ddot{P}_{p,k|k}$, which are given in [29,30].
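As a point of reference, the sketch below implements the textbook PDA combination for a scalar observation: the state is corrected with the probability-weighted innovation, and the covariance mixes the predicted covariance, the standard Kalman-updated covariance, and a spread-of-innovations term. Whether these two terms correspond exactly to the two covariances named above is an assumption, and the function name `pda_update` and all numerical values are hypothetical.

```python
def pda_update(x_pred, P_pred, P_xz, s, innovations, betas):
    """Textbook PDA update for a scalar observation.

    x_pred, P_pred: predicted state (length n) and covariance (n x n)
    P_xz: cross covariance between state and observation (length n)
    s: innovation variance; innovations: one value per validated observation
    betas: [beta_0, beta_1, ..., beta_m] association probabilities
    """
    n = len(x_pred)
    K = [P_xz[i] / s for i in range(n)]                          # Kalman gain
    v = sum(b * nu for b, nu in zip(betas[1:], innovations))     # weighted innovation
    spread = sum(b * nu * nu for b, nu in zip(betas[1:], innovations)) - v * v
    x = [x_pred[i] + K[i] * v for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            kk = K[i] * K[j]
            P_c = P_pred[i][j] - s * kk                          # standard Kalman update
            P[i][j] = betas[0] * P_pred[i][j] + (1.0 - betas[0]) * P_c + spread * kk
    return x, P

x_pred = [0.0, 0.0]
P_pred = [[1.0, 0.0], [0.0, 1.0]]
# One validated observation that is certainly correct: reduces to the Kalman update.
x_kf, P_kf = pda_update(x_pred, P_pred, P_xz=[1.0, 0.0], s=2.0,
                        innovations=[1.0], betas=[0.0, 1.0])
# No observation accepted as correct: the prediction is kept unchanged.
x_none, P_none = pda_update(x_pred, P_pred, P_xz=[1.0, 0.0], s=2.0,
                            innovations=[1.0], betas=[1.0, 0.0])
```

The two limiting cases show how the association probabilities interpolate between trusting the validated observations fully and ignoring them.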
To summarize, the pseudo-code of the PDA-CKF method using the observations from a single node is given in Algorithm 1.
The PDA-CKF algorithm makes full use of the observation information of the node itself, which improves the tracking accuracy. However, this algorithm will fail when a node is damaged or the environmental noise and reverberation are severe. Therefore, this paper generalized PDA-CKF to a distributed version that can be used in distributed sensor networks. The improved method was named the probabilistic data association-based distributed cubature Kalman filter (PDA-DCKF). The specific process is shown in Section 3.2.

PDA-DCKF Algorithm
In PDA-DCKF, the information from the neighborhood of each node is fused to form local node networks. Then, the local state estimates and error covariances of the local node networks are calculated separately. Finally, the local results are fused to obtain the global state estimate.
On the basis of the above steps, the neighborhood quantities of node $p$ are defined, where $q$ denotes a node adjacent to node $p$, $\upsilon = \{1, 2, \ldots, N\}$ is the vertex set, $\varepsilon \subset \{(p, q) \mid p, q \in \upsilon\}$ is the edge set of the distributed acoustic sensor network, and $\mathrm{num}(N_{p,k})$ indicates the number of nodes in the neighborhood of node $p$. As before, $N_{p,k} = \{q \in \upsilon \mid (p, q) \in \varepsilon\} \cup \{p\}$ denotes the set of neighbors of node $p$ at time $k$, which includes node $p$ itself. The resulting observations are then fused into a matrix, from which the observation prediction and the prediction error covariance are obtained. For a single node $p$, $v^{(j)}_{p,k} = z^{(j)}_{p,k} - \hat{z}_{p,k|k-1}$ is the innovation vector related to observation $z^{(j)}_{p,k}$, and $K_{p,k}$ is the Kalman gain of node $p$. For multiple nodes, the information of node $p$ and its surrounding nodes $q$ is fused to obtain $P^{xz,N}_{p,k|k-1}$, the cross covariance between the state and the observations of node $p$ after fusing the information of neighboring nodes, and $K^{N}_{p,k}$, the Kalman gain of node $p$ at time $k$ after the fusion.
The probability-weighted innovation vector of the local nodes is then defined, and a weight $w_p$ is defined from $\hat{P}_{p,k|k}$ of node $p$; when the information of node $p$ and its surrounding nodes is fused, $w_p$ is computed accordingly. Finally, the state estimate $\hat{x}^{N}_{p,k|k}$ and the error covariance $\hat{P}^{N}_{p,k|k}$ of node $p$ are obtained.

Fusion Strategy
After calculating the estimation of each local node in the distributed acoustic sensor network, these data need to be fused to obtain a global estimate. Since nodes in a sensor network have different reliabilities, the final tracking result integrates the estimations from the local nodes, which are weighted with the parameters depending on the mean square error of the estimation and the energy of the received signal.

(a) Energy
The energy of the signal received by each node in the acoustic sensor network is calculated as in [31], where $x_p(t)$ represents the sound signal received by node $p$. In practice, the analog signal $x(t)$ is converted into the digital signal $x(n)$, which is then framed and windowed. The framed signal is denoted by $x(n) \cdot \omega(n)$. In this paper, the Hamming window was selected as the window function $\omega(n)$. The energy of each frame is then obtained by

$$E_{p,n} = \sum_{m} \left[ x_p(m)\, \omega(n - m) \right]^2 = \sum_{m} x_p^2(m)\, h(n - m)$$

where $h(n) = \omega^2(n)$, and $E_{p,n}$ represents the short-term energy of node $p$ when the window function starts at the $n$-th point of the signal. The short-term energy can thus be regarded as the output obtained by passing the squared speech signal through a linear filter whose unit impulse response is $h(n)$.
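A minimal sketch of the framed, Hamming-windowed short-term energy is shown below. The frame length and hop size are parameters of the sketch; the toy values used in the example are assumptions and are much shorter than the 512-sample frames used in the simulations.

```python
import math

def hamming(length):
    """Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def short_time_energy(x, frame_len, hop):
    """Energy of each windowed frame: sum over m of (x[start + m] * w[m]) ** 2."""
    w = hamming(frame_len)
    return [
        sum((x[start + m] * w[m]) ** 2 for m in range(frame_len))
        for start in range(0, len(x) - frame_len + 1, hop)
    ]

# Toy signal (hypothetical): a quiet segment followed by a segment twice as loud.
signal = [1.0] * 8 + [2.0] * 8
energies = short_time_energy(signal, frame_len=8, hop=8)
```

Doubling the amplitude quadruples the frame energy, so nodes that receive a stronger signal obtain a markedly larger energy value.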

(b) MSE
In Equation (48), $\hat{r}_{N,k}$ represents the global position estimate weighted with the average consensus coefficients. The MSE between the position obtained by each local node and $\hat{r}_{N,k}$ is then calculated. After calculating the energy $E_p$ and the mean square error $M_p$ of node $p$ at time $k$, the weight $\eta_p$ of node $p$ in the global fusion is defined from these two quantities. A global consistency analysis is then performed on the results obtained by each node according to $\eta_p$, $p = 1, 2, \ldots, N$.
To summarize, the PDA-DCKF is depicted in Algorithm 2. The PDA-DCKF proposed in this paper combines the advantages of probabilistic data association and distributed acoustic sensor networks. In this method, the PDA algorithm is used to sift the observations from neighboring nodes. Then, the sifted observations are fused to update the state vectors in the CKF. This method not only makes the observations obtained by each node more accurate, but also makes full use of the information of neighborhood nodes.
Meanwhile, a weighted fusion method based on local node-received signal energy and position estimation mean square error was proposed. This dynamic weighted consistency fusion considers the reliability of the local state of the nodes and provides a good global estimation performance.
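The idea of reliability-weighted fusion can be illustrated with a small sketch. The weight used below, proportional to the received energy divided by the mean square error and normalized over the nodes, is purely a hypothetical choice made for illustration and should not be read as the weight defined in this paper; the node estimates, energies, MSE values, and the function name `fuse` are likewise invented.

```python
def fuse(estimates, energies, mses, eps=1e-12):
    """Fuse 2-D local estimates with hypothetical weights proportional to E_p / M_p."""
    scores = {p: energies[p] / (mses[p] + eps) for p in estimates}
    total = sum(scores.values())
    weights = {p: s / total for p, s in scores.items()}
    fused = tuple(sum(weights[p] * estimates[p][d] for p in estimates) for d in range(2))
    return weights, fused

# Hypothetical local results: node 3 is far from the source (low energy, high MSE).
estimates = {1: (1.0, 2.0), 2: (1.2, 2.2), 3: (3.0, 0.0)}
energies = {1: 4.0, 2: 2.0, 3: 0.5}
mses = {1: 0.01, 2: 0.02, 3: 0.5}
weights, fused_position = fuse(estimates, energies, mses)
```

With such a rule the unreliable node contributes almost nothing to the fused position, whereas a simple average would pull the result noticeably toward its outlying estimate.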

Experiments and Results Discussion
To verify the performance of the proposed speaker tracking method, evaluations were performed in a simulated room environment. Under the same conditions, PDA-DCKF was compared with existing methods, including the centralized method (CCKF), DUKF, DCKF, the iteration-based DCKF [20] (DICKF), and DEKF. The results obtained by all methods are the average of 100 Monte Carlo runs.
The root mean square error (RMSE) is used here to evaluate the tracking performance. Let $r_k$ denote the ground-truth position at time $k$ and $\hat{r}_{N,k}$ the global consensus position calculated by the acoustic sensor network at that time. The RMSE is defined as [32]

$$\mathrm{RMSE} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left\| r_k - \hat{r}_{N,k} \right\|^2}$$

where $K$ denotes the number of frames. Generally, the smaller the RMSE, the better the tracking result.
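The RMSE over a trajectory can be computed as in the following sketch, which assumes the usual definition as the square root of the mean squared Euclidean position error over the frames; the two-frame example data are hypothetical.

```python
import math

def rmse(truth, estimates):
    """Root mean square of the Euclidean position errors over all frames."""
    if len(truth) != len(estimates) or not truth:
        raise ValueError("truth and estimates must be non-empty and of equal length")
    squared = [math.dist(r, r_hat) ** 2 for r, r_hat in zip(truth, estimates)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical two-frame example: one frame is off by 5 m, the other is exact.
error = rmse([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)])
```

Averaging this quantity over independent Monte Carlo runs gives a single figure per method and per acoustic condition.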

Simulation Setups
The simulation environment was a typical room of size 6 m × 6 m × 3 m, with an acoustic sensor network of 12 nodes ($N = 12$). Each node contained a pair of microphones 0.5 m apart. The communication graph of the distributed acoustic sensor network is shown in Figure 2, where the communication radius is 2.5 m and each circle represents a node. Trajectory 1 was a line from (0.5, 0.8) to (2.5, 2.8), and trajectory 2 was an arc from (1, 2) to (4.86, 2.1), as shown in Figure 3. In the different experiments, speech sampled at $F_s = 16$ kHz was used as the acoustic source signal; the speech was a female recording, and the waveform and spectrum of the signal are shown in Figure 4a. The sound speed was $c = 342$ m/s. The microphone signals were simulated with the image method [33]. Specifically, different RIRs were generated by the image (virtual sound source) method to reflect different reverberation times. These RIRs were convolved with the speech signal, and Gaussian white noise with a specified mean and covariance was then added to produce microphone signals containing both reverberation and noise. The covariance of the Gaussian noise determines the signal-to-noise ratio (SNR), which reflects different environmental noise conditions. The microphone signal was divided into frames along the sound source trajectory, where the frame length was $N_f = 512$ and each frame was used for state estimation. Taking node 1 as an example, Figure 4b shows the waveform and spectrum of the speech signal received by the first microphone of node 1. For the TDOA observations, a total of eight time delays were chosen according to the magnitude of the GCC peaks. From these delays, the TDOA observations were further selected, where the relevant parameters were set as $\lambda = 10$, $\gamma = 4$, $P_G = 0.93$, and $P_D = 0.95$. The standard deviation of the TDOA measurement error was $\sigma = 50$ µs.
In the acoustic dynamical model, the parameters were $\beta = 10\ \mathrm{s}^{-1}$ and $\upsilon = 1\ \mathrm{m\,s}^{-1}$. In the average consensus calculation of the global state estimate and its error covariance, the Metropolis weights were used, the number of consensus iterations [34] was $N_{con} = 10$, and the number of iterations in the iterative CKF was 3.
This paper conducted four experiments to evaluate the tracking performance of PDA-DCKF. In Experiment 1, trajectory 1 was used as the acoustic source trajectory. The initial prior $p(x_0)$ of the acoustic source state was set as a Gaussian distribution with mean $x_0 = [0.5, 0.8, 0.02, 0.02]^T$ and covariance $P_0 = \mathrm{diag}([0.05, 0.05, 0.0025, 0.0025])$. In Experiment 2, the sound source signal and trajectory were the same as in Experiment 1, and the influence of the fusion rule on the tracking performance of PDA-DCKF was examined by comparison with a simple average fusion rule. Experiment 3 examined the robustness of the algorithm, with the same acoustic source and trajectory as in the previous two experiments. In Experiment 4, trajectory 2 was used as the acoustic source trajectory to check the tracking results when the trajectory was nonlinear.
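For reference, the average consensus iteration with Metropolis weights mentioned in the setup above can be sketched as follows. The weight rule is the standard Metropolis rule, in which the weight of an edge is one over one plus the larger of the two node degrees; the three-node line graph and the scalar values are hypothetical, while the ten iterations match the reported setting.

```python
def metropolis_weights(neighbors):
    """neighbors[p] lists the neighbors of p, excluding p itself."""
    degree = {p: len(qs) for p, qs in neighbors.items()}
    W = {}
    for p, qs in neighbors.items():
        W[p] = {q: 1.0 / (1.0 + max(degree[p], degree[q])) for q in qs}
        W[p][p] = 1.0 - sum(W[p].values())   # self-weight makes each row sum to one
    return W

def consensus(values, W, iterations):
    """Repeat x_p <- sum_q W[p][q] * x_q for the given number of iterations."""
    x = dict(values)
    for _ in range(iterations):
        x = {p: sum(w * x[q] for q, w in W[p].items()) for p in W}
    return x

# Hypothetical line graph 1 - 2 - 3 with scalar local values whose average is 3.
graph = {1: [2], 2: [1, 3], 3: [2]}
W = metropolis_weights(graph)
result = consensus({1: 0.0, 2: 3.0, 3: 6.0}, W, iterations=10)
```

Each node only uses values from its direct neighbors, yet all nodes approach the network-wide average, which is the property that allows the global estimate to be formed without a central processor.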

Experiment 1
In this experiment, the tracking performance was evaluated under different noise and reverberation conditions. First, the impact of environmental noise on tracking performance was investigated. Figure 5 depicts the RMSE results as a function of SNR for a reverberation time of $T_{60} = 200$ ms. In Figure 5, the RMSE of all methods decreases as the SNR increases, i.e., the tracking accuracy improves with increasing SNR. This is because, at higher SNR, the microphone signal is less affected by ambient noise, resulting in better tracking performance. In addition, under the same SNR, PDA-DCKF performs better than the traditional distributed Kalman filters, namely the distributed extended, unscented, and cubature Kalman filters. Since traditional methods use only the single time-delay observation from the largest GCC peak, peaks associated with real sources may be masked by spurious peaks caused by noise or reverberation, resulting in erroneous state estimates. In contrast, PDA-DCKF employs multiple time-delay observations from the several largest GCC peaks, resulting in better tracking performance. PDA-DCKF also outperforms DICKF in this experiment. DICKF was designed to address the slow response and low tracking accuracy of DCKF, and it improves the tracking performance and convergence speed through several local iterations. However, DICKF still uses only the single time-delay observation from the largest GCC peak, which limits its accuracy. As can be seen from Figure 5, the gap between DICKF and PDA-DCKF becomes smaller as the SNR increases, because the observations are more reliable at higher SNR.
In addition, Figure 5 shows that PDA-DCKF does not perform as well as CCKF, because CCKF uses the observation information of all nodes. Nevertheless, PDA-DCKF achieves performance very close to that of CCKF, while its computational cost and network burden are lower than those of CCKF.
The effect of reverberation on tracking performance was also studied in this paper. Figure 6 depicts the RMSE results as a function of $T_{60}$ with SNR = 20 dB. The RMSEs of all methods increase as $T_{60}$ becomes larger, which indicates degraded tracking accuracy. This may be because the microphone signal is more affected by reverberation as $T_{60}$ becomes larger, so that the time-delay observations extracted from the largest peak, or even from multiple largest peaks, become unreliable and the tracking performance of these methods deteriorates. In addition, Figure 6 shows that the tracking performance of PDA-DCKF is better than that of DEKF, DUKF, DCKF, and DICKF. In traditional methods, the time-delay observations are extracted only from the largest peak of the GCC, while the peaks associated with the true acoustic source may be masked by false peaks caused by reverberation. In contrast, PDA-DCKF incorporates TDOA observations from multiple largest GCC peaks, which alleviates the adverse effects of reverberation to a certain extent. As shown in Figure 6, PDA-DCKF does not perform as well as CCKF, but its performance is very close.

Experiment 2
Experiment 2 examines how the fusion strategy proposed in this paper affects the results. When PDA-DCKF adopts a simple average fusion rule, it is called PDA-DCKF-avg. In this section, different SNRs and different reverberation times are used to test the effectiveness of the fusion strategy. The experimental results are shown in Figures 7 and 8. As depicted in Figure 7, the RMSEs of PDA-DCKF with both fusion strategies decrease as the SNR increases, but the proposed strategy is more effective. Figure 8 shows that the error increases with the reverberation time. The average fusion strategy gives a smaller error than the proposed strategy only at a reverberation time of 50 ms; from 100 to 600 ms, the proposed strategy performs better. Comparing Figures 5-8, it can be found that, even with the average fusion strategy, the error of PDA-DCKF is still smaller than that obtained in the comparison experiment above, which further demonstrates the effectiveness of the proposed method.
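To make the contrast between the two fusion rules concrete, the following sketch compares a simple average with a reliability-weighted combination. The weight used here (received energy divided by estimation MSE, normalized over the nodes) is only an assumed stand-in for weights "depending on the MSE and the energy of the received signal"; it is not the paper's formula, and all numbers are invented for illustration.

```python
import math


def fuse_average(estimates):
    """Simple average of local position estimates (the PDA-DCKF-avg idea)."""
    n = len(estimates)
    dim = len(estimates[0])
    return tuple(sum(e[d] for e in estimates) / n for d in range(dim))


def fuse_weighted(estimates, mses, energies):
    """Reliability-weighted fusion of local estimates.

    ASSUMPTION: weight_i is proportional to energy_i / mse_i, so a node with
    a strong received signal and a small estimation error counts more.
    """
    raw = [en / m for en, m in zip(energies, mses)]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(estimates[0])
    return tuple(
        sum(w * e[d] for w, e in zip(weights, estimates)) for d in range(dim)
    )


# Invented example: the third node is unreliable (large MSE, low energy).
true_pos = (2.0, 3.0)
estimates = [(2.0, 3.0), (2.1, 2.9), (3.0, 4.0)]
mses = [0.01, 0.02, 0.5]
energies = [1.0, 0.8, 0.2]

avg = fuse_average(estimates)
weighted = fuse_weighted(estimates, mses, energies)
err_avg = math.dist(avg, true_pos)
err_weighted = math.dist(weighted, true_pos)
```

With these invented values the weighted estimate lies closer to the assumed true position than the plain average, because the unreliable node is almost ignored. The sketch says nothing about cases such as the 50 ms condition, where the average rule was reported to do better.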

Experiment 3
In practical applications, nodes in a network may be damaged, and whether the network can still work normally after a node fails is a test of the robustness of the system. In this subsection, node damage in the distributed acoustic sensor network is simulated, and the tracking results after the damage are compared with those before the damage. The network in which node 1 is damaged is called graph G2, as shown in Figure 9a. The network in which node 1 and node 6 are both damaged is called graph G3, as shown in Figure 9b. The experimental results are shown in Tables 1 and 2.
It can be seen from Tables 1 and 2 that the acoustic source can still be tracked when nodes are damaged. Although the accuracy decreases, the drop is small, and the acoustic source can still be tracked accurately. This indicates that the proposed method is robust to node damage in this network.
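The node-damage scenario can be sketched as removing nodes from a communication graph and checking that the remaining nodes can still exchange information. The 6-node topology below is hypothetical (the paper's actual network is the one shown in Figure 9), and the helper names are assumptions.

```python
from collections import deque


def remove_nodes(adjacency, damaged):
    """Return a copy of the graph without the damaged nodes and their links."""
    damaged = set(damaged)
    return {
        node: {nb for nb in nbrs if nb not in damaged}
        for node, nbrs in adjacency.items()
        if node not in damaged
    }


def is_connected(adjacency):
    """Breadth-first search: True if every remaining node is reachable."""
    if not adjacency:
        return False
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(adjacency)


# HYPOTHETICAL topology: a 6-node ring plus two chords (1-4 and 3-6).
full = {
    1: {2, 4, 6},
    2: {1, 3},
    3: {2, 4, 6},
    4: {1, 3, 5},
    5: {4, 6},
    6: {1, 3, 5},
}
g2 = remove_nodes(full, [1])     # node 1 damaged
g3 = remove_nodes(full, [1, 6])  # nodes 1 and 6 damaged
```

In this assumed topology both reduced graphs stay connected, which is the precondition for the surviving nodes to keep sharing observations with their neighbors.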

Experiment 4
To further verify the effectiveness of the proposed algorithm, the semicircular trajectory 2 was used as the acoustic source trajectory, and comparative experiments were carried out under different SNRs and reverberation times. The experimental data are shown in Tables 3 and 4. Figure 10 shows the tracking results with SNR = 15 dB and T60 = 400 ms. From Tables 3 and 4 and Figure 10, it can be seen that the proposed algorithm can still accurately track the sound source along such a strongly nonlinear trajectory.
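As a rough illustration of what a semicircular test trajectory and its TDOA observations look like, the sketch below generates points on a half circle and evaluates the time difference of arrival at one microphone pair. The center, radius, number of steps, microphone positions, and speed of sound are all assumed values, not the experimental settings of the paper.

```python
import math


def semicircle(center, radius, n_steps):
    """Positions on the upper half of a circle, from angle 0 to pi."""
    cx, cy = center
    return [
        (
            cx + radius * math.cos(math.pi * k / (n_steps - 1)),
            cy + radius * math.sin(math.pi * k / (n_steps - 1)),
        )
        for k in range(n_steps)
    ]


def tdoa(source, mic_a, mic_b, c=343.0):
    """Time difference of arrival in seconds between two microphones."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / c


# Assumed geometry (meters): half circle of radius 1.5 around (2.5, 1.0),
# observed by a microphone pair with 0.2 m spacing.
path = semicircle(center=(2.5, 1.0), radius=1.5, n_steps=50)
mic_a, mic_b = (0.0, 0.0), (0.2, 0.0)
tdoas = [tdoa(p, mic_a, mic_b) for p in path]
```

The TDOA depends on the source position through differences of Euclidean distances, so even this smooth trajectory produces a nonlinear observation sequence, which is why a nonlinear filter such as the CKF is needed.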

Conclusions
An improved PDA-DCKF method was proposed in this paper for tracking a single moving acoustic source with a distributed acoustic sensor network in noisy and reverberant environments. First, to reduce the adverse effects of noise and reverberation, the predicted observation is obtained from the predicted state and the observation model of the distributed nodes. Second, the actual observations are screened against this predicted value: multiple TDOA observations are extracted at each node and incorporated into the CKF state update through PDA, which yields PDA-CKF. PDA-CKF was then applied to distributed acoustic sensor networks to obtain PDA-DCKF. In PDA-DCKF, the PDA algorithm first sifts the observations from neighboring nodes, and the sifted observations are then fused to update the state vectors in the CKF. Each node runs PDA-DCKF for TDOA observation and local state estimation. A new fusion strategy based on the received signal energy and the MSE then merges the local estimates in a distributed manner into a global state estimate. To apply the improved PDA-DCKF to acoustic source tracking, the Langevin model was used to model the acoustic source dynamics, and a method for extracting the time-difference observations was proposed, which gives a complete distributed acoustic source tracking framework. To evaluate the effectiveness of PDA-DCKF, comparative experiments were carried out against existing methods (DCKF, DUKF, DEKF, and DICKF) under different ambient noise and reverberation conditions. The results show that PDA-DCKF has better tracking performance than DCKF, DUKF, DEKF, and DICKF under most noise and reverberation conditions. In addition, PDA-DCKF achieved tracking performance close to that of the centralized CKF. Furthermore, it can still track the acoustic source stably when nodes are damaged.
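The screening and association steps summarized above can be sketched for a scalar TDOA observation using the textbook form of probabilistic data association. This is a minimal sketch under stated assumptions: the gate threshold, clutter density, detection and gate probabilities, and the units (samples) are illustrative choices, and the paper's own PDA-CKF update may differ in detail.

```python
import math


def pda_weights(candidates, predicted, innov_var, gate=9.0,
                clutter_density=0.05, p_detect=0.9, p_gate=0.99):
    """Gate scalar TDOA candidates and compute PDA association probabilities.

    Textbook PDA form; every default parameter here is an ASSUMPTION.
    Returns (gated candidates, their probabilities, probability that none
    of the gated candidates originates from the source).
    """
    # Screening: keep candidates whose normalized squared innovation
    # falls inside the validation gate.
    gated = [z for z in candidates
             if (z - predicted) ** 2 / innov_var <= gate]
    likelihoods = [math.exp(-0.5 * (z - predicted) ** 2 / innov_var)
                   for z in gated]
    # Clutter term for the "no candidate is correct" hypothesis.
    b = (clutter_density * math.sqrt(2.0 * math.pi * innov_var)
         * (1.0 - p_detect * p_gate) / p_detect)
    denom = b + sum(likelihoods)
    betas = [lik / denom for lik in likelihoods]
    beta0 = b / denom
    return gated, betas, beta0


def combined_innovation(gated, betas, predicted):
    """Probability-weighted innovation used in place of a single innovation."""
    return sum(beta * (z - predicted) for z, beta in zip(gated, betas))


# Invented example in units of samples: predicted TDOA 3.0, three candidates.
gated, betas, beta0 = pda_weights([-2.0, 3.2, 0.5], predicted=3.0,
                                  innov_var=1.0)
nu = combined_innovation(gated, betas, predicted=3.0)
```

Here the candidate far from the prediction is rejected by the gate, and the remaining candidates contribute to the update in proportion to how well they agree with the predicted observation, rather than the filter committing to a single peak.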