A Robust Steered Response Power Localization Method for Wireless Acoustic Sensor Networks in an Outdoor Environment

The localization of outdoor acoustic sources has attracted attention in wireless sensor networks. In this paper, the steered response power (SRP) localization of band-pass signal associated with steering time delay uncertainty and coarser spatial grids is considered. We propose a modified SRP-based source localization method for enhancing the localization robustness in outdoor scenarios. In particular, we derive a sufficient condition dependent on the generalized cross-correlation (GCC) waveform function for robust on-grid source localization and show that the SRP function with GCCs satisfying this condition can suppress the disturbances induced by the grid distance and the uncertain steering time delays. Then a GCC refinement procedure for band-pass GCCs is designed, which uses complex wavelet functions in multiple sub-bands to filter the GCCs and averages the envelopes of the filtered GCCs as the equivalent GCC to match the sufficient condition. Simulation results and field experiments demonstrate the excellent performance of the proposed method against the existing SRP-based methods.

Most methods require a pre-processing stage in which specific modalities are measured from sensor signals before the location-estimating stage. In contrast, the SRP-based approaches locate the source position or direction by maximizing the power of spatially steered filter and sum beamformer of a group of sensors and contain only one decision step in processing sensor signals to estimate location. Without information compression and disturbances resulting from partial mistakes in the front-end stage, the SRP-based solutions can usually yield more robust performance in noisy and reverberant acoustic environments. Practical implementations commonly use the generalized cross-correlation [23]-based form of the SRP function [16] to reduce computation. The methods similar to the GCC-expression of SRP function are also called a "global coherence field (GCF)" in several references [24,25].
In practice, the primary constraint of the SRP-based approaches is the time-consuming on-grid searching procedure for finding their global maximums. Hence, it has been a hot issue to reduce the computational cost for the SRP-based approaches. In [17], a stochastic region construction (SRC) method is proposed to avoid global grid searching. However, this strategy also causes information loss. In [26], a geometrically sampled grid set based on the TDOA gradient is proposed to improve the SRP performances. An alternative strategy to solve the high-cost searching problem is adopting some adaptive SRP functions regarding the grid resolution to apply a coarse or a hierarchical searching. In [27], the authors use the low-frequency component of GCC for coarse grid resolution and the high-frequency component for fine grids in the SRP-based DOA estimation. In [28], the authors adopt a Gaussian low-pass filter to the GCC for coarse grids. For full-band signals, a similar kind of modification is proposed both in microphone arrays [29] and WASNs [18,19], respectively, in which the spatial spectrum of a given grid is calculated from the sum of the phase-transform weighted GCCs (GCC-Phase Transform (PHAT)s) within a time window containing the TDOA values in the volume surrounding the grid, instead of the original GCC-PHAT in the SRP function.
The SRP-based approaches can provide a robust solution in DOA estimation and source localization tasks in confined spaces. However, they could lose their robustness in an outdoor WASN scenario due to the synthetic effect of the following factors. (1) Grid size, since the monitoring area in outdoor cases may become much more extensive than the area of indoor applications, and the proper searching grids would be much coarser (e.g., meter-level grids outdoors compared with centimeter-level grids indoors). (2) Steering time delay uncertainty; in the classical SRP-based localization frame, the steering time delay at a given position is generated from an ideal propagation model and is always assumed to be entirely right. However, the steering time delay to the source position is different from the actual propagation time. Such a difference becomes no more negligible in the outdoor environment and causes a defocus effect, even though the WASN system is well synchronized. (3) Signal passband; when processing the acoustic data collected in outdoor environments, high-pass or band-pass filtering is indispensable because the environmental noise is intense in the low-frequency range, and the source signals in the real world often possess the band-pass characteristic. The synthetic effect of these three factors would make it difficult to achieve stable localization results. The Modified-SRP functional (MSRP) method introduced in [18,19] provides an elegant solution for scalable grids but it is not suitable for band-pass signals. In [21], the authors elaborate on the SRP in band-pass situations and use the GCC-PHAT envelope or frequency-shifted GCC-PHAT to enhance the robustness in such situations. Nevertheless, the above two methods hardly consider the other two factors (the grid and the steering time uncertainty). 
In [30], the authors propose a Frequency-Sliding GCC (FSGCC) method, which uses singular value decomposition (SVD) or weighted SVD (WSVD) on the FSGCC matrix and can intelligently extract time delay information of the source signal from multiple sub-band GCCs. The authors adopt the WSVD-FSGCC to the MSRP functional for source localization. This solution can provide excellent localization performance in the band-pass situation with scalable grids. However, in outdoor applications, the high computation cost of the SVD of giant matrices is inevitable due to the long GCC range.
Previously, several acoustic source localization methods have been proposed for outdoor scenarios. They mostly localize the source from TDOA [31] or DOA [32,33] measurements. The estimation errors of the TDOA or DOA algorithms then introduce uncertainty, and reducing the sensor signals to these measurements also discards useful information, which results in unstable performance. For this reason, this paper addresses the problem of robust SRP-based source localization outdoors.
In this paper, a modified SRP-based method is proposed, in which the systematic influence of the above inevitable factors in outdoor WASNs scenarios is considered. The localization performance is analyzed using the normalized contribution of the signal components in the SRP function. A sufficient condition dependent on the GCC waveform function for robust on-grid SRP-based source localization is derived by geometrical analysis. The SRP function using GCCs satisfying this condition can suppress the disturbances induced by the grid distance and the uncertain steering time delay. A GCC refinement procedure for band-pass GCCs is then designed, which uses the complex wavelet functions in multiple sub-bands to filter the GCC and averages the envelopes of the filtered GCCs as the equivalent GCC to match the sufficient condition. Simulation results and field experiments demonstrate the excellent performance of the proposed method against the existing SRP-based methods.
The rest of this paper is organized as follows. In Section 2, the outdoor SRP-based source localization problem is formulated. Section 3 gives the sufficient condition in brief and introduces the GCC refinement procedure. The results of the simulation and the field experiment are presented in Section 4. Conclusions are given in Section 5.

System Models
We discuss the acoustic source localization problem in an $N$-dimensional Euclidean space with $M$ distributed microphones ($M > N$). Let $x \in \mathbb{R}^N$ be a spatial coordinate vector. Specifically, define $x_s$ as the source location and $z_m$ as the position of the $m$th sensor ($m = 1, 2, \ldots, M$). Let $s(t)$ be the source signal in the time domain; the received signal of the microphone at $z_m$ can be modeled as
$$y_m[n] = \left[h_m(t) * s(t) + w_m(t)\right]\delta(t - n/F_s), \tag{1}$$
where $h_m(t)$ is the impulse response function representing the propagation of sound from $x_s$ to $z_m$, the operator "$*$" represents convolution, $w_m(t)$ stands for the additive noise signal, and $\delta(t - n/F_s)$ denotes the sampling process at rate $F_s$. When the multi-path delay and non-linear distortion are neglected, the propagation function in the frequency domain can be simplified as
$$H_m(\omega) = A_m e^{-j\omega t_m}, \tag{2}$$
where $A_m \in \mathbb{R}$ is the amplitude-attenuation factor and $t_m$ is the time delay factor. In the frequency domain, Equation (1) can be written as
$$Y_m(\Omega) = A_m S(\Omega) e^{-j\Omega F_s t_m} + W_m(\Omega), \tag{3}$$
where $\Omega = \omega/F_s \in [-\pi, \pi]$ is the normalized angular frequency, $Y_m(\Omega)$ is the discrete-time Fourier transform (DTFT) of $y_m[n]$, and $S(\Omega)$ and $W_m(\Omega)$ are the Fourier transforms of $s(t)$ and $w_m(t)$, respectively. Let $\eta_m(x) \in \mathbb{R}$ be the steering time delay function describing the time delay associated with sound propagation from a given location $x$ to $z_m$. In practice, it is commonly modeled as the sound traveling time along the line-of-sight (LOS) path with a constant sound speed $v_s$; i.e.,
$$\eta_m(x) = \frac{\|x - z_m\|}{v_s}, \tag{4}$$
where "$\|\cdot\|$" denotes the Euclidean distance. Note that $\eta_m(x)$ is not exactly the sound propagation time in reality. Then the SRP function, which is defined as the output power of the filter-and-sum beamformer, is given by
$$P(x) = \int_{-\pi}^{\pi} \left| \sum_{m=1}^{M} G_m(\Omega) Y_m(\Omega) e^{j\Omega F_s \eta_m(x)} \right|^2 d\Omega, \tag{5}$$
where $G_m(\Omega)e^{j\Omega F_s \eta_m(x)}$ is the filter associated with the $m$th sensor. It can be equivalently expressed in terms of GCCs [16]:
$$P(x) = \sum_{l=1}^{M} \sum_{m=1}^{M} R_{l,m}\big(\eta_m(x) - \eta_l(x)\big), \tag{6}$$
where
$$R_{l,m}(\tau) = \int_{-\pi}^{\pi} \Psi_{l,m}(\Omega) Y_l(\Omega) Y_m^*(\Omega) e^{-j\Omega F_s \tau} d\Omega \tag{7}$$
denotes the GCC of the sensor pair $\{l, m\}$, $\tau$ is the time lag, the superscript "$(\cdot)^*$" represents the conjugate operation, and $\Psi_{l,m}(\Omega) = G_l(\Omega)G_m^*(\Omega)$ denotes the weight function of the associated GCC. Ideally, each $R_{l,m}(\tau)$ achieves its peak at $\tau = t_m - t_l$, so that the SRP function is supposed to achieve its maximum value at the source position $x_s$, as shown in Figure 1a,b. The Phase Transform (PHAT) weight function is widely used in TDOA- and SRP-based localization applications. The PHAT-weighted GCC is generally referred to as the GCC-PHAT, and the SRP using the GCC-PHAT is generally referred to as the SRP-PHAT.
Removing the irrelevant and repeated terms in Equation (6), the effective component for source localization can be simplified as
$$P'(x) = \sum_{p=1}^{C_M^2} R_p\big(\tau_p(x)\big), \tag{8}$$
where $p$ is the index of the valid sensor pair $c_p = \{l, m\}$ ($l < m$), given by $p = (2M - l)(l - 1)/2 + m - l$ and ranging from one to the combinatorial number $C_M^2$; $R_p(\tau) = R_{l,m}(\tau)$ is the GCC of that pair, and $\tau_p(x) = \eta_m(x) - \eta_l(x)$ is referred to as the steering TDOA function.
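The pair-indexing rule and the steering TDOA function defined above can be checked with a short sketch. The sensor count, the coordinates, and the sound speed of 343 m/s are illustrative assumptions, not values taken from the paper.

```python
import math
from itertools import combinations


def pair_index(l, m, num_sensors):
    """1-based index p of the sensor pair {l, m} with l < m."""
    return (2 * num_sensors - l) * (l - 1) // 2 + (m - l)


def steering_delay(x, z, sound_speed=343.0):
    """Line-of-sight travel time from point x to sensor z, in seconds."""
    return math.dist(x, z) / sound_speed


def steering_tdoa(x, z_l, z_m, sound_speed=343.0):
    """Steering TDOA tau_p(x) = eta_m(x) - eta_l(x) for the pair {l, m}."""
    return steering_delay(x, z_m, sound_speed) - steering_delay(x, z_l, sound_speed)


M = 5  # hypothetical number of sensors
pair_indices = [pair_index(l, m, M) for l, m in combinations(range(1, M + 1), 2)]

# A point equidistant from both sensors, so the steering TDOA vanishes.
tdoa_on_bisector = steering_tdoa((0.0, 7.0), (-10.0, 0.0), (10.0, 0.0))
```

Enumerating the pairs in lexicographic order yields consecutive indices from 1 to $C_M^2$, which is what lets the double sum over $\{l, m\}$ be flattened into a single sum over $p$.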

Problem Formulation
The classical SRP-based localization method often lacks robustness in outdoor scenarios. The steering time delay function $\eta_m(x)$ in the SRP function differs from the real sound propagation time, denoted as $\eta_m^0(x)$, and $\Delta\eta_m(x) = \eta_m(x) - \eta_m^0(x)$ is referred to as the steering time-uncertainty function. Similarly, the steering TDOA-uncertainty function of a sensor pair can be expressed as
$$\Delta\tau_p(x) = \tau_p(x) - \tau_p^0(x) = \Delta\eta_m(x) - \Delta\eta_l(x), \tag{9}$$
where $\tau_p^0(x) = \eta_m^0(x) - \eta_l^0(x)$ represents the real steering TDOA function for a given sensor pair $c_p$. This term is usually negligible within a confined space, so it has rarely been discussed in classical SRP models. However, in outdoor applications, the sound propagation is much less predictable, and the uncertainty grows with distance. The steering time uncertainty is easily influenced by the terrain, temperature, wind, and self-localization error among sensors, and it yields a noticeable defocus effect on the SRP map, as shown in Figure 1c: the GCCs intersect with each other dispersedly around $x_s$ rather than at a single point.
Since the spatial spectrum generated by the SRP function contains many local extrema and ridged areas, the maximal value of $P(x)$ is usually found through a grid-searching process. Consider a uniform sampling grid (USG) in $\mathbb{R}^N$. Define $X_g$ as the set of grid points in the candidate searching region $V \subset \mathbb{R}^N$, and $d_g \in \mathbb{R}$ and $N_g \in \mathbb{N}$ as the grid distance and the total number of grid points in $X_g$, respectively. The estimated on-grid location is then formulated as
$$\hat{x}_s = \arg\max_{x \in X_g} P(x).$$
Note that the localization precision depends on the grid resolution. A more accurate estimate usually requires a smaller $d_g$. This leads to a larger $N_g$ and a significantly increased computational burden, because the number of grid points is inversely proportional to the $N$th power of $d_g$ (i.e., $N_g \propto d_g^{-N}$). Hence, accuracy and feasibility can hardly be balanced in an outdoor WASN system with a large search region, for which the finest grid resolution permitted by the computing power is much coarser than that in indoor applications. However, most SRP approaches work well only at fine grid resolutions, and a coarser grid causes an undersampling effect, as shown in Figure 1d: the searching process is likely to miss the source peak.
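To make the scaling $N_g \propto d_g^{-N}$ concrete, the following sketch counts the grid points of a uniform grid over a 200 m × 200 m area (the area size reported for the field experiment). The three grid distances and the helper name `grid_point_count` are illustrative choices.

```python
import math


def grid_point_count(side_lengths_m, grid_distance_m):
    """Number of uniform grid points covering a box, boundary points included."""
    return math.prod(round(side / grid_distance_m) + 1 for side in side_lengths_m)


area = (200.0, 200.0)  # metres
counts = {d_g: grid_point_count(area, d_g) for d_g in (10.0, 1.0, 0.1)}
# counts == {10.0: 441, 1.0: 40401, 0.1: 4004001}
```

In two dimensions, each tenfold refinement of $d_g$ multiplies the number of SRP evaluations by roughly one hundred, which is why metre-level grids are the practical choice outdoors.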
It is known that the background noise always dominates at low frequencies in the field environment, and real sound sources often show band-pass characteristics. Thus a band-pass GCC is indeed required. However, the SRP-PHAT with a band-pass source would cause a rippling effect [21], as shown in Figure 1e. The rippling effect does not alter the location of the maximal value of the SRP function. However, it may lead to local extrema and even fake peaks such that the SRP spectrum is susceptible to the two other factors and shows a lack of robustness.
Under the combined influence of the above inevitable factors, the real-world SRP output is illustrated in Figure 1f. It shows that classical SRP implementations can hardly cope with all these factors outdoors and yield a divergent localization result.

On-Grid SRP-Based Localization Error Bound Condition
It is known that the SRP-based spatial spectra mainly depend on the phase information of the source components. It is reasonable to assume that the additive noise signals of the sensors are independent of each other and of the source signal, so that the noise has no spatial preference (i.e., it has zero mean in the phase domain). Its contribution to the SRP spectrum can then be neglected and is unrelated to the grid resolution and the steering time uncertainty. Therefore, only the contribution of the source signal is considered in analyzing the SRP function. With the additive noise terms $w_m(t)$ neglected, the weight function $\Psi_p(\Omega)$ of the sensor pair $c_p$ can usually be expressed as
$$\Psi_p(\Omega) = B_p \Psi_0(\Omega), \tag{12}$$
where $B_p \in \mathbb{R}$ is an amplitude-scaling factor independent of the frequency, and $\Psi_0(\Omega) = \Psi_0(-\Omega) \in \mathbb{R}$ is a real function independent of the sensors. Substituting Equation (12) into Equation (7), the GCC $R_p(\tau)$ can be rewritten as
$$R_p(\tau) = \frac{B_p A_l A_m}{C_0} R_0\big(\tau - \tau_p^0(x_s)\big),$$
where $R_0(\tau)$ is the amplitude-normalized version of the weighted self-correlation function of the source signal $s(t)$ and $C_0$ is the associated normalization constant. Hence, each GCC contains the same waveform function $R_0(\tau)$ with different time-shifting factors $\tau_p^0(x_s)$ and amplitude factors $B_p A_l A_m / C_0$. In practice, the range information carried by the amplitude is usually less stable and less accurate than that carried by the time delay. Thus, a normalized mapping function representing the contribution of the source component in the SRP function can be constructed as
$$F_E(x, x_s) = \frac{1}{C_M^2} \sum_{p=1}^{C_M^2} R_0\big(\tau_p(x) - \tau_p^0(x_s)\big). \tag{15}$$
In the above equation, the amplitude factors $B_p A_l A_m / C_0$ of the different sensor pairs are removed, so each pair contributes equally to the SRP function. Note that $F_E(x, x_s) \in [-1, 1]$ has a definite value range regardless of the sensor number $M$.
For a given grid distance $d_g \in \mathbb{R}_{>0}$, an arbitrary uniform sampling grid set in $\mathbb{R}^N$ can be expressed as
$$\mathcal{X}(d_g, x_g^o) = \left\{ x_g^o + d_g n : n \in \mathbb{Z}^N \right\},$$
where $x_g^o \in \mathbb{R}^N$ is the position of the origin of the set. Then the on-grid location estimate is given by
$$\hat{x}_s = \arg\max_{x \in \mathcal{X}(d_g, x_g^o)} F_E(x, x_s).$$
It is worth pointing out that the grid resolution, the steering time uncertainty, and the band-pass issue are all accounted for in the above simplified SRP function.
The grid issue should be unrelated to the origin position x o g . In the real world, the uncertainty functions ∆τ p (x) are hard to closely describe due to many interference factors, and it is reasonable to assume that they have an upper bound ∆τ max (i.e., ∆τ p (x) ≤ ∆τ max ). ∆τ max indicates the steering time delay uncertainty level and can be estimated from the environmental and devices' conditions. Thus, the robustness of the on-grid localization problem can be described as: given a d g and a ∆τ max , there exists a ε ∈ (0, ∞) such that Define a level-passed area based on F E (x, x s ): where α ∈ R is the level-pass threshold. Then a sufficient condition can be obtained in the following Proposition: The proof is given in Appendix A.1. Thus, the robustness of the on-grid source localization problem can be analyzed in terms of M(α, x s ).
A practical example of M(α, x s ) is depicted in Figure 2, and its area shrinks inwards when α increases. The first sub-condition (M(α, can be satisfied when M(α, x s ) covers enough areas. The shape of M(α, x s ) relates to α, R 0 (τ), ∆τ p (x), and sensor distribution, and it is generally irregular. Consider a closed ball B N (x 0 , r) x : |x − x 0 | ≤ r; x 0 , x ∈ R N with center x 0 and radius r. If , then the first sub-condition is satisfied.
A valid R 0 (τ) is an even and bounded function (i.e., R 0 (τ) = R 0 (−τ) and R 0 (τ) ∈ [−1, 1]) and contains a main-lobe around τ = 0, where its maximum a m lies. The maximum side-lobe height (or the maximum value outside the main-lobe area if R 0 (τ) has no sidelobes) can be denoted as a s , where a s < a m .
Let us define a function based on R 0 (τ) by where a T ∈ [a S , a M ] is the level-pass threshold of GCC, " inf{.} represents the infimum. T R (a T ) represents the half-width of the level-passed section of R 0 (τ) within its main-lobe. It follows that R 0 (τ) ≥ a T if and only if τ ∈ (−T R (a T ), T R (a T )). Based on a geometrical analysis in Appendix A.3, if R 0 (τ) possesses the following property: . Therefore, the first sub-condition can be satisfied.  For each sensor pair c p , the solution set of the half hyperbolic equation τ p (x) = τ c can be denoted as Λ p (τ c , 0) and extends to infinity (i.e., there exists an x such that x = ∞ and x ∈ Λ p (τ c , 0) ). For two different sensor pairs c i and c j , if there exist a τ c i ∈ −τ max i , τ max i and a τ c j ∈ −τ max j , τ max j such that Λ i τ c i , 0 ⊆ Λ j τ c j , 0 or Λ i τ c i , 0 Λ j τ c j , 0 , then the half hyperbolic functions τ i (x) = τ c i and τ j (x) = τ c j are not independent. The sense might occur when the sensors of these two pairs are co-linear or have the same axis of symmetry; in the meantime, both τ c i and τ c j reach their extremum or become zero. In WASNs, this case rarely happens because the sensor distributions are often irregular. Despite this sense for all sensor pairs, the maximal value of F E (x, x s ) at infinity does not exceed a linear combination of a m and a s , which is given as The detailed derivation can be found in Appendix A.4. If α > α in f , then M(α, x s ) is bounded.
Combining Inequality (23) and Equation (24) together, a sufficient condition for robust on-grid source localization is given by It means that for a given grid distance d g and steering TDOA uncertainties within ∆τ max , if the GCC waveform function R 0 (τ) has a wide main-lobe satisfying this condition, then the divergent on-grid location estimation can be avoided. The SRP-PHAT generates a sharp GCC to increase the TDOA resolution for cases with reverberation or multiple sources. However, as shown in Figure 3, the band-pass effect would bring a narrow main-lobe section and strong side-lobes to the GCC waveform function. It can hardly satisfy the requirement Inequality (25), which is also shown by the poor performance of SRP-PHAT in Figure 1f. Next, we will introduce a GCC waveform refinement procedure for the band-pass SRP.

Robust SRP-Based Source Localization with Refined GCC Waveform
The condition in Inequality (25) is too strict for band-pass GCC situations with coarse grid resolution and perceptible steering TDOA uncertainties. Some classical GCC methods utilized low-pass filtering to meet a broader main-lobe requirement, but they are not applicable for band-pass signals. In this section, the GCC is refined to obtain a suitable waveform to modify the SRP function.
Consider a complex wavelet function ψ e (τ, Ω C ) = u e (τ)e −jΩ C F s τ , where u e (τ) ∈ L 2 (R) is an even symmetrical function. Applying ψ e (τ, Ω C ) as the filtering function on the GCC-PHAT, the filtered output of c p can be denoted as where R PH AT where U e (Ω) is the Fourier Transform of u e (τ), and if the source is dominant in the frequency band [Ω C − Ω B , Ω C + Ω B ] ⊆ (0, π], then the approximation exists. It can be observed that the approximate function carries the same envelope as u e (τ) and extracts the TDOA information in [Ω C − Ω B , Ω C + Ω B ]. Note that the R CF p (τ, Ω C ) is equal to the time domain approach of the sub-band GCC defined in [30]. Since the main goal is to obtain an equivalent GCC to match the sufficient condition in Inequality (25), a lightweight approach is to average the envelope of those filtered GCCs of multiple sub-bands in high SNR conditions. According to the power spectral density (PSD) of source signal or other prior knowledge, N q valid sub-bands can be selected with individual central frequency Ω q . The final refined GCC is given by which has a specific waveform function R 0 (τ) ≈ |u e (τ)|. Furthermore, the improved spatial function is calculated as The selection u e (τ) has a significant influence on the refinement of GCC. Its envelope |u e (τ)| provides the waveform function of refined GCCs. The suitable envelope of a suitable u e (τ) should have no side-lobes, i.e., |u e (τ 1 )| > |u e (τ 2 )| ≥ 0 for all |τ 1 | < |τ 2 |. Meanwhile, each U e Ω − Ω q in the frequency domain serves as a band-pass filter, thus the spectral distribution of U e (Ω) should be concentrated to satisfy Inequality (27). Gaussian function given by which possesses the required properties both in the time domain and in the frequency domain. Then the corresponding complex filtering function ψ e (τ, Ω C ) can be regarded as a complex Morlet wavelet. 
According to (25), for a given grid distance $d_g$ and steering TDOA uncertainty level $\Delta\tau_{max}$, the parameter $\Omega_d$ can be given by where $N$ is the space dimension and $\alpha$ is the threshold value, which can usually be set as $\alpha = 0.5$. Taking Equation (31) into Inequality (27) and dividing (27) by its right side term, it yields Thus, the relation of $\Omega_d$ and $\Omega_B$ can be obtained by the following equivalent equation: where $c$ is an extremely small number. Then, it can be obtained that where $c_e$ is the positive solution of the following equation:
A simulation is performed to illustrate the effect of the GCC waveform refinement procedure on on-grid SRP-based source localization. As shown in Figure 4, the dot-dashed box shows the range of TDOA within the volume of the nearest grid point $x_g$; the dashed line with "∆" shows the real TDOA, which should coincide with the peak of the GCC; and the dotted line with "∇" marks $R_p(\tau_p(x_g))$, corresponding to the nearest grid point $x_g$. The value $R_p(\tau_p(x_g))$ of the traditional GCC-PHAT is small, which leads to poor performance in grid searching. In contrast, the proposed refining method generates a smooth waveform with high values throughout the TDOA region indicated by the box in the figure.
The modified algorithm with the GCC refinement procedure is shown in Algorithm 1, in which $u_e(\tau) = e^{-(\Omega_d F_s \tau)^2}$ is taken as the target waveform function.
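For the Gaussian target waveform $u_e(\tau) = e^{-(\Omega_d F_s \tau)^2}$ named above, the half-width of the section lying above a level $a_T$ follows directly from solving $u_e(\tau) = a_T$, which gives $\sqrt{\ln(1/a_T)}/(\Omega_d F_s)$. The sketch below compares this closed form with a brute-force scan. The sampling rate matches the 10 kHz reported for the field experiment, while the value of `omega_d` is a hypothetical choice and is not derived from the paper's design rule.

```python
import math


def gaussian_waveform(tau, omega_d, fs):
    """Target waveform u_e(tau) = exp(-(omega_d * fs * tau)^2)."""
    return math.exp(-((omega_d * fs * tau) ** 2))


def half_width_closed_form(level, omega_d, fs):
    """Lag at which the Gaussian waveform falls to `level` (0 < level <= 1)."""
    return math.sqrt(math.log(1.0 / level)) / (omega_d * fs)


def half_width_numeric(waveform, level, tau_max, steps=100_000):
    """Scan outward from tau = 0 and return the first lag where waveform < level."""
    for i in range(steps + 1):
        tau = tau_max * i / steps
        if waveform(tau) < level:
            return tau
    return tau_max


fs = 10_000.0    # Hz
omega_d = 0.01   # hypothetical width parameter (rad/sample)
level = 0.5      # the threshold alpha = 0.5 suggested in the text

closed = half_width_closed_form(level, omega_d, fs)
numeric = half_width_numeric(
    lambda t: gaussian_waveform(t, omega_d, fs), level, tau_max=0.05
)
```

With these assumed values the half-width is about 8.3 ms. Halving `omega_d` doubles it, which is the lever the refinement uses to widen the main lobe for coarser grids or larger steering uncertainty.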

Algorithm 1: SRP with the waveform refinement procedure
Parameter Setting
(1) Set the maximum steering TDOA error $\Delta\tau_{max} = \Delta\tau_{max}^C + \Delta\tau_{max}^S$, where the sub-items $\Delta\tau_{max}^C$ and $\Delta\tau_{max}^S$ are determined by the wind and the synchronization error of the sensors, respectively.
(2) Set the grid distance $d_g$ and the searching region $V$ to meet the system requirement. Then generate the searching grid set $X_g$.
(3) Set the waveform function $u_e(\tau) = e^{-(\Omega_d F_s \tau)^2}$ and $\alpha = 0.5$.
(2) Pick the $N_q$ highest-PSD bands of the source, or divide the passband uniformly.
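The refinement described in this section (filter the GCC with complex Gaussian-windowed kernels in several sub-bands, take the envelope of each output, and average the envelopes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the GCC is a synthetic, noise-free band-pass GCC-PHAT with a flat spectrum over $[0.3\pi, 0.7\pi]$, and the true lag, the three centre frequencies, `omega_d`, and the kernel length are all hypothetical.

```python
import cmath
import math


def refine_gcc(gcc, centre_freqs, omega_d, half_len=40):
    """Average the envelopes of a GCC filtered by complex Morlet-type kernels.

    gcc          : real GCC samples on a uniform lag axis (one value per sample lag)
    centre_freqs : normalized centre frequencies Omega_q in rad/sample
    omega_d      : width parameter of the window u_e[n] = exp(-(omega_d * n)^2)
    half_len     : kernel half-length in samples (window is truncated beyond it)
    """
    n_lags = len(gcc)
    taps = range(-half_len, half_len + 1)
    refined = [0.0] * n_lags
    for omega_q in centre_freqs:
        kernel = {
            j: math.exp(-((omega_d * j) ** 2)) * cmath.exp(-1j * omega_q * j)
            for j in taps
        }
        for n in range(n_lags):
            acc = 0j
            for j in taps:  # direct convolution, zero-padded at the edges
                k = n - j
                if 0 <= k < n_lags:
                    acc += gcc[k] * kernel[j]
            refined[n] += abs(acc) / len(centre_freqs)
    return refined


# Synthetic band-pass GCC-PHAT: flat unit spectrum on [0.3*pi, 0.7*pi],
# peaking at a hypothetical true lag of 17 samples.
max_lag, true_lag = 100, 17
band = [math.pi * (0.3 + 0.4 * i / 63) for i in range(64)]
lags = range(-max_lag, max_lag + 1)
raw = [sum(math.cos(w * (n - true_lag)) for w in band) / len(band) for n in lags]

refined = refine_gcc(raw, [0.4 * math.pi, 0.5 * math.pi, 0.6 * math.pi], omega_d=0.1)
peak_lag = max(lags, key=lambda n: refined[n + max_lag])
```

In this toy case the raw GCC is already negative two samples away from its peak (the rippling effect), whereas the refined GCC stays close to its peak value there, so a grid point whose steering TDOA is slightly off still receives a high score.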

Numerical Simulations
In this section, we use Monte Carlo simulations to analyze the performance of the proposed SRP-based localization method (the SRP functional with the refined waveform, referred to as WR). It is compared with the traditional SRP functional with GCC-PHAT (PS), the SRP functional with the envelope of GCC-PHAT (PES), which is designed for acoustic band-pass signals [21], the modified-SRP (M-SRP) functional with GCC-PHAT (PM) [18], in which the grid resolution is considered, and the M-SRP functional with the envelope of GCC-PHAT (PEM), in which both the band-pass effect and the grid resolution are considered.
We consider four different conditions in WASNs to test the algorithms: (a) a small steering TDOA uncertainty and small grid distance (STSG) condition with ∆τ max = 0.1 ms and d g = 0.1 m, (b) a large steering TDOA uncertainty and small grid distance (LTSG) condition with ∆τ max = 100 ms and d g = 0.1 m, (c) a small steering TDOA uncertainty and large grid distance (STLG) condition with ∆τ max = 0.1 ms and d g = 10 m, and (d) a large steering TDOA uncertainty and large grid distance (LTLG) condition with ∆τ max = 100 ms and d g = 10 m.
The mean absolute error (MAE) $E\{\|\hat{x}_s - x_s\|\}$ of distance and the cumulative distribution function (CDF) of the relative distance error are calculated to evaluate the accuracy and robustness of these algorithms. The relative distance in the CDF is normalized by the grid distance, i.e.,
$$F(e_u) = P\left(\frac{\|\hat{x}_s - x_s\|}{d_g} \le e_u\right),$$
where $e_u$ is the relative positioning error set by the system requirement. Specifically, the 95th percentile of the localization error in meters is computed as $F^{-1}(0.95) \cdot d_g$. The MAE and 95th percentile results are listed in Table 1. All the localization algorithms obtain their best estimation accuracy in the STSG condition, in which the defocus effect and the undersampling effect are slight. When the steering TDOA uncertainty or the grid distance increases, the MAE increases. However, compared with the PS, PES, PM, and PEM methods, the WR achieves the smallest or nearly the smallest MAE because all these factors have been considered. The 95th percentile shows a similar trend to the MAE, which indicates that the proposed WR method has a stable localization performance in outdoor conditions. Figure 5a-d depict the CDF of each algorithm in the range $e_u \in [0.5, 100\,\mathrm{m}/d_g]$ under the four conditions. The CDF curves rise rapidly with the location error in the most favorable condition, so the estimation errors of all the algorithms are smallest in the STSG. The CDF curves move down as the grid distance $d_g$ and the steering TDOA uncertainty $\Delta\tau_{max}$ increase, as in the LTSG, STLG, and LTLG. Since the steering TDOA uncertainty is not considered in the PES and PEM, their CDF curves drop less in the STLG than in the LTSG. Among these localization algorithms, the CDF of the WR is the highest or very close to the highest (STLG), and the PEM method is better than the PS, PES, and PM. The proposed WR method remains robust even when the conditions become harsh.
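The evaluation metrics described above can be computed from a list of position estimates as in the following sketch. The function names and the four toy estimates are illustrative assumptions. The 95th percentile is taken directly on the errors in metres, which equals $F^{-1}(0.95) \cdot d_g$ because the normalization by $d_g$ cancels.

```python
import math


def localization_errors(estimates, truths):
    """Euclidean distance between each estimate and its true position (metres)."""
    return [math.dist(e, t) for e, t in zip(estimates, truths)]


def mean_absolute_error(errors):
    return sum(errors) / len(errors)


def relative_error_cdf(errors, grid_distance, e_u):
    """Empirical F(e_u): share of trials whose error / d_g does not exceed e_u."""
    return sum(err / grid_distance <= e_u for err in errors) / len(errors)


def percentile_95(errors):
    """Smallest observed error whose empirical CDF value is at least 0.95."""
    ordered = sorted(errors)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


# Toy data: four trials with the true source at the origin.
errors = localization_errors(
    [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0), (0.0, 1.0)],
    [(0.0, 0.0)] * 4,
)
mae = mean_absolute_error(errors)                          # 4.0 m
cdf_half_grid = relative_error_cdf(errors, 10.0, e_u=0.5)  # 0.75
p95 = percentile_95(errors)                                # 10.0 m
```

Normalizing the CDF argument by $d_g$ makes curves obtained at different grid distances comparable: $e_u = 0.5$ corresponds to an error of half a grid cell in every condition.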
Furthermore, Figure 6 presents the MAE in four situations: (a) fixed small steering TDOA uncertainty (ST) with ∆τ max = 0.1 ms, d g ranges from 0.1 m to 50 m; (b) fixed large steering TDOA uncertainty level (LT) with ∆τ max = 100 ms, d g ranges from 0.1 m to 50 m; (c) fixed small grid distance (SG) with d g = 0.1 m, ∆τ max range from 0.1 ms to 100 ms; (d) fixed large grid distance (LG) with d g = 10 m, ∆τ max range from 0.1 ms to 100 ms. The MAE increases with d g or ∆τ max significantly, and this indicates that the steering TDOA uncertainty and grid distance have a severe influence on the performance of source localization. In each situation, the PS and PM produce larger MAE than the other algorithms when d g and ∆τ max are small because they are not applied to band-pass signals. Since the scalable grid sampling and steering TDOA uncertainty are not considered in the PES, it shows reliable performance only when d g ≤ 1 m and ∆τ max ≤ 1 ms. The PEM considered both grid size and band-pass effect; thus, it achieves the best performance in the small ∆τ max case. However, the MAE becomes worse when the influence caused by the steering TDOA uncertainties is more significant than by the grid size. The WR obtains the MAE close to the PEM when ∆τ max is small. Moreover, it is the smallest in all the other situations. These results abundantly demonstrate its excellent robust performance.

Field Experiment
In this experiment, seven nodes are distributed in a park, as shown in Figure 7a,b. Each node consists of a microphone sensor, a Wi-Fi module, and a GPS module for self-localization and time calibration. The monitoring area is likewise 200 m × 200 m and additionally contains a hillock. A portable speaker plays three types of sound signals at 12 positions inside the area: a Gaussian signal (S-G), vehicle whistles (S-V) representing an urban source, and birdsong (S-B) representing a field source. The temperature was approximately 30 °C, and the wind speed was below 3 m/s. Therefore, in the proposed method ∆τ max is set to 10 ms, which accounts for both the self-localization error of the sensors and the effect of wind.
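As a rough plausibility check of the 10 ms setting, the delay error caused by wind alone can be estimated with a very simple model. This is not the authors' derivation: it assumes a steady wind blowing straight along the propagation path, the common linear approximation of the speed of sound in air, and a hypothetical 200 m path (one side of the monitoring area).

```python
def sound_speed(temp_celsius):
    """Common linear approximation of the speed of sound in air (m/s)."""
    return 331.3 + 0.606 * temp_celsius


def delay_error_from_wind(distance_m, temp_celsius, wind_speed):
    """Extra travel time when sound propagates against a steady headwind (s)."""
    c = sound_speed(temp_celsius)
    return distance_m / (c - wind_speed) - distance_m / c


per_path = delay_error_from_wind(200.0, 30.0, 3.0)  # about 5 ms
# Worst case for a sensor pair under this model: the two paths err in
# opposite directions, so the TDOA error is bounded by twice the path error.
pair_bound = 2.0 * per_path
```

Under these assumptions the wind contributes roughly 5 ms per path and just under 10 ms per sensor pair, which is of the same order as the value chosen in the experiment. Longer paths, gusts, or self-localization errors would add to this figure.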
The sampling frequency is 10,000 Hz and Figure 7c shows the PSDs of both the background noise and received source signals, which are obtained with the Burg method of 50 order number and 2048 FFT length. The PSDs of the source signals are collected at about 30 m away from the speaker. Because the environmental noise is mainly distributed in the frequency bands below 1500 Hz, the passband is set to be (1500 Hz, 3500 Hz) for all sources. The estimated SNRs are shown in Figure 7d, and the SNRs of the full band (0, 5000 Hz) and of the passband (1500 Hz, 3500 Hz) are plotted in solid lines and dashed lines, respectively. For the three source types, the SNR is improved by 20 dB∼30 dB. The recorded data are divided into 1242 two-second audio frames. SRP algorithms with full-band and band-pass cross-correlation (referred to as CSF and CSB) are added to analyze the necessity of band-pass signals. The PS and PM are not included since they have been proven unreliable in the simulation. Then the candidate SRP-based locators compared in this sub-section include: (1) SRP with full-band GCC (CSF), (2) SRP with band-pass GCC (CSB), (3) SRP with the envelope of band-pass GCC-PHAT (PES), (4) MSRP with the envelope of band-pass GCC-PHAT (PEM) and (5) WR-SRP with band-pass GCC (WR). A well known TDOA-based localization method [13] (referred to as TC) is also compared as a reference in which the TDOAs are obtained by band-pass GCC-PHATs.
The MAE and the 95th percentile of the localization errors of the TC method and the SRP-based methods with different grid distances (d g ∈ {0.1, 1, 10} m) are listed in Table 2. Moreover the MAEs with grid distance d g ranging from 0.1 m to 50 m are presented in Figure 8a. Figure 8b-d give the CDF curves at the three grid distances (d g ∈ {0.1, 1, 10} m).
As in the simulation, the MAEs increase and the CDF curves move down as the grid distance increases. The MAE of the TC method is the highest because some sensor pairs may produce severely erroneous TDOA measurements in noisy acoustic environments. Its CDF curve also shows that the solution is not stable. Comparing the results of the CSF and CSB shows that the band-pass GCC significantly enhances the SNR and the localization performance. The PES and PEM yield larger localization errors and lack robustness, which indicates that the influence of the steering TDOA uncertainty is considerable. The proposed WR method achieves the best estimation at all the grid distances, which verifies its effectiveness.

Conclusions
In this work, a novel and robust Steered Response Power (SRP)-based source localization approach is proposed to localize band-pass sources in outdoor WASNs with steering time delay uncertainty and coarse spatial grids. The robustness of on-grid source localization is analyzed through a sufficient condition that relates the GCC waveform to the on-grid localization error. A band-pass GCC refinement procedure is designed to meet the sufficient condition and thereby enhance the on-grid source localization performance. The Monte Carlo simulation and the field experiment show that the proposed method performs robustly in outdoor WASN scenarios compared with several state-of-the-art SRP-based methods.

Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://1drv.ms/u/s!AskSoQGpB3VUgfIqsxtYhosVrGyzOg?e=pnfutC.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Therefore, we can find the grid point $x_g^n = x_g^o + [n_1^o d_g, \ldots, n_N^o d_g]^T \in \mathcal{X}(d_g, x_g^o)$, so that $x_o - x_g^n = [\Delta x_1^o - n_1^o d_g, \ldots, \Delta x_N^o - n_N^o d_g]^T$. The distance yields
$$\|x_o - x_g^n\| = \sqrt{\sum_{i=1}^{N} \left(\Delta x_i^o - n_i^o d_g\right)^2} \le \frac{\sqrt{N} d_g}{2}.$$
Thus, if $r \ge \sqrt{N} d_g / 2$, then $x_g^n \in B_N(x_o, r)$. Hence, $\mathcal{X}(d_g, x_g^o) \cap B_N(x_o, r) \neq \emptyset$ holds.
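The statement that a closed ball of radius $\sqrt{N} d_g / 2$ always contains a grid point can be checked numerically by rounding each coordinate to its nearest grid line. The dimension, grid distance, grid origin, and sampling range below are arbitrary illustrative values.

```python
import math
import random


def nearest_grid_point(x, origin, d_g):
    """Round each coordinate of x to the nearest point of the uniform grid."""
    return tuple(o + round((xi - o) / d_g) * d_g for xi, o in zip(x, origin))


random.seed(0)
N, d_g = 3, 10.0
origin = (1.5, -2.0, 0.3)          # arbitrary grid origin
bound = math.sqrt(N) * d_g / 2     # radius claimed to suffice

worst = 0.0
for _ in range(10_000):
    x = tuple(random.uniform(-500.0, 500.0) for _ in range(N))
    worst = max(worst, math.dist(x, nearest_grid_point(x, origin, d_g)))
```

Each coordinate is off by at most $d_g/2$ after rounding, so the largest distance observed over the random points stays below the bound, independently of where the grid origin is placed.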
Proof of Proposition A4. For a spatial point $x$ such that $\|x\| = \infty$, let $K \in \mathbb{N}$ be the total number of sensor pairs $c_p$ such that $x \in \Lambda_p\big(\tau_p^0(x_s), T_R(a_s)\big)$. According to Equation (15) and Inequality (22), it follows that
$$F_E(x, x_s) \le \frac{K a_m + (C_M^2 - K) a_s}{C_M^2}. \tag{A1}$$
If $K \ge C_N^2 + 1$, there exists a collection of $N$ linearly independent sensor pairs among those $C_N^2 + 1$ sensor pairs. Without loss of generality, denote this collection as $\{c_1, \ldots, c_N\}$. Then for each $x_d \in \bigcap_{p=1}^{N} \Lambda_p\big(\tau_p^0(x_s), T_R(a_s)\big)$, there exists an equation set such that
$$\tau_p(x_d) = \tau_{c_p}, \quad p = 1, \ldots, N,$$
where $\tau_{c_p} \in \big[\tau_p^0(x_s) - T_R(a_s),\ \tau_p^0(x_s) + T_R(a_s)\big]$. According to the condition of Proposition A4, and since the sensor pairs are all linearly independent, these $N$ equations are linearly independent. Then it holds that $\|x_d\| \neq \infty$, which contradicts $\|x\| = \infty$. Thus $K \le C_N^2$. According to Inequality (A1), it is easily obtained that $F_E(x, x_s) \le \big(C_N^2 a_m + (C_M^2 - C_N^2) a_s\big)/C_M^2$.