An Outlier Detection Method Based on Mahalanobis Distance for Source Localization

This paper addresses the problem of localization accuracy degradation caused by outliers of the angle of arrival (AOA). The problem of outlier detection of the AOA is converted into the detection of the estimated source position sets, which are obtained by the proposed division and greedy replacement method. The Mahalanobis distance based on robust mean and covariance matrix estimation method is then introduced to identify the outliers from the position sets. Finally, the weighted least squares method based on the reliable probabilities and distances is proposed for source localization. The simulation and experimental results show that the proposed method outperforms representative methods when unreliable AOAs are present.


Introduction
The source localization techniques based on the angle of arrival (AOA) estimate the target position using a set of estimated bearings. Various methods have been proposed to solve the localization problem [1][2][3], among which the closed-form method of the pseudolinear estimator (PLE) was proposed under the assumption that AOA errors are small [4]. Although the PLE method is easy to implement and is efficient to compute, it is sensitive to outliers (AOAs with large errors), and such outliers (also referred to as unreliable AOAs) may exist in many practical applications of distributed node networks, which typically consist of a large number of small, low-cost sensor nodes. When nodes are deployed in harsh and unattended environments, animal attack or other forms of interference may occur. Moreover, low-cost nodes have limited amounts of power, computational, and memory capacity, and these limitations may also cause outliers. Other factors, such as node failures, data loss, and non-line of sight (NLOS) propagation [5,6], can lead to unreliable measurements. As a result, the estimated AOAs at each node will deviate significantly from the true values. Such outliers have been found to be detrimental to the PLE [7,8]. Thus, it is important to identify these erroneous data to improve the localization performance or perform a repair of the data.
To reduce the error induced by outliers in node networks, several hybrid localization methods have been proposed that combine the AOA with the time difference of arrival (TDOA) and received signal strength (RSS) to identify and mitigate the NLOS error [9]. The expectation maximization (EM) method is introduced to identify unreliable AOAs caused by NLOS [10]. The intersection points (IPs)-based method [11] calculates the source position by taking the centroid of the set of intersections obtained by pairs of bearing lines; however, this method cannot significantly improve the localization performance, even after eliminating the IPs obtained from two nearly parallel bearing lines. The unreliable AOA detection method proposed in [7] can improve the localization accuracy; however, many threshold parameters need to be set. The steered-response power phase transform (SRP-PHAT) [12] source localization approaches have demonstrated robustness when operating in reverberant and noisy environments. However, these methods require considerably more information to be transmitted to the central processing node and are not applicable to large-range localization scenarios (e.g., hundreds or thousands of meters). In this work, every node is equipped with a microphone array to estimate the AOA and then transmits the estimated AOA to the central node. This method does not require time synchronization across nodes. Note that AOA estimation methods under complex environmental noise are outside the scope of this paper; interested readers are referred to [13][14][15]. These robust AOA estimation methods are proposed under the assumption that only a small portion of the snapshots is contaminated; they can perform well for continuous source signals or impulsive interference noise, which influence only a limited number of snapshots.
However, with other causes that can last a period of time, such as sensor failures and non-line of sight (NLOS) propagation, all snapshots for one node are unreliable; thus, outliers that may deteriorate the localization performance are still present even when these methods are applied to estimate AOAs under complex noise. Therefore, the outlier detection for the AOA is still necessary to improve the localization accuracy.
Here, we propose a robust localization method for the case in which outliers are present. A large number of positions can be obtained by different node combinations. The maximum number of estimated positions is N × (N − 1)/2 for an N-node network. However, the estimated positions are sensitive to the bearing lines and their differences. Deleting the outliers from all intersections alone cannot significantly improve the localization performance [11]. To increase the estimated position reliability and improve the detection accuracy, we propose the division and greedy replacement (DIG) method to obtain different estimated position sets by changing one node at a time. The robust estimation of the mean and covariance matrix of the estimated position sets is then addressed to provide the information for outlier detection. The Mahalanobis distance (MD) [16] is then used to identify the outliers from the estimated position sets. Finally, the weighted least squares (WLS) method based on detected reliable probabilities and distances is used to estimate the source position. The proposed method is easy to implement and can be easily extended to a three-dimensional (3D) source localization method. The main contributions of this paper can be summarized as follows:
• The division and greedy replacement (DIG) method is developed to estimate the target positions.
• The Mahalanobis distance based on robust estimation of mean and covariance matrix is proposed to detect the outliers from estimated source positions.
• An improved WLS localization method based on reliable probabilities and distances is introduced.
• Outdoor experiments are conducted to verify the proposed method.
The remainder of this paper is organized as follows. Section 2 describes the AOA localization method and addresses the existing problem. The unreliable node detection method is proposed in Section 3. Simulations and experimental results are presented in Sections 4 and 5, respectively. Finally, our work is summarized in Section 6.
The set of measurements from N nodes can be written in matrix form, and the pseudolinear estimator (PLE), also known as the orthogonal vectors (OV) estimator, can be used to estimate the source position [4]:

$$\mathbf{A}\mathbf{p} = \mathbf{B} + \mathbf{e},$$

where the estimated source position is as follows:

$$\hat{\mathbf{p}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{B},$$

where the k-th rows of the matrices A and B are $\mathbf{A}(k,:) = [\sin\theta_k \;\; -\cos\theta_k]$ and $\mathbf{B}(k,:) = x_k\sin\theta_k - y_k\cos\theta_k$, $k = 1, 2, \ldots, N$, and $\mathbf{e} = [r_1\sin\eta_1 \;\; r_2\sin\eta_2 \;\; \ldots \;\; r_N\sin\eta_N]^T$,
where r k is the distance between the source and node s k .
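As an illustration of the closed-form PLE, the following is a minimal sketch in plain Python. It assumes the row convention A(k,:) = [sin θ_k, −cos θ_k], which is the sign convention consistent with B(k,:) = x_k sin θ_k − y_k cos θ_k, and it solves the 2 × 2 normal equations directly. The function name `ple` and the example geometry are hypothetical and are not taken from the paper.

```python
import math

def ple(nodes, bearings):
    """Pseudolinear estimate of a 2D source position.

    nodes    : list of (x_k, y_k) node coordinates
    bearings : list of measured AOAs theta_k in radians
    Solves (A^T A) p = A^T B with A(k,:) = [sin(theta_k), -cos(theta_k)]
    and B(k) = x_k sin(theta_k) - y_k cos(theta_k).
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (x, y), th in zip(nodes, bearings):
        a1, a2 = math.sin(th), -math.cos(th)
        b = x * a1 + y * a2          # x_k sin(theta_k) - y_k cos(theta_k)
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("bearing lines are (nearly) parallel")
    # Cramer's rule for the symmetric 2x2 system
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Hypothetical example: three nodes and noise-free bearings to a source at (3, 4)
nodes = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
source = (3.0, 4.0)
bearings = [math.atan2(source[1] - y, source[0] - x) for x, y in nodes]
estimate = ple(nodes, bearings)
```

With noise-free bearings every row of the system is satisfied exactly, so the estimate coincides with the source; with noisy bearings the residual of row k is r_k sin η_k, as in the error vector e.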

Problem Formulation
The PLE is easy to implement, even for large-scale data. However, the PLE is sensitive to unreliable measurements (i.e., outliers). In this section, we use theoretical analysis and simulation results to illustrate this problem.
If the measurement error is sufficiently small, then we have $\sin\eta_k \approx \eta_k$. Thus, the approximation of the residuals of Equation (6) can be expressed as $e_k \approx r_k\eta_k$. The estimation error of the source position, $\Delta\mathbf{p}$, can be expressed as in [8] (Equation (7)).

Sensors 2018, 18

The covariance matrix of Equation (7) can then be obtained, and the mean-square error (MSE) is given by its trace:

$$\mathrm{MSE} = \mathrm{tr}\left[\mathrm{cov}(\Delta\mathbf{p})\right]. \qquad (9)$$

Substituting Equations (6) and (7) into Equation (10) gives an expression in which S is defined as all the combinations of {i, j} with j > i, $f_{11} = \sum_{i\in s}\sin^2\theta_i$, $f_{22} = \sum_{i\in s}\cos^2\theta_i$, the cross term is formed from $\cos\theta_i\sin\theta_i$, and $\sigma_{e_i}^2 = E(\mathbf{e}\mathbf{e}^T)$. We can see from Equation (10) that the MSE is affected by the relative geometry between the source and the nodes, the number of nodes, and the AOA measurement errors.

To illustrate the effect of outliers, we conducted several simulations to analyze the characteristics of the localization error for different source positions. The source is assumed to be located at gridded points ranging from −10 m to 10 m in a 20 × 20 m² area with a resolution of 0.5 m. Four nodes, $s_1$, $s_2$, $s_3$, and $s_4$, are randomly deployed in the test area, as shown in Figure 1. The root-mean-square error (RMSE) of the PLE over 500 trials for every target position is used as the performance metric.
The RMSEs for different source positions are shown in Figure 2a. It is clear that the localization errors are lower when the source is surrounded by the nodes than when it lies outside them. This conclusion agrees with the analysis based on the Cramer-Rao lower bound (CRLB) in [17].

Assume that the unreliable node $s_1$ is subject to large zero-mean noise with $\sigma_1 = 10^\circ$, and that $\sigma_2 = \sigma_3 = \sigma_4 = 1^\circ$. The resulting RMSEs are plotted in Figure 2b. When the source is close to the unreliable node, the localization accuracy is not significantly deteriorated. However, the RMSEs increase significantly when the source is far from the unreliable node.
From Equation (10), we can see that for the same AOA estimation error σ i , the MSE is mainly influenced by the distance r k between the source and the node s k .
To demonstrate the importance of detecting unreliable nodes, the RMSEs of the estimated positions obtained from the four nodes with one being unreliable are compared with the RMSEs when only three reliable observations are used for the source located at p = [3, 0] T m. As shown in Table 1, the localization errors obtained using only three reliable nodes are significantly lower than those obtained with four nodes, one of which is unreliable. Therefore, it is necessary to detect the unreliable nodes and then remove them to improve the localization accuracy.

The DIG_MD Method
We know that at least two nonparallel bearing lines are required to estimate an IP, and the maximum number of IPs for N nodes is N(N − 1)/2. Excluding the case of parallel bearing lines, all IPs are expected to be close to each other and to surround the true source position when the bearings are only subjected to low-level environmental noise. In contrast, bearings corrupted by large noise will place the IPs far from the source position. As shown in Figure 3, the IPs obtained from $s_1$ are obviously far from the other IPs. Therefore, we can identify the unreliable AOAs by detecting outliers from the estimated target positions. However, there are too many intersections to calculate for large-scale node networks if only two bearings are used. Moreover, the IPs are easily affected by the error of either bearing and by the angular separation between the two bearings [11]. For example, it is easy to cause a false alarm if $s_6$ is determined to be unreliable when $p_{16}$ and $p_{36}$ are detected as outliers. To solve these problems, the division and greedy replacement (DIG) method is proposed here to improve the stability of the estimated positions. The two-dimensional (2D) outlier detection method is then used to find the unreliable bearings. Finally, the WLS based on the detected reliable probabilities and the distances from the initial position to the nodes is used to perform the localization. The procedure is given in the following subsections.

The Division and Greedy Replacement (DIG) Method
To detect outliers among the AOAs from estimated source positions, a set of position estimates is needed, each calculated from a fixed number of nodes and differing from the others in only one node. Thus, every estimated position corresponds to a unique node. In this paper, we propose to divide all nodes into two sets; greedy replacement is then used to obtain different combinations of a fixed number of nodes that differ in one node.

(1) Division: In this section, the two separate sets are defined as the reference node set ($\Omega_{ref}$) and the replacement node set ($\Omega_{rep}$), with sizes m and N − m, respectively. Here, for ease of explanation, we assume that the reference nodes are indexed from 1 to m. Thus, $\Omega_{ref}$ and $\Omega_{rep}$ can be denoted as $\Omega_{ref} = \{s_1, s_2, \ldots, s_m\}$ (m ≥ 3) and $\Omega_{rep} = \{s_{m+1}, s_{m+2}, \ldots, s_N\}$, respectively. Algorithm 1 presents the selection method of reference nodes.

Algorithm 1. Selection method of reference nodes.
(1) Estimate the initial source position p by the PLE based on all measurements;
(2) Calculate the distances from p to all nodes;
(3) Select m nodes that have short distances and can form a convex polygon with the target inside.
The performance analysis in [17] shows that the nodes that are close to the target are dominant in the localization results and that the localization error for a target inside a convex polygon composed of multiple nodes is smaller than that for a target outside it. We therefore propose to use as reference nodes the nodes that form a convex polygon with the target inside and have short distances to the source, as shown in step 3; thus, no fewer than three nodes should be selected as the reference nodes. As the true position is unknown, an initial position obtained from all the measurements can be used to evaluate the distances, as stated in steps 1 and 2. As shown in Figure 3, p is the initial position calculated from six measurements; $s_1$, $s_2$, and $s_3$ are closest to p, and p is inside the convex polygon formed by the three nodes. Thus, $s_1$, $s_2$, and $s_3$ are selected as reference nodes (i.e., $\Omega_{ref} = \{s_1, s_2, s_3\}$ when m = 3). The details of the division method are summarized in Algorithm 1.
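The selection of reference nodes in Algorithm 1 can be sketched as follows. This is only one possible reading: the paper does not state how "short distances" and the convex-polygon condition are traded off, so the sketch assumes an exhaustive search over all m-node subsets and picks the subset with the smallest total distance to the initial position whose convex hull contains that position. The containment check uses the fact that a point lies in the convex hull of a planar point set exactly when it lies in a triangle formed by three of the points. The names `select_reference_nodes` and `in_triangle` are hypothetical.

```python
import math
from itertools import combinations

def _side(a, b, p):
    """Signed area test: > 0 if p is to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = _side(a, b, p), _side(b, c, p), _side(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def select_reference_nodes(nodes, p_init, m=3):
    """Indices of m nodes that enclose p_init and are closest to it in total.

    Assumption of this sketch: closeness is measured by the sum of the
    distances from p_init to the chosen nodes. Returns None if no m-node
    subset encloses p_init.
    """
    dist = [math.hypot(x - p_init[0], y - p_init[1]) for x, y in nodes]
    best = None
    for combo in combinations(range(len(nodes)), m):
        inside = any(
            in_triangle(p_init, *(nodes[i] for i in tri))
            for tri in combinations(combo, 3)
        )
        if inside:
            cost = sum(dist[i] for i in combo)
            if best is None or cost < best[0]:
                best = (cost, combo)
    return None if best is None else list(best[1])
```

The exhaustive search evaluates every m-node subset, which is acceptable for small m and moderate N; a larger network would call for a cheaper heuristic.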
To identify the unreliable bearings by detecting outliers from a set of estimated positions, we obtain the positions by changing only one node at a time. As noted above, the localization error is sensitive to the bearing error of the nodes relatively far from the source. Thus, we design the greedy replacement method by using each node in $\Omega_{rep}$ to replace one of those in $\Omega_{ref}$. The procedure is given by Algorithm 2. Every node in $\Omega_{ref}$ is replaced in turn by each of the (N − m) nodes from $\Omega_{rep}$. As a result, m sets with (N − m) positions in each set are obtained, and every position is calculated from m nodes, (m − 1) of which are shared nodes from $\Omega_{ref}$. For this method, the position sets require $m(N - m) = -m^2 + mN$ position estimates. In contrast, N(N − 1)/2 estimates are required if all IPs are calculated. Because $m(N - m) \le N^2/4 < N(N - 1)/2$ for N > 2, the DIG method generally requires fewer position estimates than the IP method.
In this paper, X ∪ Y and X − Y denote the union and difference of sets X and Y, respectively; A(Ω) represents the matrix A in Equation (4) calculated based on the nodes from set Ω; and P k (j, :) is the j-th row of P k . Each element p k,j of P k is the estimated position using s j , j ∈ {m + 1, . . . , N} to replace s k , k ∈ {1, . . . , m}.
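Since Algorithm 2 itself is described only in prose here, the greedy replacement can be sketched as follows under that description: for each reference node s_k, every replacement node s_j takes its place in turn, and the PLE is applied to the resulting m-node set (Ω_ref − {s_k}) ∪ {s_j}. The function names `ple` and `dig_position_sets`, the zero-based indexing, and the dictionary layout are assumptions of this sketch, and the row convention A(k,:) = [sin θ_k, −cos θ_k] is assumed for the PLE.

```python
import math

def ple(nodes, bearings):
    """Closed-form pseudolinear estimate from node positions and AOAs (radians)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (x, y), th in zip(nodes, bearings):
        a1, a2 = math.sin(th), -math.cos(th)
        b = x * a1 + y * a2
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

def dig_position_sets(nodes, bearings, ref_idx):
    """Return {k: {j: p_kj}}, where replacement node j takes the place of
    reference node k and p_kj is the PLE position of the resulting m nodes."""
    rep_idx = [j for j in range(len(nodes)) if j not in ref_idx]
    sets = {}
    for k in ref_idx:
        kept = [i for i in ref_idx if i != k]   # the (m - 1) shared reference nodes
        sets[k] = {}
        for j in rep_idx:
            use = kept + [j]
            sets[k][j] = ple([nodes[i] for i in use], [bearings[i] for i in use])
    return sets
```

The result holds m sets of (N − m) positions each, that is, m(N − m) PLE evaluations in total. A replacement node with an outlying bearing shifts its entry in every set, which is what the subsequent outlier detection exploits.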

Outlier Detection Method for Estimated Target Position Sets
All the position elements in $\mathbf{P}_k = [\mathbf{p}_{k,m+1}, \mathbf{p}_{k,m+2}, \ldots, \mathbf{p}_{k,N}]^T$, k = 1, 2, ..., m, should be close to each other under the assumption that all the nodes are reliable. Outlier positions are therefore attributed to unreliable nodes. For the 2D source localization problem, the elements in $\mathbf{P}_k$ are identically distributed 2D random vectors with mean $\boldsymbol{\mu}_k$ and a positive-definite covariance matrix $\boldsymbol{\Sigma}_k$. To identify the unreliable nodes in set $\Omega_{rep}$, the square of the Mahalanobis distance (MD) [18][19][20], formulated in Equation (11), is proposed to detect outliers from the position matrix $\mathbf{P}_k$ as follows:

$$d_{k,j}^2 = (\mathbf{p}_{k,j} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{p}_{k,j} - \boldsymbol{\mu}_k). \qquad (11)$$

In statistics, the MD is typically used to characterize how far a particular datum is from the center. A point with a distance greater than a predetermined threshold is assumed to be an outlier. The outlier detection problem in this work is a 2D data detection problem. Therefore, the robust estimation of $\boldsymbol{\Sigma}_k$ and $\boldsymbol{\mu}_k$ is important for robust outlier detection. The outlier detection method for one position set $\mathbf{P}_k$ is given by Algorithm 3.
To better explain Algorithm 3, let us first recall the Gnanadesikan-Kettenring (GK) estimator [18], which provides a reasonable relationship between variance and covariance. Assume that V is the covariance matrix of the L-dimensional random vector x and that σ(·) represents the standard deviation; thus, we have

$$\sigma^2(\mathbf{c}^T\mathbf{x}) = \mathbf{c}^T\mathbf{V}\mathbf{c} \qquad (12)$$

for all $\mathbf{c} \in \mathbb{R}^L$. The GK estimator can be formulated as the following:

$$\mathrm{cov}(x, y) = \frac{1}{4}\left[\sigma^2(x + y) - \sigma^2(x - y)\right],$$

where x and y are a pair of random vectors.
Algorithm 3. Outlier detection method from $\mathbf{P}_k$.
Step 6. If $d_{k,j}^2 > d_{k0}^2$, $\mathbf{p}_{k,j}$ is an outlier, and the unreliable probability for $s_j$ is 1/m; otherwise, it is 0.
In Algorithm 3, med(·) represents the median value, $\chi_p^2(\alpha)$ is the α-quantile of the chi-squared distribution with p degrees of freedom, diag(·) is the diagonal matrix, and σ(·) and µ(·) denote the univariate standard deviation and average value, respectively. $c_1$ and $c_2$ are constants. $\hat{\boldsymbol{\Sigma}}_k$ and $\hat{\boldsymbol{\mu}}_k$ are the estimates of $\boldsymbol{\Sigma}_k$ and $\boldsymbol{\mu}_k$.
Steps 1-4 in Algorithm 3 provide a method to obtain the positive-definite and approximately equal-variant covariance matrix $\hat{\boldsymbol{\Sigma}}_k$ for high-dimensional scatter datasets with much shorter computing times [19]. The first step in Algorithm 3 makes the position vector scale-equivariant for different dimensions. Then, the GK estimator is used to calculate the covariance matrix Ψ in step 2. However, Ψ is symmetric but not necessarily positive semidefinite, so it cannot satisfy the requirement of positive definiteness of $\boldsymbol{\Sigma}_k$ [20]. Because the eigenvalues of a covariance matrix can be seen as the variances along the directions of the respective eigenvectors, the eigenvalue decomposition is performed to find the eigenvalues and eigenvectors in step 3. A modification is then made in step 4 by using the positive robust variances calculated by Equation (12) to replace the eigenvalues, which may be negative [21], to obtain the positive diagonal covariance matrix Γ. Then, Γ is used instead of Λ to estimate the positive-definite covariance matrix $\hat{\boldsymbol{\Sigma}}_k$. It has been proven in [22] that there exist constants $c_1$ and $c_2$ such that the true $\boldsymbol{\Sigma}_k$ and $\boldsymbol{\mu}_k$ can be approximated by the scaled estimates, that is, $\hat{\boldsymbol{\Sigma}}_k \leftarrow c_1\hat{\boldsymbol{\Sigma}}_k$ and $\hat{\boldsymbol{\mu}}_k \leftarrow c_2\hat{\boldsymbol{\mu}}_k$. For the classical fast minimum covariance determinant (FASTMCD) method [23], a specific value of $c_1$ is used together with $c_2 = 1$. Once $\hat{\boldsymbol{\mu}}_k$ and $\hat{\boldsymbol{\Sigma}}_k$ are obtained in step 4, the MD for every position vector can be calculated according to Equation (11), which can be rewritten as follows:

$$d_{k,j}^2 = (\mathbf{p}_{k,j} - \hat{\boldsymbol{\mu}}_k)^T \hat{\boldsymbol{\Sigma}}_k^{-1} (\mathbf{p}_{k,j} - \hat{\boldsymbol{\mu}}_k).$$

Thus, the outliers are identified by comparing the squared MDs with the threshold $d_{k0}^2$ obtained in step 5. The choice of the threshold is based on the fact that, when the positions in $\mathbf{P}_k$ follow $N(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, the squared MD $d_{k,j}^2$ is distributed as a χ² random variable with 2 degrees of freedom [22].
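The six steps described above can be sketched for 2D positions as follows. Because the individual steps of Algorithm 3 are not fully reproduced in the text, several choices here are assumptions rather than the authors' settings: the median and a MAD-based scale (1.4826 × MAD) stand in for the robust µ(·) and σ(·), and the threshold uses a median calibration, $d_{k0}^2 = \chi_2^2(\alpha)\,\mathrm{med}(d^2)/\chi_2^2(0.5)$, of the kind used with GK-type robust estimators. The names `robust_loc_scale`, `chi2_quantile_2dof`, and `detect_outliers` are hypothetical. For 2 degrees of freedom the chi-squared quantile has the closed form −2 ln(1 − α).

```python
import math
from statistics import median

def robust_loc_scale(values):
    """Median and MAD-based scale (assumed stand-ins for robust mu and sigma)."""
    loc = median(values)
    scale = 1.4826 * median([abs(v - loc) for v in values])
    if scale == 0.0:
        raise ValueError("degenerate data: robust scale is zero")
    return loc, scale

def chi2_quantile_2dof(prob):
    """Quantile of the chi-squared distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - prob)

def detect_outliers(points, alpha=0.95):
    """Squared robust Mahalanobis distances, threshold, and outlier flags."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Step 1: rescale each coordinate by its robust scale.
    _, sx = robust_loc_scale(xs)
    _, sy = robust_loc_scale(ys)
    zx = [x / sx for x in xs]
    zy = [y / sy for y in ys]
    # Step 2: GK estimate of the off-diagonal term; the diagonal is 1 after step 1.
    _, s_plus = robust_loc_scale([a + b for a, b in zip(zx, zy)])
    _, s_minus = robust_loc_scale([a - b for a, b in zip(zx, zy)])
    u = 0.25 * (s_plus ** 2 - s_minus ** 2)
    # Step 3: eigenvectors of the symmetric matrix [[1, u], [u, 1]], obtained
    # from the rotation angle 0.5 * atan2(2b, a - c) of [[a, b], [b, c]].
    phi = 0.5 * math.atan2(2.0 * u, 0.0)
    c, s = math.cos(phi), math.sin(phi)
    # Step 4: robust locations and scales along the eigenvectors replace the
    # (possibly negative) eigenvalues. The squared MD with respect to the
    # resulting mean and covariance equals the sum of the squared
    # standardized projections computed below.
    e1 = [c * a + s * b for a, b in zip(zx, zy)]
    e2 = [-s * a + c * b for a, b in zip(zx, zy)]
    m1, g1 = robust_loc_scale(e1)
    m2, g2 = robust_loc_scale(e2)
    d2 = [((a - m1) / g1) ** 2 + ((b - m2) / g2) ** 2 for a, b in zip(e1, e2)]
    # Step 5: threshold with an assumed median calibration.
    d0_sq = chi2_quantile_2dof(alpha) * median(d2) / chi2_quantile_2dof(0.5)
    # Step 6: flag positions whose squared MD exceeds the threshold.
    flags = [d > d0_sq for d in d2]
    return d2, d0_sq, flags
```

With only N − m positions per set, the median and MAD are coarse estimates, which is one reason the number of positions per set matters for detection reliability.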
To reduce the false-alarm probability, we assign an unreliable probability to every node in step 6. If $\mathbf{p}_{k,j}$ is detected as an outlier, then the unreliable probability of $s_j$ is set to $q_{k,j} = 1/m$; otherwise, $q_{k,j} = 0$. After the m position matrices are evaluated, the unreliable probability for every node in $\Omega_{rep}$ is obtained as $q_j = \sum_{k=1}^{m} q_{k,j}$. Thus, the unreliable probabilities for the nodes from $\Omega_{rep}$ have been determined. To identify the unreliable nodes in $\Omega_{ref}$, the detection method is repeated with different reference nodes, which are selected from the nodes of $\Omega_{rep}$ that have been identified as reliable.
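The accumulation of the unreliable probabilities over the m position sets can be illustrated with a short sketch. The nested-dictionary input format and the function name `unreliable_probabilities` are assumptions made for this example.

```python
def unreliable_probabilities(outlier_flags, m):
    """Sum q_kj over the m position sets.

    outlier_flags[k][j] is True when position p_kj (node j replacing
    reference node k) was flagged as an outlier; each flag adds 1/m.
    """
    q = {}
    for flags in outlier_flags.values():
        for j, is_outlier in flags.items():
            q[j] = q.get(j, 0.0) + (1.0 / m if is_outlier else 0.0)
    return q

# Hypothetical flags for m = 3 reference nodes and replacement nodes 3 and 4
flags = {
    0: {3: True, 4: False},
    1: {3: True, 4: False},
    2: {3: False, 4: False},
}
q = unreliable_probabilities(flags, m=3)
```

A node flagged in every set reaches an unreliable probability of 1, whereas a node flagged in only one set receives 1/m, which limits the effect of a single false alarm.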

WLS Based on Reliable Probability and Distance
When the unreliable probabilities for all nodes are determined, the WLS method, weighted by the reliable probability $w_i = 1 - q_i$ and the distance $r_i$ from the initial position to node $s_i$, i = 1, ..., N, is applied to perform localization and can be formulated as follows:

$$\hat{\mathbf{p}} = (\mathbf{A}^T\mathbf{W}\mathbf{A})^{-1}\mathbf{A}^T\mathbf{W}\mathbf{B}, \qquad (16)$$

where

$$\mathbf{W} = \mathrm{diag}(w_1/r_1, \ldots, w_N/r_N). \qquad (17)$$

The procedure for the proposed localization method is given by Algorithm 4.
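A minimal sketch of the weighted estimate follows, assuming the standard weighted least squares solution of the pseudolinear system with W = diag(w_i/r_i), w_i = 1 − q_i, and the row convention A(k,:) = [sin θ_k, −cos θ_k]. The function name `wls_localize` and the small guard against a zero distance are hypothetical additions.

```python
import math

def wls_localize(nodes, bearings, q, p_init):
    """Weighted pseudolinear estimate with weights (1 - q_i) / r_i.

    nodes    : list of (x_i, y_i)
    bearings : measured AOAs in radians
    q        : unreliable probabilities q_i in [0, 1]
    p_init   : initial position used to evaluate the distances r_i
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (x, y), th, qi in zip(nodes, bearings, q):
        r = math.hypot(p_init[0] - x, p_init[1] - y)
        w = (1.0 - qi) / max(r, 1e-9)     # guard against r = 0 (assumption)
        a1, a2 = math.sin(th), -math.cos(th)
        b = x * a1 + y * a2
        s11 += w * a1 * a1
        s12 += w * a1 * a2
        s22 += w * a2 * a2
        t1 += w * a1 * b
        t2 += w * a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("weighted system is (nearly) singular")
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)
```

A node with q_i = 1 receives zero weight and is effectively removed, while distant nodes are down-weighted because their bearing errors translate into larger position errors.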

Simulations
In this section, we compare the performance of the proposed method, DIG_MD, with that of the PLE, the WLS-based distance method denoted as WLS (i.e., the reliable probabilities for all nodes are 1), and the EM-based method [10] through a series of computer simulations.
We assume that N nodes are placed uniformly in an L × L m² test area with a resolution of $\Delta_x$ and $\Delta_y$ along the horizontal and vertical directions, respectively. Each node is equipped with a microphone array to estimate the AOA of the target, and 1000 Monte Carlo simulations are conducted for every case based on the parameters L = 250, $\Delta_x = \Delta_y = 50$, and α = 0.95. Next, u randomly selected nodes are assumed to be subject to large noise or interference, and the standard deviation of their estimated bearing error is set to $\sigma_2$; the standard deviation for the remaining "reliable" nodes is set to $\sigma_1$, with $\sigma_2 \gg \sigma_1$. The initial positions for WLS, EM, and DIG_MD are obtained by the PLE.
For comparison purposes, we also apply the detection method, Algorithm 3, to identify the outliers from all IPs calculated from every pair of bearing lines. Instead of calculating the mean of the IPs as the source position, the WLS estimator based on reliable probabilities and distance is also used to determine the position. When t (t ≤ n − 1) IPs can be obtained on the bearing line extending from $s_i$, i = 1, 2, ..., n, the unreliable probability of $s_i$ is q/t if q of the t IPs are identified as outliers among all IPs. Next, the WLS based on Equation (16) is used to find the source position; this method is defined as the IP_WLS method. Furthermore, the center of all IPs after excluding all detected outliers is defined as CIP.

The RMSEs for Different σ 1 and σ 2
The localization performances of various approaches are influenced by the standard deviation of the estimated bearing errors. It can be seen that the existence of unreliable bearings can severely deteriorate the localization performance of the PLE, especially when $\sigma_1$ is small. When $\sigma_1 = 0.5^\circ$, the RMSE of the PLE is as large as 4.83 m. Compared with WLS, the CIP method shows lower RMSEs only when $\sigma_1 < 2^\circ$, and IP_WLS always outperforms WLS for all values of $\sigma_1$, because it is easier to detect the outliers when the data are contaminated severely. This phenomenon also illustrates the importance of detecting outliers. From Figure 4a, we can also observe that IP_WLS shows better performance than CIP, illustrating the superiority of WLS over simple CIP. The EM exhibits a somewhat similar performance to that of DIG_MD when $\sigma_1$ is small; however, it shows a greater advantage as $\sigma_1$ increases.

The simulation is then conducted with $\sigma_1 = 2^\circ$ and $\sigma_2$ ranging from 10° to 20°, considering the fact that the background noise usually does not change greatly during a short period for certain applications. The results in Figure 4b indicate that DIG_MD can significantly improve the localization performance compared with the conventional PLE and WLS. CIP can outperform WLS only when the difference between $\sigma_1$ and $\sigma_2$ is large, and it always has higher RMSEs than the IP_WLS method. EM performs slightly better than DIG_MD when $\sigma_2$ is significantly larger than $\sigma_1$. In contrast, DIG_MD clearly outperforms EM when $\sigma_2$ is less than 16°.
From Figure 4a,b, we can see that both IP_WLS and DIG_MD can improve the localization accuracy compared with the PLE and WLS. However, DIG_MD shows better performance than IP_WLS. This is because IP_WLS is based on the outlier detection results of the IPs, which are sensitive to the difference between the two AOAs. When the source and two nodes are nearly collinear, the IP is easily identified as an outlier, and thus the false-alarm probability increases. In addition, an error in either of the two AOAs influences the IP. When one IP is detected as an outlier, both nodes are allocated unreliable probabilities. As a result, a false alarm also occurs if one of them is in fact reliable, especially when that node is close to the source. All these problems can be solved by the proposed DIG method.

The Influence of the Number of Reference Nodes
To discuss the effect of the number of reference nodes on the localization performance, the RMSEs for different numbers of reference nodes are plotted in Figure 5 when σ_1 = 2° and σ_2 = 15°. We can see that more reference nodes should be used when the number of unreliable nodes increases, and the number of reference nodes should be no more than [N/6] ([x] is the nearest integer to x). Otherwise, the performance of DIG_MD will deteriorate seriously. As illustrated in Section 3, m position sets with (N − m) elements in each set can be obtained using the DIG_MD method. When the number of reference nodes increases, the number of position sets also increases, whereas the number of estimated locations in each set decreases. Only if there are enough positions to be evaluated should more position sets be used to increase the reliability of detection. To guarantee enough positions in each set to detect outliers, (N − m) should be significantly greater than m. From the simulation results, it can be seen that the number of reference nodes is preferably within the range from three to [N/6].
Figure 6 further shows the localization performance for different numbers of unreliable nodes. It can be seen that the RMSEs of all the methods increase as the number of unreliable nodes increases. Compared with the PLE, CIP can improve the localization performance when unreliable nodes are present; however, it exhibits a slightly higher RMSE than the PLE when there is no outlier. EM has the highest RMSE among EM, WLS, IP_WLS, and DIG_MD; however, it performs better than WLS when the number of unreliable nodes increases. The IP_WLS and DIG_MD methods can inhibit the effect of unreliable bearing measurements in all cases. The superiority of the DIG_MD method over the other methods increases as the number of unreliable nodes increases.
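The bookkeeping behind the reference-node guideline above can be sketched as follows. The function names are hypothetical, and rounding ties upward in [x] is an assumption, since the text only says that [x] is the nearest integer to x.

```python
import math

def nearest_int(x):
    """Nearest integer to x; ties are rounded up (an assumption, see lead-in)."""
    return math.floor(x + 0.5)

def dig_layout(n_nodes, m_ref):
    """Return (number of position sets, positions per set) for m reference nodes."""
    if not 0 < m_ref < n_nodes:
        raise ValueError("need 0 < m_ref < n_nodes")
    return m_ref, n_nodes - m_ref

def suggested_m_range(n_nodes):
    """Reference-node counts from three up to [N/6]; empty when [N/6] < 3."""
    return list(range(3, nearest_int(n_nodes / 6) + 1))

print(dig_layout(30, 4))      # 4 position sets, 26 positions in each
print(suggested_m_range(30))  # candidate values of m for a 30-node network
```

For small networks the suggested range is empty, so the guideline is best read as a rule of thumb drawn from the simulations rather than a hard constraint.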
To investigate the robustness of the proposed method, the hit percentages of the evaluated methods (the percentage of cases in which a method's error is less than that of WLS or the PLE) are shown in Figure 7. The figure shows that the CIP method has the lowest hit percentages relative to both the PLE and WLS. EM has higher hit percentages than IP_WLS relative to the PLE, whereas IP_WLS improves on WLS with greater probability than EM does. In contrast, DIG_MD retains its superiority in hit percentage relative to both the PLE and WLS.
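The two metrics used in this comparison can be written down as a minimal sketch, assuming that each run yields one localization error per method and that a hit is a run in which the evaluated method's error is strictly smaller than the baseline's. The error values below are made-up numbers for illustration, not results from the paper.

```python
import math

def rmse(errors):
    """Root-mean-square of per-run localization errors (distances in metres)."""
    if not errors:
        raise ValueError("errors must not be empty")
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def hit_percentage(method_errors, baseline_errors):
    """Percentage of runs in which the method beats the baseline."""
    if not method_errors or len(method_errors) != len(baseline_errors):
        raise ValueError("need two non-empty error lists of equal length")
    hits = sum(1 for e, b in zip(method_errors, baseline_errors) if e < b)
    return 100.0 * hits / len(method_errors)

# Hypothetical per-run errors for an evaluated method and a baseline.
method = [1.0, 2.0, 3.0, 4.0]
baseline = [2.0, 2.0, 2.0, 5.0]
print(hit_percentage(method, baseline))
print(rmse([3.0, 4.0]))
```

A method can have a low RMSE but a modest hit percentage if it wins by large margins in a few runs, so the two metrics are complementary.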

The Localization Performance for Different Numbers of Nodes and for Different Source Positions
As the number of nodes usually has a great influence on the localization performance, we plot the relationship between RMSE and the number of nodes in Figure 8. For fairness, the number of unreliable nodes is N/6. The number of reference nodes is four. Figure 8 shows that EM has a higher RMSE than IP_WLS and DIG_MD methods when the number of nodes is 12. However, the IP_WLS method shows worse performance than EM as the number of nodes increases. The proposed method, DIG_MD, always has the best localization accuracy for the different cases.

To study how well the proposed method works for different source positions, we show in Figure 9b the localization performance for five different source positions when six unreliable measurements are present. The proposed method clearly improves the localization performance for all source positions.

Outdoor Experiment Results and Analysis
In this section, we describe the verification of our proposed method using a 30-node network for acoustic source localization. All nodes were placed in an 11 × 11 m² square field, as shown in Figure 10. Each node is an autonomous vehicle equipped with a four-element cross microphone array, as shown in Figure 11. The microphone array is arranged into two orthogonal pairs, with the microphones of each pair 20 cm apart. Each pair of microphones estimates an AOA using the generalized cross correlation with phase transform (GCC-PHAT) [24] method. The final AOA is then obtained by fusing the two AOAs obtained by the two pairs of microphones. The vertical distance from the ground to the microphones is also 20 cm. During the test, all nodes transmitted the estimated angles to the base station following a predefined collision-avoidance communication protocol. The localization tests were repeated 40 times. The acoustic source was car engine noise played through a loudspeaker oriented upward. Without loss of generality, we placed the speaker at the center of the test field (i.e., x = [5.5 m, 5.5 m]^T).
The experiment was conducted in an outdoor environment, with noise always present. However, the signal-to-noise ratio (SNR) differs from node to node because the distances from the source to the nodes are different. The SNR ranges from 5 to 15 dB. During the experiment, unreliable AOAs may be introduced by the following:
• Multipath signal: because the distance between the microphone array and the ground is only 20 cm, unreliable AOAs may be introduced by a multipath signal.
• Interference: the movements of people and cars during the experiment are also causes of unreliable measurements.
• Low SNR: because of the possibly nonstationary background, the SNR of the received signal at each node may vary over a large range, possibly resulting in unreliable measurements.
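The text does not specify how the two per-pair AOAs are fused. As a hedged illustration only, one common far-field option is to combine the time delays measured on the two orthogonal pairs directly. The sketch below assumes a speed of sound of 343 m/s, ideal noise-free delays (which GCC-PHAT would estimate in practice), and pair axes aligned with x and y; all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
SPACING = 0.20          # m, microphone spacing within each pair (from the text)

def pair_delays(azimuth_rad, spacing=SPACING, c=SPEED_OF_SOUND):
    """Ideal far-field time delays on the x-axis pair and the y-axis pair."""
    return (spacing * math.cos(azimuth_rad) / c,
            spacing * math.sin(azimuth_rad) / c)

def aoa_from_single_pair(tau, spacing=SPACING, c=SPEED_OF_SOUND):
    """AOA from one pair only; the sign about the pair axis stays ambiguous."""
    ratio = max(-1.0, min(1.0, c * tau / spacing))  # clip against noisy delays
    return math.acos(ratio)

def aoa_from_delays(tau_x, tau_y):
    """Unambiguous azimuth from both delays under the far-field assumption."""
    return math.atan2(tau_y, tau_x)

tau_x, tau_y = pair_delays(math.radians(-50.0))
print(math.degrees(aoa_from_single_pair(tau_x)))    # cannot tell -50 from +50
print(math.degrees(aoa_from_delays(tau_x, tau_y)))  # recovers the sign
```

With 20 cm spacing the largest possible delay is about 0.58 ms, so small delay errors at low SNR translate into large angle errors, which is consistent with low SNR being listed as a source of unreliable AOAs.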

To verify the influence of the number of reference nodes on the localization accuracy, we plotted the RMSEs for different numbers of reference nodes, as shown in Figure 12. We can see that the proposed method DIG_MD clearly outperforms the other compared methods when the number of reference nodes is fewer than six. As the number exceeds six, the RMSEs of DIG_MD increase gradually. When more than nine reference nodes are used, the DIG_MD method yields localization performance similar to that of IP_WLS.
To show the localization results more clearly, we further compared the localization results of DIG_MD with those of the PLE for the 40 experiments with m = 4, as shown in Figure 13. The results show that while most of the large error peaks of the PLE were substantially reduced, there were a few cases in which the DIG_MD method performed only slightly better than the PLE (e.g., in the 11th and 33rd runs). To investigate the underlying reason, we plotted the unreliable sensor node detection results for the two cases, as shown in Figure 14a,b. For comparison purposes, the estimated AOA values and the detection results for the 10th and 32nd experimental runs, for which the proposed method significantly improves the localization performance, are plotted in Figure 14c,d, respectively.
Because nodes s_11 and s_24 are misjudged as unreliable in the 11th run, the localization error of DIG_MD is only slightly better than that of the PLE, even though the unreliable nodes s_6, s_9, and s_22 are detected; a similar situation can also be found in the 33rd run. In contrast, the unreliable nodes in the 10th run are detected correctly. For the 32nd run, the localization error is significantly decreased, while s_18 and s_28 are detected with a very low false-alarm probability.
To verify the localization performance under different numbers of nodes in a node network, we use only s_1 ∼ s_k, k = 20, 25, 30, to perform localization when four reference nodes are used. Note that when different numbers of nodes are used, the source is no longer located at the center of all nodes. The results shown in Figure 15 reveal that DIG_MD has the best localization performance for all the cases considered.
Figure 15. RMSEs for different numbers of nodes.

Conclusions
The localization performance of the conventional AOA-based method, the PLE, is prone to deteriorate when unreliable measurements are present. In this paper, we propose an unreliable node detection method based on the characteristics of the estimated positions of different node combinations. In the proposed approach, the DIG method is used to acquire different position sets, and the MD based on a robust location and covariance matrix estimator is used to identify the outliers in the estimated target position sets. The proposed method does not require any prior information about the target and is easy to implement. Both simulation and outdoor experiment results show that DIG_MD is efficient and robust against the influence of unreliable measurements and can significantly improve the localization accuracy when the measurements are contaminated.
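The idea of flagging estimated positions by a robust Mahalanobis distance can be sketched as follows. This is not the estimator used in the paper: it substitutes a crude trimmed re-estimation (start from the coordinate-wise median, then repeatedly refit the mean and covariance on the closest fraction of points), and both the 0.75 keep fraction and the chi-square cutoff with two degrees of freedom are assumptions. The positions are made-up values around (5.5, 5.5) with two gross outliers.

```python
import math
import statistics

def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(p, center, cov):
    """Mahalanobis distance in 2-D using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - center[0], p[1] - center[1]
    return math.sqrt((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)

def robust_distances(points, keep=0.75, iters=10):
    """Distances to a trimmed mean/covariance fitted on the closest points."""
    center = (statistics.median(p[0] for p in points),
              statistics.median(p[1] for p in points))
    h = max(3, int(keep * len(points)))
    subset = sorted(points, key=lambda p: math.dist(p, center))[:h]
    for _ in range(iters):
        center, cov = mean_cov(subset)
        subset = sorted(points, key=lambda p: mahalanobis(p, center, cov))[:h]
    center, cov = mean_cov(subset)
    return [mahalanobis(p, center, cov) for p in points]

# Chi-square 0.975 quantile with 2 degrees of freedom is -2*ln(0.025).
CUTOFF = math.sqrt(-2.0 * math.log(0.025))

positions = [(5.4, 5.6), (5.6, 5.5), (5.5, 5.3), (5.3, 5.4), (5.7, 5.7),
             (5.5, 5.6), (5.6, 5.4), (5.4, 5.5), (9.0, 2.0), (1.0, 8.5)]
distances = robust_distances(positions)
flagged = [i for i, d in enumerate(distances) if d > CUTOFF]
print(flagged)
```

Because the trimmed covariance is not rescaled for consistency, it tends to be too small and can flag borderline inliers as well; a production detector would use a properly calibrated robust estimator.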
