Specific Direction-Based Outlier Detection Approach for GNSS Vector Networks

In this paper we propose an outlier detection approach for GNSS vector networks based on the specific direction (i.e., SD approach), along which the test statistic constructed reaches the maximum. We derive the unit vector of this specific direction in detail, and prove that the unit vector is the same as that determined by the outlier estimates in three-dimensional (3D) approach, while the distribution of the maximum test statistic in this direction is the square root of Chi-squared distribution. Therefore, eliminating an outlier along this specific direction can get the same result as that of eliminating all three components of outlier vector in 3D approach. The mathematical equivalence of SD approach and 3D approach is further demonstrated by a real GNSS network. Moreover, preliminary application of the SD approach to detect the abnormal antenna height measurement is carried out in terms of numerical simulations of multiple baseline solutions, and it shows that the SD approach can effectively detect baselines that are directly infected by corresponding receiver antenna height errors.


Introduction
When a weight matrix is chosen as the inverse of observables' covariance matrix, the weighted least-squares (WLS) estimation is the best linear unbiased estimator (BLUE), assuming that no outlier exists. However, outliers can inevitably occur in practice and cause the optimal feature of such estimation loss [1][2][3]. Therefore, outliers must be detected and then eliminated as soon as possible. Baarda [4] first introduced 'data-snooping' for detecting outliers in geodetic networks, where outliers are identified one by one based on the test statistic of single outlier detection approach. The test statistic can be constructed according to various statistical distributions, e.g., standard normal distribution, τ-distribution, and F-distribution [4][5][6][7], and one can determine the existence of outliers by a comparison with correspondent critical value at a given significance level [8,9]. Three types of errors, i.e., rejecting the right observation (type I error) and accepting the wrong observation (type II error) as well as locating the outlier to right observation (type III error) [10][11][12][13], are inevitably encountered in outlier detection processes. Therefore, the reliability theory is of fundamental importance in outlier detection, the content of which has been extended from the case for single outlier [4] to multiple outliers [7,14,15] and from independent observations to correlated ones [16,17]. According to the reliability theory, once the possibility of type I and type II errors is given, the Minimal Detectable Bias (MDB) and Bias-to-Noise Ratios (BNR), defined respectively as the measures of internal reliability and external reliability of geodetic networks, are uniquely determined [18][19][20]. Both MDB and BNR reflect the characteristics of a geodetic network to resist outliers, and the BNR shows impact of non-detected outlier on the final solution [21], which can be reduced or eliminated by robust methods via iteratively reweighting of i Ph i where δ 0 is the non-centrality parameter, which is uniquely determined by the size of type I error α 0 and type II error β 0 [40].
For the GNSS baseline networks, it is reasonable to treat the baseline vector observations in triples manner because the three components of a baseline vector are computed together by the same GNSS observations and are naturally correlated. Once an outlier occurs, all three components would be impacted. Therefore, the 3D outlier detection approach is intuitively developed specifically for the GNSS baseline vector applications [34]. To describe the 3D approach, the observation vector and design matrix in (1) is partitioned as where y j and ε j are the 3 × 1 vectors of the jth baseline observation and observation error, A j is the jth 3 × n design sub-matrix. When the ith observation vector y i contains a 3D outlier vector d i , Equation (1) is rewritten with (6) as and H j = 0 3 , for j i I 3 for j = i , where 0 3 is a 3 × 3 zero matrix and I 3 is a 3 × 3 identity matrix. The WLS estimationd i of 3D outlier vector is derived from (7) aŝ in which, P ij denotes the ij-th 3 × 3 sub-matrix of the reliability matrix P. To determine whether or not the 3D vector of outliers exists, the test statistic T i is constructed by If there are no outliers, T i is central F-distributed with two degrees of freedom as 3 and ∞ at given α 0 , i.e., T i ∼ F(α 0 ; 3, ∞). If the variance factor σ 2 0 in (9) is unknown, then T i can be re-constructed following the central F-distribution of F(α 0 ; 3, 3m − n − 3) if no outlier exists according to [6] (p. 302).

Outlier Detection in SD Approach
Supposing the outlier's coefficient matrix of (7) is defined as H j = 0 3 , for j i u ik for j = i , where u ik represents the 3D directional cosines relative to three Cartesian coordinate axes and 0 3 is a 3D zero vector, similar to (3) the test statistic for the ith baseline vector observation at the kth direction of is constructed as where g i = m j=1 P ij y j is a 3D vector and P ij is the ij-th 3 × 3 sub-matrix of the reliability matrix P.
For different directions u ik , the testing values w ik in (10) are also different. Accordingly, the outlier detection and identification should focus on finding the specific unit direction vector, supposing u i3 , that enables the largest test statistic for (10). This can be solved by the following target function Φ(u i3 ) For a local maximum of the target function above, its first order partial derivative must equal to zero, i.e., The solution of (12) is where the matrix P −1 ii denotes the inverse of P ii . When g T i u i3 = 0, the test statistic of (10) gets the minimum value, which is not the right solution we are looking for. Since the scalar factor u T i3 P ii u i3 /g T i u i3 in (13) does not impact the direction of the unit direction vector u i3 , the first equation of (13) is simply equivalent to (14) for determination of a spatial direction ii g i being the 2-norm of the vector P −1 ii g i . Here, by comparing (14) with (8), we can find that u i3 andd i are along the same direction, indicating the outlier vector derived from 3D approach can intrinsically determine the specific direction with maximum test statistic.
By taking the second order partial derivative to (12) and then substituting (14) into it, one can get where I 3 denotes the 3 × 3 identity matrix. Since the block matrix P ii is positive definite, if the matrix (14) is the unique unit vector to get the local maximum test statistic, and it must be the global maximum one. By substituting (14) into (10), one can derive the maximum value as (16), which is utilized as test statistic in the SD approach Considering g i = m j=1 P ij y j , test statistic (16) can be rewritten with (8) as Comparing (17) with (9), we can find that |w i3 | 2 follows the Chi-squared distribution with 3 degrees of freedom. Thereby, the SD approach is mathematically equivalent to the 3D method and its critical value for |w i3 | can be directly calculated by 3 · F(α 0 ; 3, ∞).

Outlier Elimination in SD Approach
If the maximum test statistic |w i3 | by (17) is larger than its critical value, the 3D test statistic (9) will also be larger than its corresponding critical value and the whole ith baseline vector should be eliminated.
Evaluating 3D outlier estimates (8), it can be rewritten aŝ where d i3 = P −1 ii g i , the outlier estimates at the other two directions (u i1 and u i2 ) orthogonal to u i3 must be zero. Therefore, in SD approach, if an outlier occurs at the ith baseline vector, the observational equation for eliminating the outlier along the specific direction u i3 is expressed as Then the estimates of parameter vector x and its variance can be derived via least squares adjustment. It was found that elimination of the outlier estimated in this direction will lead to the same results of elimination outlier vectord i in the whole baseline vector as done in 3D approach, since the outlier scalar estimated in this specific direction u i3 contains all the information content of 3D outlier vectord i .

Data Description
The real GNSS network used in the following is shown in Figure 1 and its observation data set is given in Tables A1 and A2 in the Appendix A [41]. There are 8 sites and 16 baselines in this network, and the site N001 is fixed as known for the free network adjustment.

Data Description
The real GNSS network used in the following is shown in Figure 1 and its observation data set is given in Table A1 and Table A2 in the Appendix [41]. There are 8 sites and 16 baselines in this network, and the site N001 is fixed as known for the free network adjustment.

Specific Direction Validation
If the matrix ( ) (15) is non-negative definite, previous derivation guarantees the unit direction vector 3 i u determined by (14) is the specific direction for the ith baseline observation to achieve the maximum test statistic as expressed by (16). The unit direction vector ik u of an arbitrary direction in 3D space can be expressed as ( ) cos cos cos sin sin (20) where [ 90 , 90 ] are spherical coordinates. When  and  are fixed, correspondent test statistic of (10) at this direction is uniquely determined. Therefore, the specific direction, which (10) generates as the maximum test statistic, can be obtained by simply traversing the whole range space of  and  given a certain small step size. To validate the effectiveness of (14) and (16), statistic values (10) of the No. 1 baseline for all ( , )  direction combinations are calculated and plotted in Figure 2 given the 1 step size. It is shown that values in Figure 2 manifest a symmetric pattern with respect to the origin and there exist two maximum points of 1.4973 in two opposite directions, just corresponding to the positive and negative sign in (14). This value is slightly smaller than the maximum 1.4975, which is directly derived from analytical formula (16). The differences of the maximum test statistics by analytical formula (16) proposed in SD method and those by numerical traversal algorithm are shown in Figure 3 for 16 baseline observations, which indicate that the maximum test statistics derived from (16) are all slightly larger than those from the traversal method. It further proves the test statistic derived from (16) is the theoretical global maximum one, since the traversal method can only get an approximate maximum due to the step The square root of prior variance factor σ 0 is taken as 1 cm in the following data analysis, and probabilities of type I and type II errors are chosen as α 0 = 0.1% and β 0 = 20% respectively thereafter [7]. (15) is non-negative definite, previous derivation guarantees the unit direction vector u i3 determined by (14) is the specific direction for the ith baseline observation to achieve the maximum test statistic as expressed by (16). The unit direction vector u ik of an arbitrary direction in 3D space can be expressed as

Specific Direction Validation
are spherical coordinates. When φ and λ are fixed, correspondent test statistic of (10) at this direction is uniquely determined. Therefore, the specific direction, which (10) generates as the maximum test statistic, can be obtained by simply traversing the whole range space of φ and λ given a certain small step size. To validate the effectiveness of (14) and (16), statistic values (10) of the No. 1 baseline for all (φ, λ) direction combinations are calculated and plotted in Figure 2 given the 1 • step size. It is shown that values in Figure 2 manifest a symmetric pattern with respect to the origin and there exist two maximum points of 1.4973 in two opposite directions, just corresponding to the positive and negative sign in (14). This value is slightly smaller than the maximum 1.4975, which is directly derived from analytical formula (16). The differences of the maximum test statistics by analytical formula (16) proposed in SD method and those by numerical traversal algorithm are shown in Figure 3 for 16 baseline observations, which indicate that the maximum test statistics derived from (16) are all slightly larger than those from the traversal method. It further proves the test statistic derived from (16) is the theoretical global maximum one, since the traversal method can only get an approximate maximum due to the step size limitation. Besides this, the matrices ii g i g T i corresponding to 16 baseline observations are all non-negative definite.

Outlier Detection
The test statistics of all baseline observations of the networks by SD, 3D, and 1D approach, calculated via (16), (9), and (3) respectively, are listed in Table 1, and the correspondent spherical coordinates denoting the specific directions in SD approach are also demonstrated. For a given Since one outlier can pollute its neighboring observations and possibly causes their test statistic values exceed the critical value, the 'data snooping' procedure is iteratively conducted, i.e., detecting the outliers one by one. The largest test statistics among all baseline observations in each test step by different approaches are shown in Table 3 and those exceeding critical values are in bold font.

Outlier Detection
The test statistics of all baseline observations of the networks by SD, 3D, and 1D approach, calculated via (16), (9), and (3) respectively, are listed in Table 1, and the correspondent spherical coordinates denoting the specific directions in SD approach are also demonstrated. For a given Since one outlier can pollute its neighboring observations and possibly causes their test statistic values exceed the critical value, the 'data snooping' procedure is iteratively conducted, i.e., detecting the outliers one by one. The largest test statistics among all baseline observations in each test step by different approaches are shown in Table 3 and those exceeding critical values are in bold font.

Outlier Detection
The test statistics of all baseline observations of the networks by SD, 3D, and 1D approach, calculated via (16), (9), and (3) respectively, are listed in Table 1, and the correspondent spherical coordinates denoting the specific directions in SD approach are also demonstrated. For a given α 0 = 0.1%, the critical values of three approaches are shown in Table 2, which are respectively computed by the inverse of the standard normal cumulative distribution function N(α 0 ; 0, 1) for the 1D approach, by inverse of the central cumulative F-distribution function F(α 0 ; 3, ∞) for 3D approach, and by 3 · F(α 0 ; 3, ∞) for SD approach. Besides this, test statistics larger than corresponding critical values are marked in bold font.
Since one outlier can pollute its neighboring observations and possibly causes their test statistic values exceed the critical value, the 'data snooping' procedure is iteratively conducted, i.e., detecting the outliers one by one. The largest test statistics among all baseline observations in each test step by different approaches are shown in Table 3 and those exceeding critical values are in bold font. In Step 1, the outlier is detected at the No. 3 baseline by all three approaches, but the 1D method locates the outlier only at the Y-component of the baseline. Therefore, the No. 3 baseline observation should be discarded and the remaining data set is used to continue 'data snooping' procedure in Step 2. In Step 2, all test statistics are well lower than the corresponding critical values listed in Table 2; therefore, no outlier is detected, indicating that the current dataset is quite 'clean' and the 'data snooping' procedure can be terminated. However, three methods show obvious discrepancy at this step. By the SD and 3D method, the largest statistics are reached both at the No. 1 baseline. However, by the 1D method, the largest test statistic is located at the Z-component of the No. 9 baseline, indicating that the baseline-component-based method (1D approach) and the baseline-vector-based method (3D and SD approach) do not always lead to same results as already discussed by [34].

Outlier Elimination
After identifying the outlier observation, its influence on the final parameter estimation must be eliminated. Since the SD approach is mathematically equivalent to the 3D approach, it can be expected that eliminating the influence of outlier in specific direction (14) has the same effect as that in 3D approach by (8). Figure 4 shows the absolute differences of 21 coordinate parameters (7 unknown sites) estimated by SD and 3D approach after elimination of outliers' influence respectively. The differences are ignorable and are merely caused by the limits of computer precision. Note that the values of parameters 12 and 13 are zero due to floating point number round-off and therefore not be presented in Figure 4. The final parameters estimation after outlier elimination of the No. 3 baseline by SD and 3D method are listed in Table 4. the values of parameters 12 and 13 are zero due to floating point number round-off and therefore not be presented in Figure 4. The final parameters estimation after outlier elimination of the No. 3 baseline by SD and 3D method are listed in Table 4.

Simulation Analysis for detecting abnormal GNSS Antenna Height Measurements
In this section, we adopt the SD approach to detect baseline vectors which are infected by abnormal antenna height measurement in the GNSS networks in terms of numerical simulations of multiple baseline solutions. The GNSS network used for simulations is based on Figure 1, which also consists of 8 sites and 16 baselines. Simulated baseline vector observations are generated in two steps, where firstly error-free baseline vectors are calculated by 'true' coordinates of each site, and secondly baseline vector noises are randomly generated according to corresponding baseline covariance matrix and then added to the error-free baseline vector. The simulation procedure is described in detail by [23], and in following simulations the estimated sites coordinates in Table 4, as well as the known coordinates of site N001 in Appendix A, are treated as the 'true' values for error-free baseline vector generation. Besides this, the covariance matrices in Appendix A are used to generate baseline vector noises as that in [23].
Assuming there are four GNSS receivers to carry out the measurement task of above-mentioned GNSS network, the surveying is divided into six observation sessions as arranged in Table 5 and note that only three receivers are used for sessions 2 and 5. In each session, there are at most three functional independent baseline vectors for the final network adjustment. Since the multiple baseline solutions are supposed, the baselines of each session must be stochastic dependent and the correlation coefficients from 0.2 to 0.3 between different baselines' components are assumed during the construction of weight matrix. In the following simulations, it is assumed that the GNSS antenna

Simulation Analysis for Detecting Abnormal GNSS Antenna Height Measurements
In this section, we adopt the SD approach to detect baseline vectors which are infected by abnormal antenna height measurement in the GNSS networks in terms of numerical simulations of multiple baseline solutions. The GNSS network used for simulations is based on Figure 1, which also consists of 8 sites and 16 baselines. Simulated baseline vector observations are generated in two steps, where firstly error-free baseline vectors are calculated by 'true' coordinates of each site, and secondly baseline vector noises are randomly generated according to corresponding baseline covariance matrix and then added to the error-free baseline vector. The simulation procedure is described in detail by [23], and in following simulations the estimated sites coordinates in Table 4, as well as the known coordinates of site N001 in Appendix A, are treated as the 'true' values for error-free baseline vector generation. Besides this, the covariance matrices in Appendix A are used to generate baseline vector noises as that in [23].
Assuming there are four GNSS receivers to carry out the measurement task of above-mentioned GNSS network, the surveying is divided into six observation sessions as arranged in Table 5 and note that only three receivers are used for sessions 2 and 5. In each session, there are at most three functional independent baseline vectors for the final network adjustment. Since the multiple baseline solutions are supposed, the baselines of each session must be stochastic dependent and the correlation coefficients from 0.2 to 0.3 between different baselines' components are assumed during the construction of weight matrix. In the following simulations, it is assumed that the GNSS antenna height on site N006 is wrongly measured by 10 cm in observation session 2, which is possibly caused by, for example, the misreading of antenna height. Therefore, baselines No. 3 and No. 11 are directly influenced by the wrong N006 antenna height. The influence is introduced by upward continuation of the site N006 coordinates in the error-free baseline vector generation step of observation session 2, while for other sessions the coordinates of site N006 are still based on that in Table 4. The numerical simulations are carried out 10,000 times, and for each simulation the baseline vector noises are newly produced while the 10 cm antenna height error of site N006 in session 2 is kept fixed. We use the SD approach to estimate the spatial direction, i.e., latitude and longitude, of possible outlier vector for each baseline by (14), and calculate the standard deviation (SD) of them with respect to the upward direction of site N006 by (21). In Equation (21), SD ϕ and SD λ stand for the SD values of latitude and longitude estimates respectively, and the upward direction of site N006 is (ϕ N006 , λ N006 ) = (31.3 • , 121.3 • ) according to Table 4. Table 6 lists the SD values of both latitude and longitude estimates for all baselines as well as corresponding mean values over 10,000 simulations. As shown, the No.3 baseline reaches the best consistency in terms of outlier direction estimation with site N006's antenna height direction, which is followed by the No.11 baseline for its smaller SD values compared to remaining baselines. Since the No. 3 and No. 11 are directly infected by the wrong antenna height of site N006 in session 2, the statistical results demonstrate that the SD approach can effectively determine the influence of wrong antenna height on baseline vectors in the GNSS network. Furthermore, we investigate the outlier direction estimates of other site N006-related baselines, which are observed in other sessions where the antenna height is correctly measured. Figure 5 shows the statistical distribution of outlier direction estimates for these baselines, from which we can obviously see more gathering outlier direction estimates for No. 3 and No. 11 baselines and smaller bias with respect to the antenna height direction of site N006 indicated by the corresponding red vertical line at each panel. Wih regard to the test statistics for each baseline observation calculated by (16), it turns out that among 16 baselines over 10,000 simulations, the No. 11 baseline reaches the maximum at about 99% times while the remaining part of the maximum values falls into the No. 3 baseline, which are all well beyond the critical value listed in Table 2. Therefore, it is possible to apply the SD method to detect the influence of wrongly measured receiver antenna height on baseline vectors in GNSS networks, which needs further investigation.

Conclusions
In this contribution, we proposed the specific direction-based outlier detection approach (SD approach), for 3D GNSS networks. By seeking the specific direction in the 3D space, the maximum test statistic of baseline vector observations is constructed. The analytical expression (14) is derived to directly obtain this specific direction and to construct the corresponding test statistic by (16). Compared to traditional 3D approach, the SD approach is derived from another point of view. It tests the baseline vector in a specific direction in which the outlier vector manifests the largest test statistic value. Evaluating (17) and (9) demonstrates that the two approaches are rigorously mathematically equivalent, while if readers want to directly investigate the spatial direction characteristic of outlier sources in the GNSS networks, the SD approach is preferred. A real GNSS network is processed to validate the effectiveness of the SD approach and the equivalence to the 3D method. Moreover, preliminary application of SD approach to detect the influence of wrong GNSS antenna height measurement on baseline vectors in the GNSS networks are carried out, which shows promising results and needs further investigation.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The baseline vector observations and their covariance matrix of real GNSS networks are listed in Table A1 referred to in [41], where the accurate known site N001's 3D coordinates are: X = −2830754.6300, Y = 4650074.3450, and Z = 3312175.0540, with the unit of meter. The approximate 3D coordinates of seven unknown sites are listed in Table A2.