Evaluation of Different Outlier Detection Methods for GPS Networks

GPS (Global Positioning System) devices can be used in many geoscience applications that require accurate point positioning. The accuracy of GPS decreases because of outliers resulting from the errors inherent in GPS observations. Several approaches have been developed to detect outliers in geodetic observations. It is important to determine which method is most effective at distinguishing outliers from normal observations. This paper investigates the behavior of conventional statistical test methods (Data Snooping (DS), Tau, and t tests), some robust methods (Andrews's M-Estimation, Huber's M-Estimation, Tukey's M-Estimation, the Danish Method, Yang-I M-Estimation, and Yang-II M-Estimation), and the fuzzy logic method in the detection of outliers for three GPS networks having different characteristics. Test results are evaluated and the performances of the different methods are presented quantitatively.


Introduction
Geoscience applications such as the determination of crustal movements, deformations, and landslides require accurate point positioning. GPS can be used as a tool in these applications because of its accurate point positioning ability. The 3-D coordinates of the GPS satellites are known precisely with respect to an Earth-fixed coordinate system. GPS receivers measure code and phase to every satellite. Absolute positioning is not used for accurate GPS positioning. Instead, baselines connecting control points are determined; this is called relative positioning. In relative positioning (at the millimeter level), at least two GPS receivers are set up at two control points (the position of one control point is known), and the code and phase observations to at least four GPS satellites are measured simultaneously. These measurements are repeated for a certain period of time, which leads to redundant observations. If the coordinates of one of the control points are known, the coordinates of the second point are determined using the baseline components. For example, let A be a control point whose coordinates are known, and let B be the point whose coordinates are to be determined. The baseline components between these two points are measured using GPS receivers, and the X, Y, Z coordinates of point B are obtained as:
$$X_B = X_A + \Delta X_{AB}, \qquad Y_B = Y_A + \Delta Y_{AB}, \qquad Z_B = Z_A + \Delta Z_{AB}$$
GPS networks are made up of baselines (see Figure 1). Their baseline components ΔX, ΔY, and ΔZ are taken as observations, and the estimated coordinates are obtained by Least Squares (LS) adjustment. However, some observations can be outliers and may reduce the accuracy of the network; therefore, outliers should be detected and eliminated from the adjustment so that they do not affect the remaining observations.
[Figure 1, panel title: Second GPS network]
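The coordinate transfer along a baseline described above can be illustrated with a minimal sketch. The function name and the numerical values are hypothetical and serve only to show the arithmetic; they are not taken from the networks studied here.

```python
def point_from_baseline(known_xyz, baseline_dxyz):
    """Coordinates of the unknown point B from the known point A
    and the baseline components (dX, dY, dZ) measured from A to B."""
    return tuple(a + d for a, d in zip(known_xyz, baseline_dxyz))

# Hypothetical Earth-fixed coordinates of A and baseline components, in metres
A = (4200000.0, 2300000.0, 4100000.0)
dAB = (1250.5, -830.25, 410.0)

B = point_from_baseline(A, dAB)
print(B)
```

In a real network each point is reached through several baselines, so the coordinates are estimated by LS adjustment rather than by a single addition.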
In geodetic observations, errors can have different magnitudes and characteristics depending on the surveyor, the surveying equipment, and the environmental conditions. These errors cause differences in the magnitudes of the observations and are categorized as systematic, gross, and random errors. The effects of systematic and gross errors on the observations must be eliminated; however, it is not possible to remove random errors. Random errors are assumed to follow a normal distribution and are always present in the observations. Observations whose errors deviate from the normal distribution are called 'outliers'.
The preliminary studies were implemented using two models. In the 'mean-shift' model, which is used by conventional methods, the outliers are detected step by step using statistical tests. In every step, one observation is detected as an outlier and removed from the observation set. The conventional methods of Data Snooping (DS), Tau, and t tests have the disadvantage that they remove outlying baselines, which in turn deteriorates the shape of the network. In addition, a normal observation may be detected as an outlier, or an outlying observation may be perceived as a normal observation, because of the other outliers present in the observation set. The other model used to detect outliers is the 'variance-inflation' model, which is used in robust estimations [1]. This model was developed to eliminate the effects of the outliers in the adjustment model. No outlier is removed from the network; instead, the weights of the observations are changed during the iterations. The weights of the outliers are reduced, even to zero, while the weights of the normal observations are kept unchanged. The most important point in robust methods is defining the most acceptable critical value for the weight functions. This critical value can be computed or selected as a constant value. In other works [2], this scheme of computing the robust estimator by iterative reweighting has been replaced by the solution of a global optimization problem in the space of unknowns. The fuzzy logic approach uses both fuzzy set theory and statistical test theory to determine outliers. Contrary to the conventional methods, no observation is removed from the observation set. In addition, a more accurate decision is made for the observations close to the critical value.
In this paper, outliers in GPS networks are detected using different outlier detection and robust estimation methods, and the performances of these methods are investigated to evaluate their behaviors and similarities in detection of existing outliers.

Conventional Statistical Test Methods
These methods are based on the assumption that only one observation can be detected as an outlier in each iteration step of the adjustment. The outlying observation is identified using statistical test theory. In conventional methods, Least Squares Estimation (LSE) is used. LSE has some advantages, such as the simplicity of the calculation algorithm. In addition, the properties of the stochastic and functional models do not change from the beginning to the end.

Data Snooping
Data Snooping (DS) and other conventional methods use the mean-shift model. In the DS method, it is assumed that only one outlier is present in the observation set. In practice, this method allows detection of more than one outlier and estimation of their locations [3]. DS is performed when the a priori variance of the observation of unit weight is known. The standard deviations of the residuals are calculated using this a priori value. The residuals normalized by this method are normally distributed [4,5].
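A minimal sketch of one data snooping step is given below. It assumes the usual normalized-residual form for uncorrelated observations, w_i = |v_i| / (s0 * sqrt(q_i)), where q_i is the i-th diagonal element of the cofactor matrix of the residuals; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def data_snooping(v, q_vv_diag, s0, alpha0=0.001):
    """Return (index, statistic, critical value) of the most suspicious
    residual, or None if no normalized residual exceeds the critical value."""
    # Two-sided critical value of the standard normal distribution
    crit = NormalDist().inv_cdf(1.0 - alpha0 / 2.0)
    # Normalized residuals (assumed form, uncorrelated observations)
    w = [abs(vi) / (s0 * sqrt(qi)) for vi, qi in zip(v, q_vv_diag)]
    i = max(range(len(w)), key=w.__getitem__)
    return (i, w[i], crit) if w[i] > crit else None

# Hypothetical residuals (m), cofactors and a priori standard deviation
v = [0.002, -0.001, 0.015, 0.003]
q = [0.5, 0.6, 0.4, 0.7]
result = data_snooping(v, q, s0=0.004)
print(result)
```

Only the largest normalized residual is flagged per step; in the mean-shift model the flagged observation would be removed and the adjustment repeated until nothing exceeds the critical value.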

Tau Test
If the a priori variance is not known or a value cannot be assigned to it before adjustment, the a posteriori variance $m_0^2$ produced after adjustment is used for outlier detection.
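The Tau statistic can be sketched as follows, assuming a diagonal weight matrix, the a posteriori variance m0^2 = v'Pv / f, and the statistic |v_i| / (m0 * sqrt(q_i)). The conversion from a Student critical value to a Tau critical value uses the commonly cited relation tau = sqrt(f) * t / sqrt(f - 1 + t^2), with t taken at f - 1 degrees of freedom. The Student critical value is passed in by the caller, and all names and numbers are hypothetical.

```python
from math import sqrt

def tau_statistics(v, p, q_vv_diag, f):
    """A posteriori m0 and Tau statistics (diagonal weight matrix assumed)."""
    m0 = sqrt(sum(pi * vi * vi for pi, vi in zip(p, v)) / f)
    stats = [abs(vi) / (m0 * sqrt(qi)) for vi, qi in zip(v, q_vv_diag)]
    return m0, stats

def tau_critical(t_crit, f):
    """Tau critical value from a Student critical value with f-1 degrees of freedom."""
    return sqrt(f) * t_crit / sqrt(f - 1 + t_crit ** 2)

# Four direct observations of one quantity: 10.0, 10.1, 9.9, 11.0 (mean 10.25)
v = [0.25, 0.15, 0.35, -0.75]      # residuals: mean minus observation
p = [1.0, 1.0, 1.0, 1.0]           # unit weights
q = [0.75, 0.75, 0.75, 0.75]       # diagonal of Qvv = 1 - 1/n
m0, tau = tau_statistics(v, p, q, f=3)
print(m0, tau)
```

Because m0 is computed from all residuals, including the suspicious one, the Tau statistic cannot exceed sqrt(f); in this example the largest value is close to that bound.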

The t Test
If an observation $l_i$ includes a gross error $\Delta_i$, using the standard deviation obtained from the invalid adjustment model is not appropriate. In this situation, it is more accurate to compute the $m_0^2$ value from the residuals that are free from the model errors,
where $f$ is the degree of freedom, $v$ is the vector of residuals, and $Q_{vv}$ is the cofactor matrix of the residuals.
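One common way to free the variance estimate from a suspected gross error is sketched below. It assumes a diagonal weight matrix and the form s_i^2 = (v'Pv - v_i^2 / q_i) / (f - 1), which is equivalent to re-estimating the variance with observation i excluded; this form and the names used are assumptions for illustration.

```python
from math import sqrt

def t_statistics(v, p, q_vv_diag, f):
    """t statistics with the variance freed from the tested observation."""
    omega = sum(pi * vi * vi for pi, vi in zip(p, v))   # v'Pv
    stats = []
    for vi, qi in zip(v, q_vv_diag):
        # Variance of unit weight without the contribution of observation i
        s_i_sq = (omega - vi * vi / qi) / (f - 1)
        stats.append(abs(vi) / (sqrt(s_i_sq) * sqrt(qi)))
    return stats

# Four direct observations of one quantity: 10.0, 10.1, 9.9, 11.0 (mean 10.25)
v = [0.25, 0.15, 0.35, -0.75]
p = [1.0, 1.0, 1.0, 1.0]
q = [0.75, 0.75, 0.75, 0.75]       # diagonal of Qvv = 1 - 1/n
t_stats = t_statistics(v, p, q, f=3)
print(t_stats)
```

For the last observation the freed standard deviation is about 0.1, which is what the three remaining observations give on their own, so its statistic is much larger than the corresponding Tau value.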
In Table 1, $P$ is the weight matrix of the observations, $s_0$ is the a priori standard deviation of unit weight, $f$ is the degree of freedom, $\alpha_0$ is the significance level, $N$ represents the normal distribution, $F$ represents the Fisher statistic, $\chi^2$ represents the Chi-Square statistic, $t$ represents the Student (t) statistic, and $\tau$ represents the Tau statistic. If the correlation among residuals is neglected, the significance level $\alpha_0$ is computed as:
$$\alpha_0 = 1 - (1 - \alpha)^{1/n} \approx \frac{\alpha}{n} \qquad (3)$$
where $n$ is the number of observations, and $\alpha$ is usually chosen as 5% [7].
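The effect of the number of observations on the per-observation significance level can be checked numerically. The sketch assumes the commonly used form alpha_0 = 1 - (1 - alpha)^(1/n), which is approximately alpha / n when correlations among residuals are neglected; the function name and the value of n are hypothetical.

```python
def per_observation_significance(alpha, n):
    """Significance level for a single test so that n independent tests
    have the overall level alpha (assumed form, correlations neglected)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)

alpha0 = per_observation_significance(alpha=0.05, n=100)
print(alpha0)          # close to alpha / n = 0.0005
```

With an overall level of 5%, any network with more than about 50 observations already yields a per-observation level below 0.001.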

Some Robust Estimations
Estimation methods based on LSE are sensitive to deviations of the observation errors from the normal distribution; therefore, LSE is not distributionally robust. No single robust method can be said to be better than the others, since there is no unique criterion of robustness. The most commonly used estimators in the literature are M-, L-, and R-Estimators [8][9]. M-Estimators stand out as the most flexible estimators and are considered by many to be the most favorable estimator group [8]. M-Estimation is the most convenient technique for handling observations that include gross errors. It can be applied to heavy-tailed normal distributions. It is assumed that the geodetic data follow a normal distribution.
LSE is a special case of M-Estimation whose score function is quadratic in the residuals. The computational algorithm may follow an iteratively reweighted scheme, as mentioned in [10][11], although there are other approaches in the literature.
In this scheme, $\psi$ is the influence function and $w$ is the robust weight factor, which decreases as the absolute value of the residual increases.
In the estimation procedure, $\bar{P}$ is the equivalent weight matrix, $P$ is the initial weight matrix, $w$ is the robust weight factor, $A$ is the design matrix, $v$ is the vector of residuals, $x$ is the vector of the unknown parameters, $\ell$ is the vector of reduced observations, and $i$ is the iteration number. In the first iteration, $w$ is taken as the unit matrix.
Here $n$ is the number of observations. The iterations are executed until the difference between the parameter estimates of two consecutive iterations is negligible. At the end of the sequence of iterations, it is observed that the equivalent weights of the outliers become smaller and may even be reduced to zero. The weights of the normal observations either do not change during the iterations or change only slightly.
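The iteratively reweighted scheme can be illustrated on the simplest adjustment model, n direct observations of a single unknown, so that the design matrix is a column of ones and the weighted LS solution is a weighted mean. The sketch uses a Huber-type weight factor with a hypothetical constant critical value c; it is a simplified illustration under these assumptions, not the network adjustment used in this study.

```python
def irls_direct_observations(obs, p, c, tol=1e-10, max_iter=50):
    """Iteratively reweighted LS for direct observations of one unknown.
    Returns the estimate and the final robust weight factors."""
    w = [1.0] * len(obs)                    # first iteration: w is the unit matrix
    x_prev = None
    for _ in range(max_iter):
        p_eq = [pi * wi for pi, wi in zip(p, w)]        # equivalent weights
        x = sum(pe * l for pe, l in zip(p_eq, obs)) / sum(p_eq)
        v = [x - l for l in obs]                         # residuals v = A x - l
        # Huber-type weight factor: 1 inside the critical value, c/|v| outside
        w = [1.0 if abs(vi) <= c else c / abs(vi) for vi in v]
        if x_prev is not None and abs(x - x_prev) < tol:
            break                                        # parameters no longer change
        x_prev = x
    return x, w

obs = [10.0, 10.1, 9.9, 11.0]               # hypothetical data, last value is suspicious
x_hat, w_hat = irls_direct_observations(obs, p=[1.0] * 4, c=0.2)
print(x_hat, w_hat)
```

In this example the estimate moves from the plain mean 10.25 to about 10.067, the weight factor of the last observation drops to roughly 0.21, and the weight factors of the other three return to 1.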
As shown above, the equivalent weight matrix is used to decide whether observations are normal or outlying. In obtaining the equivalent weight matrix, a robust weight factor is used. The robust weight factors are obtained by comparing the residuals with critical values that are either derived from calculations or given as constant values. In the procedure that can be applied to calculate the critical value, $c$ represents the critical value, $s_0$ is the a priori standard deviation, $Q_{vv}$ is the cofactor matrix of the residuals, $P$ is the weight matrix of the observations, $f$ is the degree of freedom, $\alpha_0$ is the significance level, and $t$ represents the Student distribution.
The associated critical value is calculated by averaging the critical values calculated for each observation. In this study, some of the critical values of the estimations have been calculated and others have been taken as constant values. The weight functions of the M-Estimations used are listed in Table 2. $s_0$ in Table 2 is the a priori standard deviation of unit weight.
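To show how different weight functions treat a residual that exceeds the critical value, the sketch below implements textbook forms of the Huber, Tukey (biweight), and Andrews weight functions. These forms and the single parameter c are assumptions and may be parametrized differently in Table 2; the Danish and Yang schemes are left out because their forms vary between sources.

```python
from math import pi, sin

def huber_weight(v, c):
    """Weight 1 inside the critical value, c/|v| outside (never reaches zero)."""
    a = abs(v)
    return 1.0 if a <= c else c / a

def tukey_weight(v, c):
    """Biweight: decreases smoothly and is exactly zero beyond c."""
    a = abs(v)
    return (1.0 - (a / c) ** 2) ** 2 if a <= c else 0.0

def andrews_weight(v, c):
    """Sine-based weight: sin(v/c)/(v/c) up to |v| = c*pi, zero beyond."""
    a = abs(v)
    if a == 0.0:
        return 1.0
    return sin(a / c) / (a / c) if a <= c * pi else 0.0

# Weights for a small and a large residual with a hypothetical c = 0.2
for name, fn in (("Huber", huber_weight), ("Tukey", tukey_weight),
                 ("Andrews", andrews_weight)):
    print(name, fn(0.1, 0.2), fn(1.0, 0.2))
```

The large residual keeps a reduced but nonzero weight under the Huber function, while the Tukey and Andrews functions assign it a weight of zero.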

Fuzzy Logic Method
In the fuzzy logic approach, fuzzy set theory and statistical test theory are used together. Two important properties of fuzzy sets are used to identify outliers: complementation and intersection. At the beginning, a statistical test is applied to the residuals, and the residuals are classified as 'normal' or 'abnormal' according to their test statistics [12]. This test will be called "the first test" when the results are presented. A residual is classified as 'abnormal' if its test statistic $t_i$ exceeds the critical value $q$, and as 'normal' otherwise. The membership functions are used to resolve the vagueness concerning the residuals whose test statistics are very close to the critical value.
Here $\mu_F$ is the membership function related to the subset F, and $d$ is the standardization component that shows the meaningful deviation magnitude of the test statistic from the critical value [7]. It is impossible to state a definite value for $d$. Therefore, several values have been assigned to this component.
After the membership values of the residuals have been defined, the fuzzy membership relations between the observation errors are determined. The membership values of the residuals and the redundancy matrix are used for this purpose. The relation between the residuals and the errors is given as follows: Here, the product $P\,Q_{vv}$ is equal to the redundancy matrix $R$, and the equation represents the transformation between the residuals and the observation errors. In order to obtain the effects of all observation errors on a residual, the relative redundancy matrix R is used. The elements of this matrix are obtained as follows:

Table 2. Weight functions and critical values of the M-Estimations used (columns: M-Estimation, Weight Function, Critical Value; rows from Andrews [13] to Yang-II)
The set of the gross errors H is defined as the intersection of A, the set of the errors having the greatest effect on the residuals that are most likely abnormal, and B, the set of the errors having the least effect on the residuals that are most likely normal. The membership value of an observation error in the set H is defined as follows: In order to obtain the membership values of the observation errors in the sets A and B, the maximum relative contribution of the $i$th observation error to the residuals that have membership values equal to or greater than 0.5 in the subsets F and P, respectively, is searched. Then, this relative value and its complementary value are multiplied by the membership value of the corresponding residual as follows: The membership values of the observation errors in the intersection set H are compared with a critical value. This value can be calculated using an arithmetic or a weighted mean [7].
Let the number of the elements belonging to set H with membership values different from zero be $k$. The critical value $C_{\mu H}$ can then be calculated with an arithmetic mean as: In the weighted mean method, weights are given to the membership values, taking into consideration the relative effect of the observation errors in their own set, as follows: After the observation errors that have membership values greater than the critical value $C_{\mu H}$ have been obtained, a priori knowledge about the location of these errors is also available. In order to verify this determination, a procedure based on an $H$ matrix is proposed by [12]; the dimensions of this matrix are given by the number of observations and the number of observations that exceed the critical value $C_{\mu H}$, respectively. In the $H$ matrix, the column element that corresponds to the observation with a gross error is taken as 1. Consequently, the significance of the estimated gross errors is tested using one of the statistical tests. When the results are presented, this will be called the last test.
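The arithmetic-mean variant of the critical value described above can be sketched as follows: the membership values in H that differ from zero are averaged, and the observation errors whose membership exceeds this mean are taken as suspected gross errors. The function names and the membership values are hypothetical.

```python
def critical_membership(mu_h):
    """Arithmetic mean of the k membership values in H that differ from zero."""
    nonzero = [m for m in mu_h if m != 0.0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def suspected_errors(mu_h):
    """Indices of the observation errors whose membership exceeds the critical value."""
    c = critical_membership(mu_h)
    return [i for i, m in enumerate(mu_h) if m > c]

# Hypothetical membership values of five observation errors in the set H
mu_h = [0.0, 0.1, 0.0, 0.8, 0.3]
c_mu_h = critical_membership(mu_h)
flagged = suspected_errors(mu_h)
print(c_mu_h, flagged)
```

Here k = 3, the critical value is about 0.4, and only the fourth observation error is carried forward to the last test.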

Experiments and Analyses
In this study, three GPS networks (see Figure 1) have been evaluated to examine outliers using different methods. In order to focus only on the networks and not on external constraints, a free adjustment strategy has been applied. The properties of the networks are listed in Table 3. Various calculation techniques can be used to compute the a priori standard deviation of unit weight ($s_0$). For instance, one of these techniques uses the loop closures. However, this is not correct in some situations. In GPS networks, loop closures are not independent, since the same variables are used in neighboring closures, and, unlike triangle closures, the loop closures do not have equal weights. GPS baseline components are correlated, i.e., every baseline has a 3x3 block in the weight matrix, as given in equation (4). Therefore, a gross error affects all components, since the weight matrix is non-diagonal. The correlation of GPS baseline components makes the detection of possible outliers a question for further research, namely whether the corresponding 3D baseline determination is an outlier as a whole or not. Besides, statistical tests such as the -test are not rigorously valid (though widely applied) for the case of correlated observations [19]. Since it is crucial to determine $s_0$, the formula using the median of the absolute values of the weighted residuals is more convenient [18]: where "med" denotes the median, and $P_i$ and $v_i$ are the weight and the residual of observation $i$, respectively. After $s_0$ is obtained, it should not be changed in the iteration steps of the robust estimation.
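A median-based estimate of the standard deviation of unit weight can be sketched as follows. The sketch assumes the form s0 = k * med(|sqrt(P_i) * v_i|) with k = 1.4826, the usual factor that makes the median of absolute values consistent with the standard deviation under normality; the constant used in this study may differ, and the function name and values are hypothetical.

```python
from math import sqrt
from statistics import median

def robust_s0(v, p, k=1.4826):
    """Median-based standard deviation of unit weight from weighted residuals.
    The factor k is an assumption (normal-consistency constant)."""
    return k * median(abs(sqrt(pi) * vi) for pi, vi in zip(p, v))

# Hypothetical residuals with unit weights; the last one is suspicious
v = [0.25, 0.15, 0.35, -0.75]
p = [1.0, 1.0, 1.0, 1.0]
s0 = robust_s0(v, p)
print(s0)
```

For comparison, the least-squares value sqrt(v'Pv / f) for the same residuals with f = 3 is about 0.51, which is inflated by the single large residual, whereas the median-based value is about 0.44.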
Conventional methods have been applied to the three networks with two different significance levels: 0.01 and 0.001. When the significance levels are calculated using equation (3), a value smaller than 0.001 is obtained for each network. The smaller the significance level, the less sensitive the statistical test is to outliers. In other words, few or no outliers can be detected at small significance levels. Therefore, the smallest significance level is taken as 0.001 for all three networks.

Statistical Tests and Fuzzy Logic Approach for Outlier Detection
In the conventional methods, no outlier has been detected at a significance level of 0.01 for the first and second GPS networks (Table 4). Hence, the statistical tests have not been applied at a significance level of 0.001. As shown in Table 4, the greatest test statistic of the residuals is smaller than the critical value. As a consequence, it is not possible to determine any outlier using the fuzzy logic method, since no residual has been classified as 'abnormal' in the first test.
In the conventional methods at a significance level of 0.01, 13 identical observations have been determined as outliers with Tau and the t test for the third GPS network. In DS, there are only two outlying observations as shown in Table 5. In addition, at a significance level of 0.001, DS detected only one observation. The Tau and t tests indicate two outliers that are the same as the results of DS at a significance level of 0.01. In order to see the changes in the results of the fuzzy logic method, the components of this method have been used alternately. The different applications of the fuzzy logic method are shown in Table 6. Here, the Tau test was applied to the observations at the beginning and at the end. Unlike the first two GPS networks, it was possible to separate the residuals as 'normal' and 'abnormal' in the third GPS network. So it has been possible to execute the fuzzy logic method. When attention is paid to the results, it can be seen that they are quite compatible with the results of the conventional methods. The abbreviations used in Table 6

Comparison of Some Robust Estimators
We followed only the iteratively reweighted least squares scheme. In robust methods, the critical value has been calculated from equations (9) and (10), and the constant value given is 0 s 2 except for Yang-I and Yang-II M-Estimations. As shown in Table 2, the weight functions of Yang-I and Yang-II M-Estimation methods are derived differently. The robust methods with constant critical values are denoted with a superscript star in Table 7.
After applying the robust estimators to the three GPS networks, it has been seen that the Danish and Huber methods yield similar results, reducing the weights of the suspicious observations during the iterations. The Tukey, Andrews, and Yang-I M-Estimations resemble each other in reducing the weights to zero. On the other hand, the Yang-II M-Estimation is the method whose results best fit those of the statistical test methods and the fuzzy logic approach. The methods other than Yang-II M-Estimation flag many more observations as outliers than the statistical tests do. Table 7 has been arranged to compare the results of the robust estimations. The performances of the different robust estimations in the first GPS network coincide with those in the second and third GPS networks; therefore, as a representative, Table 7 contains only the results for the first GPS network. The constant and calculated critical values, and the observations whose weights change during the iterations, are given in Table 7.

Conclusions
In this study, it has been seen that it is appropriate to apply conventional detection tests at a significance level of 0.001 in GPS networks. Using equation (3), a value smaller than 0.001 was obtained for the three networks. However, if the conventional methods are used at very small significance levels, these methods tend to mask the outliers. On the other hand, at a greater significance level such as 0.01, more outliers appear to exist in the networks. Therefore, the significance level can be selected as 0.001 in GPS networks that have a large number of observations. In the first and second GPS networks, no outliers appeared at any significance level. The third GPS network provided the opportunity to compare the various conventional methods. The t and Tau tests give the same results at the different significance levels, so one can be substituted for the other. DS differs from these two tests and indicates fewer outliers. This behavior of DS is related to the a priori variance of the networks. If this a priori value is calculated from equation (26), it is more appropriate to apply DS to the networks than the other conventional methods.
In the fuzzy logic method, the statistical tests have an important effect on the results. Compared with the conventional methods, fewer outliers are visible with the fuzzy logic method, since in the conventional methods outliers maintain their effects on the adjustment model throughout the iterations. Unlike in the conventional methods, no observation is removed from the network, and the shape of the network is kept to the end in the fuzzy logic method. This characteristic can be seen as an advantage. However, the large number of parameters used in this method makes it difficult to use as widely as the conventional methods. In this study, it has been seen that if appropriate values can be given to these parameters, the results are more reliable than those of the conventional methods. Even if the statistical tests are applied at a greater significance level, a more reliable decision can be made about the outliers than with the conventional methods.
In robust estimation computed with the iteratively reweighted least squares scheme, it is crucial to determine the critical value that is used by the weight functions. In this study, this critical value has been both taken as a constant and calculated when applying the robust estimators to the GPS networks. The Danish method and Huber M-Estimation usually reduce the weights of the suspicious observations, whereas the Tukey, Andrews, and Yang-I M-Estimations tend to make the weights zero. These methods may give results similar to those of the conventional methods applied at great significance levels. However, it has been observed in this study that it is not appropriate to choose great significance levels in GPS networks with many redundant observations. All the robust methods except Yang-II M-Estimation flag more observations as outliers than the conventional statistical test methods do. The Yang-II M-Estimation is compatible with the conventional methods applied at small significance levels and with the fuzzy logic method.
Since GPS baseline components are correlated, a gross error in one component also affects the other components. Therefore, the detection of possible outliers is a question of further research so as to determine whether the corresponding 3D-baseline determination is an outlier as a whole or not. Besides, statistical tests such as the -test are not rigorously valid for the case of correlated observations.