Mean Shift versus Variance Inflation Approach for Outlier Detection—A Comparative Study

Abstract: Outlier detection is one of the most important tasks in the analysis of measured quantities to ensure reliable results. In recent years, a variety of multi-sensor platforms has become available, which allow autonomous and continuous acquisition of large quantities of heterogeneous observations. Because the probability that such data sets contain outliers increases with the quantity of measured values, powerful methods are required to identify contaminated observations. In geodesy, the mean shift model (MS) is one of the most commonly used approaches for outlier detection. In addition to the MS model, there is an alternative approach based on the model of variance inflation (VI). In this investigation, the VI approach is derived in detail by truly maximizing the likelihood functions, and it is examined for the detection of one or multiple outliers. In general, the variance inflation approach is non-linear, even if the null model is linear. Thus, an analytical solution usually does not exist, except in the case of repeated measurements. The test statistic is derived from the likelihood ratio (LR) of the models. The VI approach is compared with the MS model in terms of statistical power, identifiability of actual outliers, and numerical effort. The main purpose of this paper is to examine the performance of both approaches in order to derive recommendations for the practical application of outlier detection.


Introduction
Nowadays, outlier detection in geodetic observations is part of the daily business of modern geodesists. As Rofatto et al. [1] state, we have had well-established and practicable methods for outlier detection for half a century, which are also implemented in current standard geodetic software. The most important toolbox for outlier detection is so-called data snooping, which is based on the pioneering work of Baarda [2]. A complete distribution theory of data snooping, also known as the DIA (detection, identification, and adaptation) method, was developed by Teunissen [3].
In geodesy, methods for outlier detection can be characterised as a statistical model selection problem. A null model is opposed to one or more extended or alternative models. While the null model describes the expected stochastic properties of the data, the alternative models deviate from such a situation in one way or another. For outlier detection, the alternative models relate to the situation where the data are contaminated by one or more outliers. According to Lehmann [4], an outlier is defined as "an observation that is so probably caused by a gross error that it is better not used or not used as it is".
From a statistical point of view, outliers can be interpreted as a small amount of data that have different stochastic properties than the rest of the data, usually a shift in the mean or an inflation of the variance of their statistical distribution. This situation is described by extra parameters in the functional or stochastic model, such as shifted means or inflated variances. Such an extended model is called an alternative model. Due to the additionally introduced parameters, the discrepancies between the observations and the related results of the model decrease w. r. t. the null model. It has to be decided whether such an improvement of the goodness of fit is statistically significant, which means that the alternative model describes the data better than the null model. This decision can be made by hypothesis testing, information criteria, or many other statistical decision approaches, as shown by Lehmann and Lösler [5,6].
The standard alternative model in geodesy is the mean shift (MS) model, in which the contamination of the observations by gross errors is modelled as a shift in the mean, i.e., by a systematic effect. This approach is described in a large number of articles and textbooks, for example, the contributions by Baarda [2], Teunissen [7], and Kargoll [8]. However, there are other options besides this standard procedure. The contamination may also be modelled as an inflation of the variance of the observations under consideration, i.e., by a random effect. This variance inflation (VI) model is rarely investigated in mathematical statistics or geodesy. Bhar and Gupta [9] propose a solution based on Cook's statistic [10]. Although this statistic was invented for the MS model, it can also be made applicable when the variance is inflated. Thompson [11] uses the VI model for a single outlier in the framework of the restricted (or residual) maximum likelihood estimation, which is known as REML. In contrast to the true maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters, and it causes less computational workload. Thompson [11] proposes that the observation with the largest log-likelihood value can be investigated as a possible outlier. Gumedze et al. [12] take up this development and set up a so-called variance shift outlier model (VSOM). Likelihood ratio (LR) and score test statistics are used to identify the outliers. The authors conclude that VSOM gives an objective compromise between including and omitting an observation, where its status as a correct or erroneous observation cannot be adequately resolved. Gumedze [13] reviews this approach and works out a one-step LR test, which is a computational simplification of the full-step LR test.
In geodesy, the VI model was introduced by Koch and Kargoll [14]. The estimation of the unknown parameters has been established as an iteratively reweighted least squares adjustment. The expectation maximization (EM) algorithm is used to detect the outliers. It is found that the EM algorithm for the VI model is very sensitive to outliers, due to its adaptive estimation, whereas the EM algorithm for the MS model provides the distinction between outliers and good observations. Koch [15] applies the method to fit a surface in three-dimensional space to the Cartesian coordinates of a point cloud obtained from measurements with a laser scanner.
The main goal of this contribution is a detailed derivation of the VI approach in the framework of outlier detection and its comparison with the well-established MS model. The performance of both approaches is compared in order to derive recommendations for the practical application of outlier detection. This comprises the following objectives: 1. Definition of the generally accepted null model and specification of alternative MS and VI models (Section 2). Multiple outliers are allowed for in both alternative models to keep the models equivalent. 2. True maximization of the likelihood functions of the null and alternative models, not only for the common MS model, but also for the VI model. This means we do not resort to the REML approach of Thompson [11], Gumedze et al. [12], and Gumedze [13]. This is important for the purpose of an insightful comparison of MS and VI (Section 3). 3. Derivation of the test statistics from the likelihood ratio of the null and alternative models (Section 3). 4. Comparison of both approaches using the illustrative example of repeated observations, which is worked out in full detail (Section 4).
Section 5 briefly summarises the investigations that were carried out and critically reviews the results. Recommendations for the practical application of outlier detection conclude this paper.

Null Model, Mean Shift Model, and Variance Inflation Model
In mathematical statistics, a hypothesis H is a proposed explanation that the probability distribution of the random n-vector y of observations belongs to a certain parametric family W of probability distributions with parameter vector θ, e.g., Teunissen [7]. The parameter vector θ might assume values from a set Θ of admissible parameter vectors. A model is then simply the formulation of the relationship between observations y and parameters θ based on H. In geodesy, the standard linear model is based on the hypothesis that the observations follow a normal distribution, e.g., Koch [16], Teunissen [7], i.e.,

H_0: y ∼ N(Ax, Σ), (2)

with the u-vector of functional parameters x and the covariance matrix Σ. The latter matrix might contain further stochastic parameters, like a variance factor σ², according to

Σ = σ²Q, (3)

with Q being the known cofactor matrix of y. In this case, θ is the union of x and σ². The covariance matrix Σ might eventually contain more stochastic parameters, known as variance components, cf. Koch ([16], p. 225ff). Matrix A is said to be the n × u-matrix of design. The model that is based on H_0 is called the null model. In outlier detection, we oppose H_0 with one or many alternative hypotheses, most often in the form of a mean shift (MS) hypothesis, e.g., Koch [16], Teunissen [7],

H_MS: y ∼ N(Ax + C∇, Σ), (4)

where ∇ is an m-vector of additional functional bias parameters and matrix C extends the design. In this case, θ is extended by ∇. This relationship gives rise to the MS model, where the mean of the observations is shifted from Ax to Ax + C∇ by the effect of gross errors, see Figure 1. C∇ can be interpreted as accounting for the systematic effect of gross observation errors superposing the effect of normal random observation errors already taken into account by Σ in (2). The great advantage of H_MS is that, if the null model is linear or linearized, so is the MS model. The determination of the model parameters is numerically easy and computationally efficient, cf. Lehmann and Lösler [5].
However, there are different possibilities to set up an alternative hypothesis. The most simple one is the variance inflation hypothesis

H_VI: y ∼ N(Ax, Σ̄), (5)

where Σ̄ is a different covariance matrix, which consists of inflated variances. Σ̄ can be interpreted as accounting for the joint random effect of normal observation errors in all observations and zero mean gross errors in few outlying observations, see Figure 2. The VI model might be considered to be more adequate to describe the outlier situation when the act of falsification of the outlying observations is thought of as being a random event, which might not be exactly reproduced in a virtual repetition of the observations. However, even if the VI model might be more adequate to describe the stochastics of the observations, this does not mean that it is possible to estimate parameters or to detect outliers better than with some less adequate model like MS. This point will be investigated below. In the following, we will only consider the case that y are uncorrelated observations, where both Σ and Σ̄ are diagonal matrices, such that the hypotheses read

H_0: y_i ∼ N(a_i^T x, σ_i²), i = 1, …, n, (6a)
H_VI: y_i ∼ N(a_i^T x, τ_i σ_i²), i = 1, …, m, and y_i ∼ N(a_i^T x, σ_i²), i = m + 1, …, n. (6b)

Here, H_VI accounts for m random zero mean gross errors in the observations y_1, …, y_m, modeled by stochastic parameters τ_1, …, τ_m. In this case, θ is extended by τ_1, …, τ_m, which will be called variance inflation factors. Thus, τ_1, …, τ_m can be interpreted as a special case of extra variance components.
Note that the term variance inflation factors is used differently in multiple linear regression when dealing with multicollinearity, cf. James et al. ( [17], p. 101f).
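To make the three hypotheses concrete, the following sketch simulates one data set under each of H_0, H_MS, and H_VI for the simple design A = (1, …, 1)^T (direct observations of one parameter). All numbers (n, m, σ_i, ∇, τ) are freely chosen toy values, not taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup: n direct observations of a scalar parameter x, so A = (1, ..., 1)^T.
# The first m observations are outlier-suspected.
n, m = 10, 2
x_true = 5.0
sigma = np.full(n, 0.1)                     # standard deviations under H0

# Null model H0: y ~ N(Ax, Sigma)
y0 = x_true + rng.normal(0.0, sigma)

# Mean shift model H_MS: the first m means are shifted by gross errors nabla
nabla = np.array([1.0, -0.8])
y_ms = y0.copy()
y_ms[:m] += nabla

# Variance inflation model H_VI: the first m variances are inflated by tau > 1
tau = np.array([25.0, 25.0])
sigma_vi = sigma * np.sqrt(np.r_[tau, np.ones(n - m)])
y_vi = x_true + rng.normal(0.0, sigma_vi)
```

Note how the falsification is a fixed (systematic) shift in the MS sample, whereas in the VI sample it is a zero mean random effect that would change in every virtual repetition.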

Outlier Detection by Hypothesis Tests
Outlier detection can be characterised as a statistical model selection problem. The null model, which describes the expected stochastic properties of the data, is opposed to one or more alternative models, which deviate from such properties. Usually, the decision whether the null model is rejected in favour of a proper alternative model is based on hypothesis testing. Multiple testing, consisting of a sequence of testings with one single alternative hypothesis only, is required if there are many possible alternative models. In this study, we first focus on one single testing only. In the following subsections, the test statistics in the MS model as well as in the VI model are derived.
For the sake of simplicity, the scope of this contribution is restricted to cases, where all of the estimates to be computed are unique, such that all matrices to be inverted are regular.

Mean Shift Model
In the case of no stochastic parameters, i.e., Σ is known, the optimal test statistic T_MS(y) for the test problem H_0 in (2) versus H_MS in (4) is well known and reads, cf. Teunissen ([7], p. 76):

T_MS = ∇̂^T Σ_∇̂^(−1) ∇̂ = (C^T Σ^(−1) ê_0)^T (C^T Σ^(−1) Σ_ê_0 Σ^(−1) C)^(−1) (C^T Σ^(−1) ê_0), (7)

where ∇̂ and Σ_∇̂ are the vector of estimated bias parameters in the MS model and its covariance matrix, respectively. Furthermore, ê_0 and Σ_ê_0 are the vector of estimated residuals e := Ax − y in the null model and its covariance matrix, respectively. This second expression offers the opportunity to perform the test purely based on the estimation in the null model, cf. Teunissen ([7], p. 75), i.e., on the well-known least squares solution

x̂_0 = (A^T Σ^(−1) A)^(−1) A^T Σ^(−1) y. (8)

Each estimation is such that the likelihood function of the model is maximized. The hat will indicate maximum likelihood estimates below. Under H_0, the test statistic (7) follows a central χ² distribution with m degrees of freedom, while under H_MS the distribution is non-central, i.e.,

T_MS | H_0 ∼ χ²(m, 0), T_MS | H_MS ∼ χ²(m, λ). (9)
The test statistic (7) has the remarkable property of being uniformly most powerful invariant (UMPI). This means, given a probability of a type 1 decision error (rejection of H_0 when it is true) α, (7)
• has the least probability of a type 2 decision error (failure to reject H_0 when it is false) β (most powerful);
• is independent of ∇ (uniform); and
• enjoys these properties only for some transformed test problem (invariant).
For the original test problem, no uniformly most powerful (UMP) test exists. For more details see Arnold [18], Kargoll [8].
Lehmann and Voß-Böhme [19] prove that (7) has the property of being a UMPχ² test, i.e., a UMP test in the class of all tests with a test statistic following a χ² distribution. It can be shown that (7) belongs to the class of likelihood ratio (LR) tests, where the test statistic is equivalent to the ratio

LR = max L_0 / max L_MS. (10)

Here, L_0 and L_MS denote the likelihood functions of the null and alternative model, respectively, cf. Teunissen ([7], p. 53), Kargoll [8]. (Two test statistics are said to be equivalent if they always define the same critical region and, therefore, bring about the same decision. A sufficient condition is that one test statistic is a monotone function of the other. In this case, either both or none exceed their critical value referring to the same α.) The LR test is very common in statistics for the definition of a test statistic, because only a few very simple test problems permit the construction of a UMP test. A justification for this definition is provided by the famous Neyman-Pearson lemma, cf. Neyman and Pearson [20].
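As an illustration of (7), the following sketch computes T_MS purely from the null-model adjustment for a toy example (direct observations with one suspected outlier; all numbers freely chosen). It assumes the standard data-snooping form in which the quadratic form is built from C^T Σ^(−1) ê_0 and its covariance matrix:

```python
import numpy as np

def t_ms(y, A, Sigma, C):
    """Mean shift test statistic T_MS computed purely from the null-model
    adjustment (a sketch of the standard data-snooping formulas)."""
    W = np.linalg.inv(Sigma)                      # weight matrix
    N = A.T @ W @ A                               # normal equation matrix
    x0 = np.linalg.solve(N, A.T @ W @ y)          # least squares solution (8)
    e0 = y - A @ x0                               # residuals in the null model
    Qe = Sigma - A @ np.linalg.solve(N, A.T)      # covariance matrix of residuals
    u = C.T @ W @ e0
    M = C.T @ W @ Qe @ W @ C                      # covariance matrix of u
    return float(u.T @ np.linalg.solve(M, u))

# Example: direct observations of one parameter, first observation suspected
rng = np.random.default_rng(1)
n = 8
A = np.ones((n, 1))
Sigma = 0.01 * np.eye(n)
C = np.zeros((n, 1)); C[0, 0] = 1.0
y = 3.0 + rng.normal(0.0, 0.1, n)
y[0] += 0.5                                      # introduce a gross error
T = t_ms(y, A, Sigma, C)
```

Under H_0, T follows a central χ² distribution with m = 1 degree of freedom, so values far above the 0.95 quantile 3.84 speak against H_0.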

Variance Inflation Model
For the test problem (6a) versus (6b), no UMP test exists. Therefore, we also resort to the LR test here. We start by setting up the likelihood functions of the null and alternative model. For the null model (6a), the likelihood function reads

L_0(x; y) = (2π)^(−n/2) (∏_{i=1}^n σ_i)^(−1) exp(−(1/2) Σ_{i=1}^n (y_i − a_i^T x)²/σ_i²), (11)

and for the alternative model (6b), the likelihood function is given by

L_VI(x, τ_1, …, τ_m; y) = (2π)^(−n/2) (∏_{i=1}^m τ_i^(1/2))^(−1) (∏_{i=1}^n σ_i)^(−1) exp(−(1/2) Σ_{i=1}^m (y_i − a_i^T x)²/(τ_i σ_i²) − (1/2) Σ_{i=m+1}^n (y_i − a_i^T x)²/σ_i²), (12)

where a_i, i = 1, …, n are the row vectors of A. According to (10), the likelihood ratio reads

LR = max L_0 / max L_VI. (13)

Equivalently, we might use the double negative logarithm of the likelihood ratio as test statistic, because it brings about the same decision as the likelihood ratio itself, i.e.,

T_VI = −2 log LR = min Ω_0 − min Ω_VI, (14)

where Ω_0 and Ω_VI denote the double negative logarithms of (11) and (12) up to a common constant. The first minimization result is the well-known least squares solution (8). The second minimization must be performed not only with respect to x, but also with respect to the unknown variance inflation factors τ_1, …, τ_m. The latter yield the necessary conditions

τ̂_i = ê²_VI,i/σ_i², i = 1, …, m. (15)

This means that τ_1, …, τ_m are estimated such that the first m residuals in the VI model ê_VI,i equal in magnitude their inflated standard deviations σ_i √τ̂_i, and the subtrahend in (14) is obtained by

min Ω_VI = min_{x_VI} { Σ_{i=1}^m (1 + log(ê²_VI,i/σ_i²)) + Σ_{i=m+1}^n ê²_VI,i/σ_i² }. (16)

In the latter expression, the minimum is to be found only with respect to the free parameter vector x_VI. This expression differs from min Ω_0 essentially by the logarithm of the first m normalized residuals. This means that those summands are down-weighted whenever the residuals ê_VI,i are larger in magnitude than their non-inflated standard deviations σ_i. The necessary conditions for x̂_VI are obtained by nullifying the first derivatives of (16) and read

Σ_{i=1}^m a_ij/ê_VI,i + Σ_{i=m+1}^n a_ij ê_VI,i/σ_i² = 0, j = 1, …, u, (17)

where a_ij denotes the j-th element of a_i. This system of equations can be rewritten as a system of polynomials of degree m + 1 in the parameters x̂_VI,i. In general, the solution for x̂_VI must be computed by a numerical procedure. This extra effort is certainly a disadvantage of the VI model.
Another disadvantage is that (14) does not follow a well-known probability distribution, which complicates the computation of the critical value, being the quantile of this distribution. Such a computation is best performed by Monte Carlo integration, according to Lehmann [21].
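Such a Monte Carlo computation of the critical value can be sketched as follows (a generic helper of our own, not the paper's implementation): sample the test statistic under H_0 many times and take the empirical (1 − α) quantile.

```python
import numpy as np

def mc_critical_value(sample_statistic, alpha=0.05, runs=100_000, seed=0):
    """Critical value as the (1 - alpha) quantile of a test statistic whose
    distribution under H0 is sampled by Monte Carlo integration (sketch)."""
    rng = np.random.default_rng(seed)
    t = np.array([sample_statistic(rng) for _ in range(runs)])
    return float(np.quantile(t, 1.0 - alpha))

# Illustration with a statistic of known distribution: a squared standard
# normal variate is chi-square with 1 degree of freedom, so the estimated
# 0.95 quantile must come out near the tabulated value 3.84.
c = mc_critical_value(lambda rng: rng.normal()**2, alpha=0.05)
```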
Note that the likelihood function L_VI in (12) has poles at τ_i = 0, i = 1, …, m. These solutions must be excluded from consideration, because they belong to minima of (12) or, equivalently, to maxima of (16). (Note that log τ_i is dominated by 1/τ_i as τ_i → 0.)
A special issue in the VI model is what to do if max τ̂_i ≤ 1 is found in (15). In this case, the variance is not inflated, such that H_0 must not be rejected in favour of H_VI, see (6b). However, it might happen that, nonetheless, T_VI in (14) exceeds its critical value, especially if α is large. In order to prevent this behaviour, we modify (14) by

T_VI = min Ω_0 − min Ω_VI if max τ̂_i > 1, and T_VI = 0 otherwise. (18)

If H_0 is true, then there is a small probability that max τ̂_i > 1 and, consequently, T_VI > 0 arises, i.e.,

α_max := P(T_VI > 0 | H_0). (19)

We see that a type 1 error cannot be required more probable than this α_max, i.e., contrary to the MS model, there is an upper limit for the choice of α.
Even more of a problem is what to do if min τ̂_i < 1 < max τ̂_i is found in (15). Our argument is that, in this case, H_0 should be rejected, but possibly not in favour of H_VI in (6b). A more suitable alternative hypothesis should be found in the framework of a multiple test.

Repeated Observations
There is one case which permits an analytical treatment even of the VI model, i.e., when one scalar parameter x is observed directly n times, such that we obtain A = (1, …, 1)^T =: 1. By transformation of the observations, all other models with u = 1 can also be mapped to this case. For compact notation, we define the weighted means of all observations and of only the last n − m inlying observations, i.e.,

w = σ_w² Σ_{i=1}^n y_i/σ_i², (20a)
W = σ_W² Σ_{i=m+1}^n y_i/σ_i². (20b)

By covariance propagation, the related variances of those expressions are obtained, i.e.,

σ_w² = (Σ_{i=1}^n 1/σ_i²)^(−1), (21a)
σ_W² = (Σ_{i=m+1}^n 1/σ_i²)^(−1). (21b)

Having the following useful identities,

w/σ_w² = W/σ_W² + Σ_{i=1}^m y_i/σ_i², (22a)
1/σ_w² = 1/σ_W² + Σ_{i=1}^m 1/σ_i², (22b)
w = W + σ_w² Σ_{i=1}^m (y_i − W)/σ_i², (22c)

the estimates in the null model (8) can be expressed as

x̂_0 = w, σ²_x̂_0 = σ_w², ê_0,i = w − y_i, (23)

and the minimum of the sum of the squared residuals is

min Ω_0 = Σ_{i=1}^n (y_i − w)²/σ_i². (24)
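The weighted means w and W and their standard deviations from (20) and (21) can be computed directly; a small sketch (function and variable names are ours):

```python
import numpy as np

def weighted_means(y, sigma, m):
    """Weighted mean w of all n observations (20a) and weighted mean W of the
    last n - m inlying observations (20b), together with their standard
    deviations sigma_w and sigma_W from (21)."""
    p = 1.0 / sigma**2                       # weights of the observations
    w = np.sum(p * y) / np.sum(p)
    W = np.sum(p[m:] * y[m:]) / np.sum(p[m:])
    sigma_w = np.sqrt(1.0 / np.sum(p))
    sigma_W = np.sqrt(1.0 / np.sum(p[m:]))
    return w, W, sigma_w, sigma_W

# Toy example: four observations of equal standard deviation, m = 1 suspected
w, W, sigma_w, sigma_W = weighted_means(np.array([1.0, 2.0, 3.0, 4.0]),
                                        np.full(4, 2.0), 1)
```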

Mean Shift Model
In the MS model, the first m observations are falsified by bias parameters ∇_1, …, ∇_m. Matrix C in (4) is a block matrix of the m × m identity matrix and a (n − m) × m null matrix. Maximizing the likelihood function yields the estimated parameters and residuals, i.e.,

x̂_MS = W, ê_MS,i = W − y_i, i = 1, …, m, (25a, 25b)

respectively, as well as the estimated bias parameters and their related covariance matrix, i.e.,

∇̂_i = y_i − W, i = 1, …, m, Σ_∇̂ = diag(σ_1², …, σ_m²) + σ_W² 1 1^T, (25c, 25d)

respectively. Note that (25d) is obtained by covariance propagation applied to (25c). By applying the Sherman-Morrison formula, cf. Sherman and Morrison [22], the inverse matrix of Σ_∇̂ is obtained,

Σ_∇̂^(−1) = diag(1/σ_1², …, 1/σ_m²) − σ_w² q q^T with q := (1/σ_1², …, 1/σ_m²)^T, (26)

and the test statistic (7) in the MS model becomes

T_MS = Σ_{i=1}^m ∇̂_i²/σ_i² − σ_w² (Σ_{i=1}^m ∇̂_i/σ_i²)². (27)

According to (9), the distributions of the null model and the alternative model are given by

T_MS | H_0 ∼ χ²(m, 0), T_MS | H_MS ∼ χ²(m, λ), (28)

respectively, where the non-centrality parameter reads

λ = Σ_{i=1}^m ∇_i²/σ_i² − σ_w² (Σ_{i=1}^m ∇_i/σ_i²)². (29)

For the special cases of m = 1 and m = 2 extra bias parameters, as well as the case of independent and identically distributed random observation errors, the related test statistics (27) are given by

Case m = 1: T_MS = ∇̂_1²/(σ_1² + σ_W²). (30)

Case m = 2: T_MS = ∇̂_1²/σ_1² + ∇̂_2²/σ_2² − σ_w² (∇̂_1/σ_1² + ∇̂_2/σ_2²)². (31)

Case σ_1 = σ_2 = ⋯ = σ_m =: σ: T_MS = (1/σ²) Σ_{i=1}^m ∇̂_i² − (σ_w²/σ⁴) (Σ_{i=1}^m ∇̂_i)². (32)

In the case of σ_w ≪ min σ_i, which often arises when m ≪ n, the test statistic tends to

T_MS → Σ_{i=1}^m ∇̂_i²/σ_i². (33)

Variance Inflation Model-General Considerations
In the VI model, the first m observations are falsified by variance inflation factors τ_1, …, τ_m. The necessary condition (17) specializes with A = 1 to

Σ_{i=1}^m 1/(y_i − x̂_VI) + Σ_{i=m+1}^n (y_i − x̂_VI)/σ_i² = 0. (34)

Using (20b) and (21b), this can be rewritten to

(x̂_VI − W)/σ_W² = Σ_{i=1}^m 1/(y_i − x̂_VI). (35)

This solution x̂_VI is obtained as a real root of a polynomial of degree m + 1, which might have at most m + 1 real solutions. In the model, it is easy to exclude the case that y_i = y_j, i ≠ j, because then they are either both outliers or both good observations. They should be merged into one observation. Let us index the observations as follows: y_1 < y_2 < ⋯ < y_m. We see that

• in the interval −∞ … y_1 of x̂_VI, the right hand side of (35) goes from 0 to +∞;
• in each interval y_{i−1} … y_i, it goes from −∞ to +∞; and
• in the interval y_m … +∞, it goes from −∞ to 0.
The left hand side of (35) is a straight line.
Therefore, (35) always has at least one real solution x̂_VI in each interval y_{i−1} … y_i, where one of them must be a maximum of (16), because (16) goes from −∞ up to some maximum and then down again to −∞ in this interval. Besides these m − 1 uninteresting solutions, (35) can have either no or two more real solutions, except in rare cases, where it might have one more real solution. If W < y_1, then there are no solutions above y_m. If W > y_m, then there are no solutions below y_1.
From these considerations it becomes clear that (35) can have, at most, one solution that is a minimum of (16), see also Figure 3.
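Numerically, the stationary points can be obtained by clearing denominators in (35), which turns the condition (x̂_VI − W)/σ_W² = Σ_i 1/(y_i − x̂_VI) into a polynomial of degree m + 1. A sketch using numpy's polynomial arithmetic (a helper of our own, assuming the reconstructed form of (35)):

```python
import numpy as np

def vi_stationary_points(y_out, W, sigma_W):
    """Real solutions x of condition (35),
        (x - W) / sigma_W**2 = sum_i 1 / (y_i - x),
    found as roots of the equivalent polynomial of degree m + 1
    (y_out holds the m outlier-suspected observations)."""
    m = len(y_out)
    # (x - W) * prod_i (y_i - x) - sigma_W**2 * sum_j prod_{i != j} (y_i - x) = 0
    poly = np.poly1d([1.0, -W])
    for yi in y_out:
        poly = poly * np.poly1d([-1.0, yi])
    for j in range(m):
        term = np.poly1d([sigma_W**2])
        for i in range(m):
            if i != j:
                term = term * np.poly1d([-1.0, y_out[i]])
        poly = poly - term
    r = np.roots(poly.coeffs)
    return np.sort(r[np.isreal(r)].real)

# m = 1 example: W = 0, sigma_W = 0.1, suspected observation y_1 = 1
roots = vi_stationary_points([1.0], 0.0, 0.1)   # two stationary points
```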
The second-order sufficient condition for a strict local minimum of (16) is that the Hessian matrix H of (16), given in (37), must be a positive definite matrix. A practical test for positive definiteness that does not require explicit calculation of the eigenvalues is the principal minor test, also known as Sylvester's criterion. The k-th leading principal minor is the determinant that is formed by deleting the last n − k rows and columns of the matrix. A necessary and sufficient condition for a symmetric n × n matrix to be positive definite is that all n leading principal minors are positive, cf. Prussing [24], Gilbert [25]. Invoking Schur's determinant identity (38) in combination with (15), we see that the k-th leading principal minor of H in (37) factorizes into a product of positive terms and a second factor (39). To be positive, the second factor must be ensured to be positive for each k. Obviously, if this is true for k = m, it is also true for all other k. Therefore, the necessary and sufficient condition for a local minimum of (16) reads

σ_W² Σ_{i=1}^m 1/(τ̂_i σ_i²) < 1. (40)

In other words, if and only if x̂_VI is sufficiently far away from all outlying observations, it belongs to a strict local minimum of (16).
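Sylvester's criterion is easy to apply numerically; a minimal sketch:

```python
import numpy as np

def is_positive_definite(H):
    """Sylvester's criterion: a symmetric matrix is positive definite if and
    only if all leading principal minors are positive (sketch; for large
    matrices an attempted Cholesky factorization would be cheaper)."""
    H = np.asarray(H, dtype=float)
    return all(np.linalg.det(H[:k, :k]) > 0.0 for k in range(1, H.shape[0] + 1))

ok = is_positive_definite(np.eye(3))                        # positive definite
bad = is_positive_definite(np.array([[1.0, 2.0],
                                     [2.0, 1.0]]))          # indefinite
```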
Using (24) and (16), the test statistic (18) in the VI model for max τ̂_i > 1 reads

T_VI = Σ_{i=1}^n (y_i − w)²/σ_i² − Σ_{i=1}^m (1 + log(ê²_VI,i/σ_i²)) − Σ_{i=m+1}^n ê²_VI,i/σ_i². (41)

Tracing back the flow sheet of computations, it becomes clear that T_VI depends on the observations only through y_1, …, y_m and w or, equivalently, through y_1, …, y_m and W. These m + 1 quantities represent a so-called "sufficient statistic" for the outlier test. An interesting result is obtained when we consider the case σ_W → 0, which occurs if the n − m good observations contain much more information than the m suspected outliers. In this case, (35) can be rewritten as

(x̂_VI − W) ∏_{i=1}^m (y_i − x̂_VI) = σ_W² P(x̂_VI), (42)

where P is some polynomial. If the right hand side goes to zero, at least one factor of the left hand side must also go to zero. As σ_W → 0, we obtain m + 1 solutions for x̂_VI, approaching W, y_1, …, y_m. The first solution can be a valid VI solution; the others are invalid as τ̂_i → 0. Note that we have x̂_VI → x̂_MS in this case, and also ê_VI,i → ê_MS,i for all i = 1, …, m. Having in mind (22c), we see that (41) becomes

T_VI → Σ_{i=1}^m ê²_MS,i/σ_i² − Σ_{i=1}^m (1 + log(ê²_MS,i/σ_i²)), (43)

also noting that with (22b) we find σ_w → 0, such that, with (15),

T_VI → Σ_{i=1}^m (τ̂_i − log τ̂_i − 1). (44)

When comparing this to the equivalent result in the MS model (33), we see that T_VI and T_MS are equivalent test statistics under the sufficient condition min τ̂_i > 1, because τ − log τ is a monotonic function for τ > 1. This means that, in this case, the decision on H_0 is the same, both in the MS and in the VI model. However, max τ̂_i > 1 might not be sufficient for this property.

Variance Inflation Model-Test for One Outlier
In the case m = 1, (35) reads

(x̂_VI − W)/σ_W² = 1/(y_1 − x̂_VI). (45)

Rewriting this to a quadratic equation yields up to two solutions, i.e.,

x̂_VI = (W + y_1)/2 ± sqrt(ê²_MS,1/4 − σ_W²). (46)

With (15), we find

τ̂_1 = (y_1 − x̂_VI)²/σ_1² = (|ê_MS,1|/2 ∓ sqrt(ê²_MS,1/4 − σ_W²))²/σ_1². (47)

For a solution to exist at all, we must have |ê_MS,1| ≥ 2σ_W. This means that y_1 must be sufficiently outlying; otherwise, H_0 is to be accepted.
The condition for a strict local minimum (40) reads here

(y_1 − x̂_VI)² > σ_W². (48)

For the solution in (46) with x̂_VI lying farther away from y_1, this inequality is trivially fulfilled. For the opposite solution, we require

|ê_MS,1|/2 − sqrt(ê²_MS,1/4 − σ_W²) > σ_W. (49)

Rewriting this expression yields

|ê_MS,1|/2 − σ_W > sqrt(ê²_MS,1/4 − σ_W²), (50)

and squaring both sides, which can be done because they are both positive, we readily arrive at |ê_MS,1| < 2σ_W, which is the case that no solution exists. Therefore, we have exactly one minimum of (16), i.e.,

x̂_VI = (W + y_1)/2 − sgn(y_1 − W) sqrt(ê²_MS,1/4 − σ_W²), τ̂_1 = (|ê_MS,1|/2 + sqrt(ê²_MS,1/4 − σ_W²))²/σ_1², (53a, 53b)

and the test statistic (18) becomes

T_VI = ê²_MS,1/(σ_1² + σ_W²) − 1 − log τ̂_1 − σ_W²/(τ̂_1 σ_1²). (54)

The condition τ̂_1 > 1 is equivalent to

sqrt(ê²_MS,1/4 − σ_W²) > σ_1 − |ê_MS,1|/2, (55)

which is trivially fulfilled if the right hand side is negative. If it is non-negative, both sides can be squared and rearranged to

|ê_MS,1| > σ_1 + σ_W²/σ_1. (56)

Since this condition also covers the case that |ê_MS,1| > 2σ_1, it can be used exclusively as an equivalent of τ̂_1 > 1.
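For m = 1, the whole solution fits in a few lines. The sketch below assumes the quadratic form of the stationarity condition, picks the root lying farther away from y_1 (the only minimum of (16)), and returns the inflation factor τ̂_1 from (15); it returns None when |ê_MS,1| < 2σ_W, i.e., when no stationary point exists:

```python
import numpy as np

def vi_solution_m1(y1, W, sigma_W, sigma1):
    """VI solution for m = 1 suspected outlier among repeated observations
    (sketch of the closed-form case): returns (x_vi, tau1) for the single
    minimum of (16), or None if no stationary point exists, in which case
    H0 is accepted."""
    e_ms = y1 - W                           # MS residual of y1 (our sign convention)
    disc = e_ms**2 / 4.0 - sigma_W**2
    if disc < 0.0:
        return None                         # y1 is not sufficiently outlying
    # of the two roots of the quadratic, the minimum of (16) is the one
    # lying farther away from y1
    x_vi = W + e_ms / 2.0 - np.sign(e_ms) * np.sqrt(disc)
    tau1 = (y1 - x_vi)**2 / sigma1**2       # variance inflation factor (15)
    return x_vi, tau1

res = vi_solution_m1(1.0, 0.0, 0.1, 0.3)    # a clearly outlying observation
```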
With (22c), we see that both x̂_VI as well as τ̂_1 through (53b) depend on the observations only through ê_MS,1, and so does T_VI in (54). On closer examination, we see that T_VI in (54) depends even only on |ê_MS,1|. This clearly holds as well for T_MS in (30). Therefore, both test statistics are equivalent if T_VI can be shown to be a strictly monotone function of T_MS. Figure 4 shows that (54) as a function of ê_MS,1 is monotone. A mathematical proof of monotony is given in Appendix A. Thus, it is also monotone as a function of T_MS and even strictly monotone for τ̂_1 > 1, which is the case that we are interested in. Therefore, the MS model and the VI model are fully equivalent for repeated observations to be tested for m = 1 outlier. Finally, we numerically determine the probability distribution of the test statistic (54) using Monte Carlo integration by
• defining the ratio σ_W/σ_1,
• generating normally distributed pseudo random numbers for ê_MS,1,
• evaluating (53) and (54), and
• taking the histogram of (54),
using 10⁷ pseudo random samples of ê_MS,1. In Figure 5, the positive branch of the symmetric probability density function (PDF) is given in logarithmic scale for various ratios of σ_W/σ_1. However, only about 30% of the probability mass is located under this curve; the rest is concentrated at T_VI = 0 and is not displayed. The quantiles of this distribution determine critical values and also α_max in (19). The results are summarized in Table 1.
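The Monte Carlo procedure above also yields α_max in (19). A simplified sketch of our own that only estimates α_max = P(T_VI > 0 | H_0), using the fact that, under H_0, ê_MS,1 ∼ N(0, σ_1² + σ_W²); the sample size is reduced for runtime:

```python
import numpy as np

def alpha_max_m1(sigma_W_over_sigma1, runs=200_000, seed=2):
    """Monte Carlo estimate of alpha_max = P(T_VI > 0 | H0) for m = 1
    (sketch): T_VI > 0 requires that a stationary point exists and that the
    estimated variance is actually inflated (tau1 > 1)."""
    rng = np.random.default_rng(seed)
    sigma1, sigma_W = 1.0, sigma_W_over_sigma1
    # under H0, the MS residual of y1 is normal with variance sigma1^2 + sigma_W^2
    e_ms = rng.normal(0.0, np.sqrt(sigma1**2 + sigma_W**2), runs)
    exists = np.abs(e_ms) >= 2.0 * sigma_W    # a stationary point exists
    s = np.sqrt(np.maximum(e_ms**2 / 4.0 - sigma_W**2, 0.0))
    e_vi = e_ms / 2.0 + np.sign(e_ms) * s     # VI residual of y1 (minimum branch)
    inflated = e_vi**2 > sigma1**2            # tau1 > 1
    return float(np.mean(exists & inflated))

a = alpha_max_m1(0.2)
```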

Variance Inflation Model-Test for Two Outliers
In the case m = 2 outliers, (35) reads

(x̂_VI − W)/σ_W² = 1/(y_1 − x̂_VI) + 1/(y_2 − x̂_VI). (58)

The solution for x̂_VI can be expressed in terms of a cubic equation, which permits an analytical solution. One real solution must be in the interval y_1 … y_2, but there may be two more solutions

• both below y_1, if W < y_1, or
• both above y_2, if W > y_2, or
• both between y_1 and y_2, if y_1 < W < y_2.
In rare cases, solutions may also coincide. The analytical expressions are very complicated, and they do not permit a treatment analogous to the preceding subsection. Therefore, we have to rely fully on numerical methods, which, in our case, is the Monte Carlo method (MCM).
First, we compute the critical values of the test statistic (18) for m = 2, referred to as T_VI in (59) below. The MCM is performed using 10⁷ pseudo random samples. We restrict ourselves to the case σ_1 = σ_2. The maximum selectable type 1 error probabilities α_max are summarized in Table 2. It is shown that α_max is mostly larger than for m = 1. The reason is that, more often, we obtain τ̂_i > 1, even if H_0 is true, which makes it easier to define critical values in a meaningful way. Moreover, Table 2 indicates the probabilities that, under H_0,
• (16) has no local minimum, and if it has one, that
• max(τ̂_1, τ̂_2) ≤ 1,
• min(τ̂_1, τ̂_2) ≤ 1 < max(τ̂_1, τ̂_2), or
• min(τ̂_1, τ̂_2) > 1,
i.e., none, one, or both variances are inflated. It is shown that, if the good observations contain the majority of the information, a minimum exists, but, contrary to our expectation, the case max(τ̂_1, τ̂_2) ≤ 1 is typically not the dominating case. The important result is what happens if H_0 is false, because variances are truly inflated. The probability that H_0 is rejected is known as the power 1 − β of the test, where β is the probability of a type 2 decision error. It is computed both with T_MS in (31) as well as with T_VI in (59). Table 3 provides the results. It is shown that the power of T_MS is always better than that of T_VI. This is unexpected, because T_MS is not equivalent to the likelihood ratio of the VI model.
A possible explanation of the low performance of T_VI in (59) is that, in many cases, the likelihood function L_VI has no local maximum, such that (16) has no local minimum. Even for an extreme variance inflation of τ_1 = τ_2 = 5, this occurs with a remarkable probability of 0.14. Moreover, the probability that max(τ̂_1, τ̂_2) ≤ 1 is hardly less than that. In both cases, H_0 cannot be rejected.
Table 2. Maximum selectable type 1 error probability α_max and critical value c_α for T_VI in (59) for various ratios of σ_W/σ_1 = σ_W/σ_2 and α = 0.05, as well as probabilities that (16) has no local minimum or that 0, 1, or 2 variances are inflated.
Table 3. Test power 1 − β_MS for the test using T_MS in (31) and test power 1 − β_VI for the test using T_VI in (59) for various true values of the variance inflation factors τ_1, τ_2 for σ_W = 0.1·σ_1 = 0.1·σ_2 and α = 0.05, as well as probabilities that (16) has no local minimum or that 0, 1, or 2 variances are inflated.


Outlier Identification
If it is not known which observations are outlier-suspected, a multiple test must be set up. If the critical values are identical in all tests, then we simply have to look for the largest test statistic. This is the case for T MS when considering the same number m of outlier-suspected observations, see (9). If we even consider different numbers m in the same multiple test, we have to apply the p-value approach, cf. Lehmann and Lösler [5].
In the VI model, the requirement of identical critical values is rarely met. It is, in general, not met for repeated observations, not even for m = 1, as can be seen in Figure 5. However, in this case, it is no problem, because the test with T V I in (54) is equivalent to T MS in (30), as demonstrated. This also means that the same outlier is identified with both test statistics.
For repeated observations, we find identical critical values only for identical variances of the outlier-suspected observations, such that those observations are fully indistinguishable from each other. For example, for n = 27, m = 2, α = 0.05, and σ 1 = · · · = σ n , we find σ W/σ i = 0.20 and c α = 2.32 for all 351 pairs of outlier-suspected observations, see Table 2.
We evaluate the identifiability of two outliers in n = 10 and n = 20 repeated observations with m = 2 outliers while using the MCM. In each of the 10⁶ repetitions, random observations are generated having equal variances. Two cases are considered. Whereas, in the first case, two variances are inflated by τ according to the VI model, in the second case, two observation values are shifted by ∇ according to the MS model. Using (31) and (59), the test statistics T_MS and T_VI are computed for all n(n − 1)/2 = 45 or 190 pairs of observations. If the maximum of the test statistic is attained for the actually modified pair of observations, the test statistic correctly identifies the outliers. Here, we assume that α is large enough for the critical value to be exceeded, but otherwise the results are independent of the choice of α. The success probabilities are given in Table 4.
Table 4. Success probabilities for outlier identification in repeated observations of equal variance σ² with two outliers.
As expected, the success probabilities increase as τ or ∇ gets large. However, in both cases, T_MS outperforms T_VI. In Figure 6, the ratio r_T of the success probabilities between the VI and the MS approach is depicted for n = 10 repeated observations. If r_T > 1, the success rate of T_VI is higher than for T_MS and vice versa. The ratio is always r_T < 1 and tends to 1, as shown in Figure 6. Therefore, the success probability of the MS approach is higher than for the VI approach, even if the outliers are caused by an inflation of the variances.
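The T_MS part of this identification experiment can be sketched as follows (our own condensed implementation of (27) for pairs of equal-variance repeated observations; the VI branch would additionally require solving the cubic (58) for every pair):

```python
import numpy as np

def identify_pair_ms(y, sigma):
    """Evaluate T_MS from (27) for every pair (i, j) of suspected outliers in
    repeated observations of equal variance sigma**2 and return the pair with
    the largest test statistic together with that statistic."""
    n = len(y)
    s_w2 = sigma**2 / n                        # variance of the overall weighted mean
    best, best_T = None, -np.inf
    for i in range(n):
        for j in range(i + 1, n):
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False
            W = y[mask].mean()                 # weighted mean of the inliers
            d = np.array([y[i] - W, y[j] - W]) # estimated bias parameters (25c)
            T = (d**2).sum() / sigma**2 - s_w2 * (d.sum() / sigma**2)**2
            if T > best_T:
                best, best_T = (i, j), T
    return best, best_T

# One repetition of the experiment: observations 0 and 1 are shifted by nabla
rng = np.random.default_rng(3)
n, sigma, nabla = 10, 1.0, 5.0
y = rng.normal(0.0, sigma, n)
y[:2] += nabla
pair, T = identify_pair_ms(y, sigma)           # ideally pair == (0, 1)
```

Repeating this over many random data sets and counting how often the modified pair attains the maximum yields the success probabilities of Table 4 for the MS branch.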

Conclusions
We have studied the detection of outliers in the framework of statistical hypothesis testing. We have investigated two types of alternative hypotheses: the mean shift (MS) hypothesis, where the probability distributions of the outliers are thought of as having a shifted mean, and the variance inflation (VI) model, where they are thought of as having an inflated variance. This corresponds to an outlier-generating process thought of as being deterministic or random, respectively. While the first type of alternative hypothesis is routinely applied in geodesy and in many other disciplines, the second is not. However, even if the VI model might be more adequate to describe the stochastics of the observations, this does not mean that it is possible to estimate parameters or detect outliers better than with some less adequate model, like MS.
The test statistic has been derived from the likelihood ratio of the null and alternative hypotheses. This was motivated by the famous Neyman-Pearson lemma, cf. Neyman and Pearson [20], even though this lemma does not directly apply to this test. Therefore, the performance of the test must be evaluated numerically.
When compared to existing VI approaches, we
• strived for a true (non-restricted) maximization of the likelihood function;
• allowed for multiple outliers;
• fully worked out the case of repeated observations; and,
• computed the corresponding test power by the MC method for the first time.
We found that the VI stochastic model has some critical disadvantages:
• the maximization of the likelihood function requires the solution of a system of u polynomial equations of degree m + 1, where u is the number of model parameters and m is the number of suspected outliers;
• it is neither guaranteed that the likelihood function actually has such a local maximum, nor that it is unique;
• the maximum might be at a point where some variance is deflated rather than inflated; it is debatable what the result of the test should be in such a case;
• the critical value of this test must be computed numerically by Monte Carlo integration, and this must be done for each model separately; and,
• there is an upper limit (19) for the choice of the critical value, which may become small in some cases.
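The first point above can be illustrated for the repeated-observations case (u = 1), where the score equation reduces to a single polynomial of degree m + 1 in the inflation factor τ. The following is a minimal sketch of the root-screening step only; the coefficients used in the example are purely illustrative (the actual coefficients depend on the residuals and are not reproduced here), and the function name is our own.

```python
import numpy as np

def admissible_inflation_roots(coeffs, tol=1e-9):
    """Real roots of the score polynomial (coefficients in descending
    order) that correspond to genuine variance inflation, i.e. tau > 1.
    Complex roots and deflation solutions (tau <= 1) are discarded."""
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < tol].real
    return np.sort(real[real > 1.0])

# Illustrative cubic (m = 2, hence degree m + 1 = 3) with roots 2, 0.5, -1:
# only tau = 2 survives the screening.
candidates = admissible_inflation_roots([1.0, -1.5, -1.5, 1.0])
```

If the returned array is empty, no admissible maximum with inflated variance exists, which is exactly the failure mode of the VI likelihood discussed in the second and third points above.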
For the first time, the application of the VI model has been investigated for the simplest model of repeated observations. It is shown that here the likelihood function admits at most one local maximum, and it does so if the outliers are strong enough. Moreover, in the limiting case that the suspected outliers carry an almost negligible amount of information, the VI test statistic and the MS test statistic have been demonstrated to be almost equivalent.
For m = 1 outlier in the repeated observations, there is even a closed formula (54) for the test statistic, and the existence and uniqueness of a local maximum is equivalent to a simple, checkable inequality condition. Moreover, in this case the VI test statistic and the MS test statistic are equivalent.
In our numerical investigations, we found that for m > 1 outliers in the repeated observations the power of the VI test is lower than that of the classical MS test. The reason is the lack of a maximum of the likelihood function, even for sizable outliers. Our numerical investigations also show that the identifiability of the outliers is worse for the VI test statistic. This is clearly seen in the case that the outliers are truly caused by shifted means, but the identifiability is slightly worse in the other case as well. This means that the correct outliers are more often identified with the MS test statistic.
In the considered cases, we did not find real advantages of the VI model, but this does not prove that they do not exist. As long as such cases are not found, we recommend performing practical outlier detection with the MS model.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
To prove the monotonicity of the tails of the function T_VI(e_MS,1) in (54), the extreme values of T_VI have to be determined. Since T_VI is symmetric, it is sufficient to restrict the proof to positive values of e_MS,1. Setting the first derivative T′_VI(e_MS,1) to zero yields two roots. Inserting the positive root e⁺ into the second derivative T″_VI identifies e⁺ as a minimum, because T″_VI(e⁺) is always positive for τ₁ > 1, cf. (55). For that reason, T_VI is a monotonically increasing function on the interval (e⁺, +∞). Figure A1 depicts the positive tails of T_VI and T′_VI, respectively, as well as the minimum e⁺.