Control Points Selection Based on Maximum External Reliability for Designing Geodetic Networks

: A set of stable and identiﬁable points—known as control points—are interconnected by direction, distance or height differences measurements form a geodetic network. Geodetic networks are used in various branches of modern science, such as monitoring the man-made structures, analysing the crustal deformation of the Earth, establishing and maintaining a geospatial reference frame, mapping, civil engineering projects and others. One of the most crucial components for ensuring the network quality is Geodetic Network Design. The design of a geodetic network depends on its purpose. In this paper, an automatic procedure for selection of control points is proposed. The goal is to ﬁnd the optimum control points location so that the maximum inﬂuence of an anomaly measurement (outlier) on the coordinates of the network is minimum. Here, the concept of Minimal Detectable Bias deﬁnes the size of the outlier and its propagation on the network coordinates is used to describe the external reliability. The proposed procedure was applied to design a levelling network. Two scenarios were investigated: design of a network with one control point (minimally constrained levelling network) and another with two control points (over-constrained levelling network). The centre of the network was the optimum position to set the control point. Results for that network reveal that the centre of the network was the optimum position to set the control point for the minimal constraint case, whereas the over-constraint case were those with less line connections. We highlight that the procedure is a generally applicable method.


Introduction
A geodetic network consists of a set of stable and identifiable points located on the Earth's surface or near it [1]. Their positions are associated with a coordinate reference system. These points are referred to control points. These control points provide the common basis for surveying and mapping activities. of particular importance is the densification of a geodetic network. In that case, new geodetic networks are established based on an existing network. In other words, new points are often tied to previously established control points.
If we want to determine the coordinates of that new points, measurements based on directions, distances, height differences and/or space-based geodetic techniques, such as Global Navigation Satellite Systems (GNSS), should be performed between the new points and the existing control points (now we just refer "existing control points" to as control points). In most cases of geodetic networks, the measurements bear a known linear relationship with the unknown parameters. In other words, the mathematical model that relates measurements to parameters is linear. In that case, the coordinates of new points are unknown parameters, whereas the coordinates of control points are constraints on the mathematical model. Details about constraints can be verified in [2].
The coordinates of the new points are often computed from the Least-Squares Estimation (LSE). In other words, we seek to find the coordinates of the new points that minimises the sum-of-squares of the residuals. The residuals is the difference between the observed values and the estimated (or expected) values of the measurements [3][4][5].
A step prior to the deployment of a geodetic network is the Geodetic Network Design. The quality of a geodetic network is characterised by its precision, reliability, and cost [6,7]. The covariance matrix of the least-squares parameters is associated with the degree of the network precision. There is also the internal and external reliability of a geodetic network, which were both introduced by [4]. Internal reliability refers to the ability of a network to detect outliers in the set of observations, and external reliability refers to the effect of the undetectable outliers on the estimated parameters. Here, outlier is defined according to [8]: "An outlier is an measurement that is so probably caused by a gross error that it is better not used or not used as it is". The cost is related to the effort required to implement the design and related expenses.
There is not the best geodetic network, but the optimal geodetic network. Designing a geodetic network in the optimal sense involves designing an optimal configuration for the network (e.g., selection of the position of network control points). Optimal design problem of a geodetic network was widely investigated since the pioneering work of [9,10]. Since then, several works were described in detail in papers published in scientific journals (see e.g., [1,6,7,[11][12][13][14][15][16][17][18][19]). Details about the orders of design is outside of discussion in this paper. However, details can be found in [10,11]. In this paper, we provide an automatic procedure to design a geodetic network in terms of reliability. Here, we use the reliability theory for designing geodetic networks. The reliability theory becomes a fundamental part of modern data analysis [1,[20][21][22][23]. This is due to [3,4].
Generally, the random measurement errors in a system are unavoidable. However, the presence of an outlier in the dataset can jeopardise the reliability level of the system. To detect a possible outlier, Baarda [4] introduced the Data Snooping procedure for the detection of a single outlier in linear(ised) models. Most of conventional geodetic studies have a chapter on Baarda's data snooping procedure, e.g., [5,24]. Data snooping has become very popular and is routinely used in adjustment computations [25]. Although introduced in the context of geodetic networks, Data Snooping is a generally applicable method of outlier detection in both univariate and multivariate approaches [26].
Data snooping is a procedure based on hypothesis testing, which consists of screening each individual observation for a possible outlier. The test statistic associated with Data Snooping is given by a normalised least-squares residuals. That test, known as Baarda's w-test, can also be derived as a particular case of the generalised likelihood ratio test [27]. Baarda's w-test makes a decision between the null H 0 (model in the absence of an outlier) and a single alternative hypothesis H A (model in the presence of an outlier). In that case, rejection of H 0 automatically implies acceptance of H A , and vice-versa [21,22,28]. Here, we restrict ourselves to the only one single alternative hypothesis. In the case where multiple alternative hypotheses are in play, the readers can consult, for example, [22,26].
Since Data Snooping is based on a hypothesis testing, it may lead to a errors decision. There is the probability of rejecting a true null hypothesis H 0 , which is well-known type I decision error (denoted by "α"). The probability level α (known as significance level of a test) defines the size of a test and is often known as "false alarm probability" [29]. On the other hand, there is the probability of rejecting a true alternative hypothesis H A (type II decision error-'missed detection', denoted by "β"). Based on the probability levels α and β, Baarda [3,4] derived the concept of Minimal Detectable Bias (MDB)-the term given [30]. The MDB is the additional bias (or are the additional biases) in the parameters vector that can be detected by the w-test with a certain probability of 1 − β. The MDB can be computed before actual measurements were carried out, using only a functional model and the expected stochastic properties of the data [5]. In addition, it is possible to describe the influences of the MDBs on the geodetic network coordinates (i.e., on the parameters). The set of MDBs describes the internal reliability, whereas their propagation on the parameters is said to describe the external reliability [31].
The internal and external reliability, therefore, are very useful tool to assess the magnitude of possible errors that can be detected during the pre-processing of the data. For this reason, the concept of the internal (quantified by MDB) and external reliability can be applied during the design stage of geodetic network. In this context, the goal is to apply the reliability theory for designing geodetic network. The quality criterion considered here is based on the external reliability. The position of the control point of geodetic network is selected so that the maximum influence of an MDB on the coordinates of the network points is minimum. A levelling network is used as an example of application of the proposed method. Here, we consider a scenario where the observations of the network have the same uncertainties and another with different uncertainties. We also consider the case of minimally constrained (one control point in the case of levelling network) and over-constrained network (two control points).

Conventional Reliability Theory
The null hypothesis H 0 corresponds to a model in the absence of outliers. Thus, the null hypothesis of the standard Gauss-Markov model in linear or linearised form is given by [24]: where E{.} is the expectation operator, y ∈ R n the vector of measurements, A ∈ R n×u the Design Matrix) (also referred as Jacobian matrix) of full rank u, x ∈ R u the unknown parameter vector, and e ∈ R n the unknown vector of measurement errors. Typically, it is assumed that the errors of the good measurements are normally distributed with expectation zero [32], i.e.,: with a known positive definite symmetric covariance matrix Q e ∈ R n×n . Here, we confine ourselves to the case that A and Q e have full column rank. The redundancy (or degrees of freedom) of the model in Equation (1) is r = n − u. However, any model is only an approximation to the truth. This implies that we inevitably encounter misspecified models. In contrast to the H 0 , Baarda [4] introduced a mean shift model that defines the alternative hypothesis H A , also referred to as model misspecification, as follows: where c i takes the form of a canonical unit vector with 1 as its ith entry and zeros elsewhere. The 1 value means that an ith bias parameter of magnitude ∇ i affects an ith measurement and 0 otherwise. We have, for instance, c i = 0 0 0 · · · 1 i th 0 · · · 0 T . In other words, c i specifies the type of model error and ∇ i the size of the model error, or outlier.
The likelihood ratio test to test H 0 against H A is given by: and the test statistics (known as Baarda's w-test) are the normalised least-squares residuals given by [4]: According to (4) and (5), we have: • k is the critical value. The critical value k is the the tabular value from the cumulative distribution function (cdf) of the standard normal N(0, 1) based on the chosen of a significance level α. Because we perform a two-sided test of the form |w i | ≤ k we have α/2. For example, for α = 0.01, we obtain k = 2.576. In this case, if |w i | > 2.576 for some y i one may reject H 0 . • Φ −1 denotes the inverse of the normal cumulative distribution function.
•ê is the least-squares residuals vector under H 0 and Qê the covariance matrix of the best linear unbiased estimator ofê under H 0 .
The decision rule in (4) says that if the test statistic |w i | in (5) is larger than some critical value k, i.e., a percentile of its probability distribution, then we reject the null hypothesis H 0 in favour of the alternative hypothesis H A .
Because w-test is based on binary hypothesis testing, in which one decides between the null hypothesis H 0 and a single alternative hypothesis H A , it may lead to type I decision error (α) and type II decision error (β). The probability level α is the probability of rejecting the null hypothesis when it is true, whereas β is the probability of failing to reject the null hypothesis when it is false. The complements of α and β are well-known confidence level (CL) and the statistical power (γ), respectively. The first deals with the probability of accepting a true null hypothesis; the second, with the probability of correctly accepting the alternative hypothesis. In other words, we have CL = 1 − α and γ = 1 − β.
The normalised least-squares residual w i follows a standard normal distribution with the expectation that µ = 0 if H 0 holds true. On the other hand, if the system is contaminated with a single error at the ith position of the dataset, there is an outlier that causes the expectation of w i to become µ > 0. The effect can be best understood using the non-central chi-squared distribution with one degree of freedom (i.e., for a single outlier). Under the alternative hypothesis H A , the expectation of w i is the square-root of the non-centrality parameter λ q=1 from the chi-square distribution with one degree of freedom (q = 1), which is given by: where λ q=1 is the non-centrality parameter for one degree of freedom q = 1. Under H A , the expectation of w i is λ q=1 , which is caused by the presence of an outlier. The non-centrality parameter λ q=1 in Equation (6) describes the expectation of a w-test when the null hypothesis H 0 is false (so the alternative hypothesis H A is true). This leads to their use in calculating statistical power. The term c i T Q −1 e QêQ −1 e c i in Equation (6) is a scalar for q = 1 and therefore it can be rewritten as follows [30]: where |∇ i | is the Minimal Detectable Bias MDB (i) , which can be computed for each measurement of the n alternative hypotheses according to Equation (3). For a single outlier, the variance of estimated outlier, denoted by σ 2 ∇ i , is: Thus, the MDB can also be written as: where is the standard-deviation of estimated outlier ∇ i . The MDB in Equations (7)-(9) of an alternative hypothesis is the smallest magnitude outlier that can lead to rejection of a null hypothesis for a given α and β. Thus, for each model of the alternative hypothesis H A , the corresponding MDB can be computed. The key point of MDB is that it can work as a tool for designing systems capable of withstanding outlier with a certain degree of probability.
The non-centrality parameter λ q=1 can be computed as a function of type 1 decision error α, type 2 decision error β and the degrees of freedom of the test q. To obtain the non-centrality parameter λ q=1 , here we use the recursive algorithm based on the work by [33], namely bisection algorithm. with the non-centrality parameter, and knowing the uncertainty of the sensor and the architecture of the model, it is possible to compute the MDB according to Equations (7)- (9). The MDB was further investigated for a single outlier in a singular Gauss-Markov model [34]. There are also studies covering either independent or correlated measurements [35][36][37][38][39]. It is also possible to set up for the case of multiple (simultaneous) outliers. The readers who are interested in multiple (simultaneous) outliers issue can refer to [40][41][42][43][44]. For more details about alternative models, refer to [8,45].
To quantify the external reliability, one should propagate each MDB on the parameters. In other words, the external reliability measures the influence of an undetected outlier on the estimation of coordinates of the geodetic network, and it is given by: where ∇X ∈ R u is the influence of an undetectable outlier ∇ i located at a given position according to the vector c i in (3) and W ∈ R n×n the known matrix of weights, taken as W = σ 0 2 Q −1 e , where σ 2 0 is the variance factor. Here, we compute the maximum external reliability (max (∇X)) as follows: It is important to mention that the maximum external reliability max {∇X} can be a positive or a negative value. According to Equation (1), we consider the maximum influence of an undetectable outlier ∇ i = MDB on the parameters.
Although here the reliability theory was applied in a specific network, it is a generally applicable method. For example, the reliability theory was used to measure the integrity of the receivers for civil aviation, which is a main tool for safety-of-life applications, see e.g., [46].
In the next section, we present an automatic procedure for designing geodetic networks. The proposed procedure was computationally developed based on reliability theory. Specifically, we apply reliability theory to automatically define the optimal location of control points.

Automatic Procedure to Design the Location of Control Points in the Geodetic Network
For the establishment of a geodetic network, we must define which points of the network will have their coordinates previously determined in the desired reference system. These points are called control points, or constraint points. These points that allow the other points of the geodetic network to be linked to a reference system. Therefore, it is essential to define the location of these control points at the design stage of a geodetic network. the proposed automatic method here focuses on designing of the geodetic networks in terms of high reliability. Under the present proposal, the quality criterion to be considered during the design stage is based on the smallest value of the maximum influence of an outlier on the coordinates of the network (i.e., maximum external reliability). The method does not depend on the real measurements values but only on the model design, i.e., the network geometry and covariance matrix. The computations can be performed as follows:

1.
Defining a significance level α and the type II error β in order to compute the non-centrality parameter. Here, we use the recursive algorithm based on the work by Aydin and Demirel [33], namely bisection algorithm, in order to obtain the non-centrality parameter for one degree of freedom, i.e., λ q=1 . Typically a value of the level α = 0.001 and β = 0.2 is adopted (see, e.g., [4]).

2.
Defining a geodetic network configuration as well as the uncertainties of the observations, i.e., the design matrix A and the covariance matrix of the observations Q e , respectively. The covariance matrix of the observations Q e may consist of random effects and the uncertainties associated with the correction of systematic effects. The latter follows from the instrument precision, measurement techniques and field condition. In this step, the design matrix and covariance matrix are conditioned to the position of the control point (or by the combination of control points) in the network. It is important to mention that the design matrix defined must have a minimum configuration to avoid rank deficiency [47].
Computing the external reliability according to Equation (10).

5.
Computing and store the maximum external reliability according to Equation (11).

6.
Checking whether all the points (or all combination of points) of the network were configured as control point. If not, select a new control point (or new combination of control points) and return to Step 3. Otherwise, the algorithm selects the configuration of the network that has the lowest value of the maximum external reliability. Important to mention that matrix A is modified when a new point (or a new combination of points) is selected as the control.
The proposed procedure is summarised as a flowchart in Figure 1.

Results and Discussion
In order to demonstrate the design method in practice, in this section we apply it to a closed levelling network. the network is displayed in Figure 2. The goal is to illustrate the design method; further considerations about levelling networks are outside the scope of this study. The results of this paper are presented for γ = 0.8 and α = 0.001, which gives λ q=1 = 17.075. The results of the internal reliability are shown based on the relationship between MDB and the standard-deviation of the observation. As an example of that relationship, for MDB = 5 mm and σ = 2.5 mm, the ratio is MDB σ = 5mm 2.5mm = 2 ⇒ MDB = 2σ. Two typical cases were considered here: (a) a minimally constrained and (b) a over-constrained least squares adjustment. The variances of the height difference (denoted by σ 2 ∆h i ) are assumed normally distributed and uncorrelated. The variances were based on the relation between the differential levelling lines and their lengths. In other words, the variances of differential levelling lines are proportional to their lengths, i.e., the larger the lengths, the larger the variances of differential levelling lines.
We consider that the equipment used here is a spirit level with nominal standard deviation of ±1 mm/km for a double run levelling. In each scenario two variants are considered here: 1.
all lengths of the differential levelling with 1 km, and therefore the variances equal to 1 mm 2 ; and 2.
lines with diversified lengths, and therefore levelling lines with different variances, whose values are given in Table 1. In the first scenario (a), we consider the closed levelling network in Figure 2 with availability of one control station, and 6 points with unknown heights, totalling six minimally constrained points and 7 possible cases of control point configuration. In that case, there are n = 12 observations and u = 6 unknowns, which lead to n − u = 6 degrees of freedom.
Moreover, the design matrix A has dimension 12 × 6 and the covariance matrix of observations Q e has dimension 12 × 12. The stations C, D, E, F and G are involved in 4 height differences, so there are three redundant observations for the determination of these heights. On the other hand, there is one redundant observation for the determination of heights of the stations a and B.
The MDBs computed for each observation of the network and for each case of variances configuration are displayed in Figure 3. It is important to mention that MDBs were invariant with regard to the position of a single control point in the network. It can be noted that the observations ∆h 7 and ∆h 12 are more resistant to outlier than others, because theirs MDBs were the smallest on the network. Table 2 shows the maximum external reliability of the network. It can be noted that the smallest value of the maximum influence of an MDB on the heights occurred when the station G was taken as control point, i.e., when the control point was set to the centre of the network (3.28 mm, marked in bold). The ± sign in Table 2 means that the maximum influence of an outlier on the network occurs in two directions (up and down). The best network configuration obtained based on the optimal position of the control point is shown in Figure 4.   In the second case, we consider the closed levelling network in Figure 2 over-constrained with two control stations, totalling 21 possible combinations of control points. For example, taking A and B as fixed, we have u = 5 unknown heights (C, D, E, F, G), n = 12 observations and n − u = 7 redundant observations. In that case, the design matrix A has dimension 12 × 5 and the covariance matrix of observations Q e has dimension 12 × 12. On the other hand, we have u = 5 unknown heights (B, D, E, F, G), n = 11 observations and n − u = 16 redundant observations in the case of selecting control points a and C. In that case, when the control points are adjacent, the respective levelling line are not observed and, therefore, the design matrix A has dimension 11 × 5 and the covariance matrix of observations Q e has dimension 11 × 11. Figure 5 shows an example when the control points are adjacent. For the second case, we produce and analyse 21 graphs showing the MDB values. Due to the large number of graphs, we show here a table with a summary of the results. Table 3 shows the overall statistics of the MDBs (average, maximum, minimum, and standard-deviation) for each possible combination, two by two, of control points and for each scenario of variance. For the case of same variances, it can be noted that when the points A and B were taken as control points (AB), the observations presented a good level of homogeneity (homogeneous redundancy), i.e., all of them had the same internal reliability. It means that, in the presence of an outlier, all observations have the same ability to detect it. This fact is expected and due to the complete symmetry of the network design and the variances of observations. The search for homogeneous redundancy in all the observations has already been investigated in [14,15]. Figure 6 shows the maximum external reliability for the over-constrained network. The maximum external reliability of the network for combinations of control points AB, AG, BG, CD, CF, DE and EF had positive and negative signals. This is represented by the ± sign in Figure 6. It means that the maximum influence of an outlier on the network occurs in two directions. It can be noted that both cases of observations with same variances and different variances, the smallest value of the maximum influence of an MDB on the heights occurred when the stations A and B were fixed as control, with max (∇X) = 2.3mm. In general, the inflation of the variances in the network amplified the maximum external reliability, i.e., it increased the maximum influence of a possible outlier on the network. The optimum configuration of the network according to our algorithm is shown in Figure 7. Table 3. Statistics of MDB for two control points-over-constrained scenario (b).

Observations with Same Variances
Observations

Final Remarks
Within the context of applied sciences, we highlight that the core of the paper is to bring forward a method for automatically selecting the location of the control point (or control points) of geodetic networks based on the reliability theory.
The proposed method to design a geodetic network is based on the smallest value of the maximum external reliability. The size of the outlier is defined according to MDB for a given type I and type II error probabilities. We highlight that the method discards the use of the observation vector of the Gauss-Markov model. In fact, the only needs are the geometrical network configuration (given by Jacobian matrix) and the uncertainties of the observations (given by instrument precision, measurement techniques and/or field condition). Therefore, it can be applied for any kind of geodetic network. In fact, it can be applied to any mathematical modelling for sensitivity analysis. In other words, it can be employed to apportion the changes in outputs of a system to different configuration in its inputs.
The method requires that a priori geodetic network configuration (i.e., geometric configuration and uncertainty of observations) be provided by the user. The user-provided design matrix A presents rank deficiency due to the lack of definition of a network datum. In other words, there are no minimal constraints to solve a rank deficiency of a priori user-provided design matrix A. In order to have a matrix A with full rank, the proposed algorithm select a point as control. Next, the maximum external reliability is computed and stored. The procedure is repeated until all network points were selected as a control point. Finally, the algorithm selects the network which has suffered the least influence from a possible undetected outlier. In other words, the algorithm selects that network configuration with the smallest value for maximum external reliability. As shown in the results, the proposed procedure also works for designing geodetic networks with more than one control point.
The proposed method was applied to a closed levelling network. The MDB was computed based on a power of the test of data snooping of 0.80 (80%) and the significance level of 0.001 (0.1%). To apply the concepts in practice, two scenarios were presented for a simulated levelling geodetic network: a minimally constrained network one and a over-constrained network. The observations were assumed normally distributed and uncorrelated, which usually happens in the practice of levelling network adjustment. In each scenario two variants were also considered: one in which the variances of the measurements were assumed equal and another in which the variances were different. In the case of the minimally constrained network, we highlight that the centre of the simulated network was the optimum position to set the control point. In the over-constrained network, we highlight that among the 21 possibilities of configuring the control points, the stations with less line connections (i.e., with less redundant observations) provided the optimum configuration of geodetic network.
This simple example was provided to demonstrate the application of the method. Future experiments based on real data will be performed in order to infer the actual application of the method. The application to more complex networks and others optimization criteria (such as the value and ratio of the eigenvalues of the covariance matrices of the unknowns to predict variances and their homogeneity) will also be investigated in future works.
Furthermore, we highlight that the integration of the reliability and precision as a one-off criterion is an issue in progress for future publications.