Efficacy of Msplit Estimation in Displacement Analysis

Sets of geodetic observations often contain groups of observations that differ from each other in the functional model (or at least in the values of its parameters). Sets of observations obtained at various measurement epochs is a practical example in such a context. From the conventional point of view, for example, in the least squares estimation, subsets in question should be separated before the parameter estimation. Another option would be application of Msplit estimation, which is based on a fundamental assumption that each observation is related to several competitive functional models. The optimal assignment of every observation to the respective functional model is automatic during the estimation process. Considering deformation analysis, each observation is assigned to several functional models, each of which is related to one measurement epoch. This paper focuses on the efficacy of the method in detecting point displacements. The research is based on example observation sets and the application of Monte Carlo simulations. The results were compared with the classical deformation analysis, which shows that the Msplit estimation seems to be an interesting alternative for conventional methods. The most promising are results obtained for disordered observation sets where the Msplit estimation reveals its natural advantage over the conventional approach.


Introduction and Motivation
Consider the classical functional model of geodetic observations, which is given for l = 1, . . . , q different measurement epochs, namely where y l = [y 1,l , · · · , y n l ,l ] T are the observation vectors whose elements belong to the respective sets Φ l = y 1,l , . . . , y n l ,l ; X l = [X 1,l , · · · , X r,l ] T are the parameter vectors; v l = [v 1,l , · · · , v n l ,l ] T are vectors of random errors; and A l ∈ R n l ,r are known coefficient matrices. Such models are the basis of deformation analysis, namely for determining the shifts ∆X (k,l) = X l − X k between the epochs l and k (for example, the changes of the point coordinates between such epochs). The vectors ∆X (k,l) can be estimated by applying different methods or strategies (e.g., [1][2][3]). The least squares method (LS-method) is still the most popular approach in such an analysis, note that LS-estimates are often supplemented with respective statistical tests (e.g., [4][5][6]). However, some unconventional methods are also in use, for example, robust M-estimation [7,8] or R-estimation [9][10][11][12][13][14]. In the case of relative networks, one can also apply methods of free adjustment (e.g., [15][16][17][18]). Some methods as well as their properties are well known, other methods are still being researched.
The M split estimation surely belongs in the latter group. The M split estimation was proposed by Wiśniewski [19,20] and has been applied to some practical problems in which each observation could be assigned to several different functional models. For example, it was used in remote sensing (terrestrial laser scanning or ALS data) for data modeling [21] and in some geodetic problems, for example, in deformation analysis [22][23][24][25], and robust estimation (e.g., [26]). Automatic assignment of each observation to the best fitted model is one of the most important features of M split estimation. It is also very useful in deformation analysis, when the observation set might include observations from all measurement epochs (the set is an unrecognized mixture of such observations). Note that there is usually no problem with separating observations from different epochs and hence with separate analyses. However, there are some cases when the application of the M split estimation is advisable. For example, when a point is displaced during an observation session, thus, one should consider two pseudo-epochs, and the M split estimation allows us to estimate the parameters of the functional models for such pseudo-epochs. Such models can also be applied when an observation set is disturbed by outliers [23,24]. Note that the method under investigation can be applied in all observation sets that are an unrecognized (and/or unordered) mixture of observation aggregations. Such data can result from different sources or instrumentations. In fact, the source of data does not matter here. It can be, for example, geodetic instruments: total stations, GNSS receivers, etc., or in remote sensing such as terrestrial or airborne laser scanners.
The main properties of the M split estimation are discussed in the papers cited above; that study focused on the efficacy of the method in estimating parameters of the competitive functional models, hence also in estimating point displacements. The analyses were based on simulations of the crude Monte Carlo method and the application of elementary functional models or models of a leveling network. The results were compared with the results of the LS-method.

Theoretical Foundations
Without loss of generality, we can assume two measurement epochs, thus in the model of Equation (1), we have q = 2. Then, the optimization criterion of the LS-method and its solution can be written in the following way (l = 1, 2) where v i,l = y i,l − a i,l X l , D LS,l = (A T l P l A l ) −1 A T l P l , P l are respective weight matrices, (a i,l -ith row of matrix A (l) ). The difference ∆X LS(1,2) =X LS,2 −X LS,1 is a LS-estimate of the shift ∆X (1,2) .
In the case of the M split estimation, we assumed that each observation belonged to either of two sets Φ 1 or Φ 2 ; however, there is one observation set Φ = Φ 1 ∪ Φ 2 and one observation vector y = [y 1 , · · · , y n ] T , n = n 1 + n 2 . There are two competitive functional models with two competitive versions of the parameter X, namely X (1) and X (2) (A ∈ R n,r ,rank(A) = r). The vectors v (1) , v (2) ∈ R n are two competitive versions of the observation errors related to all elements of the vector y.
The theoretical basis of the M split estimation is an assumption that every observation y i can be assigned to either of two density function f (y i ; X (1) ) or f (y i ; X (2) ). If y i occurs, it brings the f -information I f (y i ; X (1) ) = − ln f (y i ; X (1) ) or the f -information I f (y i ; X (2) ) = − ln f (y i ; X (2) ), which are competitive to each other. M split estimates of the parameters X (1) and X (2) , namelyX (1) andX (2) , minimize the following global information that is brought by all elements of the vector y [19] I f (y; X (1) , X (2) [− ln f (y i ; X (1) )][− ln f (y i ; X (2) )] (4) In other words, the estimators in question are the solutions of the following optimization problem: minI f (y; X (1) , X (2) ) = minI f (y;X (1) ,X (2) ) (5) For such solutions, the occurrence of the particular observation vector is the most probable. If X (1) = X (2) = X, then I f (y; X (1) , X (2) , which is the objective function of the maximum likelihood method (ML-method). In such a context, the M split estimation is a special development of the ML-method. Huber [27,28] generalized the ML-method to M-estimation by introducing ϕ M (X) = n i=1 ρ(y i ; X), where ρ(y i ; X) is an arbitrary function for which estimators obtain the desired properties (for example, they are robust against outliers). A similar generalization was also proposed for the M split estimation [19,20]. The objective function of Equation (4) is replaced by the following function Of course, M split estimation is also a development of classical M-estimation, if only X (1) = X (2) = X, and hence ϕ ρ (X (1) , X (2) There are several variants of M split estimation that differ from one another in the objective function or assumed parameters [19,22,29]. So far, the most popular is the squared M split estimation for which ρ (1) (y i ; X (1) ) = p i v 2 i(1) and ρ (2) (y i ; X (2) ) = p i v 2 i(2) . Hence, one can write the following optimization problem of such a method [19,30] as where P = Diag(p 1 , . . . , p n ) is a diagonal weight matrix of the observations y ( * -the Hadamard product). It is obvious that if X (1) = X (2) = X, then ϕ sq (X (1) , X (2) ) = n i=1 p i v 2 i , which means that the squared M split estimation is a development of the LS method. Considering such a relationship and the range of the practical applications of the M split estimation, we will only discuss the squared M split estimation. To compute M split estimates, one can use the sufficient conditions for the minimum of the objective function. Considering the optimization problem (7), one can write the following equations g (1) (X (1) , X (2) ) = ∂ϕ(X (1) , X (2) )/∂X (1) T X (1) =X (1) X (2) =X (2) = 0 g (2) (X (1) , X (2) ) = ∂ϕ(X (1) , X (2) )/∂X (2) where g (1) (X (1) , X (2) ) and g (2) (X (1) , X (2) ) are the gradients of the function ϕ sq (X (1) , X (2) ). The following matrices are diagonal weight matrices that are based on the cross-weighting functions [20,31] w (1) (v i(2) ) = The solutions of Equation (8) are the following M split estimatorŝ X (1) = D (1) (v (2) )y,X (2) = D (2) (v (1) )y (11) where Thus,X (1) is a function ofv (2) = y − AX (2) , whereasX (2) is a function ofv (1) = y − AX (1) . For this reason, this solution has an asymptotic character. The following iterative procedure can be applied to compute the sought estimates (j = 1, . . . , m) (for the given starting point, for example, v 0 (2) = y − AX LS ). The process stops when for each l = 1, 2, it holds that g (l) (X (1) ,X (2) ) = 0 and henceX (l) = X m (l) = X m−1 (l) . Note that other iterative processes that use both the gradients and the Hessians of ϕ(X (1) , X (2) ), namely Newton's method, can be found in [19,20,29]. Now, the elementary property the of M split estimates is shown. Here, we consider a basic example that precedes the more detail analysis presented in the next section. Let us assume the functional model y i = X + v i , i = 1, . . . , 7, and the observation set Φ as a following vector Figure 1a). Then,X LS = 1.49. For the sake of comparison, let the robust M-estimate be computed. By applying the Huber method [27,28], where the weight function is w(v) = min{1, k/|v|} and k = 3, one can obtainX M = 1.27 ( Figure 1b). Both estimates in question are not satisfactory, and do not reflect the nature of the observation set. The robust estimatê X M lies closer to the "bigger" aggregation of observations. Next, the question of how to treat the observations that are furthest from that estimate arises. In the classical approach, such observations are regarded as outliers (for example, affected by gross errors), and we are no longer interested in such observations. Different conclusions follow the M split estimation whereX (1) = 1.10 andX (2) = 2.00 ( Figure 1c). The Msplit estimates show that set  consists of two subsets 1  and 2  (Figure 1d), whose elements can be regarded as realizations of two different random variables that differ from each other in location parameters 1 X and 2 X , respectively. Similar assumptions can also be found in other estimation problems, for example, cluster analysis (e.g., [32,33]); or in a mixed model estimation The M split estimates show that set Φ consists of two subsets Φ 1 and Φ 2 (Figure 1d), whose elements can be regarded as realizations of two different random variables that differ from each other in location parameters X 1 and X 2 , respectively. Similar assumptions can also be found in other estimation problems, for example, cluster analysis (e.g., [32,33]); or in a mixed model estimation applied in geosciences (e.g., [34,35]). Such approaches can be regarded as alternatives; however, we should have some understanding that they differ significantly in their general ideas.
Assigning each observation to the model that is the most suitable for it is a natural process in M split estimation. This property can be applied in the analysis of network deformation where there are two functional models: y 1 = A 1 X 1 + v 1 and y 2 = A 2 X 2 + v 2 for two measurement epochs, respectively. Thus, one can create one common observation vector y = [y T 1 , y T 2 ] T , the common weight matrix P = Diag P (1) , P (2) , and the coefficient matrix . It is noteworthy that the order of the observation within vector y can be arbitrary. The actual order of the observations must coincide with the order of the rows within matrix A and order of the weights in weight matrix P. Here, the shift ∆X (1,2) can be estimated by ∆X (1,2) =X (2) −X (1) . It is worth noting that ∆X (1,2) can also be estimated directly by applying the Shift-M split estimation proposed by Duchnowski and Wiśniewski [22].

Elementary Tests
The elementary analysis was based on the univariate models and simulations of observations related to such models. Thus, where 1 n l = [1 1 , · · · , 1 n l ] T ; X 1 and X 2 are parameters that differ from each other in the shift ∆X (1,2) = X 2 − X 1 . The measurements, namely the elements of vectors y 1 and y 2 , were simulated by using the Gaussian generator randn(n, 1) of MATLAB. We assumed that σ = 1, and the following theoretical values of the parameters: X t 1 = 0 and hence X t 2 = X t 1 + ∆X (1,2) = ∆X (1,2) . Considering the LS-estimation of X 1 and X 2 we can apply the model of Equation (14) or Equation (1) where A 1 = 1 n 1 and A 2 = 1 n 2 . In the case of M split estimation, we assumed the model of Equation (3), T ∈ R n , n = n 1 + n 2 , and A = [1 T n 1 , 1 T n 2 ] T = 1 n . We also applied the iterative procedure of Equation (13) by taking LS-estimates as the starting point (note that the starting point can usually be arbitrary).
Let us now consider an example of observation simulation for which ∆X (1,2) = 5σ = 5 and n 1 = 50, n 2 = 10. The parameter estimates, together with the respective residuals, are presented in Figure 2. Now, let us consider more simulated observation sets. By applying the crude Monte Carlo method (MC) for N simulations, one can compute the MC estimates by applying the formulâ whereθ i are the estimates obtained for the ith simulation. The location of the MC estimates for N = 5000 and ∆X (1,2) = 5 or ∆X (1,2) = 20 is presented in Figure 3. This shows that the MC estimates that were obtained for both estimation methods were close to the respective theoretical values (considering the simulated standard deviation). Generally, the LS estimates seemed more satisfactory. Please note that the results obtained for different values of shift ∆X (1,2) indicate that M split estimation is more satisfactory for bigger shifts than for smaller ones. Thus, let us examine how efficient the M split estimation is for different shifts. Let the measure of efficacy be defined in relation to the LS estimates, thus The application of MC simulations allowed us to present the success rate (SR), which can be computed for different values of the shift is the value of Equation (17) at the ith simulation. Note that when λ (l) (X (l) ,X LS,l ) < 0, then the M split estimate is closer to the theoretical value than the LS estimate. Now, we can define the following function of an elementary success of M split estimation The application of MC simulations allowed us to present the success rate (SR), which can be computed for different values of the shift ∆X (1,2) γ (l) (X (l) ,X LS,l ; ∆X (1,2) where s i (l) (X (l) ,X LS,l ) is the value of Equation (17) at the ith simulation. Note that such a SR is defined in a very similar way to the mean success rate (MSR) given by Hekimoglu and Koch [36]. SRs for different ∆X (1,2) and for N = 5000 simulations are presented in Figure 4. Note that such a SR is defined in a very similar way to the mean success rate (MSR) given by Hekimoglu and Koch [36]. SRs for different (1,2) X  and for 5000 N  simulations are presented in Figure 4. X and (2) X for the growing value of (1,2) X  .

Vertical Displacement Analysis
Let us now consider the efficacy of Msplit estimates in a more practical example, namely the analysis of vertical displacements within the leveling network, which is presented in Figure 5. Such a network has already been under investigation in previous papers [24,25]. h h was measured twice at each of two measurement epochs, and that 2   mm was the known standard deviation of all measurements. We also assumed that at the first epoch . In the classical approach to the estimation of the point displacements, we used the functional model of Equation (1). Since all height differences were measured twice at two

Vertical Displacement Analysis
Let us now consider the efficacy of M split estimates in a more practical example, namely the analysis of vertical displacements within the leveling network, which is presented in Figure 5. Such a network has already been under investigation in previous papers [24,25]. Note that such a SR is defined in a very similar way to the mean success rate (MSR) given by Hekimoglu and Koch [36]. SRs for different (1,2) X Δ and for 5000 N = simulations are presented in Figure 4. X and (2) X for the growing value of (1, 2 ) X Δ .

Vertical Displacement Analysis
Let us now consider the efficacy of Msplit estimates in a more practical example, namely the analysis of vertical displacements within the leveling network, which is presented in Figure 5. Such a network has already been under investigation in previous papers [24,25]. The network consists of four reference points 1 4 , , R R  with the known heights . We assumed that each of the height differences 1 1 6 , , h h  was measured twice at each of two measurement epochs, and that 2 σ = mm was the known standard deviation of all measurements. We also assumed that at the first epoch  The network consists of four reference points R 1 , . . . , R 4 with the known heights H R 1 = · · · = H R 4 = 0 m and five object points of P 1 , . . . , P 5 . We assumed that each of the height differences h 1 , . . . , h 16 was measured twice at each of two measurement epochs, and that σ = 2 mm was the known standard deviation of all measurements. We also assumed that at the first epoch X t 1 = [H 1,1 = 0, · · · , H 5,1 = 0] T = 0, where H i,1 is the height of the ith object point at the first epoch. The shift of the object points between the measurement epochs is given by ∆X (1,2) = [∆H 1 (1,2) , · · · , ∆H 5(1,2) ] T , where ∆H i(1,2) = H i,2 − H i,1 . In the classical approach to the estimation of the point displacements, we used the functional model of Equation (1). Since all height differences were measured twice at two measurement epochs, namely, we had two series Sensors 2019, 19, 5047 9 of 14 of measurements at each epoch, then we should assume that y l ∈ R 32 , X l = [H 1,l , · · · , H 5,l ] T , and A ⊗ = A ⊗ 1 2 ∈ R 32,5 where A ∈ R 16,5 is a known coefficient matrix related to one series of measurements, 1 2 = [1, 1] T , and ⊗ is the Kronecker product. On the other hand, in the case of M split estimation, we should apply the functional model of Equation (3) and X (1) , X (2) ∈ R 5 are the competitive versions of the parameter vector, hence v (1) , v (2) ∈ R 64 are the respective competitive versions of the measurement errors. When analyzing the efficacy of M split estimation, we can use two measures, namely the local measure of the distance between the LS and M split estimates as well as the global one where [•] j is jth element of the vector and • is the Euclidean norm. The local distance, which is just another form of Equation (16), is related to a particular parameter, for example, the height of a displacing point. The global distance describes the whole parameter vector. Thus, we can define the local and global success rates in the following way where s i (l) j ([X (l) ] j , [X LS,l ] j ) and s i (l) (X (l) ,X LS,l ) are functions of an elementary success from Equation (17) and indexed with the respective arguments.
The empirical analysis, which was based on the MC method for N = 5000 simulations, was carried out for several variants of the point displacements. First, we assumed that only point P 5 was displaced. The respective MC estimates obtained for the LS and M split estimations and ∆H 5(1,2) = −50, ∆H 5(1,2) = −100, or ∆H 5(1,2) = −200 mm are presented in Table 1, which also presents the local and global SRs.  The MC estimates were similar for both estimation methods and the stable points. The SRs indicate that the LS estimates were closer to the theoretical values in the vast majority of the simulations. Note that the local SRs obtained for point P 5 were much higher than the global ones. All estimates of the point heights obtained in the MC simulations (for the variant ∆H 5(1,2) = −50 mm) are presented in Figure 6.
simulations. Note that the local SRs obtained for point 5 P were much higher than the global ones. All estimates of the point heights obtained in the MC simulations (for the variant 5(1,2) 50 H    mm) are presented in Figure 6. In the second variant, we assumed that there were two unstable points, namely 5 P and 4 P . The results, which were obtained for the different point shifts, are presented in Table 2. Here, the MC estimates obtained for both methods were also similar. Figure 7 presents the LS and Msplit estimates that were obtained for all of the MC simulations. Generally, this confirmed the correctness of both estimation methods; however, differences between these two estimation methods were also apparent. The main difference was the dispersion, which was larger in the case of the Msplit estimation, especially for the stable points, which suggests that the accuracy of the Msplit estimation was worse than LS estimation. It is also worth noting that the SRs of the Msplit estimation achieved bigger values in this variant. In the case of point 5 P , the results of the Msplit estimation were better than the results of the classical approach in almost one third of the simulations. In the second variant, we assumed that there were two unstable points, namely P 5 and P 4 . The results, which were obtained for the different point shifts, are presented in Table 2. Here, the MC estimates obtained for both methods were also similar. Figure 7 presents the LS and M split estimates that were obtained for all of the MC simulations. Generally, this confirmed the correctness of both estimation methods; however, differences between these two estimation methods were also apparent. The main difference was the dispersion, which was larger in the case of the M split estimation, especially for the stable points, which suggests that the accuracy of the M split estimation was worse than LS estimation. It is also worth noting that the SRs of the M split estimation achieved bigger values in this variant. In the case of point P 5 , the results of the M split estimation were better than the results of the classical approach in almost one third of the simulations.   100 The results, which are presented here, show that both methods, namely LS and Msplit estimation, yielded satisfactory solutions. However, such a conclusion was valid for the ordered observation sets, namely when each observation was properly assigned to its measurement epoch. If such a condition is not met, then the observation from another epoch will usually be regarded as an outlier. Since LS estimation as well as Msplit estimation are not robust against outliers, they both break down (please note that Msplit estimation is generally not robust unless we introduce an additional virtual model for outliers). Note that in the context addressed here, the outliers result from the assignment of an observation to the wrong measurement epoch, but not from gross errors. The natural feature of Msplit estimation is the automatic assignment of each observation to the proper epoch. Thus, we can suppose that this estimation method will not break down if such outliers occur. To illustrate this feature of Msplit estimation, we simulated that point 5 P was displaced and that  The results, which are presented here, show that both methods, namely LS and M split estimation, yielded satisfactory solutions. However, such a conclusion was valid for the ordered observation sets, namely when each observation was properly assigned to its measurement epoch. If such a condition is not met, then the observation from another epoch will usually be regarded as an outlier. Since LS estimation as well as M split estimation are not robust against outliers, they both break down (please note that M split estimation is generally not robust unless we introduce an additional virtual model for outliers). Note that in the context addressed here, the outliers result from the assignment of an observation to the wrong measurement epoch, but not from gross errors. The natural feature of M split estimation is the automatic assignment of each observation to the proper epoch. Thus, we can suppose that this estimation method will not break down if such outliers occur. To illustrate this feature of M split estimation, we simulated that point P 5 was displaced and that ∆H 5(1,2) = −50 mm. Now, let us consider the following variants of the  Table 3. In the case of variant A, the results were very close to the respective results presented in Table 1. If the observation sets are not ordered correctly, then the local SRs at the second epoch are close to 1, which means that almost always, the height of point P 5 at the second measurement epoch is better assessed by the M split estimation than by LS estimation. Additionally, the global SRs were very high at the second epoch, hence one can say that the heights of all network points were better estimated by the application of M split estimation.

Conclusions
The paper showed that M split estimation can be successfully applied in deformation analysis. The results were generally similar to the results of the more conventional LS estimation; however, the latter method usually yielded slightly better outcomes. The elementary tests showed that the efficacy of the M split estimation grew with an increasing shift between the observation sets. In the case of geodetic networks, where a parameter vector usually consists of several point coordinates, the shift of one or two such coordinates between measurement epochs does not influence the efficacy of the M split estimation in a significant way. The real advantage of the M split estimation was revealed for the disordered observation sets, for example, when the observations from at least two measurement epochs were mixed for some reason. Note that the LS estimates break down in such cases, in contrast with the M split estimation, for which the ordering of all observations within the combined observation set can be arbitrary and does not influence the final results of the method as well as its iterative process. Such a feature results directly from the theoretical foundations of the method, which are based on the concept of the split potential. In short, each observation "chooses" the functional model that fits it best. In this context, M split estimates are robust against some kind of "outliers", namely observations that come from other observation sets. Referring to the presented example, there were four height differences regarding the height of network point P 5 . If one of them does not fit the other, then the method tries to fit such an "outlying" observation into another epoch. If it works, then the whole estimation process succeeds. However, if such an observation is in fact affected by a gross error, then it does not fit any epoch, and the estimation must break down. The introduction of a virtual epoch, which is not related to any real measurements, is one solution to this problem. One can say that such an epoch can collect all "loners" that do not fit any real measurement epochs. Generally speaking, one can say that the M split estimation is not robust against outliers, which results from the occurrence of gross errors. However, if one assumes an additional competitive functional model (dedicated to outliers), then the M split estimation can estimate the location parameters for "good" observation aggregations as well as outlier(s). Increasing the number of competitive functional models protects the estimation of location parameters of good observations from the bad influence of outliers. Note that in this context, outliers are no longer "outlying", and become regular observations of the third (or more generally next) aggregation. This concept, which is out of the scope of this paper, was discussed in [23,24,30].