Total Least-Squares Collocation: An Optimal Estimation Technique for the EIV-Model with Prior Information

Abstract: In regression analysis, oftentimes a linear (or linearized) Gauss-Markov Model (GMM) is used to describe the relationship between certain unknown parameters and the measurements taken to learn about them. As soon as more than enough data are collected to determine a unique solution for the parameters, an estimation technique needs to be applied, such as 'Least-Squares adjustment', for instance, which turns out to be optimal under a wide range of criteria. In this context, the matrix connecting the parameters with the observations is considered fully known, and the parameter vector is considered fully unknown. This, however, is not always the reality. Therefore, two modifications of the GMM have been considered, in particular. First, 'stochastic prior information' (p.i.) was added on the parameters, thereby creating the (still linear) Random Effects Model (REM), where the optimal determination of the parameters (random effects) is based on 'Least-Squares collocation', showing higher precision as long as the p.i. was adequate (Wallace test). Secondly, the coefficient matrix was allowed to contain observed elements, thus leading to the (now nonlinear) Errors-In-Variables (EIV) Model. If not using iterative linearization, the optimal estimates for the parameters would be obtained by 'Total Least-Squares adjustment', with generally lower, but perhaps more realistic, precision. Here the two concepts are combined, thus leading to the (nonlinear) 'EIV-Model with p.i.', for which an optimal estimation (resp. prediction) technique is developed under the name of 'Total Least-Squares collocation'. At this stage, however, the covariance matrix of the data matrix (in vector form) is still assumed to show a Kronecker product structure.


Introduction
Over the last 50 years or so, the (linearized) Gauss-Markov Model (GMM) as standard model for the estimation of parameters from collected observations [1,2] has been refined in a number of ways. Two of these will be considered in more detail, namely

• the GMM after strengthening the parameters through the introduction of "stochastic prior information". The relevant model will be the "Random Effects Model (REM)", and the resulting estimation technique has become known as "least-squares collocation" [3,4].
• the GMM after weakening the coefficient matrix through the replacement of fixed entries by observed data, resulting in the (nonlinear) Errors-In-Variables (EIV) Model. When nonlinear normal equations are formed and subsequently solved by iteration, the resulting estimation technique has been termed "Total Least-Squares (TLS) estimation" [5,6,7]. The alternative approach, based on iteratively linearizing the EIV-Model, will lead to identical estimates of the parameters [8].
After a compact review and comparison of several key formulas (parameter estimates, residuals, variance component estimate, etc.) within the above three models, a new model will be introduced that allows the strengthening of the parameters and the weakening of the coefficient matrix at the same time. The corresponding estimation technique will be called "TLS collocation" and follows essentially the outline that had first been presented by this author in June 2009 at the Intl. Workshop on Matrices and Statistics in Smolenice Castle (Slovakia); cf. Schaffrin [9].
Since then, further computational progress has been made, e.g., by Snow and Schaffrin [10]; but several open questions remain that need to be addressed elsewhere. These include issues related to a rigorous error propagation of the "TLS collocation" results; progress may be achieved here along similar lines as in Snow and Schaffrin [11], but is beyond the scope of the present paper.
For a general overview of the participating models, the following diagram (Figure 1) may be helpful.
The most informative model defines the top position, and the least informative model is at the bottom. The new model can be formed at the intermediate level (like the GMM), but belongs to the "nonlinear world" where nonlinear normal equations need to be formed and subsequently solved by iteration.
The Gauss-Markov Model (GMM)
The GMM is stated as

y = Aξ + e,  e ~ (0, σ₀² Iₙ).   (1)

Here, y denotes the n × 1 vector of (incremental) observations, A the n × m matrix of coefficients (given), ξ the m × 1 vector of (incremental) parameters (unknown), and e the n × 1 vector of random errors (unknown) with expectation E{e} = 0, while the dispersion matrix D{e} = σ₀² Iₙ is split into the (unknown) factor σ₀² as variance component (unit-free) and Iₙ as (homogenized) n × n symmetric and positive-definite "cofactor matrix", whose inverse is better known as "weight matrix" P; here, P = Iₙ for the sake of simplicity.
Now, it is well known that the Least-Squares Solution (LESS) is based on the principle

eᵀe = min s.t. e = y − Aξ,

which leads to the "normal equations"

N ξ̂ = c, with [N, c] := Aᵀ [A, y].

Depending on rk N, the rank of the matrix N, the LESS may turn out uniquely as

ξ̂_LESS = N⁻¹ c if rk N = m,   (4)

or it may belong to a solution (hyper)space that can be characterized by certain generalized inverses of N,

ξ̂_LESS = N⁻_rs c,   (6)

where N⁻_rs denotes an arbitrary reflexive symmetric g-inverse of N (including the "pseudo-inverse" N⁺). In the latter case (6), any LESS will be biased with bias

β := E{ξ̂_LESS} − ξ = −(Iₘ − N⁻_rs N) ξ;

its dispersion matrix will be

D{ξ̂_LESS} = σ₀² N⁻_rs N N⁻_rs = σ₀² N⁻_rs,

and its Mean Squared Error (MSE) matrix, therefore,

MSE{ξ̂_LESS} = σ₀² N⁻_rs + (Iₘ − N⁻_rs N) ξξᵀ (Iₘ − N N⁻_rs).

In contrast to the choice of LESS in case (6), the residual vector will be unique; for any ξ̂_LESS:

ẽ = y − A ξ̂_LESS.

Hence, the optimal variance component estimate will also be unique:

σ̂₀² = ẽᵀẽ / (n − rk N).   (11)

For the corresponding formulas in the full-rank case (4), the reflexive symmetric g-inverse N⁻_rs simply needs to be replaced by the regular inverse N⁻¹, thereby showing that the LESS turns into an unbiased estimate of ξ, while its MSE-matrix coincides with its dispersion matrix (accuracy ≡ precision).
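To make the above operational, the following minimal Python sketch (all variable and function names are illustrative, not taken from the paper) computes the LESS together with the unique residual vector and the variance component estimate, switching to the pseudo-inverse N⁺ in the rank-deficient case (6):

```python
import numpy as np

def less_gmm(A, y):
    """Least-Squares Solution (LESS) in the GMM y = A xi + e, with P = I_n."""
    n, m = A.shape
    N = A.T @ A                      # normal-equations matrix N = A^T A
    c = A.T @ y                      # right-hand side c = A^T y
    rank_N = np.linalg.matrix_rank(N)
    if rank_N == m:                  # full-rank case (4): unique LESS
        xi_hat = np.linalg.solve(N, c)
    else:                            # rank-deficient case (6): one LESS via N+
        xi_hat = np.linalg.pinv(N) @ c
    e_tilde = y - A @ xi_hat         # residual vector, unique for any LESS
    sigma2_hat = (e_tilde @ e_tilde) / (n - rank_N)   # variance component (11)
    return xi_hat, e_tilde, sigma2_hat
```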

The Random Effects Model (REM)
Now, additional prior information (p.i.) is introduced to strengthen the parameter vector within the GMM (1). Based on Harville's [13] notation of an m × 1 vector 0̃ of so-called "pseudo-observations", the new m × 1 vector x := ξ − 0̃ of "random effects" (unknown) is formed with

E{x} = β₀ as given m × 1 vector of p.i.,   (13)

and

D{x} = σ₀² Q₀, Q₀ symmetric and nnd (non-negative definite).   (14)

Consequently, the Random Effects Model (REM) can be stated as

y = A x + e,  e₀ := β₀ − x,  [eᵀ, e₀ᵀ]ᵀ ~ ( 0, σ₀² · diag(Iₙ, Q₀) ).   (15)

If P₀ := Q₀⁻¹ exists, the estimation/prediction of x may now be based on the principle

eᵀe + e₀ᵀ P₀ e₀ = min s.t. (15),

which leads to the "normal equations"

(N + P₀) x̂ = c + P₀ β₀

for the "Least-Squares Collocation (LSC)" solution, which is unique even in the rank-deficient case (6). If Q₀ is singular, but e₀ ∈ R(Q₀) with probability 1, then there must exist an m × 1 vector ν₀ with

Q₀ ν₀ = −e₀ = x − β₀ with probability 1 (a.s.),   (20)

which generates the LSC solution uniquely from the "modified normal equations"

(Iₘ + N Q₀) ν̂₀ = c − N β₀,  x̂_LSC := β₀ + Q₀ ν̂₀,

or, alternatively, via the "update formula"

x̂_LSC = β₀ + Q₀ (Iₘ + N Q₀)⁻¹ (c − N β₀),

which exhibits the "weak (local) unbiasedness" of x̂_LSC via

E{x̂_LSC} = β₀ = E{x}.

Consequently, the MSE-matrix of x̂_LSC can be obtained from the dispersion of the prediction error; it holds:

MSE{x̂_LSC} = D{x̂_LSC − x} = σ₀² Q₀ (Iₘ + N Q₀)⁻¹,

whereas the dispersion matrix of x̂_LSC itself is of no relevance here. Again, both residual vectors are uniquely determined from

ẽ = y − A x̂_LSC,  ẽ₀ = β₀ − x̂_LSC = −Q₀ ν̂₀,

with the control formula

Aᵀ ẽ = ν̂₀.

An optimal estimate of the variance component may now be obtained from

σ̂₀² = (ẽᵀẽ + ν̂₀ᵀ Q₀ ν̂₀) / n.   (32)
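As a numerical illustration, the modified normal equations, the update formula, the control formula, and the variance component estimate (32) can be coded in a few lines; the Python sketch below (names illustrative; Q₀ may be singular) is one straightforward realization:

```python
import numpy as np

def lsc_rem(A, y, beta0, Q0):
    """Least-Squares Collocation (LSC) in the REM; Q0 may be singular."""
    n, m = A.shape
    N, c = A.T @ A, A.T @ y
    # modified normal equations: (I_m + N Q0) nu0 = c - N beta0
    nu0 = np.linalg.solve(np.eye(m) + N @ Q0, c - N @ beta0)
    x_hat = beta0 + Q0 @ nu0                  # update formula
    e_tilde = y - A @ x_hat                   # residual vector
    assert np.allclose(A.T @ e_tilde, nu0)    # control formula: A^T e~ = nu0
    sigma2_hat = (e_tilde @ e_tilde + nu0 @ Q0 @ nu0) / n   # cf. (32)
    return x_hat, sigma2_hat
```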

The Errors-In-Variables (EIV) Model
In this scenario, the Gauss-Markov Model (GMM) is further weakened by allowing some or all of the entries in the coefficient matrix A to be observed. So, after introducing a corresponding n × m matrix of unknown random errors E_A, the original GMM (1) turns into the EIV-Model

y = (A − E_A) ξ + e,   (33)

with the vectorized form of E_A being characterized through

e_A := vec E_A ~ (0, σ₀² Q_A), Q_A := Iₘₙ.   (34)

Here, the vec operation transforms a matrix into a vector by stacking all columns underneath each other, while ⊗ denotes the Kronecker-Zehfuss product of matrices, defined by

G ⊗ H := [g_ij · H] for G = [g_ij].

In particular, the following key formula holds true:

vec(G H K) = (Kᵀ ⊗ G) · vec H

for matrices of suitable size. Note that, in (34), the choice Q_A := Iₘₙ is a very special one. In general, Q_A may turn out singular whenever some parts of the matrix A remain unobserved (i.e., nonrandom).
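The vec operation and the key formula can readily be checked numerically; the short Python sketch below (matrix sizes chosen arbitrarily, purely for illustration) verifies vec(GHK) = (Kᵀ ⊗ G) vec H:

```python
import numpy as np

rng = np.random.default_rng(42)
G = rng.standard_normal((3, 4))
H = rng.standard_normal((4, 2))
K = rng.standard_normal((2, 5))

vec = lambda M: M.reshape(-1, order="F")   # stack all columns underneath each other

# key formula: vec(G H K) = (K^T kron G) vec(H)
assert np.allclose(vec(G @ H @ K), np.kron(K.T, G) @ vec(H))
```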
In any case, thanks to the term E_A ξ (a product of two unknown quantities), the model (33) needs to be treated in the "nonlinear world", even though the vector ξ may contain only incremental parameters. From now on, A is assumed to have full column rank, rk A =: q = m.
Following Schaffrin and Wieser [7], for instance, the "Total Least-Squares Solution (TLSS)" can be based on the principle

eᵀe + e_Aᵀe_A = min s.t. (33)–(34),

and the Lagrange function

Φ(e, e_A, ξ, λ) := eᵀe + e_Aᵀe_A + 2λᵀ(y − Aξ − e + E_A ξ),

which needs to be made stationary. The necessary Euler-Lagrange conditions then read:

(1/2) ∂Φ/∂e = ẽ − λ̂ ≐ 0,   (40)
(1/2) ∂Φ/∂e_A = ẽ_A + (ξ̂ ⊗ Iₙ) λ̂ ≐ 0, i.e., Ẽ_A = −λ̂ ξ̂ᵀ,   (41)
(1/2) ∂Φ/∂ξ = −(A − Ẽ_A)ᵀ λ̂ ≐ 0,   (42)
(1/2) ∂Φ/∂λ = y − A ξ̂ − ẽ + Ẽ_A ξ̂ ≐ 0,   (43)

where combining (41) with (42) yields

Aᵀ λ̂ = Ẽ_Aᵀ λ̂ = −ξ̂ · (λ̂ᵀλ̂).   (44)

The system (40)–(43) needs to be solved in connection with the "modified normal equations" resulting from (44), namely

(N − ν̂ Iₘ) ξ̂_TLS = c, with ν̂ := λ̂ᵀλ̂ · (1 + ξ̂ᵀξ̂) = ẽᵀẽ + ẽ_Aᵀẽ_A.

Due to the nonlinear nature of ξ̂_TLS, it is not so easy to determine whether it is an unbiased estimate, or what its MSE-matrix may exactly look like. First attempts at a rigorous error propagation have recently been undertaken by Amiri-Simkooei et al. [14] and by Schaffrin and Snow [15], but these are beyond the scope of this paper.
Instead, both the optimal residual vector ẽ_TLS and the optimal residual matrix (Ẽ_A)_TLS are readily available through (40) and (43) as

ẽ_TLS = λ̂ = (y − A ξ̂_TLS) · (1 + ξ̂_TLSᵀ ξ̂_TLS)⁻¹,

and through (41) as

(Ẽ_A)_TLS = −ẽ_TLS · ξ̂_TLSᵀ.

As optimal variance component estimate, it is now proposed to use the formula

σ̂₀² = (ẽᵀẽ + ẽ_Aᵀẽ_A)_TLS / (n − m) = ν̂ / (n − m),   (52)

in analogy to the previous estimates (11) and (32).
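One simple way to solve the modified normal equations resulting from (44) is the fixed-point iteration sketched below in Python; this is only one of several possible algorithms for the case Q_A = Iₘₙ (cf. Schaffrin and Wieser [7]), and all names are illustrative:

```python
import numpy as np

def tls_eiv(A, y, max_iter=100, tol=1e-12):
    """Iterative TLS solution in the EIV-Model with Q_A = I_mn."""
    n, m = A.shape
    N, c = A.T @ A, A.T @ y
    xi = np.linalg.solve(N, c)             # initialize with the LESS
    for _ in range(max_iter):
        r = y - A @ xi
        nu = (r @ r) / (1.0 + xi @ xi)     # nu = lambda^T lambda * (1 + xi^T xi)
        xi_new = np.linalg.solve(N - nu * np.eye(m), c)   # modified normal eqs.
        if np.linalg.norm(xi_new - xi) < tol:
            xi = xi_new
            break
        xi = xi_new
    lam = (y - A @ xi) / (1.0 + xi @ xi)   # Lagrange multipliers = e~_TLS
    EA_tilde = -np.outer(lam, xi)          # residual matrix (E~_A)_TLS
    sigma2 = (lam @ lam + (EA_tilde**2).sum()) / (n - m)  # estimate (52)
    return xi, lam, EA_tilde, sigma2
```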

A New Model: The EIV-Model with Random Effects (EIV-REM)
In the following, the above EIV-Model (33)–(34) is strengthened by introducing stochastic prior information (p.i.) on the parameters, which thereby change their character and become "random effects" as in (13)–(15). The EIV-REM can, therefore, be stated as

y = (A − E_A) x + e,  e₀ := β₀ − x,   (53)

[eᵀ, e_Aᵀ, e₀ᵀ]ᵀ ~ ( 0, σ₀² · diag(Iₙ, Q_A, Q₀) ), Q₀ symmetric and nnd.   (54)

The first set of formulas will be derived by assuming that the weight matrix P₀ := Q₀⁻¹ exists uniquely for the p.i. Then, the "TLS collocation (TLSC)" may be based on the principle

eᵀe + e_Aᵀe_A + e₀ᵀP₀e₀ = min s.t. (53)–(54),

and the Lagrange function

Φ(e, e_A, e₀, λ) := eᵀe + e_Aᵀe_A + e₀ᵀP₀e₀ + 2λᵀ[y − e − (A − E_A)(β₀ − e₀)],

which needs to be made stationary. The necessary Euler-Lagrange conditions then read:

ẽ = λ̂,
ẽ_A = −(x̂ ⊗ Iₙ) λ̂, i.e., Ẽ_A = −λ̂ x̂ᵀ,
P₀ ẽ₀ = −(A − Ẽ_A)ᵀ λ̂,
y − ẽ − (A − Ẽ_A) x̂ = 0, with x̂ := β₀ − ẽ₀.

Combining these conditions results in

λ̂ = (y − A x̂_TLSC) · (1 + x̂ᵀx̂)⁻¹,  ν̂ := λ̂ᵀλ̂ · (1 + x̂ᵀx̂),   (65)

and finally in

[N − ν̂ Iₘ + (1 + x̂ᵀx̂) P₀] x̂_TLSC = c + (1 + x̂ᵀx̂) P₀ β₀.

In the more general case of a singular matrix Q₀, an approach similar to (20) can be followed, leading to the equation system

[(N − ν̂ Iₘ) Q₀ + (1 + x̂ᵀx̂) Iₘ] ν̂₀ = c − (N − ν̂ Iₘ) β₀,

which needs to be solved in connection with (65). Obviously,

x̂_TLSC = β₀ + Q₀ ν̂₀.

Again, it is still unclear whether x̂_TLSC represents an unbiased prediction of the vector x of random effects. Also, very little (if anything) is known about the corresponding MSE-matrix of x̂_TLSC. The answers to these open problems will be left for a future contribution. It is, however, possible to find the respective residual vectors/matrices, represented as follows:

ẽ_TLSC = λ̂,  (Ẽ_A)_TLSC = −ẽ_TLSC · x̂ᵀ,  (ẽ₀)_TLSC = β₀ − x̂_TLSC = −Q₀ ν̂₀,

while a suitable formula for the variance component is suggested as

σ̂₀² = (ẽᵀẽ + ẽ_Aᵀẽ_A + ν̂₀ᵀ Q₀ ν̂₀)_TLSC / n,

in analogy to (11) and (32).
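Assuming a regular P₀ = Q₀⁻¹, the TLSC solution can be sketched by a fixed-point iteration on the stationarity conditions above, analogous to the TLS iteration; the Python fragment below is merely such a sketch (not the algorithm of Snow and Schaffrin [10]), with all names illustrative:

```python
import numpy as np

def tlsc_eiv_rem(A, y, beta0, P0, max_iter=100, tol=1e-12):
    """TLS collocation (TLSC) in the EIV-REM, assuming P0 = Q0^{-1} exists."""
    n, m = A.shape
    N, c = A.T @ A, A.T @ y
    x = np.asarray(beta0, dtype=float)     # start at the prior information
    for _ in range(max_iter):
        alpha = 1.0 + x @ x                # alpha = 1 + x^T x
        r = y - A @ x
        nu = (r @ r) / alpha               # nu = lambda^T lambda * alpha
        M = N - nu * np.eye(m) + alpha * P0
        x_new = np.linalg.solve(M, c + alpha * (P0 @ beta0))
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    lam = (y - A @ x) / (1.0 + x @ x)      # Lagrange multipliers = e~_TLSC
    return x, lam, -np.outer(lam, x), beta0 - x   # x^, e~, E~_A, e~_0
```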

Conclusions and Outlook
Key formulas have been developed successfully to optimally determine the parameters and residuals within the new 'EIV-Model with p.i.' (or EIV-REM), which turns out to be more general than the other three models considered here (GMM, REM, EIV-Model). In particular, it is quite obvious that the EIV-REM reduces to the REM for a nonrandom coefficient matrix (E_A = 0, resp. Q_A = 0), to the EIV-Model for noninformative p.i. (P₀ → 0), and to the GMM when both specializations are applied at once.
Hence the new EIV-REM can indeed serve as a universal representative of the whole class of models presented here.
Therefore, in a follow-up paper, it is planned to also cover more general dispersion matrices for e and e_A in (54), similarly to the work by Schaffrin et al. [16] for the EIV-Model with singular dispersion matrices for e_A.