Approximated Uncertainty Propagation of Correlated Independent Variables Using the Ordinary Least Squares Estimator

For chemical measurements, calibration is typically conducted by regression analysis. In many cases, generalized approaches are required to account for a complex-structured variance–covariance matrix of (in)dependent variables. In the particular case of highly correlated independent variables, however, the ordinary least squares (OLS) method can play a rational role, with an approximated propagation of the uncertainties of the correlated independent variables into that of the calibrated value, provided the standard deviation of the fit residuals is close to the uncertainties along the ordinate of the calibration data. The proposed method bypasses the iterative solver otherwise needed to minimize the implicit form of the squared residuals. It further allows us to derive explicit expressions for the budgeted uncertainties corresponding to the regression uncertainty, the measurement uncertainty of the calibration target, and the correlated independent variables. Explicit analytical expressions for the calibrated value and its associated uncertainties are given for straight-line and second-order polynomial fit models for the highly correlated independent variables.


Introduction
For calibration purposes, regression analysis in which (in)dependent variables carry a complex uncertainty structure has been studied intensively to yield the best estimates of the regression coefficients and their associated uncertainties [1–13]. In particular, the propagation of the uncertainty of measured scalar responses toward the final calibrated value is a critical issue for ensuring the compatibility of inter- and intra-laboratory measurement results [14]. In the case of non-zero covariances among responses (dependent variables) and stimuli (independent variables), generalized least squares (GLS) regression is required to solve the matrix form of the squared residuals (SR), which is weighted against the variance–covariance matrix of the calibration data vectors [15]. The weighted SR matrix is then minimized by an iterative solver. When the uncertainties of the independent variables and the covariances among dependent and independent variables are nonexistent, GLS regression reduces to weighted least squares (WLS) regression [1–8,16]. As the off-diagonal elements of the SR matrix are zero, the expression for the fitted parameters of the regression model can be derived analytically.
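As a minimal illustration of this reduction, WLS with a diagonal weight matrix admits the closed form β̂ = (XᵀWX)⁻¹XᵀWy. The sketch below (NumPy) evaluates it for hypothetical straight-line data; the data values and the choice W = diag(1/u²(y_i)) are illustrative assumptions, not taken from this study's datasets:

```python
import numpy as np

# Hypothetical straight-line calibration data (x_i, y_i) with per-point
# response uncertainties u(y_i); the weight matrix is W = diag(1 / u(y_i)^2).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
u_y = np.array([0.1, 0.1, 0.2, 0.2, 0.3])

X = np.vander(x, 2, increasing=True)      # design matrix with columns [1, x]
W = np.diag(1.0 / u_y**2)

# Closed-form WLS estimator: beta = (X^T W X)^{-1} X^T W y,
# solved via the weighted normal equations rather than an explicit inverse.
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta)                               # fitted [intercept, slope]
```

At the minimum, the weighted residuals are orthogonal to the design columns, i.e. XᵀW(y − Xβ̂) = 0, which is a convenient numerical check of the fit.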
Analytical formulations of GLS are available from NPL and INRiM [18,20]. INRiM's software, CCC (v2.0), works for linear and nonlinear regression problems with a full covariance matrix, excluding covariance between independent and dependent variables [28]. NPL's software, XLGENLINE (v1.1), implemented in Excel by calling a FORTRAN DLL, works only for linear regression problems with up to fourth-order polynomials and diagonal covariance matrices [29]. In contrast, XGENLINE (v8.1) [30], a MATLAB version of XLGENLINE, uses a representation in terms of Chebyshev polynomials to obtain better numerical conditioning than general polynomials [17]. ISO/TS 28037:2010(E) [26] and the relevant software B_LEAST [30] can address correlation among dependent and independent variables, but only for a straight-line model. B_LEAST can work with different types of regression models, including power and exponential models, but it accepts diagonal covariance matrices only [32].
Following the Guide to the Expression of Uncertainty in Measurement (GUM), a common text for the metrological determination of the measurement value [33], the uncertainty of the calibration target can be evaluated by applying the (linear) law of propagation of uncertainty (LPU) to the abscissa. For the generalized approach with a nondiagonal SR matrix, an analytical process to derive the variance–covariance matrix of the regression coefficients has been demonstrated only for particular cases in which the correlation between variables does not disturb the derivation of explicit variance–covariance formulae [33]. For chemical or biological RMs prepared from the same mother material, one may reasonably assume a strong correlation between their uncertainties [34]. For instance, a batch of chemical mixtures at various amount fractions is typically blended from the same raw materials or mother mixture in the same facility, perhaps even by the same worker; their variances may therefore be correlated to some extent. Here, we attempt to derive an explicit expression for the uncertainty of the calibrated value propagated from the measurement uncertainties of the references and the calibration target using the ordinary least squares (OLS) estimator according to the LPU. The calibration result obtained with the proposed method is compared to a Monte Carlo simulation, supporting the validity of the proposed method. In addition, as depicted in the right panel of Figure 1, a dataset in which the uncertainties of y_i are close to the standard deviation of the fit residuals and the uncertainties of x_i are highly correlated was considered using the LPU.

Least Squares Methods
Figure 1 illustrates the regression problem of the present study, presenting the measured data vectors d_i = (x_i, y_i), where y corresponds to the response value and x to the stimulus (typically, the value of the reference material (RM)), and the reference values x_i are fully correlated with each other. The minimization of the regression variance for the data vector can be expressed as a generalized least squares (GLS) formulation as follows [13]:

min over (x*, y*, β) of (d − d*)ᵀ C_d⁻¹ (d − d*), subject to y_i* = q(x_i*, β),   (1)

where the unobservable true vector d* = (x*, y*) describing the fitted line acts as a constraint; β, of order 1 × (m + 1), denotes the input regression coefficient vector of the mth-order polynomial model function q; and βᵀ, X*ᵀ, and Y*ᵀ are auxiliary output quantities. C_d is the variance–covariance matrix of the measurement data vector d, of order n × n, and is assumed to have full rank. Equation (1) does not have a closed-form solution; therefore, an iterative solving algorithm is required to derive the estimate β̂, a vector of maximum-likelihood estimators under the normality assumption, that is, the best linear unbiased estimator (BLUE) [17]. Equation (1) reduces to the OLS problem for the regression coefficient vector β in the case that C_d is the identity matrix. The SSR at β is expressed as follows:

S(β) = (y − Xβ)ᵀ(y − Xβ),   (2)

where the design matrix X has n rows and (m + 1) columns with elements x_i^j (j = 0, …, m), and y = Xβ + ε. The Gauss–Markov assumptions require the unobservable random error vector ε to have the same variance σ² in every element. The function S(β) is quadratic in β with a positive definite Hessian; therefore, it possesses a unique global minimum at β = β̂, given by the normal equations:

XᵀXβ̂ = Xᵀy.   (3)

A BLUE estimator of the regression coefficients in the OLS problem can be derived from Equation (3) by applying the inverse of the Gram matrix, (XᵀX)⁻¹, to both sides. The measurement uncertainty (errors in variables along the ordinate of the data vector d) can be considered identical to the standard deviation of the fit residuals, and Equation (3) yields the OLS estimator, which is the BLUE satisfying the Gauss–Markov theorem, as follows [35,36]:

β̂_OLS = (XᵀX)⁻¹Xᵀy.   (4)

Therefore, β̂_OLS is a linear transformation of the response vector y by the operator (XᵀX)⁻¹Xᵀ, corresponding to the orthogonal projection of y onto the column space of X [37]. The GLS method is required when there are errors in the independent variable (along the abscissa). However, the error propagation of the regression variance of the GLS method requires, in general, the linearization of the regression variance at the solution point of Equation (1). The following sections explore the explicit expressions for the uncertainty of the fully correlated references u(x_i) propagated via β̂_OLS.
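A minimal numerical sketch of the OLS estimator β̂_OLS = (XᵀX)⁻¹Xᵀy follows; the second-order data are hypothetical, and np.linalg.solve is used on the normal equations rather than forming an explicit inverse:

```python
import numpy as np

# Hypothetical calibration data for a second-order polynomial model
# y = b0 + b1*x + b2*x^2 (m = 2); values are illustrative only.
x = np.array([0.30, 0.32, 0.34, 0.36, 0.38, 0.40])
y = 0.01 + 2.0 * x + 0.5 * x**2 \
    + np.array([3e-4, -2e-4, 1e-4, -3e-4, 2e-4, -1e-4])   # residual-like noise

m = 2
X = np.vander(x, m + 1, increasing=True)   # n x (m+1) design matrix [1, x, x^2]

# OLS estimator beta_hat = (X^T X)^{-1} X^T y, solved via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # fitted [b0, b1, b2]
```

In practice, numpy.linalg.lstsq or a QR decomposition is better conditioned than assembling XᵀX, particularly for higher polynomial orders on a narrow abscissa range.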

Variance-Covariance Matrix of the Regression Coefficient
A full matrix expression of Equation (4) can be expanded element-wise as follows:

Σ_{i=1..n} x_i^k y_i = Σ_{j=0..m} β_j Σ_{i=1..n} x_i^{j+k},  k = 0, …, m.   (5)

Then, the cost function matrix Q is expressed as Q = (q_0, q_1, …, q_m)ᵀ = 0, with the coefficient vector β taken as the OLS estimator β̂_OLS. Equation (5) is reorganized into XᵀXβ − Xᵀy = 0 to give the implicit function vector elements q_k, expressed as follows [38]:

q_k = Σ_{j=0..m} β_j Σ_{i=1..n} x_i^{j+k} − Σ_{i=1..n} x_i^k y_i = 0,  k = 0, …, m.   (6)

Equation (6) does not require estimation of the inverse matrix, as each element is organized implicitly. The system of Equation (6) does not have an exact analytical solution; instead, the coefficient vector β fits the equations best in the sense of minimizing the SSR, as described above. According to DIN 1319-4, the uncertainty transformation matrix T_β̂_OLS accounting for the variability of β̂_OLS is formed from the Jacobian matrices of the cost function matrix Q against the coefficient vector β as the input quantity and the response vector y as the output quantity. To obtain a positive definite result, the uncertainty transformation matrix for estimating the OLS regression variance carries a negative sign, as follows [23,39]:

T_β̂_OLS = −(Q_β̂_OLS)⁻¹ Q_y,   (7)

where Q_y and Q_β̂_OLS are the Jacobian matrices of the Q matrix against the response vector y and the regression coefficient vector β, respectively. These two matrices are given as follows:

Q_y = ∂Q/∂y = −Xᵀ,   (8)

Q_β̂_OLS = ∂Q/∂β = XᵀX.   (9)

The variance matrix of the regression coefficients V_β̂_OLS is calculated as follows:

V_β̂_OLS = T_β̂_OLS Ω T_β̂_OLSᵀ,   (10)

where Ω is the diagonal residual matrix with identical entries of s² [2]. Here, s² is the variance of the fit residuals of the sampled data about the unobservable 'true' regression line, whose regression coefficient vector is b. It is the statistical parameter describing the extent of dispersion of the unobservable error e_i = y_i − q(x_i, b), which has a constant variance σ²; the error vector is normally distributed, ε ~ N(0, σ²) (zero mean and standard deviation σ). The standard deviation of the fit residuals of the sampled data, s, can be expressed as follows:

s = sqrt( Σ_{i=1..n} (y_i − q(x_i, β̂_OLS))² / (n − m − 1) ),   (11)

where the degrees-of-freedom correction in the denominator is determined by the polynomial order m of the fitting model and the number n of data vectors. β̂_OLS allows the residual to be an observable estimate of the unobservable error ε. The probability distribution of the residual error converges to a normal distribution under the Gauss–Markov assumptions.
The variance–covariance matrix of the OLS estimator V_β̂_OLS is expressed as follows:

V_β̂_OLS = s² G⁻¹,   (12)

where G is the Gram matrix XᵀX of Equation (9). Each diagonal entry corresponds to the variance of the regression coefficient of each term, whereas the off-diagonal entries correspond to the covariances between regression coefficients.
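The residual standard deviation s and the coefficient variance–covariance matrix V = s²G⁻¹ described above can be sketched as follows; the quadratic dataset is the same hypothetical example, with all names illustrative:

```python
import numpy as np

# Hypothetical quadratic calibration data (illustrative values).
x = np.array([0.30, 0.32, 0.34, 0.36, 0.38, 0.40])
y = 0.01 + 2.0 * x + 0.5 * x**2 \
    + np.array([3e-4, -2e-4, 1e-4, -3e-4, 2e-4, -1e-4])

m = 2
n = x.size
X = np.vander(x, m + 1, increasing=True)
G = X.T @ X                                # Gram matrix
beta_hat = np.linalg.solve(G, X.T @ y)

# Standard deviation of the fit residuals with the degrees-of-freedom
# correction n - (m + 1).
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / (n - m - 1))

# Variance-covariance matrix of the OLS coefficients: V = s^2 * G^{-1}.
V_beta = s**2 * np.linalg.inv(G)
print(s)
print(np.sqrt(np.diag(V_beta)))            # standard uncertainties u(b_j)
```

The off-diagonal entries of V_beta carry the covariances between coefficients, which matter later when the coefficient variances are bracketed onto the abscissa.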

Propagation of Variance-Covariance of the Regression Coefficients
For calibration, the measured response value of the target, y_0, is inversely evaluated to obtain the corresponding stimulus value x_0, which can be rooted numerically using Newton's method:

q(x_0, β̂_OLS) − y_0 = 0.   (13)

To propagate the measurement uncertainty u(y_0) onto u(x_0), the inverse evaluation requires the estimated regression coefficient vector β̂_OLS and the estimated x_0 as input and output quantities, respectively. The Jacobian matrices against the input and output quantities (q_β̂_OLS and q_x, respectively) can then be expressed as follows:

q_β̂_OLS = ∂q/∂β = (1, x_0, …, x_0^m),  q_x = ∂q/∂x = q′(x_0).   (14)

Here, q′(x_0) is the slope of the tangent of the determined regression line, in the case of the second-order polynomial fit model, at the target coordinate (x_0, y_0). This implies that the regression variance is weighted by the slope of the tangent of the non-first-order calibration curve. The uncertainty transformation matrix for the regression variance toward the abscissa of the output quantity, t_β̂_OLS, is then given as follows:

t_β̂_OLS = −q_x⁻¹ q_β̂_OLS = −(1/q′(x_0)) (1, x_0, …, x_0^m).   (15)

To transform the variance–covariance matrix of the OLS estimator V_β̂_OLS with the mth-order polynomial model function (Equation (12)) onto the abscissa corresponding to the output quantity x_0, the uncertainty propagation matrix t_β̂_OLS (Equation (15)) is bracketed according to the LPU as follows:

u₂²(x_0) = t_β̂_OLS V_β̂_OLS t_β̂_OLSᵀ = (s²/q′(x_0)²) Σ_{i,j=0..m} g_ij x_0^{i+j},   (16)

where g_ij is an element of the inverse Gram matrix G⁻¹. In this study, q′(x_0) takes only positive values. The propagated uncertainty of the regression coefficients into the calibrated value x_0 is given as follows:

u₂(x_0) = (s/q′(x_0)) sqrt( Σ_{i,j=0..m} g_ij x_0^{i+j} ).   (17)

Because the highest-order element of the inverse Gram matrix, g_mm, multiplies x_0^{2m}, u₂²(x_0) for the second-order polynomial appears as a quartic curve, as shown in Figure 2.
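The inverse evaluation and the bracketing of the regression variance can be sketched as follows. The coefficient values are illustrative, and the diagonal covariance matrix is a placeholder standing in for s²G⁻¹:

```python
import numpy as np

# Illustrative fitted quadratic q(x) = b0 + b1*x + b2*x^2 and its covariance.
beta = np.array([0.01, 2.0, 0.5])
V_beta = 1e-7 * np.eye(3)          # placeholder for s^2 * G^{-1}
y0 = 0.75                          # measured response of the target

q = lambda x: beta[0] + beta[1] * x + beta[2] * x**2
dq = lambda x: beta[1] + 2.0 * beta[2] * x   # slope of tangent q'(x)

# Newton's method roots q(x0) - y0 = 0 for the inverse evaluation.
x0 = 0.3
for _ in range(50):
    x0 -= (q(x0) - y0) / dq(x0)

# Regression variance bracketed onto the abscissa:
# u2^2(x0) = t V t^T with t = (1, x0, x0^2) / q'(x0).
t = np.array([1.0, x0, x0**2]) / dq(x0)
u2_x0 = np.sqrt(t @ V_beta @ t)
print(x0, u2_x0)
```

The sign of t is immaterial here because the variance is a quadratic form in t.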
The variance of each component is estimated using the dataset in Table 1 (Figure 2).

Propagation of the Measurement Uncertainty of the Calibration Target
To propagate the measurement uncertainty of the calibration target, its uncertainty value is treated as an input value of the inverse evaluation. The partial derivative of the cost function at y_0 can be expressed as follows:

∂q/∂y_0 = −1.   (18)

The uncertainty transformation matrix for the measurement uncertainty of the unknown sample, obtained so as to yield a positive definite result, is then:

t_y0 = −q′(x_0)⁻¹ (∂q/∂y_0) = 1/q′(x_0).   (19)

The propagation of the measurement uncertainty of the target u(y_0) yields the target variance V_t as follows:

V_t = s² / (p q′(x_0)²),   (20)

where s is the measurement uncertainty of the calibration target, which, in general, can be approximated by the sampling variance of the regression data vectors, and p is the number of repetitions of the target measurement. The measurement variance s² is weighted by the number of repetitions p according to the central limit theorem. The target uncertainty can then be expressed as follows:

u_t(x_0) = s / (√p q′(x_0)).   (21)
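A sketch of this target-uncertainty term, u_t(x_0) = s/(√p · q′(x_0)), under illustrative values (the s level loosely mirrors dataset A; all names are assumptions):

```python
import numpy as np

s = 0.00073      # residual standard deviation (illustrative, dataset-A level)
p = 1            # number of target measurement repetitions
x0 = 0.3409      # calibrated value from the inverse evaluation (illustrative)
beta = np.array([0.01, 2.0, 0.5])

dq = beta[1] + 2.0 * beta[2] * x0          # slope of tangent q'(x0)

# Target uncertainty u_t(x0) = s / (sqrt(p) * q'(x0)); the variance is
# weighted by the repetition count p per the central limit theorem.
u_t_x0 = s / (np.sqrt(p) * dq)
print(u_t_x0)
```

Doubling √p (i.e., quadrupling the repetitions) halves u_t(x_0), which is a quick sanity check of the p-weighting.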

Propagation of Uncertainty of the Independent Variable
The uncertainty of the independent variable is propagated using the same method as for the estimation of the regression variance. To define an uncertainty transformation matrix toward the abscissa, if there are no correlations between x_i and y_i, the Jacobian matrix of the cost function against the stimuli x_i is defined as follows:

Q_x = ∂Q/∂x,   (22)

whose entries follow by differentiating Equation (6) with respect to each x_i. The uncertainty transformation matrix of the reference uncertainty to the abscissa is given as follows:

T_x = −(Q_β̂_OLS)⁻¹ Q_x,   (23)

where Q_β̂_OLS is given in Equation (9); an analytical derivation of the above matrix is nontrivial. The variances and covariances of the references are then propagated onto the regression coefficients using the following equation:

V_x = T_x V_ref T_xᵀ,   (25)

where the variance–covariance matrix of the references, V_ref, is given as follows:

V_ref = D_ref R_ref D_ref,   (26)

where R_ref is the correlation matrix whose elements are r_x, and D_ref is the diagonal matrix of the reference uncertainties. The uncertainty of the calibrated value of the target, u_x(x_0), estimated from the OLS estimator is given as follows:

u_x(x_0) = sqrt( t_β̂_OLS V_x t_β̂_OLSᵀ ).   (27)

Unfortunately, an explicit formulaic expression for u_x(x_0) is nontrivial for higher-order polynomials. Under no correlation (r_x = 0), the curvature of u_x(x_0) varies quartically for the quadratic calibration function because of the nature of the propagated variance–covariance matrix V_ref, whose highest-order element is x^{2m} (Figure 2).
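Because the transformation T_x is nontrivial analytically, one practical way to evaluate this propagation is to differentiate the fitted coefficients numerically with respect to each x_i and bracket V_ref = D·R·D through the resulting Jacobian. The sketch below uses central finite differences on hypothetical data; equal u(x_i) and the finite-difference step are illustrative assumptions:

```python
import numpy as np

# Hypothetical quadratic calibration data with correlated reference values.
x = np.array([0.30, 0.32, 0.34, 0.36, 0.38, 0.40])
y = 0.01 + 2.0 * x + 0.5 * x**2 \
    + np.array([3e-4, -2e-4, 1e-4, -3e-4, 2e-4, -1e-4])
u_x = np.full(x.size, 0.001)               # reference uncertainties u(x_i)
r_x = 1.0                                  # correlation among references
m = 2

def fit(xv):
    X = np.vander(xv, m + 1, increasing=True)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Numerical Jacobian d(beta)/d(x_i), playing the role of T_x = -Q_beta^{-1} Q_x.
eps = 1e-7
T = np.empty((m + 1, x.size))
for i in range(x.size):
    xp = x.copy(); xp[i] += eps
    xm = x.copy(); xm[i] -= eps
    T[:, i] = (fit(xp) - fit(xm)) / (2 * eps)

# V_ref = D R D with uniform off-diagonal correlation r_x.
R = np.full((x.size, x.size), r_x); np.fill_diagonal(R, 1.0)
V_ref = np.outer(u_x, u_x) * R

V_x = T @ V_ref @ T.T                      # reference variance on coefficients
print(np.sqrt(np.diag(V_x)))
```

The coefficient-level matrix V_x is then bracketed onto the abscissa with the same tangent-slope vector t used for the regression variance.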
As a quantitative investigation of the correlation coefficients is beyond practical realization in many cases, a conservative approach is required. In the extremely conservative case of an all-ones correlation matrix (all r_x,ij = 1), the measurement uncertainty of the target along the abscissa is coherently dispersed, as presented below:

u_x(x_0) ≅ ū(x),   (28)

where ū(x) is the averaged uncertainty of the u(x_i), evaluated through the cost function q of the fitted OLS model. It was assumed that the uncertainty values of the references are close to each other, as shown in Table 1 and Figure 3. The fitted line spans out in the positive and negative directions by the extent of the width of the normal distribution whose uncertainty is ū(x) (Figure 1). As expected, u_x(x_0) varies smoothly as a function of x_0 when all correlations are fully positive (i.e., r_x = 1), based on the assumption that u(x_i) is roughly constant across the reference scale (Figure 2). The variation rate is lower at large values of x_0 because the slope of the tangent for the used dataset is higher at such values. In the case that x_0 is centered within the reference scale, Equation (26) can be simplified as follows:

u_x(x_0) ≅ u_ref.   (29)

Figure 3. Graphical representation of the artificial dataset A given in Table 1. Note that the uncertainties of the reference values and responses are magnified three times for visibility. The uncertainties of the reference values were assumed to be fully correlated with each other.
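The coherent-dispersion argument can be checked numerically: shifting all reference values by a common δ shifts the calibrated value by the same δ for a polynomial model, which is why u_x(x_0) ≈ ū(x) under full correlation. A sketch with hypothetical, noise-free data:

```python
import numpy as np

x = np.array([0.30, 0.32, 0.34, 0.36, 0.38, 0.40])
y = 0.01 + 2.0 * x + 0.5 * x**2           # noise-free responses for clarity
y0 = 0.75                                  # target response (illustrative)
m = 2

def calibrate(xv):
    # OLS fit followed by Newton inverse evaluation of y0.
    X = np.vander(xv, m + 1, increasing=True)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    x0 = xv.mean()
    for _ in range(50):
        x0 -= (b[0] + b[1]*x0 + b[2]*x0**2 - y0) / (b[1] + 2*b[2]*x0)
    return x0

delta = 0.001                              # coherent shift of all references
print(calibrate(x + delta) - calibrate(x))  # ~ delta
```

With equal u(x_i), this one-to-one transfer is exact; in the realistic case of slightly unequal u(x_i), it holds approximately, producing the smooth variation of u_x(x_0) noted above.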
Table 1. Datasets used to test the proposed method in this study. Note that all u(y_i) values are close to the standard deviation of the fit residuals of the corresponding dataset. The uncertainties of the reference values were assumed to be fully correlated with each other.

Discussions
Artificial datasets mimicking measurements of ambient-level N₂O using a gas chromatograph with an electron capture detector (GC-ECD) were utilized to evaluate the uncertainty of the calibrated value x_0 (Table 1). The artificial data are dispersed along the ordinate to have uncertainties u(y_i) set close to the standard deviations of the fit residuals, s = 0.00073, 0.00085, and 0.00041 for datasets A, B, and C, respectively. This implies that the u(y_i) can be considered sampling uncertainties from the (x*, y*) modeled by the OLS estimator. It should be noted that the OLS and WLS methods yield similar regression variances, where the variance–covariance matrix of the WLS estimator, V_β̂_WLS, was computed as (XᵀWX)⁻¹, with W the weighting matrix whose elements are the inverse squared uncertainties 1/u²(y_i) [5]. The u_x(x_0) curves for r_x = 1 and r_x = 0 cross outside the range of the reference scale, implying that a conservative approach can be achieved by considering the correlation effect of the reference values within the reference scale (Figure 2). Because the post-evaluation of the correlation factors among reference values is hard to achieve, one may assume r_x = 1 to avoid underestimating the calibration uncertainty due to the uncertainties of the reference values.
To verify the derived explicit equation of each uncertainty value, a Monte Carlo (MC) simulation was conducted employing identical datasets. In the MC method, individual data points were randomly sampled from normally distributed reference values x_i, which were fully correlated with each other (all r_x,ij = 1), and normally distributed response values y_i. The uncertainties of both x_i and y_i, denoted u(x_i) and u(y_i), respectively, for each dataset are detailed in Table 1. A total of 10,500 data vectors d_i = (x_i, y_i) were randomly sampled from the population to compute the uncertainty value of the calibration target. Running the OLS estimator on each sampled data vector yielded a single regression coefficient vector β̂ and calibrated value x_0; the average and standard deviation of each parameter provided the corresponding value and its uncertainty. Because it is a valid assumption that the result of the MC simulation lies in close proximity to the unbiased limit of the variance of the resulting distribution [40], a comparison between the x-correlated OLS method (xcOLS) and the MC method allowed us to test the reliability of the xcOLS method (Figure 4). With the LPU, it is hard to handle the GLS problem in which the SR matrix cannot be reduced to the implicit model equations [33]; therefore, approximating the uncertainties according to the LPU is prevalent in many cases. The uncertainties u₂(x_0), u_t(x_0), and u_x(x_0) from both the xcOLS and MC methods exhibit a high degree of agreement across datasets spanning a range of variances of fit residuals s². Covariances stemming from the coherent dispersion of the reference values were observed to be minor, as seen in the comparison between the MC and xcOLS methods for the datasets (Tables 2 and 3 and Figure 4).
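The MC procedure described above can be sketched as follows; full correlation among the x_i is realized by one shared normal deviate per trial (this assumes equal u(x_i)), and the dataset, uncertainty levels, and target response are illustrative stand-ins for Table 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 'true' quadratic calibration data and uncertainty levels.
x = np.array([0.30, 0.32, 0.34, 0.36, 0.38, 0.40])
y_true = 0.01 + 2.0 * x + 0.5 * x**2
u_x, u_y, y0, m = 0.001, 0.0007, 0.75, 2

x0_samples = []
for _ in range(10500):
    # Full correlation (r_x = 1): one shared deviate shifts all x_i coherently.
    xs = x + u_x * rng.standard_normal()
    ys = y_true + u_y * rng.standard_normal(x.size)
    X = np.vander(xs, m + 1, increasing=True)
    b = np.linalg.solve(X.T @ X, X.T @ ys)
    # Inverse evaluation of y0 by Newton's method.
    x0 = 0.3
    for _ in range(30):
        x0 -= (b[0] + b[1]*x0 + b[2]*x0**2 - y0) / (b[1] + 2*b[2]*x0)
    x0_samples.append(x0)

x0_samples = np.asarray(x0_samples)
print(x0_samples.mean(), x0_samples.std(ddof=1))  # calibrated value and u(x0)
```

The sample standard deviation of x0_samples plays the role of the combined u(x_0) against which the explicit xcOLS expressions are compared.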
Table 3. The combined standard uncertainty of the determined value of the calibration target under the assumption of unitive correlation of the reference values. β₁ is the slope of a fitted straight line, and x̄ is the mean of the reference values. The measurement uncertainty of the target is assumed to be at a level similar to the measurement uncertainty of the references. The unitive correlation among the reference variances is approximated as u_x(x_0) ≅ u_ref.

Polynomial Order | Var(x_0) (m > 1)

Conclusions
In the field of chemical and biological analysis, the OLS method is a prevalent choice for calibration tasks involving highly accurate RMs, particularly when the negligible uncertainty of the references obviates the need for their consideration in the regression analysis. However, numerous scenarios, such as those encountered in graphite furnace atomic absorption spectroscopy, X-ray fluorescence spectroscopy, and spark emission spectroscopy, deviate from the simplicity of the OLS approach due to the presence of considerable variance–covariance in the independent variables, namely the reference values. To address this challenge, an approximated propagation method for the variance–covariance of the data matrix becomes imperative. This study addresses a specific calibration scenario wherein the independent variables exhibit high correlation and the uncertainties along the ordinate are equivalent to the standard deviation of the fit residuals. If the uncertainties of the independent variables are deemed negligible, this scenario aligns with the traditional OLS case. As conceptually illustrated in Figure 1, the uncertainties of the fully correlated independent variables were dispersed coherently along the abscissa. The validity of this concept is substantiated through the step-by-step derivation of explicit equations. This study budgets the contributions of the variances arising from the regression coefficients, target measurements, and references, offering an analytical framework for comprehending uncertainty propagation in this calibration scenario. The calibration task employing the xcOLS and MC methods was exemplified using artificial datasets with standard deviations of fit residuals s of 0.00073, 0.00085, and 0.00041. The resulting uncertainty values, used as the metric for comparing results, were estimated by the xcOLS and MC methods and showed good agreement, underscoring the validity of the proposed approach for propagating the uncertainties of highly correlated independent variables using the OLS estimator.

Figure 1. (Left) Notation of the uncertainties and errors addressed in this study. u(x_i) and u(y_i) denote the input uncertainties of the reference and corresponding response values, respectively. The residual error ε_i represents the deviation between the theoretical and input values along the ordinate. The measured response of the target is y_0, which is calibrated to x_0. (Right) Conceptual representation of the linear dispersion of the calibrated value due to the fully correlated reference values. The dataset used to test the proposed calibration method is given in Table 1 in Section 2.5.

Figure 2. (Left) Propagated reference variance corresponding to the calibrated value when there is no correlation, r_x = 0 (blue), and when the correlation is unity, r_x = 1 (red). (Right) Uncertainties propagated from the uncertainties of the references, u_x(x_0) (gray); the target measurement, u_t(x_0) (magenta); and the regression, u₂(x_0) (cyan). The combined standard uncertainty u(x_0) is represented by the green line. For detailed discussions of u_x(x_0) and u_t(x_0), see Sections 2.4 and 2.5.

Figure 4. A comparison of the variance values corresponding to the references, V_x(x_0) (gray); the target measurement, V_t(x_0) (magenta); and the regression, V₂(x_0) (cyan). Note that the number of target measurement repetitions p is 1. Calibrated values obtained through each method are indicated on top of the respective bars.

Table 2. Estimated regression coefficients for each dataset. Associated uncertainties are given in parentheses.