Roundoff Error Analysis of an Algorithm Based on Householder Bidiagonalization for Total Least Squares Problems

Abstract: For large-scale problems, establishing an algorithm with high accuracy and stability is particularly important. In this paper, the Householder bidiagonalization total least squares (HBITLS) algorithm and the nonlinear iterative partial least squares for total least squares (NIPALS-TLS) algorithm are established, and both are shown to compute the same approximate TLS solution. In addition, the propagation of roundoff error through the HBITLS algorithm is analyzed, and the mixed forward-backward stability of the two algorithms is proved. Furthermore, an upper bound on the roundoff error is derived, which gives a more detailed and clearer description of the accuracy of the computed solution.


Introduction
Consider estimating x from the overdetermined linear system Ax ≈ b, A ∈ R^(m×n), b ∈ R^m, (1) where errors exist both in the right-hand side b and in the data matrix A, and m ≥ n + 1.
In this case, the total least squares (TLS) model is the appropriate one to adopt (cf. [1,2]). The TLS approach finds a perturbation of minimum Frobenius norm that makes the system (1) compatible: min ‖(E, r)‖_F, subject to b + r ∈ Range(A + E). (2)
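The closed-form solution of (2) via the smallest right singular vector of the augmented matrix is the textbook construction (cf. [1,2]); the following is a minimal NumPy sketch of it, not the paper's large-scale algorithm, and the function name `tls_solve` is ours.

```python
import numpy as np

def tls_solve(A, b):
    """Classical TLS solution via the SVD of the augmented matrix (A, b).

    Under the genericity condition sigma_n(A) > sigma_{n+1}((A, b)), the TLS
    solution is read off the right singular vector z belonging to the smallest
    singular value of (A, b):  x = -z[:n] / z[n].
    """
    n = A.shape[1]
    C = np.column_stack([A, b])       # augmented matrix (A, b)
    _, _, Vt = np.linalg.svd(C)
    z = Vt[-1]                        # right singular vector for sigma_{n+1}
    if abs(z[n]) < np.finfo(float).eps:
        raise np.linalg.LinAlgError("nongeneric TLS problem: z[n] ~ 0")
    return -z[:n] / z[n]

# Tiny example: for a consistent system, TLS recovers the exact solution.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = np.array([2.0, -1.0])
b = A @ x_true
x = tls_solve(A, b)
```

For a consistent system the smallest singular value of (A, b) is zero and the ratio construction returns the exact solution; with noisy A and b it returns the TLS estimate instead.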
The TLS method is widely used in various scientific fields, such as physics, automatic control, signal processing, statistics, economics, biology, and medicine. In essence, the solution of a TLS problem can be expressed through a singular value decomposition of the augmented matrix (A, b). When the dimensions of A are not too large, one can use the truncated SVD (TSVD) method. When the dimensions of A become large, this approach becomes prohibitive because the SVD algorithm has complexity O(mn²). These considerations lead us to Krylov iterative methods, which do not alter the matrix A. Such methods share an attractive feature with the Lanczos methods: the computed extreme singular elements rapidly become good approximations to the exact ones, and are satisfactorily accurate even when the subspace dimension k is far smaller than n [1]. Moreover, the orthonormality properties of the Krylov basis strongly support the use of Householder matrix-based algorithms. This is particularly true when we must ensure that the perturbed problem being solved preserves certain spectral similarity properties, which is especially relevant when computing approximations of TLS problems.
In view of this, we apply the Householder bidiagonalization algorithm and the NIPALS PLS algorithm proposed by Å. Björck [3] to TLS problems, forming the Householder bidiagonalization total least squares (HBITLS) algorithm and the NIPALS-TLS algorithm, respectively. Furthermore, we find that the HBITLS and NIPALS-TLS algorithms compute the same approximate solutions of the TLS problem.
In practical computations, arithmetic is inexact and errors occur at every step of the calculation. Floating-point operations on a computer have finite precision, so rounding errors arise whenever numerical computations are performed, causing the computed quantities to differ from their theoretical values. One of the design principles of floating-point arithmetic is that it should encourage experts to develop robust, efficient, and portable numerical programs, enable the handling of arithmetic exceptions, and provide for the development of transcendental functions and high-precision arithmetic [4]. The roundoff error analysis of Lanczos-type methods by Paige [5][6][7] played an important role in interpreting the behavior of the Lanczos method in finite-precision computations. Parlett and Scott [8] used the results of roundoff error analysis as the basis for a modification of the Lanczos method, which they called selective orthogonalization [8][9][10]. In addition, in many practical problems, the stopping criterion can be safely selected on the basis of the rounding error analysis of the original problem, thereby diminishing the need for an extremely precise approximation of the algebraic problem solution [4]. As far as we know, a roundoff error analysis of the approximate TLS solutions obtained via the Householder bidiagonalization procedure has not been systematically performed in the literature. Hence, in this paper, we analyze the propagation of roundoff error through the HBITLS algorithm and show that the HBITLS and NIPALS-TLS algorithms are mixed forward-backward stable.
The paper is organized as follows. Section 2 establishes the HBITLS and NIPALS-TLS algorithms and shows that they compute the same approximate TLS solution. Section 3 analyzes the propagation of roundoff error through the HBITLS algorithm. A brief conclusion is given in the last section.

HBITLS Algorithm and NIPALS-TLS Algorithm
It is well known that algorithms based on a sequence of orthogonal transformations with Householder matrices have very good stability properties; see Higham [4]. Building on this, this section presents the HBITLS and NIPALS-TLS algorithms and shows that both compute the same approximate TLS solutions.

HBITLS Algorithm
Let us first describe the Householder bidiagonalization process as presented in [3]; here, however, it is applied to the augmented matrix (b, A) of the TLS problem. The idea is to compute orthogonal matrices U ∈ R^(m×(n+1)) and V ∈ R^((n+1)×(n+1)), where U = G_1 · · · G_n = [u_1, u_2, · · · , u_(n+1)] and V = H_1 · · · H_n = [v_1, v_2, · · · , v_(n+1)] are determined as products of Householder matrices, one per iteration. In general, G_k introduces zeros in the kth column, while H_k zeros out the appropriate entries in the kth row. This can be done by the standard Householder algorithm; for reasons of space we omit it here and refer to Algorithm 5.4.2 in [1] for details.
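A compact dense sketch of Householder bidiagonalization in the spirit of Algorithm 5.4.2 in [1] is given below; in HBITLS it would be applied to the augmented matrix (b, A). The function names are ours, and a production code would keep the reflectors in factored form rather than accumulate U and V explicitly.

```python
import numpy as np

def householder_vec(x):
    """Unit Householder vector v and beta with (I - beta v v^T) x = alpha e_1."""
    v = x.astype(float).copy()
    alpha = -np.linalg.norm(x) if x[0] == 0 else -np.sign(x[0]) * np.linalg.norm(x)
    v[0] -= alpha                      # v = x - alpha e_1 (cancellation-free sign)
    nv = np.linalg.norm(v)
    if nv == 0:
        return np.zeros_like(v), 0.0   # x is already zero: reflector is identity
    return v / nv, 2.0

def bidiagonalize(M):
    """Reduce M (m x p, m >= p) to upper bidiagonal B = U^T M V with Householder
    reflectors G_k (left) and H_k (right), accumulated into U and V."""
    m, p = M.shape
    B = M.astype(float).copy()
    U, V = np.eye(m), np.eye(p)
    for k in range(p):
        # left reflector G_k: annihilate B[k+1:, k]
        v, beta = householder_vec(B[k:, k])
        B[k:, :] -= beta * np.outer(v, v @ B[k:, :])
        U[:, k:] -= beta * np.outer(U[:, k:] @ v, v)
        if k < p - 2:
            # right reflector H_k: annihilate B[k, k+2:]
            v, beta = householder_vec(B[k, k + 1:])
            B[:, k + 1:] -= beta * np.outer(B[:, k + 1:] @ v, v)
            V[:, k + 1:] -= beta * np.outer(V[:, k + 1:] @ v, v)
    return U, B, V
```

The left reflectors zero each column below the diagonal and the right reflectors zero each row beyond the superdiagonal, so U^T M V comes out upper bidiagonal with U and V orthogonal to machine precision.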
From the above process of Householder bidiagonalization, V can be determined column by column. In detail, from (3) we obtain the recursions (4) and (5), in which C_k is the leading principal submatrix of order k + 1 of the final bidiagonal matrix C_n. If exact arithmetic is used, then U_(k+1)^T U_(k+1) = I and V_k^T V_k = I; in any case, these relations hold to within machine precision. Then (4) and (5) can be rewritten with C̄_k = (C_k, α_(k+1) e_k) ∈ R^(k×(k+1)).

After performing k steps of the Householder bidiagonalization iteration, the TLS problem can be reduced onto the subspace generated by U_(k+1) and V_k. The reduced TLS problem (see also [11]) can then be stated in either of two equivalent forms, where e_1 = (1, 0, · · · , 0)^T, and B̂_k and ê_k are generally full. As in LSQR, we seek an approximate TLS solution in a Krylov subspace, where K_k(B, y) denotes the Krylov subspace span{y, By, . . . , B^(k−1)y}. Let the SVD of (β_1 e_1, B_k) be W̃_k Σ̃_k Z̃_k^T; then the approximate TLS solution x_k is obtained from the last right singular vector Z̃_k e_(k+1). Note that only the last singular vector Z̃_k e_(k+1) is needed to compute x_k. Summarizing the above process, we obtain the Householder bidiagonalization TLS (HBITLS) algorithm as follows.

Remark 1. A variant of Algorithm 1 can also be given, in which the products of Householder transformations with vectors are replaced by operations that can, to a large extent, be performed concurrently. This variant gives an efficient method for exploiting parallelism when matrix-vector products are computed in parallel. For this variant of Algorithm 1, one can refer to [12]; we omit it here.
Algorithm 1 (HBITLS; steps 1-2 perform the Householder bidiagonalization described above). 3: Compute the last singular triplet of the matrix C_k by employing the implicit zero-shift QR algorithm. 4: Form the approximate TLS solution x_k from this singular vector.

Using the recursions (4) and (5), the following properties of {v_1, v_2, · · · , v_i} and {u_1, u_2, · · · , u_i} can be proved.
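In exact arithmetic, steps 3-4 amount to a small TLS solve on the projected bidiagonal problem. The following sketch uses NumPy's dense SVD where the paper uses the implicit zero-shift QR algorithm; the function name `projected_tls_step` is ours.

```python
import numpy as np

def projected_tls_step(beta1, Bk, Vk):
    """Approximate TLS solution from the k-step projected problem.

    As in the full-size case, the reduced TLS problem is solved via the
    smallest right singular vector of the augmented matrix (beta1*e1, Bk);
    NumPy's SVD stands in here for the implicit zero-shift QR algorithm.
    """
    kp1, k = Bk.shape
    e1 = np.zeros(kp1)
    e1[0] = beta1
    C = np.column_stack([e1, Bk])   # projected augmented matrix (beta1*e1, Bk)
    _, _, Zt = np.linalg.svd(C)
    z = Zt[-1]                      # last right singular vector
    yk = -z[1:] / z[0]              # projected TLS solution y_k
    return Vk @ yk                  # lift back: x_k = V_k y_k

# Consistent toy problem: beta1*e1 = Bk @ [2, 0], so the projected TLS
# solution is exactly [2, 0].
Bk = np.array([[2.0, 1.0], [0.0, 3.0], [0.0, 0.0]])
xk = projected_tls_step(4.0, Bk, np.eye(2))
```

Because only the last right singular vector is needed, a careful implementation would compute just that triplet rather than the full SVD.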

Lemma 1. The sets {v_1, v_2, · · · , v_i} and {u_1, u_2, · · · , u_i} generated by Algorithm 1 are orthonormal bases of K_i(A^T A, A^T b) and K_i(AA^T, AA^T b), respectively.

Proof. From the relation b = β_1 u_1 and the process of Householder bidiagonalization, for 1 ≤ k ≤ i it is easy to see that v_j = H_1 · · · H_k e_j for j = 1, · · · , k. The claim then follows from (4) and the corresponding relation for the u_j.

The Householder matrices H_i and G_i need not be formed explicitly. In other words, the matrices V_k and U_k can remain in product form in the HBITLS algorithm. In floating-point arithmetic, Householder transformations suffer very little loss of orthogonality.

The NIPALS-TLS Algorithm
For the NIPALS PLS algorithm, see [3,13]. In this paper, we apply it to TLS problems, forming the NIPALS-TLS algorithm. We find that the HBITLS and NIPALS-TLS algorithms generate the same orthonormal basis V_k. From the uniqueness of this basis, combined with the relationship between the two algorithms, we conclude that they produce the same numerical solution x_k.
Following [3], set A_0 = A and b_0 = b; for k = 1, 2, · · · , the sequences u_k and v_k are produced according to (14) and (15). In (16), A_k and b_k are formed by deflating A_(k−1) and b_(k−1), subtracting their orthogonal projections onto p_k. This operation uses elementary orthogonal transformations of the form S = I − pp^T with ‖p‖_2 = 1. The deflation in (16) can also be written as (17) and (18). The process is terminated when either stopping condition is met, at which point the rank of the matrix A_k is exactly one less than that of A_(k−1).
Using exact arithmetic, the sets {v_1, v_2, · · · , v_k} and {p_1, p_2, · · · , p_k} generated by (14) and (15) are the unique orthonormal bases of the Krylov subspaces K_k(A^T A, A^T b) and K_k(AA^T, AA^T b), respectively. Summing (17) and (18) generates relations in which P_k = [p_1, p_2, · · · , p_k], S_k = [s_1, s_2, · · · , s_k], and z_k = (ζ_1, ζ_2, · · · , ζ_k)^T. These relations hold to working accuracy and do not depend on orthogonality. The matrix P_k S_k^T is a rank-k approximation of the data matrix A. From [3], we have S_k^T = P_k^T A and S_k^T V_k = R_k; thus, in exact arithmetic, the matrix S_k^T V_k is upper bidiagonal. By Paige [14], R_k must be identical to the matrix that would be obtained from the conventional QR factorization of B_k. Hence the solution of the projected TLS problem (9) coincides with that computed by HBITLS, and the following theorem follows from (12) and (21).

Theorem 1. The HBITLS and NIPALS-TLS algorithms compute the same approximate solutions x_k.

Roundoff Error Analysis
In this section, we analyze the propagation of roundoff error through the HBITLS algorithm, from which the mixed forward-backward stability of the HBITLS and NIPALS-TLS algorithms follows naturally. The total roundoff error of the HBITLS algorithm can be divided into the following four parts. First, the HBITLS algorithm solves a perturbed version of the original TLS problem (2); the favorable roundoff propagation of Householder matrices in the HBITLS algorithm is an advantage in numerical computations.
From now on, we will denote by ε the machine precision under consideration. In [15], it is shown that the computed Householder matrix fl(H) is close to the exact Householder matrix H itself. Moreover, for a vector y, the computed update with fl(H) is very close to the exact update with H: fl(fl(H)y) = H(y + w), ‖w‖_2 ≤ 87ε‖y‖_2 + O(ε²), and, in general, fl(fl(H_1) · · · fl(H_j)y) = H_1 · · · H_j(y + z), ‖z‖_2 ≤ 87jε‖y‖_2 + O(ε²).
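This error model predicts that applying a product of j Householder reflectors in floating point loses orthogonality only on the order of jε. A quick numerical illustration (the dimensions and random reflectors are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, j = 200, 50

# Apply a product of j random Householder reflectors H = I - 2 v v^T to the
# identity, entirely in floating point, and measure the loss of orthogonality.
Q = np.eye(m)
for _ in range(j):
    v = rng.standard_normal(m)
    v /= np.linalg.norm(v)
    Q -= 2.0 * np.outer(v, v @ Q)    # left-multiply by H without forming it

# The model fl(fl(H_1)...fl(H_j) y) = H_1...H_j (y + z),
# ||z||_2 <= 87 j eps ||y||_2 + O(eps^2), predicts a loss of order j*eps.
loss = np.linalg.norm(Q.T @ Q - np.eye(m), 2)
```

In practice `loss` stays many orders of magnitude below what a naive worst-case accumulation over j·m operations would suggest, which is the stability property the HBITLS analysis relies on.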
The following lemma shows that the reduced system calculated by the HBITLS algorithm is equivalent to a system obtained by perturbing the original one. Here B̃_k, Q̃_k, and P̃_(k+1) denote the floating-point computed versions of the matrices B_k, V_k, and U_(k+1) in the HBITLS algorithm, respectively.

Lemma 2. Let B̃_k be the computed (k + 1) × k bidiagonal matrix obtained by the HBITLS algorithm. Then there exist a perturbation matrix E and two column-orthonormal matrices Q̃_k and P̃_(k+1) such that
where n is the number of columns of the matrix A. Furthermore, the matrix Q̃_k is an orthonormal basis of a perturbed Krylov subspace.

Proof. We prove this lemma by induction, following the computed counterpart of G_k^T · · · G_1^T [v_1, Av_1, · · · , Av_k] from (3). For k = 1, let u_1 = b/‖b‖_2 and ū_1 = fl(u_1), and let P_1 be a Householder matrix such that P_1 u_1 = e_1. Set P̃_1 = fl(P_1); by [16], corresponding to the matrix P̃_1 we can find an exact Householder matrix realizing the computed transformation. Similarly, the computed product Av_1 can be written as fl(Av_1) = (A + G_1)v_1, where ‖G_1‖_2 ≤ 3√n ε ‖A‖_2. Next, choose a Householder matrix P_2 such that P_2 P_1 [v_1, Av_1] is upper bidiagonal. Since P_2 acts only on the vector P_1 Av_1, the first column of fl(P_1[fl(v_1), fl(Av_1)]) is unchanged when multiplied by P̃_2. Likewise, there is in theory a Householder matrix P̄_2 such that P̄_2 fl(P_1[v_1, Av_1]) is bidiagonal, but in practice the algorithm computes a matrix P̃_2 [16].

For the kth step, assume the HBITLS algorithm has computed the matrices P̃_1, · · · , P̃_k associated with the Householder matrices P_1, · · · , P_k. After k steps there is a Householder matrix P_(k+1), acting only on the vector P_k · · · P_1 Av_k, such that P_(k+1) P_k · · · P_1 [v_1, Av_1, · · · , Av_k] is upper bidiagonal; in practice the algorithm computes a matrix P̃_(k+1), and the corresponding floating-point matrix is obtained. Let P̃^(k) and Q̃^(k) be the matrices such that (P̃^(k))^T = P̃_k · · · P̃_1 and (Q̃^(k))^T = Q̃_k · · · Q̃_1, respectively; the first (i − 1) rows of each Q̃_i equal e_1, · · · , e_(i−1). Let q_i^(j) denote the i-th column of Q̃^(j). The resulting computed matrix is an n × (k + 1) upper bidiagonal matrix.
Moreover, for all j ≤ k, let q_j be the j-th column of the matrix Q̃^(j) and denote y_j = G_j q_j; then the computed matrix can be written as (P̃^(k+1))^T (v_1, Aq_1, · · · , Aq_k) + (g_1, y_1, · · · , y_k).
Cutting off the first column of this matrix, we can define the n × k matrix B̄_k = (B̃_k^T, 0)^T, where B̃_k is a (k + 1) × k upper bidiagonal matrix. Denoting F_i = [y_1, · · · , y_i] for all i, we have B̄_k = (P̃^(k+1))^T A[q_1, · · · , q_k] + F_k. In addition, for all j ≤ k, letting Q̃_j be the matrix formed by the first j columns of Q̃^(k), we obtain P̃^(k+1) B̄_k = A Q̃^(k) + F_k and, from the structure of B̄_k, P̃_(k+1) B̃_k = A Q̃_k + F_k. We can then write F_k = F_n Q̃_n^T Q̃_k, owing to Q̃_n^T Q̃_k = (I_k, 0)^T, and so P̃_(k+1) B̃_k = (A + F_n Q̃_n^T) Q̃_k. Setting E = F_n Q̃_n^T finishes the proof of the first part of the lemma.

Finally, we prove that the columns of the matrix Q̃_k form an orthonormal basis of a Krylov space. Let Ã = A + E and b̃ = ‖A^T b‖_2 Q̃_k e_1, and form K_k(Ã^T Ã, Ã^T b̃). We know that b̃ = ‖A^T b‖_2 Q̃_k e_1 = ‖A^T b‖_2 Q̃_1 e_1; set e = b̃ − A^T b. We prove the rest of the lemma again by induction; that is, we show that each vector r_i has only its first (i + 1) components different from zero. For i = 1, we have Ã^T Ã b̃ proportional to Q̃_k B̃_k^T B̃_k e_1 = Q̃_k r_1, since the last component of the vector B̃_k^T B̃_k e_1 is zero and, except for the first two components, all components of the vector r_1 vanish.
Suppose the relation holds for a given i; we show it holds for i + 1. From the inductive hypothesis, only the first (i + 1) components of r_i are nonzero; therefore the vector B̃_k^T B̃_k r_i has (i + 2) nonzero leading components and a zero last component. The conclusion follows, and hence the lemma is proved.
Based on Lemma 2 and Algorithm 1, for the bidiagonal matrix B̃_k obtained by the HBITLS algorithm, one can find an orthonormal matrix Q̃_k such that
where Q̃_k is an orthonormal basis of K_k((A + E)^T(A + E), (A + E)^T(b + e)). Based on this, we know that the first part of HBITLS, in exact arithmetic, gives the exact basis of the perturbed Krylov space K_k((A + E)^T(A + E), (A + E)^T(b + e)). A perturbation bound for TLS solutions is given by Xie and Wei [17], see Lemma 3; it involves the smallest singular value σ_(n+1) of (A, b), the TLS solution x, and the residual r = b − Ax. Let  = A + ∆A and b̂ = b + ∆b. Then the unique solution of the perturbed TLS problem is denoted by x̂.
The perturbation bound is obtained under the genericity condition σ̃_n > σ_(n+1), where σ̃_n is the smallest singular value of A.

Lemma 3 ([17]). Consider the TLS problem (2) and assume that the genericity condition holds. If ‖(∆A, ∆b)‖_F is sufficiently small, then the perturbation bound of [17] on ‖x̂ − x‖_2 holds.

Suppose that x and x̂ are the exact TLS solutions of Ax ≈ b and (A + E)x ≈ b + e, respectively. The error introduced in this part of the HBITLS algorithm is the inherent error, so ‖x − x̂‖_2 can be bounded easily by Lemmas 2 and 3; see Theorem 2.
Secondly, let us consider the error between the TLS solution x̂ of the system (A + E)x ≈ b + e and the approximate solution x̂_k = V̂_k ŷ_k computed by the HBITLS algorithm at step k in exact arithmetic; that is, ŷ_k is the exact solution of the reduced TLS problem. For convenience, define V̂_k = diag(1, V_k), and let (b, A) = WΣZ^T be the SVD of (b, A), where W and Z = [z_1, z_2, · · · , z_(n+1)] are orthogonal matrices of dimension m and n + 1, respectively, and Σ is an m × (n + 1) diagonal matrix whose diagonal entries σ_i are the singular values of (b, A), sorted in non-increasing order. Let θ(u, v) denote the subspace angle between R(u) and R(v).
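For one-dimensional subspaces, the angle θ(u, v) used throughout this section reduces to a simple formula; a minimal helper (the function name is ours):

```python
import numpy as np

def sin_theta(u, v):
    """sin of the angle between the one-dimensional subspaces R(u) and R(v)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    c = abs(u @ v)                       # |cos theta|, sign-independent
    return np.sqrt(max(0.0, 1.0 - c * c))

# Example: two nearly parallel unit vectors have a tiny subspace angle.
u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 1e-6, 0.0])
s = sin_theta(u, v)                      # approximately 1e-6
```

The absolute value makes the measure independent of the sign ambiguity of singular vectors, which is exactly what is needed when comparing exact and computed smallest right singular vectors.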

and, therefore
Moreover, let σ_k1 ≥ σ_k2 ≥ · · · ≥ σ_kn be the singular values of B_k produced after k steps of the implicit zero-shift QR algorithm. If condition (30) holds and all perturbation angles θ satisfy sin²θ ≤ τ < 1, we obtain the bound (31). Demmel and Kahan [19] also give the relative differences between the singular vectors of B and those of B̃.
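The accuracy property exploited by the implicit zero-shift QR algorithm ([19]) is that entrywise relative perturbations of a bidiagonal matrix perturb every singular value, including the tiny ones, by a comparably small relative amount. The following numerical sketch illustrates this property only; it uses NumPy's general SVD, not the zero-shift algorithm itself, and the matrix and perturbation size are our choices.

```python
import numpy as np

n = 6
rng = np.random.default_rng(1)
# Random upper bidiagonal matrix with one small diagonal entry planted,
# which forces a small singular value.
B = np.diag(rng.uniform(0.5, 2.0, n)) + np.diag(rng.uniform(0.5, 2.0, n - 1), 1)
B[0, 0] = 1e-4

eta = 1e-8                                              # relative perturbation size
Bp = B * (1.0 + eta * rng.uniform(-1.0, 1.0, (n, n)))   # zero entries stay zero
s = np.linalg.svd(B, compute_uv=False)
sp = np.linalg.svd(Bp, compute_uv=False)

# Per [19], each singular value changes by a factor within (1 +- eta)^(2n-1),
# so the worst relative change is roughly (2n - 1) * eta here.
rel = np.max(np.abs(sp - s) / s)
```

By contrast, for a general (non-bidiagonal) matrix, a tiny singular value is only guaranteed absolute accuracy of order ε‖B‖, which can mean a total loss of relative accuracy.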
Then, combining with the perturbation bound of TLS given in [20] as shown in Lemma 7, we can give the error estimate ŷ k −ȳ k 2 .
Lemma 7 ([20]). If max(‖∆Ã‖, ‖∆Ǎ‖) + ‖∆A‖ < σ_k (the k-th singular value of A), then a bound of the stated form may be provided, where v and v̄ are the smallest right singular vectors of (A, b) and (Ā, b̄), respectively.
In summary, let x̃_k = fl(V̂_k ȳ_k) be the final computed solution at the k-th step. The roundoff error analysis of the HBITLS algorithm for the TLS problem can then be stated as follows.

Theorem 2. Consider the HBITLS algorithm at step k. The roundoff error that emerges during the algorithm can be bounded as follows: where ẑ and ẑ^(k) are defined analogously to (26), and ž is the computed smallest right singular vector of (β_1 e_1, B̃_k).
Proof. The roundoff error can be decomposed into the following parts, which we analyze separately.
For the first part, x and x̂ are the TLS solutions of the systems Ax ≈ b and (A + E)x ≈ b + e, respectively, in line with Lemma 2, so the error of this part is the inherent error. Combining with Lemma 3, we obtain a bound in terms of κ_A and κ_b (see Lemma 3). For the second part, the error is due to the approximate solution of (A + E)x ≈ b + e obtained by k steps of the HBITLS algorithm in exact arithmetic. Lemma 4 tells us that ‖x̂ − x̂_k‖_2 ≤ sin θ(ẑ, ẑ^(k)) √(1 + ‖x̂‖_2²) √(1 + ‖x̂_k‖_2²).
For the third part, noticing that x̂_k = V̂_k ŷ_k, we have ‖x̂_k − V̂_k ȳ_k‖ = ‖ŷ_k − ȳ_k‖, where ŷ_k − ȳ_k is the roundoff error stemming from the projected TLS solution. Since B_k is a bidiagonal matrix of special form, we use the implicit zero-shift QR algorithm to perform its singular value decomposition. The bound (31) controls the angle between the solution vectors; combining it with Lemma 7, we obtain a bound in which θ(ẑ^(k), ž) is the subspace angle between the subspaces spanned by the exact and computed smallest right singular vectors of (β_1 e_1, B̃_k), respectively.
For the last part, we know that V_k = H_1 H_2 · · · H_k (I_k, 0)^T, where H_1 H_2 · · · H_k is a product of k Householder matrices, so the desired bound follows from (22).

By Theorem 2, we obtain the mixed forward-backward stability of the HBITLS and NIPALS-TLS algorithms naturally. The backward stability generates perturbations that only marginally influence the theoretical convergence of the residual to zero.

Remark 2.
The bound introduced in Theorem 2 shows that the total roundoff error is dominated by the approximation error x − x̂_k. Consequently, in many practical problems, the stopping criterion required by the algorithm can be safely selected based on the theoretical properties of the original problem, thereby reducing the cost of pursuing an excessively accurate approximate solution.

Conclusions
For large-scale problems, giving an algorithm with good accuracy and stability is particularly important. In this paper, the Householder bidiagonalization total least squares (HBITLS) algorithm and the nonlinear iterative partial least squares for total least squares (NIPALS-TLS) algorithm are given. HBITLS uses Householder bidiagonalization to reduce (b, A) to upper bidiagonal form and then runs the implicit zero-shift QR algorithm to compute the smallest right singular vector of the reduced matrix, from which the approximate solution is formed. NIPALS-TLS is based on rank-reducing orthogonal projections. The two algorithms compute the same approximate TLS solutions. By analyzing the propagation of roundoff error through the HBITLS algorithm, we find that the HBITLS and NIPALS-TLS algorithms are mixed forward-backward stable. In addition, in many practical problems, the stopping criterion can be safely selected on the basis of the rounding error analysis of the original problem. The derived upper bound on the roundoff error gives a more detailed and clearer description of the computed solution.
Author Contributions: Conceptualization, Z.Y. and X.L.; methodology, Z.Y. and X.L.; formal analysis, Z.Y.; data curation, X.L.; writing-original draft preparation, X.L.; writing-review and editing, Z.Y. and X.L. All authors have read and agreed to the published version of the manuscript.