Numerical Algorithms for Computing an Arbitrary Singular Value of a Tensor Sum

Abstract: We consider computing an arbitrary singular value of a tensor sum T := I_n ⊗ I_m ⊗ A + I_n ⊗ B ⊗ I_ℓ + C ⊗ I_m ⊗ I_ℓ ∈ R^{ℓmn×ℓmn}, where A ∈ R^{ℓ×ℓ}, B ∈ R^{m×m}, C ∈ R^{n×n}. We focus on the shift-and-invert Lanczos method, which solves the shift-and-invert eigenvalue problem of (T^T T − σ̃² I_{ℓmn})^{−1}, where the shift σ̃ is set close to the desired singular value. The desired singular value is then recovered from the maximum eigenvalue of this eigenvalue problem. The shift-and-invert Lanczos method must solve large-scale linear systems with the coefficient matrix T^T T − σ̃² I_{ℓmn}. We apply the preconditioned conjugate gradient (PCG) method, since direct methods are impractical due to the nonzero structure of the coefficient matrix. However, a naive implementation of the shift-and-invert Lanczos and PCG methods is infeasible in terms of memory, since the size of T grows rapidly with the sizes of A, B, and C. In this paper, we present the following two techniques: (1) efficient implementations of the shift-and-invert Lanczos method for the eigenvalue problem of T^T T and of the PCG method for T^T T − σ̃² I_{ℓmn} using three-dimensional arrays (third-order tensors) and the n-mode product, and (2) preconditioning matrices for the PCG method based on the eigendecomposition and the Schur decomposition of T. Finally, we show the effectiveness of the proposed methods through numerical experiments.


Introduction
We consider computing an arbitrary singular value of a tensor sum

T := I_n ⊗ I_m ⊗ A + I_n ⊗ B ⊗ I_ℓ + C ⊗ I_m ⊗ I_ℓ ∈ R^{ℓmn×ℓmn}, (1)

where A ∈ R^{ℓ×ℓ}, B ∈ R^{m×m}, C ∈ R^{n×n}, I_n is the n × n identity matrix, and the symbol "⊗" denotes the Kronecker product. The tensor sum T arises from a finite difference discretization of a three-dimensional constant-coefficient partial differential equation (PDE):

(−a · (∇ ∗ ∇) + b · ∇ + c) u(x, y, z) = g(x, y, z) in Ω, u(x, y, z) = 0 on ∂Ω, (2)

where Ω = (0, 1) × (0, 1) × (0, 1), a, b ∈ R³, c ∈ R, and the symbol "∗" denotes the elementwise product. If a = (1, 1, 1), then a · (∇ ∗ ∇) = ∆. The matrix T tends to be very large even when A, B, and C are not. Hence it is difficult, in terms of memory requirements, to compute singular values of T.
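To make the structure of Equation (1) concrete, the following sketch (NumPy; illustrative only, since the paper's experiments use MATLAB) forms the tensor sum explicitly from small A, B, and C. This is only feasible for tiny ℓ, m, n; the point of the paper is precisely to avoid forming T.

```python
import numpy as np

def tensor_sum(A, B, C):
    """T = I_n (x) I_m (x) A + I_n (x) B (x) I_l + C (x) I_m (x) I_l."""
    l, m, n = A.shape[0], B.shape[0], C.shape[0]
    Il, Im, In = np.eye(l), np.eye(m), np.eye(n)
    return (np.kron(In, np.kron(Im, A))
            + np.kron(In, np.kron(B, Il))
            + np.kron(C, np.kron(Im, Il)))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((5, 5))
T = tensor_sum(A, B, C)
print(T.shape)  # (60, 60): the size l*m*n grows rapidly with l, m, n
```

Even for ℓ = m = n = 100, T would be 10⁶ × 10⁶, which motivates the tensor-space reformulation below.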
The vec operator stacks a tensor X ∈ R^{ℓ×m×n} into a vector of length ℓmn, where X_{ijk} denotes the (i, j, k) element of X. Secondly, the n-mode product of a tensor X ∈ R^{I_1×I_2×···×I_N} and a matrix M ∈ R^{J×I_n} is defined as

(X ×_n M)_{i_1 ··· i_{n−1} j i_{n+1} ··· i_N} := Σ_{i_n=1}^{I_n} X_{i_1 i_2 ··· i_N} M_{j i_n}.

Conversely, the vec^{−1} operator reshapes a long vector back into a tensor.
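The vec, vec^{−1}, and n-mode product operators are tied together by the standard identity (M₃ ⊗ M₂ ⊗ M₁) vec(X) = vec(X ×₁ M₁ ×₂ M₂ ×₃ M₃) for column-major vec. A minimal NumPy check (illustrative; function names are my own):

```python
import numpy as np

def vec(X):                   # tensor -> long vector, first index fastest
    return X.reshape(-1, order='F')

def unvec(x, shape):          # vec^{-1}: long vector -> tensor
    return x.reshape(shape, order='F')

def mode_product(X, M, mode):  # n-mode product X x_mode M for 3rd-order X
    axes = {1: 'ijk,ai->ajk', 2: 'ijk,aj->iak', 3: 'ijk,ak->ija'}
    return np.einsum(axes[mode], X, M)

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 3, 4))
M1 = rng.standard_normal((2, 2))
M2 = rng.standard_normal((3, 3))
M3 = rng.standard_normal((4, 4))

lhs = np.kron(M3, np.kron(M2, M1)) @ vec(X)
rhs = vec(mode_product(mode_product(mode_product(X, M1, 1), M2, 2), M3, 3))
assert np.allclose(lhs, rhs)   # Kronecker product vs. mode products agree
```

This identity is what lets the algorithms below replace matvecs with the huge Kronecker-structured matrices by cheap mode products with the small factors.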

Shift-and-Invert Lanczos Method for an Arbitrary Singular Value over Tensor Space
This section gives an algorithm for computing an arbitrary singular value of the tensor sum T. Let σ and x be a desired singular value of T and the corresponding right singular vector, respectively. Then the eigenvalue problem for T is written as T^T T x = σ² x. Introducing a shift σ̃ ≈ σ, the shift-and-invert eigenvalue problem becomes

(T^T T − σ̃² I_{ℓmn})^{−1} x = (σ² − σ̃²)^{−1} x. (3)

The shift-and-invert Lanczos method (see, e.g., [3]) computes the singular value σ nearest the shift based on Equation (3). Reconstructing this method over the ℓ × m × n tensor space, we obtain Algorithm 1, whose memory requirement is O(n³) when n = m = ℓ.
At step k, Algorithm 1 yields the tridiagonal matrix T̃_k. To implement Algorithm 1, we must iteratively solve a linear system whose coefficient matrix is ℓmn × ℓmn; a naive implementation therefore requires O(n⁶) memory when n = m = ℓ. The convergence rate of the shift-and-invert Lanczos method depends on the ratio of gaps between the maximum, second maximum, and minimum singular values σ_1, σ_2, σ_{ℓmn} of (T^T T − σ̃² I_{ℓmn})^{−1}, namely (σ_1² − σ_2²)/(σ_1² − σ_{ℓmn}²). In the next section, we consider solving the linear systems with O(n³) memory when n = m = ℓ.
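The following dense, small-scale sketch (NumPy; illustrative, not the paper's matrix-free implementation) shows the shift-and-invert Lanczos idea: the eigenvalue θ of largest magnitude of (T^T T − σ̃² I)^{−1} equals 1/(σ² − σ̃²) for the singular value σ nearest the shift, so σ = sqrt(σ̃² + 1/θ). The inner solve here is a direct solve; the paper replaces it with PCG.

```python
import numpy as np

def si_lanczos_sigma(T, shift, iters=30, seed=0):
    """Shift-and-invert Lanczos sketch: singular value of T nearest `shift`."""
    n = T.shape[1]
    K = T.T @ T - shift**2 * np.eye(n)
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q, alphas, betas = [q], [], []
    for k in range(iters):
        w = np.linalg.solve(K, Q[-1])       # inner solve (PCG in the paper)
        alphas.append(Q[-1] @ w)
        w -= alphas[-1] * Q[-1]
        if k > 0:
            w -= betas[-1] * Q[-2]
        Qm = np.array(Q)
        w -= Qm.T @ (Qm @ w)                # full reorthogonalization for robustness
        b = np.linalg.norm(w)
        if b < 1e-12:
            break
        betas.append(b)
        Q.append(w / b)
    Tk = (np.diag(alphas)
          + np.diag(betas[:len(alphas) - 1], 1)
          + np.diag(betas[:len(alphas) - 1], -1))
    evals = np.linalg.eigvalsh(Tk)
    theta = evals[np.argmax(np.abs(evals))]  # Ritz value of largest magnitude
    return np.sqrt(shift**2 + 1.0 / theta)

T = np.diag(np.arange(1.0, 31.0))           # known singular values 1, 2, ..., 30
print(si_lanczos_sigma(T, 14.9))            # close to 15.0, the nearest singular value
```

A diagonal test matrix is used so that the recovered singular value can be checked exactly.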

Preconditioned Conjugate Gradient (PCG) Method over Tensor Space
This section provides an efficient solver for Equation (4) using tensors. The linear system is rewritten as v = (T^T T − σ̃² I_{ℓmn})^{−1} q_k, where v := vec(V) and q_k := vec(Q_k).
We then solve (T^T T − σ̃² I_{ℓmn}) v = q_k, where v is the unknown vector and q_k is known.
Since the coefficient matrix is symmetric positive definite, we can use the preconditioned conjugate gradient (PCG) method (see, e.g., [14]), one of the most widely used solvers. However, it is difficult to apply the method directly because of the complex nonzero structure of the coefficient matrix T^T T − σ̃² I_{ℓmn}. To apply the PCG method, we transform the linear system (T^T T − σ̃² I_{ℓmn}) v = q_k by the eigendecomposition and by the complex Schur decomposition, as shown in the next subsections.
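For reference, a generic PCG iteration with a diagonal preconditioner looks as follows (NumPy sketch; in the paper the matvec A @ p is realized matrix-free through mode products, while here A is an explicit SPD matrix for clarity):

```python
import numpy as np

def pcg(A, b, inv_diag_M, tol=1e-10, maxit=1000):
    """PCG for SPD A; inv_diag_M holds 1/diag(M) for a diagonal preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = inv_diag_M * r               # apply M^{-1}: elementwise, since M is diagonal
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = inv_diag_M * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

rng = np.random.default_rng(2)
n = 200
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)          # SPD test matrix, modest condition number
b = rng.standard_normal(n)
x, iters = pcg(A, b, 1.0 / np.diag(A))
assert np.linalg.norm(A @ x - b) <= 1e-8 * np.linalg.norm(b)
```

The only preconditioner-dependent step is the elementwise application of M^{−1}, which is why the diagonal preconditioners proposed below are cheap.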

PCG Method by the Eigendecomposition
First, T is decomposed as T = XDX^{−1}, where the columns of X are eigenvectors of T and D is the diagonal matrix of eigenvalues. It then follows that

X^{−H} (D̄ X^H X D − σ̃² X^H X) X^{−1} v = q_k,

where D̄ is the complex conjugate of D. We rewrite this linear system as Ã ỹ = b̃, where Ã := D̄ X^H X D − σ̃² X^H X, ỹ := X^{−1} v, and b̃ := X^H q_k. Here, X is easily formed from the small matrices X_A, X_B, and X_C, whose columns are eigenvectors of A, B, and C, as X = X_C ⊗ X_B ⊗ X_A. Moreover, the eigenvalues of T in D are obtained as sums of the eigenvalues of A, B, and C.
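This structure is easy to verify numerically (NumPy sketch): X = X_C ⊗ X_B ⊗ X_A diagonalizes the tensor sum, and each eigenvalue of T is a sum λ_i^{(A)} + λ_j^{(B)} + λ_k^{(C)}.

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.standard_normal((s, s)) for s in (3, 4, 5))
dA, XA = np.linalg.eig(A)
dB, XB = np.linalg.eig(B)
dC, XC = np.linalg.eig(C)

Il, Im, In = np.eye(3), np.eye(4), np.eye(5)
T = (np.kron(In, np.kron(Im, A))
     + np.kron(In, np.kron(B, Il))
     + np.kron(C, np.kron(Im, Il)))

# Eigenvectors and eigenvalues of T assembled from the small factors.
X = np.kron(XC, np.kron(XB, XA))
D = (np.kron(In, np.kron(Im, np.diag(dA)))
     + np.kron(In, np.kron(np.diag(dB), Il))
     + np.kron(np.diag(dC), np.kron(Im, Il)))
assert np.allclose(T @ X, X @ D)     # T X = X D, i.e. T = X D X^{-1}
```

Only the small eigendecompositions of A, B, and C are ever computed; X and D themselves never need to be formed in the actual algorithm.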
The PCG method for solving Ã ỹ = b̃ is shown in Algorithm 2. Since this algorithm computes ỹ, we then recover v = X ỹ. Section 4.1.1 proposes a preconditioning matrix, and Section 4.1.2 provides efficient computations using tensors.
Preconditioning Matrix

A preconditioning matrix M should (1) approximate the coefficient matrix Ã well and (2) be easy to invert. We propose the diagonal preconditioning matrix

M := D̄D − σ̃² I_{ℓmn}. (5)

Since M is diagonal, the second condition is satisfied. Moreover, if T is symmetric, then X is unitary, that is, X^H X = I_{ℓmn}, and we obtain M = Ã; the first condition is thus satisfied exactly when T is symmetric. Hence, even if T is not exactly symmetric, we can expect the preconditioning matrix M to be effective when T is nearly symmetric.

Efficient Implementation of Algorithm 2 by the Eigendecomposition
As in the derivation of Algorithm 1, to obtain an efficient implementation of Algorithm 2 we reshape ℓmn-dimensional vectors into ℓ × m × n tensors via the vec^{−1} operator: X_k := vec^{−1}(x_k), R_k := vec^{−1}(r_k), P_k := vec^{−1}(p_k), Z_k := vec^{−1}(z_k), and P̃_k := vec^{−1}(p̃_k). Most vector computations carry over directly to tensor computations by the linearity of the vec^{−1} operator.
In the rest of this section, we show how to compute vec^{−1}(Ã vec(P_k)) and vec^{−1}(M^{−1} vec(R_k)), which are required in the PCG method, using the 1-, 2-, and 3-mode products for tensors and the definition of T.

First, from the definitions of Ã and X,

vec^{−1}(Ã vec(P_k)) = vec^{−1}(D̄ X^H X D vec(P_k)) − σ̃² vec^{−1}(X^H X vec(P_k))

holds. Let 𝒟 := vec^{−1}(diag(D)), where diag(D) returns the ℓmn-dimensional column vector of the diagonal of D. Then 𝒟_{ijk} := λ_i^{(A)} + λ_j^{(B)} + λ_k^{(C)}, where λ_i^{(A)}, λ_j^{(B)}, and λ_k^{(C)} denote the eigenvalues of A, B, and C, respectively. Note that (vec^{−1}(D vec(p_k)))_{ijk} = 𝒟_{ijk} (P_k)_{ijk}, since (D p_k)_i = D_{ii} (p_k)_i for i = 1, 2, …, ℓmn. Using the relation between the Kronecker product and the mode products via the vec^{−1} operator, we compute vec^{−1}(Ã vec(P_k)) with elementwise products ("∗") and mode products with the small factor matrices only.

Next, from the definition of the diagonal matrix M in Equation (5), the application of M^{−1} is an elementwise division by the diagonal of M.

As shown in Algorithm 3, the PCG method can be implemented using the preconditioning matrix M and the computations above, where vec(Ỹ) := ỹ for the linear system Ã ỹ = b̃. Algorithm 3 requires only the small matrices A, B, and C and the ℓ × m × n tensors X_k, R_k, P_k, and Z_k. Therefore, the memory requirement is O(n³) when n = m = ℓ.
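The tensorized application of Ã can be checked against the explicit matrix on a tiny example (NumPy sketch; `mode_product` is an illustrative helper, not the paper's code). With G_A := X_A^H X_A etc., the mixed-product rule gives X^H X = G_C ⊗ G_B ⊗ G_A, so Ã vec(P) is computed entirely with elementwise and mode products:

```python
import numpy as np

def mode_product(X, M, mode):
    axes = {1: 'ijk,ai->ajk', 2: 'ijk,aj->iak', 3: 'ijk,ak->ija'}
    return np.einsum(axes[mode], X, M)

rng = np.random.default_rng(4)
A, B, C = (rng.standard_normal((s, s)) for s in (3, 4, 5))
dA, XA = np.linalg.eig(A)
dB, XB = np.linalg.eig(B)
dC, XC = np.linalg.eig(C)
GA, GB, GC = XA.conj().T @ XA, XB.conj().T @ XB, XC.conj().T @ XC
shift2 = 0.5 ** 2

# Explicit reference: Atilde = conj(D) X^H X D - shift^2 X^H X.
D = (np.kron(np.eye(5), np.kron(np.eye(4), np.diag(dA)))
     + np.kron(np.eye(5), np.kron(np.diag(dB), np.eye(3)))
     + np.kron(np.diag(dC), np.kron(np.eye(4), np.eye(3))))
G = np.kron(GC, np.kron(GB, GA))
Atilde = D.conj() @ G @ D - shift2 * G

# Tensorized version: elementwise products with the eigenvalue tensor D_{ijk}
# plus mode products with the small Gram matrices G_A, G_B, G_C.
P = rng.standard_normal((3, 4, 5)) + 0j
Dt = dA[:, None, None] + dB[None, :, None] + dC[None, None, :]
W = mode_product(mode_product(mode_product(Dt * P, GA, 1), GB, 2), GC, 3)
Gp = mode_product(mode_product(mode_product(P, GA, 1), GB, 2), GC, 3)
tensor_result = Dt.conj() * W - shift2 * Gp
assert np.allclose(tensor_result.reshape(-1, order='F'),
                   Atilde @ P.reshape(-1, order='F'))
```

The tensor route touches only ℓ×ℓ, m×m, n×n matrices and ℓ×m×n tensors, which is the source of the O(n³) memory bound.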

PCG Method by the Schur Decomposition
First, the Schur decomposition of T is T = QRQ^H, where R is upper triangular and Q is unitary. The linear system then becomes Ã ỹ = b̃, where Ã := R^H R − σ̃² I_{ℓmn}, ỹ := Q^H v, and b̃ := Q^H q_k. The PCG method for Ã ỹ = b̃ is shown in Algorithm 2. R and Q are obtained from the complex Schur decompositions of A, B, and C as Q = Q_C ⊗ Q_B ⊗ Q_A and R = I_n ⊗ I_m ⊗ R_A + I_n ⊗ R_B ⊗ I_ℓ + R_C ⊗ I_m ⊗ I_ℓ.

Algorithm 3: PCG method over tensor space for the 5th line of Algorithm 1 [proposed inner algorithm using the eigendecomposition].
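The Schur structure of the tensor sum can likewise be verified on a tiny example (NumPy/SciPy sketch; illustrative): assembling Q and R from the complex Schur factors of A, B, and C yields T = Q R Q^H with R upper triangular.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(5)
A, B, C = (rng.standard_normal((s, s)) for s in (3, 4, 5))
RA, QA = schur(A, output='complex')   # A = QA RA QA^H
RB, QB = schur(B, output='complex')
RC, QC = schur(C, output='complex')

Il, Im, In = np.eye(3), np.eye(4), np.eye(5)
T = (np.kron(In, np.kron(Im, A))
     + np.kron(In, np.kron(B, Il))
     + np.kron(C, np.kron(Im, Il)))

Q = np.kron(QC, np.kron(QB, QA))      # unitary
R = (np.kron(In, np.kron(Im, RA))     # sum of Kronecker products of triangulars
     + np.kron(In, np.kron(RB, Il))
     + np.kron(RC, np.kron(Im, Il)))
assert np.allclose(T, Q @ R @ Q.conj().T)
assert np.allclose(R, np.triu(R))     # R is upper triangular
```

As with the eigendecomposition, only the small Schur factors are computed in practice; Q and R are applied through mode products.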

Preconditioning Matrix
A preconditioning matrix for Ã ỹ = b̃ should satisfy the conditions in Section 4.1.1. We therefore propose the preconditioning matrix based on the Schur decomposition

M := D̄_R D_R − σ̃² I_{ℓmn},

where D_R is the diagonal matrix with the diagonal of R. Since M is again diagonal, the second condition is satisfied. Moreover, if T is symmetric, R is diagonal, that is, R = D_R, and hence M = Ã for symmetric T. From this, we expect the preconditioning matrix M to be effective when T is not symmetric but nearly symmetric.

Efficient Implementation of Algorithm 2 by the Schur Decomposition
We show how to compute vec^{−1}(Ã vec(P_k)) and vec^{−1}(M^{−1} vec(R_k)) for the PCG method over tensor space, using the 1-, 2-, and 3-mode products and the definition of T. First, from the definitions of Ã and R, we have

vec^{−1}(Ã vec(P_k)) = vec^{−1}(R^H (R vec(P_k))) − σ̃² P_k.

Next, from M = D̄_R D_R − σ̃² I_{ℓmn}, the application of M^{−1} is an elementwise division by the diagonal of M, similarly to Section 4.1.2.
As shown in Algorithm 4, the PCG method can be implemented using the preconditioning matrix M and the computations above, where vec(Ỹ) := ỹ for the linear system Ã ỹ = b̃. Algorithm 4 requires only the small matrices A, B, and C and the ℓ × m × n tensors X_k, R_k, P_k, and Z_k; that is, it does not require the large matrix T. Therefore, the memory requirement is O(n³) when n = m = ℓ.
Algorithm 4: PCG method over tensor space for the 5th line of Algorithm 1 [proposed inner algorithm using the Schur decomposition].

Numerical Experiments
This section provides the results of numerical experiments using Algorithm 1 with Algorithm 3 and Algorithm 1 with Algorithm 4. The experiments have two purposes: (1) to confirm that Algorithm 1 converges to the singular value of T nearest the shift, and (2) to confirm the effectiveness of the proposed preconditioning matrix in Algorithms 3 and 4. For comparison, results using Algorithms 3 and 4 with M = I are also given as results of the (unpreconditioned) CG method. All initial guesses in Algorithms 1, 3, and 4 are tensors with random entries. The stopping criterion for Algorithm 1 was β_k |e_k^T s_k^{max}| < 10^{−8}, where s_k^{max} is the eigenvector corresponding to the maximum eigenvalue of T̃_k and e_k denotes the k-th canonical basis vector of the k-dimensional vector space. Algorithms 3 and 4 were stopped when either the relative residual satisfied ‖R_k‖/‖B‖ < 10^{−12} or the number of iterations exceeded 20,000.
All computations were carried out in MATLAB R2021a on a workstation with a 3.7 GHz Xeon processor and 128 GB of RAM.
In the following subsection, we show results for computing the 5-th maximum, median, and 5-th minimum singular values σ of the test matrices T. In all cases, for the first purpose, the shift in Algorithm 1 is set to the target singular value σ perturbed by 10^{−2} (the shift (8)), where the σ's are the singular values computed by the svd function in MATLAB. The test matrices T in Equation (1), whose size is n³ × n³, are obtained from a seven-point central difference discretization of the PDE (2) over an (n + 1) × (n + 1) × (n + 1) grid, with h = 1/(n + 1) and with the symmetric and skew-symmetric matrices M_1 and M_2 given below.

Numerical Results
In all tables, we summarize the number of iterations of the shift-and-invert Lanczos method ("the Lanczos method" hereafter) and the average number of iterations of the CG or PCG method based on the eigendecomposition or the Schur decomposition. "Not converged" means that Algorithm 3 or 4 did not converge.
We first show the results for the almost symmetric matrix with a = c = 1 and b = 0.01 in Equation (9) and the shift (8). From Tables 1-3, the numbers of iterations of the Lanczos method with either inner algorithm were almost the same. Regarding the effectiveness of the proposed preconditioning matrix M, both PCG methods required fewer than 19 iterations regardless of the size of T, whereas the iteration counts of both CG methods grew linearly with the size of T. Hence the preconditioning matrix M is effective for an almost symmetric matrix T. Moreover, the Lanczos method needs more iterations for the median singular value than for the other singular values, since the gap between the maximum and second maximum singular values of (T^T T − σ̃² I_{ℓmn})^{−1} is smaller for the median singular value of T than in the other cases.
The running times for Table 1 are summarized in Table 4. All running times of the PCG methods were shorter than those of the CG methods. Moreover, the running times of the PCG methods of Algorithms 3 and 4 were similar, since the computational complexities of these algorithms are similar. Thus, the running time correlates strongly with the number of iterations of Algorithms 3 and 4.
In addition, the convergence histories for n = 15 in Tables 1 and 2 are shown in Figures 1 and 2. Figure 2 shows that the relative residual norms decrease unsteadily when the number of iterations of the shift-and-invert Lanczos method is not small.

Table 4. Running time (seconds) of the Lanczos method using the CG/PCG method in the case of the 5-th max. singular value of the almost symmetric matrix T with a = c = 1 and b = 0.01 in (9) for the shift (8).

Next, we show the results for the slightly symmetric matrix with a = c = 1 and b = 0.1 in Equation (9) and the shift (8). From Table 5, both PCG methods failed to converge when computing the 5-th maximum singular value of the slightly symmetric matrix T. The linear system for T^T T − σ̃² I_{ℓmn} appears to be ill-conditioned, since the perturbation 10^{−2} in the shift (8) is much smaller than the 5-th maximum singular value of the matrix. In Appendix A, we show results using a relative shift that removes the effect of the magnitude of the singular values. Table 6 shows that Algorithms 1 and 4, that is, the algorithms based on the Schur decomposition, were more robust than Algorithms 1 and 3, that is, the algorithms based on the eigendecomposition. From Table 7, both PCG methods converged regardless of n, and both required fewer iterations than the CG methods. Hence the preconditioning matrix M can be effective for a slightly symmetric matrix T when computing the 5-th minimum and median singular values of T.

Table 6. Number of iterations of the Lanczos method and the average number of iterations of the CG/PCG method in the case of the median singular value of the slightly symmetric matrix T with a = c = 1 and b = 0.1 in (9) for the shift (8).

Finally, we show the results for the marginally symmetric matrix with a = c = 1 and b = 0.2 in Equation (9) and the shift (8).
Both PCG methods failed to converge when computing the 5-th maximum singular value of T, as shown in Table 8, similarly to Table 5. Moreover, computing the median singular value of T sometimes failed to converge (Table 9). In Table 10, all methods converged for the 5-th minimum singular value of T, and the PCG method with the proposed preconditioning matrix required fewer iterations than the CG method in most cases. Hence the preconditioning matrix M can be effective for the marginally symmetric matrix T when computing the 5-th minimum singular value of T.

Table 8. Number of iterations of the Lanczos method and the average number of iterations of the CG/PCG method in the case of the 5-th max. singular value of the marginally symmetric matrix T with a = c = 1 and b = 0.2 in (9) for the shift (8).

Conclusions
We considered computing an arbitrary singular value of a tensor sum. The shift-and-invert Lanczos method and the PCG method were reconstructed over tensor space. We proposed two diagonal preconditioning matrices: (1) one whose diagonal consists of the eigenvalues obtained by the eigendecomposition, and (2) one whose diagonal is that of the upper triangular matrix of the Schur decomposition. These preconditioning matrices can be effective when the tensor sum is nearly symmetric.
From the numerical results, we confirmed that the proposed methods reduce the memory requirements unconditionally. The numbers of iterations of the PCG method with the proposed preconditioning matrix were reduced in most cases for the almost and slightly symmetric matrices, and in certain cases for the marginally symmetric matrix.
For future work, we will consider a robust preconditioning matrix for slightly or marginally symmetric tensor sum, a suitable preconditioning matrix for non-symmetric tensor sum, parallel implementations of the proposed algorithms, and finding real-life applications.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

PCG  Preconditioned Conjugate Gradient
CG   Conjugate Gradient
PDE  Partial Differential Equation

Appendix A
This appendix gives the numerical results for the 5-th maximum and the median singular values of the slightly and marginally symmetric matrices using the relative shift (A1), where the σ's are the singular values of T computed by the svd function in MATLAB. Except for the choice of the shift, the experimental setting is the same as in Section 5. First, Tables A1 and A2 show the results for the slightly symmetric matrix with a = c = 1 and b = 0.1 in Equation (9) and the shift (A1). When computing the 5-th maximum and the median singular values of the slightly symmetric matrix with the shift (A1), both PCG methods required far fewer iterations than both CG methods.
Second, Tables A3 and A4 show the results for the marginally symmetric matrix with a = c = 1 and b = 0.2 in Equation (9) and the shift (A1). From Tables A3 and A4, both PCG methods converged faster than both CG methods with the relative shift. Moreover, the PCG method of Algorithm 4 is more robust than that of Algorithm 3.
Consequently, for the 5-th maximum and the median singular values of the slightly symmetric matrix, the numerical experiments of Section 5 and Appendix A imply that the proposed preconditioning matrix works when a suitable shift is used.