Article

Scalability of k-Tridiagonal Matrix Singular Value Decomposition

by Andrei Tănăsescu 1, Mihai Carabaş 1, Florin Pop 1,2,* and Pantelimon George Popescu 1

1 Computer Science and Engineering Department, Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, Splaiul Independentei 313, 060042 Bucharest, Romania
2 National Institute for Research & Development in Informatics—ICI, 011455 Bucharest, Romania
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(23), 3123; https://doi.org/10.3390/math9233123
Submission received: 12 October 2021 / Revised: 27 November 2021 / Accepted: 30 November 2021 / Published: 3 December 2021
(This article belongs to the Special Issue Models and Algorithms in Cybersecurity)

Abstract: Singular value decomposition has recently seen a great theoretical improvement for k-tridiagonal matrices, obtaining a considerable speed-up over all previous implementations, but at the cost of leaving the singular values unordered. We provide here a refinement of this method, proving that reordering the singular values does not affect performance. We complement our refinement with a scalability study on a real physical cluster setup, offering surprising results. Thus, this method provides a major step up over standard industry implementations.

1. Introduction

The singular value decomposition [1] of a matrix $M \in \mathbb{R}^{m \times n}$ is $M = U S V^T$, such that $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $S \in \mathbb{R}^{m \times n}$ is a rectangular diagonal matrix containing the singular values of $M$, i.e., the square roots of the eigenvalues of $M^T M$. One of the most important applications of the singular value decomposition is principal component analysis [2,3], which exploits the Eckart–Young theorem [1], stating that the closest rank-$k$ approximation of $M$ under the Frobenius norm is $\hat{M} = \hat{U} \hat{S} \hat{V}^T$, where $\hat{S} \in \mathbb{R}^{k \times k}$ holds the largest $k$ singular values of $M$ and $\hat{U}, \hat{V}$ are their corresponding singular vectors. If the diagonal entries of $S$ are sorted in descending order, then $\hat{S}$ consists of the first $k$ rows and columns of $S$, whereas $\hat{U}$ and $\hat{V}$ consist of the first $k$ columns of $U$ and $V$, respectively. The matrix $\hat{T} = \hat{U} \hat{S}$ gives the principal $k$ components of $M$ and is core to dimensionality reduction [2].
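For instance, the truncated SVD behind PCA can be expressed in a few lines of NumPy; the following minimal sketch is ours, for illustration only, with an arbitrary data matrix M and target rank k:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((100, 20))  # example data matrix
k = 5                               # target rank

# numpy returns the singular values already sorted in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # closest rank-k approximation
T_hat = U[:, :k] @ np.diag(s[:k])              # principal k components of M

# Eckart-Young: the Frobenius error equals the norm of the discarded values.
assert np.isclose(np.linalg.norm(M - M_hat, "fro"), np.linalg.norm(s[k:]))
```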
Some of the results obtained are also relevant to ODE and PDE models [4]. For example, the matrices involved in PDEs are usually band matrices, such as the double-banded matrices encountered in the Lamé equation, where, in particular, k-tridiagonal matrices also arise. Consequently, the tests that we ran as part of this paper are relevant in the context of the Lamé equation [5]. Furthermore, fractional differential equations have also become popular in recent years [6,7].
Singular value decomposition is a very important tool that is core to the development of new technologies, being used, for example, in soft sensors [8], as well as for the estimation of 5G channel parameters [9]. In fact, SVD shows its value as a computationally efficient dimensionality reduction method when confronted with large amounts of data [10], which are often sharded, a situation that has prompted innovations such as privacy-preserving SVD [11].
An important challenge when working with the SVD is that most known exact algorithms [12,13,14,15,16] have complexity $O(m^2 n + m n^2 + n^3)$, which has led the literature towards approximation methods [17]. The classical methods of numerically computing the SVD all involve bringing the matrix $M$ to a bidiagonal form and using an iterative method to find its eigenvalues. Further refinements, such as implicit bidiagonalization, e.g., the IRLANB [18], IRRLANB [19] and AIRLB [20] algorithms, are the state of the art for sparse matrices. This shows that at the core of a general SVD solver lies a solver for a very particular matrix class: the bidiagonal matrices.
A recent direction in numerical computation research pertains to k-tridiagonal matrices [21,22,23,24,25,26,27,28,29], for which important algorithms, such as block diagonalization [21], matrix inversion [22,23,26] and singular value decomposition [30], are improved by several orders of magnitude. A k-tridiagonal matrix [22] $T \in \mathbb{R}^{n \times n}$ is a matrix whose elements lie only on its main diagonal and its $k$th upper and lower diagonals, i.e., there are some $d \in \mathbb{R}^n$ and $a, b \in \mathbb{R}^{n-k}$ such that
$$T = \begin{pmatrix}
d_1 & 0 & \cdots & 0 & a_1 & 0 & \cdots & 0 \\
0 & d_2 & \ddots & & 0 & a_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & & & \ddots & \ddots & 0 \\
0 & & & \ddots & & & 0 & a_{n-k} \\
b_1 & 0 & & & \ddots & & & 0 \\
0 & b_2 & \ddots & & & \ddots & & \vdots \\
\vdots & \ddots & \ddots & 0 & & \ddots & d_{n-1} & 0 \\
0 & \cdots & 0 & b_{n-k} & 0 & \cdots & 0 & d_n
\end{pmatrix}.$$
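For concreteness, such a matrix can be assembled from its three diagonals in a few lines of NumPy; this is a sketch of ours (not the authors' code), and `k_tridiagonal` is a hypothetical helper name:

```python
import numpy as np

def k_tridiagonal(d, a, b, k):
    """Assemble the k-tridiagonal matrix with main diagonal d,
    k-th upper diagonal a and k-th lower diagonal b."""
    n = len(d)
    assert len(a) == len(b) == n - k
    return np.diag(d) + np.diag(a, k) + np.diag(b, -k)

# The 10 x 10, k = 4 matrix used in the first example of Section 4:
T = k_tridiagonal([1] + [2] * 9, [1] * 6, [1] * 6, k=4)
```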
The key insight in this research direction is that, if pre- and post-processing are used to improve data locality, then the resulting algorithm is much faster, especially since the workload is highly parallelizable.
While a recent paper [30] studied complexity, in this paper we make a thorough experimental study of the applicability and benefit of the SVD algorithm for k-tridiagonal matrices on a grid system and investigate its true scalability. Moreover, we improve on the algorithm of [30] by also sorting the singular values of $S$, and confirm that this additional post-processing does not alter the scaling potential. The previously published results presented the new optimized algorithm for the SVD of k-tridiagonal matrices, whereas the current paper focuses on parallelization and performance evaluation in terms of scalability.

2. Background

In this section, we briefly review some properties of k-tridiagonal matrices that allow for their efficient processing. An important property of k-tridiagonal matrices is that the connected components of the underlying graph of $T$ are the equivalence classes modulo $k$ [21], i.e.,
$$\bar{r}_{n,k} = \left\{ i \in \{1, \dots, n\} \mid i \equiv r \ (\mathrm{mod}\ k) \right\}.$$
Then, if we let $P_{\bar{r}_{n,k}} \in \mathbb{R}^{n \times |\bar{r}_{n,k}|}$ be such that the $i$th column of $P_{\bar{r}_{n,k}}$ is the canonical vector $e_{r + k(i-1)}$, then the permutation matrix
$$P_{\sigma_{n,k}} = \begin{pmatrix} P_{\bar{1}_{n,k}} & P_{\bar{2}_{n,k}} & \cdots & P_{\bar{k}_{n,k}} \end{pmatrix}$$
pivots $T$ into its block diagonal form [21],
$$T = P_{\sigma_{n,k}} \begin{pmatrix} T_1 & & & \\ & T_2 & & \\ & & \ddots & \\ & & & T_k \end{pmatrix} P_{\sigma_{n,k}}^T,$$
where its blocks $T_i \in \mathbb{R}^{|\bar{i}_{n,k}| \times |\bar{i}_{n,k}|}$ are tridiagonal matrices.
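A short NumPy sketch of this block diagonalization (our illustration; note that it uses 0-based indexing, whereas the text above is 1-based):

```python
import numpy as np
from scipy.linalg import block_diag

def block_diagonalize(T, k):
    """Split T into its tridiagonal blocks: block i collects the
    row/column indices congruent to i modulo k (0-based here)."""
    n = T.shape[0]
    classes = [np.arange(i, n, k) for i in range(k)]  # equivalence classes mod k
    sigma = np.concatenate(classes)                   # the permutation sigma_{n,k}
    blocks = [T[np.ix_(c, c)] for c in classes]
    return sigma, classes, blocks

# The 10 x 10, k = 4 matrix of Section 4's first example:
n, k = 10, 4
T = np.diag([1] + [2] * 9) + np.diag([1] * 6, k) + np.diag([1] * 6, -k)
sigma, classes, blocks = block_diagonalize(T, k)

# Pivoting T by sigma yields exactly the block diagonal form.
assert np.array_equal(T[np.ix_(sigma, sigma)], block_diag(*blocks))
```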
Thus, if the singular value decomposition of each block is $T_i = U_i S_i V_i^T$, then [30]
$$T = P_{\sigma_{n,k}} \left( \bigoplus_{i=1}^k U_i S_i V_i^T \right) P_{\sigma_{n,k}}^T = \underbrace{P_{\sigma_{n,k}} \bigoplus_{i=1}^k U_i}_{U} \, \underbrace{\bigoplus_{i=1}^k S_i}_{S} \, \underbrace{\left( P_{\sigma_{n,k}} \bigoplus_{i=1}^k V_i \right)^T}_{V^T}, \tag{1}$$
where $S$ contains the singular values of $T$, and $U$ and $V$ the singular vectors of $T$; however, the diagonal matrix $S$ is not necessarily sorted, as is conventional, which may hinder certain applications, such as the truncated SVD used for PCA.
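Continuing the sketch above, the (unsorted) SVD of $T$ can be assembled from the per-block SVDs; again, this is our illustration, reusing n, T, classes and blocks from the previous snippet:

```python
# Continues the previous sketch: n, T, classes and blocks as computed there.
U, S, V = np.zeros((n, n)), np.zeros(n), np.zeros((n, n))
offset = 0
for c, Ti in zip(classes, blocks):
    Ui, si, Vti = np.linalg.svd(Ti)
    cols = np.arange(offset, offset + len(c))
    U[np.ix_(c, cols)] = Ui    # scatter the block rows via its class indices
    V[np.ix_(c, cols)] = Vti.T
    S[cols] = si               # sorted within each block, not globally
    offset += len(c)

assert np.allclose(T, U @ (S[:, None] * V.T))  # T = U S V^T, S unsorted
```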

3. Parallel Singular Value Decomposition for k-Tridiagonal Matrices

In this section, we make a modification to the SVD algorithm of [30] to obtain the sorted list of singular values.
For any permutation matrix $P_\tau$, using Equation (1), we obtain
$$T = \underbrace{P_{\sigma_{n,k}} \left( \bigoplus_{i=1}^k U_i \right) P_\tau^T}_{U_\tau} \, \underbrace{P_\tau \left( \bigoplus_{i=1}^k S_i \right) P_\tau^T}_{S_\tau} \, \underbrace{\left( P_{\sigma_{n,k}} \left( \bigoplus_{i=1}^k V_i \right) P_\tau^T \right)^T}_{V_\tau^T},$$
and if $\tau$ sorts $S$ descendingly, this is the conventional SVD.
Notably, determining $\tau$ can be achieved by a standard sorting algorithm in $O(n \log n)$ time. However, there are more interesting options, such as using an additional thread for a job scheduler based upon a $k$-element heap, in time $O(n \log k)$, or having a hierarchical job structure (a parallel merge sort), which also takes time $O(n \log k)$. The main benefit of a heap is that it can be used as a scheduler and prioritize the running threads, which can be used to halt the producers in the case of a truncated SVD.
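A minimal sketch of the heap-based merge (our illustration): each block contributes its singular values in descending order, and a $k$-element heap emits (value, block, position) triples in globally sorted order, from which the target columns $c^i$ follow:

```python
import heapq

def merge_descending(block_values):
    """k-way merge of descending lists in O(n log k) using a k-element
    heap (keys are negated, since heapq is a min-heap)."""
    heap = [(-v[0], i, 0) for i, v in enumerate(block_values) if len(v)]
    heapq.heapify(heap)
    while heap:
        neg, i, j = heapq.heappop(heap)
        yield -neg, i, j
        if j + 1 < len(block_values[i]):
            heapq.heappush(heap, (-block_values[i][j + 1], i, j + 1))

# Block singular values from the first example of Section 4 (blocks 1..4):
s_blocks = [[3.247, 1.555, 0.198], [3.414, 2.000, 0.586],
            [3.000, 1.000], [3.000, 1.000]]
c = {}
for rank, (_, i, j) in enumerate(merge_descending(s_blocks), start=1):
    c.setdefault(i, []).append(rank)
# c == {1: [1, 5, 9], 0: [2, 6, 10], 2: [3, 7], 3: [4, 8]}, i.e.,
# c^1 = (2, 6, 10) and c^2 = (1, 5, 9) in the paper's 1-based notation.
```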
We have thus proven the following result:
Theorem 1.
Let the performance of a black box conventional SVD algorithm be $T(n)$ for $n \times n$ matrices. Then, our algorithm allows for computing the conventional SVD of a k-tridiagonal $n \times n$ matrix with complexity $\min(k/t, k) \cdot T(n/k) + O(n^2/k^2) + O(n \log k)$, where $t$ is the maximum number of concurrent threads.
Proof. 
The first term comes from the result of [30], whereas the second is due to the complexity of merging k vectors. □
We now illustrate Theorem 1 through an explicit algorithm.
Notice that, for the truncation of all singular values lower than $\epsilon$, Algorithm 1 can be used almost as-is. It is sufficient to report $S^i_{j,j} = -1$ for truncated singular values and to redefine
$$c^i = \left( j \mid v_j = (s, i, \cdot),\ s > 0 \right).$$
Algorithm 1 Parallel conventional SVD for k-tridiagonal matrices

1: procedure KTriConvSVD(n, k, $d$, $a$, $b$)
2:    Share the memory $d$, $a$, $b$
3:    Start threads KTriConvSVDBlock(n, k, i) for $1 \le i \le k$
4:    Wait for all $S^i$ to arrive
5:    Compute the augmented vectors $v^i_j = (S^i_{j,j}, i, j)$ for all $i, j$
6:    Merge (in a stable manner) the $v^i$ w.r.t. lexicographic order into $v$
7:    Compute $c^i = \left( j \mid v_j = (\cdot, i, \cdot) \right)$ (in a stable manner)
8:    Post the $c^i$ to their respective worker threads
9: end procedure
10: procedure KTriConvSVDBlock(n, k, i)
11:    Obtain the width of $T_i$, $w_i = 1 + \lfloor (n - i)/k \rfloor$
12:    Obtain the selector vector $s^i_j = (j - 1)k + i$ for $j \in \overline{1, w_i}$
13:    Obtain $T_i$'s diagonals $d^i_j = d_{s^i_j}$; $a^i_j = a_{s^i_j}$; $b^i_j = b_{s^i_j}$
14:    Perform the SVD of $T_i = U^i S^i (V^i)^T$
15:    Post $S^i$ to the scheduler
16:    Await from the scheduler the target columns $c^i$
17:    Copy $S^i_{x,x}$ to $S_{c^i_x, c^i_x}$
18:    Copy $U^i_{y,x}$ to $U_{s^i_y, c^i_x}$ (obtaining $P_{\bar{i}_{n,k}} U^i P_\tau^T$)
19:    Copy $V^i_{y,x}$ to $V_{s^i_y, c^i_x}$ (obtaining $P_{\bar{i}_{n,k}} V^i P_\tau^T$)
20: end procedure
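For reference, below is a compact sequential sketch of Algorithm 1 in Python (ours, for illustration only; the measured implementation is multi-threaded C with OpenMP, and each loop iteration here corresponds to one worker thread):

```python
import heapq
import numpy as np

def k_tri_conv_svd(n, k, d, a, b):
    """Sorted (conventional) SVD of the k-tridiagonal matrix with
    diagonals d, a, b; 0-based sequential sketch of Algorithm 1."""
    U, S, V = np.zeros((n, n)), np.zeros(n), np.zeros((n, n))
    svds, sel = [], []
    for i in range(k):                     # steps 11-15, one per worker
        s_i = np.arange(i, n, k)           # selector vector of block i
        Ti = (np.diag(d[s_i]) + np.diag(a[s_i[:-1]], 1)
              + np.diag(b[s_i[:-1]], -1))  # tridiagonal block T_i
        svds.append(np.linalg.svd(Ti))
        sel.append(s_i)
    # Scheduler: k-way merge of the descending singular values (steps 4-8).
    heap = [(-svds[i][1][0], i, 0) for i in range(k)]
    heapq.heapify(heap)
    col = 0
    while heap:                            # steps 16-19, as columns arrive
        _, i, j = heapq.heappop(heap)
        Ui, si, Vti = svds[i]
        S[col] = si[j]                     # target column of (i, j) is col
        U[sel[i], col] = Ui[:, j]
        V[sel[i], col] = Vti[j, :]
        col += 1
        if j + 1 < len(si):
            heapq.heappush(heap, (-si[j + 1], i, j + 1))
    return U, S, V
```

For instance, `k_tri_conv_svd(10, 4, np.array([1.] + [2.] * 9), np.ones(6), np.ones(6))` reproduces the sorted decomposition of the first example in the next section.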
Regarding the numerical stability of the proposed method, outside of the calls to the underlying black box algorithm applied to the blocks, our algorithm does not involve floating-point operations. Thus, numerical stability is a function of just the black box SVD solver used for the blocks.

4. Numerical Examples

In this section, we exemplify our algorithm on a matrix similar to those considered by other authors [22,30], and on another one that highlights the fact that our algorithm also handles non-symmetric matrices.
Firstly, we consider the matrix
$$T = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 2
\end{pmatrix},$$
or, equivalently,
$$d = (1, 2, 2, 2, 2, 2, 2, 2, 2, 2); \quad a = b = (1, 1, 1, 1, 1, 1),$$
so that $n = 10$ and $k = 4$.
We then obtain the block-diagonal form of T ,
$$T_1 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}; \quad T_2 = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}; \quad T_3 = T_4 = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$
Now, we perform the SVD on each block, as follows
$$T_1 \approx \begin{pmatrix} 0.328 & 0.591 & 0.737 \\ 0.737 & 0.328 & -0.591 \\ 0.591 & -0.737 & 0.328 \end{pmatrix} \begin{pmatrix} 3.247 & 0 & 0 \\ 0 & 1.555 & 0 \\ 0 & 0 & 0.198 \end{pmatrix} \begin{pmatrix} 0.328 & 0.591 & 0.737 \\ 0.737 & 0.328 & -0.591 \\ 0.591 & -0.737 & 0.328 \end{pmatrix}^T,$$
$$T_2 \approx \begin{pmatrix} 0.500 & 0.707 & 0.500 \\ 0.707 & 0.000 & -0.707 \\ 0.500 & -0.707 & 0.500 \end{pmatrix} \begin{pmatrix} 3.414 & 0 & 0 \\ 0 & 2.000 & 0 \\ 0 & 0 & 0.586 \end{pmatrix} \begin{pmatrix} 0.500 & 0.707 & 0.500 \\ 0.707 & 0.000 & -0.707 \\ 0.500 & -0.707 & 0.500 \end{pmatrix}^T,$$
$$T_3 = T_4 \approx \begin{pmatrix} 0.707 & 0.707 \\ 0.707 & -0.707 \end{pmatrix} \begin{pmatrix} 3.000 & 0 \\ 0 & 1.000 \end{pmatrix} \begin{pmatrix} 0.707 & 0.707 \\ 0.707 & -0.707 \end{pmatrix}^T.$$
The augmented vectors v i are
$$v^1 \approx \begin{pmatrix} 3.247 & 1.555 & 0.198 \\ 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}^T, \quad v^2 \approx \begin{pmatrix} 3.414 & 2.000 & 0.586 \\ 2 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}^T, \quad v^3 \approx \begin{pmatrix} 3.000 & 1.000 \\ 3 & 3 \\ 1 & 2 \end{pmatrix}^T, \quad v^4 \approx \begin{pmatrix} 3.000 & 1.000 \\ 4 & 4 \\ 1 & 2 \end{pmatrix}^T,$$
and thus the merged vector v is
$$v \approx \begin{pmatrix} 3.414 & 3.247 & 3.000 & 3.000 & 2.000 & 1.555 & 1.000 & 1.000 & 0.586 & 0.198 \\ 2 & 1 & 3 & 4 & 2 & 1 & 3 & 4 & 2 & 1 \\ 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & 3 & 3 \end{pmatrix}^T,$$
from which, we obtain the column selectors c i ,
$$c^1 = \begin{pmatrix} 2 & 6 & 10 \end{pmatrix}^T, \quad c^2 = \begin{pmatrix} 1 & 5 & 9 \end{pmatrix}^T, \quad c^3 = \begin{pmatrix} 3 & 7 \end{pmatrix}^T, \quad c^4 = \begin{pmatrix} 4 & 8 \end{pmatrix}^T,$$
while the original row selectors s i were,
$$s^1 = \begin{pmatrix} 1 & 5 & 9 \end{pmatrix}^T, \quad s^2 = \begin{pmatrix} 2 & 6 & 10 \end{pmatrix}^T, \quad s^3 = \begin{pmatrix} 3 & 7 \end{pmatrix}^T, \quad s^4 = \begin{pmatrix} 4 & 8 \end{pmatrix}^T,$$
hence, we obtain the decomposition
$$U = V = \begin{pmatrix}
0 & 0.328 & 0 & 0 & 0 & 0.591 & 0 & 0 & 0 & 0.737 \\
0.500 & 0 & 0 & 0 & 0.707 & 0 & 0 & 0 & 0.500 & 0 \\
0 & 0 & 0.707 & 0 & 0 & 0 & 0.707 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.707 & 0 & 0 & 0 & 0.707 & 0 & 0 \\
0 & 0.737 & 0 & 0 & 0 & 0.328 & 0 & 0 & 0 & -0.591 \\
0.707 & 0 & 0 & 0 & 0.000 & 0 & 0 & 0 & -0.707 & 0 \\
0 & 0 & 0.707 & 0 & 0 & 0 & -0.707 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.707 & 0 & 0 & 0 & -0.707 & 0 & 0 \\
0 & 0.591 & 0 & 0 & 0 & -0.737 & 0 & 0 & 0 & 0.328 \\
0.500 & 0 & 0 & 0 & -0.707 & 0 & 0 & 0 & 0.500 & 0
\end{pmatrix},$$
$$S = \mathrm{diag}\begin{pmatrix} 3.414 & 3.247 & 3.000 & 3.000 & 2.000 & 1.555 & 1.000 & 1.000 & 0.586 & 0.198 \end{pmatrix}.$$
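This worked example is easy to check numerically; the following small NumPy snippet (ours, for illustration) confirms the sorted singular values:

```python
import numpy as np

n, k = 10, 4
T = (np.diag([1.0] + [2.0] * 9) + np.diag([1.0] * 6, k)
     + np.diag([1.0] * 6, -k))

s = np.linalg.svd(T, compute_uv=False)  # LAPACK reference, sorted descendingly
print(np.round(s, 3))
# expected: 3.414 3.247 3.000 3.000 2.000 1.555 1.000 1.000 0.586 0.198
```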
Furthermore, we consider the matrix
$$T = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 \\
0 & -1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 \\
0 & 0 & -1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 2
\end{pmatrix},$$
or, equivalently,
$$d = (1, 2, 2, 2, 2, 2, 2, 2, 2, 2); \quad a = (1, 1, 1, 1, 1, 1); \quad b = -a = (-1, -1, -1, -1, -1, -1).$$
We then obtain the block-diagonal form of T ,
$$T_1 = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 2 & 1 \\ 0 & -1 & 2 \end{pmatrix}; \quad T_2 = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 2 & 1 \\ 0 & -1 & 2 \end{pmatrix}; \quad T_3 = T_4 = \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}.$$
Now, we perform the SVD on each block, as follows
$$T_1 \approx \begin{pmatrix} 0.268 & 0.208 & 0.940 \\ 0.940 & -0.268 & -0.208 \\ -0.208 & -0.940 & 0.268 \end{pmatrix} \begin{pmatrix} 2.507 & 0 & 0 \\ 0 & 2.285 & 0 \\ 0 & 0 & 1.221 \end{pmatrix} \begin{pmatrix} -0.268 & 0.208 & 0.940 \\ 0.940 & 0.268 & 0.208 \\ 0.208 & -0.940 & 0.268 \end{pmatrix}^T,$$
$$T_2 \approx \begin{pmatrix} 0.408 & 0.577 & 0.707 \\ 0.816 & -0.577 & 0.000 \\ -0.408 & -0.577 & 0.707 \end{pmatrix} \begin{pmatrix} 2.449 & 0 & 0 \\ 0 & 2.449 & 0 \\ 0 & 0 & 2.000 \end{pmatrix} \begin{pmatrix} 0.000 & 0.707 & 0.707 \\ 1.000 & 0.000 & 0.000 \\ 0.000 & -0.707 & 0.707 \end{pmatrix}^T,$$
$$T_3 = T_4 \approx \begin{pmatrix} 0.894 & 0.447 \\ -0.447 & 0.894 \end{pmatrix} \begin{pmatrix} 2.236 & 0 \\ 0 & 2.236 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}^T.$$
The augmented vectors v i are
$$v^1 \approx \begin{pmatrix} 2.507 & 2.285 & 1.221 \\ 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}^T, \quad v^2 \approx \begin{pmatrix} 2.449 & 2.449 & 2.000 \\ 2 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}^T, \quad v^3 \approx \begin{pmatrix} 2.236 & 2.236 \\ 3 & 3 \\ 1 & 2 \end{pmatrix}^T, \quad v^4 \approx \begin{pmatrix} 2.236 & 2.236 \\ 4 & 4 \\ 1 & 2 \end{pmatrix}^T,$$
and thus, the merged vector v is
$$v \approx \begin{pmatrix} 2.507 & 2.449 & 2.449 & 2.285 & 2.236 & 2.236 & 2.236 & 2.236 & 2.000 & 1.221 \\ 1 & 2 & 2 & 1 & 3 & 3 & 4 & 4 & 2 & 1 \\ 1 & 1 & 2 & 2 & 1 & 2 & 1 & 2 & 3 & 3 \end{pmatrix}^T,$$
from which, we obtain the column selectors c i ,
$$c^1 = \begin{pmatrix} 1 & 4 & 10 \end{pmatrix}^T, \quad c^2 = \begin{pmatrix} 2 & 3 & 9 \end{pmatrix}^T, \quad c^3 = \begin{pmatrix} 5 & 6 \end{pmatrix}^T, \quad c^4 = \begin{pmatrix} 7 & 8 \end{pmatrix}^T,$$
while the original row selectors s i were
$$s^1 = \begin{pmatrix} 1 & 5 & 9 \end{pmatrix}^T, \quad s^2 = \begin{pmatrix} 2 & 6 & 10 \end{pmatrix}^T, \quad s^3 = \begin{pmatrix} 3 & 7 \end{pmatrix}^T, \quad s^4 = \begin{pmatrix} 4 & 8 \end{pmatrix}^T,$$
hence, we obtain the decomposition
$$U = \begin{pmatrix}
0.268 & 0 & 0 & 0.208 & 0 & 0 & 0 & 0 & 0 & 0.940 \\
0 & 0.408 & 0.577 & 0 & 0 & 0 & 0 & 0 & 0.707 & 0 \\
0 & 0 & 0 & 0 & 0.894 & 0.447 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.894 & 0.447 & 0 & 0 \\
0.940 & 0 & 0 & -0.268 & 0 & 0 & 0 & 0 & 0 & -0.208 \\
0 & 0.816 & -0.577 & 0 & 0 & 0 & 0 & 0 & 0.000 & 0 \\
0 & 0 & 0 & 0 & -0.447 & 0.894 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -0.447 & 0.894 & 0 & 0 \\
-0.208 & 0 & 0 & -0.940 & 0 & 0 & 0 & 0 & 0 & 0.268 \\
0 & -0.408 & -0.577 & 0 & 0 & 0 & 0 & 0 & 0.707 & 0
\end{pmatrix},$$
$$S = \mathrm{diag}\begin{pmatrix} 2.507 & 2.449 & 2.449 & 2.285 & 2.236 & 2.236 & 2.236 & 2.236 & 2.000 & 1.221 \end{pmatrix},$$
$$V = \begin{pmatrix}
-0.268 & 0 & 0 & 0.208 & 0 & 0 & 0 & 0 & 0 & 0.940 \\
0 & 0.000 & 0.707 & 0 & 0 & 0 & 0 & 0 & 0.707 & 0 \\
0 & 0 & 0 & 0 & 1.000 & 0.000 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1.000 & 0.000 & 0 & 0 \\
0.940 & 0 & 0 & 0.268 & 0 & 0 & 0 & 0 & 0 & 0.208 \\
0 & 1.000 & 0.000 & 0 & 0 & 0 & 0 & 0 & 0.000 & 0 \\
0 & 0 & 0 & 0 & 0.000 & 1.000 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.000 & 1.000 & 0 & 0 \\
0.208 & 0 & 0 & -0.940 & 0 & 0 & 0 & 0 & 0 & 0.268 \\
0 & 0.000 & -0.707 & 0 & 0 & 0 & 0 & 0 & 0.707 & 0
\end{pmatrix}.$$

5. Evaluation

In this section, we present a comparison between Algorithm 1 and the industry standard for full SVD implementations, the Lapack library. We used the Lapack DGESVD routine, the standard method, which computes the SVD of a real $n \times m$ matrix, optionally computing the left and/or right singular vectors. Our code was developed to obtain real results on many random matrix inputs, and to show that Lapack indeed does not take advantage of the $n/k$ block structure. The experimental data serve to validate Theorem 1 and, given the small RSD, can clearly be extrapolated.
Regarding the dataset, any bidiagonal or tridiagonal matrix dataset could be used, with any black box solver, such as the one presented in [31]. However, such matrices are only 1-tridiagonal; thus, our algorithm would simply call the black box solver once. To our knowledge, there is no standard k-tridiagonal matrix dataset, which is why we generated random matrices.
In order to validate and compare the scalability of our k-tridiagonal matrix singular value decomposition, we built an experimental infrastructure using dedicated computing servers. The compute node used is a dual-socket one, with a CPU architecture based on the XEON E5-2640 v3 running at 2.60 GHz; it has 16 cores and 32 hyper-threads (hyper-threading enabled). The amount of memory available is 128 GB of RAM, which is relevant for the size of the matrix that can be loaded into memory. Being a shared cluster, we used shared NFS storage connected through 10 Gbps Ethernet; the NFS backend is formed of 900 GB SAS disks in RAID6 (relevant for the time needed to read the input matrix). The memory and storage descriptions are given here for practical purposes, but were not taken into account in our measurements and comparisons: in our experimental setup, we measured the time needed for computing the k-tridiagonal matrix singular value decomposition, without reading and writing the inputs and outputs. Regarding the software base, the compute node was running CentOS 7, with the latest updates at the time of writing this paper (CentOS 7.7), and a hand-built gcc 5.4.0 compiler (to take full advantage of the architecture). The parallelization was performed using OpenMP, i.e., multi-threading.
Please note that the 24-core experiments use the hyper-threads, which are not as efficient as fully fledged cores, but we wanted to see how the method scales in this setting as well.
We first investigate the reliability of our measurements. We evaluate our method on 10,000 × 10,000 k-tridiagonal matrices with entries of uniformly distributed integers in the interval [0, 100], for various values of k (10, 50, 100, 200, 500, 1000, 5000), using multiple setups (1, 2, 4, 8, 16, 24 CPUs). We perform 1000 such experiments and plot the mean and standard deviation of our proposed code in Figure 1. The variability of this method can be quantified by the relative standard deviation (RSD), which varies between 1.05% and 9.95% for the measurements we have performed.
Finally, we evaluate whether industry standard software, such as Lapack, can detect and use the sparsity of our data. We consider 10,000 × 10,000 matrices with entries of uniformly distributed integers in the interval [0, 100], and multiple values of k (10, 50, 100, 200, 500, 1000, 5000). We plot in Figure 2 the mean runtime of Lapack, the parallel code of [30] and our proposed code, where the parallel codes are run on 1, 2, 4, 8, 16, 24 CPUs. From the graph, we can see that sorting the singular values adds almost no cost (<5 ms), and that both methods produce much better results than Lapack. Thus, considering its low variability, our method provides a great improvement over Lapack, not only on average, but even in the worst cases.
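The experiment can be reproduced in spirit with a small, self-contained Python sketch (ours; the actual measurements used our C/OpenMP code, and the sizes here are reduced so the script runs quickly):

```python
import time
import numpy as np

rng = np.random.default_rng(42)
n, k = 2000, 50  # reduced sizes; the paper uses n = 10,000

# Random k-tridiagonal matrix with integer entries in [0, 100].
d = rng.integers(0, 101, n).astype(float)
a = rng.integers(0, 101, n - k).astype(float)
b = rng.integers(0, 101, n - k).astype(float)
T = np.diag(d) + np.diag(a, k) + np.diag(b, -k)

t0 = time.perf_counter()
np.linalg.svd(T)                        # dense LAPACK SVD, blind to structure
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
for i in range(k):
    idx = np.arange(i, n, k)            # equivalence class i modulo k
    np.linalg.svd(T[np.ix_(idx, idx)])  # SVD of the tridiagonal block T_i
t_blocks = time.perf_counter() - t0

print(f"full: {t_full:.3f}s  block-wise: {t_blocks:.3f}s")
```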
While many algorithms have been developed for general banded matrices, the k-tridiagonal form allows for even further optimization. For example, notice that, as the matrix becomes wider-banded (i.e., as k increases), our algorithm gets better, whereas the previous algorithm [32] gets worse; indeed, when k = n, the previous algorithm is worse than the traditional full-matrix BCD SVD.

6. Conclusions

Considering the great theoretical improvement of the singular value decomposition for k-tridiagonal matrices [30] as a starting point, we proved here that sorting the singular values alters neither the performance nor the scaling potential. Furthermore, a complete scaling scenario has been treated, showing surprising results, which emphasize the remarkable scalability potential of such methods, providing a considerable speed-up over industry standard implementations. The singular value decomposition is a very important tool that is core to the development of new technologies, especially in communications, so any new, improved implementation adds a valuable benefit.

Author Contributions

Conceptualization, A.T. and P.G.P.; methodology, M.C. and F.P.; software, A.T. and M.C.; validation, A.T., M.C., F.P. and P.G.P.; formal analysis, P.G.P.; investigation, F.P.; resources, M.C.; data curation, A.T. and M.C.; writing—original draft preparation, A.T. and P.G.P.; writing—review and editing, A.T., M.C., F.P. and P.G.P.; visualization, F.P.; supervision, P.G.P.; project administration, A.T.; funding acquisition, F.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the project CloudPrecis (SMIS code 2014+ 124812). The APC was funded by University Politehnica of Bucharest through the PubArt program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank the reviewers for their time and expertise, constructive comments and valuable insight.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SVD	Singular value decomposition
PCA	Principal component analysis
NFS	Network file system
RSD	Relative standard deviation

References

1. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1990.
2. Bishop, C.M. Pattern Recognition and Machine Learning; Springer Science+Business Media: New York, NY, USA, 2006.
3. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
4. Luo, W.H.; Huang, T.Z.; Gu, X.M.; Liu, Y. Barycentric rational collocation methods for a class of nonlinear parabolic partial differential equations. Appl. Math. Lett. 2017, 68, 13–19.
5. McMillen, T.; Bourget, A.; Agnew, A. On the zeros of complex Van Vleck polynomials. J. Comput. Appl. Math. 2009, 223, 862–871.
6. Gu, X.M.; Sun, H.W.; Zhao, Y.L.; Zheng, X. An implicit difference scheme for time-fractional diffusion equations with a time-invariant type variable order. Appl. Math. Lett. 2021, 120, 107270.
7. Luo, W.H.; Gu, X.M.; Yang, L.; Meng, J. A Lagrange-quadratic spline optimal collocation method for the time tempered fractional diffusion equation. Math. Comput. Simul. 2021, 182, 1–24.
8. Peng, W.; Xin, B. An integrated autoencoder-based filter for sparse big data. J. Control. Decis. 2020, 8, 260–268.
9. Ruble, M.; Güvenç, İ. Multilinear singular value decomposition for millimeter wave channel parameter estimation. IEEE Access 2020, 8, 75592–75606.
10. He, Y.L.; Tian, Y.; Xu, Y.; Zhu, Q.X. Novel soft sensor development using echo state network integrated with singular value decomposition: Application to complex chemical processes. Chemom. Intell. Lab. Syst. 2020, 200, 103981.
11. Han, S.; Ng, W.K.; Philip, S.Y. Privacy-preserving singular value decomposition. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering, Shanghai, China, 29 March–2 April 2009; pp. 1267–1270.
12. Chan, T.F. An improved algorithm for computing the singular value decomposition. ACM Trans. Math. Softw. 1982, 8, 72–83.
13. Gu, M.; Eisenstat, S.C. A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl. 1995, 16, 79–92.
14. Nakatsukasa, Y.; Higham, N.J. Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput. 2013, 35, A1325–A1349.
15. de Rijk, P. A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer. SIAM J. Sci. Stat. Comput. 1989, 10, 359–371.
16. Konda, T.; Nakamura, Y. A new algorithm for singular value decomposition and its parallelization. Parallel Comput. 2009, 35, 331–344.
17. Musco, C.; Musco, C. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015; pp. 1396–1404.
18. Kokiopoulou, E.; Bekas, C.; Gallopoulos, E. Computing smallest singular triplets with implicitly restarted Lanczos bidiagonalization. Appl. Numer. Math. 2004, 49, 39–61.
19. Niu, D.; Yuan, X. An implicitly restarted Lanczos bidiagonalization method with refined harmonic shifts for computing smallest singular triplets. J. Comput. Appl. Math. 2014, 260, 208–217.
20. Ishida, Y.; Takata, M.; Kimura, K.; Nakamura, Y. An improvement of the augmented implicitly restarted Lanczos bidiagonalization method. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, NV, USA, 17–20 July 2017; pp. 281–287.
21. Sogabe, T.; El-Mikkawy, M. Fast block diagonalization of k-tridiagonal matrices. Appl. Math. Comput. 2011, 218, 2740–2743.
22. El-Mikkawy, M.; Atlan, F. A novel algorithm for inverting a general k-tridiagonal matrix. Appl. Math. Lett. 2014, 32, 41–47.
23. El-Mikkawy, M.; Atlan, F. A new recursive algorithm for inverting general k-tridiagonal matrices. Appl. Math. Lett. 2015, 44, 34–39.
24. Sogabe, T.; Yılmaz, F. A note on a fast breakdown-free algorithm for computing the determinants and the permanents of k-tridiagonal matrices. Appl. Math. Comput. 2014, 249, 98–102.
25. Kirklar, E.; Yilmaz, F. A note on k-tridiagonal k-Toeplitz matrices. Ala. J. Math. 2015, 3, 39.
26. Jia, J.; Li, S. Symbolic algorithms for the inverses of general k-tridiagonal matrices. Comput. Math. Appl. 2015, 70, 3032–3042.
27. Ohashi, A.; Usuda, T.; Sogabe, T. On tensor product decomposition of k-tridiagonal Toeplitz matrices. Int. J. Pure Appl. Math. 2015, 103, 537–545.
28. Takahira, S.; Sogabe, T.; Usuda, T. Bidiagonalization of (k, k+1)-tridiagonal matrices. Spec. Matrices 2019, 7, 20–26.
29. Küçük, A.Z.; Özen, M.; İnce, H. Recursive and combinational formulas for permanents of general k-tridiagonal Toeplitz matrices. Filomat 2019, 33, 307–317.
30. Tănăsescu, A.; Popescu, P.G. A fast singular value decomposition algorithm of general k-tridiagonal matrices. J. Comput. Sci. 2019, 31, 1–5.
31. Marques, O.; Demmel, J.; Vasconcelos, P.B. Bidiagonal SVD computation via an associated tridiagonal eigenproblem. ACM Trans. Math. Softw. 2020, 46, 1–25.
32. Liao, X.; Li, S.; Cheng, L.; Gu, M. An improved divide-and-conquer algorithm for the banded matrices with narrow bandwidths. Comput. Math. Appl. 2016, 71, 1933–1943.
Figure 1. Average runtime of our proposed algorithm for n = 10,000, multiple values of k and multiple numbers of CPUs.
Figure 2. Average runtime of our proposed algorithm (marked S in the legend), the algorithm of [30] (marked NS in the legend) and Lapack, for n = 10,000, multiple values of k and multiple numbers of CPUs.