Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

Fang, Wei; Wei, Dongxu; Zhang, Ran

doi:10.3390/s19235335

Open AccessArticle

Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

by

Wei Fang

^1,*,

Dongxu Wei

² and

Ran Zhang

³

¹

Department of Computer Science and Technology, Huaibei Vocational and Technical College, Huaibei 235000, China

²

School of Physics and Electronic Electrical Engineering, Huaiyin Normal University, Huaian 223300, China

³

Mathematics Teaching and Research Group, Nanjing No.9 High School, Nanjing 210018, China

^*

Author to whom correspondence should be addressed.

Sensors 2019, 19(23), 5335; https://doi.org/10.3390/s19235335

Submission received: 13 November 2019 / Revised: 28 November 2019 / Accepted: 29 November 2019 / Published: 3 December 2019

(This article belongs to the Special Issue Sensor Signal and Information Processing III)

Download

Browse Figures

Versions Notes

Abstract

:

The rapid development of sensor technology gives rise to the emergence of huge amounts of tensor (i.e., multi-dimensional array) data. For various reasons such as sensor failures and communication loss, the tensor data may be corrupted by not only small noises but also gross corruptions. This paper studies the Stable Tensor Principal Component Pursuit (STPCP) which aims to recover a tensor from its corrupted observations. Specifically, we propose a STPCP model based on the recently proposed tubal nuclear norm (TNN) which has shown superior performance in comparison with other tensor nuclear norms. Theoretically, we rigorously prove that under tensor incoherence conditions, the underlying tensor and the sparse corruption tensor can be stably recovered. Algorithmically, we first develop an ADMM algorithm and then accelerate it by designing a new algorithm based on orthogonal tensor factorization. The superiority and efficiency of the proposed algorithms is demonstrated through experiments on both synthetic and real data sets.

Keywords:

tensor principal component pursuit; stable recovery; tensor SVD; ADMM

1. Introduction

In recent years, different types of tensor data have emerged with the significant progress of modern sensor technology, such as color images [1], videos [2], functional MRI data [3], hyper-spectral images [4], point could data [5], traffic stream data [6], etc. Thanks to its multi-way nature, tensor-based methods have natural superiority over vector and matrix-based methods in analyzing and processing ubiquitous modern multi-way data, and have found extensive applications in computer vision [1,7], data mining [5], machine learning [2], signal processing [8], to name a few. In real applications, the acquired tensor data may often suffer from noises and gross corruptions owing to many different reasons such as sensor failure, lens pollution, communication interference, occlusion in videos, or abnormalities in a sensor network [9], etc. At the same time, many real-world tensor data, such as face images or videos, have been shown to have some low-dimensional structure and can be well approximated by a smaller number of “principal components” [8]. Then, a question naturally arises: how to pursue the principal components of an observed tensor data in the presence of both noises and gross corruptions? We will answer this question in this paper and refer to the proposed methodology as Stable Tensor Principal Component Pursuit (STPCP).

The tensor low-rankness is an ideal model of the property that a tensor data can be well approximated by a small number of principal components [8]. In the last decade, low-rank tensor models have attracted much attention in many fields [10]. There are multiple low-rank tensor models since there exist different definitions of tensor rank. Among these models, the low CP rank model [11] and the low Tucker rank model [1] should be the most famous two. The low CP rank model approximates the underlying tensor by the sum of a small number of rank-1 tensors, whereas the low Tucker rank model assumes the unfolding matrix along each mode are low rank. To estimate an unknown low-rank tensor from corrupted observations, it is a natural option to consider the rank minimization problem which chooses the tensor of lowest rank as the solution from a certain feasible set. However, tensor rank minimization, even in its 2-way (matrix) case, is generally NP-hard [12] and even harder in higher-way cases [13]. For tractable solutions, researchers turn to a variety of convex surrogates for tensor rank [1,14,15,16,17,18] to replace the tensor rank in rank minimization problem. Methods based on surrogates for the tensor CP rank and Tucker rank have been extensively explored in both the theoretical side and the application side [14,17,19,20,21,22,23,24].

Recently, the low-tubal-rank model [16,25] has shown better performance than traditional tensor low-rank models in many tensor recover tasks such as image/video inpainting/denoising/ sensing [2,25,26], moving object detection [27], multi-view learning [28], seismic data completion [29], WiFi fingerprint [30], MRI imaging [16], point cloud data inpainting [31], and so on. The tubal rank is a new complexity measure of tensor defined through the framework of tensor singular value decomposition (t-SVD) [32,33]. At the core of existing low-tubal-rank models is the tubal nuclear norm (TNN) which is a convex surrogate for the tubal rank. In contrast to CP-based tensor nuclear norms or Tucker-based tensor nuclear norms which models low-rankness in the original domain, TNN models low-rankness in the Fourier domain. It is pointed out in [25,34,35] that TNN has superiority over traditional tensor nuclear norms in exploiting the ubiquitous “spatial-shifting” property in real-world tensor data.

Inspired by the superior performance of TNN, this paper adopts TNN as a low-rank regularizer in the proposed STPCP model. Specifically, the proposed STPCP aims to estimate the underlying tensor data

{\underline{L}}_{0} \in R^{n_{1} \times n_{2} \times n_{3}}

from an observation tensor

\underline{M}

polluted by both small dense noises and sparse gross corruptions as follows:

\begin{matrix} \underline{M} = {\underline{L}}_{0} + {\underline{S}}_{0} + {\underline{E}}_{0}, \end{matrix}

(1)

where

{\underline{S}}_{0}

is a tensor denoting the sparse corruptions and

{\underline{E}}_{0}

is a tensor representing small dense noises. Model (1) is also known as robust tensor decomposition in [36,37].

Our STPCP model is first formulated as a TNN-based convex problem. Then, our theoretical analysis gives upper bound on the estimation error of

{\underline{L}}_{0}

and

{\underline{S}}_{0}

. In contrast to the analysis in [37], the proposed STPCP can exactly recovery the underlying tensor

{\underline{L}}_{0}

and the sparse corruption tensor

{\underline{S}}_{0}

when the noise term

{\underline{E}}_{0}

vanishes. For efficient solution of the proposed STPCP model, we develop two algorithms with extensions to a more challenging scenario where missing observations are also considered. The first algorithm is an ADMM algorithm and the second algorithm accelerates it using tensor factorization. Experiments show the effectiveness and the efficiency of the designed algorithms.

We organize the rest of this paper as follows. In Section 2, we briefly introduce basic preliminaries for t-SVD and some related works. The proposed STPCP model is formulated and analyzed theoretically in Section 3. We design two algorithms in Section 4 and report experimental results in Section 5. This work is concluded in Section 6. The proofs of theorems, propositions, and lemmas are given in the appendix.

2. Preliminaries and Related Works

In this section, some preliminaries of t-SVD are first introduced. Then, the related works are presented.

Notations. We denote vectors by bold lower-case letters, e.g.,

a \in R^{n}

, matrices by bold upper-case letters, e.g.,

A \in R^{n_{1} \times n_{2}}

, and tensors by underlined upper-case letters, e.g.,

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

. For a given 3-way tensor, we define its fiber as a vector given through fixing all indices but one, and its slice as a matrix obtained by fixing all indices but two. For a given 3-way tensor

\underline{A}

, we use

{\underline{A}}_{i j k}

to denote its

(i, j, k)

-th element;

A^{(k)} : = \underline{A} (:, :, k)

is used to denote its k-th frontal slice.

\tilde{\underline{A}}

is used to denote the tensor after performing 1D Discrete Fourier Transformation (DFT) on all tube fibers

\underline{A} (i, j, :)

of

\underline{T}

,

\forall i = 1, 2, \dots, n_{1}, j = 1, 2, \dots, n_{2}

, which can be efficiently computed by the Matlab command

\tilde{\underline{A}} = fft (\underline{A}, [], 3)

. We use

dft 3 (\cdot)

and

idft 3 (\cdot)

to represent the 1D DFT and inverse DFT along the tube fibers of 3-way tensors, i.e.,

dft 3 (\underline{A}) : = fft (\underline{A}, [], 3), idft 3 (\underline{A}) : = ifft (\underline{A}, [], 3) .

For a given matrix

M \in R^{n_{1} \times n_{2}}

, define the nuclear norm and spectral norm of

M

respectively as:

\begin{matrix} {∥ M ∥}_{*} : = \sum_{i = 1}^{p} σ_{i} (M), and {∥ M ∥}_{sp} : = max {σ_{i} (M)}, \end{matrix}

where

p = min {n_{1}, n_{2}}

, and

σ_{1} (M) \geq \dots \geq σ_{p} (M)

are the singular values of

M

in a non-ascending order. The

l_{0}

-norm,

l_{1}

-norm, Frobenius norm,

l_{\infty}

-norm of a tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

is defined as:

\begin{matrix} ∥ \underline{A} ∥_{0} : = \sum_{i j k} 1 ({\underline{A}}_{i j k} \neq 0), ∥ \underline{A} ∥_{1} : = \sum_{i j k} | {\underline{A}}_{i j k} |, ∥ \underline{A} ∥_{F} : = \sqrt{\sum_{i j k} {\underline{A}}_{i j k}^{2}}, ∥ \underline{A} ∥_{\infty} : = max_{i j k} | {\underline{A}}_{i j k} |, \end{matrix}

where

1 (C)

is an indicator function whose value is 1 if the condition C is true, and 0 otherwise.

Given two matrices

A = (a_{i j}) \in C^{n_{1} \times n_{2}}, B = (b_{i j}) \in C^{n_{1} \times n_{2}}

, we define their inner product as follows:

\begin{matrix} 〈A, B〉 = tr (A^{H} B) = \sum_{i j} {\bar{a}}_{i j} b_{i j}, \end{matrix}

where

A^{H}

denotes conjugate transpose of matrix

A

and

{\bar{a}}_{i j}

denotes the conjugation of complex number

a_{i j}

. Given two 3-way tensors

\underline{A}, \underline{B} \in R^{n_{1} \times n_{2} \times n_{3}}

, we define their inner product as follows:

\begin{matrix} 〈\underline{A}, \underline{B}〉 : = \sum_{i j k} {\underline{A}}_{i j k} {\underline{B}}_{i j k} . \end{matrix}

2.1. Tensor Singular Value Decomposition

We first define 3 operators based on block matrices which are introduced in [33]. For a given 3-way tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

, we define its block vectorization

bvec (\cdot)

and the inverse operation

bvfold (\cdot)

in the following equation:

\begin{matrix} bvec (\underline{A}) : = [\begin{matrix} A^{(1)} \\ A^{(2)} \\ ⋮ \\ A^{(n_{3})} \end{matrix}] \in R^{n_{1} n_{3} \times n_{2}}, bvfold (bvec (\underline{A})) = \underline{A} . \end{matrix}

We further define the block circulant matrix

bcirc (\cdot)

of any 3-way tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

as follows:

\begin{matrix} bcirc (\underline{A}) : = [\begin{matrix} A^{(1)} & A^{(n_{3})} & \dots & A^{(2)} \\ A^{(2)} & A^{(1)} & \dots & A^{(3)} \\ ⋮ & ⋱ & ⋱ & ⋮ \\ A^{(n_{3})} & A^{(n_{3} - 1)} & \dots & A^{(1)} \end{matrix}] \in C^{n_{1} n_{3} \times n_{2} n_{3}} \end{matrix}

Equipped with above defined operators, we are now in a position to define the t-product of 3-way tensors.

Definition 1

(t-product [33]). Given two tensors

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

and

\underline{B} \in R^{n_{2} \times n_{4} \times n_{3}}

, the t-product of

\underline{A}

and

\underline{B}

is a new 3-way tensor

\underline{C}

with size

n_{1} \times n_{4} \times n_{3}

:

\begin{matrix} \underline{C} = \underline{A} * \underline{B} = : bvfold (bcirc (\underline{A}) bvec (\underline{B})) . \end{matrix}

(2)

A more intuitive interpretation of t-SVD is as follows [33]. If we treat a 3-way tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

as a matrix of size

n_{1} \times n_{2}

whose entries are the tube fibers, then the tensor t-product can be analogously understood as the “matrix multiplication” where the standard scalar product is replaced with the vector circular convolution between the tubes (i.e., vectors):

\begin{matrix} \underline{C} = \underline{A} * \underline{B} \Leftrightarrow \underline{C} (i, j, :) = \sum_{k = 1}^{n_{2}} \underline{A} (i, k, :) ⋆ \underline{B} (k, j, :), \forall i = 1, 2, \dots, n_{1}, j = 1, 2, \dots, n_{4}, \end{matrix}

(3)

where ⋆ represent the operation of circular convolution [33] of two vectors

a, b \in R^{n_{3}}

defined as

{(a ⋆ b)}_{j} = \sum_{k = 1}^{n_{3}} a_{k} b_{1 + (j - k) \mod n_{3}}

.

We also define the block diagonal matrix

bdiag (\cdot)

of any 3-way tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

and its inverse

bdfold (\cdot)

as follows:

\begin{matrix} bdiag (\underline{A}) : = [\begin{matrix} A^{(1)} \\ ⋱ \\ A^{(n_{3})} \end{matrix}] \in R^{n_{1} n_{3} \times n_{2} n_{3}}, bdfold (bdiag (\underline{A})) = \underline{A} . \end{matrix}

We also use

\bar{A}

(or

\bar{\underline{A}}

) to denote the block diagonal matrix of tensor

\tilde{\underline{A}} = dft 3 (\underline{A})

(i.e., the Fourier version of

\underline{A}

) i.e.,

\begin{matrix} \bar{A} = bdiag (\tilde{\underline{A}}) : = [\begin{matrix} {\tilde{A}}^{(1)} \\ ⋱ \\ {\tilde{A}}^{(n_{3})} \end{matrix}] \in C^{n_{1} n_{3} \times n_{2} n_{3}} . \end{matrix}

Then the relationship between DFT and circular convolution further indicates that the conducting t-product in the original domain is equivalent to performing standard matrix product on the Fourier block diagonal matrices [33]. Since matrix product on the Fourier block diagonal matrices can be parallel written as matrix product of all the frontal slices in the Fourier domain, we have the following relationships:

\begin{matrix} \underline{C} = \underline{A} * \underline{B} \Leftrightarrow \bar{C} = \bar{A} \bar{B} \Leftrightarrow {\tilde{C}}^{(k)} = {\tilde{A}}^{(k)} {\tilde{B}}^{(k)}, k = 1, 2, \dots, n_{3} . \end{matrix}

(4)

The relationship between the t-product and FFT also indicates that the inner product of two 3-way tensors

\underline{A}, \underline{B} \in R^{n_{1} \times n_{2} \times n_{3}}

and the inner product of their corresponding Fourier block diagonal matrices

\bar{\underline{A}}, \bar{\underline{B}} \in C^{n_{1} n_{3} \times n_{2} n_{3}}

satisfy the following relationship:

\begin{matrix} 〈\underline{A}, \underline{B}〉 = \frac{1}{n_{3}} 〈\tilde{\underline{A}}, \tilde{\underline{B}}〉 = \frac{1}{n_{3}} 〈\bar{A}, \bar{B}〉 . \end{matrix}

When

\underline{A} = \underline{B} = \underline{X}

, one has:

\begin{matrix} ∥ \underline{X} ∥_{F} = \frac{1}{\sqrt{n_{3}}} {∥ \bar{\underline{X}} ∥}_{F} . \end{matrix}

We further define the concepts of tensor transpose, identity tensor, f-diagonal tensor and orthogonal tensor as follows.

Definition 2

(tensor transpose [33]). Given a tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

, then define its transpose tensor

{\underline{A}}^{⊤}

of size

n_{2} \times n_{1} \times n_{3}

which can be formed through first transposing all the frontal slices of

\underline{A}

and then exchanging each k-th transposed frontal slice with the

(n_{3} + 2 - k)

-th transposed frontal slice for all

k = 2, 3, \dots, n_{3}

.

For example, consider 3-way tensor

\underline{A} = [A^{(1)} | A^{(2)} | A^{(3)} | A^{(4)}] \in R^{n_{1} \times n_{2} \times 4}

with 4 frontal slices, the tensor transpose

{\underline{A}}^{⊤}

of

\underline{A}

is:

\begin{matrix} {\underline{A}}^{⊤} = [{(A^{(1)})}^{⊤} | {(A^{(4)})}^{⊤} | {(A^{(3)})}^{⊤} | {(A^{(2)})}^{⊤}] \in R^{n_{2} \times n_{1} \times 4} . \end{matrix}

Definition 3

(identity tensor [33]). The identity tensor

\underline{I} \in R^{n \times n \times n_{3}}

is a tensor whose first frontal slice is the n-by-n identity matrix with all other frontal slices are zero matrices.

Definition 4

(f-diagonal tensor [33]). We call a 3-way tensor f-diagonal if all the frontal slices of it are diagonal matrices.

Definition 5

(orthogonal tensor [33]). We call a tensor

\underline{Q} \in R^{n \times n \times n_{3}}

an orthogonal tensor if the following equations hold:

{\underline{Q}}^{⊤} * \underline{Q} = \underline{Q} * {\underline{Q}}^{⊤} = \underline{I} .

Then, the tensor singular value decomposition (t-SVD) can be given as follows.

Definition 6

(Tensor singular value decomposition, and Tensor tubal rank [38]). Given any 3-way tensor

\underline{X} \in R^{n_{1} \times n_{2} \times n_{3}}

, then it has the following factorization called tensor singular value decomposition (t-SVD):

\begin{matrix} \underline{X} = \underline{U} * \sum_{_} * {\underline{V}}^{⊤}, \end{matrix}

(5)

where the left and right factor tensors

\underline{U} \in R^{n_{1} \times n_{1} \times n_{3}}

and

\underline{V} \in R^{n_{2} \times n_{2} \times n_{3}}

are orthogonal, and the middle tensor

\sum_{_} \in R^{n_{1} \times n_{2} \times n_{3}}

is a rectangular f-diagonal tensor.

A visual illustration for the t-SVD is shown in Figure 1. It can be computed efficiently by FFT and IFFT in the Fourier domain according to Equation (4). For more details, see [2].

Definition 7

(Tensor tubal rank [38]). The tensor tubal rank of any 3-way tensor

\underline{X} \in R^{n_{1} \times n_{2} \times n_{3}}

is defined as the number of non-zero tubes of

\sum_{_}

in its t-SVD shown in Equation (5), i.e.,

\begin{matrix} r_{tubal} (\underline{A}) : = \sum_{i} 1 (\sum_{_} (i, i, :) \neq 0) . \end{matrix}

(6)

Definition 8

(Tubal average rank [38]). The tubal average rank

r_{a} (\underline{A})

of any 3-way tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

is defined as the averaged rank of all frontal slices of

\tilde{\underline{A}}

as follows,

\begin{matrix} r_{a} (\underline{A}) : = \frac{1}{n_{3}} \sum_{k = 1}^{n_{3}} error ({\tilde{A}}^{(k)}) . \end{matrix}

(7)

Definition 9

(Tensor operator norm [2,38]). The tensor operator norm

∥ \underline{F} ∥_{op}

of any 3-way tensor

\underline{F} \in R^{n_{1} \times n_{2} \times n_{3}}

is defined as follows:

\begin{matrix} ∥ \underline{F} ∥_{op} : = & sup_{∥ \underline{A} ∥_{F} \leq 1} {∥ \underline{F} * \underline{A} ∥}_{F} . \end{matrix}

(8)

The relationship between t-product and FFT indicates that:

\begin{matrix} ∥ \underline{F} ∥_{op} : = & sup_{∥ \underline{A} ∥_{F} \leq 1} ∥ \underline{F} * \underline{A} ∥_{F} = sup_{∥ \bar{A} ∥_{F} \leq \sqrt{n_{3}}} ∥ \bar{F} \cdot \bar{\underline{A}} ∥_{F} = {∥ \bar{A} ∥}_{sp} . \end{matrix}

(9)

Definition 10

(Tensor spectral norm [38]). The tensor spectral norm

∥ \underline{A} ∥_{sp}

of any 3-way tensor

\underline{F} \in R^{n_{1} \times n_{2} \times n_{3}}

is defined as the matrix spectral norm of

\bar{A}

, i.e.,

\begin{matrix} ∥ \underline{A} ∥_{sp} : = {∥ \bar{A} ∥}_{sp} . \end{matrix}

(10)

We further define the tubal nuclear norm.

Definition 11

(Tubal nuclear norm [2]). For any tensor

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

with t-SVD

\underline{A} = \underline{U} * \sum_{_} * {\underline{V}}^{⊤}

, the tubal nuclear norm (TNN) of

\underline{A}

is defined as:

\begin{matrix} ∥ \underline{A} ∥_{TNN} : = 〈\sum_{_}, \underline{I}〉 = \sum_{i = 1}^{r} \sum_{_} (i, i, 1), \end{matrix}

(11)

where

r = r_{t u b a l} (\underline{A})

.

To understand the tubal nuclear norm, first note that:

\begin{matrix} r_{tubal} (\underline{A}) = \sum_{i} 1 (\sum_{_} (i, i, :) \neq 0) \overset{(i)}{=} \sum_{i} 1 (\tilde{\sum_{_}} (i, i, :) \neq 0) \overset{(i i)}{=} \sum_{i} 1 (∥ \tilde{\sum_{_}} (i, i, :) ∥_{1} \neq 0) \overset{(i i i)}{=} \sum_{i} 1 (\sum_{_} (i, i, 1) \neq 0), \end{matrix}

(12)

where (i) holds because of the definition of DFT [2], (ii) holds by the property of

l_{1}

-norm, and (iii) is a result of DFT [2]. Thus, the tubal rank of

\underline{A}

is also the number of non-zero diagonal elements of

\sum_{_} (i, i, 1)

, i.e., the first frontal slice of tensor

\sum_{_}

in the t-SVD of

\underline{A}

. Similar to the matrix singular values, the values

\sum_{_} (i, i, 1), i = 1, 2, \dots, n_{3}

are also called the singular values of tensor

\underline{A}

. As the matrix nuclear norm is the sum of matrix singular values, the tubal nuclear norm can be similarly understood as the sum of tensor singular values.

One can also verify by the property of DFT [2] that:

\begin{matrix} ∥ \underline{A} ∥_{TNN} = \sum_{i = 1}^{r} \sum_{_} (i, i, 1) = \sum_{k = 1}^{n_{3}} \sum_{i = 1}^{r} \tilde{\sum_{_}} (i, i, k) = \frac{1}{n_{3}} \sum_{k = 1}^{n_{3}} ∥ {\tilde{A}}^{(k)} ∥_{*} = \frac{1}{n_{3}} {∥ \bar{A} ∥}_{*}, \end{matrix}

(13)

which indicates that the TNN of

\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}

is also the averaged nuclear norm all frontal slices of

\tilde{\underline{A}}

. Thus, TNN indeed models the low-rankness of Fourier domain.

Now, we will show that the low-tubal-rank model is ideal to some real-world tensor data, such as color images and videos.

First, we consider a natural image of size

256 \times 256 \times 3

, shown in Figure 2a. In Figure 2b, we plot the distribution of its singular values, i.e., the values of

\sum_{_} (i, i, 1)

along with the index i. As can be seen from Figure 2b, there are only a small number of singular values with large magnitude, and most of the singular values are close to 0. Then, we can say that some natural color images are approximately low tubal rank.

Then, consider a commonly used YUV sequence Mother-daughter_qcif (These data can be download from the following link https://sites.google.com/site/subudhibadri/fewhelpfuldownloads.) whose first frame is shown in Figure 3a. We use the Y components of the first 30 frames, and get a tensor of size

144 \times 176 \times 30

and show the distribution of tensor singular values in Figure 3b. We can see from Figure 3b that similar to Figure 2b, there are only a small number of singular values with large magnitude, and most of the singular values are close to 0. Then, we can say that some videos can be well approximately low tubal rank.

For TNN and tensor spectral norm, we highlight the following two lemmas.

Lemma 1.

[2] TNN is the convex envelop of the tensor average rank in the unit ball of tensor spectral norm

{\underline{T} \in R^{n_{1} \times n_{2} \times n_{3}} | ∥ \underline{T} ∥_{sp} \leq 1}

.

Lemma 2.

[2] The TNN and the tensor spectral norm are dual norms to each other.

2.2. Related Works

In this subsection, we briefly introduce some related works. The proposed STPCP is tightly related to the Tensor Robust Principal Component Analysis (TRPCA) which aims to recover a low-rank tensor

{\underline{L}}_{0}

and a sparse tensor

{\underline{S}}_{0}

from their sum

\underline{M} = {\underline{L}}_{0} + {\underline{S}}_{0}

. This is a special case of our measurement Model (1) where the noise tensor

{\underline{E}}_{0}

is a zero tensor.

In [39], the SNN-based TRPCA model is proposed by modeling the underlying tensor as a low Tucker rank one:

\begin{matrix} min_{\underline{L}, \underline{S}} & ∥ \underline{L} ∥_{SNN} + {∥ \underline{S} ∥}_{1} s . t . \underline{L} + \underline{S} = \underline{M}, \end{matrix}

(14)

where SNN (Sum of Nuclear Norms) is defined as

∥ \underline{L} ∥_{SNN} : = \sum_{i = 1}^{K} α_{k} {∥ L_{(k)} ∥}_{*},

where

α_{k} > 0

and

L_{(k)}

is the mode-k matricization of

\underline{L}

[40].

Model (14) indeed assumes the underlying tensor to be low Tucker rank, which can be too strong for some real tensor data. The TNN-based TRPCA model uses TNN to impose low-rankness in the final solution

\underline{L}

as follows:

\begin{matrix} min_{\underline{L}, \underline{S}} ∥ \underline{L} ∥_{TNN} + λ {∥ \underline{S} ∥}_{1} s . t . \underline{L} + \underline{S} = \underline{M} . \end{matrix}

(15)

As shown in [2], when the underlying tensor

{\underline{L}}_{0}

satisfy the tensor incoherent conditions, by solving Problem (15), one can exactly recover the underlying tensor

{\underline{L}}_{0}

and

{\underline{S}}_{0}

with high probability with parameter

λ = 1 / \sqrt{max {n_{1}, n_{2}} n_{3}}

.

When the noise tensor

{\underline{E}}_{0}

is not zero, the robust tensor decomposition based on SNN is proposed in [36] as follows:

\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{M} - \underline{L} - \underline{S} ∥_{F} + λ_{1} ∥ \underline{L} ∥_{SNN} + λ_{2} {∥ \underline{S} ∥}_{1}, \end{matrix}

(16)

where

λ_{1}

and

λ_{2}

are positive regularization parameters. The estimation error on

\underline{L}

and

\underline{S}

is analyzed with an upper bound in [36].

In [37], the TNN-based RTD model is proposed as follows:

\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{M} - \underline{L} - \underline{S} ∥_{F} + λ_{1} ∥ \underline{L} ∥_{TNN} + λ_{2} ∥ \underline{S} ∥_{1}, s . t . {∥ \underline{L} ∥}_{\infty} \leq α, \end{matrix}

(17)

where

α

is an upper estimate of

l_{\infty}

-norm of the underlying tensor

{\underline{L}}_{0}

. An upper bound on the estimation error is also established. However, in the analysis of Model (17), the error does not vanish as the noise tensor

{\underline{E}}_{0}

vanishes which means the analysis cannot guarantee exact recovery in the noiseless setting (which can be provided by the analysis of TNN-based TRPCA (15) by Lu et al. [2]).

The Bayesian approach is also used for robust tensor recovery. The CP decomposition under sparse corruption and small dense noise is considered [41], and tensor rank estimation is achieved using Bayesian approach. In [42], CP decomposition under missing value and small dense noise is considered with rank estimation similar to [41]. A sparse Bayesian CP model is proposed in [43] to recover a tensor with missing value, outliers and noises. In [44], a fully Bayesian treatment is proposed to recover a low-tubal-rank tensor corrupted by both noises and outliers.

3. Theoretical Guarantee for Stable Tensor Principal Component Pursuit

In this section, we formulate the proposed STPCP model and give the main theoretical result which upper bounds the estimation error and guarantees exact recovery in the noiseless setting.

3.1. The Proposed STPCP

As for the measurement Model (1), we further assume that the noise tensor

{\underline{E}}_{0}

has bounded energy measured in F-norm, i.e.,

∥ {\underline{E}}_{0} ∥_{F} \leq δ

. Please note that the limited energy assumption is very mild, since most signals are of limited energy.

To recover the low-rank tensor

{\underline{L}}_{0}

and the sparse tensor

{\underline{S}}_{0}

, we first produce the following optimization problem:

\begin{matrix} (\hat{\underline{L}}, \hat{\underline{S}}) = & \underset{\underline{L}, \underline{S}}{argmin} ∥ \underline{L} ∥_{TNN} + λ ∥ \underline{S} ∥_{1}, s . t . {∥ \underline{M} - \underline{L} - \underline{S} ∥}_{F} \leq δ, \end{matrix}

(18)

where

λ

is a positive parameter balancing the two regularizers. The motivation is to use TNN as a low-rank regularization term to exploit the low-dimensional structure in the signal tensor, whereas tensor

l_{1}

-norm is used to impose sparsity in the corruption tensor (since we assumes it to be sparse).

The relationship between Model (18) and existing models are discussed in Remark 1 and Remark 2.

Remark 1.

The following models can be seen as special cases as the proposed STPCP Model (18);

(I).: When $δ = 0$ , i.e., in the noiseless case, the proposed model degenerates to the TRPCA Model (15) [2].
(II).: When $n_{3} = 1$ , then the stable tensor PCP Model (18) degenerates to the Stable Principal Component Pursuit (SPCP) [45] which aims to pursuit the principal components modeled by low-rank matrix ${\underline{L}}_{0}$ from it observation $M$ corrupted by both noises $E_{0}$ and sparse corruptions $S_{0}$ . The SPCP is formulated as follows:

$\begin{matrix} min_{L, S} {∥ L ∥}_{*} + {λ ∥ S ∥}_{1}, s . t . {∥ M - L - S ∥}_{F} \leq δ . \end{matrix}$

(19)
(III).: When $n_{3} = 1$ and $δ = 0$ , the proposed STPCP further degenerates to Robust Principal Component Analysis (RPCA) [46] given as follows:

$\begin{matrix} min_{L, S} {∥ L ∥}_{*} + λ {∥ S ∥}_{1}, s . t . L + S = M . \end{matrix}$

(20)

Remark 2.

The differences from the proposed Model (18) and TNN-based RTD Model ((17) [37]) is as follows. First, our model does not need to upper estimate the

l_{\infty}

-norm of the underlying tensor. Second, our model is a constrained optimization problem, whereas Model (17) is an unconstrained optimization problem.

3.2. A Theorem for Stable Recovery

To analyze the statistical performance of Model (18), we should assume on the underlying low-rank tensor

{\underline{L}}_{0}

that it is not sparse. Only by this assumption,

{\underline{L}}_{0}

can be identified from its mixture with sparse

{\underline{S}}_{0}

. Such an assumption can be described by the tensor incoherence condition [2,47], which is used to provide an identifiablility for low-rank

{\underline{L}}_{0}

.

Definition 12

(Tensor incoherence condition [2,47]). Given a 3-way tensor

\underline{T} \in R^{n_{1} \times n_{2} \times n_{3}}

with tubal rank r, suppose it has the skinny t-SVD

\underline{T} = \underline{U} * \underline{Λ} * {\underline{V}}^{⊤}

, where

\underline{U} \in R^{n_{1} \times r \times n_{3}}, \underline{V} \in R^{r \times n_{2} \times n_{3}}

are orthogonal tensors, and

\underline{Λ} \in R^{r \times r \times n_{3}}

is an f-diagonal tensor. Then,

\underline{T}

is said to satisfy the tensor incoherent condition (TIC) with parameter

μ (\underline{T})

if the following inequalities hold:

\begin{matrix} max_{i \in [n_{1}]} {∥ {\underline{U}}^{⊤} * {\overset{˚}{e}}_{i} ∥}_{F} & \leq \sqrt{\frac{r μ (\underline{T})}{n_{1} n_{3}}}, \end{matrix}

(21)

\begin{matrix} max_{j \in [n_{2}]} {∥ {\underline{V}}^{⊤} * {\overset{˚}{e}}_{j} ∥}_{F} & \leq \sqrt{\frac{r μ (\underline{T})}{n_{2} n_{3}}}, \end{matrix}

(22)

\begin{matrix} ∥ \underline{U} * {\underline{V}}^{⊤} ∥_{\infty} & \leq \sqrt{\frac{r μ (\underline{T})}{n_{1} n_{2} n_{3}}} . \end{matrix}

(23)

where

{\overset{˚}{e}}_{i} \in R^{n_{1} \times 1 \times n_{3}}

is a tensor column basis with only the

(i, 1, 1)

-th element being 1 and all the others being 0, and

{\overset{˚}{e}}_{j} \in R^{n_{2} \times 1 \times n_{3}}

is also a tensor column basis with only the

(j, 1, 1)

-th element being 1 and all the others being 0.

Assumption 1.

Suppose the true tensor

{\underline{L}}_{0}

in the measurement model (1) satisfies tensor incoherence condition with parameter μ.

Assumption 1 intrinsically ensures that the row bases and column bases of

{\underline{L}}_{0}

do not align well with the canonical row and column bases. Thus, the low-rank

{\underline{L}}_{0}

is not sparse, which avoids the ambiguity that low-rank component can also be sparse in the measurement Model (1).

We should also force the sparse component in Model (1) is not low rank.

Assumption 2.

Assume the support Ω of

{\underline{S}}_{0}

is drawn uniformly at random.

Now we can establish an upper bound on the estimation error of

\hat{\underline{L}}

and

\hat{\underline{S}}

in Problem (18).

Theorem 1

(An Upper Bound on the Estimation Error). Suppose

{\underline{L}}_{0}

and

{\underline{S}}_{0}

satisfy Assumption 1 and Assumption 2, respectively. If the tubal rank r of

{\underline{L}}_{0}

and the sparsity (i.e., the

l_{0}

-norm) s of

{\underline{S}}_{0}

are respectively upper bounded as follows:

\begin{matrix} r \leq \frac{c_{r} min {n_{1}, n_{2}}}{μ {log}^{2} (n_{3} max {n_{1}, n_{2}})}, and s \leq c_{s} n_{1} n_{2} n_{3} \end{matrix}

(24)

where

c_{l}

and

c_{s}

are two sufficiently small numerical constants independent on the dimensions

n_{1}

,

n_{2}

and

n_{3}

. Then the estimator defined in Model (18) satisfy the following inequalities:

\begin{matrix} ∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F} & \leq (\sqrt{1 + \frac{1}{max {n_{1}, n_{2}}}} + 8 (1 + 2 \sqrt{2}) \sqrt{min {n_{1}, n_{2}} n_{3}}) δ \\ ∥ \hat{\underline{S}} - {\underline{S}}_{0} ∥_{F} & \leq (\sqrt{1 + max {n_{1}, n_{2}}} + 8 (1 + 2 \sqrt{2}) \sqrt{n_{1} n_{2} n_{3}}) δ, \end{matrix}

(25)

with probability at least

1 - c_{1} {(n_{3} max {n_{1}, n_{2}})}^{- c_{2}}

(over the choice of support of

{\underline{S}}_{0}

), where

c_{1}

and

c_{2}

are positive constants independent on the dimensions

n_{1}

,

n_{2}

and

n_{3}

.

The proof of Theorem 1 are given in the appendix. In Theorem 1, estimation errors on

{\underline{L}}_{0}

and

{\underline{S}}_{0}

are separately established. It indicates that the estimation error scales linearly with the noise level

δ

, which is in consistence with the result in [37].

Remark 3.

A significant progress over [37] is that in the noiseless setting where

{\underline{E}}_{0}

vanishes, our analysis can provide exact recovery guarantee of

{\underline{L}}_{0}

and

{\underline{S}}_{0}

. This is because the tensor incoherence condition adopted in our analysis intrinsically ensures that the low-rank tensor

{\underline{L}}_{0}

is not sparse and thus can be separated from the sparse corruption tensor, whereas the non-spiky condition adopted in [37] fails to provide identifiability in the measurement Model (1).

For Theorem 1, we also give the following remark.

Remark 4.

The error bounds established in Theorem 1 are consistent with the theoretical analysis for the special cases shown in Remark 1.

(I).: When $δ = 0$ , i.e., in the noiseless case, the error bounds in Theorem 1 will vanish, which means exact recovery of ${\underline{L}}_{0}$ and ${\underline{S}}_{0}$ can be guaranteed. This result is consistent with the analysis in [2] for TNN-based TRPCA Model (15).
(II).: When $n_{3} = 1$ , the error bound on the sparse component in Theorem 1 is consistent with the error bound shown in Equation (8) of [45]. The upper bound on error of the low-rank component in Theorem 1 is sharper than that given in Equation (8) of [45].
(III).: When $n_{3} = 1$ and $δ = 0$ , the proposed STPCP has consistent theoretical guarantee with the analysis of RPCA [46].

4. Algorithms

In this section, we design two algorithms. The first algorithm is based on the framework of ADMM [48] which has been extensively used in convex optimization with good convergence behavior. However, ADMM requires full SVDs on large matrices in each iteration which is high computational burden in high-dimensional settings. Thus, the second algorithm is proposed to solve this issue by using a factorization trick which can instead conducting SVDs on much smaller matrices.

4.1. An ADMM Algorithm

The proposed estimator (18) is equivalent to the following unconstrained problem:

\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{L} + \underline{S} - \underline{M} ∥_{F}^{2} + γ (∥ \underline{L} ∥_{TNN} + λ ∥ \underline{S} ∥_{1}), \end{matrix}

(26)

where

γ

is a positive parameter balancing the data fidelity term and the regularization term.

Besides being corrupted by noises and outliers, the observed tensor

\underline{M}

may also suffer from missing entries which can be taken as outliers with known positions in many applications. Thus, it is more practical to consider the recovery of

{\underline{L}}_{0}

against outliers

{\underline{S}}_{0}

, noises

{\underline{E}}_{0}

and missing entries shown in the following measurement model:

\begin{matrix} \underline{M} = \underline{B} ⊙ ({\underline{L}}_{0} + {\underline{S}}_{0} + {\underline{E}}_{0}), \end{matrix}

(27)

where tensor

\underline{B} \in R^{n_{1} \times n_{2} \times n_{3}}

denote the missing mask where

{\underline{B}}_{i j k} = 1

, if the

(i, j, k)

-th entry of

\underline{L}

is observed and

{\underline{B}}_{i j k} = 0

otherwise, and ⊙ denotes element-wise multiplication. Taking into consideration of missing entries, Model (26) can be further modified as:

\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + γ (∥ \underline{L} ∥_{TNN} + λ ∥ \underline{S} ∥_{1}) . \end{matrix}

(28)

By adding auxiliary variables to Problem (28), we obtain:

\begin{matrix} min_{\underline{K}, \underline{L}, \underline{R}, \underline{S}} & \frac{1}{2} ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + γ ∥ \underline{K} ∥_{TNN} + γ λ {∥ \underline{R} ∥}_{1} \\ s . t . & \underline{K} = \underline{L}, \underline{R} = \underline{S} . \end{matrix}

(29)

The Augmented Lagrangian (AL) of Problem (29) is given as follows:

\begin{matrix} L_{ρ} (\underline{L}, \underline{S}, \underline{K}, \underline{R}, {\underline{Y}}_{1}, {\underline{Y}}_{2}) = \frac{1}{2} & ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + γ ∥ \underline{K} ∥_{TNN} + γ λ {∥ \underline{R} ∥}_{1} \\ + 〈{\underline{Y}}_{1}, \underline{K} - \underline{L}〉 + \frac{ρ}{2} ∥ \underline{K} - \underline{L} ∥_{F}^{2} + 〈{\underline{Y}}_{2}, \underline{R} - \underline{S}〉 + \frac{ρ}{2} {∥ \underline{R} - \underline{S} ∥}_{F}^{2}, \end{matrix}

(30)

where

{\underline{Y}}_{1}, {\underline{Y}}_{2} \in R^{n_{1} \times n_{2} \times n_{3}}

are Lagrangian multipliers and

ρ

is a penalty parameter.

According the strategy of ADMM, we update prime variables

(\underline{L}, \underline{S})

and

(\underline{K}, \underline{R})

by alternative minimization of AL in Problem (29) as follows

Update $(\underline{L}, \underline{S})$ . We update $(\underline{L}, \underline{S})$ by minimizing $L_{ρ}$ with other variables fixed as follows:

$\begin{matrix} ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1}) \\ = \underset{(\underline{L}, \underline{S})}{argmin} L_{ρ} (\underline{L}, \underline{S}, {\underline{K}}^{t}, {\underline{R}}^{t}, {\underline{Y}}_{1}^{t}, {\underline{Y}}_{2}^{t}) \\ = \underset{(\underline{L}, \underline{S})}{argmin} \frac{1}{2} ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + 〈{\underline{Y}}_{1}^{t}, {\underline{K}}^{t} - \underline{L}〉 + \frac{ρ}{2} ∥ {\underline{K}}^{t} - \underline{L} ∥_{F}^{2} + 〈{\underline{Y}}_{2}^{t}, {\underline{R}}^{t} - \underline{S}〉 + \frac{ρ}{2} {∥ {\underline{R}}^{t} - \underline{S} ∥}_{F}^{2} . \end{matrix}$

(31)

Taking derivatives of the right-hand side of Equation (31) with respect to $\underline{L}$ and $\underline{S}$ respectively, and setting the results zero, we obtain:

$\begin{matrix} \underline{B} ⊙ ({\underline{L}}^{t + 1} + {\underline{S}}^{t + 1}) - \underline{B} ⊙ \underline{M} - {\underline{Y}}_{1}^{t} + ρ ({\underline{L}}^{t + 1} - {\underline{K}}^{t}) & = \underline{0} \\ \underline{B} ⊙ ({\underline{L}}^{t + 1} + {\underline{S}}^{t + 1}) - \underline{B} ⊙ \underline{M} - {\underline{Y}}_{2}^{t} + ρ ({\underline{S}}^{t + 1} - {\underline{R}}^{t}) & = \underline{0} . \end{matrix}$

(32)

Resolving the above equation group yields:

$\begin{matrix} {\underline{L}}^{t + 1} & = (ρ (\underline{B} + ρ \underline{1}) ⊙ {\underline{K}}^{t} + ρ \underline{B} ⊙ \underline{M} + (\underline{B} + ρ \underline{1}) ⊙ {\underline{Y}}_{1}^{t} - \underline{B} ⊙ {\underline{Y}}_{2}^{t} - ρ \underline{B} ⊙ {\underline{R}}^{t}) ⊘ (ρ (2 \underline{B} + ρ \underline{1})), \\ {\underline{S}}^{t + 1} & = (ρ (\underline{B} + ρ \underline{1}) ⊙ {\underline{R}}^{t} + ρ \underline{B} ⊙ \underline{M} + (\underline{B} + ρ \underline{1}) ⊙ {\underline{Y}}_{2}^{t} - \underline{B} ⊙ {\underline{Y}}_{1}^{t} - ρ \underline{B} ⊙ {\underline{K}}^{t}) ⊘ (ρ (2 \underline{B} + ρ \underline{1})), \end{matrix}$

(33)

where ⊘ denotes entry-wise division and $\underline{1}$ denotes the tensor all whose entries are 1.
Update $(\underline{K}, \underline{R})$ . We update $(\underline{K}, \underline{R})$ by minimizing $L_{ρ}$ with other variables fixed as follows

$\begin{matrix} ({\underline{K}}^{t + 1}, {\underline{R}}^{t + 1}) \\ = \underset{(\underline{K}, \underline{R})}{argmin} L_{ρ} ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1}, \underline{K}, \underline{R}, {\underline{Y}}_{1}^{t}, {\underline{Y}}_{2}^{t}) \\ = \underset{(\underline{K}, \underline{R})}{argmin} γ ∥ \underline{K} ∥_{TNN} + γ λ ∥ \underline{R} ∥_{1} + 〈{\underline{Y}}_{1}^{t}, \underline{K} - {\underline{L}}^{t + 1}〉 + \frac{ρ}{2} ∥ \underline{K} - {\underline{L}}^{t + 1} ∥_{F}^{2} + 〈{\underline{Y}}_{2}^{t}, \underline{R} - {\underline{S}}^{t + 1}〉 + \frac{ρ}{2} {∥ \underline{R} - {\underline{S}}^{t + 1} ∥}_{F}^{2} . \end{matrix}$

(34)

Please note that Problem (34) can further be solved separately as follows:

$\begin{matrix} {\underline{K}}^{t + 1} & = \underset{\underline{K}}{argmin} γ ∥ \underline{K} ∥_{TNN} + 〈{\underline{Y}}_{1}^{t}, \underline{K} - {\underline{L}}^{t + 1}〉 + \frac{ρ}{2} {∥ \underline{K} - {\underline{L}}^{t + 1} ∥}_{F}^{2} \\ = S_{γ ρ^{- 1}}^{{∥ \cdot ∥}_{TNN}} ({\underline{L}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{1}^{t}) . \end{matrix}$

(35)

and

$\begin{matrix} {\underline{R}}^{t + 1} & = \underset{\underline{R}}{argmin} γ λ ∥ \underline{R} ∥_{1} + 〈{\underline{Y}}_{1}^{t}, \underline{R} - {\underline{S}}^{t + 1}〉 + \frac{ρ}{2} {∥ \underline{R} - {\underline{S}}^{t + 1} ∥}_{F}^{2} \\ = S_{γ λ ρ^{- 1}}^{{∥ \cdot ∥}_{1}} ({\underline{S}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{2}^{t}), \end{matrix}$

(36)

where $S_{τ}^{{∥ \cdot ∥}_{TNN}} (\cdot)$ is the proximity operator of TNN [5]. and $S_{τ}^{{∥ \cdot ∥}_{1}} (\cdot)$ is the proximity operator of tensor $l_{1}$ -norm given as follows [49]:

$\begin{matrix} S_{τ}^{{∥ \cdot ∥}_{1}} (\underline{A}) : = & \underset{\underline{X}}{argmin} τ ∥ \underline{X} ∥_{1} + \frac{1}{2} ∥ \underline{X} - \underline{A} ∥_{F}^{2} = sign (\underline{A}) ⊙ max {(| \underline{A} | - τ, 0}, \end{matrix}$

In [5], a closed-form expression of $S_{τ} (\cdot)$ is given as follows:
Lemma 3.
(Proximity operator of TNN [5]) For any 3D tensor $\underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}$ with reduced t-SVD $\underline{A} = \underline{U} * \underline{Λ} * {\underline{V}}^{⊤}$ , where $\underline{U} \in R^{n_{1} \times r \times n_{3}}$ and $\underline{V} \in R^{n_{2} \times r \times n_{3}}$ are orthogonal tensors and $\underline{Λ} \in R^{r \times r \times n_{3}}$ is the f-diagonal tensor of singular tubes, the proximity operator $S_{τ}^{{∥ \cdot ∥}_{TNN}} (\underline{A})$ at $\underline{A}$ can be computed by:

$\begin{matrix} S_{τ}^{{∥ \cdot ∥}_{TNN}} (\underline{A}) : = & \underset{\underline{X}}{argmin} τ ∥ \underline{X} ∥_{TNN} + \frac{1}{2} {∥ \underline{X} - \underline{A} ∥}_{F}^{2} = \underline{U} * i f f t 3 (max (f f t 3 (\underline{Λ}) - τ, 0)) * {\underline{V}}^{⊤}, \end{matrix}$
Update $({\underline{Y}}_{1}, {\underline{Y}}_{2})$ . The Lagrangian multipliers are updated by gradient ascent as follows:

$\begin{matrix} {\underline{Y}}_{1}^{t + 1} & = {\underline{Y}}_{1}^{t} + ρ ({\underline{K}}^{t + 1} - {\underline{L}}^{t + 1}), \\ {\underline{Y}}_{2}^{t + 1} & = {\underline{Y}}_{2}^{t} + ρ ({\underline{R}}^{t + 1} - {\underline{S}}^{t + 1}) . \end{matrix}$

(37)

The algorithm is summarized in Algorithm 1. The convergence analysis of Algorithm 1 is established in Theorem 2.

Algorithm 1 Solving Problem (29) using ADMM.

Input:

The observed tensor

\underline{M}

, the parameters

γ, λ, ρ, δ

.

1:

Initialize

t = 0

,

{\underline{L}}^{0} = {\underline{S}}^{0} = {\underline{K}}^{0} = {\underline{R}}^{0} = {\underline{Y}}_{1}^{0} = {\underline{Y}}_{2}^{0} = \underline{0} \in R^{n_{1} \times n_{2} \times n_{3}}

2:

for

t = 0, \dots, T_{max}

do

3:

Update

({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1})

by Equation (33);

4:

Update

({\underline{K}}^{t + 1}, {\underline{R}}^{t + 1})

by Equations (35)–(36);

5:

Update

({\underline{Y}}_{1}^{t + 1}, {\underline{Y}}_{2}^{t + 1})

by Equation (37);

6:

Check the convergence criteria:

(i): convergence of variables: $∥ {\underline{A}}^{t + 1} - {\underline{A}}^{t} ∥_{\infty} \leq δ, \forall \underline{A} \in {\underline{L}, \underline{S}, \underline{K}, \underline{R}}$ ,
(ii): convergence of constraints: $max {∥ {\underline{K}}^{t + 1} - {\underline{L}}^{t} ∥_{\infty}, ∥ {\underline{R}}^{t + 1} - {\underline{S}}^{t + 1} ∥_{\infty}} \leq δ$ .

7:

end for

Output:

(\hat{\underline{L}}, \hat{\underline{S}}) = ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1})

.

Theorem 2

(Convergence of Algorithm 1). For any

ρ > 0

, if the unaugmented Lagrangian

L (\underline{L}, \underline{S}, \underline{K}, \underline{R}, {\underline{Y}}_{1}, {\underline{Y}}_{2})

has a saddle point, then the iterations

L ({\underline{L}}^{t}, {\underline{S}}^{t}, {\underline{K}}^{t}, {\underline{R}}^{t}, {\underline{Y}}_{1}^{t}, {\underline{Y}}_{2}^{t})

in Algorithm 1 satisfy the residual convergence, objective convergence and dual variable convergence of Problem (29) as

t \to \infty

.

The proof of Theorem 2 is given in the Appendix A.

In a single iteration of Algorithm 1, the main cost comes from updating

{\underline{L}}^{t}

which involves computing FFT, IFFT and

n_{3}

SVDs of

n_{1} \times n_{2}

matrices [47]. Hence Algorithm 1 has per-iteration complexity of order

O (n_{1} n_{2} n_{3} (n_{1} \land n_{2} + log n_{3}))

. Thus, if the total iteration number is T, then the total computational complexity is:

\begin{matrix} O (T n_{1} n_{2} n_{3} (min {n_{1}, n_{2}} + log n_{3})) . \end{matrix}

(38)

4.2. A Faster Algorithm

To reduce the cost of computing TNN which is a main cost of Algorithm 1, we propose the following lemma which indicates that TNN is orthogonal invariant.

Lemma 4.

Given a tensor

\underline{X} \in R^{r \times r \times n_{3}}

, let

\underline{Q} \in R^{n_{1} \times r \times n_{3}}

a two semi-orthogonal tensors, i.e.,

{\underline{Q}}^{⊤} * \underline{Q} = \underline{I} \in R^{r \times r \times n_{3}}

and

r \leq min {n_{1}, n_{2}}

. Then, we have the following relationship:

\begin{matrix} ∥ \underline{Q} * \underline{X} ∥_{TNN} = {∥ \underline{X} ∥}_{TNN} . \end{matrix}

The proof of Lemma 4 can be found in the appendix. Equipped with Lemma 4, we decompose the low-rank component in Problem (28) as follows:

\begin{matrix} \underline{L} = \underline{Q} * \underline{X}, s . t . {\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r}, \end{matrix}

where

{\underline{I}}_{r} \in R^{r \times r \times n_{3}}

is an identity tensor. The similar strategy has been used in low-rank matrix recovery from gross corruptions by [50]. Furthermore, we propose the following model for Problem (28):

\begin{matrix} min_{\underline{Q}, \underline{X}, \underline{S}} & \frac{1}{2} ∥ \underline{B} ⊙ (\underline{Q} * \underline{X} + \underline{S} - \underline{M}) ∥_{F}^{2} + γ (∥ \underline{X} ∥_{TNN} + λ ∥ \underline{S} ∥_{1}) \\ s . t . & {\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r}, \end{matrix}

(39)

where r is an upper estimation of tubal rank of the underlying tensor

r^{*} = r_{tubal} ({\underline{L}}_{0})

.

In contrast to Model (28), the proposed Model (39) is a non-convex optimization problem. That means Model (39) may have many local minima. We establish a connection between the proposed Model (39) with Model (28) in the following theorem.

Theorem 3

(Connection between Model (39) and Model (28)). Let

({\underline{Q}}_{*}, {\underline{X}}_{*}, {\underline{S}}_{*})

be a global optimal solution to Problem (39). Furthermore, let

({\underline{L}}^{⋆}, {\underline{S}}^{⋆})

be the solution to Problem (28), and

r_{tubal} ({\underline{L}}^{⋆}) \leq r

, where r is the initialized tubal rank. Then

({\underline{Q}}_{*} * {\underline{X}}_{*}, {\underline{S}}_{*})

is also the optimal solution to Problem (28).

The proof of Theorem 3 can be found in the appendix. Theorem 3 states that the global optimal point of the (non-convex) Model (39) coincides with solution of the (convex) Model (28). It further indicates that the accuracy of Model (39) cannot exceed Model (28), which can be validated numerically in the experiment section.

To solve Model (39), we also use the ADMM framework.

First, by adding auxiliary variables, we have the following problem:

\begin{matrix} min_{\underline{L}, \underline{S}, \underline{R}, \underline{Q}, \underline{X}} & \frac{1}{2} ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + γ (∥ \underline{X} ∥_{TNN} + λ ∥ \underline{R} ∥_{1}) \\ s . t . & \underline{Q} * \underline{X} = \underline{L}; \underline{R} = \underline{S}; {\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r} . \end{matrix}

(40)

The augmented Lagrangian of Problem (40) is:

\begin{matrix} L_{2}^{'} (\underline{L}, \underline{S}, \underline{R}, \underline{Q}, \underline{X}) & = \frac{1}{2} ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + γ (∥ \underline{X} ∥_{TNN} + λ ∥ \underline{R} ∥_{1}) \\ + 〈{\underline{Y}}_{1}, \underline{Q} * \underline{X} - \underline{L}〉 + \frac{ρ}{2} ∥ \underline{Q} * \underline{X} - \underline{L} ∥_{F}^{2} + 〈{\underline{Y}}_{2}, \underline{R} - \underline{S}〉 + \frac{ρ}{2} {∥ \underline{R} - \underline{S} ∥}_{F}^{2} \\ s . t . {\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r} . \end{matrix}

(41)

According the strategy of ADMM, we update prime variables

(\underline{L}, \underline{S})

and

(\underline{Q}, \underline{X}, \underline{R})

by alternative minimization of AL in Problem (41) as follows

Update $(\underline{L}, \underline{S})$ : We update $(\underline{L}, \underline{S})$ by minimizing $L_{ρ}^{'}$ with other variables fixed as follows:

$\begin{matrix} ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1}) \\ = \underset{(\underline{L}, \underline{S})}{argmin} L_{ρ}^{'} (\underline{L}, \underline{S}, {\underline{Q}}^{t}, {\underline{X}}^{t}, {\underline{R}}^{t}, {\underline{Y}}_{1}^{t}, {\underline{Y}}_{2}^{t}) \\ = \underset{(\underline{L}, \underline{S})}{argmin} \frac{1}{2} ∥ \underline{B} ⊙ (\underline{L} + \underline{S} - \underline{M}) ∥_{F}^{2} + 〈{\underline{Y}}_{1}^{t}, {\underline{Q}}^{t} * {\underline{X}}^{t} - \underline{L}〉 + \frac{ρ}{2} ∥ {\underline{Q}}^{t} * {\underline{X}}^{t} - \underline{L} ∥_{F}^{2} + 〈{\underline{Y}}_{2}^{t}, {\underline{R}}^{t} - \underline{S}〉 + \frac{ρ}{2} {∥ {\underline{R}}^{t} - \underline{S} ∥}_{F}^{2} . \end{matrix}$

(42)

Taking derivatives of the right-hand side with respect to $\underline{L}$ and $\underline{S}$ respectively, and setting the results zero, we obtain:

$\begin{matrix} \underline{B} ⊙ ({\underline{L}}^{t + 1} + {\underline{S}}^{t + 1}) - \underline{B} ⊙ \underline{M} - {\underline{Y}}_{1}^{t} + ρ ({\underline{L}}^{t + 1} - {\underline{Q}}^{t} * {\underline{X}}^{t}) & = \underline{0} \\ \underline{B} ⊙ ({\underline{L}}^{t + 1} + {\underline{S}}^{t + 1}) - \underline{B} ⊙ \underline{M} - {\underline{Y}}_{2}^{t} + ρ ({\underline{S}}^{t + 1} - {\underline{R}}^{t}) & = \underline{0}, \end{matrix}$

(43)

Resolving the above equation group yields:

$\begin{matrix} {\underline{L}}^{t + 1} & = ((1 + ρ) {\underline{Q}}^{t} * {\underline{X}}^{t} + \underline{B} ⊙ \underline{M} + {\underline{Y}}_{1}^{t} - {\underline{R}}^{t}) ⊘ (2 \underline{B} + ρ \underline{1}), \\ {\underline{S}}^{t + 1} & = ((1 + ρ) {\underline{R}}^{t} + \underline{B} ⊙ \underline{M} + {\underline{Y}}_{2}^{t} - {\underline{Q}}^{t} * {\underline{X}}^{t}) ⊘ (2 \underline{B} + ρ \underline{1}) . \end{matrix}$

(44)
Update $\underline{Q}$ . We update $\underline{Q}$ by minimizing $L_{ρ}^{'}$ with other variables fixed as follows

$\begin{matrix} min_{{\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r}} L_{ρ} ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1}, \underline{Q}, {\underline{X}}^{t}, {\underline{R}}^{t}, {\underline{Y}}_{1}^{t}, {\underline{Y}}_{2}^{t}) \\ = min_{{\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r}} 〈{\underline{Y}}_{1}^{t}, \underline{Q} * {\underline{X}}^{t} - {\underline{L}}^{t + 1}〉 + \frac{ρ}{2} {∥ \underline{Q} * {\underline{X}}^{t} - {\underline{L}}^{t + 1} ∥}_{F}^{2} . \\ = min_{{\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r}} \frac{ρ}{2} {∥ \underline{Q} * {\underline{X}}^{t} - ({\underline{L}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{1}^{t}) ∥}_{F}^{2} \\ = P (({\underline{L}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{1}^{t}) * {({\underline{X}}^{t})}^{⊤}), \end{matrix}$

(45)

where operator $P (\cdot)$ is defined in Lemma 5 as follows.
Lemma 5.
([51]) Given any tensors $\underline{A} \in R^{r \times n_{2} \times n_{3}}, \underline{B} \in R^{n_{1} \times n_{2} \times n_{3}}$ , suppose tensor $\underline{B} * {\underline{A}}^{⊤}$ has t-SVD $\underline{B} * {\underline{A}}^{⊤} = \underline{U} * \underline{Λ} * {\underline{V}}^{⊤}$ , where $\underline{U} \in R^{n_{1} \times r \times n_{3}}$ and $\underline{V} \in R^{r \times r \times n_{3}}$ . Then, the problem:

$\begin{matrix} min_{{\underline{Q}}^{⊤} * \underline{Q} = {\underline{I}}_{r}} {∥ \underline{P} * \underline{A} - \underline{B} ∥}_{F}^{2} \end{matrix}$

(46)

has a closed-form solution as:

$\begin{matrix} \underline{Q} = P (\underline{B} * {\underline{A}}^{⊤}) : = \underline{U} * {\underline{V}}^{⊤} . \end{matrix}$

(47)
Update $(\underline{X}, \underline{R})$ :We update $(\underline{X}, \underline{S})$ by minimizing $L_{ρ}^{'}$ with other variables fixed as follows:

$\begin{matrix} min_{(\underline{X}, \underline{R})} L_{ρ} ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1}, {\underline{Q}}^{t + 1}, \underline{X}, \underline{R}, {\underline{Y}}_{1}^{t}, {\underline{Y}}_{2}^{t}) \\ = min_{(\underline{X}, \underline{R})} γ ∥ \underline{X} ∥_{TNN} + γ λ ∥ \underline{R} ∥_{1} + 〈{\underline{Y}}_{1}^{t}, {\underline{Q}}^{t + 1} * \underline{X} - {\underline{L}}^{t + 1}〉 + \frac{ρ}{2} {∥ {\underline{Q}}^{t + 1} * \underline{X} - {\underline{L}}^{t + 1} ∥}_{F}^{2} \\ + 〈{\underline{Y}}_{2}^{t}, \underline{R} - {\underline{S}}^{t + 1}〉 + \frac{ρ}{2} {∥ \underline{R} - {\underline{S}}^{t + 1} ∥}_{F}^{2} . \end{matrix}$

(48)

Please note that Problem (48) can further be solved separately as follows:

$\begin{matrix} {\underline{K}}^{t + 1} & = \underset{\underline{X}}{argmin} γ ∥ \underline{X} ∥_{TNN} + 〈{\underline{Y}}_{1}^{t}, {\underline{Q}}^{t + 1} * \underline{X} - {\underline{L}}^{t + 1}〉 + \frac{ρ}{2} {∥ {\underline{Q}}^{⊤} * \underline{X} - {\underline{L}}^{t + 1} ∥}_{F}^{2} \\ = \underset{\underline{X}}{argmin} γ ∥ \underline{X} ∥_{TNN} + \frac{ρ}{2} {∥ {\underline{Q}}^{t + 1} * \underline{X} - ({\underline{L}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{1}^{t}) ∥}_{F}^{2} \\ \overset{(i)}{=} \underset{\underline{X}}{argmin} γ ∥ \underline{X} ∥_{TNN} + \frac{ρ}{2} {∥ \underline{X} - {({\underline{Q}}^{t + 1})}^{⊤} * ({\underline{L}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{1}^{t}) ∥}_{F}^{2} \\ = S_{γ ρ^{- 1}}^{{∥ \cdot ∥}_{TNN}} ({({\underline{Q}}^{t + 1})}^{⊤} * ({\underline{L}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{1}^{t}) .) \end{matrix}$

(49)

and

$\begin{matrix} {\underline{R}}^{t + 1} & = \underset{\underline{R}}{argmin} γ λ ∥ \underline{R} ∥_{1} + 〈{\underline{Y}}_{1}^{t}, \underline{R} - {\underline{S}}^{t + 1}〉 + \frac{ρ}{2} {∥ \underline{R} - {\underline{S}}^{t + 1} ∥}_{F}^{2} \\ = S_{γ λ ρ^{- 1}}^{{∥ \cdot ∥}_{1}} ({\underline{K}}^{t + 1} - ρ^{- 1} {\underline{Y}}_{2}^{t}) . \end{matrix}$

(50)

The equality $(i)$ in Equation (49) holds because according to ${\underline{Q}}^{⊤} * \underline{Q} = \underline{I}$ , we have:

$\begin{matrix} min_{\underline{X}} {∥ \underline{Q} * \underline{X} - \underline{Y} ∥}_{F}^{2} & = min_{\bar{X}} \frac{1}{n_{3}} {∥ \bar{Q} \cdot \bar{X} - \bar{Y} ∥}_{F}^{2} \\ = min_{\bar{X}} \frac{1}{n_{3}} ∥ \bar{Y} ∥_{F}^{2} - \frac{2}{n_{3}} 〈\bar{Q} \cdot \bar{X}, \bar{Y}〉 + \frac{1}{n_{3}} {∥ \bar{Q} \cdot \bar{X} ∥}_{F}^{2} \\ = min_{\bar{X}} \frac{1}{n_{3}} ∥ \bar{Y} ∥_{F}^{2} - \frac{2}{n_{3}} 〈\bar{X}, {\bar{Q}}^{H} \bar{Y}〉 + \frac{1}{n_{3}} {∥ \bar{X} ∥}_{F}^{2} \\ = min_{\bar{X}} \frac{1}{n_{3}} {∥ \bar{X} - {\bar{Q}}^{H} \bar{Y} ∥}_{F}^{2} \\ = min_{\underline{X}} \frac{ρ}{2} {∥ \underline{X} - {\underline{Q}}^{⊤} * \underline{Y} ∥}_{F}^{2} . \end{matrix}$

(51)
Update $({\underline{Y}}_{1}, {\underline{Y}}_{2})$ . The Lagrangian multipliers are updated by gradient ascent as follows:

$\begin{matrix} {\underline{Y}}_{1}^{t + 1} & = {\underline{Y}}_{1}^{t} + ρ ({\underline{Q}}^{t + 1} * {\underline{X}}^{t + 1} - {\underline{L}}^{t + 1}), \\ {\underline{Y}}_{2}^{t + 1} & = {\underline{Y}}_{2}^{t} + ρ ({\underline{R}}^{t + 1} - {\underline{S}}^{t + 1}) . \end{matrix}$

(52)

The algorithmic steps are summarized in Algorithm 2. The complexity analysis is given as follows.

In each iteration of Algorithm 2, the update of

\underline{L}

requires FFT/IFFT, and

n_{3}

multiplications of

n_{1}

-by-r and r-by-

n_{2}

matrices, which costs

O ((n_{1} n_{2} + r n_{1} + r_{n} 2) n_{3} log n_{3} + r n_{1} n_{2} n_{3})

; updating

\underline{S}

costs

O (n_{1} n_{2} n_{3})

; updating of

\underline{Q}

involves FFT/IFFT and

n_{3}

SVDs of

n_{1}

-by-r matrices, which costs

O (r n_{1} n_{3} log n_{3} + r^{2} n_{1} n_{3})

; updating

\underline{X}

involves FFT/IFFT and

n_{3}

SVDs of r-by-

n_{2}

, which costs

O (r n_{2} n_{3} log n_{3} + r^{2} n_{2} n_{3}))

. Then, the per-iteration computational complexity of Algorithm 2 is dominated by:

\begin{matrix} O (max \{n_{1} n_{2} n_{3} log n_{3}, r^{2} (n_{1} + n_{2}) n_{3}\}) . \end{matrix}

Since the low-tubal-rank assumption

r ≪ min {n_{1}, n_{2}}

is adopted in this paper, the per-iteration of Algorithm 2 is much lower than Algorithm 1.

Algorithm 2 Solving Problem (40) using ADMM.

Input:

The observed tensor

\underline{M}

, an upper estimation r of

r_{tubal} ({\underline{L}}_{0})

, the parameters

γ, λ, ρ, δ

.

1:

Initialize

t = 0

,

{\underline{L}}^{0} = {\underline{S}}^{0} = {\underline{R}}^{0} = {\underline{Y}}_{1}^{0} = {\underline{Y}}_{2}^{0} = \underline{0} \in R^{n_{1} \times n_{2} \times n_{3}}

,

{\underline{Q}}^{0} = \underline{0} \in R^{n_{1} \times r \times n_{3}}

,

{\underline{X}}^{0} = \underline{0} \in R^{r \times n_{2} \times n_{3}}

.

2:

for

t = 0, \dots, T_{max}

do

3:

Update

({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1})

by Equation (42);

4:

Update

{\underline{Q}}^{t + 1}

by Equation (45);

5:

Update

({\underline{X}}^{t + 1}, {\underline{R}}^{t + 1})

by Equations (49)–(50);

6:

Update

({\underline{Y}}_{1}^{t + 1}, {\underline{Y}}_{2}^{t + 1})

by Equation (52);

7:

Check the convergence criteria:

(i): convergence of variables: $∥ {\underline{A}}^{t + 1} - {\underline{A}}^{t} ∥_{\infty} \leq δ, \forall \underline{A} \in {\underline{L}, \underline{S}, \underline{R}, \underline{Q}, \underline{X}}$
(ii): convergence of constraints: $max {∥ {\underline{Q}}^{t + 1} * {\underline{X}}^{t + 1} - {\underline{L}}^{t} ∥_{\infty}, ∥ {\underline{R}}^{t + 1} - {\underline{S}}^{t + 1} ∥_{\infty}} \leq δ$ .

8:

end for

Output:

(\hat{\underline{L}}, \hat{\underline{S}}) = ({\underline{L}}^{t + 1}, {\underline{S}}^{t + 1})

.

5. Experiments

5.1. Synthetic Data

We first verify the correctness of Theorem 1. Specifically, we check whether the following two statements indicated in Theorem 1 hold in experiments on synthetic data sets:

(I).: (Exact recovery in the noiseless setting.) Our analysis guarantees that the underlying low-rank tensor ${\underline{L}}_{0}$ and sparse tensor ${\underline{S}}_{0}$ can be exactly recovered in the noiseless setting. This statement will be checked in Section 5.1.1.
(II).: (Linear scaling of errors with the noise level.) In Theorem 1, the estimation errors on ${\underline{L}}_{0}$ and ${\underline{S}}_{0}$ scales linearly with the noise level $δ$ . This statement will be checked in Section 5.1.2.

Signal Generation. With a given tubal rank

r_{0}

, we first generate the underlying tensor

{\underline{L}}_{0} \in R^{n_{1} \times n_{2} \times n_{3}}

by

{\underline{L}}_{0} = \underline{A} * \underline{B} / n_{3}

, where tensors

\underline{A} \in R^{n_{1} \times r_{0} \times n_{3}}

and

\underline{B} \in R^{r_{0} \times n_{2} \times n_{3}}

are generated with i.i.d. standard Gaussian elements. Then, the sparse corruption tensor

{\underline{S}}_{0}

is generated by choosing its support uniformly at random. The non-zero elements of

{\underline{S}}_{0}

will be i.i.d. sampled from a certain distribution that will be specified afterwards. Furthermore, the noise tensor

{\underline{E}}_{0}

is generated with entries sampled i.i.d. from

N (0, σ^{2})

with

σ = c ∥ {\underline{L}}_{0} ∥_{F} / \sqrt{n_{1} n_{2} n_{3}}

, where we set constant c is to control the signal noise ratio. Finally, the observed tensor

\underline{M}

is formed by

\underline{M} = {\underline{L}}_{0} + {\underline{S}}_{0} + {\underline{E}}_{0}

.

5.1.1. Exact Recovery in the Noiseless Setting

We first check Statement (I), i.e., exact recovery in the noiseless setting. Specifically, we will show that Algorithm 1 and Algorithm 2 can exactly recover the underlying tensor

{\underline{L}}_{0}

and the sparse corruption

{\underline{S}}_{0}

. We first test the recovery performance of different tensor sizes by setting

n = n_{1} = n_{2} \in {100, 160, 200}

and

n_{3} = 20

, with

(r_{tubal} ({\underline{L}}_{0}), ∥ {\underline{S}}_{0} ∥_{0}) = (0.05 n, 0.05 n^{2} n_{3})

. The non-zero elements of tensor

{\underline{S}}_{0}

is sampled from i.i.d. symmetric Bernoulli distribution, i.e., the possibility of being 1 or −1 are 1/2. The results are shown in Table 1. It can be seen that both Algorithm 1 and Algorithm 2 can obtain relative standard error (RSE) smaller than

1 e - 5

by which we can say that

{\underline{L}}_{0}

and

{\underline{S}}_{0}

are exact recovered. We can also see that Algorithm 2 runs much faster than Algorithm 1.

We then test whether the recovery performance can be affected by the distribution of the corruptions. This is done by choosing the non-zeros elements of

{\underline{S}}_{0}

from i.i.d. standard Gaussian distribution. The experimental results are reported in Table 2. We can find that both Algorithm 1 and Algorithm 2 can exactly recover the true

{\underline{L}}_{0}

and

{\underline{S}}_{0}

and Algorithm 2 runs much faster than Algorithm 1.

We also conduct STPCP by Algorithm 1 and Algorithm 2 with missing entries. After generating

{\underline{L}}_{0}

,

{\underline{S}}_{0}

and

{\underline{E}}_{0}

, we get the observation by Model (27). We choose the support of

\underline{B}

uniformly at random with possibility

0.8

and then set elements in the chosen support to be 1. Thus, %20 of the entries are missing. The corrupted observation M is then formed by

\underline{M} = \underline{B} ⊙ ({\underline{L}}_{0} + {\underline{S}}_{0} + {\underline{E}}_{0})

. We show the recover results in Table 3. We can see that the underlying low-rank tensor

{\underline{L}}_{0}

can be exactly recovered and the observed part of the corruption tensor

\underline{B} ⊙ {\underline{S}}_{0}

can also be exactly recovered (Please note that it is impossible to recover the unobserved entries of a sparse tensor

{\underline{S}}_{0}

[52]).

5.1.2. Linear Scaling of Errors with the Noise Level

We then verify Statement (II) that the estimation errors have linear scale behavior with respect to the noise level. The estimation errors are measured using the mean-squared-error (MSE):

\begin{matrix} M S E (\hat{\underline{L}}) = \frac{∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F}^{2}}{n_{1} n_{2} n_{3}}, M S E (\hat{\underline{S}}) = \frac{∥ \hat{\underline{S}} - {\underline{S}}_{0} ∥_{F}^{2}}{n_{1} n_{2} n_{3}}, \end{matrix}

for the low rank component

{\underline{L}}_{0}

and the sparse component

{\underline{S}}_{0}

, respectively. We test tensors of 3 different size by choosing

n \in {60, 80, 100}

and

n_{3} = 20

. The tubal rank

r_{tubal} ({\underline{L}}_{0})

of

{\underline{L}}_{0}

and sparsity s of

{\underline{S}}_{0}

are set as

(r_{tubal} ({\underline{L}}_{0}), s) = (5, 0.1 n^{2} n_{3})

. We vary the signal noise ratio

c = 0.03 : 0.03 : 0.6

which is in proportional of the noise level

δ

. We run the proposed Algorithm 1, test 50 trials, and report the averaged MSEs. The MSEs of

\hat{\underline{L}}

and

{\underline{S}}_{0}

versus

c^{2}

are shown in sub-figures (a) and (b) in Figure 4. We can see that the estimation error has linear scaling behavior along with the noise level as Theorem 1 indicates. Since the results for

n = 80

and

n = 100

are quite similar to the case of

n = 60

, they are simply omitted.

5.2. Real Data Sets

In this section, experiments on real data sets (color images and videos) are carried out to evaluate the effectiveness and efficiency of the proposed Algorithms 1 and 2. Besides noises and sparse corruptions, we also consider missing values which is more challenging. The proposed algorithms are compared with the following typical models:

(I).: NN-I: tensor recovery based on matrix nuclear norms of frontal slices formulated as follows:

$\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{B} ⊙ (\underline{M} - \underline{L} - \underline{E}) ∥_{F} + γ \sum_{k = 1}^{n_{3}} (∥ L_{(k)} ∥_{*} + λ ∥ S^{(k)} ∥_{1}) . \end{matrix}$

(53)

This model will be used for image restoration in Section 5.2.1. Please note that Model (53) is equivalent to parallel matrix recovery on each frontal slice.
(II).: NN-II: tensor recovery based on matrix nuclear norm formulated as follows:

$\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{B} ⊙ (\underline{M} - \underline{L} - \underline{E}) ∥_{F} + {γ ∥ L ∥}_{*} + γ λ {∥ \underline{S} ∥}_{1}, \end{matrix}$

(54)

where $L = [l_{1}, l_{2}, \dots, l_{n_{3}}] \in R^{n_{1} n_{2} \times n_{3}}$ with $l_{k} : = vec (L^{(k)}) \in R^{n_{1} n_{2}}$ defined as the vectorization [40] of frontal slices $L^{(k)}$ , for all $k = 1, 2, \dots, n_{3}$ . This model will be used for video restoration in Section 5.2.2.
(III).: SNN: tensor recovery based on SNN formulated as follows:

$\begin{matrix} min_{\underline{L}, \underline{S}} \frac{1}{2} ∥ \underline{B} ⊙ (\underline{M} - \underline{L} - \underline{E}) ∥_{F} + γ \sum_{i = 1}^{3} α_{m} ∥ L_{(i)} ∥_{*} + γ {∥ \underline{S} ∥}_{1}, \end{matrix}$

(55)

where $L_{(i)} \in R^{n_{i} \times \prod_{j \neq i} n_{j}}$ is the mode-i matriculation of tensor $\underline{L} \in R^{n_{1} \times n_{2} \times n_{3}}$ , for all $i = 1, 2, 3$ .

We solve the above Model (53)–(55) using ADMM implemented by ourselves in Matlab. The effectiveness of the algorithms is measured by Peaks Signal Noise Ratio (PSNR):

\begin{matrix} PSNR : = 10 {log}_{10} (\frac{n_{1} n_{2} n_{3} {∥ {\underline{L}}_{0} ∥}_{\infty}^{2}}{∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F}^{2}}) ., \end{matrix}

Please note that a larger PSNR value indicates higher quality of

\hat{\underline{L}}

.

5.2.1. Color Images

Color images are the most commonly used 3-way tensors. We test the twenty 256-by-256-by-3 color images which have been used in [37], and carry out robust tensor recovery with missing entries (see Figure 5). Following [37], for a color image

{\underline{L}}_{0} \in R^{n \times n \times 3}

, we choose its support uniformly at random with ratio

ρ_{s}

and fill in the values with i.i.d. symmetric Bernoulli variables to generate

{\underline{S}}_{0}

. The noise tensor

{\underline{E}}_{0}

is generated with i.i.d. zero-mean Gaussian entries whose standard deviation is given by

σ = 0.05 ∥ {\underline{L}}_{0} ∥_{F} / \sqrt{3 n^{2}}

. Then, we form the binary observation mask

\underline{B}

by choosing its support uniformly at random with ratio

ρ_{obs}

. Finally, the partially observed corruption

\underline{M} = \underline{B} ⊙ ({\underline{L}}_{0} + {\underline{S}}_{0} + {\underline{E}}_{0})

are formed.

We consider two scenarios by setting

(ρ_{obs}, ρ_{s}) \in {(0.9, 0.1), (0.8, 0.2)}

. For NN (Model (53)), we set the regularization parameters

λ = 1 / \sqrt{n ρ_{obs}}

(suggested by [46]), and set parameter

γ = ∥ {\underline{E}}_{0} ∥_{sp}

where

∥ {\underline{E}}_{0} ∥_{sp}

is estimated as

6.5 σ \sqrt{3 ρ_{obs} n log (6 n)}

(suggested by [5]). For SNN, the parameters are chosen to satisfy

γ = 0.05

,

α_{1} = α_{2} = \sqrt{3 n ρ_{obs}}, α_{3} = 0.01 \sqrt{3 n ρ_{obs}}

. For Algorithm 1 and Algorithm 2, we set

γ = 0.3 ∥ {\underline{E}}_{0} ∥_{sp}

, and

λ = 1 / \sqrt{3 n ρ_{obs}}

. The initialized rank r in Algorithm 2 is set as 60. In each setting, we test each color image for 10 times and report the averaged PSNR and time. For quantitative comparison, we show the PSNR values and running times in Figure 6 and Figure 7 for settings of

(ρ_{obs}, ρ_{s}) = (0.9, 0.1)

and

(ρ_{obs}, ρ_{s}) = (0.8, 0.2)

, respectively. Several visual examples are shown in Figure 8 for qualitative comparison for the setting of

(ρ_{obs}, ρ_{s}) = (0.8, 0.2)

. We can see from Figure 6, Figure 7 and Figure 8 that the proposed Algorithm 1 has the highest recovery quality and the proposed Algorithm 2 has the second highest quality but the fastest running time.

5.2.2. Videos

In this subsection, video restoration experiments are conducted on four broadly used YUV videos (They can be downloaded from https://sites.google.com/site/subudhibadri/fewhelpfuldownloads: Akiyo_qcif, Scilent_qcif, Carphone_qcif, and Claire_qcif.) Owing to computational limitation, we simply use the first 32 frames of the Y components of all the videos which results in four 144-by-176-by-30 tensors. For a 3-way data tensor

{\underline{L}}_{0} \in R^{n_{1} \times n_{2} \times n_{3}}

, To generate corruption

{\underline{S}}_{0}

, the support is chosen uniformly at random with ratio

ρ_{s}

and then elements in the support are filled in with i.i.d. symmetric Bernoulli variables. The noise tensor

{\underline{E}}_{0}

is also generated with i.i.d. zero-mean Gaussian entries whose standard deviation is given by

σ = 0.05 ∥ {\underline{L}}_{0} ∥_{F} / \sqrt{n_{1} n_{2} n_{3}}

. Then, the binary observation mask

\underline{B}

is formed thorough choosing its support uniformly at random with ratio

ρ_{obs}

. Finally, the partially observed corruption

\underline{M} = \underline{B} ⊙ ({\underline{L}}_{0} + {\underline{S}}_{0} + {\underline{E}}_{0})

are formed.

We also consider two scenarios by setting

(ρ_{obs}, ρ_{s}) \in {(0.9, 0.1), (0.8, 0.2)}

. NN-II Model (54) is used in video restoration. For NN-II, we set the regularization parameters

λ = 1 / \sqrt{n_{1} n_{2} ρ_{obs}}

(suggested by [46]), and set parameter

γ = ∥ {\underline{E}}_{0} ∥_{sp}

where

∥ {\underline{E}}_{0} ∥_{sp}

is estimated as

6.5 σ \sqrt{ρ_{obs} n_{1} n_{3} log ((n_{1} + n_{2}) n_{3})}

(suggested by [5]). For SNN, the parameters are chosen to satisfy

γ = 0.05

,

α_{1} = α_{2} = \sqrt{n_{1} n_{3} ρ_{obs}}, α_{3} = 5 \sqrt{n_{1} n_{3} ρ_{obs}}

. For Algorithm 1 and Algorithm 2, we set

γ = 0.3 ∥ {\underline{E}}_{0} ∥_{sp}

, and

λ = 1 / \sqrt{max {n_{1}, n_{2}} n_{3} ρ_{obs}}

after careful parameter tuning. The initialized rank r in Algorithm 2 is set as 60. In each setting, we test each video for 10 times and report the averaged PSNR and time. For quantitative comparison, we show the PSNR values and running times in Table 4. It can be seen that Algorithm 1 has the highest recovery quality and the proposed Algorithm 2 has the second highest quality but the fastest running time.

6. Conclusions

This paper studied the problem of stable tensor principal component pursuit which aims to recover a tensor from noises and sparse corruptions. We proposed a constrained tubal nuclear norm-based model and established upper bounds on the estimation error. In contrast to prior work [37], our theory can guarantee exact recovery in the noiseless setting. We also designed two algorithms, the first ADMM algorithm can be accelerated by the second Algorithm which adopts a factorization strategy. We validated the correctness of our theory by simulations on synthetic data, and evaluated the effectiveness and efficiency of the proposed algorithms via experiments on color images and videos.

For future directions, it is a natural and interesting extension to consider recovery of 4-way tensors [35] with arbitrary linear transformation [53,54]. It is also interesting to use tensor factorization-based methods [55,56] for STPCP. Another challenging future direction is developing tools to verify whether the unknown tensor satisfies the tensor incoherence condition from its incomplete or corrupted observations.

For extensions of the proposed approach to higher-way tensors, we produce the following two ideas:

By recursively applying DFT over successive modes higher than 3 and then unfolding the obtained tensor into 3-way [57], the proposed algorithms and theoretical analysis can be extended to higher-way tensors.
By using the overlapped orientation invariant tubal nuclear norm [58], we can extend the proposed algorithm to higher-order cases and obtain orientation invariance.

Author Contributions

Conceptualization, W.F. Data curation, D.W. and R.Z. Formal analysis, W.F. Methodology, W.F., D.W and R.Z. Software, D.W. and R.Z. Writing, original draft, W.F., D.W. and R.Z.

Funding

This research was funded by the Key Projects of Natural Science Research in Universities in Anhui Province under grant number KJ2019A0994.

Acknowledgments

We sincerely thank Andong Wang who shared the codes of [37] and gave us some suggestions of the proof.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Lemmas and Theorems

Appendix A.1. The Proof of Theorem 1

Appendix A.1.1. Key Lemmas for the Proof of Theorem 1

Before Proving Theorem 1, we should define some notations and operators first.

Suppose

{\underline{L}}_{0} \in R^{n_{1} \times n_{2} \times n_{3}}

with tubal rank r has the skinny t-SVD

{\underline{L}}_{0} = \underline{U} * \underline{Λ} * {\underline{V}}^{⊤}

, where

\underline{U} \in R^{n_{1} \times r \times n_{3}}, \underline{V} \in R^{r \times n_{2} \times n_{3}}

are orthogonal tensors, and

\underline{Λ} \in R^{r \times r \times n_{3}}

is an f-diagonal tensor. Define the following set:

\begin{matrix} T : = \{\underline{U} * \underline{A} + \underline{B} * {\underline{V}}^{⊤} | \underline{A} \in R^{r \times n_{2} \times n_{3}}, \underline{B} \in R^{n_{1} \times r \times n_{3}}\} \subset R^{n_{1} \times n_{2} \times n_{3}} . \end{matrix}

(A1)

Then, define the projector onto

T

for any tensor

\underline{T} \in R^{n_{1} \times n_{2} \times n_{3}}

as follows:

\begin{matrix} P_{T} (\underline{T}) & : = \underline{U} * {\underline{U}}^{⊤} * \underline{T} + \underline{T} * \underline{V} * {\underline{V}}^{⊤} - \underline{U} * {\underline{U}}^{⊤} * \underline{T} * \underline{V} * {\underline{V}}^{⊤}, \\ P_{T^{⊥}} (\underline{T}) & : = (\underline{I} - \underline{U} * {\underline{U}}^{⊤}) * \underline{T} * (\underline{I} - \underline{V} * {\underline{V}}^{⊤}) . \end{matrix}

(A2)

Let

Ω^{⊥}

be the complement of

Ω \subset [n_{1}] \times [n_{2}] \times [n_{3}]

which is the support of

{\underline{S}}_{0}

. Then, define two operators

P_{Ω}, P_{Ω^{⊥}}

as follows:

\begin{matrix} P_{Ω} (\underline{T}) : = \sum_{(i, j, k) \in Ω} 〈\underline{T}, {\overset{˚}{e}}_{i} * {\dot{e}}_{k} * {\overset{˚}{e}}_{j}^{⊤}〉, P_{Ω^{⊥}} (\underline{T}) : = \sum_{(i, j, k) \in Ω^{⊥}} 〈\underline{T}, {\overset{˚}{e}}_{i} * {\dot{e}}_{k} * {\overset{˚}{e}}_{j}^{⊤}〉, \end{matrix}

(A3)

for any

\underline{T} \in R^{n_{1} \times n_{2} \times n_{3}}

.

Define two sets

Γ

and

Γ^{⊥}

as follows:

\begin{matrix} Γ = {(\underline{A}, \underline{A}) | \underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}}, Γ^{⊥} = {(\underline{A}, - \underline{A}) | \underline{A} \in R^{n_{1} \times n_{2} \times n_{3}}} . \end{matrix}

(A4)

Then, for any tensors

{\underline{X}}_{ι}, {\underline{X}}_{s} \in R^{n_{1} \times n_{2} \times n_{3}}

, the projectors of the tensor

\underline{X} = ({\underline{X}}_{ι}, {\underline{X}}_{s})

into the sets

Γ

and

Γ^{⊥}

are given as follows, respectively:

\begin{matrix} P_{Γ} (\underline{X}) = (\frac{{\underline{X}}_{ι} + {\underline{X}}_{s}}{2}, \frac{{\underline{X}}_{ι} + {\underline{X}}_{s}}{2}), P_{Γ^{⊥}} (\underline{X}) = (\frac{{\underline{X}}_{ι} - {\underline{X}}_{s}}{2}, \frac{{\underline{X}}_{s} - {\underline{X}}_{ι}}{2}) . \end{matrix}

(A5)

For any tensors

{\underline{X}}_{ι}, {\underline{X}}_{s} \in R^{n_{1} \times n_{2} \times n_{3}}

, define two operators on

\underline{X} = ({\underline{X}}_{ι}, {\underline{X}}_{s})

as follows:

\begin{matrix} (P_{T} \times P_{Ω}) (\underline{X}) = (P_{T} ({\underline{X}}_{ι}), P_{Ω} ({\underline{X}}_{s})), (P_{T^{⊥}} \times P_{Ω^{⊥}}) (\underline{X}) = (P_{T^{⊥}} ({\underline{X}}_{ι}), P_{Ω^{⊥}} ({\underline{X}}_{s})) . \end{matrix}

(A6)

Also define two norms as follows:

\begin{matrix} ∥ \underline{X} ∥_{F} = \sqrt{∥ {\underline{X}}_{ι} ∥_{F}^{2} + {∥ {\underline{X}}_{s} ∥}_{F}^{2}}, {∥ \underline{X} ∥}_{F, μ} = \sqrt{∥ {\underline{X}}_{ι} ∥_{F}^{2} + μ^{2} {∥ {\underline{X}}_{s} ∥}_{F}^{2}} . \end{matrix}

(A7)

where

μ

is a constant that will be determined afterwards.

We first give Lemma A1 which can be seen as a modified version of Lemma C.1 in [2].

Lemma A1.

Assume that

∥ P_{Ω} P_{T} ∥ \leq \frac{1}{2}

, and

λ \leq \frac{1}{2 \sqrt{n_{3}}}

. Suppose there exists a tensor

{\underline{G}}^{*}

satisfying the following conditions:

\begin{matrix} \{\begin{matrix} P_{T} ({\underline{G}}^{*}) = \underline{U} * {\underline{V}}^{⊤}, \\ ∥ P_{T^{⊥}} ({\underline{G}}^{*}) ∥_{sp} \leq \frac{1}{2}, \\ ∥ P_{Ω} ({\underline{G}}^{*} - λ sign ({\underline{S}}_{0})) ∥_{F} \leq \frac{λ}{4}, \\ ∥ P_{Ω^{⊥}} ({\underline{G}}^{*}) ∥_{\infty} \leq \frac{λ}{2} . \end{matrix} \end{matrix}

(A8)

Then for any perturbation

\underline{Δ} \in R^{n_{1} \times n_{2} \times n_{3}}

, one has:

\begin{matrix} ∥ {\underline{L}}_{0} + \underline{Δ} ∥_{TNN} + λ {∥ {\underline{S}}_{0} - \underline{Δ} ∥}_{1} \\ \geq ∥ {\underline{L}}_{0} ∥_{TNN} + λ ∥ {\underline{S}}_{0} ∥_{1} + (\frac{3}{4} - {∥ P_{T^{⊥}} (\underline{G}) ∥}_{sp}) ∥ P_{T^{⊥}} (\underline{Δ}) ∥_{TNN} + (\frac{3}{4} λ - P_{Ω^{⊥}} (∥ \underline{G} ∥_{\infty})) {∥ P_{Ω^{⊥}} (Δ) ∥}_{1} . \end{matrix}

(A9)

Proof.

Let

{\underline{G}}_{ι} \in \partial {∥ {\underline{L}}_{0} ∥}_{TNN}

, i.e., any sub-gradient of

{∥ \cdot ∥}_{TNN}

at

{\underline{L}}_{0}

, then it satisfies:

\begin{matrix} P_{T} ({\underline{G}}_{ι}) = \underline{U} * {\underline{V}}^{⊤}, {∥ P_{T^{⊥}} ({\underline{G}}_{ι}) ∥}_{sp} \leq 1 . \end{matrix}

(A10)

{\underline{G}}_{ι} \in \partial {∥ {\underline{L}}_{0} ∥}_{TNN}

and

{\underline{G}}_{s} \in \partial (λ ∥ {\underline{S}}_{0} ∥_{1})

. According to the convexity of

{∥ \cdot ∥}_{TNN}

and

{∥ \cdot ∥}_{1}

, we have:

\begin{matrix} ∥ {\underline{L}}_{0} + \underline{Δ} ∥_{TNN} \geq ∥ {\underline{L}}_{0} ∥_{TNN} + 〈{\underline{G}}_{ι}, \underline{Δ}〉, λ ∥ {\underline{S}}_{0} - \underline{Δ} ∥_{1} \geq λ {∥ {\underline{S}}_{0} ∥}_{1} - 〈{\underline{G}}_{s}, \underline{Δ}〉 . \end{matrix}

(A11)

By choosing

{\underline{G}}_{ι} = \underline{U} * {\underline{V}}^{⊤} + \underline{P} * {\underline{Q}}^{⊤}

, where

\underline{P}

and

\underline{Q}

comes from the skinny t-SVD of

P_{T^{⊥}} (\underline{Δ}) = \underline{P} * \sum_{_} * {\underline{Q}}^{⊤}

, one has:

\begin{matrix} 〈{\underline{G}}_{ι}, \underline{Δ}〉 & = 〈\underline{G}, \underline{Δ}〉 + 〈{\underline{G}}_{ι} - \underline{G}, \underline{Δ}〉 = 〈\underline{G}, \underline{Δ}〉 + 〈P_{T^{⊥}} ({\underline{G}}_{ι}), P_{T^{⊥}} (\underline{Δ})〉 - 〈P_{T^{⊥}} (\underline{G}), P_{T^{⊥}} (\underline{Δ})〉 \\ = 〈\underline{G}, \underline{Δ}〉 - (1 - ∥ P_{T^{⊥}} (\underline{G}) ∥_{sp}) ∥ P_{T^{⊥}} (\underline{Δ}) ∥_{TNN} . \end{matrix}

(A12)

Also, by choosing

{\underline{G}}_{s} = λ s i g n ({\underline{S}}_{0}) - s i g n (P_{Ω^{⊥}} (\underline{Δ}))

, one has:

\begin{matrix} - 〈{\underline{G}}_{s}, \underline{Δ}〉 & = - 〈\underline{G}, \underline{Δ}〉 - 〈{\underline{G}}_{s} - \underline{G}, \underline{Δ}〉 \\ = - 〈\underline{G}, \underline{Δ}〉 - 〈P_{Ω} (λ s i g n ({\underline{S}}_{0}) - \underline{G}), P_{Ω} (\underline{Δ})〉 - 〈P_{Ω^{⊥}} ({\underline{G}}_{s}), P_{Ω^{⊥}} (\underline{Δ})〉 + 〈P_{Ω^{⊥}} (\underline{G}), P_{Ω^{⊥}} (\underline{Δ})〉 \\ \geq - 〈\underline{G}, \underline{Δ}〉 - ∥ P_{Ω} (λ s i g n ({\underline{S}}_{0}) - \underline{G}) ∥_{F} ∥ P_{Ω} (\underline{Δ}) ∥_{F} + ∥ P_{Ω^{⊥}} (\underline{Δ}) ∥_{1} - ∥ P_{Ω^{⊥}} (\underline{G}) ∥_{\infty} {∥ P_{Ω^{⊥}} (\underline{Δ}) ∥}_{1} \\ \geq - 〈\underline{G}, \underline{Δ}〉 - \frac{λ}{4} ∥ P_{Ω} (\underline{Δ}) ∥_{F} + (1 - ∥ P_{Ω^{⊥}} (\underline{G}) ∥_{\infty}) ∥ P_{Ω^{⊥}} (\underline{Δ}) ∥_{1} \end{matrix}

(A13)

Also note that:

\begin{matrix} ∥ P_{Ω} (\underline{Δ}) ∥_{F} & \leq ∥ P_{Ω} P_{T} (\underline{Δ}) ∥_{F} + {∥ P_{Ω} P_{T^{⊥}} (\underline{Δ}) ∥}_{F} \\ \leq ∥ P_{Ω} P_{T} (\underline{Δ}) ∥_{F} + {∥ P_{Ω} P_{T^{⊥}} (\underline{Δ}) ∥}_{F} \\ \leq \frac{1}{2} ∥ \underline{Δ} ∥_{F} + {∥ P_{Ω} P_{T^{⊥}} (\underline{Δ}) ∥}_{F} \\ \leq \frac{1}{2} ∥ P_{Ω} (\underline{Δ}) ∥_{F} + \frac{1}{2} ∥ P_{Ω^{⊥}} (\underline{Δ}) ∥_{F} + {∥ P_{Ω} P_{T^{⊥}} (\underline{Δ}) ∥}_{F} \end{matrix}

(A14)

which leads to:

\begin{matrix} ∥ P_{Ω} (\underline{Δ}) ∥_{F} & \leq ∥ P_{Ω^{⊥}} (\underline{Δ}) ∥_{F} + 2 ∥ P_{Ω} P_{T^{⊥}} (\underline{Δ}) ∥_{F} \leq ∥ P_{Ω^{⊥}} (\underline{Δ}) ∥_{1} + 2 \sqrt{n_{3}} {∥ P_{T^{⊥}} (\underline{Δ}) ∥}_{TNN} . \end{matrix}

(A15)

Putting things together, we have:

\begin{matrix} ∥ {\underline{L}}_{0} + \underline{Δ} ∥_{TNN} + λ ∥ {\underline{S}}_{0} - \underline{Δ} ∥_{1} - (∥ {\underline{L}}_{0} ∥_{TNN} + λ ∥ {\underline{S}}_{0} ∥_{1}) \\ \geq (1 - \frac{λ \sqrt{n_{3}}}{2} - {∥ P_{T^{⊥}} (\underline{G}) ∥}_{sp}) ∥ P_{T^{⊥}} (\underline{Δ}) ∥_{TNN} + (\frac{3}{4} λ - P_{Ω^{⊥}} (∥ \underline{G} ∥_{\infty})) {∥ P_{Ω^{⊥}} (\underline{Δ}) ∥}_{1} . \end{matrix}

(A16)

Since

λ \leq \frac{1}{2 \sqrt{n_{3}}}

, it holds that:

\begin{matrix} ∥ {\underline{L}}_{0} + \underline{Δ} ∥_{TNN} + λ {∥ {\underline{S}}_{0} - \underline{Δ} ∥}_{1} \\ \geq ∥ {\underline{L}}_{0} ∥_{TNN} + λ ∥ {\underline{S}}_{0} ∥_{1} + (\frac{3}{4} - {∥ P_{T^{⊥}} (\underline{G}) ∥}_{sp}) ∥ P_{T^{⊥}} (\underline{Δ}) ∥_{TNN} + (\frac{3}{4} λ - P_{Ω^{⊥}} (∥ \underline{G} ∥_{\infty})) {∥ P_{Ω^{⊥}} (Δ) ∥}_{1}, \end{matrix}

for any perturbation

\underline{Δ} \in R^{n_{1} \times n_{2} \times n_{3}}

. □

Lemma A2.

Suppose that

∥ P_{Ω} P_{T} ∥ \leq 1 / 2

, then for any

\underline{X} = ({\underline{X}}_{ι}, {\underline{X}}_{s})

, we have:

\begin{matrix} ∥ P_{Γ} (P_{T} \times P_{Ω}) (\underline{X}) ∥_{F, μ}^{2} \geq \frac{1 + μ^{2}}{8} {∥ P_{T} \times P_{Ω} (\underline{X}) ∥}_{F}^{2} . \end{matrix}

(A17)

Proof.

According to the definitions of

P_{Γ}

and

P_{T} \times P_{Ω}

, we have:

\begin{matrix} P_{Γ} (P_{T} \times P_{Ω}) (\underline{X}) = (\frac{P_{T} ({\underline{X}}_{ι}) + P_{Ω} ({\underline{X}}_{s})}{2}, \frac{P_{T} ({\underline{X}}_{ι}) + P_{Ω} ({\underline{X}}_{s})}{2}) . \end{matrix}

(A18)

Then, we have:

\begin{matrix} ∥ P_{Γ} (P_{T} \times P_{Ω}) (\underline{X}) ∥_{F, μ}^{2} & = (1 + μ^{2}) \cdot \frac{1}{4} \cdot (∥ P_{T} ({\underline{X}}_{ι}) ∥_{F}^{2} + {∥ P_{Ω} ({\underline{X}}_{s}) ∥}_{F}^{2} + 2 〈P_{T} ({\underline{X}}_{ι}), P_{Ω} ({\underline{X}}_{s})〉) \\ \overset{}{=} \frac{(1 + μ^{2})}{4} (∥ P_{T} ({\underline{X}}_{ι}) ∥_{F}^{2} + {∥ P_{Ω} ({\underline{X}}_{s}) ∥}_{F}^{2} + 2 〈P_{Ω} P_{T} P_{T} ({\underline{X}}_{ι}), P_{Ω} ({\underline{X}}_{s})〉) \\ \overset{}{\geq} \frac{(1 + μ^{2})}{4} (∥ P_{T} ({\underline{X}}_{ι}) ∥_{F}^{2} + ∥ P_{Ω} ({\underline{X}}_{s}) ∥_{F}^{2} - 2 ∥ P_{Ω} P_{T} ∥ ∥ P_{T} ({\underline{X}}_{ι}) ∥_{F} {∥ P_{Ω} ({\underline{X}}_{s}) ∥}_{F}) \\ \overset{}{\geq} \frac{(1 + μ^{2})}{4} (∥ P_{T} ({\underline{X}}_{ι}) ∥_{F}^{2} + ∥ P_{Ω} ({\underline{X}}_{s}) ∥_{F}^{2} - 2 \cdot \frac{1}{2} ∥ P_{T} ({\underline{X}}_{ι}) ∥_{F} {∥ P_{Ω} ({\underline{X}}_{s}) ∥}_{F}) \\ \overset{}{\geq} \frac{(1 + μ^{2})}{4} (∥ P_{T} ({\underline{X}}_{ι}) ∥_{F}^{2} + {∥ P_{Ω} ({\underline{X}}_{s}) ∥}_{F}^{2} - \frac{∥ P_{T} ({\underline{X}}_{ι}) ∥_{F}^{2} + {∥ P_{Ω} ({\underline{X}}_{s}) ∥}_{F}^{2}}{2}) \\ = \frac{(1 + μ^{2})}{8} {∥ P_{T} \times P_{Ω} (\underline{X}) ∥}_{F}^{2} . \end{matrix}

(A19)

Hence completes the proof. □

Appendix A.1.2. Proof of Theorem 1

Proof.

For

\underline{X} = (\underline{L}, \underline{S})

, define

∥ \underline{X} ∥_{⋄} = ∥ \underline{L} ∥_{TNN} + λ {∥ \underline{S} ∥}_{1}

. Let

\underline{\hat{X}} = (\hat{\underline{L}}, \hat{\underline{S}})

,

{\underline{X}}^{*} = ({\underline{L}}_{0}, {\underline{S}}_{0})

. According to the optimality of

(\hat{\underline{L}}, \hat{\underline{S}})

and the feasibility of

({\underline{L}}_{0}, {\underline{S}}_{0})

, we directly have:

\begin{matrix} ∥ \underline{\hat{X}} ∥_{⋄} \leq {∥ {\underline{X}}^{*} ∥}_{⋄}, \end{matrix}

(A20)

\begin{matrix} ∥ \hat{\underline{L}} + \hat{\underline{S}} - \underline{M} ∥_{F} \leq δ, \end{matrix}

(A21)

\begin{matrix} ∥ {\underline{L}}_{0} + {\underline{S}}_{0} - \underline{M} ∥_{F} \leq δ . \end{matrix}

(A22)

Let

{\underline{Δ}}_{ι} = \hat{\underline{L}} - {\underline{L}}_{0}

,

{\underline{Δ}}_{s} = \hat{\underline{S}} - {\underline{S}}_{0}

. Then, we have:

\begin{matrix} ∥ {\underline{Δ}}_{ι} + {\underline{Δ}}_{s} ∥_{F} = ∥ \hat{\underline{L}} + \hat{\underline{S}} - \underline{M} - ({\underline{L}}_{0} + {\underline{S}}_{0} - \underline{M}) ∥_{F} \leq ∥ \hat{\underline{L}} + \hat{\underline{S}} - \underline{M} ∥_{F} + {∥ {\underline{L}}_{0} + {\underline{S}}_{0} - \underline{M} ∥}_{F} \leq 2 δ . \end{matrix}

(A23)

Define the pair of error tensors

\underline{Δ} = \underline{\hat{X}} - {\underline{X}}^{*} = ({\underline{Δ}}_{ι}, {\underline{Δ}}_{s})

. The goal is to bound

∥ \underline{Δ} ∥_{F, μ}

.

First, we use the decomposition

\underline{Δ} = P_{Γ} (\underline{Δ}) + P_{Γ^{⊥}} (\underline{Δ})

, and let

{\underline{Δ}}^{Γ} = P_{Γ} (\underline{Δ}) = ({\underline{Δ}}_{ι}^{Γ}, {\underline{Δ}}_{s}^{Γ}) = (\frac{{\underline{Δ}}_{ι} + {\underline{Δ}}_{s}}{2}, \frac{{\underline{Δ}}_{ι} + {\underline{Δ}}_{s}}{2}), {\underline{Δ}}^{Γ^{⊥}} = P_{Γ^{⊥}} (\underline{Δ}) = ({\underline{Δ}}_{ι}^{Γ^{⊥}}, {\underline{Δ}}_{s}^{Γ^{⊥}}) = (\frac{{\underline{Δ}}_{ι} - {\underline{Δ}}_{s}}{2}, \frac{{\underline{Δ}}_{s} - {\underline{Δ}}_{ι}}{2})

for simplicity. Then, we have:

\begin{matrix} ∥ \underline{Δ} ∥_{F, μ} = ∥ {\underline{Δ}}^{Γ} + {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ} \overset{}{\leq} ∥ {\underline{Δ}}^{Γ} ∥_{F, μ} + {∥ {\underline{Δ}}^{Γ^{⊥}} ∥}_{F, μ} . \end{matrix}

(A24)

Please note that

{\underline{Δ}}_{ι}^{Γ} = {\underline{Δ}}_{s}^{Γ} = \frac{{\underline{Δ}}_{ι} + {\underline{Δ}}_{s}}{2}

, thus

∥ {\underline{Δ}}^{Γ} ∥_{F, μ}

can be bounded easily as follows:

\begin{matrix} ∥ {\underline{Δ}}^{Γ} ∥_{F, μ} = \sqrt{∥ {\underline{Δ}}_{ι}^{Γ} ∥_{F}^{2} + μ^{2} {∥ {\underline{Δ}}_{s}^{Γ} ∥}_{F}^{2}} = \frac{\sqrt{1 + μ^{2}}}{2} {∥ {\underline{Δ}}_{ι} + {\underline{Δ}}_{s} ∥}_{F} \leq δ \sqrt{1 + μ^{2}} . \end{matrix}

(A25)

Then, it remains to bound

∥ {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ}

. Due to the triangular inequality:

\begin{matrix} ∥ {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ} \leq ∥ (P_{T} \times P_{Ω}) {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ} + {∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥}_{F, μ}, \end{matrix}

(A26)

(A) bound $∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ}$ . According to the convexity of

{∥ \cdot ∥}_{⋄}

we have:

\begin{matrix} ∥ {\underline{X}}^{*} + \underline{Δ} ∥_{⋄} = ∥ {\underline{X}}^{*} + {\underline{Δ}}^{Γ} + {\underline{Δ}}^{Γ^{⊥}} ∥_{⋄} \geq ∥ {\underline{X}}^{*} + {\underline{Δ}}^{Γ^{⊥}} ∥_{⋄} - {∥ {\underline{Δ}}^{Γ} ∥}_{⋄} . \end{matrix}

(A27)

Using Lemma A1, we have:

\begin{matrix} ∥ {\underline{X}}^{*} + {\underline{Δ}}^{Γ^{⊥}} ∥_{⋄} & \geq ∥ {\underline{X}}^{*} ∥_{⋄} + (\frac{3}{4} - {∥ P_{T^{⊥}} (\underline{G}) ∥}_{sp}) ∥ P_{T^{⊥}} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥_{TNN} + (\frac{3}{4} λ - P_{Ω^{⊥}} (∥ \underline{G} ∥_{\infty})) {∥ P_{Ω^{⊥}} ({\underline{Δ}}_{s}^{Γ^{⊥}}) ∥}_{1} \\ \geq ∥ {\underline{X}}^{*} ∥_{⋄} + \frac{1}{4} {∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥}_{⋄} . \end{matrix}

(A28)

Combining Equations (A20), (A27) and (A28), we have:

\begin{matrix} ∥ {\underline{Δ}}^{Γ} ∥_{⋄} \geq \frac{1}{4} {∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥}_{⋄} \end{matrix}

(A29)

Then, with

μ = \sqrt{n_{3}} λ

, we reach a bound on

∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ}

as follows:

\begin{matrix} ∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ} & \leq ∥ P_{T^{⊥}} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥_{F} + μ {∥ P_{Ω^{⊥}} ({\underline{Δ}}_{s}^{Γ^{⊥}}) ∥}_{F} \\ \leq \sqrt{n_{3}} ∥ P_{T^{⊥}} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥_{TNN} + μ {∥ P_{Ω^{⊥}} ({\underline{Δ}}_{s}^{Γ^{⊥}}) ∥}_{1} \\ \overset{}{\leq} \sqrt{n_{3}} (∥ P_{T^{⊥}} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥_{TNN} + λ {∥ P_{Ω^{⊥}} ({\underline{Δ}}_{s}^{Γ^{⊥}}) ∥}_{1}) \\ \leq \sqrt{n_{3}} {∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) {\underline{Δ}}^{Γ^{⊥}} ∥}_{⋄} \\ \overset{}{\leq} 4 \sqrt{n_{3}} {∥ {\underline{Δ}}^{Γ} ∥}_{⋄} \\ \leq 4 \sqrt{n_{3}} (∥ {\underline{Δ}}_{ι}^{Γ} ∥_{TNN} + λ {∥ {\underline{Δ}}_{s}^{Γ} ∥}_{1}) \\ \overset{}{\leq} 4 \sqrt{n_{3}} (\sqrt{min {n_{1}, n_{2}}} ∥ {\underline{Δ}}_{ι}^{Γ} ∥_{F} + λ \sqrt{n_{1} n_{2} n_{3}} {∥ {\underline{Δ}}_{s}^{Γ} ∥}_{F}) \\ \overset{}{=} 4 \sqrt{n_{3}} (\sqrt{min {n_{1}, n_{2}}} + λ \sqrt{n_{1} n_{2} n_{3}}) {∥ {\underline{Δ}}_{ι}^{Γ} ∥}_{F} \\ \overset{}{\leq} 4 \sqrt{n_{3}} (\sqrt{min {n_{1}, n_{2}}} + λ \sqrt{n_{1} n_{2} n_{3}}) δ . \end{matrix}

(A30)

(B) bound $∥ (P_{T} \times P_{Ω}) {\underline{Δ}}^{Γ^{⊥}} ∥_{F, μ}$ . Please note that:

\begin{matrix} P_{Γ} ({\underline{Δ}}^{Γ^{⊥}}) \overset{}{=} \underline{0} = P_{Γ} (P_{T} \times P_{Ω}) ({\underline{Δ}}^{Γ^{⊥}}) + P_{Γ} (P_{T^{⊥}} \times P_{Ω^{⊥}}) ({\underline{Δ}}^{Γ^{⊥}}), \end{matrix}

(A31)

which means:

\begin{matrix} ∥ P_{Γ} (P_{T} \times P_{Ω}) ({\underline{Δ}}^{Γ^{⊥}}) ∥_{F, μ} & = ∥ P_{Γ} (P_{T^{⊥}} \times P_{Ω^{⊥}} ({\underline{Δ}}^{Γ^{⊥}})) ∥_{F, μ} \overset{}{\leq} {∥ P_{T^{⊥}} \times P_{Ω^{⊥}} ({\underline{Δ}}^{Γ^{⊥}}) ∥}_{F, μ} . \end{matrix}

(A32)

According to Lemma A2, we have:

\begin{matrix} ∥ (P_{T} \times P_{Ω}) ({\underline{Δ}}^{Γ^{⊥}}) ∥_{F, μ} & \overset{}{\leq} ∥ P_{T} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥_{F} + μ {∥ P_{Ω} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥}_{F} \\ \overset{}{\leq} \sqrt{1 + μ^{2}} \sqrt{∥ P_{T} ({\underline{Δ}}_{ι}^{Γ^{⊥}}) ∥_{F}^{2} + {∥ P_{Ω} ({\underline{Δ}}_{s}^{Γ^{⊥}}) ∥}_{F}^{2}} \\ \overset{}{=} \sqrt{1 + μ^{2}} {∥ P_{T} \times P_{Ω} ({\underline{Δ}}^{Γ^{⊥}}) ∥}_{F} \\ \overset{}{\leq} \sqrt{1 + μ^{2}} \cdot \sqrt{\frac{8}{\sqrt{1 + μ^{2}}}} \cdot {∥ P_{Γ} (P_{T} \times P_{Ω}) ({\underline{Δ}}^{Γ^{⊥}}) ∥}_{F} . \end{matrix}

(A33)

According to Equations (A32) and (A33), we obtain:

\begin{matrix} ∥ (P_{T} \times P_{Ω}) ({\underline{Δ}}^{Γ^{⊥}}) ∥_{F, μ} \leq 2 \sqrt{2} {∥ (P_{T^{⊥}} \times P_{Ω^{⊥}}) ({\underline{Δ}}^{Γ^{⊥}}) ∥}_{F, μ} . \end{matrix}

(A34)

Thus, combing Equations (A24), (A25), (A30) and (A34), and setting

μ = \sqrt{n_{3}} λ

, we obtain:

\begin{matrix} ∥ \underline{Δ} ∥_{F, μ} \leq (\sqrt{1 + n_{3} λ^{2}} + 4 (1 + 2 \sqrt{2}) (\sqrt{min {n_{1}, n_{2}} n_{3}} + n_{3} λ \sqrt{n_{1} n_{2}})) δ . \end{matrix}

(A35)

Since

λ = \frac{1}{\sqrt{max {n_{1}, n_{2}} n_{3}}}

, we have:

\begin{matrix} ∥ \underline{Δ} ∥_{F, μ} \leq (\sqrt{1 + \frac{1}{max {n_{1}, n_{2}}}} + 8 (1 + 2 \sqrt{2}) \sqrt{min {n_{1}, n_{2}} n_{3}}) δ, \end{matrix}

(A36)

which indicates that:

\begin{matrix} ∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F} & \leq (\sqrt{1 + \frac{1}{max {n_{1}, n_{2}}}} + 8 (1 + 2 \sqrt{2}) \sqrt{min {n_{1}, n_{2}} n_{3}}) δ \\ ∥ \hat{\underline{S}} - {\underline{S}}_{0} ∥_{F} & \leq (\sqrt{1 + max {n_{1}, n_{2}}} + 8 (1 + 2 \sqrt{2}) \sqrt{n_{1} n_{2} n_{3}}) δ . \end{matrix}

(A37)

Moreover, according to the analysis in [2], the conditions

∥ P_{Ω} P_{T} ∥ \leq \frac{1}{2}

and Equation (A8) in Lemma A1 hold with probability at least

1 - c_{1} {(n_{3} max {n_{1}, n_{2}})}^{- c_{2}}

, where

c_{1}

and

c_{2}

are positive constants.

In this way, the proof of Theorem 1 is completed. □

Appendix A.2. Proof of Theorem 2

Proof.

The key idea is to rewrite Problem (29) into a standard two-block ADMM problem. For notational simplicity, let:

\begin{matrix} f (x) = \frac{1}{2} ∥ \underline{L} + \underline{S} - \underline{Y} ∥_{F}^{2}, g (z) = γ {∥ \underline{K} ∥}_{TNN} + γ λ R (\underline{S}), \end{matrix}

where

x, y, z

and

A

are defined as follows:

\begin{matrix} x = [\begin{matrix} vec (\underline{L}) \\ vec (\underline{S}) \end{matrix}], y = [\begin{matrix} vec ({\underline{Y}}_{1}) \\ vec ({\underline{Y}}_{2}) \end{matrix}], z = [\begin{matrix} vec (\underline{K}) \\ vec (\underline{R}) \end{matrix}], A = [\begin{matrix} diag (vec (\underline{B})) & 0 \\ 0 & diag (vec (\underline{B})) \end{matrix}], \end{matrix}

and

vec (\cdot)

denotes an operation of tensor vectorization (see [40]).

It can be verified that

f (\cdot)

and

g (\cdot)

are closed, proper convex functions. Then, Problem (29) can be re-written as follows:

\begin{matrix} min_{x, z} & f (x) + g (z) \\ s . t . & A x - z = 0 . \end{matrix}

According to the convergence analysis in [48], we have:

\begin{matrix} objective convergence : & lim_{t \to \infty} f (x^{t}) + g (z^{t}) = f^{⋆} + g^{⋆}, \\ dual variable convergence : & lim_{t \to \infty} y^{t} = y^{⋆}, \\ constraint convergence : & lim_{t \to \infty} A x^{t} - z^{t} = 0, \end{matrix}

where

f^{⋆}, g^{⋆}

are the optimal values of

f (x)

,

g (z)

, respectively. Variable

y^{⋆}

is a dual optimal point defined as:

\begin{matrix} y^{⋆} = [\begin{matrix} vec ({\underline{Y}}_{1}^{⋆}) \\ vec ({\underline{Y}}_{2}^{⋆}) \end{matrix}], \end{matrix}

where

({\underline{Y}}_{1}^{⋆}, {\underline{Y}}_{2}^{⋆})

is the dual component of a saddle point

({\underline{L}}^{⋆}, {\underline{S}}^{⋆}, {\underline{K}}^{⋆}, {\underline{R}}^{⋆}, {\underline{Y}}_{1}^{⋆}, {\underline{Y}}_{2}^{⋆})

of the unaugmented Lagrangian

L (\underline{L}, \underline{S}, \underline{K}, \underline{R}, {\underline{Y}}_{1}, {\underline{Y}}_{2})

. □

Appendix A.3. Proof of Lemma 4

Proof.

Let the full t-SVD of

\underline{X}

be

\underline{X} = \underline{U} * \underline{Λ} * {\underline{V}}^{⊤}

, where

\underline{U}, \underline{V} \in R^{r \times r \times n_{3}}

are orthogonal tensors and

\underline{Λ} \in R^{r \times r \times n_{3}}

is f-diagonal. Then:

\begin{matrix} ∥ \underline{X} ∥_{TNN} & = ∥ \bar{\underline{U} * \underline{Λ} * {\underline{V}}^{⊤}} ∥_{*} = ∥ \bar{\underline{U}} \cdot \bar{\underline{Λ}} \cdot \bar{{\underline{V}}^{⊤}} ∥_{*} = ∥ \bar{\underline{Λ}} ∥_{*} . \end{matrix}

(A38)

Then

\underline{Q} * \underline{X} = (\underline{Q} * \underline{U}) * \underline{Λ} * {\underline{V}}^{⊤}

. Since

\begin{matrix} {(\underline{Q} * \underline{U})}^{⊤} * (\underline{Q} * \underline{U}) & = {\underline{U}}^{⊤} * {\underline{Q}}^{⊤} * \underline{Q} * \underline{U} = \underline{I}, \end{matrix}

(A39)

we obtain that:

\begin{matrix} ∥ \underline{Q} * \underline{X} ∥_{TNN} & = ∥ \bar{\underline{Q} * \underline{X}} ∥_{*} \\ = ∥ \bar{(\underline{Q} * \underline{U}) * \underline{Λ} {\underline{V}}^{⊤}} ∥_{*} \\ = ∥ \bar{(\underline{Q} * \underline{U})} \cdot \bar{\underline{Λ}} \cdot \bar{{\underline{V}}^{⊤}} ∥_{*} \\ = ∥ \bar{\underline{Λ}} ∥_{*} . \end{matrix}

(A40)

Thus,

∥ \underline{Q} * \underline{X} ∥_{TNN} = {∥ \underline{X} ∥}_{TNN}

. □

Appendix A.4. Proof of Theorem 3

Proof.

Please note that

({\underline{Q}}_{*} * {\underline{X}}_{*}, {\underline{S}}_{*})

is a feasible point of Problem (28), then we have:

\begin{matrix} \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{L}}^{⋆} + {\underline{S}}^{⋆} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{L}}^{⋆} ∥_{TNN} + λ ∥ {\underline{S}}^{⋆} ∥_{1}) \\ \leq \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{Q}}_{*} * {\underline{X}}_{*} + {\underline{S}}_{*} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{Q}}_{*} * {\underline{X}}_{*} ∥_{TNN} + λ ∥ {\underline{S}}_{*} ∥_{1}) \\ = \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{Q}}_{*} * {\underline{X}}_{*} + {\underline{S}}_{*} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{X}}_{*} ∥_{TNN} + λ ∥ {\underline{S}}_{*} ∥_{1}) \end{matrix}

(A41)

By the assumption that

r_{tubal} ({\underline{L}}^{⋆}) \leq r

, there exists a decomposition

{\underline{L}}^{⋆} = {\underline{Q}}^{⋆} * {\underline{X}}^{⋆}

, such that

({\underline{Q}}^{⋆}, {\underline{X}}^{⋆}, {\underline{S}}^{⋆})

is also a feasible point of Problem (39).

Moreover, since

({\underline{Q}}_{*}, {\underline{X}}_{*}, {\underline{S}}_{*})

is a global optimal solution to Problem (39), then we have that

\begin{matrix} \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{Q}}_{*} * {\underline{X}}_{*} + {\underline{S}}_{*} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{X}}_{*} ∥_{TNN} + λ ∥ {\underline{S}}_{*} ∥_{1}) \\ \leq \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{Q}}^{⋆} * {\underline{X}}^{⋆} + {\underline{S}}^{⋆} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{X}}^{⋆} ∥_{TNN} + λ ∥ {\underline{S}}^{⋆} ∥_{1}) . \end{matrix}

By

{\underline{L}}^{⋆} = {\underline{Q}}^{⋆} * {\underline{X}}^{⋆}

, we have:

\begin{matrix} ∥ {\underline{L}}^{⋆} ∥_{TNN} = ∥ {\underline{Q}}^{⋆} * {\underline{X}}^{⋆} ∥_{TNN} = {∥ {\underline{X}}^{⋆} ∥}_{TNN} . \end{matrix}

(A42)

Thus, we deduce:

\begin{matrix} \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{Q}}_{*} * {\underline{X}}_{*} + {\underline{S}}_{*} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{X}}_{*} ∥_{TNN} + λ ∥ {\underline{S}}_{*} ∥_{1}) \\ \leq \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{L}}^{⋆} + {\underline{S}}^{⋆} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{L}}^{⋆} ∥_{TNN} + λ ∥ {\underline{S}}^{⋆} ∥_{1}) . \end{matrix}

(A43)

According to Equations (A41) and (A43), we further have:

\begin{matrix} \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{Q}}_{*} * {\underline{X}}_{*} + {\underline{S}}_{*} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{X}}_{*} ∥_{TNN} + λ ∥ {\underline{S}}_{*} ∥_{1}) \\ \leq \frac{1}{2} ∥ \underline{B} ⊙ ({\underline{L}}^{⋆} + {\underline{S}}^{⋆} - \underline{M}) ∥_{F}^{2} + γ (∥ {\underline{L}}^{⋆} ∥_{TNN} + λ ∥ {\underline{S}}^{⋆} ∥_{1}) . \end{matrix}

(A44)

In this way,

({\underline{Q}}_{*} * {\underline{X}}_{*}, {\underline{S}}_{*})

is also the optimal solution to Problem (28). □

References

Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 208–220. [Google Scholar] [CrossRef] [PubMed]
Lu, C.; Feng, J.; Chen, Y.; Liu, W.; Lin, Z.; Yan, S. Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans. Pattern Anal. Mach. Intell. 2019. [Google Scholar] [CrossRef] [Green Version]
Xu, Y.; Hao, R.; Yin, W.; Su, Z. Parallel matrix factorization for low-rank tensor completion. Inverse Probl. Imaging 2015, 9, 601–624. [Google Scholar] [CrossRef] [Green Version]
Liu, Y.; Shang, F. An Efficient Matrix Factorization Method for Tensor Completion. IEEE Signal Process. Lett. 2013, 20, 307–310. [Google Scholar] [CrossRef]
Wang, A.; Wei, D.; Wang, B.; Jin, Z. Noisy Low-Tubal-Rank Tensor Completion Through Iterative Singular Tube Thresholding. IEEE Access 2018, 6, 35112–35128. [Google Scholar] [CrossRef]
Tan, H.; Feng, G.; Feng, J.; Wang, W.; Zhang, Y.J.; Li, F. A tensor-based method for missing traffic data completion. Transp. Res. Part C 2013, 28, 15–27. [Google Scholar] [CrossRef] [Green Version]
Peng, Y.; Lu, B.L. Discriminative extreme learning machine with supervised sparsity preserving for image classification. Neurocomputing 2017, 261, 242–252. [Google Scholar] [CrossRef]
Cichocki, A.; Mandic, D.; De Lathauwer, L.; Zhou, G.; Zhao, Q.; Caiafa, C.; Phan, H.A. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag. 2015, 32, 145–163. [Google Scholar] [CrossRef] [Green Version]
Vaswani, N.; Bouwmans, T.; Javed, S.; Narayanamurthy, P. Robust subspace learning: Robust PCA, robust subspace tracking, and robust subspace recovery. IEEE Signal Process. Mag. 2018, 35, 32–55. [Google Scholar] [CrossRef] [Green Version]
Cichocki, A.; Lee, N.; Oseledets, I.; Phan, A.H.; Zhao, Q.; Mandic, D.P. Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 1 Low-Rank Tensor Decompositions. Found. Trends® Mach. Learn. 2016, 9, 249–429. [Google Scholar] [CrossRef] [Green Version]
Yuan, M.; Zhang, C.H. On Tensor Completion via Nuclear Norm Minimization. Found. Comput. Math. 2016, 16, 1–38. [Google Scholar] [CrossRef] [Green Version]
Candès, E.J.; Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inf. Theory 2010, 56, 2053–2080. [Google Scholar] [CrossRef] [Green Version]
Hillar, C.J.; Lim, L. Most Tensor Problems Are NP-Hard. J. ACM 2009, 60, 45. [Google Scholar] [CrossRef]
Yuan, M.; Zhang, C.H. Incoherent Tensor Norms and Their Applications in Higher Order Tensor Completion. IEEE Trans. Inf. Theory 2017, 63, 6753–6766. [Google Scholar] [CrossRef]
Tomioka, R.; Suzuki, T. Convex tensor decomposition via structured schatten norm regularization. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 1331–1339. [Google Scholar]
Semerci, O.; Hao, N.; Kilmer, M.E.; Miller, E.L. Tensor-Based Formulation and Nuclear Norm Regularization for Multienergy Computed Tomography. IEEE Trans. Image Process. 2014, 23, 1678–1693. [Google Scholar] [CrossRef] [Green Version]
Mu, C.; Huang, B.; Wright, J.; Goldfarb, D. Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 73–81. [Google Scholar]
Zhao, Q.; Meng, D.; Kong, X.; Xie, Q.; Cao, W.; Wang, Y.; Xu, Z. A Novel Sparsity Measure for Tensor Recovery. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 271–279. [Google Scholar]
Wei, D.; Wang, A.; Wang, B.; Feng, X. Tensor Completion Using Spectral (k, p) -Support Norm. IEEE Access 2018, 6, 11559–11572. [Google Scholar] [CrossRef]
Tomioka, R.; Hayashi, K.; Kashima, H. Estimation of low-rank tensors via convex optimization. arXiv 2010, arXiv:1010.0789. [Google Scholar]
Chretien, S.; Wei, T. Sensing tensors with Gaussian filters. IEEE Trans. Inf. Theory 2016, 63, 843–852. [Google Scholar] [CrossRef] [Green Version]
Ghadermarzy, N.; Plan, Y.; Yılmaz, Ö. Near-optimal sample complexity for convex tensor completion. arXiv 2017, arXiv:1711.04965. [Google Scholar] [CrossRef] [Green Version]
Ghadermarzy, N.; Plan, Y.; Yılmaz, Ö. Learning tensors from partial binary measurements. arXiv 2018, arXiv:1804.00108. [Google Scholar] [CrossRef] [Green Version]
Liu, Y.; Shang, F.; Fan, W.; Cheng, J.; Cheng, H. Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1763–1771. [Google Scholar]
Zhang, Z.; Ely, G.; Aeron, S.; Hao, N.; Kilmer, M. Novel methods for multilinear data completion and de-noising based on tensor-SVD. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3842–3849. [Google Scholar]
Lu, C.; Feng, J.; Lin, Z.; Yan, S. Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 1948–1954. [Google Scholar]
Jiang, J.Q.; Ng, M.K. Exact Tensor Completion from Sparsely Corrupted Observations via Convex Optimization. arXiv 2017, arXiv:1708.00601. [Google Scholar]
Xie, Y.; Tao, D.; Zhang, W.; Liu, Y.; Zhang, L.; Qu, Y. On Unifying Multi-view Self-Representations for Clustering by Tensor Multi-rank Minimization. Int. J. Comput. Vis. 2018, 126, 1157–1179. [Google Scholar] [CrossRef] [Green Version]
Ely, G.T.; Aeron, S.; Hao, N.; Kilmer, M.E. 5D seismic data completion and denoising using a novel class of tensor decompositions. Geophysics 2015, 80, V83–V95. [Google Scholar] [CrossRef] [Green Version]
Liu, X.; Aeron, S.; Aggarwal, V.; Wang, X.; Wu, M. Adaptive Sampling of RF Fingerprints for Fine-grained Indoor Localization. IEEE Trans. Mob. Comput. 2016, 15, 2411–2423. [Google Scholar] [CrossRef] [Green Version]
Wang, A.; Lai, Z.; Jin, Z. Noisy low-tubal-rank tensor completion. Neurocomputing 2019, 330, 267–279. [Google Scholar] [CrossRef]
Sun, W.; Chen, Y.; Huang, L.; So, H.C. Tensor Completion via Generalized Tensor Tubal Rank Minimization using General Unfolding. IEEE Signal Process. Lett. 2018, 25, 868–872. [Google Scholar] [CrossRef]
Kilmer, M.E.; Braman, K.; Hao, N.; Hoover, R.C. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 2013, 34, 148–172. [Google Scholar] [CrossRef] [Green Version]
Liu, X.Y.; Aeron, S.; Aggarwal, V.; Wang, X. Low-tubal-rank tensor completion using alternating minimization. arXiv 2016, arXiv:1610.01690. [Google Scholar]
Liu, X.Y.; Wang, X. Fourth-order tensors with multidimensional discrete transforms. arXiv 2017, arXiv:1705.01576. [Google Scholar]
Gu, Q.; Gui, H.; Han, J. Robust tensor decomposition with gross corruption. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1422–1430. [Google Scholar]
Wang, A.; Jin, Z.; Tang, G. Robust tensor decomposition via t-SVD: Near-optimal statistical guarantee and scalable algorithms. Signal Process. 2020, 167, 107319. [Google Scholar] [CrossRef]
Zhang, Z.; Aeron, S. Exact Tensor Completion Using t-SVD. IEEE Trans. Signal Process. 2017, 65, 1511–1526. [Google Scholar] [CrossRef]
Goldfarb, D.; Qin, Z. Robust low-rank tensor recovery: Models and algorithms. SIAM J. Matrix Anal. Appl. 2014, 35, 225–253. [Google Scholar] [CrossRef] [Green Version]
Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
Cheng, L.; Wu, Y.C.; Zhang, J.; Liu, L. Subspace identification for DOA estimation in massive/full-dimension MIMO systems: Bad data mitigation and automatic source enumeration. IEEE Trans. Signal Process. 2015, 63, 5897–5909. [Google Scholar] [CrossRef]
Cheng, L.; Xing, C.; Wu, Y.C. Irregular Array Manifold Aided Channel Estimation in Massive MIMO Communications. IEEE J. Sel. Top. Signal Process. 2019, 13, 974–988. [Google Scholar] [CrossRef]
Zhao, Q.; Zhou, G.; Zhang, L.; Cichocki, A.; Amari, S.I. Bayesian robust tensor factorization for incomplete multiway data. IEEE Trans. Neural Networks Learn. Syst. 2016, 27, 736–748. [Google Scholar] [CrossRef] [Green Version]
Zhou, Y.; Cheung, Y. Bayesian Low-Tubal-Rank Robust Tensor Factorization with Multi-Rank Determination. IEEE Trans. Pattern Anal. Mach. Intell. 2019. [Google Scholar] [CrossRef]
Zhou, Z.; Li, X.; Wright, J.; Candes, E.; Ma, Y. Stable principal component pursuit. In Proceedings of the 2010 IEEE International Symposium on Information Theory, Austin, TX, USA, 12–18 June 2010; pp. 1518–1522. [Google Scholar]
Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58, 11. [Google Scholar] [CrossRef]
Lu, C.; Feng, J.; Chen, Y.; Liu, W.; Lin, Z.; Yan, S. Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5249–5257. [Google Scholar]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
Peng, Y.; Lu, B.L. Robust structured sparse representation via half-quadratic optimization for face recognition. Multimed. Tools Appl. 2017, 76, 8859–8880. [Google Scholar] [CrossRef]
Liu, G.; Yan, S. Active subspace: Toward scalable low-rank learning. Neural Comput. 2012, 24, 3371–3394. [Google Scholar] [CrossRef] [PubMed]
Wang, A.; Jin, Z.; Yang, J. A Factorization Strategy for Tensor Robust PCA; ResearchGate: Berlin, Germany, 2019. [Google Scholar]
Jiang, Q.; Ng, M. Robust Low-Tubal-Rank Tensor Completion via Convex Optimization. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 2649–2655. [Google Scholar]
Kernfeld, E.; Kilmer, M.; Aeron, S. Tensor–tensor products with invertible linear transforms. Linear Algebra Its Appl. 2015, 485, 545–570. [Google Scholar] [CrossRef]
Lu, C.; Peng, X.; Wei, Y. Low-Rank Tensor Completion With a New Tensor Nuclear Norm Induced by Invertible Linear Transforms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5996–6004. [Google Scholar]
Liu, X.Y.; Aeron, S.; Aggarwal, V.; Wang, X. Low-tubal-rank tensor completion using alternating minimization. In Proceedings of the SPIE Defense+ Security, Baltimore, MD, USA, 17–21 April 2016; International Society for Optics and Photonics: Bellingham, DC, USA, 2016; p. 984809. [Google Scholar]
Zhou, P.; Lu, C.; Lin, Z.; Zhang, C. Tensor Factorization for Low-Rank Tensor Completion. IEEE Trans. Image Process. 2018, 27, 1152–1163. [Google Scholar] [CrossRef] [PubMed]
Martin, C.D.; Shafer, R.; Larue, B. An Order-p Tensor Factorization with Applications in Imaging. SIAM J. Sci. Comput. 2013, 35, A474–A490. [Google Scholar] [CrossRef]
Wang, A.; Jin, Z. Orientation Invariant Tubal Nuclear Norms Applied to Robust Tensor Decomposition. Available online: https://www.researchgate.net/publication/329116872_Orientation_Invariant_Tubal_Nuclear_Norms_Applied_to_Robust_Tensor_Decomposition (accessed on 3 December 2019).

Figure 1. A visual illustration of t-SVD.

Figure 2. The distribution of tensor singular values

\sum_{_} (i, i, 1)

in a natural color image. (a) the sample image, (b) the distribution of

\sum_{_} (i, i, 1)

.

Figure 2. The distribution of tensor singular values

\sum_{_} (i, i, 1)

in a natural color image. (a) the sample image, (b) the distribution of

\sum_{_} (i, i, 1)

.

Figure 3. The distribution of tensor singular values

\sum_{_} (i, i, 1)

in a video sequence. (a) the first frame of the video, (b) the distribution of

\sum_{_} (i, i, 1)

.

Figure 3. The distribution of tensor singular values

\sum_{_} (i, i, 1)

in a video sequence. (a) the first frame of the video, (b) the distribution of

\sum_{_} (i, i, 1)

.

Figure 4. The MSEs of

\hat{\underline{L}}

and

{\underline{S}}_{0}

versus

c^{2}

for tensors of size

60 \times 60 \times 20

where the tubal rank

r_{tubal} ({\underline{L}}_{0})

of

{\underline{L}}_{0}

and sparsity s of

{\underline{S}}_{0}

are set as

(r_{tubal} ({\underline{L}}_{0}), s) = (5, 0.1 n^{2} n_{3})

. (a): MSE of

\hat{\underline{L}}

vs

c^{2}

. (b): MSE of

\hat{\underline{S}}

vs

c^{2}

.

Figure 4. The MSEs of

\hat{\underline{L}}

and

{\underline{S}}_{0}

versus

c^{2}

for tensors of size

60 \times 60 \times 20

where the tubal rank

r_{tubal} ({\underline{L}}_{0})

of

{\underline{L}}_{0}

and sparsity s of

{\underline{S}}_{0}

are set as

(r_{tubal} ({\underline{L}}_{0}), s) = (5, 0.1 n^{2} n_{3})

. (a): MSE of

\hat{\underline{L}}

vs

c^{2}

. (b): MSE of

\hat{\underline{S}}

vs

c^{2}

.

Figure 5. The 20 color images used.

Figure 6. The quantitative comparison in PSNR and time on color images. First,

10 %

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

10 %

of the corrupted entries are missing uniformly at random. (a): the PSNR values of each algorithm; (b): the running time of each algorithm.

Figure 6. The quantitative comparison in PSNR and time on color images. First,

10 %

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

10 %

of the corrupted entries are missing uniformly at random. (a): the PSNR values of each algorithm; (b): the running time of each algorithm.

Figure 7. The quantitative comparison in PSNR and time on color images. First,

20 %

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

20 %

of the corrupted entries are missing uniformly at random. (a): the PSNR values of each algorithm; (b): the running time of each algorithm.

Figure 7. The quantitative comparison in PSNR and time on color images. First,

20 %

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

20 %

of the corrupted entries are missing uniformly at random. (a): the PSNR values of each algorithm; (b): the running time of each algorithm.

Figure 8. The visual results for image recovery of different algorithms. First,

20 %

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

20 %

of the corrupted entries are missing uniformly at random. (a): the original image; (b): the corrupted image; (c) image recovered by Algorithm 1; (d) image recovered by Algorithm 2; (e) image recovered by the matrix nuclear norm (NN)-based Model (53); (f) image recovered by the SNN-based Model (55).

Figure 8. The visual results for image recovery of different algorithms. First,

20 %

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

20 %

of the corrupted entries are missing uniformly at random. (a): the original image; (b): the corrupted image; (c) image recovered by Algorithm 1; (d) image recovered by Algorithm 2; (e) image recovered by the matrix nuclear norm (NN)-based Model (53); (f) image recovered by the SNN-based Model (55).

Table 1. Performance of Algorithm 1 and Algorithm 2 in both accuracy and speed for different tensor sizes when the gross corruption. Outliers from symmetric Bernoulli, observation tensor

\underline{M} \in R^{n \times n \times n_{3}}

,

n_{3} = 30

,

r_{tubal} ({\underline{L}}_{0}) = 0.05 n

,

∥ {\underline{S}}_{0} ∥_{1} = 0.05 n^{2} n_{3}

, noise level

c = 0

,

r = max \{⌊ 2 r_{tubal} ({\underline{L}}_{0}) ⌋, 15\}

.

Table 1. Performance of Algorithm 1 and Algorithm 2 in both accuracy and speed for different tensor sizes when the gross corruption. Outliers from symmetric Bernoulli, observation tensor

\underline{M} \in R^{n \times n \times n_{3}}

,

n_{3} = 30

,

r_{tubal} ({\underline{L}}_{0}) = 0.05 n

,

∥ {\underline{S}}_{0} ∥_{1} = 0.05 n^{2} n_{3}

, noise level

c = 0

,

r = max \{⌊ 2 r_{tubal} ({\underline{L}}_{0}) ⌋, 15\}

.

n	$r_{tubal}$ ${\underline{L}}_{0}$	$∥ {\underline{S}}_{0} ∥_{0}$	Method	$r_{tubal}$ $\hat{\underline{L}}$	$\frac{∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F}}{∥ {\underline{L}}_{0} ∥_{F}}$	$\frac{∥ \hat{\underline{S}} - {\underline{S}}_{0} ∥_{F}}{∥ {\underline{S}}_{0} ∥_{F}}$	time/s
100	5	$1 \times 10^{4}$	Algorithm 1	5	$5.13 \times 10^{- 6}$	$5.27 \times 10^{- 6}$	3.63
100	5	$1 \times 10^{4}$	Algorithm 2	5	$4.92 \times 10^{- 6}$	$5.12 \times 10^{- 6}$	1.76
160	8	$2.56 \times 10^{4}$	Algorithm 1	8	$3.86 \times 10^{- 6}$	$3.52 \times 10^{- 6}$	9.52
160	8	$2.56 \times 10^{4}$	Algorithm 2	8	$4.48 \times 10^{- 6}$	$4.08 \times 10^{- 6}$	4.42
200	10	$4 \times 10^{4}$	Algorithm 1	10	$3.46 \times 10^{- 6}$	$3.59 \times 10^{- 6}$	14.16
200	10	$4 \times 10^{4}$	Algorithm 2	10	$4.12 \times 10^{- 6}$	$4.63 \times 10^{- 6}$	7.44

Table 2. Performance of Algorithm 1 and Algorithm 2 in both accuracy and speed for different tensor sizes when the gross corruption. Outliers from standard Gaussian distribution, observation tensor

\underline{M} \in R^{n \times n \times n_{3}}

,

n_{3} = 30

,

r_{tubal} ({\underline{L}}_{0}) = 0.05 n

,

∥ {\underline{S}}_{0} ∥_{1} = 0.05 n^{2} n_{3}

, noise level

c = 0

,

r = max \{⌊ 2 r_{tubal} ({\underline{L}}_{0}) ⌋, 15\}

.

Table 2. Performance of Algorithm 1 and Algorithm 2 in both accuracy and speed for different tensor sizes when the gross corruption. Outliers from standard Gaussian distribution, observation tensor

\underline{M} \in R^{n \times n \times n_{3}}

,

n_{3} = 30

,

r_{tubal} ({\underline{L}}_{0}) = 0.05 n

,

∥ {\underline{S}}_{0} ∥_{1} = 0.05 n^{2} n_{3}

, noise level

c = 0

,

r = max \{⌊ 2 r_{tubal} ({\underline{L}}_{0}) ⌋, 15\}

.

n	$r_{tubal}$ ${\underline{L}}_{0}$	$∥ {\underline{S}}_{0} ∥_{0}$	Method	$r_{tubal}$ $\hat{\underline{L}}$	$\frac{∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F}}{∥ {\underline{L}}_{0} ∥_{F}}$	$\frac{∥ \hat{\underline{S}} - {\underline{S}}_{0} ∥_{F}}{∥ {\underline{S}}_{0} ∥_{F}}$	time/s
100	5	$1 \times 10^{4}$	Algorithm 1	5	$2.7 \times 10^{- 6}$	$2.6 \times 10^{- 6}$	4.43
100	5	$1 \times 10^{4}$	Algorithm 2	5	$2.9 \times 10^{- 6}$	$3.2 \times 10^{- 6}$	1.82
160	8	$2.56 \times 10^{4}$	Algorithm 1	8	$4.76 \times 10^{- 6}$	$4.08 \times 10^{- 6}$	10.45
160	8	$2.56 \times 10^{4}$	Algorithm 2	8	$4.24 \times 10^{- 6}$	$4.05 \times 10^{- 6}$	5.15
200	10	$4 \times 10^{4}$	Algorithm 1	10	$3.78 \times 10^{- 6}$	$3.64 \times 10^{- 6}$	18.97
200	10	$4 \times 10^{4}$	Algorithm 2	10	$3.78 \times 10^{- 6}$	$3.63 \times 10^{- 6}$	8.04

Table 3. Performance of Algorithm 1 and Algorithm 2 in both accuracy and speed for different tensor sizes when the gross corruption. Outliers from symmetric Bernoulli, observation tensor

\underline{M} \in R^{n \times n \times n_{3}}

,

n_{3} = 30

,

r_{tubal} ({\underline{L}}_{0}) = 0.05 n

,

∥ {\underline{S}}_{0} ∥_{1} = 0.05 n^{2} n_{3}

, noise level

c = 0

,

r = max \{⌊ 2 r_{tubal} ({\underline{L}}_{0}) ⌋, 15\}

, with

% 20

random missing entries.

Table 3. Performance of Algorithm 1 and Algorithm 2 in both accuracy and speed for different tensor sizes when the gross corruption. Outliers from symmetric Bernoulli, observation tensor

\underline{M} \in R^{n \times n \times n_{3}}

,

n_{3} = 30

,

r_{tubal} ({\underline{L}}_{0}) = 0.05 n

,

∥ {\underline{S}}_{0} ∥_{1} = 0.05 n^{2} n_{3}

, noise level

c = 0

,

r = max \{⌊ 2 r_{tubal} ({\underline{L}}_{0}) ⌋, 15\}

, with

% 20

random missing entries.

n	$r_{tubal}$ ${\underline{L}}_{0}$	$∥ \underline{B} ⊙ {\underline{S}}_{0} ∥_{0}$	Method	$r_{tubal}$ $\hat{\underline{L}}$	$\frac{∥ \hat{\underline{L}} - {\underline{L}}_{0} ∥_{F}}{∥ {\underline{L}}_{0} ∥_{F}}$	$\frac{∥ \hat{\underline{S}} - \underline{B} ⊙ {\underline{S}}_{0} ∥_{F}}{∥ \underline{B} ⊙ {\underline{S}}_{0} ∥_{F}}$	time/s
100	5	$8 \times 10^{3}$	Algorithm 1	5	$7.52 \times 10^{- 6}$	$5.97 \times 10^{- 6}$	3.87
100	5	$8 \times 10^{3}$	Algorithm 2	5	$7.50 \times 10^{- 6}$	$5.96 \times 10^{- 6}$	1.69
160	8	$2.048 \times 10^{4}$	Algorithm 1	8	$4.46 \times 10^{- 6}$	$5.17 \times 10^{- 6}$	9.64
160	8	$2.048 \times 10^{4}$	Algorithm 2	8	$5.60 \times 10^{- 6}$	$4.71 \times 10^{- 6}$	4.46
200	10	$3.2 \times 10^{4}$	Algorithm 1	10	$4.78 \times 10^{- 6}$	$4.04 \times 10^{- 6}$	14.78
200	10	$3.2 \times 10^{4}$	Algorithm 2	10	$5.13 \times 10^{- 6}$	$4.20 \times 10^{- 6}$	7.77

Table 4. PSNR values and running time (in seconds) of different algorithms on video data. First,

ρ_{s} n_{1} n_{2} n_{3}

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

(1 - ρ_{obs}) n_{1} n_{2} n_{3}

of the corrupted entries are missing uniformly at random. The items with highest PSNR values are highlighted with bold face, and the items with shortest running time are highlighted with underline.

Table 4. PSNR values and running time (in seconds) of different algorithms on video data. First,

ρ_{s} n_{1} n_{2} n_{3}

entries of each image is corrupted by i.i.d. symmetric Bernoulli variable, then polluted by Gaussian noise of noise level

c = 0.05

, and finally

(1 - ρ_{obs}) n_{1} n_{2} n_{3}

of the corrupted entries are missing uniformly at random. The items with highest PSNR values are highlighted with bold face, and the items with shortest running time are highlighted with underline.

Data Set	$(ρ_{obs}, ρ_{s})$	Index	NN, Model (54)	SNN, Model (55)	Algorithm 1	Algorithm 2
Akiyo	(0.9,0.1)	PSNR	31.74	32.09	33.94	33.36
	(0.9,0.1)	time/s	29.48	51.13	20.10	12.39
	(0.8,0.2)	PSNR	30.59	30.70	32.44	32.07
	(0.8,0.2)	time/s	30.65	51.17	19.53	14.92
Silent	(0.9,0.1)	PSNR	28.26	30.39	31.74	31.23
	(0.9,0.1)	time/s	28.91	49.79	21.21	14.76
	(0.8,0.2)	PSNR	26.95	27.60	30.42	30.07
	(0.8,0.2)	time/s	36.51	60.81	22.43	15.62
Carphone	(0.9,0.1)	PSNR	26.87	28.79	29.15	28.94
	(0.9,0.1)	time/s	28.55	47.17	22.12	14.41
	(0.8,0.2))	PSNR	26.12	26.43	28.17	27.99
	(0.8,0.2))	time/s	26.72	49.21	20.55	14.74
Claire	(0.9,0.1)	PSNR	30.56	32.20	34.27	34.02
	(0.9,0.1)	time/s	29.75	47.32	21.43	13.52
	(0.8,0.2)	PSNR	29.94	30.43	32.96	32.78
	(0.8,0.2)	time/s	29.43	50.46	19.47	13.04

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fang, W.; Wei, D.; Zhang, R. Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms. Sensors 2019, 19, 5335. https://doi.org/10.3390/s19235335

AMA Style

Fang W, Wei D, Zhang R. Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms. Sensors. 2019; 19(23):5335. https://doi.org/10.3390/s19235335

Chicago/Turabian Style

Fang, Wei, Dongxu Wei, and Ran Zhang. 2019. "Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms" Sensors 19, no. 23: 5335. https://doi.org/10.3390/s19235335

APA Style

Fang, W., Wei, D., & Zhang, R. (2019). Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms. Sensors, 19(23), 5335. https://doi.org/10.3390/s19235335

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

Abstract

1. Introduction

2. Preliminaries and Related Works

2.1. Tensor Singular Value Decomposition

2.2. Related Works

3. Theoretical Guarantee for Stable Tensor Principal Component Pursuit

3.1. The Proposed STPCP

3.2. A Theorem for Stable Recovery

4. Algorithms

4.1. An ADMM Algorithm

4.2. A Faster Algorithm

5. Experiments

5.1. Synthetic Data

5.1.1. Exact Recovery in the Noiseless Setting

5.1.2. Linear Scaling of Errors with the Noise Level

5.2. Real Data Sets

5.2.1. Color Images

5.2.2. Videos

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A. Proofs of Lemmas and Theorems

Appendix A.1. The Proof of Theorem 1

Appendix A.1.1. Key Lemmas for the Proof of Theorem 1

Appendix A.1.2. Proof of Theorem 1

Appendix A.2. Proof of Theorem 2

Appendix A.3. Proof of Lemma 4

Appendix A.4. Proof of Theorem 3

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI