Guaranteed Robust Tensor Completion via $*_{L}$-SVD with Applications to Remote Sensing Data

Wang, Andong; Zhou, Guoxu; Zhao, Qibin

doi:10.3390/rs13183671


by Andong Wang ^1,2,3, Guoxu Zhou ^1,3 and Qibin Zhao ^1,2,*

¹ School of Automation, Guangdong University of Technology, Guangzhou 510006, China

² Tensor Learning Team, RIKEN AIP, Tokyo 103-0027, Japan

³ Key Laboratory of Intelligent Detection and The Internet of Things in Manufacturing, Ministry of Education, Guangzhou 510006, China

^* Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(18), 3671; https://doi.org/10.3390/rs13183671

Submission received: 30 July 2021 / Revised: 3 September 2021 / Accepted: 9 September 2021 / Published: 14 September 2021

(This article belongs to the Special Issue Remote Sensing Image Denoising, Restoration and Reconstruction)


Abstract: This paper conducts a rigorous analysis for the problem of robust tensor completion, which aims at recovering an unknown three-way tensor from incomplete observations corrupted by gross sparse outliers and small dense noises simultaneously due to various reasons such as sensor dead pixels, communication loss, electromagnetic interferences, cloud shadows, etc. To estimate the underlying tensor, a new penalized least squares estimator is first formulated by exploiting the low rankness of the signal tensor within the framework of tensor $*_{L}$-Singular Value Decomposition ($*_{L}$-SVD) and leveraging the sparse structure of the outlier tensor. Then, an algorithm based on the Alternating Direction Method of Multipliers (ADMM) is designed to compute the estimator in an efficient way. Statistically, the non-asymptotic upper bound on the estimation error is established and further proved to be optimal (up to a log factor) in a minimax sense. Simulation studies on synthetic data demonstrate that the proposed error bound can predict the scaling behavior of the estimation error with problem parameters (i.e., tubal rank of the underlying tensor, sparsity of the outliers, and the number of uncorrupted observations). Both the effectiveness and efficiency of the proposed algorithm are evaluated through experiments for robust completion on seven different types of remote sensing data.

Keywords: remote sensing data restoration; robust tensor completion; tensor SVD; statistical performance; ADMM


1. Introduction

Despite the broad adoption of advanced sensors in various remote sensing tasks, data quality remains a critical issue and can significantly influence the actual performance of the backend applications. Many types of modern remote sensing data, such as optical, hyperspectral, multispectral, thermal, Light Detection and Ranging (LiDAR), and Synthetic Aperture Radar (SAR) data, are typically multi-way and can be readily stored, analyzed, and processed by tensor-based models [1,2,3,4,5,6,7]. In some extreme circumstances, the data tensor may suffer from missing entries, gross sparse outliers, and small dense noises at the same time, as a result of partial sensor failures, communication errors, occlusion by obstacles, and so on [8,9]. The problem of robust tensor completion arises from the need to robustly complete a partially observed data tensor corrupted by outliers and noises.

When only a fraction of partially corrupted observations are available, the crucial point of robust tensor completion lies in the assumption that the underlying data tensor is highly redundant such that the main components of it remain only slightly suppressed by missing information, outliers, and noises, and thus can be effectively reconstructed by exploiting the intrinsic redundancy. The tensor low-rankness is an ideal tool to model the redundancy of tensor data, and has gained extensive attention in remote sensing data restoration [5,10,11].

As higher-order extensions of low-rank matrix models [12], low-rank tensor models are typically formulated as minimization problems of the tensor rank function [13]. However, there are multiple definitions of tensor ranks, such as the CP rank [14], Tucker rank [15], TT rank [16], TR rank [17], etc., which focus on low-rank structures in the original domains (like the pixel domain of optical images) [18,19]. Recently, a remarkably different example named the low-tubal-rank tensor model [20,21] was proposed within the algebraic framework of tensor Singular Value Decomposition (t-SVD) [20,22], which captures low-rankness in the frequency domain defined via the Discrete Fourier Transform (DFT). As discussed in [18,19,21,23], low-tubal-rank tensor models are capable of exploiting both low-rankness and smoothness of the tensor data, making them well suited to analyzing and processing diverse remote sensing imagery data, which are often simultaneously low-rank and smooth [5,10].

Motivated by the advantages of low-tubal-rankness in modeling remote sensing data, we resolve the robust tensor completion problem by utilizing a generalized low-tubal-rank model based on the tensor $*_{L}$-Singular Value Decomposition ($*_{L}$-SVD) [24], which leverages low-rankness in more general transformed domains rather than only the DFT domain. It should be pointed out that the $*_{L}$-SVD has very recently become a research focus in tensor-based signal processing, computer vision, and machine learning [18,23,25,26]. Since this paper emphasizes theory, we only introduce several typical works with statistical analysis. For tensor completion in the noiseless setting, Lu et al. [26] proposed a $*_{L}$-SVD-based model which can exactly recover the underlying tensor under mild conditions. For tensor completion from partial observations corrupted by sparse outliers, Song et al. [27] designed a $*_{L}$-SVD-based algorithm with an exact recovery guarantee. Zhang et al. [25] developed a theoretically guaranteed approach via the $*_{L}$-SVD for tensor completion from observations with Poisson noise. The problem of tensor recovery from noisy linear observations is studied in [18] based on the $*_{L}$-SVD with guaranteed statistical performance.

In this paper, we focus on statistically guaranteed approaches in a more challenging setting than the aforementioned $*_{L}$-SVD-based models, where the underlying signal tensor suffers from missing entries, sparse outliers, and small dense noises simultaneously. Specifically, we resolve the problem of robust tensor completion by formulating a $*_{L}$-SVD-based estimator whose estimation error bound is established and further proved to be minimax optimal (up to a log factor). We propose an algorithm based on the Alternating Direction Method of Multipliers (ADMM) [28,29] to compute the estimator and evaluate both its effectiveness and efficiency on seven different types of remote sensing data.

The remainder of this paper proceeds as follows. We first introduce some notation and preliminaries in Section 2. Then, the proposed estimator for robust tensor completion is formulated in Section 3. We compute the estimator by using an ADMM-based algorithm described in Section 4. The statistical performance of the proposed estimator is analyzed in Section 5. Experimental results on both synthetic and real datasets are reported in Section 7. We summarize this paper and discuss future directions briefly in Section 8. The proofs of the theoretical results are given in Appendix A.

2. Preliminaries

In this section, we first introduce some notations and then give a brief introduction to the

*_{L}

-SVD framework.

2.1. Notations

Main notations are listed in Table 1. Let $[d] : = \{1, \dots, d\}$, $\forall d \in N_{+}$. Let $a \lor b = max \{a, b\}$ and $a \land b = min \{a, b\}$, $\forall a, b \in R$. For $i \in [d]$, $e_{i} \in R^{d}$ denotes the standard vector basis whose $i$-th entry is 1 with the others 0. For $(i, j, k) \in [d_{1}] \times [d_{2}] \times [d_{3}]$, the outer product $e_{i} \circ e_{j} \circ e_{k}$ denotes a standard tensor basis in $R^{d_{1} \times d_{2} \times d_{3}}$, whose $(i, j, k)$-th entry is 1 with the others 0. For a 3-way tensor, a tube is a vector defined by fixing indices of the first two modes and varying the third one; a slice is a matrix defined by fixing all but two indices. For any set $Θ$, $| Θ |$ denotes its cardinality and $Θ^{⊥}$ its complement. Absolute positive constants are denoted by $C$, $c$, $c_{0}$, etc., whose values may vary from line to line. When the field and size of a tensor are not shown explicitly, it defaults to $R^{d_{1} \times d_{2} \times d_{3}}$. The spectral norm $∥ \cdot ∥$ and nuclear norm ${∥ \cdot ∥}_{*}$ of a matrix are the maximum and the sum of its singular values, respectively.

2.2. Tensor $*_{L}$ -Singular Value Decomposition

The tensor

*_{L}

-SVD is a generalization of the t-SVD [22]. To get a better understanding of

*_{L}

-SVD, we first introduce several basic notions of t-SVD as follows. For any tensor

T \in R^{d_{1} \times d_{2} \times d_{3}}

, its block circulant matrix

bcirc (T)

is defined as

\begin{matrix} bcirc (T) : = [\begin{matrix} T^{(1)} & T^{(d_{3})} & \dots & T^{(2)} \\ T^{(2)} & T^{(1)} & \dots & T^{(3)} \\ ⋮ & ⋱ & ⋱ & ⋮ \\ T^{(d_{3})} & T^{(d_{3} - 1)} & \dots & T^{(1)} \end{matrix}] \end{matrix}

We also define the block vectorization operator and its inverse operator for any

T \in R^{d_{1} \times d_{2} \times d_{3}}

by:

\begin{matrix} bvec (T) : = [\begin{matrix} T^{(1)} \\ T^{(2)} \\ ⋮ \\ T^{(d_{3})} \end{matrix}], bvfold (bvec (T)) = T \end{matrix}

Then, based on the operators defined above, we are able to give the definition of the tensor t-product.

Definition 1

(T-product [22]). For any tensors

A \in R^{d_{1} \times d_{2} \times d_{3}}

and

B \in R^{d_{2} \times d_{4} \times d_{3}}

, their t-product is a tensor

C

of size

d_{1} \times d_{4} \times d_{3}

computed as follows:

\begin{matrix} C = A * B : = bvfold (bcirc (A) bvec (B)) \end{matrix}

If we view the 3-way tensor

C \in R^{d_{1} \times d_{4} \times d_{3}}

as a

d_{1}

-by-

d_{4}

“matrix”

C

of tubes

C (i, j, :) \in R^{d_{3}}

, then the t-product can be analogously conducted like the matrix multiplication by changing scalar multiplication by the circular convolution between the tubes (i.e., vectors), as follows:

\begin{matrix} C (i, j, :) = \sum_{k = 1}^{d_{2}} A (i, k, :) ⊛ B (k, j, :) \end{matrix}

(1)

where the symbol ⊛ denotes the circular convolution of two tubes

a, b \in R^{d_{3}}

defined as follows [22]:

\begin{matrix} {(a ⊛ b)}_{j} = \sum_{k = 1}^{d_{3}} a_{k} b_{1 + (j - k) \mod d_{3}} \end{matrix}

where

\mod (\cdot)

is the modulus operator. According to the well-known relationship between circular convolution and DFT, the t-product is equivalent to matrix multiplication between all the frontal slices in the Fourier domain [22], i.e.,

\begin{matrix} \bar{C} = \bar{A} ⊙ \bar{B} \end{matrix}

(2)

where

\bar{T}

denotes the tensor obtained by conducting DFT on all the mode-3 fibers of any tensor

T

, i.e.,

\begin{matrix} \bar{T} = T \times_{3} F_{d_{3}} \end{matrix}

(3)

where

F_{d_{3}}

is the transform matrix of DFT [22], and

\times_{3}

denotes the tensor mode-3 product [30].
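The equivalence between the block-circulant definition of the t-product and slice-wise multiplication in the Fourier domain (Equation (2)) can be checked numerically. The following is a minimal NumPy sketch, not code from the paper; the function names `bcirc`, `bvec`, `bvfold`, `t_product_bcirc` and `t_product_fft` are illustrative choices, and frontal slices are assumed to be stored as `T[:, :, k]`.

```python
import numpy as np

def bcirc(T):
    """Block circulant matrix of a d1 x d2 x d3 tensor; block (i, j) is slice (i - j) mod d3."""
    d1, d2, d3 = T.shape
    out = np.zeros((d1 * d3, d2 * d3))
    for i in range(d3):
        for j in range(d3):
            out[i * d1:(i + 1) * d1, j * d2:(j + 1) * d2] = T[:, :, (i - j) % d3]
    return out

def bvec(T):
    """Stack the frontal slices vertically into a (d1 * d3) x d2 matrix."""
    return np.concatenate([T[:, :, k] for k in range(T.shape[2])], axis=0)

def bvfold(M, d3):
    """Inverse of bvec: split the rows into d3 frontal slices."""
    d1 = M.shape[0] // d3
    return np.stack([M[k * d1:(k + 1) * d1, :] for k in range(d3)], axis=2)

def t_product_bcirc(A, B):
    """t-product from Definition 1."""
    return bvfold(bcirc(A) @ bvec(B), A.shape[2])

def t_product_fft(A, B):
    """t-product via slice-wise matrix products in the Fourier domain."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ikt,kjt->ijt', Ah, Bh)
    return np.real(np.fft.ifft(Ch, axis=2))
```

The FFT route avoids forming the large block circulant matrix, which is why it is the form usually used in practice.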

In [24], Kernfeld et al. extended the t-product to the tensor

*_{L}

-product by replacing DFT by any invertible linear transform

L (\cdot)

induced by a non-singular transformation matrix

L

, and established the framework of

*_{L}

-SVD. In the latest studies, the transformation matrix

L

defining the transform L is restricted to be orthogonal [18,26,31,32] (unitary in [25,27]) for better properties, which is also followed in this paper.

Given any orthogonal matrix

L \in R^{d_{3} \times d_{3}}

(though we restrict

L

to be orthogonal for simplicity, our analysis still holds with simple extensions for unitary

L

[27]), define the associated linear transform

L (\cdot)

with inverse

L^{- 1} (\cdot)

on any

T \in R^{d_{1} \times d_{2} \times d_{3}}

as

\begin{matrix} \bar{T} = L (T) : = T \times_{3} L, \quad \text{and} \quad L^{- 1} (T) : = T \times_{3} L^{- 1} \end{matrix}

(4)

Definition 2

(Tensor

*_{L}

-product [24]). The

*_{L}

–product of any

A \in R^{d_{1} \times d_{2} \times d_{3}}

and

B \in R^{d_{2} \times d_{4} \times d_{3}}

under the invertible linear transform L in Equation (4), denoted by

A *_{L} B

, is defined as the tensor

C \in R^{d_{1} \times d_{4} \times d_{3}}

such that

L (C) = L (A) ⊙ L (B)

.
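Definition 2 can be illustrated with a short NumPy sketch. It assumes a real orthogonal transformation matrix (so that the inverse transform is the transpose), and the helper names `mode3` and `l_product` are hypothetical, introduced only for this example.

```python
import numpy as np

def mode3(T, L):
    """Mode-3 product T x_3 L: apply the matrix L to every tube T[i, j, :]."""
    return np.einsum('ijk,lk->ijl', T, L)

def l_product(A, B, L):
    """*_L-product for an orthogonal L: multiply frontal slices in the transformed domain."""
    Ah, Bh = mode3(A, L), mode3(B, L)
    Ch = np.einsum('ikt,kjt->ijt', Ah, Bh)   # slice-wise matrix products
    return mode3(Ch, L.T)                    # L^{-1} = L^T for orthogonal L

# Example with a random orthogonal transform obtained from a QR factorization.
rng = np.random.default_rng(0)
L, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = rng.standard_normal((3, 2, 4))
B = rng.standard_normal((2, 5, 4))
C = l_product(A, B, L)   # tensor of size 3 x 5 x 4
```

Because the product reduces to ordinary matrix products slice by slice, algebraic properties such as associativity carry over directly.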

Definition 3

(

*_{L}

–block-diagonal matrix [18]). For any

T \in R^{d_{1} \times d_{2} \times d_{3}}

, its

*_{L}

–block-diagonal matrix, denoted by

\bar{T}

, is defined as the block diagonal matrix whose i-th diagonal block is the i-th frontal slice

{\bar{T}}^{(i)}

of

\bar{T} = L (T)

, i.e.,

\begin{matrix} \bar{T} : = bdiag (\bar{T}) : = [\begin{matrix} {\bar{T}}^{(1)} & & \\ & ⋱ & \\ & & {\bar{T}}^{(d_{3})} \end{matrix}] \in R^{d_{1} d_{3} \times d_{2} d_{3}} \end{matrix}

Based on the notions of tensor

*_{L}

–transpose,

*_{L}

-identity tensor,

*_{L}

-orthogonal tensor, and f-diagonal tensor [24], the

*_{L}

–SVD (illustrated in Figure 1) is given.

Theorem 1

(Tensor

*_{L}

–SVD,

*_{L}

-tubal rank [24]). Any

T \in R^{d_{1} \times d_{2} \times d_{3}}

has a tensor

*_{L}

–Singular Value Decomposition (

*_{L}

–SVD) under any L in Equation (4), given as follows

T = U *_{L} D *_{L} V^{⊤}

(5)

where

U \in R^{d_{1} \times d_{1} \times d_{3}}

,

V \in R^{d_{2} \times d_{2} \times d_{3}}

are

*_{L}

-orthogonal, and

D \in R^{d_{1} \times d_{2} \times d_{3}}

is f-diagonal.

The

*_{L}

-tubal rank of

T \in R^{d_{1} \times d_{2} \times d_{3}}

is defined as the number of non-zero tubes of

D

in its

*_{L}

–SVD in Equation (5), i.e.,

\begin{matrix} r_{t b} (T) : = \# \{i \mid D (i, i, :) \neq 0, i \in [d_{1} \land d_{2}]\} \end{matrix}

where # counts the number of elements of a given set.

For any

T \in R^{d_{1} \times d_{2} \times d_{3}}

, we have the following equivalence between its

*_{L}

-SVD and the matrix SVD of its

*_{L}

–block-diagonal matrix

\bar{T}

:

\begin{matrix} T = U *_{L} D *_{L} V^{⊤} \Leftrightarrow \bar{T} = \bar{U} \cdot \bar{D} \cdot \bar{V}^{⊤} . \end{matrix}
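The equivalence above suggests a direct way to compute the decomposition: transform, take a matrix SVD of every frontal slice, and transform back. The sketch below is an illustration under the assumption of a real orthogonal transformation matrix; `mode3`, `l_svd` and `tubal_rank` are hypothetical helper names, and the tolerance `tol` is an arbitrary numerical choice.

```python
import numpy as np

def mode3(T, L):
    """Mode-3 product T x_3 L."""
    return np.einsum('ijk,lk->ijl', T, L)

def l_svd(T, L):
    """Return (U, D, V) such that T = U *_L D *_L V^T, built from slice-wise matrix SVDs."""
    d1, d2, d3 = T.shape
    Th = mode3(T, L)
    Uh = np.zeros((d1, d1, d3))
    Dh = np.zeros((d1, d2, d3))
    Vh = np.zeros((d2, d2, d3))
    for k in range(d3):
        u, s, vt = np.linalg.svd(Th[:, :, k], full_matrices=True)
        Uh[:, :, k] = u
        Dh[:len(s), :len(s), k] = np.diag(s)
        Vh[:, :, k] = vt.T
    # Map the factors back to the original domain (L^{-1} = L^T).
    return mode3(Uh, L.T), mode3(Dh, L.T), mode3(Vh, L.T)

def tubal_rank(T, L, tol=1e-10):
    """Count the tubes D(i, i, :) that are numerically non-zero."""
    Th = mode3(T, L)
    s = np.stack([np.linalg.svd(Th[:, :, k], compute_uv=False)
                  for k in range(T.shape[2])], axis=1)
    return int(np.sum(np.linalg.norm(s, axis=1) > tol))
```

Since an orthogonal transform preserves the norm of each tube, a tube of D is non-zero exactly when the corresponding singular value is non-zero in at least one transformed slice, which is what `tubal_rank` tests.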

Considering the block diagonal structure of

\bar{T}

, we define the tensor

*_{L}

-multi-rank on its diagonal blocks

{\bar{T}}^{(i)}

:

Definition 4

(Tensor

*_{L}

–nuclear norm, tensor

*_{L}

-spectral norm [26]). The tensor

*_{L}

–nuclear norm (

*_{L}

-TNN) and

*_{L}

-spectral norm of any

T \in R^{d_{1} \times d_{2} \times d_{3}}

under any L in Equation (4) are defined as the matrix nuclear norm and matrix spectral norm of

\bar{T}

, respectively, i.e.,

\begin{matrix} {∥ T ∥}_{⋆} : = {∥ \bar{T} ∥}_{*}, \quad {∥ T ∥}_{s p} : = ∥ \bar{T} ∥ . \end{matrix}

As proved in [26,27], the $*_{L}$-TNN is the convex envelope of the $l_{1}$-norm of the $*_{L}$-multi-rank in the unit tensor $*_{L}$-spectral norm ball. Thus, the $*_{L}$-TNN encourages a low $*_{L}$-multi-rank structure, which means low-rankness in the spectral domain. When the linear transform L represents the DFT along the third mode (although we restrict the $L$ in Equation (4) to be orthogonal, we still consider TNN as a special case of $*_{L}$-TNN up to constants and real/complex domain), the $*_{L}$-TNN and the tensor $*_{L}$-spectral norm degenerate to the Tubal Nuclear Norm (TNN) and the tensor spectral norm, respectively, up to a constant factor $d_{3}^{- 1}$ [26,33].
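Because the block-diagonal matrix in Definition 4 has the transformed frontal slices as its diagonal blocks, both norms can be computed from the singular values of those slices. The following is a small sketch under the assumption of a real orthogonal transformation matrix; the function names are illustrative only.

```python
import numpy as np

def transformed_singular_values(T, L):
    """Singular values of every frontal slice of L(T), one row per slice."""
    Th = np.einsum('ijk,lk->ijl', T, L)
    return np.stack([np.linalg.svd(Th[:, :, k], compute_uv=False)
                     for k in range(T.shape[2])])

def l_tnn(T, L):
    """*_L-TNN: nuclear norm of the block-diagonal matrix (sum of all singular values)."""
    return float(transformed_singular_values(T, L).sum())

def l_spectral_norm(T, L):
    """*_L-spectral norm: largest singular value over all transformed slices."""
    return float(transformed_singular_values(T, L).max())
```

For an orthogonal transform the inner product of two tensors is preserved, so the usual duality inequality between the spectral and nuclear norms of matrices carries over to these two tensor norms.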

3. Robust Tensor Completion

In this section, we will formulate the robust tensor completion problem. The observation model will be shown first.

3.1. The Observation Model

Consider an underlying signal tensor $L^{*} \in R^{d_{1} \times d_{2} \times d_{3}}$ which possesses an intrinsically low-dimensional structure characterized by low-tubal-rankness, that is, $r_{t b} (L^{*}) ≪ d_{1} \land d_{2}$. Suppose we obtain N scalar observations $y_{i}$ of $L^{*}$ from the noisy observation model:

\begin{matrix} y_{i} = 〈L^{*} + S^{*}, X_{i}〉 + ξ_{i}, \forall i \in [N], \end{matrix}

(6)

where the tensor $S^{*} \in R^{d_{1} \times d_{2} \times d_{3}}$ represents gross corruptions (e.g., outliers, errors, etc.) added to the signal $L^{*}$ and is element-wise sparse, the $ξ_{i}$'s are random noises sampled i.i.d. from the Gaussian distribution $N (0, σ^{2})$, and the $X_{i}$'s are known random design tensors in $R^{d_{1} \times d_{2} \times d_{3}}$ satisfying Assumption A1 below. The presented theoretical analysis and optimization algorithm can be generalized to other sparsity patterns of the corruptions (e.g., tube-wise sparsity [20,34] and slice-wise sparsity [34,35]) by using the tools developed for robust matrix completion in [36] and robust tensor decomposition in [34]; for simplicity, we only consider the most common element-wise sparse case.

Assumption A1.

We make two natural assumptions on the design tensors:

I.: All the corrupted positions of $L^{*}$ are observed, that is, the (unknown) support $Θ_{s} = supp (S^{*}) : = \{(i, j, k) \mid S_{i j k}^{*} \neq 0\}$ of the corruption tensor $S^{*}$ is fully observed. Formally speaking, there exists an unknown subset $X_{s} \subset \{X_{i}\}_{i = 1}^{N}$ drawn from an (unknown) distribution $Π_{Θ_{s}}$ on the set $X_{Θ_{s}} : = \{e_{j} \circ e_{k} \circ e_{l}, \forall (j, k, l) \in Θ_{s}\}$ , such that each element in $X_{Θ_{s}}$ is sampled at least once.
II.: All uncorrupted positions of $L^{*}$ are sampled uniformly with replacement for simplicity of exposition. Formally speaking, each element of the set $X_{s}^{⊥} : = \{X_{i}\}_{i = 1}^{N} ∖ X_{s}$ is sampled i.i.d. from a uniform distribution $Π_{Θ_{s}^{⊥}}$ on the set $X_{Θ_{s}^{⊥}} : = \{e_{j} \circ e_{k} \circ e_{l}, \forall (j, k, l) \in Θ_{s}^{⊥}\}$ .

According to the observation model (6), the true tensor

L^{*}

is first corrupted by a sparse tensor

S^{*}

and then sampled to N scalars

{y_{i}}

with additive Gaussian noises

{ξ_{i}}

(see Figure 2). The corrupted positions of

L^{*}

are further assumed in Assumption A1 to be totally observed with design tensors in

X_{s} \subset {X_{i}}_{i = 1}^{N}

, and the remaining uncorrupted positions are sampled uniformly through design tensors in

X_{s}^{⊥} = {X_{i}}_{i = 1}^{N} ∖ X_{s}

.

Let

y = (y_{1}, \dots, y_{N})^{⊤} \in R^{N}

and

ξ = (ξ_{1}, \dots, ξ_{N})^{⊤} \in R^{N}

be the vector of observations and noises, respectively. Define the design operator

X : R^{d_{1} \times d_{2} \times d_{3}} \to R^{N}

as

X (\cdot) : = (〈\cdot, X_{1}〉, \dots, 〈\cdot, X_{N}〉)^{⊤},

and its adjoint operator

X^{*} (z) : = \sum_{i = 1}^{N} z_{i} X_{i}

for any

z \in R^{N}

. Then the observation model (6) can be rewritten in a compact form

\begin{matrix} y = X (L^{*} + S^{*}) + ξ . \end{matrix}
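To make the observation model concrete, the sketch below simulates it under Assumption A1 and implements the design operator and its adjoint. It is an illustration rather than the authors' code: each design tensor is represented by the flat index of its single non-zero entry, every corrupted position is observed exactly once (the assumption only requires at least once), and the names `simulate_observations`, `X_op` and `X_adj` are hypothetical.

```python
import numpy as np

def simulate_observations(L_star, S_star, n_clean, sigma, rng):
    """Draw (idx, y) from model (6); idx[i] is the flat position selected by X_i."""
    flat_S = S_star.ravel()
    corrupted = np.flatnonzero(flat_S)          # support of S*, fully observed
    clean = np.flatnonzero(flat_S == 0)         # uncorrupted positions
    sampled = rng.choice(clean, size=n_clean, replace=True)  # uniform, with replacement
    idx = np.concatenate([corrupted, sampled])
    noise = sigma * rng.standard_normal(idx.size)
    y = (L_star + S_star).ravel()[idx] + noise
    return idx, y

def X_op(T, idx):
    """Design operator: X(T) = (<T, X_1>, ..., <T, X_N>)."""
    return T.ravel()[idx]

def X_adj(z, idx, shape):
    """Adjoint operator: X^*(z) = sum_i z_i X_i (repeated positions accumulate)."""
    size = int(np.prod(shape))
    return np.bincount(idx, weights=z, minlength=size).reshape(shape)
```

With this representation the composition of the adjoint and the design operator is a diagonal operator whose entries count how many times each position was sampled, which is convenient for the algorithm in Section 4.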

3.2. The Proposed Estimator

The aim of robust tensor completion is to reconstruct the unknown low-rank

L^{*}

and sparse

S^{*}

from incomplete and noisy measurements

{(X_{i}, y_{i})}_{i = 1}^{N}

generated by the observation model (6). It can be treated as a robust extension of tensor completion in [33], and a noisy partial variant of tensor robust PCA [37].

To reconstruct the underlying low-rank tensor

L^{*}

and sparse tensor

S^{*}

, it is natural to consider the following minimization model:

\begin{matrix} min_{L, S} \frac{1}{2 N} {∥ y - X (L + S) ∥}_{2}^{2} + λ_{ι} r_{t b} (L) + λ_{s} {∥ S ∥}_{0}, \end{matrix}

(7)

where we use least squares as the fidelity term for Gaussian noises, the tubal rank as the regularization to impose low-rank structure in

L

, the tensor

l_{0}

-(pseudo)norm to regularize

S

for sparsity,

λ_{ι}, λ_{s} \geq 0

are tunable regularization parameters, balancing the regularizations and the fidelity term.

However, general rank and

l_{0}

-norm minimization is NP-hard [12,38], making it extremely hard to soundly solve Problem (7). For tractable low-rank and sparse optimization, we follow the most common idea to relax the non-convex functions

r_{t b} (\cdot)

and

{∥ \cdot ∥}_{0}

to their convex surrogates, i.e., the

*_{L}

–tubal nuclear norm

{∥ \cdot ∥}_{⋆}

and the tensor

l_{1}

-norm

{∥ \cdot ∥}_{1}

, respectively. Specifically, the following estimator is defined:

\begin{matrix} (\hat{L}, \hat{S}) : = \underset{{∥ L ∥}_{\infty} \leq a, {∥ S ∥}_{\infty} \leq a}{\arg\min} \frac{1}{2 N} {∥ y - X (L + S) ∥}_{2}^{2} + λ_{ι} {∥ L ∥}_{⋆} + λ_{s} {∥ S ∥}_{1}, \end{matrix}

(8)

where

a > 0

is a known constant constraining the magnitude of entries in

L^{*}

and

S^{*}

. The additional constraint

{∥ L ∥}_{\infty} \leq a

and

{∥ S ∥}_{\infty} \leq a

is very mild since most signals and corruptions are of limited energy in real applications. It can also provide a theoretical benefit to exclude the “spiky” tensors, which is important in controlling the separability of

L^{*}

and

S^{*}

. Such “non-spiky” constraints are also imposed in the previous literature [36,39,40], playing a key role in bounding the estimation error.

Then, it is natural to ask the following questions:

Q1:: How to compute the proposed estimator?
Q2:: How well can the proposed estimator estimate $L^{*}$ and $S^{*}$ ?

We first discuss Q1 in Section 4 and then answer Q2 in Section 5.

4. Algorithm

In this section, we answer Q1 by designing an algorithm based on ADMM to compute the proposed estimator.

To solve Problem (8), the first step is to introduce auxiliary variables

g, K, T, M, N

to deal with the complex couplings between

X (\cdot)

,

{∥ \cdot ∥}_{2}

,

{∥ \cdot ∥}_{⋆}

,

{∥ \cdot ∥}_{1}

, and

{∥ \cdot ∥}_{\infty}

as follows:

\begin{matrix} min_{g, L, S, K, T, M, N} & \frac{1}{2 N} {∥ g ∥}_{2}^{2} + λ_{ι} {∥ K ∥}_{⋆} + λ_{s} {∥ T ∥}_{1} + δ_{a}^{\infty} (M) + δ_{a}^{\infty} (N), \\ s . t . & g = y - X (L + S), L = K = M, S = T = N \end{matrix}

(9)

where

δ_{a}^{\infty} (\cdot)

is the indicator function of tensor

l_{\infty}

-norm ball defined as follows

\begin{matrix} δ_{a}^{\infty} (M) = \begin{cases} 0, & {∥ M ∥}_{\infty} \leq a, \\ + \infty, & {∥ M ∥}_{\infty} > a . \end{cases} \end{matrix}

We then give the augmented Lagrangian of Equation (9) with Lagrangian multipliers

z

and

{Z_{i}}_{i = 1}^{4}

and penalty parameter

ρ > 0

:

\begin{matrix} L_{ρ} (L, S, g, K, T, M, N, z, Z_{1}, Z_{2}, Z_{3}, Z_{4}) \\ = \frac{1}{2 N} {∥ g ∥}_{2}^{2} + λ_{ι} {∥ K ∥}_{⋆} + λ_{s} {∥ T ∥}_{1} + δ_{a}^{\infty} (M) + δ_{a}^{\infty} (N) \\ + 〈z, g + X (L + S) - y〉 + \frac{ρ}{2} {∥ g + X (L + S) - y ∥}_{2}^{2} \\ + 〈Z_{1}, L - K〉 + \frac{ρ}{2} {∥ L - K ∥}_{F}^{2} + 〈Z_{2}, L - M〉 + \frac{ρ}{2} {∥ L - M ∥}_{F}^{2} \\ + 〈Z_{3}, S - T〉 + \frac{ρ}{2} {∥ S - T ∥}_{F}^{2} + 〈Z_{4}, S - N〉 + \frac{ρ}{2} {∥ S - N ∥}_{F}^{2} \end{matrix}

(10)

Following the framework of standard two-block ADMM [41], we separate the primal variables into two blocks

(L, S)

and

(g, K, T, M, N)

, and update them alternately as follows:

Update the first block

(L, S)

: After the t-th iteration, we first update

(L, S)

by keeping the other variables fixed as follows:

\begin{matrix} (L^{t + 1}, S^{t + 1}) \\ = \underset{L, S}{\arg\min} L_{ρ} (L, S, g^{t}, K^{t}, T^{t}, M^{t}, N^{t}, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t}) \\ = \underset{L, S}{\arg\min} 〈z^{t}, g^{t} + X (L + S) - y〉 + \frac{ρ}{2} {∥ g^{t} + X (L + S) - y ∥}_{2}^{2} \\ + 〈Z_{1}^{t}, L - K^{t}〉 + \frac{ρ}{2} {∥ L - K^{t} ∥}_{F}^{2} + 〈Z_{2}^{t}, L - M^{t}〉 + \frac{ρ}{2} {∥ L - M^{t} ∥}_{F}^{2} \\ + 〈Z_{3}^{t}, S - T^{t}〉 + \frac{ρ}{2} {∥ S - T^{t} ∥}_{F}^{2} + 〈Z_{4}^{t}, S - N^{t}〉 + \frac{ρ}{2} {∥ S - N^{t} ∥}_{F}^{2} \end{matrix}

(11)

By taking derivatives with respect to

L

and

S

and setting them to zero, we obtain the following system of equations:

\begin{matrix} X^{*} z^{t} + ρ X^{*} (X (L + S) + g^{t} - y) + Z_{1}^{t} + ρ (L - K^{t}) + Z_{2}^{t} + ρ (L - M^{t}) & = 0 \\ X^{*} z^{t} + ρ X^{*} (X (L + S) + g^{t} - y) + Z_{3}^{t} + ρ (S - T^{t}) + Z_{4}^{t} + ρ (S - N^{t}) & = 0 \end{matrix}

(12)

Through solving the system of equations in Equation (12), we obtain

\begin{matrix} L^{t + 1} & = \frac{1}{4 ρ} (X^{*} X {(X^{*} X + I)}^{- 1} (2 A + B_{ι} + B_{s}) - 2 (A + B_{ι})) \\ S^{t + 1} & = \frac{1}{4 ρ} (X^{*} X {(X^{*} X + I)}^{- 1} (2 A + B_{ι} + B_{s}) - 2 (A + B_{s})) \end{matrix}

(13)

where

I

denotes the identity operator, and the intermediate tensors are given by

A = X^{*} (z^{t} + ρ (g^{t} - y))

,

B_{ι} = Z_{1}^{t} + Z_{2}^{t} - ρ (K^{t} + M^{t})

, and

B_{s} = Z_{3}^{t} + Z_{4}^{t} - ρ (T^{t} + N^{t})

.
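The closed form in Equation (13) can be sanity-checked numerically. The sketch below assumes entry-wise design tensors as in Assumption A1, stored as an array `idx` of flat positions (a representation chosen for this example), so that the composition of the adjoint and the design operator is diagonal with the sampling counts on its diagonal. The function name `update_LS` is hypothetical. The intermediate tensor A is formed with the factor ρ multiplying both g and y, which is what expanding Equation (12) yields.

```python
import numpy as np

def update_LS(idx, y, g, z, K, M, T, N, Z1, Z2, Z3, Z4, rho):
    """Closed-form (L, S) update of Equation (13) for entry-wise sampling."""
    shape, size = K.shape, K.size
    # X^*X is diagonal: entry (i, j, k) holds the number of times it was sampled.
    counts = np.bincount(idx, minlength=size).reshape(shape).astype(float)
    x_adj = lambda v: np.bincount(idx, weights=v, minlength=size).reshape(shape)
    A = x_adj(z + rho * (g - y))
    B_l = Z1 + Z2 - rho * (K + M)
    B_s = Z3 + Z4 - rho * (T + N)
    # X^*X (X^*X + I)^{-1} acts entry-wise as counts / (counts + 1).
    common = counts / (counts + 1.0) * (2 * A + B_l + B_s)
    L_new = (common - 2 * (A + B_l)) / (4 * rho)
    S_new = (common - 2 * (A + B_s)) / (4 * rho)
    return L_new, S_new
```

Substituting the returned pair back into the two stationarity conditions of Equation (12) should give zero residuals up to floating-point error.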

Update the second block

(g, K, T, M, N)

: According to the special form of the Lagrangian in Equation (10), the variables

g, K, T, M, N

in the second block can be updated separately as follows.

We first update

g

with fixed

(L, S)

:

\begin{matrix} g^{t + 1} & = \underset{g}{\arg\min} L_{ρ} (L^{t + 1}, S^{t + 1}, g, K, T, M, N, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t}) \\ & = \underset{g}{\arg\min} \frac{1}{2 N} {∥ g ∥}_{2}^{2} + \frac{ρ}{2} {∥ g + X (L^{t + 1} + S^{t + 1}) - y + ρ^{- 1} z^{t} ∥}_{2}^{2} \\ & = \frac{N ρ}{1 + N ρ} (y - X (L^{t + 1} + S^{t + 1}) - ρ^{- 1} z^{t}) \end{matrix}

(14)
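As a worked step (added for clarity, with the shorthand $r = X (L^{t + 1} + S^{t + 1}) - y + ρ^{- 1} z^{t}$ introduced only here), the closed form in Equation (14) follows from setting the gradient of the quadratic objective to zero:

```latex
\begin{aligned}
0 &= \tfrac{1}{N}\, g + \rho\,(g + r)
\;\Longrightarrow\;
\Bigl(\tfrac{1}{N} + \rho\Bigr)\, g = -\rho\, r, \\
g^{t+1} &= -\frac{N\rho}{1 + N\rho}\, r
= \frac{N\rho}{1 + N\rho}\Bigl(y - X(L^{t+1} + S^{t+1}) - \rho^{-1} z^{t}\Bigr).
\end{aligned}
```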

We then update

K

with fixed

(L, S)

:

\begin{matrix} K^{t + 1} & = \underset{K}{\arg\min} L_{ρ} (L^{t + 1}, S^{t + 1}, g, K, T, M, N, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t}) \\ & = \underset{K}{\arg\min} λ_{ι} {∥ K ∥}_{⋆} + 〈Z_{1}^{t}, L^{t + 1} - K〉 + \frac{ρ}{2} {∥ L^{t + 1} - K ∥}_{F}^{2} \\ & = {Prox}_{λ_{ι} ρ^{- 1}}^{{∥ \cdot ∥}_{⋆}} (L^{t + 1} + ρ^{- 1} Z_{1}^{t}), \end{matrix}

(15)

where

{Prox}_{ρ^{- 1}}^{{∥ \cdot ∥}_{⋆}} (\cdot)

is the proximality operator of

*_{L}

–TNN given in the following lemma.

Lemma 1

(A modified version of Theorem 3.2 in [26]). Let

L_{0} \in R^{d_{1} \times d_{2} \times d_{3}}

be any tensor with

*_{L}

–SVD

L_{0} = U *_{L} D *_{L} V^{⊤}

. Then the proximality operator of

*_{L}

–TNN at

L_{0}

with constant

τ > 0

, defined as

{Prox}_{τ}^{{∥ \cdot ∥}_{⋆}} (L_{0}) : = {\arg\min}_{L} τ {∥ L ∥}_{⋆} + \frac{1}{2} {∥ L - L_{0} ∥}_{F}^{2}

, can be computed by

\begin{matrix} {Prox}_{τ}^{{∥ \cdot ∥}_{⋆}} (L_{0}) = U *_{L} D_{τ} *_{L} V^{⊤} \end{matrix}

(16)

where

\begin{matrix} D_{τ} = L^{- 1} ({(L (D) - τ)}_{+}) . \end{matrix}

(17)

Here,

t_{+}

denotes the positive part of t, i.e.,

t_{+} = max (t, 0)

.
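Lemma 1 amounts to singular value thresholding of each frontal slice in the transformed domain. The sketch below illustrates this for a real orthogonal transformation matrix; `prox_l_tnn` and the argument name `Lmat` are illustrative choices, not names from the paper.

```python
import numpy as np

def prox_l_tnn(L0, Lmat, tau):
    """Proximal operator of tau * (*_L-TNN) at L0, following Equations (16) and (17)."""
    Th = np.einsum('ijk,lk->ijl', L0, Lmat)          # transform along mode 3
    out = np.empty_like(Th)
    for k in range(Th.shape[2]):
        u, s, vt = np.linalg.svd(Th[:, :, k], full_matrices=False)
        # Shrink the singular values of this slice by tau and rebuild it.
        out[:, :, k] = (u * np.maximum(s - tau, 0.0)) @ vt
    return np.einsum('ijk,lk->ijl', out, Lmat.T)     # inverse transform
```

The slices can be thresholded independently because, for an orthogonal transform, both the Frobenius norm and the $*_{L}$-TNN decompose over the transformed frontal slices.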

We update

T

with fixed

(L, S)

:

\begin{matrix} T^{t + 1} & = \underset{T}{\arg\min} L_{ρ} (L^{t + 1}, S^{t + 1}, g, K, T, M, N, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t}) \\ & = \underset{T}{\arg\min} λ_{s} {∥ T ∥}_{1} + 〈Z_{3}^{t}, S^{t + 1} - T〉 + \frac{ρ}{2} {∥ S^{t + 1} - T ∥}_{F}^{2} \\ & = {Prox}_{λ_{s} ρ^{- 1}}^{{∥ \cdot ∥}_{1}} (S^{t + 1} + ρ^{- 1} Z_{3}^{t}), \end{matrix}

(18)

where

{Prox}_{τ}^{{∥ \cdot ∥}_{1}} (T)

is the proximality operator [19] of the tensor

l_{1}

-norm at point

T

given as

{Prox}_{τ}^{{∥ \cdot ∥}_{1}} (T) = sign (T) ⊙ {(| T | - τ)}_{+}

, where ⊙ denotes the element-wise product.

We then update

M

with fixed

(L, S)

:

\begin{matrix} M^{t + 1} & = \underset{M}{\arg\min} L_{ρ} (L^{t + 1}, S^{t + 1}, g, K, T, M, N, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t}) \\ & = \underset{M}{\arg\min} δ_{a}^{\infty} (M) + 〈Z_{2}^{t}, L^{t + 1} - M〉 + \frac{ρ}{2} {∥ L^{t + 1} - M ∥}_{F}^{2} \\ & = {Proj}_{a}^{{∥ \cdot ∥}_{\infty}} (L^{t + 1} + ρ^{- 1} Z_{2}^{t}), \end{matrix}

(19)

where

{Proj}_{a}^{{∥ \cdot ∥}_{\infty}} (\cdot)

is the projector onto the tensor

l_{\infty}

-norm ball of radius a, which is given by

{Proj}_{a}^{{∥ \cdot ∥}_{\infty}} (M) = sign (M) ⊙ min (| M |, a)

[19].

Similarly, we update

N

as follows:

\begin{matrix} N^{t + 1} & = \underset{N}{\arg\min} L_{ρ} (L^{t + 1}, S^{t + 1}, g, K, T, M, N, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t}) \\ & = \underset{N}{\arg\min} δ_{a}^{\infty} (N) + 〈Z_{4}^{t}, S^{t + 1} - N〉 + \frac{ρ}{2} {∥ S^{t + 1} - N ∥}_{F}^{2} \\ & = {Proj}_{a}^{{∥ \cdot ∥}_{\infty}} (S^{t + 1} + ρ^{- 1} Z_{4}^{t}) . \end{matrix}

(20)
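The two entry-wise operators used in the updates of T, M and N are simple to implement. The following NumPy sketch shows the soft-thresholding operator of Equation (18) and the projection of Equations (19) and (20); the function names are illustrative.

```python
import numpy as np

def soft_threshold(T, tau):
    """Proximal operator of tau * l1-norm: sign(T) * max(|T| - tau, 0), entry-wise."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

def project_linf(M, a):
    """Projection onto the l_inf-norm ball of radius a: sign(M) * min(|M|, a), entry-wise."""
    return np.sign(M) * np.minimum(np.abs(M), a)
```

The projection is equivalent to clipping every entry to the interval [-a, a], so `np.clip(M, -a, a)` gives the same result.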

Update the dual variables

z

and

{Z_{i}}

: According to the update strategy of dual variables in ADMM [41], the variables

z

and

{Z_{i}}

can be updated using dual ascent as follows:

\begin{matrix} z^{t + 1} & = z^{t} + ρ (g^{t + 1} + X (L^{t + 1} + S^{t + 1}) - y) \\ Z_{1}^{t + 1} & = Z_{1}^{t} + ρ (L^{t + 1} - K^{t + 1}) \\ Z_{2}^{t + 1} & = Z_{2}^{t} + ρ (L^{t + 1} - M^{t + 1}) \\ Z_{3}^{t + 1} & = Z_{3}^{t} + ρ (S^{t + 1} - T^{t + 1}) \\ Z_{4}^{t + 1} & = Z_{4}^{t} + ρ (S^{t + 1} - N^{t + 1}) \end{matrix}

(21)

The algorithm for solving Problem (8) is summarized in Algorithm 1.

Algorithm 1 Solving Problem (8) using ADMM.

Input:: The design tensors $\{X_{i}\}$ and observations $\{y_{i}\}$ , the regularization parameters $λ_{ι}, λ_{s}$ , the $l_{\infty}$ -norm bound a, the penalty parameter $ρ$ of the Lagrangian, the convergence tolerance $δ$ , the maximum iteration number $T_{max}$ .
1:: Initialize $t = 0$ , $g^{0} = z^{0} = 0 \in R^{N}, L^{0} = S^{0} = K^{0} = T^{0} = M^{0} = N^{0} = Z_{1}^{0} = Z_{2}^{0} = Z_{3}^{0} = Z_{4}^{0} = 0 \in R^{d_{1} \times d_{2} \times d_{3}}$
2:: for $t = 0, \dots, T_{max}$ do
3:: Update $(L^{t + 1}, S^{t + 1})$ by Equation (13);
4:: Update $(g^{t + 1}, K^{t + 1}, T^{t + 1}, M^{t + 1}, N^{t + 1})$ by Equations (14)–(20), respectively;
5:: Update $(z^{t + 1}, Z_{1}^{t + 1}, Z_{2}^{t + 1}, Z_{3}^{t + 1}, Z_{4}^{t + 1})$ by Equation (21);
6:: Check the convergence criteria:
7:: (i) convergence of primal variables:

$\begin{matrix} ∥ A^{t + 1} - A^{t} ∥_{\infty} \leq δ, \forall A \in {g, L, S, K, T, M, N} \end{matrix}$

(ii) convergence of constraints:

$\begin{matrix} max \{∥ L^{t + 1} - K^{t + 1} ∥_{\infty}, ∥ L^{t + 1} - M^{t + 1} ∥_{\infty}\} & \leq δ \\ max \{∥ S^{t + 1} - T^{t + 1} ∥_{\infty}, ∥ S^{t + 1} - N^{t + 1} ∥_{\infty}\} & \leq δ \\ ∥ g^{t + 1} + X (L^{t + 1} + S^{t + 1}) - y ∥_{\infty} & \leq δ \end{matrix}$
8:: end for
Output:: $(\hat{L}, \hat{S}) = (L^{t + 1}, S^{t + 1})$ .
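Algorithm 1 can be sketched compactly in NumPy. The code below is an untuned illustration under stated assumptions, not the authors' implementation: design tensors are entry-wise and represented by flat indices `idx`, the transformation matrix `Lmat` is real orthogonal, the intermediate tensor A carries the factor ρ on both g and y as obtained by expanding Equation (12), and the function name `rtc_admm` and its default parameters are hypothetical.

```python
import numpy as np

def rtc_admm(idx, y, shape, Lmat, lam_l, lam_s, a, rho=1.0, tol=1e-6, max_iter=500):
    """ADMM sketch for Problem (8) with entry-wise sampling at flat positions idx."""
    size, n = int(np.prod(shape)), y.size
    counts = np.bincount(idx, minlength=size).reshape(shape).astype(float)  # diag of X^*X
    X = lambda T: T.ravel()[idx]
    Xadj = lambda v: np.bincount(idx, weights=v, minlength=size).reshape(shape)
    fwd = lambda T: np.einsum('ijk,lk->ijl', T, Lmat)
    inv = lambda T: np.einsum('ijk,lk->ijl', T, Lmat.T)

    def prox_tnn(T, tau):
        Th = fwd(T)
        for k in range(shape[2]):
            u, s, vt = np.linalg.svd(Th[:, :, k], full_matrices=False)
            Th[:, :, k] = (u * np.maximum(s - tau, 0.0)) @ vt
        return inv(Th)

    soft = lambda T, tau: np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)
    g, z = np.zeros(n), np.zeros(n)
    L, S, K, T, M, N, Z1, Z2, Z3, Z4 = (np.zeros(shape) for _ in range(10))
    for _ in range(max_iter):
        prev = (g, L, S, K, T, M, N)
        # First block (L, S): closed form of Equation (13).
        A = Xadj(z + rho * (g - y))
        Bl = Z1 + Z2 - rho * (K + M)
        Bs = Z3 + Z4 - rho * (T + N)
        common = counts / (counts + 1.0) * (2 * A + Bl + Bs)
        L = (common - 2 * (A + Bl)) / (4 * rho)
        S = (common - 2 * (A + Bs)) / (4 * rho)
        # Second block (g, K, T, M, N): Equations (14)-(20).
        r = X(L + S) - y
        g = -(n * rho / (1.0 + n * rho)) * (r + z / rho)
        K = prox_tnn(L + Z1 / rho, lam_l / rho)
        T = soft(S + Z3 / rho, lam_s / rho)
        M = np.clip(L + Z2 / rho, -a, a)
        N = np.clip(S + Z4 / rho, -a, a)
        # Dual ascent: Equation (21).
        z = z + rho * (g + r)
        Z1 = Z1 + rho * (L - K)
        Z2 = Z2 + rho * (L - M)
        Z3 = Z3 + rho * (S - T)
        Z4 = Z4 + rho * (S - N)
        # Stopping rule: l_inf change of primal variables and constraint residuals.
        change = max(np.max(np.abs(p - q)) for p, q in zip(prev, (g, L, S, K, T, M, N)))
        resid = max(np.max(np.abs(L - K)), np.max(np.abs(L - M)),
                    np.max(np.abs(S - T)), np.max(np.abs(S - N)), np.max(np.abs(g + r)))
        if max(change, resid) <= tol:
            break
    return L, S
```

A call such as `rtc_admm(idx, y, (d1, d2, d3), Lmat, lam_l, lam_s, a)` returns the pair of estimates; suitable values for the regularization and penalty parameters depend on the data and are not prescribed by this sketch.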

Complexity Analysis: The time complexity of Algorithm 1 is analyzed as follows. Due to the special structures of design tensors

{X_{i}}

, the operators

X

and

X^{*} X {(X^{*} X + I)}^{- 1}

can be implemented with time cost

O (N)

and

O (d_{1} d_{2} d_{3} + N)

, respectively. The cost of updating

L, S, T, M, N

and

{Z_{i}}

is

O (d_{1} d_{2} d_{3})

. The main time cost in Algorithm 1 lies in the update of

K

which needs the

*_{L}

–SVD on

d_{1} \times d_{2} \times d_{3}

tensors, involving the

*_{L}

-transform (costing

O (d_{1} d_{2} d_{3}^{2})

in general), and

d_{3}

matrix SVDs on

d_{1} \times d_{2}

matrices (costing

O (d_{1} d_{2} d_{3} (d_{1} \land d_{2}))

). Thus, the one-iteration cost of Algorithm 1 is

\begin{matrix} O (d_{1} d_{2} d_{3} ((d_{1} \land d_{2}) + d_{3})) \end{matrix}

(22)

in general, and can be reduced to

O (d_{1} d_{2} d_{3} ((d_{1} \land d_{2}) + log d_{3}))

for some linear transforms L which have fast implementations (like DFT and DCT).

Convergence Analysis: According to [28], the convergence rate of general ADMM-based algorithms is

O (1 / t)

, where t is the iteration number. The convergence analysis of Algorithm 1 is established in Theorem 2.

Theorem 2

(Convergence of Algorithm 1). For any positive constant ρ, if the unaugmented Lagrangian function

L_{0} (g, L, S, K, T, M, N, z, Z_{1}, Z_{2}, Z_{3}, Z_{4})

has a saddle point, then the iterations

(g^{t}, L^{t}, S^{t}, K^{t}, T^{t}, M^{t}, N^{t}, z^{t}, Z_{1}^{t}, Z_{2}^{t}, Z_{3}^{t}, Z_{4}^{t})

in Algorithm 1 satisfy the residual convergence, objective convergence and dual variable convergence (defined in [41]) of Problem (9) as

t \to \infty

.

Proof.

The key idea is to rewrite Problem (9) into a standard two-block ADMM problem. For notational simplicity, let

\begin{matrix} f (u) = 0, \quad g (v) = \frac{1}{2 N} {∥ g ∥}_{2}^{2} + λ_{ι} {∥ K ∥}_{⋆} + λ_{s} {∥ T ∥}_{1} + δ_{a}^{\infty} (M) + δ_{a}^{\infty} (N), \end{matrix}

with

u, v, w, c

and

A

defined as follows

\begin{matrix} u = [\begin{matrix} vec (L) \\ vec (S) \end{matrix}] \in R^{2 d_{1} d_{2} d_{3}}, v = [\begin{matrix} g \\ vec (K) \\ vec (T) \\ vec (M) \\ vec (N) \end{matrix}] \in R^{N + 4 d_{1} d_{2} d_{3}}, \\ w = [\begin{matrix} z \\ vec (Z_{1}) \\ vec (Z_{2}) \\ vec (Z_{3}) \\ vec (Z_{4}) \end{matrix}] \in R^{N + 4 d_{1} d_{2} d_{3}}, c = [\begin{matrix} - y \\ 0 \\ 0 \\ 0 \\ 0 \end{matrix}] \in R^{N + 4 d_{1} d_{2} d_{3}}, \end{matrix}

and

\begin{matrix} A = [\begin{matrix} - X & - X \\ I_{d_{1} d_{2} d_{3}} & 0 \\ I_{d_{1} d_{2} d_{3}} & 0 \\ 0 & I_{d_{1} d_{2} d_{3}} \\ 0 & I_{d_{1} d_{2} d_{3}} \end{matrix}] \in R^{(N + 4 d_{1} d_{2} d_{3}) \times (2 d_{1} d_{2} d_{3})}, \text{with } X = [\begin{matrix} vec (X_{1})^{⊤} \\ vec (X_{2})^{⊤} \\ ⋮ \\ vec (X_{N})^{⊤} \end{matrix}] \in R^{N \times (d_{1} d_{2} d_{3})}, \end{matrix}

(23)

where

vec (\cdot)

denotes the operation of tensor vectorization (see [30]).

It can be verified that

f (\cdot)

and

g (\cdot)

are closed, proper convex functions. Then, Problem (9) can be re-written as follows:

\begin{matrix} min_{u, v} & f (u) + g (v) \\ s . t . & A u - v = c . \end{matrix}

According to the convergence analysis in [41], we have:

\begin{matrix} objective convergence : & lim_{t \to \infty} f (u^{t}) + g (v^{t}) = f^{⋆} + g^{⋆}, \\ dual variable convergence : & lim_{t \to \infty} w^{t} = w^{⋆}, \\ constraint convergence : & lim_{t \to \infty} A u^{t} - v^{t} = c, \end{matrix}

where

f^{⋆}, g^{⋆}

are the optimal values of

f (u)

,

g (v)

, respectively. Variable

w^{⋆}

is a dual optimal point defined as:

\begin{matrix} w^{⋆} = [\begin{matrix} z^{⋆} \\ vec (Z_{1}^{⋆}) \\ vec (Z_{2}^{⋆}) \\ vec (Z_{3}^{⋆}) \\ vec (Z_{4}^{⋆}) \end{matrix}] \end{matrix}

where

(z^{⋆}, Z_{1}^{⋆}, Z_{2}^{⋆}, Z_{3}^{⋆}, Z_{4}^{⋆})

is the component of dual variables in a saddle point

(g^{⋆}, L^{⋆}, S^{⋆}, K^{⋆}, T^{⋆}, M^{⋆}, N^{⋆}, z^{⋆}, Z_{1}^{⋆}, Z_{2}^{⋆}, Z_{3}^{⋆}, Z_{4}^{⋆})

of the unaugmented Lagrangian

L_{0} (g, L, S, K, T, M, N, z, Z_{1}, Z_{2}, Z_{3}, Z_{4})

. □
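For reference, the generic two-block ADMM iteration to which the proof reduces Problem (9) can be written out as below. This is the textbook form for the constraint A u - v = c with penalty parameter ρ; the multiplier sign convention is an assumption here and may differ from the one used in Algorithm 1.

```latex
\begin{align*}
% Augmented Lagrangian of  min f(u) + g(v)  s.t.  A u - v = c
L_{\rho}(u, v, w) &= f(u) + g(v) + w^{\top} (A u - v - c)
                     + \frac{\rho}{2} \| A u - v - c \|_{2}^{2}, \\
% One ADMM iteration: two primal block updates, then a dual ascent step
u^{t+1} &= \arg\min_{u} L_{\rho}(u, v^{t}, w^{t}), \\
v^{t+1} &= \arg\min_{v} L_{\rho}(u^{t+1}, v, w^{t}), \\
w^{t+1} &= w^{t} + \rho \, (A u^{t+1} - v^{t+1} - c).
\end{align*}
```

Because f(u) = 0, the u-update is an unconstrained least-squares problem in (vec(L), vec(S)), while the v-update separates across g, K, T, M and N.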

5. Statistical Performance

In this section, we answer Q2 by studying the statistical performance of the proposed estimator

(\hat{L}, \hat{S})

. Specifically, the goal is to upper bound the squared F-norm error

∥ \hat{L} - L^{*} ∥_{F}^{2} + {∥ \hat{S} - S^{*} ∥}_{F}^{2}

. We will first give an upper bound on the estimation error in a non-asymptotic manner, and then prove that the upper bound is minimax optimal up to a logarithm factor.

5.1. Upper Bound on the Estimation Error

We establish an upper bound on the estimation error in the following theorem. For notational simplicity, let

N_{s} = | X_{s} |

and

N_{ι} = | X_{s}^{⊥} |

denote the number of corrupted and uncorrupted observations of

L^{*}

in the observation model (6), respectively.

Theorem 3

(Upper bound on the estimation error). If the number of uncorrupted observations in the observation model (6) satisfies

\begin{matrix} N_{ι} \geq c_{1} d_{1} d_{3} log (d_{1} d_{3} + d_{2} d_{3}) {log}^{2} (d_{1} + d_{2}) \end{matrix}

(24)

and regularization parameters in Problem (8) are set by

\begin{matrix} λ_{ι} = c_{2} (σ \lor a) \sqrt{\frac{log (d_{1} d_{3} + d_{2} d_{3})}{d_{1} \land d_{2}}}, \text{and } λ_{s} = c_{3} (σ \lor a) \frac{log (d_{1} d_{3} + d_{2} d_{3})}{N}, \end{matrix}

(25)

then it holds with probability at least

1 - c_{5} {(d_{1} d_{3} + d_{2} d_{3})}^{- 1}

that:

\begin{matrix} \frac{∥ \hat{L} - L^{*} ∥_{F}^{2} + {∥ \hat{S} - S^{*} ∥}_{F}^{2}}{d_{1} d_{2} d_{3}} \\ \leq C (r_{t b} (L^{*}) \cdot \frac{(σ^{2} \lor a^{2}) (d_{1} \lor d_{2}) d_{3} log \tilde{d}}{N_{ι}} + \frac{N_{s} log (d_{1} d_{3} + d_{2} d_{3})}{N_{ι}} + {∥ S^{*} ∥}_{0} \cdot \frac{a^{2}}{d_{1} d_{2} d_{3}}) . \end{matrix}

(26)
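As a quick numerical illustration, the sketch below evaluates the right-hand side of Equation (24) and the two regularization parameters of Equation (25) exactly as stated. The absolute constants c1, c2 and c3 are not specified by the theorem; setting them to 1 is purely an assumption for illustration, as are the function and key names.

```python
import math


def theorem3_settings(d1, d2, d3, N, sigma, a, c1=1.0, c2=1.0, c3=1.0):
    """Evaluate Equations (24) and (25); c1, c2, c3 are placeholder constants."""
    log_term = math.log(d1 * d3 + d2 * d3)
    # Right-hand side of Equation (24): required number of uncorrupted observations.
    min_uncorrupted = c1 * d1 * d3 * log_term * math.log(d1 + d2) ** 2
    scale = max(sigma, a)  # sigma v a
    # Equation (25): regularization parameters for the low-rank and sparse parts.
    lambda_l = c2 * scale * math.sqrt(log_term / min(d1, d2))
    lambda_s = c3 * scale * log_term / N
    return {
        "min_uncorrupted": min_uncorrupted,
        "lambda_l": lambda_l,
        "lambda_s": lambda_s,
    }


settings = theorem3_settings(100, 100, 30, N=150_000, sigma=0.1, a=1.0)
print(settings)
```

In practice the constants must be tuned, which is why the experiments in Section 7.2 search around theory-motivated values rather than fixing them.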

Theorem 3 implies that, if the noise level

σ

and spikiness level a are fixed, and all the corrupted positions are observed exactly only once (i.e., the number of corrupted observations

N_{s} = {∥ S^{*} ∥}_{0}

), then the estimation error in Equation (26) would be bounded by

\begin{matrix} O (r_{t b} (L^{*}) \cdot \frac{(d_{1} \lor d_{2}) d_{3} log (d_{1} d_{3} + d_{2} d_{3})}{N_{ι}} + {∥ S^{*} ∥}_{0} \cdot (\frac{log (d_{1} d_{3} + d_{2} d_{3})}{N_{ι}} + \frac{1}{d_{1} d_{2} d_{3}})) . \end{matrix}

(27)

Note that the bound in Equation (27) is consistent with intuition: if the underlying tensor

L^{*}

gets more complex (i.e., with higher tubal rank), then the estimation error will be larger; if the corruption tensor

S^{*}

gets denser, then the estimation error will also become larger; if the number of uncorrupted observations

N_{ι}

gets larger, then the estimation error will decrease. The scaling behavior of the estimation error in Equation (27) will be verified through experiments on synthetic data in Section 7.1.
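The three monotonicity statements above can be checked numerically with a minimal sketch of the expression inside the O(·) of Equation (27). The hidden constant is dropped, and the function name `bound_27` and its argument names are assumptions made for illustration.

```python
import math


def bound_27(r, s, n_uncorrupted, d1, d2, d3):
    """Expression inside the O(.) of Equation (27), with the constant dropped."""
    log_term = math.log(d1 * d3 + d2 * d3)
    # Low-rank part: grows linearly with the tubal rank r.
    rank_term = r * max(d1, d2) * d3 * log_term / n_uncorrupted
    # Sparse part: grows linearly with the number of corrupted entries s.
    sparse_term = s * (log_term / n_uncorrupted + 1.0 / (d1 * d2 * d3))
    return rank_term + sparse_term


base = bound_27(r=3, s=3000, n_uncorrupted=150_000, d1=100, d2=100, d3=30)
higher_rank = bound_27(r=6, s=3000, n_uncorrupted=150_000, d1=100, d2=100, d3=30)
denser_outliers = bound_27(r=3, s=6000, n_uncorrupted=150_000, d1=100, d2=100, d3=30)
more_samples = bound_27(r=3, s=3000, n_uncorrupted=250_000, d1=100, d2=100, d3=30)
print(base, higher_rank, denser_outliers, more_samples)
```

The bound is linear in r, in s, and in 1/N_ι when the other quantities are held fixed, which is the scaling that Section 7.1 examines empirically.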

Remark 1

(Consistency with prior models for robust low-tubal-rank tensor completion). According to Equation (27), our

*_{L}

–SVD-based estimator in Equation (8) allows the tubal rank

r_{t b} (L^{*})

to take the order

O (d_{2} / log (d_{1} d_{3} + d_{2} d_{3}))

, and the corruption ratio

∥ S^{*} ∥_{0} / (d_{1} d_{2} d_{3})

to be

O (1)

for approximate estimation with small error. This is better by a logarithmic factor than the result for the t-SVD-based robust tensor completion model in [8], which allows

r_{t b} (L^{*}) = O (d_{2} / {log}^{2} (d_{1} d_{3} + d_{2} d_{3}))

and

∥ S^{*} ∥_{0} / (d_{1} d_{2} d_{3}) = O (1)

.
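The order of the admissible tubal rank can be traced through the low-rank term of Equation (27) with one line of algebra. The sketch below assumes, for illustration only, that a constant fraction p of the entries is observed without corruption, i.e., N_ι = p d_1 d_2 d_3.

```latex
\begin{align*}
r_{tb}(L^{*}) \cdot \frac{(d_{1} \lor d_{2}) \, d_{3} \log (d_{1} d_{3} + d_{2} d_{3})}{N_{\iota}}
  &= r_{tb}(L^{*}) \cdot \frac{(d_{1} \lor d_{2}) \, d_{3} \log (d_{1} d_{3} + d_{2} d_{3})}{p \, d_{1} d_{2} d_{3}} \\
  &= \frac{r_{tb}(L^{*}) \log (d_{1} d_{3} + d_{2} d_{3})}{p \, (d_{1} \land d_{2})} .
\end{align*}
```

Under this assumption the low-rank term stays below a fixed level ε whenever r_{tb}(L^{*}) ≤ ε p (d_1 ∧ d_2) / log(d_1 d_3 + d_2 d_3), which agrees with the order stated in Remark 1 when d_2 = d_1 ∧ d_2.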

Remark 2

(Consistency with prior models for noisy low-tubal-rank tensor completion). If

∥ S^{*} ∥_{0} = 0

, i.e., the corruption

S^{*}

vanishes, then we obtain

\begin{matrix} \frac{∥ \hat{L} - L^{*} ∥_{F}^{2}}{d_{1} d_{2} d_{3}} = O (\frac{r_{t b} (L^{*}) (d_{1} \lor d_{2}) d_{3} log (d_{1} d_{3} + d_{2} d_{3})}{N}) \end{matrix}

which is consistent with the error bound for t-SVD-based noisy tensor completion [42,43,44], and

*_{L}

–SVD-based tensor Dantzig Selector in [18].

Remark 3

(Consistency with prior models for robust low-tubal-rank tensor decomposition). In the setting of Robust Tensor Decomposition (RTD) [34], the fully observed model is considered instead of our observation model in Equation (6). For the RTD problem, our error bound in Equation (27) is consistent with the t-SVD-based bound for RTD [34] (up to a logarithm factor).

Remark 4

(No exact recovery guarantee). According to Theorem 3, when

σ = 0

and

∥ S^{*} ∥_{0} = 0

, i.e., in the noiseless case, the estimation error is upper bounded by

O (a^{2} (d_{1} \lor d_{2}) d_{3} r_{t b} (L^{*}) log \tilde{d} / N)

which is not zero. Thus, no exact recovery is guaranteed by Theorem 3. It can be seen as a trade-off that we do not assume the low-tubal-rank tensor

L^{*}

to satisfy the tensor incoherence conditions [8,35,37], which essentially ensure the separability between

L^{*}

and

S^{*}

.

5.2. A Minimax Lower Bound for the Estimation Error

In Theorem 3, we established an upper bound on the estimation error for Model (8). One may then ask the complementary questions: how tight is the upper bound? Are there fundamental (model-independent) limits on the estimation error in robust tensor completion? This section answers these questions.

To analyze the optimality of the proposed upper bound in Theorem 3, a minimax lower bound on the estimation error is established for the tensor pair

(L^{*}, S^{*})

belonging to the class

A (r, s, a)

of tensor pairs defined as:

\begin{matrix} A (r, s, a) : = \{(L, S) | r_{t b} (L) \leq r, {∥ S ∥}_{0} \leq s, {∥ L ∥}_{\infty} < a, {∥ S ∥}_{\infty} \leq a\} \end{matrix}

(28)

We then define the associated element-wise minimax error as follows

\begin{matrix} M (A (r, s, a)) : = inf_{(\hat{L}, \hat{S})} sup_{(L^{*}, S^{*}) \in A (r, s, a)} E [\frac{∥ \hat{L} - L^{*} ∥_{F}^{2} + {∥ \hat{S} - S^{*} ∥}_{F}^{2}}{d_{1} d_{2} d_{3}}], \end{matrix}

(29)

where the infimum ranges over all pairs of estimators

(\hat{L}, \hat{S})

, the supremum ranges over all pairs of underlying tensors

(L^{*}, S^{*})

in the given tensor class

A (r, s, a)

, and the expectation is taken over the design tensors

{X_{i}}

and i.i.d. Gaussian noises

{ξ_{i}}

in the observation model (6). We obtain the following theorem.

Theorem 4

(Minimax lower bound). Suppose the dimensionality

d_{1}, d_{2} \geq 2

, the rank and sparsity parameters

r \in [d_{1} \lor d_{2}], s \leq d_{1} d_{2} d_{3} / 2

, the number of uncorrupted entries

N_{ι} \geq r d_{1} d_{3}

, and the number of corrupted entries

N_{s} \leq τ r \tilde{d}

with a constant

τ > 0

. Then, under Assumption A1, there exist absolute constants

b \in (0, 1)

and

c > 0

, such that

\begin{matrix} M (A (r, s, a)) \geq b ϕ (N, r, s) \end{matrix}

(30)

where

\begin{matrix} ϕ (N, r, s) : = {(σ \land a)}^{2} (\frac{r (d_{1} + d_{2}) d_{3} + N_{s}}{N_{ι}} + \frac{s}{d_{1} d_{2} d_{3}}) . \end{matrix}

(31)

The lower bound given in Equation (30) implies that the proposed upper bound in Theorem 3 is optimal (up to a log factor) in the minimax sense for tensors belonging to the set

A (r, s, a)

. That is, no estimator can achieve a more accurate estimate than our estimator in Equation (8) (up to a log factor) for

(L^{*}, S^{*}) \in A (r, s, a)

, thereby showing the optimality of the proposed estimator.
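The rate ϕ in Equation (31) is straightforward to evaluate. The sketch below does so for given problem parameters; the function name `minimax_rate` and its argument names are illustrative assumptions, and the constant b of Equation (30) is not included.

```python
def minimax_rate(r, s, n_corrupted, n_uncorrupted, d1, d2, d3, sigma, a):
    """Evaluate phi(N, r, s) from Equation (31)."""
    scale = min(sigma, a) ** 2  # (sigma ^ a)^2
    # Degrees of freedom of the low-rank part plus corrupted observations,
    # relative to the number of uncorrupted observations.
    sampling_term = (r * (d1 + d2) * d3 + n_corrupted) / n_uncorrupted
    # Fraction of entries that may be corrupted.
    sparsity_term = s / (d1 * d2 * d3)
    return scale * (sampling_term + sparsity_term)


phi = minimax_rate(r=3, s=3000, n_corrupted=3000, n_uncorrupted=150_000,
                   d1=100, d2=100, d3=30, sigma=0.1, a=1.0)
print(phi)
```

Comparing this expression with Equation (26) term by term shows that the upper and lower bounds differ in the logarithmic factors and in the use of σ ∨ a versus σ ∧ a.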

6. Connections and Differences with Previous Works

In this section, we discuss the connections and differences with existing nuclear norm based robust matrix/tensor completion models, where the underlying matrix/tensor suffers from missing values, gross sparse outliers, and small dense noises at the same time.

First, we briefly introduce and analyze the two most related models, i.e., the matrix nuclear norm based model [36] and the sum of mode-wise matrix nuclear norms based model [45] as follows.

(1): The matrix Nuclear Norm (NN) based model [36]: If the underlying tensor is 2-way, i.e., a matrix, then the observation model in Equation (6) becomes the setting for robust matrix completion, and the proposed estimator in Equation (8) degenerates to the matrix nuclear norm based estimator in [36]. In both model formulation and statistical analysis, this work can be seen as a 3-way generalization of [36].
Moreover, by conducting robust matrix completion on each frontal slice of a 3-way tensor, we can obtain the matrix nuclear norm based robust tensor completion model as follows:

$\begin{matrix} min_{L, S} \frac{1}{2 N} {∥ y - X (L + S) ∥}_{2}^{2} + λ_{ι} \sum_{k = 1}^{d_{3}} (∥ L^{(k)} ∥_{*} + λ_{s} ∥ S^{(k)} ∥_{1}) \end{matrix}$

(32)
(2): The Sum of mode-wise matrix Nuclear Norms (SNN) based model [45]: Huang et al. [45] proposed a robust tensor completion model based on the sum of mode-wise nuclear norms deduced by the Tucker decomposition as follows

$\begin{matrix} min_{L, S} \frac{1}{2 N} {∥ y - X (L + S) ∥}_{2}^{2} + \sum_{k = 1}^{3} α_{k} ∥ L_{(k)} ∥_{*} + λ_{s} {∥ S ∥}_{1}, \end{matrix}$

(33)

where $L_{(k)} \in R^{d_{k} \times \prod_{j \neq k} d_{j}}$ is the mode-k matricization of tensor $L \in R^{d_{1} \times d_{2} \times d_{3}}$ , for all $k = 1, 2, 3$ .
The main differences between SNN and this work are two-fold: (i) SNN is based on the Tucker decomposition [15], whereas this work is based on the recently proposed tensor $*_{L}$ -SVD [24]; (ii) the theoretical analysis for SNN cannot guarantee the minimax optimality of the model in [45], whereas the minimax optimality of the proposed estimator is rigorously established in Section 5.
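The mode-k matricization used by the SNN model in Equation (33) can be sketched in a few lines for a tensor stored as nested lists. The column ordering below (the earlier remaining mode varies fastest) is one common convention and is an assumption here; the nuclear norm of the unfolding does not depend on the column order. Modes are indexed from 0 in the code.

```python
def unfold(tensor, mode):
    """Mode-k matricization of a d1 x d2 x d3 nested-list tensor (mode in {0, 1, 2})."""
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [m for m in range(3) if m != mode]
    rows = []
    for i in range(dims[mode]):
        row = []
        # Assumed column order: the first remaining mode varies fastest.
        for b in range(dims[others[1]]):
            for a in range(dims[others[0]]):
                idx = [0, 0, 0]
                idx[mode], idx[others[0]], idx[others[1]] = i, a, b
                row.append(tensor[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows


# Toy 2 x 3 x 4 tensor whose entry (i, j, k) encodes its own index as 100*i + 10*j + k.
T = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]
for mode in range(3):
    M = unfold(T, mode)
    print(mode, len(M), len(M[0]))
```

Each unfolding has d_k rows and as many columns as the product of the other two dimensions, matching the size stated after Equation (33).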

Then, we discuss the following related works which can be seen as special cases of this work.

(1): The robust tensor completion model based on t-SVD [46]: In a short conference presentation [46] (whose first author is the same as this paper), the t-SVD-based robust tensor completion model is studied. As t-SVD can be viewed as a special case of the $*_{L}$ -SVD (when DFT is used as the transform L), the model in [46] is a special case of ours.
(2): The robust tensor recovery models with missing values and sparse outliers [8,27]: In [8,27], the authors considered the robust reconstruction of an incomplete tensor polluted by sparse outliers, and proposed t-SVD (or $*_{L}$ -SVD) based models with theoretical guarantees for exact recovery. As they did not consider small dense noises, their settings are a special case of our observation model (6) when $E = 0$ .
(3): The robust tensor decomposition based on t-SVD [34]: In [34], the authors studied the t-SVD-based robust tensor decomposition, which aims at recovering a tensor corrupted by both gross sparse outliers and small dense noises. Compared with this work, Ref. [34] can be seen as a special case when there are no missing values.

7. Experiments

In this section, experiments on synthetic datasets will be first conducted to validate the sharpness of the proposed upper bounds in Theorem 3. Then, both effectiveness and efficiency of the proposed algorithm will be demonstrated through experiments on seven different types of remote sensing datasets. All codes are written in Matlab, and all experiments are performed on a Windows 10 laptop with AMD Ryzen 3.0 GHz CPU and 8 GB RAM.

7.1. Sharpness of the Proposed Upper Bound

We first validate the sharpness of the proposed upper bound in Theorem 3. Specifically, we check whether the upper bound in Equation (27) reflects the true scaling behavior of the estimation error. As predicted in Equation (27), if the upper bound is “sharp”, then it is expected that the Mean Square Error (MSE)

(∥ \hat{L} - L^{*} ∥_{F}^{2} + ∥ \hat{S} - S^{*} ∥_{F}^{2}) / (d_{1} d_{2} d_{3})

will possess a scaling behavior very similar to the upper bound: approximately linear w.r.t. the tubal rank of the underlying tensor

L^{*}

, the

l_{0}

-norm of the corruption tensor

S^{*}

, and the reciprocal of uncorrupted observation number

N_{ι}

. We examine whether this expectation holds in simulation studies on synthetic datasets.

The synthetic datasets are generated as follows. Similar to [26], we consider three cases of linear transform L with orthogonal matrix

L

: (1) Discrete Fourier Transform (DFT); (2) Discrete Cosine Transform (DCT) [24]; (3) Random Orthogonal Matrix (ROM) [26]. The underlying low-rank tensor

L^{*} \in R^{d_{1} \times d_{2} \times d_{3}}

with

*_{L}

-tubal rank

r^{*}

is generated by

L^{*} = P *_{L} Q

, where

P \in R^{d_{1} \times r^{*} \times d_{3}}

and

Q \in R^{r^{*} \times d_{2} \times d_{3}}

are i.i.d. sampled from

N (0, 1)

.

L^{*}

is then normalized such that

∥ L^{*} ∥_{\infty} = 1

. Second, to generate the sparse corruption tensor

S^{*}

, we first form

S_{0}

with i.i.d. uniform distribution

Uni (0, 1)

and then uniformly select

γ d_{1} d_{2} d_{3}

entries. Thus the number of corrupted entries

∥ S^{*} ∥_{0} = γ d_{1} d_{2} d_{3}

. Third, we uniformly select

N_{ι}

elements from the uncorrupted positions of

(L^{*} + S^{*})

. Finally, the noises

{ξ_{i}}

are sampled from i.i.d. Gaussian

N (0, σ^{2})

with

σ = 0.1 ∥ L^{*} ∥_{F} / \sqrt{d_{1} d_{2} d_{3}}

. We consider f-diagonal tensors with

d_{1} = d_{2} = d \in {80, 100, 120}

,

d_{3} = 30

and tubal rank

r_{t b} (L^{*}) \in {3, 6, 9, 12, 15}

. We choose corruption ratio

γ \in {0.01 : 0.01 : 0.1}

and uncorrupted observation ratio

N_{ι} / (d_{1} d_{2} d_{3} - N_{s}) \in {0.4 : 0.1 : 0.9}

. In each setting, the MSE averaged over 30 trials is reported.
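The sampling steps of this protocol (sparse corruption, selection of uncorrupted observations, and additive noise) can be sketched as follows for a tensor that has already been flattened into a list. Generating L* itself requires the *_L product and is not shown. The function name, the default seed, and the choice to observe every corrupted position exactly once (the case N_s = ∥S*∥_0 discussed after Theorem 3) are assumptions for illustration.

```python
import math
import random


def simulate_observations(l_star, gamma, obs_ratio, noise_scale=0.1, seed=0):
    """Sketch of the corruption, sampling and noise steps on a flattened tensor."""
    rng = random.Random(seed)
    D = len(l_star)
    # Sparse corruption: gamma * D positions with Uni(0, 1) values.
    n_corrupt = round(gamma * D)
    corrupt_idx = rng.sample(range(D), n_corrupt)
    s_star = [0.0] * D
    for i in corrupt_idx:
        s_star[i] = rng.random()
    # Uncorrupted observations: a fraction obs_ratio of the clean positions.
    corrupted = set(corrupt_idx)
    clean_idx = [i for i in range(D) if i not in corrupted]
    n_clean_obs = round(obs_ratio * len(clean_idx))
    clean_obs = rng.sample(clean_idx, n_clean_obs)
    # Noise level: sigma = noise_scale * ||L*||_F / sqrt(d1 * d2 * d3).
    fro = math.sqrt(sum(x * x for x in l_star))
    sigma = noise_scale * fro / math.sqrt(D)
    # Assumption: each corrupted position is observed exactly once.
    observed = sorted(corrupt_idx + clean_obs)
    y = {i: l_star[i] + s_star[i] + rng.gauss(0.0, sigma) for i in observed}
    return s_star, y, n_corrupt, n_clean_obs


demo_rng = random.Random(1)
l_star = [demo_rng.gauss(0.0, 1.0) for _ in range(1000)]
peak = max(abs(x) for x in l_star)
l_star = [x / peak for x in l_star]  # normalize so that ||L*||_inf = 1
s_star, y, n_s, n_l = simulate_observations(l_star, gamma=0.05, obs_ratio=0.5)
print(n_s, n_l, len(y))
```

Here `obs_ratio` plays the role of N_ι / (d_1 d_2 d_3 - N_s), so the number of uncorrupted observations scales with the number of clean positions rather than with the full tensor size.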

In Figure 3, we report the results for

100 \times 100 \times 30

tensors when the DFT is adopted as the linear transform L in Equation (4). According to sub-plots (a), (b), and (d) in Figure 3, it can be seen that the MSE scales approximately linearly w.r.t.

r_{t b} (L^{*})

,

∥ S^{*} ∥_{0}

, and

N_{ι}^{- 1}

. These results accord well with our expectation for the size

100 \times 100 \times 30

and linear transform

L = DFT

. As very similar phenomena are also observed in all the other settings where

d \in {80, 120}

and

L \in {DCT, ROM}

, we simply omit them. Thus, it can be verified that the scaling behavior of the estimation error can be approximately predicted by the proposed upper bound in Equation (27).

7.2. Effectiveness and Efficiency of the Proposed Algorithm

In this section, we evaluate both the effectiveness and efficiency of the proposed Algorithm 1 by conducting robust tensor completion on seven different types of datasets collected from several remote sensing related applications in Sections 7.2.1–7.2.7.

Following [25], we adopted three different transformations L in Equation (4) to define the

*_{L}

–TNN: the first two transformations are DFT and DCT (denoted by TNN (DFT) and TNN (DCT), respectively), and the third one, named TNN (Data), depends on the given data and is motivated by [27,31]. We first perform SVD on the mode-3 unfolding matrix of

L^{*}

as

L_{(3)}^{*} = U S V^{⊤}

, and then use

U^{⊤}

as the desired transform matrix in the

*_{L}

–product (4). The proposed algorithm is compared with the aforementioned models NN [36] in Equation (32) and SNN [45] in Equation (33) from Section 6. Both Model (32) and Model (33) are solved by ADMM using our own Matlab implementations.

We conduct robust tensor completion on the datasets in Figure 4 with settings similar to [47]. For a

d_{1} \times d_{2} \times d_{3}

tensor data

L^{*}

re-scaled by

∥ L^{*} ∥_{\infty} = 1

, we choose its support uniformly at random with ratio

ρ_{s}

and fill in the values with i.i.d. standard Gaussian variables to generate the corruption

S^{*}

. Then, we randomly sample the entries of

L^{*} + S^{*}

uniformly with observation ratio

ρ_{obs}

. The noises

{ξ_{i}}

are further generated with i.i.d. zero-mean Gaussian entries whose standard deviation is given by

σ = 0.05 ∥ L^{*} ∥_{F} / \sqrt{d_{1} d_{2} d_{3}}

to generate the observations

{y_{i}}

. The goal in the experiments is to estimate the underlying signal

L^{*}

from

{y_{i}}

. The effectiveness of the algorithms is measured by the Peak Signal-to-Noise Ratio (PSNR) and structural similarity (SSIM) [48]. Specifically, the PSNR of an estimator

\hat{L}

is defined as

\begin{matrix} PSNR : = 10 {log}_{10} (\frac{d_{1} d_{2} d_{3} {∥ L^{*} ∥}_{\infty}^{2}}{∥ \hat{L} - L^{*} ∥_{F}^{2}}), \end{matrix}

for the underlying tensor

L^{*} \in R^{d_{1} \times d_{2} \times d_{3}}

. The SSIM is computed via

\begin{matrix} SSIM : = \frac{(2 μ_{L^{*}} μ_{\hat{L}} + {(0.01 \bar{ω})}^{2}) (2 σ_{L^{*}, \hat{L}} + {(0.03 \bar{ω})}^{2})}{(μ_{L^{*}}^{2} + μ_{\hat{L}}^{2} + {(0.01 \bar{ω})}^{2}) (σ_{L^{*}}^{2} + σ_{\hat{L}}^{2} + {(0.03 \bar{ω})}^{2})}, \end{matrix}

where

μ_{L^{*}}, μ_{\hat{L}}, σ_{L^{*}}, σ_{\hat{L}}, σ_{L^{*}, \hat{L}}

and

\bar{ω}

denote the local means, standard deviations, cross-covariance, and dynamic range of the magnitude of tensors

L^{*}

and

\hat{L}

. Larger PSNR and SSIM values indicate higher quality of the estimator

\hat{L}

.
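Both metrics can be sketched directly from the formulas above for tensors flattened into lists. The SSIM in [48] is computed over local windows and then averaged; the version below uses a single global window, which is a simplifying assumption, and the names `psnr` and `global_ssim` are illustrative.

```python
import math


def psnr(l_hat, l_star):
    """PSNR as defined above; requires l_hat != l_star (non-zero error)."""
    D = len(l_star)
    peak_sq = max(abs(x) for x in l_star) ** 2  # ||L*||_inf^2
    err_sq = sum((a - b) ** 2 for a, b in zip(l_hat, l_star))
    return 10.0 * math.log10(D * peak_sq / err_sq)


def global_ssim(x, y, dynamic_range=1.0):
    """SSIM formula evaluated on one global window (assumption: no local windows)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


truth = [1.0, 0.0, 0.0, 0.0]
estimate = [0.9, 0.0, 0.0, 0.0]
print(psnr(estimate, truth), global_ssim(estimate, truth))
```

A perfect estimate gives an SSIM of 1, whereas the PSNR is undefined (infinite) in that case because the error in the denominator vanishes.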

7.2.1. Experiments on an Urban Area Imagery Dataset

Area imagery data processing plays a key role in many remote sensing applications, such as land-use mapping [49]. We adopt the popular area imagery dataset UCMerced [50], which is a 21 class land use image dataset meant for research purposes. The images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the country. The pixel resolution of this public domain imagery is 1 foot, and each RGB image measures

256 \times 256

pixels. There are 100 images for each class, and we chose the 85-th image to form a dataset of 21 images as shown in Figure 4.

We consider two scenarios by setting

(ρ_{obs}, ρ_{s}) \in {(0.3, 0.2), (0.8, 0.3)}

for the

d \times d \times 3

images. For NN (Model (32)), we set the regularization parameters

λ_{s} = λ_{ι} / \sqrt{d ρ_{obs}}

(suggested by [38]), and tune the parameter

λ_{ι}

around

6.5 σ \sqrt{ρ_{obs} d log (6 d)}

(suggested by [51]). For SNN, the parameter

λ_{s}

is tuned in

{0.01, 0.05, 0.1, 1}

for better performance in most cases, and the weight

α

is set by

α_{1} = α_{2} = λ_{s} \sqrt{3 d ρ_{obs}}, α_{3} = 0.01 λ_{s} \sqrt{3 d ρ_{obs}}

. For Algorithm 1, we tune

λ_{ι}

around

2 σ \sqrt{3 ρ_{obs} d log (6 d)}

, and let

λ_{s} = λ_{ι} / \sqrt{3 d ρ_{obs}}

for TNN (DFT) and

λ_{s} = λ_{ι} / \sqrt{d ρ_{obs}}

for TNN (DCT) and TNN (Data). In each setting, we test each image for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds).

We present the PSNR, SSIM values and running time in Figure 5 and Figure 6 for settings of

(ρ_{obs}, ρ_{s}) = (0.3, 0.2)

and

(ρ_{obs}, ρ_{s}) = (0.8, 0.3)

, respectively, for quantitative evaluation, with visual examples shown in Figure 7 and Figure 8. It can be seen from Figure 5, Figure 6, Figure 7 and Figure 8 that the proposed TNN (Data) has the highest recovery quality in most cases, and has a running time comparable to that of NN. We attribute the promising performance of the proposed algorithm to the strong representation power of low-tubal-rank models: low-tubal-rankness can exploit both low-rankness and smoothness simultaneously, whereas traditional models like NN and SNN can only exploit low-rankness in the original domain [18].

7.2.2. Experiments on Hyperspectral Data

Benefiting from its fine spectral and spatial resolutions, hyperspectral imaging has been extensively adopted in many remote sensing applications [10,52]. In this section, we conduct robust tensor completion on subsets of two representative hyperspectral datasets, described as follows:

Indian Pines: This dataset was collected by the AVIRIS sensor in 1992 over the Indian Pines test site in North-western Indiana and consists of $145 \times 145$ pixels and 224 spectral reflectance bands. We use the first 30 bands in the experiments due to the trade-off between the limitation of computing resources and the efforts for parameter tuning.
Salinas A: This dataset was acquired by the AVIRIS sensor over the Salinas Valley, California in 1998, and consists of 224 bands over a spectrum range of 400–2500 nm. It has a spatial extent of $86 \times 83$ pixels with a resolution of 3.7 m. We use the first 30 bands in the experiments too.

We consider three settings, i.e., Setting I

(ρ_{obs} = 0.3

,

ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6

,

ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

for robust completion of hyper-spectral data. For NN, we set the regularization parameters

λ_{s} = λ_{ι} / \sqrt{ρ_{obs} (d_{1} \lor d_{2})}

(suggested by [38]), and tune the parameter

λ_{ι}

around

6.5 σ \sqrt{ρ_{obs} (d_{1} \lor d_{2}) log (d_{1} d_{3} + d_{2} d_{3})}

(suggested by [51]). For SNN, the parameter

λ_{s}

is tuned in

{0.01, 0.05, 0.1, 1}

for better performance in most cases, and we chose the weight

α

by

α_{1} = α_{2} = α_{3} = λ_{s} \sqrt{ρ_{obs} (d_{1} \lor d_{2}) d_{3}}

(suggested by [47]). For Algorithm 1, we tune the parameter

λ_{ι}

around

2 σ \sqrt{ρ_{obs} (d_{1} \lor d_{2}) d_{3} log (d_{1} d_{3} + d_{2} d_{3})}

, and let

λ_{s} = λ_{ι} / \sqrt{ρ_{obs} (d_{1} \lor d_{2}) d_{3}}

for TNN (DFT) and

λ_{s} = λ_{ι} / \sqrt{ρ_{obs} (d_{1} \lor d_{2})}

for TNN (DCT) and TNN (Data). In each setting, we test each image for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds).
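The starting values for tuning Algorithm 1 described above can be collected in a small helper. This is only a sketch of the center of the search range: the function name and the `transform` argument are illustrative assumptions, and the actual values used in the experiments are tuned around these.

```python
import math


def algorithm1_parameters(d1, d2, d3, rho_obs, sigma, transform="DFT"):
    """Center of the tuning range for (lambda_l, lambda_s) as described above."""
    dmax = max(d1, d2)  # d1 v d2
    log_term = math.log(d1 * d3 + d2 * d3)
    # lambda_l is tuned around 2 * sigma * sqrt(rho_obs * (d1 v d2) * d3 * log(...)).
    lambda_l = 2.0 * sigma * math.sqrt(rho_obs * dmax * d3 * log_term)
    if transform == "DFT":
        lambda_s = lambda_l / math.sqrt(rho_obs * dmax * d3)
    else:  # TNN (DCT) and TNN (Data)
        lambda_s = lambda_l / math.sqrt(rho_obs * dmax)
    return lambda_l, lambda_s


lam_l, lam_s = algorithm1_parameters(145, 145, 30, rho_obs=0.3, sigma=0.05)
print(lam_l, lam_s)
```

For d × d × 3 images, these expressions reduce to the values used in Section 7.2.1, since (d_1 ∨ d_2) d_3 = 3d and d_1 d_3 + d_2 d_3 = 6d.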

For quantitative evaluation, we report the PSNR, SSIM values and running time in Table 2 and Table 3 for the Indian Pines and Salinas A datasets, respectively. The visual examples are shown in Figure 9 and Figure 10, respectively. It can be seen that the proposed TNN (Data) has the highest recovery quality in most cases, and has a running time comparable to that of NN, indicating the effectiveness and efficiency of low-tubal-rank models in comparison with the original domain-based models NN and SNN.

7.2.3. Experiments on Multispectral Images

Multispectral imaging captures image data within specific wavelength ranges across the electromagnetic spectrum, and has become one of the most widely utilized data types in remote sensing. This section presents simulated experiments on multispectral images. The original data are two multispectral images, Beads and Cloth, from the Columbia MSI Database (available at http://www1.cs.columbia.edu/CAVE/databases/multispectral, accessed on 28 July 2021) containing scenes of a variety of real-world objects. Each MSI is of size 512 × 512 × 31 with intensity range scaled to [0, 1].

We also consider three settings, i.e., Setting I

(ρ_{obs} = 0.3, ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6,

ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

for robust completion of multispectral data. We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds).

For quantitative evaluation, we report the PSNR, SSIM values and running time in Table 4 and Table 5 for the Beads and Cloth datasets, respectively. The visual examples for the Cloth dataset are shown in Figure 11. We can also find that the proposed TNN (Data) achieves the highest accuracy in most cases, and has a running time comparable to that of NN, which demonstrates both the effectiveness and efficiency of low-tubal-rank models.

7.2.4. Experiments on Point Cloud Data

With the rapid advances of sensor technology, the emerging point cloud data provide better performance than 2D images in many remote sensing applications due to its flexible and scalable geometric representation [53]. In this section, we also conduct experiments on a dataset (scenario B from http://www.mrt.kit.edu/z/publ/download/velodynetracking/dataset.html, accessed on 28 July 2021) for Unmanned Ground Vehicle (UGV). The dataset contains a sequence of point cloud data acquired from a Velodyne HDL-64E LiDAR. We select 30 frames (Frame Nos. 65-94) from the data sequence. The point cloud data is formatted into two tensors sized

64 \times 870 \times 30

representing the distance data (named SenerioB Distance) and the intensity data (named SenerioB Intensity), respectively.

We also consider three settings, i.e., Setting I

(ρ_{obs} = 0.3, ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6,

ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

for robust completion of point cloud data. We tune the parameters in the same way as Section 7.2.2. In each setting, we test each tensor for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds). For quantitative evaluation, we report the PSNR, SSIM values and running time in Table 6 and Table 7 for the SenerioB Distance and SenerioB Intensity datasets, respectively. We can also find that the proposed TNN (Data) achieves the highest accuracy in most cases, and has a running time comparable to that of NN, which demonstrates both the effectiveness and efficiency of low-tubal-rank models.

7.2.5. Experiments on Aerial Video Data

Aerial videos (or time sequences of images) are broadly used in many computer vision based remote sensing tasks [54]. We experiment on a

180 \times 320 \times 30

tensor which consists of the first 30 frames of the Sky dataset (available at http://www.loujing.com/rss-small-target, accessed on 28 July 2021) for small object detection [55].

We also consider three settings, i.e., Setting I

(ρ_{obs} = 0.3, ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6, ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

. We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds). For quantitative evaluation, we report the PSNR, SSIM values and running time in Table 8. It is also found that the proposed TNN (Data) achieves the highest accuracy in most cases, and can run as fast as NN.

7.2.6. Experiments on Thermal Imaging Data

Thermal infrared data can provide important measurements of surface energy fluxes and temperatures in various remote sensing applications [7]. In this section, we experiment on two infrared datasets as follows:

The Infraed Detection dataset [56]: this dataset is collected for infrared detection and tracking of dim-small aircraft targets under ground/air background (available at http://www.csdata.org/p/387/, accessed on 28 July 2021). It consists of 22 subsets of infrared image sequences of all aircraft targets. We use the first 30 frames of data3.zip to form a $256 \times 256 \times 30$ tensor due to the trade-off between the limitation of computing resources and the efforts for parameter tuning.
The OSU Thermal Database [3]: The sequences were recorded on the Ohio State University campus during the months of February and March 2005, and show several people, some in groups, moving through the scene. We use the first 30 frames of Sequence 1 and form a tensor of size $320 \times 240 \times 30$ .

Similar to Section 7.2.2, we test in three settings, i.e., Setting I

(ρ_{obs} = 0.3, ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6, ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

, and use the same strategy for parameter tuning. In each setting, we test each image for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds). For quantitative evaluation, we report the PSNR, SSIM values and running time in Table 9 and Table 10 for the Infraed Detection and OSU Thermal Database datasets, respectively. The visual examples are shown in Figure 12 and Figure 13, respectively. It can be seen that the proposed TNN (Data) has the highest recovery quality in most cases, and has a running time comparable to that of NN, showing both the effectiveness and efficiency of low-tubal-rank models in comparison with the original domain-based models NN and SNN.

7.2.7. Experiments on SAR Data

Polarimetric synthetic aperture radar (PolSAR) has attracted much attention from remote sensing scientists because of its various advantages, e.g., all-weather, all-time, penetrating capability, and multi-polarimetry [57]. In this section, we adopt the PolSAR UAVSAR Change Detection Images dataset. It is a dataset of single-look quad-polarimetric SAR images acquired by the UAVSAR airborne sensor in L-band over an urban area in San Francisco city on 18 September 2009 and 11 May 2015. Dataset #1 has a length and width of 200 pixels, and we use the first 30 bands.

We also consider three settings, i.e., Setting I

(ρ_{obs} = 0.3, ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6,

ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

. We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in dB), SSIM and running time (in seconds) in Table 11. It is also found that the proposed TNN (Data) achieves the highest accuracy in most cases, and can run as fast as NN.

8. Conclusions

In this paper, we resolve the challenging robust tensor completion problem by proposing a

*_{L}

-SVD-based estimator to robustly reconstruct a low-rank tensor in the presence of missing values, gross outliers, and small noises simultaneously. Specifically, this work can be summarized in the following three aspects:

(1): Algorithmically, we design an algorithm within the framework of ADMM to efficiently compute the proposed estimator with guaranteed convergence behavior.
(2): Statistically, we analyze the statistical performance of the proposed estimator by establishing a non-asymptotic upper bound on the estimation error. The proposed upper bound is further proved to be minimax optimal (up to a log factor).
(3): Experimentally, the correctness of the upper bound is first validated through simulations on synthetic datasets. Then both effectiveness and efficiency of the proposed algorithm are demonstrated by extensive comparisons with state-of-the-art nuclear norm based models (i.e., NN and SNN) on seven different types of remote sensing data.

However, from a critical point of view, the proposed method has the following two limitations:

(1): The orientational sensitivity of $*_{L}$ -SVD: Despite the promising empirical performance of the $*_{L}$ -SVD-based estimator, a typical defect of it is the orientation sensitivity owing to low-rankness strictly defined along the tubal orientation which makes it fail to simultaneously exploit transformed low-rankness in multiple orientations [19,58].
(2): The difficulty in finding the optimal transform $L (\cdot)$ for $*_{L}$ -SVD: Although a direct use of fixed transforms (like DFT and DCT) may produce reasonable empirical performance, it is still unclear how to find the optimal transformation $L (\cdot)$ for a given tensor $L^{*}$ when only partial and corrupted observations are available.

According to the above limitations, it is interesting to consider higher-order extensions of the proposed model in an orientation invariant way like [19] and discuss the statistical performance. It is also interesting to consider the data-dependent transformation learning like [31,59]. Another future direction is to consider more efficient solvers of Problem (8) using the factorization strategy or Frank–Wolfe method [47,60,61,62].

Author Contributions

Conceptualization, A.W. and Q.Z.; Data curation, Q.Z.; Formal analysis, G.Z.; Funding acquisition, A.W., G.Z. and Q.Z.; Investigation, A.W.; Methodology, G.Z.; Project administration, A.W. and Q.Z.; Resources, Q.Z.; Software, A.W.; Supervision, G.Z. and Q.Z.; Validation, A.W., G.Z. and Q.Z.; Visualization, Q.Z.; Writing—original draft, A.W. and G.Z.; Writing—review & editing, A.W., G.Z. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 61872188, 62073087, 62071132, U191140003, 6197309, in part by the China Postdoctoral Science Foundation under Grant 2020M672536, in part by the Natural Science Foundation of Guangdong Province under Grants 2020A1515010671, 2019B010154002, 2019B010118001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

In this paper, all the data supporting our experimental results are publicly available with references or URL links.

Acknowledgments

The authors are grateful to the editor and reviewers for their valuable time in processing this manuscript. The first author would like to thank Shasha Jin for her kind understanding in these months, and Zhong Jin in Nanjing University of Science and Technology for his long time support. The authors are also grateful to Xiongjun Zhang for sharing the code for [25] and Olga Klopp for her excellent theoretical analysis in [36,51] which is so helpful.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theoretical Results

Appendix A.1. Additional Notations and Preliminaries

For ease of exposition, we first list the additional notations often used in the proofs in Table A1.

Table A1. Additional notations in the proofs.

Notations	Descriptions	Notations	Descriptions
$Δ^{ι} := L^{*} - \hat{L}$	estimation error of $L^{*}$	$Δ^{s} := S^{*} - \hat{S}$	estimation error of $S^{*}$
$r^{*} := r_{t b} (L^{*})$	$*_{L}$-tubal-rank of $L^{*}$	$s^{*} := {∥ S^{*} ∥}_{0}$	sparsity of $S^{*}$
$ϱ := N / N_{ι}$	inverse uncorrupted ratio	$a$	$l_{\infty}$-norm bound in Equation (8)
$\tilde{d}$	$(d_{1} + d_{2}) d_{3}$
$Ω_{ι} := \{i \in [N] \mid 〈X_{i}, S^{*}〉 = 0\}$	index set of design tensors $\{X_{i}\}$ corresponding to uncorrupted entries
$Ω_{s} := \{i \in [N] \mid 〈X_{i}, S^{*}〉 \neq 0\}$	index set of design tensors $\{X_{i}\}$ corresponding to corrupted entries
$E := \frac{1}{N} \sum_{i \in Ω_{ι}} ξ_{i} X_{i}$	stochastic tensor defined to lower bound parameters $λ_{ι}$ and $λ_{s}$
$W := \frac{1}{N} \sum_{i \in Ω_{ι}} X_{i}$	stochastic tensor defined to lower bound parameter $λ_{s}$
$R_{Σ} := \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} ε_{i} X_{i}$	random tensor defined in bounding $∥ Δ^{ι} ∥_{F}$ with i.i.d. Rademacher $\{ε_{i}\}$
${∥ T ∥}_{Π} := \sqrt{E_{X_{i}} [{〈X_{i}, T_{Θ_{s}^{⊥}}〉}^{2}]}$	expectation of ${〈X_{i}, \cdot〉}^{2}$ for $i \in Ω_{ι}$ defined to establish the RSC condition

We then introduce the decomposability of the $*_{L}$-TNN and the tensor $l_{1}$-norm, which plays a key role in the analysis.

Decomposability of $*_{L}$-TNN. Suppose $L^{*}$ has the reduced $*_{L}$-SVD $L^{*} = U *_{L} D *_{L} V^{⊤}$, where $U \in R^{d_{1} \times r^{*} \times d_{3}}$ and $V \in R^{d_{2} \times r^{*} \times d_{3}}$ are orthogonal and $D \in R^{r^{*} \times r^{*} \times d_{3}}$ is f-diagonal. Define the projectors $P^{⋆} (\cdot)$ and $P^{⊥} (\cdot)$ as follows:

\begin{matrix} P^{⋆} (T) = U *_{L} U^{⊤} *_{L} T + T *_{L} V *_{L} V^{⊤} - U *_{L} U^{⊤} *_{L} T *_{L} V *_{L} V^{⊤}, \\ P^{⊥} (T) = (I - U *_{L} U^{⊤}) *_{L} T *_{L} (I - V *_{L} V^{⊤}) . \end{matrix}

Then, it can be verified that:

(I).: $T = P^{⋆} (T) + P^{⊥} (T)$ , $\forall T \in R^{d_{1} \times d_{2} \times d_{3}}$ ;
(II).: $〈P^{⋆} (A), P^{⊥} (B)〉 = 0$ , $\forall A, B \in R^{d_{1} \times d_{2} \times d_{3}}$ .
(III).: $r_{t b} (P^{⋆} (T)) \leq 2 r_{t b} (L^{*})$ , $\forall T \in R^{d_{1} \times d_{2} \times d_{3}}$ .
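
As a numerical sanity check of properties (I)–(III), the following minimal sketch builds both projectors for a small random example. It assumes that $L (\cdot)$ is a real orthogonal matrix applied along the third mode, that the $*_{L}$ product multiplies matching frontal slices in the transform domain, and that the transpose acts slice-wise. All helper names (`transform`, `l_product`, `random_orthogonal_tensor`, `tubal_rank`, `projectors`) are illustrative and are not taken from the paper.

```python
import numpy as np

def transform(T, L):
    """Apply the orthogonal matrix L along the third mode of T."""
    return np.einsum("kl,ijl->ijk", L, T)

def inverse_transform(T_hat, L):
    """Undo `transform`; L is orthogonal, so its inverse is L.T."""
    return np.einsum("lk,ijl->ijk", L, T_hat)

def l_product(A, B, L):
    """Assumed *_L product: slice-wise matrix products in the transform domain."""
    C_hat = np.einsum("ipk,pjk->ijk", transform(A, L), transform(B, L))
    return inverse_transform(C_hat, L)

def l_transpose(A):
    """Transpose every frontal slice (valid for a real transform L)."""
    return np.transpose(A, (1, 0, 2))

def identity_tensor(d, L):
    """Tensor whose transformed frontal slices are all identity matrices."""
    I_hat = np.repeat(np.eye(d)[:, :, None], L.shape[0], axis=2)
    return inverse_transform(I_hat, L)

def random_orthogonal_tensor(d, r, L, rng):
    """d x r x d3 tensor with orthonormal columns in every transformed slice."""
    slices = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(L.shape[0])]
    return inverse_transform(np.stack(slices, axis=2), L)

def tubal_rank(T, L, tol=1e-8):
    """Largest rank among the transformed frontal slices."""
    T_hat = transform(T, L)
    return max(int(np.linalg.matrix_rank(T_hat[:, :, k], tol=tol)) for k in range(T_hat.shape[2]))

def projectors(T, U, V, L):
    """Return (P_star(T), P_perp(T)) for the subspaces spanned by U and V."""
    PU = l_product(U, l_transpose(U), L)
    PV = l_product(V, l_transpose(V), L)
    UT, TV = l_product(PU, T, L), l_product(T, PV, L)
    P_star = UT + TV - l_product(UT, PV, L)
    I1, I2 = identity_tensor(U.shape[0], L), identity_tensor(V.shape[0], L)
    P_perp = l_product(l_product(I1 - PU, T, L), I2 - PV, L)
    return P_star, P_perp

rng = np.random.default_rng(0)
d1, d2, d3, r = 6, 5, 4, 2
L = np.linalg.qr(rng.standard_normal((d3, d3)))[0]  # hypothetical orthogonal transform
U = random_orthogonal_tensor(d1, r, L, rng)
V = random_orthogonal_tensor(d2, r, L, rng)
A = rng.standard_normal((d1, d2, d3))
B = rng.standard_normal((d1, d2, d3))

PA_star, PA_perp = projectors(A, U, V, L)
_, PB_perp = projectors(B, U, V, L)

gap_sum = np.abs(A - (PA_star + PA_perp)).max()  # property (I): numerically zero
inner = float(np.sum(PA_star * PB_perp))         # property (II): numerically zero
rank_star = tubal_rank(PA_star, L)               # property (III): at most 2 * r
```

The rank bound in (III) is visible slice by slice: in the transform domain, $P^{⋆} (T)$ is the sum of two matrices whose ranks are each at most $r^{*}$.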

In the same way as for the results in the supplementary material of [43], it can also be shown that the following relations hold:

(I).: (Decomposability of $*_{L}$-TNN) For any $A, B \in R^{d_{1} \times d_{2} \times d_{3}}$ satisfying $A *_{L} B^{⊤} = 0$ , $A^{⊤} *_{L} B = 0$ ,

$\begin{matrix} {∥ A + B ∥}_{⋆} = ∥ A ∥_{⋆} + {∥ B ∥}_{⋆} . \end{matrix}$

(A1)
(II).: (Norm compatibility inequality) For any $T \in R^{d_{1} \times d_{2} \times d_{3}}$ ,

$\begin{matrix} {∥ T ∥}_{⋆} \leq \sqrt{d_{3} r_{t b} (T)} {∥ T ∥}_{F} . \end{matrix}$

(A2)
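
The norm compatibility bound can be checked numerically. The sketch below is only an illustration under stated assumptions: $L (\cdot)$ is taken to be a real orthogonal matrix acting along the third mode, and the $*_{L}$-TNN is taken to be the sum of the nuclear norms of the transformed frontal slices (the normalization in the paper's own definition may differ). Under these assumptions the bound follows from the Cauchy–Schwarz inequality applied first within each slice and then across the $d_{3}$ slices.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3, r = 8, 7, 5, 2

# Hypothetical orthogonal transform along the third mode.
L = np.linalg.qr(rng.standard_normal((d3, d3)))[0]

# Build T with tubal rank r: every transformed frontal slice has rank r.
T_hat = np.stack(
    [rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2)) for _ in range(d3)],
    axis=2,
)
T = np.einsum("lk,ijl->ijk", L, T_hat)  # back to the original domain (inverse of L is L.T)

# Recompute the transformed slices from T and collect their singular values.
slices = np.einsum("kl,ijl->ijk", L, T)
sing_vals = [np.linalg.svd(slices[:, :, k], compute_uv=False) for k in range(d3)]

tnn = sum(s.sum() for s in sing_vals)  # assumed TNN: sum of slice nuclear norms
tubal_rank = max(int((s > 1e-8 * s[0]).sum()) for s in sing_vals)
fro = np.linalg.norm(T)                # Frobenius norm of the whole tensor
bound = np.sqrt(d3 * tubal_rank) * fro
```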

Decomposability of tensor $l_{1}$-norm [63]. Let $Ω_{e} \subset [d_{1}] \times [d_{2}] \times [d_{3}]$ denote any index set and $Ω_{e}^{⊥}$ its complement. Then for any tensor $T$, define two tensors $T_{Ω_{e}}$ and $T_{Ω_{e}^{⊥}}$ as follows:

\begin{matrix} T_{Ω_{e}} (i, j, k) : = \begin{cases} T_{i j k}, & (i, j, k) \in Ω_{e} \\ 0, & (i, j, k) \in Ω_{e}^{⊥} \end{cases}, \quad T_{Ω_{e}^{⊥}} : = T - T_{Ω_{e}} . \end{matrix}

(A3)

Then, one has

(I).: (Decomposability of $l_{1}$ -norm) For any $T \in R^{d_{1} \times d_{2} \times d_{3}}$ , ${∥ T ∥}_{1} = ∥ T_{Ω_{e}} ∥_{1} + {∥ T_{Ω_{e}^{⊥}} ∥}_{1} .$
(II).: (Norm compatibility inequality) For any $T \in R^{d_{1} \times d_{2} \times d_{3}}$ , $∥ T_{Ω_{e}} ∥_{1} \leq \sqrt{| Ω_{e} |} {∥ T_{Ω_{e}} ∥}_{F} .$
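
Both properties of the tensor $l_{1}$-norm are easy to verify on a small example. In the sketch below, the index set $Ω_{e}$ is represented by a randomly drawn boolean mask; the mask, the tensor size, and the variable names are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 5, 3))

# Hypothetical index set Omega_e, encoded as a boolean mask over all entries.
mask = rng.random(T.shape) < 0.3

T_on = np.where(mask, T, 0.0)  # T restricted to Omega_e
T_off = T - T_on               # T restricted to the complement of Omega_e

l1 = np.abs(T).sum()
l1_on = np.abs(T_on).sum()
l1_split = l1_on + np.abs(T_off).sum()                     # decomposability
compat_bound = np.sqrt(mask.sum()) * np.linalg.norm(T_on)  # sqrt(|Omega_e|) * Frobenius norm
```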

Appendix A.2. The Proof for Theorem 3

The proof follows the lines of [34,36]. For notational simplicity, we define the following two sets

\begin{matrix} Ω_{ι} : = \{i \in [N] \mid 〈X_{i}, S^{*}〉 = 0\}, \quad Ω_{s} : = \{i \in [N] \mid 〈X_{i}, S^{*}〉 \neq 0\} \end{matrix}

(A4)

which denote the index sets of the design tensors $\{X_{i}\}$ corresponding to uncorrupted and corrupted entries, respectively.
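
To make the split in (A4) concrete, here is a minimal sketch. It assumes that each design tensor $X_{i}$ is a canonical basis tensor (a single entry equal to one), so that $〈X_{i}, S^{*}〉$ is simply the entry of $S^{*}$ at the sampled position; the outlier tensor and the sampled positions are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
shape, N = (5, 4, 3), 40

# Hypothetical sparse outlier tensor S*: a few nonzero entries.
S_star = np.zeros(shape)
S_star[rng.random(shape) < 0.2] = 5.0

# Each design tensor X_i is stored by the index (i, j, k) of its single nonzero entry,
# so the inner product <X_i, S*> is the entry of S* at that index.
idx = np.stack([rng.integers(0, d, size=N) for d in shape], axis=1)
inner = S_star[idx[:, 0], idx[:, 1], idx[:, 2]]

omega_clean = np.flatnonzero(inner == 0)    # Omega_iota: uncorrupted observations
omega_corrupt = np.flatnonzero(inner != 0)  # Omega_s: corrupted observations
```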

Appendix A.2.1. Mainstream of Proving Theorem 3

Proof of Theorem 3.

Let $F (L, S) = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - 〈L + S, X_{i}〉)}^{2} + λ_{ι} {∥ L ∥}_{⋆} + λ_{s} {∥ S ∥}_{1}$ for simplicity. Then, according to the optimality of $(\hat{L}, \hat{S})$ for Problem (8), it holds that

\begin{matrix} F (\hat{L}, \hat{S}) \leq F (L^{*}, S^{*}) \end{matrix}

(A5)

and

\begin{matrix} ∥ Δ^{ι} ∥_{\infty} = ∥ \hat{L} - L^{*} ∥_{\infty} \leq ∥ \hat{L} ∥_{\infty} + ∥ L^{*} ∥_{\infty} \leq 2 a, \\ ∥ Δ^{s} ∥_{\infty} = ∥ \hat{S} - S^{*} ∥_{\infty} \leq ∥ \hat{S} ∥_{\infty} + ∥ S^{*} ∥_{\infty} \leq 2 a . \end{matrix}

(A6)

Equation (A5) indicates that

\begin{matrix} \frac{1}{N} \sum_{i = 1}^{N} {(ξ_{i} + 〈Δ^{ι} + Δ^{s}, X_{i}〉)}^{2} + λ_{ι} ∥ \hat{L} ∥_{⋆} + λ_{s} ∥ \hat{S} ∥_{1} \leq \frac{1}{N} \sum_{i = 1}^{N} ξ_{i}^{2} + λ_{ι} ∥ L^{*} ∥_{⋆} + λ_{s} {∥ S^{*} ∥}_{1} \end{matrix}

(A7)

which leads to

\begin{aligned} \frac{1}{N} \sum_{i \in Ω_{ι}} {〈Δ^{ι} + Δ^{s}, X_{i}〉}^{2} \leq {} & \underbrace{\frac{2}{N} \sum_{i \in Ω_{s}} | 〈ξ_{i} X_{i}, Δ^{ι} + Δ^{s}〉 | - \frac{1}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2}}_{: = I} \\ & + \underbrace{2 | 〈E, Δ^{ι}〉 | + λ_{ι} (∥ L^{*} ∥_{⋆} - ∥ \hat{L} ∥_{⋆})}_{: = I I} \\ & + \underbrace{2 | 〈E, Δ_{Θ_{s}^{⊥}}^{s}〉 | + λ_{s} (∥ S^{*} ∥_{1} - ∥ \hat{S} ∥_{1})}_{: = I I I} \end{aligned}

(A8)

where $E : = \frac{1}{N} \sum_{i \in Ω_{ι}} ξ_{i} X_{i}$, and the equality $〈E, Δ^{s}〉 = 〈E, Δ_{Θ_{s}^{⊥}}^{s}〉$ holds. Now each term on the right-hand side of (A8) will be upper bounded separately. Following the idea of [36], the upper bounds will be analyzed on the following event:

\begin{matrix} E : = \{\max_{1 \leq i \leq N} | ξ_{i} | \leq C_{*} σ \sqrt{\log \tilde{d}}\} \end{matrix}

(A9)

According to the tail behavior of the maximum of a sub-Gaussian sequence, there exists an absolute constant $C_{*} > 0$ such that $P [E] \geq 1 - 1 / (2 \tilde{d})$.

Bound $I$. On the event $E$, we get

\begin{matrix} I \leq \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} \leq \frac{C σ^{2} | Θ_{s} | log \tilde{d}}{N} . \end{matrix}

(A10)

Bound $I I$. Note that according to the properties of the $*_{L}$-TNN, we have

\begin{aligned} ∥ L^{*} ∥_{⋆} - ∥ \hat{L} ∥_{⋆} & = ∥ L^{*} ∥_{⋆} - ∥ L^{*} - Δ^{ι} ∥_{⋆} \\ & = ∥ L^{*} ∥_{⋆} - ∥ L^{*} - P^{⊥} (Δ^{ι}) - P^{⋆} (Δ^{ι}) ∥_{⋆} \\ & \leq ∥ L^{*} ∥_{⋆} - (∥ L^{*} - P^{⊥} (Δ^{ι}) ∥_{⋆} - ∥ P^{⋆} (Δ^{ι}) ∥_{⋆}) \\ & = ∥ L^{*} ∥_{⋆} - (∥ L^{*} ∥_{⋆} + ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} - ∥ P^{⋆} (Δ^{ι}) ∥_{⋆}) \\ & = ∥ P^{⋆} (Δ^{ι}) ∥_{⋆} - ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} \end{aligned}

Thus, we can bound term $I I$ by

\begin{matrix} I I \leq 2 ∥ E ∥_{s p} ∥ Δ^{ι} ∥_{⋆} + λ_{ι} (∥ P^{⋆} (Δ^{ι}) ∥_{⋆} - ∥ P^{⊥} (Δ^{ι}) ∥_{⋆}) \end{matrix}

By letting $λ_{ι} \geq 4 ∥ E ∥_{s p}$, it holds that

\begin{matrix} I I \leq \frac{3}{2} λ_{ι} ∥ P^{⋆} (Δ^{ι}) ∥_{⋆} \leq \frac{3}{2} λ_{ι} \sqrt{2 r^{*} d_{3}} ∥ Δ^{ι} ∥_{F} . \end{matrix}

(A11)
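
For completeness, the step from the preceding bound on $I I$ to Equation (A11) can be spelled out as follows. This worked derivation only combines facts already stated: the triangle inequality for $∥ Δ^{ι} ∥_{⋆}$ applied to $Δ^{ι} = P^{⋆} (Δ^{ι}) + P^{⊥} (Δ^{ι})$, the choice $λ_{ι} \geq 4 ∥ E ∥_{s p}$, property (III) of the projectors, and the norm compatibility inequality (A2).

```latex
\begin{aligned}
II &\leq 2 \| E \|_{sp} \, \| \Delta^{\iota} \|_{\star}
      + \lambda_{\iota} \bigl( \| P^{\star}(\Delta^{\iota}) \|_{\star} - \| P^{\perp}(\Delta^{\iota}) \|_{\star} \bigr) \\
   &\leq \frac{\lambda_{\iota}}{2} \bigl( \| P^{\star}(\Delta^{\iota}) \|_{\star} + \| P^{\perp}(\Delta^{\iota}) \|_{\star} \bigr)
      + \lambda_{\iota} \bigl( \| P^{\star}(\Delta^{\iota}) \|_{\star} - \| P^{\perp}(\Delta^{\iota}) \|_{\star} \bigr) \\
   &= \frac{3}{2} \lambda_{\iota} \| P^{\star}(\Delta^{\iota}) \|_{\star}
      - \frac{1}{2} \lambda_{\iota} \| P^{\perp}(\Delta^{\iota}) \|_{\star} \\
   &\leq \frac{3}{2} \lambda_{\iota} \| P^{\star}(\Delta^{\iota}) \|_{\star} \\
   &\leq \frac{3}{2} \lambda_{\iota} \sqrt{d_{3} \, r_{tb}\bigl(P^{\star}(\Delta^{\iota})\bigr)} \; \| P^{\star}(\Delta^{\iota}) \|_{F}
    \leq \frac{3}{2} \lambda_{\iota} \sqrt{2 r^{*} d_{3}} \; \| \Delta^{\iota} \|_{F} .
\end{aligned}
```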

Bound $I I I$. Note that since $S_{Θ_{s}^{⊥}}^{*} = 0$, we have $Δ_{Θ_{s}^{⊥}}^{s} = - \hat{S}_{Θ_{s}^{⊥}}$, leading to

\begin{matrix} I I I \leq 2 ∥ E ∥_{\infty} ∥ \hat{S}_{Θ_{s}^{⊥}} ∥_{1} + λ_{s} (∥ S^{*} ∥_{1} - ∥ \hat{S} ∥_{1}) \leq (2 ∥ E ∥_{\infty} - λ_{s}) ∥ \hat{S}_{Θ_{s}^{⊥}} ∥_{1} + λ_{s} ∥ S^{*} ∥_{1} \end{matrix}

Letting $λ_{s} \geq 4 ∥ E ∥_{\infty}$ yields

\begin{matrix} I I I \leq λ_{s} ∥ S^{*} ∥_{1} \end{matrix}

(A12)

Thus, putting Equations (A10)–(A12) together, we have the following inequality on the event $E$:

\begin{matrix} \frac{1}{N} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2} \leq \frac{3 λ_{ι} \sqrt{r^{*} d_{3}}}{\sqrt{2}} ∥ Δ^{ι} ∥_{F} + λ_{s} {∥ S^{*} ∥}_{1} + \frac{C σ^{2} | Θ_{s} | log \tilde{d}}{N} \end{matrix}

(A13)

Then, we follow the lines of [36] to specify a kind of Restricted Strong Convexity (RSC) for the random sampling operator formed by the design tensors $\{X_{i}\}$ on a carefully chosen constrained set. The RSC will show that when the error tensors $(Δ^{ι}, Δ^{s})$ belong to the constrained set, the following relationship

\begin{matrix} \frac{1}{N} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2} \geq κ_{0} {∥ Δ^{ι} + Δ^{s} ∥}_{Π}^{2} - τ, \end{matrix}

(A14)

holds with an appropriate residual $τ$ with high probability.

Before explicitly defining the constrained set, we first consider the following set where $Δ^{s}$ should lie:

\begin{matrix} B (δ_{1}, δ_{2}) : = \{B \in R^{d_{1} \times d_{2} \times d_{3}} \mid ∥ B ∥_{Π}^{2} \leq δ_{1}^{2}, \; ∥ B ∥_{1} \leq δ_{2}\} \end{matrix}

(A15)

with two positive constants $δ_{1}$ and $δ_{2}$ whose values will be specified later. We also define the following set of tensor pairs:

\begin{matrix} D (r, κ, β) : = \{(A, B) \mid ∥ A + B ∥_{Π}^{2} \geq β, \; ∥ A + B ∥_{\infty} \leq 1, \; ∥ A ∥_{⋆} \leq \sqrt{r d_{3}} ∥ A_{Θ_{s}^{⊥}} ∥_{F} + κ\} \end{matrix}

(A16)

We then define the constrained set as the intersection:

\begin{matrix} D (r, κ, β) \cap \{R^{d_{1} \times d_{2} \times d_{3}} \times B (δ_{1}, δ_{2})\} \end{matrix}

(A17)

To bound the estimation error in Equation (26), we will upper bound $∥ Δ^{ι} ∥_{F}$ and $∥ Δ^{s} ∥_{F}$ separately.

Note that

\begin{matrix} ∥ Δ^{ι} ∥_{F}^{2} = ∥ Δ_{Θ_{s}}^{ι} ∥_{F}^{2} + ∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{F}^{2} \leq ∥ Δ_{Θ_{s}}^{ι} ∥_{F}^{2} + 4 a^{2} | Θ_{s}^{⊥} | = | Θ_{s} | ∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{Π}^{2} + 4 a^{2} | Θ_{s}^{⊥} | \end{matrix}

(A18)

and similarly

\begin{matrix} ∥ Δ^{s} ∥_{F}^{2} \leq | Θ_{s} | ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{Π}^{2} + 4 a^{2} | Θ_{s}^{⊥} | \end{matrix}

(A19)

We will bound $∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{Π}^{2}$ and $∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{Π}^{2}$ separately. We first bound $∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{Π}^{2}$ in what follows.

Case 1: If $∥ Δ^{ι} + Δ^{s} ∥_{Π}^{2} \leq 16 a^{2} \sqrt{\frac{128 \log \tilde{d}}{N_{ι}}}$, then we use the following inequality

\begin{matrix} ∥ Δ^{ι} + Δ^{s} ∥_{Π}^{2} \geq \frac{1}{2} ∥ Δ^{ι} ∥_{Π}^{2} - {∥ Δ^{s} ∥}_{Π}^{2} \end{matrix}

(A20)

which holds due to ${(x + y)}^{2} = x^{2} + y^{2} + 2 x y = x^{2} + y^{2} + 2 \cdot (x / \sqrt{2}) \cdot (\sqrt{2} y) \geq x^{2} + y^{2} - (x^{2} / 2 + 2 y^{2}) = x^{2} / 2 - y^{2}$.

Thus, rearranging Equation (A20), we can upper bound $∥ Δ^{ι} ∥_{Π}$ in terms of the upper bound on $∥ Δ^{s} ∥_{Π}$ given in Lemma A4:

\begin{matrix} ∥ Δ^{ι} ∥_{Π}^{2} \leq 32 a^{2} \sqrt{\frac{128 \log \tilde{d}}{N_{ι}}} + 2 {∥ Δ^{s} ∥}_{Π}^{2} \end{matrix}

(A21)

Case 2: Suppose $∥ Δ^{ι} + Δ^{s} ∥_{Π}^{2} \geq 16 a^{2} \sqrt{\frac{128 \log \tilde{d}}{N_{ι}}}$. First, according to Lemma A3, it holds on the event $E$ defined in Equation (A9) that

\begin{aligned} ∥ Δ^{ι} ∥_{⋆} & \overset{(i)}{\leq} ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} + ∥ P^{⋆} (Δ^{ι}) ∥_{⋆} \\ & \overset{(i i)}{\leq} 4 ∥ P^{⋆} (Δ^{ι}) ∥_{⋆} + \frac{2 N_{s}}{N λ_{ι}} (a N λ_{s} + C σ^{2} \log \tilde{d}) \\ & \overset{(i i i)}{\leq} 4 \sqrt{2 r^{*} d_{3}} ∥ P^{⋆} (Δ^{ι}) ∥_{F} + \frac{2 N_{s}}{N λ_{ι}} (a N λ_{s} + C σ^{2} \log \tilde{d}) \\ & \overset{(i v)}{\leq} 4 \sqrt{2 r^{*} d_{3}} ∥ Δ^{ι} ∥_{F} + \frac{2 N_{s}}{N λ_{ι}} (a N λ_{s} + C σ^{2} \log \tilde{d}) \\ & \overset{(v)}{\leq} 4 \sqrt{2 r^{*} d_{3}} (∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{F} + ∥ Δ_{Θ_{s}}^{ι} ∥_{F}) + \frac{2 N_{s}}{N λ_{ι}} (a N λ_{s} + C σ^{2} \log \tilde{d}) \\ & \overset{(v i)}{\leq} \sqrt{32 r^{*} d_{3}} ∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{F} + a \sqrt{128 r^{*} d_{3} | Θ_{s} |} + \frac{2 N_{s}}{N λ_{ι}} (a N λ_{s} + C σ^{2} \log \tilde{d}) \end{aligned}

(A22)

where $(i)$ holds due to the triangle inequality; $(i i)$ is a direct consequence of Lemma A3 and the definition of the event $E$; $(i i i)$ holds because $r_{t b} (P^{⋆} (T)) \leq 2 r^{*}$ and ${∥ T ∥}_{⋆} \leq \sqrt{r_{t b} (T) d_{3}} {∥ T ∥}_{F}$ for any $T \in R^{d_{1} \times d_{2} \times d_{3}}$; $(i v)$ stems from the inequality $∥ P^{⋆} (T) ∥_{F} \leq ∥ T ∥_{F} = \sqrt{∥ P^{⋆} (T) ∥_{F}^{2} + ∥ P^{⊥} (T) ∥_{F}^{2}}$; $(v)$ is due to the triangle inequality; and $(v i)$ holds since $∥ Δ^{ι} ∥_{\infty} \leq 2 a$.

Note that according to Lemma A4, we have with probability at least $1 - 2.5 / \tilde{d}$:

\begin{matrix} \frac{Δ^{s}}{4 a} \in B (δ_{1}, δ_{2}) \quad \text{with} \quad δ_{1} = \frac{\sqrt{Δ_{1}}}{4 a}, \quad δ_{2} = \frac{N_{s}}{4 a N λ_{s}} (4 a^{2} + 8 a N λ_{s} + C σ^{2} \log \tilde{d}) \end{matrix}

(A23)

where $Δ_{1}$ is defined in Lemma A4.

Together with Equation (A22), we have

\begin{matrix} \frac{1}{4 a} (Δ^{ι}, Δ^{s}) \in D (r, κ, β) \cap (R^{d_{1} \times d_{2} \times d_{3}} \times B (δ_{1}, δ_{2})) \end{matrix}

(A24)

with the following parameters

\begin{matrix} r = 32 r^{*} \quad \text{and} \quad κ = 2 a \sqrt{2 r^{*} d_{3} | Θ_{s} |} + \frac{N_{s}}{2 a N λ_{ι}} (a N λ_{s} + C σ^{2} \log \tilde{d}) \end{matrix}

(A25)

Then, according to Lemma A6, it holds with probability at least $1 - 2 / \tilde{d}$ that

\begin{matrix} \frac{1}{16 a^{2} N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2} \geq \frac{1}{32 a^{2}} {∥ Δ^{ι} + Δ^{s} ∥}_{Π}^{2} - τ (r, κ, δ_{1}, δ_{2}) \end{matrix}

(A26)

where $τ (r, κ, δ_{1}, δ_{2})$ is defined in Equation (A64).

Recall that Equation (A13) writes:

\begin{matrix} \frac{1}{N} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2} \leq \frac{3 λ_{ι} \sqrt{r^{*} d_{3}}}{\sqrt{2}} {∥ Δ^{ι} ∥}_{F} + N_{s} (a λ_{s} + \frac{C σ^{2} log \tilde{d}}{N}) \end{matrix}

(A27)

Letting $ϱ : = N / N_{ι}$, we further obtain

\begin{aligned} \frac{1}{2} ∥ Δ^{ι} + Δ^{s} ∥_{Π}^{2} & \leq \frac{3 ϱ λ_{ι}}{\sqrt{2}} \sqrt{r^{*} d_{3}} ∥ Δ^{ι} ∥_{F} + C τ^{'} \\ & \leq \frac{9}{8} ϱ^{2} λ_{ι}^{2} r^{*} d_{3} \cdot 4 d_{1} d_{2} d_{3} + \frac{∥ Δ^{ι} ∥_{F}^{2}}{4 d_{1} d_{2} d_{3}} + C τ^{'} \\ & \leq \frac{9}{2} ϱ^{2} λ_{ι}^{2} r^{*} d_{1} d_{2} d_{3}^{2} + \frac{∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{F}^{2}}{4 d_{1} d_{2} d_{3}} + \frac{a^{2} | Θ_{s} |}{d_{1} d_{2} d_{3}} + C τ^{'} \end{aligned}

(A28)
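
The second and third lines of (A28) can be unpacked as follows. This is a worked sketch that uses only the elementary inequality $x y \leq c x^{2} + y^{2} / (4 c)$ for $c > 0$ (here with $c = d_{1} d_{2} d_{3}$) and the entrywise bound $∥ Δ^{ι} ∥_{\infty} \leq 2 a$ from Equation (A6); the symbols $x$, $y$, and $c$ are introduced here purely for illustration.

```latex
\begin{aligned}
&\text{With } x = \tfrac{3 \varrho \lambda_{\iota}}{\sqrt{2}} \sqrt{r^{*} d_{3}}, \quad
  y = \| \Delta^{\iota} \|_{F}, \quad c = d_{1} d_{2} d_{3}: \\
&\qquad x y \;\leq\; c \, x^{2} + \frac{y^{2}}{4 c}
  \;=\; \frac{9}{2} \varrho^{2} \lambda_{\iota}^{2} r^{*} d_{1} d_{2} d_{3}^{2}
  + \frac{\| \Delta^{\iota} \|_{F}^{2}}{4 d_{1} d_{2} d_{3}} , \\
&\text{and, splitting over } \Theta_{s} \text{ and its complement,} \\
&\qquad \| \Delta^{\iota} \|_{F}^{2}
  = \| \Delta^{\iota}_{\Theta_{s}^{\perp}} \|_{F}^{2} + \| \Delta^{\iota}_{\Theta_{s}} \|_{F}^{2}
  \;\leq\; \| \Delta^{\iota}_{\Theta_{s}^{\perp}} \|_{F}^{2} + 4 a^{2} | \Theta_{s} | .
\end{aligned}
```

Dividing the last bound by $4 d_{1} d_{2} d_{3}$ gives the two middle terms in the final line of (A28).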

Thus, by using Equation (A20) and Lemma A4, we have

\begin{matrix} \frac{∥ Δ_{Θ_{s}^{⊥}}^{ι} ∥_{F}^{2}}{d_{1} d_{2} d_{3}} \leq C (ϱ^{2} λ_{ι}^{2} r^{*} d_{1} d_{2} d_{3}^{2} + \frac{a^{2} | Θ_{s} |}{d_{1} d_{2} d_{3}} + τ^{'}) \end{matrix}

(A29)

where

\begin{matrix} τ^{'} = ϱ N_{s} (a λ_{s} + \frac{C σ^{2} log \tilde{d}}{N}) + 16 a^{2} τ (r, κ, δ_{1}, δ_{2}) \end{matrix}

(A30)

Note that the bound on $∥ Δ^{s} ∥_{Π}$ is given in Lemma A4, and the values of $λ_{ι}$ and $λ_{s}$ can be set according to Lemmas A8 and A9, respectively. Then, by putting Equations (A18), (A19) and (A52) together, and using Lemmas A8 and A9 to bound the associated norms of the stochastic quantities $E$, $W$, and $R_{Σ}$ in the error term, we can obtain the bound on $∥ Δ^{ι} ∥_{F}^{2} + ∥ Δ^{s} ∥_{F}^{2}$ and complete the proof. □

Appendix A.2.2. Lemmas for the Proof of Theorem 3

Lemma A1.

Letting $λ_{s} \geq 4 (∥ E ∥_{\infty} + 2 a ∥ W ∥_{\infty})$, it holds that

\begin{matrix} ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} \leq 3 {∥ Δ_{Ω_{s}}^{s} ∥}_{1} + \frac{1}{N λ_{s}} (4 a^{2} N_{s} + \sum_{i \in Ω_{s}} ξ_{i}^{2}) \end{matrix}

(A31)

Proof of Lemma A1.

By the standard condition for optimality over a convex set, it holds for any feasible $(L, S)$ that

\begin{matrix} 〈(L, S) - (\hat{L}, \hat{S}), \partial F (\hat{L}, \hat{S})〉 \geq 0 \end{matrix}

(A32)

which further leads to

\begin{matrix} - \frac{2}{N} \sum_{i = 1}^{N} (y_{i} - 〈X_{i}, \hat{L} + \hat{S}〉) 〈X_{i}, L + S - \hat{L} - \hat{S}〉 \\ + λ_{ι} 〈\partial ∥ \hat{L} ∥_{⋆}, L - \hat{L}〉 + λ_{s} 〈\partial ∥ \hat{S} ∥_{1}, S - \hat{S}〉 \geq 0 . \end{matrix}

(A33)

Letting $(L, S) \leftarrow (\hat{L}, S^{*})$, we have

\begin{matrix} - \frac{2}{N} \sum_{i = 1}^{N} (y_{i} - 〈X_{i}, \hat{L} + \hat{S}〉) 〈X_{i}, Δ^{s}〉 + λ_{s} 〈\partial ∥ \hat{S} ∥_{1}, Δ^{s}〉 \geq 0 . \end{matrix}

(A34)

Note that

\begin{aligned} & - \frac{2}{N} \sum_{i = 1}^{N} (y_{i} - 〈X_{i}, \hat{L} + \hat{S}〉) 〈X_{i}, Δ^{s}〉 \\ = {} & - \frac{2}{N} \sum_{i = 1}^{N} (ξ_{i} + 〈X_{i}, Δ^{ι} + Δ^{s}〉) 〈X_{i}, Δ^{s}〉 \\ = {} & - \frac{2}{N} \sum_{i = 1}^{N} {〈X_{i}, Δ^{s}〉}^{2} - \frac{2}{N} \sum_{i = 1}^{N} ξ_{i} 〈X_{i}, Δ^{s}〉 - \frac{2}{N} \sum_{i = 1}^{N} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ = {} & - \frac{2}{N} \sum_{i = 1}^{N} {〈X_{i}, Δ^{s}〉}^{2} - (\frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i} 〈X_{i}, Δ^{s}〉 + \frac{2}{N} \sum_{i \in Ω_{ι}} ξ_{i} 〈X_{i}, Δ^{s}〉) \\ & - (\frac{2}{N} \sum_{i \in Ω_{s}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 + \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉) \\ = {} & - (\frac{2}{N} \sum_{i = 1}^{N} {〈X_{i}, Δ^{s}〉}^{2} + \frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i} 〈X_{i}, Δ^{s}〉 + \frac{2}{N} \sum_{i \in Ω_{s}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉) \\ & - 2 〈E, Δ^{s}〉 - \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ \leq {} & - (\frac{2}{N} \sum_{i = 1}^{N} {〈X_{i}, Δ^{s}〉}^{2} - \frac{1}{N} \sum_{i \in Ω_{s}} (ξ_{i}^{2} + {〈X_{i}, Δ^{s}〉}^{2}) - \frac{1}{N} \sum_{i \in Ω_{s}} ({〈X_{i}, Δ^{ι}〉}^{2} + {〈X_{i}, Δ^{s}〉}^{2})) \\ & - 2 〈E, Δ^{s}〉 - \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ \leq {} & \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{1}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{ι}〉}^{2} - 2 〈E, Δ^{s}〉 - \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ \leq {} & \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} - 2 〈E, Δ^{s}〉 - \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \end{aligned}

(A35)

Thus, we have

\begin{matrix} λ_{s} 〈\partial ∥ \hat{S} ∥_{1}, \hat{S} - S^{*}〉 \\ \leq \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} - 2 〈E, Δ^{s}〉 - \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ \leq \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} + 2 | 〈E, Δ^{s}〉 | + \frac{2}{N} |\sum_{i \in Ω_{ι}} 〈〈X_{i}, Δ^{ι}〉 X_{i}, Δ^{s}〉| \\ \overset{}{\leq} \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} + {2 ∥ E ∥}_{\infty} ∥ Δ^{s} ∥_{1} + 2 ∥ \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 X_{i} ∥_{\infty} {∥ Δ^{s} ∥}_{1} \\ \overset{}{\leq} \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} + {2 ∥ E ∥}_{\infty} ∥ Δ^{s} ∥_{1} + {4 a ∥ W ∥}_{\infty} {∥ Δ^{s} ∥}_{1} \end{matrix}

(A36)

On the other hand, the definition of the sub-differential indicates

\begin{matrix} ∥ S^{*} ∥_{1} - {∥ \hat{S} ∥}_{1} \geq 〈S^{*} - \hat{S}, \partial {∥ \hat{S} ∥}_{1}〉 \end{matrix}

(A37)

which implies

\begin{matrix} λ_{s} (∥ \hat{S} ∥_{1} - ∥ S^{*} ∥_{1}) \leq \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} + 2 ∥ E ∥_{\infty} ∥ Δ^{s} ∥_{1} + 4 a ∥ W ∥_{\infty} ∥ Δ^{s} ∥_{1} \end{matrix}

Also note that

\begin{matrix} ∥ \hat{S} ∥_{1} - {∥ S^{*} ∥}_{1} & = ∥ S^{*} - Δ^{s} ∥_{1} - {∥ S^{*} ∥}_{1} \\ = ∥ S_{Θ_{s}}^{*} - (Δ_{Θ_{s}^{⊥}}^{s} + Δ_{Θ_{s}}^{s}) ∥_{1} - {∥ S_{Θ_{s}}^{*} ∥}_{1} \\ \geq ∥ S_{Θ_{s}}^{*} - Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} - ∥ Δ_{Θ_{s}}^{s} ∥_{1} - {∥ S_{Θ_{s}}^{*} ∥}_{1} \\ = ∥ S_{Θ_{s}}^{*} ∥_{1} + ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} - ∥ Δ_{Θ_{s}}^{s} ∥_{1} - {∥ S_{Θ_{s}}^{*} ∥}_{1} \\ = ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} - {∥ Δ_{Θ_{s}}^{s} ∥}_{1} \end{matrix}

(A38)

which implies

\begin{matrix} λ_{s} (∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} - ∥ Δ_{Θ_{s}}^{s} ∥_{1}) \leq \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} + 2 ∥ E ∥_{\infty} ∥ Δ^{s} ∥_{1} + 4 a ∥ W ∥_{\infty} ∥ Δ^{s} ∥_{1} \end{matrix}

Since $λ_{s} \geq 4 (∥ E ∥_{\infty} + 2 a ∥ W ∥_{\infty})$, we have

\begin{matrix} λ_{s} (∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} - ∥ Δ_{Θ_{s}}^{s} ∥_{1}) \leq \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{4 a^{2} | Θ_{s} |}{N} + \frac{λ_{s}}{2} (∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} + ∥ Δ_{Θ_{s}}^{s} ∥_{1}) \end{matrix}

(A39)

Thus, it holds that

\begin{matrix} ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} \leq 3 ∥ Δ_{Θ_{s}}^{s} ∥_{1} + \frac{1}{N λ_{s}} (4 a^{2} | Θ_{s} | + \sum_{i \in Ω_{s}} ξ_{i}^{2}) \end{matrix}

(A40)

which completes the proof. □

Lemma A2.

It holds that

\begin{matrix} ∥ \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 X_{i} ∥_{\infty} \leq 2 a {∥ W ∥}_{\infty} \end{matrix}

(A41)

Proof of Lemma A2.

Note that

\begin{matrix} ∥ \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 X_{i} ∥_{\infty} & \overset{(i)}{\leq} sup_{{∥ T ∥}_{1} \leq 1} 〈\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 X_{i}, T〉 \\ \overset{}{\leq} 2 a sup_{{∥ T ∥}_{1} \leq 1} 〈\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} \frac{〈X_{i}, Δ^{ι}〉}{2 a} X_{i}, T〉 \\ \overset{(i i)}{\leq} 2 a sup_{∥ T^{'} ∥_{1} \leq 1} 〈\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} X_{i}, T^{'}〉 \\ \leq {2 a ∥ W ∥}_{\infty} \end{matrix}

(A42)

where $(i)$ holds since ${∥ \cdot ∥}_{\infty}$ is the dual norm of ${∥ \cdot ∥}_{1}$; $(i i)$ holds since $| 〈X_{i}, Δ^{ι}〉 | \leq ∥ Δ^{ι} ∥_{\infty} \leq 2 a$ and the tensor $ℓ_{1}$-norm ${∥ \cdot ∥}_{1}$ is invariant to changes in sign. □
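
The inequality of Lemma A2 can be illustrated numerically. The sketch below assumes that every design tensor $X_{i}$ is a canonical basis tensor (one entry equal to one, zeros elsewhere) and uses the same normalization $1 / n$ on both sides; the error tensor, the sample size `n`, and the bound `a` are arbitrary placeholders chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
shape, n, a = (4, 3, 2), 60, 1.5

# Hypothetical error tensor whose entries are bounded by 2a in absolute value.
Delta = rng.uniform(-2 * a, 2 * a, size=shape)

# Design tensors stored by the flat index of their single nonzero entry.
flat_idx = rng.integers(0, Delta.size, size=n)
coeffs = Delta.ravel()[flat_idx]  # <X_i, Delta> for each sampled entry

# (1/n) * sum_i <X_i, Delta> X_i and W = (1/n) * sum_i X_i, accumulated entrywise.
weighted = np.bincount(flat_idx, weights=coeffs, minlength=Delta.size) / n
W = np.bincount(flat_idx, minlength=Delta.size) / n

lhs = np.abs(weighted).max()  # infinity norm of the weighted sum
rhs = 2 * a * np.abs(W).max() # 2a times the infinity norm of W
```

In this setting each entry of the weighted sum equals the corresponding entry of $W$ multiplied by one entry of the error tensor, which is why the factor $2 a$ suffices.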

Lemma A3.

By letting $λ_{ι} \geq 4 ∥ E ∥_{s p}$ and $λ_{s} \geq 4 ∥ E ∥_{\infty}$, we have

\begin{matrix} ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} \leq 3 {∥ P^{⋆} (Δ^{ι}) ∥}_{⋆} + \frac{2 a λ_{s}}{λ_{ι}} N_{s} + \frac{2}{N λ_{ι}} \sum_{i \in Ω_{s}} ξ_{i}^{2} \end{matrix}

(A43)

Proof of Lemma A3.

In Equation (A33), letting $(L, S) \leftarrow (L^{*}, S^{*})$, we obtain

\begin{matrix} - \frac{2}{N} \sum_{i = 1}^{N} (y_{i} - 〈X_{i}, \hat{L} + \hat{S}〉) 〈X_{i}, Δ^{ι} + Δ^{s}〉 + λ_{ι} 〈\partial ∥ \hat{L} ∥_{⋆}, Δ^{ι}〉 + λ_{s} 〈\partial ∥ \hat{S} ∥_{1}, Δ^{s}〉 \geq 0 \end{matrix}

(A44)

First, note that

\begin{matrix} - \frac{2}{N} \sum_{i = 1}^{N} (y_{i} - 〈X_{i}, \hat{L} + \hat{S}〉) 〈X_{i}, Δ^{ι} + Δ^{s}〉 \\ \overset{}{=} - \frac{2}{N} \sum_{i = 1}^{N} (ξ_{i} + 〈X_{i}, Δ^{ι} + Δ^{s}〉) 〈X_{i}, Δ^{ι} + Δ^{s}〉 \\ \overset{}{=} - \frac{2}{N} \sum_{i = 1}^{N} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2} - \frac{2}{N} \sum_{i = 1}^{N} ξ_{i} 〈X_{i}, Δ^{ι} + Δ^{s}〉 \\ \overset{}{=} - \frac{2}{N} \sum_{i = 1}^{N} {〈X_{i}, Δ^{ι} + Δ^{s}〉}^{2} - \frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i} 〈X_{i}, Δ^{ι} + Δ^{s}〉 - 2 〈E, Δ^{ι}〉 - 2 〈E, Δ_{Θ_{s}^{⊥}}^{s}〉 \end{matrix}

(A45)

Also, by the convexity of ${∥ \cdot ∥}_{⋆}$ and ${∥ \cdot ∥}_{1}$, we have

\begin{matrix} ∥ L^{*} ∥_{⋆} - ∥ \hat{L} ∥_{⋆} \geq 〈L^{*} - \hat{L}, \partial ∥ \hat{L} ∥_{⋆}〉, \quad \text{and} \quad ∥ S^{*} ∥_{1} - ∥ \hat{S} ∥_{1} \geq 〈S^{*} - \hat{S}, \partial ∥ \hat{S} ∥_{1}〉 \end{matrix}

(A46)

Thus, we have

\begin{matrix} λ_{ι} (∥ \hat{L} ∥_{⋆} - ∥ L^{*} ∥_{⋆}) + λ_{s} (∥ \hat{S} ∥_{1} - ∥ S^{*} ∥_{1} {) \leq 2 ∥ E ∥}_{s p} ∥ Δ^{ι} ∥_{⋆} + {2 ∥ E ∥}_{\infty} {∥ Δ_{Θ_{s}^{⊥}}^{s} ∥}_{1} + \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} . \end{matrix}

Moreover, it is often used that

\begin{matrix} ∥ \hat{L} ∥_{⋆} - ∥ L^{*} ∥_{⋆} \geq ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} - {∥ P^{⋆} (Δ^{ι}) ∥}_{⋆}, \end{matrix}

(A47)

Since we set $λ_{ι} \geq 4 ∥ E ∥_{s p}$ and $λ_{s} \geq 4 ∥ E ∥_{\infty}$, we have

\begin{matrix} λ_{ι} (∥ P^{⊥} (Δ^{ι}) ∥_{⋆} - ∥ P^{⋆} (Δ^{ι}) ∥_{⋆}) + λ_{s} (∥ \hat{S} ∥_{1} - ∥ S^{*} ∥_{1}) \\ \leq \frac{λ_{ι}}{2} (∥ P^{⋆} (Δ^{ι}) ∥_{⋆} + ∥ P^{⊥} (Δ^{ι}) ∥_{⋆}) + \frac{λ_{s}}{2} {∥ {\hat{S}}_{Θ_{s}^{⊥}} ∥}_{1} + \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} \end{matrix}

(A48)

where we use $\hat{S}_{Θ_{s}^{⊥}} = - Δ_{Θ_{s}^{⊥}}^{s}$. It implies

\begin{matrix} \frac{λ_{ι}}{2} ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} + λ_{s} ∥ {\hat{S}}_{Θ_{s}} ∥_{1} + \frac{λ_{s}}{2} ∥ {\hat{S}}_{Θ_{s}^{⊥}} ∥_{1} \leq \frac{3 λ_{ι}}{2} ∥ P^{⋆} (Δ^{ι}) ∥_{⋆} + λ_{s} {∥ S^{*} ∥}_{1} + \frac{1}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} \end{matrix}

(A49)

Note that $∥ S^{*} ∥_{1} \leq a N_{s}$. Thus, dropping the nonnegative terms in $\hat{S}$ on the left-hand side of (A49) and dividing by $λ_{ι} / 2$, we have

\begin{matrix} ∥ P^{⊥} (Δ^{ι}) ∥_{⋆} \leq 3 ∥ P^{⋆} (Δ^{ι}) ∥_{⋆} + \frac{2 a λ_{s}}{λ_{ι}} N_{s} + \frac{2}{N λ_{ι}} \sum_{i \in Ω_{s}} ξ_{i}^{2} \end{matrix}

(A50)

□

Lemma A4.

If $N_{ι} \geq \tilde{d}$ and $λ_{s} \geq 4 (∥ E ∥_{\infty} + 2 a ∥ W ∥_{\infty})$, then on the event $E$ defined in Equation (A9), we have

\begin{matrix} ∥ Δ^{s} ∥_{1} \leq \frac{N_{s}}{N λ_{s}} (4 a^{2} + 8 a N λ_{s} + C σ^{2} \log \tilde{d}) \end{matrix}

(A51)

and it holds with probability at least $1 - 2.5 / \tilde{d}$ that

\begin{aligned} & ∥ Δ^{s} ∥_{Π}^{2} \leq Δ_{1} : = \\ & C (ϱ \frac{2 N_{s}}{N_{ι}} (4 a^{2} + 2 a N λ_{ι} + C N_{s} σ^{2} \log \tilde{d}) + \frac{16 a N_{s}}{N λ_{s}} (4 a^{2} + 8 a N λ_{s} + C σ^{2} \log \tilde{d}) E [∥ E ∥_{\infty}]) \end{aligned}

(A52)

Proof of Lemma A4.

We first prove Equation (A51), and then prove Equation (A52).

(I) The proof of Equation (A51): Recall that Lemma A1 implies

\begin{matrix} ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} \leq 3 {∥ Δ_{Ω_{s}}^{s} ∥}_{1} + \frac{1}{N λ_{s}} (4 a^{2} N_{s} + \sum_{i \in Ω_{s}} ξ_{i}^{2}) \end{matrix}

Then, we have on the event $E$:

\begin{aligned} ∥ Δ^{s} ∥_{1} = ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} + ∥ Δ_{Θ_{s}}^{s} ∥_{1} & \overset{(i)}{\leq} ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} + ∥ Δ_{Ω_{s}}^{s} ∥_{1} \\ & \overset{(i i)}{\leq} 4 ∥ Δ_{Ω_{s}}^{s} ∥_{1} + \frac{1}{N λ_{s}} (4 a^{2} N_{s} + \sum_{i \in Ω_{s}} ξ_{i}^{2}) \\ & \overset{(i i i)}{\leq} \frac{N_{s}}{N λ_{s}} (4 a^{2} + 8 a N λ_{s} + C σ^{2} \log \tilde{d}) \end{aligned}

(A53)

where $(i)$ holds due to Assumption A1.I; $(i i)$ is a direct use of Lemma A1; $(i i i)$ stems from the fact that $∥ Δ^{s} ∥_{\infty} \leq 2 a$ and the definition of the event $E$ in Equation (A9).

Thus, Equation (A51) is proved.

(II) The proof of Equation (A52): According to the optimality of $(\hat{L}, \hat{S})$ for Problem (8), we have

\begin{matrix} F (\hat{L}, \hat{S}) \leq F (\hat{L}, S^{*}) \end{matrix}

(A54)

which implies

\begin{matrix} \frac{1}{N} \sum_{i = 1}^{N} {(ξ_{i} + 〈X_{i}, Δ^{ι} + Δ^{s}〉)}^{2} + λ_{s} ∥ \hat{S} ∥_{1} \leq \frac{1}{N} \sum_{i = 1}^{N} {(ξ_{i} + 〈X_{i}, Δ^{ι}〉)}^{2} + λ_{s} {∥ S^{*} ∥}_{1} \end{matrix}

(A55)

which further leads to

\begin{matrix} \frac{1}{N} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{s}〉}^{2} + \frac{1}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{s}〉}^{2} + \frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i} 〈X_{i}, Δ^{s}〉 + \frac{2}{N} \sum_{i \in Ω_{s}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ + \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ_{Θ_{s}^{⊥}}^{s}〉 + 2 \sum_{i \in Ω_{ι}} 〈E, Δ_{Θ_{s}^{⊥}}^{s}〉 + λ_{s} ∥ \hat{S} ∥_{1} \leq λ_{s} {∥ S^{*} ∥}_{1} \end{matrix}

(A56)

Note that by using $2 a b \geq - (a^{2} / 2 + 2 b^{2})$, we have

\begin{matrix} \frac{1}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{s}〉}^{2} + \frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i} 〈X_{i}, Δ^{s}〉 + \frac{2}{N} \sum_{i \in Ω_{s}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ^{s}〉 \\ \geq - (\frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{2}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{ι}〉}^{2}) \end{matrix}

Thus, on the event $E$ defined in Equation (A9), we have

\begin{aligned} \frac{1}{N} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{s}〉}^{2} & \leq | \frac{2}{N} \sum_{i \in Ω_{ι}} 〈X_{i}, Δ^{ι}〉 〈X_{i}, Δ_{Θ_{s}^{⊥}}^{s}〉 | + | 2 \sum_{i \in Ω_{ι}} 〈E, Δ_{Θ_{s}^{⊥}}^{s}〉 | \\ & \quad + λ_{s} (∥ S^{*} ∥_{1} - ∥ \hat{S} ∥_{1}) + \frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{2}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{ι}〉}^{2} \\ & \overset{(i)}{\leq} (4 a ∥ W ∥_{\infty} + 2 ∥ E ∥_{\infty}) ∥ Δ_{Θ_{s}^{⊥}}^{s} ∥_{1} + λ_{s} (∥ S^{*} ∥_{1} - ∥ \hat{S} ∥_{1}) \\ & \quad + \frac{2}{N} \sum_{i \in Ω_{s}} ξ_{i}^{2} + \frac{2}{N} \sum_{i \in Ω_{s}} {〈X_{i}, Δ^{ι}〉}^{2} \\ & \overset{(i i)}{\leq} \frac{λ_{s}}{2} ∥ \hat{S}_{Θ_{s}^{⊥}} ∥_{1} + λ_{s} (∥ S^{*} ∥_{1} - ∥ \hat{S} ∥_{1}) + \frac{2}{N} \sum_{i \in Ω_{s}} (ξ_{i}^{2} + {〈X_{i}, Δ^{ι}〉}^{2}) \\ & \overset{(i i i)}{\leq} λ_{s} ∥ S^{*} ∥_{1} + \frac{2}{N} \sum_{i \in Ω_{s}} (ξ_{i}^{2} + 4 a^{2}) \\ & \overset{(i v)}{\leq} \frac{2 N_{s}}{N} (4 a^{2} + 2 a N λ_{ι} + C N_{s} σ^{2} \log \tilde{d}) \end{aligned}

(A57)

where $(i)$ holds due to Lemma A2; $(i i)$ holds because $λ_{s} \geq 4 (2 a ∥ W ∥_{\infty} + ∥ E ∥_{\infty})$ and $Δ_{Θ_{s}^{⊥}}^{s} = - \hat{S}_{Θ_{s}^{⊥}}$; $(i i i)$ holds because $∥ \hat{S} ∥_{1} = ∥ \hat{S}_{Θ_{s}} ∥_{1} + ∥ \hat{S}_{Θ_{s}^{⊥}} ∥_{1} \geq ∥ \hat{S}_{Θ_{s}^{⊥}} ∥_{1}$ and $| 〈X_{i}, Δ^{ι}〉 | \leq ∥ Δ^{ι} ∥_{\infty} \leq 2 a$; $(i v)$ holds as a consequence of $∥ S^{*} ∥_{\infty} \leq a$, $| Θ_{s} | \leq N_{s}$ (due to Assumption A1.I), and the definition of the event $E$.

Now, we discuss the bound of

∥ Δ^{s} ∥_{Π}

in two cases.

Case 1. If $∥ Δ^{s} ∥_{Π}^{2} \leq 4 a^{2} β = 4 a^{2} \sqrt{\frac{128 \log \tilde{d}}{N_{ι}}}$ , then Equation (A52) holds trivially.
Case 2. If $∥ Δ^{s} ∥_{Π}^{2} \geq 4 a^{2} \sqrt{\frac{128 \log \tilde{d}}{N_{ι}}}$ , then we have

$\begin{matrix} \frac{Δ^{s}}{2 a} \in D (\frac{N_{s}}{2 a N λ_{s}} (4 a^{2} + 8 a N λ_{s} + C σ^{2} \log \tilde{d}), β) \end{matrix}$

due to the fact $∥ Δ^{s} / (2 a) ∥_{\infty} \leq 1$ and Equation (A51). Then according to Lemma A5, it holds with probability at least $1 - 1 / {\tilde{d}}^{2}$ that

$\begin{matrix} \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, Δ^{s}〉}^{2} \geq \frac{1}{2} ∥ Δ^{s} ∥_{Π}^{2} - \frac{16 a N_{s}}{N λ_{s}} (4 a^{2} + 8 a N λ_{s} + C σ^{2} \log \tilde{d}) E [∥ E ∥_{\infty}] . \end{matrix}$

(A58)

Combining Equations (A57) and (A58) yields the bound on $∥ Δ^{s} ∥_{Π}$ .

□

Lemma A5.

Define the following set

\begin{matrix} D (δ, β) : = \{B \in R^{d_{1} \times d_{2} \times d_{3}} \mid ∥ B ∥_{\infty} \leq 1, \; ∥ B ∥_{Π}^{2} \geq β, \; ∥ B ∥_{1} \leq δ\} \end{matrix}

(A59)

Then, it holds with probability at least $1 - 2 / {\tilde{d}}^{3}$ that

\begin{matrix} \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, B〉}^{2} \geq \frac{1}{2} {∥ B ∥}_{Π}^{2} - 8 δ E [∥ R_{Σ} ∥_{\infty}] \end{matrix}

(A60)

for any $B \in D (δ, β)$.

Proof of Lemma A5.

We prove this lemma using a standard peeling argument. First, define the following

\begin{matrix} G : = \{\exists B \in D (δ, β) s u c h t h a t | \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} 〈X_{i}, B〉 - {∥ B ∥}_{Π}^{2} | \geq \frac{1}{2} {∥ B ∥}_{Π}^{2} + 8 δ E [∥ R_{Σ} ∥_{\infty}]\} \end{matrix}

We partition this set to simpler events with

l \in N_{+}

:

\begin{matrix} G_{l} : = {\exists B & \in D (δ, β) \cap C^{'} (t) w i t h t \in [α^{l - 1} β, α^{l} β] \\ s u c h t h a t | \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} 〈X_{i}, B〉 - {∥ B ∥}_{Π}^{2} | \geq \frac{1}{2} {∥ B ∥}_{Π}^{2} + 8 δ E [∥ R_{Σ} ∥_{\infty}]} \end{matrix}

(A61)

Note that, according to Lemma A7, we have for

t \in [α^{l - 1} β, α^{l} β)

:

\begin{matrix} P [G_{l}] \leq P [sup_{B \in C^{'} (t)} | \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, B〉}^{2} - {∥ B ∥}_{Π}^{2} | \geq \frac{1}{2 α} α^{l} β + 8 δ E [∥ R_{Σ} ∥_{\infty}]] \leq exp (- \frac{N_{ι} {(α^{l} β)}^{2}}{32 α^{2}}) \end{matrix}

Thus, we have

\begin{matrix} P [G] \leq P [⋃_{l = 1}^{\infty} G_{l}] \leq \sum_{l = 1}^{\infty} P [G_{l}] & \leq \sum_{l = 1}^{\infty} exp (- \frac{N_{ι} {(α^{l} β)}^{2}}{32 α^{2}}) \\ \overset{}{=} exp (- \frac{N_{ι} β^{2}}{32}) + \sum_{l = 2}^{\infty} exp (- \frac{N_{ι} β^{2}}{32} α^{2 (l - 1)}) \\ \overset{(i)}{\leq} exp (- \frac{N_{ι} β^{2}}{32}) + \sum_{l = 2}^{\infty} exp (- \frac{N_{ι} β^{2}}{32} \cdot 2 (l - 1) log α) \\ \overset{}{\leq} exp (- \frac{N_{ι} β^{2}}{32}) + \frac{exp (- \frac{N_{ι} β^{2}}{16} log α)}{1 - exp (- \frac{N_{ι} β^{2}}{16} log α)} \end{matrix}

(A62)

where

(i)

is due to

x \geq log x

for positive x. By setting

α = e

and recalling the value of

β = \sqrt{\frac{128 log \tilde{d}}{N_{ι}}}

, the lemma is proved. □
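As a numerical sanity check of this last step (an illustration, not part of the original proof), the sketch below evaluates the series in (A62) for α = e and β² = 128 log d̃ / N_ι, in which case N_ι β² / 32 = 4 log d̃ and the sample size cancels. The function name `peeling_tail_bound`, the truncation length `n_terms`, and the test value d̃ = 10 are assumptions made only for this example.

```python
import math


def peeling_tail_bound(d_tilde: float, n_terms: int = 50):
    """Evaluate the series in (A62) and its closed-form relaxation for alpha = e.

    With beta^2 = 128 * log(d_tilde) / N, the quantity N * beta^2 / 32 equals
    4 * log(d_tilde), so the sample size N drops out.
    """
    if d_tilde <= 1:
        raise ValueError("d_tilde must exceed 1 so that log(d_tilde) > 0")
    alpha = math.e
    c = 4.0 * math.log(d_tilde)  # c = N * beta^2 / 32
    # Truncated series: sum over l >= 1 of exp(-c * alpha^(2(l-1))).
    series = sum(
        math.exp(-c * alpha ** (2 * (l - 1))) for l in range(1, n_terms + 1)
    )
    # Relaxation via x >= log x: geometric series with ratio exp(-2c * log(alpha)).
    ratio = math.exp(-2.0 * c * math.log(alpha))
    closed_form = math.exp(-c) + ratio / (1.0 - ratio)
    return series, closed_form


d_tilde = 10.0
series, closed_form = peeling_tail_bound(d_tilde)
# The series is dominated by the closed form, which is below 2 / d_tilde^3.
print(series <= closed_form <= 2.0 / d_tilde ** 3)
```

For this choice the closed form equals d̃⁻⁴ + d̃⁻⁸ / (1 − d̃⁻⁸), which is comfortably below the 2 / d̃³ failure probability stated in Lemma A5.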

Lemma A6.

For any

(A, B) \in C (r, κ, β) \cap R^{d_{1} \times d_{2} \times d_{3}} \times B (δ_{1}, δ_{2})

, it holds with probability at least

1 - 2 / \tilde{d}

that

\begin{matrix} \frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, A + B〉}^{2} \geq \frac{1}{2} {∥ A + B ∥}_{Π}^{2} - τ (r, κ, δ_{1}, δ_{2}) \end{matrix}

(A63)

where

\begin{matrix} τ (r, κ, δ_{1}, δ_{2}) = 4 (16 α + 1) r d_{3} | Θ_{s}^{⊥} | E [∥ R_{Σ} ∥_{s p}]^{2} + 8 κ E [∥ R_{Σ} ∥_{s p}] + 8 δ_{2} E [∥ R_{Σ} ∥_{\infty}] + 4 δ_{1}^{2} \end{matrix}

(A64)

Proof of Lemma A6.

The proof is very similar to that of Lemma A5, and we simply omit it. □

Lemma A7.

Define the set

\begin{matrix} C^{'} (t) : = \{T \in R^{d_{1} \times d_{2} \times d_{3}} | ∥ T ∥_{Π}^{2} \leq t, ∥ T ∥_{\infty} \leq 1\} \end{matrix}

(A65)

and

\begin{matrix} Z_{t} : = sup_{T \in C^{'} (t)} |\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, T〉}^{2} - {∥ T ∥}_{Π}^{2}| . \end{matrix}

(A66)

Then, it holds that

\begin{matrix} P [Z_{t} \geq E [Z_{t}] + \frac{t}{4 α}] \leq exp (- \frac{N_{ι} t^{2}}{128 α^{2}}), \quad \text{and} \quad E [Z_{t}] \leq 8 E [sup_{T \in C^{'} (t)} |〈R_{Σ}, T〉|] \end{matrix}

(A67)

Proof of Lemma A7.

First, we study the tail behavior of

Z_{t}

by directly using Massart’s inequality (Theorem 14.2 of [64]). According to Massart’s inequality, it holds for any

s > 0

\begin{matrix} P [Z_{t} \geq E [Z_{t}] + s] \leq exp (- \frac{N_{ι} s^{2}}{8}) \end{matrix}

(A68)

By letting

s = t / (4 α)

, the first inequality in Equation (A67) is proved.

Then, we will upper bound the expectation of

Z_{t}

. By a standard symmetrization argument [65], we have

\begin{matrix} E [Z_{t}] & = E [sup_{T \in C^{'} (t)} |\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} {〈X_{i}, T〉}^{2} - E {〈X, T〉}^{2}|] \\ \leq 2 E [sup_{T \in C^{'} (t)} |\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} ε_{i} {〈X_{i}, T〉}^{2}|] \end{matrix}

(A69)

where

ε_{i}

’s are i.i.d. Rademacher variables. Further, according to the contraction principle [66], it holds that

\begin{matrix} E [Z_{t}] & \leq 8 E [sup_{T \in C^{'} (t)} |\frac{1}{N_{ι}} \sum_{i \in Ω_{ι}} ε_{i} 〈X_{i}, T〉|] = 8 E [sup_{T \in C^{'} (t)} |〈R_{Σ}, T〉|] \end{matrix}

(A70)

In the following, we consider the two cases:

Case 1. Consider

T \in D (δ, β) \cap C^{'} (t)

, we have

\begin{matrix} E [Z_{t}] \leq 8 E [sup_{T} |〈R_{Σ}, T〉|] \leq 8 E [∥ R_{Σ} ∥_{\infty} {∥ T ∥}_{1}] \leq 8 δ E [∥ R_{Σ} ∥_{\infty}] . \end{matrix}

(A71)

By letting

s = t / (2 α)

in Equation (A68), we obtain

\begin{matrix} P [Z_{t} \geq 8 δ E [∥ R_{Σ} ∥_{\infty}] + \frac{t}{2 α}] \leq exp (- \frac{N_{ι} t^{2}}{32 α^{2}}) \end{matrix}

(A72)

when

T \in D (δ, β) \cap C^{'} (t)

.
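The two substitutions into Massart's inequality (A68) can be checked mechanically. The following sketch, with arbitrary placeholder values for N_ι, t and α, confirms that s = t/(4α) gives the exponent in (A67) and s = t/(2α) gives the exponent in (A72); the helper name `massart_tail` is an assumption of this example.

```python
import math


def massart_tail(n_obs: int, s: float) -> float:
    """Right-hand side of (A68): exp(-N * s^2 / 8)."""
    return math.exp(-n_obs * s ** 2 / 8.0)


# Placeholder values, chosen only for illustration.
n_obs, t, alpha = 500, 0.3, math.e

# s = t / (4 alpha) reproduces the exponent N t^2 / (128 alpha^2) of (A67).
lhs_a67 = massart_tail(n_obs, t / (4 * alpha))
rhs_a67 = math.exp(-n_obs * t ** 2 / (128 * alpha ** 2))

# s = t / (2 alpha) reproduces the exponent N t^2 / (32 alpha^2) of (A72).
lhs_a72 = massart_tail(n_obs, t / (2 * alpha))
rhs_a72 = math.exp(-n_obs * t ** 2 / (32 * alpha ** 2))

print(math.isclose(lhs_a67, rhs_a67), math.isclose(lhs_a72, rhs_a72))
```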

Case 2. Consider

T = A + B

, where

(A, B) \in C (r, κ, β)

,

B \in B (δ_{1}, δ_{2})

, and

{∥ T ∥}_{Π}^{2} \leq t

. The goal in this case is to upper bound

\begin{matrix} E [Z_{t}] & \leq 8 E [sup_{A, B} |〈R_{Σ}, A + B〉|] \\ \overset{(i)}{\leq} 8 E [sup_{A} |〈R_{Σ}, A〉|] + 8 E [sup_{B} |〈R_{Σ}, B〉|] \\ \overset{(i i)}{\leq} 8 E [sup_{A} ∥ R_{Σ} ∥_{s p} {∥ A ∥}_{⋆}] + 8 E [sup_{B} ∥ R_{Σ} ∥_{\infty} {∥ B ∥}_{1}] \\ \overset{(i i i)}{\leq} 8 E [sup_{A} ∥ R_{Σ} ∥_{s p} {∥ A ∥}_{⋆}] + 8 δ_{2} E [∥ R_{Σ} ∥_{\infty}] \end{matrix}

(A73)

where

(i)

holds as a property of the sup operation;

(i i)

holds due to the definition of dual norm;

(i i i)

stems from the condition

B \in B (δ_{1}, δ_{2})

.

It remains to upper bound

{∥ A ∥}_{⋆}

in Equation (A73). First, according to the definition of

C (r, κ, β)

, we have

\begin{matrix} {∥ A ∥}_{⋆} \leq \sqrt{r d_{3}} {∥ A_{Θ_{s}^{⊥}} ∥}_{F} + κ \end{matrix}

(A74)

We also have

\begin{matrix} {∥ A ∥}_{Π} \overset{(i)}{\leq} {∥ A + B ∥}_{Π} + {∥ B ∥}_{Π} \overset{(i i)}{\leq} \sqrt{t} + δ_{1} . \end{matrix}

(A75)

where

(i)

holds due to the triangle inequality, and

(i i)

is a result of conditions

{∥ T ∥}_{Π}^{2} \leq t

and

B \in B (δ_{1}, δ_{2})

. Since Assumption A1.II indicates

{∥ A ∥}_{Π}^{2} = | Θ_{s}^{⊥} |^{- 1} {∥ A_{Θ_{s}^{⊥}} ∥}_{F}^{2}

, combining Equations (A74) and (A75) further yields an upper bound on

{∥ A ∥}_{⋆}

as follows:

\begin{matrix} {∥ A ∥}_{⋆} \leq \sqrt{r d_{3} | Θ_{s}^{⊥} |} (\sqrt{t} + δ_{1}) + κ \end{matrix}

which further leads to

\begin{matrix} 8 E [sup_{A} ∥ R_{Σ} ∥_{s p} {∥ A ∥}_{⋆}] \leq 8 E [∥ R_{Σ} ∥_{s p}] (\sqrt{r d_{3} | Θ_{s}^{⊥} |} (\sqrt{t} + δ_{1}) + κ) \end{matrix}

(A76)

The inequality

2 \sqrt{a b} \leq a / c + b c

, valid for any nonnegative a, b and positive c, is then used to further relax the above bound:

\begin{matrix} 8 E [∥ R_{Σ} ∥_{s p}] \sqrt{r d_{3} | Θ_{s}^{⊥} |} \sqrt{t} \leq \frac{t}{4 α} + 64 α r d_{3} | Θ_{s}^{⊥} | E [∥ R_{Σ} ∥_{s p}]^{2} \\ 8 E [∥ R_{Σ} ∥_{s p}] \sqrt{r d_{3} | Θ_{s}^{⊥} |} δ_{1} \leq 4 δ_{1}^{2} + 4 r d_{3} | Θ_{s}^{⊥} | E [∥ R_{Σ} ∥_{s p}]^{2} \end{matrix}

(A77)
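To make the choice of constants in (A77) concrete, the sketch below first checks the elementary inequality 2√(ab) ≤ a/c + bc on random inputs and then instantiates the first line of (A77) with a = t, b = 16 k m² and c = 4α. Here k stands for r d₃ |Θ_s^⊥| and m for E[∥R_Σ∥_sp]; both names and all numerical values are placeholders assumed for illustration.

```python
import math
import random


def young_gap(a: float, b: float, c: float) -> float:
    """Return (a / c + b * c) - 2 * sqrt(a * b) = (sqrt(a / c) - sqrt(b * c))^2."""
    return a / c + b * c - 2.0 * math.sqrt(a * b)


random.seed(0)
gaps = [
    young_gap(
        random.uniform(0.0, 10.0),
        random.uniform(0.0, 10.0),
        random.uniform(0.1, 10.0),
    )
    for _ in range(1000)
]
print(min(gaps) >= -1e-9)  # nonnegative up to floating-point rounding

# First line of (A77) with a = t, b = 16 * k * m^2 and c = 4 * alpha.
t, k, m, alpha = 0.7, 12.0, 0.05, math.e
lhs = 8.0 * m * math.sqrt(k) * math.sqrt(t)
rhs = t / (4.0 * alpha) + 64.0 * alpha * k * m ** 2
print(lhs <= rhs)
```

The second line of (A77) follows in the same way with a = 4δ₁², b = 4 k m² and c = 1.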

Thus, combining Equations (A73), (A76) and (A77), we obtain

\begin{matrix} E [Z_{t}] \leq \frac{t}{4 α} + \underbrace{4 (16 α + 1) r d_{3} | Θ_{s}^{⊥} | E [∥ R_{Σ} ∥_{s p}]^{2} + 8 κ E [∥ R_{Σ} ∥_{s p}] + 8 δ_{2} E [∥ R_{Σ} ∥_{\infty}] + 4 δ_{1}^{2}}_{= : τ} \end{matrix}

which further gives

\begin{matrix} P [Z_{t} \geq \frac{t}{2 α} + τ] \leq exp (- \frac{N_{ι} t^{2}}{128 α^{2}}) \end{matrix}

(A78)

for

T = A + B

, where

(A, B) \in C (r, κ, β)

,

B \in B (δ_{1}, δ_{2})

, and

{∥ T ∥}_{Π}^{2} \leq t

. □

Lemma A8.

Under Assumption A1, there exists an absolute constant

C > 0

such that the following bounds on the tensor spectral norm of stochastic tensors

E

and

R_{Σ}

hold:

(I): For tensor $E$ , we have

$\begin{matrix} P [{∥ E ∥}_{s p} \leq C σ max \{\sqrt{\frac{t + log \tilde{d}}{ϱ N (d_{1} \land d_{2})}} + \frac{log (d_{1} \land d_{2}) (t + log \tilde{d})}{N}\}] \geq 1 - e^{- t} \end{matrix}$

(A79)
(II): For tensor $R_{Σ}$ , we have

$\begin{matrix} E [∥ R_{Σ} ∥_{s p}] \leq (\sqrt{\frac{log \tilde{d}}{N_{ι} (d_{1} \land d_{2})}} + \frac{{log}^{2} \tilde{d}}{N_{ι}}) \end{matrix}$

(A80)

Proof of Lemma A8.

Equation (A79) can be seen as a special case of Lemma 8 in [18] when

k = 1

. Equation (A80) can be proved in the same way as Equation (A79), combined with the techniques used in the proof of Lemma 6 in [51]. We omit the details due to the high similarity. □

Lemma A9.

Under Assumption A1, there exists an absolute constant

C > 0

such that the following bounds on the

l_{\infty}

-norm of stochastic tensors

E

,

W

, and

R_{Σ}

hold:

(I): For tensor $E$ , we have

$\begin{matrix} P [{∥ E ∥}_{\infty} \leq C σ (\sqrt{\frac{t + log \tilde{d}}{ϱ N d_{1} d_{2} d_{3}}} + \frac{t + log \tilde{d}}{N})] \geq 1 - e^{- t} \end{matrix}$

(A81)

$\begin{matrix} E [{∥ E ∥}_{\infty}] \leq C σ (\sqrt{\frac{log \tilde{d}}{ϱ N d_{1} d_{2} d_{3}}} + \frac{log \tilde{d}}{N}) \end{matrix}$

(A82)
(II): For tensor $W$ , we have

$\begin{matrix} P [{∥ W ∥}_{\infty} \leq C (\frac{1}{ϱ d_{1} d_{2} d_{3}} + \sqrt{\frac{t + log \tilde{d}}{ϱ N d_{1} d_{2} d_{3}}} + \frac{t + log \tilde{d}}{N})] \geq 1 - e^{- t} \end{matrix}$

(A83)

$\begin{matrix} E [{∥ W ∥}_{\infty}] \leq C (\frac{1}{ϱ d_{1} d_{2} d_{3}} + \sqrt{\frac{log \tilde{d}}{ϱ N d_{1} d_{2} d_{3}}} + \frac{log \tilde{d}}{N}) \end{matrix}$

(A84)
(III): For tensor $R_{Σ}$ , we have

$\begin{matrix} P [∥ R_{Σ} ∥_{\infty} \leq C (\sqrt{\frac{t + log \tilde{d}}{N_{ι} d_{1} d_{2} d_{3}}} + \frac{t + log \tilde{d}}{N_{ι}})] \geq 1 - e^{- t} \end{matrix}$

(A85)

$\begin{matrix} E [∥ R_{Σ} ∥_{\infty}] \leq C σ (\sqrt{\frac{log \tilde{d}}{N_{ι} d_{1} d_{2} d_{3}}} + \frac{log \tilde{d}}{N_{ι}}) \end{matrix}$

(A86)

Proof of Lemma A9.

Since this lemma can be straightforwardly proved in the same way as Lemma 10 in [36], we omit the proof. □

Appendix A.3. Proof of Theorem 4

The proof of Theorem 4 follows those of Theorems 2 and 3 in [36] for robust matrix completion. Given

L^{*}

and

S^{*}

, let

P_{(L^{*}, S^{*})} [\cdot]

be the probability with respect to the random design tensors

{X_{i}}

and random noises

{ξ_{i}}

according to the observation model in Equation (6). Without loss of generality, we assume

d_{1} \geq d_{2}

.

Proof of Theorem 4.

For element-wise sparse

S^{*}

, we first construct a set

L \subset L (r, a)

which satisfies the following conditions:

(i): For any tensor $T$ in $L$ , we have $r_{t b} (T) \leq r$ ;
(ii): For any two tensors $T_{1}, T_{2}$ in $L$ , we have $r_{t b} (T_{1} - T_{2}) \leq r$ ;
(iii): For any tensor $T = (T_{i j k})$ in $L$ , every entry $T_{i j k}$ lies in $\{0, α\}$ , where $α \leq a$ .

Motivated by the proof of Theorem 2 in [36],

L

is constructed as follows

\begin{matrix} L : = \{L \in R^{d_{1} \times d_{2} \times d_{3}} : \forall k \in [d_{3}], L^{(k)} = (M_{k} | \dots | M_{k} | 0) \in R^{d_{1} \times d_{2}}, \text{ where } M_{k} \in \{0, α\}^{d_{1} \times r}\}, \end{matrix}

(A87)

where

0 \in R^{d_{1} \times (d_{2} - r ⌊ \frac{d_{2}}{2 r} ⌋)}

is the zero matrix, and

α = γ (a \land σ) \sqrt{r d_{1} d_{3} / N_{ι}}

with

γ \leq 1

being a small enough constant such that

α \leq a

.

Next, according to the Varshamov-Gilbert lemma (see Lemma 2.9 in [67]), there exists a set

L_{0} \subset L

containing the zero tensor

0 \in R^{d_{1} \times d_{2} \times d_{3}}

, such that

(i): its cardinality $| L_{0} | \geq 2^{r d_{1} d_{3} / 8} + 1$ , and
(ii): for any two distinct tensors $T_{1}$ and $T_{2}$ in $L_{0}$ ,

$\begin{matrix} ∥ T_{1} - T_{2} ∥_{F}^{2} & \geq \frac{d_{1} d_{3} r}{8} \cdot γ^{2} {(a \land σ)}^{2} \frac{r d_{1} d_{3}}{N_{ι}} \cdot ⌊ \frac{d_{2}}{2 r} ⌋ \geq \frac{γ^{2} d_{1} d_{2} d_{3}}{16} \cdot \frac{{(σ \land a)}^{2} r d_{1} d_{3}}{N_{ι}} . \end{matrix}$

(A88)
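The first inequality in (A88) rests on a simple counting fact: because each block M_k is repeated ⌊d₂/(2r)⌋ times in every frontal slice, the squared Frobenius distance between two tensors in the set equals α² ⌊d₂/(2r)⌋ times the Hamming distance between their blocks, and the Varshamov-Gilbert lemma guarantees a Hamming distance of at least r d₁ d₃ / 8. The following pure-Python sketch illustrates the counting fact on a small random instance; all function names and the dimensions are assumptions made for this example, and the blocks are drawn at random rather than from an actual Varshamov-Gilbert packing.

```python
import math
import random


def build_packing_tensor(blocks, d2, r):
    """Assemble frontal slices L^(k) = (M_k | ... | M_k | 0) as in (A87).

    blocks[k] is a d1 x r matrix (list of rows) with entries in {0, alpha};
    M_k is repeated floor(d2 / (2r)) times and the remaining columns are zero.
    """
    reps = d2 // (2 * r)
    tensor = []
    for m_k in blocks:
        slice_k = [row * reps + [0.0] * (d2 - r * reps) for row in m_k]
        tensor.append(slice_k)
    return tensor  # indexed as tensor[k][i][j]


def sq_frobenius_distance(t1, t2):
    """Squared Frobenius norm of the difference of two tensors."""
    return sum(
        (x - y) ** 2
        for s1, s2 in zip(t1, t2)
        for row1, row2 in zip(s1, s2)
        for x, y in zip(row1, row2)
    )


def hamming_distance(b1, b2):
    """Number of positions in which two families of blocks differ."""
    return sum(
        x != y
        for m1, m2 in zip(b1, b2)
        for row1, row2 in zip(m1, m2)
        for x, y in zip(row1, row2)
    )


random.seed(0)
d1, d2, d3, r, alpha = 6, 8, 3, 2, 0.5  # small placeholder dimensions


def random_blocks():
    return [
        [[random.choice([0.0, alpha]) for _ in range(r)] for _ in range(d1)]
        for _ in range(d3)
    ]


b1, b2 = random_blocks(), random_blocks()
t1, t2 = build_packing_tensor(b1, d2, r), build_packing_tensor(b2, d2, r)
reps = d2 // (2 * r)
dist = sq_frobenius_distance(t1, t2)
print(math.isclose(dist, alpha ** 2 * reps * hamming_distance(b1, b2)))
```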

Let

P_{(L, 0)}

denote the probability distribution of the random variables

{y_{i}}

observed when the underlying tensor is

(L, 0)

in the observation model (6). Note that the random noise satisfies

ξ_{i} \overset{i . i . d .}{\sim} N (0, σ^{2})

. Thus, for any

L \in L_{0}

, the KL divergence

K (P_{(0, 0)}, P_{(L, 0)})

between

P_{(0, 0)}

and

P_{(L, 0)}

satisfies

\begin{matrix} K (P_{(0, 0)}, P_{(L, 0)}) = \frac{| Ω_{ι} |}{2 σ^{2}} {∥ L ∥}_{Π}^{2} \leq \frac{| Ω_{ι} |}{2 σ^{2}} γ^{2} {(a \land σ)}^{2} \frac{r d_{1} d_{3}}{N_{ι}} \leq \frac{γ^{2} r d_{1} d_{3}}{2} . \end{matrix}

(A89)
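The equality in (A89) uses the fact that, for a single Gaussian observation, the KL divergence between N(0, σ²) and N(μ, σ²) is μ² / (2σ²); summing over the N_ι independent uncorrupted observations gives the stated expression. The sketch below checks the single-observation formula against a trapezoidal approximation of the defining integral. The function names, grid settings and the values of μ and σ are assumptions for illustration only.

```python
import math


def gaussian_kl_closed_form(mu: float, sigma: float) -> float:
    """KL( N(0, sigma^2) || N(mu, sigma^2) ) = mu^2 / (2 sigma^2)."""
    return mu ** 2 / (2.0 * sigma ** 2)


def gaussian_kl_numeric(mu, sigma, half_width=12.0, n_grid=20001):
    """Trapezoidal approximation of the integral of p0 * log(p0 / p1)."""

    def log_pdf(x, mean):
        z = (x - mean) / sigma
        return -0.5 * z ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        log_p0 = log_pdf(x, 0.0)
        total += weight * math.exp(log_p0) * (log_p0 - log_pdf(x, mu))
    return total * h


mu, sigma = 0.3, 1.5  # placeholder signal level and noise level
exact = gaussian_kl_closed_form(mu, sigma)
approx = gaussian_kl_numeric(mu, sigma)
print(math.isclose(exact, approx, rel_tol=1e-6))
```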

Hence, if we choose

γ \in (0, \sqrt{b log 2} / 2]

, then it holds that

\begin{matrix} \frac{1}{| L_{0} | - 1} \sum_{L \in L_{0}} K (P_{(0, 0)}, P_{(L, 0)}) \leq b log (| L_{0} | - 1), \end{matrix}

(A90)

for any

b \in (0, 1 / 8)

.

According to Theorem 2.5 in [67], using Equations (A88) and (A90), there exists a constant

c > 0

, such that

\begin{matrix} inf_{(\hat{L}, \hat{S})} sup_{L^{*} \in L (r, a)} P_{(L^{*}, 0)} & [\frac{∥ \hat{L} - L^{*} ∥_{F}^{2}}{d_{1} d_{2} d_{3}} > c \frac{{(σ \land a)}^{2} r d_{1} d_{3}}{N_{ι}}] \geq β (b, r, d_{1}, d_{2}, d_{3}), \end{matrix}

(A91)

where

\begin{matrix} β (b, r, d_{1}, d_{2}, d_{3}) = \frac{1}{1 + 2^{- r d_{1} d_{3} / 16}} (1 - 2 b - 4 \sqrt{\frac{b}{r d_{1} d_{3} log 2}}) > 0 . \end{matrix}

(A92)
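For intuition about the constants, the sketch below evaluates the expression in (A92) and checks that the largest admissible γ, namely √(b log 2) / 2, makes the KL bound of (A89) meet the condition (A90) when |L₀| − 1 is taken to be exactly 2^{r d₁ d₃ / 8}. The function names and the numerical values of b, r, d₁ and d₃ are assumptions for this example; d₂ is omitted because it does not enter the right-hand side of (A92).

```python
import math


def beta_low_rank(b: float, r: int, d1: int, d3: int) -> float:
    """Evaluate the right-hand side of (A92)."""
    m = r * d1 * d3
    prefactor = 1.0 / (1.0 + 2.0 ** (-m / 16.0))
    return prefactor * (1.0 - 2.0 * b - 4.0 * math.sqrt(b / (m * math.log(2.0))))


def kl_condition_holds(b: float, r: int, d1: int, d3: int) -> bool:
    """Check gamma^2 * r * d1 * d3 / 2 <= b * log(2^(r d1 d3 / 8))."""
    gamma = math.sqrt(b * math.log(2.0)) / 2.0   # largest gamma allowed
    kl_upper = gamma ** 2 * r * d1 * d3 / 2.0    # final bound in (A89)
    log_card = (r * d1 * d3 / 8.0) * math.log(2.0)
    return kl_upper <= b * log_card + 1e-12      # tolerance for rounding


b, r, d1, d3 = 0.1, 3, 20, 5  # placeholder values with b in (0, 1/8)
beta_value = beta_low_rank(b, r, d1, d3)
print(kl_condition_holds(b, r, d1, d3), 0.0 < beta_value < 1.0)
```

With these placeholder values the condition holds with equality and the probability bound is roughly 0.71; for very small r d₁ d₃ the expression can become negative, so positivity should be checked for the dimensions at hand.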

Since b can be chosen to be arbitrarily small, the low-rank part of Theorem 4 is proved.

Then, we consider the sparse part of Theorem 4. Given a set

Ω_{e} \subset [d_{1}] \times [d_{2} - ⌊ d_{2} / 2 ⌋] \times [d_{3}]

with cardinality

s = | Ω_{e} | \leq (d_{1} d_{2} d_{3}) / 2

, we also define

S

as follows

\begin{matrix} S : = \{S = [0 | M] : M_{i j k} \in \left\{\begin{matrix} \{0, α^{'}\}, & \text{if } (i, j, k) \in Ω_{e} \\ \{0\}, & \text{if } (i, j, k) \notin Ω_{e} \end{matrix}\right.\}, \end{matrix}

where

0 \in R^{d_{1} \times ⌊ \frac{d_{2}}{2} ⌋ \times d_{3}}

is the zero tensor, and

α^{'} = γ^{'} (σ \land a)

. Then, according to the Varshamov-Gilbert lemma (see Lemma 2.9 in [67]), there exists a set

S_{0} \subset S

containing the zero tensor

0 \in R^{d_{1} \times d_{2} \times d_{3}}

, such that: (i) its cardinality

| S_{0} | \geq 2^{s / 8} + 1

, and (ii) for any two distinct tensors

S_{1}

and

S_{2}

in

S_{0}

,

∥ S_{1} - S_{2} ∥_{F}^{2} \geq \frac{s γ^{' 2} {(σ \land a)}^{2}}{8} .

(A93)

Let

P_{(0, S)}

denote the probability distribution of the random variables

Y

observed when the underlying tensor is

(0, S)

in the observation model (6). Thus, for any

S \in S_{0}

, the KL divergence

K (P_{(0, 0)}, P_{(0, S)})

between

P_{(0, 0)}

and

P_{(0, S)}

satisfies

\begin{matrix} K (P_{(0, 0)}, P_{(0, S)}) = N_{s} \frac{S_{i j k}^{2}}{2 σ^{2}} \leq \frac{s γ^{' 2}}{2} . \end{matrix}

(A94)

Hence, if we choose

γ^{'} \in (0, \sqrt{b^{'} log 2} / 2]

, then it holds that

\begin{matrix} \frac{1}{| S_{0} | - 1} \sum_{S \in S_{0}} K (P_{(0, 0)}, P_{(0, S)}) \leq b^{'} log (| S_{0} | - 1), \end{matrix}

(A95)

for any

b^{'} \in (0, 1 / 8)

. According to Theorem 2.5 in [67], using Equations (A93) and (A95), there exists a constant

c^{'} > 0

, such that

\begin{matrix} inf_{(\hat{L}, \hat{S})} sup_{S^{*} \in S} P_{(0, S^{*})} & [\frac{∥ \hat{S} - S^{*} ∥_{F}^{2}}{d_{1} d_{2} d_{3}} > c^{'} \frac{{(σ \land a)}^{2} s}{d_{1} d_{2} d_{3}}] \geq β^{'} (b^{'}, s), \end{matrix}

(A96)

where

\begin{matrix} β^{'} (b^{'}, s) = \frac{1}{1 + 2^{- s / 8}} (1 - 2 b^{'} - 4 \sqrt{\frac{b^{'}}{s log 2}}) > 0 . \end{matrix}

(A97)

Since

b^{'}

can be chosen to be arbitrarily small, the sparse part of Theorem 4 is proved.

Thus, according to Equations (A91) and (A96), by setting

\begin{matrix} c_{1}^{'} = \frac{c}{2}, c_{1}^{''} = \frac{c^{'}}{2}, and β_{1} = min \{β (b, r, d_{1}, d_{2}, d_{3}), β^{'} (b^{'}, s)\}, \end{matrix}

(A98)

the following relationship holds

\begin{matrix} inf_{(\hat{L}, \hat{S})} sup_{\begin{matrix} (L^{*}, S^{*}) \\ \in A (r, s, a) \end{matrix}} P_{(L^{*}, S^{*})} [\frac{∥ Δ^{ι} ∥_{F}^{2} + {∥ Δ^{s} ∥}_{F}^{2}}{d_{1} d_{2} d_{3}} \geq ϕ] \geq β_{1}, \end{matrix}

(A99)

where

ϕ : = {(σ \land a)}^{2} (c_{1}^{'} r d_{1} d_{3} / N_{ι} + c_{1}^{''} s / (d_{1} d_{2} d_{3}))

. Then, according to Markov's inequality, we obtain

\begin{matrix} M (A (r, s, a)) \geq β_{1} ϕ . \end{matrix}

(A100)

□
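To illustrate how the lower bound (A100) scales with the problem parameters, the sketch below evaluates β₁ϕ for a few settings. The constants c₁′, c₁″ and β₁ are not specified numerically in the proof, so the defaults used here (1, 1 and 0.5) are placeholders, as are the function name and all parameter values; only the relative scaling is meaningful.

```python
def minimax_lower_bound(sigma, a, r, s, d1, d2, d3, n_clean,
                        c_low_rank=1.0, c_sparse=1.0, beta_1=0.5):
    """Evaluate beta_1 * phi from (A100) with placeholder constants.

    phi = (sigma ^ a)^2 * (c1' * r * d1 * d3 / N_iota + c1'' * s / (d1 d2 d3)),
    where (sigma ^ a) denotes min(sigma, a) and n_clean plays the role of N_iota.
    """
    scale = min(sigma, a) ** 2
    phi = scale * (
        c_low_rank * r * d1 * d3 / n_clean + c_sparse * s / (d1 * d2 * d3)
    )
    return beta_1 * phi


common = dict(sigma=0.1, a=1.0, s=500, d1=50, d2=50, d3=20)
base = minimax_lower_bound(r=3, n_clean=20000, **common)
more_samples = minimax_lower_bound(r=3, n_clean=40000, **common)
higher_rank = minimax_lower_bound(r=6, n_clean=20000, **common)
# More uncorrupted observations lower the bound; a larger tubal rank raises it.
print(more_samples < base < higher_rank)
```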

References

1. He, W.; Yokoya, N.; Yuan, L.; Zhao, Q. Remote sensing image reconstruction using tensor ring completion and total variation. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8998–9009.
2. He, W.; Yao, Q.; Li, C.; Yokoya, N.; Zhao, Q.; Zhang, H.; Zhang, L. Non-local meets global: An integrated paradigm for hyperspectral image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
3. Davis, J.W.; Sharma, V. Background-subtraction using contour-based fusion of thermal and visible imagery. Comput. Vis. Image Underst. 2007, 106, 162–182.
4. Bello, S.A.; Yu, S.; Wang, C.; Adam, J.M.; Li, J. Deep learning on 3D point clouds. Remote Sens. 2020, 12, 1729.
5. Zheng, Y.B.; Huang, T.Z.; Zhao, X.L.; Chen, Y.; He, W. Double-factor-regularized low-rank tensor factorization for mixed noise removal in hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8450–8464.
6. Liu, H.K.; Zhang, L.; Huang, H. Small target detection in infrared videos based on spatio-temporal tensor model. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8689–8700.
7. Zhou, A.; Xie, W.; Pei, J. Background modeling combined with multiple features in the Fourier domain for maritime infrared target detection. IEEE Trans. Geosci. Remote Sens. 2021.
8. Jiang, Q.; Ng, M. Robust low-tubal-rank tensor completion via convex optimization. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; AAAI Press: Palo Alto, CA, USA, 2019; pp. 2649–2655.
9. Zhao, Q.; Zhou, G.; Zhang, L.; Cichocki, A.; Amari, S.I. Bayesian robust tensor factorization for incomplete multiway data. IEEE Trans. Neural Networks Learn. Syst. 2016, 27, 736–748.
10. Liu, H.; Li, H.; Wu, Z.; Wei, Z. Hyperspectral image recovery using non-convex low-rank tensor approximation. Remote Sens. 2020, 12, 2264.
11. Ma, T.H.; Xu, Z.; Meng, D. Remote sensing image denoising via low-rank tensor approximation and robust noise modeling. Remote Sens. 2020, 12, 1278.
12. Fazel, M. Matrix Rank Minimization with Applications. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2002.
13. Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 208–220.
14. Carroll, J.D.; Chang, J. Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika 1970, 35, 283–319.
15. Tucker, L.R. Some mathematical notes on three-mode factor analysis. Psychometrika 1966, 31, 279–311.
16. Oseledets, I. Tensor-train decomposition. SIAM J. Sci. Comput. 2011, 33, 2295–2317.
17. Zhao, Q.; Zhou, G.; Xie, S.; Zhang, L.; Cichocki, A. Tensor ring decomposition. arXiv 2016, arXiv:1606.05535.
18. Wang, A.; Zhou, G.; Jin, Z.; Zhao, Q. Tensor recovery via *_L-spectral k-support norm. IEEE J. Sel. Top. Signal Process. 2021, 15, 522–534.
19. Wang, A.; Li, C.; Jin, Z.; Zhao, Q. Robust tensor decomposition via orientation invariant tubal nuclear norms. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; pp. 6102–6109.
20. Zhang, Z.; Ely, G.; Aeron, S.; Hao, N.; Kilmer, M. Novel methods for multilinear data completion and de-noising based on tensor-SVD. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 3842–3849.
21. Liu, X.; Aeron, S.; Aggarwal, V.; Wang, X. Low-tubal-rank tensor completion using alternating minimization. IEEE Trans. Inf. Theory 2020, 66, 1714–1737.
22. Kilmer, M.E.; Braman, K.; Hao, N.; Hoover, R.C. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. A 2013, 34, 148–172.
23. Liu, X.Y.; Wang, X. Fourth-order tensors with multidimensional discrete transforms. arXiv 2017, arXiv:1705.01576.
24. Kernfeld, E.; Kilmer, M.; Aeron, S. Tensor–tensor products with invertible linear transforms. Linear Algebra Its Appl. 2015, 485, 545–570.
25. Zhang, X.; Ng, M.K.P. Low rank tensor completion with poisson observations. IEEE Trans. Pattern Anal. Mach. Intell. 2021.
26. Lu, C.; Peng, X.; Wei, Y. Low-rank tensor completion with a new tensor nuclear norm induced by invertible linear transforms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5996–6004.
27. Song, G.; Ng, M.K.; Zhang, X. Robust tensor completion using transformed tensor singular value decomposition. Numer. Linear Algebr. 2020, 27, e2299.
28. He, B.; Yuan, X. On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 2012, 50, 700–709.
29. Parikh, N.; Boyd, S. Proximal algorithms. Found. Trends® Optim. 2014, 1, 127–239.
30. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500.
31. Kong, H.; Lu, C.; Lin, Z. Tensor Q-rank: New data dependent definition of tensor rank. Mach. Learn. 2021, 110, 1867–1900.
32. Lu, C.; Zhou, P. Exact recovery of tensor robust principal component analysis under linear transforms. arXiv 2019, arXiv:1907.08288.
33. Zhang, Z.; Aeron, S. Exact tensor completion using t-SVD. IEEE Trans. Signal Process. 2017, 65, 1511–1526.
34. Wang, A.; Jin, Z.; Tang, G. Robust tensor decomposition via t-SVD: Near-optimal statistical guarantee and scalable algorithms. Signal Process. 2020, 167, 107319.
35. Zhou, P.; Feng, J. Outlier-robust tensor PCA. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
36. Klopp, O.; Lounici, K.; Tsybakov, A.B. Robust matrix completion. Probab. Theory Relat. Fields 2017, 169, 523–564.
37. Lu, C.; Feng, J.; Chen, Y.; Liu, W.; Lin, Z.; Yan, S. Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5249–5257.
38. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM (JACM) 2011, 58, 11.
39. Negahban, S.; Wainwright, M.J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 2011, 39, 1069–1097.
40. Wang, A.; Wei, D.; Wang, B.; Jin, Z. Noisy low-tubal-rank tensor completion through iterative singular tube thresholding. IEEE Access 2018, 6, 35112–35128.
41. Boyd, S.; Parikh, N.; Chu, E. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122.
42. Wang, A.; Jin, Z. Near-optimal noisy low-tubal-rank tensor completion via singular tube thresholding. In Proceedings of the IEEE International Conference on Data Mining Workshop (ICDMW), New Orleans, LA, USA, 18–21 November 2017; pp. 553–560.
43. Wang, A.; Lai, Z.; Jin, Z. Noisy low-tubal-rank tensor completion. Neurocomputing 2019, 330, 267–279.
44. Wang, A.; Song, X.; Wu, X.; Lai, Z.; Jin, Z. Generalized Dantzig selector for low-tubal-rank tensor recovery. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3427–3431.
45. Huang, B.; Mu, C.; Goldfarb, D.; Wright, J. Provable models for robust low-rank tensor completion. Pac. J. Optim. 2015, 11, 339–364.
46. Wang, A.; Song, X.; Wu, X.; Lai, Z.; Jin, Z. Robust low-tubal-rank tensor completion. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3432–3436.
47. Fang, W.; Wei, D.; Zhang, R. Stable tensor principal component pursuit: Error bounds and efficient algorithms. Sensors 2019, 19, 5335.
48. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
49. Chen, J.; Wang, C.; Ma, Z.; Chen, J.; He, D.; Ackland, S. Remote sensing scene classification based on convolutional neural networks pre-trained using attention-guided sparse filters. Remote Sens. 2018, 10, 290.
50. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
51. Klopp, O. Noisy low-rank matrix completion with general sampling distribution. Bernoulli 2014, 20, 282–303.
52. Li, N.; Zhou, D.; Shi, J.; Wu, T.; Gong, M. Spectral-locational-spatial manifold learning for hyperspectral images dimensionality reduction. Remote Sens. 2021, 13, 2752.
53. Mayalu, A.; Kochersberger, K.; Jenkins, B.; Malassenet, F. Lidar data reduction for unmanned systems navigation in urban canyon. Remote Sens. 2020, 12, 1724.
54. Hwang, Y.S.; Schlüter, S.; Park, S.I.; Um, J.S. Comparative evaluation of mapping accuracy between UAV video versus photo mosaic for the scattered urban photovoltaic panel. Remote Sens. 2021, 13, 2745.
55. Lou, J.; Zhu, W.; Wang, H.; Ren, M. Small target detection combining regional stability and saliency in a color image. Multimed. Tools Appl. 2017, 76, 14781–14798.
56. Hui, B.; Song, Z.; Fan, H. A dataset for infrared detection and tracking of dim-small aircraft targets under ground/air background. China Sci. Data 2020, 5, 291–302.
57. Wang, Z.; Zeng, Q.; Jiao, J. An adaptive decomposition approach with dipole aggregation model for polarimetric SAR data. Remote Sens. 2021, 13, 2583.
58. Wei, D.; Wang, A.; Feng, X.; Wang, B.; Wang, B. Tensor completion based on triple tubal nuclear norm. Algorithms 2018, 11, 94.
59. Han, X.; Wu, B.; Shou, Z.; Liu, X.Y.; Zhang, Y.; Kong, L. Tensor FISTA-net for real-time snapshot compressive imaging. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10933–10940.
60. Mu, C.; Zhang, Y.; Wright, J.; Goldfarb, D. Scalable robust matrix recovery: Frank–Wolfe meets proximal methods. SIAM J. Sci. Comput. 2016, 38, A3291–A3317.
61. Wang, A.; Jin, Z.; Yang, J. A faster tensor robust PCA via tensor factorization. Int. J. Mach. Learn. Cybern. 2020, 11, 2771–2791.
62. Lou, J.; Cheung, Y. Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm. IEEE Trans. Image Process. 2019, 29, 2314–2327.
63. Negahban, S.; Yu, B.; Wainwright, M.J.; Ravikumar, P.K. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Proceedings of Advances in Neural Information Processing Systems, Vancouver, BC, USA, 7–10 December 2009; pp. 1348–1356.
64. Bühlmann, P.; Van De Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
65. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science; Cambridge University Press: Cambridge, UK, 2018; Volume 47.
66. Talagrand, M. A new look at independence. Ann. Probab. 1996, 24, 1–34.
67. Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer: New York, NY, USA, 2011.

Figure 1. An illustration of

*_{L}

–SVD [18].

Figure 2. An illustration of the robust tensor completion problem.

Figure 3. Plots of the MSE versus the tubal rank

r_{t b} (L^{*})

of the underlying tensor, the number of corruptions

∥ S^{*} ∥_{0}

, the number of uncorrupted observations

N_{ι}

and its inverse

N_{ι}^{- 1}

: (a) MSE vs. the tubal rank

r_{t b} (L^{*})

with fixed corruption level

∥ S^{*} ∥_{0} = 0.03 d_{1} d_{2} d_{3}

and number of uncorrupted observations

N_{ι} = 0.7 d_{1} d_{2} d_{3} - {∥ S^{*} ∥}_{0}

; (b) MSE vs. the number of corruptions

∥ S^{*} ∥_{0}

with fixed tubal rank 9 and total observation number

0.7 d_{1} d_{2} d_{3}

; (c) MSE vs. the number of uncorrupted observations

N_{ι}

with

r_{t b} (L^{*}) = 3

and corruption level

∥ S^{*} ∥_{0} = 0.01 d_{1} d_{2} d_{3}

; (d) MSE vs.

N_{ι}^{- 1}

with

r_{t b} (L^{*}) = 3

and

∥ S^{*} ∥_{0} = 0.01 d_{1} d_{2} d_{3}

.

Figure 4. The dataset consists of the 85-th frame of all the 21 classes in the UCMerced dataset.

Figure 5. The PSNR, SSIM values and running time (in seconds) on the UCMerced dataset for the setting

(ρ_{obs}, ρ_{s}) = (0.3, 0.2)

.

Figure 6. The PSNR, SSIM values and running time (in seconds) on the UCMerced dataset for the setting

(ρ_{obs}, ρ_{s}) = (0.8, 0.3)

.

Figure 7. The visual examples for five models on UCMerced dataset for the setting

(ρ_{obs}, ρ_{s}) = (0.3, 0.2)

. (a) The original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) image recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Figure 8. The visual examples for five models on UCMerced dataset for the setting

(ρ_{obs}, ρ_{s}) = (0.8, 0.3)

. (a) The original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) image recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Figure 9. Visual results of robust tensor completion for five models on the 21st band of the Indian Pines dataset. The top, middle, and bottom rows correspond to Setting I

(ρ_{obs} = 0.3, ρ_{s} = 0.2)

, Setting II

(ρ_{obs} = 0.6, ρ_{s} = 0.25)

, and Setting III

(ρ_{obs} = 0.8, ρ_{s} = 0.3)

, respectively. The sub-plots from (a) to (g): (a) the original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Figure 10. Visual results of robust tensor completion for five models on the 21st band of the Salinas A dataset. The top, middle, and bottom rows correspond to Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$, respectively. The sub-plots from (a) to (g): (a) the original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) image recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Figure 11. Visual results of robust tensor completion for five models on the 21st band of the Cloth dataset. The top, middle, and bottom rows correspond to Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$, respectively. The sub-plots from (a) to (g): (a) the original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) image recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Figure 12. Visual results of robust tensor completion for five models on the 21st frame of the Infrared Detection dataset. The top, middle, and bottom rows correspond to Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$, respectively. The sub-plots from (a) to (g): (a) the original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) image recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Figure 13. Visual results of robust tensor completion for five models on the 21st frame of the OSU Thermal Database. The top, middle, and bottom rows correspond to Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$, respectively. The sub-plots from (a) to (g): (a) the original image; (b) the observed image; (c) image recovered by the matrix nuclear norm (NN) based Model (32); (d) image recovered by the sum of mode-wise nuclear norms (SNN) based Model (33); (e) image recovered by TNN (DFT); (f) image recovered by TNN (DCT); (g) image recovered by TNN (Data).

Table 1. List of notations.

Notations	Descriptions	Notations	Descriptions
t	a scalar	$T$	a matrix
$t$	a vector	$T$	a tensor
$L^{*}$	the true low-rank tensor	$\hat{L}$	the estimator of $L^{*}$
$S^{*}$	the true sparse tensor	$\hat{S}$	the estimator of $S^{*}$
$y_{i}$	a scalar observation	$ξ_{i}$	Gaussian noise
$X_{i}$	a design tensor	N	number of observations
$N_{ι}$	number of uncorrupted observations	$N_{s}$	$N - N_{ι}$
$Θ_{s}$	support of corruption tensor $S^{*}$	$Θ_{s}^{⊥}$	complement of $Θ_{s}$
$X (\cdot)$	design operator	$X^{*} (\cdot)$	adjoint operator of $X (\cdot)$
$L$	an orthogonal matrix in $\mathbb{R}^{d_{3} \times d_{3}}$	$L (T) := T \times_{3} L$	tensor $L$-transform
$\bar{T}$	block-diagonal matrix of $L (T)$	$\| T \|_{sp} := \| \bar{T} \|$	tensor spectral norm
$T_{i j k}$	$(i, j, k)$-th entry of $T$	$\| T \|_{\star} := \| \bar{T} \|_{*}$	tubal nuclear norm
$T (i, j, :)$	$(i, j)$-th tube of $T$	$\| T \|_{1} := \sum_{i j k} | T_{i j k} |$	tensor $l_{1}$-norm
$T (:, :, k)$	$k$-th frontal slice of $T$	$\| T \|_{F} := \sqrt{\sum_{i j k} T_{i j k}^{2}}$	tensor F-norm
$T^{(k)}$	$T (:, :, k)$	$\| T \|_{\infty} := \max_{i j k} | T_{i j k} |$	tensor $l_{\infty}$-norm
$T_{(k)}$	mode-$k$ unfolding of $T$	$\langle A, B \rangle := \sum_{i j k} A_{i j k} B_{i j k}$	tensor inner product
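
As a rough illustration of the norms listed in Table 1, the following NumPy sketch computes the L-transform, the tubal nuclear norm, and the tensor spectral norm of a small random tensor. The function names (`l_transform`, `block_diag_matrix`, `tubal_nuclear_norm`, `tensor_spectral_norm`) are hypothetical and do not come from the paper. The sketch assumes that the block-diagonal matrix places the frontal slices of the transformed tensor on its diagonal, so its nuclear and spectral norms reduce to a sum and a maximum over slices.

```python
import numpy as np


def l_transform(T, L):
    """Mode-3 product T x_3 L: multiply every tube T[i, j, :] by the matrix L."""
    return np.einsum("ijk,lk->ijl", T, L)


def block_diag_matrix(T, L):
    """Block-diagonal matrix whose diagonal blocks are the frontal slices of L(T)."""
    TL = l_transform(T, L)
    d1, d2, d3 = TL.shape
    out = np.zeros((d1 * d3, d2 * d3))
    for k in range(d3):
        out[k * d1:(k + 1) * d1, k * d2:(k + 1) * d2] = TL[:, :, k]
    return out


def tubal_nuclear_norm(T, L):
    """Nuclear norm of the block-diagonal matrix, computed slice by slice."""
    TL = l_transform(T, L)
    return sum(np.linalg.norm(TL[:, :, k], ord="nuc") for k in range(TL.shape[2]))


def tensor_spectral_norm(T, L):
    """Spectral norm of the block-diagonal matrix, computed slice by slice."""
    TL = l_transform(T, L)
    return max(np.linalg.norm(TL[:, :, k], ord=2) for k in range(TL.shape[2]))


# Small example: a random 4 x 3 x 5 tensor and a random orthogonal 5 x 5 matrix.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3, 5))
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # Q is orthogonal

print("tubal nuclear norm:", tubal_nuclear_norm(T, Q))
print("tensor spectral norm:", tensor_spectral_norm(T, Q))
print("l1-norm:", np.abs(T).sum(), " F-norm:", np.linalg.norm(T))
```

Working slice by slice avoids building the full block-diagonal matrix, which is only formed here by `block_diag_matrix` to cross-check the definitions.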

Table 2. Quantitative evaluation on the Indian Pines dataset in PSNR, SSIM, and running time of five models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	20.63	25.46	28.49	29.33	30.08
	SSIM	0.4842	0.7275	0.7619	0.7872	0.8181
	Time (s)	14.77	40.1	11.17	15.53	13.54
Setting II	PSNR	21.95	27.66	29.49	30.17	30.61
	SSIM	0.5454	0.7864	0.7912	0.8073	0.8296
	Time (s)	14.18	39.76	11.04	15.23	13.29
Setting III	PSNR	22.43	28.22	29.64	30.31	30.87
	SSIM	0.5534	0.8051	0.7971	0.8139	0.8345
	Time (s)	14.21	38.88	11.05	15.27	13.43
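
For readers who want to reproduce the kind of PSNR values reported in these tables, the sketch below shows one common way to compute PSNR between a reference tensor and its estimate. It is only an assumption about the convention: this part of the text does not state the peak value or whether PSNR is averaged over bands, so the hypothetical function `psnr` takes the peak as a parameter and defaults to the maximum of the reference.

```python
import numpy as np


def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB, computed over the whole tensor.

    `peak` is the assumed maximum signal value; if omitted, the maximum
    of the reference is used (an assumption, not the paper's stated choice).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if peak is None:
        peak = reference.max()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical tensors
    return 10.0 * np.log10(peak ** 2 / mse)


# Toy example: a constant tensor and an estimate that is off by 0.1 everywhere.
ref = np.full((2, 2, 2), 1.0)
est = ref - 0.1
print(psnr(ref, est))
```

With a peak of 1 and a mean squared error of 0.01, the toy example evaluates to about 20 dB.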

Table 3. Quantitative evaluation on the Salinas A dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	19.01	26.18	27.1	30.99	32.69
	SSIM	0.4918	0.837	0.7501	0.8350	0.8774
	Time (s)	5.57	11.53	4.07	5.73	4.9
Setting II	PSNR	20.97	28.79	29.4	32.26	33.3
	SSIM	0.5806	0.8675	0.8117	0.8645	0.8714
	Time (s)	5.41	11.36	4.02	5.67	4.79
Setting III	PSNR	21.54	29.5	29.73	32.38	33.54
	SSIM	0.5914	0.8772	0.8208	0.8683	0.8848
	Time (s)	5.34	11.01	3.98	5.59	4.91

Table 4. Quantitative evaluation on the Beads dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	18.58	18.71	25.11	25.18	27.05
	SSIM	0.448	0.6208	0.804	0.8203	0.8673
	Time (s)	309.55	933.12	280	260.46	241.65
Setting II	PSNR	20.4	21.35	27.31	27.46	28.9
	SSIM	0.5406	0.7603	0.8754	0.8894	0.9143
	Time (s)	302.95	915.57	276.2	268.58	244.5
Setting III	PSNR	21.01	22.36	27.96	28.13	29.4
	SSIM	0.5531	0.7848	0.8803	0.8944	0.9165
	Time (s)	301.92	922.07	276.99	272.59	244.02

Table 5. Quantitative evaluation on the Cloth dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	21.5	22.79	29.7	30.83	30.77
	SSIM	0.5054	0.6333	0.8649	0.8883	0.8941
	Time (s)	308.29	915.43	281.4	264.84	242.63
Setting II	PSNR	22.63	24.94	32.32	33.57	33.86
	SSIM	0.5566	0.7355	0.916	0.9323	0.9391
	Time (s)	300.3	911.27	273.36	268.46	243.37
Setting III	PSNR	22.99	25.78	32.76	34.02	34.39
	SSIM	0.5652	0.7643	0.9183	0.9342	0.941
	Time (s)	297.94	910.64	280.51	268.25	246.14

Table 6. Quantitative evaluation on the SenerioB Distance dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	17.55	20.01	23.86	23.86	23.87
	SSIM	0.468	0.763	0.8732	0.8737	0.8739
	Time (s)	15.22	186.31	14.49	19.66	15.83
Setting II	PSNR	18.57	23.87	25.28	25.31	25.34
	SSIM	0.551	0.9055	0.9096	0.91	0.9105
	Time (s)	15.51	189.87	15.55	18.68	16.31
Setting III	PSNR	18.98	24.78	25.79	25.83	25.87
	SSIM	0.5678	0.9197	0.9179	0.9184	0.9189
	Time (s)	15.03	195	14.8	19.11	15.02

Table 7. Quantitative evaluation on the SenerioB Intensity dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	16.35	20.35	21.28	21.26	21.31
	SSIM	0.2588	0.7076	0.7114	0.7116	0.7137
	Time (s)	15.13	188.96	14.18	19.06	15.19
Setting II	PSNR	17.09	22.17	22.28	22.32	22.45
	SSIM	0.3149	0.7889	0.7708	0.7718	0.7781
	Time (s)	14.64	187.61	14.09	19.1	15.74
Setting III	PSNR	17.35	22.48	22.57	22.61	22.79
	SSIM	0.3331	0.7985	0.7828	0.7836	0.7914
	Time (s)	14.7	187.66	14.23	19	15.22

Table 8. Quantitative evaluation on the Sky dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	21.03	26.74	28.67	28.59	29.74
	SSIM	0.4875	0.7805	0.708	0.7076	0.788
	Time (s)	36.1	144.84	28.1	39.85	34.43
Setting II	PSNR	22.44	28.8	29.41	29.35	30.48
	SSIM	0.5715	0.8026	0.7155	0.7147	0.7814
	Time (s)	34.23	138.52	27.11	38.77	33.74
Setting III	PSNR	22.77	28.72	29.55	29.49	30.59
	SSIM	0.5786	0.7471	0.7324	0.7307	0.7906
	Time (s)	34.42	139.78	26.97	39.14	33.86

Table 9. Quantitative evaluation on the Infrared Detection dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	24.82	30.09	31.94	32.07	32.84
	SSIM	0.6021	0.8231	0.7408	0.7437	0.7768
	Time (s)	49.01	215.41	48.97	52.37	46.59
Setting II	PSNR	26.58	31.8	32.33	32.43	33.11
	SSIM	0.6679	0.8414	0.7428	0.7453	0.7724
	Time (s)	47.55	217.43	50.07	52.9	47.18
Setting III	PSNR	26.95	31.9	33.11	33.2	33.81
	SSIM	0.6682	0.8454	0.7237	0.7265	0.7525
	Time (s)	48.65	216.42	49.13	52.81	46.94

Table 10. Quantitative evaluation on the OSU Thermal Database in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	15.62	21.5	31.33	31.5	31.51
	SSIM	0.3402	0.8105	0.9345	0.9347	0.935
	Time (s)	49	222.49	40.31	49.45	42.22
Setting II	PSNR	17.47	29.48	32.88	33.19	33.21
	SSIM	0.4428	0.9057	0.9427	0.9431	0.9433
	Time (s)	46.18	197.9	36.14	47.19	41.79
Setting III	PSNR	18.17	30.83	33.31	33.71	33.75
	SSIM	0.468	0.9265	0.9495	0.95	0.9507
	Time (s)	45.85	200.39	36.39	46.96	41.36

Table 11. Quantitative evaluation on the UAVSAR-Dataset1-2015 dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I $(ρ_{obs} = 0.3, ρ_{s} = 0.2)$, Setting II $(ρ_{obs} = 0.6, ρ_{s} = 0.25)$, and Setting III $(ρ_{obs} = 0.8, ρ_{s} = 0.3)$. The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings	Metrics	NN	SNN	TNN-DFT	TNN-DCT	TNN-Data
Setting I	PSNR	29.14	25.86	26.22	26.5	31.62
	SSIM	0.8748	0.8797	0.8868	0.8909	0.9438
	Time (s)	23.54	75.07	17.46	25.24	22.6
Setting II	PSNR	31.3	26.71	26.96	27.31	34.28
	SSIM	0.9044	0.8742	0.9018	0.9059	0.9615
	Time (s)	23.09	74.75	17.6	24.77	22.59
Setting III	PSNR	31.8	26.98	27.14	27.66	35.03
	SSIM	0.9118	0.8829	0.903	0.9092	0.9649
	Time (s)	22.88	73.03	17.64	25.22	22.54

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

2.2. Tensor $*_{L}$ -Singular Value Decomposition