Guaranteed Robust Tensor Completion via ∗ L -SVD with Applications to Remote Sensing Data

: This paper conducts a rigorous analysis for the problem of robust tensor completion, which aims at recovering an unknown three-way tensor from incomplete observations corrupted by gross sparse outliers and small dense noises simultaneously due to various reasons such as sensor dead pixels, communication loss, electromagnetic interferences, cloud shadows, etc. To estimate the underlying tensor, a new penalized least squares estimator is ﬁrst formulated by exploiting the low rankness of the signal tensor within the framework of tensor ∗ L -Singular Value Decomposition ( ∗ L -SVD) and leveraging the sparse structure of the outlier tensor. Then, an algorithm based on the Alternating Direction Method of Multipliers (ADMM) is designed to compute the estimator in an efﬁcient way. Statistically, the non-asymptotic upper bound on the estimation error is established and further proved to be optimal (up to a log factor) in a minimax sense. Simulation studies on synthetic data demonstrate that the proposed error bound can predict the scaling behavior of the estimation error with problem parameters (i.e., tubal rank of the underlying tensor, sparsity of the outliers, and the number of uncorrupted observations). Both the effectiveness and efﬁciency of the proposed algorithm are evaluated through experiments for robust completion on seven different types of remote sensing data.


Introduction
Despite the broad adoption of advanced sensors in various remote sensing tasks, the quality of data remains a critical issue and can significantly influence the actual performances of the backend applications. Many types of modern remote sensing data in the modality of optical, hyperspectral, multispectral, thermal, Light Detection and Ranging (LiDAR), Synthetic Aperture Radar (SAR), etc., are typically multi-way and can be readily stored, analyzed, and processed by tensor-based models [1][2][3][4][5][6][7]. In some extreme circumstances, the data tensor may encounter missing entries, gross sparse outliers, and small dense noises at the same time, as a result of partial sensor failures, communication errors, occlusion by obstacles, and so on [8,9]. To robustly complete a partially observed data tensor corrupted by outliers and noises, the problem of robust tensor completion arises.
When only a fraction of partially corrupted observations are available, the crucial point of robust tensor completion lies in the assumption that the underlying data tensor is highly redundant such that the main components of it remain only slightly suppressed by missing information, outliers, and noises, and thus can be effectively reconstructed by exploiting the intrinsic redundancy. The tensor low-rankness is an ideal tool to model the redundancy of tensor data, and has gained extensive attention in remote sensing data restoration [5,10,11].
As higher-order extensions of low-rank matrix models [12], low-rank tensor models are typically formulated as minimization problems of the tensor rank function [13]. However, there are multiple definitions of tensor ranks, such as the CP rank [14], Tucker rank [15], TT rank [16], TR rank [17], etc., which focus on low rank structures in the original domains (like the pixel domain of optimal images) [18,19]. Recently, a remarkably different example named the low-tubal-rank tensor model [20,21] was proposed within the algebraic framework of tensor Singular Value Decomposition (t-SVD) [20,22], which captures low-rankness in the frequency domain defined via Discrete Fourier Transform (DFT). As discussed in [18,19,21,23], the low-tubal-rank tensor models are capable to exploit both low-rankness and smoothness of the tensor data, making it quite suitable to analyze and process diverse remote sensing imagery data which are often simultaneously low-rank and smooth [5,10].
Motivated by the advantages of low-tubal-rankness in modeling remote sensing data, we resolve the robust tensor completion problem by utilizing a generalized low-tubalrank model based on the tensor * L -Singular Value Decomposition ( * L -SVD) [24], which leverages low-rankness in more general transformed domains rather than DFT. What needs to be pointed out is that the * L -SVD has become a research focus in tensor-based signal processing, computer vision, and machine learning very recently [18,23,25,26]. Regarding the preference of theory in this paper, we only introduce several typical works with statistical analysis as follows. For tensor completion in the noiseless settings, Lu et al. [26] proposed a * L -SVD-based model which can exactly recover the underlying tensor under mild conditions. For tensor completion from partial observations corrupted by sparse outliers, Song et al. [27] designed a * L -SVD-based algorithm with exact recovery guarantee. Zhang et al. [25] developed a theoretically guaranteed approach via the * L -SVD to for tensor completion from Poisson noises. The problem of tensor recovery from noisy linear observations is studied in [18] based the * L -SVD with guaranteed statistical performance.
In this paper, we focus on statistical guaranteed approaches in a more challenging setting than the aforementioned * L -SVD-based models, where the underlying signal tensor suffers from missing entries, sparse outliers, and small dense noises simultaneously. Specifically, we resolve the problem of robust tensor completion by formulating a * L -SVD-based estimator whose estimation error is established and further proved to be minimax optimal (up to a log factor). We propose an algorithm based on Alternating Direction Method of Multipliers (ADMM) [28,29] to compute the estimator and evaluate both the effectiveness and efficiency on seven different types of remote sensing data.
The remainder of this paper proceeds as follows. We first introduce some notation and preliminaries in Section 2. Then, the proposed estimator for robust tensor completion is formulated in Section 3. We compute the estimator by using an ADMM-based algorithm described in Section 4. The statistical performance of the proposed estimator is analyzed in Section 5. Experimental results on both synthetic and real datasets are reported in Section 7. We summarize this paper and discuss future directions briefly in Section 8. The proofs of the theoretical results are given in Appendix A.

Preliminaries
In this section, we first introduce some notations and then give a brief introduction to the * L -SVD framework.

Notations
Main notations are listed in Table 1. Let [d] := {1, . . . , d}, ∀d ∈ N + . Let a ∨ b = max{a, b} and a ∧ b = min{a, b}, ∀a, b ∈ R. For i ∈ [d], e i ∈ R d denotes the standard vector basis whose i th entry is 1 with the others 0. For (i, j, k) ∈ [d 1 ] × [d 2 ] × [d 3 ], the outer product e i • e j • e k denotes a standard tensor basis in R d 1 ×d 2 ×d 3 , whose (i, j, k) th entry is 1 with the others 0. For a 3-way tensor, a tube is a vector defined by fixing indices of the first two modes and varying the third one; A slice is a matrix defined by fixing all but two indices. For any set Θ, |Θ| denotes its cardinality and Θ ⊥ its complement.
Absolute positive constants are denoted by C, c, c 0 , etc whose values may vary from line to line. When the field and size of a tensor are not shown explicitly, it is defaulted to be in R d 1 ×d 2 ×d 3 . The spectral norm · and nuclear norm · * of a matrix are the maximum and the sum of the singular values, respectively. design operator X * (·) adjoint operator of X(·) L an orthogonal matrix in R d 3 ×d 3 L(T ) := T × 3 L tensor L-transform T block-diagonal matrix of L(T ) T sp := T tensor spectral norm T ijk (i, j, k) th entry of T T := T * tubal nuclear norm T (i, j, :) (i, j) th tube of T T 1 := ∑ ijk |T ijk | tensor l 1 -norm T (:, :, k) k th frontal slice of T T F := ∑ ijk T 2 ijk tensor F-norm T (k) T (:, :, k) T ∞ := max ijk |T ijk | tensor l ∞ -norm T (k) mode-k unfolding of T A, B := ∑ ijk A ijk B ijk tensor inner product
We also define the block vectorization operator and its inverse operator for any T ∈ R d 1 ×d 2 ×d 3 by: (1) T (2) . . .
, bvfold(bvec(T )) = T Then, based on the operators defined above, we are able to give the definition of the tensor t-product.
Definition 1 (T-product [22]). For any tensors A ∈ R d 1 ×d 2 ×d 3 and B ∈ R d 2 ×d 4 ×d 3 , their t-product is a tensor C of size d 1 × d 4 × d 3 computed as follows: C = A * B := bvfold bcirc(A)bvec (B) If we view the 3-way tensor C ∈ R d 1 ×d 4 ×d 3 as a d 1 -by-d 4 "matrix" C of tubes C(i, j, :) ∈ R d 3 , then the t-product can be analogously conducted like the matrix mul-tiplication by changing scalar multiplication by the circular convolution between the tubes (i.e., vectors), as follows: where the symbol denotes the circular convolution of two tubes a, b ∈ R d 3 defined as follows [22]: where mod(·) is the modulus operator. According to the well-known relationship between circular convolution and DFT, the t-product is equivalent to matrix multiplication between all the frontal slices in the Fourier domain [22], i.e., where T denotes the tensor obtained by conducting DFT on all the mode-3 fibers of any tensor T , i.e., where F d 3 is the transform matrix of DFT [22], and × 3 denotes the tensor mode-3 product [30].
In [24], Kernfeld et al. extended the t-product to the tensor * L -product by replacing DFT by any invertible linear transform L(·) induced by a non-singular transformation matrix L, and established the framework of * L -SVD. In the latest studies, the transformation matrix L defining the transform L is restricted to be orthogonal [18,26,31,32] (unitary in [25,27]) for better properties, which is also followed in this paper.
Given any orthogonal matrix L ∈ R d 3 ×d 3 (though we restrict L to be orthogonal for simplicy, our analysis still holds with simple extensions for unitary L [27]), define the associated linear transform L(·) with inverse L −1 (·) on any T ∈ R d 1 ×d 2 ×d 3 as Definition 2 (Tensor * L -product [24]). The * L -product of any A ∈ R d 1 ×d 2 ×d 3 and B ∈ R d 2 ×d 4 ×d 3 under the invertible linear transform L in Equation (4), denoted by A * L B, is defined as the tensor C ∈ R d 1 ×d 4 ×d 3 such that L(C) = L(A) L(B).
Definition 3 ( * L -block-diagonal matrix [18]). For any T ∈ R d 1 ×d 2 ×d 3 , its * L -block-diagonal matrix, denoted by T, is defined as the block diagonal matrix whose i-th diagonal block is the i-th . . .
Theorem 1 (Tensor * L -SVD, * L -tubal rank [24]). Any T ∈ R d 1 ×d 2 ×d 3 has a tensor * L -Singular Value Decomposition ( * L -SVD) under any L in Equation (4), given as follows The * L -tubal rank of T ∈ R d 1 ×d 2 ×d 3 is defined as the number of non-zero tubes of D in its * L -SVD in Equation (5) i.e., where # counts the number of elements of a given set. For any T ∈ R d 1 ×d 2 ×d 3 , we have the following equivalence between its * L -SVD and the matrix SVD of its * L -block-diagonal matrix T: Considering the block diagonal structure of T, we define the tensor * L -multi-rank on its diagonal blocks T (i) : Definition 4 (Tensor * L -nuclear norm, tensor * L -spectral norm [26]). The tensor * L -nuclear norm ( * L -TNN) and * L -spectral norm of any T ∈ R d 1 ×d 2 ×d 3 under any L in Equation (4) are defined as the matrix nuclear norm and matrix spectral norm of T, respectively, i.e., T := T * , T sp := T .
As proved in [26,27], * L -TNN is the convex envelop of the l 1 -norm of the * L -multirank in unit tensor * L -spectral norm ball. Thus, * L -TNN encourages a low * L -multi-rank structure which means low-rankness in spectral domain. When the linear transform L represents the DFT (although we restrict the L in Equation (4) to be orthogonal, we still consider TNN as a special case of * L -TNN up to constants and real/complex domain) along the 3-rd mode, * L -TNN and tensor * L -spectral norm degenerate to the Tubal Nuclear Norm (TNN) and the tensor spectral norm, respectively, up to a constant factor d −1 3 [26,33].

Robust Tensor Completion
In this section, we will formulate the robust tensor completion problem. The observation model will be shown first.

The Observation Model
Consider an underling signal tensor L * ∈ R d 1 ×d 2 ×d 3 which possesses intrinsically low-dimensionality structure characterized by low-tubal-rankness, that is r tb (L * ) d 1 ∧ d 2 . Suppose we obtain N scalar observations y i of L * ∈ R d 1 ×d 2 ×d 3 from the noisy observation model: where the tensor S * ∈ R d 1 ×d 2 ×d 3 represents some gross corruptions (e.g., outliers, errors, etc.) additive to the signal L * which is element-wisely sparse (the presented theoretical analysis and optimization algorithm can be generalized to more sparsity settings of corruptions (e.g. the tube-wise sparsity [20,34], and slice-wise sparsity [34,35]) by using the tools developed for robust matrix completion in [36] and robust tensor decomposition in [34]; for simplicity, we only consider the most common element-wisely sparse case), ξ i 's are random noises sampled i.i.d. from Gaussian distribution N (0, σ 2 ), and X i 's are known random design tensors in R d 1 ×d 2 ×d 3 satisfying the following assumptions: Assumption 1. We make two natural assumptions on the design tensors: I. All the corrupted positions of L * are observed, that is, the (unknown) support Θ s = supp(S * ) := {(i, j, k) | S * ijk = 0} of the corruption tensor S * is fully observed. Formally speaking, there exists an unknown subset X s ⊂ {X i } N i=1 drawn from an (unknown) distribution Π Θ s on the set X Θ s := e j • e k • e l , ∀(j, k, l) ∈ Θ s , such that each element in X Θ s is sampled at least once. II. All uncorrupted positions of L * are sampled uniformly with replacement for simplicity of exposition. Formally speaking, each element of the set X ⊥ s : According to the observation model (6), the true tensor L * is first corrupted by a sparse tensor S * and then sampled to N scalars {y i } with additive Gaussian noises {ξ i } (see Figure 2). The corrupted positions of L * are further assumed in Assumption 1 to be totally observed with design tensors in X s ⊂ {X i } N i=1 , and the remaining uncorrupted positions are sampled uniformly through design tensors in Let y = (y 1 , . . . , y N ) ∈ R N and ξ = (ξ 1 , . . . , ξ N ) ∈ R N be the vector of observations and noises, respectively. Define the design operator X : R d 1 ×d 2 ×d 3 → R N as X(·) := ( ·, X 1 , . . . , ·, X N ) , and its adjoint operator X * (z) := ∑ N i=1 z i X i for any z ∈ R N . Then the observation model (6) can be rewritten in a compact form y = X(L * + S * ) + ξ.

The Proposed Estimator
The aim of robust tensor completion is to reconstruct the unknown low-rank L * and sparse S * from incomplete and noisy measurements {(X i , y i )} N i=1 generated by the observation model (6). It can be treated as a robust extension of tensor completion in [33], and a noisy partial variant of tensor robust PCA [37].
To reconstruct the underlying low-rank tensor L * and sparse tensor S * , it is natural to consider the following minimization model: where we use least squares as the fidelity term for Gaussian noises, the tubal rank as the regularization to impose low-rank structure in L, the tensor l 0 -(pseudo)norm to regularize S for sparsity, λ ι , λ s ≥ 0 are tunable regularization parameters, balancing the regularizations and the fidelity term. However, general rank and l 0 -norm minimization is NP-hard [12,38], making it extremely hard to soundly solve Problem (7). For tractable low-rank and sparse optimization, we follow the most common idea to relax the non-convex functions r tb (·) and · 0 to their convex surrogates, i.e., the * L -tubal nuclear norm · and the tensor l 1 -norm · 1 , respectively. Specifically, the following estimator is defined: where a > 0 is a known constant constraining the magnitude of entries in L * and S * . The additional constraint L ∞ ≤ a and S ∞ ≤ a is very mild since most signals and corruptions are of limited energy in real applications. It can also provide a theoretical benefit to exclude the "spiky" tensors, which is important in controlling the separability of L * and S * . Such "non-spiky" constraints are also imposed in previous literatures [36,39,40], playing a key role in bounding the estimation error. Then, it is natural to ask the following questions: Q1: How to compute the proposed estimator? Q2: How well can the proposed estimator estimate L * and S * ? We first discuss Q1 in Section 4 and then answer Q2 in Section 5.

Algorithm
In this section, we answer Q1 by designing an algorithm based on ADMM to compute the proposed estimator.
To solve Problem (8), the first step is to introduce auxiliary variables g, K, T , M, N to deal with the complex couplings between X(·), · 2 · , · 1 , and · ∞ as follows: min g,L,S,K,T ,M,N 1 2N where δ ∞ a (·) is the indicator function of tensor l ∞ -norm ball defined as follows We then give the augmented Lagrangian of Equation (9) with Lagrangian multipliers z and {Z i } 4 i=1 and penalty parameter ρ > 0: Following the framework of standard two-block ADMM [41], we separate the primal variables into two blocks (L, S) and (g, K, T , M, N ), and update them alternatively as follows: Update the first block (L, S): After the t-th iteration, we first update (L, S) by keeping the other variables fixed as follows: By taking derivatives, respectively, to L and S and setting them to zero, we obtain the following system of equations: Through solving the system of equations in Equation (12), we obtain where I denotes the identity operator, and the intermediate tensors are Update the second block (g, K, T , M, N ): According to the special form of the Lagrangian in Equation (10), the variables g, K, T , M, N in the second block can be updated separately as follows.
We first update g with fixed (L, S): We then update K with fixed (L, S): where Prox · ρ −1 (·) is the proximality operator of * L -TNN given in the following lemma.
Lemma 1 (A modified version of Theorem 3.2 in [26]). Let L 0 ∈ R d 1 ×d 2 ×d 3 be any tensor with * L -SVD L 0 = U * L D * L V . Then the proximality operator of * L -TNN at L 0 with constant where t + denotes the positive part of t, i.e., t + = max(t, 0).
We update T with fixed (L, S): where Prox · 1 τ (T ) is the proximality operator [19] of the tensor l 1 -norm at point T given as Prox where Proj · ∞ a (·) is the projector onto the tensor l ∞ -norm ball of radius a, which is given by Proj · ∞ a (M) = sign(M) min(|M|, a) [19]. Similarly, we update N as follows: Update the dual variables z and {Z i }: According to the update strategy of dual variables in ADMM [41], the variables z and {Z i } can be updated using dual ascent as follows: The algorithm for solving Problem (8) is summarized in Algorithm 1.

Input:
The design tensors {X i } and observations {y i }, the regularization parameters λ ι , λ s , the l 1 -norm bound a, the penalty parameter ρ of the Lagrangian, the convergence tolerance δ, the maximum iteration number T max .
Convergence Analysis: According to [28], the convergence rate of general ADMMbased algorithms is O(1/t), where t is the iteration number. The convergence analysis of Algorithm 1 is established in Theorem 2.
Theorem 2 (Convergence of Algorithm 1). For any positive constant ρ, if the unaugmented Lagrangian function L 0 (g, L, S, K, T , M, N , z, Z 1 , Z 2 , Z 3 , Z 4 ) has a saddle point, then the in Algorithm 1 satisfy the residual convergence, objective convergence and dual variable convergence (defined in [41]) of Problem (9) as t → ∞.

Proof.
The key idea is to rewrite Problem (9) into a standard two-block ADMM problem. For notational simplicity, let with u, v, w, c and A defined as follows where vec(·) denotes the operation of tensor vectorization (see [30]). It can be verified that f (·) and g(·) are closed, proper convex functions. Then, Problem (9) can be re-written as follows: According to the convergence analysis in [41], we have: where f , g are the optimal values of f (u), g(v), respectively. Variable w is a dual optimal point defined as:

Statistical Performance
In this section, we answer Q2 by studying the statistical performances of the proposed estimator (L,Ŝ). Specifically, the goal is to upper bound the squared F-norm error L − L * 2 F + Ŝ − S * 2 F . We will first give an upper bound on the estimation error in a non-asymptotic manner, and then prove that the upper bound is minimax optimal up to a logarithm factor.

Upper Bound on the Estimation Error
We establish an upper bounds on the estimation error in the following theorem. For notational simplicity, let N s = |X s | and N ι = |X ⊥ s | denote the number of corrupted and uncorrupted observations of L * in the observation model (6), respectively.

Theorem 3 (Upper bounds on the estimation error).
If the number of uncorrupted observations in the observation model (6) satisfy and regularization parameters in Problem (8) are set by then it holds with probability at least Theorem 3 implies that, if the noise level σ and spikiness level a are fixed, and all the corrupted positions are observed exactly only once (i.e., the number of corrupted observations N s = S * 0 ), then the estimation error in Equation (26) would be bounded by Note that, the bound in Equation (27) is intuition-consistent: if the underlying tensor L * gets more complex (i.e., with higher tubal rank), then the estimation error will be larger; if the corruption tensor S * gets denser, then the estimation error will also become larger; if the number of uncorrupted observations N ι gets larger, then the estimation error will decrease. The scaling behavior of the estimation error in Equation (27) will be verified through experiments on synthetic data in Section 7.1.

Remark 1 (Consistence with prior models for robust low-tubal-rank tensor completion).
According to Equation (27), our * L -SVD-based estimator in Equation (8) allows the tubal rank r tb (L * ) to take the order O(d 2 / log(d 1 d 3 + d 2 d 3 )), and the corruption ratio S * 0 /(d 1 d 2 d 3 ) to be O(1) for approximate estimation with small error. It is slightly better with a logarithm factor than the results for t-SVD-based tensor robust completion model in [8] which allows r tb (L * ) Remark 2 (Consistence with prior models for noisy low-tubal-rank tensor completion). If S * 0 = 0, i.e., the corruption S * vanishes, then we obtain which is consistent with the error bound for t-SVD-based noisy tensor completion [42][43][44], and * L -SVD-based tensor Dantzig Selector in [18].
Remark 3 (Consistence with prior models for robust low-tubal-rank tensor decomposition).
In the setting of Robust Tensor Decomposition (RTD) [34], the fully observed model instead of our estimation model in Equation (6) is considered. For the RTD problem, our error bound in Equation (27) is consistent with the t-SVD-based bound for RTD [34] (up to a logarithm factor).
Remark 4 (No exact recovery guarantee). According to Theorem 3, when σ = 0 and S * 0 = 0, i.e., in the noiseless case, the estimation error is upper bounded by O(a(d 1 ∨ d 2 )d 3 r tb (L * ) logd/N) which is not zero. Thus, no exact recovery is guaranteed by Theorem 3. It can be seen as a trade-off that we do not assume the low-tubal-rank tensor L * to satisfy the tensor incoherent conditions [8,35,37] which essentially ensures the separability between L * and S * .

A Minimax Lower Bound for the Estimation Error
In Theorem 3, we established the estimation error for Model (8). Then one may ask the complementary questions: how tight is the upper bound? Are there fundamental (model-independent) limits of estimation error in robust tensor completion? In this section, we will answer the questions.
To analyze the optimality of the proposed upper bound in Theorem 3, the minimax lower bounds of the estimation error is established for the tensor pair (L * , S * ) belonging to the class A(r, s, a) of tensor pairs defined as: We then define the associated element-wise minimax error as follows where the infimum ranges over all pairs of estimators (L,Ŝ) , the supremum ranges over all pairs of underlying tensors (L * , S * ) in the given tensor class A(r, s, a), and the expectation is taken over the design tensors {X i } and i.i.d. Gaussian noises {ξ i } in the observation model (6). We come up with the following theorem.

Theorem 4 (Minimax lower bound).
Suppose the dimensionality d 1 , d 2 ≥ 2, the rank and sparsity parameters r and the number of corrupted entries N s ≤ τrd with a constant τ > 0. Then, under Assumption 1, there exist absolute constants b ∈ (0, 1) and c > 0, such that The lower bound given in Equation (30) implies that the proposed upper bound in Theorem 3 is optimal (up to a log factor) in the minimax sense for tensors belonging to the set A(r, s, a). That is to say no estimator can obtain more accurate estimation than our estimator in Equation (8) (up to a log factor) for (L * , S * ) ∈ A(r, s, a), thereby showing the optimality of the proposed estimator.

Connections and Differences with Previous Works
In this section, we discuss the connections and differences with existing nuclear norm based robust matrix/tensor completion models, where the underlying matrix/tensor suffers from missing values, gross sparse outliers, and small dense noises at the same time.
First, we briefly introduce and analyze the two most related models, i.e., the matrix nuclear norm based model [36] and the sum of mode-wise matrix nuclear norms based model [45] as follows.
(1) The matrix Nuclear Norm (NN) based model [36]: If the underlying tensor is of 2-way, i.e., a matrix, then the observation model in Equation (6) becomes the setting for robust matrix completion, and the proposed estimator in Equation (8) degenerates to the matrix nuclear norm based estimator in [36]. In both model formulation and statistical analysis, this work can be seen as a 3-way generalization of [36]. Moreover, by conducting robust matrix completion on each frontal slice of a 3-way tensor, we can obtain the matrix nuclear norm based robust tensor completion model as follows: (2) The Sum of mode-wise matrix Nuclear Norms (SNN) based model [45]: Huang et al. [45] proposed a robust tensor completion model based on the sum of mode-wise nuclear norms deduced by the Tucker decomposition as follows where L (k) ∈ R d i ×∏ j =k d j is the mode-k matriculation of tensor L ∈ R d 1 ×d 2 ×d 3 , for all i = 1, 2, 3. The main differences between SNN and this work are two-fold: (i) SNN is based on the Tucker decomposition [15], whereas this work is based on the recently proposed tensor * L -SVD [24]; (ii) the theoretical analysis for SNN cannot guarantee the minimax optimality of the model in [45], whereas this works rigorously proof of the minimax optimality of the proposed estimator is established in Section 5.
Then, we discuss the following related works which can be seen as special cases of this work.
(1) The robust tensor completion model based on t-SVD [46]: In a short conference presentation [46] (whose first author is the same as this paper), the t-SVD-based robust tensor completion model is studied. As t-SVD can be viewed as a special case of the * L -SVD (when DFT is used as the transform L), the model in [46] can be a special case of ours. (2) The robust tensor recovery models with missing values and sparse outliers [8,27]: In [8,27], the authors considered the robust reconstruction of incomplete tensor polluted by sparse outliers, and proposed t-SVD (or * L -SVD) based models with theoretical guarantees for exact recovery. As they did not consider small dense noises, their settings are indeed a special case of our observation model (6) when E = 0. (3) The robust tensor decomposition based on t-SVD [34]: In [34], the authors studied the t-SVD-based robust tensor decomposition, which aims at recovering a tensor corrupted by both gross sparse outliers and small dense noises. Comparing with this work, Ref. [34] can be seen as a special case when there are no missing values.

Experiments
In this section, experiments on synthetic datasets will be first conducted to validate the sharpness of the proposed upper bounds in Theorem 3. Then, both effectiveness and efficiency of the proposed algorithm will be demonstrated through experiments on seven different types of remote sensing datasets. All codes are written in Matlab, and all experiments are performed on a Windows 10 laptop with AMD Ryzen 3.0 GHz CPU and 8 GB RAM.

Sharpness of the Proposed Upper Bound
Sharpness of the proposed upper bounds in Theorem 3 will be validated. Specifically, we will check whether the upper bounds in Equation (27) can reflect the true scaling behavior of the estimation error. As predicted in Equation (27), if the upper bound is "sharp", then it is expected that the Mean Square will possess a scaling behavior very similar to the upper bound: approximately linear w.r.t the tubal rank of the underlying tensor L * , the l 0 -norm of the corruption tensor S * , and the reciprocal of uncorrupted observation number N ι . We will examine whether this expectation will happen in simulation studies on synthetic datasets.
The synthetic datasets are generated as follows. Similar to [26], we consider three cases of linear transform L with orthogonal matrix L: (1) Discrete Fourier Transform (DFT); (2) Discrete Cosine Transform (DCT) [24]; (3) Random Orthogonal Matrix (ROM) [26]. The underlying low-rank tensor L * ∈ R d 1 ×d 2 ×d 3 with * L -tubal rank r * is generated by L * = P * L Q, where P ∈ R d 1 ×r * ×d 3 and Q ∈ R r * ×d 2 ×d 3 are i.i.d. sampled from N (0, 1). L * is then normalized such that L * ∞ = 1. Second, to generate the sparse corruption tensor S * , we first form S 0 with i.i.d. uniform distribution Uni(0, 1) and then uniformly select γd 1 d 2 d 3 entries. Thus the number of corrupted entries S * 0 = γd 1 d 2 d 3 . Third, we uniformly select N ι elements from the uncorrupted positions of (L * + S * ). Finally, In Figure 3, we report the results for 100 × 100 × 30 tensors when the DFT is adopted as the linear transform L in Equation (4). According to sub-plots (a), (b), and (d) in Figure 3, it can be seen that the MSE scales approximately linearly w.r.t. r tb (L * ), S * 0 , and N −1 ι . There results accord well with our expectation for the size 100 × 100 × 30 and linear transform L = DFT. As very similar phenomena are also observed in all the other settings where d ∈ {80, 120} and L ∈ {DCT, ROM}, we simply omit them.Thus, it can be verified that the scaling behavior of the estimation error can be approximately predicted by the proposed upper bound in Equation (27). (c) (d)

Effectiveness and Efficient of the Proposed Algorithm
In this section, we evaluate both the effectiveness and efficiency of the proposed Algorithm 1 by conducting robust tensor completion on seven different types of datasets collected from several remote sensing related applications from Sections 7.2.1-7.2.7.
Following [25], we adopted three different transformations L in Equation (4) to define the * L -TNN: the first two transformations are DFT and DCF (denoted by TNN (DFT) and TNN (DCT), respectively), and the third one named TNN (Data) depends on the given data motived by [27,31]. We first perform SVD on the mode-3 unfolding matrix of L * as L * (3) = USV , and then use U as the desired transform matrix in the * L -product (4). The proposed algorithm is compared with the aforementioned models NN [36] in Equation (32) and SNN [45] in Equation (33) in Section 6. Both Model (32) and Model (33) are solved by using ADMM with implementations by ourselves in Matlab language.
We conduct robust tensor completion on the datasets in Figure 4 with a similar settings as [47]. For a d 1 × d 2 × d 3 tensor data L * re-scaled by L * ∞ = 1, we choose its support uniformly at random with ratio ρ s and fill in the values with i.i.d. standard Gaussian variables to generate the corruption S * . Then, we randomly sample the entries of L * + S * uniformly with observation ratio ρ obs . The noises {ξ i } are further generated with i.i.d. zeromean Gaussian entries whose standard deviation is given by σ = 0.05 L * F / √ d 1 d 2 d 3 to generate the observations {y i }. The goal in the experiments is to estimate the underlying signal L * from {y i }. The effectiveness of algorithms are measured by the Peaks Signal Noise Ratio (PSNR) and structural similarity (SSIM) [48]. Specifically, the PSNR of an estimatorL is defined as PSNR := 10 log 10 for the underlying tensor L * ∈ R d 1 ×d 2 ×d 3 . The SSIM is computed via where µ L * , µL, σ L * , σL, σ L * ,L andω denotes the local means, standard deviation, crosscovariance, and dynamic range of the magnitude of tensors L * andL. Larger PSNR and SSIM values indicate higher quality of the estimatorL.

Experiments on an Urban Area Imagery Dataset
Area imagery data processing plays a key role in many remote sensing applications, such as land-use mapping [49]. We adopt the popular area imagery dataset UCMerced [50], which is a 21 class land use image dataset meant for research purposes. The images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the country. The pixel resolution of this public domain imagery is 1 foot, and each RGB image measures 256 × 256 pixels. There are 100 images for each class, and we chose the 85-th image to form a dataset of 21 images as shown in Figure 4.
We present the PSNR, SSIM values and running time in Figures 5 and 6 for settings of (ρ obs , ρ s ) = (0.3, 0.2) and (ρ obs , ρ s ) = (0.8, 0.3), respectively, for quantitative evalution, with visual examples shown in Figures 7 and 8. It can seen that from Figures 5-8 that the proposed TNN (Data) has the highest recovery quality in most cases, and posses a comparative running time as NN. We attribute the promising performance of the proposed algorithm to the extraordinary representation power of the low-tubl-rank models: lowtubal-rankness can exploit both low-rankness and smoothness simultaneously, whereas traditional models like NN and SNN can only exploit low-rankness in the original domain [18].

Experiments on Hyperspectral Data
Benefit from its fine spectral and spatial resolutions, hyperspectral image processing has been extensively adopted in many remote sensing applications [10,52]. In this section, we conduct robust tensor completion on subsets of the two representative hyperspectral datasets described as follows: • Indian Pines: This dataset was collected by AVIRIS sensor in 1992 over the Indian Pines test site in North-western Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands. We use the first 30 bands in the experiments due to the trade-off between the limitation of computing resources and the efforts for parameter tuning. • Salinas A: The data were acquired by AVIRIS sensor over the Salinas Valley, California in 1998, and consists of 224 bands over a spectrum range of 400-2500 nm. This dataset has a spatial extent of 86 × 83 pixels with a resolution of 3.7 m. We use the first 30 bands in the experiments too.
For quantitative evalution, we report the PSNR, SSIM values and running time in Tables 2 and 3

Experiments on Multispectral Images
Multispectral imaging captures image data within specific wavelength ranges across the electromagnetic spectrum, and has become one of the most widely utilized datatype in remote sensing. This section presents simulated experiments on multispectral images. The original data are two multispectral images Beads and Cloth from the Columbia MSI Database (available at http://www1.cs.columbia.edu/CAVE/databases/multispectral accessed on 28 July 2021) containing scenes of a variety of real-world objects. Each MSI is of size 512 × 512 × 31 with intensity range scaled to [0, 1].
We also consider three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3) for robust completion of multi-spectral data. We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in db), SSIM and running time (in seconds).
For quantitative evalution, we report the PSNR, SSIM values and running time in Tables 4 and 5 for the Beads and Cloth datasets, respectively. The visual examples for the Cloth dataset is shown in Figure 11. We can also find that the proposed TNN (Data) achieves the highest accuracy in most cases, and has a comparative running time as NN, which demonstrates both the effectiveness and efficiency of low-tubal-rank models. Table 4. Quantitative evaluation on the Beads dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3). The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings
Metrics

Experiments on Point Could Data
With the rapid advances of sensor technology, the emerging point cloud data provide better performance than 2D images in many remote sensing applications due to its flexible and scalable geometric representation [53]. In this section, we also conduct experiments on a dataset (scenario B from http://www.mrt.kit.edu/z/publ/download/velodynetracking/ dataset.html, accessed on 28 July 2021) for Unmanned Ground Vehicle (UGV). The dataset contains a sequence of point cloud data acquired from a Velodyne HDL-64E LiDAR. We select 30 frames (Frame Nos. 65-94) from the data sequence. The point cloud data is formatted into two tensors sized 64 × 870 × 30 representing the distance data (named SenerioB Distance) and the intensity data (named SenerioB Intensity), , respectively.
We also consider three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6,ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3) for robust completion of point cloud data. We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in db), SSIM and running time (in seconds). For quantitative evalution, we report the PSNR, SSIM values and running time in Tables 6 and 7 for the SenerioB Distance and SenerioB Intensity datasets, respectively. We can also find that the proposed TNN (Data) achieves the highest accuracy in most cases, and has a comparative running time as NN, which demonstrates both the effectiveness and efficiency of low-tubal-rank models. Table 6. Quantitative evaluation on the SenerioB Distance dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3). The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Experiments on Aerial Video Data
Aerial videos (or time sequences of images) are broadly used in many computer vision based remote sensing tasks [54]. We experiment on a 180 × 320 × 30 tensor which consists of the first 30 frames of the Sky dataset (available at http://www.loujing.com/rss-smalltarget, accessed on 28 July 2021) for small object detection [55].
We also consider three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3). We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in db), SSIM and running time (in seconds). For quantitative evalution, we report the PSNR, SSIM values and running time in Table 8. It is also found that the proposed TNN (Data) achieves the highest accuracy in most cases, and can run as fast as NN.  Thermal infrared data can provide important measurements of surface energy fluxes and temperatures in various remote sensing applications [7]. In this section, we experiment on two infrared datasets as follows: • The Infraed Detection dataset [56] 2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3), and use the same strategy for parameter tuning. In each setting, we test each image for 10 trials and report the averaged PSNR (in db), SSIM and running time (in seconds). For quantitative evalution, we report the PSNR, SSIM values and running time in Tables 9 and 10 for the Infraed Detection and OSU Thermal Database datasets, respectively. The visual examples are, respectively, shown in Figures 12 and 13. It can seen that the proposed TNN (Data) has the highest recovery quality in most cases, and has a comparative running time as NN, showing both effectiveness and efficiency of low-tubal-rank models in comparison with original domain-based models NN and SNN. Table 9. Quantitative evaluation on the Infraed Detection dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3). The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings
Metrics

Experiments on SAR Data
Polarimetric synthetic aperture radar (PolSAR) has attracted lots of attention from remote sensing scientists because of its various advantages, e.g., all-weather, all-time, penetrating capability, and multi-polarimetry [57]. In this section, we adopt the PolSAR UAVSAR Change Detection Images dataset. It is a dataset of single-look quad-polarimetric SAR images acquired by the UAVSAR airborne sensor in L-band over an urban area in San Francisco city on 18 September 2009, and May 11, 2015. The dataset #1 have length and width of 200 pixels, and we use the first 30 bands.
We also consider three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3). We tune the parameters in the same way as Section 7.2.2. In each setting, we test each image for 10 trials and report the averaged PSNR (in db), SSIM and running time (in seconds) in Table 11. It is also found that the proposed TNN (Data) achieves the highest accuracy in most cases, and can run as fast as NN. Table 11. Quantitative evaluation on the UAVSAR-Dataset1-2015 dataset in PSNR, SSIM, and running time of five tensor completion models for robust tensor completion in three settings, i.e., Setting I (ρ obs = 0.3, ρ s = 0.2), Setting II (ρ obs = 0.6, ρ s = 0.25), and Setting III (ρ obs = 0.8, ρ s = 0.3). The highest PSNR/SSIM, or lowest time (in seconds) is highlighted in bold.

Settings
Metrics

Conclusions
In this paper, we resolve the challenging robust tensor completion problem by proposing a * L -SVD-based estimator to robustly reconstruct a low-rank tensor in the presence of missing values, gross outliers, and small noises simultaneously. Specifically, this work can be concluded in the following three aspects: (1) Algorithmically, we design an efficient algorithm within the framework of ADMM to efficiently compute the proposed estimator with guaranteed convergence behavior. (2) Statistically, we analyze the statistical performance of the proposed estimator by establishing a non-asymptotic upper bound on the estimation error. The proposed upper bound is further proved to be minimax optimal (up to a log factor). However, from a critical point of view, the proposed method has the following two limitations: (1) The orientational sensitivity of * L -SVD: Despite the promising empirical performance of the * L -SVD-based estimator, a typical defect of it is the orientation sensitivity owing to low-rankness strictly defined along the tubal orientation which makes it fail to simultaneously exploit transformed low-rankness in multiple orientations [19,58]. (2) The difficulty in finding the optimal transform L(·) for * L -SVD: Although a direct use of fixed transforms (like DFT and DCT) may produce fairish empirical performance, it is still unclear how to find the best optimal transformation L(·) for any certain tensor L * when only partial and corrupted observations are available.
According to the above limitations, it is interesting to consider higher-order extensions of the proposed model in an orientation invariant way like [19] and discuss the statistical performance. It is also interesting to consider the data-dependent transformation learning like [31,59]. Another future direction is to consider more efficient solvers of Problem (8) using the factorization strategy or Frank-Wolfe method [47,[60][61][62].

Informed Consent Statement: Not applicable.
Data Availability Statement: In this paper, all the data supporting our experimental results are publicly available with references or URL links.

Acknowledgments:
The authors are grateful to the editor and reviewers for their valuable time in processing this manuscript. The first author would like to thank Shasha Jin for her kind understanding in these months, and Zhong Jin in Nanjing University of Science and Technology for his long time support. The authors are also grateful to Xiongjun Zhang for sharing the code for [25] and Olga Klopp for her excellent theoretical analysis in [36,51] which is so helpful.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theoretical Results
Appendix A. 1

. Additional Notations and Preliminaries
For the ease of exposition, we first list the additive notations often used in the proofs in Table A1. Table A1. Additional notations in the proofs.

Notations Descriptions Notations Descriptions
∆ ι := L * −L estimation error of L * ∆ s := S * −Ŝ estimation error of S * r * := r tb (L * ) * L -tubal-rank of L * s * := S * 0 sparsity of S * := N/N ι inverse uncorrupted ratio a l ∞ -norm bound in Equation (8) index set of design tensors {X i } corresponding to corrupted entries E := 1 N ∑ i∈Ωι ξ i X i stochastic tensor defined to lower bound parameters λ ι and λ s W := 1 N ∑ i∈Ωι X i stochastic tensor defined to lower bound parameter λ s expectation of X i , · 2 for i ∈ Ω ι defined to establish the RSC condition We then introduce the decomposability of * L -TNN and tensor l 1 -norm which plays a key role in the analysis. Decomposability of * L -TNN. Suppose L * has reduced * L -SVD as L * = U * L D * L V , where U ∈ R d 1 ×r * ×d 3 and V ∈ R d 2 ×r * ×d 3 are orthogonal and D ∈ R r * ×r * ×d 3 is f-diagonal. Define projectors P (·) and P ⊥ (·) as follows: Then, it can be verified that: (III). r tb (P (T )) ≤ 2r tb (L * ), ∀T ∈ R d 1 ×d 2 ×d 3 .
In the same way to the results in supplementary material of [43], it can also be shown that the following equations hold: Decomposability of tensor l 1 -norm [63].
Let The proof follows the lines of [34,36]. For notational simplicity, we define the following two sets Proof of Theorem 3. Let F (L, S) = 1 N ∑ N i=1 (y i − L + S, X i ) 2 + λ ι L + λ s S 1 for simplicity. Then, according to the optimality of (L,Ŝ) to Problem (8), it holds that and :=II where E := 1 N ι ∑ i∈Ω ι ξ i X i , and the equality E, ∆ s = E, ∆ s Θ ⊥ s holds. Now each item in the right hand side of (A8) will be upper bounded separately as follows. Following the idea of [36], the upper bound will be analyzed upon the following event According to the tail behavior of the maximum in a sub-Gaussian sequence, it holds with an absolute constant C * > 0 such that P[E] ≥ 1 − 1/(2d). Bound I. On the event E, we get Bound II. Note that according to the properties of * L -TNN, we have Thus, we can bound term II by By letting λ ι ≥ 4 E sp , it holds that Bound III: Note that since S * Thus, putting Equations (A10)-(A12) together, we have the following inequality on the event E: Then, we follow the line of [36] to specify a kind of Restricted Strong Convexity (RSC) for the random sampling operator formed by the design tensors {X i } on a carefully chosen constrained set. The RSC will show that when the error tensors (∆ ι , ∆ s ) belong to the constrained set, the following relationship: holds with an appropriate residual τ with high probability. Before explicitly defining the constrained set, we first consider the following set where ∆ s should lie: with two positive constants δ 1 and δ 2 whose values will be specified later. We also define the following set of tensor pairs: We then define the constrained set as the intersection: To bound the estimation error in Equation (26), we will upper bound ∆ ι F and ∆ s F separately. Note that and similarly We will bound ∆ ι Thus, we can upper bound ∆ ι Π with an upper bound of ∆ s Π in Lemma A1 First, according to Lemma A3, it holds on the event E defined in Equation (A9) that where (i) holds due to the triangular inequality; (ii) is a direct consequence of Lemma A3, and the definition of event E; (iii) holds because r tb (P (T )) ≤ 2r * , and T ≤ r tb (T )d 3 T F for any T ∈ R d 1 ×d 2 ×d 3 ; (iv) stems from the inequality P (T ) F ≤ (v) is due to the triangular inequality; (vi) holds since ∆ ι ∞ ≤ 2a. Note that according to Lemma A1, we have with probability at least 1 − 2.5/d: where ∆ 1 is defined in Lemma A4. Together with Equation (A22), we have with the following parameters Then, according to Lemma A6, it holds with probability at least 1 − 2/d that where τ(r, κ, δ 1 , δ 2 ) is defined in Equation (A64).
Recall that Equation (A13) writes: Letting := N/N ι , we further obtain Thus, by using Equation (A20) and Lemma A1, we have Note that the bound on ∆ s Π is given in Lemma A4, and the values of λ ι and λ s can be set according to Lemmas A8 and A9, respectively. Then, by putting Equations (A18), (A19) and (A52) together, and using Lemmas A8 and A9 to bound associated norms of the stochastic quantities E, W, and R Σ in the error term, we can obtain the bound on ∆ ι 2 F + ∆ s 2 F and complete the proof.
Appendix A.2.2. Lemmas for the Proof of Theorem 3 Proof of Lemma A1. By the standard condition for optimality over a convex set, it holds that for any feasible (L, S) (L, S), ∂F (L,Ŝ) ≥ 0 (A32) which further leads to Thus, we have One the other hand, the definition of sub-differential indicates Thus, it holds that which complete the proof.
Lemma A2. It holds that Proof of Lemma A2. Note that where (i) hold since · ∞ is the dual norm of · 1 ; (ii) holds since | X i , ∆ ι | ≤ ∆ ι ∞ ≤ 2a and the tensor 1 -norm · 1 is invariant to changes in sign.
Lemma A3. By letting λ ι ≥ 4 E sp and λ s ≥ E ∞ , we have Proof of Lemma A3. In Equation (A33), letting (L, S) ← (L * , S * ), we obtain Also, we have according to the convexity of · and · 1 that Thus, we have Moreover, it is often used that Since we set λ ι ≥ 4 E sp and λ s ≥ 4 E ∞ , we have Note that, S * 1 ≤ aN s . Thus, we have where (i) holds due to Assumption 1.I; (ii) is a direct use of Lemma A1; (iii) stems from the facts that ∆ s ∞ ≤ 2a and the definition of event E in Equation (A9). Thus, Equation (A51) is proved. (II) The proof of Equation (A52): According to the optimality of (L,Ŝ) to Problem (8), which further leads to Note that by using 2ab > −(1/2a 2 + 2b 2 ), we have Thus on the event E defined in Equation (A9), we have Combing Equations (A57) and (A58) yields the bound on ∆ s Π .
Lemma A5. Define the following set Then, it holds with probability at least 1 − 2/d 3 that for any B ∈ D(δ, β).
Proof of Lemma A5. We prove this lemma using a standard peeling argument. First, define the following We partition this set to simpler events with l ∈ N + : such that 1 Note that according to Lemma A7, we have with t ∈ [α l−1 β, α l β): Lemma A6. For any (A, B) ∈ C(r, κ, β) ∩ R d 1 ×d 2 ×d 3 × B(δ 1 , δ 2 ), it holds with probability at least 1 − 2/d that where τ(r, κ, δ 1 , δ 2 ) = 4(16α + 1)rd 3 Proof of Lemma A7. First, we study the tail behavior of Z t by directly using the Massart's inequality in Theorem 14.2 of [64]. According to the Massart's inequality, it holds for any By letting s = t/(4α), the first inequality in Equation (A67) is proved. Then, we will upper bound the expectation of Z t . By standard symmetrization argument [65], we have where ε i 's are i.i.d. Randemacher variables. Further, according to the contraction principle [66], it holds that In the following, we consider the two cases: Case 1. Consider T ∈ D(δ, β) ∩ C (t), we have By letting s = t/(2α) in Equation (A68), we obtain when T ∈ D(δ, β) ∩ C (t). Case 2. Consider T = A + B, where (A, B) ∈ C(r, κ, β), B ∈ B(δ 1 , δ 2 ), and T 2 Π ≤ t. The goal in this case is to upper bound where (i) holds as a property of the sup operation; (ii) holds due to the definition of dual norm; (iii) stems from the condition B ∈ B(δ 1 , δ 2 ).
Lemma A8. Under Assumption 1, there exists an absolute constant C > 0 such that the following bounds on the tensor spectral norm of stochastic tensors E and R Σ hold: (I) For tensor E, we have Proof of Lemma A8. Equation (A79) can be seen as a special case of Lemma 8 in [18] when k = 1. Equation (A80) can be proved very similarly to Equation (A79) followed by tricks used in proof of Lemma 6 in [51]. We omit the details due to the high similarity.
Lemma A9. Under Assumption 1, there exists an absolute constant C > 0 such that the following bounds on the l ∞ -norm of stochastic tensors E, W, and R Σ hold: (I) For tensor E, we have Proof of Lemma A9. Since this lemma can be straightforwardly proved in the same way as Lemma 10 in [36], we omit the proof.
(ii) for any two distinct tensors T 1 and T 2 in L 0 , Let P (L,0) denote the probabilistic distribution of random variables {y i } observed when the underlying tensor is (L, 0) in the observation model (6). Note that, the distribution of the random noise ξ i i.i.d.