1. Introduction
Despite the broad adoption of advanced sensors in various remote sensing tasks, data quality remains a critical issue that can significantly influence the performance of downstream applications. Many types of modern remote sensing data, in modalities such as optical, hyperspectral, multispectral, thermal, Light Detection and Ranging (LiDAR), and Synthetic Aperture Radar (SAR), are typically multi-way and can be readily stored, analyzed, and processed by tensor-based models [1,2,3,4,5,6,7]. In some extreme circumstances, the data tensor may simultaneously suffer from missing entries, gross sparse outliers, and small dense noise as a result of partial sensor failures, communication errors, occlusion by obstacles, and so on [8,9]. The problem of robust tensor completion thus arises: how to robustly complete a partially observed data tensor corrupted by outliers and noise.
When only a fraction of partially corrupted observations is available, robust tensor completion hinges on the assumption that the underlying data tensor is highly redundant, so that its main components are only slightly suppressed by missing information, outliers, and noise and can thus be effectively reconstructed by exploiting this intrinsic redundancy. Tensor low-rankness is an ideal tool for modeling the redundancy of tensor data and has gained extensive attention in remote sensing data restoration [5,10,11].
As higher-order extensions of low-rank matrix models [12], low-rank tensor models are typically formulated as minimization problems involving a tensor rank function [13]. However, there are multiple definitions of tensor rank, such as the CP rank [14], Tucker rank [15], TT rank [16], and TR rank [17], which focus on low-rank structures in the original domain (e.g., the pixel domain of optical images) [18,19]. Recently, a remarkably different example, the low-tubal-rank tensor model [20,21], was proposed within the algebraic framework of the tensor Singular Value Decomposition (t-SVD) [20,22]; it captures low-rankness in the frequency domain defined via the Discrete Fourier Transform (DFT). As discussed in [18,19,21,23], low-tubal-rank tensor models are capable of exploiting both the low-rankness and the smoothness of tensor data, making them well suited to analyzing and processing diverse remote sensing imagery, which is often simultaneously low-rank and smooth [5,10].
Motivated by the advantages of low-tubal-rankness in modeling remote sensing data, we resolve the robust tensor completion problem by utilizing a generalized low-tubal-rank model based on the tensor ∗L-Singular Value Decomposition (∗L-SVD) [24], which leverages low-rankness in more general transformed domains rather than only the DFT domain. It is worth pointing out that the ∗L-SVD has very recently become a research focus in tensor-based signal processing, computer vision, and machine learning [18,23,25,26]. Given the theoretical focus of this paper, we only review several representative works with statistical analysis as follows. For tensor completion in the noiseless setting, Lu et al. [26] proposed a ∗L-SVD-based model that can exactly recover the underlying tensor under mild conditions. For tensor completion from partial observations corrupted by sparse outliers, Song et al. [27] designed a ∗L-SVD-based algorithm with an exact recovery guarantee. Zhang et al. [25] developed a theoretically guaranteed approach via the ∗L-SVD for tensor completion under Poisson noise. The problem of tensor recovery from noisy linear observations is studied in [18] based on the ∗L-SVD with guaranteed statistical performance.
In this paper, we focus on statistically guaranteed approaches in a more challenging setting than the aforementioned ∗L-SVD-based models, where the underlying signal tensor suffers from missing entries, sparse outliers, and small dense noise simultaneously. Specifically, we resolve the problem of robust tensor completion by formulating a ∗L-SVD-based estimator whose estimation error bound is established and further proved to be minimax optimal (up to a log factor). We propose an algorithm based on the Alternating Direction Method of Multipliers (ADMM) [28,29] to compute the estimator, and we evaluate both its effectiveness and efficiency on seven different types of remote sensing data.
The remainder of this paper proceeds as follows. We first introduce some notation and preliminaries in Section 2. Then, the proposed estimator for robust tensor completion is formulated in Section 3. We compute the estimator using an ADMM-based algorithm described in Section 4. The statistical performance of the proposed estimator is analyzed in Section 5, and connections and differences with previous works are discussed in Section 6. Experimental results on both synthetic and real datasets are reported in Section 7. We summarize this paper and briefly discuss future directions in Section 8. The proofs of the theoretical results are given in Appendix A.
4. Algorithm
In this section, we answer Q1 by designing an algorithm based on ADMM to compute the proposed estimator.
To solve Problem (8), the first step is to introduce auxiliary variables to decouple the complex couplings among the primal variables as follows:
where the corresponding term is the indicator function of the tensor ℓ∞-norm ball.
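A hedged sketch of this standard definition, writing the bound as a (the same symbol used for the ℓ∞-norm bound in Algorithm 1; the paper's exact notation may differ):
$$
\iota_{\{\|\cdot\|_{\infty}\le a\}}(\mathcal{X}) \;=\;
\begin{cases}
0, & \text{if } \|\mathcal{X}\|_{\infty}\le a,\\[2pt]
+\infty, & \text{otherwise}.
\end{cases}
$$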
We then give the augmented Lagrangian of Equation (9) with the Lagrangian multipliers and the penalty parameter ρ:
Following the framework of the standard two-block ADMM [41], we separate the primal variables into two blocks and update them alternately as follows:
Update the first block: After the t-th iteration, we first update the variables in the first block with the other variables held fixed as follows:
By taking derivatives with respect to the two block variables, respectively, and setting them to zero, we obtain the following system of equations:
Solving the system of equations in Equation (12) yields the closed-form update in Equation (13), where the identity operator and the intermediate tensors are defined accordingly.
Update the second block: According to the special form of the Lagrangian in Equation (10), the variables in the second block can be updated separately as follows.
We first update the first variable of this block with the other variables fixed:
We then update the variable associated with the ∗L-TNN term, with the other variables fixed:
where its update is given by the proximal operator of the ∗L-TNN, described in the following lemma.
Lemma 1 (A modified version of Theorem 3.2 in [26]).
Let any tensor with its ∗L-SVD be given. Then the proximal operator of the ∗L-TNN at this tensor with a given thresholding constant can be computed by shrinking the singular values of its frontal slices in the transformed domain, where (t)₊ denotes the positive part of t, i.e., (t)₊ = max{t, 0}.
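A minimal NumPy sketch of this proximal operator, assuming the transform is given explicitly as an invertible matrix L applied along the third mode; the function name, arguments, and normalization are illustrative placeholders rather than the paper's exact formulation:

```python
import numpy as np

def prox_l_tnn(X, tau, L, L_inv):
    """Sketch of the *L-TNN proximal operator: transform along mode 3,
    soft-threshold the singular values of every frontal slice, transform back."""
    n1, n2, n3 = X.shape
    X_hat = np.einsum('kt,ijt->ijk', L, X)          # tube-wise transform by L
    Y_hat = np.zeros_like(X_hat)
    for k in range(n3):
        U, s, Vh = np.linalg.svd(X_hat[:, :, k], full_matrices=False)
        s_shrunk = np.maximum(s - tau, 0.0)          # (sigma - tau)_+
        Y_hat[:, :, k] = (U * s_shrunk) @ Vh
    return np.einsum('kt,ijt->ijk', L_inv, Y_hat)   # back to the original domain
```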
We next update the variable associated with the tensor ℓ1-norm term, with the other variables fixed:
where its update is given by the proximal operator [19] of the tensor ℓ1-norm, i.e., element-wise soft-thresholding, and ⊙ denotes the element-wise product.
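A one-line NumPy sketch of element-wise soft-thresholding (the proximal operator of a scaled ℓ1-norm); the function name and the threshold argument lam are illustrative:

```python
import numpy as np

def soft_threshold(X, lam):
    """Element-wise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)
```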
We then update the variable constrained to the tensor ℓ∞-norm ball, with the other variables fixed:
where its update is given by the projection onto the tensor ℓ∞-norm ball of radius a, which simply truncates every entry to the interval [−a, a] [19].
Similarly, we update the remaining variables of this block as follows:
Update the dual variables: According to the update strategy for dual variables in ADMM [41], the Lagrangian multipliers are updated by dual ascent as follows:
The algorithm for solving Problem (8) is summarized in Algorithm 1.
Algorithm 1 Solving Problem (8) using ADMM.
- Input: The design tensors and observations, the regularization parameters, the ℓ∞-norm bound a, the penalty parameter ρ of the Lagrangian, the convergence tolerance, and the maximum iteration number.
- 1: Initialize the primal and dual variables;
- 2: for each iteration until the maximum iteration number do
- 3: Update the first block of primal variables by Equation (13);
- 4: Update the second block of primal variables by Equations (14)–(20), respectively;
- 5: Update the dual variables by Equation (21);
- 6: Check the convergence criteria:
- 7: (i) convergence of the primal variables; (ii) convergence of the constraints;
- 8: end for
- Output: The estimated tensors.
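For readers who prefer code, the following Python sketch mirrors the loop structure of Algorithm 1; the block-update callables, the residual map, and the stopping thresholds are hypothetical placeholders standing in for Equations (13), (14)–(20), and (21), not the paper's exact implementation:

```python
import numpy as np

def admm_loop(update_block_one, update_block_two, residuals,
              primal_vars, dual_vars, rho, tol=1e-6, max_iter=500):
    """Generic two-block ADMM loop following the structure of Algorithm 1."""
    for t in range(max_iter):
        prev = [v.copy() for v in primal_vars]
        # Step 3: first-block update (plays the role of Equation (13)).
        primal_vars = update_block_one(primal_vars, dual_vars, rho)
        # Step 4: second-block updates via proximal maps / projections
        # (plays the role of Equations (14)-(20)).
        primal_vars = update_block_two(primal_vars, dual_vars, rho)
        # Step 5: dual ascent on the multipliers (plays the role of Equation (21)).
        res = residuals(primal_vars)
        dual_vars = [y + rho * r for y, r in zip(dual_vars, res)]
        # Steps 6-7: convergence checks on primal change and constraint residuals.
        primal_change = max(np.linalg.norm(v - p) for v, p in zip(primal_vars, prev))
        constraint_res = max(np.linalg.norm(r) for r in res)
        if primal_change < tol and constraint_res < tol:
            break
    return primal_vars, dual_vars
```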
Complexity Analysis: The time complexity of Algorithm 1 is analyzed as follows. Owing to the special structures of the design tensors, the associated operators can be implemented efficiently, and the cost of updating the remaining variables is comparatively small. The main time cost in Algorithm 1 lies in the update that requires the ∗L-SVD, which involves the L-transform along the third mode and matrix SVDs on the frontal slices in the transformed domain. The per-iteration cost can be reduced for linear transforms L that admit fast implementations (such as the DFT and DCT).
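As a hedged illustration with generic dimensions (an $n_1\times n_2\times n_3$ tensor, which may not match the paper's notation), the dominant per-iteration cost of transformed-domain singular value thresholding typically behaves as
$$
O\!\big(\underbrace{n_1 n_2 n_3^{2}}_{\text{general transform }L}\;+\;\underbrace{n_1 n_2 n_3\min(n_1,n_2)}_{\text{slice-wise SVDs}}\big)
\;\longrightarrow\;
O\!\big(\underbrace{n_1 n_2 n_3\log n_3}_{\text{fast transform (DFT/DCT)}}\;+\;n_1 n_2 n_3\min(n_1,n_2)\big).
$$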
Convergence Analysis: According to [28], the convergence rate of general ADMM-based algorithms is O(1/t), where t is the iteration number. The convergence analysis of Algorithm 1 is established in Theorem 2.
Theorem 2 (Convergence of Algorithm 1).
For any positive constant ρ, if the unaugmented Lagrangian function has a saddle point, then the iterations in Algorithm 1 satisfy the residual convergence, objective convergence, and dual variable convergence (defined in [41]) of Problem (9) as t → ∞.
Proof. The key idea is to rewrite Problem (9) as a standard two-block ADMM problem. For notational simplicity, we group the primal variables into two blocks and define the associated objective functions as follows, with tensor vectorization defined as in [30].
It can be verified that the two functions are closed, proper, and convex. Then, Problem (9) can be rewritten as follows:
According to the convergence analysis in [41], we have:
where the limits are the optimal values of the two objective functions, respectively, and the dual optimal point is defined as:
where the dual optimal point is the component of the dual variables in a saddle point of the unaugmented Lagrangian. □
6. Connections and Differences with Previous Works
In this section, we discuss the connections and differences with existing nuclear-norm-based robust matrix/tensor completion models, in which the underlying matrix/tensor simultaneously suffers from missing values, gross sparse outliers, and small dense noise.
First, we briefly introduce and analyze the two most closely related models, i.e., the matrix nuclear norm based model [36] and the sum of mode-wise matrix nuclear norms based model [45].
- (1) The matrix Nuclear Norm (NN) based model [36]: If the underlying tensor is 2-way, i.e., a matrix, then the observation model in Equation (6) becomes the setting for robust matrix completion, and the proposed estimator in Equation (8) degenerates to the matrix nuclear norm based estimator in [36]. In both model formulation and statistical analysis, this work can be seen as a 3-way generalization of [36]. Moreover, by conducting robust matrix completion on each frontal slice of a 3-way tensor, we can obtain the following matrix nuclear norm based robust tensor completion model:
- (2) The Sum of mode-wise matrix Nuclear Norms (SNN) based model [45]: Huang et al. [45] proposed a robust tensor completion model based on the sum of mode-wise nuclear norms induced by the Tucker decomposition, in which the nuclear norm of the mode-k matricization of the underlying tensor is penalized for each mode k; a generic sketch of such an objective is given after this list item.
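For orientation only, a generic SNN-regularized robust completion objective takes a form such as the following, where the low-rank tensor, sparse tensor, weights, and observation operator are illustrative placeholders and the exact formulation in [45] may differ:
$$
\min_{\mathcal{L},\,\mathcal{S}}\;\sum_{k=1}^{3}\lambda_{k}\,\big\|\mathbf{L}_{(k)}\big\|_{*}\;+\;\tau\,\|\mathcal{S}\|_{1}
\quad\text{s.t.}\quad
\mathcal{P}_{\Omega}(\mathcal{L}+\mathcal{S})=\mathcal{P}_{\Omega}(\mathcal{Y}),
$$
where $\mathbf{L}_{(k)}$ denotes the mode-$k$ matricization of $\mathcal{L}$.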
The main differences between SNN and this work are two-fold: (i) SNN is based on the Tucker decomposition [15], whereas this work is based on the recently proposed tensor ∗L-SVD [24]; (ii) the theoretical analysis for SNN cannot guarantee the minimax optimality of the model in [45], whereas this work rigorously proves the minimax optimality of the proposed estimator in Section 5.
We then discuss the following related works, which can be seen as special cases of this work.
- (1) The robust tensor completion model based on the t-SVD [46]: In a short conference presentation [46] (which shares its first author with this paper), the t-SVD-based robust tensor completion model is studied. As the t-SVD can be viewed as a special case of the ∗L-SVD (when the DFT is used as the transform L), the model in [46] is a special case of ours.
- (2) The robust tensor recovery models with missing values and sparse outliers [8,27]: In [8,27], the authors considered the robust reconstruction of an incomplete tensor corrupted by sparse outliers and proposed t-SVD (or ∗L-SVD) based models with theoretical guarantees of exact recovery. As they did not consider small dense noise, their settings are special cases of our observation model (6) when the dense noise term vanishes.
- (3) The robust tensor decomposition based on the t-SVD [34]: In [34], the authors studied t-SVD-based robust tensor decomposition, which aims at recovering a tensor corrupted by both gross sparse outliers and small dense noise. Compared with this work, Ref. [34] can be seen as a special case in which there are no missing values.
8. Conclusions
In this paper, we resolve the challenging robust tensor completion problem by proposing a ∗L-SVD-based estimator to robustly reconstruct a low-rank tensor in the simultaneous presence of missing values, gross outliers, and small noise. Specifically, this work can be summarized in the following three aspects:
- (1) Algorithmically, we design an algorithm within the framework of ADMM to efficiently compute the proposed estimator with guaranteed convergence behavior.
- (2) Statistically, we analyze the statistical performance of the proposed estimator by establishing a non-asymptotic upper bound on the estimation error. The proposed upper bound is further proved to be minimax optimal (up to a log factor).
- (3) Experimentally, the correctness of the upper bound is first validated through simulations on synthetic datasets. Then, both the effectiveness and efficiency of the proposed algorithm are demonstrated by extensive comparisons with state-of-the-art nuclear norm based models (i.e., NN and SNN) on seven different types of remote sensing data.
However, from a critical point of view, the proposed method has the following two limitations:
- (1) The orientation sensitivity of the ∗L-SVD: Despite the promising empirical performance of the ∗L-SVD-based estimator, a typical defect is its orientation sensitivity: low-rankness is strictly defined along the tubal orientation, which prevents it from simultaneously exploiting transformed low-rankness along multiple orientations [19,58].
- (2) The difficulty of finding the optimal transform for the ∗L-SVD: Although directly using fixed transforms (such as the DFT and DCT) may produce fair empirical performance, it remains unclear how to find the optimal transform for a given tensor when only partial and corrupted observations are available.
In view of the above limitations, it would be interesting to consider higher-order extensions of the proposed model in an orientation-invariant manner, as in [19], and to analyze their statistical performance. It would also be interesting to consider data-dependent transform learning, as in [31,59]. Another future direction is to develop more efficient solvers for Problem (8) using the factorization strategy or the Frank–Wolfe method [47,60,61,62].