Article

Alternating Direction Method of Multipliers for Generalized Low-Rank Tensor Recovery

School of Science, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Author to whom correspondence should be addressed.
Algorithms 2016, 9(2), 28; https://doi.org/10.3390/a9020028
Submission received: 23 November 2015 / Revised: 26 March 2016 / Accepted: 13 April 2016 / Published: 19 April 2016
(This article belongs to the Special Issue Manifold Learning and Dimensionality Reduction)

Abstract

Low-Rank Tensor Recovery (LRTR), the higher order generalization of Low-Rank Matrix Recovery (LRMR), is especially suitable for analyzing multi-linear data with gross corruptions, outliers and missing values, and has attracted broad attention in the fields of computer vision, machine learning and data mining. This paper considers a generalized model of LRTR and attempts to recover simultaneously the low-rank, the sparse, and the small disturbance components from partial entries of a given data tensor. Specifically, we first describe generalized LRTR as a tensor nuclear norm optimization problem that minimizes a weighted combination of the tensor nuclear norm, the l1-norm and the Frobenius norm under linear constraints. Then, the Alternating Direction Method of Multipliers (ADMM) is employed to solve the proposed minimization problem. Next, we discuss the weak convergence of the proposed iterative algorithm. Finally, experimental results on synthetic and real-world datasets validate the efficiency and effectiveness of the proposed method.

1. Introduction

In the past decade, the low-rank property of some datasets has been exploited skillfully to recover both the low-rank and the sparse components or to complete the missing entries. The datasets to be analyzed are usually modeled by matrices, and the corresponding recovery technique is called Low-Rank Matrix Recovery (LRMR). LRMR has received a significant amount of attention in fields of information science such as computer vision, machine learning, pattern recognition, data mining and linear system identification. There are several appealing types of LRMR, including Matrix Completion (MC) [1], Robust Principal Component Analysis (RPCA) [2] and Low-Rank Representation (LRR) [3]. Mathematically, LRMR is customarily formulated as a matrix nuclear norm minimization problem that can be solved effectively by several scalable methods. Diverse variants of LRMR derive from the aforementioned three models, for instance, MC with noise (or stable MC) [4], stable RPCA [5,6], incomplete RPCA [7], LRR with missing entries [8], and LRMR based on matrix factorization or tri-factorization [9,10].
The fields of image and signal processing usually require processing large amounts of multi-way data, such as video sequences, functional magnetic resonance imaging sequences and direct-sequence code-division multiple access (DS-CDMA) systems [11]. For datasets with multi-linear structure, traditional matrix-based data analysis tends to destroy the spatial and temporal structure and suffers from the curse of dimensionality. In contrast, tensor-based data representation can avoid or alleviate these deficiencies to some extent. Furthermore, tensor decompositions can be used to obtain a low-rank approximation of an investigated data tensor. Two of the most popular tensor decompositions are the Tucker model and the PARAFAC (Parallel Factor Analysis) model, which can be regarded as the higher order generalizations of Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) [12].
Generally speaking, traditional tensor decomposition models are very sensitive to missing entries, arbitrary outliers and gross corruptions. In contrast, Low-Rank Tensor Recovery (LRTR) is very effective at recovering simultaneously the low-rank component and the sparse noise, or at completing missing entries. As the higher order generalization of LRMR, LRTR is composed mainly of tensor completion [13] and Multi-linear RPCA (MRPCA) [14]. The task of tensor completion is to recover missing or unsampled entries according to the low-rank structure. The alternating least squares method based on the PARAFAC decomposition was originally proposed to complete missing entries [15]. Subsequently, the Tucker decomposition was applied to the problem of tensor completion [16,17,18]. The multi-linear rank is commonly used to describe the low-rank property, although it is hard to estimate properly. The tensor nuclear norm generalizes the matrix nuclear norm and has become a standard for measuring the low-rank structure of a tensor. Hence, LRTR can usually be boiled down to a tensor nuclear norm minimization problem.
Liu et al. [13] established a tensor nuclear norm minimization model for tensor completion and proposed the Alternating Direction Method of Multipliers (ADMM) for efficiently solving this model. Gandy et al. [19] considered the low-n-rank tensor recovery problem with some linear constraints and developed the Douglas–Rachford splitting technique, a dual variant of the ADMM. To reduce the computational complexity of the tensor nuclear norm minimization problems, Liu et al. [20] introduced the matrix factorization idea into the minimization model and obtained a much smaller matrix nuclear norm minimization problem. Tan et al. [21] transformed tensor completion into a linear least squares problem and proposed a nonlinear Gauss–Seidel method to solve the corresponding optimization problem. In [14], Shi et al. extended RPCA to the case of tensors and presented the MRPCA model. MRPCA, also named robust low-rank tensor recovery [22], was described as a tensor nuclear norm minimization which can be solved efficiently by the ADMM.
This paper studies a generalized model of LRTR. In this model, the investigated data tensor is assumed to be the superposition of a low-rank component, a gross sparse tensor and a small dense error tensor. The Generalized Low-Rank Tensor Recovery (GLRTR) aims mainly to recover the low-rank and the sparse components from partially observed entries. For this purpose, we establish a tensor nuclear norm minimization model for GLRTR. In addition, the ADMM method is adopted to solve the proposed convex model.
The rest of this paper is organized as follows. Section 2 provides mathematical notations and introduces preliminaries on tensor algebra. In Section 3, we review related works on LRTR. We present a generalized LRTR model and develop an efficient iterative algorithm for solving the proposed model in Section 4. Section 5 discusses the weak convergence result on the proposed algorithm. We report the experimental results in Section 6. Finally, Section 7 draws the conclusions.

2. Notations and Preliminaries

This section briefly introduces mathematical notations and reviews tensor algebra. We denote tensors by calligraphic letters, e.g., $\mathcal{A}$, matrices by boldface capital letters, e.g., $\mathbf{A}$, vectors by boldface lowercase letters, e.g., $\mathbf{a}$, and scalars by italic letters, e.g., $a$ or $A$. A tensor of order $N$ has entries indexed by $N$ indices. For an $N$th-order real tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, its $(i_1, i_2, \ldots, i_N)$th entry is denoted by $a_{i_1 i_2 \cdots i_N}$.
The $n$-mode matricization (unfolding) of tensor $\mathcal{A}$, denoted by $\mathbf{A}_{(n)}$, is an $I_n \times \prod_{j \neq n} I_j$ matrix obtained by rearranging the $n$-mode fibers to be the columns of the resulting matrix. Furthermore, we define the opposite operation "fold" as $\mathrm{fold}_n(\mathbf{A}_{(n)}) = \mathcal{A}$. Given another tensor $\mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the inner product between $\mathcal{A}$ and $\mathcal{B}$ is defined as $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} a_{i_1 i_2 \cdots i_N} b_{i_1 i_2 \cdots i_N}$. The Frobenius norm (or $l_2$-norm) of tensor $\mathcal{A}$ is then induced by this inner product, that is, $\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$. Moreover, we give other tensor norms that will be needed in the following sections. The $l_1$-norm of tensor $\mathcal{A}$ is $\|\mathcal{A}\|_1 = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} |a_{i_1 i_2 \cdots i_N}|$, and the tensor nuclear norm is $\|\mathcal{A}\|_* = \sum_{n=1}^{N} w_n \|\mathbf{A}_{(n)}\|_*$, where $\|\mathbf{A}_{(n)}\|_*$, the nuclear norm of $\mathbf{A}_{(n)}$, is the sum of the singular values of $\mathbf{A}_{(n)}$, $w_n \geq 0$ and $\sum_{n=1}^{N} w_n = 1$. Since the nuclear norm of a matrix is the tightest convex relaxation of the rank function on the unit ball of the spectral norm, it is easy to verify that the tensor nuclear norm $\|\mathcal{A}\|_*$ is a convex function of $\mathcal{A}$.
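For concreteness, the unfolding and folding operations and the weighted tensor nuclear norm can be sketched in NumPy as follows. This is our own illustration rather than code from the paper, and the function names `unfold`, `fold` and `tensor_nuclear_norm` are our own choices.

```python
import numpy as np

def unfold(A, n):
    """n-mode matricization A_(n): an I_n x (prod_{j != n} I_j) matrix whose
    columns are the n-mode fibers of the tensor A."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def fold(M, n, shape):
    """Inverse operation fold_n: rearranges the matrix M back into a tensor of the given shape."""
    full_shape = (shape[n],) + tuple(s for i, s in enumerate(shape) if i != n)
    return np.moveaxis(M.reshape(full_shape), 0, n)

def tensor_nuclear_norm(A, w=None):
    """Weighted sum of the nuclear norms of all mode matricizations of A."""
    N = A.ndim
    w = np.full(N, 1.0 / N) if w is None else np.asarray(w)
    return sum(w[n] * np.linalg.norm(unfold(A, n), ord='nuc') for n in range(N))
```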
The $n$-mode product of tensor $\mathcal{A}$ with a matrix $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is an $N$th-order tensor of size $I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N$, and its $(i_1, \ldots, i_{n-1}, j, i_{n+1}, \ldots, i_N)$th entry is $\sum_{i_n=1}^{I_n} a_{i_1 i_2 \cdots i_N} u_{j i_n}$. The multi-linear rank of tensor $\mathcal{A}$ is the vector $(\mathrm{rank}(\mathbf{A}_{(1)}), \mathrm{rank}(\mathbf{A}_{(2)}), \ldots, \mathrm{rank}(\mathbf{A}_{(N)}))$, where $\mathrm{rank}(\cdot)$ denotes the rank of a matrix. Tensor $\mathcal{A}$ is low-rank if and only if $\mathrm{rank}(\mathbf{A}_{(n)}) \ll I_n$ for some $n$. Then, the Tucker decomposition of $\mathcal{A}$ is defined as
$$\mathcal{A} = \mathcal{C} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)} = \mathcal{C} \times_{n=1}^{N} \mathbf{U}^{(n)}, \qquad (1)$$
where the core tensor $\mathcal{C} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$, the mode matrices $\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times J_n}$ and $J_n \leq I_n$. The multi-linear rank of a tensor has an advantage over other rank definitions (e.g., the rank used in the PARAFAC decomposition) because it is easy to compute. Refer to the survey [12] for a further treatment of tensor algebra.
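Assuming the `unfold`/`fold` helpers sketched above, a minimal illustration of the $n$-mode product and of a small Tucker-built tensor might look as follows; again, this is our own sketch, not the authors' code.

```python
import numpy as np

def mode_n_product(A, U, n):
    """n-mode product A x_n U: multiplies every n-mode fiber of A by the matrix U."""
    new_shape = A.shape[:n] + (U.shape[0],) + A.shape[n + 1:]
    return fold(U @ unfold(A, n), n, new_shape)

# Build a 10 x 12 x 14 tensor with multi-linear rank (2, 3, 4) via a Tucker product.
rng = np.random.default_rng(0)
C = rng.standard_normal((2, 3, 4))                               # core tensor
U = [rng.standard_normal((I, J)) for I, J in zip((10, 12, 14), C.shape)]
A = C
for n, Un in enumerate(U):
    A = mode_n_product(A, Un, n)
print([np.linalg.matrix_rank(unfold(A, n)) for n in range(3)])   # expect [2, 3, 4]
```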
Within the field of low-rank matrix/tensor recovery, the following two proximal minimization problems are extensively employed:
$$\min_{\mathcal{X}} \ \lambda \|\mathcal{X}\|_1 + \frac{1}{2}\|\mathcal{X} - \mathcal{A}\|_F^2, \qquad (2)$$
$$\min_{\mathbf{X}} \ \lambda \|\mathbf{X}\|_* + \frac{1}{2}\|\mathbf{X} - \mathbf{A}\|_F^2, \qquad (3)$$
where $\mathcal{A}$ and $\mathbf{A}$ are a given data tensor and matrix, respectively, and $\lambda \geq 0$ is a regularization factor. To address these two optimization problems, we first define the soft-thresholding (shrinkage) operator $\mathcal{S}_\lambda(\cdot): \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N} \to \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ entrywise as $(\mathcal{S}_\lambda(\mathcal{A}))_{i_1 i_2 \cdots i_N} = \mathrm{sgn}(a_{i_1 i_2 \cdots i_N}) \max(|a_{i_1 i_2 \cdots i_N}| - \lambda, 0)$. It has been proven that problems (2) and (3) have the closed-form solutions $\mathcal{S}_\lambda(\mathcal{A})$ [2] and $T_\lambda(\mathbf{A})$ [23], respectively, where $T_\lambda(\mathbf{A}) = \mathbf{U} \mathcal{S}_\lambda(\boldsymbol{\Sigma}) \mathbf{V}^{\mathrm{T}}$ and $\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{\mathrm{T}}$ is the singular value decomposition of $\mathbf{A}$.
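Both proximal operators have simple closed forms, which might be sketched as follows; the names `soft_threshold` and `svt` (singular value thresholding) are our own.

```python
import numpy as np

def soft_threshold(A, lam):
    """Entrywise soft-thresholding S_lambda: the closed-form solution of problem (2)."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def svt(M, lam):
    """Singular value thresholding T_lambda: the closed-form solution of problem (3)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt
```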

3. Related Works

Tensor completion and MRPCA are two important and appealing applications of LRTR. In this section, we review the related works on the aforementioned two applications.
As the higher order generalization of matrix completion, tensor completion aims to recover all missing entries with the aid of the low-rank (or approximately low-rank) structure of a data tensor. Although low-rank tensor decompositions are practical for dealing with missing values [15,16,17,18], the rank of the incomplete tensor has to be estimated properly in advance. In the past few years, the matrix nuclear norm minimization model has been extended to the tensor case [13]. Given an incomplete data tensor $\mathcal{D} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a corresponding sampling index set $\Omega \subseteq [I_1] \times [I_2] \times \cdots \times [I_N]$, we define a linear operator $P_\Omega(\cdot): \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N} \to \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ as follows: if $(i_1, i_2, \ldots, i_N) \in \Omega$, then $(P_\Omega(\mathcal{D}))_{i_1 i_2 \cdots i_N} = d_{i_1 i_2 \cdots i_N}$; otherwise, $(P_\Omega(\mathcal{D}))_{i_1 i_2 \cdots i_N} = 0$, where $[I_n] = \{1, 2, \ldots, I_n\}$. Then, the original tensor nuclear norm minimization model for tensor completion [13] is described as:
$$\min_{\mathcal{X}} \ \|\mathcal{X}\|_*, \quad \mathrm{s.t.} \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{D}). \qquad (4)$$
To tackle problem (4), Liu et al. [13] developed three different algorithms and demonstrated experimentally that the ADMM is the most efficient one for obtaining a high-accuracy solution. If dense Gaussian noise is further taken into consideration, the stable version of tensor completion is given as [19]:
$$\min_{\mathcal{X}} \ \|\mathcal{X}\|_*, \quad \mathrm{s.t.} \ \|P_\Omega(\mathcal{X}) - P_\Omega(\mathcal{D})\|_F \leq \eta, \qquad (5)$$
or its corresponding unconstrained formulation:
$$\min_{\mathcal{X}} \ \lambda \|\mathcal{X}\|_* + \frac{1}{2}\|P_\Omega(\mathcal{X}) - P_\Omega(\mathcal{D})\|_F^2, \qquad (6)$$
where the regularization parameter $\lambda \ (\geq 0)$ balances the low-rankness against the approximation error, and $\eta$ is a known estimate of the noise level.
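A possible NumPy realization of the sampling operator $P_\Omega$ is sketched below, with $\Omega$ represented by a Boolean mask; this encoding is our own convention and is not prescribed by the paper.

```python
import numpy as np

def project_omega(D, mask):
    """P_Omega: keep the entries indexed by Omega (mask == True) and zero out the rest."""
    return np.where(mask, D, 0.0)

# Example: observe 70% of the entries of a 50 x 50 x 50 tensor at random.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 50, 50))
mask = rng.random(D.shape) < 0.7
D_obs = project_omega(D, mask)
```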
In RPCA, a data matrix is decomposed into the sum of a low-rank component and a sparse component, and the two components can be recovered simultaneously by principal component pursuit under suitable assumptions [2]. Shi et al. [14] extended RPCA to the tensor case and presented the framework of MRPCA, which regards the data tensor $\mathcal{D}$ as the sum of a low-rank tensor $\mathcal{A}$ and a sparse noise term $\mathcal{E}$. Mathematically, the low-rank and the sparse components can be recovered simultaneously by solving the following tensor nuclear norm minimization problem:
$$\min_{\mathcal{A}, \mathcal{E}} \ \|\mathcal{A}\|_* + \lambda \|\mathcal{E}\|_1, \quad \mathrm{s.t.} \ \mathcal{D} = \mathcal{A} + \mathcal{E}. \qquad (7)$$
In [22], MRPCA is also called robust low-rank tensor recovery.

4. Generalized Low-Rank Tensor Recovery

Neither tensor completion nor MRPCA takes dense Gaussian noise corruptions into account. In this section, we investigate the model of Generalized Low-Rank Tensor Recovery (GLRTR) and develop a corresponding iterative scheme.

4.1. Model of GLRTR

Datasets contaminated by Gaussian noise are ubiquitous in practical engineering applications. In view of this, we assume the data tensor $\mathcal{D}$ to be the superposition of a low-rank component $\mathcal{A}$, a large sparse corruption $\mathcal{E}$ and Gaussian noise $\mathcal{G}$. We also consider the case where some entries of $\mathcal{D}$ are missing. To recover the above three terms simultaneously, we establish a convex GLRTR model as follows:
$$\min_{\mathcal{A}, \mathcal{E}, \mathcal{G}} \ \|\mathcal{A}\|_* + \lambda \|\mathcal{E}\|_1 + \tau \|\mathcal{G}\|_F^2, \quad \mathrm{s.t.} \ P_\Omega(\mathcal{D}) = P_\Omega(\mathcal{A} + \mathcal{E} + \mathcal{G}), \qquad (8)$$
where the regularization coefficients $\lambda$ and $\tau$ are nonnegative.
If we enforce the constraints $\mathcal{G} = \mathcal{O}$ (or equivalently $\tau \to +\infty$) and $\mathcal{E} = \mathcal{O}$ (or equivalently $\lambda \to +\infty$), then GLRTR reduces to the tensor completion model (4), where the zero tensor $\mathcal{O}$ has the same dimensionality as $\mathcal{D}$. If only the constraint $\mathcal{E} = \mathcal{O}$ is imposed, then GLRTR is equivalent to Equation (6). Furthermore, if we take $\mathcal{G} = \mathcal{O}$ and $\Omega = [I_1] \times [I_2] \times \cdots \times [I_N]$, then the GLRTR model becomes the MRPCA model. In summary, the proposed model generalizes the existing LRTR models.
For the convenience of applying the splitting method, we discard $\mathcal{A}$ and introduce $N+1$ auxiliary tensor variables $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N, \mathcal{X}$, all with the same dimensionality as $\mathcal{D}$. Let $\mathbf{M}_{n(n)}$ be the $n$-mode matricization of $\mathcal{M}_n$ for each $n \in [N]$ and $\mathcal{M} = \{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$. Hence, we have the following equivalent formulation of Equation (8):
$$\min_{\mathcal{X}, \mathcal{E}, \mathcal{G}, \mathcal{M}} \ \sum_{n=1}^{N} w_n \|\mathbf{M}_{n(n)}\|_* + \lambda \|\mathcal{E}\|_1 + \tau \|\mathcal{G}\|_F^2, \quad \mathrm{s.t.} \ \mathcal{X} = \mathcal{M}_n + \mathcal{G} + \mathcal{E}, \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{D}), \ n = 1, 2, \ldots, N. \qquad (9)$$
As a matter of fact, the discarded low-rank tensor $\mathcal{A}$ can be represented by $\mathcal{A} = \sum_{n=1}^{N} \mathcal{M}_n / N$. We now explain the low-rankness of $\mathcal{A}$ from two viewpoints. One is that $\|\mathcal{A}\|_* \leq \sum_{n=1}^{N} \|\mathcal{M}_n\|_* / N \leq \max_{1 \leq n \leq N} \|\mathcal{M}_n\|_*$. The other is that a good solution to Equation (9) will satisfy $\mathcal{M}_1 \approx \mathcal{M}_2 \approx \cdots \approx \mathcal{M}_N$. These viewpoints illustrate that $\mathcal{A}$ is approximately low-rank along each mode. The aforementioned non-smooth minimization problem is distributed convex: the tensor variables can be split into several blocks and the objective function is separable across this splitting.

4.2. Optimization Algorithm to GLRTR

As a special splitting method, the ADMM is very efficient for solving distributed optimization problems with linear equality constraints. It takes the form of a decomposition–coordination procedure and blends the merits of dual decomposition and augmented Lagrangian methods. In this section, we apply the ADMM to solve Equation (9).
We first construct the augmented Lagrangian function of the aforementioned convex optimization problem, leaving the constraint $P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{D})$ aside for the moment:
$$L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}) = \sum_{n=1}^{N} w_n \|\mathbf{M}_{n(n)}\|_* + \lambda \|\mathcal{E}\|_1 + \tau \|\mathcal{G}\|_F^2 + \sum_{n=1}^{N} \Big( \langle \mathcal{Y}_n, \mathcal{X} - \mathcal{M}_n - \mathcal{G} - \mathcal{E} \rangle + \frac{\mu}{2} \|\mathcal{X} - \mathcal{M}_n - \mathcal{G} - \mathcal{E}\|_F^2 \Big), \qquad (10)$$
where $\mu$ is a positive numerical constant, $\mathcal{Y}_n$ are Lagrangian multiplier tensors and $\mathcal{Y} = \{\mathcal{Y}_1, \mathcal{Y}_2, \ldots, \mathcal{Y}_N\}$. There are in total five blocks of variables in $L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y})$. The ADMM algorithm updates each block of variables alternately while keeping the other blocks fixed. If $\mathcal{Y}$ is given, one block of variables in $\{\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}\}$ is updated by minimizing $L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y})$ with respect to that block. Conversely, if $\{\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}\}$ are given, $\mathcal{Y}$ is updated by a gradient ascent step on $L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y})$ with respect to $\mathcal{Y}$. The detailed iterative procedure is outlined as follows:
Computing $\mathcal{X}$. Fix $\{\mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}\}$ and minimize $L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y})$ with respect to $\mathcal{X}$:
$$\mathcal{X} := \arg\min_{\mathcal{X}} L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}) = \arg\min_{\mathcal{X}} \ \sum_{n=1}^{N} \|\mathcal{X} - \mathcal{M}_n - \mathcal{G} - \mathcal{E} + \mathcal{Y}_n/\mu\|_F^2 = \tilde{\mathcal{X}}, \qquad (11)$$
where $\tilde{\mathcal{X}} = \sum_{n=1}^{N} (\mathcal{M}_n - \mathcal{Y}_n/\mu)/N + \mathcal{G} + \mathcal{E}$. Taking the constraint $P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{D})$ into account, the final update of $\mathcal{X}$ is revised to
$$\mathcal{X} := P_{\bar{\Omega}}(\tilde{\mathcal{X}}) + P_\Omega(\mathcal{D}), \qquad (12)$$
where $\bar{\Omega}$ is the complement of $\Omega$.
Computing $\mathcal{M}$. If $\mathcal{M}_n$ is unknown and the other variables are fixed, we update $\mathcal{M}_n$ by minimizing $L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y})$ with respect to $\mathcal{M}_n$:
$$\mathcal{M}_n := \arg\min_{\mathcal{M}_n} L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}) = \arg\min_{\mathcal{M}_n} \ w_n \|\mathbf{M}_{n(n)}\|_* + \frac{\mu}{2} \big\|(\mathcal{X} - \mathcal{G} - \mathcal{E} + \mathcal{Y}_n/\mu) - \mathcal{M}_n\big\|_F^2 = \arg\min_{\mathcal{M}_n} \ \frac{w_n}{\mu} \|\mathbf{M}_{n(n)}\|_* + \frac{1}{2} \big\|\mathbf{Q}_{(n)} - \mathbf{M}_{n(n)}\big\|_F^2 = \mathrm{fold}_n\big(T_{w_n/\mu}(\mathbf{Q}_{(n)})\big), \qquad (13)$$
where $\mathcal{Q} = \mathcal{X} - \mathcal{G} - \mathcal{E} + \mathcal{Y}_n/\mu$ and $\mathbf{Q}_{(n)}$ is the $n$-mode matricization of $\mathcal{Q}$.
Computing $\mathcal{E}$. If $\mathcal{E}$ is unknown and the other variables are fixed, the calculation of $\mathcal{E}$ proceeds as follows:
$$\mathcal{E} := \arg\min_{\mathcal{E}} L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}) = \arg\min_{\mathcal{E}} \ \frac{\lambda}{\mu N} \|\mathcal{E}\|_1 + \frac{1}{2} \Big\|\mathcal{E} - \sum_{n=1}^{N} (\mathcal{X} - \mathcal{M}_n - \mathcal{G} + \mathcal{Y}_n/\mu)/N\Big\|_F^2 = \mathcal{S}_{\lambda/(\mu N)}\Big(\sum_{n=1}^{N} (\mathcal{X} - \mathcal{M}_n - \mathcal{G} + \mathcal{Y}_n/\mu)/N\Big). \qquad (14)$$
Computing $\mathcal{G}$. The update of $\mathcal{G}$ is computed as
$$\mathcal{G} := \arg\min_{\mathcal{G}} L_\mu(\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}) = \arg\min_{\mathcal{G}} \ \tau \|\mathcal{G}\|_F^2 + \frac{\mu N}{2} \Big\|\mathcal{G} - \sum_{n=1}^{N} (\mathcal{X} - \mathcal{M}_n - \mathcal{E} + \mathcal{Y}_n/\mu)/N\Big\|_F^2 = \frac{\mu}{2\tau + \mu N} \sum_{n=1}^{N} (\mathcal{X} - \mathcal{M}_n - \mathcal{E} + \mathcal{Y}_n/\mu). \qquad (15)$$
Computing $\mathcal{Y}$. If $\{\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}\}$ are fixed, the Lagrangian multipliers $\mathcal{Y}$ are updated as follows:
$$\mathcal{Y}_n := \mathcal{Y}_n + \mu (\mathcal{X} - \mathcal{M}_n - \mathcal{G} - \mathcal{E}), \quad n = 1, 2, \ldots, N. \qquad (16)$$
Once $\{\mathcal{X}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}\}$ are updated, we increase the value of $\mu$ by multiplying it by a constant $\rho \ (>1)$ at each iteration. The whole iterative procedure is outlined in Algorithm 1. The stopping condition of Algorithm 1 is $\max_{1 \leq n \leq N} \|\mathcal{X} - \mathcal{M}_n - \mathcal{G} - \mathcal{E}\|_F^2 < \varepsilon$ or reaching the maximum number of iterations, where $\varepsilon$ is a sufficiently small positive number.
Algorithm 1. Solving GLRTR by ADMM
Input: Data tensor $\mathcal{D}$, sampling index set $\Omega$, regularization parameters $\lambda$ and $\tau$.
Initialize: $\mathcal{M}$, $\mathcal{E}$, $\mathcal{G}$, $\mathcal{Y}$, $\mu$, $\mu_{\max}$ and $\rho$.
Output: $\mathcal{X}$, $\mathcal{E}$, $\mathcal{G}$ and $\mathcal{M}$.
While not converged do
  • Update $\mathcal{X}$ according to (12).
  • Update $\mathcal{M}_n$ according to (13), $n = 1, 2, \ldots, N$.
  • Update $\mathcal{E}$ according to (14).
  • Update $\mathcal{G}$ according to (15).
  • Update $\mathcal{Y}_n$ according to (16), $n = 1, 2, \ldots, N$.
  • Update $\mu$ as $\mu := \min(\rho\mu, \mu_{\max})$.
End while
In our implementation, $\mathcal{M}$, $\mathcal{E}$, $\mathcal{G}$ and $\mathcal{Y}$ are initialized to zero tensors. The other parameters are set as follows: $w_1 = w_2 = \cdots = w_N = 1/N$, $\mu = 10^{-4}$, $\rho = 1.1$, $\mu_{\max} = 10^{10}$ and $\varepsilon = 10^{-8}$. Furthermore, the maximum number of iterations is set to 100.
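Putting the update rules (11)-(16) together, a minimal sketch of Algorithm 1 might look as follows. It reuses the `unfold`, `fold`, `svt` and `soft_threshold` helpers sketched in Section 2, and the interface (function name, Boolean-mask representation of $\Omega$, returned values) is our own assumption rather than the authors' implementation.

```python
import numpy as np

def glrtr_admm(D, mask, lam, tau, mu=1e-4, rho=1.1, mu_max=1e10, eps=1e-8, max_iter=100):
    """Sketch of Algorithm 1: solves model (9) by ADMM.
    D    : data tensor whose unobserved entries may hold arbitrary values (e.g. zeros)
    mask : Boolean tensor, True on the sampling set Omega
    Returns the low-rank estimate A = mean_n M_n together with E, G and the completed X."""
    N = D.ndim
    w = np.full(N, 1.0 / N)                       # weights of the tensor nuclear norm
    M = [np.zeros_like(D) for _ in range(N)]      # auxiliary low-rank tensors M_n
    Y = [np.zeros_like(D) for _ in range(N)]      # Lagrange multiplier tensors Y_n
    E = np.zeros_like(D)                          # sparse component
    G = np.zeros_like(D)                          # small dense (Gaussian) component
    X = np.where(mask, D, 0.0)

    for _ in range(max_iter):
        # X update, Equations (11)-(12): average of M_n - Y_n/mu plus G + E,
        # then overwrite the observed entries with the data.
        X_tilde = sum(M[n] - Y[n] / mu for n in range(N)) / N + G + E
        X = np.where(mask, D, X_tilde)

        # M_n update, Equation (13): singular value thresholding of the mode-n unfolding.
        for n in range(N):
            Q = X - G - E + Y[n] / mu
            M[n] = fold(svt(unfold(Q, n), w[n] / mu), n, D.shape)

        # E update, Equation (14): soft-thresholding of the averaged residual.
        R = sum(X - M[n] - G + Y[n] / mu for n in range(N)) / N
        E = soft_threshold(R, lam / (mu * N))

        # G update, Equation (15): closed-form ridge-type solution.
        T = sum(X - M[n] - E + Y[n] / mu for n in range(N))
        G = mu / (2.0 * tau + mu * N) * T

        # Multiplier update, Equation (16), and the stopping criterion of Algorithm 1.
        residual = 0.0
        for n in range(N):
            Rn = X - M[n] - G - E
            Y[n] = Y[n] + mu * Rn
            residual = max(residual, np.sum(Rn ** 2))
        mu = min(rho * mu, mu_max)
        if residual < eps:
            break

    A = sum(M) / N
    return A, E, G, X
```

For the synthetic setting of Section 6.1, a call such as `A_hat, E_hat, G_hat, X_hat = glrtr_admm(D_obs, mask, lam=0.025, tau=0.02)` would reproduce the kind of run reported in Table 1 (the variable names are, of course, ours).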

5. Convergence Analysis

Because the number of variable blocks in Equation (9) is more than two, it is difficult to prove the convergence of Algorithm 1. Nevertheless, the experimental results in the next section demonstrate that this algorithm has good convergence behavior. This section discusses a weak convergence result for our ADMM algorithm.
We consider a special case of Algorithm 1 in which there are no missing entries. In this case, $\mathcal{X} = \mathcal{D}$ holds, and Equation (9) becomes
$$\min_{\mathcal{M}, \mathcal{E}, \mathcal{G}} \ \sum_{n=1}^{N} w_n \|\mathbf{M}_{n(n)}\|_* + \lambda \|\mathcal{E}\|_1 + \tau \|\mathcal{G}\|_F^2, \quad \mathrm{s.t.} \ \mathcal{D} = \mathcal{M}_n + \mathcal{G} + \mathcal{E}, \ n = 1, 2, \ldots, N. \qquad (17)$$
We can design a corresponding ADMM algorithm by revising Algorithm 1, namely, replacing the update of $\mathcal{X}$ with $\mathcal{X} = \mathcal{D}$. In fact, the revised algorithm is an inexact version of ADMM. Subsequently, we give the iterative scheme of the exact ADMM for Equation (17):
$$\left\{ \begin{aligned} \mathcal{M} &:= \arg\min_{\mathcal{M}} L_\mu(\mathcal{D}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}), \\ (\mathcal{E}, \mathcal{G}) &:= \arg\min_{\mathcal{E}, \mathcal{G}} L_\mu(\mathcal{D}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}), \\ \mathcal{Y}_n &:= \mathcal{Y}_n + \mu (\mathcal{D} - \mathcal{M}_n - \mathcal{G} - \mathcal{E}), \quad n = 1, 2, \ldots, N. \end{aligned} \right. \qquad (18)$$
In this circumstance, we have the following statement:
Theorem 1. 
Let $\{\mathcal{M}^{(k)}, \mathcal{E}^{(k)}, \mathcal{G}^{(k)}, \mathcal{Y}^{(k)}\}$ be the sequence produced by (18). If $L_0(\mathcal{D}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y})$ has a saddle point with respect to $\{\mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}\}$, then the sequence of objective values of Equation (17) evaluated at $\{\mathcal{M}^{(k)}, \mathcal{E}^{(k)}, \mathcal{G}^{(k)}\}$ converges to the minimum value.
Proof. 
By matricizing each tensor along mode one, the constraints of Equation (17) can be rewritten as a linear equation:
$$\begin{pmatrix} \mathbf{D}_{(1)} \\ \mathbf{D}_{(1)} \\ \vdots \\ \mathbf{D}_{(1)} \end{pmatrix} = \begin{pmatrix} \mathbf{M}_{1(1)} \\ \mathbf{M}_{2(1)} \\ \vdots \\ \mathbf{M}_{N(1)} \end{pmatrix} + \begin{pmatrix} \mathbf{G}_{(1)} \\ \mathbf{G}_{(1)} \\ \vdots \\ \mathbf{G}_{(1)} \end{pmatrix} + \begin{pmatrix} \mathbf{E}_{(1)} \\ \mathbf{E}_{(1)} \\ \vdots \\ \mathbf{E}_{(1)} \end{pmatrix}, \qquad (19)$$
or equivalently,
$$\begin{pmatrix} \mathbf{I} \\ \mathbf{I} \\ \vdots \\ \mathbf{I} \end{pmatrix} \mathbf{D}_{(1)} = \begin{pmatrix} \mathbf{M}_{1(1)} \\ \mathbf{M}_{2(1)} \\ \vdots \\ \mathbf{M}_{N(1)} \end{pmatrix} + \begin{pmatrix} \mathbf{I} \\ \mathbf{I} \\ \vdots \\ \mathbf{I} \end{pmatrix} \mathbf{G}_{(1)} + \begin{pmatrix} \mathbf{I} \\ \mathbf{I} \\ \vdots \\ \mathbf{I} \end{pmatrix} \mathbf{E}_{(1)}, \qquad (20)$$
where $\mathbf{I}$ is the identity matrix of size $I_1 \times I_1$. If we combine $\mathbf{G}_{(1)}$ and $\mathbf{E}_{(1)}$ into one block of variables, then Equation (20) is re-expressed as:
$$\begin{pmatrix} \mathbf{I} \\ \mathbf{I} \\ \vdots \\ \mathbf{I} \end{pmatrix} \mathbf{D}_{(1)} = \begin{pmatrix} \mathbf{M}_{1(1)} \\ \mathbf{M}_{2(1)} \\ \vdots \\ \mathbf{M}_{N(1)} \end{pmatrix} + \begin{pmatrix} \mathbf{I} & \mathbf{I} \\ \mathbf{I} & \mathbf{I} \\ \vdots & \vdots \\ \mathbf{I} & \mathbf{I} \end{pmatrix} \begin{pmatrix} \mathbf{G}_{(1)} \\ \mathbf{E}_{(1)} \end{pmatrix}. \qquad (21)$$
We denote $\mathbf{M} = \{\mathbf{M}_{1(1)}, \mathbf{M}_{2(1)}, \ldots, \mathbf{M}_{N(1)}\}$ and partition all variables in Equation (17) into two parts in matrix form: $\mathbf{M}$ and $(\mathbf{E}_{(1)}, \mathbf{G}_{(1)})$. Essentially, $\mathbf{M}$, $\mathbf{E}_{(1)}$ and $\mathbf{G}_{(1)}$ are the matricizations of $\mathcal{M}$, $\mathcal{E}$ and $\mathcal{G}$, respectively. Let $g(\mathbf{M}) = \sum_{n=1}^{N} w_n \|\mathbf{M}_{n(n)}\|_*$ and $h(\mathbf{E}_{(1)}, \mathbf{G}_{(1)}) = \lambda \|\mathbf{E}_{(1)}\|_1 + \tau \|\mathbf{G}_{(1)}\|_F^2$. Then, the objective function in Equation (17) can be expressed as $g(\mathbf{M}) + h(\mathbf{E}_{(1)}, \mathbf{G}_{(1)})$. Obviously, $g(\mathbf{M})$ and $h(\mathbf{E}_{(1)}, \mathbf{G}_{(1)})$ are closed, proper and convex functions. According to the basic convergence result given in [24], we have $\lim_{k \to +\infty} g(\mathbf{M}^{(k)}) + h(\mathbf{E}_{(1)}^{(k)}, \mathbf{G}_{(1)}^{(k)}) = f^*$, where $f^*$ is the minimum value of $g(\mathbf{M}) + h(\mathbf{E}_{(1)}, \mathbf{G}_{(1)})$ under the linear constraint (21). This ends the proof. □
In the following, we give the detailed iterative procedure for $\{\mathbf{M}, \mathbf{E}_{(1)}, \mathbf{G}_{(1)}\}$, or equivalently $\{\mathcal{M}, \mathcal{E}, \mathcal{G}\}$, in Equation (18). It is easy to verify that the update of $\mathcal{M}$ in Equation (18) is equivalent to:
$$(\mathbf{M}_{1(1)}, \mathbf{M}_{2(1)}, \ldots, \mathbf{M}_{N(1)}) := \arg\min_{\mathbf{M}} L_\mu(\mathcal{D}, \mathcal{M}, \mathcal{E}, \mathcal{G}, \mathcal{Y}) = \arg\min_{(\mathbf{M}_{1(1)}, \ldots, \mathbf{M}_{N(1)})} \ \sum_{n=1}^{N} \Big( w_n \|\mathbf{M}_{n(n)}\|_* + \frac{\mu}{2} \Big\|\mathcal{D} - \mathcal{M}_n - \mathcal{G} - \mathcal{E} + \frac{1}{\mu}\mathcal{Y}_n\Big\|_F^2 \Big). \qquad (22)$$
The blocks of variables $\mathbf{M}_{n(1)}$ can be solved in parallel because the objective function in Equation (22) is separable with respect to $\mathbf{M}_{1(1)}, \mathbf{M}_{2(1)}, \ldots, \mathbf{M}_{N(1)}$. Hence, the iterative formulation of $\mathcal{M}_n$ is similar to that in Equation (13).
For fixed $\mathbf{M}$ and $\mathcal{Y}$, we can obtain the optimal block of variables $(\mathcal{E}, \mathcal{G})$, or $(\mathbf{E}_{(1)}, \mathbf{G}_{(1)})$, by minimizing $f(\mathcal{E}, \mathcal{G})$, where $f(\mathcal{E}, \mathcal{G}) = \lambda \|\mathcal{E}\|_1 + \tau \|\mathcal{G}\|_F^2 + \frac{\mu}{2} \sum_{n=1}^{N} \|(\mathcal{D} - \mathcal{M}_n + \mathcal{Y}_n/\mu) - (\mathcal{G} + \mathcal{E})\|_F^2$. The partial derivative of $f(\mathcal{E}, \mathcal{G})$ with respect to $\mathcal{G}$ is
$$\frac{\partial f}{\partial \mathcal{G}} = 2\tau \mathcal{G} + \mu \sum_{n=1}^{N} \big( (\mathcal{G} + \mathcal{E}) - (\mathcal{D} - \mathcal{M}_n + \mathcal{Y}_n/\mu) \big). \qquad (23)$$
By setting $\frac{\partial f}{\partial \mathcal{G}} = 0$, we have
$$\mathcal{G} := \delta \mathcal{T} - \delta N \mathcal{E}, \qquad (24)$$
where $\delta = \frac{\mu}{2\tau + \mu N}$ and $\mathcal{T} = \sum_{n=1}^{N} (\mathcal{D} - \mathcal{M}_n + \mathcal{Y}_n/\mu)$. Substituting Equation (24) into $f(\mathcal{E}, \mathcal{G})$, we obtain the update formulation of $\mathcal{E}$:
$$\begin{aligned} \mathcal{E} :=\ & \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \tau \|\delta \mathcal{T} - \delta N \mathcal{E}\|_F^2 + \frac{\mu}{2} \sum_{n=1}^{N} \big\| (\mathcal{D} - \mathcal{M}_n + \mathcal{Y}_n/\mu) - \delta \mathcal{T} + \delta N \mathcal{E} - \mathcal{E} \big\|_F^2 \\ =\ & \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \tau \delta^2 N^2 \Big\| \mathcal{E} - \frac{1}{N}\mathcal{T} \Big\|_F^2 + \frac{\mu (1 - \delta N)^2 N}{2} \Big\| \mathcal{E} - \frac{1}{(1-\delta N)N} \sum_{n=1}^{N} (\mathcal{D} - \mathcal{M}_n + \mathcal{Y}_n/\mu - \delta \mathcal{T}) \Big\|_F^2 \\ =\ & \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \tau \delta^2 N^2 \Big\| \mathcal{E} - \frac{1}{N}\mathcal{T} \Big\|_F^2 + \frac{\mu (1 - \delta N)^2 N}{2} \Big\| \mathcal{E} - \frac{1}{N}\mathcal{T} \Big\|_F^2 \\ =\ & \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \Big( \tau \delta^2 N^2 + \frac{\mu (1 - \delta N)^2 N}{2} \Big) \Big\| \mathcal{E} - \frac{1}{N}\mathcal{T} \Big\|_F^2 \\ =\ & \mathcal{S}_{\lambda / (2\tau \delta^2 N^2 + \mu (1 - \delta N)^2 N)} \Big( \frac{1}{N}\mathcal{T} \Big). \end{aligned} \qquad (25)$$
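For completeness, the joint $(\mathcal{E}, \mathcal{G})$ update of scheme (18), i.e., Equations (24) and (25), can be sketched as below, reusing the `soft_threshold` helper from Section 2; the function name and the list-based interface are our own assumptions.

```python
import numpy as np

def joint_E_G_update(D, M, Y, mu, lam, tau):
    """Exact joint minimization over (E, G) in scheme (18), following Equations (24)-(25).
    D is the data tensor; M and Y are lists of the N auxiliary tensors M_n and multipliers Y_n."""
    N = len(M)
    T = sum(D - M[n] + Y[n] / mu for n in range(N))
    delta = mu / (2.0 * tau + mu * N)
    thresh = lam / (2.0 * tau * delta ** 2 * N ** 2 + mu * (1.0 - delta * N) ** 2 * N)
    E = soft_threshold(T / N, thresh)    # Equation (25)
    G = delta * T - delta * N * E        # Equation (24), evaluated at the new E
    return E, G
```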
A standard extension of the classic ADMM is to use a varying penalty parameter $\mu$ at each iteration. The goal of this extension is to improve convergence and reduce the dependence on the initial choice of $\mu$. In the context of the method of multipliers, this approach is proven to be superlinearly convergent if $\mu^{(k)} \to +\infty$ [25], which inspires us to adopt a non-decreasing sequence $\{\mu^{(k)}\}$. Furthermore, large values of $\mu$ place a large penalty on violations of primal feasibility and thus tend to produce small primal residuals. However, the convergence of ADMM with a varying $\mu$ is difficult to prove in general [24]. A commonly used choice for $\{\mu^{(k)}\}$ is $\mu^{(k+1)} := \min(\rho \mu^{(k)}, \mu_{\max})$, where $\rho > 1$ and $\mu_{\max}$ is an upper bound on $\mu$. Since $\mu^{(k)}$ becomes fixed after a finite number of iterations, the corresponding ADMM is convergent according to Theorem 1.

6. Experimental Results

We perform experiments on synthetic data and two real-world video sequences to validate the feasibility and effectiveness of the proposed method. The experimental results of GLRTR are compared with those of MRPCA, where the missing values in MRPCA are replaced by zeros.

6.1. Synthetic Data

In this subsection, we synthesize data tensors with missing entries. First, we generate an $N$th-order low-rank tensor as $\mathcal{A} = \mathcal{C} \times_{n=1}^{N} \mathbf{U}^{(n)}$, with the core tensor $\mathcal{C} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ and mode matrices $\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times J_n}$ ($J_n < I_n$). The entries of $\mathcal{C}$ and $\mathbf{U}^{(n)}$ are drawn independently from the standard normal distribution. Then, we randomly generate a dense noise tensor $\mathcal{G} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ whose entries also obey the standard normal distribution. Next, we construct a sparse noise tensor $P_{\Omega'}(\mathcal{E})$, where $\mathcal{E} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is drawn from a uniform distribution on the interval $(-a, a)$ and the index set $\Omega'$ is produced by uniform sampling on $[I_1] \times [I_2] \times \cdots \times [I_N]$ with probability $p'\%$. Finally, the sampling index set $\Omega$ is generated in the same way as $\Omega'$, with the sampling rate set to $p\%$. An incomplete data tensor is therefore synthesized as $\mathcal{D}_\Omega = P_\Omega(\mathcal{A} + \mathcal{G} + \mathcal{E})$.
For a given data tensor $\mathcal{D}_\Omega$ with missing values, the low-rank component recovered by a given method is denoted by $\hat{\mathcal{A}}$. The Relative Error (RE), defined as $\mathrm{RE} = \|\hat{\mathcal{A}} - \mathcal{A}\|_F / \|\mathcal{A}\|_F$, is employed to evaluate the recovery performance of the low-rank structure; a small relative error means good recovery performance. The experiments are carried out on 50 × 50 × 50 tensors and 20 × 20 × 20 × 20 tensors, respectively. Furthermore, we set $a = 500$ and $J_1 = J_2 = \cdots = J_N = r$.
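The synthetic data generation and the RE metric described above might be sketched as follows, reusing `mode_n_product` from Section 2; the function names, the seed handling and the mask-based encoding of $\Omega$ are our own assumptions.

```python
import numpy as np

def make_synthetic(dims, r, a, p_sparse, p_sample, seed=0):
    """Generates D_Omega = P_Omega(A + G + E) as described above.
    dims: tensor size, r: multi-linear rank, a: sparse-noise magnitude,
    p_sparse, p_sample: sparsity and sampling probabilities in percent."""
    rng = np.random.default_rng(seed)
    # Low-rank component A = C x_1 U^(1) ... x_N U^(N)
    A = rng.standard_normal((r,) * len(dims))
    for n, In in enumerate(dims):
        A = mode_n_product(A, rng.standard_normal((In, r)), n)
    G = rng.standard_normal(dims)                                          # dense Gaussian noise
    E = rng.uniform(-a, a, dims) * (rng.random(dims) < p_sparse / 100.0)   # sparse corruption
    mask = rng.random(dims) < p_sample / 100.0                             # sampling set Omega
    D_obs = np.where(mask, A + G + E, 0.0)
    return D_obs, mask, A

def relative_error(A_hat, A):
    """RE = ||A_hat - A||_F / ||A||_F."""
    return np.linalg.norm(A_hat - A) / np.linalg.norm(A)
```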
For convenience of comparison, we design three groups of experiments. In the first group, we only consider the case without missing entries, that is, $\Omega = [I_1] \times [I_2] \times \cdots \times [I_N]$ or $p = 100$. The values of $\|\mathcal{E}\|_F / \|\mathcal{A}\|_F$ and $\|\mathcal{G}\|_F / \|\mathcal{A}\|_F$ are adopted as the Inverse Signal-to-Noise Ratio (ISNR) with respect to the sparse and the Gaussian noise, respectively. Three different degrees of sparsity are considered, namely $p' = 5$, 10 and 15. In addition, we take $\lambda = 0.025$, $\tau = 0.02$, $r \in \{3, 5\}$ for 3rd-order tensors and $\lambda = 0.03$, $\tau = 0.01$, $r \in \{2, 4\}$ for 4th-order tensors. For given parameters, we repeat the experiments ten times and report the average results. As a low-rank approximation method for tensors, the Higher-Order SVD (HOSVD) truncation [12] to rank-$(r_1, r_2, \ldots, r_N)$ is not suitable for gross corruptions, since its relative error reaches 97% or even 100%; hence, we do not compare this method with GLRTR in the subsequent experiments. The experimental results are shown in Table 1 and Table 2, respectively.
From these two tables, we make the following observations: (I) Although the ISNR values for sparse noise are very large, both MRPCA and GLRTR remove the sparse noise effectively to some extent; meanwhile, large ISNR values for Gaussian noise are disadvantageous for recovering the low-rank components. (II) GLRTR has better recovery performance than MRPCA: on average, the relative error of GLRTR is 2.68% smaller than that of MRPCA for 50 × 50 × 50 tensors, and 14.17% smaller for 20 × 20 × 20 × 20 tensors. (III) For 3rd-order tensors, GLRTR removes Gaussian noise effectively, and, on average, its relative error is 5.96% smaller than the ISNR for Gaussian noise; for 4th-order tensors, GLRTR effectively removes Gaussian noise only in the case $r = 2$. In summary, GLRTR is more effective than MRPCA in recovering the low-rank components.
The second group of experiments considers four different sampling rates for $\Omega$ and one fixed degree of sparsity for $\Omega'$, that is, $p \in \{30, 50, 70, 90\}$ and $p' = 5$. We set $\tau = 0.02$ for both 3rd-order and 4th-order tensors, and choose the best tradeoff parameter $\lambda$ for each $p$. The comparisons between MRPCA and GLRTR are shown in Table 3 and Table 4, respectively. We can see from these two tables that MRPCA is very sensitive to the sampling rate $p\%$ and hardly recovers the low-rank components. In contrast, GLRTR achieves better recovery performance for 3rd-order tensors; for 4th-order tensors, it also attains a smaller relative error when the sampling rate $p\%$ is relatively large. These observations show that GLRTR is more robust to missing values than MRPCA.
We evaluate the sensitivity of GLRTR to the choice of $\lambda$ and $\tau$ in the last group of experiments. For convenience, we only perform experiments on 50 × 50 × 50 tensors and consider the case $p = 100$. The values of $\lambda$ and $\tau$ are set in the following manner: we vary the value of one parameter while keeping the other fixed. In the first case, the parameter $\tau$ is chosen as 0.01, and the relative errors of MRPCA and GLRTR versus $\lambda$ are shown in Figure 1. In the second case, we take $\lambda = 0.01$ and the relative errors of GLRTR versus $\tau$ are shown in Figure 2.
It can be seen from Figure 1 that the relative errors of MRPCA and GLRTR are about 9.90% and 4.62%, respectively, when 0.02 ≤ λ ≤ 0.07, which means the latter has better recovery performance than the former. Furthermore, their relative errors are relatively stable when λ lies within a certain interval. Figure 2 illustrates that the relative error tends to increase monotonically with τ and becomes almost stationary when τ ≥ 1; at that point, the relative errors lie in the interval (0.037, 0.080). This group of experiments implies that, for our synthetic data, GLRTR is not very sensitive to the choice of λ and τ.

6.2. Influence of Noise and Sampling Rate on the Relative Error

This subsection will evaluate the influence of noise and sampling rate on the relative error. For this purpose, we design four groups of experiments and use the synthetic data generated in the same manner as in the previous subsection. For the sake of convenience, we only carry out experiments on 50 × 50 × 50 tensors.
The first group of experiments investigates the influence of noise on the recovery performance. In the data generation process, we only change the manner of generating $\mathcal{G}$: each entry of $\mathcal{G}$ is drawn independently from the normal distribution with mean zero and standard deviation $b$. Let $r = 5$, $p' = 10$, $p = 100$, $a = 50i$ and $b = 0.2j$, where $i, j \in \{0, 1, \ldots, 10\}$. For different combinations of $a$ and $b$, the relative errors of GLRTR are shown in Figure 3. We can draw two conclusions from this figure. For given $b$, the relative error is relatively stable as $a$ increases, which means the relative error is not very sensitive to the magnitude of the sparse noise. The relative error increases monotonically with the Gaussian noise level, which confirms that large Gaussian noise is disadvantageous for recovering the low-rank component.
Next, we study the influence of the sampling rate $p\%$ on the relative error for different $r$. We set $p' = 10$, $a = 500$, $b = 1$, $r = 1 + 2i$, $i = 1, 2, \ldots, 14$, and vary $p$ from 30 to 100 in steps of 10. For fixed $r$ and $p$, we obtain the low-rank component by GLRTR and plot the relative errors in Figure 4. From the 3-D colored surface in Figure 4, we can see that both $r$ and $p$ have significant influence on the relative error. This observation indicates that a small $r$ or a large $p$ is conducive to recovering the low-rank term.
The third group of experiments validates the robustness of GLRTR to sparse noise. Concretely, we investigate the recovery performance under different ISNR for sparse noise without Gaussian noise. We set $r \in \{3, 5\}$, $p' = 10$, $p = 100$, $b = 0$ and $a = 2^i \times 500$, $i = -15, -14, \ldots, 10$. The experimental results are shown in Figure 5, where the horizontal and vertical coordinates represent the ISNR for sparse noise and the relative error, respectively. This figure illustrates that the relative error is less than 4.5% for the synthetic 3rd-order tensors, which verifies experimentally that our method is very robust to sparse noise.
In the last group of experiments, we examine the performance of GLRTR in removing small dense noise in the presence of large sparse noise. We also consider a combination strategy, GLRTR + HOSVD, in which GLRTR is followed by an HOSVD truncation; the goal of this combined method is to improve the denoising performance of GLRTR. Let $r \in \{3, 5\}$, $p' = 10$, $p = 100$, $a = 500$ and $b = 0.1i$, $i = 0, 1, \ldots, 50$. Different values of $b$ lead to different ISNR for Gaussian noise. We draw four curves to reflect the relationship between the relative error and the ISNR for Gaussian noise, as shown in Figure 6, where the black dashed line is a reference line. We make two observations from this figure. When the ISNR for Gaussian noise is larger than 3.5%, GLRTR not only separates the sparse noise to some extent but also effectively removes the Gaussian noise. The GLRTR + HOSVD method has better denoising performance than GLRTR in the presence of large Gaussian noise.
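The HOSVD truncation used in the GLRTR + HOSVD combination can be sketched as follows (a plain, non-sequential truncated HOSVD, reusing the `unfold` and `mode_n_product` helpers from Section 2); this is our own illustrative implementation, not the authors' code.

```python
import numpy as np

def hosvd_truncate(A, ranks):
    """Rank-(r_1, ..., r_N) HOSVD truncation: project each mode onto its leading
    left singular vectors and reconstruct; used here as post-processing after GLRTR."""
    Us = []
    for n, rn in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(A, n), full_matrices=False)
        Us.append(U[:, :rn])
    # Core tensor C = A x_1 U1^T ... x_N UN^T, then reconstruct A_hat = C x_n U_n.
    C = A
    for n, Un in enumerate(Us):
        C = mode_n_product(C, Un.T, n)
    A_hat = C
    for n, Un in enumerate(Us):
        A_hat = mode_n_product(A_hat, Un, n)
    return A_hat
```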

6.3. Applications in Background Modeling

In this subsection, we test our method on two real-world surveillance videos for object detection and background subtraction: the Lobby and Bootstrap datasets [26]. For convenience of computation, we only consider the first 200 frames of each dataset and convert the color images into gray-level images. The resolutions of the images in the Lobby and Bootstrap datasets are 128 × 160 and 120 × 160, respectively. We add Gaussian noise with mean zero and standard deviation 5 to each image. Hence, we obtain two 3rd-order data tensors of sizes 128 × 160 × 200 and 120 × 160 × 200, respectively. For these two tensors, we randomly sample their entries with a probability of 50%.
Considering that MRPCA fails to recover the low-rank components on the synthetic data with missing values, we only apply GLRTR to the video datasets. The two tradeoff parameters are set as λ = 0.0072 and τ = 0.001. According to the proposed method, we obtain the low-rank, the sparse and the completed components from the incomplete data tensors. The low-rank terms correspond to the backgrounds and the sparse noise terms correspond to the foregrounds. The experimental results are partially shown in Figure 7 and Figure 8, respectively, where the missing entries in the incomplete images are shown in white. From Figure 7 and Figure 8, we can see that GLRTR efficiently recovers the low-rank images and the sparse noise images. Moreover, we observe from the recovered images that a large proportion of the missing entries are effectively completed.
To evaluate the completion performance of GLRTR, we define the Relative Approximate Error (RAE) as $\mathrm{RAE} = \|\hat{\mathcal{D}} - \mathcal{D}\|_F / \|\mathcal{D}\|_F$, where $\mathcal{D}$ is the original video tensor without Gaussian noise corruptions and missing entries, and $\hat{\mathcal{D}}$ is the approximation of $\mathcal{D}$. The RAE is 8.94% on the Lobby dataset and 20.11% on the Bootstrap dataset. These results demonstrate that GLRTR can approximately complete the missing entries to a certain degree. There are two reasons why the Bootstrap dataset has a relatively large RAE: one is its more complex foreground, and the other is that entries belonging to the foreground cannot be recovered when they are missing. In summary, GLRTR is robust to gross corruptions, Gaussian noise and missing values.

7. Conclusions

In this paper, we investigate a generalized model of LRTR in which large sparse corruption, missing entries and Gaussian noise are taken into account. For the generalized LRTR, we establish an optimization problem that minimizes the weighted combination of the tensor nuclear norm, the l1 norm and the Frobenius norm. To address this minimization problem, we present an iterative scheme based on the technique of ADMM. The experimental results on synthetic data and real-world video datasets illustrate that the proposed method is efficient and feasible in recovering the low-rank components and completing missing entries. In the future, we will consider the theoretical conditions for exact recoverability and other scalable algorithms.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (No. 61403298, No. 11526161 and No. 11401357), and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2014JQ8323).

Author Contributions

Jiarong Shi constructed the model, developed the algorithm and wrote the manuscript. Qingyan Yin designed the experiments. Xiuyun Zheng and Wei Yang implemented all experiments. All four authors were involved in organizing and refining the manuscript. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717–772. [Google Scholar] [CrossRef]
  2. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM. 2011, 58, 37. [Google Scholar] [CrossRef]
  3. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184. [Google Scholar] [CrossRef] [PubMed]
  4. Candès, E.J.; Plan, Y. Matrix completion with noise. Proc. IEEE 2010, 98, 925–936. [Google Scholar] [CrossRef]
  5. Zhou, Z.; Li, X.; Wright, J.; Candès, E.J.; Ma, Y. Stable principal component pursuit. In Proceedings of the 2010 IEEE International Symposium on Information Theory Proceedings (ISIT), Austin, TX, USA, 13–18 June 2010.
  6. Xu, H.; Caramanis, C.; Sanghavi, S. Robust PCA via outlier pursuit. IEEE Trans. Inf. Theory. 2012, 58, 3047–3064. [Google Scholar] [CrossRef]
  7. Shi, J.; Zheng, X.; Yong, L. Incomplete robust principal component analysis. ICIC Express Letters, Part B Appl. 2014, 5, 1531–1538. [Google Scholar]
  8. Shi, J.; Yang, W.; Yong, L.; Zheng, X. Low-rank representation for incomplete data. Math. Probl. Eng. 2014, 10. [Google Scholar] [CrossRef]
  9. Liu, Y.; Jiao, L.C.; Shang, F.; Yin, F.; Liu, F. An efficient matrix bi-factorization alternative optimization method for low-rank matrix recovery and completion. Neural Netw. 2013, 48, 8–18. [Google Scholar] [CrossRef] [PubMed]
  10. Liu, Y.; Jiao, L.C.; Shang, F. A fast tri-factorization method for low-rank matrix recovery and completion. Pattern Recogn. 2013, 46, 163–173. [Google Scholar] [CrossRef]
  11. De Lathauwer, L.; Castaing, J. Tensor-based techniques for the blind separation of DS–CDMA signals. Signal Process. 2007, 87, 322–336. [Google Scholar] [CrossRef]
  12. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
  13. Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 208–220. [Google Scholar] [CrossRef] [PubMed]
  14. Shi, J.; Zhou, S.; Zheng, X. Multilinear robust principal component analysis. Acta Electronica Sinica. 2014, 42, 1480–1486. [Google Scholar]
  15. Tomasi, G.; Bro, R. PARAFAC and missing values. Chemom. Intell. Lab. Syst. 2005, 75, 163–180. [Google Scholar] [CrossRef]
  16. Shi, J.; Jiao, L.C.; Shang, F. Tensor completion algorithm and its applications in face recognition. Pattern Recognit. Artif. Intell. 2011, 24, 255–261. [Google Scholar]
  17. Kressner, D.; Steinlechner, M.; Vandereycken, B. Low-rank tensor completion by Riemannian optimization. BIT Numer. Math. 2014, 54, 447–468. [Google Scholar] [CrossRef]
  18. Shi, J.; Yang, W.; Yong, L.; Zheng, X. Low-rank tensor completion via Tucker decompositions. J. Comput. Inf. Syst. 2015, 11, 3759–3768. [Google Scholar]
  19. Gandy, S.; Recht, B.; Yamada, I. Tensor completion and low-n-rank tensor recovery via convex optimization. Inv. Probl. 2011, 27, 19. [Google Scholar] [CrossRef]
  20. Liu, Y.; Shang, F. An efficient matrix factorization method for tensor completion. IEEE Signal Process. Lett. 2013, 20, 307–310. [Google Scholar] [CrossRef]
  21. Tan, H.; Cheng, B.; Wang, W.; Zhang, Y.J.; Ran, B. Tensor completion via a multi-linear low-n-rank factorization model. Neurocomputing. 2014, 133, 161–169. [Google Scholar] [CrossRef]
  22. Goldfarb, D.; Qin, Z. Robust low-rank tensor recovery: Models and algorithms. SIAM J. Matrix Anal. Appl. 2014, 35, 225–253. [Google Scholar] [CrossRef]
  23. Cai, J.F.; Candès, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optimiz. 2010, 20, 1956–1982. [Google Scholar] [CrossRef]
  24. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
  25. Rockafellar, R.T. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 1976, 14, 877–898. [Google Scholar] [CrossRef]
  26. Databases of Lobby and Bootstrap. Available online: http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html (accessed on 1 November 2015).
Figure 1. Comparison of relative errors between Multi-linear Robust Principal Component Analysis (MRPCA) and Generalized Low-Rank Tensor Recovery (GLRTR) with varying λ.
Figure 2. Relative errors of Generalized Low-Rank Tensor Recovery (GLRTR) with varying τ.
Figure 3. Relative errors for different a and b.
Figure 4. Relative errors for different r and p.
Figure 5. Relative errors versus different Inverse Signal-to-Noise Ratio (ISNR) on sparse noise.
Figure 6. Relative errors versus different Inverse Signal-to-Noise Ratio (ISNR) on Gaussian noise.
Figure 7. Background modeling from the Lobby video. (a) draws the images with missing entries, (b) plots the recovered background, i.e., the low-rank images, (c) shows the recovered foreground, i.e., the sparse noise images, and (d) displays the recovered images.
Figure 8. Background modeling from the Bootstrap video. (a) draws the images with missing entries, (b) plots the recovered background, i.e., the low-rank images, (c) shows the recovered foreground, i.e., the sparse noise images, and (d) displays the recovered images.
Table 1. Comparison of experimental results on 50 × 50 × 50 tensors.
| (r, p′) | $\|\mathcal{E}\|_F/\|\mathcal{A}\|_F$ (%) | $\|\mathcal{G}\|_F/\|\mathcal{A}\|_F$ (%) | RE of MRPCA (%) | RE of GLRTR (%) |
|---|---|---|---|---|
| (3, 5) | 1253 | 18.33 | 9.71 ± 0.98 | 6.60 ± 0.69 |
| (3, 10) | 1769 | 19.20 | 8.04 ± 0.84 | 7.99 ± 0.84 |
| (3, 15) | 2218 | 20.05 | 9.84 ± 1.97 | 8.75 ± 1.87 |
| (5, 5) | 569 | 8.79 | 7.39 ± 0.79 | 3.74 ± 0.42 |
| (5, 10) | 855 | 9.40 | 8.55 ± 0.97 | 4.47 ± 0.52 |
| (5, 15) | 1040 | 9.16 | 9.13 ± 0.52 | 5.05 ± 0.31 |
Table 2. Comparison of experimental results on 20 × 20 × 20 × 20 tensors.
| (r, p′) | $\|\mathcal{E}\|_F/\|\mathcal{A}\|_F$ (%) | $\|\mathcal{G}\|_F/\|\mathcal{A}\|_F$ (%) | RE of MRPCA (%) | RE of GLRTR (%) |
|---|---|---|---|---|
| (2, 5) | 1722 | 24.05 | 29.24 ± 7.95 | 13.90 ± 4.68 |
| (2, 10) | 2823 | 37.72 | 48.07 ± 18.47 | 23.98 ± 9.66 |
| (2, 15) | 3052 | 27.17 | 44.71 ± 13.66 | 23.10 ± 8.52 |
| (4, 5) | 448 | 6.97 | 12.00 ± 1.60 | 8.96 ± 1.15 |
| (4, 10) | 563 | 6.04 | 15.12 ± 1.98 | 6.26 ± 0.87 |
| (4, 15) | 758 | 6.58 | 24.22 ± 2.61 | 12.16 ± 2.90 |
Table 3. Comparison of experimental results on incomplete 50 × 50 × 50 tensors for r = 5.
| (p, λ) | (30, 0.07) | (50, 0.05) | (70, 0.03) | (90, 0.03) |
|---|---|---|---|---|
| RE of MRPCA (%) | 87.81 ± 0.27 | 77.02 ± 0.46 | 60.64 ± 1.77 | 11.49 ± 1.73 |
| RE of GLRTR (%) | 11.58 ± 1.72 | 6.62 ± 0.82 | 5.51 ± 0.83 | 4.27 ± 0.47 |
Table 4. Comparison of experimental results on incomplete 20 × 20 × 20 × 20 tensors for r = 4.
| (p, λ) | (30, 0.03) | (50, 0.035) | (70, 0.03) | (90, 0.03) |
|---|---|---|---|---|
| RE of MRPCA (%) | 88.38 ± 0.25 | 78.94 ± 0.34 | 65.06 ± 0.59 | 40.03 ± 1.34 |
| RE of GLRTR (%) | 62.51 ± 1.72 | 20.09 ± 2.63 | 9.14 ± 0.12 | 8.58 ± 1.11 |
