Tensor Robust Principal Component Analysis via Non-Convex Low Rank Approximation

Tensor Robust Principal Component Analysis (TRPCA) plays a critical role in handling high multi-dimensional data sets, aiming to recover the low-rank and sparse components both accurately and efficiently. In this paper, different from current approach, we developed a new t-Gamma tensor quasi-norm as a non-convex regularization to approximate the low-rank component. Compared to various convex regularization, this new configuration not only can better capture the tensor rank but also provides a simplified approach. An optimization process is conducted via tensor singular decomposition and an efficient augmented Lagrange multiplier algorithm is established. Extensive experimental results demonstrate that our new approach outperforms current state-of-the-art algorithms in terms of accuracy and efficiency.


Introduction
An array of numbers arranged on a regular grid with a variable number of axes is described as a tensor.As the dimensionality reaches beyond 3D, conventional data representations such as vector-based and matrix-based become insufficient, and highly dimensional data sets are usually formulated as tensors [1][2][3].
Robust Principal Component Analysis (RPCA) [4][5][6][7][8], aiming to recover the low-rank and sparse matrices both accurately and efficiently, has been widely studied in data compressed sensing and computer vision.It is of great use when we try to solve the problem of subspace learning or Principal Component Analysis (PCA) in the presence of outliers [9].Especially RPCA is a very useful tool in foreground detection, which is the first step in video surveillance system to detect moving objects [10,11].Tensor Robust Principal Component Analysis (TRPCA) extends the RPCA from matrix to the tensor case, that is, to recover a low-rank tensor and a sparse (noise entries) tensor.
Tensor rank [12] essentially characterizes the intrinsic "dimension" or "degree of freedom" for highly dimensional data sets.In order to make the rank characteristic be more transparent, one can make use of tensor Singular Value Decomposition (t-SVD) developed recently by Kilmer and Martin [13], rendering the tensor rank approximation approachable.The key for their framework is the well-defined t-product that leads to the 3rd-order tensor space to be an inner product space so that approximation becomes more effective such as in image processing [14,15].t-SVD allows to define tensor tubal-rank, which appears to be much more approachable such as in the study of image analysis than previous rank definitions [16][17][18].The characteristic of a low tensor tubal-rank implies that a 3rd-order tensor can be significantly compressed (Theorem 4.3 of [13]), which is particularly important in applications.
Mathematically, TRPCA based on t-SVD [19,20] aims to recover the low tubal rank component L 0 and sparse component E 0 from noisy observations X = L 0 + E 0 of size N 1 × N 2 × N 3 by convex optimization min λ is a Lagrange multiplier, E is sparse if the tensor 1 norm E 1 is small.It is well-known that the t-SVD-based tensor nuclear norm (TNN, III.B of [21]) has been proven to be the tightest convex relaxation to 1 -norm of the tensor multi-rank (Theorem 2.4.1 in [15] or Theorem 3.1 in [19]).However, TNN treats the singular values in t-SVD equally to pursue the convexity of the objective function, while the singular values associated with a practical image may emphasize its different properties and should be handled differently, in particular, when some singular values are very large.Due to the gap between the tensor rank function and its lower convex envelop, the adoption of tensor nuclear norm may lead to the approximation of the corresponding tensor tubal-rank being inaccurate.Therefore, a new development of tensor tubal-rank approximation becomes necessary to reflect these fundamental characteristics.
To make the the necessity of a new tensor tubal-rank approximation instead of TNN more transparent, we consider the the matrix RPCA [4] first.Now the goal becomes to recover a low-rank matrix L from highly corrupted observations X = L + S, by minimizing L * + λ S 1 .Here L * is the summation of the singular values of L. It is easy to see that Hence, when σ 1 is very large, for example, and the rest are relatively small, we may have Then the minimization of L * will attempt to minimize σ 1 that leads to over-penalizes σ 1 , and consequently it generates undesirable solution L with small σ 1 .Recall that our goal is to find a low-rank L, whose largest singular value is not necessarily to be small.In addition, it is worth mentioning that in practice the matrix RPCA problem is very sensitive to small perturbations and this problem can be addressed by relaxing the equality constraints.The authors in [22,23] propose an alternative optimization approach that appears to be suitable to deal with robustness issues in the "Sparse Plus Low-rank" decomposition problem.It has long been noticed that the nuclear norm is not always satisfactory [24], since it over-penalizes large singular values, and consequently may only find a much biased solution.Since the the nuclear norm is the tightest convex relaxation of the rank function [25], a further improvement under the convex setting in general is limited, if it is still possible, in particular when their gap is not very small.In order to alleviate the above problem, the Gamma quasi-norm for matrices is introduced in [26] in RPCA, which results in a tighter approximation of the rank function.For L ∈ R m×n and the singular values of L, denoted by σ 1 (L), . . ., σ min{m,n} (L), the Gamma quasi-norm of L is defined as . So it overcomes the imbalanced penalization by different singular values in convex nuclear norm.In addition, when γ goes to 0, the limit of the Gamma quasi-norm is exactly the matrix rank.The Gamma quasi-norm has nice algebraic properties, i.e., unitarily invariant and positive definiteness.Thanks to the unitarily invariance [27], the Gamma quasi-norm minimization problem for A ∈ R m×n : min F can be solved very effectively (Theorem 1 of [26]).Moreover, experimental results in [26] show that Gamma quasi-norma-based RPCA outperforms several other convex/non-convex norma-based RPCA in both accuracy and efficiency.
In this paper, inspired by the success of the non-convex rank approximation [26], we proposed tensor Gamma quasi-norm via t-SVD to seek a closer approximation and to alleviate the above-mentioned limitations resulted from the tensor nuclear norm, which has not been used for TRPCA yet in literature.
The main contributions of this paper are illustrated as follows: 1. We develop new non-convex tensor rank approximation functions that can be used as desirable regularization for the optimization process, which appears to be more effective than current existing approaches in literature.2.An efficient algorithm based on the alternating direction method of multipliers (ADMM) is developed to solve the equivalent optimization problem in the Fourier domain, which is also suitable for general TRPCA problem.In addition, we provided the convergent analysis of the augmented Lagrange multiplier based optimization algorithm.3. Extensive evaluation of the proposed approach on several benchmark data sets has been conducted, and detailed comparison with most recent approaches are provided.Experimental results demonstrate that our proposed approach yields superior performance for image recovery and video background modeling.
The rest of the paper is organized as follows.In Section 2, we roughly describe some related works.Then Section 3 provides factorization strategies for 3rd-order tensors and its relationship to the corresponding tensor approximation problem.And then we present the main results on t-Gamma quasi-norm of 3rd-order tensors.Section 4 introduces our non-convex TRPCA framework and the convergent analysis of the augmented Lagrange multiplier based optimization algorithm.Experimental analysis and results are shown in Section 5 to verify the proposed algorithm.Finally, the paper ends with concluding remarks in Section 6.

Related Works
Recently, many authors observed the limitations of t-SVD-based TRPCA [15,19].It was pointed out in [28] that the classical TRPCA methods based on t-SVD fails to utilize the low rank structure in the third mode.To further exploit the low-rank structures in multiway data, they defined a new TNN that extends TNN with core matrix and proposed a creative algorithm to deal with TRPCA problems.In addition, the authors of [29] presented that there are some weak points in the existing low-rank approximation approaches, for example, people need to predefine the rank values or fail to consider local information of data (e.g., spatial or spectral smooth structure).So they proposed a novel TNN-based low-rank approximation with total variation regularization, which uses the TNN to encode the global low-rank prior of tensor data and the total variation regularization to preserve the spatial-spectral continuity in a unified framework.
It has been noticed by others that TNN is not always satisfactory.For instance, to seek a closer approximation of tensor rank, Kong et al. in [30] introduced t-Schatten-p tensor quasi-norm to replace the TNN.The t-Schatten-p norm is convex when p ≥ 1, but non-convex when 0 < p < 1.In addition, they proved multi-Schatten-p norm surrogate theorem to convert the non-convex t-Schatten-p tensor quasi-norm for 0 < p < 1 into a weighted sum of convex functions.Then based on the multi-Schatten-p norm surrogate, they proposed a novel convex tensor RPCA model.
Besides above, a recent paper [31] developed randomized algorithms for tensor low-rank approximation and decomposition, together with tensor multiplication.Their proposed tubal focused algorithms employ a small number of lateral and/or horizontal slices of the underlying 3rd order tensor, that come with relative error guarantees for the quality of the obtained solutions.
Besides the above-mentioned methods based on t-SVD, there are many other TRPCA methods mainly applied CP (CANDECOMP/PARAFAC) or Tucker decompositions methods in their optimization models.We need to mention that recently Driggs et al. [32] studied TRPCA using a non-convex formulation of the CP based tensor atomic-norm and identified a class of local minima of this non-convex program that are globally optimal.
Presently, background subtraction become more and more important for visual surveillance systems and several subspace learning algorithms based on matrix and tensor tools have been used to perform the background modeling of the scenes.Recently, Sobral et al. [33] developed an incremental tensor subspace learning that uses only a small part of the entire data and updates the low-rank model incrementally when new data arrives and in [34] the authors proposed an online stochastic framework for tensor decomposition of multispectral video sequences.In particular, it is worth mentioning that Bouwmans et al. [35] gave a systematic and complete review of the "low-rank plus sparse matrix decomposition" framework of separating moving objects from the background.
However, all those above mentioned t-SVD-based TRPCA algorithms solve the TRPCA optimization problem under convex regularization.In addition, most analyses of ADMM algorithms are in the convex setting [36].In our work, we explore the TRPCA method based on the non-convex t-Gamma quasi-norm of 3rd-order tensors.Due to the non-convex nature of our TRPCA model, the convergent analysis of algorithm is difficult to justify.In addition, the analysis of non-convex augmented Lagrangians is not easy, but this compels us to make efforts of clarity and pedagogy.Also it should be noted that there are few works on t-SVD-based TRPCA achieve background modeling.Thus experiments are provided in our paper on two computer vision applications, which are image recovery and video background modeling.

Notations and Definitions
Throughout this paper, all boldface Euler script letters (X ) denote tensors, uppercase characters (X) denote matrices, and lowercase letters (a) denote scalars.The field of real numbers is denoted as R. We denote N i , i = 1, 2, 3 as the index set associated with a given tensor.For 3rd-order tensors X = [a ijk ] ∈ R N 1 ×N 2 ×N 3 , we use X (i, :, :), X (:, i, :), X (:, :, i) to denote the i-th horizontal, lateral, and frontal slice, respectively.For simplicity, we denote the frontal slice X (:, :, i) as X i .The inner product between X and Y of size The Frobenius norm of a tensor is defined as . The 1 norm of a tensor is defined as X 1 = ∑ i,j,k a ijk .In this paper, the adopted basic definitions and notations follow the references [13,21,37] for the purpose of an easier comparison.Definition 1 (Transpose p. 10 of [13]).For X ∈ R N 1 ×N 2 ×N 3 , the transpose tensor X T is an N 2 × N 1 × N 3 tensor obtained by transposing each frontal slice of X and then reversing the order of the transposed frontal slices 2 through N 3 .
Similar to matrices whose elements can be grouped by rows or by columns, higher dimensional tensors can be viewed forming by slices along the third mode.
, where X i = X (:, :, i) for i = 1, . . ., N 3 .It is important to notice that the block circulant matrix bcirc(X ) keeps the order of frontal slices of X in an appropriate way, and thus it better maintains X 's structure in terms of the frontal direction, compared with the direct tensor unfolding along the third mode.
Definition 2 (t-product p. 11 of [13]).Let X ∈ R N 1 ×N 2 ×N 3 , we define the matvec and fold operators as matvec(X ) = X 1 ; X Then the t-product of 3rd-order tensors is defined as Definition 3 (Identity, orthogonal and diagonal tensor p.10 of [13]).The identity tensor I ∈ R N 1 ×N 1 ×N 3 is a tensor whose first frontal slice is the N 1 × N 1 identity matrix and all other frontal slices are zero.A tensor It is well known that circulant matrices can be diagonalized by the FFT (21.11 of [38]), similarly, block-circulant matrices can also block-diagonalized as where F N 3 , F N 3 denotes the N 3 × N 3 fast Fourier transform matrix, F * N 3 denotes its conjugate, and the notation ⊗ denotes the standard Kronecker product.In addition, X = A * B if and only if Xn 3 = Ân 3 Bn 3 for 1 ≤ n 3 ≤ N 3 (p.4 of [19]).
Please note that the matrix SVD [39] can be performed on each frontal slice of X , The matrix Xn 3 has the following Singular Value Decomposition (SVD) Xn 3 = Ûn 3 Σn 3 VT n 3 , Ûn 3 ∈ O(N 1 ), Vn 3 ∈ O(N 2 ) (here O(N i ) stands for the set of orthogonal N i -by-N i matrices) and Σn 3 is an N 1 × N 2 diagonal matrix with diagonal entries σ 1 ( Xn 3 ), . . ., σ min{N 1 ,N 2 } ( Xn 3 ) in descending order.Definition 5 (t-SVD, Theorem 4.1 in [13]).The tensor Singular Value Decomposition (t-SVD) of X is given by X = U * Σ * V T , where U and V are orthogonal tensors of size N 1 × N 1 × N 3 and N 2 × N 2 × N 3 respectively.Σ is an f-diagonal tensor of size N 1 × N 2 × N 3 , and * denotes the t-product.Definition 6 (Tensor tubal rank III.B of [21]).The tensor tubal rank is defined as the number of nonzero singular tubes of f-diagonal tensor Σ, which is the maximum of the number of nonzero { Σn,n,n 3 , 1 Definition 7 (Tensor nuclear norm III.B of [21]).The nuclear norm for a given 3rd-order tensor X ∈ R N 1 ×N 2 ×N 3 is defined to be which is equal to Here Xn 3 is the n 3 -th frontal slice of 3 obtained by applying FFT on X along the third mode.
In this paper, to overcome the disadvantages of the tensor nuclear norm, we propose smooth but more effective, though being non-convex, t-SVD-based non-convex rank approximation to the data sets represented by 3rd-order tensor.This will make the larger singular values be shrunk less in order to preserve their characteristics.Moreover, under our framework, we establish explicit solutions to the corresponding non-convex tensor norm minimization problems that is not available in current literature.Definition 8 (Gamma quasi-norm for matrix [26]).Let X ∈ R m×n and the singular values of X, denoted by σ 1 (X), . . ., σ min{m,n} (X).We could introduce the Gamma norm of X, which is defined as Definition 9 (t-Gamma tensor quasi-norm).The Gamma tensor quasi-norm for a given 3rd-order tensor X ∈ R N 1 ×N 2 ×N 3 is defined to be which is equal to .
Lemma 1 (Theorem 1 of [26]).Let A = UΣ A V T be the SVD of A ∈ R m×n and Σ A = diag(σ A ) are singular values of A. Let F(Z) = f • σ(Z) be a unitary invariant function and µ > 0. Then an optimal solution to the following problem Here prox f ,µ (σ A ) is the Moreau-Yosida operator [40], defined as Now let us consider the tensor t-Gamma quasi-norm minimization problem: Theorem 1 (t-Gamma tensor rank minimization).Let us consider a 3rd-order tensor where 3), then Ωn,n,n 3 is the limit point of Fixed Point Iteration Remark 2. Theorem 1 provides the connection of the optimal solution between the original domain and the Fourier domain.More importantly, it shows that the minimization problem under our framework is tractable.After the original tensor X is transformed to X in the Fourier domain, an equivalent minimization problem could be solved by using Lemma 1 for each slice Xn 3 , 1 ≤ n 3 ≤ N 3 in the Fourier domain.Then the optimal solution Z * in the original domain can be obtained by the inverse Fourier transform Ẑ * → Z * , whose detailed approach can be found from the proof in Theorem 1.
Proof of Theorem 1.Let f (Z ) = 1 2 X − Z 2 F + τ||Z || t-γ , then it can be reformulated as matrix objective function in Fourier domain: Following the DC programming algorithm used in Theorem 1 of [26] (similar method has been used in Section III of [41]), we can solve it efficiently using a local minimization method.

Tensor Robust Principal Component Analysis with Non-Convex Regularization
A TRPCA model using non-convex regularization can be formulated as min where M is the Lagrangian multiplier, β is the penalty parameter for the violation of the linear constraints, and C is a constant.
Then, the problem arg min L,E ,M L β (L, E , M) can be updated as: where the tensor non-negative soft-thresholding operator E v (•) [43] is defined as Algorithm 1 shows the pseudocode for the proposed non-convex tensor robust component analysis method.

Input:
The observed tensor X ∈ R N 1 ×N 2 ×N 3 , the set of index of observed entries Ω, parameter λ, stopping criterion .
end while Output: The low rank tensor L and the sparse tensor E Theorem 2 (Convergent Analysis).Suppose that Proof of Theorem 2. E k+1 satisfies the first-order necessary local optimality condition, With some algebra, we have the following equality F .By induction, we could obtain Since Since each term on the right-hand side is bounded, E k+1 is bounded.By the last term on the right-hand, L k+1 is bounded.Therefore, {L k } and {E k } are both bounded.
From Bolzano-Weierstrass theorem [39], there must be at least one accumulation point of the sequence {P k } +∞ k=1 .We denote one of the points P * = {L * , E * , M * }.Without loss of generality, we assume From Theorem 3 of [26], we know Therefore, From the fact that L is equivalent to Tucker product (for its definition see Chapter 3 of [44]) (see the proof of Proposition 3 of [30] and Remark 2.3. of [45]) and using the chain rule [39], we can deduce that In addition, it is not hard to see Thus M k+1 → M * ∈ ∂||E * || 1 .Therefore (L * , E * , M * ) satisfies the the Karush-Kuhn-Tuker (KKT) conditions of the Lagrange function L β (L, E , M) (Chapter 9 of [46]).We thus complete the proof.

Experiments
In this section, we conduct several applications to demonstrate the performance of our algorithm.All experiments are implemented in Matlab 2017b on Ubuntu 16.04 LTS with 12 cores, 2.40 GHz CPU and 64 GB RAM.

Image Recovery
In this part, we focus on noisy color image recovery.A color image with size N 1 × N 2 and 3 channel, i.e., red, green and blue, can be regarded as a third order tensor X N 1 ×N 2 ×N 3 and each frontal slice of X is the corresponding channel of the color image.Please note that each channel of the color image is usually not a low rank matrix, but it is observed that the top singular values dominate the main information [19,47].Thus we can reconstruct the noisy image in low rank structure.
We perform the experiments on Berkeley Segmentation Dataset [48], which contains 200 color image for training.For each image, we randomly set 10% of pixels to the random values in [0, 255], that is, we corrupt the image on 3 channels simultaneously which is more challenging than make corruptions on only one channel.We compare our t-Gamma algorithms with RPCA [4], Gamma [26], SNN [49] and TRPCA [20], in which RPCA minimizes a weighted combination of the nuclear norm and of the l 1 norm in matrix form, Gamma replaces the nuclear norm by matrix Gamma quasi-norm, SNN uses the sum of the nuclear norm for all mode-i matricization of the given tensor, and TRPCA extends the RPCA from the matrix to tensor case.For RPCA and Gamma, we conduct the algorithms on each channel, i.e., regarding each channel of the image as a matrix, and then combine the channel recovery result into a color image.The parameter λ in RPCA and Gamma is set to λ = 1 √ max{N 1 ,N 2 } as suggested in [4].However, when the parameter γ in Gamma is set to 0.01 as stated in [26], it cannot perform well.We empirically set the γ = min{N 1 , N 2 } in Gamma and t-Gamma.For SNN, we set λ = [15, 15, 1.5] as suggested in [20].The parameters λ in TRPCA and t-Gamma is set to λ = 1 √ max{N 1 ,N 2 }N 3 . By using the Peak Signal to Noise Ratio (PSNR) to evaluate the recovery results and PSNR is defined as In Figure 1, we give the comparison of PSNR value on 200 images and each subfigure contains 50 results.The sample recovery performance are shown in Figure 2. From Figure 1, we can observe that the proposed t-Gamma algorithm perform better on almost every image and can achieve around 2 dB improvement in many cases.Table 1 shows the average recovery performance obtained by these five algorithms on Berkeley Segmentation Dataset.In Figure 2 we can visually conclude that our algorithm yields a better result than the compare algorithms which can illustrate the merit of the our approach.[4], Gamma [26], SNN [49], TRPCA [20] and t-Gamma.(f) TRPCA [20]; (g) t-Gamma.

Qualitative Experiments
In this part, we consider the background modeling problem, an essential task in video surveillance, which aims to separate the foreground objects from the background.The frames of background in video surveillance are highly correlated and thus it is reasonable to consider it as a low rank tensor.While the objects in the foreground usually only occupy a small part of the image in some frames and hence can regard it as the sparse part.
To solve this problem, we test our algorithm on four example color videos, Hall, MovedObject, Escalator and Lobby http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html,and compare it with RPCA [4], Gamma [26], SNN [49] and TRPCA [20].We extract N 2 = 200 frames from these four videos.Suppose the size of frames in the given video is h × w, we stack each frame in each color channel as a column vector of size N 1 × 1, where N 1 = h × w, and then all column vectors into a matrix of size 3N 1 × N 2 for RPCA and Gamma, and into a tensor of size N 1 × N 2 × 3 for SNN, TRPCA and t-Gamma.The parameter lambda is set to λ = 1 max{3N 1 ,N 2 } for RPCA and Gamma, and λ = 1 max{N 1 ,N 2 }3 for SNN, TRPCA and t-Gamma.The parameter γ in Gamma and t-Gamma is set to γ respectively.As suggested in [20], λ in SNN is set to [200,2,20].
The background separated results are depicted in Figures 3-6.In Figures 3 and 4, we can see that our method could reconstruct the background with fewer ghosting effects and more completely.From Figures 5 and 6, it could be observed that though all the method separate the background from the foreground effectively, the proposed algorithm are more robust to the dynamic foreground and illumination variation, e.g., only Gamma and t-Gamma can remove the steps of moving escalator in Figure 5, while the method Gamma cannot adapt to change of illumination in Figure 6.

Quantitative Experiments
Since the datasets in Section 5.2.1 do not include the ground truth background for each frame of the video, we perform the algorithms on the ChangeDetection.net(CDNet)dataset2014 [50] which consists of visual surveillance and smart environments.However, only part of frames are accompanied with ground truth background and thus we only take out some of frames of the video for testing.That is, we select the frames from 900 to 1000 of the dataset highway, pedestrians and PETS2006 in the baseline category.Hence the size of the video will be h × w × 101, where h × w is the resolution of each frame, and 101 is the number of frames.We compare our algorithm with RPCA [4], Gamma [26], SNN [49] and TRPCA [20].The parameters setting is conducted from Section 5.2.1.
To evaluate the performance of the experiment results, we use the three metrics for the quantitative measure, that is, Recall, Precision and F-Measure.They are defined as: where TP, FP and FN are true positives numbers, false positives numbers and false negatives numbers respectively.We primarily use F-Measure for overall evaluation since it combines the metrics Recall and Precision and also makes a balance between Recall and Precision which can be viewed as a good indicators for comparison.
From Table 2, we can see that the proposed algorithm achieves better segmentation performance among the compare methods with respect to the F-Measure.Some of the frames of the segmentation results are shown in Figure 7, it can be observed that among the three dataset in CDNet our method can usually get the clearer and more accurate background segmentation.These quantitative comparison illustrates the superior of t-Gamma tensor quasi-norm as the surrogate of the rank of tensor.[26]; (e) SNN [49]; (f) TRPCA [20]; (g) t-Gamma

Concluding Remarks
In this paper, we conduct the TRPCA with a new t-Gamma tensor quasi-norm used as the regularization for tensor rank function.The t-Gamma tensor quasi-norm is developed via t-SVD in the Fourier domain, which simplifies the required optimization process.Compared with the convex tensor nuclear norm, our regularization approach appears to be more effective in approximating the tensor tubal rank.We incorporate the alternating direction method of multipliers with the t-Gamma tensor quasi-norm to efficiently solve the non-convex TRPCA problem, yielding an approachable framework for non-convex optimization.Numerical experimental results indicate that our approach outperforms the existing methods in distinguishable ways.

Table 1 .
Average PSNR obtained by five Algorithms on 200 images.Comparison of the PSNR values with RPCA

Table 2 .
Quantitative Evaluation for algorithm on CDNet Datasets