Stable Analysis of Compressive Principal Component Pursuit

Compressive principal component pursuit (CPCP) recovers a target matrix that is a superposition of low-complexity structures from a small set of linear measurements. Previous works mainly focus on the analysis of existence and uniqueness. In this paper, we address its stability. We prove that the solution to the convex program associated with CPCP gives an estimate that is stable to small entry-wise noise under broad conditions, and we provide numerical simulation results that support this conclusion.


Introduction
Recently, there has been rapidly increasing interest in recovering a target matrix that is a superposition of low-rank and sparse components from a small set of linear measurements. In many cases, this problem is closely related to matrix completion [1][2][3], which arises in a number of fields, such as medical imaging [4,5], seismology [6], computer vision [7,8], and Kalman filtering [9]. Mathematically, there exists a large-scale data matrix $M = L_0 + S_0$, where $L_0$ is a low-rank matrix and $S_0$ is a sparse matrix. One of the important problems here is how to extract the intrinsic low-dimensional structure from a small set of linear measurements. In a recent paper [10], E. J. Candès et al. proved that the low-rank and sparse components can be recovered, provided that the rank of the low-rank component is not too large and the sparse component is reasonably sparse. More importantly, they proved that these two components can be recovered by solving a simple convex optimization problem. In [11], John Wright et al. generalized this problem to decompose a matrix into multiple incoherent components:

$$\min_{X_1, \ldots, X_\tau} \sum_{i=1}^{\tau} \lambda_i \|X_i\|_{(i)} \quad \text{subject to} \quad \mathcal{P}_Q\Big[\sum_{i=1}^{\tau} X_i\Big] = \mathcal{P}_Q[M], \qquad (1)$$

where the $\|X_i\|_{(i)}$ are norms that encourage various types of low-complexity structure and $\mathcal{P}_Q$ denotes the projection onto the subspace $Q$ of linear measurements. The authors also provided a sufficient condition that guarantees the existence and uniqueness theorem of compressive principal component pursuit (CPCP). The result in [11] requires that the components are low-complexity structures.
However, in many applications, the observed measurements are corrupted by noise that may affect every entry of the data matrix. In order to complete the theory developed in [11], it is necessary to study the stability of CPCP, which can guarantee stable and accurate recovery in the presence of entry-wise noise. In this paper, we make an attempt in this respect. We denote by $M$ the observed matrix, which can be decomposed into multiple incoherent components, and assume that

$$M = \sum_{i=1}^{\tau} X_{i,0} + Z_0,$$

where the $X_{i,0}$ are the corresponding incoherent components and $Z_0$ is independent and identically distributed (i.i.d.) noise. We assume only that $Z_0$ satisfies $\|Z_0\|_F \le \delta$ for some $\delta > 0$. In order to recover the unknown low-complexity structures, we suggest solving the following relaxed optimization problem:

$$\min_{X_1, \ldots, X_\tau} \sum_{i=1}^{\tau} \lambda_i \|X_i\|_{(i)} \quad \text{subject to} \quad \Big\|\mathcal{P}_Q\Big[\sum_{i=1}^{\tau} X_i - M\Big]\Big\|_F \le \delta. \qquad (2)$$
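As a concrete illustration of the measurement model above, the following sketch generates a low-rank plus sparse superposition contaminated by entry-wise noise with $\|Z_0\|_F \le \delta$, for the special case $\tau = 2$. The dimensions, rank, and sparsity level are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

# Sketch of the noisy model M = X_{1,0} + X_{2,0} + Z_0 with tau = 2:
# a low-rank component plus a sparse component plus bounded noise.
rng = np.random.default_rng(0)
m, n, r = 50, 50, 3

# Low-rank component L0 = X Y^T with X, Y drawn i.i.d. N(0, 1).
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T

# Sparse component: support of size rho_s * m * n chosen uniformly at random.
rho_s = 0.05
S0 = np.zeros((m, n))
support = rng.choice(m * n, size=int(rho_s * m * n), replace=False)
S0.flat[support] = rng.standard_normal(support.size)

# Entry-wise noise rescaled so that ||Z_0||_F = delta, hence <= delta.
delta = 0.1
Z0 = rng.standard_normal((m, n))
Z0 *= delta / np.linalg.norm(Z0, 'fro')

M = L0 + S0 + Z0
```

Any solver for Problem (2) would then receive `M` (or its compressive measurements) and attempt to separate the two structured components.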
In this paper, we prove that the solution of (2) is stable to small entry-wise noise. The rest of the paper is organized as follows. In Section 2, we introduce some notations and state the main result, which is proven in Sections 3 and 4. In Section 3, we give two lemmas which are important parts of the proof of our main result. In Section 4, the proof of Theorem 1 is given. We provide numerical results in Section 5 and conclude the paper in Section 6.

Notations and Main Results
In this section, we first give some important notations which will be used throughout this paper, and then provide the main result.

Notations
We denote the operator norm of a matrix by $\|X\|$, the Frobenius norm by $\|X\|_F$, and the nuclear norm by $\|X\|_*$, and denote the dual norm of $\|X\|_{(i)}$ by $\|X\|^*_{(i)}$. The Euclidean inner product between two matrices is defined by the formula $\langle X, Y \rangle = \mathrm{trace}(X^* Y)$. Note that $\|X\|_F^2 = \langle X, X \rangle$. The Cauchy–Schwarz inequality gives $\langle X, Y \rangle \le \|X\|_F \|Y\|_F$, and it is well known that we also have $\langle X, Y \rangle \le \|X\|_{(i)} \|Y\|^*_{(i)}$ (e.g., [1,12]). We say that $\|\cdot\|_{(i)}$ majorizes the Frobenius norm if $\|X\|_{(i)} \ge \|X\|_F$ for all $X$. Linear transformations acting on the space of matrices are written $\mathcal{P}_T X$; such an operator can be represented as a large matrix acting on vectorized matrices. The operator norm of the operator is denoted by $\|\mathcal{P}_T\|$; note that $\|\mathcal{P}_T\| = \sup_{\|X\|_F = 1} \|\mathcal{P}_T X\|_F$.
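The norm facts listed above can be checked numerically. The sketch below verifies $\|X\|_F^2 = \langle X, X \rangle$, the Cauchy–Schwarz inequality, and the duality inequality $\langle X, Y \rangle \le \|X\|_{(i)} \|Y\|^*_{(i)}$ for the nuclear norm, whose dual is the operator norm; the matrix sizes are arbitrary.

```python
import numpy as np

# Numerical check of the basic norm identities and inequalities above.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 20))
Y = rng.standard_normal((20, 20))

inner = np.trace(X.T @ Y)          # <X, Y> = trace(X^T Y) for real matrices
fro = np.linalg.norm(X, 'fro')
assert np.isclose(fro**2, np.trace(X.T @ X))   # ||X||_F^2 = <X, X>

# Cauchy-Schwarz inequality.
assert inner <= fro * np.linalg.norm(Y, 'fro')

# Duality for the nuclear norm: <X, Y> <= ||X||_* ||Y||, since the dual
# of the nuclear norm is the operator (spectral) norm.
nuc = np.linalg.norm(X, 'nuc')
op = np.linalg.norm(Y, 2)
assert inner <= nuc * op

# The nuclear norm majorizes the Frobenius norm: ||X||_* >= ||X||_F.
assert nuc >= fro
```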
For a matrix vector $x = [X_i]$, $i = 1, 2, \ldots, \tau$, where $X_i \in \mathbb{R}^{m \times n}$ is the $i$-th matrix, we consider two norms, defined as $\|x\|_\diamond := \sum_{i=1}^{\tau} \lambda_i \|X_i\|_{(i)}$ and $\|x\|_2 := \sum_{i=1}^{\tau} \|X_i\|_F$. In order to simplify the stability analysis of CPCP, we also define the subspace of common components $\gamma := \{[X, X, \ldots, X] : X \in \mathbb{R}^{m \times n}\}$, whose orthogonal complement is $\gamma^\perp = \{[X_i] : \sum_{i=1}^{\tau} X_i = 0\}$. In order to analyze the behavior of the associated projections, we define the projection operators $\mathcal{P}_\gamma$ and $\mathcal{P}_{\gamma^\perp}$. We assume that the norms $\|\cdot\|_{(i)}$, $i = 1, 2, \ldots, \tau$, are decomposable. The definition of decomposable norms is as follows.

Definition 1 (Decomposable Norms). Suppose there exist a subspace $T$ and a matrix $Z \in T$ satisfying $\partial\|X\| = \{\Lambda : \mathcal{P}_T \Lambda = Z, \ \|\mathcal{P}_{T^\perp} \Lambda\|^* \le 1\}$, where $\|\cdot\|^*$ denotes the dual norm of $\|\cdot\|$ and $\mathcal{P}_{T^\perp}$ is nonexpansive with respect to $\|\cdot\|^*$. Then, we say that the norm $\|\cdot\|$ is decomposable at $X$.
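The projections onto $\gamma$ and $\gamma^\perp$ can be computed explicitly. The sketch below assumes (our reading of the text) that $\gamma$ consists of tuples with all components equal, so $\mathcal{P}_\gamma$ replaces each component by the average and $\mathcal{P}_{\gamma^\perp}$ subtracts it; the dimensions and weights $\lambda_i = 1$ are illustrative.

```python
import numpy as np

# Projections onto the common-component subspace gamma and its complement,
# assuming gamma = {(X, ..., X)} so gamma-perp = {x : sum_i X_i = 0}.
rng = np.random.default_rng(2)
tau, m, n = 3, 10, 10
x = [rng.standard_normal((m, n)) for _ in range(tau)]

mean = sum(x) / tau
x_gamma = [mean for _ in range(tau)]            # P_gamma(x)
x_gamma_perp = [Xi - mean for Xi in x]          # P_{gamma-perp}(x)

# The complement part has components summing to zero, as in Lemma 3.
assert np.allclose(sum(x_gamma_perp), 0)

# The two parts are orthogonal under <x, y> = sum_i <X_i, Y_i>.
dot = sum(np.trace(A.T @ B) for A, B in zip(x_gamma, x_gamma_perp))
assert abs(dot) < 1e-8

# The weighted norm ||x||_diamond = sum_i lambda_i ||X_i||_(i), here with
# all lambda_i = 1 and ||.||_(i) taken as the Frobenius norm for illustration.
lam = np.ones(tau)
x_diamond = sum(l * np.linalg.norm(Xi, 'fro') for l, Xi in zip(lam, x))
x_2 = sum(np.linalg.norm(Xi, 'fro') for Xi in x)
assert np.isclose(x_diamond, x_2)
```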

Main Results
Pertaining to Problem (1), we have the following result.
Lemma 1 ([11]). Assume there exists a feasible solution $x_0 = (X_{1,0}, \ldots, X_{\tau,0})$ to the optimization Problem (1). Suppose that each of the norms $\|\cdot\|_{(i)}$ is decomposable at $X_{i,0}$, and that each of the $\|\cdot\|_{(i)}$ majorizes the Frobenius norm. Then, $x_0$ is the unique optimal solution if $T_1, \ldots, T_\tau$ are independent subspaces with $\max_{i \ne j} \|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| < \frac{1}{\tau-1}$, and there exists an $(\alpha, \beta)$-inexact certificate $\Lambda$ in the sense of [11].

The main contribution of this paper is the stability analysis of the solution of CPCP; the main theorem of [13] can be regarded as a special case of our result (although the main idea of the proof is similar to that of [13], there are some important differences here). Next, we show that the proposed convex program (2) is stable to small entry-wise noise under broad conditions. The main result of this paper is the following.

Theorem 1. Assume $x_0 = (X_{1,0}, \ldots, X_{\tau,0})$ and $\hat{x} = (\hat{X}_1, \ldots, \hat{X}_\tau)$ are the solutions of the optimization Problems (1) and (2), respectively. Suppose that each of the norms $\|\cdot\|_{(i)}$ is decomposable at $X_{i,0}$, and each of the $\|\cdot\|_{(i)}$ majorizes the Frobenius norm. Then, if $T_1, \ldots, T_\tau$ are independent subspaces with $\max_{i \ne j} \|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| < \frac{1}{\tau-1}$, and there exists an $(\alpha, \beta)$-inexact certificate $\Lambda$, then for any $Z_0$ with $\|Z_0\|_F \le \delta$, the solution $\hat{x}$ to the convex program (2) obeys

$$\|\hat{x} - x_0\|_2 \le C(n, \tau, \alpha, \beta)\,\delta,$$

where $C(n, \tau, \alpha, \beta)$ is a numerical constant depending only upon $n$, $\tau$, $\alpha$, $\beta$.

Main Lemmas
In this section, we present two main lemmas which are used to obtain Theorem 1. The paper [11] states the following.

Lemma 2 ([11]). Suppose $T_1, \ldots, T_\tau$ are independent subspaces of $\mathbb{R}^{m \times n}$ and $Z_1 \in T_1, \ldots, Z_\tau \in T_\tau$, under the other conditions of Lemma 1. Then, the identities stated in [11] hold.

In order to bound the behavior of the norm of $\hat{x}$, we have the first main lemma that is used to obtain Theorem 1.
Lemma 3. Assume $\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| < \frac{1}{\tau-1}$ for all $i \ne j$, and suppose there exists an $(\alpha, \beta)$-inexact certificate $\Lambda$ satisfying Lemma 1. Then, the stated lower bound holds for any perturbation $h = [H_i]$ obeying $\sum_i H_i = 0$; it is easy to see that, under the hypotheses of Lemma 1, the constant in this bound is strictly positive.

Proof. By the convexity of the norm, for any subgradient $z = [Z_i] \in \partial\|x_0\|_\diamond$ we obtain a lower bound on $\|x_0 + h\|_\diamond - \|x_0\|_\diamond$. Because each norm $\|\cdot\|_{(i)}$ is decomposable at $X_{i,0}$, there exist $\Lambda$, $Z_i$, $\alpha$, and $\beta$ for which this bound can be rewritten, where the second equality uses $Z_i \in T_i$. We continue by bounding the resulting terms. By the definition of duality, there exists $\hat{Z}_i \in \partial\|X_{i,0}\|_{(i)}$ with $\|\hat{Z}_i\|^*_{(i)} \le 1$; moreover, the Cauchy–Schwarz inequality applies. Taking $Z_i = \hat{Z}_i$ and combining the inequalities above, Lemma 3 is established.
For bounding the Frobenius-norm behavior, we have to bound the projection operator $\mathcal{P}_{T_1} \times \cdots \times \mathcal{P}_{T_\tau}(x)$. Therefore, we have the second main lemma that will be used to obtain Theorem 1.

Lemma 4. Assume that $\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| < \frac{1}{\tau-1}$ for all $i \ne j$. Then, the stated bound holds for any matrix vector $x = [X_i]$; it is easy to see that, under the hypothesis $\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| < \frac{1}{\tau-1}$, the constant in this bound is strictly greater than zero.
Proof. For any matrix vector $x = [X_i]$, we expand $\big\|\sum_i \mathcal{P}_{T_i} X_i\big\|_F^2$ into its diagonal and cross terms. Note that

$$\langle \mathcal{P}_{T_i} X_i, \mathcal{P}_{T_j} X_j \rangle = \langle \mathcal{P}_{T_i} X_i, \mathcal{P}_{T_i}\mathcal{P}_{T_j} X_j \rangle \ge -\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\|\,\|\mathcal{P}_{T_i} X_i\|_F\,\|\mathcal{P}_{T_j} X_j\|_F.$$

Together with $\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| < \frac{1}{\tau-1}$ for all $i \ne j$, we obtain the desired bound, where in the second inequality we used the fact that $2xy \le x^2 + y^2$ for any $x, y$. Therefore, Lemma 4 is established.
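The cross-term bound used in the proof of Lemma 4 can be checked numerically. The sketch below takes the simple case where $T_i$ is a column space, so $\mathcal{P}_{T_i}(X) = U_i U_i^T X$; this choice of subspaces is an illustration of ours, not the paper's $T_i$.

```python
import numpy as np

# Check: <P_Ti X, P_Tj Y> >= -||P_Ti P_Tj|| ||P_Ti X||_F ||P_Tj Y||_F
# for T_i = column spaces, so P_Ti acts by left multiplication with U_i U_i^T.
rng = np.random.default_rng(3)
n, r = 30, 4
U1, _ = np.linalg.qr(rng.standard_normal((n, r)))
U2, _ = np.linalg.qr(rng.standard_normal((n, r)))
P1 = U1 @ U1.T
P2 = U2 @ U2.T

# For left multiplication, the operator norm of X -> P1 P2 X (with respect
# to the Frobenius norm) equals the spectral norm of the matrix P1 P2.
op_norm = np.linalg.norm(P1 @ P2, 2)

X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
lhs = np.trace((P1 @ X).T @ (P2 @ Y))
rhs = op_norm * np.linalg.norm(P1 @ X, 'fro') * np.linalg.norm(P2 @ Y, 'fro')
assert lhs >= -rhs and abs(lhs) <= rhs + 1e-9
```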

Proof of Theorem 1
In this section, we provide the proof of Theorem 1. Our proof is based on two elementary but important properties of $\hat{x}$, the solution of Problem (2). First, note that $x_0$ is also a feasible solution to Problem (2) and $\hat{x}$ is the optimal solution; therefore, $\|\hat{x}\|_\diamond \le \|x_0\|_\diamond$. Second, the triangle inequality yields a bound on the feasibility residual. Following the definition of the subspace $\gamma$, we write $h_\gamma := \mathcal{P}_\gamma(h)$ and $h_{\gamma^\perp} := \mathcal{P}_{\gamma^\perp}(h)$ for short, where $h := \hat{x} - x_0$. Our main aim is to bound $\|h\|_2 = \|\hat{x} - x_0\|_2$, which can be rewritten via this decomposition. Combining with (4), we obtain (5); therefore, it is necessary to bound the remaining two terms on the right-hand side of (5). We bound the second and third terms, respectively.
The norm-equivalence theorem tells us that any two norms on a finite-dimensional normed space are equivalent, which implies that there exist two constants, and in particular a constant $c(n, \tau)$ with $c(n, \tau)\|x\|_2 \le \|x\|_\diamond$ for all $x$.

A. Estimate of the third term of (5). Let $\Lambda$ be a dual certificate obeying Lemma 1. Then, using the triangle inequality and combining with Lemma 3, we obtain a chain of inequalities, where the third inequality uses the fact $\|\hat{x}\|_\diamond \le \|x_0\|_\diamond$. For simplicity, denote the resulting constant by $C_1(\alpha, \beta)$. Combining with (7), we obtain a bound with $C_2(\alpha, \beta) = 1/C_1(\alpha, \beta)$. We now estimate the third term of (5). Using the triangle inequality, we arrive at the estimate with $C(n, \tau, \alpha, \beta) := \frac{2 C_2(\alpha, \beta)}{c(n, \tau)\sqrt{\tau}}$; the second inequality follows from (6), the fourth from (8), and the last from the fact $\|h_\gamma\|_2 \le 2\delta\sqrt{\tau}$. Therefore, the third term of (5) can be bounded by $C\delta$.
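The norm-equivalence step above can be illustrated for a concrete pair of norms: on $\mathbb{R}^{n \times n}$, the nuclear and Frobenius norms satisfy $\|X\|_F \le \|X\|_* \le \sqrt{n}\,\|X\|_F$, with the equivalence constant depending only on the dimension.

```python
import numpy as np

# Finite-dimensional norm equivalence, illustrated for nuclear vs. Frobenius:
# ||X||_F <= ||X||_* <= sqrt(n) ||X||_F for any X in R^{n x n}.
rng = np.random.default_rng(4)
n = 25
X = rng.standard_normal((n, n))
fro = np.linalg.norm(X, 'fro')
nuc = np.linalg.norm(X, 'nuc')
assert fro <= nuc <= np.sqrt(n) * fro
```

The second inequality follows from Cauchy–Schwarz applied to the vector of singular values, which is how dimension-dependent constants such as $c(n, \tau)$ arise in the proof.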
B. Estimate of the second term of (5). According to Lemma 4, we obtain a bound with

$$\hat{C}(\tau, \alpha, \beta) := \frac{1 - \max_i \frac{1}{2}\sum_{j \ne i}\big(\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\| + \|\mathcal{P}_{T_j}\mathcal{P}_{T_i}\|\big)}{\tau}.$$

Combining the previous two inequalities, we obtain an estimate in which $C(n, \tau, \alpha, \beta)$ is an appropriate constant. Combining with (9), the claimed bound follows. Therefore, Theorem 1 is established.

Numerical Results
In this section, we present numerical experiments with various values of the parameter $\sigma$, the parameter $\rho_s$, and the rank $r$. For each setting of the parameters, we report the average errors over 10 trials.
Our implementation was realized in MATLAB. All computational results were obtained on a desktop computer with a 2.27-GHz CPU (Intel(R) Core(TM) i3) and 2 GB of memory. Without loss of generality, we assume that $\tau = 2$. In [13], the authors verified this result with the Accelerated Proximal Gradient (APG) method by numerical experiments. In our numerical experiments, we show that the result also holds for Principal Component Pursuit by the Alternating Direction Method (PCP-ADM).
In our simulations, the matrix is generated as $M = L_0 + S_0 + N_0$, where the rank-$r$ matrix $L_0$ is a product $L_0 = XY^T$, with $X$ and $Y$ being $m \times r$ and $n \times r$ matrices whose entries are independently sampled from a $\mathcal{N}(0, 1)$ distribution. Following PCP-ADM, we generate $S_0$ by choosing a support set $\Omega$ of size $k_s = \rho_s mn$ uniformly at random and setting $S_0 = \mathcal{P}_\Omega E$. The noise component $N_0$ is generated with entries independently sampled from a $\mathcal{N}(0, \sigma)$ distribution. Without loss of generality, we set $m = n = 200$ and $\rho_s = 0.01$; the other parameters required by PCP-ADM are the same as in [10]. Here we briefly describe PCP-ADM. In [10], in order to stably recover $\hat{X} = (\hat{L}, \hat{S})$, the ADM method operates on the augmented Lagrangian

$$l(L, S, Y) = \|L\|_* + \lambda\|S\|_1 + \langle Y, M - L - S \rangle + \frac{\mu}{2}\|M - L - S\|_F^2.$$

The details of PCP-ADM can be found in [14,15]. In our simulations, the stopping criterion of the PCP-ADM algorithm is the relative residual falling below a prescribed tolerance, or the maximum iteration number ($k_{\max} = 500$) being reached. To estimate the errors, we use the root-mean-squared (RMS) errors $\|\hat{L} - L_0\|_F / n$ and $\|\hat{S} - S_0\|_F / n$ for the low-rank component and the sparse component, respectively. Figure 1 shows the variation of the RMS errors with different values of $\sigma^2$. Note that the RMS error grows approximately linearly with the noise level in Figure 1. This phenomenon verifies Theorem 1 by numerical experiments with PCP-ADM (the same phenomenon was also observed in [13] with APG, which is very different from PCP-ADM in principle).
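The alternating-directions scheme for the augmented Lagrangian above can be sketched as follows. This is a minimal illustration in the spirit of PCP-ADM, not the authors' MATLAB code; the parameter choices (`lam`, the penalty schedule `mu`, `rho`, and the tolerance) follow common practice for the formulation of [10] and are assumptions here, and the test problem is scaled down from the paper's $200 \times 200$ setting.

```python
import numpy as np

def pcp_adm(M, lam=None, mu=None, rho=1.5, tol=1e-7, max_iter=500):
    """Sketch of principal component pursuit by alternating directions."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # common default weight
    if mu is None:
        mu = 1.25 / np.linalg.norm(M, 2)     # common initial penalty
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    norm_M = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        # L-step: singular value thresholding at level 1/mu.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-step: entry-wise soft thresholding at level lam/mu.
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual update, penalty increase, and relative-residual stopping test.
        R = M - L - S
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') / norm_M < tol:
            break
    return L, S

# Small synthetic instance of the experimental setup: M = L0 + S0.
rng = np.random.default_rng(5)
m = n = 60
r = 2
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T
S0 = np.zeros((m, n))
idx = rng.choice(m * n, size=int(0.01 * m * n), replace=False)
S0.flat[idx] = 10 * rng.standard_normal(idx.size)

L_hat, S_hat = pcp_adm(L0 + S0)
rms_L = np.linalg.norm(L_hat - L0, 'fro') / n   # RMS error as in the text
```

With zero noise the recovery should be essentially exact in this regime (low rank, very sparse corruption), which is the baseline against which the noisy experiments in Figure 1 are compared.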

Conclusions
In this paper, we have investigated the stability of CPCP. Our main contribution is the proof of Theorem 1, which implies that the solution to the related convex program (2) is stable to small entry-wise noise under broad conditions. It is an extension of the result in [13], which only allows $\tau = 2$. Moreover, in the numerical experiments, we have investigated the performance of the PCP-ADM algorithm. The numerical results showed that it is stable to small entry-wise noise.