Exact Solution Analysis of Strongly Convex Programming for Principal Component Pursuit

In this paper, we address strongly convex programming for principal component pursuit, which recovers a target matrix that is a superposition of low-complexity structures from a small set of linear measurements. First, we provide sufficient conditions under which the strongly convex models lead to exact low-rank matrix recovery. Second, we give suggestions on how to choose suitable parameters in practical algorithms. Finally, the proposed result is extended to principal component pursuit with reduced linear measurements, and we provide numerical experiments.


Introduction
Recently, much attention has been focused on the problem of recovering a target matrix with low-complexity structure from a small set of linear measurements. This problem has attracted renewed interest since the publication of the pioneering works of E.J. Candès et al. [1][2][3][4]. It arises in many different fields, such as medical imaging [5][6][7], seismology [8], information retrieval [9] and machine learning [10], and especially in the detection of moving objects [11]. In the detection of moving objects, the columns of the matrix $M$ are the video frames, and the low-rank matrix $L_0$ and the sparse matrix $S_0$ are the stationary background and the moving objects in the foreground, respectively. According to [12], the main problem is how to recover the low-rank matrix $L_0$ and the sparse matrix $S_0$ from the given data matrix $M = L_0 + S_0$, where $L_0 \in \mathbb{R}^{n \times n}$ has low rank and $S_0$ is sparse. In [12], E.J. Candès et al. proved that most low-rank matrices and sparse components can be recovered, provided that the rank of the low-rank component is not too large and that the sparse component is reasonably sparse. More importantly, they proved that this can be done by solving a simple convex optimization problem: provided that the rank of the low-rank matrix and the cardinality of the sparse matrix obey suitable conditions, most matrices $L_0$ of rank $r$ and the sparse component $S_0$ can be perfectly recovered by solving
$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = M, \tag{1}$$
where $\|L\|_*$ is the nuclear norm of the matrix $L$ and $\|S\|_1$ is the sum of the absolute values of all entries of $S$.
Strongly convex optimization has several advantages, e.g., a unique optimal solution. Many authors therefore suggest solving strongly convex approximations (see, e.g., [13][14][15][16]) instead of directly solving the original convex problems. J.F. Cai et al. addressed the strongly convex objective $\tau \|X\|_* + \frac{1}{2}\|X\|_F^2$ ($\|X\|_F$ denoting the Frobenius norm) instead of the original convex objective $\|X\|_*$, and introduced an important algorithm (the singular value thresholding algorithm) to solve matrix completion based on it [15]. J. Wright et al. likewise addressed the strongly convex objective $\|L\|_* + \lambda \|S\|_1 + \frac{1}{2\tau}\|L\|_F^2 + \frac{1}{2\tau}\|S\|_F^2$ instead of the original convex objective $\|L\|_* + \lambda \|S\|_1$, and proposed the iterative thresholding (IT) algorithm to solve robust principal component analysis [14]. However, J. Wright et al. only confirm the performance of iterative thresholding by numerical experiments; they do not provide sufficient conditions which guarantee that the strongly convex problem and the original convex problem have the same optimal solution.
Sufficient conditions under which the strongly convex models lead to exact low-rank and sparse matrix recovery have since been given, and [16] offers suggestions on how to choose suitable parameters in practical algorithms. However, the results in [16] are limited to the special case $Q = \mathbb{R}^{n \times n}$. In this paper, we extend these results to principal component pursuit with reduced linear measurements, that is, to the setting in which only the projection $\mathcal{P}_Q M$ of the data matrix onto a linear subspace $Q$ is available. It is easy to prove that the results in [16] are a special case of those that we propose.
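The two thresholding operators that underlie the algorithms of [14] and [15] mentioned above can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation; the function names `soft_threshold` and `singular_value_threshold` are chosen here for illustration only.

```python
import numpy as np


def soft_threshold(X, eps):
    """Entrywise shrinkage: sign(x) * max(|x| - eps, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)


def singular_value_threshold(X, eps):
    """Shrink the singular values of X by eps and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, eps)) @ Vt


if __name__ == "__main__":
    A = np.diag([3.0, 1.0, 0.2])
    # Singular values 3, 1, 0.2 are shrunk to 2.5, 0.5, 0.
    print(singular_value_threshold(A, 0.5))
```

Entrywise shrinkage promotes sparsity in the sparse component, while shrinking singular values promotes low rank, which is why both operators appear in iterative schemes for objectives combining $\|S\|_1$ and $\|L\|_*$.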

Basic Problem Formulations
In this section, we describe the strongly convex program addressed in this paper and list its existence and uniqueness theorems, which will be proved in the following sections. In [17], the authors studied principal component pursuit with reduced linear measurements and gave sufficient conditions under which $L_0$ and $S_0$ can be perfectly recovered.
In this paper, we address a strongly convex program and prove that it guarantees exact low-rank matrix recovery. The proposed optimization problem is
$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 + \frac{1}{2\tau}\|L\|_F^2 + \frac{1}{2\tau}\|S\|_F^2 \quad \text{subject to} \quad \mathcal{P}_Q M = \mathcal{P}_Q (L + S), \tag{2}$$
where $\tau > 0$ is a penalty parameter and $\mathcal{P}_Q$ is the orthogonal projection onto the linear subspace $Q$. We also assume that $Q^\perp$ is a random subspace. The existence and uniqueness theorems for the case $\tau = \infty$ in (2) are provided in [17] and listed below. Finally, how to choose suitable parameters in the optimization model (2) is discussed.
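To make the ingredients of the model concrete, the sketch below evaluates a strongly convex objective and builds a projection $\mathcal{P}_Q$. It rests on two assumptions that are not stated explicitly above: the objective is taken to be the principal component pursuit objective plus $\frac{1}{2\tau}(\|L\|_F^2 + \|S\|_F^2)$, and $Q^\perp$ is modelled as the span of `p` independent Gaussian matrices. The names `objective` and `make_projector_Q` are hypothetical.

```python
import numpy as np


def objective(L, S, lam, tau):
    """Assumed objective: nuclear norm + lam * l1 norm + Frobenius terms / (2 tau)."""
    nuclear = np.linalg.norm(L, "nuc")
    l1 = np.abs(S).sum()
    frob = np.linalg.norm(L, "fro") ** 2 + np.linalg.norm(S, "fro") ** 2
    return nuclear + lam * l1 + frob / (2.0 * tau)


def make_projector_Q(shape, p, rng):
    """Return P_Q, assuming Q-perp is spanned by p random Gaussian matrices."""
    G = rng.standard_normal((shape[0] * shape[1], p))
    B, _ = np.linalg.qr(G)  # orthonormal basis of Q-perp in vectorised form

    def P_Q(X):
        x = X.reshape(-1)
        return (x - B @ (B.T @ x)).reshape(shape)

    return P_Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    L0 = rng.standard_normal((n, 1)) @ rng.standard_normal((1, n))
    S0 = np.zeros((n, n))
    S0[0, 0] = 1.0
    P_Q = make_projector_Q((n, n), p=5, rng=rng)
    # With tau = inf the Frobenius terms vanish and the plain PCP objective remains.
    print(objective(L0, S0, lam=0.3, tau=np.inf), objective(L0, S0, lam=0.3, tau=1.0))
    # Only P_Q(M) would be observed in the reduced-measurement setting.
    print(P_Q(L0 + S0).shape)
```

Setting `tau=np.inf` removes the quadratic terms, which matches the remark that the case $\tau = \infty$ corresponds to the problem studied in [17].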

Contents and Notations
We provide a brief summary of the notation used throughout the paper. $\|X\|$ denotes the operator norm of the matrix $X$, $\|X\|_F$ the Frobenius norm, and $\|X\|_*$ the nuclear norm; the dual norm of $\|X\|_{(i)}$ is denoted by $\|X\|_{(i)}^*$. The Euclidean inner product between two matrices $X$, $Y$ is defined by $\langle X, Y \rangle = \operatorname{trace}(X^* Y)$, so that $\|X\|_F^2 = \langle X, X \rangle$. The Cauchy-Schwarz inequality, which will be used often in the following sections, gives $\langle X, Y \rangle \le \|X\|_F \|Y\|_F$, and it is well known that $\langle X, Y \rangle \le \|X\|_{(i)} \|Y\|_{(i)}^*$ (see, e.g., [2,18]). Linear transformations acting on the space of matrices are denoted by $\mathcal{P}X$; such an operator is, in substance, a high-dimensional matrix. Its operator norm is denoted by $\|\mathcal{P}\|$, where $\|\mathcal{P}\| = \sup_{\|X\|_F = 1} \|\mathcal{P}X\|_F$. We say that an event $E$ occurs with high probability if $\mathbb{P}[E^c] \le C m^{-\alpha}$ for some positive constants $C$ and $\alpha$. We denote the reduced singular value decomposition (SVD) of the low-rank matrix $L_0$ by $L_0 = U \Sigma V^*$ and define the linear subspace
$$T := \{ U X^* + Y V^* : X, Y \in \mathbb{R}^{n \times r} \}.$$
We denote the support of the sparse matrix $S_0$ by $\Omega$; by a slight abuse of notation, $\Omega$ also denotes the subspace of matrices whose support is contained in the support of $S_0$.
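The notation above can be checked numerically on a small example. The sketch below assumes real matrices (so $X^*$ is the transpose) and uses the standard closed form $\mathcal{P}_T X = UU^*X + XVV^* - UU^*XVV^*$ for the orthogonal projection onto $T$, which is not written out in the text; the name `proj_T` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2

# Rank-r matrix L0 and its reduced SVD L0 = U diag(sigma) V^*.
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U, sigma, Vt = np.linalg.svd(L0, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T


def proj_T(X):
    """Orthogonal projection onto T = {U A^* + B V^*} (standard closed form)."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ X + X @ PV - PU @ X @ PV


X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
inner = np.trace(X.T @ Y)

# <X, X> equals the squared Frobenius norm.
print(np.isclose(np.trace(X.T @ X), np.linalg.norm(X, "fro") ** 2))
# Cauchy-Schwarz, and the duality between operator norm and nuclear norm.
print(abs(inner) <= np.linalg.norm(X, "fro") * np.linalg.norm(Y, "fro"))
print(abs(inner) <= np.linalg.norm(X, 2) * np.linalg.norm(Y, "nuc"))
# U V^* lies in T, and proj_T is idempotent.
print(np.allclose(proj_T(U @ V.T), U @ V.T))
print(np.allclose(proj_T(proj_T(X)), proj_T(X)))
```

Each check should print `True`; the third one is the instance of $\langle X, Y \rangle \le \|X\|_{(i)} \|Y\|_{(i)}^*$ in which the operator norm is paired with its dual, the nuclear norm.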
The rest of this paper is organized as follows. In Section 2, we list several important lemmas and prove a key lemma on which our main result is based. In Section 3, we give conditions that guide the choice of suitable parameters in practical algorithms. Numerical results are given in Section 4. Finally, conclusions are discussed in Section 5.

Important Lemmas
In this section, we first list some useful lemmas which will be used throughout this paper and then prove a main lemma. Although the main lemma is similar to the corresponding one in [17], there is a significant difference: the construction of $W^Q$ is very different, which requires additional work.
At the same time we assume that $\|\mathcal{P}_\Omega \mathcal{P}_{\Gamma^\perp}\| < 1/2$ and $\lambda < 1$. Then $(L_0, S_0)$ is the unique optimal solution to (2) if there is a pair $(W, F) \in \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n}$ satisfying the following conditions

(a). W
Lemma 3 ([17]). In addition to the assumptions in the previous lemma, suppose that the signs of the non-zero entries of $S_0$ are i.i.d. random. Then the matrix $W^S$ obeys the following inequalities with high probability.

(a). W
The constructions of $W^L$ and $W^S$ can be found in [17]. However, the matrix $W^Q$ constructed in [17] does not satisfy the requirements of our problem, so we have to modify this construction to suit problem (2). We first give an explicit construction of $W^Q$ and then prove that the modified $W^Q$ has the required properties.
Construction of $W^Q$ with least modification. We define $W^Q$ by the following least squares problem:
This construction of $W^Q$ satisfies Lemma 5 in [17] and also has the following property.
Then the matrix $W^Q$ obeys the following inequalities with high probability. (a).
Proof. A: Bounding the Frobenius norm of $UV^* + \frac{1}{\tau} L_0$. For convenience, let $\xi := \|UV^* + \frac{1}{\tau} L_0\|_F$. According to the triangle inequality, we have
In the last equality, we have used $S_0 \in \Omega$. Note that
According to the derivation in [16], the following inequality holds with high probability
Putting these together, we obtain
Combining with $\tau \ge \|M\|_F$, we obtain
is the optimal solution of the least squares problem, so we can use the convergent Neumann series expansion. It is easy to see that
According to the triangle inequality, we have
B: Estimating the first inequality of Lemma 4. In order to bound $\|W^Q\|_F$, we first have to bound the norm of ∑ k>0 (P
According to Lemma 11 in [17], the following inequality holds with high probability for any $\epsilon > 0$,
According to [17], the following inequality holds with high probability:
Secondly, we bound the Frobenius norm of
is a random Gaussian matrix with i.i.d. entries distributed as $N(0, 1/n^2)$. Therefore, we obtain the following inequality
Together with Lemma 7 in [17], we obtain
It is easy to see that the entries of $H^* \mathrm{vec}(UV^* + \frac{1}{\tau} L_0)$ have the same distribution as $\langle G, UV^* + \frac{1}{\tau} L_0 \rangle$, in which the $G_{ij} \sim N(0, 1/n^2)$ are independent and identically distributed. It is obvious that
, where $\xi := \|UV^* + \frac{1}{\tau} L_0\|_F$. For simplicity, we define $Z := H^* \mathrm{vec}(UV^* + \frac{1}{\tau} L_0)$. Using the Jensen inequality, we obtain
According to Proposition 2.18 in [18], we obtain
Setting $t = 6 \log n$, after a simple inference, we obtain the following inequality with high probability.
For sufficiently large $n$, the first inequality of Lemma 4 is established. We next estimate the second inequality of Lemma 4.
C: Estimating the second inequality of Lemma 4. Note that
After a simple inference, we obtain the following inequality,
where $C$ is some constant. Note that the second inequality of Lemma 4 is established for sufficiently large $n$.
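Part A of the proof invokes a convergent Neumann series. As a generic illustration (not the specific operator used in the proof, which is not reproduced here), the sketch below checks numerically that $\sum_{k \ge 0} A^k$ converges to $(I - A)^{-1}$ when $\|A\| < 1$. The matrix `A` and the helper `neumann_inverse` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# A generic linear operator A, rescaled so that its operator norm is 1/2.
A = rng.standard_normal((d, d))
A *= 0.5 / np.linalg.norm(A, 2)


def neumann_inverse(A, terms):
    """Partial sum I + A + ... + A^(terms-1), approximating (I - A)^(-1)."""
    total = np.eye(A.shape[0])
    power = np.eye(A.shape[0])
    for _ in range(terms - 1):
        power = power @ A
        total += power
    return total


exact = np.linalg.inv(np.eye(d) - A)
for terms in (5, 10, 20):
    err = np.linalg.norm(neumann_inverse(A, terms) - exact, 2)
    # In exact arithmetic the error is at most 0.5**terms / (1 - 0.5).
    print(terms, err)
```

The geometric decay of the truncation error is what allows the tail $\sum_{k > 0}$ of such a series to be bounded in norm once the operator norm is shown to be strictly below one.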

Estimating Parameter τ
In this section, we provide sufficient conditions under which $(L_0, S_0)$ is the unique and exact solution of the strongly convex program (2) with high probability, i.e., the solution $(\hat{L}, \hat{S})$ of problem (2) satisfies $\hat{L} = L_0$ and $\hat{S} = S_0$. Afterwards, an explicit lower bound on $\tau$ is provided, which guides the choice of suitable parameters in practical algorithms.
where $\alpha$, $\beta$ are positive parameters satisfying
Then $(L_0, S_0)$ is the unique solution of the strongly convex program (2).
Proof. For any feasible perturbation $(H_L, H_S)$, both $(L_0, S_0)$ and $(L_0 + H_L, S_0 - H_S)$ satisfy the constraint of (2), so $\mathcal{P}_Q H_L = \mathcal{P}_Q H_S$. According to the definition of $\Gamma$, we have $\Gamma \subset Q$; therefore $\mathcal{P}_\Gamma H_L = \mathcal{P}_\Gamma H_S$. For simplicity, we define $f(L, S) := \|L\|_* + \lambda \|S\|_1 + \frac{1}{2\tau}\|L\|_F^2 + \frac{1}{2\tau}\|S\|_F^2$, and we obtain the following inequality
In the second inequality above, we have used the facts
In the third inequality above, we have used the property $\mathcal{P}_Q H_L = \mathcal{P}_Q H_S$.
We have provided a bound on $f(L_0 + H_L, S_0 - H_S)$; next we bound $\|\mathcal{P}_\Omega H_S\|_F$. According to the definition of $\Gamma$, we obtain
Putting these together, we get
According to (6), the inequality above implies that $(L_0, S_0)$ is a solution to (2), i.e., $(L_0, S_0)$ is the exact solution of the strongly convex program (2) with high probability. The uniqueness follows from the strong convexity of the objective in (2).
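The uniqueness step can be spelled out with the standard midpoint argument. The sketch below assumes that the objective $f$ of (2) is the principal component pursuit objective plus $\frac{1}{2\tau}(\|L\|_F^2 + \|S\|_F^2)$ with finite $\tau > 0$, and that the feasible set is affine; it is a generic argument rather than a reproduction of the authors' derivation.

```latex
Suppose $(L_1,S_1) \neq (L_2,S_2)$ are both optimal for (2), with common value $f^\star$.
The feasible set is affine, so the midpoint
$(\bar L,\bar S) = \tfrac12 (L_1+L_2,\; S_1+S_2)$ is feasible.
The nuclear norm and the $\ell_1$ norm are convex, and the parallelogram identity gives
\[
  \left\| \tfrac12 (L_1+L_2) \right\|_F^2
  = \tfrac12 \|L_1\|_F^2 + \tfrac12 \|L_2\|_F^2 - \tfrac14 \|L_1-L_2\|_F^2 ,
\]
and likewise for $S$. Hence
\[
  f(\bar L,\bar S) \le f^\star
  - \frac{1}{8\tau}\left( \|L_1-L_2\|_F^2 + \|S_1-S_2\|_F^2 \right) < f^\star ,
\]
which contradicts optimality. The minimizer of (2) is therefore unique.
```

The strict gap $\frac{1}{8\tau}(\cdot)$ vanishes as $\tau \to \infty$, which is why uniqueness in that limiting case has to be argued differently.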
In practice, choosing the parameter $\tau$ is difficult. We therefore provide criteria for the value of $\tau$ that guide the choice of suitable parameters in practical algorithms. Theorems 3 and 4 provide such criteria; the bound on $\tau$ in Theorem 4 is more explicit and more useful in practice.
, and $\tau_3 =$
Then, under the other assumptions of Theorem 1, $(L_0, S_0)$ is the unique solution to the strongly convex program (2) with high probability.
Proof. In order to check the conditions in Theorem 2, we prove the existence of a matrix $W$ obeying
Note that $W = W^L + W^S + W^Q$. We check that the above conditions hold one by one. For simplicity, we define $\gamma := \|\mathcal{P}_{\Omega^\perp}(L_0 - S_0)\|_\infty$ and $\delta := \|\mathcal{P}_\Omega(L_0 - S_0)\|_F$. Without loss of generality, let $\beta > 1/2$. With the help of the constructions of $W^L$, $W^S$ and $W^Q$, it is easy to check that the first and second conditions hold. According to the modified $W^Q$ constructed in Lemma 4, together with $\mathcal{P}_{Q^\perp} W^L = 0$ and $\mathcal{P}_{Q^\perp} W^S = 0$, we have P Q , which implies that the third condition holds. It remains to show that the last two conditions also hold under suitable assumptions. Pertaining to the fourth inequality, we have
For the last inequality, noting that $\mathcal{P}_\Omega(W^S) = \lambda\, \mathrm{sgn}(S_0)$ and $\mathcal{P}_\Omega(W^Q) = 0$, we obtain
In order to satisfy condition (8), we choose a $\tau$ obeying
Combining (9) with (6), we obtain
Together with (10) and (11), Theorem 3 is established.
In order to simplify Formula (7), we take $\alpha = 3/8$ and $\beta = 5/8$, which satisfy the conditions above. Therefore
However, the exact lower bound is very hard to obtain, because in practical problems we only have information about the given data matrix $M$. Noting that
and, according to [16], we have
it is obvious that $\|M\|_\infty \le \|M\|_F$. Therefore, we obtain Theorem 4 as follows.
Theorem 4. Assuming
and the other assumptions of Theorem 1, $(L_0, S_0)$ is the unique solution to the strongly convex program (2) with high probability.

Numerical Results
In this section, we provide numerical experiments to verify Theorem 4. Without loss of generality, we assume that $r = 2$ and $M = L_0 + S_0$, where the rank-$r$ matrix is $L_0 = XY^T$ with $X$ and $Y$ being $15 \times 2$ and $30 \times 2$ matrices whose entries are independently sampled from an $N(0, \delta^2)$ distribution, and the sparse matrix is $S_0 = \mathcal{P}_\Omega E$ with a support set of size $k_s = \rho_s mn$ chosen uniformly at random. Let $(L_1, S_1)$ and $(L_2, S_2)$ be the optimal solutions of optimization problem (1) and of the strongly convex optimization problem (2), respectively. The numerical experiments are carried out under $\|M\|_F = 1$.
Figure 1.
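The data generation described above can be sketched as follows. Several details are assumptions made for illustration because the text does not specify them: the entries of $E$ are taken to be random signs, the default values `delta=1.0` and `rho_s=0.05` are arbitrary, and $\|M\|_F = 1$ is enforced by rescaling both components. The function name `make_test_problem` is hypothetical.

```python
import numpy as np


def make_test_problem(m=15, n=30, r=2, delta=1.0, rho_s=0.05, seed=0):
    """Generate M = L0 + S0 with ||M||_F = 1, following the described setup."""
    rng = np.random.default_rng(seed)
    X = delta * rng.standard_normal((m, r))
    Y = delta * rng.standard_normal((n, r))
    L0 = X @ Y.T  # rank-r component of size m x n

    # Support Omega of size k_s = rho_s * m * n, chosen uniformly at random.
    k_s = int(round(rho_s * m * n))
    support = rng.choice(m * n, size=k_s, replace=False)
    mask = np.zeros(m * n, dtype=bool)
    mask[support] = True
    mask = mask.reshape(m, n)

    # Assumption: E has i.i.d. random-sign entries; the text does not specify E.
    E = rng.choice([-1.0, 1.0], size=(m, n))
    S0 = np.where(mask, E, 0.0)

    # Rescale both components so that the data matrix has unit Frobenius norm.
    scale = np.linalg.norm(L0 + S0, "fro")
    return L0 / scale, S0 / scale, mask


if __name__ == "__main__":
    L0, S0, mask = make_test_problem()
    print(L0.shape, np.linalg.matrix_rank(L0), int(mask.sum()))
    print(np.linalg.norm(L0 + S0, "fro"))
```

Normalizing to $\|M\|_F = 1$ is convenient because the criteria for $\tau$ discussed earlier are expressed in terms of quantities computable from $M$.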

Results and Conclusions
In this paper, we have studied strongly convex programming for principal component pursuit with reduced linear measurements.
First, we provide sufficient conditions under which the strongly convex models lead to exact recovery of the low-rank and sparse components, i.e., assuming
and the other assumptions of Theorem 1, $(L_0, S_0)$ is the unique solution to the strongly convex program (2) with high probability.
Second, we give a criterion for choosing the value of $\tau$, which offers useful advice on how to set suitable parameters when designing efficient algorithms. In particular, the main results of [16] are a special case of our results. In this sense, we extend the result on choosing suitable parameters to the general problem.