Unified Scalable Equivalent Formulations for Schatten Quasi-Norms

The Schatten quasi-norm can be used to bridge the gap between the nuclear norm and the rank function. However, most existing algorithms are too slow or even impractical for large-scale problems, because they require an SVD or EVD of the whole matrix in each iteration. In this paper, we rigorously prove that for any p, p1, p2 > 0 satisfying 1/p = 1/p1 + 1/p2, the Schatten-p quasi-norm is equivalent to the minimization of the product of the Schatten-p1 norm (or quasi-norm) and the Schatten-p2 norm (or quasi-norm) of its two factor matrices. We then prove the equivalence between the product formula of the Schatten quasi-norm and its sum formula for the two cases p1 = p2 and p1 ≠ p2. In particular, when p > 1/2, there is an equivalence between the Schatten-p quasi-norm (or norm) of any matrix and the Schatten-2p norms of its two factor matrices. That is, various Schatten-p quasi-norm minimization problems with p > 1/2 can be transformed into problems involving only the smooth, convex norms of two factor matrices, which can lead to simpler and more efficient algorithms than conventional methods. We further extend the theoretical results from two factor matrices to the cases of three and more factor matrices, from which we can see that for any 0 < p < 1, the Schatten-p quasi-norm is the minimization of the mean to the power of (⌊1/p⌋+1) of the Schatten-(⌊1/p⌋+1)p norms of all factor matrices, where ⌊1/p⌋ denotes the largest integer not exceeding 1/p. In other words, for any 0 < p < 1, Schatten-p quasi-norm minimization can be transformed into an optimization problem involving only the smooth, convex norms of multiple factor matrices. In addition, we present some representative examples for two and three factor matrices. The bi-nuclear and Frobenius/nuclear quasi-norms defined in [1] and the tri-nuclear quasi-norm defined in [2] are three important special cases.


I. INTRODUCTION
The affine rank minimization problem arises directly in various areas of science and engineering, including statistics, machine learning, information theory, data mining, medical imaging and computer vision. Some representative applications include low-rank matrix completion (LRMC) [3], robust principal component analysis (RPCA) [4], low-rank representation [5], multivariate regression [6], multi-task learning [7] and system identification [8]. To solve such problems efficiently, one typically relaxes the rank function to its tractable convex envelope, i.e., the nuclear norm (the sum of the singular values, also known as the trace norm or Schatten-1 norm), which leads to a convex optimization problem [3,9,10,11].
In fact, the nuclear norm of a matrix is the ℓ1-norm of the vector of its singular values, and thus it encourages a low-rank solution. However, it has been shown in [12,13] that the ℓ1-norm over-penalizes large entries of vectors, and therefore yields a solution from a possibly biased solution space. By the relationship between the ℓ1-norm and the nuclear norm, the nuclear norm penalty shrinks all singular values equally, which likewise over-penalizes large singular values. That is, the nuclear norm may make the solution deviate from the original solution, just as the ℓ1-norm does. Compared with the nuclear norm, the Schatten-p quasi-norm with 0 < p < 1 is non-convex, but it gives a closer approximation to the rank function. Thus, Schatten-p quasi-norm minimization has received a significant amount of attention from researchers in various communities, such as image recovery [14,15], collaborative filtering [16,17] and MRI analysis [18].
Recently, two classes of iterative reweighted least squares (IRLS) algorithms were proposed in [19] and [20] to approximately solve the associated Schatten-p quasi-norm minimization problems. In addition, Lu et al. [14] proposed a family of iteratively reweighted nuclear norm (IRNN) algorithms to solve various non-convex surrogate minimization problems, including the Schatten quasi-norm. In [14,15,16,21,22], the Schatten-p quasi-norm has been shown to be empirically superior to the nuclear norm for many different problems. Moreover, [23] theoretically proved that Schatten-p quasi-norm minimization requires significantly fewer measurements than conventional nuclear norm minimization. However, the existing algorithms mentioned above have to be solved iteratively and involve a singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration. Thus they suffer from high computational cost and are not even applicable to large-scale problems [1,2].
On the contrary, the nuclear norm has a scalable equivalent formulation, also known as the bilinear spectral penalty [11,24,25], which has been successfully applied in many large-scale applications, such as collaborative filtering [17,26,27]. In addition, Zuo et al. [28] proposed a generalized shrinkage-thresholding operator to iteratively solve ℓp quasi-norm minimization with arbitrary p values, i.e., 0 ≤ p < 1.
Since the Schatten-p quasi-norm of a matrix is equivalent to the ℓp quasi-norm of its vector of singular values, we may naturally ask the following question: can we design a unified scalable equivalent formulation of the Schatten-p quasi-norm for arbitrary p values, i.e., 0 < p < 1?
In this paper, we first present and prove the equivalence between the Schatten-p quasi-norm of any matrix and the minimization of the product of the Schatten-p1 norm (or quasi-norm) and the Schatten-p2 norm (or quasi-norm) of its two factor matrices, for any p, p1, p2 > 0 satisfying 1/p = 1/p1 + 1/p2. In addition, we also prove the equivalence between the product formula of the Schatten quasi-norm and its sum formula for the two cases p1 = p2 and p1 ≠ p2. When p > 1/2 and p1 and p2 are set to the same value, there is an equivalence between the Schatten-p quasi-norm (or norm) of any matrix and the Schatten-2p norms of its two factor matrices, a representative example of which is the widely used equivalent formulation of the nuclear norm, i.e., ‖X‖_* = min_{X=UV^T} (‖U‖²_F + ‖V‖²_F)/2. In other words, various Schatten-p quasi-norm minimization problems with p > 1/2 can be transformed into problems involving only the smooth, convex norms of two factor matrices, which can lead to simpler and more efficient algorithms than conventional methods [14,15,16,19,20,21,22].
We further extend the theoretical results from two factor matrices to the cases of three and more factor matrices, from which we can see that for any 0 < p < 1, the Schatten-p quasi-norm of any matrix is equivalent to the minimization of the mean to the power of (⌊1/p⌋+1) of the Schatten-(⌊1/p⌋+1)p norms of all factor matrices, where ⌊1/p⌋ denotes the largest integer not exceeding 1/p. Note that the norms of all factor matrices are convex and smooth. Besides the theoretical results, we also present several representative examples for two and three factor matrices. Naturally, the bi-nuclear and Frobenius/nuclear quasi-norms defined in our previous paper [1] and the tri-nuclear quasi-norm defined in our previous paper [2] are three important special cases.

II. NOTATIONS AND BACKGROUND
Definition 1. The Schatten-p norm (0 < p < ∞) of a matrix X ∈ R^{m×n} (without loss of generality, we can assume that m ≥ n) is defined as

‖X‖_p = ( Σ_{i=1}^n σ_i^p(X) )^{1/p}, (1)

where σ_i(X) denotes the i-th singular value of X.
When p ≥ 1, Definition 1 defines a natural norm; for instance, the Schatten-1 norm is the so-called nuclear norm, ‖X‖_*, and the Schatten-2 norm is the well-known Frobenius norm. For 0 < p < 1, it defines a quasi-norm. As a non-convex surrogate for the rank function, the Schatten-p quasi-norm is a better approximation than the nuclear norm [23], analogous to the superiority of the ℓp quasi-norm over the ℓ1-norm [20,29].
To recover a low-rank matrix from a small set of linear observations, b ∈ R^l, the general Schatten quasi-norm minimization problem is formulated as follows:

min_X ‖X‖_p^p, s.t. A(X) = b, (2)

where A : R^{m×n} → R^l is a general linear operator. Alternatively, the Lagrangian version of (2) is

min_X λ‖X‖_p^p + f(A(X) − b), (3)

where λ > 0 is a regularization parameter, and the loss function f(·) : R^l → R generally denotes a certain measurement for characterizing the loss A(X) − b. For instance, in LRMC problems [14,19,22,30], A is the linear projection operator P_Ω and f(·) = ‖·‖²₂, where P_Ω is the orthogonal projection onto the linear subspace of matrices supported on Ω := {(i,j) | D_ij is observed}: P_Ω(D)_ij = D_ij if (i,j) ∈ Ω and P_Ω(D)_ij = 0 otherwise. In addition, for RPCA problems [4,31,32,33,34], A is the identity operator and f(·) = ‖·‖₁. In the problem of multivariate regression [35], A(X) = AX with A being a given matrix, and f(·) = ‖·‖²_F. f(·) may also be chosen as the hinge loss as in [24] or the ℓp quasi-norm as in [16]. Generally, the Schatten-p quasi-norm minimization problem, such as (2) and (3), is non-convex, non-smooth and even non-Lipschitz [36]. So far, only a few algorithms, such as IRLS [19,20] and IRNN [14], have been developed to solve such challenging problems. However, since most existing Schatten-p quasi-norm minimization algorithms involve an SVD or EVD of the whole matrix in each iteration, they suffer from a high computational cost of O(n²m), which severely limits their applicability to large-scale problems [1,2]. While there have been many efforts towards fast SVD or EVD computation, such as partial SVD [37], the performance of those methods is still unsatisfactory for many real applications [38]. Therefore, an important problem is how to transform challenging problems such as (2) and (3) into more tractable ones, which can be solved by simpler and more efficient algorithms.
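As a concrete illustration of the objects involved, the following sketch (the helper name `schatten_p` is ours, NumPy assumed) computes the Schatten-p (quasi-)norm as the ℓp (quasi-)norm of the singular values; the full SVD it relies on is exactly the O(n²m) step that dominates each iteration of the algorithms discussed above.

```python
import numpy as np

def schatten_p(X, p):
    """Schatten-p (quasi-)norm: the l_p (quasi-)norm of the vector of singular values."""
    s = np.linalg.svd(X, compute_uv=False)  # full SVD: the O(n^2 m) bottleneck
    return np.sum(s ** p) ** (1.0 / p)

X = np.diag([3.0, 2.0, 1.0])
print(schatten_p(X, 1.0))   # nuclear norm: 3 + 2 + 1 = 6
print(schatten_p(X, 2.0))   # Frobenius norm: sqrt(14)
print(schatten_p(X, 0.5))   # quasi-norm: (sqrt(3) + sqrt(2) + 1)^2
```

For 0 < p < 1 this value penalizes large singular values far less than the nuclear norm does, which is the motivation for the Schatten quasi-norm surrogate.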

III. MAIN RESULTS
In this section, we first present and prove the equivalence between the Schatten-p quasi-norm of any matrix and the Schatten-p1 and Schatten-p2 quasi-norms (or norms) of its two factor matrices, where 1/p = 1/p1 + 1/p2 with any p1 > 0 and p2 > 0. Moreover, we prove the equivalence between the product formula of the Schatten quasi-norm and its sum formula for the two cases p1 = p2 and p1 ≠ p2. For any 1/2 < p ≤ 1, the Schatten-p quasi-norm (or norm) of any matrix is equivalent to the minimization of the squared mean of the Schatten-2p norms of both factor matrices, for instance ‖X‖_* = min_{X=UV^T} (‖U‖²_F + ‖V‖²_F)/2, which can lead to simpler and more efficient algorithms than conventional methods. Finally, we extend the theoretical results from two factor matrices to the cases of three and more factor matrices. We can see that for any 0 < p < 1, the Schatten-p quasi-norm of any matrix is the minimization of the mean to the power of (⌊1/p⌋+1) of the Schatten-(⌊1/p⌋+1)p norms of all factor matrices, where ⌊1/p⌋ denotes the largest integer not exceeding 1/p.

A. Unified Schatten Quasi-Norm Formulations of Two Factor Matrices
Theorem 1. Any matrix X ∈ R^{m×n} with rank(X) = r ≤ d can be decomposed into the product of two much smaller matrices U ∈ R^{m×d} and V ∈ R^{n×d}, i.e., X = UV^T. For any 0 < p ≤ 1, p1 > 0 and p2 > 0 satisfying 1/p = 1/p1 + 1/p2,

‖X‖_p = min_{U∈R^{m×d}, V∈R^{n×d}: X=UV^T} ‖U‖_{p1} ‖V‖_{p2}. (4)

The detailed proof of Theorem 1 is provided in Section IV-A. From Theorem 1, it is clear that for any 0 < p ≤ 1 and p1, p2 > 0 satisfying 1/p = 1/p1 + 1/p2, the Schatten-p quasi-norm (or norm) of any matrix X is equivalent to the minimization of the product of the Schatten-p1 norm (or quasi-norm) and the Schatten-p2 norm (or quasi-norm) of its two factor matrices.
Naturally, p1 and p2 may take the same value, i.e., p1 = p2 = 2p, or different values, i.e., p1 ≠ p2. Next, we discuss these two cases, i.e., p1 = p2 and p1 ≠ p2.
1) Case of p1 = p2: First, we discuss the case when p1 = p2. In fact, for any given 0 < p ≤ 1, there exist infinitely many pairs of positive numbers p1 and p2 satisfying 1/p1 + 1/p2 = 1/p such that the equality (4) holds. By setting the same value for p1 and p2, i.e., p1 = p2 = 2p, we obtain the following unified scalable equivalent formulation for the Schatten-p quasi-norm (or norm).
Corollary 1. Given any matrix X ∈ R^{m×n} with rank(X) = r ≤ d, the following equalities hold:

‖X‖_p = min_{X=UV^T} ‖U‖_{2p} ‖V‖_{2p} = min_{X=UV^T} [ (‖U‖_{2p} + ‖V‖_{2p}) / 2 ]². (5)

Remark 1. The detailed proof of Corollary 1 is provided in Section IV-B. From the second equality in
(5), we know that, for any 0 < p ≤ 1, the Schatten-p quasi-norm (or norm) minimization problems in many low-rank matrix completion and recovery applications can be transformed into minimizing the mean of the Schatten-2p norms (or quasi-norms) of the two much smaller factor matrices. We note that when 1/2 < p ≤ 1, the norms of both much smaller factor matrices are convex and smooth because 2p > 1, which can lead to simpler and more efficient algorithms than conventional methods [14,15,16,19,20,21,22].
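A quick numerical sanity check of this factored form (a sketch, assuming NumPy; the helper `schatten` is ours, not from the paper): for the balanced factorization U = L_X Σ_X^{1/2}, V = R_X Σ_X^{1/2}, the product and the squared mean of the Schatten-2p norms both reproduce ‖X‖_p exactly.

```python
import numpy as np

def schatten(M, q):
    """Schatten-q (quasi-)norm of M via its singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s ** q) ** (1.0 / q)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 20))
L, s, Rt = np.linalg.svd(X, full_matrices=False)

p = 0.75                       # any 1/2 < p <= 1: the Schatten-2p norms are convex
U = L * np.sqrt(s)             # U = L_X Sigma_X^{1/2}
V = Rt.T * np.sqrt(s)          # V = R_X Sigma_X^{1/2}
assert np.allclose(U @ V.T, X)

# product form and squared-mean form agree with ||X||_p at this factorization
print(np.isclose(schatten(U, 2 * p) * schatten(V, 2 * p), schatten(X, p)))
print(np.isclose(((schatten(U, 2 * p) + schatten(V, 2 * p)) / 2) ** 2, schatten(X, p)))
```

The check only confirms attainment at the SVD-based factorization; the minimization over all factorizations is what Corollary 1 asserts.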
When p = 1 and p 1 = p 2 = 2, the equalities in Corollary 1 become the following forms.
Corollary 2. Given any matrix X ∈ R^{m×n} with rank(X) = r ≤ d, the following equalities hold:

‖X‖_* = min_{X=UV^T} ‖U‖_F ‖V‖_F = min_{X=UV^T} [ (‖U‖_F + ‖V‖_F) / 2 ]² = min_{X=UV^T} (‖U‖²_F + ‖V‖²_F)/2. (6)

The bilinear spectral penalty in the third equality of (6) has been widely used in many low-rank matrix completion and recovery problems, such as collaborative filtering [11,24], RPCA [39], online RPCA [?], and image recovery [40]. Note that the well-known equivalent formulations of the nuclear norm in Corollary 2 are just a special case of Corollary 1, i.e., p = 1 and p1 = p2 = 2. In the following, we give two more representative examples for the case of p1 = p2.
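The bilinear spectral penalty is easy to check numerically (a sketch, assuming NumPy): the balanced factorization built from the SVD attains the nuclear norm, while any other factorization of the same X can only increase the penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))
L, s, Rt = np.linalg.svd(X, full_matrices=False)
nuc = s.sum()                              # nuclear norm ||X||_*

# balanced factorization U = L sqrt(S), V = R sqrt(S) attains the minimum
U, V = L * np.sqrt(s), Rt.T * np.sqrt(s)
assert np.allclose(U @ V.T, X)
penalty = 0.5 * (np.linalg.norm(U, 'fro') ** 2 + np.linalg.norm(V, 'fro') ** 2)
print(np.isclose(penalty, nuc))            # equality at the balanced point

# an unbalanced factorization of the same X gives a strictly larger penalty
U2, V2 = 2.0 * U, 0.5 * V                  # still U2 @ V2.T == X
penalty2 = 0.5 * (np.linalg.norm(U2, 'fro') ** 2 + np.linalg.norm(V2, 'fro') ** 2)
print(penalty2 > nuc)
```

This is the property that makes the penalty "scalable": the factors U and V are m×d and n×d, so no SVD of the full m×n matrix is needed inside an optimization loop.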
Example 1: When p = 1/2, by setting p1 = p2 = 1 and using Theorem 1, we have

‖X‖_{1/2} = min_{X=UV^T} ‖U‖_* ‖V‖_*.

Due to the basic inequality xy ≤ [(x+y)/2]² for any nonnegative real numbers x and y, we obtain ‖X‖_{1/2} ≤ min_{X=UV^T} [(‖U‖_* + ‖V‖_*)/2]². Setting U_⋆ = L_X Σ_X^{1/2} and V_⋆ = R_X Σ_X^{1/2} as in [1,2], we have X = U_⋆ V_⋆^T and ‖U_⋆‖_* = ‖V_⋆‖_* = Σ_i (Σ_X^{1/2})_{i,i}. Therefore, under the constraint X = UV^T, we have the following property [1,2]:

‖X‖_{1/2} = min_{X=UV^T} ‖U‖_* ‖V‖_* = min_{X=UV^T} [ (‖U‖_* + ‖V‖_*) / 2 ]².
In our previous papers [1,2], the scalable formulation in the above equalities is known as the bi-nuclear quasi-norm. In other words, the bi-nuclear quasi-norm is also a special case of Corollary 1, i.e., p = 1/2 and p1 = p2 = 1.
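As a numerical illustration of the bi-nuclear special case (a sketch, assuming NumPy; `nuclear` is our helper): with U_⋆ = L_X Σ_X^{1/2} and V_⋆ = R_X Σ_X^{1/2}, both the product and the squared mean of the two nuclear norms equal ‖X‖_{1/2} = (Σ_i σ_i^{1/2})².

```python
import numpy as np

def nuclear(M):
    """Nuclear norm: sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 25))
L, s, Rt = np.linalg.svd(X, full_matrices=False)

schatten_half = np.sum(np.sqrt(s)) ** 2        # ||X||_{1/2} = (sum_i sigma_i^{1/2})^2
U, V = L * np.sqrt(s), Rt.T * np.sqrt(s)       # U* = L_X Sigma^{1/2}, V* = R_X Sigma^{1/2}
assert np.allclose(U @ V.T, X)

print(np.isclose(nuclear(U) * nuclear(V), schatten_half))                # product form
print(np.isclose(((nuclear(U) + nuclear(V)) / 2) ** 2, schatten_half))   # bi-nuclear form
```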
Example 2: When p = 2/3, by setting p1 = p2 = 4/3 and using Theorem 1, we have

‖X‖_{2/3} = min_{X=UV^T} ‖U‖_{4/3} ‖V‖_{4/3}.

Due to the basic inequality xy ≤ [(x+y)/2]² for any nonnegative real numbers x and y, together with the constraint X = UV^T, we thus have the following property:

‖X‖_{2/3} = min_{X=UV^T} ‖U‖_{4/3} ‖V‖_{4/3} = min_{X=UV^T} [ (‖U‖_{4/3} + ‖V‖_{4/3}) / 2 ]².
2) Case of p1 ≠ p2: In this part, we discuss the case of p1 ≠ p2. Different from the case of p1 = p2, we may set infinitely many different pairs of values for p1 and p2. For any given 0 < p ≤ 1, there must exist p1, p2 > 0, at least one of which is no less than 1 (which means that the norm of at least one factor matrix is convex), such that 1/p = 1/p1 + 1/p2. Indeed, for any 0 < p ≤ 1, the values of p1 and p2 may be different, e.g., p1 = 1 and p2 = 2; thus we give the following unified scalable equivalent formulations for the Schatten-p quasi-norm (or norm).
Corollary 3. Given any matrix X ∈ R^{m×n} with rank(X) = r ≤ d, and any 0 < p ≤ 1, p1 > 0 and p2 > 0 satisfying 1/p1 + 1/p2 = 1/p, the following equalities hold:

‖X‖_p^p = min_{X=UV^T} (‖U‖_{p1} ‖V‖_{p2})^p = min_{X=UV^T} ( p2 ‖U‖_{p1}^{p1} + p1 ‖V‖_{p2}^{p2} ) / (p1 + p2). (9)

In the following, we give two representative examples for the case of p1 ≠ p2.
Example 3: When p = 2/3, by setting p1 = 1 and p2 = 2 and using Theorem 1, we have

‖X‖_{2/3} = min_{X=UV^T} ‖U‖_* ‖V‖_F.

In addition, we have

‖X‖_{2/3}^{2/3} = min_{X=UV^T} (‖U‖_*² ‖V‖_F²)^{1/3} ≤ min_{X=UV^T} (2‖U‖_* + ‖V‖_F²)/3,

where the inequality holds due to the fact that x1 x2 x3 ≤ [(x1 + x2 + x3)/3]³ for any nonnegative real numbers x1, x2 and x3 (applied with x1 = x2 = ‖U‖_* and x3 = ‖V‖_F²). Setting U_⋆ = L_X Σ_X^{2/3} and V_⋆ = R_X Σ_X^{1/3} as in [1], we have X = U_⋆ V_⋆^T and ‖U_⋆‖_* = ‖V_⋆‖_F² = Σ_i (Σ_X^{2/3})_{i,i}. Therefore, together with the constraint X = UV^T, we have the following property [1]:

‖X‖_{2/3}^{2/3} = min_{X=UV^T} (2‖U‖_* + ‖V‖_F²)/3.
In our previous paper [1], the scalable formulation in the above equalities is known as the Frobenius/nuclear hybrid quasi-norm. It is clear that the Frobenius/nuclear hybrid quasi-norm is also a special case of Corollary 3, i.e., p = 2/3, p1 = 1 and p2 = 2. As shown in the above representative examples and our previous papers [1,2], we can design more efficient algorithms for the Schatten-p quasi-norm with 1/2 ≤ p < 1 than conventional methods [14,15,16,19,20,21,22].
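To see the Frobenius/nuclear pairing concretely (a sketch, assuming NumPy): with U_⋆ = L_X Σ_X^{2/3} and V_⋆ = R_X Σ_X^{1/3}, the product ‖U_⋆‖_* ‖V_⋆‖_F equals ‖X‖_{2/3}, and ‖U_⋆‖_* = ‖V_⋆‖_F² = ‖X‖_{2/3}^{2/3}.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 25))
L, s, Rt = np.linalg.svd(X, full_matrices=False)

s23 = np.sum(s ** (2.0 / 3.0))                 # ||X||_{2/3}^{2/3} = sum_i sigma_i^{2/3}
U = L * s ** (2.0 / 3.0)                       # U* = L_X Sigma^{2/3}  -> ||U*||_* = s23
V = Rt.T * s ** (1.0 / 3.0)                    # V* = R_X Sigma^{1/3}  -> ||V*||_F^2 = s23
assert np.allclose(U @ V.T, X)

nuc_U = np.linalg.svd(U, compute_uv=False).sum()
fro2_V = np.linalg.norm(V, 'fro') ** 2
print(np.isclose(nuc_U, s23), np.isclose(fro2_V, s23))
print(np.isclose(nuc_U * np.sqrt(fro2_V), s23 ** 1.5))   # ||U*||_* ||V*||_F = ||X||_{2/3}
```

Only the nuclear norm of the (smaller) factor U and the Frobenius norm of V appear, so the SVD of the full matrix X is never needed inside a solver.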
Example 4: When p = 2/5, by setting p1 = 1/2 and p2 = 2 and using Theorem 1, we have

‖X‖_{2/5} = min_{X=UV^T} ‖U‖_{1/2} ‖V‖_F.

Moreover,

‖X‖_{2/5}^{2/5} = min_{X=UV^T} [ (‖U‖_{1/2}^{1/2})⁴ ‖V‖_F² ]^{1/5} ≤ min_{X=UV^T} (4‖U‖_{1/2}^{1/2} + ‖V‖_F²)/5,

where the inequality holds due to the familiar inequality of arithmetic and geometric means. Setting U_⋆ = L_X Σ_X^{4/5} and V_⋆ = R_X Σ_X^{1/5}, we have X = U_⋆ V_⋆^T and ‖U_⋆‖_{1/2}^{1/2} = ‖V_⋆‖_F² = Σ_i (Σ_X^{2/5})_{i,i}. With the constraint X = UV^T, we thus have the following property:

‖X‖_{2/5}^{2/5} = min_{X=UV^T} (4‖U‖_{1/2}^{1/2} + ‖V‖_F²)/5.

B. Extensions to Multiple Factor Matrices
Theorem 2. Any matrix X ∈ R^{m×n} with rank(X) = r ≤ d can be decomposed into the product of three much smaller matrices U ∈ R^{m×d}, V ∈ R^{d×d} and W ∈ R^{n×d}, i.e., X = UVW^T. For any 0 < p ≤ 1 and p1, p2, p3 > 0 satisfying 1/p1 + 1/p2 + 1/p3 = 1/p,

‖X‖_p = min_{X=UVW^T} ‖U‖_{p1} ‖V‖_{p2} ‖W‖_{p3}. (12)

The detailed proof of Theorem 2 is provided in Section IV-D. From Theorem 2, we can see that for any 0 < p ≤ 1 and p1, p2, p3 > 0 satisfying 1/p1 + 1/p2 + 1/p3 = 1/p, the Schatten-p quasi-norm (or norm) of any matrix is equivalent to the minimization of the product of the Schatten-p1, Schatten-p2 and Schatten-p3 norms (or quasi-norms) of these three much smaller factor matrices. Similarly, we extend Theorem 2 to the case of more factor matrices as follows.
Theorem 3. Any matrix X ∈ R^{m×n} with rank(X) = r ≤ d can be decomposed into the product of multiple much smaller matrices U_i, i = 1, 2, ..., M, i.e., X = ∏_{i=1}^M U_i. For any 0 < p ≤ 1 and p_i > 0 for all i = 1, 2, ..., M satisfying Σ_{i=1}^M 1/p_i = 1/p,

‖X‖_p = min_{X=∏_{i=1}^M U_i} ∏_{i=1}^M ‖U_i‖_{p_i}. (13)

The proof of Theorem 3 is very similar to that of Theorem 2 and is thus omitted. Similar to the case of two factor matrices, for any given 0 < p ≤ 1, there exist infinitely many positive numbers p1, p2 and p3 such that 1/p1 + 1/p2 + 1/p3 = 1/p and the equality (12) holds. By setting the same value for p1, p2 and p3, i.e., p1 = p2 = p3 = 3p, we obtain the following unified scalable equivalent formulations for the Schatten-p quasi-norm (or norm).
Corollary 4. Given any matrix X ∈ R^{m×n} with rank(X) = r ≤ d, the following equalities hold:

‖X‖_p = min_{X=UVW^T} ‖U‖_{3p} ‖V‖_{3p} ‖W‖_{3p} = min_{X=UVW^T} [ (‖U‖_{3p} + ‖V‖_{3p} + ‖W‖_{3p}) / 3 ]³. (14)

Remark 3. The detailed proof of Corollary 4 is provided in Section IV-E. From the second equality in (14), we know that, for any 0 < p < 1, various Schatten-p quasi-norm minimization problems in many low-rank matrix completion and recovery applications can be transformed into the problem of minimizing the mean of the Schatten-3p norms (or quasi-norms) of three much smaller factor matrices. In addition, we note that when 1/3 < p ≤ 1, the norms of the three factor matrices are convex and smooth because 3p > 1, which can also lead to simpler and more efficient algorithms than conventional methods.
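The three-factor formulation checks out numerically as well (a sketch, assuming NumPy), shown here for p = 1/3 with the balanced factors U_⋆ = L_X Σ_X^{1/3}, V_⋆ = Σ_X^{1/3}, W_⋆ = R_X Σ_X^{1/3}:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 20))
L, s, Rt = np.linalg.svd(X, full_matrices=False)

s13 = np.sum(s ** (1.0 / 3.0))                 # sum_i sigma_i^{1/3}
U = L * s ** (1.0 / 3.0)                       # U* = L_X Sigma^{1/3}
V = np.diag(s ** (1.0 / 3.0))                  # V* = Sigma^{1/3}
W = Rt.T * s ** (1.0 / 3.0)                    # W* = R_X Sigma^{1/3}
assert np.allclose(U @ V @ W.T, X)

nuc = lambda M: np.linalg.svd(M, compute_uv=False).sum()
x_third = s13 ** 3                             # ||X||_{1/3} = (sum_i sigma_i^{1/3})^3
print(np.isclose(nuc(U) * nuc(V) * nuc(W), x_third))               # product of nuclear norms
print(np.isclose(((nuc(U) + nuc(V) + nuc(W)) / 3) ** 3, x_third))  # cubed-mean form
```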

Example 5: In the following, we give a representative example. When p = 1/3 and p1 = p2 = p3 = 1, the equalities in Corollary 4 become the following forms [2].

Property 5. For any matrix X ∈ R^{m×n} with rank(X) = r ≤ d, the following equalities hold:

‖X‖_{1/3} = min_{X=UVW^T} ‖U‖_* ‖V‖_* ‖W‖_* = min_{X=UVW^T} [ (‖U‖_* + ‖V‖_* + ‖W‖_*) / 3 ]³.

From Property 5, we can see that the tri-nuclear quasi-norm defined in our previous paper [2] is also a special case of Corollary 4.

Corollary 5. Given any matrix X ∈ R^{m×n} with rank(X) = r ≤ d, and M = ⌊1/p⌋ + 1, the following equalities hold:

‖X‖_p = min_{X=∏_{i=1}^M U_i} ∏_{i=1}^M ‖U_i‖_{Mp} = min_{X=∏_{i=1}^M U_i} [ ( Σ_{i=1}^M ‖U_i‖_{Mp} ) / M ]^M.

IV. PROOFS

In this section, we give the detailed proofs of some important theorems and corollaries. We first introduce several important inequalities, such as Jensen's inequality, Hölder's inequality and Young's inequality, that we use throughout our proofs.
Lemma 1 (Jensen's inequality). Assume that the function g : R_+ → R_+ is a continuous concave function on [0, +∞). For all t_i ≥ 0 satisfying Σ_i t_i = 1, and any x_i ∈ R_+ for i = 1, ..., n,

g( Σ_{i=1}^n t_i x_i ) ≥ Σ_{i=1}^n t_i g(x_i).

Lemma 2 (Hölder's inequality). For any p, q > 1 satisfying 1/p + 1/q = 1, and any x_i and y_i, i = 1, ..., n,

Σ_{i=1}^n |x_i y_i| ≤ ( Σ_{i=1}^n |x_i|^p )^{1/p} ( Σ_{i=1}^n |y_i|^q )^{1/q},

with equality iff there is a constant c ≠ 0 such that each |x_i|^p = c|y_i|^q.

Lemma 3 (Young's inequality). For any a, b ≥ 0 and any k1, k2 > 1 satisfying 1/k1 + 1/k2 = 1,

ab ≤ a^{k1}/k1 + b^{k2}/k2,

with equality iff a^{k1} = b^{k2}.
A. Proof of Theorem 1 Before giving a complete proof for Theorem 1, we first present and prove the following lemma.
Lemma 4. Suppose that Z ∈ R^{m×n} is a matrix of rank r ≤ min(m, n), and denote its thin SVD by Z = L_Z Σ_Z R_Z^T. For any A ∈ R^{r×r} satisfying AA^T = A^T A = I_{r×r} and any given p (0 < p ≤ 1), we have (AΣ_Z A^T)_{k,k} ≥ 0 for all k = 1, ..., r, and

Tr^p(AΣ_Z A^T) ≥ Tr^p(Σ_Z),

where Tr^p(B) = Σ_i B_{ii}^p.
Proof: For any k ∈ {1, ..., r}, we have (AΣ_Z A^T)_{k,k} = Σ_i a_{ki}² σ_i(Z) ≥ 0. Recall that g(x) = x^p with 0 < p < 1 is a concave function on R_+. Using Jensen's inequality [41], as stated in Lemma 1, and Σ_i a_{ki}² = 1 for any k ∈ {1, ..., r}, we have

( Σ_i a_{ki}² σ_i(Z) )^p ≥ Σ_i a_{ki}² σ_i^p(Z).

Summing the above inequality over k and using Σ_k a_{ki}² = 1 for any i ∈ {1, ..., r}, we obtain

Tr^p(AΣ_Z A^T) = Σ_k ( Σ_i a_{ki}² σ_i(Z) )^p ≥ Σ_k Σ_i a_{ki}² σ_i^p(Z) = Σ_i σ_i^p(Z) = Tr^p(Σ_Z).

In addition, when g(x) = x, i.e., p = 1, we obtain Tr(AΣ_Z A^T) = Tr(Σ_Z A^T A) = Tr(Σ_Z), which means that the inequality is still satisfied. This completes the proof.

Proof of Theorem 1:
Let U = L_U Σ_U R_U^T and V = L_V Σ_V R_V^T be the thin SVDs of U and V, respectively, and let X = L_X Σ_X R_X^T, where the columns of L_X ∈ R^{m×d} and R_X ∈ R^{n×d} are the left and right singular vectors associated with the top d singular values of X, whose rank is at most r (r ≤ d). For any i, j ∈ {1, 2, ..., d}, a_{i,j} denotes the element of a matrix A in the i-th row and the j-th column. Let ̺_i and τ_j denote the i-th and j-th diagonal elements of Σ_U and Σ_V, respectively. In the following, we consider the two cases of p1 and p2: at least one of p1 and p2 is no less than 1, or both of them are smaller than 1. It is clear that for any 1/2 ≤ p ≤ 1 and p1, p2 > 0 satisfying 1/p1 + 1/p2 = 1/p, at least one of p1 and p2 must be no less than 1. On the other hand, only if 0 < p < 1/2 do there exist 0 < p1 < 1 and 0 < p2 < 1 such that 1/p1 + 1/p2 = 1/p, i.e., both of them smaller than 1.
Case 1. For any 0 < p ≤ 1, there exist p1 > 0 and p2 > 0, at least one of which is no less than 1, such that 1/p1 + 1/p2 = 1/p. Without loss of generality, we assume that p2 ≥ 1. We set k1 = p1/p and k2 = p2/p; clearly, k1, k2 > 1 and 1/k1 + 1/k2 = 1. From Lemma 4, we obtain ‖X‖_p ≤ ‖U‖_{p1} ‖V‖_{p2}, where inequality (a) in the chain holds due to Hölder's inequality [41], as stated in Lemma 2, inequality (b) follows from the basic inequality xy ≤ (x² + y²)/2 for any real numbers x and y, and inequality (c) relies on the facts that Σ_i (O_3)²_{ij} = 1 and Σ_i (O_4)²_{ji} ≤ 1, together with Jensen's inequality (see Lemma 1) applied to the convex function h(x) = x^{p2} with p2 ≥ 1.
Thus, for any matrices U ∈ R^{m×d} and V ∈ R^{n×d} satisfying X = UV^T, we have ‖X‖_p ≤ ‖U‖_{p1} ‖V‖_{p2}. On the other hand, let U_⋆ = L_X Σ_X^{p/p1} and V_⋆ = R_X Σ_X^{p/p2}, where Σ_X^p denotes the entry-wise p-th power of Σ_X; then X = U_⋆ V_⋆^T, and we obtain

‖U_⋆‖_{p1} ‖V_⋆‖_{p2} = ( Σ_i σ_i^p(X) )^{1/p1} ( Σ_i σ_i^p(X) )^{1/p2} = ( Σ_i σ_i^p(X) )^{1/p} = ‖X‖_p.

In summary, for any 0 < p ≤ 1, p1 > 0 and p2 > 0 satisfying 1/p = 1/p1 + 1/p2, we have

‖X‖_p = min_{X=UV^T} ‖U‖_{p1} ‖V‖_{p2}.

This completes the proof.

B. Proof of Corollary 1
Proof: Since p1 = p2 = 2p > 0, using Theorem 1 we obtain ‖X‖_p = min_{X=UV^T} ‖U‖_{2p} ‖V‖_{2p}. Due to the basic inequality xy ≤ [(x+y)/2]² for any nonnegative real numbers x and y, we have ‖X‖_p ≤ min_{X=UV^T} [(‖U‖_{2p} + ‖V‖_{2p})/2]². On the other hand, let U_⋆ = L_X Σ_X^{1/2} and V_⋆ = R_X Σ_X^{1/2}, where Σ_X^{1/2} denotes the entry-wise square root of Σ_X; then X = U_⋆ V_⋆^T and ‖U_⋆‖_{2p} = ‖V_⋆‖_{2p} = ( Σ_i σ_i^p(X) )^{1/(2p)}, which implies that [(‖U_⋆‖_{2p} + ‖V_⋆‖_{2p})/2]² = ( Σ_i σ_i^p(X) )^{1/p} = ‖X‖_p. The corollary now follows because

‖X‖_p = min_{X=UV^T} ‖U‖_{2p} ‖V‖_{2p} = min_{X=UV^T} [ (‖U‖_{2p} + ‖V‖_{2p}) / 2 ]².

This completes the proof.

C. Proof of Corollary 3
Proof: For any 0 < p ≤ 1, p1 > 0 and p2 > 0 satisfying 1/p1 + 1/p2 = 1/p, using Theorem 1 we have ‖X‖_p = min_{X=UV^T} ‖U‖_{p1} ‖V‖_{p2}. Let k1 = (p1+p2)/p2 and k2 = (p1+p2)/p1, so that 1/k1 + 1/k2 = 1. Then

(‖U‖_{p1} ‖V‖_{p2})^p = (‖U‖_{p1}^{p1})^{1/k1} (‖V‖_{p2}^{p2})^{1/k2} ≤ ‖U‖_{p1}^{p1}/k1 + ‖V‖_{p2}^{p2}/k2 = ( p2 ‖U‖_{p1}^{p1} + p1 ‖V‖_{p2}^{p2} ) / (p1 + p2),

where the inequality follows from the well-known Young's inequality, as stated in Lemma 3. Hence ‖X‖_p^p ≤ min_{X=UV^T} (p2 ‖U‖_{p1}^{p1} + p1 ‖V‖_{p2}^{p2})/(p1+p2). On the other hand, let U_⋆ = L_X Σ_X^{p/p1} and V_⋆ = R_X Σ_X^{p/p2}; then X = U_⋆ V_⋆^T and ‖U_⋆‖_{p1}^{p1} = ‖V_⋆‖_{p2}^{p2} = Σ_i σ_i^p(X), which implies that

( p2 ‖U_⋆‖_{p1}^{p1} + p1 ‖V_⋆‖_{p2}^{p2} ) / (p1 + p2) = Σ_i σ_i^p(X) = ‖X‖_p^p.

This completes the proof.

E. Proof of Corollary 4
Proof: Since p1 = p2 = p3 = 3p > 0, using Theorem 2 we have ‖X‖_p = min_{X=UVW^T} ‖U‖_{3p} ‖V‖_{3p} ‖W‖_{3p}. From the basic inequality xyz ≤ [(x+y+z)/3]³ for any nonnegative real numbers x, y and z, we obtain ‖X‖_p ≤ min_{X=UVW^T} [(‖U‖_{3p} + ‖V‖_{3p} + ‖W‖_{3p})/3]³. On the other hand, let U_⋆ = L_X Σ_X^{1/3}, V_⋆ = Σ_X^{1/3} and W_⋆ = R_X Σ_X^{1/3}, where Σ_X^{1/3} denotes the entry-wise cube root of Σ_X; then X = U_⋆ V_⋆ W_⋆^T and ‖U_⋆‖_{3p} = ‖V_⋆‖_{3p} = ‖W_⋆‖_{3p} = ( Σ_i σ_i^p(X) )^{1/(3p)}, which implies that [(‖U_⋆‖_{3p} + ‖V_⋆‖_{3p} + ‖W_⋆‖_{3p})/3]³ = ( Σ_i σ_i^p(X) )^{1/p} = ‖X‖_p. The corollary now follows because

‖X‖_p = min_{X=UVW^T} ‖U‖_{3p} ‖V‖_{3p} ‖W‖_{3p} = min_{X=UVW^T} [ (‖U‖_{3p} + ‖V‖_{3p} + ‖W‖_{3p}) / 3 ]³.

This completes the proof.

V. CONCLUSIONS
In general, the Schatten-p quasi-norm minimization is non-convex, non-smooth and even non-Lipschitz.
In addition, most existing algorithms are too slow or even impractical for large-scale problems, due to the SVD or EVD of the whole matrix in each iteration. Therefore, it is very important to transform such challenging problems into simpler ones, such as smooth optimization problems. In this paper, we first presented and rigorously proved that for any p, p1, p2 > 0 satisfying 1/p = 1/p1 + 1/p2, the Schatten-p quasi-norm of any matrix is equivalent to the minimization of the product of the Schatten-p1 norm (or quasi-norm) and the Schatten-p2 norm (or quasi-norm) of its two factor matrices. In particular, when p > 1/2, there is an equivalence between the Schatten-p quasi-norm (or norm) of any matrix and the Schatten-2p norms of its two factor matrices, e.g., ‖X‖_* = min_{X=UV^T} (‖U‖²_F + ‖V‖²_F)/2. That is, various Schatten-p quasi-norm minimization problems with p > 1/2 can be transformed into simpler ones involving only the smooth norms of two factor matrices, which can naturally lead to simpler and more efficient algorithms than conventional methods.
We further extended the equivalence relationship of two factor matrices to the cases of three and more factor matrices, from which we can see that for any 0 < p < 1, the Schatten-p quasi-norm of any matrix is the minimization of the mean to the power of (⌊1/p⌋+1) of the Schatten-(⌊1/p⌋+1)p norms of all factor matrices.In other words, for any 0 < p < 1, the Schatten-p quasi-norm minimization can be transformed into an optimization problem only involving the smooth norms of multiple factor matrices.Finally, we provided some representative examples for two and three factor matrices.It is clear that the bi-nuclear and Frobenius/nuclear quasi-norms defined in our previous paper [1] and the tri-nuclear quasi-norm defined in our previous paper [2] are three important special cases.

Remark 2. The detailed proof of Corollary 3 is given in Section IV-C. From Corollary 3, we know that Corollaries 1 and 2 can be viewed as two special cases of Corollary 3, i.e., p1 = p2 = 2p and p1 = p2 = 2, respectively. That is, Corollary 3 is the more general form of Corollaries 1 and 2. From the second equality in (9), we can see that, for any 0 < p ≤ 1, the Schatten-p quasi-norm (or norm) minimization problem can be transformed into minimizing the weighted sum of the Schatten-p1 norm (or quasi-norm) and Schatten-p2 norm (or quasi-norm) of two much smaller factor matrices, where the weights of the two terms in the second equality of (9) are p2/(p1+p2) and p1/(p1+p2), respectively.

Remark 4. From Corollary 2, we know that for any 1/2 < p ≤ 1, the Schatten-p quasi-norm (or norm) of any matrix is equivalent to the minimization of the squared mean of the Schatten-2p norms of both factor matrices, and similarly from Corollary 4 for any 1/3 < p ≤ 1. In other words, if 1/2 < p ≤ 1 or 1/3 < p ≤ 1, the original Schatten-p quasi-norm (or norm) minimization problem can be transformed into a simpler one involving only the convex and smooth norms of two or three factor matrices. In addition, we extend the results of Corollaries 2 and 4 to the case of more factor matrices, as shown in Corollary 5. The proof of Corollary 5 is very similar to that of Corollary 4 and is thus omitted. In other words, for any 0 < p < 1, the Schatten-p quasi-norm of any matrix is equivalent to the minimization of the mean to the power of M of the Schatten-Mp norms of all M factor matrices, where M = ⌊1/p⌋ + 1 and ⌊1/p⌋ denotes the largest integer not exceeding 1/p.
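Corollary 5 can likewise be checked numerically for a p below 1/2 (a sketch, assuming NumPy; `schatten` is our helper): for p = 2/5 we have M = ⌊5/2⌋ + 1 = 3 and Mp = 6/5 > 1, so all three factor norms are convex, smooth Schatten-6/5 norms.

```python
import numpy as np

def schatten(M, q):
    """Schatten-q (quasi-)norm of M via its singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s ** q) ** (1.0 / q)

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 20))
L, s, Rt = np.linalg.svd(X, full_matrices=False)

p = 0.4                                        # M = floor(1/p) + 1 = 3, so Mp = 1.2 > 1
M = int(np.floor(1.0 / p)) + 1
U1 = L * s ** (1.0 / 3.0)                      # balanced three-factor split of Sigma
U2 = np.diag(s ** (1.0 / 3.0))
U3 = (Rt.T * s ** (1.0 / 3.0)).T
assert np.allclose(U1 @ U2 @ U3, X)

prod = schatten(U1, M * p) * schatten(U2, M * p) * schatten(U3, M * p)
print(np.isclose(prod, schatten(X, p)))        # product of Schatten-Mp norms equals ||X||_p
```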