Non-Convex Sparse and Low-Rank Based Robust Subspace Segmentation for Data Mining

Parsimony, including sparsity and low-rankness, has proven important for data mining in social networks, particularly in tasks such as segmentation and recognition. Traditionally, such modeling approaches rely on an iterative algorithm that minimizes an objective function with convex l1-norm or nuclear norm constraints. However, the results obtained by convex optimization are usually suboptimal with respect to the solutions of the original sparse or low-rank problems. In this paper, a novel robust subspace segmentation algorithm is proposed that integrates lp-norm and Schatten p-norm constraints. The resulting affinity graph better captures the local geometrical structure and the global information of the data. As a consequence, our algorithm is more generative, discriminative, and robust. An efficient linearized alternating direction method is derived to solve our model. Extensive segmentation experiments are conducted on public datasets. The proposed algorithm is shown to be more effective and robust than five existing algorithms.


Introduction
Research on high-dimensional data is an essential topic for data mining in modern imaging applications, such as social networks and the Internet of Things (IoT). High-dimensional data are often assumed to reside in several subspaces of lower dimension. For instance, facial images under various lighting conditions and expressions lie in a union of nine-dimensional linear subspaces [1]. Moreover, motion trajectories in videos [2] and hand-written digits [3] can also be approximated by multiple low-dimensional subspaces. These characteristics enable effective segmentation, recognition, and classification. The problem of subspace segmentation [4] is formulated as determining the number of subspaces and partitioning the data according to their intrinsic structure.
Many subspace segmentation algorithms have emerged in the past decades. Some of these methods are algebraic or statistical. Among the algebraic methods, generalized principal component analysis (GPCA) [5] is the most widely used. GPCA characterizes the data subspace with the gradient of a polynomial, and segmentation is obtained by fitting the data with polynomials. However, its performance drops quickly in the presence of noise, and the polynomial fitting computation is time consuming. As to the statistical algorithms, including random sample consensus (RANSAC) [6], factorization-based methods [7,8], and probabilistic principal component analysis (PPCA) [9], their performance depends heavily on the estimation of exact subspace models. Recently, spectral-type methods [10] like sparse representation (SR) [7,11,12], low-rank representation [...] better measurement of data redundancy. Thus, our new objective is to jointly minimize the lp-norm and the Schatten p-norm (0 < p ≤ 1). As p → 0, the proposed lpSpSS is expected to become more robust and effective than SR- and LRR-based subspace segmentation algorithms. In addition, we enforce a non-negative constraint on the reconstruction coefficients, which aids interpretability and allows better solutions in numerous application areas such as text mining, computer vision, and bioinformatics. Traditionally, an alternating direction method (ADM) [40] can solve this optimization problem efficiently. However, to increase the speed and scalability of the algorithm, we choose an efficient solver known as the linearized alternating direction method with adaptive penalty (LADMAP) [19]. As it uses fewer auxiliary variables and requires no matrix inversion, it is more efficient than ADM. Numerical experiments verify that the proposed method consistently obtains better segmentation results.
The rest of this paper is structured as follows: In Section 2, the notations, as well as an overview of SSC and LRR, are presented. Section 3 introduces our novel non-convex sparse and low-rank based robust subspace segmentation. Section 4 reports multiple numerical experiments that examine the effectiveness and robustness of lpSpSS. Section 5 concludes this work.

Background
This section is divided into three parts. First, the notations and definitions are introduced. The background of two algorithms, SSC and LRR, is then discussed in Sections 2.2 and 2.3, respectively.

Notations and Definitions
Suppose $X = [x_1, \cdots, x_N] \in \mathbb{R}^{d \times N}$ is an image matrix that consists of N sufficiently dense data points, with $\{x_i\}$ drawn from a union of n lower-dimensional subspaces $S_1 \cup S_2 \cup \cdots \cup S_n$. Given X, the goal of subspace segmentation is to partition the data points into the underlying low-dimensional subspaces.
The lp-norm (0 < p < ∞) of a vector $x \in \mathbb{R}^{n \times 1}$ can be expressed as $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, in which $x_i$ is the i-th element. Therefore, the lp-norm of x raised to the power p can be expressed as $\|x\|_p^p = \sum_{i=1}^{n} |x_i|^p$.
The Schatten p-norm of a matrix $X \in \mathbb{R}^{n \times m}$ is expressed as:
$$\|X\|_{S_p} = \left(\sum_{i=1}^{\min(n,m)} \sigma_i^p\right)^{1/p}, \tag{1}$$
in which 0 < p ≤ 1, and $\sigma_i$ is the i-th largest singular value. Thus, it can be deduced that:
$$\|X\|_{S_p}^p = \sum_{i=1}^{\min(n,m)} \sigma_i^p = \mathrm{Tr}\left((X^T X)^{p/2}\right). \tag{2}$$
The Schatten 1-norm is just the nuclear norm $\|X\|_*$, while the Schatten p-norm (raised to the power p) approaches the rank of X as p → 0. Compared with $\|X\|_*$, $\|X\|_{S_p}^p$ with p < 1 is therefore a better approximation of the rank of X.
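The two definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `lp_norm_pow` and `schatten_p_pow` and the tolerance `tol` are illustrative choices, not part of the original method.

```python
import numpy as np

def lp_norm_pow(x, p):
    """Return ||x||_p^p = sum_i |x_i|^p for a vector (or flattened matrix)."""
    return float(np.sum(np.abs(np.asarray(x, dtype=float)) ** p))

def schatten_p_pow(X, p, tol=1e-10):
    """Return ||X||_{S_p}^p = sum_i sigma_i^p over the non-zero singular values.

    Singular values below `tol` are treated as exact zeros; otherwise
    round-off noise would contribute spuriously when p is close to 0.
    """
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    s = s[s > tol]
    return float(np.sum(s ** p))

# A rank-2 matrix with singular values 4 and 3.
X = np.diag([3.0, 4.0, 0.0])
nuclear = schatten_p_pow(X, 1.0)       # Schatten 1-norm equals the nuclear norm
near_rank = schatten_p_pow(X, 0.01)    # close to rank(X) = 2 for small p
```

With p = 1 the value coincides with the nuclear norm (7 here), whereas with p = 0.01 it lies close to the rank (2), which illustrates why a small p gives a tighter rank surrogate.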

Sparse Subspace Clustering
Recently, SSC [16] has attracted considerable attention. Its hypothesis is that the data are drawn from several subspaces of lower dimension and can be sparsely self-expressed. More formally, SSC aims to solve the following program:
$$\min_{z_i} \|z_i\|_0 \quad \text{s.t.} \quad x_i = X z_i, \; z_{ii} = 0, \tag{3}$$
where $z_i = [z_{i1}, z_{i2}, \cdots, z_{iN}]^T \in \mathbb{R}^N$ are the reconstruction coefficients, and $\|z\|_0$ refers to the number of nonzero values. As it is difficult to solve this non-convex objective, a convex l1 minimization problem is solved instead:
$$\min_{z_i} \|z_i\|_1 \quad \text{s.t.} \quad x_i = X z_i, \; z_{ii} = 0. \tag{4}$$
The minimization problem in Equation (4) can be solved using the alternating direction method of multipliers (ADMM) [19]. Afterwards, the coefficient matrix Z is used to construct the affinity matrix $W = |Z| + |Z|^T$. Finally, spectral clustering is applied to W to obtain the segmentation result. While SSC works well in practice, the model may fail when the obtained similarity graph is poorly connected (we refer readers to Soltanolkotabi et al. [41] for very recent results in this direction).

Low-Rank Representation-Based Subspace Segmentation
The difference between LRR and SSC is that LRR seeks the lowest-rank representation Z rather than the sparsest representation. LRR is based on the assumption that the observed data $X \in \mathbb{R}^{d \times N}$ are drawn from n low-dimensional subspaces, and the rank of the coefficient matrix, $r = \mathrm{rank}(Z) = \sum_{i=1}^{n} z_i$, is assumed to be much smaller than min{d, N}. LRR is formulated as:
$$\min_{Z} \mathrm{rank}(Z) \quad \text{s.t.} \quad X = XZ. \tag{5}$$
As rank minimization is non-convex, Equation (5) can be reformulated as the following convex minimization problem:
$$\min_{Z} \|Z\|_* \quad \text{s.t.} \quad X = XZ, \tag{6}$$
in which $\|Z\|_*$ is the nuclear norm, which yields a good approximation to the rank of Z. Singular value thresholding (SVT) can be used to solve Equation (6) efficiently when there is no error present in X.
When the data X are noisy, an extension of LRR is proposed as follows:
$$\min_{Z,E} \|Z\|_* + \lambda \|E\|_p \quad \text{s.t.} \quad X = XZ + E, \tag{7}$$
in which λ ≥ 0 is the trade-off parameter that balances low-rankness against reconstruction error. $E \in \mathbb{R}^{d \times N}$ is the noise term, and $\|E\|_p$ denotes a regularization strategy that depends on the property of E. When the noise is Gaussian, $\|E\|_p = \|E\|_F^2$, in which $\|\cdot\|_F$ refers to the Frobenius norm. When the noise consists of entry-wise corruptions, $\|E\|_p = \|E\|_1$, in which $\|\cdot\|_1$ refers to the l1 norm. When the noise consists of sample-specific corruptions and outliers, $\|E\|_p = \|E\|_{2,1}$, in which $\|\cdot\|_{2,1}$ refers to the l2,1 norm. Equation (7) can be solved by ADMM [19] to obtain the coefficient matrix Z. Afterwards, the coefficient matrix Z is used to construct the affinity matrix $W = |Z| + |Z|^T$. Finally, spectral clustering is applied to W to obtain the segmentation results.
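The three noise regularizers can be compared directly on a small example. This is a minimal sketch assuming NumPy, with samples stored as columns (as in X ∈ R^{d×N}); the function names are illustrative.

```python
import numpy as np

def frobenius_sq(E):
    """Squared Frobenius norm: suited to dense Gaussian noise."""
    return float(np.sum(E ** 2))

def l1_norm(E):
    """Entry-wise l1 norm: suited to sparse entry-wise corruptions."""
    return float(np.sum(np.abs(E)))

def l21_norm(E):
    """l2,1 norm: sum of the l2 norms of the columns (samples)."""
    return float(np.sum(np.linalg.norm(E, axis=0)))

# Only the first sample (column) is corrupted.
E = np.array([[3.0, 0.0],
              [4.0, 0.0]])
values = (frobenius_sq(E), l1_norm(E), l21_norm(E))
```

Because the l2,1 norm sums per-column magnitudes, minimizing it tends to drive whole columns of E to zero, which matches the sample-specific corruption model.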

Non-Convex Sparse and Low-Rank Based Robust Subspace Segmentation
In this section, we first propose the non-convex sparse and low-rank-based robust subspace segmentation model, in which we combine the lp-norm with the Schatten p-norm for clustering, and then use LADMAP to solve lpSpSS. Finally, we analyze the time complexity of lpSpSS.

Model of lpSpSS
We consider non-convex sparse and low-rank-based subspace segmentation for data contaminated by noise and corruption. The nuclear norm is replaced by the Schatten p-norm; when p is smaller than 1, the underlying global information can be captured more effectively. Additionally, the lp-norm (0 < p ≤ 1) of the coefficient matrix is introduced in order to obtain stronger robustness to noise [42]. Recent research [43] has demonstrated that the Schatten p-norm is more powerful than the nuclear norm in matrix completion, and the recovery performance of the lp-norm is also superior to that of the convex l1-norm [36]. We therefore expect lpSpSS to be more effective than the convex methods.
We begin by considering the relaxed low-rank subspace segmentation problem in Equation (8), a minimization over Z and E with three terms. The first term, $\|Z\|_p^p$, is the lp-norm, which improves the integration of the local geometrical structure. The second term, $\|Z\|_{S_p}^p$, is the Schatten p-norm, which better approximates the rank of Z. The third term, the reconstruction error $\|E\|_{2,1}$, is the l2,1 norm, which better characterizes errors such as corruptions and outliers. β and λ are trade-off parameters. The widely used non-negative constraint (Z ≥ 0) ensures that the reconstruction coefficients can be used directly in the affinity graph construction.
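For reference, a formulation consistent with the three terms and the constraint described above can be written as follows. This is a sketch reconstructed from the description: the attachment of β to the Schatten p-norm term and of λ to the error term is an assumption, as is the self-expressive constraint X = XZ + E carried over from Equation (7).

```latex
\min_{Z,E}\; \|Z\|_p^p + \beta\,\|Z\|_{S_p}^p + \lambda\,\|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E,\; Z \ge 0 .
```

Setting p = 1 in both regularizers recovers a convex combination of the l1-norm and the nuclear norm, which is the regime of the convex sparse and low-rank baselines.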

Brief Description of LADMAP
We adopt LADMAP [19] to solve the objective function (Equation (8)) with the lp-norm and Schatten p-norm regularizers. An auxiliary variable W is introduced so that the optimization problem becomes separable, and Equation (8) is rewritten as Equation (9). To remove the two linear constraints in Equation (9), we introduce two Lagrange multipliers Y1 and Y2; hence, the optimization problem is defined using the Lagrangian function in Equation (10), where q is a differentiable function of Z, W, E, Y1, and Y2, Y1 and Y2 are the Lagrange multipliers, and µ ≥ 0 is a penalty parameter. We solve Equation (10) by minimizing L with respect to each variable in turn, with the other variables fixed. The updating schemes at each iteration are given in Equations (11)-(13). In Equation (12), $\nabla_Z q$ is the partial derivative of q with respect to Z, and $\theta = \|X\|_F^2$. The detailed procedure of LADMAP is shown in Algorithm 1. The first two subproblems, Equations (11) and (12), are solved in the following subsections. The last, convex subproblem (Equation (13)) can be solved by the l2,1-norm minimization operator [15].
Algorithm 1. LADMAP for solving Equation (9).

End while
Output: The optimal solution W, Z, and E.
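The splitting and the Lagrangian that Algorithm 1 operates on can be sketched in the standard LADMAP form. The expressions below are reconstructed from the surrounding description (W carries the lp-norm, Z carries the Schatten p-norm, and the two linear constraints are X = XZ + E and Z = W); the placement of β and λ and the exact grouping of terms inside q are assumptions rather than a verbatim statement of Equations (9) and (10).

```latex
\begin{aligned}
&\min_{Z,W,E}\; \|W\|_p^p + \beta\|Z\|_{S_p}^p + \lambda\|E\|_{2,1}
\quad \text{s.t.}\quad X = XZ + E,\; Z = W,\; W \ge 0, \\[4pt]
&\mathcal{L} = \|W\|_p^p + \beta\|Z\|_{S_p}^p + \lambda\|E\|_{2,1}
 + \langle Y_1,\, X - XZ - E\rangle + \langle Y_2,\, Z - W\rangle
 + \frac{\mu}{2}\left(\|X - XZ - E\|_F^2 + \|Z - W\|_F^2\right) \\
&\phantom{\mathcal{L}} = \|W\|_p^p + \beta\|Z\|_{S_p}^p + \lambda\|E\|_{2,1}
 + q(Z, W, E, Y_1, Y_2, \mu)
 - \frac{1}{2\mu}\left(\|Y_1\|_F^2 + \|Y_2\|_F^2\right), \\[4pt]
&q(Z, W, E, Y_1, Y_2, \mu) = \frac{\mu}{2}\left(
 \left\|X - XZ - E + \tfrac{Y_1}{\mu}\right\|_F^2
 + \left\|Z - W + \tfrac{Y_2}{\mu}\right\|_F^2\right).
\end{aligned}
```

The second form of the Lagrangian follows by completing the square in each constraint residual; since q is a smooth quadratic, it is the part that LADMAP linearizes in the Z update.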

Solving the Non-Convex lp-Norm Minimization Subproblem (Equation (11))
For each element indexed by (i, j) ∈ Ω, we can decouple Equation (11) into a simplified element-wise formula. Recently, Zhang et al. solved this lp-norm optimization problem via the proposed GIST [30]. For lp-norm minimization, a thresholding function is defined, together with the generalized soft-thresholding function and its corresponding thresholding rule.

Solving the Non-Convex Schatten p-Norm Minimization Subproblem (Equation (12))
We can reformulate Equation (12) in a simplified notation. After applying the SVD to X, X is decomposed into the sum of r rank-one matrices, $X = U \Delta V^T$. Here, U contains the left singular vectors, ∆ is the diagonal matrix of non-zero singular values, and V contains the right singular vectors. The i-th singular value $\delta_i$ is then obtained from a scalar subproblem of the same form as in the lp-norm case.
Equation (16) can be used to solve Equation (18) again. For p = 1, we obtain the same solution as nuclear norm minimization [44].
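Both subproblems reduce to a scalar shrinkage step, which can be sketched as follows. This is a minimal illustration, assuming the scalar problem has the common form min_x 0.5 (x − y)^2 + λ|x|^p and using the widely used generalized soft-thresholding (GST) rule; the function names, the fixed iteration count, and the exact threshold expression are assumptions and may differ in detail from the equations referenced above. The non-negativity constraint is not handled here.

```python
import numpy as np

def gst_threshold(lam, p):
    """Magnitude below which the scalar solution is exactly zero."""
    base = 2.0 * lam * (1.0 - p)
    return base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))

def gst_scalar(y, lam, p, iters=10):
    """Approximately solve min_x 0.5*(x - y)**2 + lam*|x|**p for 0 < p <= 1."""
    if lam <= 0:
        return float(y)
    a = abs(y)
    if a <= gst_threshold(lam, p):
        return 0.0
    x = a
    for _ in range(iters):
        # Fixed-point iteration on the stationarity condition x = a - lam*p*x^(p-1).
        x = a - lam * p * x ** (p - 1.0)
    return float(np.sign(y) * x)

def schatten_p_prox(A, lam, p):
    """Apply the scalar shrinkage to each singular value of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_new = np.array([gst_scalar(v, lam, p) for v in s])
    return (U * s_new) @ Vt  # scales column i of U by s_new[i]

soft = gst_scalar(3.0, 1.0, 1.0)       # p = 1 reduces to ordinary soft-thresholding
x_half = gst_scalar(3.0, 1.0, 0.5)     # non-convex case
Z = schatten_p_prox(np.diag([3.0, 0.5]), 1.0, 1.0)
```

For p = 1 the threshold equals λ and the iteration returns |y| − λ, so the matrix version coincides with singular value thresholding for the nuclear norm, in line with the remark above.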

Convergence and Computational Complexity Analysis
Although Algorithm 1 is described as three alternating steps that solve for W, Z, and E, the steps for Z and E can easily be combined into one larger block step that solves for (Z, E) simultaneously. Thus, the convergence result for two-variable LADMAP in [45] can be applied to our case, which ensures the convergence of the algorithm.
Suppose the size of X is d × n, k is the total number of iterations, and r is the lowest rank for X. The time consumption of Algorithm 1 is mainly determined by Step 2, as it involves time-consuming SVDs. In Step 1, each component of $\nabla_Z q$ at iteration k can be computed in O(rn^2) by using the skinny SVD to update W. In Step 2, the complexity of the SVD to update Z is approximately O(d^2 n). In Step 3, the computational complexity of the l2,1 minimization operator is about O(dn). The total complexity is, thus, O(krn^2 + kd^2 n). Since r ≤ min(d, n), the time cost is, at most, O(krn^2).
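The O(dn) cost of Step 3 can be seen from a sketch of the l2,1 minimization operator, which only needs one pass over the columns. The code assumes the subproblem has the usual proximal form min_E τ‖E‖_{2,1} + 0.5‖E − Q‖_F^2; the names `l21_shrink`, `Q`, and `tau` are illustrative.

```python
import numpy as np

def l21_shrink(Q, tau):
    """Column-wise shrinkage solving min_E tau*||E||_{2,1} + 0.5*||E - Q||_F^2.

    Each column q_i is scaled by max(0, 1 - tau/||q_i||_2), so columns whose
    norm is below tau are set to zero.
    """
    norms = np.linalg.norm(Q, axis=0)                       # O(dn)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale                                        # broadcast over columns

Q = np.array([[3.0, 0.1],
              [4.0, 0.0]])
E = l21_shrink(Q, 1.0)
```

Here the first column (norm 5) is scaled by 0.8 and the second column (norm 0.1) is removed entirely, which is how the operator isolates sample-specific outliers.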

Affinity Graph for Subspace Segmentation
Once Equation (9) is solved by LADMAP, we obtain the optimal coefficient matrix Z*. Since every sample is reconstructed by its neighbors, Z* naturally characterizes the relationships among samples. As such information is a good indicator of similarity among samples, we use the reconstruction coefficients to build the affinity graph. The non-convex lp-norm ensures that each sample connects to only a few samples. As a result, the weights of the graph tend to be sparse. Meanwhile, the non-convex Schatten p-norm ensures that samples lying in the same subspace are highly correlated and tend to be assigned to the same cluster, so Z* is theoretically able to capture the global information. The graph weights are constrained to be non-negative, as they reflect similarities between data points.
After obtaining the coefficient matrix Z*, the reconstruction coefficients of each sample are normalized, and small values are thresholded to zero. The resulting normalized sparse matrix Ẑ* is used to compute the affinity graph W = (Ẑ* + (Ẑ*)^T)/2. Finally, spectral clustering is applied to W to obtain the segmentation results. Our proposed non-convex sparse and low-rank based subspace segmentation is outlined in Algorithm 2.
Solve the non-convex sparse and low-rank constrained program by Algorithm 1.
Normalize the coefficient matrix Z*, and threshold small values by θ to obtain Ẑ*.
5. The data is segmented by spectral clustering.
Output: The segmentation results.
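The post-processing steps of Algorithm 2 can be sketched as follows. This is a minimal illustration assuming NumPy; the choice of l2 normalization per column (one column per sample), the default threshold value, and the function name `build_affinity` are assumptions, since the text does not fix them.

```python
import numpy as np

def build_affinity(Z, theta=0.01):
    """Normalize each sample's coefficients, zero out small values, symmetrize."""
    Z = np.maximum(np.asarray(Z, dtype=float), 0.0)    # coefficients are non-negative
    norms = np.linalg.norm(Z, axis=0)
    Z_hat = Z / np.maximum(norms, 1e-12)               # assumed: unit l2 norm per column
    Z_hat[Z_hat < theta] = 0.0                         # threshold small values by theta
    return 0.5 * (Z_hat + Z_hat.T)                     # W = (Z_hat + Z_hat^T) / 2

Z_star = np.array([[0.0, 1.0],
                   [2.0, 0.0]])
W = build_affinity(Z_star)
```

The resulting W is symmetric and non-negative, so it can be passed to any spectral clustering routine that accepts a precomputed affinity matrix (for example, scikit-learn's `SpectralClustering` with `affinity="precomputed"`).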

Experimental Evaluation and Discussion
In the following, we discuss the performance of our proposed lpSpSS model. First, the experimental setting is detailed in Section 4.1. In Sections 4.2-4.4, we test the segmentation performance of lpSpSS on CMU-PIE, COIL20, and USPS. In Section 4.5, we examine the robustness of lpSpSS to block occlusions and pixel corruptions. Finally, the experimental results are discussed in Section 4.6.

Experimental Settings
The proposed lpSpSS approach is evaluated on realistic images and compared with five related works. We use four publicly available datasets: CMU-PIE [46], COIL20 [47], USPS [48], and Extended Yale B [49]. Among them, datasets [46] and [49] contain face images with various poses, illuminations, and facial expressions, COIL20 consists of different general objects, and USPS includes handwritten digit images. lpSpSS is compared with five segmentation methods: PCA, SSC [16], LRR [15], and NNLRS [50], with K-means serving as the baseline.
We adopt the same experimental settings as Zhuang's work [50]. For the compared methods, a grid search strategy is used for selecting model parameters, and the optimal segmentation is achieved by tuning the parameters carefully. For lpSpSS, two regularization parameters, β and λ, affect the performance. We take a stepwise selection strategy to search for the best parameters. For example, we search for the candidate interval in which λ may lie with β fixed, and then search for the most likely candidate interval of β with λ fixed. Finally, the best values are found in a two-dimensional candidate space of (β, λ).
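The stepwise selection can be sketched as an alternating one-dimensional search. This is a hypothetical illustration only: `evaluate` stands for any user-supplied function that returns a segmentation score (higher is better) for a given (β, λ), and the candidate lists and the number of rounds are assumptions.

```python
def stepwise_search(evaluate, betas, lams, rounds=2):
    """Alternate 1-D searches: best lam with beta fixed, then best beta with lam fixed."""
    beta, lam = betas[0], lams[0]
    for _ in range(rounds):
        lam = max(lams, key=lambda l: evaluate(beta, l))
        beta = max(betas, key=lambda b: evaluate(b, lam))
    return beta, lam, evaluate(beta, lam)

# Toy score with a single peak at beta = 0.1, lam = 1.0 (stand-in for AC or NMI).
def toy_score(beta, lam):
    return -(beta - 0.1) ** 2 - (lam - 1.0) ** 2

best_beta, best_lam, best_score = stepwise_search(
    toy_score, betas=[0.01, 0.1, 1.0], lams=[0.1, 1.0, 10.0])
```

Such an alternating search evaluates far fewer combinations than a full grid, at the price of possibly missing the joint optimum, which is why a final search over the narrowed two-dimensional candidate space is still useful.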
To measure the segmentation performance quantitatively, two metrics, namely accuracy (AC) and normalized mutual information (NMI) [51], are used in our experiments. All experiments are implemented in MATLAB on a MacBook Pro with a 2.6 GHz Intel Core i7 CPU and 16 GB of memory.
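Both metrics can be sketched in a few lines of standard-library Python. The brute-force label matching is only practical for a small number of clusters (an assignment solver such as the Hungarian algorithm would be used for larger K), and normalizing the mutual information by the larger of the two entropies is an assumption, since several NMI variants exist.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Best fraction of correct samples over one-to-one cluster-to-class mappings."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad with None so that surplus predicted clusters map to "no class".
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)

def normalized_mutual_information(true_labels, pred_labels):
    """Mutual information divided by max(H(true), H(pred))."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_p = Counter(pred_labels)
    count_tp = Counter(zip(true_labels, pred_labels))
    mi = sum((c / n) * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in count_tp.items())
    h_t = -sum((c / n) * math.log(c / n) for c in count_t.values())
    h_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    denom = max(h_t, h_p)
    return mi / denom if denom > 0 else 1.0

truth = [0, 0, 1, 1]
ac_perfect = clustering_accuracy(truth, [1, 1, 0, 0])   # labels swapped, same partition
ac_partial = clustering_accuracy(truth, [0, 0, 1, 0])
nmi_perfect = normalized_mutual_information(truth, [1, 1, 0, 0])
```

Both metrics are invariant to a relabeling of the clusters, which is why the swapped labeling above still counts as a perfect segmentation.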

Segmentation Results on CMU-PIE Database
In this experiment, we compare lpSpSS with the other five methods on the CMU-PIE facial image dataset. It includes 41,368 pictures of 68 persons, acquired under various poses and lighting conditions. The resolution of each image is 32 × 32 = 1024 pixels. Typical examples from CMU-PIE are shown in Figure 1. For each given cluster number K = 4,..., 68 in the whole dataset, the segmentation results were averaged over twenty tests. The averaged segmentation performance of the proposed and existing algorithms on the CMU-PIE dataset [46] is reported in Table 1.

We can see that our proposed lpSpSS achieves the best segmentation AC and NMI on the CMU-PIE dataset, which demonstrates the effectiveness of lpSpSS. For example, the average segmentation accuracies of NNLRS and lpSpSS are 84.1% and 89.8%, respectively, so lpSpSS improves the segmentation accuracy by 5.7% compared with NNLRS (the second-best algorithm). The improvement of lpSpSS indicates the importance of the non-convex SR and LRR affinity graph.

Segmentation Results on COIL20 Database
On the second dataset, COIL20 [47], the proposed lpSpSS is again compared with the five existing algorithms. This dataset contains 1440 images of 20 objects, each captured from 72 different views. The resolution of each picture is 32 × 32 = 1024 pixels. Typical examples from COIL20 are shown in Figure 2. For each given cluster number K = 2,...,20 in the whole dataset, the segmentation results were averaged over twenty tests. The averaged segmentation performances of the proposed and existing algorithms on the COIL20 dataset [47] are reported in Table 2.

Experimental results on COIL20 indicate that lpSpSS outperforms the other five existing algorithms. For example, the average AC and NMI of lpSpSS are 90.2% and 91.0%, which are higher than those of NNLRS (the second-best algorithm) by 2.0% and 1.5%, respectively. The superiority of lpSpSS is especially evident when the cluster number is large.

Segmentation Results on USPS Handwritten Digit Dataset
In this experiment, we compare lpSpSS with the other five methods on the USPS handwritten digit dataset. It contains 9298 images of 10 classes, with a variety of orientations. The resolution of each picture is 16 × 16 = 256 pixels. Figure 3 shows typical sample images from the USPS dataset. For each given cluster number K = 2,...,10, the segmentation results were averaged over twenty tests. The averaged segmentation performances of the proposed and existing algorithms on the USPS handwritten digit dataset are reported in Table 3.

Table 3 shows that our proposed lpSpSS still obtains the best segmentation performance. This result demonstrates that a non-convex sparse and low-rank graph models complex related data better than traditional SR- and LRR-based graphs. The experimental results also show that lpSpSS not only represents the global information, but also preserves the local geometrical structures in the data by incorporating the non-convex lp-norm regularizer.

Segmentation Results on Dataset with Block Occlusions and Pixel Corruptions
Finally, we evaluate the robustness of each model on the more challenging Extended Yale B face dataset. It contains facial images of 38 subjects, 64 per subject, under various lighting conditions. To reduce the computational cost and memory footprint, the resolution of each picture is downsized to 96 × 84. This dataset is more challenging for subspace segmentation, as about 50% of the samples contain hard shadows or specularities. Figure 4 shows typical sample images from Extended Yale B.

The segmentation results with block occlusions and pixel corruptions are reported in Table 4 as well as Table 5. Both sets of experimental results show that lpSpSS again achieves the best segmentation results. They suggest that the proposed lpSpSS is more robust than the compared methods, especially when a significant portion of the realistic samples is corrupted.

Discussions
lpSpSS outperforms five existing subspace segmentation algorithms. The improvement is largest on the CMU-PIE face dataset. Our affinity graph captures the local geometrical structure as well as the global information of the data and is, hence, both generative and discriminative.
lpSpSS is more robust than the other compared methods and can properly deal with multiple types of noise. Images in the Extended Yale B dataset contain different errors, including block occlusions, pixel corruptions, and illumination variations, so partitioning them is challenging. However, because the lp-norm and Schatten p-norm are introduced for the lpSpSS affinity graph construction, our model can better model errors and provides a better measurement of data redundancy.
The segmentation performance of SSC and LRR is almost the same. For example, the segmentation accuracy of SSC on CMU-PIE is 0.9% better than that of LRR, while the performance of LRR on USPS is 0.7% higher than that of SSC. Since the segmentation results depend heavily on the intrinsic structure of the testing dataset, it is difficult to determine which one is better.
The LRR-based algorithm is robust in handling noisy data. It aims at obtaining a low-rank coefficient matrix; thus, the LRR-based methods can better model the global information. Furthermore, LRR can find similar clusters that measure the data redundancy, ensuring high quality and stability of the segmentation results. For data heavily contaminated with corruptions or outliers, the model can find lower ranks that are more robust to noise.
However, an SVD computation for Schatten p-norm minimization is performed in each iteration, which is very time consuming, and the best segmentation results are not achieved at the lowest value of p. Hence, our future work will focus on speeding up the Schatten p-norm solver and on selecting the best value of p.

Conclusions
This paper presents an accurate and robust algorithm for subspace segmentation, named lpSpSS, which introduces non-convex lp-norm and Schatten p-norm minimization. By exploiting the inherent sparsity and low-rankness of high-dimensional data, both the local geometrical structure and the global information of the data can be learnt. A linearized alternating direction method with adaptive penalty (LADMAP) is used to search for optimal solutions. Numerous experiments on CMU-PIE, COIL20, USPS, and Extended Yale B verify the effectiveness and robustness of lpSpSS compared to five existing works.