Accelerated Stochastic Variance Reduction Gradient Algorithms for Robust Subspace Clustering

Robust face clustering has a wide range of applications in gate passes, surveillance systems and security analysis in embedded sensors. Nevertheless, existing algorithms have difficulty finding accurate clusters when the data contain noise (e.g., occluded face clustering and recognition). It is known that in subspace clustering, the ℓ1- and ℓ2-norm regularizers can improve subspace preservation and connectivity, respectively, and the elastic net regularizer (i.e., a mixture of the ℓ1- and ℓ2-norms) provides a balance between the two properties. However, existing deterministic methods have high per-iteration computational complexity, making them inapplicable to large-scale problems. To address this issue, this paper proposes the first robust accelerated stochastic variance reduction gradient (RASVRG) algorithm for robust subspace clustering. We also introduce a new momentum acceleration technique for the RASVRG algorithm. Thanks to this momentum, the RASVRG algorithm achieves both the best-known oracle complexity and the fastest convergence rate, and it reaches higher efficiency in practice for both strongly convex and not strongly convex models. Various experimental results show that the RASVRG algorithm outperformed existing state-of-the-art methods with elastic net and ℓ1-norm regularizers in terms of accuracy in most cases. As demonstrated on real-world face datasets with different manually added levels of pixel corruption and occlusion, the RASVRG algorithm achieved much better performance in terms of accuracy and robustness.


Introduction
Subspace clustering aims to find groups of similar objects, or clusters, which usually lie in low-dimensional subspaces. With the development of artificial intelligence and the popularity of computer vision applications such as face recognition and clustering [1,2], motion segmentation [3] and document analysis [4], subspace clustering has attracted increasing attention in recent years, especially for mask-occluded face recognition in embedded sensors due to COVID-19. Samples of different classes can be approximated well by data from a union of low-dimensional subspaces. In practice, we perform the task of dividing the data points into various subspaces based on their similarity. Subspace clustering is a subcategory of clustering which gathers data into different groups such that each group consists of data points from the same subspace only. A series of related subspace clustering methods has been developed, including statistical, iterative and algebraic methods, spectral clustering and deep learning algorithms [5][6][7][8][9][10].
Compared with other techniques, methods based on spectral clustering have become increasingly popular because of their convenient implementation, complete theoretical support and reliable accuracy [11]. The key idea of these methods is to adopt an ℓ1-norm, ℓ2-norm or elastic net (i.e., a mixture of the ℓ1- and ℓ2-norms) regularizer to solve an optimization problem that yields an affinity matrix, and then to apply spectral clustering to that matrix. Each point from the union of subspaces can be represented as a linear combination of the other data points in the subspaces [12], which is called the self-expressiveness property. It can be formulated as follows:

x_j = Xc_j, c_jj = 0. (1)

In fact, Equation (1) is equivalent to the following matrix form:

X = XC, diag(C) = 0, (2)

where X = [x_1, x_2, . . . , x_N] ∈ R^{D×N} is the data matrix whose jth column is x_j, C = [c_1, c_2, . . . , c_N] ∈ R^{N×N} is the coefficient matrix whose jth column c_j is the sparse representation of x_j, c_jj is the jth element of c_j, and diag(C) ∈ R^N is the vector of the diagonal elements of C. Although multiple solutions for C may exist rather than a unique one, there still exists a solution satisfying the condition that if c_ij ≠ 0, then x_i belongs to the same subspace as x_j. Because such solutions preserve the subspace structure, they are called subspace-preserving. If a subspace-preserving C exists, and the connection between a pair of points x_i and x_j is encoded in an affinity matrix W (i.e., w_ij = |c_ij| + |c_ji|), then one can cluster the data points by applying spectral clustering [13] to the affinity matrix.
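The pipeline above, building the symmetric affinity W from the coefficient matrix C and then spectrally clustering it, can be sketched in a few lines. The following is a minimal NumPy illustration; the naive Lloyd iterations with farthest-point initialization stand in for the standard spectral clustering routine [13] used in the paper.

```python
import numpy as np

def spectral_clustering_from_C(C, n_clusters):
    """Cluster points from a self-expressive coefficient matrix C.

    Builds the affinity w_ij = |c_ij| + |c_ji|, embeds the points with the
    bottom eigenvectors of the normalized graph Laplacian, and runs a
    simple k-means on the (row-normalized) embedding.  A minimal sketch.
    """
    W = np.abs(C) + np.abs(C).T                 # symmetric affinity matrix
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    V = vecs[:, :n_clusters]                    # bottom n_clusters eigenvectors
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    # deterministic farthest-point initialization, then Lloyd iterations
    centers = [V[0]]
    for _ in range(1, n_clusters):
        dists = np.min([((V - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(V[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(50):
        d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(d2, axis=1)
        centers = np.stack([V[labels == k].mean(0) if (labels == k).any()
                            else centers[k] for k in range(n_clusters)])
    return labels
```

For a block-diagonal C (two groups of points that only express each other), the two groups are recovered exactly.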
In order to obtain the subspace-preserving matrix C, one effective method is to regularize C and solve the following minimization problem:

min_C ∥C∥_1, s.t. X = XC, diag(C) = 0, (3)

where ∥·∥_1 is the ℓ1-norm (i.e., ∥C∥_1 = ∑_{i=1}^N ∑_{j=1}^N |c_ij|). Here, the ℓ1-norm can be replaced by the squared Frobenius norm (i.e., ∥C∥_F² = ∑_{i=1}^N ∑_{j=1}^N c_ij²), commonly referred to as the ℓ2-norm regularizer. Different choices of regularizer for the coefficient matrix C lead to different subspace clustering algorithms. For instance, the sparse subspace clustering (SSC) method [12] applies the ℓ1-norm to find the coefficient matrix C. Previous work has shown that SSC provides a subspace-preserving solution under certain circumstances, e.g., where the subspaces are independent [14] or the data in different subspaces meet some separation conditions and the data in the same subspace are well distributed [14,15]. Similar conclusions are also obtained when the data are corrupted by noise [16] or outliers [17]. Least squares regression [18] uses the ℓ2-norm regularizer on the matrix C. Low-rank representation [19] applies the nuclear norm regularizer to C to retain its low-rank property. Moreover, the authors of [17,20,21] utilized the elastic net regularizer to induce a sparse matrix C.
In recent years, many subspace clustering methods have greatly promoted the development of SSC algorithms. However, the performance of these methods is mostly evaluated on clean datasets, ignoring the noise that may exist in reality. On the other hand, many algorithms require additional procedures to estimate and remove noise. For instance, the authors of [22] performed principal component analysis (PCA) on the data for dimensionality and noise reduction, and the authors of [17] modeled and removed the outliers before clustering. These methods depend strongly on the cleanliness of the data. For the above reasons, the actual performance of existing methods on real-world datasets is not always satisfactory.
In this paper, we propose a robust accelerated stochastic variance reduction gradient (RASVRG) method and present its efficient implementation for self-expressiveness-based subspace clustering problems. Our algorithms can be directly applied to SSC problems with ℓ1-norm and elastic net regularizers on datasets that may be corrupted by noise, and they achieve superior clustering accuracy and efficiency compared with existing popular algorithms, demonstrating the excellent performance and strong robustness of the RASVRG.
The key acceleration technique in the RASVRG is the snapshot momentum proposed in our previous work [23,24]. We introduce this momentum acceleration technique into the proximal stochastic variance reduction gradient (Prox-SVRG) method [25]. The proposed RASVRG algorithms require tracking only one variable vector in the inner loop, which means that their computational time and memory overhead are exactly the same as those of the SVRG [26] and Prox-SVRG [25]. Thus, our RASVRG algorithms have much lower per-iteration complexity than other accelerated methods (e.g., Katyusha [27]), which makes the RASVRG more suitable for large-scale SSC problems [28], especially large-scale robust face clustering problems. To the best of our knowledge, this work is the first to propose faster stochastic optimization algorithms, instead of deterministic methods, to solve various SSC problems, including robust face clustering.
We summarize the major contributions of this paper as follows:

1. Faster convergence rates: Our RASVRG obtains the oracle complexity O((D + √(Dκ)) log(1/ϵ)) for strongly convex (SC) subspace clustering problems (e.g., the elastic net regularized face clustering problem), which is the best-known oracle gradient complexity, as pointed out in [29], where κ is the condition number of the objective function. For subspace clustering problems which are not strongly convex (non-SC) (e.g., the ℓ1-norm regularized problem), the RASVRG achieves an optimal convergence rate of O(1/S²), where S is the number of epochs; that is, the RASVRG is much faster than existing stochastic and deterministic algorithms such as the Prox-SVRG [25].

2. Better accuracy: Both in theory and in practice, our algorithms generally yield better performance than existing state-of-the-art methods for solving problems with the ℓ1-norm or elastic net regularizer, whereas most existing methods are strongly influenced by the choice of regularizer.

3. Stronger robustness: Our RASVRG obtains much better performance than other algorithms on real-world datasets with different manually added levels of random pixel corruption or unrelated block occlusion (simulating potential real-world noise), whereas existing methods can deteriorate severely under such strong noise.

4. Extension to more applications: Our RASVRG requires tracking only one variable vector in the inner loop, which means its computational cost and memory overhead are exactly the same as those of the SVRG [26] and Prox-SVRG algorithms. This feature allows the RASVRG to be extended to other real-world clustering applications and to more settings, such as the sparse and asynchronous settings, which can significantly accelerate the RASVRG.
The rest of this paper is organized as follows. In Section 2, we discuss related work on sparse subspace clustering and stochastic optimization methods. Section 3 proposes two efficient RASVRG algorithms for solving both strongly convex and non-strongly convex models and analyzes their convergence properties for ℓ1-norm and elastic net regularized SSC problems. In Section 4, we demonstrate the practical performance of the RASVRG on subspace clustering tasks with synthetic and real-world face datasets. Section 5 concludes this paper and discusses future work.

Related Works
In this section, we briefly review related work on sparse subspace clustering and stochastic optimization.

Sparse Subspace Clustering
In this part, we briefly review sparse subspace clustering (SSC) [22]. Let X = [X_1, X_2, . . . , X_K] denote data drawn from a union of K subspaces {S_k}_{k=1}^K, where X_k ∈ R^{D×N_k} is the submatrix of the N_k points lying in the subspace S_k, with ∑_{k=1}^K N_k = N. Each point in a subspace S_k can be represented as a linear combination of other points, and to favor combinations that use points from the same subspace, we seek the sparsest representation by solving the following optimization problem:

min ∥c_j∥_0, s.t. x_j = Xc_j, c_jj = 0, (4)

where c_j = [c_j1, c_j2, . . . , c_jN]ᵀ ∈ R^N is the coefficient vector and ∥c_j∥_0 counts the number of nonzero entries in the vector c_j. Since this is an NP-hard problem, the authors of [12] relaxed it and solved the following problem:

min ∥c_j∥_1, s.t. x_j = Xc_j, c_jj = 0, (5)

where ∥c_j∥_1 = ∑_{i=1}^N |c_ji| is the ℓ1-norm of c_j ∈ R^N. For all the data points j = 1, . . . , N, the optimization problem in Equation (5) can be expressed in matrix form:

min ∥C∥_1, s.t. X = XC, diag(C) = 0, (6)

where C = [c_1, c_2, . . . , c_N] ∈ R^{N×N} is the coefficient matrix and each column c_j is the sparse representation vector of x_j. Equations (4) and (5) have attracted much attention in the fields of compressed sensing [30,31], subspace clustering [5] and face recognition [32]. In fact, the two solutions are the same under certain conditions. However, the results of compressed sensing may not carry over to the subspace clustering problem, since the solution for C is not necessarily unique when the columns of X lie in a union of subspaces. Once the matrix C is obtained, spectral clustering [33] is applied to the affinity matrix W = |C| + |C|ᵀ.
Furthermore, we also consider the case where the data points from a union of linear subspaces contain a certain amount of noise. More specifically, the jth data point contaminated with noise ζ_j is represented by x_j = x̄_j + ζ_j, where x̄_j is the clean point and ζ_j satisfies ∥ζ_j∥_2 ≤ ϵ, with ∥·∥_2 the Euclidean norm. We can find the sparsest solution of the following problem to obtain the sparse representation of x_j with a given error tolerance ϵ:

min ∥c_j∥_1, s.t. ∥x_j − Xc_j∥_2 ≤ ϵ, c_jj = 0. (7)

However, the scale of the noise ζ_j is unknown in most instances. In this circumstance, the Lasso optimization algorithm [34] can be applied to obtain the sparse representation in the following form:

min ∥c_j∥_1 + (γ/2)∥x_j − Xc_j∥_2², s.t. c_jj = 0, (8)

where γ ≥ 0 is a constant parameter.
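The Lasso form of Equation (8) can be solved with proximal gradient descent (ISTA). Below is a minimal sketch for a single data point, with γ placed on the quadratic term as in Equation (8); the step size 1/(γ∥A∥²) and the iteration count are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, b, gamma=100.0, n_iters=2000):
    """ISTA for  min_c ||c||_1 + (gamma/2)*||A c - b||_2^2.

    Gradient step on the smooth quadratic term, then soft-thresholding.
    The step size 1/(gamma*||A||^2) guarantees descent (||A|| is the
    spectral norm, so gamma*||A||^2 bounds the Lipschitz constant).
    """
    eta = 1.0 / (gamma * np.linalg.norm(A, 2) ** 2)
    c = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = gamma * A.T @ (A @ c - b)   # gradient of the smooth part
        c = soft(c - eta * grad, eta)      # prox step with threshold eta
    return c
```

On an overdetermined system whose right-hand side is a sparse combination of the columns, the sparse coefficients are recovered to high accuracy.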
In addition, potential connectivity issues may exist in the representation graph [35] (i.e., over-segmentation problems). This phenomenon is caused by the sparsity of the representation matrix C computed from Equation (8). In order to promote connections between data points, the authors of [17,21] used the elastic net regularizer to solve for the sparse coefficient matrix C as follows:

min λ∥c_j∥_1 + ((1 − λ)/2)∥c_j∥_2² + (γ/2)∥x_j − Xc_j∥_2², s.t. c_jj = 0, (9)

where λ ∈ [0, 1] determines the trade-off between sparseness (from the ℓ1-norm regularizer) and connectivity (from the ℓ2-norm regularizer). In particular, when λ is extremely close to one, the performance of the elastic net approach is much better than that of the method based on the ℓ1-norm alone. The purpose of the ℓ2-norm regularizer is to enhance the connectivity between data points; that is, for a relatively small λ, the matrix C contains more nonzero elements.
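A convenient property of the elastic net regularizer is that its proximal operator has a closed form: soft-thresholding followed by a multiplicative shrinkage. The following sketch shows this operator, which lets any proximal method handle the elastic net term directly.

```python
import numpy as np

def prox_elastic_net(v, t, lam):
    """Proximal operator of  t * ( lam*||c||_1 + ((1-lam)/2)*||c||_2^2 ).

    Closed form: soft-threshold at t*lam, then shrink by 1/(1 + t*(1-lam)).
    This is the exact minimizer of (1/(2t))*||c - v||^2 plus the regularizer.
    """
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0) / (1.0 + t * (1.0 - lam))
```

With λ = 1 this reduces to plain soft-thresholding, recovering the ℓ1 case.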

Stochastic Methods
All the algorithms mentioned above for SSC are deterministic methods whose per-iteration complexity is O(ND), which is expensive for extremely large N. Recently, stochastic gradient descent (SGD) has been successfully applied to many large-scale machine learning problems due to its significantly lower per-iteration complexity of O(D). SGD requires only one component function (or a small batch of component functions) per iteration to form an estimator of the full gradient. However, the variance of the stochastic gradient estimator may be large [26], which leads to slow convergence and poor performance.
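Variance reduction addresses exactly this issue: the SVRG estimator corrects the stochastic gradient with a periodically computed full gradient at a snapshot point. Below is a sketch for least-squares components f_i(α) = (a_iᵀα − b_i)²/2; the synthetic data are stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A_mat = rng.standard_normal((50, 8))   # rows a_i are the component samples
b_vec = rng.standard_normal(50)

def grad_i(i, alpha):
    """Gradient of the i-th component f_i(alpha) = 0.5*(a_i^T alpha - b_i)^2."""
    a = A_mat[i]
    return a * (a @ alpha - b_vec[i])

def full_grad(alpha):
    """Full gradient (average of the component gradients)."""
    return A_mat.T @ (A_mat @ alpha - b_vec) / len(b_vec)

def svrg_gradient(i, alpha, alpha_snap, mu_snap):
    """SVRG estimator [26]:  v = grad_i(alpha) - grad_i(alpha_snap) + mu_snap,
    where mu_snap = full_grad(alpha_snap).  Unbiased over the random index i,
    with variance that vanishes as alpha and alpha_snap approach the optimum."""
    return grad_i(i, alpha) - grad_i(i, alpha_snap) + mu_snap
```

Averaging the estimator over all indices recovers the full gradient, which verifies unbiasedness.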

Accelerated Stochastic Variance Reduced Gradient Algorithms for Sparse Subspace Clustering
In this section, we propose a new robust accelerated stochastic variance reduced gradient (RASVRG) method for solving sparse representation problems such as SSC. For elastic net regularized problems, we present a strongly convex (SC) version of the RASVRG (RASVRG SC ) which has the best-known linear convergence rate. Moreover, we also provide a non-strongly convex (NSC) version (RASVRG NSC ) which attains the fastest convergence rate of O(1/S²).
Compared with stochastic variance reduced gradient methods (e.g., the SVRG [26] or SAGA [43]), most existing accelerated methods have improved convergence rates but complex coupling structures, which leads to slow convergence in practice [24]. Thus, we propose a robust accelerated stochastic variance reduced gradient (RASVRG) method for both strongly convex and non-strongly convex problems. This means that the RASVRG can solve the SSC problem [14] based on both the ℓ1-norm regularizer and the elastic net regularizer [21]. We focus on the following convex optimization problem with a finite-sum structure, which is common in machine learning and statistics:

min_{α ∈ R^N} F(α) := f(α) + g(α) = (1/D) ∑_{i=1}^D f_i(α) + g(α). (10)

Here, α is the parameter vector and N is the dimension of α. A function f is L-smooth if ∥∇f(α) − ∇f(β)∥_2 ≤ L∥α − β∥_2 for all α, β ∈ R^N, and it is σ-strongly convex if for all α, β ∈ R^N, we have

f(β) ≥ f(α) + ⟨G, β − α⟩ + (σ/2)∥β − α∥_2², (11)

where G ∈ ∂f(α) is a subgradient of f at α (G = ∇f(α) if f is differentiable). We make the following two assumptions to categorize Equation (10):

Assumption 1 (Strongly convex). In Equation (10), each f_i(·) is L-smooth and convex, and g(·) is σ-strongly convex.

Assumption 2 (Non-strongly convex). In Equation (10), each f_i(·) is L-smooth and convex, and g(·) is convex but not necessarily strongly convex.
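To make the constants concrete, the sketch below computes L, σ and the condition number κ = L/σ for squared-residual components on the rows of A with an elastic net regularizer. The exact constants (and the role of γ) are our assumption for illustration and may differ from those in the paper's analysis.

```python
import numpy as np

def condition_number(A, lam, gamma=1.0):
    """kappa = L / sigma for an elastic net regularized least-squares sum.

    Reads f_i as gamma/2 * (a_i^T c - b_i)^2 on row a_i of A, so each f_i is
    L_i-smooth with L_i = gamma * ||a_i||^2, and takes the strong convexity
    sigma from the (1-lam)/2 * ||c||^2 term of the regularizer.  Assumed
    constants, for illustration only.
    """
    L = gamma * max(np.sum(A ** 2, axis=1))   # smoothness: largest row norm squared
    sigma = 1.0 - lam                         # strong convexity of the regularizer
    return L / sigma
```

Note that κ degrades as λ → 1: the ℓ2 part of the elastic net vanishes, and the problem loses strong convexity, which is why the λ = 1 (Lasso) case needs the separate non-strongly convex analysis.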

RASVRG SC for Elastic Net Regularized SSC
In this subsection, we consider the elastic net regularized SSC problem, which is strongly convex. Inspired by our previous work [23], we present the RASVRG SC , shown in Algorithm 1, as a solver within an active set-based optimization framework [21] (i.e., Oracle Guided Elastic Net (ORGEN)). We apply our RASVRG SC to compute c*(b, X). As shown in step 5 of Algorithm 1, y is a convex combination of the current iterate c and the snapshot, with the momentum parameter θ; in other words, our algorithm uses the momentum acceleration technique proposed in our previous work [23].
We only need to keep track of y in a single inner loop and use a simple yet efficient update for c (see step 10). Note that the temporary variables w_1 and w_2 are also introduced in Algorithm 1 to make the algorithm statement clearer.
Of course, the algorithm can be rewritten to track only one vector per iteration during implementation. Below, we give the convergence analysis of the RASVRG SC .
Algorithm 1 takes as input an initial vector c⁰, the data matrix A ∈ R^{D×(N−1)}, the vector b ∈ R^D, the epoch length m, the learning rate η and the parameters θ and γ. In each inner iteration, a row i_j ∈ {1, . . . , D} of A is picked uniformly at random and assigned to aᵀ.

Theorem 1 (Strongly convex). Let c* be the optimal solution of Equation (13), and suppose that Assumption 1 holds. Then, by choosing m = Θ(D), the RASVRG SC achieves an ϵ-additive error with the oracle complexity O((D + √(κD)) log(1/ϵ)) in expectation.

The proof of this theorem is similar to that in our previous work [23], and thus it is omitted here. As in the analysis of [23], the overall oracle complexity of the RASVRG SC is O((D + √(κD)) log(1/ϵ)). This result indicates that under strong convexity, the RASVRG SC attains the best-known oracle complexity among stochastic accelerated algorithms (e.g., APCG [37], SPDC [39] and Katyusha [27]). As analyzed in [23], the RASVRG SC maintains only one variable y, while most existing accelerated methods, including Katyusha, require two additional variables. Therefore, the RASVRG SC converges faster than them in practice.
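One epoch of the method can be sketched as follows: a momentum coupling between the current iterate and the snapshot, the SVRG gradient estimator evaluated at the coupled point, and a proximal step with the elastic net proximal operator. This is our reading of Algorithm 1 and [23], not a verbatim transcription; variable names and the snapshot update rule are illustrative.

```python
import numpy as np

def rasvrg_sc_epoch(A, b, c, c_snap, theta, eta, lam, gamma, m, rng):
    """One epoch of a RASVRG_SC-style inner loop (a sketch after [23]).

    Components: f_i(c) = gamma/2 * (a_i^T c - b_i)^2; regularizer:
    lam*||c||_1 + (1-lam)/2*||c||_2^2.  Returns the new iterate and a
    snapshot (here the epoch average of the coupled points y).
    """
    D = A.shape[0]
    mu = gamma * A.T @ (A @ c_snap - b) / D        # full gradient at the snapshot
    y_sum = np.zeros_like(c)
    for _ in range(m):
        i = rng.integers(D)
        a = A[i]
        y = theta * c + (1.0 - theta) * c_snap     # momentum coupling (cf. step 5)
        # SVRG estimator evaluated at y
        v = gamma * a * ((a @ y - b[i]) - (a @ c_snap - b[i])) + mu
        w = c - eta * v
        # elastic net prox: soft-threshold, then shrink
        c = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0) / (1.0 + eta * (1.0 - lam))
        y_sum += theta * c + (1.0 - theta) * c_snap
    return c, y_sum / m
```

Running a few epochs on a small synthetic problem with a conservative step size drives the composite objective down, as expected.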
We applied the RASVRG SC within the ORGEN framework, which is an efficient method that can handle large-scale datasets, to find the optimal c* for each column of X. The basic idea of ORGEN is to solve a sequence of reduced-scale subproblems defined by an active set. We present the resulting ORGEN-RASVRG SC algorithm in Algorithm 2.
Algorithm 2 ORGEN-RASVRG SC . By solving a series of reduced-scale problems in step 2 of Algorithm 2, we can handle large-scale data efficiently. Next, we define the subspace clustering problem solved with the ORGEN-RASVRG SC . Let X ∈ R^{D×N} be a real-valued matrix whose columns are drawn from a union of n subspaces of R^D, where each x_i is normalized to unit ℓ2-norm. We use A = X_{−j} to denote the matrix X without its jth column. The goal of subspace clustering is to segment the columns of X into their corresponding subspaces by finding a sparse representation of each point in terms of the other points. The sparse subspace clustering procedure of the ORGEN-RASVRG SC is shown in Algorithm 3. The vector c*_j ∈ R^N (i.e., the jth column of C* ∈ R^{N×N}) is computed by the ORGEN-RASVRG SC . After C* is computed, we apply spectral clustering to the matrix W = |C*| + |C*|ᵀ to obtain the segmentation of X.

RASVRG NSC for ℓ 1 -Norm Regularized SSC
In this subsection, we consider Equation (13) with λ = 1, also known as the Lasso problem, which is not strongly convex. The RASVRG NSC , shown in Algorithm 4, achieves a convergence rate of O(1/S²).
As in Algorithm 1, each inner iteration of Algorithm 4 picks a row i_j ∈ {1, . . . , D} of A uniformly at random, assigns it to aᵀ and updates the iterate with the shrinkage step c^s_j = sign(w_2) max{|w_2| − η/γ, 0}. The proof of Theorem 2 is similar to that in our previous work [23], and thus it is omitted here. This result indicates that the RASVRG NSC attains the optimal convergence rate of O(1/S²), where each epoch includes m + D stochastic iterations. We also applied Algorithm 4 to solve the SSC problem. The sparse subspace clustering procedure of the RASVRG NSC is shown in Algorithm 5.
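The shrinkage step quoted above is ordinary soft-thresholding; as a sketch, it computes the exact minimizer of (1/(2η))∥c − w₂∥² + (1/γ)∥c∥₁, i.e., the proximal operator of a (1/γ)-weighted ℓ1-norm with step η (our reading of the γ placement in Algorithm 4).

```python
import numpy as np

def l1_prox_step(w2, eta, gamma):
    """Shrinkage step of Algorithm 4: c = sign(w2) * max(|w2| - eta/gamma, 0).

    This is the proximal operator of (1/gamma)*||.||_1 with step size eta:
    entries smaller than the threshold eta/gamma are zeroed, the rest are
    pulled toward zero by the threshold, which is what produces sparsity.
    """
    return np.sign(w2) * np.maximum(np.abs(w2) - eta / gamma, 0.0)
```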
We tested whether the RASVRG NSC works well under the same general experimental framework as the algorithm in [22]. Moreover, we give an intuitive demonstration of the optimization ability of the RASVRG NSC for computing c*_j by recovering a corrupted image. We simply computed the matrix multiplication of X_{−j} and c*_j to restore the corrupted image, as shown in Figure 1. All the results show that our RASVRG NSC performed well in restoring the corrupted image. Specifically, we chose an image from the dataset and manually added random pixel corruption with corruption rates ρ = 0.3 and 0.6, as shown in Figure 1b,c. The corrupted image was then reshaped into the vector b, and the whole dataset without the chosen image formed A. Using the RASVRG NSC , the recovery vector was computed as the matrix multiplication of A and c*, and it was finally reshaped back into an image. As we can see in Figure 1d,e, our algorithm recovered the damaged image well, which intuitively shows the good robustness of the RASVRG NSC .
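The recovery step described above is a single matrix-vector product followed by a reshape; a minimal sketch:

```python
import numpy as np

def recover_image(A, c_star, shape):
    """Reconstruct a corrupted face image from its sparse self-expression.

    A holds the (vectorized) reference images as columns, c_star is the
    sparse coefficient vector computed by the solver, and `shape` is the
    (height, width) of the image grid, as described for Figure 1.
    """
    return (A @ c_star).reshape(shape)
```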

Experimental Results
In this section, we evaluate the efficiency, clustering accuracy and robustness of the RASVRG on many synthetic and real-world face datasets.

Experimental Set-Up
Datasets. Firstly, the synthetic datasets were generated as follows. Each element of the matrix X ∈ R^{D×N} was drawn independently from a standard Gaussian distribution. When the problem was highly underdetermined, none of the algorithms performed well or converged, so we generated highly overdetermined data X for sparse subspace clustering; for more details, refer to [14]. The test sample b was obtained as b = Xc_0, where the sparsity level of c_0 was set to p, meaning that a fraction p of the entries of c_0 were nonzero. Each nonzero entry of c_0 was drawn from a uniform distribution on [−10, 10]. We set p = 0.1 and λ = 1e−6 for consistency with the literature [14] so that the results are comparable. We computed the relative error of c_0 as a function of time and the number of passes.
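This synthetic set-up can be sketched in a few lines; the rounding of p·N to an integer nonzero count is our assumption, as is the uniform support selection.

```python
import numpy as np

def make_synthetic(D, N, p, lo=-10.0, hi=10.0, seed=0):
    """Generate the synthetic test pair of Section 4 (a sketch).

    X has i.i.d. standard Gaussian entries (highly overdetermined for
    D > N), c0 has a fraction p of nonzero entries drawn uniformly from
    [lo, hi] on a random support, and b = X @ c0.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((D, N))
    c0 = np.zeros(N)
    k = max(1, int(round(p * N)))            # number of nonzero entries
    support = rng.choice(N, size=k, replace=False)
    c0[support] = rng.uniform(lo, hi, size=k)
    return X, X @ c0, c0
```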
Secondly, the AR face database includes more than 4000 frontal images of 165×120 pixels under different illumination changes, expressions and facial disguises. To save computational time, we downsampled the face images and reduced the number of individuals in the experiments. We randomly chose a subset of no more than 15 individuals, each with 26 face images downsampled to 32×32 pixels.
Thirdly, the extended Yale B database contains 2414 images [44]. The face images of each individual were taken under different illumination conditions and cropped to 192×168 pixels [45]. We randomly chose 10 individuals, each with 64 images, and each image was downsampled to 42×48 pixels.
On these three datasets, we compared the clustering accuracy and running time of our algorithms with those of state-of-the-art algorithms under different conditions. For the AR face database, we manually added different levels of random, unrelated block occlusion to measure performance in the elastic net framework. For the extended Yale B database, different levels of random pixel corruption were added to the images, and performance was measured in the framework of the ℓ0/ℓ1-norm regularizer.
All of the experiments were performed on a computer with an Intel i7-7700K CPU, the Windows 7 operating system and 40 GB of memory. We used Matlab and C++ for each of our clustering tasks.
Parameter Settings. In the face clustering tasks, the regularization parameter γ of each algorithm was selected from the range 10^{1}, 10^{2}, . . . , 10^{9}; by tuning γ, every algorithm achieved its best performance. The addition of different levels of random pixel corruption and random square block occlusion was as follows. For random pixel corruption, a corruption index ρ ∈ [0, 1) specifies the fraction of randomly chosen pixels in each image that are replaced with values drawn uniformly from [0, 1]. Moreover, an occlusion index ϕ ∈ [0, 1] specifies the fraction of the image replaced by random square blocks taken from an unrelated image. In the two real-world face experiments, we set ρ and ϕ to 0.3 and 0.6, respectively, to simulate two levels of possible corruption of the clean original images. The occlusion index could also be set to other values, such as 0.1 and 0.8; the smaller the value, the better the recovery result.
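The two corruption models can be sketched as follows, assuming images scaled to [0, 1]; filling the occluded block with uniform noise is a stand-in for the unrelated image used in the paper, and the block-size rounding is our assumption.

```python
import numpy as np

def corrupt_pixels(img, rho, rng):
    """Random pixel corruption: a fraction rho of the pixels is replaced
    with values drawn uniformly from [0, 1)."""
    out = img.copy()
    mask = rng.random(img.shape) < rho          # choose ~rho of the pixels
    out[mask] = rng.random(img.shape)[mask]     # overwrite them with noise
    return out

def occlude_block(img, phi, rng):
    """Block occlusion: overwrite a random square block covering a fraction
    phi of the pixels (noise fill stands in for the unrelated image)."""
    out = img.copy()
    h, w = img.shape
    side = min(int(round(np.sqrt(phi * h * w))), h, w)
    r = rng.integers(0, h - side + 1)           # random top-left corner
    c = rng.integers(0, w - side + 1)
    out[r:r + side, c:c + side] = rng.random((side, side))
    return out
```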

Clustering on Synthetic Data
We tested the performance of the RASVRG in the frameworks of the ℓ 1 /ℓ 0 -norm regularizer and elastic net regularizer in terms of clustering accuracy and running time on synthetic data.All the results reported are the averages of 10 independent trials.
We compared the proposed RASVRG NSC with three state-of-the-art algorithms, namely OMP [22], the Prox-SVRG [25] and DALM [46], with ℓ1/ℓ0-norm regularizers. The experimental results for the clustering accuracy are shown in Figure 2. Here, K denotes the number of subspaces, D is the ambient dimension, N_i is the number of points per subspace, and d denotes the dimension of the subspaces. The clustering accuracies are reported for different D, K and γ values with d = 10 and N_i ≡ 60.
From Figure 2, we can see that the RASVRG NSC consistently achieved accuracy rates over 95% and outperformed the other methods, including the Prox-SVRG algorithm. Compared with the Prox-SVRG, the RASVRG NSC achieved higher accuracy in a shorter time, which also shows the clear acceleration effect of our RASVRG method. OMP is known to be a fast algorithm, but its accuracy was usually the lowest. Other algorithms, such as the Prox-SVRG, ran quickly but ultimately did not reach the highest accuracy. The clustering accuracies of all the algorithms except OMP fluctuated over time. In addition, the accuracy of the Prox-SVRG decreased even after 4 s in Figure 2e, while the RASVRG NSC had the most stable performance and always achieved superior accuracy over the other algorithms. Moreover, the performance of the DALM and Prox-SVRG algorithms was affected by the change in γ, while the RASVRG NSC remained stable. We also compared the convergence performance of all the methods, i.e., the objective value minus the minimum value (the objective gap) versus the number of effective passes or the running time, as shown in Figure 3. Note that evaluating N component function gradients or computing a single full gradient was counted as one effective pass. All the experimental results show that the proposed RASVRG NSC converged significantly faster than the other methods.
We compared the proposed RASVRG SC with four algorithms, namely RFSS [47], FISTA [48], Homotopy [49] and the Prox-SVRG [25], for the elastic net model. In order to evaluate the robustness of the different methods with respect to the parameter γ, we varied γ in the range 10^{5}, . . . , 10^{9} with d = 40 and N_i ≡ 60 to generate two types of random data.
The clustering accuracies of these methods are listed in Table 1. It is clear that the RASVRG SC achieved the highest accuracy in most cases. The stochastic method (Prox-SVRG) performed relatively well, but its running time was longer than that of the RASVRG SC . The accuracies of RFSS and Homotopy were similar and much lower than those of the RASVRG SC and Prox-SVRG, so we do not report their running times. FISTA performed much better than Homotopy and RFSS and sometimes achieved an accuracy of over 90%, but it ran more than five times slower than the RASVRG SC . All the results indicate that the RASVRG SC was superior to the other methods in terms of both robustness and efficiency.

Face Clustering Based on Elastic-Net Regularizer
In this part, we compare the proposed RASVRG SC with popular elastic net solvers on the AR face database, including regularized feature sign search (RFSS) [47], Homotopy [49], the proximal stochastic variance reduction gradient (Prox-SVRG) method [25] and the fast iterative shrinkage thresholding algorithm (FISTA) [48]. In each trial, we randomly picked k ∈ {2, 5, 8, 11, 14} individuals and took all their images (under different illuminations) as the data to be clustered.
Table 2 reports the clustering accuracies of different methods on the face datasets with manually added random, unrelated block occlusion.All the results indicate that in most cases, the RASVRG SC outperformed the other algorithms in terms of clustering accuracy.
When the number of clusters was k = 2, all the algorithms except Homotopy and RFSS achieved relatively high accuracies. As the number of clusters and the occlusion rate ϕ increased, Homotopy showed certain advantages over the other methods, except for the RASVRG SC , and its accuracies were close to those of RFSS. Due to the random block occlusion, the clustering accuracies of all of the algorithms decreased rapidly as the number of clusters k increased. When k ≤ 10, the RASVRG SC obtained the best clustering accuracy. In the case of slight corruption (e.g., ϕ = 0.3), the Prox-SVRG algorithm achieved the best accuracy when k = 14, but in all other cases, the RASVRG SC significantly outperformed the other algorithms. When the occlusion rate was high (e.g., ϕ = 0.6), the RASVRG SC performed much better than the other algorithms, except in the case of k = 14.

Face Clustering Based on ℓ0/ℓ1-Norm Regularizers
In this part, we evaluate the clustering accuracy and robustness of the RASVRG NSC on the extended Yale B database compared with five popular ℓ0/ℓ1-based solvers, namely the Prox-SVRG [25], Homotopy [50], OMP [22], DALM [46] and PALM [46]. We randomly picked k ∈ {2, 3, 5, 8, 10} individuals and took all their images under different illuminations as the data to be clustered.
The clustering performance of all the methods is shown in Figure 4. The RASVRG NSC attained the highest clustering accuracy in almost all cases. The performance of OMP was clearly the worst in all cases as the random pixel corruption ρ varied from 0.3 to 0.6. In the case of slight pixel corruption (e.g., ρ = 0.3), all the algorithms except OMP reached more than 90% accuracy when k = 2. The accuracy of Homotopy decreased rapidly when k > 2. The Prox-SVRG and PALM had rather close accuracies and maintained high clustering accuracies as k increased. When the random pixel corruption was high (e.g., ρ = 0.6), only the RASVRG NSC and Prox-SVRG algorithms achieved more than 70% accuracy when k = 2. As the number of clusters increased, the clustering accuracies of all of the algorithms decreased; DALM performed the best when k = 10. Overall, the RASVRG NSC performed well for various numbers of clusters, and its clustering accuracy was consistently about 10% higher than that of Homotopy.
The RASVRG algorithm outperformed the other algorithms in terms of clustering accuracy under both the elastic net and ℓ1/ℓ0-norm regularizer frameworks on the real-world face datasets with manually added random pixel corruption and unrelated block occlusion in most cases. This illustrates the strong robustness and wide applicability of our algorithms for various SSC problems, especially robust face clustering.

Conclusions and Future Work
In this paper, we proposed two efficient algorithms, the RASVRG SC and RASVRG NSC , to solve elastic net regularized and ℓ1-norm regularized sparse subspace clustering problems, respectively. To the best of our knowledge, this work is the first to propose faster stochastic optimization algorithms, instead of deterministic methods, to solve various large-scale SSC problems, especially large-scale robust face clustering. The experimental results on both synthetic and real-world face datasets demonstrated the effectiveness of our algorithms. On the synthetic datasets, both the RASVRG NSC and RASVRG SC achieved more stable and higher clustering accuracies in most cases compared with other elastic net and ℓ1-norm solvers. On the real-world face datasets with different levels of random pixel corruption and random block occlusion, our algorithms also achieved much higher clustering accuracies, which indicates their robustness to corrupted or damaged data. In other words, aside from enjoying a higher speed, the RASVRG algorithm performed much better than the state-of-the-art methods in terms of both accuracy and robustness.
Note that the RASVRG tracks only one variable in the inner loop, which makes it well suited to asynchronous parallel and distributed implementations, including privacy-preserving federated learning [51]. Applying parallel acceleration to our algorithms and to other subspace clustering problems, such as those in [52,53], can make the RASVRG excellent in terms of both clustering accuracy and speed for large-scale privacy-preserving clustering. This is an exciting direction for our future work.
In addition, b and {x_j}_{j=1}^N are normalized to unit ℓ2-norm. The goal of the elastic net model is to attain c*(b, X) as follows:

c*(b, X) := arg min_c f(c; b, X), where f(c; b, X) = λ∥c∥_1 + ((1 − λ)/2)∥c∥_2² + (γ/2)∥b − Xc∥_2². (13)

Theorem 2 (Non-strongly convex). If Assumption 2 holds, then by choosing m = Θ(D), the RASVRG NSC achieves, in expectation, the optimal convergence rate E[F(c^S)] − F(c*) = O(1/S²), where S is the number of epochs and each epoch includes m + D stochastic iterations.

Figure 1. Examples of the recovered results of our RASVRG method for a face image with random pixel corruption, where the face image was chosen from the extended Yale B database. (a) An original clean image chosen from the extended Yale B database [44]. (b,c) The images with random pixel corruption of ρ = 0.3 and ρ = 0.6, respectively. (d,e) The images recovered by the RASVRG from (b) and (c), respectively.

Figure 2. Comparison of the algorithms based on ℓ0- and ℓ1-norm regularizers on synthetic datasets under different conditions.

Figure 3. Comparison of the convergence performance of the methods on the synthetic data of size 50,000 × 1000. The horizontal axis denotes the number of effective passes (left) or the running time in seconds (right), and the vertical axis corresponds to the objective value minus the minimum value.

Figure 4. Comparison of the algorithms based on ℓ0- and ℓ1-norm regularizers on the extended Yale B database with different levels of random pixel corruption.

Table 1. Clustering accuracy and running time of the algorithms based on elastic net regularizers on synthetic data. The highest clustering accuracy is shown in bold.

Table 2. Clustering performance of different algorithms based on the elastic net regularizer on the AR face database with random, unrelated block occlusion.