A New Low-Rank Structurally Incoherent Algorithm for Robust Image Feature Extraction

Abstract: In order to solve the problem that structurally incoherent low-rank non-negative matrix factorization (SILR-NMF) algorithms consider only the non-negativity of the data and ignore the manifold distribution of data in high-dimensional space, a new structurally incoherent low-rank two-dimensional local discriminant graph embedding (SILR-2DLDGE) algorithm is proposed in this paper. The algorithm consists of three parts. Firstly, since it is vital to preserve the intrinsic relationships between data points, we introduce the graph embedding (GE) framework to preserve locality information. Secondly, the algorithm alleviates the impact of noise and corruption by using the L1 norm as a constraint in low-rank learning. Finally, the algorithm improves the discriminant ability by encoding the structurally incoherent parts of the data. We also establish the theoretical basis of the algorithm and analyze its computational cost and convergence. Experimental results and discussions on several image databases show that the proposed algorithm is more effective than the SILR-NMF algorithm.

Unfortunately, the above algorithms all rely on the L2 norm, which hinders their performance when dealing with noise or abnormal data. Therefore, to address the sensitivity of the L2 norm to outliers, many researchers have adopted L1-norm-based distance criteria for dimension reduction, which are widely considered effective [14][15][16][17][18]. For example, L1-PCA [14] and PCA-L1 [15], both based on the PCA algorithm, handle noise and outlier sensitivity in the data through an optimization problem. The rotation-invariant L1-norm PCA (R1-PCA) [16] algorithm, proposed on the basis of L1-PCA, retains some of the properties of PCA. To solve the general L1-norm locality preserving problem, Ref. [17] proposed an L1-norm LPP algorithm (LPP-L1) based on PCA-L1 to preserve the spatial topological structure more effectively. Subsequently, to address the outlier and corruption problems in LPP-L1, a 2D version of the algorithm (2DLPP-L1) was proposed [18].
In recent years, compared with L1-norm-based algorithms, many feature extraction algorithms using low-rank representation (LRR) have proved better at extracting clean information, which is crucially important when the data contain noise.
The robustness of these algorithms has attracted considerable attention from researchers [19][20][21][22][23][24]. For example, to better preserve the lowest-rank representation and the global structure of the data, the LRR of Refs. [19,20] extended the single-subspace clustering problem to multiple subspaces, recovering the subspace structure from data corrupted by noise or occlusion. On the basis of LRR, robust PCA (RPCA) was proposed by introducing the nuclear norm [21,22]. Also building on LRR, the manifold structure was introduced as a regularization term, yielding Laplacian-regularized LRR [23] for data clustering. Combining sparsity and non-negativity with LRR, a non-negative low-rank sparse graph (NNLRS) [24] was proposed to preserve the global structure of the data.
A new matrix factorization method with structural incoherence (SILR-NMF) [25] was introduced into face recognition to improve recognition performance in the presence of corrupted data. In Ref. [26], a new structurally incoherent method was proposed to gain additional discrimination ability in face recognition by introducing structural incoherence. Motivated by [25] and [26], we propose a new algorithm, dubbed structurally incoherent low-rank two-dimensional local discriminant graph embedding (SILR-2DLDGE), which overcomes an important limitation of the SILR-NMF algorithm: SILR-NMF cannot make full use of the neighborhood relationships between data points. SILR-2DLDGE is implemented in three steps. Firstly, an intra-class weighted graph and an inter-class weighted graph are constructed to preserve the discriminant information of local neighborhoods. Secondly, low-rank learning is used to eliminate noise and corruption in the data. Finally, structural incoherence [25,26] is incorporated to sharpen the discriminant information between different classes.
The four main contributions of this paper are as follows:


We present a novel 2DLPP-based algorithm that simultaneously performs structural incoherence, optimal graph Laplacian, and low-rank learning in a unified strategy. The algorithm has stronger discriminant ability than SILR-NMF because it fully exploits the structural information of the neighborhood.


We introduce intra-class and inter-class graphs into the structurally incoherent model to make data points of the same class more compact and to push data points of different classes as far apart as possible.


We use the low-rank property to treat the given data as two parts, a low-rank matrix and a sparse matrix, representing useful features and noise, respectively, and use the nuclear norm as the regularization term to improve the performance of the algorithm.


We designed a practical and simple optimization procedure and verified the algorithm on six databases, where our proposed method achieves better performance.
The rest of this paper is organized as follows: Section 2 introduces earlier algorithms such as LRR, SILR-NMF, and LRMD-SI. Section 3 details the proposed algorithm and analyzes the computational complexity and convergence of SILR-2DLDGE. In Section 4, we compare the proposed algorithm with other state-of-the-art algorithms and verify the promising performance of SILR-2DLDGE on six databases. The summary and future work appear in the last section of the paper.

Notation
To facilitate understanding of this article, we first introduce related work on some earlier algorithms, i.e., the LRR, LRMD-SI, and SILR-NMF algorithms.
Firstly, we consider N samples in the original space. Then, the linear transformation shown in Equation (1) can be obtained. Finally, we define the corresponding matrices. Additionally, ||Z||_* denotes the nuclear norm, which is the sum of the singular values of Z.

LRR
The LRR [19] algorithm differs from sparse representation [27]: LRR adopts a joint approach that transfers the recovery of corrupted data to multiple subspaces, and it is an effective subspace segmentation algorithm. At the same time, the global structure of the data is captured by seeking the lowest-rank representation of all data.
Each data vector in X can be represented as a linear combination of the columns of a dictionary T, i.e., X = TZ, where Z is the coefficient matrix. The noise E can then be separated from the corrupted data as follows [19]:

min_{Z,E} ||Z||_* + λ||E||_{2,1}  s.t.  X = TZ + E,

where λ is a parameter.
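As a concrete illustration of the nuclear norm and of the singular value thresholding (SVT) step that iterative LRR-type solvers rely on, the following minimal NumPy sketch (the function names are ours, not from the paper) computes ||Z||_* as the sum of singular values and applies the proximal operator of the nuclear norm:

```python
import numpy as np

def nuclear_norm(Z):
    # ||Z||_*: the sum of the singular values of Z.
    return np.linalg.svd(Z, compute_uv=False).sum()

def svt(Z, tau):
    # Singular value thresholding: the proximal operator of tau * ||.||_*,
    # which shrinks every singular value by tau (clipping at zero).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))
Z_shrunk = svt(Z, 0.5)
# Shrinking the singular values can only reduce the nuclear norm.
print(nuclear_norm(Z_shrunk) <= nuclear_norm(Z))  # prints True
```

SVT is the basic building block of the low-rank updates discussed later: applying it with a large threshold drives the matrix toward low rank.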

SILR-NMF
In view of the fact that face images are often affected by noise, SILR-NMF [25] is a better choice for feature representation and image classification. Its objective function takes the final form shown in Equation (5), where α, β, and γ are positive parameters.

LRMD-SI
In view of the fact that face images are often affected by noise, Wei et al. [26] introduced a structurally incoherent constraint based on low-rank matrix decomposition (LRMD-SI), which achieves good performance in image representation and classification. Its objective function takes the final form shown in Equation (6), where ||·||_F is the Frobenius norm, λ is a parameter, and i ∈ {1, 2, ..., c}.

Structurally Incoherent Low-Rank 2D Local Discriminant Graph Embedding
This section discusses the objective function of the SILR-2DLDGE algorithm, and describes its optimization process, computational complexity, and convergence.

The Objective Function of SILR-2DLDGE
In low-rank matrix recovery, the training dataset X is divided into a low-rank matrix Z and a noise matrix E, where C is the number of classes in the training set X and N_i is the total number of samples in the i-th class. Then, we have the following:

min_{Z,E} ||Z||_* + λ||E||_1  s.t.  X = Z + E,

where λ is a parameter.
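The decomposition X = Z + E with a nuclear-norm penalty on Z and an L1 penalty on E is the classical robust PCA (principal component pursuit) model. As a hedged sketch of how such a split is computed in practice (this is the standard ADMM loop from the RPCA literature, not the paper's exact solver; all names are ours):

```python
import numpy as np

def shrink(M, tau):
    # Elementwise soft thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    # Singular value thresholding: the proximal operator of tau * ||.||_*.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca(X, lam=None, mu=1.0, iters=300):
    # ADMM for: min ||Z||_* + lam * ||E||_1  s.t.  X = Z + E.
    if lam is None:
        lam = 1.0 / np.sqrt(max(X.shape))  # common default in the RPCA literature
    Z = np.zeros_like(X)
    E = np.zeros_like(X)
    M = np.zeros_like(X)  # scaled Lagrange multiplier
    for _ in range(iters):
        Z = svt(X - E + M / mu, 1.0 / mu)      # low-rank update
        E = shrink(X - Z + M / mu, lam / mu)   # sparse-noise update
        M = M + mu * (X - Z - E)               # multiplier update
    return Z, E
```

On synthetic data built as a low-rank matrix plus sparse corruption, this loop separates the two components: Z absorbs the clean structure and E the gross errors.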
The following equation is obtained by encoding the 2DLDGE algorithm into Equation (10) and introducing an orthogonal constraint on Y, where the balance parameters satisfy α > 0 and β > 0. In Equation (8), we define the intra-class weight matrix W^w_{ij} and the inter-class weight matrix W^b_{ij} as

W^w_{ij} = 1 if X_i and X_j belong to the same class and X_j ∈ N_{K_c}(X_i) or X_i ∈ N_{K_c}(X_j), and W^w_{ij} = 0 otherwise,

where N_{K_c}(X_i) denotes the index set of the K_c nearest neighbors of the sample X_i within the same class; W^b_{ij} is defined analogously for pairs of samples from different classes. We add structural incoherence to Equation (8), expressed as Equation (11), to improve the discrimination ability and further distinguish the different classes of the algorithm in Equation (7). In Equation (11), the algorithm improves structural incoherence by penalizing the Frobenius norm between different pairs Z_i and Z_j. The first term keeps the neighborhood information in the low-dimensional subspace consistent with the clean data structure. The second term learns a low-rank matrix so that noise interference is reduced as much as possible. The last term provides the discrimination ability that further separates the different classes in the data.
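The intra-class and inter-class weighted graphs described above can be sketched as follows (a minimal NumPy construction with names of our own choosing; the paper's exact weighting may differ): W_w connects each sample to its k nearest neighbours of the same class, and W_b to its k nearest neighbours from other classes.

```python
import numpy as np

def build_graphs(X, y, k=3):
    # X: (n, d) flattened samples; y: (n,) class labels.
    # W_w[i, j] = 1 if x_j is among the k nearest same-class neighbours of x_i;
    # W_b[i, j] = 1 if x_j is among the k nearest different-class neighbours.
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # a sample is never its own neighbour
    W_w = np.zeros((n, n))
    W_b = np.zeros((n, n))
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        for j in same[np.argsort(D[i, same])][:k]:
            W_w[i, j] = 1.0
        for j in diff[np.argsort(D[i, diff])][:k]:
            W_b[i, j] = 1.0
    # Symmetrise so that both graphs define valid graph Laplacians.
    return np.maximum(W_w, W_w.T), np.maximum(W_b, W_b.T)
```

Compact intra-class edges and spread-out inter-class edges are exactly what the first term of the objective rewards and the last term penalizes.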

The Optimization of SILR-2DLDGE
Because of the product terms, Equation (11) is non-convex. We therefore solve for the low-rank matrices Z_i iteratively, giving the subproblem in Equation (12). By introducing an auxiliary variable B_i into Equation (12), Equation (13) is obtained. Augmented Lagrange multipliers, ADMM [28], and related techniques can be used to solve Equation (13). Solving Equation (13) by performing a singular value decomposition (SVD) at every iteration may incur high computational complexity, so we exploit a property of the nuclear norm [29] to reduce the complexity, as in Equation (14) [30], whose minimizer provides the required factorization.
Using the conclusion of Equation (14), we can rewrite Equation (13) equivalently as Equation (15), whose augmented Lagrange function is given in Equation (16), where the penalty parameter is μ > 0 and the Lagrange multipliers are M_1, M_2, and M_3. We then solve for each variable in turn.
Update B_i: setting the partial derivative of B_i in Equation (17) to zero yields the closed-form update in Equation (19).
Update Z_i: setting the partial derivative of Z_i in Equation (20) to zero yields Equation (21).
Update E_i: fixing the other variables in Equation (16), we introduce the shrinkage operator of Ref. [31] together with soft thresholding [32], which converts the subproblem into Equation (23).
Update R: fixing all variables except R in Equation (16) gives Equation (24); setting the partial derivative of R to zero yields Equation (25).
Update H: fixing all other variables in Equation (16) except H gives Equation (26); taking the partial derivative yields Equation (27).
Update P: simplifying Equation (16) gives the subproblem for P in Equation (28), which is converted into Equation (29). To solve it better, we add a constraint adjustment, giving Equation (30). Equations (29) and (30) can then be transformed into Equation (31), whose solution is obtained by solving the generalized eigenvalue problem in Equation (32).
Algorithm 1 gives the concrete steps of SILR-2DLDGE.
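Two of the update steps above have simple generic forms worth making concrete. The E_i-subproblem is solved by elementwise soft thresholding (shrinkage), and the final P-subproblem reduces to a generalized eigenvalue problem. The following is an illustrative sketch only (the matrices and names are ours, not the paper's exact Equation (32)):

```python
import numpy as np

def shrink(M, tau):
    # Soft-thresholding (shrinkage) operator used in the E_i update:
    # the proximal operator of tau * ||.||_1.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def solve_projection(S_num, S_den, dim):
    # Solve the generalized eigenproblem S_num p = lambda S_den p and keep
    # the eigenvectors with the largest eigenvalues as the columns of P.
    S_den = S_den + 1e-6 * np.eye(S_den.shape[0])  # regularise for stability
    Lc = np.linalg.cholesky(S_den)
    Linv = np.linalg.inv(Lc)
    A = Linv @ S_num @ Linv.T          # symmetric reduced problem
    vals, W = np.linalg.eigh(A)        # ascending eigenvalues
    V = Linv.T @ W                     # back-transform eigenvectors
    return V[:, np.argsort(vals)[::-1][:dim]]
```

The Cholesky reduction makes the returned eigenvectors S_den-orthonormal, which matches the normalization usually imposed on graph-embedding projections.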
Step 10. Output: obtain the projection matrix P.
The flow chart of the SILR-2DLDGE method is shown in Figure 1.

The Convergence Analysis
Firstly, we analyze and prove the weak convergence of the proposed SILR-2DLDGE algorithm: under certain conditions, any limit point of the iterative sequence generated by the SILR-2DLDGE algorithm is a stationary point satisfying the Karush-Kuhn-Tucker (KKT) conditions [33].
Assuming that the SILR-2DLDGE algorithm reaches a stationary point, any convergence point must satisfy the KKT conditions, which are necessary conditions for a local optimal solution. Therefore, we can derive the KKT conditions of Equation (11) as in Equation (33) (note that the Lagrange multiplier is not involved in the process of solving P; thus, we do not prove its KKT condition). From the last relationship in Equation (33), Equation (34) follows; applying the elementwise result of [34], the relationships in Equation (35) are obtained. Thus, Equation (36) can be obtained from the KKT conditions. On this basis, convergence to a point satisfying the KKT conditions is proved.
Theorem 1 [25]. Suppose that the sequence generated by the SILR-2DLDGE algorithm satisfies the KKT conditions and converges; then any limit point of the sequence converges to a KKT point. The multipliers M_1, M_2, and M_3 can be obtained as shown in Equation (37), whose first three conditions follow from convergence to the stationary point.
Next, according to the SILR-2DLDGE algorithm and the fourth KKT condition, we obtain Equation (38). The third KKT condition yields Equation (39), and similar reasoning gives Equations (40) and (41). From the last condition in Equation (35), Equation (42) follows. Both sides of Equations (38)-(42) tend to 0 as the iterations proceed. Finally, the sequence of variables asymptotically satisfies the KKT conditions of Equation (11). QED

Computational Complexity
The main computational steps of the SILR-2DLDGE algorithm are steps 2 and 6. The complexity of step 2 is O(t(2a^2 l)) for solving the Sylvester equation, and the complexity of step 6 is O(t(2a^3)). Therefore, the total complexity of the SILR-2DLDGE algorithm is O(t(2(a^3 + a^2 l))), where t is the number of iterations.

Selection of Samples and Parameters
In each subsequent experiment, we randomly select T = 5, 6, 10, 5, 6, and 3 samples from each class in the FERET, ORL, COIL100, Yale, AR, and PolyU databases, respectively, for training. In each run, the nearest neighbor (NN) classifier is used for classification, and every experiment is repeated 10 times.
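The evaluation protocol just described (T random training samples per class, the rest for testing, a 1-NN classifier, averaged over repeated runs) can be sketched as follows; the helper names are ours:

```python
import numpy as np

def split_per_class(X, y, T, rng):
    # Randomly pick T training samples from every class; the rest are test data.
    tr, te = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        tr.extend(idx[:T])
        te.extend(idx[T:])
    return np.array(tr), np.array(te)

def nn_accuracy(Xtr, ytr, Xte, yte):
    # 1-nearest-neighbour classification by Euclidean distance.
    D = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    return float(np.mean(ytr[D.argmin(axis=1)] == yte))

def repeated_evaluation(X, y, T, repeats=10, seed=0):
    # Average recognition rate over independent random splits.
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(repeats):
        tr, te = split_per_class(X, y, T, rng)
        accs.append(nn_accuracy(X[tr], y[tr], X[te], y[te]))
    return np.mean(accs)
```

In the paper's experiments, X would hold the features extracted by each compared algorithm rather than raw pixels.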
In all the following experiments, we used an iterative algorithm to derive the solutions of the L1-norm algorithms and set the maximum number of iterations to 500. The LPP, 2DLPP, LPP-L1, and 2DLPP-L1 algorithms based on local graph embedding use k-nearest neighbors with k = T − 1 (T is the number of training samples per class), which is well supported in the observation space [43]. The parameters of the LRR, LRMD-SI, and SILR-NMF models are chosen as in their references. For the objective function of the SILR-2DLDGE algorithm, the parameters α, β, and η were selected from [0.001, 0.01, 0.1, 1, 10, 100, 1000], and the parameter γ was selected from [0.1, 0.2, ..., 0.9, 1] to evaluate their effect. The selected parameter values for our algorithm are given in Table 1.
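The parameter selection above amounts to a plain grid sweep; a minimal sketch follows, where the `evaluate` callback (standing in for one full training/testing run of the algorithm, and hypothetical here) returns the mean recognition rate:

```python
import itertools
import numpy as np

alphas = betas = etas = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
gammas = list(np.round(np.arange(0.1, 1.01, 0.1), 1))  # 0.1, 0.2, ..., 1.0

def grid_search(evaluate):
    # Exhaustively score every (alpha, beta, eta, gamma) combination and
    # return the best score together with the winning parameters.
    best = (-np.inf, None)
    for a, b, e, g in itertools.product(alphas, betas, etas, gammas):
        score = evaluate(a, b, e, g)
        if score > best[0]:
            best = (score, (a, b, e, g))
    return best
```

The sweep is 7 × 7 × 7 × 10 = 3430 runs, which is why such searches are usually done once per database and the winners recorded in a table such as Table 1.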

Experiments on Occlusion Databases
To verify the robustness of the algorithm, we randomly add 10 × 10 occlusion blocks at different positions in the images to carry out continuous occlusion experiments on the FERET, ORL, and COIL100 databases.

 FERET database
The FERET database is mainly used to study changes in pose, illumination, and facial expression.There are 200 classes in the FERET database, and each class has seven images with a resolution of 40 × 40 pixels, resulting in a total of 1400 gray-scale images.

 ORL database
The ORL database is mainly used to study changes in expression, posture, and illumination. There are 40 classes in the ORL database, and each class has 10 images with a resolution of 56 × 46 pixels, resulting in a total of 400 gray-scale images.

 COIL 100 database
The COIL100 object database is mainly used to study changes under different illuminations. There are 100 subjects, and each subject has 72 images with a resolution of 32 × 32 pixels, resulting in a total of 7200 gray-scale images. Figure 2 shows some occluded images from the three databases. To verify robustness under continuous occlusion on the FERET, ORL, and COIL100 databases, we randomly added 10 × 10 blocks at different positions in the images. Figures 3-5 show the average recognition rates (%) across dimensions on the three databases.
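The 10 × 10 block occlusion used in these experiments can be generated with a few lines; this is a sketch with our own function name, placing one black square at a random position:

```python
import numpy as np

def add_occlusion(img, block=10, rng=None):
    # Paste one block x block occluder (value 0 = black) at a random position.
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    out = img.copy()
    r = rng.integers(0, h - block + 1)  # top-left row, block fully inside
    c = rng.integers(0, w - block + 1)  # top-left column
    out[r:r + block, c:c + block] = 0
    return out
```

Applying this once per image before feature extraction reproduces the kind of continuous occlusion tested here.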
We ran a set of experiments and compared our method with seven others; the results are presented in this section. Following the settings in Section 4.1, when T sample points are randomly selected from each class of the FERET, ORL, and COIL100 databases to form the training set, the best average recognition rates (%) of the eight algorithms are shown in Table 2.

Experiments on Noise Databases
Experiments were carried out with different levels of random pixel corruption to verify the robustness of the algorithm. "Salt & pepper" noise with a density of 0.1 was added to the Yale, AR, and PolyU databases for the corruption experiments.
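Salt & pepper corruption at a given density can be sketched as follows (our own helper; half of the selected pixels are set to 0, half to a maximum value `vmax`):

```python
import numpy as np

def salt_pepper(img, density=0.1, vmax=1.0, rng=None):
    # Corrupt a 'density' fraction of pixels: half set to 0 (pepper),
    # half set to vmax (salt).
    if rng is None:
        rng = np.random.default_rng()
    out = img.astype(float).copy()
    n = int(round(density * img.size))
    idx = rng.choice(img.size, size=n, replace=False)
    flat = out.reshape(-1)          # view into out, so writes propagate
    flat[idx[: n // 2]] = 0.0
    flat[idx[n // 2:]] = vmax
    return out
```

With density 0.1, exactly 10% of the pixels are corrupted, matching the setting used on the Yale, AR, and PolyU databases.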

Description of the Yale database
The Yale face database is mainly used to study changes in facial expressions and lighting conditions.There are 15 classes in the Yale database, and each class has 11 images with a resolution of 50 × 40 pixels, resulting in a total of 165 gray-scale images.

Description of the AR database
The AR database is mainly used to study the changes in lighting conditions, facial expressions, and occlusion.There are 70 men and 56 women with a total of 126 people in the AR database, including 4000 color images with a resolution of 50 × 40 pixels.

Description of the PolyU palmprint database
The PolyU database is mainly used to study palmprint images captured in two sessions. There are 100 different palms in the PolyU database, and each palm has six samples, resulting in a total of 600 gray-scale images with a resolution of 64 × 64 pixels. Figure 6 shows the original images and some corrupted images from the three databases. As in the previous experiment, the best average recognition rates (%) of the eight algorithms under the settings of Section 4.1 are shown in Table 3.

Convergence Study
In the last experiment, we further verify the convergence of the SILR-2DLDGE algorithm on the FERET and AR databases. The objective function values decrease monotonically as the number of iterations increases; Figure 10 shows that the proposed algorithm converges. In addition, to further verify the efficiency of the proposed algorithm, we took the first three training samples from each class of the PolyU palmprint database; Table 4 reports the CPU time of each method.
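Convergence of this kind is typically checked by recording the objective value at every iteration and stopping when the relative change becomes small. A generic sketch follows, where the `step` and `objective` callbacks stand in for the actual SILR-2DLDGE updates and objective (both hypothetical here):

```python
import numpy as np

def run_with_history(step, objective, iters=500, tol=1e-6):
    # Advance the solver with step() and record objective() each iteration;
    # stop when the relative change in the objective falls below tol.
    history = [objective()]
    for _ in range(iters):
        step()
        history.append(objective())
        if abs(history[-2] - history[-1]) < tol * max(1.0, abs(history[-2])):
            break
    return history
```

Plotting the returned history against the iteration index produces curves like those in Figure 10; a monotonically decreasing history is the empirical signature of convergence.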

Observations and Discussion
Four observations about the SILR-2DLDGE algorithm are as follows: (1) In Tables 2 and 3, the maximum average recognition rate of the SILR-2DLDGE algorithm is the highest on the different databases, which shows that our algorithm is accurate and robust. The key reason is that SILR-2DLDGE not only learns a basis matrix with the low-rank property and local discriminant ability but also inherits the advantages of SILR-NMF, which weakens the disturbance of noise. (2) The SILR-2DLDGE algorithm combines the advantages of low-rank learning, sparse learning, and structurally incoherent learning, as in the SILR-NMF algorithm, with the advantages of graph embedding, showing that more discriminative information can be obtained by sparse learning combined with the L1 norm and the nuclear norm. The curves in Figures 3-5 and 7-9 show that the average recognition rates of our algorithm on the three noisy databases and the three occluded databases are higher than those of the other algorithms, which fully demonstrates its stronger robustness.
(3) The SILR-2DLDGE algorithm takes less time than the others (Table 4) and learns a sparse transformation matrix that encodes the geometric structure of the data, effectively improving classification accuracy and yielding clean data under noise disturbance. At the same time, structural incoherence makes it easier to separate data points of different classes. (4) The maximum average recognition rate of the SILR-2DLDGE algorithm varies among databases. For example, the algorithm achieves its highest average maximum recognition rate without occlusion on the ORL database, and its lowest with 10 × 10 occlusion on the COIL100 database. Likewise, it achieves its highest average maximum recognition rate without noise on the Yale database, and its lowest with "salt and pepper" noise of density 0.1 on the AR database. The reasons are the small numbers of samples in the ORL and Yale databases, the large sizes of the AR and PolyU palmprint databases, the glasses and other occlusions present in AR, and the illumination and other variations present in the PolyU palmprints.

Conclusions
This study combined the 2DLPP algorithm with low-rank representation learning and proposed a structurally incoherent low-rank two-dimensional local discriminant graph embedding (SILR-2DLDGE) algorithm based on subspace learning, graph embedding, low-rank sparsity, and structural incoherence. Discriminant information, local geometric information, low-rank representation information, and structural incoherence coexist in the proposed algorithm. The ultimate goal of the algorithm is to make data points of different classes as independent as possible. In particular, it uses the L1 norm as a constraint in low-rank learning to reduce the influence of noise and corruption. The noise and occlusion experiments on six public databases further proved that the proposed SILR-2DLDGE algorithm is more robust than the other algorithms. The algorithm is sensitive to its parameters; in future work, we will study the parameters further and continue to improve the robustness of the algorithm.


Figure 3. The average recognition rates (%) of the eight algorithms in the FERET database vary with the dimension.

Figure 4. The average recognition rates (%) of the eight algorithms in the ORL database vary with the dimension.

Figure 5. The average recognition rates (%) of the eight algorithms in the COIL100 database vary with the dimension.

Figure 6. Some original images and corrupted images on the three different databases. (a) Yale, (b) AR, and (c) PolyU.

Figure 7. The average recognition rates (%) of the eight algorithms in the Yale database vary with the dimension.

Figure 8. The average recognition rates (%) of the eight algorithms in the AR database vary with the dimension.

Figure 9. The average recognition rates (%) of the eight algorithms in the PolyU palmprint database vary with the dimension.
Algorithm 1. The steps of SILR-2DLDGE.
Step 1. Fixing Z_i, E_i, R, H, and P, update B_i with Equation (17);
Step 2. Fixing B_i, E_i, R, H, and P, update Z_i with Equation (20);
Step 3. Fixing Z_i, B_i, R, H, and P, update E_i with Equation (22);
Step 4. Fixing Z_i, B_i, E_i, H, and P, update R with Equation (24);
Step 5. Fixing Z_i, B_i, E_i, R, and P, update H with Equation (26);
Step 6. Fixing Z_i, B_i, E_i, R, and H, update P with Equation (28);
Step 7. Update the multipliers M_1, M_2, and M_3.

Table 1 .
Parameter values of the SILR-2DLDGE algorithm on different datasets.

Table 2 .
The best average recognition rates (%) of the eight algorithms in FERET, ORL, and COIL100 databases and the corresponding dimensions (D).

Table 4. The average CPU time consumed by the eight algorithms in the PolyU palmprint database.