Minimum Eigenvector Collaborative Representation Discriminant Projection for Feature Extraction

High-dimensional signals, such as image and audio signals, usually have a sparse or low-dimensional manifold structure and can be projected into a low-dimensional subspace to improve the efficiency and effectiveness of data processing. In this paper, we propose a linear dimensionality reduction method, minimum eigenvector collaborative representation discriminant projection, to address high-dimensional feature extraction problems. On the one hand, unlike existing collaborative representation methods, we use the eigenvector corresponding to the smallest non-zero eigenvalue of the sample covariance matrix to reduce the collaborative representation error. On the other hand, we maintain the collaborative representation relationship of the samples in the projection subspace to enhance the discriminability of the extracted features. In addition, the between-class scatter of the reconstructed samples is used to improve the robustness of the projection subspace. Experimental results on the COIL-20 image object database, the ORL and FERET face databases, and the Isolet database demonstrate the effectiveness of the proposed method, especially at low dimensions and with small training sample sizes.


Introduction
High-dimensional data widely exist in real applications, such as image recognition and information retrieval. In particular, actual data processing often encounters the so-called high-dimensional small sample size (SSS) problem, in which the number of available samples is smaller than the dimensionality of the sample features. Besides, high-dimensional data contain a lot of redundant information, and processing them directly consumes substantial storage and computing resources. Fortunately, previous research [1][2][3] has shown that high-dimensional data likely lie on or close to a low-dimensional submanifold, which means that high-dimensional data can be projected into a low-dimensional subspace by dimensionality reduction (DR) methods without losing important information. In the past few decades, numerous DR theories and methods have been proposed; part of this work can be found in [4][5][6][7][8].
Principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5] are the most classic and popular DR methods; PCA is an unsupervised method, while LDA is a supervised one. Despite their simplicity and effectiveness, they still suffer from some limitations in practice. PCA, being unsupervised, fails to provide discriminative information for different classes of data. LDA can find at most C − 1 meaningful discriminant projection directions, because theoretical analysis shows that the rank of the between-class scatter matrix is at most C − 1, where C is the number of classes. More importantly, neither method uses the structural information of the samples, which greatly reduces the discriminability of the extracted features, especially when dealing with SSS problems. Most DR methods proposed afterward are based on these two methods or their extensions, but take the structure of the samples into account. Some DR methods focus on the local structure of the data to improve the discriminative performance of the low-dimensional space. A partial list of these methods includes the unsupervised locality preserving projections (LPP) [9], neighborhood preserving embedding (NPE) [10], and locally linear embedding (LLE) [1], and the supervised marginal Fisher analysis (MFA) [11] and constrained discriminant neighborhood embedding (CDEN) [12]. Several other DR methods utilize both the local and global structure of the data to improve recognition accuracy, such as local Fisher discriminant analysis (LFDA) [13], locally linear discriminant embedding (LLDE) [14], and locality preserving discriminant projections (LPDP) [15].
Recently, some representation-based methods have been used for classification, including sparse representation classification (SRC) [16], sparsity preserving projections (SPP) [17], discriminant sparse neighborhood preserving embedding (DSNPE) [18], and discriminative sparsity preserving projections (DSPP) [19]. However, solving a sparse problem requires an iterative method, which is usually time-consuming. Another type of representation-based DR method, which uses L2 regularization and has a closed-form solution, has attracted wide attention. It has been proved in [20] that collaborative representation classification (CRC), with higher efficiency, is competitive with SRC in terms of recognition accuracy. Since then, several collaborative representation based methods have been proposed, such as collaborative representation based projections (CRP) [21], regularized least squares-based discriminative projections (RLSDP) [22], collaborative representation reconstruction based projections (CRRP) [23], collaborative representation based discriminant neighborhood projections (CRDNP) [24], and collaborative preserving Fisher discriminant analysis (CPFDA) [25].
Besides the linear dimensionality reduction methods introduced above, a class of nonlinear dimensionality reduction methods has been proposed to deal with nonlinear problems. Most nonlinear DR methods directly use the kernel trick to extend linear DR methods, such as kernel PCA (KPCA) [26], kernel Fisher discriminant (KFD) [27], kernel direct discriminant analysis (KDDA) [28], and kernel collaborative representation-based projection (KCRP) [29]. Some other recent DR methods can be found in [30][31][32].
In this article, we study linear DR methods. Although some of the proposed methods make use of both the local and global structure of the samples to extract features, when the reconstruction error of a sample is large, it is difficult for them to maintain the true structural similarity of the samples. In addition, most previous dimensionality reduction methods only consider the structural similarity of samples of the same class when seeking the projection subspace, while ignoring the structural similarity across different classes, which could also be used to improve the discriminability of the extracted features.
In order to extract strongly discriminative features, a minimum eigenvector collaborative representation discriminant projection (MECRDP) is proposed in this paper. The main contributions of our work are as follows. First, in the collaborative representation of the samples, we use not only the information of the sample space but also the information of the sample eigenvector space. Specifically, we use the eigenvector corresponding to the smallest non-zero eigenvalue of the sample covariance matrix to reduce the collaborative representation error of each sample. Second, we maintain the collaborative representation relationship of the samples to improve the discriminability of the extracted features. Third, the between-class scatter of the reconstructed samples is used to improve the robustness of the projection subspace. Last, experimental results on four public databases show that MECRDP outperforms other DR methods in terms of recognition accuracy, especially at low dimensions and with small training sample sizes.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce the work closely related to our method, including LDA and CRP. In Section 3, we propose the minimum eigenvector collaborative representation discriminant projection to improve the discriminability of the projection subspace. The experimental results are presented in Section 4. Finally, concluding remarks are given in Section 5.

Related Works
For simplicity, suppose a training sample set of C classes is denoted by X = [x_1, x_2, \ldots, x_n] \in R^{m \times n}, where x_i \in R^m represents the ith sample, m is the dimensionality of the sample features, and n is the number of samples. Suppose further that the cth class contains n_c samples, so that \sum_{c=1}^{C} n_c = n. The method proposed in this paper is closely related to LDA and CRP. In what follows, we briefly review these two methods.

Linear Discriminant Analysis
The goal of LDA [5] is to seek a projection matrix so that the within-class scatter is minimized and the between-class scatter is maximized simultaneously. According to the graph-embedding framework [13], the projection matrix of LDA corresponds to the following two optimization problems:

\max_P \sum_{i,j} W^{(b)}_{i,j} \| P^T x_i - P^T x_j \|_2^2, (1)

\min_P \sum_{i,j} W^{(w)}_{i,j} \| P^T x_i - P^T x_j \|_2^2, (2)

where the weights W_{i,j} are defined, respectively, as

W^{(b)}_{i,j} = 1/n - 1/n_c if x_i and x_j both belong to the cth class, and 1/n otherwise, (3)

W^{(w)}_{i,j} = 1/n_c if x_i and x_j both belong to the cth class, and 0 otherwise. (4)

Using some algebraic transformations, we can rewrite (1) and (2) as \max_P tr(P^T S_b P) and \min_P tr(P^T S_w P), where tr(\cdot) denotes the matrix trace, and S_b = 2 X L^{(b)} X^T and S_w = 2 X L^{(w)} X^T are the between-class scatter matrix and the within-class scatter matrix, respectively. Here

L^{(b)} = D^{(b)} - W^{(b)}, (5)

L^{(w)} = D^{(w)} - W^{(w)}, (6)

are the Laplacian matrices, in which D^{(b)} and D^{(w)} are diagonal matrices with diagonal entries D^{(b)}_{i,i} = \sum_j W^{(b)}_{i,j} and D^{(w)}_{i,i} = \sum_j W^{(w)}_{i,j}, respectively. Using (5) and (6), the objective function of LDA can be modeled as

\max_P \frac{tr(P^T S_b P)}{tr(P^T S_w P)}. (7)
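The graph-embedding construction of the two scatter matrices can be sketched in NumPy. This is a minimal illustration under our own naming conventions (one sample per column); the factor of 2 follows the definitions S_b = 2XL^(b)X^T and S_w = 2XL^(w)X^T, so the result equals twice the classic LDA scatter matrices.

```python
import numpy as np

def lda_scatters(X, labels):
    """Graph-embedding form of the LDA scatter matrices.

    X: (m, n) matrix with one sample per column; labels: length-n class ids.
    Returns S_b = 2 X L_b X^T and S_w = 2 X L_w X^T.
    """
    m, n = X.shape
    labels = np.asarray(labels)
    # Within-class weights: 1/n_c when samples i and j share class c, else 0.
    Ww = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        Ww[np.ix_(idx, idx)] = 1.0 / idx.size
    # Between-class weights: 1/n - 1/n_c for same-class pairs, 1/n otherwise.
    Wb = np.full((n, n), 1.0 / n) - Ww
    # Laplacians L = D - W, with D the diagonal matrix of row sums.
    Lw = np.diag(Ww.sum(axis=1)) - Ww
    Lb = np.diag(Wb.sum(axis=1)) - Wb
    Sb = 2.0 * X @ Lb @ X.T
    Sw = 2.0 * X @ Lw @ X.T
    return Sb, Sw
```

A quick sanity check is that these matrices agree (up to the factor 2) with the textbook definitions of within-class and between-class scatter computed from class means.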

Collaborative Representation Based Projections
CRP [21] is an unsupervised discriminant projection method based on L2-regularized least squares. The collaborative representation coefficients of the ith sample are obtained by solving the following optimization problem:

\min_{r_i} \| x_i - X r_i \|_2^2 + \lambda \| r_i \|_2^2, \quad s.t. \; e_i^T r_i = 0, (8)

where r_i \in R^n is the collaborative representation coefficient vector of the ith sample, \lambda > 0 is a regularization parameter, and e_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T has a single 1 in the ith position. The constraint in (8) means the ith sample is represented over all the samples other than itself. The optimal solution of (8) can be easily obtained with the Lagrange multiplier method as

r_i = Q X^T x_i - \frac{e_i^T Q X^T x_i}{e_i^T Q e_i} Q e_i, (9)

where Q = (X^T X + \lambda I)^{-1} and I \in R^{n \times n} is the identity matrix.
Using the collaborative representation coefficients, the optimal projection matrix P \in R^{m \times d} (d < m) of CRP is obtained by simultaneously solving the following two optimization problems:

\min_P \sum_{i=1}^{n} \| P^T x_i - \sum_{j=1}^{n} r_{i,j} P^T x_j \|_2^2, (10)

\max_P \sum_{i=1}^{n} \| P^T x_i - P^T \bar{x} \|_2^2, (11)

where r_{i,j} denotes the jth entry of r_i, and \bar{x} = (1/n) \sum_{i=1}^{n} x_i is the mean of the samples. With some algebraic manipulations, (10) and (11) can be rewritten, respectively, as

\min_P tr(P^T S_L P), (12)

\max_P tr(P^T S_T P), (13)

where R = [r_1, r_2, \ldots, r_n] is the representation coefficient matrix, S_L = X (I - R)(I - R)^T X^T denotes the local scatter matrix, and S_T = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T is the total scatter matrix. Then, the optimal projection matrix of CRP is obtained by solving the following optimization problem:

\min_P \frac{tr(P^T S_L P)}{tr(P^T S_T P)}. (14)
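The constrained ridge coding step of CRP can be sketched as follows. This is a minimal NumPy illustration (function name and regularization default are ours), enforcing the zero self-coefficient through the Lagrange-multiplier correction of the unconstrained ridge solution.

```python
import numpy as np

def crp_coefficients(X, lam=0.5):
    """Collaborative representation coding: each column x_i is represented
    over all other columns with an L2 penalty. Returns the n x n coefficient
    matrix R with R[i, i] = 0 for every i."""
    m, n = X.shape
    Q = np.linalg.inv(X.T @ X + lam * np.eye(n))
    R = np.zeros((n, n))
    for i in range(n):
        r0 = Q @ (X.T @ X[:, i])               # unconstrained ridge solution
        # Lagrange-multiplier correction enforcing e_i^T r = 0
        # (the sample is not used to represent itself).
        R[:, i] = r0 - (r0[i] / Q[i, i]) * Q[:, i]
    return R
```

Since the constraint only zeroes the ith coefficient, each column of R coincides with the ordinary ridge solution computed after deleting the ith sample from the dictionary, which gives a simple correctness check.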

Minimum Eigenvector Collaborative Representation Discriminant Projection
LDA uses the between-class and within-class scatter to improve the discriminability of the extracted features, but it ignores the structural relationships among the samples, which degrades the discriminability of the features in the projection subspace. Although CRP takes the structural relationships of the samples into account, it does not use the class information of the samples, which is not conducive to improving recognition accuracy. In particular, when the number of samples is small, there will be a large representation error, \| x_i - X r_i \|_2^2, which fails to maintain the structural similarity of the samples. Here, we propose a new feature extraction method to alleviate these problems.

Method Proposed
Since each sample is represented over all samples other than itself, the reconstruction error may be large when the number of samples is small. However, any eigenvector corresponding to a non-zero eigenvalue of the sample covariance matrix contains partial information about every sample. In order to reduce the reconstruction error of the samples while using only very little information from the represented sample itself, we construct an expanded sample matrix with the eigenvector corresponding to the smallest non-zero eigenvalue of the sample covariance matrix. Let x_v be the eigenvector corresponding to the smallest non-zero eigenvalue of X X^T; then the expanded sample matrix is defined as \tilde{X} = [X, x_v]. Similar to (8), the collaborative representation coefficients of the ith sample are obtained by

\min_{\tilde{r}_i} \| x_i - \tilde{X} \tilde{r}_i \|_2^2 + \lambda \| \tilde{r}_i \|_2^2, \quad s.t. \; \tilde{e}_i^T \tilde{r}_i = 0, (15)

where \tilde{r}_i \in R^{n+1} is the collaborative representation coefficient vector and \tilde{e}_i = [e_i^T, 0]^T. Let \tilde{R} = [\tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_n] be the collaborative representation coefficient matrix. Then, the sample x_i can be reconstructed as

\hat{x}_i = \tilde{X} \tilde{r}_i, (16)

and the reconstructed sample matrix is

\hat{X} = \tilde{X} \tilde{R}. (17)

In order to maintain the reconstruction similarity of the samples and keep within-class compactness, we modify the optimization problem in (2) as

\min_P \sum_{i,j} W^{(w)}_{i,j} \| P^T x_i - P^T \hat{x}_j \|_2^2 = \min_P tr(P^T \tilde{S}_w P), (18)

where

\tilde{S}_w = X D^{(w)} X^T - X W^{(w)} \hat{X}^T - \hat{X} W^{(w)} X^T + \hat{X} D^{(w)} \hat{X}^T (19)

is the collaborative within-class scatter matrix, and W^{(w)} and D^{(w)} are defined in (4) and (6).
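The construction of x_v, the expanded matrix, and the coding step of (15) can be sketched in NumPy as below. The function names and the rank tolerance are our own illustrative choices; x_v is obtained as the left singular vector of X for the smallest non-zero singular value, which is the eigenvector of XX^T for the smallest non-zero eigenvalue.

```python
import numpy as np

def expanded_matrix(X, tol=1e-10):
    """Append the eigenvector of X X^T for the smallest non-zero eigenvalue,
    giving the expanded matrix X_tilde = [X, x_v]."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    nonzero = np.flatnonzero(s > tol * s[0])   # singular values sorted descending
    x_v = U[:, nonzero[-1]]
    return np.column_stack([X, x_v]), x_v

def mecrdp_coefficients(X, lam=0.5):
    """Coefficients r_tilde_i of (15): x_i is coded over [X, x_v], with the
    i-th original column excluded via the Lagrange-multiplier correction."""
    Xt, _ = expanded_matrix(X)
    n1 = Xt.shape[1]
    Q = np.linalg.inv(Xt.T @ Xt + lam * np.eye(n1))
    R = np.zeros((n1, X.shape[1]))
    for i in range(X.shape[1]):
        r0 = Q @ (Xt.T @ X[:, i])
        R[:, i] = r0 - (r0[i] / Q[i, i]) * Q[:, i]
    return R
```

Because appending x_v only enlarges the dictionary (setting its coefficient to zero recovers the original coding), the regularized reconstruction objective of (15) can never exceed that of (8), which matches the motivation for the expansion.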
In what follows, we define the collaborative reconstructed between-class scatter matrix as

\hat{S}_b = 2 \hat{X} L^{(b)} \hat{X}^T, (20)

where L^{(b)} has been defined in (5). Based on the Fisher criterion, and considering both the between-class scatter matrix S_b and the collaborative reconstructed between-class scatter matrix \hat{S}_b, the objective function of the proposed MECRDP is formulated as

\max_P \frac{tr(P^T (S_b + \alpha \hat{S}_b) P)}{tr(P^T \tilde{S}_w P)}, (21)

where \alpha > 0 is a balance factor between S_b and \hat{S}_b. According to the definition of S_b, maximizing tr(P^T S_b P) improves the discrimination of the projection matrix P. However, when the dimension of the subspace exceeds a certain size, the discriminant performance becomes increasingly affected by noise as the projection dimension grows. Since the collaborative representation between samples better describes the structural similarity of the samples, thereby reducing the impact of noise, we use \hat{S}_b to improve the robustness of the projection matrix. Problem (21) is a generalized eigenvalue problem, whose optimal solution can be obtained by the generalized eigenvalue decomposition

\tilde{S}_w p_i = \lambda_i (S_b + \alpha \hat{S}_b) p_i, (22)

where \lambda_i and p_i are the eigenvalue and corresponding eigenvector, respectively. The projection matrix P is then composed of the eigenvectors corresponding to the d smallest non-zero eigenvalues, that is, P = [p_1, p_2, \ldots, p_d].
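Given the three scatter matrices, the trace-ratio objective is typically relaxed to its ratio-trace form and solved as a generalized eigenproblem. The following NumPy-only sketch solves it by symmetrically whitening the within-class matrix; the small eps regularizer is our own safeguard against singularity, not part of the paper.

```python
import numpy as np

def mecrdp_projection(S_b, S_b_hat, S_w_tilde, alpha, d, eps=1e-8):
    """Maximize tr(P^T (S_b + alpha*S_b_hat) P) / tr(P^T S_w_tilde P) via a
    generalized eigendecomposition, returning the top-d directions."""
    A = S_b + alpha * S_b_hat
    B = S_w_tilde + eps * np.eye(S_w_tilde.shape[0])
    # B^{-1/2} from the eigendecomposition of the symmetric positive
    # definite matrix B.
    w, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = B_inv_sqrt @ A @ B_inv_sqrt
    evals, evecs = np.linalg.eigh((M + M.T) / 2.0)
    # eigh sorts ascending; take the d largest and map back with B^{-1/2}.
    P = B_inv_sqrt @ evecs[:, ::-1][:, :d]
    return P
```

Each returned column p then satisfies (S_b + alpha*S_b_hat) p = lambda * (S_w_tilde + eps*I) p, which is the generalized eigenvalue relation of (22) written with the reciprocal eigenvalue.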
Using x_v, and differently from CRP, we not only consider the information in the sample space but also use part of the information in the eigenvector space, so we obtain a smaller representation error, \| x_i - \tilde{X} \tilde{r}_i \|_2^2, and keep more collaborative reconstruction information in the projection space. In addition, CRP uses the local scatter and global scatter of the samples to obtain the discriminability of the extracted features, whereas our method uses the label information of the samples to keep a small within-class scatter and a larger between-class scatter in the projection space, thereby improving the discriminability of the extracted features.

Algorithmic Procedures
For simplicity, we summarize the algorithmic procedure of the proposed MECRDP as follows:
Step 1: Project the original high-dimensional samples into an intermediate subspace by PCA to remove noise and useless information, and obtain the projection matrix P_V. For simplicity, we still use X to denote the training samples after this projection.
Step 2: Find the eigenvector x_v corresponding to the smallest non-zero eigenvalue of X X^T, and construct the expanded sample matrix \tilde{X} = [X, x_v].
Step 3: Compute the collaborative representation coefficients \tilde{R} by (15), and then reconstruct the sample matrix \hat{X} = \tilde{X} \tilde{R}.
Step 4: Compute the collaborative within-class scatter matrix \tilde{S}_w, the between-class scatter matrix S_b, and the collaborative reconstructed between-class scatter matrix \hat{S}_b by (19), (5), and (20), respectively.
Step 5: Perform the generalized eigenvalue decomposition \tilde{S}_w p_i = \lambda_i (S_b + \alpha \hat{S}_b) p_i, and use the eigenvectors corresponding to the d smallest non-zero eigenvalues to construct the projection matrix P = [p_1, p_2, \ldots, p_d].
Step 6: For any input sample x \in R^m, its low-dimensional projection is y = P^T P_V^T x.
The proposed method has no iterative steps, and its optimal projection matrix can be obtained analytically. Without considering the first step, which is a data preprocessing step that most linear DR methods need to perform, we briefly analyze the computational complexity of our method. Step 2 costs about O(m^3) to find the eigenvector corresponding to the smallest non-zero eigenvalue of X X^T; Step 3 costs about O(mn^2 + n^3); Step 4 needs about O(mn^2 + m^2 n); and finding the optimal projection matrix in Step 5 requires O(n^3). The total computational complexity is therefore about O(m^3 + m^2 n + mn^2 + n^3).
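Steps 1-6 can be tied together in a compact end-to-end sketch. All names, tolerances, and default parameter values below are illustrative assumptions, not the authors' code; the data layout is one sample per column.

```python
import numpy as np

def mecrdp_fit(X, labels, d, lam=0.5, alpha=0.1, energy=0.98, eps=1e-8):
    """End-to-end sketch of MECRDP (Steps 1-6). Returns an (m, d) matrix W
    whose transpose maps a raw sample x to its low-dimensional features."""
    # Step 1: PCA keeping `energy` of the data variance.
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    P_V = U[:, :k]
    X = P_V.T @ X
    m, n = X.shape
    # Step 2: eigenvector of X X^T for the smallest non-zero eigenvalue.
    Ux, sx, _ = np.linalg.svd(X, full_matrices=False)
    x_v = Ux[:, np.flatnonzero(sx > 1e-10 * sx[0])[-1]]
    Xt = np.column_stack([X, x_v])
    # Step 3: constrained collaborative coding and reconstruction.
    Q = np.linalg.inv(Xt.T @ Xt + lam * np.eye(n + 1))
    R = np.zeros((n + 1, n))
    for i in range(n):
        r0 = Q @ (Xt.T @ X[:, i])
        R[:, i] = r0 - (r0[i] / Q[i, i]) * Q[:, i]
    X_hat = Xt @ R
    # Step 4: scatter matrices from the class-indicator weights.
    labels = np.asarray(labels)
    Ww = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        Ww[np.ix_(idx, idx)] = 1.0 / idx.size
    Wb = np.full((n, n), 1.0 / n) - Ww
    Lb = np.diag(Wb.sum(axis=1)) - Wb
    Dw = np.diag(Ww.sum(axis=1))
    S_b = 2 * X @ Lb @ X.T
    S_b_hat = 2 * X_hat @ Lb @ X_hat.T
    S_w = (X @ Dw @ X.T - X @ Ww @ X_hat.T
           - X_hat @ Ww @ X.T + X_hat @ Dw @ X_hat.T)
    # Step 5: generalized eigendecomposition, top-d directions.
    B = S_w + eps * np.eye(m)
    w, V = np.linalg.eigh(B)
    Bi = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = Bi @ (S_b + alpha * S_b_hat) @ Bi
    evals, evecs = np.linalg.eigh((M + M.T) / 2)
    P = Bi @ evecs[:, ::-1][:, :d]
    # Step 6: the total map for a new sample x is y = P^T P_V^T x = W.T @ x.
    return P_V @ P
```

A new sample is then projected with `y = W.T @ x`, combining the PCA and MECRDP projections in one matrix.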

Experiments
In this section, several experiments are conducted to evaluate the performance of the proposed MECRDP. We compare the recognition performance of MECRDP and several related DR methods on four public databases: the image object database COIL20 [33], the face databases ORL [34] and FERET [35], and the spoken letter database Isolet [36].

Preprocessing and Parameter Setting
In our experiments, all the images were converted to grayscale and resized. Each sample is stacked into a column vector in column order and normalized. Besides, to avoid the singularity problem caused by the small sample size, we first reduce the dimension by PCA, retaining 98% of the data energy. Each database is randomly divided into a training set and a test set: s samples are randomly selected from each class to form the training set, and the remaining samples are used as the test set. For simplicity, unless otherwise stated, the nearest neighbor method is used to classify the test samples. Considering that most of the compared methods have adjustable parameters, for the sake of fairness, we search for satisfactory parameters over a large range for each of them. For example, the neighborhood parameter of LLDE is searched from 1 to s − 1, and its rescaling coefficient is empirically set to 1. In MFA, the parameter for the intrinsic graph is empirically set to s − 1, and the parameter for the penalty graph is chosen from {1C, 3C, 5C, 7C, 9C}.
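The 98%-energy PCA preprocessing described above can be sketched as follows (a minimal NumPy illustration; the function name and data layout are our choices).

```python
import numpy as np

def pca_energy(X, energy=0.98):
    """Keep the smallest number of principal components whose eigenvalues
    account for at least `energy` of the total variance.
    X is (m, n) with one sample per column."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s**2                       # proportional to covariance eigenvalues
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), energy)) + 1
    P_V = U[:, :k]                   # orthonormal projection basis
    return P_V, P_V.T @ Xc
```

The returned k is minimal: the first k components retain at least the requested energy, while the first k − 1 do not.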
Particularly, each experiment was independently repeated 20 times to avoid the bias caused by random selection, and the average results are recorded and reported.

Experimental on COIL20 Database
The COIL20 [33] is an object recognition database containing 1440 images of 20 objects, with 72 images per object. We resize each image to 32 × 32 and randomly select 4, 6, 8, and 10 images from each class to form the training set. The parameters λ and α in MECRDP are set to 0.5 and 0.1, respectively. Table 1 records the maximum average recognition accuracy, the standard deviations, the corresponding dimension, and the average running time of each method on COIL20. Figure 1 shows the average recognition accuracy versus the subspace dimension on COIL20. From the results in Table 1 and Figure 1, we observe that the proposed MECRDP achieves better recognition accuracy than the other methods on the COIL20 database: compared with the other methods, its maximum average recognition accuracy is improved by about 3%. Table 1 also shows that the running time of our method is almost the same as that of the other methods, which implies its efficiency. In addition, the optimal recognition result of MECRDP is usually obtained at a lower projection dimension than with the other methods, which can be verified more intuitively in Figure 1: when the dimension of the projection subspace is low, MECRDP is significantly better than the other methods. This means that the features extracted by our method remain clearly discriminative even in low-dimensional spaces. Table 2 shows the maximum average recognition accuracy at different dimensions using the nearest neighbor classifier. To evaluate the effectiveness of MECRDP further, we compare it with the other dimensionality reduction methods under another classifier, a two-layer neural network; the results are shown in Table 3. Tables 2 and 3 show that the recognition accuracy achieved by the neural network is not as good as that of the nearest neighbor method, but the proposed method always achieves the best results. The results on COIL20 verify the effectiveness of our method for feature extraction.

Experiment on the ORL Database
The ORL [34] is a face database consisting of 400 images of 40 individuals, with 10 images per individual. These images vary in lighting, facial expression, and facial details. In our experiments, we cropped and resized each image to 32 × 32, and randomly selected 3, 4, 5, and 6 images from each class to form the training set. We set λ and α in MECRDP to 0.05 and 0.9, respectively. Table 4 reports the maximum average recognition accuracy, the standard deviations, the corresponding dimension, and the average running time of each method on the ORL database. Tables 5 and 6 list the average recognition rates obtained by the nearest neighbor classifier and the neural network classifier for each method, respectively. Figure 2 plots the average recognition accuracy versus the subspace dimension on ORL. The results show that our method is superior to the other methods, and the performance improvement is more obvious when the number of training samples is small. For example, when the training samples per class are 3 and 4, the maximum average recognition accuracy of MECRDP is improved by about 4.1% and 3.1%, respectively. Besides, we note that the recognition accuracy of every method increases as the number of training samples increases, while MECRDP always outperforms the other methods. Figure 2 shows that as the projection dimension increases, the performance of some methods, such as LDA, LLDE, and RLSDP, decreases, because they extract some noise information at large projection dimensions; our method, in contrast, shows a certain degree of robustness. Comparing Tables 5 and 6, we find that the recognition accuracy obtained by the nearest neighbor classifier is better than that of the neural network classifier. Moreover, the proposed MECRDP achieves the best results at almost all the selected feature dimensions. The experimental results on the ORL database verify the advantages of the proposed algorithm.
Table 4. The maximum average recognition accuracy (%) ± the standard deviation (%), the corresponding dimension, and the average running time (seconds, in parentheses) of each method on the ORL database.

Experiment on the FERET Database
The FERET [35] is a face database consisting of 13,539 images of 1565 individuals, from which we select a subset containing 1400 images of 200 individuals, with seven images per individual. These images vary in facial expression, illumination, and pose. We cropped and resized each image to 40 × 40, and randomly selected 3, 4, and 5 images from each class to form the training set. We set λ and α in MECRDP to 0.1 and 0.1, respectively. Table 7 shows the maximum average recognition accuracy, the standard deviations, the corresponding dimensions, and the average running time of each method on the FERET database. The average recognition rates obtained by the nearest neighbor classifier and the neural network classifier for each method are shown in Tables 8 and 9, respectively. Figure 3 illustrates the average recognition accuracy versus the subspace dimension on FERET. The results in Table 7 and Figure 3 show that the proposed MECRDP outperforms the other methods. It is worth noting that FERET has 200 classes but only seven samples per class, and on this database the average recognition accuracy is greatly improved by MECRDP. For example, when the training samples per class are 3 and 4, the maximum average recognition accuracy of MECRDP is improved by about 26.2% and 14.1%, respectively.
Besides, Figure 3 shows that when the projection dimension increases, the recognition accuracy of MECRDP decreases but remains higher than that of the other methods. The results in Tables 8 and 9 show that the nearest neighbor classifier achieves higher recognition accuracy than the neural network classifier. Table 8 shows that MECRDP achieves the highest recognition accuracy at every selected feature dimension, while Table 9 shows that MECRDP performs better than the other feature extraction methods in most cases when the neural network classifier is used. In general, the experimental results show that the features extracted by MECRDP are more discriminative in most cases. Table 7. The maximum average recognition accuracy (%) ± the standard deviation (%), the corresponding dimension, and the average running time (seconds, in parentheses) of each method on the FERET database.

Experiment on the Isolet Database
The Isolet database [36] was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice, so there are 52 training examples per speaker, and each sample is a 617-dimensional vector. The speakers were grouped into sets of 30 speakers each, referred to as isolet1 through isolet5. In our experiment, we compared the performance of each feature extraction method on Isolet1. We set λ and α in MECRDP to 1 and 0.1, respectively. Table 10 shows the maximum average recognition accuracy, the standard deviations, the corresponding dimensions, and the average running time of each method on the Isolet1 database. Tables 11 and 12 report the average recognition rates obtained by the nearest neighbor classifier and the neural network classifier for each method, respectively. Figure 4 illustrates the average recognition accuracy versus the subspace dimension on Isolet1. The results in Table 10 show that under different numbers of training samples, the best results are always obtained by MECRDP. Figure 4 shows that as the number of training samples increases, the recognition accuracy of all the feature extraction algorithms improves. Although the advantage of the proposed MECRDP over the other methods gradually decreases, it maintains a high recognition accuracy. For example, when the number of training samples per class is five, the recognition accuracy of MECRDP is improved by at least 3.4% compared to the other methods; when the number of training samples per class increases to 20, the improvement is only 1.2%, with a best recognition accuracy of 94.05%. Tables 11 and 12 show that MECRDP performs better than the other feature extraction methods in most cases. Based on the above experimental results, we believe that the proposed MECRDP is an effective feature extraction method.



Parameter Sensitivity Analysis
Here we analyze the influence of the parameters on the proposed algorithm through experiments on the four databases. Figures 5 and 6 plot the maximum average recognition accuracy of MECRDP versus the parameters λ and α, respectively, with four training samples per class on each database. Figures 5 and 6 show that the parameter selection of MECRDP depends strongly on the database. Within a certain range, such as λ ∈ [0, 1], the recognition accuracy changes little on the four databases. The parameter α has a greater impact on the recognition accuracy for COIL20, ORL, and FERET, whereas on Isolet1 the recognition accuracy of MECRDP is robust to α.


Visualization
In order to compare the distributions of the extracted features in low-dimensional space more intuitively, we randomly selected six classes of samples from the ORL database and projected them into two-dimensional space. Figure 7 shows the two-dimensional projection results of all the methods. The results show that most methods achieve good clustering, except CRP and PCA, which are unsupervised feature extraction methods; this indicates that label information helps to extract discriminative features. However, for some classes (class 1 and class 6), some methods, such as CRRP, LDA, MFA, and RLSDP, do not achieve a large between-class distance. Figure 7 shows that the features extracted by MECRDP not only have small within-class scatter but also larger between-class distances.
In a word, these experimental results imply that using the minimum eigenvector to reduce the sample reconstruction error, and maintaining the collaborative representation relationship between the samples helps to improve the discriminability of the extracted features, especially when the number of training samples is small.
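The visualization above projects labeled samples into two dimensions with a learned linear map. As a minimal numpy sketch of that workflow, the snippet below builds synthetic stand-in data (six classes, as in the ORL experiment) and projects it with PCA, i.e. the top-2 eigenvectors of the sample covariance matrix; in the paper, the PCA projection would be replaced by the learned MECRDP projection matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for six classes of image samples:
# 6 classes, 10 samples each, 64-dimensional features.
X = np.vstack([rng.normal(loc=3.0 * c, scale=1.0, size=(10, 64)) for c in range(6)])
y = np.repeat(np.arange(6), 10)          # class labels for coloring the plot

# Project to 2D with the top-2 eigenvectors of the covariance matrix (PCA);
# a learned projection matrix would be substituted here.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
W = eigvecs[:, -2:]                      # top-2 eigenvectors
Z = Xc @ W                               # 2-D embedding, shape (60, 2)
```

A scatter plot of `Z` colored by `y` (e.g., with matplotlib's `scatter`) then gives the kind of class-separation picture shown in Figure 7.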

Conclusions
In this paper, a new linear dimensionality reduction method, minimum eigenvector collaborative representation discriminant projection (MECRDP), has been proposed for feature extraction; it can be viewed as an extended collaborative representation projection method. The method employs the eigenvector corresponding to the smallest non-zero eigenvalue of the sample covariance matrix to reduce the collaborative representation error, while the collaborative representation relationship of the samples is maintained in the projection subspace. In addition, the between-class scatter of the reconstructed samples is used to improve the robustness of the projection subspace. Experimental results on four public databases demonstrated the superiority of the proposed method in terms of recognition accuracy compared with other commonly used linear DR methods. Moreover, the experiments show that the proposed method is especially suitable for small sample size problems, while it also works well when the number of training samples is large. We therefore believe that MECRDP is a general algorithm for feature extraction.
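The key ingredient named above, the eigenvector of the smallest non-zero eigenvalue of the sample covariance matrix, can be computed directly. The following numpy sketch illustrates this step in isolation (the tolerance `tol` and the helper name are our own choices, not from the paper); note that in the small-sample-size setting the covariance matrix is rank-deficient, so zero eigenvalues must be skipped.

```python
import numpy as np

def min_nonzero_eigvec(X, tol=1e-10):
    """Return the smallest non-zero eigenvalue of the sample covariance
    matrix of X (rows = samples) and its unit-norm eigenvector."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    nonzero = np.where(eigvals > tol * eigvals.max())[0]
    k = nonzero[0]                                  # smallest non-zero one
    return eigvals[k], eigvecs[:, k]

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))        # 20 samples, 5-D: full-rank covariance
lam_min, v_min = min_nonzero_eigvec(X)
```

When the number of samples is smaller than the dimensionality (e.g., `X` of shape `(3, 5)`), the covariance matrix has zero eigenvalues, and the thresholding above ensures the returned eigenvalue is still strictly positive.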
Note that the proposed MECRDP is a parameterized method, and its performance is inevitably affected by the choice of parameters, as is also the case for other parameterized methods such as CRP, MFA, and RLSDP. In addition, the performance of the algorithm sometimes decreases as the feature dimension increases. How to utilize other information about the samples, such as spatial distribution information, local structure information, and high-order statistical information, to further improve the performance and robustness of the algorithm is an interesting direction for future study.