A Sparse Representation-Based Sample Pseudo-Labeling Method for Hyperspectral Image Classiﬁcation



Introduction
Hyperspectral images (HSIs) have hundreds of spectral bands that contain detailed spectral information. Thus, hyperspectral images are widely used in precision agriculture [1], geological prospecting [2], target detection [3], and landscape classification [4]. Classification is one of the important branches in hyperspectral image processing [5] because it can help people understand hyperspectral image scenes via visual classification. However, in the context of supervised classification, the existence of the "Hughes phenomenon" [6] caused by an imbalance between the limited training samples and the extreme spectral dimension of hyperspectral images influences the classification performance [7].
To alleviate the "Hughes phenomenon", dimensionality reduction [8,9] and semi-supervised classification [10,11] have been extensively studied. The former reduces the dimensionality of hyperspectral images, and the latter increases the number of training samples. In general, dimensionality reduction methods can be divided into two categories: feature extraction and feature selection. We have found that, for HSI, reflectance extraction is necessary before applying the sparse representation method. In this paper, an effective IID method is applied to the HSI, which is beneficial for reducing the sparse representation error.
The experimental results illustrate that the proposed method for pseudo-labeled sample generation based on sparse representation is very effective in improving the classification results.
The remainder of this paper is organized as follows. Related work is described in Section 2. The proposed SRSPL method is introduced in Section 3. The experiments are presented in Section 4. Discussion and conclusions are provided in Sections 5 and 6, respectively.

Intrinsic Image Decomposition of Hyperspectral Images
Intrinsic image decomposition (IID) [37] is a challenging problem in computer vision that aims at modeling the perceiving function of human vision to distinguish the reflectance and shading of the objects in a single image [38]. Since the intrinsic components of an image reflect different physical characteristics of the scene, e.g., reflectance, illumination, and shading, many problems such as natural image segmentation [39] can benefit from IID. Given a single RGB image G, the IID algorithm decomposes G into two components, the spectral reflectance component R and the shading component S. Each pixel i ∈ G is modeled as

G_i = R_i S_i, (1)

where G_i = (G_ir, G_ig, G_ib) and R_i = (R_ir, R_ig, R_ib); the subscripts r, g, and b refer to the red, green, and blue channels of a color image, respectively. The optimization-based IID method [40] rests on an assumption about the local color characteristics of images: in a local window of an image, the changes in pixel values are usually caused by changes in the reflectance [41]. Under this assumption, the shading component can be separated from the input image, and the reflectance value of one pixel can be represented by the weighted summation of its neighborhood pixel values:

R_i = Σ_{j∈N(i)} w_ij R_j, (2)

where N(i) is the set of neighboring pixels around pixel i; w_ij measures the similarity of the intensity values and the spectral angle between pixel i and pixel j; Y denotes the intensity image, calculated by averaging all image bands; R_j = (R_jr, R_jg, R_jb); A(G_i, G_j) = arccos(G_ir G_jr + G_ig G_jg + G_ib G_jb) denotes the angle between the pixel vectors G_i and G_j; and σ_iY and σ_iA denote the variances of the intensities and of the angles in a local window around i, respectively.
Based on Equations (1) and (2), the shading component S and the reflectance component R can be obtained by minimizing an energy function that enforces both constraints; the complete description of the optimization process can be found in [40]. Kang et al. [37] first extended IID from three-band images to hyperspectral images (HSIs) with more than 100 bands as a feature extraction method, and achieved excellent results. For HSIs, the pixel values are determined by the spectral reflectance, which depends on the material of each object, and by the shading component, which consists of illumination- and shape-dependent properties. The shading component is not directly related to the material of the object. Therefore, IID is adopted to extract the spectral reflectance components of the HSI, which help distinguish more classes, and to remove the useless spatial information preserved in the shading component.
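The weighted-neighborhood reflectance model of Equation (2) can be sketched as follows. This is a minimal illustration, not the implementation of [40]: the Gaussian form of w_ij is an assumption (the text only states that w_ij combines intensity similarity and spectral-angle similarity), and `spectral_angle` and `neighbourhood_weights` are hypothetical helper names.

```python
import numpy as np

def spectral_angle(g_i, g_j):
    """Angle between two pixel vectors, as in A(G_i, G_j) (vectors normalised)."""
    gi = g_i / np.linalg.norm(g_i)
    gj = g_j / np.linalg.norm(g_j)
    return np.arccos(np.clip(np.dot(gi, gj), -1.0, 1.0))

def neighbourhood_weights(G, i, neighbours, sigma_Y, sigma_A):
    """w_ij combining intensity and spectral-angle similarity (assumed Gaussian form)."""
    Y = G.mean(axis=1)                      # intensity image: average over bands
    w = np.array([
        np.exp(-((Y[i] - Y[j]) ** 2) / (2 * sigma_Y ** 2)
               - spectral_angle(G[i], G[j]) ** 2 / (2 * sigma_A ** 2))
        for j in neighbours
    ])
    return w / w.sum()                      # normalise so the weights sum to 1

# Tiny 3-band example: pixel 0 with neighbours 1 and 2.
G = np.array([[0.80, 0.60, 0.40],
              [0.82, 0.61, 0.41],           # nearly identical -> large weight
              [0.10, 0.90, 0.20]])          # very different   -> small weight
w = neighbourhood_weights(G, 0, [1, 2], sigma_Y=0.1, sigma_A=0.2)
R0_estimate = w @ G[[1, 2]]                 # R_i ~ sum_j w_ij R_j, as in Equation (2)
```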

Sparse Representation Classification of Hyperspectral Images
The sparse representation classification (SRC) framework was first introduced for face recognition [42]. Chen et al. [43] extended SRC to pixelwise HSI classification, relying on the observation that spectral pixels of a particular class should lie in a low-dimensional subspace spanned by dictionary atoms (training pixels) from the same class; an unknown test pixel can then be represented as a linear combination of training pixels from all classes. For HSI classification, suppose there are C distinct classes and stack an overcomplete dictionary A = [A_1, A_2, ..., A_N] ∈ R^{B×N}, where B denotes the number of bands and N the number of training samples. Let L = {1, 2, ..., C} be the set of labels, with c ∈ L referring to the cth class. A test sample x ∈ R^{B×1} of the HSI can then be represented as

x = Aα, (4)

where α = [α_1, α_2, ..., α_N]^T ∈ R^N is the sparse vector. Intuitively, the sparsity of α can be measured by its l_0-norm (the number of nonzero entries in the vector). Since combinatorial l_0-norm minimization is an NP-hard problem, l_1-norm minimization, the closest convex relaxation of the l_0-norm, is widely employed in sparse coding; it has been shown that l_0-norm and l_1-norm minimizations are equivalent if the solution is sufficiently sparse [44].
As hyperspectral signals from the same class often span the same low-dimensional subspace constructed by the corresponding training samples, which correspond to the non-zero entries of the sparse vector, the class of a hyperspectral signal x can be determined directly from the recovered sparse vector α. Given the dictionary of training samples A, the sparse representation coefficient vector can be recovered by solving the following Lasso problem:

α̂ = argmin_α ||x − Aα||_2^2 + λ||α||_1, (5)

where ||·||_2 denotes the l_2-norm, ||·||_1 denotes the l_1-norm of a vector, and λ > 0 is a scalar regularization parameter. This optimization problem can be solved by proximal gradient descent (PGD) [45]. After the sparse representation vector α̂ is obtained, the label of the test sample x can be assigned by the minimal reconstruction residual:

class(x) = argmin_{c∈L} ||x − A_c α̂_c||_2, (6)

where A_c is the sub-dictionary of the cth class and α̂_c denotes the representation coefficients associated with the cth class.
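The Lasso recovery and residual-based labeling of Equations (5) and (6) can be sketched with a plain proximal-gradient (ISTA) solver, in line with the PGD approach mentioned above. This is a toy sketch on a synthetic dictionary, not the authors' implementation; `ista` and `src_label` are illustrative names.

```python
import numpy as np

def ista(A, x, lam, n_iter=500):
    """Solve min_a 0.5||x - A a||_2^2 + lam ||a||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def src_label(A, class_index, x, lam=1e-3):
    """Assign the class with the minimal reconstruction residual ||x - A_c a_c||_2."""
    a = ista(A, x, lam)
    residuals = {}
    for c in set(class_index):
        mask = np.array(class_index) == c
        residuals[c] = np.linalg.norm(x - A[:, mask] @ a[mask])
    return min(residuals, key=residuals.get), a

# Toy dictionary: two classes spanning different directions in R^4.
A0 = np.array([[1, 0.9], [0, 0.1], [0, 0], [0, 0]], float)   # class 0 atoms
A1 = np.array([[0, 0], [0, 0], [1, 0.9], [0, 0.1]], float)   # class 1 atoms
A = np.hstack([A0, A1])
A /= np.linalg.norm(A, axis=0)              # unit-norm atoms
labels = [0, 0, 1, 1]
x = A[:, 2] * 0.8 + A[:, 3] * 0.3           # mixture of class-1 atoms
pred, alpha = src_label(A, labels, x)
```

Because x lies in the span of the class-1 atoms, the recovered coefficients concentrate on that sub-dictionary and the class-1 residual is near zero.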

Proposed Method
A novel sparse representation-based sample pseudo-labeling (SRSPL) method is proposed in this paper. The SRSPL method consists of the following three steps. First, IID is used to reduce the dimension and noise of the original hyperspectral image. Second, sparse representation is applied to generate the sparse representation vectors of the hyperspectral pixels other than the training and test samples. The information entropies of all these hyperspectral pixels are calculated from the sparse representation vectors to discriminate their purity, and the samples with low representation entropies are selected as candidate samples. Third, these candidate samples are assigned pseudo-labels by the minimal reconstruction residual. The pseudo-labeled samples are then added to the original training set, and an extended random walker (ERW) classifier is trained to evaluate the sample quality. Figure 1 shows a graphical example illustrating the principle of the proposed SRSPL method.

Feature Extraction of Hyperspectral Images
(1) Spectral Dimensionality Reduction The initial hyperspectral image I = (I 1 , ..., I N ) ∈ R B×N is divided into M groups of equal band sizes [37], where N is the number of pixels and B is the dimension of the initial image. The number of bands in each group is denoted by B 1 , B 2 , ..., B M .
The averaging-based image fusion is applied to each group, and the resulting fused bands are used for further processing:

Ĩ_m = (1/B_m) Σ_{n=1}^{B_m} I_n^m, (7)

where m indexes the mth group, I_n^m is the nth band in the mth group of the original hyperspectral data, B_m is the number of bands in the mth group, and Ĩ_m is the mth fused band. The averaging-based method ensures that the pixels of the dimensionality-reduced hyperspectral image can still be interpreted directly in a physical sense; in other words, the pixels of the dimensionality-reduced image remain related to the reflectance of the scene. Moreover, this method can effectively remove image noise.
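The fusion of Equation (7) amounts to group-wise band averaging. A minimal sketch, with `average_fusion` and the toy sizes as illustrative choices:

```python
import numpy as np

def average_fusion(I, M):
    """Split the B bands of I (shape B x N) into M nearly equal groups and
    average each group, giving an M x N dimensionality-reduced image."""
    groups = np.array_split(np.arange(I.shape[0]), M)
    return np.stack([I[g].mean(axis=0) for g in groups])

B, N, M = 12, 5, 4                          # toy sizes; real HSIs have B ~ 200
I = np.arange(B * N, dtype=float).reshape(B, N)
I_fused = average_fusion(I, M)              # each fused band is a group mean
```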
(2) Band Grouping The dimensionality-reduced image Ĩ is partitioned into ⌈M/Z⌉ subgroups of adjacent bands, denoted Î_k, where Î_k refers to the kth subgroup; ⌊M/Z⌋ is the largest integer not greater than M/Z, ⌈M/Z⌉ is the smallest integer not less than M/Z, and Z is the number of bands in each subgroup.
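The grouping step can be sketched as slicing the M fused bands into ⌈M/Z⌉ subgroups of Z adjacent bands (the last subgroup may be smaller). A minimal illustration with a hypothetical `band_subgroups` helper:

```python
import math
import numpy as np

def band_subgroups(I_fused, Z):
    """Partition the M fused bands into ceil(M/Z) subgroups of Z adjacent
    bands; the final subgroup holds the remaining M mod Z bands, if any."""
    M = I_fused.shape[0]
    K = math.ceil(M / Z)
    return [I_fused[k * Z:(k + 1) * Z] for k in range(K)]

I_fused = np.zeros((10, 6))                 # M = 10 fused bands, 6 pixels
subgroups = band_subgroups(I_fused, Z=4)    # groups of 4, 4, and 2 bands
```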
(3) Reflectance Extraction with the Optimization-Based IID For each subgroup Î_k, the optimization-based IID method is used to obtain the reflectance and shading components:

Î_k = R_k ⊙ S_k, (9)

where R_k and S_k refer to the reflectance and shading components of the kth subgroup, respectively, and ⊙ denotes the pixel-wise product.
The reflectance components of the different subgroups are combined to obtain the resulting IID features, an M-dimensional feature matrix R̃ that is used for subsequent processing.

Candidate Sample Selection Based on Sparse Representation
To find the purest pixels and use them as candidate samples, we first compute the sparse representation of all hyperspectral samples other than the training and test samples (denoted as S_unlabeled = (I_i)_{i=1}^{u}). Then, we determine the purity of each sample by calculating the information entropy of its sparse representation: when a sample has a lower sparse representation information entropy, it is very likely to be a pure pixel. The process is as follows. (1) Calculating the Sparse Representation of a Hyperspectral Sample For a given sample I_i ∈ S_unlabeled, R̃_{I_i} ∈ R^M denotes the reflectance component of I_i. The goal of sparse representation is to represent R̃_{I_i} by a sparse linear combination of the training samples that make up the dictionary. Suppose L = {1, 2, ..., C} is the set of labels and c ∈ L refers to the cth class. The representation coefficient of R̃_{I_i} can be calculated from an overcomplete dictionary Ã according to Equation (5). R̃_{I_i} is therefore assumed to lie in the union of the C class subspaces, i.e., it can be written as a sparse linear combination of all the training samples:

R̃_{I_i} = Ã α̃, (11)

where Ã ∈ R^{M×N_m} is the structured overcomplete dictionary consisting of the class sub-dictionaries {Ã_c}_{c=1,...,C}, N_m is the number of training samples, and α̃ is an N_m-dimensional representation coefficient vector formed by concatenating the sparse vectors {α̃_c}_{c=1,...,C}. The representation coefficient α̂ of R̃_{I_i} can be recovered by solving the following optimization problem:

α̂ = argmin_α ||R̃_{I_i} − Ã α||_2^2 + λ||α||_1, (12)

where λ is a scalar regularization parameter whose optimal value is determined experimentally.
(2) Discriminating the Purity of Hyperspectral Samples Based on Information Entropy The information entropy of each R̃_{I_i} is calculated from its representation coefficients:

Ent(α̂) = −Σ_{n=1}^{N_m} p_n log p_n, p_n = |α̂_n| / Σ_j |α̂_j|, (13)

where Ent(·) denotes the entropy function and α̂ is the representation coefficient vector of R̃_{I_i}. The purity of R̃_{I_i} is determined according to the magnitude of this entropy.
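The entropy-based purity measure can be sketched as below. Note that normalizing |α̂| into a probability distribution is an assumption on our part; the text only states that an entropy function is applied to the representation coefficients.

```python
import numpy as np

def representation_entropy(alpha, eps=1e-12):
    """Shannon entropy of a sparse coefficient vector: coefficients spread
    over many atoms (mixed pixel) give high entropy, coefficients
    concentrated on one class (pure pixel) give low entropy."""
    p = np.abs(alpha) / (np.abs(alpha).sum() + eps)   # assumed normalisation
    p = p[p > 0]                                      # skip zero entries (0 log 0 = 0)
    return float(-(p * np.log(p)).sum())

pure = np.array([0.0, 0.95, 0.05, 0.0])    # energy on one class -> low entropy
mixed = np.array([0.3, 0.25, 0.25, 0.2])   # spread over classes -> high entropy
```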

(3) Finding the Candidate Samples
First, the information entropy of the reflectance component of each sample is obtained. Second, the samples corresponding to these reflectance components are sorted in ascending order according to the magnitude of their entropies. Third, the first T samples are selected as the candidate sample set S_candidate = (R̃_{I_i})_{i=1}^{T}, where T is an optimal value obtained experimentally.
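The three steps above reduce to sorting by entropy and keeping the first T indices; a minimal sketch with an illustrative `select_candidates` helper:

```python
import numpy as np

def select_candidates(entropies, T):
    """Return the indices of the T samples with the lowest representation
    entropy, i.e., the purest candidate samples."""
    order = np.argsort(entropies)           # ascending: purest samples first
    return order[:T]

entropies = np.array([1.2, 0.1, 0.8, 0.05, 2.0])
candidates = select_candidates(entropies, T=2)   # indices of the two purest
```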

Pseudo-label Assignment for Candidate Samples
For each candidate sample, a pseudo-label is determined by the minimal reconstruction residual:

ỹ_{I_i} = argmin_{c∈L} ||R̃_{I_i} − Ã_c α̂_c||_2, (14)

where Ã_c is the sub-dictionary of the cth class and α̂_c denotes the representation coefficients associated with the cth class. The resulting pseudo-labeled sample set is S_pseudo = (R̃_{I_i}, ỹ_{I_i})_{i=1}^{T}.
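The minimal-residual assignment of Equation (14) can be sketched as follows. This is a toy two-class example; `pseudo_label` is an illustrative name:

```python
import numpy as np

def pseudo_label(A, labels, alpha, x):
    """y~ = argmin_c ||x - A_c alpha_c||_2 over the class sub-dictionaries."""
    labels = np.asarray(labels)
    best_c, best_r = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        r = np.linalg.norm(x - A[:, mask] @ alpha[mask])
        if r < best_r:
            best_c, best_r = int(c), r
    return best_c

# Two one-atom classes along different axes of R^2.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
labels = [0, 1]
alpha = np.array([0.05, 0.9])               # coefficients concentrated on class 1
x = np.array([0.05, 0.9])                   # candidate sample to pseudo-label
```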

Classification Model Optimization Using Pseudo-Labeled Samples
First, the pseudo-labeled sample set S_pseudo and the initial labeled sample set S_initial are combined to form the new labeled sample set S_labeled. Then, the ERW [46] algorithm is used as the classification model to evaluate the quality of the newly generated pseudo-labeled samples: ERW calculates a set of optimized probabilities for each pixel, and the class of each pixel is determined by the maximum probability.

Pseudo-Code of the Proposed SRSPL Method
The pseudocode of the proposed SRSPL method is given in Algorithm 1.

Algorithm 1: Sparse Representation-Based Sample Pseudo-Labeling (SRSPL) for Hyperspectral Image Classification
Input: Hyperspectral image I = (I_1, ..., I_N); the initial labeled sample set S_initial = (I_i, y_i)_{i=1}^{l}; the hyperspectral sample set other than the training and test samples, S_unlabeled = (I_i)_{i=1}^{u}.
Output: Classification map C.
1: Reduce the dimension and noise of I based on averaging image fusion (Equation (7)) to obtain the dimensionality-reduced image Ĩ.
2: According to Equation (8), partition Ĩ into several subgroups of adjacent bands, denoted Î_k.
3: According to Equations (9) and (10), obtain the spectral reflectance components of each Î_k through the optimization-based IID method, and combine them to obtain the resulting IID features R̃.
4: For each I_i ∈ S_unlabeled:
(1) According to Equations (11) and (12), solve for the sparse representation coefficient α̂ of sample I_i based on the reflectance component R̃_{I_i} and the overcomplete dictionary Ã.
(2) According to Equation (13), calculate the information entropy of each R̃_{I_i} to discriminate the purity of sample I_i.
5: End For
6: Sort the samples corresponding to these reflectance components in ascending order according to their entropy magnitudes.
7: Select the first T samples as the candidate sample set S_candidate = (R̃_{I_i})_{i=1}^{T}.
8: According to Equation (14), assign a pseudo-label to each sample in S_candidate to obtain the pseudo-labeled sample set S_pseudo = (R̃_{I_i}, ỹ_{I_i})_{i=1}^{T}.
9: Combine the initial labeled sample set S_initial and the pseudo-labeled sample set S_pseudo into the new labeled sample set S_labeled.
10: Classify the spectral reflectance components R̃ with the extended random walker (ERW) classifier and the new labeled sample set S_labeled to obtain the final classification map C.
11: Return C

Experiment

Experimental Data Sets
(1) Indian Pines data set: The Indian Pines image was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over an agricultural site in northwestern Indiana. The image has 145 × 145 pixels and 220 spectral bands. Twenty water absorption bands (Nos. 104-108, 150-163, and 220) were removed before hyperspectral image classification. The spatial resolution of the Indian Pines image is 20 m per pixel, and the spectral coverage ranges from 0.4 to 2.5 µm. Figure 2 shows a color composite of the Indian Pines image and the corresponding ground-truth data.
(2) University of Pavia data set: The University of Pavia image, which captured the campus of the University of Pavia, Italy, was recorded by the Reflective Optics System Imaging Spectrometer (ROSIS). This image contains 115 bands of size 610 × 340 pixels, with a spatial resolution of 1.3 m per pixel and a spectral coverage ranging from 0.43 to 0.86 µm. Following standard preprocessing, 12 noisy channels were removed before classification. Nine classes of interest are considered for this image. Figure 3 shows a color composite of the University of Pavia image and the corresponding ground-truth data.
(3) Salinas data set: The Salinas image was captured by the AVIRIS sensor over Salinas Valley, California, at a spatial resolution of 3.7 m per pixel. The Salinas image contains 224 bands of size 512 × 217 pixels. Twenty water absorption bands (Nos. 108-112, 154-167, and 224) were discarded before classification. Figure 4 shows a color composite of the Salinas image and the corresponding ground-truth data.
(4) Kennedy Space Center data set: The Kennedy Space Center (KSC) image was captured by the National Aeronautics and Space Administration (NASA) Airborne Visible/Infrared Imaging Spectrometer instrument at a spatial resolution of 18 m per pixel. The KSC image contains 224 bands of size 512 × 614. The water absorption and low signal-to-noise ratio (SNR) bands were discarded before classification. Figure 5 shows the KSC image and the corresponding ground-truth data.

Parameter Analysis
(1) Analysis of the Influence of Parameter λ For sparse representation, the regularization parameter λ used in Equation (12) controls the relative importance of the sparsity level versus the reconstruction error, leading to different classification accuracies. Figure 6 shows the effect of varying λ on the classification accuracies for the Indian Pines, University of Pavia, Salinas, and Kennedy Space Center data sets. In the experiment, each of the four data sets is trained with 20 samples per class, and the remaining labeled samples are used for testing. The regularization parameter λ is varied between λ = 1 × 10^−8 and λ = 1. As Figure 6 shows, when λ < 1 × 10^−6, the overall accuracy (OA) shows an increasing trend on each data set; as λ continues to increase, the overall trend of the OA is decreasing. This is because a larger λ places more weight on the sparsity level and less on the approximation accuracy. Therefore, λ = 1 × 10^−6 is set as the optimal value in this paper.
(2) Analysis of the Influence of Parameter T The parameter T is selected using a five-fold cross-validation strategy with repeated fine-tuning. As described in Section 3.2, when the entropy of a sample's sparse representation vector is small, the probability that the sample is pure is large. As T increases, the samples added later have larger entropies, and these newly pseudo-labeled samples have a high probability of being mixed pixels. Thus, although the number of samples increases, the gain in classification accuracy is small; consequently, it is important to choose a suitable T. In the experiment, each of the four data sets is trained with 20 samples per class, and the remaining labeled samples are used for testing. Figure 7 shows the impact of different numbers of pseudo-labeled samples on the OA for the four data sets. The OA for each T value is the mean of 30 random replicate experiments. As Figure 7 shows, when T < 40, the OA tends to grow, although there is a local downward trend related to the quality of the generated samples and the selected sample points. When T > 40, the OA shows a significant downward trend, and the OAs are highest at T = 40. Therefore, we selected T = 40 as the default parameter value.
(3) Analysis of the Influence of Parameters M and Z The parameters M and Z from Equation (8) are hyperparameters that require tuning. We evaluated their influence through objective and visual analyses. In the experiment, the four data sets are trained with 5, 20, 3, and 15 samples per class, respectively, and the remaining labeled samples are used for testing. Figure 8 shows the OA of the proposed SRSPL method on the four data sets. As Figure 8 shows, the OA exhibits large fluctuations as M and Z change. When M is less than 10, the accuracy of the proposed method is relatively low, which indicates that too few features cause useful spectral information to be lost. Furthermore, the results show that the proposed method achieves stable and high accuracy when the subgroup size is less than 8, because the IID method works best on images with few channels. In this paper, the default settings M = 32 and Z = 4 were adopted for subsequent testing of the data sets.
(4) Sensitivity Analysis The parameters involved in the proposed SRSPL method are mainly the regularization parameter λ, the number of generated pseudo-labeled samples T, the feature number M, and the subgroup size Z. We conducted an extensive experimental analysis of each parameter. The one-parameter-at-a-time (OAT) [47] approach is used for model parameter sensitivity analysis; that is, the other parameters remain unchanged while one parameter is varied, and the effect of that parameter on the model is studied. Figure 9 shows the sensitivity of parameters λ, T, M, and Z for the four data sets. The sensitivity value of λ is the smallest on all four data sets, whereas T, M, and Z show higher sensitivity values. It follows that λ has only a slight effect on the SRSPL method, while T, M, and Z have a greater effect, are highly sensitive, and should be tuned.
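The OAT procedure can be sketched generically: vary one parameter over a grid while the others stay at their defaults, and record the spread of the score as that parameter's sensitivity. The toy model below is purely illustrative (it mimics λ being nearly flat and T mattering more) and is not the SRSPL classifier itself.

```python
import numpy as np

def oat_sensitivity(model, defaults, grids):
    """One-parameter-at-a-time sensitivity: vary each parameter over its grid
    with the others fixed at their defaults; report the score spread."""
    sens = {}
    for name, grid in grids.items():
        scores = []
        for v in grid:
            params = dict(defaults, **{name: v})   # override one parameter
            scores.append(model(**params))
        sens[name] = max(scores) - min(scores)
    return sens

def toy_model(lam, T):
    """Hypothetical 'accuracy' surface: weakly dependent on lam, strongly on T."""
    return 0.9 - 0.001 * abs(np.log10(lam) + 6) - 0.005 * abs(T - 40)

sens = oat_sensitivity(toy_model,
                       defaults={"lam": 1e-6, "T": 40},
                       grids={"lam": [1e-8, 1e-6, 1e-2], "T": [10, 40, 80]})
```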

Performance Evaluation
We applied the proposed SRSPL method and other methods to four hyperspectral images to evaluate the effectiveness of various hyperspectral image classification methods. We compared the proposed SRSPL method with several other classification methods, including the traditional SVM [48]; three semi-supervised methods: extended random walker (ERW) [46], spatial-spectral label propagation based on SVM (SSLP-SVM) [49], and the maximizer of the posterior marginal by loopy belief propagation (MPM-LBP) [50]; and two deep learning methods: the recurrent 2D convolutional neural network (R-2D-CNN) [51] and the cascaded recurrent neural network (CasRNN) [52]. IID [37] is used to extract useful features and reduce noise in the proposed SRSPL method. Therefore, to verify the validity of the IID, an experiment that does not use the IID method was added (recorded as Without-IID). In this section, the effect of different feature extraction methods on the proposed method is also analyzed, e.g., principal component analysis (PCA) [53] and image fusion and recursive filtering (IFRF) [54] (recorded as With-PCA and With-IFRF, respectively). The setting of the parameter T in the With-PCA and With-IFRF methods is the same as in the SRSPL method. The SVM classifier was implemented using the LIBSVM library with a Gaussian kernel. The tests were performed using five-fold cross-validation.
The parameters for the ERW, SSLP-SVM, MPM-LBP, R-2D-CNN, CasRNN, PCA, and IFRF methods were set to the default parameters reported in the corresponding papers. Three common metrics, namely the OA, the average accuracy (AA), and the Kappa coefficient, were used to evaluate classifier performance. The classification results for each method are given as the average of 30 experiments to reduce the influence of sample randomness. The values on the left side of Tables 1-3 represent the means, while those on the right represent the standard deviations.
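All three metrics can be computed from the confusion matrix; a minimal sketch (the `classification_metrics` name is illustrative):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average per-class accuracy (AA), and Cohen's
    kappa, computed from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                               # fraction correct
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))          # mean per-class recall
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2     # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
oa, aa, kappa = classification_metrics(y_true, y_pred, 3)
```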
The first experiment was performed on the Indian Pines data set. As shown in Table 1, the training samples were randomly selected and comprised 5, 10, 15, 20, and 25 samples per class of the reference data. Note that, as the number of training samples increases, the classification accuracy grows steadily.
The proposed SRSPL method consistently outperforms the other methods, such as ERW, SSLP-SVM, and MPM-LBP, in terms of overall classification accuracy, obtaining the highest accuracies in all cases (84.73%, 90.90%, 93.88%, 95.52%, and 97.09%, respectively). Compared with the two deep learning methods, R-2D-CNN and CasRNN, the SRSPL method achieves higher accuracy in the small-sample case. Compared with the Without-IID, With-PCA, and With-IFRF variants, Table 1 shows that the proposed SRSPL method obtains higher accuracies, which indicates that feature extraction using the IID method is effective. Figure 10 shows the classification maps obtained for the Indian Pines data set with the compared methods when using 25 training samples from each class. Table 2 shows the experimental results on the University of Pavia data set when the number of training samples per class is 20. As Table 2 shows, the nine compared methods achieve varying performance when only 20 training samples per class are provided. The proposed SRSPL method not only outperforms the R-2D-CNN and CasRNN methods, which are state-of-the-art deep learning methods for hyperspectral image classification, but also outperforms the other compared methods by 2-20%. Notably, on classes such as "Asphalt" and "Trees", the proposed SRSPL method performs much better than the ERW method, by 12.57% and 12.94%, respectively. In the feature extraction experiment, the OA of the proposed SRSPL method exceeds that of the Without-IID, With-PCA, and With-IFRF methods by 2.01%, 15.14%, and 1.87%, respectively. Figure 11 shows the classification maps obtained for the University of Pavia data set with the different methods when using 20 training samples from each class. As Figure 11 shows, the SRSPL method is visually superior to the other semi-supervised and deep learning methods, which demonstrates its effectiveness.
The third experiment was conducted on the Salinas data set. In Table 3, three training samples per class were randomly selected for training, and the rest were used for testing. Given the limited samples, this experiment is quite challenging. The corresponding quantitative classification results are tabulated in Table 3. As shown, although only limited training samples were provided, some of the methods achieved high OA and Kappa scores; in fact, some achieved 100% classification accuracy on several classes. This is because the Salinas image includes many large uniform regions that make classification simpler. Compared with the other methods, the proposed SRSPL method greatly improves the classification accuracy when training samples are extremely limited. Because the R-2D-CNN and CasRNN methods require a large number of training samples, they easily overfit when only a few samples are available, leading to poor classification results. As seen in Figure 12, the proposed SRSPL method classifies most of the land covers correctly, which reflects the effectiveness of the method.
The fourth experiment was performed on the Kennedy Space Center data set. In this experiment, the proposed SRSPL method was compared with the SVM method, three semi-supervised methods (ERW, SSLP-SVM, and MPM-LBP), and two deep learning methods (R-2D-CNN and CasRNN). Figure 13 shows the changes in the OA and Kappa coefficient as the number of training samples per class increases from 3 to 15. Compared with the deep-learning-based classification methods R-2D-CNN and CasRNN, the proposed SRSPL method shows great advantages. As shown in Figure 13, the SRSPL method achieves the highest accuracy among the tested methods.
Finally, we evaluated the computational times of the three semi-supervised methods (i.e., MPM-LBP, SSLP-SVM, and SRSPL) on the four data sets using 20 samples per class, using MATLAB on a computer with a 3.6 GHz CPU and 8 GB of memory. Because the semi-supervised method proposed in this paper does not involve a costly training process, its computational cost is low. Table 4 shows that the proposed SRSPL method requires less time to process the Indian Pines, Salinas, and Kennedy Space Center data sets than the other tested methods.

Discussion
In hyperspectral image classification, it is difficult and expensive to obtain enough labeled samples for model training. Considering the strong spectral correlation between labeled and unlabeled samples in the image, we proposed a novel sparse representation-based sample pseudo-labeling method (SRSPL). The pseudo-labeled samples generated by this method can be used to augment the training set, thereby solving the problem of poor classification performance under the condition of small samples.
Compared with other pseudo-labeled sample generation methods (such as SSLP-SVM), the proposed SRSPL method is more reliable in generating pseudo-labeled samples. The SSLP-SVM method only adds a small number of pseudo-labeled samples near the labeled samples, and the added samples may be mixed samples. The proposed SRSPL method considers the relationship between sample purity and information entropy: the purer the sample's spectrum, the lower the information entropy of the sparse representation coefficients, and vice versa. Specifically, the spectral characteristics of a pure sample can be linearly represented using a single class of atoms in an overcomplete dictionary, so its coefficient vector has smaller entropy. In our method, pure samples with smaller entropy are used to expand the initial training sample set, which can greatly improve classifier performance.
In the case of small samples, compared with other classifiers (such as SVM, ERW, and MPM-LBP), the SRSPL method produces a good classification map for each hyperspectral data set visually, as shown in Figures 9-11. This is because the hyperspectral image is first processed using the intrinsic image decomposition technology, which helps reduce errors in subsequent sparse representations, and the pseudo-label samples generated by the SRSPL method can help optimize the classification model. In addition, compared with deep-learning-based classifiers (such as R-2D-CNN and CasRNN), the proposed SRSPL method performs better for a limited number of samples because, in such cases, these two methods are more likely to overfit the training samples, resulting in poor classification results. Thus, the proposed SRSPL method shows better classification results than other comparative methods.

Conclusions
In this paper, we proposed a novel sample pseudo-labeling method based on sparse representation that addresses the problem of limited samples. The previously proposed semi-supervised methods for generating pseudo-labeled samples typically select some samples and assign them a pseudo-label based on spectral information correlations or local neighborhood information. However, due to the presence of mixed pixels, the selected samples are not necessarily representative. To find the purest samples, we designed a sparse representation-based pseudo-labeling method that utilizes the coefficient vector of sparse representation and draws on the definition and idea of entropy from information theory. Overall, the proposed SRSPL method provides a new option for semi-supervised learning, which is the first contribution of the paper. Moreover, the proposed SRSPL method also solves the problem of uneven sample distribution through sparse representation based on spectral features, which is beneficial for subsequent classification. In addition, by comparing the standard deviations of OA, AA, and Kappa of 30 random replicate experiments with other state-of-the-art classification methods, we found that the proposed SRSPL method had higher robustness and stability. This is the second contribution of the paper. The experimental results on four real-world hyperspectral images show that the proposed SRSPL method is superior to other state-of-the-art classification methods from the perspectives of both quantitative indicators and classification maps.