Classification of Hyperspectral Images with Robust Regularized Block Low-Rank Discriminant Analysis

Abstract: Classification of Hyperspectral Images (HSIs) has gained attention over the past few decades. In remote sensing image classification, labeled samples are insufficient or hard to obtain, whereas unlabeled ones are frequently plentiful. When there are not sufficient labeled samples, overfitting may occur. To resolve the overfitting issue, in the present work we propose a novel approach for HSI feature extraction, called robust regularized Block Low-Rank Discriminant Analysis (BLRDA), which is a robust and efficient feature extraction method that improves HSI classification accuracy with few labeled samples. To reduce the rapidly growing computational complexity of the low-rank method, we divide the entire image into blocks and implement the low-rank representation for each block respectively. Due to the symmetric-matrix requirement of the regularized graph in discriminant analysis, the k-nearest neighbor is applied to handle the whole low-rank graph integrally. The low-rank representation and the kNN maximally capture and preserve the global and local geometry of the data, respectively, and the performance of regularized discriminant analysis feature extraction is thus markedly improved. Extensive experiments on multi-class hyperspectral images show that the proposed BLRDA is a robust and efficient feature extraction method. Even with simple supervised and semi-supervised classifiers (nearest neighbor and SVM) and randomly given parameters, the method achieves significant results with few labeled samples, outperforming similar feature extraction methods.


Introduction
With the advancement of remotely-sensed hyperspectral imaging instruments, Hyperspectral Images (HSIs) have gained widespread attention throughout the globe. An HSI contains enriched information due to the presence of numerous bands (often hundreds) [1]. HSIs provide comprehensive spectral information on the physical properties of materials. This ubiquitous technique is applied in agricultural monitoring [2,3], forestry [4], ecosystem monitoring [5], mineral identification [6,7], environmental pollution monitoring [8] and urban growth analysis [9,10]. For HSI classification, good results usually entail many labeled samples. The image pixels can be deemed high-dimensional points that lie on or near low-dimensional manifolds. Recently, numerous dimensionality reduction algorithms have been proposed to preserve the local manifold structure of the data, such as Locally Linear Embedding (LLE) [11], Isomap [12] and Laplacian Eigenmap [13]. The main limitation is the lack of sufficient labeled training data, since identifying and labeling samples is a daunting task. The main drawback of the machine learning approach is overfitting [14,15] due to the limited number of training data when dealing with small-sample problems in high-dimensional and nonlinear cases, which is referred to as the "Hughes phenomenon" [16,17].
To resolve the issue above, semi-supervised learning was proposed to utilize both labeled data and the information conveyed by the marginal distribution of the unlabeled samples to boost algorithmic performance [18][19][20]. Graph embedding means that each node is mapped to a low-dimensional feature vector while trying to maintain the connections between vertices [20]. Cai et al. proposed Semi-supervised Discriminant Analysis (SDA), a novel method that takes advantage of labeled and unlabeled samples [21]. Nevertheless, the performance of semi-supervised learning algorithms relies profoundly on the graph construction process. The k-nearest neighbor [12] and Locally Linear Embedding (LLE) neighbors [13,22], as well as other traditional methods, depend mainly on pairwise Euclidean distances. However, these methods are not robust to noise as a result of using Euclidean distances to find the pairwise weights. Yan et al. proposed the l1-graph for SSL (Semi-Supervised Learning) [23]. Zhu et al. presented a series of novel semi-supervised learning approaches arising from graph representation [18,24,25]. Belkin and Niyogi [26] proposed a regression function that fits the labels of the labeled data while maintaining the smoothness of the data. Jebara et al. provided a b-matching graph for SSL, which ensures that each node has the same number of edges in the balanced graph [22]. Zhou et al. [18] conducted semi-supervised learning with local and global consistency.
Low-rank representation was proposed to construct an undirected graph (LR-graph) [27][28][29], which jointly obtains the graph of all the data according to the low-rank constraint [27][28][29]. The graph regards a finite group of samples, in which each sample is associated with a vertex and an edge weight represents the similarity of the two connected vertices [20,30,31]. In [32], Zhang et al. introduced the Low-Rank-based Matrix Recovery (LRMR) method for HSI restoration, which can remove various types of noise. Veganzones et al. revealed an approach that partitions the image into patches and resolves the fusion issue of each patch by low-rank representation individually [33]. Based on LRR (Low-Rank Representation) and a Learned Dictionary (LD), a hyperspectral anomaly detector has been put forward, which assumes that the hyperspectral image can be decomposed into a low-rank matrix and a sparse matrix, respectively representing the background and the anomalies [34]. To resolve the failure to preserve spatial information, a Tensor Sparse and Low-rank Graph for Discriminant Analysis (TSLGDA) was proposed [35]. In [36], He et al. suggested a spatial-spectral mixed-noise removal method for HSIs. The Sparse and Low-rank Graph-based Discriminant Analysis (SLGDA) incorporates sparsity and low rank to simultaneously preserve both local and global information [30]. Qi et al. presented a multi-task joint sparse and low-rank representation model for high-resolution satellite image interpretation [37].
The calculation of the LRR graph is a major concern: because the data points are the pixels of the HSI, the computational complexity of LRR grows rapidly as the number of pixels increases. To resolve this issue, we explore the low-rank structure via block theory. Figure 1 shows the formulation of the proposed regularized block low-rank discriminant analysis feature extraction for HSI classification. As shown in Figure 1, we first preprocess the hyperspectral image with the Image Fusion and Recursive Filtering (IFRF) feature, which removes redundant information [17]. The IFRF method selects better bands, eradicating noise and redundant information concurrently. Our goal, however, is to improve the graph adjacency between pixels for the regularization term of regularized discriminant analysis. Consequently, inspired by the LRR algorithm, we aim to exploit the low-rank structure. LRR uses all samples as the dictionary, where each sample is represented as a linear combination of dictionary atoms. The low-rank hypothesis is a global constraint, which encourages data points in the same subspace to be clustered in the same class [29]. When all samples are distributed across independent subspaces, the coefficients of the low-rank representation reveal the membership of these samples: the within-cluster affinities are dense, and the between-cluster affinities are all zeros. Afterward, we divide the processed image into blocks (subsets) of pixels and implement the low-rank representation on each block image. Eventually, we combine the subsets' feature representations into a complete feature graph. Further, the k-nearest neighbor is applied to handle the integral low-rank graph so as to satisfy the symmetric-matrix requirement of the regularized graph in discriminant analysis; at the same time, the k-nearest neighbor preserves the local information of the image [38]. Furthermore, we apply semi-supervised discriminant analysis for feature extraction, which takes advantage of the labeled samples and the distribution of all samples. Finally, we apply the supervised and semi-supervised classifiers. We perform comprehensive experiments on several real multi-class HSIs. The main contributions of the paper are summarized as follows.

• From the inspiration of LRR and the semi-supervised discriminant analysis algorithm, we propose a robust feature extraction method, regularized block low-rank discriminant analysis. The block LRR solves the growing computational complexity problem of LRR while capturing the global structure of the data.

• After image fusion and recursive filtering, we implement our proposed regularized block low-rank discriminant analysis feature extraction method. The kNN approach simultaneously addresses two issues: SDA's symmetry requirement and capturing the local information of the HSIs. Consequently, the BLRDA method maximally preserves the local geometry of the data.

• Extensive experiments on several multi-class HSIs demonstrate that our proposed BLRDA method is a novel feature extraction method not yet evident in the literature. It can significantly enhance the performance of hyperspectral image classification. Given simple supervised and semi-supervised classifiers, the extracted features achieve significant performance in HSI classification with few labeled samples. Moreover, the BLRDA feature extraction method is much more robust than similar methods.
We organize the remainder of our paper as follows. We start with a preliminary introduction in Section 2. Section 3 describes HSI classification based on regularized block low-rank discriminant analysis. To check the reproducibility of our proposed method, several experiments on real-world hyperspectral images are carried out in Section 4. In Section 5, we provide the discussion. Conclusions are presented in Section 6.

Regularization
To avoid the overfitting issue, regularization is implemented. Regularization means adding rules (restrictions) to the objective function being trained; this amends the solution space and reduces the possibility of finding a wrong solution. The constraint can be interpreted as prior knowledge (adding a regularization term is equivalent to introducing a prior distribution on the parameters).
The regularization constraint has a guiding role. When optimizing the error function, it steers the direction of gradient descent so that the final solution conforms to the prior knowledge (for example, an l1-norm prior indicates that the underlying problem is likely to be relatively simple and tends to produce sparse parameters).
Regularization serves roughly two functions. From the model-modification point of view, it balances two competing terms in the learning process, such as bias versus variance, fitting ability versus generalization ability, loss function versus generalization ability, and empirical risk versus structural risk. From the model-solution point of view, regularization provides the possibility of a unique solution: least-squares fitting may admit infinitely many solutions, but adding an l1 or l2 regularization term can lead to a unique one.
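As a minimal sketch of this uniqueness argument (not from the paper, using toy data), the following compares an underdetermined least-squares problem with its l2-regularized (ridge) counterpart, whose system matrix is always invertible:

```python
import numpy as np

# With fewer samples than features, ordinary least squares is
# underdetermined: any null-space vector of X can be added to a solution
# without changing the fit. Adding an l2 penalty makes the solution unique:
#   a* = (X^T X + alpha * I)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))   # 5 samples, 20 features
y = rng.standard_normal(5)

alpha = 0.1
a_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(20), X.T @ y)

assert np.linalg.matrix_rank(X) < 20                            # OLS non-unique
assert np.all(np.linalg.eigvalsh(X.T @ X + alpha * np.eye(20)) > 0)  # ridge system invertible
```

The same mechanism underlies the regularized discriminant analysis below: the penalty term keeps the solution well posed when labeled samples are scarce.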
In practice, overfitting [14,15] may happen due to limited training samples [16]. To address the overfitting issue, semi-supervised learning was proposed to utilize both labeled and unlabeled samples to enhance algorithmic performance. The prior assumption of consistency is the key to SSL; it indicates that nearby points are likely to have the same label or similar embeddings [18]. Cai et al. proposed Semi-supervised Discriminant Analysis (SDA), which takes advantage of both labeled and unlabeled samples [21].
The performance of semi-supervised learning algorithms relies profoundly on the graph construction process. Hence, inspired by low-rank representation, we aim to improve the semi-supervised discriminant analysis feature extraction method via a novel regularized low-rank-based graph term. Below, we give a brief outline of two important techniques, semi-supervised discriminant analysis and the low-rank representation algorithm.

Overview of Semi-supervised Discriminant Analysis
Semi-supervised Discriminant Analysis (SDA) is derived from Linear Discriminant Analysis (LDA). LDA is a supervised method that minimizes the within-class covariance while maximizing the between-class covariance. The objective function of LDA is as follows:

a* = arg max_a (a^T S_b a) / (a^T S_w a),

where S_b is defined as the between-class scatter matrix and S_w is the within-class scatter matrix. Overfitting may occur when there are not enough training samples. A possible solution to deal with overfitting is to learn from both labeled and unlabeled data by imposing a regularizer [21]. Applying the notion of the regularized graph, a smoothness penalty is incorporated into the objective function of linear discriminant analysis, and the overfitting problem is alleviated. The labeled samples in the SDA algorithm serve to maximize the separability of the different classes. The graph adjacency of all samples is used to estimate the fundamental geometric information, which is called the regularized graph.
Given a set of samples {x_1, ..., x_N}, where N = m + l, the first m samples are labeled as [y_1, ..., y_m] and the remaining l samples are unlabeled. There are c classes. A projective vector a is obtained by the SDA method [21], which encodes the prior assumption of consistency through a regularization term.
Here, S_t is defined as the total scatter matrix. α is the parameter that balances the complexity and empirical loss of the model. The regularization term J(a) controls the learning complexity of the hypothesis family.
Given a set of examples, S_ij is the graph adjacency that models the relationships of nearby data points. Typically, the k-nearest neighbor graph is used to calculate these relationships, where an edge is placed between nearest neighbors and assigned a weight.
The model can easily incorporate the prior knowledge in the regularization term J(a), as follows:

J(a) = a^T X L X^T a,

where D is the diagonal matrix whose entries are the column (or row, since S is symmetric) sums of S, i.e., D_ii = Σ_j S_ij, and L = D − S is the Laplacian matrix [23]. Then, the SDA objective function with the regularization term J(a) is:

a* = arg max_a (a^T S_b a) / (a^T S_t a + α a^T X L X^T a).

We obtain the projective vectors a by solving the generalized eigenvalue problem

S_b a = λ (S_t + α X L X^T) a,

retaining the top d eigenvectors, where d is the rank of the weight matrix of the labeled graph.
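The ingredients of the regularization term can be sketched in a few lines; the adjacency matrix below is a toy example, not data from the paper:

```python
import numpy as np

# Given a symmetric adjacency matrix S, form the degree matrix D
# (D_ii = sum_j S_ij) and the graph Laplacian L = D - S used in the
# smoothness penalty J(a) = a^T X L X^T a.
S = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
D = np.diag(S.sum(axis=1))
L = D - S

assert np.allclose(L, L.T)                      # Laplacian is symmetric
assert np.allclose(L.sum(axis=1), 0.0)          # every row sums to zero
assert np.all(np.linalg.eigvalsh(L) >= -1e-12)  # positive semi-definite
```

Positive semi-definiteness of L is what makes J(a) a valid (non-negative) smoothness penalty.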

Low-Rank Representation
The low-rank representation was first proposed to construct a graph (LR-graph) [23], which jointly obtains the graph according to the low-rank constraint. It can efficiently capture the global structure of the data [39].
Suppose X = [x_1, x_2, ..., x_n] ∈ R^{m×n} is the sample set, where each column is a sample. We represent the feature matrix X on a given dictionary A [29] as:

X = AZ,

where Z = [z_1, z_2, ..., z_n] is the low-rank coefficient matrix and z_i is the representation coefficient of x_i, i.e., the coefficients of a linear combination of dictionary atoms. The so-called low-rank representation is the matrix of these coefficients. For example, in HSIs, samples of the same category exhibit similar spectra, while samples from different classes do not. Due to this apparent intra-class similarity and inter-class dissimilarity, each sample can be represented well by others in the same class, but not by samples from other classes. Therefore, when represented on the dictionary A, samples from the i-th class x_i will produce significant coefficients on the A_i component and small coefficients on the other parts A_j (j ≠ i). Under appropriate permutations of the columns of X, Z will exhibit an apparent block-diagonal structure, as shown in Figure 2.
The representation with a block-diagonal structure.
However, the actual labels of the samples in X are unknown, so it is intractable to directly reveal the block-diagonal structure in Z. Nevertheless, the underlying block-diagonal structure enables Z to be low-rank [40]. Therefore, we exploit the low-rank property of Z instead of the block-diagonal structure implicitly. Furthermore, each sample can be well represented due to the intra-class similarity. The following low-rank framework [29] searches for the lowest-rank solution:

min_Z rank(Z), s.t. X = AZ.

Due to the discrete nature of the rank function, the above optimization problem is hard to solve. It is evident from the literature on matrix completion (e.g., [41]) that the optimization problem can be relaxed to a convex problem:

min_Z ||Z||_*, s.t. X = AZ,

where ||·||_* is the nuclear norm (i.e., trace norm) [42], the sum of the matrix's singular values. A more reasonable formulation takes into consideration the noise or corruption in real-world situations, where the residual matrix is often sparse:

min_{Z,E} ||Z||_* + λ||E||_l, s.t. X = AZ + E,

where ||·||_l can be the l2,1-norm or the l1-norm. Here, the l2,1-norm is chosen as the error term, defined as ||E||_{2,1} = Σ_j (Σ_i E_ij^2)^{1/2}. λ balances the low-rank term and the error term. The inexact Augmented Lagrange Multiplier (ALM) method [43,44] is applied to obtain the optimal solution Z*.
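The nuclear-norm subproblem inside ALM-type solvers is typically handled by singular value thresholding. The following is a small illustrative sketch, not the authors' code; the matrices and the threshold are arbitrary choices:

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: the proximal operator of the nuclear
    # norm. Shrinks every singular value by tau and discards the rest.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# A rank-1 matrix plus small noise: thresholding suppresses the tiny
# singular values contributed by the noise, returning a low-rank matrix.
u = np.array([1.0, 2.0, 0.0, 1.0, 0.0, 2.0])
v = np.array([2.0, 0.0, 1.0, 2.0])
rng = np.random.default_rng(1)
M_noisy = np.outer(u, v) + 0.01 * rng.standard_normal((6, 4))
Z = svt(M_noisy, tau=0.5)
assert np.linalg.matrix_rank(Z) == 1   # only the dominant direction survives
```

This is why the nuclear norm acts as a convex surrogate for rank: shrinking singular values drives the small ones exactly to zero.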

Image Fusion and Recursive Filtering Feature
Initially, we extract the spatial information of the HSI by using Image Fusion and Recursive Filtering (IFRF), which is one of the easiest approaches to image fusion [17]. The IFRF feature of an HSI contains the main information of the HSI and eradicates noise and redundant information simultaneously. It plays a vital role in identifying objects and can be used to discriminate between different classes in the classification problem [17]. Hence, we preprocess the HSI first by IFRF to eliminate noise and redundant information.
Let us suppose R = (r_1, r_2, ..., r_D) ∈ R^{M×D} represents the original hyperspectral image, which has D spectral bands and M pixels, with the whole i-th band represented by r_i. We spectrally partition the whole hyperspectral image (the hyperspectral bands) into multiple subsets, each composed of K contiguous bands. We define N as the number of subsets, so that N = ⌊D/K⌋, where ⌊D/K⌋ indicates the floor operation that computes the largest integer no larger than D/K. The i-th subset is:

P^i = (r_{(i−1)K+1}, ..., r_{iK}).

Afterward, we fuse the adjacent bands of each subset by an image fusion method (averaging). For example, the i-th image fusion feature F_i, i.e., the i-th fused band, is computed as:

F_i = (1/N_i) Σ_{n=1}^{N_i} P_n^i,

where P_n^i refers to the n-th band in the i-th subset of the hyperspectral image and N_i is the number of bands in the i-th subset. After image fusion, we remove the noise pixels and redundant information from each subset. Then, we obtain the i-th feature Q_i by recursive filtering on the fused band:

Q_i = RF(F_i, δ_s, δ_r).
Here, RF indicates the recursive filtering transform. δ_s is the filter's spatial standard deviation, and δ_r is the range standard deviation [45]. After image fusion and recursive filtering, we obtain the feature image X = [x_1, x_2, ..., x_M], the preprocessed feature vectors, where x_i represents a pixel with N bands (dimensions) in the hyperspectral image.
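The fusion step above can be sketched as follows; this toy example implements only the band partition and averaging, omitting the recursive filtering stage, and `fuse_bands` is a hypothetical helper name:

```python
import numpy as np

def fuse_bands(R, K):
    # R: (M pixels, D bands). Split the D bands into N = floor(D/K)
    # subsets of K contiguous bands and average within each subset.
    # Leftover bands beyond N*K are ignored in this simplified sketch.
    M, D = R.shape
    N = D // K
    return np.stack([R[:, i * K:(i + 1) * K].mean(axis=1) for i in range(N)],
                    axis=1)

R = np.arange(12.0).reshape(2, 6)   # toy "image": 2 pixels, 6 bands
F = fuse_bands(R, K=3)
assert F.shape == (2, 2)
assert np.allclose(F[0], [1.0, 4.0])   # means of bands [0,1,2] and [3,4,5]
```

Averaging contiguous bands exploits the strong correlation between neighboring spectral bands, which is what makes this simple fusion effective for denoising.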

Methodology
In the above section, we reviewed classic regularized discriminant analysis, semi-supervised discriminant analysis and LRR, which can efficiently capture the global structure of the data. Furthermore, the labeled samples as well as the unlabeled samples can be exploited simultaneously by depicting the underlying block-diagonal structure of Z. Considering this, in the present work we develop the robust regularized block low-rank discriminant analysis feature extraction method. We also provide insight into the optimization method based on the inexact ALM algorithm. The framework of HSI classification applying the proposed graph construction is found in the remainder of this section.

Regularized Block Low-Rank Discriminant Analysis
The HSI feature vectors form X ∈ R^{N×M}, where N is the channel number (data dimension) and M is the number of pixels (samples or objects). Determining an appropriate subspace for classification is an important task. Computing the low-rank representation feature graph is an onerous task: the running time increases rapidly with a growing number of samples. Hence, we explore the low-rank structure via block theory, which divides the whole image into blocks for low-rank representation.
We choose the block size as S pixels per block. Let {g_1, g_2, ..., g_m} denote the index sets of the m blocks of the image, where each g_i contains S pixels. To formulate the matrix X according to {g_1, g_2, ..., g_m}, we put the vectors belonging to the same block together, X = [X_{g_1}, X_{g_2}, ..., X_{g_m}]. The LRR optimization problem for each block is then converted into the following form:

min_{Z_{g_i}, E_{g_i}} ||Z_{g_i}||_* + λ||E_{g_i}||_{2,1}, s.t. X_{g_i} = X_{g_i} Z_{g_i} + E_{g_i},

where we choose the l2,1-norm as the error term ||·||_l, i.e., ||E||_{2,1} = Σ_j (Σ_i E_ij^2)^{1/2}. Here, we use the inexact ALM method [43,44] to obtain the optimal solution Z_{g_i}*.
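The block partition can be sketched as below; `partition_blocks` is a hypothetical helper and the toy matrix is not real HSI data. When M is not a multiple of S, this sketch simply leaves a smaller final block:

```python
import numpy as np

def partition_blocks(X, S):
    # X: (bands, M pixels). Split the pixel columns into consecutive
    # blocks of S pixels; LRR is then solved on each block independently,
    # shrinking every subproblem from M columns down to S.
    M = X.shape[1]
    return [X[:, i:min(i + S, M)] for i in range(0, M, S)]

X = np.arange(20.0).reshape(2, 10)   # toy: 2 bands, 10 pixels
blocks = partition_blocks(X, S=4)
assert len(blocks) == 3
assert blocks[0].shape == (2, 4) and blocks[-1].shape == (2, 2)
```

Since the cost of LRR grows superlinearly in the number of columns, solving m small S-column problems is far cheaper than one M-column problem.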
To optimize problem (13), we introduce an auxiliary variable J_i for Z_{g_i} and convert it to the following equivalent problem [29]:

min_{Z_{g_i}, E_{g_i}, J_i} ||J_i||_* + λ||E_{g_i}||_{2,1}, s.t. X_{g_i} = X_{g_i} Z_{g_i} + E_{g_i}, Z_{g_i} = J_i.

With the intermediate variable J_i, we minimize the following augmented Lagrange function via the inexact ALM method [43]:

L = ||J_i||_* + λ||E_{g_i}||_{2,1} + ⟨Y_1, X_{g_i} − X_{g_i} Z_{g_i} − E_{g_i}⟩ + ⟨Y_2, Z_{g_i} − J_i⟩ + (µ/2)(||X_{g_i} − X_{g_i} Z_{g_i} − E_{g_i}||_F^2 + ||Z_{g_i} − J_i||_F^2),

where Y_1, Y_2 and µ > 0 are the Lagrange multipliers and the penalty parameter, respectively.
To minimize problem (15), we update the variables Z_{g_i}, J_i, E_{g_i}, Y_1 and Y_2 alternately, with the others fixed. Since each subproblem is convex in the variable being updated, it admits a unique solution. The inexact ALM is applied to update the variables J_i, Z_{g_i} and E_{g_i} iteratively.
The optimization process outline is given in the following Algorithm 1.
After the block low-rank representation, we obtain Z_{g_i} ∈ R^{S×S}. Then, we combine these feature representation subsets into a whole feature graph Z. The combined low-rank graph, composed of the multiple block graphs, is asymmetric. Traditionally, a symmetrization step takes the mean of the matrix and its transpose to satisfy the symmetry requirement of SDA. However, the graph adjacency matrix Z here is not a square matrix. Our previous work shows that the k-nearest neighbor is an effective symmetrization process that can significantly improve the performance of SDA [38]. Hence, the kNN algorithm handles the combined low-rank graph integrally, which addresses two issues: performing the symmetrization and further maximally preserving the local information of the image through kNN's properties.
The samples z_i and z_j are considered neighbors if z_i is among the k nearest neighbors of z_j or z_j is among the k nearest neighbors of z_i. Here, we use the heat kernel weighting [13] method to assign the weights of S as follows:

S_ij = exp(−||z_i − z_j||^2 / (2σ^2)) if z_i ∈ N_k(z_j) or z_j ∈ N_k(z_i), and S_ij = 0 otherwise,

where N_k(z_i) denotes the k neighbors of z_i in (16). Therefore, the BLRDA method can achieve high-quality feature representations for classification.
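A minimal sketch of the kNN symmetrization with heat-kernel weights follows; `knn_heat_graph` is a hypothetical helper using brute-force distances on toy data, and taking the element-wise maximum is one simple way to realize the "either direction" rule:

```python
import numpy as np

def knn_heat_graph(Z, k, sigma):
    # Z: (n samples, d features). Returns a symmetric (n, n) weight matrix:
    # an edge exists if either sample is among the other's k nearest
    # neighbors, weighted by exp(-||z_i - z_j||^2 / (2 sigma^2)).
    n = Z.shape[0]
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]    # skip self (distance 0)
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(S, S.T)                # symmetrize

Z = np.array([[0.0], [1.0], [10.0]])
S = knn_heat_graph(Z, k=1, sigma=1.0)
assert np.allclose(S, S.T)                   # symmetric, as SDA requires
assert S[0, 1] > 0 and S[0, 2] == 0          # only near samples are linked
```

The resulting S is symmetric by construction, which is exactly the property SDA's regularized graph needs.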
Algorithm 1: Solving the block low-rank representation problem by inexact ALM.
Input: Mapped data graph X_{g_i}, regularization parameter α for local affinity.
while not converged do
1: Fix the other variables and update J_i;
2: Fix the other variables and update Z_{g_i};
3: Fix the other variables and update E_{g_i};
4: Update the multipliers: Y_1 = Y_1 + µ(X_{g_i} − X_{g_i} Z_{g_i} − E_{g_i}), Y_2 = Y_2 + µ(Z_{g_i} − J_i);
5: Update the parameter µ by µ = min(ρµ, µ_max);
6: Examine the convergence conditions;
end while
Output: The Laplacian regularized low-rank representation graph Z_{g_i}.
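The alternating updates above follow the standard inexact ALM loop for LRR; the code below is an illustrative implementation on random data with arbitrary parameter choices, not the authors' code:

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(M, tau):
    # Column-wise shrinkage: proximal operator of the l2,1-norm.
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.where(norms > 0, norms, 1.0)
    return M * scale

def lrr_inexact_alm(X, lam=0.1, rho=1.1, mu=1e-2, mu_max=1e6, iters=300):
    # min ||J||_* + lam*||E||_{2,1}  s.t.  X = X Z + E,  Z = J.
    n = X.shape[1]
    Z = np.zeros((n, n)); E = np.zeros_like(X)
    Y1 = np.zeros_like(X); Y2 = np.zeros((n, n))
    XtX = X.T @ X
    for _ in range(iters):
        J = svt(Z + Y2 / mu, 1.0 / mu)                      # update J
        Z = np.linalg.solve(np.eye(n) + XtX,                # update Z
                            XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        E = l21_shrink(X - X @ Z + Y1 / mu, lam / mu)       # update E
        Y1 = Y1 + mu * (X - X @ Z - E)                      # multipliers
        Y2 = Y2 + mu * (Z - J)
        mu = min(rho * mu, mu_max)                          # grow penalty
    return Z, E

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 8))
Z, E = lrr_inexact_alm(X)
assert np.linalg.norm(X - X @ Z - E) < 1e-3   # constraint (nearly) satisfied
```

Growing µ geometrically is what makes the "inexact" scheme practical: early iterations are cheap and loose, later ones enforce the constraints tightly.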

BLRDA Feature Extraction for HSI Classification
The semi-supervised discriminant analysis method can successfully alleviate the overfitting problem with few labeled samples. Given a set of labeled samples {(x_i, y_i)}_{i=1}^l with c classes and unlabeled samples {x_i}_{i=l+1}^m, let l_k denote the number of labeled samples in the k-th class. The algorithmic procedure of HSI classification applying the regularized block low-rank discriminant analysis feature extraction method, for both supervised and semi-supervised classification, is stated below:
Step 1 Construct the adjacency graph: construct the block low-rank and kNN graph S of Formula (16) for the regularization term. In addition, calculate the graph Laplacian L = D − S.
Step 2 Construct the labeled graph: for the labeled graph, construct the weight matrix W with entries W_ij = 1/l_k if x_i and x_j both belong to the k-th class, and W_ij = 0 otherwise. Define Ĩ = [I 0; 0 0], where I ∈ R^{l×l} is an identity matrix.
Step 3 Eigen-problem: calculate the eigenvectors of the generalized eigenvector problem:

X W̃ X^T a = λ X (Ĩ + αL) X^T a,

where W̃ is W padded with zeros for the unlabeled samples. W is of rank d, and we will have d eigenvectors, denoted as {a_1, a_2, ..., a_d}.
Step 4 Regularized discriminant analysis embedding: let the transformation matrix be A = [a_1, a_2, ..., a_d]; then we can obtain the between-class scatter S_b and the total scatter matrix S_t.
Therefore, the Eigen-problem in Formula (18) is the same as the Eigen-problem in Formula (5), and we can obtain the projective matrix A = [a_1, a_2, ..., a_d], where S is the graph after kNN, i.e., S_ij encodes the relationships of nearby data points, and L = D − S is the Laplacian matrix [23].
Then, graph S can be embedded into the d-dimensional subspace as the SDA embedding matrix Φ.
Step 5 HSI classification: finally, perform classification with the supervised and semi-supervised classifiers, SVM and nearest neighbor. Both are simple and ubiquitously used classifiers.
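The generalized eigenproblem of Step 3 can be sketched with toy stand-in matrices (these are not real scatter matrices; reducing to a standard eigenproblem via B^{-1} assumes B is invertible):

```python
import numpy as np

# Solve  S_b a = lambda * B a  (with B playing the role of
# S_t + alpha * X L X^T) by reducing it to a standard eigenproblem.
S_b = np.array([[2.0, 0.0], [0.0, 1.0]])
B   = np.array([[1.0, 0.0], [0.0, 2.0]])   # assumed symmetric positive definite

eigvals, eigvecs = np.linalg.eig(np.linalg.solve(B, S_b))
order = np.argsort(eigvals)[::-1]          # keep the d largest eigenvalues
A_proj = eigvecs[:, order]                 # columns = projective vectors a

# Each column a indeed satisfies S_b a = lambda * B a:
for lam, a in zip(eigvals[order], A_proj.T):
    assert np.allclose(S_b @ a, lam * (B @ a))
```

In practice a symmetric solver (e.g., a generalized eigensolver for symmetric definite pencils) would be preferred for numerical stability; the reduction above is only the simplest illustration.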

Experiments and Analysis
To investigate the performance of the BLRDA feature extraction method, we have conducted extensive experiments on several real multi-class hyperspectral images. In this section, we describe the hyperspectral images used in the present study and illustrate the performance of our proposed methods. The experimental analysis was performed on a machine with an Intel Core CPU at 2.60 GHz and 8 GB RAM.

Datasets
We evaluate the proposed method on three hyperspectral images, namely the Indian Pines image, the Pavia University scene and the Salinas image (http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes).

• The Indian Pines image covers the agricultural Indian Pines test site in Northwestern Indiana and was acquired by the AVIRIS sensor. There are 145 × 145 pixels and 224 spectral bands in the wavelength range from 400 nm to 2500 nm. The image scene comprises two-thirds agriculture and one-third forest or other natural perennial vegetation. It includes two major dual-lane highways, a rail line, as well as some low-density housing, other built structures and smaller roads. There is minuscule coverage, approximately less than 5%, because of some of

Evaluation Criteria
To evaluate the proposed method on HSIs, we use the following evaluation criteria.
Classification Accuracy (CA) refers to the per-class accuracy of the pixels in the image classification. In the field of remote sensing classification, the confusion matrix [46] is frequently used, defined in the form M = [m_ij]_{n×n}, where m_ij denotes the number of pixels labeled as class j that actually belong to class i, and n is the number of classes. The reliability of the classification depends on the diagonal values of the confusion matrix; higher diagonal values indicate favorable results.
The three primary indicators used are the Overall Accuracy (OA), the Average Accuracy (AA) and the kappa coefficient [47]. OA refers to the percentage of pixels correctly classified. AA measures the average percentage of correctly classified pixels per class. To make the measurement more objective, the kappa coefficient is also used. Whereas OA and AA measure how many pixels are classified correctly under the assumption that the reference classification (ground truth) is true, the kappa coefficient assumes that both the classification and the reference classification are independent class assignments of equal reliability and measures how well they agree. The big advantage of the kappa coefficient over overall accuracy is that it considers chance agreement and corrects for it. Chance agreement here means the probability that the classification and the reference classification agree by mere chance; assuming statistical independence, this probability can be estimated from the confusion matrix [48,49].
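The three metrics can be computed directly from the confusion matrix; the following sketch uses a toy 2 × 2 matrix, and `oa_aa_kappa` is a hypothetical helper:

```python
import numpy as np

def oa_aa_kappa(M):
    # M: confusion matrix, rows = true class, columns = predicted class.
    M = np.asarray(M, dtype=float)
    total = M.sum()
    oa = np.trace(M) / total                          # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))          # mean per-class accuracy
    pe = (M.sum(axis=0) @ M.sum(axis=1)) / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                      # chance-corrected
    return oa, aa, kappa

M = np.array([[45, 5],
              [10, 40]])
oa, aa, kappa = oa_aa_kappa(M)
assert abs(oa - 0.85) < 1e-9
assert abs(aa - 0.85) < 1e-9
assert abs(kappa - 0.7) < 1e-9
```

Here 85% of pixels agree overall, but half of that agreement could occur by chance (pe = 0.5), so kappa reports the more conservative 0.7.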

Classifier Settings
We conduct a series of experiments to test and verify the proposed BLRDA feature extraction method under the supervised and semi-supervised classifiers.

• The Support Vector Machine (SVM) classifier is a commonly used supervised learning model for classification and regression analysis.

• Nearest Neighbor (NN) is used as the semi-supervised classifier; it stores all available samples and classifies new samples based on similarity measures (e.g., distance functions).

Comparative Algorithms
To demonstrate the significant improvement achieved by using the regularized block low-rank discriminant analysis feature extraction method in hyperspectral image classification, several comparative methods are included in the paper. For fairness, these comparative graphs are incorporated into the same regularized discriminant analysis algorithm:
• BSDA (Block Sparse Representation Discriminant Analysis) method [43]: the Sparse Representation (SR) graph is obtained by solving â = arg min_a ||y − Xa||_1. The weight of the graph is W_ij = a_ij.
• BKDA (Block k-Nearest Neighbor Discriminant Analysis) method [43]: Euclidean distance is employed as the similarity measure, and the Gaussian kernel is adopted for the k-nearest neighbor (kNN) feature graph. The number of nearest neighbors is set to five.
• BLEDA (Block Locally Linear Embedding Discriminant Analysis) method [11]: Locally Linear Embedding (LLE) reconstructs each sample from its neighbor points by minimizing the l2 reconstruction error, min_W Σ_i ||x_i − Σ_j W_ij x_j||^2, with W_ij = 0 if x_j does not belong to the neighbors of x_i. The number of nearest neighbors k is set to five.

• Image Fusion and Recursive Filtering (IFRF) method, as described in Section 2.4.

Supervised and Semi-Supervised HSI Classification Results
To examine the performance of the combined BLRDA feature extraction method, we perform experiments on the three HSIs. Ten independent runs of each algorithm are evaluated by resampling the training samples in each run, and we report the mean values. Unlike most existing supervised and semi-supervised HSI experiments, we test the performance of all comparative methods using only a small fraction of the labeled samples. In a practical scenario, labeled samples are difficult to obtain, while unlabeled ones are usually present in substantial numbers. Table 1 provides the training and testing data for all classes. For the Indian Pines image, the Pavia University scene and the Salinas image, the training sets are approximately 6%, 4% and 0.4% of the data, respectively, which are minimal compared to the entire datasets. The training sets are chosen randomly. For classes with a meager number of samples, we impose a minimum threshold of training samples per class, set to five here, which eases the imbalance for classes with few samples. In our experiments, the filter's spatial and range standard deviations δ_s and δ_r are 200 and 0.3, respectively. The parameter σ in the k-nearest neighbor weighting S_ij = exp(−||z_i − z_j||^2 / (2σ^2)) is 0.1, chosen at random. Hence, the following results are not obtained under the best parameters, which underscores the robustness and superiority of our method.
Initially, we utilize the different graph construction methods to obtain the regularized graph of the given hyperspectral images. Then, the SDA algorithm is implemented for feature extraction. Furthermore, the NN method and the SVM classifier are applied for final classification in the derived low-dimensional feature subspace. Tables 2-4 show the detailed classification results for CA, OA and AA, as well as the kappa coefficient, obtained by the various methods, where the bold numbers indicate the best results among the graph algorithms. Figures 6-8 show the classification maps (randomly selected from the above experiments) produced by several methods for the three hyperspectral images, together with the corresponding OA values. Note that the maps in the figures come from single random runs of each method and are thus not directly comparable. These results reveal the supervised and semi-supervised classification performance of the different feature extraction methods. From these results, we can see the following.
In most cases, our proposed BLRDA feature extraction method yields the highest classification accuracy. It thus significantly improves the classification performance of hyperspectral images, indicating that BLRDA is a superior HSI feature extraction method for both the NN and SVM classifiers. Consequently, our feature extraction method is robust for both supervised and semi-supervised classification. From Tables 2 and 4, we reach the same conclusion that the proposed BLRDA + NN and BLRDA + SVM are preferable to the other methods, particularly BLRDA + SVM. For example, with our methods, the Corn-notill, Corn-mintill, Grass-pasture, Grass-pasture-mowed, Oats, Soybean-notill and Buildings-Grass-Trees-Drives classes are significantly improved. For the Pavia University scene, BLRDA + SVM gives the better results. The traditional graph construction methods (such as the kNN-graph and LLE-graph) may perform well in some classes, but they are not as stable as our algorithm.

Running Time
We analyzed the running times of the different methods on the Indian Pines, Pavia University and Salinas images, using 10 separate runs and reporting the mean running time. As shown in Table 5, the execution time of the BLRDA method is slightly longer than that of the others. Although our algorithm is slower than the traditional kNN algorithms, its performance is much better than these baselines at an acceptable running time. In the above subsections, we evaluated the performance of the proposed BLRDA method. To fully explore its advantages, we analyze the robustness of BLRDA in the following subsections, considering two practical factors: the size of the labeled sample set and noise.

Robustness to the Size of Labeled Samples
We analyze the impact of different sizes of the training and testing sets in this subsection. We perform experiments on the three images, namely the Indian Pines image, the Pavia University scene and the Salinas image, with 10 independent runs per algorithm, and report the mean of the results. Figure 9 compares the overall classification accuracy obtained with different training sample sizes. In most cases, our proposed BLRDA method consistently achieves the best results and is robust to label percentage variations. As the training sample size increases, OA generally increases for all methods, which show a similar trend. For a fixed training sample size, the BLRDA method is usually superior to the others, for both the NN and SVM classifiers. Likewise, all three classification criteria increase with the number of training samples.
It is noteworthy that our proposed method achieves higher classification accuracy even at very low label rates, while some of the compared algorithms are not as robust as our BLRDA algorithm, especially when the label rate is low. Given the high cost and difficulty of labeling data, our proposed graph for the SDA algorithm is therefore more robust and better suited to real-world HSIs.

Robustness on Simulated Noisy Hyperspectral Images
In the simulated experiment, we evaluated the noise robustness of the BLRDA method on the three hyperspectral images. Zero-mean Gaussian noise with different variances was added to all bands, with equal noise intensity for every band. For the Indian Pines and Salinas images, the variance ranges from 50 to 250; for the Pavia University scene, it varies from 100 to 500. Figure 10 gives an example of a noisy sample from the Indian Pines image, with a randomly selected band; the other two hyperspectral images look similar. We compare the methods in these noisy environments over ten independent runs per algorithm, resampling the training samples in each run, and report the mean of the results. The labeled sample rates are 6%, 2% and 0.2%, respectively, as shown in Table 6. Figures 11-13 show classification maps (randomly selected from the above experiments) for the three noisy hyperspectral images, together with the corresponding OA values. The maps shown are random draws for each method and are not directly comparable. These results show the NN classification results of the different feature extraction algorithms at the noise variance σ given in the figure captions. We can see that the results of our method are robust to both noise and labeled sample size. With few labeled samples, our method BLRDA is more powerful than the other methods, benefiting from the robustness of the low-rank representation to noise and from the global property of the LR-graph. As the noise gradually increases, the performance of some methods drops substantially, whereas our method suffers little degradation. Overall, BLRDA performs very well on all three experimental hyperspectral images.
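The simulated corruption can be reproduced along these lines. The sketch below runs on a random cube; the cube dimensions and data range are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def add_gaussian_noise(cube, variance, rng=None):
    """Add zero-mean Gaussian noise with the given variance,
    applied with equal intensity to every spectral band."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=cube.shape)
    return cube + noise

rng = np.random.default_rng(42)
cube = rng.uniform(0, 4000, size=(145, 145, 200))  # synthetic HSI-sized cube
noisy = add_gaussian_noise(cube, variance=150, rng=rng)
```

Note that `scale` in `numpy.random.Generator.normal` is the standard deviation, so a variance of 150 corresponds to `scale=sqrt(150)`.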

Parameters of the BLRDA Graph Effect
We evaluate the parameters of the BLRDA method, conducting 10 independent runs per algorithm and reporting the mean of the results. Tables 7 and 8 show the performance for different block sizes and reduced dimensions. In Table 7, the block size takes the values 25, 50, 75 and 100, and the training set is about 6%, 4% and 0.4% for the Indian Pines image, the Pavia University scene and the Salinas image, respectively.
In general, the classification results improve slightly with growing block size and reduced dimension. However, increasing the block size also lengthens the running time considerably. Therefore, in practical situations, a small block size can be used for the sake of efficiency with minimal loss of classification accuracy.
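The block-wise processing discussed above can be sketched as follows. The per-block low-rank step is replaced by a placeholder function, since this is an illustration of the partitioning scheme rather than the authors' code; the handling of edge blocks is an assumption.

```python
import numpy as np

def process_in_blocks(cube, block_size, per_block_fn):
    """Split an H x W x B image into non-overlapping spatial blocks
    (edge blocks may be smaller) and apply per_block_fn to each block."""
    H, W, _ = cube.shape
    out = np.empty_like(cube)
    for i in range(0, H, block_size):
        for j in range(0, W, block_size):
            block = cube[i:i + block_size, j:j + block_size, :]
            out[i:i + block_size, j:j + block_size, :] = per_block_fn(block)
    return out

cube = np.arange(10 * 10 * 3, dtype=float).reshape(10, 10, 3)
# identity placeholder where the low-rank representation would run per block
result = process_in_blocks(cube, block_size=4, per_block_fn=lambda b: b)
assert np.array_equal(result, cube)
```

Processing each block independently is what keeps the cost of the low-rank step manageable: the representation is computed on small matrices instead of one matrix covering the whole scene.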
From Table 8, we can conclude that the choice of reduced dimension is robust and works well even with a small block size. Overall, our proposed BLRDA method is robust and effective for both supervised and semi-supervised classification of HSIs.

Discussion
Classification plays a pivotal role in understanding HSIs. In the present work, we propose a novel approach for HSI feature extraction, robust regularized Block Low-Rank Discriminant Analysis (BLRDA). Our goal is to enhance the classification accuracy of HSIs through an effective feature extraction method. Experimental results on the three images show that BLRDA is competitive with the other compared feature extraction methods for HSI classification.
From the supervised and semi-supervised experiments in Tables 2-4, we observe that BLRDA is an effective feature extraction method that achieves the highest classification accuracy among the compared methods. The performance is remarkable even with simple supervised and semi-supervised classifiers (nearest neighbor and SVM) and randomly given parameters. In some cases, traditional graph construction methods (such as the kNN-graph and LLE-graph) may perform well on particular classes, but they are not as stable as our proposed algorithm. The LR-graph better captures the global structure of the data and obtains the representation of all samples under a global low-rank constraint, while the common k-Nearest Neighbor (kNN) and Locally Linear Embedding (LLE) graphs use fixed global parameters to determine the graph weights from Euclidean distances, so their graph structures are unstable and sensitive to noise. Additionally, the SR-graph lacks global constraints, which greatly degrades performance when the data are grossly corrupted. The LR-graph addresses these drawbacks: it obtains the graph of the whole data jointly and is demonstrably robust to noise. Furthermore, the k-nearest neighbor step handles the combined low-rank graph integrally, serving two purposes: preserving the local information of the image and satisfying the algorithmic requirements of the subsequent dimension reduction procedure. Consequently, BLRDA significantly improves the classification performance of hyperspectral images, indicating that it is a superior HSI feature extraction method for both supervised and semi-supervised classification.
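For reference, the standard low-rank representation objective that underlies an LR-graph (the general formulation from the literature, not necessarily the exact variant used per block here) can be written as:

```latex
\min_{Z,\,E} \; \|Z\|_{*} + \lambda \|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E,
```

where $X$ is the data matrix, $\|Z\|_{*}$ is the nuclear norm enforcing the global low-rank constraint on the coefficient matrix $Z$, and the $\ell_{2,1}$ norm on $E$ models sample-specific gross corruption. The learned $Z$ then supplies the graph weights, which is why the resulting graph inherits robustness to noise.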
We analyzed robustness with respect to the labeled sample size and to noise. In Figure 9, we find that the BLRDA method is usually superior to the others for both the NN and SVM classifiers and is robust to label percentage variations. Figure 9 also shows that the proposed method achieves higher classification accuracy even at meager label rates, while some of the compared algorithms are not as robust as BLRDA, in particular when the label rate is low. Given the scarcity of labeled samples in practice, our proposed method is more robust and better suited to real-world HSI classification. As shown in Table 6 and Figures 11-13, the results of our method are robust to both noise and labeled sample size: as the noise increases, the performance of BLRDA degrades only very slowly.

Figure 1 .
Figure 1. Formulation of the proposed regularized block low-rank discriminant analysis feature extraction for Hyperspectral Image (HSI) classification. IFRF, Image Fusion and Recursive Filtering; SDA, Semi-supervised Discriminant Analysis.

Figure 3 .
Figure 3. Indian Pines dataset. (a) Three-band color composite of the Indian Pines image. (b,c) Ground truth image and reference data.

Figure 4 .
Figure 4. Pavia University scene dataset. (a) Three-band color composite of the Pavia University scene. (b,c) Ground truth image and reference data.

Figure 5 .
Figure 5. Salinas dataset. (a) Three-band color composite of the Salinas image. (b,c) Ground truth image and reference data.

Figure 9 .
Figure 9. Supervised and semi-supervised classification accuracy of HSIs with different feature extraction methods, comparing the overall classification results with different training sample sizes in each class. The percentage of training samples grows from 2-14% for the Indian Pines image, 2-8% for the Pavia University scene and 0.2-0.8% for the Salinas image.

Figure 10 .
Figure 10. Noised hyperspectral image example: three-band color composite of the noised Indian Pines image.

Table 1 .
Training and testing samples for the three hyperspectral images.

Table 2 .
Supervised and semi-supervised classification results for the Indian Pines image. BLRDA, Block Low-Rank Discriminant Analysis. BSDA, Block Sparse Representation Discriminant Analysis. BKDA, Block k-Nearest Neighbor Discriminant Analysis. BLEDA, Block Locally Linear Embedding Discriminant Analysis. IFRF, Image Fusion and Recursive Filtering.

Table 3 .
Supervised and semi-supervised classification results for the Pavia University scene. BLRDA, Block Low-Rank Discriminant Analysis. BSDA, Block Sparse Representation Discriminant Analysis. BKDA, Block k-Nearest Neighbor Discriminant Analysis. BLEDA, Block Locally Linear Embedding Discriminant Analysis. IFRF, Image Fusion and Recursive Filtering.

Table 4 .
Supervised and semi-supervised classification results for the Salinas image. BLRDA, Block Low-Rank Discriminant Analysis. BSDA, Block Sparse Representation Discriminant Analysis. BKDA, Block k-Nearest Neighbor Discriminant Analysis. BLEDA, Block Locally Linear Embedding Discriminant Analysis. IFRF, Image Fusion and Recursive Filtering.

Table 5 .
Run time of different methods on real-world HSIs (units).

Table 6 .
Classification results with varying Gaussian noise on the three HSIs.

Table 7 .
Classification accuracy of the BLRDA method with different block sizes.

Table 8 .
Classification accuracy of the BLRDA method with different reduced dimensions.