Hyperspectral Remote Sensing Image Classification Based on Partitioned Random Projection Algorithm

Abstract: Dimensionality reduction based on random projection (RP) has two problems: the projection dimensionality is limited by the data size, and the class separability of the dimensionality reduction results is unstable because the projection matrix is generated randomly. These problems make the RP algorithm unsuitable for large-size hyperspectral image (HSI) classification. To solve them, this paper presents a new partitioned RP (PRP) algorithm and proves its rationality in theory. First, a large-size HSI is evenly divided into multiple small-size sub-HSIs. Afterwards, the projection matrix that maximizes the class separability is selected from multiple samplings, in which the class dissimilarity measurement is defined as a large inter-class distance and a small intra-class variance. Using the same projection matrix, each small-size sub-HSI is projected to generate a low dimensional sub-HSI, thereby generating a low dimensional HSI. Next, the minimum distance (MD) classifier is utilized to classify the low dimensional HSI obtained by the PRP algorithm. Finally, four real HSIs are used for experiments, and three of the most popular RP-based classification algorithms are selected as comparison algorithms to validate the effectiveness of the proposed algorithm. Classification performance is evaluated with the kappa coefficient, overall accuracy (OA), average accuracy (AA), average precision rate (APR), and running time. Experimental results indicate that the proposed algorithm can obtain reliable classification results in a very short time.


Introduction
The wavelength range of hyperspectral images (HSIs) is continuous and dense from visible light to short-wave infrared [1]. Compared with panchromatic images and multispectral remote sensing images, HSIs contain richer spectral information and finer texture information of ground objects, which provide a solid data foundation for high-precision ground object classification [2,3]. Therefore, more and more researchers apply them to fields such as medicine [4,5] and agriculture [6,7]. However, on commonly configured hardware, the classification of large-size HSIs suffers from a large data volume and a long running time. The most straightforward solution is to perform dimensionality reduction before HSI classification, thereby reducing the computational load [8,9].
Popular HSI dimensionality reduction methods are roughly divided into two types: band selection [10,11] and feature extraction [12,13]. The band selection method directly discards most of the bands and only selects a subset of bands for subsequent HSI analysis. Recently, Zhang et al. [14] proposed a marginalized graph self-representation (MGSR) method for unsupervised hyperspectral band selection, which considers the differences between different homogeneous regions. First, super-pixel segmentation is used to construct the structure map of the HSI [15]. Additionally, considering the relationship between adjacent pixels of the same segment, the damage is marginalized by an alternating optimization. However, the RP algorithm has two shortcomings in large-size HSI classification. (1) The lowest projection dimensionality of the RP algorithm is independent of the original (high) dimensionality but positively correlated with the number of hyperspectral vectors [41]: the larger the number of hyperspectral vectors, the larger the lowest projection dimensionality. (2) Different projection matrices generate different dimensionality reduction results, which makes the class separability of the dimensionality reduction results very unstable. Thus, the RP algorithm cannot achieve a dimensionality reduction effect on an HSI with a large number of hyperspectral vectors, which is not conducive to large-size HSI classification.
Aiming at the above problems of the RP algorithm in large-scale dimensionality reduction and classification, an HSI classification algorithm based on partitioned RP (PRP) is proposed. First, the PRP algorithm is constructed by means of image division, and the distance preservation of hyperspectral vector pairs after partitioning is proven from two perspectives: within each sub-HSI and between any two different sub-HSIs. In addition, a classifier based on the PRP algorithm is designed, in which the optimal projection matrix is determined according to the principle of large inter-class distance and small intra-class variance, and minimum distance (MD) [42,43] is used to classify the low dimensional HSI obtained by the PRP algorithm. The experimental results show that the PRP algorithm can project large-size HSIs into a subspace with a lower dimensionality than the RP algorithm, and the proposed classification algorithm can obtain good classification results. The structure of this paper is organized as follows. Sections 2 and 3 give the materials and the proposed algorithm, respectively. The experimental results and discussion are provided in Sections 4 and 5. Finally, this paper is concluded in Section 6.

Materials
This paper uses four publicly available datasets with validation data, namely, the real Pavia Centre, Salinas, Chikusei, and LongKou [44,45] images. The four images are respectively located in Pavia Centre in northern Italy, the Salinas Valley in California, Chikusei in Japan, and LongKou in China. Details of these images are listed in Table 1, including download address, year, sensor, spatial resolution, and number of bands. Spatial resolution represents the actual distance on the ground represented by a pixel. It is worth noting that this paper uses HSIs of sizes 1096 × 531, 512 × 127, 1000 × 800, and 550 × 400, with 9, 13, 7, and 9 homogeneous regions, respectively, as shown in Figure 1. The classes included in the Pavia Centre image are water, trees, meadows, self-blocking bricks, bare soil, asphalt, bitumen, tiles, and shadows. The classes included in the Salinas image are broccoli green weeds 2, fallow, fallow rough plow, fallow smooth, stubble, celery, grapes untrained, soil vineyard develop, corn senesced weeds, lettuce romaine 4 week, lettuce romaine 5 week, lettuce romaine 6 week, and lettuce romaine 7 week. The classes included in the Chikusei image are water, bare soil (school), bare soil (farmland), natural plants, grass, rice field (grown), and row crops. The classes included in the LongKou image are corn, cotton, sesame, broad leaf soybean, narrow leaf soybean, rice, water, roads and houses, and mixed weed. Specifically, Figure 1 shows the false-color images, the standard images (i.e., validation data), and the legends. The band combinations used to replace the RGB bands in the false-color images of the four images are (49, 31, 15), (29, 20, 12), (54, 35, 22), and (130, 65, 18), respectively. Moreover, the specific classes are explained in the legends. In addition, because many vectors in these images do not contain any valid information, these vectors are discarded before processing.
Therefore, the numbers of spectral vectors in these scenes are 109,794, 20,655, 9435, and 204,542, respectively.

Random Projection
RP is an effective dimensionality reduction tool that approximately preserves the distance between vector pairs before and after projection. The usual version of the RP algorithm is defined below [26].

Theorem 1.
Suppose that A is any S × D matrix, where S is the data size and D is the feature dimensionality. Its row vectors a_s lie in D-dimensional space, where s = 1, . . . , S is the index of row vectors. For constants ε and β > 0, an integer K_RP is chosen such that

K_RP ≥ K⁰_RP = ((4 + 2β)/(ε²/2 − ε³/3)) ln S,  (1)

where K⁰_RP is the lowest projection dimensionality of the RP algorithm. R_RP is any D × K_RP matrix whose entries are drawn from the standard normal Gaussian distribution. The S × K_RP matrix

B = (1/√K_RP) A R_RP  (2)

is the projection of A into the K_RP-dimensional subspace, with row vectors b_s. Then, with a probability of no distortion of at least

P_RP = 1 − S^(−β),  (3)

for any two row vectors a_s and a_{s'} of matrix A, the distance between the corresponding row vectors b_s and b_{s'} of matrix B is preserved as

(1 − ε)‖a_s − a_{s'}‖² ≤ ‖b_s − b_{s'}‖² ≤ (1 + ε)‖a_s − a_{s'}‖²,  (4)

where s' is another index of row vectors.
It can be seen from the above theorem that RP is a computationally simple method. The complexity of forming a projection matrix R_RP and projecting the high dimensional data A into the K_RP-dimensional subspace is O(D × K_RP × S). If the number of vectors S becomes larger, the lowest projection dimensionality K⁰_RP will become larger according to Equation (1), and the probability of no distortion P_RP will also become larger according to Equation (3). Therefore, when the number of vectors is large, although the probability of no distortion is high, the low dimensionality K_RP may be higher than the high dimensionality D, so the dimensionality reduction effect cannot be achieved. Besides, the generation of the projection matrix is completely random, which leads to instability in the class separability of the generated dimensionality reduction result. In simpler terms, it cannot guarantee that the class separability of the dimensionality reduction result is good for the classification task.
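As a minimal sketch of Theorem 1, the following Python snippet computes the lowest projection dimensionality of Equation (1) and checks the pairwise distance preservation of Equation (4) on random data. The function name and the toy data are illustrative, not part of the paper; the Pavia Centre figures (S = 109,794, ε = 1, β = 0.5) are taken from the worked example discussed later in the text.

```python
import numpy as np

def lowest_rp_dimensionality(S, eps, beta):
    """Lowest projection dimensionality K0_RP from Equation (1):
    K0_RP = (4 + 2*beta) / (eps**2/2 - eps**3/3) * ln(S)."""
    return int(np.ceil((4 + 2 * beta) / (eps**2 / 2 - eps**3 / 3) * np.log(S)))

# Pavia Centre image: S = 109,794 hyperspectral vectors, eps = 1, beta = 0.5.
print(lowest_rp_dimensionality(109_794, eps=1.0, beta=0.5))  # 349

# Random projection of an S x D matrix A (Equation (2)). Note that here
# K exceeds D = 102, illustrating why plain RP fails on large data sizes.
rng = np.random.default_rng(0)
S, D = 500, 102
A = rng.normal(size=(S, D))
K = lowest_rp_dimensionality(S, eps=0.5, beta=0.5)
R = rng.normal(size=(D, K))        # entries drawn from N(0, 1)
B = A @ R / np.sqrt(K)             # low dimensional data

# Distance preservation (Equation (4)) for one vector pair.
d_hi = np.sum((A[0] - A[1]) ** 2)
d_lo = np.sum((B[0] - B[1]) ** 2)
print(abs(d_lo / d_hi - 1) < 0.5)  # squared distance distorted by less than eps
```

The computed bound of 349 matches the value quoted for the Pavia Centre image in the partitioning discussion below.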

Partitioned Random Projection
The lowest projection dimensionality K⁰_RP of the RP algorithm is strictly limited by the parameters ε and β and the number of hyperspectral vectors S. Take the Pavia Centre image with 102 bands as an example, of which the number of hyperspectral vectors is 109,794. With fixed parameters ε = 1 and β = 0.5, according to Equation (1), the lowest projection dimensionality of the RP algorithm is 349, which is much higher than the number of bands (102). Therefore, when processing large-size HSIs with a small number of bands, the RP algorithm cannot achieve dimensionality reduction. Considering this problem, a new PRP algorithm is introduced in this section. PRP, a dimensionality reduction algorithm, can project a large-size HSI into a low dimensional subspace without serious distortion of pairwise distances.
The HSI can be expressed as a matrix U = [u_1; . . . ; u_s; . . . ; u_S], where S is the number of hyperspectral vectors, s is the index of hyperspectral vectors, and u_s is the sth hyperspectral vector. The PRP algorithm partitions an HSI evenly into multiple sub-HSIs along the horizontal direction and then projects each sub-HSI into a low dimensional subspace. Specifically, the matrix of the HSI can also be written as U = [U_1; . . . ; U_m; . . . ; U_M], where M is the number of sub-matrices and m is the index of sub-matrices. Moreover, U_m = [u_1^m; . . . ; u_n^m; . . . ; u_N^m] is the mth N × D sub-matrix, where n is the index of hyperspectral vectors in the sub-matrix and N is the number of hyperspectral vectors in the sub-matrix, which is the same for all sub-matrices. Thus, S = M × N. Note that the dimensionality reduction method in this paper requires that an HSI be divided equally; otherwise, the first few hyperspectral vectors can be regarded as edges and discarded so that the HSI can be divided equally.
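The partitioning step can be sketched as follows. `partition_hsi` is an illustrative helper, not from the paper; it follows the text's convention of discarding the first few "edge" vectors when S is not divisible by M.

```python
import numpy as np

def partition_hsi(U, M):
    """Evenly split an S x D matrix of hyperspectral vectors into M
    sub-matrices of N = S // M rows each, discarding the first
    S mod M vectors as 'edges' so that the division is exact."""
    S = U.shape[0]
    N = S // M
    U = U[S - M * N:]                  # drop leading remainder vectors
    return [U[m * N:(m + 1) * N] for m in range(M)]

U = np.arange(10 * 3).reshape(10, 3)   # toy HSI: S = 10 vectors, D = 3 bands
subs = partition_hsi(U, M=4)           # N = 2; the first 2 vectors are discarded
print(len(subs), subs[0].shape)        # 4 (2, 3)
```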
Next, the rationality of the distance preservation property guaranteed by this partition method is proven from two perspectives: within each sub-matrix and between any two different sub-matrices.
The distance changes before and after the projection of any two hyperspectral vectors within each sub-matrix are explained as follows. The conclusion of this theorem comes from the RP algorithm (Theorem 1).

Theorem 2.
For constants ε and β > 0, an integer K_PRP is chosen such that

K_PRP ≥ K⁰_PRP = ((4 + 2β)/(ε²/2 − ε³/3)) ln(S/M),  (5)

where K⁰_PRP is the lowest projection dimensionality of the proposed algorithm. R_PRP is a D × K_PRP matrix whose entries are drawn from the standard normal Gaussian distribution. The N × K_PRP sub-matrix

V_m = (1/√K_PRP) U_m R_PRP  (6)

is the projection of sub-matrix U_m into the K_PRP-dimensional subspace. The low dimensional hyperspectral vectors of sub-matrix V_m are v_n^m; namely, V_m = [v_1^m; . . . ; v_n^m; . . . ; v_N^m]. Then, according to Equation (3) (with S replaced by the sub-matrix size N), with the probability of no distortion at least, for any two hyperspectral vectors u_n^m and u_{n'}^m of sub-matrix U_m, the distance between the two low dimensional hyperspectral vectors v_n^m and v_{n'}^m of sub-matrix V_m is preserved in the sense of Equation (4), where n' is another index of hyperspectral vectors in the sub-matrix.
According to Equation (5), the lowest projection dimensionality K⁰_PRP decreases with an increase in the number of sub-matrices M and a decrease in the parameter β. Moreover, as the parameter ε increases, the lowest projection dimensionality K⁰_PRP first decreases and then increases. Comparing Equations (1) and (5), the lowest projection dimensionality of the PRP algorithm is smaller than that of the RP algorithm under the same parameters. Note that when M is equal to one, the projection is performed without partitioning, that is, the RP algorithm. When M is equal to S, there is only one hyperspectral vector in each sub-matrix; meanwhile, the lowest projection dimensionality is zero according to Equation (5), so the data can be projected into any space with a dimensionality greater than zero.
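The effect of M on the bound can be illustrated numerically. This sketch assumes the reconstructed form of Equations (1) and (5), in which the only difference is that the PRP bound uses the sub-matrix size N = S/M in place of S; the function name is illustrative.

```python
import numpy as np

def lowest_dimensionality(n_vectors, eps, beta):
    # Shared Johnson-Lindenstrauss-style bound: Equation (1) uses S,
    # Equation (5) uses the sub-matrix size N = S / M.
    return int(np.ceil((4 + 2 * beta) / (eps**2 / 2 - eps**3 / 3) * np.log(n_vectors)))

S, eps, beta = 109_794, 1.0, 0.5                 # Pavia Centre settings from the text
K0_RP = lowest_dimensionality(S, eps, beta)      # RP bound: 349
K0_PRP = {M: lowest_dimensionality(S / M, eps, beta) for M in (1, 100, 10_000)}
print(K0_RP, K0_PRP)                             # shrinks as M grows; M = 1 recovers RP
```

As the output shows, M = 1 reproduces the RP bound, and larger M drives the required dimensionality well below the 102 bands of the Pavia Centre image.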
A projection matrix is randomly generated, and each sub-matrix is projected into the same low dimensional space with this projection matrix. On this basis, the distance changes before and after the projection of any two hyperspectral vectors in two different sub-matrices are introduced, as described in detail in Theorem 3. (Proof of Theorem 3 can be found in Appendix A).

Theorem 3.
For constants ε and β > 0, an integer K_PRP is chosen according to Equation (5). R_PRP is a D × K_PRP matrix whose entries are drawn from the standard normal Gaussian distribution. The sub-matrix V_m is obtained from Equation (6). Then, for the same ε, β, K_PRP, and R_PRP, with the probability of no distortion at least, for any hyperspectral vector u_n^m in sub-matrix U_m and hyperspectral vector u_{n'}^{m'} in another sub-matrix U_{m'}, the distance between the low dimensional hyperspectral vector v_n^m in sub-matrix V_m and the low dimensional hyperspectral vector v_{n'}^{m'} in another sub-matrix V_{m'} is preserved, where m' is another index of sub-matrices. Significantly, m' is not equal to m (m' ≠ m).
Combining Theorems 2 and 3, the distance before and after the projection of any two hyperspectral vectors in an HSI also remains unchanged with an acceptable probability of no distortion. Specifically, the minimum of the probabilities of no distortion within each sub-matrix and between two different sub-matrices is taken as the probability of no distortion of the PRP algorithm. The projection of the matrix U = [u_1; . . . ; u_s; . . . ; u_S] can be expressed as a matrix V = [v_1; . . . ; v_s; . . . ; v_S], where the low dimensional hyperspectral vector v_s is the projection of u_s. The proposed algorithm is interpreted as Theorem 4.

Theorem 4.
For constants ε and β > 0, an integer K_PRP is chosen according to Equation (5). R_PRP is a D × K_PRP matrix whose entries are drawn from the standard normal Gaussian distribution. The N × K_PRP sub-matrix V_m is defined by Equation (6) as the projection of U_m into the K_PRP-dimensional subspace. Then, for the same ε, β, K_PRP, and R_PRP, with the probability of no distortion at least, for any two hyperspectral vectors u_s and u_{s'} of matrix U, the distance between the hyperspectral vectors v_s and v_{s'} of matrix V is preserved as in Equation (4).
The PRP algorithm can greatly reduce the dimensionality of HSIs, thereby reducing the computational load of subsequent HSI classification. The probability of no distortion P_PRP increases as the parameter β increases and the number of sub-matrices M decreases, based on Equation (11). Theorem 4 demonstrates that the PRP algorithm does not introduce significant distortion in HSIs. Meanwhile, the proposed algorithm can also attain a high probability of no distortion when dividing into a small number of sub-matrices. In this case, the proposed algorithm can approximate the distance between each hyperspectral vector pair while projecting the HSI into a space with a lower dimensionality than the RP algorithm. Thus, considering Theorems 1 and 4 comprehensively, the PRP algorithm is feasible in computation and dimensionality reduction for large-size HSIs. Moreover, when the number of sub-matrices is too small, the proposed algorithm cannot guarantee that the purpose of dimensionality reduction is achieved. Therefore, this paper gives the minimum number of sub-matrices that guarantees the high dimensionality can be reduced, defined as Lemma 1. (Proof of Lemma 1 can be found in Appendix B).

Lemma 1. Suppose that 1.5 > ε > 0 and β > 0; then the minimum number of sub-matrices that still permits dimensionality reduction follows.

Figure 2 presents an example of dividing an HSI with five million hyperspectral vectors into one million sub-HSIs. The number of hyperspectral vectors N in each sub-HSI is five in Figure 2, which is calculated by dividing the number of hyperspectral vectors S in the HSI by the number of sub-HSIs M; that is, N = S/M = 5,000,000/1,000,000 = 5.

HSI Classification Based on the PRP Algorithm
The classification based on the PRP algorithm is introduced from two aspects. One is the optimization strategy of the projection matrix, which increases the class separability of the dimensionality reduction result with the assistance of samples; this strategy is then applied to the PRP algorithm. The other is the HSI classification algorithm that integrates the PRP algorithm for dimensionality reduction with the MD classifier for classification.

Figure 2. A partition of an HSI with five million hyperspectral vectors into one million sub-HSIs.


Optimization Strategy of the Projection Matrix
Because the projection matrix is generated randomly, a dimensionality reduction result with low class separability may be produced that is unsuitable for subsequent HSI classification. Therefore, it is necessary to find a projection matrix that is applicable to the classification task. The optimization strategy of the projection matrix is to increase the class separability of the low dimensional HSI. The class dissimilarity measurement is defined as a large inter-class distance and a small intra-class variance, calculated as the sum of the distances between classes divided by the variances within the classes. Specifically, with the aid of samples, the projection matrix that maximizes the class dissimilarity is selected among multiple samplings. The samples matrix for all classes of an HSI is expressed as X; specifically, X = [X_1; . . . ; X_l; . . . ; X_L], where l is the index of classes and L is the number of classes, known as a prior. X_l is the H × D samples matrix of the lth class, where H is the number of samples in a single class, which is the same for each class. The selection of the projection matrix R_PRP is very important for HSI classification. To be specific, many matrices are generated from the standard normal Gaussian distribution. These matrices constitute a sampling matrix of the projection matrix, Q = [Q_1, . . . , Q_t, . . . , Q_T], where t is the index of sampling, T is the number of samplings, and Q_t is the projection matrix of the tth sampling. Then, each projection matrix Q_t is used to project the samples matrix X. The low dimensional samples matrix W = [W_1, . . . , W_t, . . . , W_T] is the projection of the samples matrix X by the sampling matrices Q, and its calculation is derived from Equation (6); that is,

W_tl = (1/√K_PRP) X_l Q_t,  (14)

where W_tl is the low dimensional samples matrix of class l of the tth sampling.
The feature mean hyperspectral vector of the low dimensional samples matrix W_tl of class l of the tth sampling is described as

W_tl^mean = (1/H) [w_tl,11 + · · · + w_tl,H1, w_tl,12 + · · · + w_tl,H2, . . . , w_tl,1K_PRP + · · · + w_tl,HK_PRP].  (15)

Next, the distance between the lth class and the l'th class divided by the variance of the lth class is defined as follows, where l' is another index of classes:

c_t(l, l') = ‖W_tl^mean − W_tl'^mean‖² / Var(W_tl).  (16)

Then, the class dissimilarity is calculated as

C_t = Σ_{l=1}^{L} Σ_{l'=1, l'≠l}^{L} c_t(l, l').  (17)

The better projection matrix is the t*th sampled projection matrix with the largest class dissimilarity,

t* = arg max_t C_t.  (18)

Finally, the projection matrix R_PRP is taken as the t*th sampled projection matrix Q_t*,

R_PRP = Q_t*.  (19)

The projection matrix selection strategy can increase the separability of the dimensionality reduction results, thereby increasing the accuracy of HSI classification.
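The selection strategy can be sketched in code. Note that the exact forms of Equations (15)-(18) are reconstructed here from the surrounding prose (inter-class distance divided by intra-class variance, summed over class pairs), so this is a sketch under those assumptions; the function names and the toy data are illustrative, not the paper's.

```python
import numpy as np

def class_dissimilarity(W_classes):
    """Sum over class pairs of the squared distance between class means,
    each divided by the intra-class variance of the first class
    (reconstructed Equations (15)-(17)); larger is better."""
    means = [W.mean(axis=0) for W in W_classes]
    variances = [W.var() for W in W_classes]
    score = 0.0
    for l, mean_l in enumerate(means):
        for lp, mean_lp in enumerate(means):
            if lp != l:
                score += np.sum((mean_l - mean_lp) ** 2) / variances[l]
    return score

def select_projection_matrix(X_classes, D, K, T, rng):
    """Draw T candidate Gaussian matrices Q_t and keep the one whose
    projected samples maximize the class dissimilarity (Equation (18))."""
    best_Q, best_score = None, -np.inf
    for _ in range(T):
        Q = rng.normal(size=(D, K))
        W_classes = [X @ Q / np.sqrt(K) for X in X_classes]
        score = class_dissimilarity(W_classes)
        if score > best_score:
            best_Q, best_score = Q, score
    return best_Q

rng = np.random.default_rng(0)
# Toy samples: L = 2 classes, H = 20 samples each, D = 50 bands.
X_classes = [rng.normal(loc=c, size=(20, 50)) for c in (0.0, 2.0)]
R_PRP = select_projection_matrix(X_classes, D=50, K=10, T=30, rng=rng)
print(R_PRP.shape)  # (50, 10)
```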

HSI Classification Based on Optimization Strategy of the Projection Matrix
The HSI classification algorithm is described in this section. First, the low dimensional samples matrix Y = [Y_1; . . . ; Y_l; . . . ; Y_L] is the projection of the samples matrix X by the projection matrix obtained according to Equation (19), with K_PRP dimensionalities. Specifically,

Y_l = (1/√K_PRP) X_l R_PRP  (20)

is the lth low dimensional samples matrix. The feature mean hyperspectral vector of the lth low dimensional samples is expressed as

Y_l^mean = (1/H) [y_l,11 + · · · + y_l,H1, y_l,12 + · · · + y_l,H2, . . . , y_l,1K_PRP + · · · + y_l,HK_PRP].  (21)

Consequently, an HSI is denoted by U = [u_1, . . . , u_s, . . . , u_S], where s is the index of hyperspectral vectors, S is the number of hyperspectral vectors, and u_s is a hyperspectral vector with D dimensionalities. The low dimensional HSI V = [v_1, . . . , v_s, . . . , v_S] is obtained by the PRP algorithm, where v_s is a low dimensional hyperspectral vector with K_PRP dimensionalities. Then, the final classes to which the low dimensional hyperspectral vectors belong are determined by executing the MD classifier, that is, by calculating the distance between each low dimensional hyperspectral vector and the samples feature mean hyperspectral vector of each class.
The distance matrix is characterized as Z = [z_1, . . . , z_s, . . . , z_S], where z_s is the sth distance vector. In particular, z_s = [z_s1, . . . , z_sl, . . . , z_sL], where z_sl is the distance between the low dimensional hyperspectral vector v_s and the feature mean hyperspectral vector Y_l^mean from Equation (21), calculated as

z_sl = ‖v_s − Y_l^mean‖.  (22)

The determined classification result can be defined as f = [f_1, . . . , f_s, . . . , f_S] in the HSI classification process. That is, the class with the smallest distance is the class to which the low dimensional hyperspectral vector belongs:

f_s = arg min_l z_sl.  (23)
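The MD classification step above can be sketched as follows; `md_classify` is an illustrative name, and the class means and toy vectors are synthetic.

```python
import numpy as np

def md_classify(V, class_means):
    """Minimum-distance classification (Equations (22) and (23)): each low
    dimensional vector v_s is assigned to the class whose sample feature
    mean is nearest in Euclidean distance."""
    # z_{sl} = ||v_s - Y_l^mean||; f_s = argmin_l z_{sl}
    Z = np.linalg.norm(V[:, None, :] - class_means[None, :, :], axis=2)
    return Z.argmin(axis=1)

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0]])   # L = 2 class means, K_PRP = 2
V = np.vstack([rng.normal(m, 0.1, size=(3, 2)) for m in means])
print(md_classify(V, means))                 # [0 0 0 1 1 1]
```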

The Flow Chart of the Proposed Algorithm
The detailed process of the proposed algorithm (Algorithm 1) is described as follows.
Step 3. Calculating the lowest projection dimensionality K⁰_PRP by using Equation (5).
Step 4. Forming the projection matrix R_PRP by using Equation (19).
Step 5. Generating a low dimensional sub-HSI V_m by using Equation (6).
Step 7. Generating the feature mean hyperspectral vector Y_l^mean of the low dimensional samples of the lth class by using Equation (21).
Step 8. Calculating the distance z_sl by using Equation (22).
In addition, Figure 3 illustrates a flow chart of the proposed classification algorithm in this paper.
To update Z, it takes O(MND) space and O(MNDK_PRP) time to calculate the low dimensional image V. Furthermore, O(MNK_PRP) space and O(MNLK_PRP) time are required to calculate the distances z_sl. To sum up, the overall space complexity of the proposed algorithm is O(MND), and the overall time complexity is O(HLDK_PRP + THL²K_PRP + MNDK_PRP + MNLK_PRP).

Algorithm 1
The detailed process of the proposed classification algorithm. Input: HSI U and samples X. Output: Classification result f.

Experiments and Results
To validate the effectiveness of the proposed algorithm, classification experiments on real HSIs were performed on a PC with an Intel(R) Xeon(R) CPU E7-8880 v3 at 2.30 GHz and 96.0 GB memory using MATLAB R2020a. The experimental parameters are set as shown in Table 2, which contains the number of samplings T, the number of samples in each class H, the number of sub-HSIs M, and the low dimensionalities. It can be seen from Table 2 that K_PRP is less than three-quarters of K_TRP and one-eighth of K_RP, which greatly reduces the amount of computation. In addition, K_RP is larger than the high dimensionality D, which means that the RP algorithm cannot achieve the purpose of dimensionality reduction.
To verify the effectiveness of the projection matrix selection strategy, taking the LongKou image as an example, a randomly generated projection matrix (i.e., a projection matrix without selection) is used as a comparison. Figure 4 shows the change in the spectral curves of the mean vectors of the samples of the 9 classes of the LongKou image, to evaluate the change in inter-class distance in the dimensionality reduction results. The horizontal coordinates represent the band number, and the vertical coordinates represent the intensity value. Figure 4a-c represent the spectral curves of all classes without dimensionality reduction, after dimensionality reduction without matrix selection, and after dimensionality reduction based on matrix selection, with legends for all classes, respectively. Comparing the three images in Figure 4, the low dimensional spectral curves obtained by the projection matrix selection strategy have good separability for each class in each dimensionality, which fully reflects the superiority of the projection matrix optimization strategy.
For the sake of validating the proposed classification algorithm, the TRP-MIV algorithm, the algorithm combining PRP and MGSR (PRP-MGSR), and the CAFCM algorithm are used as comparison algorithms. It is worth noting that the classification experiments on all experimental images were run 100 times. For all algorithms, the classification results with the highest accuracy in the 100 trials are exhibited in Figure 5. Figure 5(a1-a4,b1-b4,c1-c4,d1-d4) show the classification results for the four experimental images by the proposed algorithm and the TRP-MIV, PRP-MGSR, and CAFCM algorithms, respectively. Additionally, Figure 5(a5-d5) are the legends for all classes of the four experimental images. Meanwhile, to better show the classification effect of the proposed algorithm, Figure 6 superimposes the final classification results of the HSIs on the respective false-color images (see Figure 1(a1-d1) for details).
Figure 6 shows the outlines of the superposition results of the four experimental images of the four algorithms. For the sake of validating the proposed classification algorithm, the TRP-MIV algorithm, the algorithm for combining PRP and MGSR (PRP-MGSR), and the CAFCM algorithm are used as comparison algorithms. It is worth noting that the classification experiments of all experimental images have been tested 100 times. For all algorithms, the classification results with the highest accuracy in 100 trials are exhibited in Figure 5. Figure 5(a1-a4,b1-b4,c1-c4,d1-d4) show the classification results for four experimental images by the proposed algorithm and the TRP-MIV, PRP-MGSR, and CAFCM algorithms, respectively. Additionally, Figure 5(a5-d5) are legends for all classes of the four experimental images. Meanwhile, to better show the classification effect of the proposed algorithm, Figure 6 superimposes the final classification results of HSIs on the respective false-color images (see Figure 1(a1-d1) for details). Figure 6 shows the outlines of the superposition results of the four experimental images of the four algorithms.
sample mean vectors of all classes after dimensionality reduction based on projection matrix selection, with legends for all classes.
For the sake of validating the proposed classification algorithm, the TRP-MIV algorithm, the algorithm for combining PRP and MGSR (PRP-MGSR), and the CAFCM algorithm are used as comparison algorithms. It is worth noting that the classification experiments of all experimental images have been tested 100 times. For all algorithms, the classification results with the highest accuracy in 100 trials are exhibited in Figure 5. Figure  5(a1-a4,b1-b4,c1-c4,d1-d4) show the classification results for four experimental images by the proposed algorithm and the TRP-MIV, PRP-MGSR, and CAFCM algorithms, respectively. Additionally, Figure 5(a5-d5) are legends for all classes of the four experimental images. Meanwhile, to better show the classification effect of the proposed algorithm, Figure 6 superimposes the final classification results of HSIs on the respective falsecolor images (see Figure 1(a1-d1) for details). Figure 6 shows the outlines of the superposition results of the four experimental images of the four algorithms.  (d1) (d2) (d3) (d4) (d5)  From a qualitative point of view, compared with the other three comparison algorithms, the classification performance of the proposed algorithm is superior, which proves the reliability of the proposed classification algorithm for large-size HSIs. According to qualitative experimental results, the TRP-MIV algorithm has misclassification for complex HSIs with large hyperspectral vectors. In addition, comparing the PRP-MGSR algorithm, the proposed algorithm simultaneously considers the inter-class distance and intra-class variance and can achieve better classification results. The proposed algorithm preserves the class separability so that each class is classified exactly. In theory, the more dimensionalities are reduced, the more the loss of spectral information will be, which results in an unsatisfactory classification effect. 
However, the proposed algorithm attains a classification effect better than the CAFCM algorithm for four experimental images. Moreover, the CAFCM algorithm needs to be projected into a space with a higher dimensionality than From a qualitative point of view, compared with the other three comparison algorithms, the classification performance of the proposed algorithm is superior, which proves the reliability of the proposed classification algorithm for large-size HSIs. According to qualitative experimental results, the TRP-MIV algorithm has misclassification for complex HSIs with large hyperspectral vectors. In addition, comparing the PRP-MGSR algorithm, the proposed algorithm simultaneously considers the inter-class distance and intra-class variance and can achieve better classification results. The proposed algorithm preserves the class separability so that each class is classified exactly. In theory, the more dimensionalities are reduced, the more the loss of spectral information will be, which results in an unsatisfactory classification effect. However, the proposed algorithm attains a classification effect better than the CAFCM algorithm for four experimental images. Moreover, the CAFCM algorithm needs to be projected into a space with a higher dimensionality than the high dimensionality so that it cannot achieve the effect of dimensionality reduction.
The kappa coefficient, overall accuracy (OA), average accuracy (AA), average precision rate (APR), and running time are calculated to effectively evaluate the classification performance; Tables 3-6 report these results for the four experimental images. From a quantitative perspective, the classification results of the proposed algorithm surpass those of the other algorithms. The running times of the proposed method for the four image classification experiments are about 1.16, 0.37, 0.21, and 2.42 s, respectively, and the mean values of the kappa coefficient are all larger than 0.82. All evaluation parameters intuitively indicate that the proposed algorithm accurately classifies the real HSIs. According to the quantitative experimental results, although the mean values of AA and APR are less than 80% for the Pavia Centre image, the mean value of OA is more than 89%. For the Salinas image, the variance of the kappa coefficient of the proposed algorithm over the 100 trials is approximately zero, which indicates that the classification results of the proposed algorithm are relatively stable. In addition, for the Chikusei image, the mean values of all evaluation parameters of the proposed algorithm exceed those of the comparison algorithms by at least 0.05, 3.59%, 4.73%, 7.28%, and 7.57 times, respectively, which fully reflects the superiority of the proposed classification algorithm. Although the kappa coefficient of the proposed algorithm is not very high for the LongKou image, its accuracy is still higher than that of all comparison algorithms; the values of OA, AA, and APR outperform the comparison algorithms by at least 16.51%, 21.35%, and 35.19%, respectively. In summary, the proposed algorithm can obtain reliable classification results in a very short time.
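All four accuracy measures can be derived from a single confusion matrix. A minimal NumPy sketch follows; the 3-class confusion matrix is a toy example for illustration, not data from the experiments:

```python
import numpy as np

def classification_metrics(conf):
    """Evaluation metrics from a confusion matrix (rows: true class, cols: predicted class)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    # Overall accuracy: fraction of all samples classified correctly.
    oa = np.trace(conf) / total
    # Average accuracy: mean of the per-class recall values.
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))
    # Average precision rate: mean of the per-class precision values.
    apr = np.mean(np.diag(conf) / conf.sum(axis=0))
    # Kappa coefficient: agreement corrected for chance agreement p_e.
    pe = np.sum(conf.sum(axis=0) * conf.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)
    return kappa, oa, aa, apr

# Toy 3-class confusion matrix (150 samples).
cm = [[50, 2, 3],
      [4, 40, 1],
      [2, 3, 45]]
kappa, oa, aa, apr = classification_metrics(cm)
print(round(oa, 3))  # → 0.9
```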

Discussion
In order to verify the superiority of the proposed classification algorithm, all classification experiments have been tested 100 times. The TRP-MIV algorithm [38] has better classification results for HSIs with a large inter-class distance. The PRP-MGSR algorithm is relatively stable in classification results in multiple tests. This is because the MGSR algorithm [14] has strong applicability to various data, but the running time is relatively long. The CAFCM algorithm [39] is more likely to be disturbed by noise, and the classification results of the algorithm are better for images with less noise. The proposed classification algorithm considers the class dissimilarity, which can reach the ideal classification result in a shorter time.
In order to use the proposed algorithm more widely, this section discusses the influence of the projection parameters on dimensionality reduction and the influence of the number of samples and the number of samplings on the classification accuracy. First, the focus is on analyzing the effects of the parameters ε and β and the number of sub-HSIs M on the lowest projection dimensionality $K_{PRP}^{0}$. According to Equation (5), taking the Salinas image as an example, Figure 7 presents the change curves of the lowest projection dimensionality with different parameters, where the horizontal coordinates are the different parameters and the vertical coordinates are $K_{PRP}^{0}$. Figure 7a is the change curve of $K_{PRP}^{0}$ versus ε when β = 0.5 and M = 2295 (according to Table 2), where the black dotted line marks the inflection point. When ε is equal to 1.5, the denominator term of Equation (5) is 0, so the curve in Figure 7a is discontinuous at ε = 1.5. When ε < 1.5, $K_{PRP}^{0}$ first decreases and then increases as ε increases, taking its minimum value at ε = 1. When ε > 1.5, $K_{PRP}^{0}$ increases with ε and is always negative, which is meaningless for dimensionality reduction. Figure 7b is the change curve of $K_{PRP}^{0}$ versus β when ε = 1 and M = 2295; as seen in Figure 7b, $K_{PRP}^{0}$ increases with β. Figure 7c is the change curve of $K_{PRP}^{0}$ versus M when ε = 1 and β = 0.5. The black dotted line marks the number of bands, and values above and to the left of this boundary cannot achieve dimensionality reduction. As seen from Figure 7c, $K_{PRP}^{0}$ decreases as M increases. Therefore, to reduce computation and memory, it is necessary to project the HSI into a space of smaller dimensionality. It is then best to set ε to 1 within (0, 1.5), to set β as small as possible, and to set M as large as possible.
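Equation (5) is not reproduced in this section, but the behavior described above (discontinuity at ε = 1.5, minimum at ε = 1, growth with β, decay with M) matches a Johnson-Lindenstrauss-style bound of the assumed form $K_{PRP}^{0} = (4 + 2\beta)\ln(N/M)/(\varepsilon^2/2 - \varepsilon^3/3)$, where N is the total number of hyperspectral vectors. A sketch under that assumption, using N = 111,104 (the pixel count of a 512 × 217 scene such as Salinas):

```python
import math

def k0_prp(eps, beta, n_vectors, m_subsets):
    """Hypothetical JL-style lower bound on the projection dimensionality,
    assumed consistent with the behavior described for Equation (5).
    The denominator vanishes at eps = 1.5, giving the discontinuity in Figure 7a."""
    denom = eps**2 / 2 - eps**3 / 3
    return (4 + 2 * beta) * math.log(n_vectors / m_subsets) / denom

# beta = 0.5 and M = 2295 sub-HSIs, as in the Salinas experiment.
N, M = 111_104, 2295
ks = {e: k0_prp(e, 0.5, N, M) for e in (0.5, 1.0, 1.4)}
# Within (0, 1.5) the bound is smallest at eps = 1, as observed in Figure 7a.
print(min(ks, key=ks.get))  # → 1.0
```

The same function reproduces the other two trends: increasing β raises the bound (larger numerator), while increasing M lowers it (fewer vectors per sub-HSI).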
Next, in order to reflect the impact of the number of samples H and the number of samplings T on classification performance, the change curves of the OA values with different T and H are shown in Figure 8. The numbers of samples and samplings are each taken from 10 to 650. Figure 8 shows that as T and H increase, the OA value varies only within a small range, and the larger the numbers of samplings and samples, the smaller this range of variation. Specifically, Figure 8a shows that as T increases, the OA value varies within 1%, and Figure 8b shows that as H increases, the OA value varies within 2.5%. Therefore, in practical applications, only a small number of samples and a few projection matrix samplings are needed to obtain good classification results. In addition, in order to ensure that the PRP algorithm can still maintain the distance after the HSI is divided, the number of samples in each class in this paper is set to be the same. The focus of future research will be on dividing an HSI into sub-HSIs with different numbers of hyperspectral vectors to improve utility.
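The sampling procedure discussed above can be sketched as follows. The exact form of the class dissimilarity measurement is an assumption here (mean inter-class distance between class means divided by mean intra-class variance, so that larger is better), and all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def dissimilarity(low, labels):
    """Assumed class dissimilarity score: mean inter-class distance between
    class means divided by mean intra-class variance (larger is better)."""
    classes = np.unique(labels)
    means = np.array([low[labels == c].mean(axis=0) for c in classes])
    inter = np.mean([np.linalg.norm(means[i] - means[j])
                     for i in range(len(classes))
                     for j in range(i + 1, len(classes))])
    intra = np.mean([low[labels == c].var(axis=0).sum() for c in classes])
    return inter / intra

def select_projection(samples, labels, k, trials):
    """Draw `trials` Gaussian projection matrices (T samplings) and keep the
    one that maximizes the class dissimilarity of the projected samples."""
    d = samples.shape[1]
    best, best_score = None, -np.inf
    for _ in range(trials):
        R = rng.standard_normal((d, k)) / np.sqrt(k)
        score = dissimilarity(samples @ R, labels)
        if score > best_score:
            best, best_score = R, score
    return best

# Synthetic data: H = 30 samples per class, 3 classes, 50 "bands", target dim 10.
H, d, k = 30, 50, 10
centers = rng.standard_normal((3, d)) * 3
X = np.vstack([c + rng.standard_normal((H, d)) for c in centers])
y = np.repeat([0, 1, 2], H)
R = select_projection(X, y, k, trials=10)
print(R.shape)  # → (50, 10)
```

The selected matrix R would then be applied unchanged to every sub-HSI, so the partition does not affect which projection is used.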

Conclusions
A new dimensionality reduction method, the PRP algorithm, is presented for large-size HSIs in this paper. It is worth mentioning that this paper also theoretically proves the distance preservation property of the PRP algorithm in detail. The PRP algorithm projects the large-size HSI into a space with a lower dimensionality than the RP algorithm. Therefore, when applied to the large-size HSI, the PRP algorithm no longer has the problem of the inability to reduce the dimensionality. In addition, the proposed algorithm utilizes the large inter-class distance and small intra-class variance as the class dissimilarity measurement for selecting the projection matrix, which can preserve the class separability of the low dimensional HSI well. Compared with the TRP-MIV, PRP-MGSR, and CAFCM algorithms, the proposed algorithm has the shortest running time and the highest classification accuracy.

Supposing that $C_k = R_k (u_m^n - u_{m'}^n)$. Because $C_k$ is a weighted average of independent standard normal variables, it also follows the standard normal distribution. The probability that the projected distance is preserved can be linked to the mathematical expectation of the projected distance according to Equation (A1). Then, since the $C_k$ are independent and identically distributed, the first scaling can be applied to $\Pr\left(\|v_m^n - v_{m'}^n\|^2 \geq (1+\varepsilon)\|u_m^n - u_{m'}^n\|^2\right)$ defined in Equation (6). For any constant $\lambda > 0$, continuing to construct the above formula gives
$$\Pr\left(\|v_m^n - v_{m'}^n\|^2 \geq (1+\varepsilon)\|u_m^n - u_{m'}^n\|^2\right) = \Pr\left(\exp(C\lambda) \geq \exp\left((1+\varepsilon)K_{PRP}\lambda\right)\right) \tag{A8}$$
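The distance-preservation property derived above can be checked empirically. This sketch assumes a Gaussian projection matrix scaled by $1/\sqrt{K}$ (the scaling convention is an assumption, not taken from the paper) and counts how often the projected squared distance stays within a (1 ± ε) factor of the original:

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical check of distance preservation under Gaussian random projection:
# each low dimensional squared distance equals the original one multiplied by
# a chi-squared(K)/K factor, which concentrates around 1 for moderate K.
D, K, eps = 200, 120, 0.5
u1, u2 = rng.standard_normal(D), rng.standard_normal(D)
d_high = np.sum((u1 - u2) ** 2)

trials, hits = 500, 0
for _ in range(trials):
    R = rng.standard_normal((K, D)) / np.sqrt(K)   # projection matrix
    d_low = np.sum((R @ u1 - R @ u2) ** 2)          # projected squared distance
    if (1 - eps) * d_high <= d_low <= (1 + eps) * d_high:
        hits += 1
print(hits / trials > 0.9)  # distance preserved in the vast majority of trials
```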