Spectral-Spatial Feature Extraction of Hyperspectral Images Based on Propagation Filter

Recently, image-filtering-based hyperspectral image (HSI) feature extraction has been widely studied. However, owing to limited spatial resolution and the complexity of feature distributions, two problems remain: cross-region mixing after filtering and reduced spectral discriminability. To address these issues, this paper proposes a spectral-spatial HSI feature extraction method based on the propagation filter (PF). Because the dimensionality (number of bands) of an HSI is typically high, principal component analysis (PCA) is first used to reduce it. Then, the principal components of the HSI are filtered with the PF. When cross-region mixing occurs in the image, the filter template reduces the weights assigned to cross-region mixed pixels, which handles such pixels simply and effectively. To validate the effectiveness of the proposed method, experiments are carried out on three common HSIs using support vector machine (SVM) classifiers with features learned by the PF. The experimental results demonstrate that the proposed method effectively extracts the spectral-spatial features of HSIs and significantly improves the accuracy of HSI classification.

Many DR models have been used to pre-process high-dimensional HSIs, including supervised, unsupervised, and semi-supervised DR methods [18]. Examples of supervised DR methods include linear discriminant analysis (LDA) [19] and nonparametric weighted feature extraction (NWFE) [20]; unsupervised methods include PCA [21], independent component analysis (ICA) [22] and superpixelwise PCA [23]; and semi-supervised DR methods include semi-supervised discriminant analysis (SDA) [24]. Among these methods, the supervised DR model LDA projects the samples from the high-dimensional feature space onto a low-dimensional space spanned by the most discriminant vectors, so that the newly extracted features satisfy class separability. However, when the classes are not linearly separable in the input space, LDA is expected to fail. The semi-supervised DR technique SDA adds a regularization term to the LDA algorithm to ensure that the local structure between the

Propagation Filter
The PF [34] is a smoothing filter in which the filtered value of each pixel of an HSI is obtained by

$$O_s = \frac{1}{Z_s} \sum_{t \in N_s} \omega_{s,t} I_t, \tag{1}$$

where $Z_s = \sum_{t \in N_s} \omega_{s,t}$ is the normalisation factor, $N_s$ is the set of neighbouring pixels of the central pixel $s$ within a window of size $(2w + 1) \times (2w + 1)$, $\omega_{s,t}$ is the weight of the neighbouring pixel $t$ used in filtering pixel $s$, and $I_t$ is the value of pixel $t$ in the HSI.
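Here, $\omega_{s,t} = P(s \rightarrow t)$ is defined as the weight between pixel $s$ and its adjacent pixel $t$ along the propagation path from $s$ to $t$, on which $t - 1$ denotes the pixel immediately preceding $t$. The weight between pixel $s$ and itself is $\omega_{s,s} = P(s \rightarrow s) = 1$; for $t \neq s$,

$$\omega_{s,t} = \omega_{s,t-1}\, D(t-1, t)\, R(s, t), \tag{2}$$

where the two distance-based terms $D(t-1, t)$ and $R(s, t)$ are defined by

$$D(t-1, t) = g(\lVert I_{t-1} - I_t \rVert; \sigma_\alpha), \tag{3}$$

$$R(s, t) = g(\lVert I_s - I_t \rVert; \sigma_r), \tag{4}$$

with the function $g(\cdot)$ being the Gaussian function

$$g(\lVert I_{t-1} - I_t \rVert; \sigma_\alpha) = \exp\left(-\frac{\lVert I_{t-1} - I_t \rVert^2}{2\sigma_\alpha^2}\right), \tag{5}$$

$$g(\lVert I_s - I_t \rVert; \sigma_r) = \exp\left(-\frac{\lVert I_s - I_t \rVert^2}{2\sigma_r^2}\right). \tag{6}$$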
Without loss of generality, it is assumed throughout this paper that $\sigma_\alpha = \sigma_r$, so that $D(\cdot)$ and $R(\cdot)$ are the same function, i.e. $D(\cdot) = R(\cdot)$.
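To make the weight recursion concrete, the following minimal sketch evaluates the weights along a single propagation path of scalar (single-band) pixel values; for multi-band pixels the absolute difference would be replaced by the Euclidean norm. The function names `g`, `path_weights` and `filtered_value` are hypothetical, and restricting the neighbourhood to one path is a simplification for illustration. The example shows the key property of the PF: a pixel with the same value as the centre receives a negligible weight once the path to it has crossed an edge.

```python
import math


def g(diff, sigma):
    """Gaussian g(|diff|; sigma) = exp(-diff^2 / (2 sigma^2)) for scalar pixel values."""
    return math.exp(-diff ** 2 / (2.0 * sigma ** 2))


def path_weights(path, sigma_a, sigma_r):
    """Weights omega_{s,t} along one propagation path.

    path[0] is the centre pixel s and path[i - 1] is the predecessor t - 1
    of path[i]; omega_{s,s} = 1 and
    omega_{s,t} = omega_{s,t-1} * D(t-1, t) * R(s, t).
    """
    s = path[0]
    weights = [1.0]
    for prev, cur in zip(path, path[1:]):
        d = g(prev - cur, sigma_a)  # D(t-1, t): similarity to the predecessor
        r = g(s - cur, sigma_r)     # R(s, t): similarity to the centre pixel
        weights.append(weights[-1] * d * r)
    return weights


def filtered_value(values, weights):
    """Normalised weighted sum: sum_t omega_{s,t} I_t / Z_s, Z_s = sum_t omega_{s,t}."""
    z_s = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z_s


# a path that crosses an edge (value 10) and then returns to the centre value 0
path = [0.0, 0.0, 10.0, 0.0]
omega = path_weights(path, sigma_a=1.5, sigma_r=1.5)
print(omega[1], omega[3] < 1e-20)   # 1.0 True
print(filtered_value(path, omega))  # very close to 0.0
```

Because each weight multiplies the weight of its predecessor, a single dissimilar pixel on the path suppresses everything beyond it, which is how the PF limits cross-region mixing.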

Spectral-Spatial Feature Extraction Method Based on the PF
As shown in Figure 1a, the cross-region mixture problem is quite common in HSIs. In particular, a lower spatial resolution increases the number of classes that may fall within a single pixel: as the ground sample distance increases, the objects covered by a given pixel are more likely to be mixed [34]. Therefore, this paper presents an HSI spectral-spatial feature extraction algorithm that exploits the ability of the PF to handle the cross-region mixture problem [35]. As seen in Equations (1)-(6) and Figure 1b-d, the PF generates a new value for the center pixel as a weighted sum of its neighbouring pixels in the HSI. When the adjacent pixel t, the center pixel s and pixel t − 1 in the neighbouring pixel set all belong to the same class, the weight of pixel t is relatively large. In Figure 1d, the selected pixel t − 1 is adjacent to pixel t and the propagation path points from t − 1 to t; pixel s is shown in yellow, pixel t in red and pixel t − 1 in green. However, when there are mixed pixels in the neighbouring pixel set, the weight of pixel t is smaller. Therefore, the PF enhances the similar features of pixels of the same class and suppresses the effects of cross-region mixed pixels.

In addition, to improve the performance of the PF for feature extraction in HSIs, PCA is performed before filtering: it reduces the dimensionality of the HSI and greatly reduces the redundant information between bands. Although the HSI loses a small amount of information after PCA, the resulting components are sorted according to the importance of the information they carry. After the PF process, emphasising the important components and de-emphasising the less important ones is beneficial for feature extraction and improves the classification accuracy.
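The PCA step described above can be sketched with NumPy as follows. This is a minimal illustration, assuming the HSI is stored as an (H, W, d) array and that PCA is computed from the singular value decomposition of the mean-centred pixel matrix; the function name `pca_reduce` is hypothetical, and the source does not state whether the components were further scaled or whitened.

```python
import numpy as np


def pca_reduce(cube, k):
    """Reduce an (H, W, d) hyperspectral cube to its first k principal components.

    Returns an (H, W, k) array whose components are ordered by decreasing
    explained variance, together with those k variances.
    """
    H, W, d = cube.shape
    X = cube.reshape(-1, d).astype(float)  # one row per pixel (astype copies)
    X -= X.mean(axis=0)                    # centre each band
    # right singular vectors of the centred data are the principal axes,
    # returned in order of decreasing singular value
    _, svals, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:k].T                  # project onto the first k axes
    variances = svals[:k] ** 2 / (X.shape[0] - 1)
    return scores.reshape(H, W, k), variances


rng = np.random.default_rng(0)
cube = rng.standard_normal((10, 10, 30))   # toy cube standing in for an HSI
reduced, var = pca_reduce(cube, 5)
print(reduced.shape)                       # (10, 10, 5)
```

The decreasing variances returned alongside the scores are what "sorted according to the importance of the information" refers to; keeping the first k of them (k = 45 in the experiments) discards the least informative directions.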
The specific process is shown in Figure 2. In the first step, PCA is used to reduce the dimensionality and remove the redundant inter-spectral information, yielding the principal components of the HSI. Then, the PCA features are filtered with the PF. When cross-region mixing occurs in the image, the filter template reduces the influence of cross-region mixed pixels on the object pixel, thereby avoiding or effectively mitigating their effects. Through this technique, the proposed method can accurately extract spectral-spatial features that reflect the real objects. Finally, to validate the effectiveness of the proposed method, experiments are carried out on HSIs using an SVM classifier trained on the learned spectral-spatial features. Algorithm 1 depicts the proposed HSI spectral-spatial feature extraction model based on the PF.

Algorithm 1: Spectral-spatial feature extraction based on the PF.
Data: HSI $I = (I_1, I_2, \cdots, I_n) \in \mathbb{R}^{d \times n}$, where $d$ is the number of HSI spectral bands and $n$ is the number of pixels; filter window parameter $w$ (window size $(2w + 1) \times (2w + 1)$); standard deviation $\sigma_\alpha$ ($= \sigma_r$) of the Gaussian function.
Result: spectral-spatial features $O = (O_1, O_2, \cdots, O_n) \in \mathbb{R}^{k \times n}$, where $k$ is the reduced dimension.
1 Reduce the dimensionality of $I$ from $d$ to $k$ using PCA; denote the dimensionality-reduced HSI by $\tilde{I} = (\tilde{I}_1, \tilde{I}_2, \cdots, \tilde{I}_n) \in \mathbb{R}^{k \times n}$;
2 for each principal component $j = 1, \ldots, k$ and each pixel $s$ do
3     Using Equation (6), compute $R(s, t)$ from the pixel value distance between pixel $s$ and each pixel $t \in N_s$;
4     Using Equation (5), compute $D(t - 1, t)$ from the pixel value distance between pixel $t$ and pixel $t - 1$;
5     Using Equation (2), calculate the weight $\omega_{s,t}$ of each pixel $t$ in the neighbouring set $N_s$;
6     Using Equation (1), calculate the value $O_s$ of pixel $s$ output by the PF operation;
7 end for
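Steps 2-6 of Algorithm 1 can be sketched in Python as follows. This is a slow, unvectorised reference sketch under stated assumptions: the PCA output is stored as an (H, W, k) NumPy array; the propagation path rule (the predecessor t − 1 of a neighbour t is the pixel one step closer to s along each non-zero axis, so offsets are visited in rings of increasing chessboard distance) is a simplification assumed here and is not necessarily the exact path scheme of [34]; neighbours outside the image are skipped; and the names `propagation_filter` and `filter_components` are hypothetical.

```python
import numpy as np


def _sign(v):
    # integer sign of v: -1, 0 or 1
    return (v > 0) - (v < 0)


def _g(diff, sigma):
    # Gaussian g(||diff||; sigma) = exp(-||diff||^2 / (2 sigma^2)) on a pixel vector
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def propagation_filter(cube, w, sigma_a, sigma_r):
    """Simplified PF over an (H, W, C) array with a (2w + 1) x (2w + 1) window."""
    cube = np.asarray(cube, dtype=float)
    H, W, _ = cube.shape
    out = np.empty_like(cube)
    for y in range(H):
        for x in range(W):
            I_s = cube[y, x]
            omega = {(0, 0): 1.0}      # omega_{s,s} = 1
            num = I_s.copy()           # running sum of omega_{s,t} * I_t
            z_s = 1.0                  # running normalisation factor Z_s
            for r in range(1, w + 1):  # rings of growing chessboard radius
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        if max(abs(dy), abs(dx)) != r:
                            continue   # only offsets on ring r
                        ty, tx = y + dy, x + dx
                        if not (0 <= ty < H and 0 <= tx < W):
                            continue   # neighbour outside the image
                        # assumed predecessor t - 1: one step toward s, on ring r - 1
                        pdy, pdx = dy - _sign(dy), dx - _sign(dx)
                        I_t = cube[ty, tx]
                        I_prev = cube[y + pdy, x + pdx]
                        wt = (omega[(pdy, pdx)]
                              * _g(I_prev - I_t, sigma_a)  # D(t - 1, t)
                              * _g(I_s - I_t, sigma_r))    # R(s, t)
                        omega[(dy, dx)] = wt
                        num += wt * I_t
                        z_s += wt
            out[y, x] = num / z_s      # Equation (1)
    return out


def filter_components(pcs, w=8, sigma=1.5):
    """Loop of Algorithm 1: filter each principal component separately."""
    return np.concatenate(
        [propagation_filter(pcs[..., j:j + 1], w, sigma, sigma)
         for j in range(pcs.shape[2])], axis=2)
```

The defaults w = 8 and σ = 1.5 follow the parameter settings reported in the experiments. Passing all k components to `propagation_filter` at once would instead compute the distances on full k-dimensional pixel vectors; the per-component wrapper follows the loop over components in Algorithm 1. With w = 8 each pixel visits up to 288 neighbours, so a vectorised or compiled implementation would be needed for full scenes.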

Experimental Settings
In this paper, the training and testing samples for each HSI dataset were chosen randomly. In the experiments shown in Table 1, 20 labelled samples were randomly selected from each class for training, and the remaining samples were used for testing, to verify the performance of the proposed method on the three datasets. To compare the classification performance of the different methods with both sufficient and insufficient training samples, in the experiments shown in Table 5, 10 to 50 training samples were selected from each class and the rest were used as test samples. For stability, each experiment was repeated 10 times, and the reported results are the averages.
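The per-class random sampling described above can be sketched as follows. This assumes the class labels of the labelled (non-background) pixels are stored in a 1-D integer array; the function name `split_per_class` is hypothetical, and because the source does not say how classes with fewer samples than requested were handled, this sketch simply raises an error in that case.

```python
import numpy as np


def split_per_class(labels, n_train, seed=None):
    """Randomly pick n_train labelled pixels per class for training; the rest are test pixels.

    Returns (train_idx, test_idx) as sorted index arrays into labels.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < n_train:
            raise ValueError(f"class {c} has only {len(idx)} samples")
        # sample without replacement inside the class
        train.extend(rng.choice(idx, size=n_train, replace=False))
    train = np.sort(np.array(train, dtype=int))
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test


labels = np.repeat(np.arange(3), [30, 25, 40])  # toy labels for three classes
tr, te = split_per_class(labels, 20, seed=0)
print(len(tr), len(te))                         # 60 35
```

Repeating the split with ten different seeds and averaging the resulting scores mirrors the 10 repetitions described above.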

Dataset Description
Three real HSI datasets are used in this paper: the Indian Pines, Salinas and University of Pavia scenes. The Indian Pines image was acquired by the AVIRIS sensor over the agricultural Indian Pines test site in northwestern Indiana, USA. The image contains 145 × 145 pixels and 16 classes of ground objects; of its 224 bands, only 200 are used after removing the water absorption bands. The spatial resolution is 20 m per pixel, and the spectral range is 0.4 to 2.5 µm. The Salinas image, acquired by the AVIRIS sensor over the Salinas Valley in California, USA, contains 512 × 217 pixels and 16 classes of ground objects at a 3.7-m spatial resolution. After 20 of the 224 bands were removed due to noise and water absorption, the remaining 204 bands were used in the experiments. The University of Pavia image, acquired by the ROSIS sensor over the urban area around the University of Pavia, contains 610 × 340 pixels at a 1.3-m spatial resolution. The image has a spectral range of 0.43 to 0.86 µm with 115 bands; 12 bands that were noisy or affected by water absorption were removed, and the remaining 103 bands were used.

Compared Algorithms
In the experiments, the proposed classification method, PCA-PF-SVM, was compared with other widely used HSI classification methods, including SVM [11], PCA-SVM [36], PCA-Gabor-SVM [28], EPF-SVM [29], HiFi [30], LBP-SVM [37], R-VCANet-SVM [38] and PF-SVM. The parameters of these methods were set to the default values given in the related literature, and the source code for the algorithms was provided by the respective authors. The SVM classifier was based on the LIBSVM library [39], and its optimal parameters were determined by fivefold cross-validation. The overall accuracy (OA), the average accuracy (AA), and the kappa coefficient are used to evaluate the performance of the methods. The OA is the proportion of test pixels whose predicted labels agree with the reference labels. The AA is the mean of the per-class accuracies, i.e. of the percentages of correctly classified pixels in each class. The kappa coefficient measures the agreement between the classification results and the reference data after correcting for the agreement expected by chance.
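The three evaluation measures can all be computed from the confusion matrix. The following minimal NumPy sketch assumes integer class labels 0 to n_classes − 1 and that every class has at least one reference pixel; the function name `classification_scores` is hypothetical.

```python
import numpy as np


def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, kappa) for integer labels in 0..n_classes-1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)     # rows: reference, columns: predicted
    n = cm.sum()
    oa = np.trace(cm) / n                  # proportion of correctly labelled pixels
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean of per-class accuracies
    pe = np.dot(cm.sum(axis=0), cm.sum(axis=1)) / n ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa


oa, aa, kappa = classification_scores([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], 2)
# confusion matrix [[3, 1], [0, 2]]: OA = 5/6, AA = (3/4 + 2/2) / 2 = 0.875,
# p_e = (3*4 + 3*2) / 36 = 0.5, kappa = (5/6 - 0.5) / 0.5 = 2/3
```

AA weights every class equally, so it exposes poor accuracy on small classes that OA can hide, while kappa discounts agreement that would occur by chance.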

Parameter Sensitivity Analysis
The proposed PCA-PF-SVM method has three important parameters: the filtering standard deviation $\sigma_\alpha$ ($\sigma_r$), the filtering window size parameter $w$ and the feature dimension $k$. To test the influence of different parameter settings on the proposed model, extensive experiments were conducted on the Indian Pines scene. As shown in Figure 3a, the best OA, AA and kappa values were achieved when $\sigma_\alpha$ ($\sigma_r$) = 1.5. In contrast, when $\sigma_\alpha$ ($\sigma_r$) < 1.5, the accuracies decreased significantly because a small $\sigma_\alpha$ ($\sigma_r$) makes the Gaussian weights decay rapidly with pixel-value differences, so that few neighbouring pixels contribute to the filtered value and little spatial information is exploited. When $\sigma_\alpha$ ($\sigma_r$) > 1.5, the classification accuracy remains relatively stable, because once the filter parameter reaches a certain value, the filter suppresses interfering information sufficiently and further increases have little effect. As shown in Figure 3b, the best OA, AA and kappa values were achieved when $w = 8$, i.e. with a 17 × 17 window. The values are significantly lower when $w < 8$ because considerable important spatial information is lost when the window is too small. The values also decrease when $w > 8$ because the window then contains more irrelevant information, which weakens the contribution of the important spatial information and thus reduces the classification accuracy. As shown in Figure 3c, OA increases with the number of PCA dimensions until $k$ reaches 45, after which it tends to decrease; $k$ was therefore set to 45 as a tradeoff between computational complexity and classification accuracy. In summary, the parameters in all of our experiments were set as follows: $\sigma_\alpha$ ($\sigma_r$) = 1.5, $w = 8$ and $k = 45$.
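The effect of the two filter parameters can be checked with a few lines of arithmetic. This sketch evaluates the Gaussian weight g(·; σ) for a fixed, hypothetical pixel-value difference of 1.0 at several values of σ, and the side length of the (2w + 1) × (2w + 1) window; the function names are hypothetical.

```python
import math


def gaussian_weight(diff, sigma):
    """g(diff; sigma) = exp(-diff^2 / (2 sigma^2)) for a scalar pixel-value difference."""
    return math.exp(-diff ** 2 / (2.0 * sigma ** 2))


def window_side(w):
    """Side length of the (2w + 1) x (2w + 1) PF window."""
    return 2 * w + 1


for sigma in (0.5, 1.5, 3.0):
    print(sigma, round(gaussian_weight(1.0, sigma), 3))
# 0.5 0.135
# 1.5 0.801
# 3.0 0.946
print(window_side(8), window_side(8) ** 2)  # 17 289
```

For the same difference, the weight shrinks quickly as σ decreases, and since the PF multiplies such factors along each propagation path, a small σ confines the effective neighbourhood to a few very similar pixels.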

Experimental Results
(1) The proposed PCA-PF-SVM method makes effective use of spatial information. According to the corresponding figures and Tables 2-4, the PCA-PF-SVM method achieves better OA, AA and kappa values than the spectral-only classification methods. The OA values of the proposed PCA-PF-SVM method on the Indian Pines, Salinas and University of Pavia datasets are 36.14%, 8.87% and 17.78% higher, respectively, than those of the PCA-SVM method and 25.32%, 11.15% and 14.68% higher, respectively, than those of the SVM method. The main reason is that the spectral classification methods do not consider spatial information, whereas the proposed method fully exploits it. These results verify that the proposed method is effective in spectral-spatial feature extraction.
(2) The results verify that combining PCA and the PF is effective for HSI feature extraction. Figures 4-6 and Tables 2-4 show that PCA dimensionality reduction alone does not necessarily improve the SVM classification performance and may even reduce it. For example, the OA of the PCA-SVM method on the Indian Pines dataset is lower than that of the SVM method. This mainly occurs because, although PCA preserves the main information of the HSI, it also discards a small amount of information, which affects the SVM classification accuracy. However, the combination of PCA and the PF greatly enhances the performance: the OA values of the proposed PCA-PF-SVM method on the Indian Pines, Salinas and University of Pavia datasets are 13.26%, 3.42% and 7.86% higher, respectively, than those of the PF-SVM method. These experimental results show that applying PCA dimensionality reduction before filtering is beneficial.

(3) The proposed method is more effective than the other advanced classification methods. As shown in Figures 4-6 and Tables 2-4, the PCA-PF-SVM method performs very well in terms of OA and kappa compared with the other methods. On the Indian Pines, Salinas and University of Pavia datasets, the PCA-PF-SVM method clearly outperforms the HiFi-We, LBP-SVM and R-VCANet-SVM methods: its OA values are 1.77%, 5.61% and 1.93% higher, respectively, than those of the HiFi-We method; 2.89%, 2.14% and 8.59% higher, respectively, than those of the LBP-SVM method; and 8.36%, 4.53% and 3.38% higher, respectively, than those of the R-VCANet-SVM method.
(4) The experimental results demonstrate the robustness of the proposed PCA-PF-SVM method. As shown in Figures 7-9 and Table 5, as the number of training samples per class varies from 10 to 50, the proposed method achieves the highest OA in all cases. Its advantage is especially obvious when the number of training samples is small. For example, with 10 training samples per class, the OA of our method is 3.12-36.31% higher on the Indian Pines image, 3.5-20.29% higher on the Salinas image and 3.31-23.43% higher on the University of Pavia image than those of the other methods. This result is of practical importance: a large number of unlabelled samples can be classified using a much smaller number of labelled samples, which greatly reduces the labelling effort and further illustrates the robustness of the proposed method.
(5) These experimental results show that the proposed method is useful for addressing the cross-region mixture problem of HSIs. Figure 10 presents the complete classification maps obtained by PCA-PF-SVM together with the ground truth maps. The proposed method achieves better results on the cross-region mixture problem: in the cross-region areas marked by white boxes in the three maps, the PF reduces the cross-region mixing, which better preserves the image features and further improves the classification accuracy.

The t-test is widely used in related works [40][41][42]. We accept the hypothesis that the mean kappa of PCA-PF-SVM is larger than that of a compared method only if Equation (7) holds:

$$\frac{(\bar{a}_1 - \bar{a}_2)\sqrt{n_1 + n_2 - 2}}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\left(n_1 s_1^2 + n_2 s_2^2\right)}} > t_{1-\alpha}[n_1 + n_2 - 2], \tag{7}$$

where $\bar{a}_1$ and $\bar{a}_2$ are the mean kappa values of PCA-PF-SVM and a compared method, $s_1$ and $s_2$ are the corresponding standard deviations, $n_1$ and $n_2$ are the numbers of experimental realizations, both set to 10 in this paper, and $t_{1-\alpha}[n_1 + n_2 - 2]$ is the $(1 - \alpha)$ quantile of Student's t-distribution with $n_1 + n_2 - 2$ degrees of freedom. The t-test shows that the increases in kappa are statistically significant on all three datasets at the 95% level ($\alpha = 0.05$), which can also be observed in Figure 11.
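The significance check can be sketched with the Python standard library. The sketch assumes one standard form of the one-sided two-sample t test built from the quantities defined above (means, standard deviations and n1 = n2 = 10 runs), using population standard deviations so that it equals the pooled-variance t statistic; it is an illustration, not necessarily the authors' exact computation. The critical value 1.734 is the tabulated one-sided 95% quantile of Student's t-distribution with 18 degrees of freedom, and the function names and kappa values below are hypothetical.

```python
import math
from statistics import mean, pstdev

# one-sided critical value t_{0.95} for 18 degrees of freedom (n1 = n2 = 10),
# taken from a standard Student t table
T_CRIT_95_DF18 = 1.734


def t_statistic(a1, a2):
    """(m1 - m2) * sqrt(n1 + n2 - 2) / sqrt((1/n1 + 1/n2) * (n1*s1^2 + n2*s2^2)).

    s1 and s2 are population standard deviations, which makes this identical
    to the pooled-variance two-sample t statistic.
    """
    n1, n2 = len(a1), len(a2)
    s1, s2 = pstdev(a1), pstdev(a2)
    num = (mean(a1) - mean(a2)) * math.sqrt(n1 + n2 - 2)
    den = math.sqrt((1 / n1 + 1 / n2) * (n1 * s1 ** 2 + n2 * s2 ** 2))
    return num / den


def significantly_larger(a1, a2):
    """True if mean(a1) > mean(a2) at the one-sided 95% level (10 runs each)."""
    if len(a1) + len(a2) - 2 != 18:
        raise ValueError("the tabulated critical value assumes n1 = n2 = 10")
    return t_statistic(a1, a2) > T_CRIT_95_DF18


kappa_proposed = [0.90, 0.92] * 5  # hypothetical kappa values over 10 runs
kappa_other = [0.85, 0.87] * 5
print(significantly_larger(kappa_proposed, kappa_other))  # True
```

For other numbers of runs, the critical value would have to be looked up for n1 + n2 − 2 degrees of freedom, for example with a statistics library's Student t quantile function.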

Conclusions
The motivation for this study was to develop a simple feature extraction method to handle the cross-region mixing problem of HSIs. The developed method extracts spectral-spatial features via the PF. However, the high dimensionality of HSIs degrades the PF's performance to a certain extent. To address this, PCA is first used to reduce the dimensionality of the image, resulting in the combined PCA-PF feature extraction method proposed in this paper. To evaluate its performance, comparative experiments were conducted on three classical datasets with cross-region mixing problems of different complexity. The results show that the proposed method effectively alleviates the cross-region mixture problem. In addition, the features extracted by the proposed method were also classified with the nearest regularized subspace (NRS) and extreme learning machine (ELM) classifiers and compared with PCA-Gabor-NRS and LBP-ELM. As shown in Table 6, the proposed method obtains better classification results than the compared methods.