Multiscale Union Regions Adaptive Sparse Representation for Hyperspectral Image Classification

Sparse representation has been widely applied to the classification of hyperspectral images (HSIs). Besides spectral information, the spatial context in HSIs also plays an important role in classification. The recently published Multiscale Adaptive Sparse Representation (MASR) classifier has shown good performance in exploiting spatial information for HSI classification. However, MASR exploits spatial information through multiscale patches, which are square windows of fixed sizes. A patch includes all the nearest neighbors of a pixel, but these neighbors may contain noise pixels. A later study proposed the Multiscale Superpixel-Based Sparse Representation (MSSR) classifier. Shape-adaptive superpixels can provide a more accurate representation than patches, but it is difficult to select the scales for superpixels. Therefore, inspired by the merits and demerits of multiscale patches and superpixels, we propose a novel algorithm called Multiscale Union Regions Adaptive Sparse Representation (MURASR). The union region, which combines a patch with its corresponding superpixel, can make full use of the advantages of both and overcome the weaknesses of each. Experiments on several HSI datasets demonstrate that the proposed MURASR is superior to MASR and that the union region is better than the patch for sparse representation.


Introduction
Hyperspectral images have been widely used in remote sensing applications such as land cover classification [1], target detection [2], anomaly detection [3], spectral unmixing [4], and others. Each pixel in an HSI has hundreds of narrow contiguous bands spanning the visible to infrared spectrum [5], which makes it possible to detect and distinguish various objects with higher accuracy [6]. However, increasing the number of spectral bands or features of an HSI pixel does not always increase the classification accuracy. Therefore, how to make full use of the information in HSIs remains a problem in practical applications.
Many algorithms have been developed for the classification of HSIs. Among these are some well-known pixelwise classifiers, such as the support vector machine (SVM) [7][8][9], the support vector conditional random field classifier [10], multinomial logistic regression [11], the neural network [12], and the adaptive artificial immune network [13]. These pixelwise classifiers can make full use of the spectral information of HSIs, but their classification results are often noisy because spatial information is not considered.
Therefore, some recent studies have incorporated spatial information into HSI classification to enhance classification performance. The basic way to use spatial information is to assume that the pixels within a local region usually represent the same material and have similar spectral characteristics [1]. Various studies [14][15][16][17][18][19][20][21][22][23][24][25] have been based on this assumption. Besides these studies, sparse representation (SR), which is based on the observation that spectral pixels of a particular class should lie in a low-dimensional subspace spanned by dictionary atoms (training pixels) from the same class, has also been employed. In [26], a Joint Sparse Representation Classification (JSRC) method was proposed to incorporate spectral and spatial information. The spatial information is expressed by a fixed-size local square window centered on the test pixel. All pixels in the window are then jointly represented by a few common atoms in the specified dictionary. JSRC can achieve good performance, but the optimal window size cannot be determined easily. In [27], a stepwise Markov random field (MRF) optimization was proposed to exploit spatial information based on the result of multitask joint sparse representation. In [28], MASR was proposed to alleviate the difficulty of choosing the region size. Instead of choosing a single scale, this method extends the spatial information to several scales to take advantage of correlations among multiple region scales for HSI classification. However, the multiscale regions used in MASR are multiscale patches, which may contain noise pixels. Compared with a patch, a shape-adaptive superpixel can provide more accurate spatial information. In [29], the superpixel was introduced to replace the patch region. In [30], a shape-adaptive local smooth region was generated for each test pixel by a shape-adaptive algorithm. The latest study proposed a Multiscale Superpixel-Based Sparse Representation [31]. In that work, multiscale superpixels were generated, and each scale was then represented by JSRC. Finally, a fused result was obtained from the multiscale results by majority voting. However, the selection of scales for superpixels is still a problem. Although the method uses multiple scales to alleviate the difficulty of selecting a segmentation scale, it still needs a fundamental number of superpixels that is determined empirically.
In fact, the patch and the superpixel each have their own advantages and shortcomings. The patch includes all the nearest neighbors, but it may also contain noise pixels. The shape-adaptive superpixel can exploit more accurate spatial information, but some mixed superpixels remain when the scale is not optimal. In a mixed superpixel, some pixels are inevitably represented wrongly because all pixels in the superpixel share the same representation. Inspired by the merits and demerits of the patch and the superpixel, we propose to use a union region to replace them. The union region is formed by overlaying the patch and the superpixel and keeping all pixels covered by either. Compared with the patch, the union region includes more pixels similar to the test pixel, which decreases the effect of noise pixels. Compared with the superpixel, the union region provides more direct neighbors for the test pixel, which enhances the representation of pixels located in the wrong superpixel. In addition, the superpixels required for generating union regions do not need an empirically chosen scale: the scales are determined by the size of the image and the corresponding patch sizes. By replacing the patch in MASR with the union region, we obtain a new algorithm called Multiscale Union Regions Adaptive Sparse Representation (MURASR). MURASR also adopts a probability majority voting method to optimize the classification result generated from the sparse representation. Experimental results show that the union region based algorithms always perform better than the patch region based algorithms and that the proposed MURASR outperforms other algorithms in terms of quantitative metrics and the visual quality of the classification maps.
The rest of the paper is organized as follows. JSRC and MASR are briefly introduced in Section 2. The details of the proposed MURASR method are described in Section 3. The experimental results and discussions are presented in Section 4. Finally, Section 5 summarizes the paper and suggests future work. The outline of MURASR is illustrated in Figure 1.

JSRC
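The sparse representation classification (SRC) framework was first proposed for face recognition [32]. Chen et al. then extended SRC to pixelwise HSI classification, relying on the observation that spectral pixels of a particular class should lie in a low-dimensional subspace spanned by dictionary atoms (training pixels) from the same class. However, pixelwise sparse representation does not consider spatial information. Therefore, based on the observation that neighboring pixels belonging to the same class are usually strongly correlated with each other, JSRC was introduced to capture such spatial correlations by assuming that neighboring pixels within a region of fixed size can be jointly represented by a few common atoms from a structural dictionary. Concretely, let $y \in \mathbb{R}^{M \times 1}$ be a pixel, with $M$ denoting the number of spectral bands, and let $D = [D_1, \dots, D_c, \dots, D_C] \in \mathbb{R}^{M \times N}$ be a structural dictionary, where $D_c \in \mathbb{R}^{M \times N_c}$, $c = 1, \dots, C$, is the $c$th class subdictionary whose columns (atoms) are extracted from the training pixels; $C$ is the number of classes; $N_c$ is the number of atoms in subdictionary $D_c$; and $N = \sum_{c=1}^{C} N_c$ is the total number of atoms in $D$. The size of a region surrounding the test pixel $y_1$ is denoted by $W \times W$, and the pixels within such a region can be denoted by a matrix $Y = [y_1, y_2, \dots, y_{W \times W}]$. The matrix can be compactly represented as:

$$Y = [y_1, y_2, \dots, y_{W \times W}] = [D A_1, D A_2, \dots, D A_{W \times W}] = D A$$

where $A = [A_1, A_2, \dots, A_{W \times W}]$ is the sparse coefficient matrix corresponding to $Y$. Since the indexes of the selected atoms in $D$ are determined by the positions of the nonzero coefficients in $[A_1, A_2, \dots, A_{W \times W}]$, the neighboring pixels $[y_1, y_2, \dots, y_{W \times W}]$ can be represented by a small set of common atoms by enforcing a few nonzero rows on the sparse coefficient matrix $A$. Matrix $A$ can then be obtained by solving the following optimization problem:

$$\hat{A} = \arg\min_{A} \| Y - D A \|_F \quad \text{subject to} \quad \| A \|_{row,0} \le K$$

where $K$ is the sparsity level, $\| A \|_{row,0}$ denotes the joint sparse norm, which is used to select a number of the most representative nonzero rows in $A$, and $\| \cdot \|_F$ is the Frobenius norm. A variant of the OMP algorithm called simultaneous OMP (SOMP) [33,34] can be used to efficiently obtain an approximate solution. After $\hat{A}$ is recovered, the label of the test pixel $y_1$ can be decided by the minimal total error:

$$\hat{c} = \arg\min_{c = 1, \dots, C} \| Y - D_c \hat{A}_c \|_F$$

where $\hat{A}_c$ denotes the rows in $\hat{A}$ associated with the $c$th class.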
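The joint sparse recovery and the minimal-error decision described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`somp`, `classify`), the toy dictionary, and the choice of scoring atoms by the l2 norm of their correlations across all pixels are assumptions made here. It also assumes that the sparsity level does not exceed the number of atoms and that the selected atoms are linearly independent.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(gram, rhs):
    """Solve gram @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(gram)
    aug = [list(gram[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]  # zero here would mean linearly dependent atoms
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reconstruct(atoms, support, coef, p, keep=None):
    """Approximate pixel p from the selected atoms (optionally only those in keep)."""
    out = [0.0] * len(atoms[0])
    for i, n in enumerate(support):
        if keep is None or n in keep:
            for b, value in enumerate(atoms[n]):
                out[b] += coef[i][p] * value
    return out

def somp(atoms, pixels, sparsity):
    """Greedy joint recovery: all pixels share the same selected atoms (rows of A)."""
    support, coef = [], []
    residuals = [list(y) for y in pixels]
    for _ in range(sparsity):
        # Score each unused atom by the l2 norm of its correlations with all residuals.
        scores = {n: math.sqrt(sum(dot(a, r) ** 2 for r in residuals))
                  for n, a in enumerate(atoms) if n not in support}
        support.append(max(scores, key=scores.get))
        chosen = [atoms[n] for n in support]
        # Least-squares coefficients on the current support via the normal equations.
        gram = [[dot(a, b) for b in chosen] for a in chosen]
        rhs = [[dot(a, y) for y in pixels] for a in chosen]
        coef = solve(gram, rhs)
        residuals = [[yb - xb for yb, xb in zip(y, reconstruct(atoms, support, coef, p))]
                     for p, y in enumerate(pixels)]
    return support, coef

def classify(atoms, atom_labels, pixels, sparsity):
    """Pick the class whose selected atoms give the smallest Frobenius residual."""
    support, coef = somp(atoms, pixels, sparsity)
    errors = {}
    for c in sorted(set(atom_labels)):
        keep = {n for n in support if atom_labels[n] == c}
        squared = 0.0
        for p, y in enumerate(pixels):
            approx = reconstruct(atoms, support, coef, p, keep)
            squared += sum((a - b) ** 2 for a, b in zip(y, approx))
        errors[c] = math.sqrt(squared)
    return min(errors, key=errors.get), support

# Toy data (hypothetical): three atoms with M = 3 bands, two neighboring pixels.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
atom_labels = [0, 0, 1]
pixels = [[0.9, 0.3, 0.05], [0.8, 0.4, 0.0]]
label, support = classify(atoms, atom_labels, pixels, sparsity=2)
```

In this toy case both pixels are mostly explained by the two class-0 atoms, so the class-0 residual is the smaller one. A practical implementation would use a numerical library and normalized dictionary atoms.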

MASR
Compared with the pixelwise SRC model, JSRC can achieve more accurate classification results because it incorporates the spatial information of local regions. However, the region size (or region scale) has a great influence on the classification performance, so it is important to determine an optimal region scale for JSRC.
Fang et al. then proposed MASR to alleviate the difficulty of choosing the region scale. MASR effectively exploits spatial information at multiple scales via an adaptive sparse strategy. The adaptive sparse strategy not only restricts pixels from different scales to be represented by training atoms from a particular class but also allows the selected atoms for these pixels to vary, thus providing an improved representation. Given one test pixel $y_1$ in an HSI, its $T$ neighboring regions are selected via different predefined scales. The neighboring regions are defined by multiscale patches centered on the test pixel. A multiscale matrix $Y^{mp} = [Y_1, \dots, Y_t, \dots, Y_T]$ can then be constructed from the pixels within the selected regions, where $Y_t$ includes the pixels from the $t$th scale region. Since the spatial structures and characteristics of regions at different scales are distinct, the multiscale matrix $Y^{mp}$ generated for the test pixel $y_1$ should provide complementary yet correlated information, which can be utilized to classify $y_1$ more accurately.
In MASR, an adaptive sparse strategy is adopted to utilize the correlated information among multiple scales and to achieve a flexible selection process for atoms. An important part of the adaptive strategy is the adoption of a collection of adaptive sets. Each adaptive set consists of the indexes of a set of nonzero scalar coefficients that belong to the same class in the multiscale sparse matrix $A^{mp}$. By combining the adaptive set with the $\| \cdot \|_{row,0}$ norm, a new adaptive norm $\| \cdot \|_{adaptive,0}$ is created on $A^{mp}$, which can be used to select a small number of adaptive sets from $A^{mp}$. The matrix $A^{mp}$ can then be recovered by applying the adaptive norm as follows:

$$\hat{A}^{mp} = \arg\min_{A^{mp}} \| Y^{mp} - D A^{mp} \|_F \quad \text{subject to} \quad \| A^{mp} \|_{adaptive,0} \le K$$

where $K$ is the sparsity level. After the multiscale sparse representation matrix $\hat{A}^{mp}$ is recovered, a single decision can be made on the test pixel $y_1$ based on the lowest total representation error:

$$\hat{c} = \arg\min_{c = 1, \dots, C} \| Y^{mp} - D_c \hat{A}^{mp}_c \|_F$$

where $\hat{A}^{mp}_c$ represents the rows in $\hat{A}^{mp}$ corresponding to the $c$th class.

Multiscale Union Regions Adaptive Sparse Representation
The aforementioned MASR shows good performance for HSI classification. However, MASR utilizes multiscale patches to exploit spatial information. In a patch, most of the pixels may differ from the test pixel, for example when the test pixel lies on the edge of a building. The classification may be misled by noise pixels from other classes that are similar to atoms in the dictionary, which yields an incorrect classification for the test pixel. In computer vision, superpixels have been studied to provide an efficient representation that can facilitate visual recognition [35][36][37]. Each superpixel is a perceptually meaningful region whose shape and size can be adaptively changed according to different spatial structures. However, finding an optimal scale for superpixels is still a challenge. Without an optimal scale, some mixed superpixels will be generated. Because both the patch and the superpixel may include pixels from different classes, a multiscale union regions adaptive sparse representation model is proposed to decrease the influence of noise pixels on the test pixel. The union region is the union of the patch and the corresponding superpixel at the same scale (see Figure 2). For a test pixel, if the patch includes some noise pixels, the superpixel can provide more similar pixels to reduce the impact of the noise pixels. In the same way, if the test pixel is located in the wrong superpixel, which has few pixels similar to the test pixel, the patch can provide more similar pixels to strengthen the correct representation.

Generation of Multiscale Union Regions
Before generating multiscale union regions, we need multiscale superpixels. Various studies have focused on segmentation [36][37][38][39]. In this paper, an oversegmentation algorithm called ERS [37] is applied to generate 2-D superpixel maps on the base images because of its high efficiency. Unlike a single-band gray image or a three-band color image, an HSI usually has hundreds of spectral bands. To improve the computational efficiency, PCA [40] is first used to reduce the spectral bands of the HSI. Since the important information of the HSI exists in the principal components (e.g., the first three principal components), they are used as the base images. In this paper, only the first principal component is chosen as the base image. Instead of choosing scales for superpixels empirically, we calculate the scales of superpixels from the corresponding patch sizes. Assuming that $PS_t$ refers to the patch size (the number of pixels in the patch) of the $t$th scale and $N_{total}$ is the total number of pixels in the image (note that the original image is extended for edge pixels), the number of superpixels $n_t$ for the $t$th segmentation is calculated as:

$$n_t = \frac{N_{total}}{PS_t} \qquad (6)$$

In this way, the average size of the superpixels is equal to the patch size, so most superpixels have sizes similar to the patches. This guarantees that the superpixel and the patch have a similar influence on the union region. Moreover, as the patch size increases, the number of superpixels decreases quickly. Thus, only a limited number of segmentations can be generated, and based on their performance it is easier for users to determine the number of scales. After segmentation, $T$ superpixels are obtained for each test pixel $y_1$, and these superpixels construct the corresponding multiscale matrix $Y^{ms} = [Y^{ms}_1, \dots, Y^{ms}_t, \dots, Y^{ms}_T]$, where $Y^{ms}_t$ includes the pixels from the $t$th superpixel. Then, for a specific $t$th scale, the union region $Y^{mu}_t$ is defined as follows:

$$Y^{mu}_t = Y^{mp}_t \cup Y^{ms}_t$$

where $Y^{mp}_t$ denotes the pixels of the $t$th-scale patch, so that $Y^{mu}_t$ is the union of the patch and the superpixel at that scale.
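The scale rule and the union operation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, rounding to the nearest integer is an assumption (the text only says the average superpixel size equals the patch size), and patches are clipped at the image border here, whereas the paper extends the image for edge pixels. The superpixel label map would come from a segmentation algorithm such as ERS; a hand-made map stands in for it.

```python
def superpixel_count(n_total, patch_width):
    """n_t = N_total / PS_t with PS_t = patch_width ** 2 (rounding is an assumption)."""
    patch_size = patch_width * patch_width
    return max(1, round(n_total / patch_size))

def patch_pixels(center, patch_width, height, width):
    """Coordinates of the square patch centered on the test pixel, clipped to the image."""
    row, col = center
    half = patch_width // 2
    return {(r, c)
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)
            if 0 <= r < height and 0 <= c < width}

def superpixel_pixels(center, labels):
    """Coordinates of every pixel sharing the test pixel's superpixel id."""
    row, col = center
    target = labels[row][col]
    return {(r, c)
            for r, line in enumerate(labels)
            for c, value in enumerate(line)
            if value == target}

def union_region(center, patch_width, labels):
    """Union of the patch and the superpixel that contain the test pixel."""
    height, width = len(labels), len(labels[0])
    return patch_pixels(center, patch_width, height, width) | superpixel_pixels(center, labels)

# Hypothetical 5 x 5 label map: the two left columns form superpixel 0.
labels = [[0, 0, 1, 1, 1] for _ in range(5)]
region = union_region((2, 1), 3, labels)

# Superpixel counts for a 145 x 145 image at the seven patch widths used in the paper.
counts = [superpixel_count(145 * 145, w) for w in (3, 5, 7, 9, 11, 13, 15)]
```

Because the paper extends the image before counting pixels, the counts computed here from the unextended 145 × 145 size are not expected to reproduce Table 1 exactly.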

Multiscale Union Regions Adaptive Sparse Representation
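For a test pixel $y_1$, the corresponding multiscale matrix is $Y^{mu} = [Y^{mu}_1, \dots, Y^{mu}_t, \dots, Y^{mu}_T]$, and the multiscale sparse representation matrix $A^{mu}$ is recovered by solving:

$$\hat{A}^{mu} = \arg\min_{A^{mu}} \| Y^{mu} - D A^{mu} \|_F \quad \text{subject to} \quad \| A^{mu} \|_{adaptive,0} \le K$$

where $K$ is the sparsity level. To solve this problem, the method used in MASR is applied. At each iteration, the current residual correlation matrix is calculated first. A new adaptive set is then selected based on the current residual correlation matrix. Once the new adaptive set has been selected, it is merged with the previously selected adaptive sets. The sparse coefficient matrix is then estimated based on the merged adaptive sets. Finally, the residual is updated. The iterations stop when the termination criterion is satisfied. After the multiscale sparse representation matrix $\hat{A}^{mu}$ is recovered, the final label of the test pixel $y_1$ can be determined by the minimal total representation error:

$$\hat{c} = \arg\min_{c = 1, \dots, C} \| Y^{mu} - D_c \hat{A}^{mu}_c \|_F$$

where $\hat{A}^{mu}_c$ represents the rows in $\hat{A}^{mu}$ corresponding to the $c$th class.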

Probability Majority Voting
Because multiscale union regions adaptive sparse representation is a pixel-based classifier, the classification map will contain some salt-and-pepper noise pixels inside ground truth objects. Therefore, a majority voting process helps to optimize the classification result. As mentioned above, a union region is generated for each test pixel at each scale. For each union region, the probabilities of belonging to all classes are calculated. If a union region at the $i$th scale contains $N^{total}_i$ labeled pixels, of which $N^j_i$ pixels are classified to the $j$th class, the probability $P^j_i$ of belonging to the $j$th class is calculated as:

$$P^j_i = \frac{N^j_i}{N^{total}_i}$$

Assuming that there are $k$ classes and $T$ scales of segmentation maps, the class label $\hat{j}$ of the test pixel can be obtained by:

$$\hat{j} = \arg\max_{j = 1, \dots, k} \sum_{i=1}^{T} P^j_i$$
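The voting step can be illustrated with a short sketch. The per-scale probability follows the definition in the text (the fraction of pixels in the union region classified to each class). The way the scales are combined, by summing the per-scale probabilities and taking the class with the largest total, is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter

def class_probabilities(region_labels, num_classes):
    """P_i^j = N_i^j / N_i^total for one union region, given its pixels' predicted labels."""
    n_total = len(region_labels)
    counts = Counter(region_labels)
    return [counts.get(j, 0) / n_total for j in range(num_classes)]

def probability_majority_vote(regions_per_scale, num_classes):
    """Sum the per-scale class probabilities and return the class with the largest total."""
    totals = [0.0] * num_classes
    for region_labels in regions_per_scale:
        if not region_labels:  # skip a scale whose region holds no labeled pixels
            continue
        for j, p in enumerate(class_probabilities(region_labels, num_classes)):
            totals[j] += p
    return max(range(num_classes), key=lambda j: totals[j])

# Hypothetical predicted labels inside the union regions of one test pixel at T = 3 scales.
regions = [[0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0, 2]]
voted = probability_majority_vote(regions, num_classes=3)
```

Here class 0 wins even though class 1 dominates the second scale, because the vote weighs each scale by class proportion rather than by raw pixel count.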

Data Sets
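To verify the effectiveness of the proposed MURASR method and the superiority of the union region, experiments are conducted on the following three hyperspectral data sets: the AVIRIS Indian Pines data, the AVIRIS Salinas data, and the ROSIS-03 University of Pavia data.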

Comparison of Experiment Results
In the experiments, all compared algorithms are based on sparse representation. In addition to the published algorithms SRC, JSRC and MASR, we tested JUSRC (Joint Union Sparse Representation Classification), MJSRC (Multiscale Joint Sparse Representation Classification), MJUSRC (Multiscale Joint Union Sparse Representation Classification), MURASR* and MURASR. To further verify the superiority of the union region, the patch used in JSRC was replaced with the union region, which gives JUSRC. To demonstrate the superiority of the multiscale adaptive strategy, we extended JSRC and JUSRC with a simple multiscale scheme that applies majority voting to the results of all scales for the final decision. The extended algorithms are called MJSRC and MJUSRC. Moreover, MURASR* is MURASR without the probability majority voting process, so the comparison between MURASR* and MURASR shows the effect of the probability majority voting method. The parameters for the SRC, JSRC, and JUSRC algorithms were tuned to reach the best results in these experiments. For all multiscale algorithms, seven different scales were adopted simultaneously, and the selected region scales were 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15. The numbers of superpixels for segmentation were then calculated with Equation (6) and are listed in Table 1. The other parameters in MJSRC, MJUSRC, MASR, MURASR*, and MURASR were the same as in [28]. To evaluate the performance of the classifiers, three objective metrics are adopted: overall accuracy (OA), average accuracy (AA) and the kappa coefficient. In addition, McNemar's test is applied to analyse the experimental results. McNemar's test is based on the standardized normal test statistic, as described in [42]:

$$Z = \frac{h_{12} - h_{21}}{\sqrt{h_{12} + h_{21}}}$$

where $h_{12}$ represents the number of samples correctly classified by method 1 but incorrectly classified by method 2, and $h_{21}$ the number of samples correctly classified by method 2 but incorrectly classified by method 1.
If $|Z| > 1.96$, the difference in accuracy between the two methods can be considered statistically significant. The sign of $Z$ indicates which method is better: if $Z > 0$, method 1 is more accurate than method 2.
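The evaluation measures above can be computed as in the following sketch. It assumes the usual definitions: OA is the fraction of correctly classified samples, AA is the mean of the per-class accuracies, kappa corrects OA for chance agreement, and McNemar's statistic takes the form Z = (h12 - h21) / sqrt(h12 + h21). The function names and the toy labels are hypothetical.

```python
import math

def accuracy_metrics(truth, pred):
    """Return (OA, AA, kappa) for two equally long label sequences."""
    n = len(truth)
    oa = sum(t == p for t, p in zip(truth, pred)) / n
    per_class = []
    chance = 0.0
    for c in sorted(set(truth)):
        n_true = sum(t == c for t in truth)
        n_pred = sum(p == c for p in pred)
        n_hit = sum(t == c and p == c for t, p in zip(truth, pred))
        per_class.append(n_hit / n_true)
        chance += n_true * n_pred / (n * n)  # expected agreement by chance
    aa = sum(per_class) / len(per_class)
    kappa = (oa - chance) / (1 - chance)
    return oa, aa, kappa

def mcnemar_z(truth, pred1, pred2):
    """Z > 0 favors method 1; |Z| > 1.96 indicates significance at the 5% level."""
    h12 = sum(a == t and b != t for t, a, b in zip(truth, pred1, pred2))
    h21 = sum(a != t and b == t for t, a, b in zip(truth, pred1, pred2))
    if h12 + h21 == 0:
        return 0.0  # the two methods never disagree in correctness
    return (h12 - h21) / math.sqrt(h12 + h21)

# Hypothetical labels for eight test pixels and two classifiers.
truth = [0, 0, 1, 1, 2, 2, 0, 1]
pred1 = [0, 0, 1, 1, 2, 2, 0, 0]
pred2 = [0, 1, 0, 1, 2, 0, 1, 0]
oa, aa, kappa = accuracy_metrics(truth, pred1)
z = mcnemar_z(truth, pred1, pred2)
```

In this toy case method 1 is correct on four samples that method 2 misses and never the reverse, so the statistic is positive and just above the 1.96 threshold.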
The Indian Pines data set was classified first. For each class, 10% of the labeled pixels were randomly sampled for training, while the remaining 90% were used to test the classifiers (see Table 2). The classification maps generated by the different classifiers on the Indian Pines image are shown in Figure 6. The detailed classification results, averaged over ten runs with randomly sampled training samples, are tabulated in Table 3. The results of the McNemar's tests between classifiers are listed in Table 4. JUSRC, MJUSRC and MURASR* perform better than JSRC, MJSRC and MASR, respectively, which demonstrates the superiority of the union region over the patch region. In addition, the multiscale majority voting based MJSRC and MJUSRC perform worse than the multiscale adaptive strategy based MASR and MURASR* on this image. Compared with MJSRC and MJUSRC, the accuracy improvements of MASR and MURASR* are more than 3%. MURASR gives a better result than MURASR* in both accuracy and classification map. As can be observed from the classification maps of MURASR* and MURASR, many misclassifications in MURASR* are eliminated by the probability majority voting method. Moreover, MURASR performs best among all algorithms in terms of OA and AA, and the results of the McNemar's tests are statistically significant and coherent with the obtained overall accuracies.
The second experiment was performed on the Salinas data set. To allow comparison with MASR, only 1% of the labeled pixels of each class were randomly selected for training. The remaining 99% of the labeled data were then classified to demonstrate the superiority of the proposed MURASR (see Table 5). The classification maps for the various classifiers are illustrated in Figure 7, and the average quantitative results of ten runs are tabulated in Table 6. The results of the McNemar's tests are shown in Table 7. As can be observed, the union region based algorithms JUSRC, MJUSRC and MURASR* still give more accurate results than the patch region based JSRC, MJSRC and MASR in terms of OA, AA and the kappa coefficient. The classification maps of MJSRC and MJUSRC have more salt-and-pepper noise pixels than those of MASR and MURASR*. Comparing the classification maps of MURASR* and MURASR, we find that most misclassifications generated by MURASR* are corrected by the probability majority voting method. In addition, the average accuracy of MURASR is 99.70%, which is very high for classification. Moreover, the McNemar's tests between classifiers are also statistically significant and coherent with the obtained overall accuracies.
The final experiment was conducted on the University of Pavia image. The shapes of the surface objects in this image are more complex than in the previous two images. For each reference class, 200 training samples were randomly selected from the labeled data, and the remaining pixels were used to test the performance of the various classifiers (see Table 8). The classification maps are shown in Figure 8, and the detailed results in terms of OA, AA, and the kappa coefficient, averaged over ten runs, are listed in Table 9. McNemar's tests between classifiers were also conducted on this image, and the results are tabulated in Table 10. As with the previous two images, the union region based classifiers performed better than the patch region based classifiers, and the multiscale adaptive strategy still works better than the multiscale majority voting strategy. The accuracy improvement gained by probability majority voting is smaller than for the previous two images because the University of Pavia image has fewer large homogeneous regions. From Table 9, we find that MASR is more accurate than MURASR for only one class, and that MURASR performs best among all classifiers for seven classes, which further supports the superiority of MURASR. The results of the McNemar's tests also support this analysis.
Compared with many existing algorithms, MASR is time-consuming. The proposed MURASR is built on the multiscale adaptive representation of MASR. In addition, generating the union regions takes some time, and a union region has more pixels than a patch region. Therefore, MURASR is also time-consuming, and its time cost is about twice that of MASR. However, the proposed MURASR was coded in MATLAB (R2016a, MathWorks, Natick, MA, USA) and was not optimized for speed. MURASR can be sped up significantly by reimplementing the code in C++ and adopting a general-purpose graphics processing unit (GPU).
From Table 1, we find that when the number of scales is 7, the calculated scale for superpixels is already large. If the scale continues to increase, more mixed superpixels will be generated. Moreover, the classification results of MURASR on the three images are encouraging when the number of scales is 7. Therefore, the effects of scale numbers less than or equal to 7 are analyzed in this section, which means that the patch scales range from 3 × 3 to 15 × 15. Figure 9 shows the average OA of ten runs for JSRC, JUSRC, MJSRC, MJUSRC, MASR, MURASR* and the proposed MURASR. For the multiscale algorithms, each scale represents the combination of the current scale and all smaller scales. The union region based classifiers JUSRC, MJUSRC, and MURASR* generally outperform the corresponding patch region based JSRC, MJSRC and MASR, and the probability majority voting method improves the classification result at each region scale. In addition, the proposed MURASR consistently outperforms the other algorithms at all region scales.

Effects of Training Samples Number
The number of training samples may affect the performance of the classifiers. Therefore, the effects of different numbers of training samples on JSRC, MJSRC, JUSRC, MJUSRC, MASR, MURASR* and the proposed MURASR were examined on the three images. For Indian Pines, the proportion of training samples selected for every class varies from 1% to 20%. For Salinas, the proportion varies from 0.1% to 2%. For the University of Pavia, 60-500 training samples were selected for each reference class. The classification OA of each classifier for the different numbers of training samples is illustrated in Figure 10. The OA is again the average of ten runs. As can be observed, the union region based classifiers JUSRC, MJUSRC and MURASR* always perform better than the corresponding patch region based JSRC, MJSRC, and MASR. Comparing the results of MURASR* and MURASR, the improvement obtained from the probability majority voting method increases as the number of training samples decreases. Moreover, the proposed MURASR generally outperforms the other classifiers for all numbers of training samples.

Conclusions
In this paper, a novel multiscale union region adaptive sparse representation, MURASR, is proposed for spectral-spatial HSI classification. It uses a union region, which integrates the patch and the superpixel, to exploit the spatial information. Unlike the patch region based MASR, the proposed MURASR extends the patch region to the union region. The union region combines two observations: neighboring pixels that belong to the same material are usually strongly correlated with each other, and pixels in the same superpixel usually belong to the same material. Before sparse representation, multiscale union regions are generated via the union operation on the patch and the superpixel. Multiscale adaptive sparse representation is then adopted to classify the multiscale union regions, and an effective probability majority voting method is applied to generate the final result. Experiments on three HSIs demonstrate that the union region based algorithms always perform better than the patch region based algorithms and that the proposed MURASR outperforms other algorithms in terms of quantitative metrics and the visual quality of the classification maps.
As MURASR is a pixel-based algorithm, replacing the superpixel with a region grown from each test pixel would give the generated union region a more accurate representation of the spatial information. Thus, further research will generate one superpixel for each test pixel. In addition, the structural dictionary for sparse representation is constructed directly from the selected training pixels. A trained structural dictionary may decrease the running time of the algorithm and provide a more accurate representation for test pixels.

Figure 1. Outline of the proposed MURASR framework.
be a structure dictionary, where $D_c \in \mathbb{R}^{M \times N_c}$, $c = 1, \dots, C$, is the $c$th class subdictionary whose columns (atoms) are extracted from the training pixels; $C$ is the number of classes; $N_c$ is the number of atoms in subdictionary $D_c$; and $N = \sum_{c=1}^{C} N_c$ is the total number of atoms in $D$. Specifically, the size of a region surrounding the test pixel $y_1$ is denoted by $W \times W$, and pixels within such a region can be denoted by a matrix $Y = [y_1, y_2, \dots, y_{W \times W}]$. The matrix can be compactly represented as:

Figure 2. Three kinds of spatial regions: (a) fixed-size patch; (b) adaptive-size superpixel; and (c) union of patch and superpixel. The blue pixel represents the test pixel, orange pixels are neighbors defined by the patch, green pixels are neighbors defined by the superpixel, and red pixels are the overlap of the neighbors defined by the patch and the superpixel.
data sets: the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Indian Pines data, the AVIRIS Salinas data, and the Reflective Optics System Imaging Spectrometer (ROSIS-03) University of Pavia data. The AVIRIS Indian Pines image has 220 data channels with a size of 145 × 145 across the spectral range from 0.2 to 2.4 µm. It was captured over the agricultural Indian Pines test site in northwestern Indiana with a spatial resolution of 20 m per pixel. Before classification, 20 water absorption bands (No. 104-108, 150-163 and 220) were discarded [41]. Figure 3a,b show the color composite of the Indian Pines image and the corresponding reference data with 16 reference classes from different types of crops.

Figure 5. University of Pavia image: (a) three-band color composite image; (b) reference image.
is the sparse coefficients matrix corresponding to $Y$. Since the indexes of the selected atoms in $D$ are determined by the positions of nonzero coefficients in $[A_1, A_2, \dots, A_{W \times W}]$, the neighboring pixels $[y_1, y_2, \dots, y_{W \times W}]$ can be represented by a small set of common atoms by enforcing a few nonzero rows on the sparse coefficients matrix $A$. Then, matrix $A$ can be obtained by solving the following optimization problem: where $Y_t$ is the union of $Y$

Table 1. Number of superpixels in each scale.

Table 2. Sixteen reference classes in the Indian Pines image.

Table 3. Classification accuracy (averaged on ten runs with randomly sampled training samples) of the Indian Pines image. The best results are highlighted in bold typeface.

Table 4. The McNemar's tests between classifiers (averaged on ten runs with randomly sampled training samples) of the Indian Pines image.

Table 5. Sixteen reference classes in the Salinas image.

Table 6. Classification accuracy (averaged on ten runs with randomly sampled training samples) of the Salinas image. The best results are highlighted in bold typeface.

Table 7. The McNemar's tests between classifiers (averaged on ten runs with randomly sampled training samples) of the Salinas image.

Table 8. Nine reference classes in the University of Pavia image.

Table 9. Classification accuracy (averaged on ten runs with randomly sampled training samples) of the University of Pavia image. The best results are highlighted in bold typeface.