Hybrid Collaborative Representation for Remote-Sensing Image Scene Classification

In recent years, the collaborative representation-based classification (CRC) method has achieved great success in visual recognition by directly utilizing training images as dictionary bases. However, it describes a test sample with all training samples to extract shared attributes and does not consider representing the test sample with the training samples of a specific class to extract class-specific attributes. For remote-sensing images, both the shared attributes and the class-specific attributes are important for classification. In this paper, we propose a hybrid collaborative representation-based classification approach. The proposed method improves the performance of classifying remote-sensing images by embedding class-specific collaborative representation into conventional collaborative representation-based classification. Moreover, we extend the proposed method to arbitrary kernel spaces to exploit the nonlinear characteristics hidden in remote-sensing image features and further enhance classification performance. Extensive experiments on several benchmark remote-sensing image datasets were conducted, and the results clearly demonstrate the superiority of our proposed algorithm over state-of-the-art approaches.


Introduction
Remote-sensing (RS) images are widely used for land-cover classification, target identification, and thematic mapping from local to global scales owing to their technical advantages, such as multiresolution, wide coverage, repeatable observation, and multi/hyperspectral records [1]. During the past few decades, we have witnessed the rapid development of remote-sensing technology. Nowadays, large volumes of heterogeneous RS images, with different spatial and spectral resolutions, serve a myriad of Earth observation (EO) applications, from road detection and traffic monitoring to visual tracking and weather reporting. To improve the performance of such intelligent EO applications, RS image scene recognition has attracted widespread attention. Weifeng Liu et al. [2,3] applied p-Laplacian regularization to scene recognition. Generally, an integrated RS image scene-recognition system includes two components, i.e., a feature-learning approach [4] and a corresponding classifier, and both have a vital effect on the classification result.
As a core problem in image-related applications, image feature representation has shifted from handcrafted to learning-based methods. For shared representation-based classification, Deng et al. [25] proposed a superposed linear representation classifier (SLRC) that represents the test image as a superposition of the class centroids and the shared intraclass differences. For class-specific representation-based classification, Liu et al. [26,27] proposed a class-specific representation algorithm that finds the intrinsic relationship between base vectors and the original image features, and Wang et al. [28] proposed a label-constrained specific representation approach to preserve the structural information in the feature space.
In this paper, we propose a hybrid collaborative representation-based classification approach. The proposed method improves the performance of classifying remote-sensing images by embedding class-specific collaborative representation into CRC. Moreover, we extend the proposed method to arbitrary kernel spaces to exploit the nonlinear characteristics hidden in remote-sensing image features and further enhance classification performance. The scheme of our proposed method is illustrated in Figure 1. Our contributions are threefold:

• We propose a novel hybrid collaborative representation-based classification method that considers both conventional collaborative representation and class-specific collaborative representation.

• We extend our proposed hybrid collaborative representation-based classification method to arbitrary kernel spaces to find the nonlinear structures hidden in the image features.

• The proposed hybrid collaborative representation-based classification method is evaluated on four benchmark remote-sensing image datasets and achieves state-of-the-art performance.

Figure 1. Scheme of our proposed hybrid collaborative representation algorithm. The left part is the collaborative representation that extracts the shared attributes, while the right part is the class-specific collaborative representation that extracts class-specific attributes. Their combination forms our hybrid collaborative representation algorithm.
The rest of the paper is organized as follows. Section 2 reviews several classical visual-recognition algorithms and presents our hybrid collaborative representation-based classification with kernels. Experimental results and analysis are shown in Section 3. A discussion of the experimental results and the proposed method is presented in Section 4. Finally, conclusions are drawn in Section 5.

Proposed Method
In this section, we first review related work on CRC. Then, we introduce class-specific CRC (CS-CRC). Finally, we present our proposed approach.

Overview of CRC
Zhang et al. [20] proposed CRC. In CRC, all training samples are concatenated together as base vectors to form a subspace, and the test sample is described in this subspace. To be specific, given the training samples X = [X_1, X_2, ..., X_C] ∈ R^{D×N}, X_c ∈ R^{D×N_c} denotes the training samples from the c-th class, C is the number of classes, N_c is the number of training samples in the c-th class (N = Σ_{c=1}^{C} N_c), and D is the dimension of the samples. Supposing that y ∈ R^{D×1} is a test sample, the objective function of CRC is as follows:

ŝ = arg min_s ||y − Xs||_2^2 + λ||s||_2^2.

Here, λ is the regularization parameter that controls the tradeoff between goodness of fit and the collaborative term (i.e., multiple entries of X participate in representing the test sample). The role of the regularization term is twofold. First, the ℓ2 norm makes the least-squares solution stable. Second, it introduces a certain amount of "sparsity" into the collaborative representation ŝ, indicating that it is the collaborative representation, rather than ℓ1-norm sparsity, that makes sparsity powerful for classification. CRC effectively utilizes all training samples for visual recognition, and its objective function has an analytic solution.
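As a minimal sketch, the analytic CRC solution is ordinary ridge regression over the full dictionary; the function name below is illustrative, not from the paper:

```python
import numpy as np

def crc_code(X, y, lam):
    """CRC coding: s_hat = argmin_s ||y - X s||^2 + lam * ||s||^2.

    X: (D, N) dictionary whose columns are all training samples.
    y: (D,) test sample.  The closed-form (ridge) solution solves
    (X^T X + lam * I) s = X^T y.
    """
    N = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)
```

Because the system matrix does not depend on y, its inverse can be precomputed once and reused for every test sample.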

Class-Specific Collaborative Representation
For class-specific collaborative representation, the training samples in each category are considered a subspace, and a test sample is represented with the samples of that specific class. The objective function of class-specific collaborative representation is as follows:

ŝ_c = arg min_{s_c} ||y − X_c s_c||_2^2 + γ||s_c||_2^2,  c = 1, ..., C.

Here, γ is the regularization parameter that controls the tradeoff between goodness of fit and the collaborative term. CS-CRC is capable of describing the sample y in each category separately.

Theorem 1. The shared collaborative representation over the full dictionary X attains a reconstruction error no larger than that of the class-specific representation over any single class subdictionary X_c.

Proof of Theorem 1. The result follows from the Cauchy inequality.
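A per-class sketch of the CS-CRC code mirrors the CRC solution but restricts the dictionary to one class; the function name is illustrative:

```python
import numpy as np

def cs_crc_code(X_c, y, gamma):
    """Class-specific coding for one class:
    s_c = argmin ||y - X_c s_c||^2 + gamma * ||s_c||^2,
    solved in closed form with the class subdictionary X_c (D, N_c)."""
    Nc = X_c.shape[1]
    return np.linalg.solve(X_c.T @ X_c + gamma * np.eye(Nc), X_c.T @ y)
```

Running this for every c yields C codes ŝ_c, one residual per category.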

Hybrid Collaborative Representation
Shared collaborative representation helps reduce the reconstruction error (as shown in Theorem 1), while class-specific collaborative representation helps capture discriminant information. We propose a hybrid collaborative representation algorithm that combines these two approaches. The objective function is as follows:

ŝ = arg min_s ||y − Xs||_2^2 + λ||s||_2^2 + τ Σ_{c=1}^{C} (||y − X_c s_c||_2^2 + γ||s_c||_2^2),    (5)

where s = [s_1; s_2; ...; s_C] is partitioned according to the classes. In Equation (5), the first two terms are the conventional collaborative representation and the latter two terms are the class-specific collaborative representation. The conventional collaborative representation guarantees a small residual error and robustness, while the class-specific collaborative representation obtains distinctiveness across classes. Since ||s||_2^2 = Σ_{c=1}^{C} ||s_c||_2^2, Equation (5) can be rearranged as follows:

ŝ = arg min_s ||y − Xs||_2^2 + β||s||_2^2 + τ Σ_{c=1}^{C} ||y − X_c s_c||_2^2.    (6)

Here, β = λ + τγ. The latter form of Equation (6) arose in the estimation of latent-variable graphical models [29].
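As a hedged sketch of how Equation (6) can be solved in closed form, the code below assumes the code vector is partitioned into per-class blocks; setting the gradient to zero then yields the linear system (XᵀX + βI + τ·blkdiag(X_cᵀX_c)) s = (1 + τ) Xᵀy. Function and variable names are illustrative, not from the paper:

```python
import numpy as np

def hybrid_crc_code(X_list, y, beta, tau):
    """Closed-form sketch of Eq. (6):
    min_s ||y - X s||^2 + beta ||s||^2 + tau * sum_c ||y - X_c s_c||^2,
    where X = [X_1, ..., X_C] and s is split into blocks s_c."""
    X = np.hstack(X_list)                # (D, N) full dictionary
    N = X.shape[1]
    D_blk = np.zeros((N, N))             # block-diagonal class Gram matrix
    i = 0
    for X_c in X_list:
        n = X_c.shape[1]
        D_blk[i:i + n, i:i + n] = X_c.T @ X_c
        i += n
    return np.linalg.solve(X.T @ X + beta * np.eye(N) + tau * D_blk,
                           (1 + tau) * X.T @ y)
```

The shared Gram matrix XᵀX and the block-diagonal term depend only on the training data, so the system matrix can again be factorized once.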

Hybrid Collaborative Representation with Kernels
Superior visual-recognition performance is often achieved in a Reproducing Kernel Hilbert Space because nonlinear structure often exists in image features. Our proposed hybrid collaborative representation algorithm is easily extended to arbitrary kernel spaces. Suppose there exists a kernel function κ(x, y) = φ(x)^T φ(y), where φ(·) is the implicit feature mapping. The objective function of our proposed hybrid collaborative representation algorithm with kernels is as follows:

ŝ = arg min_s ||φ(y) − φ(X)s||_2^2 + β||s||_2^2 + τ Σ_{c=1}^{C} ||φ(y) − φ(X_c)s_c||_2^2,

where φ(X) = [φ(x_1), ..., φ(x_N)]. All inner products in the solution can be computed through κ(·, ·) alone.
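In a sketch of the kernelized solver, every inner product in the linear closed form is replaced by kernel evaluations, so only the Gram matrix K = κ(X, X) and the similarity vector k = κ(X, y) are needed; this is an assumed form derived from Equation (6), and the names are illustrative:

```python
import numpy as np

def hybrid_kcrc_code(X_list, y, beta, tau, kernel):
    """Kernelised hybrid coding: solves
    (K + beta I + tau * blkdiag(K_cc)) s = (1 + tau) * kernel(X, y),
    where K_cc are the per-class diagonal blocks of the Gram matrix.
    `kernel(A, B)` must return pairwise kernel values over columns."""
    X = np.hstack(X_list)
    N = X.shape[1]
    K = kernel(X, X)                      # (N, N) Gram matrix
    k = kernel(X, y[:, None]).ravel()     # (N,) similarities to the test sample
    D_blk = np.zeros((N, N))
    i = 0
    for X_c in X_list:
        n = X_c.shape[1]
        D_blk[i:i + n, i:i + n] = K[i:i + n, i:i + n]
        i += n
    return np.linalg.solve(K + beta * np.eye(N) + tau * D_blk, (1 + tau) * k)
```

With the linear kernel κ(x, y) = xᵀy this reduces exactly to the linear-space solution, which is a convenient sanity check.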

Hybrid Collaborative Representation-Based Classification with Kernels
After obtaining the collaborative code ŝ, hybrid collaborative representation-based classification finds the class with the minimum residual error:

id(y) = arg min_c ||y − X_c ŝ_c||_2,

where id(y) is the label of the testing sample; y is assigned to the class with the minimum residual error (in kernel space, the residual is computed through κ(·, ·)). The procedure of hybrid collaborative representation-based classification with kernels is shown in Algorithm 1.

Algorithm 1 Algorithm for hybrid collaborative representation-based classification with kernels
Require: Training samples X ∈ R^{D×N}, parameters β and τ, and test sample y
1: Code y with the hybrid collaborative representation algorithm with kernels to obtain ŝ.
2: for c = 1 to C do
3:    Compute the residual error of the c-th class from ŝ_c.
4: end for
5: return id(y), the class with the minimum residual error.
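Putting the coding and decision steps together, a minimal linear-kernel sketch of the procedure in Algorithm 1 might look as follows; the closed-form solve assumes the block-partitioned form of Equation (6), and all names are illustrative:

```python
import numpy as np

def hybrid_crc_classify(X_list, y, beta, tau):
    """Classify y by the minimum per-class residual (linear-kernel sketch)."""
    # 1. code y over the whole dictionary (hybrid objective, Eq. (6))
    X = np.hstack(X_list)
    N = X.shape[1]
    D_blk = np.zeros((N, N))
    i = 0
    for X_c in X_list:
        n = X_c.shape[1]
        D_blk[i:i + n, i:i + n] = X_c.T @ X_c
        i += n
    s = np.linalg.solve(X.T @ X + beta * np.eye(N) + tau * D_blk,
                        (1 + tau) * X.T @ y)
    # 2. split the code into class blocks and measure residuals
    residuals, i = [], 0
    for X_c in X_list:
        n = X_c.shape[1]
        residuals.append(np.linalg.norm(y - X_c @ s[i:i + n]))
        i += n
    # 3. id(y) = argmin_c ||y - X_c s_c||_2
    return int(np.argmin(residuals))
```

For example, with two well-separated class subdictionaries, a test sample lying in the span of the second class is assigned label 1.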

Experimental Results
In this section, we report our experimental results on four remote-sensing image datasets. To illustrate the significance of our approach, we compared our method with several state-of-the-art methods. In the following, we first introduce the experimental settings and then present the experimental results on each aerial-image dataset.

Experimental Settings
We tested our method on four datasets: the RSSCN7 dataset [30], the UC Merced Land Use dataset [5], the WHU-RS19 dataset [31], and the AID dataset [32]. For all the datasets, we adopted conventional CNN feature representation: each image is fed directly into the pretrained VGG model [33], and layer fc6 is used to extract a 4096-dimensional vector per image. The final feature of each image is ℓ2-normalized for better performance [32]. To eliminate randomness, we randomly split each dataset into training and test sets 10 times (with repeatable splits) and recorded the average accuracy.
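The split-and-average protocol above can be sketched as follows; `classify` is a placeholder for any fit-and-predict routine returning an accuracy, and the feature matrix is assumed to hold one row per image:

```python
import numpy as np

def l2_normalize(F):
    """Row-wise l2-normalise a feature matrix (one feature vector per image)."""
    return F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)

def repeated_split_accuracy(labels, n_train, classify, n_repeats=10, seed=0):
    """Average accuracy over repeated random per-class train/test splits.
    A fixed seed makes the splits repeatable, as in the paper's protocol."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        train_idx, test_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            train_idx.extend(idx[:n_train])
            test_idx.extend(idx[n_train:])
        accs.append(classify(np.array(train_idx), np.array(test_idx)))
    return float(np.mean(accs))
```

Fixing the generator seed gives "repeatable" randomness: the same ten splits are produced on every run.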

Experiment on UC Merced Land Use Dataset
The UC Merced Land Use Dataset [5] was manually extracted from large images in the USGS National Map Urban Area Imagery collection covering various urban areas around the country. The pixel resolution of this public-domain imagery is 1 foot. The dataset contains 21 categories with 2100 land-use images in total, each measuring 256 × 256 pixels. There are 100 images for each of the following classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. In Figure 2, we list several samples from this dataset. Two parameters in the objective function of the Hybrid-KCRC algorithm need to be specified: β adjusts the tradeoff between the reconstruction error and the collaborative representation, and τ controls the tradeoff between the shared collaborative representation and the class-specific collaborative representation. Both are tuned to achieve the best accuracy: β in the range [2^−9, 2^2] and τ in the range [2^−10, 2^−4]. In this section, we randomly chose 20 images per category for training and 20 for testing. Figure 3 shows the classification rate with different β and τ for four kernels. For the linear, POLY, RBF, and Hellinger kernels, the optimal parameters (β, τ) are (2^−4, 2^−6), (2^0, 2^−8), (2^−7, 2^−7), and (2^0, 2^−8), respectively.
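The tuning procedure described above is a plain grid search over powers of two; a sketch, where `evaluate(beta, tau)` is a placeholder returning validation accuracy for one parameter pair:

```python
import numpy as np

def tune_beta_tau(evaluate, beta_exps=range(-9, 3), tau_exps=range(-10, -3)):
    """Grid-search beta in [2^-9, 2^2] and tau in [2^-10, 2^-4] (the ranges
    quoted in the text), returning (best accuracy, best beta, best tau)."""
    best = (-1.0, None, None)
    for pb in beta_exps:
        for pt in tau_exps:
            acc = evaluate(2.0 ** pb, 2.0 ** pt)
            if acc > best[0]:
                best = (acc, 2.0 ** pb, 2.0 ** pt)
    return best
```

With 12 values of β and 7 of τ, the grid costs 84 evaluations per kernel, which is cheap because the system matrix is the only quantity that changes.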

Comparison with Several Classical Classifier Methods on UC Merced Land Use Dataset
First, we randomly chose 20 images per category for training and 20 for testing. Table 1 illustrates the effectiveness of Hybrid-KCRC for classifying images. Across the four kernels, the Hybrid-KCRC algorithm achieves the highest accuracy of 91.43% with the POLY and RBF kernels, which is 1.03% higher than the CRC method and 2.33% higher than the CS-CRC method. Second, we increased the number of training samples in each category to evaluate the performance of our proposed Hybrid-KCRC method. Figure 4 shows the classification rate on the UC-Merced dataset with 20, 40, 60, and 80 training samples in each category. From Figure 4, we conclude that our proposed Hybrid-KCRC method achieves superior performance to the liblinear, CRC, and CS-CRC methods.

Confusion Matrix on UC Merced Land Use Dataset
To further illustrate the superior performance of our proposed Hybrid-KCRC method, we evaluated its per-class classification rate on the UC-Merced dataset using a confusion matrix. In this section, we randomly chose 80 images per class as training samples and 20 images per class as testing samples. To eliminate randomness, we again randomly split the dataset into training and test sets 10 times (with repeatable splits). The confusion matrices are shown in Figure 5. From Figure 5, for the Hybrid-KCRC methods, 13, 13, 12, and 12 classes achieved classification accuracy greater than or equal to 0.99 for the linear, polynomial, RBF, and Hellinger kernels, respectively, whereas only 9 and 6 classes did so for CRC and CS-CRC, respectively. Compared with the CRC method, the Hybrid-KCRC methods achieved a significant performance boost on the dense residential class. Compared with the CS-CRC method, the Hybrid-KCRC methods achieved a great performance improvement on the storage tanks and tennis court classes. It is worth noting that none of the methods performed well on the dense residential and medium residential classes.
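The per-class rates read off Figure 5 come from a row-normalized confusion matrix; a minimal sketch of its computation (illustrative helper, not the paper's code):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Row-normalised confusion matrix: entry (i, j) is the fraction of
    class-i test images that were predicted as class j, so the diagonal
    holds the per-class classification rates."""
    M = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    return M / np.maximum(M.sum(axis=1, keepdims=True), 1)
```

Counting how many diagonal entries reach 0.99 then reproduces the comparison made in the text.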

Comparison with State-of-the-Art Approaches
For comparison, we followed previous works in the literature [35,36] and randomly selected 80% of the images from each class as the training set and the remaining 20% as the testing set. Several baseline methods (e.g., liblinear, CRC, CS-CRC), as well as state-of-the-art remote-sensing image-classification methods, were used as benchmarks. Table 2 shows the overall classification accuracy of the various remote-sensing image-classification methods. First, we compared our proposed Hybrid-KCRC method with the liblinear, CRC, and CS-CRC methods and found that Hybrid-KCRC achieved superior performance to all three baselines. This is notable because our proposed Hybrid-KCRC builds directly on the CRC and CS-CRC methods. Second, we compared Hybrid-KCRC with state-of-the-art remote-sensing image-classification results; it is clear that our proposed Hybrid-KCRC achieved the top performance. It should also be noted that the features used by CNN-W + VLAD with SVM, CNN-R + VLAD with SVM, and CaffeNet + VLAD are more effective than features extracted directly from a CNN (e.g., the CaffeNet method with 93.42% versus the CaffeNet + VLAD method with 95.39%).

Experiment on the RSSCN7 Dataset
The RSSCN7 dataset was collected from Google Earth and contains 2800 aerial-scene images divided into seven classes, i.e., grassland, forest, farmland, industry, parking lot, residential, and river and lake region. There are 400 images in each class, and all images have the same size of 400 × 400 pixels. Figure 6 shows several sample images from the dataset.

Comparison with Several Classical Classifier Methods on the RSSCN7 Dataset
First, we randomly selected 100 images as the training samples and 100 images as the testing samples from each category. For the linear, POLY, RBF, and Hellinger kernels, the optimal parameters (β, τ) were (2^−4, 2^−7), (2^1, 2^−6), (2^−6, 2^−6), and (2^1, 2^−6), respectively. Recognition accuracy is shown in Table 3. From Table 3, we can see that the Hybrid-KCRC algorithm outperformed the other conventional methods, achieving accuracies of 86.39%, 87.34%, 86.71%, and 87.29% for the linear, POLY, RBF, and Hellinger kernels, respectively. Across the four kernels, the Hybrid-KCRC algorithm achieved the highest accuracy, 87.34%, with the POLY kernel. This is 1.57% higher than the CRC method and 3.11% higher than the CS-CRC method.
Second, we increased the number of training samples in each category to evaluate the performance of our proposed Hybrid-KCRC method. Figure 7 shows the classification rate on the RSSCN7 dataset with 100, 200, and 300 training samples in each category. From Figure 7, we find that our proposed Hybrid-KCRC algorithm achieved superior performance to the baseline methods. We also observed that the classification rate of the CS-CRC method did not increase monotonically with the number of training samples; the reason might be that too many training samples cause overfitting. Moreover, both Hybrid-KCRC (POLY) and Hybrid-KCRC (RBF) achieved top accuracy.

Confusion Matrix on the RSSCN7 Dataset
To further illustrate the superior performance of our proposed Hybrid-KCRC method, we evaluated the classification rate per class of our method on the RSSCN7 dataset using a confusion matrix. In this section, we randomly chose 200 images per class as the training samples and 100 images per class as the testing samples. To eliminate randomness, we also randomly (repeatable) split the dataset into a train set and test set for 10 times, respectively. The confusion matrices are shown in Figure 8. From Figure 8, compared with the CS-CRC method and CRC method, the Hybrid-KCRC methods achieved better performance in most categories.
Experiment on the WHU-RS19 Dataset

From Table 4, we can see that the Hybrid-KCRC algorithm outperformed the other conventional methods, achieving accuracies of 94.76%, 95.34%, 95.34%, and 95.39% for the linear, POLY, RBF, and Hellinger kernels, respectively. Across the four kernels, the Hybrid-KCRC algorithm achieved the highest accuracy of 95.39% with the Hellinger kernel. This is 0.81% higher than the CRC method and 1.44% higher than the CS-CRC method. Then, we changed the number of training samples in each category to illustrate the performance of our proposed method. Figure 10 shows the classification accuracy on the WHU-RS19 dataset with 10, 20, and 30 training samples in each category. We can clearly see from Figure 10 that our proposed Hybrid-KCRC algorithm achieved superior performance to the classical methods.

Experiment on the AID Dataset
The AID dataset is a new large-scale aerial-image dataset collected from Google Earth imagery and is the most challenging dataset for the scene classification of aerial images. The dataset is made up of the following 30 aerial-scene types: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. The size of each aerial image is fixed to 600 × 600 pixels to cover scenes with various resolutions. There are 10,000 images labeled into 30 categories. In Figure 11, we show several images from this dataset. We randomly selected 20 samples per class for training and 20 for testing. For the linear, POLY, RBF, and Hellinger kernels, the optimal parameters (β, τ) were (2^−4, 2^−7), (2^0, 2^−8), (2^−7, 2^−9), and (2^3, 2^−6), respectively. Recognition accuracy is shown in Table 5. From Table 5, we can see that the Hybrid-KCRC algorithm outperformed the other conventional methods, achieving accuracies of 81.07%, 82.07%, 82.05%, and 81.28% for the linear, POLY, RBF, and Hellinger kernels, respectively. Across the four kernels, the Hybrid-KCRC algorithm achieved the highest accuracy of 82.07% with the POLY kernel. This is 1.34% higher than the CRC method and 4.15% higher than the CS-CRC method. We also used different numbers of training samples in each category to evaluate the performance of the Hybrid-KCRC method. The classification rate on the AID dataset using 20, 40, 60, and 80 training samples in each category is shown in Figure 12. From Figure 12, we find that our proposed Hybrid-KCRC algorithm outperformed several classical classification methods.

Discussion

•
For RS image classification, both shared attributes and class-specific attributes are vital when representing testing samples with training samples. Therefore, based on CRC, we propose a hybrid collaborative representation-based classification method that can decrease the reconstruction error and improve the classification rate. Through comparison with several state-of-the-art methods for RS image classification, we can see that our proposed method effectively improves the performance of classifying remote-sensing images.

•
Because of the existence of nonlinear structure in image features, we extended our method into a Reproducing Kernel Hilbert Space to further improve its performance with kernel functions. From the experimental comparison with several classical classification methods, we can see that the classification rates of the Hybrid-KCRC method on all four datasets are higher than those of the NN, LIBLINEAR, SOFTMAX, SLRC-L2, CRC, and CS-CRC methods. Our proposed Hybrid-KCRC method thus achieved superior performance to these methods.

•
We took the UC-Merced dataset as an example and evaluated the performance of our proposed Hybrid-KCRC method per class with a confusion matrix. From the confusion matrix, we can see that the Hybrid-KCRC method is better than other methods in most categories.

•
It is true that there are several pretrained models available for feature extraction, and the ResNet model outperforms the VGG model. In this paper, however, we paid more attention to the design of the classifier rather than feature extraction; we only extracted remote-sensing image features to complete the classification task. VGG is also a very popular candidate model for extracting CNN activations of images. These are the reasons why we chose VGG. As a matter of fact, our method could be further improved by using better pretrained feature-extraction models. To demonstrate this, we also extracted CNN features with the pretrained ResNet model [44], using layer pool5, which yields a 2048-dimensional vector for each image. The final feature of each image was ℓ2-normalized. The experimental results are shown in Table 6. For a fair comparison on each dataset, we fixed the training-set ratios of the UC-Merced dataset, the WHU-RS19 dataset, the RSSCN7 dataset, and the AID dataset to 80%, 50%, 60%, and 50%, respectively. From Table 6, we can see that the stronger ResNet features further improve the classification accuracy. For the RBF kernel function, the metric differs from the linear kernel: two points x and y that are close in the linear kernel space are even closer in the RBF kernel space, while distant points are pushed further apart. This makes the representation more discriminative and achieves higher classification accuracy. For the polynomial kernel, the linear kernel is a special case (p = 0, q = 1). Note that the kernel function κ(x, y) can be approximated by φ^T(x)φ(y) [45] to save time for the learning algorithm. We will adopt such an approximation of the kernel κ(x, y) to save time in future work.
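The four kernels discussed above can be written in a few lines; the exact parameterizations used in the paper may differ, so the definitions below (including the polynomial form (xᵀy + p)^q) are hedged sketches over column-wise feature matrices:

```python
import numpy as np

def linear_kernel(A, B):
    # plain inner products between columns of A and B
    return A.T @ B

def poly_kernel(A, B, p=0.0, q=1):
    # (x^T y + p)^q ; with p = 0, q = 1 this reduces to the linear kernel
    return (A.T @ B + p) ** q

def rbf_kernel(A, B, sigma=1.0):
    # exp(-||x - y||^2 / (2 sigma^2)): close points keep high similarity,
    # distant points are pushed towards zero similarity
    sq = np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2 * A.T @ B
    return np.exp(-sq / (2 * sigma**2))

def hellinger_kernel(A, B):
    # sum_i sqrt(x_i * y_i), defined for non-negative features
    return np.sqrt(A).T @ np.sqrt(B)
```

Any of these can be passed to a kernelized coder, since the hybrid solution only ever touches the data through Gram matrices.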

•
In the literature [23], the authors proposed a joint collaborative representation (CR) classification method that uses several complementary features to represent images, including spectral values, spectral gradient features, Gabor texture features, and DMP features. Such multifeature fusion can also be implemented via our proposed method. In the literature [24], the authors proposed a spatial-aware CR for hyperspectral image classification that uses both spatial and spectral features to represent an image. Its penalty term can also be added to the objective function of our proposed method.

Conclusions
In this paper, we proposed a hybrid collaborative representation-based algorithm via embedding class-specific collaborative representation into conventional collaborative representation-based classification to improve the performance of classifying remote-sensing images. The proposed method is capable of balancing class-specific collaborative representation and shared collaborative representation. Moreover, we extended the proposed method to arbitrary kernel space to explore the nonlinear characteristics hidden in remote-sensing image features to further enhance classification performance. Extensive experiments on four benchmark remote-sensing image datasets have demonstrated the superiority of our proposed hybrid collaborative representation algorithm.