Kernel Supervised Ensemble Classifier for the Classification of Hyperspectral Data Using Few Labeled Samples

Kernel-based methods and ensemble learning are two important paradigms for the classification of hyperspectral remote sensing images. However, they were developed in parallel with different principles. In this paper, we aim to combine the advantages of both paradigms by proposing a kernel supervised ensemble classification method. The proposed method, namely RoF-KOPLS, combines the merits of ensemble feature learning (i.e., Rotation Forest (RoF)) and kernel supervised learning (i.e., Kernel Orthonormalized Partial Least Squares (KOPLS)). Specifically, the feature space is randomly split into K disjoint subspaces, and KOPLS is applied to each subspace to produce the new feature set for training a decision tree classifier. The final classification result is obtained by the majority voting rule. Experimental results on two airborne hyperspectral images demonstrate that RoF-KOPLS with the radial basis function (RBF) kernel yields the best classification accuracies, owing to its ability to improve both the accuracy of the base classifiers and the diversity within the ensemble, especially for very limited training sets. Furthermore, the proposed method is insensitive to the number of feature subsets.


Introduction
Hyperspectral remote sensing images, which record hundreds of contiguous spectral bands for each pixel, contain a wealth of spectral information. The growing availability of hyperspectral imagery has opened up new areas of investigation in urbanization, land cover mapping, surface material analysis and target detection with improved accuracy [1][2][3][4][5]. The rich spectral information in hyperspectral images offers great potential for generating more accurate classification maps than those produced from multispectral images.
However, the high dimensionality of the data combined with the relatively small size of the training set gives rise to the well-known Hughes phenomenon, which limits the performance of supervised classification methods [6]. Many strategies have been proposed to alleviate this problem. As far as classification algorithms are concerned, ensemble learning (or classifier ensembles) has been shown to mitigate the conflict between small training sets and high dimensionality. Furthermore, ensemble learning has proved to provide better and more robust solutions in numerous remote sensing applications [7][8][9], given the variety of available classification algorithms and the complexity of hyperspectral data. The effectiveness of an ensemble method relies on the diversity and accuracy of its base classifiers [10,11]. Since an ensemble is typically more effective than a single classifier, many approaches have been developed and widely used in remote sensing applications [12][13][14][15][16]. For instance, [15] applied multiple classifiers (e.g., Bagging, Boosting and consensus theory) to multisource remote sensing data, and demonstrated that they outperformed several traditional classifiers in terms of accuracy. The authors of [16] suggested that the Random Forest (RF) classifier performed equally to or better than support vector machines (SVMs) for the classification of hyperspectral data. In particular, special attention has been paid to Rotation Forest (RoF), a relatively new classifier ensemble that can simultaneously improve the accuracy of the individual classifiers and the diversity within the ensemble [17]. The authors of [18][19][20] adapted RoF to classify hyperspectral images and found that it achieved better performance than traditional ensemble methods, e.g., Bagging, AdaBoost and RF. The authors of [21] applied RoF and RF to fully polarimetric SAR image classification using polarimetric and spatial features, and demonstrated that RoF can achieve better accuracy than SVM and RF.
Although RoF has demonstrated great performance in the classification of hyperspectral data, the feature extraction methods used in RoF in previous studies have been limited to unsupervised ones, e.g., principal component analysis (PCA). RoF builds classifier ensembles of independent decision trees by using feature extraction and random subspaces, so that each tree is trained on the training samples in a rotated feature space. It must be pointed out that, in RoF, all the components derived from the feature extraction are kept, so the discriminatory information is preserved even when it lies within the components responsible for the least variance [17]. Depending on whether prior class information is available, feature extraction as a pre-processing step of hyperspectral image analysis can be categorized into unsupervised and supervised methods [22,23].
In terms of feature reduction, PCA is one of the most popular unsupervised feature extraction methods in the remote sensing community [24,25]. In contrast, supervised methods take prior class information into account to increase the separability of the classes. A number of supervised feature extraction approaches have been developed, e.g., Fisher's linear discriminant analysis (FLDA) [26], partial least squares regression (PLS) [27] and orthonormalized partial least squares regression (OPLS) [28]. In the remote sensing community, a modified FLDA was presented for the dimensionality reduction of hyperspectral imagery, in which the desired class information was well preserved and separated in the low-dimensional space [29]. The authors of [30] found that PLS was superior to PCA for the joint goals of discrimination and dimensionality reduction. OPLS is a variant of PLS applicable to supervised problems, with certain optimality guarantees relative to PLS. Moreover, since OPLS projections are obtained so as to predict the output labels, much more discriminative projection vectors are extracted than with LDA or PLS [31,32].
A critical shortcoming of the supervised feature extraction methods mentioned above is that they assume a linear relation between the input and output spaces, which does not reflect real data behavior [31,33,34]. To alleviate this problem, kernel methods have been developed and applied to feature selection and feature reduction in hyperspectral imagery [35,36]. Moreover, as far as OPLS is concerned, the estimation of its required parameters is inaccurate without a sufficient training set [37]. To circumvent these limitations, a non-linear version of OPLS, i.e., kernel OPLS (KOPLS), has been developed [38]. It is a very powerful feature extractor owing to its appealing property of obtaining non-linear projections by means of kernel functions. In [31], experimental results revealed that KOPLS largely outperformed the traditional (linear) PLS algorithm, especially in the context of non-linear feature extraction.
In view of the above, in this paper we propose a novel kernel supervised feature learning classification scheme, namely RoF-KOPLS, which takes advantage of the merits of KOPLS and RoF simultaneously. In the training step, the feature space is randomly split into K disjoint subspaces, and KOPLS is applied to each subspace to generate the kernel matrix and the transformation matrix. All the extracted features are then retained to form the new feature set for training a decision tree (DT) classifier. In the prediction step, the new feature set of the test samples is obtained from the kernel matrix and the transformation matrix, and then used to predict the class labels. The final classification result is assigned to the class that receives the maximum number of votes. We would like to emphasize that in this work we focus on pixel-wise classification, although RoF can be combined with spatial information, such as Markov random fields [20]. To examine the effectiveness of the proposed classification algorithm, experiments were conducted on two different airborne hyperspectral images: an AVIRIS image acquired over the Indian Pines site in Northwestern Indiana and a ROSIS image of the University of Pavia, Italy.
The remainder of this paper is organized as follows. Section 2 introduces Rotation Forest and OPLS. Section 3 describes the proposed classification scheme based on OPLS, KOPLS and RoF. Experimental results obtained on two different hyperspectral images are presented in Section 4 and discussed in Section 5. Finally, conclusions and future research lines are given.

Rotation Forest
Rotation Forest is an ensemble classifier that builds independent decision trees on different sets of extracted features [17]. The main steps of RoF are summarized as follows: (1) the feature space is randomly split into K disjoint subsets, each containing M features; (2) PCA is applied to each feature subset using a bootstrap sample of 75% of the size of the original training set; (3) a sparse rotation matrix R_i is constructed by concatenating the coefficients of the principal components of each subset; (4) an individual DT classifier is trained on the new training samples formed by concatenating the M linearly extracted features of each subset; (5) by repeating the above steps several times, multiple classifiers are generated, and the final result is obtained by combining the outputs of all classifiers. The main training and prediction steps of RoF are shown in Algorithm 1. The classification and regression tree (CART) is adopted as the base classifier in this paper because of its sensitivity to rotations of the axes [39]. The Gini index is used to select the best split in the construction of the DT.
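As a rough illustration, the procedure just described can be sketched as follows. This is a minimal sketch with hypothetical helper names, not the authors' implementation; per-subset centering of the PCA inputs is omitted for brevity, and the bootstrap sample is assumed to be larger than each feature subset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def train_rotation_forest(X, y, T=10, K=4, seed=0):
    rng = np.random.default_rng(seed)
    n, D = X.shape
    ensemble = []
    for _ in range(T):
        # (1) split the feature indices into K disjoint subsets
        subsets = np.array_split(rng.permutation(D), K)
        R = np.zeros((D, D))  # (3) sparse rotation matrix
        for idx in subsets:
            # (2) PCA on a 75% bootstrap sample, restricted to this subset
            boot = rng.choice(n, size=int(0.75 * n), replace=True)
            pca = PCA().fit(X[np.ix_(boot, idx)])  # all components are kept
            R[np.ix_(idx, idx)] = pca.components_.T
        # (4) train one tree on the rotated training samples
        tree = DecisionTreeClassifier(random_state=0).fit(X @ R, y)
        ensemble.append((R, tree))
    return ensemble

def predict_rotation_forest(ensemble, X):
    # (5) combine the trees by averaging their class probabilities
    probs = np.mean([t.predict_proba(X @ R) for R, t in ensemble], axis=0)
    return ensemble[0][1].classes_[np.argmax(probs, axis=1)]
```

Each tree sees all the training samples, but in a feature space rotated by its own per-subset PCA coefficients; this is what introduces diversity while preserving all discriminatory information.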

Orthonormalized Partial Least Square (OPLS)
OPLS is a multivariate analysis method for feature extraction, which exploits the correlation between the features and the target data by combining the merits of canonical variate analysis and PLS [28,31,32]. Given a set of training samples {X, Y} = {x_i, y_i}_{i=1}^{n}, where x_i ∈ R^D and y_i ∈ R, n and D represent the number of training samples and the dimensionality, respectively. We denote by X̃ and Ỹ the column-wise centered versions of X and Y, and by d the number of features extracted from the original data. Let C_XY = (1/n) X̃^⊤ Ỹ denote the covariance between X and Y, whereas the covariance matrix of X is given by C_XX = (1/n) X̃^⊤ X̃. U ∈ R^{D×d} is referred to as the projection matrix, so the extracted features can be formulated as X' = X̃U.
The objective of OPLS is formulated as

maximize Tr{U^⊤ C_XY C_XY^⊤ U}  subject to  U^⊤ C_XX U = I,   (1)

where Tr{·} denotes the matrix trace and I is the identity matrix. OPLS is optimal (in the mean-square-error sense) for performing linear multiregression on a given number of features extracted from the input data [40].
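For illustration, the trace objective above can be solved as the generalized eigenvalue problem C_XY C_XY^⊤ u = λ C_XX u. The sketch below is our own illustration (not the authors' implementation); a small regularization term is added so that C_XX is invertible, and Y is assumed to be a column matrix of targets (e.g., one-hot class labels).

```python
import numpy as np
from scipy.linalg import eigh

def opls(X, Y, d, reg=1e-8):
    # column-wise centering, as in the text
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxy = Xc.T @ Yc / n                              # cross-covariance C_XY
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # covariance C_XX (regularized)
    # generalized eigenproblem: C_XY C_XY^T u = lambda C_XX u
    w, U = eigh(Cxy @ Cxy.T, Cxx)
    U = U[:, ::-1][:, :d]                            # eigh returns ascending eigenvalues
    return Xc @ U, U                                 # extracted features and projections
```

The eigenvectors returned by `scipy.linalg.eigh` for a generalized problem are B-orthonormal, so the constraint U^⊤ C_XX U = I in Equation (1) is satisfied by construction.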

Algorithm 1 Rotation Forest
Training phase
Input: {X, Y} = {x_i, y_i}_{i=1}^{l}: training samples; T: number of classifiers; K: number of subsets (M: number of features in each subset); L: base classifier. The ensemble L = ∅. F: feature set.
Output: the ensemble L.
1: for i = 1 : T do
2:   randomly split the features F into K subsets F_{i,j}
3:   for j = 1 : K do
4:     form the new training set X_{i,j} with F_{i,j}
5:     generate X̃_{i,j} by bootstrap sampling 75% of the initial training samples
6:     apply PCA to X̃_{i,j} to obtain the coefficients v
7:   end for
8:   compose the sparse rotation matrix R_i from the above coefficients
9:   rearrange R_i into R_i^a so that it corresponds to the original feature order
10:  build a DT classifier L_i using (X R_i^a, Y)
11:  add the classifier to the current ensemble, L = L ∪ L_i
12: end for
Prediction phase
Input: the ensemble L = {L_i}_{i=1}^{T}; a new sample x*; rotation matrices R_i^a.
Output: class label y*.
1: compute the output of each classifier on x* R_i^a
2: compute the confidence of each class y_j by the average combination rule: p(y_j | x*) = (1/T) Σ_{i=1}^{T} p(y_j | x* R_i^a)
3: assign x* to the class with the largest confidence.

Kernel Orthonormalized Partial Least Square (KOPLS)
OPLS assumes a linear relation between the input features and the labels, and may not be applicable when this linearity assumption does not hold. Kernel methods have been developed to alleviate this problem and have proved effective in many application domains [41,42]. In kernel methods, the original input data are mapped into a high- or even infinite-dimensional feature space by a non-linear function. The core of kernel methods lies in the implicit non-linear mapping, since only inner products are needed in the transformation [38,43].
Let us consider a function φ : R^D → H that maps the input data into a Reproducing Kernel Hilbert Space H of very high or even infinite dimension. The input variables {x_i, y_i}_{i=1}^{n} are thus mapped to {φ(x_i), y_i}_{i=1}^{n}. Let Φ ∈ R^{n×dim(H)} denote the matrix whose i-th row is φ(x_i). The extracted features are then given by Φ' = Φ̃U.
The kernel version of OPLS can be expressed as

maximize Tr{U^⊤ Φ̃^⊤ Ỹ Ỹ^⊤ Φ̃ U}  subject to  U^⊤ Φ̃^⊤ Φ̃ U = I,

where Φ̃ is the centered version of Φ.
According to the Representer Theorem [41], each projection vector in U can be written as a linear combination of the training data, i.e., U = Φ̃^⊤ A, where A = [α_1, …, α_d] and α_i is the column vector containing the coefficients of the i-th projection vector [31]; A becomes the new argument of the maximization problem. The KOPLS method can then be reformulated as

maximize Tr{A^⊤ K_x Ỹ Ỹ^⊤ K_x A}  subject to  A^⊤ K_x K_x A = I,

where the kernel matrix is defined as K_x = Φ̃ Φ̃^⊤. In this paper, three kernels are used:
• Linear kernel: k(x, z) = x^⊤ z;
• Polynomial kernel: k(x, z) = (x^⊤ z + c)^p;
• Radial basis function (RBF) kernel: k(x, z) = exp(−||x − z||² / (2σ²)).
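A minimal sketch of this reformulation (our own illustration, not the authors' implementation) solves the generalized eigenproblem K_x Ỹ Ỹ^⊤ K_x α = λ K_x K_x α for the coefficient matrix A, with a small regularizer to keep the right-hand side positive definite; the helper `rbf_kernel` and the function names are our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(X, Z, sigma):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kopls(K, Y, d, reg=1e-6):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = H @ K @ H                        # kernel centered in feature space
    Yc = Y - Y.mean(axis=0)
    # generalized eigenproblem for the coefficient matrix A
    A_lhs = Kc @ Yc @ Yc.T @ Kc
    B_rhs = Kc @ Kc + reg * np.eye(n)
    w, A = eigh(A_lhs, B_rhs)
    A = A[:, ::-1][:, :d]                 # keep the top-d eigenvectors
    return Kc @ A, A                      # training features and coefficients
```

For a new sample x*, the extracted features follow from the kernel evaluations against the training samples, k(x*, x_i), centered in the same way and multiplied by A.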

Rotation Forest with OPLS
Rotation Forest with OPLS (RoF-OPLS) is a variant of RoF. The major difference between RoF and RoF-OPLS is that OPLS, rather than PCA, is used to extract the features. The main steps of RoF-OPLS are: first, divide the feature space into K disjoint subspaces; then apply OPLS to each subspace using a bootstrap sample of 75% of the training set; next, use the new training set, obtained by rotating the original training set, as the input to an individual classifier; finally, repeat the above steps several times and generate the final result by combining the outputs of all classifiers.

Rotation Forest with KOPLS
The success of multiple classifier systems (MCSs) depends not only on the choice of base classifier but also on the diversity within the ensemble [12,44]. Aiming to improve both the diversity and the classification accuracy of the DT classifiers within the ensemble, we propose a novel ensemble method, Rotation Forest with KOPLS (RoF-KOPLS), which combines the advantages of KOPLS and RoF. The proposed method can be summarized as follows (see Algorithm 2 and Figure 1). In the training phase, the feature space is randomly split into K disjoint subspaces. For each subset, 75% of the initial training samples are drawn from the training data by bootstrap sampling, and KOPLS is applied to each subspace to obtain the coefficients R_k. Next, the kernel matrices of X̃_{i,j}, K_{i,j}^{train} = K(X̃_{i,j}, X̃_{i,j}), are calculated, and an individual classifier is trained on the extracted features F_i^{new}. In the prediction phase, the kernel matrices between X̃_{i,j} and a new sample x* are generated first. Then the transformed data F_i^{test} are classified by the ensemble, and the final result is assigned to the corresponding class by the majority voting rule. We expect RoF-KOPLS to improve on RoF-OPLS by introducing further diversity through kernel feature extraction within the ensemble. The base classifiers in RoF-KOPLS are expected to be more diverse than those in RoF-OPLS, thus yielding a more powerful ensemble. Furthermore, depending on the type of kernel function, RoF-KOPLS can be instantiated as RoF with a linear kernel (RoF-KOPLS-Linear), RoF with a polynomial kernel (RoF-KOPLS-Polynomial), and RoF with an RBF kernel (RoF-KOPLS-RBF).
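The final combination rule, p(y_j | x*) = (1/T) Σ_i p(y_j | F_i^test), amounts to averaging the per-classifier class probabilities and picking the class with the largest confidence. A small sketch (with a hypothetical function name):

```python
import numpy as np

def soft_vote(prob_list):
    """Average the class probabilities of T classifiers and pick the argmax,
    i.e. p(y_j | x*) = (1/T) * sum_i p(y_j | F_i_test)."""
    probs = np.mean(prob_list, axis=0)   # shape (n_samples, n_classes)
    return np.argmax(probs, axis=1), probs
```

For hard majority voting instead, one would count the argmax of each classifier's output and take the mode; the averaged-probability rule above is the combination used in Algorithm 2.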

Experimental Results
Two popular airborne hyperspectral images were used for the experiments. Detailed descriptions of the two data sets and the corresponding results are given in the next two subsections.
The following measures were used to evaluate the performance of the different classification approaches:
• Overall accuracy (OA) is the percentage of correctly classified pixels.
• Average accuracy (AA) is the average of the class-specific percentages of correctly classified pixels.
• Kappa coefficient (κ) is the percentage of agreement corrected by the level of agreement expected by chance [23].
For the purpose of analysing the ensembles in more detail, we adopted the following measures to estimate their performance.
• Average of OA (AOA) is the average of OAs of individual classifiers within the ensemble.
• Diversity within the classifier ensemble. Diversity has been regarded as a very significant characteristic of classifier ensembles [45]. In this paper, the coincident failure diversity (CFD) is used as the diversity measure [10]. The higher the value of CFD, the more diverse the ensemble.
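For concreteness, the three accuracy measures can be computed from a confusion matrix as sketched below. This is our own illustration; it assumes integer class labels 0..n_classes−1 and that every class occurs in the reference data.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """OA, AA and the kappa coefficient from a confusion matrix C,
    where C[i, j] counts samples of class i predicted as class j."""
    C = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    n = C.sum()
    oa = np.trace(C) / n                            # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))        # average of class accuracies
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / n ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

AOA then follows by averaging the OA of each base classifier over the ensemble, and CFD is computed from the joint failure pattern of the base classifiers as defined in [10].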

Results of the AVIRIS Indian Pines Image
The Indian Pines image was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over an agricultural area at the Indian Pines test site in Northwestern Indiana. The image is 145 × 145 pixels, with a spatial resolution of 20 m per pixel, and comprises 220 spectral channels in the wavelength range from 0.4 to 2.5 µm. In order to evaluate the performance of the proposed methods, all spectral bands, including 20 noisy and water-absorption bands, were used in the experiments. The sixteen classes of interest are reported in Table 1. Figure 2 depicts a three-band false color composite of the image and the corresponding reference data.
To evaluate the performance of the proposed classification techniques, several methods were implemented for comparison: support vector machines (SVMs), DT, RotBoost [46,47], DT with KOPLS (DT-KOPLS), and RoF-PCA. SVMs and DT were selected because they are two of the leading classification techniques for hyperspectral data. For SVM, the radial basis function kernel was chosen, which involves two parameters (the penalty term C and the width of the exponential σ). Five-fold cross-validation was used to select the best parameter combination, with C and σ searched in [2^−4, 2^12] and [2^−10, 2^5], respectively. DT-KOPLS is a variant of DT in which KOPLS is used for feature extraction prior to the DT classifier; the number of extracted components ranges from 2 to 30, and three kernels, i.e., linear, RBF and polynomial, are used, with only the best results reported. RoF-PCA is an ensemble method using independent DTs built on different sets of extracted features, where the feature extraction is based on PCA. The kernel width σ of the RBF kernel was computed as the median of all pairwise distances between the samples [48], and c in the polynomial kernel was set to 2. The reported results were obtained by averaging over ten Monte Carlo runs. Following our previous studies [19,20], T was set to 10 in the ensembles.
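The median heuristic for the RBF kernel width mentioned above can be sketched as follows (our own illustration of the rule attributed to [48]):

```python
import numpy as np

def median_sigma(X):
    """RBF kernel width sigma as the median of all pairwise
    Euclidean distances between the samples."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # keep each pair only once (strict upper triangle)
    dists = np.sqrt(d2[np.triu_indices_from(d2, k=1)])
    return np.median(dists)
```

This heuristic is parameter-free and scales the kernel to the typical spread of the data, which is convenient when the training set is too small for cross-validating σ.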
The number of features in a subset (M) is a crucial parameter for Rotation Forest ensembles. To investigate the impact of M on the performance of the different classification schemes, we randomly selected a very limited training set, i.e., 10 samples per class. The evolution of OA with increasing M is depicted in Figure 3. Note that the value of M must be less than the number of classes for RoF-OPLS. For the other methods, M ranges from 2 to 110. The results in Figure 3 show no consistent pattern in the relationship between M and OA, which is in accordance with the conclusions of our previous studies [20,49]. The OAs obtained by RoF-KOPLS-Linear and RoF-KOPLS-Polynomial decrease as M increases. In particular, it is worth noting that RoF-KOPLS-RBF obtains the best OAs in all cases. Furthermore, RoF-KOPLS-RBF is insensitive to M in comparison with the other classification methods when M is greater than the number of classes (i.e., 16). Another observation is that the optimal value of M varies across classification methods; for instance, RoF-KOPLS-RBF achieves its best result at M = 100. To ensure a fair comparison, the optimal value of M was selected independently for each method: the optimal values of M for RoF-OPLS, RoF-PCA, RoF-KOPLS-RBF, RoF-KOPLS-Linear, and RoF-KOPLS-Polynomial were set to 14, 100, 100, 4 and 4, respectively. Figure 4 plots the classification maps obtained by the individual and ensemble learning methods (only one Monte Carlo run).

Results of the University of Pavia ROSIS Image
In the second study, the proposed scheme was tested on the ROSIS image, which was collected over a university area with a spatial resolution of 1.3 m. The original image has a spatial dimension of 610 × 340 pixels; after removing 12 noisy bands, 103 channels were left for the experiments. Nine classes of interest are contained in the reference data, with a total of 42776 labeled samples. A false color composite and the reference data are shown in Figure 5. For this experiment, we randomly selected only 10 samples per class as training samples, which represents a very limited training set, and conducted ten independent runs of training sample selection and classification to ensure a fair comparison. In the first experiment, the impact of M on the global accuracies obtained by all classification approaches was investigated. For the RoF-OPLS algorithm, the value of M must be less than the number of classes; hence, the values of M were set to 4, 5, 7 and 8. This limitation does not apply to the RoF-KOPLS and RoF-PCA methods, for which M was varied from 4 to 60 to clearly examine its effect on the OAs. Figure 6 shows the OAs obtained by the different methods as a function of M. Conclusions similar to those of the first experiment can be drawn. First, the performance of the RoF methods depends on the value of M, although RoF-KOPLS-RBF is insensitive to M compared to the other classification techniques when M is greater than 9 (the number of classes). Second, the impact of M on OA appears to be irregular. Third, the overall accuracies obtained by RoF-KOPLS-RBF are higher than those achieved by all other methods. Finally, the overall accuracies obtained by RoF-KOPLS-Linear and RoF-KOPLS-Polynomial exhibit larger variations as M increases, whereas those of RoF-KOPLS-RBF tend to be stable. To make fair comparisons, the value of M achieving the best accuracy was selected for each classification algorithm; consequently, the values of M for RoF-OPLS, RoF-PCA, RoF-KOPLS-RBF, RoF-KOPLS-Linear, and RoF-KOPLS-Polynomial were set to 8, 20, 20, 4 and 7, respectively. Figure 7 depicts the classification maps obtained by all the considered methods.

Discussion on the AVIRIS Indian Pines Image
The overall and class-specific accuracies of the different classification algorithms are presented in Table 1. The results reveal that the classifier ensembles yield higher accuracies than the single classifiers. The proposed RoF-KOPLS-RBF method provides good results, roughly equivalent to the recently proposed RotBoost, followed by RoF-PCA and RoF-OPLS. Furthermore, it should be noted that the proposed RoF-KOPLS-RBF method achieves considerable gains in most class-specific accuracies, significantly outperforming the others. McNemar's test revealed that the difference between RoF-KOPLS-RBF and RoF-OPLS is statistically significant (|z| > 1.96) [50]: the kernel-based method improves the accuracies by 8.06% in OA and 6.16% in AA. Furthermore, as can be seen from Figure 4, the Rotation Forest ensembles improve the classification accuracies and produce smoother classification maps. These results validate the good performance of the proposed RoF-KOPLS-RBF, obtained by combining KOPLS and RoF.
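The McNemar z statistic used here compares two classifiers through their disagreements. A sketch (our own, with hypothetical names; it assumes the two classifiers disagree on at least one sample, so the denominator is non-zero):

```python
import numpy as np

def mcnemar_z(y_true, pred_a, pred_b):
    """z statistic of McNemar's test between two classifiers; |z| > 1.96
    indicates a significant difference at the 5% level.
    f12: samples classifier A gets right and B gets wrong; f21: the reverse."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    f12 = np.sum(a_ok & ~b_ok)
    f21 = np.sum(~a_ok & b_ok)
    return (f12 - f21) / np.sqrt(f12 + f21)
```

Because the test only considers the samples on which the two classifiers disagree, it is well suited to comparing classification maps produced from the same test set.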
The number of classifiers (T) and the number of training samples are key parameters for the proposed method. To investigate the influence of T on the classification accuracies, we computed the classification results with the number of features in a subset M set to 100. As can be seen from Figure 8a, the classification accuracies improve as T increases. Table 2 presents the classification accuracies obtained by the individual classifiers using different numbers of training samples. As reported in the table, the proposed RoF-KOPLS-RBF, RoF-KOPLS-Linear, RoF-KOPLS-Polynomial, and RoF-OPLS methods are superior to DT and DT-KOPLS. RoF-KOPLS-RBF, RoF-OPLS, and RoF-PCA achieve better classification accuracies than SVM. The proposed RoF-KOPLS-RBF method obtains the best classification results under most training scenarios compared to the other classification techniques. As can be seen from Table 2, when compared with the recent RotBoost method, the proposed method is equivalent or superior. Therefore, it can be concluded that RoF-KOPLS-RBF works efficiently with a relatively low number of labeled training samples.
Table 3 provides the OAs, AOAs, and diversities obtained by the different RoF ensembles using 10 samples per class. The accuracy of the individual classifiers and the diversity are two important properties of a classifier ensemble, as higher values of AOA and diversity generally give rise to better performance. The results in this table show that the proposed RoF-KOPLS-RBF method achieves the highest AOA and diversity, leading to the best classification accuracies. Furthermore, it is worth noting that the effect of the kernel function on the classification accuracies is significant: RoF-KOPLS-RBF obtains better classification results than RoF-KOPLS-Linear and RoF-KOPLS-Polynomial, which can be attributed to its higher values of AOA and diversity. The classification accuracies of all the classification techniques are summarized in Table 4. The best OA, kappa coefficient, and class-specific accuracies for most classes are achieved by the proposed RoF-KOPLS-RBF method, followed by the RotBoost, RoF-PCA and RoF-OPLS approaches. In this case, the OA of RoF-KOPLS-RBF is improved by 5.46% compared to RoF-OPLS. According to the results of McNemar's test, the RoF-KOPLS-RBF classification map is significantly more accurate than those achieved by the other methods, except RotBoost, at a confidence level of 5%. We can conclude that the proposed RoF-KOPLS-RBF method inherits the merits of KOPLS and RoF, thus leading to improved classification results.
As in the first experiment, the impact of T and of the number of training samples on the classification results was also explored. When investigating the influence of T on the classification accuracies, the number of features in a subset M was set to 20, which achieves the best accuracy for the proposed method. Figure 8b shows the OA (%) for different values of T; as T increases, the classification results are significantly improved. Table 5 gives the OAs and AAs (in parentheses) obtained by the different classification approaches for different numbers of training samples. As expected, the classification accuracies obtained by all methods become higher as the training set size increases. Analogous to the first experiment, the proposed RoF-KOPLS-RBF method demonstrates relatively higher performance with a very limited number of training samples in terms of OA and AA, compared to the other classification approaches. Moreover, from Figure 7, we can see that the Rotation Forest ensembles generate more accurate classification maps with reduced noise in comparison with the individual classifiers.
The OAs, AOAs, and diversities obtained by the Rotation Forest ensembles are reported in Table 6 to evaluate the ensembles in detail. The proposed RoF-KOPLS-RBF approach gives the highest AOA and diversity compared to the other classification approaches, and attains the best overall accuracy, since higher AOA and diversity lead to better ensemble performance; this confirms the validity of combining the merits of KOPLS and Rotation Forest. As can be seen from the table, the kernel function has a significant impact on the classification accuracies, similarly to the first experiment: RoF-KOPLS-RBF achieves higher values of AOA and diversity than RoF-KOPLS-Linear and RoF-KOPLS-Polynomial, leading to better classification results.
In addition, it should be noted that, although the proposed method has shown good performance in the classification of hyperspectral data, it shares some common drawbacks of Rotation Forest, e.g., relatively low computational efficiency and sensitivity to the number of features in a subset [21]. Moreover, the proposed method considers only the spectral information, so it obtains suboptimal classification results compared to methods that exploit the spatial and spectral information simultaneously [20].

Conclusions
In this paper, a new classification approach has been presented that combines the advantages of a kernel-based feature extraction method, i.e., KOPLS, and an ensemble method, i.e., Rotation Forest. The performance of the proposed method was evaluated in several experiments on two popular hyperspectral images. The experimental results demonstrated that the proposed RoF-KOPLS methodology inherits the merits of RoF and KOPLS and achieves more accurate classification results.
The following conclusions can be drawn from the experimental results:
• RoF-KOPLS with the RBF kernel yields the best accuracies among the compared methods, owing to its ability to improve both the accuracy of the base classifiers and the diversity within the ensemble, especially for very limited training sets.
• In RoF-KOPLS, the kernel function has a significant influence on the classification results; the RBF kernel obtained the best performance.
• RoF-KOPLS with the RBF kernel is insensitive to the number of features in a subset compared to the other methods.
In the future, we will further explore the integration of Rotation Forest and kernel methods in classifier ensembles for real applications of hyperspectral imagery. On the one hand, we will attempt to combine the proposed method with AdaBoost or Bagging [51]. On the other hand, given the important role of spatial features in the classification of hyperspectral images [52], spatial information will be incorporated to improve the performance of the proposed classification scheme.


Algorithm 2 Rotation Forest with KOPLS
Training phase
Input: {X, Y} = {x_i, y_i}_{i=1}^{l}: training samples; T: number of classifiers; K: number of subsets; M: number of features in a subset; L: base classifier. The ensemble L = ∅. F: feature set.
Output: the ensemble L.
1: for i = 1 : T do
2:   randomly split the features F into K subsets F_{i,j}
3:   for j = 1 : K do
4:     form the new training set X_{i,j} with F_{i,j}
5:     randomly select 75% of the initial training samples to generate X̃_{i,j}
6:     apply KOPLS to X̃_{i,j} to obtain the transformation coefficients
7:     calculate the kernel matrix K_{i,j}^{train} = K(X̃_{i,j}, X̃_{i,j})
8:   end for
9:   concatenate the extracted features of all subsets into F_i^{new}
10:  train a DT classifier L_i using (F_i^{new}, Y)
11:  add the classifier to the current ensemble, L = L ∪ L_i
12: end for
Prediction phase
Input: the ensemble L = {L_i}_{i=1}^{T}; a new sample x*; rotation matrix R.
Output: class label y*.
1: for i = 1 : T do
2:   for j = 1 : K do
3:     generate the kernel matrix between X̃_{i,j} and x*, and form the test features F_i^{test} of x*
4:   end for
5:   run the classifier L_i using F_i^{test} as input
6: end for
7: calculate the confidence of each class, p(y_j | x*) = (1/T) Σ_{i=1}^{T} p(y_j | F_i^{test}), and assign x* to the class with the largest confidence.

Figure 2. AVIRIS Indian Pines data set. (a) Three-band color composite (bands 57, 27, 17); (b) Ground-truth map containing 16 mutually exclusive land-cover classes. The legend of this scene is shown at the bottom.
Figure 3. Overall accuracies of the different classification methods as a function of the number of features in a subset (M) for the Indian Pines AVIRIS image.

Figure 4 .
Figure 4. Classification maps of the Indian Pines AVIRIS image (only one Monte Carlo run). OAs of the classifiers are as follows: (a) DT (40.20%); (b) RoF-PCA (57.39%); (c) RoF-OPLS (54.97%); (d) RoF-KOPLS-Linear (45.39%); (e) RoF-KOPLS-Polynomial (42.80%); (f) RoF-KOPLS-RBF (64.25%).

Figure 5. ROSIS University of Pavia data set. (a) Three-band color composite (bands 102, 56, 31); (b) Reference map containing 9 mutually exclusive land-cover classes. The legend of this scene is shown at the bottom.
Figure 6. Overall accuracies of the different classification methods as a function of the number of features in a subset (M) for the University of Pavia ROSIS image.

Figure 8 .
Figure 8. Sensitivity to the number of trees (T). (a) Indian Pines AVIRIS image; (b) University of Pavia ROSIS image.

Table 1 .
Overall, Average and Class-specific Accuracies for the Indian Pines AVIRIS image.

Table 2 .
OAs and AAs (in Parentheses) Obtained for Different Classification Methods When Applied to the Indian Pines AVIRIS image.

Table 3 .
OAs (in Percent), AOAs (in Percent), and Diversities Obtained for Different Rotation Forest Ensembles When Applied to the Indian Pines AVIRIS Image.

Table 4 .
Overall, Average and Class-specific Accuracies for the Pavia ROSIS image.

Table 5 .
OAs and AAs (in Parentheses) Obtained for Different Classification Methods Using Different Numbers of Training Samples When Applied to the Pavia ROSIS Image.

Table 6 .
OAs (in Percent), AOAs (in Percent), and Diversities Obtained for Different Rotation Forest Ensembles When Applied to the Pavia ROSIS image.