An Improved Rotation Forest for Multi-Feature Remote-Sensing Imagery Classification

Multi-feature, especially multi-temporal, remote-sensing data have the potential to improve land cover classification accuracy. However, it is sometimes difficult to utilize all the features efficiently. To enhance classification performance based on multi-feature imagery, an improved rotation forest, combining Principal Component Analysis (PCA) and a boosting naïve Bayesian tree (NBTree), is proposed. First, feature extraction was carried out with PCA. The feature set was randomly split into several disjoint subsets; then, PCA was applied to each subset, and new training data for linear extracted features based on original training data were obtained. These steps were repeated several times. Second, based on the new training data, a boosting naïve Bayesian tree was constructed as the base classifier, which aims to achieve lower prediction error than the decision tree used in the original rotation forest. At the classification phase, the improved rotation forest has two-layer voting. It first obtains several predictions through weighted voting within each boosting naïve Bayesian tree; then, these first-layer predictions are combined by majority voting to obtain the final result. To examine the classification performance, the improved rotation forest was applied to multi-feature remote-sensing images, including MODIS Enhanced Vegetation Index (EVI) imagery time series, MODIS Surface Reflectance products and ancillary data in Shandong Province for 2013. The EVI imagery time series was preprocessed using harmonic analysis of time series (HANTS) to reduce the noise effects. The overall accuracy of the final classification result was 89.17%, and the Kappa coefficient was 0.71, which outperforms the original rotation forest and other classifier ensemble results, as well as the NASA land cover product. However, this new algorithm requires more computational time, meaning the efficiency needs to be further improved. Generally, the improved rotation forest has a potential advantage in remote-sensing classification.


Introduction
Frequently updated land cover data provide useful information for multi-temporal studies and are also required inputs for land cover change models, climate change models or post-catastrophe analysis [1,2]. High-temporal-frequency remote-sensing imagery offers a unique opportunity for acquiring land cover information through imagery interpretation and classification [3]. To generate updated land cover data at different scales, researchers have proposed a series of remote-sensing imagery classification techniques. Using the individual pixel as the basic analytical unit, the techniques can be grouped into one of three categories: unsupervised classification methods (e.g., ISODATA and K-means), supervised classification methods (e.g., decision trees, naïve Bayesian, support vector machine, artificial neural network, and maximum likelihood), and hybrid classification methods (e.g., semi-supervised learning and the fusion of supervised and unsupervised learning) [4,5]. As spatial resolution has increased, object-based classification methods have been proposed to address high-resolution remote-sensing images. In these methods, pixels with homogeneous properties are grouped into basic units, instead of being treated individually, and the spatial contextual information is considered [6][7][8].
With the development of remote-sensing data acquisition technology, remote-sensing imagery can be easily obtained through various sensors [9]. Under these circumstances, remote-sensing classification technology faces new challenges of processing multi-source data and obtaining higher accuracy predictions [10]. The current classification techniques have respective merits and shortcomings, but different classifiers provide complementary information for the correct classification results [1,11,12,13]. Therefore, one effective solution is to design classifier combinations that are either based on the same base classifiers trained on different data subsets or based on different classifiers trained on the same dataset. Schapire [14] proved that classifier combinations with a weak learning algorithm could achieve arbitrarily high accuracy. These classifier combinations are called classifier ensembles or multiple classifier systems (MCS) [15][16][17]. Bagging and boosting are two classical approaches for creating classifier ensembles at present. Bagging takes bootstrap samples of the original objects to generate base classifiers, which are then combined into an ensemble. The classification result is obtained by majority voting [18]. Boosting tries to boost the performance of a "weak" classifier by using it within an ensemble structure [19]. The classifiers in the ensemble are added one at a time so that each subsequent classifier is trained on data that have been "hard" for the previous ensemble members [20]. Within this scheme, a pixel in remotely-sensed imagery can be classified using classifier ensembles to improve the accuracy. A comparison of classification performance between random forest classifier ensembles and support vector machines (SVMs) by Pal [21] indicated that random forest performs as well as SVMs in terms of classification accuracy and training time and has even more advantages in certain aspects. Foody et al. [22] used a simple voting procedure to combine various binary classifier outputs to separate a specific class of interest from all others. Maulik and Chakraborty [23] proposed multiple classifier ensembles combining k-NN, an SVM and an incremental learning algorithm (IL) by majority voting to obtain a more accurate classification result for land cover data compared to any of the single classifiers.
Furthermore, researchers have tried many approaches to improve the performance of the classifier ensemble method. Instead of using the weights of the objects to train the next classifier, García-Pedrajas [24] used the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables. This method has been shown to achieve a better generalization error while being more robust to noise. Zhang and Zhou [25] proposed a semi-supervised ensemble method, in which the accuracy of base learners on labeled data was maximized, while the diversity among them on unlabeled data was also maximized. This method has been demonstrated to be highly competitive with other semi-supervised ensemble methods. Kim and Kang [26] utilized a neural network as a base classifier for a bagging ensemble method to obtain an improved performance over the traditional method. Rodríguez et al. [27] proposed a new classifier ensemble called rotation forest, in which the original data are projected into a new feature space using Principal Component Analysis (PCA); then, each base classifier (decision tree) is trained using the new training data for linear extracted features. The feature extraction by PCA encourages the diversity of each base classifier. The experimental results, based on hyperspectral remote-sensing imagery, revealed that rotation forest could produce more accurate results than bagging, AdaBoost, or random forest [28]. These previous studies indicate that the accuracy and diversity of base classifiers are two key features that affect the performance of the classifier ensembles [27][28][29].
Based on this finding, an improved ensemble method, drawing upon the rotation forest framework, is proposed that aims to further improve the diversity and accuracy of base classifiers. To increase the diversity of base classifiers in the ensemble, PCA, which outperforms non-parametric discriminant analysis (NDA) or random projections [29], was applied to subsets of the features for feature extraction, and a full feature set for each classifier was reconstructed. The feature extraction was repeated several times, and each repetition generated a different new training dataset. On each new training dataset, a boosting naïve Bayesian tree was introduced as a base classifier instead of the decision tree used in the original classifier ensemble. In this new method, two layers of voting were applied in the classification phase. We first obtained predictions using weighted voting within each boosting naïve Bayesian tree; then, the majority-voting rule was used to integrate the first-layer results to obtain the final result as the prediction of the classifier ensemble. To evaluate its performance, this method was applied to the classification of 2013 multi-feature remote-sensing imagery of Shandong Province, China.
The remainder of this paper is arranged as follows. In Section 2, we introduce the study area, data source and data filtering processing. Section 3 describes the improved classifier ensembles. Section 4 presents the experimental results and the analysis. Section 5 provides the discussion. Section 6 offers conclusions and addresses future work.

Study Area
Shandong Province is located on the eastern coast of China, and in the lower reaches of the Yellow River (34 1). To the east are the Bohai Sea and Huanghai Sea. The study area covers 157,100 km², which is 1.6% of China's total area. The western and northern areas of the province are flat plains, the central area is mountainous, and the eastern area consists of gentle hills. The main terrain in Shandong Province is plains, which cover 63% of the total area, followed by mountains and hills, which account for a combined 33%. The province has a warm temperate, humid and semi-humid monsoon climate. The characteristics of a marine climate are obvious in the eastern area, and the western area exhibits a continental climate. There are 17 cities in Shandong Province, and the population was 97.33 million in 2013. Its levels of industrialization and urbanization are relatively high within China. Furthermore, it is one of the major food and cotton producing areas of China.

Data Source
The MODIS datasets for Shandong Province from 2013 include the following: MOD13Q1 data version 005, MOD09Q1 data version 005 and MOD09A1 data version 005 from NASA's Earth Observing System Data and Information System (EOSDIS). Because the Normalized Difference Vegetation Index (NDVI) in MOD13Q1 is easily saturated in the plant growth period [30], the Enhanced Vegetation Index (EVI), which overcomes this disadvantage, was chosen to track the phenological changes. The EVI has a time interval of 16 days, so there are 23 periods of EVI data throughout the year. The land surface reflectance data consist of bands 1-2 in MOD09Q1 and bands 3-7 in MOD09A1, which were selected from September because the spectrum varies greatly during this period. Since the Normalized Difference Water Index (NDWI) has been proven to be effective in land classification in some studies [31], it was used as a feature in this study. The data were projected to the Albers Conical Equal Area projection using the WGS84 coordinate system.
The ancillary data included digital elevation model (DEM) data, vegetation phenological documents and vegetation distribution imagery. The DEM data, collected from the Shuttle Radar Topography Mission (SRTM) at a 90-m pixel size, were resampled to a 500-m pixel size. Vegetation phenological documents were collected from a Chinese phenology observation website. Vegetation distribution imagery on a scale of 1:1,000,000 describes the distribution of indigenous vegetation in China [32]. In this paper, the imagery was classified into six land cover categories: crop land, forest land, grass land, built-up land, water body and unused land.

EVI Data Preprocessing
In addition to cloud cover, there are other random factors, such as azimuth and elevation of the sun, moisture in the atmosphere, and aerosols, that affect the EVI data. Since the 16-day composite imagery processing cannot fully correct for these errors, the EVI images are frequently discontinuous. The errors reduce the EVI imagery accuracy and result in a lack of obvious change trends. This makes it difficult to extract useful information from the EVI.
To reduce the noise effects, harmonic analysis of time series (HANTS) [33][34][35] was performed to construct a noise-free imagery time series that is close to reality. The fundamental principles of this filter include the notions of Fourier series and Fourier transforms. It consists of two sub-processes. One is an abnormal value identification process during which abnormal values are excluded. The other is a reconstruction process during which time series are reconstructed based on the remaining valid data.
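The two sub-processes can be illustrated with a simplified, HANTS-like sketch. This is not the HANTS implementation cited in [33][34][35]: it assumes that noise (for example, cloud contamination) only lowers EVI values, fits a mean plus a few harmonics by least squares, excludes points lying more than a tolerance below the fitted curve, and refits on the remaining valid data. The function names and the parameters `n_harmonics`, `tolerance` and `max_iter` are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def design_row(t, period, n_harmonics):
    """Basis values at time t: a constant plus cosine/sine pairs."""
    row = [1.0]
    for h in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * h * t / period
        row += [math.cos(angle), math.sin(angle)]
    return row

def hants_like_fit(series, n_harmonics=2, tolerance=0.05, max_iter=10):
    """Iteratively fit harmonics, excluding points far below the curve."""
    period = len(series)
    rows = [design_row(t, period, n_harmonics) for t in range(period)]
    p = len(rows[0])
    valid = [True] * period
    fitted = list(series)
    for _ in range(max_iter):
        idx = [t for t in range(period) if valid[t]]
        # Reconstruction: least squares on the remaining valid data.
        ata = [[sum(rows[t][i] * rows[t][j] for t in idx) for j in range(p)]
               for i in range(p)]
        aty = [sum(rows[t][i] * series[t] for t in idx) for i in range(p)]
        coef = solve_linear(ata, aty)
        fitted = [sum(c * v for c, v in zip(coef, rows[t])) for t in range(period)]
        # Identification: valid points lying well below the fitted curve.
        outliers = [t for t in idx if fitted[t] - series[t] > tolerance]
        if not outliers or len(idx) - len(outliers) <= p:
            break
        for t in outliers:
            valid[t] = False
    return fitted
```

For a 23-period series, `hants_like_fit(evi_values)` returns 23 smoothed values; a single abnormally low composite is excluded on the first pass and replaced by the value of the refitted curve.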
Taking the first period of EVI imagery as an example, Figure 2 shows how the imagery becomes much smoother after HANTS processing. The abnormal spatial patches in the original imagery are effectively filtered, and the imagery contrast is well smoothed. EVI time-varying curves (Figure 3) indicate that the noise is effectively restrained after HANTS processing while the original variation trend is retained. Moreover, the characteristics of EVI time-varying curves indicate different phenological characteristics of land cover and provide useful information to distinguish the land cover. Based on the 32 multi-feature images (including 23 EVI images, 7 land surface reflectance images, an NDWI image and a DEM data image), the training data were selected. The Jeffries-Matusita (J-M) distance is a widely-used metric to measure the separability between classes. A J-M distance greater than 1.8 indicates good separation, and a distance less than 1.5 indicates poor separation. Tables 1 and 2 show the J-M distance between different pairs of land cover types before and after HANTS processing. The separability between classes was significantly improved after processing, especially between vegetation classes. The J-M distance of crop-forest increased from 1.56 to 1.96, and the crop-grass J-M distance increased from 1.49 to 1.83. Forest and grass had a similar growth rhythm, so their J-M distance remained lower than 1.8 but still rose above 1.5. The water body, built-up land and unused land all had a slight increase in separability from other classes. On the whole, the samples can be used for supervised classification.
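As a minimal illustration of the J-M distance, the sketch below uses the common definition JM = 2(1 - e^(-B)), where B is the Bhattacharyya distance, and assumes each class is a univariate Gaussian. The study uses 32 features, for which a multivariate form with covariance matrices would be needed; the single-feature form and the function names here are simplifications introduced for illustration only.

```python
import math

def jm_distance_1d(mean1, var1, mean2, var2):
    """J-M distance between two univariate Gaussian classes (range 0 to 2)."""
    bhattacharyya = ((mean1 - mean2) ** 2) / (4.0 * (var1 + var2)) \
        + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return 2.0 * (1.0 - math.exp(-bhattacharyya))

def separability(jm):
    """Apply the thresholds quoted in the text (1.8 and 1.5)."""
    if jm > 1.8:
        return "good"
    if jm < 1.5:
        return "poor"
    return "moderate"
```

Two identical class distributions give a distance of 0, and the distance approaches 2 as the class means move apart relative to their variances.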

Method
The rotation forest is a successful method for generating classifier ensembles based on feature extraction [27,36]. In this method, the feature set of all training samples is randomly split into K subsets (K is a parameter of the algorithm), and PCA is applied to rotate the original feature set. All of the principal components are retained to preserve the variability of information in the data. Furthermore, K-axis rotations occur to form new features for a base classifier. The diversity of each base classifier increases by feature extraction. Furthermore, the accuracy is also promoted by keeping all principal components and using the whole data set to train the base classifier.
Drawing upon the rotation forest, the modified ensemble method also uses feature extraction to increase the diversity of the base classifiers (Figure 4). However, unlike the rotation forest, which employs a decision tree as the base classifier, we used the boosting naïve Bayesian tree classifier as the base classifier. Using a classifier ensemble as the base classifier could further improve individual accuracy and achieve lower prediction error. In the classification task, we first obtained several predictions by weighted voting within each boosting naïve Bayesian tree. Then, these predictions were voted on to obtain the final classification result. In other words, the modified ensemble method contained two layers of voting to obtain the final result.
To simplify the notation, consider the training set $A = [X, Y] = \{(x_i, y_i)\}_{i=1}^{N}$ containing N training objects, in which each object $(x_i, y_i)$ is described by a feature vector $x_i = [x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in}]$ and a class label $y_i$ that takes a value from the set of class labels $Y = \{1, 2, 3, \ldots, J\}$. Let $C_1, C_2, \ldots, C_S$ denote the S base classifiers in the ensemble, and let F denote the feature set.
To promote individual diversity within the ensemble, the reconstruction of the training set by feature extraction for classifier $C_s$ ($s = 1, 2, \ldots, S$) is conducted as follows:
1. F is randomly split into K subsets (K is the number of feature subsets). Disjoint subsets are used to maximize the diversity. Suppose K is a factor of n; then, each feature subset consists of M = n/K features.
2. $X_{sj}$ ($j = 1, 2, \ldots, K$) denotes one subset of the training dataset, which contains M features. A bootstrap sample of 75 percent of the objects in $X_{sj}$ is drawn. Then, PCA is used to calculate the principal components of the new dataset. The coefficients of the principal components are stored in $D_{sj} = [a_{sj}^{(1)}, a_{sj}^{(2)}, \ldots, a_{sj}^{(M)}]$, where each $a_{sj}^{(\cdot)}$ is an M × 1 matrix.
3. The $D_{sj}$ are placed on the main diagonal of a zero matrix to obtain a sparse "rotation" matrix $R_s$ as follows:
$$R_s = \begin{bmatrix} a_{s1}^{(1)}, \ldots, a_{s1}^{(M)} & 0 & \cdots & 0 \\ 0 & a_{s2}^{(1)}, \ldots, a_{s2}^{(M)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{sK}^{(1)}, \ldots, a_{sK}^{(M)} \end{bmatrix} \quad (1)$$
The columns of $R_s$ are rearranged to correspond to the original feature order. Let $R_s^a$ denote the rearranged matrix; then, the training set for classifier $C_s$ is $X R_s^a$.
The feature extraction steps above are repeated S times to construct a corresponding number of classifiers.
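The feature extraction steps can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' MATLAB code: the helper names are hypothetical, PCA coefficients are obtained as eigenvectors of the sample covariance matrix by Jacobi rotations (an implementation choice made here to avoid external libraries), and each block is written directly into the rows of its original features, which plays the role of the rearrangement that yields the matrix R_s^a.

```python
import math
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(m)] for a in range(m)]

def pca_coefficients(sym, sweeps=30):
    """Eigenvectors (as columns) of a small symmetric matrix, Jacobi method."""
    m = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # update columns p and q
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(m):  # update rows p and q
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return v

def rotation_matrix(x, k, rng, sample_fraction=0.75):
    """Build the rearranged rotation matrix for one base classifier."""
    n_objects, n_features = len(x), len(x[0])
    features = list(range(n_features))
    rng.shuffle(features)                  # step 1: random disjoint split of F
    size = n_features // k                 # M = n / K (K assumed to divide n)
    rotation = [[0.0] * n_features for _ in range(n_features)]
    for j in range(k):
        subset = features[j * size:(j + 1) * size]
        n_boot = max(2, int(sample_fraction * n_objects))
        sample = [x[rng.randrange(n_objects)] for _ in range(n_boot)]  # step 2
        block = pca_coefficients(covariance([[row[f] for f in subset] for row in sample]))
        for r, f in enumerate(subset):     # step 3: rows follow original features
            for c in range(size):
                rotation[f][j * size + c] = block[r][c]
    return rotation

def rotate(x, rotation):
    """Project the objects into the rotated feature space (X times R)."""
    n = len(rotation)
    return [[sum(row[f] * rotation[f][c] for f in range(n)) for c in range(n)]
            for row in x]
```

Calling `rotation_matrix(x, k, random.Random(seed))` S times with different seeds gives S different rotated training sets. All principal components are retained, so each rotation is an orthogonal change of basis and no information is discarded.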
Based on the new training dataset, for which individual diversity has been promoted, a boosting naïve Bayesian tree classifier (Figure 5) is used as the base classifier. The motivation was to increase the individual accuracy of base classifiers with the first-layer voting within the ensemble. First, we briefly review the naïve Bayesian tree.
As indicated by many researchers, the instability of base classifiers is a critical factor limiting classifier ensemble accuracy [37][38][39]. To introduce instability into the boosting method, the NBTree algorithm was selected. NBTree is a hybrid of the naïve Bayesian and the decision tree algorithms, which integrates the advantages of the two algorithms [40][41][42]. It has the same tree-growing procedure as a decision tree, but each leaf node is a naïve Bayesian classifier. In other words, it first segments the training data by building a top-down decision tree and then builds a naïve Bayesian classifier in each subset. In this study, similar to C4.5, NBTree uses the information gain ratio criterion to select the best test at each decision node, and the maximum depth of the tree, d, is predefined. Furthermore, NBTree does not perform pruning in this study. Because the maximum depth of the tree is predefined, the dataset at a leaf node may not be fully classified, so the naïve Bayesian method is used as the classifier at the leaf. In the naïve Bayesian approach, the features are assumed to be mutually independent given the class. With this assumption, the probability of a pixel x belonging to class c can be expressed as in Equation (2):
$$P(c \mid x) \propto P(c) \prod_{j=1}^{m} P(a_j \mid c) \quad (2)$$
where m is the number of features, $a_j$ is the jth feature value of x, P(c) is the prior probability and $P(a_j \mid c)$ is the conditional probability.
These probabilities are estimated from the training data, where n is the number of training objects, $n_c$ is the number of classes, $c_i$ is the class label of the ith training object, $n_j$ is the number of values of the jth feature, and $a_{ij}$ is the jth feature value of the ith training object.
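The leaf-level naïve Bayesian classifier can be illustrated as below. The sketch assumes Laplace-smoothed frequency estimates, P(c) = (count(c) + 1) / (n + n_c) and P(a_j | c) = (count(a_j, c) + 1) / (count(c) + n_j), which is a common choice in the NBTree literature and matches the symbols defined above, but it is an assumption rather than a confirmed reconstruction of the lost equations. It also assumes discrete feature values (continuous features such as EVI would first need discretization), and the function names are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(rows, labels):
    """Collect the counts needed for Laplace-smoothed probability estimates."""
    classes = sorted(set(labels))
    class_count = Counter(labels)
    value_count = Counter()                 # (feature index, value, class) -> count
    values = [set() for _ in rows[0]]
    for row, label in zip(rows, labels):
        for j, a in enumerate(row):
            value_count[(j, a, label)] += 1
            values[j].add(a)
    n_values = [len(v) for v in values]     # n_j: distinct values of feature j
    return classes, class_count, value_count, n_values, len(rows)

def predict(model, row):
    """Return the class maximising P(c) * prod_j P(a_j | c), in log space."""
    classes, class_count, value_count, n_values, n = model
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log((class_count[c] + 1) / (n + len(classes)))
        for j, a in enumerate(row):
            score += math.log((value_count[(j, a, c)] + 1)
                              / (class_count[c] + n_values[j]))
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow when many features are multiplied; the smoothing keeps a feature value that was never observed with a class from forcing the whole product to zero.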
Taking NBTree as the base classifier, AdaBoost was used as the boosting method. AdaBoost is a sequential algorithm in which multiple classifiers are induced by adaptively changing the distribution of the training dataset based on the performance of the previously generated classifiers. Let $A^a$ denote the new training set $(X R_s^a, Y)$, and let $D_t = [w_t(1), w_t(2), \ldots, w_t(M)]$ denote the weights of the M objects at the tth trial, where $w_t(m)$ is the weight of the mth object ($m = 1, 2, \ldots, M$); all weights are set to be equal at first. At each trial $t = 1, \ldots, T$, the boosting procedure is as follows:
1. A classification model is constructed using the NBTree from $A^a$ under the weight distribution $D_t$.
2. In the results from step 1, if the mth object in $A^a$ is classified correctly, let $\delta(m) = 0$; otherwise, let $\delta(m) = 1$. The error rate of the NBTree at the tth trial is defined as follows:
$$\varepsilon_t = \sum_{m=1}^{M} w_t(m)\,\delta(m) \quad (5)$$
If $\varepsilon_t > 0.5$ or $\varepsilon_t = 0$, then the classification result is unsatisfactory; $w_t(m)$ is reinitialized using bootstrap sampling from $A^a$ with equal weight for each sampled object, and the boosting process continues from step 1.
3. The weight vector $w_{t+1}$ for the next trial is created based on the former $w_t$. The weight of each object that was classified correctly at trial t is reduced relative to the weights of the misclassified objects, and all weights are divided by $Z_t$, a normalization factor chosen so that $D_{t+1}$ is a probability distribution over $A^a$. If $w_{t+1}(m) < 10^{-10}$, $w_{t+1}(m)$ is set to $10^{-10}$ in order to address the numerical underflow problem [43].
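One boosting trial's reweighting can be sketched as follows. The update rule is the standard AdaBoost.M1 form (weights of correctly classified objects are multiplied by beta_t = epsilon_t / (1 - epsilon_t), then all weights are normalized); this is assumed here for illustration, since the exact expression used in the study is not recoverable from the text. The function name and the use of `None` to signal that the weights must be reinitialized are hypothetical.

```python
def update_weights(weights, misclassified, floor=1e-10):
    """One AdaBoost.M1-style reweighting step.

    weights: current distribution over the training objects.
    misclassified: True where the current NBTree predicted the wrong class.
    Returns (new_weights, beta), or None when the error rate is above 0.5
    or exactly 0, in which case the caller should reinitialize the weights.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if error > 0.5 or error == 0:
        return None
    beta = error / (1.0 - error)
    # Shrink the weights of correctly classified objects, keep the others.
    raw = [w if wrong else w * beta for w, wrong in zip(weights, misclassified)]
    z = sum(raw)  # normalization factor
    # Normalize, then floor tiny weights to avoid numerical underflow.
    return [max(w / z, floor) for w in raw], beta
```

With four equally weighted objects and one misclassification, the error rate is 0.25 and the misclassified object ends up carrying half of the total weight, so the next NBTree concentrates on it.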
After T trials, the T NBTree models are combined to form a classifier ensemble. The prediction of each boosting NBTree is obtained by weighted voting over the classes predicted by its T NBTree models. This is the first-layer result. Based on the first-layer results of the S boosting NBTrees, equal-weight voting, which constitutes the second-layer voting, is carried out, and the final result is obtained. These classification algorithms were all run in Matlab version 2013b.
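The two layers of voting can be sketched as below. The first-layer vote weight log(1 / beta_t) follows the AdaBoost.M1 convention and is an assumption here, as are the function names; the second layer is a plain majority vote.

```python
import math
from collections import Counter, defaultdict

def weighted_vote(predictions, betas):
    """First layer: combine the T NBTree predictions of one boosted classifier."""
    score = defaultdict(float)
    for label, beta in zip(predictions, betas):
        score[label] += math.log(1.0 / beta)  # more accurate trees weigh more
    return max(score, key=score.get)

def majority_vote(labels):
    """Second layer: equal-weight vote over the S first-layer results."""
    return Counter(labels).most_common(1)[0][0]

def two_layer_predict(all_predictions, all_betas):
    """all_predictions[s][t] is the class predicted by tree t of classifier s."""
    first_layer = [weighted_vote(p, b) for p, b in zip(all_predictions, all_betas)]
    return majority_vote(first_layer)
```

Note that the first layer can overrule a simple majority: one accurate tree (small beta) predicting "crop" can outweigh two weaker trees predicting "forest".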

Results and Analysis
In this section, the values of parameters S and T, which represent the number of iterations for feature extraction and boosting NBTree, respectively, for the improved rotation forest are first decided.Then, the effect of EVI preprocessing on classification performance is analyzed.After that, the performances of feature extraction (in this case PCA) and feature selection (in this case forward and random feature selection) in the ensemble method are compared.Last, the classification result of the improved rotation forest is compared with other classifier ensembles, as well as the corresponding NASA MODIS land cover product.
The land cover map in vector format from the land resources sectors was collected as the ground-truth reference data (a sample area is shown in Figure 6b). To avoid the errors caused by different classification systems, we combined the land cover types into 6 types, including crop land, forest land, grass land, built-up land, water body and unused land. Furthermore, to guarantee the accuracy and representativeness of the reference data and to avoid the errors due to pixel resampling, approximately 25% of the pixels of each land cover type were randomly selected as the sample to evaluate the accuracy of the classification result.

Optimal Combination of S and T
To investigate the effects of S and T on the performance of the improved rotation forest, we conducted experiments choosing values 1, 2, 3, . . ., 10 for the two parameters (each combination is named ST $S_i \times T_j$); for example, ST 2 × 3 means S = 2 and T = 3. The accuracy of the classifier with each combination of values for S and T was evaluated five times. The means and deviations of the accuracy of each combination are shown in Table 3. From these results, we determined that ST 1 × 1 achieved the lowest mean accuracy. It is easy to see that ST 1 × 1 is equivalent to a single NBTree. As S and T increased, the means improved and the deviations decreased. When controlling one parameter and changing the other one, we observed that S and T have a similar effect on accuracy improvement. After ST 4 × 4, the overall accuracy tended to gradually stabilize. For the sake of prudence, we chose the combination ST 10 × 4, which has a higher mean value and less deviation, as the parameter setting for the classification algorithm.
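The enumeration over S and T can be sketched as a simple grid evaluation. The callable `evaluate(s, t, run)` is hypothetical and stands for one training and accuracy assessment of the improved rotation forest with S = s and T = t; using the sample standard deviation for the "deviation" column is also an assumption.

```python
import statistics

def grid_search(evaluate, values=range(1, 11), repeats=5):
    """Mean and standard deviation of accuracy for every S x T combination."""
    results = {}
    for s in values:
        for t in values:
            scores = [evaluate(s, t, run) for run in range(repeats)]
            results[(s, t)] = (statistics.mean(scores), statistics.stdev(scores))
    return results
```

The returned dictionary holds one (mean, deviation) pair per combination, for example `results[(10, 4)]` for ST 10 × 4, from which a table such as Table 3 can be assembled.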

Effect of EVI Preprocessing
In this paper, HANTS filtering was used to construct a noise-free imagery time series that is close to reality. The J-M distances between different pairs of land cover types before and after EVI preprocessing show that HANTS can improve the separability between land classes. In order to verify the effects of EVI preprocessing on classification performance, two multi-feature datasets (with original EVI data and with preprocessed EVI data, respectively) were classified using the improved rotation forest with the combination ST 10 × 4. The confusion matrices between the improved rotation forest results and the reference data are shown in Tables 4 and 5.
In terms of both the overall accuracy and the Kappa coefficient, the data with preprocessed EVI achieved a better classification result. As can be seen from Tables 4 and 5, the overall accuracy increased by 8.07% after preprocessing. The Kappa coefficient increased from 0.59 to 0.71. For certain classes, such as forest land and grass land, the classification accuracies were greatly improved. These classes have confusing spectral change information in the original EVI, but can be well distinguished after EVI preprocessing (as shown in Figure 3). According to the results, we can clearly see that EVI preprocessing played an important role in improving classification accuracy.
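For reference, the overall accuracy and the Kappa coefficient can be computed from a confusion matrix as in the sketch below. The function name is hypothetical, and the matrix in the usage note is a toy example, not data from Tables 4 and 5.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    matrix[i][j] is the number of pixels of reference class i that were
    assigned to class j by the classifier.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(sum(matrix[i]) * sum(row[i] for row in matrix)
                   for i in range(size)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For the toy matrix `[[45, 5], [10, 40]]`, the overall accuracy is 0.85 and the Kappa coefficient is 0.70; Kappa is lower because it discounts the agreement expected by chance from the class proportions.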

Comparison between PCA and Feature Selection
In the proposed ensemble method, PCA was used as the feature extraction technique, and all the multi-feature information was completely preserved in the new space of extracted features. However, an alternative approach, feature selection, can be used to select a subset of the features [44]. The feature selection approach (such as forward or random feature selection) is a special case of feature extraction in which the information is represented in the original feature space.
In order to verify the advantage of feature extraction, we compared the performance of feature extraction (in this case PCA) and feature selection. In this study, we considered two examples of feature selection: forward feature selection (FFS) and random feature selection (RFS). In forward feature selection, we sequentially selected a number of optimal feature subsets of the same size. For random feature selection, we first randomly permuted the features in the original feature set and then split the feature set into several subsets, which also had the same size. In order to ensure all features were used, the subset size was set to 4, 8 and 16, respectively. In addition, one feature set without EVI data was selected as a contrast, and its subset size was set to 4. The performances were assessed by overall accuracy and Kappa coefficient (Table 6).
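The two feature selection schemes can be sketched as follows. The splitting logic for RFS follows the description above; for FFS, the greedy construction of each subset from the features not yet used, and the callable `score` (any class-separability criterion evaluated on a candidate feature list), are assumptions introduced for illustration.

```python
import random

def random_feature_subsets(n_features, subset_size, rng):
    """RFS: permute the features, then cut them into equal-size subsets."""
    order = list(range(n_features))
    rng.shuffle(order)
    return [order[i:i + subset_size] for i in range(0, n_features, subset_size)]

def forward_feature_subsets(n_features, subset_size, score):
    """FFS: build each subset greedily, adding the best remaining feature."""
    remaining = set(range(n_features))
    subsets = []
    while remaining:
        chosen = []
        while remaining and len(chosen) < subset_size:
            best = max(sorted(remaining), key=lambda f: score(chosen + [f]))
            chosen.append(best)
            remaining.remove(best)
        subsets.append(chosen)
    return subsets
```

With 32 features and a subset size of 8, both functions return four disjoint subsets that together cover every feature, which mirrors the requirement that all features be used.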
In Table 6, the feature sets with EVI data all performed better than those without EVI data, because the EVI data contained useful information for discrimination between land classes. With EVI data, the accuracy of random feature selection was slightly higher than that of the forward selection technique. A possible explanation is that random feature selection could select more independent feature subsets than forward feature selection. However, no large difference was noticed between the two feature selection approaches. PCA performed much better than the forward and random feature selection approaches, and the best result was obtained at a subset size of 8. Increasing the subset size did not seem to change the accuracy significantly.

Comparison with Other Classification Results
The classification result of the improved rotation forest using the combination ST 10 × 4 is shown in Figure 6a. Crop land is the dominant land cover type and is located mainly in the plains area, because Shandong is one of the major grain-producing areas. Forest land is distributed mainly in the middle area, at higher elevations, near the Tai Mountains, and a small fraction is found on the eastern peninsula. Large patches of built-up land aggregate in the center of every city, and rural built-up land is scattered around the cities. The proportion of grassland is small, and it mainly occurs in the eastern hilly area.
Using a confusion matrix, our classification result is compared with the ground-truth reference data. Table 5 shows the class distribution by pixel number for each class in the reference map. The overall accuracy of our results was 89.17%, and the Kappa coefficient was 0.71. Crop land had the highest user's accuracy, and unused land had the lowest. This is because crop land is the main land cover type. Many built-up areas in small patches were not identified and were mistaken for crop land, as they occupy a smaller proportion of the mixed pixels in the coarse-resolution imagery. Furthermore, narrow rivers were also hard to identify due to the limitation of the resolution, as shown for the sample area in Figure 6c.
The performances of other classifier ensembles were compared with the improved rotation forest. Furthermore, the 500-m MODIS land cover product for Shandong Province from NASA (National Aeronautics and Space Administration) in 2013 was also collected (a sample area is shown in Figure 6d). The total number of iterations for the classifier ensembles was the same as for the improved rotation forest. The overall accuracy, Kappa coefficient and computation time were tested, as reported in Table 7. The computation time was measured by the native Matlab functions and was expressed in seconds. Since the MODIS land cover product is produced by NASA, its computation time is not available.
As can be seen in Table 7, the improved rotation forest obtains the highest accuracy among the compared algorithms. Ranked by accuracy from best to worst, the algorithms are: improved rotation forest, rotation forest, AdaBoost, random forest, NBTree and the NASA product. By comparing the results between random forest and rotation forest, we can see that feature extraction had an obvious effect on the classification accuracy improvement. However, taking into account the computation time, it has a negative effect on the efficiency, because the rotation forest and improved rotation forest require the longest computation time. The computation time increased because there are more procedures in the classification algorithm. The accuracy of the NASA classification result was evaluated using a confusion matrix against the reference data (Table 8). The overall accuracy was 65.96%, and the Kappa coefficient was 0.33. This is a fair agreement according to the interpretation of the Kappa coefficient [45]. At the same time, the accuracy of the NASA land cover data was the lowest among these classification results. In these data, crop land was also the main land cover type. However, the forest lands in the middle of the province were not identified, and neither were the built-up lands and grass lands. Most of them were misclassified as crop land. Some water bodies near the shore were mistakenly classified as grass land or crop land. By analyzing the performance of these classification results, we found that the accuracy of the improved rotation forest result was the highest, and this improved method can provide more accurate data for land change research.

Discussion
Given that classifier ensembles such as rotation forest and AdaBoost have been successfully applied in some remote-sensing imagery classification studies, it is plausible that a combination of the two methods may achieve lower prediction error than either of them. This paper proposed an improved rotation forest, which was constructed by integrating the ideas of rotation forest and AdaBoost. The NBTree was used as the base classifier because it was more accurate than a decision tree, yet sufficiently sensitive to rotation of the axes.
The high temporal resolution MODIS EVI time series, as well as ancillary geospatial and other MODIS data, were selected as the basic data.HANTS filtering was used to preprocess the EVI data.This strategy significantly increased the separability between land classes, especially the vegetation classes, and helped to improve the classification accuracy.
To determine which parameters and which feature extraction/selection methods are responsible for the good performance of the improved rotation forest, multi-group comparison experiments were carried out. The sensitivity of the proposed method to the parameters S and T was investigated, and the best parameter combination was selected. However, this combination was found by enumeration; whether an optimal combination can be selected automatically requires further study. We then compared the performance of feature extraction (PCA in this study) with that of feature selection (forward and random feature selection in this study). According to the results, no large difference was observed between random and forward feature selection. A possible explanation is that in both cases all original features participated in the combined decision. Moreover, PCA outperformed the two feature selection methods, because PCA succeeded in extracting good features. When feature extraction is used, all original features contribute to the new extracted feature set. The extracted features contained more useful information for discriminating between data classes, and feature extraction encouraged individual diversity within the ensemble. In contrast, FFS, for instance, selects features sequentially, one by one. The union of the best single feature with the second-best one does not necessarily form the most discriminative pair of features, so the selected feature subset might not be the most advantageous one. The comparison demonstrated that feature extraction with PCA was more advantageous than feature selection in the improved rotation forest. It also indicated that, owing to the data distribution, the dataset in this study needs a fairly large number of principal components to obtain a classification rule that performs well. However, if the first principal components already discriminate well between data classes, a classifier ensemble on top of PCA may not be the best choice.
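The remark about sequential selection can be illustrated with a toy example. The sketch below is not the implementation used in this study: the feature names, the `score` lookup and all separability values are hypothetical, chosen only to show that greedy forward selection can miss the most discriminative pair.

```python
from itertools import combinations

# Hypothetical separability scores for feature subsets (illustrative only,
# not measured on the data of this study).
SCORES = {
    frozenset({"a"}): 0.70,
    frozenset({"b"}): 0.60,
    frozenset({"c"}): 0.55,
    frozenset({"a", "b"}): 0.72,
    frozenset({"a", "c"}): 0.74,
    frozenset({"b", "c"}): 0.90,
}


def score(subset):
    """Look up the (hypothetical) separability of a feature subset."""
    return SCORES[frozenset(subset)]


def forward_select(features, k):
    """Greedy forward selection: add the feature that helps most at each step."""
    selected = []
    while len(selected) < k:
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected


features = ["a", "b", "c"]
greedy_pair = forward_select(features, 2)
best_pair = max(combinations(features, 2), key=score)

print("greedy:", greedy_pair, score(greedy_pair))
print("exhaustive:", list(best_pair), score(best_pair))
```

With these values the greedy search commits to feature `a` first, because it is the best single feature, and can then no longer reach the pair `b`, `c`, which scores highest jointly.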
In addition, increasing the subset size from 4 to 8 and then to 16 did not change the classification accuracy significantly. Kuncheva and Rodríguez [29] tested the impact of subset size on the rotation forest and found no consistent relationship between ensemble accuracy and subset size. In their experiments, the patterns for different data sets varied from a decrease of the error with subset size, through almost horizontal lines, to an increase of the error with subset size. Our result is consistent with their finding. Generally, the performance of most classifiers depends on the relation between the training sample size and the data dimensionality. It is likely that the size of the training set in this study was much larger than the dimensionality of the feature space. The subset sizes tested in this experiment were spaced apart, leaving intermediate values untested. For a thorough comparison, the response of the improved rotation forest to each subset size should be evaluated in future work.
The classification result of the improved rotation forest was compared with those of other classifier ensembles (that is, NBTree, AdaBoost, random forest, and rotation forest). The results showed that the improved rotation forest outperformed all four other methods. The improved rotation forest has a potential computational advantage over other methods in that feature extraction, which preserves the variability information for the base classifier, can be executed in parallel; furthermore, it incorporates the advantages of boosting NBTree, which can deliver accurate results as a base classifier. A drawback of the method is its higher computational cost. The NASA land cover product was also collected; it had an overall accuracy of 65.96% and a Kappa coefficient of 0.33, which is unsatisfactory according to the interpretation of the Kappa coefficient. The reason for this lies in two facts: one is the poor ecosystem representation of the 40-60 test sites, and the other is the implementation of algorithms that overcome previously unconsidered challenges involved in classifying high data volumes with complex feature attributes at global scales [46][47][48]. On the whole, the comparison indicates that our classification data are more useful for land cover applications.
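For reference, both overall accuracy and the Kappa coefficient can be derived from a confusion matrix. The following is a minimal sketch assuming the standard Cohen's Kappa definition; the function name and the two-class matrix are hypothetical and are not taken from the tables of this paper.

```python
def overall_accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's Kappa) for a square confusion matrix.

    cm[i][j] is the number of samples of reference class i assigned to class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (illustrative values only).
cm = [[40, 10],
      [5, 45]]
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"overall accuracy = {oa:.2f}, kappa = {kappa:.2f}")
```

Because Kappa discounts the agreement expected by chance, it is always lower than the overall accuracy whenever chance agreement is non-zero, which is why a product with 65.96% accuracy can still have a Kappa of only 0.33.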

Conclusions
Multi-feature, especially multi-temporal, remote-sensing data increase the potential for discriminating different land cover types. However, addressing multiple features remains a challenge in remote-sensing classification. Satisfactory classification results depend not only on basic data with little noise but also on a classification method that performs well.
In this paper, an improved rotation forest was developed to make full use of multiple feature information. HANTS processing was applied to the EVI data to increase the separability between land cover classes and help to improve classification accuracy. Different feature extraction/selection methods were investigated for the construction of improved rotation forests. It was shown that PCA is more advantageous than feature selection techniques for creating training data for a base classifier. Based on the newly generated dataset, AdaBoost with NBTree was adopted as the base classifier to further improve the accuracy. Finally, the classification result was obtained using two-step voting. The classification result of the improved rotation forest was compared with those of other similar classifier ensembles and the NASA land cover product. The results showed that the improved rotation forest outperformed the other methods. The improvement in prediction accuracy was obtained at the cost of increased computation time. Generally, the good performance that we identified mainly depended on a high-precision pixel-wise classifier, as well as a better understanding of local land systems, including the phenological rules and the terrain data.
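The two-step voting can be sketched as follows: each boosted base classifier first combines its members by weighted voting, and the resulting first-layer predictions are then combined by majority vote. This is only an illustrative sketch; the function names, class labels and weights are hypothetical, and ties are resolved in favour of the label encountered first, which is an assumption rather than a rule stated in this paper.

```python
from collections import Counter, defaultdict


def weighted_vote(predictions, weights):
    """First layer: weighted vote among the members of one boosted classifier."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # On ties, the label encountered first wins (assumption for this sketch).
    return max(scores, key=scores.get)


def two_layer_vote(ensemble):
    """Second layer: majority vote over the first-layer predictions."""
    first_layer = [weighted_vote(preds, weights) for preds, weights in ensemble]
    return Counter(first_layer).most_common(1)[0][0]


# Hypothetical predictions for one pixel: three boosted classifiers,
# each with three members and their boosting weights.
ensemble = [
    (["crop", "forest", "crop"], [0.9, 0.4, 0.3]),    # weighted vote: crop
    (["forest", "forest", "crop"], [0.5, 0.6, 0.8]),  # weighted vote: forest
    (["crop", "water", "crop"], [0.2, 0.3, 0.7]),     # weighted vote: crop
]
final_label = two_layer_vote(ensemble)
print(final_label)
```

In this toy case two of the three first-layer predictions are `crop`, so the majority vote returns `crop`.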
Nevertheless, there are still some shortcomings and problems requiring further investigation. In this study, the pixels were treated as the objects, while neglecting the spatial information effects on the rotation analysis of the remote-sensing images [49,50]. Future studies could incorporate spatial information into the improved rotation forest. Furthermore, there are other unstable algorithms, such as a neural network, that could be adopted as the base classifier in AdaBoost. It would be worthwhile to evaluate the effect of other base classifiers in the classifier ensemble on the classification accuracy.

Figure 3. EVI curves of different land cover types before and after HANTS processing: (a) original EVI curves; (b) HANTS-corrected EVI curves.

Figure 4. Illustration of the improved rotation forest.

Figure 6. Land cover classification maps: (a) improved rotation forest results for Shandong Province; (b) sample area of truthing reference map; (c) sample area of improved rotation forest results; and (d) sample area for the NASA product.

Table 1. J-M distance between different land cover types before HANTS processing.

Table 2. J-M distance between different land cover types after HANTS processing.

Table 3. Means and deviations of test accuracy with different combinations of S and T.

Table 4. Confusion matrix between the improved rotation forest result and the reference data before EVI preprocessing.

Table 5. Confusion matrix between the improved rotation forest result and the reference data after EVI preprocessing.

Table 6. The accuracy and Kappa coefficient for FFS, RFS and PCA.

Table 7. Comparison of accuracy and computation time for each algorithm.

Table 8. Confusion matrix between the NASA product and the reference data.