Feature Selection Solution with High Dimensionality and Low-Sample Size for Land Cover Classification in Object-Based Image Analysis

Land cover information extraction through object-based image analysis (OBIA) has become an important trend in remote sensing, thanks to the increasing availability of high-resolution imagery. Segmented objects have a large number of features that cause high-dimension and low-sample size problems in the classification process. In this study, on the basis of a partial least squares generalized linear regression (PLSGLR), we propose a group corrected PLSGLR, known as G-PLSGLR, that aims to reduce the redundancy of object features for land cover identifications. Using Gaofen-2 images, the area of interest was segmented and sampled to generate small sample-size training datasets with 51 object features. The features selected by G-PLSGLR were compared against a guided regularized random forest (GRRF) in metrics of reduction rate, feature redundancy, and accuracy assessment of classification. Three indicators of overall accuracy (OA), user’s accuracy (UA), and producer’s accuracy (PA) were applied for accuracy assessment in this paper. The result shows that the G-PLSGLR achieved a reduction rate of 9.27 with a feature redundancy of 0.29, and a value of OA 90.63%. The GRRF achieved a reduction rate of 1.61 with a feature redundancy of 0.42, and a value of OA 85.56%. The PA of each land cover category was more than 95% using features selected by G-PLSGLR, while the PA ranged from 77 to 96% using features selected by GRRF. The UA of G-PLSGLR-selected features ranged from 70 to 80% except for grass land and bare land, which achieved 10% higher UA than GRRF-selected features. The G-PLSGLR method we proposed has the advantages of a large reduction rate, low feature redundancy, and high classification performance, which can be applied in OBIA-based land cover classification.


Introduction
Land cover data are key inputs for modeling earth surface processes, such as climate, natural environment, ecology, food security, water resources, and soils [1][2][3][4][5][6][7]. Obtaining land cover information is difficult, however, because it is highly dynamic due to modification from not only human activity but also nature [8]. Even though remote sensing is the only method for efficiently deriving land cover information at a regional or continental scale, remotely-sensed land cover classification is far from satisfactory for many modeling tasks [7,9,10].
In land cover classification, pixel-based image analysis had been dominant since the early 1970s [11]. With the development of high-resolution sensors, the intra-class spectral variability of images has increased, which undermines pixel-based image analysis and has led to the development of object-based image analysis (OBIA) [12]. Compared with pixel-based image analysis for land cover extraction from high-resolution images, OBIA not only overcomes the "salt-and-pepper" effect, but also adds information on spectra, geometry, context, and texture [13][14][15]. For these reasons, OBIA has been widely used in high-resolution image-based land cover classification [11]. However, each object in OBIA is regarded as a basic unit, and a large number of features that may be useful in classification is calculated for it [16].
As the dimensionality of each object in OBIA increases, the sample size needed to support land cover classification methods, e.g., k-nearest neighbors, neural networks, decision trees, and random forests, often grows exponentially [17,18]. Consequently, feature selection is important in OBIA for dimensionality reduction, for class separation, and for the efficient use of training samples [19]. There are three types of feature subset selection approaches: filters, wrappers, and embedded approaches [20]. Filter methods sort features by evaluating their quality against some criterion (such as correlation or mutual information) that is independent of the classification algorithm. Extensive surveys of filter methods that select proper features in object-based classification can be found in the literature [21,22]. However, filter methods select features regardless of the performance of the classification algorithm [23]. Wrapper methods measure the performance of features with a classifier, and include sequential selection algorithms (such as sequential backward selection, sequential forward selection, and sequential floating selection) and heuristic search algorithms (such as genetic algorithms [24] and particle swarm optimization [25]). However, wrapper models are very computationally expensive. Embedded methods, which combine the advantages of the previous methods, optimize both the goodness-of-fit and the number of variables; they include the least absolute shrinkage and selection operator [26], regularized trees, regularized random forest [27], etc. Previous works have focused on generic dimensionality reduction solutions and mainly concentrated on pixel-based hyperspectral image analysis [28][29][30]. Another problem in OBIA is that sampling data of high dimensionality is time-consuming and expensive for large areas. Utilizing collected samples while minimizing the effects of intra-class variability is also a challenge. Though some research in OBIA has focused on the effects of training set size [31,32], there is still a need for feature selection on limited sample sizes to study the optimal features in the identification of different land covers.
To solve the problem of large intra-class variability and low sample size in OBIA-based land cover classification, we propose a feature selection solution, group corrected partial least squares generalized linear regression (G-PLSGLR), built on partial least squares generalized linear regression (PLSGLR), which has been applied in many disciplines with small sample sizes, such as genetics, spectroscopy, and analytical chemistry [33][34][35]. Our approach can be described in four steps: (1) bootstrap sampling to construct balanced training samples and validation samples; (2) grouping features based on the Pearson correlation coefficient and graph-based theory using the training samples; (3) removing insignificant features and ranking the importance of grouped features based on PLSGLR regression coefficients; and (4) selecting features by the Bayesian information criterion (BIC) calculated on the PLSGLR model as the features are added one by one.
The study area is approximately 31 km² over the coast of the Bohai Sea, which covers more than 21 villages (upper left longitude 119°8′29″ and latitude 39°28′42″, lower right longitude 119°12′22″ and latitude 39°25′42″) (Figure 1). The size of the Gaofen-2 image data of the study area is 5439 rows by 5754 columns. The land cover of the study area consists of six classes: water, forest land, grass land, crops land, bare land, and residential and build-up land. There are many sub-categories of each land cover type, which result in large intra-class variability. The residential and build-up land category comprises sub-categories such as residential districts, industrial areas, roads, and greenhouses; the forest land category comprises, for instance, the sparse forest alongside roads, the dense forest near the river, and juvenile woodland. The large intra-class variability poses a challenge to land cover classification.
The Gaofen-2 satellite acquires images with multispectral bands (blue-B1 (0.45-0.52 μm), green-B2 (0.52-0.59 μm), red-B3 (0.63-0.69 μm), and near infrared-B4 (0.77-0.89 μm)) on a swath of 45 km.

Segmentation and Sampling
Based on an automated multi-resolution segmentation algorithm [37], we carried out image segmentation using eCognition Developer 9.0 software (Trimble Inc., Munich, Germany). After the segmentation step, a total of 51 attributes were calculated on 10,709 objects (Table 1).
To obtain small training samples for OBIA, we collected samples from each land cover type; for every type, the number of samples is smaller than the number of features. In addition, we collected testing samples based on both expert interpretation and in situ investigation. The number of samples is shown in Table 2.
Since large intra-class variability and insufficient samples result in the curse of dimensionality [38], the samples were labeled with both category and subcategory at the same time in order to integrate human knowledge and avoid the instability of clustering algorithms that are intended to tackle such problems. Next, a bootstrap sampling strategy was applied to construct balanced training samples, adjusting for the fact that most classifiers tend to favor the majority class, resulting in inaccuracy under class imbalance [39]. For each category, we regarded the samples of the current category as one part and extracted the same number of samples from the remaining categories as the other part, taking the subtypes into account.
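The balanced two-class construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `balanced_binary_sample`, the tuple layout of a sample, and the rule of cycling over subtypes when drawing from the remaining categories are assumptions made only to show one way of "taking the subtypes into account".

```python
import random
from collections import defaultdict


def balanced_binary_sample(samples, target, seed=0):
    """Build a balanced one-vs-rest training set (illustrative sketch).

    samples: list of (category, subcategory, features) tuples (assumed layout).
    target:  the category treated as the current class.
    Returns (positives, negatives) with equal length.
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[0] == target]

    # Stratify the remaining samples by (category, subcategory).
    strata = defaultdict(list)
    for s in samples:
        if s[0] != target:
            strata[(s[0], s[1])].append(s)
    keys = sorted(strata)

    # Assumption: draw with replacement while cycling over the strata,
    # so that every subtype of the other categories can be represented.
    negatives = []
    for i in range(len(positives)):
        key = keys[i % len(keys)]
        negatives.append(rng.choice(strata[key]))
    return positives, negatives


toy = [
    ("water", "pond", [0.1]),
    ("water", "river", [0.2]),
    ("forest land", "dense forest", [0.5]),
    ("forest land", "sparse forest", [0.6]),
    ("grass land", "lawn", [0.7]),
]
pos, neg = balanced_binary_sample(toy, "water")
```

Drawing with replacement keeps the two parts the same size even when a stratum holds fewer samples than are requested from it.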

Group Features
Feature selection algorithms frequently do not take into account the structural effects between features, which may result in the loss of representative features, especially in OBIA analysis, where the dimensionality is high [40]. In this paper, we propose a Pearson correlation coefficient-based method for feature grouping, and derive feature groups using graph-based theory. The Pearson correlation coefficient is a measure of the linear dependence between two random variables, and has been proven to be invariant to scaling and translation [41]. The correlation can be simply expressed as:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are a series of $n$ measurements of features $X$ and $Y$, $\bar{x}$ and $\bar{y}$ are the mean values of $x_i$ and $y_i$, and $r_{xy}$ is the correlation between features $X$ and $Y$.
The feature group derivation process can be described as follows. The correlation $r_{xy}$ between each pair of features (denoted by $X$, $Y$) is calculated for the dataset. When the absolute value of the correlation between a pair of features is greater than a threshold $th$, the edge $E$ between the two features is set to 1; otherwise it is set to 0. After all features are traversed, a graph is established. The vertices $V$ of the graph are the features, and an edge between two vertices indicates that the correlation between the corresponding features exceeds the threshold. Each set of vertices that are connected to each other is extracted as a group $g$.
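The grouping procedure above amounts to thresholding the absolute correlation matrix and extracting the connected components of the resulting graph. The following is a minimal sketch under that reading; the function name `group_features` and the threshold value 0.9 are illustrative assumptions, since the text does not state the value of the threshold.

```python
import numpy as np


def group_features(X, th=0.9):
    """Group features whose absolute Pearson correlation exceeds th.

    X:  array of shape (n_samples, n_features); columns must not be constant.
    th: correlation threshold (0.9 is a placeholder, not a value from the text).
    Returns a list of groups, each a sorted list of feature indices.
    """
    r = np.corrcoef(X, rowvar=False)   # pairwise Pearson correlations
    adj = np.abs(r) > th               # edge = 1 when |r| > th
    p = adj.shape[0]
    seen = [False] * p
    groups = []
    for start in range(p):
        if seen[start]:
            continue
        # Depth-first search collects one connected component.
        stack, comp = [start], []
        seen[start] = True
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in np.flatnonzero(adj[v]):
                if not seen[u]:
                    seen[u] = True
                    stack.append(int(u))
        groups.append(sorted(comp))
    return groups


x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X_demo = np.column_stack([
    x0,                                          # feature 0
    2.0 * x0 + 1.0,                              # feature 1: rescaled copy of 0
    np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]), # feature 2: weakly related
    -x0,                                         # feature 3: negatively correlated
])
groups = group_features(X_demo, th=0.9)
```

Because the absolute value is used, the negatively correlated feature 3 joins the same group as features 0 and 1, while feature 2 forms a group of its own.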

Feature Ranking
To rank the feature importance, we chose the PLSGLR regression coefficients (β) as indicators, computed with the plsRglm package [42,43] in R. The PLSGLR method chooses latent components while taking the response variable into account in the regression, which distinguishes it from similar methods such as principal component regression [44].
The model of the PLSGLR with $H$ components is written as follows:

$$\operatorname{logit}\big(P(y = 1)\big) = c_0 + \sum_{h=1}^{H} c_h t_h$$

where $c_h$ is the partial coefficient of component $t_h$ in the logistic regression of the response variable $y$, and $c_0$ is the intercept. The component $t_h$ with $p$ features can be written as follows:

$$t_h = \sum_{j=1}^{p} w_{jh}^{*} x_j$$

where $w_{jh}^{*}$ is the loading on feature $x_j$ of component $t_h$. Hence, the model with $p$ features we propose can be expressed as:

$$\operatorname{logit}\big(P(y = 1)\big) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i$$

where $\beta_i$ is the coefficient of feature $x_i$, and features are sorted based on these coefficients; $\beta_0$ is the intercept and is removed from later analyses. Consider a two-class dataset with $p$ features and a response variable that takes the value 1 for the current class and 0 for the other class. The PLSGLR method extracts $H$ principal components $t_h$ with a view to explaining the response variable. Meanwhile, logistic regression is used to determine the coefficient $c_h$ of each component. In order to extract the importance of each feature, we extract the loadings $w_{jh}^{*}$ of the features in each component, and multiply them by the coefficient $c_h$ of the corresponding component. The result is a coefficient $\beta$ that represents the importance of a feature. We rank the features according to their importance, denoted by the absolute value of $\beta$.
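Substituting the component expression into the component-level model gives, for each feature j, a coefficient equal to the sum over components of c_h times the loading of feature j on component h. The sketch below only illustrates this bookkeeping and the ranking by absolute value; the loadings and component coefficients are made-up numbers, and the function name `feature_importance` is hypothetical. In practice these quantities would come from a fitted PLSGLR model.

```python
import numpy as np


def feature_importance(w_star, c):
    """Combine loadings and component coefficients into feature coefficients.

    w_star: array (p, H), loading of feature j on component h.
    c:      array (H,), logistic-regression coefficient of each component
            (intercept c_0 excluded).
    Returns (beta, order) where order lists feature indices by decreasing |beta|.
    """
    beta = w_star @ c                                   # beta_j = sum_h c_h * w*_jh
    order = np.argsort(-np.abs(beta), kind="stable")    # rank by |beta|
    return beta, order


# Illustrative numbers only: 3 features, 2 components.
w_star = np.array([[0.1, 0.0],
                   [0.5, 0.1],
                   [-0.2, 0.4]])
c = np.array([2.0, -1.0])
beta, order = feature_importance(w_star, c)
```

With these numbers the coefficients are 0.2, 0.9 and -0.8, so the second feature ranks first and the third feature ranks second despite its negative sign.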
To estimate the statistical significance of explanatory variables, a nonparametric framework by means of a bootstrap procedure was adopted [45]. Through a large, pre-determined number of repeated samplings with replacement, the distribution parameters of the coefficients were estimated, and their confidence intervals (CI) were calculated. Features that did not pass the significance test were removed [46].
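A bootstrap significance screen of this kind can be sketched as follows. The text does not specify the interval type or the confidence level, so the percentile interval, the 95% level, and the rule "keep a feature when its interval excludes zero" are assumptions. The callable `fit_coefs` is a hypothetical placeholder for whatever routine returns one coefficient per feature (for example, a PLSGLR fit).

```python
import numpy as np


def bootstrap_significant(X, y, fit_coefs, n_boot=1000, alpha=0.05, seed=0):
    """Flag features whose bootstrap percentile CI excludes zero (sketch).

    fit_coefs: hypothetical callable (X, y) -> array of p coefficients.
    Returns (keep, lo, hi): boolean mask and CI bounds per feature.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample rows with replacement
        draws[b] = fit_coefs(X[idx], y[idx])
    lo = np.percentile(draws, 100 * alpha / 2, axis=0)
    hi = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)
    keep = (lo > 0) | (hi < 0)                  # interval does not contain 0
    return keep, lo, hi
```

For a quick check, an ordinary least-squares stand-in such as `lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]` can be passed as `fit_coefs`; it is not the PLSGLR estimator and serves only to exercise the resampling logic.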

Feature Selection
To select features that fit land cover categories well under a particular model in OBIA, the BIC was adopted. The BIC is often used to address model selection problems and can be defined as:

$$\mathrm{BIC} = -2 \ln L + k \ln n$$

where $L$ is the likelihood of the PLSGLR, $n$ is the sample size, and $k$ is the number of features added to the PLSGLR model. For a given land cover category, the ranked features that remain after the significance test are the candidates. The $n$ samples are used to calculate the BIC of the PLSGLR model fitted with the selected features. Other methods, such as random forests or the support vector machine, can also be used to extract BICs for the selected features, but that is beyond the scope of this study. The features are added to the PLSGLR model one by one in ranked order, and the BIC is calculated each time a feature is added. A low BIC score signals that the features added to the model are optimal in terms of their accuracy considering their dimensionality. Hence, we chose the features corresponding to the minimum BIC or to a BIC that differs from the minimum by less than 3 [47].
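The selection rule can be illustrated with a short sketch that takes the log-likelihoods of the nested models (one per added feature) as given. The log-likelihood values are invented for illustration, the function names are hypothetical, and reading "BIC disturbance less than 3" as "within 3 of the minimum, preferring the smallest such model" is an assumption.

```python
import math


def bic(log_likelihood, k, n):
    """BIC = -2 ln L + k ln n for a model with k features and n samples."""
    return -2.0 * log_likelihood + k * math.log(n)


def select_by_bic(log_likelihoods, n, tol=3.0):
    """Pick the number of features from nested models (sketch).

    log_likelihoods[k-1] is ln L of the model using the top-k ranked features.
    Returns (scores, k_min, k_small): the BIC curve, the k at the minimum,
    and the smallest k whose BIC is within tol of the minimum (assumed rule).
    """
    scores = [bic(ll, k, n) for k, ll in enumerate(log_likelihoods, start=1)]
    best = min(scores)
    k_min = scores.index(best) + 1
    k_small = next(k for k, s in enumerate(scores, start=1) if s - best < tol)
    return scores, k_min, k_small


# Invented log-likelihoods for models with 1..5 features, n = 30 samples.
scores, k_min, k_small = select_by_bic([-20.0, -12.0, -9.5, -9.0, -8.9], n=30)
```

In this toy curve the minimum BIC is reached with three features, while the two-feature model lies within 3 BIC units of it and would be preferred under the assumed tolerance rule.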

Results
According to the BIC curve (Figure 2), the optimal numbers of features for water, forest land, grass land, crops land, bare land, and residential and build-up land are 1, 3, 8, 8, 9, and 4, respectively. For comparison with our approach, guided regularized random forest (GRRF) [48] (an embedded method that optimizes both the goodness-of-fit and the number of variables) was employed to extract features from the same training datasets. The features selected by the two methods, namely G-PLSGLR and GRRF, are shown in Figure 3. To describe the dimensionality reduction capability, a reduction rate was defined as the ratio of the original dimensions to the post-processing dimensions of one class, and an overall reduction rate was calculated by taking the mean over the classes.
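As a worked check of the reduction rate, the sketch below uses the 51 original features and the per-class feature counts listed above. It assumes that the overall rate is the original dimension divided by the mean number of selected features; this reading is an interpretation, chosen because it reproduces the overall value of 9.27 reported for G-PLSGLR, whereas averaging the per-class ratios gives about 16.53. The function names are hypothetical.

```python
def per_class_reduction_rates(original_dim, selected_counts):
    """Ratio of original to post-processing dimensions for each class."""
    return [original_dim / k for k in selected_counts]


def overall_reduction_rate(original_dim, selected_counts):
    """Original dimension over the mean number of selected features
    (assumed reading of 'taking the mean over the classes')."""
    mean_selected = sum(selected_counts) / len(selected_counts)
    return original_dim / mean_selected


# Water, forest, grass, crops, bare, residential and build-up land.
g_plsglr_counts = [1, 3, 8, 8, 9, 4]
rates = per_class_reduction_rates(51, g_plsglr_counts)
overall = overall_reduction_rate(51, g_plsglr_counts)
print(round(overall, 2))  # 9.27
```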
Figure 3. The links on graphs are the feature grouping results from the G-PLSGLR. The selected features are depicted as red tiles across the x-axis. Water-selected features vs. methods (a); Forest land-selected features vs. methods (b); Grass land-selected features vs. methods (c); Crops land-selected features vs. methods (d); Bare land-selected features vs. methods (e); Residential and build-up land-selected features vs. methods (f).
In the OBIA-based land cover classification, only one feature, modeMinimu, was selected to distinguish water from other land cover types. For the extraction of forest land from other land cover types, the features GLCM_Ang_2, GLCM_Homog, and modeMini_2 were selected. The selected features mainly carry textural and spectral information, which conforms to the current consensus. For grass land, the features GLCM_Contr, GLCM_Mean_, Skewness_L, GLCM_Corre, GLCM_StdDe, HSI_Transf, Standard_d, and Skewness_1 were selected. These features also capture textural and spectral characteristics but are more complex than those employed for forest land. For identifying crops land, the features Standard_4, HSI_Tran_2, Standard_d, GLCM_Homog, GLCM_Contr, Roundness, NDWIF, and HSI_Transf were selected, which introduce geometric characteristics in addition to textural and spectral characteristics. The geometric characteristics may reveal the regularity of segmented objects on crops land. For the identification of bare land, the features Skewness_4, HSI_Tran_2, Mean_Lay_4, Skewness_1, Skewness_3, modeMinimu, Compactnes, and GLCM_Homog were selected. These features primarily focus on the first-order and third-order moments of the segmented object, which may be caused by the high homogeneity of bare land. For the identification of residential and build-up land, the features Standard_4, HSI_Tran_2, Standard_d, and GLCM_StdDe were selected. These features primarily focus on the second-order moment information of the segmented object, which may result from the high heterogeneity of the artificial surface. As seen in Figure 3, the number of features selected by the G-PLSGLR method is much smaller than that selected by GRRF. The overall reduction rate for G-PLSGLR is 9.27, which is 5.78 times that of GRRF (1.61).

Validation
To compare the feature redundancy of the proposed methodology, average correlation coefficients were applied. The overall accuracy (OA), user's accuracy (UA), and producer's accuracy (PA) were used to test the classification performance by applying linear discriminant analysis (LDA) using testing samples. The LDA classifier was chosen on the basis of its performance in terms of minimizing the Bayes error for binary classification.

Evaluation of Feature Redundancy
Feature redundancy increases search space size and affects the speed and the accuracy of learning algorithms [49]. The correlation coefficient, which indicates the strength and direction of a relationship between two random variables, is used to measure the redundancy. We extracted the upper triangular matrix from the correlation coefficient matrix of selected features for each class and then calculated its average value to measure the overall redundancy. An average correlation coefficient value close to 1 indicates a strong redundancy, while a value close to 0 indicates a weak redundancy. The absolute value of the correlation coefficient is used to capture the dependency, whether it is positive or negative.
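The redundancy measure can be sketched as the mean absolute correlation over the strictly upper triangle of the correlation matrix. Excluding the diagonal and returning 0 when only one feature is selected are assumptions; they are consistent with the redundancy of 0 reported for the single water feature. The function name `feature_redundancy` is hypothetical.

```python
import numpy as np


def feature_redundancy(X_selected):
    """Mean absolute pairwise Pearson correlation of the selected features.

    X_selected: array of shape (n_samples, n_selected_features).
    Returns 0.0 when fewer than two features are selected (assumed convention).
    """
    p = X_selected.shape[1]
    if p < 2:
        return 0.0
    r = np.abs(np.corrcoef(X_selected, rowvar=False))
    upper = r[np.triu_indices(p, k=1)]   # strictly upper triangle, no diagonal
    return float(upper.mean())


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
fully_redundant = feature_redundancy(np.column_stack([x, -2.0 * x + 1.0]))
single_feature = feature_redundancy(x.reshape(-1, 1))
```

The first call compares a feature with a negatively rescaled copy of itself, so the redundancy is essentially 1; the second call has no pairs to compare and returns 0.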
As shown in Figure 4, the GRRF method has the larger redundancy in all land cover categories. The redundancy of the features selected for water by G-PLSGLR is 0; in contrast, GRRF has a redundancy of 0.53. For forest land, the redundancy of features selected by G-PLSGLR is 0.26, while GRRF-selected features have a redundancy of 0.41. For grass land, features selected by G-PLSGLR have a redundancy of 0.28, and GRRF-selected features have a redundancy of 0.40. For crops land, the redundancy of features selected by G-PLSGLR is 0.29, and GRRF-selected features have a redundancy of 0.41. The redundancy of features selected by G-PLSGLR for bare land is 0.35, and GRRF-selected features have a redundancy of 0.39. For residential and build-up land, the redundancy of features selected by G-PLSGLR is 0.28, and GRRF-selected features have a redundancy of 0.38. Additionally, the overall redundancy for G-PLSGLR is 0.29, while GRRF has an overall redundancy of 0.42. In short, G-PLSGLR-selected features outperform GRRF-selected features in terms of redundancy.

Accuracy Assessment on Selected Features
The accuracy assessment is derived from a confusion matrix. The OA describes the overall correctness of classification, but does not reveal the distribution of errors [50]. Therefore, UA and PA are taken into consideration. They are defined as follows:

$$OA = \frac{\sum_{i=1}^{n} P_{ii}}{N}, \qquad UA_i = \frac{P_{ii}}{P_{i+}}, \qquad PA_j = \frac{P_{jj}}{P_{+j}}$$

where $P_{ii}$ is the number of samples of the i-th land cover category that are correctly classified, $N$ is the total number of testing samples, $n$ is the number of categories, $P_{+j}$ is the sum of the j-th column in the confusion matrix, and $P_{i+}$ is the sum of the i-th row in the confusion matrix.
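The three indicators can be computed from a confusion matrix as in the sketch below. It assumes the common convention that rows hold the classified labels and columns hold the reference labels, so that UA divides by row sums and PA by column sums; the example matrix is invented and the function name `accuracy_metrics` is hypothetical.

```python
import numpy as np


def accuracy_metrics(cm):
    """Overall, user's and producer's accuracy from a confusion matrix.

    cm[i, j]: number of samples classified as class i whose reference
              class is j (assumed orientation).
    """
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    oa = diag.sum() / cm.sum()        # correctly classified / all samples
    ua = diag / cm.sum(axis=1)        # divide by row sums P_{i+}
    pa = diag / cm.sum(axis=0)        # divide by column sums P_{+j}
    return oa, ua, pa


# Invented two-class example (current class vs. other).
oa, ua, pa = accuracy_metrics([[45, 5],
                               [10, 40]])
```

Here 85 of 100 samples lie on the diagonal, so OA is 0.85; UA is 0.90 and 0.80 for the two classes, and PA is 45/55 and 40/45.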
The OA of G-PLSGLR and GRRF is shown in Figure 5 and Table 3. Five land cover categories have an OA of more than 80%, while the OA of grass land is about 75%. The mean OA over the six land cover categories is 90.63% for features selected by G-PLSGLR and 85.56% for GRRF-selected features. This indicates that features selected by G-PLSGLR are more representative than features selected by GRRF. The PA of each land cover category was more than 95% using features selected by G-PLSGLR, while the UA ranged from 70 to 80%, with the exceptions of grass land and bare land. Because of the severe landscape fragmentation and the small area occupied by these two land cover categories, G-PLSGLR chooses no more than 18% of the features to describe them. The insufficient discrimination of these two land covers at a fixed segmentation scale may account for the low UA, which can be ignored in the accuracy assessment of this study. Based on the OA, PA, and UA metrics, G-PLSGLR-selected features achieved a higher accuracy than GRRF-selected features at higher reduction rates, which indicates that G-PLSGLR-selected features are more representative for the identification of specific land cover categories.

Comparison with Other Low Sample Size Studies
Previous studies have demonstrated that when the number of training samples is very small (e.g., 20, 40 samples), algorithms including Support Vector Machine, Random Forests, Classification and Regression Tree, and Maximum-Likelihood Classification did not perform well on 24 features [31].With the increasing number of features, the accuracy of classification using these algorithms may be even lower.This study used a number of training samples between 15 and 38 for each class of 51 features, and achieved relative high accuracy of classification after the G-PLSGLR feature selection process, which is detailed in Section 3. We interpret the increased performance to the following mechanisms: elimination of class imbalance, reduction of feature redundancy, and suitability of PLSGLR.
As mentioned in Section 2.2, classifiers tend to result in a classification preference towards the major class with uneven class sizes [39,51].In a feature selection process, the selected features may be more conducive to the separation of the major class.Under-sampling, which involves removing samples of the major class, is a method of dealing with this class imbalance problem.With the high dimensionality and large intra-class variability of samples, a cluster-based under-sampling approach would be invalid [38].In this work, the manual labeling of subtypes for training samples is acceptable due to the low sample size, which is inevitable for sample efficiency and classification accuracy.It is more reliable than automatic clustering process or synthetic sampling, with the introduction of domain knowledge.
Previous research has shown that the most important predictor variables always are highly correlated, which results in unstable classification accuracy even when the same training data are used [52,53].Spearman rank-order correlation was used in past research to determine pair-wise correlations.In this study, we absorbed the correlation criteria-based filter methods for feature selection to group highly correlated variables, which is a similar process to that of the above studies.However, a slight difference is that we choose an arbitrary feature as a representative of the total group.The advantages of feature grouping include: (1) the outputs of feature selection are more stable, and (2) the redundancy of the selected features is lower.
This study introduces the PLSGLR method for feature selection in OBIA-based land cover classification. The method is widely used in areas such as hyperspectral information analysis [54][55][56]. In these areas, the dimensionality is always higher than the number of observations, reaching tens or even hundreds of times the number of observations. Our study provides a framework, known as G-PLSGLR, that takes class imbalance and feature redundancy into consideration for feature selection in OBIA-based land cover classification, so as to extract the main predictor variables with comparable validation accuracy.

Potential Extensions for Producing Multi-Class Land Use and Land Cover Maps
Although this study focuses on extracting the best features that identify the objects of specific land cover categories, a land use and land cover (LULC) map can be produced on the basis of the binary classification results. In general, multi-class classification can be extended from binary classification by strategies such as stacking binary classifiers and hierarchical binary classifiers [57,58]. The stacking of binary classifiers, which starts with the binary classification of each land cover category from image objects, can be used to produce a final LULC map by overlaying the outputs of each binary classification result and voting. Alternatively, hierarchical binary classifiers are also widely used to solve the problem of combining the results of binary classifiers into a multi-class classification [59]. A hierarchical binary classifier can be started by building a hierarchical classification tree from one land cover category that is easier to identify, or by following the maximum margin rule to split classes into two macro-classes [60]. Then, other land cover categories can be extracted from the outputs of each node. By optimizing the order of extraction, the hierarchical constraints help to improve the accuracy. However, problems such as error accumulation, extraction order, and category conflicts need further research.
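The overlay-and-vote strategy can be illustrated with a short Python sketch. The voting rule is not specified in this study; the example assumes that each binary classifier returns a score in [0, 1] per image object, assigns the category with the highest score, and leaves an object unlabeled when no classifier claims it. The function name, the `min_score` parameter, and the scores are hypothetical.

```python
def combine_binary_outputs(scores_by_class, min_score=0.5):
    """Assign each object the class whose binary classifier scores highest.

    scores_by_class: dict mapping a class name to a list of per-object
    scores in [0, 1]. Objects whose best score is below min_score get None.
    """
    classes = list(scores_by_class)
    n_objects = len(scores_by_class[classes[0]])
    labels = []
    for i in range(n_objects):
        best = max(classes, key=lambda c: scores_by_class[c][i])
        labels.append(best if scores_by_class[best][i] >= min_score else None)
    return labels


# Hypothetical scores from three binary classifiers on three image objects.
scores = {
    "water": [0.9, 0.2, 0.1],
    "forest": [0.3, 0.8, 0.2],
    "bare": [0.1, 0.6, 0.3],
}
labels = combine_binary_outputs(scores)
```

The second object is claimed by both the forest and bare land classifiers, which is the kind of category conflict noted above; the sketch resolves it by the higher score, and the third object remains unlabeled.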

Conclusions
The high dimensionality and low sample size associated with land cover information extraction using OBIA is a common problem [15]. In this study, we propose an improved extended partial least squares regression method, known as G-PLSGLR, to reduce the feature dimensionality of OBIA-based land cover classification. To obtain a low sample size for OBIA, we collected several samples from each land cover category using results from Gaofen-2 image segmentation, and we collected a series of validation samples to evaluate the performance of the proposed method. The approach consisted of four steps: (1) labeling the sub-categories of samples and automatically sampling them into class-balanced datasets, in view of the fact that the training samples have high intra-class variability and the number of training samples is less than the dimensionality; (2) grouping features based on the Pearson correlation coefficient and graph theory to reduce the redundancy of features calculated by OBIA; (3) ranking features: the grouping results were analyzed using a PLSGLR regression, non-significant features were removed after evaluation by a bootstrap method for confidence interval estimation, and the remaining features found to be significant for explaining land cover types were sorted based on their regression coefficients; and (4) selecting the optimal feature set from the remaining sorted features according to the BIC.
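Step (4) can be sketched in Python using the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the number of samples, and L the maximized likelihood. The sketch assumes that a model has already been fitted on the top-k sorted features for each k and that one intercept is counted as an extra parameter; the log-likelihood values and the sample size are invented for illustration and are not results of this study.

```python
from math import log


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * log(n_samples) - 2.0 * log_likelihood


def best_feature_count(log_likelihoods, n_samples):
    """Return the feature count with the lowest BIC, plus all BIC values.

    log_likelihoods[k - 1] is the maximized log-likelihood of the model
    fitted on the top-k sorted features (assumed to be precomputed).
    """
    # k features plus one intercept (an assumption of this sketch).
    scores = [bic(ll, k + 1, n_samples)
              for k, ll in enumerate(log_likelihoods, start=1)]
    best_k = min(range(1, len(scores) + 1), key=lambda k: scores[k - 1])
    return best_k, scores


# Hypothetical log-likelihoods for models with 1 to 4 features, 40 samples.
best_k, scores = best_feature_count([-30.0, -20.0, -19.5, -19.4], n_samples=40)
```

Here the second feature improves the fit substantially, whereas the third and fourth add too little to offset the ln(n) penalty, so two features are retained.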
For the study area, the G-PLSGLR was applied to generate the optimal feature set and was validated by comparing its results with those of GRRF using the metrics of reduction rate, feature redundancy, OA, and Kappa statistic. The results showed that G-PLSGLR is suitable for OBIA feature selection and can significantly reduce the number and redundancy of selected features compared with GRRF. The overall reduction rate on all land cover categories of features selected by G-PLSGLR was 9.27, while GRRF had a value of 1.61. The overall feature redundancy of G-PLSGLR was 0.29, whereas GRRF had a value of 0.42. The overall OA using an LDA classifier was 90.63% for G-PLSGLR and 85.56% for GRRF. The PA of each land cover category was more than 95% using features selected by G-PLSGLR, while the PA ranged from 77 to 96% using features selected by GRRF. The UA of G-PLSGLR-selected features ranged from 70 to 80% except for grass land and bare land, which achieved a 10% higher UA than GRRF-selected features. The evaluation results showed that G-PLSGLR can extract more representative features for a variety of land cover categories, at a higher reduction rate and with lower feature redundancy and higher classification performance than GRRF.
The proposed G-PLSGLR broadens the choice of feature selection methods for OBIA-based land cover classification with high dimensionality and low sample size. The method is well-suited to OBIA-based land cover information extraction, with a high reduction rate and comparable accuracy. G-PLSGLR can be applied to a wide range of feature selection scenarios and has the advantage of a higher reduction rate with a smaller sample size. Future work will focus on the non-linear effects in the process of grouping features.

Figure 1. Study area on the eastern coast of China showing the plot acquired by the Gaofen-2 images. (a) Geo-referenced and atmospherically corrected image; and (b) image overlaid with a segmentation layer with training samples and testing samples.

Figure 2. Graph of BIC against the number of features for the G-PLSGLR. Water-BIC vs. number of features (a); Forest land-BIC vs. number of features (b); Grass land-BIC vs. number of features (c); Crops land-BIC vs. number of features (d); Bare land-BIC vs. number of features (e); Residential and build-up land-BIC vs. number of features (f).

Figure 3. Tile map of the two methods, namely, G-PLSGLR and GRRF, against the selected features. The links on the graphs are the feature grouping results from the G-PLSGLR. The selected features are depicted as red tiles across the x-axis. Water-selected features vs. methods (a); Forest land-selected features vs. methods (b); Grass land-selected features vs. methods (c); Crops land-selected features vs. methods (d); Bare land-selected features vs. methods (e); Residential and build-up land-selected features vs. methods (f).

Figure 4. Feature redundancy of G-PLSGLR and GRRF against six land cover categories.

Figure 5. Overall accuracy of G-PLSGLR and GRRF against six land cover categories.

Table 1. Features used to identify image objects in this study.

Table 3. The accuracy assessment of LDA classification using features selected by G-PLSGLR and GRRF.
