Optimal Segmentation Scale Parameter, Feature Subset and Classiﬁcation Algorithm for Geographic Object-Based Crop Recognition Using Multisource Satellite Imagery

: Geographic object-based image analysis (GEOBIA) has been widely used in the remote sensing of agricultural crops. However, issues related to image segmentation, data redundancy and performance of different classiﬁcation algorithms with GEOBIA have not been properly addressed in previous studies, thereby compromising the accuracy of subsequent thematic products. It is in this regard that the current study investigates the optimal scale parameter (SP) in multi-resolution segmentation, feature subset, and classiﬁcation algorithm for use in GEOBIA based on multisource satellite imagery. For this purpose, a novel supervised optimal SP selection method was proposed based on information gain ratio, and was then compared with a preexisting unsupervised optimal SP selection method. Additionally, the recursive feature elimination (RFE) and enhanced RFE (EnRFE) algorithms were modiﬁed to generate an improved EnRFE (iEnRFE) algorithm, which was then compared with its precursors in the selection of optimal classiﬁcation features. Based on the above, random forest (RF), gradient boosting decision tree (GBDT) and support vector machine (SVM) were applied to segmented objects for crop classiﬁcation. The results indicated that the supervised optimal SP selection method is more suitable for application in heterogeneous land cover, whereas the unsupervised method proved more efﬁcient as it does not require reference segmentation objects. The proposed iEnRFE method outperformed the preexisting EnRFE and RFE methods in optimal feature subset selection as it recorded the highest accuracy and less processing time. The RF, GBDT, and SVM algorithms achieved overall classiﬁcation accuracies of 91.8%, 92.4%, and 90.5%, respectively. GBDT and RF recorded higher classiﬁcation accuracies and utilized much less computational time than SVM and are, therefore, considered more suitable for crop classiﬁcation requiring large numbers of image features. These results have shown that the proposed object-based crop classiﬁcation scheme could provide a valuable reference for relevant applications of GEOBIA in crop recognition using multisource satellite imagery.


Introduction
The acquisition of reliable crop distribution maps is becoming increasingly important for agricultural land management amidst growing food demands in response to an ever-increasing human population globally [1]. In recent years, with the continuous launch of satellites such as Landsat-8 and Sentinel-2, crop classification based on multisource medium-high resolution images has become an efficient alternative that resolves the inadequacies of insufficient time resolution of a single optical satellite, especially in cloud-prone areas (tropics and subtropics) [2][3][4].
Geographic object-based image analysis (GEOBIA) is an important approach to the use of multisource satellite imagery in crop classification [5,6]. It takes geographic objects rather than pixels as the basic unit of classification, which avoids the salt and pepper phenomenon typical of traditional pixel-based classifiers like maximum likelihood, and could deal with the problem of inconsistent spatial resolution among multisource data [7][8][9]. Moreover, with the addition of texture and shape information, object-based classification can employ more features in recognizing target objects [10][11][12][13]. However, despite its advantages over traditional classifiers, the use of GEOBIA is also constrained by uncertainties associated with image segmentation [14], feature selection [15,16] and classification algorithms [17,18]. Segmentation is an essential process in GEOBIA, and the definition of segmentation parameters could have a significant influence on classification accuracy [19][20][21]. While the introduction of abundant features brings additional information that is relevant in target recognition, it could also introduce noise or redundant information that can degrade classification accuracy [22,23]. Moreover, several classification algorithms have been proposed and their performances (accuracy, efficiency and feasibility) in GEOBIA vary considerably [24][25][26]. Therefore, in order to exploit the full potential of GEOBIA in crop recognition with multisource satellite imagery, the optimal combination of segmentation, feature selection and classification algorithms needs to be identified, hence the current study.
Multi-resolution segmentation is the most widely used segmentation method in GEOBIA, and it takes a bottom-up segmentation approach based on a pair-wise region merging technique [5,21].
A key parameter in multi-resolution segmentation is scale. An excessively large scale will result in under-segmentation, while an excessively small scale will result in over-segmentation [27,28]. The traditional method to determine the optimal scale parameter (SP) of multi-resolution segmentation is trial and error, which is not only subjective but also time-consuming [29]. In an effort to quantitatively determine optimal SP and improve automation, some supervised and unsupervised algorithms have been developed [30][31][32][33][34]. Supervised methods use 'real' objects to assess the reliability of segmentation results, while unsupervised methods use varieties of heterogeneity and homogeneity measures to assess the segmentation results [35]. Although the unsupervised method is less time-consuming when compared with the supervised method, the latter has been found to be more accurate. Based on a critical review of relevant articles published in three major remote-sensing journals in the recent past, Costa et al. [36] concluded that image segmentation accuracy assessment is still in its infancy and little attention has been paid to assessing the relative competencies of supervised and unsupervised methods. Moreover, performing a supervised optimal SP selection is still challenging owing to the complexity of measuring the differences between segmented and 'real' objects, in addition to being difficult and time-consuming in acquiring reference objects over large areas [37]. Studies that assess the potential of acquiring reference objects from representative areas for application at large scales are, therefore, required to minimize time while not compromising accuracy.
Object-based classification could generate dozens or hundreds of features. An excessive number of features may cause difficulties in training classification models and could result in a poor classification performance, known as the Hughes phenomenon [38]. Thus, assessing the importance of features and selecting optimal feature subsets could be helpful for improving the classification accuracy [39]. A series of measures such as variance [40], chi-square [41], Relief-F [42] and correlation [43] have been used to obtain the optimal feature subset, while machine learning algorithms like random forest (RF) can also provide variable importance that is useful in feature selection [44,45]. However, a feature of low importance may have a great impact on classification accuracy when combined with other features [46]. By this consideration, assessing the performance of a feature subset directly through its classification accuracy would be more appropriate. It is also important to note that when the number of features is large, it is difficult to evaluate the accuracy of all possible feature combinations due to the huge computational work and time required. Thus for operational purposes, a series of search strategies are used in screening the most optimal feature subset [22,47]. However, few studies have evaluated the performance of these feature selection methods especially in GEOBIA crop recognition applications involving multisource satellite imagery.
Several classification algorithms are available for application on segmented objects and optimal features for crop recognition with GEOBIA. With the rapid development of machine learning in recent years, image classification algorithms and tools have increasingly been provided, and their application in crop recognition has become universal. Commonly-used machine learning methods include C4.5 [48], classification and regression tree (CART) [49], RF [44], support vector machine (SVM) [45] and gradient boosting decision tree (GBDT) [50]. Several differences exist among these machine-learning methods, including parameter settings, accuracy performance and efficiency. Therefore, it is necessary to evaluate these machine-learning methods so as to arrive at the most suitable for operational crop recognition initiatives involving GEOBIA and multisource satellite imagery.
Previous studies have evaluated the effect of one or two of the three factors (segmentation, feature selection and classification algorithm) on GEOBIA in land-cover mapping. Duro et al. [51] compared three machine-learning algorithms including RF, SVM and decision tree (DT) for an object-based image analysis that utilizes the 'trial-and-error' approach to determining optimal SP while relying on expert knowledge in selecting optimal object features. Liu et al. [52] compared the performances of fully convolutional networks (FCN), RF and SVM in GEOBIA based on a fixed segmentation scale parameter and involving all features. Based on a fixed segmentation scale parameter, Ma et al. [53] compared the results obtained from three feature-subset evaluation methods in GEOBIA using SVM and RF. To the best of our knowledge, no study has reported to date the combined effect of the aforementioned three factors in GEOBIA.
In view of the research gaps in the use of GEOBIA in crop recognition outlined in the above paragraphs, this paper, based on the information gain ratio, proposed a novel supervised optimal SP selection method for comparison with an unsupervised optimal SP selection method. Additionally, based on the recursive feature elimination (RFE) [54] and enhanced RFE (EnRFE) [22] algorithms, this paper proposed an improved EnRFE (iEnRFE) algorithm for the selection of the optimal feature subset for GEOBIA. Using the optimal segmentation scale parameter and feature subset, the widely used classification algorithms of RF, SVM and GBDT were evaluated for accuracy and efficiency in crop recognition with GEOBIA and multisource satellite imagery (Sentinel-2A, Landsat-8 OLI and GF-1 WFV), and results obtained are envisaged to provide a valuable reference for relevant operational scenarios.

Study Area
Located in the Jiangsu Province of East China, Xinghua County is used as test site (Figure 1). It has a total area of 2353 km 2 and a population size of about 1.5 million. Xinghua County is within the subtropical monsoon climate zone, with an average annual temperature range of 14 • C to 15 • C and an annual precipitation total of approximately 1040.4 mm. The relatively flat topography and warm-moist climate of this area favors aquaculture and agriculture, making it one of the major food producing areas of China.

Image Acquisition and Preprocessing 147
In order to obtain enough clear sky images that cover the main phenological stages of the crops 148 under investigation in Xinghua, only satellite images whose footprint over Xinhua is 100% cloud-free 149 were downloaded for the current study. In this regard, only five images from the multisource fulfilled 150 this criterion and they included two images of the with an overlapping swath of 830 km. 173 In this study, the 6S (second simulation of a satellite signal in the solar spectrum) atmospheric 174 correction model was used to convert the digital numbers (DN) of raw Landsat-8 and GF-1 images 175 to surface reflectance [58]. The 6S model is an advanced radiative transfer code designed to simulate 176 the reflection of solar radiation by a coupled atmosphere-surface system for a wide range of 177 atmospheric, spectral and geometrical conditions. Then the rational polynomial coefficient (RPC) 178 model was used to ortho-rectify the GF-1 image [59]. RPCs provide a compact representation of a 179 The main summer crop is paddy rice [55], whereas winter wheat (Triticum aestivum L.) and oilseed rape (Brassica napus L.) [56] dominate the winter season. Green Onion (Allium ascalonicum) is planted year-round and can be harvested 3 to 4 times a year [57].

Image Acquisition and Preprocessing
In order to obtain enough clear sky images that cover the main phenological stages of the crops under investigation in Xinghua, only satellite images whose footprint over Xinhua is 100% cloud-free were downloaded for the current study. In this regard, only five images from the multisource fulfilled this criterion and they included two images of the Sentinel-2A Multispectral Instrument (MSI), two images of the Landsat-8 Operational Land Imager (OLI) and one image obtained from GF-1 Wide Field of View (WFV). Sentinel-2A images were acquired on 28 February and 29 April 2017, Landsat images were acquired on 15 March and 18 May 2017, and the GF-1 image was acquired on 7 December 2016. Being acquired from December to May, these images can capture important phenological information on winter wheat, oilseed rape and green onion in the study area, which can be used as a classification feature to discriminate these agricultural crops.
Sentinel-2 MSI is one of five sister missions of the Copernicus Initiative of the European Commission (EC) and the European Space Agency (ESA), and images from this satellite are freely available to the user community via the ESA Scientific Data hub portal (https://scihub.copernicus.eu). Sentinel-2 images are provided in top-of-atmosphere reflectance (TOA) as Level-1C. Conversion to bottom-of-atmosphere (BOA) reflectance to produce Level-2A can be implemented using the Sen2Cor atmospheric correction tool provided by ESA (http://step.esa.int/main/third-party-plugins-2/sen2cor/). The Sen2Cor atmospheric correction model is dependent on radiative transfer functions based on sensor and solar geometries, ground elevations, and atmospheric parameters. In this study, only the visible and near infrared (VNIR) and shortwave infrared (SWIR) bands are employed.
Landsat-8 images are employed in the current study. Landsat images are provided freely to users by the United States Geological Survey (USGS) and can be downloaded at https://glovis.usgs.gov.
GF-1 WFV data are provided by the China Centre for Resources Satellite Data and Application (CRESDA) (http://www.cresda.com). GF-1 WFV data contain four bands in the VNIR region at a spatial resolution of 16 m. GF-1 is the first of a series of high-resolution optical Earth observation satellites of the China National Space Administration (CNSA). It has four wide field of view cameras (WFVs) with an overlapping swath of 830 km.
In this study, the 6S (second simulation of a satellite signal in the solar spectrum) atmospheric correction model was used to convert the digital numbers (DN) of raw Landsat-8 and GF-1 images to surface reflectance [58]. The 6S model is an advanced radiative transfer code designed to simulate the reflection of solar radiation by a coupled atmosphere-surface system for a wide range of atmospheric, spectral and geometrical conditions. Then the rational polynomial coefficient (RPC) model was used to ortho-rectify the GF-1 image [59]. RPCs provide a compact representation of a ground-to-image geometry, allowing photogrammetric processing without requiring a physical camera model. All RPCs were provided together with the raw GF-1 WFV image in a ".rpb" file.
To ensure the alignment of pixels across multisource images, geometric registration was carried out after atmospheric correction, and for which a Sentinel-2A MSI image (acquired on 29 April 2017) was used as the reference to which the other images were registered.

Ground Reference Data Acquisition
Sample selection was done after segmentation (see Section 4.1). Through visual interpretation of satellite and very high resolution Google Earth images, and knowledge obtained from field surveys (including information on crop category, location, distribution and environment), a total of 2025 objects were selected as sample data based on a random selection strategy, including 649 winter wheat objects, 230 rape objects, 176 green onion objects and 970 objects in the 'others' category (buildings, roads, water, bare lands, woodlands etc.). The distribution of ground reference samples is shown in Figure 2. Using a stratified random sampling, 70% of the total samples in each class were used in feature selection and model training, whereas the remaining 30% were used to validate the feature selection and crop classification. ground-to-image geometry, allowing photogrammetric processing without requiring a physical 180 camera model. All RPCs were provided together with the raw GF-1 WFV image in a ".rpb" file. 181 To ensure the alignment of pixels across multisource images, geometric registration was carried 182 out after atmospheric correction, and for which a Sentinel-2A MSI image (acquired on 29 April 2017) 183 was used as the reference to which the other images were registered. 184

Ground Reference Data Acquisition 185
Sample selection was done after segmentation (see Section 4.1). Through visual interpretation of 186 satellite and very high resolution Google Earth images, and knowledge obtained from field surveys 187 (including information on crop category, location, distribution and environment), a total of 2,025 188 objects were selected as sample data based on a random selection strategy, including 649 winter 189 wheat objects, 230 rape objects, 176 green onion objects and 970 objects in the 'others' category 190 (buildings, roads, water, bare lands, woodlands etc.). The distribution of ground reference samples 191 is shown in Figure 2. Using a stratified random sampling, 70% of the total samples in each class were 192 used in feature selection and model training, whereas the remaining 30% were used to validate the 193 feature selection and crop classification.

Methodology 203
The methodology employed in this study is outlined in Figure 3 and explained in detail 204 thereafter. The eCognition software was used to perform image segmentation. The Python language 205 together with the Geospatial Data Abstraction Library (GDAL) and Scikit-learn library were used to 206 implement the optimal segmentation scale parameter selection, feature elimination and machine 207 learning classification algorithms. GDAL was used to read and write the data, and Scikit-learn was 208

Methodology
The methodology employed in this study is outlined in Figure 3 and explained in detail thereafter. The eCognition software was used to perform image segmentation. The Python language together with the Geospatial Data Abstraction Library (GDAL) and Scikit-learn library were used to implement the optimal segmentation scale parameter selection, feature elimination and machine learning classification algorithms. GDAL was used to read and write the data, and Scikit-learn was used to perform the feature elimination algorithms and machine learning classification algorithms.

Image Segmentation 213
The eCognition software is used to segment the two Sentinel-2A images which have the highest 214 resolution of all images used in this study. The segmentation is based on the multi-resolution 215 segmentation algorithm, as described in [5]. In eCognition, three main parameters are supposed to 216 be defined by users based on project requirements: the scale parameter designated as S, the 217 shape/color parameters designated as weight wshape/wcolor and the compactness/smoothness 218 parameters designated as weight wcompt/wsmooth. Due to fragmented and irregular croplands, the 219 shapes of different crop fields do not differ much. Therefore, compared with shape features, we 220

Image Segmentation
The eCognition software is used to segment the two Sentinel-2A images which have the highest resolution of all images used in this study. The segmentation is based on the multi-resolution Remote Sens. 2019, 11, 514 7 of 23 segmentation algorithm, as described in [5]. In eCognition, three main parameters are supposed to be defined by users based on project requirements: the scale parameter designated as S, the shape/color parameters designated as weight w shape /w color and the compactness/smoothness parameters designated as weight w compt /w smooth . Due to fragmented and irregular croplands, the shapes of different crop fields do not differ much. Therefore, compared with shape features, we accorded more importance to the spectral features in image segmentation. As the sum of w shape and w color must be equal to 1.0, they were respectively set, in the current study, to 0.1 and 0.9. In a similar vein, the sum of w compt and w smooth must also be equal to 1.0, but in this regard, they were respectively set to 0.5 to ensure an equal importance of these parameters in image segmentation. Thus, the only parameter that needed to be optimized was scale, S.
Three regions in the test site, respectively representing areas of major winter wheat, oilseed rape and green onion cultivation were selected for acquiring the reference objects needed in determining the scale parameter ( Figure 4). The first region which is predominantly of winter wheat is located in the southern part of Xinghua with an area of about 2.5 km × 1.5 km (Figure 4a). The northern part of Xinghua, composed mainly of oilseed rape, was selected as the second region with an area of about 2.1 km 2 ( Figure 4b). The third region consisting mainly of green was selected from the central part of Xinghua and has an area of about 2.1 km 2 ( Figure 4c). Both unsupervised and supervised optimal SP selection methods were employed in the current study.
Remote Sens. 2019, 10, x FOR PEER REVIEW 7 of 23 set to 0.5 to ensure an equal importance of these parameters in image segmentation. Thus, the only 224 parameter that needed to be optimized was scale, S. 225 Three regions in the test site, respectively representing areas of major winter wheat, oilseed rape 226 and green onion cultivation were selected for acquiring the reference objects needed in determining 227 the scale parameter ( Figure 4). The first region which is predominantly of winter wheat is located in 228 the southern part of Xinghua with an area of about 2.5 km×1.5 km (Figure 4a). The northern part of 229 Xinghua, composed mainly of oilseed rape, was selected as the second region with an area of about  Espindola et al. [30] proposed an unsupervised optimal SP selection method that takes into 238 account global intra-segment and inter-segment heterogeneity measures (weighted variance and 239 Moran's I, respectively) and finally obtains a global score (GS). The scale with the minimum GS is 240 selected as the optimal SP. This method has been widely used [60][61][62] and discussed [63], and its 241 framework is described as follows.

242
The global intra-segment goodness measure, weighted variance, is calculated as follows: 243 where n is the total number of segmentation objects; vi is the variance; and ai is the area of object i. 245 The inter-segment heterogeneity measure, Moran's I, is calculated as follows: 246 where wij is the spatial weight between object i; and j (if they are neighbors sharing a boundary, wij = 248 1, otherwise, wij = 0), yi and yj are the mean spectral value of object i and j respectively, and is the 249 mean spectral value of the image. 250 Then Eq. (3) is used to rescale the wVar and MI to 0-1, and GS is calculated by Equation (4). 251

Unsupervised Optimal SP Selection Method
Espindola et al. [30] proposed an unsupervised optimal SP selection method that takes into account global intra-segment and inter-segment heterogeneity measures (weighted variance and Moran's I, respectively) and finally obtains a global score (GS). The scale with the minimum GS is selected as the optimal SP. This method has been widely used [60][61][62] and discussed [63], and its framework is described as follows.
The global intra-segment goodness measure, weighted variance, is calculated as follows: where n is the total number of segmentation objects; v i is the variance; and a i is the area of object i. The inter-segment heterogeneity measure, Moran's I, is calculated as follows: where w ij is the spatial weight between object i; and j (if they are neighbors sharing a boundary, w ij = 1, otherwise, w ij = 0), y i and y j are the mean spectral value of object i and j respectively, and y is the mean spectral value of the image. Then Equation (3) is used to rescale the wVar and MI to 0-1, and GS is calculated by Equation (4).

Supervised Optimal SP Selection Method
A supervised optimal SP selection algorithm is proposed in the current study based on the information gain ratio, which is often used in decision tree construction [48]. The detailed steps in the construction of this algorithm are described as follows: (1) A series of SPs were used to segment the images, and then a series of segmented objects A scale were obtained. In this study, the scales range from 10 to 180 with an interval of 10.
(2) A classification image D was obtained by visual interpretation or other classification methods. Then, H(D), which is the entropy of classification image D, was calculated using Equation (5): where k is the number of classes in D, P i is the probability of class i, which is the ratio of the area of class i to the total area of the image.
(3) The conditional entropy was calculated using Equation (6): where H(D|A scale ) is the conditional entropy of image D with segmented objects and n is the total number of objects. P(A=A i ) is the probability of object i, which is calculated as the ratio of the area of object i to the total area of image D. H(D|A=A i ) is the entropy of image D within the boundary of object i and can be calculated using Equation (5).
(4) The information gain for segmented objects A scale was defined as follows: (5) The information gain ratio can be calculated using Equation (8). H(A scale ) is the intrinsic entropy of the segmented objects A scale , and can be calculated using Equation (5).
With the reduction in SP, it is obvious that when the size of the segmented object is larger than the 'real' object (Figure 5a), the interior classes within the segmented object tend to be consistent, and the conditional entropy H(D|A scale ) will decrease while the information gain ratio g R (D,A scale ) will increase gradually together with the information gain g(D,A scale ). Once the segmentation object coincides perfectly with the actual object (Figure 5b), the information gain ratio reaches the highest point. When the segmentation object is smaller than the actual object (Figure 5c), the information gain of segmentation objects will not increase any more, but the intrinsic entropy of the segmentation objects H(A scale ) will still increase due to the increase in the number of objects, meaning the information gain ratio will decrease with the decrease in SP. Finally, the scale with the highest information gain ratio can be selected as optimal SP. coincides perfectly with the actual object (Figure 5b), the information gain ratio reaches the highest 281 point. When the segmentation object is smaller than the actual object (Figure 5c), the information gain 282 of segmentation objects will not increase any more, but the intrinsic entropy of the segmentation 283 objects H(Ascale) will still increase due to the increase in the number of objects, meaning the 284 information gain ratio will decrease with the decrease in SP. Finally, the scale with the highest 285 information gain ratio can be selected as optimal SP. 286 Area=0.1 Area=0.4 Area=0.1 Area=0.05 Area=0.2 Area=0.05 Area=0.05 Area=0.2 Under the condition of optimal segmentation, information gain is 1, the intrinsic entropy of segmentation objects is 1.72, and the information gain ratio increases to 0.58; (c) Under the condition of over-segmentation, information gain is still equal to 1, but the intrinsic entropy of segmentation objects increased to 2.72, and the information gain ratio decreases to 0.37.

Feature Calculation
Twelve shape features were calculated from the segmented objects, including area, length, width, compactness, density, asymmetry, roundness, elliptic fit, rectangular fit, main direction, border index and shape index.
The principal component analysis (PCA) transformation was implemented on all five images and only the first PCA of each image, which possesses the highest amount of information, was selected to calculate texture features. The 8 texture features for each image included GLCM (gray-level co-occurrence matrix) homogeneity, contrast, dissimilarity, entropy, ang.2nd moment, mean, StdDev, and correlation [64]. Thus, a total of 40 texture features were obtained for use in this study.
The spectral features included the mean and variance of each band in each image. Then a total of 76 spectral features were obtained from the 38 bands of the multisource satellite imagery.
Index features included the normalized difference vegetation index (NDVI) [65], enhanced vegetation index (EVI) [66], land surface water index (LSWI) [67], and the modified normalized difference water index (MNDWI) [68]. NDVI and EVI are the most widely used vegetation indices in remote sensing-based crop monitoring [65,66]. LSWI contains information on the moisture content of the underlying surfaces of vegetation canopies [67,69], and MNDWI presents an enhanced capability of extracting open water features as it can efficiently suppress or even remove noise from soil or built-up background [68]. Therefore, 18 index features were calculated from the five satellite images. LSWI and MNDWI were not calculated from the GF-1 WFV image due to its lack of a SWIR band.
In total, 146 features were obtained from spectra, index, shape and texture, and from which the optimal features were selected.

Optimal Feature Subset Selection
Recursive feature elimination (RFE) is an embedded backward feature elimination algorithm [54] and has been used in the selection of optimal features for use in GEOBIA. Three major steps are involved in the implementation of the RFE algorithm. Firstly, a classification model (such as RF, SVM or GBDT) is trained using training data. Secondly, all features are ranked based on their importance as generated by model training. Thirdly, the least important feature is eliminated and a new feature subset is generated. Through repetition of steps 1 to 3, the selected feature subsets are subjected to classification and accuracy assessment until the feature subset with highest classification accuracy is obtained and considered as the optimal feature subset.
Chen et al. [22] proposed an enhanced RFE (EnRFE) algorithm based on RFE. EnRFE utilizes the complete feature subset (F) and can be implemented using three major steps. Firstly, a classification model is trained using training data with features in F, and the classification accuracy of F is evaluated using testing data and all features are ranked from lowest to highest based on their importance as generated by model training. Secondly, features in F are eliminated by replacement in order of importance until a feature whose elimination will not decrease the accuracy is found. Thirdly, the second step is repeated until F attains ∅. Based on the above, the F with the highest accuracy is considered the optimal feature subset for use in subsequent tasks.
In this paper, an improved EnRFE algorithm, hereafter referred to as iEnRFE, is proposed to better select the optimal feature subset. The proposed algorithm has two major improvements relative to its precursor, EnRFE. Firstly, the iEnRFE algorithm limits the depth of feature search and the feature search changes from sequential to synchronous. This approach avoids the search for most or even the entire feature set under extreme conditions and allows for parallel computing which can greatly improve the efficiency of the algorithm. Secondly, the iEnRFE algorithm searches for d features synchronously (d is the search depth) and can ultimately find that feature whose absence would produce the highest accuracy. This eliminates the ambiguity attendant with EnRFE. The processing chain of the proposed iEnRFE algorithm is outlined in Appendix A.

Parameter Optimization of Machine-Learning Classification Algorithms for Crop Recognition
Three widely used machine-learning classifiers we reemployed for crop recognition in the current study and they include random forest (RF), gradient boosting decision tree (GBDT) and support vector machine (SVM). Parameter optimization is a crucial step for improving the performance of each classifier. In this paper, a grid search algorithm was employed to find optimal parameter values [70]. It involves an exhaustive search through a manually specified subset of the hyper parameter space of a classifier. The optimal parameter value is determined as one that produced the highest classification accuracy.
RF is a combination of many individual decision trees. Each decision tree is automatically trained based on the principle of random sampling with replacement, and the final classification result is determined by a voting process involving all decision trees [44]. There are two main parameters that need to be optimized in the RF model using grid search. They are ntree-the number of decision trees grown, and nfeature-the number of features randomly sampled as candidates at each tree node split [35]. In this study, the search range for ntree was set from 10 to 300 with an interval of 10, and that for nfeature from 5 to 30 with an interval of 5.
GBDT develops an ensemble of tree-based models by training each tree in a sequential manner [50]. Each iteration fits a decision to the residuals left by the previous and then the prediction is accomplished by combining all trees [71,72]. Learning_rate-the step size while learning, max_features-the number of features to consider when looking for the best split, and n_estimators-the number of boosting stages to perform, are the three main parameters that need to be tuned in GBDT [72]. The search range for learning_rate is was set from 0.05 to 0.3 with a search step of 0.05, that for max_features from 5 to 30 with a search step of 5, while the search range for n_estimators was set from 10 to 200 with a search step of 10.
SVM is a statistical learning model based on the principle of structural risk minimization, and looks for a hyperplane with the greatest margin to divide the samples into two classes with the largest interval [73]. When classes are not linearly separable, the kernel function of input data can be projected into a higher dimensional feature space in which data can be linearly separable [74]. SVM parameters that need optimization include kernel type, penalty parameter C, and gamma coefficient. The search values for kernel function include 'linear', 'rbf', 'poly' and 'sigmoid', those for C include 1, 2, 4, 8 and 10, and those for gamma coefficient include 0.1, 0.2, 0.5, 0.75 and 1. It should be noted that the gamma parameter is not valid for a 'linear' function.
In the implementation of RF, GBDT and SVM in this study, 1,418 training samples were first used to train the models. Then the trained models were used to classify the segmented objects based on the optimal feature subset. Finally, 607 test samples were used to evaluate the accuracy of crop recognition. Overall accuracy (OA), user accuracy (UA), producer accuracy (PA) and kappa coefficient were used as accuracy assessment measures [75].

Optimal SP Selection and Image Segmentation
Based on the supervised method proposed in Section 3.1.2, the optimal SP was calculated for the three sample regions as given in Figure 6. It is observed that the highest information gain ratios were obtained at a SP of 90 for green onion and oilseed rape, and 110 for winter wheat. Hence, based on the proposed supervised method, the optimal SP for green onion and oilseed is 90 whereas that for winter wheat is 110.

387
Moreover, the unsupervised method described in Section 3.1.1 was also used to determine the 388 optimal SP in the same regions as the supervised method ( Figure 7). As shown in Figure 7a, intra-389 segment homogeneity increases with scale, indicating that the spectral difference in segmented 390 objects decreases with the increase of scale. In Figure 7b, inter-segment heterogeneity decreases with 391 the increase of scale, indicating that the spectral difference between segmented objects increases with 392 scale. The scale of minimum GS in Figure 7c was selected as the optimal SP. The optimal SP of winter 393 wheat and oilseed rape are 100, which is similar to the optimal SP selected by the supervised 394 algorithm, and this shows the consistency between the two methods in optimal SP selection. 395 However, the optimal SP of green onion is difficult to determine owing to a minimum GS at a scale 396 of 150, which is obviously not the optimal SP. Moreover, GS values of green onion at scales 50 and 70 397 are also relatively low.

401
Based on visual interpretation, it was observed that a segmentation scale of 90 or 100 can be 402 optimal for multi-resolution segmentation (Figure 8). A lower or higher scale parameter will result 403 in over-segmentation or under-segmentation, respectively. 404 Figure 8 shows the segmentation results of the winter wheat sample region at three different 405 scales. It is observed that only a part of winter wheat had turned green at the end of February ( Figure  406 8a), and in late April (Figure 8b), the entire winter wheat field had recorded substantial growth. 407 Moreover, the unsupervised method described in Section 3.1.1 was also used to determine the optimal SP in the same regions as the supervised method ( Figure 7). As shown in Figure 7a, intra-segment homogeneity increases with scale, indicating that the spectral difference in segmented objects decreases with the increase of scale. In Figure 7b, inter-segment heterogeneity decreases with the increase of scale, indicating that the spectral difference between segmented objects increases with scale. The scale of minimum GS in Figure 7c was selected as the optimal SP. The optimal SP of winter wheat and oilseed rape are 100, which is similar to the optimal SP selected by the supervised algorithm, and this shows the consistency between the two methods in optimal SP selection. However, the optimal SP of green onion is difficult to determine owing to a minimum GS at a scale of 150, which is obviously not the optimal SP. Moreover, GS values of green onion at scales 50 and 70 are also relatively low.

387
Moreover, the unsupervised method described in Section 3.1.1 was also used to determine the 388 optimal SP in the same regions as the supervised method ( Figure 7). As shown in Figure 7a, intra-389 segment homogeneity increases with scale, indicating that the spectral difference in segmented 390 objects decreases with the increase of scale. In Figure 7b, inter-segment heterogeneity decreases with 391 the increase of scale, indicating that the spectral difference between segmented objects increases with 392 scale. The scale of minimum GS in Figure 7c was selected as the optimal SP. The optimal SP of winter 393 wheat and oilseed rape are 100, which is similar to the optimal SP selected by the supervised 394 algorithm, and this shows the consistency between the two methods in optimal SP selection. 395 However, the optimal SP of green onion is difficult to determine owing to a minimum GS at a scale 396 of 150, which is obviously not the optimal SP. Moreover, GS values of green onion at scales 50 and 70 397 are also relatively low.

401
Based on visual interpretation, it was observed that a segmentation scale of 90 or 100 can be 402 optimal for multi-resolution segmentation (Figure 8). A lower or higher scale parameter will result 403 in over-segmentation or under-segmentation, respectively. 404 Figure 8 shows the segmentation results of the winter wheat sample region at three different 405 scales. It is observed that only a part of winter wheat had turned green at the end of February ( Figure  406 8a), and in late April (Figure 8b), the entire winter wheat field had recorded substantial growth. 407 Owing to the more obvious inter-field variability in winter wheat at the middle of the growing 408

423
It is important to note that optimal SP selection methods all have their unique advantages and 424 disadvantages. The unsupervised optimal SP selected method has been considered as an efficient and 425 automatic approach as it does not require the generation of reference objects, a process that can be 426 time-consuming and subjective. However, due to the complexity and polysemy of objects, optimal 427 SP obtained by unsupervised methods may deviate from reality (as the optimal SP of green onion in 428 this study). Assuming crops of similar spectral patterns such as winter wheat and rapeseed coexist 429 in a field, when the target class is cropland, a larger segmentation scale is required to combine 430 rapeseed and winter wheat into one object, but when the target classes are winter wheat and 431 rapeseed, a smaller SP is desirable to separate these two crops. However, optimal SP determined by 432 unsupervised methods would not change with the target classes. 433

Feature Calculation and Optimal Feature Subset Selection 434
After extraction of image objects, a total number of 146 features, including spectral, index, shape 435 and texture features were calculated as described in Section 3.2. Comparisons have been made in this 436 paper among RFE, EnRFE and iEnRFE in the selection of the optimal feature subset for RF, GBDT 437 and SVM models, and the results are presented in Figure 9.  Figure 8 shows the segmentation results of the winter wheat sample region at three different scales. It is observed that only a part of winter wheat had turned green at the end of February (Figure 8a), and in late April (Figure 8b), the entire winter wheat field had recorded substantial growth. Owing to the more obvious inter-field variability in winter wheat at the middle of the growing season, two classes were obtained from this category. Using the maximum likelihood classifier, five classes were obtained (Figure 8c-e). Each class contains pixels of similar spectra. Figure 8c shows an obvious over-segmentation at a SP of 50. Although the internal homogeneity of each object is high, the information gain ratio is low given the number of segmented objects. Figure 8e shows an obvious under-segmentation at a SP of 130. When the SP is 90, the boundaries of the segmented objects are in good agreement with the classes (Figure 8d).
Finally, 90 was selected as the optimal SP for the entire Xinghua City, given the fact that a small segmentation scale normally brings a higher classification accuracy [21]. After obtaining the optimal SP, the two Sentinel-2A images were segmented into a total number of 529,192 objects based on the multi-resolution segmentation algorithm.
It is important to note that optimal SP selection methods all have their unique advantages and disadvantages. The unsupervised optimal SP selected method has been considered as an efficient and automatic approach as it does not require the generation of reference objects, a process that can be time-consuming and subjective. However, due to the complexity and polysemy of objects, optimal SP obtained by unsupervised methods may deviate from reality (as the optimal SP of green onion in this study). Assuming crops of similar spectral patterns such as winter wheat and rapeseed coexist in a field, when the target class is cropland, a larger segmentation scale is required to combine rapeseed and winter wheat into one object, but when the target classes are winter wheat and rapeseed, a smaller SP is desirable to separate these two crops. However, optimal SP determined by unsupervised methods would not change with the target classes.

Feature Calculation and Optimal Feature Subset Selection
After extraction of image objects, a total number of 146 features, including spectral, index, shape and texture features were calculated as described in Section 3.2. Comparisons have been made in this paper among RFE, EnRFE and iEnRFE in the selection of the optimal feature subset for RF, GBDT and SVM models, and the results are presented in Figure 9. It is also observed that the iEnRFE algorithm performs better in optimal feature subset selection 446 with RF when the number of features is greater than 20 (Figure 9a). With the reduction of features, 447 the classification accuracy of iEnRFE gradually increases, while the accuracies of RFE and EnRFE are 448 almost stable. Figure 9b shows that the three feature selection methods have similar performances 449 with the GBDT model, as no significant change in accuracy is observed when the number of features 450 is greater than 20. Figure 9c shows that for the SVM model, iEnRFE obtained the highest score when 451 the feature amount is set to 30. EnRFE also produced a higher score when the feature amount is about 452 30. However, when the feature amount is about 70, there is a drastic drop in the score of RFE. This 453 phenomenon suggests that the RFE method can erroneously remove features of low importance, but 454 which are useful to improve the overall classification accuracy when combined with other features. 455 The iEnRFE algorithm in this regard is more effective in selecting the optimal feature subset than its 456 precursor, EnRFE. 457 Table 1 shows the optimal feature subset selected for each classification model based on the 458 iEnRFE method. It can be observed in Table 1 that index features have the highest importance amount  459 features, accounting for 15, 13 and 14 features included in the optimal feature subsets for RF, GBDT 460 and SVM models, respectively. The next in order of importance are spectral features, accounting for 461 10, 12 and 10 features in the optimal subsets for RF, GBDT and SVM models, respectively. Texture 462 and shape features exhibited the least importance, with only of 5, 5 and 6 features being selected in 463 the optimal subsets for RF, GBDT and SVM, respectively. This is attributable to the effect of the lower 464 resolution images (Landsat-8) which may not have provided detailed texture and shape information 465 that would have enhanced classification. However, texture and shape features are still useful to 466 improve the classification accuracy. 467 It is observed that at the initial increase in the number of features, the classification accuracies of all classification methods first increase rapidly. With a further increase in the number of features, accuracies either level-off or decrease owing to the introduction of more redundant information.
It is also observed that the iEnRFE algorithm performs better in optimal feature subset selection with RF when the number of features is greater than 20 (Figure 9a). With the reduction of features, the classification accuracy of iEnRFE gradually increases, while the accuracies of RFE and EnRFE are almost stable. Figure 9b shows that the three feature selection methods have similar performances with the GBDT model, as no significant change in accuracy is observed when the number of features is greater than 20. Figure 9c shows that for the SVM model, iEnRFE obtained the highest score when the feature amount is set to 30. EnRFE also produced a higher score when the feature amount is about 30. However, when the feature amount is about 70, there is a drastic drop in the score of RFE. This phenomenon suggests that the RFE method can erroneously remove features of low importance, but which are useful to improve the overall classification accuracy when combined with other features. The iEnRFE algorithm in this regard is more effective in selecting the optimal feature subset than its precursor, EnRFE. Table 1 shows the optimal feature subset selected for each classification model based on the iEnRFE method. It can be observed in Table 1 that index features have the highest importance amount features, accounting for 15, 13 and 14 features included in the optimal feature subsets for RF, GBDT and SVM models, respectively. The next in order of importance are spectral features, accounting for 10, 12 and 10 features in the optimal subsets for RF, GBDT and SVM models, respectively. Texture and shape features exhibited the least importance, with only of 5, 5 and 6 features being selected in the optimal subsets for RF, GBDT and SVM, respectively. This is attributable to the effect of the lower resolution images (Landsat-8) which may not have provided detailed texture and shape information that would have enhanced classification. However, texture and shape features are still useful to improve the classification accuracy. Table 1. Optimal feature subset for each machine learning model. Order of Feature Importance   RF   EVI_0429, EVI_0518, NDVI_1207, LSWI_0429, NDVI_0429, EVI_0315, EVI_0127, WNDWI_0429,  B1_MEAN_0429, NDVI_0518, WNDWI_0518, EVI_0228, GLCM_MEAN_0228, Area,  B8_MEAN_0228, LSWI_0518, LSWI_0315, B5_MEAN_0518, B1_MEAN_1207, WNDWI_0228 Assuming a total number of n features in the original feature subset, the time complexities of RF, EnRFE and iEnRFE were calculated. The time complexity of RFE is designated as O(n), meaning its run time will increase linearly with the increase in the number of features. The time complexity of EnRFE would reach O((n 2 +n)/2) at extreme conditions when it cannot find a feature whose removal would not decrease the accuracy of the feature subset, indicating that its run time will have a quadratic increase with the increase in the number of features. The time complexity of the iEnRFE is given as O(d·n), where d is a constant representing the search depth. This means that iEnRFE is a linear time algorithm as with RFE, and its computational time will not increase drastically with the number of features. Table 2 shows the computational time of RFE, EnRFE and iEnRFE. It is observed that the computational time of EnRFE is far greater than that of RFE and iEnRFE, and this is consistent with the time complexity of the algorithm. The computational time of iEnRFE is in the same order of magnitude as the RFE algorithm. This is attributable to their similar time complexities and the parallel computing ability of the iEnRFE algorithm. The CPU of the computer used in this study has a frequency of 3.6 GHz and four cores and eight threads (Intel Core i7-7700), with a memory size of 8 GB. Overall, considering the relative efficiencies of these three algorithms, this paper proposed iEnRFE as the most appropriate algorithm, at least at present, for obtaining the optimal feature subset in the current study.

Crop Classification and Accuracy Analysis
Based on the optimal feature subset and the training data, the optimal parameter values of each machine learning model were determined through grid search. For the RF model, ntree was set to 90, and nfeature to 10. For the GBDT model, learning_rate was set to 0.1, n_estimators to 80, and max_features to 15. For the SVM model, the 'linear' function selected as kernel and the C parameter was set to 2, but the gamma parameter was not set as it is not required for a 'linear' kernel.
The RF, GBDT and SVM models were trained to classify the segmented objects. The classification results of RF, GBDT and SVM models are shown in Figure 10. From the classification results, winter wheat is the main winter crop in Xinghua and accounts for the largest share of arable land. Oilseed rape is more intensely cultivated in the northwestern part of Xinghua (Figure 10d). The area of green onion is relatively small and the crop is mostly concentrated in Duotian, in the central part of Xinghua (Figure 10e). was set to 2, but the gamma parameter was not set as it is not required for a 'linear' kernel. 494 The RF, GBDT and SVM models were trained to classify the segmented objects. The classification 495 results of RF, GBDT and SVM models are shown in Figure 10. From the classification results, winter 496 wheat is the main winter crop in Xinghua and accounts for the largest share of arable land. Oilseed 497 rape is more intensely cultivated in the northwestern part of Xinghua (Figure 10d). The area of green 498 onion is relatively small and the crop is mostly concentrated in Duotian, in the central part of Xinghua 499 (Figure 10e).

503
The accuracies of RF, GBDT and SVM were evaluated using testing data and the results are 504 shown in Figure 11 and Appendix 2. The GBDT model produced the highest overall accuracy (OA of 505 92.4%, with a kappa coefficient of 0.882). The RF model produced an OA of 91.8% and a kappa 506 coefficient of 0.871, and SVM recorded the lowest accuracy among the three machine learning models, 507 with an OA of 90.5% and a kappa coefficient of 0.853. 508 The accuracies of RF, GBDT and SVM were evaluated using testing data and the results are shown in Figure 11 and Appendix B. The GBDT model produced the highest overall accuracy (OA of 92.4%, with a kappa coefficient of 0.882). The RF model produced an OA of 91.8% and a kappa coefficient of 0.871, and SVM recorded the lowest accuracy among the three machine learning models, with an OA of 90.5% and a kappa coefficient of 0.853. Remote Sens. 2019, 10, x FOR PEER REVIEW 16 of 23 509 Figure 11. Classification accuracies of RF, GBDT and SVM models.

510
In Figure 11, it can be observed that all the three classification methods recorded very high 511 accuracies for winter wheat. The UA of winter wheat with RF reaches 95.1%, and the errors in this 512 class mainly attributed to the commission of small oilseed rape parcels and wastelands that coexist 513 with winter wheat. The PA of winter wheat with RF reaches 99.0%, meaning that almost all winter 514 wheat farmlands were correctly classified. The good classification results of winter wheat are not 515 unconnected to the larger parcels and more regular boundaries of this class, and these shape and 516 textural features aided the recognition of the crop along with spectral and index features. 517 The accuracy of green onion is lower than that of winter wheat. SVM showed its highest accuracy 518 with green onion, with a UA of 90.7% and a PA of 92.5%. This can be ascribed to the capability of 519 SVM in handling small training samples [22]. The RF and GBDT models produced high UA (95.0% 520 and 93.2%, respectively) and low PA (71.7% and 77.4%, respectively). This means that only a few 521 pixels from other classes were misclassified into green onion, but more pixels of green onion were 522 omitted. There are two main reasons for this phenomenon. Firstly, the planting and harvesting time 523 of green onion varies greatly. While some green onion parcels had been harvested in April 29,thereby 524 showing spectra similar to bare land, others were still growing vigorously. Secondly, green onion is 525 mostly planted in the central part of Xinghua (Figure 12), where a different agricultural system is 526 practiced [76]. This area consists of smaller parcels of crops surrounded by rivers and other water 527 bodies (Figure 12c). The small and highly fragmented crop fields explain the obvious pixel impurity 528 (Figure 12b) which results in the relatively low classification accuracy of green onion. In Figure 11, it can be observed that all the three classification methods recorded very high accuracies for winter wheat. The UA of winter wheat with RF reaches 95.1%, and the errors in this class mainly attributed to the commission of small oilseed rape parcels and wastelands that coexist with winter wheat. The PA of winter wheat with RF reaches 99.0%, meaning that almost all winter wheat farmlands were correctly classified. The good classification results of winter wheat are not unconnected to the larger parcels and more regular boundaries of this class, and these shape and textural features aided the recognition of the crop along with spectral and index features.
The accuracy of green onion is lower than that of winter wheat. SVM showed its highest accuracy with green onion, with a UA of 90.7% and a PA of 92.5%. This can be ascribed to the capability of SVM in handling small training samples [22]. The RF and GBDT models produced high UA (95.0% and 93.2%, respectively) and low PA (71.7% and 77.4%, respectively). This means that only a few pixels from other classes were misclassified into green onion, but more pixels of green onion were omitted. There are two main reasons for this phenomenon. Firstly, the planting and harvesting time of green onion varies greatly. While some green onion parcels had been harvested in April 29, thereby showing spectra similar to bare land, others were still growing vigorously. Secondly, green onion is mostly planted in the central part of Xinghua (Figure 12), where a different agricultural system is practiced [76]. This area consists of smaller parcels of crops surrounded by rivers and other water bodies (Figure 12c). The small and highly fragmented crop fields explain the obvious pixel impurity (Figure 12b) which results in the relatively low classification accuracy of green onion.

534
With oilseed rape, all three classification methods showed lower classification accuracies. GBDT 535 obtained the highest accuracy for oilseed rape, with a UA of 82.1% and a PA of 79.7%. There are two 536 main reasons for this. Firstly, apart from the northwestern part of Xinghua where oilseed rape is 537 highly concentrated, the crop is cultivated in relatively smaller parcels within and around winter 538 wheat fields. This makes it difficult to distinguish it from winter wheat. Secondly, in northwestern 539 Xinghua where oilseed is highly concentrated, oilseed rape is interspersed by rivers and other water 540 bodies, and the attendant mixed-pixel effect resulted in a lower accuracy of the oilseed rape class. 541 The computational times of the three classification methods were recorded, and included the 542 training time and prediction time. The results showed that the prediction times of RF, GBDT and 543 SVM are similar, with 4.513 s, 6.849 s and 5.883 s, respectively. However, the training time of SVM is 544 significantly more than that of RF and GBDT, with 910.415 s, 0.988 s and 3.379 s, respectively. The 545 total time consumed by RF and GBDT is in the same order of magnitude, and because the RF 546 algorithm has the advantage of parallel computing, its total time is less than that of GBDT. Due to the 547 complexity of the SVM algorithm, the total computational time of SVM is much more than that of RF 548 and GBDT. With respect to time consumed in feature selection with iEnRFE, SVM utilized more than 549 14 hours, while RF and GBDT only utilized about 11 minutes and 8 minutes, respectively. 550 Considering the performances of the three methods, SVM is extremely time-consuming and has 551 no advantage over the RF and GBDT methods in terms of classification accuracy. This suggests that 552 for crop classification involving a large number of training samples and image features, SVM is highly 553 inefficient. On the other hand, SVM has been known to be more efficient in crop mapping when a 554 smaller number of training data samples is involved [77,78]. The RF and GBDT methods having 555 recorded higher classification efficiencies and accuracies are, therefore, considered to be more 556 suitable for use in crop recognition initiatives involving GEOBIA and multisource remote-sensing 557 data.

558
It is worth noting that although the accuracy of SVM is lower than that of RF and GBDT as 559 observed in this study, it does not suggest that SVM cannot achieve higher accuracies in other 560 mapping scenarios. In fact, through more elaborate and complex works of sample selection, feature 561 selection and parameter optimization, the SVM method can also obtain higher accuracies [79,80]. 562 Besides, SVM has proved more efficient at handling small data samples, which makes it the optimal 563 choice where the acquisition of more field data is constrained by time and resources [77,78,81]. 564

Conclusion 565
This paper presents a complete object-based crop recognition scheme that includes the selection 566 of the optimal SP of multi-resolution segmentation, feature subset and classification method for use 567 in GEOBIA.

568
A comparison of two optimal SP selection algorithms, unsupervised and supervised, indicated 569 that the supervised method proposed in this paper could better adapt to different situations than the 570 With oilseed rape, all three classification methods showed lower classification accuracies. GBDT obtained the highest accuracy for oilseed rape, with a UA of 82.1% and a PA of 79.7%. There are two main reasons for this. Firstly, apart from the northwestern part of Xinghua where oilseed rape is highly concentrated, the crop is cultivated in relatively smaller parcels within and around winter wheat fields. This makes it difficult to distinguish it from winter wheat. Secondly, in northwestern Xinghua where oilseed is highly concentrated, oilseed rape is interspersed by rivers and other water bodies, and the attendant mixed-pixel effect resulted in a lower accuracy of the oilseed rape class.
The computational times of the three classification methods were recorded, and included the training time and prediction time. The results showed that the prediction times of RF, GBDT and SVM are similar, with 4.513 s, 6.849 s and 5.883 s, respectively. However, the training time of SVM is significantly more than that of RF and GBDT, with 910.415 s, 0.988 s and 3.379 s, respectively. The total time consumed by RF and GBDT is in the same order of magnitude, and because the RF algorithm has the advantage of parallel computing, its total time is less than that of GBDT. Due to the complexity of the SVM algorithm, the total computational time of SVM is much more than that of RF and GBDT. With respect to time consumed in feature selection with iEnRFE, SVM utilized more than 14 h, while RF and GBDT only utilized about 11 minutes and 8 minutes, respectively.
Considering the performances of the three methods, SVM is extremely time-consuming and has no advantage over the RF and GBDT methods in terms of classification accuracy. This suggests that for crop classification involving a large number of training samples and image features, SVM is highly inefficient. On the other hand, SVM has been known to be more efficient in crop mapping when a smaller number of training data samples is involved [77,78]. The RF and GBDT methods having recorded higher classification efficiencies and accuracies are, therefore, considered to be more suitable for use in crop recognition initiatives involving GEOBIA and multisource remote-sensing data.
It is worth noting that although the accuracy of SVM is lower than that of RF and GBDT as observed in this study, it does not suggest that SVM cannot achieve higher accuracies in other mapping scenarios. In fact, through more elaborate and complex works of sample selection, feature selection and parameter optimization, the SVM method can also obtain higher accuracies [79,80]. Besides, SVM has proved more efficient at handling small data samples, which makes it the optimal choice where the acquisition of more field data is constrained by time and resources [77,78,81].

Conclusions
This paper presents a complete object-based crop recognition scheme that includes the selection of the optimal SP of multi-resolution segmentation, feature subset and classification method for use in GEOBIA.
A comparison of two optimal SP selection algorithms, unsupervised and supervised, indicated that the supervised method proposed in this paper could better adapt to different situations than the unsupervised method. However, the unsupervised method proved more efficient and automatic as it does not require the acquisition of reference image objects.
Three feature selection methods including RFE, EnRFE and iEnRFE were also evaluated to arrive at the optimal subset. Results indicated that the iEnRFE algorithm proposed in this paper outperformed the preexisting RFE and EnRFE methods in selecting the optimal feature subset. Moreover, three machine-learning methods, including RF, GBDT and SVM, were used for classification, and their parameters were optimized through a grid search method. The overall accuracies of RF, GBDT and SVM were 91.8%, 92.4%, and 90.5%, respectively, indicating the feasibility of machine learning methods in obtaining accurate crop classifications based on GEOBIA and multisource satellite imagery. GBDT obtained the highest overall accuracy while utilizing the least computational time. This suggests that despite its use in crop classification not being as popular as its RF and SVM counterparts, GBDT has demonstrated a better accuracy and efficiency in crop recognition with GEOBIA and multisource satellite imagery. Additionally, despite SVM being more widely used, its huge computational time renders it less efficient especially in crop mapping initiatives that require the use of multiple sources of information and over large area. In this regard, GBDT and RF, which recorded higher accuracies with less computational time, are more suitable.
The complete object-based crop classification scheme, that includes the systematic evaluation of the optimal segmentation scale parameter, feature selection and classification algorithm, presented in this study is envisaged to provide a valuable reference for crop recognition based on GEOBIA and multisource satellite imagery. The methods proposed in this paper could also be applicable to other fields such as land-use/land-cover monitoring and change detection. However, more works are needed to evaluate the feasibility of the proposed supervised optimal SP selection and iEnRFE algorithms over large areas and in different operational scenarios. Meanwhile, the application of evolving artificial intelligence methods in crop recognition with GEOBIA is a promising scientific research direction. For i = 1 to m: Construct a new feature set R i by removing the feature f i from R. Train a model on the samples with the remaining features in R i and calculate the score of S i . Re-rank features in R i based on importance from the lowest to the highest R i = { f 1 , f 2 , f 3 , . . . , f k−1 }. EndFor Step 4: Set S j = max(S i ), and R = R j , k = k−1, append S j to the end of S set .
Step 5: Repeat steps 3~4 until R = ∅ Step 6: Select the feature subset R with the highest score S in S set as the optimal feature subset END.