Superpixel-Based Regional-Scale Grassland Community Classification Using Genetic Programming with Sentinel-1 SAR and Sentinel-2 Multispectral Images

Abstract: Grasslands are one of the most important terrestrial ecosystems on the planet and have significant economic and ecological value. Accurate and rapid discrimination of grassland communities is critical to the conservation and utilization of grassland resources. Previous studies that explored grassland communities were mainly based on field surveys or airborne hyperspectral and high-resolution imagery. Limited by workload and cost, these methods are typically suitable only for small areas. Spaceborne mid-resolution RS images (e.g., Sentinel, Landsat) have been widely used for large-scale vegetation observations owing to their large swath width. However, challenges remain in accurately distinguishing between different grassland communities using these images because of the strong spectral similarity of different communities and the suboptimal performance of the models used for classification. To address this issue, this paper proposes a superpixel-based grassland community classification method using a Genetic Programming (GP)-optimized classification model with Sentinel-2 multispectral bands, their derived vegetation indices (VIs) and textural features, and Sentinel-1 Synthetic Aperture Radar (SAR) bands and their derived textural features. The proposed method was evaluated in the Siziwang grassland of China. Our results showed that the addition of VIs and textures, as well as the use of GP-optimized classification models, can significantly contribute to distinguishing grassland communities; the proposed approach classified the seven communities in Siziwang grassland with an overall accuracy of 84.21% and a kappa coefficient of 0.81. We conclude that the classification method proposed in this paper is capable of distinguishing grassland communities with high accuracy at a regional scale.


Introduction
As the largest terrestrial ecosystem on earth, grasslands play a crucial role in regulating climate, conserving water, protecting biodiversity, and promoting livestock development [1,2]. Grassland communities are considered the fundamental unit of grassland ecosystems [3]. Accurate classification of grassland communities is important for humans to understand and study grassland areas, and provides an important basis for rational use, effective conservation, and sustainable development [4]. Field surveys are a reliable way to classify grasslands. Researchers can obtain accurate information on the distribution of different grasslands through sampling and obtaining field records. However, this method is costly and time-consuming when applied either repetitively or in large landscapes [5]. The objectives of this study were to: (1) propose a superpixel-based grassland community classification method; (2) verify whether the classification accuracies of the GP-optimized model are better than those of common classifiers; (3) map the distribution of Siziwang grassland communities; and (4) evaluate the universality of the proposed method. We expected that the proposed method could classify grassland communities with high accuracy and high universality from spaceborne mid-resolution RS data at a regional scale.

Study Area
Siziwang Banner (Figure 1) is part of Ulanqab city in Inner Mongolia, China. It is located in the central part of Inner Mongolia, at a latitude/longitude of 41.17° to 43.37°N, 110.33° to 113°E, and covers an area of 24,036 km² [34]. The region is situated at the northern foot of Daqing Mountain, with an altitude of 1000 m to 2100 m, and the entire terrain slopes from southeast to northwest. Siziwang Banner is characterized by a mid-temperate continental monsoon climate, with an average annual rainfall of 310.2 mm; 70% of the yearly precipitation falls mainly in July, August, and September. The average annual temperature of the region is 3.8°C, with the lowest temperature recorded in January (−17.2°C) and the highest in July (20.7°C) [35]. Occupying approximately 80% of the total area of Siziwang, Siziwang grassland is an important part of the grasslands in northern China and the Eurasian grassland [34]. The main communities in the region are listed in Table 1.

Image Preprocessing
Sentinel-1 and Sentinel-2 are Earth observation missions developed by the European Space Agency (ESA) under the Copernicus programme. Their sensors are commonly used in vegetation studies owing to their high data quality and availability [4,38]. The Sentinel images used in this study (Table 2) were retrieved from the Sentinels Scientific Data Hub (https://scihub.copernicus.eu/, accessed on 1 August 2020), resampled to 10 m using bilinear interpolation, geographically registered, and subset to the study area. Sentinel-1 carries a C-band SAR with a 6-day repeat cycle [39]. The Sentinel-1 data used in this study were Level-1 Ground Range Detected (GRD) images in Interferometric Wide Swath (IW) mode at VV (vertical transmit and vertical receive) and VH (vertical transmit and horizontal receive) polarizations, with a 250 km swath width and 5 m × 20 m spatial resolution [40]. The preprocessing of the Sentinel-1 GRD data included orbit file application, border and thermal noise removal, radiometric calibration, speckle filtering, Range Doppler terrain correction, and geocoding [41]. Preprocessing was conducted using the open-source SNAP software (version 8.0.0) (https://step.esa.int/main/toolboxes/snap/, accessed on 7 May 2020) to obtain higher-quality GRD data for the subsequent experiments.

Sentinel-2 Data
Sentinel-2 comprises two satellites with a revisit period of 5 days and a swath width of 290 km. The Sentinel-2 Multispectral Instrument (MSI) covers 13 spectral bands from the visible to the short-wave infrared (SWIR) (Table 3), including four bands in the red, green, blue, and near-infrared (NIR) regions with a spatial resolution of up to 10 m, currently the highest among freely available multispectral data [42]. Moreover, Sentinel-2 Level-2A data have already been orthorectified and atmospherically and geometrically corrected [43].
Table 3. Sentinel-2 band information [44].

Ground Truth Data Acquisition
The field survey was conducted in August 2019. It should be noted that the proposed approach uses superpixels as the basic unit of classification, implying that the samples used for classification were obtained by assigning the observed community categories to the superpixels where the sampling points were located. Therefore, field sampling was conducted in patches with a homogeneous grassland community, and the locations of the field points were recorded by GPS. In addition to field sampling, more samples were selected for classification based on previous studies [36,45]. A total of 378 field sites with RES (45 sites), STC (41 sites), STT (67 sites), ARF (30 sites), STB (44 sites), STS (72 sites), and ACS (79 sites) were included in the current study. Of these, 70% were used for training the classification model and 30% for testing the classification accuracy. Considering that there were no significant changes in the extent of the grassland in Siziwang Banner between 2019 and 2020 [34,37], we masked out non-grassland areas using the 10 m resolution Siziwang Banner land cover map for 2020 [37] to eliminate their interference with the classification results.

Methods
The flow chart of the proposed grassland community classification method is summarized in Figure 2. The method consists of four parts: the image preprocessing part introduced in Section 2.2, and three further parts discussed in detail in this section.

Watershed-Based Superpixel Segmentation
The watershed algorithm is a segmentation method based on the analysis of geomorphology and is widely used in RS image processing [46,47]. The algorithm achieves image segmentation by connecting spatially adjacent pixels with similar features (usually gray values) to form closed contours. Specifically, the gray values of all pixels in an image are first extracted and a distance threshold is set. Then, taking the pixel with the smallest gray value as the initiation point, the horizontal plane (i.e., the image gray level) rises from the minimum gray value. When the horizontal plane reaches neighboring pixels, the horizontal distance from these pixels to the initiation point is calculated; if it is less than the threshold distance, these pixels are flooded (i.e., included as pixels inside the segmented object), otherwise dams (i.e., watersheds) are set on these pixels to separate them. As the horizontal plane rises, more dams are set, and when it reaches the maximum gray value, the dams complete the segmentation of the whole image [48]. The process of watershed segmentation is presented in Figure 3. It is worth noting that the setting of the distance threshold has a significant effect on the results of segmentation and classification [49]. A large threshold leads to the inclusion of heterogeneous pixels within the segmented object, whereas a small threshold results in inadequate image segmentation. The stepwise evolution analysis (SEA) method proposed by Hu et al. (2017) [50] was used in the current study to obtain the optimal segmentation scale. The first step of SEA is to construct a scale set model [51] that records the image segmentation results at each scale on the already segmented images.
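The flooding procedure described above can be sketched as a minimal marker-based immersion in pure Python. This is an illustrative toy (single-channel image, 4-connectivity, pixels assigned to whichever basin's flood front reaches them first, explicit dam pixels omitted), not the implementation used in the study:

```python
import heapq

import numpy as np

def watershed(gray, markers):
    """Toy marker-based watershed by immersion.

    gray    : 2-D array of gray values (the 'elevation' surface)
    markers : 2-D int array, >0 on seed pixels, 0 elsewhere
    Returns a label image in which every pixel belongs to the basin
    whose flood front reached it first at the lowest gray level.
    """
    h, w = gray.shape
    labels = markers.copy()
    visited = markers > 0
    heap = []
    # Seed the flood with the marker pixels, ordered by gray value.
    for y, x in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (gray[y, x], int(y), int(x)))
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx]:
                visited[ny, nx] = True
                labels[ny, nx] = labels[y, x]   # flooded from this basin
                heapq.heappush(heap, (gray[ny, nx], ny, nx))
    return labels
```

A gray-value ridge between two seeded valleys is split where the two flood fronts meet, which is where a production implementation would place the dam pixels.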
Concretely, the neighboring segments are merged pairwise in descending order of dissimilarity, and binary segmentation trees [52] are used to record the new segments created during the merging process and the hierarchical relationships between all child and parent segments. When the merging is completed, the scale set model of the image is built. An example of a scale set model is shown in Figure 4. The larger the segmentation scale, the more under-segmentation exists; conversely, the smaller the scale, the more over-segmentation exists. Subsequently, SEA solves for the optimal segmentation scale by evaluating the risk of over- and under-segmentation at each scale using the minimum-risk Bayesian decision algorithm [53]. The performance of this method has been confirmed in several studies [54][55][56]. It is worth noting that, since speckle noise in SAR data would interfere with segmentation performance [57,58], we performed segmentation only on the multispectral data; the resulting segment vector layer was then applied to the SAR data to segment it.

Feature Extraction and Selection
Four categories of features derived from Sentinel-1 and -2 were utilized in this study: spectral information, VIs, textural features, and backscatter information (Table 4). The VIs included NDVI, the Enhanced Vegetation Index (EVI), the Simple Ratio Index (SR), and the Red Edge Normalized Difference Vegetation Index (NDVI705). The textural features were derived from the gray-level co-occurrence matrix (GLCM): seven GLCM indicators of the two backscatter coefficients and eight spectral bands were calculated with a window size of 9 × 9, which several studies have reported as suitable for extracting textures from Sentinel images [38,59]. The abovementioned features were calculated in SNAP at the pixel level. For each superpixel, the mean and standard deviation of these features were computed. In Table 4, ρnir, ρred, ρblue, ρ705, and ρ750 represent the NIR, red, blue, red edge 1, and red edge 2 bands of Sentinel-2, respectively.
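For illustration, the four VIs (standard formulations from the cited literature) and the per-superpixel mean/standard deviation aggregation could be computed as follows; the band arrays and reflectance values here are synthetic, and the EVI coefficients (2.5, 6, 7.5, 1) are the common MODIS-style constants:

```python
import numpy as np

def vegetation_indices(nir, red, blue, re1, re2):
    """Pixel-level VIs (standard formulations) from reflectance arrays.

    re1 / re2 are the red edge 1 (~705 nm) and red edge 2 (~750 nm) bands.
    """
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
    sr = nir / red
    ndvi705 = (re2 - re1) / (re2 + re1)
    return ndvi, evi, sr, ndvi705

def superpixel_stats(feature, labels):
    """Mean and standard deviation of a pixel-level feature per superpixel."""
    return {lab: (feature[labels == lab].mean(), feature[labels == lab].std())
            for lab in np.unique(labels)}
```

The same aggregation applies unchanged to the GLCM texture layers and backscatter coefficients once they are computed at the pixel level.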
Notably, not all the extracted features contribute to improving the classification accuracy [64]. Therefore, several feature selection algorithms have been proposed to eliminate the effects of noisy data on classification results. Recursive Feature Elimination with Cross-Validation (RFECV) is widely used in image analysis for the automatic selection of the optimal feature subset without human intervention [18]. RFE first ranks the features in order of importance, and cross-validation then selects the optimal feature subset [65]. The two processes are as follows:
1. The N features are fed into a classifier, and the importance of each feature is calculated;
2. The feature with the lowest importance is removed from the current feature set, and the remaining features are input into the classifier again to recalculate the importance of each feature;
3. Step 2 is repeated until the feature set is empty;
4. All features are sorted in decreasing order of importance, a threshold is selected, and the features with importance greater than the threshold are retained.
In previous studies, the threshold was usually determined by repeated experiments [18]. To capture the optimal feature subset automatically, the RFECV algorithm was employed in this study. RFECV uses the classifier in RFE to calculate the cross-validation error of the feature subsets produced during elimination, and the number of features in the subset with the lowest average validation error is taken as the optimal number of features. The optimal features are then selected based on the ranking obtained by RFE [66]. Since RF excels in feature selection and ranking [67], it was chosen as the classifier of RFE (hereafter RF_RFE) in this study.
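The RF_RFE procedure can be sketched with scikit-learn (a package the study also used); the synthetic dataset, estimator size, and fold count below are illustrative only, not the study's settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the superpixel feature matrix.
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,                       # drop one feature per elimination round
    cv=StratifiedKFold(3),        # cross-validated error per subset size
    scoring="accuracy",
)
selector.fit(X, y)
print(selector.n_features_)       # size of the best-scoring subset
print(selector.support_)          # boolean mask of retained features
```

`selector.ranking_` exposes the RFE importance ordering, and `selector.transform(X)` reduces the feature matrix to the retained subset.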

Classification Selection and Hyperparameter Optimization Based on GP Algorithm
The GP algorithm is a search and optimization technique that simulates the process of Darwinian biological evolution [68]. It expresses feasible solutions to a problem as individuals. Guided by the fitness function, the initial population evolves toward the optimal individual tree, i.e., the optimal solution, through genetic operations such as replication, crossover, and mutation.

Individual Tree
GP is an evolutionary algorithm that inherits the Genetic Algorithm's (GA) idea of breeding offspring from parents by selection. However, unlike the traditional fixed-length gene encoding of GA, individuals in GP are represented in a hierarchical structure instead of a string, most often a tree [69].
The individual tree comprises the terminal set (TS) and the function set (FS), which hold the input variables and the functions that operate on them, respectively. Figure 5 shows an individual tree expressing (X − Y) + 3, where the functions (+, −) on the internal nodes and the variables and constants (X, Y, 3) on the leaf nodes are generated from FS and TS, respectively. In this study, GP was performed using the scikit-learn, xgboost, TPOT, and DEAP packages in Python; TS comprises the superpixels awaiting classification and the samples.
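An individual tree such as the one in Figure 5 can be represented, for exposition, as nested tuples whose internal nodes come from FS and whose leaves come from TS; this toy evaluator is an assumption for illustration, not the study's code (DEAP, for example, uses its own `PrimitiveTree` type):

```python
import operator

# Function set: internal-node operations, keyed by symbol.
FS = {"+": operator.add, "-": operator.sub}

def evaluate(node, env):
    """Recursively evaluate an individual tree.

    node : ("op", left, right) for internal nodes, or a terminal
           (a variable name bound in env, or a numeric constant).
    """
    if isinstance(node, tuple):
        op, left, right = node
        return FS[op](evaluate(left, env), evaluate(right, env))
    return env.get(node, node)   # look up variables; constants pass through

# The tree from Figure 5: (X - Y) + 3
tree = ("+", ("-", "X", "Y"), 3)
```

With X = 10 and Y = 4, the tree evaluates to (10 − 4) + 3 = 9.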

Genetic Operator
GP contains three genetic operators: replication, crossover, and mutation. New trees generated by the crossover and mutation operators may differ in structure from their parents.

• The replication operator selects a few individuals from the current population according to certain rules and retains them directly in the next generation.
• The crossover operator randomly selects two individuals from the current population as parents. A node is then randomly selected as the crossover point in each parent, and the part below this node is the segment to be exchanged (the crossover segment). Offspring are generated by swapping the crossover segments of the parents. The crossover process of the individual trees (X − Y) + 3 and (9 + 4) + (X ÷ Y) is presented in Figure 6.
• The mutation operator randomly selects a node in a parent individual as the mutation point and replaces the subtree below it with a randomly generated individual tree. Figure 7 illustrates the mutation process of the individual tree (X − Y) + 3.
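The crossover and mutation operators can be sketched on a nested-tuple tree representation. Purely for illustration, the node to operate on is selected here by an explicit index path rather than at random, so that the swap is easy to follow; this is a toy, not the study's implementation:

```python
def get_subtree(tree, path):
    """Follow a path of child indices down to a subtree."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of the tree with the subtree at `path` replaced."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)

def crossover(p1, p2, path1, path2):
    """Swap the crossover segments (subtrees) of two parents."""
    s1, s2 = get_subtree(p1, path1), get_subtree(p2, path2)
    return replace_subtree(p1, path1, s2), replace_subtree(p2, path2, s1)

def mutate(parent, path, random_tree):
    """Replace the subtree below the mutation point with a new tree."""
    return replace_subtree(parent, path, random_tree)
```

Swapping the (X − Y) subtree of (X − Y) + 3 with the (X ÷ Y) subtree of (9 + 4) + (X ÷ Y) reproduces the exchange of Figure 6.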

Fitness Function
After individuals are generated, it is necessary to evaluate their level of fitness to the environment. Those with high fitness are retained directly in the next generation or used for crossover or mutation operations to generate new individuals, which improves the fitness of the next generation. In the current work, classification accuracy was adopted as the fitness function.

Flow of the GP Algorithm
The GP algorithm first constructs several individual trees to form the initial population and then iterates over these trees. After each iteration, the algorithm calculates the fitness of the individual trees and determines whether the termination condition is satisfied. The iteration ends if the condition is satisfied; otherwise, replication, crossover, and mutation operations are performed to generate new individuals that form the next generation of the population for a new iteration. The flowchart of the GP algorithm is shown in Figure 8. In this study, the setting of the GP parameters followed a standard GP process [70]. GP first generated 100 individual trees and evaluated their classification accuracy. Then, the top 20% of individuals were selected using the tournament selection method [71], following the criteria of high classification accuracy and few internal nodes. Specifically, three individuals were first randomly selected from the population, the individual with the lowest classification accuracy was eliminated, and the one with fewer internal nodes was then selected from the remaining two and replicated into the next generation. This procedure was repeated until the number of selected individuals reached 20% of the total population. To create the rest of the new population, each selected individual was replicated five times and subjected to genetic operations with a 5% crossover rate and a 90% mutation rate to generate new offspring for the next iteration. After each iteration, the individual with the highest classification accuracy was stored, and it was replaced if a higher accuracy was found in later iterations. The iteration terminated after 100 generations, and the individual with the highest classification accuracy recorded during the iterations was selected as the optimal classification model [72].
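The tournament step described above (sample three individuals, discard the least accurate, then keep the smaller of the remaining two) might be sketched as follows; the population, accuracies, and tree encoding are toy values for illustration only:

```python
import random

def count_nodes(tree):
    """Size of a nested-tuple individual: 1 per operator or terminal."""
    if isinstance(tree, tuple):
        return 1 + sum(count_nodes(child) for child in tree[1:])
    return 1

def tournament_select(population, accuracy, rng):
    """One tournament of size 3, as described in the text.

    Sample three individuals, eliminate the one with the lowest
    classification accuracy, then return the one with fewer nodes
    from the remaining two.
    """
    trio = rng.sample(range(len(population)), 3)
    trio.sort(key=lambda i: accuracy[i])
    _, i, j = trio                      # drop the lowest-accuracy index
    winner = min(i, j, key=lambda k: count_nodes(population[k]))
    return population[winner]
```

Repeating this tournament until 20% of the population is selected, then replicating and mutating the winners, yields the next generation as in the study's loop.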

Segmentation and Classification Evaluation
The Overall Goodness F-measure (OGF) method was used to measure the segmentation performance in this study. The value of OGF ranges from 0 to 1, and the segmentation scale corresponding to the maximum OGF value was used as the optimal scale [73]. OGF was calculated as follows:

$$OGF = \frac{(1 + \alpha^2)\, LV_{norm} \cdot MI_{norm}}{\alpha^2 \cdot MI_{norm} + LV_{norm}} \tag{1}$$

where $MI_{norm}$ and $LV_{norm}$ represent normalized [0,1] measures of Moran's I (MI) [74] and local variance (LV) [75], and $\alpha$ is a parameter used to adjust the relative weights of $MI_{norm}$ and $LV_{norm}$ in Equation (1): $\alpha = 1$, $\alpha > 1$, and $\alpha < 1$ indicate equal weights for $MI_{norm}$ and $LV_{norm}$, a higher weight for $LV_{norm}$, and a higher weight for $MI_{norm}$, respectively. Usually, the optimal segmentation scale is obtained when these two indicators are balanced; therefore, $\alpha$ was set to 1 in this study [55]. LV was calculated as follows:

$$LV = \frac{\sum_{i=1}^{N} a_i \sigma_i}{n_w \times n_h} \tag{2}$$

where $n_w$ and $n_h$ represent the width and height of the image, $N$ is the number of segments, and $a_i$ and $\sigma_i$ represent the area and spectral standard deviation of the $i$th segment. A lower LV value indicates better intra-segment homogeneity [76]. MI was calculated as shown below:

$$MI = \frac{N \sum_{i=1}^{N}\sum_{j=1}^{N} w_{i,j}\,(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{N}\sum_{j=1}^{N} w_{i,j}\right)\sum_{i=1}^{N}(x_i - \bar{x})^2} \tag{3}$$

where $N$ represents the number of segments, $x_i$ is the mean spectral reflectance of the $i$th segment, $\bar{x}$ is the mean spectral reflectance of the whole image, and $w_{i,j}$ measures the spatial adjacency of segments $i$ and $j$: if segments $i$ and $j$ are adjacent, $w_{i,j} = 1$, otherwise 0. A lower MI value means higher inter-segment heterogeneity [77]. The current study determined the classification accuracy using user's accuracy (UA), producer's accuracy (PA), overall accuracy (OA), and the Kappa coefficient (Kappa). PA and UA were derived from the confusion matrix and used to assess the classification of each class, whereas OA and Kappa were used to evaluate the overall classification results.
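For illustration, the three segmentation-quality measures above can be computed from a label image and a segment adjacency matrix as follows; note that this sketch omits the normalization of MI and LV to [0, 1] across scales, which requires the values from all candidate scales:

```python
import numpy as np

def local_variance(gray, labels):
    """Area-weighted local variance: sum(a_i * sigma_i) / (n_w * n_h)."""
    total = 0.0
    for lab in np.unique(labels):
        vals = gray[labels == lab]
        total += vals.size * vals.std()     # a_i * sigma_i
    return total / gray.size                # divide by n_w * n_h

def morans_i(segment_means, W):
    """Global Moran's I from segment mean reflectances and 0/1 adjacency W."""
    x = np.asarray(segment_means, dtype=float)
    d = x - x.mean()
    num = x.size * (W * np.outer(d, d)).sum()
    den = W.sum() * (d ** 2).sum()
    return num / den

def ogf(mi_norm, lv_norm, alpha=1.0):
    """F-measure-style combination of the two normalized criteria."""
    return (1 + alpha ** 2) * lv_norm * mi_norm / (alpha ** 2 * mi_norm + lv_norm)
```

With alpha = 1 (as in the study), ogf reduces to the harmonic mean of the two normalized measures.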

Segmentation Performance Evaluation
In this study, we performed superpixel segmentation on the Sentinel-2 multispectral images based on SEA. To test the segmentation effect of the images under the SEA framework, we calculated the OGF values corresponding to segmentation scales from 0 to 1000. As shown in Figure 9a,b, the OGF curve initially increased and then gradually decreased, reaching its maximum when the scale was close to 180. The optimal scale obtained by the SEA method was 177, and its corresponding OGF value was very close to the peak of the OGF curve. To visually examine the performance of the SEA-based segmentation, we compared the segmentation results of three sub-regions within the study area (Figure 9c) at one scale smaller and one larger than 177 that share an OGF value of 0.67 (i.e., 155 and 221) with those at scale 177. As shown in Figure 9d, at scale 155, there are several fragmented spots in the image due to over-segmentation (yellow markers); at scale 221, there are obvious unsegmented objects (red markers); and at scale 177, the image visually maintains a relative balance between intra-segment homogeneity and inter-segment heterogeneity, with little over- or under-segmentation. These results indicate the effectiveness of the SEA method for achieving accurate segmentation of the images in this study.

Feature Selection Result
By using RF_RFE, 67 of the 168 features described in Section 3.2 (hereafter MSVT) were retained (Table 5). More of the features derived from the mean (hereafter MF) were retained than of those derived from the standard deviation (hereafter SDF): 37 and 30 features, respectively. For the spectral information, all spectral bands of MF were retained, whereas only the red and NIR bands were retained in SDF, indicating that in this study the spectral information of the superpixels extracted by the mean value was more effective than that extracted by the standard deviation. For the VIs, NDVI, SR, and NDVI705 were retained in both MF and SDF, whereas EVI was excluded. For the textural features, more MF were retained from the SAR images than from the multispectral images, while the opposite was true for SDF. For the backscattering information, both the MF and SDF of the VV and VH polarization backscattering coefficients were preserved. To explore the effects of textural features and VIs on the classification accuracy of grassland communities in the subsequent experiments, we also performed feature selection on the dataset containing only the Sentinel-2 multispectral and Sentinel-1 SAR bands; the results (hereafter MS) are shown in Table 6.

Classification Result Assessment
The optimal classification model for the MSVT dataset generated by GP is shown in Figure 10. The model comprises a primary classifier, Linear Support Vector Machine (LinearSVC), and a secondary classifier, Extremely Randomized Trees (ET), connected using the stacking strategy [78]. During classification, LinearSVC was first trained with the initial samples, and its training results were then used, together with the initial samples, to train ET; the final classification results were output by ET. To evaluate the effectiveness of this fusion model, we classified the grassland communities using LinearSVC and ET separately on the same dataset (with the hyperparameters of the classifiers unchanged) and compared the results with those of the fusion model (Tables 7 and 8). The results of Experiments 1, 2, and 3 showed that the classification accuracies of LinearSVC and ET were similar (76.32% and 74.68%, respectively), but both were lower than the accuracy of the fusion model (84.21%).
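A stacking model of this shape (LinearSVC feeding an Extremely Randomized Trees final classifier, with the original features passed through alongside the primary classifier's outputs) can be sketched in scikit-learn; the synthetic dataset and hyperparameters below are illustrative only, not those found by GP in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the superpixel feature matrix and labels.
X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# LinearSVC outputs are fed, together with the original features
# (passthrough=True), to the final ExtraTrees classifier.
model = StackingClassifier(
    estimators=[("svc", LinearSVC(dual=False))],
    final_estimator=ExtraTreesClassifier(n_estimators=100, random_state=0),
    passthrough=True,
)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```

Because LinearSVC has no `predict_proba`, `StackingClassifier` automatically forwards its `decision_function` scores to the final estimator.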
Three comparison experiments were conducted to validate whether the use of textural features, VIs, and GP-optimized classification models improved the classification accuracy of grassland communities (Table 9). In Experiment 6, the optimal classification model obtained by GP was the Gradient Boosting Decision Tree (GBDT). The findings showed that the fusion model had the highest classification accuracy among the experiments using the MSVT dataset, and there was no significant difference between the results of the three single-classifier experiments. The classification accuracy of SVM, which is commonly used in grassland community classification, was between that of ET and LinearSVC. The experiments using the MS dataset showed that the classification accuracy of the model obtained by GP (i.e., GBDT) was higher than that of SVM by about 13%. In addition, the accuracy of the classification results using the MSVT dataset was significantly higher than that using MS under the same conditions: the classification accuracy of Experiment 1 was 24.56% higher than that of Experiment 6, and the classification accuracy of Experiment 4 was 28.95% higher than that of Experiment 5.
Table 9. Optimization results obtained in Experiments 4, 5 and 6.
The results from Experiment 1 were used for mapping the Siziwang grassland community, as shown in Figure 11.

The Effect of Input Variables on Classification Accuracy
In this study, we used Sentinel-2 multispectral and Sentinel-1 SAR imagery with large swath widths to accommodate regional-scale studies. However, because the spectral and spatial resolutions of satellite-based RS data are generally low, the phenomena of "the same object with different spectra" and "different objects with the same spectrum" often occur when observing grassland communities on a large scale [22,38], reducing the accuracy of classification.
To enhance the classification accuracy, textures and VIs were incorporated with the regular multispectral and SAR bands. According to the results, adding textures and VIs significantly improved the classification accuracy compared with using only the regular multispectral and SAR bands, whether the classification model was optimized by random search or by GP. Taking the SVM-based classification results as an example, the SVM classifier using the MS dataset, even after optimization, showed limited ability to discriminate among the seven grassland communities in the study area (OA 46%) and no ability at all to discriminate the three communities of STC, STB, and ARF. On the contrary, the SVM using the MSVT dataset discriminated these three communities well, and the OA increased by about 29%.
The above results emphasize the role of textures and VIs extracted from spaceborne RS data in the classification of grassland communities. Yet, current studies suggest that multispectral and SAR imagery still hold potential for improving the classification accuracy of grassland communities beyond derived textures and VIs. On the one hand, given the possible differences in the growth rhythms of different grassland communities [79], incorporating the phenological differences derived from time-series multispectral data may improve the classification accuracy [80]. On the other hand, the dual-polarization mode of Sentinel-1 limits the extraction of polarization features [41]. If expensive full-polarization SAR images, such as RADARSAT-2 and GaoFen-3, are available, rich polarization features of grasses can be extracted by polarization decomposition methods (e.g., Cloude, Krogager), providing more references for classification [15,81].

The Effect of Classification Model on Classification Accuracy
In addition to using more features, it is essential to optimize the classification model for the extracted feature set, thereby improving the classification accuracy of grassland communities [82]. Therefore, in the current study, the Genetic Programming algorithm was applied to the derived feature set to obtain the optimal classification model for that set. Our results indicated that, on both the MS and MSVT datasets, the classification models optimized by GP yielded more accurate results than SVM, which is commonly used in grassland classification studies [63]. In particular, when using MSVT, GP generated a fusion model consisting of LinearSVC and ET through its easily scalable tree structure [68], with significantly improved classification results compared with single classifiers.
These findings indicate that the classification accuracy of grassland communities can be significantly enhanced by GP-optimized classification models on the same dataset. Moreover, compared with algorithms such as randomized search and grid search, which can only optimize individual classifiers [67], GP can combine different classifiers using the structural characteristics of individual trees to form complex classification models. These fusion models, which achieve complementary advantages among classifiers, are reported to perform better than single classifiers when solving difficult problems [29].
The above classification models are all based on machine learning algorithms. Currently, deep learning (DL) algorithms, represented by convolutional neural networks (CNN), Sparse Coding, and Deep Belief Networks (DBN), are gradually being used in RS vegetation classification owing to their deep feature mining ability [83]. DL with deep network structures allows end-to-end learning and can thus extract deep characteristics of vegetation from RS images without human intervention [84]. We intend to explore GP-optimized DL models for grassland community classification and expect to further improve the classification accuracy.

The Universality of the Proposed Method
Universality is an important consideration in evaluating the usefulness of vegetation classification methods. A classification method with high universality means that the method developed in the current study area can be applied to other areas and still achieve promising results; it therefore has a higher application value than methods with low universality [85,86]. Three key parts of the method significantly affect the final classification results: segmenting the image at the optimal scale, selecting the optimal subset of features, and generating the optimal classification model. The optimal solutions of these three parts are all determined automatically by optimization algorithms, reducing manual intervention and improving the universality of this approach. When this approach is adopted in other study areas, we expect it to still perform well owing to its ability to automatically determine the optimal solution at each step based on the new images.

The Future Work
Benefiting from the large swath width and free availability of spaceborne mid-resolution RS images, this research achieved regional-scale grassland community classification at low cost. In future studies, we intend to use the proposed method to classify grassland communities at national and larger scales, considering that there are few grassland community classification products at these scales [15]. However, research at such large scales requires a large number of RS images, so we plan to conduct this work on a cloud platform (e.g., Google Earth Engine [87]) to improve efficiency. With the superior capabilities of cloud servers, we can quickly preprocess and analyze the imagery, greatly reducing the cost of the work. Meanwhile, we intend to explore DL techniques to build classification models and to optimize the structure of classification models using the GP algorithm, in anticipation of better classification results.

Conclusions
In the past, expensive hyperspectral and high-resolution RS images were the major data sources for grassland classification at the community level. However, these images are only suitable for small-scale studies because of their small swath width and high prices. For large-scale studies, spaceborne mid-resolution RS images with swath widths of several hundred kilometers are more practical. However, limited by data quality, it is difficult to distinguish different grassland communities using only the raw images. To enhance the classification accuracy with these images, we proposed a regional-scale superpixel-based grassland community classification approach using a GP-optimized classification model with Sentinel-2 multispectral bands, their derived VIs and textures, and Sentinel-1 SAR bands and their derived textures. The method was tested in the Siziwang grassland of China and achieved an accurate classification of the seven communities, with an overall accuracy of 84.21% and a kappa coefficient of 0.81. Our results showed that the addition of VIs and textures, as well as the use of GP-optimized classification models, contributes significantly to the classification accuracy of grassland communities. In addition, the proposed method obtains the optimal segmentation scale, the optimal feature subset, and the optimal classification model by using optimization algorithms instead of manual experiments, which makes it more universal and thus of higher application value. This research demonstrates the potential of VIs and textures extracted from multispectral and SAR imagery, together with GP-optimized classification models, to improve the classification of grassland communities, and provides a reference for classifying other vegetation communities over large areas using spaceborne mid-resolution RS images.