Optimizing Feature Selection of Individual Crop Types for Improved Crop Mapping

Accurate crop planting area information is of significance for understanding regional food security and agricultural development planning. While increasing numbers of medium resolution satellite imagery and improved classification algorithms have been used for crop mapping, limited efforts have been made in feature selection, despite its vital impacts on crop classification. Furthermore, different crop types have their unique spectral and phenology characteristics; however, the different features of individual crop types have not been well understood and considered in previous studies of crop mapping. Here, we examined an optimized strategy to integrate specific features of individual crop types for mapping an improved crop type layer in the Sanjiang Plain, a new food bowl in China, by using all Sentinel-2 time series images in 2018. First, an automatic spectro-temporal feature selection (ASTFS) method was used to obtain optimal features for individual crops (rice, corn, and soybean), including sorting all features by the global separability indices for each crop and removing redundant features by accuracy changes when adding new features. Second, the ASTFS-based optimized feature sets for individual crops were used to produce three crop probability maps with the Random Forest classifier. Third, the probability maps were then composited into the final crop layer by considering the probability of each crop at every pixel. The resultant crop layer showed an improved accuracy (overall accuracy = 93.94%, Kappa coefficient = 0.92) than the other classifications without such a feature optimizing process. Our results indicate the potential of the ASTFS method for improving regional crop mapping.


Introduction
Information about crop planting area and spatial distribution is of significance for understanding regional crop production and food security [1,2]. Remote sensing has been increasingly used for crop acreage surveys, given increasing data availability and improved spatial, temporal, and spectral resolutions [1,3,4], together with advances in machine learning classifiers [5,6] and sampling technologies [7,8]. While increasing data and emerging algorithms have provided unprecedented opportunities for crop mapping, feature selection deserves more attention, as improved feature selection can certainly contribute to improvement in computing efficiency and map accuracy [9][10][11]. -Unique spectral characteristics in certain phenological phases of crops are critical for the identification of different crops [12][13][14]. Crop mapping based on multi-temporal information outperforms that based on a single temporal image. Multi-temporal remote sensing based on a single spectral feature is mainly to analyze the time series profiles of a certain spectral feature, to find the key phenological phases that are easy to extract specific crops from other crops. Then, thresholds of the spectral feature for distinguishing the specific crops are determined to identify crop types [15][16][17]. Foerster et al. [18] extracted crop planting pattern information in the northeastern part of Germany by constructing a normalized difference vegetation index (NDVI) time series from TM/ETM+ images and analyzing the spectral mean and standard deviation values of various crops in each critical phenological phase. Liu et al. [19] combined NDVI and normalized difference water index (NDWI) to construct a new index, normalized difference vegetation-water index (NDVWI), when studying the distribution of crop types in the Heihe River Basin. Based on the multi-temporal NDVWI images, variation of local crop distribution information was obtained. However, for areas with more complex planting structure, especially in the presence of crops with similar spectral characteristics, it is difficult to extract various crops with a single spectral feature at the same time [20,21]. For example, corn and soybean in many regions tend to have similar spectral and phenological characteristics. In addition, most studies directly input all available time series images in a certain period into the classifier as temporal features. This simple stacking of time series images may mask some subtle but potentially important phenological features. At the same time, direct classification using all temporal features increases computation complexity and reduces classification accuracy, so-called the "curse of dimensionality" [22,23].
Increasing studies focus on the classification of multiple crops by using multi-temporal and multi-spectral features. Zhong et al. [21] extracted spatial distribution information of corn and soybean in Doniphan County, Kansas, USA based on spectral bands, normalized difference tillage index (NDTI), normalized difference senescent vegetation index (NDSVI), and enhanced vegetation index (EVI). Huang et al. [24] extracted soybean, corn, rice, and other main crops in Heilongjiang Province by combining NDVI, EVI, wide dynamic range vegetation index (WDRVI), land surface water index (LSWI), and normalized difference snow index (NDSI) with time series images covering the main growth period of crops. However, spectral features in these studies are mostly selected based on the researchers' empirical knowledge, and these spectral features may not fully reflect the phenological phases of specific crops. Moreover, few automatic spectro-temporal optimization schemes have been proposed to capture the phenological phases of crops comprehensively.
The redundant information in the spectro-temporal feature set may be generated by the high correlation between features, which could not only decrease the classification accuracy, but also increase the computation complexity and time cost [14,25]. In recent years, some studies have focused on spectro-temporal feature selection for land cover classification. Hu et al. [26,27] proposed a phenology-based spectral and temporal feature selection (PSTFS) method, which used MODIS time series images to extract the optimal classification features of corn in Heilongjiang Province, China and produced a corn map in 2011. The PSTFS method balanced feature separability and information redundancy and selected 34 optimal spectro-temporal features for corn identification. Although this method has shown promising crop classification ability, it can only identify a single crop type, rather than obtaining a crop layer with multi-classes. Also, the correlation threshold was set by prior knowledge and could not be directly applied to other areas. Additionally, the spatial resolution of MODIS might not be sufficient to obtain spatial details of agricultural landscape in Northeast China.
This paper proposes an automatic spectro-temporal feature selection (ASTFS) method for improved crop mapping based on Sentinel-2 time series images in 2018 and the Google Earth Engine (GEE) platform. First, we obtained optimal classification feature sets of rice, corn, and soybean, respectively, in Sanjiang Plain; second, probability maps for individual crops (rice, corn, and soybean) were generated and synthesized into the final crop type layer. The final crop type map was compared with two maps without such an optimizing process of feature selection: (1) "Crop layer without feature sorting and optimization" and (2) "crop layer with feature sorting but without optimization". The commonly used confusion matrix approach was used to evaluate the classification accuracy of the three classification maps based on 1996 ground object validation samples, which were kept apart from the training samples to assess accuracy. The ASTFS method was implemented on a local machine. The GEE platform was used for pre-processing Sentinel-2 images and generating the crop type map.

Study Area
The study area, Sanjiang Plain, is located in the eastern part of Heilongjiang Province, which is the most northeastern province of China, and is formed by the alluviation of Heilongjiang River, Songhua River, and Wusuli River. The Sanjiang Plain is centered between about 44 °N and 49 °N and 129 °E and 136 °E (Figure 1). The Sanjiang Plain is an important commodity grain base in China, with an annual output of approximate 15 million tons of grain. The per capita cropland area and per capita grain output are more than four times the national average. The scale of agricultural production here is huge, and the degree of agricultural mechanization is the highest in China. The major crops in the Sanjiang Plain are rice, corn, and soybean. The mixed distribution of the three crops poses a challenge for the precise classification of crops. The plain has an average elevation of 50-60 m, annual average temperature of 1-4 °C, and average temperature in July of 21-22 °C, which is suitable for crop growth [28]. The annual precipitation in Sanjiang Plain is 500-650 mm and the precipitation is mainly concentrated from June to October [29]. The three crops are sown and harvested only once a year in the area due to the limited light time and low accumulated temperature. Crop calendars for the three major crops in the Sanjiang Plain are shown in Figure 2.  Crop calendars for the three major crops (corn, rice, and soybean) in the Sanjiang Plain [30,31]. We divided a month into three parts (e.g., early April (E), middle April (M), and late April (L)). Rice: 1, sowing; 2, seeding/flooding; 3, transplanting; 4, reviving; 5, tillering; 6, booting; 7, heading; 8, milk stage; 9, mature; and 10, harvest; Corn: 1, sowing; 2, seeding/three leaves; 3, seven leaves; 4, stem elongation; 5, heading; 6, milk stage; 7, mature; and 8, harvest; Soybean: 1, sowing; 2, seeding; 3, the third true leaf; 4, flowing; 5, pod setting; 6, mature; and 7, harvest.

Sentinel-2 Data and Derived Spectral Indices
Sentinel-2 satellite data were used in this study. The Sentinel-2 mission consists of two satellites, Sentinel-2A and Sentinel-2B. Both satellites carry MSI (multispectral imager) sensors, which contain 13 multispectral bands. The revisit period for a single satellite is 10 days, and the revisit period for two satellites is 5 days (https://sentinel.esa.int/web/sentinel/missions/sentinel-2). For our study, we collected 2451 Sentinel-2 satellite images from the GEE platform covering the Sanjiang Plain from April to October in 2018. In the other months, the Sanjiang Plain was covered by snow, and those images from November to March were not used in this study [30]. Due to the unavailability of the surface reflectance (SR) data in the GEE platform, we used the top of atmosphere (TOA) reflectance. Although the SR data are more convincing than the TOA reflectance data, the reliability of the TOA data for land cover classification has been proven in previous studies. For example, Emelyanova et al. [32] compared the classification results between using TOA reflectance data and SR data, and found that these two datasets had similar performances. Other studies [33][34][35][36][37][38] also used TOA reflectance data to perform classification and got the same conclusion. Therefore, the classification result could be used to evaluate the performance of the newly proposed feature selection method in the absence of SR data. To minimize the atmospheric effect, we only selected three bands of Sentinel-2, including band 8 (near infrared, NIR), band 11 (short-wave infrared, SWIR), and band 12 (SWIR), with wavelength ranging from 833 to 2202 nm, considering that short wavelength bands are more susceptible to atmosphere [1]. According to the quality assurance band of Sentinel-2, we removed the effects of clouds in the imagery [34]. Then, the median values of the TOA reflectance in every month were calculated to generate monthly time series images for the 11 bands.
Here, eight vegetation indices (NDWI, NDVI, NDTI, NDSVI, NDSI, LSWI, GCVI (green chlorophyll vegetation index), and EVI) were calculated, and three bands (bands 8, 11, and 12) were selected every month, with a total of 77 features in the seven months. These vegetation indices and bands were selected following previous crop mapping studies, which proved those them good at classifying rice, corn, and soybean. Hu et al. [27] extracted the distribution of rice, corn, and soybean in Heilongjiang Province, China, using EVI, LSWI, NDSVI, and NDVI. Wang et al. [39,40] confirmed that the near-infrared band and short-wave infrared band had the potential to improve the identification accuracy of rice, corn, and soybean in Gansu Province, China. Zhong et al. [21] achieved efficient corn and soybean mapping with NDTI, EVI, and NDSVI in Doniphan County, Kansas. We chose the index NDSI because it often snows in Heilongjiang Province. GCVI has proven to reflect the level of chlorophyll content [41]. The NDVI index contributed the majority of the most significant features to crop mapping and the NDWI contributed to improving crop mapping [42]. The calculation formulas, associated properties, and references of each index are shown in Table 1. The central wavelength, bandwidth, and spatial resolution of each band used are shown in Table 2.  The ground truth data used in this study were collected from the Sanjiang Plain from July to August 2018, including rice, corn, soybean, and other land cover types like forest, water, built-up land, and others. The distribution of the sample points is shown in Figure 1. The numbers of sample points of rice, corn, soybean, and others were 1232, 1060, 506, and 1252, respectively. The numbers of training sample for rice, corn, soybean, and others were 612, 568, 256, and 618, respectively. The numbers of validation sample for rice, corn, soybean, and others were 620, 492, 250, and 634, respectively.

The Pairwise Separability Index (SIij)
Somers and Asner [47] proposed the separability index (SI), which is defined as the ratio of interclass spectral variability to intra-class spectral variability. Inter-class variability and intra-class variability are used to measure whether the feature set can distinguish different land use types effectively, and the feature set that makes the most internally consistent of a class and the biggest difference between different classes is the optimal feature set [26]. In addition, the SI does not require that the research object conform to normal distribution and there is no limit on the range, which gives it a wider application scope than the Jeffries-Matusita (JM)distance [48], another commonly used separability measurement. According to the motivation mentioned above, this study used SIij to calculate the degree of separability of the major crops (rice, corn, and soybean) in the Sanjiang Plain. The formula is as follows: (1) p = {EVI, GCVI, LSWI, NDSI, NDSVI, NDTI, NDVI, NDWI, band 8, band 11, band 12} q = {Apr., May, June, July, Aug., Sept., Oct.} where p represents a band or an index (a total of 11 in this study); q refers to a time point (a total of 7 in this study); and represent the mean spectral values of the samples of band or vegetation index p on date q for the class i (for example, rice) and class j (for example, corn), respectively; and and represent the standard deviation of th[e spectral values of the sample points of band or vegetation index p on date q for the class i and the class j, respectively. The absolute value of the difference between and (| − |) can reflect the spectral difference between different classes, the sum of and ( + ) can reflect the degree of dispersion and concentration of the spectrum within class i and class j. Based on Formula (1), it can be seen that larger | − | and smaller ( + ) will produce a higher SIij value. The higher the SIij is, the higher the separability of the two types of ground objects will be. This research calculated the SI of four land cover classes of 11 vegetation indices or bands on seven time periods, with a total of 462 pairwise SIs.

The Global Separability Index (SIglobal)
In this study, six pairwise indices composed of four types of ground objects were firstly calculated. However, SIij can only be used to calculate the pairwise separability between two types of ground objects and cannot reflect the overall separability between the four land cover classes in this study. In order to solve this problem, we extended the SIij to SIglobal [26]. The formula is as follows: where p refers to a band or vegetation index (a total of 11 in this study), q represents a time point (a total of 7 in this study), SIij is the pairwise SI calculated by Formula (1). We chose the average of the SIs of all crop pairs as SIglobal for each feature because it comprehensively reflects the separability of each crop pair. The SIglobal of the 77 features was represented by three two-dimensional matrices of spectral series (eight spectral indices and three spectral bands) and time series (seven time points). The colors within these matrices reflect the overall level of each spectral and temporal feature to differentiate crops. In this study, three SIglobal matrices were calculated for individual crop types (corn, rice, and soybean) in Sanjiang Plain.

Feature Optimization
According to Section 2.3.2, we can achieve a sorted spectro-temporal feature set in which features are sorted from high global separability to low. However, selecting classification features simply with high global separability would not obtain optimal classification results. There may be a high degree of correlation between several features with high global separability, which will limit classification accuracy and increase computational complexity. After removing redundant classification features, we can achieve the same or even higher classification accuracy with fewer classification features and computational cost. Therefore, we adopted the ASTFS method to obtain an optimal feature set by balancing the features' separability and correlation.
A clear and complete workflow of the ASTFS method for selecting the optimal features for crop classification is presented in Figure 3, which shows a feature optimization process for a single crop. In this study, we needed to use the process three times for rice, corn, and soybean, respectively, to obtain individual optimal feature sets. In Figure 3, the "feature set" contains all classification features in order of separability, with a total of 77. The "feature subset" is input into the Random Forest (RF) classifier in every iteration. The "optimal feature set" is the result of feature optimization. (1) First, the SIij values for all pairs of classes are calculated. (2) The SIglobal values for a certain crop are calculated based on SIij from the previous step and a full SIglobal matrix is generated. (3) Generate a sorted feature set based on the SIglobal values. (4) According to the order, add a new feature from the feature set into the feature subset. The feature subset, including the feature with the maximum SIglobal value, is first inputted into the RF classifier and outputs a classification accuracy. (5) If the overall accuracy of this classification is higher than any previous overall accuracies after the addition of a new feature, retain the new feature in the feature subset. Otherwise, remove the new feature from the feature subset. (6) If there are any remaining new features in the feature set, add a new feature from feature set to the feature subset and continue to repeat steps (4) through (5) until all new features in the feature set are inputted into RF classifier. The final optimal feature set is the output of the workflow. Based on this workflow, we obtained three optimal feature sets for rice, corn, and soybean, respectively.

Crop Probability Maps Based on Optimized Feature Selection and Random Forest Classifier
As a natural nonlinear classifier, the Random Forest (RF) classifier uses multiple trees to train and predict samples, first proposed by Breiman [49]. A prominent advantage of the RF classifier is that it is less prone to overfitting and are more tolerant of noise and outliers [50]. The RF classifier mainly includes the following important parameters: (1) The number of trees in the forest. In theory, the more trees there are, the better the classification results, but at the same time, the classification time will increase. Therefore, we need to find a reasonable number of trees. Studies have shown that when the number of trees is more than 400, the OOB (out-of-bag) error of each classification situation tends to be stable [51]. In order to ensure the classification accuracy and reduce the computation time as much as possible, we set the number of trees to 500. (2) The total number of features, which was 77 in this study. (3) The maximum of features allowed in a decision tree in an RF algorithm. Generally, we took the square root of the total number of features as its value. In addition, in order to get better classification results, we set the recursive times to none and bootstrap (whether there is a sample returned) to true. The RF classifier for each crop was trained by using about half of the ground truth samples described in Section 2.2.2.

Final Crop Layer Based on the Combination of Three Crop Probability Maps
According to the obtained individual optimal feature sets of rice, corn, and soybean, we can input them into the RF classifier to make the probability maps. By compositing these three probability maps, we can obtain the final crop type map. Each pixel value in the final crop type map corresponds to one pixel in each of the three probability maps. By comparing the probabilities of the three pixels, if the pixel corresponding to the crop with the maximum probability has a probability value greater than 50%, it will be retained in the final crop type map. Otherwise, the pixel would be marked as "others".

Accuracy Assessment and Comparison with Results from Unoptimized Features
In this study, the confusion matrix was used to validate the accuracy of all classification results based on 1996 ground object validation samples, which were kept apart from the training samples. The overall accuracy, producer's accuracy, and user's accuracy in the confusion matrix are important parameters for the accuracy assessment of classification results. The reliability of classification results is described from three different perspectives, which are most commonly used in land classification studies [52,53].
In order to represent the advantages of the ASTFS method in improving classification accuracy and efficiency, we also compared the classification result using optimal feature sets with two additional classifications using different input features as a part of accuracy assessment. The respective input feature sets of the two classifications are as follows: (1) All of the 77 spectro-temporal features (11 spectral indices or spectral bands multiplied 7 time points) for each of the three crops without feature optimization ("crop layer without feature sorting and optimization"), and (2) the same number of features with maximum global separability values as the number of optimal features ("crop layer with feature sorting but without optimization"). We made accuracy comparisons from two levels, accuracy comparisons for each crop (Section 3.2) and an overall accuracy comparison (Section 3.3) for three crops. The three classification feature sets were inputted into same Random Forest classifiers with the same parameter settings, classified with the same training dataset, and assessed with the same validation dataset. Figure 4 shows the average spectral curves of rice (black solid line), corn (black dotted line), soybean (black dashed line), and other ground objects (colored lines) in different spectral indices or bands, where the horizontal axes represent the months of crop growth in Sanjiang Plain (from April to October), and the vertical axes represent the vegetation indices or bands. Each spectral index corresponds to a certain physical and biological property of vegetation, and their regularity over time reflects the seasonal rhythms of crops [54,55]. It can be seen from Figure 4 that the spectral characteristics of corn, rice, and soybean in different vegetation indices or bands and at different time points are different, which also confirms the necessity of multi-spectral and multi-temporal images for crop identification.

Spectro-Temporal Feature Analyses of Major Crops (Corn, Rice, and Soybean)
As can be seen from Figure 4, there are strong similarities in the vegetation indices or band values of corn and soybean in many spectro-temporal features (for example, EVI from April to July, GCVI from April to August). This is because the corn and soybean in the Sanjiang Plain have similar growth processes and spectral characteristics, which greatly increases the difficulty of separating corn and soybean. However, band 8 in August, band 11 in July and August, and band 12 in July and August show potential to differentiate corn and soybean. Rice fields undergo sowing, flooding, and transplanting stages from about the end of April to the end of May (Figure 2), and the rice fields are covered by water during these three stages. Using this unique characteristic of rice, it is relatively easy to separate rice from other land covers.

Feature Optimization Based on the ASTFS Method
The global separability indices for rice, corn, and soybean are shown in Figure 5 as the three global separability matrices. The horizontal axes represent seven temporal intervals from April to October, and the vertical axes represent 11 spectral indices and bands. The different colors of grids in these matrices represent the different values of the global separability index SIglobal. The colors of these grids vary from blue to red, which correspond to the values of SIglobal from low to high. Through the three global separability matrices in Figure 5, we obtained the importance ranking of all spectrotemporal features. It is clear from the three global separability matrices that the contribution of each spectro-temporal feature in classification is different. The classification accuracy will be reduced when the spectro-temporal feature with poor global separability is involved in the classification, which reflects the importance of feature selection optimization.
In terms of separating rice from other land cover classes, as shown in Figure 5a, the three spectral indices LSWI, NDSI, and NDSVI are the most important spectral features, and the three months April, May, and June are the most important temporal features. In the process of separating corn from other land cover classes, as shown in Figure 5b, the two vegetation indices LSWI and NDTI are the most important spectral features, and the two months May and June are the most important temporal features. In the process of separating soybean from other land cover classes, as shown in Figure 5c, the spectral index LSWI and the two bands 11 and 12 are the most important spectral features, and the three months May, June, and August are the most important temporal features. The importance ranking of all spectro-temporal features for rice, corn, and soybean can be obtained based on the matrices. However, we could not input all available classification features to map crop type because the feature sets contained redundant or correlated information. Therefore, we removed the redundant features to reduce the curse of dimensionality in the classification (Figure 3). Finally, we obtained 22 optimal features for rice, 13 optimal features for corn, and 11 optimal features for soybean (Table 3). , May According to the classification accuracies generated by the feature selection process, classification accuracy curves of individual optimal feature sets for rice, corn, and soybean were plotted (Figure 6a,c,e). In order to compare with the classification accuracy of optimal features, we compared classification accuracy for each crop, and classification accuracy curves of individual unoptimized feature sets (including all features in order of separability) for rice, corn, and soybean were also plotted (Figure 6b,d,f). The horizontal axes of the graphs in Figure 6 represent the number of classification features, while the vertical axes represent the overall accuracy of each classification. From Figure 6a,c,e, we can see that after removing redundant information, the classification accuracy improved as the number of classification features increased. Although we sorted all features according to their global separability from high to low, with the increase in the number of classification features, the classification accuracy did not increase in Figure 6b,d,f, but declined to a certain extent under the trend of overall increase. This is because there is redundant and correlated information between classification features. When information redundancy between features damages accuracy more than its contribution to accuracy, features redundancy is caused, which is called the "curse of dimensionality" [22,23]. The ASTFS method removes the redundant and correlated information contained in the feature set. By using the optimal classification features, the overall accuracies of individual rice, corn, and soybean probability maps were 99.15%, 96.69%, and 97.75%, respectively, which are the maximum in the three results. With the same number of features with maximum SIglobal values as the number of optimal features (the red triangles in Figure 6b,d,f), the classification accuracies only reached 98.25%, 91.33%, and 96.74%. Using all of the features of rice, corn, and soybean, their classification accuracies were 98.85%, 96.29%, and 96.89%. This just confirms the "curse of dimensionality" phenomenon mentioned above: Simply stacking many classification features will not improve the classification accuracy, but will cause information redundancy and reduce the classification accuracy.

Crop Mapping Based on Optimized Feature Selection and Accuracy Assessment
Based on the optimal feature sets for rice, corn, and soybean, as well as the Random Forest classifier, we generated the rice, corn, and soybean probability maps (Figure 7a-c). The final crop layer was generated by combining these three probability maps (Figure 8). Each pixel value in the final crop layer corresponds to the crop type with the maximum probability value. Figure 8 shows that crops are mainly located in the plain areas, excluding Wanda Mountain in the east, Changbai Mountain in the southwest, and Qinghei Mountain in the northwest of the Sanjiang Plain, which is consistent with the actual situation.
In order to evaluate the reliability of the resultant crop layer based on the ASTFS method, we compared it with another two results from (1) all of the 77 spectro-temporal features for each of the three crops without feature optimization ("crop layer without feature sorting and optimization"), and (2) the same number of features with maximum global separability values as the optimal features ("crop layer with feature sorting but without optimization").
The three resultant crop type layers (crop layer with ASTFS, crop layer without feature sorting and optimization, and crop layer with feature sorting but without optimization) were validated by using the same ground truth samples (620 rice samples, 492 corn samples, 250 soybean samples, and 634 others samples). The comparison of overall accuracies and Kappa coefficients of the three crop type layers are shown in Figure 9, and the confusion matrices corresponding to individual classification results are shown in Tables 4-6. We found that the overall accuracy of the crop layer with ASTFS is 93.94% and the Kappa coefficient is 0.92, which is the maximum among the three results. For rice, the producer's accuracy of the crop layer with ASTFS is 98.39% and the user's accuracy is 98.07%, which is the maximum among the three results. Because of its unique phenological periods (e.g., flooding and transplanting stages), rice fields have special spectral characteristics. Therefore, rice is relatively easily distinguished from the three crops. For corn, the producer's accuracy of the crop layer with ASTFS is 93.29% and the user's accuracy is 92.73%, which is the maximum among the three results. For soybean, the producer's accuracy of the crop layer with ASTFS is 82.00% and the user's accuracy is 96.70%, which is the maximum among the three results. The reason why the producer's accuracy of soybean is obviously lower than other accuracies in Table  4 is that some soybean fields are misclassified into corn and others. Soybean and corn have similar spectral characteristics and similar phenological periods, so it is difficult to distinguish between them. In addition, because the other types contain multiple land cover classes (forest, water, buildings, and other ground objects) and heterogeneity, some samples of rice, corn, and soybean are misclassified into others, and some others are also misclassified into the three crops. For others, the producer's accuracy of the crop layer with ASTFS is 94.79% and the user's accuracy is 90.10%, which is also the maximum among the three results. The analyses of classification results show that the ASTFS method is an effective feature selection method.

Different Optimal Features for Identification of Rice, Corn, and Soybean
By using the ASTFS method, higher classification accuracy can be achieved with fewer classification features, also reducing computational complexity and time cost. In this study, we obtained three optimal feature sets for rice, corn, and soybean, which included 22, 13, and 11 features (Table 3), respectively. Specifically, we found that May and June are the most important phases to distinguish rice from other land cover classes, as rice undergoes flooding, transplanting, and reviving stages and the rice fields are covered by water in these periods. The corn and soybean fields are in the sowing stage and seeding stage (Figure 2), with low green vegetation and bare soil background in May and June. Differences in water content between rice fields and other types of farmlands caused their high pairwise separability in LSWI, NDSVI, and NDSI, which are sensitive to soil water content. Corn is also easily identified in May and June as it is in the sowing, seeding, three leaf, and seven leaf stages, while rice is in the flooding, transplanting, and reviving stages and soybean is in the sowing and seeding phases. In addition, the forest has higher vegetation coverage than corn in this period. In addition to May and June, August is another important time point to identify corn and soybean. Due to the similar spectral characteristics and phenological phases of corn and soybean, it is challenging to distinguish them [56,57]. According to Figure 2, corn enters the mature stage before soybean in August, which is an important phenological characteristic to distinguish corn from soybean. In the optimal features of soybean and corn, the short-wave infrared band in August is an important feature, which agrees with a previous study [40].

Implications of the ASTFS Method for Land Cover Classification
In this study, we proposed the ASTFS method and applied it to the Sanjiang Plain (about 108,900 km 2 ) in the northeast of Heilongjiang Province of China. The ASTFS method well balanced the relationship between spectral separability and information redundancy and generated high classification accuracy in the resultant crop map. In this study, the resultant crop layer maps using optimal feature sets of rice, corn, and soybean are better than that using the same number of features with maximum SIglobal, as well as that using all the 77 spectro-temporal features. The results indicate that there is redundant information that not only fails to improve the classification accuracy, but also increases the computational complexity and time cost, suggesting the importance of feature selection. The ASTFS method proposed in this study can be extended and tested in a larger area, such as Northeast China and monsoon Asia, and also other sensors like Landsat and MODIS [55,[58][59][60]. The improved regional crop type maps can provide important input data for crop growth monitoring and crop yield forecasting [61][62][63].
Although the ASTFS method was used to obtain the optimal feature sets of rice, corn, and soybean and good classification results were obtained, the method could be improved from the four aspects. First, this study used SIij when calculating pairwise separability. Other indicators could be used to quantify the inter-class separability as well. Second, when calculating SIglobal, this study adopted the average values of the pairwise separability indices as SIglobal for each feature. However, other means of calculating SIglobal, such as the minimum or maximum of the SIij, the weighted average of the SIij, and the average of normalized SIij, could be tested to improve the ASTFS method in the future. Third, ASTFS is currently limited to dealing with a single cropping and does not consider the identification of double or triple cropping. In the future, the ASTFS method could consider integrating the phase segmentation strategy [64] and more phenological metrics [65] to solve the crop identification problem of double or triple cropping. Finally, due to the unavailability of Sentinel-2 surface reflectance (SR) data in China before 2019 in the GEE platform, we used top of atmosphere (TOA) reflectance in this study. In the future, with the availability of SR data in China and continuous improvements (e.g., inclusion of the atmospheric correction tool for Sentinel-2) of the GEE platform, the results of crop classification and crop maps could be further improved.

Conclusions
While increasing numbers of medium resolution satellite imagery and improved machine learning algorithms become available, feature selection has rarely been investigated systematically for crop mapping. In this study, we proposed the automatic spectro-temporal feature selection (ASTFS) method for optimal feature selection, targeting an improved crop mapping strategy. The optimal classification feature sets of rice, corn, and soybean were obtained based on the ASTFS method, and then used for crop type layer mapping in the Sanjiang Plain of Northeast China. We found that when rice was in the flooding and transplanting stage in May, it was easy to separate from other land cover classes. The important phenological phases of corn are the seeding stage, three leaves stage, seven leaves stage, and the mature stage, and the corresponding months are May, June, and August. In particular, the mature stage in August is an important phenological phase for distinguishing corn and soybean. The short-wave infrared band plays an important role in distinguishing corn and soybean. These findings are specific for the Sanjiang Plain and may be transferable to other regions with different cropping calendars. The probability maps for the three crop types were generated based on the individual optimal feature sets of rice, corn, and soybean, and then these three maps were composited to generate the final crop type layer in the Sanjiang Plain according to their probability values. The crop type layer based on the ASTFS method showed a higher accuracy (overall accuracy (OA) = 93.94%, Kappa coefficient = 0.92) than the other two crop type maps: (1) Crop layer with feature sorting but without optimization (OA = 89.83%, and Kappa = 0.86), and (2) crop layer without feature sorting and optimization (OA = 92.89%, and Kappa = 0.90). With the same number of features, the classification accuracy of the crop layer with the ASTFS method is obviously better than the crop layer with feature sorting but without optimization. Although the classification accuracy of the crop layer with the ASTFS method is slightly better than the crop layer without feature sorting and optimization, the latter uses all features, which would greatly increase computational complexity and time cost. Through comparison, we confirmed that the ASTFS method is an effective feature optimization method for the extraction of rice, corn, and soybean in the Sanjiang Plain. The encouraging results from this study indicate that the ASTFS method has the potential to be used for the classification of crops or general land cover types and can be extensively applied for larger areas and using other sensors.