Feature Comparison and Optimization for 30M Winter Wheat Mapping Based on Landsat-8 and Sentinel-2 Data Using Random Forest Algorithm

Winter wheat cropland is one of the most important agricultural land-cover types affected by the global climate and human activity. Mapping 30-m winter wheat cropland can provide beneficial reference information that is necessary for understanding food security. To date, machine learning algorithms have become an effective tool for the rapid identification of winter wheat at regional scales. Algorithm implementation is based on constructing and selecting many features, which makes feature set optimization an important issue worthy of discussion. In this study, the accurate mapping of winter wheat at 30-m resolution was realized using Landsat-8 Operational Land Imager (OLI), Sentinel-2 Multispectral Imager (MSI) data, and a random forest algorithm. This paper also discusses the optimal combination of features suitable for cropland extraction. The results revealed that: (1) the random forest algorithm provided robust performance using multi-features (MFs), multi-feature subsets (MFSs), and multi-patterns (MPs) as input parameters. Moreover, the highest accuracy (94%) for winter wheat extraction occurred in three zones, including: pure farmland, urban mixed areas, and forest areas. (2) Spectral reflectance and the crop growth period were the most essential features for crop extraction. The MFSs combined with the three to four feature types enabled the high-precision extraction of 30-m winter wheat plots. (3) The extraction accuracy of winter wheat in three zones with multiple geographical environments was affected by certain dominant features, including spectral bands (B), spectral indices (S), and time-phase characteristics (D). Therefore, we can improve the winter wheat mapping accuracy of the three regional types by improving the spectral resolution, constructing effective spectral indices, and enriching vegetation information. The results of this paper can help effectively construct feature sets using the random forest algorithm, thus simplifying the feature construction workload and ensuring high-precision extraction results in future winter wheat mapping research.


Introduction
Land-use mapping is an important subject in the study of surface eco-physics, including vegetation, soil, buildings, water, and other surface elements.Among them, vegetation is the most sensitive to perceiving surface climate change and physical and chemical states.Moreover, vegetation is most closely related to global and regional food security, planting intensity, crop yields, and other Sustainable Development Goals.Future climate change has increased the likelihood of severe, pervasive, and irreversible consequences for human civilization and agriculture [1].In this context, the rapid and accurate mapping of crops has gradually become an important means for monitoring and evaluating agricultural development.Presently, regional and global remote sensing data products for crop and land-cover classification are becoming increasingly abundant (Table 1).Low-resolution data is provided by the Land Use and Global Environment Laboratory (LUGE) of McGill University.The data is produced by the Center for Sustainable Development and Global Environment (SAGE) at the Nelson Institute for Environmental Studies at the University of Wisconsin.The dataset includes the global distribution of three major crops (maize, wheat, and rice).It also has a spatial resolution of 5 × 5 (10 km) and resampling to a grid of 0.5 • × 0.5 • (http://www.earthstat.org/).There are also three sets of global irrigation area maps.The first is a satellite sensor-based Global Irrigated Area Map that was released by the International Water Management Institute (IWMI) and includes a 10-km irrigation and farmland global land-use/cover (LULC) map (http://waterdata.iwmi.org/Applications/GIAM2000/).The second is a regional low-resolution 500-m irrigation area map for South Asia (http://waterdata.iwmi.org/).Lastly, the dataset contains regional medium-resolution 30-m irrigation maps of the Syr Darya River Basin and the Krishna River Basin in India (http://www.iwmi.cgiar.org/).Other related products include a United States Geological Survey (USGS) interactive map of 30-m global farmland based on Landsat Global Food Security-Support Analysis Data (GFSAD30).The interactive map also includes a 30-m resolution irrigated farmland and rain-fed farmland distribution layer for Australia [2], South Asia, Iran, Afghanistan [3], and other countries.A 250-m resolution crop distribution layer is also available for the United States, Africa [4], Australia [5], South Asia, and other regions, and has also been applied in certain region researches [6] (https://geography.wr.usgs.gov/science/croplands/index.html).These mapping products provide strong data support for the study of global and regional crop planting intensity and yield.However, the current mapping products are still inadequate.For instance, these products were developed using the traditional band threshold segmentation method, which is both time-consuming and laborious.They are also confined to low resolutions.Lastly, these products are mostly designed for integrated cropland mapping or coarse classification of rain-fed or irrigated agriculture, but few studies have focused on mapping specific crops [7].
The continuous development of capacity for remote sensing data acquisition has gradually led to the adoption of the multi-type, high-dimensional characteristics of remote sensing data [8,9].This has led to machine learning methods being combined with mesoscale, multi-source remote sensing data.Different characteristics of remote sensing data can be combined, including spectral features such as visible green, red, and near-red bands [10], spatial features such as texture features derived from the gray-level co-occurrence matrix (GLCM) [11,12], temporal features such as Normalized Difference Vegetation Index (NDVI) or Enhanced Vegetation Index(EVI) time-series data [13,14], and auxiliary features such as slope factors generated from a digital elevation map (DEM) [15], to achieve higher precision than single-feature variable remote sensing crop recognition [16].Furthermore, this can help retrieve higher-value, deeper information at larger scales [7,[17][18][19], including high-dimensional remote sensing information.These factors enable applications such as crop classification and recognition [20].
Currently, land classification is conducted by machine learning algorithms and has been widely applied to remote sensing data.The random forest algorithm is a famous ensemble learning technique that is frequently based on bagging integration [21].The parameters are selected by combining sample perturbation with probability sampling (e.g., self-help sampling method) and attribute perturbation in characteristic subspace.The convergence and generalization error of the classification also provides good robustness [22,23].The random forest algorithm has been used extensively in land-cover classification applications [6,17,[24][25][26][27][28][29].The method has several capabilities for remote sensing applications, including the ability to handle high volumes of data, an unbiased estimation of generalization error, the reduction of outliers and noise, simple parameter settings, and a low computational cost.
Multi-source remote sensing images are produced by the Advanced Very High-Resolution Radiometer (AVHRR) and Moderate Resolution Imaging Spectro-radiometer (MODIS) satellites and have a large width and high time resolution.These images are the most advantageous for dynamically monitoring vegetation change at large scales.However, their spatial resolution is often not appropriate for regional-scale vegetation research.
An example of a multispectral sensor is Landsat-8, which has been in operation since 2013.Landsat-8 provides 30-m multi-spectral and repeated imagery with a 16-day revisit rate, and eliminates most atmospheric interference.The satellite's stable spectral, spatial, and temporal resolution is effective for vegetation observation.As a result, it has provided many applications on multiple platforms such as the Google Earth Engine [17,[30][31][32][33][34] with multiple methods [35] and multi-resource satellite images [36,37].
The Sentinel-2 satellite is another valuable resource for vegetation monitoring.The Sentinel-2 was launched in 2015 and is a land monitoring optical satellite that provides 10-m resolution optical imagery with a revisit period of five days.The satellite is equipped with the state-of-the-art MSI (Multispectral Imager) instrument, which covers 13 spectral bands.It is also the only satellite with three bands in the red-edge range, which is effective for monitoring vegetation health information.The Sentinel-2 has historically provided monitoring information for agriculture and forestry, and has become an effective data resource for Earth observation [38][39][40].The satellite is essential for monitoring land use, crop detection, and grain prediction; it can also help improve food security, as well as provide continuity for the current SPOT satellite Earth Observation System and Landsat missions.Using pre-processing techniques such as well calibration, Landsat-8 and Sentinel-2 can be integrated [41][42][43] and compared [44][45][46] to provide frequent cloud-free surface observations of vegetation growth.It has been well-confirmed that Sentinel-2 data and atmospherically-corrected Landsat-8 data can be highly correlated in bands reflectance, such as red band and near-infrared band, as well as be well-matched in the reflectance characteristics of primary surface objects including vegetation, soil, and water [47,48].
Mapping research is necessary for monitoring and evaluating wheat growth.It is difficult to select suitable features as an extraction factor in wheat mapping due to the spectral similarity between different wheat varieties.However, these characteristics are necessary for efficiently and accurately mapping wheat croplands.
This study had the following objectives related to regional land-cover map accuracy.Firstly, the goal was to extract winter wheat crops from integrated Landsat-8 and Sentinel-2 imagery using machine learning (RF) methods.Secondly, the study explored the best extraction features and feature combinations for winter wheat mapping.Lastly, this study compared the factors affecting extraction precision in different geographical scenes.

Study Area
The study area (Figure 1) was located at the periphery of Magdeburg city in Sachsen-Anhalt, central Germany, between 51 • 34 N and 52 • 15 N and 11 • 6 E to 12 • 10 E. The Elbe River flows through the region from south to north.The two banks of the river are widely cultivated, typically with winter wheat.
Germany has a strong agricultural sector and is the world's third largest exporter and importer of agricultural products.The North German Plain has low temperatures, cold and wet winters, low sunshine, and is suitable for the forage growth that is necessary for the development of animal husbandry.The south consists of mountain valleys with low latitudes and altitudes, high temperatures, and dense river networks.According to the annual report "Understanding Farming-Facts and figures about German farming" released by Germany's Federal Ministry of Food and Agriculture, Germany has three typical vegetation types, with 34% of the land area designated for agriculture, 31% designated for forest, and 13% designated for grasslands.A total of 47% of the land area is used for farming.
Germany is famous for its "cultivated landscape" with the main crop types being bread cereals, potatoes, sugar beets, fruit, and vegetables.Previous studies on crop classification at regional [13] and national scales [19] in central Germany have yielded high classification accuracy.The area covered by cereal crops is nearly as high as the total arable land.Most of these crops are concentrated in the central and southwestern regions of Germany.Today, cereals and wheat are the most important agricultural products in the German farming sector, which are used as food, feed, and renewable raw materials.

Study Area
The study area (Figure 1) was located at the periphery of Magdeburg city in Sachsen-Anhalt, central Germany, between 51°34′N and 52°15′N and 11°6′E to 12°10′E.The Elbe River flows through the region from south to north.The two banks of the river are widely cultivated, typically with winter wheat.
Germany has a strong agricultural sector and is the world's third largest exporter and importer of agricultural products.The North German Plain has low temperatures, cold and wet winters, low sunshine, and is suitable for the forage growth that is necessary for the development of animal husbandry.The south consists of mountain valleys with low latitudes and altitudes, high temperatures, and dense river networks.According to the annual report "Understanding Farming-Facts and figures about German farming" released by Germany's Federal Ministry of Food and Agriculture, Germany has three typical vegetation types, with 34% of the land area designated for agriculture, 31% designated for forest, and 13% designated for grasslands.A total of 47% of the land area is used for farming.
Germany is famous for its "cultivated landscape" with the main crop types being bread cereals, potatoes, sugar beets, fruit, and vegetables.Previous studies on crop classification at regional [13] and national scales [19] in central Germany have yielded high classification accuracy.The area covered by cereal crops is nearly as high as the total arable land.Most of these crops are concentrated in the central and southwestern regions of Germany.Today, cereals and wheat are the most important agricultural products in the German farming sector, which are used as food, feed, and renewable raw materials.

Satellite Imagery
Fine-resolution satellite data was required to map crop fields due to their small and fragmented parcels [53].As a result, Landsat-8 data with 30-m bands (bands two to seven), Sentinel-2 data with 10-m bands (bands two to four, band eight, and bands 11 to 12), and 20-m red-edge bands (bands five to seven) were selected to map cropland extent.Five Landsat-8 operational land imager (OLI) images and two Sentinel-2 multi-spectral instrument (MSI) images with minimum cloud cover during the wheat growth period in 2016 were respectively captured on 2 May (MSI), 5 May (OLI), 6 June (OLI), 25 August (OLI), 9 September (MSI), 10 September (OLI), and 29 November (OLI).These are marked with the corresponding numbers of 0502, 0505, 0606, 0825, 0909, 0910, and 1129.Figure 2 displays the phenology of the main crops in the study area [13,54].The images in (Figure 2) were captured during the winter wheat growing season and are displayed in band synthesis with zoomed in pictures in the first two rows.The "Early_Age" and "Mid_Age" were defined according to the growth stages of monocotyledonous and dicotyledonous plants released by Federal Biological Research Centre for Agriculture and Forestry [54].The 'Early_Age' consists of the principal growth stage of germination, leaf development and tillering.The 'Mid_Age' contains stem elongation, booting, inflorescence emergence and heading, flowering and anthesis, and the development of fruit.

Satellite Imagery
Fine-resolution satellite data was required to map crop fields due to their small and fragmented parcels [53].As a result, Landsat-8 data with 30-m bands (bands two to seven), Sentinel-2 data with 10-m bands (bands two to four, band eight, and bands 11 to 12), and 20-m red-edge bands (bands five to seven) were selected to map cropland extent.Five Landsat-8 operational land imager (OLI) images and two Sentinel-2 multi-spectral instrument (MSI) images with minimum cloud cover during the wheat growth period in 2016 were respectively captured on 2 May (MSI), 5 May (OLI), 6 June (OLI), 25 August (OLI), 9 September (MSI), 10 September (OLI), and 29 November (OLI).These are marked with the corresponding numbers of 0502, 0505, 0606, 0825, 0909, 0910, and 1129.Figure 2 displays the phenology of the main crops in the study area [13,54].The images in (Figure 2) were captured during the winter wheat growing season and are displayed in band synthesis with zoomed in pictures in the first two rows.The "Early_Age" and "Mid_Age" were defined according to the growth stages of monocotyledonous and dicotyledonous plants released by Federal Biological Research Centre for Agriculture and Forestry

Reference Data
The reference data (e.g., Mapping Germany's Agricultural Landscape) was released by the European Space Agency (ESA)'s Copernicus Sentinel-2 Monitoring Program on 29 August 2017, with free access via the website.The dataset covered 21 land-cover categories, including 15 specific crop types between October 2015 and late 2016 with a 30-m resolution.The dataset was used to assist in sketching training sample points.The corresponding types of samples were marked in the same coordinate position as the satellite images based on the crop types in the reference data.It is also helpful in obtaining validation data in ArcGIS 10.4 by manually delineated ground truth

Reference Data
The reference data (e.g., Mapping Germany's Agricultural Landscape) was released by the European Space Agency (ESA)'s Copernicus Sentinel-2 Monitoring Program on 29 August 2017, with free access via the website.The dataset covered 21 land-cover categories, including 15 specific crop types between October 2015 and late 2016 with a 30-m resolution.The dataset was used to assist in sketching training sample points.The corresponding types of samples were marked in the same coordinate position as the satellite images based on the crop types in the reference data.It is also helpful in obtaining validation data in ArcGIS 10.4 by manually delineated ground truth polygons on a high-resolution map downloaded from Google Earth that is well based on the corresponding position in the reference data.This reference dataset was invaluable for training the RF classifier, as well as for assessing classification accuracies and identifying uncertainty.

Methods
The workflow for wheat detection and mapping consisted of the following steps (Figure 3).First, data pre-processing was used to calibrate to an integrated spatial resolution of 30 m, or the same geographical reference based on Landsat data levels.The image sequences were then processed using top of atmosphere reflectance (TOA) data.Second, multi-feature construction and feature combination were used to build different model setups.Then, wheat land detection and mapping were conducted using the random forest algorithm.Lastly, mapping accuracy and comparison analysis were used to assess different model input setups.polygons on a high-resolution map downloaded from Google Earth that is well based on the corresponding position in the reference data.This reference dataset was invaluable for training the RF classifier, as well as for assessing classification accuracies and identifying uncertainty.

Methods
The workflow for wheat detection and mapping consisted of the following steps (Figure 3).First, data pre-processing was used to calibrate to an integrated spatial resolution of 30 m, or the same geographical reference based on Landsat data levels.The image sequences were then processed using top of atmosphere reflectance (TOA) data.Second, multi-feature construction and feature combination were used to build different model setups.Then, wheat land detection and mapping were conducted using the random forest algorithm.Lastly, mapping accuracy and comparison analysis were used to assess different model input setups.Figure 3. Workflow for winter wheat detection and mapping using the random forest algorithm.

Random Forest Algorithm
The random forest algorithm was constructed based on decision trees.The sample subset containing K attributes was randomly selected (sample disturbance) for each node in the decision tree.Then, the optimal attributes were selected as the basis for the division of the nodes (attribute disturbance).The construction process of the random forest algorithm consisted of the following: (1) The bagging [55] was used to randomly return each tree from the original training set, and a total of n trees were used as training sets.
(2) We trained n_tree decision tree models for n_tree training sets, respectively.
(3) The single decision tree model assumed that the number of attributes in a node was equal to K. The best attributes for the split were randomly selected according to the information gain, information gain ratios, and the Gini index from among all the K best splits [56].
(4) Each tree was continuously split using this method until all the training samples of the node belonged to the same class.
(5) The generated decision trees were formed into a random forest, and the final classification results were determined by a multi-tree classifier by vote.
The sampling scheme was implemented based on an overarching principle.Pure pixels with the type of winter wheat or the others inside the respective pixel boundary were visually identified

Random Forest Algorithm
The random forest algorithm was constructed based on decision trees.The sample subset containing K attributes was randomly selected (sample disturbance) for each node in the decision tree.Then, the optimal attributes were selected as the basis for the division of the nodes (attribute disturbance).The construction process of the random forest algorithm consisted of the following: (1) The bagging [55] was used to randomly return each tree from the original training set, and a total of n trees were used as training sets.
(2) We trained n_tree decision tree models for n_tree training sets, respectively.
(3) The single decision tree model assumed that the number of attributes in a node was equal to K. The best attributes for the split were randomly selected according to the information gain, information gain ratios, and the Gini index from among all the K best splits [56].
(4) Each tree was continuously split using this method until all the training samples of the node belonged to the same class.
(5) The generated decision trees were formed into a random forest, and the final classification results were determined by a multi-tree classifier by vote.
The sampling scheme was implemented based on an overarching principle.Pure pixels with the type of winter wheat or the others inside the respective pixel boundary were visually identified and assigned with the correct class label.Then, 100 evenly scattered sampling dots in the study area were randomly generated in ArcGIS 10.4 to construct an unbiased training sample dataset.The artificial sample points were supplemented according to the reference classification map and previous cropland knowledge in order to ensure a sufficient number of samples.Four cases (Figure 4) occurred when random sampling points were scattered on remote sensing images.These occurred when: (1) wheat pixels covered the sampling point, (2) non-wheat pixels covered the sampling point, (3) sampling points were near wheat pixels, and (4) when sampling points were near non-wheat pixels.For the first two cases, the pixel blocks in which the sample points fell were directly marked as training samples for the corresponding categories.For the latter two cases, the pure wheat and non-wheat pixel blocks that were closest to the sampling point were manually selected.Through this interactive sampling strategy, a total of 1587 wheat pixels and 4975 non-wheat pixels were selected and labeled as one and zero to construct the sample dataset for the random forest classifier.
and assigned with the correct class label.Then, 100 evenly scattered sampling dots in the study area were randomly generated in ArcGIS 10.4 to construct an unbiased training sample dataset.The artificial sample points were supplemented according to the reference classification map and previous cropland knowledge in order to ensure a sufficient number of samples.Four cases (Figure 4) occurred when random sampling points were scattered on remote sensing images.These occurred when: (1) wheat pixels covered the sampling point, (2) non-wheat pixels covered the sampling point, (3) sampling points were near wheat pixels, and (4) when sampling points were near non-wheat pixels.For the first two cases, the pixel blocks in which the sample points fell were directly marked as training samples for the corresponding categories.For the latter two cases, the pure wheat and non-wheat pixel blocks that were closest to the sampling point were manually selected.Through this interactive sampling strategy, a total of 1587 wheat pixels and 4975 non-wheat pixels were selected and labeled as one and zero to construct the sample dataset for the random forest classifier.A confusion matrix [57] was used to evaluate the performance and accuracy of the wheat extraction map.It is an effective tool to describe classifier accuracy, and consists of square numbers listed in rows and columns.This matrix can express the number of sample units assigned to a specific category relative to the verified ground truth class.As Google Earth images have been proved sufficient for providing reliable validation data and assessing moderate resolution remote sensing data [58], such as Landsat [59] and MODIS [60,61], as well as in regional ground investigation points [62].Thus, for the validation process, a high-resolution image of 2.94-m was downloaded from the Google Earth platform and carefully calibrated into the same coordinate system as the existed data.The manually delineated ground truth polygon from Google Earth was produced to serve as validation data, and was combined with the classification map generated from remote sensing data to build the confusion matrix.Accuracy indexes containing the overall accuracy, producer's accuracy, user's accuracy [63], f1_score, and kappa [64] values were calculated to examine classification accuracy.

Multi-Input Features
Input features are the most important aspect of the classification scheme, and directly affect the final classification accuracy.Many features are used as model inputs in common land mapping problems.These features can be spectral features consisting of original bands or spectral indexes.Such spectral indexes include: the Normalized Difference Vegetation Index(NDVI) [65], Enhanced Vegetation Index(EVI) [66], Green Vegetation Index(GVI) [67], Green Normalized Difference Vegetation Index(GNDVI) [68], Leaf Area Index(LAI) [69], Normalized Difference Built-Up Index(NDBI) [70], Normalized Difference Snow Index(NDSI) [71], Modified Normalized Difference Water Index(MNDWI) [72,73], Soil Adjusted Vegetation Index(SAVI) [74], Sum Green Index(SGI) [75], and Simple Ratio Index(SR) [76].Texture features can also be included and are constructed by recognizing and extracting pixel grayscale differences in a particular window space, such as: angular second-moment ASM [77], contrast [77], entropy [78], and homogeneity [77].These features also include compressed information features such as principal component transformation variables, as well as temporal features that are suitable for objectives that change over time.Lastly, A confusion matrix [57] was used to evaluate the performance and accuracy of the wheat extraction map.It is an effective tool to describe classifier accuracy, and consists of square numbers listed in rows and columns.This matrix can express the number of sample units assigned to a specific category relative to the verified ground truth class.As Google Earth images have been proved sufficient for providing reliable validation data and assessing moderate resolution remote sensing data [58], such as Landsat [59] and MODIS [60,61], as well as in regional ground investigation points [62].Thus, for the validation process, a high-resolution image of 2.94-m was downloaded from the Google Earth platform and carefully calibrated into the same coordinate system as the existed data.The manually delineated ground truth polygon from Google Earth was produced to serve as validation data, and was combined with the classification map generated from remote sensing data to build the confusion matrix.Accuracy indexes containing the overall accuracy, producer's accuracy, user's accuracy [63], f1_score, and kappa [64] values were calculated to examine classification accuracy.

Multi-Input Features
Input features are the most important aspect of the classification scheme, and directly affect the final classification accuracy.Many features are used as model inputs in common land mapping problems.These features can be spectral features consisting of original bands or spectral indexes.Such spectral indexes include: the Normalized Difference Vegetation Index(NDVI) [65], Enhanced Vegetation Index(EVI) [66], Green Vegetation Index(GVI) [67], Green Normalized Difference Vegetation Index(GNDVI) [68], Leaf Area Index(LAI) [69], Normalized Difference Built-Up Index(NDBI) [70], Normalized Difference Snow Index(NDSI) [71], Modified Normalized Difference Water Index(MNDWI) [72,73], Soil Adjusted Vegetation Index(SAVI) [74], Sum Green Index(SGI) [75], and Simple Ratio Index(SR) [76].Texture features can also be included and are constructed by recognizing and extracting pixel grayscale differences in a particular window space, such as: angular second-moment ASM [77], contrast [77], entropy [78], and homogeneity [77].These features also include compressed information features such as principal component transformation variables, as well as temporal features that are suitable for objectives that change over time.Lastly, red edge features can be included, which are derived from the development of satellite sensors for vegetation detection [79].
The feature space in this study was constructed based on the multispectral remote sensing data mentioned above.Moreover, it was based on the features as follows: (1) Original spectral bands (B), including the blue band, green band, red band, near-infrared (NIR) band, SWIR1 band, and SWIR2 band, which consist of bands two to seven from Landsat-8 and bands two to four and five to seven from Sentinel-2, respectively.These bands are useful for earth observation, especially vegetation identification.
(2) Spectral indices (S), including NDVI, which is sensitive to vegetation; NDBI, which is helpful for impervious surface recognition, NDSI, which is highly-responsive to snow and cloud; and MNDWI, which is important to distinguish water bodies.All these spectral indices are selected for their effectiveness in specific earth surface objects extraction.
(3) Principal Component Analysis (PCA) (P), which contain the first three components pca1 to pca3.These three components are able to highly generalize band information.
(4) Differences in NDVI (D) information that can reflect phenology information about different crops (5) The red edge feature space (R) was also included in this study, and was represented by the red edge band and vegetation index derived from Sentinel-2 data, as red-edge bands are very effective for reflecting vegetation health status and growth information.
A total of 131 multi-features (MF) (Table 2) were selected as predictor variables, and were combined into 30 feature subsets (MFS) containing single feature and combined feature types (Figure 5).In these subsets, the first 10 variables with the highest importance scores were used as model inputs.The 30 MFSs were further divided into five multi-patterns (MP) named: I, II, III, IV, and V.These were based on the number of feature types contained in each MFS (Figure 5).For example, the I pattern indicated that the features used for classification were from the same single feature type (e.g., B, S, P, D, and R).The II pattern suggested that the features were from the combined feature type (e.g., BS, BP, BD, BR, SP, SD, SR, PD, PR, and DR).

Key Predictor Variables in MFs
Temporal features were the most critical components among the variables involved in the classifier (Table 2).In the single-temporal feature method, the highest scoring features were concentrated in the periods of 502, 505, 825, and 909.The phenology information revealed that June was the peak growing period for winter wheat, but other crops in the same area were also too similar to distinguish.The periods including May and August through September were the key growth periods where the more obvious characteristics of wheat were revealed.These periods occurred while other crops were sown or harvested.The classification based on these two stages provided the best results for distinguishing between crop objects and improved classification accuracy.Crop phase characteristics were essential in the combination method, and their importance far exceeded other features.
The spectral indices of features were another key factor in crop discrimination.The NDVI was the main factor in identifying single-spectral index features.The winter wheat grown in May had obvious differences with other crops.These differences could be distinguished using the NDVI and NDSI, resulting in high scores on the chart.The winter wheat grown and harvested in August and September had sparse or bare farming plots, which were easily confused with impervious surfaces such as buildings.The NDBI method proved to be valuable for removing buildings and related objects.In addition to these two critical periods, MNDWI and SAVI were slightly inferior to the previously mentioned features.The NDVI difference information contained a certain degree of spectral information in the combination method.Therefore, the other spectral indices were less important than the temporal characteristics.
Band and PCA features were the second most important predictors for wheat detection.In single-band features, B2, B3, and B4 were the key bands for retrieving vegetation information and distinguishing between vegetation objects.As for PCA features, the first and second principal components were sufficient to contain ground information in a single-PCA method.The main function of PCA was to reduce dimensionality and retain the maximum amount of information, thus it was more important for multi-band data, such as PCA (1,2) _502 (Sentinel-2 data) and PCA (1,2) _505 (Landsat-8 data).
The red edge spectrum is closely related to various physical and chemical parameters of vegetation and is an important indicator for describing plant pigment color and overall health.Other indicators include: canopy LAI estimation [80], chlorophyll density evaluation [81], and leaf element absorption [82].Vegetation information was abundant in May, so the corresponding red

Key Predictor Variables in MFs
Temporal features were the most critical components among the variables involved in the classifier (Table 2).In the single-temporal feature method, the highest scoring features were concentrated in the periods of 502, 505, 825, and 909.The phenology information revealed that June was the peak growing period for winter wheat, but other crops in the same area were also too similar to distinguish.The periods including May and August through September were the key growth periods where the more obvious characteristics of wheat were revealed.These periods occurred while other crops were sown or harvested.The classification based on these two stages provided the best results for distinguishing between crop objects and improved classification accuracy.Crop phase characteristics were essential in the combination method, and their importance far exceeded other features.
The spectral indices of features were another key factor in crop discrimination.The NDVI was the main factor in identifying single-spectral index features.The winter wheat grown in May had obvious differences with other crops.These differences could be distinguished using the NDVI and NDSI, resulting in high scores on the chart.The winter wheat grown and harvested in August and September had sparse or bare farming plots, which were easily confused with impervious surfaces such as buildings.The NDBI method proved to be valuable for removing buildings and related objects.In addition to these two critical periods, MNDWI and SAVI were slightly inferior to the previously mentioned features.The NDVI difference information contained a certain degree of spectral information in the combination method.Therefore, the other spectral indices were less important than the temporal characteristics.
Band and PCA features were the second most important predictors for wheat detection.In single-band features, B2, B3, and B4 were the key bands for retrieving vegetation information and distinguishing between vegetation objects.As for PCA features, the first and second principal components were sufficient to contain ground information in a single-PCA method.The main function of PCA was to reduce dimensionality and retain the maximum amount of information, thus it was more important for multi-band data, such as PCA (1,2) _502 (Sentinel-2 data) and PCA (1,2) _505 (Landsat-8 data).
The red edge spectrum is closely related to various physical and chemical parameters of vegetation and is an important indicator for describing plant pigment color and overall health.Other indicators include: canopy LAI estimation [80], chlorophyll density evaluation [81], and leaf element absorption [82].Vegetation information was abundant in May, so the corresponding red edge bands and indices scored higher than in other seasons using the single-red edge method.Red edge information also played an important role in the combination method, which is observed in the histogram of relative combined feature types (Figure 5).

Mapping Accuracy under 30 MFSs and Five MPs in Three Zones
Three different geo-scenarios (Figure 6) were selected in the study area to display the wheat mapping results for comparison with classification results (Figure 7).These scenarios were organized as pure farmland, urban mixed areas, and forested areas, which were marked as Zone 1, Zone 2, and Zone 3. Wheat crops were extracted using 30 multi-feature subsets (MFSs) and five multi-patterns (MPs), as mentioned above.The areas had the following characteristics.The pure farmland consisted of a homogeneous farming plot, simple surface landscape structure, and different objects with the same spectrum.The urban mixed area featured abundant ground objects and notable differences in spectral characteristics.The forested area had high field fragmentation, low planting density, and irregularly shaped farming plots.
Remote Sens. 2019, 11, 535 11 of 26 edge bands and indices scored higher than in other seasons using the single-red edge method.Red edge information also played an important role in the combination method, which is observed in the histogram of relative combined feature types (Figure 5).

Mapping Accuracy under 30 MFSs and Five MPs in Three Zones
Three different geo-scenarios (Figure 6) were selected in the study area to display the wheat mapping results for comparison with classification results (Figure 7).These scenarios were organized as pure farmland, urban mixed areas, and forested areas, which were marked as Zone 1, Zone 2, and Zone 3. Wheat crops were extracted using 30 multi-feature subsets (MFSs) and five multi-patterns (MPs), as mentioned above.The areas had the following characteristics.The pure farmland consisted of a homogeneous farming plot, simple surface landscape structure, and different objects with the same spectrum.The urban mixed area featured abundant ground objects and notable differences in spectral characteristics.The forested area had high field fragmentation, low planting density, and irregularly shaped farming plots.Figure 7 shows the classification results for the three regions using 30 feature subsets (MFSs).The mapping results in each MFS had the same spatial distribution as the ground truth polygons in   The lowest overall accuracy (OA) values in the three zones were 84.84%, 91.14%, and 89.94%, while the highest values were 88.15%, 93.70%, and 94.15%, respectively.The lowest values for producer's accuracy, which has important values for land mapping, were 75.45%, 69.69%, and 57.66%, while the highest values were 84.10%, 85.91%, and 76.60%, respectively.In agriculture, this level of precision is adequate for most applications based on crop mapping.The AUC value represented model classification performance.The lowest AUC values were 0.8403, 0.8316, and 0.7650, and the highest values were 0.8743, 0.9058, and 0.8553, respectively.These results suggest that the 30 MFSs used for modeling in this study provided good classification performance in different regions.The kappa coefficient indicated the consistency between the mapping results and ground truth value.The lowest kappa values were 0.6831, 0.7078, and 0.5000, and the highest values were 0.7519, 0.8028, and 0.6492, respectively.These measurements suggest that the mapping results for the three regions were in good agreement with the actual crop distribution.Figure 7 shows the classification results for the three regions using 30 feature subsets (MFSs).The mapping results in each MFS had the same spatial distribution as the ground truth polygons in Figure 6.This reflects the robustness of the random forests algorithm for multiple variables and surface conditions, which is represented in Table 3. Table 3 also summarizes the statistical accuracy index corresponding to the classification map in Figure 7.Each classification map was evaluated using six indexes, and included the overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), area under curve (AUC), f_scores, and kappa coefficient [41].Manually delineated ground truth polygons (shapefile format) from Section 2.3 were then converted into raster format with the same resolution of 30-m, and were all used as validation samples in a total of 100,445 pixels in Zone 1, 49,968 pixels in Zone 2, and 24,432 pixels in Zone 3.
The lowest overall accuracy (OA) values in the three zones were 84.84%, 91.14%, and 89.94%, while the highest values were 88.15%, 93.70%, and 94.15%, respectively.The lowest values for producer's accuracy, which has important values for land mapping, were 75.45%, 69.69%, and 57.66%, while the highest values were 84.10%, 85.91%, and 76.60%, respectively.In agriculture, this level of precision is adequate for most applications based on crop mapping.The AUC value represented model classification performance.The lowest AUC values were 0.8403, 0.8316, and 0.7650, and the highest values were 0.8743, 0.9058, and 0.8553, respectively.These results suggest that the 30 MFSs used for modeling in this study provided good classification performance in different regions.The kappa coefficient indicated the consistency between the mapping results and ground truth value.The lowest kappa values were 0.6831, 0.7078, and 0.5000, and the highest values were 0.7519, 0.8028, and 0.6492, respectively.These measurements suggest that the mapping results for the three regions were in good agreement with the actual crop distribution.
Table 3.This table displays the wheat mapping accuracy for the 30 multi-feature subsets (MFSs).The value of the red background filling represents the lowest value of the index, and the green background filling with different depths indicates the top three highest accuracies.The measurement unit of overall accuracy (OA), producer's accuracy (PA), and user's accuracy (UA) is %.AUC: area under curve.B: original spectral bands, S: spectral indices, P: principal component information, D: differences in NDVI information that can reflect phenology information about different crops, R: red edge feature space.3. The classification methods with better accuracy were mostly concentrated in patterns II, II, and IV, which were combined with two to four feature sources.This result suggests that crop classification should contain at least two or more types of classification features to achieve high node purity.However, classification accuracy is not always increased with the enrichment of classification features.Moreover, it may remain stable at a certain level, or even slightly decline.

Feature
Figure 8 displays the precision among multi-patterns (MPs).The single type features had the lowest classification accuracies, which are indicated in Table 3.The classification methods with better accuracy were mostly concentrated in patterns II, II, and IV, which were combined with two to four feature sources.This result suggests that crop classification should contain at least two or more types of classification features to achieve high node purity.However, classification accuracy is not always increased with the enrichment of classification features.Moreover, it may remain stable at a certain level, or even slightly decline.
OA is an example of this effect.The inter-class precision results for Zone 1 and Zone 2 (Figure 9) revealed an initial increase, followed by a decrease in accuracy.Model accuracy was also affected by the bias of the model feature selection process in addition to the model robustness and sample limitations.The accuracy of the contribution of other features was minimal when the model only considered some features to be important, and adding other similar features did not increase the accuracy.In Zone 3, the accuracy increased with the increase of feature types due to a significant change in the topographic element of Region 3 compared to the other two regions.New feature types may be necessary for increased accuracy, such as the terrain features for Zone 3.

Accuracy Performance in Three Zones with Multiple Geographical Land Surfaces
Figure 10 displays the accuracy results for each zone.In Zone 1 (Figure 10a), each precision index is concentrated around 86.64%.The overall commission error and omission error were lower than 0.2, with the omission error being slightly higher than the commission error.The average kappa coefficient was 0.7197, and the AUC was 0.8563.In Zone 1, the land mass was homogeneous, OA is an example of this effect.The inter-class precision results for Zone 1 and Zone 2 (Figure 9) revealed an initial increase, followed by a decrease in accuracy.Model accuracy was also affected by the bias of the model feature selection process in addition to the model robustness and sample limitations.The accuracy of the contribution of other features was minimal when the model only considered some features to be important, and adding other similar features did not increase the accuracy.In Zone 3, the accuracy increased with the increase of feature types due to a significant change in the topographic element of Region 3 compared to the other two regions.New feature types may be necessary for increased accuracy, such as the terrain features for Zone 3.  3. The classification methods with better accuracy were mostly concentrated in patterns II, II, and IV, which were combined with two to four feature sources.This result suggests that crop classification should contain at least two or more types of classification features to achieve high node purity.However, classification accuracy is not always increased with the enrichment of classification features.Moreover, it may remain stable at a certain level, or even slightly decline.
OA is an example of this effect.The inter-class precision results for Zone 1 and Zone 2 (Figure 9) revealed an initial increase, followed by a decrease in accuracy.Model accuracy was also affected by the bias of the model feature selection process in addition to the model robustness and sample limitations.The accuracy of the contribution of other features was minimal when the model only considered some features to be important, and adding other similar features did not increase the accuracy.In Zone 3, the accuracy increased with the increase of feature types due to a significant change in the topographic element of Region 3 compared to the other two regions.New feature types may be necessary for increased accuracy, such as the terrain features for Zone 3.

Accuracy Performance in Three Zones with Multiple Geographical Land Surfaces
Figure 10 displays the accuracy results for each zone.In Zone 1 (Figure 10a), each precision index is concentrated around 86.64%.The overall commission error and omission error were lower than 0.2, with the omission error being slightly higher than the commission error.The average kappa coefficient was 0.7197, and the AUC was 0.8563.In Zone 1, the land mass was homogeneous,

Accuracy Performance in Three Zones with Multiple Geographical Land Surfaces
Figure 10 displays the accuracy results for each zone.In Zone 1 (Figure 10a), each precision index is concentrated around 86.64%.The overall commission error and omission error were lower than 0.2, with the omission error being slightly higher than the commission error.The average kappa coefficient Remote Sens. 2019, 11, 535 16 of 25 was 0.7197, and the AUC was 0.8563.In Zone 1, the land mass was homogeneous, and the surface spectral structure was similar, and featured a large cropland area.All these factors were conducive to achieving high and stable classification accuracy.However, the "foreign bodies with the same spectrum" effect between crops was apparent.The German cultivated land was small, but the spatial resolution of the image was not high enough, and may have limited the Zone 1 classification accuracy.
In Zone 2 (Figure 10b), the OA had improved performance, with an average of 92.62%.The commission and omission errors were relatively low, fluctuating at about 0.2.The average Kappa coefficient was 0.7679, which revealed a high consistency between classification and inspection data.The AUC value was 0.8835, which revealed that the models had good classification ability.Zone 2 contained water, urban areas, and vegetation, and featured a richer terrain compared with other areas.The spectral and temporal discrimination between crops and other terrain was higher and easier to distinguish.The terrain near rivers and cities revealed certain distribution characteristics, which was conducive to high classification accuracy.
The OA for Zone 3 had the best performance compared to Zone 1 and Zone 2, with an average precision level of 92.72% (Figure 10c).However, the other indexes had a lower precision and larger fluctuation.The commission and omission errors were close to 0.4, the kappa coefficient was 0.6062, and the AUC was 0.8176.These results revealed a slight weakness in consistency and model capability compared to Zone 1 and Zone 2. Zone 3 was located at the edge of an uplift area in central Germany.Forests constituted the main vegetation in the region, along with a small amount of farmland.The differences between ground features allowed for an increase in the extraction precision for winter wheat.However, some farm blocks were small, fragmented, and irregular in shape, and reflectivity characteristics were affected by topographic fluctuations to a certain extent.All these factors may have caused high commission and omission errors in the specified zones.and the surface spectral structure was similar, and featured a large cropland area.All these factors were conducive to achieving high and stable classification accuracy.However, the "foreign bodies with the same spectrum" effect between crops was apparent.The German cultivated land was small, but the spatial resolution of the image was not high enough, and may have limited the Zone 1 classification accuracy.
In Zone 2 (Figure 10b), the OA had improved performance, with an average of 92.62%.The commission and omission errors were relatively low, fluctuating at about 0.2.The average Kappa coefficient was 0.7679, which revealed a high consistency between classification and inspection data.The AUC value was 0.8835, which revealed that the models had good classification ability.Zone 2 contained water, urban areas, and vegetation, and featured a richer terrain compared with other areas.The spectral and temporal discrimination between crops and other terrain was higher and easier to distinguish.The terrain near rivers and cities revealed certain distribution characteristics, which was conducive to high classification accuracy.
The OA for Zone 3 had the best performance compared to Zone 1 and Zone 2, with an average precision level of 92.72% (Figure 10c).However, the other indexes had a lower precision and larger fluctuation.The commission and omission errors were close to 0.4, the kappa coefficient was 0.6062, and the AUC was 0.8176.These results revealed a slight weakness in consistency and model capability compared to Zone 1 and Zone 2. Zone 3 was located at the edge of an uplift area in central Germany.Forests constituted the main vegetation in the region, along with a small amount of farmland.The differences between ground features allowed for an increase in the extraction precision for winter wheat.However, some farm blocks were small, fragmented, and irregular in shape, and reflectivity characteristics were affected by topographic fluctuations to a certain extent.All these factors may have caused high commission and omission errors in the specified zones.

Optimum Season Selection for MFs
Vegetation is one of Earth's features that has an observable cycle.Moreover, vegetation has different growth periods based on its structure and the climate.Crops are a special vegetation type that is affected by natural water and heat conditions as well as anthropomorphic processes.The "same thing different spectrum" and "foreign thing same spectrum" effect was apparent in our observations.Based on this characteristic, crop phenology information is essential for distinguishing between crop features [83].The most important task for crop extraction in a certain region is to fully understand the phenological history of local crops and identify the key seasons and growing periods (Figure 11).
In the study area, the main crops were winter wheat, maize, oilseed rape, and sugar beet.The strongest features for winter wheat were revealed during the June growth period.However, oilseed rape and sugar beet also featured strong growth during this period.As a result, the vegetation features in June had little effect on the differentiation between winter wheat and other crops.However, wheat crops experienced vigorous growth in May, while other crops were in the early growth stages and had weaker spectral features.Thus, May was identified as the best period for distinguishing winter wheat features from other crops.The wheat harvest season is in August-September, during which maize and sugar beets are still growing.Winter wheat can be distinguished from these two crops by examining the differences in growth.Thus, in this study, August and September were also effective periods for identifying wheat crops.Wheat crop features were mostly indistinguishable during December due to the crop being recently sown.This phenology analysis also confirms the seasonal differences in Figure 5.

Optimum Season Selection for MFs
Vegetation is one of Earth's features that has an observable cycle.Moreover, vegetation has different growth periods based on its structure and the climate.Crops are a special vegetation type that is affected by natural water and heat conditions as well as anthropomorphic processes.The "same thing different spectrum" and "foreign thing same spectrum" effect was apparent in our observations.Based on this characteristic, crop phenology information is essential for distinguishing between crop features [83].The most important task for crop extraction in a certain region is to fully understand the phenological history of local crops and identify the key seasons and growing periods (Figure 11).
In the study area, the main crops were winter wheat, maize, oilseed rape, and sugar beet.The strongest features for winter wheat were revealed during the June growth period.However, oilseed rape and sugar beet also featured strong growth during this period.As a result, the vegetation features in June had little effect on the differentiation between winter wheat and other crops.However, wheat crops experienced vigorous growth in May, while other crops were in the early growth stages and had weaker spectral features.Thus, May was identified as the best period for distinguishing winter wheat features from other crops.The wheat harvest season is in August-September, during which maize and sugar beets are still growing.Winter wheat can be distinguished from these two crops by examining the differences in growth.Thus, in this study, August and September were also effective periods for identifying wheat crops.Wheat crop features were mostly indistinguishable during December due to the crop being recently sown.This phenology analysis also confirms the seasonal differences in Figure 5.

Optimum Season Selection for MFs
Vegetation is one of Earth's features that has an observable cycle.Moreover, vegetation has different growth periods based on its structure and the climate.Crops are a special vegetation type that is affected by natural water and heat conditions as well as anthropomorphic processes.The "same thing different spectrum" and "foreign thing same spectrum" effect was apparent in our observations.Based on this characteristic, crop phenology information is essential for distinguishing between crop features [83].The most important task for crop extraction in a certain region is to fully understand the phenological history of local crops and identify the key seasons and growing periods (Figure 11).
In the study area, the main crops were winter wheat, maize, oilseed rape, and sugar beet.The strongest features for winter wheat were revealed during the June growth period.However, oilseed rape and sugar beet also featured strong growth during this period.As a result, the vegetation features in June had little effect on the differentiation between winter wheat and other crops.However, wheat crops experienced vigorous growth in May, while other crops were in the early growth stages and had weaker spectral features.Thus, May was identified as the best period for distinguishing winter wheat features from other crops.The wheat harvest season is in August-September, during which maize and sugar beets are still growing.Winter wheat can be distinguished from these two crops by examining the differences in growth.Thus, in this study, August and September were also effective periods for identifying wheat crops.Wheat crop features were mostly indistinguishable during December due to the crop being recently sown.This phenology analysis also confirms the seasonal differences in Figure 5.

Optimal MP Selection from MFSs
The parameter settings for the random forest algorithm model are not complicated.Even with the default forest parameter structure, good classification results can be obtained by selecting typical features from the classification target.There are many crops characteristics, but vegetation classification is still limited to spectral reflectance and growth periodicity.
In terms of spectral characteristics, the main feature types include: (1) a basic original broad band, such as the visible green, red, and near-red bands; (2) a narrow band carrying key information for vegetation, such as the red edge band (band selection mainly depends on the spectral resolution of the satellite sensor); (3) principal component analysis used to produce a band consisting of the new dimension obtained by the spatial transformation of the original band (moreover, it can reduce the dimensionality of the original wave band); and (4) a vegetation index obtained using band calculations in the key position of the spectral curve for different ground objects.Then, a spectral index can be constructed to effectively separate ground objects.Common vegetation indices have become effective tools for distinguishing vegetation.They are constructed using the visible red and near infrared band, as well as a red edge vegetation index based on the red edge band.However, the phenomena of "foreign body with same spectrum" and "same body with foreign spectrum" are still inevitable due to the limited spectral resolution of medium-resolution satellites.Therefore, there are still high commission and omission errors (Table 3).
The rapid revisit period and free acquisition of medium-resolution satellite imagery has enabled the construction of time series datasets such as NDVI stack files and EVI trajectory [84,85].Most scholars perform the classification of ground objects using time series analysis.However, this method requires mastery of the complete time series files for the ground object spectrum.Moreover, cloud interference can also lead to a partial absence of time series images.Thus, the monotype of temporal features that is needed for crop recognition can be unreliable.
These two facts suggest that it is not enough to only use the monotype of features in crop classification based on machine learning algorithms.Nevertheless, the infinite types of features are still unexpected, since they will lead to bias in feature selection and computational redundancy.The crop extraction model should accept MFSs, including at least three or four different feature types (MP-III or MP-IV) in the fields of spectral reflectance and growth periodicity.

Factors Affecting Accuracy in Three Zones
The factors influencing the three zones can be inferred based on differences in classification accuracy.OA was used as the evaluation index, and the top five accuracies for the 30 MFSs were listed, and feature types were counted in the corresponding MFSs.
The result of Zone 1 (Table 4) reveals that the band (B) information may have been the main influencing factor.This is mainly due to this region being located on a pure farmland environment with homogeneous background information.Furthermore, the differences between crops were mainly identified according to their spectral heterogeneity.Further improvement in the accuracy of crop extraction for pure farmland areas may depend on improvement of the spectral and spatial resolution of satellite sensors.Furthermore, the spectral differences between different crops are best captured using narrower spectral bands.
The results from Table 4 suggest that the spectral indices (S) were the main factor influencing the accuracy of the results in Zone 2 (Table 4).This zone was located in a mixed urban area.Spectral indices can enhance the spectral information of different types of terrain, providing improved classification performance.For example, the NDVI used in this paper can enhance vegetation information, and the NDBI can enhance impermeable surface information.Furthermore, the MNDWI can enhance water information, and the NDSI can enhance cloud, snow, and bare soil information.Crop information can be more effectively retrieved from complex mixed backgrounds using these methods.Thus, the crop classification of urban mixed areas should be combined with an assessment of the regional environment.Moreover, a spectral index must be selected that can effectively enhance the terrain information and improve classification accuracy.
Zone 3 was located in a forested area, which featured a large variety of vegetation.The crops in this area exhibit distinct growth cycles for certain vegetation types (e.g., forests).This made the time-phase information (D) based on crop growth cycles the most important factor affecting classification accuracy in Zone 3 (Table 4).The area also included hilly terrain, which had the effect of disturbing the spectral reflectance characteristics of local terrain objects.Principal component information (P) was based on a spatial transformation, and is used to retain important spectral information.It was one of the main factors affecting classification accuracy.The temporal information for crop extraction in mountainous forest region was based on crop-specific growth information and was constructed to distinguish vegetation types.Topography was also a major factor affecting the reflectance characteristics of ground objects, which requires further consideration to improve the classification accuracy in the future.With the continuous development of high spatial resolution images, the extraction of remote sensing information based on pixels will face challenges because of the large amount of data.In contrast, object-based information extraction can make full use of the spatial information, geometric structure, and texture information of the image, and gradually highlight its advantages in the process of information extraction of high spatial resolution remote sensing images.The object-based approach firstly divides the land into several parts in terms of influence, and then takes the divided land as the smallest classification unit on the basis of obtaining relatively homogeneous land blocks [27].For example, in prior research, blocks after multi-scale segmentation were taken as basic classification units, and spectral, texture, shape, and other variables were constructed [86].The random forest algorithm is used to achieve better recognition results.In future research, we can further explore the object-based random forest feature recognition method under multi-source data.The advantage of this method lies in the regularity of the segmented image spots, which can effectively overcome the noises that are easily generated in the pixel-based classification method.At the same time, this method can make full use of the spatial structure and texture information in the image, and can effectively reduce the error rate in the fragmented and discontinuous areas of the ground objects, thus making the result of ground object recognition more accurate.Nowadays, the object-based object recognition method can be combined with the machine learning method to achieve better recognition results.

Advantages and Limitations of Approach in This Article
This paper proves that crop recognition and extraction with high accuracy can be achieved at the regional scale by using a limited and reasonable combination of feature spaces.To a certain extent, this paper only achieved 30-m winter wheat mapping in a small study area.Based on the experimental results of this paper, there may be some limitations in large-scale crop extraction research.However, the idea of simplifying and optimizing feature space can also provide a reference for crop extraction on a large scale to a certain extent.
Firstly, through the analysis and discussion of feature space in this paper, we can choose efficient feature variables as the input variables of the classifier from the two aspects of spectral and temporal features in future large-scale regional mapping.Secondly, for crop information extraction in large regions, considering operational efficiency and accuracy, it is usually necessary to divide large areas into certain surface units, and then calculate the information for each surface unit separately, and finally integrate the results of partitioning into the results of the whole large area.In this paper, the influence factors on the extraction accuracy of the three different regions discussed in Section 4.3 can provide some ideas for the future research of information extraction by block.For example, when large regions cover farming areas, urban mixed areas, and hilly areas, we can separate them according to their different geomorphological and environmental characteristics.For separated farming areas, it is helpful to focus on building features based on band information.As for urban mixed areas, attention can be paid to features that based on spectral index.This targeted feature building process can help achieve rapid and accurate information extraction.
Of course, information extraction at a large regional scale is also facing a variety of other problems and challenges.Possible factors are still under consideration in the process of implementation to further improve the theory and research methods.

Conclusions
In this study, Landsat-8 OLI and Sentinel-2 MSI data were combined for wheat extraction and mapping using the random forest machine learning algorithm.The mapping results for three regions with different background objects suggest that the random forest classifier was conducted well using multi-features (MFs), multi-feature setups (MFSs), and multi-patterns (MPs).Comparison analysis was conducted for MFs, MFSs, MFs, and three spatial zones to find the optimal features and combination modes, as well as the factors affecting zone accuracy.Compared with previous studies on crop classification, which needed to construct a variety of characteristic variables, this paper discusses the construction of feature space and a feature combination method, and concludes that it does not need too many kinds of features to participate in classification in order to achieve high accuracy.The results revealed that the three to four-feature type was an important factor in feature combinations, which was derived from spectral reflectance and growth periodicity.This provides a positive way for us to simplify feature construction and improve classification efficiency in the future.Moreover, the regional surface environment should be considered during feature construction.Certain features can enhance the information of the target object and mask other objects.Such features (e.g., spectral indices, phenological features) aid in extracting objects from background information.In future research, we can construct and extract features according to the characteristics of the local crop growth environment, so as to improve the classification speed and ensure high classification accuracy.Additionally, the classification attributes should be reconsidered when there are new factors affecting the reflectance characteristics of crops in the region, such as: elevation, texture, or shape.Finally, this paper combines the red-edge band features of Sentinel-2 data in the construction of feature space.Experiments show that using red-edge information as a crop extraction feature is effective, which has a certain scientific value in the future to play the role of the red-edge band in crop extraction research.Generally speaking, this paper discusses the optimization of crop extraction features and puts forward some views, which has a positive reference significance for crop information extraction research.

Figure 1 .
Figure 1.Study area location depicted by a shaded terrain map, as well as a visualization of the cultivated land in Landsat-8 imagery with random sampling points.

Figure 1 .
Figure 1.Study area location depicted by a shaded terrain map, as well as a visualization of the cultivated land in Landsat-8 imagery with random sampling points.
[54].The 'Early_Age' consists of the principal growth stage of germination, leaf development and tillering.The 'Mid_Age' contains stem elongation, booting, inflorescence emergence and heading, flowering and anthesis, and the development of fruit.

Figure 3 .
Figure 3. Workflow for winter wheat detection and mapping using the random forest algorithm.

Figure 4 .
Figure 4. Four cases occurred when random sampling points were scattered on remote sensing images.

Figure 4 .
Figure 4. Four cases occurred when random sampling points were scattered on remote sensing images.

Figure 5 .
Figure 5.The 30 multi-feature subsets (MFSs) containing the top 10 variables were ranked based on feature importance in each MFS, as well as five multi-patterns (MPs) named I, II, III, IV, and V. (a) Top 10 winter wheat predictor variables derived from the single feature type; (b) Top 10 winter wheat predictor variables derived from the combination of two feature types; (c) Top 10 winter wheat predictor variables derived from the combination of three feature types; (d) Top 10 winter wheat predictor variables derived from the combination of four feature types; and (e) Top 10 winter wheat predictor variables derived from the combination of two feature types.Each bar represents the importance score attributed to a predictor in a model run.

Figure 5 .
Figure 5.The 30 multi-feature subsets (MFSs) containing the top 10 variables were ranked based on feature importance in each MFS, as well as five multi-patterns (MPs) named I, II, III, IV, and V. (a) Top 10 winter wheat predictor variables derived from the single feature type; (b) Top 10 winter wheat predictor variables derived from the combination of two feature types; (c) Top 10 winter wheat predictor variables derived from the combination of three feature types; (d) Top 10 winter wheat predictor variables derived from the combination of four feature types; and (e) Top 10 winter wheat predictor variables derived from the combination of two feature types.Each bar represents the importance score attributed to a predictor in a model run.

Figure 6 .
Figure 6.Diagram of three different geomorphic regions located in the study area.The region with blue validation polygons is marked as Zone 1, and is located in pure farmland areas.The region with red validation polygons is marked as Zone 2, and is located in urban mixed areas.The region with yellow validation polygons is marked as Zone 3, and is located in forested areas.

Figure 6 .
Figure 6.Diagram of three different geomorphic regions located in the study area.The region with blue validation polygons is marked as Zone 1, and is located in pure farmland areas.The region with red validation polygons is marked as Zone 2, and is located in urban mixed areas.The region with yellow validation polygons is marked as Zone 3, and is located in forested areas.

Figure 8
Figure 8 displays the precision among multi-patterns (MPs).The single type features had the lowest classification accuracies, which are indicated in Table3.The classification methods with better accuracy were mostly concentrated in patterns II, II, and IV, which were combined with two to four feature sources.This result suggests that crop classification should contain at least two or more types of classification features to achieve high node purity.However, classification accuracy is not always increased with the enrichment of classification features.Moreover, it may remain stable at a certain level, or even slightly decline.

Figure 9 .
Figure 9. Line chart of precision among multi-patterns (MPs) when using OA as an indicator.

26 Figure 8
Figure 8 displays the precision among multi-patterns (MPs).The single type features had the lowest classification accuracies, which are indicated in Table3.The classification methods with better accuracy were mostly concentrated in patterns II, II, and IV, which were combined with two to four feature sources.This result suggests that crop classification should contain at least two or more types of classification features to achieve high node purity.However, classification accuracy is not always increased with the enrichment of classification features.Moreover, it may remain stable at a certain level, or even slightly decline.OA is an example of this effect.The inter-class precision results for Zone 1 and Zone 2 (Figure9) revealed an initial increase, followed by a decrease in accuracy.Model accuracy was also affected by the bias of the model feature selection process in addition to the model robustness and sample limitations.The accuracy of the contribution of other features was minimal when the model only considered some features to be important, and adding other similar features did not increase the accuracy.In Zone 3, the accuracy increased with the increase of feature types due to a significant change in the topographic element of Region 3 compared to the other two regions.New feature types may be necessary for increased accuracy, such as the terrain features for Zone 3.

Figure 9 .
Figure 9. Line chart of precision among multi-patterns (MPs) when using OA as an indicator.

Figure 9 .
Figure 9. Line chart of precision among multi-patterns (MPs) when using OA as an indicator.

Figure 10 .
Figure 10.Accuracy results for each zone.(a) Accuracy results for Zone 1; (b) Accuracy results for Zone 2; and (c) Accuracy results for Zone 3.

Figure 10 .
Figure 10.Accuracy results for each zone.(a) Accuracy results for Zone 1; (b) Accuracy results for Zone 2; and (c) Accuracy results for Zone 3.

Table 1 .
Regional and global remote sensing data products for crop and land-cover classification.

Table 2 .
The 131 multi-features (MF) derived from Landsat-8 and Sentinel-2 imagery used as predictor variables for the random forest model.

Table 4 .
The main factors affecting the zone accuracy are represented by the maximum number marked in red.The measurement unit of OA is %.
4.4.Prospects of Object-Based Approaches Compared to Pixel-Based Approaches