Mapping Winter Wheat with Combinations of Temporally Aggregated Sentinel-2 and Landsat-8 Data in Shandong Province, China

: Winter wheat is one of the major cereal crops in China. The spatial distribution of winter wheat planting areas is closely related to food security; however, mapping winter wheat with time-series finer spatial resolution satellite images across large areas is challenging. This paper explores the potential of combining temporally aggregated Landsat-8 OLI and Sentinel-2 MSI data available via the Google Earth Engine (GEE) platform for mapping winter wheat in Shandong Province, China. First, six phenological median composites of Landsat-8 OLI and Sentinel-2 MSI reflectance measures were generated by a temporal aggregation technique according to the winter wheat phenological calendar, which covered seedling, tillering, over-wintering, reviving, jointing-heading and maturing phases, respectively. Then, Random Forest (RF) classifier was used to classify multi-temporal composites but also mono-temporal winter wheat development phases and mono-sensor data. The results showed that winter wheat could be classified with an overall accuracy of 93.4% and F1 measure (the harmonic mean of producer’s and user’s accuracy) of 0.97 with temporally aggregated Landsat-8 and Sentinel-2 data were combined. As our results also revealed, it was always good to classify multi-temporal images compared to mono-temporal imagery (the overall accuracy dropped from 93.4% to as low as 76.4%). It was also good to classify Landsat-8 OLI and Sentinel-2 MSI imagery combined instead of classifying them individually. The analysis showed among the mono-temporal winter wheat development phases that the maturing phase's and reviving phase's data were more important than the data for other mono-temporal winter wheat development phases. In sum, this study confirmed the importance of using temporally aggregated Landsat-8 OLI and Sentinel-2 MSI data combined and identified key winter wheat development phases for accurate winter wheat classification. These results can be useful to benefit on freely available optical satellite data (Landsat-8 OLI and Sentinel-2 MSI) and prioritize key winter wheat development phases for accurate mapping winter wheat planting areas across China and elsewhere.


Introduction
Wheat is the most widely grown cereal crop in the world and plays an important role in the food supply, accounting for approximately 20% of human energy consumption [1,2].Therefore, timely and accurate monitoring of wheat planting areas is very important for food security [3].At present, many global and national mapping products for land-cover classification are available, such as, the 500-meter Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Collections [4], the 30-meter Finer Resolution Observation and Monitoring-Global Land Cover (FROM-GLC) from Landsat [5] and the United States Geological Survey (USGS) National Land Cover Database (NLCD) based Landsat [6].These remote sensing datasets provided useful information for studying the spatial distribution and management of land cover and land use (LCLU) at a regional or global scale.However, listed LCLU products were developed mainly for broad LCLU classes (e.g., cropland, forest, and grassland), and few products provide more detailed information about the specific crops [7][8][9].
Earth Observation techniques proved to be a very effective and cost-efficient approach for crop mapping compared to traditional field surveys [10].However, there are still some limitations for crop mapping across large areas with satellite remote sensing.For example, MODIS and Advanced Very High-Resolution Radiometer (AVHRR) data were the main data for mapping crops due to their high temporal resolution and large swath [3,11,12].However, the lower spatial resolution of these imagery sources seriously affects the classification performance and it cannot capture the mixed agricultural landscapes, for example, in China, which has highly fragmented cropland and complex planting structures [13,14].To overcome this limitation, it is necessary to use finer-spatial-resolution data (such as 30 m Landsat imagery).Several studies analyzed the differences in crop phenology and spectral reflectance in the different growing periods to distinguish planted crops accurately.For instance, Liu et al. [15] proposed the use of Landsat-8 OLI and Gaofen-1 images obtained for six different periods of crop growth to map winter wheat in Boxing County, and the result demonstrated that different periods of crops growth observed by images could improve the accuracy of mapping.Hao et al. [16] implemented crop classification with time-series Gaofen-1 data in Manas County of Xinjiang and achieved good performance.Sun et al. [17] mapped crops with multi-temporal Landsat-8 and Sentinel-1 data accurately in urban agricultural regions.Previous studies have demonstrated that multi-temporal imagery can accurately extract crops by benefiting on the phenological characteristics of vegetation types [18][19][20][21].However, these studies at fine resolution often were limited to a small scale (prefecture-level city or county).Therefore, it is unclear how the extraction of phenological characteristics can be robust and result in accurate mapping of crop types across large areas, for instance, winter wheat.
There are two major challenges for large-scale crop mapping with medium-or high-spatialresolution imagery.First, the infrequent revisit cycle and cloud contamination may result in few cloud-free images available during the crop-growing period [22,23].Some methods have thus been presented to use one or two specific phenological seasons' data to identify crops.For example, Dong et al. [7,9] identified paddy rice based on the flooding signatures during the rice transplanting stage, and Park et al. [24] took advantage of different characteristics during the transplanting and harvest phases of Near-Infrared (NIR) and Synthetic Aperture Radar (SAR) backscattering coefficients to extract rice-producing area.However, the reliability of these algorithms still relied on the quality of images in the critical growth seasons of crops [25].Furthermore, the expert experience and prior knowledge are required to define appropriate thresholds or metrics, thus making them not be appropriate for specific tasks [11,26].Second, a large volume of data needs to be acquired and processed, which requires mass storage capacity and tremendous computing power.
With the launch of the European Space Agency's Copernicus Sentinel-2A/B satellites recently, the availability of medium-high resolution data has been further improved.Moreover, integration of Landsat-8 and Sentinel-2A/B could potentially produce at least one image every three days on average [27], which may significantly increase the number of clear-free observations.In general, several approaches have been used to benefit on multi-sensor image records and preserve as much as possible cloud-free observations.For instance, to fill data gaps due to masked out clouds and shadows, image fusion [10], mixed pixel decomposition [28] and temporal aggregation [29] were used.Among them, temporal aggregation probably is a simpler method than others, thus becoming a popular method for operational monitoring of land use and land cover [30].This method reduces an image collection by calculating the metrics (min, max, median, percentile, etc.) of all values at each pixel across the stack of all matching bands [31].The main problem with this method is that it depends on having sufficient data to generate valid aggregated metrics over a period of time [32,33], thus the reliability of temporal aggregation technique is related to the data availability over aggregation period.The number of good-quality images varies from spatially and temporally as well as the revisit frequency of sensors.Therefore, compared with using Landsat-8 and Sentinel-2 data separately, whether the combinations of temporally aggregated Landsat-8 and Sentinel-2 data can discriminate the differences of crop phenology and apply it for accurately crop mapping is worth researching.
Processing large amounts of data may create a high demand for computer performance.The Google Earth Engine (GEE) cloud computing platform can be used to process a mass of images quickly since it provides a concurrent processing way [34,35].The platform not only stores large amounts of data from a range of Earth Observation satellite systems but also enables users to upload their own data for processing, which provides a great opportunity for researchers to make comprehensive use of multi-source remote sensing data.Moreover, GEE has the most advanced cloud-computing capability designed for big data processing [36].
Our major goal was to identify the extent of winter wheat in the typical winter wheat region of China, namely Shandong Province and to test the suitability of Lansat-8 OLI and Sentinel-2 MSI temporal aggregation techniques based on GEE cloud computing platform for accurate winter wheat mapping.Here, we divided time intervals according to major winter wheat development phases and aggregated the optical data from Landsat-8 OLI and Sentinel-2 MSI.Then the Random Forest (RF) classifier was used to map winter wheat.With validation data independent of training datasets, we evaluated the accuracy of the generated maps.Hence, the purpose of this paper was to (1) map winter wheat with time-series Sentinel-2 MSI and Landsat-8 OLI data in Shandong Province (2) test if the combined data effectively improve the classification accuracy compared to Sentinel-2 MSI and Landsat-8 OLI data used individually and (3) identify key winter crop development stages for accurate mapping of winter wheat.

Study Area
The study area is Shandong Province, the second-largest wheat-producing area in China [37].Shandong faces the Bohai Sea in the east and Henan Province and Hebei Province in the west and is part of the North China Plain (Figure 1).The Yellow River flows through the west part of Shandong and enters the Bohai Sea along the northern coast.The area of the Province is 158,000 km 2 , while the cultivated area is 76,070 km 2 .In 2016, the winter wheat sown area of the Province was 38,302.7 km 2 , contributing approximately 16.6% of the total winter wheat sown area in China [37].
This region is located between a humid subtropical and continental zone.Shandong has a typical temperate monsoon climate, and the seasonal characteristics are the following: summer is hot, and winter is cold.High temperatures and precipitation occur primarily in summer, while dry and cold weather occurs in winter.Spring and autumn are shorter than summer and winter, but some coastal cities are exceptions.The average annual temperature is 13.4 °C.The lowest temperature usually occurs in January, and the highest temperature occurs in July.Annual precipitation ranges between 550 and 950 mm, decreasing from southeast to northwest.There are sufficient light resources for winter wheat production with an average of 2290-2890 hours of light per year, and the thermal conditions can meet the requirements for crop growth.Most of the Province represents the plains, with a few mountains in the middle of the Province.Therefore, biophysical and orography conditions of Shandong Province are very suitable for the growth of wheat.Winter wheat is sown in September-October and matures from late May to June of the next year in this region (Table 1).

Data and Preprocessing
All processing of remote sensing imagery was made on the GEE cloud computing platform (https://earthengine.google.org/),which stores processed Landsat-8 Operational Land Imager (OLI) and Sentinel-2 A/B Multi-Spectral Instrument (MSI) products as well as various image-processing algorithms.Therefore, GEE makes it possible to handle and process big data with state-of-the-art cloud computing capacity.

Remote Sensing Data
In the GEE data catalog, the archived Landsat-8 Top of Atmosphere (TOA) reflectance collection has been orthorectified and geographically registered.Sentinel-2 MSI data contain two products, Level-1C and Level-2A.The Level-1C product has also been orthorectified and geographically registered, and the Level-2A product has been atmospherically corrected based on Level-1C product.However, the Level-2A product before 2019 cannot be downloaded from GEE, though atmospheric correction can be implemented locally and ingest to GEE, tremendous computation and storage demanding to make it unsuitable for our experiment, therefore, the Level-1C product was used for this study.The temporal resolution of Landsat-8 is 16 days, and that of Sentinel-2 is five days.The spatial resolutions of these two sensors are 30 and 10/20/60 m, respectively.Thus we used the nearest neighbor algorithm to resample the spatial resolution of Sentinel-2 images to 30 m to match Landsat-8 imagery.Both sensors have multi-bands, but only blue, green, red, NIR, and two Short-Wave Infrared Spectral Range (SWIR) bands were selected as classified features because these bands have similar Spectral Response Functions (SRF) (Table 2) in the two sensors.The "B8A" band of Sentinel-2 data were used for "NIR" to match Lansat-8 data better [29].Images with a cloud score below 50% were retained.For Sentinel-2 imagery, clouds were masked based on cloudy pixels identified by the assessment band (QA60).Clouds of all Landsat-8 OLI images were masked by a cloud-masking procedure that detects clouds by brightness, temperature, and the normalized difference snow index [38].In total, 1813 Sentinel-2 Level-1C images and 276 Landsat-8 TOA images from October 1st 2018 to June 10th 2019, were used in this study (Figure 2).

Sample Data Acquisition
We collected reference data (including training and validation samples) from the combination of GNSS (Global Navigation Satellite System) fieldwork and very-high-resolution imagery.According to the objectives of our research and the actual status of the study area, five land-cover types were sampled: winter wheat, other crops, building and bare land, water, and forest.The building and bare land, water and forest were labeled by visual interpretation from high-resolution imagery via Google Earth TM mapping service with image-dates approximately May 2019.The samples of wheat and other crops were obtained from a field campaign, which was conducted in April 2019.During the field campaign, we performed random sampling in all prefectural-level cities of Shandong Province and used the non-differential Global Positioning System (GPS) equipment to record the coordinates of the sampled points.To avoid sampling bias, the coordinates of the centroid of the fields were recorded.In total, 6,185 samples were collected for the five land-cover types.The collected reference data were randomly divided into two parts, 60% as training samples and 40% as validation samples, to ensure that there was no overlap between them.Last, the numbers of pixels per class for the training and validation data were counted (Table 3).The distribution of reference data is shown in Figure 1.

Methodology
The framework for mapping winter wheat developed in this paper is shown in Figure 3. First, we aggregated six phases' images using Landsat-8 and Sentinel-2 data according to winter wheat's phenological calendar, and a 30 m image mosaic was built for each phase (Section 3.1).Second, we chose the spectral reflectance and two vegetation indices as the classification features (Section 3.2).Third, the random forest classifier was used to classify winter wheat and other classes (Section 3.3).Fourth, we compared classification accuracies when mono-temporal and multi-temporal data, monosensor and multi-sensor data were classified (Section 3.4).Finally, the feature importance assessment was performed to identify the contribution features (bands and vegetation indices) in the mapping process (Section 3.

Temporal Aggregation
Due to the presence of clouds, some cloud and shadow data gaps could result in spatial inconsistency.Temporal aggregation, such as calculation of mean, median, max or other simple phenological metrics based on surface reflectance products or indices (e.g., Vegetation Index, Normal Difference Water index, etc.) for a period, may fill in the data gaps.The method of temporal aggregation requires sufficient data for each period to cover the entire study area, and thus, how to divide time intervals is very important.We put forward a "phenological period composite", according to the major phenological dates of crop development, the whole growing period of crop is divided into several crop development phases, and then the data for each phase are aggregated into one image.For the integrated Lansat-8 and Sentinel-2 data, we aggregated data into six phases according to the winter wheat phenological calendar (Table 1), and the median value based on the pixel level was calculated to reduce the effect caused by the residual cloud and cloud shadows.This would result in six composite images, which are 1) seedling, 2) tillering, 3) over-wintering, 4) reviving, 5) jointing-heading and 6) maturing phase composites.Additionally, the two sensors' data were also used separately for classification there.To ensure cloud-free pixels could cover the entire study area in each phase, after our many tests, only four phases and four images were produced with Sentinel-2 MSI data, while a maximum of one image could be generated with Landsat-8 OLI data.The numbers of available images for independent phenological phases in Shandong Province are shown in Table 4.

Feature Calculation
The eight feature bands for classification included blue, green, red, NIR, SWIR1, SWIR2, Normalized Difference Vegetation Index (NDVI) (Equation 1) [39] and Soil Adjusted Vegetation Index (SAVI) (Equation 2) [40].In particular, two indicators (NDVI and SAVI) are the result of arithmetic operations performed with spectral information and are often used in research on vegetation, water and phenology extraction.Normalized Difference Vegetation Index (NDVI) is sensitive to vegetation growth, and Normalized Difference Vegetation Index (NDVI) could reduce the soil effect because winter wheat is sparse in the early growth stage.The specific calculation formula is as follows: where  NIR and  RED represent the values of the near-infrared band and red band.L stands for the correction factor, which ranges from 0-1.L is 0 when the vegetation coverage is high and 1 when it is very low; we set it as 0.5 in this paper [41].

Random Forest Classifier
In this study, RF was used for classification.The RF algorithm is a very robust classifier that has been applied to the classification of different types of satellite imagery successfully [42,43].It is an algorithm that uses the theory of Ensemble Learning to integrate multiple trees.Therefore, its basic unit is the decision tree.In a forest composed of many independent decision trees, after random sampling from the original training data, each decision tree in a forest is used to judge the unmarked samples, and then the majority voting results of all decision trees are applied to predict the unmarked sample categories.The voting of each tree has the same weight.Previous studies demonstrated that the RF classifier has a fast training speed with high classification accuracy, is not sensitive to outliers and does not tend to over-fit easily [44,45].In this research, we tested the impact of the number of trees on the classification results (Supplementary Figure S1), the number of classification trees was set as 300 to balance the computation time and accuracies, the other parameters were kept default in GEE.The number of variables in each node was set as the square root of the number of features, which proved to be effective in the reduction of overfitting [46].The bag fraction was set as 0.5.

Accuracy Assessment
Accuracy assessments are necessary for the evaluation of the accuracy of the produced thematic maps [47].A confusion matrix was calculated based on the sample data and classification result.Specific evaluation metrics include the Overall Accuracy (OA) (Equation ( 3)), the Kappa coefficient (Equation ( 4)) and the F1 measure (Equation ( 7)).The three indices of the overall agreement were compared to evaluate the effectiveness of different algorithms.The Kappa coefficient and OA were used to evaluate the overall classification result, and the F1 measure is used to evaluate the accuracy of each class by using the Producer's Accuracy (PA) (Equation ( 5))and User's Accuracy (UA) (Equation ( 6)) [17,48].Additionally, a non-parametric McNemar's test was used to examine if statistical differences exist at 5% significance level when comparing the performance of different classification models (Landsat-8, Sentinel-2, and the combination of Sentinel-2 and Landsat-8 data) [24].
where  represents the number of correctly classified samples in category i; N represents the total number of validation samples;  is the Kappa coefficient;  _ and  _ are the sum of the elements of i row and the i column in the confusion matrix, respectively.

Feature Importance Assessment
The feature importance assessment was implemented with the scikit-learn package in Python [49], because it could not be implemented in GEE.The importance of a feature is related to the depth of the decision node in one tree; the deeper the decision node is, the larger the fraction of samples it contributes.In scikit-learn, the proportion of samples to which a feature contributes combined with the impurities reduced by splitting them is used to estimate the predictive power of the feature [49].

Spectral Differences of Features
Spectral differences existed between winter wheat and other surface types (Figure 4).Winter wheat in the green and red bands exhibited a drift to small values, while the other land cover types remained constant or suffered an increase.The building and bare lands were the highest, and forest class had the lowest reflectance in the three visible bands (blue, green and red).In the NIR band, the curve of winter wheat had two peaks in the overwintering (December to February in the following year) and jointing-heading (April) phases, the curve of water remained constant and the other landcover types have a valley in the over-wintering phase.In the SWIR1 and SWIR2 bands, winter wheat showed a continuous decline, but other land-cover types maintained a small increase.The NDVI curve of winter wheat first rises then descends, with the highest value acquired in the jointingheading phase, while other land-cover types first drop then rises.The trend of the SAVI time-series curve was similar to that of NDVI.Thus, the difference between winter wheat and other land-cover types could be distinguished in these indices and can be used for classification in this study.

Winter Wheat Map in Shandong
As the produced thematic map resulting from combinations of temporally aggregated Landsat-8 and Sentinel-2 data revealed, winter wheat was mainly planted in the west of this region and little was observed in the east of Shandong Province (Figure 5).Three subsets of winter wheat maps were displayed in Figure 6.By comparing the classified result (Figure 6b) with reference data (Figure 6a), most of the winter wheat pixels have been correctly classified.However, there were differences in the boundaries of winter wheat pixels and non-wheat pixels.This may be due to some spectral interference resulting from weeds, roads or buildings in the mixed pixels.
As illustrated in the confusion matrix (Table 5), the winter wheat map had an OA of 93.4% and Kappa of 91.7%.The F1 measure of the winter wheat category was 0.97.However, some other crop pixels were misclassified to winter wheat, which caused the area of wheat to be overestimated to some degree.

Comparison with Mono-temporal Data
To determine the improvement in the classification effect with multi-temporal data, we compared the mono-temporal performance.Each mono-temporal image includes NDVI, SAVI and six reflectance bands for a total of eight features for every image.All of the assessment results are shown in Figure 7.It is shown that the maturing period (May to 10, June) composite image acquired the best accuracy, with an OA of 86.2% and a Kappa coefficient of 82.5%.However, the accuracy was much lower than that acquired by multi-temporal data (an OA of 93.4% and Kappa of 91.7%), which demonstrated that multi-temporal data produce better performance in mapping crops.By further comparing the accuracy of mono-temporal data, we found that the low accuracy was acquired in the tillering phase (November to December) with an OA of 76.4%, which was because the surface classes were covered by snow in winter.In the seedling phase of winter wheat, the OA and Kappa coefficient was also very low (80% and 74.7%).The accuracies of the overwintering, reviving and jointingheading composite images were basically the same, with OA values of 84.4%, 85.1%, and 85%, respectively.

Comparison with Single Sensors' Data
We also compared the accuracy of mapping winter wheat with a single sensor's data.The Sentinel-2 data included four images with a total of 32 features used for classification, while Landsat-8 data has one image with eight features used for classification.The OA and Kappa coefficient of the combination of Sentinel-2 and Landsat-8 data as the data source, including six images, were the highest in the three groups (Figure 8).The classification accuracy of Landsat-8 was the lowest, with an OA of 80.6% and a Kappa coefficient of 75.6%, and the result was similar to mono-temporal classification with the combined data.The Sentinel-2 data resulted in an appreciably higher accuracy level compared to Landsat-8 data, with an OA of 90.9% and a Kappa coefficient of 88.5%, but one that was still lower than the combined data.The McNemar's test also showed that the combination of Sentinel-2 and Landsat-8 data performed better (p-value < 0.05) than the individual use the two data.According to the feature importance scores illustrated in Figure 9, the SAVI band for jointingheading period acquisitions was shown to be dominant in the result, with an importance score of 0.043, followed by the NDVI band for the maturing period with a score of 0.042.The variables in the maturing phase were more important than others, with an accumulated score of 0.202, followed by variables in the reviving phase with an accumulated score of 0.192, variables in the tillering phase had the lowest accumulated score of 0.094, and thus the data acquired in the maturing phase is very important to the identification of winter wheat.Moreover, comparing the vegetation index bands with spectral index bands, the vegetation index bands were more important.

Combining of Temporally Aggregated Landsat-8 and Sentinel-2 Data
The study confirmed the utility of temporally aggregated Sentinel-2 and Landsat-8 data for mapping winter wheat at a provincial scale.Adequate temporal data can capture the different phenological characteristics between ground objects to classify targets.Zhang et al. [36] utilized three years of Sentinel series data to produce multi-temporal monthly composites for woody canopy mapping.Dong et al. [9] used a combination of Landsat data every five years to monitor the dynamics of rice cultivation.However, vegetation types may change year to year.Compared with these previous studies, this study tested the applicability of temporally aggregated Sentinel-2 and Landsat-8 data in one year to composite six periods' data for winter wheat mapping, and the results have proven to be good.
For temporal aggregation, sufficient data are required over a period to cover all the study areas.On the one hand, the integration of two different sensors' data provides a high revisit cycle to ensure sufficient good-quality observations.As illustrated in Figure 2, during the entire winter wheat growing period in Shandong province, the average number of good observations from Sentinel-2 was 40.3 and that of Landsat-8 was only 16.4, but the number was 56.7 when combining the two sensors data.On the other hand, aggregating image data according to winter wheat growing stages enlarged the time intervals so that there is available data in every growing stage of winter wheat; because the quality of satellite images is influenced by the weather, the growth of vegetation depends on the climate as well as.The overwintering phase of winter wheat corresponds to approximately two months, but the other stages are only approximately one month, and the images are more susceptible to snow or rain in this stage.The approach of aggregation of Landsat-8 and Sentinel-2 imagery provides abundant observations to preserve the temporal signature of winter wheat phenology used for discrimination against other land-uses/covers.The combinations of Landsat-8 and Sentinel-2 data performed revealed more accurate classification results than by classifying two sensor's data separately.The more time intervals there were, the better the classification effect acquired was.Short intervals help to identify phenological characteristics of certain crop types [30].

The Impact of Key Crop Development Phases on Classification Results
Considering the importance of operational monitoring of crops for food security, knowledge of key crop development phases from remote sensing perspective is crucial to prioritize the allocation of satellite imagery for such phases and obtain as early as possible classification results with acceptable accuracies.Thus, the determination of the key crop development phases is necessary.Previous studies investigated the impact of specific image dates on classification accuracy, such as the importance of May or "Spring" imagery for accurate separation of cropland and other classes [18,19].Here, we further advanced understanding of the importance of image dates, but from crop development phases' perspective, rather from specific image dates.Therefore, such results could be easily applied elsewhere, since scientists, agronomists and farmers pay more attention to crop growth stages rather specific image dates.We found that, the maturing phase (10, May-30, June) followed by the reviving phase (20, February-30, March) was particularly crucial for accurate classification of wheat identification.In the maturing period, winter wheat grows rapidly, and the harvested wheat all share the same yellow hue, while other crops are in the early growth phase, which results in higher importance of features derived from the maturing phase [8,17].Thus, image dates for the maturing phase are the best for extracting winter wheat, and the result of the mono-temporal image composite classification confirmed such a conclusion (Figure 7).In the reviving phase of winter wheat, the growth characteristics of wheat are already significantly different from those of other vegetation types; most of the plants are grey and not germinated, while winter wheat has resumed its growth [11,15].Wheat can be discriminated from other classes, and therefore the reviving period image is also important for identifying winter wheat in the study area.The data from the two periods have a great influence on the classification accuracy.
Additionally, using the satellite images from early seasons to identify the planting area of crops is extremely crucial for agricultural management too, for instance, for water and fertilizer allocation.However, the optical satellite data from the early stages of winter crop growth are affected greatly by clouds.Therefore, the combinations of temporally aggregated Landsat-8 and Sentinel-2 data would enrich cloud-free observations and facilitate in accurate classification results.This might be the reason that the classification results of mono-temporal phases after the tillering phase gradually converged to 85% (Figure 8).The extent of winter wheat could be determined two months before harvest, which would allow enough time for the government to evaluate food security and take appropriate measures.
Finally, yet importantly, we found vegetation indices played an important role in the accurate classification of winter wheat (Figure 9).Vegetation leaves have strong absorption characteristics in the red band and strong reflection characteristics in the near-infrared band, and thus the vegetation index can reflect the differences of vegetation growth status [11].In the seedling phase, winter wheat had lower reflectance in NDVI and SAVI bands compared to forest and other crops (Figure 4), because it is just emergence.After reviving, the values of NDVI and SAVI for winter wheat were significantly higher than other crops due to the rapid growth of winter wheat, which also explained that the late seasons of winter wheat had higher classification accuracies.

Uncertainty and Outlook
In this study, we divided time intervals according to the winter wheat phenological calendar and aggregated temporal data to map winter wheat, with OA was 93.4%, but we should acknowledge that the phenology of winter wheat in Shandong might be different.Different growth conditions and landscape heterogeneity may cause interclass differences and bring down the ability of the image classifier to map a certain crop [50].The median spectral value calculated may alleviate the effect to some extent, but still this problem persists.Some classification inaccuracies were also resulting from the integration of datasets from Sentinel-2 and Landsat-8 data.First, geographic misregistration still exists between two sensors [51].Both sensors correct their geographic information using different global reference systems and orientation measurements.However, the positional misregistration was stable for our study area [29], with positional misregistration of approximately 30 meters.Second, the TOA reflectance data were used for our study because no Surface Reflectance data before 2019 were available in GEE for Shandong Province until this study was completed.The values of NDVI and SAVI calculated from TOA data were generally higher than those from Surface Reflectance data.Although a little effect was shown, Werff et al. [52] found that the spectral response functions of two sensors' TOA reflectance products were more correlated to the products of the bottom of atmosphere reflectance.We also achieved plausible accuracies by classifying TOA products.Third, the 30 m spatial resolution of images involves errors and uncertainties in the classification result.Fields smaller than 30 × 30 m pixels could be misclassified.
Our work highlighted the combined use of Sentinel-2 and Landsat-8 data, which constructed time-series curves for mapping winter wheat in Shandong Province.We analyzed the impact of key phenological phases on the classification result that would provide an effective reference for winter wheat mapping in China and elsewhere too.Our research would help to select suitable temporally aggregated data and periods for winter wheat mapping.Moreover, if data from the Gaofen series satellite (a part of China High-definition Earth Observation System) were available in the GEE data catalog in the future, the data availability would be further improved, it would allow for winter wheat mapping more efficient.

Conclusion
This paper explored the performance yielded when using combinations of temporally aggregated Sentinel-2 and Landsat-8 data to map winter wheat at the provincial level, and six phenological period composite images were generated and classified by RF classifier.The result showed that: (1) temporally aggregated Landsat-8 and Sentinel-2 data provide abundant observations to derive robust phenological based indicators for identifying winter wheat; (2) the method of combining the RF classifier with the combination of temporally aggregated Landsat-8 and Sentinel-2 data based on the GEE platform could accurately and effectively extract winter wheat in a large area; (3) the image composited using maturing phase data made the biggest contribution, and vegetation indices are more important than other reflectance bands.The analysis moved away from the specific image dates and identified the critical crop development phases for accurate winter wheat mapping, thus making the results transferrable to elsewhere and applied for the operation winter wheat monitoring.

Supplementary Materials:
The following are available online at www.mdpi.com/2072-4292/12/12/2065/s1, Figure S1: The impact of selection of number of trees in Radom Forest (RF) on classification results.RF models were built with the number of trees ranging from 50 to 950.OA represents Overall Accuracy.

Figure 1 .
Figure 1.Shandong Province, is the study area.The location of training and validation samples is shown in the figure.

Figure 2 .
Figure 2. The number of good-quality observations (after cloud mask) of (a) Sentinel-2, (b) Landsat-8, and (c) integrated Landsat-8 and Sentinel-2 images for per pixels acquired from October of 2018 to June 10th of 2019 in the study area; the lower-right histogram represents the distribution of the number of observations.

Figure 3 .
Figure 3. Flow chart of the overall workflow.

Figure 4 .
Figure 4.The mean and standard deviation of five classes, S, T, O, R, J and M represent the seedling, tillering, over-wintering, reviving, jointing-heading and maturing phases of winter wheat, respectively.

Figure 5 .
Figure 5. Winter wheat map of Shandong Province in 2018-2019, which was generated by the combinations of temporally aggregated Landsat-8 and Sentinel-2 data with RF classifier.The red square, triangle and circle represent the location of the subsets in Figure 6.

Figure 6 .
Figure 6.Three subsets of winter wheat map.(a) Sentinel-2 false-color composite display (bands 4, 8 and 11) (the locations of three places are shown in Figure 5), (b) winter wheat pixels derived from RF classifier.

Figure 7 .
Figure 7. Overall Accuracy (OA) and Kappa Coefficient of mono-temporal images, S, T, O, R, J and M represent the seedling, tillering, over-wintering, reviving, jointing-heading and maturing phases of winter wheat, respectively.

Figure 8 .
Figure 8. Overall Accuracy and Kappa of classification of a single sensor's data with the random forest classifier.

Figure 9 .
Figure 9. Importance of features in the RF classification with the multi-temporal data process; S, T, O, R, J and M represent the seedling, tillering, over-wintering, reviving, jointing-heading and maturing phases of winter wheat, respectively.

Table 2 .
Main characteristics of the satellites and band used.

Table 3 .
Number of training and validation samples. 5)

Table 4 .
Number of images available for each phase in Shandong Province.

Table 5 .
Confusion matrix of Shandong winter wheat map.