Early Identification of Seed Maize and Common Maize Production Fields Using Sentinel-2 Images

Accurate and timely information on the production area of crop seeds allows the seed market and the security of the seed supply to be monitored. Seed maize and common maize production fields typically share similar phenological development profiles and differ mainly in their planting patterns, which makes it challenging to separate these fields in decametric-resolution satellite images. In this research, we propose a method to identify seed maize production fields as early as possible in the growing season using a time series of remote sensing images in the Liangzhou district of Gansu province, China. We collected Sentinel-2 and GaoFen-1 (GF-1) images captured from March to September. The feature space for classification consists of four original bands, namely red, green, blue, and near-infrared (NIR), and eight vegetation indexes. We analyzed the timeliness of seed maize identification using Sentinel-2 time series of different time spans and identified the earliest time frame that gives reasonable classification accuracy. Then, the earliest time series that met the requirements of regulatory accuracy were compared and analyzed. Four machine/deep learning algorithms were tested: K-nearest neighbor (KNN), support vector classification (SVC), random forest (RF), and long short-term memory (LSTM). The results showed that, using Sentinel-2 images from March to June, the RF and LSTM algorithms achieved over 88% accuracy, with LSTM performing best (90%). In contrast, the accuracy of KNN and SVC was between 82% and 86%. Seed maize mapping can therefore be carried out in the experimental area at the end of June with a precision that meets the basic requirements of monitoring for the seed industry. The classification using GF-1 images was less accurate and reliable; the accuracy was 85% using images from March to June.
To achieve near real-time identification of seed maize fields early in the growing season, we adopted an automated sample generation approach for the current season that uses only historical samples and is based on clustering analysis. The classification accuracy using new samples extracted from historical mapping reached 74% by the end of the season (September) and 63% by the end of July. This research provides important insights into the classification of crop fields cultivated with the same crop but different planting patterns using remote sensing images. The proposed approach enables near real-time identification of seed maize production fields within the growing season, which could effectively support large-scale monitoring of the seed supply industry.


Introduction
The plots that produce crop seeds, which are called seed production fields, account for a very small but critical proportion of the crop area. In China, for example, only about two hundred thousand to three hundred thousand hectares of farmland per year is available to produce the seeds needed for about forty million hectares of common maize. The area and yield information of these several hundred thousand hectares of seed maize production fields are the basis for agricultural management departments, enterprises, and farmers to carry out decision analysis on seed supply and demand, price, and the market. The traditional hybrid seed maize production area is typically obtained by using summary data from the seed management departments of provinces, cities, and counties. Human factors have some influence on the process, and result in low efficiency and information lag. In addition, while the statistical data provides area data, it is difficult to provide specific area and location information for each field. The use of remote sensing technology gives the specific spatial distribution and area information of seed maize from a relatively objective perspective. Therefore, establishing a method to obtain more accurate and objective data on the seed production area in the early stage of the annual seed production season is urgently required for the effective regulation of seed maize.
The planting patterns and varieties [1,2] differ between common maize and seed maize. In seed maize fields, the male and female parents (two inbred lines) are planted in regularly alternating rows, while common maize is planted with a single hybrid variety. According to our field investigation and information provided by relevant authorities, the area of each contiguous seed maize field ranges from several dozen hectares to several hundred hectares. Because the spectra and canopy textures of two different inbred lines differ from those of a uniform hybrid population, seed maize and common maize can be distinguished by remote sensing technology. Seed maize identification is therefore a problem of fine identification by remote sensing, covering not only different crops but also different planting patterns and varieties of the same crop.
At present, there are three main methods for the fine identification of seed maize fields by remote sensing. The first method is based on time series spectra. Using Linze County in Gansu province as the main research area, Liu et al. [3] leveraged the management differences between plots with and without film mulching to extract the time series spectral differences caused by the mulch. With this approach, seed maize was identified with an overall precision of up to 90%. However, this method is difficult to apply to the mulch or non-mulch differences between common maize and seed maize. The second approach is based on high-resolution texture. Zhang et al. [4] used GF-2 PAN data at the tassel stage, took field blocks as objects, extracted crop texture information with the Sobel edge detection operator, and used the Hough transform to obtain strip texture information in the field blocks of seed maize. As a result, they identified seed maize with an accuracy of 90%. However, the threshold value for the Hough transform is determined by subjective experience and must be readjusted for data sources in different regions. Furthermore, the limited coverage of high-resolution data restricts the wide application of this method. The third method combines time series spectra and high-resolution texture. Zhang et al. [5], using Qitai county in Xinjiang as their research area, exploited the fact that, before flowering, the female rows in seed maize fields are emasculated while the tassels in the male rows are retained. The resulting texture differences in high spatial resolution images were used for modeling. They then tested the maize field identification method titled "multi-temporal GF-1 + Kompsat spectrum + Uniform-LBP+GLCM", and the accuracy of seed maize identification exceeded 90%. Lin Z et al. [6] successfully identified seed maize using spectral and texture characteristics based on Landsat8, GF-1, and GF-2.
However, these methods cannot be applied until the emasculation of the female rows is completed in early August every year, and they require a large amount of sample data. Further research is needed on how to identify seed maize early and how to use samples effectively in areas that lack them.
Remote Sens. 2020, 12, 2140
These studies have preliminarily proved that remote sensing can achieve the fine identification of different planting patterns and varieties of the same crop (maize). However, obtaining accurate seed production areas with these methods requires waiting until August because this is when seed maize and common maize show the most significant differences in the field. In addition to the problem of the timing of sample collection and data processing, mapping cannot be completed until the end of the growing season. However, by this time, seeds have been harvested, processed, and have started to flow into the market. As a result, management departments and seed enterprises face difficulties in market supervision and regulation. Therefore, accurately identifying seed maize in the middle and early stages is a key scientific issue for monitoring seed production fields by remote sensing.
Studies on the early identification of crops by remote sensing classification often construct multi-temporal time series models from multi-source data to explore the earliest identifiable periods in different regions and for different types of crop plants. The data sources for crop remote sensing identification include platforms equipped with optical sensors, such as Landsat, Chinese HuanJing-1, Sentinel-2, and Gaofen-1 satellites [7][8][9][10][11][12][13][14][15]; hyperspectral sensors, such as Chinese Gaofen-6 [16,17]; and microwave sensors, such as SAR [18][19][20][21] and Sentinel-1 [22][23][24]. Researchers also frequently use multiple vegetation indexes for crop classification to compare classification features. Zhong et al. [25] used the time series of three vegetation indexes to identify maize and soybean in Connors, USA, and obtained an accuracy of over 88%. Ozdogan [26] successfully identified crops by using an unsupervised classification algorithm with two vegetation indexes. Brown et al. [27] extracted the distribution information of cotton, soybean, and maize in Brazil by using two vegetation indexes, and the accuracy was about 80%. To explore time series length, Zhan et al. [28] studied the effects of different time series intervals (16, 32, 48, and 64 days) on accuracy and found that time series with a high temporal resolution returned results with high classification accuracy. Hao et al. [29] showed that classification accuracy and certainty increased as the number of features and the length of the time series increased. Furthermore, since some images from a single sensor were missing from the critical time period, they obtained a medium-resolution 15-day time series by combining Landsat-5 TM and HJ-1 CCD data, which resulted in a better time cycle and increased the likelihood of accurate crop classification [30]. On this basis, the reuse of historical samples has also been an important method for the early identification of crops. Hao et al. [31] explored the early identification of four major crops in Kansas in 2014 using MODIS EVI time series from 2006 to 2013 and the cropland data layer (CDL). Cai et al. [32] aimed to achieve optimal accuracy in the identification of maize and soybean early in the growing season based on historical samples and a deep neural network (DNN) classifier and obtained good classification results. Vorobiova et al. [33] found that the NDVI time series curve fitted from historical data can be used for early crop identification.
Compared with the different crops identified in the above studies, seed maize and common maize are more similar in their spectral characteristics and growth periods. Whether these classification features and methods can be applied to the early identification of different planting patterns and varieties of the same crop still lacks experimental demonstration. Therefore, in this study, long time series of Sentinel-2 images (10 m) were used as the data source. Liangzhou, Gansu province, China (one of the main seed maize production areas), was used as an example to (1) study the reflectance spectral responses of seed maize and construct a data cube with multiple spectral characteristic series from March to September, and (2) explore the influence of different time series lengths on the results of seed maize identification. We first used the samples from 2018 to explore the influence of different time series lengths on the identification accuracy of seed maize. We also compared the KNN, SVC, RF, and LSTM classifiers and selected the best one, and we compared the classification results of Sentinel-2 and GF-1 images. Finally, we combined the historical samples, the current crop planting structure, and spectral information through cluster analysis to generate new samples, in order to obtain the area information of seed maize with high precision early in the growing season.

Satellite Data
To monitor crop growth and development across the whole research area, we extracted time series spectral information for the crop plots. Sentinel-2 10 m images were used to obtain multi-temporal image data covering the growth period of seed maize. The Sentinel-2 mission comprises two satellites, Sentinel-2A and Sentinel-2B. Each satellite has a revisit cycle of 10 days; together, the two complementary satellites provide a revisit cycle of 5 days, and the swath width is 290 km. Data from the main crop growing season (March to September) in the research area were selected as the remote sensing data source. GF-1 images were used for comparison with the Sentinel-2 images. The GF-1 wide field of view imagery has a spatial resolution of 16 m and a 4-day revisit cycle. The specific satellite parameters are shown in Table 1.

Field Sample Data
Field samples are needed as reference data to train supervised crop classifiers. They are also helpful for understanding the crop phenology calendar and planting conditions in the research area. In late July of each year from 2017 to 2019, we carried out field research in Liangzhou. The sampling points were evenly distributed in the planting area. The collected data included crop types, growth conditions, emasculation times of seed maize, plot areas, geographic coordinates, and field photos. In 2017, 48 ground-truth samples were taken: 14 samples of seed maize, 18 samples of common maize, and 16 samples of other major crops (such as crown pear, grape, and spring wheat). In 2018, 385 ground-truth samples were taken: 71 samples of seed maize, 162 samples of common maize, and 152 samples of other major crops. In 2019, 664 ground-truth samples were taken: 160 samples of seed maize, 222 samples of common maize, and 282 samples of other major crops. The specific distribution of the samples is shown in Table 2 and Figure 2.

Methods
In this study, on the basis of the crop types and the differences in their phenological calendars in the study area, eight vegetation indexes (NDVI, EVI, RVI, GNDVI, TVI, DVI, SAVI, and NDWI) covering four aspects (biomass, chlorophyll content, soil background, and canopy water content) were selected as features. We also added four original bands: red, green, blue, and near-infrared. The differences between crops were thus compared in multiple dimensions to identify seed maize more effectively. Sentinel-2 10 m images were used to construct the time series curves of the eight vegetation indexes and four original bands for the different crops. These features were then used to establish a feature image system to identify seed maize, and the influence of the time series length on the seed maize identification results was studied. Time series data of different lengths from March to September were used for modeling, and the accuracies of seed maize identification by KNN, SVC, RF, and LSTM were compared. To assess the image quality of Sentinel-2, the classification accuracy was compared with that obtained from GF-1. This study also extracted potential samples using historical samples and cluster analysis and then modeled and identified seed maize with time series data of different lengths. A flowchart of the process used in this study is shown in Figure 3. The procedure for extracting new samples from historical samples is shown in Figure 4.

Selection of Time Series Vegetation Indexes
Vegetation indexes are based on the spectral reflection characteristics of vegetation. They combine the visible and near-infrared bands of satellite images and quantitatively reflect the growth and development of vegetation and the soil background under certain conditions. These indexes are frequently used for regional and global land cover determination, vegetation classification, and growth monitoring [35,36]. According to the main crop types and their growing environment in the study area, the eight vegetation indexes in Table 3 were initially selected for analysis.
Table 3. Vegetation indexes selected in the study.
Vegetation Indexes Equations
Note. B, G, R, and NIR are the reflectivity of blue, green, red, and near-infrared bands, respectively; L is the soil regulation parameter and has a value of 0.5.
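The eight indexes can be computed per pixel from the four bands named in the note. The following is a minimal sketch, not the authors' implementation: the equations of Table 3 did not survive in this text, so the formulas below are the commonly used definitions and should be treated as assumptions. In particular, TVI is taken here as the transformed vegetation index sqrt(NDVI + 0.5), and NDWI as the green/NIR form, because only B, G, R, and NIR are listed as inputs. The function name `vegetation_indexes` and the small constant `eps` are hypothetical.

```python
import numpy as np

def vegetation_indexes(b, g, r, nir, L=0.5, eps=1e-9):
    """Compute eight vegetation indexes from blue, green, red, and NIR reflectance.

    Formulas are the common textbook definitions (an assumption; the
    paper's Table 3 is not available here). L is the soil adjustment
    parameter of SAVI (0.5, as stated in the table note). eps avoids
    division by zero on dark pixels.
    """
    b, g, r, nir = (np.asarray(x, dtype=float) for x in (b, g, r, nir))
    ndvi = (nir - r) / (nir + r + eps)
    return {
        "NDVI": ndvi,
        "EVI": 2.5 * (nir - r) / (nir + 6.0 * r - 7.5 * b + 1.0),
        "RVI": nir / (r + eps),
        "GNDVI": (nir - g) / (nir + g + eps),
        # Assumed: transformed vegetation index, clipped to keep the root real.
        "TVI": np.sqrt(np.clip(ndvi + 0.5, 0.0, None)),
        "DVI": nir - r,
        "SAVI": (1.0 + L) * (nir - r) / (nir + r + L),
        # Assumed: green/NIR form of NDWI.
        "NDWI": (g - nir) / (g + nir + eps),
    }

# Example with a single vegetated pixel (reflectance values are made up).
vi = vegetation_indexes(b=0.05, g=0.10, r=0.10, nir=0.50)
```

Because the inputs are converted with `np.asarray`, the same function works on scalars, on per-field means, or on whole image arrays of equal shape.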
The main crop types in the study area include seed maize, common maize, spring wheat, crown pear, and grape. The commonly used vegetation indexes exploit the differences in spectral reflection between types of vegetation, and each index has its own characteristics and provides different information. NDVI and EVI reflect the overall change in crop biomass. EVI is more resistant to saturation: when crop biomass is high, EVI is more sensitive to differences in biomass between crops, whereas when biomass is low, NDVI better reflects these differences. TVI, RVI, and GNDVI reflect the chlorophyll content of crops. The sensitivity of these three indexes varies with the growth and development stage of the vegetation. DVI and SAVI reflect the soil background of crops. DVI is sensitive to soil background changes caused by the different shading conditions of crops in different periods, and its value increases rapidly as biomass increases. SAVI accounts for the change in the optical characteristics of the soil background. NDWI reflects the water content of the crop canopy and responds quickly when the canopy is subjected to water stress. Since many studies [29,32] have shown that the original spectral bands improve crop classification, this study added the original red, blue, green, and NIR bands to the feature set.
The main research object of this paper was seed maize. Seed maize and common maize are both maize, but their planting patterns differ (as shown in Figure 5). Other crops are included in the classification mainly to improve the generalization ability of the model. According to expert knowledge, the biomass of seed maize is smaller than that of common maize throughout the growth period. As shown in Figure 6, the NDVI and EVI of seed maize were generally lower than those of common maize, and the difference was more obvious from July to August, after the emasculation of the female rows of seed maize was finished. These differences in the spectral reflectance profiles indicated that seed maize and common maize could be separated by classification.
Figure 5. Photos of (a) seed maize, with male and female parents planted in regularly alternating rows; (b) common maize, which is planted with a single hybrid variety.
A phenological calendar combines the phenology of plants with natural environmental conditions, such as climate and hydrology. Because different crops grow and develop differently, their phenological calendars also differ; the phenological calendar of the main crops in the study area is shown in Figure 7. In this study, guided by the phenological calendars of the main crops in the study area (especially those of seed maize and common maize, which are more difficult to distinguish), time series remote sensing images captured during the main crop growth period were obtained, and the time series values of the vegetation indexes were calculated. The feature values of the time series vegetation indexes at the field survey samples were extracted, and the time series characteristic curves of the vegetation indexes were constructed to reflect the process of crop growth and development.
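The extraction of time series values at the field survey samples can be sketched as follows. This is an illustrative example under stated assumptions rather than the procedure used in the study: one index is assumed to be stored as a stack of co-registered images with shape (T, H, W), samples are assumed to be given as pixel row and column positions, and the function names `sample_time_series` and `class_mean_curves` are hypothetical.

```python
import numpy as np

def sample_time_series(index_stack, rows, cols):
    """Read the time series of one index at N sample pixels.

    index_stack: array of shape (T, H, W), one image per acquisition date.
    rows, cols: pixel coordinates of the N samples.
    Returns an array of shape (N, T), one curve per sample.
    """
    stack = np.asarray(index_stack, dtype=float)
    return stack[:, np.asarray(rows), np.asarray(cols)].T

def class_mean_curves(series, labels):
    """Average the per-sample curves by crop class.

    NaN values (for example cloud-masked dates) are ignored in the mean.
    Returns a dict mapping each class label to a length-T curve.
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    return {
        c: np.nanmean(series[labels == c], axis=0)
        for c in sorted(set(labels.tolist()))
    }

# Tiny synthetic example: 2 dates, a 3 x 3 image, 2 samples of one class.
stack = np.arange(18).reshape(2, 3, 3)
curves = sample_time_series(stack, rows=[0, 2], cols=[1, 2])
means = class_mean_curves(curves, ["seed maize", "seed maize"])
```

Plotting the class mean curves per index gives profiles of the kind described for seed maize and common maize.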

Selection of Classification Algorithms
At present, many machine learning classification methods are applied to crop classification research, but no single algorithm solves all problems perfectly [45]. We therefore sought a suitable algorithm for the early identification of seed maize, judging the candidates by empirical experience and by classification accuracy. The four algorithms tested were K-nearest neighbor (KNN) [46], support vector classification (SVC) [47], random forest (RF) [48], and long short-term memory (LSTM) [49]. With each of these four algorithms, time series of different lengths from March to September in Liangzhou were modeled, and the seed maize identification results under the different time series lengths were compared. Dynamic time warping (DTW) [50] was used to build the feature data set for the KNN algorithm: DTW calculates the distance between two time series, and this distance serves as the distance function in the nearest neighbor algorithm. SVC is a powerful and flexible machine learning model that can perform linear or nonlinear classification and is widely used in crop classification research. Random forest is an ensemble learning algorithm that can perform regression and classification. It can process input samples with high-dimensional features without dimensionality reduction and can evaluate the importance of each feature in the classification process; we also used it to analyze which features matter most in the early identification of seed maize. Many scholars have recently begun to apply deep learning algorithms to crop classification and have achieved good results. Because the LSTM algorithm is suited to processing time series data, we selected it and evaluated its performance in this experiment.
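The role of DTW as the distance function of a nearest neighbor classifier can be sketched as follows. This is an illustrative, unoptimized version under stated assumptions: univariate series, an absolute-difference local cost, and hypothetical index values and labels. It is not the implementation used in the study.

```python
from collections import Counter

def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def knn_dtw_predict(train_series, train_labels, query, k=1):
    """Label a query series by majority vote among its k DTW-nearest neighbors."""
    ranked = sorted(zip(train_series, train_labels),
                    key=lambda pair: dtw_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical vegetation index series for two training fields.
train_series = [[0.1, 0.3, 0.5, 0.6], [0.1, 0.4, 0.7, 0.9]]
train_labels = ["seed maize", "common maize"]
prediction = knn_dtw_predict(train_series, train_labels, [0.1, 0.3, 0.5, 0.65])
```

Because DTW allows one series to be stretched against the other, two curves that differ only by a shift in timing can have a distance of zero, which is worth keeping in mind when the timing of growth is itself the discriminating feature.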

Experiment Design
We designed the following experiments to assess the early identification of seed maize based on current or historical samples. First, we established a feature system of eight vegetation indexes and four original bands from the current-year samples and Sentinel-2 images, and modeled time series of different lengths from March to September. We then compared the classification accuracy of KNN, SVC, RF, and LSTM to find the best classifier and the earliest time series suitable for identifying seed maize. To verify the reliability of seed maize identification and the quality of the Sentinel-2 images, we ran a progressive time series experiment with GF-1 images using the same features and the best classifier. Because RF can evaluate the importance of each feature in the classification process, we also used it to analyze the important features for the different time series.
On this basis, we combined historical samples with remote sensing data to generate new samples using the k-means clustering method and thereby predict seed maize early in the season. A similar method has been successfully applied to the classification of maize and rice [51]; in this study, we improved the method and applied it to the early identification of seed maize. First, we used the samples from 2017 and 2018 to produce ten crop maps for each year, superimposed the classification results for 2017 and 2018, and calculated the frequency of each crop type per pixel. If a pixel was assigned the same crop type in all twenty results (ten classification results per year over two years), it was extracted as a potential sample. Then, according to the spectral characteristics and spatial information of the 2019 images, the potential samples were clustered with the k-means algorithm on the Google Earth Engine platform. Finally, we calculated the proportion of the major crop types in each cluster and assigned each cluster the crop category with the largest proportion; the clustering results were superimposed on the potential samples, and the pixels whose two categories agreed were extracted as new samples for 2019. Based on the new samples and the 2019 Sentinel-2 images, feature sets of different time series lengths from March to September were constructed and used for training to predict seed maize early, and true samples from 2019 were used to evaluate the accuracy of this method. The method relies on the locally stable planting structure and on the spectral information of the current year to reuse historical samples for the early prediction of seed maize. We evaluated the accuracy of the classification results with the confusion matrix, using three metrics: overall accuracy (OA), producer accuracy (PA), and user accuracy (UA).
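For reference, the three metrics can be derived from a confusion matrix as in the sketch below. It assumes the usual remote sensing convention (rows are reference labels, columns are predicted labels); the function name and the toy labels are hypothetical.

```python
def accuracy_metrics(y_true, y_pred, labels):
    """Return (OA, PA per class, UA per class) from reference and predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    # matrix[r][c]: number of samples with reference class r predicted as class c.
    matrix = [[0] * n for _ in range(n)]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1

    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total

    producer, user = {}, {}
    for label, i in index.items():
        reference_total = sum(matrix[i])                       # row sum
        predicted_total = sum(matrix[r][i] for r in range(n))  # column sum
        # PA: share of reference samples of this class that were found.
        producer[label] = matrix[i][i] / reference_total if reference_total else 0.0
        # UA: share of predictions of this class that were right.
        user[label] = matrix[i][i] / predicted_total if predicted_total else 0.0
    return overall, producer, user

# Toy example with three classes.
y_true = ["seed", "seed", "seed", "common", "common", "other"]
y_pred = ["seed", "seed", "common", "common", "common", "other"]
oa, pa, ua = accuracy_metrics(y_true, y_pred, ["seed", "common", "other"])
```

In this toy case one seed maize sample is mislabeled as common maize, so the PA of seed maize drops to 2/3 while its UA stays at 1.0, and the reverse holds for common maize.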

Early Identification by Different Algorithms
A multi-dimensional time series spectral feature set was constructed from the field survey samples and the time series of Sentinel-2 images for the crop growth period (March to September). Two-thirds of the research units were randomly selected as training samples, and the remaining units were used as verification samples. For each algorithm, we conducted 10 experiments. The data were partitioned by Monte Carlo cross-validation: in each experiment, the training and verification data were randomly divided again at a ratio of 2:1.
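The repeated random partition can be sketched as below. The function name, the fixed seed, and the use of integer unit identifiers are assumptions made for illustration only.

```python
import random

def monte_carlo_splits(unit_ids, n_repeats=10, train_fraction=2 / 3, seed=0):
    """Yield (training, verification) lists from independent random 2:1 partitions."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n_train = round(len(unit_ids) * train_fraction)
    for _ in range(n_repeats):
        shuffled = list(unit_ids)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]

# Thirty hypothetical research units, split ten times.
splits = list(monte_carlo_splits(range(30)))
```

Unlike k-fold cross-validation, each repetition draws a fresh partition, so a given unit may appear in the verification set of several repetitions or of none. Reported accuracies are then summarized over the repetitions.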
KNN, SVC, RF, and LSTM were used for modeling and classification, and the confusion matrix was used to evaluate model performance. The crop classification accuracies, based on 10 repetitions, are shown in Table 4. Since the main research object of this paper was seed maize, we list the PA and UA for seed maize in the table. The comparison of overall accuracy is shown in Figure 8. According to the results, the classification accuracy of KNN, RF, and LSTM increased as the time series was extended. However, the classification accuracy of SVC began to decrease after reaching its peak with the March to June series. Our interpretation is that as the time series grew, more and more Sentinel-2 images were added to the classification process while the sample size remained unchanged, so the samples became diluted relative to the growing number of features, which reduced the subsequent accuracy. The highest classification accuracy of both RF and LSTM exceeded 90%, and the PA and UA of seed maize exceeded 94%. On the whole, LSTM performed better than RF, indicating that the deep learning algorithm was effective for crop identification, and we chose LSTM as the most suitable classifier in this experiment. The trend can be seen in the classification accuracy diagram for the different Sentinel-2 time series lengths. In March and April, most vegetation had only begun to grow: wheat was just turning green, while crops such as maize were newly sown or not yet sown, and their biomass was usually low. After May, crops grew rapidly and the spectral differences between crops in the remote sensing images increased; as a result, the classification accuracy increased rapidly. From June to September, as the time series lengthened, the classification accuracy of seed maize on the whole continued to improve. By the end of June, the OA of classification reached 89%, and the PA and UA of seed maize exceeded 90%.
Although there was a slight drop in classification accuracy at the end of July, the impact was not significant. The distribution maps of seed maize based on LSTM with different time series lengths are shown in Figure 9. Seed maize was mainly distributed in the northwest and southeast corners of Liangzhou, which is consistent with the actual situation. The figures also show that seed maize was planted in clusters, because seed maize is a special crop that places high demands on the planting environment in order to produce relatively pure seeds. Weighing timeliness, economic cost, and classification accuracy together, it is more reasonable to carry out early identification with the March to June time series, that is, to obtain the remote sensing images from March to June in a timely manner and construct the spectral feature image sets from them to identify seed maize. This strategy is also more consistent with how the optimal time period is selected in practice.
The random forest classification algorithm allows classification features to be ranked by importance. There are two ways to obtain feature importance scores in a random forest: one is based on the Gini index and the other on the out-of-bag (OOB) error. We chose the second approach. Our experiments were done in Python with the rfpimp package [52], which is based on OOB. The RF algorithm was run ten times on the time series feature sets of different lengths. Each experiment produced a ranking of feature importance, and each ranking counted as a vote for each feature; the top ten features were then selected by their average ranking scores. Figure 10 shows the top ten features as the time series is extended. Feature importance ranking was done for March to April, March to May, and March to June.
According to the phenological calendar, the features of DOY108 rank high on the list. At this time, seed maize was just starting to be sown, common maize sowing was at its peak, spring wheat had entered a growth period, and crown pear had just started to bud, so the differences in landscape and growth among crops were obvious. When the July images were added, seed maize and common maize had entered a period of rapid growth, while spring wheat was being harvested, crown pear was beginning to be harvested, and their biomass began to decline; the features of DOY203 were therefore relatively important. At the beginning of August, emasculation of seed maize was complete, and the difference between seed maize and common maize was most obvious; spring wheat had been harvested, crown pear had fully entered the harvest period, and grape had also begun to enter the harvest period. At this time, the growth differences among crops reached their maximum. In September, most of the crops had been harvested and the ground was exposed, so the importance score of the blue band rose.
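The voting over repeated experiments can be read as a mean-rank aggregation, sketched below. This is one plausible interpretation, not a confirmed description of the procedure used: it assumes every experiment scores the same set of features, and the feature names and scores are hypothetical. How the per-experiment importance scores are obtained (for example from an OOB-based tool) is outside this sketch.

```python
from collections import defaultdict

def top_features_by_mean_rank(runs, top_n=10):
    """Rank features by their average position across repeated experiments.

    runs: list of dicts mapping feature name -> importance score,
    one dict per experiment. Rank 1 is the most important feature.
    """
    rank_sums = defaultdict(float)
    for scores in runs:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] += rank
    mean_rank = {name: total / len(runs) for name, total in rank_sums.items()}
    # Ties in mean rank are broken alphabetically to keep the output stable.
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))[:top_n]

# Three hypothetical experiments over three hypothetical features.
runs = [
    {"DOY108_nir": 0.30, "DOY108_red": 0.20, "DOY138_blue": 0.10},
    {"DOY108_nir": 0.25, "DOY108_red": 0.28, "DOY138_blue": 0.05},
    {"DOY108_nir": 0.40, "DOY108_red": 0.10, "DOY138_blue": 0.20},
]
top_two = top_features_by_mean_rank(runs, top_n=2)
```

Averaging ranks rather than raw scores keeps a single experiment with unusually large importance values from dominating the final list.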

Classification Comparison of Sentinel-2 and GF-1
To check whether the March to June time series is a reliable earliest period for identifying seed maize, and to assess the image quality of Sentinel-2, we ran a comparison experiment with medium-to-high resolution GF-1 images using the best-performing classifier. In the experiment, two-thirds of the research units were randomly selected as training samples, and the remaining units were used as verification samples. The comparison of accuracy results is shown in Figure 11. The classification accuracy based on Sentinel-2 was slightly higher than that based on GF-1, indicating that the image quality of Sentinel-2 was reliable. Meanwhile, the classification accuracy based on GF-1 images from March to June reached 85%, which supports the choice of the March to June images as the earliest identification period for seed maize.

Early Identification of Seed Maize Based on Historical Samples
In addition to timeliness, the economic cost and safety of field sampling were also considered in the early identification of seed maize. We therefore explored using historical samples to achieve rapid identification. In this study, new samples for the classification year were generated from historical data and current-year image data. This method was proposed because the crop planting structure and the seed maize fields were stable in the research area. Potential samples were extracted by superimposing the image clustering results of the classification year on pixels that had the same crop type in the historical years. We also took the class proportions of the samples into account. To obtain clustering results quickly, the experiment was conducted on the Google Earth Engine platform. Finally, we used the new samples to classify crops with images of different time series lengths. The new samples are shown in Figure 12.
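The sample generation logic can be sketched as follows. The sketch assumes that the cluster assignment of each pixel for the classification year is already available (in the study the clustering was run with k-means on Google Earth Engine, which is not reproduced here); the function names, pixel identifiers, and labels are hypothetical, and only three historical maps are used instead of twenty to keep the example short.

```python
from collections import Counter, defaultdict

def stable_pixels(historical_maps):
    """Keep pixels that received the same crop label in every historical map.

    historical_maps: list of dicts mapping pixel id -> crop label,
    one dict per historical classification result.
    """
    first, rest = historical_maps[0], historical_maps[1:]
    return {pid: label for pid, label in first.items()
            if all(m.get(pid) == label for m in rest)}

def select_new_samples(potential, cluster_of):
    """Keep potential samples whose label matches the majority label of their cluster.

    potential: pixel id -> historical crop label (stable pixels).
    cluster_of: pixel id -> cluster id from the current-year clustering.
    """
    votes = defaultdict(Counter)
    for pid, label in potential.items():
        votes[cluster_of[pid]][label] += 1
    majority = {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
    return {pid: label for pid, label in potential.items()
            if majority[cluster_of[pid]] == label}

# Three hypothetical historical maps over five pixels.
base = {1: "seed maize", 2: "seed maize", 3: "common maize", 4: "common maize", 5: "seed maize"}
changed = dict(base)
changed[5] = "common maize"          # pixel 5 is not stable across maps
potential = stable_pixels([base, changed, base])

# Hypothetical current-year cluster ids for each pixel.
cluster_of = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
new_samples = select_new_samples(potential, cluster_of)
```

Here pixel 5 is discarded because its historical label is not stable, and pixel 3 is discarded because its historical label disagrees with the majority label of its current-year cluster.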

Figure 12. The spatial distribution of the new samples including common maize, seed maize, and other crops, which were generated from historical samples combined with spectral information.
The newly generated samples were used as the training data of the classification model, and one-third of the true samples were used as verification samples. The models were constructed from the feature sets of different time series lengths, and the classification results were evaluated with the confusion matrix. The accuracy results are shown in Figure 13. On the whole, the classification accuracy increased as the time series was extended, and the accuracy for the full time series reached 74%. With the March to July series, the accuracy exceeded 60%, which could basically meet the requirements of the relevant management departments for the early identification of seed maize. The results show considerable potential for the use of historical samples: making full use of them can save substantial labor and material resources and improve the inter-annual generalization ability of models. The use of historical samples is also an important way to achieve large-scale rapid mapping.

Discussion
This paper evaluated the early identification of seed maize. As far as we know, many seed companies use unmanned aerial vehicles (UAVs) to distinguish crop phenotypes and genotypes. In this study, however, it was difficult to monitor such widely distributed locations with UAVs, because each contiguous seed maize field covers from several dozen to several hundred hectares. Moreover, the endurance of most UAVs is insufficient for large-scale monitoring of seed maize. Satellite remote sensing is therefore a good choice for fast and efficient large-scale monitoring, and given the average size of seed maize fields, the spatial resolution of the GF-1 and Sentinel-2 images selected in this paper can support the identification of seed maize.
In previous studies on seed maize identification [3-6], identification had to wait until August or even until after harvest, which leads to a lag in information. The method used in this paper can obtain the area and spatial distribution of seed maize at the end of June, when the OA of classification reached 89% and the PA and UA of seed maize exceeded 90%, and thus enables large-scale rapid mapping of seed maize. In addition to the length of the time series, we also explored the classification algorithm. Deep learning methods are becoming increasingly prominent in crop classification, and the LSTM method used in this study identified seed maize well: the highest classification accuracy exceeded 90%, and the PA and UA of seed maize exceeded 94%. The accuracy of LSTM was also very stable. This result means that seed maize can be identified with high accuracy from Sentinel-2 images acquired in the early and middle parts of the growing season. Nonetheless, recurrent neural networks (RNNs), of which LSTM is one type, may suffer from problems such as gradient explosion. The classification model could therefore be improved by combining it with a convolutional neural network (CNN) [53,54] in later seed maize identification. Because deep learning methods are sensitive to feature differences and have a strong learning ability, we think deep learning could also be used to extract features.
Furthermore, previous methods [3-6] required a large amount of sample data, which costs considerable labor and material resources. Making full use of historical samples is therefore of great significance for large-scale crop mapping: it can save cost, enable rapid mapping, and provide effective information for decision-making departments to act in time. In this study, historical samples were reused to generate new samples for the early prediction of seed maize. This approach achieved generally satisfactory results. The classification accuracy was 74% with the full time series feature sets and 63% with the feature sets from March to July, which meets the basic requirements of the regulatory authorities for early identification of seed maize. This approach effectively improved the utilization of historical data and offers a new idea for the continued development of crop mapping in the many regions that lack crop samples. We made a preliminary attempt at generating new samples with k-means by combining historical samples with spectral information. Generative adversarial networks (GANs) [55] have become one of the most promising methods for unsupervised learning on complex distributions in recent years. They have great potential for cluster analysis of remote sensing images when generating new samples, since they could learn the phenological characteristics of crops more fully and improve sample purity.
Although this study obtained high accuracy in the early identification of seed maize, it has some limitations. The available Sentinel-2 and GF-1 images appear to be scarce. Fusing multi-source data can enhance the temporal resolution. For example, satellite image data from platforms such as Sentinel-1 and Landsat 8 could be added for comparison and fusion research to obtain denser time series image coverage of the monitoring area and to make full use of crop phenology, improving the precision and reliability of remote sensing identification of seed maize.
Besides the accuracy of classification results, the computational efficiency of modeling also deserves attention. Remote sensing data, as big data, requires a huge amount of computing resources from preprocessing through the whole classification process. Cloud computing platforms such as Google Earth Engine can assist with the classification calculations. In this study, we used Google Earth Engine to cluster the data of the classification year, which achieved efficient computing, rapid operation, and reasonable allocation of resources. Cloud computing is also a promising direction for the future development of crop classification.

Conclusions
The remote sensing identification of seed maize needs to distinguish differences in planting patterns and varieties of the same crop. The ability to identify seed maize fields in the early and middle parts of the growing season is crucial for agricultural administration departments to carry out market supervision. The core question of this paper is how early seed maize can be identified from Sentinel-2 images during the growth period, and what accuracy is acceptable. We addressed this question from three aspects: classification algorithm, data source, and historical samples. To this end, this study used Sentinel-2 and GF-1 images as data sources. The spectral reflectance characteristics of seed maize, common maize, and other surrounding crops in the remote sensing images were analyzed to form a multi-phase spectral feature system, and a method for the early identification of seed maize was explored. The main conclusions are as follows:
As the time series of the feature set was extended, the identification accuracy of seed maize based on KNN, RF, and LSTM increased. However, the accuracy of the SVC algorithm began to decrease after reaching its peak with the March to June series: as more and more Sentinel-2 images were added, the sample dilution problem appeared in the SVC algorithm. Among the four classification algorithms, LSTM was selected as the most suitable classifier. Seed maize mapping of the experimental area could be carried out at the end of June, and the accuracy could meet the basic requirements of market supervision and regulation.
In terms of data source, we compared the classification results of Sentinel-2 and GF-1. The accuracy comparison further confirmed that Sentinel-2 was a good choice of data in this study.
From the perspective of historical samples, we found that the earliest identification of seed maize could be achieved in July. This finding shows the great potential of historical samples. This paper makes comprehensive use of historical samples and remote sensing data to generate new samples as training data based on expert knowledge, with true samples used as validation data. The approach has generally achieved satisfactory results.
However, the amount of sample data remains an important problem in the early identification of seed maize, and for areas lacking samples the task still faces significant challenges. It is therefore necessary to further study how to transfer sample characteristics from one region to another with the support of transfer learning.