remote Mapping Crop Distribution Patterns and Changes in China from 2000 to 2015 by Fusing Remote-Sensing, Statistics, and Knowledge-Based Crop Phenology

: Maps of different kinds of crops offer information about both crop distribution and crop mix, which support analyses on food security, environmental change, and climate change. Despite the growing capability for mapping speciﬁc crops, the majority of studies have focused on a few dominant crops, whereas maps with a greater diversity of crops lack research. Combining cropping seasons derived from MODIS EVI data, regional crop calendar data, and agricultural statistical surveys, we developed an allocation model to map 14 major crops at a 1 km resolution across China for the years 2000, 2010, and 2015. The model was veriﬁed based on the ﬁtness between the area of the three typical combinations of region, crop/crop group derived from remote sensing data, and statistical data. The R 2 , indicating ﬁtness, ranged from 0.51 to 0.75, with a higher value for the crops distributed in plain regions and a lower value in regions with topographically diverse landscapes. Within the same combination of region and crop/crop group, the larger harvest area a province has, the higher its ﬁtness, suggesting an overall reliable result at the national level. A comparison of paddy rice between our results and the National Land Use/Cover Database of China showed a relatively high R 2 and slope of ﬁtness (0.67 and 0.71, respectively). Compared with the commonly used average allocation model, and without lending cropping season information, the diversity index of the results from our model is about 30% higher, indicating crop maps with greater spatial details. According to the spatial distribution analysis of the four main crops, the grids showing decreased trends accounted for 74.92%, 57.32%, and 59.00% of the total changed grid for wheat, rice, and soybean crops, respectively, while accounting for only 37.71% for maize. The resulting data sets can be used to improve assessments for nutrient security and sustainability of cropping systems, as well as their resilience in a changing climate.


Introduction
Crops planted on croplands play a key role in providing human society with food [1], offering more than 90% of all food calories and approximately 80% of all food protein and fats available in the world by generating food and feed crops [2]. Due to crop-specific physiological characteristics and requirements for management, the environmental consequences [3,4], such as water discharge, risks of excess fertilizer application, and greenhouse gas (GHG) emissions, vary among different cropping systems and among different places. Therefore, identifying how crops are distributed is the basis for recognizing the sustainability of cropping systems [5] and for promoting pathways to food security [6].
China contains approximately 18% of the world's population [7], but only 8% of global arable land and 5% of global renewable water resources [8]. Efficiently producing sufficient quantities of various foods to fulfill the nutritional needs of its people is of great importance for China, as well as for the world. Crop maps are the basis for investigating food security 2 of 19 and its sustainability by helping assess food production and environmental impacts, such as water consumption, non-point pollution, and so on. They also provide information for explicitly estimating the impacts of food security within climate change as different crops perform differently [9]. However, the huge variations in climate conditions and geographic characteristics within China lead to heterogeneity in cropping systems among regions [10]. Mapping the crop distribution of China is thus difficult, though it is critical to understand changes of the cropping systems.
Surveys and censuses, remote sensing techniques, and the combination of these two methods are the three major ways to obtain information on crop distribution. Tradeoffs among scale, spatial resolution, and number of crops of concern always exist when choosing a method. Field surveys and censuses are the traditional methods for obtaining crop information, and they offer an overall understanding of the area for all types of crops at different administrative levels but lack spatial details. For example, FAOSTAT [11] has helped to understand the evolution of cropping systems with detailed information on 14 groups and 173 types of food products around the world and in all countries. In China, the National Bureau of Statistics [12] updates the areas of 27 crops in 9 groups in 31 provinces every year, while the Ministry of Agriculture and Rural Affairs investigates the crop area of these crops every year in more than 2000 counties [13]. This information offers a general picture of the cropping systems around the world and in China; however, this information is not enough to feed the various models designed to detect climatic or environmental impacts on cropping systems sustainability [14].
With the development of remote sensing techniques, the capability of detecting spatially explicit crop types has improved greatly. Heavily relying on temporally resolved information, imagery from MODIS (the moderate resolution imaging spectroradiometer), has been the most popular data in mapping crops at a large scale since the launch of the MODIS satellites, due to its unique combination of temporal, spatial, and spectral resolutions [15,16]. To improve the spatial resolution of the imagery, a method for fusing data between relatively higher resolution remote sensing images (i.e., Landsat) and MODIS, with its higher temporal resolution, has been developed [17][18][19].
In recent years, the free access of moderate-high resolution sensor data, such as Landsat and Sentinel imagery, has largely resolved the tradeoff between spatial and temporal resolutions [20,21]. To enhance temporal resolution, data with similar spatial resolutions have long been combined. Landsat and Sentinel data have commonly been merged for crop mapping, as well as for fusing optical and radar data [20,[22][23][24].
Based on low-moderate resolution data, such as MODIS data, or using moderate-high resolution data, such as Landsat and Sentinel data, the crop types that can be successfully identified by remote sensing data at regional or national scales mainly correspond to major crops with large planting areas, such as rice [25][26][27], wheat [28], maize [29], soybean [30], and sugarcane [31], and to different combinations of these crops [32][33][34][35]. Less common crops are generally not included due to their low visibility in the signals from the remote sensing images.
When performing large scale crop mapping with a full range of crop types, studies commonly utilize a combination of remote sensing data and agricultural statistics. This technique takes advantage of both the comprehensive information on crop types from the statistical data and the spatial information of cropland from the remote sensing data [14]. To develop more spatially heterogeneous crop maps, climate factors, soil characteristics, population distribution, and other factors are also introduced into the allocation model [36][37][38].
While most of the large-scale maps of a large number of crops developed by blending remote sensing and census data mainly only involve cropland data derived from remote sensing images, the role of crop phenology reflected by time series remote sensing data in disaggregating census data remains relatively unexplored. Some studies [39,40] have provided an example of mapping various types of crops by employing crop growth processes reflected by time series remote sensing data; however, these studies still focused on several major crops or in small regions. How information obtained from remote sensing Remote Sens. 2022, 14, 1800 3 of 19 data, except for cropland distribution, could be engaged in mapping the distribution of large spans of crop types is still a question.
This study aims to develop a model to map the distribution of various crops at a large scale. This method disaggregates statistical crop areas based on cropping seasons derived from time series remote sensing data and regional crop calendars, thereby making good use of a variety of crop type information from statistical data and spatial details from remote sensing data. For the years 2000, 2010, and 2015, we applied this method in China to develop 1 km resolution maps of 14 crops, including wheat (winter and spring), rice (early, middle, and late), maize, soybean, rapeseed, sugarcane, etc. Combined, these crops comprise 76% of the total harvest area and 87% of the total crop production, nationally. The results contribute to identifying the distribution of crops and their changes between 2000 and 2015, which is critical to understanding the changes in cropping systems, and then adjusting the pathways accordingly to build more sustainable cropping systems and ensuring food security as much as possible in the future.

Land Use Data
We mapped the crop distribution on top of the national cropland map, which was derived from the national land use/cover database of China (NLCD-C) [41]. The NLCD-C was obtained by visual interpretation based on remote sensing data with a spatial resolution of approximately 30 m. It was originally at a 1:100,000 scale and in vector format. It includes land use maps for approximately every five years from the late 1980s to 2015. We used the maps from 2000, 2010, and 2015, and extracted cropland maps from them to identify cropland grids.

Remote Sensing Data
The 16 day composite MODIS EVI (MOD13Q1) time series data at a spatial resolution of 250 m was used to obtain the cropping seasons. We obtained MODIS data from the NASA Distributed Active Archive Center (DAAC). A total of 20 titles was included to cover the whole of China, with 23 phases for each title for each year. We projected all MODIS EVI images to the Albers conical equal-area projection to match with the land use data described above. We transformed the land use data from a vector file into a raster file with 250 m × 250 m grid cells to match the MODIS images and identified which grid cells were cropland. Areas without distributed cropland were masked out from all titles separately in each study period.

Statistical Data
In order to get crop distribution maps, the statistic crop harvest data were allocated onto different kinds of cropland. County-level crop harvest data for four major crops-wheat (including winter wheat and spring wheat), maize, rice (including early rice planted in April, middle rice planted in May or June, and late rice planted in July), and soybean-were obtained for all three periods [42]. For the other ten crops, we obtained the county-level data only for 2000, while provincial data for the other two periods were obtained from the Chinese Statistical Yearbook [43]. We used the provincial data to get the change rate for each province and scaled the county-level harvest area in 2000 to the targeted year by the change rate. For both the county-level data and provincial data, we estimated the value in each study year from the average value of harvested area over a three-year interval, including the base year and the years before and after the base year. For county-level data, we cross-checked the sum of county level data for each province with the provincial value and scaled the value for each county to ensure that the sum matched the value for that province.

Regional Phenology Data
Phenology data, recording the date of planting, heading, flowering, and harvest for major crops in different provinces, were obtained through the China Agriculture Informa- tion Center (http://www.agri.cn/, accessed on 15 February 2022). Twelve crops/subtypes of crops, including winter wheat, spring wheat, maize, early rice, middle rice, late rice, soybean, cotton, rapeseed, groundnut, sugarcane, and sugar beet, were contained in the dataset. We also obtained phenology information for other crops through a literature review. This was integrated with county-level crop calendar information in China from Global Agro-Ecological Zones (GAEZ) data [44]. We finally divided the whole country into six agricultural regions ( Figure 1, Table 1), which have different crop calendars. Along with wheat and rice, groundnut and maize have different varieties that can be planted in different seasons in the same region, such as spring groundnut and summer groundnut in the North China Plain and spring maize and summer maize in central western China. Since we do not have access to values for the harvest area of the different varieties of these two crops, we did not separate them.

Methods
We allocated the crop harvested areas on to cropland by matching different crops to the corresponding cropping season of different cropping patterns in each agricultural region, and then analyzed the change patterns of crop distribution in China from the years 2000 to 2015. Considering the resolution of the auxiliary information, including the statistical data (county level) and remote sensing data (250 m), we developed crop distribution maps at a spatial resolution of 1 km, for which the value of each grid cell indicates the cropland area in this 1 km 2 region. A graphic overview of the methods used is shown in Figure 2.

Cropping Seasons Detecting
Consistent with Wardlow, Egbert, and Kastens [16], we used a time series of MODIS EVI to detect crop phenology. The phase and amplitude of the crop phenology are com-

Methods
We allocated the crop harvested areas on to cropland by matching different crops to the corresponding cropping season of different cropping patterns in each agricultural region, and then analyzed the change patterns of crop distribution in China from the years 2000 to 2015. Considering the resolution of the auxiliary information, including the statistical data (county level) and remote sensing data (250 m), we developed crop distribution maps at a spatial resolution of 1 km, for which the value of each grid cell indicates the cropland area in this 1 km 2 region. A graphic overview of the methods used is shown in Figure 2.

Methods
We allocated the crop harvested areas on to cropland by matching different crops the corresponding cropping season of different cropping patterns in each agricultural r gion, and then analyzed the change patterns of crop distribution in China from the yea 2000 to 2015. Considering the resolution of the auxiliary information, including the stati tical data (county level) and remote sensing data (250 m), we developed crop distributio maps at a spatial resolution of 1 km, for which the value of each grid cell indicates th cropland area in this 1 km 2 region. A graphic overview of the methods used is shown Figure 2.

Cropping Seasons Detecting
Consistent with Wardlow, Egbert, and Kastens [16], we used a time series of MOD EVI to detect crop phenology. The phase and amplitude of the crop phenology are com

Cropping Seasons Detecting
Consistent with Wardlow, Egbert, and Kastens [16], we used a time series of MODIS EVI to detect crop phenology. The phase and amplitude of the crop phenology are com-monly used to differentiate crop types [45,46], while the aggregated analysis of the detected crop phenology could be used to define cropping patterns, such as single, double, and triple cropping. Here, we first detected the number of the peaks in the time series curve of the MODIS EVI data and then examined the phase of the peaks. Based on these, we defined the cropping seasons included in each pixel. Figure 3 illustrates the existing cropping patterns in regions where double cropping is possible (Figure 3a,b) and those in regions where triple cropping is possible (Figure 3c-f). Curves with yellow dots (Figure 3a,c) indicate patterns in which double or triple cropping is possible, but only one cropping season was used. This circumstance existed, but was not in the majority. Therefore, for plots with a single cropping pattern, we did not differentiate patterns with different phases of the peak. For double cropping curves detected in regions where triple cropping is possible, there might be two circumstances: one is winter crops followed by summer crops (Figure 3d), and the other is double rice or early/late rice with other crops of a short growing season (Table 1) and without winter crops ( Figure 3e). We differentiated these two patterns based on the phase of the first peak.
Remote Sens. 2021, 13, x 6 of 19 monly used to differentiate crop types [45,46], while the aggregated analysis of the detected crop phenology could be used to define cropping patterns, such as single, double, and triple cropping. Here, we first detected the number of the peaks in the time series curve of the MODIS EVI data and then examined the phase of the peaks. Based on these, we defined the cropping seasons included in each pixel. Figure 3 illustrates the existing cropping patterns in regions where double cropping is possible (Figure 3a,b) and those in regions where triple cropping is possible (Figure 3cf). Curves with yellow dots (Figure 3a,c) indicate patterns in which double or triple cropping is possible, but only one cropping season was used. This circumstance existed, but was not in the majority. Therefore, for plots with a single cropping pattern, we did not differentiate patterns with different phases of the peak. For double cropping curves detected in regions where triple cropping is possible, there might be two circumstances: one is winter crops followed by summer crops (Figure 3d), and the other is double rice or early/late rice with other crops of a short growing season (Table 1) and without winter crops ( Figure 3e). We differentiated these two patterns based on the phase of the first peak. cropping could potentially be applied, though it is not necessarily applied (a,b) and in regions where triple cropping could potentially be applied, though it is not necessarily applied (c-f). Curves with green dots are the major patterns considered in this study. The curves with yellow dots in (a,c) indicate the presence of a single cropping pattern, but these patterns were not the majority and so were not separated from the majority type presented by green dots. We named these patterns as follows: pattern 2-1 (a), pattern 2-2 (b), pattern 3-1 (c), pattern 3-2 (d), pattern 3-3 (e), and pattern 3-4 (f). Consistent with Zuo et al. (2013), we avoided false peaks by discarding data obtained after 320 Julian day, which was especially helpful when winter wheat was planted.
To detect the cropping seasons for each pixel, we firstly used a harmonic analysis of time series (HANTS) [47] to remove interference signals and restructure the time series curve of EVI. We excluded data after November to avoid fake peaks for winter wheat before it hibernates [47,48]. However, there still might be some fake peaks in the reconstructed EVI time series curve, such as the low peak of winter wheat before it hibernates. We additionally introduced some constraints before detecting the actual crop growing peaks. On the one hand, since the crop growth process usually lasts for a certain time, we defined a peak as an actual peak if it followed at least two continuous phases with a rising EVI value, and was followed by two continuous phases with decreasing EVI values. On the other hand, a plot in the triple cropping pattern could only occur in the area where the cropping could potentially be applied, though it is not necessarily applied (a,b) and in regions where triple cropping could potentially be applied, though it is not necessarily applied (c-f). Curves with green dots are the major patterns considered in this study. The curves with yellow dots in (a,c) indicate the presence of a single cropping pattern, but these patterns were not the majority and so were not separated from the majority type presented by green dots. We named these patterns as follows: pattern 2-1 (a), pattern 2-2 (b), pattern 3-1 (c), pattern 3-2 (d), pattern 3-3 (e), and pattern 3-4 (f). Consistent with Zuo et al.
(2013), we avoided false peaks by discarding data obtained after 320 Julian day, which was especially helpful when winter wheat was planted.
To detect the cropping seasons for each pixel, we firstly used a harmonic analysis of time series (HANTS) [47] to remove interference signals and restructure the time series curve of EVI. We excluded data after November to avoid fake peaks for winter wheat before it hibernates [47,48]. However, there still might be some fake peaks in the reconstructed EVI time series curve, such as the low peak of winter wheat before it hibernates. We additionally introduced some constraints before detecting the actual crop growing peaks. On the one hand, since the crop growth process usually lasts for a certain time, we defined a peak as an actual peak if it followed at least two continuous phases with a rising EVI value, and was followed by two continuous phases with decreasing EVI values. On the other hand, a plot in the triple cropping pattern could only occur in the area where the annual accumulated Remote Sens. 2022, 14, 1800 7 of 19 temperature above a base temperature was 5000 • C [49]. Based on the above terms, we could count the number of actual peaks in the reconstructed image pixel by pixel. For pixels with two peaks in the triple cropping region, we defined the pixel with the first peak occurring after June as the one in which double rice was planted (Figure 3e).
The entire procedure includes four steps: (1) Defining the relationship between EVI values in adjacent phases is calculated as follows: where, a i and a i+1 indicate the EVI values in phases i and i + 1, and ∆a i denotes the relationship between a i and a i+1 .
(2) The difference of adjacent ∆a i to determine the position of local peak and vale is calculated as follows: If ∆∆a i was less than zero, a local peak occurred in phase i − 1; if ∆∆a i was larger than zero, a local vale occurred in phase i − 1.
(3) Removing fake peaks according to the additional requirements mentioned above.
(4) Detecting the phase of the first peak for double cropping pixels in regions where triple cropping is possible.

Crop Harvest Area Allocation
We distributed the statistic harvest area of different crops onto different types of cropland defined by cropping patterns in each agricultural region. ArcGIS software was used in this process. We determined the harvest area of each crop in each grid by the following formulation: where H A ij is the harvest area of crop i in grid j in a 1 km × 1 km resolution; CA j is the area of cropland in grid j, which was obtained from the NLCD; F cljk is the area fraction of cropland with pattern k to total cropland in grid j, which can be obtained from the MODIS-derived cropping pattern information described in Section 3.1; H I ij is the harvest index of crop i on cropland in pattern k, i.e., the harvest area of crop i per unit of cropland in pattern k; and H I ij is county-specific since the smallest granularity of harvest area is county. H I ij was calculated as follows: where H A il is the harvest area of crop i in county l and CA kl is the area of cropland in pattern k in county l.
For each county, the cropland obtained from NLCD-C was divided into several cropping patterns based on the methods described in Section 3.1. Areas of cropland in pattern k were calculated by multiplying the proportion of cropland grids in pattern k by the total cropland area in this county. If the crop can be potentially distributed in two types of cropping patterns, such as groundnut in the North China Plain, we equally distributed the harvest area of this crop onto the two types of cropland. This means that only if we can get the harvest area of different varieties of this kind of crop, otherwise, for crops that can be potentially distributed in two cropping patterns in double cropping regions, the results based on our model will be similar to the commonly used average allocation. This situation occurs in 11 crop-region combinations out of the total 68 crop-region combinations.
The cropping patterns in Table 1 and phenology types in Figure 2 were used to allocate harvested area for each crop. Meanwhile, because of the discrepancies between statistical data and data derived from remote sensing images [50], as well as the misclassification of cropping patterns due to noise in the remote sensing data [51,52]), the area of cropland in a certain pattern might be smaller than the sum of the crops that are supposed to be distributed in this type of pattern, i.e., ∑ i H I ik > 1. We developed the following principles, based on the crop calendar, to adjust the results to make it more reasonable:

•
For winter crops, we allocated them onto the cropland with a spring peak, that is, croplands in patterns 2-2, 3-2, and 3-4. If the sum area of the winter crops was larger than croplands with a spring peak, we allocated the excess area of winter crops onto the croplands with one cropping cycle, i.e., the croplands in patterns 2-1 and 3-1. We did not allocate winter crops onto pattern 3-3 because there was no spring peak in this type of cropland, according to our detection.

•
For spring or summer crops existing on single or double cropped lands, we allocated them onto the corresponding type of croplands, according to H I ik > 1 (k denotes patterns 3-3 and 3-4), we moved the crops marked in green that were not exclusively located in cropland in patterns 3-3 and 3-4 out onto another pattern where they might be located, so as to make sure the HI of all the crops in each grid were less than or equal to 1.

Verification and Accuracy Check
The accuracy of the methodologies proposed here is largely determined by the consistency between crop harvest area estimated from an area of cropping patterns derived from remote sensing data and the statistic crop harvest area that was supposed to be located on the corresponding cropping pattern. We applied linear regression to fit the above two sets of data, used the coefficient of determination to check the fitness, and used the slope to examine the consistency of these two sets of data.
We chose three pairs of cropping seasons in zones, with a relatively small group of crop types that can be planted, to examine the fitness (according to Table 1): (1) winter crops in potentially double cropped areas (WCD), exclusively planted with winter wheat and rapeseed; (2) winter crops in potentially triple cropped areas (WCT), exclusively planted with winter wheat, rapeseed, barley, and potato, and soybean in southern China (which can be distributed in both winter and the main season, but with very small area); and (3) main season crops in triple cropped areas (MCT), mainly including early rice, late rice, and some minority crops such as maize, soybean, and groundnut.
Analyses of the accuracy were applied on three levels: first, the overall fitness was obtained for all counties; second, counties of four provinces with the largest harvest area, together accounting for more than 70% of harvest area, were selected to examine the fitness separately; and third, we compared the fitness of each related province with its proportion of the harvest area of the crop of concern against the total area in China in order to test the assumption that the larger the planted area, the higher the accuracy would be.
We also employed paddy field areas from NLCD-C to check the accuracy of rice distribution. Although the paddy fields were not necessarily planting rice, most paddy fields are planted with rice, in China. We again used linear regression to fit these two datasets, the coefficient of determination to check the fitness, and the slope to examine the accuracy.
In order to compare the differences of the results from the different methods, we also employed the Shannon diversity index (SHDI). The SHDI is a measure of the disorder or uncertainty, which indicates the heterogeneity of a landscape or species. Here, we investigated the heterogeneity of patches in terms of harvest area for certain crops at the county level by SHDI. To do this, we compared the SHDI of our results with that of the average allocation model to examine which can offer more spatial details of crop distribution. The SHDI was calculated as follows: where P ij indicates the proportion of range i of harvest area of crop j at the grid level in the total harvest area of crop j in a county.

Comparison of Area of Crop Groups Obtained by Remote Sensing with Statistical Data
It is almost impossible to find true spatial distribution for all crops at the country level to verify our results. The accuracy of the area of cropland planted with different group of crops obtained from MODIS images is critical for the accuracy of the final results. The process of getting this value didn't include statistical data, but it will be used to allocate the statistical crop area. Therefore, we checked the consistency between the area of crop groups in certain cropping seasons gained from remote sensing images and the one recoded by statistical data, based on which the reliability of our method could be assessed. The coefficient of determination for the results from the remote sensing images and statistical data across the country is 0.75 for WCD, 0.71 for WCT, and 0.51 for MCT, while the slope of the fitting model is 1.20 for WCD, 1.27 for WCT, and 1.02 for MCT (Figure 4a). Generally, the results obtained from remote sensing images matched the statistical data well.
The fitness of the two winter crops (WCD and WCT) was much better than that of the main season crops in triple cropped regions (MCT). For one thing, MCTs were concentrated in southern China, where landscape was fragile and mixed pixels were commonly found. For another thing, the planting season of MCTs was in the rainy season, which created the noise fluctuation of the EVI time series curve beyond the one reflecting growth process, leading to a relatively lower accuracy on crop seasons extraction in this region. The fitness of WCDs was higher than that of WCTs. Though they are both winter crops, WCDs were mainly concentrated in the North China Plain, where patches of cropland were large and mixed pixels were rare; however, WCTs were mostly located in southern China, where hills and mountains led to fragile landscapes and mixed pixels.
We further checked four provinces with the largest area for each pair in 2015, which together accounted for more than three quarters of the total harvest area of the corresponding crops. Most of these provinces had higher fitness in terms of R 2 than the national average ( Figure 4g-i), except the province with fourth largest harvest area for WCTs, which accounted for less than 10% of total harvest area. Generally, the ranges of R 2 for WCTs (0.59-0.90) and MCTs (0.50-0.76) are larger than for WCDs (0.77-0.86). This can be attributed to the fact that the crop types included in WCDs were merely winter wheat and rapeseed exclusively planted in winter season, while more than three crops were included in WCTs and MCTs and some crops might be planted in other seasons, though with small areas, which will introduce higher uncertainty.
Analysis of the relationship between the fitness in terms of R 2 and the proportion of the crop area for the corresponding province in national totals reveals that the model does better in provinces with larger areas of the crops of concern for all the three periods (Figure 4d-f). The R 2 for WCDs are high in most provinces and less significant in the correlation with the proportion of crop area. However, the R 2 for WCTs and MCTs has larger variation and is more significantly correlated with the proportion of crop area, which is mainly due to large variations in landscape [47], complexity of crop types, and noise in signals in the different provinces. mainly due to large variations in landscape [47], complexity of crop types, and noise in signals in the different provinces. plots of fitness, represented by R 2 , against the percent of harvest areas of the crops of concern at the province level; (g-i): comparisons at the county level for the first four provinces with the highest harvest areas of the crops of concern.

Comparison of Rice Distribution with Paddy Fields from the NLCD-C
We also carried out quantitative validation when third-party independent crop maps were available. The NLCD-C is a national land use/cover database of China which was obtained by artificial visual interpretation based on remote sensing data with a spatial resolution of approximately 30 m. Rice, one of the most important crops, occupies the largest area of arable land in China. By assuming paddy fields were mostly used for growing rice, we calculate the area of paddy fields at a 1 × 1 km grid level for the NLCD-C. We further calculated the coefficient of determination (R 2 ) and the slope of the fitting between the percent of rice (paddy field) area in each grid for the NLCD-C and our results. The value of R 2 is 0.67 and the slope of the fitting is 0.71 (Figure 5), indicating a relatively high reliability. Compared with the validation results of other crop allocation models like SPAM (R 2 for rice validation with the NLCD-C is 0.49), the accuracy of our crop allocation results is relatively high. There are potentially many factors affecting the discrepancy if we treat the NLCD-C as the truth, such as the accuracy of the NLCD-C and the difference in spatial resolution between the two datasets. (d-f): plots of fitness, represented by R 2 , against the percent of harvest areas of the crops of concern at the province level; (g-i): comparisons at the county level for the first four provinces with the highest harvest areas of the crops of concern.

Comparison of Rice Distribution with Paddy Fields from the NLCD-C
We also carried out quantitative validation when third-party independent crop maps were available. The NLCD-C is a national land use/cover database of China which was obtained by artificial visual interpretation based on remote sensing data with a spatial resolution of approximately 30 m. Rice, one of the most important crops, occupies the largest area of arable land in China. By assuming paddy fields were mostly used for growing rice, we calculate the area of paddy fields at a 1 × 1 km grid level for the NLCD-C. We further calculated the coefficient of determination (R 2 ) and the slope of the fitting between the percent of rice (paddy field) area in each grid for the NLCD-C and our results. The value of R 2 is 0.67 and the slope of the fitting is 0.71 (Figure 5), indicating a relatively high reliability. Compared with the validation results of other crop allocation models like SPAM (R 2 for rice validation with the NLCD-C is 0.49), the accuracy of our crop allocation results is relatively high. There are potentially many factors affecting the discrepancy if we treat the NLCD-C as the truth, such as the accuracy of the NLCD-C and the difference in spatial resolution between the two datasets. Remote Sens. 2021, 13, x 11 of 19 Figure 5. Comparison of rice area from our results and paddy fields extracted from the NLCD-C.

Comparison with Average Allocation Results
We compared the spatial details of our results with those obtained from the average allocation model both visually and quantitively. Figure 6 shows the distribution from the two models for the 14 crops (early rice, late rice, and middle rice were grouped as rice, and spring wheat and winter wheat were grouped as wheat) in counties with large harvest areas. Obviously, all 14 crops have larger variations in harvest area at the pixel level, and some crops, such as wheat, sugar beet, and barley, were concentratedly distributed in certain areas for our results, which cannot be detected by the average allocation model.

Comparison with Average Allocation Results
We compared the spatial details of our results with those obtained from the average allocation model both visually and quantitively. Figure 6 shows the distribution from the two models for the 14 crops (early rice, late rice, and middle rice were grouped as rice, and spring wheat and winter wheat were grouped as wheat) in counties with large harvest areas. Obviously, all 14 crops have larger variations in harvest area at the pixel level, and some crops, such as wheat, sugar beet, and barley, were concentratedly distributed in certain areas for our results, which cannot be detected by the average allocation model.

Comparison with Average Allocation Results
We compared the spatial details of our results with those obtained from the average allocation model both visually and quantitively. Figure 6 shows the distribution from the two models for the 14 crops (early rice, late rice, and middle rice were grouped as rice, and spring wheat and winter wheat were grouped as wheat) in counties with large harvest areas. Obviously, all 14 crops have larger variations in harvest area at the pixel level, and some crops, such as wheat, sugar beet, and barley, were concentratedly distributed in certain areas for our results, which cannot be detected by the average allocation model.  We further randomly chose eight counties to calculate the SHDI for spatial details within each county for each crop, respectively. A much larger SHDI was found in our results, with a maximum value between 6 and 7, while the maximum SHDI for the results based on the average allocation model was less than 5, meaning a higher spatial heterogeneity was detected by our model (Figure 7). The mean value of the SHDI of our results was about 1.3 times those from the average allocation model. Taking wheat in Xintai County of Shandong Province as an example, the SHDI for the average allocation was 4.69, whereas that of our allocation result was 5.97. Therefore, our results offer high spatial heterogeneity by effectively linking cropping seasons with crop types.
Plain; potato in Ankang County, the southwest region; maize in Renshou County, the southwest region; rice in Fengcheng County, the Yangtze River Middle and Lower Plain; and sugarcane in Shule County, the Southern China region.
We further randomly chose eight counties to calculate the SHDI for spatial details within each county for each crop, respectively. A much larger SHDI was found in our results, with a maximum value between 6 and 7, while the maximum SHDI for the results based on the average allocation model was less than 5, meaning a higher spatial heterogeneity was detected by our model (Figure 7). The mean value of the SHDI of our results was about 1.3 times those from the average allocation model. Taking wheat in Xintai County of Shandong Province as an example, the SHDI for the average allocation was 4.69, whereas that of our allocation result was 5.97. Therefore, our results offer high spatial heterogeneity by effectively linking cropping seasons with crop types.

Patterns of Crop Distribution
Different crops are always distributed in different regions due to crop-specific climate suitability. Maize, mainly distributed in the North China Plain and northeastern China (Figure 8), has the largest harvest area, and approximately 32% of cropland was planted with maize for one crop season in 2015. Wheat was concentrated in the North China Plain, northeastern China, and northwestern China, within which less than 10% was spring wheat, which was mainly distributed in northern China, where single cropping was applied. Double cropped rice, including early rice and late rice, was mainly planted south of the Yangtze River, whereas middle rice can also be found in northeastern China. Respectively, 5%, 6%, and 11% of the croplands were planted with early rice, late rice, and middle rice for one crop season in 2015. About 1% of the croplands were planted with the other three types of cereals in one crop season, i.e., millet (the North China Plain), sorghum (southwestern China and northeastern China), and barley (southwestern China and eastern China) (Figure 8).
Besides the three major staple crops, rapeseed and soybean were the two crops with the largest areas, and about 5% of cropland was planted with these two crops in 2015. Rapeseed was mainly concentrated in southern China, while soybean was mostly planted in northeastern China and south of the North China Plain (Figure 8). Potato, planted on 3% of the cropland, was mainly concentrated in the second ladder of topography, where the temperature is moderate. Sunflower and sugar beet were the two crops other than

Patterns of Crop Distribution
Different crops are always distributed in different regions due to crop-specific climate suitability. Maize, mainly distributed in the North China Plain and northeastern China (Figure 8), has the largest harvest area, and approximately 32% of cropland was planted with maize for one crop season in 2015. Wheat was concentrated in the North China Plain, northeastern China, and northwestern China, within which less than 10% was spring wheat, which was mainly distributed in northern China, where single cropping was applied. Double cropped rice, including early rice and late rice, was mainly planted south of the Yangtze River, whereas middle rice can also be found in northeastern China. Respectively, 5%, 6%, and 11% of the croplands were planted with early rice, late rice, and middle rice for one crop season in 2015. About 1% of the croplands were planted with the other three types of cereals in one crop season, i.e., millet (the North China Plain), sorghum (southwestern China and northeastern China), and barley (southwestern China and eastern China) (Figure 8).
Besides the three major staple crops, rapeseed and soybean were the two crops with the largest areas, and about 5% of cropland was planted with these two crops in 2015. Rapeseed was mainly concentrated in southern China, while soybean was mostly planted in northeastern China and south of the North China Plain (Figure 8). Potato, planted on 3% of the cropland, was mainly concentrated in the second ladder of topography, where the temperature is moderate. Sunflower and sugar beet were the two crops other than spring wheat that were mainly planted in northern China, while sugarcane was the only one distributed in southern China, especially in Guangxi province.
spring wheat that were mainly planted in northern China, while sugarcane was the only one distributed in southern China, especially in Guangxi province.

Change Patterns of Major Crops
From 2000 to 2015, among the 14 major crops, the planting area of maize increased significantly, and the planting area of potato, sugarcane, peanuts, and rape increased slightly. For other crops, the planting area decreased. Here, we focus on the four major crops, i.e., wheat, rice, maize, and soybean, to dig into the change patterns of their distribution (Figure 9). We separated the grid cells into two types: cells with increased area and cells with decreased area, and we defined the grid cells with area change rates larger than 20% as high change intensity.

Change Patterns of Major Crops
From 2000 to 2015, among the 14 major crops, the planting area of maize increased significantly, and the planting area of potato, sugarcane, peanuts, and rape increased slightly. For other crops, the planting area decreased. Here, we focus on the four major crops, i.e., wheat, rice, maize, and soybean, to dig into the change patterns of their distribution ( Figure 9). We separated the grid cells into two types: cells with increased area and cells with decreased area, and we defined the grid cells with area change rates larger than 20% as high change intensity. For wheat, the cells with decreased area accounted for 74.92% of all cells with area changed. High decrease intensity cells were mainly located in Heilongjiang Province, the Inner Mongolia Autonomous Region, Sichuan Province, and Shaanxi Province, with more than 30% of the cells showing a largely decreased area in these provinces. The high increase intensity cells were mainly located in the Inner Mongolia Autonomous Region, Henan Province, Hubei Province, and Shandong Province, with nearly 40% of the cells showing a largely increased area in these provinces. The Inner Mongolia Autonomous Region had regions with areas of wheat that shifted a lot, as it had plenty of cells with both high decrease intensity and high increase intensity concentratedly distributed.
For rice, the cells with decreased area accounted for 57.32% of all cells with area changed. High decrease intensity cells were mainly located in Sichuan Province, Guizhou Province, the Inner Mongolia Autonomous Region, and Guangxi Province, with more than 20% of the cells showing a largely decreased area in these provinces. The high increase intensity cells were mainly located in Heilongjiang Province, Hunan Province, Yun- For wheat, the cells with decreased area accounted for 74.92% of all cells with area changed. High decrease intensity cells were mainly located in Heilongjiang Province, the Inner Mongolia Autonomous Region, Sichuan Province, and Shaanxi Province, with more than 30% of the cells showing a largely decreased area in these provinces. The high increase intensity cells were mainly located in the Inner Mongolia Autonomous Region, Henan Province, Hubei Province, and Shandong Province, with nearly 40% of the cells showing a largely increased area in these provinces. The Inner Mongolia Autonomous Region had regions with areas of wheat that shifted a lot, as it had plenty of cells with both high decrease intensity and high increase intensity concentratedly distributed.
For rice, the cells with decreased area accounted for 57.32% of all cells with area changed. High decrease intensity cells were mainly located in Sichuan Province, Guizhou Province, the Inner Mongolia Autonomous Region, and Guangxi Province, with more than 20% of the cells showing a largely decreased area in these provinces. The high increase intensity cells were mainly located in Heilongjiang Province, Hunan Province, Yunnan Province, and Sichuan Province, with nearly 40% of the cells showing a largely increased area in these provinces. Sichuan Province had regions with areas of rice that shifted a lot, as it had plenty of cells with both high decrease intensity and high increase intensity concentratedly distributed.
For soybean, the cells with decreased area accounted for 59.00% of all cells with area changed. High decrease intensity cells were mainly located in the Inner Mongolia Autonomous Region, Heilongjiang Province, Shaanxi Province, and Hebei Province, with more than 25% of the cells showing a largely decreased area in these provinces. The high increase intensity cells were mainly located in Sichuan Province, Yunnan Province, Heilongjiang Province, and Anhui Province, with nearly 40% of the cells showing a largely increased area in these provinces. Heilongjiang Province had regions with areas of soybean that shifted a lot, as it had plenty of cells with both high decrease intensity and high increase intensity concentratedly distributed.
For maize, the cells with increased area accounted for 62.29% of all cells with area changed. High decrease intensity cells were mainly located in Shaanxi Province, Sichuan Province, Guizhou Province, and Guangxi Province, with more than 25% of the cells showing a largely decreased area in these provinces. The high increase intensity cells were mainly located in Heilongjiang Province, the Inner Mongolia Autonomous Region, Henan Province, and Jilin Province, with nearly 40% of the cells showing a largely increased area in these provinces.

Discussion
Remote sensing techniques have been considered an efficient method for crop distribution mapping. Previous methods mainly focused on the detection of only some major crops [53,54]. When referring to all kinds of crop mapping, fused remote sensing data and statistical data were commonly used; however, the average allocation method [14] or predicting models [36][37][38], such as the cross-entropy approach, were the most popular ones. We mapped 14 crops by combining crop season information derived from remote sensing images and crop harvest area from statistical data. We compared the crop areas estimated from the remote sensing images with national statistical data at the county level. The results showed a high consistency between these two values, meaning that the methodologies proposed here are effective. The error mainly comes from mixed pixels.
Inaccuracy of the results mainly comes from large numbers of mixed pixels in the fragile landscape of areas with complex crop planting patterns. From our results, we could see that the slopes of the fitting model for all three pairs of cropping seasons in zones were larger than 1 (Figure 4), indicating that the results derived from the remote sensing data were mostly larger than those from the statistical data. This is mainly because pixels in the remote sensing images though are recognized as pure cropland that always includes paths, channels, ponds, etc. in fields [47]. Areas of this infrastructure were accounted for in the remote sensing results; however, they were excluded in the statistical data. The more fragile the landscape of cropland is, the higher the inaccuracy that would occur [52,55].
Some studies have explored methods for reducing the impact of mixed pixels on the accuracy of crop mapping [56,57]. One effective way is to integrate images with high spatial resolution with those having lower spatial resolution, such as fusing Landsat data and MODIS data, to improve the spatial resolution of the datasets; however, the large amount of data was generally a challenge when referring to large-scale research [58]. Some studies have employed decision trees to try to use a small amount of data to map cropping information efficiently [59]. Fortunately, the emergence of Google Earth Engine enabled studies based on images with moderate resolution [50]. Another solution is to map fractional crop distribution by building up estimation models that illustrate the relationship between the vegetation index from data with low spatial resolution and the information at a higher resolution. In the future, methods similar to the above, and other effective methods that can improve accuracy, can be used in crop allocation to reduce the impact of mixed pixels on accuracy.
Using low-medium resolution data, such as MODIS data, or using moderate resolution data, such as Landsat and Sentinel, for crop types that were successfully identified by remote sensing data at the regional or national scale are mainly concentrated on some major crops with large planting areas, for example, rice [25][26][27], wheat [28], maize [29], soybean [30], and sugarcane [31], and mostly different combinations of these crops [32][33][34][35]. Crops with low popularity are generally not included due to their lower visibility on the signals from remote sensing images. Based on the crop seasons obtained from remote sensing, statistical data, and the crop phenology information, this paper draws the spatial distribution map of 14 crops. These crops include not only the main food crops, but also some smaller kinds of crops, such as sugarcane, peanut, barley, sunflower, and so on. The spatial distribution maps of these crops can show more detailed spatial distribution information of crops in a larger space.
We also compared our results of area change with those of other researchers. The planting area of rice increased obviously in Heilongjiang Province, which was consistent with Lu's research results, which found that the total cultivated area of rice in Heilongjiang Province increased largely from 1993 to 2011 and it expanded spatially to the northern and eastern parts of the Sanjiang Plain. Other findings consistent with this study found that the rice planting area in Northeastern China expanded quickly and increased by nearly 4.5 times during the period from 1980-2010.
Some studies also paid attention to cropping patterns in single crop regions. For example, the peak time of time series curves of VIs for spring maize and spring wheat in northern China is usually different from those of soybean and summer maize, which could be used to further differentiate these crops [59]. In the future, we could collect more detailed data on the harvest areas of different crops to utilize this information. Some other studies found that cotton planted in the North China Plain always has a longer cropping season when compared with single cropped maize [60]. This reminds us that the length of cropping season can be used to further distinguish cropping patterns in order to get more detailed crop distribution information. In conclusion, more information about the peak of VIs time series curves and the length of cropping season perhaps can be used to improve the accuracy of crop mapping.

Conclusions
Crop distribution has been recognized as an essential basis for evaluating food security, environmental change, and global climate change. Tradeoffs between spatial resolution and the number of crops of concern always exist in crop mapping. In this study, we present an improved crop allocation model by combining cropping season data detected from remote sensing images and the regional crop calendar, aiming to provide more spatially resolved crop maps with a larger span. Verification of the model showed that the harvest areas of crops/crop groups obtained by cropping seasons derived from remote sensing images matched well with the statistical data, indicating that our model is a reliable one for national crop mapping. The fitness between these two datasets on the provincial level was largely affected by landscape characteristics in that the flat areas always had a higher fitness, and thus a high accuracy due to fewer mixed pixels. Moreover, the larger a provincial harvest area is, the higher its fitness. Basically, our proposed model can map crops in a greater detail compared to the commonly used average allocation model, and it is suitable for large-scale crop mapping.