Impacts of Spatial Zonation Schemes on Yield Potential Estimates at the Regional Scale

Abstract: Simulations based on site-specific crop growth models have been widely used to obtain regional yield potential estimates for food security assessments at the regional scale. By dividing a region into nonoverlapping basic spatial units using appropriate zonation schemes, the data required to run a crop growth model can be reduced, thereby improving the simulation efficiency. In this study, we explored the impacts of different zonation schemes on estimating the regional yield potential of the Chinese winter wheat area to obtain the most appropriate spatial zonation scheme of weather sites therein. Our simulated results suggest that the upscaled site-specific yield potential is affected by the zonation scheme and by the spatial distribution of sites. As such, the distribution of a small number of sites significantly affected the simulated regional yield potential under different zonation schemes, and the zonation scheme based on sunshine duration clustering zones could effectively guarantee the simulation accuracy at the regional scale. Clustering on the environmental variable that most strongly influences the crop growth model yields a better zonation scheme for upscaling the site-specific simulation results. In contrast, with a large number of sites, the choice of zonation scheme had little effect on the simulated regional yield potential.


Introduction
Yield potential refers to potential productivity, which is entirely determined by temperature and photosynthetically active radiation when nutrients, moisture, soils, cultivars, and other agricultural technological parameters are at optimum conditions [1,2]. Yield potential estimations at the regional scale can identify the variations in the upper yield limit, optimize the planting system, and improve the use efficiency of agroclimatic resources, thereby providing information for agricultural impact and risk assessment to support policymaking [1][2][3][4][5]. Additionally, the process-based crop growth model is a robust, generic, and cost-effective tool for simulating yields under a range of agricultural and climatic scenarios; in this context, crop growth models have been extensively used to estimate yield potentials over large areas. Currently, two main upscaling methods are employed for regional applications of crop growth models. The first upscaling method links a site-specific crop growth model with regional gridded data (rain, temperature, radiation, soil, etc.) and runs the crop growth model for each grid cell, a basic homogeneous spatial unit, to obtain the regional yields [4][6][7][8]. However, while this method can moderate-temperate semiarid, moderate-temperate semihumid, warm-temperate semihumid, warm-temperate semiarid, northern subtropical humid, and mid-subtropical humid zones, each of which exhibits distinct temperature and precipitation conditions [19]. Among these zones, the annual average temperature varies from 9 to 15 °C, and the annual precipitation varies from 440 to 980 mm in the northern winter wheat region (including subregions X, VIII, and XI) and the Huang-Huai winter wheat region (including subregions I, IX, and V).
In contrast, in the winter wheat region near the middle and lower reaches of the Yangtze River (including subregions VII and IV) and the southwestern winter wheat region (including subregions II, III, and VI), the annual average temperature varies from 16 to 25 °C, and the annual precipitation generally exceeds 1000 mm [20]. The landforms of the Chinese winter wheat area include plains, hills, mountains, and basins, all of which display distinct elevation differences from east to west (the highest elevation is 5174 m, and the lowest is 142 m below sea level).

Method Roadmap
A modeling approach was used as the basis for simulating the wheat yield potential at 538 sites throughout the study area from 2000 to 2009 [21][22][23][24][25][26][27][28][29][30]. A total of 17 schemes were defined using spatial random sampling, and the number of sites per scheme ranged from 20 to 500 with an interval of 30. By combining these with six different zonation schemes, a total of 102 regional yield potential estimation scenarios were constructed. Then, the regional yield potential was calculated using the spatial weighted average method based on the site-specific simulation results [10,31]. By using the average of the simulation results for all 538 sites as the reference value, the relative errors between the yield potential and the reference value were calculated for the different simulation scenarios to quantitatively evaluate the impact of each zonation scheme on the accuracy of the simulation results. The standard deviations of the simulated yields for the sites in the basic spatial units were calculated and used to evaluate the effects of different zonation schemes on the stability of the simulation results. Finally, an appropriate zonation scheme for estimating the regional yield potential was obtained based on the site-specific simulation results (Figure 2). Additionally, unlike previous studies that used multiyear mean simulated yields for analysis [8,9], we analyzed the simulated data every year from 2000 to 2009 to validate the proposed method and verify whether the conclusions are robust to within-year variability.
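The scenario count stated above can be checked with a short sketch. The variable names are illustrative only, and the scheme abbreviations are those introduced in the zonation design section of this paper.

```python
from itertools import product

# Site counts per sampling scheme: 20 to 500 in steps of 30, as stated in the text.
site_counts = list(range(20, 501, 30))

# Abbreviations of the six zonation schemes compared in this study.
zonation_schemes = ["AZP", "ACZ", "GZ", "EZ", "TCZ", "SDCZ"]

# Each (zonation scheme, site count) pair is one regional estimation scenario.
scenarios = list(product(zonation_schemes, site_counts))

print(len(site_counts))  # number of sampling schemes
print(len(scenarios))    # number of estimation scenarios
```

Enumerating the grid this way gives 17 sampling schemes and 17 × 6 = 102 scenarios, matching the figures in the text.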

Data Description
Daily weather data from 538 sites, needed as WheatGrow inputs, were obtained from the Meteorological Data Sharing Network of the China Meteorological Administration (http://www.nmic.cn) for the period from 2000 to 2009 (Figure 1b). Because our simulation scenario was yield potential, the weather data needed for the WheatGrow model included daily maximum temperature (Tmax), daily minimum temperature (Tmin), and sunshine duration (SSD). The Pohlert method was used to convert sunshine duration into solar radiation quantities [32]. The most popular cultivars in each subregion were used in the WheatGrow model. A total of 129 typical sites with historical observations of wheat phenology (sowing, heading, and maturity dates), grain yield, and management practices from 2000 to 2009 were obtained from the National Meteorological Center Library of China (Figure 1b). The sowing date was interpolated to all sites using the Thiessen polygons method [33].
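Assigning a value by Thiessen polygons is equivalent to taking the value of the nearest observed site. The sketch below illustrates this under simplifying assumptions: coordinates are treated as planar, and the function name, coordinates, and sowing dates (day of year) are hypothetical rather than taken from the study data.

```python
import math

def nearest_site_value(target_xy, observed_sites):
    """Return the value of the observed site closest to target_xy.

    observed_sites is a list of ((x, y), value) pairs. Picking the nearest
    site is what membership in a Thiessen polygon amounts to.
    """
    _, value = min(observed_sites, key=lambda site: math.dist(target_xy, site[0]))
    return value

# Hypothetical phenology sites with observed sowing dates (day of year).
phenology_sites = [((0.0, 0.0), 280), ((1.0, 1.0), 288), ((3.0, 0.5), 295)]

# A weather site without a sowing observation inherits the nearest site's date.
sowing_doy = nearest_site_value((0.9, 1.1), phenology_sites)
print(sowing_doy)
```

For geographic coordinates over a large region, a great-circle distance would be a more appropriate choice than the planar distance used here.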

WheatGrow Model Description and Validation
The WheatGrow model includes five modules: the apical and phenological development of wheat, photosynthesis and dry matter production, dry matter partitioning and organogenesis, yield and quality formation, and soil moisture and nutrition balance modules [21][22][23][24][25][26][27]. The WheatGrow model can simulate wheat growth and development under three growth conditions: yield potential, water limitation, and nitrogen limitation [22,30]. The WheatGrow model has been validated in simulations of winter wheat at multiple ecological observation sites throughout the main winter wheat production area of China using field experiment data from different sowing dates, plant densities, and nitrogen fertilization strategies. These studies report good agreement between the predicted and observed values and show that the model can effectively capture the spatial variations of the yield at different regional scales [4,34,35].
From the 129 sites, we selected 45 typical sites (Figure 1a) that had the same statistical yield time range from 2000 to 2009 for model validation using the RMSE (Root Mean Squared Error) and NRMSE (Normalized Root Mean Squared Error). The smaller the RMSE, the better the WheatGrow model performance; an NRMSE value of less than 20% indicates that the WheatGrow model reproduces the statistical yield well.
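The two validation metrics can be sketched as follows. The normalization used here (RMSE divided by the mean observed yield, expressed in percent) is a common convention and is an assumption, since the text does not state the exact formula. The yield values are hypothetical.

```python
import math

def rmse(simulated, observed):
    """Root mean squared error between paired simulated and observed yields."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))

def nrmse_percent(simulated, observed):
    """RMSE normalized by the mean observed yield, in percent (assumed convention)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(simulated, observed) / mean_observed

# Hypothetical yields in kg/ha for three sites.
simulated_yields = [6000.0, 7000.0, 8000.0]
observed_yields = [5500.0, 7500.0, 8000.0]

print(round(rmse(simulated_yields, observed_yields), 1))
print(round(nrmse_percent(simulated_yields, observed_yields), 2))
```

With these numbers the NRMSE is well under the 20% threshold used in the text as the criterion for acceptable model performance.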

Design of the Zonation Schemes and the Regional Yield Potential Simulation Scenarios
This study considered six basic zonation schemes: the administrative zones of provinces (AZP), based on provincial administrative boundaries (Figure 3a) [30]; the agricultural comprehensive zones (ACZ), based on the natural environment, regional agricultural functions, and spatial agricultural characteristics (Figure 3b) [36]; the geomorphology zones (GZ), based on the elevation and terrain relief (Figure 3c) [37]; the ecological zones (EZ), based on the geographic environment, natural conditions, tillage system, cultivar type, production level, cultivation characteristics, and the occurrence of pests and diseases associated with wheat planting (Figure 3d) [38]; the temperature clustering zones (TCZ), based on the spatial clustering of the 10-year average temperatures in the study area (Figure 3e); and the sunshine duration clustering zones (SDCZ), based on the spatial clustering of the 10-year average sunshine durations (Figure 3f). The Scikit-learn module was adopted for spatial clustering using the k-means clustering algorithm, and the minimum means distortion index was used to determine the optimum number of zones [39]. Moreover, to ensure the integrity of the basic spatial units, so that sites belonging to the same spatial unit are spatially connected, this study merged small and isolated spatial units [10,31]. Among the six schemes, the ACZ has the greatest number (18) of basic spatial units (Table 1).
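To illustrate the clustering step, the following is a minimal pure-Python stand-in for k-means on a single variable, not the Scikit-learn implementation used in the study. It reports the mean squared distance to the assigned cluster center for several values of k. The function name, the deterministic initialization, and the sunshine values are assumptions made for this sketch. How the distortion index is turned into a final choice of k is not specified in the text and is not reproduced here.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Lloyd's algorithm on scalar values; returns (centers, labels, distortion)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]

    def assign(current_centers):
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda j: (v - current_centers[j]) ** 2)
                for v in values]

    for _ in range(max_iterations):
        labels = assign(centers)
        updated = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            # Keep the previous center if a cluster has no members.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated

    labels = assign(centers)
    # Distortion: mean squared distance of each value to its cluster center.
    distortion = sum((v - centers[label]) ** 2
                     for v, label in zip(values, labels)) / len(values)
    return centers, labels, distortion

# Hypothetical 10-year average sunshine durations (hours per day) at nine sites.
sunshine_hours = [3.1, 3.3, 3.0, 5.8, 6.1, 6.0, 7.9, 8.2, 8.0]

distortions = {k: kmeans_1d(sunshine_hours, k)[2] for k in (1, 2, 3)}
_, zone_labels, _ = kmeans_1d(sunshine_hours, 3)
print(distortions)
print(zone_labels)
```

In this toy data set the distortion drops sharply up to k = 3, where the three groups of sites are separated. A spatial application would additionally need the merging of small and isolated units described above, which this sketch omits.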
Furthermore, to reduce the impact of random sampling on the results of this study, spatial random sampling was performed 1000 times for each scheme, and the average of the 1000 simulation results for each simulation scenario was used as the final simulated regional yield potential.

Table 1. Characteristics of each zonation scheme: the administrative zones of provinces (AZP), the agricultural comprehensive zones (ACZ), the geomorphology zones (GZ), the ecological zones (EZ), the temperature clustering zones (TCZ), and the sunshine duration clustering zones (SDCZ).

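The repeated random sampling described above can be sketched in simplified form. This example ignores zonation and only shows how averaging over 1000 random draws stabilizes an estimate. The function name, the seed, and the site yields are hypothetical.

```python
import random

def mean_of_repeated_samples(site_yields, n_selected, repetitions=1000, seed=0):
    """Average the sample mean over repeated random draws of n_selected sites."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample_means = []
    for _ in range(repetitions):
        sample = rng.sample(site_yields, n_selected)  # draw without replacement
        sample_means.append(sum(sample) / n_selected)
    return sum(sample_means) / repetitions

# Hypothetical simulated yield potentials (kg/ha) at 50 sites.
all_site_yields = [4000 + 100 * i for i in range(50)]
population_mean = sum(all_site_yields) / len(all_site_yields)

averaged_estimate = mean_of_repeated_samples(all_site_yields, n_selected=10)
print(population_mean, round(averaged_estimate, 1))
```

A single draw of 10 sites can deviate from the all-site mean by several hundred kg/ha in this toy example, whereas the average over 1000 draws lies much closer to it.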
Agronomy 2020, 10, 631

Upscaling the Yield Potential of the Sites and Performing Uncertainty Analysis

Calculation of the Regional Yield Potential Using the Spatial Weighted Average Method
This study used the spatial weighted average method to obtain the regional yield potential $\hat{Y}$ (Equation (1)) [10,40], where $m$ is the total number of basic spatial units in the region, $i$ is the $i$th basic spatial unit, $y_i$ is the average value of the simulated yield potential of the sites in the $i$th basic spatial unit, and $\omega_i$ is the weight contribution of $y_i$ to the regional yield potential $\hat{Y}$ (Equation (2)).
In Equation (2), $N_i$ represents the number of sites in the $i$th basic spatial unit, $N$ is the total number of sites in the study area, $n_i$ is the number of selected sites in the $i$th basic spatial unit, and $n$ is the number of selected sites.
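A minimal sketch of the spatial weighted average follows. The exact form of the weight in Equation (2) is not recoverable from the text here, so this sketch assumes that each basic spatial unit with at least one selected site is weighted by its share of all sites, renormalized over the sampled units. The function and variable names and all numbers are hypothetical.

```python
def regional_yield(selected_yields_by_unit, total_sites_by_unit):
    """Weighted average of unit mean yields.

    selected_yields_by_unit: {unit: [simulated yield of each selected site]}
    total_sites_by_unit:     {unit: N_i, the number of all sites in the unit}
    Assumed weight: N_i divided by the sum of N_i over units with n_i > 0.
    """
    sampled = {unit: ys for unit, ys in selected_yields_by_unit.items() if ys}
    if not sampled:
        raise ValueError("no basic spatial unit contains a selected site")
    total = sum(total_sites_by_unit[unit] for unit in sampled)
    return sum(
        total_sites_by_unit[unit] / total * (sum(ys) / len(ys))
        for unit, ys in sampled.items()
    )

# Hypothetical example: unit C has no selected site (n_i = 0) and drops out.
selected = {"A": [6000.0, 7000.0], "B": [4000.0], "C": []}
totals = {"A": 30, "B": 10, "C": 5}

estimate = regional_yield(selected, totals)
print(estimate)
```

Under this assumed weighting, a unit with no selected site contributes nothing, so the estimate shifts toward the units that happen to be sampled. This is one way a small sample can bias the regional value.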

Uncertainty of the Regional Yield Potential Estimation
This study used the average value of the simulated yield potential for all sites in the study area, $\bar{Y}$, as the reference value (Equation (4)). The relative errors $\delta$ (Equation (5)) and the standard deviations $\sigma$ (Equation (6)) were further calculated to quantitatively evaluate the accuracy and stability, respectively, of the different zonation schemes when obtaining the regional yield potential. Values of the relative error $\delta$ that are closer to 0 indicate a higher accuracy for a zonation scheme [41,42], while smaller values of the standard deviation $\sigma$ indicate that a zonation scheme has greater stability [17,31,42].
In Equation (4), $y_j$ represents the simulation result of the yield potential at site $j$ in the study area. In Equation (6), $y_i^k$ represents the simulated yield potential of site $k$ in the $i$th basic spatial unit.
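The two evaluation measures can be sketched as follows. The precise forms of Equations (5) and (6) are not shown in the text, so this sketch assumes an absolute relative error and a population standard deviation within one basic spatial unit. The names and numbers are hypothetical.

```python
import statistics

def relative_error(estimate, reference):
    """Absolute relative deviation of an estimate from the reference value (assumed form)."""
    return abs(estimate - reference) / reference

def unit_standard_deviation(site_yields):
    """Population standard deviation of site yields within one basic spatial unit (assumed form)."""
    return statistics.pstdev(site_yields)

# Hypothetical simulated yield potentials (kg/ha) at all sites of a small region.
all_yields = [6000.0, 7000.0, 4000.0, 5000.0]
reference = sum(all_yields) / len(all_yields)  # all-site mean used as reference

delta = relative_error(5875.0, reference)
sigma = unit_standard_deviation([6000.0, 7000.0])
print(round(delta, 4), sigma)
```

A value of delta near 0 corresponds to an accurate regional estimate, and a small sigma indicates that the sites grouped into a unit are homogeneous.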

Results
The WheatGrow model validation results show that the RMSE between the simulated and observed yields was approximately 1000 kg/ha for 2000 to 2009 (Figure 4). The NRMSE was within 20% in all years except 2002 and 2003, when it was slightly higher than 20%. Overall, the WheatGrow model performed well.
The spatial distributions of the yield potential tended to be consistent for all sites throughout the winter wheat-producing area of China (Figure 5). The lower values were mainly distributed in the Sichuan Basin subregion (III) with relatively poor light and temperature conditions, for which the minimum and maximum 10-year average values were only 3209 kg/ha and 4612 kg/ha, respectively. This subregion was followed by the subregion comprising the mountainous and hilly areas of southern Shaanxi Province and western Hubei Province (VI), the Yangtze-Huai Plain subregion (VII), and the riverside and lakeside subregion (IV). Higher values were found in the eastern subregions of the study area, including the Huang-Huai Plain subregion (I), the Jiaodong hilly subregion (V), the Yunnan-Guizhou Plateau subregion (II), and the subregion comprising the gully area of the Loess Plateau (X), among which the minimum and maximum 10-year average values were 6100 kg/ha and 11,700 kg/ha, respectively.
The number of sites had a significant impact on the regional yield potential estimation results under the six different zonation schemes. Based on the simulation results from all scenarios, when the number of sites was small, the regional yield potential was overestimated (Figure 6), the relative error was high (Figure 7), and the standard deviation was low (Figure 8). As the number of sites increased, the simulation results tended to become more accurate and stable (Figure 6).
As the number of sites increased to 200, the differences among the simulated yield potential under the different zonation schemes gradually declined to less than 100 kg/ha; additionally, the relative errors of the simulated yield potential under the different zonation schemes decreased, and their differences gradually decreased to approximately 0.01; moreover, the differences among the standard deviations of the simulated yield potential under the different zonation schemes were stable at approximately 300 kg/ha (Figure 8a-j). Therefore, when more than 200 sites were selected in the study area, the differences in the simulated regional yield potential among zonation schemes became smaller, and the choice of zonation scheme did not have a significant impact on the regional yield potential estimation.
However, the zonation schemes significantly affected the simulated regional yield potential when the number of sites was small (Figure 9a-j, red). When the number of sites was only 20, the simulation results from the different zonation schemes varied considerably. Because the spatial distribution of the sites is not uniform, when the sites are selected in proportion, $n_i$ (Equation (2)) may be zero for basic spatial units that contain few sites, and the selected sites are then concentrated in the basic spatial units with high yield potential; thus, the simulated regional yield potential of the ACZ was the largest. The simulated regional yield potential of the SDCZ was the smallest. Moreover, the relative error of the simulated regional yield potential of the SDCZ remained the lowest (Figure 10a-j), and the standard deviations of the simulated regional yield potential were small and less than 400 kg/ha under all the different zonation schemes (Figure 11a-j, red).
With an increase in the number of sites, the simulated regional yield potential decreased to varying degrees under all zonation schemes, while the differences among the simulated regional yield potential caused by different zonation schemes significantly decreased (Figure 9a-j, green). When the number of sites increased from 20 to 200, the average values of the simulated regional yield potential under the different zonation schemes converged (Figure 9a-j, green), and the relative errors of the simulated regional yield potential were less than 0.1 under all six zonation schemes (Figure 10a-j, green). The accuracy of the simulation results was relatively high under all zonation schemes; however, the standard deviation of the simulated regional yield potential under each zonation scheme gradually increased, and the stability of the simulation weakened. At this point, the standard deviations of the simulated regional yield potential of the ACZ and SDCZ were the smallest, suggesting that these two zonation schemes are superior to the other zonation schemes (Figure 11a-j).

Discussion
The use of high-quality site data to estimate the regional yield potential through spatial zonation has great significance for identifying the variation patterns for the upper limit of crop yields, optimizing cropping systems, and improving the efficiency of agroclimatic resource utilization [8,11,43]. Spatial heterogeneity and autocorrelation are ubiquitous in the environmental variables across our study area and can cause the simulated yields to show spatial continuity, i.e., data at two nearby locations are on average more similar than data at two widely spaced locations [10]. For example, the simulated yield potential in the Sichuan Basin subregion (III) was relatively low and exhibited spatial agglomeration (Figure 5). By implementing a zonation scheme, the spatial heterogeneity of the simulated yields can be significantly reduced for each basic spatial unit, and the representativeness of the site-specific simulated yields can be improved [11,44]. To upscale the simulated yield potential of sites to the regional scale by using spatial zonation, previous studies have mostly focused on the use of a single existing zonation scheme, such as climate zonation or agricultural zonation; however, existing zonation schemes may have negative effects on the regional estimation results because of inappropriate prior information used for the zonation, and as a consequence, homogeneous simulation results were not obtained within the irregular basic spatial units [10]. Moreover, as the number of sites increased, the yield potential and relative error decreased, and the standard deviation increased (Figures 6-8), but previous studies did not investigate the effects of different numbers of sites on the regional yield potential simulation results [5,11,12]. Our results provide a reference for the design of zonation schemes and the selection of site numbers for estimating the regional yield potential in different regions.
Due to the spatial heterogeneity in environmental data, estimating the regional yield potential based on site data is affected by both the zonation scheme and the number of sites. In particular, when the number of sites is limited, if the simulated yield potential of a single site is excessively high or low, the simulated single-site potential will significantly impact the estimated regional yield potential. Hence, the use of a suitable zonation scheme for estimating the regional yield potential can yield relatively reliable results [17,45,46]. However, environmental data exhibit spatial continuity. Except for the clear boundary between ocean and land, there is no universal zonation scheme [41].
The comparison among the six zonation schemes considered in this study shows that the regional yield potential simulation is sensitive to the zonation scheme. Because the spatial heterogeneity existed in site-specific yield potential, the standard deviation between different zonation schemes varies greatly when the number of sites was 200 (Figure 11, green). Compared with ACZ, the SDCZ can guaranteed homogeneity of the site-specific yield potential even with a small number of basic spatial units, especially in 2002, 2007, and 2009 ( Figure 11, green). Furthermore, even with only 20 sites, the SDCZ can keep the relative error within 0.2 from 2000 to 2009 ( Figure 10). The SDCZ scheme can reduce both the number of sites needed and the amount of data required by the crop growth model under the premise of ensuring the simulation accuracy (Figure 10a-j, red). Therefore, the SDCZ is the most suitable zonation scheme. Increasing the number of sites can improve the regional yield potential estimation accuracy and reduce the variability in the estimated regional yield potential under different zonation schemes [11,46]. Therefore, as the number of sites increases to 200, the relative error falls below 0.1 (Figure 10a-j, green), and the relative errors of the different zonation schemes gradually decrease (Figure 7a-j). Although the ACZ scheme has the strongest stability and certain advantages, the effects of the differences among the zonation schemes on the simulated regional yield potential can be neglected.
In addition, the simulation results of a process-based crop growth model are jointly determined by the model structure, the input data, and the process parameters. The input data are the main cause of the spatial variation in the regional simulation results from the crop growth model [47]. Therefore, a sensitivity analysis of the crop growth model can identify the input parameters that have significant influences on the simulation results, thereby improving the reliability of the model prediction [47,48]. Through correlation testing, this study found that the simulated yield potential was negatively correlated with the temperature variables and significantly positively correlated with the average sunshine duration (Table 2). This finding is similar to those of previous studies that used the AGROC, APSIM-Wheat, and CoupModel models [32,[49][50][51]. Because this study simulates the yield potential, a longer sunshine duration means that more solar radiation is intercepted and more dry matter is produced, whereas rising temperatures can shorten the grain-filling period and may reduce the yield [52]. Therefore, when a small number of sites was used, the SDCZ scheme could still obtain relatively good regional simulation results (Figure 10). This means that uncertainties can be introduced by the zonation scheme when upscaling site-specific simulation results to the regional scale. If a sufficient data set is available, then the most appropriate zonation scheme should be determined to guarantee the simulation accuracy at the regional scale, regardless of which crop model is used.
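The correlation screening described above can be sketched as a Spearman rank correlation computed with the standard library only. This is an illustrative implementation with hypothetical site values, not the procedure or data behind Table 2, and it omits the significance test.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for four sites in one growing season.
sunshine = [5.1, 6.0, 6.8, 7.5]       # average sunshine duration
temperature = [12.0, 11.0, 10.5, 9.0]  # average temperature
yield_potential = [6.2, 6.9, 7.4, 8.1]

rho_sunshine = spearman(sunshine, yield_potential)
rho_temperature = spearman(temperature, yield_potential)
print(rho_sunshine, rho_temperature)
```

The variable with the largest absolute coefficient would be the natural candidate for clustering the sites into zones, which is the reasoning behind the sunshine-duration-based scheme.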
Accordingly, by dividing the area of interest into irregular basic spatial units with the most sensitive variables of a crop growth model, a relatively high regional estimation accuracy can be achieved with a small number of sites, thereby improving the availability of data at a low cost.
However, in addition to meteorological factors, the spatial variations in crop yield are also affected by the spatial heterogeneity of the soil factors, cultivars, and management measures [53]. Since the object of this study is the potential productivity as entirely determined by light and temperature conditions, only the influences of temperature and sunlight among the meteorological factors were considered. In future analyses of the potential rainfed productivity and the potential productivity determined by nitrogen content, the correlations of potential productivity with various other factors related to the soil, precipitation, and fertilization need to be analyzed, and different crop growth models need to be compared to conclusively determine the optimum zonation scheme [10]. Moreover, over a long period of time, climate change can cause zone instability. For example, due to the redistribution of light, heat, water, and soil resources caused by climate warming, the suitable planting boundary in China will move to the north, and the clustering results of weather data will also change [54]. The uncertainty associated with the choice of study period was not taken into account in this study.
Table 2. Analysis of the correlations between the simulated yield potential and the following environmental variables using the Spearman correlation test for 538 sites: the maximum average temperature (AveTmax), minimum average temperature (AveTmin), average temperature (AveTEM), and average sunshine duration (AveSSD) during the entire growth period in each year, and the site elevation (DEM). ** and * represent significance at the 1% and 5% levels, respectively.


Conclusions
In this study, the Chinese winter wheat production region was used as the study area, and regional yield potential simulation scenarios were established under different zonation schemes using different numbers of sites for site-specific model simulations. Data from 2000 to 2009 were used to study the uncertainties associated with upscaling the site-specific yield potential using the spatial weighted average method. We used the average value of the simulated yield potential for all the sites in the study area as the reference value, and the results of an uncertainty analysis among the different zonation schemes suggested that the zonation scheme and the distribution of sites affect the regional yield potential simulation. In particular, when only a small number of sites was used, their distribution significantly affected the simulated regional yield potential under the different zonation schemes, and the zonation scheme based on sunshine duration clustering zones (SDCZ) could effectively guarantee the simulation accuracy at the regional scale. Thus, clustering on the most influential environmental variable of the crop growth model yields a better zonation scheme for upscaling the site-specific simulation results. In contrast, with a large number of sites, the regional yield potential simulation results differed little among the zonation schemes.