Winter Wheat Yield Prediction at County Level and Uncertainty Analysis in Main Wheat-Producing Regions of China with Deep Learning Approaches

: Timely and accurate forecasting of crop yields is crucial to food security and sustainable development in the agricultural sector. However, winter wheat yield estimation and forecasting on a regional scale still remains challenging. In this study, we established a two-branch deep learning model to predict winter wheat yield in the main producing regions of China at the county level. The ﬁrst branch of the model was constructed based on the Long Short-Term Memory (LSTM) networks with inputs from meteorological and remote sensing data. Another branch was constructed using Convolution Neural Networks (CNN) to model static soil features. The model was then trained using the detrended statistical yield data during 1982 to 2015 and evaluated by leave-one-year-out-validation. The evaluation results showed a promising performance of the model with the overall R 2 and RMSE of 0.77 and 721 kg/ha, respectively. We further conducted yield prediction and uncertainty analysis based on the two-branch model and obtained the forecast accuracy in one month prior to harvest of 0.75 and 732 kg/ha. Results also showed that while yield detrending could potentially introduce higher uncertainty, it had the advantage of improving the model performance in yield prediction.


Introduction
Wheat is one of the three staple foods in China, and plays an important role in maintaining food supply and security. Timely and accurate wheat yield information on the national scale is of great significance for agricultural decision making and sustainable developments. The grain yield is affected by multiple factors, such as soil quality, weather conditions, field management practices, agricultural subsidy policy, and the market prices of grain. For example, high grain prices usually encourage farmers to invest more resources to achieve higher yields [1]. In addition, many of these factors have a non-linear relationship with yield and some factors such as weather, soil, and applied nutrients are interrelated in a complex agroecosystem [2,3]. Thus, wheat yield forecasting over large spatial regions still remains a challenge.
Several methods have been developed to forecast crop yields using remotely sensed data in the last few decades [2]. One of the main categories is the data assimilation method, which retrieves crop status variables, such as Leaf Area Index (LAI) and Evapotranspiration (ET) from remotely sensed data, and uses these variables as inputs to re-calibrate and optimize the grain yield simulation ability of the crop growth model [4][5][6][7]. There are two main challenges of the data assimilation method. The first is the requirement of local calibration and numerous crop specific inputs (such as crop characteristics, field management practices, meteorological and soil data) to simulate crop growth and development through the crop cycle [8,9]. The second is higher resolution of observations result in an increased computational cost for data assimilation system, which makes it difficult to operate at large scale in practical application [10]. Another category of models are statistical regression-based methods which are the most commonly used methods for regional crop yield forecasting [11][12][13]. These methods are based on empirical relationships between historical yields and remote sensing derived variables such as the Normalized Difference Vegetation Index (NDVI) from AVHRR or MODIS satellite data. They are typically straightforward to implement without requiring numerous inputs. One of the main drawbacks of these empirical approaches is that the relationships used are typically localized and not easy to extend to other agricultural regions [8,14].
In recent years, machine learning and deep learning techniques have been successfully applied in many areas, such as image recognition, language translation, and signal processing [15]. Traditional machine learning methods such as Support Vector Machine (SVM) and Random Forest (RF) have been widely used in satellite image classification [16,17], parameter inversion [18], and crop yield prediction [19][20][21]. Meanwhile, deep learning techniques often achieve a better performance compared with traditional machine learning methods [22][23][24]. Subsequently, Convolution Neural Networks (CNN) and Long Short-Term Memory (LSTM) [25] networks, which are the most popular models among deep learning techniques, have been applied to crop yield estimation and forecasting. For example, You [26] proposed a deep learning framework to predict crop yield, which was trained by a new form of feature representation based on histograms of raw images. The model achieved a high accuracy and demonstrated a good ability in terms of transfer learning [27]. However, this method needs a large number of pixels in a region to generate the histograms, which limits their application regarding low spatial resolution images or yield prediction on the smaller scale. In addition, the commonly used representation of spatial aggregation can also be used in the deep learning models such as one dimensional CNN [28] or LSTM [29,30]. Both Oliveira [30] and Jiang [29] used LSTM to build their crop yield prediction model, using meteorological data and soil data. The difference between them is the former just kept soil data the same in all time steps and input them into LSTM directly, while the latter added the soil data from the fully connected layer of their neural networks model. However, the combination of remote sensing data, meteorological data, and soil data is rarely explored. Moreover, most of these deep learning methods did not quantify the uncertainty of model's outputs. It is important to know the confidence of prediction because knowing the uncertainty of yield prediction is beneficial to farmer decision making and policy makers to maximize profits and reduce losses [1]. Therefore, the potential of using deep learning in regional crop yield prediction has not been fully explored.
Yield trend is another issue that is not considered in machine learning methods when the training data come from a wide range of years. Chinese wheat yield has been increased rapidly (more than 1 % annually [31]) during recent decades, mainly due to new varieties, improved crop management technologies and policy reform [31][32][33]. There are some commonly used methods to detrend yield data including the linear model, quadratic model, cubic model, moving averaging, and so on [34][35][36]. These detrending methods generally use the year as the independent variable to obtain yield trends. The detrended yield, termed "yield anomalies" [36], is assumed to be correlated with the annual variability of weather conditions, such as water stress.
In this study, the winter wheat yield in the main producing area of China was estimated at the county level using deep learning methods based on long-term statistical yields and multi-source data. The main purposes of this study are: (1) to explore the feasibility of deep learning-based yield forecasting with long-term historical yield data; (2) to demonstrate the necessity of yield detrending and the impact of socioeconomic factors in yield estimation; (3) to quantitatively evaluate the uncertainty of the yield prediction model using Gal's methods [37].

Study Area
The study focuses on winter wheat planting regions in China (Figure 1), including most areas of Hebei, Henan, Shandong, and Jiangsu province, north of Anhui and Hubei province, southwestern Shanxi province, central Shaanxi province, east of Gansu and Sichuan province, extending from 103 • E to 122 • E and from 28 • N to 40 • N. It includes two main winter wheat growing regions: the northern and the southern areas. The northern area is located to the south of the Great Wall, north of Qinling Mountain and Huaihe River, where winter wheat planting acreage and production account for 60% of the country. The northern area has a temperate continental monsoon climate, with an annual average temperature of 9-15 • C and an annual rainfall of 440-980 mm. The main prevailing rotation mode is winter wheat and summer corn. The southern area is located to the south of Qinling Mountain and Huaihe River, with a subtropical monsoon climate, an annual average temperature of 16-24 • C and an annual precipitation more than 1000 mm. Winter wheat is mostly rotated with paddy rice in this area.

Satellite data
The Advanced Very High Resolution Radiometer (AVHRR, 0.05 arc degrees, daily) surface reflectance [38] were chosen as our satellite data source given its long-term observation and suitable temporal resolution. The bands "SREFL_CH1" (red band), "SREFL_CH2" (near-infrared band), "SREFL_CH3" (thermal infrared band), and the Normalized Difference Vegetation (NDVI) were selected as remote sensing input data.

Meteorological Data
Weather conditions play a vital role during crop growing season. The China Meteorological Forcing Dataset from the Institute of Tibetan Plateau Research, Chinese Academy of Sciences [39][40][41] was used as meteorological input data. This dataset provides seven meteorological variables (Table 1) including temperature, precipitation, air pressure, air specific humidity, downward shortwave radiation, downward longwave radiation, and wind speed, at a spatial resolution of 0.1 degrees and temporal resolution of 3 h. These variables are usually used as drivers for crop models to simulate the growing processes of crops, since they affect crops in different ways during growing stages. We expected to establish these effects on crops using our neural networks model.

Soil Data
Soil properties largely determine the spatial variability of crop yields when the weather is ideal and the crop management is in the same condition. Previous work [30] also demonstrated that soil quality is relevant for crop yield prediction, which provides complementary information and increases yield prediction accuracy. In this study, the SoilGrids dataset was used, which is a global gridded soil information dataset, with a spatial resolution of 1 km or 250 m [42,43]. We chose SoilGrids 250 m as soil inputs, which were predicted by an ensemble of machine learning methods fitted using 150,000 soil profiles and 158 remote sensing-based covariates [42]. This dataset includes nine soil features (listed in Table 1) describing the physical and chemical properties of the soil in seven layers of depth from 0 to 200 cm.

Cropland Data
It is difficult to obtain the specific crop planting map for all individual years from 1982 to 2015. Several previous studies [44][45][46] compared the use of a crop specific mask and generalized cropland mask for crop yield prediction. Although there could be some loss of accuracy in some cases, generalized cropland mask can still be used for crop yield prediction. Furthermore, according to the data of National Bureau of Statistics (http://data.stats.gov.cn/), winter wheat is the dominant crop in winter season over our study area, which could account for more than 90% of the total planting area among summer grain crops (Winter wheat in China is harvested in summer). Thus, cropland data were used as the wheat mask in the present study. The cropland product was downloaded from European Space Agency Climate Change Initiative (ESA CCI), which provides annual land cover maps from the year 1992 to 2015 with a spatial resolution of 300 m. This dataset is kept updating and can be used in practical application for yield prediction of current year.

Yield Data and Detrending
The county-level yield data in 1982-2015 was collected manually from local statistical yearbooks which were not published online (data for the years 1983, 1995, and 2006 were missing). Counties with a winter wheat planting area greater than 10,000 ha and more than 25 years of records were selected as our modeling and analysis samples ( Figure 1). There were in total 11,444 yield records after filtering.
As shown in Figure S2, the model was not able to learn the temporal trend of the yield when using the statistical yield data to train it. Thus, in hindsight, it is necessary to detrend yield data in the modeling process when the yield data are from a wide range of years. The trend of yields was assumed as linear ( Figure 2) and removed while keeping the spatial variability in each year by calculating relative yields. We first calculated the area weighted average yield (kg/ha) of all counties in every year. Then, for each year, the trend yield was obtained using linear regression with weighted average yields of other years. Finally, the relative yield (Equation (1)) was applied to all county-level yield data using the trend yield in the corresponding year. Here, the trend yields can be seen as average yield level of the whole study area. The relative yield indicates the relative difference between the statistical yield at the county level and the average yield in the study area. The relative yields were used to train our deep learning model to avoid the impact of trend. In practical application, to predict the yield for an individual year, we just need to predict the relative yield using the trained model and get the trend yield of this year using linear regression as the trend was assumed to be linear. Then the predicted yield (kg/ha) can be calculated reversely using Equation (1). The distribution difference ( Figure 3) between the actual yields and relative yields indicates that the yield level varies with time without detrending ( Figure 3a). However, a zero-mean distribution (Figure 3b) was achieved using relative yields which means the trend is removed acceptably.
where y i is a single county-level yield record, andȳ trend is the trend yield of the whole study area in the corresponding year.

Data Preprocessing
Remote sensing data were processed on the Google Earth Engine (GEE) [47], which is a planetary-scale platform for earth science data and analysis (The GEE code was shared https://drive. google.com/file/d/1m7-LhpfIoFU8LrvaSQGgQn00cSxV-aCt/view?usp=sharing). Meteorological data and soil data were downloaded and processed on desktop computer using GDAL (The Geospatial Data Abstraction Library). Although the surface reflectance data are daily, the time series has some missing values and fluctuates as a result of noise. To reduce the noise, we produced stable 8-day interval time series by maximum-value composite [48], and cropland masking was applied synchronously. Meteorological data were also aggregated every 8-day interval to keep the same time length as the remote sensing data. All remote sensing data, meteorological data, and soil data were spatially aggregated to the mean for each county with county boundaries. Invalid data were deleted and interpolated and each variable was normalized to 0-1 using its maximum and minimum values. Finally, all the time series had a length of 32 for each year, covering the entire growing season of winter wheat (from October to the middle of June in the following year).

Deep Neural Network Structure
Remote sensing and meteorological data change dynamically during crop growing season; however, soil properties can be assumed to be constant. According to different structures of dynamic and static features, the model was split into two components (see model structure in Figure 4).
The dynamic features were input into a LSTM network which is adept at processing time series data, and the static features were fed into a one-dimensional CNN to extract the vertical features from the multi-depth soil properties. The number of units in the LSTM cell was 32. As for the CNN component, there were four convolutional layers and the kernel size was 3. The dimensions of filters in these four layers were 128, 128, 256, and 256, and the strides were 1, 2, 1, 2, respectively. Both of the components were merged from the fully connected layer. The activation function is ReLU [49].

Quantification of Prediction Uncertainty
It is important to quantify the prediction uncertainty of the yield regression model. The prediction uncertainty can be divided into aleatoric and epistemic uncertainties on the basis of different sources [50]. Aleatoric uncertainty comes from observation noise while epistemic uncertainty accounts for the uncertainty of the model. Kendall and Gal's [37] methods were used to estimate these two types of uncertainties in the present study.
An adjusted loss function (Equation (2)) was used to estimate heteroscedastic aleatoric uncertainty with an assumption that observation noise can vary with different samples. This was achieved by splitting the last fully connected layer into two outputs (Figure 4), therefore, both of the f (x i ) and σ(x i ) could be learned simultaneously. Epistemic uncertainty was estimated using Monte Carlo dropout [51], which can be interpreted as a Bayesian approximation. In practice, the model needed to dropout before every weight layer and also used the dropout at the test time for the many forward passes, which was a sampling process from the approximate posterior.

Implementation Details
We used TensorFlow (GPU version 1.15) [52], Python's deep learning library, to construct and train our neural network model. All weights of the network were initialized by He initialization [53], and biases were initialized using zero. The loss function (Equation (2)) was used to optimize parameters. The initial learning rate was set to 0.0005 and the Adam optimizer [54] was applied to minimize the loss because of its ability to adjust the learning rate during the training stage. The batch size is 64 and the network was trained for 50 epochs, then the best model was saved during training.

Model Evaluation Metrics
We chose Normalized Root Mean Square Error (NRMSE), Coefficient Of Determination (R 2 ), and Mean Absolute Percentage Error (MAPE), the three metrics in Equations (4)- (6), and a schedule of leave-one-year-out-validation [55][56][57] to evaluate our model's accuracy. In other words, for each year, the model was trained using yield data from other years and validated using yield data from that year. There were a total of 11,444 yield samples. The sample size for each year was not exactly the same, varying between 300 and 400. This form of validation guarantees that the prediction of the model in each year is unbiased by information of the year to be predicted. It should be noted that NRMSE is more reasonable than RMSE when the model is validated yearly because the levels of yield change with the year. Figure S2 shows the scatter plots of every year with the three accuracy metrics. Generally, the accuracy in recent years is higher than in early years. The overall R 2 , RMSE, and MAPE of all samples are 0.77, 721 kg/ha, and 15.38%, respectively, which were calculated using the predicted value of all the samples using leave-one-year-out-validation.

Model Accuracy
We also calculated the three accuracy metrics for each county in all the years. The spatial distributions of R 2 , NRMSE, and MAPE ( Figures 5-7) show that the R 2 values in more than 80% of counties are positive and greater than 0.5, indicating that the model can explain more than 50% of the time variability for these counties. The NRMSE and MAPE are less than 0.2 and 20% for most counties, respectively. Generally, higher errors are observed in northwestern provinces (Shanxi, Shaanxi, and Gansu) since most of the cropland in these regions is rainfed ( Figure S3) and the planting density of wheat is low, while most of the North China Plain is irrigated with a high planting density. Yields in rainfed areas are generally lower than those in irrigated areas ( Figure S4). Yield data from irrigated areas is much more than that from rainfed areas and the inputs of our model did not include irrigation data of different areas. Besides, the generalized cropland mask was used in this study rather than the winter wheat mask. These factors would limit the model's performance in rainfed areas and non-wheat dominant areas. This is one of the reasons that low yields were overestimated, as shown in the scatter plots of Figure S2. Another reason could be that the sample sizes of the low yield and high yield were smaller than that of the middle yield. The model was trained by this unbalanced sample distribution, which caused the model to tend towards predicting the middle yield. In addition, training deep learning models needs a large amount of data, and the total of 11,444 samples in this study may not be sufficient to obtain better results in the low and high ends of the yield. As a result, low yields were overestimated and high yields were underestimated, especially in the early years.

Impact of Different Data Sources on Model Accuracy
To evaluate the yield prediction performance on different data sources, we performed an analysis on several combinations of data sources, and the overall accuracies are listed in Table 2. Compared with meteorological data, using remote sensing data alone achieved a higher accuracy with an R2 of 0.58, RMSE of 969 kg/ha, and MAPE of 21.04%, because remote sensing data are more direct and a more comprehensive observation of crop growing status. In addition, there is supposed to be higher accuracy if using winter wheat masks. On the other hand, most regions of the study area are irrigated, which may weaken the impact of weather conditions. The combination of remote sensing and meteorological data shows a slight increase in accuracy compared with remote sensing data alone, which means that the contributions from them are overlap to a great extent. The accuracy shows a great improvement when soil information is added ( Table 2). The overall R 2 , RMSE, MAPE values are 0.60, 945 kg/hm 2 , 19.94%, respectively, when the model was trained without the soil branch. However, these values are 0.77, 721 kg/ha, and 15.38% with the soil branch. Figure 8 shows the difference in NRMSE with and without the soil branch. There is about a 20% improvement in model accuracy, which indicates the addition of soil data improved model's ability of capturing spatial variability of yield.
It should be noted that the soil information was constant for each county in all the years when training the model. To show whether the model is overfitted on soil information, we implemented cross-validation based on location by randomly selecting yield samples from 70% of the counties to train the model and used samples from the other counties to validate. This validation was performed ten times and average accuracy metrics were calculated using accuracy metrics of these ten times ( Table 3). The results show that there is no obvious overfitting, which means the model is generalized and can be used to estimate the yield in the counties where the statistical yield is not available. Table 3. Cross-validation of model with a random split from ten rounds. For each round, samples of 70% of the counties were used to train, the other samples were used to validate.

Improvement in Yield Detrending
As shown in Figure 9a, the model without the detrending yield produced highly biased results whereby the predicted yields are overestimated in the early years and underestimated in recent years, indicating the model is not able to capture the increasing trend of the yield caused by advancement in crop variety, technology, and field management. Figure 9 compares the predicted average yield of all counties in all years before and after yield detrending. When the model was trained using the relative yields, the overall R 2 , RMSE, MAPE are 0.77, 721 kg/ha, and 15.38%, while they are 0.23, 1312 kg/ha, and 24.4% without detrending (results of every year are shown in Figures S1 and S2). The improvement in accuracy is significant, and there is a more consistent trend (Figure 9b) with the statistical average yield.
(a) (b) Figure 9. The predicted average yield of all counties before and after detrending. The red solid line is the average yield of all county-level yield data. (a) The predicted average yield of all counties before detrending; (b) The predicted average yield of all counties after detrending.

Comparison with Machine Learning Methods
To further evaluate the model performance, we compared the proposed deep learning model with other machine learning methods, including random forest, support vector machine, and lasso regression, the parameters of which were optimized to reach as high an accuracy as possible. These machine learning models were performed using scikit-learn, which is a machine learning package in Python. Table 4 shows the overall accuracy of these models. The results indicate that the proposed model outperforms these machine learning approaches, because deep learning models have a better ability to model high-level feature representation through a hierarchical learning framework [58].

Quantified Uncertainty
As described in Section 2.3.2, the model can estimate the uncertainty of each output, which is regarded as the confidence interval of model prediction. We randomly chose some counties to show the changes of their yields and uncertainty over years in Figure 10. The scatter plots on right side of the figure indicate the prediction accuracy of these counties. It can be seen that the aleatoric uncertainty is generally greater than the epistemic uncertainty.

Yield Forecast Ability
A permutation test [59] was performed to evaluate the relative importance of different months throughout the growth period of winter wheat. By permuting the data of a specific period, if the loss of accuracy is large compared to the model with complete data, we can conclude that the data during this period has great influence on the final yield prediction. The results ( Figure 11) show that April is the most important growth period, in which the heading stage of winter wheat occurs. Therefore, we trained the model using data obtained before May to test the forecast ability. The overall R 2 , RMSE, and MAPE of the prediction in early May are 0.75, 732 kg/ha, and 15.8%, respectively. There is small decrease in accuracy compared to the result using data in the full growing stages. Therefore, the results suggest that it is possible to predict wheat yield one month before harvest, indicating the high forecast ability of the model.

Uncertainty Changes Caused by Yield Detrending
Although the model accuracy improved when trained by the relative yield, the aleatoric uncertainty shows a significant increase ( Figure 12 ). On average, the aleatoric uncertainties before and after detrending are 27 kg/ha and 752 kg/ha, the epistemic uncertainties are 238 kg/ha and 171 kg/ha respectively. The aleatoric uncertainty before yield detrending is small and almost constant, indicating the homoscedastic aleatoric uncertainty. In contrast, it is apparent that yield detrending brings more noise to the labels and this noise varies with different samples, corresponding to heteroscedastic aleatoric uncertainty. The difference between the homoscedastic and heteroscedastic aleatoric uncertainty was explained by Gal [60] with some demonstrations. The reason for this change is that overall trend was used in our detrending method instead of trends in individual counties; however, the increasing rates of yield are different for each county. As a result, the aleatoric uncertainty rises significantly. On the other hand, the epistemic uncertainty decreased slightly. The epistemic uncertainty comes from the model structure and parameters. The decrease in epistemic uncertainty is thought to be caused by the model parameters since the model structure was the same before and after detrending. For the sample subset of each year, the joint distribution of inputs and outputs is closer to the results of all samples than those without detrending. Reduction of out-of-distribution samples in individual years helps to reduce epistemic uncertainty [37]. This change of uncertainty indicates that the deep learning model becomes more robust if trained by the relative yield. Nevertheless, the requirements for detrending could be a main limitation in this study and future studies are needed to better model the yield trend.

Impact of Agricultural Policies
Yields were overestimated with high biases during 2000 to 2003, which can be observed from the scatter plots in Figure S1 and the plot in Figure 9b. The main reason for that is the changing agricultural policies during these years. Grain supply exceeded demand and storage increased at the end of the 1990s because of the increasing production. Furthermore, the Chinese government have not issued production targets since 2001, resulting in a fast decrease in the winter wheat planting area during the adjustment of the crop planting structure. In addition, there was a stagnation of yield growth in many regions caused by the planting area reduction in low-quality and high-yield varieties, but an increase in high-quality and low-yield varieties. As a result, grain production reached a low point in 2003. In year 2004, the market grain price began to increase because of high demand and low storage. The government introduced a series of agricultural subsidy policies such as minimum purchase prices and subsidies to encourage grain production. Farmers were motivated by these policies and put more inputs such as fertilizer, pesticides, and machinery into agricultural production, making the planting area, yield, and production start to increase again [61,62].
In addition, severe agro-meteorological disasters could be another possible reason for the high biases in the yield prediction from 2000 to 2003 ( Figure S1 and Figure 9b). The main hazard for winter wheat in China is drought; this is based on an analysis of the records of agro-meteorological stations and statistics [63]. However, the biases are so high that it could not even be caused by the severe drought during 2000 to 2003. Therefore, we suggest that socioeconomic factors might have a major effect on production, which was also demonstrated by Yu [64] using multiple crop models.

Potential for Integrating Prior Knowledge
In this study, we tried to build a data-driven model using the same environmental factors as most crop models. However, the fact is that most data-driven models can only capture correlation but not causality [65]. Therefore, integrating prior knowledge can improve the model performance and credibility, which can be achieved by combining submodules in process models [66,67]. For regional yield estimation in this study, the model could be constrained by some rules or physical laws that are already clearly known and described mathematically in crop models.
In addition, the influence of socioeconomic factors can also be expressed by some econometric models to fit a better yield trend than just using a linear trend with the year. Figure 13 is the linear regressed wheat yield of China using wheat planting inputs per unit area, including seeds, fertilizers, pesticides, and machinery, which were collected from "Compilation of national agricultural product cost-benefit data" of China. Although the stagnation of yield growth in the 2000-2003 period is modeled well, these kinds of data are collected afterwards, and so are not available before harvest and cannot be used to make a yield prediction. Apparently, production inputs are affected by socioeconomic factors such as market prices and agricultural subsidies, which are available prior to harvest. This makes it possible to quantitatively estimate the impact of socioeconomic factors in yield prediction by combining some econometric models.

Conclusions
In this study, we developed a deep learning model which combined remote sensing, meteorological, and soil data, to estimate the winter wheat yield in the main planting area of China at the county level. The model was trained using the detrended statistical yield data and evaluated by leave-one-year-out-validation, and showed a satisfactory accuracy of wheat yield prediction. Results also showed that the yield forecast could be achieved at least one month prior to harvest with a small loss of accuracy. The uncertainty analysis was further conducted using an adjusted loss function and Monte Carlo dropout. We found that yield detrending could possibly cause high uncertainty of yield prediction and should be treated carefully when applied in crop yield prediction. The proposed method exhibits great potential for applications in other crop types and agricultural landscape regions all over the world.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/11/ 1744/s1, Figure S1: Winter wheat yield forecasting accuracy with deep learning model from 1982 to 2015 before detrending yield, Figure S2: Winter wheat yield forecasting accuracy with deep learning model from 1982 to 2015 after detrending yield, Figure S3: Locations of counties where MAPE is greater than 25% and the background is distribution of irrigated and rainfed areas, Figure S4: Average yield of all years for each county.

Abbreviations
The following abbreviations are used in this manuscript: