Wheat is one of the main crops in China, and crop yield prediction is important for regional trade and national food security. There is growing interest in how to integrate multi-source data and machine learning techniques to build a simple, timely, and accurate crop yield prediction model at the administrative-unit level. Most previous studies focused on the whole crop growth period, relying on expensive manual surveys, remote sensing, or climate data. However, the effect of selecting different time windows on yield prediction remains unknown. We therefore divided the whole growth period into four time windows and assessed the predictive ability of each, taking the major winter wheat production regions of China as an example. We first developed a modeling framework on the Google Earth Engine (GEE) platform that integrates climate, remote sensing, and soil data to predict winter wheat yield. The results show that the models can accurately predict yield 1 to 2 months before the harvesting dates at the county level in China, with R² > 0.75 and a yield error of less than 10%. Support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF) were the three best-performing methods among the eight typical machine learning models tested in this study. We also found that different agricultural zones and temporal training settings affect prediction accuracy. The three models perform better as more winter wheat growing season information becomes available. Our findings highlight a potentially powerful tool for predicting yield using multi-source data and machine learning in other regions and for other crops.
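To make the two reported accuracy criteria concrete, the following is a minimal sketch of how R² and a relative yield error might be computed for a set of county-level predictions. It is not the authors' code: the function names, the sample yields, and the choice of mean absolute relative error as the "yield error" are all assumptions made for illustration, since the abstract does not define the error metric.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(observed, predicted):
    """Mean absolute error relative to the observed yield.

    Assumption: the abstract's "yield error" is treated here as a mean
    absolute relative error; the source does not state its definition.
    """
    return mean(abs(p - o) / o for o, p in zip(observed, predicted))


# Hypothetical county-level yields in t/ha, invented purely for illustration.
observed = [5.0, 6.0, 4.0, 7.0]
predicted = [5.5, 5.7, 4.2, 6.6]

r2 = r_squared(observed, predicted)
err = mean_relative_error(observed, predicted)

# Check the predictions against the thresholds quoted in the abstract.
meets_thresholds = r2 > 0.75 and err < 0.10

print(f"R2 = {r2:.3f}, relative error = {err:.1%}, meets thresholds: {meets_thresholds}")
```

In a time-window comparison of the kind described above, the same two functions could be applied to the predictions from each window in turn, so that the windows are judged on identical criteria.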
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.