Integration of Sentinel-3 and MODIS Vegetation Indices with ERA-5 Agro-Meteorological Indicators for Operational Crop Yield Forecasting

Timely crop yield forecasts at a national level are substantial to support food policies, to assess agricultural production, and to subsidize regions affected by food shortage. This study presents an operational crop yield forecasting system for Poland that employs freely available satellite and agro-meteorological products provided by the Copernicus programme. The crop yield predictors consist of: (1) Vegetation condition indicators provided daily by Sentinel-3 OLCI (optical) and SLSTR (thermal) imagery, (2) a backward extension of Sentinel-3 data (before 2018) derived from cross-calibrated MODIS data, and (3) air temperature, total precipitation, surface radiation, and soil moisture derived from ERA-5 climate reanalysis generated by the European Centre for Medium-Range Weather Forecasts. The crop yield forecasting algorithm is based on thermal time (growing degree days derived from ERA-5 data) to better follow the crop development stage. The recursive feature elimination is used to derive an optimal set of predictors for each administrative unit, which are ultimately employed by the Extreme Gradient Boosting regressor to forecast yields using official yield statistics as a reference. According to intensive leave-one-year-out cross validation for the 2000–2019 period, the relative RMSE for voivodships (NUTS-2) are: 8% for winter wheat, and 13% for winter rapeseed and maize. Respectively, for municipalities (LAU) it equals 14% for winter wheat, 19% for winter rapeseed, and 27% for maize. The system is designed to be easily applicable in other regions and to be easily adaptable to cloud computing environments such as Data and Information Access Services (DIAS) or Amazon AWS, where data sets from the Copernicus programme are directly accessible.


Introduction
Reliable and timely country-wide crop yield forecasts play an important role in 22 supporting national agricultural policies, food security, and planning of food supplies in 23 countries affected by food shortage [1]. Moreover, crop yield data have been increasingly 24 used to analyze agricultural productivity potential [2,3], carbon and nitrogen cycles [4,5], 25 greenhouse gas emissions from agriculture [6], as well as impact of climate change on 26 agricultural production [7]. In this respect, food shortage has become more frequent due 27 to extreme weather events that are more likely to occur under the changing climate [8]. 28 Remotely sensed satellite data sets offer unique, timely, objective, economical and 29 spatially homogeneous information on agriculture over vast areas [9]. Therefore, they 30 have been widely used to monitor crop growth and forecast crop yields (e.g., [9][10][11][12][13][14][15][16][17][18][19]. The 31 comprehensive overview on utilization of satellite data for agriculture monitoring is 32 given by Weiss et al. (2020) [20]. Recently, the Copernicus programme with Sentinel-1 33 (radar) and Sentinel-2 (optical) satellite constellations has opened a new chapter in 34 Version December 6, 2021 submitted to Remote Sens.
https://www.mdpi.com/journal/remotesensing ferent crops or even by non-agricultural land. MODIS sensors mounted aboard the Terra 48 and Aqua satellites provide 250 m imagery in the red and near infrared bands which are 49 crucial for vegetation studies. Consequently many studies incorporated MODIS data to 50 forecast crop yields [9,19,26,27]. Nevertheless, the MODIS instruments operating since

88
The acquired products were further mosaicked, cloud masked using associated quality classes as arable land. Further, the binary mask was used to generate fractional arable 119 land products at spatial resolutions matching the Sentinel-3, MODIS and ERA-5 prod-120 ucts. These fractional estimates were used as weights to spatially aggregate satellite and   where the products were initially masked whenever a fraction of arable land within a 140 pixel is less than 30%. The remaining pixels were averaged within administrative units and ERA-5 products at native temporal resolution (see Table 1). values. Therefore, in a first step the smoothing method fitted a spline to the original data.

151
Then, the distance (difference) between the fit and the original values created weights so 152 that the original values above the spline fit received high weights and the values below 153 initial fit received weights equal to 0. Finally, the smoothed NDVI was generated by the 154 second cubic spline fit that uses these weights.
where T base stands for crop-specific temperature threshold: 5°C for winter wheat and Temperature Condition Index (TCI) defined as: where GDD indicates the growing degree days (thermal time), NDVI GDD -an instan- where:  The most accurate calibration models homogenizing MODIS NDVI and LST prod-259 ucts with Sentinel-3 counterparts are given in Table 2. The RF and kNN models were 260 found to be the optimal for the cross-calibration of NDVI and LST, respectively.         performance. Figure 15 shows that for all three crops, the larger the training dataset 322 (more years), the higher the accuracy of the crop yield forecasts. This is particularly 323 evident for maize, for which a long data series corresponds to more accurate prediction 324 than for the other two crops. The crop yield forecasting system is fully automatic which implies that data collec-327 tion, processing, crop yield forecasting and generation of graphical and tabular outputs 328 do not require a human operator. Figure 16 presents the workflow of the crop yield 329 forecasting system along with software that was used at each processing step. The in R. Figure 17 shows an example of graphical system output, which complements the 336 tabular data (CSV) and geospatial vector data (SHP).

337
To guarantee the portability of the system, everything was installed on a virtual  resolution Sentinel-2 imagery [39,40]. The future extensions of the proposed system 386 will allow for crop yield predictions at a field level. However, to date it is not possible 387 due to data availability and quality issues. Thus, the system we present is based on