Integration of Sentinel-3 and MODIS Vegetation Indices with ERA-5 Agro-Meteorological Indicators for Operational Crop Yield Forecasting

Bojanowski, Jędrzej S.; Sikora, Sylwia; Musiał, Jan P.; Woźniak, Edyta; Dąbrowska-Zielińska, Katarzyna; Slesiński, Przemysław; Milewski, Tomasz; Łączyński, Artur

doi:10.3390/rs14051238


by Jędrzej S. Bojanowski ¹,*,†,‡, Sylwia Sikora ¹,†, Jan P. Musiał ¹,‡, Edyta Woźniak ², Katarzyna Dąbrowska-Zielińska ¹, Przemysław Slesiński ³, Tomasz Milewski ³ and Artur Łączyński ³

¹ Remote Sensing Centre, Institute of Geodesy and Cartography, 02-679 Warsaw, Poland

² Space Research Centre, Polish Academy of Sciences, 00-716 Warsaw, Poland

³ Statistics Poland, 00-925 Warsaw, Poland

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

‡ Current address: CloudFerro Sp. z o.o., 00-511 Warsaw, Poland.

Remote Sens. 2022, 14(5), 1238; https://doi.org/10.3390/rs14051238

Submission received: 25 January 2022 / Revised: 24 February 2022 / Accepted: 1 March 2022 / Published: 3 March 2022

(This article belongs to the Special Issue European Remote Sensing-New Solutions for Science and Practice)


Abstract:

Timely crop yield forecasts at a national level are essential to support food policies, to assess agricultural production, and to subsidize regions affected by food shortage. This study presents an operational crop yield forecasting system for Poland that employs freely available satellite and agro-meteorological products provided by the Copernicus programme. The crop yield predictors consist of: (1) vegetation condition indicators provided daily by Sentinel-3 OLCI (optical) and SLSTR (thermal) imagery, (2) a backward extension of Sentinel-3 data (before 2018) derived from cross-calibrated MODIS data, and (3) air temperature, total precipitation, surface radiation, and soil moisture derived from the ERA-5 climate reanalysis generated by the European Centre for Medium-Range Weather Forecasts. The crop yield forecasting algorithm is based on thermal time (growing degree days derived from ERA-5 data) to better follow the crop development stage. Recursive feature elimination is used to derive an optimal set of predictors for each administrative unit, which are ultimately employed by the Extreme Gradient Boosting regressor to forecast yields using official yield statistics as a reference. According to an intensive leave-one-year-out cross-validation for the 2000–2019 period, the relative RMSE for voivodships (NUTS-2) is 8% for winter wheat and 13% for both winter rapeseed and maize. For municipalities (LAU), it equals 14% for winter wheat, 19% for winter rapeseed, and 27% for maize. The system is designed to be easily applicable in other regions and easily adaptable to cloud computing environments such as Data and Information Access Services (DIAS) or Amazon AWS, where data sets from the Copernicus programme are directly accessible.

Keywords:

crop monitoring; crop yield; data calibration; extreme gradient boosting; growing degree days; machine learning; satellite data; thermal time

1. Introduction

Reliable and timely country-wide crop yield forecasts play an important role in supporting national agricultural policies, food security, and planning of food supplies in countries affected by food shortage [1]. Moreover, crop yield data have been increasingly used to analyze agricultural productivity potential [2,3], carbon and nitrogen cycles [4,5], greenhouse gas emissions from agriculture [6], as well as the impact of climate change on agricultural production [7]. In this respect, food shortage has become more frequent due to extreme weather events that are more likely to occur under the changing climate [8].

Remotely sensed satellite data sets offer unique, timely, objective, economical, and spatially homogeneous information on agriculture over vast areas [9]. Therefore, they have been widely used to monitor crop growth and forecast crop yields (e.g., [9,10,11,12,13,14,15,16,17,18,19]). A comprehensive overview of the use of satellite data for agricultural monitoring is given by Weiss et al. (2020) [20]. Recently, the Copernicus programme with the Sentinel-1 (radar) and Sentinel-2 (optical) satellite constellations has opened a new chapter in monitoring agricultural production at a country level. Coupling data from these two instruments, which feature frequent revisits (3–6 days depending on latitude) and high spatial resolution (10–20 m), has been shown to provide accurate crop type maps at a regional level [21,22,23]. Nevertheless, the applicability of Sentinel-2 imagery to crop yield forecasting is limited by the short time series (5 years to date) and by the lack of consistent crop yield reference data at a field scale across a country.

The most common approach to forecasting crop yields using satellite data is to build a statistical model that relates annual crop condition anomalies described by vegetation indices, e.g., the Normalized Difference Vegetation Index (NDVI) or the Fraction of Absorbed Photosynthetically Active Radiation (fAPAR), to national statistics of crop yields [9,11,24,25,26]. Such an approach benefits from long-term multidecadal satellite observations, which are available from low and moderate resolution (250–1000 m) satellite sensors such as the Advanced Very-High-Resolution Radiometer (AVHRR), Végétation, or the Moderate Resolution Imaging Spectroradiometer (MODIS). The main drawback of coarse resolution imagery is the heterogeneity of pixels, which often contain several agricultural fields covered by different crops or even non-agricultural land. MODIS sensors mounted aboard the Terra and Aqua satellites provide 250 m imagery in the red and near infrared bands, which are crucial for vegetation studies. Consequently, many studies have incorporated MODIS data to forecast crop yields [9,19,26,27]. Nevertheless, the MODIS instruments, operating since 1999 (Terra platform) and 2002 (Aqua platform), have exceeded their expected mission duration of 6 years several times over. Therefore, in this study, data from the Sentinel-3 satellite constellation (designed to operate till 2032) have been employed to operationally acquire near-real time vegetation indices at 300-m resolution. However, in order to compute temporal anomalies, the Sentinel-3 time series have been extended with MODIS measurements re-calibrated to match the Sentinel-3 signal.

The ultimate aim of this study is to provide a methodology to estimate crop yields on a country scale during the growing season. This responds to the need of statistical offices (in this case Statistics Poland) that are obliged to provide statistical information on agricultural production. In this context, this study aims to evaluate with what accuracy yield forecasts can be produced using an automated system running on publicly available data. Therefore, a novel system for operational crop yield forecasting at Nomenclature des Unités Territoriales Statistiques level 2 (NUTS-2) and Local Administrative Units (LAU) is proposed that builds on a fusion of satellite-based vegetation indices, agro-meteorological indicators, and crop phenology approximated by thermal time. The system exploits Copernicus data sets and climate reanalysis available free of charge at a global scale, and thus can be applied at any location. Within this study, the system is used to predict yields of winter wheat, winter rapeseed, and maize in Poland for the period 2000–2019. To build the system, the first specific objective is to calibrate vegetation indices from Sentinel-3, which are used operationally, with MODIS data, which extend the data record to the period before 2018. A second specific objective is to verify the applicability of ERA-5 climate reanalysis data to describe agro-meteorological conditions that affect crop development. A third specific objective is to quantify the impact on model performance of the length of the time series used to train the forecasting models, which differs depending on the availability of reference crop yield statistics. Finally, details on the numerical implementation of the system are provided.

2. Data

2.1. Satellite Data

2.1.1. Sentinel-3 Operational Products

The Level-2 Near Real Time (NRT) products (i.e., surface reflectance, vegetation indices, land surface temperature) acquired by the Ocean and Land Colour Imager (OLCI) and Sea and Land Surface Temperature Radiometer (SLSTR) mounted onboard Sentinel-3 satellites are freely available from the Copernicus Open Access Hub (https://scihub.copernicus.eu/ accessed on 20 November 2021) with a delay up to 3 h after satellite acquisition. The following daily products from the Sentinel-3A and Sentinel-3B satellites for 2018–2020 were used in this study:

Land Full Resolution (LFR) product derived from OLCI imagery at a 300-m resolution, consisting of the Global Vegetation Index (OGVI) and the Terrestrial Chlorophyll Index (OTCI) accompanied by rectified reflectances in the 681 nm (RED) and 865 nm (NIR) channels, which were used in this study to calculate the Normalized Difference Vegetation Index (NDVI) using the formula:

$NDVI = (NIR - RED) / (NIR + RED);$

(1)

Land Surface Temperature (LST) from the SLSTR sensor at a 1-km resolution.
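As a minimal illustration of Equation (1), the sketch below computes NDVI for single reflectance values. The function name and the sample reflectances are hypothetical and are not taken from the operational system, which works on full OLCI rasters.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and RED reflectances (Equation (1))."""
    denom = nir + red
    if denom == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / denom

# Hypothetical rectified reflectances: a dense canopy and nearly bare soil
dense_canopy = ndvi(0.45, 0.05)
bare_soil = ndvi(0.25, 0.20)
print(dense_canopy, bare_soil)
```

Values close to 1 indicate dense green vegetation, whereas values close to 0 are typical of sparsely vegetated surfaces.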

The acquired products were further mosaicked, cloud masked using associated quality flags and reprojected to the Poland CS92 coordinate system (EPSG: 2180) using ESA SNAP software.

These satellite data provide information related to vegetation vigour and biomass (from NDVI) and vegetation transpiration (from LST derivatives). The NDVI and LST indicators provide complementary information on the physical status of vegetation and on its access to water in the root zone (through evapotranspiration related to LST). These particular Sentinel-3 products were selected in line with the overall system design, which is based on Copernicus' ready-to-use products, i.e., products that do not require pre-processing of raw data. This ensures that the system always uses data generated by the latest processing methods provided by Copernicus, even if these methods are updated.

2.1.2. MODIS Products

The MOD09Q1 and MOD11A2 collection 6 (V006) products generated from the Moderate-Resolution Imaging Spectroradiometer (MODIS) imagery are freely available from the Land Processes Distributed Active Archive Center (LP DAAC) of the U.S. Geological Survey (https://lpdaac.usgs.gov accessed on 20 November 2021). The acquired products covered Poland and the period 2000–2019. The MOD09Q1 product contains 8-day composites of land surface spectral reflectance acquired in the 620–670-nm (RED) and 841–876-nm (NIR) channels at a 250-m resolution, which were used to calculate NDVI. Information on the presence of clouds, cloud shadows, and snow was derived from the associated quality flags. The MOD11A2 product provides 8-day composites of daytime and night-time LST and emissivity calculated from the 11.03 μm and 12.02 μm channels at a spatial resolution of 1 km. The NDVI and LST MODIS products were mosaicked and reprojected to the Poland CS92 coordinate system (EPSG: 2180).

These MODIS products were chosen for a backward extension of the Sentinel-3 products because of their similar processing level and spatial resolution. They serve as a long-term average for the calculation of anomaly-derived indices from Sentinel-3. Because only the average course of each product along the vegetation season is needed, a certain degree of temporal generalization is acceptable. For this reason, 8-day composites were used: in the aggregation process, the observations with the highest quality are selected, taking into account cloud contamination and the sun zenith angle [28].

2.2. Agro-Meteorological Data

Agro-meteorological data for the period 2000–2019 at a spatial resolution of 0.25° × 0.25° were derived from the ERA-5 reanalysis generated by the European Centre for Medium-Range Weather Forecasts and freely distributed through the Copernicus Climate Data Store. They included hourly data at surface level consisting of: 2-m air temperature, total precipitation, surface incoming solar radiation, and volumetric soil water at 0–7-cm and 7–28-cm depths. These parameters were aggregated into daily means and/or sums using the Climate Data Operators (CDO) software [29]. Additionally, minimum and maximum daily air temperatures were calculated.
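The daily aggregation itself was done with CDO. Purely to illustrate the logic, the following sketch groups hourly values by calendar day in plain Python. The record layout (timestamp, temperature in °C, precipitation in mm) and all numbers are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_to_daily(records):
    """Aggregate (timestamp, t2m_celsius, precip_mm) hourly records per calendar day."""
    by_day = defaultdict(list)
    for ts, t2m, precip in records:
        by_day[ts.date()].append((t2m, precip))
    daily = {}
    for day, values in sorted(by_day.items()):
        temps = [t for t, _ in values]
        daily[day] = {
            "t_mean": sum(temps) / len(temps),        # daily mean temperature
            "t_min": min(temps),                      # daily minimum temperature
            "t_max": max(temps),                      # daily maximum temperature
            "precip_sum": sum(p for _, p in values),  # daily precipitation total
        }
    return daily

# Two synthetic days of hourly values (hypothetical numbers)
start = datetime(2019, 5, 1)
records = [(start + timedelta(hours=h), 10.0 + (h % 24) * 0.5, 0.1) for h in range(48)]
daily = hourly_to_daily(records)
print(daily[start.date()])
```

Temperature and soil moisture are averaged, whereas precipitation and radiation are summed, which is why the text refers to daily means and/or sums.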

2.3. Crop Mask

A binary crop mask was derived from the Corine Land Cover version 2018 classification, freely distributed by the European Environment Agency (EEA) at https://land.copernicus.eu/pan-european/corine-land-cover/clc2018 (accessed on 20 November 2021), by extracting classes 2.1.1 (non-irrigated arable land) and 2.4.2 (complex cultivation patterns) as arable land. The binary mask was then used to generate fractional arable land products at spatial resolutions matching the Sentinel-3, MODIS, and ERA-5 products. These fractional estimates were used as weights to spatially aggregate satellite and agro-meteorological variables for the administrative units. An example of fractional arable land over Poland is presented in Figure 1.
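The conversion from a binary mask to fractional arable land can be sketched as a block average, assuming for simplicity that the fine mask grid nests exactly within the coarser target grid. In practice the grids differ in projection and need resampling, which is not shown. The function name and the toy mask are hypothetical.

```python
def arable_fraction(mask, block):
    """Share of arable cells (value 1) within each block x block window of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    out = []
    for r0 in range(0, rows, block):
        out_row = []
        for c0 in range(0, cols, block):
            cells = [mask[r][c]
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out

# Toy 4 x 4 binary mask aggregated to a 2 x 2 grid of fractions
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
fractions = arable_fraction(mask, 2)
print(fractions)
```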

2.4. Crop Yield Statistics

The reference data for the crop yield forecasting model consisted of official yield statistics provided by Statistics Poland at NUTS-2 and LAU levels (Figure 2). The NUTS-2 data included winter wheat, winter rapeseed, and maize yields expressed in decitonnes per hectare [dt/ha] for the period 1997–2019. At the LAU level, the time series were shorter and their length was inconsistent among administrative units (Figure 3).

The yield statistics for each NUTS-2 and LAU region were transformed into temporal yield residuals (Figure 4) from the Theil–Sen monotonic trend [30] in annual yield estimates covering the period 1997–2019 (Figure 5). These yield residuals were used as response variables in crop yield forecasting. The final absolute yield forecast consisted of the sum of the monotonic trend and a forecasted yield residual for a particular year.
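The detrending step can be sketched as follows. The Theil–Sen slope is the median of all pairwise slopes. The intercept convention used here (median of y − slope·x) is one common choice and an assumption of this example, as are the yield values.

```python
from statistics import median

def theil_sen(years, yields):
    """Theil-Sen line: median of all pairwise slopes, intercept as median of y - slope * x."""
    slopes = [(yields[j] - yields[i]) / (years[j] - years[i])
              for i in range(len(years)) for j in range(i + 1, len(years))]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(years, yields))
    return slope, intercept

years = [2000, 2001, 2002, 2003, 2004]
yields = [30.0, 31.0, 25.0, 33.0, 34.0]  # hypothetical dt/ha; 2002 is a poor year
slope, intercept = theil_sen(years, yields)
trend = [intercept + slope * x for x in years]
residuals = [y - t for y, t in zip(yields, trend)]  # response variable for the model
print(slope, residuals)
```

Because the slope is a median, the single poor year does not distort the trend and shows up entirely in the residual. An absolute forecast is then recovered as the trend value for the target year plus the predicted residual.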

3. Methods

3.1. Spatial Aggregation

Data derived from MODIS, Sentinel-3, and ERA-5 products were spatially aggregated for arable land (using a crop mask) within administrative units (NUTS-2 and LAU) (Table 1). The aggregation followed the approach proposed by Genovese et al. (2001) [31], in which pixels are first masked out whenever the fraction of arable land within a pixel is less than 30%. The remaining pixels were averaged within administrative units using the fraction of arable land as weights. The weights were additionally adjusted at the borders between administrative units to reduce the importance of pixels shared by more than one unit. The aggregation resulted in a database consisting of predictors at NUTS-2 and LAU administrative levels averaged from MODIS, Sentinel-3, and ERA-5 products at their native temporal resolution (see Table 1).
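The core of this aggregation, i.e., thresholding at 30% and weighting by the arable fraction, can be sketched for one administrative unit as shown here. The additional weight adjustment at unit borders is omitted, and the pixel values are hypothetical.

```python
def aggregate_unit(values, fractions, min_fraction=0.30):
    """Arable-land-weighted mean of pixel values within one administrative unit."""
    num = 0.0
    den = 0.0
    for value, frac in zip(values, fractions):
        if value is None or frac < min_fraction:
            continue  # skip cloud-masked pixels and pixels with too little arable land
        num += value * frac
        den += frac
    return num / den if den > 0 else None

ndvi_pixels = [0.60, 0.80, 0.20, None]   # None marks a cloud-masked pixel
arable_frac = [0.50, 1.00, 0.10, 0.90]
unit_ndvi = aggregate_unit(ndvi_pixels, arable_frac)
print(unit_ndvi)
```

Here the third pixel is dropped because only 10% of it is arable land, and the fourth because it has no valid observation.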

3.2. Temporal Smoothing of NDVI Values

A two-pass cubic spline smoothing technique (implemented in the R environment [32]) was applied to filter out spurious NDVI values (MODIS and Sentinel-3) introduced by residual cloud cover and/or by geolocation errors. This technique assumes that residual cloud cover decreases NDVI values. In the first step, a spline was fitted to the original data. Then, the differences between the fit and the original values were used to create weights, so that the original values above the spline fit received high weights and the values below the initial fit received weights equal to 0. Finally, the smoothed NDVI was generated by a second cubic spline fit that used these weights. The applied method is a modification of the approach of Chen et al. (2004) [33], who used the Savitzky–Golay filter (instead of a spline fit) in iterative NDVI smoothing.
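The two-pass weighting idea can be illustrated with a short sketch. Note that it does not use a cubic smoothing spline: to stay within the Python standard library, a weighted Whittaker smoother with second-order differences is swapped in as a discrete stand-in. The smoothing parameter `lam`, the weight of 1 for values above the first fit, and the synthetic NDVI curve are all assumptions of this example.

```python
import math

def whittaker(y, w, lam=5.0):
    """Weighted second-order difference smoother: solves (W + lam * D'D) z = W y."""
    n = len(y)
    a = [[0.0] * n for _ in range(n)]
    b = [w[i] * y[i] for i in range(n)]
    for i in range(n):
        a[i][i] += w[i]
    coef = (1.0, -2.0, 1.0)               # one row of the second-difference matrix D
    for i in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[i + p][i + q] += lam * coef[p] * coef[q]
    # Banded Gaussian elimination (the matrix is symmetric positive definite)
    for k in range(n):
        for r in range(k + 1, min(k + 3, n)):
            f = a[r][k] / a[k][k]
            for c in range(k, min(k + 3, n)):
                a[r][c] -= f * a[k][c]
            b[r] -= f * b[k]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):        # back substitution
        s = b[k] - sum(a[k][c] * z[c] for c in range(k + 1, min(k + 3, n)))
        z[k] = s / a[k][k]
    return z

def two_pass_smooth(ndvi, lam=5.0):
    first = whittaker(ndvi, [1.0] * len(ndvi), lam)
    # Values on or above the first fit keep full weight; values below get weight 0
    weights = [1.0 if v >= f else 0.0 for v, f in zip(ndvi, first)]
    return first, whittaker(ndvi, weights, lam)

# Synthetic seasonal NDVI curve (46 composites) with three cloud-induced drops
truth = [0.2 + 0.6 * math.exp(-((t - 23) / 8.0) ** 2) for t in range(46)]
noisy = list(truth)
for dip in (10, 20, 30):
    noisy[dip] -= 0.3
first, smoothed = two_pass_smooth(noisy)
print(round(first[20], 3), round(smoothed[20], 3), round(truth[20], 3))
```

The first pass is pulled down by the contaminated observations. Because these fall below the first fit, they receive zero weight and the second pass bridges them using the neighbouring valid values.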

3.3. Cross-Calibration of NDVI and LST Products Derived from MODIS and Sentinel-3 Data

The temporal homogeneity of NDVI- and LST-based indices is crucial to derive reliable anomalies that could serve as crop yield predictors. Therefore, the MODIS NDVI and LST time series were cross-calibrated with the analogous Sentinel-3 products to form homogeneous data records. The calibration method is built on the automatic selection of the optimal machine learning method among Random Forest (RF), K-nearest Neighbor (kNN), Support Vector Machine (SVM), and Neural Network (NN), i.e., the method that yielded the largest modeling efficiency (EF) (formula given in Section 3.6) between the re-calibrated MODIS time series and the corresponding original Sentinel-3 time series. The training and validation of the machine learning techniques were performed for the period 2018–2020, when both MODIS and Sentinel-3 satellites were operational. The validation followed the leave-one-year-out approach to choose the most accurate machine learning method, which turned out to vary between data sets. To model the differences between MODIS and Sentinel-3 indices, three explanatory variables were used. The first one was calendar time expressed as the day of the year. The second explanatory variable was thermal time expressed by the growing degree days (see Section 3.4 for details), indicating the amount of thermal energy accumulated at a given time and the amount of energy needed to reach a given stage of crop development. The third explanatory variable was the MODIS product (i.e., NDVI or LST) to be homogenized with its Sentinel-3 counterpart. Ultimately, the trained calibration models were applied to the MODIS NDVI and LST time series between the years 2000 and 2017 to extend the Sentinel-3 time series.

3.4. Resampling of Explanatory Variables from Calendar Time to Thermal Time

To ensure year-to-year comparability of vegetation conditions, the explanatory variables were resampled from calendar time (day of year) to thermal time, which denotes the accumulated mean daily air temperature at 2 m a.g.l. above a crop-specific threshold. Thermal time is a good proxy for the crop development stage [34,35,36]. Analysis of vegetation indices with respect to thermal time allows derivation of temporal anomalies by referring instantaneous values of an index to a multiannual average calculated for the same thermal time (i.e., the same crop development stage). If calendar time were used instead of thermal time, the temporal anomalies could reflect a shift in the vegetation season (e.g., a delay in biomass accumulation) rather than the actual crop conditions that are to be used to forecast crop yields.

Thermal time was calculated for a day d of the year as so-called Growing Degree Days (GDD) from daily maximum ($T_{\max}$) and minimum ($T_{\min}$) air temperatures using the formula:

$GDD_{d} = \sum_{i = 1}^{d} \left[ \left( \frac{T_{\max, i} + T_{\min, i}}{2} - T_{base} \right) \times \underbrace{\left[ \left( \frac{T_{\max, i} + T_{\min, i}}{2} - T_{base} \right) > 0 \right]}_{conditional} \right]$

(2)

where $T_{base}$ stands for the crop-specific temperature threshold: 5 °C for winter wheat and winter rapeseed, and 10 °C for maize. In addition, $T_{\min, i}$ equals $T_{base}$ if $T_{\min, i} < T_{base}$, and $T_{\max, i}$ equals 30 °C if $T_{\max, i} > 30$ °C. The conditional part of the equation equals 1 if the condition is met, and 0 otherwise, which implies that only positive values (mean air temperature values reduced by the threshold) are summed.
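Equation (2), including the two temperature caps, can be sketched as a running sum. The function name and the four days of temperatures are hypothetical. The base temperature of 5 °C is the value stated above for winter wheat and winter rapeseed.

```python
def growing_degree_days(t_max, t_min, t_base, t_cap=30.0):
    """Cumulative GDD series following Equation (2), one value per day."""
    total = 0.0
    series = []
    for tx, tn in zip(t_max, t_min):
        tn = max(tn, t_base)   # T_min below the base is raised to the base
        tx = min(tx, t_cap)    # T_max above 30 °C is capped
        daily = (tx + tn) / 2.0 - t_base
        if daily > 0:          # the conditional term of Equation (2)
            total += daily
        series.append(total)
    return series

t_max = [12.0, 18.0, 34.0, 3.0]
t_min = [2.0, 8.0, 16.0, -1.0]
gdd = growing_degree_days(t_max, t_min, t_base=5.0)
print(gdd)  # [3.5, 11.5, 29.5, 29.5]
```

On the last day the capped mean temperature (4 °C) is below the base, so nothing is accumulated and the series stays flat.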

Based on daily GDD values, all yield predictors were resampled for eight GDD values ranging from 150 °C to 1200 °C with a step of 150 °C. Since GDD were calculated at a daily time step, all predictors had to have the same 1-day temporal resolution prior to resampling. Therefore, the 8-day MODIS NDVI and LST products were converted to a 1-day resolution using the spline function.
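One possible way to read a daily predictor at the eight GDD levels is sketched here. The text does not specify the interpolation rule, so the linear interpolation at the first crossing of each level is an assumption of this example, as are the synthetic series.

```python
def resample_to_gdd(gdd, values, levels):
    """Linearly interpolate a daily series at the first crossing of each GDD level."""
    out = {}
    for level in levels:
        out[level] = None  # level not yet reached in this season
        for d in range(1, len(gdd)):
            if gdd[d - 1] < level <= gdd[d]:
                share = (level - gdd[d - 1]) / (gdd[d] - gdd[d - 1])
                out[level] = values[d - 1] + share * (values[d] - values[d - 1])
                break
    return out

levels = list(range(150, 1201, 150))        # 150, 300, ..., 1200 °C
gdd = [10.0 * d for d in range(61)]         # synthetic: 10 °C gained per day
ndvi = [0.2 + 0.01 * d for d in range(61)]  # synthetic daily NDVI
at_levels = resample_to_gdd(gdd, ndvi, levels)
print(at_levels)
```

Levels that the accumulated temperature has not yet reached remain empty, which mirrors the in-season situation where predictors become available one GDD step at a time.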

Resampling of NDVI and LST to thermal time allowed derivation of normalized indicators proposed by Kogan (1997) [37] such as: Vegetation Condition Index (VCI) and Temperature Condition Index (TCI) defined as:

${VCI}_{GDD} = 100 \times \frac{{NDVI}_{GDD} - {NDVI}_{\min, GDD}}{{NDVI}_{\max, GDD} - {NDVI}_{\min, GDD}}$

(3)

${TCI}_{GDD} = 100 \times \frac{{LST}_{\max, GDD} - {LST}_{GDD}}{{LST}_{\max, GDD} - {LST}_{\min, GDD}}$

(4)

where GDD indicates the growing degree days (thermal time), ${NDVI}_{GDD}$ is an instantaneous NDVI value, and ${NDVI}_{\min, GDD}$ and ${NDVI}_{\max, GDD}$ are the minimum and maximum NDVI recorded in the period 2000–2019 at a particular location for a given GDD, respectively. The definition of ${TCI}_{GDD}$ follows the same logic as that of ${VCI}_{GDD}$.
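Equations (3) and (4) translate directly into code. In the sketch below, the multi-year minima and maxima are taken from short hypothetical histories at a single GDD level. In the study they span the period 2000–2019.

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation Condition Index (Equation (3)), in percent."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)

def tci(lst, lst_min, lst_max):
    """Temperature Condition Index (Equation (4)), in percent."""
    return 100.0 * (lst_max - lst) / (lst_max - lst_min)

# Hypothetical multi-year values at one GDD level for one administrative unit
ndvi_history = [0.55, 0.62, 0.70, 0.58]
lst_history = [295.0, 301.0, 298.0, 299.0]   # kelvin
current_vci = vci(0.66, min(ndvi_history), max(ndvi_history))
current_tci = tci(299.5, min(lst_history), max(lst_history))
print(round(current_vci, 1), current_tci)
```

Note the reversed numerator in TCI: a high surface temperature yields a low index, so low values of both indices indicate unfavourable conditions.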

3.5. Crop Yield Forecasting

The crop yield forecasting proposed in this study employs a machine learning technique, i.e., the eXtreme Gradient Boosting (XGBoost) algorithm [38] implemented in the R environment [32,39], to predict crop yield residuals from the Theil–Sen monotonic trend using a variety of predictors derived from satellite and agro-meteorological data. To train the XGBoost method, an extensive input table was constructed for each administrative unit (LAU or NUTS-2) consisting of r rows and c columns, where r denotes the number of years for which predictors and reference crop yields were available, and c indicates the number of predictors. The following predictors were calculated for each of the eight GDD levels (150 °C, 300 °C, 450 °C, …, 1200 °C):

Minimum, maximum and mean air temperature;
Surface radiation;
Accumulated surface radiation since 1 April;
Soil moisture at 0–7 cm and 7–28 cm levels;
Precipitation;
Accumulated precipitation since 1 April;
${NDVI}_{GDD}$ ;
${VCI}_{GDD}$ ;
${LST}_{GDD}$ ;
${TCI}_{GDD}$ ;
Annual maximum NDVI (which does not correspond to the GDD levels).

In total, there were 105 predictors (c), i.e., 13 predictors at each of the eight GDD levels plus the annual maximum NDVI, but the number of years (r) varied between crop types and administrative units due to the reference data availability. For LAUs, the number of years is presented in Figure 3, whereas for NUTS-2, it equaled 17 for winter wheat and 22 for winter rapeseed and maize.

All predictors were linearly scaled to the range between zero and one. Then, highly correlated predictors (correlation above 0.75) were removed. Next, a feature selection procedure based on recursive feature elimination [40] employing the XGBoost method was applied. The optimized XGBoost algorithm was ultimately trained on the selected predictors, with crop yield residuals as the dependent variable. The application of the prediction model resulted in the forecasted crop yield residuals. The final absolute yield forecast was then calculated as the sum of this value and the crop yield estimated from the Theil–Sen monotonic trend.
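The first two pre-processing steps can be sketched as follows. The greedy rule used here (keep a predictor only if its absolute Pearson correlation with every predictor kept so far does not exceed 0.75) is an assumption, since the text does not say which member of a correlated pair is removed. Predictor names and values are hypothetical, and the recursive feature elimination with XGBoost is not shown.

```python
def min_max_scale(column):
    """Linearly rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5

def drop_correlated(table, threshold=0.75):
    """Keep a predictor only if |r| with every already kept predictor is <= threshold."""
    kept = []
    for name in table:
        if all(abs(pearson(table[name], table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

raw = {
    "t_mean_150": [8.0, 9.0, 10.0, 11.0, 12.0],
    "t_max_150": [13.1, 14.0, 15.2, 15.9, 17.0],   # nearly collinear with t_mean_150
    "precip_150": [30.0, 12.0, 25.0, 8.0, 20.0],
}
scaled = {name: min_max_scale(col) for name, col in raw.items()}
selected = drop_correlated(scaled)
print(selected)  # ['t_mean_150', 'precip_150']
```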

3.6. Validation Approach

Validation of the forecasting models involved the comparison of predicted yields with reference official statistics that were not used for model training. For each administrative unit, crop type, and GDD level, a cross-validation was performed. It followed the leave-one-year-out procedure, which is a special case of k-fold cross-validation where k equals the number of years in a time series. It must be noted that the selection of predictors was repeated at each iteration to prevent the predictor selection procedure from benefiting from 'knowing' the data from the year that was used for validation.
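The structure of this procedure, with predictor selection repeated inside every fold, is sketched below. The `select` and `fit` callables are hypothetical stand-ins (first column only, training-mean prediction) for the recursive feature elimination and the XGBoost regressor. The data are invented.

```python
def leave_one_year_out(years, rows, targets, select, fit):
    """Predict each year from a model trained on all other years."""
    predictions = {}
    for i, year in enumerate(years):
        train_rows = rows[:i] + rows[i + 1:]
        train_targets = targets[:i] + targets[i + 1:]
        columns = select(train_rows, train_targets)   # selection sees training years only
        model = fit([[r[c] for c in columns] for r in train_rows], train_targets)
        predictions[year] = model([rows[i][c] for c in columns])
    return predictions

# Hypothetical stand-ins: keep the first column, predict with the training mean
def select_first(rows, targets):
    return [0]

def fit_mean(rows, targets):
    mean = sum(targets) / len(targets)
    return lambda features: mean

years = [2015, 2016, 2017, 2018]
rows = [[0.1, 0.9], [0.4, 0.8], [0.6, 0.2], [0.9, 0.1]]
residuals = [-2.0, 1.0, 0.0, 1.0]   # yield residuals from the trend
predictions = leave_one_year_out(years, rows, residuals, select_first, fit_mean)
print(predictions)
```

Placing `select` inside the loop is the important design choice: if predictors were chosen once on all years, the held-out year would leak into the model through the selection step.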

Three metrics were used to describe the model performance: Mean Bias Error (MBE, Equation (5)), Root Mean Square Error (RMSE, Equation (6)), and Modeling Efficiency (EF, Equation (7)) calculated by means of the following formulae:

$MBE = \frac{1}{n} \sum_{k = 1}^{n} (E_{k} - M_{k})$

(5)

$RMSE = \sqrt{\frac{1}{n} \sum_{k = 1}^{n} {(E_{k} - M_{k})}^{2}}$

(6)

$EF = 1 - \frac{\sum_{k = 1}^{n} {(E_{k} - M_{k})}^{2}}{\sum_{k = 1}^{n} {(M_{k} - \bar{M})}^{2}}$

(7)

where $E_{k}$ represents the predicted crop yield value, $M_{k}$ represents the reference crop yield value, $\bar{M}$ represents the average of the reference crop yield values, k represents the step of the time series (i.e., 1 year), and n represents the length of the time series.

The RMSE and MBE were also expressed in relative values (0–100%), denoted as the RRMSE and RMBE, respectively, by dividing these quality metrics by the mean of the reference data ($\bar{M}$). The MBE, RMSE, and EF were also used to evaluate the accuracy of the cross-calibration between MODIS and Sentinel-3 products. In this case, however, $E_{k}$ denotes the re-calibrated MODIS NDVI/LST and $M_{k}$ the original Sentinel-3 NDVI/LST time series.
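The three metrics and their relative variants can be sketched in a few lines. EF is written here in its usual form, with the variance of the reference values around their mean in the denominator. The yield values are hypothetical.

```python
def mbe(pred, ref):
    """Mean Bias Error: positive values indicate overestimation."""
    return sum(e - m for e, m in zip(pred, ref)) / len(ref)

def rmse(pred, ref):
    """Root Mean Square Error."""
    return (sum((e - m) ** 2 for e, m in zip(pred, ref)) / len(ref)) ** 0.5

def ef(pred, ref):
    """Modeling efficiency: 1 minus squared errors over the spread of the reference."""
    mean_ref = sum(ref) / len(ref)
    return 1.0 - (sum((e - m) ** 2 for e, m in zip(pred, ref))
                  / sum((m - mean_ref) ** 2 for m in ref))

def relative(metric_value, ref):
    """Express MBE or RMSE as a percentage of the mean reference value."""
    return 100.0 * metric_value / (sum(ref) / len(ref))

reference = [40.0, 50.0, 60.0]   # hypothetical official yields, dt/ha
predicted = [42.0, 49.0, 62.0]   # hypothetical forecasts
print(mbe(predicted, reference), rmse(predicted, reference), ef(predicted, reference))
print(relative(rmse(predicted, reference), reference))  # RRMSE in percent
```

An EF of 1 indicates a perfect match, whereas an EF of 0 means the predictions are no better than using the mean of the reference values.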

4. Results

4.1. Accuracy of Cross-Calibration between MODIS and Sentinel-3 Products

The most accurate calibration models homogenizing MODIS NDVI and LST products with Sentinel-3 counterparts are given in Table 2. The RF and kNN models were found to be the optimal for the cross-calibration of NDVI and LST, respectively.

Table 2 reveals clear improvement in time series homogeneity between MODIS and Sentinel-3, which is confirmed by all three quality metrics. Yet, the impact of the remaining difference between Sentinel-3 and MODIS products on time series homogeneity is not easy to determine. This in turn may have further implications on the reliability of crop yield predictors that are calculated as anomalies or standardized by extreme values (i.e., VCI, TCI). To compare the spurious temporal variability introduced by the cross-calibration with natural variability of MODIS-derived predictors (NDVI, LST), the differences between monthly cross-calibration RMSE and double monthly standard deviations of MODIS-derived predictors within the period 2000–2019 were computed and revealed for administrative units (Figure 6). The negative differences (marked with blue) indicate that the cross-calibration error is lower than the natural variability of a predictor expressed by the double standard deviation. Overall, such a situation for NDVI occurred in 78% of LAU units and for LST in 98% of LAU units. This implies that in a great majority of administrative units, the cross-calibration errors do not obscure the natural variability of the predictors. Thus, it can be concluded that the developed and validated cross-calibration models are sufficient to homogenize MODIS products with the Sentinel-3 counterparts and further to use the homogenized data records to predict crop yields.

4.2. Yield Forecasting Performance

4.2.1. NUTS-2 Level

The best performance of end-of-season crop yield predictions is revealed for winter wheat (RRMSE = 8.15%), while predictions for maize and winter rapeseed are less accurate (RRMSE = 13%) (Table 3). For all three crops, the overall relative bias (RMBE) is below 1%. Yet, for individual NUTS-2 regions, these errors can be greater (Figure 7).

Figure 8 reveals that forecasting quality metrics differ across years. Most evidently, the predictions tended to overestimate crop yields for all three crops in 2006, for winter rapeseed in 2003, and for maize in 2015. Kuśmierek-Tomaszewska and Żarski (2021) [41] reported these years as dry according to the Standardized Precipitation Index (SPI). In 2003, moderately dry conditions in May may have affected rapeseed development. In 2006, the very dry (SPI < −1.5) period in June–July impacted all crops. Finally, in 2015, drought lasted until the end of the season, limiting maize yields. Figure 9, Figure 10 and Figure 11 reveal the detailed year-by-year performance of the crop yield forecasts for all three crops. The high performance of winter wheat yield forecasts for the Mazowieckie, Świętokrzyskie, Podkarpackie, and Podlaskie NUTS-2 units can be related to the low annual variability of yields, ranging from 30 to 40 dt/ha (Figure 9), which can be well approximated by a monotonic trend.

The quality of the prediction models increases along the season as more predictors become available for the XGBoost model (after every 150 °C GDD step). According to the correlation coefficient, for winter wheat and winter rapeseed, the prediction models on average outperform the Theil–Sen monotonic trend model from 450 °C GDD onwards (Figure 12). However, this holds for the vast majority of administrative units only after mid-season. For maize, it is evident that after 600 °C GDD, the prediction models are significantly more accurate than the trend model.

4.2.2. LAU Level

At the LAU level, the overall quality of the crop yield predictions is lower than for NUTS-2. The RRMSE is around 14%, 19%, and 27% for winter wheat, winter rapeseed, and maize, respectively (Table 4).

Figure 13 presents the quality metrics for LAU units for which crop yields forecasts were generated. Some LAU units lack predictions (marked with black) due to insufficient reference statistics used to train the XGBoost models or because a given crop is not cultivated there.

Quality metrics of crop yield forecasts differ from year to year (Figure 14). For winter wheat and rapeseed they are in line with the results for NUTS-2, i.e., overestimation in 2011 (rapeseed) and 2018 (winter wheat). However, overestimated maize yields are strongly related to the amount of data available to train the models (see lowest panel in Figure 14). It can be seen that for the years 2015–2019, for which the amount of data is sufficient, the quality of the forecasts is already close to the NUTS-2 ones. In addition, the years when the forecast quality is lower are consistent with the NUTS-2 statistics, e.g., for 2015.

Variable length of time series of official statistics (used to train the models) among LAU units allows the assessment of how the size of the training dataset impacts the model performance. Figure 15 shows that for all three crops, the larger the training dataset (more years), the higher the accuracy of the crop yield forecasts. This is particularly evident for maize, for which a long data series corresponds to a more accurate prediction than for the other two crops.

5. Implementation of the Operational System

The crop yield forecasting system is fully automatic which implies that data collection, processing, crop yield forecasting, and the generation of graphical and tabular outputs do not require a human operator. Figure 16 presents the workflow of the crop yield forecasting system along with software that was used at each processing step. The system is implemented as an R package, so any external functions (CDO, Python, SNAP) are called from the R environment. Python is used to retrieve satellite and ERA-5 data, CDO to process the ERA-5 data, and SNAP to process Sentinel-3 images. Processing of MODIS data, aggregation to administrative units, preparation of predictors, training and application of crop yield models, as well as generation of final outputs are implemented in R. Figure 17 shows an example of graphical system output, which complements the tabular data (CSV) and geospatial vector data (SHP).

To guarantee the portability of the system, everything was installed on a virtual machine (VirtualBox v6.1.20 with Xubuntu v20.04.2). This allows migration to cloud computing environments such as Data and Information Access Services (DIAS) or Amazon Web Services (AWS), where the Copernicus data sets are directly accessible.

6. Discussion

6.1. System Performance

Relating the accuracy of yield forecasts obtained by the proposed system to existing solutions is complex and difficult to do adequately. This is because: (1) yield forecasting is carried out at different administrative levels, (2) the quality of reference data (in this case official statistics) can vary between countries, (3) the inter-annual crop yield variability differs between countries, depending on the level of development of agricultural practices and the use of state-of-the-art management methods to reduce the effects of unfavorable crop growth conditions, (4) models are trained and validated on time series of different lengths, and finally (5) even if the compared systems operate in areas with a similar climate, different indicators may be the major yield-limiting factor. Therefore, placing the performance of the presented system in the European context cannot be interpreted as an exhaustive benchmarking of forecasting systems. Since very few studies have attempted to predict crop yields in Europe at the LAU level, the following discussion is focused on forecasting at the NUTS-2 level.

The proposed system provides end-of-season winter wheat yield forecasts with an RRMSE of 8.15%. This outperforms the forecasting model based on the fraction of absorbed photosynthetically active radiation derived from SPOT VEGETATION, which provided winter wheat forecasts for Poland with a 10% RRMSE [17]. A common approach that employs the MODIS NDVI seasonal peak was used to forecast the yield of winter wheat in Ukraine with an RRMSE of 15% [9] and 12% [42]. The recent work of Paudel et al. (2022) [43] proposed a complex solution that uses deep learning methodology and combines satellite-derived indicators with weather information and outputs of a crop growth model (World Food Studies Simulation Model, WOFOST). With this approach, the RRMSE was 8.64% for winter wheat in Poland. It was also used to forecast maize yields for six European countries, with an RRMSE ranging from 12% (Spain) to as much as 29% (Romania), compared to the RRMSE of 13% achieved here. Thus, the operational system presented here provides crop yield forecasts at the NUTS-2 level with an accuracy comparable to or higher than that of similar approaches used in Europe.
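The error measures compared above can be illustrated with a minimal Python sketch. It assumes that the error is taken as prediction minus reference and that the relative measures (RMBE, RRMSE) are normalized by the mean reference yield; both conventions are assumptions of this sketch, and the function name `forecast_metrics` is hypothetical.

```python
import math

def forecast_metrics(observed, predicted):
    """Return MBE, RMBE (%), RMSE and RRMSE (%) for paired yield values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    # Assumed sign convention: error = prediction - reference.
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mbe = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "MBE": mbe,
        # Assumed normalization: relative measures use the mean reference yield.
        "RMBE": 100.0 * mbe / mean_obs,
        "RMSE": rmse,
        "RRMSE": 100.0 * rmse / mean_obs,
    }
```

Under this normalization, an RMSE of 3.43 dt with an RRMSE of 8.15% (winter wheat, Table 3) would correspond to a mean reference yield of roughly 42 dt.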

6.2. Cross-Calibration of Satellite Indices

The crop yield forecasting system presented here relies on long-term time series of crop growth conditions and corresponding crop yield statistics to train machine learning models. While agro-meteorological indicators are derived seamlessly from climate reanalysis, satellite-based indicators have been combined from two sets of satellite sensors. First, sensors onboard the Sentinel-3 platforms provide data for operational near-real-time use; however, these data have only been available since 2018. Second, data for previous years (since 2000) have been extracted from the MODIS archive. Data acquired by sensors with different spectral channel characteristics require cross-calibration and homogenization. In this study, the calibration was performed at the product level, i.e., for NDVI and LST. An alternative approach would be to cross-calibrate radiances measured by the MODIS and OLCI/SLSTR instruments (at the Level-1 product level), and then to perform atmospheric and Rayleigh scattering corrections in order to derive a homogeneous, cross-calibrated time series of the NDVI and LST products. This approach would significantly increase the computational demand and complexity of the forecasting system, which is not advisable for an operational system. Furthermore, within this study, the cross-calibration at the Level-2 product level was found to significantly improve the homogeneity of data records (Table 2) without introducing significant variability that would influence the crop yield forecasting system (see Section 4.1).
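Product-level calibration of this kind can be sketched in a few lines of Python. Table 2 reports that LST was calibrated by a k-nearest-neighbours (kNN) model with MODIS LST, day of year (DOY), and GDD as predictors; the sketch below is a generic kNN regressor under that setup, not the implementation used in the system. The function names, the standardization step, the choice of k, and all numbers in the toy sample are assumptions made for illustration.

```python
import math

def fit_scaler(rows):
    """Column means and standard deviations used to standardize predictors."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column gets a scale of 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def knn_calibrate(train_x, train_y, query, k=3):
    """Predict a Sentinel-3-like value as the mean target of the k nearest samples."""
    means, stds = fit_scaler(train_x)
    q = scale(query, means, stds)
    ranked = sorted(
        (math.dist(scale(x, means, stds), q), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy overlap-period sample: (LST_MODIS [K], DOY, GDD) -> LST_S3 [K].
# These numbers are invented for illustration and are not from the study.
train_x = [(290.0, 120, 300.0), (295.0, 150, 600.0),
           (300.0, 180, 1000.0), (285.0, 100, 150.0)]
train_y = [296.0, 301.0, 307.0, 291.0]

calibrated = knn_calibrate(train_x, train_y, (294.0, 145, 560.0), k=2)
```

In such a scheme the model would be trained on the period in which both sensors are available and then applied to the MODIS archive to obtain Sentinel-3-like values for earlier years.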

6.3. Heterogeneity of Spectral Signatures at the Moderate Spatial Resolution

Satellite-based predictors (i.e., NDVI, LST, VCI, TCI) used in the proposed crop yield forecasting system are not crop specific because of the coarse spatial resolution of pixels (≥300 m), which may cover several agricultural fields sown with different crops. Nevertheless, such heterogeneous spectral signatures have been successfully used as a proxy for the vegetation condition and consequently as a predictor of crop yields (e.g., [11]). Newer satellite sensors, such as the Multispectral Instrument (MSI) onboard the Sentinel-2 satellite constellation, acquire images at both high temporal (~5 days) and spatial (~10 m) resolutions. This is a prerequisite for extracting spectral signatures for individual fields and consequently for obtaining a clear spectral signature related to the physical characteristics of a particular crop [44]. Although the Sentinel-2 imagery is available, some limitations still prevent its use within crop yield forecasting systems. One limitation is the short Sentinel-2 data record (since 2015), which cannot be easily extended by means of cross-calibration with an older sensor (such as MODIS in this study) because no such instrument offers comparable spatial and temporal resolutions. A short time series hinders the derivation of long-term anomalies and reduces the quality of crop yield forecasts produced by a machine learning algorithm (Figure 15). Another limitation to the assimilation of high resolution satellite imagery into crop yield forecasts at a large/continental scale is the lack of extensive multitemporal reference information on crop yields and crop types at the agricultural field level. Nevertheless, these gaps are being quickly bridged by the expanding Sentinel-2 archive and by reference in-situ data on crop productivity collected by sensors mounted on novel agricultural equipment (e.g., harvesters). Consequently, several studies have already tackled the problem of crop yield forecasting based on high resolution Sentinel-2 imagery [45,46]. Future extensions of the proposed system are intended to allow crop yield predictions at the field level; to date, however, this is not possible because of data availability and quality issues. Thus, the presented system is based on medium resolution satellite imagery, which does not allow monitoring of individual fields. It should be emphasized, however, that the forecasting system has been designed in such a way that changing the input data from Sentinel-3 to Sentinel-2 and switching to the clear spectral signature of specific crops requires only adaptation of the data input module. It would then also be possible to assimilate the Copernicus high resolution vegetation phenology and productivity product [47], which provides information on crop development and productivity at the 10-m scale.

6.4. Limitation of Agro-Meteorological Indicators

The proposed crop yield forecasting system is based on two groups of data. One group contains predictors related to the crop growth conditions described by agro-meteorological indicators. The other group consists of satellite-derived vegetation indices describing the instantaneous physical state of crops affected by the growing conditions. The fusion of both groups of predictors improves the quality of crop yield forecasts because each group compensates for the limitations of the other. The agro-meteorological data (originating either from climatological reanalysis or from interpolation of synoptic observations) have a resolution of a few kilometers, which may be sufficient to describe air temperature, insolation, or average precipitation across flat terrain. However, some extreme weather events such as hailstorms, heavy rain, strong winds, or frosts will not be captured by coarse agro-meteorological data. Therefore, it is necessary to use satellite-based predictors, which characterize the vegetation state induced by these adverse weather events. At the same time, satellite vegetation indices have their own limitations. Firstly, in the case of medium resolution imagery, pixels contain a mixture of different crop types. Secondly, the saturation of NDVI above a certain biomass level [48] can hamper sensitivity to higher crop yields. Thus, the synergy between agro-meteorological and satellite predictors positively impacts crop yield modeling.

6.5. Applicability of the Crop Yield Forecasting System to Other Areas

The presented system contains a module for automatic training of yield forecasting models. It includes the removal of correlated features, forward feature selection, and selection of the best prediction method. It requires only the predictors, i.e., agro-meteorological and satellite data, and reference statistics on crop yields for administrative units. Therefore, the model can be easily adapted to other regions featuring similar crop growth conditions. In regions with a double growing season, however, a problem may occur, as the computation of GDD in the current system assumes one main crop per year.
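The single-season assumption can be made concrete with a minimal Python sketch of GDD accumulation. It assumes daily mean air temperatures in °C (the ERA-5 values in Table 1 are in K and would need converting), a base temperature of 0 °C, and accumulation from a single start date; these choices and the function names are illustrative assumptions, not the parameters of the described system.

```python
def accumulate_gdd(daily_mean_temp_c, base_temp_c=0.0):
    """Cumulative growing degree days from daily mean air temperatures (deg C)."""
    total = 0.0
    cumulative = []
    for t in daily_mean_temp_c:
        # Days colder than the base temperature contribute nothing.
        total += max(0.0, t - base_temp_c)
        cumulative.append(total)
    return cumulative

def value_at_gdd(cumulative_gdd, values, level):
    """Value of a predictor on the first day the given GDD level is reached."""
    for gdd, v in zip(cumulative_gdd, values):
        if gdd >= level:
            return v
    return None  # level not reached within the series

temps = [-2.0, 3.0, 5.0, 10.0]       # toy daily means, invented for illustration
ndvi = [0.2, 0.3, 0.4, 0.5]          # toy predictor values on the same days
gdd = accumulate_gdd(temps)          # [0.0, 3.0, 8.0, 18.0]
ndvi_at_5 = value_at_gdd(gdd, ndvi, 5.0)
```

Because the sum only increases from one start date, a given GDD level maps to a single point in the year, which is why a second crop cycle within the same year cannot be represented without restarting the accumulation.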

6.6. Perspectives

There are three main directions for the near-future development of the proposed system. The first involves using a dynamic cropland mask that would better reflect year-to-year land use changes. The near future will also bring crop type maps [49,50] that, in turn, will allow crop-specific satellite-based indices to be derived from individual fields. Yet, the relatively short time series of crop type maps (since 2015 in the case of Sentinel-derived ones) will limit their use in crop yield forecasting, which relies on training the models on a data archive. The second enhancement of the system would be to use weather forecasts to determine predictors that are currently derived from climate reanalysis. This should improve the accuracy of in-season forecasts, which at present disregard forthcoming changes in weather conditions and rely only on the average course of the weather over the season in previous years. The third development would be to extend the list of predictors. One group of new predictors could consist of the outputs of crop growth models [43]. However, it must be emphasized that such models require extensive calibration against field data, which are often not available. A second group of potential new predictors relates to less exploited satellite-derived data such as sun-induced fluorescence [51].

7. Conclusions

This study presents a fully automated crop yield forecasting system operating at the administrative-unit scale, based on predictors originating from open-access Sentinel-3 satellite data and ERA-5 climate reanalysis. The system gathers satellite and agro-meteorological products, performs preprocessing, calculates yield predictors, transforms them to thermal time (as a proxy for crop development stage), predicts crop yields, and generates the final outputs (tabular, graphical). The prediction module runs an Extreme Gradient Boosting regressor, preceded by an iterative predictor selection procedure. The system was intensively validated for NUTS-2 and LAU units in Poland for winter wheat, winter rapeseed, and maize using a leave-one-year-out procedure. The analyzed period covered 2000–2019, although the availability of the reference data differed among administrative units. The relative RMSE of the end-of-season forecasts for NUTS-2 was 8.15% for winter wheat and around 13% for both maize and winter rapeseed. The system was designed so that it can be easily applied to other regions where reference yield statistics are available, and easily migrated to cloud computing environments (e.g., DIAS or Amazon Web Services), where the Copernicus data sets are directly accessible.
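The leave-one-year-out procedure mentioned above can be sketched as follows. For brevity, the sketch replaces the Extreme Gradient Boosting regressor with a one-predictor least-squares line, which is a deliberate simplification; the record layout `(year, predictor, yield)` and the function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx

def leave_one_year_out(records):
    """records: iterable of (year, predictor, yield).

    Each year is held out in turn, the model is trained on the remaining
    years, and the held-out year is predicted.
    Returns {year: [(observed, predicted), ...]}.
    """
    records = list(records)
    results = {}
    for held_out in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        slope, intercept = fit_line([r[1] for r in train],
                                    [r[2] for r in train])
        results[held_out] = [(r[2], intercept + slope * r[1]) for r in test]
    return results
```

Holding out whole years rather than random samples keeps all administrative units of the test year out of training, so the score reflects forecasting for a season the model has not seen.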

Author Contributions

Conceptualization, J.S.B., S.S., J.P.M.; methodology, J.S.B., S.S., J.P.M.; software, J.S.B., S.S., J.P.M.; validation, J.S.B., S.S., P.S.; formal analysis, J.S.B., S.S.; investigation, J.S.B., S.S.; data curation, J.S.B., S.S., J.P.M.; writing—original draft preparation, J.S.B.; writing—review and editing, J.S.B., S.S., J.P.M., E.W., K.D.-Z., P.S., T.M., A.Ł.; visualization, J.S.B., S.S.; project administration, J.S.B., K.D.-Z., T.M., A.Ł.; funding acquisition, J.S.B., E.W., K.D.-Z., T.M., A.Ł. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Centre for Research and Development–Poland under grant agreement no. GOSPOSTRATEG1/381705/13/NCBR/2018, and by the European Space Agency under contract no. 4000123852/18/NL/CBi.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Becker-Reshef, I.; Justice, C.; Barker, B.; Humber, M.; Rembold, F.; Bonifacio, R.; Zappacosta, M.; Budde, M.; Magadzire, T.; Shitote, C.; et al. Strengthening agricultural decisions in countries at risk of food insecurity: The GEOGLAM Crop Monitor for Early Warning. Remote Sens. Environ. 2020, 237, 111553.
2. Lobell, D.B.; Cassman, K.G.; Field, C.B. Crop yield gaps: Their importance, magnitudes, and causes. Annu. Rev. Environ. Resour. 2009, 34, 179–204.
3. Lobell, D.B. The use of satellite data for crop yield gap analysis. Field Crop. Res. 2013, 143, 56–64.
4. Mathew, I.; Shimelis, H.; Mutema, M.; Chaplot, V. What crop type for atmospheric carbon sequestration: Results from a global data analysis. Agric. Ecosyst. Environ. 2017, 243, 34–46.
5. Turkeltaub, T.; Kurtzman, D.; Russak, E.E.; Dahan, O. Impact of switching crop type on water and solute fluxes in deep vadose zone. Water Resour. Res. 2015, 51, 9828–9842.
6. Gilbert, N. One-third of our greenhouse gas emissions come from agriculture. Nature 2012, 31, 10–12.
7. Ray, D.K.; Gerber, J.S.; MacDonald, G.K.; West, P.C. Climate variation explains a third of global crop yield variability. Nat. Commun. 2015, 6, 5989.
8. Iizumi, T.; Sakai, T. The global dataset of historical yields for major crops 1981–2016. Sci. Data 2020, 7, 97.
9. Becker-Reshef, I.; Vermote, E.; Lindeman, M.; Justice, C. A generalized regression-based model for forecasting winter wheat yields in Kansas and Ukraine using MODIS data. Remote Sens. Environ. 2010, 114, 1312–1323.
10. Tucker, C.J.; Holben, B.N.; Elgin, J.H.; McMurtrey, J.E. Relationship of spectral data to grain yield variation. Photogramm. Eng. Remote Sens. 1980, 45, 600–608.
11. Dabrowska-Zielinska, K.; Kogan, F.; Ciolkosz, A.; Gruszczynska, M.; Kowalik, W. Modelling of crop growth conditions and crop yield in Poland using AVHRR-based indices. Int. J. Remote Sens. 2002, 23, 1109–1123.
12. Bastiaanssen, W.G.M.; Ali, S. A new crop yield forecasting model based on satellite measurements applied across the Indus Basin, Pakistan. Agric. Ecosyst. Environ. 2003, 94, 321–340.
13. Basnyat, P.; McConkey, B.; Lafond, G.P.; Moulin, A.; Pelcat, Y. Optimal time for remote sensing to relate to crop grain yield on the Canadian prairies. Can. J. Plant Sci. 2004, 84, 97–103.
14. Kogan, F.; Yang, B.; Wei, G.; Zhiyuan, P.; Xianfeng, J. Modelling corn production in China using AVHRR-based vegetation health indices. Int. J. Remote Sens. 2005, 26, 2325–2336.
15. Prasad, A.K.; Chai, L.; Singh, R.P.; Kafatos, M. Crop yield estimation model for Iowa using remote sensing and surface parameters. Int. J. Appl. Earth Obs. Geoinf. 2006, 8, 26–33.
16. Moriondo, M.; Maselli, F.; Bindi, M. A simple model of regional wheat yield based on NDVI data. Eur. J. Agron. 2007, 26, 266–274.
17. Kowalik, W.; Dabrowska-Zielinska, K.; Meroni, M.; Raczka, T.U.; de Wit, A. Yield estimation using SPOT-VEGETATION products: A case study of wheat in European countries. Int. J. Appl. Earth Obs. Geoinf. 2014, 32, 228–239.
18. Johnson, D.M. An assessment of pre- and within-season remotely sensed variables for forecasting corn and soybean yields in the United States. Remote Sens. Environ. 2014, 141, 116–128.
19. Shao, Y.; Campbell, J.B.; Taff, G.N.; Zheng, B. An analysis of cropland mask choice and ancillary data for annual corn yield forecasting using MODIS data. Int. J. Appl. Earth Obs. Geoinf. 2015, 38, 78–87.
20. Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. 2020, 236, 111402.
21. Kussul, N.; Mykola, L.; Shelestov, A.; Skakun, S. Crop inventory at regional scale in Ukraine: Developing in season and end of season crop maps with multi-temporal optical and SAR satellite imagery. Eur. J. Remote Sens. 2018, 51, 627–636.
22. Rao, P.; Zhou, W.; Bhattarai, N.; Srivastava, A.K.; Singh, B.; Poonia, S.; Lobell, D.B.; Jain, M. Using Sentinel-1, Sentinel-2, and Planet Imagery to Map Crop Type of Smallholder Farms. Remote Sens. 2021, 13, 1870.
23. Tricht, K.V.; Gobin, A.; Gilliams, S.; Piccard, I. Synergistic Use of Radar Sentinel-1 and Optical Sentinel-2 Imagery for Crop Mapping: A Case Study for Belgium. Remote Sens. 2018, 10, 1642.
24. Franch, B.; Vermote, E.; Skakun, S.; Roger, J.; Becker-Reshef, I.; Murphy, E.; Justice, C. Remote sensing based yield monitoring: Application to winter wheat in United States and Ukraine. Int. J. Appl. Earth Obs. Geoinf. 2019, 76, 112–127.
25. Lobell, D.B.; Asner, G.P.; Ortiz-Monasterio, J.I.; Benning, T.L. Remote sensing of regional crop production in the Yaqui Valley, Mexico: Estimates and uncertainties. Agric. Ecosyst. Environ. 2003, 94, 205–220.
26. Nolasco, M.; Ovando, G.; Sayago, S.; Magario, I.; Bocco, M. Estimating soybean yield using time series of anomalies in vegetation indices from MODIS. Int. J. Remote Sens. 2020, 42, 405–421.
27. Doraiswamy, P. Crop condition and yield simulations using Landsat and MODIS. Remote Sens. Environ. 2004, 92, 548–559.
28. Vermote, E. MODIS/Terra Surface Reflectance 8-Day L3 Global 250m SIN Grid V061. 2021. Available online: https://lpdaac.usgs.gov/products/mod09q1v061/ (accessed on 20 November 2020).
29. Schulzweida, U. CDO User Guide. 2020. Available online: https://code.mpimet.mpg.de/projects/cdo/wiki/Cite (accessed on 20 November 2020).
30. Theil, H. A Rank-Invariant Method of Linear and Polynomial Regression Analysis. In Advanced Studies in Theoretical and Applied Econometrics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 345–381.
31. Genovese, G.; Vignolles, C.; Nègre, T.; Passera, G. A methodology for a combined use of normalised difference vegetation index and CORINE land cover data for crop yield monitoring and forecasting. A case study on Spain. Agronomie 2001, 21, 91–111.
32. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
33. Chen, J.; Jönsson, P.; Tamura, M.; Gu, Z.; Matsushita, B.; Eklundh, L. A simple method for reconstructing a high-quality NDVI time-series data set based on the Savitzky–Golay filter. Remote Sens. Environ. 2004, 91, 332–344.
34. Bonhomme, R. Bases and limits to using ‘degree.day’ units. Eur. J. Agron. 2000, 13, 1–10.
35. Duveiller, G.; Baret, F.; Defourny, P. Using Thermal Time and Pixel Purity for Enhancing Biophysical Variable Time Series: An Interproduct Comparison. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2119–2127.
36. Trudgill, D.L.; Honek, A.; Li, D.; Straalen, N.M. Thermal time—Concepts and utility. Ann. Appl. Biol. 2005, 146, 1–14.
37. Kogan, F.N. Global Drought Watch from Space. Bull. Am. Meteorol. Soc. 1997, 78, 621–636.
38. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, 13–17 August 2016.
39. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. Extreme Gradient Boosting, R Package Version 1.5.0.2. 2021.
40. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422.
41. Kuśmierek-Tomaszewska, R.; Żarski, J. Assessment of Meteorological and Agricultural Drought Occurrence in Central Poland in 1961–2020 as an Element of the Climatic Risk to Crop Production. Agriculture 2021, 11, 855.
42. Franch, B.; Vermote, E.; Becker-Reshef, I.; Claverie, M.; Huang, J.; Zhang, J.; Justice, C.; Sobrino, J. Improving the timeliness of winter wheat production forecast in the United States of America, Ukraine and China using MODIS data and NCAR Growing Degree Day information. Remote Sens. Environ. 2015, 161, 131–148.
43. Paudel, D.; Boogaard, H.; de Wit, A.; van der Velde, M.; Claverie, M.; Nisini, L.; Janssen, S.; Osinga, S.; Athanasiadis, I.N. Machine learning for regional crop yield forecasting in Europe. Field Crop. Res. 2022, 276, 108377.
44. Veloso, A.; Mermoz, S.; Bouvet, A.; Toan, T.L.; Planells, M.; Dejoux, J.F.; Ceschia, E. Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications. Remote Sens. Environ. 2017, 199, 415–426.
45. Skakun, S.; Vermote, E.; Franch, B.; Roger, J.C.; Kussul, N.; Ju, J.; Masek, J. Winter Wheat Yield Assessment from Landsat 8 and Sentinel-2 Data: Incorporating Surface Reflectance, Through Phenological Fitting, into Regression Yield Models. Remote Sens. 2019, 11, 1768.
46. Franch, B.; Bautista, A.S.; Fita, D.; Rubio, C.; Tarrazó-Serrano, D.; Sánchez, A.; Skakun, S.; Vermote, E.; Becker-Reshef, I.; Uris, A. Within-Field Rice Yield Estimation Based on Sentinel-2 Satellite Data. Remote Sens. 2021, 13, 4095.
47. Tian, F.; Cai, Z.; Jin, H.; Hufkens, K.; Scheifinger, H.; Tagesson, T.; Smets, B.; Hoolst, R.V.; Bonte, K.; Ivits, E.; et al. Calibrating vegetation phenology from Sentinel-2 using eddy covariance, PhenoCam, and PEP725 networks across Europe. Remote Sens. Environ. 2021, 260, 112456.
48. Gamon, J.A.; Field, C.B.; Goulden, M.L.; Griffin, K.L.; Hartley, A.E.; Joel, G.; Penuelas, J.; Valentini, R. Relationships Between NDVI, Canopy Structure, and Photosynthesis in Three Californian Vegetation Types. Ecol. Appl. 1995, 5, 28–41.
49. d’Andrimont, R.; Verhegghen, A.; Lemoine, G.; Kempeneers, P.; Meroni, M.; van der Velde, M. From parcel to continental scale—A first European crop type map based on Sentinel-1 and LUCAS Copernicus in-situ observations. Remote Sens. Environ. 2021, 266, 112708.
50. Woźniak, E.; Rybicki, M.; Kofman, W.; Aleksandrowicz, S.; Wojtkowski, C.; Lewiński, S.; Bojanowski, J.; Musiał, J.; Milewski, T.; Slesiński, P.; et al. Multi-temporal phenological indices derived from time series Sentinel-1 images to country-wide crop classification. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102683.
51. Song, L.; Guanter, L.; Guan, K.; You, L.; Huete, A.; Ju, W.; Zhang, Y. Satellite sun-induced chlorophyll fluorescence detects early response of winter wheat to heat stress in the Indian Indo-Gangetic Plains. Glob. Chang. Biol. 2018, 24, 4023–4037.

Figure 1. Fractional arable land [%] in Poland.

Figure 2. Administrative divisions of Poland for which crop yields were predicted: Nomenclature des Unités Territoriales Statistiques level 2 (NUTS-2)—red lines, and Local Administrative Units (LAU)—gray lines.

Figure 3. Time span of official crop yield statistics at the LAU level. Black signifies no data.

Figure 4. Temporal yield residuals from the Theil–Sen monotonic trend, used as the response variable in crop yield forecasting for NUTS-2. Note the different Y-axis limits.

Figure 5. Absolute annual crop yields for NUTS-2 units reported by Statistics Poland with a fitted Theil–Sen monotonic trend. Note the different Y-axis limits.

Figure 6. Monthly differences (May) for LAU units between cross-calibration error (RMSE) and the natural variability of the MODIS-derived predictors expressed as double standard deviation. The negative differences (blue colours) indicate that the calibration error is lower than the natural variability.

Figure 7. Map of crop yield forecasts performance at the NUTS-2 level.

Figure 8. Time series of quality metrics of crop yield predictions for NUTS-2 units after the vegetation season.

Figure 9. Comparison of crop yield predictions with the reference statistics for winter wheat at the NUTS-2 level. The RMSE and bias of this relationship are given in Figure 7.

Figure 10. Comparison of crop yield predictions with the reference statistics for winter rapeseed at the NUTS-2 level. The RMSE and bias of this relationship are given in Figure 7.

Figure 11. Comparison of crop yield predictions with the reference statistics for maize at the NUTS-2 level. The RMSE and bias of this relationship are given in Figure 7.

Figure 12. Distributions of temporal correlations for different GDD levels between crop yield predictions and reference statistics. Additionally, the performance of linear trend models is shown on the rightmost box. Dashed line denotes median correlation of the linear trend-model.

Figure 13. Map of crop yield forecasts performance at the LAU level. Black areas denote LAU units where the crop yield forecast was not possible due to data availability issues.

Figure 14. Time series of quality metrics of crop yield predictions for LAU units after the vegetation season.

Figure 15. Correlation coefficient (r) between reference crop yield statistics (used to train the models) and crop yield forecast as a function of the available time span in years (color bars).

Figure 16. Workflow of the crop yield forecasting system, data streams, and software used (‘cdo’—Climate Data Operators v1.9.9, R v4.1.1, Python v3.8.10).

Figure 17. An example of the graphical output generated by the crop yield forecasting system: crop yield forecasts for winter wheat (upper row), winter rapeseed (middle row), and maize (bottom row) for NUTS-2 (left column) and LAU (right column).

Table 1. Satellite, agro-meteorological, and ancillary data used to derive crop yield forecast predictors.

Name	Source	Temporal Resolution	Spatial Resolution
Satellite indices
NDVI (-)	MODIS	8 day	250 m
NDVI (-)	Sentinel-3	1 day	300 m
LST (K)	MODIS	8 day	1000 m
LST (K)	Sentinel-3	1 day	1000 m
Agro-meteorological parameters
Air temperature (K)	ERA-5	1 h	0.25 deg
Precipitation (m)	ERA-5	1 h	0.25 deg
Surface radiation (J m$^{-2}$)	ERA-5	1 h	0.25 deg
Soil moisture 0–7 cm (m$^{3}$ m$^{-3}$)	ERA-5	1 h	0.25 deg
Soil moisture 7–28 cm (m$^{3}$ m$^{-3}$)	ERA-5	1 h	0.25 deg
Crop mask
Fraction of arable land	CLC * 2018	static	Polygons **
Administrative units
NUTS-2/LAU	GUGiK ***	static	Polygons

* Corine Land Cover; ** With minimum area of 25 ha; *** Head Office of Geodesy and Cartography.

Table 2. Accuracy of MODIS NDVI and LST cross-calibration to Sentinel-3 counterparts.

Response Variable	Status	Predictors	MBE	RMSE	EF
$\mathrm{NDVI}_{\text{S-3}}$	prior to calibration	–	3.40	5.16	0.73
$\mathrm{NDVI}_{\text{S-3}}$	calibrated by RF	$\mathrm{NDVI}_{\text{MODIS}}$, DOY, GDD	0.07	2.95	0.91
$\mathrm{LST}_{\text{S-3}}$	prior to calibration	–	−7.16	7.83	−0.65
$\mathrm{LST}_{\text{S-3}}$	calibrated by kNN	$\mathrm{LST}_{\text{MODIS}}$, DOY, GDD	0.00	2.51	0.84

DOY—Day of Year; S-3—Sentinel-3; GDD—Growing Degree Days.

Table 3. Overall performance of crop yield forecasts at NUTS-2 level. MBE: Mean Bias Error; RMBE: Relative MBE; RMSE: Root Mean Square Error; RRMSE: Relative RMSE.

Crop Type	MBE (dt)	RMBE (%)	RMSE (dt)	RRMSE (%)	$R^{2}$ (–)	Correlation (r) (–)
Winter wheat	0.25	0.60	3.43	8.15	0.84	0.92
Winter rapeseed	0.19	0.71	3.39	13.03	0.47	0.69
Maize	0.37	0.63	7.76	13.32	0.51	0.71

Table 4. Overall performance of crop yield forecasts at LAU level.

Crop Type	MBE (dt)	RMBE (%)	RMSE (dt)	RRMSE (%)	$R^{2}$ (–)	Correlation (r) (–)
Winter wheat	−0.01	−0.03	5.20	13.77	0.75	0.87
Winter rapeseed	−0.02	−0.07	5.09	18.80	0.45	0.67
Maize	0.07	0.12	14.89	27.36	0.48	0.69

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

