Wheat Yield Estimation from NDVI and Regional Climate Models in Latvia

: Wheat yield variability will increase in the future due to the projected increase in extreme weather events and long-term climate change e ﬀ ects. Currently, regional agricultural statistics are used to monitor wheat yield. Remotely sensed vegetation indices have a higher spatio-temporal resolution and could give more insight into crop yield. In this paper, we (i) evaluate the possibility to use Normalized Di ﬀ erence Vegetation Index (NDVI) time series to estimate wheat yield in Latvia and (ii) determine which weather variables impact wheat yield changes using both ALARO-0 and REMO Regional Climate Models (RCM) output. The integral from NDVI series (aNDVI) for winter and spring wheat ﬁelds is used as a predictor to model regional wheat yield from 2014 to 2018. A correlation analysis between weather variables, wheat yield and aNDVI was used to elucidate which weather variables impact wheat yield changes in Latvia. Our results indicate that high temperatures in June for spring wheat and in July for winter wheat had a negative correlation with yield. A linear regression yield model explained 71% of the variability with a residual standard error of 0.55 Mg / ha. When RCM data were added as predictor variables to the wheat yield empirical model a random forest approach resulted in better results compared to a linear regression approach, the explained variance increased up to 97% and the residual standard error decreased to 0.17 Mg / ha. We conclude that NDVI time series and RCM output enabled regional crop yield and weather impact monitoring at higher spatio-temporal resolutions than regional statistics.


Introduction
Extreme weather events and long-term climate change effects will have a large impact on the yield of agriculture crops in the future [1][2][3][4]. A negative impact on major food and feed crops is expected in temperate and tropical regions under the climate change scenario of global warming equal to 2 • C or more compared to the pre-industrial era [4]. In some regions, the negative effect of increased temperature and occurrence of extreme events (e.g., drought and pests) on crop production might be offset by the carbon dioxide (CO 2 ) fertilization effect [4,5]. However, the magnitude of the CO 2 fertilization effect and the importance of other interacting factors is still unclear [4,5]. Considering these scenarios, it will be a challenge to ensure global food security in the future. In Europe, particularly in southern Europe, extreme weather events have already resulted in crop yield losses [2,6,7]. The extreme drought of 2018 resulted in a drop of 8% in European cereal harvest compared to the five-year average [6]. Years with extreme heat waves between 1964 and 2007 resulted in national cereal production deficits of 9.1% worldwide [8].
In order to guarantee food supply, it is crucial to monitor crop yield and understand how weather variability and weather extremes influence crop yield. Accurate and timely crop yield data are indispensable for decision making, as reliable estimations of yield changes are required for strategic planning [9]. Currently, crop yield statistics are predominantly used to monitor and evaluate the causes of crop yield changes. However, these crop yield statistics have their limitations as the spatial resolution and temporal frequency of these data are limited. Data are, for example, often only available at county or regional levels and assessed yearly [10].
Remotely sensed vegetation indices (VIs) have a large scope to fill the crop yield data gap [11]. VIs capture various properties of the Earth's vegetation by combining the reflectance of specific spectral bands and can be used to monitor crop performance. An advantage of VIs, compared to crop yield statistics, is that information of the crop can be delivered at a high spatio-temporal resolution. The spatial resolution of VIs can be as high as a few meters and some VIs are available at a daily resolution. There is, however, often a tradeoff between the spatial and temporal resolution of VIs. Time series of VIs enable us to evaluate crop development throughout the growing season. In addition, remotely sensed VIs are globally available, allowing us to monitor crop yield at the field level as well as at the global level. Several studies have already demonstrated the possibility to use VIs to monitor yield at different geographical locations and at different scales i.e., from the field to global level [12][13][14][15][16][17]. One of the VIs that is often and successfully used to estimate crop yield is the Normalized Difference Vegetation Index (NDVI) [14,18,19].
Deriving crop yield from VIs can be performed using a mechanistic approach or an empirical approach [20]. When a mechanistic approach is applied, growth parameters such as: VI, soil characteristics, and management information are used to run and calibrate plant growth models from which crop yield is derived [21]. When an empirical approach is used, a VI is related to crop yield by using statistical methods without describing the causality between the VI and crop yield as performed in mechanistic-based models [14,22]. A simple linear regression model where a VI is modeled as a function of crop yield is an example of an empirical based model. Whereas empirical-based approaches depend on the availability and quality of the data i.e., VI and yield data, mechanistic approaches rely on the assumptions made in the models [20]. The main limiting factor of empirical crop yield models is that the models cannot be easily extrapolated to other areas [12,20].
Wheat is one of the most important cereal crops in Europe, in 2017 wheat constituted 52% of the total harvested cereal production in Europe [23]. European wheat production is also important on a global scale, as Europe accounted for 32.6% of the global wheat production for the period from 1994 to Remote Sens. 2020, 12, 2206 3 of 20 2017, and wheat yield in Europe is high (i.e., in 2017 wheat yield in Europe was 44 kg/ha compared to 35 kg/ha globally) [23]. However, the occurrence of adverse weather conditions in European wheat production areas is expected to increase due to climate change which will lead to a larger wheat yield variability in Europe [1,2,[24][25][26]. This advocates wheat yield monitoring in Europe. The aim of this paper is to (i) evaluate the possibility to use NDVI time series to estimate yields and (ii) determine which weather variables impact yield changes. We tested the methodology for spring and winter wheat yield in Latvia. NDVI series at the field level were combined with crop yield statistics at the regional level from 2014 until 2018 using an empirical modelling strategy. In addition, we explored which weather variables impact wheat yield changes in Latvia, using output from the Regional Climate Models ALARO-0 and REMO2015 (henceforth REMO), and subsequent correlation analysis between weather variables, wheat yield, and NDVI.

Study Area
Spring and winter wheat yield of Latvia for the period from 2014 to 2018 was studied. For this period, geospatial field data of 2384 wheat fields in Latvia were available through the Land Parcel Information System (LPIS) (Figure 1). Only wheat fields with an area equal to or bigger than 0.1 km 2 were used in order to be able to extract pure NDVI pixels from PROBA-V, following [14]. The majority of the fields were located in the Zemgale region ( Figure 1) which has the most fertile soils, the dominant World Reference Base Soil Group is Luvisol in this region ( Figure 2). In the Latgale region where less fertile Albeluvisols dominate (now named Retisol according to [27]), there were not many wheat fields located (Figures 1 and 2). In the following sections, we describe the methods used to extract NDVI data for the wheat fields and the available crop yield statistics.
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 20 regional level from 2014 until 2018 using an empirical modelling strategy. In addition, we explored which weather variables impact wheat yield changes in Latvia, using output from the Regional Climate Models ALARO-0 and REMO2015 (henceforth REMO), and subsequent correlation analysis between weather variables, wheat yield, and NDVI.

Study Area
Spring and winter wheat yield of Latvia for the period from 2014 to 2018 was studied. For this period, geospatial field data of 2384 wheat fields in Latvia were available through the Land Parcel Information System (LPIS) (Figure 1). Only wheat fields with an area equal to or bigger than 0.1 km 2 were used in order to be able to extract pure NDVI pixels from PROBA-V, following [14]. The majority of the fields were located in the Zemgale region ( Figure 1) which has the most fertile soils, the dominant World Reference Base Soil Group is Luvisol in this region ( Figure 2). In the Latgale region where less fertile Albeluvisols dominate (now named Retisol according to [27]), there were not many wheat fields located (Figures 1 and 2). In the following sections, we describe the methods used to extract NDVI data for the wheat fields and the available crop yield statistics.

Remote Sensing Data: NDVI
The 10-daily NDVI synthesis images with a resolution of 300 by 300 m from PROBA-V were used. The geospatial field data were combined with the NDVI raster data to extract the pixels for each field with the terrascope platform (https://terrascope.be/en). To ensure that the extracted pixels were located within the field boundaries, an inward buffer of 150 m was applied before extracting the pixels. The extracted pixels were averaged at the field level, resulting in NDVI series consisting of 10 daily observations per field.
The NDVI integral (hereafter referred to as aNDVI) was calculated using the trapezoidal rule. The calculated aNDVI was used to predict spring and winter wheat yield. To ensure that the calculated aNDVI represented the wheat yield growing season, NDVI values measured after the 11th of August (i.e., harvest) and below the threshold of 0.2 were discarded. The NDVI threshold of 0.2 resulted in the best wheat yield predictions [14]. To make sure that the calculated aNDVI captured the different phenological stages (from green-up to harvest) which are important for wheat yield, only the NDVI curves that had at least ten continuous observations were retained for the calculation of aNDVI series ( Figure 3). For the calculation of aNDVI, 339 fields, i.e., 14% of the initial number of fields, were discarded because their NDVI series had less than ten observations. Thus, in total, 2045 wheat fields (1561 winter wheat fields and 484 spring wheat fields) were used for the calculation of aNDVI.

Remote Sensing Data: NDVI
The 10-daily NDVI synthesis images with a resolution of 300 by 300 m from PROBA-V were used. The geospatial field data were combined with the NDVI raster data to extract the pixels for each field with the terrascope platform (https://terrascope.be/en). To ensure that the extracted pixels were located within the field boundaries, an inward buffer of 150 m was applied before extracting the pixels. The extracted pixels were averaged at the field level, resulting in NDVI series consisting of 10 daily observations per field.
The NDVI integral (hereafter referred to as aNDVI) was calculated using the trapezoidal rule. The calculated aNDVI was used to predict spring and winter wheat yield. To ensure that the calculated aNDVI represented the wheat yield growing season, NDVI values measured after the 11th of August (i.e., harvest) and below the threshold of 0.2 were discarded. The NDVI threshold of 0.2 resulted in the best wheat yield predictions [14]. To make sure that the calculated aNDVI captured the different phenological stages (from green-up to harvest) which are important for wheat yield, only the NDVI curves that had at least ten continuous observations were retained for the calculation of aNDVI series ( Figure 3). For the calculation of aNDVI, 339 fields, i.e., 14% of the initial number of fields, were discarded because their NDVI series had less than ten observations. Thus, in total, 2045 wheat fields (1561 winter wheat fields and 484 spring wheat fields) were used for the calculation of aNDVI.

Crop yield Statistics Data
Yearly regional crop yield statistics from 2014 to 2018 for spring and winter wheat in Latvia for the NUTS 3 (Nomenclature of Territorial Units for Statistics level 3) regions: Pierīga, Vidzeme, Kurzeme, Zemgale, and Latgale were provided by the Central Statistical Bureau of Latvia (https://www.csb.gov.lv/) ( Figure 1). The regional crop yield statistics were used to determine the yield of each field based on its location (i.e., in which region it was located). If a field was located in more than one region, the field centroid was used to determine the region.

Wheat Yield Empirical Model
A linear regression model with aNDVI as a predictor variable and statistical yield (i.e., crop yield statistics reported by the Central Statistical Bureau of Latvia) as a response variable was made for each field to evaluate the possibility of using NDVI series of spring and winter wheat to estimate and evaluate the yield of the fields. Two factors were added to the linear model. First, a factor indicating if the crop was winter or spring wheat. Second, a factor indicating the region where the field was located. By adding this factor to the model, regional differences such as the dominant Reference Soil Group in each region were taken into account ( Figure 2). The fitted linear regression model can be written as Yieldi = α + β1 aNDVIi + β2 cropi + β3 R1i + β4 R2i + β5 R3i+ β6 R4i + ɛi (1) with Yieldi = the yield of field i; aNDVIi = the calculated aNDVI of field i; cropi = 1 if crop on field i is winter wheat and cropi = 0 if crop on field i is spring wheat; R1i = 1, R2i = 1, R3i = 1, or R4i = 1 if field i is located in the Latgale, Pierīga, Vidzeme, Zemgale, region, respectively, or R1i = 0, R2i = 0, R3i = 0, R4i = 0 if it is located in another region. α, β1, β2, β3, β4, β5, and β6 are the coefficients of the model,

Crop yield Statistics Data
Yearly regional crop yield statistics from 2014 to 2018 for spring and winter wheat in Latvia for the NUTS 3 (Nomenclature of Territorial Units for Statistics level 3) regions: Pierīga, Vidzeme, Kurzeme, Zemgale, and Latgale were provided by the Central Statistical Bureau of Latvia (https://www.csb.gov.lv/) ( Figure 1). The regional crop yield statistics were used to determine the yield of each field based on its location (i.e., in which region it was located). If a field was located in more than one region, the field centroid was used to determine the region.

Wheat Yield Empirical Model
A linear regression model with aNDVI as a predictor variable and statistical yield (i.e., crop yield statistics reported by the Central Statistical Bureau of Latvia) as a response variable was made for each field to evaluate the possibility of using NDVI series of spring and winter wheat to estimate and evaluate the yield of the fields. Two factors were added to the linear model. First, a factor indicating if the crop was winter or spring wheat. Second, a factor indicating the region where the field was located. By adding this factor to the model, regional differences such as the dominant Reference Soil Group in each region were taken into account ( Figure 2). The fitted linear regression model can be written as Remote Sens. 2020, 12, 2206 6 of 20 statistics reported by the Central Statistical Bureau of Latvia) as a response variable was made for each field to evaluate the possibility of using NDVI series of spring and winter wheat to estimate and evaluate the yield of the fields. Two factors were added to the linear model. First, a factor indicating if the crop was winter or spring wheat. Second, a factor indicating the region where the field was located. By adding this factor to the model, regional differences such as the dominant Reference Soil Group in each region were taken into account ( Figure 2). The fitted linear regression model can be written as Yieldi = α + β1 aNDVIi + β2 cropi + β3 R1i + β4 R2i + β5 R3i+ β6 R4i + ɛi (1) with Yieldi = the yield of field i; aNDVIi = the calculated aNDVI of field i; cropi = 1 if crop on field i is winter wheat and cropi = 0 if crop on field i is spring wheat; R1i = 1, R2i = 1, R3i = 1, or R4i = 1 if field i is located in the Latgale, Pierīga, Vidzeme, Zemgale, region, respectively, or R1i = 0, R2i = 0, R3i = 0, R4i = 0 if it is located in another region. α, β1, β2, β3, β4, β5, and β6 are the coefficients of the model, i (1) with Yield i = the yield of field i; aNDVI i = the calculated aNDVI of field i; crop i = 1 if crop on field i is winter wheat and cropi = 0 if crop on field i is spring wheat; R1 i = 1, R2 i = 1, R3 i = 1, or R4 i = 1 if field i is located in the Latgale, Pierīga, Vidzeme, Zemgale, region, respectively, or R1 i = 0, R2 i = 0, R3 i = 0, R4 i = 0 if it is located in another region. α, β 1 , β 2 , β 3 , β 4 , β 5 , and β 6 are the coefficients of the model, and etermine the region. ctor variable and statistical yield (i.e., crop yield f Latvia) as a response variable was made for ries of spring and winter wheat to estimate and ed to the linear model. First, a factor indicating ctor indicating the region where the field was ifferences such as the dominant Reference Soil e 2). The fitted linear regression model can be ulated aNDVI of field i; cropi = 1 if crop on field ring wheat; R1i = 1, R2i = 1, R3i = 1, or R4i = 1 if gale, region, respectively, or R1i = 0, R2i = 0, R3i 3, β4, β5, and β6 are the coefficients of the model, i is the error of the model. The linear regression model performance was evaluated with the explained variance, the residual standard error, and the Akaike Information Criterion (AIC) [28].
In addition to a linear regression model, a random forest regression was applied. The latter was built based on 100 trees and evaluated with the explained variance. Similar to the linear regression model, the variables aNDVI, a factor crop indicating if the crop was winter or spring wheat, and a factor indicating the region where the field was located were used in the random forest model to predict wheat yield. The random forest model was build using the randomForest function, which implements the Breiman's algorithm, from the R package randromForest [29]. The results of the two modeling approaches were compared.

Regional Climate Modeling
We used the field level monthly mean values of minimum and maximum temperatures (henceforth Tmin and Tmax) from the REMO [30,31] and ALARO-0 Regional Climate Models (RCM) [32,33], both driven by lateral boundary conditions from the ERA-Interim reanalysis [34] using the Davies procedure [35], as prepared according to [35,36]. Tmin and Tmax were selected based on preliminary analysis using climate station data which indicated that wheat yield and aNDVI were more strongly correlated with Tmin and Tmax, and less with precipitation, reference evapotranspiration, or frost occurrence (defined as number of days with Tmin < −15 • C). Validation of the variables for the period from 1980 to 2017 over Latvia showed that the modeled Tmin and Tmax contained biases compared to station observations [37]. For example, REMO showed a cold Tmin bias in winter and in February in particular, and underestimated Tmax in summer in June and July. However, the spatial patterns for temperature were similar between RCMs and observations. [37] concluded that both ALARO and REMO could be applied in impact modelling. These high-resolution climate data can be freely downloaded from the Earth System Grid Federation (ESGF) data nodes (http://esgf.llnl.gov/). Monthly mean values of Tmin and Tmax were available from 2014 to 2018 for the REMO and ALARO-0 models. RCM climate data were preferred over climate station data as their spatial resolution is much higher. In addition, RCM data form the basis for the analysis of climate projections and could thus be used to evaluate the effect of future climate on wheat yield.

Evaluating the Effect of Weather on Wheat Yield
To evaluate the effect of weather on wheat yield, the Pearson correlation between temperatures (Tmin and Tmax) and aNDVI, and the correlation between temperatures (Tmin and Tmax) and wheat yield, were calculated. Considering the growing season of wheat in Latvia, the correlation was calculated for the months from January to August.
Finally, we tested if REMO and ALARO-0 RCM could be used to improve wheat yield modeling. Therefore, in a second modeling stage, monthly mean RCM temperatures which showed a high correlation with wheat yield and aNDVI were added as predictors to the wheat yield linear and random forest models. Separate linear regressions and random forest models were built for the selected REMO and ALARO-0 temperature data.

NDVI Series of Winter and Spring Wheat
The average NDVI series from 2014 to 2018 for the studied winter and spring wheat fields in Latvia are visualized in Figure 4. For winter wheat, the NDVI peak was around mid-May, while for spring wheat the NDVI peak was later in the growing season-around mid-June. For both winter and spring wheat, the NDVI dropped around the end of August which coincided with the harvest time of winter and spring wheat in Latvia.

aNDVI and Crop Yield Statistics
The NDVI series used to calculate the aNDVI for spring and winter wheat for each of the five NUTS 3 regions are visualized in Figure 5. The Zemgale region had the highest number of wheat fields, whereas the Latgale region had the lowest number of wheat fields. The number of fields used for the calculation of aNDVI for each year for spring and winter wheat is presented in Figure 6. Note that the number of fields in the years 2014 and 2018 was lower compared to the other years. The number of winter wheat fields was higher than the number of spring wheat fields, except in 2014. This is because a large proportion of winter wheat fields were damaged by frost due to a cold spell in mid-January in 2014. These fields were subsequently re-sown with spring wheat [38].

aNDVI and Crop Yield Statistics
The NDVI series used to calculate the aNDVI for spring and winter wheat for each of the five NUTS 3 regions are visualized in Figure 5. The Zemgale region had the highest number of wheat fields, whereas the Latgale region had the lowest number of wheat fields. The number of fields used for the calculation of aNDVI for each year for spring and winter wheat is presented in Figure 6. Note that the number of fields in the years 2014 and 2018 was lower compared to the other years. The number of winter wheat fields was higher than the number of spring wheat fields, except in 2014. This is because a large proportion of winter wheat fields were damaged by frost due to a cold spell in mid-January in 2014. These fields were subsequently re-sown with spring wheat [38].  In Figure 7, the yield at the regional level, as reported by the Central Statistical Bureau of Latvia for spring and winter wheat for the years from 2014 to 2018, is visualized. For spring wheat, a low yield was reported in 2018. For winter wheat there was a low yield reported in the years 2014 and 2018 ( Figure 7). Similar patterns were visible when the aNDVI calculated for the studied fields was plotted for each year for spring and winter wheat ( Figure 8). Nonetheless, the variation in aNDVI between the years (Figure 8) was not as pronounced as compared to the variation in yield between the years (Figure 7).  Similar patterns were visible when the aNDVI calculated for the studied fields was plotted for each year for spring and winter wheat (Figure 8). Nonetheless, the variation in aNDVI between the  Similar patterns were visible when the aNDVI calculated for the studied fields was plotted for each year for spring and winter wheat (Figure 8). Nonetheless, the variation in aNDVI between the

Wheat Yield Empirical Model
The statistics of the coefficients (estimate, standard error, t-value, and p-value) of the wheat yield empirical linear regression model (Equation (1)) are presented in Table 1. All the coefficients of the linear regression model were significant (p < 0.001). The explained variance of the linear regression model was 71% and the residual standard error was 0.55 Mg/ha. When a random forest regression model was used instead of a linear regression model, the residual standard error of the random forest model was equal to 0.58 Mg/ha and the explained variance by the model decreased to 62%. Therefore, the linear regression model performed better. The explained variance of the linear regression model decreased by 23% when the region factor was not included in the model. The AIC increased from 3400 to 4574 when region was not included in the model. When neither crop nor region factors were included in the model, the explained variance decreased by 27%-from 71% to 44%. In addition, the AIC increased from 3400 to 4724 when neither crop nor region were included in the model. Allowing the slope of the linear regression model to change depending on the region, i.e., including an interaction term between aNDVI and region, did not result in the improvement of the model accuracy and resulted in non-significant parameter estimates. The proposed linear regression model with the factor crop, indicating if the crop is spring or winter wheat, and the factor region, indicating in which region the field is located, explains the wheat yield variability well. In Figure 9, the yield, as predicted by the empirical linear regression model, is plotted versus the observed yield for spring and winter wheat. The adjusted R 2 (R 2 = 0.71) and the residual standard error of the linear regression model fit (0.55 Mg/ha) resulted in good estimates of the model parameters (Table 1). However, the linear regression model overestimated the yield of winter wheat slightly in the years 2014 and 2018 ( Figure  9) and slightly underestimated the 2015 yield of spring wheat.

Wheat Yield Empirical Model
The statistics of the coefficients (estimate, standard error, t-value, and p-value) of the wheat yield empirical linear regression model (Equation (1)) are presented in Table 1. All the coefficients of the linear regression model were significant (p < 0.001). The explained variance of the linear regression model was 71% and the residual standard error was 0.55 Mg/ha. When a random forest regression model was used instead of a linear regression model, the residual standard error of the random forest model was equal to 0.58 Mg/ha and the explained variance by the model decreased to 62%. Therefore, the linear regression model performed better. The explained variance of the linear regression model decreased by 23% when the region factor was not included in the model. The AIC increased from 3400 to 4574 when region was not included in the model. When neither crop nor region factors were included in the model, the explained variance decreased by 27%-from 71% to 44%. In addition, the AIC increased from 3400 to 4724 when neither crop nor region were included in the model. Allowing the slope of the linear regression model to change depending on the region, i.e., including an interaction term between aNDVI and region, did not result in the improvement of the model accuracy and resulted in non-significant parameter estimates. The proposed linear regression model with the factor crop, indicating if the crop is spring or winter wheat, and the factor region, indicating in which region the field is located, explains the wheat yield variability well. In Figure 9, the yield, as predicted by the empirical linear regression model, is plotted versus the observed yield for spring and winter wheat. The adjusted R 2 (R 2 = 0.71) and the residual standard error of the linear regression model fit (0.55 Mg/ha) resulted in good estimates of the model parameters (Table 1). However, the linear regression model overestimated the yield of winter wheat slightly in the years 2014 and 2018 ( Figure 9) and slightly underestimated the 2015 yield of spring wheat.

Effect of Tmin and Tmax on Wheat Yield
In Figure 10, the Pearson correlation between the monthly means of temperature (Tmax and Tmin) and aNDVI, and the monthly means of temperature (Tmax and Tmin) and yield for spring wheat is visualized for the months from January to August. The most negative correlation between monthly means of temperature and yield, and temperature and aNDVI, was found in June, indicating that high temperatures in June resulted in lower aNDVI and yield values for spring wheat. In the month of March, there was a high positive correlation between Tmin and yield, and Tmax and yield. The correlations between aNDVI and Tmin, and aNDVI and Tmax of March, showed similar high positive correlations, indicating that high temperatures in March resulted in high aNDVI and yield values for spring wheat.

Effect of Tmin and Tmax on Wheat Yield
In Figure 10, the Pearson correlation between the monthly means of temperature (Tmax and Tmin) and aNDVI, and the monthly means of temperature (Tmax and Tmin) and yield for spring wheat is visualized for the months from January to August. The most negative correlation between monthly means of temperature and yield, and temperature and aNDVI, was found in June, indicating that high temperatures in June resulted in lower aNDVI and yield values for spring wheat. In the month of March, there was a high positive correlation between Tmin and yield, and Tmax and yield. The correlations between aNDVI and Tmin, and aNDVI and Tmax of March, showed similar high positive correlations, indicating that high temperatures in March resulted in high aNDVI and yield values for spring wheat. In Figure 11, the Pearson correlation between the monthly means of temperature (Tmax and Tmin) and aNDVI, and the monthly means of temperature (Tmax and Tmin) and yield for winter wheat, is visualized for the months from January to August. The most negative correlation between temperature and aNDVI, and temperature and yield, was observed in July, indicating that high temperatures in July resulted in lower aNDVI and yield values for winter wheat. The most positive correlations between temperature and yield, and between temperature and aNDVI, were observed in January. In Figure 11, the Pearson correlation between the monthly means of temperature (Tmax and Tmin) and aNDVI, and the monthly means of temperature (Tmax and Tmin) and yield for winter wheat, is visualized for the months from January to August. The most negative correlation between temperature and aNDVI, and temperature and yield, was observed in July, indicating that high temperatures in July resulted in lower aNDVI and yield values for winter wheat. The most positive correlations between temperature and yield, and between temperature and aNDVI, were observed in January.
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 20 Figure 11. Pearson correlation between monthly means of temperature (Tmax and Tmin) and aNDVI and yield for winter wheat. Monthly means of Tmin and Tmax of the months from January to August are used to calculate the correlation. Top graph: Pearson correlation between TminALARO and aNDVI (red circles), TminREMO and aNDVI (blue circles), TminALARO and yield (red triangles), and TminREMO and yield (blue triangles). Bottom graph: Pearson correlation between TmaxALARO and aNDVI (red circles), TmaxREMO and aNDVI (blue circles), TmaxALARO and yield (red triangles), and TmaxREMO and yield (blue triangles).
In Figure 12, wheat yield is plotted as a function of maximum temperature in June and March, and July and January, for spring and winter wheat yield, respectively. These are the two months which showed the strongest correlation with spring and winter wheat. The highest spring wheat yields were found in the years with low temperatures in June and high temperatures in March. The highest winter wheat yields were found in the years with low temperatures in July and high temperatures in January. Figure 11. Pearson correlation between monthly means of temperature (Tmax and Tmin) and aNDVI and yield for winter wheat. Monthly means of Tmin and Tmax of the months from January to August are used to calculate the correlation. Top graph: Pearson correlation between TminALARO and aNDVI (red circles), TminREMO and aNDVI (blue circles), TminALARO and yield (red triangles), and TminREMO and yield (blue triangles). Bottom graph: Pearson correlation between TmaxALARO and aNDVI (red circles), TmaxREMO and aNDVI (blue circles), TmaxALARO and yield (red triangles), and TmaxREMO and yield (blue triangles).
In Figure 12, wheat yield is plotted as a function of maximum temperature in June and March, and July and January, for spring and winter wheat yield, respectively. These are the two months which showed the strongest correlation with spring and winter wheat. The highest spring wheat yields were found in the years with low temperatures in June and high temperatures in March. The highest winter wheat yields were found in the years with low temperatures in July and high temperatures in January. March for the spring wheat fields. Each point represents a field, the color of the points indicates the yield of the spring wheat field. Bottom row: scatter plot of maximum temperature in July versus maximum temperature in January for the winter wheat fields. Each point represents a field, the color of the points indicates the yield of the winter wheat field. Monthly maximum temperature as modeled by both the REMO (left graphs) and the ALARO-0 (right graphs) models were used.

Regional Wheat Yield Modelling
For winter wheat, the modeled yield was overestimated in 2014 in all regions and in 2018 for fields located in the Kurzeme region (Figures 9 and 13). The years with the lowest yields were also 2014 and 2018 (Figure 7). This suggests that aNDVI did not capture one or more factors affecting low winter wheat yields observed in 2014 and 2018. The correlation of the monthly means of temperature and yield for winter wheat suggests that high temperatures in July resulted in lower winter wheat yields ( Figure 11). The highest July temperatures were observed in 2014 and 2018 ( Figure 13), but their negative impact on winter wheat yield did not show in aNDVI. This might explain the overestimation of modeled winter wheat in the years 2014 and 2018. Considering this observation, the wheat yield model could improve by adding explanatory variables which are known to affect wheat yield such as Tmin or Tmax of July. Indeed, when Tmax of July from the ALARO-0 model or REMO model were added to the linear regression model the explained variance increased by 10%from 71% to 79% and 11%-from 71% to 80%, respectively. The AIC of the wheat linear regression model decreased from 3400 to 2615 for the ALARO-0 model and to 2702 for the REMO model, when including the July maximum temperatures. The residual standard error decreased from 0.55 to 0.47 Mg/ha and to 0.46 Mg/ha, respectively, when Tmax of July from ALARO-0 or REMO were added to the linear regression model. However, when July maximum temperatures were included in the model, the explained variance by the random forest model was higher compared to the linear regression model. When a random forest modeling approach was used, the explained variance was equal to 97% for the ALARO-0 model and 94% for the REMO model, when including July maximum temperatures. The residual standard error was equal to 0.17 Mg/ha for the ALARO-0 model and 0.23 Mg/ha for the REMO model, when including July maximum temperatures in the random forest models. March for the spring wheat fields. Each point represents a field, the color of the points indicates the yield of the spring wheat field. Bottom row: scatter plot of maximum temperature in July versus maximum temperature in January for the winter wheat fields. Each point represents a field, the color of the points indicates the yield of the winter wheat field. Monthly maximum temperature as modeled by both the REMO (left graphs) and the ALARO-0 (right graphs) models were used.

Regional Wheat Yield Modelling
For winter wheat, the modeled yield was overestimated in 2014 in all regions and in 2018 for fields located in the Kurzeme region (Figures 9 and 13). The years with the lowest yields were also 2014 and 2018 (Figure 7). This suggests that aNDVI did not capture one or more factors affecting low winter wheat yields observed in 2014 and 2018. The correlation of the monthly means of temperature and yield for winter wheat suggests that high temperatures in July resulted in lower winter wheat yields ( Figure 11). The highest July temperatures were observed in 2014 and 2018 ( Figure 13), but their negative impact on winter wheat yield did not show in aNDVI. This might explain the overestimation of modeled winter wheat in the years 2014 and 2018. Considering this observation, the wheat yield model could improve by adding explanatory variables which are known to affect wheat yield such as Tmin or Tmax of July. Indeed, when Tmax of July from the ALARO-0 model or REMO model were added to the linear regression model the explained variance increased by 10%-from 71% to 79% and 11%-from 71% to 80%, respectively. The AIC of the wheat linear regression model decreased from 3400 to 2615 for the ALARO-0 model and to 2702 for the REMO model, when including the July maximum temperatures. The residual standard error decreased from 0.55 to 0.47 Mg/ha and to 0.46 Mg/ha, respectively, when Tmax of July from ALARO-0 or REMO were added to the linear regression model. However, when July maximum temperatures were included in the model, the explained variance by the random forest model was higher compared to the linear regression model. When a random forest modeling approach was used, the explained variance was equal to 97% for the ALARO-0 model and 94% for the REMO model, when including July maximum temperatures. The residual standard error was equal to 0.17 Mg/ha for the ALARO-0 model and 0.23 Mg/ha for the REMO model, when including July maximum temperatures in the random forest models.
Remote Sens. 2020, 12, x FOR PEER REVIEW 15 of 20 Figure 13. Maximum temperature in July from the regional climate models REMO and ALARO-0 for winter wheat fields for the period from 2014 to 2018.
Adding explanatory variables which are known to affect wheat yield also improved the explained variance of the individual spring and winter wheat linear regression models. The explained variance of the winter wheat linear regression model increased from 59% to 77% for the ALARO-0 model and to 84% for the REMO model, when including the July maximum temperatures. The AIC of the winter wheat linear regression model decreased from 2690 to 1794 for the ALARO-0 model and to 1258 for the REMO model, when including July maximum temperatures. Both linear regression models indicated that July maximum temperatures had a negative effect on winter wheat yield. The residual standard error decreased from 0.57 to 0.43 Mg/ha and to 0.36 Mg/ha, respectively, when July maximum temperatures from ALARO-0 or REMO models were added to the winter wheat linear regression model. Likewise, the explained variance of the spring wheat linear regression model increased from 59% to 83% for the ALARO-0 model and to 77% for the REMO model, when including the June maximum temperatures. The AIC of the spring wheat linear regression model decreased from 661 to 246 for the ALARO-0 model and to 387 for the REMO model, when including the June maximum temperatures. Both linear regression models indicated that June maximum temperatures had a negative effect on spring wheat yield. The residual standard error decreased from 0.48 to 0.31 Mg/ha and to 0.36 Mg/ha, respectively, when June maximum temperatures from ALARO-0 or REMO models were added to the spring wheat linear regression model. The explained variance was higher when a random forest regression modelling approach was used for both the winter and spring wheat model when including July and June maximum temperatures, respectively. For the winter wheat random forest model, the explained variance was equal to 98% for the ALARO-0 model and to 97% for the REMO model, when including July maximum temperatures. The residual standard error was equal to 0.12 Mg/ha for the ALARO-0 model and to 0.16 Mg/ha for the REMO model, when including July maximum temperatures in the random forest winter wheat model. For the spring wheat random forest model, the explained variance was equal to 92% for the ALARO-0 model and to 93% for the REMO model, when including June maximum temperatures. The residual standard error was equal to 0.21 Mg/ha for the ALARO-0 model and to 0.20 Mg/ha for the REMO model, when including June maximum temperatures in the random forest spring wheat model. Adding explanatory variables which are known to affect wheat yield also improved the explained variance of the individual spring and winter wheat linear regression models. The explained variance of the winter wheat linear regression model increased from 59% to 77% for the ALARO-0 model and to 84% for the REMO model, when including the July maximum temperatures. The AIC of the winter wheat linear regression model decreased from 2690 to 1794 for the ALARO-0 model and to 1258 for the REMO model, when including July maximum temperatures. Both linear regression models indicated that July maximum temperatures had a negative effect on winter wheat yield. The residual standard error decreased from 0.57 to 0.43 Mg/ha and to 0.36 Mg/ha, respectively, when July maximum temperatures from ALARO-0 or REMO models were added to the winter wheat linear regression model. Likewise, the explained variance of the spring wheat linear regression model increased from 59% to 83% for the ALARO-0 model and to 77% for the REMO model, when including the June maximum temperatures. The AIC of the spring wheat linear regression model decreased from 661 to 246 for the ALARO-0 model and to 387 for the REMO model, when including the June maximum temperatures. Both linear regression models indicated that June maximum temperatures had a negative effect on spring wheat yield. The residual standard error decreased from 0.48 to 0.31 Mg/ha and to 0.36 Mg/ha, respectively, when June maximum temperatures from ALARO-0 or REMO models were added to the spring wheat linear regression model. The explained variance was higher when a random forest regression modelling approach was used for both the winter and spring wheat model when including July and June maximum temperatures, respectively. For the winter wheat random forest model, the explained variance was equal to 98% for the ALARO-0 model and to 97% for the REMO model, when including July maximum temperatures. The residual standard error was equal to 0.12 Mg/ha for the ALARO-0 model and to 0.16 Mg/ha for the REMO model, when including July maximum temperatures in the random forest winter wheat model. For the spring wheat random forest model, the explained variance was equal to 92% for the ALARO-0 model and to 93% for the REMO model, when including June maximum temperatures. The residual standard error was equal to 0.21 Mg/ha for the ALARO-0 model and to 0.20 Mg/ha for the REMO model, when including June maximum temperatures in the random forest spring wheat model.

Discussion
The statistics of the empirical models showed that there is a scope to use 10-daily NDVI series with a 300 m resolution provided by satellites with a PROBA-V configuration to estimate spring and winter wheat yield in Latvia (Table 1 and Figure 9). The variance explained by the wheat linear regression model decreased by 22% when the factor region was not included as a predictor factor in the wheat yield model. This is likely related to the fact that yield data (the response variable) reflect the regional environment. Wheat yield variability may not be fully captured by aNDVI and may need additional information, such as, for example, differences in soil characteristics, used wheat varieties, and management practices between the regions, although the analysis by [14] suggests that NDVI time series are sufficient to estimate yields on a parcel to regional scale.
In general, the model performance is in the same range as that of other studies using PROBA-V NDVI data to predict wheat yield [14,21,22]. Zheng, Y. et al. used NDVI to estimate the aboveground biomass with a light use efficiency model [21], and subsequently predicted wheat yield. Durgun, Y.Ö. et al. fitted an asymmetric double sigmoid model on the NDVI series and used the integration of the fitted function as a predictor for wheat yield [14]. Compared to these studies, our methodology of computing the integral of the NDVI series (i.e., aNDVI) as a predictor of wheat yield is straightforward. However, a prerequisite is the availability of continuous NDVI series. Methodologies which combine different remotely sensed data-such as vegetation indicators and microwave vegetation optical depth data-can be used to estimate crop yield in regions where continuous NDVI series are scarce due to the frequent presence of clouds [17]. The developed wheat modelling method can be used in other wheat producing areas. In addition, NDVI, aNDVI, and meteorological variables may capture only part of the important wheat yield-influencing variables. In other regions' local soil fertility conditions, the effects of variety and weather extremes may explain a much larger proportion of the yield variation. The explained variance by the random forest regression models that included RCM data as a predictor variable was higher compared to the linear regression models. Therefore, in regions where RCM data are available, a random forest regression modeling approach is recommended for estimating regional wheat yields.
Durgun, Y.Ö. et al studied whether better wheat yield estimations could be obtained with thermal instead of calendar time for the integration of PROBA-V NDVI series at spatial resolutions of 100 m, 300 m, and 1 km [14]. Thermal time outperformed calendar time for estimating winter wheat, but a notable exception occurred for fields with a pixel purity above 65% at 300 m spatial resolution. However, due to low sample size at 300 m resolution no general conclusion could be made by [14]. The present study seems to confirm the observation of [14], since we obtained good wheat yield estimations with pure pixel NDVI data at 300 m resolution projected on calendar time (i.e., aNDVI). The use of thermal time did not improve wheat yield estimations. The explained variance decreased by 4% when thermal time instead of calendar time was used in the linear regression wheat yield model.
In general, the variation in correlation from January to August between monthly means of temperature and wheat yield on the one hand, and monthly means of temperature and aNDVI on the other hand, were quite similar.
The variation in correlation of monthly mean temperature and winter and spring wheat yield and aNDVI between the different months from January to August indicated that temperature during some months influenced the yield and aNDVI more than others. For spring wheat, high temperatures in June resulted in lower yield and lower aNDVI, while for winter wheat the same pattern was found in July. The negative effect of temperature in June (for spring wheat) and July (for winter wheat) on aNDVI can be linked to important phenological stages of winter and spring wheat. This observation indicates that the variability in wheat yield in Latvia is likely to increase in the future. Indeed, the average temperature as well as the occurrence of extremely hot days will further increase due to climate change. The negative effect of high temperatures on wheat yield in June (for spring wheat) and July (for winter wheat) confirms similar observations in Europe [39,40] and should, therefore, be considered in climate change adaptation strategies [2,26,41]. From the observed positive correlation between temperature in January and winter wheat yield, we could conclude that projected increases in winter temperatures could have a positive effect on winter wheat yield. However, dehardening (i.e., loss of freezing tolerance) of wheat is also triggered by higher temperatures [42] and may negatively impact crop performance. Winter wheat growing at high temperatures in January generally loses its freezing tolerance and will be more susceptible to low temperatures in spring [43]. In addition, in the case of, due to global warming, both January and July temperatures increase, the positive effect of high January temperatures on winter wheat yield could be offset by the negative effect of high July temperatures on winter wheat yield. A similar remark can be made for spring wheat, as the positive effect of high March temperatures on spring wheat yield is likely to be offset by the negative effect of high June temperatures on spring wheat yield. These phenomena are already visible at present. In years with high temperatures for both March and June the spring wheat yield is moderate, and in years with high temperatures for both January and July, winter wheat yield is moderate (Figure 12).

Conclusions
Wheat yield was estimated from NDVI time series for spring and winter wheat in Latvia. The wheat yield was derived from the integral of the NDVI series (i.e., aNDVI) using a simple methodology which can be applied to other regions and crops where continuous NDVI series are available. We can conclude that adding (i) region as a factor, (ii) both the region and crop type, and (iii) explanatory weather variables from regional climate models (RCM) to the model improved wheat yield prediction. The importance of adding region as a factor to the model might be explained by regional differences in the dominant Reference Soil Group. Allowing the slope of the model to change depending on the region did not improve wheat yield prediction. Our results suggest that there is a large scope to use NDVI-derived crop yield estimates, such as aNDVI, and that RCM output plays a valuable role in understanding regional weather impacts on yield. In the case that RCM data are available, a random forest model approach is recommended to model regional wheat yields. The high spatial extent of the VI-derived crop yield estimates is an important benefit compared to the commonly used crop yield statistics that are often only available at the regional level. Likewise, the spatio-temporal availability of the regional climate model output is superior to sparsely available climate stations. Using aNDVI and the RCMs ALARO-0 and REMO, we found that high temperatures during June (for spring wheat) and July (for winter wheat) influenced the wheat yield negatively in Latvia. High temperatures in March and January had a positive effect on spring and winter wheat yield, respectively. However, this positive effect of high temperatures was found to be offset when temperatures were high in both March and June, and January and July for spring and winter wheat, respectively. This information should be considered in weather impact analysis and climate change adaptation strategies such as, for example, wheat breeding programs.