Estimating Wheat Yield in China at the Field and District Scale from the Assimilation of Satellite Data into the Aquacrop and Simple Algorithm for Yield ( SAFY ) Models

Accurate yield estimation at the field scale is essential for the development of precision agriculture management, whereas at the district level it can provide valuable information for supply chain management. In this paper, Huan Jing (HJ) satellite HJ1A/B and Landsat 8 Operational Land Imager (OLI) images were employed to retrieve leaf area index (LAI) and canopy cover (CC) in the Yangling area (Central China). These variables were then assimilated into two crop models, Aquacrop and simple algorithm for yield (SAFY), in order to compare their performances and practicalities. Due to the models’ specificities and computational constraints, different assimilation methods were used. For SAFY, the ensemble Kalman filter (EnKF) was applied using LAI as the observed variable, while for Aquacrop, particle swarm optimization (PSO) was used, using canopy cover (CC). These techniques were applied and validated both at the field and at the district scale. In the field application, the lowest relative root-mean-square error (RRMSE) value of 18% was obtained using EnKF with SAFY. On a district scale, both methods were able to provide production estimates in agreement with data provided by the official statistical offices. From an operational point of view, SAFY with the EnKF method was more suitable than Aquacrop with PSO, in a data assimilation context.


Introduction
Precision agriculture is being increasingly considered as an optimal technological approach to improve the efficiency and the productivity of farming and cropping systems [1].Monitoring crop growth using remotely sensed biophysical variables such as leaf area index (LAI) or canopy cover (CC) has proven to be very effective for inferring information on crop biomass and yield, useful for field management [2][3][4].Yield estimation and forecasting at the field scale would allow farm managers to plan their agronomic operations, e.g., sowing, tillage or fertilization, on the basis of expected yield potential.At the district level scale, yield estimation and forecasting is useful to public and private organization, for commercial or planning purposes.
The use of remote sensing (RS) in order to monitor crop growth has considerably increased recently, with the development of new technologies such as hyperspectral sensors [5] and easier availability of imagery, allowing exploitation of these data in a progressively more effective way [6].Particularly, satellite RS is increasingly used in monitoring crops and variables that describe their growth, as it allows the monitoring both of large zones on a regional or district scale [7], and of smaller areas on a field scale [6].The use of these data has proven to be effective in the estimation of crop yield [2,[8][9][10], by combining the monitoring from remote sensing of crop biophysical variables with crop growth models.Leaf area index (LAI), fractional canopy cover (CC), the fraction of photosynthetically active radiation absorbed by the canopy (fAPAR) and plant chlorophyll concentration are the most important canopy variables used in this context, being the main state variables used in crop growth modelling [11].A detailed review of the methods for retrieving canopy variables from remote sensing [11] divided these methods into: (1) statistical approaches, e.g., using vegetation indices and regression methods linking spectral information and biochemical variables [12]; and (2) approaches exploiting physically-based models [13][14][15].The latter are particularly interesting, because they are expected to be of more general validity, without the need for data-specific calibrations as in the former methods.Additionally, they often allow to estimate simultaneously several variables such as LAI and CC [14].A widely used strategy is to train artificial neural networks (ANNs) using the simulations from a canopy radiative transfer model such as PROSAIL [16], which couples the leaf model PROSPECT [17] and the canopy model SAIL [18].Many studies have shown the effectiveness of this methodology, using both multispectral [14,19] and hyperspectral [20] sensors.
In order to estimate crop yield, the biophysical variables can be assimilated into crop growth models using different approaches.Following [21], it is possible to distinguish two strategies: calibration and updating.The calibration approach consists of re-calibrating the model parameters in order to minimize the differences between simulated and observed variables.In the updating approach, it is the value of the biophysical variables simulated by the model that is updated, on the basis of the value of the biophysical variables observed e.g., from remote sensing.Some authors have converted the state variables simulated by the models into vegetation indices (VIs) by means of canopy reflectance models and used VIs, derived from RS data, in the assimilation.For example, authors of [22] developed a methodology in which the LAI simulated by the EPIC crop model was converted into reflectance in the RED and NIR spectral range of Landsat TM or AVHRR data using the SAIL model.NDVI values from these reflectances were then compared with NDVI derived from satellite imagery.The EPIC crop model parameters were re-calibrated for NDVI to be within 20 percent of the observed satellite data [22].Other authors have estimated biophysical variables from RS and used them directly in the assimilation.For example, authors of [23] recalibrated the parameters of the SUCROS model by minimizing the difference between LAI estimated from SPOT satellite data and LAI simulated by the model.The updating approach has also been applied in many studies [24][25][26][27][28], in which the effectiveness of this method to improve the accuracy of the crop growth model predictions has always been proven.Authors of [24] tested the ensemble Kalman filter (EnKF), an updating assimilation method, to assimilate the LAI derived from MODIS into the WOFOST crop model [29], in order to determine the production of wheat at a regional scale.Authors of [25] explored the potential of EnKF for simultaneously assimilating two remotely sensed variables, i.e., soil moisture and LAI, demonstrating the possibility of preventing yield losses by monitoring the effects of water stress on crops.The potential of the EnKF method assimilation was also compared with the POD4DVAR, a four-dimensional variational algorithm [27], assimilating LAI estimated by the CCD sensors on board of the Chinese Huan Jing (HJ) satellites HJ1A/B into the CERES crop model, both at a regional and field scale.The POD4DVAR showed a higher computational efficiency, but was more influenced by measurements uncertainties than EnKF.
Regardless of the approach, in all the studies in which remote sensing data are assimilated into crop models, not all the parameters are recalibrated, because of their large number and the limited information about them available in situ.This implies that the choices on which parameters to recalibrate at run-time and of the values to assign to fixed parameters are crucial [23].A study conducted by authors of [28] highlighted these problems.The green area index (GAI) estimated from MODIS was assimilated in the WOFOST [29] model using a calibration approach, minimizing the error between modelled GAI and MODIS-observed GAI.Only the parameters that most affect GAI were optimized, highlighting the need to calculate the parameters sensitivity for each scenario [28].
The efficiency of the assimilation methods is also strongly dependent on the number of observations ingested into the process [11].Many studies in the literature (e.g., [24][25][26]) showed that using a higher number of observations corresponds an improved accuracy of estimation [2].Generally, the number of observations assimilated into the models is higher than six.In a context of regional scale, there are many satellites which can provide data frequently, albeit at a low resolution (e.g., MODIS).However, for precision agriculture applications, it is necessary to work at the field scale, with a spatial resolution of at least 30 m or higher.The opportunities to obtain a high number of data acquisitions with such resolution are currently rather limited, therefore, in order to apply these methods at the field scale, it is necessary to use efficient assimilation algorithms, performing well even with a limited number of images, or to extend the number of observations in other ways.For example, [4] tried to solve this problem by using a data fusion algorithm blending MODIS (250 m) and OLI data (30 m) to estimate LAI at the field scale and assimilated the data into the simple algorithm for yield (SAFY) model.The fused data set had the temporal resolution of MODIS data (8 days) and the spatial resolution of OLI (30 m).
In addition to the number of observations, the choice of the crop model to use is also an important factor conditioning the assimilation results.Generally, models that accurately describe the crop growth process have a higher accuracy, but they are more complex, more difficult to integrate with other processes (such as the assimilation methods) and have higher computational costs.In particular, the large number of parameters makes the calibration of the model rather laborious, and this is one of the main problems of the application of assimilation methods.The choice of the model to use, in agreement with the purpose to be achieved, should consider other attributes in addition to model accuracy, such as complexity and plasticity [30,31].
Among all the crop models that have been used in assimilation studies, Aquacrop [32], seems particularly interesting for its relative simplicity, as compared to more complex models, due to its emphasis on water limitation and the explicit use, as main canopy state variable, of CC.This variable is usually easier to estimate from remote sensing than LAI, not being subject to saturation at high values [31].The first applications of Aquacrop within data assimilation schemes have provided encouraging results [31,[33][34][35].However, in Aquacrop there are about 100 parameters related to the characteristics of the crop and the soil, as well as other management and input factors.During the assimilation, authors of [33] recalibrated 3 parameters, authors of [34,35] 8 parameters and authors of [31] 13 parameters, leaving all the others fixed.Additionally, in the version currently made available by the developers, it is impossible to implement a re-initialization at the RS observation dates, preventing the application of updating approaches of data assimilation, so that only recalibration approaches have been employed so far.
SAFY [36], a much simpler model than Aquacrop, with fewer parameters, i.e., 13 in the original version, has been also successfully employed for the estimation of wheat yield, using only recalibration assimilation approaches so far [4,36].This model is potentially suitable also for the application of updating assimilation algorithms, given the accessibility of the source code, and it is particularly appealing in this context given its high computational speed [31].
The aim of this work was to develop and compare two methods for the estimation of winter wheat yield, suitable both at field and district level, based on data assimilation algorithms combining remotely sensing crop biophysical variables with the Aquacrop and SAFY crop growth models.Because of the above mentioned coding constraints, it was necessary to use different assimilation techniques for the two models, i.e., the EnKF was employed for SAFY, while for Aquacrop particle swarm optimization (PSO) [37] was used.A dataset, composed by a series of images acquired with multispectral satellite sensors, was used to estimate the LAI and the CC using a model-based method employing an artificial neural network [14].Additionally, the purpose of this study was to test the performance of assimilation methods with a reduced number of observations, in order to verify their potential for precision agriculture applications in case of constraints e.g., due to a limited number of satellite acquisitions available within the crop growth season, a situation often occurring in areas affected by frequent cloud cover.

Study Area and Datasets Employed
The study area (34.27 • N, 108.09 • E, altitude around 460 m) is located in the rural region of Yangling, in the centre of China, Province of Shaanxi (Figure 1).A dataset composed by weather data, field data and remote sensing data, was collected during three different winter wheat growth seasons, between September and June, from 2012 to 2015.
Remote Sens. 2017, 9, 509 4 of 24 Aquacrop particle swarm optimization (PSO) [37] was used.A dataset, composed by a series of images acquired with multispectral satellite sensors, was used to estimate the LAI and the CC using a model-based method employing an artificial neural network [14].Additionally, the purpose of this study was to test the performance of assimilation methods with a reduced number of observations, in order to verify their potential for precision agriculture applications in case of constraints e.g., due to a limited number of satellite acquisitions available within the crop growth season, a situation often occurring in areas affected by frequent cloud cover.

Study Area and Datasets Employed
The study area (34.27°N, 108.09°E, altitude around 460 m) is located in the rural region of Yangling, in the centre of China, Province of Shaanxi (Figure 1).A dataset composed by weather data, field data and remote sensing data, was collected during three different winter wheat growth seasons, between September and June, from 2012 to 2015.The weather data were collected at the weather station ID 57034 (34.25°N, 108.22°E), by the China Meteorological Administration and made available on the China Meteorological Data Sharing Service System (CMDSSS, http://cdc.cma.gov.cn).They included: temperature, rainfall, relative humidity and wind speed.Other climatic data needed to drive the crop models, such as evapotranspiration and solar radiation, were derived from these variables.Reference evapotranspiration was calculated using the ET0 calculator [38], whereas solar radiation was estimated from the relationship proposed by [39].Temperatures and rainfall for winter wheat growth seasons of 2012-2013, 2013-2014 and 2014-2015 in Yangling are shown in Figure 2. The weather data were collected at the weather station ID 57034 (34.E), by the China Meteorological Administration and made available on the China Meteorological Data Sharing Service System (CMDSSS, http://cdc.cma.gov.cn).They included: temperature, rainfall, relative humidity and wind speed.Other climatic data needed to drive the crop models, such as evapotranspiration and solar radiation, were derived from these variables.Reference evapotranspiration was calculated using the ET 0 calculator [38], whereas solar radiation was estimated from the relationship proposed by [39].Temperatures and rainfall for winter wheat growth seasons of 2012-2013, 2013-2014 and 2014-2015 in Yangling are shown in Figure 2.
The climate of the Yangling area is defined as semi-arid, with precipitation of around 500 mm per year, usually concentrated in the summer, between June and October, while the winter season is almost devoid of precipitation.Because of this rainfall distribution, winter wheat is mandatorily irrigated.Temperatures vary from a minimum of −4 • C in winter to a maximum of 30 • C in summer, although temperatures below 0 • C are not common, and were limited to a few days in winter for our study period, except for 2012-2013, when the temperatures remained under 0 • C for several weeks between December and January.This means that the period of cold stress is limited to few days.The rainfall for the three crop growth seasons considered was almost completely absent between December and January, increasing in spring, concentrating large amounts of rain in a few days, reaching a peak in summer, and the decreasing until early autumn when the dry season begins.
Remote Sens. 2017, 9, 509 5 of 24 The climate of the Yangling area is defined as semi-arid, with precipitation of around 500 mm per year, usually concentrated in the summer, between June and October, while the winter season is almost devoid of precipitation.Because of this rainfall distribution, winter wheat is mandatorily irrigated.Temperatures vary from a minimum of −4 °C in winter to a maximum of 30 °C in summer, although temperatures below 0 °C are not common, and were limited to a few days in winter for our study period, except for 2012-2013, when the temperatures remained under 0 °C for several weeks between December and January.This means that the period of cold stress is limited to few days.The rainfall for the three crop growth seasons considered was almost completely absent between December and January, increasing in spring, concentrating large amounts of rain in a few days, reaching a peak in summer, and the decreasing until early autumn when the dry season begins.
Field measurements of wheat LAI, biomass and yield, were carried out each year from March to June by staff from the National Engineering Research Centre for Information Technology in Agriculture (NERCITA).Table 1 shows the list of the field measurements dataset.The ground measurement points, georeferenced using a GPS, were distributed in different winter wheat fields throughout an area of about 1200 km 2 surrounding Yangling.In addition to the sampling points reported in Table 1, there were additional points, 10 in 2013 and 26 in 2014, for which data were incomplete (i.e., not all sampling dates were present).These points were excluded from the yield and biophysical variables validation, but were used for the validation of the classification of wheat fields (see Section 2.2.5).Three local wheat cultivars (Xiaoyan22, Xinong9871, and Shanbei139) were planted in the first half of October and harvested at the beginning of June.Field management (weed control, pest management, and fertilizer application) followed local standard practices for wheat production.A different number of ground points was sampled in different years (Table 1).For each ground point, LAI was estimated using the LAI-2000 Plant Field measurements of wheat LAI, biomass and yield, were carried out each year from March to June by staff from the National Engineering Research Centre for Information Technology in Agriculture (NERCITA).Table 1 shows the list of the field measurements dataset.The ground measurement points, georeferenced using a GPS, were distributed in different winter wheat fields throughout an area of about 1200 km 2 surrounding Yangling.In addition to the sampling points reported in Table 1, there were additional points, 10 in 2013 and 26 in 2014, for which data were incomplete (i.e., not all sampling dates were present).These points were excluded from the yield and biophysical variables validation, but were used for the validation of the classification of wheat fields (see Section 2.2.5).Three local wheat cultivars (Xiaoyan22, Xinong9871, and Shanbei139) were planted in the first half of October and harvested at the beginning of June.Field management (weed control, pest management, and fertilizer application) followed local standard practices for wheat production.A different number of ground points was sampled in different years (Table 1).For each ground point, LAI was estimated using the LAI-2000 Plant Canopy Analyzer (LI-COR Inc., Lincoln, NE, USA) collecting data from five points within a 30 × 30 m square.LAI was converted into fractional ground canopy cover (CC) using the equation proposed by [40] from field data collected in the USA, who reported an r 2 of 0.96: (1) Biomass and yield were determined by cutting plants from a 0.25 m 2 area positioned near the point of LAI measurements.All plant samples were oven dried at 70 • C to a constant weight, and final dry weight recorded.Further details on the methodologies of data collection can be found in [33].
The remote sensing dataset was composed by several images acquired by Landsat 8, HJ1A and HJ1B satellites, between February and June of each year (Table 2).All the images were acquired using multispectral sensors, respectively the OLI on board the Landsat 8, and the CCD1 and CCD2 on board the HJ1A and HJ1B.The OLI sensor has seven bands with a spatial resolution of 30 m, with wavelengths ranging between 0.45 µm and 2.29 µm.Only the four bands in the visible and NIR regions were employed as inputs of the artificial neural network algorithm used to calculate the crop biophysical variables (see Section 2.2.2).This was done also because the CCD sensors (on board the HJ satellites) have a similar four bands in the region of visible and NIR, also with a spatial resolution of 30 m. Details on the spectral characteristics of the OLI and CCD sensors can be found in [41,42].
Surface reflectance Landsat 8 images were provided, geographically and atmospherically corrected, by the United States Geological Survey (USGS).The HJ images needed georectification, radiometric calibration and atmospheric correction.The georectification was done by identifying ground control points in GoogleEarth and warping the images using the Image-to-Map registration tool in ENVI [43].The radiometric calibration was performed using gain and offset values provided for each band by the metadata of the respective HJ images.The atmospheric correction was performed using the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) module included in the ENVI software.The altitude of the sensor was set at 650 km and the ground elevation at 500 m (Yangling mean elevation).The mid-latitude winter atmospheric model and the rural aerosol model were used, with an initial visibility between 20 and 30 km, depending on the image.

Data Assimilation Methods
It is possible to divide the methodology adopted (Figure 3) into four steps: in the first step the biophysical variables LAI and CC were estimated; in the second step the assimilation was carried out: LAI values were assimilated into the SAFY model through an updating method based on the EnKF, whereas CC values were assimilated into Aquacrop using the PSO method; in the third step the two methodologies were applied and validated at the field scale using ground measurements; in the fourth step the method was extended to the district scale and validated using official statistical data.Details on each of these steps is provided in the following paragraphs.the fourth step the method was extended to the district scale and validated using official statistical data.Details on each of these steps is provided in the following paragraphs.

LAI and Canopy Cover Estimation
LAI and CC were retrieved using the algorithm developed by [15].This algorithm converts measured spectral reflectances into biophysical variables, by means of artificial neural networks (ANNs) trained over a database of canopy reflectances simulated using the radiative transfer model PROSAIL [16].The PROSAIL input dataset was composed of different kinds of parameters characterizing the crop canopy, sensor characteristics and satellite configuration (Table 3).
Table 3. Input parameters of the PROSAIL radiative transfer model used to generate the training database.The distribution used for each parameters is the truncated Gaussian function defined by: lower (min) and upper bounds (max), mode and the standard deviation (Std).N_C is the number of classes in which the distribution is split.The LAI mode was varied according the period of satellite data acquisition.In the present study, canopy, leaf and soil parameters were set following the experimental results obtained by [20] for winter wheat, except for the mode of the LAI.The latter was set using as reference the field measurements dataset, setting the LAI mode at 1.5 for the images acquired in February and in the first week of March, at 2.5 for the images acquired between March and the beginning of April and at 4 for the images acquired between April and May.

Variable
The sensor characteristics input included the relative spectral response of each single band and four error parameters, accounting for instrumental noise, radiometric calibration and atmospheric correction errors.

LAI and Canopy Cover Estimation
LAI and CC were retrieved using the algorithm developed by [15].This algorithm converts measured spectral reflectances into biophysical variables, by means of artificial neural networks (ANNs) trained over a database of canopy reflectances simulated using the radiative transfer model PROSAIL [16].The PROSAIL input dataset was composed of different kinds of parameters characterizing the crop canopy, sensor characteristics and satellite configuration (Table 3).
Table 3. Input parameters of the PROSAIL radiative transfer model used to generate the training database.The distribution used for each parameters is the truncated Gaussian function defined by: lower (min) and upper bounds (max), mode and the standard deviation (Std).N_C is the number of classes in which the distribution is split.The LAI mode was varied according the period of satellite data acquisition.In the present study, canopy, leaf and soil parameters were set following the experimental results obtained by [20] for winter wheat, except for the mode of the LAI.The latter was set using as reference the field measurements dataset, setting the LAI mode at 1.5 for the images acquired in February and in the first week of March, at 2.5 for the images acquired between March and the beginning of April and at 4 for the images acquired between April and May.

Variable
The sensor characteristics input included the relative spectral response of each single band and four error parameters, accounting for instrumental noise, radiometric calibration and atmospheric correction errors.
The satellite configuration inputs include: geographical position of the target, time and day of year of acquisition, sun and view zenith and azimuth angles.This information is different for each image, so the ANN algorithm was run specifically for each acquired image.For each image a total of 98,304 PROSAIL model simulations were used to train the ANNs, in order to associate a given set of input variables and parameters (characterizing leaf, canopy and geometry of observation) to a reflectance spectrum.After the training, the ANNs were employed to "invert" the process and associate a given spectrum observed by the satellite (corresponding to a pixel and resampled according to the bands of the sensor) to a set of biophysical variables.Five ANNs were employed in this study and the median was used as the biophysical variable estimate.To apply the ANN algorithm, the Matlab (MathWorks, Natick, MA, USA) tool developed by [19] was used.

Crop Growth Simulation Models
As mentioned, two different crop growth models were used in this study: Aquacrop [32] and SAFY [36].
Aquacrop was developed by the Food and Agriculture Organization of the United Nations (FAO) to describe crop response to water and simulates the aboveground biomass production mainly as a function of water productivity (WP): where B n is the cumulative aboveground biomass production after n days (g•m −2 ), Tr i is the daily crop transpiration (mm•day −1 ); ET 0i is the daily reference evapotranspiration (mm•day −1 ); WP* is the normalized crop water productivity (g•m −2 ).The canopy cover (CC) is a crucial state variable of Aquacrop through its expansion, ageing, conductance and senescence, since it determines the amount of water transpired, which determines the amount of biomass produced [32].The final yield is the product of the final biomass multiplied by the harvest index [44].A detailed description of the Aquacrop crop model can be found in [32,44].The input information necessary to run Aquacrop model consists of: daily weather data (maximum and minimum temperature, rainfall and evapotranspiration), soil composition and agronomic management information.A set of more than 100 parameters is used, particularly linked to crop physiological and soil physical processes.The other model used in this work is the simple algorithm for yield estimate (SAFY), developed specifically for remote sensing applications [36].It is based on Monteith's concept [45], a simple theory linking the total dry phytomass production and the photosynthetically active solar radiation (APAR) absorbed by plants.The purpose of the algorithm is to represent the time courses of dry aboveground biomass (DAM), green leaf area index (GLAI) and yield.The daily increase of DAM (∆DAM) is represented by the following equation: where R g is the incoming global radiation, ε C is the ratio of incoming PAR to global solar radiation, ε I is the light interception efficiency (fraction of APAR), ELUE (g•MJ −1 ) is the light use efficiency and F T is a temperature stress function.The ε I depends on the green leaf area index (GLAI) and a light interception coefficient k through Beer's law (4): GLAI development is divided in two phases.From emergence to the beginning of senescence, it increases proportionally with DAM, from senescence to the end of the growth cycle it decreases.In this way biomass (DAM) depends directly on GLAI, but at the same time the daily development of GLAI (∆GLAI) depends on DAM (positive feedback).The grain yield is computed from DAM through a partitioning coefficient of the biomass to grain (Pgro_P2G).A detailed description of SAFY was presented by [36].
In this work, a modified version of SAFY was used, in which a simple water balance was introduced, following the FAO guidelines for computing crop water requirements [46].The same modification had been formerly proposed by [47,48], introducing a dependence of biomass yield on crop water stress.This modification was made since it allows an explicit assessment of the crop response to water availability and the characterization of water stress, which is lacking in the original model.In the present work, the ∆DAM expressed in Equation ( 3) is multiplied by a water stress factor K s , a dimensionless transpiration reduction factor dependent on available soil water, ranging between 0 and 1 [49].K s is calculated from the total available soil water in the root zone (TAW) and readily available soil water in the root zone (RAW), resulting from a simplified water balance driven by crop evapotranspiration.The necessary input to run the modified version of SAFY includes daily information about incoming global radiation, average air temperature, rainfall and reference evapotranspiration.Further details on this modified version of SAFY are reported by [49].
Both models were subjected to a sensitivity analysis [49], in order to identify the parameters that most affected the yield estimated by the models.This procedure is essential for the choice of a reduced set of parameters to target during the assimilation algorithms, leaving all the others at fixed values.Especially for the latter, a preliminary calibration is an important step.The calibration of parameters for both models was done starting from reference values for winter wheat taken from the literature: [36] for SAFY and [50] for Aquacrop.The data on soil properties were obtained from samples gathered in the study site, at the same points where the crop data were collected, which were analyzed in the lab by NERCITA Beijing, for soil texture and organic matter.Crop management input variables were set according to standard local practice.Subsequently, the value of parameters which resulted most influential from the sensitivity analysis were adjusted, by trial and error, so that the time series of simulated biophysical variables (LAI for SAFY and CC for Aqucrop) fit the time series observed in two of the ground sampling points (Table 1).These two sampling points were randomly chosen from field measurements dataset and were subsequently excluded from the validation tests.In all the applications with the SAFY model, a unique calibration was carried out, using two points randomly selected from those collected in 2014, since that was the year with most observation dates.For Aquacrop a single calibration for all the applications of the model was not satisfactory, as is described in the results.For this reason the model was alternatively calibrated also for each year, using a different set of points, chosen from the experimental set collected during the corresponding year.In this case three different calibrations were done for Aquacrop, one for each year in which the field measurements were made.A unique set of parameters and input variables was used for the simulations carried throughout the whole study area.The spatial variability among parameters and inputs was subsequently introduced by means of the assimilation algorithms (described in the following sections), by adjusting some parameters.

Ensemble Kalman Filter for SAFY Model Assimilation
An updating assimilation algorithm based on the EnKF was developed, taking advantage of the accessibility and manageability of the SAFY code.Before applying this method, the model parameters that most influenced the yield were identified from the results of the sensitivity analysis study [50].In Table 4, the list of the nine most influential parameters and their relative range of variation are shown.The initial calibration values of the parameters were defined as described in Section 2.2.2.The range of variation of the selected parameters was chosen by taking the calibrated values as a mean value.The minimum and maximum were then chosen arbitrarily in a surrounding of maximum ±13% of the mean value, according to a realistic range for each parameter based on expert judgement.The EnKF algorithm developed for SAFY in this work is based on the theory of [51], which considers the observations as random variables, therefore adding random perturbations to the observed values.It follows the application to crop models described by [52].It is possible to divide the assimilation algorithm into the following steps: (1) Generation of an ensemble of n = 100 vectors, each one containing the values of the i parameters P i , in this case i = 1, 2, . . ., 9 (Table 4).To each element, corresponding to a nominal parameter value (fixed as the mean of the range shown in Table 4), an error value was added, randomly drawn from a truncated normal distribution N(0, 1), with lower and upper limits as shown in Table 4.
(2) Simulations with the SAFY model in order to obtain a value of LAI for each element of the ensemble at the date when the first satellite image was acquired.An error ε is added to the simulated LAI values.This error ε was randomly generated from a normal distribution N(0, Q), in which the standard deviation Q was arbitrarly chosen as 20% of the LAI value, to take into account the uncertainity of the model.In this way a matrix ϕ t1 was defined: where each column is an element of the ensemble and represents a random configuration of the model using parameters P i and the corrisponding LAI value obtained L n at the time t 1 of the first satellite acquisition.
(3) Generation of the vector M t 1 , where each element is composed by the LAI observed at time t 1 , i.e., retrieved from remote sensing data, plus an error τ j t 1 drawn from N[0, var(τ j t 1 )], where var(τ j t 1 ) is a variance expressing the measurement error, which in our case was inferred from the comparison of the LAI values estimated from remote sensing, with the ground measurements.In this study the measurement error variance was chosen as var(τ t ) = 0.3•M t , taking into account the relative root-mean-square error (RRMSE) found for LAI estimation from our measurements (see Section 3.1).( 4) Computation of the variance-covariance matrix of ϕ t 1 for the 100 ensemble elements.
(5) Calculation of the Kalman gain using the variance-covariance matrix: where: R = (1, 0, 0, . . ., 0) with number of elements equal to the number of ensemble elements.(6) Update of ϕ t 1 as follows: where j is the j-th column of ϕ t 1 .(7) Replacement of LAI and parameters values (the elements of ϕ t 1 ) with those calculated at step 6 (the elements of ϕ t 1 ,K ).( 8) Repetition from step 3 for each satellite observation date.When the last observation has been assimilated, SAFY runs to the end of the crop growth cycle and outputs the yield.

PSO Algorithm for Aquacrop Model Assimilation
The source code of the Aquacrop model was not available to the authors and this prevented the addition of changes in order to use an updating assimilation method, such as the EnKF used for SAFY.For this reason a method of assimilation by calibration [11], PSO, was employed.This method is based on a comparatively simple principle, needs few input parameters, can be easily applied to other models and was shown to have good computation efficiency [37].The basic assumption of PSO is that a group of m particles (m = 15 in this study) flies with certain speeds without quality and size in a d-dimensional search space.In this study m was chosen as equal to 15, which is lower than the number of particles suggested by [53], i.e., 20-50, since these require considerable computing resources which were not available for the present study.However, it has been reported that there are no drawbacks from using a lower number of particles provided that this is higher than the number of parameters [53,54], which was nine in our case.Each particle can modify its position and velocity based on both the best point in current generation (p id ) and the best point of all particles in the swarm (p gd ).The PSO assimilation method is described by [53] and its application for Aquacrop, that can be found in [33][34][35], is summarized in the following steps: (1) the initial position (value) and velocity of each particle was determined.The adjusted parameters were selected from the sensitivity analysis study previously carried out [49] and are listed in Table 5. (2) Aquacrop was run using the executable file (ACsaV40) released by the FAO and controlled by a Matlab script.The simulated CC (CC s ) was calculated from the Aquacrop input including weather data, soil data, crop genotype parameters and management data.(3) A cost function (J) was constructed to assess the discrepancy between CC s and the remote sensing estimated CC (CC r ) as shown in Equation ( 8).The minimization of the cost function determined whether the optimization algorithm had reached the optimum input parameter values.
where CC Si and CC ri are respectively the simulated and remote sensing values of canopy cover at observation date i, N is the number of dates and J is expressed as % (i.e., the unit of CC). ( 4) The best point of the i-th particle (p id ) and of all particles in swarm (p gd ) was sought.The p id and p gd were searched at each iteration.
(5) The position and velocity of particles were updated as shown in the Equation ( 9): where ν id is the best velocity of the i-th particle; k is the index of the iteration; c 1 and c 2 are constants which were assigned a value of 2; ξ and η are random values between 0 and 1; x id is the best position of the i-th particle.(6) If the iteration target was not reached, the updated positions were replaced and the program runs again.In this study the iteration target was fixed at 30 because it is the minimum iteration number indicated in literature [53].Choosing a higher number does not allow to run the script within an acceptable time.(7) If the iteration target was reached, the CC, biomass and yield were calculated.

Spatialized Application
The application at the district level was done for the winter wheat growth seasons of 2013 and 2014.These two years were chosen because official yield data, used for validation, were made available.Information was provided by two official statistical sources: the National Bureau of Satatistics of the People's Republic of China (which provided data for the year 2013) and the Shaanxi Provincial Bureau of Statistics (which provided those for 2014).
For each year, the biophysical variables values retrieved from the images acquired between March and May were used in the assimilation, which was run on a pixel-by-pixel basis.For each image, only the pixels belonging to the wheat fields included in the rural area of Yangling were considered.To identify the pixels in which wheat was present, 5 Landsat 8 images acquired between February and June of each year were used.The images were converted into LAI maps through the PROSAIL-ANN algorithm (Section 2.2.2).A mask was built using agricultural fields boundaries obtained from a vector file provided by the National Engineering Centre for Information Technology in Agriculture.In this way all the pixels that fell out of the agricultural areas were excluded.
The remaining pixels of each image were classified as wheat if the following conditions occurred, representing the typical LAI time trends of winter wheat: -LAI of the first date smaller than LAI of second date and the latter smaller than LAI of third date; -LAI of third date higher than 3; -LAI of fourth date higher than 3.4; -LAI of fifth date smaller than 0.5 (after harvest).
Not all the pixels included within the boundaries of some fields were classified as wheat, but when the majority, i.e., 80%, of pixels included within the boundaries of a field were classified as wheat, the whole field was considered as such.The classification results were checked against ground truth data.There were 59 points (i.e., ground verified wheat fields) available in 2013 and 61 points in 2014, considering also incomplete data sampling points, in addition to those reported in Table 1.Of these, 57 fields were correctly identified as wheat in 2013 and 58 in 2014, with an overall accuracy of 96%.
To reduce the computational cost of the spatialized assimilation algorithms, a data binning procedure based on LAI value was employed so that the model was run only once for similar pixels.The LAI values in the images, retrieved from remote sensing, were rounded to the first decimal place and, if the number in that place was odd, it was rounded to the nearest lower even number.In this way, the range between the minimum and maximum values of LAI was divided into intervals of 0.2.The total number of LAI classes is the product of the number of intervals between the minimum and the maximum feature value of LAI for each image.Each class is defined by the combination of three numbers, 0.2 multiples, belonging to the range min LAI-max LAI of each image, for the three dates.In this way, after running the algorithm for all the combination of LAI, it is possible to assign a value of yield to each class.Classifying the points of the map at each pixel, the corresponding yield value was assigned.In this way it was possible to obtain yield maps of the region, by running the algorithm only for the number of identified classes (around 1800) and not for the roughly two million pixels of the region of interest.A similar data binning procedure was employed for the images representing the CC, but in this case the range of variation between a class and the next was not constant, because of the non-linearity of the relationship between LAI and CC (Equation ( 1)).Furthermore the ranges were enlarged in order to reduce the number of classes, because the computational cost of PSO-Aquacrop imposed a limited number of iterations.The 1800 bin classes identified for LAI were thus reduced to 100 for the CC.A look-up-table procedure was finally employed to assign the results of the simulations to the respective pixels after the assimilation.

LAI and CC Estimation and Validation
LAI and CC were estimated using the ANN algorithm described in Section 2.2.2 for all the images acquired by HJ-1A, HJ-1B and Landsat 8 (Table 2) in the 3 years (from 2013 to 2015).Some acquisition dates of the images were near, i.e., within 1 to 4 days, the dates when ground measurements were performed.For these images, the ground sampling points were identified according to their GPS coordinates, and measured and estimated LAI and CC were compared.
Figure 4 shows the comparison between LAI estimated from the satellite images employing the ANN algorithm and LAI measured during the field campaigns.The overall relative root mean square error (RRMSE) for LAI for the three years was 30%.The root-mean-square error (RMSE) and RRMSE for the three years examined, both for CC and LAI, are shown in detail in Table 6.The estimation error is larger for higher values of LAI, in fact for LAI between 0 and 2 the difference between estimates and measures is small (Figure 4), while particularly for LAI values between 3 and 4 the error tends to be higher.These results indicate that the algorithm tends to underestimate LAI, especially for 2013, for which the results were worse that for others years (Table 6).
Figure 5 shows the results obtained for CC estimation.In this case the overall RRMSE is 9%, i.e., lower than for LAI (Table 6).Saturation affects less the estimation of CC, even if the ANN algorithm tends also in this case to underestimate this biophysical variable.The results obtained for 2015 present the smallest RRMSE, while the results for 2013 and especially 2014 provide a much worse estimation.
square error (RRMSE) for LAI for the three years was 30%.The root-mean-square error (RMSE) and RRMSE for the three years examined, both for CC and LAI, are shown in detail in Table 6.The estimation error is larger for higher values of LAI, in fact for LAI between 0 and 2 the difference between estimates and measures is small (Figure 4), while particularly for LAI values between 3 and 4 the error tends to be higher.These results indicate that the algorithm tends to underestimate LAI, especially for 2013, for which the results were worse that for others years (Table 6).Figure 5 shows the results obtained for CC estimation.In this case the overall RRMSE is 9%, i.e., lower than for LAI (Table 6).Saturation affects less the estimation of CC, even if the ANN algorithm tends also in this case to underestimate this biophysical variable.The results obtained for 2015 present the smallest RRMSE, while the results for 2013 and especially 2014 provide a much worse estimation.

Assimilation Results: Field Scale Application
EnKF for SAFY and PSO for Aquacrop were tested at the field scale, validating the results with the ground yield measurements. Figure 6a shows the comparison between measured and simulated yield, resulting from the application of the EnKF method to the SAFY model, for which a unique preliminary calibration of model parameters had been done, as described in Section 2.2.3.An overall RRMSE of 18% was obtained for SAFY, with values ranging from 15 to 20% in the different years

Assimilation Results: Field Scale Application
EnKF for SAFY and PSO for Aquacrop were tested at the field scale, validating the results with the ground yield measurements. Figure 6a shows the comparison between measured and simulated yield, resulting from the application of the EnKF method to the SAFY model, for which a unique preliminary calibration of model parameters had been done, as described in Section 2.2.3.An overall RRMSE of 18% was obtained for SAFY, with values ranging from 15 to 20% in the different years (Table 7).The RRMSE for EnKF with the SAFY model was 18%, while for PSO with the Aquacrop model it was 24%, for the data of all years pooled together.The performance of Aquacrop was particularly poor, as compared to SAFY, for 2013 data (Table 7).Figure 6b,c both show the comparison between measured and simulated yields resulting from the application of the PSO assimilation method to the Aquacrop model.In Figure 6b the simulations were preceded by a single calibration of the model (using two data points from 2014), while in Figure 6c the model had been previously calibrated for each year considered.
The application of PSO with the Aquacrop model with only one calibration showed a rather limited range of simulated yield values. i.e., between 4 and 6 t•ha −1 , with an RRMSE of 0.27, whereas when a calibration was carried out for each year, the range of simulated yield variation was wider, i.e., between 3 and 7 t•ha −1 , with a small decrease of RRMSE.In the case of a single calibration no correlation was found (r = −1.04,n.s.) between simulated and measured values, whereas when yearly calibrations were performed a low, albeit statistically significant, correlation (r = 0.34, p < 0.01) was found between measurements and estimates.For SAFY the correlation (r = 0.53, p < 0.01) was higher.
Overall, for the scenarios analyzed, the assimilation using the EnKF method with the SAFY model estimated the yield similarly or better than the PSO with Aquacrop.The computational time required by the two methods is very different, though.Using a PC with an Intel Core i7-4770 CPU at 3.4 GHz, the run time of PSO with Aquacrop was around 40 min for each simulated point, while for EnKF with SAFY it was 30 s for each point.

Assimilation Results: Spatialized Application
The district scale yield estimation was limited to part of the Yangling administive area for which it was possible to obtain official data about the production per hectare, the total production and the surface area sown with winter wheat.The district scale application was limited to fields  Figure 6b,c both show the comparison between measured and simulated yields resulting from the application of the PSO assimilation method to the Aquacrop model.In Figure 6b the simulations were preceded by a single calibration of the model (using two data points from 2014), while in Figure 6c the model had been previously calibrated for each year considered.
The application of PSO with the Aquacrop model with only one calibration showed a rather limited range of simulated yield values. i.e., between 4 and 6 t•ha −1 , with an RRMSE of 0.27, whereas when a calibration was carried out for each year, the range of simulated yield variation was wider, i.e., between 3 and 7 t•ha −1 , with a small decrease of RRMSE.In the case of a single calibration no correlation was found (r = −1.04,n.s.) between simulated and measured values, whereas when yearly calibrations were performed a low, albeit statistically significant, correlation (r = 0.34, p < 0.01) was found between measurements and estimates.For SAFY the correlation (r = 0.53, p < 0.01) was higher.
Overall, for the scenarios analyzed, the assimilation using the EnKF method with the SAFY model estimated the yield similarly or better than the PSO with Aquacrop.The computational time required by the two methods is very different, though.Using a PC with an Intel Core i7-4770 CPU at 3.4 GHz, the run time of PSO with Aquacrop was around 40 min for each simulated point, while for EnKF with SAFY it was 30 s for each point.

Assimilation Results: Spatialized Application
The district scale yield estimation was limited to part of the Yangling administive area for which it was possible to obtain official data about the production per hectare, the total production and the surface area sown with winter wheat.The district scale application was limited to fields classified as winter wheat in the different years as described in Section 2.2.5.From Figures 7 and 8, it is apparent that a large part of the area in which winter wheat was cultivated in 2013 was occupied by other crops in 2014, as a result of the crop rotation practices adopted in the area.
Figure 7 shows the winter wheat yield maps estimated for 2013 and 2014 using the EnKF method with the SAFY model.
Remote Sens. 2017, 9, 509 16 of 24 Figure 7 shows the winter wheat yield maps estimated for and 2014 using the EnKF method with the SAFY model.The official statistical data (provided by Shaanxi Provincial Bureau of Statistics) for 2013 reported an average yield of 5.5 t•ha for the area and a total production of 1.09 × 10 4 t for the whole wheat surface area, estimated at 1980 ha.The method used for the classification (Section 2.2.5) of wheat cultivation area, exploiting also the knowledge of agricultural field boundaries, provided an estimate of wheat sowing area of 1983 ha, i.e., very accurate.The average yield per unit surface estimated using the EnKF method with the SAFY model, resulted to be 4.98 t•ha −1 , and the total production for the Yangling administrative area was of 0.99 × 10 4 t, i.e., about a 9.4% less than the official statistics yield.For 2014, the official statistical data provided by the National Bureau of Statistics of the People's Republic of China reported a mean yield of 6.13 t•ha −1 , for a total production of 0.93 × 10 4 t in the whole wheat cultivation area, estimated at 1470 ha.For the 2014 the wheat area classification results were less accurate as for the 2013: the wheat sowing area estimated was of 1564 ha, overestimated by about 7% compared to official statistical data.The estimated mean yield was of 5.67 t•ha −1 , and the total production estimated was 0.82 × 10 4 t, so for 2014 there was an underestimation relative to the official statistics of about 12.2%.For the same reference wheat sowing area, the yield was also estimated using the PSO assimilation method with the Aquacrop model (Figure 8).
All the estimates obtained with Aquacrop shown in Figure 8 were obtained with the model calibrated for each year before the application of the assimilation algorithm.Using the PSO with Aquacrop, the average yield per unit on the studied area was of 5.56 t•ha −1 , very close to the official statistics value, whereas the total production estimated was of 1.10 × 10 4 t, with an estimation error of The official statistical data (provided by Shaanxi Provincial Bureau of Statistics) for 2013 reported an average yield of 5.5 t•ha −1 for the area and a total production of 1.09 × 10 4 t for the whole wheat surface area, estimated at 1980 ha.The method used for the classification (Section 2.2.5) of wheat cultivation area, exploiting also the knowledge of agricultural field boundaries, provided an estimate of wheat sowing area of 1983 ha, i.e., very accurate.The average yield per unit surface estimated using the EnKF method with the SAFY model, resulted to be 4.98 t•ha −1 , and the total production for the Yangling administrative area was of 0.99 × 10 4 t, i.e., about a 9.4% less than the official statistics yield.For 2014, the official statistical data provided by the National Bureau of Statistics of the People's Republic of China reported a mean yield of 6.13 t•ha −1 , for a total production of 0.93 × 10 4 t in the whole wheat cultivation area, estimated at 1470 ha.For the 2014 the wheat area classification results were less accurate as for the 2013: the wheat sowing area estimated was of 1564 ha, overestimated by about 7% compared to official statistical data.The estimated mean yield was of 5.67 t•ha −1 , and the total production estimated was 0.82 × 10 4 t, so for 2014 there was an underestimation relative to the official statistics of about 12.2%.For the same reference wheat sowing area, the yield was also estimated using the PSO assimilation method with the Aquacrop model (Figure 8).
All the estimates obtained with Aquacrop shown in Figure 8 were obtained with the model calibrated for each year before the application of the assimilation algorithm.Using the PSO with Aquacrop, the average yield per unit on the studied area was of 5.56 t•ha −1 , very close to the official statistics value, whereas the total production estimated was of 1.10 × 10 4 t, with an estimation error of about 118 t.For 2014 the same procedure was applied, and the yield per unit area estimated with PSO and Aquacrop was of about 5.97 × 10 4 t, while the total production estimated for this year was of 0.93 × 10 4 t. Figure 9 summarizes the comparison between assimilation results and official statistics for the Yiangling administrative area.Total production (10 4

Discussion
The first step of this work was the retrieval, from remote sensing data, of the biophysical variables to be used in the assimilation procedures.The biophysical variables considered were LAI and CC, both estimated using the ANN algorithm as described in Section 2.2.2.The results shown in Figure 4 indicate that overall, the algorithm tended to underestimate the value of LAI.The comparison between estimated values and ground measurements highlighted that, with the exception of small values (between 0 and 2), the points underestimated were more than those overestimated, particularly for values of LAI between 2 and 4, range in which the error was higher than the average RRMSE.This error distribution was partly due to the characteristics of ANN algorithm, which tended to worsen the estimation of winter wheat LAI with increasing values, because of the well-known issue of saturation of the canopy reflectance [19].Using HJ-1 data, authors of [55] obtained better results than ours, for a larger ground dataset collected in the same area of Yangling in 2014 (n = 80), with RRMSE values as low as 21% for optical vegetation indices.However, it should be noted that the methods used by these authors are based on empirical calibrations, with a prevailing local validity, whereas, in our case, the ANN method is of general application, not requiring preliminary calibrations [19,20].Authors of [56] assessed the performance of the same ANN-based algorithm used in the present study, with SPOT4 HRVIR and Landsat 8 data, obtaining an RMSE of 0.74 for winter wheat, better than the RMSE values found in our study (Table 6) for 2013 and 2014, respectively of 1.48 and 1.03, but similar to the that obtained in 2015, i.e., 0.75.It should be noted that in the dataset used by the authors of [56], only two data points had LAI values higher than 3 and the maximum LAI was 4.5.Conversely, in our dataset, many points had LAI values well above 3, reaching up to a maximum value of almost 8 (Figure 4).
A considerable impact on the LAI estimation accuracy might have been brought about also by the quality of the remote sensing data.The region of interest is strongly characterized by the presence of clouds and haze, so the probability to acquire clear images is very low, especially during spring, when rainfalls are frequent.The atmospheric correction for these images was difficult and it strongly influenced the reflectance and consequently the estimation of LAI.With such restrictive atmospheric conditions, it is not surprising that radar data can provide better estimates of LAI than optical data [55].Indeed, authors of [55], using empirical calibrations, achieved better results when they included regressors in the estimation of LAI, radar polarimetric parameters obtained from RADARSAT-2, alone or coupled with vegetation indices from HJ-1.In the latter case they achieved a RRMSE of 17.4% when using partial least squares regression.
The consideration about the influence of the atmosphere on the quality of input data for the Yanling area is also true for CC, but in this case the RRMSE between RS estimation and ground measurements was smaller than for LAI, with a RMSE of 7.2% and a RRMSE of 9% for the three years.This is due to the fact that CC is less subject to a saturation at high values than LAI.The observation period was between March and May, when the variation of LAI was between 0.5 and 7, corresponding to a variation of CC between 40% and 91%.By the end of March, LAI was higher than 3, which corresponded to a CC of 80%, so for most measurements dates CC varied between 80% and 91%.In such a small range of values, the ANN algorithm could estimate CC changes with a higher accuracy.In this case our results were better than those achieved by [56], who obtained a RMSE of 19% for winter wheat, whereas in our study RMSE values ranged between 5.1% and 7.3% in the three years.Our results are also better than those of [33], who achieved a RMSE of 15.8% and an RRMSE of 19.6% when using only HJ-1 data.These authors obtained better results, comparable to ours, when they used, in empirical regression models, coupled optical and radar data as regressors, reaching a RMSE of 8.8% and a RRMSE of 10.9%, confirming the above considerations on atmospheric effects.Again, it should be remembered that we used a general purpose model-based algorithm, i.e., not requiring a local calibration, unlike [33].Part of the differences in the accuracy of estimation of CC could also be attributed to the equations used to convert ground measurement of LAI into CC.We used an equation proposed in [40] after a specific study on wheat, which showed that it performed better than the equation proposed by [57] for maize, which was also employed in [33].
The estimation of LAI and CC was aimed at the application of assimilation methods for estimating yield, thus the error in their estimate could influence the results of the assimilation.In particular, the EnKF method takes explicitly into account the error in the observations to be assimilated into the model, which was set in our case to a value of 30%, in agreement with the RRMSE actually obtained for LAI (Figure 4).This sort of error is expected to have a remarkable effect on the accuracy of the assimilation results [2].
SAFY and Aquacrop are two growth crop models with rather different characteristics.The former is a simple algorithm, developed specifically for remote sensing applications, with a reduced number of influential parameters and strongly dependent on the scenario characteristics [49].The latter is more complex than SAFY; the number of influential parameters is higher and the influence of scenario characteristics is lower than for SAFY.The strength of Aquacrop is its capability of taking into account crop physiological processes related to water stress, which are ignored by SAFY, but of great interest for the estimation of crop water requirements and yield response to drought from remote sensing [7,31].However, despite being less complex than other crop models such as e.g., EPIC [22] or CERES [26], the larger number of parameters of Aquacrop makes it harder to calibrate and slower to run than SAFY, therefore making it less suitable for an assimilation application.
The application of two different assimilation algorithms to the two models, EnKF on SAFY and PSO on Aquacrop, allowed to obtain reasonable yield estimates both at the field and at the district scale.In the field application, the accuracy of SAFY with the application of EnKF was similar or higher than for Aquacrop with PSO.The estimated yields obtained with EnKF and SAFY varied in a range between 3 and 8 t•ha −1 , while for PSO with Aquacrop they varied between 4 and 7 t•ha −1 .The first method estimated with a good approximation low yield values, but the estimation accuracy decreased with the increasing yield and above 6 t•ha −1 an underestimation was apparent.Instead, the second method overestimated values of yield lower than 4 t•ha −1 and underestimated values over 6 t•ha −1 .Both methods showed the tendency to saturate at values higher than 6.5 t•ha −1 .
Both methods were initially applied by using a single preliminary calibration for the entire dataset, i.e., the three different winter wheat growth cycles.However, in such case, the yields estimated from the PSO with Aquacrop had a rather small range of variation (Figure 6b), looking rather insensitive to scenario changes.A different calibration for each climate dataset was necessary (Figure 6c), in order to have comparable results with those obtained applying the EnKF to SAFY, the latter being calibrated only once for all the years.In [34], PSO method for the assimilation of crop biomass was used, estimated from field based spectral reflectance measurements, into the Aquacrop model, and employed a quite extensive winter wheat dataset collected over 4 years.During the PSO assimilation the authors recalibrated only three parameters (canopy growth coefficient, maximum canopy cover and maximum effective rooting depth), but no information is provided on how the other parameters were set.They achieved an RMSE of 0.65 t•ha −1 for the whole validation dataset.
Jin et al. [35] used the same field dataset and methodology as in [34], but during the PSO assimilation they recalibrated eight Aquacrop parameters: increase of canopy cover per growing degree day (cgc), maximum canopy cover (ccx), decrease of canopy cover per growing degree day (cdc), growing degree days from sowing to emergence (eme), number of plants ha −1 (num), soil water depletion factor for canopy senescence (psen), shape factor for water stress limiting stomatal conductance (pstoshp), and maximum effective rooting depth (rootdep).Also, in this case no information was provided on the other parameters and input variables.They achieved RMSE values for yield estimates ranging from 0.51 to 0.61 t•ha −1 for the different years and RRMSE from 8.8 to 10.7%, i.e., much better than our RRMSE value of 24%.However these results were obtained using ground-based hyperspectral measurements.The authors of [33] used HJ-1 data, acquired in 2014 in Yangling, for estimation of biomass or canopy cover, which was then used for the assimilation into the Aquacrop model using the PSO method.In this case, the yield estimation accuracy was lower, with an RMSE of 1.24 t•ha −1 and a RRMSE of 26%, comparable with our results.Since, for these authors, the estimation of biomass was more accurate than that of CC; because of the above mentioned issues, they obtained better results when they used biomass for the assimilation, with a RMSE of 0.92 t•ha −1 and a RRMSE of 19.4%.Even better results were achieved by [33] when they combined optical and radar data for the estimation of the biophysical variables to be used for the assimilation, reaching a RMSE of 0.81 t•ha −1 and a RRMSE of 17%.This confirms that the use of radar data could partially overcome the above mentioned difficulties with the atmospheric effects on optical data for the Yangling area.
In the regional scale application, both methods simulated the mean and total yield of the Yangling administrative area with a good approximation, as compared to official statistics.The wheat classification procedure was quite successful and it showed the spatial distribution of growing areas, interestingly revealing the effect of year-to-year variation due to crop rotation practices.Yield estimation results were generally consistent with those obtained with the field scale application.For PSO with Aquacrop the accuracy in the spatialized application was higher than expected from field scale results.It should be noted that for this method it was necessary to recalibrate the fixed parameters and the ranges of parameters optimized during the assimilation, for each observed crop growth cycle.For EnKF with SAFY, conversely, the same calibration was used for all the analyzed crop growth cycles.Because of the computational limits of Aquacrop, the classes in which the pixels were binned were reduced from about 1800 used in EnKF-SAFY to 100 for PSO-Aquacrop.This reduction was one of the causes of the lower spatial variation of yield for Aquacrop (Figure 8) as compared to SAFY (Figure 7).Another cause was the low variation of CC, and the input data for PSO-Aquacrop, which affected the output.
Despite the error in the input data, found for both LAI and CC, and the limited number of remote sensing observation dates, the estimation of yield was reasonable.For EnKF with SAFY this depended on the characteristics of the Kalman Filter.This method, in fact, "filtered" the error on the measurements, introduced as a random effect, reducing its influence on the output.The effect of the error can be also reduced by increasing the number of ensemble, but this inevitably increases the computational time.For this method the choice of measurements error variance and the number of ensembles is fundamental [51].
For PSO with Aquacrop, the input error of CC was lower than for SAFY, but the yield estimation accuracy seemed to be more affected by the calibration of the model for each climatic scenario.This method thus resulted to be strongly dependant on the initial calibration, which resulted rather complex because of the high number of parameters which described the model.This dependence from calibration combined with the inaccessibility of the Aquacrop code and with the high computational cost, made the PSO method with Aquacrop less suitable for an operational application as an assimilation algorithm.
The value of RRMSE obtained with the assimilation based on the EnKF with SAFY ranged, in the present study, between 15% and 20% among the years, with a value of 18% for the whole dataset.
Veloso [48], using a modified SAFY version including CO 2 fluxes, and recalibrating seven parameters assimilating LAI retrieved from remote sensing, obtained RRMSE values between 12% and 24% for wheat yield estimation.It should be noted that a much larger number of observations of LAI from remote sensing acquisitions were available in [48] as compared to the present study.
In the literature [22][23][24][25][26][27][28] the RRMSE value found by applying different methods of assimilation in different models was reported to be between 5% and 16%, therefore quite lower than ours.However, it is important to consider that some of these studies [24,25,27] employed more than 20 observations of biophysical variables in the assimilations for each crop growth cycle, while other studies [4,26] used less images (around 10) at a higher resolution and with a lower error on the biophysical variables estimations.The method tested in this study, instead, assimilated a reduced number of images (just three or four) for each crop growth cycle and with an estimation error of about 20%, reaching levels of accuracy in the estimation of the yield comparable with the studies encountered in the literature.Casa et al. [2] showed that the frequency of available observations, but also their distribution along the crop growth cycle has an important influence on the accuracy of model estimation, when used for model forcing.

Conclusions
This study demonstrated the possibility to estimate the winter wheat yield, at field and district level scale, through the use of assimilation methods.Two different approaches were tested, one based on an updating assimilation method, the EnKF, employed for the assimilation of LAI into SAFY, and another based on a calibration assimilation method, the PSO, used to assimilate the CC into Aquacrop.The tests at the field scale showed the feasibility of using medium resolution (30 m GSD) satellite data, such as from HJ1A/B or Landsat 8 OLI images, to estimate yield, with potential applications for precision agriculture.
However, the results obtained with Aquacrop were less accurate as compared to other methods of assimilation for calibration encountered in the literature [22,23,28].This is mainly due to the characteristics of Aquacrop, for which is indispensable to carry out a very accurate calibration.Nevertheless, the application at the district scale of PSO assimilation with Aquacrop, was in agreement with the yield provided by official statistics.This occurred because Aquacrop was recalibrated for each climate dataset.Furthermore, to meet the high computational requirements of this assimilation method for calibration, a coarser binning of pixels, according to the CC temporal series, was applied, forcing a reduced number of classes.In this way the range of variation of simulated yields was rather limited, leading to accurate results but responding less to spatial or temporal variation of yields.The high computational cost, the difficult calibration and the need to recalibrate for each year of climate datasets makes, therefore, the PSO method applied to Aquacrop less efficient.
The performance of the EnKF with the SAFY model, was comparable with the results of others upgrading assimilation method encountered in literature [24][25][26][27][28], which is especially encouraging when considering that, despite the limited number of remote sensing images, i.e., three or four, with an error of LAI estimation of 30%, the error on the yield estimation was around 18%.This allows the application of this upgraded assimilation method in areas of the world where it is not possible to obtain a large number of images for each crop cycle.
The encouraging results of the present work obtained with SAFY should be confirmed with further studies with other validations, possibly with experiments that provide more frequent field measurements, a larger variety of climatic and environmental datasets and a higher quality of remote sensing image collection.

Figure 1 .
Figure 1.Map of China showing the location of the test area of Yangling (Shaanxi Province).

Figure 1 .
Figure 1.Map of China showing the location of the test area of Yangling (Shaanxi Province).

Figure 4 .Figure 4 .
Figure 4. Validation results for the retrieval of LAI from Huan Jing satellites HJ1A, HJ1B and Landsat 8 images, using field measurements of 3 years in Yangling rural area.

Figure 5 .
Figure 5. Validation results for the retrieval of canopy cover (CC) from HJ1A, HJ1B and Landsat 8 images, using field measurements of 3 years in the Yangling rural area.The field measured LAI was converted into CC using Equation (1).

Figure 5 .
Figure 5. Validation results for the retrieval of canopy cover (CC) from HJ1A, HJ1B and Landsat 8 images, using field measurements of 3 years in the Yangling rural area.The field measured LAI was converted into CC using Equation (1).

Figure 6 .
Figure 6.Comparison between measured and wheat yield using (a) the EnKF method with the SAFY model after a unique general calibration for all years; (b) the PSO method with the Aquacrop model after a unique general calibration for all years; and (c) the PSO method with the Aquacrop model after specific calibrations performed for each year.

Figure 6 .
Figure 6.Comparison between measured and simulated wheat yield using (a) the EnKF method with the SAFY model after a unique general calibration for all years; (b) the PSO method with the Aquacrop model after a unique general calibration for all years; and (c) the PSO method with the Aquacrop model after specific calibrations performed for each year.

Figure 8 .
Wheat yield map (t•ha −1 ) for Yangling, estimated using the PSO assimilation method with the Aquacrop model for 2013 (a) and 2014 (b).

Figure 9 .Figure 9 .
Figure 9.Comparison of wheat production estimated by official statistical surveys (black), and from assimilation using EnKF with SAFY (white) and PSO with Aquacrop (grey).

Table 1 .
Field measurements of LAI (L), biomass (b) and yield (y) conducted in Yangling from 2013 to 2015.n. pts are the number of sampling points where ground data were collected.

Table 1 .
Field measurements of LAI (L), biomass (b) and yield (y) conducted in Yangling from 2013 to 2015.n. pts are the number of sampling points where ground data were collected.

Table 4 .
[50] of the most influential parameters of the SAFY model allowed to vary during the assimilation, identified from a previous global sensitivity analysis study[50]in order of decreasing importance.APAR: absorbed photosynthetically active radiation; DAM: dry above-ground biomass.

Table 5 .
[49] of the most influential parameters of the Aquacrop model allowed to vary during the assimilation, identified from a previous global sensitivity analysis study[49], in order of decreasing importance.

Table 6 .
Statistics resulting from the estimation of LAI and canopy cover (CC) from Landsat and HJ1 images assessed against ground measurements.RMSE: root-mean-square error; RRMSE: relative root-mean-square error.

Table 6 .
Statistics resulting from the estimation of LAI and canopy cover (CC) from Landsat and HJ1 images assessed against ground measurements.RMSE: root-mean-square error; RRMSE: relative root-mean-square error

Table 7 .
Statistics resulting from the estimation of yield using two different assimilation strategies: the EnKF for SAFY and the PSO for Aquacrop (only the case of yearly calibration results are shown).

Table 7 .
Statistics resulting from the estimation of yield using two different assimilation strategies: the EnKF for SAFY and the PSO for Aquacrop (only the case of yearly calibration results are shown).