1. Introduction
It is increasingly necessary to develop new methods that allow, in a simple and non-invasive way, the monitoring of large agricultural areas at a spatial resolution high enough to improve crop management at field scale, over the full crop cycle and even with daily observation of crop growth status, e.g., for irrigation management [1]. To improve production efficiency through data-driven management practices in agricultural areas, the problem must be approached from two points of view: one that captures as fully as possible the spatial variation of the main biophysical characteristics of the crop and the soil, and another that captures their temporal patterns at frequent intervals.
It has been widely demonstrated in the literature that the first objective can be achieved through the use of high resolution Earth Observation (EO) data, e.g., refs. [2,3,4,5], which, thanks to increasingly advanced and accessible satellites, allow monitoring over large areas at increasingly high spatial resolutions and ever shorter time intervals. Examples of such satellites are the constellations made available by the Copernicus programme [6].
Temporal monitoring on a daily scale, not yet possible at reasonably affordable costs through EO data with high spatial resolution, can instead be obtained by using crop growth models, as proven by the vast literature available on this subject, e.g., refs. [
7,
8,
9,
10]. The use of crop growth models also allows the estimation of variables that are not directly observable through sensors, thus offering a more complete and efficient monitoring of crops. The two tools are increasingly used in combination through data assimilation (DA) methods [
2,
11,
12,
13,
14,
15]; among these, the authors of [2] provide an excellent overview of the methodologies developed so far. This review analysed over 40 studies carried out in the last 15 years and found a preference for variational approaches. It also highlighted the fundamental points that characterize a good application of EO data assimilation methods to crop growth models, including the importance of correctly calibrating the model parameters and the need to estimate the crop variables to be assimilated (using EO data) with high accuracy, high spatial resolution and high temporal frequency. Furthermore, the review identified some critical issues, such as the lack of correlation between observations from different sensors and the lack of accurate estimates of parameter uncertainty. The analysis of [2] concludes by stating that the combined use of crop models and EO data through data assimilation methodologies is the basis of future crop monitoring systems, because each can compensate for the limitations of the other.
The literature is full of examples of combining crop growth models with EO data through assimilation techniques, aimed in particular at estimating crop yield. A difficulty highlighted in several studies (e.g., refs. [13,14,16,17,18,19]) is obtaining spatially distributed in situ measurements together with high-resolution data, both of which are essential for proper validation at field scale.
Awad et al. [19], for example, addressed the shortage of EO data with both high spatial and high temporal resolution by combining the few available high resolution data with lower resolution images that have a considerably higher temporal resolution. The authors assimilated crop variables (such as LAI, biomass and evaporative fraction) estimated from EO data into a mathematical model developed specifically for that study and based on the equation of Monteith [20].
Kang et al. [14] proposed an approach with two spatial scale levels. At the first level, the county scale, a Markov Chain Monte Carlo algorithm is used to recalibrate uncertain and sensitive model parameters. At the second level, the field scale, the information retrieved from the first level is used to set up the model, into which the LAI retrieved from Landsat-8 data is assimilated using an Ensemble Kalman filter DA method.
This study also highlighted the importance of using models that, besides estimating yield as a function of the LAI, consider the multiple aspects that influence crop development. It focuses on the importance of knowing water consumption, and in particular of quantifying soil evaporation and vegetation transpiration in agricultural areas. To this end, the authors propose a modified version of the Simple Algorithm For Yield (SAFY), a simple crop growth model developed by [21] specifically for applications with EO data, to which a soil water balance is added. This makes it possible to simulate variables that influence crop growth both directly (such as LAI and biomass) and less directly (such as evapotranspiration).
The importance of estimating evapotranspiration (ET) is confirmed by many studies, such as [
22,
23,
24,
25,
26,
27], because monitoring it makes it possible to manage irrigation so as to reduce water waste, optimize yield and mitigate the risk of drought. In response to these observational requirements, ESA, NASA and CNES/ISRO [
28,
29,
30] plan the development of missions that can estimate the Land Surface Temperature (LST) at high spatial resolution and at high temporal frequency, through which it is possible to indirectly estimate the ET [
28].
Research on monitoring evapotranspiration with EO data has been conducted for several decades, e.g., [31,32,33,34], and the accuracy of the estimates has increased as the available technologies have improved. For these methods, the temporal frequency of the evapotranspiration maps is tied to the satellite revisit time. A growing number of studies propose alternatives that complement EO data to obtain daily estimates of actual ET [
25].
Stancalie et al. [24], for example, proposed a method for producing maps of actual ET from data acquired by NOAA–AVHRR, validating the results against ET values estimated by models and noting the difficulty of obtaining spatially distributed in situ measurements. Another study [26] proposed the use of EO data acquired by Deimos-1 and Landsat-8 in synergy with a model based on the Penman–Monteith equation [35], obtaining encouraging results. The validation was carried out on 18 sites for an analysis at regional scale. The study concludes by encouraging the use of similar methodologies at higher spatial resolution, using modern sensors such as those aboard Sentinel-2.
A detailed review of the studies that are focused on estimating ET using EO data is provided by [
25]. The authors divided the studies carried out so far into four main methodologies: “(1) methods that involve the use of statistically-derived relationships between ET and vegetation indices such as the Normalized Difference Vegetation Index (NDVI) or the Fractional Vegetation Cover (FVC); (2) physical models that calculate ET as the residual of Surface Energy Balance (SEB) through remotely sensed thermal infrared data; (3) other physical models that involve the application of the combination of Penman–Monteith and Priestley–Taylor types of equations and (4) data assimilation methods adjoined to the heat diffusion equation (and through the radiometric surface temperature sequences).” [
25].
The main purpose of our study is to improve daily crop evapotranspiration estimates, by exploiting the assimilation of EO data into crop growth models. The proposed methodology attempts to address the current limitations of the EO capacities.
The main difficulties that emerged from the literature in this area were:
providing information on the growth state of crops with high spatial resolution and high temporal frequency to encourage practical applications,
the complex calibration of model parameters,
the high computational cost of DA methodologies,
the scarce availability of ground data that are both spatially distributed and collected frequently during the crop cycle.
This paper proposes a methodology that addresses these problems, offers a solution to reduce the encountered limitations and sets the foundations for possible future developments. The model proposed here was designed to be easily adapted to the monitoring of other biophysical quantities of crops, such as biomass or yield. Although it is mainly focused on ET estimation, preliminary results obtained from synthetic data are also presented for the estimation of yield.
An assimilation method based on an updating algorithm [36], the Ensemble Kalman Filter (EnKF) [37], was used in combination with a simple crop growth model that allows daily monitoring of the main biophysical variables of vegetation: leaf area index, biomass, yield and actual evapotranspiration. The crop growth model chosen for this work is a version of the Simple Algorithm For Yield [21] to which a simple model for calculating the soil water balance has been added [14]. The EO data used for the assimilation were acquired by Sentinel-2.
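As a rough illustration of what an EnKF update does, the following Python sketch applies one analysis step to an ensemble of model states when a single LAI observation becomes available. It is a generic stochastic (perturbed-observation) EnKF written with NumPy, not the implementation used in this study; the function name `enkf_update`, the two-variable state (LAI, biomass) and all numerical values are assumptions chosen only for illustration.

```python
import numpy as np


def enkf_update(ensemble, obs, obs_std, obs_index=0, rng=None):
    """One stochastic EnKF analysis step for a single scalar observation.

    ensemble  : array (n_members, n_state) of forecast states
    obs       : observed value of the state component `obs_index`
    obs_std   : standard deviation of the observation error
    Returns the updated (analysis) ensemble with the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    ensemble = np.asarray(ensemble, dtype=float)
    n_members = ensemble.shape[0]

    # Deviations of each member from the ensemble mean
    anomalies = ensemble - ensemble.mean(axis=0)
    predicted_obs = ensemble[:, obs_index]
    predicted_anom = anomalies[:, obs_index]

    # Sample covariance between every state variable and the predicted observation
    cov_state_obs = anomalies.T @ predicted_anom / (n_members - 1)
    var_obs = predicted_anom @ predicted_anom / (n_members - 1)

    # Kalman gain: one weight per state variable
    gain = cov_state_obs / (var_obs + obs_std**2)

    # Each member assimilates its own perturbed copy of the observation
    perturbed_obs = obs + rng.normal(0.0, obs_std, size=n_members)
    return ensemble + np.outer(perturbed_obs - predicted_obs, gain)


# Hypothetical forecast ensemble: column 0 = LAI, column 1 = biomass
rng = np.random.default_rng(42)
lai = rng.normal(2.0, 0.5, size=200)
biomass = 100.0 * lai + rng.normal(0.0, 5.0, size=200)
forecast = np.column_stack([lai, biomass])

# Assimilate an illustrative LAI observation of 3.0 with an error of 0.2
analysis = enkf_update(forecast, obs=3.0, obs_std=0.2, rng=rng)
print("mean LAI before/after:", forecast[:, 0].mean(), analysis[:, 0].mean())
```

Because biomass is correlated with LAI in the ensemble, the LAI observation also corrects the biomass, which is how an update of the observed variable propagates to variables that are not observed directly.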
The method developed in this work (which will hereinafter be referred to as EnKF-SAFY_swb) was initially analysed through a set of synthetic data in order to evaluate its assimilation efficiency, according to the methodology proposed by [
38]. Subsequently, the methodology was validated in a study area located in central Italy (Grosseto, Tuscany), using daily actual evapotranspiration as reference variable.
4. Discussion
One of the aims of this paper was to study the validity of the proposed DA method based on the EnKF coupled with the new model SAFY_swb. More specifically, we aimed at evaluating if the EnKF-SAFY_swb method would be more effective than the simple SAFY_swb model in simulating the crop growth. Furthermore, we wanted to understand how much the assimilation efficiency depends on the number of assimilations and on the error on the variable to be assimilated.
The “general case” (Section 2.2.1) was useful in defining the relationship between the number of assimilations and the efficiency of the method, and it also made it possible to establish the limit value of the LAI error below which this assimilation method is worthwhile. For this case, the AE is evaluated only with respect to yield, since it is the variable of greatest interest in a crop growth model. The “specific case” (Section 2.2.2), on the other hand, made it possible to quantify more conclusively the usefulness of the EnKF assimilation method with the SAFY_swb model, because it allows a large number of input scenarios to be used.
The results of the AE analysis for the “general case” show that applying the EnKF DA method to SAFY_swb improved the accuracy of both the LAI and the yield estimates (Figure 6). Particularly striking is the reduction in the RMSE of the yield estimated using the EnKF, which decreased from 176.5 to 50.5. This result, although encouraging, cannot be considered fully reliable, as the dataset is generated from a small number of “true” cases. The number of scenarios considered is not sufficient to define the AE of the EnKF-SAFY_swb method unequivocally.
The analysis for the general case is, on the other hand, very useful for quantifying the influence of the number of observations to be assimilated and of the acceptable error of the observed variable (i.e., the LAI) on the AE of the method. This analysis showed that the error on the LAI measurements greatly influences the AE (Figure 7). In particular, the assimilation method is very suitable for LAI errors between 0.05 and 0.1, suitable for LAI errors of 0.15, and sufficiently suitable for errors between 0.2 and 0.25. For measurement errors exceeding 0.25, the assimilation method is no longer beneficial compared with the use of the simple model. It is therefore possible to establish an LAI estimation error of 0.25 as the threshold value below which the DA technique gives better results. The number of assimilated observations, by contrast, does not significantly affect the AE, especially for low LAI errors. The influence of the LAI starts to become relevant for errors higher than 0.2. In the case of LAI estimations from Sentinel-2 data, the error on the LAI is around 0.2, and thus the use of EnKF-SAFY_swb has proven to be useful to improve the crop variable estimations. The AE relative to the yield evaluated for five assimilation dates and for an LAI error of 0.2 (i.e., a case similar to the “specific case” and to the real case study) is slightly above 60%.
Once the estimation error limit of the LAI and the minimum number of assimilations had been established, it was possible to analyse the assimilation efficiency of the EnKF-SAFY_swb method. The database generated in the specific case adopted a number of assimilated observations and error on LAI observations similar to those found in the case study. In this way, it was possible to generate a dataset large enough to rigorously establish the assimilation efficiency of the proposed method. In this case, the comparisons between “true” and “simulated” variables were shown not only for LAI and production but also for biomass and actual evapotranspiration (
Figure 8). Both parameters give an idea of the model’s potential (useful to inspire future studies) and, in the real case analysed, the focus was mainly on the estimation of AET. This is why the AE was calculated both for yield and AET.
The error on the yield is slightly higher than in the general case, albeit with a value that justifies the use of the assimilation method. We can therefore state that, for a number of assimilated observations higher than 5 and an LAI error lower than 0.2, the yield estimation error is at least 11% (value obtained using a synthetic dataset), and the AE with respect to the yield of using the EnKF-SAFY_swb method is estimated at around 63%. This value agrees with the value of AE evaluated in the “general case” for five assimilated observations and an LAI error of 0.2.
For the AET, the estimation error (RMSE) is about 0.23 mm (equal to 18% of the average measurement). In this case, the AE is approximately 67%, a value that indicates that it is worthwhile to improve the estimates made by the SAFY_swb crop growth model with the EnKF assimilation method. It should be noted that this cannot be considered a proper validation; it is rather the step that precedes use in the field. On the one hand, it proves the correctness of the theoretical principles on which the method is based (and its correct application). On the other hand, it provides a reference against which any “real” case study can be compared.
The assimilation method based on the Ensemble Kalman Filter applied to the SAFY_swb crop growth model proposed in this study was validated on a data set recorded in central Italy in 2018 (case study, Section 2.3).
We extracted from the airborne images the points corresponding to the geographical position of the EC footprint. We compared the daily trend of LEi on the days of airborne acquisition, over the time interval in which the flight took place. This is a purely qualitative comparison because of the relatively small data set. The comparison (Figure 9) shows that the data correspond, with an average error of 9% between the airborne and the EC measurements. In this specific case, we are interested in analysing the ET, which, according to the method suggested by [55] and described in Section 2.3.3, is calculated using a single measure of daily LEi. The comparison in Figure 9 therefore made it possible to select the time at which the estimate of LEi derived from the airborne data is closest to the maximum of the polynomial regression function fitted to the LEi values measured by the EC sensor.
Concerning the validation of the method’s ability to simulate the ET trend, the comparison with the data measured in situ shows a low correspondence (an R² of around 0.46 and an RRMSE of around 27%), especially in the period between day 195 (14 July) and day 207 (5 August) (Figure 10). This error could be attributed to the inaccuracy of the climatic input data: the weather station was not located exactly in the study area, but a few kilometres south of it. For a field-scale study, this distance can introduce a non-negligible error. Another reason could be an incorrect calibration of the parameters of the SAFY_swb model. Further experiments are needed to confirm or disprove these hypotheses.
Regarding the application of the EnKF-SAFY_swb method to the simulation of daily ET maps for a predetermined date, the validation showed a fair correspondence. The relative RMSE between the ET derived from the airborne data and the ET simulated by the model is approximately 27%. Considering that the error on the initial LAI was set at 20%, this relative RMSE can be considered a satisfactory result. It also highlights that the assimilation of remotely sensed data improves the model. Analysing the individual groups of fields in detail, the RRMSE is around 20% for six groups out of eight and around 40% for the remaining two. Both groups of fields with a relatively high RRMSE are part of the pivot, the area for which a malfunction of the irrigation system was reported (preventing precise knowledge of the quantity of water supplied). This drawback forced approximations on some input parameters considered fixed during the assimilation (such as the amount of water used for irrigation), which could be partly caused by errors in the initial calibration of the parameters. This is evident in the daily ET maps shown in Figure 12. The histograms of those images (Figure 13) highlight a difficulty of the method in simulating the variations of ET, as it tends to assume the most frequent values. They also confirm a tendency of the model to underestimate the ET, particularly in some fields within the pivot.
Although the results obtained with the synthetic dataset (an RRMSE of about 18%) raised higher expectations, the results obtained from real data are sufficiently valid, while leaving considerable room for improvement of the proposed methodology.
For the sake of completeness, a validation was also carried out on the LAI, using as reference the LAI values derived from Sentinel-2 data that were not used in the assimilation. For each group of fields, the latest available date of the crop cycle was always used. Analysing the RRMSE group by group (Figure 14 and Table 8), the method is very accurate in six out of eight cases and not valid in the remaining two. Both groups of fields for which the RRMSE is very high have LAI values lower than 1 on average, which suggests a difficulty of the methodology in simulating low LAI values in the final phase of the crop cycle. The absolute RMSE is in fact in line with that of the other fields; the relative one, however, being referred to small values of LAI, is very high. Analysing instead the entire area covered by the studied fields, the results are consistent with those found using synthetic data.
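The difference between absolute and relative error for low LAI values can be made concrete with a short sketch. It assumes that the RRMSE is the RMSE expressed as a percentage of the mean reference value, in line with the way the AET error is reported above as a share of the average measurement; the function names and the LAI values are hypothetical and serve only to illustrate the effect.

```python
import numpy as np


def rmse(simulated, reference):
    """Root mean square error between simulated and reference values."""
    simulated = np.asarray(simulated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((simulated - reference) ** 2)))


def rrmse(simulated, reference):
    """RMSE as a percentage of the mean reference value (assumed definition)."""
    return 100.0 * rmse(simulated, reference) / float(np.mean(reference))


# Two illustrative groups of fields with the same absolute error of 0.3
high_lai_ref = np.array([3.0, 3.5, 4.0])
low_lai_ref = np.array([0.4, 0.5, 0.6])
high_lai_sim = high_lai_ref + 0.3
low_lai_sim = low_lai_ref + 0.3

for name, sim, ref in [("high LAI", high_lai_sim, high_lai_ref),
                       ("low LAI", low_lai_sim, low_lai_ref)]:
    print(f"{name}: RMSE = {rmse(sim, ref):.2f}, RRMSE = {rrmse(sim, ref):.1f}%")
```

With these illustrative numbers both groups have an RMSE of 0.3, but the RRMSE is about 8.6% for the group with a mean LAI of 3.5 and 60% for the group with a mean LAI of 0.5, which is the pattern described for the two groups with average LAI below 1.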
The literature analysis revealed some common problems in the synergistic use of EO data with crop models through DA methods. The problem of parameter calibration was solved with in situ measurements, in order to reduce the number of parameters with high uncertainty to 5 (the parameters varied in the application of the DA method). This solution is not optimal for low cost and low environmental impact methods; it is therefore necessary to implement tools capable of acquiring data remotely. The difficulty of obtaining validation data with high spatial resolution and high temporal frequency was circumvented by using synthetic datasets. This is a preliminary check, to be confirmed by field applications, but it nevertheless highlights the strengths and weaknesses of the methodology, thus suggesting possible improvements. The choice of two synthetic datasets (defined above as “general case” and “specific case”) was necessary to solve the problem of excessive computation time (already reported in previous studies, such as [13,62,63]). Dividing the analysis into two steps reduced the processing times and also provided an analysis of some typical EnKF DA parameters: the number of assimilations and the threshold error on the assimilated data. In the real case, the problem of having high spatial resolution data for validation was addressed using images acquired from aircraft, while daily point data were used to address the need for high temporal frequency. This strategy allows for a preliminary validation with real data, which further confirms the potential benefits of satellite missions such as LSTM, which aims to provide high-resolution, high-frequency ET maps.
5. Conclusions
The method proposed in this study combines the LAI retrieved from Sentinel-2 data with a modified version of SAFY that considers the soil water balance, through a DA method based on the EnKF, and sought to address some of the most common problems found in the literature. We studied the problem from both a spatial and a temporal point of view. The method belongs to the class of DA methods for the synergistic use of EO data with crop growth models that use a filtering approach, according to the classification proposed by [2].
The study aims to propose a methodology that allows future practical applications at field scale, thus providing information that in addition to detailing the growth of crops is also useful to optimize water resources. Precisely for this reason, we have chosen to use a model such as SAFY_swb, capable of simulating the evolution of ET over time and its implications for the development of the crop.
The main limitations of this study are:
- Few acquisitions of satellite images from which to obtain the LAI for the updating of the model.
- Few airborne images for validation.
- Single Eddy Covariance tower on the ground.
- Lack of an active in situ weather station for the entire crop cycle.
The proposed methodology was developed considering these limitations and trying, as far as possible, to reduce their effects on the results.
In conclusion, the EnKF-based DA method has been shown to be valid for the SAFY_swb crop growth model, giving good results in the simulation of biomass and yield (to be validated also in real cases) and an equally good ability to monitor LAI and ET. The former can be improved by increasing the number of assimilations; the latter, although valid in providing the overall trend of the data, needs to be improved, especially in the accuracy of the daily values. In addition, the model has proven able to meet the requirement for daily ET estimates by efficiently assimilating the EO data.
Good calibration remains essential for the use of crop models, and it cannot always be achieved efficiently with in situ measurements. It has further been shown, through the use of synthetic data, that the lack of good calibration can be partially compensated for by increasing the number of assimilations. However, the results improved much more significantly when the accuracy of the assimilated variable was increased. The studies conducted so far suggest that assimilating multiple variables simultaneously (with the highest temporal frequency and spatial resolution possible) could further compensate for the impossibility of calibrating the parameters that can be measured mainly in situ. Another possible improvement for future studies would be the use of meteorological data recorded at the study site itself, so that the resulting maps can better account for the variability across the area under study.