1. Introduction
Several processes of crop physiology depend on the climate at different time scales. Among them, crop phenology is fundamental as it determines crop performance over a growing season and is commonly used for agricultural planning. An accurate representation of this process is crucial for assessing crop productivity.
Global environmental changes experienced in recent decades, and those predicted for the coming years, affect crop phenology, especially in areas with high climate variability such as the Mediterranean and semi-arid zones [1,2,3,4,5]. Viticulture is one of the most economically significant agricultural activities in these areas. Thus, changes in vineyard phenology will shape management practices and strategic planning in the sector, especially regarding viticultural zoning and the selection of suitable cultivars [6,7,8,9,10].
Temperature is the main driver of grapevine (Vitis vinifera L.) phenology [11,12,13,14,15,16]. Temperatures between 10 °C and 25 °C are optimal for vine development [17]. Temperatures between 5 and 10 °C are also critical for accumulating the chill units needed to complete dormancy, playing a fundamental role in flowering and berry formation [16]. Several studies have found that average maximum temperatures during the spring months are strongly and negatively correlated with the duration of the budburst–flowering interphase, whereas the duration of the later interphases (flowering–veraison and veraison–maturity) is related to the accumulation of heat forcing units. Several studies have also shown a nonlinear effect of temperature on grapevine phenology, with responses that differ among cultivars [12,13,18,19,20]. In addition to temperature, soil water availability and soil physical properties affect grapevine phenology: in rainfed vineyard systems, earlier flowering and veraison are associated with water deficit or dry years [8,21,22].
Multiple tools have been used to represent phenology, including weather-based phenological models (W-PhenM), which are based either on the accumulation of state-forcing units (SF) or on the combination of chill and state-forcing units (CF) [23]. Forcing units are obtained from the difference between the mean temperature (Tm) and the base temperature (Tb), the latter regarded as the minimum temperature required for grapevine growth. The accumulation starts on a fixed date, usually January 1st in the northern hemisphere and July 1st in the southern hemisphere, using a base temperature of 10 °C [17]. In the grapevine flowering and veraison (GFV) model proposed by [24], Tb is set to 0 °C, and the starting date for accumulating forcing units (t0) corresponds to the 60th day of the year (DOY = 60) in the northern hemisphere and DOY = 242 in the southern hemisphere. CF models also describe the dormancy phase [25,26], which requires the accumulation of chill units; when a chilling threshold is surpassed, the accumulation of forcing units is triggered.
Grapevine phenology models following the CF approach include the model proposed by [27] (CaEc) and the BRIN model [28]. The CaEc model uses the Chuine Unified Model [25] for the accumulation of chilling during dormancy and a sigmoidal model of forcing-unit accumulation for the budburst, flowering, and veraison phases. The BRIN model combines Bidabe’s Cold Action model for the dormancy phase with the Richardson model [26], which is based on the accumulation of growing degree hours, for the budburst phase.
SF models require estimating only a few parameters; they are simple to implement and applicable to a wide range of varieties and locations. However, their predictive power is not necessarily the same for all varieties, and they do not properly describe differences in development rates between stages. CF models are variety-specific and incorporate a sub-model for each phenophase, which makes them more complex to implement [24,27,28].
SF and CF models must be parameterized and validated for each site because they do not account for the spatial variability of phenology [29,30,31,32,33,34,35]. In addition, these models depend on the availability of meteorological data that represent vineyard conditions [36,37]. Therefore, the uncertainty associated with W-PhenM lies mainly in the parameterization process, which is affected by the nature of the input data and the method used. The complexity of agricultural systems further complicates parameterization, since model behavior is altered by biotic and abiotic factors and by the associated agricultural management [38,39].
Phenological models based on remotely sensed data, known as land surface phenology models (LSP) [40], have the potential to overcome some of the drawbacks of SF and CF models, especially regarding the spatial variability of phenology [41].
LSP models can be classified according to (a) specific thresholds, (b) time-series curvature, (c) previous-season or within-season phenology responses, and (d) changes in the trend of remotely sensed data [42]. The within-season approach (real-time or near real-time) aims to monitor crop development for operational planning [42,43]. The time-series curvature approach has been used in annual crops [44,45,46] and vineyards [47,48]. This method requires fitting remotely sensed data to a function in order to identify inflection points (dates) and local maxima and minima. Based on these methods, forest phenology has been monitored using data from the Moderate Resolution Imaging Spectroradiometer (MODIS) [49,50], which has a spatial resolution between 250 m and 1000 m and a daily temporal frequency. In crops, MODIS data have been used to analyze phenology in soybean and maize [51,52] as well as vineyards [44,53,54], and have been tested for the development of crop maps [32].
LSP is an indicator of the global dynamics of terrestrial ecosystems since it responds to environmental changes, especially temperature and precipitation. Therefore, the temporal and spatial analysis of LSP patterns provides insight into the phenology of ecosystems and its drivers, making it a tool that improves phenology modeling in the face of climate change scenarios [55].
Despite the potential of LSP models, their main challenge lies in discriminating phenological metrics that match field observations [45], owing to the coarse spatial resolution of remote sensing data and the inherent complexity of the transitions between phenophases [46]. In the early stages of development, the signal can be disturbed by soil moisture conditions and by the woody structure of perennial plants, generating noise that affects data quality. Additionally, the relationship between phenology and the greenness detected by remote sensing is highly dependent on the crop type, its growth dynamics, and the effects of biotic and abiotic stressors [42].
It is also important to consider the synchronization between greenness and the structural development of the vineyard, that is, the degree to which changes detected by remote sensors coincide with those observed in the field, which depends on chlorophyll concentration and on soil and leaf water content. This synchronization is critical in the abrupt transition from dormancy to the increase in greenness at budbreak, which is poorly modeled by the curve-fitting method [56].
Another challenge for applying LSP models in Mediterranean areas such as central Chile is the high cloudiness usually found at the beginning of the growing season (late winter and early spring), which limits access to remote sensing data. In addition, vineyards are heterogeneous surfaces that include inter-row areas, which makes it challenging to identify the earliest phenological stages with remote sensing data alone [57].
Recently, Sentinel-2, a remote sensing data source available since 2015, has been used to monitor phenology in forests [58], wheat [59], rice [60], and tropical fruit trees [61]. This data source has not yet been used for vineyard phenology; however, Ref. [62] evaluated the potential of Sentinel-2 for obtaining information of agronomic importance in viticulture, including phenology.
Several authors have proposed integrating LSP data with ground-level phenological, meteorological, and soil data [42,45,57,63] as a way to overcome these difficulties.
Phenological data assimilation (DA) is the process by which remote sensing measurements or observations, transformed into phenological stages, are incorporated into W-PhenMs to calibrate, replace, or update the modeled phenological processes. DA can reduce the difference between model-based and remote sensing estimates, provides temporal continuity to the evaluated phenomena, and updates the state variables of predictive models. However, such a framework is subject to different sources of error from the remote sensing data, the models, and the algorithms used for assimilation, optimization, and interpolation. Additionally, DA requires large amounts of data (especially remote sensing data), which demands high computational capacity to keep computing times manageable [64].
The main uncertainty of DA lies in selecting the assimilation algorithm, since data with different spatial (e.g., local and regional) and temporal (e.g., daily and weekly) scales are usually integrated. Furthermore, DA over large areas (regional scale) requires a prior evaluation of the spatial covariance of phenological patterns affected by land heterogeneity [65].
DA uses algorithms, called filters, that are applied to time series of state variables from both models (e.g., W-PhenMs) and observations (e.g., remotely sensed data) to improve the estimation of the state variables [66]. The Kalman Filter (KF) and its variants are among the most widely used algorithms in DA [67,68,69,70].
Crop simulation models have been integrated with remotely sensed Leaf Area Index (LAI) and soil moisture (SM) data to improve yield prediction in grain crops, using the KF for linear processes. For nonlinear processes, the Extended Kalman Filter (EKF), the Ensemble Kalman Filter (EnKF), and the Particle Filter (PF) have been adopted [71,72,73,74,75,76,77]. In vineyards, the EnKF and PF have been used to assimilate data from high-resolution thermal infrared sensors and Synthetic Aperture Radar (SAR) into a soil–vegetation–atmosphere transfer (SVAT) model to improve soil moisture modeling at the surface and root-zone levels [78].
In phenology, DA has made it possible to evaluate the phenological stage of several biome types, especially those found in the Mediterranean zone [79]. In forest ecosystems, better predictions have been obtained in spring, when weather is highly uncertain [65,80]. Additionally, DA helps to identify gaps in parameter estimation and poor relationships between state variables simulated by meteorological models [81,82].
Given the potential of the phenology modeling approaches described above, a data assimilation (DA)-based model for vineyards could improve on the goodness of fit of W-PhenM and LSP models used separately. Therefore, this research aims to evaluate a DA-based phenology model that integrates W-PhenM with Sentinel-2 LSP in a commercial Cabernet Sauvignon vineyard in Central Chile.
2. Materials and Methods
Figure 1 summarizes the data assimilation approach for phenology modeling in vineyards. First, ground data on phenology, meteorology, and micrometeorology are collected. Second, the phenology model based on remote sensing data (RS-PhenM) is evaluated by applying the Savitzky–Golay filter (SG-Filter), fitting the data to a double Gaussian model, and deriving phenological metrics. Third, the models based on meteorological data are evaluated: the Grapevine Flowering Veraison model (GFV) [24], the Caffarra and Eccel approach [27] (CaEc), and the BRIN model [28]. Finally, assimilation is performed with the Extended Kalman Filter (EKF) algorithm, and the assimilated models are evaluated: assimilated GFV (EKF-GFV) and assimilated CaEc (EKF-CaEc).
2.1. Study Area
The study was carried out in a drip-irrigated vineyard (Vitis vinifera L. cv. Cabernet Sauvignon) during the 2017–2018 (S1), 2018–2019 (S2), and 2019–2020 (S3) growing seasons (October–May). The vineyard is located in central Chile, 30 km south of Santiago. This region has a Mediterranean climate with a mean annual temperature of 12.2 °C, a mean January (summer) temperature of 19.1 °C, and a mean June temperature of 5.6 °C. Precipitation is concentrated in winter (June–September), with an average annual total of 280 mm, and the average total reference evapotranspiration is 485 mm. Irrigation is therefore the primary water source during the growing season.
The vineyard was planted in 2010, with north–south row orientation and a spacing of 2.5 m between rows and 1.0 m between vines. Inter-rows are kept vegetation-free using mechanical and chemical weed control. Water is applied by drip irrigation during the season, and the irrigation time is calculated as a function of reference evapotranspiration (ET0). Usually, the grower sets irrigation to restore 50% of ET0 every seven days throughout the season. Due to the prevailing drought in the winter of 2019 and the reduced water availability for irrigation in S3, irrigation time was set to restore 25% of ET0. Canopy management also varied among growing seasons. In S1, the trellis system was vertical-shoot positioned with three wire lines, while in S2 and S3, the training system was structured without wire lines, which increased the frequency of topping and trimming during the growing season.
2.2. Ground Data
Micrometeorological data were obtained using an eddy covariance (EC) system, which measures energy and mass exchange between the vineyard and the atmosphere [83]. Due to the prevailing daytime wind direction, a west-facing EC tower was installed on the east border of the study area (Figure 2), at coordinates 33°42′16″S and 70°34′32″W. The installed sensors, data processing, and quality control are described in detail in Ref. [84].
Meteorological data were collected from the William Fevre agrometeorological station, located 4 km north of the study area (33.67°S, 70.58°W). The station records solar radiation (MJ m−2 day−1), air temperature (°C), relative humidity (%), wind speed (m s−1), and precipitation (mm).
During the three seasons, phenological observations were recorded every seven days using the modified Eichhorn and Lorenz (E-L) scale [85]. The E-L system identifies the main grapevine development stages (Table 1); for the present research, Budburst (4), Flowering (23), Setting (27), and Veraison (35) were evaluated.
2.3. The Remote Sensing Phenological Model (RS-PhenM)
Remote sensing data were obtained from the Sentinel-2 mission, which has a spatial resolution of 10 m, a radiometric resolution of 12 bits, and a revisit time of 5 days at latitudes near the equator and 2–3 days at mid-latitudes [86]. The spectral bands used were band 4 (σred) and band 8 (σnir), whose wavelength ranges are 0.64–0.70 µm and 0.73–0.93 µm, respectively. Images were filtered by date (growing season) and by cloud cover, selecting those with 30% cloudiness or less, which resulted in 96 images for the three seasons (Table 2).
The Normalized Difference Vegetation Index (NDVI) was calculated in Google Earth Engine (GEE) [87]. GEE is a cloud-based platform for geospatial analysis, programmed through JavaScript code, that allows large image collections to be processed with minimal local computational resources.
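The index itself is NDVI = (σnir − σred)/(σnir + σred), computed from bands 8 and 4. The following sketch reproduces the selection rule described above (growing-season dates, cloudiness of 30% or less) and the per-pixel NDVI in plain Python rather than in the GEE JavaScript API that was actually used; the `Scene` record and the function names are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass
class Scene:
    """Hypothetical per-pixel record of one Sentinel-2 acquisition."""
    acquired: date
    cloud_pct: float  # scene cloudiness (%)
    red: float        # band 4 surface reflectance (0-1)
    nir: float        # band 8 surface reflectance (0-1)


def ndvi(red: float, nir: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / denom


def ndvi_series(scenes: Iterable[Scene], start: date, end: date,
                max_cloud: float = 30.0) -> List[Tuple[date, float]]:
    """Keep scenes inside [start, end] with cloudiness <= max_cloud,
    sorted by acquisition date, and return (date, NDVI) pairs."""
    out = []
    for s in sorted(scenes, key=lambda s: s.acquired):
        if start <= s.acquired <= end and s.cloud_pct <= max_cloud:
            out.append((s.acquired, ndvi(s.red, s.nir)))
    return out
```

Filtering before computing the index mirrors the order of operations in the text, so cloudy scenes never enter the time series that is later smoothed.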
2.3.1. NDVI Time Series Smoothing
Although images are filtered by cloud cover, they still retain a significant level of noise. This noise appears as abrupt changes in NDVI values across the time series that do not correspond to the gradual variations of the vegetation during the growing season [88], and it arises from atmospheric conditions and from local factors such as soil moisture and the woody structure of perennial crops.
Both the intra-seasonal and inter-seasonal NDVI time series were smoothed. Smoothing improves the identification of NDVI changes, making it possible to relate them to the phenological changes observed in the vineyard and to apply the first- and second-derivative criteria used to define the LSP metrics.
Although several noise removal methods have been developed, there is no agreement on the best one. However, the Savitzky–Golay filter has been successfully used in many NDVI-based studies of vineyard LAI; given the close relationship between LAI and phenology, the algorithm proposed by Savitzky and Golay [89] was applied to the intra-seasonal time series. The filter performs a local least-squares fit of a polynomial of a given degree over a moving window of consecutive values, which is equivalent to a weighted moving average. The general equation is given by:

$$Y_j^{*} = \frac{\sum_{i=-m}^{m} C_i \, Y_{j+i}}{N} \qquad (1)$$

where Y is the original NDVI value, Y* is the resultant NDVI value, Ci is the coefficient for the ith NDVI value of the filter (smoothing window), and N is the number of convoluting integers, equal to the smoothing window size. The subindex j is the running index of the original ordinate data table. The smoothing array (filter size) consists of 2m + 1 points, where m is the half-width of the smoothing window.
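The smoothing described above can be sketched in a few lines of NumPy. The weights Ci are obtained as the row of the least-squares pseudo-inverse that returns the fitted polynomial's value at the window centre, so the normalization by N is folded into the weights. The half-width and degree defaults are illustrative assumptions, not the values used in this study, and this sketch simply leaves the first and last m points unchanged (SciPy's `scipy.signal.savgol_filter` offers more edge-handling modes).

```python
import numpy as np


def savgol_coefficients(half_width: int, degree: int) -> np.ndarray:
    """Smoothing weights C_i for offsets i = -m..m (already normalized)."""
    offsets = np.arange(-half_width, half_width + 1)
    # Design matrix: one row per window position, columns = powers 0..degree.
    design = np.vander(offsets, degree + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps window values to the fitted
    # polynomial's constant term, i.e. its value at offset 0 (window centre).
    return np.linalg.pinv(design)[0]


def savgol_smooth(values, half_width: int = 2, degree: int = 2) -> np.ndarray:
    """Savitzky-Golay smoothing of a 1-D NDVI series (window = 2m + 1)."""
    y = np.asarray(values, dtype=float)
    if 2 * half_width + 1 > y.size:
        raise ValueError("smoothing window is longer than the series")
    coeffs = savgol_coefficients(half_width, degree)
    smoothed = y.copy()  # edge points (first/last m) are left unchanged
    for j in range(half_width, y.size - half_width):
        smoothed[j] = np.dot(coeffs, y[j - half_width: j + half_width + 1])
    return smoothed
```

For m = 2 and a quadratic polynomial, the weights reduce to the classic (−3, 12, 17, 12, −3)/35, and any series that is itself a quadratic passes through the interior of the filter unchanged, while isolated spikes are damped.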
To smooth the inter-seasonal time series, a second-degree (two-term) asymmetric Gaussian model was used, with the general equation:

$$NDVI(x) = a_1 \exp\left[-\left(\frac{x - b_1}{c_1}\right)^2\right] + a_2 \exp\left[-\left(\frac{x - b_2}{c_2}\right)^2\right] \qquad (2)$$

where x is the day of the year (DOY) and a1, a2, b1, b2, c1, and c2 are parameters fitted to the NDVI time series. For convenience, DOY starts on 1 July of each year (DOYJul = 1) and ends on 30 June of the following year (DOYJul = 365), because in the southern hemisphere the growing season starts in September and ends in May of the following year. This adjustment of the DOY definition facilitates the evaluation of the models and allows comparison with results obtained in the northern hemisphere.
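A minimal sketch of the two ingredients of this step follows: converting a calendar date to the July-based day count (DOYJul) and evaluating a two-term Gaussian of the form a1·exp(−((x − b1)/c1)²) + a2·exp(−((x − b2)/c2)²). The function names and parameter order are assumptions for illustration; the parameter values would come from a nonlinear least-squares fit (for example with `scipy.optimize.curve_fit`), which is not reproduced here.

```python
import math
from datetime import date


def doy_jul(d: date) -> int:
    """Day of the season counted from 1 July (1 July = 1).

    Dates from January to June belong to the season that started on
    1 July of the previous calendar year.
    """
    season_start_year = d.year if d.month >= 7 else d.year - 1
    return (d - date(season_start_year, 7, 1)).days + 1


def double_gaussian(x: float, a1: float, b1: float, c1: float,
                    a2: float, b2: float, c2: float) -> float:
    """Two-term Gaussian NDVI curve evaluated at DOY_Jul = x."""
    return (a1 * math.exp(-((x - b1) / c1) ** 2)
            + a2 * math.exp(-((x - b2) / c2) ** 2))
```

Note that a season containing 29 February ends on DOYJul = 366 under this convention; 29 August falls on DOYJul = 60, which matches the southern-hemisphere starting date used for the GFV model.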
2.3.2. Remote Sensing Phenology Metrics
The metrics used to identify NDVI variations associated with phenological changes were determined using a modification of the methodology proposed by Ref. [48] for Portuguese vineyards. First, NDVI was fitted to a seven-parameter double logistic model. Second, the first (δ1) and second (δ2) derivatives were calculated to obtain the phenological metrics (RS-PhenM). Third, the inflection points and the local maximum and minimum were identified. Finally, the dates (DOYJul) of the Start of the Season (SOS), Left Inflection Point (LIP), Maximum Canopy Development (MCD), and Right Inflection Point (RIP) were estimated by interpolation. These four NDVI phenological metrics were linked to the stages Budburst (4), Flowering (23), Setting (27), and Veraison (35), respectively (Table 3).
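The derivative-based extraction can be sketched with NumPy on a daily (or finer) grid of the fitted curve. The exact criteria of Table 3 are not reproduced in this section, so the rules below are assumptions for illustration: MCD is the NDVI maximum, LIP and RIP are the maximum and minimum of δ1 on either side of MCD, and SOS is taken as the maximum of δ2 before LIP. Dates are snapped to the nearest grid point rather than interpolated.

```python
import numpy as np


def phenometrics(x, ndvi):
    """Return assumed SOS, LIP, MCD and RIP dates from a fitted NDVI curve.

    x: increasing DOY_Jul grid; ndvi: fitted NDVI values on that grid.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(ndvi, dtype=float)
    d1 = np.gradient(y, x)    # first derivative (delta 1)
    d2 = np.gradient(d1, x)   # second derivative (delta 2)

    i_mcd = int(np.argmax(y))                       # maximum canopy development
    i_lip = int(np.argmax(d1[: i_mcd + 1]))         # steepest green-up
    i_rip = i_mcd + int(np.argmin(d1[i_mcd:]))      # steepest decline
    i_sos = int(np.argmax(d2[: i_lip + 1]))         # onset of green-up (assumed rule)
    return {"SOS": x[i_sos], "LIP": x[i_lip], "MCD": x[i_mcd], "RIP": x[i_rip]}
```

For a single Gaussian peak centred at b with width c, these rules return analytically known positions (LIP = b − c/√2, RIP = b + c/√2, SOS = b − c√1.5), which makes the sketch easy to check.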
2.4. The Weather Phenological Models (W-PhenM)
The W-PhenMs are based on grapevine phenological processes. This research focuses on assessing the Grapevine Flowering Veraison (GFV) model [24], the BRIN model [28], and the model proposed by [27] (CaEc).
2.4.1. Grapevine Flowering Veraison Model (GFV)
The GFV model assumes that a phenological phase occurs when the forcing variable (Sf) reaches a critical value (F*) at a time (ts):

$$S_f(t_s) = \sum_{t = t_0}^{t_s} R_f(x_t) = F^{*} \qquad (3)$$

where Rf is the daily forcing rate, summed from a starting day of the year (t0), which is 1 March (DOY = 60) in the Northern Hemisphere and 29 August (DOYJul = 60) in the Southern Hemisphere, and xt is the daily mean temperature. The forcing rate in the GFV model is a function of a base temperature (Tb) of 0 °C, with the following criteria:

$$R_f(x_t) = \begin{cases} 0, & x_t \le T_b \\ x_t - T_b, & x_t > T_b \end{cases} \qquad (4)$$
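The threshold criterion of the GFV model described above reduces to a running sum. The sketch below assumes that daily mean temperatures are indexed by DOYJul (so t0 = 60 corresponds to 29 August) and that F* is supplied by the caller; any numerical F* used with it is hypothetical and is not a parameter value from this study.

```python
from typing import Optional, Sequence


def gfv_forcing_rate(t_mean: float, t_base: float = 0.0) -> float:
    """Daily forcing rate R_f: degrees above the base temperature, never negative."""
    return t_mean - t_base if t_mean > t_base else 0.0


def gfv_stage_day(daily_tmean: Sequence[float], t0: int, f_star: float,
                  t_base: float = 0.0) -> Optional[int]:
    """First day (1-based, same numbering as daily_tmean) on which the forcing
    sum S_f, accumulated from day t0, reaches F*; None if it never does."""
    s_f = 0.0
    for day in range(t0, len(daily_tmean) + 1):
        s_f += gfv_forcing_rate(daily_tmean[day - 1], t_base)
        if s_f >= f_star:
            return day
    return None
```

Because the same function is used for flowering and veraison with different F* values, the GFV model needs only one parameter per stage, which is what makes it simple to calibrate.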
2.4.2. BRIN Model
The BRIN model estimates the date of bud break in vineyards. It combines two phenological models, one associated with the endo-dormancy period [90] and the other with eco-dormancy [26]. Bud break (Nbb) occurs when the cumulative growing degree hours (Ac) counted from dormancy break (Ndb) reach a critical sum (Gc):

$$\sum_{n = N_{db}}^{N_{bb}} \sum_{h=1}^{24} A_c\left(T(h,n)\right) = G_c \qquad (5)$$

For the calculation of the growing degree hours (GDH), the hourly temperature of day n [T(h,n)] is estimated by linear interpolation from the minimum temperature of day n [Tn(n)] to the maximum temperature of day n [Tx(n)], and then from Tx(n) to the minimum temperature of the following day [Tn(n + 1)]. Therefore, assuming 12 h for each branch, it follows that:

$$T(h,n) = T_n(n) + h\,\frac{T_x(n) - T_n(n)}{12}, \quad h = 1, \dots, 12 \qquad (6)$$

$$T(h,n) = T_x(n) - (h - 12)\,\frac{T_x(n) - T_n(n+1)}{12}, \quad h = 13, \dots, 24 \qquad (7)$$

The response of Ac to the hourly temperature is bounded by the base temperature (T0Bc) and the maximum temperature (TMBc) of the eco-dormancy period; consequently:

$$A_c\left(T(h,n)\right) = \begin{cases} 0, & T(h,n) < T0_{Bc} \\ T(h,n) - T0_{Bc}, & T0_{Bc} \le T(h,n) \le TM_{Bc} \\ TM_{Bc} - T0_{Bc}, & T(h,n) > TM_{Bc} \end{cases} \qquad (8)$$

The BRIN model assumes T0Bc = 5 °C before bud break and TMBc = 25 °C.
Additionally, dormancy break (Ndb) occurs when the chilling units (CU), counted from 1 March (when buds are dormant), reach a critical number (Cc). The CU are calculated based on the Q10 concept, whereby each arithmetic increase of 10 °C in temperature reduces the chilling action geometrically by the Q10 ratio:

$$C_c = \sum_{n = \text{1 March}}^{N_{db}} \left[ Q_{10}^{-T_x(n)/10} + Q_{10}^{-T_n(n)/10} \right]$$
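The hourly interpolation, the bounded temperature action, and the Q10-based chilling units of the BRIN model can be sketched as follows. Function names, the 0-based day indexing, and any Q10, Cc, or Gc values a user would pass in are assumptions for illustration; only T0Bc = 5 °C and TMBc = 25 °C come from the text.

```python
from typing import List, Optional, Sequence


def hourly_temperatures(tn_today: float, tx_today: float, tn_tomorrow: float) -> List[float]:
    """24 hourly temperatures: rise Tn(n) -> Tx(n), then fall Tx(n) -> Tn(n+1)."""
    temps = [tn_today + h * (tx_today - tn_today) / 12.0 for h in range(1, 13)]
    temps += [tx_today - (h - 12) * (tx_today - tn_tomorrow) / 12.0 for h in range(13, 25)]
    return temps


def hourly_action(t: float, t0_bc: float = 5.0, tm_bc: float = 25.0) -> float:
    """Growing degree hours of one hour, bounded by T0Bc and TMBc."""
    if t < t0_bc:
        return 0.0
    if t > tm_bc:
        return tm_bc - t0_bc
    return t - t0_bc


def daily_gdh(tn_today: float, tx_today: float, tn_tomorrow: float,
              t0_bc: float = 5.0, tm_bc: float = 25.0) -> float:
    """Sum of hourly actions for day n."""
    return sum(hourly_action(t, t0_bc, tm_bc)
               for t in hourly_temperatures(tn_today, tx_today, tn_tomorrow))


def bud_break_day(tn: Sequence[float], tx: Sequence[float], n_db: int, g_c: float,
                  t0_bc: float = 5.0, tm_bc: float = 25.0) -> Optional[int]:
    """0-based index of the day on which GDH accumulated from dormancy
    break (n_db) reaches the critical sum g_c; None if never reached."""
    total = 0.0
    for n in range(n_db, len(tx) - 1):  # Tn(n + 1) must exist
        total += daily_gdh(tn[n], tx[n], tn[n + 1], t0_bc, tm_bc)
        if total >= g_c:
            return n
    return None


def daily_chill_units(tx: float, tn: float, q10: float) -> float:
    """Bidabe-style chilling units of one day: Q10^(-Tx/10) + Q10^(-Tn/10)."""
    return q10 ** (-tx / 10.0) + q10 ** (-tn / 10.0)
```

Capping the hourly action at TMBc − T0Bc means that hot afternoons do not accelerate bud break indefinitely, which is the main difference from an unbounded degree-day sum.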
2.4.3. Caffarra and Eccel (2010) Model (CaEc)
The model proposed by Caffarra and Eccel (CaEc) has two components: (a) one describing bud break based on the model of [25], in which chilling hours act on the release of endo-dormancy, and (b) one describing flowering and veraison as the result of the accumulation of forcing units through a sigmoidal function.
In this regard, bud break in the CaEc model is based on the following equations:
where Fcrit is the critical amount of forcing units needed to reach the phenophase of interest; ForcState is the accumulated forcing units; ChState is the accumulated chilling units; Tm is the mean daily air temperature; and a, b, c, and e are curve-shape parameters.
The CaEc model simulates the dates of flowering and veraison according to the equation:
2.4.4. Parameterization and Evaluation of W-PhenM
The W-PhenMs were parameterized using the Phenology Modelling Platform (PMP) [91], freely downloadable software (https://www.cefe.cnrs.fr/fr/recherche/ef/forecast/phenology-modelling-platform, accessed on 25 April 2022) developed for fitting phenology model parameters. PMP uses an optimization algorithm based on the simulated annealing method [92], which adjusts all model parameters simultaneously, achieving effective overall convergence despite the interdependence among phenological model parameters.
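PMP's own implementation is not reproduced here. As a generic illustration of the simulated annealing idea, the sketch below fits only the critical forcing F* of a GFV-type model by minimizing the RMSE between simulated and observed stage dates. The synthetic seasons, the step size, the starting temperature, and the geometric cooling schedule are all hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def gfv_day(tmean: Sequence[float], t0: int, f_star: float, t_base: float = 0.0) -> int:
    """Day (1-based) on which forcing from t0 reaches F*; last day if never reached."""
    s = 0.0
    for day in range(t0, len(tmean) + 1):
        s += max(tmean[day - 1] - t_base, 0.0)
        if s >= f_star:
            return day
    return len(tmean)


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def anneal_f_star(seasons: List[Sequence[float]], observed: Sequence[int], t0: int = 60,
                  f_init: float = 500.0, lo: float = 0.0, hi: float = 3000.0,
                  step: float = 25.0, temp0: float = 10.0, cooling: float = 0.995,
                  iters: int = 2000, seed: int = 1) -> Tuple[float, float]:
    """Return (best F*, its RMSE) found by simulated annealing."""
    rng = random.Random(seed)

    def cost(f: float) -> float:
        return rmse(observed, [gfv_day(t, t0, f) for t in seasons])

    current, current_cost = f_init, cost(f_init)
    best, best_cost = current, current_cost
    temp = temp0
    for _ in range(iters):
        # Random perturbation, clipped to the allowed parameter range.
        candidate = min(max(current + rng.gauss(0.0, step), lo), hi)
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

Accepting occasional uphill moves early on is what lets simulated annealing escape local minima when several interdependent parameters are adjusted together; with a single parameter, as here, it behaves much like a noisy descent.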
To optimize the evaluation of the data-assimilation-based models, the W-PhenMs were parameterized with a set of phenological observations from the French National Network of Phenology Observatories (TEMPO) (https://data.pheno.fr/, accessed on 27 May 2022), which compiles the phenological database of France. The data were filtered by climate (Mediterranean), cultivar (Cabernet Sauvignon), and observed phenological stages (Budburst, Flowering, and Veraison). The data matching these criteria were those from the Unit Experimental Domain De Vassal near Montpellier (Lat. 43°19′42″N, Long. 3°33′47″E) between 1995 and 2012. Accordingly, the meteorological data used were from the Aéroport Montpellier Méditerranée station (Lat. 43°34′43″N, Long. 3°58′07″E), available from the Global Historical Climatology Network Daily (GHCNd) for the period 1994–2012 (Table 4) (https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily, accessed on 15 February 2022). For the data of this study, the models were evaluated using data from the William Fevre station.
2.5. Phenological Model Based on Data Assimilation (DA-PhenM)
Phenological data assimilation takes the predictions of the system model (W-PhenM) and updates them with the outputs of the observation model (RS-PhenM). The processes described by both RS-PhenM and W-PhenM are nonlinear. DA algorithms for such processes, such as the Particle Filter (PF), require many observations and high computing capacity, making them complex to implement. Less complex algorithms, such as the Kalman Filter (KF), are designed for linear systems and are not directly applicable to nonlinear processes. The Extended Kalman Filter (EKF) is a modification of the KF that uses Jacobians (partial derivatives) to linearize nonlinear systems. In this regard, the prediction of the state variable is given by the following state-space model of the system:
$$x_k = A\,x_{k-1} + B\,u_{k-1} + v_{k-1} \qquad (15)$$

where the subscripts k and k − 1 denote the current and previous time, respectively; x is the state variable of the system (e.g., cumulative forcing units); u is the driving input of the system (e.g., daily mean temperature); v is the process noise, assumed to be normally distributed with a mean of 0 and variance Qk−1 (v ~ N(0, Qk−1)); A is the matrix that describes the transition of the state variable between times k − 1 and k; and B is the matrix that describes the change in the system state from time k − 1 to k due to the driving variables. Unlike the KF, the EKF linearizes the system's nonlinear equations by computing, in matrix A, the partial derivatives (Jacobian) of the state-transition function with respect to the state variables. Similarly, matrix B contains the partial derivatives of the state-transition function with respect to the driving variables of the system.
Additionally, the DA-PhenM, and hence the EKF process, involves an observation model that estimates the measurement at time k (yk) from the prediction of the state variable at the same time, with the general expression:

$$y_k = H\,x_k + w_k \qquad (16)$$

where H is the observation matrix used to estimate the sensor observation (e.g., Sentinel-2) at time k, and w is the measurement noise, assumed to be w ~ N(0, Rk). Because the Sentinel-2 NDVI is fitted to a nonlinear function (Equation (2)), the matrix H is calculated from the Jacobian of NDVI as a function of time.
After the state-space model of the system (Equation (15)) and the observation model (Equation (16)) are derived, the DA-PhenM model is run iteratively, assuming the initial conditions of the system for xk−1 (e.g., xk−1 = 0 forcing units) and uk−1 (e.g., the temperature at time k − 1).
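A scalar sketch of the predict/update cycle may help clarify how the state-space and observation models interact. It assumes a single state (cumulative forcing units, with A = 1 and the GFV forcing rate as the driving term) and a hypothetical logistic observation operator mapping forcing units to NDVI; that operator, its parameters, and the noise variances are illustrative assumptions, not the fitted Sentinel-2 curve used in this study.

```python
import math
from typing import Dict, List, Optional, Tuple


def forcing_rate(t_mean: float, t_base: float = 0.0) -> float:
    """Driving term B*u: degrees above the base temperature, never negative."""
    return max(t_mean - t_base, 0.0)


def ndvi_from_forcing(x: float, ndvi_min: float = 0.2, ndvi_max: float = 0.7,
                      k: float = 0.01, x50: float = 600.0) -> float:
    """Hypothetical nonlinear observation operator h(x)."""
    return ndvi_min + (ndvi_max - ndvi_min) / (1.0 + math.exp(-k * (x - x50)))


def ndvi_jacobian(x: float, ndvi_min: float = 0.2, ndvi_max: float = 0.7,
                  k: float = 0.01, x50: float = 600.0) -> float:
    """H = dh/dx, the linearization of h at the predicted state."""
    s = 1.0 / (1.0 + math.exp(-k * (x - x50)))
    return (ndvi_max - ndvi_min) * k * s * (1.0 - s)


def ekf_step(x: float, p: float, t_mean: float, y_obs: Optional[float],
             q: float, r: float) -> Tuple[float, float]:
    """One EKF cycle: predict with the state model, update if NDVI is observed."""
    a = 1.0                                  # Jacobian of x_k w.r.t. x_{k-1}
    x_pred = a * x + forcing_rate(t_mean)    # state prediction
    p_pred = a * p * a + q                   # variance prediction
    if y_obs is None:                        # no cloud-free image on this day
        return x_pred, p_pred
    h = ndvi_jacobian(x_pred)
    innovation = y_obs - ndvi_from_forcing(x_pred)
    s = h * p_pred * h + r                   # innovation variance
    gain = p_pred * h / s                    # Kalman gain
    return x_pred + gain * innovation, (1.0 - gain * h) * p_pred


def run_ekf(daily_tmean: List[float], observations: Dict[int, float],
            q: float, r: float, x0: float = 0.0, p0: float = 0.0) -> List[float]:
    """Filtered forcing-unit trajectory; observations maps day index -> NDVI."""
    x, p, history = x0, p0, []
    for k, t in enumerate(daily_tmean):
        x, p = ekf_step(x, p, t, observations.get(k), q, r)
        history.append(x)
    return history
```

On days without an image, the filter simply propagates the weather-driven model and its uncertainty grows by q; when an NDVI observation arrives, the state is pulled toward the value consistent with it and the variance shrinks.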
2.6. Model Assessment
All models (RS-PhenM, W-PhenMs, and DA-PhenMs) are evaluated using the following metrics to quantify the goodness of fit to the observed phenological data.
Root Mean Square Error (RMSE).

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where yi is the observed value, ŷi is the simulated value, and n is the number of observations.
The RMSE is sensitive to outliers, although this sensitivity decreases as the systematic error is reduced. On the other hand, the RMSE expresses the error in the same units as the observations (days), allowing intercomparison between models. In data assimilation, the RMSE is an objective function that is minimized to fit the model parameters.
Model Efficiency (EF).

$$EF = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where ȳ is the average of the observed dates. EF is one minus the ratio between the model error (MSE) and the MSE of the average of the observed dates. Therefore, EF = 1 indicates a perfect model, EF ≈ 0 indicates that the model predicts no better than the average of the observations, and EF < 0 means that the average of the observations is a better predictor than the model.
where RMSEDA is the RMSE of DA-PhenM (EKF-GFV and EKF-CaEc) and RMSEWM is the RMSE of W-PhenM (GFV and CaEc).
DAskill allows comparison only between DA-PhenMs, since it quantifies the change in RMSE without and with DA. Positive values indicate that DA-PhenMs improve the prediction of W-PhenMs, while negative values show that they do not.
Bayesian Information Criterion (BIC). According to [66], the BIC corrected for small samples is given by:
where σ̂²ML is the maximum likelihood estimator of the residual variance, given by:

$$\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

The BIC allows the comparison of models with different numbers of parameters, such as the W-PhenM, RS-PhenM, and DA-PhenM. The BIC favors simpler models because it includes a term that penalizes the number of parameters; thus, when comparing models, the best model is the one with the smallest BIC.
The evaluation of the models is performed in two stages. In the first stage, the RMSEs are compared, and the model with the lowest RMSE is considered the best. In the second stage, the models are compared according to the BIC, with the lowest BIC indicating the best performance. Although DA-PhenMs are expected to have the highest BIC values, the best model based on data assimilation will be the one with the highest DAskill.
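The assessment metrics can be computed with a few standard-library functions. RMSE, EF (one minus the ratio of residual to total sum of squares), and the maximum-likelihood residual variance follow the definitions above. The DAskill form used here, (RMSEWM − RMSEDA)/RMSEWM, is an assumption chosen only because it matches the stated sign convention, and `bic_standard` is the ordinary (uncorrected) BIC, n·ln(σ̂²ML) + k·ln(n), not the small-sample version cited from [66].

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error in the units of the observations (days)."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and equally long")
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def model_efficiency(obs: Sequence[float], sim: Sequence[float]) -> float:
    """EF = 1 - SS_res / SS_tot; 1 is perfect, <0 is worse than the mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def da_skill(rmse_da: float, rmse_wm: float) -> float:
    """ASSUMED form: relative RMSE reduction; positive means DA helped."""
    return (rmse_wm - rmse_da) / rmse_wm


def sigma2_ml(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Maximum-likelihood estimate of the residual variance."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)


def bic_standard(obs: Sequence[float], sim: Sequence[float], n_params: int) -> float:
    """Standard (uncorrected) BIC; undefined when residuals are all zero."""
    n = len(obs)
    return n * math.log(sigma2_ml(obs, sim)) + n_params * math.log(n)
```

Because EF normalizes by the spread of the observations, it complements the RMSE: two models with the same RMSE can have very different EF if the observed dates vary little between seasons.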
4. Discussion
Interannual variability in budburst, flowering, and veraison dates has been reported by [102], with variations of eight days in budburst, while Ref. [13] showed variations of 19 days in budburst, nine days in flowering, and 13 days in veraison. The dates reported in Table 5 are consistent with those observed for cv. Cabernet Sauvignon in Central Chile between 2004 and 2006 [102] and between 2009 and 2013 [31].
On the other hand, irrigation in S3 only accounted for 65% of the ETc (313 mm) due to the drought in Central Chile, which reduced the water supply in irrigation canals. Although water consumption dropped in S3, budburst and flowering dates were similar to those of S1 and S2, while veraison was delayed compared to the previous seasons. This is consistent with reports that water stress has a greater impact at the berry formation stage (E-L 27) [103]. However, the effect of water availability on grapevine phenology is coupled with other environmental variables, such as soil type and temperature [20,21].
The fitted Gaussian model provides a daily time series, increasing the accuracy in estimating phenological stages [95]. Additionally, it is a valuable tool for identifying inter-annual NDVI variations, since the curve parameters allow valid estimations over large areas [104,105,106]. However, it cannot identify specific dates of phenological stages around the curve peaks [105].
Regarding the phenological metrics extracted from NDVI, the number of days between the Start of the Season (SOS) and Maximum Canopy Development (MCD) was 70 and 72 days in S1 and S2, respectively, and 55 days in S3. These values contrast with an average of 102 days reported by Ref. [53] in Washington State and an average of 90 days reported by Ref. [47] in the Douro region of Portugal. In both Washington State and the Douro region, the vineyards are under a rainfed system, with total annual rainfall of around 300 mm and 580 mm, respectively. In our study area, the vineyard is irrigated, and in S3, irrigation covered only 65% of the ETc, equivalent to 203 mm. This suggests that the phenological metrics derived from NDVI are related to the water available to the vines, which determines the extremes of NDVI and the duration of the periods derived from the intra-annual behavior of the vegetation index. Because the vine canopy is nearly transparent to NDVI before and during budburst, the vineyard signal at that time is dominated by inter-row vegetation and by high soil water content following winter precipitation, consistent with the correlation coefficient of −0.88 reported by Ref. [47].
The RMSE of veraison based on the NDVI criterion is 8.6 days. This criterion differs from that of Ref. [49], which found that veraison corresponds to the local maximum of the NDVI curve, with a correlation coefficient of 0.87. However, in this research, the NDVI was fitted to an asymmetric Gaussian model, whereas in the study of Ref. [47], the NDVI was fitted to a double logistic model and, for lack of evidence, no conclusion was drawn on the phenological meaning of the right inflection point of the NDVI curve. The present research proposes a phenological model for Vitis vinifera L. cv. Cabernet Sauvignon based on Sentinel-2 NDVI data fitted to an asymmetric double Gaussian model, which identifies the budburst, flowering, setting, and veraison stages. The proposed model should be improved by considering NDVI time series longer than four seasons and by further fine-tuning the criteria used to identify the flowering stage in order to reduce estimation error.
In evaluating W-PhenM, the performance of the BRIN model is slightly better than that reported for Cabernet Sauvignon by Ref. [
28] (RMSE = 9.7 days) and Ref. [
29] (RMSE = 11.1 days), although Ref. [
34] obtained better performance, with an RMSE of around 6.0 days. For the CaEc model, the literature shows mixed results: better performance (RMSE between 4.5 and 5.7 days) for Cabernet Sauvignon [
34] and a higher RMSE of 22.8 days for Chardonnay [
37]. In addition, the higher efficiency of CaEc compared with BRIN in the budburst phase is attributed to the ability of CaEc to capture the behavior of the system in the eco-dormancy phase, which is supported by the results of Ref. [
27], where the CaEc model explains about 40% of the variance in the modeled budburst date and about 30% of the variance in the observed budburst date.
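All of the comparisons above are expressed as RMSE in days between predicted and observed stage dates. As a small sketch, assuming that dates are counted as days from a fixed season start (so that seasons crossing the calendar year boundary do not wrap around), the metric can be computed as follows; `rmse_days` is a hypothetical helper name.

```python
import math


def rmse_days(predicted_doy, observed_doy):
    """Root mean square error, in days, between predicted and observed stage dates.

    Both inputs are assumed to use the same day-counting origin.
    """
    if len(predicted_doy) != len(observed_doy) or not predicted_doy:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted_doy, observed_doy)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, errors of 2 and 4 days over two seasons give an RMSE of √10, about 3.2 days; because errors are squared, one poorly predicted season weighs more than several small misses.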
This research evaluated models that consider endo-dormancy. Since most climate change scenarios predict an increase in temperature at the end of winter, this group of models would have higher accuracy in predicting budburst [
7,
34]. However, assessments under the current climate have concluded that models based on forcing units, such as Growing Degree Days with a base temperature of 5 °C (GDD₅) and 10 °C (GDD₁₀), predict the budburst date better. Thus, models such as BRIN and CaEc do not provide higher accuracy despite requiring more parameters [
24,
28,
29,
34].
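The forcing-unit models cited above accumulate growing degree days above a base temperature, 5 °C for GDD₅ or 10 °C for GDD₁₀, and predict the stage on the first day the running sum reaches a threshold. A minimal sketch, assuming the simple averaging method ((Tmax + Tmin)/2 − Tb, floored at zero), a hypothetical start index, and a hypothetical threshold; the helper names are illustrative, not taken from the cited studies.

```python
def daily_gdd(tmax, tmin, t_base):
    """Growing degree days for one day (simple averaging method, floored at zero)."""
    return max(0.0, (tmax + tmin) / 2.0 - t_base)


def predict_stage_day(tmax_series, tmin_series, t_base, threshold, start_index=0):
    """Index of the first day on which cumulative GDD from start_index reaches threshold.

    start_index and threshold are hypothetical calibration inputs.
    Returns None if the threshold is never reached within the series.
    """
    total = 0.0
    for i in range(start_index, len(tmax_series)):
        total += daily_gdd(tmax_series[i], tmin_series[i], t_base)
        if total >= threshold:
            return i
    return None
```

With the same threshold, the 10 °C base predicts a later date than the 5 °C base, since every day whose mean exceeds 10 °C contributes 5 °C·day less forcing.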
On the other hand, Ref. [
37] reported higher RMSE values for flowering and lower values for veraison. In addition, the CaEc model had not previously been evaluated for Cabernet Sauvignon at the flowering and veraison stages; hence, the reference evaluations apply to the Chardonnay variety. In this regard, the errors reported for CaEc (around seven days for flowering and five days for veraison, with better performance than GFV) are consistent with those obtained here.
The GFV model is more efficient than the CaEc model, especially at the veraison stage. The good performance of the GFV model is likely due to the base temperature (Tb) of 0 °C used by the model, which can encompass some important physiological processes not captured by the CaEc model; as Ref. [
24] points out, Tb = 0 °C is a threshold that allows the thermal sum to converge simultaneously at the flowering and veraison stages.
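To make the role of the base temperature concrete, the sketch below accumulates daily forcing above Tb = 0 °C from a start day until two thresholds are reached, one for flowering and one for veraison, in the spirit of the GFV model discussed above. The start index and the threshold values (`f_star_flowering`, `f_star_veraison`) are hypothetical placeholders, not the calibrated GFV parameters for Cabernet Sauvignon, and the start date would depend on the hemisphere.

```python
def gfv_stage_days(tmean_series, start_index, f_star_flowering, f_star_veraison, t_base=0.0):
    """Days (indices) on which the forcing sum reaches each hypothetical F* threshold.

    Forcing accumulates max(0, Tmean - t_base) per day from start_index.
    Stages whose threshold is never reached are absent from the result.
    """
    targets = {"flowering": f_star_flowering, "veraison": f_star_veraison}
    reached = {}
    total = 0.0
    for i in range(start_index, len(tmean_series)):
        total += max(0.0, tmean_series[i] - t_base)
        for stage, f_star in targets.items():
            if stage not in reached and total >= f_star:
                reached[stage] = i
    return reached
```

Because a single forcing sum drives both stages, only the thresholds differ between flowering and veraison; with a 0 °C base even cool days add to the sum, whereas raising `t_base` with the same thresholds delays both predicted dates.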
The differences obtained in flowering and veraison prediction are likely due to errors in selecting external parameterization data, which may differ from the study vineyard in soil texture, available water, rootstock, and pruning, as well as in micrometeorological conditions that are not necessarily represented by the selected weather stations [
37,
96]. In addition, model performance reflects the amount of data used in the evaluation, as in Refs. [
24,
107], which used 70 and 62 observations for calibration and validation, respectively, at the flowering stage, and 66 (calibration) and 105 (validation) observations at the veraison stage, a data volume commensurate with the regional validity of those studies.
Despite the promising results, the proposed DA-PhenMs are limited to local conditions similar to those of this evaluation. Moreover, there is uncertainty about the climatic representativeness of the William Fevre and Montpellier Airport stations, because microclimatic conditions influence vineyard phenology [
98]. In addition, this evaluation does not consider variables that determine phenology, such as photoperiod, soil texture, fertility management, and pruning.
However, these limitations could be mitigated by incorporating proximal sensors such as phenological cameras [
101] or remote sensors such as synthetic aperture radars (SAR), which have provided valuable insights into vineyard water balance modeling [
71,
108]. Regarding assimilation algorithms, the Particle Filter (PF) is more suitable for nonlinear processes [
71,
99,
108]. Therefore, since CaEc is based on a logistic model, data assimilation with CaEc would likely be improved by PF, as reported in the phenological modeling of bamboo forests [
109]. Moreover, DA-PhenM performance could be enhanced by including variables such as the Leaf Area Index (LAI), which is closely related to phenology [
78,
79,
98,
101,
110].
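The paragraph above argues that a Particle Filter (PF) handles nonlinear models better than a linearised filter. As a minimal sketch of that idea, assuming a scalar development state advanced by a known forcing increment and observed through a hypothetical logistic operator, a bootstrap PF with systematic resampling could look as follows. The state definition, the `logistic` operator, and the noise levels are illustrative assumptions, not the CaEc formulation or the authors' implementation.

```python
import math
import random


def logistic(x, k=10.0, x_mid=0.5):
    """Hypothetical nonlinear observation operator: development state -> observed index."""
    return 1.0 / (1.0 + math.exp(-k * (x - x_mid)))


def bootstrap_pf_step(particles, forcing, y_obs, q_std, r_std, rng):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: advance each particle by the model increment plus process noise
    pred = [p + forcing + rng.gauss(0.0, q_std) for p in particles]
    # Update: Gaussian likelihood of the observation under each particle
    w = [math.exp(-0.5 * ((y_obs - logistic(p)) / r_std) ** 2) for p in pred]
    total = sum(w)
    if total == 0.0:  # every weight underflowed; fall back to equal weights
        w = [1.0] * len(pred)
        total = float(len(pred))
    w = [wi / total for wi in w]
    # Systematic resampling: one random offset, n evenly spaced pointers
    n = len(pred)
    u0 = rng.random() / n
    out, cum, j = [], w[0], 0
    for i in range(n):
        u = u0 + i / n
        while u > cum and j < n - 1:
            j += 1
            cum += w[j]
        out.append(pred[j])
    return out
```

Calling the step once per observation date, starting from particles drawn around the open-loop prior, yields a posterior cloud whose mean serves as the state estimate; no Jacobian of the observation operator is needed, which is what makes the PF attractive for logistic-type responses.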
Finally, the novel tools proposed in this research have the potential to support near-real-time monitoring of phenology [
108,
111], which would improve irrigation management [
108], water use efficiency [
110], and agronomic practices such as pruning and fertilization [
47]. Moreover, DA-PhenM could also be coupled with crop and primary productivity models for yield and carbon balance predictions to optimize resource use [
97,
112].
5. Conclusions
Data assimilation involves the integration of different data sources to improve the modeling process. In this research, a phenological model based on NDVI (RS-PhenM) was optimized, and three phenological models based on meteorological data (W-PhenM) and two novel models based on data assimilation (DA-PhenM) were evaluated.
RS-PhenM performs well in identifying the three most critical phenological stages of wine grapes: budburst, flowering, and veraison. Additionally, the setting stage (fruit onset) was successfully included in the evaluation, contributing to the development of RS-PhenM. The performance of RS-PhenM is supported by the two-step noise removal process that was applied: first, filtering with the Savitzky–Golay algorithm, and second, fitting the NDVI to an asymmetric Gaussian model.
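The first noise-removal step named here, Savitzky–Golay filtering, fits a low-order polynomial in a sliding window by least squares and keeps the fitted value at the window centre. A minimal NumPy sketch follows; the window half-width and polynomial order are hypothetical defaults, the ends of the series are simply left unfiltered, and the filter settings actually used for RS-PhenM are not stated in this section. In practice, `scipy.signal.savgol_filter` provides the same filter with proper edge handling.

```python
import numpy as np


def savgol_coeffs(half_window, polyorder):
    """Convolution weights that return the least-squares fitted value at the window centre."""
    offsets = np.arange(-half_window, half_window + 1)
    # Design matrix with columns offsets**0 ... offsets**polyorder
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse estimates the constant term, i.e. the value at offset 0
    return np.linalg.pinv(A)[0]


def savgol_smooth(y, half_window=3, polyorder=2):
    """Smooth a 1-D NDVI series; points within half_window of either end are left unchanged."""
    y = np.asarray(y, dtype=float)
    c = savgol_coeffs(half_window, polyorder)
    out = y.copy()
    for i in range(half_window, len(y) - half_window):
        out[i] = np.dot(c, y[i - half_window:i + half_window + 1])
    return out
```

For a 5-point window with a quadratic (`half_window=2`, `polyorder=2`) the weights reduce to the classic (−3, 12, 17, 12, −3)/35, and any polynomial of degree up to `polyorder` passes through the filter unchanged, which is why the filter removes noise while preserving the shape of the seasonal curve.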
The evaluation of the W-PhenMs yielded satisfactory performance in terms of root mean square error (RMSE). However, considering the required parameters, the General Flowering Veraison (GFV) model performed better than the Caffarra and Eccel model (CaEc), although CaEc could be parameterized to minimize the required parameters and simplify its practical application. Additionally, this study contributes by showing that the CaEc model can simulate the flowering and veraison stages of Cabernet Sauvignon, which had previously been reported only for Chardonnay.
Two models based on data assimilation through the Extended Kalman Filter (EKF) algorithm, EKF-GFV and EKF-CaEc, were evaluated. The DA-PhenMs are a novel contribution to the phenological modeling of Vitis vinifera L. cv. Cabernet Sauvignon, since both models performed well compared with models assessed for wheat, rice, and diverse forest formations. However, the application of the proposed models is limited to local conditions due to the small number of phenological observations used (three seasons). Additionally, other variables that influence the phenology of wine grapes were not considered, such as photoperiod, soil texture, microclimatic conditions, and agronomic management. Despite these limitations, the models could be improved by incorporating into the assimilation process leaf area index (LAI) data, additional remote sensing sources such as synthetic aperture radar (SAR), proximal sensors such as phenological cameras, and algorithms better suited to nonlinear processes, such as the Particle Filter (PF). Finally, the proposed approach could contribute to monitoring vineyard phenology, providing an effective tool to optimize water consumption, irrigation management, agronomic practices, and yield prediction.
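The EKF-based models summarised above can be illustrated with a scalar sketch, assuming that the state is a GFV-style thermal sum (Tb = 0 °C) and that the observation is a hypothetical development index linked to the thermal sum by a logistic operator. The operator, its parameters (`s_mid`, `scale`), and the noise variances are illustrative assumptions; the actual observation model and covariances of EKF-GFV are not reproduced here.

```python
import math


def h_obs(s, s_mid=1300.0, scale=150.0):
    """Hypothetical observation operator: thermal sum (°C·day) -> development index in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(s - s_mid) / scale))


def h_jacobian(s, s_mid=1300.0, scale=150.0):
    """Derivative of h_obs with respect to the thermal sum, used to linearise the update."""
    g = h_obs(s, s_mid, scale)
    return g * (1.0 - g) / scale


def ekf_step(s, p, t_mean, q, y_obs=None, r=None, t_base=0.0):
    """One EKF cycle for a scalar thermal-sum state with GFV-style forcing.

    Predict: s <- s + max(0, Tmean - Tb); the transition is linear in s (F = 1),
    so the variance grows as p <- p + q.
    Update (only when an observation is available): linearise h_obs at the prediction.
    """
    s_pred = s + max(0.0, t_mean - t_base)
    p_pred = p + q
    if y_obs is None:
        return s_pred, p_pred
    H = h_jacobian(s_pred)
    S = H * p_pred * H + r          # innovation variance
    K = p_pred * H / S              # Kalman gain
    s_new = s_pred + K * (y_obs - h_obs(s_pred))
    p_new = (1.0 - K * H) * p_pred
    return s_new, p_new
```

Because `h_obs` is nonlinear, the linearised gain can overshoot when the prior is far from the observed state: a prior of 1100 °C·day with variance 200², updated with an observation of 0.5 (which corresponds to 1300 °C·day under this operator) and an observation variance of 0.01², lands above 1300 °C·day. This is the situation in which a Particle Filter is expected to help.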