Modeling Phenology Combining Data Assimilation Techniques and Bioclimatic Indices in a Cabernet Sauvignon Vineyard ( Vitis vinifera L.) in Central Chile

: Phenology is a science that is fundamental to crop productivity and is especially sensitive to environmental changes. In Mediterranean and semi-arid climates, vineyard phenology is directly affected by changes in temperature and rainfall distribution, being highly vulnerable to climate change. Due to the signiﬁcant heterogeneity in soil, climate, and crop variables, we need fast and reliable ways to assess vineyard phenology in large areas. This research aims to evaluate the performance of the phenological data assimilation model (DA-PhenM) and compare it with phenological models based on meteorological data (W-PhenM) and models based on Sentinel-2 NDVI (RS-PhenM). Two W-PhenM approaches were evaluated, one assessing eco-and endo-dormancy, as proposed by Caffarra and Eccel (CaEc) and the widely used BRIN model, and another approach based on the accumulation of heat units proposed by Parker called the Grapevine Flowering Veraison model (GFV). The DA-PhenM evaluated corresponds to the integration between RS-PhenM and CaEc (EKF-CaEC) and between RS-PhenM and GFV (EKF-GFV). Results show that EKF-CaEc and EKF-GFV have lower root mean square error (RMSE) values than CaEc and GFV models. However, based on the number of parameters that models require, EKF-GFV performs better than EKF-CaEc because the latter has a higher Bayesian Index Criterion (BIC) than EKF-GFV. Thus, DA-PhenM improves the performance of both W-PhenM and RS-PhenM, which provides a novel contribution to the phenological modeling of Vitis vinifera L. cv Cabernet Sauvignon.


Introduction
Several processes of crop physiology depend on the climate at different time scales. Among them, crop phenology is fundamental as it determines crop performance over a growing season and is commonly used for agricultural planning. An accurate representation of this process is crucial for assessing crop productivity.
Global environmental changes experienced in recent decades and those predicted for the coming years affect the crop phenology, especially in areas with high climate variability, such as the Mediterranean and semi-arid zones [1][2][3][4][5]. Vineyards represent one of the most economically significant agricultural products in these areas. Thus, changes in vineyard phenology will shape management practices for strategic planning in the sector, especially regarding viticultural zoning and the selection of suitable cultivars [6][7][8][9][10].
In the area of phenology, DA has made possible the evaluation of the stage of several biome types, especially those found in the Mediterranean zone [79]. Better predictions are obtained in forest ecosystems in spring (when weather is highly uncertain) [65,80]. Additionally, DA helps to identify gaps in parameter estimation and poor relationships between state variables simulated by meteorological models [81,82].
Given the potential for phenology modeling approaches described above, developing a data assimilation (DA) based model in a vineyard would improve the goodness-of-fit performance of the W-PhenM and LSP. Therefore, this research aims to evaluate a DAbased phenology model that integrates W-PhenM with the Sentinel-2 LSP in a commercial Cabernet Sauvignon vineyard in Central Chile. Figure 1 summarizes the data assimilation approach for phenology modeling in vineyards. Firstly, phenology, meteorological, and micro-meteorological ground data are collected. Secondly, the evaluation of the phenology model based on remote sensing data (RS-PhenM) by applying the Savitzki-Golay filter (SG-Filter), fitting the data to a double Gaussian model, and then the derivation of phenological metrics are carried out. The third step is the evaluation of models based on meteorological data: the Grapevine Flowering Veraison model (GFV) [24], the Caffarra and Eccel approach [27] (CaEc), and the BRIN model [28]. Finally, the assimilation process is performed by the Extended Kalman Filter (EKF) algorithm and evaluation of the assimilated models: assimilated GFV (EKF-GFV) and assimilated CaEc (EKF-CaEc).

Materials and Methods
the estimation of state variables [66]. The Kalman Filter (KF) and its variants have been one of the most widely used algorithms in DA [67][68][69][70].
Crop simulation models have been integrated with remote sensing data of the Leaf Area Index (LAI) and soil moisture (SM) to improve yield prediction in grain crops through KF in linear processes. For nonlinear processes, the Extended Kalman Filter (EKF), Ensemble Kalman Filter (EnKF), and Particle Filter (PF) have been adopted [71][72][73][74][75][76][77]. In vineyards, the EnKF and PF have been used to assimilate data from high-resolution thermal infrared sensors and Synthetic Aperture Radar (SAR), with the soil-vegetationatmosphere transfer (SVAT) model to improve soil moisture modeling at the surface and root zone levels [78].
In the area of phenology, DA has made possible the evaluation of the stage of several biome types, especially those found in the Mediterranean zone [79]. Better predictions are obtained in forest ecosystems in spring (when weather is highly uncertain) [65,80]. Additionally, DA helps to identify gaps in parameter estimation and poor relationships between state variables simulated by meteorological models [81,82].
Given the potential for phenology modeling approaches described above, developing a data assimilation (DA) based model in a vineyard would improve the goodness-offit performance of the W-PhenM and LSP. Therefore, this research aims to evaluate a DAbased phenology model that integrates W-PhenM with the Sentinel-2 LSP in a commercial Cabernet Sauvignon vineyard in Central Chile. Figure 1 summarizes the data assimilation approach for phenology modeling in vineyards. Firstly, phenology, meteorological, and micro-meteorological ground data are collected. Secondly, the evaluation of the phenology model based on remote sensing data (RS-PhenM) by applying the Savitzki-Golay filter (SG-Filter), fitting the data to a double Gaussian model, and then the derivation of phenological metrics are carried out. The third step is the evaluation of models based on meteorological data: the Grapevine Flowering Veraison model (GFV) [24], the Caffarra and Eccel approach [27] (CaEc), and the BRIN model [28]. Finally, the assimilation process is performed by the Extended Kalman Filter (EKF) algorithm and evaluation of the assimilated models: assimilated GFV (EKF-GFV) and assimilated CaEc (EKF-CaEc).  Assimilated GFV (EKF-GFV) and assimilated CaEc (EKF-CaEc). Ground data is the reference information that is used for the parameterization of RS-PhenM and W-PhenM. The RS-PhenMs are optimized with SG-Filter and a double Gaussian model. The EKF algorithm builds the DA-PhenM from the RS-PhenM and W-PhenM.

Study Area
The study was carried out in a drip-irrigated vineyard (Vitis vinifera L. cv. Cabernet Sauvignon) during 2017-2018 (S 1 ), 2018-2019 (S 2 ), and 2019-2020 (S 3 ) growing seasons (October-May). The vineyard is located in central Chile, 30 km south of Santiago. This region is characterized by a Mediterranean climate with a mean annual temperature of Remote Sens. 2023, 15, 3537 5 of 25 12.2 • C, a mean temperature in January (summer) of 19.1 • C, and a mean in June of 5.6 • C. Precipitation is concentrated in winter (June-September) with an average annual total of 280 mm and average total reference evapotranspiration of 485 mm. Irrigation is the primary water source during the growing season because precipitation is concentrated in the austral winter (June-August).
The vineyard was planted in 2010. Rows are oriented north-south, with a spacing of 2.5 m between rows and 1.0 m between vines. Inter-rows are maintained vegetationfree using mechanical and chemical weed control measures. Water is applied by drip irrigation during the season. The irrigation time is calculated as a function of reference evapotranspiration (ET 0 ). Usually, the grower sets irrigation to restore 50% of ET 0 every seven days throughout the season. Due to the prevailing drought in the winter of 2019 and reduced water availability for irrigation in S 3 , irrigation time was set to restore 25% of ET 0 . Canopy management also varied among the growing seasons. In S 1 , the trellis system was vertical-shoot positioned with three-wire lines, while in S 2 and S 3 , the training system was structured without wire lines, increasing the frequency of topped and trimmed during the growing season.

Ground Data
Micrometeorological data were obtained using an eddy covariance system (EC), measuring energy and mass exchange between the vineyard and the atmosphere [83]. Due to the prevailing wind direction during the daytime, a west-facing EC tower was installed on the east border of the study area ( Figure 2, with coordinates 33 • 42 16 S and 70 • 34 32 W. Installed sensors, data processing, and quality control are described in detail in Ref. [84].

Study Area
The study was carried out in a drip-irrigated vineyard (Vitis vinifera L. cv. Cabernet Sauvignon) during 2017-2018 (S1), 2018-2019 (S2), and 2019-2020 (S3) growing seasons (October-May). The vineyard is located in central Chile, 30 km south of Santiago. This region is characterized by a Mediterranean climate with a mean annual temperature of 12.2 °C, a mean temperature in January (summer) of 19.1 °C, and a mean in June of 5.6 °C. Precipitation is concentrated in winter (June-September) with an average annual total of 280 mm and average total reference evapotranspiration of 485 mm. Irrigation is the primary water source during the growing season because precipitation is concentrated in the austral winter (June-August).
The vineyard was planted in 2010. Rows are oriented north-south, with a spacing of 2.5 m between rows and 1.0 m between vines. Inter-rows are maintained vegetation-free using mechanical and chemical weed control measures. Water is applied by drip irrigation during the season. The irrigation time is calculated as a function of reference evapotranspiration (ET0). Usually, the grower sets irrigation to restore 50% of ET0 every seven days throughout the season. Due to the prevailing drought in the winter of 2019 and reduced water availability for irrigation in S3, irrigation time was set to restore 25% of ET0. Canopy management also varied among the growing seasons. In S1, the trellis system was verticalshoot positioned with three-wire lines, while in S2 and S3, the training system was structured without wire lines, increasing the frequency of topped and trimmed during the growing season.

Ground Data
Micrometeorological data were obtained using an eddy covariance system (EC), measuring energy and mass exchange between the vineyard and the atmosphere [83]. Due to the prevailing wind direction during the daytime, a west-facing EC tower was installed on the east border of the study area ( Figure 2, with coordinates 33°42′16″S and 70°34′32″W. Installed sensors, data processing, and quality control are described in detail in Ref. [84].  Meteorological data were collected from William Fevre agrometeorological station, located 4 km north of the study area (33 • 67 S and 70 • 58 W). The station records solar radiation (MJ m −2 day −1 ), air temperature ( • C), relative humidity (%), wind speed (m s −1 ), and precipitation (mm).
During the three seasons, phenological observations were recorded every seven days using the modified Eichhorn and Lorenz (E-L) scale [85]. The E-L system identifies the main grapevine development stages (Table 1), and for the present research, Budburst (4), Flowering (23), Setting (27), and Veraison (35) were evaluated.

The Remote Sensing Phenological Model (RS-PheM)
Remote sensing data were obtained from the Sentinel-2 mission, with a spatial resolution of 10 m, a radiometric resolution of 12 bit, and a temporal frequency of 5 days at latitudes near the equator and 2-3 days at mid-latitudes [86]. The spectral bands used correspond to 4 (σ red ) and 8 (σ nir ), whose wavelength ranges are 0.64-0.70 µm and 0.73-0.93 µm, respectively. Remote sensing data were filtered by date during the growing season and by the percentage of cloudiness, selecting those with 30% or less, resulting in 96 images for the three seasons (Table 2). The calculation of the Normalized Difference Vegetation Index (NDVI) was supported by Google Earth Engine (GEE) [87]. GEE is based on JavaScript code for geospatial analysis allowing data assimilation with minimum computational capabilities.

NDVI Time Series Smoothing
Although images are filtered by cloud cover, they still maintain a significant noise level due to atmospheric conditions. The noise is evident by abrupt changes in NDVI values across the time series and does not correspond to the gradual variations of the vegetation during the growing season [88] due to atmospheric and local factors such as soil moisture conditions and the woody structure of perennial crops.
The intra-seasonal and inter-seasonal NDVI time series were smoothed. The smoothing process improves the identification of NDVI changes, which allows relating it with the phenological changes observed in the vineyard and its application according to the criterion of the first and second derivative for the definition of the LSP.
Although several noise removal methods have been developed, there is no agreement on the best method to use. However, the Savitzky-Golay filter has been successfully used in many NDVI-based studies for the assessment of vineyard LAI. Therefore, based on the good fit between LAI and phenology, the algorithm Savitzky and Golay [89] proposed for intraseasonal time series was used, which is a least-squares adjustment between consecutive values obtained by a weighted moving average given as a polynomial of a certain degree. The general equation is given by: where Y is the original NDVI value, Y* is the resultant NDVI value, C i is the coefficient for the ith NDVI value of the filter (smoothing window), and N is the number of convoluting integers and is equal to the smoothing window size. The subindex j is the running index of the original ordinate data table. The smoothing array (filter size) consists of 2m + 1 points, where m is the half-width of the smoothing window.
To smooth inter-seasonal time series, we used a second-degree asymmetric Gaussian model with the general equation: where x is the day of the year (DOY) and a 1 , a 2 , b 1 , b 2 , c 1 , and c 2 are parameters fitted to the NDVI time series. For convenience, DOY starts on 1 July of each year (DOY Jul = 1) and ends on 30 June of the following year (DOY Jul = 365) because the growing season starts in the southern hemisphere in September and ends in May next year. This adjustment of the DOY definition facilitates the evaluation of the models and allows comparison with results obtained in the northern hemisphere.

Remote Sensing Phenology Metrics
The metrics to identify NDVI variations associated with changes in the phenology were determined using a modification of the methodology proposed by Ref. [48] in Portugal vineyards. First, NDVI was adjusted to a seven-parameter double logistic model. Second, the first (δ 1 ) and second (δ 2 ) derivatives were calculated to obtain the phenological metrics (RS-PhenM). Third, the inflection points, local maximum, and minimum were identified. Therefore, by interpolation, the date (DOY jul ) of the Start of the Season (SOS), Left Inflection Point (LIP), Maximum Canopy Development (MCD), and Right Inflection Point (RIP) were estimated. These four NDVI phenological metrics were linked to the stages Budburst (4), Flowering (23), Setting (27), and Veraison (35), respectively (Table 3). Table 3. Phenological Metrics and their Equivalence in Phenological Stages according to first and second derivatives from a Third-Degree Asymmetric Gaussian model applied to NDVI data.

Grapevine Flowering Veraison Model (GFV)
The GFV is a model that assumes a phenological phase occurs when a critical value (F*) of the forcing variable (S f ) is reached at a time (t s ): where R f is the daily sum of the forcing rate, starting on a day of the year (t 0 ), which in the Northern Hemisphere is 1 March (DOY = 60) and in the Southern Hemisphere is 29 August (DOY Jul = 60), and x t is the daily mean temperature. The forcing rate in the GFV model is a function of the base temperature (T b ) of 0 • C, and the following criteria are applied: The BRIN model estimates the date when bud break occurs in vineyards. This model combines two phenological models, one associated with the endo-dormancy period [90] and the other with eco-dormancy [26]. Therefore, the bud break date (N bb ) occurs when the critical sum (G c ) of cumulative growing degree hours (A c ) since dormancy break (N db ) is reached: For the calculation of the growing degree hours (GDH), the hourly temperature of day n [T(h,n)] is estimated by linear interpolation between the maximum temperature of day n [T nx (n)] and the minimum temperature of the following day [T n (n + 1)]. Therefore, assuming that the length of the day is 12 h, it follows that: Equations (6) and (7) show that both the base temperature (T 0Bc ) and the maximum of the eco-dormancy period (T MBc ) limit the A c response; consequently: The BRIN model assumes that T 0Bc = 5 • C before bud break and T MBc = 25 • C. Additionally, dormancy break (N db ) occurs when a critical number (C C ) of chilling units (C U ) counted from 1 March (when buds are dormant) is reached. The C U is calculated based on the Q 10 concept, where an arithmetic progression of 10 • C temperature causes an action with a geometric regression of the Q 10 ratio:

Caffarra and Eccel (2010) Model (CaEc)
The model proposed by Caffarra and Eccel (CaEc) has two components: (a) one describing bud break based on the model of [25], where chilling hours act on the release of endo-dormancy, and (b) describing flowering and veraison as the result of the accumulation of forcing units by a sigmoidal function.
In this regard, the bud break of the CaEc model is based on the following equations: where F crit is the critical forcing units to reach the phenophases of interest; ForcState is the accumulated forcing units; ChState is the accumulated chilling units; T m is the mean daily air temperature; and a, b, c, e are curved shape parameters.
On the other hand, the model CaEc models the date of flowering and veraison according to the equation:

Parameterization and Evaluation of W-PhenM
The W-PheM were parameterized using the Phenology Modelling Platform (PMP) [91], which is free downloaded software (https://www.cefe.cnrs.fr/fr/recherche/ef/forecast/ phenology-modelling-platform, accessed on 25 April 2022) developed with the purpose of fitting phenology model parameters. PMP uses an optimization algorithm based on the simulated annealing method [92], which simultaneously adjusts all model parameters, obtaining effective overall convergence despite the interdependence among phenological model parameters.
To optimize the model evaluation based on data assimilation, the W-PhenM was parameterized with a set of phenological observations obtained from the National Network of Phenology Observatories (TEMPO) of France (https://data.pheno.fr/, accessed on 27 May 2022), which compiles the phenological database of France. Hence, the data were  (Table 4) (https://www.ncei.noaa.gov/ products/land-based-station/global-historical-climatology-network-daily, accessed on 15 February 2022). For the data of this study, models were evaluated using data from the William Fevre station.

Phenological Model Based on Data Assimilation (DA-PhenM)
Phenological data assimilation takes the system model's (W-PhenM) predictions and updates them with the observation model's (RS-PhenM) outputs. The processes described by both RS-PhenM and W-PhenM are nonlinear. DA algorithms for these processes, such as the Particle Filter (PF), require many observations and high computing capacity, making them complex to implement. Less complex algorithms, such as the Kalman Filter (KF), are implemented in linear systems and are suitable for nonlinear processes. The Extended Kalman Filter (EKF) is a modification of the KF, which incorporates Jacobians or partial derivatives to linearize nonlinear systems. In this regard, the prediction of the state variable is given by the following state-space model of the system: where the subscripts k and k − 1 are the current and previous time, respectively; x is the state variable of the system (e.g., cumulative forcing units); u is the driving input of the system (e.g., daily mean temperature); v is the process noise, assuming it is normally distributed with a mean of 0 and variance equal to Q k−1 (v~N(0, Q k−1 )); A is the matrix that describes the transition of the state variable between time k − 1 and k; and B is the matrix that describes the change in the system state from time k − 1 to k due to the effect of the driving variables. Additionally, differing from the KF, the system's nonlinear equations are linearized in matrix A by calculating the partial derivatives of each state variable versus time (Jacobians). Similarly, in matrix B, the partial derivatives are calculated for the state variables with respect to the driving variables of the system. Additionally, the DA-PhenM and hence the EKF process involve an observation model, which estimates the measurement at time k (y k ) from the prediction of the state variable at the same time, with the general expression given by: where H is the observation matrix used to estimate the sensor observation (e.g., Sentinel-2) at time k, and w is the measurement noise, assumed to be w~N (0, R k ). The NDVI from Sentinel-2 measurements is fitted to a nonlinear function (Equation (2)), so the matrix H is calculated from the Jacobian of the NDVI as a function of time.
After the state-space model of the system (Equation (15)) and the observation model (Equation (16)) are derived, the DA-PhenM model is run iteratively, assuming the initial conditions of the system for x k−1 (e.g., x k−1 = 0 forcing units) and u k−1 (e.g., the temperature at time k − 1).

Model Assessment
All models (RS-PheM, W-PheMs, and DA-PheMs) are evaluated using the following metrics to quantify the goodness of fit to the observed phenological data: • Root mean squared error (RMSE): where y i is the observed,ŷ i is the simulated data, and n is the number of observations. The RMSE has the constraint that it is sensitive to outliers. However, outliers decrease when the systematic error is reduced. On the other hand, the RMSE has the advantage of quantifying the error in relative terms, allowing intercomparison between models. In data assimilation, the RMSE is considered an objective function that must be minimized to fit the model parameters.

•
Model Efficiency (EF): where y is the average of observed dates. The model efficiency is the ratio between the model error (MSE) and the MSE of the average of observed dates. Therefore, EF ≈ 1 refers to a perfect model, while EF ≈ 0 means that the average of the observations is a better predictor than the model.   (19) where RMSE DA is the RMSE of DA-PhenM (EKF-GFV and EKF-CE); RMSE WM is the RMSE of W-PhenM (GFV and CE). DA skill is an indicator that allows comparison only between DA-PhenM since it shows the degree of the RMSE changes without DA and with DA. Positive values indicate that DA-PhenMs improve the prediction of W-PhenMs, while negative values show that DA-PhenMs do not improve the prediction of W-PhenMs.

•
Bayesian Information Criterion (BIC). According to [66], the BIC corrected for small samples is given by: where σˆ2 ML is the estimator of the maximum likelihood of the residual variance, which is given by: The BIC allows evaluating models with different parameters to calculate, as in the W-PhenM, RS-PhenM, and DA-PhenM. Therefore, the BIC favors the simpler models since it has a component that penalizes the number of parameters. Thus, when comparing the BIC of the models, the best model is the one with the smallest value.
The evaluation of the models will be performed in two stages. In the first stage, the RMSEs are compared. The model with the best performance is the one with the lowest RMSE. In the second stage, the models are compared according to the BIC, with the best performance being the lowest BIC. Although DA-PhenMs are expected to have the highest BIC value, the best model based on data assimilation will be the one with the lowest DA skill .

Environmental Drivers of Wine Grape Phenology
The environmental drivers of wine grape phenology addressed in this research are temperature and evapotranspiration. Figure 3 shows the average daily temperature of the evaluated seasons from DOY jul 60 (29 August) to DOY jul 275 (1 April). The largest temperature difference among seasons occurs at the beginning of the season, between DOY jul 60 and 90 (28 September), which is in line with the high interannual variability of average air temperature in September, accounting for a coefficient of variation (CV%) of 69.2%, according to the records of William Fevre station between 2016 and 2022.
The interannual variability of phenology is closely associated with air temperature one or two months before each stage [7,12,93]. Table 5 shows that in S 3 , the budburst stage occurs nine days earlier than S 2 and four days earlier than S 1 , which responds to a higher GDD ac due to warmer temperatures in late winter (August-September). Additionally, flowering in S 1 is delayed by 13 and 12 days compared to S 2 and S 3 , while veraison in S 3 is delayed by eight and five days compared to S 1 and S 2 . Figure 4 shows the cumulative evapotranspiration of wine grapes from DOY jul 124 (1 November) to 274 (31 March). The S 1 had the highest water consumption of 316.8 mm, S 2 with 304.5 mm, and S 3 had the lowest water consumption of 202.6 mm. Evapotranspiration depends on irrigation management. S 1 and S 2 were irrigated according to the vineyard water demand or crop evapotranspiration (ET c ), 322.8 mm in S 1 and 314.8 mm in S 2 , corresponding to 98% and 97% of ET c , respectively. flowering in S1 is delayed by 13 and 12 days compared to S2 and S3, while veraison in S3 is delayed by eight and five days compared to S1 and S2.   2 GDD is the growing degree days since 1 July product of GDD = Tm − Tb. Tb is the base temperature set to 10 °C. Figure 4 shows the cumulative evapotranspiration of wine grapes from DOYjul 124 (1 November) to 274 (31 March). The S1 had the highest water consumption of 316.8 mm, S2 with 304.5 mm, and S3 had the lowest water consumption of 202.6 mm. Evapotranspiration depends on irrigation management. S1 and S2 were irrigated according to the vineyard water demand or crop evapotranspiration (ETc), 322.8 mm in S1 and 314.8 mm in S2, corresponding to 98% and 97% of ETc, respectively.  1 DOY refers to the number of days since 1 July. 2 GDD is the growing degree days since 1 July product of GDD = T m − T b . T b is the base temperature set to 10 • C. Figure 5 shows the NDVI time series before (Figure 5a-c) and after applying the Golay-Savitzki filter (Figure 5d-f). Once the filter is used, the noise removal has been evident since mid-December, when the NDVI decreases. Furthermore, the filtered curves preserve their contours related to maximums and the pattern of changes, an essential feature of the Golay-Savitzki filter [88,94,95] and for calculating phenological metrics from remote sensing data. However, it has been reported that this method is subject to NDVI overestimation when the noise is strong [94][95][96]. However, to mitigate this error, it used images with a cloud cover percentage lower than 30%.

Phenological Model Based on Remote Sensing Data (RS-PhenM)
On the other hand, Figure 6 shows the NDVI fitted to a Gaussian model (Figure 6a-c) and their respective first (Figure 6d-f) and second derivatives (Figure 6g-i), which eases the identification of the maximums, minimums, and inflection points of the curves. Figure 6a-c show the variability of NDVI from one season to another, where it is evident that in S 3 , the peak is lower (0.363) than in S 1 (0.553) and S 2 (0.502). Minimum NDVI values, S 1 (0.21) and S 2 (0.195), were lower than S 3 (0.252), which results in a higher amplitude in S 1 -S 2 and lower oscillation in S 3 . Despite the differences in NDVI among seasons, extreme values are consistent with those reported in vineyards with MODIS [53] and SPOT [47] remote sensing data. Table 6 shows the Root Mean Square Error (RMSE) in days and the Bayesian Information Criterion (BIC) of the Remote Sensing Phenological Model (RS-PhenM), represented by the phenological metrics from NDVI. Therefore, the budburst stage is given when the first derivative (δ 1 ) is equal to zero (Figure 6d-f) and the second derivative (δ 2 ) is positive (Figure 6g-i), yielding an RMSE of 4.8 days.  . Vertical dotted lines refer to each phenological stage's earliest and latest observed dates. Cumulative ET in S3 is lower than in S2 and S1. In the flowering window, cumulative ET in S3 was about 20% less than cumulative ET in S2 and S1, while in the veraison window, the difference in S3 compared to S1 and S2 was about 34%. Figure 5 shows the NDVI time series before (Figure 5a-c) and after applying the Golay-Savitzki filter (Figure 5d-f). Once the filter is used, the noise removal has been evident since mid-December, when the NDVI decreases. Furthermore, the filtered curves preserve their contours related to maximums and the pattern of changes, an essential feature of the Golay-Savitzki filter [88,94,95] and for calculating phenological metrics from remote sensing data. However, it has been reported that this method is subject to NDVI overestimation when the noise is strong [94][95][96]. However, to mitigate this error, it used images with a cloud cover percentage lower than 30%.

Phenological Model Based on Remote Sensing Data (RS-PhenM)
On the other hand, Figure 6 shows the NDVI fitted to a Gaussian model (Figure 6ac) and their respective first (Figure 6d-f) and second derivatives (Figure 6g-i), which eases the identification of the maximums, minimums, and inflection points of the curves. Cumulative ET in S 3 is lower than in S 2 and S 1 . In the flowering window, cumulative ET in S 3 was about 20% less than cumulative ET in S 2 and S 1 , while in the veraison window, the difference in S 3 compared to S 1 and S 2 was about 34%.    Figure 6a-c show the variability of NDVI from one season to another, where it is evident that in S3, the peak is lower (0.363) than in S1 (0.553) and S2 (0.502). Minimum   5 Grapevine Flowering Veraison model. 6 Caffarra and Eccel Approach, 2010. 7 Assimilated GFV (EKF-GFV) and 8 assimilated CaEc (EKF-CaEc).
The flowering stage is the left inflection point of the NDVI curve, where the rate of NDVI changes and δ 1 is the primary maximum and δ 2 = 0. Therefore, the flowering stage based on the NDVI gives an RMSE of 22.3 days, which, in contrast to budburst, is a high value compared to Cunha's results, in which applying the inflection point criterion, the error reported ranges between 3 and 6.5 days and showed a correlation coefficient of 0.81. The higher RMSE shown in flowering could be related firstly to a short time interval between flowering and setting (14 to 20 days), and secondly to the lack of coordination between the phenological observation in the field and the satellite overpass, which could be amplified due to unfavorable atmospheric conditions that block the data capture by the satellite. Table 7 shows the parameterization goodness-of-fit of BRIN, GFV, and CaEc models, while Appendix B lists the parameters used to evaluate the W-PhenM. In the budburst stage, the CaEc model performs better than BRIN, while in the flowering and veraison stages, the GFV model achieves better goodness-of-fit than the CaEc model. Furthermore, the models evaluated in S 1 , S 2 , and S 3 during the budburst stage maintain the trend in parameterization, where CaEc performs better than BRIN. However, considering the number of parameters required by each model, BRIN performs better with a BIC of 12.9, in contrast to 18.0 for CaEc (Table 6).

Phenological Models Based on Weather Data (W-PhenM)
Evaluating models in the flowering and veraison stages, CaEc has lower prediction error than GFV. Despite that, GFV performs better than CaEc, since it requires fewer parameters. Thus, BIC from the GFV model is lower in flowering (11.5) and veraison (12.4). The GFV model has better performance in flowering (RMSE = 10.7 days) than in veraison (RMSE = 20.8 days), which agrees with [24,97]. Still, the prediction error reported by those authors is lower, ranging between 4 and 5 days in flowering and between 5 and 7 days in veraison.

Phenological Models Based on Data Assimilation (DA-PhenM)
Data assimilation aims to improve predictions through the enhancement of goodnessof-fit metrics. Thus, in Figure 7, the EKF-CaEc improves the prediction of the budburst stage for CaEc, which means a reduction in RMSE (DA skill ) of 10%. EKF-GFV has the best flowering performance, while EKF-CaEc performs best in veraison (Table 6). However, the simplest model is EKF-GFV, with the lowest BIC in both phenological stages. EKF-GFV improves GFV performance in flowering and veraison, giving it a DA skill of 42.9% and 19.7%, respectively. Furthermore, EKF-CaEc improves CaEc prediction in flowering with DA skill of 19.7% and in veraison with DA skill of 5%.
The phenological models based on data assimilation proposed herein are novel in the phenological modeling of Vitis vinifera L. cv Cabernet Sauvignon since they are based on Sentinel-2 remote sensing data and models based on forcing (GFV) and chilling (CaEc) units. The results are consistent with data assimilation approaches to improve phenology and yield simulation in wheat and rice at regional scales with MODIS remote sensing data [98,99] and coupled to crop simulation models such as the World Food Study (WOFOST) [97]. Additionally, the improved predictions are similar in error magnitude to that reported with Sentinel-2 data in tropical dry forests and temperate deciduous forests [100], bamboo forests [101], and several global-scale biomes [65,79] with MODIS remote sensing data.

Discussion
Interannual variability in budburst, flowering, and veraison dates is reported by [102] with variations of eight days in budburst, while Ref. [13] showed variations of 19 days in budburst, nine days in flowering, and 13 days in veraison. On the other hand, the dates reported in Table 5 are consistent with those observed for cv. Cabernet Sauvignon in Central Chile between 2004 and 2006 [102] and 2009 and 2013 [31].
On the other hand, the irrigation in S3 only accounted for 65% of the ETc (313 mm) due to Central Chile's drought, responsible for a diminishing water supply in channels. Although water consumption dropped in S3, budburst and flowering dates were similar to S1 and S2, while veraison was delayed compared to the previous seasons. Thus, water stress is reported to have a greater impact at the berry formation stage (E-L 27) [103]. However, water availability in grapevine phenology is coupled with other environmental variables, such as the soil type and temperature [20,21]. The fitted Gaussian model provides a daily time series, increasing the accuracy in estimating phenological stages [95]. Additionally, it is a valuable tool for identifying interannual NDVI variations since curve parameters allow a valid estimation for large areas [104][105][106]. However, it cannot identify specific dates of phenological stages around the curve peaks [105].
Regarding phenological metrics extracted from NDVI, the days between the Start of the Season (SOS) and Maximum Canopy Development (MCD) in S1 and S2 were 70 and 72 days, respectively. In comparison, in S3, it was 55 days, which contrasts with an average of 102 days reported by Ref. [53] in Washington State and an average of 90 days reported by Ref. [47] in the Douro region of Portugal. In both Washington State and the Douro Region, the vineyards are under a rainfed system, where the total annual rainfall is around 300 mm and 580 mm, respectively. In our study area, the vineyard is under irrigation, particularly in S3, which consumed 65% of the ETc, equivalent to 203 mm. The latter suggests that the phenological metrics derived from NDVI are related to the water available for wine grapes, thus defining the extremes of NDVI and the duration of the periods based on the intra-annual behavior of the vegetation index. Taking into account that prior to and during budburst, the vineyard is transparent to NDVI, the vineyard surface is characterized by the presence of vegetation in the inter-row area and high soil water content as a result of winter precipitation, which is explained by a correlation coefficient of −0.88 reported by Ref. [47].
The RMSE of veraison based on the NDVI criterion is 8.6 days; this criterion differs from Ref. [49], which pointed out that veraison correlates to the local maximum of the

Discussion
Interannual variability in budburst, flowering, and veraison dates is reported by [102] with variations of eight days in budburst, while Ref. [13] showed variations of 19 days in budburst, nine days in flowering, and 13 days in veraison. On the other hand, the dates reported in Table 5 are consistent with those observed for cv. Cabernet Sauvignon in Central Chile between 2004 and 2006 [102] and 2009 and 2013 [31].
On the other hand, the irrigation in S 3 only accounted for 65% of the ET c (313 mm) due to Central Chile's drought, responsible for a diminishing water supply in channels. Although water consumption dropped in S 3 , budburst and flowering dates were similar to S 1 and S 2 , while veraison was delayed compared to the previous seasons. Thus, water stress is reported to have a greater impact at the berry formation stage (E-L 27) [103]. However, water availability in grapevine phenology is coupled with other environmental variables, such as the soil type and temperature [20,21]. The fitted Gaussian model provides a daily time series, increasing the accuracy in estimating phenological stages [95]. Additionally, it is a valuable tool for identifying inter-annual NDVI variations since curve parameters allow a valid estimation for large areas [104][105][106]. However, it cannot identify specific dates of phenological stages around the curve peaks [105].
Regarding phenological metrics extracted from NDVI, the days between the Start of the Season (SOS) and Maximum Canopy Development (MCD) in S1 and S2 were 70 and 72 days, respectively. In comparison, in S3, it was 55 days, which contrasts with an average of 102 days reported by Ref. [53] in Washington State and an average of 90 days reported by Ref. [47] in the Douro region of Portugal. In both Washington State and the Douro Region, the vineyards are under a rainfed system, where the total annual rainfall is around 300 mm and 580 mm, respectively. In our study area, the vineyard is under irrigation, particularly in S3, which consumed 65% of the ETc, equivalent to 203 mm. The latter suggests that the phenological metrics derived from NDVI are related to the water available for wine grapes, thus defining the extremes of NDVI and the duration of the periods based on the intra-annual behavior of the vegetation index. Taking into account that prior to and during budburst, the vineyard is transparent to NDVI, the vineyard surface is characterized by the presence of vegetation in the inter-row area and high soil water content as a result of winter precipitation, which is explained by a correlation coefficient of −0.88 reported by Ref. [47].
The RMSE of veraison based on the NDVI criterion is 8.6 days; this criterion differs from Ref. [49], which pointed out that veraison correlates to the local maximum of the NDVI curve, showing a correlation coefficient of 0.87. However, it should be considered for this research that the NDVI was fitted to an asymmetric Gaussian model. In contrast, Ref. [47] study was fitted to a double logistic model and did not conclude, due to lack of evidence, on the phenological meaning of the right inflection point of the NDVI curve. The present research proposes a phenological Vitis vinifera L. cv model. Based on Sentinel-2 NDVI data, Cabernet Sauvignon is fitted to an asymmetric double Gaussian model identifying the budburst, flowering, setting, and veraison stages. The proposed model should be improved considering the NDVI time series longer than four seasons and further fine-tuning of the criteria to identify the flowering stage to reduce estimation error.
In evaluating W-PhenM, BRIN model performance is slightly better than those reported for Cabernet Sauvignon by Ref. [28] (RMSE = 9.7 days) and Ref. [29] (RMSE = 11.1), although Ref. [34] obtained a better performance with RMSE around 6.0 days. Compared to the CaEc model, mixed results have been reported, with better performance with RMSE between 4.5 and 5.7 days in Cabernet Sauvignon [34] and a higher RMSE of 22.8 days in Chardonnay [37]. In addition, the higher efficiency of CaEc compared to BRIN in the budburst phase is due to the ability of CaEc to capture the behavior of the system in the eco-dormancy phase, which is supported by the results shown by [27], where the CaEc model explains about 40% of the variance in the modeled budburst date and about 30% of the observed budburst date variance.
This research evaluated models that consider endo-dormancy. Since most climate change scenarios predict an increase in temperature at the end of winter, this group of models would have higher accuracy in predicting budburst [7,34]. However, assessments of the current climate have concluded that models based on forcing units, such as Degrees Growing Days with a base temperature of 5 • C (GDD 5 ) and 10 • C (GDD 10 ), predict the budburst date better. Thus, models such as BRIN and CaEc do not provide higher accuracy despite the higher number of parameters required [24,28,29,34].
On the other hand, Ref. [37] reported higher RMSE values in flowering and lower in veraison. In addition, the CaEc model has not been evaluated for Cabernet Sauvignon in the flowering and veraison stages. Hence, the reference evaluations apply to the Chardonnay variety. In this regard, the errors reported for CaEc are consistent with those obtained, around seven days for flowering and five days for veraison, with better performance than GFV.
The GFV model is more efficient than the CaEc model, especially in the veraison satge. The good performance of the GFV model is likely due to the 0 • C T b used by the model, which can encompass some important physiological processes not captured by the CaEc model and, as [24] points out, T b = 0 • C is a threshold that allows the convergence of the thermal sum simultaneously in the flowering and veraison stage.
The differences obtained in flowering and veraison prediction are likely due to errors in selecting external parameterization data, which involve differences in soil texture, available water, rootstock, pruning, and micrometeorological conditions that are not necessarily represented by the selected weather stations [37,96]. In addition, the performance of models reflects the amount of data used in the evaluation, such as Refs. [24,107], who had for the flowering stage, 70 and 62 observations for calibration and validation, respectively, while for the veraison stage, they had 66 (calibration) and 105 (validation) observations, which is proportional to the regional validity of the study.
Despite the promising results, proposed DA-PhenMs are limited to local conditions similar to those performed in this evaluation. On the other hand, there is uncertainty in the climatic reliability of the William Fevre and Montpellier Airport stations because of the influence of microclimatic conditions on vineyard phenology [98]. In addition, this evaluation does not consider variables that determine phenology, such as photoperiod, soil texture, fertility management, and pruning.
However, limitations can be mitigated with the incorporation of proximal sensors such as phenological cameras [101] or remote sensors such as synthetic aperture radars (SAR), which have provided valuable insights into vineyard water balance modeling [71,108]. Regarding assimilation algorithms, the Particle Filter (PF) is more suitable for nonlinear processes [71,99,108]. Therefore, considering CaEc fits a logistic model, data assimilation with CaEc is likely improved with PF, as reported in the phenological modeling of bamboo forests [109]. Moreover, DA-PhenM performance could be enhanced with the inclusion of variables such as the Leaf Area Index (LAI) since it is closely related to phenology [78,79,98,101,110].
Finally, the novel tools proposed in this research have the potential to support nearreal-time monitoring of phenology [108,111], which would improve irrigation management [108], water use efficiency [110], and agronomic practices such as pruning and fertilization [47]. Moreover, DA-PhenM is also coupled to crop and primary productivity models for yield and carbon balance predictions to optimize resource use [97,112].

Conclusions
Data assimilation involves the integration of different data sources to improve the modeling process. In this research, a phenological model based on NDVI (RS-PhenM) was optimized, and three phenological models based on meteorological data (W-PhenM) and two novel models based on data assimilation (DA-PhenM) were evaluated.
RS-PhenM performs well in identifying the three most critical phenological stages of wine grapes, budburst, flowering, and veraison. Additionally, the setting stage (Onset fruit) was successfully included in the evaluation, contributing to the development of RS-PhenM. The performance of the RS-PhenM is supported by the noise removal process that was applied, consisting of two phases, one using the Golay-Savitsky algorithm and the second adjusting the NDVI to an asymmetric Gaussian model.
The evaluation of the W-PhenMs yielded satisfactory performance in terms of root mean square (RMSE). However, considering the required parameters, the General Flowering Veraison (GFV) model performed better than the Caffarra and Eccel model (CaEc). Although the CaEc could be parameterized to minimize the required parameters and simplify its practical application. Additionally, it is worth highlighting the contribution of the ability of the CaEc model to simulate flowering and veraison stages for Cabernet Sauvignon, which had only been reported for Chardonnay cultivars.
Two models based on data assimilation through the Extended Kalman Filter (EKF) algorithm, EKF-GFV and EKF-CaEc, were evaluated. The DA-PhenMs are a novel contribution to the phenological modeling of Vitis vinifera L. cv Cabernet Sauvignon since both models performed well compared to those assessed models in wheat and rice and diverse forest formations. However, the application of the proposed models is limited to the local conditions due to the reduced number of phenological observations utilized (three seasons). Additionally, other variables that influence the phenology of wine grapes were not considered, such as the photoperiod, soil texture, microclimatic conditions, and agronomic management. Despite the limitations of models, improvements could be made by incorporating in the assimilation process the leaf area index (LAI) data, additional remote sensing sources such as synthetic aperture radars (SAR), proximate sensors such as phenological cameras, and algorithms more suited to nonlinear processes such as the Particle Filter (PF). Finally, the proposed approach could contribute to monitoring vineyards' phenology, representing an effective tool to optimize water consumption, irrigation management, agronomic practices, and yield prediction.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
The second step of DA-PhenM is to predict the state variable at time k from the driving and state variables at time k − 1, according to the state-space model (Equation (15)). Next, the third step is to calculate the covariance of the state variable, which is an estimate of the accuracy of the prediction made in the second step, given in the following general form: where P k|k−1 is the covariance of the state variable at time k given the covariance at time k − 1; P k−1|k−1 is the covariance at time k − 1; A T k−1 is the transposed A matrix; and Q k is the noise covariance matrix of the state-space model.
The fourth step of DA-PhenM calculates the residual (y k ) between the actual observation (z k ) and the estimated observation (y k ) according to the following expression: Then, the fifth step calculates the covariance of the residual of the observations (S k ), given by: where H T k is the transpose matrix of H k ; and R k is the matrix of sensor noise covariance. High values of R k are related to high measurement uncertainty, while values close to zero denote a low measurement uncertainty.
The Kalman Gain (K k ) is calculated in the sixth step of the DA-PhenM, which accounts for the magnitude in which the state variable predictions and their covariance must be corrected: The Kalman Gain (K k ) is tunable by adjusting the covariances of the predictions of the system and the observations. Thus, if the measurement noise is large, K k approaches 0, and the measurements have slight weighting on the best estimate. Conversely, if the process noise of the state-space model is high, K k is close to 1; hence, the sensor measurements have higher weights in the state variable prediction.
Finally, the last steps of DA-PhenM involve updating the estimated state variable and its covariance, which are given by the following expressions: .
x k|k = x k + K k y k (A5) P k = (I − K k H k )P k|k−1 (A6) Equation (21) calculates the state of the system after a new measurement of the sensor at time k (updated), based on values from step 2 (x k ), step 4 (y k ), and step 6 (K k ). In Equation (A6), where I is the identity matrix, the covariance of the system is estimated once it is updated with a new measurement.
This research describes the system by the GFV (Equations (3) and (4)) and CaEc (Equations (11)-(14)) models. At the same time, the observations are outlined by the NDVI time series model (Equation (2)). Therefore, two DA-PhenMs, the GFV (EKF-GFV) and CE (EKF-CE) assimilated with NDVI, are evaluated. Appendix C Figure A1. Eddy covariance system oriented in the prevailing direction of daytime winds over the vineyard alignment (north-south). Irgason (infrared gas analyzer and sonic anemometer); PAR (photosynthetically active radiation sensor); Net Radiation (net radiation sensor); T & HR (temperature and relative humidity sensor); TDR (time-domain reflectometry system); Flux Soil (soil sensible heat flux plate system and thermocouples). Figure A1. Eddy covariance system oriented in the prevailing direction of daytime winds over the vineyard alignment (north-south). Irgason (infrared gas analyzer and sonic anemometer); PAR (photosynthetically active radiation sensor); Net Radiation (net radiation sensor); T & HR (temperature and relative humidity sensor); TDR (time-domain reflectometry system); Flux Soil (soil sensible heat flux plate system and thermocouples).