Understanding Vine Hyperspectral Signature through Different Irrigation Plans: A First Step to Monitor Vineyard Water Status

Abstract: The main challenge encountered by Mediterranean winegrowers is water management. Indeed, with climate change, drought events are becoming more intense each year, dragging the yield down. Moreover, the quality of the vineyards is affected and the level of alcohol increases. Remote sensing data are a potential solution to measure water status in vineyards. However, important questions are still open, such as which spectral, spatial, and temporal scales are adapted to achieve the latter. This study aims at using hyperspectral measurements to investigate the spectral scale adapted to measure vine water status. The final objective is to find out whether it would be possible to monitor the vine water status with the spectral bands available in multispectral satellites such as Sentinel-2. Four Mediterranean vine plots with three grape varieties and different water status management systems are considered for the analysis. Results show the main significant domains related to vine water status (Short Wave Infrared, Near Infrared, and Red-Edge) and the best vegetation indices that combine these domains. These results give some promising perspectives to monitor vine water status.


Introduction
A moderate hydric deficit is essential to ensure vigor of vineyards and to get both high yield and quality of vineyards [1,2]. This is needed to restrict vegetative development and foster the growth of berries, especially between the fruit set and veraison [3]. From veraison to harvest, the optimal water status depends on the desired type of wine. Without water restriction, the produced red wine will be herbaceous, diluted, and acidic, whereas a severe deficit will result in red wines that are excessively tannic, hard, astringent, and alcoholic [3]. Several adaptation strategies have recently been developed to deal with this water constraint issue. Long- and short-term approaches were explored and different scenarios have been considered, such as relocating some vineyards to a more adequate climate, developing resistant varieties [4], improving soil or cover crop management [5][6][7] and preharvesting. However, all these farming practices have different effects according to, among other factors, the terroir, the way they are implemented, or the vine's technological stage. Some other solutions involve compensating for the lack of water, improving its retention or use, and reducing the impact of water stress on the quality of the grapes. Moreover, climate change in the south of Europe has led to an increase in temperatures and a decrease in rainfall during summer [8]. This evolution has led to an increase of water constraint for grapevines. Therefore, a fast and accurate identification of fields and regions suffering from water stress is important to set up efficient countermeasures [7]. Such information could be a critical asset in terms of water management, for example, when restrictions are applied to prioritize areas of greatest need. It could also be useful in the context of plot selection, when preparing allotments and choosing plots that should be harvested together [9,10].
In Europe, less than 30% of the wine-growing areas are presently irrigated as opposed to more than 80% in the New World (Argentina, Australia, Chile, New Zealand, etc.). However, this proportion is constantly increasing over the years and after further heat wave episodes [11]. At the same time, in order to preserve the most important but also the most vulnerable resource that is water, restrictions are set up to control and limit water use. In addition, to preserve the specific characteristics of some French Appellation d'Origine Contrôlée (AOC) wines, irrigation approvals are granted exclusively at particular times and with respect to important water stress.
Whether it is to control quality and quantity or to limit the impact of droughts, it is therefore necessary to quantify water stress levels in vineyards. To achieve this, measurements are usually made in the field [12,13]. A common technique involves measuring the stem water potential [12][13][14][15][16], which provides a better indication of the impact of soil water content on grapevine water status than leaf water potential (when leaves are fully exposed to sunlight) [14]. Other measurements can be done using stomatal conductance or a sap flow sensor [12,13]. In any case, measurement of water potentials is always time consuming because of the high number of measurements required to capture intra- and interplot variability [13].
In this context, there is a particular need for new techniques that allow us to accurately and efficiently map the vines' water status. Among them, optical remote sensing seems of particular interest because satellite sensors are available and already widely used in agriculture. Multispectral images are now easily accessible in the context of the Copernicus missions, thanks to the Sentinel-2 satellites [17,18]. Free access to the data set, combined with its spectral (12 bands in the visible (VIS) to Short-Wave Infrared (SWIR) regions), temporal (5 days revisit time with both satellites A and B), and spatial resolution (10 m to 20 m) makes it a powerful and versatile tool. Several recent studies have demonstrated the relevance of Sentinel-2 images to detect irrigated crops [19] or estimate cotton water consumption [20], for example. Some studies focused on the monitoring of vineyards [21,22], the impact of heatwaves on irrigated vineyard [23] or their water status [24]. This last recent study shows interesting correlations between stem water potential and Sentinel-2 images. However, only 6 vegetation indices (VIs) were tested and none of them included red-edge bands, for example. It is the only study, to our knowledge, to have tested the link between stem water potential (SWP) and Sentinel-2 images with a large number of commercial vineyards throughout the season.
Overall, knowledge needs to be extended to other conditions/varieties/territories to serve as a basis to set-up a robust and efficient tool to monitor vine water status. Moreover, we think that it is necessary to evaluate if the spectral bands available in the satellites are really efficient for identifying vineyard water status.
In order to do that, hyperspectral data are of particular interest. They can improve our understanding of how plants' sanitary status affects the whole range of spectral signatures and allow us to highlight the most sensitive and reliable wavelengths or spectral domains. Some previous studies focused on the whole spectrum to predict leaf water content in maize [25] or grapevine [26][27][28]. Most of them relied on vegetation indices (i.e., a mathematical combination of the reflectance at two or more wavelengths) for maize [29], trees [30], wheat [31], millet [32], sorghum [32], cowpea [33], bean [33], sugar beet [33], and grapevine [34,35]. Recent works using hyperspectral measurements focused on finding the best wavelengths or the best combination of wavelengths related to water status [26][27][28][34][35][36]. These studies often highlight wavelengths from the SWIR, which is expected since the water absorption bands are located there. According to this literature, the most promising vegetation indices (VI) are either directly related to water content or indirectly related through their impact on nutrient status or on cell composition, for example. These studies were carried out on a single plot in a field or in experimental greenhouses.
As hyperspectral sensors are still hard to deploy at a broad scale or for a potential operational service (sensor, processing, and storage costs), we believe that the acquired knowledge could, in turn, be used to process multispectral images more efficiently, targeting the bands that best fit with the most pertinent spectral domains.
This study is part of a global project to monitor the water status of vines by remote sensing. It is the first phase that should enable us to validate our global understanding of the spectral response of the vine to different water statuses. In order to switch from hyperspectral to multispectral scale, several measurements were made from the leaf to the canopy. Indeed, the transition between hyper- and multispectral is not so obvious since there are many issues such as the spatial and spectral resolution of multispectral sensors. This paper focuses on the link between SWP values and hyperspectral reflectance of the leaf, as it is the first step to understand the vines' spectral response under different water statuses.
In this paper, we use hyperspectral data to extend and potentially validate knowledge over several vine plots with three varieties of grapes and over a more pertinent set of data (different stages of development, from pea-size to ripening, and with different irrigation schemes). The main goals are (1) to find out which spectral domains best explain the water status in vines and (2) to define which vegetation index would be the best to accurately monitor water status in vineyards. These results are a first step towards a more global and challenging objective, which is to target the most relevant bands in multispectral sensors, and could allow for the set-up of an operational and efficient tool to monitor vine water status.
To achieve these objectives, four typical Mediterranean vineyards were monitored during summer 2019. This study relies on the combination of two different field measurements: hyperspectral leaves reflectance using an ASD FieldSpec 4 Hi-Res NG Spectroradiometer in visible (VIS) to SWIR wavelengths (from 350 to 2500 nm) and water status measured through stem water potential. The methodological approach explores both the reflectance of leaves using raw wavelengths and data averaged by spectral domains. These features are therefore used to search for correlations with stem water potential (SWP), which is an indicator of the plant water status. The results highlight the best spectral domains related to vine water status and the best vegetation indices for vine water status monitoring.

Study Sites
The study was conducted over four vineyard plots located in the Occitanie region (Herault department) in the south of France (Figure 1, Table 1). Plot A benefits from a degraded Mediterranean climate with rainfall well distributed throughout the year while plots B and C benefit from a typically Mediterranean climate with a very significant amount of sunshine and relatively mild temperatures. The vine variety planted in plots A and C2 is Syrah. This variety is cultivated all over the world whatever the climate (France, Australia, South Africa, USA, etc.). Syrah has an anisohydric behavior (i.e., it keeps its stomata open even in the case of water constraint), which is favorable to photosynthesis in dry soil conditions but can be problematic in the case of severe drought [37,38]. The Plot B variety is Chardonnay, which has the same anisohydric behavior. On the other hand, the variety of plot C1 is Grenache, which has an isohydric behavior that is well adapted to the Mediterranean climate (i.e., its stomata close as soon as the water content of the soil decreases) [37,38]. Vines can show behaviors that fluctuate between the iso- and anisohydric genotype, depending on their development stage and climatic and soil conditions [39]. For each plot, A and B, three different irrigation schemes were monitored in order to have a wide range of SWP values in the same plot at the same time (Table 1). C1 and C2 were chosen as nonirrigated plots and were used for the last two weeks to complete measurements on red varieties in addition to plot A (details are given further). The first experiment on plot A was set up especially for this study while the second on plot B was initially set up to determine the impact of water status on wine composition and has been there since 2017.
For both plots, drip irrigation systems were used and three distinct areas were irrigated in three different ways: the first area was not irrigated (control) in order to stress the vine as much as possible, the second one was irrigated just enough to maintain a proper grape quality according to winegrowers (W1), and the third was highly irrigated to maximize the yield (W2) (Figure 2, Table 2). For each irrigation scheme, several subplots were defined for field measurements according to the Sentinel-2 pixels grid. A subplot usually corresponds to a square of four pixels with a ground spatial resolution of 10 m, except for the control area within plot B whose size is too small (Figure 2).
Irrigation was managed according to the expected type of wine and the vine development stage, and was monitored for the two test plots (Table 2). Unfortunately, the irrigation system installed in plot A broke down during the summer and some data related to the amount of water supplied could not be recovered. For each plot, a grid with Sentinel-2 pixels was created and six subplots (four pixels of 10 m) were chosen to carry out measurements, with the objective of remaining in areas that are homogeneous in terms of vigor (Figure 3).

Field Measurements
Two measurements were carried out simultaneously during this experiment: (1) the SWP was used to get the water status of vines, (2) the leaves' reflectances were obtained with a field spectrometer. Details on acquisition protocols will be given in Sections 2.3.1 and 2.3.2, respectively. For each 20 × 20 m subplot, both measurements were completed on 10 vines chosen over three rows (Figure 4). Additionally, a Geo 7x GNSS receiver (Trimble Geospatial Company, California, United States) with accuracy ranging from 0.02 to 0.5 m was used to precisely locate the vines and subplots in the field. Measurements were carried out every two weeks starting from the middle of July 2019 up to the end of August 2019. This allowed us to cover vine growth from veraison to harvest. Rainy days were avoided as much as possible for field measurements.
Rainfall and temperatures have been monitored for the whole period using data from the stations of Meteo-France (the French national meteorological service) that were located closest to the study sites. From Figure 5, one can see that plot A received a little more rain than the other plots and the temperatures measured at this site were a little lower than at the others (up to 5 degrees lower during the second week).
Table 3 summarizes the measurements that have actually been carried out. Some leaf reflectance measurements could not be carried out because of the climatic conditions. Regarding Plot A, SWPs were missing on the fourth week due to time shortage and a lack of availability of partner operators. Fortunately, measurements could be made on plots C1 and C2 to complete the data set for red varieties. The number of reflectance measurements differs between days according to climatic conditions and, thus, time slots to acquire data.
The water status of each vine in the different subplots was assessed by measuring the SWP using a Scholander pressure chamber, following the method described in [12,40]. Only one SWP measurement was taken per vine because of the time needed to cover all the vines. The leaves were bagged with an insulating layer in the morning (between 09:00 and 11:00). This leads to the closing of stomata and then to the balance of the sap between plant and leaf. In the early afternoon (between 14:00 and 16:00), leaves were removed with their stem and quickly set up in the pressure chamber. This step has to be done one leaf at a time, as balance is broken as soon as the leaf is taken out. The actual measurement was then made by cutting the tip of the stem and recording the pressure required to squeeze the first drop of sap out of the stem.
According to this method, the higher the pressure, the more severe the water constraint. It is usually accepted that the potential of free water with maximum availability is set to 0. The value is converted from the positive pressure in bar applied to the petiole to extract the sap to the negative pressure in MegaPascal (MPa) present within the petiole. The value will decrease as the transpiration is high and the soil is dry [37] and, in case of drought, it reaches very low values.
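The unit conversion described above can be illustrated with a minimal sketch. It only assumes the standard relation 1 bar = 0.1 MPa and a sign change from the positive chamber pressure to the negative potential; the function name and the example readings are hypothetical and are not taken from the field data.

```python
def pressure_bar_to_swp_mpa(pressure_bar: float) -> float:
    """Convert a positive pressure-chamber reading (bar) to a negative SWP (MPa)."""
    if pressure_bar < 0:
        raise ValueError("chamber pressure must be non-negative")
    # 1 bar = 0.1 MPa; the potential inside the petiole is the negative of the
    # pressure that had to be applied to extract the sap.
    return -pressure_bar / 10.0


# Hypothetical chamber readings in bar (illustration only).
readings_bar = [5.1, 11.0, 16.4]
swp_mpa = [round(pressure_bar_to_swp_mpa(p), 2) for p in readings_bar]
print(swp_mpa)
```

A higher applied pressure therefore maps to a more negative SWP, i.e., a more severe water constraint.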
The relationship between measured pressure and actual water stress is not straightforward as it depends on both the development stage of the vine and the expected type of wine. Optimal SWP values as a function of those last two parameters have been defined empirically [3] and three typical profiles are reported in Figure 6. Regarding white wines, the aim is usually to have fruity aromas and a light wine; therefore, the plant should not be overstressed to prevent berries from becoming too concentrated in sugar. On the other hand, when considering full-bodied red wines produced for aging, a greater stress is required so the plant produces more powerful and more concentrated aromatic compounds in the berries.
Figure 6. Optimal SWP pathways to produce three distinct wines (adapted from [3]).
• SWP data set description
Table 4 synthesizes SWP values acquired for all plots, to the extent of data with both SWP and leaves reflectance measured on the same day. The SWP values range from −1.64 MPa to −0.51 MPa, which covers almost the whole range expected from Figure 6.
Figure 7 provides the temporal evolution of SWP with respect to the three optimal pathways presented in Figure 6. SWP values for plot B are almost totally included between the first two curves (i.e., plain yellow line and dashed red line, respectively), which means irrigation was well controlled to produce a white, light wine (Figure 6) and fits with the data gathered on the amount of distributed water (Table 2). Regarding plot A, SWP values are included between the two lower curves (i.e., dashed red line and dash and dotted brown line), except for the second week, possibly in relation to the breakdown of the irrigation system mentioned in Table 2, which led to an overflow of water in the plot. SWP for plot C2 ended up below the lower curve (i.e., dashed and dotted brown line), reflecting the important water stress that occurred during the last week in accordance with the lack of an irrigation system for red varieties.
The data of plots A and C were then analyzed together, as they reflect the typical behavior of red varieties of plots without a suitable irrigation system to ensure correct water status during the fourth week.

Visible-Near-Infrared (VNIR)/SWIR Reflectance Spectra
Reflectance spectra were acquired using an ASD FieldSpec 4 Hi-Res NG Spectroradiometer (Malvern Panalytical Ltd.; Malvern, United Kingdom) that covers the visible-near-infrared and short-wave infrared spectral range (350-2500 nm) with a spectral resolution of 3 to 6 nm (2151 channels). Leaves' measurements were carried out using both the ASD's leaf clip (version 2) and the ASD contact probe (spot size of 12 mm), providing controlled illumination and observation conditions throughout the field campaign.
Spectral measurements were carried out on the same days as SWP measurements, between 10:00 and 14:00, on leaves selected on the same vine stocks. The acquisition and preprocessing protocol is divided into four phases:
• The spectrometer was calibrated using a white reference (Spectralon panel) and a dark current correction was applied. Such calibration is performed every 15 to 30 min to take into account temperature changes over the day.
• Three to five raw spectra were acquired on each vine stock, on healthy, young, and mature leaves located preferentially at the top of foliage (i.e., the ones that could be more easily observed from unmanned aerial vehicles (UAVs) or satellites). The measurements were acquired during the whole summer by only two people, alternately and during the day, to minimize operator variability in selection of leaves. Each spectrum is an average of 30 repeated scans. GPS coordinates were also automatically associated with each spectrum during the acquisition process.
• Each raw spectrum was ultimately converted to reflectance and exported into an ASCII file using the manufacturer's ViewSpec Pro software.
• The last step of preprocessing consisted of checking all spectra to remove outliers according to their reflectance values, especially the ones that had a low reflectance level due to measurement error. Spectra were then averaged by row subplot. Samples without SWP measurements for the considered date were also removed from the database (e.g., data for plot A and week 4, as mentioned in Table 3).
The final steps of preprocessing are synthesized in Figure 8. The final database comprises 118 spectra that will be used in our study.
Figure 8. Preprocessing steps, from raw spectra to the database actually used in our study. Outliers were removed according to their reflectance values, especially the ones that had a low reflectance level due to measurement errors.
Figure 9 shows an example of leaves' spectra acquired on August 23rd in plot B. Figure 9a presents the spectra of a row subplot with W1 irrigation management. Figure 9b gives all the spectra for each row subplot of the three irrigation areas.

Methods
From the database presented in Section 2.3.2, one cannot find any obvious difference in spectra, as their main variations in reflectance are related to the global shapes of major vegetation absorption (e.g., [41,42]) and not to differences in SWP/water stress (e.g., Figure 9). Among the many existing remote sensing processing techniques (e.g., [43,44]), we chose to focus on highlighting the most significant domains related to vine water status. Indeed, the final goal is to verify whether it would be possible to use multispectral data to monitor vine water status; meaning checking which domains and, therefore, which spectral bands are needed.
The methodology implemented in this study is summarized in Figure 10. Details of data acquisition are described in the previous section. Each statistical analysis and interpretation is carried out for either (i) all the data at the same time (n = 118), (ii) by grouping plots A and C (n = 59), or (iii) taking only plot B (n = 59). This allows us to maintain continuity in the measurements and a consistent range of stem potential values for each week and each water pathway (Figure 7). The first step of processing consisted of extracting features from leaves' measurements from both (1) the raw spectra (Section 3.1.1) and (2) spectra that were averaged by wavelength domains (Section 3.1.2). The features retrieved were (a) the reflectance values for each independent wavelength or the average reflectance value for each wavelength domain and (b) the mathematical combination of reflectances at several wavelengths or for several domains.
The second step corresponds to a statistical analysis. Two methods were tested, (1) linear regression (Section 3.2.1) and (2) ExtraTree regressor, and their features' importance (Section 3.2.2), which allows us to highlight relations between features and SWP.
We also chose to evaluate the performance of a classification algorithm. Among this family of algorithms, Support Vector Machines (SVM) are a standard choice. The SVM implementation in the Scikit-Learn Python package [45] was used. Here, three classes adapted from [3] were considered depending on the variability of the data set: (1) no water constraint, SWP > −0.7 MPa; (2) low water constraint, −0.7 MPa > SWP > −1.1 MPa; and (3) high water constraint, SWP < −1.1 MPa. Such an approach has limited applicability, as the class limits depend on the variety and should evolve along the season. Consequently, the results will not be described but will be discussed in Section 5.5.
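As an illustration of the three classes, the following minimal sketch assigns a class label to an SWP value. The function name is hypothetical, and the handling of values falling exactly on a limit (−0.7 or −1.1 MPa) is an assumption, since the text only gives strict inequalities. Labels of this kind could then be used as targets for a classifier.

```python
def water_constraint_class(swp_mpa: float) -> int:
    """Return 1 (no constraint), 2 (low constraint) or 3 (high constraint)."""
    if swp_mpa > -0.7:
        return 1
    # Assumption: a value exactly equal to -0.7 MPa falls into class 2.
    if swp_mpa > -1.1:
        return 2
    # Assumption: a value exactly equal to -1.1 MPa falls into class 3.
    return 3


# The two extreme values come from the SWP range reported in the text;
# the intermediate value is hypothetical.
example_swp = [-0.51, -0.9, -1.64]
labels = [water_constraint_class(v) for v in example_swp]
print(labels)
```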

Feature Extraction
• Using each wavelength separately: The first feature set consists of the leaves' reflectance values using all the available 1974 wavelengths separately.
• Using wavelength combinations: The second feature set relies on the combination of reflectances at multiple wavelengths. To achieve this, two types of combination were tested: (1) the Normalized Difference Spectral Index (NDSI) and (2) the existing Hyperspectral Vegetation Indices (HVI):
- In the case of NDSI, a feature data set is constructed using two-wavelength combinations [46][47][48] following the formula:
$$\mathrm{NDSI}_{i,j} = \frac{R_i - R_j}{R_i + R_j}$$
where $R_i$ and $R_j$ refer to the reflectance values at wavelengths $\lambda_i$ and $\lambda_j$, respectively. This way, each possible combination of wavelengths over the whole spectrum was systematically tested. This approach allows us to normalize the reflectance and provide a clear overview of the most significant wavelengths. The spectral domains are directly highlighted in the correlation matrix (see Section 3.2.1). Several studies demonstrated that NDSI often performs better than common published indices [49,50].
- The seven most widely used HVIs have been selected for evaluation in this study (Table 5). These indices are known to be linked to the plants' water status, either directly or using an index related to chlorophyll (itself affected by water content). HVI relies on the combination of two or more wavelengths and each of them is designed to highlight a specific physical property of vegetation.
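The systematic two-wavelength NDSI computation can be sketched as follows, assuming the usual normalized-difference form (R_i − R_j)/(R_i + R_j) and strictly positive reflectances. The function name and the three-band toy spectrum are hypothetical; a real spectrum would hold the 1974 retained wavelengths.

```python
import numpy as np


def ndsi_matrix(reflectance) -> np.ndarray:
    """Return the (n, n) matrix whose entry [i, j] is (R_i - R_j) / (R_i + R_j)."""
    r = np.asarray(reflectance, dtype=float)
    r_i = r[:, None]  # column vector: reflectance at wavelength i
    r_j = r[None, :]  # row vector: reflectance at wavelength j
    # Assumes R_i + R_j > 0 for every pair (true for positive reflectances).
    return (r_i - r_j) / (r_i + r_j)


# Toy spectrum with three hypothetical bands (illustration only).
spectrum = np.array([0.05, 0.45, 0.30])
ndsi = ndsi_matrix(spectrum)
print(np.round(ndsi, 3))
```

The matrix is antisymmetric with a zero diagonal, so only one triangle needs to be tested against SWP.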

Table 5 (columns: Index (Abbreviation), Use, Formula, Reference; row group: Indirect water sensitive VI).

Average Spectrum by Domains
In a second step, the objective was to decrease the spectral resolution to mimic what could be observed using a multispectral sensor. The goal was to identify from the whole spectrum which wavelength domains would be the most efficient at estimating SWP. To achieve this, the spectrum was divided into 19 wavelength domains and each of them was defined in relation with the spectrum's global shape, itself directly related to the vegetation composition and structure ( Figure 11).
For instance, the main water absorption bands in the short-wave infrared, located around 1.2, 1.4, and 1.9 µm, were used to define the three domains SWIR b, e, and j, respectively. Additional constraints were also taken into account, especially to ensure compliance with Sentinel-2 bands (e.g., Red-Edge_a and _b related to S2 Red-Edge bands 5 and 6, respectively) and to avoid atmospheric absorption. Table 6 synthesizes the wavelength range associated with each wavelength domain.
Finally, to produce those artificial multispectral spectra, reflectance values were averaged over the whole wavelength range of each domain. The aim was to get a first overview of what can be obtained with a multispectral sensor, not to exactly mimic one sensor or another.
As in Section 3.1.1, the first kind of analysis focused on the search for a direct correlation between the averaged reflectance for each domain and SWP. Then, two types of mathematical combinations were tested: (1) the Normalized Difference Spectral Index (NDSI) and (2) multispectral vegetation indices (MVI):
• In the case of NDSI, the index is computed for each pair of domains following the formula:
$$\mathrm{NDSI}_{i,j} = \frac{D_i - D_j}{D_i + D_j}$$
where $D_i$ and $D_j$ refer to the averaged reflectance values for the domains $i$ and $j$, respectively. This way, each possible combination of domains is systematically tested.
• Seven MVIs were selected on the basis of both indices found in the literature and our previous analyses using every wavelength. Literature indices not relying on the identified wavelengths range were discarded. These selected MVIs are mostly linked to chlorophyll content. Nevertheless, in order to take into account domains linked directly with water absorption, two HVIs were specifically adapted with relevant SWIR domains. All the selected indices were computed according to the formulas in Table 7.
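The averaging by domain and the domain-level NDSI can be sketched as below. The domain names and limits (`dom_a`, `dom_b`) are hypothetical placeholders and do not reproduce Table 6; the synthetic spectrum is likewise an assumption made only to keep the example self-contained.

```python
import numpy as np

# Hypothetical domain limits in nm (NOT the limits of Table 6).
DOMAINS = {"dom_a": (800, 900), "dom_b": (1500, 1700)}


def average_by_domain(wavelengths, reflectance, domains):
    """Average the reflectance over each [low, high] wavelength range."""
    wl = np.asarray(wavelengths, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    averaged = {}
    for name, (low, high) in domains.items():
        mask = (wl >= low) & (wl <= high)
        averaged[name] = float(refl[mask].mean())
    return averaged


def domain_ndsi(d_i: float, d_j: float) -> float:
    """Normalized difference between two domain-averaged reflectances."""
    return (d_i - d_j) / (d_i + d_j)


# Synthetic spectrum sampled every nm from 350 to 2500 nm (2151 values).
wavelengths = np.arange(350, 2501)
reflectance = np.where(wavelengths < 1000, 0.4, 0.2)

averaged = average_by_domain(wavelengths, reflectance, DOMAINS)
index_ab = domain_ndsi(averaged["dom_a"], averaged["dom_b"])
print(averaged, round(index_ab, 3))
```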

Available Data Set Summary
The available data set and the number of features are summarized in Table 8.

Statistical Analysis
In an attempt to highlight the relation between spectral signatures and SWP, this study focused on regressions to test the different extracted features, either using all wavelengths (see Section 3.1.1) or only wavelength domains (see Section 3.1.2).

Linear Regression
For each extracted feature set, linear regression was carried out to study their linear correlation with SWP. The LinearRegression estimator from the Scikit-Learn Python package [45] was used. The first parameter analyzed was the p-value, which was used as a threshold on the level of significance of the relationship: only relationships with a p-value below 0.0001 were considered significant. Then, the performance indicator chosen was the determination coefficient (R²), which assesses how strong the linear relationship is between the two variables (i.e., how much of the variability in SWP measurements can be explained by this specific feature). The results are shown as follows:
• With a simple line graph for linear regression by wavelengths;
• With a correlation matrix for NDSI, HVI, and MVI, to compare features' performance and highlight visually which wavelength ranges or domains are the most significant to extract water status from spectral measurements.
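A per-feature regression of this kind can be sketched as follows. Note the swap: instead of the Scikit-Learn estimator named in the text, this sketch uses `scipy.stats.linregress`, because it returns both the correlation coefficient and the p-value directly. The function name and the synthetic data (118 samples, 5 features, one of them built to depend on SWP) are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import linregress


def regress_per_feature(features, swp, p_threshold=1e-4):
    """Fit SWP against each feature column; return R2, p-values, significance mask."""
    features = np.asarray(features, dtype=float)
    swp = np.asarray(swp, dtype=float)
    n_features = features.shape[1]
    r2 = np.empty(n_features)
    p_values = np.empty(n_features)
    for k in range(n_features):
        fit = linregress(features[:, k], swp)
        r2[k] = fit.rvalue ** 2  # coefficient of determination
        p_values[k] = fit.pvalue
    return r2, p_values, p_values < p_threshold


# Synthetic data: feature 2 is constructed to vary linearly with SWP.
rng = np.random.default_rng(0)
n_samples, n_features = 118, 5
swp = rng.uniform(-1.64, -0.51, n_samples)
features = rng.normal(0.3, 0.05, (n_samples, n_features))
features[:, 2] = 0.3 + 0.1 * swp + rng.normal(0.0, 0.01, n_samples)

r2, p_values, significant = regress_per_feature(features, swp)
print(np.round(r2, 2), significant)
```

Plotting `r2` against wavelength, with the significance mask overlaid, gives the kind of line graph described in the first bullet above.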

Extra Trees Model
Within the family of algorithms dedicated to the extraction of feature importance, decision trees are commonly used [63]. Among them, ExtraTrees (Extremely Randomized Trees) are quite simple to operate and are computationally efficient. Moreover, unlike linear regression, they can find a nonlinear mapping function between features and SWP values. This algorithm was used in this study on raw reflectance by wavelengths as it is well suited for a large amount of data. Indeed, decision tree algorithms recursively split the data set to learn accurately [64]. ExtraTrees is a machine learning method similar to Random Forest, except that it tends to have a lower variance: instead of searching for the optimal feature/split combination, for each feature, a random value is selected for the split [65]. ExtraTrees selects k features randomly and calculates a Gini index (relative value of statistical dispersion measuring the deviation of the feature distribution from a perfectly equal distribution) for each feature [66]. An equal distribution would mean that all features are equally important. The sklearn Python package [45] was used to compute ExtraTrees. Model performance was evaluated using Root Mean Square Error (RMSE). The Gini index was used to evaluate the Feature Importance, and also the wavelength significance, with the more significant contributions being associated with higher values. The sum of the feature importance values is always equal to 1.
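A minimal sketch of this step with scikit-learn's `ExtraTreesRegressor` is given below. Several points are assumptions rather than the study's settings: the synthetic data, the number of trees, and the train/test split used to compute the RMSE (the text does not state how the error was evaluated). In scikit-learn's regressor, `feature_importances_` is impurity-based and derived from the split criterion (squared error by default), normalized so that the values sum to 1.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spectra: 118 samples, 20 "wavelengths";
# feature 7 is constructed to drive the target (illustration only).
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 118, 20
X = rng.normal(0.3, 0.05, (n_samples, n_wavelengths))
y = -1.0 + 4.0 * (X[:, 7] - 0.3) + rng.normal(0.0, 0.02, n_samples)

# Assumption: hold out 30% of the samples to estimate the RMSE.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))

importance = model.feature_importances_  # one value per feature, sums to 1
top_feature = int(np.argmax(importance))
print(f"RMSE = {rmse:.3f}, most important feature index = {top_feature}")
```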

Results Interpretation
To highlight significant domains for each analysis, R² and p-value or Feature Importance (FI) were used. With the large amount of data and analysis, it was important to be able to highlight the spectral domains that most frequently appeared to be significant. In order to do this, we have created an "importance criteria" (IC) based on the frequency of occurrence of the domains among those identified as the most significant. The most significant domains (with the best R² or FI) receive a score of 10, significant domains to a lower extent (significant values but lower R² or FI) receive a score of 1, and insignificant domains receive a score of 0. A logarithmic scale has been chosen in order to better emphasize the most significant domains compared to the other ones. When the spectral domain is emphasized for all the plots, the score is doubled. The scores obtained for each group (all plots, plots A and C, or plot B) and for each type of analysis are then summed up to provide the importance criteria. This simple indicator allows us to easily visualize which domains are most often significant (higher importance criteria).
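The scoring rule can be sketched as below. One interpretation is assumed and should be read as such: "emphasized for all the plots" is taken to mean that a score obtained in the analysis of all plots together is doubled. The function name, the level labels, and the example ratings are hypothetical and do not reproduce the study's results.

```python
# Score per significance level, as described in the text.
SCORES = {"most": 10, "lower": 1, "none": 0}


def importance_criteria(ratings):
    """Sum the scores per domain over all (group, domain, level) ratings."""
    totals = {}
    for group, domain, level in ratings:
        score = SCORES[level]
        if group == "all":
            # Assumption: scores from the all-plots analysis are doubled.
            score *= 2
        totals[domain] = totals.get(domain, 0) + score
    return totals


# Hypothetical ratings for one type of analysis (illustration only).
ratings = [
    ("all", "SWIR h", "most"),
    ("A+C", "SWIR h", "most"),
    ("B", "SWIR h", "none"),
    ("all", "SWIR k", "lower"),
    ("A+C", "SWIR j", "most"),
    ("B", "SWIR j", "most"),
]
ic = importance_criteria(ratings)
print(ic)
```

Repeating the call for each type of analysis and adding the resulting totals gives the final indicator per domain.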

Correlation between SWP and Leaves' Raw Spectra
This section aims to identify the wavelengths that are most sensitive to plants' water status and to identify the wavelength domain they belong to. The presented results focus on the correlation between SWP values and either (i) the raw wavelengths separately or (ii) combinations of wavelengths (NDSI or HVI).

Linear Regression between SWP and Raw Wavelengths
A linear regression was systematically performed to evaluate the correlation between SWP and each of the 1974 wavelengths. The result is a coefficient of determination R² associated with each individual wavelength, either when considering all the plots together (Figure 12) or separately (Figure 13). All the significant correlations (p-value < 0.0001, i.e., above the dashed line) show relatively low values (R² ranging from 0.25 to 0.43).
When considering the whole database (Figure 12), the significant wavelengths (p-value < 0.0001 and R² > 0.35) are encompassed within the 1500 to 1700, 1750 to 1850, and 2050 to 2250 nm ranges. When considering only plots A and C (Figure 13a), the most significant wavelengths (p-value < 0.0001 and R² > 0.35) are encompassed within the 707 to 722, 1800 to 2000, and 2100 to 2500 nm ranges. When considering only plot B (Figure 13b), the most significant wavelengths (p-value < 0.0001 and R² > 0.25) are encompassed within the 1400 to 1480 and 1870 to 1900 nm ranges.
From the intervals defined above, one can determine the spectral domains of particular interest from Table 6. When considering the whole database, the most significant wavelengths are located only in the SWIR spectral region, especially in the "SWIR g", "SWIR h", and "SWIR l" domains. To a lesser extent, "SWIR k" could also be included as a domain of interest. When considering only plots A and C, the most significant wavelengths are associated with "SWIR h", "SWIR j", and "SWIR m" and, to a lesser extent, "SWIR l" and "Red-Edge b". When considering only plot B, the most significant wavelengths are located in the "SWIR e" and "SWIR j" domains. Whether all the data are considered or the data are grouped by plot (A and C, or B), the SWIR domain is always highlighted. It is worth pointing out that, for the three analyses, nothing stands out in the VIS or the near infrared (NIR) wavelength ranges.
Figure caption fragment: [...] Table 6 are reported above the graph.
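The per-wavelength regression used in this section can be sketched as follows. It is a minimal example with hypothetical function names. It relies on the fact that, for a simple linear regression, R² equals the squared Pearson correlation, and it does not compute the p-value.

```python
def r_squared(x, y):
    """R² of the simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no information
    return sxy * sxy / (sxx * syy)


def r2_per_wavelength(spectra, swp):
    """One R² per wavelength.

    spectra: list of samples, each a list of reflectances (one per wavelength).
    swp: list of stem water potential values, one per sample.
    """
    n_wavelengths = len(spectra[0])
    return [
        r_squared([sample[j] for sample in spectra], swp)
        for j in range(n_wavelengths)
    ]
```

Plotting the returned list against wavelength gives a curve of the kind shown in Figures 12 and 13.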

ExtraTrees Regression between SWP and Raw Wavelengths
The Feature Importance derived from the ExtraTrees method highlights the most frequently used wavelengths in the decision trees of the algorithm. The result is a feature importance value associated with each individual wavelength, either when considering all plots at the same time (Figure 14) or separately (Figure 15). The RMSEs are lower in the second case (RMSE of 0.14 and 0.17 for plots A and C, and plot B, respectively) than when all plots are processed together (RMSE of 0.25).
When all data are aggregated together (Figure 14), the wavelengths that are highlighted as the most frequently used are encompassed within the 670 to 740, 1400 to 1480, 1520 to 1700, and 1750 to 1800 nm ranges. When considering only plots A and C (Figure 15a), the highlighted wavelengths are encompassed within the 680 to 740, 1400 to 1460, 1800 to 1950 nm ranges, and around 2500 nm. When considering only plot B (Figure 15b), the highlighted wavelengths are encompassed within the 500 to 750, 1400 to 1500, and 1800 nm to 1950 nm ranges.
From the intervals defined above, the most highlighted wavelengths when considering the whole database are mainly located in the "SWIR h" domain. To a lesser extent, other highlighted wavelengths are found in the "SWIR g", "SWIR e", and "Red-Edge a" domains. When considering only plots A and C, highlighted wavelengths are mainly located in the "SWIR e", "SWIR j", and "SWIR m" domains. To a lesser extent, other highlighted wavelengths are found in the "Red-Edge a" and "Red-Edge b" domains. Lastly, when considering only plot B, highlighted wavelengths are mainly located in the "SWIR i", "SWIR e", and "Red-Edge b" domains. To a lesser extent, other highlighted wavelengths are found in the visible spectrum, in the "Blue", "Green", and "Red" domains.

Linear Regression between SWP and NDSI Using Raw Spectra
The previous section demonstrates that significant (p-value < 0.0001) relationships exist between SWP and reflectance at specific wavelengths. Nevertheless, these relationships remain weak and could be strengthened (higher R² and lower RMSE). As a next step, NDSIs are tested, as normalizing the absolute reflectance values minimizes spectral features unrelated to the water content of leaves.
Results in this section rely on NDSI using reflectance at every wavelength available in the raw spectra (Equation (1)). Each possible combination of two wavelengths over the whole spectrum is systematically tested, and a linear regression is performed between the resulting index and SWP. The determination coefficient is then depicted in a correlation matrix, which makes it easy to see which wavelength combinations have the strongest relationship with the plant's water status. Figure 16 displays the correlation matrix obtained when considering the whole database. The highest determination coefficients (around 0.6) are dark red, while the lowest ones (close to 0) are white. A visual analysis of the correlation matrix shows that the combinations most correlated with stem potentials (p-value < 0.0001 and R² > 0.45) combine wavelengths as follows (the first range refers to the horizontal axis, the next one(s) to the vertical axis):
• From the 800 to 1400 nm range with wavelengths (i) around 1800 nm, (ii) from the 2100 to 2300 nm range, and (iii) from the 1550 to 1700 nm range;
• From the 1800 to 1900 nm range with wavelengths from the 2150 to 2300 nm range;
• From the 670 to 700 nm range with wavelengths from the 1900 to 2400 nm range.
According to Figure 17a, for plots A and C, the most important NDSIs (p-value < 0.0001 and R² > 0.6) combine wavelengths as follows (the axis read first is the horizontal one):
• From the 700 to 750 nm range with wavelengths from the 700 to 2400 nm range;
• From the 500 to 600 nm range with wavelengths from the 600 to 700 nm range;
• From the 500 to 1900 nm range with wavelengths around 1900 nm;
• From the 1600 to 1800 nm range with wavelengths around 1800 nm.
For plot B (Figure 17b), the most important NDSIs (p-value < 0.0001 and R² > 0.6) combine wavelengths as follows (the axis read first is the horizontal one):
• From the 800 to 1400 nm range with wavelengths (i) around 1900 nm and (ii) from the 1400 to 1500 nm range;
• From the 1400 to 1500 nm range with wavelengths from the 1500 to 1800 nm range;
• From the 1900 to 2000 nm range with wavelengths from the 2000 to 2400 nm range;
• From the 1650 to 1800 nm range with wavelengths from the 1800 to 1850 nm range.
From the intervals defined above, the most highlighted wavelengths when considering the whole database are mainly located in the SWIR region ("SWIR g", "SWIR i", "SWIR k", and "SWIR l"). To a lesser extent, "Red-Edge a" and "NIR" also seem to be significant. For plots A and C, the most significant domains are "Red", "Red-Edge b", "SWIR i", and "SWIR j". To a lesser extent, other highlighted domains are "Red-Edge a" and "NIR". Regarding plot B, the highlighted domains are "SWIR e" and "SWIR j" and, to a lesser extent, "NIR", "SWIR a", "SWIR b", "SWIR c", "SWIR d", "SWIR g", and "SWIR h".
In general, SWIR regions always appear to be significant, as does the "NIR" domain to a lesser extent. Moreover, "Red-Edge a" is highlighted for two of the three analyses by plot group.
Figure caption fragment: [...] Table 6 are also reported along the two axes.
Figure caption fragment: [...] Table 6 are also reported along the two axes. (a) Plots A and C; (b) Plot B.
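The exhaustive NDSI search described in this section can be sketched as below. Equation (1) is not reproduced here, so the usual normalized-difference form, (R_i - R_j) / (R_i + R_j), is assumed. Function names are hypothetical and reflectances are assumed to be strictly positive.

```python
def r_squared(x, y):
    """R² of the simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ndsi(r_i, r_j):
    """Normalized difference of two reflectances (assumed form of Equation (1))."""
    return (r_i - r_j) / (r_i + r_j)


def ndsi_r2_matrix(spectra, swp):
    """R² between SWP and the NDSI of every pair of wavelengths.

    Swapping the two wavelengths only flips the sign of the index, so R² is
    unchanged and only the upper triangle needs to be computed.
    """
    n = len(spectra[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            values = [ndsi(sample[i], sample[j]) for sample in spectra]
            matrix[i][j] = matrix[j][i] = r_squared(values, swp)
    return matrix
```

With 1974 wavelengths this amounts to about 1.9 million distinct pairs, so a vectorized implementation would be preferable to this pure-Python loop in practice.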

Summary of the Results for All Correlations Tested on Raw Spectra
The results from the three previous sections are summarized in Figure 18. Each domain defined in Table 6 is reported as a cell in the table. The darker the cell, the greater the significance of the domain (black cell for an importance score of 10, grey cell for an importance score of 1, and white cell for a score of 0). Results show that the two domains most often highlighted (importance criteria ≥ 50) are "SWIR j" and "SWIR h". Four other domains also seem to be relevant (importance criteria ≥ 40): "SWIR e", "SWIR d", "SWIR i", and "SWIR l".
Regarding the level of significance, with all plots, the determination coefficient increases from 0.35 to 0.45 between raw wavelengths and NDSI indices. For plots A and C and for plot B taken separately, the coefficient increases from 0.25 or 0.3 to 0.6 between the two features (and even reaches a maximum of 0.8).
Figure 18. Summary of the three previous sections' results for the three methods, either with the data of all plots, for plots A and C, or for plot B. The darker the cell, the greater the significance of the domain: black cell for an importance score of 10 (best R² or FI), grey cell for an importance score of 1 (significant p-value but lower R² or FI), and white cell for a score of 0 (result not significant). When the spectral domain is emphasized for all plots, the score is doubled. The sum of the scores for each domain is recorded at the bottom of the table and is called the "Importance Criteria".

Linear Regression between SWP and Hyperspectral Vegetation Indices
Based on previous analyses, the calculation of vegetation index with reflectance of two wavelengths seems to give better results than with a single-wavelength reflectance. Therefore, the aim here is to verify the effectiveness of HVIs that are regularly used in the bibliography to describe the water content of vegetation.
According to Table 9, for all the data, the two most relevant HVIs (p-value < 0.0001) are as follows:
• Normalized Difference Water Index (NDWI) with wavelength reflectance in "SWIR c" and "NIR";
• Moisture Stress Index (MSI) with wavelength reflectance in "SWIR g" and "NIR".
For plots A and C, the most significant HVI (p-value < 0.0001) is Leaf Water Index (LWI) with wavelength reflectance in "SWIR c" and "SWIR e".
For plot B, four HVIs appear to be significant (p-value < 0.0001):
• CI with wavelength reflectance in "Red-Edge a" and "Red-Edge b";
• MSI with wavelength reflectance in "SWIR g" and "NIR";
• NDWI with wavelength reflectance in "SWIR c" and "NIR";
• WBI with wavelength reflectance in "NIR".
These HVIs are less strongly correlated with SWP than some NDSIs (R² = 0.4 at most versus R² > 0.6, respectively), as found by [47]. This could be due to the particularities of each data set, which may lead to slightly different spectral responses.
Table 9. Coefficient of determination between HVI and SWP (values in bold are different from 0 at significance level alpha = 0.05).

All Data | Plots A and C | Plot B

Correlation between SWP and Leaves' Spectra Averaged by Domains
The purpose of this section is to verify the previous results on the importance of wavelengths in specific domains for identifying vine water status, this time using the reflectance averaged by spectral domain. This approach helps to smooth out disparities between neighboring wavelengths within the same spectral domain, as they are expected to behave spectrally in the same way. Correlations with each domain taken separately were computed using linear regression and ExtraTrees. The results highlight the same spectral domains and do not provide any additional information. Domain combinations provide more significant results with NDSI or MVI.
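Averaging a spectrum by domain can be sketched as follows. The function name is hypothetical, and the domain limits must come from Table 6, which is not reproduced here. The limits used in the example are placeholders only.

```python
def average_by_domain(wavelengths, reflectance, domains):
    """Mean reflectance of one spectrum inside each spectral domain.

    wavelengths: list of wavelengths in nm.
    reflectance: list of reflectances, aligned with wavelengths.
    domains: {name: (start_nm, end_nm)}, both limits inclusive.
    Domains that contain no measured wavelength are left out of the result.
    """
    averaged = {}
    for name, (start, end) in domains.items():
        values = [r for w, r in zip(wavelengths, reflectance) if start <= w <= end]
        if values:
            averaged[name] = sum(values) / len(values)
    return averaged


# Placeholder limits for illustration (not the limits of Table 6).
example = average_by_domain(
    [700, 710, 720, 1600, 1610],
    [0.25, 0.50, 0.75, 0.40, 0.60],
    {"domain 1": (700, 720), "domain 2": (1600, 1610), "domain 3": (2000, 2100)},
)
```

The averaged values can then be passed to the same regression or index computations as single-wavelength reflectances.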
In general, "SWIR g" and "SWIR d" appear in all the results. Moreover, "NIR" is highlighted for two of the three analyses by plot group. The correlation coefficients obtained here are lower than those obtained with wavelength combinations but confirm the effectiveness of using the SWIR and NIR domains. The previous results are summarized in Table 10. For each analysis, both the chosen thresholds and the identified domains are highlighted. Regarding the latter, each domain defined in Table 6 is reported as a cell in the table. The darker the cell, the greater the significance of the domain (importance score of 0 for a white cell, 1 for a grey cell, or 10 for a black cell). Results show that the domain most often highlighted (importance criteria = 40) is "SWIR g". Four other domains are also relevant to a lesser extent (importance criteria ≥ 20): "NIR", "SWIR b", "SWIR c", and "SWIR d".

Linear Regression between SWP and Multispectral Vegetation Indices
Table 11 shows correlation coefficients between MVI and SWP values. When considering the whole database, the best correlations between MVI and SWP values (p-value < 0.0001) are obtained with the Normalized Difference Infrared Index (NDII) and MSI. They both combine "NIR" with "SWIR g". For plots A and C, the most significant MVIs (p-value < 0.0001 and R² > 0.35) are Red-Edge Position (REP), Normalized Difference Red-Edge (NDRE1 and NDRE2), and Red-Edge Chlorophyll Absorption Index (RECAI); they all combine "NIR" with "Red-Edge a" or "Red-Edge b". Regarding plot B, only LWI, with "SWIR c" and "SWIR e", appears to be significant (p-value < 0.0001).
MVIs highlight the importance of the SWIR domain (in particular, "SWIR c", "SWIR e", and "SWIR g") but also demonstrate the relevance of "NIR" and "Red-Edge a" or "Red-Edge b".
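As an illustration of the two NIR/SWIR indices highlighted for the whole database, the sketch below uses their usual definitions from the literature (MSI as the SWIR to NIR ratio, NDII as the normalized difference of NIR and SWIR). The exact bands and formulas used in this study are assumed rather than quoted, and the argument names are hypothetical.

```python
def msi(nir, swir):
    """Moisture Stress Index: SWIR reflectance divided by NIR reflectance."""
    return swir / nir


def ndii(nir, swir):
    """Normalized Difference Infrared Index: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir)


# Illustrative domain-averaged reflectances (not measurements from the study).
nir_value, swir_value = 0.50, 0.25
print(msi(nir_value, swir_value))   # 0.5
print(ndii(nir_value, swir_value))  # about 0.333
```

With these definitions, NDII = (1 - MSI) / (1 + MSI), so the two indices rank samples in exactly opposite order. This is consistent with both appearing among the best MVIs for the same "NIR" and "SWIR g" combination.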
From the results of our study, it appears that NDVI, an index widely used to monitor vegetation, is not well suited to identifying the water status of grapevines, as it has the lowest determination coefficient.
Table 11. Coefficient of determination between MVI and SWP (values in bold are different from 0 at significance level alpha = 0.05).

All Plots | Plots A and C | Plot B

Discussion
The main objectives of this paper are to (1) highlight the best spectral domains related to vine water status and (2) identify the most promising vegetation indices that could be used with multispectral data. Table 12 synthesizes all the results obtained by wavelengths and by domains. The first analysis by wavelengths highlighted the SWIR domains from "SWIR g" to "SWIR j" plus "SWIR e" and "SWIR l". These domains coincide with water absorption ("SWIR e" and "SWIR j") and with the maximum reflectance in the SWIR ("SWIR g" to "SWIR i" and "SWIR l"). The second analysis by domains highlighted the "SWIR g" corresponding to a maximum reflectance in the SWIR. "NIR" and "SWIR b" to "SWIR d" are also emphasized.

Summary of the Results
Regarding VI, when considering all plots, the most promising HVI and MVI include NIR and SWIR, in particular, "SWIR g" (NDWI, MSI, and NDII). For plots A and C, the most relevant HVI is LWI with "SWIR c" and "SWIR e", while the most relevant MVIs are REP, NDRE, RECAI, and Inverted Red-Edge Chlorophyll (IRECI) with "NIR", "Red-Edge a", "Red-Edge b", and "Red" or "Green". For plot B, the most relevant HVI is CI with "Red-Edge a" and "Red-Edge b", whereas the most relevant MVI is LWI with "SWIR c" and "SWIR e".
In this paper, we aim to identify the most suitable VIs for processing satellite images, especially images acquired by Sentinel-2. The results presented here show that the most promising VIs use the following domains: "NIR" and "SWIR g" (NDII and MSI), and "NIR", "Red-Edge", and "Red" or "Green" (REP, RECAI, IRECI, and NDRE).
Table caption fragment: [...] Figure 18 are gathered at the top for each feature. The bands that compose the vegetation indices (VI) are checked and shaded. The determination coefficients obtained between each VI and SWP value are also gathered at the right of the table for all plots, or for plots A and C and plot B taken separately. The most important values (IC and R²) are shown in darker green.

Spectral Domain and VI Sensitivity
In accordance with the literature [26,28,36,48,56,67], the wavelengths most correlated with SWP belong to the SWIR domain, in particular, to the minimum ("SWIR e" and "SWIR j") that corresponds to water absorption, or to the maximum ("SWIR c", "SWIR d", "SWIR g", "SWIR i") reflectance in this range. Those particular spectral signatures (1259 nm in "SWIR c", 1264 nm in "SWIR c", and 1334 nm in "SWIR d") indeed correspond to the vibrations of the O-H stretch in water molecules [28].
On another level, the Red-Edge domain also seems to be of particular interest. This domain gives insight into leaf chlorophyll concentration [62,68,69], which is itself related to the availability of water in the plant [70][71][72]. The "Red-Edge" domain has already been linked to the water status of the vine by [20], who estimated crop coefficient (Kc) using a time series of Sentinel-2 images, or by [73], who correlated stem water potential and UAV data.
NIR combined with Red-Edge or SWIR, and Red combined with Green, also appear to be of interest, although slightly less so than the domains discussed above. First, a water absorption band is located within the NIR, around 980 nm. Second, this domain gives insights into leaf morphology and structure [47,74], which are also affected by water content. "Green" and "Red" are related to pigments present in plants (xanthophyll and chlorophyll), which also react to water stress, according to [26].
Among the most promising indices according to this study, only MSI is directly related to the water content of the vegetation [24,56,75,76]. NDRE is usually used to assess chlorophyll content [51], but [77] already mentioned that it could also be used to identify drought stress. In the literature, the four other indices (NDII, REP, IRECI, and RECAI) are used only for chlorophyll concentration and biomass assessment [59][60][61][62]. Nevertheless, as mentioned before, those indices may still reflect the impact of water stress on plants, as suggested by [73], as they provide insights into nutrient assimilation or chlorophyll content and, therefore, into biomass.
In this paper, we have highlighted the SWIR, Red-Edge, and NIR as the most interesting spectral domains for monitoring water status. Nevertheless, finding a parameter that responds specifically to water status under operational conditions is not straightforward, as these domains can also be affected by other stresses such as diseases or deficiencies, especially the Red-Edge and NIR domains. Further experiments will be needed to test the robustness of these domains or VIs for vine water status monitoring over vineyards that are affected by other types of stress. Whatever the results, however, information from remote sensing must always be considered as a decision support tool and will never replace field knowledge and observations.

Correlation Differences between Wavelengths and Domain Reflectance
Correlations with domains are always weaker than with wavelengths. For example, for plots A and C, LWI computed with wavelength values leads to a coefficient of determination of 0.62, whereas the same vegetation index computed with reflectance averaged by domain leads to a coefficient of determination of 0.19. The best coefficient of determination between MVI and SWP values is 0.48, while between HVI and SWP values it is 0.62. This can be explained by the way the domains' reflectances are computed, by averaging the reflectance values of 30 to 250 wavelengths while taking Sentinel-2 bands into account. Averaging can erase particularly strong correlations that hold only for certain individual wavelengths. This choice therefore has arbitrary aspects and could be improved in future explorations. Indeed, using finer and more focused domains in the areas of interest could give better results if the objective is not to use them with data from an already launched satellite.
In this paper, hyperspectral measurements were used to highlight the most relevant domains that can be used with a multispectral satellite such as Sentinel-2. However, this paper also underlines other domains that are not yet present or usable in current multispectral satellites but could be in the future, at a reasonable price for temporal monitoring. For this purpose, analyses were almost exclusively focused on spectral indices (NDSI, HVI, and MVI). Outside the scope of a commercial monitoring service, hyperspectral data make it possible to implement more specific analysis methods, such as Gaussian or chemometric processing, which can use the shape of the whole spectrum.
Nowadays, the available and affordable commercial satellites for Earth Observation (EO) are multispectral. However, more and more programs rely on hyperspectral satellites for EO. These include the PRecursore IperSpettrale della Missione Applicativa (PRISMA) satellite from the Italian Space Agency, which was launched on 22 March 2019. It has 250 spectral bands from 400 nm to 2500 nm with a spatial resolution of 30 m and a panchromatic acquisition at 5 m. Another European hyperspectral satellite is the German Environmental Mapping and Analysis Program (EnMAP), which is in a development and production phase and should be launched in 2021. It has 230 spectral bands from 420 to 2450 nm with a spatial resolution of 30 m and a temporal resolution of four days. A last mission worth citing is the Indian Hyperspectral Imaging Satellite (HySIS), launched in 2018, with 256 bands from 400 nm to 2400 nm and a spatial resolution of 30 m.

Correlation Differences between Plots
The analysis often performs better for plots A and C or plot B taken separately than with all plots together. For example, the RMSE of the ExtraTrees algorithm for plots A and C (0.14) or plot B (0.17) is markedly lower than the RMSE with all plots (0.25). This can be explained by the sample variability. Indeed, plots A and C are planted with red grape varieties while plot B is planted with a white one. Moreover, plot B has been less stressed than plots A and C.
The correlations are generally slightly stronger for plots A and C than for plot B. Moreover, for plot B, the significant wavelengths are almost exclusively those in the SWIR. The very low values of SWP for plots A and C may have affected leaf metabolism, chlorophyll content, and leaf structure more than for plot B. This could explain why the Red-Edge and NIR domains are more sensitive for these data, as they are related to plant structure and pigment content.
Moreover, some wavelengths appeared to be relevant for plots A and C and for plot B taken separately, but no longer when all the plots were grouped together. For example, NDSIs combining wavelengths in "SWIR j" with wavelengths in "SWIR k" to "SWIR m" seem to be highlighted for plots A and C in Figure 17a and for plot B in Figure 17b, but in Figure 16, these wavelengths no longer appear to be relevant. The same occurs for NDSI with wavelengths in "SWIR g" and "SWIR h" and NDSI with wavelengths in "SWIR j" and "SWIR g", which seem to be highlighted in Figure 17 but no longer in Figure 16. This can be explained by the distribution of the data for each plot, as shown in Figure 21. Indeed, as in Figure 21a, even if an NDSI seems to be well correlated with SWP values for plots A and C (R² = 0.59) or for plot B (R² = 0.55), with all plots, the correlation will be lower (R² = 0.33), as the distribution of the data is not the same for the two data sets. The result is even more obvious in Figure 21b,c, where there is almost no correlation when all plots are aggregated (R² = 0.12 and R² < 0.1, respectively).
This demonstrates the importance of taking into account the differences between plots or varieties. In fact, varieties react in different ways according to their characteristics and their adaptations to water stress [78].
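The pooling effect discussed in this section can be reproduced with a small synthetic example. The values below are invented for illustration and are not measurements from the study. Each group follows the same perfectly linear index-SWP relationship, but with a different SWP offset.

```python
def r_squared(x, y):
    """R² of the simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


# Synthetic data: same index range and slope, different SWP level per group.
index_ac = [0.10, 0.12, 0.14, 0.16, 0.18]
swp_ac = [-1.4, -1.3, -1.2, -1.1, -1.0]  # more stressed group
index_b = [0.10, 0.12, 0.14, 0.16, 0.18]
swp_b = [-0.5, -0.4, -0.3, -0.2, -0.1]   # less stressed group

r2_ac = r_squared(index_ac, swp_ac)
r2_b = r_squared(index_b, swp_b)
r2_pooled = r_squared(index_ac + index_b, swp_ac + swp_b)
print(r2_ac, r2_b, r2_pooled)
```

Within each group the fit is essentially perfect, while the pooled R² falls below 0.1: the offset between groups adds SWP variance that the index cannot explain.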

Choice of Analysis Method
The two chosen analyses do not always show exactly the same results. This may be because one (linear regression) emphasizes linear relationships and the other (ExtraTrees) nonlinear ones. In addition, linear regression used each wavelength separately, while ExtraTrees used all wavelengths together. Moreover, regarding the choice of regression, the correlations are not always very high but remain significant (p-value < 0.0001). It would be reasonable to investigate classification methods further. The results of classification tests with the SVM algorithm are promising, with accuracy rates above 95% for plots A and C with NDSI by wavelengths (Figure 22b) and above 85% with NDSI by domains (whereas R² remains below 0.6 with regression). Moreover, the same domains as those found with the regression method seem to be the most efficient (Figure 22a), except for the four cases shown as dotted lines. However, classifying SWP values into three categories may be limiting for further investigation. Indeed, these classes are well suited to the red varieties but not to the white one (see Figure 7). Moreover, they are not adapted to all development stages (Figure 6). Nevertheless, the SWP value distribution of the database used for this study does not allow the creation of more classes, as the number of samples per class would not be well balanced.

From Hyperspectral to Multispectral Data
As highlighted in the introduction, this work is a step towards our main final objective: to identify the most suitable spectral domains or VIs for water status monitoring using multispectral satellite imagery. This paper focuses on the reflectance of vine leaves, and we are aware that the results obtained here cannot be directly extrapolated to satellite data, as a pixel signature is usually a mix of leaf, soil, and shadow signatures. Moreover, each multispectral band takes into account several wavelengths according to a distribution function specific to each band and each sensor.
In order to provide an intermediate step in terms of spatial resolution between leaves' spectra and satellite imagery, additional measurements were also made on the vine canopy (about 30 cm above, at arm's length) during the field campaign. The measured reflectance is, in this case, no longer related to pure end-members, but rather, is a mixture of vine leaves, soil under the vine row, and shadows.
The final canopy database is composed of 113 spectra. These data were analyzed following the methodology described in the paper, and the results globally show the same trend as for leaves' spectra but with much lower levels of significance. This can be explained by the mixture effect and/or the environmental conditions (unstable weather with clouds passing by, three operators with possible differences in height above the foliage during the measurement, differences in the height of the foliage within the plots, etc.). For instance, in Figure 23, one can see that the feature importance algorithm highlights nearly all the same domains but with feature importance values four times smaller. The two differences are located around 1750 and 2000 nm (dotted lines in Figure 1).
In the further transition from hyperspectral acquisitions to multispectral images, we can expect other impacts on the significance of the results. A first step in future work could be to mimic the Sentinel-2 sensors, for example by applying the distribution function of each band to the hyperspectral measurements.
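Mimicking a multispectral band from a hyperspectral spectrum amounts to a weighted average of the reflectances, with weights given by the band's distribution (spectral response) function. The sketch below uses a Gaussian response as a hypothetical stand-in. The center and width values are illustrative, and the actual response functions published for the sensor should be used in practice.

```python
import math


def gaussian_response(wavelengths, center, fwhm):
    """Hypothetical Gaussian band response (stand-in for a real sensor curve)."""
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]


def simulate_band(reflectance, response):
    """Response-weighted mean reflectance over the band."""
    total = sum(response)
    return sum(r * w for r, w in zip(reflectance, response)) / total


# Illustrative band centered at 1610 nm with a 90 nm full width at half maximum.
wavelengths = list(range(1500, 1721))
response = gaussian_response(wavelengths, center=1610, fwhm=90)
flat_spectrum = [0.3] * len(wavelengths)
band_value = simulate_band(flat_spectrum, response)  # stays at 0.3 for a flat spectrum
```

Unlike the plain domain average, this weighting gives more influence to wavelengths near the band center, which is closer to what a satellite sensor records.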

Conclusions
This paper investigates which spectral domains appear to be the most relevant for assessing vine water status. To achieve this goal, the hyperspectral reflectance of leaves was measured between 350 nm and 2500 nm on four vine plots receiving different amounts of water, every two weeks from July to August. Correlations between leaves' reflectance and a measure of vine water status, the SWP, were investigated by regression. First, results show the relevance of the SWIR domain, which is directly sensitive to water content. Additional results also reveal the benefits of the NIR and Red-Edge bands, which are indirectly linked with water content through its impact on chlorophyll content and cell structure. The ultimate goal of the project is to verify the effectiveness of using Sentinel-2 images to monitor vine water status. The most relevant multispectral vegetation indices that can be used with Sentinel-2 appear to be those with NIR and Red-Edge bands (REP, NDRE2, and RECAI) and those with NIR and SWIR bands (NDII and MSI).
The longer-term objective is to verify whether the spectral bands present in Sentinel-2 allow the water status of vineyards to be monitored. Further research is in progress using Sentinel-2 images to investigate vegetation indices as well as machine learning methods (regression and classification) and to test data from several years and locations. The aim is to provide temporal and spatial monitoring of vineyard water status at a large scale.
Data Availability Statement: Not applicable.
They also thank the winegrowers and the Domaine du Chapitre for allowing measurements to be carried out in their fields. The first author also thanks Florian Mouret and Sebastien Cuq for their support in data analysis and Montaine Foch for the help in the field.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: