Field Spectroscopy: A Non-Destructive Technique for Estimating Water Status in Vineyards

Abstract: Water status controls plant physiology and is key to managing vineyard grape quality and yield. Water status is usually estimated by leaf water potential (LWP), which is measured using a pressure chamber; however, this method is difficult, time-consuming, and error-prone. While traditional spectral methods based on leaf reflectance are faster and non-destructive, most are based on vegetation indices derived from satellite imagery (and so only take into account discrete bandwidths) and do not take full advantage of modern hyperspectral sensors that capture spectral reflectance for thousands of wavelengths. We used partial least squares regression (PLSR) to predict LWP from reflectance values (wavelength 350–2500 nm) captured with a field spectroradiometer. We first identified wavelength ranges that minimized regression error. We then tested several common data pre-processing methods to analyze the impact on PLSR prediction precision, finding that derivative pre-processing increased the determination coefficients of our models and reduced root mean squared error (RMSE). The models fitted with raw data obtained their best results at around 1450 nm, while the models fitted with derivative pre-processed data achieved their best estimates at 826 nm and 1520 nm.


Introduction
Proper vine development depends on water availability, which affects plant physiology and, therefore, yield and grape composition. Irrigation scheduling is crucial because water deficiency negatively affects plant growth by modifying leaf pigment composition, reducing content in biochemical elements, and affecting turgor and cell enlargement [1]. In water stress conditions, photosynthesis slows down and metabolism is affected by alterations in the transport and uptake of nutrients, retarding plant growth [2] and sometimes even resulting in vine death.
Several methods can be used to estimate water status in crops. In viticulture, the main method is measurement of leaf water potential (LWP) [3], which requires measuring sap pressure in the xylem [4] using a pressure chamber. In California's San Joaquin Valley vineyards, the water status of vines was measured using a pressure chamber combined with near-infrared techniques [5]. Other authors used a pressure chamber to estimate vine water status, concluding that vine water content is a key factor in final berry mass [6]. However, although measuring leaf pressure provides the most accurate assessment of plant water status, this method is destructive, slow, and laborious for estimating water spectrum along the vertical axis, while multiplicative effects modify the local slope of the spectrum. Multiplicative combinations of these effects are the major factor inhibiting the interpretation of near-infrared diffuse reflectance spectra [30]. To minimize these effects, pre-processing spectral data may improve prediction models, enhancing the signal-to-noise ratio and the accuracy of the prediction models [31]. A well-designed pre-processing step will greatly improve model performance and avoid multi-collinearity. Derivative transformation [32,33] is typically used to smooth spectral data so as to improve the relationship between dependent and independent variables.
One way to minimize the problems associated with the external factors described above and with diffuse and specular reflectance is to use normalization tools. Normalization consists of a set of transformations whose purpose is to standardize the spectral signatures so that all data are at a similar scale. Several normalization methods exist, as described by CAMO Software AS [34]. Moreover, several spectral transformations can be used to mitigate spectral noise and diffuse and specular reflectance, including standard normal variate (SNV), multiplicative scatter correction (MSC), de-trending, and continuum removal (CR).
SNV is a pre-processing method that can be applied to each spectrum individually to remove scatter. SNV calculates the mean and standard deviation of all the data points for a given spectrum, subtracts the mean from each data point, and divides the result by the standard deviation [35]. Previous research found that, when using SNV, the diffuse reflectance spectra were free from multi-collinearity and there was no confusion from the complexity of shapes encountered in derivative spectroscopy [30]. MSC minimizes additive and multiplicative effects in spectral data by eliminating optical interference [36], removing physical effects produced by particle size and surface blaze from the spectra. MSC assumes that each spectrum is determined by sample characteristics and by particle size. Particle size can be represented by a baseline effect and the trend by means of a standard spectrum. MSC corrects the differences in the baseline and trend so that the transformed spectra are similar to the original spectra. Studies suggest that when MSC is used for data pre-processing, the coefficients obtained are robust for online prediction of material properties [36]. De-trending removes curvilinearity and baseline shift, improving wavelength dependency on the spectra [32]. This technique was successfully used to correct the effects of diffuse reflectance [30]. CR, another common spectral transformation, identifies and highlights absorption features of interest [37]. It normalizes the reflectance spectra and enables individual reflectance absorption features to be compared from a common baseline [38]. CR spectral data can be used to estimate water content using PLSR as a statistical method. Some works showed that hyperspectral data and CR transformation improved methods for estimating water status in individual vines [39], with the most suitable models being those that used spectral data pre-processed by CR and PLSR [25].
Although, as indicated above, there is sufficient precedent to consider spectral analysis a suitable technique for determining plant water status, no exhaustive analysis has yet been published regarding which bands of the spectrum present the best correlations or how data transformation or pre-processing affects these correlations. The aim of our research was to establish the most suitable procedure for estimating vine leaf water status by determining (i) the most suitable central wavelengths and bandwidths and (ii) the most suitable spectral correction tool to estimate water potential. The PLSR method was used to fit the models, using water status as determined by a pressure chamber as the dependent variable and spectral data collected using a field spectroradiometer as the input.

Study Site and Experimental Layout
The research was conducted in a Tempranillo variety vineyard located within the Ribera de (bilateral cordon, vertical shoot positioning with two pairs of wires, 110-Richter rootstock, and row spacing 3.0 × 1.25 m).
Nine sets were created in the plot, with three blocks, each representing a water treatment: Without irrigation, 50% of water needs, and 100% of water needs (Figure 1). Twelve leaves from each block (a total of 36 leaves) were measured.

General Workflow
The methodology involved three main steps: Data acquisition (leaf water status and spectral measurements), data processing (pre-processing and spectral transformation), and statistical analyses (model fitting and validation) (Figure 2). Spectral data were collected immediately before leaf water status (water potential) measurement. The spectral data were pre-processed to obtain the mean spectral signatures of each leaf.
Six normalization methods were applied to reduce noise in the spectral data. Four derivative methods (GAP (Norris Gap) derivative and Savitzky-Golay derivative, both with first and second derivatives) and SNV, MSC, de-trending, and CR methods were used to study the relationship between the spectral data and LWP. PLSR was used to estimate leaf water status from the spectral data and the suitability of the regression models was assessed by cross-validation.

Leaf Water Status
Twelve vines per block were marked and 36 leaves were selected (mature leaves located opposite the first cluster of the central shoot). The selected leaves were picked immediately after measurement. For each leaf, midday LWP was measured using an SF-Pres-70 pressure chamber equipped with a digital manometer (Solfranc Tecnologías SL, Vilaseca, Tarragona, Spain).
Pressure chambers are widely used to measure water potential and pressure-volume ratios in leaves [40]. A pressure chamber applies pressure to a leaf to measure the flow rate of water for a given pressure drop between the inside and outside of the chamber [40]. The pressure necessary to extract sap from the leaf is an indicator of the leaf water status (a higher pressure means higher water stress).

Spectral Data
Spectral signatures were measured using an ASD FieldSpec 4 Portable Spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO, USA). This spectroradiometer captured spectral data over the 350–2500 nm wavelength range. Three detectors covered the required spectral range: A visible near-infrared (VNIR) silicon photodiode array (350–1000 nm) and two graded indium gallium arsenide photodiode detectors for short-wavelength infrared (SWIR) measurement (1001–1800 nm and 1801–2500 nm). The arrangement of the detectors caused jumps or splice points in the spectral signature that were not due to reflectance in the leaf; these jumps (at 1000 nm and at 1800 nm) are clearly visible in Figure 3. The ASD FieldSpec 4 used a flexible fibre-optic cable to aim at the target.

In this work, the spectral measurements corresponded to reflectance, calculated as the ratio of the energy reflected by the observed leaf to the energy reflected by a known reference panel (sometimes called a calibration panel). This reference panel was a diffuse white reflectance panel that reflected nearly 100% of the source energy as a homogeneous, diffuse mix.
Data were collected following the recommendations of the spectroradiometer manufacturer [29]. Calibration was done with a white panel face and recalibration was done before measuring the first leaf of each block [29]. A plant probe was used to collect non-destructive spectral data from leaves. The plant probe accessory consisted of a low-intensity quartz-halogen bulb, a grip to position the fibre-optic cable input to the spectroradiometer, and a quartz window to press the probe against the surface of interest [41]. The nominal spot size of the plant probe was 10 mm. A leaf clip was also used to hold the target leaf in position without removing it or inflicting damage, while allowing single-handed functionality (Figure 4).

Figure 3. Spectral signatures of leaves with high water potential (black line) and low water potential (blue line), plotted against wavelength (nm) as averages with standard deviations. The spectral signatures were drawn from seven samples each for the highest and the lowest water potential.
Spectral data were collected from the upper face of the leaf. Measurements were taken three times at three different points (avoiding veins, holes, and leaf spots) in order to ensure a representative spectrum of each leaf. Acquiring the spectral signature took about six seconds per leaf.

Pre-Processing
Spectral reflectance data were checked using ViewSpec Pro 6.0 software (Analytical Spectral Devices, Inc., Boulder, CO, USA) and exported to Spectral Analysis and Management System (SAMS) software 3.2 (Center for Spatial Technologies and Remote Sensing, University of California, Davis, CA, USA). Mean spectral signatures (one per leaf sample) were calculated. In order to mitigate the effects of mismatches between detectors, jump correction was applied at 1000 nm and 1800 nm, as follows: Bias values were calculated for the VNIR and SWIR2 regions and were used to offset the SWIR1 at the splice point. The resulting spectral data were normalized using Unscrambler® X 10.2 software and six different normalization techniques: (i) Area normalization, (ii) unit vector normalization, (iii) mean normalization, (iv) maximum normalization, (v) range normalization, and (vi) peak normalization.
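The six normalization variants listed above each divide a spectrum by a single scaling factor. The following minimal Python sketch illustrates one common set of definitions; it is not the Unscrambler implementation, and the function name `normalize`, the `peak_index` parameter, and the exact choice of factor for each variant are assumptions made for illustration.

```python
import math

def normalize(spectrum, method, peak_index=0):
    """Divide one spectrum (list of reflectance values) by a scaling factor.

    The factor definitions below are assumed, commonly used forms.
    """
    if method == "area":
        factor = sum(abs(v) for v in spectrum)            # sum of absolute values
    elif method == "unit_vector":
        factor = math.sqrt(sum(v * v for v in spectrum))  # Euclidean norm
    elif method == "mean":
        factor = sum(spectrum) / len(spectrum)            # arithmetic mean
    elif method == "maximum":
        factor = max(abs(v) for v in spectrum)            # largest absolute value
    elif method == "range":
        factor = max(spectrum) - min(spectrum)            # max minus min
    elif method == "peak":
        factor = spectrum[peak_index]                     # value at a chosen band
    else:
        raise ValueError("unknown normalization method: " + method)
    if factor == 0:
        raise ValueError("scaling factor is zero; spectrum cannot be normalized")
    return [v / factor for v in spectrum]

# Illustrative values only (not measured data).
example = [0.2, 0.4, 0.4, 0.6]
area_normalized = normalize(example, "area")
```

All six variants rescale the spectrum without changing its shape, so they can only compensate for overall intensity differences between measurements.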

Transformation
Spectral data pre-processed by a jump correction tool were transformed using five techniques: (i) Derivatives, (ii) SNV, (iii) MSC, (iv) de-trending, and (v) CR transformation. In this section, we describe an approach based on work described in the literature (mainly [32], but also [35,36]).
Baseline drift due to scattering (also referred to as the slope effect) and overlapping peaks were removed by applying derivative algorithms to the spectra. With this transformation, reflectance peaks were in the same place as in the original spectra [32]. Let the spectrum reflectance $x_i$ be a function of wavelength $\lambda$. The derivatives of the function are given as:
Zero order:
$$x_i = f(\lambda)$$
First order, taking the difference between two adjacent wavelength bands:
$$\frac{dx_i}{d\lambda} \approx \frac{f(\lambda + \Delta\lambda) - f(\lambda)}{\Delta\lambda}$$
Second order:
$$\frac{d^2 x_i}{d\lambda^2} \approx \frac{f(\lambda + \Delta\lambda) - 2f(\lambda) + f(\lambda - \Delta\lambda)}{\Delta\lambda^2}$$
where $\Delta\lambda$ is the difference between the $\lambda$ values of adjacent data points. Four derivative transformations were applied because the derivatives of the reflectance values allow differences between spectrum bands to be increased and also allow correction of the baseline effects [33]. The GAP derivative and Savitzky-Golay derivative were applied, using the first and second derivatives in both cases. The first derivative eliminated baseline displacements parallel to the abscissa axis, while the second derivative eliminated terms that varied linearly with the wavelength.
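The baseline-removal behaviour of the first and second derivatives can be illustrated with plain finite differences. The sketch below is a simplification: it uses neither the GAP nor the Savitzky-Golay algorithm applied in this study, and the function names and synthetic values are assumptions for illustration.

```python
def first_derivative(x, dlam):
    """Forward difference (x[j+1] - x[j]) / dlam; result is one value shorter."""
    return [(x[j + 1] - x[j]) / dlam for j in range(len(x) - 1)]

def second_derivative(x, dlam):
    """Central difference (x[j+1] - 2*x[j] + x[j-1]) / dlam**2 at interior points."""
    return [(x[j + 1] - 2 * x[j] + x[j - 1]) / dlam ** 2
            for j in range(1, len(x) - 1)]

# Synthetic spectrum (illustrative values only), sampled every 1 nm.
base = [0.3 + 0.002 * j * j for j in range(6)]
shifted = [v + 0.1 for v in base]                    # constant baseline offset
tilted = [v + 0.05 * j for j, v in enumerate(base)]  # baseline rising linearly

# The first derivative is unaffected by the constant offset,
# and the second derivative is unaffected by the linear tilt.
d1_base, d1_shifted = first_derivative(base, 1.0), first_derivative(shifted, 1.0)
d2_base, d2_tilted = second_derivative(base, 1.0), second_derivative(tilted, 1.0)
```

Plain differences amplify noise, which is one reason smoothing derivative filters such as Savitzky-Golay are generally preferred on measured spectra.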
SNV was used to remove multiplicative and additive interferences resulting from raw spectral data scatter and particle size variability. The scatter correction was a row-oriented transformation that standardized spectra using individual means and standard deviations. We first calculated the mean $\bar{x}_i$ and standard deviation $s_i$ for each spectrum $i$, held as the $m \times 1$ column vector $\mathbf{x}_i$. We then subtracted the mean from each data point $x_{ij}$ and divided the result by the standard deviation [32]:
$$x_{ij}^{(\mathrm{SNV})} = \frac{x_{ij} - \bar{x}_i}{s_i}$$
MSC offers similar advantages to SNV in removing the baseline effect for both translation and offset in the spectra. Multiplicative and additive spectral corrections were applied to the original spectral data means, with the corrected spectra producing a relatively consistent baseline. For each $j$ of the $m$ wavelengths, we calculated the mean of all $n$ spectra, obtaining an $m \times 1$ column vector of a standard spectrum $\mathbf{m}$. We then performed the simple linear regression $x_{ij} = a_i + b_i m_j + e_{ij}$ on each spectrum $\mathbf{x}_i$ of the $n$ spectra in $X$ (as the dependent variable), relative to the $m \times 1$ vector $\mathbf{m}$ (as the independent variable). Given the solution obtained using the ordinary least squares regression (OLSR) method, the regression coefficient $b_i$ and intercept $a_i$ were used to correct baseline scatter by subtracting $a_i$ from each spectrum $\mathbf{x}_i$ and dividing by $b_i$ [32]:
$$x_{ij}^{(\mathrm{MSC})} = \frac{x_{ij} - a_i}{b_i}$$
De-trending accounts for variations in baseline shift and curvilinearity, generally found in the reflectance spectra of powdered or densely packed samples, using second-degree polynomial regression [30]. The procedure subtracts the polynomial fit from each spectrum $i$. We define a second-order polynomial equation [32]:
$$x_{ij} = a_i + b_i \lambda_j + c_i \lambda_j^2 + e_{ij}$$
where the intercept $a_i$ and the regression coefficients $b_i$ and $c_i$ are determined using the OLSR method. The de-trended spectrum $x_{ij}^{(\mathrm{detrend})}$ is calculated as:
$$x_{ij}^{(\mathrm{detrend})} = x_{ij} - \left(a_i + b_i \lambda_j + c_i \lambda_j^2\right)$$
where $\lambda_j$ is the wavelength of band $j$.
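The SNV, MSC, and de-trending steps described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the Unscrambler implementation; the function names are hypothetical, the sample (n − 1) standard deviation in SNV is an assumption, and de-trending fits the polynomial on rescaled wavelengths purely for numerical stability (the residuals are the same as for a fit on raw wavelengths).

```python
import math

def _mean(v):
    return sum(v) / len(v)

def snv(x):
    """Standard normal variate: centre one spectrum and scale by its own SD."""
    mu = _mean(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / (len(x) - 1))  # assumed n-1
    return [(v - mu) / sd for v in x]

def msc(spectra):
    """Regress each spectrum on the mean spectrum, then remove a_i and b_i."""
    m = [_mean([s[j] for s in spectra]) for j in range(len(spectra[0]))]
    m_bar = _mean(m)
    sxx = sum((mj - m_bar) ** 2 for mj in m)
    corrected = []
    for x in spectra:
        x_bar = _mean(x)
        b = sum((mj - m_bar) * (xj - x_bar) for mj, xj in zip(m, x)) / sxx
        a = x_bar - b * m_bar
        corrected.append([(xj - a) / b for xj in x])
    return corrected

def _solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (M[r][3] - sum(M[r][k] * sol[k] for k in range(r + 1, 3))) / M[r][r]
    return sol

def detrend(x, lam):
    """Subtract a least-squares second-degree polynomial in wavelength."""
    lo, hi = min(lam), max(lam)
    t = [(l - lo) / (hi - lo) for l in lam]  # rescaled wavelength in [0, 1]
    rows = [[1.0, tj, tj * tj] for tj in t]
    A = [[sum(p[r] * p[c] for p in rows) for c in range(3)] for r in range(3)]
    y = [sum(p[r] * xj for p, xj in zip(rows, x)) for r in range(3)]
    a, b, c = _solve3(A, y)
    return [xj - (a + b * tj + c * tj * tj) for xj, tj in zip(x, t)]
```

SNV and de-trending operate on each spectrum independently, whereas MSC depends on the mean of the whole set, so MSC-corrected values change whenever spectra are added to or removed from the dataset.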
Derivative, SNV, MSC, and de-trending transformations were computed using Unscrambler®X 10.2 software. CR transformation identified and highlighted absorption features of interest [37], normalizing the reflectance spectra so that individual reflectance absorption features could be compared with a common baseline [38]. The CR transformations were computed using ENVI®4.7 software (www.ittvis.com; IDL, Workbench).
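Continuum removal is commonly implemented by fitting an upper convex hull over the spectrum and dividing the reflectance by that hull. The sketch below follows this common approach under stated assumptions; it is not the ENVI routine, the function names are hypothetical, and it assumes strictly increasing wavelengths and positive reflectance values.

```python
def upper_hull(lam, x):
    """Indices of the points on the upper convex hull of (lam[j], x[j])."""
    hull = []
    for j in range(len(lam)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = ((lam[i1] - lam[i0]) * (x[j] - x[i0])
                     - (x[i1] - x[i0]) * (lam[j] - lam[i0]))
            if cross >= 0:   # middle point lies on or below the chord: drop it
                hull.pop()
            else:
                break
        hull.append(j)
    return hull

def continuum_removed(lam, x):
    """Divide each reflectance value by the hull (continuum) at its wavelength."""
    hull = upper_hull(lam, x)
    out = []
    seg = 0
    for j in range(len(lam)):
        # Advance to the hull segment that contains wavelength lam[j].
        while lam[hull[seg + 1]] < lam[j]:
            seg += 1
        i0, i1 = hull[seg], hull[seg + 1]
        w = (lam[j] - lam[i0]) / (lam[i1] - lam[i0])
        continuum = x[i0] + w * (x[i1] - x[i0])  # linear interpolation on hull
        out.append(x[j] / continuum)
    return out

# Illustrative values only: a single absorption dip between two shoulders.
wavelengths = [0, 1, 2, 3, 4]
reflectance = [0.4, 0.5, 0.3, 0.5, 0.4]
cr = continuum_removed(wavelengths, reflectance)
```

In the continuum-removed spectrum, points on the hull take the value 1 and absorption features appear as dips below 1, which is what allows features to be compared from a common baseline.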

Statistical Methods
Leaf water status estimation models were obtained by PLSR, which used continuous parts of the spectrum as input data.

Partial Least Squares Regression
To estimate leaf water status, PLSR was used to build models from the full spectrum. Multiple linear and stepwise regressions were applied along with several response variables simultaneously, correcting for collinearity and noisy independent variables [42]. PLSR functions by combining the most useful information from hundreds of bands into the first several factors, with the least important factors including background effects [37]. A smaller number of factors explaining most of the total variance in the data results in a simpler model architecture and greater prediction accuracy for the response variable [42]. Our criterion to add an additional factor to the model was that it had to reduce the root mean squared error (RMSE) by at least 2% to ensure model parsimony [24].
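The parsimony criterion described above (keep an additional factor only if it lowers the RMSE by at least 2%) can be sketched as follows. The function name, the sequential stopping rule (stop at the first factor that fails the criterion), and the RMSE values are assumptions for illustration.

```python
def select_n_factors(rmse_by_factor, min_gain=0.02):
    """Return the number of factors to keep.

    rmse_by_factor[k] is the cross-validated RMSE of the model with k + 1
    factors. A factor is accepted only if it reduces the RMSE of the previous
    model by at least min_gain (2% by default); selection stops at the first
    factor that fails this test.
    """
    n_factors = 1
    for k in range(1, len(rmse_by_factor)):
        previous, current = rmse_by_factor[k - 1], rmse_by_factor[k]
        if current <= previous * (1.0 - min_gain):
            n_factors = k + 1
        else:
            break
    return n_factors

# Hypothetical RMSE values (MPa) for models with 1 to 5 factors.
rmse_values = [0.28, 0.24, 0.21, 0.208, 0.19]
chosen = select_n_factors(rmse_values)
```

With these hypothetical values the fourth factor improves the RMSE by less than 2%, so selection stops at three factors even though a fifth factor would lower the error further; a rule that searched all factors instead would behave differently.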
Measured water potential (MWP) was used as the response (dependent) variable and pre-processed and transformed spectral data were used as the predictor (independent) variables. Numerous correlations were analyzed, taking bands of different widths across the entire spectrum between 350 nm and 2500 nm. In terms of bandwidth, bands were taken from a very narrow bandwidth of 20 nm (twice the spectral resolution of the spectroradiometer) up to 2150 nm. In the middle, all bandwidths were incremented by an arbitrary step of 5 nm (twice the spectral resolution). The entire spectrum was run with all bandwidths in steps of 5 nm between bands. In total, 91,378 different PLSR correlations were analyzed for each dataset. The following datasets were considered: (i) MWP versus full reflectance spectrum from 350 nm to 2500 nm with jump correction. (ii) MWP versus full reflectance spectrum from 350 nm to 2500 nm with jump correction and area normalization. PLSR estimates were made using the plsregress function in MATLAB software (MathWorks, Inc., Natick, MA, USA).
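One enumeration of bands that is consistent with the description above (widths from 20 nm to 2150 nm in 5 nm increments, each width slid across 350–2500 nm in 5 nm steps) can be sketched as follows. The function name and the choice to slide the band start rather than its centre are assumptions; the resulting count matches the 91,378 regressions reported.

```python
def enumerate_bands(lo=350, hi=2500, min_width=20, max_width=2150, step=5):
    """List (start_nm, end_nm) for every band width and position."""
    bands = []
    for width in range(min_width, max_width + 1, step):
        # Slide the band so that it always stays inside [lo, hi].
        for start in range(lo, hi - width + 1, step):
            bands.append((start, start + width))
    return bands

bands = enumerate_bands()
n_bands = len(bands)  # 427 widths with 427, 426, ..., 1 positions each

# Central wavelength and bandwidth, the two plot axes used for the results.
centres_and_widths = [((s + e) / 2, e - s) for s, e in bands]
```

Each band then defines the subset of spectral columns passed to one PLSR fit, which is why the results form a triangle when plotted as bandwidth against central wavelength.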

Cross-Validation
Model fit was validated by comparing observed and predicted values using the k-fold cross-validation method. Cross-validation avoids overfitting by not reusing the same data to both fit a model and to estimate prediction error. In k-fold cross-validation, the original sample is systematically partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained to test the model and the remaining k − 1 subsamples are used to train the model. The cross-validation process is repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimate. While 10-fold cross-validation is commonly used [43], k is not a fixed parameter in k-fold cross-validation.
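The partitioning described above can be sketched with a small helper that returns the training and test indices for each fold. The function name and the interleaved (non-random) assignment of samples to folds are assumptions for illustration; implementations often shuffle the samples first.

```python
def k_fold_splits(n_samples, k):
    """Return a list of (train_indices, test_indices), one pair per fold."""
    # Assign sample i to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    splits = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits

# With 36 samples, k = 9 gives folds of exactly four samples each,
# and k = 36 is leave-one-out cross-validation.
nine_fold = k_fold_splits(36, 9)
leave_one_out = k_fold_splits(36, 36)
```

Every sample appears in exactly one test fold, so pooling the held-out predictions yields one out-of-sample prediction per sample.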
We analyzed the coefficient of determination (R²) and the RMSE in order to compare the fitted models [44]. Two consecutive PLSRs were executed. In the first PLSR, regressions with a maximum of 10 factors were calculated and k-fold cross-validation with k = 9 was computed. k = 9 was chosen instead of the traditional k = 10 because division of the 36 samples by 9 resulted in a whole number. Variation in the RMSE in cross-validation was analyzed using 1, 2, 3, etc. factors, in order to obtain the lowest RMSE value and, hence, the optimal number of factors to fit the water potential prediction models. In the second PLSR, the optimum number of factors as determined above was used, as was k-fold cross-validation, except, in this case, with k = 36 (equivalent to leave-one-out cross-validation); the models were thus constructed with 35 training samples, leaving one sample for the prediction. Prediction was repeated with each of the 36 samples and R² was calculated by comparing the predicted and measured water potential values.
Figure 5 shows RMSE and R² values for 91,378 PLSR water potential predictions compared with the corresponding reference data. RMSE is depicted in Figure 5a and R² is depicted in Figure 5b. For the predictor variable (spectral data) no pre-processing was done other than jump correction. RMSE results were obtained directly from the plsregress MATLAB function using k-fold cross-validation with k = 9. R² values were obtained from another prediction round by combining the plsregress function and an in-house MATLAB script with leave-one-out cross-validation. In Figure 5, the bandwidth considered in the PLSR is represented on the vertical axis, while the central wavelength of the band is shown on the horizontal axis. The top point in each triangle represents the result of a PLSR where the full spectrum (maximum width) was used and, logically, its central wavelength corresponds to the middle of the horizontal axis.
All other points in the triangle represent RMSE and R² values for different PLSR predictions of water potential and many spectral bands of varying widths and locations. RMSE values were between 0.18 and 0.28 MPa. Considering that the reference data (36 samples) values ranged from −1.8 to −0.86 MPa, the best prediction areas had precisions of 10% to 22%. R² ranged between negative values and 0.5. The areas of negative R² values were obtained with the lowest wavelengths, indicating a model fit worse than a horizontal line. The highest R² values did not exceed 0.5, which was a low value. However, certain areas showed better correlations than the rest, which clearly indicated that broad areas of the spectrum were undeniably correlated with water potential, specifically, at around 1450 nm for bandwidths of around 410 nm (Figure 6, Zone I) and, in the furthest part of the infrared, at 2060 nm for bandwidths of around 410 nm (Figure 6, Zone II).
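A negative R² arises when R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares, so that predictions worse than the mean of the observations give values below zero. The sketch below assumes that definition (the text does not state the exact formula used) and uses illustrative values rather than measured data.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumed definition; can be negative)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Illustrative leaf water potentials in MPa (not the study's measurements).
observed = [-1.8, -1.5, -1.2, -0.9]
mean_line = [sum(observed) / len(observed)] * len(observed)  # horizontal line
poor_model = [-0.9, -1.2, -1.5, -1.8]                        # reversed trend

r2_mean_line = r_squared(observed, mean_line)
r2_poor_model = r_squared(observed, poor_model)
```

Predicting the mean for every sample gives an R² of zero, so any cross-validated model that scores below zero is doing worse than that horizontal line.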

Results
The leaf water potential range was not as large as expected due to numerous rain events in the days before field data collection. August of 2017 was exceptionally rainy, with rainfall of 18.91 mm, approximately three times the average of the previous seven years (6.33 mm). Even so, the proposed method could work for a wide range of leaf water potential values.
Part of this research consisted of studying the impact of different methods of pre-processing on correlations between measured water potential and spectral signatures. Figure 7 shows the RMSE obtained by the PLSR after applying the above-mentioned pre-processing treatments (normalization, de-trending, MSC, CR, SNV, and derivatives).
Figure 7. RMSE values for water potential (MPa) obtained by PLSR using raw reflectance data and after pre-processing (area, mean, maximum, unit vector, range, and peak variants of normalization, de-trending, MSC, CR, SNV, and GAP and Savitzky-Golay first- and second-order derivatives). All graphs apply the same colour scale (see legend in the top-right corner).
Applying normalization produced poor results because errors or residuals (RMSE) tended to increase in the areas of maximum interest. As can be seen in Figure 8a, which shows errors obtained according to RMSE percentiles ordered from lowest to highest, minimum error increased with normalization pre-processing in 90% of cases. It was observed that the peak and mean pre-processing diminished maximum errors, although only in areas that were not of interest. De-trending, MSC, and SNV pre-processing did not seem to be very effective; in general, these pre-processing treatments reduced the areas of best and worst correlation results. CR pre-processing improved on the results of the other treatments but did not surpass the raw reflectance results. With CR pre-processing, a "hotspot" appeared with very low residuals at around 1110 nm and with a relatively wide band of around 350 nm (Figure 6, Zone III).
Pre-processing with GAP and Savitzky-Golay derivatives returned very similar results for both first and second derivatives. Pre-processing with the first derivative reduced the RMSE in practically 99% of the correlations (Figure 8b). However, the second derivative reduced the lowest RMSEs, i.e., the most interesting cases, by only 1%. Pre-processing with the second derivative created a wide trapezoid-shaped area that contained optimal correlation results. The optimum point of this zone consisted of a band centred on 1520 nm with a width of 1400 nm (Figure 6, Zone IV). Second-derivative pre-processing also produced a narrow band (140 nm) located at 826 nm (Figure 6, Zone V), where the best correlations were found. This zone also appeared for the first derivative, although slightly displaced to 835 nm and somewhat narrower (130 nm); however, it did not appear in the raw reflectance data without pre-processing, nor did it appear significantly in the rest of the pre-processed data.
In general, the results for R² (Figure 9) are equivalent to those for RMSE and, as expected, the zones with lower RMSE values coincided with higher R² values. The R² values obtained in the areas with the best correlations exceeded 0.50. Although these values are not very high, it must be taken into account that prediction values were obtained through complete cross-validation, using the prediction for each and every one of the original samples to estimate R². No sample of original field measurements was discarded as being an outlier or as having an abnormally high residual. Figure 10 shows an example of two specific PLSR water potential predictions in two of the more interesting spectral zones identified in this study; the R² obtained with data pre-processed using the Savitzky-Golay second-order derivative (R² = 0.54) was slightly higher than that obtained with the raw data (R² = 0.51). Moreover, RMSE values were lower for Savitzky-Golay second-order derivative models than for raw data models (0.18 and 0.20, respectively).
Figure 9. R² values obtained by PLSR using raw reflectance data and after pre-processing (area, mean, maximum, unit vector, range, and peak variants of normalization, de-trending, MSC, CR, SNV, and GAP and Savitzky-Golay first- and second-order derivatives). All graphs apply the same colour scale (see legend in the top-right corner).

Discussion
After analyzing a large number of PLSRs with bands of different widths throughout the available spectrum (350-2500 nm) and applying various pre-processing treatments, five spectral areas of interest were identified that allowed LWP for vines to be predicted with around 15%-20% accuracy. The results show that the areas of interest were obtained using near-infrared reflectance. These findings agree with several studies [8,25,45,46] that computed the most suitable relationship between spectral data and vine water content using wavelengths located in the near-infrared. The reason for this could be that wavelengths with maximum water absorption for crops are usually in the near-infrared [45]. The maximum water absorption point is shown in Figure 3, which depicts variations in the spectral signatures of leaves as a function of their water potential. The same figure shows that differences in reflectance between leaves with different water potentials were more evident in the near-infrared wavelengths than in the visible range.
The spectral areas of interest (Zones I and II in Figure 6), obtained from raw reflectance data without pre-processing, corresponded to the traditional absorption peaks, located at 1442 nm and 1928 nm, as frequently reported in previous works [8,46,47]. The numerous tests we carried out suggest that the optimum bandwidth to be used around these two absorption peaks spans from immediately before to immediately after the maximums. These zones have traditionally been used for the correlation of many variables related to water in plants [8,25,45,47].
Results generally deteriorated after normalization pre-processing: RMSE increased in the areas of maximum interest and in most of the bands. In some cases, normalization pre-processing reduced the maximum RMSE, but only in bands that were not of interest. De-trending, MSC, and SNV pre-processing did not seem to be very effective either, as they increased the minimum RMSE while reducing the maximum RMSE. It seems that this kind of pre-processing tends to soften or average the spectra in some way, hiding some of the interesting differences in behaviour. Our results suggest that normalization pre-processing should not be used.
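The averaging effect mentioned above can be seen in a small NumPy sketch of SNV, which centres and scales each spectrum separately. The toy spectra and the function name `snv` are illustrative assumptions; the sketch only shows that two spectra differing in overall level and gain become indistinguishable after SNV, so any information carried by those differences is lost.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Toy example (not measured data): the second spectrum is the first one
# with a different overall gain (scale) and level (offset).
wavelengths = np.linspace(350, 2500, 50)
base = 0.4 + 0.1 * np.sin(wavelengths / 300.0)
spectra = np.vstack([base, 1.5 * base + 0.05])

corrected = snv(spectra)
# Level and gain differences are removed, so both rows coincide.
print(np.allclose(corrected[0], corrected[1]))
```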
Pre-processing with GAP and Savitzky-Golay derivatives produced satisfactory results. While pre-processing with the first derivative reduced RMSE in practically 99% of the PLSRs, pre-processing with the second derivative reduced the lowest RMSEs, i.e., the most interesting cases, by only 1%. Pre-processing with the second derivative resulted in a wide trapezoid-shaped area that contained optimal correlation results. The optimum point of this zone consisted of a very wide band centred on 1520 nm and spanning from 820 nm to 2220 nm. This band, which we designated Spectral Zone IV, used practically all the near-infrared spectral data, which, as indicated above, reflected the greatest water absorption. The best correlations of all were found after second-derivative pre-processing in a narrow band (140 nm) located at 826 nm (Figure 6, Zone V). This zone also appeared after first-derivative pre-processing, although slightly displaced to 835 nm and somewhat narrower (130 nm). It did not appear in the raw reflectance data without pre-processing or to any significant degree after any other pre-processing treatment. This corroborated results reported in the literature [8], where a relationship was found between leaf water content and spectral data in the area centred at 970 nm, suggesting this to be a typical area of crop water absorption; however, the weak relationship meant that this area was not the most suitable for determining LWP.
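As an illustration of derivative pre-processing, the following sketch applies a Savitzky-Golay second derivative with SciPy's `savgol_filter`. The window length, polynomial order, 1 nm sampling, and toy spectrum are assumptions chosen for the example and are not the settings used in the study. The sketch shows one known property of the second derivative: an additive offset and a linear baseline slope vanish, while the absorption feature is preserved.

```python
import numpy as np
from scipy.signal import savgol_filter

wavelengths = np.arange(350, 2501)  # 1 nm sampling, assumed for illustration

# Toy spectrum (not measured data): a Gaussian absorption dip near 1450 nm.
dip = -0.2 * np.exp(-0.5 * ((wavelengths - 1450) / 40.0) ** 2)
spectrum = 0.5 + dip
# Same spectrum with an extra offset and a linear baseline slope.
shifted = spectrum + 0.1 + 1e-4 * (wavelengths - 350)


def second_derivative(reflectance, window_length=11, polyorder=2):
    """Savitzky-Golay second derivative along the wavelength axis."""
    return savgol_filter(reflectance, window_length=window_length,
                         polyorder=polyorder, deriv=2, delta=1.0, axis=-1)


d2_original = second_derivative(spectrum)
d2_shifted = second_derivative(shifted)

# Largest difference between the two derivative spectra (floating-point noise only).
print(np.max(np.abs(d2_original - d2_shifted)))
# Wavelength where the second derivative peaks (the centre of the dip).
print(wavelengths[np.argmax(d2_original)])
```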
CR transformation improved on normalization, de-trending, MSC, and SNV results, but generally did not improve on raw reflectance results. With this pre-processing, a "hotspot" appeared with very low residuals at around 1110 nm, with a relatively wide band of around 350 nm (Figure 6, Zone III). This area of the spectrum contained two absorption peaks in a width of only 350 nm. However, as the hotspot was very small, bands with slightly different bandwidths or locations in the spectra showed the poorest results. This area therefore cannot be considered useful, despite very low residuals after applying CR, which may be why this area has not been reported before; nonetheless, more research needs to be done regarding this spectral zone. Note that CR pre-processing was designed for use in spectral intervals related to the absorption point of specific variables of plant composition (water, nitrogen, lignin, etc.) [25,45] and not across entire spectral signatures; this may explain why CR transformation applied to the entire spectrum resulted in no improvement in the relationship between the spectral data and the variables.
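A minimal sketch of continuum removal over a single absorption interval is given below, assuming the common definition in which the spectrum is divided by its upper convex hull. The function name `continuum_removed`, the interval limits, and the toy spectrum are hypothetical; wavelengths are assumed to be strictly increasing.

```python
import numpy as np


def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum by its upper convex hull (the continuum)."""
    x = np.asarray(wavelengths, dtype=float)
    y = np.asarray(reflectance, dtype=float)
    hull = []  # indices of points on the upper hull, left to right
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the chord
        # joining the previous hull point to the new point i.
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = ((x[i1] - x[i0]) * (y[i] - y[i0])
                     - (y[i1] - y[i0]) * (x[i] - x[i0]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(x, x[hull], y[hull])
    return y / continuum


# Toy interval (not measured data): a sloping baseline with one
# absorption dip centred at 1110 nm, sampled every 1 nm.
wl = np.arange(935, 1286)
baseline = 0.5 + 0.0002 * (wl - 935)
toy = baseline - 0.1 * np.exp(-0.5 * ((wl - 1110) / 30.0) ** 2)

cr = continuum_removed(wl, toy)
print(cr[0], cr[-1])       # interval end points lie on the continuum
print(wl[np.argmin(cr)])   # position of the deepest absorption
```

After the transformation, values lie between 0 and 1 and the end points of the interval equal 1, which is why the choice of interval matters for this treatment.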
While numerous works evaluated pre-processing of spectral data with a view to improving the prediction models of a variable [32,35,36], results varied depending on the type of variable that was determined and whether or not the pre-processing was combined with other kinds of pre-processing. Such combination could explain the unsuitability of models computed with pre-processed spectral data. In our research, we used a single pre-processing treatment. Contrary to those works, spectral pre-processing in our study did not markedly improve R², as one of the most suitable prediction models was obtained with raw spectral data. Note that we collected spectral data using a plant probe accessory. Vegetation diversity and measurement conditions are two factors that have a great influence on spectral data collection [9]. When working in the field, it is normal to use passive sensors, which require optimal lighting conditions, i.e., clear atmospheric conditions with low aerosol and water vapour content in the atmosphere [48]; these sensors are also highly influenced by variations in solar or sensor zenith conditions, leaf movements due to wind, shadows in the crop, etc. Studies that predicted water content by spectrometry used smoothed and transformed spectral data to correct for measurement noise [5,26]. In our study, using a plant probe accessory minimized those effects because the plant probe had an integrated light source that allowed spectral data to be collected in full contact with the leaf and under standardized lighting conditions [48], thereby allowing raw spectral data to be used to predict leaf water potential.
Looking at the most suitable models, the central wavelength was located at 1450 nm (using raw data) and at 826 nm and 1520 nm (using the second derivative). This result corroborated other works [5,26,47,49], which found that spectrum regions that included 1400 nm and 1900 nm were best correlated with leaf composition variables. The reason could be that differences in reflectance between leaves with differing water contents were small in the visible region (350-650 nm) and greatest in the range 650-1400 nm, indicating that this range contains the most suitable intervals for detecting variations in water content [47]. These same bands corresponded to the first OH overtone excitation for H₂O and the O-H and H-O-H deformation combination, whereas other bands corresponded to water content but also to cellulose and other organic components [50]. The most suitable model fitted with raw data (centred at 1450 nm) needed a bandwidth of 410 nm, while the two most suitable models fitted with derivative pre-processed data (centred at 826 nm and 1520 nm) needed bandwidths of 140 nm and 1400 nm, respectively. These results concurred with the findings of research that applied CR transformation in several ranges of the spectrum to estimate leaf water content in four vine varieties [8]. Those authors computed CR in four water absorption ranges centred on 960 nm (with range 205 nm), 1190 nm (with range 151 nm), 1465 nm (with range 403 nm), and 2035 nm (with range 410 nm) and used CR-transformed spectral data and PLSR to obtain suitable vine water prediction models. Thus, the central wavelengths and bandwidths were similar in that study and our own. In addition, Figures 7 and 9 show the maximum R² and minimum RMSE for the model computed using CR pre-processing, with the central wavelength and bandwidth coinciding with the CR interval selected for the above-mentioned work and for the most suitable model obtained in our study. This corroborates what was stated above: CR pre-processing was specifically designed to apply to spectral intervals related to the absorption point for variables, including water, nitrogen, lignin, etc. Figure 10 shows that, although the differences are small, models computed using the Savitzky-Golay second-order derivative performed slightly better than models fitted using raw data.
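The wavelength limits implied by the reported centres and bandwidths can be worked out with a short sketch. It assumes that each band is symmetric about its central wavelength and that the spectra are sampled every 1 nm over 350-2500 nm; the helper name `band_mask` is hypothetical.

```python
import numpy as np

# 1 nm grid over the instrument range (sampling interval assumed).
wavelengths = np.arange(350, 2501)


def band_mask(wavelengths, centre_nm, width_nm):
    """Boolean mask selecting centre_nm +/- width_nm / 2 (limits included)."""
    half = width_nm / 2.0
    return (wavelengths >= centre_nm - half) & (wavelengths <= centre_nm + half)


# Centres and bandwidths (nm) reported for the most suitable models.
best_bands = {
    "raw, 1450 nm": (1450, 410),
    "derivative, 826 nm": (826, 140),
    "derivative, 1520 nm": (1520, 1400),
}

limits = {}
for label, (centre, width) in best_bands.items():
    selected = wavelengths[band_mask(wavelengths, centre, width)]
    limits[label] = (int(selected[0]), int(selected[-1]), int(selected.size))
    print(f"{label}: {selected[0]}-{selected[-1]} nm ({selected.size} samples)")
```

Under these assumptions the three bands cover 1245-1655 nm, 756-896 nm, and 820-2220 nm; the last agrees with the limits given for Spectral Zone IV.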
Our results indicate the importance of proper selection of wavelength, bandwidth, and pre-processing methods in developing a prediction model. Future research could examine combinations of normalization and other pre-processing methods with a view to improving the relationships between spectral data and vine water potential and developing an accurate, rapid, and non-destructive approach to detecting water stress in vineyards.

Conclusions
Our research, in which we tested different pre-processing methods for spectral data and identified the wavelength range that minimized RMSE values, demonstrates the usefulness of field spectroscopy for estimating midday water potential in vines.
The results show that derivative pre-processing increases the determination coefficients and reduces RMSE. A model fitted with raw data obtains the best results for the range centred on 1450 nm (bandwidth 410 nm), while a model fitted with derivative pre-processed data obtains the best results for bands centred on 826 nm (bandwidth 140 nm) and 1520 nm (bandwidth 1400 nm).
The proposed method has some advantages over other methods, mainly that the field work is simple, rapid, and non-destructive. However, it also has some drawbacks, such as the need to pre-process the reflectance data and to fit the model.
In short, field spectroscopy is a means to evaluate water stress level in vines, which could be useful in optimizing irrigation scheduling and managing remote sensing systems.