Using Support Vector Regression and Hyperspectral Imaging for the Prediction of Oenological Parameters on Different Vintages and Varieties of Wine Grape Berries

The performance of a support vector regression (SVR) model with a Gaussian radial basis kernel to predict anthocyanin concentration, pH index and sugar content in whole grape berries, using spectroscopic measurements obtained in reflectance mode, was evaluated. Each sample contained a small number of whole berries and the spectrum of each sample was collected during ripening using hyperspectral imaging in the range of 380–1028 nm. Touriga Franca (TF) variety samples were collected for the 2012–2015 vintages, and Touriga Nacional (TN) and Tinta Barroca (TB) variety samples were collected for the 2013 vintage. These TF vintages were independently used to train, validate and test the SVR methodology; different combinations of TF vintages were used to train and test each model to assess the performance differences under wider and more variable datasets; the varieties that were not employed in the model training and validation (TB and TN) were used to test the generalization ability of the SVR approach. Each case was tested using an external independent set (with data not included in the model training or validation steps). The best R2 results obtained with varieties and vintages not employed in the model’s training step were 0.89, 0.81 and 0.90, with RMSE values of 35.6 mg·L−1, 0.25 and 3.19 ◦Brix, for anthocyanin concentration, pH index and sugar content, respectively. The present results indicate a good overall performance for all cases, improving the state-of-the-art results for external test sets, and suggesting that a robust model, with a generalization capacity over different varieties and harvest years may be obtainable without further training, which makes this a very competitive approach when compared to the models from other authors, since it makes the problem significantly simpler and more cost-effective.


Introduction
As increases in computational power and processing capability continue, the introduction of new technologies into the most diverse industry segments is expanding, viticulture and the whole wine industry being no exception.With producers and companies demanding cheaper and faster ways to Remote Sens. 2018, 10, 312 2 of 23 produce higher-quality wine from their vineyards, together with the massive increases in information available to the market, the search for a competitive advantage is increasing.One such advantage can be found in the development of new methodologies to reduce the cost of gathering information about grapes in an environmentally-friendly and timely manner, allowing winemakers to obtain more frequent insights into their wine grapes, to enable harvesting them at the optimal point of maturity and selecting them according to certain quality features.
Laboratory-based hyperspectral imaging collects information on how objects reflect and absorb light as a function of wavelength, providing both spatial and spectral information [1,2].The main advantage of this kind of image-based system in comparison with traditional chemical analysis is that it is non-destructive and reduces the overall cost of the acquisition of quality information.However, it involves complex data that requires powerful analysis tools to extract the necessary information from the underlying patterns in the spectra.In recent years, the use of hyperspectral techniques combined with proper data analysis tools for extracting information has become an important technique to measure different oenological parameters that are highly involved in the evaluation of grapes' ripeness stage.The most commonly used equipment in the literature operates simultaneously in the visible and near-infrared range, between 400-1000 nm, although the near-infrared wavelength above 1000 nm may contain important absorption bands.A possible reason for the extensive use of that range, in addition to the good results obtained, may be related to the fact that equipment operating at larger wavelengths tends to be significantly more expensive.
In this process of analysis and evaluation, the sugar content, anthocyanin concentration and pH index are highly researched parameters because they are correlated with the flavour, colour and are good indicators of the grapes' ripeness.At this stage of the maturation process the level of anthocyanin and sugars increases, while the acidity diminishes sharply, leaving a signature in the spectral activity of the grapes.Nevertheless, due to several factors such as different climate changes, soil quality, sun exposition, water assessment, altitude and harvest time, a large variability may be present in the vineyards and consequently in the quality of the grapes and in the profile of the different oenological parameters.In recent years, the use of hyperspectral image-based methods for the prediction of such parameters has been proposed, using (a) transmittance mode [3,4], (b) interactance mode [5][6][7], and (c) reflectance mode.Since for the same illumination scenario the intensity of light reflected from the grape is stronger, which facilitates measurements, we chose reflectance mode spectroscopy for our methodology, making it more relevant to analysing previous results in the reflectance mode.The works on the reflectance mode can be further divided into (1) the reflectance mode for a small number of berries in each sample [8][9][10][11][12][13][14][15][16], using partial least squares (PLS), neural networks (NN) and least-squares support vector machines (LSSVM); and (2) reflectance mode for a large number of berries in each sample [17][18][19][20][21][22][23][24][25], using PLS or modified partial least squares (MPLS).Using a small number of berries per sample comprises a more difficult problem, since the samples' variability is higher than when using samples with a larger number of berries, where the reference measurements result from the mixture of the individual berries.
The use and effectiveness of support vector machines combined with hyperspectral imaging ( [26]) has already been tested and widely employed in classification problems [27][28][29], but approaches using regression are still uncommon.Other works are available that measure different chemical compounds (for example, phenolic compounds, solid sugar compounds or aroma compounds) [30][31][32][33][34]. Table 1 provides the detailed results published in the literature for the determination of sugar content, anthocyanin concentration and pH index, with the hyperspectral imaging performed in the reflectance mode.
Table 1.Literature results for the prediction of the oenological parameters for whole wine grape berries with hyperspectral imaging performed in reflectance mode.

Oenological Parameter
Author R 2
The main purpose of this work is to present a different approach capable of dealing with the hyperspectral data of wine grape berries from several different years and to evaluate the ability of the model to assess varieties of wine grapes that were not used in the model's training.Building robust models that present a good generalization ability with new vintages and/or varieties of wine grapes is becoming a topic of major importance, since it avoids creating a new model for each new application.The existing scientific literature is sparse apart from the works published by the same authors [14,15], and the few exceptions [19,23] that tested new vintages for homogenised samples or for samples containing a large number of berries.Regarding the authors' previous works, they provided a less in-depth study of the model's generalization capacity since in this work we use a greater number of samples in the training phase, with more harvest years and also a different capture location to obtain a greater description of data variability.
In the present work, we introduce new models combining hyperspectral imaging and support vector regression ( [35]) with a Gaussian kernel, implemented with MATLAB software using the parallel computing toolbox with a Compute Unified Device Architecture (CUDA)-enabled NVIDIA Graphics Processing Unit (GPU), to predict oenological parameters in grapes, namely anthocyanin concentration, pH index and sugar content.The study is focused on the generalization ability of the method and on the use of a small number of whole berries per sample, which can be applicable to the selection of the best berries for precision viticulture and for mapping areas in order to improve ripening.The obtained results reveal better performance when compared with previously implemented solutions found in the literature and, unlike the majority of previous works, it includes different varieties and harvest years of wine grape berries.Thus, it is a more detailed study with a capacity for generalization.Section 2 provides a brief description of the varieties of wine grape berries used, an overview of the hyperspectral imaging setup for image-acquisition, details the procedures followed for the reflectance measurements and the techniques applied to build the prediction model, with emphasis on the principal component analysis, support vector regression and cross-validation algorithms, respectively.Sections 3 and 4 provide a detailed analysis of the results obtained for the different oenological parameters in the study, including a comparison with current state-of-the-art approaches.Section 5 provides general conclusions about the work conducted, alongside future directions for improvement.

Grape Sampling
The subject of our study were grapes bunches of the Touriga Franca (TF) variety, considered one of the most important varieties for the production of port wine in the Douro region, due to its resiliency to plant diseases, fruity flavour and intense colour, harvested in the years 2012, 2013 and 2014, from the vineyards of Quinta do Bonfim in Pinhão, Portugal, which is a property of Symington Family Estates, In the year 2015, the samples were collected from the vineyards in Universidade de Trás-os-Montes e Alto Douro, Vila Real, Portugal.In order to achieve the best possible model, it is important to test grapes from the beginning of veraison and maturity, and from areas within the same vineyard under different conditions (sun exposure, water availability, soil quality, among others).A total of 240 samples were collected in the year of 2012 (24 per day), 84 on the year of 2013 (12 per day), 120 in the year of 2014 (12 per day) and 108 in the year of 2015 (12 per day).The samples were collected from three different regions in the vineyard from vine trees with small, medium and large vigour.Laboratory-based, line-scan, hyperspectral image acquisition was performed on fresh grape berries, as described in Section 2.2.Each sample evaluated by hyperspectral imaging was composed of six grape berries randomly collected in a single bunch, removed from bunches with their pedicel still attached.All the samples were then kept frozen, at −18 • C. The chemical analysis was carried out with the six grape berries being defrosted at room temperature and then crushed in a buffer solution of tartaric acid (pH.3.2) and ethanol (95%) by macerating, and the resulting mixture was kept overnight at 25 • C [36].The samples were centrifuged (SIGMA centrifuge 3K18, 20 min, 4 • C, spin at 7155 g) and a clear extract was collected and mixed with acidified ethanol (0.1% Hydrochloric Acid).The total anthocyanin concentration was determined photometrically by the SO 2 bleaching method [37].A Ultraviolet-Visible Spectroscopy (UV/VIS) spectrophotometer (Shimadzu) and 1 cm path length disposable cells were used for spectral measurements at 520 nm and the pigment content, expressed in mg•L −1 , was calculated from a calibration curve of malvidin-glucoside.All determinations were performed in duplicate and the juice released was analysed for pH and brix contents according to validated standard methods [38].
In order to test the generalization capacity of the model, 84 and 60 samples (12 per day) were collected in the year 2013 for the Tinta Barroca (TB) and Touriga Nacional (TN) varieties, which are also important varieties for the production of port wine in the Douro region.The sample collection, number of berries per sample, hyperspectral image acquisition and chemical analyses were performed as mentioned previously.

Experimental Setup for Hyperspectral Imaging
We chose the reflectance mode over the transmittance mode for hyperspectral imaging since, for the same illumination scenario, the intensity of light reflected from the grape is stronger, which facilitates measurements.
The experimental setup assembled for the images collected used a hyperspectral camera, composed of a JAI Pulnix (JAI, Yokohama, Japan) black and white camera and a Specim Imspector V10E spectrograph (Specim, Oulu, Finland); lighting, using a lamp holder with 300 × 300 × 175 mms (length × width × height) that held four 20 W, 12 V halogen lamps and two 40 W, 220 V blue reflector lamps (Spotline, Philips, Eindhoven, The Netherlands).Both types of lamps were powered by continuous current power supplies to avoid light flickering and the reflector lamps were powered at only 110 V to reduce lighting and prevent camera saturation.The resulting hyperspectral images correspond to a single line over the sample and had 1040 wavelengths (ranging between 380 to 1028 nm, with approximately 0.6 nm width in each channel) × 1392 pixels.The 1392 pixels stand for the spatial dimension over the samples with approximately 110 mm of width.The distance between the camera and the sample base was 420 mm, and the camera was controlled with Coyote software from JAI inside a dark room, at room temperature.Figure 1 illustrates the experimental setup assembled for the hyperspectral image acquisition.After the image acquisition, it is possible to identify the grape berries using image segmentation methods.

Experimental Setup for Hyperspectral Imaging
We chose the reflectance mode over the transmittance mode for hyperspectral imaging since, for the same illumination scenario, the intensity of light reflected from the grape is stronger, which facilitates measurements.
The experimental setup assembled for the images collected used a hyperspectral camera, composed of a JAI Pulnix (JAI, Yokohama, Japan) black and white camera and a Specim Imspector V10E spectrograph (Specim, Oulu, Finland); lighting, using a lamp holder with 300 × 300 × 175 mms (length x width x height) that held four 20W, 12V halogen lamps and two 40W, 220V blue reflector lamps (Spotline, Philips, Eindhoven, Netherlands).Both types of lamps were powered by continuous current power supplies to avoid light flickering and the reflector lamps were powered at only 110V to reduce lighting and prevent camera saturation.The resulting hyperspectral images correspond to a single line over the sample and had 1040 wavelengths (ranging between 380 to 1028 nm, with approximately 0.6 nm width in each channel) × 1392 pixels.The 1392 pixels stand for the spatial dimension over the samples with approximately 110 mm of width.The distance between the camera and the sample base was 420 mm, and the camera was controlled with Coyote software from JAI inside a dark room, at room temperature.Figure 1 illustrates the experimental setup assembled for the hyperspectral image acquisition.After the image acquisition, it is possible to identify the grape berries using image segmentation methods.

Experimental Setup for Hyperspectral Imaging
We chose the reflectance mode over the transmittance mode for hyperspectral imaging since, for the same illumination scenario, the intensity of light reflected from the grape is stronger, which facilitates measurements.
The experimental setup assembled for the images collected used a hyperspectral camera, composed of a JAI Pulnix (JAI, Yokohama, Japan) black and white camera and a Specim Imspector V10E spectrograph (Specim, Oulu, Finland); lighting, using a lamp holder with 300 × 300 × 175 mms (length x width x height) that held four 20W, 12V halogen lamps and two 40W, 220V blue reflector lamps (Spotline, Philips, Eindhoven, Netherlands).Both types of lamps were powered by continuous current power supplies to avoid light flickering and the reflector lamps were powered at only 110V to reduce lighting and prevent camera saturation.The resulting hyperspectral images correspond to a single line over the sample and had 1040 wavelengths (ranging between 380 to 1028 nm, with approximately 0.6 nm width in each channel) × 1392 pixels.The 1392 pixels stand for the spatial dimension over the samples with approximately 110 mm of width.The distance between the camera and the sample base was 420 mm, and the camera was controlled with Coyote software from JAI inside a dark room, at room temperature.Figure 1 illustrates the experimental setup assembled for the hyperspectral image acquisition.After the image acquisition, it is possible to identify the grape berries using image segmentation methods.Observing Figure 2, it is possible to see patterns of light absorption, specifically in three main regions, these are where three wine grape berries were placed simultaneously for imaging.
Remote Sens. 2018, 10, 312 6 of 23 A threshold-based segmentation method was then applied to each hyperspectral image to obtain an individual image for each wine grape berry, thus leading to the reflectance measurement step.

Reflectance Measurements
Reflectance is the quotient between the intensity of the light reflected by an object and the light that illuminates that object, being a function of the wavelength of the light.We chose reflectance as input because the patterns of reflectance and absorption across wavelengths can uniquely identify chemical compounds and because, in contrast to the transmittance and interactance mode, it is possible to perform the imaging without requiring contact between the spectrometer/camera and the sample.For some positions x, and at wavelengths λ, the reflectance R can be expressed as: where GI is the intensity of light reflected from the grape; SI is the intensity of light reflected from the Spectralon (which is a total reflectance target); and DI is the dark current signal, which is electronic noise.The dark current signal is measured with the hyperspectral camera lens covered and must be subtracted from the grape and the Spectralon signal because it is independent of the object being imaged and would otherwise distort the calculated reflectance values.
To achieve noise reduction, an accumulation of 32 hyperspectral images on each grape berry were derived for GI(x, λ), DI(x, λ) and SI(x, λ) records.All reflectance measurements for the six grape berries were carried out along the berry "equator", and for three different berry rotations.In order to create a single reflectance spectrum, all grape point reflectance values were averaged over the spatial dimension and rotations.The resulting spectrum was then normalized (by subtracting the minimum values from each spectrum and dividing by the difference between the maximum and minimum values), in order to eliminate fluctuations in the measured light intensities due to the grape berry size and curvature.Figure 3 presents an example of the reflectance measurements, in the case for the samples of the TF 2012 variety.
Remote Sens. 2018, 10, 312 6 of 23 Observing Figure 2, it is possible to see patterns of light absorption, specifically in three main regions, these are where three wine grape berries were placed simultaneously for imaging.A threshold-based segmentation method was then applied to each hyperspectral image to obtain an individual image for each wine grape berry, thus leading to the reflectance measurement step.

Reflectance Measurements
Reflectance is the quotient between the intensity of the light reflected by an object and the light that illuminates that object, being a function of the wavelength of the light.We chose reflectance as input because the patterns of reflectance and absorption across wavelengths can uniquely identify chemical compounds and because, in contrast to the transmittance and interactance mode, it is possible to perform the imaging without requiring contact between the spectrometer/camera and the sample.For some positions , and at wavelengths , the reflectance  can be expressed as: where  is the intensity of light reflected from the grape;  is the intensity of light reflected from the Spectralon (which is a total reflectance target); and  is the dark current signal, which is electronic noise.The dark current signal is measured with the hyperspectral camera lens covered and must be subtracted from the grape and the Spectralon signal because it is independent of the object being imaged and would otherwise distort the calculated reflectance values.
To achieve noise reduction, an accumulation of 32 hyperspectral images on each grape berry were derived for (, ), (, ) and (, ) records.All reflectance measurements for the six grape berries were carried out along the berry "equator", and for three different berry rotations.In order to create a single reflectance spectrum, all grape point reflectance values were averaged over the spatial dimension and rotations.The resulting spectrum was then normalized (by subtracting the minimum values from each spectrum and dividing by the difference between the maximum and minimum values), in order to eliminate fluctuations in the measured light intensities due to the grape berry size and curvature.Figure 3 presents an example of the reflectance measurements, in the case for the samples of the TF 2012 variety.

Principal Component Analysis
Due to data complexity, with the dimensionality of the resulting matrix equal to the number of spectral channels measured by the hyperspectral camera (namely 1040), there are difficulties in processing such a large, multivariate dataset.In order to obtain the maximum performance of the machine learning algorithm employed, one must significantly reduce the size of its inputs with some

Principal Component Analysis
Due to data complexity, with the dimensionality of the resulting matrix equal to the number of spectral channels measured by the hyperspectral camera (namely 1040), there are difficulties in processing such a large, multivariate dataset.In order to obtain the maximum performance of the machine learning algorithm employed, one must significantly reduce the size of its inputs with some kind of data compression method.Consequently, a principal component analysis (PCA) was performed on the data matrix, extracting the dominant patterns present in the spectra in terms of a complementary set of scores and loadings plots, providing the means to significantly reduce the size of a dataset without losing variability in the data.The PCA was implemented by eigenvalue decomposition of the data covariance matrix and applied to each dataset composed of a different variety and vintage of wine grape berries, together with the mean-centering and auto-scaling methods to normalize the spectra, which subtracts the mean from the original dataset and then divides by its standard deviation.Figure 4 shows the loadings plot for the six first Principal Components (PC) extracted from the TF 2012 variety data matrix.
Remote Sens. 2018, 10, 312 7 of 23 kind of data compression method.Consequently, a principal component analysis (PCA) was performed on the data matrix, extracting the dominant patterns present in the spectra in terms of a complementary set of scores and loadings plots, providing the means to significantly reduce the size of a dataset without losing variability in the data.The PCA was implemented by eigenvalue decomposition of the data covariance matrix and applied to each dataset composed of a different variety and vintage of wine grape berries, together with the mean-centering and auto-scaling methods to normalize the spectra, which subtracts the mean from the original dataset and then divides by its standard deviation.Figure 4 shows the loadings plot for the six first Principal Components (PC) extracted from the TF 2012 variety data matrix.Observing Figure 4 it is noticeable that there are a large number of peaks throughout the spectrum, with emphasis between 400-700 nm, that are usually related to chemical compounds such as chlorophyll, carotenoids and anthocyanin, while the peeks between 700-800 nm are commonly associated with sugars [38].
A basic assumption in the use of the PCA is that the score and loading vectors corresponding to the largest eigenvalues contain the most useful information relating to the specific problem, and that the remaining ones mainly comprise noise [39].In general cases, the number of principal components used is chosen by the number of factors with eigenvalues over one, resulting in a percentage of variance explained (that is, the variability accounted for in the data) in a cumulative sum usually between 95%-99%.However, in this case the aforementioned assumption might not be true due to the highly complex chemical interactions present in the samples that have an impact on the reflectance measurements.There is no clear answer as to how many factors should be retained for analysis (only general rules of thumb, such as a scree plot analysis) and, since we are using the PCA as a pre-processing step for a supervised learning task, in this paper we decided to treat the number of PCs as another parameter to be optimized during the cross-validation task, in which we used different try-outs to choose the ideal number of PCs.Thus, every model was tested using between one and 50 principal components, saving the best result.We chose 50 PCs as the upper-limit since the results never improved above that number of PCs and the computational cost starts to increase with a higher number of PCs retained for analysis.

Support Vector Regression
The advantages of using a support vector machine methodology are (both for classification or regression): it has a regularization parameter, making it easier to avoid overfitting; it maps the input Observing Figure 4 it is noticeable that there are a large number of peaks throughout the spectrum, with emphasis between 400-700 nm, that are usually related to chemical compounds such as chlorophyll, carotenoids and anthocyanin, while the peeks between 700-800 nm are commonly associated with sugars [38].
A basic assumption in the use of the PCA is that the score and loading vectors corresponding to the largest eigenvalues contain the most useful information relating to the specific problem, and that the remaining ones mainly comprise noise [39].In general cases, the number of principal components used is chosen by the number of factors with eigenvalues over one, resulting in a percentage of variance explained (that is, the variability accounted for in the data) in a cumulative sum usually between 95-99%.However, in this case the aforementioned assumption might not be true due to the highly complex chemical interactions present in the samples that have an impact on the reflectance measurements.There is no clear answer as to how many factors should be retained for analysis (only general rules of thumb, such as a scree plot analysis) and, since we are using the PCA as a pre-processing step for a supervised learning task, in this paper we decided to treat the number of PCs as another parameter to be optimized during the cross-validation task, in which we used different try-outs to choose the ideal number of PCs.Thus, every model was tested using between one and 50 principal components, saving the best result.We chose 50 PCs as the upper-limit since the results never improved above that number of PCs and the computational cost starts to increase with a higher number of PCs retained for analysis.

Support Vector Regression
The advantages of using a support vector machine methodology are (both for classification or regression): it has a regularization parameter, making it easier to avoid overfitting; it maps the input vectors to a high-dimensional feature space, so that it is possible to build expert knowledge about a problem via engineering the kernel function; and, most importantly, the algorithm is defined as a convex optimization problem, for which there is no local minima (unlike neural networks or partial least squares regression), using a subset of training points in the decision function, called support vectors, making the computation cost significantly smaller.In order to introduce our choice of the type of regression and kernel employed in the support vector regression framework, below we provide a brief description of the Support Vector (SV) algorithm.A complete mathematical formulation can be found in [40,41].
Given a set of training data {(x 1 , y 1 ), . . . ,(x n , y n )} ⊂ χ ∈ R, where χ denotes the space of the input patterns, the objective is to find a function f (x) = w, x + b, w ∈ χ, b ∈ R that has a maximum deviation ε from the real measured targets y i for all the training data and, simultaneously, is as flat as possible (that is, to be as close to a straight line as possible).
In the case of a linear function, we can express the convex optimization problem as the minimization of the Euclidean norm; however, we must also guarantee that infeasible constraints are dealt with by introducing slack variables ξ i , ξ * i , reaching the formulation stated in [35]: where the constant C > 0 determines the trade off between the flatness of f and the amount up to which deviations larger than ε are tolerated.An alternative to Vapnik's ε-SV regression was introduced by [42], named ν-SV regression, where ε is not defined a priori but is itself a variable, where its value is traded off against model complexity and slack variables by means of a constant ν ∈ [0, 1].In order to extend the SV machine to nonlinear functions, one must start by writing the optimization problem in its dual formulation via a Lagrange function from both the objective function and the corresponding constraints by introducing a dual set of variables.Satisfying all the constraints and optimizing the equation, we reach the so-called SV expansion, stated below: where ω is completely described as a linear combination of the training patterns, and examples with α i > 0 are the support vectors.
The chemical processes we are modelling are nonlinear, so the final step is to adapt the SV algorithm to deal with such processes.A computationally cheap way is to map the input vectors into a high-dimensional feature space through some nonlinear mapping, and then to solve the optimization problem in that feature space.With the use of a suitable, nonlinear function κ (called kernel), we obtain the nonlinear regression functions of the form: Two major types of kernel can be defined: local kernels (based on distance), which state that only the data that are near each other have an influence on the kernel values; and global kernels (based on the dot product), which state that samples that are far away from each other still have an influence on the kernel value [43].
In the present work, a model with Vapnik's ε-SV algorithm and a Gaussian radial basis local kernel (κ(x, x i ) = exp −γ x − x i 2 ) was chosen because it has obtained the least root mean square error for all the models tested (linear, sigmoid, polynomial and Gaussian kernels with Vapnik's ε-SV regression and Chalimourda's υ-SV regression, as seen in Tables 2 and 3).The interval of values for the parameter C of the algorithm and the parameter γ in the kernel (the hyper parameters) was determined via a genetic algorithm, resulting in C ∈ [80; 120] and γ ∈ [0.00001; 0.001], and a random search algorithm was implemented to search these intervals for the first combination of these values lower than an error threshold (mean squared error lower than 0.1) with ε = 0.001 in the cross-validation procedure, with 10 attempts performed in each different configuration to ensure unbiased predictions.

N-Fold Cross-Validation with Test Set
To avoid overfitting, a cross-validation approach was employed in the model presented, allowing for the separation of the observations into a training set (to estimate the calibration) and a validation set (to validate the calibration and to correct parameters).In more advanced experiments on the model's capacity, an independent external test set was used (to evaluate the model performance on an independent set of samples).For further testing the model's generalization capacity, some of the independent test sets will be composed of new samples of different varieties of wine grape berries not yet seen by the model on the training and validation sets.The cross-validation approach chosen was the n-fold cross-validation method, where the data is split into n folds: n − 1, used for training, and one for validation, namely the procedure repeated for every fold with a different validation set at each time.The advantage of this kind of approach is that it guarantees that each sample in the dataset is used by the model at least once, contrary to methods that draw data randomly with replacement.The final validation results presented are the average of the results for all the validation sets in each experiment.The training, validation and independent test sets are chosen at random for each run, and the fine-tuning of the model's hyper parameters only occurs in the cross-validation stage to guarantee an unbiased model.
In order to proceed to a comparison between the present results and state-of-the-art publications, both the root mean square error of the cross-validation and test sets were used (RMSE), alongside the squared correlation coefficient (R 2 )-these values are defined as follows: where y i is the reference value, ŷi is the model estimate, σ y ŷ is the covariance between y and ŷ and σ y , σ ŷ are the respective standard deviations.Each wine grape variety data had its own training and validation sets (Section 3.1), except for the 2015 vintage that was only used in more advanced experiments.The finetuning of the model's hyperparameters occurred for all varieties.An independent external test set was used in the study of the TF 2012 variety and in the study of a mixture of samples from different vintages of the TF variety (Section 3.2), since these are the only cases that, in our opinion, presented a sufficient number of samples to use a test set and obtain significant results.To further test the model's generalization capacity, an independent external test set composed of different varieties of wine grapes was also studied for evaluation of the model's performance with previously unused varieties in the test set (Section 3.3).Table 4 provides detailed information about the experiences conducted and described in each section.A descriptive statistical analysis was performed to study the datasets used (see Appendixs A-C).ANOVA (one-way analysis of variance) tests were also employed to verify any significant differences between the means of the different sets (see Appendixs D-F), and it was found that there are significant differences in the means between the datasets used for this work.

Model Training and Validation
The validation set results obtained by the models (one for each vintage) for the prediction of the anthocyanin concentration, pH index and sugar content in the different vintages are presented in Table 5.For the 2014 vintage of the TF variety there are no laboratory results available for the anthocyanin concentration, preventing the development of a model for that particular vintage.
For each dataset the training and validation were conducted with 10-fold cross-validation for the TF 2012 samples, and with five-fold cross-validation for the remaining vintages (because these have a lower number of samples and reducing the number of folds helps avoiding overfitting in these scenarios).

Model Behaviour Using Test Sets
The study of the model's behavior using test sets was performed with the TF 2012, TF 2013, TF 2014 and TF 2015 datasets (except for the prediction of the anthocyanin concentration, where the 2014 vintage results were not available), with 10% of the samples used as an independent test set: the first, with the 2012 vintage (test set: 30 samples); the second, composed of 2012 and 2013 vintages samples (test set: 36 samples); the third, with samples from the 2012, 2013 and 2014 vintages (test set: 50 samples for sugar content and pH index); the fourth, composed of 2012, 2013, 2014 and 2015 vintages (63 samples for sugar content and pH index, 48 samples for anthocyanin concentration).
For each data set the training and validation were conducted with the remaining samples, with 10-fold cross-validation and with the fine-tuning of the hyper parameters in the Support Vector Regression (SVR) models occurring only for the training and validation sets.
Table 6 shows the results obtained for the prediction of the anthocyanin concentration, pH index and sugar content on the external test sets while referencing the best result found in the literature for the same experiment, for a direct comparison.For more information regarding these test sets, see Appendixs G-I.The overall results for the different test sets and for all the oenological parameters under study represent an improvement in the R 2 and RMSE values in comparison to the ones obtained in the validation sets.Since this is somewhat unusual in machine learning problems, Table 7 shows a set of experiments that were employed to rule out a possible inherent bias of the SVR algorithm to perform well on the randomly chosen data for the external test sets (that is, a natural predisposition of the algorithm to obtain better results for a certain set of samples).
For each experiment, the samples were independent and chosen at random, which guarantees that the model's response is unbiased.A total of 10% of the samples were left out to comprise the independent test set, while the remaining samples were used to train and calibrate the model using 10-fold cross-validation.A possible explanation for the difference in performance in the independent testing phase is that through the pipeline of our model there are various steps where the concern with overfitting [44][45][46] is present.The application of methods to counter this effect, and the sum of all these applications, might then lead the model away from overfitting and lead it into an underfitting situation.This is not a traditional underfitting scenario, where the model cannot capture the relationships in the data during the training stage, but a scenario where the inductive bias of the algorithm is predisposed to lead to increased generalization capacity.Another possibility is that because the optimization method for the hyper-parameters in the SVM tries to optimize the results for the respective test sets on the many different random cross-validation runs, it sacrifices training performance for enhanced generalization capacity [47].Either way further study on this topic should be conducted.
The enhanced generalization capacity of the model is noticeable by the differences in the results between the validation and test sets present in Table 7: a model with strong inductive biases is likely to benefit when these biases are well suited to the data, which seems to be the case for this work.

Model Generalization: Different Varieties and Vintages
The last test aimed to further analyse the model's generalization capacity and involved the use of a mixture of samples from different vintages and varieties of wine grape berries; this is a very important configuration, because if the prediction algorithm cannot handle the variations in the grapes' oenological parameters that are known to occur between years and varieties, the application of the models becomes more complex since it will require a new model to be used for every different year or for every different variety.
The test of the model's generalization capacity was performed with two datasets: the first using the samples from all the vintages of the TF variety employed in the training and validation phases, with 10-fold cross validation, and 25% of the samples of the TB variety (23 samples) comprising the independent test set; the second also using all the vintages of the TF variety used in the training and validation stages using 10-fold cross-validation, alongside 25% of the samples of the TN variety (17 samples) used for the external test set.
Figure 5 shows the results obtained for the determination of the anthocyanin concentration in both cases.The number of principal components used were 47 and three, respectively, for each of the datasets.For additional information regarding the samples used, please check Appendix J.

Model Training and Validation
When studying the anthocyanin concentration results (Table 5), 10 folds were used for crossvalidation of the Touriga Franca 2012 (with 240 samples), while for the Touriga Franca 2013 (with 82 samples) only five folds were used by the algorithm, so that the model would not skew its learning in the training step.Since the sample set size was smaller, a slight difference between results is considered acceptable; the model was able to obtain similar results and with a smaller number of principal components used, indicating that there is less variance between samples in the 2013 vintage.Overall, the model showed accurate predictions with a small error rate and a good value for the squared correlation coefficient, in line with similar works from the literature.A descriptive statistical analysis of the datasets was conducted (see Appendix A) to study the behaviour of both set of samples.
Analysing the pH index results (Table 5), it is observable that the model underperforms when predicting the pH index in comparison to the determination of anthocyanin concentration (an issue that was also reported in other state-of-the-art approaches, as seen in Table 1).A possible explanation is that since for each run random samples were chosen to compose the validation set, the greatest variation in the pH patterns was reflected in the model's training step, which had difficulty capturing such relationships in the data with a small number of samples.Also, there is an additional challenge

Model Training and Validation
When studying the anthocyanin concentration results (Table 5), 10 folds were used for cross-validation of the Touriga Franca 2012 (with 240 samples), while for the Touriga Franca 2013 (with 82 samples) only five folds were used by the algorithm, so that the model would not skew its learning in the training step.Since the sample set size was smaller, a slight difference between results is considered acceptable; the model was able to obtain similar results and with a smaller number of principal components used, indicating that there is less variance between samples in the 2013 vintage.Overall, the model showed accurate predictions with a small error rate and a good value for the squared correlation coefficient, in line with similar works from the literature.A descriptive statistical analysis of the datasets was conducted (see Appendix A) to study the behaviour of both set of samples.
Analysing the pH index results (Table 5), it is observable that the model underperforms when predicting the pH index in comparison to the determination of anthocyanin concentration (an issue that was also reported in other state-of-the-art approaches, as seen in Table 1).A possible explanation is that since for each run random samples were chosen to compose the validation set, the greatest variation in the pH patterns was reflected in the model's training step, which had difficulty capturing such relationships in the data with a small number of samples.Also, there is an additional challenge in measuring the pH of wine grape berries, since the acidity is sensitive to small changes in the condition of the sample.The one-way ANOVA analysis showed that there are significant differences in the means (ρ < 0.001) between the TF 2013 and all of the remaining vintages (Appendix E), which might have caused the slight downgrade in the model's performance.Since the prediction intervals were significantly small and the standard deviations between the samples were close to zero, this may be the reason for the model's difficulty to learn from these datasets (see Appendix B).A higher number of samples might be required in order to capture the relationships in the data.
Studying the sugar content results (Table 5), it can be seen that the model's results for all the vintages are accurate, with high squared correlation coefficient values and low root mean square error values.These results indicate that despite the variations in the patterns present in each set, the model was able to capture most of the relations in the data early in the training step.Descriptive statistics for all the sets of samples (see Appendix C) are presented.Regarding the TF 2013 and TF 2014 vintages, it is possible to observe that the latter obtained inferior results, although it had a higher number of samples.Performance of a one-way ANOVA to study meaningful differences in the means for each set of samples revealed that there was a significant difference in the means (ρ < 0.001) between the vintages in the analysis (Appendix F), which might cause the small variations in the results and highlights the model's capacity to learn from populations of wine grape berries with meaningful statistical differences.
The validation set results presented strongly suggest that the framework is robust, being able to learn the patterns present in the data, but also indicates a need for an increase on the number of samples for the most complex data sets.

Model Behaviour Using Test Sets
Investigating the anthocyanin concentration results (Table 6) and comparing these with authors with similar methodologies, Chen et al. [17] and Fadock et al. [19] previously used a SVR approach to measure the total anthocyanin concentration of wine grape berries, obtaining an R 2 of 0.941 and 0.690, respectively, (the error rate was calculated with different measurements and a comparison is not suitable).Looking at the results obtained in the present work, it is noticeable that our approach using an SVR model provided better results, even when a mixture of vintages was used.Previously, Reference [10] had obtained the highest coefficients for the estimation of anthocyanin concentration in external test sets, with a R 2 of 0.950 and RMSE of 14.000 mg•L −1 using a neural network approach.It is possible to observe that for the TF 2012 data set, our results showed an equivalent but slightly better R 2 (0.970) and a better RMSE (11.748 mg•L −1 ).As for the test sets composed of a mixture of TF samples, the results suggest a robust model with a capacity to learn from wine grapes of different vintages.It is also observable that the number of principal components used increased with the variability of the data used and that this increase may be crucial to the model's adaptability to the differences in the variance of the datasets allowing for more stable predictions.
Analysing the pH index results using test sets (Table 6), while looking for authors using a similar model to predict the pH index of wine grape berries, Reference [9] previously used an approach with least squares support vector machines tuned via a genetic algorithm to measure this chemical compound, obtaining a R 2 of 0.957 with a RMSE of 0.126 for the validation set.Unfortunately, the results for an external test set were not provided in the published work.The authors in [19] used a SVR model to measure the pH index as well, obtaining a R 2 of 0.560 and RMSE of 0.050 for a test set composed of samples from different vintages, and a R 2 of 0.807 and RMSE of 0.050 for a test set composed of samples from the same variety and vintage.Our results showed better performance for predictions on an external test set using a SVR model, with and without different vintages of wine grape berries used in those sets.Despite showing superior results than those obtained by [19] (0.807 for R 2 and 0.050 for the RMSE) for an external test set, the model's performance for the pH index was the worst of all the oenological parameters studied, suggesting that a higher number of samples might be required to better incorporate all the data variability into the model.
Studying the sugar content results using test sets (Table 6), for sugar content determination, Reference [9] previously used an approach with least squares support vector machines tuned via a genetic algorithm to measure this chemical compound, obtaining a R 2 of 0.820 with a RMSE of 0.960 ºBrix for the validation set.The results for an external test set were not provided in the published work.The authors in [19] also used a SVR model to measure sugar content, obtaining a R 2 of 0.710 and RMSE of 0.870 • Brix for a test set composed of samples from different vintages, and a R 2 of 0.910 and RMSE of 0.650 • Brix for a test set composed of samples from the same variety and vintage.Our results denote better performance for predictions on an external test set using a SVR model, with and without different vintages of wine grape berries used in those sets.The best previous results for sugar content prediction were from [12], with values of 0.959 and 1.026 • Brix for R 2 and RMSE, respectively.Our model outperforms the mentioned work for the single vintage dataset (R 2 of 0.964 and RMSE of 0.804), and presents slightly worse results in the mixture of samples from different vintages.

Model Generalization: Different Varieties and Vintages
Regarding the model's generalization capacity for the different vintages and varieties of wine grape berries used in the independent external test sets (Figures 5-7), despite a slight drop in the model's performance and a higher error rate in the predictions (which is considered a reasonable outcome due to the fact that these varieties are not found in the training steps, increasing the uncertainty), these results are good indicators with respect to the model's generalization ability, since they indicate it might not be necessary to build models that require a yearly update of samples, or new samples for each variety.Using a mixture of samples from different vintages and varieties in the test phase is, according to the authors' knowledge, a rare configuration in general and very unique for samples with a small number of whole berries.Comparing the results with the only work found in the literature using different varieties of wine grapes in the testing phase, we found that [14] obtained a R 2 of 0.751 and a RMSE of 23.200 mg•L −1 for the determination of anthocyanin concentration, and a R 2 of 0.710 and a RMSE of 0.176 for the prediction of pH index; our model outperforms the mentioned work for all the datasets studied.
Analysing the values obtained for each oenological parameter, it is clear that the model performs quite well on the determination of the sugar content and anthocyanin concentration, with a minor decrease in the squared correlation coefficient and the root mean squared error.With respect to the determination of the pH index, it is possible that the observed differences between varieties might result from the more complex patterns of pH related to their genetic proximity or distance and the analysis should have this focus for future work.Also, it is reasonable to infer that more samples are needed in order to capture the more complex patterns in the spectra.The difference between the number of principal components retained for analysis in each case should also be highlighted.Although this goes beyond the scope of this paper and should be the subject of future research, we believe that this might happen due to the genetic proximity of the wine grape berries varieties in question; the TN variety is more similar to the TF variety than to the TB variety, which might explain why fewer principal components were required for the TN test set for sugar content and anthocyanin concentration.

Conclusions
A hyperspectral imaging technique was combined with a machine learning algorithm (support vector regression) to compose a framework capable of estimating oenological parameters with different varieties and vintages of wine grape berries.The present paper presents a fast, inexpensive and non-destructive type of analysis that provides an alternative to traditional methods when studying wine grape berries during ripening.
The results obtained are competitive in comparison to current state-of-the-art publications in the prediction of sugar content, anthocyanin concentration and pH index, maintaining high performance across different varieties and vintages of wine grapes.This represents a step forward in terms of the study of the generalization capacity, which is very important in order to achieve a model capable of predicting values for a wide variety of wine grapes without the need to capture more and more samples throughout the years to tune the predictor.Moreover, the hyperspectral imaging was conducted in the reflectance mode and with a small number of whole berries, which is a setup rarely found in literature and is relevant when mapping areas in order to improve ripening, selecting the best berries inside each bunch for the production of high quality wines.
Further work should include the study of different pre-processing, dimensionality reduction and feature selection methods, which may represent an improvement in the models' capacity to capture different patterns in the spectra, especially for the estimation of the pH index, which obtained slightly inferior results.Also, datasets composed of different varieties for the training and test phases should be investigated to promote the development of the generalization capacity while investigating the models' behaviour when tested without supplementary training.

Figure 1 .
Figure 1.Experimental setup used for hyperspectral imaging.

Figure 2
Figure2shows a hyperspectral image taken by the aforementioned setup, for samples of the TF 2012 variety.

Figure 2 .
Figure 2. Hyperspectral image of samples of the TF 2012 variety before segmentation and reflectance measurements.

Figure 1 .
Figure 1.Experimental setup used for hyperspectral imaging.

Figure 2
Figure2shows a hyperspectral image taken by the aforementioned setup, for samples of the TF 2012 variety.

Figure 1 .
Figure 1.Experimental setup used for hyperspectral imaging.

Figure 2
Figure2shows a hyperspectral image taken by the aforementioned setup, for samples of the TF 2012 variety.

Figure 2 .
Figure 2. Hyperspectral image of samples of the TF 2012 variety before segmentation and reflectance measurements.Figure 2. Hyperspectral image of samples of the TF 2012 variety before segmentation and reflectance measurements.

Figure 2 .
Figure 2. Hyperspectral image of samples of the TF 2012 variety before segmentation and reflectance measurements.Figure 2. Hyperspectral image of samples of the TF 2012 variety before segmentation and reflectance measurements.

Figure 3 .
Figure 3. Reflectance measurements for the TF 2012 variety samples.

Figure 3 .
Figure 3. Reflectance measurements for the TF 2012 variety samples.

Figure 4 .
Figure 4. Loadings plot for the TF 2012 variety data matrix.

Figure 4 .
Figure 4. Loadings plot for the TF 2012 variety data matrix.

Figure 5 .
Figure 5. Results for the determination of the anthocyanin concentration in the external test sets to further study the generalization capacity; (a) TB 2013 samples; (b) TN 2013 samples.

Figure 6
Figure 6 illustrates the outcome for the prediction of the pH index in both cases.The number of principal components used were 11 and 50, respectively, for each of the experiments.More information on the samples used can be seen in Appendix K.

Figure 5 .
Figure 5. Results for the determination of the anthocyanin concentration in the external test sets to further study the generalization capacity; (a) TB 2013 samples; (b) TN 2013 samples.

Figure 6
Figure 6 illustrates the outcome for the prediction of the pH index in both cases.The number of principal components used were 11 and 50, respectively, for each of the experiments.More information on the samples used can be seen in Appendix K.

Figure 5 .
Figure 5. Results for the determination of the anthocyanin concentration in the external test sets to further study the generalization capacity; (a) TB 2013 samples; (b) TN 2013 samples.

Figure 6 Figure 6 .
Figure6illustrates the outcome for the prediction of the pH index in both cases.The number of principal components used were 11 and 50, respectively, for each of the experiments.More information on the samples used can be seen in Appendix K.

Figure 7
Figure 7 presents the results for the estimation of the sugar content in both cases.The number of principal components used were 45 and 17, respectively, for each of the datasets.Further information regarding the samples is presented in Appendix L.

Figure 6 .
Figure 6.Results for the prediction of the pH index in the external test sets to further study the generalization capacity; (a) TB 2013 samples; (b) TN 2013 samples.

Figure 7 Figure 7 .
Figure 7 presents the results for the estimation of the sugar content in both cases.The number of principal components used were 45 and 17, respectively, for each of the datasets.Further information regarding the samples is presented in Appendix L. Remote Sens. 2018, 10, 312 14 of 23

Figure 7 .
Figure 7. Results for the estimation of the sugar content in the external test sets to further study the generalization capacity; (a) TB 2013 samples; (b) TN 2013 samples.

Table 2 .
Experimental results to build the support vector regression model with different loss functions.
1 Principal components used.

Table 3 .
Experimental results to build the support vector regression model with different kernel functions.
1 Principal components used.

Table 4 .
Outline of the different experiments performed in the sections below.

Concentration pH Index Sugar Content Sections Train./Val. Set Test Set Train./Val. Set Test Set Train./Val. Set Test Set
1 No samples available.2Notused.

Table 5 .
Results obtained in the validation set for the determination of the anthocyanin concentration, pH index and sugar content of the wine grape berries.
1 Principal components used.

Table 6 .
Results obtained in the external test set for the determination of the anthocyanin concentration, pH index and sugar content of the wine grape berries.
1 Principal components used.2Samevariety and vintage of wine grape berries used in the training and test sets.

Table 7 .
Results obtained by the SVR model in 10 different experiments for the validation and test sets of samples from the TF 2012 vintage for sugar content prediction.