The phenolic content of wines produced from V. vinifera berries can vary widely for several reasons, including vineyard practices [1], cultivar [2], vineyard geography [3], vintage [5], and winemaking practices [6]. Phenolic quantitation is invaluable from a commercial perspective, particularly for red wines, which have a greater and more diverse phenolic content than wines made from white cultivars [7] due to the duration of skin contact during red wine production [6].
As wine phenolics possess similar chemical structures, they also possess similar ultraviolet–visible (UV–Vis) spectra. For this reason, several methods aimed at isolating wine phenolics by class have been developed [8]. Methods for analyzing phenolics using HPLC and mass spectrometry have also been developed [13]. Regardless of the methodology, phenolic analysis by separation requires considerable time and resources to obtain accurate results. For that reason, several researchers have attempted to circumvent this necessity by implementing multivariate statistical analysis.
Modern statistical learning theory began in the 1960s with Rosenblatt’s perceptron [14]. Since that time, the development of modern computers has permitted highly accurate methods for identification [15], classification [16], and prediction [17] across many fields, including enology. For example, Skogerson et al. [18] applied partial least squares regression (PLSR) to predict the phenolic composition of wine during fermentation from its UV–Vis spectra. Beyond phenolic prediction, Hosu et al. [19] predicted the antioxidant capacity in Romanian red wines using UV–Vis spectroscopy and artificial neural networks. As for alcohol and titratable acidity (TA), Yu et al. [20] used a least squares support vector machine (LS-SVM) to accurately predict the alcohol content and TA in Chinese rice wine by recording the wine’s UV–Vis and near-infrared spectrum (350 nm–1200 nm). Sensorial predictive models have also been constructed. Lombardo and Veaux [21] proposed a nonlinear application of PLSR using multivariate adaptive regression splines (MARS) for the sensorial analysis of both red and white wines.
While modern machine learning approaches have been successfully applied in various ways to enological analysis, the application of such techniques remains experimental. This study attempted to measure the validity of phenolic model prediction in three steps:
1. Compare several multivariate regression models to determine which gives the most accurate predictions for wine phenolics (tannins, anthocyanins, and total iron reactive phenolics).
2. Address phenolic multicollinearity in the UV–Vis spectra by mathematically isolating individual phenolics through the spectrum of a malvidin chloride standard.
3. Compare the final adapted phenolic model predictions across two vintages and two instruments.
2. Results and Discussion
2.1. Algorithm Comparison and Overall Performance
Table 1 compares the performance of the three algorithms used for phenolic prediction. The first three rows are for anthocyanins, rows 4 through 6 are for tannins, and rows 7 through 9 are for total iron reactive phenolics (TIPs). All root mean squared error values were calculated by taking the square root of the sum of squared differences between predicted values and observed values divided by the number of observations (Equation (1)).
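$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (P_i - O_i)^2}{N}} \qquad (1)$$
Equation (1): Root mean squared error (RMSE) equation. $P_i$ is equal to predicted values, $O_i$ is equal to observed values, and $N$ is equal to the number of observations.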
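As a minimal sketch of the RMSE calculation described above, the following Python function takes the square root of the mean squared difference between predicted and observed values. The function name and the example numbers are illustrative assumptions and are not taken from the study's data.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference for each (predicted, observed) pair
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    # Divide by the number of observations, then take the square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only
example_predicted = [2.0, 4.0, 6.0]
example_observed = [1.0, 4.0, 8.0]
example_rmse = rmse(example_predicted, example_observed)
```

The same function could be applied to calibration, cross-validation, or prediction sets to obtain RMSEC, RMSECV, or RMSEP, respectively.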
In some cases, the root mean squared errors of calibration (RMSEC) were smaller than the root mean squared errors of prediction and cross-validation (RMSEP and RMSECV), while the R2 values for prediction and cross-validation (R2P and R2CV) were generally larger than the R2 values for calibration (R2C). In these cases, the RMSEC was always smaller than the RMSEP regardless of cost, so these sets were optimized by choosing the cost that maximized the R2P. Support vector regression (SVR) outperformed the other two algorithms overall.
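The cost selection described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: R2 is computed here as the coefficient of determination (the source does not say whether this or a squared correlation was used), and the cost values and predictions are hypothetical placeholders for the output of models trained at each cost.

```python
def r_squared(predicted, observed):
    """Coefficient of determination (assumed definition of R2)."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


def best_cost(predictions_by_cost, observed):
    """Return the cost whose prediction-set R2 (R2P) is highest."""
    return max(
        predictions_by_cost,
        key=lambda cost: r_squared(predictions_by_cost[cost], observed),
    )


# Hypothetical prediction-set results for three candidate cost values
observed_values = [1.0, 2.0, 3.0, 4.0]
candidate_predictions = {
    0.1: [2.0, 2.0, 3.0, 3.0],
    1.0: [1.1, 2.1, 2.9, 3.9],
    10.0: [1.5, 1.5, 3.5, 3.5],
}
selected_cost = best_cost(candidate_predictions, observed_values)
```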
The initial model for this project was built from data acquired using a single spectrophotometer from a single vintage. While the model calibrated and validated well, new predictions were quite poor because the spectrophotometer available was different from that used in the original work. Beyond the different spectrophotometers, the vintage and the grape-growing regions were also different in the new data set, unlike previous work, which utilized a single fruit source and vintage [18]. This was addressed in three steps. The first two steps addressed spectral multicollinearity issues independent of the instrument in use, and the third step addressed the different instrumentation issues.
2.2. Spectral Multicollinearity
In the UV–Vis absorbance spectra of red wine, the spectra of several phenolics overlap, including the ones measured here. This can be problematic in building predictive models, as it becomes difficult to determine exactly how much absorbance at a given wavelength is due to a particular phenolic compound or compound class. For assay measurement of phenolics by UV–visible absorbance, the compounds of interest are typically isolated chemically before the final absorbance is recorded [12]. Anthocyanins, for example, can be isolated by dropping the pH [24]. Tannins can be isolated through protein precipitation [25], while polymeric pigment isolation can be accomplished through bisulfite bleaching [26]. A goal of this work was to eliminate or at least minimize the need for chemical isolation of phenolics. To achieve this, the spectra for individual phenolics were isolated mathematically. For anthocyanins, this was easily achieved by only considering the visible spectra (430 nm–700 nm) to make predictions, as TIPs and tannins have no absorbance in the visible range. For TIPs and tannins, absorbance in the UV range of the spectra (230 nm–429 nm) due to the presence of anthocyanins had to first be estimated and removed. To calculate this estimate, the entire spectrum of the malvidin chloride (MC) standard was transformed so that the absorbance at each wavelength was expressed as a fraction of the summed absorbance, such that the spectrum summed to one. Next, the portion of each raw spectrum below 430 nm due to anthocyanins was estimated by multiplying the absorbance at each wavelength in the transformed MC spectrum below 430 nm by the raw wine absorbance at 520 nm divided by the transformed MC absorbance at 520 nm (Equation (2)). Lastly, each calculated anthocyanin spectrum below 430 nm was subtracted from each raw wine spectrum below 430 nm to give the final spectra for TIPs and tannins.
Equation (2): The phenolic spectra used to predict tannins and total iron reactive phenolics (TIPs) were generated by multiplying each point in the transformed malvidin chloride (MC) spectrum below 430 nm by the raw sample absorbance at 520 nm divided by the transformed MC absorbance at 520 nm, and subtracting the result from the raw sample spectrum below 430 nm.
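The transformation described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function name, the dictionary representation of a spectrum (wavelength in nm mapped to absorbance), and the example absorbance values are all assumptions, and the wine and MC spectra are assumed to share the same wavelength grid.

```python
def remove_anthocyanin_contribution(wine, mc_standard, cutoff_nm=430, ref_nm=520):
    """Subtract the estimated anthocyanin absorbance below cutoff_nm.

    wine, mc_standard: dicts mapping wavelength (nm) to absorbance.
    """
    # Transform the MC standard so that its spectrum sums to one
    total = sum(mc_standard.values())
    mc_fraction = {wl: a / total for wl, a in mc_standard.items()}
    # Scale factor: raw wine absorbance at 520 nm over transformed MC at 520 nm
    scale = wine[ref_nm] / mc_fraction[ref_nm]
    # Estimated anthocyanin spectrum is subtracted from the raw wine spectrum
    return {
        wl: wine[wl] - mc_fraction[wl] * scale
        for wl in wine
        if wl < cutoff_nm
    }


# Hypothetical four-point spectra for illustration only
mc_example = {280: 2.0, 400: 1.0, 520: 5.0, 600: 2.0}
wine_example = {280: 3.0, 400: 0.8, 520: 1.0, 600: 0.3}
corrected = remove_anthocyanin_contribution(wine_example, mc_example)
```

Note that the sum-to-one normalization cancels in the ratio, so the estimate is equivalent to scaling the MC spectrum so that it matches the wine absorbance at 520 nm.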
It is important to emphasize that the predictive models presented are meant to predict the chemical phenolic composition of a given red wine only rather than its perceived sensorial aspects [27]. While the sensorial perception of a wine is obviously important, building such a model is beyond the scope of this work.
In an ideal world, every UV–Vis absorbance spectrophotometer would be identical in every way. This is of course not the case, but the ability to apply the same predictive model across different instruments would be advantageous. To address this issue, two different instruments were compared in this study: the Genesys 10S produced by Thermo Fisher Scientific (Waltham, MA) and the Cary 14 spectrophotometer produced by Olis (Bogart, GA). The two instruments differed in several areas, including instrument sensitivity, absorbance quantification range, and available spectral range. The first data set compared several different dilutions for data acquired using the Genesys 10S spectrophotometer. Once the optimal dilution for that instrument was determined, a new sample set was acquired from a new vintage and a different region. Several different ratios of model wine to wine were tested in the Cary 14 spectrophotometer until the scaled spectra of the new samples closely resembled the average of the scaled spectra from the Genesys 10S data set. The difference in optimal dilutions between the two spectrophotometers was considerable (a 1:5 dilution was optimal for the Genesys 10S, whereas 1:25 was optimal for the Cary 14).
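One way to picture the dilution-matching step is the sketch below. It rests on assumptions that the source does not specify: spectra are scaled to sum to one, resemblance is measured by Euclidean distance to the mean scaled reference spectrum, and all absorbance values are hypothetical. With this scaling, a purely linear dilution would not change the shape of a spectrum, so differences arise only where the detector response departs from linearity (for example, saturation at high absorbance).

```python
def scale_to_unit_sum(spectrum):
    """Scale a list of absorbances so that they sum to one (assumed scaling)."""
    total = sum(spectrum)
    return [a / total for a in spectrum]


def mean_spectrum(spectra):
    """Point-wise mean of several spectra on the same wavelength grid."""
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]


def closest_dilution(candidates, reference_spectra):
    """Return the dilution label whose scaled spectrum is nearest the reference."""
    reference = mean_spectrum([scale_to_unit_sum(s) for s in reference_spectra])

    def distance(spectrum):
        scaled = scale_to_unit_sum(spectrum)
        return sum((a - b) ** 2 for a, b in zip(scaled, reference)) ** 0.5

    return min(candidates, key=lambda label: distance(candidates[label]))


# Hypothetical three-point spectra; the first candidate mimics a saturated signal
reference_set = [[0.2, 0.4, 0.4], [0.4, 0.8, 0.8]]
candidate_set = {"1:5": [3.0, 3.0, 3.0], "1:25": [0.21, 0.39, 0.40]}
chosen = closest_dilution(candidate_set, reference_set)
```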
Unfortunately, simply calibrating an instrument using an accepted standard is not a reliable way to apply a multivariate predictive model across different instruments. Beyond absorbance sensitivity (spectral resolution), other characteristics such as the signal-to-noise ratio and the ratio of ultraviolet absorbance to visible absorbance can and do vary between instruments. For this reason, whenever a predictive model is implemented with a new instrument, it is strongly recommended that a subset of data acquired using the new instrument be added to the original data set. The subset should contain both assay data and the concomitant spectral data. The combined data set should then be calibrated and validated to maximize model predictive accuracy using the new instrument.
2.4. Phenolic Evolution
Polymeric pigments are formed through reactions of tannins, other phenolics, and keto-acids with anthocyanins [6]. The spectral data acquired in this study suggest that a significant change in color occurred within the first month after fermentation was complete. Table 2 shows that correlations between phenolic assay measurements and the respective absorbance values of the wine at 520 nm and 280 nm fluctuated greatly over time. Table 3 demonstrates that by the fourth week, there was a significant negative correlation between anthocyanins and TIPs as well as between anthocyanins and tannins.
While spectral transformation did greatly improve predictive power for tannins and TIPs, a certain level of inherent error remained, as there is no wavelength in the UV spectra at which tannins and TIPs do not overlap. TIPs are very heterogeneous by nature, and for this reason, there are no established external standards available for TIPs. This makes spectral isolation of tannins and TIPs difficult if not impossible. Despite this, tannin and TIP models performed well, with root mean squared error values below ten percent. This suggests that spectral transformation by removing the calculated malvidin chloride spectra was enough to generate trustworthy tannin and TIP spectra, so long as the model was re-calibrated by combining the old data set with some new data.
Just as with tannins and TIPs, polymeric pigment formation constitutes a significant source of predictive error, as the formation of such pigments significantly changes the overall correlation between the assay data and any given point in the spectra. For example, while fermenting wines had the highest correlation with measured anthocyanins at 524 nm (0.87), wines four weeks after fermentation was complete had the highest correlation with measured anthocyanins at 357 nm (0.82). Unfortunately, the model applied in this study did not calibrate for polymeric pigments, although it is difficult to say how accurate a predictive model for polymeric pigments built using UV–Vis spectroscopy could be. As mentioned, the spectra of tannins and TIPs overlap, which presents an inherent source of error in tannin and TIP prediction. Polymeric pigments represent a very heterogeneous group of compounds that could be formed not only from covalent interactions between tannins and anthocyanins but also through such interactions between tannins and TIPs, or tannins, anthocyanins, and TIPs. While tannin and TIP models can be adjusted by mathematically removing the estimated spectra of malvidin chloride, an accurate adjustment is difficult for polymeric pigments due to the heterogeneity of the class and, therefore, the heterogeneity of the spectra. Phenolic oxidation over time only further adds to the complexity of such a model. When considering all of these factors together, it becomes more apparent why there are no obvious trends among the correlation values depicted in Table 1 and Table 2.
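The wavelength-by-wavelength correlation analysis discussed above can be illustrated with a short sketch. It assumes Pearson correlation and selects the wavelength with the highest signed coefficient; the source does not state which correlation measure was used, and the function names and data below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def best_wavelength(absorbance_by_wavelength, assay_values):
    """Return (wavelength, r) for the wavelength most correlated with the assay."""
    correlations = {
        wl: pearson(values, assay_values)
        for wl, values in absorbance_by_wavelength.items()
    }
    best = max(correlations, key=correlations.get)
    return best, correlations[best]


# Hypothetical data: four wines, absorbance at two wavelengths, one assay value each
assay_example = [100.0, 200.0, 300.0, 400.0]
absorbance_example = {
    280: [1.0, 0.9, 1.1, 1.0],
    520: [0.2, 0.4, 0.6, 0.8],
}
top_wavelength, top_r = best_wavelength(absorbance_example, assay_example)
```

Repeating this calculation separately for each sampling time would show how the best-correlated wavelength shifts as the wine ages.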