Modeling Pinot Noir Aroma Profiles Based on Weather and Water Management Information Using Machine Learning Algorithms: A Vertical Vintage Analysis Using Artificial Intelligence

Wine aroma profiles determine the specific style and quality characteristics of final wines. They depend on seasonality, mainly weather conditions such as solar exposure and temperature, and on water management strategies from veraison to harvest. This paper presents machine learning modeling strategies that use weather and water management information from a Pinot noir vineyard for the 2008 to 2016 vintages as inputs, and aroma profiles of wines from the same vintages, assessed using gas chromatography and chemometric analyses, as targets. The results showed that artificial neural network (ANN) models achieved high accuracy in the prediction of aroma profiles (Model 1; R = 0.99) and chemometric wine parameters (Model 2; R = 0.94) with no indication of overfitting. These models could offer winemakers powerful tools to assess the aroma profiles of wines before winemaking, which could help adjust some techniques to maintain or increase the quality of wines or wine styles that are characteristic of specific vineyards or regions. The models can be adapted to different cultivars and regions by including more data from vertical vintages to implement artificial intelligence in winemaking.


Introduction
Wine quality traits are difficult to assess in a rapid and objective way in vineyards, especially before winemaking. Usually, quality assessments that are performed in the wine industry are related to the acidity and sugar content in berries (Brix or Baume) to assess maturity [1,2]. However, this assessment only gives information about the amount of alcohol and acidity in the final wine through fermentation. Hence, berry sugars/acidity do not provide useful information on any other important quality trait, such as the potential aroma profiles that could be obtained in the final wine.
Alcohol present in beverages has been found to affect the perception of flavor and aromas, as it aids in the release of volatile aromatic compounds [3]. Furthermore, higher-alcohol wines have sometimes been regarded as beneficial for the physicochemical expression of color and other quality traits that impact their sensory evaluation [4]. However, increasing alcohol content in wines is now a problem due to climate change, specifically global warming. Higher temperatures are compressing phenological stages, resulting in earlier harvests during hotter months around the globe [5][6][7][8]. This phenomenon produces a double global warming effect in
$$WB = I + 0.85\,RF - ET_c \qquad (1)$$
where WB = water balance; I = irrigation applied in megaliters (ML); RF = rainfall, of which 85% is considered effective (available to the plant); and ET_c = crop evapotranspiration calculated using the corresponding crop coefficient (K_c) for different phenological stages [14].
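As a minimal sketch of Equation (1), the water balance can be computed as follows. The function name, argument names, and numeric values are hypothetical and only illustrate the arithmetic; the sketch assumes all three terms are expressed in the same unit.

```python
def water_balance(irrigation, rainfall, crop_et, rain_efficiency=0.85):
    """Water balance WB = I + 0.85 * RF - ETc (Equation 1).

    All arguments are assumed to share one unit (e.g. ML).
    rain_efficiency is the fraction of rainfall available to the plant.
    """
    return irrigation + rain_efficiency * rainfall - crop_et


# Hypothetical season: 0.5 units irrigated, 2.0 units of rain, 1.8 units of ETc.
wb = water_balance(irrigation=0.5, rainfall=2.0, crop_et=1.8)
print(f"WB = {wb:.2f}")
```

A positive value indicates that irrigation plus effective rainfall exceeded crop demand over the period; a negative value indicates a deficit.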

Physicochemical Analysis
Wines from each vintage were analyzed in triplicate for all physicochemical parameters measured in this study. A volume of 20 mL of each wine sample was poured into a 60 × 15 mm Greiner Bio-One polystyrene Petri dish (item number 628102; Greiner Bio-One, Kremsmünster, Austria) and placed on a uniform white surface. Color in the CIELab and RGB scales was measured using a NIX Pro color sensor (NIX Sensor Ltd., Hamilton, Ontario, Canada). The UV-Vis spectra from 380 to 780 nm were acquired with a Lighting Passport Pro portable spectrometer (Asensetek Incorporation, New Taipei City, Taiwan). Color intensity was calculated as the sum of the absorbances at 420, 520, and 620 nm, and color hue as the absorbance at 420 nm divided by the absorbance at 520 nm. Fifty mL of each wine sample were used to determine liquid density (weight divided by volume). The pH was determined using a pH-meter (QM-1670, DigiTech, Sandy, UT, USA); total dissolved solids (TDS) and electric conductivity (EC) were measured with a Yuelong YL-TDS2-A digital water quality tester (Zhengzhou Yuelong Electronic Technology Co., Ltd, Zhengzhou City, Henan Province, China); salt concentration was measured using a digital salt-meter (PAL-SALT Mohr, Atago Co., Ltd., Saitama, Japan); and alcohol content was measured using an AlcolyzerWine M alcohol meter (Anton Paar GmbH, Graz, Austria).
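The derived color and density parameters described above reduce to simple arithmetic. The sketch below illustrates them; the function names and the absorbance and weight values are hypothetical and are not measurements from this study.

```python
def color_intensity(a420, a520, a620):
    """Color intensity: sum of the absorbances at 420, 520, and 620 nm."""
    return a420 + a520 + a620


def color_hue(a420, a520):
    """Color hue: absorbance at 420 nm divided by absorbance at 520 nm."""
    if a520 == 0:
        raise ValueError("absorbance at 520 nm must be non-zero")
    return a420 / a520


def liquid_density(weight_g, volume_ml):
    """Liquid density: weight divided by volume (g/mL)."""
    return weight_g / volume_ml


# Hypothetical readings for one wine replicate.
print(color_intensity(0.30, 0.45, 0.10))
print(color_hue(0.30, 0.45))
print(liquid_density(49.6, 50.0))
```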

Gas Chromatography-Mass Spectroscopy
A 5 mL sample of each wine replicate was poured into a 20 mL screw cap vial and sealed with an 18 mm magnetic screwcap with a polytetrafluoroethylene and silicone liner. These samples were analyzed with the method proposed by Gonzalez Viejo et al. [38] using a high-efficiency gas chromatograph with a mass selective detector 5977B (GC-MSD; Agilent Technologies, Inc., Santa Clara, CA, USA), coupled with a PAL3 autosampler system (CTC Analytics AG, Zwingen, Switzerland). The GC-MSD has a detection limit of 1.5 fg, and an HP-5MS column was attached (length: 30 m, inner diameter: 0.25 mm, film: 0.25 µm; Agilent Technologies, Inc., Santa Clara, CA, USA), while the flow rate of the carrier gas (helium) was set to 1 mL min⁻¹. Headspace with solid-phase microextraction (SPME) and a divinylbenzene-carboxen-polydimethylsiloxane grey fiber (1.1 mm; Agilent Technologies, Inc., Santa Clara, CA, USA) was used. Incubation time was set to 20 at 45 °C with a 5 min cycle and 1 min for fiber conditioning (170 °C). Furthermore, the extraction time was set to 40 min with agitation. Two blank samples were used, one at the start and one at the end, to avoid any carryover effect. To identify the volatile compounds, the National Institute of Standards and Technology library (NIST; National Institute of Standards and Technology, Gaithersburg, MD, USA) was used. Only the compounds with ≥ 80% certainty were reported.

Statistical Analysis and Machine Learning Modeling
Data from weather, physicochemical, and aroma profile measurements were analyzed using customized code written in Matlab ® R2019a (Mathworks, Inc., Natick, MA, USA) to assess significant correlations (p < 0.05) between parameters, which were reported in a matrix. These data were also used to develop machine learning models based on artificial neural networks (ANN) using an automated code in Matlab ® that tests 17 different training algorithms in a loop. The weather data related to (i) solar exposure V-H, (ii) solar exposure from S-H, (iii) MJSE, (iv) DD-S-H, (v) MJT, (vi) MeanMaxTV-H, (vii) MeanMinTV-H, and (viii) water balance were used as inputs for machine learning purposes. Two models were developed using these inputs to predict (i) the peak area of nine volatile aromatic compounds measured using the GC-MSD (Model 1) and (ii) 14 physicochemical measurements (Model 2). Both models were developed using data (inputs and targets) normalized from −1 to 1, with a random data division in which 60% of the samples were used for training with a Levenberg-Marquardt algorithm, 20% for validation with a mean squared error performance algorithm, and 20% for testing with a default derivative function. The number of neurons was defined by performing a trimming exercise with three, five, seven, and 10 neurons; 10 neurons gave the best models with no indication of overfitting. The models consisted of a two-layer feedforward network with a tan-sigmoid function in the hidden layer and a linear transfer function in the output layer (Figure 2).
Table 2 shows the nine volatile compounds identified in all the wine samples tested and the aromas associated with them. Most of the aromas are related to fruity scents, especially apple, with two compounds (phenylethyl alcohol and ethyl laurate) associated with floral notes and one (ethyl palmitate) with milky or creamy notes.
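The model setup described in the methods (a 60/20/20 random division, a two-layer feedforward network with a tan-sigmoid hidden layer of 10 neurons and a linear output layer, and Levenberg-Marquardt training) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Matlab code: the data are synthetic stand-ins already scaled to [−1, 1], the sample count of 66 is an assumption, SciPy's MINPACK-based Levenberg-Marquardt solver is swapped in for the Matlab training routine, and validation-based early stopping is not implemented.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): 8 inputs, 9 targets.
n_samples, n_in, n_hidden, n_out = 66, 8, 10, 9
X = rng.uniform(-1.0, 1.0, size=(n_samples, n_in))
Y = np.tanh(X @ rng.normal(size=(n_in, n_out)) * 0.5)

# Random 60/20/20 division into training, validation, and testing rows.
order = rng.permutation(n_samples)
n_train = int(round(0.6 * n_samples))
n_val = int(round(0.2 * n_samples))
train = order[:n_train]
val = order[n_train:n_train + n_val]
test = order[n_train + n_val:]

n_params = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def unpack(p):
    """Split the flat parameter vector into weights and biases."""
    i = n_in * n_hidden
    W1 = p[:i].reshape(n_in, n_hidden)
    b1 = p[i:i + n_hidden]
    i += n_hidden
    W2 = p[i:i + n_hidden * n_out].reshape(n_hidden, n_out)
    b2 = p[i + n_hidden * n_out:]
    return W1, b1, W2, b2


def predict(p, inputs):
    """Tan-sigmoid hidden layer followed by a linear output layer."""
    W1, b1, W2, b2 = unpack(p)
    return np.tanh(inputs @ W1 + b1) @ W2 + b2


def residuals(p):
    """Flattened training errors, as required by least_squares."""
    return (predict(p, X[train]) - Y[train]).ravel()


def mse(p, rows):
    return float(np.mean((predict(p, X[rows]) - Y[rows]) ** 2))


p0 = rng.normal(scale=0.1, size=n_params)
# method="lm" needs at least as many residuals as parameters.
fit = least_squares(residuals, p0, method="lm", max_nfev=4000)

for name, rows in (("training", train), ("validation", val), ("testing", test)):
    print(f"{name}: MSE = {mse(fit.x, rows):.4f}")
```

Comparing the three MSE values printed at the end mirrors the check used in this study: a training error far below the validation and testing errors would point to overfitting.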
In Table 3, the statistical results from the ANN models are shown. Model 1 had a high overall correlation coefficient (r = 0.99), with similar results for all stages (training, validation, and testing; r > 0.97), in predicting the peak area of nine volatile aromatic compounds (Table 2). The validation and testing mean squared error (MSE) values were the same (MSE = 0.03), while the training value was lower (MSE = 0.003), which supports the absence of overfitting of the model. Furthermore, the slope (b) for all stages and the overall model was close to unity (b = 0.97). On the other hand, Model 2 had an overall correlation of r = 0.94 in predicting 14 physicochemical parameters (Figure 2b). The slopes of the three stages were high (b > 0.83), with an overall model slope of b = 0.90. Similar to Model 1, the training MSE of Model 2 was lower (MSE = 0.02) than those of the validation and testing stages, which presented similar results (MSE = 0.05 and MSE = 0.06, respectively).
Table 3. Statistics from the artificial neural network models to predict the aroma profile based on the peak area of volatile aromatic compounds (Model 1) and the physicochemical data (Model 2) from Pinot noir wines.
Figure 4a shows the overall Model 1 to predict the aroma profile based on the peak area of volatile aromatic compounds of Pinot noir wines. Based on the 95% confidence bounds, only 1.01% of the data points (six out of 594) were outliers. On the other hand, Figure 4b depicts the overall Model 2 to predict the physicochemical data of the wines. Based on the 95% prediction bounds, 3.25% of the data points (30 out of 924) were outliers. For both models, several retraining attempts were performed, obtaining results similar to those presented in Table 3 and Figure 4.
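The statistics reported above (correlation coefficient r, slope b, MSE, and percentage of outliers) can be computed from observed and predicted values as in the sketch below. The function names are hypothetical, and the regression of predicted on observed values is an assumption about how the slope was obtained.

```python
import numpy as np


def fit_statistics(observed, predicted):
    """Return r, slope b (predicted regressed on observed), and MSE."""
    observed = np.asarray(observed, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    r = np.corrcoef(observed, predicted)[0, 1]
    slope, _intercept = np.polyfit(observed, predicted, 1)
    mse = np.mean((predicted - observed) ** 2)
    return {"r": float(r), "b": float(slope), "mse": float(mse)}


def outlier_percentage(n_outliers, n_points):
    """Share of points falling outside the bounds, in percent."""
    return 100.0 * n_outliers / n_points


# Outlier counts reported for Model 1 and Model 2.
print(round(outlier_percentage(6, 594), 2))
print(round(outlier_percentage(30, 924), 2))
```

The point counts are consistent with 66 samples per target (594 / 9 = 924 / 14 = 66); this is an inference from the reported counts, not a figure stated in the text.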
When these models are fed with new data, the output values are returned normalized from −1 to 1; however, the reverse function for normalization in Matlab ® R2019a (Mathworks Inc., Natick, MA, USA) provides the actual values in the corresponding units.
[…] Figure 2). The models show the observed (x-axis) and predicted (y-axis) data as well as the 95% confidence bounds.
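The normalization to the range −1 to 1 and its reversal are a linear mapping, which can be sketched as follows. The function names and sample values are hypothetical. Matlab provides this kind of mapping through functions such as mapminmax, but whether the authors used that specific function is an assumption.

```python
import numpy as np


def fit_minmax(values):
    """Column-wise minimum and maximum of the training data."""
    values = np.asarray(values, dtype=float)
    return values.min(axis=0), values.max(axis=0)


def normalize(values, lo, hi):
    """Map each column linearly from [lo, hi] to [-1, 1]."""
    return 2.0 * (np.asarray(values, dtype=float) - lo) / (hi - lo) - 1.0


def denormalize(scaled, lo, hi):
    """Reverse mapping: from [-1, 1] back to the original units."""
    return (np.asarray(scaled, dtype=float) + 1.0) * (hi - lo) / 2.0 + lo


# Hypothetical targets: two columns in different units.
targets = np.array([[10.0, 3.2], [20.0, 3.6], [30.0, 3.4]])
lo, hi = fit_minmax(targets)
scaled = normalize(targets, lo, hi)
restored = denormalize(scaled, lo, hi)
print(scaled)
print(restored)
```

The minimum and maximum must come from the data used to build the model and be reused unchanged when reversing predictions for new inputs; otherwise the restored values will not be in the original units.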

Discussion
The physicochemical parameters assessed in this study have been associated with wine quality by other authors. Aromas and color-related parameters are among the factors most often associated with wine quality [42,43]. Sáenz-Navajas et al. [44] found a relationship between red wine color and consumers' perception of quality and concluded that darker wines with higher red and lower yellow values were rated as higher quality. Jackson et al. [42] reported a significant and positive correlation of both pH and color with overall wine quality. The importance of TDS, EC, and salt measurements relies on the fact that they approximate mineral content [45], which is important in wine quality, as the minerals present in wine have been related to those present in the soil, and these have been associated with the wine's nutritional composition and safety [46].
There was significant variability among the vintages from the particular region in Victoria analyzed in this study. At one extreme, low-quality wines were produced in the 2010-2011 vintage due to heavy rains before harvest, which negatively affect the quality traits in berries and wine [47,48]; this low-quality assessment was obtained from anecdotal information on the points received in those particular years and from the sensory analysis conducted by the vineyard studied. At the other extreme, dry seasons occurred, for example, in 2013-2014 and 2014-2015, with increased berry quality traits that were passed on to the respective wines. The latter were mainly due to some control of the water received by plants from irrigation and water deficits. These differences contribute to the robustness of the machine learning models, which presented no indication of overfitting and high precision in the prediction of the peak area of volatile aromatic compounds (Model 1) and physicochemical wine characteristics (Model 2).
The effects of solar exposure and canopy architecture (which is dependent on water balance) on the aroma profiles of wines have been previously reported, and they are consistent with the data presented in Figure 3. Specifically, these effects manifest through the influence of the microclimate within bunches [49], phenolic compounds [50,51], and the flavonol profile [52]. Due to the direct effect of bunch exposure to radiation on the aroma profiles obtained in wines, researchers have investigated defoliation as a management strategy to increase berry quality and aroma traits, the outcome of which depends on the cultivar, timing of defoliation, and climatic region [53][54][55][56][57][58][59][60]. These studies demonstrate the importance of fruit exposure to solar radiation and of microclimate conditions that are favorable to the development of berry quality traits.
As previously mentioned, seasonal temperatures influence not only the occurrence and length of different phenological stages in grapevines, such as budbreak, flowering, berry set, pea size, veraison, and harvest, but also the chemical and aroma composition of berries. Of critical importance is the influence of weather parameters, such as temperature [61][62][63][64], and of water availability from veraison onwards in red cultivars, which determines the final wine quality and aroma profiles. Several studies have focused on the pre- and post-veraison phenological stages for irrigation treatments to increase berry and wine quality traits, especially in red cultivars [65][66][67][68][69].
For machine learning modeling, it has been demonstrated that using as inputs important parameters that directly influence the proposed targets renders more robust models than using raw data. Recent studies based on calculated parameters rather than raw data inputs have implemented machine learning to assess beer quality [70][71][72], to interpret remote sensing data for plant water status assessment in vineyards [73], to assess chocolate quality as rated by consumers using NIR [74], and to assess aroma profiles in cocoa trees based on canopy architecture parameters [75]. In this study, relevant parameters from weather conditions, management strategies, and physicochemical parameters of wines were obtained and considered as inputs in the machine learning modeling, which can explain the high accuracy obtained for the predictions of Models 1 and 2 without signs of overfitting.
The use of ANN for modeling has the advantage of being able to use multiple targets, which makes the models more efficient. This is due to the ease of feeding only one model to obtain all the output data instead of having to add the new inputs to several single-target models. Several studies related to food and agriculture have used this type of machine learning algorithm with high performance and accuracy [38,71,72,75,76,77].
The proposed technique relies on readily available weather information from vintages close to the vineyards and on a vertical vintage library, both of which most wineries can obtain easily. The models developed assume that vineyard management is consistent throughout the seasons, including the winemaking techniques and yeast used. Applying these models to other cultivars, environments, and regions will require the incorporation of further site-specific data as inputs, together with wine chemical and aroma profile analyses from available and contrasting vintages. Such applications benefit from the learning aspect of the proposed models, which does not require the full development of new analyses for different regions.

Conclusions
Artificial intelligence techniques can be implemented in the wine industry using readily available weather and management practice data to assess quality traits in final wines. Modeling strategies using artificial neural networks developed for particular regions can be implemented for other cultivars, environments, and regions by including extreme values from their respective vintages. High-accuracy models to determine the aroma profile of wines before the winemaking process can offer growers and winemakers a powerful tool for decision making in the vinification process to maintain or increase wine quality and styles. Further research is required to adapt these techniques to canopy management strategies and to within-season modeling that can be implemented in real time to steer the final wine and aroma profiles toward specific targets using management strategies such as canopy, fertilization, and irrigation management.