Non-Invasive Tools to Detect Smoke Contamination in Grapevine Canopies, Berries and Wine: A Remote Sensing and Machine Learning Modeling Approach

Bushfires are becoming more frequent and intense due to the changing climate. Those that occur close to vineyards can cause smoke contamination of grapevines and grapes, which can affect wines by producing smoke taint. At present, there are no practical in-field tools available for detecting smoke contamination or taint in berries. This research proposes a non-invasive, in-field detection system for smoke contamination in grapevine canopies, based on predictable changes in stomatal conductance patterns obtained from infrared thermal image analysis and machine learning modeling using pattern recognition. A second model was also proposed to quantify levels of smoke-taint-related compounds as targets in berries and wines using near-infrared (NIR) spectroscopy as inputs for machine learning fitting modeling. Results showed that the pattern recognition model to detect smoke contamination from canopies had 96% accuracy. The second model to predict smoke taint compounds in berries and wine fit the NIR data with a correlation coefficient (R) of 0.97 and with no indication of overfitting. These methods can offer grape growers quick, affordable, accurate, non-destructive in-field screening tools to assist vineyard management practices that minimize smoke taint in wines, with in-field applications using smartphones and unmanned aerial systems (UAS).


Introduction
A recent report from the Victorian government of Australia concluded that bushfires have increased in number and severity since the 1970s across the east and south of the country [1]. The main contributing factor to this environmental disaster is climate change, specifically the increased frequency of recurrent heat waves (i.e., prolonged periods of hotter weather) and drought conditions, which have increased the window of risk for bushfires, as well as their number and severity. Recently, Chile (central region), USA (California), Greece, South Africa (Stellenbosch) and Australia (various states) have suffered some of the worst bushfires experienced in each country's history. These countries are major producers of wines, and their grape growers and winemakers are similarly affected by global warming, with detrimental effects including drought, vine phenological changes, the shifting of suitable grapevine growing regions towards the north and south, and increased bushfire events near wine growing regions [2][3][4].
When bushfires occur in close proximity to vineyards, smoke can contaminate leaves and fruit. One of the main physiological effects of bush fire smoke in grapevines is the reduction of stomatal

Physiological Measurements using Leaf Porometry
Leaf conductance to water vapor was measured as stomatal conductance (gs) using a porometer (AP4, Delta-T Devices, Cambridge, UK). Porometer readings used were obtained from the cultivars Shiraz, Sauvignon Blanc, Chardonnay, and Merlot. Measurements were performed two hours after smoke treatments using nine mature, fully expanded sunlit leaves, for each of the two middle vines of two replicates per treatment per cultivar (n = 72) under natural leaf orientation with natural light intensity. Leaves were chosen to ensure measurements were performed on three leaves from the top, middle, and bottom parts of the canopies from each vine in a 3 × 3 matrix arrangement.

Infrared Thermal Imagery of Canopies
Thermal images were acquired from grapevine canopies using an infrared thermal camera FLIR® T-series (Model B360) (FLIR Systems, Portland, OR, USA), with a resolution of 320 × 240 pixels. The camera measures temperature in the range of −20 to +1200 °C. The thermal sensitivity of the camera is <0.08 °C @ +30 °C/80 mK with a spatial resolution of 1.36 milliradians. Each pixel is considered an effective temperature reading in degrees Celsius (°C). Infrared thermal images were acquired from the same side and in parallel to porometer measurements (shaded side of the canopy to reduce variability) in the estimation of the infrared index (Ig), which is proportional to gs [16]. One thermal image from the canopy of each of the middle vines of two replicates per treatment per cultivar was obtained from a constant distance of 2.5 m perpendicular to the row direction (distance between rows being 3 m; Figure 2A). The calculated infrared thermal index (Ig) was compared with porometry measurements acquired immediately after obtaining each thermal image from corresponding vines. All thermal images were acquired on a clear day. The smoke treatments were applied with minimal wind, a requirement for undertaking the field trials that was implemented to avoid the risk of fire spreading from accidental burning of interrow dry plant material and to secure representativeness of thermal images [16,17].
Figure 2. Example of radiometric thermal image processing: (A) data extraction of Tdry (solid circle) and Twet (dotted circle) from reference leaves painted with petroleum jelly and water, respectively; (B) binary image obtained by thresholding with Tdry and Twet; (C) masked radiometric image excluding non-leaf material, such as overheated elements and sky; and (D) subdivision of the thermal image to extract information from sections of the canopy in a 5 × 5 sub-division.

Algorithms Used to Calculate Crop Water Stress Indices (CWSI) and Infrared Index (Ig)
Crop water stress index (CWSI) was calculated using the following equation, after determining Tdry and Twet [18]:
$$\mathrm{CWSI} = \frac{T_{canopy} - T_{wet}}{T_{dry} - T_{wet}} \qquad (1)$$
where Tcanopy is the actual canopy temperature extracted from the thermal image at determined positions, and Tdry and Twet are the reference temperatures (in °C), obtained using the method of painting both sides of reference leaves with petroleum jelly and water, respectively [16]. An infrared index (Ig), proportional to leaf conductance to water vapor transfer (gs), can be obtained using the relationship as follows [19]:
$$I_g = \frac{T_{dry} - T_{canopy}}{T_{canopy} - T_{wet}} = g_s \left( r_{aw} + \frac{s}{\gamma}\, r_{RH} \right) \qquad (2)$$
where raw = boundary layer resistance to water vapor, rRH = the parallel resistance to heat and radiative transfer, γ = psychrometric constant and s = slope of the curve relating saturation vapor pressure to temperature [17,19].
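As an illustration only, the two indices can be sketched in Python as follows. The sketch assumes the commonly used forms CWSI = (Tcanopy − Twet)/(Tdry − Twet) and Ig = (Tdry − Tcanopy)/(Tcanopy − Twet). The function names are hypothetical and the temperatures are made-up values, not data from this study; this is not the customized Matlab code used by the authors.

```python
def cwsi(t_canopy, t_dry, t_wet):
    """Crop water stress index: 0 at the wet reference, 1 at the dry reference."""
    return (t_canopy - t_wet) / (t_dry - t_wet)


def infrared_index(t_canopy, t_dry, t_wet):
    """Infrared index Ig, taken here as proportional to stomatal conductance gs."""
    if t_canopy <= t_wet:
        raise ValueError("t_canopy must be above t_wet for Ig to be defined")
    return (t_dry - t_canopy) / (t_canopy - t_wet)


# Hypothetical temperatures in degrees Celsius, for illustration only.
t_dry, t_wet, t_canopy = 32.0, 24.0, 28.0
print(cwsi(t_canopy, t_dry, t_wet))            # 0.5
print(infrared_index(t_canopy, t_dry, t_wet))  # 1.0
```

Under these assumed forms the two indices carry the same information, since Ig = 1/CWSI − 1, so a canopy closer to the wet reference gives a lower CWSI and a higher Ig.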

Infrared Thermography Data Extraction
The Tdry and Twet values were obtained on a per-image basis using a customized code written in Matlab® R2019a (Mathworks Inc., Natick, MA, USA) to crop the radiometric data from the areas within the reference leaves painted with water (Twet) and petroleum jelly (Tdry) (Figure 2A). To filter non-leaf material from the radiometric image using the determined threshold, a second customized code was written in Matlab® to binarize a masked image (Figure 2B) and to extract these values from the original image (Figure 2C). For automatic extraction of data within a canopy, a pre-defined subdivision of 3 × 3 = 9, 5 × 5 = 25, 7 × 7 = 49 or 10 × 10 = 100 was automatically implemented (Figure 2D; for the case of 5 × 5). From these subdivisions, data were extracted for Tcanopy per image (Figure 2D), Ig (Equation (2)) and CWSI (Equation (1)).
The image sub-divisions (Figure 2D) represent the matrix (A) with m × n (m = rows and n = columns) extraction points, represented as per the following matrix:
$$A = \begin{bmatrix} T_{1,1} & \cdots & T_{1,n} \\ \vdots & \ddots & \vdots \\ T_{m,1} & \cdots & T_{m,n} \end{bmatrix}$$
Since m and n represent the pre-determined subdivision for automatically cropping sections from the infrared thermal image (A), every sub-image is processed for automatic canopy extraction by filtering non-leaf temperatures, using the Twet and Tdry values extracted (Figure 2B) as the minimum and maximum possible temperatures for the canopy, respectively. Each calculated T value then corresponds to the averaged Tcanopy for its sub-division.
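The sub-division and filtering step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' Matlab code: the function name, the list-of-lists image representation and the toy pixel values are all assumptions made for the example, and Twet and Tdry are taken as the lower and upper bounds for leaf temperatures.

```python
def subdivide_canopy(image, t_wet, t_dry, m, n):
    """Return an m x n grid of mean canopy temperatures.

    image: list of rows (lists) of pixel temperatures in degrees Celsius.
    Pixels outside [t_wet, t_dry] are treated as non-leaf material and skipped.
    A cell with no leaf pixels is reported as None.
    """
    rows, cols = len(image), len(image[0])
    grid = []
    for i in range(m):
        r0, r1 = i * rows // m, (i + 1) * rows // m
        grid_row = []
        for j in range(n):
            c0, c1 = j * cols // n, (j + 1) * cols // n
            leaf = [
                image[r][c]
                for r in range(r0, r1)
                for c in range(c0, c1)
                if t_wet <= image[r][c] <= t_dry
            ]
            grid_row.append(sum(leaf) / len(leaf) if leaf else None)
        grid.append(grid_row)
    return grid


# Hypothetical 4 x 4 "image": 45.0 stands in for an overheated non-leaf pixel
# and 10.0 for a cold background pixel such as sky.
image = [
    [26.0, 27.0, 45.0, 29.0],
    [26.0, 27.0, 29.0, 29.0],
    [10.0, 28.0, 30.0, 30.0],
    [28.0, 28.0, 30.0, 45.0],
]
print(subdivide_canopy(image, t_wet=24.0, t_dry=32.0, m=2, n=2))
# [[26.5, 29.0], [28.0, 30.0]]
```

Each entry of the returned grid plays the role of one averaged Tcanopy value, from which Ig and CWSI could then be computed per sub-division.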

Pattern Recognition of Infrared Thermal Imagery using Machine Learning for Smoke Contamination Prediction
Pattern recognition models were developed using a customized Matlab® code, which is able to test 17 different training algorithms in a loop to select the best model: two from Backpropagation with Jacobian derivatives, 11 from Backpropagation with gradient derivatives and four from Supervised weight and bias training functions. This model was constructed using the infrared thermal image output values as inputs to classify the samples into smoked and non-smoked (control). The infrared thermal images were analyzed with the methodology described in Figure 2 to obtain Tcanopy, Ig, and CWSI data using Equations (1) and (2) with sub-divisions of 3 × 3 (n = 27 per image), 5 × 5 (n = 75 per image), 7 × 7 (n = 147 per image) and 10 × 10 (n = 300 per image). All algorithms tested used a random data division. For algorithms such as scaled conjugate gradient, which consist of three stages (training, validation and testing), the data were divided as 60% (n = 28 images) for training, 20% (n = 10 images) for validation with a cross-entropy performance algorithm, and 20% (n = 10 images) for testing with a default derivative function. For algorithms such as sequential order weights and bias, which consist only of training and testing stages, the data were divided as 70% (n = 34) for training and 30% (n = 14) for testing with a cross-entropy performance algorithm. A trimming exercise was conducted using 3, 7 and 10 neurons to select the best model with no signs of overfitting (Figure 3).
Sensors 2019, 18, x FOR PEER REVIEW 6 of 16
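The input counts and data divisions quoted above can be checked with a short Python sketch. It assumes that each sub-division contributes one value each of Tcanopy, Ig and CWSI, and that held-out set sizes are rounded to the nearest image with the remainder assigned to training; both function names are hypothetical, and the rounding rule is an assumption chosen because it reproduces the reported counts for 48 images.

```python
import random


def feature_count(m, n, variables=3):
    """Inputs per image: one value of Tcanopy, Ig and CWSI per sub-division."""
    return m * n * variables


def split_indices(n_samples, held_out_fractions, seed=0):
    """Randomly split sample indices into a training set plus held-out sets.

    Held-out sizes are rounded to the nearest integer and the remainder goes
    to training (an assumption that reproduces the counts in the text).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    sizes = [round(f * n_samples) for f in held_out_fractions]
    n_train = n_samples - sum(sizes)
    parts = [indices[:n_train]]
    start = n_train
    for size in sizes:
        parts.append(indices[start:start + size])
        start += size
    return parts


print([feature_count(k, k) for k in (3, 5, 7, 10)])  # [27, 75, 147, 300]

# Three-stage algorithms: training / validation / testing.
train_idx, val_idx, test_idx = split_indices(48, [0.2, 0.2])
print(len(train_idx), len(val_idx), len(test_idx))  # 28 10 10

# Two-stage algorithms: training / testing.
train_idx, test_idx = split_indices(48, [0.3])
print(len(train_idx), len(test_idx))  # 34 14
```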

Berry Near Infrared (NIR) Spectroscopy Measurements
Full berries were scanned using a spectrophotometer (ASD FieldSpec® 3, Analytical Spectral Devices, Boulder, CO, USA) equipped with the ASD contact probe, built for contact measurements, attached by fiber optic cable to the instrument. A total of 112 berries collected at harvest from seven cultivars (16 berries per cultivar) were scanned by putting the probe's lens in contact with the berries, and a total of 401 spectra were recorded for each berry. The instrument records spectra with a resolution of 1.4 nm for the region 350-1000 nm and 2 nm for the region 1000-1850 nm. The instrument was used in reflectance mode and data were then transformed into absorbance values (absorbance = log (1/reflectance)). A reference tile (Spectralon®, Analytical Spectral Devices, Boulder, CO, USA) was used as a white reference for scatter correction. A new reference was taken every ten spectra acquisitions.
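The reflectance-to-absorbance transformation can be written as a one-line conversion. The sketch below assumes a base-10 logarithm, which is the usual convention for absorbance but is not stated explicitly in the text; the function name and the reflectance values are hypothetical.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Convert reflectance values (0 < R <= 1) to absorbance, A = log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


# Hypothetical reflectance values relative to the white reference.
print(reflectance_to_absorbance([1.0, 0.5, 0.1]))
```

A reflectance equal to that of the white reference (R = 1) maps to zero absorbance, and lower reflectance maps to higher absorbance.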

Winemaking and Chemical Analysis of Berries and Wine
Small scale winemaking of control and smoke-affected fruit from this trial has been described previously in detail by Ristic et al. [6]. Guaiacol glycoconjugates were measured in fruit and wine by HPLC-MS/MS using a stable isotope dilution analysis (SIDA) method developed by Dungey et al. (2011) [12]. Volatile phenols, including guaiacol, were determined in berries and wine by the Australian Wine Research Institute's (AWRI) Commercial Services Laboratory (Adelaide, Australia). Volatile phenols were measured by GC-MS according to SIDA methods reported previously [13].

Fitting Modeling of Near-Infrared (NIR) Spectroscopy of Berries Using Machine Learning Modeling to Predict Smoke Taint in Berries and Wine
A regression model was developed using a customized Matlab® code, which is able to test 17 different training algorithms in a loop to select the best model: two from Backpropagation with Jacobian derivatives, 11 from Backpropagation with gradient derivatives and four from Supervised weight and bias training functions. NIR absorbance values within the 700-1100 nm wavelength range, after a second derivative transformation, were used as inputs in the machine learning algorithms, since that range corresponds to alcohol and alcohol-based compounds, to predict (i) guaiacol glycoconjugates in berries (µg kg−1), (ii) guaiacol glycoconjugates in wines (µg L−1) and (iii) guaiacol in wine (µg L−1). All algorithms tested used a random data division. For the algorithms that consist of three stages (training, validation and testing), the data were divided as 60% (n = 28) for training, 20% (n = 10) for validation with a mean squared error (MSE) performance algorithm and 20% (n = 10) for testing with a default derivative function (data not shown). For algorithms such as sequential order weights and bias, which consist only of training and testing stages, the data were divided as 70% (n = 34) for training and 30% (n = 14) for testing with a mean squared error (MSE) performance algorithm.
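The pre-processing of the spectra (selecting the 700-1100 nm band and applying a second derivative) can be sketched as follows. The text does not state which derivative method was used, so this example assumes simple central finite differences on evenly spaced wavelengths; smoothing derivatives such as Savitzky-Golay are a common alternative. The function name and the toy spectrum are hypothetical.

```python
def second_derivative(wavelengths, absorbance, lo=700.0, hi=1100.0):
    """Second derivative of absorbance within [lo, hi] nm.

    Uses central finite differences on evenly spaced wavelengths; the end
    points of the selected band are dropped because they lack a neighbour.
    Returns a list of (wavelength, second_derivative) pairs.
    """
    band = [(w, a) for w, a in zip(wavelengths, absorbance) if lo <= w <= hi]
    if len(band) < 3:
        raise ValueError("need at least three points in the band")
    step = band[1][0] - band[0][0]
    return [
        (band[i][0], (band[i - 1][1] - 2 * band[i][1] + band[i + 1][1]) / step**2)
        for i in range(1, len(band) - 1)
    ]


# Hypothetical spectrum: a parabola, whose second derivative is constant (2e-6).
wavelengths = [float(w) for w in range(650, 1151, 50)]
absorbance = [1e-6 * (w - 900.0) ** 2 for w in wavelengths]
for w, d2 in second_derivative(wavelengths, absorbance):
    print(w, d2)
```

The parabola is used only because its second derivative is known analytically, which makes the output easy to check by eye.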

Statistical Analysis
Data from chemometry and morphometry of berries, wine compounds, and Ig and gs were analyzed through ANOVA using SAS® 9.4 software (SAS Institute Inc., Cary, NC, USA) with Tukey's studentized range test (HSD; p < 0.05) as post-hoc analysis for multiple comparisons to assess significant differences. Statistical data such as means and standard deviation (SD) were obtained from the replicates of each cultivar and treatment.

Experiment 1
3.1.1. Grapevine Physiological Data Relationships between Porometry and Infrared Thermal Imagery
Table 1 shows the mean values for gs and Ig with respective standard deviations (SD) for the four cultivars from Experiment 1. The general trend for the mean values of the control treatments follows a positive linear relationship (R² = 0.99; Ig = 0.0027 gs). In contrast, the mean values of the smoke treatments had a much weaker, but still positive, linear relationship (R² = 0.23; Ig = 0.0023 gs; data not shown). In the control samples, the mean Ig values per cultivar did not show significant change, as reflected by the SD values, but Merlot showed a significantly higher Ig (p < 0.05). This trend was similar for the mean gs values, with Merlot showing the highest mean (p < 0.05). The mean Ig values for smoked samples were more variable, while gs showed higher mean values, except for Sauvignon Blanc, and more variability, as reflected by the higher SD values compared to control. The Ig mean values for both treatments were not very sensitive, as seen in Table 1. Figure 5 shows the relationships between gs and Ig for different sections of canopies (top, middle, and bottom) of the grapevines monitored for both non-smoked (control) and smoked treatments. The graph (Figure 5A) shows a strong and significant linear relationship between gs and Ig (R² = 0.85; Ig = 0.0026 gs). However, there was no relationship observed for smoke treatments, with the data presenting high variability, which is consistent with results shown in Table 1. Figure 5A shows that regardless of the measurement position within the canopy for Ig, there is a broader distribution of values between top, middle, and bottom of the canopy along the linear relationship found. In contrast, Figure 5B shows that the bottom readings for gs are more concentrated towards the lower values (<200 mmol m−2 s−1). Furthermore, the Ig values become less sensitive (spread between 0 and 1). The same pattern can be seen for most of the top readings, with the middle readings having a wider spread.
Means followed by different superscript letters are significantly different between treatments based on Tukey's studentized range test (HSD, p < 0.05).
Table 2 shows the results of the pattern recognition modeling for the data extracted from infrared thermal images from the canopies of four different cultivars combined for Experiment 1. For the 3 × 3 sub-division, with Tcanopy, Ig, and CWSI used as inputs and the classification of smoked and non-smoked as target, the best performing algorithm was the scaled conjugate gradient algorithm. The training, validation, and testing procedures (using 10 neurons) resulted in an overall model with 94% accuracy. In the case of the data extracted using a 5 × 5 sub-division, the overall best model (sequential order weight and bias) resulted in an accuracy of 88% (using 10 neurons) in the classification of smoked and non-smoked canopies. For the 7 × 7 sub-division, the best algorithm (also the sequential order weight and bias) resulted in an accuracy of 94% (using 7 neurons) in the classification. Finally, the 10 × 10 sub-division gave the best performing model overall (sequential order weight and bias), with an accuracy of 96% (using 3 neurons). Furthermore, the performance value (cross-entropy) for training was lower than that for testing, and testing accuracy was close to that from the training stage, both of which are evidence of no overfitting [20,21].
Table 2. Best pattern recognition model developed for each set of inputs showing the best training algorithm and number of neurons to predict whether canopies are smoked or non-smoked (control). Inputs correspond to data extracted from infrared thermal images for Tcanopy, Ig and crop water stress index (CWSI) in matrix arrangements of 3 × 3 (n = 27), 5 × 5 (n = 75), 7 × 7 (n = 147) and 10 × 10 (n = 300) data points per thermal image. Performance reported is based on cross-entropy.
Figure 6 shows the Receiver Operating Characteristic (ROC) for the best performing model found to predict smoke contamination in grapevine canopies (10 × 10 sub-division; Table 2). The figure shows that results for both smoke and control pattern recognition using infrared thermography data as inputs follow a similar trend towards the True Positive Rate axis of the graph.
Table 3 shows the average data of morphometric and chemometric measurements obtained from berry samples for all seven cultivars for Experiment 2. Even though there are some significant differences between morphometric measurements of berries for the different cultivars when comparing smoked and non-smoked (control) treatments, they do not affect the results and models developed.

Smoke-Related Compounds Found in Berries and Wines
Data for smoke-related compounds have been previously reported by Ristic et al. (2016) [6], and comprised volatiles with statistically significant differences between control (non-smoked) and smoked treatments for all seven cultivars included in Experiment 2. Specifically, for modeling purposes, guaiacol glycoconjugates found in berries (µg kg−1), guaiacol glycoconjugates found in wines (µg L−1) and guaiacol found in wines (µg L−1) were used, since these are the primary compounds identified by the industry as contributing to smoke taint. In berries, the average guaiacol glycoconjugate concentration ranged between 37 and 602 µg kg−1 for control and from 253 to 2452 µg kg−1 for smoke-affected treatments. The guaiacol glycoconjugate concentrations in wines ranged from 8 to 334 µg L−1 for control and from 111 to 1480 µg L−1 for smoke-affected treatments. In the case of guaiacol concentration in wines, values ranged from 0 (not detected) to 9 µg L−1 for control and from 0 (not detected) to 26 µg L−1 for smoke-affected treatments [6].
Figure 7 shows the average spectra for berries from smoked and non-smoked (control) treatments for red (Figure 7A) and white cultivars (Figure 7B). There were no significant differences in the averaged spectra between smoked and non-smoked berries for red cultivars. In contrast, there appears to be a consistent difference of around 0.05 in absorbance for white cultivars, especially from 820 to 1100 nm, within the range considered for this study.

Machine Learning Modeling Based on NIR Spectra to Estimate Smoke Taint Compounds in Berries and Wine
Table 4 shows the best machine learning regression model obtained for the NIR data from berries (700-1100 nm using the second derivative transformation; Sequential Order Weights and Bias) as inputs and smoke taint compounds measured in berries and wine as targets. The correlation between the estimated and observed values was R = 0.97, with slope b = 0.93 (close to unity). The same correlations and similar slopes were found for the training and the test stages. The overall model can also be seen in Figure 8, in which most of the point cloud falls close to the 1:1 line representing the accuracy of predicted versus observed data. Based on the 95% confidence bounds, the overall model had 3.6% of outliers. The performance value for training was lower than that for testing, and testing accuracy was the same as that from the training stage, both of which are evidence of no overfitting [20,21].
Figure 8. Overall fitting model using machine learning (Sequential Order Weights and Bias) using NIR spectra (700-1100 nm; second derivative transformation) of berries from seven grapevine cultivars as inputs and main smoke taint compounds found in berries and wine as targets.
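For readers who wish to reproduce this kind of summary on their own predictions, the correlation coefficient (R) and slope (b) of predicted versus observed values can be computed as in the sketch below. The function name is hypothetical and the numbers are invented for illustration; they are not data from this study, and the slope is assumed to be the ordinary least-squares slope of predicted on observed values.

```python
import math


def fit_summary(observed, predicted):
    """Pearson correlation R and least-squares slope b of predicted vs observed."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r = s_op / math.sqrt(s_oo * s_pp)
    slope = s_op / s_oo
    return r, slope


# Hypothetical observed/predicted concentrations, not data from the study.
observed = [10.0, 50.0, 120.0, 300.0, 600.0]
predicted = [15.0, 45.0, 130.0, 280.0, 570.0]
r, b = fit_summary(observed, predicted)
print(round(r, 3), round(b, 3))
```

A slope close to unity together with a high R indicates that predictions lie near the 1:1 line rather than being merely correlated with the observations.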

Physiological Changes within Grapevine Canopies Due to Smoke Contamination
The relationship between the Ig thermal index and gs is linear for non-smoked vines, as shown in Table 1 and Figure 5A. These results are consistent with other studies showing the same relationship for grapevines [16,17], coffee plants [22] and olive trees [23], all of which have tree-like or bushy canopies. However, this relationship was not observed for smoked canopies of the four cultivars from Experiment 1 (Figure 5B). Smoke acts as an external signal to the plant. It is composed mainly of CO, CO₂ and other gases, which acidify the sub-stomatal cavity through the production of carbonic acid (H₂CO₃) when combined with water; the resulting pH reduction causes partial or complete stomatal closure [5]. This effect could explain the increased variability in gs among individual leaves that was detected in the porometry data (Table 1 and Figure 5B). The Ig data reported from the whole infrared thermal images (Table 1) did not show significant differences in variability, which can be explained by the fact that whole-image means are not representative of this type of high-resolution information.
It is important to note that the comparison between gs and Ig in Figure 5 was made using the methodology proposed in Figure 2 with a 3 × 3 sub-division. Since every image was taken from a distance of 2.5 m, the field of view of the infrared thermal images was around 140 × 110 cm of the canopy, which divided into nine cells gives a sub-area of 47 × 37 cm (area = 1739 cm²). Considering that the area of an average leaf (data not shown) is around 50-80 cm² [24], each Ig value represents the average over an area approximately 25 times that of the single leaves on which porometry was conducted. This difference may explain the lower sensitivity of Ig to gs, especially for smoked canopies, in which higher gs variability is expected even at the leaf level (patchy stomatal behavior).
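The sub-area arithmetic above can be checked with a short sketch. The function name `sub_area_cm2` is hypothetical, and the 70 cm² leaf area is an illustrative value within the 50-80 cm² range quoted in the text. Without rounding the cell sides to 47 × 37 cm, the 3 × 3 cell area comes out at about 1711 cm², slightly below the 1739 cm² obtained from the rounded sides, and the ratio to a 70 cm² leaf is about 24, in line with the approximately 25-fold figure.

```python
def sub_area_cm2(fov_width_cm, fov_height_cm, n):
    """Area (cm^2) of one cell when the field of view is split into n x n cells."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (fov_width_cm / n) * (fov_height_cm / n)

FOV_WIDTH_CM, FOV_HEIGHT_CM = 140, 110  # approximate field of view at 2.5 m
LEAF_AREA_CM2 = 70                      # illustrative value (assumption)

area = sub_area_cm2(FOV_WIDTH_CM, FOV_HEIGHT_CM, 3)
ratio = area / LEAF_AREA_CM2
print(f"3 x 3 cell: {area:.0f} cm^2, about {ratio:.0f} leaf areas")
```

The same function can be called with a larger `n` to see how quickly the cell area approaches the area of a single leaf as the grid becomes finer.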
The extraction of Ig values from infrared thermal images requires Tdry and Twet reference temperatures. In this study, the painted leaves method was implemented to determine more accurately the reference temperature thresholds used to separate leaf from non-leaf material in the analysis. However, this method is manual and hinders automation. Alternatively, the leaf energy balance method could be implemented using micrometeorological data such as temperature, relative humidity, and solar radiation to calculate Tdry and Twet on-the-go while the infrared thermal images are acquired. Affordable sensors to measure these micrometeorological variables are now commonly available, together with dataloggers or Internet of Things (IoT) connectivity for data transmission and processing. Previous research has shown that these reference temperatures can be calculated with high accuracy (R² = 0.95; RMSE = 0.85; p < 0.001) [16]. Furthermore, infrared thermal images need to be explored in more depth at higher subdivisions, using machine learning modeling to assess the pattern variability and use it as a predictor of smoke contamination levels.
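To make the role of the two reference temperatures concrete, the following minimal sketch computes the thermal index for a single temperature reading. It assumes the commonly used form Ig = (Tdry − Tcanopy)/(Tcanopy − Twet), which is not restated in this section, and it treats readings outside the Twet to Tdry window as non-leaf material. The function name `thermal_index_ig` and all temperatures are hypothetical.

```python
def thermal_index_ig(t_canopy, t_dry, t_wet):
    """Ig = (Tdry - Tcanopy) / (Tcanopy - Twet); assumed form of the index.

    Returns None for temperatures outside the (Twet, Tdry] window, which
    this sketch treats as non-leaf material (an assumption).
    """
    if not (t_wet < t_canopy <= t_dry):
        return None
    return (t_dry - t_canopy) / (t_canopy - t_wet)

# Hypothetical reference and canopy temperatures in degrees Celsius
T_DRY, T_WET = 34.0, 24.0
for t in (26.0, 29.0, 33.0, 36.0):
    print(t, thermal_index_ig(t, T_DRY, T_WET))
```

In this form, cooler leaves (closer to Twet) give larger index values, and leaves at Tdry give zero, so the quality of the two references directly bounds the quality of every Ig value extracted from an image.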

Pattern Recognition of Smoke Contamination Using Machine Learning Modeling
Considering the sub-division of the infrared thermography data, the field of view of the canopies and the size of single leaves in this study, it is not surprising that the best pattern recognition model (96% accuracy) using machine learning (Sequential Order Weights and Bias) was obtained with the 10 × 10 subdivision. This sub-division renders comparison areas within the canopy of 154 cm², only 2.2 times the area of a single leaf (70 cm²). Furthermore, from the neuron trimming analysis, a highly accurate model for the classification of smoked and non-smoked canopies was obtained with three neurons, which makes the model more efficient and less susceptible to overfitting. The latter is also supported by the performance value obtained by this model. The results shown in this paper from pattern recognition modeling using machine learning to assess smoke contamination of canopies have excellent potential for use in short and mid-range remote sensing based on Unmanned Aerial Vehicle (UAV) platforms. From Figure 5B, it can be seen that the main variability within gs values is in the bottom and top parts of the canopies, which supports obtaining infrared thermal imagery using UAVs at a 0° nadir angle. Furthermore, the models developed in this study should be tested using UAVs with infrared cameras that can render a 15 × 15-pixel resolution, corresponding to an area of 225 cm², which is close to the 154 cm² area used for machine learning modeling here.
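The sub-division step described above can be sketched as follows. This is only an illustration of splitting an image into n × n cells and summarizing each cell; the use of the cell mean as the summary, the function name `grid_means`, and the small example image are assumptions, not the exact procedure used in this study.

```python
def grid_means(image, n):
    """Split a 2-D list of pixel values into n x n cells and return the
    mean of each cell as a flat, row-major feature vector of length n * n."""
    rows, cols = len(image), len(image[0])
    if n < 1 or rows < n or cols < n:
        raise ValueError("image must have at least n rows and n columns")
    features = []
    for i in range(n):
        # Integer boundaries so that the cells cover every row exactly once
        r0, r1 = i * rows // n, (i + 1) * rows // n
        for j in range(n):
            c0, c1 = j * cols // n, (j + 1) * cols // n
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell))
    return features

# Hypothetical 4 x 4 "image" of index values split into 2 x 2 cells
image = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
print(grid_means(image, 2))  # prints [1.0, 2.0, 3.0, 4.0]
```

With n = 10, the function returns 100 values per image, which could serve as the input vector for a pattern recognition model such as the one discussed here.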
This kind of remote sensing tool can render spatial distribution maps of contaminated areas within vineyards, which could help growers apply the differential management strategies discussed earlier to mitigate smoke contamination of the fruit. Spatial maps of smoke contamination can also support differential harvests to avoid mixing unaffected fruit with smoke-tainted fruit. Hence, a system using these methods is proposed, as depicted in Figure 9, for proximal and mid-distance remote sensing using infrared cameras and UAV platforms. For proximal remote sensing, the algorithms developed in this study can be implemented on smartphones as applications (Apps) connected to portable and affordable infrared thermal cameras (e.g., FLIR One®, FLIR Systems, Portland, OR, USA) and near-infrared spectroscopy devices (e.g., Lighting Passport®, AsenseTek, Taipei, Taiwan).

Figure 9. Diagram showing the implementation of machine learning modeling strategies proposed in this paper for proximal (using smartphones and portable infrared thermal cameras and NIR devices) and mid-distance remote sensing using unmanned aerial system (UAS) platforms.

Near-Infrared (NIR) Spectroscopy of Berries
Since NIR spectra were obtained from whole berries, the tool proposed in this paper is non-destructive. Furthermore, it has been shown that, after contamination, the concentration of smoke-related compounds is higher in the skin of berries than in the pulp or the seeds [12]. The range of 700-1100 nm was chosen since most of the NIR instrumentation available in this range is affordable for growers compared to the instrument used in this study, which can cost around 45 times more. The band at 982 nm is associated with the OH overtone and the band at 1100 nm with the CH bands, which correspond to alcohol and phenolic compounds [25].
The model reported here, based on machine learning fitting algorithms, can be of great assistance to growers and winemakers in obtaining chemometric data in real time using the proposed methodology shown in Figure 9. Currently, growers do not have sophisticated tools to assess potential smoke contamination of berries, bunches and wines. The only option available is the collection of samples within a vineyard for compositional analysis by an accredited laboratory using GC-MS or HPLC-MS/MS. This process is destructive, expensive, and slow, which makes it poorly suited to the implementation of mitigation strategies and/or decision making before harvest. Furthermore, the proposed tools may help minimize smoke taint by providing a spatial assessment of the contamination, through either canopies or berries, to inform decisions regarding palliative measures (as presented in this paper) or differential harvest.
The models developed in this study were able to predict smoke contamination in canopies, berries and wines, regardless of the cultivar. Hence, the models could be applied as a universal methodology. Data from further studies could be added to the models to include more cultivars, although the seven cultivars included in this study are among the most commercially important in Australia. Finally, it is important to note that the levels of smoke-taint compounds present in wine are in part related to the winemaking process (e.g., duration of skin contact time during fermentation); hence, this model will need to be adjusted for different winemaking techniques, which can influence the extraction of smoke-related compounds from the berry.

Conclusions
This paper presented two main advances in tools to detect smoke contamination in grapevine canopies and smoke-related compounds in berries and wine using remote sensing techniques. This study is the first to apply machine learning modeling techniques to assist growers confronted with vineyard exposure to smoke from bushfires, an issue that has been exacerbated in prominent wine regions around the world due to climate change. Furthermore, this paper has proposed an affordable method to implement these novel techniques using smartphones, portable thermal imagery and NIR spectroscopy devices. More research is required to assess the use of these affordable devices with the proposed models.
Author Contributions: S.F. conceived the machine learning modeling idea and practical applications; S.F., E.J.T. and C.G.V. analyzed the data and created the machine learning models; K.W., S.T. and S.F. were awarded funding for the study; K.W. and R.R. performed field trials, laboratory analysis and winemaking; S.F. and R.D.B. acquired the physiological and NIR data. All authors contributed to the writing of the paper.
Funding: This research received no external funding.