Dual Activation Function-Based Extreme Learning Machine (ELM) for Estimating Grapevine Berry Yield and Quality

Reliable assessment of grapevine productivity is a destructive and time-consuming process. In addition, the mixed effects of grapevine water status and scion-rootstock interactions on grapevine productivity are not always linear. Despite the potential of remote sensing and machine learning techniques for predicting plant traits, previously studied techniques for vine productivity remain limited because the complexity of the system has not been adequately modeled. During the 2014 and 2015 growing seasons, hyperspectral reflectance spectra were collected using a handheld spectroradiometer in a vineyard designed to investigate the effects of irrigation level (0%, 50%, and 100%) and rootstocks (1103 Paulsen, 3309 Couderc, SO4 and Chambourcin) on vine productivity. To assess vine productivity, it is necessary to measure factors related to fruit ripeness and not just yield, as an overcropped vine may produce high-yield but poor-quality fruit. Therefore, yield, Total Soluble Solids (TSS), Titratable Acidity (TA) and the ratio TSS/TA (maturation index, IMAD) were measured. A total of 20 vegetation indices were calculated from hyperspectral data and used as input for predictive model calibration. The prediction performance of linear/nonlinear multiple regression methods and the Weighted Regularized Extreme Learning Machine (WRELM) was compared with that of our newly developed WRELM-TanhRe. The developed method is based on two activation functions: hyperbolic tangent (Tanh) and rectified linear unit (ReLU). The results revealed that WRELM and WRELM-TanhRe outperformed the widely used multiple regression methods when model performance was tested with an independent validation dataset. WRELM-TanhRe produced the highest prediction accuracy for all the berry yield and quality parameters (R² of 0.522–0.682 and RMSE of 2–15%), except for TA, which was predicted best with WRELM (R² of 0.545 and RMSE of 6%). The results demonstrate the value of combining hyperspectral remote sensing and machine learning methods for improving berry yield and quality prediction.


Introduction
Cultivating or phenotyping highly productive grapevine cultivars could help minimize the effects of climate change, but new cultivars may differ in flavor profile, take a long time to develop, and prove susceptible to other biotic or abiotic stressors. Grafting is a common alternative for imparting stress resistance while maintaining fruit characteristics: growers combine selected rootstocks (the roots and lower stem onto which another variety is grafted) with scions (the above-ground part of the plant, which produces the stem, leaves, flowers, and berries) to increase the performance of grapevines [1–3]. Therefore, grafting and the development of new rootstock genotypes have become common practices in modern viticulture [4].
Rootstock-scion interactions, through the impact of rootstocks on gas exchange and water use of scions, play a critical role in determining berry yield and quality [5]. Selection of the most productive rootstock-scion combinations may become even more complicated when different or variable vine water availability is considered. This selection process depends not only on the amount of berry yield, but also on berry quality-related parameters, such as Total Soluble Solids (TSS, °Brix), Titratable Acidity (TA, g tartaric acid L⁻¹) and the ratio TSS/TA (maturation index, IMAD). These factors were included because commercial fruit harvest is decided when the fruit meets basic chemical requirements (i.e., a minimum TSS to produce enough alcohol during fermentation, and a TA that has fallen during ripening to a palatable level). Additionally, these factors relate to the overall health and photosynthetic production of a vine, as a productive and healthy vine is able not only to produce a larger crop but also to bear higher quality fruit [6]. Because the best rootstock-scion combination is determined by several berry yield and quality related factors, it is inefficient to employ conventional methods of yield and quality measurement, which are usually expensive, destructive, laborious and time-consuming [7]. It is, therefore, critical to have methods that are rapid, non-destructive, accurate and low-cost.
Applications of hyperspectral remote sensing in the determination of plant health and estimation of crop yield have been rapidly expanding [8–13]. However, studies exploring the potential of hyperspectral data in fruit quality estimation are lacking, because most authors were primarily seeking to optimize irrigation scheduling or plant health. In recent years, various hyperspectral indices have been found to be useful for fruit yield and quality estimation. Martín et al. [14] and Meggio et al. [15] demonstrated the potential use of pigment-based reflectance indices to estimate berry quality in vineyards affected by iron deficiency. Serrano et al. [16] suggested the suitability of the water index (WI) to predict the berry quality of grapevines grown in rainfed conditions. In addition, the photochemical reflectance index (PRI) [8], an indicator of the epoxidation state of the xanthophyll cycle pigments and of non-photochemical quenching (NPQ), was found to be related to fruit quality parameters in citrus and pear orchards [17,18]. There is no consensus on the effectiveness of a single index. Furthermore, the contributions of combined spectral indices to vine productivity estimation have not been explored in a complex field environment where both different irrigation treatments and scion-rootstock combinations have been implemented.
In addition to single index-based methods, there exist multiple regression approaches that include more than one spectral index or wavelength as explanatory variables to improve the estimation of plant physiology, water status, yield, and quality. Multiple linear regression (MLR), an extension of simple linear regression, generated better results than traditional spectral index or single wavelength-based methods [11,19]. However, MLR suffers from the multi-collinearity inherent in hyperspectral datasets [20]. Alternatively, partial least squares regression (PLSR) has proved to be an effective method in various applications by reducing a large number of collinear explanatory variables to a few non-correlated latent variables [11,21–23]. Nevertheless, grapevine berry yield and quality are determined by complex interactions of many factors, and the relationships between grapevine and remote sensing data may not always be linear; linear statistical methods fail to approximate the non-linear relationships within the data. Thus, machine learning methods based on artificial neural networks (ANNs) and random forest regression (RFR) have been utilized to capture both linear and non-linear relationships between remote sensing and vegetative parameters [24–26]. Yuan et al. [27] and Zhu et al. [28] demonstrated that RFR was superior to ANNs in leaf area index (LAI) prediction due to its suitability for a relatively small number of training samples and its insensitivity to noisy data [29]. Recently, Pôças et al. [30] demonstrated the power of machine learning methods to support irrigation scheduling in vineyards using data from a handheld spectrometer. Hence, it is worth exploring the potential of machine learning methods for estimating grapevine productivity using hyperspectral data.
Among machine learning methods, Extreme Learning Machine (ELM; [31]) and its variants with different activation functions have been successfully applied to a variety of research fields [32–37]. The activation function is the nonlinear transformation of the weighted input signals and bias [38]. ELM exhibits good generalization with commonly used activation functions [39]. Maimaitijiang et al. [40] used ELM for the first time in soybean phenotypic trait estimation from fused aerial images and found that ELM was better able to handle complex data than conventional regression methods. Rocha Neto et al. [41] also reported that ELM performed better than ANNs in estimating soil electric conductivity using hyperspectral data. Generally, ELM has been found superior to other machine learning and conventional regression methods because it is easy to implement, learns quickly and generalizes well [31,39]. However, its application to assessing vine productivity through hyperspectral data has not been explored.
Non-destructive estimation of pre-harvest fruit yield and quality of perennial crops such as grapevines is a challenging domain. To the best of our knowledge, this is the first attempt to apply hyperspectral remote sensing and machine learning to berry yield and quality estimation from standing plants. The use of novel machine learning techniques in remote sensing has become a promising avenue, and combining the two can be valuable for improving berry yield and quality estimation. Within this context, the main objective of this study is to develop robust yield and fruit quality prediction models using canopy-level hyperspectral data for grapevines grown under different irrigation treatments and rootstock conditions. To accomplish this goal, we: (i) developed berry yield and quality prediction models with MLR, PLSR, RFR and WRELM using vegetation indices derived from canopy spectra; (ii) proposed a new activation function that fuses the hyperbolic tangent (Tanh) function and the Rectified Linear Unit (ReLU) for Weighted Regularized ELM (WRELM-TanhRe); (iii) conducted a comparative analysis between prediction models developed with existing methods and our newly proposed method; (iv) evaluated the relative importance of the vegetation indices to berry yield and quality estimation; and (v) discussed model scalability and transferability.

Study Site
The experiment was carried out in an experimental vineyard located in Mount Vernon, Missouri, USA (37°4′27.17″N, 93°52′46.70″W), at 376 m above mean sea level, during 2014 and 2015 (Figure 1). The vineyard also hosted additional studies, one of which was a multi-year evaluation of rootstock and irrigation impacts on berry and wine quality. The vineyard has a continental climate, in which rainfall occurs primarily at the start of the growing season, with an average annual temperature of 15.6 °C and mean annual rainfall of 1066.8 mm. Soil texture is a combination of sandy loam, silt loam, and loam, with an average pH of 6. The vineyard was 120 × 75 m in dimension and was planted in 2009 with Chambourcin vines, either own-rooted or grafted on 1103 Paulsen, 3309 Couderc and SO4. Vine density was 504 vines ha⁻¹ with 3 m row spacing and 3 m vine spacing, including 25 rows and 1034 vines in total. Vines were trained on a high wire cordon trellis and spur pruned. Grass was sown between the rows to avoid soil erosion, but a weed-free strip was kept below the vines. At establishment, six irrigation zones were installed, allowing for randomization of plots across the four rootstocks and three irrigation treatments. The irrigation treatments were: (i) non-irrigated (NIR), (ii) full replacement of evapotranspiration (FIR), and (iii) irrigated at 50% of potential evapotranspiration (INT). The irrigation treatments were applied on 9 rows for three consecutive years (2013–2015), with data collection in this study occurring only in 2014 and 2015. Each treated row consisted of 8 plots, each containing 4 adjacent vines with the same rootstock, giving 72 measurement plots in the vineyard (9 irrigation treatment rows with 8 plots each). Figure 2 presents the vineyard weather conditions. Prior to 2013, all vines received irrigation to ensure equal establishment. FIR and INT rows were irrigated using a drip irrigation system with a flow rate of 604.15 L h⁻¹ per treatment. To maintain the different treatments throughout the growing season, both the timing and the amount of water were determined based on evapotranspiration (ET) calculated using weather data from a nearby weather station situated 270 m north of the vineyard.
In the vineyard, the maximum canopy height was 2.2 m and the width of the canopy ranged from 0.5 to 1.3 m. On the two field measurement dates, average leaf area index (LAI) values were 1.5 and 1.3, respectively. LAI was determined using an LAI-2200C Plant Canopy Analyzer (LI-COR Inc., Lincoln, NE, USA).
The average air temperature was 22.8 °C for both the 2014 and 2015 growing seasons. The average rainfall for the 2014 season was 423 mm, which was 6 mm higher than that of the 2015 season (417 mm). Note that in 2014 most of the rainfall events occurred before and after the irrigation treatments were applied, while in 2015 frequent rainfall events occurred before irrigation treatment initiation.

Field Data Collection
The field measurements were taken during the late veraison stage (19 August, DOY 231) in 2014 and the fruit ripening stage (21 September, DOY 264) in 2015. These growth stages were chosen following previous studies on estimating berry yield and quality from remote sensing data [16,43,44]. The dates were determined based on the number of no-rain days after irrigation treatment initiation, which started later in 2015 than in the prior year because of frequent early-season rain.

Field Spectroscopy Measurements
Reflectance measurements were performed between 11:00 am and 2:00 pm local time using a high-resolution full-range portable spectroradiometer, the PSR-3500 (Spectral Revolution, Inc., Lawrence, MA, USA). Measurements were taken under clear-sky conditions to minimize disturbances from changes in sun angle and canopy shadow. The spectroradiometer has a spectral range of 350–2500 nm with a resolution of 3.5 nm in the 350–1000 nm range, 10 nm in the 1000–1900 nm range, and 7 nm in the 1900–2500 nm range. Top-of-canopy radiance was recorded from an elevated platform using a 1.2 m long fiber optic with a 25° circular field of view (FOV) attached to the spectroradiometer. The fiber optic head was held in a nadir orientation with a pistol grip above the canopy at an average distance of 0.3 m. This resulted in an acquisition footprint of about 0.17 m. Care was taken to ensure that the FOV of the spectrometer covered the grapevine canopy, reducing background effects (e.g., soil). Canopy reflectance was calculated as the ratio between top-of-canopy radiance and incident irradiance, which was measured over a 99% reflectance Spectralon calibration panel (Labsphere, Inc., North Sutton, NH, USA) before each target measurement. The four vines within each plot were measured 3–4 times for top-of-canopy radiance and the readings were averaged per plot. This gave 72 plot-level spectral measurements per season, with 6 spectral measurements for each "class" (combination of the 3 irrigation and 4 rootstock treatments). The spectroradiometer was configured to average 40 readings automatically per sample, and the raw spectra were interpolated to 1 nm, resulting in 2151 individual spectral bands. For further analysis, we focus only on the visible and near-infrared (VNIR, 400–1100 nm) region due to its high signal-to-noise ratio and its ready availability from common handheld spectroradiometers, as well as from satellite sensors and unmanned aerial vehicles (UAVs) [30,42,45].
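The spectral preprocessing described above (radiance ratio, interpolation to 1 nm, plot-level averaging, and VNIR subsetting) can be illustrated with a minimal Python sketch. The function names, the synthetic radiance values, and the optional `panel_reflectance` scaling are illustrative assumptions and are not taken from the original processing chain; with the default of 1.0 the function reduces to the plain ratio described in the text.

```python
import numpy as np

def canopy_reflectance(target_radiance, panel_radiance, panel_reflectance=1.0):
    """Ratio of canopy radiance to reference-panel radiance.

    panel_reflectance is an optional correction factor (assumption);
    1.0 gives the plain ratio.
    """
    target = np.asarray(target_radiance, dtype=float)
    panel = np.asarray(panel_radiance, dtype=float)
    return panel_reflectance * target / panel

def resample_to_1nm(wavelengths_nm, spectrum, start_nm=350, stop_nm=2500):
    """Linearly interpolate a spectrum onto a 1 nm grid (350-2500 nm by default)."""
    grid = np.arange(start_nm, stop_nm + 1)
    return grid, np.interp(grid, wavelengths_nm, spectrum)

def mean_vnir_spectrum(scans_1nm, grid, lo_nm=400, hi_nm=1100):
    """Average repeated scans of one plot and keep only the VNIR bands."""
    mask = (grid >= lo_nm) & (grid <= hi_nm)
    return grid[mask], np.mean(scans_1nm, axis=0)[mask]

# Synthetic example: four scans of one plot on a coarse native grid.
raw_wl = np.linspace(350, 2500, 600)
rng = np.random.default_rng(0)
scans = []
for _ in range(4):
    target = 1.0 + rng.random(raw_wl.size)   # made-up canopy radiance
    panel = np.full(raw_wl.size, 4.0)        # made-up panel radiance
    grid, refl = resample_to_1nm(raw_wl, canopy_reflectance(target, panel))
    scans.append(refl)

wl, plot_spectrum = mean_vnir_spectrum(np.vstack(scans), grid)
print(grid.size, wl.size)  # 2151 701
```

The 1 nm grid from 350 to 2500 nm contains 2151 bands, matching the band count reported above, and the 400–1100 nm subset contains 701 bands.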

Determination of Berry Yield and Quality
Manual harvesting was carried out on Day of Year (DOY) 278 and 283 in 2014 and 2015, respectively. Following on-site berry weight measurements for each individual vine within the plots (72 plots × 4 vines), the berries were sent to the lab in insulated coolers for fruit quality determination. For each vine, 100 berries were hand crushed and the juice was centrifuged. The extracted juice was analyzed for total soluble solids (TSS, °Brix) and titratable acidity (TA, g tartaric acid L⁻¹). TSS was measured with an Atago RX-5000 digital refractometer (Atago, Tokyo, Japan), while TA was determined by titration with 0.1 N NaOH using a Mettler-Toledo G20 compact titrator and DG115-SC probe (Mettler-Toledo, Schwerzenbach, Switzerland). The maturity index (IMAD) was computed as the ratio between TSS and TA. All berry yield and quality data were determined for each individual vine within the plot and averaged per plot, resulting in 144 plot-level samples for the two growing seasons.
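As a small worked illustration of the aggregation described above, the following Python sketch computes IMAD = TSS/TA per vine and then averages all variables per plot. The dictionary keys and the numeric values are hypothetical, and averaging per-vine IMAD (rather than dividing the plot means) is our reading of the text.

```python
from statistics import mean

def maturity_index(tss_brix, ta_g_per_l):
    """IMAD: ratio of total soluble solids to titratable acidity."""
    return tss_brix / ta_g_per_l

def plot_summary(vines):
    """Average per-vine measurements (list of dicts) into one plot-level record."""
    return {
        "yield": mean(v["yield"] for v in vines),
        "tss": mean(v["tss"] for v in vines),
        "ta": mean(v["ta"] for v in vines),
        "imad": mean(maturity_index(v["tss"], v["ta"]) for v in vines),
    }

# Hypothetical values for two vines of one plot (a real plot has four vines).
vines = [
    {"yield": 4.0, "tss": 20.0, "ta": 8.0},
    {"yield": 6.0, "tss": 22.0, "ta": 11.0},
]
summary = plot_summary(vines)
print(summary["yield"], summary["imad"])  # 5.0 2.25
```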

Methods
A workflow for the development of berry yield and quality estimation models is given in Figure 3. The methodology can be divided into three main steps: first, grapevine data preparation and calculation of vegetation indices from the canopy reflectance; second, calibration and validation of the prediction models; and third, analysis of variable importance.

Workflow for the Model Development
Four existing linear and non-linear regression methods, namely Multiple Linear Regression (MLR), Partial Least Squares Regression (PLSR), Random Forest Regression (RFR), and the Weighted Regularized Extreme Learning Machine (WRELM), were implemented as baselines against which to compare the proposed WRELM with a dual activation function (WRELM-TanhRe) in berry yield and quality prediction (Figure 3). For independent validation, the whole dataset was randomly split into a calibration set and a validation set. The split between calibration and validation samples is typically chosen based on the data complexity and the total number of samples. Given the complexity of the data and the limited number of samples (144 in total) in our database, we experimented with 50–50%, 60–40%, 70–30%, 80–20% and 90–10% splits and found empirically that an 80% calibration set (116 samples) and a 20% validation set (28 samples) was sufficient. More details on the size of the training sample and its significance are provided in Supplementary Tables S1–S5. All the regression models were run on the calibration dataset and the associated parameters were optimized using five-fold cross-validation repeated 10 times. Five-fold cross-validation was preferred over 10-fold cross-validation because of the limited sample size [28,46]. Finally, variable importance was determined to analyze the contribution of each predictor variable to the prediction accuracy of the best models.
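The data partitioning described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names and seeds are hypothetical, and rounding the calibration size up (so that 80% of 144 gives 116) is an assumption made to reproduce the reported 116/28 counts.

```python
import math
import random

def split_calibration_validation(n_samples, calib_fraction=0.8, seed=0):
    """Randomly split sample indices into calibration and validation sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_cal = math.ceil(calib_fraction * n_samples)  # assumption: round up
    return indices[:n_cal], indices[n_cal:]

def repeated_kfold(indices, k=5, repeats=10, seed=0):
    """Yield (train, held_out) index lists for k-fold CV repeated several times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, folds[i]

calibration, validation = split_calibration_validation(144)
splits = list(repeated_kfold(calibration))
print(len(calibration), len(validation), len(splits))  # 116 28 50
```

Hyperparameters would then be chosen by averaging a validation metric over the 50 (train, held-out) pairs, with the 28 validation samples kept aside for the final independent test.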

Calculation of Vegetation Indices
Previous studies have demonstrated the importance of spectral indices related to biochemical, structural and physiological parameters and to water stress as direct and indirect indicators of fruit yield and quality [18,47–49]. Therefore, using the preprocessed canopy reflectance spectra, a total of 20 vegetation indices were calculated, and all the indices were used to calibrate the prediction models (Table 1). The indices were divided into four categories: (i) pigment, (ii) structure, (iii) physiology, and (iv) water content. The correlations between the vegetation indices and berry yield and quality were assessed using the Spearman rank correlation [50], which captures monotonic relationships between parameters, whether linear or non-linear.
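To make the index calculation and correlation step concrete, the sketch below computes the Water Index listed in Table 1 (WI = R900/R970) and a Spearman rank correlation in plain Python. The reflectance values and the yield numbers are invented for illustration, and the helper names are hypothetical; the other 19 indices would follow the same pattern with their own band formulas.

```python
def water_index(reflectance_by_nm):
    """Water Index: reflectance at 900 nm divided by reflectance at 970 nm."""
    return reflectance_by_nm[900] / reflectance_by_nm[970]

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented plot-level values: WI for five plots and the matching yields.
wi = [water_index({900: 0.50, 970: r970}) for r970 in (0.49, 0.47, 0.46, 0.44, 0.40)]
yield_per_plot = [3.1, 3.0, 4.2, 4.8, 9.5]
print(round(spearman(wi, yield_per_plot), 2))  # 0.9
```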

Table 1 (final row). Category: Water content. Index: Water Index (WI). Formula: WI = R900/R970. Reference: [66].

Background on Extreme Learning Machines (ELMs) and the Developed Method
In this section, we first provide a brief review of ELM, RELM, and WRELM, and then introduce the proposed WRELM-TanhRe. Table 2 includes the full names and acronyms of the frequently used new terms in this section. ELM is a single hidden layer feedforward neural network with randomly initialized input weights and biases, whereas its output weights are analytically determined [31]. Due to its easy implementation, fast learning and good generalization performance, ELM has gained more popularity compared to other machine learning methods [31,39]. However, very few studies have explored the potential of ELM in crop classification and estimation of phenotypic traits using remote sensing datasets [40]. For simplicity, we consider the basic setup of ELM for regression problems.
Given N training samples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^d$ is the input and $y_i \in \mathbb{R}$ is the corresponding expected output, the ELM model can be written compactly as

$$H\beta = Y \qquad (1)$$

where Y is the expected output vector for the N training samples, β is the output weight vector, and H refers to the hidden-layer output matrix. From Equation (1), the only unknown parameter is the output weight β, which can be obtained by the least squares solution [31].
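A minimal NumPy sketch of this basic ELM regression setup is given below: the input weights and biases are drawn at random and never trained, and only the output weights β are solved by least squares. The choice of Tanh, the standard normal initialization, the number of hidden nodes, and the function names are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np

def elm_fit(X, y, n_hidden=20, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (fixed)
    b = rng.normal(size=n_hidden)                # random biases (fixed)
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # least-squares solution of H beta = y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the same random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Synthetic example with 3 input features.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W, b, beta = elm_fit(X, y, n_hidden=10)
print(elm_predict(X, W, b, beta).shape)  # (40,)
```

Because only β is estimated, training reduces to a single linear solve, which is where the fast learning speed of ELM comes from.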

Regularized ELM (RELM)
Since ELM aims only to minimize the training error, it may lead to overfitting [67]. RELM alleviates this issue by introducing a regularization parameter C [68]; thus, Equation (1) can be rewritten as

$$\min_{\beta} \; \frac{C}{2}\|\varepsilon\|_2^2 + \frac{1}{2}\|\beta\|_2^2 \quad \text{s.t.} \quad H\beta = Y - \varepsilon \qquad (2)$$

where ε represents the error variable. The detailed solution for Equation (2) can be found in Huang et al. [69].
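For readers who want the intermediate step, the closed-form RELM solution can be sketched as follows, assuming the objective takes the commonly used form with the error term weighted by C/2 and the norm of β weighted by 1/2 (the form used by Huang et al. [69]). Substituting the constraint into the objective and setting the gradient to zero gives:

```latex
\begin{aligned}
J(\beta) &= \frac{C}{2}\,\lVert Y - H\beta \rVert_2^2 + \frac{1}{2}\,\lVert \beta \rVert_2^2, \\
\nabla_\beta J &= -C\,H^{\top}(Y - H\beta) + \beta = 0, \\
\Bigl(\tfrac{I}{C} + H^{\top}H\Bigr)\beta &= H^{\top}Y, \\
\beta &= \Bigl(\tfrac{I}{C} + H^{\top}H\Bigr)^{-1} H^{\top}Y .
\end{aligned}
```

A large C therefore approaches the unregularized least-squares solution, while a small C shrinks β toward zero.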

WRELM
Weighted RELM (WRELM) was proposed to weaken the influence of outliers [70]. Specifically, samples with high training error are assigned small weights, while samples with low training error are assigned high weights [67]. This is achieved by weighting the RELM error variable ε by weighting factors v. Thus, the term $\|\varepsilon\|_2^2$ is changed to $\|V\varepsilon\|_2^2$, where $V = \mathrm{diag}\{v_1, v_2, \ldots, v_i, \ldots, v_N\}$. Subsequently, the output weight β is given by [70]

$$\beta = \left(\frac{I}{C} + H^{T}V^{2}H\right)^{-1} H^{T}V^{2}Y \qquad (3)$$

A more detailed description of WRELM can be found in [67,70].
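The weighted, regularized solve can be sketched in NumPy as below, assuming the closed form β = (I/C + HᵀV²H)⁻¹HᵀV²Y that follows from weighting the error term by V. The weighting rule in `residual_weights` is a hypothetical placeholder chosen only to show the two-stage idea (fit, inspect residuals, down-weight large errors, refit); the actual weighting scheme is described in [67,70] and is not reproduced here.

```python
import numpy as np

def wrelm_output_weights(H, y, C, v=None):
    """Solve (I/C + H^T V^2 H) beta = H^T V^2 y with V = diag(v).

    With v=None every sample has weight 1, which reduces to RELM.
    """
    n_samples, n_hidden = H.shape
    v = np.ones(n_samples) if v is None else np.asarray(v, dtype=float)
    Hw = H * (v ** 2)[:, None]                 # V^2 H: scale each row by v_i^2
    A = np.eye(n_hidden) / C + H.T @ Hw        # I/C + H^T V^2 H
    return np.linalg.solve(A, Hw.T @ y)        # right-hand side is H^T V^2 y

def residual_weights(residuals, floor=1e-4):
    """Hypothetical rule: weights shrink as the absolute residual grows."""
    r = np.abs(np.asarray(residuals, dtype=float))
    scale = np.median(r) + 1e-12
    return np.maximum(1.0 / (1.0 + (r / scale) ** 2), floor)

# Two-stage use on a synthetic hidden-layer matrix with one gross outlier.
rng = np.random.default_rng(0)
H = rng.normal(size=(30, 5))
y = H @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.05 * rng.normal(size=30)
y[0] += 25.0                                   # outlier
beta_relm = wrelm_output_weights(H, y, C=10.0)
v = residual_weights(y - H @ beta_relm)
beta_wrelm = wrelm_output_weights(H, y, C=10.0, v=v)
print(beta_wrelm.shape)  # (5,)
```

Setting a sample's weight to zero is equivalent to dropping it from the fit, which is a convenient sanity check for any implementation of the weighted solve.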

Proposed WRELM-TanhRe
The activation function is the nonlinear transformation of the weighted input signals and bias [38]. In ELM, RELM, and WRELM, the activation function is important because it transforms the input data into a nonlinear feature space, which may help to improve prediction accuracy. Frequently used activation functions include saturated functions such as the hyperbolic tangent (Tanh; Figure 4a) and non-saturated counterparts such as the Rectified Linear Unit (ReLU; Figure 4b) [71–73]. One of the merits of ReLU is that it introduces sparsity by pruning negative values to zero and retaining positive ones [73]. The sigmoid function is another popular activation function that has been widely used in ELM and its variants [67,69,70]. However, it compresses the input data to non-negative values for the next layer. In contrast, the Tanh function maps input values to both negative and positive outputs. Considering these activation functions, we aimed to design an activation function with the following properties: (1) it replaces the negative part of ReLU with a nonlinear function, since this has been shown to improve the performance of neural networks [74]; and (2) it is semi-bounded, with the negative part bounded and the positive part unbounded. This avoids the dense representation of inputs produced by traditional bounded activation functions (e.g., Tanh and sigmoid) [75]. Consequently, our proposed activation function (TanhRe) combines the ReLU and Tanh functions to produce a semi-bounded, non-dense representation (Figure 4c), defined as

$$f(x) = \begin{cases} x, & x > 0 \\ \tanh(x), & x \le 0 \end{cases} \qquad (4)$$

where x is the input of the nonlinear activation f. Figure 4 shows the shapes of Tanh, ReLU and TanhRe.
From Equation (4), it can be observed that the positive values are maintained while the negative values are transformed by Tanh function.
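The piecewise behaviour described here (positive inputs passed through unchanged, negative inputs mapped by Tanh) can be written in a few lines of NumPy. The function name `tanhre` is our own label for this sketch.

```python
import numpy as np

def tanhre(x):
    """TanhRe activation: identity for x > 0, tanh(x) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, np.tanh(x))

values = tanhre([-100.0, -1.0, 0.0, 0.5, 3.0])
print(values)
```

Negative outputs are bounded below by -1, while positive outputs grow without bound, which is the semi-bounded property the design aims for. Because tanh(0) = 0, the function is continuous at the origin.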

With TanhRe as the activation function, the hidden-layer output matrix H is computed as before and the output weight β is obtained as in WRELM. The rest of the computation scheme is the same as for WRELM, described in Section 3.3.3.

Comparison to other Modeling Methods
Several popular non-neural network regression methods were selected for comparison, including Multiple Linear Regression (MLR), Partial Least Squares Regression (PLSR), and Random Forest Regression (RFR). MLR utilizes the least squares method to account for two or more predictors that affect the response variable. However, MLR fails to deal with collinearity between the predictors and considers only a few predictors for modeling purposes. PLSR, on the other hand, was proposed to reduce collinearity within the predictors by selecting non-correlated latent variables or components using principal component analysis [23,76,77]. It identifies a linear relationship between a set of dependent (response) variables and a set of predictor variables [78]. Even though previous studies have shown a linear relationship between spectral indices and grapevine productivity, a nonlinear relationship may also exist. Therefore, another regression method, RFR, an ensemble-based machine learning algorithm, may be better able to model data with nonlinearity and complex relationships between predictors and response variables. RFR uses a bootstrap sampling method to construct a large number of independent decision trees to obtain the minimum sum of squared residuals [29,79]. Each decision tree is created using randomly selected predictive and response variables. Nodes of the decision tree are split based on a random subset of the predictive variables. This randomness, together with the absence of any assumption about the probability distribution of the predictive variables, increases model prediction accuracy and robustness against over-fitting [80,81]. Finally, an optimal prediction model is generated by aggregating all the "trees" that form the "forest" [82,83].

Model Performance Analysis
The predictive power of the best performing calibrated model for each berry yield and quality parameter was subsequently evaluated on an independent dataset. Predictive power and robustness of the models were assessed by common evaluation metrics: the coefficient of determination (R²) between the predicted and observed parameters, the Root Mean Square Error (RMSE), and the normalized RMSE (RMSE%, i.e., RMSE divided by the average of the observed parameter, multiplied by 100) [84–86].
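These metrics can be computed as in the short Python sketch below. RMSE and RMSE% follow the definitions given above. For R², the text does not state which definition was used, so the sketch assumes 1 − SS_res/SS_tot; the squared Pearson correlation between predicted and observed values is another common choice and can give different numbers.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rmse_percent(observed, predicted):
    """RMSE divided by the mean of the observed values, times 100."""
    return rmse(observed, predicted) / (sum(observed) / len(observed)) * 100

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Made-up observed and predicted TSS values for four validation plots.
obs = [20.0, 21.0, 22.0, 23.0]
pred = [20.5, 20.5, 22.5, 22.5]
print(rmse(obs, pred), round(rmse_percent(obs, pred), 2), r_squared(obs, pred))
# 0.5 2.33 0.8
```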

Variable Importance
To reveal the vegetation indices that contribute most to the prediction models, we carried out a variable importance analysis. Specifically, each vegetation index was fed individually to the proposed WRELM-TanhRe model as a predictive feature for berry yield and quality, and the prediction result was expressed as the coefficient of determination (R²). The importance of the indices was then obtained by ranking their corresponding R² values from high to low.
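The ranking procedure can be sketched as follows. To keep the example short and runnable, a simple linear fit (`linear_fit_predict`) stands in for the WRELM-TanhRe model, and R² is computed on the same data used for fitting; both are simplifying assumptions, as are the function names, the feature names, and the synthetic data.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def linear_fit_predict(x, y):
    """Stand-in single-feature model (NOT WRELM-TanhRe): fit a line, return fitted values."""
    slope, intercept = np.polyfit(x[:, 0], y, deg=1)
    return slope * x[:, 0] + intercept

def rank_single_feature_importance(X, y, names, fit_predict):
    """Score each feature on its own and sort by R^2 from high to low."""
    scores = {}
    for j, name in enumerate(names):
        predicted = fit_predict(X[:, [j]], y)   # one feature at a time
        scores[name] = r_squared(y, predicted)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic data: the first feature determines y, the second is unrelated.
y = np.arange(10.0)
X = np.column_stack([2.0 * y + 1.0, np.tile([1.0, -1.0], 5)])
ranking = rank_single_feature_importance(X, y, ["index_a", "index_b"], linear_fit_predict)
print([name for name, _ in ranking])  # ['index_a', 'index_b']
```

Any model exposing a fit-then-predict callable can be passed in place of the linear stand-in, so the same loop applies unchanged to an ELM-based predictor.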

Descriptive Analysis of Berry Yield and Quality
Between the years, yield, TSS, and IMAD were higher in 2014 than in 2015, while the opposite was true for TA (Table 3). In this study, only mild water stress was observed in 2014 because the season had consistent rainfall [42], and the level of stress was not high enough to reduce the yield significantly. In 2015, all the berry yield and quality parameters presented a higher coefficient of variation (CV) than in 2014, owing to a greater range of yield under different irrigation regimes. The berry yield had the highest degree of variation, while TSS was the most stable in both years, which is to be expected as harvest was partially decided by meeting industry TSS requirements. Similarly, in the pooled dataset, the berry yield showed the largest variability (CV = 19%), followed by IMAD, TA, and TSS (CV = 12%, CV = 11%, and CV = 3%, respectively). Overall, the data ranges were similar to those observed in different regions, and this was especially true for the berry quality parameters [16,87,88].
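For reference, the CV reported here is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (n − 1 denominator); the function name and the values are hypothetical and are not data from Table 3.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Hypothetical plot-level values, for illustration only
plot_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(plot_values)
print(f"CV = {cv:.1f}%")
```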

Relationship (in Absolute Terms) Between Grape Yield Parameters and Hyperspectral Vegetation Indices
When berry yield and quality parameters were compared with each other, they were moderately related (Figure 5). Yield correlated well with TA and IMAD (r = −0.56 and r = 0.57, respectively), implying that higher yield is linked to lower TA and higher IMAD. Additionally, there was a moderately positive correlation between yield and TSS (r = 0.47).
Berry yield and quality showed moderate to relatively strong correlations with the vegetation indices from each category included in Table 1. There were frequent and strong correlations between structure and water content-based indices. Individual vine yield showed the strongest correlation with WI (r = 0.67), which was closely followed by MTVI and GNDVI (r = 0.64 and r = −0.53, respectively). Physiology-based indices were consistently related to all the yield-related parameters, and FRI4 was the best correlated with vine yield (r = 0.48). Correlation patterns for fruit quality were very similar to those observed for the yield parameters. However, the correlations tended to have lower r values for fruit quality parameters than for yield parameters. Specifically, IMAD had the best correlation with WI (r = 0.66), which was followed by RGI and GI (r = −0.61 for both). WI appeared to be the best index for TSS and TA, as it had the highest r values of 0.55 and −0.66, respectively. The second most highly correlated indices with TSS and TA were MTVI and RGI (r = 0.48 and r = −0.63, respectively).
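Figure 5 reports Spearman correlation coefficients. A minimal sketch of that statistic is given below, computed as the Pearson correlation of average ranks (ties share the mean of their positions). The helper names and the example values are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical index and yield values, for illustration only
index_values = [1.0, 2.0, 3.0, 4.0, 5.0]
yield_values = [5.0, 6.0, 7.0, 8.0, 7.0]
rho = spearman_r(index_values, yield_values)
print(f"Spearman r = {rho:.3f}")
```

Because it works on ranks, the statistic captures monotonic relationships even when they are not linear, which suits indices that saturate at high canopy density.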

Model Performance and Accuracy Assessment
The results of the predictive models when using calibration and validation datasets are presented in Table 4. To increase the sample size and develop more robust and transferable models, the data from different years, irrigation treatments, and rootstocks were pooled and randomly split into calibration and validation datasets. Validating the performance of prediction models using a broadly ranged independent dataset helps identify reliable models with reduced uncertainty. The model performance evaluation was conducted by comparing the evaluation metrics (R², RMSE, and RMSE%) derived from the five models for the respective berry yield and quality parameters. The model performance was assessed based on: (1) the sample data used for model calibration; and (2) the independent validation dataset.
Calibration dataset-based assessment. As presented in Table 4, RFR models outperformed all other models in the calibration, with the highest R² of 0.845-0.884 and the lowest RMSE% of 1-12% for berry yield and quality parameters. This was followed by MLR models with R² of 0.328-0.551 and RMSE% of 2-24%. PLSR and ELM-based models produced similar results, with R² of 0.257-0.512 and RMSE% of 2-26%. In general, all the calibrated models tended to have higher prediction accuracy (i.e., higher R², lower RMSE and RMSE%) for yield and IMAD compared to TSS and TA.
Independent validation dataset-based assessment. Table 4 also presents model evaluation metrics for the independent validation dataset (20% of the dataset). In contrast to the performance of the prediction models in calibration, our proposed WRELM-TanhRe generally performed better than the other models, achieving the highest prediction accuracy for yield, TSS and IMAD with an R² of 0.522-0.682 and RMSE% of 2-5%, while WRELM (with ReLU as the activation function) produced the best prediction for TA, with an R² of 0.545 and RMSE% of 6% (Figure 6). To the best of our knowledge, this study is the first to introduce ReLU into WRELM. The superior performance of the RFR models in calibration was not confirmed on the independent validation dataset, which indicates some tendency toward overfitting. Nevertheless, RFR models performed better than MLR and PLSR in predicting berry yield and quality parameters, with relatively high R² and low RMSE and RMSE%.
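The random split into calibration and validation subsets can be sketched as below. The 20% validation share comes from the text; the function name, the fixed seed, and the use of integer sample identifiers are assumptions made only for illustration.

```python
import random

def split_calibration_validation(samples, validation_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the pooled samples for validation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_val = round(len(samples) * validation_fraction)
    val_idx = set(indices[:n_val])
    calibration = [s for i, s in enumerate(samples) if i not in val_idx]
    validation = [s for i, s in enumerate(samples) if i in val_idx]
    return calibration, validation

# Hypothetical pooled dataset of 50 plot-level samples (IDs only)
pooled = list(range(50))
calibration, validation = split_calibration_validation(pooled)
print(len(calibration), len(validation))
```

A model whose metrics drop sharply from the calibration subset to the held-out subset, as reported for RFR, is likely fitting noise in the calibration samples.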

Variable Importance for Model Performance
In general, WI derived from canopy reflectance was the most important vegetation index in the prediction of berry yield and quality, with the exception of TA (Figure 7). The contribution of the other indices varied with the berry parameter being predicted.
The overall importance of pigment-based indices for yield prediction was lower compared to structure and physiology indices. Within the pigment-based indices category, TCARI was the most important index for berry yield, and it was less important in the prediction of TSS, TA, and IMAD, for which RGI was the most important. In addition, RGI was noticeably more important than the next most important indices.
The most important index in the structure category for yield and TSS prediction was MTVI, followed by NDVI and GI in the prediction of yield and TSS, respectively. GI turned out to be the most important in TA and IMAD prediction, closely followed by GNDVI.
Among the stress-based indices, fluorescence-related indices were shown to be markedly more important than others. In particular, the most important index for yield, TA and TSS prediction was FRI4, consistently followed by FRI2. In contrast, FRI2 was the most important index in IMAD prediction, and FRI4 was negligibly less important than FRI2.

Overall Performance of the Berry Yield and Quality Models
Vine productivity parameters, including berry yield and quality, were best estimated using machine learning-based prediction models. Previous studies have shown that many factors can non-linearly affect the relationship between canopy reflectance factor spectra and vegetation traits [89,90]. Compared to MLR and PLSR, the robust performance of RFR and the ELM-based machine learning models was most likely attributable to the existence of a non-linear relationship between grapevine productivity parameters and the hyperspectral vegetation indices. Although MLR and PLSR are widely used in statistical predictions, their limitation in handling non-linear relationships between vegetation traits and reflectance data has been noted in the literature [90][91][92]. However, the better performance of the PLSR models compared to MLR models demonstrated the power of PLSR in developing prediction models using principal component analysis (PCA) when there are many highly correlated independent variables [78].
In general, as indicated by the evaluation metrics, the RFR models were superior to the MLR and PLSR models. However, the RFR models may have overfitted the berry yield and quality parameters. This was evident from a marked decline in R² and a simultaneous increase in RMSE and RMSE% between calibration and validation. Previous studies have demonstrated the potential of RFR in the prediction of various plant traits from remote sensing observations by comparison with vegetation indices and linear regression methods [93][94][95]. On the other hand, researchers have also reported poor performance of RFR [96,97]. These varying performances of RFR may be explained by noise in the data caused by indirect spectral responses of target parameters and by the number of samples used for model calibration [27,97,98].
In similar studies, a simple linear relationship was established between berry yield/quality parameters and hyperspectral vegetation indices derived from the visible and near-infrared spectral region [15,16]. Despite the weaker predictive ability of RFR in the validation, the results of RFR in the calibration alone (R² of 0.845-0.884 and RMSE% of 1-12%) are greatly improved relative to previously conducted similar studies. When the independent validation dataset is considered, ELM-based machine learning methods are the best-performing algorithms (R² of 0.522-0.682 and RMSE% of 6-15%). It must be noted that, compared to grapevine health, considerably fewer fruit quality estimation studies have been reported. One reason for this is the relatively indirect and complex relationship between nadir remote sensing observations of canopy vegetation and the fruit, which is on the lateral side of the canopy and not fully exposed to the sun [87]. Machine learning methods can help reveal this complicated relationship by relating remote sensing data to fruit quality parameters.
In this contribution, the comparison of the different models indicates that ELM-based machine learning methods, especially our newly proposed WRELM-TanhRe method, lead to the best results for the prediction of the berry yield and quality parameters on the independent validation dataset. This is mainly due to two reasons: (1) our proposed nonlinear activation possesses the merits of ReLU and tanh, which are two widely used activations; and (2) WRELM-based regression is less sensitive to outliers in the data than basic ELM.

Contribution of Vegetation Indices to Berry Yield and Quality Estimation
The importance of WI in the prediction of berry yield and quality further confirmed the findings of [16], who reported the ability of WI to predict yield and berry quality in rainfed commercial vineyards, as grapevine productivity is strongly impacted by vine water status.
A significant, strong correlation between leaf pigment concentrations and fruit quality has been reported [14]. Meggio et al. [15] found that indices designed to estimate carotenoid and anthocyanin could have more potential for the prediction of berry quality of vines affected by iron chlorosis than traditional structural and pigment-based indices, e.g., NDVI and TCARI/OSAVI. Mild water deficit and no visible signs of stress were reported for the vineyard under study by Maimaitiyiming et al. [42], and De Jong et al. [99] demonstrated the early response of RGI to drought and heat stresses, which affect leaf pigment concentrations and photosynthetic efficiency, and thus berry yield and quality. This may explain why RGI was the most important index among the pigment-based indices in developing a model for predicting berry quality.
Vegetative biophysical parameters, including leaf area index (LAI), biomass and vigor, can be estimated using structural indices, and these parameters are known to be critical for producing sugars and acids in fruits through photosynthesis [100,101]. Several relevant studies have shown a strong relationship between structural indices calculated from airborne multispectral datasets and fruit quality [17,102]. Compared to traditional structural indices, e.g., NDVI and SR, MTVI has been proven to be sensitive to variations in biophysical parameters by minimizing the asymptotic saturation effect caused by high vegetation density [92,103]. Similarly, García-Estévez et al. [103] found a strong performance of canopy reflectance-based GI for fruit quality estimation in their recent work, and this is confirmed in the current study.
In addition to heat dissipation and photochemical quenching, actively emitting absorbed excess light energy as sun-induced fluorescence (SIF) in the 600-800 nm spectral region is another important photoprotection process for plants [104,105]. Therefore, SIF has been considered a proxy for plant health, even though the SIF signal is weak and accounts for only 2-5% of the absorbed light energy. SIF features two relatively strong emission peaks in the red (around 680-690 nm) and far-red (730-750 nm) spectral regions [106]. The red portion of the SIF emission is usually reduced substantially at leaf and canopy level because the red peak region overlaps with the chlorophyll absorption region, while the far-red peak is minimally affected [107]. Additionally, physiological, biochemical and structural factors jointly control the amount of SIF emission, and hyperspectral reflectance contains all of this information [108][109][110]. This may explain why the reflectance-based FRI2 and FRI4, which use the far-red peak as a measurement band, appeared to be more important in vine productivity prediction, especially given that there was no significant reduction in chlorophyll concentration within the canopy.

Model Scalability and Transferability
Our findings are significant because, traditionally, there has been no reliable approach for predicting berry yield and quality before harvest [111][112][113][114]. We suggest that hyperspectral sensors, especially imaging hyperspectral sensors mounted on UAVs, offer a faster and less computationally expensive alternative to traditional methods. When satellite or aircraft-based observations are used, soil/background, canopy architecture and shadow may hinder the applicability of the method due to coarse spatial resolution [45][115][116][117][118]. This is particularly true for highly heterogeneous fields of tree crops (orchards and vineyards), where plants are discontinuously row-structured [47,119,120]. Furthermore, high cost, low revisit frequency and potential cloud occurrence limit the suitability of satellite remote sensing in agriculture, while operational complexity presents a major constraint for manned airborne platforms [121][122][123]. In contrast, high spatial resolution images collected at low altitude have a favorable signal-to-noise ratio, and it is possible to eliminate soil and shadow pixels with high confidence [40][124][125][126]. Additionally, image information (radiance and reflectance) extracted from pure vegetation pixels is likely to reduce the effects of shadows and background soils.
However, in this study, the spectral measurements were made with a handheld spectroradiometer, which reduces noise, needs no atmospheric correction and allows very fine spectral sampling. This is not the case for aerial hyperspectral imagery. Therefore, aerial imaging campaigns need to be designed so that rigorous atmospheric and geometric correction can be carried out to minimize the negative effects of the atmosphere and platform instability. Additionally, most of the common UAV-friendly hyperspectral cameras cover the 400-1000 nm domain, and only 450-950 nm or an even narrower region may be usable for further analysis because of sensor-inherent noise at longer wavelengths. Care must be taken to reduce or eliminate noise for the calculation of WI, which uses 970 nm. When such noise is present and cannot be removed, the crop water stress index (CWSI) retrieved from thermal images acquired concurrently with hyperspectral images can be used to capture canopy water status.
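The band-coverage constraint can be made concrete with a short sketch. WI is taken here in its common form, the ratio of reflectance at 900 nm to reflectance at 970 nm, which is an assumption about the exact definition listed in Table 1; the function names and reflectance values are hypothetical.

```python
def water_index(reflectance):
    """WI as the ratio R900 / R970 (assumed common definition).

    `reflectance` maps wavelength in nm to a reflectance value.
    """
    return reflectance[900] / reflectance[970]

def bands_within_range(bands_nm, usable_range_nm):
    """True if every band an index needs lies inside the usable range."""
    low, high = usable_range_nm
    return all(low <= band <= high for band in bands_nm)

WI_BANDS = (900, 970)

# Nominal sensor coverage versus the less noisy sub-range from the text
nominal_ok = bands_within_range(WI_BANDS, (400, 1000))
trimmed_ok = bands_within_range(WI_BANDS, (450, 950))

# Hypothetical canopy reflectance values, for illustration only
wi = water_index({900: 0.50, 970: 0.40})
print(nominal_ok, trimmed_ok, round(wi, 2))
```

In this sketch WI is computable over the nominal 400-1000 nm coverage but not once the spectrum is trimmed to 450-950 nm, which is the situation in which the text suggests falling back on CWSI from thermal imagery.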
Improved transferability and generalization of machine learning-based models depend on the variability of the calibration and independent validation datasets [127]. To increase the transferability of the prediction models developed in this study, care was taken to ensure that berry yield and quality parameters were representative of various conditions. This was done by involving different irrigation regimes and rootstocks in the experimental vineyard. Furthermore, neural network-based machine learning methods benefit from a large sample size [31]. The power of WRELM-TanhRe and other machine learning methods may not have been fully exploited due to the relatively small sample size in our case. Nonetheless, we believe this is an encouraging first step towards developing more generalized global models. With an increased number of samples, future studies should focus on generalizing the predictive models at regional scale, and deep learning-based techniques should be tested against traditional machine learning methods for the prediction of berry yield and quality parameters.

Conclusions
Non-invasive prediction of grapevine productivity is of great significance for precision viticulture. In this regard, the main goal of this study was to calibrate robust prediction models for estimating grapevine productivity using hyperspectral data and machine learning methods, and to develop a new algorithm that can overcome model overfitting. Therefore, we developed an ELM-based machine learning method that combines the advantages of Tanh and ReLU in a dual activation function for the in-depth study of the complex relationship between the vegetation indices and vine productivity. The ELM-based machine learning methods outperformed the commonly used MLR, PLSR and RFR methods in all cases on the independent validation dataset. The newly proposed WRELM-TanhRe method appeared to be the most robust in the prediction of berry yield, TSS, and IMAD, while TA was best predicted by WRELM. Variable importance analysis revealed comparable contributions of indices in each of the considered categories, and WI was generally selected as the most important index.
To conclude, the current study has implications for ensuring the broader applicability of previous studies focusing on the application of hyperspectral vegetation indices in the prediction of berry yield and quality. Importantly, the findings of this contribution demonstrate the great potential of combining hyperspectral remote sensing and machine learning methods for the prediction of berry yield and quality under different irrigation treatments and rootstock-scion interactions. Future work will scale up the models developed in the current work to UAV-based observations.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/11/7/740/s1, Table S1: Mean R² ± standard deviation of model runs with 10 repetitions, Tables S2-S5: Tukey's test and ANOVA results of different data splits.
Funding: Funding for this work was provided by National Science Foundation (IIA-1355406 and IIA-1430427), NASA (NNX15AK03H), Grape and Wine Institute at the University of Missouri-Columbia, and Center for Sustainability at Saint Louis University.

Figure 1. Overview of the vineyard: the experimental vine rows with different irrigation treatments and the individual vines grafted on the same rootstock in each plot are drawn on an image acquired with Sony's Alpha ILCE-7R camera on September 21, 2015.

Figure 3. Flowchart of the berry yield and quality prediction process using canopy-level hyperspectral reflectance spectra. Grey box indicates the developed method in this study.

Figure 5. Correlogram of Spearman correlation coefficients found between vine berry yield/quality parameters and hyperspectral indices. Crosses indicate relationships that are not significant at the p ≤ 0.05 level.

Figure 6. Observed versus predicted grapevine berry yield and quality parameters with the best ELM-based models using the independent validation dataset.

Figure 7. ELM-based best models showing the relative importance of vegetation indices in the prediction of berry yield and quality parameters. Indices within each category were ranked by importance. The higher the importance value (IV), the more important the index.

Table 1. Vegetation indices used in this study.

Table 2. Full names and acronyms of the frequently used new terms in Section 3.3.

Table 3. Descriptive statistics of the berry yield and quality parameters. Each sample value is the mean of a plot of four individual vines with the same rootstock. SD: standard deviation; CV: coefficient of variation.

Table 4. Results of the model performance for predicting berry yield and quality. Bold fonts indicate the best prediction models in validation.