Combination of an Automated 3D Field Phenotyping Workﬂow and Predictive Modelling for High-Throughput and Non-Invasive Phenotyping of Grape Bunches

: In grapevine breeding, loose grape bunch architecture is one of the most important selection traits, contributing to an increased resilience towards Botrytis bunch rot. Grape bunch architecture is mainly inﬂuenced by the berry number, berry size, the total berry volume, and bunch width and length. For an objective, precise, and high-throughput assessment of these architectural traits, the 3D imaging sensor Artec ® Spider was applied to gather dense point clouds of the visible side of grape bunches directly in the ﬁeld. Data acquisition in the ﬁeld is much faster and non-destructive in comparison to lab applications but results in incomplete point clouds and, thus, mostly incomplete phenotypic values. Therefore, lab scans of whole bunches (360 ◦ ) were used as ground truth. We observed strong correlations between ﬁeld and lab data but also shifts in mean and max values, especially for the berry number and total berry volume. For this reason, the present study is focused on the training and validation of di ﬀ erent predictive regression models using 3D data from approximately 2000 di ﬀ erent grape bunches in order to predict incomplete bunch traits from ﬁeld data. Modeling concepts included simple linear regression and machine learning-based approaches. The support vector machine was the best and most robust regression model, predicting the phenotypic traits with an R 2 of 0.70–0.91. As a breeding orientated proof-of-concept, we additionally performed a Quantitative Trait Loci (QTL)-analysis with both the ﬁeld modeled and lab data. All types of data resulted in joint QTL regions, indicating that this innovative, fast, and non-destructive phenotyping method is also applicable for molecular marker development and grapevine breeding research.


Introduction
Grapevines are one of the economically most important fruit crops in the world, with a high requirement for quality and phytosanitary aspects. Fungal pathogens like Plasmopara viticola (P. viticola), Erysiphe necator (E. necator), and Botrytis cinerea (B. cinerea) are a challenge for winemakers around the world, requiring intense crop management, and the application of fungicides during the vegetation period. In the past decade, grapevine breeders have made remarkable progress in breeding new resistant varieties by the combination of important quality traits with active resistance genes against P. viticola and E. necator [1]. In contrast, for B. cinerea, an effective cellular defense mechanism has not been reported for grapevines yet. Therefore, plant resistances against B. cinerea are not available for breeding purposes.
Several studies describe a link between a compact bunch structure and an enhanced risk for bunch rot infestations caused by B. cinereal, and vice versa, a loose bunch structure provides an enhanced resilience towards Botrytis bunch rot [2][3][4][5]. Loose bunch structure is therefore an important selection trait for grapevine breeding [6][7][8]. The degree of compactness is thus strongly related to the combination of berry traits (e.g., berry number and berry size) and stem traits (e.g., the bunch width and length) (a detailed overview is given by Tello & Ibáñez [9]). The combination of these different berry and stem traits represents the grape bunch architecture. For scoring the degree of bunch compactness, a qualitative descriptor was developed by the Organization of Vine and Wine (OIV). The OIV descriptor 204 (OIV 204) rates the level of compactness/density into five categories: class 1 -very loose bunch architecture, up to class 9 -very dense bunch architecture [10]. This categorical classification system is rather superficial, as it does not provide any quantitative information about the sub-traits of the grape bunch architecture that have a high impact on, bunch density, the combination of berry number, total berry volume, and grape bunch length [8,9,11]. Moreover, these traits have a strong link to yields [12]. Precise (quantitative) data on these sub-traits are needed to identify underlying genetic determinants by Quantitative Trait Loci (QTL) analysis, and marker development for SMART-breeding approaches. These tools are needed to develop valid genetic markers, which increase breeding efficiency due to an earlier seedling selection. The bottleneck for using these highly efficient genotyping technologies is the lack of comparably efficient phenotyping methods [13].
In the recent past, few studies have been conducted, applying two-or three-dimensional (2D or 3D) sensor systems. Automatic and semi-automatic image analysis was often used to extract phenotypic traits from Red-Green-Blue (RGB)-images of grapes [14][15][16][17]. Based on the extracted phenotypic traits of 135 table grape bunches of Vitis labruscana cv. Kyoto, Chen et al. performed a predictive modeling approach in order to automatically grade the compactness of bunches from images [15]. In contrast to 2D imaging, 3D data are fully informative and highly precise. An additional dimension gives important information about the shape, structure, and volume of the scanned object [18][19][20]. Several studies have addressed the potential of 3D based plant phenotyping methods within different scenarios. Measurement of vegetative parameters like plant and canopy height, leaf parameters, fruit detection, and characterization of crop and fruit plants have been described (An overview is given by Paulus [21]). Regarding the phenotypically complex and highly varied characteristics of grape bunches: In different studies, 3D scanning was applied under controlled lab conditions in order to obtain the full 360 • structure of the bunches [8,[22][23][24][25][26][27]. The limitation of several 3D methods is a reduced throughput, which restricts the number of grapes that can be used for analysis. Recently, we established an automated 3D based phenotyping pipeline that is 10-times faster compared with 2D based approaches [8]. The application of the method results in dense and accurate 3D data of the whole grape bunches. Scanning takes only a few seconds (depending on the shape and size of the bunch). The following automated analysis framework (operable over an intuitive graphical user interface) extracts important bunch architecture-related traits from 3D scans like berry number, mean berry diameter and volume, total berry volume (total volume), bunch width and bunch length. In these experiments, high correlation coefficients of up to 0.95 for berry number and total volume within a wide range of different phenotypes (cultivars and specimens of an experimental breeding population) were observed. A correlation of r = 0.9 was observed for berry volume in comparison to 2D ground truth data [8,27]. Higher variations were observed for the correlation coefficients of bunch length between the cultivar and the specimen of the experimental breeding population (0.85-0.57, respectively). Lowest correlation values were observed for bunch width comparisons between 3D Remote Sens. 2019, 11, 2953 3 of 22 and 2D ground truth data (r = 0.59) [8]. However, all these mentioned studies mainly focused on an invasive lab approach. This means that bunches of interest must be harvested and brought to the lab, which is again time-consuming and a source of errors.
Studies on grape bunch development or characterization of grapevine seedlings in the first step of selection require non-invasive phenotyping approaches that must be applied directly in the field.
Most of the filed phenotyping in viticulture regards the prediction of yield parameters. Herrero-Huerta et al. [28] collected images from different overlapping angels of 20 grape bunches. A combination of Photogrammetry quality and computer vision techniques the reconstruction of 3D point clouds of the bunches. They compared the number of berries and weight derived from the point clouds to the actual values and obtained R 2 of 0.78 and 0.80, respectively [28]. In the study of Rose et al. [29], a multi-view stereo approach was used to record grape bunches directly at vine rows. They estimated the berry number and berry size parameter from the partial point clouds.
Hacking et al. [30] used a Kinect 3D sensor to reconstruct 20 grape bunches from the field. They used fruit volume estimation to predict yield with r = 0.61.
Focusing on grape bunch architecture-related traits, Rist et al. [8] tested the applicability of a structured-light based sensor in the field and conducted a first proof-of-principle study by scanning a smaller subset of 48 grape bunches from four different cultivars. The major difference in the field is that only the visible side of the bunch can be scanned. Scanning in the field is further restricted by the recoding angle. For 3D plant phenotyping, few studies investigate the impact of the recording angles on the estimation of phenotypic traits. Depending on the plant and the measured trait, single recording angles can affect phenotypic measurements [31,32]. The comparison between the field and the 360 • scans of the 48 grape bunches revealed that approximately only 30-40% of the bunch structure can be recorded (due to the recording angle and depending on size and shape of the bunch and the presence of secondary bunches). As a result, phenotypic values differ between the two approaches [8]. From a breeder or scientific point of view, the knowledge of the complete 3D information is very important, especially to obtain the total berry number and, thus, the total volume of grapes for yield estimation, genetic studies, or selection of breeding material. Thus, phenotypic data need to be predicted by statistical methods in order to obtain full information from partial 3D field scans. Making predictions based on patterns and relationships in datasets is known as predictive modeling [33]. An enormous number of different regression methods can be used for finding those patterns that explain variation in the dependent variable. Mathematically simpler approaches like linear regression models or more complex machine learning black-box based models are used in order to increase prediction efficacy and minimize prediction errors [34].
The present study addresses the adaptation and validation of a lab-established, high throughput 3D-based phenotyping pipeline for automated and objective analysis of the grape bunch architecture traits under field conditions. The study also aimed to minimize the loss of information due to partial field scans at the prediction of 360 • lab scan related phenotypic values, on the basis of the field scans. Therefore, we scanned grapevine material with different genetic backgrounds, representing a wide range of phenotypically variable grape bunches in both field and lab environments. Moreover, a comparable QTL analysis was conducted in order to test the usability of this method for applications in molecular grapevine breeding and genetic research. The study is divided into three main tasks: 1. Assessing the relationships between full scans taken under standardized 360 • lab conditions and partial field scans of a preferably broad spectrum of phenotypic variation of nearly 2000 grape bunches; 2. Compare the predictive performance of several regression models with different complexity to predict missing bunch traits like berry number from field scan data, 3. Proof-of-concept that the 3D based phenotypic data are valid for QTL-analysis.
All samples were taken during two consecutive growing seasons in 2017 and 2018 from the basal insertion of the first three central shoots of the fruit cane and cut directly at the shoot at maturity, according to BBCH89 [35]. Every sampled bunch was rated according to the OIV descriptor OIV 204 and OIV 208 (shape) [10]. All vines were treated with best local practice policies for viticulture.

Field and Lab Scans
For 3D imaging, we used the 3D scanner Artec Spider (Artec 3D, L-1466, Luxembourg). Bunches were scanned in the field from the visible side according to the maximum scanning range of the sensor of 25-30 cm distance. Approximately 30 to 40% of the bunch structure could be recorded depending on bunch shape (secondary bunches) and bunch size. An artificial background was used in the case of inferring leaves and/or adjacent bunches to prevent false phenotypic results. Right after the field scan, bunches were harvested, transported to the lab, and fixed to a hook, which was attached to a motorized device. For the lab scan, bunches were rotated at different speeds (between 0.5 s −1 and 0.16 s −, depending on size and shape) and scanned for 360° under controlled light conditions. Scanning distance was kept constant in the lab and in the field. According to [8], bunches were scanned until the entire structure was captured ( Figure 2a). The produced point clouds of the bunches from field and lab scans ( Figure 2b) were stored in polygon file format and analyzed with the software '3D Bunch Tool'. All samples were taken during two consecutive growing seasons in 2017 and 2018 from the basal insertion of the first three central shoots of the fruit cane and cut directly at the shoot at maturity, according to BBCH89 [35]. Every sampled bunch was rated according to the OIV descriptor OIV 204 and OIV 208 (shape) [10]. All vines were treated with best local practice policies for viticulture.

Field and Lab Scans
For 3D imaging, we used the 3D scanner Artec Spider (Artec 3D, L-1466, Luxembourg). Bunches were scanned in the field from the visible side according to the maximum scanning range of the sensor of 25-30 cm distance. Approximately 30 to 40% of the bunch structure could be recorded depending on bunch shape (secondary bunches) and bunch size. An artificial background was used in the case of inferring leaves and/or adjacent bunches to prevent false phenotypic results. Right after the field scan, bunches were harvested, transported to the lab, and fixed to a hook, which was attached to a motorized device. For the lab scan, bunches were rotated at different speeds (between 0.5 s −1 and 0.16 s −1 depending on size and shape) and scanned for 360 • under controlled light conditions. Scanning distance was kept constant in the lab and in the field. According to [8], bunches were scanned until the entire structure was captured ( The 3D Bunch Tool works in the following way: A region growing is applied to the point cloud to create segments containing one berry each. In the resulting segments, spheres as parameterized representations of berries are fitted based on the Random Sample Consensus algorithm. All sphere hypotheses containing more than 100 inliers are counted as valid. In a post-processing step, sphere hypotheses that overlap to more than 25% are compared, and only the sphere with the highest number of inliers is kept. From the final set of spheres, the berry number, mean berry diameter (mean diameter) (mm), mean berry volume (mean volume) (mL), and total berry volume (mL) (total volume) as the summed up volumes of all detected spheres are computed. Bunch width (mm), bunch length (mm), and convex hull volume (CVH) (mL) are derived from the points that lie in close distance to the surface of a sphere, with bunch width being the largest distance between two of the points on a horizontal axis and bunch length the largest distance on the vertical axis [8].

Correlation between 360°-Lab vs. Field Scans to Investigate Important Grape Bunch traits
To determine the relationship between data from 360° lab and field scans, berry number, mean diameter, mean volume, total volume, CVH, bunch width, and bunch length from the lab and field were compared using Pearson correlation coefficients (α = 0.05) for all phenotypic data. An overview of the complete workflow is shown in Figure 4. The 3D Bunch Tool works in the following way: A region growing is applied to the point cloud to create segments containing one berry each. In the resulting segments, spheres as parameterized representations of berries are fitted based on the Random Sample Consensus algorithm. All sphere hypotheses containing more than 100 inliers are counted as valid. In a post-processing step, sphere hypotheses that overlap to more than 25% are compared, and only the sphere with the highest number of inliers is kept. From the final set of spheres, the berry number, mean berry diameter (mean diameter) (mm), mean berry volume (mean volume) (mL), and total berry volume (mL) (total volume) as the summed up volumes of all detected spheres are computed. Bunch width (mm), bunch length (mm), and convex hull volume (CVH) (mL) are derived from the points that lie in close distance to the surface of a sphere, with bunch width being the largest distance between two of the points on a horizontal axis and bunch length the largest distance on the vertical axis ( Figure 3) [8].

Correlation between 360 • -Lab vs. Field Scans to Investigate Important Grape Bunch traits
To determine the relationship between data from 360 • lab and field scans, berry number, mean diameter, mean volume, total volume, CVH, bunch width, and bunch length from the lab and field were compared using Pearson correlation coefficients (α = 0.05) for all phenotypic data. An overview of the complete workflow is shown in Figure 4.

Data Setting and Preprocessing
The experimental plant material (n = 1478 samples) was randomly partitioned in 2/3 training and 1/3 validation data (n = 986 vs. n = 492). It displays a wide range of morphological characteristics ( Table 1). All classes of OIV 204 (Class 1: 13%, Class 3: 32%, Class 5: 36%, Class 7: 14%, Class 9: 5%) and OIV 208 (Class 1: 29%, Class 2: 33 %, Class 3: 36%) were present. Approximately 35% of the individuals showed large secondary bunches. Fifteen individuals had a very small amount of berries and uncommon shapes due to flower abscission (less than 40 berries per bunch) ( Figure 1) [8,17]. All remaining data were used as an external test dataset (n = 428). The descriptive statistics are shown in Supplementary 1. This data set contained a larger genetical background with breeding samples, commercially relevant cultivar samples, and OIV 204 representative specimens. For data preprocessing, missing values were excluded, and some variables were ₂ √ and x 0.3 transformed to linearize relationships between predictor and dependent variables corresponding to Tukey and Mosteller's bulging rule [36]. An overview is given in Supplementary 2.

Data Setting and Preprocessing
The experimental plant material (n = 1478 samples) was randomly partitioned in 2/3 training and 1/3 validation data (n = 986 vs. n = 492). It displays a wide range of morphological characteristics ( Table 1). All classes of OIV 204 (Class 1: 13%, Class 3: 32%, Class 5: 36%, Class 7: 14%, Class 9: 5%) and OIV 208 (Class 1: 29%, Class 2: 33 %, Class 3: 36%) were present. Approximately 35% of the individuals showed large secondary bunches. Fifteen individuals had a very small amount of berries and uncommon shapes due to flower abscission (less than 40 berries per bunch) ( Figure 1) [8,17]. All remaining data were used as an external test dataset (n = 428). The descriptive statistics are shown in Supplementary Table S1. This data set contained a larger genetical background with breeding samples, commercially relevant cultivar samples, and OIV 204 representative specimens. For data preprocessing, missing values were excluded, and some variables were 2 √ and x 0.3 transformed to linearize relationships between predictor and dependent variables corresponding to Tukey and Mosteller's bulging rule [36]. An overview is given in Supplementary Table S2. Table 1. Regression models and methods used in this study for the prediction of phenotypic parameters.

Concept Hyper Parameters Concept Description
Lm 1 Simple linear regression none The linear regression is based on ordinary least squares (OLS) and estimates the linear effect of one or more explanatory variables on the dependent variable [37].
lm stepAIC 2 Multiple linear regression with stepwise feature selection for raw data none The multivariate linear regression model is also based on OLS but performs a stepwise feature selection according to the Akaike information criterion (AIC) on raw and transformed data as predictor variables [37]. The AIC is used to find the most parsimonious model among a set of candidate models as it weights the goodness of fit against the number of estimated parameters in a model [38].
Multiple linear regression with stepwise feature selection for transformed data none glmAIC_model 2 Generalized Linear Model with Stepwise Feature Selection none The generalized linear models are an extension to a linear model since it allows different probability distribution for the response variable (e.g., Poisson) and different link functions (e.g., log), which specifies the relationship between the mean of the response and the covariates through the linear predictor [39]. Also, for Generalized Linear Models GLMs, a stepwise feature section based on AIC was applied.
bayes glm 2 Bayesian Generalized Linear Model none A generalized linear model, extended with a posterior probability, i.e., a prior distribution as recommended by [39,40]. Generalized linear model with lasso (alpha = 1), ridge (alpha = 0) or elastic-net regularization (alpha > 0 and < 1) is a useful method for the N << p case (i.e., relatively small sample size in comparison to many explanatory variables) since it prevents overfitting through a correction factor lambda, which shrinks coefficient estimates towards or to zero [37]. These methods usually decrease the error variance at the cost of introducing some bias.

Comparison of Different Regression Models
In order to predict the four important phenotypic traits (berry number, bunch width, bunch length, total volume) of 360 • lab scans from phenotypic values from field scans, several regression models were compared. In the first approach, a simple (univariate) linear regression model (lm) was fitted with the individual phenotypic trait from the 360 • lab scans as a dependent variable and the corresponding trait from the field as a predictor variable. In the second approach, models were extended by all phenotypic traits (berry number, berry diameter, berry volume, bunch width, bunch length, CVH, and total volume) from the field as predictor variables (i.e., multivariate models). Table 1 shows a complete overview of the used regression models. Those regression models included linear regression with stepwise feature selection on raw and transformed data (lm stepAIC), Generalized Linear Model with Stepwise Feature Selection (glmAIC_model), Bayesian Generalized Linear Model (bayes glm), Lasso and Elastic-Net Regularized Generalized Linear Model (glmnet_model), Support Vector Machines with linear and polynomial kernel (svm linear, svm poly) and Random Forest (random forest). The glms were fitted for the berry number with Poisson distribution and log-link for bunch length, bunch width, and total volume with gamma distribution and log-link. In addition, a negative binomial glm with log-link was fitted for berry number ( Table 1). The lm, lm stepAIC, glmAIC_model, and bayes glm models were parameterized using tenfold-repeated cross-validation, while the models that require hyperparameters were tuned with an adaptive resampling search on tenfold-repeated cross-validation according to Kuhn 2014 [43].
For model evaluation and quantification of model performance, the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R 2 ) were calculated on training, validation, and test data. We ranked the models according to the lowest error on the respective data set and compared them with each other. Model diagnostics were performed by plotting observed versus predicted values to visualize model fit, including potential outliers and bias. Additionally, in the test data, we assessed the model fit of the best models by performing an analysis of variance (ANOVA) and Tukey's test on the model residuals of the largest phenotypic groups: i.e., GF × VB, 'Riesling', 'Regent', 'Chardonnay', 'Felicia' and with all remaining samples characterized as OIV204 group. Variable importance was determined in a sensitivity analysis by changing parameter values at a time and comparing the effect on the model output [44,45].
All statistical analyses were performed using R version 3.5.2 [46] with the caret package to build the modelling framework [47], the ggplot2 [48], ggpubr [49] and the lattice packages [50] for graphics, the emmeans package for the Tuckey-test [51], and the rminer package [52] for the estimation of variable importance.

QTL-Analysis
For 150 progenies of the experimental breeding population GF × VB, a QTL-analysis was performed on the data of 2018 with MapQTL6.0 [53]. The software uses as input a genetic map together with phenotypic and molecular marker data of the breeding population. Then the probability of a linkage between phenotypic variation patterns and the segregating marker positions can be calculated for each position on the map, determining the position of a QTL. This probability is estimated by the "logarithm of the odds" (LOD) ratio. It refers to the correlation between the phenotypic values and the corresponding molecular markers of each genotype in the experimental breeding population. We performed an interval mapping approach (IM) and a step size of 1 Centimorgan (cM) with the genetical map from Zyprian et al. [54].
A chromosome-specific trait-linked "logarithm of the odds" (LOD) threshold was calculated with a permutation test (1000 iterations, p < 0.05). Genetic regions that cross this LOD threshold were defined as QTL regions. MapQTL6.0 was used to calculate QTLs [53].
In the next step, we compared QTLs derived from phenotypic data from the lab, the field, and predicted values from the best performing models of the test dataset (see results). We compared interval lengths, the LODmax values, and the LODmax positions between the three data sources (lab, Remote Sens. 2019, 11, 2953 9 of 22 field, and model) using separate paired t-Tests (i.e., lab vs. field and lab vs. model). Separate paired t-Tests were necessary since the number of common QTLs differed between comparisons.
We provide information about the LODmax position with the corresponding values and the explained phenotypic variation. Additional information about chromosome-specific significance thresholds (Chr. Spec. Sig), the nearest QTL bordering markers, and the nearest LODmax position markers are provided in Supplementary Table S5.

Relationship between Lab and Field Scans
The major difference from the point clouds, generated in the field, to the point clouds that are generated in the lab is that they have a considerable amount of missing points resulting in an incomplete point cloud and thus different and mostly lower phenotypic values ( Table 2). A high correlation was observed for all parameters recorded 360 • in the lab and their respective parameter values recorded in the field. The highest correlations were observed for mean diameter and mean volume (r = 0.94 and 0.96, respectively) with a linear relationship ( Table 2). Berry number and total volume correlations were r = 0.89−0.91, with a linear relationship. CVH showed a correlation of 0.79, and bunch length and bunch width showed correlations between r = 0.74 and 0.82, respectively ( Table 2).

Regression Modelling and Comparison of Performance
The different regression models revealed in some but not all cases better performance than the simple linear regression analysis (Table 3). In this study, all phenotypic traits were predicted with R 2 -values ranging between 0.54−0.82 for the validation data and 0.70−0.91 for the test data (Table 3). The gain in performance often led to slight improvements. Notably, models with svm poly fit showed the lowest RMSE in all cases and an R 2 of 0.86 for berry number, 0.71 for bunch width, 0.70 for bunch length, and 0.91 for total volume when applied to the external test data (Table 3).
The comparison of the performance of the best models from the validation data with the performance when applied on the test data revealed for berry number a RMSE of 24.29 with the lm stepAIC model based on transformed data and of 18.20 with the svm poly model (Table 3, Figure 5a,b).
For the bunch width, the RMSE of the lm stepAIC model based on transformed data was 13.80 mm, when applied on the test data, while the svm poly showed a better performance with a RMSE of 13.51 on the test data. (Table 3, Figure 5c,d).
For bunch length, the RMSE of the svm linear model was 19.96 mm for test data, while it was 19.24 mm for the svm poly model (Table 3, Figure 5e,f).
The best performance for total volume was observed for the svm poly on both the validation and the test data with a RMSE of 36.11 mL and 28.09 mL, respectively (Table 3, Figure 5g). The comparison of the performance of the best models from the validation data with the performance when applied on the test data revealed for berry number a RMSE of 24.29 with the lm stepAIC model based on transformed data and of 18.20 with the svm poly model (Table 3, Figure 5a,b).  data (a, c, e, g) and on the best models from the test data (b, d, f, g) (in all cases, multivariate models). On the x-axis are predicted values, on the y-axis, the observed values. The bisecting line is black, and the regression line blue.
For the bunch width, the RMSE of the lm stepAIC model based on transformed data was 13.80 mm, when applied on the test data, while the svm poly showed a better performance with a RMSE of 13.51 on the test data. (Table 3, Figure 5c,d).
For bunch length, the RMSE of the svm linear model was 19.96 mm for test data, while it was 19.24 mm for the svm poly model (Table 3, Figure 5e,f).
The best performance for total volume was observed for the svm poly on both the validation and the test data with a RMSE of 36.11 mL and 28.09 mL, respectively (Table 3, Figure 5g).  number (a,b), bunch width (c,d), bunch length (e,f), and total volume (g). Predictions are based on the best models from the validation data and applied on the test data (a,c,e,g) and on the best models from the test data (b,d,f,g) (in all cases, multivariate models). On the x-axis are predicted values, on the y-axis, the observed values. The bisecting line is black, and the regression line blue.

Residuals over Phenotypic Groups
Residuals of the svm poly models for berry number and total volume did not differ between cultivars (Figure 6a,d, see Supplementary Table S3 for ANOVA and Tukey's test results). For the trait bunch width, residuals of GF × VB, 'Felicia', 'Regent', and OIV204 were significantly lower than for 'Chardonnay' (Figure 6b, Supplementary Table S3). For bunch length, residuals were on average positive, indicating that the model underestimates the bunch length of the cultivars. This bias was lowest for GF × VB and 'Riesling', and highest for 'Felicia' and 'Regent' (Figure 6c, Supplementary  Table S3). cultivars (Figure 6a, d, see Supplementary 3 for ANOVA and Tukey's test results). For the trait bunch width, residuals of GF × VB, 'Felicia', 'Regent', and OIV204 were significantly lower than for 'Chardonnay' (Figure 6b, Supplementary 3). For bunch length, residuals were on average positive, indicating that the model underestimates the bunch length of the cultivars. This bias was lowest for GF × VB and 'Riesling', and highest for 'Felicia' and 'Regent' (Figure 6c, Supplementary 3).

Variable Importance
The corresponding predictor variable is shown for every respective response trait of the highest importance (Figure 7). For bunch width, several predictor variables showed a comparable and higher amount of importance (Figure 7b). In general, number of berries and total volume showed a high impact on predictor importance for every trait. (Figure 7). The berry size traits mean diameter and mean volume had moderate importance for berry number, bunch width, and total volume ( Figure   Figure 6. Distribution of residuals according to the phenotypic groups in the test dataset. Residuals for prediction with the Support Vector Machines with polynomial kernal (svm poly) for berry number, (a) bunch width, (b) bunch length, and (c) total volume (d). Groups that are indicated by the same letter do not differ at an alpha of 0.05 (Tukey's test).

Variable Importance
The corresponding predictor variable is shown for every respective response trait of the highest importance (Figure 7). For bunch width, several predictor variables showed a comparable and higher amount of importance (Figure 7b). In general, number of berries and total volume showed a high impact on predictor importance for every trait. (Figure 7). The berry size traits mean diameter and mean volume had moderate importance for berry number, bunch width, and total volume (Figure 7b,d) and lower importance for bunch length (Figure 7a,c). The categorical predictor for bunch density OIV 204 showed moderate to low importance for all predicted traits (Figure 7). OIV 208 also showed moderate importance for bunch width but rather small importance for berry number, bunch length, and total volume (Figure 7). Remote Sens. 2019, 11, x FOR PEER REVIEW 13 of 21 7b, d) and lower importance for bunch length (Figure 7a, c). The categorical predictor for bunch density OIV 204 showed moderate to low importance for all predicted traits (Figure 7). OIV 208 also showed moderate importance for bunch width but rather small importance for berry number, bunch length, and total volume (Figure 7).

Figure 7.
Variable importance for the predicted phenotypic traits for the best performing models on test data. Svm poly fit for berry number, (a) bunch width, (b) bunch length, and (c) total volume.

Comparison of Lab, Modelled, and Field Data for QTL Application
The lab data showed 14 QTLs, the field data 15 QTLs, and the modeled data 13 QTLs for the four parameters berry number, bunch width, bunch length, and total volume (Table 4). Detected QTLs explained between 8% and 13% of phenotypic variance. Comparing the positions of QTLs obtained for lab data, ten QTLs for field data and nine for modeled data showed an overlap, respectively (Table 4). LODmax values, positions, and interval lengths of the overlapping QTL regions did not differ between lab, field, and modeled data (Supplementary 4). Comparing QTLs based on 360° lab data the QTLs that are unique on field data are five and four for the modeled data (Table 4) Figure 7. Variable importance for the predicted phenotypic traits for the best performing models on test data. Svm poly fit for berry number, (a) bunch width, (b) bunch length, and (c) total volume.

Comparison of Lab, Modelled, and Field Data for QTL Application
The lab data showed 14 QTLs, the field data 15 QTLs, and the modeled data 13 QTLs for the four parameters berry number, bunch width, bunch length, and total volume (Table 4). Detected QTLs explained between 8% and 13% of phenotypic variance. Comparing the positions of QTLs obtained for lab data, ten QTLs for field data and nine for modeled data showed an overlap, respectively (Table 4). LODmax values, positions, and interval lengths of the overlapping QTL regions did not differ between lab, field, and modeled data (Supplementary Table S4). Comparing QTLs based on 360 • lab data the QTLs that are unique on field data are five and four for the modeled data (Table 4)

Relationship and Model Comparison Framework
In the present study, a portable, high-resolution 3D imaging sensor was applied directly in the vineyard in order to acquire dense but partial 3D point clouds of the visible side of grape bunches. The resulting partial phenotypic data were correlated with complement phenotypic data extracted from 360 • lab scans. We observed strong correlations between both data sets, especially for all berry related traits: berry number, berry diameter, berry volume, and total volume. For 3D based plant phenotyping, a few studies demonstrate that the amount of recording angle(s) can have an impact on the outcome of phenotypic parameter measurements, depending on the measured plant. For instance, Andújar et al. [31] captured RGB-D images of poplar seedlings from different single recording angles but also for 360 • in order to determine the best angle for the extraction of plant parameters. For one-year-old poplar plants, 90 • front view scans showed the highest similarities in comparison to 360 • scans for the estimation in total biomass and height. Sun and Wang [55] used a Kinect v2 sensor to scan tomato plants in the greenhouse. They used different recording angles (three vs. four recording angles) and converted the corresponding RGB-D images into 3D point clouds. The reconstruction methods based on the number of recording angles showed an impact on the outcome of the calculated phenotypic parameters. The height, width, and area of the canopy showed a lower SD and CV for point clouds reconstructed by four different recording angles. In contrast, no significant effect was observed for canopy volume estimation between the different recording angles. In our previous study [8], the robustness and reliability of the phenotyping pipeline (sensor application and '3D Bunch Tool') was already validated on approximately 300 different grape bunches, including 222 samples from the experimental population GF × VB. We found high correlations between the 360 • 3D data and proposed ground truth data (e.g., berry number and total volume with correlations, both 0.95). Together with the findings of [8], the observations indicate that the scanning of grape bunches from the visible front side in the field is suitable for the extraction of grape bunch architecture determining traits.
Correlations, however, do not provide information on the shape and slope of relationships. In our studies, the traits berry number, bunch width, bunch length, and total volume showed a shift in mean and max values between field and lab data. Hence, different modeling concepts, including simple linear models and more complex machine learning methods, are required. Notably, for the berry size traits (mean diameter and mean volume), almost no differences in min, mean, and max values were observed between field and lab data. Therefore, modeling was less required for berry size traits.
Every model algorithm follows different mathematical operations, which can lead to strong differences in performance [56]. Thus, it is reasonable to compare different regression models. In various disciplines, there are examples where the comparison of different regression models revealed wider or only slighter differences between models [56][57][58][59].
Focused on the validation data, the simplest regression model with univariate lm showed in comparison to more complex regression models, similar predictive performances, except for total volume and bunch width. For most of the regression models, only minor differences in performance were observed. For example, the RMSE for the berry number of the best three models ranged from 18.49-18.12, which indicates consistent performance across the majority of all tested regression models. The more complex multivariate regression models on transformed data (berry number, bunch width) and the svm models (bunch length, total volume) showed the best predictive performances. Our results demonstrate that direct predictors have a strong linear relationship and the highest importance on the prediction of 360 • values, whereas the other additional variables have a wider relationship and hence only small importance for berry number and bunch length. In contrast, the gap in importance between the direct predictor and the remaining predictor variables for bunch width and total volume is not that wide. These phenotypic traits a more complex, and measurement errors during the field scan may be larger in comparison to bunch length when the shape of the bunch is heterogeneous.
To further assess the quality and generalizability of a model, we applied it to independent data that were not used in model building [33,60]. The specimen, used in the test data do have either another genetical background or the same genetical background but were crossed independently in comparison to a specimen of training and validation data, which increased morphological variability. Interestingly the test data displayed differences in observed values, especially at the lower value range, i.e., the test data has extreme values outside the observations of the training data. Hence, in this study, all models were applied to the test data to assess their performance on a dataset with different properties. Particularly OLS-based multivariate regression model may be prone to unusual observations [61], which is also observable in our study. The multivariate regression models on transformed data dropped substantially in performance for the predicted traits berry number and total volume, whereas the svm poly showed the most stable and the best predictive performance for all predicted traits. In contrast, svms are characterized by seeking to minimize the impact of outliers, having the ability for a good generalization [33,62], and have been proven in many different disciplines [56,[63][64][65].
The traits berry number, bunch width, and total volume could be generalized better than bunch length in the test data. This is shown by the higher amount of explained variability and hence, lower RMSE in comparison to the validation data. One explanation for this might be that in this data set, a higher amount of morphological uniformity is present as it consists of more uniform replications from different cultivars. The residual plots for the largest phenotypic groups in the test dataset revealed significant differences for bunch length (and bunch width) between cultivars indicating a cultivar specific errors, which might be due to morphological attributes. Diago et al. and Millan et al. observed cultivar dependent differences in the prediction of berry number by 2D images, and they proposed that the different levels of compactness influenced model outcomes [15,66]. Further reasons for model uncertainties might be differences in the experimental procedure between field and lab. Measurements in the field, especially for bunch width and length, are more affected by the recording angle and the bunch position at the vine than 360 • measurements made in the lab. Moreover, the natural shape of the bunches is altered after harvest and fixation on a hook in the lab. Lower correlations for the shape parameters (bunch width and bunch length) supports this assumption. Further studies may clarify the impact of morphological attributes and the experimental set up on measurement errors. Taken together, our results showed that the best performing models on the validation data were not the best performing models for the external test data (especially for the prediction of berry number). The grape bunch architecture and related traits can strongly differ between grapevine varieties, breeding seedlings, or genetic resources, and thus, phenotypic variation can vary between years and investigated material. Based on our results, we propose a continuous and wider evaluation of regression models, especially when the phenotypic values of investigated samples reveal stronger differences.

Application for QTL-Analysis
In the past years, several studies investigated bunch architecture-related traits and could identify several genetic regions attributed to bunch compactness (an overview is given by Tello and Ibanez [9]). Reliable molecular markers are not available for breeding purposes, yet. One of the most challenging reasons for that is the lack of high-throughput and objective phenotyping methods enabling investigations of several hundreds of genotypes in a short time and the fact that several genetic regions are involved in regulating bunch architecture [9,16]. For proof of concept, the population chosen in this work was already elaborated for the segregation of bunch architecture-related traits in the recent study of Richter et al. [17]. In their study, manual and image-based measurements identified 30 stable QTLs for various bunch architecture-related traits over several years [17]. Our results revealed for all three methods (lab, modeled, field) a comparable number of QTLs (13-15) for four investigated traits (note that berry size-related traits are not considered, as they were not used for prediction modeling). Proposed QTL regions showed relatively wide confidence intervals, which is in accordance with known regions determining bunch compactness related traits [17]. Approximately two-thirds of the QTLs found from modelled and field data correspond to the QTLs revealed by the lab data. That means that both modelled data and field data contain partly the same population specific phenotypic variation as the data gathered in the lab, which is required for the detection of a QTL. The results suggest that in the investigated population, both methods are basically suitable for QTL-analysis. On the other hand, one-third of the identified regions are different from the lab approach. Thus, the applied predictive models do not fully restore the phenotypic variation of the lab data.
For the presented proof of concept, one-year data was used. Bunch architecture-related traits are greatly influenced by environmental conditions [9,67]. This can lead to variation in the location of a QTL between the seasons [17]. In order to confirm the position and/or identify possible false-positive QTLs, data of several years must be evaluated and compared to known regions in literature. Although we investigate only on one-year data, QTLs for berry number on chromosome 10 and chromosome 17 were found by all three methods and also proposed by Richter et al. [17]. Moreover, we found for all three methods the same LODmax associated marker VRZAG7 [17]. A QTL for mean berry volume is located on chromosome 12. All three methods locate a QTL for total volume in this study [17]. Total volume can likely be derived from the mean berry volume. These results support our initial assumption that modeled and field data can be used for the investigation of QTLs.
Since the data acquisition with the sensor only takes a few seconds in the field and harvesting the bunches is not necessary, we propose larger screenings of multiple populations or larger genetical repositories over several years and the use of both, the modeled and the field data for further genetic studies. Future studies should also consider if the alteration of the shape by harvesting might have an impact on QTL analysis and detection.

Future Prospects
In our study, we demonstrated that the proposed 3D phenotyping pipeline allows a fast and non-destructive application on grape bunches directly in the field. The future data recording should lead towards a vehicle-driven, more automized approach. The 3D sensor that we used in this study is a structured-light based device and allows a scanning distance range between 25-30 cm from the object and a recording speed between six and seven frames per second. This makes the application on a vehicle, for example, difficult. From a technical point of view, data acquisition would require slow speed and extremely short distances between our used sensor and the grape bunches.
For viticulture, literature offers several examples for an automized data acquisition in the field. The majority of those studies focus on the estimation of yield components based on 2D image approaches, thus the determination of berry number and berry size, directly on the vine [66,68,69]. Liu and Whitty [67] and Pérez-Zavala et al. [68] were able to detect grape bunches by 2D image analysis [70,71]. In a recent study, Di Gennaro et al. [72] used images from a UAV to detect bunches from RGB-images. Based on their detection method, they could predict yield on the vines with R2 = 0.82 [72]. Another possibility for 3D based phenotyping would be the use of another sensor system that would allow a wider range and a faster data acquisition. In our experiments, we used an artificial background in order to prevent false phenotypic results from inferring leaves and/or adjacent bunches. In this case, using an artificial background would be obstructive. The work of Rose et al. demonstrates an interesting procedure. They proposed a more autonomous concept in order to detect yield parameters. With a track-driven vehicle, they collected geotagged images. A multi-view stereo approach was used to reconstruct the 3D structure of a whole grapevine row. Furthermore, a supervised classification with an incremental support vector machine was applied. They could classify the grape bunches in the grapevine row a recall of up to 94% and a precision of 62.8% from the classified grape bunches.
Taken together, these examples show the possibility of detecting bunches and berry traits with 2D as well as 3D techniques. All these methods offer approaches towards the development of a more automized strategy for the assessment of bunch architecture-related traits. This would enable another level in quantity of high-throughput field phenotyping and thus offering new perspectives in understanding this complex trait.

Conclusions
This is the first study that acquired 3D sensor data of approximately 2000 different grape bunches in both environments, non-invasively in the field and under controlled lab conditions. The material used covers a large morphological range (experimental breeding material, OIV 204 reference cultivars, and commercially important cultivars). 3D data acquisition directly in the vineyard resulting in a dense but partial 3D point cloud of the visible side of grape bunches. After an automated extraction of bunch architecture-related traits, different regression models were applied and tested in order to predict the full information of phenotypic traits from partial field scans. We show that despite the highly variable morphology, phenotypic trait data from the visible side of the bunch in the field is related to 360 • lab data. Gathering 360 • data is only possible invasively and laboriously in the lab. Our model comparison approach allows, based on field data, the prediction of the full 360 • phenotypic values with high accuracies. The QTL-analysis further showed that phenotypic data derived directly from the field can be used for genetical analyses. This study is an initial development for more extensive field phenotyping of grape bunch architecture determining traits. The application of our proposed method offers the possibility of screening larger or several experimental populations and genetic repositories. This will promote the identification of genetic regions that are responsible for bunch architecture-related traits. This further can accelerate the development of genetic markers for breeding purposes or the identification of candidate genes within genetic association studies.
In order to increase Botrytis resiliency, breeders could use this 3D pipeline to characterize breeding material and select suitable genotypes with looser bunches. Moreover, these bunch physiological traits can also be used for the investigation of yield forming processes.  Table S3 gives an overview of the ANOVA and Tukey's test results for the residuals of the phenotypic groups. Supplementary Table S4 provides the results of the t-tests for the QTL comparisons. Additional information, including Chr. Spec. Sig, the nearest QTL bordering markers, and the nearest LODmax position markers for the overlapping QTLs, are presented in Supplementary  Table S5.