Article

Combination of an Automated 3D Field Phenotyping Workflow and Predictive Modelling for High-Throughput and Non-Invasive Phenotyping of Grape Bunches

1
Julius Kühn-Institute, Institute for Grapevine Breeding Geilweilerhof, 76833 Siebeldingen, Germany
2
Institute of Crop Science and Resource Conservation (INRES)–Plant Breeding, University of Bonn, 53113 Bonn, Germany
3
Julius Kühn-Institute, Institute for Crop and Soil Science, Bundesallee 58, 38116 Braunschweig, Germany
4
Institute of Computer Science 4, University of Bonn, Endenicher Allee 19 A, 53115 Bonn, Germany
*
Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(24), 2953; https://doi.org/10.3390/rs11242953
Submission received: 15 October 2019 / Revised: 6 December 2019 / Accepted: 7 December 2019 / Published: 10 December 2019
(This article belongs to the Special Issue Advanced Imaging for Plant Phenotyping)

Abstract

In grapevine breeding, loose grape bunch architecture is one of the most important selection traits, contributing to an increased resilience towards Botrytis bunch rot. Grape bunch architecture is mainly influenced by the berry number, berry size, the total berry volume, and bunch width and length. For an objective, precise, and high-throughput assessment of these architectural traits, the 3D imaging sensor Artec® Spider was applied to gather dense point clouds of the visible side of grape bunches directly in the field. Data acquisition in the field is much faster and non-destructive in comparison to lab applications but results in incomplete point clouds and, thus, mostly incomplete phenotypic values. Therefore, lab scans of whole bunches (360°) were used as ground truth. We observed strong correlations between field and lab data but also shifts in mean and max values, especially for the berry number and total berry volume. For this reason, the present study is focused on the training and validation of different predictive regression models using 3D data from approximately 2000 different grape bunches in order to predict incomplete bunch traits from field data. Modeling concepts included simple linear regression and machine learning-based approaches. The support vector machine was the best and most robust regression model, predicting the phenotypic traits with an R2 of 0.70–0.91. As a breeding orientated proof-of-concept, we additionally performed a Quantitative Trait Loci (QTL)-analysis with both the field modeled and lab data. All types of data resulted in joint QTL regions, indicating that this innovative, fast, and non-destructive phenotyping method is also applicable for molecular marker development and grapevine breeding research.

1. Introduction

Grapevines are one of the economically most important fruit crops in the world, with a high requirement for quality and phytosanitary aspects. Fungal pathogens like Plasmopara viticola (P. viticola), Erysiphe necator (E. necator), and Botrytis cinerea (B. cinerea) are a challenge for winemakers around the world, requiring intense crop management, and the application of fungicides during the vegetation period. In the past decade, grapevine breeders have made remarkable progress in breeding new resistant varieties by the combination of important quality traits with active resistance genes against P. viticola and E. necator [1]. In contrast, for B. cinerea, an effective cellular defense mechanism has not been reported for grapevines yet. Therefore, plant resistances against B. cinerea are not available for breeding purposes.
Several studies describe a link between a compact bunch structure and an enhanced risk for bunch rot infestations caused by B. cinerea; conversely, a loose bunch structure provides an enhanced resilience towards Botrytis bunch rot [2,3,4,5]. Loose bunch structure is therefore an important selection trait for grapevine breeding [6,7,8]. The degree of compactness is strongly related to the combination of berry traits (e.g., berry number and berry size) and stem traits (e.g., the bunch width and length) (a detailed overview is given by Tello & Ibáñez [9]). The combination of these different berry and stem traits represents the grape bunch architecture. For scoring the degree of bunch compactness, a qualitative descriptor was developed by the International Organisation of Vine and Wine (OIV). The OIV descriptor 204 (OIV 204) rates the level of compactness/density in five categories, from class 1 (very loose bunch architecture) up to class 9 (very dense bunch architecture) [10]. This categorical classification system is rather superficial, as it does not provide any quantitative information about the sub-traits of the grape bunch architecture that have a high impact on bunch density, namely the combination of berry number, total berry volume, and grape bunch length [8,9,11]. Moreover, these traits have a strong link to yields [12]. Precise (quantitative) data on these sub-traits are needed to identify underlying genetic determinants by Quantitative Trait Loci (QTL) analysis and for marker development in SMART-breeding approaches. These tools are needed to develop valid genetic markers, which increase breeding efficiency by enabling earlier seedling selection. The bottleneck for using these highly efficient genotyping technologies is the lack of comparably efficient phenotyping methods [13].
In the recent past, a few studies have applied two- or three-dimensional (2D or 3D) sensor systems. Automatic and semi-automatic image analysis was often used to extract phenotypic traits from Red-Green-Blue (RGB) images of grapes [14,15,16,17]. Based on the extracted phenotypic traits of 135 table grape bunches of Vitis labruscana cv. Kyoto, Chen et al. performed a predictive modeling approach in order to automatically grade the compactness of bunches from images [15]. In contrast to 2D imaging, 3D data are fully informative and highly precise. The additional dimension gives important information about the shape, structure, and volume of the scanned object [18,19,20]. Several studies have addressed the potential of 3D based plant phenotyping methods within different scenarios. Measurement of vegetative parameters like plant and canopy height, leaf parameters, fruit detection, and characterization of crop and fruit plants have been described (an overview is given by Paulus [21]). To capture the phenotypically complex and highly varied characteristics of grape bunches, several studies applied 3D scanning under controlled lab conditions in order to obtain the full 360° structure of the bunches [8,22,23,24,25,26,27]. The limitation of several 3D methods is a reduced throughput, which restricts the number of grapes that can be used for analysis. Recently, we established an automated 3D based phenotyping pipeline that is 10 times faster than 2D based approaches [8]. The application of the method results in dense and accurate 3D data of the whole grape bunches. Scanning takes only a few seconds (depending on the shape and size of the bunch). The subsequent automated analysis framework (operable through an intuitive graphical user interface) extracts important bunch architecture-related traits from 3D scans, such as berry number, mean berry diameter and volume, total berry volume (total volume), bunch width, and bunch length.
In these experiments, high correlation coefficients of up to 0.95 for berry number and total volume within a wide range of different phenotypes (cultivars and specimens of an experimental breeding population) were observed. A correlation of r = 0.9 was observed for berry volume in comparison to 2D ground truth data [8,27]. Higher variations were observed for the correlation coefficients of bunch length between the cultivar and the specimen of the experimental breeding population (0.85–0.57, respectively). Lowest correlation values were observed for bunch width comparisons between 3D and 2D ground truth data (r = 0.59) [8]. However, all these mentioned studies mainly focused on an invasive lab approach. This means that bunches of interest must be harvested and brought to the lab, which is again time-consuming and a source of errors.
Studies on grape bunch development or characterization of grapevine seedlings in the first step of selection require non-invasive phenotyping approaches that must be applied directly in the field.
Most field phenotyping in viticulture concerns the prediction of yield parameters. Herrero-Huerta et al. [28] collected images of 20 grape bunches from different overlapping angles. A combination of photogrammetry and computer vision techniques enabled the reconstruction of 3D point clouds of the bunches. They compared the number of berries and the weight derived from the point clouds to the actual values and obtained R2 of 0.78 and 0.80, respectively [28]. In the study of Rose et al. [29], a multi-view stereo approach was used to record grape bunches directly at vine rows. They estimated the berry number and berry size parameters from the partial point clouds. Hacking et al. [30] used a Kinect 3D sensor to reconstruct 20 grape bunches from the field. They used fruit volume estimation to predict yield with r = 0.61.
Focusing on grape bunch architecture-related traits, Rist et al. [8] tested the applicability of a structured-light based sensor in the field and conducted a first proof-of-principle study by scanning a smaller subset of 48 grape bunches from four different cultivars. The major difference in the field is that only the visible side of the bunch can be scanned. Scanning in the field is further restricted by the recording angle. For 3D plant phenotyping, few studies investigate the impact of the recording angles on the estimation of phenotypic traits. Depending on the plant and the measured trait, single recording angles can affect phenotypic measurements [31,32]. The comparison between the field and the 360° scans of the 48 grape bunches revealed that only approximately 30–40% of the bunch structure can be recorded (due to the recording angle and depending on size and shape of the bunch and the presence of secondary bunches). As a result, phenotypic values differ between the two approaches [8]. From a breeding or scientific point of view, knowledge of the complete 3D information is very important, especially to obtain the total berry number and, thus, the total volume of grapes for yield estimation, genetic studies, or selection of breeding material. Thus, phenotypic data need to be predicted by statistical methods in order to obtain full information from partial 3D field scans. Making predictions based on patterns and relationships in datasets is known as predictive modeling [33]. An enormous number of different regression methods can be used for finding those patterns that explain variation in the dependent variable. Mathematically simpler approaches like linear regression models or more complex, machine learning based black-box models are used in order to increase prediction efficacy and minimize prediction errors [34].
The present study addresses the adaptation and validation of a lab-established, high-throughput 3D-based phenotyping pipeline for automated and objective analysis of grape bunch architecture traits under field conditions. The study also aimed to minimize the loss of information due to partial field scans by predicting the phenotypic values of 360° lab scans on the basis of the field scans. Therefore, we scanned grapevine material with different genetic backgrounds, representing a wide range of phenotypically variable grape bunches, in both field and lab environments. Moreover, a comparative QTL analysis was conducted in order to test the usability of this method for applications in molecular grapevine breeding and genetic research. The study is divided into three main tasks: 1. Assessing the relationships between full scans taken under standardized 360° lab conditions and partial field scans across a preferably broad spectrum of phenotypic variation of nearly 2000 grape bunches; 2. Comparing the predictive performance of several regression models of different complexity for predicting missing bunch traits like berry number from field scan data; 3. Providing a proof-of-concept that the 3D based phenotypic data are valid for QTL-analysis.

2. Materials and Methods

2.1. Plant Material and Sampling

This study took place at the experimental vineyards at JKI Geilweilerhof in Siebeldingen, Germany. We used a total of 1907 grape bunches (= samples) from V. vinifera. Thereof, 1478 samples came from 150 genotypes of the F1 progeny of GF.GA-47-42 (syn. ‘Calardis Musqué’) × ‘Villard Blanc’ (GF × VB), including parents (Figure 1). This experimental breeding population (four plants of each genotype) was planted twice on two adjoining plots at Geilweilerhof in 2000. An additional 213 samples came from 68 genotypes of another GF × VB cross, planted in 2013 (a single plant each). A further 56 samples came from the OIV 204 reference cultivars ‘Uva rara’, ‘Pearl of Csaba’, ‘Cardinal’, ‘Chasselas’, ‘Chenin Blanc’, ‘Sauvignon Blanc’, ‘Silvaner’, and ‘Pinot Meunier’, and from the grandparents of GF × VB: ‘Bacchus’, ‘Seyval’, Seibel 6468, and ‘Subereux’. All above-mentioned plants were trained in a vertical shoot position (VSP). In addition, 160 samples came from four different cultivars, ‘Riesling’, ‘Regent’, ‘Chardonnay’, and ‘Felicia’, that were trained either in VSP or as semi minimal pruned hedge (SMPH).
All samples were taken during two consecutive growing seasons in 2017 and 2018 from the basal insertion of the first three central shoots of the fruit cane and cut directly at the shoot at maturity, according to BBCH89 [35]. Every sampled bunch was rated according to the OIV descriptor OIV 204 and OIV 208 (shape) [10]. All vines were treated with best local practice policies for viticulture.

2.2. Field and Lab Scans

For 3D imaging, we used the 3D scanner Artec Spider (Artec 3D, L-1466, Luxembourg). Bunches were scanned in the field from the visible side according to the maximum scanning range of the sensor of 25–30 cm distance. Approximately 30 to 40% of the bunch structure could be recorded depending on bunch shape (secondary bunches) and bunch size. An artificial background was used in the case of interfering leaves and/or adjacent bunches to prevent false phenotypic results. Right after the field scan, bunches were harvested, transported to the lab, and fixed to a hook, which was attached to a motorized device. For the lab scan, bunches were rotated at different speeds (between 0.5 s−1 and 0.16 s−1 depending on size and shape) and scanned over 360° under controlled light conditions. Scanning distance was kept constant in the lab and in the field. According to [8], bunches were scanned until the entire structure was captured (Figure 2a). The resulting point clouds of the bunches from field and lab scans (Figure 2b) were stored in polygon file format and analyzed with the software ‘3D Bunch Tool’.
The 3D Bunch Tool works in the following way: A region growing is applied to the point cloud to create segments containing one berry each. In the resulting segments, spheres as parameterized representations of berries are fitted based on the Random Sample Consensus algorithm. All sphere hypotheses containing more than 100 inliers are counted as valid. In a post-processing step, sphere hypotheses that overlap to more than 25% are compared, and only the sphere with the highest number of inliers is kept. From the final set of spheres, the berry number, mean berry diameter (mean diameter) (mm), mean berry volume (mean volume) (mL), and total berry volume (mL) (total volume) as the summed up volumes of all detected spheres are computed. Bunch width (mm), bunch length (mm), and convex hull volume (CVH) (mL) are derived from the points that lie in close distance to the surface of a sphere, with bunch width being the largest distance between two of the points on a horizontal axis and bunch length the largest distance on the vertical axis (Figure 3) [8].
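The post-processing and trait computation described above can be illustrated with a minimal sketch. It is not the 3D Bunch Tool itself: the `Sphere` record, the function names, the greedy selection order, and the definition of overlap (intersection volume divided by the volume of the smaller sphere) are assumptions made for illustration, since the text does not specify how the 25% overlap is measured.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    x: float
    y: float
    z: float
    radius: float  # mm
    inliers: int   # points supporting this sphere hypothesis


def volume_mm3(radius):
    return 4.0 / 3.0 * math.pi * radius ** 3


def overlap_fraction(a, b):
    """Intersection volume relative to the smaller sphere (assumed definition)."""
    d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    s = a.radius + b.radius
    if d >= s:
        return 0.0  # spheres do not touch
    if d <= abs(a.radius - b.radius):
        return 1.0  # smaller sphere lies completely inside the larger one
    lens = (math.pi * (s - d) ** 2
            * (d * d + 2 * d * s - 3 * (a.radius - b.radius) ** 2) / (12 * d))
    return lens / volume_mm3(min(a.radius, b.radius))


def select_berries(spheres, min_inliers=100, max_overlap=0.25):
    """Keep hypotheses with more than min_inliers; among strongly overlapping
    ones keep the sphere with the most inliers (greedy approximation)."""
    valid = sorted((s for s in spheres if s.inliers > min_inliers),
                   key=lambda s: s.inliers, reverse=True)
    kept = []
    for s in valid:
        if all(overlap_fraction(s, k) <= max_overlap for k in kept):
            kept.append(s)
    return kept


def bunch_traits(berries):
    n = len(berries)
    total_ml = sum(volume_mm3(b.radius) for b in berries) / 1000.0  # 1 mL = 1000 mm^3
    return {
        "berry_number": n,
        "mean_diameter_mm": sum(2 * b.radius for b in berries) / n if n else 0.0,
        "mean_volume_ml": total_ml / n if n else 0.0,
        "total_volume_ml": total_ml,
    }
```

Calling `bunch_traits(select_berries(spheres))` on a list of fitted spheres returns the berry number, mean diameter, mean volume, and total volume in the units used in the text.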

2.3. Statistical Analysis

2.3.1. Correlation between 360°-Lab vs. Field Scans to Investigate Important Grape Bunch traits

To determine the relationship between data from 360° lab and field scans, berry number, mean diameter, mean volume, total volume, CVH, bunch width, and bunch length from the lab and field were compared using Pearson correlation coefficients (α = 0.05) for all phenotypic data. An overview of the complete workflow is shown in Figure 4.
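As a minimal sketch of the comparison described above, the Pearson correlation coefficient between paired lab and field values of one trait can be computed as follows. The function name and the example values are hypothetical, and the significance test at α = 0.05 is not included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical berry numbers of five bunches (field scan vs. 360° lab scan)
field_berry_number = [40, 55, 63, 80, 95]
lab_berry_number = [110, 150, 160, 215, 250]
r = pearson_r(field_berry_number, lab_berry_number)
```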

2.3.2. Data Setting and Preprocessing

The experimental plant material (n = 1478 samples) was randomly partitioned into 2/3 training and 1/3 validation data (n = 986 vs. n = 492). It displays a wide range of morphological characteristics (Table 1). All classes of OIV 204 (Class 1: 13%, Class 3: 32%, Class 5: 36%, Class 7: 14%, Class 9: 5%) and OIV 208 (Class 1: 29%, Class 2: 33%, Class 3: 36%) were present. Approximately 35% of the individuals showed large secondary bunches. Fifteen individuals had a very small number of berries and uncommon shapes due to flower abscission (less than 40 berries per bunch) (Figure 1) [8,17]. All remaining data were used as an external test dataset (n = 428). The descriptive statistics are shown in Supplementary Table S1. This data set contained a broader genetic background with breeding samples, commercially relevant cultivar samples, and OIV 204 representative specimens. For data preprocessing, missing values were excluded, and some variables were square-root (√x) or power (x^0.3) transformed to linearize relationships between predictor and dependent variables, following Tukey and Mosteller’s bulging rule [36]. An overview is given in Supplementary Table S2.
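The partitioning and the two transformations can be sketched as follows. This is an illustration only: the seed, the function names, and rounding the training share up (which reproduces 986 vs. 492 for n = 1478) are assumptions, and which variables were transformed is documented in Supplementary Table S2, not here.

```python
import math
import random


def split_train_validation(sample_ids, train_fraction=2 / 3, seed=42):
    """Randomly split sample IDs; the seed is arbitrary and only makes the split repeatable."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = math.ceil(len(ids) * train_fraction)  # rounding up is an assumption
    return ids[:n_train], ids[n_train:]


def sqrt_transform(values):
    return [math.sqrt(v) for v in values]


def power_transform(values, exponent=0.3):
    return [v ** exponent for v in values]


train_ids, validation_ids = split_train_validation(range(1478))
```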

2.3.3. Comparison of Different Regression Models

In order to predict the four important phenotypic traits (berry number, bunch width, bunch length, total volume) of 360° lab scans from phenotypic values from field scans, several regression models were compared. In the first approach, a simple (univariate) linear regression model (lm) was fitted with the individual phenotypic trait from the 360° lab scans as a dependent variable and the corresponding trait from the field as a predictor variable. In the second approach, models were extended by all phenotypic traits (berry number, berry diameter, berry volume, bunch width, bunch length, CVH, and total volume) from the field as predictor variables (i.e., multivariate models). Table 1 shows a complete overview of the regression models used. These included linear regression with stepwise feature selection on raw and transformed data (lm stepAIC), Generalized Linear Model with Stepwise Feature Selection (glmAIC_model), Bayesian Generalized Linear Model (bayes glm), Lasso and Elastic-Net Regularized Generalized Linear Model (glmnet_model), Support Vector Machines with linear and polynomial kernel (svm linear, svm poly), and Random Forest (random forest). The glms were fitted with Poisson distribution and log-link for the berry number, and with gamma distribution and log-link for bunch length, bunch width, and total volume. In addition, a negative binomial glm with log-link was fitted for berry number (Table 1). The lm, lm stepAIC, glmAIC_model, and bayes glm models were parameterized using tenfold-repeated cross-validation, while the models that require hyperparameters were tuned with an adaptive resampling search on tenfold-repeated cross-validation according to Kuhn 2014 [43].
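The first approach, a univariate linear model with the lab trait as dependent variable and the corresponding field trait as predictor, can be sketched with ordinary least squares. This is a plain illustration and not the caret workflow used in the study; the function names are hypothetical.

```python
def fit_simple_lm(field_values, lab_values):
    """Ordinary least squares fit of lab = intercept + slope * field."""
    n = len(field_values)
    mx = sum(field_values) / n
    my = sum(lab_values) / n
    sxx = sum((x - mx) ** 2 for x in field_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(field_values, lab_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict_lab(model, field_values):
    """Predict 360° lab values from field values with a fitted (intercept, slope) pair."""
    intercept, slope = model
    return [intercept + slope * x for x in field_values]
```

A model fitted on the training data would then be applied to the field values of the validation or test data with `predict_lab`.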
For model evaluation and quantification of model performance, the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R2) were calculated on training, validation, and test data. We ranked the models according to the lowest error on the respective data set and compared them with each other. Model diagnostics were performed by plotting observed versus predicted values to visualize model fit, including potential outliers and bias. Additionally, in the test data, we assessed the model fit of the best models by performing an analysis of variance (ANOVA) and Tukey’s test on the model residuals of the largest phenotypic groups, i.e., GF × VB, ‘Riesling’, ‘Regent’, ‘Chardonnay’, and ‘Felicia’, with all remaining samples characterized as the OIV204 group. Variable importance was determined in a sensitivity analysis by changing one parameter value at a time and comparing the effect on the model output [44,45].
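The two performance measures can be written down in a few lines. This sketch uses R2 = 1 − SS_res/SS_tot, which is one common definition; some modelling tools instead report the squared correlation between observed and predicted values, and the text does not state which variant was used.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the unit of the trait."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed variant)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```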
All statistical analyses were performed using R version 3.5.2 [46] with the caret package to build the modelling framework [47], the ggplot2 [48], ggpubr [49], and lattice packages [50] for graphics, the emmeans package for the Tukey test [51], and the rminer package [52] for the estimation of variable importance.

2.4. QTL-Analysis

For 150 progenies of the experimental breeding population GF × VB, a QTL-analysis was performed on the data of 2018 with MapQTL6.0 [53]. The software uses as input a genetic map together with phenotypic and molecular marker data of the breeding population. The probability of a linkage between phenotypic variation patterns and the segregating marker positions can then be calculated for each position on the map, determining the position of a QTL. This probability is estimated by the “logarithm of the odds” (LOD) ratio. It refers to the correlation between the phenotypic values and the corresponding molecular markers of each genotype in the experimental breeding population. We performed an interval mapping approach (IM) with a step size of 1 Centimorgan (cM), using the genetic map from Zyprian et al. [54].
A chromosome-specific, trait-linked LOD threshold was calculated with a permutation test (1000 iterations, p < 0.05). Genetic regions that cross this LOD threshold were defined as QTL regions.
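The idea of a permutation-based LOD threshold can be sketched as follows. This is not the MapQTL procedure: instead of interval mapping, the sketch swaps in a single-marker regression LOD, LOD = (n/2) · log10(RSS0/RSS1), and all names, the 0/1 marker coding, and the seed are assumptions made for illustration.

```python
import math
import random


def marker_lod(genotypes, phenotypes):
    """LOD of a single-marker linear regression (stand-in for interval mapping)."""
    n = len(phenotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotypes) / n
    sxx = sum((g - mg) ** 2 for g in genotypes)
    sxy = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    rss0 = sum((p - mp) ** 2 for p in phenotypes)  # model without marker effect
    if sxx == 0 or rss0 == 0:
        return 0.0
    rss1 = max(rss0 - sxy * sxy / sxx, 1e-12 * rss0)  # model with marker effect
    return n / 2 * math.log10(rss0 / rss1)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Threshold as the (1 - alpha) quantile of the maximum LOD over all markers
    of one chromosome, obtained by shuffling phenotypes among genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(marker_lod(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]
```

Positions whose observed LOD exceeds the returned threshold would be reported as part of a QTL region in this simplified setting.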
In the next step, we compared QTLs derived from phenotypic data from the lab, the field, and predicted values from the best performing models of the test dataset (see results). We compared interval lengths, the LODmax values, and the LODmax positions between the three data sources (lab, field, and model) using separate paired t-Tests (i.e., lab vs. field and lab vs. model). Separate paired t-Tests were necessary since the number of common QTLs differed between comparisons.
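A paired t-test on QTL characteristics (for example LODmax values of QTLs found in both lab and field data) reduces to a one-sample test on the pairwise differences. The sketch below computes only the test statistic and the degrees of freedom; the p-value lookup is omitted, and the function name is hypothetical.

```python
import math


def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for paired samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / math.sqrt(variance / n), n - 1
```

Because only QTLs shared by both data sources form pairs, the number of pairs differs between the lab vs. field and lab vs. model comparisons, which is why the tests are run separately.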
We provide information about the LODmax position with the corresponding values and the explained phenotypic variation. Additional information about chromosome-specific significance thresholds (Chr. Spec. Sig), the nearest QTL bordering markers, and the nearest LODmax position markers are provided in Supplementary Table S5.

3. Results

3.1. Relationship between Lab and Field Scans

The major difference between the point clouds generated in the field and those generated in the lab is that the former have a considerable number of missing points, resulting in incomplete point clouds and thus different and mostly lower phenotypic values (Table 2). A high correlation was observed between all parameters recorded over 360° in the lab and their respective parameter values recorded in the field. The highest correlations were observed for mean diameter and mean volume (r = 0.94 and 0.96, respectively) with a linear relationship (Table 2). Berry number and total volume correlations were r = 0.89−0.91, with a linear relationship. CVH showed a correlation of 0.79, and bunch length and bunch width showed correlations of r = 0.74 and 0.82, respectively (Table 2).

3.2. Regression Modelling and Comparison of Performance

In some but not all cases, the more complex regression models performed better than the simple linear regression analysis (Table 3). In this study, all phenotypic traits were predicted with R2-values ranging between 0.54−0.82 for the validation data and 0.70−0.91 for the test data (Table 3). The gain in performance was often only slight. Notably, models with svm poly fit showed the lowest RMSE in all cases and an R2 of 0.86 for berry number, 0.71 for bunch width, 0.70 for bunch length, and 0.91 for total volume when applied to the external test data (Table 3).
The comparison of the performance of the best models from the validation data with the performance when applied on the test data revealed for berry number a RMSE of 24.29 with the lm stepAIC model based on transformed data and of 18.20 with the svm poly model (Table 3, Figure 5a,b).
For bunch width, the RMSE of the lm stepAIC model based on transformed data was 13.80 mm when applied to the test data, while the svm poly showed a better performance with an RMSE of 13.51 mm on the test data (Table 3, Figure 5c,d).
For bunch length, the RMSE of the svm linear model was 19.96 mm for test data, while it was 19.24 mm for the svm poly model (Table 3, Figure 5e,f).
The best performance for total volume was observed for the svm poly on both the validation and the test data with a RMSE of 36.11 mL and 28.09 mL, respectively (Table 3, Figure 5g).

3.3. Residuals over Phenotypic Groups

Residuals of the svm poly models for berry number and total volume did not differ between cultivars (Figure 6a,d, see Supplementary Table S3 for ANOVA and Tukey’s test results). For the trait bunch width, residuals of GF × VB, ‘Felicia’, ‘Regent’, and OIV204 were significantly lower than for ‘Chardonnay’ (Figure 6b, Supplementary Table S3). For bunch length, residuals were on average positive, indicating that the model underestimates the bunch length of the cultivars. This bias was lowest for GF × VB and ‘Riesling’, and highest for ‘Felicia’ and ‘Regent’ (Figure 6c, Supplementary Table S3).
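The direction of the bias discussed above follows from the residual definition observed minus predicted: a positive mean residual means the model underestimates the trait. A minimal sketch of the per-group summary is given below; it does not perform the ANOVA or Tukey's test, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_residual_by_group(groups, observed, predicted):
    """Mean of (observed - predicted) per phenotypic group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, obs, pred in zip(groups, observed, predicted):
        sums[group] += obs - pred
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}
```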

3.4. Variable Importance

For every response trait, the corresponding (direct) predictor variable showed the highest importance (Figure 7). For bunch width, several predictor variables showed a comparably high importance (Figure 7b). In general, berry number and total volume showed a high importance for every trait (Figure 7). The berry size traits mean diameter and mean volume had moderate importance for berry number, bunch width, and total volume (Figure 7b,d) and lower importance for bunch length (Figure 7a,c). The categorical predictor for bunch density, OIV 204, showed moderate to low importance for all predicted traits (Figure 7). OIV 208 also showed moderate importance for bunch width but rather small importance for berry number, bunch length, and total volume (Figure 7).
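The principle of a one-at-a-time sensitivity analysis can be sketched as follows. This is not the rminer implementation: the use of the output range as sensitivity measure, the number of levels, holding the other predictors at their mean, and all names are assumptions chosen for illustration.

```python
def one_at_a_time_importance(model, rows, levels=7):
    """Relative importance of each predictor for a fitted model.

    model: callable taking a dict {predictor: value} and returning a prediction.
    rows:  list of such dicts (the data used to derive ranges and means).
    """
    names = list(rows[0])
    baseline = {k: sum(r[k] for r in rows) / len(rows) for k in names}
    spread = {}
    for k in names:
        low = min(r[k] for r in rows)
        high = max(r[k] for r in rows)
        outputs = []
        for i in range(levels):
            probe = dict(baseline)  # all other predictors stay at their mean
            probe[k] = low + (high - low) * i / (levels - 1)
            outputs.append(model(probe))
        spread[k] = max(outputs) - min(outputs)
    total = sum(spread.values())
    return {k: (v / total if total else 0.0) for k, v in spread.items()}
```

Predictors whose variation changes the model output most receive the largest share; the shares sum to one.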

3.5. Comparison of Lab, Modelled, and Field Data for QTL Application

The lab data showed 14 QTLs, the field data 15 QTLs, and the modeled data 13 QTLs for the four parameters berry number, bunch width, bunch length, and total volume (Table 4). Detected QTLs explained between 8% and 13% of phenotypic variance. Compared with the positions of the QTLs obtained for lab data, ten QTLs for field data and nine for modeled data overlapped (Table 4). LODmax values, positions, and interval lengths of the overlapping QTL regions did not differ between lab, field, and modeled data (Supplementary Table S4). Relative to the QTLs based on 360° lab data, five QTLs were unique to the field data and four to the modeled data (Table 4).
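Whether a QTL from field or modeled data counts as shared with the lab data can be decided by a simple interval comparison. The sketch below assumes that two QTLs overlap when they concern the same trait and chromosome and their cM intervals intersect; the record layout and names are hypothetical, and the exact criterion used for Table 4 is not stated in the text.

```python
from typing import NamedTuple


class QTL(NamedTuple):
    trait: str
    chromosome: int
    start_cm: float
    end_cm: float


def overlaps(a, b):
    """True if both QTLs share trait and chromosome and their intervals intersect."""
    return (a.trait == b.trait and a.chromosome == b.chromosome
            and a.start_cm <= b.end_cm and b.start_cm <= a.end_cm)


def split_shared_unique(reference, other):
    """Split the QTLs in `other` into those overlapping a reference QTL and the rest."""
    shared = [q for q in other if any(overlaps(q, r) for r in reference)]
    unique = [q for q in other if q not in shared]
    return shared, unique
```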

4. Discussion

4.1. Relationship and Model Comparison Framework

In the present study, a portable, high-resolution 3D imaging sensor was applied directly in the vineyard in order to acquire dense but partial 3D point clouds of the visible side of grape bunches. The resulting partial phenotypic data were correlated with complete phenotypic data extracted from 360° lab scans. We observed strong correlations between both data sets, especially for all berry related traits: berry number, berry diameter, berry volume, and total volume. For 3D based plant phenotyping, a few studies demonstrate that the number of recording angles can have an impact on the outcome of phenotypic parameter measurements, depending on the measured plant. For instance, Andújar et al. [31] captured RGB-D images of poplar seedlings from different single recording angles but also over 360° in order to determine the best angle for the extraction of plant parameters. For one-year-old poplar plants, 90° front view scans showed the highest similarities in comparison to 360° scans for the estimation of total biomass and height. Sun and Wang [55] used a Kinect v2 sensor to scan tomato plants in the greenhouse. They used different numbers of recording angles (three vs. four) and converted the corresponding RGB-D images into 3D point clouds. The number of recording angles used for reconstruction had an impact on the outcome of the calculated phenotypic parameters. The height, width, and area of the canopy showed a lower SD and CV for point clouds reconstructed from four different recording angles. In contrast, no significant effect was observed for canopy volume estimation between the different recording angles. In our previous study [8], the robustness and reliability of the phenotyping pipeline (sensor application and ‘3D Bunch Tool’) was already validated on approximately 300 different grape bunches, including 222 samples from the experimental population GF × VB.
We found high correlations between the 360° 3D data and the proposed ground truth data (e.g., berry number and total volume, both with correlations of 0.95). Together with the findings of [8], the observations indicate that scanning grape bunches from the visible front side in the field is suitable for the extraction of the traits that determine grape bunch architecture.
Correlations, however, do not provide information on the shape and slope of relationships. In our studies, the traits berry number, bunch width, bunch length, and total volume showed a shift in mean and max values between field and lab data. Hence, different modeling concepts, including simple linear models and more complex machine learning methods, are required. Notably, for the berry size traits (mean diameter and mean volume), almost no differences in min, mean, and max values were observed between field and lab data. Therefore, modeling was less required for berry size traits.
Every model algorithm follows different mathematical operations, which can lead to strong differences in performance [56]. Thus, it is reasonable to compare different regression models. In various disciplines, there are examples where the comparison of different regression models revealed wider or only slighter differences between models [56,57,58,59].
On the validation data, the simplest regression model, the univariate lm, showed predictive performances similar to those of the more complex regression models, except for total volume and bunch width. For most of the regression models, only minor differences in performance were observed. For example, the RMSE for the berry number of the best three models ranged from 18.12 to 18.49, which indicates consistent performance across the majority of the tested regression models. The more complex multivariate regression models on transformed data (berry number, bunch width) and the svm models (bunch length, total volume) showed the best predictive performances. Our results demonstrate that direct predictors have a strong linear relationship with, and the highest importance for, the prediction of 360° values, whereas the additional variables have a looser relationship and hence only small importance for berry number and bunch length. In contrast, the gap in importance between the direct predictor and the remaining predictor variables for bunch width and total volume is not that wide. These phenotypic traits are more complex, and measurement errors during the field scan may be larger in comparison to bunch length when the shape of the bunch is heterogeneous.
To further assess the quality and generalizability of the models, we applied them to independent data that were not used in model building [33,60]. The specimens used in the test data have either a different genetic background or the same genetic background but were crossed independently of the specimens in the training and validation data, which increased morphological variability. Interestingly, the test data displayed differences in observed values, especially at the lower value range, i.e., the test data contain extreme values outside the range of the training data. Hence, in this study, all models were applied to the test data to assess their performance on a dataset with different properties. OLS-based multivariate regression models in particular may be prone to unusual observations [61], which is also observable in our study. The multivariate regression models on transformed data dropped substantially in performance for the predicted traits berry number and total volume, whereas the svm poly showed the most stable and the best predictive performance for all predicted traits. In contrast to OLS-based models, svms seek to minimize the impact of outliers, generalize well [33,62], and have proven useful in many different disciplines [56,63,64,65].
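One way to see why support vector regression tends to be less sensitive to unusual observations than OLS is to compare the loss that each method assigns to a residual. OLS penalizes the squared residual, whereas support vector regression uses the ε-insensitive loss, which ignores residuals inside a tolerance band and grows only linearly outside it. The sketch below is illustrative; the value of ε and the residuals are assumptions, not values from this study.

```python
def squared_loss(residual):
    """Loss minimized by ordinary least squares."""
    return residual ** 2


def epsilon_insensitive_loss(residual, epsilon=1.0):
    """Loss used in support vector regression: zero inside the
    epsilon band, linear in |residual| outside it."""
    return max(0.0, abs(residual) - epsilon)


# Hypothetical residuals, from a small error to an extreme outlier.
for residual in [0.5, 2.0, 10.0, 50.0]:
    print(
        f"residual={residual:5.1f}  "
        f"squared={squared_loss(residual):8.1f}  "
        f"eps-insensitive={epsilon_insensitive_loss(residual):6.1f}"
    )
```

Because the squared loss grows quadratically, a single extreme observation can dominate an OLS fit, while its contribution to the ε-insensitive loss grows only in proportion to its distance from the band.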
The traits berry number, bunch width, and total volume could be generalized better than bunch length in the test data. This is shown by the higher amount of explained variability and, hence, lower RMSE in comparison to the validation data. One explanation might be that this data set has a higher degree of morphological uniformity, as it consists of more uniform replications from different cultivars. The residual plots for the largest phenotypic groups in the test dataset revealed significant differences for bunch length (and bunch width) between cultivars, indicating cultivar-specific errors, which might be due to morphological attributes. Diago et al. and Millan et al. observed cultivar-dependent differences in the prediction of berry number from 2D images, and they proposed that the different levels of compactness influenced model outcomes [15,66].
Further reasons for model uncertainties might be differences in the experimental procedure between field and lab. Measurements in the field, especially for bunch width and length, are more affected by the recording angle and the bunch position on the vine than 360° measurements made in the lab. Moreover, the natural shape of the bunches is altered after harvest and fixation on a hook in the lab. The lower correlations for the shape parameters (bunch width and bunch length) support this assumption. Further studies may clarify the impact of morphological attributes and the experimental setup on measurement errors. Taken together, our results showed that the best performing models on the validation data were not the best performing models for the external test data (especially for the prediction of berry number). The grape bunch architecture and related traits can differ strongly between grapevine varieties, breeding seedlings, or genetic resources, and thus, phenotypic variation can vary between years and investigated material. Based on our results, we propose a continuous and wider evaluation of regression models, especially when the phenotypic values of investigated samples reveal stronger differences.

4.2. Application for QTL-Analysis

In the past years, several studies investigated bunch architecture-related traits and identified several genetic regions attributed to bunch compactness (an overview is given by Tello and Ibáñez [9]). Reliable molecular markers are not yet available for breeding purposes. Two of the main reasons are the lack of high-throughput and objective phenotyping methods that enable investigations of several hundred genotypes in a short time and the fact that several genetic regions are involved in regulating bunch architecture [9,16]. As a proof of concept, the population chosen in this work had already been characterized for the segregation of bunch architecture-related traits in the recent study of Richter et al. [17]. In their study, manual and image-based measurements identified 30 stable QTLs for various bunch architecture-related traits over several years [17]. Our results revealed a comparable number of QTLs (13–15) for all three methods (lab, modeled, field) for the four investigated traits (note that berry size-related traits are not considered, as they were not used for prediction modeling). The proposed QTL regions showed relatively wide confidence intervals, which is in accordance with known regions determining bunch compactness-related traits [17]. Approximately two-thirds of the QTLs found from modeled and field data correspond to the QTLs revealed by the lab data. This means that both modeled data and field data contain partly the same population-specific phenotypic variation as the data gathered in the lab, which is required for the detection of a QTL. The results suggest that in the investigated population, both methods are, in principle, suitable for QTL analysis. On the other hand, one-third of the identified regions differ from those of the lab approach. Thus, the applied predictive models do not fully restore the phenotypic variation of the lab data.
For the presented proof of concept, data from a single year were used. Bunch architecture-related traits are greatly influenced by environmental conditions [9,67]. This can lead to variation in the location of a QTL between seasons [17]. In order to confirm the position and/or identify possible false-positive QTLs, data from several years must be evaluated and compared to known regions in the literature. Although we investigated only one year of data, QTLs for berry number on chromosome 10 and chromosome 17 were found by all three methods and were also proposed by Richter et al. [17]. Moreover, we found the same LODmax-associated marker, VRZAG7, for all three methods [17]. A QTL for mean berry volume is located on chromosome 12. All three methods locate a QTL for total volume in this study [17]. Total volume can likely be derived from the mean berry volume. These results support our initial assumption that modeled and field data can be used for the investigation of QTLs.
Since data acquisition with the sensor takes only a few seconds in the field and harvesting the bunches is not necessary, we propose larger screenings of multiple populations or larger genetic repositories over several years and the use of both the modeled and the field data for further genetic studies. Future studies should also consider whether the alteration of the bunch shape by harvesting might have an impact on QTL analysis and detection.

4.3. Future Prospects

In our study, we demonstrated that the proposed 3D phenotyping pipeline allows a fast and non-destructive application on grape bunches directly in the field. Future data recording should move towards a vehicle-based, more automated approach. The 3D sensor that we used in this study is a structured-light-based device that allows a scanning distance of 25–30 cm from the object and a recording speed of six to seven frames per second. This makes an application on a vehicle, for example, difficult. From a technical point of view, data acquisition would require slow speed and extremely short distances between the sensor and the grape bunches.
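The speed constraint can be made concrete with simple arithmetic: at a frame rate f and a driving speed v, the sensor moves v/f between two consecutive frames. The following minimal sketch uses the frame rates stated above; the vehicle speeds and the function name `displacement_per_frame_cm` are hypothetical examples, not specifications of any tested platform.

```python
def displacement_per_frame_cm(speed_kmh, frames_per_second):
    """Distance in cm that the sensor travels between two frames."""
    speed_cm_per_s = speed_kmh * 100000.0 / 3600.0  # km/h -> cm/s
    return speed_cm_per_s / frames_per_second


# Hypothetical driving speeds combined with the sensor's frame rates.
for speed_kmh in (0.5, 1.0, 3.0):
    for fps in (6, 7):
        shift = displacement_per_frame_cm(speed_kmh, fps)
        print(f"{speed_kmh} km/h at {fps} fps: {shift:.1f} cm per frame")
```

At 1 km/h and six frames per second, for instance, the sensor would already move about 4.6 cm between frames, which is large relative to a scanning distance of 25–30 cm.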
For viticulture, the literature offers several examples of automated data acquisition in the field. The majority of those studies focus on the estimation of yield components, i.e., the determination of berry number and berry size directly on the vine, based on 2D image approaches [66,68,69]. Liu and Whitty [71] and Pérez-Zavala et al. [70] were able to detect grape bunches by 2D image analysis. In a recent study, Di Gennaro et al. [72] used RGB images from a UAV to detect bunches. Based on their detection method, they could predict yield on the vines with R2 = 0.82 [72]. Another possibility for 3D-based phenotyping would be the use of another sensor system that allows a wider range and faster data acquisition. In our experiments, we used an artificial background in order to prevent false phenotypic results caused by interfering leaves and/or adjacent bunches. In such a setting, using an artificial background would be obstructive. The work of Rose et al. [29] demonstrates an interesting procedure. They proposed a more autonomous concept to detect yield parameters. With a track-driven vehicle, they collected geotagged images. A multi-view stereo approach was used to reconstruct the 3D structure of a whole grapevine row. Furthermore, a supervised classification with an incremental support vector machine was applied. They could classify the grape bunches in the grapevine row with a recall of up to 94% and a precision of 62.8% for the classified grape bunches.
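Recall and precision, as reported for the bunch classification, are defined from the counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses hypothetical counts, not those of the cited study, to show how a high recall can coexist with a moderate precision.

```python
def precision(true_positives, false_positives):
    """Share of predicted bunch detections that are actually bunches."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of actual bunches that were detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts for illustration only.
tp, fp, fn = 94, 56, 6
print(f"recall:    {recall(tp, fn):.3f}")
print(f"precision: {precision(tp, fp):.3f}")
```

A classifier of this kind misses few bunches but also labels a considerable amount of non-bunch material as bunch, which matters when the detections feed into trait extraction.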
Taken together, these examples show the possibility of detecting bunches and berry traits with 2D as well as 3D techniques. All these methods offer approaches towards the development of a more automated strategy for the assessment of bunch architecture-related traits. This would enable another level of throughput in field phenotyping and thus offer new perspectives for understanding this complex trait.

5. Conclusions

This is the first study that acquired 3D sensor data of approximately 2000 different grape bunches in both environments, non-invasively in the field and under controlled lab conditions. The material used covers a large morphological range (experimental breeding material, OIV 204 reference cultivars, and commercially important cultivars). 3D data acquisition directly in the vineyard results in a dense but partial 3D point cloud of the visible side of grape bunches. After an automated extraction of bunch architecture-related traits, different regression models were applied and tested in order to predict the full information of phenotypic traits from partial field scans. We show that despite the highly variable morphology, phenotypic trait data from the visible side of the bunch in the field is related to 360° lab data. Gathering 360° data is only possible invasively and laboriously in the lab. Our model comparison approach allows the prediction of the full 360° phenotypic values from field data with high accuracy. The QTL analysis further showed that phenotypic data derived directly from the field can be used for genetic analyses. This study is an initial development towards more extensive field phenotyping of the traits that determine grape bunch architecture. The application of our proposed method offers the possibility of screening larger or several experimental populations and genetic repositories. This will promote the identification of genetic regions that are responsible for bunch architecture-related traits. This can further accelerate the development of genetic markers for breeding purposes or the identification of candidate genes within genetic association studies.
In order to increase Botrytis resiliency, breeders could use this 3D pipeline to characterize breeding material and select suitable genotypes with looser bunches. Moreover, these bunch physiological traits can also be used for the investigation of yield forming processes.

Supplementary Materials

Supplementary Materials are available online at https://www.mdpi.com/2072-4292/11/24/2953/s1. Supplementary Table S1 shows the descriptive statistics for lab and field scan data on all three datasets. Supplementary Table S2 describes the conducted transformations of the predictor variables. Supplementary Table S3 gives an overview of the ANOVA and Tukey’s test results for the residuals of the phenotypic groups. Supplementary Table S4 provides the results of the t-tests for the QTL comparisons. Additional information, including Chr. Spec. Sig, the nearest QTL bordering markers, and the nearest LODmax position markers for the overlapping QTLs, is presented in Supplementary Table S5.

Author Contributions

All authors have read and agree to the published version of the manuscript. Conceptualization, F.R., D.G. and K.H.; methodology, F.R. and D.G.; software, J.M. and V.S.; validation, F.R. and D.G.; formal analysis, F.R. and D.G.; investigation, F.R.; resources, K.H. and R.T.; data curation, F.R.; writing—original draft preparation, F.R., D.G., K.H. and R.T.; writing—review and editing, F.R., K.H., D.G. and J.M.; visualization, F.R. and D.G.; supervision, K.H.; project administration, K.H. and R.T.; funding acquisition, R.T. and V.S.

Funding

This research was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG, Bonn, Germany) within the project “Automated Evaluation and Comparison of Grapevine Genotypes by means of Grape Cluster Architecture” (TO 152/6-1 and STE 806/2-1).

Acknowledgments

For excellent technical support, we want to thank Patrick Römer and Margrit Daum (Julius Kühn-Institute, Institute for Grapevine Breeding Geilweilerhof). We want to thank Anna Kicherer (Julius Kühn-Institute, Institute for Grapevine Breeding Geilweilerhof) for the support in selecting the cultivar samples (‘Riesling’, ‘Regent’, ‘Chardonnay’, and ‘Felicia’).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Töpfer, R.; Hausmann, L.; Harst, M.; Maul, E.; Zyprian, E. New Horizons for Grapevine Breeding. In Methods Temperate Fruit Breeding; Fruit, Vegetable and Cereal Science and Biotechnology; Global Science Books, Ltd.: Ikenobe, Japan, 2011; ISBN 1752-3419.
  2. Vail, M.; Marois, J. Grape cluster architecture and the susceptibility of berries to Botrytis cinerea. Phytopathology 1991, 81, 188–191.
  3. Vail, M.E.; Wolpert, J.A.; Gubler, W.D.; Rademacher, M.R. Effect of Cluster Tightness on Botrytis Bunch Rot in Six Chardonnay Clones. Plant Dis. 1998, 82, 107–109.
  4. Hed, B.; Ngugi, H.; Travis, J. Relationship between cluster compactness and bunch rot in Vignoles grapes. Am. Phytopath Soc. 2009, 93, 1195–1201.
  5. Molitor, D.; Behr, M.; Hoffman, L.; Evers, D. Impact of grape cluster division on cluster morphology and bunch rot epidemic. Am. J. Enol. Vitic. 2012, 63, 508–514.
  6. Gabler, F.M.; Smilanick, J.L.; Mansour, M.; Ramming, D.W.; Mackey, B.E. Correlations of Morphological, Anatomical, and Chemical Features of Grape Berries with Resistance to Botrytis cinerea. Phytopathology 2003, 93, 1263–1273.
  7. Herzog, K.; Wind, R.; Töpfer, R. Impedance of the grape berry cuticle as a novel phenotypic trait to estimate resistance to Botrytis cinerea. Sensors 2015, 15, 12498–12512.
  8. Rist, F.; Herzog, K.; Mack, J.; Richter, R.; Steinhage, V.; Töpfer, R. High-Precision Phenotyping of Grape Bunch Architecture Using Fast 3D Sensor and Automation. Sensors 2018, 18, 763.
  9. Tello, J.; Ibáñez, J. What do we know about grapevine bunch compactness? A state-of-the-art review. Aust. J. Grape Wine Res. 2018, 24, 6–23.
  10. Organization Internationale de la Vigne et du Vin (OIV). OIV Descriptor List for Grape Varieties and Vitis Species; OIV (Office International de la Vigne et du Vin): Paris, France, 2007.
  11. Tello, J.; Ibáñez, J. Evaluation of indexes for the quantitative and objective estimation of grapevine bunch compactness. Vitis J. Grapevine Res. 2014, 53, 9–16.
  12. Nuske, S.; Achar, S.; Bates, T.; Narasimhan, S.; Singh, S. Yield Estimation in Vineyards by Visual Grape Detection. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; pp. 2352–2358.
  13. Furbank, R.; Tester, M. Phenomics–technologies to relieve the phenotyping bottleneck. Trends Plant Sci. 2011, 16, 635–644.
  14. Kicherer, A.; Roscher, R.; Herzog, K.; Šimon, S.; Förstner, W.; Töpfer, R. BAT (Berry Analysis Tool): A high-throughput image interpretation tool to acquire the number, diameter, and volume of grapevine berries. VITIS J. Grapevine Res. 2015, 52, 129–135.
  15. Diago, M.P.; Tardaguila, J.; Aleixos, N.; Millan, B.; Prats-Montalban, J.M.; Cubero, S.; Blasco, J. Assessment of cluster yield components by image analysis. J. Sci. Food Agric. 2015, 95, 1274–1282.
  16. Chen, X.; Ding, H.; Yuan, L.M.; Cai, J.R.; Chen, X.; Lin, Y. New approach of simultaneous, multi-perspective imaging for quantitative assessment of the compactness of grape bunches. Aust. J. Grape Wine Res. 2018, 24, 413–420.
  17. Richter, R.; Gabriel, D.; Rist, F.; Töpfer, R.; Zyprian, E. Identification of co-located QTLs and genomic regions affecting grapevine cluster architecture. Theor. Appl. Genet. 2018, 132, 1159–1177.
  18. Ma, Y.; Soatto, S.; Košecká, J.; Sastry, S.S. An Invitation to 3-D Vision; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2004; Volume 26, ISBN 978-1-4419-1846-8.
  19. Paulus, S.; Behmann, J.; Mahlein, A.K.; Plümer, L.; Kuhlmann, H. Low-Cost 3D Systems: Suitable Tools for Plant Phenotyping. Sensors 2014, 14, 3001–3018.
  20. Vázquez-Arellano, M.; Griepentrog, H.; Reiser, D.; Paraforos, D. 3-D Imaging Systems for Agricultural Applications—A Review. Sensors 2016, 16, 618.
  21. Paulus, S. Measuring crops in 3D: using geometry for plant phenotyping. Plant Methods 2019, 15, 103.
  22. Wahabzada, M.; Paulus, S.; Kersting, K.; Mahlein, A.K. Automated interpretation of 3D laserscanned point clouds for plant organ segmentation. BMC Bioinform. 2015, 16, 248.
  23. Schöler, F.; Steinhage, V. Automated 3D reconstruction of grape cluster architecture from sensor data for efficient phenotyping. Comput. Electron. Agric. 2015, 114, 163–177.
  24. Tello, J.; Cubero, S.; Blasco, J.; Tardaguila, J.; Aleixos, N.; Ibáñez, J. Application of 2D and 3D image technologies to characterise morphological attributes of grapevine clusters. J. Sci. Food Agric. 2016, 96, 4575–4583.
  25. Mack, J.; Trakowski, A.; Rist, F.; Herzog, K.; Töpfer, R. Experimental Evaluation of the Performance of Local Shape Descriptors for the Classification of 3D Data in Precision Farming. J. Comput. Commun. 2017, 5, 1–12.
  26. Mack, J.; Lenz, C.; Teutrine, J.; Steinhage, V. High-precision 3D detection and reconstruction of grapes from laser range data for efficient phenotyping based on supervised learning. Comput. Electron. Agric. 2017, 135, 300–311.
  27. Mack, J.; Schindler, F.; Rist, F.; Herzog, K.; Töpfer, R.; Steinhage, V. Semantic labeling and reconstruction of grape bunches from 3D range data using a new RGB-D feature descriptor. Comput. Electron. Agric. 2018, 155, 96–102.
  28. Herrero-Huerta, M.; González-Aguilera, D.; Rodriguez-Gonzalvez, P.; Hernández-López, D. Vineyard yield estimation by automatic 3D bunch modelling in field conditions. Comput. Electron. Agric. 2015, 110, 17–26.
  29. Rose, J.C.; Kicherer, A.; Wieland, M.; Klingbeil, L.; Töpfer, R.; Kuhlmann, H. Towards automated large-scale 3D phenotyping of vineyards under field conditions. Sensors 2016, 16, 2136.
  30. Hacking, C.; Poona, N.; Manzan, N.; Poblete-Echeverría, C. Investigating 2-D and 3-D Proximal Remote Sensing Techniques for Vineyard Yield Estimation. Sensors 2019, 19, 3652.
  31. Andújar, D.; Fernández-Quintanilla, C.; Dorado, J. Matching the Best Viewing Angle in Depth Cameras for Biomass Estimation Based on Poplar Seedling Geometry. Sensors 2015, 15, 12999–13011.
  32. Yang, H.; Wang, X.; Sun, G. Three-Dimensional Morphological Measurement Method for a Fruit Tree Canopy Based on Kinect Sensor Self-Calibration. Agronomy 2019, 9, 741.
  33. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013; ISBN 978-1-4614-6848-6.
  34. Breiman, L. Statistical modeling: The two cultures. Stat. Sci. 2001, 16, 199–215.
  35. Coombe, B.G. Growth Stages of the Grapevine: Adoption of a system for identifying grapevine growth stages. Aust. J. Grape Wine Res. 1995, 1, 104–110.
  36. Mosteller, F.; Tukey, J.W. Data Analysis and Regression: A Second Course in Statistics; Addison-Wesley Pub. Co.: Boston, MA, USA, 1977; ISBN 9780201048544.
  37. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009; ISBN 978-0-387-84857-0.
  38. Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach; Springer: Berlin/Heidelberg, Germany, 2002; ISBN 9780387224565.
  39. Faraway, J.J. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models; CRC Press: Boca Raton, FL, USA, 2006; ISBN 158488424X.
  40. Gelman, A.; Jakulin, A.; Pittau, M.G.; Su, Y.S. A weakly informative default prior distribution for logistic and other regression models. Ann. Appl. Stat. 2008, 2, 1360–1383.
  41. Ripley, B.D.; Venables, W.N. Modern Applied Statistics with S; Springer: Berlin/Heidelberg, Germany, 2013; ISBN 9780387217062.
  42. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  43. Kuhn, M. Futility Analysis in the Cross-Validation of Machine Learning Models. arXiv 2014, arXiv:1405.6974.
  44. Kewley, R.H.; Embrechts, M.J.; Breneman, C. Data strip mining for the virtual design of pharmaceuticals with neural networks. IEEE Trans. Neural Netw. 2000, 11, 668–679.
  45. Cortez, P.; Embrechts, M.J. Using sensitivity analysis and visualization techniques to open black box data mining models. Inf. Sci. (N. Y.) 2013, 225, 1–17.
  46. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018; Available online: http://www.r-project.org (accessed on 15 October 2019).
  47. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 2008, 28, 1–26.
  48. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2009; ISBN 978-0-387-98140-6.
  49. Kassambara, A. ggpubr: “ggplot2” Based Publication Ready Plots. R Package Version 0.1.7. 2018. CRAN Repository. Available online: https://CRAN.R-project.org/package=ggpubr (accessed on 15 October 2019).
  50. Cheshire, J. Lattice: Multivariate Data Visualization with R. J. R. Stat. Soc. Ser. A Stat. Soc. 2009, 173, 275–276.
  51. Russell, L. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. R Packag. Version 1.1.2. 2018. Available online: https://CRAN.R-project.org/package=emmeans (accessed on 15 October 2019).
  52. Cortez, P. Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2010.
  53. Van Ooijen, J.W. MapQTL® 6, Software for the Mapping of Quantitative Trait Loci in Experimental Populations of Diploid Species; Kyazma B.V.: Wageningen, The Netherlands, 2009.
  54. Zyprian, E.; Ochßner, I.; Schwander, F.; Šimon, S.; Hausmann, L.; Bonow-Rex, M.; Moreno-Sanz, P.; Grando, M.S.; Wiedemann-Merdinoglu, S.; Merdinoglu, D.; et al. Quantitative trait loci affecting pathogen resistance and ripening of grapevines. Mol. Genet. Genom. 2016, 291, 1573–1594.
  55. Sun, G.; Wang, X. Three-Dimensional Point Cloud Reconstruction and Morphology Measurement Method for Greenhouse Plants Based on the Kinect Sensor Self-Calibration. Agronomy 2019, 9, 596.
  56. Gredell, D.A.; Schroeder, A.R.; Belk, K.E.; Broeckling, C.D.; Heuberger, A.L.; Kim, S.Y.; King, D.A.; Shackelford, S.D.; Sharp, J.L.; Wheeler, T.L.; et al. Comparison of Machine Learning Algorithms for Predictive Modeling of Beef Attributes Using Rapid Evaporative Ionization Mass Spectrometry (REIMS) Data. Sci. Rep. 2019, 5, 5721.
  57. Canizo, B.V.; Escudero, L.B.; Pellerano, R.G.; Wuilloud, R.G. Data mining approach based on chemical composition of grape skin for quality evaluation and traceability prediction of grapes. Comput. Electron. Agric. 2019, 162, 514–522.
  58. Barnard, D.M.; Germino, M.J.; Pilliod, D.S.; Arkle, R.S.; Applestein, C.; Davidson, B.E.; Fisk, M.R. Cannot see the random forest for the decision trees: selecting predictive models for restoration ecology. Restor. Ecol. 2019, 27, 1053–1063.
  59. Maimaitiyiming, M.; Sagan, V.; Sidike, P.; Kwasniewski, M.T. Dual activation function-based Extreme Learning Machine (ELM) for estimating grapevine berry yield and quality. Remote Sens. 2019, 11, 740.
  60. Houlahan, J.E.; McKinney, S.T.; Anderson, T.M.; McGill, B.J. The priority of prediction in ecological understanding. Oikos 2017, 126, 1–7.
  61. Andersen, R. Modern Methods for Robust Regression; Sage: Newcastle upon Tyne, UK, 2008.
  62. Yao, X.J.; Panaye, A.; Doucet, J.P.; Zhang, R.S.; Chen, H.F.; Liu, M.C.; Hu, Z.D.; Fan, B.T. Comparative Study of QSAR/QSPR Correlations Using Support Vector Machines, Radial Basis Function Neural Networks, and Multiple Linear Regression. J. Chem. Inf. Comput. Sci. 2004, 44, 1257–1266.
  63. Al Zoubi, O.; Wong, C.K.; Kuplicki, R.T.; Yeh, H.W.; Mayeli, A.; Refai, H.; Paulus, M.; Bodurka, J. Predicting age from brain EEG signals-a machine learning approach. Front. Aging Neurosci. 2018, 10, 184.
  64. Papachristou, N.; Puschmann, D.; Barnaghi, P.; Cooper, B.; Hu, X.; Maguire, R.; Apostolidis, K.; Conley, Y.P.; Hammer, M.; Katsaragakis, S.; et al. Learning from data to predict future symptoms of oncology patients. PLoS ONE 2018, 13, e0208808.
  65. Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods 2019, 15, 10.
  66. Millan, B.; Velasco-Forero, S.; Aquino, A.; Tardaguila, J. On-the-go grapevine yield estimation using image analysis and boolean model. J. Sens. 2018, 2018, 9634752.
  67. Li-Mallet, A.; Rabot, A.; Geny, L. Factors controlling inflorescence primordia formation of grapevine: Their role in latent bud fruitfulness? A review. Botany 2015, 94, 147–163.
  68. Nuske, S.; Wilshusen, K.; Achar, S.; Yoder, L.; Singh, S. Automated visual yield estimation in vineyards. J. Field Robot. 2014, 31, 837–860.
  69. Kicherer, A.; Herzog, K.; Pflanz, M.; Wieland, M.; Rüger, P.; Kecke, S.; Kuhlmann, H.; Töpfer, R. An automated field phenotyping pipeline for application in grapevine research. Sensors 2015, 15, 4823–4836.
  70. Pérez-Zavala, R.; Torres-Torriti, M.; Cheein, F.A.; Troni, G. A pattern recognition strategy for visual grape bunch detection in vineyards. Comput. Electron. Agric. 2018, 151, 136–149.
  71. Liu, S.; Whitty, M. Automatic grape bunch detection in vineyards with an SVM classifier. J. Appl. Log. 2015, 13, 643–653.
  72. Di Gennaro, S.F.; Toscano, P.; Cinat, P.; Berton, A.; Matese, A. A Low-Cost and Unsupervised Image Recognition Methodology for Yield Estimation in a Vineyard. Front. Plant Sci. 2019, 10, 559.
Figure 1. Specimen of GF.GA-47-42 (syn. ‘Calardis Musqué’) x ‘Villard Blanc’(GF x VB)—Example of morphological variability between specimens of the GF x VB experimental population. Different degrees of compactness, berry numbers and secondary bunches are represented in the population.
Figure 2. Application of the 3D sensor in the field and in the lab and resulting point clouds. Scanning in the field with artificial background from the visible side of the bunch (left) and with the bunch on motorized device 360° in the lab (right) (a). Point cloud examples of three scanned grape bunches in the field and correspondingly in the lab (b).
Figure 3. Example of point cloud analysis. The raw point cloud is first segmented into connected regions. In the next step, spheres were fitted based on the RANSAC-algorithm, and phenotypic values are extracted.
Figure 4. Schematic workflow applied in this study.
Figure 5. Predicted vs. observed values. Comparison of predictions for berry number (a,b), bunch width (c,d), bunch length (e,f), and total volume (g). Predictions are based on the best models from the validation data and applied on the test data (a,c,e,g) and on the best models from the test data (b,d,f,g) (in all cases, multivariate models). On the x-axis are predicted values, on the y-axis, the observed values. The bisecting line is black, and the regression line blue.
Figure 6. Distribution of residuals according to the phenotypic groups in the test dataset. Residuals for prediction with the Support Vector Machines with polynomial kernal (svm poly) for berry number, (a) bunch width, (b) bunch length, and (c) total volume (d). Groups that are indicated by the same letter do not differ at an alpha of 0.05 (Tukey’s test).
Figure 7. Variable importance for the predicted phenotypic traits for the best performing models on test data. Svm poly fit for berry number (a), bunch width (b), bunch length (c), and total volume (d).
Table 1. Regression models and methods used in this study for the prediction of phenotypic parameters.

| Model | Concept | Hyperparameters | Concept Description |
| --- | --- | --- | --- |
| lm¹ | Simple linear regression | none | The linear regression is based on ordinary least squares (OLS) and estimates the linear effect of one or more explanatory variables on the dependent variable [37]. |
| lm stepAIC² | Multiple linear regression with stepwise feature selection for raw data | none | The multivariate linear regression model is also based on OLS but performs a stepwise feature selection according to the Akaike information criterion (AIC) on raw and transformed data as predictor variables [37]. The AIC is used to find the most parsimonious model among a set of candidate models, as it weighs the goodness of fit against the number of estimated parameters in a model [38]. |
| lm stepAIC² | Multiple linear regression with stepwise feature selection for transformed data | none | |
| glmAIC_model² | Generalized Linear Model with Stepwise Feature Selection | none | Generalized linear models (GLMs) are an extension of the linear model, since they allow different probability distributions for the response variable (e.g., Poisson) and different link functions (e.g., log), which specify the relationship between the mean of the response and the covariates through the linear predictor [39]. For the GLMs, a stepwise feature selection based on the AIC was also applied. |
| bayes glm² | Bayesian Generalized Linear Model | none | A generalized linear model extended with a prior distribution, as recommended by [39,40]. |
| glm_NB² | Negative Binomial Generalized Linear Model | link | A generalized linear model assuming a negative binomial distribution [41]. |
| glmnet_model² | Lasso and Elastic-Net Regularized Generalized Linear Model | alpha, lambda | A generalized linear model with lasso (alpha = 1), ridge (alpha = 0), or elastic-net regularization (0 < alpha < 1) is a useful method for the N << p case (i.e., a relatively small sample size in comparison to many explanatory variables), since it prevents overfitting through a correction factor lambda, which shrinks coefficient estimates towards or to zero [37]. These methods usually decrease the error variance at the cost of introducing some bias. |
| svm² | Support Vector Machines with Linear Kernel | C | The support vector machine is a machine learning algorithm useful for classification and regression. It minimizes the prediction error by transforming the predictor variables into higher dimensions using kernel functions [37]. |
| svm² | Support Vector Machines with Polynomial Kernel | degree, scale, C | |
| rf² | Random forest | mtry, splitrule, min.node.size | Random forest is an ensemble learning method useful for classification and regression. It is based on bagging (bootstrap aggregation) but modified in a way that it builds numerous de-correlated decision trees, which get averaged to reduce variance [37,42]. |

¹ univariate model; ² multivariate model.
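For reference, the two criteria that Table 1 describes only in words can be written out. These are the standard textbook definitions, not formulas taken from the study: the symbols k (number of estimated parameters), \hat{L} (maximized likelihood), N (sample size), and the Gaussian-response form of the elastic-net objective in the glmnet parameterization are assumptions made here for illustration.

```latex
% Akaike information criterion: goodness of fit weighed against model size.
% Lower values indicate a more parsimonious model.
\[
  \mathrm{AIC} = 2k - 2\ln\hat{L}
\]

% Elastic-net objective (Gaussian response, glmnet parameterization).
% alpha = 1 gives the lasso, alpha = 0 gives ridge regression,
% and lambda >= 0 controls the overall strength of the shrinkage.
\[
  \min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\Bigr]
\]
```

The L1 term is what allows coefficients to be shrunk exactly to zero, while the L2 term only shrinks them towards zero, which matches the "towards or to zero" wording in the table.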
Table 2. Results of the correlation analysis between phenotypic bunch traits obtained from field and lab scans. Correlations between all phenotypic traits (n = 1907) recorded in the field and in the lab.

| Field vs. Lab | r |
| --- | --- |
| Berry Number | 0.89 |
| Mean Diameter | 0.94 |
| Mean Volume | 0.96 |
| Total Volume | 0.91 |
| CVH | 0.79 |
| Bunch Width | 0.74 |
| Bunch Length | 0.82 |
Table 3. Overview of the regression model outcome for the phenotypic traits. Prediction for berry number, bunch width (mm), bunch length (mm) and total volume (mL). Shown are the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R²) for the training, validation, and test datasets for each individual model. Best model performances are highlighted in bold. The linear regression models (lm) were univariate; all remaining models were multivariate.

Berry Number

| Model | Training RMSE | Training R² | Validation RMSE | Validation R² | Test RMSE | Test R² |
| --- | --- | --- | --- | --- | --- | --- |
| lm | 28.03 | 0.74 | 27.79 | 0.77 | 18.58 | 0.86 |
| lm stepAIC untransformed data | 25.75 | 0.79 | 27.72 | 0.75 | 19.05 | 0.85 |
| lm stepAIC transformed data | 26.64 | 0.77 | 26.45 | 0.79 | 24.29 | 0.83 |
| bayes glm untransformed data | 29.18 | 0.75 | 32.71 | 0.66 | 25.69 | 0.79 |
| glmnet_model | 25.74 | 0.79 | 27.65 | 0.75 | 18.63 | 0.85 |
| glmAIC_model | 29.29 | 0.75 | 32.71 | 0.66 | 25.69 | 0.79 |
| glm_NB | 25.93 | 0.79 | 27.56 | 0.75 | 18.49 | 0.85 |
| svm linear | 25.87 | 0.79 | 27.62 | 0.75 | 18.21 | 0.86 |
| svm poly | 25.77 | 0.79 | 27.50 | 0.76 | 18.20 | 0.86 |
| random forest | 26.95 | 0.77 | 27.84 | 0.75 | 18.53 | 0.85 |

Bunch Width [mm]

| Model | Training RMSE | Training R² | Validation RMSE | Validation R² | Test RMSE | Test R² |
| --- | --- | --- | --- | --- | --- | --- |
| lm | 16.38 | 0.43 | 16.33 | 0.46 | 18.61 | 0.58 |
| lm stepAIC untransformed data | 14.06 | 0.59 | 14.76 | 0.53 | 14.28 | 0.71 |
| lm stepAIC transformed data | 14.04 | 0.58 | 14.45 | 0.58 | 13.80 | 0.71 |
| bayes glm untransformed data | 14.61 | 0.56 | 15.08 | 0.52 | 15.74 | 0.69 |
| glmnet_model | 14.02 | 0.59 | 14.65 | 0.54 | 15.02 | 0.71 |
| glmAIC_model | 14.53 | 0.56 | 15.10 | 0.52 | 15.57 | 0.70 |
| svm linear | 14.07 | 0.58 | 14.74 | 0.54 | 14.30 | 0.71 |
| svm poly | 13.91 | 0.60 | 14.70 | 0.54 | 13.51 | 0.71 |
| random forest | 14.17 | 0.58 | 14.69 | 0.54 | 14.57 | 0.70 |

Bunch Length [mm]

| Model | Training RMSE | Training R² | Validation RMSE | Validation R² | Test RMSE | Test R² |
| --- | --- | --- | --- | --- | --- | --- |
| lm | 17.69 | 0.66 | 18.44 | 0.61 | 19.91 | 0.68 |
| lm stepAIC untransformed data | 17.60 | 0.65 | 17.43 | 0.66 | 19.70 | 0.68 |
| lm stepAIC transformed data | 17.33 | 0.67 | 18.12 | 0.62 | 20.02 | 0.69 |
| bayes glm untransformed data | 17.84 | 0.65 | 17.85 | 0.65 | 20.49 | 0.66 |
| glmnet_model | 17.58 | 0.66 | 17.35 | 0.66 | 19.76 | 0.68 |
| glmAIC_model | 17.74 | 0.65 | 17.84 | 0.65 | 20.62 | 0.66 |
| svm linear | 17.74 | 0.65 | 17.26 | 0.67 | 19.96 | 0.67 |
| svm poly | 17.64 | 0.65 | 17.40 | 0.66 | 19.24 | 0.70 |
| random forest | 17.77 | 0.65 | 18.04 | 0.64 | 21.03 | 0.64 |

Total Volume [mL]

| Model | Training RMSE | Training R² | Validation RMSE | Validation R² | Test RMSE | Test R² |
| --- | --- | --- | --- | --- | --- | --- |
| lm | 41.54 | 0.77 | 41.81 | 0.77 | 28.58 | 0.91 |
| lm stepAIC untransformed data | 40.28 | 0.80 | 36.34 | 0.82 | 29.57 | 0.91 |
| lm stepAIC transformed data | 39.25 | 0.80 | 38.80 | 0.81 | 32.02 | 0.90 |
| bayes glm untransformed data | 51.47 | 0.72 | 47.85 | 0.74 | 34.48 | 0.89 |
| glmnet_model | 40.23 | 0.80 | 36.39 | 0.82 | 29.19 | 0.91 |
| glmAIC_model | 51.74 | 0.73 | 47.86 | 0.74 | 34.47 | 0.89 |
| svm linear | 40.37 | 0.80 | 36.54 | 0.82 | 28.29 | 0.91 |
| svm poly | 40.34 | 0.79 | 36.11 | 0.82 | 28.09 | 0.91 |
| random forest | 41.44 | 0.78 | 36.92 | 0.81 | 30.41 | 0.90 |
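The two metrics reported in Table 3 can be computed as in the following minimal sketch. The table does not state which definition of R² was used, so both common variants are shown: the squared Pearson correlation between observed and predicted values, and one minus the ratio of residual to total sum of squares. The function names and the example numbers are hypothetical and are not taken from the study's data.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def r2_squared_correlation(observed, predicted):
    """R² as the squared Pearson correlation (insensitive to a constant bias)."""
    r = np.corrcoef(np.asarray(observed, dtype=float),
                    np.asarray(predicted, dtype=float))[0, 1]
    return float(r ** 2)


def r2_explained_variance(observed, predicted):
    """R² as 1 - SS_res / SS_tot (penalizes a constant bias, can be negative)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical lab-observed berry numbers and model predictions
    lab = [80, 120, 150, 95, 210]
    pred = [85, 110, 160, 100, 190]
    print(rmse(lab, pred))
    print(r2_squared_correlation(lab, pred))
    print(r2_explained_variance(lab, pred))
```

The distinction matters when predictions are systematically shifted: a constant offset leaves the squared correlation unchanged but increases the RMSE and lowers the explained-variance R², so reading both columns of the table together is more informative than either alone.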
Table 4. Overview of Quantitative Trait Loci (QTL) revealed by the three different datasets. Given are the LODmax positions (cM) with the respective LODmax values and the explained phenotypic variance.
Population GF x VB (n =150)
LabModelField
TraitChromosomeLODmax PositionLODmaxChr. Spec. SigLODmax PositionLODmaxChr. Spec. SigLODmax PositionLODmaxChr. Spec. Sig
Berry Number47.474.16311.144.153.111.144.413
7 45.912.852.8
8 21.223.113
9 12.092.972.912.094.062.9
1069.863.062.869.863.01369.863.462.9
12 40.353.3340.353.042.8
1357.062.932.9
176.323.252.86.323.172.76.323.272.7
1942.443.592.9
Width1069.863.772.913.753.723.113.753.863
18 2.003.143.1
1949.403.392.9 30.823.112.9
Length135.344.103.235.343.343.235.343.933.2
7 72.782.922.8
839.113.543.120.483.772.939.113.653.3
91.003.103.110.624.142.89.623.342.9
10 44.953.042.9
1262.973.313
1930.823.352.930.824.42.930.823.992.9
Total Volume7
1262.973.602.863.9683.52.963.973.612.8
1942.442.982.7 -
QTL Total14 13 15
Match 9 10

Share and Cite

MDPI and ACS Style

Rist, F.; Gabriel, D.; Mack, J.; Steinhage, V.; Töpfer, R.; Herzog, K. Combination of an Automated 3D Field Phenotyping Workflow and Predictive Modelling for High-Throughput and Non-Invasive Phenotyping of Grape Bunches. Remote Sens. 2019, 11, 2953. https://doi.org/10.3390/rs11242953