3D Assessment of Vine Training Systems Derived from Ground-Based RGB-D Imagery

Abstract: In the field of computer vision, 3D reconstruction of crops plays a crucially important role in agriculture. On-ground assessment of the geometrical features of vineyards generates valuable information that enables producers to take optimal agricultural management actions. A training system for vines (Vitis vinifera L.), which involves pruning and a trellis system, results in a particular vine architecture that matters throughout the phenological stages. Pruning is required to maintain the vine's health and to keep its productivity under control. The creation of 3D models of vineshoots is therefore of crucial importance for management planning: volume and structural information can improve pruning systems, which can increase crop yield and improve crop management. In this experiment, an RGB-D camera system, namely the Kinect v2, was used to reconstruct 3D vine models and determine shoot volume on eight differentiated vineyard training systems: Lyre, GDC (Geneva Double Curtain), Y-Trellis, Pergola, Single Curtain, Smart Dyson, VSP (Vertical Shoot Positioned), and the head-trained Gobelet. The results were compared with dry biomass ground-truth values. Dense point clouds corresponded closely with the actual biomass measurements in four of the training systems (Pergola, Single Curtain, Smart Dyson and VSP). Strong linear fits were obtained between actual dry biomass and RGB-D volume and its associated 3D points. Significant coefficients of determination (R² = 0.72 to R² = 0.88) were observed for the number of points of each training system considered separately, showing good correlations with actual biomass and volume values. When comparing RGB-D volume to weight, Pearson's correlation coefficient increased to 0.92. The results reveal that the RGB-D approach is suitable for shoot reconstruction.
The research demonstrated how an inexpensive optical sensor can be employed for rapid and reproducible 3D reconstruction of vine vegetation, which can improve cultural practices such as pruning, canopy management and harvest.


Introduction
The European Union (EU) accounts for 45% of global wine-growing areas, totaling more than three million cultivated hectares [1]; hence the importance of the wine industry, which represents a sizable portion of EU agriculture [2]. The enormous worldwide economic, social, and environmental importance of the viticulture and wine industries [3][4][5] drives the development and deployment of novel technology targeted at vineyard monitoring in order to improve grape and wine quality [6,7]. Furthermore, increased mechanization and automation of vineyard and winery tasks are required due to labor shortages [8], soaring wage costs [9] and a tendency towards larger wineries [10]. Thus, the trellis system should be adapted to the particular conditions required by each type of agronomic management. A trellis system supports and accommodates the vines, allowing for a variety of training systems designed to maximize productivity and fruit quality. Different techniques have been employed in order to improve the accuracy of 3D models [27]. Light detection and ranging (LiDAR) has been explored in a myriad of possibilities. In terms of ground-based research, these range from the ability to differentiate pests [28] to obtaining vineyard productivity maps [29] and estimating vine biomass [30,31], leaf area [32], tree row volume [33] and parameters such as canopy porosity and crown surface [34]. Among these imaging techniques, RGB-D sensors, also called range imaging cameras, can function on three main operating principles: structured light (SL), time-of-flight (ToF), and active infrared stereoscopy (AIRS), i.e., with the support of unstructured light illumination to gather information even on low-textured surfaces [35]. SL and AIRS are both based on the same triangulation principle, while ToF provides a direct measurement.
In outdoor applications, ToF, though affected, performs well enough to estimate depth even under strong illumination [36], whereas AIRS is also sensitive to bright environments and, like SL sensors, can even fail outright [37]. Intel has recently launched the RealSense series, which superseded the R200 family with the D400 series [38]; based on AIRS, it has overtaken the Kinect generation in popularity [39]. Nevertheless, these cameras have not yet been generally adopted in agriculture, and research has been limited to certain applications [40]: in detecting apples [41] and in vineyards [42], these cameras are suited to fruit localization tasks [35,43], particularly object size estimation in close-range outdoor applications [44]. For example, Milella et al. [42], studying grapevine phenotyping through deep learning, installed an R200 AIRS depth camera on an agricultural vehicle to detect and count grapevine bunches and measure canopy volume. It is worth noting that the choice of depth camera must be based on the application, especially in terms of the lighting conditions and the distance at which the camera will be used.
Similar in scope [42], though less expensive, the ToF Kinect v2 was never intended by the manufacturer for field operations. Although discontinued in 2015, the high number of devices in circulation allows for its continued usage in precision agriculture. The inexpensive Microsoft Kinect v2 sensor, owing to its robustness and higher performance, particularly compared with the SL Kinect v1, has gained prominence in recent years as an amenable device for generating 3D reconstructions. Its cost is lower than that of LiDAR systems, and the extra information supplied by the low-cost Kinect v2 makes it a viable alternative to expensive laser-based sensors [45]. Additionally, it has better resolution and is capable of processing a greater volume of data while maintaining a broader field of view [46]. Moreover, 3D point clouds obtained via LiDAR are intrinsically dependent on the GPS systems employed to geo-position the sensor within the crop; the receiver inaccuracy (i.e., the associated error) is therefore transferred to the LiDAR scans and, consequently, to the derived plant characteristics [47]. In contrast, the Kinect v2 avoids the need for this technology when measuring small areas [48], thus lowering the research budget and easing postprocessing calculations. These features explain why the Kinect v2 has proven so popular as a low-cost alternative to LiDAR systems for 3D crop characterization [49], being employed in vineyards to estimate above-ground biomass [26,48] and dormant pruning weight [16].
Thus, the geometric characteristics and plant structure used to assess vineshoot biomass through wood volume calculation may be measured over the complete agricultural field using contactless, non-destructive optical technologies. More precise crop sensing and reconstruction methodologies would allow information to be extracted from 3D models, thereby improving decision-making. Estimating yearly transient vineshoot volume as a surrogate of biomass and geometrical structure can improve training systems, and thereby pruning systems and trellis constructions, improving crop output and management. In addition, the producer can benefit from understanding within-vineyard variability according to each training system, concentrating on attaining balanced vegetative growth and reproductive development for each vine within a vineyard. 3D characterization with contactless sensing techniques could thus be beneficial for monitoring vineshoot volume in a non-destructive and unbiased manner. Furthermore, because dormant pruning must be performed during the winter months, it requires arduous labor in frequently harsh weather conditions. Therefore, the aim of this study is to assess the performance and suitability of the Kinect v2 for geometric characterization. The system was tested on different grapevine training systems through the calculation of vineshoot volume, i.e., estimating dormant pruning weight.

Site Location
All experiments were performed in a 1 ha vineyard field located in the Madrid region (Spain, 40°8′ N, 3°22′ W; altitude 750 m a.s.l.) and managed by IMIDRA. The variety was Cabernet Sauvignon (Vitis vinifera L.) in every training system. Field measurements were made in January 2020 during the phenological period of dormancy, when the vines were completely defoliated. The vineyard, belonging to the geographical indication Vinos de Madrid, was planted in 1999 with inter- and intra-row distances of 2.5 m × 1.2 m, and the vine rows were oriented north to south. Annual precipitation ranges from 350 to 600 mm. The site is characterized by an average annual temperature of 14 °C, within the optimum range for vine cultivation (11-18 °C). The soil is deep enough for vine roots to develop conveniently, and it also has an appropriate water absorption capacity due to its loamy-clayey composition. The field was divided into the most common training systems in Spain. As a result of the various training systems, which include pruning and trellis systems, a wide variety of vine architectures is presented. Plantation geometry is also a key component of vine architecture, since the spacing between vines and rows determines the planting density. The percentage of leaf area that can be exposed to light on a constant basis is a critical factor in selecting a training system. These techniques are primarily targeted at disposing perennial wood and canes in such a way that the exposed leaf area is maximized for light interception, resulting in increased production potential, better quality, enhanced disease management and an optimized leaf-area-to-fruit ratio.
In this manner, the field was assessed according to eight training systems, seven of them trellised and one head-trained: Pergola, Curtain, Smart Dyson, Lyre, Geneva Double Curtain (GDC), Y-Trellis, Vertical Shoot Positioned (VSP) and Gobelet (Figure 1). The vine spacing within and between rows differed according to each training system (Table 1). The trellis configuration is inherent to the training system: the head-trained Gobelet features a simpler vessel training with the help of wooden posts, whereas the trellised systems require the installation of metal structures, wires and posts. The eight training systems were measured separately. Each training system grew in a separate area formed by three rows, and sampling was conducted in the central row. The samples were taken in consecutive 11-vine batches, i.e., replicates, which were used for 3D reconstruction and its comparison with ground truth. Ground truth was based on the dry biomass of the pruned branches. In total, 88 vines were shoot pruned and analyzed separately. The research areas were pruned towards the end of February. Over the previous seasons, vines had been pruned in the same manner and no differential treatments had been applied. The branches of every training system were collected together and immediately sent to the laboratory for dry biomass assessment. Branches were physically weighed both within the vineyard and following the drying operation. Grapevine shoots corresponding to each training system were individually dried for 48 h at 80 °C. After drying, the samples were weighed to determine the weight of the pruned branches of every training system. The dry biomass values were then contrasted with the volume values obtained from the three-dimensional models of the different training systems.

Sampling System
A data collection system was set up using a Kinect v2 as an RGB-D sensor-based system. The Kinect v2 employs an approach known as continuous wave (CW) intensity modulation, the most prevalent in ToF cameras. Light from an amplitude-modulated light source is backscattered by objects within the camera's FoV, and the phase delay of the amplitude envelope between the emitted and reflected light is measured [50]. For each pixel in the imaging array, the phase difference is converted into a distance measurement. Although the Kinect for Windows v2 sensor uses a different technology than the Kinect v1, it can likewise acquire depth, color and infrared information. The RGB camera records color data at a resolution of 1920 × 1080 pixels, whereas the IR camera captures depth maps and IR data in real time at a resolution of 512 × 424 pixels. The entire acquisition process runs at a framerate of up to 30 Hz. The depth-sensing field of view spans 70° horizontally and 60° vertically, offering a working range from 0.5 m to 4.5 m. Even though the previous Kinect was not suited to daylight outdoor acquisition, the Kinect v2 is far less affected by very bright conditions, provided that the sensor's lenses are not directly exposed to sunrays [50,51]. The sensor operating range meets the vineyard row inspection criteria, whilst non-interesting objects, such as those extremely close or in remote places that normally encompass adjacent vineyard rows, are disregarded. The sensor was attached sideways to an on-ground electric platform allowing data acquisition (Figure 2). An on-board computer (Intel Core i7-4771@3.5 GHz processor, Santa Clara, CA, USA, 16 GB RAM, NVIDIA GeForce GTX 660 graphics card) was used to connect the Kinect v2 sensor.
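The CW phase-to-distance conversion described above can be sketched numerically. This is a minimal illustration of the standard CW-ToF relations, not sensor-specific code; the 80 MHz modulation frequency below is purely an assumption for the example (the Kinect v2 actually combines several modulation frequencies to extend its unambiguous range).

```python
import math

C = 299_792_458.0  # speed of light, m/s

def phase_to_distance(phase_delay_rad, mod_freq_hz):
    """Convert the measured phase delay of the amplitude envelope to distance.
    The factor 4*pi (rather than 2*pi) accounts for the round trip of the light."""
    return C * phase_delay_rad / (4.0 * math.pi * mod_freq_hz)

def ambiguity_range(mod_freq_hz):
    """Maximum unambiguous distance before the phase wraps past 2*pi."""
    return C / (2.0 * mod_freq_hz)

# At an (assumed) 80 MHz modulation, a half-cycle phase delay maps to ~0.94 m,
# and the unambiguous range is ~1.87 m -- hence the use of multiple frequencies.
print(phase_to_distance(math.pi, 80e6), ambiguity_range(80e6))
```

This also makes clear why a single low modulation frequency trades depth precision for a longer unambiguous range.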

The vehicle followed a straight path, parallel to the vine row and through the inter-row space. In front of the platform, an extruded aluminum profile held the sensor in a height-adjustable manner. The platform is based on the Twizy Urban model (Renault, Boulogne-Billancourt, France), a commercial electric car with extremely compact dimensions, suited to sampling most crops and adjusted to vineyard plantations. Because data resolution is substantially affected by the speed of the mobile platform, the electric motor is advantageous: it produces negligible vibrations and supports very low speeds, below 3 km/h, which is vital for acquiring high-quality data.

3D Modelling Process
The algorithm developed by [52] was used to process the RGB-D data acquired with the Kinect v2 in order to obtain 3D reconstructions of the vineshoots for the eight grapevine training systems. This approach achieves satisfactory results when reconstructing large zones from the information provided by the Kinect sensor. It further develops the algorithm proposed by [53] in order to model large areas by fusing multiple overlapping depth pictures, storing information solely for the voxels nearest to the identified item, and retrieving the stored information through a hash table. This eliminates the need to keep a whole regular voxel grid in memory, resulting in significant computational savings. By virtue of its simplicity, the ray-casting method [54] is used to find the voxels in the 3D model intersected by each ray from the camera sensor, whose position is known; thus, the depth voxels are determined. The information collected from the ray-casting approach is used to compute the camera's position and orientation (six degrees of freedom) for each scene surface obtained. The final estimation is carried out using a variation of the iterative closest point (ICP) algorithm [55], which outputs a point cloud. This approach generated 3D models of the sampled vineyard plots using a desktop computer (Intel Core i7-6900K@3.2 GHz processor, 64 GB of RAM, NVIDIA GeForce GTX Titan X graphics card). Once the point clouds were computed, the next step was to filter them to remove unsought points. The output data were visualized as a 3D point cloud in CloudCompare (v2.9.1, GNU License, Paris, France) in order to manually delete the training structures, i.e., trellis systems and wooden posts, and the fixed physical references marking the starting and ending points of the vineyard rows. The final step consisted of automatically filtering isolated points: if the average distance between a point and its 64 nearest neighbors exceeds the standard deviation of the distances between all points, the point is termed an outlier. The 3D model reconstruction was simplified as a result of this procedure, and plant volume could be determined for every sampled plot using these models.
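The isolated-point filter described above can be sketched as follows. This is our reading of the criterion (one standard deviation above the mean neighbor distance, analogous to common statistical-outlier-removal filters), not the authors' exact code:

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_isolated_points(points, k=64):
    """Statistical outlier removal: discard any point whose mean distance to
    its k nearest neighbors exceeds the global mean of those distances by
    more than one standard deviation."""
    tree = cKDTree(points)
    # query k+1 neighbors because each point's nearest neighbor is itself
    dists, _ = tree.query(points, k=k + 1)
    mean_d = dists[:, 1:].mean(axis=1)          # per-point mean neighbor distance
    keep = mean_d <= mean_d.mean() + mean_d.std()
    return points[keep]
```

A point floating far from the vine row (e.g., sensor noise in the inter-row air) gets a large mean neighbor distance and is dropped, while points on the dense vineshoot surface are retained.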
To calculate the volume enclosed by the vineshoots for the different pruning systems, the alpha-shape algorithm [56] was used to wrap the corresponding set of 3D points, owing to its accurate performance [16,57,58]. The alpha-shape represents the contour that envelops a set of 3D points, and the α index indicates how tightly the body fits them [16,48,58]: the lower the value, the tighter the enveloping surface and the closer the shape matches the points, whereas higher values yield looser shapes. The goal of this research was to obtain a solid free of voids while enclosing the smallest volume, so determining an optimum value of α is critical for volume calculation. Alpha-shapes built with index values of 0.1, 0.3, 0.5, 0.7 and 0.9 were evaluated to find the most appropriate shape for the vineshoots. The R package alphashape3d [59] was employed to compute volumes according to a specific α index, and the vineshoot volume was automatically retrieved by the algorithm. The entire 3D modeling process can be seen in Figure 3.
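As a language-neutral illustration of what alphashape3d computes, the sketch below follows the classical alpha-shape idea: keep the Delaunay tetrahedra whose circumradius does not exceed α, then sum their volumes. This mirrors the definition, not the package's exact implementation, and the test geometry (a unit cube) is an assumption for demonstration:

```python
import numpy as np
from scipy.spatial import Delaunay

def tetra_circumradius(p):
    """Circumradius of a tetrahedron given as a (4, 3) array of vertices.
    The circumcenter c solves 2*(p_i - p_0) . c = |p_i|^2 - |p_0|^2."""
    A = 2.0 * (p[1:] - p[0])
    b = (p[1:] ** 2).sum(axis=1) - (p[0] ** 2).sum()
    try:
        c = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        return np.inf  # degenerate (coplanar) tetrahedron: exclude it
    return np.linalg.norm(c - p[0])

def alpha_shape_volume(points, alpha):
    """Volume of the 3D alpha-shape: union of Delaunay tetrahedra whose
    circumradius is at most alpha (smaller alpha -> tighter fit)."""
    vol = 0.0
    for simplex in Delaunay(points).simplices:
        p = points[simplex]
        if tetra_circumradius(p) <= alpha:
            # tetrahedron volume = |det of edge matrix| / 6
            vol += abs(np.linalg.det(p[1:] - p[0])) / 6.0
    return vol
```

For a very large α the result approaches the convex-hull volume; as α shrinks, large empty tetrahedra are discarded and the solid hugs the points, which is exactly the trade-off evaluated with the 0.1-0.9 index values above.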

Statistical Analysis
Actual field measurements of plant dry biomass, i.e., ground truth, were compared with the RGB-D-based 3D models according to each vine training system. The dataset was processed statistically to assess the capabilities of the system for crop characterization in terms of geometric reliability, i.e., vineshoot 3D modeling. Least-squares regression analyses were used to assess the prospective capabilities of the system for quantifying the dry biomass derived from the pruned shoots. Accordingly, for all the training systems, simple linear regressions were used to determine Pearson's correlation coefficients, as well as their related standard errors, in the assessment of best fit. ANOVA was used to test the effect of the number of 3D points and the volumes measured with the Kinect v2 on the ground-truth values, i.e., to establish a relationship between the RGB-D information and the actual volume and dry biomass parameters. A series of F-tests at a significance level of 0.05 was run to eliminate non-significant factors from the regression parameters. Furthermore, this approach was used to produce estimated 95% confidence intervals (CI), allowing a more direct assessment of the different training systems and regression coefficients. Influential observations exerting an undue effect on the regression models were filtered using the standardized difference in fits (DFFITS) criterion as a cutoff; in contrast, no high-leverage outliers exerting a strong influence were flagged according to the Mahalanobis index. Analyses were performed in SPSS v28 (IBM SPSS Statistics).
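The influential-observation screening described above can be sketched as follows. The cutoff 2·sqrt(p/n) is the conventional DFFITS rule of thumb; the exact cutoff used in SPSS is not stated in the text, so this is a sketch under that assumption:

```python
import numpy as np

def dffits_keep_mask(x, y):
    """Fit y = b0 + b1*x by least squares and flag influential observations
    with the DFFITS criterion (cutoff 2*sqrt(p/n)). Returns a boolean mask
    of observations to keep."""
    n, p = len(x), 2
    X = np.column_stack([np.ones(n), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat (projection) matrix
    h = np.diag(H)                            # leverages
    resid = y - H @ y
    s2 = (resid ** 2).sum() / (n - p)
    # leave-one-out (deleted) residual variance, via the standard identity
    s2_del = ((n - p) * s2 - resid ** 2 / (1 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_del * (1 - h))     # externally studentized residuals
    dffits = t * np.sqrt(h / (1 - h))
    return np.abs(dffits) <= 2.0 * np.sqrt(p / n)
```

Refitting the regression on the kept observations is what produces the improved R² values reported later for the Single Curtain and Smart Dyson systems.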

Results and Discussion
Three-dimensional modelling is of high value since the architecture of vines varies within the same training system and, significantly, from one form of training to another. The use of a digital system for geometric reconstruction is of vital importance for further decision-making processes. The several training systems were evaluated with differing results. Geometrical complexity constitutes a considerable hindrance to accurate measurement; however, despite these geometrical intricacies, the RGB-D-based statistics and dry biomass measurements were in most cases significantly related at p < 0.05, and the regression models showed a good fit (Table 2). Thus, increasing numbers of 3D points, associated with their respective point clouds, and higher volume values corresponded to increasing biomass values, i.e., ground truth. Regarding volume calculation, multiple evaluations were carried out with various values of the α index, revealing that the alpha-shape formed with α = 0.1 was the most accurate representation of the vineshoots' true contour. The training systems had a varying effect on the relationship between the investigated parameters, mainly distinguishing between horizontally divided canopies, non-horizontally divided (vertical) canopies, and the head-pruned vines that form a kind of goblet shape as a special form of training. Gobelet is planted with no trellis wires, just one supportive stake per vine, producing a trunk crowned by vineshoots. The measured vineshoot dry biomass versus the calculated volumes, and the 3D points linked to each volume, were therefore explained by linear models showing noticeable differences in slope and intercept for each training system (Table 2).
According to the regression models between Kinect v2 volume (RGB-D volume hereafter) and vineshoot dry biomass, the linear relationships revealed significant but also inconsistent correlations. When data were analyzed for each training system, some linear models showed very strong correlations (R² > 0.90), while others related the RGB-D measurements, i.e., vineshoot volume, to the dry biomass values much more poorly (R² < 0.50 and even close to 0.001). Additionally, the GDC relationship had a negative slope, which contradicts the expected behavior of the system, and the Y-Trellis produced a near-horizontal regression line (Figure 4). The regression models were significant at p < 0.05, with the exception of the Lyre, GDC and Y-Trellis, indicating that a higher vineshoot volume corresponded to increasing measured ground truth, with R² up to 0.92. Even for the free-standing Gobelet training system, given the width of the 95% CI for the slope parameter, a significant relationship between RGB-D volume and dry biomass was demonstrated despite the weak coefficient of determination (R² = 0.47).
Analogously, a similar situation was seen for the linear model between RGB-D 3D points (Kinect v2 3D points) and dry biomass (Figure 5). In this instance, the system demonstrated discriminating capabilities for the Curtain, VSP, and even the Pergola and Smart Dyson training systems, with a relatively high degree of agreement (0.66 < R² < 0.88). Additionally, even though the correlation for the Gobelet-trained vines is low (R² = 0.51) when comparing the number of 3D points against dry biomass, the system showed a meaningful relationship (Table 2). It is worth noting, with reference to the Kinect v2's predecessor, that one of the primary shortcomings of the Kinect v1 was its absolute blindness in strong daylight. This issue arises because the Kinect v1 measures depth by triangulating the position of an infrared pattern of points projected onto the scene. Marinello et al. [60] proposed employing a Kinect v1 as a technique for quickly determining the three-dimensional structure of a grapevine canopy. Despite the degraded functioning under sunlight conditions, they found significant linear relationships: their experiments demonstrated a high degree of concordance between the outcomes of digital analysis and manual measurements for two distinct grapevine varieties, with a coefficient of determination of R² = 0.76. In contrast, the Kinect v2 employs a completely different measurement principle (time-of-flight) that should at least partially alleviate the problem of daylight interference. Nevertheless, although the Kinect v2 outperformed its predecessor [61], the sensor showed its limits in broad daylight when the RGB-D measurements were taken, with worse effects in the assessment of horizontally divided vines. GDC bears a divided canopy with branches trained downward into two horizontal, parallel, free-hanging curtains. Comparably, the Lyre system is a horizontally divided, inverted GDC.
The Y-Trellis also divides the canopy using the arms of Y-shaped posts, i.e., the vineshoots are allowed to trail on wires fixed apart on the inclined surfaces of the 'Y'. These systems are thus similar in using trellises aimed at separating the vine's canopy, thereby increasing sun exposure of the grape bunches. However, this architecture builds up two curtains of vineshoots which are spaced apart, i.e., opposite each other, with the far curtain at its base outside the Kinect v2's effective functioning range. Although the nominal depth operating range of the sensor is 0.5-4.5 m [62], the maximum depth accuracy decreases when the sensor is used outdoors [63]. In particular, experiments undertaken outdoors under varying daylight illumination conditions indicate that the sensor is capable of acquiring correct depth readings up to 1.9 m on sunny days and up to 2.8 m on cloudy days [51]. Since the mobile platform proceeded parallel to the row along the inter-row area, the raw data comprised RGB-D measurements of the vine rows' lateral surface. Hence, elements positioned beyond the operative range resulted in sparse 3D reconstructions. Owing to this particular trellis architecture, this behavior might have been expected, especially for the vines trained to Lyre, GDC and Y-Trellis, in line with their low R² (Table 2).
Furthermore, not only was the real depth range lower during the RGB-D readings, but the occlusion of the overlapped vineshoots might also have contributed to poor 3D models. This is explained by the fact that the two divided canopies were in an elevation view from the side of the sensor, and therefore the opposite canopy became hidden while the Kinect v2 was measuring. It should also be noted that factors such as temperature fluctuations or color variations with varying reflectance may impair the reliability of depth determinations. This issue was mitigated here since the surveyed vines were leafless during the field trials, leaving only wooden components visible without foliage and resulting in a uniform 3D color environment [64].
Similarly, as mentioned above, Gobelet-trained vines led to some inconsistencies when comparing dry biomass with both volume and 3D points (R² = 0.47 and 0.52, respectively). Although Gobelet is the simplest system, merely a stake in the ground to which the vine is tied with minimal training, the vine is crowned by vineshoots bending down towards the sensor.
Consequently, the mobile platform had to be offset from the center of the alley. In order to avoid null RGB-D values, the sensor had to be moved back from the path, i.e., not equidistant from the center, by more than 0.5 m, the official value, since the minimum operative distance actually ranges from 0.6 m to 0.75 m [65]. This resulted in a lack of 3D digitized information from those vineshoots hanging further down the opposite side of the Gobelet vines, since they were beyond the reach of the effective RGB-D measuring distance. A possible solution for assessing these horizontally divided training systems, including the head-trained vines, could be to scan both sides of the crop rows.
By way of contrast, the vertical canopy training systems correlated well in both cases when comparing RGB-D volume and RGB-D 3D points against measured dry biomass ground truth: Pearson's correlation coefficients varied from 0.67 up to a top value of 0.92 (Figures 4 and 5). This can be explained by the architecture of these training systems, since the canopy, and thus the vineshoots, are virtually contained in a plane, i.e., a curtain parallel or at an angle to the movement direction of the sensor. These systems allow the vineshoots to grow on only one side of the trellis, in contrast to horizontally divided canopies. In the case of the Single Curtain, the upward growth of the shoots is supported by stakes promoting a similar branch architecture. In a similar manner, Smart Dyson uses a high-low approach, with upward and downward shoots facing the direction of the Kinect v2: shoots growing from upward-facing spurs are oriented vertically upward, whereas those developing from downward-facing spurs are similarly oriented downward. Though these systems form a desirable geometry, influential points were flagged according to the DFFITS criterion when assessing the linear regressions.
After removing these influential points, the regression model for Single Curtain improved from R² = 0.51 to 0.67 and from R² = 0.58 to a more meaningful correlation (R² = 0.75) when comparing dry biomass against estimated volume and RGB-D 3D points, respectively. Likewise, the regression models associated with Smart Dyson improved from R² = 0.44 to 0.92 and from R² = 0.64 to 0.72. However, the limited number of measured vines could influence this tendency, and a higher number of sampling points may be required to validate the system. A significant relationship was found for both systems (p < 0.05), consistent with the expectation that more RGB-D measurements track increasing ground truth vineyard biomass. Pergola, on the other hand, consists of a series of metal poles positioned at an angle to support the branches, which hang free and point towards the sensor. This trellis also arranges the vine in a 3D orientation well suited to the sensor position during RGB-D measuring, resulting in a high R² of 0.81 when comparing volume and dry biomass, decreasing to 0.66 for the RGB-D 3D points. The vertical shoot-positioned (VSP) trellis maintains canopy form and foliage separation in narrow row spacings; shoots emerging from spurs are directed upward between the sets of training wires on the trellis. This geometry led to the most consistent relationship with dry biomass, whether using estimated volume or 3D points: a strong correlation was observed between the parameters estimated by the RGB-D digital model and their ground truth, with R² values ranging from 0.82 to 0.88. Although the vineshoots are relatively thin, these trellising systems help to avoid non-visible parts, an inherent limitation of this type of optical measurement.
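The regression statistics reported throughout this section (the fitted line, R², and Pearson's r) can be reproduced from paired volume/biomass samples as in the NumPy sketch below. The variable names and units are illustrative, not the study's data; note that for a simple linear fit, R² equals the square of Pearson's r.

```python
import numpy as np

def fit_biomass_model(volume, biomass):
    """Least-squares fit of dry biomass against RGB-D estimated volume.

    Returns slope, intercept, R^2 and Pearson's r for a simple linear
    model. Units would be e.g. m^3 for volume and kg for biomass.
    """
    volume = np.asarray(volume, dtype=float)
    biomass = np.asarray(biomass, dtype=float)
    slope, intercept = np.polyfit(volume, biomass, 1)
    pred = slope * volume + intercept
    # Coefficient of determination from residual and total sums of squares
    ss_res = np.sum((biomass - pred) ** 2)
    ss_tot = np.sum((biomass - np.mean(biomass)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Pearson's correlation coefficient
    r = np.corrcoef(volume, biomass)[0, 1]
    return slope, intercept, r2, r
```

The same helper applies to the 3D-point counts used as the second predictor: only the first argument changes.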
Furthermore, readings were acquired from defoliated canopies, a significant benefit when RGB-D imaging the vine structure, since leaf occlusion is a major constraint on 3D reconstruction [31,66]. In addition, even though only one side was measured with the RGB-D camera, this architectural structure enabled a high correlation while avoiding sparse 3D models and significantly reducing isolated points. A further way to enhance RGB-D sensor performance, independent of the training system, is to increase the time spent on each scene. The Kinect v2 is designed as a ToF camera, so each pixel of the collected depth maps contains a depth measurement to the nearest object. Moreover, unlike other RGB-D cameras, the modulation frequency and integration time of the Kinect v2 cannot be adjusted [67], which increases the standard deviation of the measured distances across the working range.
Nonetheless, given today's increased computing capacity, this distance inhomogeneity can be mitigated by lowering the traveling speed of the sensor to obtain more robust data through averaging, i.e., integrating more redundant RGB-D information for the same scene.
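The frame-averaging idea can be sketched as follows: when the platform moves slowly enough that several consecutive depth maps cover the same scene, the per-pixel noise shrinks roughly with the square root of the number of frames averaged. This is an illustrative NumPy sketch, not the study's processing pipeline; the validity bounds follow the nominal 0.5-4.5 m Kinect v2 range (the study found 0.6-0.75 m to be the practical minimum), and Kinect v2 depth is reported in millimetres with 0 marking invalid pixels.

```python
import numpy as np

def fuse_depth_frames(frames, valid_min=500, valid_max=4500):
    """Average repeated ToF depth maps of a (quasi-)static scene.

    frames: list of equally shaped depth maps in millimetres, 0 = invalid.
    Out-of-range readings are masked out before averaging, so a pixel's
    fused value uses only its valid observations.
    """
    stack = np.stack(frames).astype(float)
    # Mask invalid / out-of-range readings so they do not bias the mean
    stack[(stack < valid_min) | (stack > valid_max)] = np.nan
    fused = np.nanmean(stack, axis=0)              # per-pixel mean depth
    n_valid = np.sum(~np.isnan(stack), axis=0)     # observations per pixel
    return fused, n_valid
```

Returning the per-pixel count of valid observations also allows downstream steps to discard pixels supported by too few frames.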
In light of these results, a significant degree of agreement was identified between ground truth dry biomass and its RGB-D surrogates, vineshoot volume and the number of 3D points in the corresponding point clouds. These outcomes show that combining RGB-D volume and 3D points yields a more robust approach for assessing the geometrical structure of vineshoots. Nonetheless, the typical average shoot diameter is 10 mm, which depth cameras cannot measure properly, resulting in poor or non-existent reconstructions. Moreover, for all the training systems, the terminal parts of the shoots are so thin that the Kinect v2 cannot properly acquire RGB-D information from them [67], losing input information for the regression models. Although in this study the RGB-D data was captured at full field of view (FoV), the Kinect v2 allows a range of FoVs to be selected, which varies the sensor performance. Similarly, García-Fernández et al. [68], using photogrammetric procedures, i.e., structure from motion (SfM), determined the pruning weight in two vineyards with satisfactory results. The vegetation volume was determined from the generated 3D point clouds, and the pruning weight was estimated using linear regression analysis, with R² ranging from 0.66 to 0.71 and from 0.56 to 0.68 for the two research vineyards. In the present study, although the end details of vineshoots were difficult to 3D model, the RGB-D-based system proved able to estimate vineshoot dry biomass accurately. The system can discriminate four of the eight studied training systems in terms of volume and 3D point estimation: a differentiation can be made between horizontally divided (Lyre, GDC and Y-Trellis) and non-horizontally divided canopy vines (Pergola, Single Curtain, Smart Dyson and VSP), with the head-trained Gobelet standing alone.
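As a reference for how a volume can be derived from a point cloud, one common approach is voxel occupancy: bin the points into cubic cells and multiply the number of occupied cells by the cell volume. This sketch is only illustrative and is not necessarily the computation used in this study or in [68]; the 2 cm voxel edge length is an assumed value.

```python
import numpy as np

def voxel_volume(points, voxel=0.02):
    """Estimate canopy volume from a 3D point cloud via voxel occupancy.

    points: iterable of (x, y, z) coordinates in metres.
    voxel:  cubic cell edge length in metres (assumed value).
    The result is (number of occupied cells) * voxel^3.
    """
    idx = np.floor(np.asarray(points, dtype=float) / voxel).astype(int)
    occupied = {tuple(v) for v in idx}   # unique occupied cells
    return len(occupied) * voxel ** 3
```

Because occupancy counts each cell once, duplicate or densely clustered points do not inflate the estimate, which makes the measure fairly robust to the redundant readings produced by frame averaging.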
This procedure, aimed at obtaining 3D information and thus, indirectly, dry biomass, reveals the capabilities of this low-cost sensor for three-dimensional measurement. Moreover, multiple environmental factors, such as soil parameters and access to nutrients and water, influence the length and width of dormant vineshoots [69], so gaining more insight would enable map-based variable rate application or water management programs. The Kinect v2 can therefore be considered a trustworthy proximal sensing technology for this purpose, owing to its capability of reconstructing large crop areas in 3D at high geometrical resolution. Thus, pruning wood estimation can be a valuable alternative for assessing the spatial variability of vine vigor in order to perform site-specific input application, i.e., irrigation or fertilization, and to adapt vineyard management operations such as pruning to improve vine balance and grape quality outcomes [14]. This can aid in mapping vine vigor by adding a further data layer that can be used to demarcate management zones. Furthermore, the results demonstrated that the Kinect v2, as an RGB-D camera, can rapidly and accurately 3D digitize winter dormant shoots prior to pruning. This study contributes to the assessment of the capabilities of this RGB-D camera to obtain reliable 3D information. It can also provide insight into pruning practices, since one way to measure the vigor of a vine is to weigh its winter cane prunings for one growing season. If a certain biomass is then assigned to each shoot, the optimal number of shoots per vine can be calculated from its current vigor, and shoot density can be altered by thinning unnecessary shoots or by adjusting the degree of pruning [15].
Moreover, with proper feature-extraction algorithms, other prospective precision agriculture practices can be implemented while performing monitoring tasks such as robotic pruning, i.e., accurate estimation of the cut-points on the branches for automated dormant pruning [70], or bunch detection and counting [42]. An accurate 3D model of crop biophysical characteristics would therefore allow approaches such as automated pruning, irrigation systems and variable rate technologies to be implemented.

Conclusions
Proximal 3D sensing characterization can help to assess dormant shoot pruning weight as a good estimate of vigor, and thus to foresee the expected yield, informing tasks such as pruning directives and the delimitation of management zones based on vine vigor. The proposed methodology can help to achieve the aim of developing a highly detailed 3D model of an entire crop. A self-developed mobile platform carrying a Kinect v2 as a low-cost RGB-D camera accomplished the objective of 3D reconstructing the vineshoot structure of eight different training systems while the vines were completely defoliated during winter dormancy. The reconstruction of the training systems based on vertical trellises showed strong correlations in both analyses, comparing vineshoot dry biomass against the calculated RGB-D volume and the associated 3D points. Dense point clouds significantly influenced the relationship with ground truth dry biomass measurements in four of the eight training systems (Pergola, Single Curtain, Smart Dyson and VSP). The capabilities of the Kinect v2 sensor as used in this method lead to the conclusion that the choice of approach must be based on the target application. Further work should focus on acquiring more depth information by fusing additional series of three-dimensional point clouds from other angles; the density of the point clouds can thus be increased to improve the 3D reconstruction, which could be a faster alternative to measuring both sides of the vine rows. To improve the efficacy of the proposed method, additional research must be undertaken to determine the optimal measurement settings, such as sensor height, distance to the crop rows or vehicle speed, and to adapt them to each training system.