1. Introduction
Grapevine is one of the most significant perennial crops globally, with approximately 7.1 million hectares under cultivation in 2024. Its production is distributed across Europe, Asia, the Americas, Africa, and Oceania [1]. Viticulture is an economically and culturally significant sector whose impact extends through complex supply and distribution networks. Beyond the direct financial benefits generated by wine sales, which support a wide range of stakeholders including wine companies, their employees, grape growers, and landowners, the sector also delivers substantial indirect benefits. These include landscape preservation, biodiversity conservation, the provision of key ecosystem services, and the promotion of enotourism, which together give the sector a broad societal and environmental impact [2].
Climate change is reshaping viticultural suitability worldwide, leading not only to increasing challenges in many traditional wine-growing regions but also to the emergence of new areas with potentially favorable conditions for grapevine cultivation [3].
Grapevine production is becoming increasingly demanding, and climate-driven shifts in suitability point to a potential expansion of vineyard areas. Both trends increase the need for adaptable and scalable production technologies capable of operating across diverse and changing environments. Given the global scale of the sector, even marginal gains in efficiency or reductions in production costs can translate into substantial economic benefits.
Despite the considerable importance of grape production in the international market, the level of automation in this sector remains very low. Mechanization is key to the timely execution of operations and to reducing production costs. For tasks that require selective decision-making, substantially greater efficiency can be achieved with robotic systems supported by computer vision [4].
Robots for automated grapevine pruning based on deep learning methods can be considered achievable in the medium term. In this context, particular emphasis is placed on developing and evaluating artificial intelligence (AI)-based solutions that can run on portable devices with limited computational resources [5].
Manual pruning of fruit crops is highly labor-intensive and accounts for a substantial share of annual labor costs in fruit production [6]. This is particularly evident in vineyards, where operational constraints and cost pressures limit the use of large-scale machinery. Consequently, there is growing research interest in compact and flexible robotic platforms capable of precise pruning in heterogeneous working environments, especially where conventional mechanized solutions prove inadequate [7]. Moreover, pruning is crucial for the subsequent quality of the grapes [8].
Automating pruning requires overcoming multiple challenges in both robotic manipulation and the perception of the working environment. Effective execution depends on the ability of robotic systems to accurately identify appropriate pruning points [9].
Although mechanical winter pruning has been shown to reduce production costs, it has not been widely adopted. The economic drivers of full mechanization, particularly labor availability and labor costs, have not provided a sufficient incentive for broad implementation. Other factors have also played a significant role, including the perception among many growers that grapes and wine produced from mechanically pruned vines are inferior to those obtained through manual pruning, despite evidence to the contrary [10].
Artificial intelligence has the potential to substantially transform the agricultural sector by enabling more efficient and sustainable farming systems [11]. In the wine industry, AI has already improved product consistency, operational efficiency, quality control, and safety across the entire production chain. AI-driven systems enable real-time monitoring, predictive modeling, and enhanced traceability, supporting both regulatory compliance and consumer trust. Despite these benefits, challenges related to data integration, implementation costs, model transparency, and ethical considerations remain. Addressing these issues is essential for the responsible adoption of AI and for realizing its full potential in sustainable and precision winemaking [12]. A further step is the systematic implementation of AI-based solutions in viticulture to support growers in managing fertilization, protection against pathogens, pests, weeds, and adverse weather conditions, and canopy management, of which both summer and winter pruning are essential components. In practice, AI-based solutions are often combined with more classical analytical approaches. One example is principal component analysis (PCA), which can support the estimation of branch orientation [13] and may facilitate the development of autonomous pruning systems.
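To make the PCA idea concrete, the sketch below estimates the dominant orientation of a set of 2D points, for example pixel coordinates sampled along a branch. It is a minimal NumPy illustration of the general technique, not the method of [13], and the function name `principal_orientation` is hypothetical.

```python
import numpy as np

def principal_orientation(points: np.ndarray) -> float:
    """Return the angle (degrees, in [0, 180)) of the dominant axis of a 2D point set.

    `points` is an (N, 2) array of (x, y) coordinates, e.g. pixels sampled
    along a branch. The angle is measured from the positive x-axis.
    """
    centered = points - points.mean(axis=0)
    # 2x2 covariance matrix of the coordinates
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    angle = np.degrees(np.arctan2(major[1], major[0]))
    # An axis has no preferred direction, so fold the angle into [0, 180)
    return float(angle % 180.0)
```

In image coordinates the y-axis usually points downward, so the sign convention of the angle depends on how the points are supplied.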
Deep learning is widely regarded as a core component of artificial intelligence and data science. Unlike traditional machine learning and data mining approaches, deep learning can learn highly sophisticated data representations directly from large volumes of raw data, which makes it an effective framework for a wide range of real-world problems [14].
Convolutional neural network (CNN) architectures provide a flexible and conceptually straightforward deep learning framework for a wide range of perceptual tasks [15]. The term convolutional neural network indicates that the architecture uses the convolution operation as a core linear transformation in place of general matrix multiplication in at least one layer. A CNN typically consists of one or more convolutional layers followed by fully connected layers, as in standard multilayer neural networks. Owing to their effectiveness, CNNs are among the most widely adopted approaches in image recognition and computer vision and represent a prominent class of models in data science [16].
Object detection techniques are a fundamental component of artificial intelligence, and the CNN-based YOLO (You Only Look Once) algorithm represents one of the major directions in the development of modern object detection methods [17]. YOLO formulates object detection as a regression task that predicts spatially distributed bounding boxes together with their class probabilities. A single neural network processes the complete input image in one forward pass to estimate both object locations and categories. Because the entire detection process is implemented within a single network, it can be optimized end to end for detection performance [18]. YOLO models support multiple computer vision tasks, including object detection, image segmentation, image classification, pose estimation, and oriented object detection. Object detection identifies and localizes objects in images or video frames using bounding boxes. Image segmentation extends this by assigning pixel-level labels to image regions based on their content. Classification assigns a single label to an entire image according to its visual characteristics. Pose estimation detects predefined keypoints to represent object structure or motion. Oriented object detection extends conventional detection with object orientation, enabling accurate localization of rotated objects [19]. The application of YOLO to agricultural object recognition represents a significant advance in digital agricultural tools. Its ability to identify and localize multiple objects in real time enables reliable systems for detecting and localizing diverse agricultural elements, and its single-pass processing is particularly effective in dynamic, resource-demanding agricultural environments spanning large and heterogeneous field areas [20].
The transition from Agriculture 4.0 toward emerging Agriculture 5.0 paradigms is expected to redefine intelligent crop monitoring systems, prioritizing increasingly complex detection tasks and improved monitoring methods, including real-time, autonomous, and multitasking capabilities. Within this paradigm, artificial intelligence and big data are fundamental enabling technologies that underpin advanced decision support systems and predictive analytics. Future developments will therefore emphasize automated decision-making, fully unmanned field operations, and a progressive reduction in direct human intervention, driven by increasingly sophisticated AI solutions [21,22,23,24].
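The convolution operation that defines CNNs, as described in the CNN discussion above, can be illustrated with a minimal NumPy sketch. Like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped), and it ignores padding, stride, channels, and bias for clarity; the function name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation, the operation most deep learning
    libraries implement under the name 'convolution'."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # The same small kernel (shared weights) is applied at every
            # position, instead of a dense weight for every input-output pair.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responds where intensity changes from left to right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d_valid(image, kernel))
```

The response is 2 only in the middle column, where the window straddles the 0-to-1 edge, and 0 elsewhere. The 2 × 2 kernel has 4 weights shared across all positions, whereas a fully connected mapping from the 12 input pixels to the 6 outputs would need 72.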
Considering the potential of convolutional neural network-based YOLO algorithms in agricultural applications, the objective of this study was to develop a method for estimating winter pruning points and cutting lines of grapevines based on segmentation masks generated by YOLO models. The novelty of the study lies in combining deep learning-based instance segmentation with deterministic geometric reasoning for grapevine pruning support under conditions characteristic of colder-climate viticulture. This approach is motivated by the limited availability of prior research addressing viticulture systems prevalent in Central and Eastern Europe, where grapevine training systems and winter pruning strategies differ substantially from those commonly studied in warmer viticultural zones. Accordingly, the study evaluates the feasibility of a vision-based, geometry-driven pruning support method under leafless winter conditions and provides a methodological foundation for future robotic pruning systems adapted to grapevine-growing regions characterized by cold winters and region-specific training practices.
3. Results
The analysis of the results presented in Table 4 indicates noticeable differences in the performance of individual YOLO model generations, depending on both the evaluation level and the analyzed class. At the overall level, models from the YOLOv8 family achieved the highest precision values, indicating stable and selective predictions with a limited number of false detections. At the same time, YOLO11 models obtained comparable, and in selected cases slightly higher, results in metrics reflecting detection completeness and quality evaluated across a wider range of IoU thresholds. This pattern suggests a different balance in the trade-off between precision and recall rather than a clear superiority of one architecture over the other.
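For reference, the metrics discussed above can be computed as in the following sketch. It assumes that predictions have already been matched to ground-truth objects, with a prediction counted as a true positive when its mask IoU with a ground-truth mask reaches the chosen threshold (for example 0.5); the matching rules of the evaluation toolkit used in the study are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Detection metrics from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among all predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # found among all ground-truth objects
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return precision, recall, f1

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks of equal shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 0.0
```

For example, 8 true positives, 2 false positives, and 4 false negatives give a precision of 0.80, a recall of about 0.67, and an F1-score of about 0.73.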
These differences become more pronounced in the per-class analysis. For the headingCut class, YOLOv8 models achieved the best results in terms of precision, F1-score, and detection quality across multiple IoU thresholds, which indicates their stability and effectiveness for objects with a relatively well-defined visual structure. Only in terms of recall did the YOLO11s-seg model obtain the highest value, indicating a tendency toward more complete coverage of objects in this class.
For the rejectingCut class, a different pattern was observed. Although YOLOv8 models maintained the highest precision values, YOLO11 architectures achieved better results in metrics reflecting detection completeness, the balance between recall and precision, and prediction quality averaged across a wide range of IoU thresholds. This indicates that, for this less visually complex class, YOLO11 models captured the full range of object appearance variability more effectively. In contrast, YOLO26 models showed the weakest performance across all evaluated configurations.
The analysis of the results presented in Table 5 indicates that the extended annotation strategy changed the distribution of performance across the evaluated YOLO models. The highest precision was achieved by YOLOv8n-seg; however, this advantage was not consistently reflected in the remaining metrics. YOLO11s-seg achieved the highest values for all other evaluation metrics, which suggests that this architecture made more effective use of the additional information contained in the extended labels.
For the headingCut class, no single model clearly dominated. The highest precision, defined as the proportion of correct detections among all model predictions, was obtained by YOLOv8n-seg. The best recall and F1-score values were achieved by YOLO11n-seg, whereas the highest mAP values across multiple IoU thresholds were obtained by YOLOv8s-seg.
For the rejectingCut class, the overall pattern was consistent with the general evaluation results. YOLO11 models achieved the highest performance for most metrics, with the exception of precision, where YOLO26s-seg obtained the best result. Apart from this single metric, and as observed with the simplified annotation strategy, YOLO26 models showed the weakest performance at all evaluation levels.
Figure 10 presents a comparison of correctly assigned segmentation masks obtained from models trained using both the simplified and extended annotation strategies. The primary difference between the two approaches is that the extended strategy additionally includes segmentation of the arm region. However, this difference did not affect the final correctness of the detected cutting regions in the presented cases. Dark blue indicates the headingCut class, and light blue indicates the rejectingCut class.
Although the trained models were able to detect nonvisible buds based solely on the visible node on the opposite side of the shoot axis, some cases were not correctly identified, as illustrated in Figure 11. In these images, yellow markers indicate the node swelling behind which a bud is located. Orange markers denote where the upper boundary of an ideally assigned mask should lie; this boundary would serve as the reference for cutting point and cutting line estimation. Red markers show the cutting points that would actually be generated from the predicted headingCut segmentation masks, shown in blue.
Table 6 summarizes the effectiveness of cutting point and cutting line estimation from segmentation masks for both annotation strategies. At the overall level, cutting point prediction achieved higher accuracy than cutting line prediction regardless of the annotation method. This indicates that estimating a single decision point is less sensitive to minor geometric inaccuracies in the segmentation masks than reconstructing a complete cutting line together with its inclination angle.
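The exact correctness criteria behind Table 6 are not restated in this section. The sketch below assumes a simple scheme to show why line accuracy cannot exceed point accuracy when lines are judged more strictly: a point is correct within a distance tolerance, and a line must additionally match the reference inclination within an angular tolerance. The function names and both tolerances are hypothetical placeholders.

```python
import math

def angle_diff_deg(a: float, b: float) -> float:
    """Smallest difference between two undirected line angles, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def evaluate(pred, ref, max_dist_px=10.0, max_angle_deg=15.0):
    """Return (point_correct, line_correct) for one predicted cut.

    `pred` and `ref` are (x, y, angle_deg) tuples. The tolerances are
    hypothetical placeholders, not values from the study.
    """
    dist = math.hypot(pred[0] - ref[0], pred[1] - ref[1])
    point_ok = dist <= max_dist_px
    # A line must satisfy the point criterion AND the orientation criterion,
    # so line accuracy can never exceed point accuracy under this scheme.
    line_ok = point_ok and angle_diff_deg(pred[2], ref[2]) <= max_angle_deg
    return point_ok, line_ok
```

Under such a scheme, any error in the estimated inclination lowers line accuracy without affecting point accuracy, which matches the direction of the gap reported in Table 6.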
A comparison between the simplified and extended annotation strategies reveals relevant differences in the error characteristics of the decision pipeline. Under the simplified labeling strategy, both cutting point and cutting line predictions achieved high overall effectiveness, which suggests that the pipeline is well aligned with data of lower semantic complexity. At the same time, line prediction quality was clearly lower than point prediction quality, which confirms that line estimation is more sensitive to the precision and geometric quality of the input masks.
The per-class analysis reveals additional differences between the headingCut and rejectingCut classes. For the headingCut class, the effectiveness of cutting point estimation and cutting line estimation decreases when the extended annotation strategy is applied. This may indicate that richer labeling introduces greater geometric variability, which complicates the determination of stable cutting lines, particularly in cases characterized by less distinct visual structure.
A different pattern is observed for the rejectingCut class. In this case, cutting point and cutting line estimation remain at a high level regardless of the annotation strategy. Because the definition of rejectingCut labels is identical in both strategies, the differences between the results obtained with the simplified and extended strategies do not arise from changes to the rejectingCut annotations themselves. Instead, they are an indirect effect of changes in the segmentation quality of other structures, which influence the behavior of the entire decision pipeline.
Figure 12 presents a comparison of correctly estimated cutting points and cutting lines obtained from segmentation masks predicted by YOLO models using both simplified and extended annotation strategies. In the presented cases, the applied annotation strategy did not affect the final correctness of the estimated cutting points and cutting lines.
Figure 13 presents a comparison of cutting points and cutting lines estimated from segmentation masks predicted by YOLO models using both simplified and extended annotation strategies. The results include both correct and incorrect estimations, with errors primarily associated with the extended annotation approach. In these cases, the annotation strategy influenced the final correctness of cutting point and cutting line estimation. The reduced performance can be attributed to the more complex geometric structure of cutting regions in the extended strategy, for which the PCAcutSeg-V method was not fully adapted.
Table 7 presents the results of real-time evaluation of the estimation of cutting points and lines based on segmentation masks generated by the YOLO11s-seg model trained on the dataset labeled using the simplified annotation strategy. The evaluation was performed at both the overall level and separately for the headingCut and rejectingCut classes. The obtained results indicate high effectiveness of the proposed pipeline under real-time conditions. In general, the estimation of cutting points achieved slightly higher accuracy than the estimation of cutting lines, which is consistent with the observations obtained in the offline analysis. The per-class analysis reveals comparable performance for both analyzed classes, with slightly more stable predictions observed for the rejectingCut class. Overall, the results confirm that the proposed method maintains reliable performance when applied in a real-time processing environment.
Figure 14 shows examples of correct real-time estimations of cutting points and cutting lines using the PCAcutSeg-V method, based on segmentation masks from the YOLO11s-seg model trained on data labeled with the simplified annotation strategy.
Figure 15 shows examples in which cutting locations were still estimated after a cut had already been made on the basis of earlier estimates. This is a valuable practical observation, because in such situations the control system of an autonomous robot must be able to recognize that the cut has already been performed and avoid making it again.
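One possible safeguard, sketched below under stated assumptions, is to remember where cuts have already been executed and to discard new estimates that fall within a small radius of them. The class `CutMemory` and its radius are hypothetical. The sketch assumes that cut points are expressed in a coordinate frame that does not move with the camera (for example, the robot base frame), since image coordinates change from frame to frame.

```python
import math

class CutMemory:
    """Hypothetical helper that remembers where cuts were already executed
    and suppresses new estimates that fall too close to them."""

    def __init__(self, radius: float):
        self.radius = radius  # suppression radius, same units as the points
        self.executed: list[tuple[float, ...]] = []

    def register_cut(self, point) -> None:
        self.executed.append(tuple(point))

    def is_new(self, point) -> bool:
        # A candidate is treated as new only if it is farther than `radius`
        # from every cut already made.
        return all(math.dist(point, done) > self.radius for done in self.executed)

    def filter(self, candidates):
        return [p for p in candidates if self.is_new(p)]
```

For instance, after registering a cut at (0.5, 0.1, 1.2) m with a 0.02 m radius, a new estimate at (0.505, 0.1, 1.2) m would be suppressed, while one at (0.6, 0.1, 1.2) m would be kept.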
4. Discussion
The obtained results indicate that YOLOv8n-seg, YOLOv8s-seg, YOLO11n-seg, and YOLO11s-seg can be effectively applied to the segmentation of winter pruning regions. However, their practical suitability differed depending on the metric considered and on the downstream requirements of the proposed pruning estimation pipeline. At the overall level, YOLOv8 models achieved the highest precision values, indicating more selective predictions with fewer false positive detections. In contrast, YOLO11 models more frequently achieved comparable or higher recall and F1-score values, indicating that a larger proportion of real objects was correctly detected while maintaining a favorable balance between completeness and precision.
This distinction is particularly important in the context of PCAcutSeg-V, because the proposed method relies not only on correct object detection but also on the geometric fidelity of the predicted segmentation mask. From this perspective, a model with slightly lower precision but more complete and geometrically stable object coverage may provide more suitable input for cut-point and cut-line estimation than a more selective model producing incomplete masks. Therefore, the observed trade-off between precision and recall should not be interpreted as a simple ranking of segmentation performance, but rather as a task-dependent balance between false positive control and geometric usefulness for downstream pruning estimation. In this context, particular importance was assigned to mAP50:95, because this metric reflects segmentation quality across a broad range of IoU thresholds and therefore better captures the geometric fidelity of predicted masks than precision alone. Since PCAcutSeg-V operates directly on mask contours, this metric was considered the most informative for selecting the most suitable model for downstream cut-point and cut-line estimation. Importantly, YOLO11s-seg achieved the highest overall mAP50:95 value under both annotation strategies, which further supports its selection as the most suitable segmentation model for the proposed pipeline.
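mAP50:95 is conventionally the mean of the average precision (AP) computed at ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates this averaging with all-point interpolated AP under a simplifying assumption: each prediction has already been paired with at most one ground-truth object and carries the IoU of that pairing. Standard evaluation toolkits perform threshold-specific matching and may use a different interpolation scheme, so their numbers need not match this sketch; the function names are hypothetical.

```python
import numpy as np

def average_precision(scores, ious, n_gt, iou_thr):
    """All-point interpolated AP at one IoU threshold (assumes n_gt > 0).

    Simplifying assumption: each prediction has already been paired with at most
    one ground-truth object, and `ious[i]` is the IoU of that pairing (0 if none).
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = (np.asarray(ious, dtype=float)[order] >= iou_thr).astype(float)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Precision envelope: each value becomes the maximum precision at any later rank.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum precision over the recall increments.
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

def map50_95(scores, ious, n_gt):
    thresholds = np.linspace(0.50, 0.95, 10)  # 0.50, 0.55, ..., 0.95
    return float(np.mean([average_precision(scores, ious, n_gt, t) for t in thresholds]))

scores = [0.9, 0.8, 0.7]
ious = [0.97, 0.62, 0.30]
print(average_precision(scores, ious, n_gt=2, iou_thr=0.5))
print(map50_95(scores, ious, n_gt=2))
```

In this example the second prediction (IoU 0.62) counts as correct only at the three lowest thresholds, so AP50 is 1.0 while mAP50:95 drops to 0.65. This illustrates how the metric penalizes masks that are detected but geometrically imprecise.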
The YOLO26n-seg and YOLO26s-seg models showed the weakest performance across all analyzed configurations, which may indicate limited suitability of these variants for the considered task. Consequently, under the current experimental setup, YOLO26-based models do not represent a competitive alternative to the earlier generations.
It should be emphasized that this study did not evaluate the models with respect to computational efficiency or energy consumption. This is relevant because the YOLO26 architecture was designed from the ground up for edge deployment, introducing a simplified structural design that reduces unnecessary complexity while integrating targeted innovations for faster, lighter, and more accessible implementation [33]; these potential advantages were not captured by the present evaluation. Overall, the results confirm a clear trade-off between precision and detection completeness, so the choice of architecture should depend on the intended application scenario. In some cases, minimizing false detections may be critical; in others, maximizing detection coverage and geometric accuracy under diverse operational conditions may be more important. At the final stage of system design, the computational requirements and energy demand of individual models should also be weighed against the capabilities of the target edge devices on which the system will operate.
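One way to encode such scenario-dependent priorities when comparing models is the F-beta score, which generalizes the F1-score: β > 1 weights recall more heavily and β < 1 weights precision more heavily. The precision and recall values below are hypothetical and do not correspond to the models evaluated in this study.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

# Two hypothetical models: A is more selective, B covers more objects.
model_a = (0.90, 0.70)  # (precision, recall)
model_b = (0.80, 0.85)
for beta in (0.5, 2.0):
    a, b = f_beta(*model_a, beta), f_beta(*model_b, beta)
    print(f"beta={beta}: A={a:.3f}, B={b:.3f}")
```

With β = 0.5 the more selective model A scores higher (about 0.85 vs 0.81), whereas with β = 2 model B, which has better coverage, scores higher (about 0.84 vs 0.73).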
The proposed PCAcutSeg-V method, which applies principal component analysis and operates on the geometry of segmentation masks predicted by YOLO models, demonstrates high potential for cutting point estimation and cutting line estimation.
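The general idea of deriving a cut from mask geometry with PCA can be illustrated with the sketch below. It is not the published PCAcutSeg-V algorithm, whose class-dependent rules are not reproduced here. The function `cut_from_mask`, the line half-length, and the three geometric rules listed in its docstring are assumptions made for illustration; placing the cut at the upper end follows the role of the upper mask boundary described in the Results.

```python
import numpy as np

def cut_from_mask(mask: np.ndarray, half_len: float = 15.0):
    """Estimate a cut point and cut line from one binary segmentation mask.

    Illustrative assumptions (not the published PCAcutSeg-V rules):
    - the mask's principal axis follows the shoot segment it covers;
    - the cut point lies on that axis at the mask's upper end (smaller image y);
    - the cut line passes through the cut point perpendicular to the axis.
    Returns (cut_point, (line_start, line_end)) in (x, y) pixel coordinates.
    """
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov(pts - centroid, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, -1]  # direction of largest variance
    if axis[1] > 0:        # orient the axis toward the image top (y grows downward)
        axis = -axis
    proj = (pts - centroid) @ axis  # signed position of every pixel along the axis
    cut_point = centroid + proj.max() * axis
    normal = np.array([-axis[1], axis[0]])  # unit vector perpendicular to the axis
    line = (cut_point - half_len * normal, cut_point + half_len * normal)
    return cut_point, line
```

For a vertical mask covering rows 10 to 29, the estimated cut point lies on the top row (y = 10) and the cut line is horizontal. A strongly curved or oblique mask would shift the principal axis, which is one way line estimates can degrade before point estimates do.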
A comparison between the simplified and extended annotation strategies reveals significant differences in the error characteristics of the decision pipeline. Under the simplified labeling strategy, both cutting point estimation and cutting line estimation achieve high overall effectiveness. This suggests that the pipeline is well aligned with data characterized by lower semantic complexity. At the same time, a clear degradation in cutting line estimation relative to cutting point estimation is observed, which confirms the greater sensitivity of this stage to the quality and geometric precision of the input masks.
The per-class analysis reveals additional differences between the headingCut and rejectingCut classes. For the headingCut class, the effectiveness of both cutting point and cutting line estimation decreases when the extended annotation strategy is applied. This may indicate that more complex labeling introduces greater geometric variability, which complicates the determination of stable cutting lines, particularly in cases with less distinct visual structure. In this study, the main source of error was not a failure to detect the cutting regions themselves, but the irregular spatial arrangement of shoots. In particular, accuracy was reduced where shoots deviated substantially from the expected upright orientation and grew in more horizontal or oblique directions. Under such conditions, the geometry of the segmented region becomes less consistent with the assumptions of the PCAcutSeg-V method, which relies on a stable dominant contour orientation for cut-line and cut-point estimation. This effect was particularly relevant for cutting line estimation, which is more sensitive than point estimation to local variations in object geometry.
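The dependence on a stable dominant orientation can be checked explicitly. The sketch below computes two hypothetical diagnostics from a binary mask: the tilt of its principal axis from the image vertical, and its elongation, defined here as the square root of the ratio of the PCA eigenvalues. Masks that are strongly tilted or nearly isotropic could then be flagged for a fallback rule or manual review; the function names and thresholds are placeholders, not values from this study.

```python
import numpy as np

def orientation_diagnostics(mask: np.ndarray):
    """Return (tilt_from_vertical_deg, elongation) for a binary mask.

    elongation = sqrt(lambda_max / lambda_min); values close to 1 mean the
    mask has no clear dominant direction, so a PCA-derived axis is unstable.
    """
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    eigvals, eigvecs = np.linalg.eigh(np.cov(pts, rowvar=False))
    axis = eigvecs[:, -1]
    # Angle between the principal axis and the image vertical, folded to [0, 90].
    tilt = np.degrees(np.arccos(min(1.0, abs(axis[1]))))
    elongation = np.sqrt(eigvals[-1] / max(eigvals[0], 1e-12))
    return float(tilt), float(elongation)

def needs_review(mask, max_tilt_deg=30.0, min_elongation=2.0):
    """Hypothetical rule flagging oblique or poorly elongated regions."""
    tilt, elongation = orientation_diagnostics(mask)
    return tilt > max_tilt_deg or elongation < min_elongation
```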
The overall results indicate that the effectiveness of cutting point estimation and cutting line estimation is strongly dependent on both the applied annotation strategy and the characteristics of the analyzed class. The simplified labeling strategy supports stable and repeatable estimation at the general level. The extended annotation strategy enables improved representation of more complex structural cases but increases the sensitivity of the decision pipeline to data variability. These findings confirm the importance of separately analyzing segmentation quality and the performance of mask-based decision algorithms, particularly in operational contexts. The real-time experiments further support these observations. When the proposed pipeline was applied in a live processing configuration using the YOLO11s-seg model trained on the simplified annotation strategy, the system maintained high effectiveness in estimating both cutting points and cutting lines. The observed behavior was consistent with the offline evaluation, where cutting point estimation showed slightly higher robustness than cutting line estimation. These results indicate that the geometric reasoning implemented in the PCAcutSeg-V method remains stable under real-time conditions and confirm the potential applicability of the approach in future autonomous pruning systems.
Several previous studies have also addressed the detection or segmentation of grapevine pruning regions. One deep learning study proposed an automated winter pruning approach that combined two Faster R-CNN models for detecting cutting regions with a Mask R-CNN model for segmenting dormant grapevine organs. The authors showed that detection performance depended strongly on the visibility of pruning regions: the highest accuracy was obtained for clearly visible complex spurs, and occlusion was identified as the main limiting factor. The segmentation of grapevine organs also achieved substantially better results on vines subjected to shoot thinning, confirming the significant influence of canopy management on the effectiveness of AI-based solutions [34]. Combining skeleton extraction using the Rosenfeld algorithm with bud identification via Harris corner detection yielded a reasonable rate of positive detections, although overall recognition performance remained moderate [35]. In node detection on grapevine images acquired against diverse natural backgrounds, which is relevant for pruning automation, YOLOv7-tiny provided the most favorable balance between detection performance and inference time, enabling its practical use in real-time systems [9]. ViNet is a deep learning-based framework for reconstructing grapevine structure from images. It relies on node detection, shoot type identification, and graph-based reconstruction of the spatial relationships between structural elements. The approach achieved high accuracy in plant structure prediction on a dedicated dataset, while occlusion and incomplete plant visibility were identified as key constraints on performance [36]. A vision system for an autonomous grapevine winter pruning robot has also been proposed that generates a three-dimensional skeletonized model of shoots to estimate key pruning metrics. The authors demonstrated stable operation under real vineyard conditions and confirmed that the system enables dimensionally accurate extraction of shoot parameters, which form the basis for further automation of the pruning decision process [37]. Current research also investigates deep neural networks for the summer thinning of non-lignified shoots during the growing season. Deep learning methods, particularly Faster R-CNN with a ResNet18 backbone, have been shown to detect visible cordon segments under real vineyard conditions during summer canopy management operations such as green shoot thinning. Detection quality depended strongly on plant growth stage, decreasing systematically as occlusion increased with leaf and shoot development. These findings confirm that precise visual perception is a critical yet sensitive component of automation in vineyard management operations [38]. Furthermore, pixel-level semantic segmentation with deep learning networks has been shown to delineate grapevine cordons accurately under field conditions, and cordon trajectories can be effectively approximated with simple mathematical models, allowing precise tool positioning in automated green shoot thinning [39]. Progress in detecting individual grapevine structural components highlights their potential for predicting precise cutting locations.
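The approximation of cordon trajectories with simple mathematical models [39] can be illustrated with a least-squares polynomial fit. The quadratic model, the synthetic points, and the function name `fit_cordon` are assumptions for illustration; the model used in the cited work is not specified in this section.

```python
import numpy as np

def fit_cordon(xs, ys, degree=2):
    """Least-squares polynomial y = p(x) through cordon pixel coordinates."""
    coeffs = np.polyfit(xs, ys, deg=degree)
    return np.poly1d(coeffs)

# Synthetic cordon points: a gentle sag plus small noise (illustrative only).
rng = np.random.default_rng(0)
xs = np.linspace(0, 600, 50)
ys = 200 + 0.0002 * (xs - 300) ** 2 + rng.normal(0, 1.0, xs.size)
cordon = fit_cordon(xs, ys)
print(round(float(cordon(300.0)), 1))  # fitted height near the mid-span
```

Once fitted, the polynomial can be evaluated at any horizontal position to obtain a smooth target height for a tool, which is less sensitive to individual segmentation errors than raw pixel positions.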
Previous work has shown that winter grapevine pruning can be partially automated by combining semantic image segmentation with the algorithmic reconstruction of a simplified two-dimensional model of plant structure. The authors demonstrated that such a model allows the generation of a set of potential cutting points on shoots, which can subsequently be filtered according to predefined agronomic rules. At the same time, they emphasized that the accuracy of cutting point determination strongly depends on segmentation quality and that the simplified structure-linking algorithm may introduce incorrect connections, even though the overall system enables autonomous execution of the pruning operation [40]. An algorithm for localizing pruning points on dormant grapevines has also been proposed, based on the integration of semantic segmentation, object detection, and depth information. PSPNet was used to separate shoots and the trunk from the background, while buds were detected with a YOLOv5 model. Bud detection accuracy improved when semantic segmentation was applied prior to detection. The cutting point location was determined from bud coordinates, shoot skeleton information, and predefined agronomic rules, resulting in high localization accuracy. The authors emphasized that the approach provides a foundation for three-dimensional cutting point estimation and can be extended to other pruning rule sets [41]. An increasing number of studies focus on complete robotic systems for grapevine pruning. One example is a prototype robot for automated winter pruning that integrates a vision system for incremental three-dimensional reconstruction of plant structure, a decision module for selecting cutting sites, and collision-free trajectory planning for the robotic arm. The system was tested under vineyard conditions and was able to perform real pruning cuts. However, the authors identified a key limitation: the long chain of interdependent components, which affects the overall reliability of the solution [42].
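The bud-based rule summarized above for [41] (cutting point derived from bud coordinates, the shoot skeleton, and an agronomic rule) can be illustrated with a minimal sketch. It assumes that the shoot skeleton is available as an ordered 2D polyline starting at the shoot base, that bud detections are given as pixel coordinates, and that the rule is to retain a fixed number of buds and cut midway between the last retained bud and the next one. All function names and the midway rule are hypothetical illustrations, not the published algorithm.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def cumulative_lengths(polyline: Sequence[Point]) -> List[float]:
    """Arc length from the first vertex (assumed shoot base) to each vertex."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths


def arc_length_of(polyline: Sequence[Point], p: Point) -> float:
    """Arc length of the skeleton point closest to p (projects a bud onto the shoot)."""
    cum = cumulative_lengths(polyline)
    best_d, best_s = math.inf, 0.0
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(polyline, polyline[1:])):
        dx, dy = x1 - x0, y1 - y0
        seg2 = dx * dx + dy * dy
        # Clamped parameter of the orthogonal projection of p onto this segment
        t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((p[0] - x0) * dx + (p[1] - y0) * dy) / seg2))
        d = math.hypot(p[0] - (x0 + t * dx), p[1] - (y0 + t * dy))
        if d < best_d:
            best_d, best_s = d, cum[i] + t * math.sqrt(seg2)
    return best_s


def point_at(polyline: Sequence[Point], s: float) -> Point:
    """Point located at arc length s along the polyline."""
    cum = cumulative_lengths(polyline)
    for i in range(len(polyline) - 1):
        if s <= cum[i + 1] or i == len(polyline) - 2:
            seg = cum[i + 1] - cum[i]
            t = 0.0 if seg == 0 else (s - cum[i]) / seg
            (x0, y0), (x1, y1) = polyline[i], polyline[i + 1]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return polyline[0]  # degenerate single-vertex skeleton


def cut_point(skeleton: Sequence[Point], buds: Sequence[Point], n_keep: int) -> Optional[Point]:
    """Hypothetical rule: keep n_keep buds from the base, cut midway to the next bud."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    s = sorted(arc_length_of(skeleton, b) for b in buds)
    if len(s) <= n_keep:
        return None  # too few detected buds to apply the rule
    return point_at(skeleton, 0.5 * (s[n_keep - 1] + s[n_keep]))
```

For a vertical shoot `[(0, 0), (0, 10), (0, 20)]` with buds near heights 3, 7, and 12, `cut_point(..., n_keep=2)` places the cut at about `(0, 9.5)`. Working in arc length rather than raw image distance keeps the bud order correct on curved shoots.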
The prediction of cutting points in viticulture is not limited to pruning but also applies to harvesting. An algorithm for localizing the grape peduncle cutting point for a harvesting robot has been presented, based on a multi-camera vision system and artificial intelligence methods. The approach combines cluster detection using a YOLO model with pixel-level semantic segmentation and three-dimensional data for cutting point estimation, and applies point cloud processing in cases of partial occlusion. The system was evaluated under laboratory and field conditions and achieved high cutting point detection accuracy for both artificial and real grape clusters. The authors also showed that the algorithm can be integrated with a harvesting robot and enables effective grape harvesting under field conditions [43]. Together, these studies emphasize the importance of developing reliable cutting point prediction methods to support further mechanization of vineyard operations.
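Combining a 2D detection with depth data, as in the harvesting study above, ultimately requires mapping a selected pixel to a 3D point that a robot can reach. The sketch below assumes an ideal pinhole camera without lens distortion and a depth image aligned with the color image; the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical inputs, and the median-depth helper is one simple, assumed way to cope with missing depth values, not the method used in [43].

```python
import statistics
from typing import List, Optional, Tuple


def robust_depth(depth_map: List[List[float]], u: int, v: int, radius: int = 2) -> Optional[float]:
    """Median of valid (> 0) depths in a window around pixel (u, v).

    depth_map is indexed as depth_map[row][col], i.e. depth_map[v][u]; zeros are
    treated as missing measurements (an assumption about the sensor output).
    """
    values = []
    for row in range(max(0, v - radius), min(len(depth_map), v + radius + 1)):
        for col in range(max(0, u - radius), min(len(depth_map[row]), u + radius + 1)):
            if depth_map[row][col] > 0:
                values.append(depth_map[row][col])
    return statistics.median(values) if values else None


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) at depth z to camera coordinates."""
    if z <= 0:
        raise ValueError("depth must be positive")
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

Taking the median over a small window rather than the single pixel value avoids failing on depth holes, which are common on thin structures such as peduncles and shoots.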
Recent studies have demonstrated that deep learning-based instance segmentation can reliably support vineyard perception, with lightweight architectures enabling deployment on embedded platforms. In particular, a comparison between Mask R-CNN and YOLOv8 showed that YOLOv8 provides superior segmentation efficiency and inference speed, making it suitable as a perceptual front-end for robotic pruning systems. However, these approaches primarily focus on detecting vine structures or buds rather than explicitly localizing cutting points. By relying on end-to-end perception, they simplify system design but implicitly couple semantic recognition with decision-making, which limits geometric interpretability. The proposed method builds on these findings by treating instance segmentation as an enabling step and introducing geometry-based post-processing to derive stable, class-dependent cutting points [5].
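Geometry-based post-processing of an instance mask can be illustrated with a minimal principal component analysis of the mask's pixel coordinates. This is a generic sketch, not the PCAcutSeg-V implementation: representing a mask as a list of `(x, y)` pixels, the function names, and the `fraction` parameter that places the cut along the principal axis are all assumptions.

```python
import math
from typing import Iterable, List, Tuple

Pixel = Tuple[float, float]


def principal_axis(pixels: List[Pixel]) -> Tuple[Pixel, Pixel]:
    """Centroid and unit principal axis of a set of (x, y) mask pixels."""
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels")
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Closed-form orientation of the dominant eigenvector of [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def cutting_line(pixels: Iterable[Pixel], fraction: float):
    """Cut point at `fraction` of the mask extent along the principal axis,
    the cut-line direction (perpendicular to that axis), and the axis angle in degrees.

    The axis sign is arbitrary, so `fraction` is measured from the end with the
    smallest projection, which is not necessarily the shoot base.
    """
    pts = list(pixels)
    (mx, my), (ax, ay) = principal_axis(pts)
    proj = [(x - mx) * ax + (y - my) * ay for x, y in pts]
    t = min(proj) + fraction * (max(proj) - min(proj))
    point = (mx + t * ax, my + t * ay)
    cut_direction = (-ay, ax)
    axis_angle = math.degrees(math.atan2(ay, ax))
    return point, cut_direction, axis_angle
```

For a horizontal bar of pixels spanning x = 0 to 10, `cutting_line(bar, 0.25)` returns a cut point at x = 2.5 with a vertical cut direction. In practice an additional cue, such as proximity to the cordon, would be needed to decide which end of the axis is the base.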
Agrotechnical practice is based on a comprehensive system-oriented approach in which individual technological solutions do not operate in isolation. Robots designed for autonomous winter grapevine pruning should therefore be integrated within a broader ecosystem of automated and mechanized vineyard management tools. The objective of such integration is to minimize direct human involvement while maintaining the required operational precision.
Robotic cutting of shoots alone does not address the removal of pruned material from the trellis structure. This would require an additional robotic system capable of grasping shoots and transferring them to the inter-row area. However, mechanical solutions already exist that cut shoots at a predefined height and simultaneously remove them from the rows. In a coordinated workflow, most of the shoot biomass could first be removed mechanically, after which robotic systems would perform precise cuts at the optimal locations. The limited remaining biomass could then fall beneath the rows, where it could be removed using established methods.
In addition, the increasing number of electrically powered devices used in modern agriculture highlights the importance of sustainable energy sources. Biomass generated during winter pruning could serve as a resource for energy production in future vineyard management systems.
Current research indicates that image analysis based on segmentation and machine learning can be used to predict the potential biomass yield obtainable from grapevine shoots [44]. In addition, dual neural network systems are being developed to simulate the combustion of vineyard-derived biomass, with the aim of maximizing energy output while reducing greenhouse gas emissions [45].
Mechanical winter pruning of grapevines is increasingly adopted worldwide and is based on well-established physiological principles developed over many years. Despite confirmed benefits in terms of reduced labor demand and lower operational costs, a clear demonstration of its economic profitability remains challenging due to the potential reduction in vine productivity [46].
One of the main limitations of mechanical pruning is its lack of selectivity [47]. Relying exclusively on this method may also require additional crop load adjustment to improve quality parameters [48]. Many currently available technologies for mechanized winter pruning appear to be underutilized and have not been widely adopted. The implication is clear: for mechanization to be widely accepted by the industry, it must deliver increased production efficiency while maintaining or improving grape and wine quality [10]. A logical next step is therefore to complement non-selective mechanical pruning with selective and precise robotic cutting.
Research highlights the potential of robotic systems to bridge the gap between manual and mechanized operations and to support more efficient, sustainable, and precise agrotechnical practices. However, the absence of deployment-ready solutions currently remains a major barrier to the practical adoption of automated grapevine pruning systems.
The predominance of prototypes tested under controlled conditions, the high cost of robotic platforms, and limited validation under variable field conditions restrict wider adoption. Ongoing advances in artificial intelligence create opportunities to reduce system costs by limiting the number of required sensors while maintaining comparable functionality [7]. At the same time, autonomous navigation systems for vineyards are being developed based on machine vision and YOLO algorithms. Such systems can support the operation of machines and robots dedicated to autonomous pruning and illustrate the long-term potential of robotic technology to transform vineyard productivity and operational efficiency [49].
The results presented in this study align with the broader research trend toward the automation of vineyard management operations. The findings confirm that modern perception methods based on deep learning can effectively support the identification of cutting locations under field conditions. The comparative analysis demonstrated that high segmentation quality is a necessary but not sufficient condition for stable estimation of cutting points and cutting lines; the subsequent geometric processing and the agronomic context play a critical role.
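The point that segmentation quality is necessary but not sufficient can be illustrated with a small synthetic example. Intersection over union (IoU) is assumed here as the segmentation quality measure, and the masks and the vertical-extent proxy for shoot length are hypothetical; the example only shows that two predictions with identical IoU can differ in the geometry that matters for cut placement.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]


def mask_iou(pred: Iterable[Pixel], truth: Iterable[Pixel]) -> float:
    """Intersection over union of two binary masks given as pixel coordinates."""
    p, t = set(pred), set(truth)
    union = p | t
    if not union:
        return 1.0  # convention: two empty masks are treated as identical
    return len(p & t) / len(union)


def vertical_extent(mask: Iterable[Pixel]) -> int:
    """Length of the mask along the image y-axis, a crude proxy for shoot length."""
    ys = [y for _, y in mask]
    return max(ys) - min(ys)


# Synthetic ground-truth shoot: 3 px wide, 20 px tall
truth: Set[Pixel] = {(x, y) for x in range(3) for y in range(20)}
# Prediction A misses the 3 pixels of the shoot tip
pred_a = truth - {(x, 19) for x in range(3)}
# Prediction B misses 3 pixels along one side of the shoot
pred_b = truth - {(0, y) for y in (5, 6, 7)}

print(mask_iou(pred_a, truth), vertical_extent(pred_a))  # 0.95 18
print(mask_iou(pred_b, truth), vertical_extent(pred_b))  # 0.95 19
```

Both predictions score the same IoU, yet only prediction A shortens the mask, which would shift any cut placed at a fixed fraction of the shoot extent.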
The obtained results indicate that separating the perception stage from the decision stage increases the interpretability and stability of the overall pipeline. In this framework, segmentation is treated as an enabling step rather than a final solution. The proposed PCAcutSeg-V script, based on geometric analysis of segmentation masks, is a promising method for class-dependent estimation of cutting points and cutting lines, with improved robustness to input variability.
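The separation of perception from decision, combined with class-dependent rules, can be sketched as a small interface in which the decision stage consumes segmentation output and applies a per-class rule through an injected geometry function. The `Instance` structure, the class names `cane` and `spur`, the per-class fractions, the confidence threshold, and `vertical_extent_estimator` are all hypothetical; the estimator is a deliberately simple stand-in that a PCA-based pipeline would replace with a principal-axis computation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]


@dataclass
class Instance:
    """Perception output: one segmented object (hypothetical structure)."""
    label: str
    score: float
    pixels: Sequence[Pixel]


@dataclass
class CutDecision:
    label: str
    point: Tuple[float, float]


# The geometry is injected, so the segmentation model can be swapped without
# touching the decision rules, and vice versa.
Estimator = Callable[[Sequence[Pixel], float], Tuple[float, float]]


def decide_cuts(instances: List[Instance], rules: Dict[str, float],
                estimator: Estimator, min_score: float = 0.5) -> List[CutDecision]:
    decisions = []
    for inst in instances:
        if inst.score < min_score or inst.label not in rules:
            continue  # skip low-confidence instances and classes without a cutting rule
        decisions.append(CutDecision(inst.label, estimator(inst.pixels, rules[inst.label])))
    return decisions


def vertical_extent_estimator(pixels: Sequence[Pixel], fraction: float) -> Tuple[float, float]:
    """Stand-in geometry: cut at `fraction` of the mask's vertical extent."""
    ys = [y for _, y in pixels]
    y_cut = min(ys) + fraction * (max(ys) - min(ys))
    nearest = min(abs(y - y_cut) for y in ys)
    # Mean x of the mask pixels on the row(s) closest to the cut height
    xs = [x for x, y in pixels if abs(y - y_cut) == nearest]
    return (sum(xs) / len(xs), y_cut)
```

With `rules = {"cane": 0.3, "spur": 0.8}`, a confidently detected cane receives a cut at 30% of its extent, while instances of classes without a rule (for example a trunk) are ignored rather than cut. Keeping the rules in a table makes the class-dependent behavior explicit and easy to audit.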
This study is subject to several limitations. The dataset was collected from a single vineyard, which restricts variability in background conditions, training systems, and cultivars. Experiments were intentionally not conducted under snow cover, which further limits environmental diversity. In addition, cutting point estimation based on segmentation masks was restricted to a single geometric method relying on principal component analysis, and alternative estimation strategies were not evaluated. A comprehensive validation of the proposed approach should include integration with a robotic arm equipped with a cutting tool. Such integration would enable direct assessment of operational effectiveness and represents a clear direction for future research. A particularly important issue is the detection of pruning points for different numbers of retained buds, since this study considered only one specific number. In cool-climate viticulture, winter conditions may significantly affect bud survival [50] and, consequently, influence pruning strategies.
Effective automation of winter grapevine pruning should be considered part of a broader integrated agrotechnical system that combines selective robotic cutting with existing mechanized solutions and subsequent biomass management strategies. In the long term, this integrated approach may reduce manual labor requirements while maintaining crop quality, and thus represents a realistic step toward the practical deployment of robotic pruning systems in vineyards.
Future research should focus on the optimization of available solutions with respect to both the detection of cutting regions and estimation of final cutting points and cutting angles. Emphasis should be placed on scalable approaches adapted to region-specific training systems and viticultural practices, supported by interdisciplinary collaboration in order to develop autonomous cutting systems that can be implemented in vineyards.