Grapevine Winter Pruning Point Localization Using YOLO-Based Instance Segmentation

Kapłan, Magdalena; Buczyński, Kamil

doi:10.3390/agriculture16090943

by Magdalena Kapłan * and Kamil Buczyński

Institute of Horticulture Production, University of Life Sciences in Lublin, Głęboka 28, 20-612 Lublin, Poland

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(9), 943; https://doi.org/10.3390/agriculture16090943

Submission received: 23 March 2026 / Revised: 16 April 2026 / Accepted: 22 April 2026 / Published: 24 April 2026

(This article belongs to the Special Issue Adapting Horticultural Plant Cultivation Technology and Storage to Changing Conditions)

Abstract

Winter pruning is a key management practice in viticulture that directly affects vine architecture, yield balance, and grape quality. At the same time, it is a highly labor-intensive operation, and the selective identification of appropriate cutting locations remains one of the main challenges limiting the automation of pruning in vineyards. Advances in machine vision provide new opportunities to support the development of robotic pruning systems. The objective of this study was to develop and evaluate a vision-based method for estimating grapevine pruning points and cutting lines using instance segmentation outputs generated by YOLO models. A dataset of 1500 RGB images of dormant grapevines was collected under field conditions in the Nobilis vineyard located in southeastern Poland. Two annotation strategies were implemented to define pruning regions. YOLO-based instance segmentation models were trained and evaluated for detecting cutting-related structures. Based on the predicted segmentation masks, a geometry-based method termed PCAcutSeg-V was developed to estimate class-dependent cutting points and cutting lines using principal component analysis applied to object contours. The results indicate that YOLOv8 and YOLO11 architectures achieved the highest segmentation performance among the evaluated models. The simplified annotation strategy provided more stable geometric inputs for the PCAcutSeg-V method, enabling more reliable estimation of cutting points and cutting lines compared with the extended annotation approach. When combined with the PCAcutSeg-V method, the proposed perception–geometry pipeline achieved high effectiveness in pruning decision estimation. The method was further implemented in a real-time processing pipeline using an RGB camera and an edge computing platform, where it maintained performance consistent with the results obtained from offline image analysis. 
These findings demonstrate that combining deep learning-based instance segmentation with deterministic geometric reasoning enables accurate and interpretable estimation of grapevine pruning locations and provides a promising foundation for future autonomous pruning systems.

Keywords: Vitis; You Only Look Once; CNN; convolutional neural network; machine vision; deep learning; principal component analysis

1. Introduction

Grapevine represents one of the most significant perennial crops globally, encompassing approximately 7.1 million hectares of cultivated area in 2024. Its production is distributed across Europe, Asia, the Americas, Africa, and Oceania [1]. Viticulture is an economically and culturally significant sector whose impact extends through complex supply and distribution networks. Beyond the direct financial benefits generated by wine sales, which support a wide range of stakeholders, including wine companies, their employees, grape growers, and landowners, the sector also delivers substantial indirect benefits. These include landscape preservation, biodiversity conservation, the provision of key ecosystem services, and the promotion of enotourism, collectively contributing to the sector’s broad and far-reaching societal and environmental impacts [2].

Climate change is reshaping viticultural suitability worldwide, leading not only to increasing challenges in many traditional wine-growing regions but also to the emergence of new areas with potentially favorable conditions for grapevine cultivation [3].

Grapevine production is becoming increasingly demanding, while climate-driven shifts in suitability indicate a potential expansion of vineyard areas, increasing the need for adaptable and scalable production technologies capable of operating across diverse and evolving environments. Even marginal improvements in efficiency or reductions in production costs in viticulture can have a significant impact on the global economy.

Despite the significant importance of grape production in the international market, the level of automation in this sector remains very low. Mechanization is a key factor in ensuring the timely execution of operations and in reducing production costs. For tasks that require selective decision-making, substantially greater efficiency can be achieved through robotic systems supported by computer vision technologies [4].

Progress in the development of robots for automated grapevine pruning based on deep learning methods can be considered achievable in the medium term. In this context, particular emphasis is placed on the development and evaluation of artificial intelligence (AI)-based solutions that can be implemented on portable devices with limited computational resources [5].

Manual pruning of fruit species is highly labor intensive and constitutes a substantial share of annual labor costs in fruit production [6]. This is particularly evident in vineyards, where operational constraints and cost pressures limit the use of large-scale machinery. Consequently, there is growing research interest in compact and flexible robotic platforms capable of performing precise pruning in heterogeneous working environments, especially in areas where conventional mechanized solutions prove inadequate [7]. Moreover, pruning is crucial for the subsequent quality of the vine fruit [8].

The automation of pruning operations requires overcoming multiple challenges, encompassing both robotic manipulation and the perception and detection of the working environment. Effective execution of these tasks depends on the ability of robotic systems to accurately identify appropriate pruning points [9].

Although mechanical winter pruning has been shown to reduce production costs, these approaches have not been widely adopted. The economic drivers supporting full mechanization, particularly labor availability and labor costs, have proven insufficient to provide the economic incentive required for broad implementation. Other factors have also played a significant role, including the perception among many growers that the quality of grapes and wine produced from mechanically pruned vines is inferior to that obtained through manual pruning, despite evidence to the contrary [10].

Artificial intelligence has the potential to significantly transform the agricultural sector by enabling more efficient and sustainable farming systems [11]. Artificial intelligence has significantly advanced the wine industry by improving product consistency, operational efficiency, quality control, and safety across the entire production chain. AI-driven systems enable real-time monitoring, predictive modeling, and enhanced traceability, supporting both regulatory compliance and consumer trust. Despite these benefits, challenges related to data integration, implementation costs, model transparency, and ethical considerations remain. Addressing these issues is essential for the responsible adoption of AI and for realizing its full potential in sustainable and precision winemaking [12]. Another important step is the systematic implementation of artificial intelligence-based solutions in viticulture, supporting growers in the management of fertilization, protection against pathogens, pests, weeds, and adverse weather conditions, as well as in canopy management practices, of which both summer and winter pruning constitute essential components. However, AI-based solutions are often integrated with more classical analytical approaches. One such example is principal component analysis (PCA), which can support the estimation of branch orientation [13], and may facilitate the development of autonomous pruning systems.

Deep learning is widely regarded as a core component of artificial intelligence and data science. Unlike traditional machine learning and data mining approaches, deep learning enables the extraction of highly sophisticated data representations from large volumes of raw data. As a result, it provides an effective framework for addressing a wide range of real-world problems [14].

Convolutional neural network (CNN) architectures provide a flexible and conceptually straightforward deep learning framework for addressing a wide range of perceptual tasks [15]. The term convolutional neural network indicates that the architecture employs the convolution operation as a core linear transformation, replacing general matrix multiplication in at least one layer. A CNN typically consists of one or more convolutional layers followed by fully connected layers, analogous to standard multilayer neural networks. Owing to their effectiveness, CNNs are among the most widely adopted approaches in image recognition and computer vision and represent a prominent class of models in data science [16]. Object detection techniques constitute a fundamental component of artificial intelligence. The YOLO (You Only Look Once) algorithm, based on convolutional neural networks, represents one of the major directions in the development of modern object detection methods [17]. In the YOLO algorithm, object detection is formulated as a regression task that predicts spatially distributed bounding boxes together with their associated class probabilities. A single neural network processes the complete input image in one forward pass to estimate both object locations and categories. As the entire detection process is implemented within a single network, it can be optimized end to end with respect to detection performance [18]. YOLO models support multiple computer vision tasks, including object detection, image segmentation, image classification, pose estimation, and oriented object detection. Object detection focuses on identifying and localizing objects in images or video frames using bounding boxes. Image segmentation extends this concept by assigning pixel-level labels to image regions based on their content. Classification aims to assign a single label to an entire image according to its visual characteristics. 
Pose estimation involves detecting predefined key points to represent object structure or motion. Oriented object detection enhances conventional detection by incorporating object orientation, enabling accurate localization of rotated objects [19]. The application of the YOLO algorithm to agricultural object recognition represents a significant advancement in digital agricultural tools and technologies. By leveraging its capability to identify and localize multiple objects in real time, YOLO enables the development of reliable systems for the detection and localization of diverse agricultural elements. Furthermore, its single-pass image processing paradigm proves particularly effective in dynamic and resource-demanding agricultural environments spanning large and heterogeneous field areas [20]. The transition from Agriculture 4.0 frameworks toward emerging Agriculture 5.0 paradigms is expected to redefine intelligent crop monitoring systems by prioritizing the resolution of increasingly complex detection tasks and the enhancement of monitoring methodologies, including real-time autonomous and multitasking capabilities. Within this evolving paradigm, artificial intelligence and big data represent fundamental enabling technologies that underpin advanced decision support systems and predictive analytics frameworks. Future developments will therefore emphasize automated decision-making processes, fully unmanned field operations, and a progressive reduction in direct human intervention, driven by increasingly sophisticated artificial intelligence solutions [21,22,23,24].

Considering the potential of convolutional neural network-based YOLO algorithms in agricultural applications, the objective of this study was to develop a method for estimating winter pruning points and cutting lines of grapevines based on segmentation masks generated by YOLO models. The novelty of the study lies in combining deep learning-based instance segmentation with deterministic geometric reasoning for grapevine pruning support under conditions characteristic of colder-climate viticulture. This approach is motivated by the limited availability of prior research addressing viticulture systems prevalent in Central and Eastern Europe, where grapevine training systems and winter pruning strategies differ substantially from those commonly studied in warmer viticultural zones. Accordingly, the study evaluates the feasibility of a vision-based, geometry-driven pruning support method under leafless winter conditions and provides a methodological foundation for future robotic pruning systems adapted to grapevine-growing regions characterized by cold winters and region-specific training practices.

2. Materials and Methods

2.1. Data Collection

2.1.1. Overview of the Research Area

The experimental plantation shown in Figure 1 consisted of grapevines of the species Vitis vinifera L. cultivated in the Nobilis vineyard located in Faliszowice in southeastern Poland at 50.6639° N and 21.5663° E. Data collection was conducted during the winter dormancy period when vines were in the leafless stage. The study included the cultivars Jutrzenka, Muscaris, Cabernet Cortis, and Regent planted in spring 2018. Additional plantings of Jutrzenka and Cabernet Cortis established in spring 2021 were also included. All vines were planted at a spacing of 2 m between rows and 1 m within rows which corresponds to a planting density of 5000 vines per hectare.

Vines were trained on a vertical trellis system composed of steel posts and a wire structure. The trellis included one fruiting wire supporting the permanent cordon and three pairs of catch wires maintaining vertical shoot positioning. All vines were trained according to a bilateral cordon system characterized by a trunk height of 80 cm and a permanent cordon approximately 0.9 m in length with four branching nodes. Two bud renewal spurs were retained at each node and each spur produced two fruiting shoots. The training system was consistently applied from the fourth year after planting. Summer shoot thinning, shoot tip management, and winter pruning were performed according to the training system’s specifications. No experimental treatments aimed at increasing canopy load or yield were applied. Vine load was maintained within a standard range of 9 to 11 fruiting buds per lignified cane.

2.1.2. Image Acquisition Procedure

During the winter period between December 2025 and January 2026, a total of 1500 RGB images were acquired from vines representing the studied cultivars. For each cultivar and vine age combination, 250 images were collected. Data were collected from a total of 2100 m of crop row length.

The evaluated cultivars differed in vine age which resulted in differences in the structure of the permanent cordons. Four cultivars were characterized by five-year-old cordons, whereas the younger plantings had three-year-old cordons. Pruning operations performed on older permanent cordons are more demanding under practical conditions and require greater experience and attention. Older structures contain a higher proportion of secondary wood, renewal spurs, and bud primordia.

In addition, older sections of the cordon accumulate multiple pruning wounds from previous seasons and exhibit physiological alterations. These factors complicate the identification of healthy tissue and the accurate selection of pruning sites. Older wood frequently presents an irregular distribution of living and dead tissue and shows increased susceptibility to wood-related diseases. As a result, shoot removal requires higher precision to avoid damaging fragile or infected tissue and to minimize the formation of unfavorable wounds.

Furthermore, older cordons contain a greater amount of overgrown wood and structural swellings formed as a consequence of repeated training interventions and previous pruning events. This substantially complicates the identification of appropriate fruiting buds and correct cutting locations. Careful spur selection and shoot positioning are therefore required to preserve canopy structure and to maintain a balanced vine load. Consequently, pruning on older cordons is more time consuming and more prone to error than pruning on younger vine structures. Younger vines are also characterized by a more consistently vertical growth direction of the canes and by less developed spurs and arms. This considerably simplifies the shape of the assigned segmentation masks and may ultimately improve the accuracy with which cutting lines and cutting points are estimated.

The image dataset was collected using a Samsung Galaxy S25 smartphone (Samsung Electronics Co., Ltd., Suwon-si, Republic of Korea) featuring an RGB imaging sensor (1/1.56″, 50 MP) with a native spatial resolution of 8160 × 6120 pixels [25].

Images were acquired under open field conditions during the seasonal period suitable for winter pruning operations. Viewpoints were positioned no more than 30 cm from the vines, corresponding to distances commonly observed during standard pruning operations, and naturally encompassed both frontal and oblique perspectives. Images were obtained from unpruned areas representing each cultivar, rather than from individually identified plants. This approach was deliberately adopted to encompass the complete range of visual variation present across the plantation, thereby maximizing dataset diversity and representativeness. Vines designated for real-time testing were the only plants excluded from image acquisition. To enhance the generalization capability of the trained models, image acquisition was conducted under a broad range of illumination and weather conditions. The dataset comprises photographs captured at different times of the day, under varying cloud coverage and sunlight intensities, and in multiple plant surface moisture states, including dry and wet conditions. This diversity reflects natural variations in image contrast and specular reflection, both of which play a critical role in determining the robustness and operational reliability of computer vision systems deployed in real-world scenarios. Data acquisition and real-time testing were intentionally suspended under snow cover conditions, as such environments can obstruct the visibility of the vine’s above-ground organs and adversely affect the operation of electronic equipment at temperatures below 0 °C.

2.2. Data Preprocessing

2.2.1. Image Scaling

Original images were acquired at full sensor resolution of 8160 × 6120 pixels in order to retain unaltered raw visual information and to avoid device-level processing commonly applied in low-resolution capture modes. Subsequently, all images were uniformly downscaled to 640 × 480 pixels while preserving the native 4:3 aspect ratio using a deterministic Python-based preprocessing pipeline (version 3.12.0). During model training, the 640 × 480 inputs were resized to 640 × 640 pixels through letterbox padding implemented within the YOLO data loader, thereby maintaining the original aspect ratio and preventing geometric distortion of target objects. The selected input resolution of 640 pixels conforms to the standard YOLO training configuration and guarantees consistency throughout the dataset. All preprocessing, training, and evaluation scripts are made publicly accessible through the project repository [26].
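The two resizing steps described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' preprocessing script: the function names `downscale_size` and `letterbox_params` are hypothetical, and the symmetric padding is an assumption about how a letterbox canvas is typically filled. The actual Ultralytics loader may differ in rounding and padding details.

```python
def downscale_size(src_w, src_h, target_w):
    """Return (width, height) scaled to target_w while keeping the aspect ratio."""
    return target_w, round(src_h * target_w / src_w)


def letterbox_params(w, h, target=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for a square canvas.

    The image is scaled by a single factor so that it fits inside the
    target x target canvas; the remaining space is split evenly as padding
    (assumed symmetric here).
    """
    scale = min(target / w, target / h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


# Full sensor resolution reported in the text, reduced to a width of 640 px.
small = downscale_size(8160, 6120, 640)
# Letterbox the 4:3 image onto the square training canvas.
params = letterbox_params(*small)
print(small, params)
```

Under these assumptions the 640 × 480 image needs no further scaling and receives 80 pixels of padding above and below, so object shapes are not stretched.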

2.2.2. Image Labeling

Polygon-based annotations were generated using the Labelme tool [27]. Two labeling strategies were applied for the definition of cutting regions which were denoted as simplified and extended.

The rejectingCut class was defined identically in both strategies and included the renewal spur. The lower boundary of this polygon was defined as the reference cutting line. For the headingCut class in the simplified strategy, the annotated region extended from the lateral bud, which corresponds to the first visible bud, and terminated above the second bud. The upper boundary of this mask defined the cutting line. In the extended strategy, the headingCut class remained identical in the region defining the cutting line. However, the lower boundary of the polygon was extended downward to the junction between the arm and the permanent cordon. The number of annotations per class was identical in both strategies. Two annotation strategies were used in order to assess how the complexity of the assigned segmentation masks affects the detection of cutting sites by the proposed method. A total of 3093 cutting regions were labeled for the headingCut class and 1900 regions were labeled for the rejectingCut class. It should be noted that the number of retained buds may vary depending on the growing region, the grapevine training system, and even between individual vineyards.

Figure 2 presents a comparison of segmentation masks generated according to both annotation strategies. Red segmentation masks are assigned to the rejectingCut class, and blue ones to the headingCut class.

Figure 3 presents examples of the headingCut class in which the assigned annotations were identical for both labeling strategies. In these cases, no clearly distinguishable arm segment was visible. As a result, the lateral bud region was directly adjacent to the branching nodes located on the permanent cordon.

A critical aspect of data annotation for winter pruning regions concerns whether labels should be assigned exclusively based on clearly visible buds or whether nodes should also be considered. Nodes are characterized by localized swelling and frequently exhibit darker coloration. A bud is always located opposite the node and the reciprocal anatomical relationship is consistent. Restricting annotation only to shoots with clearly visible buds may lead to incorrect identification of appropriate cutting regions by the trained models. A similar effect may occur when no label is assigned in cases where the bud is not externally visible but the node is present.

For this reason, the annotation protocol adopted in this study for the headingCut class included both cutting regions with a clearly visible bud and regions in which only the node was externally visible. Labels were assigned with reference to the second clearly distinguishable bud which represents the first bud located above the lateral bud. It is important to note that renewal spurs also contain a basal bud located at the base of the cane in the region of a strongly shortened internode. From the perspective of growth potential, this basal bud is usually of minor significance. In cold-climate viticulture it typically does not develop under normal conditions. However, it may become a source of new shoots in the event of frost damage, which is characteristic of such growing regions. During the annotation process the basal bud was not considered and was not included in the count of analyzed buds. Figure 4 includes examples showing different degrees of bud and node visibility.

This annotation approach reduces labeling errors because it incorporates the detailed morphological structure of grapevine shoots and buds. Consideration of these structural features enables more precise identification of appropriate cutting locations. The node, together with its associated bud, forms a characteristic morphological unit consisting of the bud and the swelling located on the opposite side of the shoot axis.

2.2.3. Preparing Datasets

Each image series defined by cultivar and vine age was stored in a separate top-level directory corresponding to a single data acquisition session. For each dataset, the split into training, validation, and test subsets was performed using a deterministic random sampling procedure with a fixed random seed set to 42. Within each acquisition folder the list of images was shuffled using a seeded pseudorandom generator and assigned to three subsets in an approximate ratio of 80 to 10 to 10. This procedure ensures full reproducibility of the data partitioning while maintaining a proportional representation of all acquisition sessions and environmental conditions across the training, validation, and test subsets.
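A seeded per-session split of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the project's script: the helper name `split_session`, the file naming pattern, and the sorting of names before shuffling are hypothetical choices; only the seed of 42 and the approximate 80/10/10 ratio come from the text.

```python
import random


def split_session(image_names, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Shuffle one acquisition session with a seeded generator and split it."""
    names = sorted(image_names)      # fixed starting order (assumption)
    rng = random.Random(seed)        # private generator, independent of global state
    rng.shuffle(names)
    n_train = int(len(names) * ratios[0])
    n_val = int(len(names) * ratios[1])
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],   # remainder goes to the test subset
    }


# One hypothetical session of 250 images (one cultivar and vine age combination).
session = [f"img_{i:04d}.jpg" for i in range(250)]
subsets = split_session(session)
print({name: len(files) for name, files in subsets.items()})
```

Because each session is split separately, every cultivar and vine age combination contributes to all three subsets in roughly the same proportion.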

2.3. Model Training

2.3.1. YOLO Models

Training of YOLO models was restricted exclusively to the segmentation task. Accurate estimation of cutting points and cutting lines using the proposed method requires precise object boundaries that correspond strictly to the cutting region while minimizing visible background. This requirement cannot be satisfied using object detection based on bounding boxes, because only a portion of each bounding box edge corresponds to the cutting area, while the remaining area includes background.

Figure 5 illustrates the differences between detection and segmentation outputs. The blue color indicates the detection and segmentation of the headingCut class, and the red color indicates the rejectingCut class. It is important to note that the use of oriented object detection within the YOLO framework could partially reduce the inclusion of background due to the ability to assign rotated bounding boxes. However, this approach does not provide fully optimal boundary precision for the geometric estimation procedure applied in this study.

The selection was limited to recent YOLO generations, from 8 to 26. Because not all generations within this range support segmentation, the study included YOLOv8n-seg, YOLOv8s-seg, YOLO11n-seg, YOLO11s-seg, YOLO26n-seg, and YOLO26s-seg. The choice was further restricted to the lightweight “n” and “s” variants, which require the least computing power for real-time operation and therefore have the greatest potential for practical applications [28]. Although the YOLO12 architecture supports instance segmentation, no publicly available pretrained weights for the segmentation task had been released at the time of this study. Therefore, YOLO12-seg models were excluded from the experimental evaluation.

2.3.2. Training Hyperparameter Settings

Model training was conducted over 200 epochs using a batch size of 32 and an input resolution of 640 × 640 pixels. Optimization was performed with stochastic gradient descent, configured with an initial learning rate of 0.01, a final learning rate multiplier of 0.01, a momentum value of 0.937, and a weight decay coefficient of 0.0005. Data augmentation followed the standard Ultralytics YOLO pipeline, with all transformations applied dynamically during data loading. Early stopping was disabled by setting the patience parameter to zero, so that every model was trained for the full 200 epochs. A fixed random seed with a value of zero was used to ensure deterministic behavior. All experiments were executed on an NVIDIA GPU equipped with CUDA support, as specified in Table 1.
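The settings listed above map onto Ultralytics training arguments roughly as shown below. This is a hedged sketch, not the authors' training script: the dataset file name `vines.yaml` and the wrapper function `train` are hypothetical, and the `ultralytics` package is imported lazily so that the configuration can be inspected without it.

```python
# Hyperparameters as reported in the text, expressed as Ultralytics arguments.
TRAIN_ARGS = {
    "epochs": 200,
    "batch": 32,
    "imgsz": 640,
    "optimizer": "SGD",
    "lr0": 0.01,            # initial learning rate
    "lrf": 0.01,            # final learning rate as a fraction of lr0
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "patience": 0,          # value reported in the text for disabling early stopping
    "seed": 0,
}


def final_learning_rate(args):
    """Learning rate reached at the end of the schedule (lr0 * lrf)."""
    return args["lr0"] * args["lrf"]


def train(weights="yolo11n-seg.pt", data="vines.yaml"):
    """Train one segmentation model; `data` is a hypothetical dataset file."""
    from ultralytics import YOLO  # requires the ultralytics package
    model = YOLO(weights)
    return model.train(data=data, **TRAIN_ARGS)


print(final_learning_rate(TRAIN_ARGS))
```

With an initial rate of 0.01 and a multiplier of 0.01, the schedule ends at a learning rate of about 0.0001.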

2.4. Evaluation of Trained Models and Cutting Prediction Pipeline

2.4.1. Evaluation of YOLO Models in the Segmentation Task

The models’ performance was assessed using established object detection criteria: Precision, Recall, the mean Average Precision evaluated at an Intersection over Union (IoU) threshold of 0.5, denoted as mAP₅₀, the mean Average Precision averaged across IoU thresholds from 0.5 to 0.95, denoted as mAP₅₀:₉₅, and the F₁-score. Metric calculation was conducted through custom lightweight Python scripts built on the official YOLO evaluation pipeline, providing a consistent and reproducible assessment procedure across all models and datasets. All detection metrics reported in this study were derived exclusively from the held-out test sets, which remained completely separated from the training process.

Precision and Recall were adopted as complementary indicators to quantify detection accuracy and coverage, respectively, as defined in Equations (1) and (2). In these formulations, TP, FP, and FN correspond to the counts of true positive, false positive, and false negative predictions.

\mathrm{Precision} = \frac{TP}{TP + FP}

(1)

\mathrm{Recall} = \frac{TP}{TP + FN}

(2)

The F₁-score defined in Equation (3) represents the harmonic mean of Precision and Recall, serving as a single indicator that balances detection accuracy with completeness.

F_{1}\text{-score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(3)
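Equations (1) to (3) translate directly into a few lines of code. The sketch below uses a hypothetical helper name, `precision_recall_f1`, and illustrative counts that are not results from this study; returning zero when a denominator is empty is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall, and F1-score from prediction counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # Equation (1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # Equation (2)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # Equation (3)
    return precision, recall, f1


# Illustrative counts only: 90 correct detections, 10 false alarms, 30 misses.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))
```

In this example Precision exceeds Recall, and the F₁-score falls between the two, closer to the lower value, as expected of a harmonic mean.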

The mean Average Precision metrics quantify detection performance by integrating the area under the Precision-Recall curve, thereby capturing the trade-off between confidence and accuracy across varying decision thresholds. The mAP₅₀ score reflects model performance evaluated at a single Intersection over Union threshold of 0.5, while the mAP₅₀:₉₅ score represents an aggregate measure obtained by averaging results across multiple Intersection over Union thresholds ranging from 0.5 to 0.95 with a step size of 0.05, as defined in Equations (4) and (5). Here, N denotes the number of classes and AP_i the Average Precision of class i.

\mathrm{mAP}_{50} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_{i}(\mathrm{IoU} = 0.5)

(4)

\mathrm{mAP}_{50:95} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{10} \sum_{k=0}^{9} \mathrm{AP}_{i}(\mathrm{IoU} = 0.5 + 0.05k) \right)

(5)
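The averaging in Equations (4) and (5) can be sketched as below, assuming that per-class AP values at each IoU threshold have already been computed by the evaluation pipeline. The AP numbers are made up for illustration and are not results from this study; the function names `map_at` and `map_50_95` are hypothetical.

```python
# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used in Equation (5).
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def map_at(ap_table, iou):
    """Mean over classes of AP at a single IoU threshold, as in Equation (4)."""
    return sum(ap[iou] for ap in ap_table.values()) / len(ap_table)


def map_50_95(ap_table):
    """Mean over classes of AP averaged over all thresholds, as in Equation (5)."""
    per_class = [
        sum(ap[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
        for ap in ap_table.values()
    ]
    return sum(per_class) / len(per_class)


# Illustrative AP values that fall linearly as the IoU threshold tightens.
ap_table = {
    "headingCut": {t: 0.9 - (t - 0.5) for t in IOU_THRESHOLDS},
    "rejectingCut": {t: 0.8 - (t - 0.5) for t in IOU_THRESHOLDS},
}
print(round(map_at(ap_table, 0.5), 3), round(map_50_95(ap_table), 3))
```

The second value is lower than the first because the stricter thresholds demand closer overlap between predicted and annotated masks.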

Collectively, these metrics enable a thorough assessment of the segmentation performance exhibited by the trained models.

2.4.2. Geometry-Based Cut-Line and Cut-Point Estimation Method

A Python-based method, termed PCAcutSeg-V, was developed for estimating class-dependent cutting points and lines from YOLO segmentation masks.

This is a deterministic, geometry-based method for extracting class-dependent semantic cut lines from polygon-based segmentation annotations. The approach operates directly on YOLO-style segmentation outputs and requires no learning, training, or model optimization. Instead, it relies exclusively on geometric reasoning applied to object boundaries, making it well suited for precision vision and robotics applications. The method is designed to provide stable and interpretable geometric descriptors.

Each input image is processed together with a corresponding object segmentation, represented as a set of polygonal contours and associated class labels. The polygon coordinates are defined in normalized image space and are mapped to pixel coordinates using the image resolution. For each segmented object, a binary mask is generated by rasterizing the polygon. This representation enables robust boundary extraction and decouples the subsequent geometric processing from the specific source or format of the segmentation output.
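The coordinate mapping and rasterization step can be sketched in plain Python as follows. This is an illustrative stand-in, not the PCAcutSeg-V code: the function names are hypothetical, and the even-odd test at pixel centers is an assumed rasterization rule. In practice a library routine such as OpenCV's `cv2.fillPoly` is the more usual choice.

```python
def to_pixels(norm_polygon, width, height):
    """Map normalized (x, y) vertices in [0, 1] to pixel coordinates."""
    return [(x * width, y * height) for x, y in norm_polygon]


def point_in_polygon(px, py, polygon):
    """Even-odd rule: count edge crossings of a ray cast to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):               # edge spans the ray's height
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Binary mask (list of rows) with 1 where a pixel center is inside."""
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, polygon) else 0
         for c in range(width)]
        for r in range(height)
    ]


# A normalized square covering the central quarter of a tiny 8 x 8 image.
norm_square = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
pixel_square = to_pixels(norm_square, 8, 8)
mask = rasterize(pixel_square, 8, 8)
print(sum(map(sum, mask)))
```

Working from a binary mask means the later contour and PCA steps behave the same whether the polygon came from a model prediction or from a manual annotation.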

From the binary mask, the external object contour is extracted using full-resolution contour tracing. To improve robustness, only the largest external contour extracted from the binary mask is retained, thereby suppressing small artifacts and spurious regions caused by annotation inconsistencies. Contours with insufficient point density are discarded to avoid unstable geometric estimates in later stages of processing.

To capture the dominant geometric orientation of each object, the method applies principal component analysis to the extracted contour points. This use of PCA follows its standard formulation for estimating the dominant orientation of two-dimensional point sets in contour-based shape analysis and computer vision [29,30,31]. PCA was selected due to its ability to provide a stable and noise-robust estimate of the dominant orientation of irregular object contours without requiring parameter tuning or learning. Its deterministic nature and low computational complexity make it well suited for real-time vision-based robotic applications, where consistent geometric descriptors are required for downstream control.

The contour is first centered by subtracting its centroid, after which the principal axis is estimated as the dominant direction of spatial variance. To ensure consistent orientation across images and objects, the direction of the principal axis is stabilized relative to the image coordinate system. This step removes sign ambiguities and guarantees repeatable results regardless of object pose.
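For two-dimensional points, the principal axis has a closed form: with centered covariances s_xx, s_yy, and s_xy, the direction of largest variance lies at the angle 0.5 · atan2(2 s_xy, s_xx − s_yy). The sketch below uses this identity. The sign convention (flipping the axis so that it points toward the top of the image, where y is smaller) is an assumption chosen for illustration; this section states only that the direction is stabilized relative to the image coordinate system.

```python
import math


def principal_axis(points):
    """Return (centroid, unit axis) for a list of (x, y) contour points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2 x 2 covariance matrix of the centered contour
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the eigenvector with the largest eigenvalue
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Assumed stabilization: make the axis point up in the image (negative y),
    # breaking ties for horizontal axes by pointing to the right
    if uy > 0 or (uy == 0 and ux < 0):
        ux, uy = -ux, -uy
    return (cx, cy), (ux, uy)
```

Because an eigenvector and its negation describe the same axis, some fixed rule of this kind is needed before "upper" and "lower" ends of an object can be distinguished consistently.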

Once the principal axis is established, all contour points are projected onto this axis. The resulting one-dimensional representation allows the object boundary to be analyzed along its dominant direction. Semantic edge regions are selected based on the object class. For one class, the method extracts points corresponding to the upper extreme of the object, while for the other class it selects points near the lower extreme. This class-dependent strategy enables the method to encode semantic meaning directly into the geometric extraction process. Percentile-based selection is employed to provide robustness against local boundary variations. Objects for which an insufficient number of points is obtained are discarded to prevent unstable geometric estimation.
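A minimal sketch of the projection and percentile-based selection is given below. The function name, the 10% fraction, and the minimum point count are hypothetical values chosen for illustration, and the `end` argument stands in for the class-dependent choice between the two extremes. "Upper" here means the end with the larger projection, which corresponds to the top of the image only if the axis has been oriented to point upward.

```python
def select_edge_points(points, centroid, axis, end, fraction=0.10, min_points=3):
    """Keep contour points in the extreme `fraction` of projections on `axis`.

    end: "upper" keeps the largest projections, "lower" the smallest.
    Returns an empty list when too few points survive (object discarded).
    """
    cx, cy = centroid
    ux, uy = axis
    # Scalar position of each contour point along the principal axis
    proj = [(x - cx) * ux + (y - cy) * uy for x, y in points]
    ranked = sorted(proj)
    k = max(1, int(round(fraction * len(ranked))))
    if end == "upper":
        threshold = ranked[-k]
        chosen = [p for p, t in zip(points, proj) if t >= threshold]
    else:
        threshold = ranked[k - 1]
        chosen = [p for p, t in zip(points, proj) if t <= threshold]
    return chosen if len(chosen) >= min_points else []
```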

A straight cut line is estimated from the selected edge points using orthogonal regression. This fitting strategy minimizes geometric error perpendicular to the line direction and is therefore insensitive to noise along the boundary. The resulting infinite line is converted into a finite segment by identifying the extreme projections of the edge points along the fitted direction. This yields a cut line that tightly spans the semantic boundary region of interest. Although the cut line is visualized as a line segment for interpretability and qualitative inspection, the method internally represents cutting orientation using numerical geometric descriptors derived from the fitted line. In practical robotic applications, these descriptors can be directly used to determine the inclination angle of pruning shear blades relative to the vine shoot.

For downstream tasks requiring a compact geometric representation, the semantic cutting location is expressed as a single cut point, defined as the midpoint of the fitted cut-line segment. This point offers a stable and noise-robust descriptor suitable for direct use in robotic control, measurement, and alignment pipelines.
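The line fit and midpoint described in the two preceding paragraphs can be sketched together. In two dimensions, the orthogonal-regression (total least squares) line passes through the centroid of the points along their direction of largest variance, so the same closed-form angle used for PCA applies. The function name and the returned tuple layout are assumptions for illustration; the angle is reported in image coordinates, where the y axis points downward.

```python
import math


def fit_cut_line(edge_points):
    """Fit a line by orthogonal regression and return (p0, p1, midpoint, angle_deg)."""
    n = len(edge_points)
    cx = sum(x for x, _ in edge_points) / n
    cy = sum(y for _, y in edge_points) / n
    sxx = sum((x - cx) ** 2 for x, _ in edge_points) / n
    syy = sum((y - cy) ** 2 for _, y in edge_points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in edge_points) / n
    # Direction minimizing the sum of squared perpendicular distances
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Clip the infinite line to the extreme projections of the edge points
    t = [(x - cx) * dx + (y - cy) * dy for x, y in edge_points]
    p0 = (cx + min(t) * dx, cy + min(t) * dy)
    p1 = (cx + max(t) * dx, cy + max(t) * dy)
    midpoint = ((p0[0] + p1[0]) / 2.0, (p0[1] + p1[1]) / 2.0)
    # theta lies in (-pi/2, pi/2], so the angle is in (-90, 90] degrees
    return p0, p1, midpoint, math.degrees(theta)
```

The returned angle is one possible numerical descriptor of cutting orientation; which descriptors the authors' implementation exposes is documented in their repository rather than in this section.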

In addition, the current implementation allows the cutting position to be flexibly adjusted by introducing a configurable offset relative to the segmentation mask contour generated by the YOLO model. By modifying this offset, the cutting height can be adapted to different pruning strategies. This capability enables the method to accommodate variations in the desired spur length across different vineyards without modifying the underlying geometric estimation procedure.
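One simple way to realize such an offset is to translate the fitted segment along the object's principal axis by a fixed number of pixels. This is an assumption made for illustration: the section does not specify how the offset is applied, and the function name and the parameter `offset_px` are hypothetical.

```python
import math


def apply_offset(segment, axis, offset_px):
    """Shift a cut segment by offset_px pixels along `axis`; return (segment, midpoint)."""
    (x0, y0), (x1, y1) = segment
    ux, uy = axis
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm  # guard against a non-unit axis
    shifted = (
        (x0 + offset_px * ux, y0 + offset_px * uy),
        (x1 + offset_px * ux, y1 + offset_px * uy),
    )
    midpoint = (
        (shifted[0][0] + shifted[1][0]) / 2.0,
        (shifted[0][1] + shifted[1][1]) / 2.0,
    )
    return shifted, midpoint
```

Under this assumption, a positive offset moves the cut in the direction of the axis (for an upward-pointing axis, higher on the shoot, leaving a longer spur), and a negative offset moves it the other way.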

For qualitative evaluation and debugging, the method overlays semi-transparent object masks, fitted cut lines, and representative midpoints directly onto the original image. Class-specific color coding is used to visually distinguish different object types and their corresponding geometric features.

The full mathematical formulation of PCAcutSeg-V, detailed implementation information, and the source code for both offline and real-time operation are provided in the project repository [26].

2.4.3. Evaluation of PCAcutSeg-V Method on Static Images

To support qualitative inspection and dataset-level evaluation of the proposed PCAcutSeg-V method, a lightweight offline evaluation framework was developed. The framework applies the geometric cut-line and cut-point estimation method to static RGB images together with corresponding object segmentation annotations, enabling visual verification and exploratory analysis without real-time constraints.

The framework operates by loading pre-recorded images and their associated polygon-based segmentation data, and subsequently invoking the PCAcutSeg-V method to compute class-dependent cutting lines and representative cutting points for each segmented object.

For visualization purposes, the framework overlays semi-transparent object masks, estimated cut lines, and cut points directly onto the input images using class-specific color coding. An interactive graphical interface enables sequential browsing of image samples, facilitating qualitative assessment of the method’s behavior across diverse object geometries and segmentation qualities.

This offline configuration is intended exclusively for evaluation, debugging, and analysis of the proposed method. It does not introduce additional processing steps, learning mechanisms, or decision logic beyond those defined by PCAcutSeg-V, and therefore provides a faithful representation of the method’s geometric behavior under controlled conditions. The offline evaluation framework was implemented in Python. Figure 6 illustrates the graphical user interface of the offline evaluation framework used to visualize the output of the PCAcutSeg-V method. The estimated cutting points and cut lines provide a geometric representation of the cutting orientation, corresponding in real-world conditions to the alignment of pruning shear blades with respect to the vine shoot. Green markers and lines denote the headingCut class, while red markers and lines correspond to the rejectingCut class; this color convention applies to all other figures as well.

To evaluate the performance of the PCAcutSeg-V method, images from the test set were used together with segmentation labels generated by the trained YOLO models based on datasets annotated using the simplified and extended strategies. For both annotation strategies, the YOLO11s-seg model was selected as the source of segmentation masks.

The selection was based on the comparative evaluation of the tested architectures, in which YOLO11s-seg achieved the highest mAP_50:95 values among the analyzed models. This metric reflects segmentation quality across a wide range of IoU thresholds and therefore better captures the geometric fidelity of predicted masks than mAP₅₀ alone.
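To make the difference between the two metrics concrete, the sketch below computes the intersection over union (IoU) of a predicted and a reference mask and lists the IoU thresholds at which the prediction would still count as a match. It uses the standard COCO-style threshold grid from 0.50 to 0.95 in steps of 0.05. It is not a full mAP computation, which additionally requires confidence-ranked precision-recall curves, and the function names are hypothetical.

```python
# Thresholds 0.50, 0.55, ..., 0.95 expressed in percent to avoid float rounding
IOU_THRESHOLDS_PERCENT = list(range(50, 100, 5))


def mask_iou_counts(pred, truth):
    """Return (intersection, union) pixel counts for two binary masks."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter, union


def thresholds_passed(pred, truth):
    """List the thresholds (in percent) at which pred matches truth."""
    inter, union = mask_iou_counts(pred, truth)
    if union == 0:
        return []
    return [t for t in IOU_THRESHOLDS_PERCENT if inter * 100 >= t * union]


# Hypothetical masks: the prediction covers 8 of 10 reference pixels (IoU = 0.8)
truth = [[1] * 10]
pred = [[1] * 8 + [0] * 2]
passed = thresholds_passed(pred, truth)
```

In this example the prediction counts as correct at the single threshold used by mAP₅₀, but fails the three strictest thresholds (0.85, 0.90, 0.95) that also enter the averaged metric, which is why the averaged metric is more sensitive to boundary accuracy.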

Because the PCAcutSeg-V method relies on geometric analysis of segmentation contours, accurate reconstruction of object boundaries is critical for reliable estimation of cutting points and cutting lines. For this reason, the model demonstrating the best overall segmentation quality was selected as the optimal input for the PCAcutSeg-V pipeline. Cutting points and cutting lines representing the cutting angle were evaluated independently. An incorrect estimation of the cutting line angle does not necessarily imply an incorrect localization of the cutting point.

The PCAcutSeg-V method was evaluated using a correctness-based framework for both cut-point and cut-line estimation, with results expressed as the number of correct versus incorrect estimations. Continuous geometric error metrics, such as pixel distance or angular deviation from a single manually defined reference, were intentionally not adopted. This decision reflects the practical objective of the study, which was not the mathematical optimization of abstract geometric primitives, but the validation of a pruning-support pipeline intended for future robotic cutting systems. From an operational perspective, the critical question is whether the estimated cut point and cut line lead to an acceptable pruning decision for a given shoot, rather than the exact numerical deviation from an arbitrarily fixed reference. Moreover, pruning evaluation is inherently dependent on the adopted pruning technique, which may vary with the training system, cultivar, management objectives, and individual pruning style. As a result, the definition of a single mathematically exact ground truth for each cut would introduce an artificial level of precision that may not reflect real agronomic practice. For this reason, a correctness-based evaluation was considered more appropriate for assessing the practical validity of the proposed method under realistic viticultural conditions.
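The bookkeeping behind such a correctness-based evaluation is simple. The sketch below tallies expert judgments per class and overall; the function name and record layout are hypothetical, and the judgments in the usage example are invented placeholders rather than results from this study.

```python
from collections import Counter


def summarize(judgments):
    """Tally (class_name, is_correct) pairs per class and overall."""
    counts = Counter()
    for cls, ok in judgments:
        counts[(cls, bool(ok))] += 1
        counts[("overall", bool(ok))] += 1
    summary = {}
    for name in sorted({key[0] for key in counts}):
        correct = counts[(name, True)]
        incorrect = counts[(name, False)]
        summary[name] = {
            "correct": correct,
            "incorrect": incorrect,
            "share_correct": correct / (correct + incorrect),
        }
    return summary


# Invented placeholder judgments, for illustration only
example = (
    [("headingCut", True)] * 3
    + [("headingCut", False)]
    + [("rejectingCut", True)] * 4
)
report = summarize(example)
```

Cut points and cut lines would each be passed through such a tally separately, since the two are evaluated independently.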

The evaluation excluded points and lines assigned to cutting regions that were not fully visible in a given image, even when their position was consistent with the actual shoot geometry. In cases where the trained YOLO models generated duplicate segmentation masks for the same cane fragment, resulting in duplicate predicted cuts, the analysis considered only the correctness of the properly identified cutting region and disregarded the second mask.

To ensure consistent and comparable evaluation criteria for the PCAcutSeg-V method, the analysis was performed in parallel for both YOLO model variants differing in the applied annotation strategy. Figure 7 presents examples of situations in which two segmentation masks were assigned to the same cutting region.

From a practical perspective, executing an additional redundant cut at an earlier stage would not prevent the correct pruning outcome. The correctly localized final cut would still produce the intended structural result, whereas the additional cut would be a redundant intervention that does not affect the final shoot architecture.

2.4.4. Evaluation of PCAcutSeg-V Method in Real-Time Conditions

To enable real-time operation and validate the applicability of the proposed PCAcutSeg-V method under live sensing conditions, an online processing pipeline was implemented for streaming data acquired from an RGB camera. In this configuration, object segmentation masks are produced on-the-fly using a YOLO-based segmentation model, and the resulting polygonal contours are directly passed to the PCAcutSeg-V geometric processing stage.

For each incoming RGB frame, the segmentation model outputs class-labeled polygonal object masks. These polygons are converted into binary masks, from which the largest external contour is extracted and processed using the same PCA-based geometric reasoning as in the offline evaluation setup. No temporal filtering, tracking, or learning-based refinement is applied, ensuring that the live pipeline remains a faithful real-time instantiation of the PCAcutSeg-V method.

For visualization and operator feedback, the live system overlays semi-transparent object masks, estimated cut lines, and cutting points directly onto the RGB video stream. Class-specific color coding is used consistently with the offline framework, with green markers denoting the headingCut class and red markers indicating the rejectingCut class. Importantly, the live configuration does not modify the underlying PCAcutSeg-V logic relative to the offline evaluation framework. The same deterministic, geometry-based method is applied in both cases, with the only difference being the source of the segmentation masks: pre-recorded annotations in the offline setup and real-time model predictions in the live pipeline. This design ensures methodological consistency while demonstrating the feasibility of deploying PCAcutSeg-V in real-world, time-constrained pruning scenarios. As a result, the live pipeline serves as a direct operational deployment of PCAcutSeg-V rather than a modified variant, ensuring strict methodological equivalence between offline analysis and real-time execution.

One of the major constraints in research on the implementation of technologies supporting agrotechnical operations in fruit species cultivation is the short and strictly defined time window available for specific field activities. In the case of grapevine pruning, the possibility of conducting real field experiments depends on the leafless dormancy stage and on the availability of vines prior to the execution of winter pruning. Snow cover and temperatures below 0 °C may further reduce the number of suitable weather windows during which experimental trials can be conducted.

To enable real-time evaluation of the PCAcutSeg-V method independently of external conditions and the seasonal availability of plant material, an artificial grapevine structure was designed and constructed. Separate models were trained for the artificial vine to ensure proper system performance within this controlled experimental environment. The implementation of this solution allowed continuous experimentation and refinement of the PCAcutSeg-V method in both static image processing mode and live processing configuration. This approach improved the implementation workflow and increased the stability and repeatability of the obtained results. However, the artificial vine was used solely as a supporting tool for the development and refinement of the method applied in this study. It was not intended to replace validation under real vineyard conditions. Full verification of the practical applicability of the PCAcutSeg-V method requires testing on actual grapevine plants, where natural variability in shoot arrangement, morphology, and background conditions can be fully represented. In the controlled artificial grapevine setup, the method achieved 100% correctness for both classes in cut-point and cut-line estimation. However, this result should be interpreted within the context of a simplified and controlled experimental environment used to support method development, rather than as a substitute for validation under real vineyard conditions.

Figure 8 presents the artificial grapevine structure, the experimental setup, and an example of real-time system operation. Lateral buds were marked in white and higher-positioned buds in red, while structural elements of the vine were painted in black to represent the arm and in gray to represent the spur.

In the real-time experiments, image data were acquired using the RGB sensor of the Orbbec Gemini 336 stereo vision camera (Orbbec Inc., Shenzhen, China). Although the device integrates both RGB and depth-sensing capabilities, only the RGB imaging modality was utilized in this study, and all depth-related functionalities were intentionally excluded from the processing pipeline.

The RGB camera provides color images with a maximum spatial resolution of 1920 × 1080 pixels at a frame rate of up to 30 frames per second, which was sufficient to support real-time operation of the proposed vision-based processing pipeline. The RGB sensor features a horizontal field of view of 86° and a vertical field of view of 55°, enabling coverage of the complete region of interest during pruning-related observations [32].

All RGB frames were streamed directly from the camera and processed without hardware-level pre-filtering or proprietary enhancement. The acquired images served as direct inputs to the YOLO-based segmentation model, followed by geometric processing using the proposed PCAcutSeg-V method.

Real-time evaluation was conducted using a mobile research platform (Figure 9) based on an edge computing device from the NVIDIA Jetson series.

Inference was performed using a TensorRT-optimized engine file (best.engine) rather than the standard PyTorch model (best.pt). The YOLO segmentation network was exported and deployed in TensorRT format to fully exploit hardware acceleration, ensuring low-latency and high-throughput inference suitable for real-time operation.

Based on the evaluation results of the trained models and on the comparison of outcomes obtained from static image analysis for cutting point and cutting line estimation, the YOLO11s-seg model trained using the simplified annotation strategy was selected for the final stage of real-time experiments. This model achieved the highest evaluation metrics among the analyzed variants and demonstrated the greatest stability in the estimation of geometric cutting parameters compared with the alternative annotation strategy.

For each cultivar × vine age combination, the correctness of cutting point estimation was evaluated using 50 cutting points. In total, the evaluation included 300 cutting points. During the evaluation, the camera was positioned 10–20 cm away from the cutting areas and was moved manually. During the experiments in March, favorable lighting conditions were observed. To introduce variability in illumination conditions, the evaluation was conducted from both sides of the vineyard rows. This allowed the camera to be oriented both toward the sun and in the opposite direction, resulting in scenes captured under both front-lit and backlit conditions.

2.5. Hardware Configuration

2.5.1. Workstation

Table 2 summarizes the computational environment used for both model training and evaluation. All YOLO variants were trained on a dedicated high-performance workstation featuring an AMD Ryzen 9 9950X (Advanced Micro Devices, Inc., Sunnyvale, CA, USA) processor and an NVIDIA GeForce RTX 5080 (ASUSTek Computer Inc., Taipei, Taiwan) graphics accelerator with 16 GB of video memory. The hardware configuration was designed to support stable multi-GPU execution and uniform batch processing across all training runs. The software stack consisted of Microsoft Windows 11 Pro, Python version 3.11, PyTorch version 2.8, and CUDA version 12.8, providing full compatibility with the Ultralytics YOLO framework.

2.5.2. Embedded Edge AI Device

The mobile research platform was based on the reComputer J4012 Edge AI Computer (Seeed Technology Co., Ltd., Shenzhen, China), built around the NVIDIA Jetson Orin NX module with 16 GB of memory, featuring an eight-core Arm Cortex-A78AE v8.2 processor and an NVIDIA Ampere graphics processor comprising 1024 CUDA cores and 32 Tensor Cores. The system ran Ubuntu 22.04.5 LTS for the aarch64 architecture with JetPack version 6.1, enabling hardware acceleration through CUDA 12.6, cuDNN 9.3, and TensorRT 10.3, as summarized in Table 3.

3. Results

The analysis of the results presented in Table 4 indicates noticeable differences in the performance characteristics of individual YOLO model generations. These differences depend both on the evaluation level and on the analyzed class. At the overall level, models from the YOLOv8 family achieved the highest precision values, which indicates their ability to generate stable and selective predictions with a limited number of false detections. At the same time, YOLO11 models obtained comparable results in metrics reflecting detection completeness and quality evaluated across a wider range of IoU thresholds. In selected cases these values were slightly higher. This pattern suggests a different distribution of the trade-off between precision and Recall rather than a clear superiority of one architecture over the other.
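The trade-off discussed here follows directly from the standard definitions of the metrics: precision = TP / (TP + FP), Recall = TP / (TP + FN), and the F₁-score is their harmonic mean. The sketch below evaluates these definitions for two hypothetical models; the counts are invented for illustration and are not taken from Table 4.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented counts: a selective model versus a more complete one
selective = detection_metrics(tp=8, fp=1, fn=4)
complete = detection_metrics(tp=11, fp=3, fn=1)
```

With these invented counts, the first model has the higher precision while the second has the higher Recall and F₁-score, which mirrors the kind of pattern described for the two model families without implying the same magnitudes.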

These differences become more pronounced in the per-class analysis. For the headingCut class, YOLOv8 models achieved the best results in terms of precision, F₁-score, and detection quality across multiple IoU thresholds. This confirms their stability and effectiveness in scenarios characterized by a relatively well-defined visual structure. Only in terms of Recall did the YOLO11s model obtain the highest value, which indicates a greater tendency to achieve more complete coverage of objects belonging to this class.

For the rejectingCut class a different performance pattern was observed. Although YOLOv8 models maintained the highest precision values, YOLO11 architectures achieved better results in metrics reflecting detection completeness, the balance between Recall and precision, and prediction quality averaged across a wide range of IoU thresholds. This indicates that for this less visually complex class, YOLO11 models were more effective in capturing the full spectrum of object appearance variability. In contrast, YOLO26 models demonstrated the weakest performance across all evaluated configurations.

The analysis of the results presented in Table 5 indicates that the use of the extended annotation strategy modified the distribution of performance across the evaluated YOLO models. The highest precision was achieved by the YOLOv8n-seg variant. However, this advantage was not consistently reflected in the remaining quality metrics. The YOLO11s-seg model achieved the highest values across all other evaluation metrics, which suggests that this architecture made more effective use of the additional information contained in the extended labels.

For the headingCut class no single model demonstrated clear dominance. In terms of precision, defined as the proportion of correct detections among all model predictions, the highest value was obtained by YOLOv8n-seg. The best Recall and F₁-score values were achieved by YOLO11n-seg. In contrast, the highest mAP values across multiple IoU thresholds were obtained by YOLOv8s-seg.

For the rejectingCut class the overall pattern was consistent with the general evaluation results. YOLO11 models achieved the highest performance across most metrics with the exception of precision, where YOLO26s-seg obtained the best result. As observed with the simplified annotation strategy, YOLO26 models consistently showed the weakest performance across all evaluation levels.

Figure 10 presents a comparison of correctly assigned segmentation masks obtained from models trained using both the simplified and extended annotation strategies. The primary difference between the two approaches is that the extended strategy additionally includes segmentation of the arm region. However, this difference did not affect the final correctness of the detected cutting regions in the presented cases. Dark blue indicates the headingCut class, and light blue indicates the rejectingCut class.

Although the trained models were capable of detecting nonvisible buds based solely on the visible node located on the opposite side of the shoot axis, certain cases were not correctly identified, as illustrated in Figure 11. In the presented images yellow markers indicate node swelling behind which a bud is located. Orange markers denote the positions where the upper boundary of an ideally assigned mask should be placed, which would serve as the reference for cutting point and cutting line estimation. Red markers represent the actual cutting points that would be generated based on the visible segmentation masks for the headingCut class shown in blue.

Table 6 summarizes the effectiveness of cutting point and cutting line estimation obtained from segmentation masks under both annotation strategies. At the overall level, cutting point prediction achieved higher accuracy than cutting line prediction regardless of the applied annotation method. This indicates that the estimation of single decision points is less sensitive to minor geometric inaccuracies of segmentation masks than the reconstruction of complete cutting lines together with their inclination angles.

A comparison between the simplified and extended annotation strategies reveals relevant differences in the error characteristics of the decision pipeline. Under the simplified labeling strategy, both cutting point and cutting line predictions achieved high overall effectiveness which suggests that the pipeline is well aligned with data characterized by lower semantic complexity. At the same time, a clear degradation in line prediction quality relative to point prediction was observed, which confirms that line estimation is more sensitive to the precision and geometric quality of the input masks.

The per-class analysis reveals additional differences between the headingCut and rejectingCut classes. For the headingCut class, the effectiveness of cutting point estimation and cutting line estimation decreases when the extended annotation strategy is applied. This may indicate that richer labeling introduces greater geometric variability, which complicates the determination of stable cutting lines, particularly in cases characterized by less distinct visual structure.

A different pattern is observed for the rejectingCut class. In this case, cutting point estimation and cutting line estimation remain at a high level regardless of the applied annotation strategy. Because the formal definition of rejectingCut labels is the same in both strategies, the differences observed between the simplified and extended results do not arise from modifications of the rejectingCut annotations themselves. Instead, they represent an indirect effect of changes in the segmentation quality of other structures, which influence the behavior of the entire decision pipeline.

Figure 12 presents a comparison of correctly estimated cutting points and cutting lines obtained from segmentation masks predicted by YOLO models using both simplified and extended annotation strategies. In the presented cases, the applied annotation strategy did not affect the final correctness of the estimated cutting points and cutting lines.

Figure 13 presents a comparison of cutting points and cutting lines estimated from segmentation masks predicted by YOLO models using both simplified and extended annotation strategies. The results include both correct and incorrect estimations, with errors primarily associated with the extended annotation approach. In these cases, the annotation strategy influenced the final correctness of cutting point and cutting line estimation. The reduced performance can be attributed to the more complex geometric structure of cutting regions in the extended strategy, for which the PCAcutSeg-V method was not fully adapted.

Table 7 presents the results of real-time evaluation of the estimation of cutting points and lines based on segmentation masks generated by the YOLO11s-seg model trained on the dataset labeled using the simplified annotation strategy. The evaluation was performed at both the overall level and separately for the headingCut and rejectingCut classes. The obtained results indicate high effectiveness of the proposed pipeline under real-time conditions. In general, the estimation of cutting points achieved slightly higher accuracy than the estimation of cutting lines, which is consistent with the observations obtained in the offline analysis. The per-class analysis reveals comparable performance for both analyzed classes, with slightly more stable predictions observed for the rejectingCut class. Overall, the results confirm that the proposed method maintains reliable performance when applied in a real-time processing environment.

Figure 14 shows examples of correct real-time estimations of cut points and lines using the PCAcutSeg-V method based on segmentation masks provided by the YOLO11s-seg model trained on data annotated using the simplified strategy.

Figure 15 shows examples in which cutting locations are still estimated after a cut has already been made on the basis of earlier estimates. This is a valuable practical observation, because in such situations the control system of an autonomous robot will need a mechanism to recognize that the cut has already been performed and to avoid repeating it.

4. Discussion

The obtained results indicate that YOLOv8n-seg, YOLOv8s-seg, YOLO11n-seg, and YOLO11s-seg can be effectively applied to the segmentation of winter pruning regions. However, their practical suitability differed depending on the metric considered and on the downstream requirements of the proposed pruning estimation pipeline. At the overall level, YOLOv8 models achieved the highest precision values, indicating more selective predictions with fewer false positive detections. In contrast, YOLO11 models more frequently achieved comparable or higher values in Recall and F₁-score, indicating that a larger proportion of real objects was correctly detected while maintaining a favorable balance between completeness and precision.

This distinction is particularly important in the context of PCAcutSeg-V, because the proposed method relies not only on correct object detection, but also on the geometric fidelity of the predicted segmentation mask. From this perspective, a model with slightly lower precision but more complete and geometrically stable object coverage may provide more suitable input for cut-point and cut-line estimation than a more selective model producing incomplete masks. Therefore, the observed trade-off between precision and Recall should not be interpreted as a simple ranking of segmentation performance, but rather as a task-dependent balance between false positive control and geometric usefulness for downstream pruning estimation. In this context, particular importance was assigned to mAP_50:95, because this metric reflects segmentation quality across a broad range of IoU thresholds and therefore better captures the geometric fidelity of predicted masks than precision alone. Since PCAcutSeg-V operates directly on mask contours, this metric was considered the most informative for selecting the most suitable model for downstream cut-point and cut-line estimation. Importantly, YOLO11s-seg achieved the highest overall mAP_50:95 value under both annotation strategies, which further supports its selection as the most suitable segmentation model for the proposed pipeline.

Models YOLO26n-seg and YOLO26s-seg demonstrated the weakest performance across all analyzed configurations. This may indicate limited suitability of these variants for the considered task. Consequently, under the current experimental setup, YOLO26-based models do not represent a competitive alternative to the earlier generations.

It should be emphasized that the objective of this study was not to evaluate the models with respect to computational efficiency or energy consumption. The YOLO26 architecture was designed from the ground up for edge deployment. It introduces a simplified structural design that reduces unnecessary complexity while integrating targeted innovations to enable faster, lighter, and more accessible implementation [33]. The overall results confirm the existence of a clear trade-off between precision and detection completeness. The selection of an optimal architecture should therefore depend on the intended application scenario. In some cases, minimizing false detections may be critical. In others, maximizing detection coverage and geometric accuracy under diverse operational conditions may be more important. At the final stage of system design the computational requirements and energy demand of individual models should also be considered in relation to the target edge devices on which the system will operate.

The proposed PCAcutSeg-V method, which applies principal component analysis and operates on the geometry of segmentation masks predicted by YOLO models, demonstrates high potential for cutting point estimation and cutting line estimation.

A comparison between the simplified and extended annotation strategies reveals significant differences in the error characteristics of the decision pipeline. Under the simplified labeling strategy both cutting point estimation and cutting line estimation achieve high overall effectiveness. This suggests that the pipeline is well aligned with data characterized by lower semantic complexity. At the same time a clear degradation in cutting line estimation relative to cutting point estimation is observed, which confirms the greater sensitivity of this stage to the quality and geometric precision of the input masks.

The per-class analysis reveals additional differences between the headingCut and rejectingCut classes. For the headingCut class, the effectiveness of both cutting point estimation and cutting line estimation decreases when the extended annotation strategy is applied. This may indicate that more complex labeling introduces increased geometric variability, which complicates the determination of stable cutting lines, particularly in cases characterized by less distinct visual structure. In this study, the main source of error was not the failure to detect cutting regions themselves, but the irregular spatial arrangement of shoots. In particular, reduced accuracy was observed in cases where shoots deviated substantially from the expected upright orientation and developed in more horizontal or oblique directions. Under such conditions, the geometric structure of the segmented region becomes less consistent with the assumptions of the PCAcutSeg-V method, which relies on stable dominant contour orientation for cut-line and cut-point estimation. This effect was particularly relevant for cutting line estimation, which is more sensitive than point estimation to local variations in object geometry.

The overall results indicate that the effectiveness of cutting point estimation and cutting line estimation is strongly dependent on both the applied annotation strategy and the characteristics of the analyzed class. The simplified labeling strategy supports stable and repeatable estimation at the general level. The extended annotation strategy enables improved representation of more complex structural cases but increases the sensitivity of the decision pipeline to data variability. These findings confirm the importance of separately analyzing segmentation quality and the performance of mask-based decision algorithms, particularly in operational contexts.

The real-time experiments further support these observations. When the proposed pipeline was applied in a live processing configuration using the YOLO11s-seg model trained on the simplified annotation strategy, the system maintained high effectiveness in estimating both cutting points and cutting lines. The observed behavior was consistent with the offline evaluation, where cutting point estimation showed slightly higher robustness than cutting line estimation. These results indicate that the geometric reasoning implemented in the PCAcutSeg-V method remains stable under real-time conditions and confirm the potential applicability of the approach in future autonomous pruning systems.

Several previous studies have also focused on the detection or segmentation of grapevine pruning regions. One study based on deep learning proposed an automated winter pruning approach that combined two Faster R-CNN models for the detection of cutting regions with a Mask R-CNN model for the segmentation of dormant grapevine organs. The authors demonstrated that detection performance was strongly dependent on the visibility of pruning regions. The highest accuracy was obtained for clearly visible complex spurs. Occlusion was identified as the main limiting factor. In addition, the study showed that the segmentation of grapevine organs achieved substantially better results on vines subjected to shoot thinning. This confirms the significant influence of canopy management on the effectiveness of artificial intelligence-based solutions [34]. The combination of skeleton extraction using the Rosenfeld algorithm and bud identification via Harris corner detection achieved a reasonable level of positive detections, although the overall recognition performance remained moderate [35]. In studies addressing node detection in grapevine images acquired under diverse natural backgrounds, which is relevant for pruning automation, YOLOv7-tiny was shown to provide the most favorable balance between detection performance and inference time. This enabled its practical use in real-time systems [9]. ViNet is a deep learning-based framework for the reconstruction of grapevine structure from images. The method relies on node detection, shoot type identification, and graph-based reconstruction of spatial relationships between structural elements. The approach demonstrated high accuracy in plant structure prediction on a dedicated dataset. At the same time, limitations related to occlusion and incomplete plant visibility were identified as key constraints affecting performance [36]. A vision system for an autonomous grapevine winter pruning robot has also been proposed. The system generates a three-dimensional skeletonized model of shoots for the estimation of key pruning metrics. The authors demonstrated stable operation under real vineyard conditions and confirmed that the system enables dimensionally accurate extraction of shoot parameters which form the basis for further automation of the pruning decision process [37].

Current research also investigates the use of deep neural networks for tasks related to summer thinning of non-lignified shoots during the growing season. Studies have shown that deep learning methods, particularly Faster R-CNN with a ResNet18 backbone, enable the detection of visible cordon segments under real vineyard conditions during summer canopy management operations such as green shoot thinning. The authors demonstrated that detection quality strongly depends on the plant growth stage. As occlusion increases due to leaf and shoot development, detection performance systematically decreases. These findings confirm that precise visual perception represents a critical yet sensitive component of automation in vineyard management operations [38]. Furthermore, it has been demonstrated that pixel-level semantic segmentation based on deep learning networks enables accurate delineation of grapevine cordons under field conditions. In addition, cordon trajectories were shown to be effectively approximated using simple mathematical models. This allows precise tool positioning in automated green shoot thinning operations [39]. Progress in the detection of individual grapevine structural components highlights the potential for their practical use in the prediction of precise cutting locations.

It has been shown that winter grapevine pruning can be partially automated by combining semantic image segmentation with the algorithmic reconstruction of a simplified two-dimensional plant structure model. The authors demonstrated that such a model allows the generation of a set of potential cutting points on shoots, which can subsequently be filtered according to predefined agronomic rules. At the same time, it was emphasized that the accuracy of cutting point determination strongly depends on segmentation quality. Simplified structure linking algorithms may introduce incorrect connections even though the overall system enables autonomous execution of the pruning operation [40]. An algorithm for the localization of pruning points on dormant grapevines has also been proposed based on the integration of semantic segmentation, object detection, and depth information. PSPNet was used to separate shoots and the trunk from the background, while bud detection was performed using a YOLOv5 model. It was demonstrated that bud detection accuracy improves when semantic segmentation is applied prior to detection. Cutting point location was determined using bud coordinates, shoot skeleton information, and predefined agronomic rules, resulting in high localization accuracy. The authors emphasized that the proposed approach provides a foundation for three-dimensional cutting point estimation and can be extended to other pruning rule sets [41].

An increasing number of studies focus on complete robotic systems for grapevine pruning. One example is a prototype robot for automated winter pruning that integrates a vision system for incremental three-dimensional reconstruction of plant structure, a decision module for cutting site selection, and collision-free trajectory planning for the robotic arm. The system was tested under vineyard conditions and demonstrated the ability to perform real pruning cuts. However, a key limitation identified by the authors was the long chain of interdependent components, which affects the overall reliability of the solution [42].

The prediction of cutting points in viticulture is not limited to pruning operations but also applies to harvesting. An algorithm for localization of the grape peduncle cutting point for a harvesting robot has been presented based on a multi-camera vision system and artificial intelligence methods. The approach combines cluster detection using a YOLO model with pixel-level semantic segmentation and three-dimensional data for cutting point estimation. Point cloud processing was applied in cases of partial occlusion. The system was evaluated under laboratory and field conditions and demonstrated high accuracy in cutting point detection for both artificial and real grape clusters. The authors also showed that the proposed algorithm can be integrated with a harvesting robot and enables effective grape harvesting under field conditions [43]. Available studies emphasize the importance of developing an optimal method for cutting point prediction to support further mechanization of vineyard operations.

Recent studies have demonstrated that deep learning-based instance segmentation can reliably support vineyard perception, with lightweight architectures enabling deployment on embedded platforms. In particular, a comparison between Mask R-CNN and YOLOv8 showed that YOLO provides superior segmentation efficiency and inference speed, making it suitable as a perceptual front-end for robotic pruning systems. However, these approaches primarily focus on detecting vine structures or buds rather than explicitly localizing cutting points. By emphasizing end-to-end perception, they simplify system design but implicitly couple semantic recognition with decision-making, limiting geometric interpretability. The proposed method builds on these findings by treating instance segmentation as an enabling step and introducing geometry-based post-processing to derive stable, class-dependent cutting points [5].

Agrotechnical practice is based on a comprehensive system-oriented approach in which individual technological solutions do not operate in isolation. Robots designed for autonomous winter grapevine pruning should therefore be integrated within a broader ecosystem of automated and mechanized vineyard management tools. The objective of such integration is to minimize direct human involvement while maintaining the required operational precision.

Robotic cutting of shoots alone does not address the removal of pruned material from the trellis structure. This would require an additional robotic system capable of grasping and transferring shoots to the inter-row area. However, existing mechanical solutions are available that cut shoots at a predefined height and simultaneously remove them from the rows. In a coordinated workflow, the majority of shoot biomass could first be removed mechanically. Robotic systems could then perform precise cutting at optimal locations. The remaining biomass would be limited and could fall beneath the rows where it could be removed using established methods.

In addition, the increasing number of electrically powered devices used in modern agriculture highlights the importance of sustainable energy sources. Biomass generated during winter pruning may represent a potential resource for energy production in future vineyard management systems.

Current research indicates that image analysis based on segmentation and machine learning can be used to predict the potential biomass yield obtainable from grapevine shoots [44]. In addition, dual neural network systems are being developed to simulate combustion processes of vineyard-derived biomass. The objective is to maximize energy output while reducing greenhouse gas emissions [45].

Mechanical winter pruning of grapevines is increasingly adopted worldwide and is based on well-established physiological principles developed over many years. Despite confirmed benefits in terms of reduced labor demand and lower operational costs, a clear demonstration of its economic profitability remains challenging due to the potential reduction in vine productivity [46].

One of the main limitations of mechanical pruning is the lack of selectivity [47]. The exclusive use of this method may also require additional crop load adjustment in order to achieve improved quality parameters [48]. Many currently available technologies for mechanized winter pruning appear to be underutilized and have not been widely adopted. The implication is clear: for mechanization to be widely accepted by the industry, it must deliver increased production efficiency while maintaining or improving grape and wine quality [10]. A logical next step is therefore to complement non-selective mechanical pruning with selective and precise robotic cutting.

Research highlights the potential of robotic systems to bridge the gap between manual and mechanized operations and to support more efficient, sustainable, and precise agrotechnical practices. The absence of deployment-ready solutions currently represents a major barrier to the development of automated grapevine pruning systems.

The predominance of prototypes tested under controlled conditions, high costs of robotic platforms, and limited validation under variable field environments restrict wider adoption. Ongoing advances in artificial intelligence create opportunities to reduce system costs by limiting the number of required sensors while maintaining comparable functionality [7]. At the same time, autonomous navigation systems for vineyard applications are being developed based on machine vision and YOLO algorithms. These systems can support the operation of machines and robots dedicated to autonomous pruning. They demonstrate the long-term potential of robotic technology to transform vineyard productivity and operational efficiency [49].

The results presented in this study align with the broader research trend focused on the automation of vineyard management operations. The findings confirm that modern perception methods based on deep learning can effectively support the identification of cutting locations under field conditions. Comparative analysis demonstrated that high segmentation quality is a necessary but not sufficient condition for stable cutting point estimation and cutting line estimation. Subsequent geometric processing and the agronomic context also play a critical role.

The obtained results indicate that an approach separating the perception stage from the decision stage increases the interpretability and stability of the overall pipeline. In this framework, segmentation is treated as an enabling step rather than a final solution. The proposed PCAcutSeg-V script based on geometric analysis of segmentation masks represents a promising method for class-dependent cutting point estimation and cutting line estimation with improved robustness to input variability.

This study is subject to several limitations. The dataset was collected from a single vineyard, which restricts variability in background conditions, training systems, and cultivars. Experiments were intentionally not conducted under snow cover conditions, which further limits environmental diversity. In addition, cutting point estimation based on segmentation masks was restricted to a single geometric method relying on principal component analysis. Alternative estimation strategies were not evaluated. A comprehensive validation of the proposed approach should include integration with a robotic arm equipped with a cutting tool. Such integration would enable direct assessment of operational effectiveness and represents a clear direction for future research. A particularly important issue for future work is the localization of pruning points for different numbers of retained buds, since this study considered only a single, fixed bud number. In cool-climate viticulture, winter conditions may significantly affect bud survival [50] and, consequently, influence pruning strategies.

It should be emphasized that effective automation of winter grapevine pruning should be considered as part of a broader integrated agrotechnical system. Such a system should combine selective robotic cutting with existing mechanized solutions and subsequent biomass management strategies. This integrated approach may, in the long term, reduce manual labor requirements while maintaining crop quality and thus represents a realistic step toward practical deployment of robotic pruning systems in vineyards.

Future research should focus on the optimization of available solutions with respect to both the detection of cutting regions and estimation of final cutting points and cutting angles. Emphasis should be placed on scalable approaches adapted to region-specific training systems and viticultural practices, supported by interdisciplinary collaboration in order to develop autonomous cutting systems that can be implemented in vineyards.

5. Conclusions

This study evaluated selected YOLO-based instance segmentation models for the identification of winter pruning regions in grapevines and proposed a geometry-driven post-processing method for predicting cutting points and lines based on segmentation masks. The results showed that YOLOv8 and YOLO11 architectures provide a favorable balance between segmentation quality and robustness, while the proposed PCAcutSeg-V method enabled stable, class-dependent prediction of pruning geometry. Real-time experiments further confirmed that the proposed pipeline maintains comparable performance under live processing conditions, with results consistent with those obtained from offline image analysis. These findings indicate that separating perception from geometry-based decision-making can improve the reliability and interpretability of pruning point estimation under field conditions. The proposed approach represents a step toward transforming vision-based perception into actionable geometric control, bridging the gap between image understanding and physical execution in autonomous grapevine pruning systems.

Author Contributions

Conceptualization, M.K. and K.B.; methodology, M.K. and K.B.; software, K.B.; validation, M.K. and K.B.; formal analysis, M.K. and K.B.; investigation, M.K. and K.B.; resources, M.K. and K.B.; data curation, M.K. and K.B.; writing—original draft preparation, M.K. and K.B.; writing—review and editing, M.K. and K.B.; visualization, M.K. and K.B.; supervision, M.K.; project administration, M.K.; funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All preprocessing, training, and evaluation scripts used in this study are available in the associated repository: https://github.com/kamilczynski/Grapevine-Winter-Pruning-Point-Localization-Using-YOLO-Based-Instance-Segmentation (accessed on 23 January 2026). The original datasets and trained YOLO model weights (best.pt and best.engine) are not publicly available due to their large volume and storage constraints but can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. International Organisation of Vine and Wine (OIV). Available online: https://www.oiv.int/sites/default/files/documents/OIV-State_of_the_World_Vine-and-Wine-Sector-in-2024.pdf (accessed on 31 December 2025).
2. Fraga, H.; Freitas, T.R.; Fonseca, A.; Fernandes, A.; Santos, J.A. Climate Change Implications on the Viticulture Geography. In Advances in Botanical Research; Elsevier: Amsterdam, The Netherlands, 2024; Volume 110, pp. 27–69.
3. Van Leeuwen, C.; Sgubin, G.; Bois, B.; Ollat, N.; Swingedouw, D.; Zito, S.; Gambetta, G.A. Climate Change Impacts and Adaptations of Wine Production. Nat. Rev. Earth Environ. 2024, 5, 258–275.
4. Khadatkar, A.; Sawant, C.P.; Thorat, D.; Gupta, A.; Jadhav, S.; Gawande, D.; Magar, A.P. A Comprehensive Review on Grapes (Vitis spp.) Cultivation and Its Crop Management. Discov. Agric. 2025, 3, 9.
5. Pacioni, E.; Abengózar, E.; Macías, M.M.; García-Orellana, C.J.; Gallardo, R.; González Velasco, H.M. Towards Intelligent Pruning of Vineyards by Direct Detection of Cutting Areas. Agriculture 2025, 15, 1154.
6. Mika, A.; Buler, Z.; Treder, W. Mechanical Pruning of Apple Trees as an Alternative to Manual Pruning. Acta Sci. Pol. Hortorum Cultus 2016, 15, 113–121.
7. Navone, A.; Martini, M.; Chiaberge, M. Autonomous Robotic Pruning in Orchards and Vineyards: A Review. Smart Agric. Technol. 2025, 12, 101283.
8. Rätsep, R.; Karp, K.; Vool, E.; Tõnutare, T. Effect of Pruning Time and Method on Hybrid Grapevine (Vitis sp.) ‘Hasanski Sladki’ Berry Maturity in Cool Climate Conditions. Acta Sci. Pol. Hortorum Cultus 2014, 13, 99–112.
9. Oliveira, F.; Da Silva, D.Q.; Filipe, V.; Pinho, T.M.; Cunha, M.; Cunha, J.B.; Dos Santos, F.N. Enhancing Grapevine Node Detection to Support Pruning Automation: Leveraging State-of-the-Art YOLO Detection Models for 2D Image Analysis. Sensors 2024, 24, 6774.
10. Poni, S.; Tombesi, S.; Palliotti, A.; Ughini, V.; Gatti, M. Mechanical Winter Pruning of Grapevine: Physiological Bases and Applications. Sci. Hortic. 2016, 204, 88–98.
11. Aijaz, N.; Lan, H.; Raza, T.; Yaqub, M.; Iqbal, R.; Pathan, M.S. Artificial Intelligence in Agriculture: Advancing Crop Productivity and Sustainability. J. Agric. Food Res. 2025, 20, 101762.
12. Izquierdo-Bueno, I.; Moraga, J.; Cantoral, J.M.; Carbú, M.; Garrido, C.; González-Rodríguez, V.E. Smart Viniculture: Applying Artificial Intelligence for Improved Winemaking and Risk Management. Appl. Sci. 2024, 14, 10277.
13. Ahmed, D.; Sapkota, R.; Churuvija, M.; Karkee, M. Estimating Optimal Crop-Load for Individual Branches in Apple Tree Canopies Using YOLOv8. Comput. Electron. Agric. 2025, 229, 109697.
14. Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci. 2021, 2, 420.
15. LeCun, Y.; Kavukcuoglu, K.; Farabet, C. Convolutional Networks and Applications in Vision. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, Paris, France, 30 May–2 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 253–256.
16. Dhillon, A.; Verma, G.K. Convolutional Neural Network: A Review of Models, Methodologies and Applications to Object Detection. Prog. Artif. Intell. 2020, 9, 85–112.
17. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073.
18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 779–788.
19. Ultralytics. Computer Vision Tasks Supported by Ultralytics YOLO11. Available online: https://docs.ultralytics.com/tasks/ (accessed on 31 December 2025).
20. Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural Object Detection with You Only Look Once (YOLO) Algorithm: A Bibliometric and Systematic Literature Review. Comput. Electron. Agric. 2024, 223, 109090.
21. Mesías-Ruiz, G.A.; Pérez-Ortiz, M.; Dorado, J.; De Castro, A.I.; Peña, J.M. Boosting Precision Crop Protection towards Agriculture 5.0 via Machine Learning and Emerging Technologies: A Contextual Review. Front. Plant Sci. 2023, 14, 1143326.
22. Taha, M.F.; Mao, H.; Zhang, Z.; Elmasry, G.; Awad, M.A.; Abdalla, A.; Mousa, S.; Elwakeel, A.E.; Elsherbiny, O. Emerging Technologies for Precision Crop Management Towards Agriculture 5.0: A Comprehensive Overview. Agriculture 2025, 15, 582.
23. Fountas, S.; Espejo-García, B.; Kasimati, A.; Gemtou, M.; Panoutsopoulos, H.; Anastasiou, E. Agriculture 5.0: Cutting-Edge Technologies, Trends, and Challenges. IT Prof. 2024, 26, 40–47.
24. Holzinger, A.; Fister, I.; Fister, I.; Kaul, H.-P.; Asseng, S. Human-Centered AI in Smart Farming: Toward Agriculture 5.0. IEEE Access 2024, 12, 62199–62214.
25. Samsung Galaxy S25. Available online: https://www.samsung.com/pl/smartphones/galaxy-s25/specs/ (accessed on 3 November 2025).
26. Kamilczynski. GitHub Repository. 2026. Available online: https://github.com/kamilczynski/Grapevine-Winter-Pruning-Point-Localization-Using-YOLO-Based-Instance-Segmentation (accessed on 31 January 2026).
27. Wkentaro. Labelme: Image Annotation Tool, version 5.9.1; GitHub Repository: San Francisco, CA, USA, 2026. Available online: https://github.com/wkentaro/labelme (accessed on 4 January 2026).
28. Ultralytics. Available online: https://docs.ultralytics.com/models/ (accessed on 20 January 2026).
29. Šonka, M.; Hlaváč, V.; Boyle, R. Image Processing, Analysis, and Machine Vision, 4th ed.; Cengage Learning: Stamford, CT, USA, 2015.
30. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Pearson Education: New York, NY, USA, 2018.
31. Jolliffe, I.T. Principal Component Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2002.
32. Orbbec. Stereo Vision Camera. Available online: https://www.orbbec.com/products/stereo-vision-camera/gemini-336/ (accessed on 22 January 2026).
33. Ultralytics YOLO26. Available online: https://docs.ultralytics.com/models/yolo26/ (accessed on 18 January 2026).
34. Guadagna, P.; Fernandes, M.; Chen, F.; Santamaria, A.; Teng, T.; Frioni, T.; Caldwell, D.G.; Poni, S.; Semini, C.; Gatti, M. Using Deep Learning for Pruning Region Detection and Plant Organ Segmentation in Dormant Spur-Pruned Grapevines. Precis. Agric. 2023, 24, 1547–1569.
35. Xu, S.; Xun, Y.; Jia, T.; Yang, Q. Detection Method for the Buds on Winter Vines Based on Computer Vision. In Proceedings of the 2014 Seventh International Symposium on Computational Intelligence and Design, Hangzhou, China, 13–14 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 44–48.
36. Gentilhomme, T.; Villamizar, M.; Corre, J.; Odobez, J.-M. Towards Smart Pruning: ViNet, a Deep-Learning Approach for Grapevine Structure Estimation. Comput. Electron. Agric. 2023, 207, 107736.
37. Williams, H.; Smith, D.; Shahabi, J.; Gee, T.; Nejati, M.; McGuinness, B.; Black, K.; Tobias, J.; Jangali, R.; Lim, H.; et al. Modelling Wine Grapevines for Autonomous Robotic Cane Pruning. Biosyst. Eng. 2023, 235, 31–49.
38. Majeed, Y.; Karkee, M.; Zhang, Q.; Fu, L.; Whiting, M.D. A Study on the Detection of Visible Parts of Cordons Using Deep Learning Networks for Automated Green Shoot Thinning in Vineyards. IFAC-PapersOnLine 2019, 52, 82–86.
39. Majeed, Y.; Karkee, M.; Zhang, Q.; Fu, L.; Whiting, M.D. Determining Grapevine Cordon Shape for Automated Green Shoot Thinning Using Semantic Segmentation-Based Deep Learning Networks. Comput. Electron. Agric. 2020, 171, 105308.
40. Fernandes, M.; Scaldaferri, A.; Fiameni, G.; Teng, T.; Gatti, M.; Poni, S.; Semini, C.; Caldwell, D.; Chen, F. Grapevine Winter Pruning Automation: On Potential Pruning Points Detection through 2D Plant Modeling Using Grapevine Segmentation. In Proceedings of the 2021 IEEE 11th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Jiaxing, China, 27–31 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 13–18.
41. Chen, Z.; Wang, Y.; Tong, S.; Chen, C.; Kang, F. Grapevine Branch Recognition and Pruning Point Localization Technology Based on Image Processing. Appl. Sci. 2024, 14, 3327.
42. Botterill, T.; Paulin, S.; Green, R.; Williams, S.; Lin, J.; Saxton, V.; Mills, S.; Chen, X.; Corbett-Davies, S. A Robot System for Pruning Grape Vines. J. Field Robot. 2017, 34, 1100–1122.
43. Yang, L.; Noguchi, T.; Hoshino, Y. Development of a Grape Cut Point Detection System Using Multi-Cameras for a Grape-Harvesting Robot. Sensors 2024, 24, 8035.
44. Puccio, S.; Miccichè, D.; Victorino, G.; Lopes, C.M.; Di Lorenzo, R.; Pisciotta, A. Estimating Pruning Wood Mass in Grapevine Through Image Analysis: Influence of Light Conditions and Acquisition Approaches. Agriculture 2025, 15, 966.
45. Postawa, K.; Klimek, K.; Maj, G.; Kapłan, M.; Szczygieł, J. Advanced Dual-Artificial Neural Network System for Biomass Combustion Analysis and Emission Minimization. J. Environ. Manag. 2024, 349, 119543.
46. Allegro, G.; Martelli, R.; Valentini, G.; Pastore, C.; Mazzoleni, R.; Pezzi, F.; Filippetti, I. Effects of Mechanical Winter Pruning on Vine Performances and Management Costs in a Trebbiano Romagnolo Vineyard: A Five-Year Study. Horticulturae 2022, 9, 21.
47. Barcia, F.; Prieto, J.; Trentacoste, E.R. Effects of Mechanical Box Pruning Intensity on Bud Development, Vegetative Growth, and Yield Components on Cv. Cabernet-Sauvignon in Mendoza, Argentina. OENO One 2023, 57, 153–163.
48. Gatti, M.; Civardi, S.; Bernizzoni, F.; Poni, S. Long-Term Effects of Mechanical Winter Pruning on Growth, Yield, and Grape Composition of Barbera Grapevines. Am. J. Enol. Vitic. 2011, 62, 199–206.
49. Saha, S.; Noguchi, N. Smart Vineyard Row Navigation: A Machine Vision Approach Leveraging YOLOv8. Comput. Electron. Agric. 2025, 229, 109839.
50. Lisek, J. Winter Frost Injury of Buds on One-Year-Old Grapevine Shoots of Vitis vinifera Cultivars and Interspecific Hybrids in Poland. Folia Hortic. 2012, 24, 97–103.

Figure 1. Experimental vineyard.

Figure 2. Examples of differences between labeling methods: (a) simplified; (b) extended.

Figure 3. Examples of no difference between labeling methods.

Figure 4. Examples of varying degrees of bud and node visibility: (a) the first bud located above the latent bud is completely invisible; (b–e) buds are partially visible to varying degrees; (f) fully exposed bud.

Figure 5. Difference in marking the cutting areas: (a) detection; (b) segmentation.

Figure 6. PCAcutSeg-V interface.

Figure 7. Examples of overlapping segmentation masks predicted for the same cutting region.

Figure 8. Artificial grapevine and real-time system evaluation: (a) structure; (b) setup; (c) cutting point and line estimation.

Figure 9. Mobile research platform.

Figure 10. Comparison of correct segmentation metrics: (a) simplified annotation strategy; (b) extended annotation strategy.

Figure 11. Examples of incorrect segmentation of cutting areas.

Figure 12. Comparison of correct points and cut lines for both labeling methods: (a) simplified annotation strategy; (b) extended annotation strategy.

Figure 13. Comparison of correct and incorrect cut points and lines for both labeling methods: (a) simplified annotation strategy; (b) extended annotation strategy.

Figure 14. Examples of real-time cut-point and cut-line estimation.

Figure 15. Examples of re-estimating cut areas.

Table 1. Hyperparameters used for training the YOLO models.

Hyperparameter	Value
epochs	200
patience	0
imgsz	640
optimizer	‘SGD’
momentum	0.937
weight_decay	0.0005
lr0	0.01
lrf	0.01
seed	0
augment	True
workers	8
batch	32
device	‘cuda’
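The settings in Table 1 map onto keyword arguments of the Ultralytics training interface. The sketch below shows one way they might be passed. The checkpoint name `yolo11s-seg.pt`, the dataset file `pruning-seg.yaml`, and the wrapper function `train_segmentation_model` are assumptions made for illustration; whether the authors started from pretrained weights and how their dataset file is named is not stated here.

```python
# Hyperparameters transcribed from Table 1.
TRAIN_ARGS = dict(
    epochs=200,
    patience=0,
    imgsz=640,
    optimizer="SGD",
    momentum=0.937,
    weight_decay=0.0005,
    lr0=0.01,
    lrf=0.01,
    seed=0,
    augment=True,
    workers=8,
    batch=32,
    device="cuda",
)


def train_segmentation_model(weights="yolo11s-seg.pt", data_yaml="pruning-seg.yaml"):
    """Train one segmentation model with the Table 1 settings.

    Both default arguments are hypothetical placeholders.
    """
    from ultralytics import YOLO  # imported lazily so the config can be inspected without the package

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train_segmentation_model()` requires the `ultralytics` package, a CUDA-capable GPU, and a dataset description file in the Ultralytics segmentation format.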

Table 2. Software configuration of the primary workstation used for model training and evaluation.

Software	Version
Microsoft Windows	11 Pro (build 26200, 64 bit)
Python	3.11.13
PyTorch	2.8.0
CUDA	12.8
cuDNN	9.1.0.2 (NVIDIA build 91002)
Ultralytics	8.4.0

Table 3. Software configuration of the edge AI device used for real-time evaluation.

Software	Version
Ubuntu	22.04.5 LTS (aarch64)
JetPack (L4T)	6.1 (R36.4.0, kernel 5.15.148-tegra)
Python	3.10.12
PyTorch	2.5.0a0+872d972e41.nv24.08
CUDA Toolkit	12.6
cuDNN	9.3.0
TensorRT	10.3.0.30-1 + CUDA 12.5
Ultralytics	8.3.221

Table 4. Evaluation of YOLO models trained on simplified labeled data.

Evaluation Level	Model	Precision	Recall	F₁-score	mAP₅₀	mAP_50:95
Overall	YOLOv8n-seg	0.845	0.753	0.796	0.844	0.477
	YOLOv8s-seg	0.835	0.783	0.808	0.839	0.481
	YOLO11n-seg	0.812	0.796	0.804	0.841	0.462
	YOLO11s-seg	0.816	0.792	0.804	0.833	0.484
	YOLO26n-seg	0.508	0.583	0.541	0.545	0.274
	YOLO26s-seg	0.681	0.670	0.675	0.716	0.388
Per-Class (headingCut)	YOLOv8n-seg	0.845	0.785	0.814	0.871	0.484
	YOLOv8s-seg	0.823	0.785	0.804	0.862	0.486
	YOLO11n-seg	0.795	0.782	0.788	0.838	0.457
	YOLO11s-seg	0.791	0.786	0.789	0.836	0.474
	YOLO26n-seg	0.537	0.691	0.605	0.623	0.317
	YOLO26s-seg	0.698	0.715	0.707	0.756	0.405
Per-Class (rejectingCut)	YOLOv8n-seg	0.844	0.720	0.777	0.817	0.469
	YOLOv8s-seg	0.848	0.780	0.813	0.816	0.476
	YOLO11n-seg	0.829	0.811	0.820	0.843	0.467
	YOLO11s-seg	0.841	0.798	0.819	0.831	0.494
	YOLO26n-seg	0.480	0.474	0.477	0.466	0.231
	YOLO26s-seg	0.664	0.624	0.643	0.676	0.370
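
The F₁-score column can be related to the precision and recall columns through the standard harmonic-mean definition, F₁ = 2PR / (P + R). The sketch below recomputes it for the "Overall" rows of Table 4 as a consistency check. It assumes only this textbook formula; the article's own evaluation code is not reproduced here, and small differences from the reported values are expected because the inputs are rounded to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the "Overall" rows of Table 4.
TABLE4_OVERALL = {
    "YOLOv8n-seg": (0.845, 0.753, 0.796),
    "YOLOv8s-seg": (0.835, 0.783, 0.808),
    "YOLO11n-seg": (0.812, 0.796, 0.804),
    "YOLO11s-seg": (0.816, 0.792, 0.804),
    "YOLO26n-seg": (0.508, 0.583, 0.541),
    "YOLO26s-seg": (0.681, 0.670, 0.675),
}

# Tolerance chosen for illustration; it allows for rounding of the inputs.
TOLERANCE = 0.003

for model, (p, r, reported) in TABLE4_OVERALL.items():
    computed = f1_score(p, r)
    status = "ok" if abs(computed - reported) < TOLERANCE else "check"
    print(f"{model}: computed {computed:.3f}, reported {reported:.3f} ({status})")
```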

Table 5. Evaluation of YOLO models trained on extended labeled data.

Evaluation Level	Model	Precision	Recall	F₁-score	mAP₅₀	mAP₅₀₋₉₅
Overall	YOLOv8n-seg	0.791	0.769	0.780	0.832	0.450
	YOLOv8s-seg	0.777	0.813	0.795	0.837	0.467
	YOLO11n-seg	0.779	0.817	0.798	0.847	0.456
	YOLO11s-seg	0.776	0.833	0.803	0.853	0.483
	YOLO26n-seg	0.689	0.706	0.697	0.738	0.378
	YOLO26s-seg	0.779	0.680	0.725	0.785	0.422
Per-Class (headingCut)	YOLOv8n-seg	0.796	0.780	0.788	0.817	0.421
	YOLOv8s-seg	0.776	0.797	0.786	0.841	0.452
	YOLO11n-seg	0.775	0.810	0.792	0.831	0.410
	YOLO11s-seg	0.773	0.806	0.789	0.833	0.444
	YOLO26n-seg	0.682	0.728	0.704	0.753	0.350
	YOLO26s-seg	0.762	0.726	0.744	0.802	0.398
Per-Class (rejectingCut)	YOLOv8n-seg	0.786	0.759	0.772	0.848	0.480
	YOLOv8s-seg	0.778	0.830	0.803	0.834	0.482
	YOLO11n-seg	0.782	0.824	0.803	0.862	0.502
	YOLO11s-seg	0.779	0.861	0.818	0.873	0.522
	YOLO26n-seg	0.696	0.685	0.690	0.723	0.406
	YOLO26s-seg	0.795	0.634	0.706	0.768	0.446

Table 6. Offline evaluation of cutting point and cutting line estimation on static images.

Annotation Method	Evaluation Level	Cutting Marking	Total Cuts	Correct Cuts
Simplified	Overall	Point	490	478 (97.55%)
	Overall	Line	490	462 (94.29%)
	Per-Class (headingCut)	Point	320	310 (96.88%)
	Per-Class (headingCut)	Line	320	302 (94.38%)
	Per-Class (rejectingCut)	Point	170	168 (98.82%)
	Per-Class (rejectingCut)	Line	170	160 (94.12%)
Extended	Overall	Point	479	449 (93.74%)
	Overall	Line	479	418 (87.27%)
	Per-Class (headingCut)	Point	305	276 (90.49%)
	Per-Class (headingCut)	Line	305	258 (84.59%)
	Per-Class (rejectingCut)	Point	174	173 (99.43%)
	Per-Class (rejectingCut)	Line	174	160 (91.95%)
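
The percentages in Table 6 follow directly from the cut counts, and the "Overall" rows are the sums of the two per-class rows. A short sketch of that arithmetic is given below; the dictionary layout and function names are illustrative choices, and only the counts themselves come from the table.

```python
# (total cuts, correct cuts) per annotation method, class, and marking type,
# copied from the per-class rows of Table 6.
TABLE6 = {
    ("Simplified", "headingCut", "Point"): (320, 310),
    ("Simplified", "headingCut", "Line"): (320, 302),
    ("Simplified", "rejectingCut", "Point"): (170, 168),
    ("Simplified", "rejectingCut", "Line"): (170, 160),
    ("Extended", "headingCut", "Point"): (305, 276),
    ("Extended", "headingCut", "Line"): (305, 258),
    ("Extended", "rejectingCut", "Point"): (174, 173),
    ("Extended", "rejectingCut", "Line"): (174, 160),
}


def accuracy_percent(total: int, correct: int) -> float:
    """Share of correct cuts in percent, rounded to two decimals."""
    return round(100 * correct / total, 2)


def overall(method: str, marking: str) -> tuple[int, int]:
    """Sum totals and correct counts over both classes for one method."""
    totals = [
        counts
        for (m, _cls, mk), counts in TABLE6.items()
        if m == method and mk == marking
    ]
    return sum(t for t, _ in totals), sum(c for _, c in totals)


for method in ("Simplified", "Extended"):
    for marking in ("Point", "Line"):
        total, correct = overall(method, marking)
        print(f"{method} {marking}: {correct}/{total} "
              f"({accuracy_percent(total, correct):.2f}%)")
```

Working from the raw counts rather than the rounded percentages avoids compounding rounding error when the two annotation strategies are compared.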

Table 7. Real-time evaluation of cutting point and cutting line estimation.

Evaluation Level	Cutting Marking	Total Cuts	Correct Cuts
Overall	Point	300	288 (96.00%)
Overall	Line	300	283 (94.33%)
Per-Class (headingCut)	Point	200	191 (95.50%)
Per-Class (headingCut)	Line	200	188 (94.00%)
Per-Class (rejectingCut)	Point	100	97 (97.00%)
Per-Class (rejectingCut)	Line	100	95 (95.00%)


