Article

A GNSS–Vision Integrated Autonomous Navigation System for Trellis Orchard Transportation Robots

1
State Key Laboratory of Efficient Utilization of Arid and Semi-Arid Arable Land in Northern China, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100086, China
2
School of Mechanical and Electrical Engineering, Soochow University, Suzhou 215000, China
3
Hainan Tang Huajun Academician Workstation, Key Laboratory of Applied Research on Tropical Crop Information Technology of Hainan Province, Institute of Scientific and Technical Information, Chinese Academy of Tropical Agricultural Sciences, Haikou 570000, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
AI 2026, 7(4), 125; https://doi.org/10.3390/ai7040125
Submission received: 19 February 2026 / Revised: 17 March 2026 / Accepted: 18 March 2026 / Published: 1 April 2026

Abstract

Autonomous navigation is essential for orchard transportation robots to support automated operations and precision orchard management. However, in trellis orchards, dense vegetation and complex canopy structures often degrade the stability of GNSS-based navigation in in-row environments. To address this issue, this study proposes a GNSS–vision integrated navigation framework for orchard transportation robots. The performance of GNSS-based navigation in out-of-row environments and vision-based navigation in in-row environments was experimentally evaluated under representative orchard operating conditions. In out-of-row areas, the robot employs GNSS-based path planning and trajectory tracking to achieve reliable navigation in relatively open, lightly occluded environments. During in-row navigation, a deep learning-based real-time object detection approach is used to detect tree trunks and trellis supporting structures. By integrating corner-point selection with temporal RANSAC-based line fitting, a stable orchard row structure is constructed to generate robust navigation references. The visual perception module serves as the front-end sensing component of the navigation system and is designed to be independent of specific object detection architectures, allowing flexible integration with different real-time detection models. Field experiments were conducted under various orchard layouts and growth stages. The average lateral deviation of GNSS-based navigation in out-of-row scenarios ranged from 0.093 to 0.221 m, while the average heading deviation of in-row visual navigation was approximately 5.23° at a robot speed of 0.6 m/s. These results indicate that the proposed perception and navigation methods can maintain stable navigation performance within their respective applicable scenarios in trellis orchard environments. 
The experimental findings provide a practical and engineering-oriented basis for future research on automatic navigation mode switching and system-level integration of orchard transportation robots.

1. Introduction

Autonomous navigation in orchard environments is a core enabling technology for precision agriculture, supporting transportation, spraying, and monitoring tasks [1,2,3]. For instance, Ref. [1] proposed an orchard robot navigation method based on octree-optimized 3D point clouds to improve mapping efficiency, while Ref. [2] provided a comprehensive review of key technologies for autonomous navigation in agricultural machinery. In addition, Ref. [3] developed a navigation framework integrating 3D SLAM with point cloud localization, demonstrating reliable localization performance in orchard environments. However, trellis orchards typically contain dense canopy structures and supporting frames that partially obstruct satellite signals, which can lead to signal attenuation and severe multipath effects. Previous studies have reported that dense vegetation and canopy structures can significantly degrade GNSS positioning accuracy and may even cause temporary signal loss during in-row operations [4,5]. Recent research further confirms that GNSS can be highly unreliable in dense orchard scenarios due to canopy occlusion and multipath propagation, necessitating alternative navigation strategies. For example, performance evaluations of 2D LiDAR SLAM algorithms in simulated orchard environments explicitly highlighted that GNSS often fails under canopy cover, indicating that SLAM is a feasible alternative in GNSS-denied conditions [6]. In addition, the CropNav framework proposed a hybrid navigation strategy in real farm environments, where GNSS reliability was substantially reduced under crop canopy, and navigation continuity was maintained through automatic switching between GNSS and other sensor modalities [7]. Consequently, conventional GNSS-based localization and path-planning strategies often fail to maintain stable navigation performance inside orchard rows.
In recent years, a wide range of perception, localization, and mapping approaches have been explored for orchard navigation. LiDAR-based SLAM systems, often integrated with GNSS and IMU sensors, have demonstrated robust map construction and accurate trajectory tracking in complex orchard environments [1,3]. For example, Ref. [8] proposed a navigation system for orchard spraying robots based on multi-sensor integration and 3D LiDAR SLAM, fusing LiDAR, IMU, and odometry data to generate high-quality point-cloud maps, with navigation accuracy validated through field experiments. To address GNSS-unavailable or degraded conditions commonly encountered in orchards, tightly coupled LiDAR-IMU fusion methods have been developed for real-time localization and mapping, significantly improving positioning stability under dense canopy coverage [9]. In addition, single-sensor LiDAR approaches have been employed for real-time tree trunk detection in densely planted orchards [4]. Other studies have combined 2D LiDAR with EKF [10], or integrated stereo vision with visual–inertial odometry [11], enhancing localization robustness under illumination variations and texture-degraded conditions. SLAM methods incorporating semantic information have also been introduced to improve environmental understanding and obstacle handling. For instance, Ref. [12] proposed a context-aware navigation framework that integrates semantic perception to support navigation decision-making in horticultural environments, while Ref. [13] developed a GNSS–LiDAR integrated navigation method capable of maintaining stable localization under intermittent GNSS dropout conditions. Furthermore, Ref. [14] proposed a semantic mapping approach to enhance environmental understanding for orchard robots, and Rapado-Rincon et al. [15] introduced Tree-SLAM to enable efficient mapping and localization of individual trees across seasons. 
Collectively, these studies demonstrate that SLAM and multi-sensor fusion approaches can provide reliable in-row navigation solutions for structured orchard environments.
Despite significant progress, including navigation-line extraction using LiDAR and hybrid strategies enabling adaptive switching between in-row and inter-row navigation [16,17,18], achieving stable and reliable transitions between GNSS-available out-of-row regions and GNSS-denied in-row regions remains challenging. For example, Ref. [16] proposed a navigation-line extraction method based on vision and 2D LiDAR fusion to improve navigation robustness in dense orchards. Ref. [17] investigated a vision-based SLAM localization algorithm for orchard environments, while Ref. [18] developed a LiDAR-SLAM-based orchard inspection robot integrating Hybrid A* and DWA for navigation planning. To cope with navigation control under GNSS-limited conditions, some studies have integrated visual–inertial odometry with model predictive control (MPC), achieving reliable steering and path-tracking performance in orchard environments [19]. These results indicate that visual and inertial information can effectively support local navigation tasks in the absence of GNSS constraints.
However, such approaches typically focus on localization or control stability within a single scenario and pay limited attention to autonomous scene recognition and navigation-mode switching in complex orchard environments. Vision-based and LiDAR-based navigation methods often rely on high-density sensors or complex fusion architectures and still exhibit limitations in end-of-row detection and dynamic navigation-mode switching [12,13,18]. Advanced 3D SLAM frameworks, robust 2D SLAM algorithms, and semantic perception models [3,12,14,15,20,21] have improved navigation accuracy across different tree growth stages, while CNN-based obstacle classification methods [22] contribute to more accurate obstacle avoidance in densely planted row environments. Multi-sensor fusion strategies integrating LiDAR, GNSS, IMU, and vision [23] further enhance navigation stability and environmental adaptability.
Although this study focuses on orchard navigation, related research in greenhouses, raised-bed systems, and controlled-environment agriculture also provides valuable insights into multi-sensor fusion and navigation-mode selection. In GNSS-denied environments, multi-sensor fusion localization and navigation studies conducted in greenhouse scenarios have demonstrated that integrated sensor architectures can effectively mitigate localization instability caused by GNSS signal loss or occlusion [24]. Vision-based autonomous navigation has also been extensively investigated in orchard and orchard-like environments, validating the potential of visual perception for path recognition and navigation decision-making [25]. For instance, Ref. [26] proposed an RGB-D vision-based path detection and navigation method for vineyard environments and constructed a corresponding dataset for field validation, showing that visual information can reliably support navigation decisions in structured orchard-row environments. Furthermore, multi-sensor navigation frameworks combining LiDAR, IMU, vision, and GNSS have been shown to improve navigation stability and environmental adaptability in spraying robots and facility-agriculture robots [23,27]. These findings suggest that integrating GNSS and vision-based navigation, together with environment-aware navigation-mode selection, represents a rational technical pathway for bridging in-row and out-of-row navigation in trellis orchards.
Nevertheless, several critical challenges remain in orchard autonomous navigation. First, GNSS performance in trellis orchard in-row areas is severely degraded by canopy and structural occlusions, limiting its applicability for in-row navigation. Second, existing LiDAR-based and vision-based navigation methods, although exhibiting strong localization accuracy and environmental perception capabilities, often rely on high-cost sensors or complex fusion architectures and still face robustness limitations in vision-supported navigation strategies. Third, existing semantic mapping and perception approaches have not been fully exploited to guide adaptive navigation-mode selection between GNSS and vision, resulting in insufficient system-level integration and autonomy at the decision-making layer [21,28,29,30]. Recent surveys and monographs also indicate that existing research on orchard robot navigation mainly focuses on single operational scenarios or individual navigation modes, such as GNSS-based navigation in open areas or vision-based navigation inside orchard rows. Limited attention has been paid to environment-aware coordination between different navigation modes in complex orchard environments, and unified system-level solution frameworks remain insufficiently explored [31].
To further clarify the technical distinctions between the proposed approach and existing orchard navigation studies, representative orchard navigation methods reported in recent years are systematically compared from an engineering perspective, as summarized in Table 1. It should be noted that differences in experimental scenarios, operating speeds, robotic platforms, and evaluation metrics among existing studies make direct quantitative comparisons based on unified error metrics difficult. Therefore, this study focuses primarily on qualitative and system-level comparisons, including sensor configurations, system complexity, environmental adaptability, and key technical characteristics. As shown in Table 1, most existing navigation systems rely on high-density sensors or complex multi-sensor fusion architectures, which increase system complexity and computational requirements. In contrast, the approach proposed in this study aims to achieve reliable navigation performance in trellis orchard environments through a lightweight vision–GNSS cooperative framework, while maintaining moderate computational complexity and good environmental adaptability across different growth stages.
Building upon prior research on row-boundary perception [32], this study focuses on visual perception mechanisms tailored for trellis orchard environments to enhance navigation stability under GNSS-limited conditions. A vision-triggered navigation framework is proposed, in which environmental feature information is exploited to assist navigation-mode selection, enabling continuous and stable autonomous operation.
Importantly, compared with existing orchard navigation studies, our approach emphasizes a lightweight Vision + GNSS cooperative framework that maintains moderate computational complexity while providing reliable navigation across multiple growth stages. While most prior methods rely on high-density sensors, complex multi-sensor fusion, or SLAM-based approaches, our framework achieves similar or improved performance in both in-row and out-of-row navigation. This systematic comparison highlights the practical applicability of our method in trellis orchard transportation scenarios.
The main contributions of this work are summarized as follows:
  • The differences between in-row and out-of-row scenarios in trellis orchards are systematically analyzed, and a scene-oriented visual perception strategy is established to provide explicit guidance for navigation-mode selection.
  • A robust visual perception method is designed to improve navigation reliability under severe occlusion and illumination variation conditions.
  • Extensive field experiments are conducted across different orchard layouts and growth stages, demonstrating that the proposed approach can consistently support continuous navigation in trellis orchards.
The presented results provide practical insights for the engineering deployment of transportation robots in complex trellis orchard environments.

2. Materials and Methods

2.1. System Overview

The hardware platform of the trellis orchard transportation robot is specifically designed to meet the requirements of the GNSS–vision integrated navigation system. The overall system architecture comprises three core modules: (1) perception and localization; (2) control and decision-making; and (3) motion execution. This modular design supports GNSS-based navigation in open-field (inter-row) areas as well as vision-based navigation in constrained (intra-row) orchard environments, allowing both independent operation and system-level integration. The overall navigation hardware architecture is illustrated in Figure 1, and the experimental platform of the trellis orchard robot is shown in Figure 2.
Regarding hardware installation and coordinate frame design, a dual-antenna GNSS module is mounted at the geometric center of the robot to enhance positioning and heading stability. The primary and secondary antennas are arranged along the robot’s longitudinal axis, with the baseline maximized while ensuring horizontal installation and an unobstructed sky view. This configuration reduces uncertainty during attitude compensation and coordinate transformations. In the GNSS coordinate system, the X-axis points in the robot’s forward driving direction, and the Z-axis points vertically downward.
To meet the requirements of environmental structure perception for intra-row navigation, Intel RealSense D435i cameras (Intel, Santa Clara, CA, USA) are symmetrically installed at the front and rear along the robot’s longitudinal axis at a height of approximately 0.8 m. This arrangement minimizes occlusion from the robot structure and enhances stable observation of the orchard row geometry. The main technical specifications of the trellis orchard transportation robot are summarized in Table 2.

2.2. Analysis of Trellis Orchard Navigation Scenarios and System Workflow

2.2.1. Experimental Scenarios and Navigation Mode Classification

Trellis orchards employ support structures to guide tree growth, resulting in dense canopies and complex spatial environments. Variations in trellis design, tree variety, and growth cycles lead to geometric differences across regions, but the general characteristic remains a continuous spatial system of trellis structures and tree rows, which partially constrains GNSS reception and increases demands on perception and navigation stability.
The experimental site is a grape orchard in Maoshatang Village, Bacheng Town, Kunshan City, Jiangsu Province, China. The orchard uses a conventional horizontal trellis with regular inter-row spacing and clear passageways. Cement pillars support the grapevines, forming a structurally stable and highly repetitive workspace (Figure 3).
Figure 4 shows an orthophoto from UAV aerial imaging, illustrating environmental features and the robot’s operational paths. Colored arrows indicate typical navigation segments: green for inter-row (open) navigation, yellow for intra-row (constrained) navigation, orange for row-changing maneuvers, and red for critical transition zones, highlighting their spatial relationships.
The L1 segment, indicated by green arrows, represents the robot traveling from the warehouse to the orchard entrance, corresponding to the inter-row navigation scenario. In this relatively open space, GNSS reception is favorable, making GNSS-based global positioning and path planning suitable. The L2 segment, shown with yellow arrows, represents intra-row tasks performed within the orchard. Due to occlusion by trellis structures and dense canopies, GNSS positioning accuracy decreases significantly; therefore, this stage relies primarily on vision-based navigation, using real-time perception of row structures and environmental features to achieve stable path following. The red-framed areas, A1 and A2, denote critical transition zones between inter-row and intra-row regions. These zones play a key role in navigation mode switching, requiring reliable identification of row start and end positions. During the inter-row stage, the robot performs global navigation using GNSS while approaching the orchard boundary; upon entering the rows, the navigation mode switches from GNSS-based to vision-based intra-row navigation. The L3 segment, indicated by orange arrows, represents row-changing maneuvers between different rows, which rely on end-of-row detection to support continuous and smooth inter-row transitions.
Based on GNSS availability and orchard spatial characteristics, the operational environment is classified into three navigation scenarios: inter-row (open), intra-row (constrained), and end-of-row/critical zones. GNSS-based navigation is applied in inter-row regions, vision-based navigation in intra-row regions, and end-of-row zones support mode switching. This classification provides the foundation for navigation mode selection and system workflow design.

2.2.2. Workflow Design for Inter-Row and Intra-Row Navigation

The system workflow of the robot’s typical operations within the trellis orchard is abstractly modeled from an engineering perspective. Based on the robot’s spatial positions in the orchard, the operational path is divided into key state points labeled A–G. Points A–D correspond to inter-row navigation, point E represents the transition from inter-row to intra-row, and points E–G correspond to intra-row navigation.
The overall motion workflow is illustrated in Figure 5. Two typical operational paths are included: the first involves departing from the warehouse, traversing the inter-row region, entering the orchard, and performing intra-row transportation; the second involves returning to the warehouse along the same path after completing intra-row operations. From a system perspective, this process can be abstracted as a state transition sequence: Inter-row GNSS navigation → Row-entry vision trigger → Intra-row vision-based navigation → End-of-row detection → Return to inter-row navigation.
Detailed workflow:
Step 1:
Robot at Warehouse (Point A): After system startup, GNSS and vision sensors, as well as control modules, are initialized. The inter-row navigation algorithm and the inter-/intra-row judgment module enter a standby state.
Step 2:
Segment A–D: The system interpolates pre-planned waypoints to generate the inter-row navigation trajectory in real time and acquires relevant navigation parameters.
Step 3:
Trajectory tracking: The robot executes path-following control using GNSS positioning information to complete inter-row navigation tasks.
Step 4:
Approaching row entry (Point E): As the robot nears the intra-row transition, the system determines whether to enter the orchard row based on position information and vision-triggered conditions, while performing necessary attitude adjustments.
Step 5:
Segment E–G: Within the intra-row region, the system detects structural feature points of grapevine supports in real time using vision perception, constructs the row structure model, and fits the navigation line.
Step 6:
Intra-row navigation: The robot follows the generated visual navigation line to perform path tracking and maintain stable intra-row movement.
Step 7:
End-of-row detection (Point G): Upon detecting the end-of-row area, the system executes logical switching of robot motion direction and camera configuration according to the end-of-row judgment, supporting subsequent operations.
Step 8:
Return to warehouse: The return path mirrors the entry process, describing the system state transitions across inter-row and intra-row scenarios.
Note: This study primarily focuses on the feasibility and performance of inter-row GNSS navigation and intra-row vision-based navigation. The navigation mode switching process illustrates the system workflow; its quantitative evaluation will be addressed in future research.
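From an engineering standpoint, the state transition sequence described in Steps 1–8 can be sketched as a small finite-state machine. The state and function names below are illustrative and not taken from the paper; the sketch only mirrors the documented transition logic.

```python
from enum import Enum, auto

class NavState(Enum):
    """Navigation states mirroring the inter-/intra-row workflow (illustrative names)."""
    GNSS_INTER_ROW = auto()    # Steps 1-3: GNSS waypoint tracking (A-D)
    ROW_ENTRY = auto()         # Step 4: vision-triggered transition at point E
    VISION_INTRA_ROW = auto()  # Steps 5-6: visual row-line following (E-G)
    END_OF_ROW = auto()        # Step 7: direction/camera switching at point G

def next_state(state, row_entry_triggered=False, end_of_row_detected=False):
    """Advance the state machine along the documented sequence:
    inter-row GNSS -> row-entry trigger -> intra-row vision -> end-of-row -> GNSS."""
    if state is NavState.GNSS_INTER_ROW and row_entry_triggered:
        return NavState.ROW_ENTRY
    if state is NavState.ROW_ENTRY:
        return NavState.VISION_INTRA_ROW
    if state is NavState.VISION_INTRA_ROW and end_of_row_detected:
        return NavState.END_OF_ROW
    if state is NavState.END_OF_ROW:
        return NavState.GNSS_INTER_ROW  # return leg mirrors the entry process
    return state  # no trigger: remain in the current mode
```

The quantitative evaluation of these transitions is deferred to future work in the paper; the sketch only fixes the switching logic described in the workflow.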

2.3. Multi-Modal Perception and Navigation Mode Coordination Based on Scenario Recognition

In Section 2.2, the operational environment of the trellis orchard was divided into inter-row, intra-row, and end-of-row/critical areas, corresponding to GNSS navigation, vision-based navigation, and navigation mode switching, respectively. This scenario-based classification provides a clear foundation for system-level navigation strategy design. Building upon this, a multi-modal perception and navigation mode coordination mechanism is developed to enable logical transitions between different navigation modes. The overall navigation logic follows: “Global GNSS-based inter-row navigation → Vision-based intra-row precise navigation → End-of-row state triggers return to GNSS navigation”.
To support this mechanism, a vision perception module is designed for orchard operational scenario recognition. In GNSS-limited trellis orchard environments, this module perceives and models the row structure of trees to provide stable geometric constraints for intra-row navigation, while outputting end-of-row state information for navigation mode management and system process control. This module functions as a component within the system architecture and is not the primary focus of algorithmic or experimental validation.

2.3.1. GNSS-Based Inter-Row Global Navigation Mode

In open inter-row areas, the environment is relatively unobstructed, with minimal trellis and canopy blockage. GNSS signal reception is favorable, making GNSS-based global navigation suitable for robot localization and motion control. The main task during this stage is to guide the transport robot from the warehouse or initial position to the orchard row entrance or designated row head area, providing a consistent and reliable initial pose for subsequent intra-row operations.
During inter-row navigation, the robot relies primarily on GNSS positioning information for global path tracking and pose control. A commercial GNSS module is employed, and its output serves as a stable reference without additional algorithmic or model modifications. Combined with pre-planned waypoints, the system achieves stable, continuous straight or curved path motion, ensuring controlled entry into the orchard operational area.
Since the inter-row environment lacks significant trellis obstructions, GNSS navigation meets the required positioning accuracy and system stability. Therefore, only the GNSS navigation mode is activated during this stage, and the vision perception module does not participate in trajectory generation or navigation decision-making, avoiding unnecessary sensing and computational overhead.
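Step 2 of the workflow interpolates pre-planned waypoints into the inter-row navigation trajectory. A minimal sketch of such densification is shown below; the paper does not specify its interpolation scheme, so linear interpolation with an assumed 0.5 m spacing is used purely for illustration.

```python
import math

def interpolate_waypoints(waypoints, step=0.5):
    """Densify pre-planned (x, y) waypoints by linear interpolation so the
    tracking controller receives an evenly spaced reference path.
    `step` is the target spacing in metres (an assumed value)."""
    path = []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        d = math.hypot(x1 - x0, y1 - y0)          # segment length
        n = max(1, int(d / step))                  # samples on this segment
        for i in range(n):
            t = i / n
            path.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    path.append(waypoints[-1])                     # keep the final waypoint
    return path
```

The GNSS tracker would then follow this densified path with its usual lateral-deviation control loop.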

2.3.2. Vision-Based Intra-Row Navigation

Within intra-row areas of the trellis orchard, dense canopies, trellis supports, and vine branches severely degrade GNSS signals, making global coordinate-based navigation infeasible. Navigation reference is primarily determined by the geometric relationships of row structures. Constructing a robust vision perception system capable of accurately sensing intra-row structures and converting them into navigation constraints is essential for autonomous intra-row navigation.
This framework includes five key stages: vision data acquisition and preprocessing, tree and trellis detection, geometrically constrained feature selection, row line modeling with temporal stabilization, and intra-row center navigation line generation. Each stage progressively contributes to navigation stability and continuity.
1. Vision data acquisition and preprocessing
Experiments were conducted at a vineyard in Maoshatang Village, Bacheng Town, Kunshan City, Jiangsu Province, covering grape growth stages: sprouting, growth, and ripening. RGB-D images were captured using an Intel RealSense D435i camera mounted at approximately 0.8 m height, producing 4500 images. Different illumination conditions (noon, afternoon, evening) were included to ensure model generalization (Table 3). The dataset was split 7:3 into training and validation sets, and tree trunks and support poles were manually labeled using Labelme. Bounding boxes were defined as minimum enclosing rectangles including roots and support bottoms.
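A minimal sketch of the 7:3 training/validation split described above follows; the random seed and file-naming pattern are illustrative assumptions, not details from the paper.

```python
import random

def split_dataset(image_paths, train_ratio=0.7, seed=42):
    """Shuffle and split image paths into training and validation sets.
    The seed fixes the shuffle for reproducibility (assumed value)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = round(len(paths) * train_ratio)
    return paths[:n_train], paths[n_train:]

# 4500 images split 7:3 -> 3150 training / 1350 validation
train, val = split_dataset([f"img_{i:04d}.png" for i in range(4500)])
```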
2. Tree trunk and trellis support detection
Object detection serves as the front-end of the intra-row vision perception module, providing stable structural feature inputs for row modeling and navigation line generation. A YOLO-series single-stage detection framework was adopted to meet real-time and accuracy requirements. Experiments were conducted on an MV200 industrial computer platform using PyTorch 1.13.1 with CUDA 11.3 under Ubuntu 20.04. The experimental results are reported in Table 4. For the dataset containing nine object categories, YOLOv7 [33] achieved an overall mean Average Precision (mAP) of 91.38% on the complete test set. Detection performance exhibited moderate variation across different growth stages. The highest accuracy was obtained during the germination stage, followed by the vegetative stage, while a slight performance degradation was observed during the mature stage. This trend can be mainly attributed to differences in canopy density and occlusion severity. In the mature stage, dense foliage structures may partially occlude tree trunks and supporting poles, thereby increasing detection difficulty. In contrast, under different illumination conditions within the same growth stage, only minor variations in detection accuracy were observed. This indicates that the adopted detection model maintains strong robustness to illumination changes, which is essential for stable perception in outdoor orchard environments.
Taking YOLOv7 as a representative example, the object detection module exhibits stable performance across different crop growth stages and illumination conditions (Figure 6). Comprehensive experimental results indicate that the adopted detection model consistently provides reliable orchard structural feature outputs under varying environmental conditions, meeting the stability and real-time requirements of in-row visual navigation. These outputs serve as a robust input foundation for subsequent orchard row line fitting and center navigation line generation.
3. Geometrically constrained feature point selection
In trellised orchard row environments, grapevine trunks and supporting poles are densely distributed, leading to object detection outputs that may include redundant or noisy points irrelevant for navigation. To enhance the stability and robustness of subsequent row-line modeling, a geometrically constrained feature point selection strategy is implemented as follows.
Step 1: Corner point extraction from detection bounding boxes
For each detection bounding box, the top-left corner (x1, y1) and bottom-right corner (x2, y2) coordinates are obtained. These corner points serve as candidate feature points for row modeling (Figure 7). Considering the relative position of objects within the image, vertical centerline segmentation is applied: with an image resolution of 1280 × 720 pixels, the vertical line at x = 640 divides the left and right regions of the image. For objects in the left region, the bottom-right corner (x2, y2) is selected; for objects in the right region, the bottom-left corner (x1, y2) is selected (Figure 8). This ensures that the feature points are geometrically closer to the inner boundaries of each row.
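The corner-selection rule of Step 1 can be expressed compactly as follows. The function name is illustrative, boxes are assumed to be (x1, y1, x2, y2) pixel tuples as in the text, and using the box center to decide left/right membership is an assumption.

```python
def select_feature_point(box, image_width=1280):
    """Pick the row-inner bottom corner of a detection box (x1, y1, x2, y2):
    boxes left of the vertical centerline contribute their bottom-right corner
    (x2, y2); boxes on the right contribute their bottom-left corner (x1, y2)."""
    x1, y1, x2, y2 = box
    center_x = (x1 + x2) / 2          # assumed criterion for left/right region
    if center_x < image_width / 2:    # left region: inner edge is the right side
        return (x2, y2)
    return (x1, y2)                   # right region: inner edge is the left side
```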
Step 2: Threshold-based geometric filtering
Single-frame detection may still contain false positives or scattered points. To mitigate this, candidate feature points are filtered using perspective-aware thresholds along both the image width and height:
(1)
Far-field points (small image Y-coordinates, upper image region): constraints are applied only along the image width: k1·width ≤ x ≤ k2·width.
(2)
Near-field points (large image Y-coordinates, lower image region): constraints are applied along both width and height: k3·width ≤ x ≤ k4·width and k5·height ≤ y ≤ height.
Here, k1–k5 are empirically determined coefficients reflecting the orchard row geometry and camera perspective (Figure 9). This geometric filtering removes outliers caused by foliage occlusion and ensures that the remaining points accurately represent the structural boundaries of the row.
Step 3: Left–right row separation
Filtered points are further classified as belonging to the left or right row using the vertical centerline as a boundary. This spatial partitioning facilitates independent row line fitting in subsequent steps.
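Steps 2 and 3 (threshold filtering and left–right separation) can be sketched together. The k1–k5 coefficients are described in the paper only as empirically determined, so the values below are placeholders, as is the use of k5·height as the far/near boundary.

```python
def filter_and_split(points, width=1280, height=720,
                     k=(0.10, 0.90, 0.20, 0.80, 0.50)):
    """Perspective-aware filtering of candidate (x, y) points, then
    left/right row separation at the vertical centerline.
    k = (k1..k5) are placeholder coefficients, not the paper's values."""
    k1, k2, k3, k4, k5 = k
    left, right = [], []
    for x, y in points:
        if y >= k5 * height:   # near-field point (lower image region)
            ok = k3 * width <= x <= k4 * width and y <= height
        else:                  # far-field point (upper image region)
            ok = k1 * width <= x <= k2 * width
        if ok:
            (left if x < width / 2 else right).append((x, y))
    return left, right
```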
4. Orchard row line modeling with temporal stabilization
In trellised orchard environments, grapevine trunks and supporting poles are densely distributed, and single-frame feature points extracted from object detection may be noisy or partially occluded. To enhance the stability of row-line fitting, particularly in end-of-row areas, a temporal orchard row-line modeling strategy is employed, integrating RANSAC-based fitting with inter-frame information fusion.
Step 1: Retention of previous-frame row lines as temporal priors
The left and right orchard row lines estimated from the previous frame are retained as prior knowledge. This temporal continuity mitigates sudden loss of row lines due to occlusion, sparse detection points, or irregular bounding boxes.
Step 2: Detection of trunks and supporting poles in the current frame.
Geometrically constrained corner points obtained from the current frame (Section 3) are used as candidate points. Points are divided into left and right row sets based on the image vertical centerline (x = 640 for 1280 × 720 resolution).
Step 3: RANSAC-based row-line fitting
For each orchard row, a RANSAC algorithm is applied to fit a linear model, yielding the slopes of the left and right row lines (k1 and k2). To ensure reproducibility, the following RANSAC configuration was adopted:
(1) Minimal sample set: two candidate feature points are randomly sampled in each iteration;
(2) Maximum iteration number: set to 80 to enhance robustness under sparse features, occlusion, and potential outliers at row ends;
(3) Inlier distance threshold: ε = 12 px. A candidate point is considered an inlier if its distance to the fitted line is ≤ 12 px;
(4) Confidence level and minimum inlier ratio: confidence P = 0.99 and minimum inlier ratio ρ = 0.30, yielding a theoretical minimum iteration count N_min = ⌈log(1 − P) / log(1 − ρ^s)⌉ ≈ 49, where s = 2 is the minimal sample size;
(5) Early stopping criteria: iteration stops when the current model's inlier ratio exceeds 0.70 or the best inlier count does not increase over 10 consecutive iterations;
(6) Random seed control: the seed is fixed at 2024 to ensure reproducibility across experiments;
(7) Computational complexity: for a single frame with m candidate feature points, the RANSAC complexity is O(N·m), where N is the iteration count. Introducing temporal constraints adds only constant-time operations (reading the previous-frame model and a threshold comparison) and has minimal impact on real-time performance.
This parameterized RANSAC procedure ensures stable row-line fitting under partial occlusion and sparse feature conditions, providing reliable constraints for subsequent navigation tasks.
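The parameterized RANSAC procedure, including the stated early-stopping rules and a previous-frame fallback as the simplest form of the temporal constraint, can be sketched as follows. This is an illustrative single-line version, not the study's implementation:

```python
import math
import random

def ransac_line(points, prev_line=None, eps=12.0, max_iter=80,
                conf=0.99, min_inlier_ratio=0.30, seed=2024):
    """RANSAC line fit with a previous-frame fallback (sketch).

    Returns (k, b) for y = k*x + b, or prev_line when fitting fails.
    Parameter values mirror the configuration in the text.
    """
    rng = random.Random(seed)
    n = len(points)
    if n < 2:
        return prev_line
    # Theoretical minimum iterations for the stated confidence level:
    # N_min = ceil(log(1 - P) / log(1 - rho^s)) with s = 2  ->  49.
    n_min = math.ceil(math.log(1 - conf) / math.log(1 - min_inlier_ratio ** 2))
    best, best_inliers, stale = None, 0, 0
    for _ in range(max(max_iter, n_min)):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip vertical sample pairs
        k = (y2 - y1) / (x2 - x1)
        b = y1 - k * x1
        # Perpendicular point-to-line distance |k*x - y + b| / sqrt(k^2 + 1).
        norm = math.hypot(k, 1.0)
        inliers = sum(abs(k * x - y + b) / norm <= eps for x, y in points)
        if inliers > best_inliers:
            best, best_inliers, stale = (k, b), inliers, 0
        else:
            stale += 1
        # Early stopping: inlier ratio > 0.70 or 10 rounds without improvement.
        if best_inliers / n >= 0.70 or stale >= 10:
            break
    if best is None or best_inliers / n < min_inlier_ratio:
        return prev_line  # temporal fallback keeps the last valid row line
    return best
```
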
Step 4: Slope validity evaluation based on perspective geometry
Due to perspective projection, orchard rows appear as a trapezoidal region in the image plane (Figure 10). Reference slopes Q1 and Q2 are computed from the lateral edges of this trapezoid. A fitted row line is considered valid if |k1 − Q1| ≤ threshold and |k2 − Q2| ≤ threshold. If the slope check fails, the row line from the previous frame is retained to maintain temporal stability.
Step 5: Temporal update and row-line selection
Accepted row lines are stored for the next frame, enabling frame-to-frame fusion that reduces jitter and ensures continuous navigation in areas with partial occlusion.
This workflow guarantees that the intra-row navigation line generated in the subsequent step is based on temporally stabilized and geometrically validated row structures, providing a reliable reference for path tracking.
5. Generation of the intra-row center navigation line
To achieve accurate heading control, a center navigation line is generated using the angle bisector principle between the left and right fitted row lines. Considering perspective convergence, the slopes k1 and k2 are used to compute the angle bisector slope k3, from which the intersection point (x3, y3) and the navigation line are derived (Equations (2)–(5), Figure 11).
k3 = (k1·k2 − 1 + √((k1·k2 − 1)² + (k1 + k2)²)) / (k1 + k2),	(2)

Solving Equation (3) yields the intersection point (x3, y3) as given in Equation (4). Finally, the center navigation line is formulated as Equation (5). The resulting navigation line is illustrated in Figure 11.

y1 = k1·x + b1,  y2 = k2·x + b2,	(3)

x3 = (b2 − b1) / (k1 − k2),  y3 = k1·x3 + b1,	(4)

Y − y3 = k3·(X − x3),	(5)
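The angle-bisector computation can be evaluated directly. The positive square-root branch is assumed to select the bisector lying between the two converging row lines (the typical case with k1 and k2 of opposite sign); the symmetric case k1 + k2 = 0, where the bisector is vertical, is not handled in this sketch:

```python
import math

def center_navigation_line(k1, b1, k2, b2):
    """Center line as the angle bisector of the left/right row lines.

    Row lines: y = k1*x + b1 and y = k2*x + b2 (assumes k1 != k2 and
    k1 + k2 != 0). Returns (k3, x3, y3), defining the navigation line
    Y - y3 = k3 * (X - x3).
    """
    # Intersection (vanishing point) of the two row lines.
    x3 = (b2 - b1) / (k1 - k2)
    y3 = k1 * x3 + b1
    # Angle-bisector slope; '+' branch assumed for converging rows.
    k3 = (k1 * k2 - 1 + math.sqrt((k1 * k2 - 1) ** 2
                                  + (k1 + k2) ** 2)) / (k1 + k2)
    return k3, x3, y3
```

As a sanity check, the returned slope equals tan((θ1 + θ2)/2), the tangent of the mean of the two row-line angles.
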

2.3.3. Scene-Aware Navigation Mode Switching Mechanism

The robot must navigate between open inter-row areas and constrained intra-row regions. To ensure continuous operation, a scene-aware navigation mode switching mechanism is implemented.
The mechanism relies on an end-of-row state recognition module to identify the current scenario. When strong row structural constraints are detected or GNSS signals are unreliable, the navigation mode switches from GNSS-based inter-row navigation to vision-based intra-row navigation. This module has been validated in prior work [32] and functions as a system-level trigger rather than an algorithmic focus.
Through this mechanism, the system achieves logical coordination between out-of-row and in-row navigation at the framework level. It is important to note that the primary focus of this study is to evaluate the performance of GNSS-based out-of-row navigation and vision-based in-row navigation in their respective applicable scenarios. The navigation mode switching is presented primarily to illustrate the system operation mechanism, rather than as an independent quantitative evaluation. Therefore, detailed metrics such as switching success rate, switching duration, or instantaneous lateral/heading errors are not reported in this work.
This mechanism ensures logical consistency during transitions and supports stable continuous navigation within complex trellised orchard environments, providing a foundation for future research on system-level integration and automatic mode switching.

3. Results

3.1. Experimental Setup and Evaluation Metrics

To validate the effectiveness and robustness of the proposed in-row visual navigation method, systematic comparative experiments and field trials were conducted in a representative trellis-style vineyard. The experiments evaluated tree-row line fitting accuracy, center navigation line stability, and overall robot navigation performance, with multiple quantitative metrics providing an objective assessment.

3.1.1. Experimental Platform and Scene Configuration

The experiments were conducted using a self-developed trellis-orchard transportation robot (hardware configuration detailed in Section 2.1), capable of GNSS-based global navigation in open areas and vision-based autonomous navigation within orchard rows.
The experimental site was a standardized vineyard located in Maoshatang Village, Bacheng Town, Kunshan City, Suzhou, Jiangsu Province, China. The vineyard uses a horizontal trellis cultivation system, with regularly spaced inter-row corridors and continuous, structurally stable canopies, representing typical operating conditions of trellis orchards.
To evaluate adaptability under varying environmental conditions, experiments were conducted across three grape growth stages (budburst, growth, and maturity), and datasets were collected under different illumination conditions in representative areas (see Section 2.2.1).

3.1.2. Navigation Control and Experimental Procedure

The in-row visual navigation workflow consists of three main stages: tree-row line detection, center navigation line generation, and real-time motion control based on the generated navigation line.
  • Tree-row line detection: The visual perception module captures the intra-row environment in real time and fits the left and right tree-row lines.
  • Center navigation line generation: A center navigation line is generated based on the geometric relationship between the two tree-row lines.
  • Motion control: A Pure Pursuit controller is applied to the differential-drive chassis using the navigation parameters, enabling autonomous path tracking along the center navigation line.
To enable reproducibility and support experimental evaluation, the controller and perception parameters are explicitly specified as follows:
  • Look-ahead distance: In in-row visual navigation mode, it is set based on corridor width, vehicle kinematics, and operational speed: 0.9 m at 0.4 m/s, 1.0 m at 0.6 m/s, and 1.1 m at 0.8 m/s. In out-of-row GNSS navigation mode, it is set to 1.2 m.
  • Controller update frequency: 20 Hz.
  • Speed control strategy: Constant linear velocity reference according to the experimental speed settings.
  • Steering-rate limitation: Angular velocity is saturated at ±0.75 rad/s to prevent overshoot and oscillations in segments with sharp curvature changes.
  • Perception-to-control latency: The end-to-end delay from the visual perception module to the controller input is approximately 0.1 s, sufficient to maintain real-time navigation at the experimental low speeds.
During the experiments, the robot operated at a relatively low speed to mitigate the effects of ground unevenness and visual noise on path tracking performance. The look-ahead distance of the Pure Pursuit algorithm was adjusted according to the corridor width of the trellis orchard and the robot’s kinematic characteristics, balancing tracking accuracy and control smoothness.
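The Pure Pursuit steering command with the stated ±0.75 rad/s saturation can be sketched as follows. The curvature form κ = 2y/L² is the standard Pure Pursuit relation for a look-ahead point at lateral offset y and distance L; the interface here is illustrative, not the platform's actual controller code:

```python
def pure_pursuit_angular_velocity(lateral_err, lookahead, v, max_omega=0.75):
    """Differential-drive Pure Pursuit steering command (sketch).

    lateral_err: lateral offset of the look-ahead point in the robot
    frame (m); lookahead: look-ahead distance (m); v: linear speed (m/s).
    The output angular velocity is saturated at +/-0.75 rad/s as stated
    in the text.
    """
    kappa = 2.0 * lateral_err / (lookahead ** 2)  # path curvature (1/m)
    omega = v * kappa                             # angular velocity (rad/s)
    return max(-max_omega, min(max_omega, omega))
```

For example, at the recommended 0.6 m/s with a 1.0 m look-ahead, a centered look-ahead point yields zero steering, while a large offset saturates at the rate limit.
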

3.1.3. Evaluation Metrics and Analysis Methods

To comprehensively evaluate the performance of the in-row visual navigation method, evaluation metrics were designed from both perception and control perspectives.
  • Perception level: The focus was on the fitting quality of tree-row lines and the generated center navigation line. The proportion of valid fitted frames was statistically analyzed to quantify algorithm robustness under noise interference and end-of-row conditions. A valid fitted frame is defined as follows: two researchers independently annotated the left and right tree-row reference lines in the original images, based on the geometric boundaries of trunks or trellis supports near the inner side of the row. In cases of significant disagreement, consensus annotations were obtained through discussion. A frame is considered valid if the angular error of both left and right fitted lines relative to the human-annotated reference lines does not exceed 5° and the mean point-to-line distance of candidate feature points is less than 15 px. Fitting accuracy is defined as the ratio of valid fitted frames to the total number of tested frames. This definition was consistently applied for the “Correct Fits/Accuracy (%)” reported in Table 5, Table 6, Table 7, Table 8 and Table 9.
It should be noted that row-line fitting quality does not directly equate to vehicle navigation error; however, it influences the generation of the center navigation line and subsequently affects control inputs. Therefore, more stable row-line fitting generally helps reduce heading fluctuations and lateral deviations during navigation. Qualitative visualization in representative scenarios further supports the assessment of fitting stability.
  • Control level: Lateral deviation and heading deviation were analyzed to evaluate the robot’s tracking performance. Heading deviation is defined as the absolute angle difference between the robot’s current heading and the reference heading tangent to the center navigation line at the look-ahead point: Δψ = |ψ − ψ_ref|, where Δψ is the heading deviation, ψ is the robot’s current heading angle, and ψ_ref is the reference heading computed from the center navigation line at the look-ahead point. Smaller values indicate better alignment with the desired navigation direction. Lateral deviation is further defined to characterize the robot’s position relative to the row centerline. Using camera imaging geometry, the horizontal pixel offset of the center navigation line at the look-ahead depth Z_p is converted to a physical lateral deviation: Δy = (μ_c − μ_0) × Z_p / f_x, where Δy is the lateral deviation, μ_c is the horizontal pixel coordinate of the center navigation line at the look-ahead depth, μ_0 is the x-coordinate of the camera principal point, and f_x is the camera focal length in pixels. This conversion gives the lateral deviation in meters, providing a clear physical meaning for in-row navigation errors.
It should be noted that pixel-based metrics are sensitive to camera resolution, mounting height, and perspective. In this study, all in-row experiments were conducted with the same camera, identical installation height (0.8 m), fixed resolution (1280 × 720), and field of view configuration. Pixel coordinates are therefore used only as intermediate computations for tree-row extraction and center-line generation, while navigation performance is reported in angular or physical units (° or m). Pixel-based errors should not be interpreted as cross-platform performance indicators.
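The pixel-to-metric conversion used for lateral deviation is a direct application of the pinhole camera model. In this sketch the focal length value is a placeholder, since the calibrated intrinsics are not reported here:

```python
def lateral_deviation_m(u_c, u_0=640.0, z_p=1.0, f_x=900.0):
    """Convert the center line's horizontal pixel offset at look-ahead
    depth Z_p into a metric lateral deviation (pinhole model).

    u_c: horizontal pixel coordinate of the center navigation line at
    the look-ahead depth; u_0: principal point x-coordinate (640 px for
    a centered 1280-wide sensor); z_p: look-ahead depth in meters;
    f_x: focal length in pixels (900 px is a placeholder; use the
    calibrated value for the actual camera).
    """
    return (u_c - u_0) * z_p / f_x
```
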

3.2. Comparative Experiments on Vineyard Row Line Fitting Methods

In complex trellis vineyard environments, reliable visual navigation relies on accurately fitting vineyard row lines from the detected trunk and support post feature points. Based on the previously extracted feature points, this study obtained the key structural corner points for row line modeling. Common linear fitting methods include Least Squares (LS), Hough Transform, Spline Interpolation, and Random Sample Consensus (RANSAC). Among these, Hough Transform suffers from high computational complexity in high-dimensional parameter spaces and performs poorly on imperfect linear structures. Spline Interpolation can easily overfit under sparse or noisy feature point conditions. Both are thus unsuitable for complex trellis vineyards with frequent occlusions. In contrast, Least Squares is computationally simple and suitable for linear data, but it is sensitive to noise and outliers. RANSAC exhibits strong robustness against outliers, producing more stable fits in complex environments.
Therefore, this section conducts comparative experiments using LS, single-frame RANSAC, and RANSAC incorporating temporal frame information, evaluating their performance under noise, outliers, and sparse end-of-row conditions, providing a foundation for subsequent center navigation line generation.

3.2.1. Vineyard Row Line Fitting Based on Least Squares

LS is a classical linear estimation method that determines the optimal slope and intercept by minimizing the sum of squared residuals. In our experiments, feature points from the left and right sides of each image frame were fitted separately to obtain the corresponding vineyard row lines.
However, due to environmental factors such as variable lighting, occlusion by leaves, and uneven feature point distribution, Least Squares is sensitive to noise and outliers. In particular, fitting at the row ends or in sparse regions may deviate significantly or even fail. Figure 12 illustrates examples where row lines at the end of the vineyard failed to fit accurately. Quantitative results are presented in Table 5. The average fitting accuracy across different rows is approximately 80.4%, indicating limited robustness of LS in complex trellis vineyard environments and its inability to fully meet the stability requirements for in-row visual navigation.

3.2.2. Vineyard Row Line Fitting Based on RANSAC

RANSAC iteratively selects a minimal subset of points to hypothesize a model and evaluates the number of inliers to maximize consistency, exhibiting strong robustness in the presence of outliers. Experimental results demonstrate that RANSAC achieves an average fitting accuracy of 91.7% across all rows (Table 6), significantly outperforming Least Squares and effectively suppressing noise and outlier interference.
However, in end-of-row or cross-trellis scenarios, single-frame RANSAC may still encounter fitting interruption or misfitting due to sparse feature points. Figure 13 and Figure 14 illustrate cases where row lines were not fitted correctly or were overfitted at row ends, revealing the limitations of single-frame RANSAC.

3.2.3. RANSAC Incorporating Temporal Frame Information

To address the instability of single-frame RANSAC at row ends, temporal frame information is integrated into the RANSAC fitting process. This introduces continuity and dynamic updates of vineyard row lines in the temporal dimension, enhancing stability. During robot traversal, the vision sensor captures sequential frames. Feature points from each frame are used for the current row line fit while constrained by neighboring frames, mitigating fitting instability caused by sparse end-of-row points.
Figure 15 illustrates the fitting effect of this method at the row end. Figure 15a,b show reduced feature points due to missing trunks or support posts, leading to interruption in single-frame fitting. Figure 15c shows misfitting under sparse conditions. By contrast, RANSAC with temporal constraints leverages spatial continuity across frames, successfully reconstructing the row lines even with unilateral feature loss (Figure 15d–f).
Quantitative evaluation is shown in Table 7. Temporal RANSAC maintains high fitting accuracy across all rows, with improvements exceeding 5.93% over single-frame RANSAC, demonstrating the effectiveness of temporal constraints under sparse end-of-row conditions.
Furthermore, the overall valid fitting rate across all test frames was computed for three methods (Table 8). Least Squares, single-frame RANSAC, and Temporal RANSAC achieved overall valid fitting rates of 80.31% (1570/1955), 91.66% (1792/1955), and 97.24% (1901/1955), respectively, with 95% confidence intervals calculated using the Wilson method. Temporal RANSAC improved the overall valid fitting rate by 5.58 percentage points over single-frame RANSAC, further demonstrating the effectiveness of temporal constraints in stabilizing end-of-row fitting.
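The Wilson 95% confidence intervals for these valid-fitting rates can be reproduced with a few lines; this is the standard Wilson score formula, not code from the study:

```python
import math

def wilson_interval(successes, total, z=1.959964):
    """Wilson score confidence interval for a binomial proportion.

    z defaults to the 97.5th normal quantile, giving a 95% interval.
    """
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total
                                   + z * z / (4 * total * total))
    return center - half, center + half
```

For instance, 1901 valid frames out of 1955 (the temporal RANSAC result) yields an interval of roughly 0.964 to 0.979.
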
Furthermore, to verify applicability across growth stages, images from budburst, growth, and maturity stages were tested. Figure 16 illustrates the fitting results, and Table 9 summarizes the accuracy. Average accuracies for budburst, growth, and maturity are 98.20%, 98.07%, and 97.63%, respectively, with an overall average of 97.97%. Despite increased occlusion and structural complexity at later stages, the method remains robust.
In summary, temporal RANSAC effectively overcomes fitting failures at row ends observed with traditional methods and maintains robustness across all growth stages, providing a reliable foundation for visual navigation path generation in trellis vineyard environments.

3.3. Navigation Line Generation Experiments Across Growth Stages

To evaluate the applicability and stability of the generated navigation lines under different grapevine growth stages, systematic experiments were conducted in three representative trellis-based orchard scenarios, including the budding stage, growing stage, and maturing stage. The experiments focused on assessing the accuracy and continuity of navigation line fitting under increasing variations in plant morphology and occlusion conditions, thereby verifying the robustness of the proposed method in real orchard operating environments.
Figure 17 illustrates the in-row navigation line fitting results at the three growth stages. During the budding stage, vine branches and leaves are sparse, and the trunks and supporting structures are clearly visible. Under these conditions, navigation line generation remains stable, and the resulting paths are continuous and accurately reflect the orchard row direction. As the vines enter the growing stage, foliage density increases and partial occlusions begin to appear; nevertheless, the generated navigation lines maintain consistent alignment with the orchard rows. In the maturing stage, dense canopies and significant occlusions are present, leading to slight fluctuations in individual frames. However, the overall continuity and usability of the navigation paths are not noticeably degraded.
To quantitatively evaluate navigation line fitting performance across different growth stages, the fitting accuracy was statistically analyzed, as summarized in Table 10. The results indicate that the average navigation line fitting accuracy remains above 92% for all three stages. The highest average accuracy, 94.35%, is achieved during the budding stage, primarily due to the relatively simple plant structure and uniform distribution of feature points, which facilitate stable path generation. As vine growth progresses into the growing and maturing stages, increased foliage occlusion and structural complexity result in a slight reduction in accuracy; however, the overall decline remains limited.
Furthermore, overall statistics were computed for each growth stage (Table 11). The overall navigation line fitting accuracy for the budding, growing, and maturing stages was 94.37%, 92.69%, and 92.28%, respectively, with corresponding 95% confidence intervals of 93.26–95.31%, 91.45–93.76%, and 91.01–93.38%. From budding to maturing stages, the overall accuracy decreased by 2.09 percentage points, indicating that increased canopy occlusion slightly reduces path generation precision. Nevertheless, accuracy remains above 92%, demonstrating good stability across multiple growth stages.
Overall, the experimental results demonstrate that the proposed navigation line fitting method can reliably generate continuous and usable in-row navigation paths across different vine growth stages. Even under severe occlusion conditions during the maturing stage, the method maintains high fitting accuracy, consistent with the results obtained from the vineyard row line fitting experiments. These findings confirm the robustness and practical applicability of the proposed approach in dynamic trellis orchard environments and provide a reliable foundation for visual in-row navigation of orchard transportation robots.

3.4. Orchard In-Row and Out-of-Row Navigation Experiments

To evaluate the practical performance of the proposed navigation method under different operational scenarios in trellis-based orchards, both out-of-row GNSS-based path-following experiments and in-row vision-based navigation experiments were conducted. The experiments systematically assessed navigation accuracy, stability, and adaptability of the system in complex orchard environments.

3.4.1. Out-of-Row GNSS-Based Navigation Experiments

The out-of-row navigation experiments were conducted in open corridors of the trellis orchard. The transportation robot was commanded to follow pre-planned straight, curved, and Z-shaped paths, with a constant operating speed of 0.8 m/s. The reference paths were generated using pre-collected GNSS waypoints. During each experiment, the actual trajectory of the robot was recorded to evaluate navigation accuracy and path-tracking performance.
Figure 18 illustrates the experimental setup and path layout in the out-of-row orchard environment. Figure 19, Figure 20 and Figure 21 present the lateral deviation as a function of travel distance for the straight, curved, and Z-shaped paths, respectively. Table 12 summarizes the lateral deviation statistics, including maximum, minimum, and average deviations for each trial, as well as the average, standard deviation, and coefficient of variation (CV) across the three trials, providing a comprehensive view of trajectory stability and deviation fluctuations.
As shown in Table 12, the average lateral deviations of the three path types across three trials were 0.094 m, 0.164 m, and 0.221 m, with corresponding standard deviations of 0.032 m, 0.019 m, and 0.040 m. These results indicate that the robot can maintain stable tracking under different path geometries, while path complexity influences both deviation magnitude and variability, with the Z-shaped path showing the largest average and most variable deviations. It should be noted that the previous “centimeter-level” description applies only to steady-state segments after initial path capture and does not represent the full trajectory. The larger peaks in Table 12 mainly occur during path initialization and high-curvature transitions, with the maximum deviation of 1.680 m observed in the second trial of the Z-shaped path. As the robot converges to the desired pose, lateral deviations decrease significantly and stabilize. Therefore, the out-of-row navigation results are more accurately described as “average lateral deviations ranged from 0.094 to 0.221 m, with steady-state errors considerably lower than those during initial path capture and turning transitions”.
The visualization results in Figure 19, Figure 20 and Figure 21 further demonstrate that, despite differences in path geometry, the robot is capable of following the target trajectories continuously and reliably. The maximum deviations mainly occur during the initial phase, while the overall navigation process remains stable, confirming the feasibility and high accuracy of GNSS-based navigation in open orchard corridors.

3.4.2. In-Row Vision-Based Navigation Experiments

The in-row navigation experiments were conducted within the trellis orchard, where realistic operational conditions such as illumination variation, foliage occlusion, and ground surface characteristics were considered. The robot traveled along the pre-generated in-row center navigation path, while the visual perception system continuously detected grapevine trunks and supporting structures to fit the vineyard row lines and the center navigation line in real time.
To investigate the influence of operating speed on in-row navigation performance, experiments were performed at three different speeds: 0.4 m/s, 0.6 m/s, and 0.8 m/s. For each speed setting, 100 image frames were randomly sampled to compute heading deviation statistics. Figure 22, Figure 23 and Figure 24 show the heading deviation (in degrees) as a function of travel distance for the three speed conditions. Table 13 summarizes the maximum, minimum, average, and range (maximum–minimum) of heading deviations, providing a more comprehensive view of navigation stability across different speeds. As indicated in Table 13, the average heading deviations at 0.4 m/s, 0.6 m/s, and 0.8 m/s were 3.24°, 5.23°, and 8.12°, respectively, with corresponding ranges of 10.40°, 20.07°, and 12.22°. Among the three speed settings, 0.4 m/s yielded the smallest average deviation, indicating higher directional stability at low speed. At 0.6 m/s, the average deviation remained acceptable, but transient fluctuations were larger, suggesting that temporary heading offsets were more likely under local occlusions or near the end-of-row. At 0.8 m/s, the average deviation increased further, showing that the controller faced greater difficulty tracking visual guidance information as speed increased. Considering both navigation accuracy and operational efficiency, 0.6 m/s was selected as the recommended operational speed for the experimental platform, while 0.4 m/s achieved the best precision and stability from a purely accuracy-oriented perspective.
Overall, the experimental results indicate that low to moderate operating speeds provide a favorable balance between navigation accuracy and system stability for in-row navigation in trellis orchards. In particular, an operating speed of 0.6 m/s achieves satisfactory accuracy while maintaining practical efficiency, making it suitable for in-row cruising and operational tasks.

4. Discussion

This study addresses the navigation requirements of orchard transportation robots in two representative operational scenarios, namely in-row and out-of-row environments. Correspondingly, a vision-based navigation method for in-row operation and a GNSS-based path-following navigation method for out-of-row operation were proposed and experimentally validated. Compared with existing studies, which often focus on a single scenario or a single navigation modality [31], this study provides a systematic evaluation of both navigation modules, enabling environment-aware navigation-mode coordination. Field experiments conducted under different crop growth stages demonstrate that both navigation modules are capable of maintaining stable performance despite significant variations in plant morphology and illumination conditions, highlighting the robustness and environmental adaptability of the proposed system.
In out-of-row navigation scenarios, where orchard corridors are relatively open and occlusion is limited, a GNSS waypoint-based trajectory was adopted for path-following control. Experimental results show that the transportation robot can accurately track predefined trajectories under straight, turning, and Z-shaped path conditions, with average lateral deviations ranging from 0.094 to 0.221 m and steady-state errors reaching the centimeter level after initial path capture. These results indicate that GNSS-based navigation can provide reliable positioning and trajectory tracking performance in relatively open orchard corridors. These findings are consistent with previous studies reporting that GNSS systems generally perform reliably in open orchard environments [5,13].
In in-row navigation scenarios, where dense foliage occlusion and GNSS signal degradation are common, a vision-based navigation strategy was employed. By extracting orchard row structural features, the proposed method enables path constraint generation and autonomous in-row motion. Experimental results indicate that the robot maintains stable trajectories within orchard rows, even in the presence of branch occlusions or sparse feature points near row ends. The fitted navigation lines exhibit good continuity and reliability, with navigation line fitting accuracies exceeding 92% across different growth stages. These results confirm that the proposed vision-based navigation approach can effectively adapt to structural variations and environmental complexity in trellis orchard environments, providing reliable guidance for in-row robot motion. These findings are consistent with prior work highlighting the effectiveness of vision-based perception in GNSS-denied environments [8,11,16,26].
Compared with previous orchard navigation studies, the proposed modular vision–GNSS navigation framework exhibits several distinctive features. Most prior approaches rely on high-density LiDAR or complex multi-sensor fusion architectures [1,3,5,6,7,8,12,13,14,15], which can achieve high localization accuracy but incur high computational and hardware costs. In contrast, our framework uses a lightweight Vision–GNSS combination, achieving reliable in-row and out-of-row performance without requiring dense LiDAR deployment or highly complex sensor fusion, thereby offering a more practical solution for engineering deployment.
Existing GNSS-based methods [5,13] perform well in open orchard corridors but fail under dense canopy occlusion, whereas vision-based methods [8,11,16,26] enable in-row tracking under GNSS-denied conditions but often rely on dense features or high-end cameras. Our approach balances these trade-offs by validating separately the vision-based in-row module and the GNSS-based out-of-row module, providing robust guidance under a variety of growth stages and row structures.
However, the framework does not yet implement automatic mode switching between in-row and out-of-row navigation; thus, unlike hybrid frameworks such as CropNav [7], it cannot autonomously select navigation modes during operation. This limitation highlights an area for future integration of dynamic switching strategies and multi-sensor fusion to achieve end-to-end autonomous navigation.
It should be noted that automatic switching between in-row and out-of-row navigation modes was not implemented within a single experimental workflow. Instead, the focus of this study is on evaluating the performance and stability of each navigation module in its respective scenario. Our previous work has explored navigation-mode switching within an environment-aware vision–GNSS integrated framework [32]. By validating both navigation modules independently, the present study establishes a reliable experimental basis for future implementation of seamless navigation-mode transitions.
Furthermore, the switching mechanism between GNSS and vision-based navigation constitutes a system-level strategy rather than a core algorithmic contribution. The experimental evaluation primarily focuses on overall navigation stability and trajectory accuracy during continuous operation, without quantitatively analyzing the dynamic characteristics of the switching process. In practical orchard operations, navigation-mode switching typically occurs only at row boundaries, which are spatially constrained and relatively predictable. During the conducted experiments, no instability or navigation failure associated with mode transition was observed. Future work may further investigate the dynamic behavior of the switching process to enhance system-level performance.
It is also important to consider the limitations of the current study. The experiments were mainly conducted in structured and well-organized trellis orchard environments. Variations in orchard structure, such as irregular row spacing, heterogeneous canopy density, or uneven terrain, may influence visual feature extraction and navigation accuracy. For example, extremely dense foliage or severe canopy occlusion may reduce the accuracy of in-row line fitting, while irregular row spacing or sloped terrain may introduce path planning deviations. Future work will extend the validation of the proposed method to more diverse orchard environments and investigate multi-sensor fusion and adaptive parameter optimization strategies [23,24,27] to further improve generalization and engineering applicability.
Overall, the proposed “vision-based in-row navigation combined with GNSS-based out-of-row navigation” strategy aligns well with the spatially varying perception conditions encountered in orchard environments. By leveraging the complementary advantages of visual perception and satellite positioning, the proposed approach reduces dependence on a single sensing modality while ensuring stable navigation performance. The experimental results demonstrate that the combined navigation scheme effectively satisfies the navigation requirements of orchard transportation robots operating in both in-row and out-of-row environments, providing a practical foundation for future engineering deployment and multi-scenario applications.

5. Conclusions

This study addresses the autonomous navigation problem of transportation robots operating in complex trellis orchard environments. A combined navigation framework integrating vision-based and GNSS-based methods is proposed and experimentally validated, enabling effective coverage of both in-row and out-of-row operational scenarios. The framework explicitly considers the spatial variability of GNSS signal availability in orchard environments, thereby selecting appropriate navigation strategies for different operational conditions.
Experimental results of in-row visual navigation demonstrate that the front-end object detection module, represented by YOLO-series models, is capable of providing stable and reliable structural feature inputs for orchard row modeling and navigation line generation. Field experiments conducted under different crop growth stages further verify the stability and engineering feasibility of the proposed perception and navigation approach. The results indicate that the visual perception module consistently supports navigation decision-making and motion control during continuous operation, and no control abnormalities caused by detection or inference latency were observed in the conducted experiments.
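Because the visual front-end is designed to be detector-agnostic, the hand-off from any real-time detector to the row-modeling stage reduces to mapping bounding boxes onto candidate corner points along the inner edge of each row. The sketch below illustrates one such mapping; the function name, the image-centerline split, and the inner-bottom-corner choice are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch (assumed, not the authors' code): convert detection
# boxes (x1, y1, x2, y2), given in pixel coordinates with the origin at the
# top-left, into candidate corner points for left/right row-line fitting.

def corner_points(boxes, image_width):
    """Assign each box to the left or right row and pick its inner-bottom corner."""
    mid = image_width / 2.0
    left_pts, right_pts = [], []
    for (x1, y1, x2, y2) in boxes:
        cx = (x1 + x2) / 2.0              # horizontal center of the box
        if cx < mid:                      # left-row structure: inner edge is x2
            left_pts.append((x2, y2))
        else:                             # right-row structure: inner edge is x1
            right_pts.append((x1, y2))
    return left_pts, right_pts
```

For example, with a 640-pixel-wide frame, a trunk box at (10, 50, 60, 200) would contribute (60, 200) to the left row, while one at (500, 40, 560, 210) would contribute (500, 210) to the right row.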
Under out-of-row navigation conditions, the transportation robot maintains an average lateral deviation ranging from 0.093 to 0.221 m along straight, turning, and Z-shaped paths, satisfying the positioning accuracy and motion stability requirements of orchard transportation and transition operations. For in-row visual navigation, when the robot operates at the recommended speed of 0.6 m/s, the average heading deviation is approximately 5.23°, indicating good compatibility and coordination between the vision-based navigation method and the motion control module in real orchard environments.
Regarding in-row navigation implementation, this study combines a geometry-constrained feature corner selection strategy with a RANSAC-based line fitting method that incorporates temporal image frame information, enabling stable extraction of orchard row structures. Experimental results show that the temporal RANSAC method achieves an overall valid fitting rate of 97.24% across all test frames, while the average row-line fitting accuracy reaches 97.97% across different crop growth stages. The center navigation line generated using the angle bisector method achieves an average accuracy exceeding 92% under different crop growth conditions, demonstrating the robustness and environmental adaptability of the proposed method.
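The in-row pipeline summarized above (corner selection, temporal RANSAC row-line fitting, and angle-bisector center-line generation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: row lines are parameterized as y = m·x + b in image coordinates (y increasing downward), row directions are oriented toward the vanishing point, and a three-frame window is pooled before fitting.

```python
# Minimal sketch of the in-row pipeline: temporal pooling of corner points,
# RANSAC row-line fitting, and angle-bisector center-line generation.
# All names and the 3-frame window are assumptions, not the authors' code.
import math
import random
from collections import deque

def ransac_line(points, iters=200, thresh=2.0, seed=0):
    """Fit y = m*x + b by RANSAC, then refit least squares on the consensus set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                      # skip degenerate vertical samples
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        norm = math.hypot(m, 1.0)
        inliers = [(x, y) for (x, y) in points
                   if abs(m * x - y + b) / norm < thresh]
        if len(inliers) > len(best):
            best = inliers
    n = len(best)
    sx = sum(x for x, _ in best); sy = sum(y for _, y in best)
    sxx = sum(x * x for x, _ in best); sxy = sum(x * y for x, y in best)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

frames = deque(maxlen=3)                  # corner points of the last 3 frames

def temporal_ransac(points_this_frame):
    """Pool points over a short temporal window, then fit one row line."""
    frames.append(points_this_frame)
    return ransac_line([p for f in frames for p in f])

def unit_direction(m):
    """Unit direction of y = m*x + b, oriented up the image (toward the vanishing point)."""
    h = math.hypot(1.0, m)
    dx, dy = 1.0 / h, m / h
    return (-dx, -dy) if dy > 0 else (dx, dy)

def center_line(left, right):
    """Angle bisector of two row lines (m, b): returns (intersection, unit direction)."""
    m1, b1 = left
    m2, b2 = right
    xi = (b2 - b1) / (m1 - m2)            # the row lines meet at the vanishing point
    yi = m1 * xi + b1
    d1, d2 = unit_direction(m1), unit_direction(m2)
    dx, dy = d1[0] + d2[0], d1[1] + d2[1]
    n = math.hypot(dx, dy)
    return (xi, yi), (dx / n, dy / n)
```

As a worked check: with a left row y = x and a right row y = −x + 10, the rows meet at (5, 5) and the bisector runs vertically through x = 5, which would serve as the center navigation line.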
Overall, the experimental results demonstrate that the proposed GNSS–vision combined navigation framework can provide stable navigation performance in both in-row and out-of-row operational environments. These findings provide a methodological and experimental foundation for future research on autonomous navigation mode switching, system integration, and large-scale deployment of orchard robotic systems.

Author Contributions

Conceptualization, H.G. and C.G.; methodology, H.G. and Y.W.; software, H.G. and H.L.; validation, H.G. and Y.W.; formal analysis, H.G. and H.L.; investigation, H.G. and Y.W.; resources, T.T., C.G. and T.Z.; writing—original draft preparation, Y.W. and H.G.; writing—review and editing, H.G. and H.L.; visualization, H.G. and H.L.; supervision, T.T. and C.G.; project administration, H.G. and T.Z.; funding acquisition, T.T., Y.W. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-CAE-202302 and CAAS-CAE-202301); the Science and Technology Innovation Project of the National Center for Tropical Agricultural Sciences, Chinese Academy of Tropical Agricultural Sciences (CATAS202503); the Hainan Provincial Philosophy and Social Sciences Planning Project (NO.HNSK(YB)24-18); and the Key Laboratory of Applied Research on Tropical Crop Information Technology of Hainan Province (ZDSYS-KFJJ-202503). The APC was funded by the same funding sources.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to chxgeng@suda.edu.cn.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GNSS: Global Navigation Satellite System
RANSAC: Random Sample Consensus
LiDAR: Light Detection and Ranging
SLAM: Simultaneous Localization and Mapping
IMU: Inertial Measurement Unit
EKF: Extended Kalman Filter
MPC: Model Predictive Control
2D: Two-Dimensional
CNN: Convolutional Neural Network
YOLOv7: You Only Look Once version 7
VIO: Visual-Inertial Odometry
NN: Neural Network
UAV: Unmanned Aerial Vehicle
DC: Direct Current
CUDA: Compute Unified Device Architecture
mAP: Mean Average Precision

References

  1. Li, H.; Huang, K.; Sun, Y.; Lei, X.; Yuan, Q.; Zhang, J.; Lv, X. An autonomous navigation method for orchard mobile robots based on octree 3D point cloud optimization. Front. Plant Sci. 2024, 15, 1510683. [Google Scholar] [CrossRef]
  2. Wu, H.; Wang, X.; Chen, X.; Zhang, Y.; Zhang, Y. Review on Key Technologies for Autonomous Navigation in Field Agricultural Machinery. Agriculture 2025, 15, 1297. [Google Scholar] [CrossRef]
  3. Xia, Y.; Lei, X.; Pan, J.; Chen, L.; Zhang, Z.; Lyu, X. Research on orchard navigation method based on fusion of 3D SLAM and point cloud positioning. Front. Plant Sci. 2023, 14, 1207742. [Google Scholar] [CrossRef] [PubMed]
  4. Jiang, A.; Ahamed, T. Navigation of an Autonomous Spraying Robot for Orchard Operations Using LiDAR for Tree Trunk Detection. Sensors 2023, 23, 4808. [Google Scholar] [CrossRef] [PubMed]
  5. Wang, W.; Qin, J.; Huang, D.; Zhang, F.; Liu, Z.; Wang, Z.; Yang, F. Integrated Navigation Method for Orchard-Dosing Robot Based on LiDAR/IMU/GNSS. Agronomy 2024, 14, 2541. [Google Scholar] [CrossRef]
  6. Li, Q.; Zhu, H. Performance evaluation of 2D LiDAR SLAM algorithms in simulated orchard environments. Comput. Electron. Agric. 2024, 221, 108994. [Google Scholar] [CrossRef]
  7. Gasparino, M.V.; Higuti, V.A.; Sivakumar, A.N.; Velasquez, A.E.; Becker, M.; Chowdhary, G. Cropnav: A framework for autonomous navigation in real farms. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 11824–11830. [Google Scholar]
  8. Jiang, S.; Qi, P.; Han, L.; Liu, L.; Li, Y.; Huang, Z.; Liu, Y.; He, X. Navigation system for orchard spraying robot based on 3D LiDAR SLAM with NDT_ICP point cloud registration. Comput. Electron. Agric. 2024, 220, 108870. [Google Scholar] [CrossRef]
  9. Shen, Y.; Xiao, X.; Liu, H. Real-time localization and mapping method for agricultural robot in orchards based on LiDAR/IMU tight-coupling. Trans. Chin. Soc. Agric. Mach. 2023, 54, 20–28. [Google Scholar]
  10. Su, Z.; Zou, W.; Zhai, S.Q.; Tan, H.; Yang, S.; Qin, X. Design of an Autonomous Orchard Navigation System Based on Multi-Sensor Fusion. Agronomy 2024, 14, 2825. [Google Scholar] [CrossRef]
  11. Xu, X.; Liang, J.; Li, J.; Wu, G.; Duan, J.; Jin, M.; Fu, H. Stereo visual-inertial localization algorithm for orchard robots based on point-line features. Comput. Electron. Agric. 2024, 224, 15. [Google Scholar] [CrossRef]
  12. Jin, P.; Li, T.; Pan, Y.; Hu, K.; Xu, N.; Ying, W.; Jin, Y.; Kang, H. A Context-Aware Navigation Framework for Ground Robots in Horticultural Environments. Sensors 2024, 24, 3663. [Google Scholar] [CrossRef]
  13. Li, Y.; Feng, Q.; Ji, C.; Sun, J.; Sun, Y. GNSS and LiDAR Integrated Navigation Method in Orchards with Intermittent GNSS Dropout. Appl. Sci. 2024, 14, 3231. [Google Scholar] [CrossRef]
  14. Pan, Y.; Hu, K.; Cao, H.; Kang, H.; Wang, X. A novel perception and semantic mapping method for robot autonomy in orchards. Comput. Electron. Agric. 2024, 219, 108769. [Google Scholar] [CrossRef]
  15. Rapado-Rincon, D.; Kootstra, G. Tree-SLAM: Semantic object SLAM for efficient mapping of individual trees in orchards. Smart Agric. Technol. 2025, 12, 101439. [Google Scholar] [CrossRef]
  16. Shi, Z.; Bai, Z.; Yi, K.; Qiu, B.; Dong, X.; Wang, Q.; Jiang, C.; Zhang, X.; Huang, X. Vision and 2D LiDAR Fusion-Based Navigation Line Extraction for Autonomous Agricultural Robots in Dense Pomegranate Orchards. Sensors 2025, 25, 5432. [Google Scholar]
  17. Ma, Z.; Yang, S.; Li, J.; Qi, J. Research on SLAM Localization Algorithm for Orchard Dynamic Vision Based on YOLOD-SLAM2. Agriculture 2024, 14, 1622. [Google Scholar] [CrossRef]
  18. Qu, J.; Gu, Y.; Qiu, Z.; Guo, K.; Zhu, Q. Development of an Orchard Inspection Robot: A ROS-Based LiDAR-SLAM System with Hybrid A*-DWA Navigation. Sensors 2025, 25, 6662. [Google Scholar] [CrossRef]
  19. Wang, Z.; Huang, P.; Wu, X.; Liu, J. Field-validated VIO-MPC fusion for autonomous headland turning in GNSS-denied orchards. Smart Agric. Technol. 2025, 12, 101373. [Google Scholar] [CrossRef]
  20. Usuelli, M.; Rapado-Rincon, D.; Kootstra, G.; Matteucci, M. AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM. arXiv 2025, arXiv:2510.26358. [Google Scholar]
  21. Zhou, H.; Wang, J.; Chen, Y.; Hu, L.; Li, Z.; Xie, F.; He, J.; Wang, P. Neural Network-Based SLAM/GNSS Fusion Localization Algorithm for Agricultural Robots in Orchard GNSS-Degraded or Denied Environments. Agriculture 2025, 15, 1612. [Google Scholar] [CrossRef]
  22. Syed, T.N.; Zhou, J.; Lakhiar, I.A.; Marinello, F.; Gemechu, T.T.; Rottok, L.T.; Jiang, Z. Enhancing Autonomous Orchard Navigation: A Real-Time Convolutional Neural Network-Based Obstacle Classification System for Distinguishing ‘Real’ and ‘Fake’ Obstacles in Agricultural Robotics. Agriculture 2025, 15, 827. [Google Scholar] [CrossRef]
  23. Zhu, X.; Zhao, X.; Liu, J.; Feng, W.; Fan, X. Autonomous Navigation and Obstacle Avoidance for Orchard Spraying Robots: A Sensor-Fusion Approach with ArduPilot, ROS, and EKF. Agronomy 2025, 15, 1373. [Google Scholar] [CrossRef]
  24. Cheng, B.; He, X.; Li, X.; Zhang, N.; Song, W.; Wu, H. Research on positioning and navigation system of greenhouse mobile robot based on multi-sensor fusion. Sensors 2024, 24, 4998. [Google Scholar] [CrossRef] [PubMed]
  25. Xu, S.; Rai, R. Vision-based autonomous navigation stack for tractors operating in peach orchards. Comput. Electron. Agric. 2024, 217, 108558. [Google Scholar] [CrossRef]
  26. Liu, E.; Monica, J.; Gold, K.; Cadle-Davidson, L.; Combs, D.; Jiang, Y. Vision-based Vineyard Navigation Solution with Automatic Annotation. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 4234–4241. [Google Scholar]
  27. Yan, Y.; Zhang, B.; Zhou, J.; Zhang, Y.; Liu, X. Real-Time Localization and Mapping Utilizing Multi-Sensor Fusion and Visual–IMU–Wheel Odometry for Agricultural Robots in Unstructured, Dynamic and GPS-Denied Greenhouse Environments. Agronomy 2022, 12, 1740. [Google Scholar] [CrossRef]
  28. Nazate-Burgos, P.; Torres-Torriti, M.; Aguilera-Marinovic, S.; Arévalo, T.; Huang, S.; Cheein, F.A. Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS. arXiv 2025, arXiv:2505.10847. [Google Scholar] [CrossRef]
  29. Shen, S.; Meng, J. A Review of Autonomous Navigation Technology for Orchard Robots Based on Visual SLAM. Asian Res. J. Agric. 2025, 18, 261–271. [Google Scholar] [CrossRef]
  30. Zheng, S. A Review of Navigation and SLAM Technologies in Orchard Environments. Asian Res. J. Agric. 2025, 18, 13–21. [Google Scholar] [CrossRef]
  31. Jiang, A.; Ahamed, T. Development of an autonomous navigation system for orchard spraying robots integrating a thermal camera and LiDAR using a deep learning algorithm under low- and no-light conditions. Comput. Electron. Agric. 2025, 235, 110359. [Google Scholar] [CrossRef]
  32. Gu, H.; Wang, Y.; Liu, H.; Tian, T.; Geng, C.; Shi, Y. SkySeg-Net: Sky Segmentation-Based Row-Terminal Recognition in Trellised Orchards. Mach. Learn. Knowl. Extr. 2026, 8, 46. [Google Scholar] [CrossRef]
  33. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
Figure 1. Schematic diagram of the navigation hardware system.
Figure 2. Experimental platform of the trellis orchard transportation robot: (a) dual-antenna GNSS module; (b) on-board controller; (c) vehicle-mounted antenna; (d) RealSense camera; (e) industrial computer; (f) motor drive mechanism.
Figure 3. Trellis-style grape orchard.
Figure 4. Navigation paths in trellis orchard application scenario.
Figure 5. Inter-row and intra-row motion workflow in trellis orchard.
Figure 6. Target detection results under different grape growth stages and illumination conditions: (a) sprouting stage; (b) growing stage; (c) ripening stage.
Figure 7. Definition of corner points extracted from object detection bounding boxes.
Figure 8. Definition of corner points extracted from object detection bounding boxes.
Figure 9. Threshold-based geometric constraints applied to candidate feature points in image space.
Figure 10. Trapezoidal projection region induced by perspective effects and the corresponding slope definitions.
Figure 11. Generation of the intra-row center navigation line based on the fitted orchard row lines.
Figure 12. Failure example of vineyard row line fitting using least squares.
Figure 13. Failure example of vineyard row line fitting using RANSAC.
Figure 14. End-of-row vineyard line fitting failures: (a) right-side row line not fitted at row end; (b) left-side row line not fitted at row end; (c) row lines not fitted at row end; (d) overfitting of row lines at row end.
Figure 15. Generation of end-of-row vineyard row lines using temporal frame information: (a) end-of-row image 1 (Frame 764); (b) end-of-row image 2 (Frame 773); (c) end-of-row image 3 (Frame 805); (d) end-of-row image 1 (Frame 764); (e) end-of-row image 2 (Frame 773); (f) end-of-row image 3 (Frame 805).
Figure 16. Generation of end-of-row vineyard row lines using temporal frame information at different growth stages: (a) original image (Budding stage); (b) row line fitting (Budding stage); (c) original image (Growing stage); (d) row line fitting (Growing stage); (e) original image (Maturing stage); (f) row line fitting (Maturing stage).
Figure 17. Navigation line fitting results at different vine growth stages: (a) budding stage; (b) growing stage; (c) maturing stage.
Figure 18. Experimental scene of out-of-row navigation in a trellis orchard.
Figure 19. Lateral deviation along a straight path.
Figure 20. Lateral deviation along a curved path.
Figure 21. Lateral deviation along a Z-shaped path.
Figure 22. Pixel-based heading deviation at 0.4 m/s.
Figure 23. Pixel-based heading deviation at 0.6 m/s.
Figure 24. Pixel-based heading deviation at 0.8 m/s.
Table 1. Comparison of representative orchard robot navigation methods reported in recent studies.

| Ref. | Primary Sensors | Core Method | Application Scenario | System Complexity | Key Characteristics |
|---|---|---|---|---|---|
| [1] | 3D LiDAR | 3D point-cloud mapping | Orchard mapping and navigation | High | Multi-season robust |
| [2] | 3D SLAM + Point Cloud Localization | SLAM-based localization | Orchard localization and navigation | High | Robust under occlusion |
| [4] | LiDAR | Tree-trunk detection | Tree-trunk detection | Medium | Stable in structured orchards |
| [5] | LiDAR + IMU + GNSS | Multi-sensor fusion localization | Orchard fusion navigation | High | Adaptable to GNSS-degraded environments |
| [6] | LiDAR + IMU + GNSS | Integrated navigation | Orchard dosing robot navigation | High | Maintains positioning under GNSS occlusion |
| [7] | Multi-sensor (GNSS + vision + inertial) | Hybrid navigation framework | Real farm autonomous navigation (CropNav) | High | Automatic mode switching under GNSS failure |
| [8] | 3D LiDAR SLAM | NDT-ICP SLAM mapping | Spraying robot navigation | High | Robust to point-cloud variations |
| [9] | LiDAR + IMU | LiDAR-IMU SLAM | Orchard localization and mapping | Medium | Stable under GNSS occlusion |
| [10] | Multi-sensor Fusion | EKF-based localization | Orchard navigation | Medium | Adaptable to diverse environments |
| [11] | Visual–Inertial | Visual-inertial odometry | Orchard localization | Medium | Robust to illumination changes |
| [12] | Semantic Perception | Context-aware navigation | Horticultural robot navigation | High | Strong environmental understanding |
| [13] | GNSS + LiDAR | Sensor fusion localization | Intermittent GNSS environments | High | Robust to GNSS loss |
| [14] | Semantic SLAM | Semantic mapping | Orchard mapping | High | Robust to semantic environment changes |
| [15] | Tree-SLAM | Tree-level mapping | Single-tree mapping | High | Cross-season stability |
| [16] | Vision + 2D LiDAR | Navigation line extraction | Dense orchard navigation | Medium | Robust to occlusion |
| [17] | LiDAR SLAM + Hybrid A* | Path planning and control | Inspection robots | High | Stable in dynamic environments |
| [18] | YOLO + SLAM | Visual SLAM navigation | Visual SLAM navigation | High | Robust to visual environment changes |
| [19] | VIO + MPC | Control-oriented navigation | GNSS-denied steering and path tracking | High | Stable under GNSS loss |
| [20] | Gaussian SLAM | Multi-season SLAM mapping | Multi-season mapping | High | Cross-season stability |
| [21] | NN SLAM + GNSS | Learning-based fusion localization | GNSS-degraded environments | High | Adaptive sensor fusion |
| [22] | CNN-based Visual Recognition | Visual obstacle detection | Inter-row obstacle avoidance | Medium | Dynamic obstacle detection |
| [23] | LiDAR + Vision + IMU + GNSS | Multi-sensor navigation | Spraying robot navigation and obstacle avoidance | High | Robust to multi-source environments |
| [24] | Multi-sensor Fusion | GNSS-denied localization | Greenhouse navigation | Medium | Stable under GNSS-denied conditions |
| [25] | Vision-based Navigation | Row detection | Peach orchard navigation | Medium | Adaptable to row-structure environments |
| [26] | RGB-D Vision | Depth-based navigation | Vineyard navigation | Medium | Stable in inter-row structures |
| [27] | Vision + IMU + Wheel Odometry | Visual-inertial localization | Greenhouse navigation | Medium | Adaptable to dynamic environments |
| [28] | 2D LiDAR SLAM | Tree-structured SLAM | Tree-structured SLAM | Medium | GNSS-independent operation |
| Proposed method | Vision + GNSS | Vision-triggered navigation framework | Trellis orchard transportation navigation | Medium | Verified across multiple growth stages |
Table 2. Trellis orchard transportation robot specifications.

| Parameter | Specification | Parameter | Specification |
|---|---|---|---|
| Weight | 90 kg | Power | 48 V DC brushless motor |
| Dimensions | 1160 × 815 × 620 mm | Operating Temperature | −20 °C to 50 °C |
| Wheelbase | 740 mm | Maximum Climbing Angle | 20° |
| Track Width | 700 mm | Maximum Obstacle Height | 150 mm |
| Wheel Diameter | 300 mm | Maximum Speed | <5 km/h |
| Minimum Turning Radius | On-the-spot turning | Endurance | 4 h |
Table 3. Dataset composition across different grape growth stages (number of images).

| Growth Stage | Noon | Afternoon | Evening |
|---|---|---|---|
| Sprouting stage | 470 | 520 | 510 |
| Growing stage | 519 | 504 | 477 |
| Ripening stage | 500 | 515 | 515 |
Table 4. Model adaptability results under different growth stages.

| Growth Stage | Illumination Condition | mAP (%) |
|---|---|---|
| Germination stage | High illumination (noon) | 93.4 |
| Germination stage | Moderate illumination (afternoon) | 95.7 |
| Germination stage | Low illumination (dusk) | 91.9 |
| Vegetative stage | High illumination (noon) | 87.3 |
| Vegetative stage | Moderate illumination (afternoon) | 93.2 |
| Vegetative stage | Low illumination (dusk) | 89.6 |
| Mature stage | High illumination (noon) | 88.5 |
| Mature stage | Moderate illumination (afternoon) | 92.2 |
| Mature stage | Low illumination (dusk) | 90.6 |
Table 5. Fitting accuracy of vineyard row lines using least squares.

| Row | Row 1 | Row 2 | Row 3 | Row 4 |
|---|---|---|---|---|
| Number of images | 470 | 510 | 490 | 485 |
| Correct fits | 382 | 389 | 410 | 389 |
| Accuracy (%) | 81.3 | 76.3 | 83.7 | 80.2 |
Table 6. Accuracy of vineyard row line fitting using RANSAC.

| Row | Row 1 | Row 2 | Row 3 | Row 4 |
|---|---|---|---|---|
| Number of images | 470 | 510 | 490 | 485 |
| Correct fits | 434 | 456 | 448 | 454 |
| Accuracy (%) | 92.3 | 89.6 | 91.4 | 93.5 |
Table 7. Accuracy of vineyard row line fitting using temporal RANSAC.

| Row | Row 1 | Row 2 | Row 3 | Row 4 |
|---|---|---|---|---|
| Number of images | 470 | 510 | 490 | 485 |
| Correct fits | 453 | 498 | 473 | 477 |
| Accuracy (%) | 96.2 | 97.6 | 98.5 | 98.2 |
Table 8. Overall valid fitting rate comparison of three row line fitting methods.

| Method | Valid Fitted Frames | Total Frames | Overall Valid Rate (%) | 95% CI (%) | Relative Improvement vs. LS (pct. points) |
|---|---|---|---|---|---|
| Least Squares | 1570 | 1955 | 80.31 | 78.49–82.01 | |
| RANSAC | 1792 | 1955 | 91.66 | 90.35–92.81 | +11.35 |
| Temporal RANSAC | 1901 | 1955 | 97.24 | 96.41–97.88 | +16.93 |
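The text does not state how the 95% confidence intervals above were computed; the reported bounds are reproduced by the Wilson score interval for a binomial proportion, sketched below under that assumption (the function name is ours, not from the paper).

```python
import math

def wilson_ci(k, n, z=1.96):
    """Two-sided Wilson score interval (in percent) for k successes in n trials."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return 100.0 * (center - half), 100.0 * (center + half)
```

For the least-squares row, `wilson_ci(1570, 1955)` evaluates to approximately (78.49, 82.01), matching the tabulated interval; the RANSAC and temporal RANSAC rows match in the same way.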
Table 9. Accuracy of vineyard row line fitting across growth stages (temporal RANSAC).

| Row | Budburst Stage (%) | Growth Stage (%) | Maturity Stage (%) |
|---|---|---|---|
| Row 1 | 98.5 | 97.6 | 96.2 |
| Row 2 | 98.0 | 98.1 | 97.6 |
| Row 3 | 97.2 | 97.9 | 98.5 |
| Row 4 | 99.1 | 98.7 | 98.2 |
Table 10. Navigation line fitting accuracy at different growth stages.

| Growth Stage | Row | Images | Correct (Frames) | Accuracy (%) | Mean Accuracy (%) |
|---|---|---|---|---|---|
| Budding stage | 1 | 470 | 435 | 92.6 | 94.35 |
| | 2 | 510 | 488 | 95.7 | |
| | 3 | 490 | 462 | 94.3 | |
| | 4 | 485 | 460 | 94.8 | |
| Growing stage | 1 | 470 | 425 | 90.5 | 92.68 |
| | 2 | 510 | 480 | 94.1 | |
| | 3 | 490 | 457 | 93.3 | |
| | 4 | 485 | 450 | 92.8 | |
| Maturing stage | 1 | 470 | 429 | 91.2 | 92.23 |
| | 2 | 510 | 478 | 93.7 | |
| | 3 | 490 | 452 | 92.3 | |
| | 4 | 485 | 445 | 91.7 | |
Table 11. Overall navigation line fitting accuracy across growth stages.

| Growth Stage | Correct (Frames) | Total Frames | Overall Accuracy (%) | 95% CI (%) |
|---|---|---|---|---|
| Budding stage | 1845 | 1955 | 94.37 | 93.26–95.31 |
| Growing stage | 1812 | 1955 | 92.69 | 91.45–93.76 |
| Maturing stage | 1804 | 1955 | 92.28 | 91.01–93.38 |
Table 12. Statistical results of lateral deviation under different paths.

| Path Type | Trial | Maximum Deviation (m) | Minimum Deviation (m) | Average Deviation (m) | Mean of 3 Trials (m) | Standard Deviation (m) | Coefficient of Variation (%) |
|---|---|---|---|---|---|---|---|
| Straight path | 1 | 1.128 | 0.023 | 0.061 | 0.094 | 0.032 | 34.04 |
| | 2 | 0.926 | 0.040 | 0.096 | | | |
| | 3 | 1.322 | 0.043 | 0.124 | | | |
| Curved path | 1 | 1.035 | 0.038 | 0.182 | 0.164 | 0.019 | 11.59 |
| | 2 | 1.564 | 0.032 | 0.164 | | | |
| | 3 | 1.256 | 0.043 | 0.145 | | | |
| Z-shaped path | 1 | 1.416 | 0.070 | 0.214 | 0.221 | 0.040 | 18.10 |
| | 2 | 1.680 | 0.021 | 0.263 | | | |
| | 3 | 1.347 | 0.055 | 0.185 | | | |
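Assuming the summary columns of Table 12 are derived from the three per-trial average deviations, with a sample (n−1) standard deviation and the coefficient of variation taken over the rounded mean and standard deviation (which is how the tabulated 34.04% arises: 100 × 0.032 / 0.094), the straight-path row can be reproduced with the short sketch below; the helper name is illustrative.

```python
import math

def trial_stats(avgs):
    """Mean, sample (n-1) standard deviation, and CV (%) of per-trial averages."""
    n = len(avgs)
    mean = sum(avgs) / n
    var = sum((a - mean) ** 2 for a in avgs) / (n - 1)  # sample variance
    sd = math.sqrt(var)
    return mean, sd, 100.0 * sd / mean

# Straight-path per-trial average deviations from Table 12:
mean, sd, cv = trial_stats([0.061, 0.096, 0.124])
```

Rounding to table precision gives a mean of 0.094 m and a standard deviation of 0.032 m; the same computation on the curved-path averages yields 0.164 m and 0.019 m.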
Table 13. Heading deviation statistics at different speeds.

| Speed (m·s−1) | Maximum Heading Deviation (°) | Minimum Heading Deviation (°) | Average Heading Deviation (°) | Range (°) |
|---|---|---|---|---|
| 0.4 | 10.52 | 0.12 | 3.24 | 10.40 |
| 0.6 | 20.30 | 0.23 | 5.23 | 20.07 |
| 0.8 | 12.60 | 0.38 | 8.12 | 12.22 |

Liu, H.; Gu, H.; Wang, Y.; Zhong, T.; Tian, T.; Geng, C. A GNSS–Vision Integrated Autonomous Navigation System for Trellis Orchard Transportation Robots. AI 2026, 7, 125. https://doi.org/10.3390/ai7040125
