3.1. Principles of the RTAB-Map Algorithm and Its Improvements
RTAB-Map is a graph-based SLAM method widely used in RGB-D SLAM, employing depth data to build 3D point cloud maps for robot navigation. To adapt it to orchard environments—which contain numerous low-hanging, traversable branches and foliage—we enhance the conventional RTAB-Map framework in two ways. First, we integrate ORB-SLAM3 to supply high-precision pose estimates in place of RTAB-Map’s native visual odometry, improving estimation robustness and accuracy. Second, we apply a height filtering strategy at the point cloud level to produce navigation-tailored maps that eliminate unnecessary overhead obstacles. The overall improved architecture is illustrated in
Figure 1.
The inputs to the improved RTAB-Map algorithm comprise two primary components: depth camera data and external visual odometry data. Depth camera data is provided in RGB-D format, while the external visual odometry is supplied by the ORB-SLAM3 system, with its pose information published via the nav_msgs/Odometry message type. The coordinate frames of all sensors are unified with that of the robot chassis using the ROS transform (TF) mechanism. A detailed temporal flow of the entire system is illustrated in
Figure 2, showing the data exchange between ROS nodes and the sequential processing from input synchronisation to planning. The system TF tree is shown in
Figure 3.
As pose data and image data are input into the system asynchronously via different topics, timestamp alignment processing is required. The data synchronisation process is illustrated in
Figure 4. The rgbd_sync node is employed to precisely synchronise the colour and depth images from the RGB-D camera, ensuring they share identical timestamps. The aligned RGB-D data is then approximately synchronised with the pose information published by ORB-SLAM3 to ensure consistency between environmental perception and pose estimation data. For synchronising the RGB-D images (rgbd_sync) and the external odometry data, the message_filters::ApproximateTime policy was employed. This policy matches messages from different topics whose timestamps fall within a defined temporal tolerance (slop). A queue size of 10 messages per topic was used to buffer incoming data while searching for timestamp matches across topics. The slop parameter was set to 0.05 s, defining the maximum allowable time difference between timestamps for messages to be considered synchronised.
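This approximate-time matching rule can be illustrated without ROS. The sketch below, in plain Python, pairs (timestamp, payload) tuples from two streams whenever their timestamps differ by at most the slop. It is a simplified, greedy stand-in for the pivot-based search that message_filters actually performs, and the message tuples are hypothetical placeholders for ROS messages.

```python
def approx_sync(msgs_a, msgs_b, slop=0.05):
    """Greedily pair (timestamp, payload) tuples from two streams
    whose timestamps differ by at most `slop` seconds.
    Simplified stand-in for message_filters' ApproximateTime policy."""
    pairs = []
    j = 0
    for t_a, a in msgs_a:
        # Skip messages in stream B that lag behind the tolerance window.
        while j < len(msgs_b) and msgs_b[j][0] < t_a - slop:
            j += 1
        if j < len(msgs_b) and abs(msgs_b[j][0] - t_a) <= slop:
            pairs.append((a, msgs_b[j][1]))
            j += 1  # consume the matched message
    return pairs

# Hypothetical RGB-D frames at 10 Hz and sparser odometry messages.
rgbd = [(0.00, "rgbd0"), (0.10, "rgbd1"), (0.20, "rgbd2")]
odom = [(0.02, "odom0"), (0.30, "odom3")]
print(approx_sync(rgbd, odom))  # [('rgbd0', 'odom0')]
```

Only the first RGB-D frame finds an odometry message within 0.05 s; the others are dropped rather than paired with a stale pose, which mirrors the intent of the slop tolerance.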
Synchronised sensor data is stored within the Short-Term Memory (STM) module, which creates a node for each frame of data and stores it.
The external visual odometry system provides core pose estimation information for constructing local maps. Based on the node pose information provided by ORB-SLAM3, the local maps are stitched together to form a global map. In this study, ORB-SLAM3 provides only visual odometry functionality, without performing its original loop closure detection and global optimisation. This design prevents conflicts with RTAB-Map’s backend graph-optimisation module and ensures that the odometry-generated pose sequence remains continuous and smooth. This facilitates the proper functioning of backend loop closure detection and graph optimisation.
The workflow of ORB-SLAM3, employed as an external visual odometry module, consists of the following key stages. First, the system receives colour and depth image inputs from the RGB-D camera and extracts ORB feature points from each frame. The ORB feature comprises the FAST corner detector and the BRIEF descriptor, exhibiting both rotation and scale invariance. It is well suited to robust feature detection in orchard environments characterised by varying illumination and repetitive textures. During the feature matching phase, ORB-SLAM3 performs nearest-neighbour matching on ORB descriptors from adjacent frames and resolves ambiguities using Lowe's ratio test. The determination criterion for a reliable feature match is given in Equation (1):

\[ \frac{d_1}{d_2} < \tau \tag{1} \]

where $d_1$ and $d_2$ denote the Hamming distances between a feature point and its nearest neighbour and second-nearest neighbour, respectively. When the ratio is less than the threshold $\tau$, the match is deemed reliable. During the pose estimation phase, ORB-SLAM3 establishes 3D–2D correspondences based on matched point pairs and solves the camera pose using PnP (Perspective-n-Point) and RANSAC, as formulated in Equation (2):
\[ T^{*} = \arg\min_{T} \sum_{i} \rho\!\left( \lVert u_i - \pi(T, P_i) \rVert^{2} \right) \tag{2} \]

Here, $P_i$ denotes a three-dimensional point, $u_i$ represents its observed pixel in the image, $\pi(\cdot)$ is the camera projection function, and $\rho(\cdot)$ is the robust kernel function (used to suppress the influence of anomalous matching points). Upon obtaining the initial pose, ORB-SLAM3 further employs nonlinear optimisation (Levenberg–Marquardt, LM) to minimise reprojection error, thereby achieving a higher-precision pose. To enhance the stability of short-term tracking, ORB-SLAM3 performs local bundle adjustment (BA) within the local keyframe window, defined in Equation (3):
\[ \min_{\{T_j\},\,\{P_k\}} \sum_{(j,k) \in \Omega} \rho\!\left( \lVert u_{jk} - \pi(T_j, P_k) \rVert^{2} \right) \tag{3} \]

where $T_j$ denotes the camera pose at frame $j$, $P_k$ represents a point in the 3D map, and $\Omega$ indicates the set of observation correspondences. In this study, the loop closure detection module of ORB-SLAM3 was disabled, and its candidate-frame retrieval functionality based on the DBoW2 visual bag-of-words no longer participated in pose estimation. This ensured that only smooth, continuous visual odometry results were published to the /odom topic in ROS.
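As an illustration of the ratio test in Equation (1), the plain-Python sketch below computes Hamming distances between toy 8-bit binary descriptors (stand-ins for the 256-bit descriptors ORB actually uses) and accepts a match only when the nearest/second-nearest distance ratio falls below the threshold. The threshold value 0.7 is illustrative, not the value used in this study.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors packed as integers."""
    return bin(a ^ b).count("1")

def ratio_test_match(query, candidates, tau=0.7):
    """Return the index of the best-matching candidate, or None if the
    nearest/second-nearest distance ratio fails Lowe's test.
    Assumes at least two candidates."""
    dists = sorted((hamming(query, c), i) for i, c in enumerate(candidates))
    (d1, best), (d2, _) = dists[0], dists[1]
    if d2 > 0 and d1 / d2 < tau:
        return best
    return None

# Toy descriptors: candidate 0 is 1 bit away, candidate 1 is 8 bits away,
# so the ratio 1/8 passes the test and the match is accepted.
print(ratio_test_match(0b10110100, [0b10110101, 0b01001011]))  # 0
```

When two candidates are nearly equidistant the ratio approaches 1 and the match is rejected, which is exactly the ambiguity the test is designed to filter out in repetitive orchard textures.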
Ultimately, the odometry messages issued by ORB-SLAM3 are passed as external input to RTAB-Map, replacing its internal RGB-D odometry algorithm. The accuracy and robustness of front-end pose estimation have been significantly enhanced, whilst loop closure detection and global map optimisation continue to be performed by the RTAB-Map backend. The system thus achieves a balance between high-precision odometry and the retention of dense mapping capabilities alongside global consistency optimisation. ORB-SLAM3 serves as an external odometry framework, as illustrated in
Figure 5. The key parameters for ORB-SLAM3 as a visual odometry system are listed in Appendix A.1.
Beyond odometry, loop closure detection and global optimisation of the map are also required. Within the RTAB-Map algorithm, loop closure detection is achieved through a visual bag-of-words model and a Bayesian filter. The visual bag-of-words model updates the weight of a location node by calculating the similarity between the current pose node and the working memory (WM) nodes, thereby determining whether a node participates in loop closure detection or is transferred to the long-term memory (LTM) module. In this approach, the backend optimiser of RTAB-Map performs loop closure correction and global optimisation based on the precise initial poses provided by ORB-SLAM3, ultimately outputting a high-quality navigation map that is globally consistent and free from invalid high-altitude obstacles.
Within this integrated system, although the front-end pose is provided by the high-precision ORB-SLAM3, significantly reducing cumulative error, back-end global optimisation remains crucial for ensuring map consistency in large-scale environments. The loop closure detection and optimisation module of RTAB-Map is responsible for executing this task. Loop closure detection is achieved through the collaborative operation of a visual bag-of-words model and a Bayesian filter. The bag-of-words model converts images into vectors of visual words, rapidly screening loop closure candidate frames by comparing vector similarity. For the current pose node $z_t$ and a historical node $z_c$ in the working memory (WM), the similarity $s(z_t, z_c)$ between their bag-of-words vectors is computed using Equation (4):

\[ s(z_t, z_c) = \frac{N_{\mathrm{pair}}}{\max(N_{z_t}, N_{z_c})} \tag{4} \]

where $N_{z_t}$ and $N_{z_c}$ denote the total number of visual words in the current node and the candidate node, respectively, and $N_{\mathrm{pair}}$ represents the number of matched word pairs. Normalising by the larger word count ensures the fairness and validity of the matching process.
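A minimal sketch of this similarity measure in plain Python follows; the integer word IDs are hypothetical, and a real RTAB-Map pipeline would first quantise ORB descriptors into a visual vocabulary.

```python
def bow_similarity(words_t, words_c):
    """Similarity between two nodes' visual-word sets, per Equation (4):
    matched word pairs normalised by the larger word count.
    Assumes both word lists are non-empty."""
    n_pair = len(set(words_t) & set(words_c))
    return n_pair / max(len(words_t), len(words_c))

# Hypothetical visual-word IDs observed at two locations: 3 shared
# words out of a maximum of 5 gives a similarity of 0.6.
print(bow_similarity([1, 2, 3, 4], [2, 3, 4, 5, 6]))  # 0.6
```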
However, visual similarity alone is susceptible to perceptual aliasing (such as repetitive visual structures within an orchard). Consequently, the system employs a Bayesian filter to estimate the probability that a loop closure exists, thereby determining the most probable loop closure node. Let $S_t$ denote the state over all candidate loop closure nodes at time $t$. Its probability distribution $p(S_t \mid L^t)$, where $L^t$ is the sequence of observations up to time $t$, is updated via Bayes' theorem, as in Equation (5):

\[ p(S_t \mid L^t) = \eta \, p(L_t \mid S_t) \sum_{i} p(S_t \mid S_{t-1} = i)\, p(S_{t-1} = i \mid L^{t-1}) \tag{5} \]

where $\eta$ denotes the normalisation constant, $p(L_t \mid S_t)$ represents the observation likelihood computed via Equation (4), and $p(S_t \mid S_{t-1})$ signifies the state transition probability. Through continuous updating, the candidate node with the highest probability is confirmed as the correct loop closure. Upon successful detection of a loop closure, this constraint is incorporated into the pose graph. The system invokes an optimiser such as g2o to perform global optimisation on all pose nodes within the graph (initialised with the high-precision poses provided by ORB-SLAM3). This effectively corrects any residual minor drift from the front end, yielding a globally consistent optimal pose estimate. Upon completion of optimisation, the locally mapped areas associated with each node are fused, ultimately generating a globally consistent map for navigation purposes.
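One step of the recursive update in Equation (5) can be sketched in a few lines of plain Python. The transition model and likelihood values below are hypothetical placeholders, not RTAB-Map's actual models.

```python
def bayes_update(prior, likelihood, transition):
    """One step of the discrete Bayesian filter in Equation (5).
    prior[i]        : p(S_{t-1}=i | L^{t-1})
    likelihood[j]   : p(L_t | S_t=j)
    transition[i][j]: p(S_t=j | S_{t-1}=i)
    Returns the normalised posterior p(S_t | L^t)."""
    n = len(prior)
    # Prediction: propagate the prior through the transition model.
    predicted = [sum(transition[i][j] * prior[i] for i in range(n))
                 for j in range(n)]
    # Correction: weight by the likelihood, then normalise (the eta term).
    unnorm = [likelihood[j] * predicted[j] for j in range(n)]
    eta = 1.0 / sum(unnorm)
    return [eta * u for u in unnorm]

# Two candidate loop nodes with a uniform prior; candidate 1 has a much
# higher observation likelihood (e.g. a larger Equation (4) similarity),
# so the posterior concentrates on it.
post = bayes_update([0.5, 0.5], [0.1, 0.9], [[0.9, 0.1], [0.1, 0.9]])
```

Repeating the update over successive frames is what lets the filter reject one-off perceptual aliasing: a single high-similarity frame shifts the posterior only partially, whereas consistent evidence drives it toward confirmation.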
3.2. Post-Processing Raster Map Optimisation Methods for Orchard Scenes
Autonomous navigation within orchard environments frequently encounters challenges in practical application due to low-hanging branches, shrubbery, and irregular terrain, leading to the emergence of 'false obstacles' during robot mapping and path planning. Conventional vision-based mapping methods, when generating two-dimensional occupancy grid maps, often misinterpret low-level branch point clouds as obstacles capable of impeding robot movement, consequently triggering path detours or planning failures. To address this issue, this paper proposes a post-processing method based on point cloud height filtering, thereby enhancing the robustness and feasibility of robotic operations within orchard environments. The method optimises the rasterisation parameters within the database file (rtabmap.db) generated after RTAB-Map mapping, thereby achieving precise modelling of passable terrain. Considering the typical weed height in orchards, the overall height of the orchard navigation robot (H = 42.5 cm), and potential future configurations such as mounting robotic arms, this study removes point cloud data within the height range 0.2 m ≤ z ≤ 0.6 m. The specific procedure is as follows: configure the key parameters of the Grid module within the .db file: Grid/MaxGroundHeight: 0.6; Grid/MinGroundHeight: 0.2. The key parameters for RTAB-Map and the modifications to key parameters within the .db file are shown in
Appendix A.2.
The three-dimensional point cloud of low-hanging branches in the simulated orchard scene is shown in
Figure 6. The red box indicates branches that may not impede the navigation of orchard robots within the scene, yet their projections onto the two-dimensional grid map are classified as obstacles. As shown in
Figure 7, the point cloud undergoes height filtering to specifically eliminate interference from low-hanging branches during navigation. Only key navigational information, such as tree trunks, ground surfaces, and low-lying obstacles, is retained.
The processed point cloud data is converted into the robot coordinate system via the TF tree, generating the corresponding two-dimensional raster map. This is further optimised within the ROS Navigation framework to refine the cost map. Finally, stable autonomous navigation within complex orchard environments is achieved through the combined application of A* and TEB path planning algorithms. Mathematically, this process can be defined as Formula (6):
\[ P_f = \{\, p \in P \mid z_p < 0.2\ \text{m} \ \text{or} \ z_p > 0.6\ \text{m} \,\} \tag{6} \]

Here, $P$ denotes the original point cloud set, $P_f$ represents the filtered point cloud, $z_p$ is the height of point $p$ in the robot chassis frame, and $P_f \subseteq P$. Through this approach, low-hanging branches and foliage within the 0.2–0.6 m height range are effectively removed, whilst obstacles above ground level that genuinely impede the robot's passage remain preserved. The filtered occupancy grid map is thus rendered more concise, providing a more accurate representation of the environment for subsequent path planning.
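The band-removal rule of Equation (6) amounts to a one-line filter over the point cloud. A plain-Python sketch follows, with points as (x, y, z) tuples; a real pipeline would operate on PCL or sensor_msgs/PointCloud2 data rather than Python lists.

```python
def height_filter(points, z_min=0.2, z_max=0.6):
    """Drop points whose height falls inside [z_min, z_max],
    mirroring Equation (6); all other points are kept."""
    return [p for p in points if not (z_min <= p[2] <= z_max)]

cloud = [(1.0, 0.0, 0.05),   # ground-level point: kept
         (1.2, 0.1, 0.40),   # low-hanging branch in the band: removed
         (1.1, 0.2, 1.50)]   # trunk/canopy point above the band: kept
print(height_filter(cloud))  # [(1.0, 0.0, 0.05), (1.1, 0.2, 1.5)]
```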
In the realm of navigation path planning, this study employs a combined approach utilising the global path planning algorithm A* alongside the local path planning method TEB (Timed Elastic Band). The global A* path planning cost function is expressed in Equation (7).
\[ f(n) = g(n) + h(n) \tag{7} \]

where $g(n)$ denotes the cumulative cost from the starting point to the current node $n$, and $h(n)$ represents the heuristic estimate from the current node to the destination, typically taken as the Euclidean distance. The A* algorithm guarantees the connectivity and optimality of the global path, providing constraints for the robot's overall motion direction. However, A* alone is insufficient for handling dynamic obstacles or accommodating the robot's kinematic constraints. Therefore, this paper further introduces the TEB algorithm as a local planner. TEB decomposes the global path into a series of time-constrained trajectory bands. Through nonlinear optimisation, it simultaneously considers trajectory smoothness, dynamic feasibility, and obstacle avoidance constraints. The TEB optimisation objective is formulated in Equation (8), balancing trajectory smoothness, dynamic feasibility, and obstacle avoidance cost.
\[ \min_{\{x_i\}} \sum_{i} \lVert x_{i+1} - x_i \rVert^{2} + \lambda \sum_{i} c_{\mathrm{obs}}(x_i) \tag{8} \]

where $x_i$ denotes a trajectory point, the first term constrains trajectory continuity, the second term maintains obstacle avoidance via the cost map, and $\lambda$ serves as the balancing coefficient. During optimisation, TEB dynamically adjusts its path based on real-time sensor inputs, thereby achieving flexible obstacle avoidance in semi-structured environments such as orchards. By integrating a hierarchical strategy combining global A* and local TEB, the robot maintains overall path optimality while possessing the capability to avoid low-hanging branches and root obstacles in real time.
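As a concrete illustration of Equations (7) and (8), the self-contained sketch below runs A* on a toy occupancy grid and evaluates the simplified TEB objective for two candidate local trajectories. The grid, the cost map, and the λ value are hypothetical; this is plain Python, not the ROS global_planner or teb_local_planner implementations, which optimise many additional terms (velocity, acceleration, kinematic, and time constraints) with g2o.

```python
import heapq, math

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle),
    expanding nodes in order of f(n) = g(n) + h(n), as in Equation (7);
    h(n) is the Euclidean distance to the goal."""
    h = lambda n: math.dist(n, goal)
    tie = 0                       # tiebreaker so heap entries never compare nodes
    open_set = [(h(start), tie, start)]
    g_cost, parent = {start: 0.0}, {start: None}
    while open_set:
        _, _, node = heapq.heappop(open_set)
        if node == goal:          # reconstruct the path back to the start
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                    and grid[nb[0]][nb[1]] == 0):
                ng = g_cost[node] + 1.0          # unit cost per grid step
                if ng < g_cost.get(nb, float("inf")):
                    g_cost[nb], parent[nb] = ng, node
                    tie += 1
                    heapq.heappush(open_set, (ng + h(nb), tie, nb))
    return None                   # goal unreachable

def teb_objective(traj, cost_map, lam=1.0):
    """Simplified TEB objective, Equation (8): squared step lengths
    (smoothness term) plus lam times the summed cost-map penalty."""
    smooth = sum((x2 - x1) ** 2 + (y2 - y1) ** 2
                 for (x1, y1), (x2, y2) in zip(traj, traj[1:]))
    return smooth + lam * sum(cost_map(x, y) for x, y in traj)

# Toy occupancy grid: the shortest route must skirt the obstacle row.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))

# Hypothetical cost map peaking at an obstacle located at (1.0, 0.0).
cost_map = lambda x, y: max(0.0, 1.0 - ((x - 1.0) ** 2 + y ** 2) ** 0.5)
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # drives through the obstacle
detour = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]     # bends around it
assert teb_objective(detour, cost_map, lam=2.0) < teb_objective(straight, cost_map, lam=2.0)
```

The final assertion shows the trade-off Equation (8) encodes: the detour pays a slightly higher smoothness cost but a much lower obstacle cost, so for a sufficiently large λ the optimiser prefers the bent trajectory, which is how TEB deforms a path around low-hanging obstructions.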
The aforementioned strategy ensures the accuracy of environmental representation through processed maps, while cost map optimisation enhances parameter consistency between mapping and navigation phases. The integration of A* and TEB balances global optimality with local dynamic feasibility, thereby enabling the robot to maintain stable autonomous navigation capabilities within complex orchard environments. The principal parameters used for the local TEB planner and costmap configuration are summarised in
Appendix A.3. These parameters define the motion limits, obstacle avoidance constraints, and Ackermann kinematic restrictions of the robot, ensuring dynamic feasibility and safe navigation in the orchard environment.