1. Introduction
Navigating unstructured environments is a core challenge for autonomous all-terrain robots [1,2,3,4,5]. Systems operating in agricultural fields, construction sites, and rugged terrains must cope with uneven surfaces, dynamic obstacles, and fluctuating environmental conditions. However, current simulation frameworks seldom integrate these complexities, constraining the development and testing of robust navigation algorithms. Most existing tools focus on structured indoor settings [6,7,8], manipulation tasks [9,10], drone operations [11,12], or support only the older Gym interface [13]. To address these gaps, we introduce DUnE, an open-source framework that implements the Gymnasium interface [14,15] within ROS 2 Humble [16] and Gazebo Fortress [17], complete with automated metric logging and dynamic obstacle generation. Although DUnE can be used to simulate any navigation system, it was primarily designed for the evaluation of reinforcement learning algorithms. Our platform includes models of three off-road robotic systems widely used in research and industry: the Rover Robotics Rover Zero [18], the open-source quadruped HyperDog [19], and the FictionLab Leo Rover [20]. It also provides multiple open-source environments representing agricultural landscapes, industrial inspection zones, and construction sites (Figure 1).
Table 1 compares our toolkit with the ROS-PyBullet [21] framework, the gym-gazebo2 [9] toolkit, and iGibson [6].
Reinforcement learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment to maximize cumulative rewards [22]. The agent explores actions and observes their consequences, using algorithms to balance exploration (trying new actions) and exploitation (choosing the best-known actions). The Gymnasium RL interface is a standardized API for developing and benchmarking reinforcement learning environments, enabling agents to interact with environments through consistent methods like reset, step, and render. ROS 2 (Robot Operating System 2) is an open-source robotics framework designed for modular and scalable development of robotic applications. It provides tools for node-based communication, hardware abstraction, and real-time system capabilities. ROS 2 supports distributed computing, making it ideal for complex multi-robot systems and advanced robotic tasks. Gazebo is a 3D simulation platform used for designing and testing robotic systems in realistic environments. It includes advanced physics engines, sensor modeling, and dynamic object interactions, allowing for accurate performance testing. Seamlessly integrated with ROS 2, Gazebo excels in training and deploying ROS 2-based robot platforms, making it a top choice for robotics simulation [23].
A key contribution of DUnE is the automated capturing, logging, and graphing of comprehensive performance metrics, including success rate (SR), total collisions (TC), mean time to traverse (MTT), traverse rate (TR), and velocity over rough terrain (VORT). These metrics provide standardized benchmarks for evaluating and comparing the performance of different navigation algorithms and robotic platforms. Furthermore, DUnE has the built-in capability for semi-automated insertion of dynamic obstacles [24], enabling the creation of truly dynamic environments. This feature allows researchers to simulate scenarios with moving obstacles, such as humans or vehicles, to test and refine obstacle avoidance strategies effectively. Additionally, we validate the effectiveness of our framework by implementing a baseline Soft Actor Critic (SAC) agent for the pointgoal navigation task [25]. The agent is tested across various challenging terrains, demonstrating the framework's capability to facilitate the development and assessment of reinforcement learning agents in complex unstructured environments.
Our contributions bridge significant gaps in existing robotics simulation toolkits by providing the following:
Gymnasium interface: A custom Gymnasium environment that interfaces with ROS 2 and Gazebo.
Multiple unstructured environments: Integration of various unstructured environments which include agriculture, construction, and back country.
Dynamic obstacle insertion: A service which provides trajectory generation and runtime insertion of dynamic obstacles.
Automated metric logging: Comprehensive performance evaluation through standardized metric logging and visualization.
By offering an integrated framework (Figure 2), we aim to accelerate research and development in off-road robotic navigation, providing a robust foundation for testing and deploying autonomous systems in unstructured terrain.
2. Method
This section details the development, integration, and validation methodologies employed in creating our versatile software framework for off-road mobile robotics research. The framework integrates the Gymnasium interface with ROS 2 Humble and Gazebo Fortress, supporting multiple robot platforms and dynamic unstructured environments. It also incorporates a baseline Soft Actor Critic (SAC) agent for pointgoal navigation. The software implementation is organized as follows:
- launch/: ROS 2 launch files for starting simulations and nodes.
- scripts/: Utility scripts for metric logging, dynamic obstacle control, and simulation management.
The entire codebase is open-source and available at https://github.com/jackvice/RoboTerrain (accessed on 15 March 2025). It is released under the MIT License, allowing for widespread use and modification by the research community. During the preparation of this article, the authors used ChatGPT o1 for the purposes of grammar correction and paragraph structuring.
2.1. Integration of Gymnasium Interface with ROS 2 and Gazebo
To bridge the gap between the RL algorithms and the robotic simulation, we developed a custom Gymnasium environment that interfaces with ROS 2 and Gazebo. We defined a Gym.Env subclass that encapsulates the observation and action spaces corresponding to the robot’s sensor inputs and control commands. Within this environment, ROS 2 publishers and subscribers were developed to send control commands to the robot and receive sensor data back via the Gymnasium environment functions.
We implemented Gazebo and ROS 2 services for managing dynamic obstacles and resetting the simulation environment between agent episodes. To maintain consistency during training episodes, we ensured time synchronization between the Gym environment and the Gazebo simulation. This integration enables the RL agent to interact with the simulated robot in near real time, receiving sensor observations and sending control actions within each simulation step.
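To make this architecture concrete, the following is a minimal sketch of such a Gymnasium-to-ROS 2 bridge. The class name, topic names, and observation/action shapes are illustrative simplifications, not the exact definitions in the RoboTerrain repository.

```python
# Minimal sketch of a Gymnasium environment bridged to ROS 2 / Gazebo.
import gymnasium as gym
import numpy as np
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan


class RoverEnvSketch(gym.Env):
    def __init__(self):
        rclpy.init()
        self.node = Node("rover_env")
        # Action: linear velocity and heading command (normalized).
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,))
        # Observation: downsampled 32-point LiDAR scan (see Section 2.6).
        self.observation_space = gym.spaces.Box(low=0.0, high=12.0, shape=(32,))
        self.cmd_pub = self.node.create_publisher(Twist, "/cmd_vel", 10)
        self.scan = np.full(32, 12.0, dtype=np.float32)
        self.node.create_subscription(LaserScan, "/scan", self._on_scan, 10)

    def _on_scan(self, msg):
        ranges = np.clip(np.asarray(msg.ranges, dtype=np.float32), 0.0, 12.0)
        # Min-pool the raw scan down to 32 sectors.
        self.scan = ranges.reshape(32, -1).min(axis=1)

    def step(self, action):
        cmd = Twist()
        cmd.linear.x, cmd.angular.z = float(action[0]), float(action[1])
        self.cmd_pub.publish(cmd)
        rclpy.spin_once(self.node, timeout_sec=0.05)  # one 20 Hz step
        obs = self.scan.copy()
        reward, terminated, truncated = 0.0, False, False  # task-specific
        return obs, reward, terminated, truncated, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # In the full framework, a Gazebo service repositions the robot here.
        rclpy.spin_once(self.node, timeout_sec=0.05)
        return self.scan.copy(), {}
```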
2.2. Robot Platforms
As shown in Figure 3, we included models of three off-road robots:
FictionLab Leo Rover: A research-focused four-wheel-drive robot with differential steering and rocker articulating suspension.
Rover Robotics Rover Zero: An all-terrain four-wheel-drive robot with differential steering and configurable chassis.
HyperDog: An open-source quadruped robot featuring 12 degrees of freedom and inexpensive 3D-printed parts.
Each robot model includes realistic physical properties, sensor configurations (e.g., LiDAR, cameras, IMUs), and control interfaces compatible with ROS 2. The robot models were integrated into Gazebo with accurate collision meshes and inertia parameters to simulate realistic dynamics.
The Leo Rover is a compact, four-wheeled mobile robot measuring 430 × 450 × 250 mm and weighing 6.5 kg. Each wheel is driven by an in-hub DC motor equipped with a 73.2:1 planetary gearbox and a 12 CPR encoder, and is attached to a longitudinal rocker-arm suspension. Powered by an 11.1 V DC, 5000 mAh Li-ion battery, it reaches a top speed of about 0.4 m/s linearly and 60 deg/s angularly. A 5 MPx camera with a 170° field of view is mounted at the front, while the rover’s top plate offers numerous mounting points and can carry payloads of up to 5 kg. The Leo Rover has a waterproof IP64 rating for reliable operation in various environments.
The Rover Zero is a cost-effective yet capable ground robot measuring 620 × 390 × 254 mm and weighing 11 kg. Its drivetrain consists of brushless motors controlled by a Dual FSESC4.20 100 A unit, with Hall effect quadrature encoders providing precise feedback. The platform can carry up to 50 kg and runs on a removable 98 Wh Li-ion battery, delivering 1–2 h of driving time and approximately 5 h of idle operation. Built with 10″ solid flat-free or pneumatic wheels, the Rover Zero 3 adeptly handles moderate outdoor terrain, although it lacks a formal weatherproof rating and is best suited to dry conditions. Pre-configured with ROS 2 Humble, it streamlines the development and deployment of navigation and perception algorithms, making it an excellent entry-level choice for researchers and engineers.
HyperDog is an open-source quadruped robot designed for robotic software development, featuring 12 degrees of freedom (DoF) facilitated by RC servo motors. Its compact frame measures 300 mm in width, 175 mm in height, and 240 mm in depth, with each leg comprising three joints: hip, upper leg, and lower leg. Constructed from 3D-printed parts and carbon fiber, HyperDog weighs approximately 5 kg and can carry a payload of up to 2 kg. The robot is powered by an 8.4 V, 8.8 Ah Li-ion battery pack, providing around 50 min of operational time per charge. Equipped with an onboard NVIDIA Jetson Nano computer and an STM32F4 microcontroller, HyperDog operates on the Robot Operating System 2 (ROS2) and micro-ROS frameworks, enabling efficient communication and control for various research and development applications.
2.3. Environment Models
DUnE includes four open-source unstructured environment models to simulate diverse real-world conditions:
Office CPR Construction [26]: Contains equipment, uneven surfaces, and materials common in construction areas.
Inspection World [26]: Includes irrigation systems and variable terrain elevations.
Rubicon [27]: Features uneven ground with trees and foliage, presenting challenging navigation with rocks, slopes, varied elevations, and vegetation.
Island [28]: Small, simple island of rough terrain with very few obstacles.
Figure 4 shows the environments that were compiled from multiple open-source repositories and databases.
2.4. Dynamic Obstacle Integration
To simulate dynamic obstacles, we generate actors in Gazebo with custom waypoints and timing parameters, spawning them at runtime via a service call (ign service -s /world/default/create). To create a dynamic obstacle 3D trajectory, we simply pass the world mesh file name, the desired velocity, and pairs of waypoint x,y coordinates as arguments to the generate_trajectory.py script. The user-defined 2D waypoints are used to extract elevation data from the .dae terrain file to create a smooth 3D trajectory. Cubic splines generate dense intermediate points (sampled at 5 cm intervals) for realistic motion at a constant velocity. Given two successive waypoints $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, the yaw is computed as $\psi_i = \operatorname{atan2}(y_{i+1} - y_i,\ x_{i+1} - x_i)$, ensuring the actor smoothly transitions along the trajectory while maintaining the correct heading. The final yaw value is repeated for the last waypoint to maintain trajectory consistency.
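The core of this densification step can be sketched as follows. The function name is illustrative, and the terrain elevation lookup from the .dae mesh is stubbed out:

```python
# Sketch of waypoint densification and yaw computation for actor trajectories.
import numpy as np
from scipy.interpolate import CubicSpline


def densify_trajectory(waypoints_xy, velocity, spacing=0.05):
    wp = np.asarray(waypoints_xy, dtype=float)
    # Parameterize by cumulative chord length so spacing is metric.
    seg = np.linalg.norm(np.diff(wp, axis=0), axis=1)
    d = np.concatenate(([0.0], np.cumsum(seg)))
    sx, sy = CubicSpline(d, wp[:, 0]), CubicSpline(d, wp[:, 1])
    s = np.arange(0.0, d[-1], spacing)  # dense samples at ~5 cm intervals
    x, y = sx(s), sy(s)
    z = np.zeros_like(x)  # placeholder: sample terrain mesh elevation here
    # Yaw from successive points; repeat the final value for the last pose.
    yaw = np.arctan2(np.diff(y), np.diff(x))
    yaw = np.append(yaw, yaw[-1])
    t = s / velocity  # timestamps for constant-velocity playback
    return np.column_stack([t, x, y, z, yaw])
```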
The final trajectory is embedded in an SDF <trajectory> block, which is then integrated into an <actor> definition. A runtime spawning script parses this SDF file, configures animation properties by referencing a predefined walking motion from Gazebo Fuel, and invokes Ignition Transport to insert the actor into the simulation. This automated process enables real-time scenario generation, eliminating the need for manually recorded trajectories while supporting dynamic obstacle integration in autonomous navigation experiments.
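For reference, the generated file has roughly the following shape; the skin and animation assets and the waypoint values below are placeholders rather than the script's exact output:

```xml
<!-- Illustrative SDF actor with an embedded trajectory block. -->
<actor name="walking_actor">
  <skin>
    <filename>walk.dae</filename>  <!-- placeholder Gazebo Fuel asset -->
  </skin>
  <animation name="walking">
    <filename>walk.dae</filename>
    <interpolate_x>true</interpolate_x>
  </animation>
  <script>
    <loop>true</loop>
    <trajectory id="0" type="walking">
      <waypoint><time>0.0</time><pose>1.0 2.0 0.3 0 0 1.57</pose></waypoint>
      <waypoint><time>0.1</time><pose>1.0 2.05 0.3 0 0 1.57</pose></waypoint>
      <!-- ...densified waypoints at 5 cm intervals... -->
    </trajectory>
  </script>
</actor>
```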
Introducing multiple moving actors into the simulation provides diverse, dynamic scenarios that help prevent overfitting for both camera and LiDAR data by exposing the reinforcement learning model to a wide range of visual and spatial variations. For cameras, moving actors create varying occlusions, motion patterns, and lighting changes, challenging the model to generalize its perception of obstacles and navigation paths. For LiDAR, the changing positions and velocities of actors result in dynamic point cloud variations, making the model more robust to noise, sensor artifacts, and unexpected object trajectories in real-world applications. By training in such dynamic environments, the agent learns to adapt to diverse, non-static conditions, improving its performance in complex, real-world scenarios where unpredictable interactions frequently occur. This variability reduces the likelihood of overfitting to specific patterns or static layouts, enhancing the model’s generalizability.
2.5. Automated Metric Logging System
We developed an automated system to capture, log, and graph performance metrics during simulation runs. The metrics captured include:
Success rate (SR): Percentage of trials where the robot successfully reached the goal without collisions.
Total collisions (TC): Number of collisions with static or dynamic obstacles.
Mean time to traverse (MTT): Average time taken to reach the goal position.
Traverse rate (TR): Ratio of successful traversals to total attempts.
Obstacle clearance (OC): The minimum distance to any obstacle.
Vertical roughness (VR): The z-axis acceleration of the robot.
Velocity over rough terrain (VORT): Average velocity maintained over uneven terrain sections.
We implemented ROS 2 logging mechanisms to record the metrics, and Python 3.10 scripts then generate performance graphs comparing multiple trials or conditions. The metric-logging component enables standardized evaluation and comparison of different navigation algorithms and robot configurations.
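As a concrete illustration, a per-episode logger in this style might look as follows. This is a simplified stand-in that writes episode rows to a CSV file; the class name and schema are ours, not the repository's exact implementation:

```python
# Simplified per-episode metric logger in the spirit of Section 2.5.
import csv
import time


class MetricLogger:
    FIELDS = ["episode", "success", "collisions", "time_to_goal",
              "min_obstacle_clearance", "mean_vort"]

    def __init__(self, path="run_metrics.csv"):
        self.file = open(path, "w", newline="")
        self.writer = csv.DictWriter(self.file, fieldnames=self.FIELDS)
        self.writer.writeheader()

    def log_episode(self, episode, success, collisions, t_start,
                    clearance, vort):
        self.writer.writerow({
            "episode": episode,
            "success": int(success),
            "collisions": collisions,
            "time_to_goal": time.time() - t_start,
            "min_obstacle_clearance": clearance,
            "mean_vort": vort,
        })
        self.file.flush()  # keep the log durable if the run is interrupted
```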
2.6. Soft Actor Critic (SAC) Agent Implementation
We implemented a baseline RL agent for basic point navigation (PointNav) using the Stable Baselines3 [29] implementation of the SAC [30] algorithm. The multimodal [31,32] observation space included an RGB camera, 2D LiDAR, IMU, and the distance and heading information for the randomly generated goal position. The Stable Baselines3 agent implementation uses a CNN for the image component and then fuses it with the other vector observations. For the observation, the image has a 64 × 64-pixel resolution and the 2D LiDAR component provides 32 range points. The raw LiDAR scan is 128 points and is clipped to ensure that all values remain within the sensor's valid measurement interval. Next, the modified 128-point array is divided into 32 equal segments of 4 points each, and the minimum range value of each segment is used for the LiDAR observation component. This downsampling step transforms the raw input into a 32-point observation vector, which reduces the observation space size while still providing enough points for navigation.
Our Gymnasium environment, RoverEnv(gym.Env), includes a proportional–integral–derivative (PID) loop for motor control. The action space consists of a heading and velocity command, with the PID loop controlling the robot motors to achieve the desired heading. The PID controller operates at a 20 Hz update rate (0.05 s per step), dynamically adjusting the angular velocity to align the robot with the desired heading. The error is computed as the shortest angular distance between the desired and current headings, normalized to the range $[-\pi, \pi]$. To ensure safe operation, the control output is clipped to the maximum safe linear and angular velocities. During each simulation step, this angular velocity is computed and passed to the robot motors, effectively adjusting the platform's orientation to match the desired heading.
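A sketch of such a heading controller is shown below; the gains and velocity limit are illustrative, not the tuned values used in the framework:

```python
# PID heading controller: shortest-angle error, clipped angular velocity.
import numpy as np


class HeadingPID:
    def __init__(self, kp=2.0, ki=0.0, kd=0.1, dt=0.05, max_ang=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt, self.max_ang = dt, max_ang
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, desired, current):
        # Shortest angular distance, wrapped into [-pi, pi].
        err = np.arctan2(np.sin(desired - current), np.cos(desired - current))
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        omega = self.kp * err + self.ki * self.integral + self.kd * deriv
        # Clip to the maximum safe angular velocity.
        return float(np.clip(omega, -self.max_ang, self.max_ang))
```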
The reward function was designed to encourage progress toward the goal while penalizing collisions and inefficient movements. It consists of several components: $R_{\text{goal}}$ provides a large positive reward upon reaching the target distance, and $R_{\text{collision}}$ applies an immediate negative penalty when an obstacle is too close. As auxiliary rewards, $R_{\text{heading}}$ gives a small bonus or penalty based on how well the agent is oriented toward the target, promoting proper alignment, and $R_{\text{progress}}$ rewards reducing the distance to the goal when the agent is heading in a favorable direction, with a slight penalty for negligible or misaligned progress. Additionally, $R_{\text{velocity}}$ incentivizes gradual acceleration and sustained forward movement by adding a bonus proportional to the agent's linear speed. These components are combined and scaled by a final multiplier to guide the agent toward efficient, goal-oriented behavior with smooth dynamics.
Table 2 summarizes the reward function parameters.
The reward function for our SAC agent is designed to encourage goal-oriented navigation while avoiding obstacles. The reward at each timestep is composed of several components that balance progress toward the goal, heading alignment, collision avoidance, and motion efficiency.
The primary navigation reward combines distance progress and heading alignment:

$$R_{\text{nav}} = R_{\text{progress}} + R_{\text{heading}}$$

Distance progress is rewarded when the agent moves closer to the target while maintaining proper alignment:

$$R_{\text{progress}} = \begin{cases} k_{\text{prog}}\,\Delta d, & \Delta d > 0 \ \text{and}\ |\theta_{\text{err}}| < 90^{\circ} \\ -\epsilon, & \text{otherwise} \end{cases}$$

where $\Delta d$ represents the reduction in distance to the target, $\theta_{\text{err}}$ is the heading error, and $k_{\text{prog}}$ and $\epsilon$ are scaling constants (Table 2).

Heading alignment is rewarded based on the difference between the agent's orientation and the direction to the target:

$$R_{\text{heading}} = \begin{cases} k_{\text{head}}\left(1 - \dfrac{|\theta_{\text{err}}|}{90^{\circ}}\right), & |\theta_{\text{err}}| \le 90^{\circ} \\[4pt] -k_{\text{head}}\,\dfrac{|\theta_{\text{err}}| - 90^{\circ}}{90^{\circ}}, & |\theta_{\text{err}}| > 90^{\circ} \end{cases}$$

This piecewise function provides the maximum reward when perfectly aligned ($\theta_{\text{err}} = 0$), decreases linearly as the heading difference increases up to 90°, and then becomes increasingly negative for orientations facing away from the target.

To promote smooth and efficient motion, we incorporate a velocity incentive:

$$R_{\text{velocity}} = k_{\text{vel}}\,v_{\text{linear}}$$

The reward function handles two special cases with immediate returns. First, goal achievement occurs when the agent reaches within the success-threshold distance:

$$R = R_{\text{goal}} \quad \text{if } d_{\text{target}} < d_{\text{success}}$$

Second, collisions are penalized when the agent approaches obstacles too closely:

$$R = R_{\text{collision}} \quad \text{if } \min(\mathbf{r}_{\text{LiDAR}}) < d_{\text{collision}}$$
This reward formulation balances the competing objectives of efficient navigation and safety while ensuring the agent is properly incentivized to reach the goal.
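A compact sketch of this logic is given below; the constants are placeholders standing in for the tuned values listed in Table 2:

```python
# Hedged sketch of the reward described above; all constants are illustrative.
def compute_reward(d_prev, d_curr, heading_err_deg, lin_vel, min_range,
                   k_prog=10.0, k_head=0.5, k_vel=0.3, k_scale=1.0,
                   d_success=0.5, d_collision=0.3,
                   r_goal=100.0, r_collision=-50.0):
    # Special cases return immediately.
    if d_curr < d_success:
        return r_goal
    if min_range < d_collision:
        return r_collision
    # Progress toward the goal, only credited when roughly facing it.
    delta_d = d_prev - d_curr
    if delta_d > 0 and abs(heading_err_deg) < 90.0:
        r_prog = k_prog * delta_d
    else:
        r_prog = -0.01  # slight penalty for negligible or misaligned progress
    # Piecewise heading alignment term.
    if abs(heading_err_deg) <= 90.0:
        r_head = k_head * (1.0 - abs(heading_err_deg) / 90.0)
    else:
        r_head = -k_head * (abs(heading_err_deg) - 90.0) / 90.0
    # Bonus proportional to forward speed for smooth, sustained motion.
    r_vel = k_vel * max(lin_vel, 0.0)
    return k_scale * (r_prog + r_head + r_vel)
```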
2.7. RL Training Procedure
We trained our agents to learn pointgoal navigation in both a baseline Flat world environment and the unstructured terrain of the Inspection and Island worlds, allowing us to compare the agent’s performance across different conditions. The Flat world is very similar to the environments commonly used to train RL agents for indoor navigation. Agents were trained until policy convergence (1.4–3.8 million steps) in each of the three environments. Inspection world consisted of smooth sloped terrain with industrial infrastructure obstacles, while Island world included extremely bumpy terrain and relatively few simple rectangular obstacles. In all three environments, we used the same episodic setup, in which each episode began by placing the robot at a randomly sampled position and orientation, and by assigning a random goal location within an 8 × 8 m interior region. This randomization procedure ensured exposure to a diverse range of initial states, making the learned policy more robust to different start–goal configurations. Each episode terminated once the agent reached the goal, became stuck, flipped over, or exceeded a fixed step limit of 2000 steps. The Soft Actor Critic training parameters were as follows:
Total timesteps: The agent was trained for 5,000,000 timesteps.
Learning rate: Linearly decayed over the course of training.
Learning starts: 50,000 steps.
Discount factor (γ): Set to 0.99 to balance immediate and future rewards.
Entropy coefficient (α): Automatic temperature tuning starting at 0.5.
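These hyperparameters translate directly into a Stable Baselines3 training call, sketched below. The learning-rate endpoints are illustrative placeholders, and "MultiInputPolicy" reflects the Dict observation space (camera, LiDAR, IMU, and goal vector) described in Section 2.6:

```python
# Sketch of the SAC training run with the hyperparameters listed above.
from stable_baselines3 import SAC


def linear_schedule(initial: float, final: float):
    # SB3 calls the schedule with progress_remaining going from 1 to 0.
    return lambda progress_remaining: final + progress_remaining * (initial - final)


env = RoverEnv()  # the Gymnasium bridge from Section 2.1
model = SAC(
    "MultiInputPolicy",                          # fuses image and vector inputs
    env,
    learning_rate=linear_schedule(3e-4, 1e-5),   # illustrative endpoints
    learning_starts=50_000,
    gamma=0.99,
    ent_coef="auto_0.5",                         # auto tuning, initial alpha 0.5
    verbose=1,
)
model.learn(total_timesteps=5_000_000)
model.save("sac_rover_pointnav")
```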
3. Results
This section presents results on the performance of our baseline RL agent on the Rover Zero. Metrics were collected using the automated logging system during 10 min (11,400-step) evaluation runs. The same procedures as in training were used for random spawning, random goal positioning, and episode termination. At the end of each episode, the framework automatically recorded the following metrics: success rate (SR), total collisions (TC), velocity over rough terrain (VORT), and total smoothness of route (TSR). This enabled comparisons of the robot's navigation performance across different environments.
Table 3 and Figure 4 summarize the performance of our baseline SAC agent in three environments: Flat, Inspection, and Island. The Flat world, free of obstacles and elevation changes, yielded a perfect 100% success rate with zero collisions. By contrast, the two unstructured settings led to lower success rates (75% on Inspection and 73% on Island) due to uneven surfaces and obstacles, increasing both the difficulty of collision avoidance and the time needed to reach the goal. The mean values for obstacle clearance, total collisions, and velocity over rough terrain are shown in Figure 5, Figure 6 and Figure 7.
Figure 8 shows the vertical roughness (VR), which is calculated from the absolute value of the vertical (z-axis) acceleration.
Figure 9 and Figure 10 show the training progression for the SAC agent in the Flat and Inspection worlds.
Figure 5 and Figure 6 summarize two major safety-related outcomes, obstacle clearance and total collisions, across both environments. As shown in Figure 6, the agent's collision rate in the Inspection world is considerably higher, reflecting the greater navigational challenges of the sloped terrain near the obstacles. Figure 5 further highlights the closer proximity of obstacles in Island world compared to Inspection world.
Figure 9 illustrates the training of a Soft Actor Critic (SAC) agent in a simple, flat environment, where convergence is reached in around 900k steps. The critic loss (red) and actor loss (orange) both show stable downward trends, with the critic loss approaching zero after initial peaks and the actor loss decreasing from about 0.6 to −0.4. Meanwhile, the mean episode reward (blue) steadily rises beyond 800, indicating the agent's proficiency at the task. In contrast, Figure 10 shows the SAC training in the more challenging Inspection world that features dynamic actor obstacles, requiring 3.8 million steps for convergence. Because the agent must navigate moving objects in addition to uneven terrain, learning progress is less smooth and convergence slower. Nonetheless, both the actor and critic losses (orange and red curves, respectively) eventually stabilize, reflecting effective policy and value function learning despite the heightened complexity introduced by the dynamic obstacles.
Total smoothness of route (TSR) quantifies the overall motion quality of the robot by combining linear and angular motion components. For each timestep, the instantaneous smoothness is calculated as the sum of the linear acceleration magnitude and angular velocity magnitude, measured in m/s² and rad/s, respectively. The TSR is then computed as the mean of these instantaneous smoothness values over the entire route, where lower values indicate smoother navigation with fewer abrupt changes in motion. For the Flat environment, the robot was able to quickly change direction of movement, producing high acceleration and thus a higher score than expected.
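Written out, with $\mathbf{a}_i$ the linear acceleration (m/s²) and $\boldsymbol{\omega}_i$ the angular velocity (rad/s) at timestep $i$ of an $N$-step route, this definition reads (a direct transcription of the description above):

$$\mathrm{TSR} = \frac{1}{N}\sum_{i=1}^{N}\left(\lVert \mathbf{a}_i \rVert + \lVert \boldsymbol{\omega}_i \rVert\right)$$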
The relationship between success rate (SR) and terrain features reveals important insights about navigation challenges. While a 100% SR was achieved on the flat terrain, the more complex environments saw significant decreases—75% for Inspection world and 73% for Island world. This decline appears to be strongly correlated with specific terrain characteristics: the Inspection world’s combination of slopes and industrial obstacles created challenging navigation scenarios where the agent had to simultaneously manage inclines while maintaining safe distances from structures. The Island world’s extremely bumpy terrain, while having fewer obstacles, demonstrated how surface irregularity alone can substantially impact navigation success. These findings suggest that SR could potentially be improved by incorporating additional sensing modalities—particularly more sophisticated depth sensing to better characterize surface geometry ahead of the robot.
The velocity over rough terrain (VORT) metric, shown in Figure 7, proved to be a crucial indicator of navigation efficiency, with clear implications for real-world deployment. The baseline measurements (0.78 m/s on flat terrain, dropping to 0.44 m/s and 0.61 m/s on Inspection and Island worlds, respectively) demonstrate how terrain complexity and roughness directly impact achievable speeds. This metric particularly influenced the agent's decision-making process in Inspection world, where the lower VORT (0.44 m/s) reflects a more cautious navigation strategy adopted around obstacles and slopes. Interestingly, while the Island world had very uneven terrain, as shown in the vertical roughness graph in Figure 8, its relatively obstacle-free nature allowed for a higher VORT (0.61 m/s), suggesting that the agent learned to prioritize straight-line paths when possible, even over bumpy surfaces. This relationship between VORT and navigation strategy has important implications for real-world applications, particularly in scenarios where maintaining higher speeds must be balanced against vehicle stability and safety considerations.
Collectively, these observations show that our learned policy generalizes well across diverse conditions and can effectively handle off-road navigation in both simpler and more challenging unstructured domains. These results confirm that DUnE supports evaluation of autonomous agents in challenging off-road conditions. Future extensions will leverage DUnE’s capacity for diverse terrains and obstacles, robot models, and control architectures for advanced off-road navigation research.
4. Discussion
This work introduced DUnE, a framework that integrates diverse terrain models, dynamic obstacles, and automated metric logging with ROS 2, Gazebo, and the Gymnasium interface, resulting in a versatile toolkit for the advancement of RL research for off-road robotics. By providing multiple robot platforms, diverse world models, automated metric logging, and dynamic obstacle capabilities, this framework addresses key gaps in existing simulation tools. Initial experiments using a baseline SAC agent demonstrated that the system can train and evaluate navigation policies in complex unstructured settings.
The collected performance metrics, such as success rate (SR), total collisions (TC), and velocity over rough terrain (VORT), facilitate objective evaluations of performance. We provide an RL agent, implemented with SAC, as a baseline for future comparisons to more advanced methods. The current semi-automated method for inserting dynamic obstacles worked adequately, but further improvements to these model trajectories could enable more realistic, real-world-like scenarios.
These initial findings suggest several potential directions for future work, including refinement of obstacle models, richer terrain properties, and integration of more advanced learning algorithms. Adding new world models simply involves placing the world model file in the worlds folder and adding the name to the launch file. In addition to human actors, incorporating cars, trucks, and livestock as dynamic obstacles would improve the simulation utility for construction, industrial inspection, and agricultural environments. For camera observations, incorporating photogrammetry-based world models and particle emitter-based reduced visibility would improve visual realism for simulation experiments and sim2real efforts. Additionally, adding deformable vegetation such as tall grass would greatly improve traversability research.
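The world-selection step mentioned above typically amounts to a one-line change in a ROS 2 launch file. The following is an illustrative pattern only; the argument names, paths, and launch structure are placeholders rather than the repository's exact launch file:

```python
# Illustrative ROS 2 launch pattern for selecting a Gazebo Fortress world.
from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument, ExecuteProcess
from launch.substitutions import LaunchConfiguration, PathJoinSubstitution


def generate_launch_description():
    return LaunchDescription([
        DeclareLaunchArgument(
            "world", default_value="inspection.sdf",
            description="World file located in the worlds/ folder"),
        ExecuteProcess(
            cmd=["ign", "gazebo", "-r",
                 PathJoinSubstitution(["worlds", LaunchConfiguration("world")])],
            output="screen",
        ),
    ])
```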
As the framework is fully open-source and built with modular, widely adopted tools, it can be readily expanded and improved. With the Gymnasium interface, various RL architectures and algorithms can be incorporated and tested with minimal effort. To further enhance sim2real transfer capabilities, future development will focus on domain randomization techniques to bridge the reality gap. Implementing systematic sensor noise models that simulate real-world LiDAR artifacts, camera distortion, and IMU drift would better prepare algorithms for deployment on physical robots. Randomizing terrain properties such as friction coefficients and surface compliance would improve robustness to varying ground conditions. Coupled with the ROS 2 middleware and navigation stack, new robotic platforms and control strategies could be integrated to validate policies on actual hardware. In summary, DUnE provides a solid foundation for iterative experimentation and progress in off-road robotics research.