1. Introduction
Navigating unstructured environments is a core challenge for autonomous all-terrain robots [1,2,3,4,5]. Systems operating in agricultural fields, construction sites, and rugged terrains must cope with uneven surfaces, dynamic obstacles, and fluctuating environmental conditions. However, current simulation frameworks seldom integrate these complexities, constraining the development and testing of robust navigation algorithms. Most existing tools focus on structured indoor settings [6,7,8], manipulation tasks [9,10], drone operations [11,12], or support only the older Gym interface [13]. To address these gaps, we introduce DUnE, an open-source framework that implements the Gymnasium interface [14,15] within ROS 2 Humble [16] and Gazebo Fortress [17], complete with automated metric logging and dynamic obstacle generation. Although DUnE can be used to simulate any navigation system, it was primarily designed for the evaluation of reinforcement learning algorithms. Our platform includes models of three off-road robotic systems widely used in research and industry: the Rover Robotics Rover Zero [18], the open-source quadruped HyperDog [19], and the FictionLab Leo Rover [20]. It also provides multiple open-source environments representing agricultural landscapes, industrial inspection zones, and construction sites (Figure 1).
Table 1 compares our toolkit with the ROS-PyBullet [21] framework, the gym-gazebo2 [9] toolkit, and iGibson [6].
Reinforcement learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment to maximize cumulative rewards [22]. The agent explores actions and observes their consequences, using algorithms to balance exploration (trying new actions) and exploitation (choosing the best-known actions). The Gymnasium RL interface is a standardized API for developing and benchmarking reinforcement learning environments, enabling agents to interact with environments through consistent methods like reset, step, and render. ROS 2 (Robot Operating System 2) is an open-source robotics framework designed for modular and scalable development of robotic applications. It provides tools for node-based communication, hardware abstraction, and real-time system capabilities. ROS 2 supports distributed computing, making it ideal for complex multi-robot systems and advanced robotic tasks. Gazebo is a 3D simulation platform used for designing and testing robotic systems in realistic environments. It includes advanced physics engines, sensor modeling, and dynamic object interactions, allowing for accurate performance testing. Seamlessly integrated with ROS 2, Gazebo excels in training and deploying ROS 2-based robot platforms, making it a top choice for robotics simulation [23].
A key contribution of DUnE is the automated capturing, logging, and graphing of comprehensive performance metrics, including success rate (SR), total collisions (TC), mean time to traverse (MTT), traverse rate (TR), and velocity over rough terrain (VORT). These metrics provide standardized benchmarks for evaluating and comparing the performance of different navigation algorithms and robotic platforms. Furthermore, DUnE has the built-in capability for semi-automated insertion of dynamic obstacles [24], enabling the creation of truly dynamic environments. This feature allows researchers to simulate scenarios with moving obstacles, such as humans or vehicles, to test and refine obstacle avoidance strategies effectively. Additionally, we validate the effectiveness of our framework by implementing a baseline Soft Actor Critic (SAC) agent for the pointgoal navigation task [25]. The agent is tested across various challenging terrains, demonstrating the framework's capability to facilitate the development and assessment of reinforcement learning agents in complex unstructured environments.
Our contributions bridge significant gaps in existing robotics simulation toolkits by providing the following:
Gymnasium interface: A custom Gymnasium environment that interfaces with ROS 2 and Gazebo.
Multiple unstructured environments: Integration of various unstructured environments which include agriculture, construction, and back country.
Dynamic obstacle insertion: A service which provides trajectory generation and runtime insertion of dynamic obstacles.
Automated metric logging: Comprehensive performance evaluation through standardized metric logging and visualization.
By offering an integrated framework (Figure 2), we aim to accelerate research and development in off-road robotic navigation, providing a robust foundation for testing and deploying autonomous systems in unstructured terrain.
2. Method
This section details the development, integration, and validation methodologies employed in creating our versatile software framework for off-road mobile robotics research. The framework integrates the Gymnasium interface with ROS 2 Humble and Gazebo Fortress, supporting multiple robot platforms and dynamic unstructured environments. It also incorporates a baseline Soft Actor Critic (SAC) agent for pointgoal navigation. The software implementation is organized as follows:
- launch/: ROS 2 launch files for starting simulations and nodes.
- scripts/: Utility scripts for metric logging, dynamic obstacle control, and simulation management.
The entire codebase is open-source and available at https://github.com/jackvice/RoboTerrain (accessed on 15 March 2025). It is released under the MIT License, allowing for widespread use and modification by the research community. During the preparation of this article, the authors used ChatGPT o1 for the purposes of grammar correction and paragraph structuring.
2.1. Integration of Gymnasium Interface with ROS 2 and Gazebo
To bridge the gap between the RL algorithms and the robotic simulation, we developed a custom Gymnasium environment that interfaces with ROS 2 and Gazebo. We defined a Gym.Env subclass that encapsulates the observation and action spaces corresponding to the robot’s sensor inputs and control commands. Within this environment, ROS 2 publishers and subscribers were developed to send control commands to the robot and receive sensor data back via the Gymnasium environment functions.
We implemented Gazebo and ROS 2 services for managing dynamic obstacles and resetting the simulation environment between agent episodes. To maintain consistency during training episodes, we ensured time synchronization between the Gym environment and the Gazebo simulation. This integration enables the RL agent to interact with the simulated robot in near real time, receiving sensor observations and sending control actions within each simulation step.
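To make this architecture concrete, the following is a minimal sketch of such a Gymnasium-to-ROS 2 bridge. The class name, topic names, and observation/action shapes are illustrative simplifications, not the exact definitions in the RoboTerrain repository.

```python
# Minimal sketch of a Gymnasium environment bridged to ROS 2 / Gazebo.
import gymnasium as gym
import numpy as np
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan


class RoverEnvSketch(gym.Env):
    def __init__(self):
        rclpy.init()
        self.node = Node("rover_env")
        # Action: linear velocity and heading command (normalized).
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,))
        # Observation: downsampled 32-point LiDAR scan (see Section 2.6).
        self.observation_space = gym.spaces.Box(low=0.0, high=12.0, shape=(32,))
        self.cmd_pub = self.node.create_publisher(Twist, "/cmd_vel", 10)
        self.scan = np.full(32, 12.0, dtype=np.float32)
        self.node.create_subscription(LaserScan, "/scan", self._on_scan, 10)

    def _on_scan(self, msg):
        ranges = np.clip(np.asarray(msg.ranges, dtype=np.float32), 0.0, 12.0)
        # Min-pool the raw scan down to 32 sectors.
        self.scan = ranges.reshape(32, -1).min(axis=1)

    def step(self, action):
        cmd = Twist()
        cmd.linear.x, cmd.angular.z = float(action[0]), float(action[1])
        self.cmd_pub.publish(cmd)
        rclpy.spin_once(self.node, timeout_sec=0.05)  # one 20 Hz step
        obs = self.scan.copy()
        reward, terminated, truncated = 0.0, False, False  # task-specific
        return obs, reward, terminated, truncated, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # In the full framework, a Gazebo service repositions the robot here.
        rclpy.spin_once(self.node, timeout_sec=0.05)
        return self.scan.copy(), {}
```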
2.2. Robot Platforms
As shown in Figure 3, we included models of three off-road robots:
FictionLab Leo Rover: A research-focused four-wheel-drive robot with differential steering and rocker articulating suspension.
Rover Robotics Rover Zero: An all-terrain four-wheel-drive robot with differential steering and configurable chassis.
HyperDog: An open-source quadruped robot featuring 12 degrees of freedom and inexpensive 3D-printed parts.
Each robot model includes realistic physical properties, sensor configurations (e.g., LiDAR, cameras, IMUs), and control interfaces compatible with ROS 2. The robot models were integrated into Gazebo with accurate collision meshes and inertia parameters to simulate realistic dynamics.
The Leo Rover is a compact, four-wheeled mobile robot measuring 430 × 450 × 250 mm and weighing 6.5 kg. Each wheel is driven by an in-hub DC motor equipped with a 73.2:1 planetary gearbox and a 12 CPR encoder, and is attached to a longitudinal rocker-arm suspension. Powered by an 11.1 V DC, 5000 mAh Li-ion battery, it reaches a top speed of about 0.4 m/s linearly and 60 deg/s angularly. A 5 MPx camera with a 170° field of view is mounted at the front, while the rover’s top plate offers numerous mounting points and can carry payloads of up to 5 kg. The Leo Rover has a waterproof IP64 rating for reliable operation in various environments.
The Rover Zero is a cost-effective yet capable ground robot measuring 620 × 390 × 254 mm and weighing 11 kg. Its drivetrain consists of brushless motors controlled by a Dual FSESC4.20 100 A unit, with Hall effect quadrature encoders providing precise feedback. The platform can carry up to 50 kg and runs on a removable 98 Wh Li-ion battery, delivering 1–2 h of driving time and approximately 5 h of idle operation. Built with 10″ solid flat-free or pneumatic wheels, the Rover Zero 3 adeptly handles moderate outdoor terrain, although it lacks a formal weatherproof rating and is best suited to dry conditions. Pre-configured with ROS 2 Humble, it streamlines the development and deployment of navigation and perception algorithms, making it an excellent entry-level choice for researchers and engineers.
HyperDog is an open-source quadruped robot designed for robotic software development, featuring 12 degrees of freedom (DoF) facilitated by RC servo motors. Its compact frame measures 300 mm in width, 175 mm in height, and 240 mm in depth, with each leg comprising three joints: hip, upper leg, and lower leg. Constructed from 3D-printed parts and carbon fiber, HyperDog weighs approximately 5 kg and can carry a payload of up to 2 kg. The robot is powered by an 8.4 V, 8.8 Ah Li-ion battery pack, providing around 50 min of operational time per charge. Equipped with an onboard NVIDIA Jetson Nano computer and an STM32F4 microcontroller, HyperDog operates on the Robot Operating System 2 (ROS2) and micro-ROS frameworks, enabling efficient communication and control for various research and development applications.
2.3. Environment Models
DUnE includes four open-source unstructured environment models to simulate diverse real-world conditions:
Office CPR Construction [26]: Contains equipment, uneven surfaces, and materials common in construction areas.
Inspection World [26]: Includes irrigation systems and variable terrain elevations.
Rubicon [27]: Features uneven ground with trees and foliage, presenting challenging navigation with rocks, slopes, varied elevations, and vegetation.
Island [28]: Small, simple island of rough terrain with very few obstacles.
Figure 4 shows the environments that were compiled from multiple open-source repositories and databases.
2.4. Dynamic Obstacle Integration
To simulate dynamic obstacles, we generate actors in Gazebo with custom waypoints and timing parameters, spawning them at runtime via a service call (ign service -s /world/default/create). To create a dynamic obstacle 3D trajectory, we simply pass the world mesh file name, the desired velocity, and pairs of waypoint x,y coordinates as arguments to the generate_trajectory.py script. The user-defined 2D waypoints are used to extract elevation data from the .dae terrain file to create a smooth 3D trajectory. Cubic splines generate dense intermediate points (sampled at 5 cm intervals) for realistic motion at a constant velocity. Given two successive waypoints $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, the yaw is computed as $\psi_i = \operatorname{atan2}(y_{i+1} - y_i,\ x_{i+1} - x_i)$, ensuring the actor smoothly transitions along the trajectory while maintaining the correct heading. The final yaw value is repeated for the last waypoint to maintain trajectory consistency.
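The core of this densification step can be sketched as follows. The function name is illustrative, and the terrain elevation lookup from the .dae mesh is stubbed out:

```python
# Sketch of waypoint densification and yaw computation for actor trajectories.
import numpy as np
from scipy.interpolate import CubicSpline


def densify_trajectory(waypoints_xy, velocity, spacing=0.05):
    wp = np.asarray(waypoints_xy, dtype=float)
    # Parameterize by cumulative chord length so spacing is metric.
    seg = np.linalg.norm(np.diff(wp, axis=0), axis=1)
    d = np.concatenate(([0.0], np.cumsum(seg)))
    sx, sy = CubicSpline(d, wp[:, 0]), CubicSpline(d, wp[:, 1])
    s = np.arange(0.0, d[-1], spacing)  # dense samples at ~5 cm intervals
    x, y = sx(s), sy(s)
    z = np.zeros_like(x)  # placeholder: sample terrain mesh elevation here
    # Yaw from successive points; repeat the final value for the last pose.
    yaw = np.arctan2(np.diff(y), np.diff(x))
    yaw = np.append(yaw, yaw[-1])
    t = s / velocity  # timestamps for constant-velocity playback
    return np.column_stack([t, x, y, z, yaw])
```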
The final trajectory is embedded in an SDF <trajectory> block, which is then integrated into an <actor> definition. A runtime spawning script parses this SDF file, configures animation properties by referencing a predefined walking motion from Gazebo Fuel, and invokes Ignition Transport to insert the actor into the simulation. This automated process enables real-time scenario generation, eliminating the need for manually recorded trajectories while supporting dynamic obstacle integration in autonomous navigation experiments.
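For reference, the generated file has roughly the following shape; the skin and animation assets and the waypoint values below are placeholders rather than the script's exact output:

```xml
<!-- Illustrative SDF actor with an embedded trajectory block. -->
<actor name="walking_actor">
  <skin>
    <filename>walk.dae</filename>  <!-- placeholder Gazebo Fuel asset -->
  </skin>
  <animation name="walking">
    <filename>walk.dae</filename>
    <interpolate_x>true</interpolate_x>
  </animation>
  <script>
    <loop>true</loop>
    <trajectory id="0" type="walking">
      <waypoint><time>0.0</time><pose>1.0 2.0 0.3 0 0 1.57</pose></waypoint>
      <waypoint><time>0.1</time><pose>1.0 2.05 0.3 0 0 1.57</pose></waypoint>
      <!-- ...densified waypoints at 5 cm intervals... -->
    </trajectory>
  </script>
</actor>
```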
Introducing multiple moving actors into the simulation provides diverse, dynamic scenarios that help prevent overfitting for both camera and LiDAR data by exposing the reinforcement learning model to a wide range of visual and spatial variations. For cameras, moving actors create varying occlusions, motion patterns, and lighting changes, challenging the model to generalize its perception of obstacles and navigation paths. For LiDAR, the changing positions and velocities of actors result in dynamic point cloud variations, making the model more robust to noise, sensor artifacts, and unexpected object trajectories in real-world applications. By training in such dynamic environments, the agent learns to adapt to diverse, non-static conditions, improving its performance in complex, real-world scenarios where unpredictable interactions frequently occur. This variability reduces the likelihood of overfitting to specific patterns or static layouts, enhancing the model’s generalizability.
2.5. Automated Metric Logging System
We developed an automated system to capture, log, and graph performance metrics during simulation runs. The metrics captured include:
Success rate (SR): Percentage of trials where the robot successfully reached the goal without collisions.
Total collisions (TC): Number of collisions with static or dynamic obstacles.
Mean time to traverse (MTT): Average time taken to reach the goal position.
Traverse rate (TR): Ratio of successful traversals to total attempts.
Obstacle clearance (OC): The minimum distance to any obstacle.
Vertical roughness (VR): The z-axis acceleration of the robot.
Velocity over rough terrain (VORT): Average velocity maintained over uneven terrain sections.
We implemented ROS 2 logging mechanisms to record the metrics, and Python 3.10 scripts then generate performance graphs comparing multiple trials or conditions. The metric-logging component enables standardized evaluation and comparison of different navigation algorithms and robot configurations.
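As a concrete illustration, a per-episode logger in this style might look as follows. This is a simplified stand-in that writes episode rows to a CSV file; the class name and schema are ours, not the repository's exact implementation:

```python
# Simplified per-episode metric logger in the spirit of Section 2.5.
import csv
import time


class MetricLogger:
    FIELDS = ["episode", "success", "collisions", "time_to_goal",
              "min_obstacle_clearance", "mean_vort"]

    def __init__(self, path="run_metrics.csv"):
        self.file = open(path, "w", newline="")
        self.writer = csv.DictWriter(self.file, fieldnames=self.FIELDS)
        self.writer.writeheader()

    def log_episode(self, episode, success, collisions, t_start,
                    clearance, vort):
        self.writer.writerow({
            "episode": episode,
            "success": int(success),
            "collisions": collisions,
            "time_to_goal": time.time() - t_start,
            "min_obstacle_clearance": clearance,
            "mean_vort": vort,
        })
        self.file.flush()  # keep the log durable if the run is interrupted
```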
2.6. Soft Actor Critic (SAC) Agent Implementation
We implemented a baseline RL agent for basic point navigation (PointNav) using the Stable Baselines3 [29] implementation of the SAC [30] algorithm. The multimodal [31,32] observation space included an RGB camera, 2D LiDAR, IMU, and the distance and heading information for the randomly generated goal position. The Stable Baselines3 agent implementation uses a CNN for the image component and then fuses it with the other vector observations. For the observation, the image has a 64 × 64-pixel resolution and the 2D LiDAR component provides 32 range points. The raw LiDAR scan is 128 points and is clipped to ensure that all values remain within the sensor's valid measurement interval. Next, the modified 128-point array is divided into 32 equal segments of 4 points each, and the minimum range value of each segment is used for the LiDAR observation component. This downsampling step transforms the raw input into a 32-point observation vector, which reduces the observation space size while still providing enough points for navigation.
Our Gymnasium environment, RoverEnv(gym.Env), includes a proportional–integral–derivative (PID) loop for motor control. The action space consists of a heading and velocity command, with the PID loop controlling the robot motors to achieve the desired heading. The PID controller operates at a 20 Hz update rate (0.05 s per step), dynamically adjusting the angular velocity to align the robot with the desired heading. The error is computed as the shortest angular distance between the desired and current headings, normalized to the range $[-\pi, \pi]$. To ensure safe operation, the control output is clipped to the maximum safe linear and angular velocities. During each simulation step, this angular velocity is computed and passed to the robot motors, effectively adjusting the platform's orientation to match the desired heading.
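A sketch of such a heading controller is shown below; the gains and velocity limit are illustrative, not the tuned values used in the framework:

```python
# PID heading controller: shortest-angle error, clipped angular velocity.
import numpy as np


class HeadingPID:
    def __init__(self, kp=2.0, ki=0.0, kd=0.1, dt=0.05, max_ang=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt, self.max_ang = dt, max_ang
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, desired, current):
        # Shortest angular distance, wrapped into [-pi, pi].
        err = np.arctan2(np.sin(desired - current), np.cos(desired - current))
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        omega = self.kp * err + self.ki * self.integral + self.kd * deriv
        # Clip to the maximum safe angular velocity.
        return float(np.clip(omega, -self.max_ang, self.max_ang))
```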
The reward function was designed to encourage progress toward the goal while penalizing collisions and inefficient movements. It consists of several components: $R_{\text{goal}}$ provides a large positive reward upon reaching the target distance, and $R_{\text{collision}}$ applies an immediate negative penalty when an obstacle is too close. As auxiliary rewards, $R_{\text{heading}}$ gives a small bonus or penalty based on how well the agent is oriented toward the target, promoting proper alignment, and $R_{\text{progress}}$ rewards reducing the distance to the goal when the agent is heading in a favorable direction, with a slight penalty for negligible or misaligned progress. Additionally, $R_{\text{velocity}}$ incentivizes gradual acceleration and sustained forward movement by adding a bonus proportional to the agent's linear speed. These components are combined and scaled by a final multiplier to guide the agent toward efficient, goal-oriented behavior with smooth dynamics.
Table 2 summarizes the reward function parameters.
The reward function for our SAC agent is designed to encourage goal-oriented navigation while avoiding obstacles. The reward at each timestep is composed of several components that balance progress toward the goal, heading alignment, collision avoidance, and motion efficiency.
The primary navigation reward combines distance progress and heading alignment:

$$R_{\text{nav}} = R_{\text{progress}} + R_{\text{heading}}$$

Distance progress is rewarded when the agent moves closer to the target while maintaining proper alignment:

$$R_{\text{progress}} = \begin{cases} k_{\text{prog}}\,\Delta d, & \Delta d > 0 \ \text{and}\ |\theta_{\text{err}}| < 90^{\circ} \\ -\epsilon, & \text{otherwise} \end{cases}$$

where $\Delta d$ represents the reduction in distance to the target, $\theta_{\text{err}}$ is the heading error, and $k_{\text{prog}}$ and $\epsilon$ are scaling constants (Table 2).

Heading alignment is rewarded based on the difference between the agent's orientation and the direction to the target:

$$R_{\text{heading}} = \begin{cases} k_{\text{head}}\left(1 - \dfrac{|\theta_{\text{err}}|}{90^{\circ}}\right), & |\theta_{\text{err}}| \le 90^{\circ} \\[4pt] -k_{\text{head}}\,\dfrac{|\theta_{\text{err}}| - 90^{\circ}}{90^{\circ}}, & |\theta_{\text{err}}| > 90^{\circ} \end{cases}$$

This piecewise function provides the maximum reward when perfectly aligned ($\theta_{\text{err}} = 0$), decreases linearly as the heading difference increases up to 90°, and then becomes increasingly negative for orientations facing away from the target.

To promote smooth and efficient motion, we incorporate a velocity incentive:

$$R_{\text{velocity}} = k_{\text{vel}}\,v_{\text{linear}}$$

The reward function handles two special cases with immediate returns. First, goal achievement occurs when the agent reaches within the success-threshold distance:

$$R = R_{\text{goal}} \quad \text{if } d_{\text{target}} < d_{\text{success}}$$

Second, collisions are penalized when the agent approaches obstacles too closely:

$$R = R_{\text{collision}} \quad \text{if } \min(\mathbf{r}_{\text{LiDAR}}) < d_{\text{collision}}$$
This reward formulation balances the competing objectives of efficient navigation and safety while ensuring the agent is properly incentivized to reach the goal.
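A compact sketch of this logic is given below; the constants are placeholders standing in for the tuned values listed in Table 2:

```python
# Hedged sketch of the reward described above; all constants are illustrative.
def compute_reward(d_prev, d_curr, heading_err_deg, lin_vel, min_range,
                   k_prog=10.0, k_head=0.5, k_vel=0.3, k_scale=1.0,
                   d_success=0.5, d_collision=0.3,
                   r_goal=100.0, r_collision=-50.0):
    # Special cases return immediately.
    if d_curr < d_success:
        return r_goal
    if min_range < d_collision:
        return r_collision
    # Progress toward the goal, only credited when roughly facing it.
    delta_d = d_prev - d_curr
    if delta_d > 0 and abs(heading_err_deg) < 90.0:
        r_prog = k_prog * delta_d
    else:
        r_prog = -0.01  # slight penalty for negligible or misaligned progress
    # Piecewise heading alignment term.
    if abs(heading_err_deg) <= 90.0:
        r_head = k_head * (1.0 - abs(heading_err_deg) / 90.0)
    else:
        r_head = -k_head * (abs(heading_err_deg) - 90.0) / 90.0
    # Bonus proportional to forward speed for smooth, sustained motion.
    r_vel = k_vel * max(lin_vel, 0.0)
    return k_scale * (r_prog + r_head + r_vel)
```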
2.7. RL Training Procedure
We trained our agents to learn pointgoal navigation in both a baseline Flat world environment and the unstructured terrain of the Inspection and Island worlds, allowing us to compare the agent’s performance across different conditions. The Flat world is very similar to the environments commonly used to train RL agents for indoor navigation. Agents were trained until policy convergence (1.4–3.8 million steps) in each of the three environments. Inspection world consisted of smooth sloped terrain with industrial infrastructure obstacles, while Island world included extremely bumpy terrain and relatively few simple rectangular obstacles. In all three environments, we used the same episodic setup, in which each episode began by placing the robot at a randomly sampled position and orientation, and by assigning a random goal location within an 8 × 8 m interior region. This randomization procedure ensured exposure to a diverse range of initial states, making the learned policy more robust to different start–goal configurations. Each episode terminated once the agent reached the goal, became stuck, flipped over, or exceeded a fixed step limit of 2000 steps. The Soft Actor Critic training parameters were as follows:
Total timesteps: The agent was trained for 5,000,000 timesteps.
Learning rate: Linearly decayed over the course of training.
Learning starts: 50,000 steps.
Discount factor (γ): Set to 0.99 to balance immediate and future rewards.
Entropy coefficient (α): Automatic temperature tuning starting at 0.5.
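These hyperparameters translate directly into a Stable Baselines3 training call, sketched below. The learning-rate endpoints are illustrative placeholders, and "MultiInputPolicy" reflects the Dict observation space (camera, LiDAR, IMU, and goal vector) described in Section 2.6:

```python
# Sketch of the SAC training run with the hyperparameters listed above.
from stable_baselines3 import SAC


def linear_schedule(initial: float, final: float):
    # SB3 calls the schedule with progress_remaining going from 1 to 0.
    return lambda progress_remaining: final + progress_remaining * (initial - final)


env = RoverEnv()  # the Gymnasium bridge from Section 2.1
model = SAC(
    "MultiInputPolicy",                          # fuses image and vector inputs
    env,
    learning_rate=linear_schedule(3e-4, 1e-5),   # illustrative endpoints
    learning_starts=50_000,
    gamma=0.99,
    ent_coef="auto_0.5",                         # auto tuning, initial alpha 0.5
    verbose=1,
)
model.learn(total_timesteps=5_000_000)
model.save("sac_rover_pointnav")
```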
3. Results
This section presents results on the performance of our baseline RL agent on the Rover Zero. Metrics were collected using the automated logging system during 10 min (11,400-step) evaluation runs. The same procedures as in training were used for random spawning, random goal positioning, and episode termination. At the end of each episode, the framework automatically recorded the following metrics: success rate (SR), total collisions (TC), velocity over rough terrain (VORT), and total smoothness of route (TSR). This enabled comparisons of the robot's navigation performance across different environments.
Table 3 and Figure 4 summarize the performance of our baseline SAC agent in three environments: Flat, Inspection, and Island. The Flat world, free of obstacles and elevation changes, yielded a perfect 100% success rate with zero collisions. By contrast, the two unstructured settings led to lower success rates (75% on Inspection and 73% on Island) due to uneven surfaces and obstacles, increasing both the difficulty of collision avoidance and the time needed to reach the goal. The mean values for obstacle clearance, total collisions, and velocity over rough terrain are shown in Figure 5, Figure 6 and Figure 7.
Figure 8 shows the vertical roughness (VR), which is calculated from the absolute value of the vertical (z-axis) acceleration.
Figure 9 and Figure 10 show the training progression for the SAC agent in the Flat and Inspection worlds.
Figure 5 and Figure 6 summarize two major safety-related outcomes, obstacle clearance and total collisions, across both environments. As shown in Figure 6, the agent's collision rate in the Inspection world is considerably higher, reflecting the greater navigational challenges of the sloped terrain near the obstacles. Figure 5 further highlights the closer proximity of obstacles in Island world compared to Inspection world.
Figure 9 illustrates the training of a Soft Actor Critic (SAC) agent in a simple, flat environment, where convergence is reached in around 900k steps. The critic loss (red) and actor loss (orange) both show stable downward trends, with the critic loss approaching zero after initial peaks and the actor loss decreasing from about 0.6 to −0.4. Meanwhile, the mean episode reward (blue) steadily rises beyond 800, indicating the agent's proficiency at the task. In contrast, Figure 10 shows the SAC training in the more challenging Inspection world that features dynamic actor obstacles, requiring 3.8 million steps for convergence. Because the agent must navigate moving objects in addition to uneven terrain, learning progress is less smooth and convergence slower. Nonetheless, both the actor and critic losses (orange and red curves, respectively) eventually stabilize, reflecting effective policy and value function learning despite the heightened complexity introduced by the dynamic obstacles.
Total smoothness of route (TSR) quantifies the overall motion quality of the robot by combining linear and angular motion components. For each timestep, the instantaneous smoothness is calculated as the sum of the linear acceleration magnitude and angular velocity magnitude, measured in m/s² and rad/s, respectively. The TSR is then computed as the mean of these instantaneous smoothness values over the entire route, where lower values indicate smoother navigation with fewer abrupt changes in motion. For the Flat environment, the robot was able to quickly change direction of movement, producing high acceleration and thus a higher score than expected.
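Written out, with $\mathbf{a}_i$ the linear acceleration (m/s²) and $\boldsymbol{\omega}_i$ the angular velocity (rad/s) at timestep $i$ of an $N$-step route, this definition reads (a direct transcription of the description above):

$$\mathrm{TSR} = \frac{1}{N}\sum_{i=1}^{N}\left(\lVert \mathbf{a}_i \rVert + \lVert \boldsymbol{\omega}_i \rVert\right)$$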
The relationship between success rate (SR) and terrain features reveals important insights about navigation challenges. While a 100% SR was achieved on the flat terrain, the more complex environments saw significant decreases—75% for Inspection world and 73% for Island world. This decline appears to be strongly correlated with specific terrain characteristics: the Inspection world’s combination of slopes and industrial obstacles created challenging navigation scenarios where the agent had to simultaneously manage inclines while maintaining safe distances from structures. The Island world’s extremely bumpy terrain, while having fewer obstacles, demonstrated how surface irregularity alone can substantially impact navigation success. These findings suggest that SR could potentially be improved by incorporating additional sensing modalities—particularly more sophisticated depth sensing to better characterize surface geometry ahead of the robot.
The velocity over rough terrain (VORT) metric, shown in Figure 7, proved to be a crucial indicator of navigation efficiency, with clear implications for real-world deployment. The baseline measurements (0.78 m/s on flat terrain, dropping to 0.44 m/s and 0.61 m/s on Inspection and Island worlds, respectively) demonstrate how terrain complexity and roughness directly impact achievable speeds. This metric particularly influenced the agent's decision-making process in Inspection world, where the lower VORT (0.44 m/s) reflects a more cautious navigation strategy adopted around obstacles and slopes. Interestingly, while the Island world had very uneven terrain, as shown in the vertical roughness graph in Figure 8, its relatively obstacle-free nature allowed for a higher VORT (0.61 m/s), suggesting that the agent learned to prioritize straight-line paths when possible, even over bumpy surfaces. This relationship between VORT and navigation strategy has important implications for real-world applications, particularly in scenarios where maintaining higher speeds must be balanced against vehicle stability and safety considerations.
Collectively, these observations show that our learned policy generalizes well across diverse conditions and can effectively handle off-road navigation in both simpler and more challenging unstructured domains. These results confirm that DUnE supports evaluation of autonomous agents in challenging off-road conditions. Future extensions will leverage DUnE’s capacity for diverse terrains and obstacles, robot models, and control architectures for advanced off-road navigation research.
4. Discussion
This work introduced DUnE, a framework that integrates diverse terrain models, dynamic obstacles, and automated metric logging with ROS 2, Gazebo, and the Gymnasium interface, resulting in a versatile toolkit for the advancement of RL research for off-road robotics. By providing multiple robot platforms, diverse world models, automated metric logging, and dynamic obstacle capabilities, this framework addresses key gaps in existing simulation tools. Initial experiments using a baseline SAC agent demonstrated that the system can train and evaluate navigation policies in complex unstructured settings.
The collected performance metrics, such as success rate (SR), total collisions (TC), and velocity over rough terrain (VORT), facilitate objective evaluations of performance. We provide an RL agent, implemented with SAC, as a baseline for future comparisons to more advanced methods. The current semi-automated method for inserting dynamic obstacles worked adequately, but further improvements to these model trajectories could enable more realistic, real-world-like scenarios.
These initial findings suggest several potential directions for future work, including refinement of obstacle models, richer terrain properties, and integration of more advanced learning algorithms. Adding new world models simply involves placing the world model file in the worlds folder and adding the name to the launch file. In addition to human actors, incorporating cars, trucks, and livestock as dynamic obstacles would improve the simulation utility for construction, industrial inspection, and agricultural environments. For camera observations, incorporating photogrammetry-based world models and particle emitter-based reduced visibility would improve visual realism for simulation experiments and sim2real efforts. Additionally, adding deformable vegetation such as tall grass would greatly improve traversability research.
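The world-selection step mentioned above typically amounts to a one-line change in a ROS 2 launch file. The following is an illustrative pattern only; the argument names, paths, and launch structure are placeholders rather than the repository's exact launch file:

```python
# Illustrative ROS 2 launch pattern for selecting a Gazebo Fortress world.
from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument, ExecuteProcess
from launch.substitutions import LaunchConfiguration, PathJoinSubstitution


def generate_launch_description():
    return LaunchDescription([
        DeclareLaunchArgument(
            "world", default_value="inspection.sdf",
            description="World file located in the worlds/ folder"),
        ExecuteProcess(
            cmd=["ign", "gazebo", "-r",
                 PathJoinSubstitution(["worlds", LaunchConfiguration("world")])],
            output="screen",
        ),
    ])
```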
As the framework is fully open-source and built with modular, widely adopted tools, it can be readily expanded and improved. With the Gymnasium interface, various RL architectures and algorithms can be incorporated and tested with minimal effort. To further enhance sim2real transfer capabilities, future development will focus on domain randomization techniques to bridge the reality gap. Implementing systematic sensor noise models that simulate real-world LiDAR artifacts, camera distortion, and IMU drift would better prepare algorithms for deployment on physical robots. Randomizing terrain properties such as friction coefficients and surface compliance would improve robustness to varying ground conditions. Coupled with the ROS 2 middleware and navigation stack, new robotic platforms and control strategies could be integrated to validate policies on actual hardware. In summary, DUnE provides a solid foundation for iterative experimentation and progress in off-road robotics research.