Figure 1.
Architecture of the LLM-enhanced software robot for ROS2-based platforms. The agent connects the user web interface with the underlying ROS2 ecosystem, combining a large language model, a command-planning layer, and ROS2 integration nodes that access topics, services, and logs. High-level natural-language requests are translated into structured ROS2 commands and missions, while feedback from the robot and infrastructure is aggregated and summarized back to the user.
Figure 2.
Mobile robot platform (ASTIbot) used for real-life validation experiments: (a) rear view with detailed components of the platform; (b) frontal view.
Figure 3.
Examples of omnidirectional motion patterns of the Mecanum-wheeled mobile platform.
Figure 4.
Global and robot-fixed coordinate frames used to define the mobile platform pose.
Figure 5.
Geometric definition of the pose-tracking error between the robot state and the desired goal.
Figure 6.
Mobile robot local web server and ROS node stack used as the integration layer for the AI control agent: (a) Vendor-provided web interface for manual base control and real-time status monitoring on the ASTIbot platform. (b) RViz visualization of the full robot model and sensor configuration.
Figure 7.
Overall performance comparison across LLMs. (a) Average response time, (b) generation speed (tokens/s), (c) quality score (0–100), (d) topic coverage percentage.
Figure 8.
Model reliability across evaluated LLMs, reporting the proportion of successful responses, errors, and timeouts in the ROS/ROS2 benchmarking environment.
Figure 9.
Heatmap showing quality scores across different ROS/ROS2 categories. Darker colors indicate better performance.
Figure 10.
Radar chart comparing the evaluated language models across multiple dimensions (quality, topic coverage, speed, reliability, and completeness).
Figure 11.
Model rankings based on weighted composite scores. (a) Overall ranking; (b) score breakdown by component.
Figure 12.
Overall performance comparison across LLMs. (a) Command-generation quality (score, 0–100), (b) interpretation quality (score, 0–100), (c) valid-command success rate (%), (d) valid-interpretation success rate (%).
Figure 13.
Category-wise performance of the evaluated LLMs. Each subplot reports per-category scores for (a) command-generation quality (0–100) and (b) interpretation quality (0–100). Higher command-generation scores indicate a larger proportion of syntactically and semantically valid ROS2 commands, while interpretation scores reflect how accurately models summarize and explain ROS feedback across navigation, perception, diagnostics, and system-configuration tasks.
Figure 14.
Radar plot summarizing command-level performance of the evaluated LLMs. Each axis corresponds to one evaluation dimension (command-generation mean score, interpretation mean score, both-valid rate, speed score, and model consistency). Larger, more regular polygons indicate models that achieve a more balanced trade-off between ROS2 command generation and robust interpretation of feedback across all metrics.
Figure 15.
Final command-level rankings of the evaluated LLMs based on the composite score that combines command-generation quality (25%), interpretation quality (25%), the rate of interactions where both command and interpretation are valid (30%), and speed (20%). Models with higher composite scores achieve a more favorable trade-off between generating correct ROS2 commands, reliably interpreting feedback, and responding within practical latency bounds.
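The weighted combination described in the Figure 15 caption can be sketched as follows (an illustrative snippet, not the authors' code; the function name and the assumption that all inputs are pre-normalized to a 0–100 scale are ours):

```python
def composite_score(gen_quality: float,
                    interp_quality: float,
                    both_valid_rate: float,
                    speed_score: float) -> float:
    """Command-level composite score with the stated weights:
    command-generation quality 25%, interpretation quality 25%,
    both-valid interaction rate 30%, speed 20%.
    All inputs are assumed normalized to a 0-100 scale."""
    return (0.25 * gen_quality
            + 0.25 * interp_quality
            + 0.30 * both_valid_rate
            + 0.20 * speed_score)
```

A model scoring 80/80 on quality but only 50% on both-valid interactions and 20 on speed would land at 59.0, showing how strongly the 30% both-valid term penalizes inconsistent command/interpretation pairs.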
Figure 16.
Web application used to interact with the AI control agents and to monitor the simulated TurtleBot3 mobile robot: (a) Home dashboard summarizing the status of connected ROS2 topics, services, and agents. (b) Log view displaying recent interactions, generated commands, and execution feedback used to test the agent’s abilities. (c) Chat interface where operators issue natural-language instructions that are translated into ROS2 commands. (d) Location management page for defining and storing frequently used waypoints in the simulated environment.
Figure 17.
ROS2 TurtleBot3 navigation simulation environment: (a) 3D simulated warehouse-like world in Gazebo with the TurtleBot3 platform. (b) RViz2 view showing the full ROS2 navigation stack, including the robot model, map, local and global paths, and active navigation goals.
Figure 18.
ROS2 computation graph of the mobile robot stack: navigation, perception, and low-level control nodes and their communication links.
Figure 19.
TF transform tree of coordinate frames used for localization, odometry, and motion control on the mobile platform.
Figure 20.
RViz view of the mobile robot in our laboratory.
Table 1.
Overview of physical and software (virtual) robots, their characteristics, their applications, and the impact of AI.
| Robots | Characteristics | Applications | AI Impact |
|---|---|---|---|
| Physical industrial robots | High-precision, repeatable motions; can handle heavy loads; continuous operation with minimal breaks; high upfront investment but long-term cost savings. | Assembly, welding, painting, and material handling in manufacturing Pavel and Stamatescu [15]; logistics and warehousing. | AI-based perception and control improve path planning and quality inspection and enable adaptation to variations in the environment. |
| Mobile service robots | Autonomous navigation in dynamic environments; interaction with humans and objects; equipped with sensors and onboard computing. | Hospital logistics, delivery robots, inspection and maintenance, hospitality and retail. | AI enables robust localization, mapping, and human–robot interaction, extending deployment to unstructured and crowded spaces. |
| Software process robots (RPA) | Operate in purely digital environments; mimic human interactions with user interfaces and APIs; highly scalable and easily replicable. | Back-office processes (invoice handling, order processing, report generation), integration of legacy IT systems. | ML and NLP allow bots to handle semi-structured data, classify documents, and make context-aware decisions. |
| Cloud orchestration and infrastructure bots | Monitor and manage distributed computational resources; automatically deploy, scale, and update services. | Fleet management platforms for robots, cloud-based perception or planning services, CI/CD pipelines for robotic software. | AI-driven policies optimize resource allocation, predict load, and enable self-healing infrastructures that support large-scale robotic deployments. |
| Digital twin and simulation agents | Virtual replicas of physical robots and environments Rosioru et al. [16]; run accelerated simulations and what-if scenarios. | Design and testing of robotic cells, optimization of production lines, energy and throughput analysis, operator training. | AI uses simulation data for policy learning and transfer learning to improve real-world performance. |
| Data-analysis and monitoring agents | Continuously analyze sensor and log data; detect anomalies and predict failures; provide decision-support dashboards. | Predictive maintenance for robots, quality control, energy optimization, safety monitoring in industrial plants. | Advanced ML models increase accuracy of failure prediction, reduce false alarms, and support proactive interventions that minimize downtime. |
| Human-facing conversational agents | Natural-language interfaces to robotic systems; support operators and end-users with explanations and task specification. | Voice or chat interfaces for configuring robots, training support, remote assistance and teleoperation. | LLMs enable more intuitive multimodal interaction, automatic documentation, and human–machine collaboration. |
Table 3.
Hardware platforms and technical specifications.
| Computer model | ASUS Vivobook Pro 15 OLED | NVIDIA Jetson Orin Nano 4 GB |
|---|---|---|
| CPU | AMD Ryzen 9 5900HX | 6-core Arm Cortex-A78AE v8.2 |
| GPU | NVIDIA GeForce RTX 3050 | 512-core NVIDIA Ampere architecture GPU with 16 tensor cores |
| Memory | Dedicated GPU memory 4 GB | Shared RAM memory 4 GB |
| OS | MS Windows 11 Pro | Ubuntu 20.04.6 LTS |
| Role | Running software LLM agents | Controlling hardware mobile platform |
Table 4.
Mobile platform wheel geometry and roller angles used in the kinematic model.
| Wheel Index | Position | Associated Velocity | Angle of Rollers |
|---|---|---|---|
| 1 | Front-left | ω₁ | −45° |
| 2 | Front-right | ω₂ | 45° |
| 3 | Rear-left | ω₃ | 45° |
| 4 | Rear-right | ω₄ | −45° |
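With the roller angles above, the wheel angular velocities follow the standard Mecanum inverse kinematics. The form below is a sketch under assumed symbols (wheel radius $r$, half track width $l_y$, half wheelbase $l_x$, body-frame velocities $v_x$, $v_y$, yaw rate $\omega_z$); the exact sign convention depends on the frame definitions in Figure 4:

```latex
\begin{aligned}
\omega_1 &= \tfrac{1}{r}\bigl(v_x - v_y - (l_x + l_y)\,\omega_z\bigr) && \text{(front-left)}\\
\omega_2 &= \tfrac{1}{r}\bigl(v_x + v_y + (l_x + l_y)\,\omega_z\bigr) && \text{(front-right)}\\
\omega_3 &= \tfrac{1}{r}\bigl(v_x + v_y - (l_x + l_y)\,\omega_z\bigr) && \text{(rear-left)}\\
\omega_4 &= \tfrac{1}{r}\bigl(v_x - v_y + (l_x + l_y)\,\omega_z\bigr) && \text{(rear-right)}
\end{aligned}
```

Setting $v_x = 0$, $\omega_z = 0$ and $v_y \neq 0$ yields the pure lateral (strafing) motion shown among the patterns of Figure 3.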
Table 5.
Hardware components and specifications of the mobile robot platform.
| Name | Description |
|---|---|
| NVIDIA Jetson Orin Nano | |
| LsLiDAR M10 Scanner | TOF (Time-of-Flight) 2D scanning of the surrounding 360° environment; measuring frequency: 10 Hz; accuracy: ±3 cm; maximum range: 25 m |
| ASTRA Depth Camera | Distance range: 0.6–8 m; RGB image resolution: 1920 × 1080 @ 30 fps; depth image resolution: 640 × 480 @ 30 fps; depth precision: ±3 mm @ 1 m |
| MD60 100 W DC Motor | Chassis wheel independent suspension with shock-absorber damper; reduction ratio: 1:18; motor: brushed DC, rated voltage 24 V, rated power 100 W; rated current: 6 A; rated speed: 175 rpm; rated torque: 55.7 kg·cm |
| Omnidirectional Wheels (Mecanum) | Material: stainless steel + rubber; weight: 700 g per wheel; diameter: 152.4 mm; width: 55.5 mm; roller dimensions: 20.4 mm × 41.3 mm |
| Power Supply | 2 × VRLA Ultracell 12 V battery, 22 Ah (UCG22-12) |
| Total Weight | 45 kg |
| Payload | ∼30 kg |
| Dimensions (W × H × D) | 73.5 cm × 62 cm × 100 cm |
Table 6.
Representative ROS/ROS2 benchmark prompts by skill level and topic category.
| Level | Category | Representative Prompt | Main Topics |
|---|---|---|---|
| Beginner | Basics | Difference between ROS and ROS2? When to use each? | DDS, real-time, lifecycle |
| Intermediate | Services | Topics vs. services + Python service example | request–response, srv, client–server |
| Advanced | Actions | Action server with feedback (Python 3) | goal, feedback, result, ActionServer |
| Expert | Security | Implement SROS2 for production system | DDS security, certificates, encryption |
| Expert | Migration | Best strategy: large ROS 1 → ROS 2 migration | ros1_bridge, gradual migration |
| Troubleshooting | Debugging | Nodes cannot see each other—troubleshoot | domain_id, discovery, firewall |
Table 7.
Natural-language robot queries with expected ROS topics and command examples.
| Category | Representative User Query | Key Expected Topics | Typical ROS Command |
|---|---|---|---|
| Position | Where is the robot right now? | pose, position, location, odom, coordinates | ros2 topic echo /robot_pose |
| Battery | What is the battery level? | battery, percentage, voltage, charging | ros2 topic echo /battery_status |
| Status | What is the robot status? | status, state, activity, idle, navigating | ros2 topic echo /robot_status |
| Sensors | Are there any obstacles around the robot? | scan, lidar, obstacles, range, laser | ros2 topic echo /scan |
| Command Generation | Send the robot to the assembly station | navigate, goto, position, cmd_vel, service | ros2 service call /goto_position |
Table 8.
Overall model ranking based on the weighted composite benchmark score.
| Rank | Model | Quality | Coverage | Speed | Composite |
|---|---|---|---|---|---|
| 1 | Gemma3-qat | 85.57 | 48.58 | 15.99 | 62.72 |
| 2 | Gemma3 | 84.61 | 47.62 | 16.03 | 62.11 |
| 3 | SmolLM2 | 62.50 | 14.48 | 86.04 | 58.42 |
| 4 | Llama 3.2 | 80.76 | 34.74 | 18.35 | 57.36 |