Review

The Evolution of the Robot Operating System Communication Ecosystem: An Overview of the DDS Architecture and Emerging Communication Protocols

1 School of Computer Science and Artificial Intelligence, Civil Aviation Flight University of China, Guanghan 618307, China
2 Guanghan Flight College, Civil Aviation Flight University of China, Guanghan 618307, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(12), 2632; https://doi.org/10.3390/electronics15122632 (registering DOI)
Submission received: 12 May 2026 / Revised: 9 June 2026 / Accepted: 12 June 2026 / Published: 14 June 2026

Abstract

As robotic systems evolve toward large-scale distributed architectures and cloud-edge collaboration, communication middleware has become critical infrastructure that affects system real-time performance and scalability. The traditional Robot Operating System 1 (ROS 1) communication architecture relies on a centralized master node, which limits it in dynamic network environments. Robot Operating System 2 (ROS 2) achieves decentralized communication by introducing the Data Distribution Service (DDS). However, DDS alone remains inadequate for cross-network communication and high-performance local data exchange. ROS communication research currently faces a specific problem: multiple mechanisms coexist without a unified analytical framework or guidance for selecting among them. This paper systematically traces the evolution of the ROS communication architecture from centralized to distributed systems and constructs a unified analytical framework covering two dimensions: communication models and data transmission paths. Crucially, because cross-protocol comparisons assembled from heterogeneous literature are unreliable, this paper designs and executes a set of unified benchmark experiments on a controlled testbed. These experiments evaluate two mainstream DDS implementations (CycloneDDS and FastDDS) on five key metrics: latency, throughput, jitter, scalability, and packet loss rate under load. The performance of three transmission modes (standard DDS, shared memory, and zero-copy) is also compared. Based on this evaluation, this paper summarizes the performance characteristics of the different mechanisms and proposes an optimization-based method for quantitatively selecting communication middleware under different workloads and application requirements.
This paper provides a systematic reference for the design and optimization of ROS communication systems and offers guidance for promoting the application of multi-middleware collaborative architectures in robotic systems.

1. Introduction

With the advancement of robotics in fields such as industrial automation and autonomous driving, the software complexity and scale of collaboration in robotic systems continue to increase [1,2]. Communication middleware has become a core component of robotic software system architectures, and its design directly impacts the system’s real-time performance, scalability, and reliability [3].
As a standard middleware platform for robotics software development, the Robot Operating System (ROS) has been widely used in academic research and industrial development since its release in 2007 [4]. The communication architecture of early ROS 1 was designed primarily for single robots or small-scale systems [4]. Although this architecture offered high usability in early research, it has revealed significant limitations in large-scale distributed robotic systems, real-time control, and safety-related applications [5]. ROS 1 employs a centralized discovery mechanism based on a master node: node registration and connection negotiation are implemented via XML-RPC, and data transmission is carried out using TCPROS or UDPROS [4]. ROS 2 underwent an architectural upgrade that introduced a distributed communication model based on the Data Distribution Service (DDS). DDS is a standardized publish-subscribe communication middleware specified by the Object Management Group (OMG). It provides a wide range of quality of service policies and supports decentralized node discovery, thereby significantly enhancing system scalability and real-time communication capabilities [6,7]. A significant body of research has focused on the performance differences among various DDS implementations [8,9].
Most existing research focuses on evaluating the performance of a single communication mechanism or on localized optimization for specific application scenarios. There is currently no analytical framework that characterizes the various communication mechanisms (such as DDS, Zenoh, and Iceoryx) in a unified way along key dimensions such as communication models and data transmission paths. Furthermore, within a ROS communication ecosystem where multiple mechanisms coexist, there is still no systematic guidance on how to select appropriate mechanisms and design them to work together for different application scenarios (such as local high-frequency communication, cross-network communication, or mixed-criticality systems).
Robotic system architectures are gradually shifting toward edge computing and cloud-native approaches, and the ROS communication ecosystem is becoming increasingly diverse [3]. Emerging communication technologies such as Zenoh and Iceoryx are being progressively integrated into the ROS ecosystem, providing support for cross-network data exchange and high-performance local communication, respectively. These technologies offer new communication paradigms and avenues for performance optimization in robotic systems [10]. The ROS communication ecosystem is thus evolving into a complex system composed of multiple middleware technologies. A systematic review of the development of ROS communication technologies is therefore needed, together with a comparative analysis of the performance characteristics and application scenarios of the different middleware, to clarify their respective scopes of application.
To date, no study has conducted a comprehensive performance comparison and application selection analysis of DDS, Zenoh, and Iceoryx from the dual perspectives of communication models and data transmission paths. This paper presents a systematic review of the development process of the ROS communication ecosystem. It analyzes the evolution of ROS communication mechanisms from the perspective of architectural evolution. It systematically examines the communication model of ROS 1 and the distributed communication architecture of ROS 2, and, in conjunction with the research methodology framework shown in Figure 1, summarizes the key changes in the design philosophy of the ROS communication system.
In response to the aforementioned research gaps, this paper focuses on three main aspects: architectural modeling, performance evaluation, and optimization-based middleware selection. The main contributions of this paper can be summarized as follows:
  • From the perspective of architectural evolution, systematically trace the technical development path of ROS communication from centralized to distributed systems, and summarize key design turning points.
  • Establish a unified classification framework for communication middleware, and conduct a standardized analysis of DDS, Zenoh, and Iceoryx based on two dimensions: communication models and data transmission paths.
  • Conduct original, unified benchmark experiments: In contrast to existing studies that aggregate results from heterogeneous literature, this paper designs and performs systematic performance measurements on a controlled testbed. The experiments cover message sizes from 64 B to 8 MB and node counts from 1 to 50, providing reliable cross-protocol comparisons of CycloneDDS and FastDDS in terms of latency, throughput, jitter, scalability, packet loss under publication rate load, and the performance of shared memory and zero-copy modes for large messages.
  • Propose an optimization-based middleware selection method that models middleware selection as a constrained optimization problem based on message size, subscriber scale, and application performance requirements. By combining the analytical model with experimentally derived performance parameters, the method enables quantitative middleware selection and a priori performance estimation for ROS communication systems.
  • Summarize the key challenges of ROS communication regarding multi-middleware collaboration, real-time performance assurance, and resource constraints, and outline future development directions.

2. The Evolution of ROS Communication Mechanisms

The ROS communication system has undergone significant architectural evolution over the past decade or so [3,11]. This evolution has ranged from the centralized master-based communication model of ROS 1 to the distributed communication architecture using DDS in ROS 2.
The ROS communication system has gradually evolved from a simple node-to-node interconnection mechanism into a complex communication ecosystem characterized by high scalability, configurable Quality of Service (QoS), and support for multiple middleware solutions [3]. This chapter analyzes the design features and limitations of ROS 1 communication mechanisms from the perspective of architectural evolution. It further introduces the DDS-based communication model in ROS 2 and the design philosophy behind the ROS Middleware Interface (RMW) abstraction layer.

2.1. Communication Architecture and Limitations of ROS 1

2.1.1. Design of the TCPROS/UDPROS Protocols

In ROS 1, data exchange between nodes is primarily achieved through two communication protocols: TCPROS and UDPROS [3]. TCPROS is a reliable transmission mechanism based on the TCP protocol that ensures packets are transmitted in order and in their entirety. As a result, it is commonly used for the exchange of control information and state data that require high reliability. UDPROS is implemented based on the UDP protocol and uses a best-effort transmission method, which can reduce communication latency to some extent [12].
From an implementation perspective, both protocols use the ROS custom message format for data encapsulation. The communication process typically involves steps such as message serialization, network transmission, and deserialization. This process introduces some CPU overhead and data copying overhead [13]. In scenarios with low bandwidth or high-frequency data streams, serialization and network transmission may become performance bottlenecks [14]. Furthermore, both TCPROS and UDPROS rely on the ROS Master to perform node discovery and establish connections. This requires additional network interactions during the communication initialization process. Since the ROS 1 communication protocols do not provide a unified QoS control mechanism, it is difficult for application developers to fine-tune communication behavior according to different task requirements [15].
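The serialization and copying steps described above can be made concrete with a small sketch. It frames a simplified message in the way TCPROS does on the wire, with a 4-byte little-endian length prefix before each serialized message. The three-float64 message layout and all function names are illustrative assumptions; real ROS 1 client libraries generate serializers from .msg definitions and also exchange a connection header, which is omitted here.

```python
import struct
from typing import List, Tuple


def serialize_vector3(x: float, y: float, z: float) -> bytes:
    """Pack three float64 fields little-endian (copy 1: Python floats -> bytes)."""
    return struct.pack("<3d", x, y, z)


def deserialize_vector3(buf: bytes) -> Tuple[float, float, float]:
    """Rebuild the fields on the receiving side (another pass over the data)."""
    return struct.unpack("<3d", buf)


def frame_message(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte little-endian length (copy 2: concatenation)."""
    return struct.pack("<I", len(payload)) + payload


def read_frames(stream: bytes) -> List[bytes]:
    """Split a received byte stream into payloads (copy 3: each slice is a new object)."""
    frames: List[bytes] = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("<I", stream, offset)
        offset += 4
        if offset + length > len(stream):
            raise ValueError("truncated frame")
        frames.append(stream[offset:offset + length])
        offset += length
    return frames


stream = frame_message(serialize_vector3(1.0, 2.0, 3.0)) + frame_message(serialize_vector3(4.0, 5.0, 6.0))
print([deserialize_vector3(f) for f in read_frames(stream)])
```

Even in this minimal form, each message is packed, concatenated with its length prefix, and sliced out again on receipt, which is the kind of per-message copying overhead that grows with payload size.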

2.1.2. Centralized Discovery Mechanism

A key feature of the ROS 1 communication architecture is its centralized node discovery mechanism [3]. During system startup, all ROS nodes must first register with the ROS Master and declare the topics they publish or subscribe to (along with their message types) and the services they provide. When a node needs to communicate with another node, it queries the ROS Master for that node's information. Communication parameters are then negotiated between the nodes via the XML Remote Procedure Call (XML-RPC) protocol, ultimately establishing a point-to-point data transmission connection.
As shown in Figure 2, the ROS Master serves as the central coordinator of the entire communication system and therefore constitutes a single point of failure [14]. If the Master node fails or its network connection is interrupted, nodes in the system can no longer establish new communication connections. As the system scales, node registration and query requests are concentrated on the Master node, which may create a performance bottleneck. In dynamic network environments, such as multi-robot collaboration or mobile networks, frequent node joins and departures further increase the load on the Master. As robot systems continue to grow, the centralized discovery mechanism of ROS 1 struggles to meet the demands of large-scale distributed systems [16].
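The registration and lookup flow of the centralized Master can be sketched as an in-process registry. This is a hypothetical simplification: the method names loosely mirror the ROS Master API calls registerPublisher and registerSubscriber, but XML-RPC, caller URIs, and message types are omitted, and the `online` flag simply simulates a Master failure.

```python
from typing import Dict, List, Set


class MasterUnavailable(Exception):
    """Raised when the single Master cannot be reached."""


class ToyMaster:
    """In-process stand-in for the ROS 1 Master's topic registry (no XML-RPC, no types)."""

    def __init__(self) -> None:
        self.online = True
        self.publishers: Dict[str, Set[str]] = {}   # topic -> publishing nodes
        self.subscribers: Dict[str, Set[str]] = {}  # topic -> subscribing nodes

    def _require_online(self) -> None:
        if not self.online:
            raise MasterUnavailable("Master is down: new connections cannot be negotiated")

    def register_publisher(self, node: str, topic: str) -> List[str]:
        """Record a publisher and return the current subscribers of the topic."""
        self._require_online()
        self.publishers.setdefault(topic, set()).add(node)
        return sorted(self.subscribers.get(topic, set()))

    def register_subscriber(self, node: str, topic: str) -> List[str]:
        """Record a subscriber and return the publishers it should contact directly."""
        self._require_online()
        self.subscribers.setdefault(topic, set()).add(node)
        return sorted(self.publishers.get(topic, set()))


master = ToyMaster()
master.register_publisher("/lidar_driver", "/scan")
print(master.register_subscriber("/slam", "/scan"))  # ['/lidar_driver']
master.online = False                                 # simulate a Master crash
try:
    master.register_subscriber("/planner", "/scan")
except MasterUnavailable as err:
    print(err)
```

Every lookup passes through one object, so when it goes offline no new subscriber can discover its publishers, which is the single-point-of-failure behavior described above.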

2.1.3. Real-Time Performance and Security Issues

ROS 1 was originally designed primarily for research-oriented robotics platforms, and its communication system was not specifically optimized for real-time control systems [14]. In control loops requiring strict time constraints, such as robotic motion control or autonomous driving systems, the ROS 1 communication mechanism struggles to provide deterministic time guarantees [17]. ROS 1 also has significant shortcomings in terms of security mechanisms. By default, communication between ROS nodes does not provide authentication, access control, or data encryption mechanisms [18]. This open communication model may pose security risks in industrial systems or open network environments [19]. These limitations prompted the ROS community to explore new communication architectures, thereby driving the development of ROS 2.

2.2. Communication Innovations in ROS 2

To address the shortcomings of ROS 1 described above, ROS 2 underwent a major overhaul of its communication architecture. Most notably, it introduced a distributed communication model based on DDS. By adopting standardized publish-subscribe middleware, ROS 2 enables automatic node discovery and communication management without the need for a central node. This provides developers with more robust QoS configuration capabilities [19].

2.2.1. Native Integration of the DDS Standard

DDS is a data-oriented publish-subscribe communication standard that is widely used in real-time distributed systems such as aerospace, industrial control, and autonomous driving systems [15,20]. ROS 2 uses DDS as its underlying communication middleware. ROS nodes can exchange data through DDS entities such as participants, publishers, and subscribers.
In the DDS communication model, nodes within the system exchange data through a set of standardized entities [21]: DomainParticipant, Publisher, Subscriber, DataWriter, and DataReader. ROS 2 maps these DDS entities to ROS communication concepts, so ROS nodes can use DDS as an efficient data distribution mechanism while retaining the familiar publish-subscribe model.
As shown in Figure 3, in a ROS 2 system, nodes no longer rely on a central coordinating node such as the ROS Master. Instead, they use DDS’s automatic discovery mechanism to detect other nodes and establish communication links. DDS uses a distributed discovery protocol to broadcast node information across the network [15]. New communication participants can automatically detect existing nodes and establish data channels. This mechanism significantly enhances the system’s adaptability in dynamic network environments. For example, in multi-robot collaboration systems or mobile robot networks, the communication topology can automatically update when new nodes join or leave the system, without the need for central control.
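By contrast, DDS-style discovery can be sketched without any central registry. In this hypothetical model, each participant keeps its own view of its peers and performs endpoint matching locally by topic and type; `SimulatedNetwork` only stands in for multicast delivery of announcements. Real DDS discovery (SPDP and SEDP in the DDS-RTPS specification) also involves periodic announcements, lease expiry, and QoS matching, none of which are modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

TopicKey = Tuple[str, str]  # (topic name, type name)


@dataclass
class Participant:
    name: str
    writers: Set[TopicKey] = field(default_factory=set)
    readers: Set[TopicKey] = field(default_factory=set)
    # Each participant keeps its own view of remote peers; there is no shared registry.
    peers: Dict[str, "Participant"] = field(default_factory=dict, repr=False, compare=False)

    def matched_writers(self) -> List[str]:
        """Names of peers with a writer whose (topic, type) matches one of our readers."""
        return sorted(p.name for p in self.peers.values() if p.writers & self.readers)


class SimulatedNetwork:
    """Models a multicast segment: it only fans out announcements and makes no matching decisions."""

    def __init__(self) -> None:
        self.attached: Dict[str, Participant] = {}

    def join(self, newcomer: Participant) -> None:
        for other in self.attached.values():
            other.peers[newcomer.name] = newcomer  # newcomer's announcement reaches everyone
            newcomer.peers[other.name] = other     # and it hears the others' announcements
        self.attached[newcomer.name] = newcomer

    def leave(self, name: str) -> None:
        # In DDS this would follow an explicit announcement or lease expiry.
        self.attached.pop(name, None)
        for other in self.attached.values():
            other.peers.pop(name, None)


net = SimulatedNetwork()
camera = Participant("camera", writers={("image_raw", "sensor_msgs/msg/Image")})
detector = Participant("detector", readers={("image_raw", "sensor_msgs/msg/Image")})
net.join(camera)
net.join(detector)
print(detector.matched_writers())  # ['camera']
```

When `camera` leaves, `detector` updates only its own view; no other component has to be consulted or reconfigured.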
Our benchmark experiments (Section 5.2) further quantify the practical performance of this architecture: CycloneDDS delivers 32–48 μs latency for messages ≤ 16 KB, while FastDDS excels at large messages (e.g., 2.18 ms at 4 MB). The scalability results confirm that CycloneDDS degrades more gracefully at 50 subscribers.
In addition to its decentralized discovery mechanism, DDS offers a rich set of QoS policies, so communication behavior can be configured flexibly according to application requirements. Typical QoS policies include reliability, history, durability, deadline, and lifespan (the period after which a message is considered stale) [19]. These policies allow developers to balance system reliability, communication latency, and network resource utilization.
In practical robot systems, different types of sensors have significantly varying requirements for communication performance [22]. For example, LiDAR and camera data typically require high data throughput, whereas inertial measurement unit (IMU) data prioritizes stable publication rates and low latency. To accommodate these diverse needs, ROS 2 allows developers to configure QoS policies independently for each topic, ensuring system stability while optimizing communication performance.
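Per-topic QoS settings also have to agree between publishers and subscribers. DDS applies a request-versus-offered rule: a DataWriter must offer at least the level a DataReader requests, so, for example, a best-effort writer cannot serve a reliable reader. The sketch below encodes that rule for reliability, durability, and deadline only; the `QoS` class and the example profiles are illustrative assumptions, not the rclpy QoS API and not the entries of Table 1.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Reliability(IntEnum):
    BEST_EFFORT = 0  # weaker guarantee
    RELIABLE = 1     # stronger guarantee


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


@dataclass(frozen=True)
class QoS:
    reliability: Reliability
    durability: Durability = Durability.VOLATILE
    depth: int = 10                      # KEEP_LAST history depth (a local setting)
    deadline_ms: Optional[float] = None  # None means no deadline


def incompatibilities(offered: QoS, requested: QoS) -> List[str]:
    """List the policies for which a writer's offer does not satisfy a reader's request."""
    problems: List[str] = []
    if offered.reliability < requested.reliability:
        problems.append("reliability")
    if offered.durability < requested.durability:
        problems.append("durability")
    # A writer promising a sample every 10 ms cannot satisfy a reader requiring one every 5 ms.
    if requested.deadline_ms is not None and (
        offered.deadline_ms is None or offered.deadline_ms > requested.deadline_ms
    ):
        problems.append("deadline")
    return problems


# Illustrative profiles (hypothetical values).
lidar_writer = QoS(Reliability.BEST_EFFORT, depth=5)
recorder_reader = QoS(Reliability.RELIABLE)
print(incompatibilities(lidar_writer, recorder_reader))  # ['reliability']
```

History depth is not checked because it only sizes each endpoint's local buffer and does not affect whether a writer and reader can be matched.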
Table 1 summarizes the QoS configuration strategies for common sensor data streams in ROS 2.
The performance impacts described in the last column of Table 1 were partially quantified by our unified benchmarks (Section 5.2). For small-to-medium-sized messages (64 B–16 KB), CycloneDDS achieved an average latency of 32–48 μs, which is consistent with the “60–80% reduction in latency” listed for LiDAR point clouds relative to the ROS 1 TCPROS baseline. For RGB cameras, zero-copy transmission reduced the average latency of 8 MB messages from 2264 μs (standard DDS) to 1006 μs, a 56% reduction. This improvement matches the expected gain from eliminating serialization and redundant memory copies, and thus indirectly supports the qualitative statement in Table 1 that “zero-copy eliminates serialization overhead.” For IMU data, the reliable QoS setting with keep_last = 10 and a 10 ms cutoff is consistent with our jitter measurements: at a 1 MB message size, both CycloneDDS and FastDDS exhibit a coefficient of variation below 20%, indicating temporal stability suitable for 100 Hz IMU data streams. The remaining entries (haptic, ToF, radar, sonar, GPS, and magnetometer) are qualitative recommendations based on sensor characteristics and are left for future experimental validation.
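The 56% figure and the coefficient-of-variation criterion above can be checked with simple arithmetic. The sketch reproduces the reduction from the reported mean latencies and defines the coefficient of variation used as the jitter metric. Whether Section 5.2 uses the population or the sample standard deviation is not stated here, so the population form is an assumption, and the latency samples in the second example are hypothetical.

```python
import statistics
from typing import Sequence


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Jitter metric: population standard deviation divided by the mean, in percent."""
    return 100.0 * statistics.pstdev(samples) / statistics.fmean(samples)


# 8 MB messages: standard DDS vs. zero-copy mean latency in microseconds (reported values).
print(round(percent_reduction(2264.0, 1006.0), 1))  # 55.6

# Hypothetical latency samples in microseconds, only to exercise the CV formula.
print(round(coefficient_of_variation([10.0, 11.0, 9.0, 10.0]), 2))  # 7.07
```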
By appropriately configuring QoS parameters, a good balance can be achieved between system reliability, communication latency, and bandwidth utilization. Because DDS was designed for real-time distributed systems and has a mature track record in industrial applications, ROS 2 can meet the communication requirements of real-time systems to a certain extent [20]. In a real-time operating system environment, DDS can reduce communication latency jitter through priority scheduling and network resource management mechanisms [14], thereby improving the temporal determinism of the control system. This capability has led to the gradual adoption of ROS 2 in scenarios with high communication reliability requirements, such as industrial robotics and autonomous driving [23,24].
By natively integrating the DDS communication standard, ROS 2 resolves the centralization bottleneck present in the ROS 1 communication architecture. It also significantly enhances system scalability, communication flexibility, and real-time performance. Furthermore, it provides a unified architectural foundation for the future integration of new communication protocols, such as Zenoh and Iceoryx.

2.2.2. Design and Significance of the RMW Interface

While introducing DDS as the underlying communication mechanism, ROS 2 also introduced the RMW abstraction layer [3]. Through this layer, different DDS implementations can be integrated into ROS 2 as plugins without modifying upper-layer application code [3,14]. The RMW layer also provides a crucial foundation for extending the ROS communication architecture: emerging communication technologies (such as Zenoh) can be integrated with ROS 2 by implementing the RMW interface, which expands the system's communication capabilities while keeping the ROS programming model unchanged [25].
As shown in Figure 4, ROS application nodes call a unified communication interface through client libraries (such as rclcpp or rclpy). The RMW layer then maps the actual data transmission to the corresponding DDS implementation, such as CycloneDDS, FastDDS, or RTI Connext. This design allows ROS 2 to use the most appropriate communication middleware for each application scenario. Because the application code stays the same across implementations, the RMW layer also enables a fair comparison of different DDS implementations under identical conditions. Using this framework, our unified benchmarks in Section 5.2 reveal key performance differences: CycloneDDS achieves lower latency for small-to-medium messages (e.g., 32.54 μs at 16 KB vs. 48.31 μs for FastDDS), while FastDDS outperforms CycloneDDS for large messages (2.18 ms vs. 9.10 ms at 4 MB). Throughput under saturation and packet loss rates under load further distinguish the two implementations.
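The plugin idea behind RMW can be illustrated with a toy interface and two interchangeable backends. Everything here is hypothetical: `ToyMiddleware`, both backends, and the `TOY_RMW` environment variable are stand-ins. In real ROS 2 the backend is chosen with the `RMW_IMPLEMENTATION` environment variable (for example `rmw_cyclonedds_cpp` or `rmw_fastrtps_cpp`), and the interface is a C API rather than a Python class.

```python
import os
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple, Type

Callback = Callable[[bytes], None]


class ToyMiddleware(ABC):
    """Hypothetical stand-in for the interface that client libraries program against."""

    @abstractmethod
    def publish(self, topic: str, data: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, topic: str, callback: Callback) -> None: ...


class DirectDelivery(ToyMiddleware):
    """Backend 1: delivers each message by calling subscriber callbacks directly."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callback]] = {}

    def publish(self, topic: str, data: bytes) -> None:
        for callback in self._subs.get(topic, []):
            callback(data)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subs.setdefault(topic, []).append(callback)


class QueuedDelivery(DirectDelivery):
    """Backend 2: different internals (a FIFO drained on publish), same interface."""

    def __init__(self) -> None:
        super().__init__()
        self._pending: Deque[Tuple[str, bytes]] = deque()

    def publish(self, topic: str, data: bytes) -> None:
        self._pending.append((topic, data))
        while self._pending:
            queued_topic, queued_data = self._pending.popleft()
            super().publish(queued_topic, queued_data)


BACKENDS: Dict[str, Type[ToyMiddleware]] = {"direct": DirectDelivery, "queued": QueuedDelivery}


def load_middleware(name: Optional[str] = None) -> ToyMiddleware:
    """Pick a backend by name, else from the hypothetical TOY_RMW variable."""
    chosen = name or os.environ.get("TOY_RMW", "direct")
    return BACKENDS[chosen]()


def talker_listener(mw: ToyMiddleware) -> List[bytes]:
    """Application code: written once, unaware of which backend it runs on."""
    received: List[bytes] = []
    mw.subscribe("chatter", received.append)
    mw.publish("chatter", b"hello")
    return received


for backend in BACKENDS:
    print(backend, talker_listener(load_middleware(backend)))
```

The `talker_listener` function runs unchanged on either backend, which is the property that lets benchmarks swap DDS implementations without touching application code.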
The evolution from ROS 1 to ROS 2 essentially reflects a shift in the communication architecture from “centralized coordination” to “distributed data-driven” communication. In ROS 1, communication control relied on a central node, resulting in high system coupling and limited scalability. By introducing DDS, ROS 2 shifts communication control logic to distributed middleware, enabling node autonomy and a data-driven communication model. Simultaneously, the RMW abstraction layer decouples communication mechanisms, granting the system the ability to extend to multiple middleware solutions. This evolution represents not only a performance optimization but also a fundamental shift in communication architecture design paradigms, from “centralized control” to “decentralized and pluggable.”

2.2.3. Security Enhancements in ROS 2

Compared with ROS 1, ROS 2 introduces comprehensive security mechanisms to address the increasing cybersecurity requirements of modern robotic systems [19]. While ROS 1 primarily relies on network isolation and external security measures, ROS 2 incorporates security capabilities through the DDS Security specification, providing built-in protection for distributed communication.
DDS Security defines a standardized security architecture consisting of authentication, access control, and cryptographic protection. Authentication mechanisms verify the identities of communication participants through Public Key Infrastructure (PKI)-based certificates, preventing unauthorized nodes from joining the system. Access control policies regulate the permissions of publishers and subscribers, ensuring that only authorized entities can access specific topics or services. In addition, cryptographic services provide message encryption and integrity verification to protect communication data against eavesdropping, tampering, and replay attacks.
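The access-control and integrity ideas can be sketched with the Python standard library. This is only an illustration under stated assumptions: the `PERMISSIONS` dictionary stands in for the signed governance and permissions documents used by DDS Security, and HMAC-SHA256 is swapped in for the AES-GCM/GMAC transforms of the builtin cryptographic plugin. The sketch provides integrity checking only, with no encryption, identity authentication, or replay protection.

```python
import hashlib
import hmac
from typing import Dict, Set

# Hypothetical permissions: identity -> action -> topics it may use.
PERMISSIONS: Dict[str, Dict[str, Set[str]]] = {
    "lidar_driver": {"publish": {"/scan"}, "subscribe": set()},
    "planner": {"publish": {"/cmd_vel"}, "subscribe": {"/scan"}},
}


def is_allowed(identity: str, action: str, topic: str) -> bool:
    """Default-deny: unknown identities, actions, or topics are rejected."""
    return topic in PERMISSIONS.get(identity, {}).get(action, set())


def protect(key: bytes, payload: bytes) -> bytes:
    """Append a 32-byte HMAC-SHA256 tag (stand-in for the DDS Security crypto transforms)."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, message: bytes) -> bytes:
    """Return the payload if its tag is valid; raise on any modification."""
    payload, tag = message[:-32], message[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload


print(is_allowed("planner", "subscribe", "/scan"))       # True
print(is_allowed("lidar_driver", "subscribe", "/scan"))  # False
```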
To facilitate practical deployment, ROS 2 introduces Secure ROS 2 (SROS2), which provides tools for certificate generation, key management, and policy configuration. By integrating DDS Security with the ROS 2 middleware framework, SROS2 enables secure communication without requiring significant modifications to application-level code. However, security mechanisms inevitably introduce additional computational overhead due to encryption, authentication, and key management operations. Therefore, achieving an appropriate balance between communication security and real-time performance remains an important consideration in industrial and safety-critical robotic applications.

2.2.4. Comparison of Mainstream DDS Implementations

Among the DDS implementations supported by ROS 2, FastDDS, CycloneDDS, and RTI Connext are the most widely adopted solutions.
CycloneDDS is widely recognized for its lightweight architecture and efficient handling of small and medium-sized messages. The benchmark results presented in Section 5.2 demonstrate that CycloneDDS achieves lower latency, lower CPU utilization, and better scalability than FastDDS in most small-message and multi-subscriber scenarios. Furthermore, CycloneDDS exhibits lower packet loss under overload conditions, indicating stronger robustness in resource-constrained and large-scale deployments.
FastDDS is the default DDS implementation in many ROS 2 distributions and provides excellent compatibility with the ROS ecosystem. While its latency performance for small messages is slightly inferior to CycloneDDS, it demonstrates superior performance when handling very large messages, owing to its optimized fragmentation and flow-control mechanisms. Therefore, FastDDS is particularly suitable for data-intensive applications involving cameras, LiDARs, and large sensor streams.
RTI Connext is a commercial DDS implementation that has been extensively adopted in aerospace, defense, medical, and industrial automation systems. Compared with open-source alternatives, RTI Connext provides mature development tools, extensive Quality of Service support, and certified deployment solutions for safety-critical applications. It also offers stronger support for functional safety standards and deterministic communication requirements. However, its commercial licensing model may limit adoption in research and cost-sensitive projects.
Therefore, CycloneDDS is generally preferable for resource-efficient and scalable deployments, FastDDS is advantageous for large-data transmission scenarios, and RTI Connext remains a strong candidate for safety-critical industrial systems requiring certified middleware solutions.

3. Emerging Communication Protocols and the Expansion of the ROS Ecosystem

As robotic systems continue to scale up, application scenarios are evolving toward edge computing and cloud robotics [26,27], where traditional DDS-based communication architectures face new challenges. In scenarios such as cross-network data transmission, communication on resource-constrained devices, and high-performance Inter-Process Communication (IPC), DDS is relatively complex to deploy and has limitations in resource consumption and communication efficiency. In recent years, several new communication technologies have emerged within the ROS ecosystem to address these shortcomings of DDS in cross-network and resource-constrained scenarios [26].
Different communication mechanisms exhibit fundamental differences in terms of shared memory versus network transmission paths and data distribution methods. This chapter classifies and analyzes ROS communication middleware from the perspectives of communication models and data transmission paths, taking into account performance characteristics and application requirements.
Figure 5 presents a system-level overview of ROS 2 communication middleware. The taxonomy has been enhanced with representative benchmark results from Section 5.2 for DDS-based and shared memory middleware, directly linking architectural categories to quantitative metrics such as latency, throughput, CPU usage, and copy overhead. This provides a clear view of both functional distinctions and practical performance implications. The Edge-Cloud (Data-Centric) module illustrates architectural features and deployment scenarios but is not yet supported by experimental data. The application mapping highlights typical use cases for each middleware type, offering an initial, performance-informed reference for middleware selection.

3.1. Zenoh: A Communication Protocol for Edge-to-Cloud Collaboration

Zenoh is a communication protocol designed for distributed data systems [28]. Its core objective is to provide a unified mechanism for data access and distribution. It supports efficient communication between embedded devices and cloud systems. Unlike DDS, which is primarily focused on local area network communication, Zenoh is designed with a stronger emphasis on data flow management across network environments. This enables stable data exchange across complex network topologies.
Zenoh adopts a unified data space design philosophy [28]. Nodes within the system can access data via key expressions, enabling flexible data publication and subscription relationships. This design allows Zenoh to support both real-time data streams and historical data queries, making it highly adaptable for distributed robotics systems.
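Key-expression matching can be sketched as a small recursive matcher. It is a simplified assumption of Zenoh's semantics that handles only two wildcards: `*` matches exactly one chunk between slashes, and `**` matches zero or more chunks. Sub-chunk wildcards such as `$*`, canonicalization, and the rules Zenoh uses to intersect two wildcard expressions are not covered, and the key names below are hypothetical.

```python
from typing import List


def _match_chunks(pattern: List[str], key: List[str]) -> bool:
    if not pattern:
        return not key                      # both exhausted means a match
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # '**' may absorb zero or more chunks of the key.
        return any(_match_chunks(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:       # '*' matches exactly one chunk
        return _match_chunks(rest, key[1:])
    return False


def key_expr_matches(pattern: str, key: str) -> bool:
    """Return True if the concrete key falls under the (wildcard) key expression."""
    return _match_chunks(pattern.split("/"), key.split("/"))


print(key_expr_matches("robot1/*/lidar", "robot1/sensors/lidar"))  # True
print(key_expr_matches("robot1/*", "robot1/sensors/lidar"))        # False
```

With this scheme, a cloud service subscribing to `robot1/**` would receive everything published under `robot1`, whereas `**/lidar` selects LiDAR streams from any robot.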
As shown in Figure 6, Zenoh builds a data forwarding network using the Zenoh Router [29]. Different devices can exchange data through the router. Unlike DDS, Zenoh employs a more flexible data routing strategy. This reduces unnecessary data broadcasts in wide-area network environments and improves communication efficiency [30].
In ROS, Zenoh is typically integrated with the ROS 2 communication system via the ‘zenoh-bridge-ros2’ component [31]. This bridge maps ROS topics to the Zenoh data space, so that ROS nodes can access remote data through the Zenoh network. For example, in a cloud-robot system, a local robot can run a ROS node for real-time control while simultaneously transmitting critical data to the cloud via Zenoh for data processing or task scheduling. Chovet et al. evaluated the performance of Zenoh, FastRTPS, and CycloneDDS on dynamic mesh networks in a real-world multi-robot environment; their results showed that Zenoh has significant advantages in latency, CPU utilization, and reachability [31].
Although Zenoh offers advantages in edge-cloud scenarios, it faces several key challenges. First, its resilience under network partitions remains insufficient. Zenoh's default routing is best-effort: when a network partition occurs (e.g., a robot loses its connection to the edge router), data published on one side may be lost without notification, and reconnection does not automatically resynchronize the lost state. Although Zenoh provides a “queryable” mechanism for on-demand data retrieval, it lacks the durability QoS policies (such as the transient and persistent durability kinds) found in DDS. Second, the lack of standardization means that interoperability between different Zenoh implementations, or with DDS, is not guaranteed without custom bridges. Third, Zenoh's QoS model is coarser-grained than that of DDS. For example, Zenoh does not support deadline-based policies or resource limits, which makes it less suitable for mixed-criticality systems in which high-priority control messages must always meet their time constraints. Zenoh is therefore best suited to non-critical data flows over wide-area networks, while critical control loops should still rely on DDS.
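Because data lost during a partition is dropped without notification, an application can at least detect the loss by embedding a per-publisher sequence number in each sample. The sketch below is a hypothetical application-level mitigation, not a Zenoh feature: it records which sequence numbers were skipped and clears a gap when a late sample arrives. Recovering the missing data (for example through a queryable) is left out.

```python
from typing import List, Optional


class GapDetector:
    """Tracks one publisher's sequence numbers and records gaps (hypothetical helper)."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # next sequence number we expect
        self.missing: List[int] = []         # sequence numbers presumed lost

    def on_sample(self, seq: int) -> None:
        if self.expected is None:
            self.expected = seq + 1                         # first sample seen
        elif seq >= self.expected:
            self.missing.extend(range(self.expected, seq))  # skipped numbers are presumed lost
            self.expected = seq + 1
        elif seq in self.missing:
            self.missing.remove(seq)                        # a late arrival closes an old gap


detector = GapDetector()
for seq in [0, 1, 2, 5, 6]:  # samples 3 and 4 dropped during a partition
    detector.on_sample(seq)
print(detector.missing)      # [3, 4]
```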

3.2. Iceoryx: A High-Performance Zero-Copy Communication Mechanism

Unlike Zenoh, which primarily addresses cross-network communication, Iceoryx focuses on high-performance IPC within a single host. In traditional ROS communication mechanisms, even when multiple nodes run on the same computer, data must still be serialized and copied between processes, which incurs additional memory-copy and CPU overhead. In high-bandwidth data processing scenarios, such as image or LiDAR data processing, these overheads can become system performance bottlenecks.
Iceoryx achieves zero-copy data transfer through a shared memory communication mechanism [32]. In this mechanism, the publisher writes data directly to a shared memory buffer. The subscriber simply accesses the same memory block to read the data. This eliminates the multiple data copies inherent in traditional communication models.
As shown in Figure 7, the Iceoryx system manages shared memory pools and communication ports through a central runtime component. To send data, a publisher allocates a buffer in shared memory and writes the data into it; a subscriber then reads the data in place through a pointer to that buffer. This mechanism significantly reduces communication latency and CPU resource consumption [32].
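The publish-by-writing, read-in-place pattern can be illustrated with Python’s standard `multiprocessing.shared_memory` module. This is only a sketch of the principle under simplifying assumptions (one process plays both roles, a single fixed-size frame, no synchronization or delivery notification). It does not use the Iceoryx API, and the function and segment names are hypothetical.

```python
import uuid
from multiprocessing import shared_memory


def publish_frame(name: str, payload: bytes) -> shared_memory.SharedMemory:
    """'Publisher': create a named shared-memory segment and write the payload once."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    shm.buf[: len(payload)] = payload  # the only copy: the producer fills the buffer
    return shm


def read_frame_checksum(name: str, size: int) -> int:
    """'Subscriber': attach to the same segment by name and read it in place."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[:size]  # memoryview over the shared segment; no payload copy
    try:
        return sum(view)  # consume the bytes directly from shared memory
    finally:
        view.release()  # exported views must be released before closing
        shm.close()


if __name__ == "__main__":
    seg_name = f"demo_{uuid.uuid4().hex[:8]}"  # hypothetical segment name
    frame = bytes(range(256))
    pub = publish_frame(seg_name, frame)
    try:
        print(read_frame_checksum(seg_name, len(frame)))  # 32640
    finally:
        pub.close()
        pub.unlink()  # remove the segment once it is no longer needed
```

In Iceoryx, the runtime additionally manages the memory pool, port registration, and notification of subscribers, none of which this sketch models.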
To strengthen the connection between the architecture and its practical performance, Figure 7 also incorporates representative benchmark results from Section 5.2.6. Under an 8 MB message workload at 100 Hz, the zero-copy mechanism achieves a mean latency of 1006 μs and a CPU utilization of 5.2%, compared with 2264 μs and 11.5% for standard CycloneDDS. Shared memory communication without zero-copy provides intermediate performance, with a latency of 1618 μs and CPU utilization of 8.2%. These results demonstrate the effectiveness of Iceoryx-based zero-copy communication in reducing both latency and computational overhead for large-message, high-frequency ROS 2 applications.
Although Iceoryx achieves microsecond-level latency and zero-copy efficiency, it has several significant limitations that must be handled with care. The most important is the fixed shared memory pool: Iceoryx requires the memory pool configuration (e.g., chunk size and number of chunks) to be set before runtime. If a publisher sends a message larger than the configured chunk size, allocation fails, resulting in silent data loss. Similarly, if the number of in-flight messages exceeds the pool’s depth, messages are discarded without any backpressure notification to the publisher. This makes Iceoryx less flexible than DDS, which can allocate resources dynamically. Furthermore, Iceoryx is limited to a single host; it cannot be used for network communication unless combined with another transport (e.g., via bridging). It also lacks QoS policies and therefore cannot provide fine-grained control over reliability, history, or deadlines. Consequently, it is suitable only for high-throughput, low-latency in-host communication in which message size and rate are known in advance (e.g., camera frames, LiDAR point clouds). For heterogeneous systems or dynamically changing workloads, a shared memory extension with adaptive pool sizing would be required, which remains an open research direction.
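The pool constraint can be made concrete with a toy model. The class below is hypothetical and does not reproduce the Iceoryx API; it assumes a single chunk size and a fixed number of chunks, and shows how both oversized messages and pool exhaustion cause a loan to fail. Unless the publisher checks the return value, such a drop goes unnoticed, which is the failure mode described above.

```python
class FixedChunkPool:
    """Toy model of a preconfigured shared-memory pool (fixed chunk size and count).

    Hypothetical class for illustration only; it is not the Iceoryx API.
    """

    def __init__(self, chunk_size: int, num_chunks: int) -> None:
        self.chunk_size = chunk_size   # bytes per chunk, fixed before runtime
        self.free_chunks = num_chunks  # pool depth, fixed before runtime
        self.failed_loans = 0          # drops; no backpressure signal is modeled

    def loan(self, msg_size: int) -> bool:
        """Try to reserve a chunk; False means the message cannot be sent."""
        if msg_size > self.chunk_size or self.free_chunks == 0:
            self.failed_loans += 1
            return False
        self.free_chunks -= 1
        return True

    def release(self) -> None:
        """Return a chunk once every subscriber has finished with it."""
        self.free_chunks += 1


if __name__ == "__main__":
    pool = FixedChunkPool(chunk_size=1024, num_chunks=2)
    results = [pool.loan(2048), pool.loan(512), pool.loan(512), pool.loan(512)]
    print(results, pool.failed_loans)  # [False, True, True, False] 2
```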

3.3. Complementary Relationship Between Emerging Communication Protocols and DDS

Although DDS continues to play a central role in the ROS 2 communication architecture, emerging communication technologies such as Zenoh and Iceoryx have demonstrated significant advantages in specific application scenarios. These communication mechanisms are not mutually exclusive but rather form a complementary relationship in terms of data transmission paths and system layers. DDS primarily handles reliable data distribution among distributed nodes. Zenoh extends data routing capabilities across network environments. Iceoryx, meanwhile, optimizes data transmission efficiency within a single host.
In practical robotic systems, different communication mechanisms often need to work together within the same architecture to meet the real-time, reliability, and computational-resource requirements of different tasks. CompROS is a ROS 2 architecture designed for mixed-criticality systems [33]. Its core idea is to isolate tasks of different criticality levels within the same robotic system while exchanging data efficiently through multiple communication mechanisms.
As shown in Figure 8, in the CompROS architecture, tasks within the system are assigned to different execution domains based on their criticality levels. For example, safety-critical tasks (such as motion control and safety monitoring) typically run in execution environments with real-time guarantees. Data exchange occurs via DDS communication mechanisms configured with strict QoS settings. Non-critical tasks (such as logging, data analysis, or remote monitoring), on the other hand, can run in general-purpose computing environments and interact with the system through other communication mechanisms.
As the ROS communication framework evolves from a single mechanism to a multi-mechanism collaborative approach, Table 2 provides a comparative summary of DDS, Zenoh, and Iceoryx across several key dimensions to facilitate a clearer comparison.
As shown in Table 2, the three middleware solutions differ in their communication models and data transmission paths. DDS focuses on data consistency and reliable communication in distributed systems, Zenoh emphasizes data access and distribution across network environments, and Iceoryx is geared toward high-performance local data exchange.
The introduction of Zenoh and Iceoryx has enabled the ROS communication system to evolve gradually from a single network-based publish-subscribe model into a “multi-mechanism collaboration” architecture. Iceoryx optimizes local data transmission paths through shared memory, enabling low-latency transmission of high-frequency data, while Zenoh extends cross-network communication through a flexible routing mechanism, improving the system’s adaptability in edge-to-cloud environments. From a system architecture perspective, this evolution manifests as the layering and decoupling of data transmission paths: local communication and cross-network communication are handled by distinct mechanisms, forming a “separated data transmission paths” design pattern. This multi-mechanism collaboration model gives complex robotic systems greater performance flexibility and architectural scalability.
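The “separated data transmission paths” pattern can be summarized as a small dispatch rule. The function below is a hypothetical rule of thumb assembled from the qualitative comparison in this section (critical control on DDS, predictable in-host bulk data on Iceoryx, non-critical wide-area flows on Zenoh). It is not a quantitative selection method, and the boolean inputs are simplifying assumptions.

```python
from enum import Enum


class Mechanism(Enum):
    ICEORYX = "shared memory, zero-copy"
    DDS = "publish-subscribe with rich QoS"
    ZENOH = "routed, cross-network"


def select_mechanism(same_host: bool, same_lan: bool,
                     safety_critical: bool, fixed_size_and_rate: bool) -> Mechanism:
    """Hypothetical rule of thumb for assigning a data flow to a transmission path."""
    if safety_critical:
        # Critical control loops stay on DDS with strict QoS settings.
        return Mechanism.DDS
    if same_host and fixed_size_and_rate:
        # Large, predictable in-host flows such as camera frames or point clouds.
        return Mechanism.ICEORYX
    if same_host or same_lan:
        return Mechanism.DDS
    # Non-critical flows that must cross wide-area or edge-cloud networks.
    return Mechanism.ZENOH


if __name__ == "__main__":
    print(select_mechanism(same_host=True, same_lan=True,
                           safety_critical=False, fixed_size_and_rate=True))
```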

3.4. Mathematical Formulation of the Unified Analytical Framework

To provide a mathematically tractable analytical model for the unified framework proposed in this paper, we define each middleware instance as a five-tuple that explicitly links architectural attributes to measurable performance metrics: M = (P, T, C, Q, F), where:
  • P ∈ {local shared memory, local network, remote network} denotes the data transmission path;
  • T ∈ {publish-subscribe, shared memory, query-based} denotes the communication model;
  • C ∈ {zero-copy, single-copy, multi-copy} denotes the copy semantics;
  • Q ∈ {rich, medium, basic} denotes the QoS capability level;
  • F: (P, T, C, Q, S, N_sub) → (L, σ_L, CPU, PLR) is a performance function that maps the architectural parameters together with workload conditions (message size S and number of subscribers N_sub) to four key outcomes: mean end-to-end latency L, latency jitter σ_L, CPU utilization, and packet loss rate (PLR).
Under this formalization, the three middleware solutions discussed in this paper are instantiated as:
  • Iceoryx = (local shared memory, shared memory, zero-copy, basic, F_Iceoryx);
  • DDS = (local network, publish-subscribe, multi-copy, rich, F_DDS);
  • Zenoh = (remote network, publish-subscribe/query-based, multi-copy, medium, F_Zenoh).
Each F is a data-driven function whose parametric form is derived from the benchmark results in Section 5.2. For example, the 8 MB, 100 Hz experiments give the following latency component of F for the different values of C:
$$ L \approx \alpha S + \beta \cdot \mathbf{1}_{C \neq \text{zero-copy}} + \gamma \cdot \mathbf{1}_{C = \text{multi-copy}} $$
where $\mathbf{1}_{\text{condition}}$ is an indicator function that equals 1 when the condition holds and 0 otherwise. Substituting the measured values:
  • Multi-copy (CycloneDDS): L = 2264 μs;
  • Single-copy (shared memory): L = 1618 μs;
  • Zero-copy (Iceoryx): L = 1006 μs.
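Solving yields β ≈ 612 μs (the overhead of leaving zero-copy for single-copy) and γ ≈ 646 μs (the additional overhead of multi-copy over single-copy). At S = 8 MB, the zero-copy measurement fixes αS ≈ 1006 μs, so F_Iceoryx predicts L = αS, while F_DDS predicts L = αS + β + γ ≈ 1006 + 612 + 646 = 2264 μs. Because three measurements determine the three terms, this agreement with the measured 2264 μs holds by construction; α itself must be estimated from smaller message sizes before the model can be used at other values of S.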
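The fit above is small enough to check by hand; the sketch below performs the same arithmetic in Python. The function names are hypothetical, and `alpha_s` stands for the zero-copy baseline latency at 8 MB (1006 μs measured). Because three measurements determine three terms, reproducing 2264 μs for multi-copy is a consistency check rather than an independent validation.

```python
from typing import Tuple


def fit_copy_overheads(l_multi: float, l_single: float,
                       l_zero: float) -> Tuple[float, float]:
    """Solve the indicator model for (beta, gamma) from three latencies at one size."""
    beta = l_single - l_zero    # cost of leaving zero-copy (single-copy vs zero-copy)
    gamma = l_multi - l_single  # extra cost of multi-copy over single-copy
    return beta, gamma


def predict_latency(alpha_s: float, copy_mode: str, beta: float, gamma: float) -> float:
    """L = alpha*S + beta*1[C != zero-copy] + gamma*1[C == multi-copy]."""
    if copy_mode not in ("zero-copy", "single-copy", "multi-copy"):
        raise ValueError(f"unknown copy mode: {copy_mode}")
    return alpha_s + beta * (copy_mode != "zero-copy") + gamma * (copy_mode == "multi-copy")


if __name__ == "__main__":
    beta, gamma = fit_copy_overheads(2264.0, 1618.0, 1006.0)  # μs, 8 MB at 100 Hz
    print(beta, gamma)                                        # 612.0 646.0
    print(predict_latency(1006.0, "multi-copy", beta, gamma))  # 2264.0
```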
Similarly, the scalability experiments (4 KB messages, N_sub from 1 to 50) give the latency scaling law:
$$ L(N_{\text{sub}}) = L_0 + \theta \cdot N_{\text{sub}} $$
with θ_CycloneDDS ≈ 5.5 μs/subscriber and θ_FastDDS ≈ 6.8 μs/subscriber. This difference is encoded in their respective F functions, which capture the middleware’s internal data replication and lock contention strategies.
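The parameters L₀ and θ can be recovered from (N_sub, latency) pairs with an ordinary least-squares line fit. The measured points are in Tables 11 and 12 and are not reproduced here, so the sketch below uses synthetic points generated from the law itself with θ = 5.5 μs/subscriber and L₀ = 28.3 μs, the intercept implied by 303.3 − 5.5 × 50 if the law held exactly for CycloneDDS. These values illustrate the fitting procedure only; they are not measurements, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_scaling_law(n_subs: Sequence[float],
                    latencies_us: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of L(N_sub) = L0 + theta * N_sub; returns (L0, theta)."""
    n = len(n_subs)
    if n < 2 or n != len(latencies_us):
        raise ValueError("need at least two paired points")
    mean_x = sum(n_subs) / n
    mean_y = sum(latencies_us) / n
    sxx = sum((x - mean_x) ** 2 for x in n_subs)
    if sxx == 0:
        raise ValueError("subscriber counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_subs, latencies_us))
    theta = sxy / sxx
    return mean_y - theta * mean_x, theta


if __name__ == "__main__":
    subs = [1, 10, 20, 30, 40, 50]
    synthetic = [28.3 + 5.5 * n for n in subs]  # synthetic points, NOT measured data
    l0, theta = fit_scaling_law(subs, synthetic)
    print(round(l0, 1), round(theta, 2))  # 28.3 5.5
    print(round(l0 + theta * 20, 1))      # predicted latency for 20 subscribers: 138.3
```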
The model is therefore not merely a classification scheme but a performance-aware analytical tool: it decomposes observed performance into architecture-driven components (P, T, C, Q), quantifies the contribution of each component through empirically fitted parameters (e.g., β, γ, θ), and enables a priori performance estimation for new configurations (e.g., predicting latency for 2 MB messages with 20 subscribers). In this way, it goes beyond a pure taxonomy to provide a formal model.

4. ROS Communication Support Mechanisms and System Challenges

A comparison of DDS, Zenoh, and Iceoryx reveals that ROS communication systems are designed with an emphasis on modularity and scalability. By supporting multiple communication technologies through a unified interface, these systems enable robotic systems to select the appropriate communication mechanism based on application requirements [34,35]. Within a system architecture that integrates multiple communication mechanisms, different middleware solutions exhibit variations in data serialization methods, node discovery mechanisms, and support for QoS policies. These differences are key factors affecting the system’s communication efficiency and consistency. This chapter discusses issues such as the RMW abstraction layer, serialization mechanisms, discovery protocols, and QoS semantic mapping, and summarizes the primary challenges encountered during system development.

4.1. RMW Plugin Architecture and Abstraction Mechanism

RMW serves as the core abstraction layer in the ROS 2 communication framework [36]. It decouples communication mechanisms from application logic, thereby enhancing the system’s flexibility and portability. RMW adopts a plugin-based architecture [36]: different middleware implementations are encapsulated as independent shared libraries that are loaded on demand at runtime, allowing the communication middleware to be selected flexibly for different application scenarios. The RMW interface also provides an extension path for integrating non-DDS communication technologies. For example, by implementing the RMW interface, emerging mechanisms such as Zenoh can be integrated into ROS 2 [35], expanding communication capabilities while maintaining consistency with the ROS programming model. RMW is thus not only a communication abstraction layer but also a key mechanism for multi-middleware collaboration and the continuous evolution of the ROS communication framework.
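In ROS 2, the RMW implementation is selected at runtime, for example through the `RMW_IMPLEMENTATION` environment variable (e.g., `rmw_cyclonedds_cpp` or `rmw_fastrtps_cpp`, the latter being the default in ROS 2 Humble). The sketch below mimics that plugin pattern in plain Python with a registry of stand-in publish functions. The registry, the stand-ins, and the function names are hypothetical and do not represent the actual rmw C API, which loads shared libraries rather than Python callables.

```python
import os
from typing import Callable, Dict, Mapping, Optional

PublishFn = Callable[[str, bytes], str]

# Hypothetical stand-ins for middleware plugins. In ROS 2 each entry would be
# a separately built shared library loaded at runtime.
PLUGINS: Dict[str, PublishFn] = {
    "rmw_cyclonedds_cpp": lambda topic, data: f"cyclonedds:{topic}:{len(data)}",
    "rmw_fastrtps_cpp": lambda topic, data: f"fastdds:{topic}:{len(data)}",
    "rmw_zenoh_cpp": lambda topic, data: f"zenoh:{topic}:{len(data)}",
}
DEFAULT_RMW = "rmw_fastrtps_cpp"


def load_rmw(env: Optional[Mapping[str, str]] = None) -> PublishFn:
    """Pick the middleware named by RMW_IMPLEMENTATION, falling back to the default."""
    env = os.environ if env is None else env
    name = env.get("RMW_IMPLEMENTATION", DEFAULT_RMW)
    if name not in PLUGINS:
        raise RuntimeError(f"RMW implementation not available: {name}")
    return PLUGINS[name]


def application(publish: PublishFn) -> str:
    # Application logic only sees the abstract publish call, never the middleware.
    return publish("/chatter", b"hello")
```

Swapping middleware changes only the environment, not `application`, which is the decoupling the RMW layer provides.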
While the RMW abstraction layer provides portability and middleware interoperability, it inevitably introduces additional software layers between ROS applications and the underlying communication middleware. Isolating the performance overhead of RMW directly remains challenging because most ROS 2 benchmarking tools operate through the rcl/rclcpp interface. Nevertheless, the unified benchmark results presented in Section 5.2 provide an indirect assessment of its impact. Under identical ROS 2 APIs and QoS configurations, CycloneDDS achieves lower latency for small messages (32.54 μs at 16 KB), whereas FastDDS performs better for large messages (2.18 ms at 4 MB). These observations suggest that the characteristics of the underlying middleware remain the dominant factor affecting communication performance, and that the overhead introduced by the RMW abstraction layer is generally secondary to middleware-specific implementation differences. A dedicated comparison between native DDS APIs and ROS 2 RMW-based communication would quantify the abstraction overhead more precisely and remains an important direction for future research.

4.2. Serialization Mechanisms and Data Representation Challenges

In ROS 2 communication, data transmission between nodes relies on a unified data representation and serialization mechanism. Currently, the system primarily uses the Common Data Representation (CDR) format defined by DDS to ensure cross-platform and cross-language data compatibility. Some DDS implementations reduce serialization overhead by optimizing data encoding and memory layout (e.g., the Fast CDR library used by FastDDS). However, in high-bandwidth scenarios, serialization can still become a system performance bottleneck. In addition, ROS 2 employs a strongly typed messaging system in which message structures are determined at compile time, which improves reliability and type safety. DDS uses a type-hashing mechanism to identify message structure versions, thereby supporting both forward and backward compatibility [37]. In practice, techniques such as optional fields are typically introduced to allow message structures to evolve smoothly.
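To show where serialization cost comes from, the sketch below hand-encodes a hypothetical message {uint32 id; float64 value; string name} in a little-endian, CDR-style layout using Python’s `struct` module. It is simplified under stated assumptions: the encapsulation header is omitted, only classic CDR alignment is modeled (each primitive aligned to its own size; strings encoded as a length that includes the NUL terminator, followed by the bytes), and it is not the Fast CDR API.

```python
import struct
from typing import Tuple


def _pad(buf: bytearray, alignment: int) -> None:
    """Append zero bytes so the next field starts at a multiple of `alignment`."""
    buf.extend(b"\x00" * (-len(buf) % alignment))


def cdr_serialize(msg_id: int, value: float, name: str) -> bytes:
    """Encode the hypothetical {uint32, float64, string} message."""
    buf = bytearray()
    _pad(buf, 4)
    buf += struct.pack("<I", msg_id)
    _pad(buf, 8)                              # float64 must start on an 8-byte boundary
    buf += struct.pack("<d", value)
    text = name.encode("utf-8") + b"\x00"     # strings carry a NUL terminator
    _pad(buf, 4)
    buf += struct.pack("<I", len(text))       # the length includes the NUL
    buf += text
    return bytes(buf)


def cdr_deserialize(data: bytes) -> Tuple[int, float, str]:
    """Decode the layout produced by cdr_serialize."""
    off = 0

    def align(n: int) -> None:
        nonlocal off
        off += -off % n

    align(4)
    (msg_id,) = struct.unpack_from("<I", data, off)
    off += 4
    align(8)
    (value,) = struct.unpack_from("<d", data, off)
    off += 8
    align(4)
    (length,) = struct.unpack_from("<I", data, off)
    off += 4
    name = data[off:off + length - 1].decode("utf-8")  # drop the NUL terminator
    return msg_id, value, name
```

Even this small message spends 4 of its 26 bytes on padding before the 8-byte double, and every field is copied into a new buffer; for large arrays such as images or point clouds, that copy is what shared-memory and zero-copy paths try to avoid.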

4.3. Discovery Mechanisms and Topological Adaptability

ROS 2 nodes rely on the DDS automatic discovery mechanism [34], which offers improved reliability compared with ROS 1 [38]. However, in systems with many nodes, frequent discovery messages can increase network load and prolong system initialization. To mitigate this issue, some DDS implementations provide a discovery server mode, which reduces broadcast overhead by managing node information centrally. In dynamic network environments (such as multi-robot collaboration), nodes frequently join and leave, so the communication topology changes constantly [39]. This places higher demands on system adaptability. To improve stability, mechanisms such as timeout detection and data caching are typically introduced at the application layer, which enhances robustness in dynamic network environments.
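The application-layer timeout detection and data caching mentioned above can be sketched as a small monitor. The class and its parameters are hypothetical; the sketch assumes that each incoming message carries a sender name and that the caller supplies the current time (which also keeps the logic easy to test). It is not a DDS liveliness policy.

```python
from typing import Any, Dict, List, Optional


class PeerMonitor:
    """Tracks when each remote node was last heard from and caches its latest message."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}
        self._cache: Dict[str, Any] = {}

    def on_message(self, node: str, msg: Any, now_s: float) -> None:
        # Any message doubles as a heartbeat and refreshes the cached value.
        self._last_seen[node] = now_s
        self._cache[node] = msg

    def stale_nodes(self, now_s: float) -> List[str]:
        # Nodes silent for longer than the timeout are treated as having left.
        return sorted(n for n, t in self._last_seen.items() if now_s - t > self.timeout_s)

    def last_value(self, node: str) -> Optional[Any]:
        # Fall back to the cached value while a peer is temporarily unreachable.
        return self._cache.get(node)
```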

4.4. Heterogeneity in QoS Capabilities and Semantic Mapping

There are significant differences among various communication middleware platforms in terms of QoS policy support [40]. DDS provides a relatively comprehensive QoS framework [21], including parameters such as reliability, durability, history, and resource constraints. Developers can exercise fine-grained control over communication behavior based on application requirements. To more clearly compare the differences in QoS support capabilities and control methods among various communication mechanisms, Table 3 summarizes and analyzes typical communication technologies from multiple perspectives.
Because different communication technologies have different QoS design objectives, semantic mapping under a unified communication interface is not straightforward [21]. For example, mapping DDS durability (persistence) policies onto Zenoh’s data storage mechanisms requires accounting for differences in data lifecycles and caching strategies. Such differences may lead to inconsistent behavior across RMW implementations, so QoS settings must be configured for the specific communication implementation during deployment.
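A first step in any QoS mapping is detecting which requested policies the target backend cannot honor at all. The sketch below assumes a hypothetical coarse-grained backend that supports only reliability, durability, and history, in line with the Zenoh limitations listed in Section 3. The policy names and values are illustrative strings, not an RMW or Zenoh API. A policy that maps by name, such as durability, may still behave differently because of lifecycle and caching differences, which a name-level check cannot detect.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical capabilities of a coarse-grained backend (no deadline, no resource limits).
COARSE_BACKEND: Set[str] = {"reliability", "durability", "history"}


def map_qos(requested: Dict[str, str], supported: Set[str],
            strict: bool = False) -> Tuple[Dict[str, str], List[str]]:
    """Split a DDS-style QoS request into policies the backend can and cannot honor."""
    mapped = {policy: value for policy, value in requested.items() if policy in supported}
    unsupported = sorted(policy for policy in requested if policy not in supported)
    if strict and unsupported:
        # Safety-critical topics should fail loudly rather than silently lose guarantees.
        raise ValueError(f"unsupported QoS policies: {', '.join(unsupported)}")
    return mapped, unsupported


if __name__ == "__main__":
    request = {"reliability": "reliable", "durability": "transient_local", "deadline": "10ms"}
    print(map_qos(request, COARSE_BACKEND))
```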

4.5. Challenges of Temporal Determinism and Resource Optimization

In robotic control systems, the uncertainty of communication delays can affect system stability. Therefore, real-time communication is a critical issue in ROS applications. ROS 2 supports real-time communication to some extent through multithreaded executors and priority configuration [41]. However, in complex systems, it is still necessary to combine a real-time operating system with appropriate task scheduling strategies. In embedded robotics platforms, computational resources are typically limited; for example, memory capacity and processor performance are constrained. The ROS community has proposed the Micro Robot Operating System (micro-ROS) framework, which provides a lightweight ROS 2 implementation for resource-constrained devices such as microcontrollers. By streamlining runtime components and adopting static memory management, micro-ROS is able to implement basic ROS communication functions on embedded platforms. Wang et al. further improved real-time performance in resource-constrained environments through a priority-driven, chain-aware scheduling method [42]. Additionally, on multi-core heterogeneous platforms, cache coherence for shared memory and the control of memory access latency are also significant challenges in resource optimization.

4.6. DDS and Time-Sensitive Networking Integration

As robotic systems increasingly operate in industrial automation and autonomous driving environments, deterministic communication has become a critical requirement. Although DDS provides configurable Quality of Service (QoS) policies to improve communication reliability and timing behavior, achieving strict end-to-end determinism remains challenging when relying solely on conventional Ethernet networks.
Time-Sensitive Networking (TSN) extends standard Ethernet through a collection of IEEE standards that provide precise clock synchronization, traffic scheduling, and bounded latency guarantees. The integration of DDS and TSN has therefore emerged as a promising solution for real-time robotic communication. In this architecture, DDS continues to provide the data-centric publish–subscribe model and flexible QoS management, while TSN offers deterministic transmission at the network layer. Together, they enable predictable latency, reduced jitter, and improved reliability under network congestion.
The combination of DDS and TSN is particularly attractive for industrial robots, autonomous vehicles, and collaborative robotic systems that require strict timing guarantees. However, several challenges remain, including configuration complexity, interoperability among TSN-enabled devices, and the efficient mapping of DDS QoS policies to TSN traffic classes. Consequently, DDS–TSN integration has become an active research direction for next-generation real-time robotic communication systems.
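One way to picture the QoS-to-TSN mapping problem is a function from a topic’s timing requirement to an IEEE 802.1Q priority code point (PCP, 0 to 7), which TSN-capable switches use to select a traffic class. The thresholds and PCP assignments below are hypothetical placeholders (PCP 7 is left for network control, a common convention); a real deployment would derive them from the TSN schedule and switch configuration.

```python
from typing import Optional


def pcp_for_topic(deadline_ms: Optional[float], reliable: bool) -> int:
    """Map a topic's timing requirement to an IEEE 802.1Q PCP value (0-7).

    Thresholds and PCP choices are hypothetical placeholders.
    """
    if deadline_ms is not None and deadline_ms <= 1.0:
        return 6  # tight control loops: highest class below network control (7)
    if deadline_ms is not None and deadline_ms <= 10.0:
        return 5  # soft real-time sensor streams
    if reliable:
        return 3  # reliable but not time-critical data
    return 0      # best-effort background traffic (logging, telemetry)
```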

5. Comprehensive Evaluation of ROS Communication Middleware Performance

Section 4 analyzed the key implementation mechanisms of ROS communication systems from a system architecture perspective. However, in practical robotic systems, architectural design alone cannot guarantee communication performance; quantitative, empirical evaluation under controlled conditions is essential. To address the lack of reliable cross-protocol performance comparisons in the existing literature, this chapter not only summarizes prior work but also conducts a unified benchmark evaluation on a standardized testbed. We systematically measured and compared the performance of two mainstream ROS 2 middleware implementations (CycloneDDS and FastDDS) across key metrics, including latency, throughput, jitter, scalability, and packet loss rate, for message sizes from 64 B to 8 MB and system scales from 1 to 50 nodes. Furthermore, we evaluated the performance of shared memory and zero-copy transmission modes under large-message conditions. These reproducible experiments provide the basis for the performance analysis and middleware selection guidelines presented in the following sections.

5.1. Communication Performance Optimization Strategies

To enhance the performance of ROS communication systems, existing research focuses primarily on optimizing two aspects: data transmission paths and processing methods. Zero-copy communication reduces system overhead by minimizing the number of data copies. For large messages, it can significantly reduce communication latency, in some cases bringing response times from the millisecond range down to the microsecond range, thereby improving overall real-time performance [43].
In high-frequency data transmission scenarios, communication performance is also influenced by message granularity. Through message batching and data aggregation mechanisms, the number of system calls can be reduced, and bandwidth utilization improved. Ishikawa-Aso et al. effectively reduced communication overhead in complex robotic systems by optimizing message transmission methods and processing workflows, achieving an approximately 16% improvement in average response performance in autonomous driving scenarios [44].
As system scale expands, hardware acceleration technologies are increasingly being integrated into communication optimization for ROS 2 systems. For example, Remote Direct Memory Access (RDMA) enables efficient memory access by bypassing the operating system’s network protocol stack, while Field Programmable Gate Arrays (FPGAs), Smart Network Interface Cards (SmartNICs), and Data Processing Units (DPUs) can offload communication-related tasks from the host CPU, thereby reducing serialization overhead, memory copy operations, and network processing latency. Combined with shared memory transport and zero-copy communication mechanisms, these hardware-assisted approaches can further improve throughput, reduce CPU utilization, and enhance scalability for data-intensive robotic applications. Bédard et al. demonstrated that, through low-level communication mechanism optimization and system-level performance analysis, communication latency can be effectively reduced and the overall operational efficiency of distributed systems improved [45]. As robotic platforms continue to integrate high-bandwidth sensors and AI workloads, hardware-accelerated communication is expected to play an increasingly important role in future industrial and autonomous robotic systems.

5.2. Unified Benchmark Evaluation on ROS 2 Middleware

Most existing literature on the performance evaluation of ROS 2 communication middleware is based on heterogeneous experimental environments, varying QoS configurations, and message sizes, resulting in unreliable cross-protocol comparisons. To address this shortcoming, this section conducts systematic performance benchmarking of two mainstream DDS implementations on a unified, controlled test platform. The tests covered message sizes ranging from 64 B to 8 MB and subscriber counts ranging from 1 to 50, and comprehensively reported key metrics such as latency, throughput, jitter, scalability, and packet loss rate. Additionally, for large message scenarios, a separate comparison was conducted among three transmission modes: standard DDS, shared memory, and zero-copy. All experiments strictly controlled experimental variables and the runtime environment. Furthermore, each experiment was repeated five times, and the final results were calculated as the overall average to ensure the comparability and reproducibility of the results.
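The text does not state whether the five repetitions are combined by pooling all samples or by averaging per-run statistics. The sketch below assumes the latter: each run is summarized by its mean, minimum, and maximum latency, and each statistic is then averaged over the runs. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def summarize_run(latencies_us: Sequence[float]) -> Dict[str, float]:
    """Per-run statistics: mean, minimum, and maximum latency."""
    return {"mean": mean(latencies_us), "min": min(latencies_us), "max": max(latencies_us)}


def aggregate_runs(runs: List[Sequence[float]]) -> Dict[str, float]:
    """Average each per-run statistic across repeated runs (assumed aggregation rule)."""
    summaries = [summarize_run(r) for r in runs]
    return {key: mean(s[key] for s in summaries) for key in ("mean", "min", "max")}
```

Pooling the samples instead would report the overall minimum and maximum rather than their averages, so the choice matters for the extreme values in the tables.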

5.2.1. Latency Scaling with Message Size

End-to-end latency is the most critical metric for evaluating the real-time performance of a robotic system, because it directly determines the system’s response speed and stability. High or unpredictable latency can lead to control oscillations, tracking errors, and even system instability. Measuring the average, minimum, and maximum latency for messages of different sizes therefore helps determine the operational limits of ROS 2 middleware in real-time tasks.
All experiments were conducted using the base configuration shown in Table 4. The test platform hardware consisted of an AMD Ryzen 7 5800H CPU @ 3.2 GHz (Advanced Micro Devices, Inc., Santa Clara, CA, USA) and 8 GB of RAM. The software environment included Ubuntu 22.04 (Canonical Ltd., London, UK), ROS 2 Humble (Open Robotics, Mountain View, CA, USA), CycloneDDS 0.10.5 (Eclipse Foundation, Brussels, Belgium), and FastDDS 2.6.11 (eProsima, Madrid, Spain).
Table 5 and Table 6 present latency statistics for CycloneDDS and FastDDS.
The latency results show that CycloneDDS and FastDDS behave quite differently across message sizes. For small messages (64 B–16 KB), CycloneDDS maintains an average latency of 32–48 μs, while FastDDS ranges from 42 to 48 μs, giving CycloneDDS a slight advantage; CycloneDDS’s latency drops as low as 31.92 μs and 32.54 μs at 1 KB and 16 KB, respectively. When the message size reaches 256 KB, latency begins to rise for both, and at 1 MB the two are comparable. The real inflection point occurs at 4 MB, where CycloneDDS’s average latency surges to 9.1 ms while FastDDS remains at 2.18 ms, approximately 76% lower, a significant advantage for large-message transmission. At 8 MB, the latency of both systems exceeds 18 ms. This behavior is likely related to their underlying implementations: FastDDS appears to employ more efficient fragmentation and flow-control strategies for large messages, whereas CycloneDDS appears to be tuned primarily for small messages. Compared with existing literature [14,46], this experiment confirms the engineering intuition that, under uniform QoS, CycloneDDS is better suited to small messages and FastDDS to large messages.

5.2.2. Throughput Scaling with Message Size

Throughput reflects the total amount of data a system can process per unit of time. Insufficient throughput can lead to data backlogs, queue overflows, and packet loss. Peak throughput testing reveals the maximum processing capacity of the middleware, helping developers determine whether the system can support the expected data load.
In this experiment, the environment configuration of Experiment 1 was retained and only the Rate parameter was changed, to 0, which denotes publishing at the maximum rate in order to measure peak throughput under saturated load. Table 7 and Table 8 present the throughput of CycloneDDS and FastDDS under saturated load.
The throughput tests under saturated load reveal the maximum processing capabilities of the two middleware solutions. In the small-message range of 64 B to 16 KB, CycloneDDS’s throughput is 4–6 times that of FastDDS. This is likely because CycloneDDS’s small-message path is highly optimized, allowing it to complete more transmissions with lower per-message overhead. For medium-sized 256 KB messages, FastDDS slightly outperforms CycloneDDS, indicating that FastDDS makes better use of network bandwidth once messages exceed a certain size. At 1 MB and above, the throughput of the two systems converges, as the bottleneck shifts from the middleware to network or memory bandwidth. Notably, both middleware solutions experience significant packet loss under saturated load; peak throughput therefore reflects maximum processing capacity rather than reliable transmission capacity.
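Because saturated-load throughput is measured while messages are being lost, it helps to separate offered throughput (what the publisher sends) from delivered throughput (what subscribers receive). The sketch below computes both from sample counts. The function names are hypothetical, and it assumes 1 MB = 10⁶ bytes, which may differ from the unit used in Tables 7 and 8.

```python
def throughput_mb_s(samples: int, msg_size_bytes: int, duration_s: float) -> float:
    """Data rate in MB/s, assuming 1 MB = 10**6 bytes."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return samples * msg_size_bytes / duration_s / 1e6


def delivery_ratio(sent: int, received: int) -> float:
    """Fraction of offered messages that actually arrived."""
    return received / sent if sent else 0.0


if __name__ == "__main__":
    # Illustrative counts only, not benchmark results.
    offered = throughput_mb_s(10_000, 32_000, 2.0)
    delivered = throughput_mb_s(9_000, 32_000, 2.0)
    print(offered, delivered, delivery_ratio(10_000, 9_000))  # 160.0 144.0 0.9
```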

5.2.3. Jitter Scaling with Message Size

Jitter is a key metric for measuring a system’s temporal determinism. In control loops, multi-sensor fusion, and real-time synchronization tasks, excessive jitter can lead to timing discrepancies, degraded control performance, and even system instability, even when the average delay is low.
In this experiment, relative to Experiment 1, only the Runtime parameter was changed, to 60 s. This longer runtime provides a sufficient sample size, reduces random error, and limits the influence of transient environmental interference, ensuring stable results. Table 9 and Table 10 present the jitter metrics for CycloneDDS and FastDDS at different message sizes.
Jitter analysis is critical for evaluating the temporal determinism of real-time systems. Both systems exhibited their lowest jitter coefficients at a message size of 256 KB, while maintaining moderate latency. For small messages (64 B–16 KB), although CycloneDDS has low latency, its jitter coefficient ranges from 54% to 84%, higher than FastDDS’s 44% to 63%. Low latency therefore does not necessarily imply low jitter; CycloneDDS may trade some timing stability for low small-message latency. At 1 MB, the jitter coefficients of both systems are around 18%, because the absolute latency of large messages is longer and relative fluctuations are therefore smaller. However, at 4 MB and 8 MB, the jitter coefficient deteriorates sharply. FastDDS reaches a jitter coefficient of 185.6% at 4 MB, far higher than CycloneDDS’s 70.1%, and its latency variance reaches 496.6 × 10³ μs², indicating that FastDDS may suffer severe internal queuing or unstable fragment reassembly during large-message transmission. At 8 MB, both jitter coefficients approach or exceed 100%, making both systems unsuitable for tasks with real-time requirements.
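The jitter coefficient is not defined explicitly in this section. A common choice, assumed here, is the coefficient of variation: the standard deviation of latency divided by its mean, expressed as a percentage. If the tables use a different definition, values computed this way will not match them. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def jitter_coefficient(latencies_us: Sequence[float]) -> float:
    """Coefficient of variation in percent (assumed definition): pstdev / mean * 100."""
    mu = mean(latencies_us)
    if mu <= 0:
        raise ValueError("mean latency must be positive")
    return pstdev(latencies_us) / mu * 100.0


if __name__ == "__main__":
    print(round(jitter_coefficient([90.0, 110.0]), 1))  # 10.0
```

Because the coefficient is relative to the mean, a small absolute fluctuation on a small-message latency can produce a larger coefficient than a bigger fluctuation on a large-message latency, which matches the pattern described above.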

5.2.4. Scalability with Node Counts

Scalability is used to measure the extent to which system performance degrades as the number of nodes increases. If the latency and CPU consumption of the middleware increase linearly or even superlinearly with the number of subscribers, this will severely limit the system’s scalability.
Compared to the environment configuration in Experiment 1, this experiment sets the number of nodes as the independent variable and tests 1, 10, 20, 30, 40, and 50 nodes, respectively. The message size was fixed at 4 KB, and all other parameters remained unchanged. Table 11 and Table 12 summarize the changes in average latency, latency variance, CPU utilization, and the number of voluntary context switches for CycloneDDS and FastDDS as the number of subscribers increases.
In the scalability tests, the message size was fixed at 4 KB and the number of subscribers was increased from 1 to 50, with a focus on changes in latency, CPU utilization, and variance. CycloneDDS performed better at all scales. With 50 subscribers, its average latency was 303.3 μs, 21% lower than FastDDS’s 384.4 μs, and its CPU utilization was 41.9%, about 19% lower than FastDDS’s 52.0% (equivalently, FastDDS used about 24% more CPU), saving about 0.25% of CPU resources per subscriber on average. More importantly, CycloneDDS’s latency variance grew steadily, from 318.6 μs² with 1 subscriber to 111.0 × 10³ μs² with 50, whereas FastDDS’s variance surged sharply beyond 20 subscribers, indicating that its internal thread scheduling or message distribution mechanism is prone to unstable queuing delays in large-scale, multi-subscriber scenarios. The numbers of context switches for the two systems are similar, suggesting that system call overhead is not the primary source of the difference; the discrepancy is more likely attributable to the middleware’s internal data replication and lock contention strategies. Unlike prior work [5,38], which tested only up to 10 nodes, this experiment scaled to 50 nodes and clearly demonstrates CycloneDDS’s engineering advantages in large-scale systems.
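Relative differences depend on which value is taken as the reference: the same 10.1-point CPU gap reads as about 19% lower for CycloneDDS but about 24% higher for FastDDS. The helper functions below (hypothetical names) make the convention explicit and apply it to the 50-subscriber values.

```python
def pct_lower(reference: float, value: float) -> float:
    """How much lower `value` is than `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100.0


def pct_higher(reference: float, value: float) -> float:
    """How much higher `value` is than `reference`, as a percentage of `reference`."""
    return (value - reference) / reference * 100.0


if __name__ == "__main__":
    print(round(pct_lower(384.4, 303.3)))  # latency: CycloneDDS 21% lower
    print(round(pct_lower(52.0, 41.9)))    # CPU: CycloneDDS 19% lower
    print(round(pct_higher(41.9, 52.0)))   # CPU: FastDDS 24% higher
```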

5.2.5. Packet Loss Under Publication Rate Load

The packet loss rate directly reflects communication reliability and is particularly critical in wireless networks and high-load or resource-constrained environments. In robotic systems, packet loss can lead to serious issues such as lost control commands, gaps in sensor data, and divergence in state estimation.
Compared with the environment configuration in Experiment 1, this experiment sets the publication rate as the independent variable, testing 0 (unlimited), 1000, 5000, 10,000, and 20,000 Hz. The message size is fixed at 32 KB, and all other parameters remain unchanged. Table 13 and Table 14 summarize the numbers of sent, received, and lost samples and the packet loss rates for CycloneDDS and FastDDS at different publication rates.
The packet loss tests used a fixed message size of 32 KB to evaluate reliability at different publication rates. The most critical finding is that, without rate limiting, CycloneDDS had a packet loss rate of only 3.05%, while FastDDS reached 66.94%, losing nearly two-thirds of its messages. This suggests that CycloneDDS absorbs publisher overproduction through internal buffering and flow control, whereas FastDDS appears to prioritize maintaining the send rate and discards messages heavily as soon as the receiver cannot keep up. With explicit rate limits, both systems achieved zero packet loss in the 1000–5000 msg/s range. Thus, under uncontrolled load CycloneDDS is considerably more robust to overload, but when publishers are rate-limited, both systems achieve the zero-loss reliability expected in industrial applications.
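Publisher-side rate limiting of the kind that eliminated packet loss here can be implemented with a token bucket. The sketch below is a generic token bucket, not the limiter used in the benchmark (which is not specified). The class and function names are hypothetical, the caller passes the current time in seconds, and a rejected call means the publisher should delay rather than send. The packet loss rate is computed as (sent − received)/sent from the kind of sent and received counts reported in Tables 13 and 14.

```python
from typing import Optional


class TokenBucket:
    """Allows about `rate_hz` messages per second, with bursts of up to `burst` messages."""

    def __init__(self, rate_hz: float, burst: int) -> None:
        self.rate_hz = rate_hz
        self.capacity = float(burst)
        self.tokens = float(burst)          # start with a full bucket
        self.last_t: Optional[float] = None  # time of the previous call, in seconds

    def allow(self, now_s: float) -> bool:
        if self.last_t is not None:
            # Refill in proportion to elapsed time, capped at the burst size.
            elapsed = now_s - self.last_t
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_hz)
        self.last_t = now_s
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: delay this publish instead of sending it


def packet_loss_rate(sent: int, received: int) -> float:
    """Fraction of sent samples that never arrived."""
    return (sent - received) / sent if sent else 0.0
```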

5.2.6. Comparison of CycloneDDS, Shared Memory, and Zero-Copy

Standard DDS communication involves multiple data copies, serialization, and deserialization, resulting in significant CPU and latency overhead for large messages. Shared memory and zero-copy are two optimized transmission modes: the former reduces inter-process copying, while the latter eliminates payload copies between publisher and subscriber. This experiment compares the performance metrics of the three modes in large-message scenarios to provide a basis for optimizing large-data transmission in robotic systems.
Relative to the configuration of Experiment 1, this experiment fixes the message size at 8 MB, the rate at 100 Hz, and the runtime at 60 s, keeping all other parameters unchanged. Table 15 compares the performance metrics of the three transmission modes (standard CycloneDDS, shared memory, and zero-copy) under these conditions.
With 8 MB messages at a moderate rate of 100 Hz, the three transmission modes differed markedly. In terms of latency, zero-copy achieved an average latency 56% lower than standard CycloneDDS and 38% lower than shared memory, as well as the best minimum and maximum latencies. This confirms that eliminating unnecessary data copies substantially reduces end-to-end latency. In terms of throughput, standard CycloneDDS achieved twice the throughput of both the shared-memory and zero-copy modes, because the latter two modes did not use batching in this simple point-to-point test. In terms of CPU efficiency, zero-copy used 55% less CPU than standard CycloneDDS and 37% less than shared memory, freeing computational resources for perception or planning tasks. No packet loss occurred in any of the three modes. Overall, standard CycloneDDS offers the highest and most stable throughput but incurs the highest latency and CPU overhead, whereas zero-copy holds a clear advantage for large messages, making it the preferred option for high-frequency, large-volume data such as images, point clouds, and radar data. Shared memory, as a compromise, also delivers significant improvements.
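The core idea behind the shared-memory and zero-copy modes is to pass a reference to a buffer that both processes map, instead of copying the 8 MB payload. This can be illustrated with Python's standard `multiprocessing.shared_memory` module. The example is a conceptual sketch only. It is not the CycloneDDS or Iceoryx API, both handles live in one process for brevity, and it omits the synchronization (sample ownership, notification, buffer reuse) that a real transport must provide.

```python
from multiprocessing import shared_memory

MSG_SIZE = 8 * 1024 * 1024  # 8 MB, matching the benchmark payload

# Publisher: allocate a named segment and write the sample in place,
# instead of serializing it into a socket buffer.
pub = shared_memory.SharedMemory(create=True, size=MSG_SIZE)
pub.buf[:4] = b"SCAN"          # header written directly into shared memory
pub.buf[MSG_SIZE - 1] = 0x7F   # last byte of the payload

# Subscriber: attach to the same segment by name. Only the short name would
# cross a process boundary; the 8 MB payload itself is never copied.
sub = shared_memory.SharedMemory(name=pub.name)
view = sub.buf                 # memoryview over the shared pages (no copy)
header = bytes(view[:4])       # copying 4 bytes for inspection is cheap
last = view[MSG_SIZE - 1]

# A later write by the publisher is visible through the subscriber's view,
# showing that both handles map the same memory.
pub.buf[4] = 0x01
updated = view[4]

# Release views and handles, then remove the segment.
del view
sub.close()
pub.close()
pub.unlink()
```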

5.3. Validation of Robot Application Scenarios

To further validate the performance of different communication mechanisms in real-world robot systems, this paper selects typical robot application scenarios for evaluation.

5.3.1. Mobile Robot Navigation Systems

In mobile robot navigation systems, high-frequency sensor data require low-latency transmission to keep the perception and control modules synchronized. Ferrari et al. evaluated communication performance for mobile robot trajectory tracking in the ROS–Gazebo simulation environment and found that communication latency and temporal synchronization errors directly affect tracking accuracy and control stability [47]. Brekke et al. validated the real-time communication requirements of the ROS 2 navigation stack and sensor data streams on a quadruped robot platform, likewise highlighting the importance of low-latency middleware in this scenario [48]. In their study of mobile robot systems, Wang et al. concluded that low-latency shared-memory mechanisms (such as Iceoryx) are better suited to the local, high-frequency sensor processing stage [43]. For cross-node communication between path-planning and control modules, DDS-based data distribution can be employed, as it offers better system scalability and network transparency. Kronauer et al. systematically analyzed communication latency in multi-node ROS 2 systems and found that communication performance depends largely on the chosen DDS implementation and its configuration [49].

5.3.2. Multi-Robot Coordination Systems

In multi-robot coordination systems, the scalability of the communication network becomes a critical factor as the number of robots increases. Liang et al. conducted comparative experiments on throughput and latency for various communication protocols, including Zenoh, DDS, and MQTT. The results indicate that Zenoh maintains high data throughput in distributed environments and demonstrates excellent performance scalability in multi-node communication scenarios [50].
By introducing distributed routing and data aggregation, Zenoh forwards data efficiently between nodes, which reduces broadcast traffic within the network and improves overall communication efficiency. DDS-based publish-subscribe communication typically performs stably in small- to medium-scale systems, but as the number of nodes grows, its automatic discovery mechanism introduces additional network overhead. Orr et al. provided a comprehensive review of multi-agent deep reinforcement learning in multi-robot systems, noting that collaborative learning among agents can effectively improve coordination efficiency and scalability in complex tasks [51]. When multiple robots collaboratively explore unknown environments, the efficiency of path planning and map building is critical. Romeh et al. proposed a Hybrid Vulture-Coordinated Multi-Robot Exploration algorithm, which improves the performance of finite map building by combining coordinated multi-robot exploration with the African Vulture optimization algorithm [52].
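The discovery overhead mentioned above can be made concrete with a deliberately simplified model. It assumes that simple discovery in DDSI-RTPS (SPDP for participants, SEDP for endpoints) lets every participant learn about every other participant and all of their endpoints. Under that assumption, the stored discovery state grows quadratically with the number of participants. Real implementations reduce this with multicast, topic filtering, or discovery servers. The function name and the per-participant endpoint count below are hypothetical, and the model gives only an upper-bound intuition, not a measured cost.

```python
def discovery_state(n_participants: int, endpoints_per_participant: int) -> dict:
    """Simplified full-mesh model of DDS simple discovery (SPDP/SEDP).

    Assumes every participant stores a record for each of the other N-1
    participants and for each of their endpoints. Real stacks prune this.
    """
    n, e = n_participants, endpoints_per_participant
    return {
        "participant_records": n * (n - 1),   # N-1 remote peers per participant
        "endpoint_records": n * (n - 1) * e,  # plus each peer's endpoints
    }


small = discovery_state(10, 5)
large = discovery_state(50, 5)
```

Going from 10 to 50 participants with five endpoints each raises the participant records from 90 to 2450, roughly 27 times more state for a fivefold increase in system size.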

5.3.3. Analysis of Real-World Deployment Environments

Communication requirements in real-world robotic systems vary significantly across application scenarios. Autonomous driving systems, for example, must process a large volume of high-frequency sensor data within the vehicle and therefore typically adopt a hybrid communication architecture. Within onboard computing nodes, Iceoryx provides high-throughput, low-latency data transmission, while DDS handles network communication between functional modules or across devices to ensure reliability and scalability. Kronauer et al. show that in complex distributed robotic systems, communication latency is closely tied to the DDS implementation and its configuration, and that the choice of middleware can significantly affect overall system performance [49].
Paul et al. likewise found that different DDS implementations and configurations significantly affect communication latency and data transmission performance [53]. In their autonomous-driving collaborative perception experiment, they measured round-trip times between ROS 2 nodes and found that latency was significantly influenced by message size, data type, and DDS implementation. Chisăliţă et al. further validated the real-time communication capabilities of the Zenoh protocol in dynamic network environments, showing that it can meet the dual requirements of low latency and high reliability for in-vehicle systems [54]. These deployment cases indicate that no single middleware suits all scenarios; hybrid architectures and requirement-driven selection are key to addressing these challenges.

5.4. Optimization-Based Middleware Selection Method

5.4.1. Problem Formulation

The unified analytical framework introduced in Section 3.4 establishes a quantitative relationship between middleware architectural characteristics and communication performance. Building on this framework, we formulate middleware selection as a performance-aware optimization problem. Let $\mathcal{M} = \{\text{CycloneDDS}, \text{FastDDS}, \text{Shared Memory}, \text{Zero-Copy}\}$ denote the set of candidate communication paths.
For a workload characterized by message size $S$ (bytes), subscriber count $N_{sub}$, and publication rate $R$, the performance model introduced in Section 3.4 provides a set of predicted metrics $F(M_i, S, N_{sub}, R) = \{\hat{L}_i, \hat{C}_i, \hat{P}_i\}$, where
  • $\hat{L}_i$ denotes the predicted end-to-end latency;
  • $\hat{C}_i$ denotes the predicted CPU utilization;
  • $\hat{P}_i$ denotes the predicted packet-loss rate.
Based on the benchmark observations in Section 5.2, the latency model is expressed as:
$$\hat{L}_i(S, N_{sub}) = L_{0,i} + \theta_i N_{sub} + \alpha_i S + \delta_{copy,i}$$
where:
  • $L_{0,i}$ is the baseline latency;
  • $\theta_i$ describes latency growth with increasing subscribers;
  • $\alpha_i$ represents message-size sensitivity;
  • $\delta_{copy,i}$ captures the additional overhead introduced by memory-copy operations.
The model parameters are obtained from the benchmark measurements reported in Tables 5–15. Similarly, CPU utilization and packet-loss behavior are represented by fitted performance functions:
$$\hat{C}_i = f_C(S, N_{sub}), \qquad \hat{P}_i = f_P(S, N_{sub}, R)$$
where $R$ denotes the publication rate.
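The text does not specify how the parameters are fitted; ordinary least squares is one natural choice for the linear latency model. The sketch below assumes NumPy and uses synthetic measurements generated from known coefficients (not the values in Tables 5–15), so that the recovered parameters can be checked. It also makes one practical point explicit: if $\delta_{copy,i}$ is constant for a given middleware, a regression on $S$ and $N_{sub}$ cannot separate it from $L_{0,i}$, so the fitted intercept estimates their sum.

```python
import numpy as np

# Synthetic measurements for ONE middleware (illustrative, not the paper's data).
# S is expressed in KB here, so alpha is a per-KB sensitivity.
true_intercept, true_theta, true_alpha = 50.0, 4.0, 0.02
S = np.array([4, 4, 4, 256, 256, 1024, 1024, 4096], dtype=float)
N = np.array([1, 10, 50, 1, 20, 1, 30, 10], dtype=float)
L = true_intercept + true_theta * N + true_alpha * S  # latency in μs

# Design matrix [1, N_sub, S]. The constant column absorbs L_0 + delta_copy,
# because both are constant per middleware and are not separately identifiable.
X = np.column_stack([np.ones_like(S), N, S])
coef, *_ = np.linalg.lstsq(X, L, rcond=None)
intercept, theta, alpha = coef
```

With real benchmark data, the residuals returned by `lstsq` would indicate how well the linear form fits each middleware.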
Because latency, CPU utilization, and packet-loss rate have different units and scales, direct aggregation is inappropriate. Therefore, each metric is normalized:
$$\tilde{L}_i = \frac{\hat{L}_i - L_{min}}{L_{max} - L_{min}}, \qquad \tilde{C}_i = \frac{\hat{C}_i - C_{min}}{C_{max} - C_{min}}, \qquad \tilde{P}_i = \frac{\hat{P}_i - P_{min}}{P_{max} - P_{min}}$$
where the minimum and maximum values are computed across all candidate middleware for the current workload.
The overall performance cost is then defined as:
$$\mathrm{Cost}_i = w_L \tilde{L}_i + w_C \tilde{C}_i + w_P \tilde{P}_i \quad \text{subject to} \quad w_L + w_C + w_P = 1$$
where $w_L$, $w_C$, and $w_P$ represent application-specific priorities.
The optimal middleware is selected according to:
$$M^* = \arg\min_{M_i \in \mathcal{M}} \mathrm{Cost}_i$$
This formulation transforms middleware selection from a qualitative decision process into a quantitative optimization problem.

5.4.2. Optimization Procedure

Algorithm 1 implements the proposed middleware selection method.
Algorithm 1 Optimization-Based Middleware Selection
Input: message size $S$, subscriber count $N_{sub}$, publication rate $R$, weight vector $(w_L, w_C, w_P)$.
Procedure:
1. For each candidate middleware $M_i \in \mathcal{M}$: compute the predicted latency $\hat{L}_i$, CPU utilization $\hat{C}_i$, and packet-loss rate $\hat{P}_i$.
2. Normalize the predicted metrics to obtain $\tilde{L}_i$, $\tilde{C}_i$, $\tilde{P}_i$.
3. Calculate $\mathrm{Cost}_i$.
4. Select $M^* = \arg\min_{M_i \in \mathcal{M}} \mathrm{Cost}_i$.
Output: selected middleware $M^*$.
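A compact Python sketch of Algorithm 1 follows. The predictor functions stand in for the fitted models $\hat{L}_i$, $\hat{C}_i$, $\hat{P}_i$. The candidate names `mw_A` and `mw_B` and all coefficients are hypothetical placeholders, not values fitted to the benchmark tables. The normalization formula is undefined when all candidates share the same value of a metric, so handling that case by assigning 0 is our assumption.

```python
from typing import Callable, Dict, Tuple

# (predicted latency in μs, predicted CPU utilization in %, predicted loss rate)
Metrics = Tuple[float, float, float]
Predictor = Callable[[float, int, float], Metrics]


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one metric across all candidates."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: a metric on which all candidates tie cannot discriminate
        # between them, so it contributes 0 to every cost (avoids 0/0).
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def select_middleware(
    predictors: Dict[str, Predictor],
    size_bytes: float,
    n_sub: int,
    rate_hz: float,
    weights: Tuple[float, float, float],
) -> Tuple[str, Dict[str, float]]:
    w_l, w_c, w_p = weights
    if abs(w_l + w_c + w_p - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Step 1: predict (L^, C^, P^) for every candidate M_i.
    preds = {name: f(size_bytes, n_sub, rate_hz) for name, f in predictors.items()}
    # Step 2: normalize each metric across the candidates.
    lat = min_max({n: p[0] for n, p in preds.items()})
    cpu = min_max({n: p[1] for n, p in preds.items()})
    loss = min_max({n: p[2] for n, p in preds.items()})
    # Step 3: weighted cost per candidate.
    cost = {n: w_l * lat[n] + w_c * cpu[n] + w_p * loss[n] for n in preds}
    # Step 4: M* = argmin Cost_i.
    return min(cost, key=cost.get), cost


# Hypothetical predictors with made-up coefficients (NOT fitted to Tables 5-15):
# mw_A has the lower baseline latency, mw_B the smaller per-subscriber slope.
predictors: Dict[str, Predictor] = {
    "mw_A": lambda s, n, r: (100.0 + 2.0 * n, 20.0, 0.0),
    "mw_B": lambda s, n, r: (120.0 + 1.0 * n, 15.0, 0.0),
}
w = (0.7, 0.2, 0.1)
best_few, _ = select_middleware(predictors, 4096, 10, 1000, w)
best_many, _ = select_middleware(predictors, 4096, 50, 1000, w)
```

With these placeholder models, the candidate with the lower baseline latency wins at 10 subscribers, while the one with the smaller per-subscriber slope wins at 50, the kind of crossover the framework is meant to capture.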

5.4.3. Retrospective Evaluation Using Benchmark Data

We use the benchmark data from Section 5.2 to evaluate this method retrospectively. The purpose of this evaluation is not to assess generalization to unseen environments, but to determine whether the optimization framework reproduces the middleware-selection decisions implied by the measured benchmark results. For each benchmark scenario in Section 5.2, covering different message sizes, subscriber counts, and publication rates, the fitted performance models are used to estimate $\hat{L}_i$, $\hat{C}_i$, and $\hat{P}_i$ for all candidate middleware. A representative weight configuration $(w_L, w_C, w_P) = (0.7, 0.2, 0.1)$ is adopted to reflect latency-sensitive robotic applications. The evaluation results are as follows:
For small and medium messages (at most 256 KB), the proposed method consistently selects CycloneDDS. This selection agrees with Table 5 and Table 11, where CycloneDDS shows superior average latency and latency stability at these message sizes.
For large messages of at least 4 MB, the method selects FastDDS when network communication is required, while Zero-Copy is chosen for local inter-process communication. This decision matches the experimental results reported in Table 6 and Table 15, where FastDDS exhibits significantly better throughput and lower latency than CycloneDDS for large payloads, and Zero-Copy outperforms both in shared memory scenarios.
In cases involving a large number of subscribers (i.e., 30 or more), the proposed method favors CycloneDDS. This preference is driven by CycloneDDS’s lower latency growth per additional subscriber and reduced CPU overhead, as consistently shown in the scalability measurements of Table 11 and Table 12.
Under high publication rates, the packet-loss penalty in the cost function increases significantly for middleware whose reliability degrades under overload. Consequently, the proposed method avoids configurations associated with excessive packet loss, consistent with the observations reported in Table 13 and Table 14.
Across all benchmark scenarios presented in Section 5.2, the optimization framework consistently selects the experimentally optimal or near-optimal middleware. These results indicate that the proposed formulation successfully captures the primary trade-offs observed in the benchmark measurements.

5.4.4. Discussion

Unlike the static decision tree shown in Figure 9, the proposed method formulates middleware selection as a quantitative optimization problem driven by measurable performance metrics. Instead of relying on heuristic rules, the algorithm minimizes a normalized cost function that explicitly accounts for latency, CPU utilization, and packet loss rate, with application-specified weights reflecting different priorities.
Because the optimization framework is independent of any specific middleware implementation, it generalizes readily: new communication technologies, middleware implementations, or hardware platforms can be incorporated by adding or refitting their performance models, while the core optimization procedure remains unchanged. The method also gives application developers flexibility, since the weighting factors can be adjusted to deployment requirements to trade off latency, resource consumption, and communication reliability.
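The effect of the weighting factors can be explored by sweeping them over a fixed set of normalized metrics. In the hypothetical example below, which uses three made-up candidates rather than benchmark data, shifting weight from CPU to latency moves the selection from the CPU-efficient candidate, through a balanced one, to the lowest-latency one.

```python
from typing import Dict, Tuple

# Hypothetical normalized metrics (L~, C~, P~) for one workload; illustrative only.
NORMALIZED: Dict[str, Tuple[float, float, float]] = {
    "mw_A": (0.0, 1.0, 0.0),  # lowest latency, highest CPU
    "mw_B": (1.0, 0.0, 0.0),  # highest latency, lowest CPU
    "mw_C": (0.4, 0.4, 1.0),  # balanced, but the lossiest under this load
}


def pick(weights: Tuple[float, float, float]) -> str:
    """Return the candidate minimizing w_L*L~ + w_C*C~ + w_P*P~."""
    def cost(name: str) -> float:
        return sum(w * x for w, x in zip(weights, NORMALIZED[name]))
    return min(NORMALIZED, key=cost)


# Sweep w_L from 0 to 1 with w_C = 1 - w_L and w_P = 0 (loss ignored).
sweep = {k / 10: pick((k / 10, 1 - k / 10, 0.0)) for k in range(11)}
```

Giving packet loss a nonzero weight, for instance $(w_L, w_C, w_P) = (0.5, 0.3, 0.2)$, removes the lossy balanced candidate from contention and selects `mw_A` instead.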
The retrospective evaluation demonstrates that the proposed framework extends benchmark observations into a quantitative middleware-selection mechanism. To the best of our knowledge, existing ROS 2 middleware comparison studies primarily focus on benchmarking and qualitative analysis, whereas the proposed approach provides a performance-aware optimization method that supports middleware selection under heterogeneous workload conditions.

6. Future Directions and Research Challenges

As robotic systems continue to grow in scale and application scenarios become increasingly complex, technologies such as cloud computing, AI-driven optimization, and functional safety standards have begun to be integrated into robotic communication in recent years. Building on the system analysis and performance evaluation discussed earlier, this section explores potential future directions for ROS communication systems.

6.1. Cloud-Native Communication Architectures and Deployment Models

As robotic systems gradually evolve toward distributed and cloud-native architectures, significant changes are occurring in communication architectures and deployment models. In an edge-cloud collaboration model, tasks with high real-time requirements are typically executed locally, while computationally intensive tasks are deployed in the cloud. This ensures low latency while enhancing the system’s overall computational capacity. Research by Siriweera et al. indicates that cloud robotics platforms must enable efficient data exchange across cloud environments while balancing system real-time performance and availability [57]. By optimizing network architectures and data flow mechanisms, the efficiency of cloud-edge collaborative communication can be further enhanced [58].
Containerization and microservice deployment provide critical support for ROS-based systems. Encapsulating ROS nodes in containers and using orchestration platforms for automatic scaling and resource isolation can significantly improve maintainability and deployment flexibility. However, the container network layer introduces additional communication overhead: Patel et al. report that communication latency in container environments typically increases by approximately 5–15% compared with bare-metal systems [59]. Data transmission paths must therefore be optimized in practical deployments.

6.2. Intelligence-Driven Adaptive Communication Mechanisms

As the operating environments of robotic systems become increasingly complex, communication mechanisms are evolving from static configurations toward dynamic adaptation. In recent years, research has begun to incorporate machine learning methods to optimize communication parameters. For example, reinforcement learning has been used to adjust QoS configurations, data transmission frequencies, and caching strategies dynamically according to network conditions, balancing latency and reliability [60]. Beyond reinforcement learning, federated learning also offers new insights for the adaptive optimization of communication parameters. Almeida et al. proposed a modular federated learning framework for dynamic edge environments; by enabling collaborative learning among nodes while preserving data privacy, it provides a reference for intelligent adaptive mechanisms in ROS communication systems [61].
Building on this foundation, communication systems have further evolved toward self-optimization. By continuously monitoring metrics such as latency, packet loss rate, and bandwidth utilization, the system can automatically adjust middleware configurations and data transmission paths, enabling functions such as dynamic routing optimization and fault recovery [62]. Such mechanisms help maintain stable system operation in complex network environments and reduce communication and operational costs in large-scale robotic systems.

6.3. Integration of Real-Time Performance and Functional Safety

In safety-critical fields such as autonomous driving, industrial robotics, and medical devices, communication systems must simultaneously meet stringent real-time and functional safety requirements. Standards such as ISO 26262 [55], IEC 61508 [56], and IEC 62304 [63] specify concrete requirements for system design, verification, and safety mechanisms. In ROS 2 systems, compliance with these standards requires deterministic message transmission, delivery of critical messages within specified time frames, and fault detection and recovery mechanisms [64]. To achieve this, the ROS communication framework must undergo rigorous safety verification, covering real-time performance validation, fault detection, and recovery capability testing. Formal verification can be used to analyze the execution of QoS policies and fault recovery processes, thereby enhancing system reliability and ensuring stable operation in safety-critical environments.
Emerging digital twin and extended reality (XR) remote monitoring scenarios also impose stringent real-time requirements on ROS communication systems. Sthapit et al. show that integrating digital twin and XR technologies for remote monitoring relies heavily on low-latency, highly reliable data distribution to keep virtual models synchronized with physical systems in real time [65]. Their work offers new insights into the deployment of ROS 2 communication middleware in virtual reality and remote visualization applications.

6.4. Key Technical Challenges and Research Directions

Although substantial progress has been achieved in ROS communication middleware, several technical challenges remain unresolved in large-scale, distributed, and safety-critical robotic systems. The performance evaluation and middleware analysis presented in this review indicate that future research should focus not only on communication efficiency, but also on deterministic networking, adaptive QoS management, security-performance co-design, and hardware-assisted communication optimization. Table 16 summarizes the major research directions, associated technical bottlenecks, and their significance for next-generation ROS communication systems.
Under dynamic network conditions, static QoS configurations often fail to satisfy the diverse requirements of robotic applications. Future research is expected to focus on intelligent QoS adaptation, where machine learning techniques dynamically adjust parameters such as Reliability, Deadline, History Depth, and Lifespan according to network congestion levels and application demands. However, achieving real-time adaptation while controlling learning overhead and maintaining system stability remains a major technical challenge. For example, Abaza’s latest research proposes an AI-based dynamic covariance method that effectively optimizes the localization accuracy and computational efficiency of ROS 2 in dynamic environments [66].
In safety-critical systems such as autonomous driving and industrial robotics, communication mechanisms must simultaneously meet stringent real-time and security requirements. However, there is often a trade-off between performance overhead and system complexity when balancing real-time communication and security mechanisms. Therefore, how to achieve the co-design of real-time guarantees and security authentication mechanisms within the ROS communication framework remains an important future research direction, closely related to the “real-time and security integration” issue in Table 16.
Another important research direction is the integration of DDS with Time-Sensitive Networking (TSN). While DDS provides flexible Quality of Service management at the middleware layer, it cannot independently guarantee deterministic communication under all network conditions. TSN extends standard Ethernet through time synchronization and traffic scheduling mechanisms, enabling bounded latency and reduced communication jitter. Future research should focus on the efficient mapping between DDS QoS policies and TSN traffic classes, as well as end-to-end determinism across heterogeneous robotic networks.
As robotic applications increasingly generate high-bandwidth sensor streams, communication processing is becoming a major system bottleneck. Hardware-accelerated communication technologies, including RDMA, FPGA-based acceleration, SmartNICs, and Data Processing Units (DPUs), offer promising solutions by reducing serialization overhead, memory-copy operations, and CPU utilization. Future work should explore the integration of hardware acceleration with shared-memory transport and zero-copy communication mechanisms to further improve throughput and scalability in large-scale distributed robotic systems.
Future ROS communication systems are expected to evolve toward the co-design of middleware, network infrastructure, and intelligent optimization mechanisms. Key research challenges include achieving deterministic communication through DDS–TSN integration, enabling adaptive QoS management under dynamic network conditions, reducing the performance overhead introduced by security mechanisms, and leveraging hardware acceleration to eliminate communication bottlenecks. Addressing these challenges will be essential for supporting next-generation robotic applications that require large-scale collaboration, cloud-edge integration, functional safety, and real-time performance guarantees.

6.5. Recommendations for Communication Solutions for Different Application Scenarios

Based on the analytical model and optimization framework proposed in Section 5.4, middleware selection can be turned into a structured decision procedure. Figure 9 presents the implementation workflow of the proposed optimization-based middleware selection method, serving as an engineering realization of the optimization model introduced in Section 5.4. The workflow integrates key factors, including safety requirements, communication scope, real-time constraints, resource limitations, and system scale, and translates the optimization results into deployable middleware configurations for representative robotic application scenarios under different operational constraints.
Because the workflow is derived from the constrained optimization formulation in Section 5.4, it accounts for communication latency requirements, safety certification requirements, network scope, computational resources, power constraints, and system scale, and it offers developers an interpretable way to translate performance requirements into middleware deployment decisions. For example, industrial automation typically requires strict real-time performance, reliability, and functional safety, so mature DDS implementations such as CycloneDDS or RTI Connext are generally preferred. RTI Connext DDS additionally supports security mechanisms and industrial deployment requirements, making it suitable for safety-critical applications. The DDS protocol provides stable and reliable real-time communication with significant advantages in industrial robot systems, consistent with the findings of Macenski et al. [67].
In service robotics or cloud robotics applications, systems place greater emphasis on development efficiency and scalability [68], so solutions such as FastDDS or Zenoh offer certain advantages. Sasaki et al. built a system architecture comprising ROS robots, edge nodes, and cloud servers and experimentally validated the scalable communication capabilities of ROS in a cloud-edge collaborative environment [69].
In autonomous driving systems, large amounts of high-frequency sensor data must be processed inside the vehicle, so a hybrid communication architecture is suitable: shared-memory communication between onboard nodes achieves lower latency, while DDS-based network protocols handle inter-vehicle or vehicle-to-infrastructure (V2I) communication. Combined with real-time path planning and environmental perception algorithms, ROS can effectively support the high-frequency data processing requirements of autonomous driving systems [70].
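The hybrid split described above amounts to a per-link transport decision. The sketch below encodes it as a small Python function. The `Endpoint` type, the transport labels, and the 1 MB threshold are hypothetical illustrations, loosely motivated by the large-message results in Section 5.2.6. In practice, transports are configured in the middleware rather than chosen per message by application code.

```python
from dataclasses import dataclass

# Hypothetical threshold above which a payload counts as "large".
LARGE_MSG_BYTES = 1 << 20  # 1 MB


@dataclass(frozen=True)
class Endpoint:
    host: str  # machine the node runs on, e.g. an onboard computer or a roadside unit


def choose_transport(pub: Endpoint, sub: Endpoint, msg_bytes: int) -> str:
    """Hypothetical hybrid policy: network DDS across hosts, shared memory within one."""
    if pub.host != sub.host:
        return "dds_network"  # inter-vehicle or V2I traffic leaves the host
    if msg_bytes >= LARGE_MSG_BYTES:
        return "zero_copy"    # large onboard data such as images or point clouds
    return "dds_local"        # small onboard messages over standard local DDS


ecu = Endpoint("onboard-ecu")
rsu = Endpoint("roadside-unit")
```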
In resource-constrained embedded systems, memory usage, CPU overhead, and power consumption must be minimized. Therefore, lightweight DDS implementations or micro-ROS are generally more suitable for deployment on microcontrollers and embedded platforms. The MQTT-based remote robot control framework proposed by Lertyosbordin et al. effectively reduces the resource consumption of the communication framework and improves application efficiency in embedded systems [71].

7. Conclusions

This paper provides an in-depth analysis of the ROS communication system, exploring its architectural evolution, key technologies, and performance characteristics. By comparing the communication mechanisms of ROS 1 and ROS 2, it is evident that the ROS communication architecture has evolved from a centralized model to a distributed communication model based on DDS. This transformation not only enhances the system’s scalability and real-time performance but also improves its adaptability in large-scale collaborative scenarios. By adopting the DDS standard, ROS 2 enables efficient, low-latency data exchange between nodes. Its advantages are particularly evident in edge computing and cloud-edge collaboration environments. Through the introduction of the RMW abstraction layer, ROS 2 successfully decouples the communication middleware, allowing for seamless integration of different DDS implementations and emerging technologies, thereby providing ample room for the system’s continuous evolution.
Most importantly, to overcome the shortcomings of previous comparative studies that relied on heterogeneous literature data, this paper provides an original, unified performance benchmark on a controlled experimental platform. Our experiments systematically quantified the performance of CycloneDDS and FastDDS in five key dimensions (latency scaling, peak throughput, jitter, scalability, and packet loss at different publication rates) and additionally evaluated shared-memory and zero-copy transport for large messages. The main findings are as follows: CycloneDDS exhibits excellent latency and scalability for small and medium-sized messages, while FastDDS shows better throughput for very large messages; jitter analysis reveals the limitations of both implementations for real-time tasks with messages of 4 MB or more; and, compared with standard DDS for 8 MB messages, zero-copy reduces latency by 56% and CPU usage by 55%.
In addition, this paper proposes an optimization-based middleware selection method that integrates experimentally derived performance characteristics with workload requirements. Compared with conventional rule-based selection guidelines, the proposed method enables quantitative middleware selection and provides a practical approach for communication optimization in ROS 2 systems.

Author Contributions

Conceptualization, Z.W. and H.Y.; resources, Z.W., H.Y., H.X. and Z.D.; writing—original draft preparation, Z.W. and H.Y.; writing—review and editing, Z.W., H.Y., H.X. and Z.D.; supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by Sichuan Provincial Engineering Research Center of Smart Operation and Maintenance of Civil Aviation Airports under grant No. JCZX2024ZZ16, and the Fundamental Research Funds for the Central Universities under grant Nos. 26CAFUC01008, 26CAFUC01010, 26CAFUC03048, 26CAFUC03077, 25CAFUC03036, 25CAFUC03037, and 25CAFUC09010.

Data Availability Statement

No new data were generated in this paper.

Acknowledgments

We thank the reviewers for their helpful remarks that strengthened this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Betz, T.; Schmeller, M.; Teper, H.; Betz, J. How Fast is My Software? Latency Evaluation for a ROS 2 Autonomous Driving Software. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–6.
  2. Teper, H.; Günzel, M.; Ueter, N.; von der Brüggen, G.; Chen, J.J. End-To-End Timing Analysis in ROS2. In Proceedings of the 2022 IEEE Real-Time Systems Symposium (RTSS), Houston, TX, USA, 5–8 December 2022; pp. 53–65.
  3. Macenski, S.; Moore, T.; Lu, D.V.; Merzlyakov, A.; Ferguson, M. From the desks of ROS maintainers: A survey of modern and capable mobile robotics algorithms in the robot operating system 2. Robot. Auton. Syst. 2023, 168, 104493.
  4. Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A. ROS: An open-source Robot Operating System. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)—Workshop on Open Source Software, Kobe, Japan, 12–17 May 2009; Volume 3.
  5. Park, J.; Delgado, R.; Choi, B.W. Real-Time Characteristics of ROS 2.0 in Multiagent Robot Systems: An Empirical Study. IEEE Access 2020, 8, 154637–154651.
  6. Bode, V.; Buettner, D.; Preclik, T.; Trinitis, C.; Schulz, M. Systematic Analysis of DDS Implementations. In Proceedings of the 24th ACM/IFIP International Middleware Conference (Middleware ’23), Bologna, Italy, 11–15 December 2023; pp. 234–246.
  7. Alaerjan, A.; Kim, D.-K.; Ming, H.; Kim, H. Configurable DDS as Uniform Middleware for Data Communication in Smart Grids. Energies 2020, 13, 1839.
  8. Profanter, S.; Tekat, A.; Dorofeev, K.; Rickert, M.; Knoll, A. OPC UA versus ROS, DDS, and MQTT: Performance Evaluation of Industry 4.0 Protocols. In Proceedings of the 2019 IEEE International Conference on Industrial Technology (ICIT), Melbourne, Australia, 13–15 February 2019; pp. 955–962.
  9. AL-Madani, B.; Elkhider, S.M.; El-Ferik, S. DDS-Based Containment Control of Multiple UAV Systems. Appl. Sci. 2020, 10, 4572.
  10. Mouradian, C.; Naboulsi, D.; Yangui, S.; Glitho, R.H.; Morrow, M.J.; Polakos, P.A. A Comprehensive Survey on Fog Computing: State-of-the-Art and Research Challenges. IEEE Commun. Surv. Tutor. 2018, 20, 416–464.
  11. Raj, R.; Kos, A. A Comprehensive Study of Mobile Robot: History, Developments, Applications, and Future Research Perspectives. Appl. Sci. 2022, 12, 6951.
  12. Camargo, C.; Gonçalves, J.; Conde, M.Á.; Rodríguez-Sedano, F.J.; Costa, P.; García-Peñalvo, F.J. Systematic Literature Review of Realistic Simulators Applied in Educational Robotics Context. Sensors 2021, 21, 4031.
  13. Duan, F.; Li, W.; Tan, Y. ROS Debugging. In Intelligent Robot: Implementation and Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 71–92.
  14. Puck, L.; Keller, P.; Schnell, T.; Plasberg, C.; Tanev, A.; Heppner, G.; Roennau, A.; Dillmann, R. Performance Evaluation of Real-Time ROS2 Robotic Control in a Time-Synchronized Distributed Network. In Proceedings of the 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, France, 23–27 August 2021; pp. 1670–1676.
  15. Gambo, M.L.; Danasabe, A.; Almadani, B.; Aliyu, F.; Aliyu, A.; Al-Nahari, E. A Systematic Literature Review of DDS Middleware in Robotic Systems. Robotics 2025, 14, 63.
  16. Liu, R.; Zheng, J.; Luan, T.H.; Gao, L.; Hui, Y.; Xiang, Y.; Dong, M. ROS-Based Collaborative Driving Framework in Autonomous Vehicular Networks. IEEE Trans. Veh. Technol. 2023, 72, 6987–6999.
  17. Pico, N.; Mite, G.; Morán, D.; Alvarez-Alvarado, M.S.; Auh, E.; Moon, H. Web-Based Real-Time Alarm and Teleoperation System for Autonomous Navigation Failures Using ROS 1 and ROS 2. Actuators 2025, 14, 164.
  18. Yang, S.; Guo, J.; Rui, X. Formal Analysis and Detection for ROS2 Communication Security Vulnerability. Electronics 2024, 13, 1762.
  19. Fernandez, J.; Allen, B.; Thulasiraman, P.; Bingham, B. Performance Study of the Robot Operating System 2 with QoS and Cyber Security Settings. In Proceedings of the 2020 IEEE International Systems Conference (SysCon), Montreal, QC, Canada, 24–27 August 2020; pp. 1–6.
  20. Bode, V.; Trinitis, C.; Schulz, M.; Buettner, D.; Preclik, T. DDS Implementations as Real-Time Middleware—A Systematic Evaluation. In Proceedings of the 2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Niigata, Japan, 30 August–1 September 2023; pp. 186–195.
  21. Alaerjan, A. Formalizing the Semantics of DDS QoS Policies for Improved Communications in Distributed Smart Grid Applications. Electronics 2023, 12, 2246.
  22. Ho, M.-H.; Lai, M.-Y.; Liu, Y.-T. Implementation of DDS Cloud Platform for Real-Time Data Acquisition of Sensors for a Legacy Machine. Electronics 2022, 11, 2096.
  23. Papavasileiou, A.; Nikoladakis, S.; Basamakis, F.P.; Aivaliotis, S.; Michalos, G.; Makris, S. A Voice-Enabled ROS2 Framework for Human–Robot Collaborative Inspection. Appl. Sci. 2024, 14, 4138.
  24. D’Avella, S.; Avizzano, C.A.; Tripicchio, P. ROS-Industrial based robotic cell for Industry 4.0: Eye-in-hand stereo camera and visual servoing for flexible, fast, and accurate picking and hooking in the production line. Robot. Comput.-Integr. Manuf. 2023, 80, 102453.
  25. Kang, Z.; Dubey, A. Evaluating DDS, MQTT, and ZeroMQ Under Different IoT Traffic Conditions. Available online: http://www.dre.vanderbilt.edu/~gokhale/WWW/papers/M4IoT2020.pdf (accessed on 21 March 2024).
  26. Zhang, J.; Keramat, F.; Yu, X.; Hernández, D.M.; Queralta, J.P.; Westerlund, T. Distributed Robotic Systems in the Edge-Cloud Continuum with ROS 2: A Review on Novel Architectures and Technology Readiness. In Proceedings of the 2022 Seventh International Conference on Fog and Mobile Edge Computing (FMEC), Paris, France, 12–15 December 2022; pp. 1–8.
  27. Chen, K.; Wang, M.; Gualtieri, M.; Tian, N.; Juette, C.; Ren, L.; Ichnowski, J.; Kubiatowicz, J.; Goldberg, K. FogROS2-LS: A location-independent fog robotics framework for latency sensitive ROS 2 applications. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 10581–10587.
  28. Corsaro, A.; Cominardi, L.; Hecart, O.; Baldoni, G.; Avital, J.E.P.; Loudet, J.; Guimares, C.; Ilyin, M.; Bannov, D. Zenoh: Unifying communication, storage and computation from the cloud to the microcontroller. In Proceedings of the 2023 26th Euromicro Conference on Digital System Design (DSD), Golem, Albania, 6–8 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 422–428.
  29. Baldoni, G.; Loudet, J.; Cominardi, L.; Corsaro, A.; He, Y. Facilitating distributed data-flow programming with Eclipse Zenoh: The ERDOS case. In Proceedings of the 1st Workshop on Serverless mobile networking for 6G Communications, Virtual, 25 June 2021; pp. 13–14.
  30. Zhang, J.; Yu, X.; Sier, H.; Queralta, J.P.; Westerlund, T. Comparison of DDS, MQTT, and Zenoh in Edge-to-Edge and Edge-to-Cloud Communication for Distributed ROS 2 Systems. arXiv 2023, arXiv:2309.07496. Available online: https://api.semanticscholar.org/CorpusID:261823232 (accessed on 27 November 2024).
  31. Chovet, L.; Garcia, G.; Bera, A.; Richard, A.; Yoshida, K.; Olivares-Mendez, M. Performance Comparison of ROS2 Middlewares for Multi-Robot Mesh Networks in Planetary Exploration. J. Intell. Robot. Syst. 2025, 111, 18.
  32. Pöhnl, M.; Tamisier, A.; Blass, T. A Middleware Journey from Microcontrollers to Microprocessors. In Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 14–23 March 2022; pp. 282–286.
  33. Dehnavi, S.; Koedam, M.; Nelson, A.; Goswami, D.; Goossens, K. CompROS: A composable ROS 2 based architecture for real-time embedded robotic development. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 6449–6455.
  34. Ichnowski, J.; Chen, K.; Dharmarajan, K.; Adebola, S.; Danielczuk, M.; Mayoral-Vilches, V.; Jha, N.; Zhan, H.; Llontop, E.; Xu, D.; et al. FogROS2: An Adaptive Platform for Cloud and Fog Robotics Using ROS 2. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 5493–5500.
  35. Fu, L.; Kapoor, G.; Militano, L.; Carughi, G.T.; Bohnert, T.M. IoRT ROS 2 applications: Evaluating Zenoh and VPN for robotic networking in the edge-cloud continuum. In Proceedings of the 2025 IEEE Symposium on Computers and Communications (ISCC), Bologna, Italy, 2–5 July 2025; pp. 1–6.
  36. Carreira, R.; Costa, N.; Ramos, J.; Frazão, L.; Pereira, A. A ROS2-Based Gateway for Modular Hardware Usage in Heterogeneous Environments. Sensors 2024, 24, 6341.
  37. Kaushik, S.; Poonia, R.C.; Khatri, S.K. Comparative study of various protocols of DDS. J. Stat. Manag. Syst. 2017, 20, 647–658.
  38. Yang, Y.; Azumi, T. Exploring Real-Time Executor on ROS 2. In Proceedings of the 2020 IEEE International Conference on Embedded Software and Systems (ICESS), Shanghai, China, 10–11 December 2020; pp. 1–8.
  39. Wang, G.; Zhang, C.; Liu, S.; Zhao, Y.; Zhang, Y.; Wang, L. Multi-robot collaborative manufacturing driven by digital twins: Advancements, challenges, and future directions. J. Manuf. Syst. 2025, 82, 333–361.
  40. Basu, S.; Baert, M.; Hoebeke, J. QoS Enabled Heterogeneous BLE Mesh Networks. J. Sens. Actuator Netw. 2021, 10, 24.
  41. Tang, Y.; Feng, Z.; Guan, N.; Jiang, X.; Lv, M.; Deng, Q.; Yi, W. Response time analysis and priority assignment of processing chains on ROS 2 executors. In Proceedings of the 2020 IEEE Real-Time Systems Symposium (RTSS), Houston, TX, USA, 1–4 December 2020; pp. 231–243.
  42. Wang, Z.; Liu, S.; Ji, D.; Yi, W. Improving Real-Time Performance of Micro-ROS with Priority-Driven Chain-Aware Scheduling. Electronics 2024, 13, 1658.
  43. Wang, Y.P.; Tan, W.; Hu, X.Q.; Manocha, D.; Hu, S.M. TZC: Efficient Inter-Process Communication for Robotics Middleware with Partial Serialization. arXiv 2020, arXiv:1810.00556.
  44. Ishikawa-Aso, T.; Kato, S. ROS 2 Agnocast: Supporting unsized message types for true zero-copy publish/subscribe IPC. In Proceedings of the 2025 28th International Symposium on Real-Time Distributed Computing (ISORC), Toulouse, France, 26–28 May 2025; pp. 1–10.
  45. Bedard, C.; Lutkebohle, I.; Dagenais, M. ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2. IEEE Robot. Autom. Lett. 2022, 7, 6511–6518.
  46. Ye, Y.; Nie, Z.; Liu, X.; Xie, F.; Li, Z.; Li, P. ROS2 Real-time Performance Optimization and Evaluation. Chin. J. Mech. Eng. 2023, 36, 144.
  47. Ferrari, E.; Morato, A.; Tramarin, F.; Zunino, C.; Bertocco, M. Quantifying the Trajectory Tracking Accuracy in UGVs: The Role of Traffic Scheduling in Wi-Fi-Enabled Time-Sensitive Networking. Sensors 2026, 26, 881.
  48. Brekke, V.; Berge, E.O.; Dybdahl, E.; Singh, J.; Tyapin, I. ROS 2-Driven Navigation and Sensor Platform for Quadruped Robots. Robotics 2026, 15, 70.
  49. Kronauer, T.; Pohlmann, J.; Matthé, M.; Smejkal, T.; Fettweis, G. Latency Analysis of ROS2 Multi-Node Systems. In Proceedings of the 2021 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Karlsruhe, Germany, 23–25 September 2021; pp. 1–7.
  50. Liang, W.-Y.; Yuan, Y.; Lin, H.-J. A performance study on the throughput and latency of Zenoh, MQTT, Kafka, and DDS. arXiv 2023, arXiv:2303.09419. Available online: https://arxiv.org/abs/2303.09419 (accessed on 10 April 2026).
  51. Orr, J.; Dutta, A. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey. Sensors 2023, 23, 3625.
  52. Romeh, A.E.; Mirjalili, S.; Gul, F. Hybrid Vulture-Coordinated Multi-Robot Exploration: A Novel Algorithm for Optimization of Multi-Robot Exploration. Mathematics 2023, 11, 2474.
  53. Paul, S.; Lephuoc, D.; Hauswirth, M. Performance Evaluation of ROS2-DDS middleware implementations facilitating Cooperative Driving in Autonomous Vehicle. arXiv 2024, arXiv:2412.07485.
  54. Chisăliţă, A.-I.; Korodi, A. Stepping toward Zenoh protocol in automotive scenarios. IEEE Access 2025, 13, 166167–166180.
  55. ISO 26262; Road Vehicles—Functional Safety. International Organization for Standardization (ISO): Geneva, Switzerland, 2018.
  56. IEC 61508; Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2010.
  57. Siriweera, A.; Naruse, K. Survey on Cloud Robotics Architecture and Model-Driven Reference Architecture for Decentralized Multicloud Heterogeneous-Robotics Platform. IEEE Access 2021, 9, 40521–40539.
  58. Song, B.-Y.; Choi, H. ROS Gateway: Enhancing ROS Availability across Multiple Network Environments. Sensors 2024, 24, 6297.
  59. Patel, D.; Maiti, C.; Muthuswamy, S. Real-Time Performance Monitoring of a CNC Milling Machine using ROS 2 and AWS IoT Towards Industry 4.0. In Proceedings of the IEEE EUROCON 2023—20th International Conference on Smart Technologies, Torino, Italy, 6 July 2023.
  60. Licardo, J.; Domjan, M.; Orehovački, T. Intelligent Robotics—A Systematic Review of Emerging Technologies and Trends. Electronics 2024, 13, 542.
  61. Almeida, L.; Teixeira, R.; Baldoni, G.; Antunes, M.; Aguiar, R.L. Federated Learning for a Dynamic Edge: A Modular and Resilient Approach. Sensors 2025, 25, 3812.
  62. Shamaine, C.X.E.; Qiao, Y.; Henry, J.; McNevin, K.; Murray, N. RoSTAR: ROS-based telerobotic control via augmented reality. In Proceedings of the 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 21–24 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  63. IEC 62304; Medical Device Software—Software Life Cycle Processes. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2006.
  64. Bonci, A.; Gaudeni, F.; Giannini, M.C.; Longhi, S. Robot Operating System 2 (ROS2)-Based Frameworks for Increasing Robot Autonomy: A Survey. Appl. Sci. 2023, 13, 12796.
  65. Sthapit, I.; Olbina, S. Digital Twin (DT) and Extended Reality (XR) in the Construction Industry: A Systematic Literature Review. Buildings 2026, 16, 517.
  66. Abaza, B.F. AI-Driven Dynamic Covariance for ROS 2 Mobile Robot Localization. Sensors 2025, 25, 3026.
  67. Macenski, S.; Foote, T.; Gerkey, B.; Lalancette, C.; Woodall, W. Robot Operating System 2: Design, architecture, and uses in the wild. Sci. Robot. 2022, 7, eabm6074.
  68. Garzón, M. Using ROS in Multi-robot Systems: Experiences and Lessons Learned from Real-World Field Tests. In Robot Operating System (ROS); Koubaa, A., Ed.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2017; Volume 707.
  69. Sasaki, R.; Takefusa, A.; Nakada, H.; Oguchi, M. Development and Evaluation of IoT System Consisting of ROS-based Robot, Edge and Cloud. In Proceedings of the 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), Torino, Italy, 26 June 2023.
  70. Megalingam, R.K.; Rajendraprasad, A.; Manoharan, S.K. Comparison of Planned Path and Travelled Path Using ROS Navigation Stack. In Proceedings of the 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 5–7 June 2020; IEEE: Belgaum, India, 2020; pp. 1–6.
  71. Lertyosbordin, C.; Wongsanont, D.; Khurukitwanit, N.; Saowapark, W. A Framework for Remote Robot Actuation using ROS Integrated with MQTT. In Proceedings of the 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), Okinawa, Japan, 2 July 2024.
Figure 1. Research Methodology Framework.
Figure 2. ROS 1 Centralized Communication Architecture.
Figure 3. ROS 2 Decentralized DDS-Based Communication Architecture.
Figure 4. ROS 2 Layered Communication Interface with DDS Implementations.
Figure 5. ROS 2 Communication Middleware Taxonomy Framework.
Figure 6. Zenoh Unified Data Plane for Edge-Cloud Continuum.
Figure 7. Iceoryx Zero-Copy Shared-Memory Communication Mechanism.
Figure 8. CompROS Mixed-Criticality System Architecture.
Figure 9. Optimization-Based Middleware Selection Procedure. The safety-critical decision branch is based on the functional safety requirements specified in ISO 26262 and IEC 61508 [55,56].
Table 1. Sensor-Specific DDS QoS Policy Recommendations for ROS 2.

| Sensor Type | Reliability | History | Depth | Deadline | Priority | Lifespan | Performance Impact |
|---|---|---|---|---|---|---|---|
| LiDAR PointCloud | best_effort | keep_last = 1 | 1 | 100 ms | HIGH | infinite | 60–80% latency reduction |
| RGB Camera (30 fps) | best_effort | keep_last = 5 | 5 | 33 ms | HIGH | 5 s | Zero copy eliminates serialization |
| IMU (100 Hz) | reliable | keep_last = 10 | 10 | 10 ms | CRITICAL | 1 s | <5 ms jitter |
| Depth Camera | best_effort | keep_last = 1 | 1 | 100 ms | HIGH | infinite | 2× throughput |
| Tactile Sensor | reliable | keep_all | 10 | Not applicable | MEDIUM | infinite | Late-joiner support |
| ToF Sensor | best_effort | keep_last = 1 | 1 | 50 ms | MEDIUM | 3 s | Bandwidth optimization |
| Radar (77 GHz) | best_effort | keep_last = 2 | 2 | 50 ms | HIGH | 10 s | Real-time detection |
| Sonar | reliable | keep_last = 3 | 3 | 200 ms | MEDIUM | infinite | Reliable obstacle avoidance |
| GPS | reliable | keep_last = 1 | 1 | 1 s | LOW | infinite | Global positioning |
| Magnetometer | reliable | keep_last = 5 | 5 | 100 ms | MEDIUM | infinite | Orientation stability |

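The Deadline column of Table 1 is easiest to interpret against each sensor's publish period: under the DDS DEADLINE policy, a missed-deadline status is raised when no new sample arrives within the configured duration. The minimal Python sketch below compares the two for the rows whose nominal rate is stated in the row label (RGB camera at 30 fps, IMU at 100 Hz). The `SensorRow` type and the `deadline_margin_ms` helper are hypothetical illustrations, not part of ROS 2 or any DDS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorRow:
    """One Table 1 row reduced to the fields needed for a deadline check (hypothetical type)."""
    name: str
    rate_hz: float      # nominal publish rate taken from the row label
    deadline_ms: float  # value of the Deadline column


def publish_period_ms(rate_hz: float) -> float:
    """Nominal time between consecutive samples at a fixed publish rate."""
    return 1000.0 / rate_hz


def deadline_margin_ms(row: SensorRow) -> float:
    """Deadline minus publish period.

    A positive margin leaves room for jitter; zero leaves none, and a negative
    value means samples published exactly on schedule already exceed the deadline.
    """
    return row.deadline_ms - publish_period_ms(row.rate_hz)


rows = [
    SensorRow("RGB Camera (30 fps)", rate_hz=30.0, deadline_ms=33.0),
    SensorRow("IMU (100 Hz)", rate_hz=100.0, deadline_ms=10.0),
]

for row in rows:
    print(f"{row.name}: period {publish_period_ms(row.rate_hz):.2f} ms, "
          f"margin {deadline_margin_ms(row):+.2f} ms")
```

With these inputs the IMU row has zero margin and the 30 fps camera row a slightly negative one (a 33 ms deadline against a 33.3 ms period), which suggests leaving some headroom above the nominal period when this policy is configured in practice.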
Table 2. Comparison of Communication Mechanisms and System Characteristics of ROS Communication Middleware.

| Middleware | Primary Application Scenarios | Communication Model | Data Transfer Path | Key Advantages | Limitations |
|---|---|---|---|---|---|
| DDS | Distributed robotic systems | Publish-Subscribe | Network transport | Mature standard, rich QoS, supports distributed systems | Complex configuration |
| Zenoh | Edge-cloud collaboration | Publish-Subscribe/Query | Routed forwarding | Flexible routing, unified data space | Lack of standardization |
| Iceoryx | Local high-performance communication | Publish-Subscribe over shared memory | Local zero-copy | Zero-copy, efficient IPC | Local only |

Table 3. Comparison of QoS Features Across Different Communication Middleware.

| Middleware | QoS Support Level | QoS Control Dimensions | Control Granularity | Design Focus | Typical Applications |
|---|---|---|---|---|---|
| DDS | High | Reliability, durability, history, resource limits, etc. | Fine-grained (multiple configurable policies) | Real-time and communication reliability assurance | Distributed robotic systems |
| Zenoh | Medium | Data distribution strategy, routing control, query mechanism | Medium (focus on data access and distribution) | Cross-network data routing and unified data access | Edge-Cloud communication |
| Iceoryx | Low | No explicit QoS mechanism (relies on shared memory) | Coarse-grained (system-level control) | Improves local communication performance (low latency/high throughput) | Local high-bandwidth communication |

Table 4. Standardized Configuration of the Basic Experimental Environment.

| Parameter | Fixed Value | Description |
|---|---|---|
| Executor | rclcpp-single-threaded-executor | Single-threaded execution mode |
| Publisher | 1 | Single publisher |
| Subscriber | 1 | Single subscriber |
| QoS Reliability | BEST_EFFORT | Best-effort reliability policy |
| QoS Durability | VOLATILE | Volatile durability policy |
| QoS History | KEEP_LAST, depth = 16 | Keep the last 16 messages |
| Rate | 1000 Hz | Message publishing frequency |
| Runtime | 30 s | Duration of each experimental run |
| DDS Domain ID | 0 | DDS domain ID |
| Topic Name | test_topic | Test topic name |
| Round-trip Mode | None | Round-trip mode disabled |

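Two quantities implied by Table 4 help when reading the later results: the publish period at the configured rate, and how much wall-clock time the KEEP_LAST history can hold. The sketch below derives them from the table values; `BenchmarkConfig` is a hypothetical container and not the configuration format of any particular benchmarking tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Fixed parameters from Table 4 (hypothetical container; values taken from the table)."""
    rate_hz: int = 1000
    runtime_s: int = 30
    history_depth: int = 16

    @property
    def period_ms(self) -> float:
        # Time between consecutive publications at the configured rate.
        return 1000.0 / self.rate_hz

    @property
    def nominal_samples_per_run(self) -> int:
        # Samples a publisher would emit if it sustained the rate for the whole run.
        return self.rate_hz * self.runtime_s

    @property
    def history_window_ms(self) -> float:
        # KEEP_LAST retains only the newest `history_depth` samples, so at a
        # steady rate the history covers depth * period of wall-clock time.
        return self.history_depth * self.period_ms


cfg = BenchmarkConfig()
print(cfg.period_ms, cfg.nominal_samples_per_run, cfg.history_window_ms)
```

At 1000 Hz a depth of 16 therefore corresponds to about 16 ms of buffered samples; a subscriber that falls further behind than that has its oldest unread samples replaced rather than queued.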
Table 5. CycloneDDS Latency Test Results.

| Message Size | Avg Min Latency (μs) | Avg Max Latency (μs) | Avg Mean Latency (μs) | Avg Variance |
|---|---|---|---|---|
| 64 B | 17.14 | 991.90 | 47.65 | 6.30 × 10⁻⁹ |
| 256 B | 21.85 | 649.46 | 43.51 | 1.25 × 10⁻⁹ |
| 1 KB | 17.48 | 401.88 | 31.92 | 4.62 × 10⁻¹⁰ |
| 16 KB | 18.19 | 296.87 | 32.54 | 3.02 × 10⁻¹⁰ |
| 256 KB | 40.00 | 588.18 | 67.46 | 1.27 × 10⁻⁹ |
| 1 MB | 180.96 | 727.49 | 202.45 | 2.17 × 10⁻⁹ |
| 4 MB | 6715.81 | 13,459.52 | 9103.80 | 1.26 × 10⁻⁴ |
| 8 MB | 17,269.69 | 24,828.47 | 19,730.61 | 3.57 × 10⁻⁴ |

Table 6. FastDDS Latency Test Results.

| Message Size | Avg Min Latency (μs) | Avg Max Latency (μs) | Avg Mean Latency (μs) | Avg Variance |
|---|---|---|---|---|
| 64 B | 28.19 | 417.89 | 48.54 | 5.27 × 10⁻¹⁰ |
| 256 B | 24.86 | 362.75 | 44.49 | 4.28 × 10⁻¹⁰ |
| 1 KB | 25.75 | 406.49 | 42.64 | 4.28 × 10⁻¹⁰ |
| 16 KB | 28.78 | 432.01 | 48.31 | 4.23 × 10⁻¹⁰ |
| 256 KB | 52.47 | 700.80 | 82.55 | 1.43 × 10⁻⁹ |
| 1 MB | 174.96 | 729.69 | 203.70 | 2.13 × 10⁻⁹ |
| 4 MB | 1100.17 | 4781.69 | 2184.57 | 2.06 × 10⁻⁵ |
| 8 MB | 14,738.22 | 24,873.73 | 18,398.29 | 3.52 × 10⁻⁴ |

Table 7. CycloneDDS Throughput Test Results.

| Message Size | Num_Samples_Received | Total_Data_Received (MB) | Num_Samples_Sent | Num_Samples_Lost | ThroughPut_MB_s (MB/s) | ThroughPut_Mbit_s (Mbit/s) | Sample_Throughput (samples/s) |
|---|---|---|---|---|---|---|---|
| 64 B | 266,490 | 17.06 | 748,329 | 481,839 | 0.542 | 4.548 | 8883 |
| 256 B | 254,324 | 65.11 | 638,245 | 383,921 | 2.070 | 17.362 | 8477 |
| 1 KB | 219,913 | 225.19 | 467,857 | 247,944 | 7.159 | 60.051 | 7330 |
| 16 KB | 167,447 | 2743.46 | 192,548 | 25,101 | 87.212 | 731.589 | 5582 |
| 256 KB | 29,409 | 7709.51 | 29,410 | 1 | 245.079 | 2055.869 | 980 |
| 1 MB | 3168 | 3322.49 | 8172 | 5004 | 105.619 | 885.998 | 106 |
| 4 MB | 955 | 4009.04 | 2527 | 1572 | 127.444 | 1069.078 | 32 |
| 8 MB | 438 | 3678.74 | 802 | 364 | 116.944 | 980.997 | 15 |

Table 8. FastDDS Throughput Test Results.

| Message Size | Num_Samples_Received | Total_Data_Received (MB) | Num_Samples_Sent | Num_Samples_Lost | ThroughPut_MB_s (MB/s) | ThroughPut_Mbit_s (Mbit/s) | Sample_Throughput (samples/s) |
|---|---|---|---|---|---|---|---|
| 64 B | 45,234 | 2.89 | 297,879 | 252,645 | 0.092 | 0.772 | 1508 |
| 256 B | 44,427 | 11.37 | 308,092 | 263,665 | 0.362 | 3.033 | 1481 |
| 1 KB | 42,503 | 43.52 | 298,949 | 256,446 | 1.384 | 11.606 | 1417 |
| 16 KB | 36,520 | 598.35 | 188,087 | 151,567 | 19.021 | 159.560 | 1217 |
| 256 KB | 32,399 | 8493.40 | 34,511 | 2112 | 269.998 | 2264.906 | 1080 |
| 1 MB | 3919 | 4109.73 | 10,005 | 6086 | 130.645 | 1095.928 | 131 |
| 4 MB | 871 | 3653.74 | 2481 | 1610 | 116.149 | 974.331 | 29 |
| 8 MB | 426 | 3582.54 | 875 | 449 | 113.886 | 955.345 | 14 |

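The derived columns of Tables 7 and 8 can be recomputed from the raw sample counts. The sketch below assumes binary message sizes (1 KB = 1024 B) and the 30 s runtime of Table 4; under those assumptions it reproduces the 64 B rows of both tables to the printed precision. With these assumptions, Total_Data_Received and ThroughPut_Mbit_s come out in decimal units (10⁶), whereas ThroughPut_MB_s matches bytes divided by 2²⁰ per second. The `derive_row` helper and `ThroughputRow` type are hypothetical.

```python
from dataclasses import dataclass

RUNTIME_S = 30  # run duration from Table 4


@dataclass(frozen=True)
class ThroughputRow:
    samples_lost: int
    total_data_mb: float      # decimal megabytes (10**6 bytes)
    throughput_mib_s: float   # matches the ThroughPut_MB_s column (2**20 bytes per second)
    throughput_mbit_s: float  # decimal megabits per second
    samples_per_s: float


def derive_row(msg_bytes: int, sent: int, received: int,
               runtime_s: float = RUNTIME_S) -> ThroughputRow:
    """Recompute the derived throughput columns from raw sample counts."""
    received_bytes = received * msg_bytes
    return ThroughputRow(
        samples_lost=sent - received,
        total_data_mb=received_bytes / 1e6,
        throughput_mib_s=received_bytes / 2**20 / runtime_s,
        throughput_mbit_s=received_bytes * 8 / 1e6 / runtime_s,
        samples_per_s=received / runtime_s,
    )


# 64 B rows: CycloneDDS (Table 7) and FastDDS (Table 8).
cyclone = derive_row(msg_bytes=64, sent=748_329, received=266_490)
fastdds = derive_row(msg_bytes=64, sent=297_879, received=45_234)
for name, row in (("CycloneDDS", cyclone), ("FastDDS", fastdds)):
    print(name, row.samples_lost, round(row.total_data_mb, 2),
          round(row.throughput_mib_s, 3), round(row.throughput_mbit_s, 3),
          round(row.samples_per_s))
```

Recomputing the columns this way makes the unit convention explicit, which matters when the MB/s figures are compared with throughput numbers from other sources.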
Table 9. CycloneDDS Jitter Test Results.
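
| Message Size | Latency_Mean (μs) | Latency_Variance (ms²) | Latency_Std (μs) | Jitter_Range (μs) | Jitter_Coefficient (%) |
|---|---|---|---|---|---|
| 64 B | 42.35 | 0.76 | 27.58 | 471.72 | 65.12 |
| 256 B | 40.80 | 1.17 | 34.17 | 540.67 | 83.75 |
| 1 KB | 47.34 | 1.10 | 33.20 | 587.79 | 70.13 |
| 16 KB | 42.07 | 0.53 | 22.96 | 369.54 | 54.57 |
| 256 KB | 62.91 | 0.49 | 22.24 | 348.14 | 35.35 |
| 1 MB | 206.46 | 1.38 | 37.13 | 491.46 | 17.98 |
| 4 MB | 811.09 | 323.49 | 568.76 | 2310.27 | 70.12 |
| 8 MB | 18,581.55 | 345.27 | 18,249.61 | 8792.74 | 98.21 |
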
Table 10. FastDDS Jitter Test Results.
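
| Message Size | Latency_Mean (μs) | Latency_Variance (ms²) | Latency_Std (μs) | Jitter_Range (μs) | Jitter_Coefficient (%) |
|---|---|---|---|---|---|
| 64 B | 52.17 | 0.88 | 29.69 | 535.70 | 56.91 |
| 256 B | 44.16 | 0.78 | 27.90 | 382.59 | 63.17 |
| 1 KB | 53.73 | 0.70 | 26.48 | 464.08 | 49.28 |
| 16 KB | 54.69 | 0.60 | 24.44 | 422.74 | 44.68 |
| 256 KB | 72.98 | 0.67 | 25.86 | 410.82 | 35.44 |
| 1 MB | 200.57 | 1.36 | 36.88 | 546.00 | 18.39 |
| 4 MB | 1092.64 | 496.56 | 2028.37 | 2105.88 | 185.64 |
| 8 MB | 19,218.63 | 379.09 | 19,470.17 | 8757.52 | 101.31 |
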
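The Jitter_Coefficient column in Tables 9 and 10 is consistent with the coefficient of variation, Latency_Std / Latency_Mean × 100; for example, 27.58 / 42.35 gives 65.12% for the 64 B CycloneDDS row. The sketch below computes the jitter metrics from a raw latency trace. It assumes the population standard deviation and takes Jitter_Range as maximum minus minimum latency; both are assumptions, since the tables do not state the exact definitions. The function names are hypothetical.

```python
import statistics


def jitter_coefficient_pct(mean_us: float, std_us: float) -> float:
    """Coefficient of variation in percent, as in the Jitter_Coefficient column."""
    return 100.0 * std_us / mean_us


def jitter_metrics(latencies_us: list[float]) -> dict[str, float]:
    """Summarize a latency trace in the style of Tables 9 and 10 (assumed definitions)."""
    mean = statistics.fmean(latencies_us)
    std = statistics.pstdev(latencies_us)  # population standard deviation (assumption)
    return {
        "mean_us": mean,
        "variance_us2": std ** 2,
        "std_us": std,
        "range_us": max(latencies_us) - min(latencies_us),  # max minus min (assumption)
        "coefficient_pct": jitter_coefficient_pct(mean, std),
    }


print(round(jitter_coefficient_pct(42.35, 27.58), 2))  # 64 B CycloneDDS row of Table 9
print(jitter_metrics([10.0, 20.0, 30.0, 40.0]))        # toy trace for illustration
```

Because the coefficient is normalized by the mean, it exceeds 100% whenever the standard deviation is larger than the mean, as in the 4 MB FastDDS row (2028.37 / 1092.64, i.e., 185.64%).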
Table 11. CycloneDDS Scalability Test Results.
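
| Subscribers | Latency_Mean (μs) | Latency_Variance (μs²) | Cpu_Info_Cpu_Usage (%) | Sys_Tracker_Ru_Nvcsw (voluntary context switches) |
|---|---|---|---|---|
| 1 | 32.83 | 318.65 | 2.69 | 32,067 |
| 10 | 94.06 | 2014.63 | 10.03 | 179,298 |
| 20 | 157.73 | 49,289.16 | 18.86 | 351,471 |
| 30 | 222.68 | 54,436.71 | 28.58 | 552,199 |
| 40 | 275.34 | 90,117.03 | 37.04 | 755,601 |
| 50 | 303.30 | 111,003.26 | 41.85 | 965,695 |
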
Table 12. FastDDS Scalability Test Results.
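
| Subscribers | Latency_Mean (μs) | Latency_Variance (μs²) | Cpu_Info_Cpu_Usage (%) | Sys_Tracker_Ru_Nvcsw (voluntary context switches) |
|---|---|---|---|---|
| 1 | 50.21 | 600.85 | 3.62 | 32,199 |
| 10 | 116.97 | 1231.93 | 12.75 | 177,613 |
| 20 | 186.30 | 48,786.15 | 22.35 | 342,811 |
| 30 | 263.09 | 75,307.35 | 33.40 | 516,165 |
| 40 | 313.12 | 110,021.84 | 41.14 | 708,677 |
| 50 | 384.37 | 171,337.48 | 52.03 | 925,353 |
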
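One compact way to compare Tables 11 and 12 is the least-squares slope of mean latency against subscriber count, i.e., the average latency added per extra subscriber over the measured range. The sketch below fits it with ordinary least squares; this summary statistic is an illustration and is not one of the metrics reported in the tables.

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


subscribers = [1, 10, 20, 30, 40, 50]
cyclone_mean_us = [32.83, 94.06, 157.73, 222.68, 275.34, 303.30]   # Table 11
fastdds_mean_us = [50.21, 116.97, 186.30, 263.09, 313.12, 384.37]  # Table 12

for name, ys in (("CycloneDDS", cyclone_mean_us), ("FastDDS", fastdds_mean_us)):
    print(f"{name}: {ols_slope(subscribers, ys):.2f} us per additional subscriber")
```

With the tabulated values this gives roughly 5.7 μs per subscriber for CycloneDDS and 6.8 μs for FastDDS. The growth is not strictly linear (the CycloneDDS curve flattens between 40 and 50 subscribers), so the slope is only a coarse summary.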
Table 13. CycloneDDS Packet Loss Rate Test Results.

| Rate (Hz) | Avg Num_Samples_Sent | Avg Num_Samples_Received | Avg Num_Samples_Lost | Avg Packet Loss Rate (%) |
|---|---|---|---|---|
| 0 | 103.26 K | 99.95 K | 3309.38 | 3.0511 |
| 1000 | 1.00 K | 1.00 K | 0.00 | 0.0000 |
| 5000 | 4999.52 | 4999.52 | 0.00 | 0.0000 |
| 10,000 | 9999.62 | 9999.14 | 0.48 | 0.0048 |
| 20,000 | 19,999.31 | 19,980.00 | 18.55 | 0.0928 |

Table 14. FastDDS Packet Loss Rate Test Results.

| Rate (Hz) | Avg Num_Samples_Sent | Avg Num_Samples_Received | Avg Num_Samples_Lost | Avg Packet Loss Rate (%) |
|---|---|---|---|---|
| 0 | 134.74 K | 44.47 K | 90,270.34 | 66.9415 |
| 1000 | 999 | 999 | 0.00 | 0.0000 |
| 5000 | 4999.21 | 4999.21 | 0.00 | 0.0000 |
| 10,000 | 9999.52 | 9998.28 | 0.90 | 0.0090 |
| 20,000 | 19,999.79 | 19,987.59 | 11.41 | 0.0571 |

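For the nonzero-rate rows of Tables 13 and 14, the Avg Packet Loss Rate column matches lost samples as a percentage of sent samples; for example, 11.41 / 19,999.79 gives 0.0571% for FastDDS at 20,000 Hz. Because the tables report averages over runs, note that averaging per-run loss rates and dividing pooled totals are different operations that can give different results. The sketch below keeps the two separate; the function names and the two-run example data are hypothetical.

```python
from statistics import fmean


def loss_rate_pct(lost: float, sent: float) -> float:
    """Lost samples as a percentage of samples sent."""
    return 100.0 * lost / sent if sent else 0.0


def mean_loss_rate_pct(runs: list[tuple[float, float]]) -> float:
    """Average of per-run loss rates; each run is a (lost, sent) pair."""
    return fmean([loss_rate_pct(lost, sent) for lost, sent in runs])


def pooled_loss_rate_pct(runs: list[tuple[float, float]]) -> float:
    """Loss rate computed from the totals across all runs."""
    total_lost = sum(lost for lost, _ in runs)
    total_sent = sum(sent for _, sent in runs)
    return loss_rate_pct(total_lost, total_sent)


# Averaged values from Table 14 (FastDDS, 20,000 Hz) and Table 13 (CycloneDDS, 10,000 Hz).
print(round(loss_rate_pct(11.41, 19_999.79), 4))
print(round(loss_rate_pct(0.48, 9_999.62), 4))

# Hypothetical runs showing that the two averaging schemes can differ.
runs = [(10.0, 100.0), (0.0, 300.0)]
print(mean_loss_rate_pct(runs), pooled_loss_rate_pct(runs))
```

In the hypothetical example the mean of per-run rates is 5.0% while the pooled rate is 2.5%, so a reported average loss rate should state which of the two it is.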
Table 15. Performance Comparison of the Three Transmission Modes.
Transmission
Mode
Avg Min
Latency
(μs)
Avg Max
Latency
(μs)
Avg
Mean
Latency
(μs)
Avg
Variance (μs2)
Avg CPU
Usage
(%)
Avg Total Data
Received
Avg
Throughput
Avg
Samples
Lost
Cyclone
DDS
2032.343099.842264.724.30 × 10411.4777800.04 MB13.333 MB/s (106.67 Mbit/s)0.00
Shared Memory1436.402221.971617.971.99 × 1048.2073400.05 MB6.668 MB/s (53.34 Mbit/s)0.00
Zero Copy884.371287.461005.905.64 × 1035.2017400.01 MB6.666 MB/s (53.33 Mbit/s)0.00
Table 16. Key Research Directions for Future ROS Communication.

| Research Direction | Key Problem | Technical Challenge | Research Significance |
|---|---|---|---|
| DDS–TSN Integration | Conventional Ethernet cannot guarantee deterministic end-to-end communication under network congestion | Mapping DDS QoS policies to TSN scheduling mechanisms and ensuring bounded latency across heterogeneous networks | Enable deterministic communication for industrial robots and autonomous vehicles |
| Intelligent QoS Adaptation | Static QoS profiles cannot adapt to dynamic network conditions | Runtime optimization of Reliability, Deadline, History, and Lifespan policies while minimizing decision overhead | Improve communication efficiency and adaptability in large-scale robotic systems |
| Real-Time and Security Co-design | Security mechanisms introduce additional latency and computational overhead | Balancing encryption, authentication, and deterministic communication requirements | Support safety-critical robotic applications with both security and real-time guarantees |
| Hardware-Accelerated Communication | Increasing sensor bandwidth creates CPU bottlenecks in communication processing | Efficient integration of RDMA, FPGA, SmartNIC, and zero-copy transport mechanisms | Improve throughput, scalability, and resource utilization in next-generation robotic systems |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wei, Z.; You, H.; Xu, H.; Deng, Z. The Evolution of the Robot Operating System Communication Ecosystem: An Overview of the DDS Architecture and Emerging Communication Protocols. Electronics 2026, 15, 2632. https://doi.org/10.3390/electronics15122632

AMA Style

Wei Z, You H, Xu H, Deng Z. The Evolution of the Robot Operating System Communication Ecosystem: An Overview of the DDS Architecture and Emerging Communication Protocols. Electronics. 2026; 15(12):2632. https://doi.org/10.3390/electronics15122632

Chicago/Turabian Style

Wei, Zhe, Huitong You, Haibo Xu, and Zhipan Deng. 2026. "The Evolution of the Robot Operating System Communication Ecosystem: An Overview of the DDS Architecture and Emerging Communication Protocols" Electronics 15, no. 12: 2632. https://doi.org/10.3390/electronics15122632

APA Style

Wei, Z., You, H., Xu, H., & Deng, Z. (2026). The Evolution of the Robot Operating System Communication Ecosystem: An Overview of the DDS Architecture and Emerging Communication Protocols. Electronics, 15(12), 2632. https://doi.org/10.3390/electronics15122632

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
