Graph-Gated Relational Reasoning for Enhanced Coordination and Safety in Distributed Multi-Robot Systems: A Decentralized Reinforcement Learning Approach
Abstract
1. Introduction
2. Related Work
2.1. Multi-USV Coverage and Sensing Foundations
2.2. Marine Multi-Modal Fusion and Value-Decomposition MARL
2.3. Transformers in MARL and Positioning of This Work
3. Materials and Methods
3.1. System Architecture and Operational Context
3.1.1. USV Platform and Hardware Configuration
- Physical Platform Assumptions
- Onboard Sensor Suite
- Vision System: A forward-facing RGB camera (1920 × 1080 resolution, 30 Hz) housed in a marine-grade (IP68) enclosure for surface obstacle detection and teammate tracking. Representative hardware includes the FLIR Blackfly S (Teledyne FLIR, Wilsonville, OR, USA) or similar industrial marine cameras with fog and spray penetration capabilities.
- Underwater Sonar: A forward-looking imaging sonar system with a 120–130° horizontal field of view (FOV), a maximum range of 60 m, and an update rate of 1–5 Hz for submerged obstacle detection. The architecture supports both mechanical scanning sonars (e.g., Imagenex 881A (Imagenex Technology Corp., Port Coquitlam, BC, Canada)) and modern solid-state systems (e.g., Blueprint Oculus M1200d (Blueprint Subsea, Ulverston, UK)).
- Proprioceptive Sensors: These include a GPS/GNSS receiver (ZED-F9P, u-blox, Thalwil, Switzerland) with Real-Time Kinematic (RTK) correction for high accuracy (<0.1 m horizontal error under ideal conditions), degrading to standard GPS precision (±2–5 m) in constrained environments. A 9-axis IMU (100 Hz) and an optional Doppler Velocity Log (DVL) are integrated for dead-reckoning in GPS-denied scenarios.
- Onboard Computing
- Primary Target Hardware: NVIDIA Jetson AGX Orin (32 GB RAM, 275 TOPS) or Jetson Xavier NX (8 GB RAM, 21 TOPS).
- Power Consumption: Typically under 30 W; compatible with 12–48 V marine DC power systems.
- Middleware: ROS 2 (Humble or Iron distribution) for modular sensor integration and software architecture.
- Real-Time Performance: The inference time of the framework is 4.5 ms on a desktop-grade GPU (see Section 4.3). After INT8 quantization, this translates to 15–20 ms on Jetson hardware, well within the 50–100 ms control loop period required for stable USV navigation.
3.1.2. Communication Architecture
- Decentralized Execution Paradigm:
- Primary Channel: 4G/5G LTE where available (typical in coastal/harbor operations).
- Backup Channel: VHF radio (voice for emergency commands) or satellite (Iridium SBD for status telemetry at 1–2 min intervals in offshore scenarios).
- Data Requirements: Minimal: periodic position beacons (10 bytes @ 1 Hz) and mission status updates (100 bytes @ 0.1 Hz), totaling <5 kbps per vessel.
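The stated beacon and status rates can be sanity-checked with a few lines of arithmetic; the 28-byte framing overhead below is our own assumption (roughly a UDP/IPv4 header), not a figure from the deployment specification:

```python
# Sanity check of the per-vessel telemetry budget stated above.
# Payload sizes and rates are taken from the text; the per-message
# framing overhead is an assumed UDP/IPv4 figure.
OVERHEAD_BYTES = 28  # assumed UDP/IPv4 header overhead per message

def channel_bps(payload_bytes: int, rate_hz: float) -> float:
    """Bits per second consumed by one periodic message stream."""
    return (payload_bytes + OVERHEAD_BYTES) * 8 * rate_hz

beacon_bps = channel_bps(10, 1.0)    # position beacon: 10 B @ 1 Hz
status_bps = channel_bps(100, 0.1)   # mission status: 100 B @ 0.1 Hz
total_bps = beacon_bps + status_bps

print(f"total ~ {total_bps:.0f} bps")
assert total_bps < 5000  # comfortably inside the 5 kbps budget
```

Even with framing overhead, each vessel needs on the order of a few hundred bits per second, which explains why a low-rate backup channel remains viable.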
- Centralized Training Communication (Offline Only):
3.1.3. Integration with Low-Level Control Systems
- Hierarchical Control Architecture:
3.2. Problem Formulation
3.3. TransQMIX Architecture Overview
3.4. Decentralized Perception Module
3.4.1. Sensor Data Preprocessing and Noise Mitigation
- A. Visual Data Preprocessing (RGB Camera)
- B. Sonar Data Preprocessing (Imaging Sonar)
- C. Multi-Modal Data Synchronization
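Because the camera (30 Hz) and sonar (1–5 Hz) update at very different rates, nearest-timestamp pairing is one plausible way to realize this synchronization step; the 50 ms tolerance below is an illustrative assumption, not the paper's setting:

```python
from bisect import bisect_left

def nearest_frame(frame_ts: list, t: float) -> float:
    """Return the camera timestamp closest to sonar time t (frame_ts sorted)."""
    i = bisect_left(frame_ts, t)
    candidates = frame_ts[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda ft: abs(ft - t))

def synchronize(frame_ts, sonar_ts, tol=0.05):
    """Pair each sonar ping with its nearest camera frame, dropping
    pairs whose timestamp gap exceeds tol seconds (assumed tolerance)."""
    pairs = []
    for t in sonar_ts:
        ft = nearest_frame(frame_ts, t)
        if abs(ft - t) <= tol:
            pairs.append((ft, t))
    return pairs

# 30 Hz camera and a 2 Hz sonar, as in the platform description above
frames = [k / 30.0 for k in range(90)]  # 3 s of video timestamps
pings = [k / 2.0 for k in range(6)]     # 3 s of sonar timestamps
print(synchronize(frames, pings))
```

Pairing at the slower sensor's rate keeps every sonar return usable while discarding only camera frames that have no acoustic counterpart.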
3.4.2. Per-Agent Multimodal Encoding and Feature-Level Fusion
3.4.3. Entity Builder with Padding and Masking
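A minimal sketch of the padding-and-masking idea, assuming illustrative sizes (16 entity slots, 8-dimensional features) rather than the paper's actual dimensions:

```python
def build_entities(self_feat, teammate_feats, obstacle_feats,
                   max_entities=16, feat_dim=8):
    """Stack entity tokens [self, teammates..., obstacles...] and pad to a
    fixed length, returning the tokens plus a validity mask that downstream
    attention layers can use to ignore padding slots."""
    tokens = [self_feat] + list(teammate_feats) + list(obstacle_feats)
    tokens = tokens[:max_entities]           # truncate overflow entities
    mask = [True] * len(tokens)              # True = real entity
    pad = [0.0] * feat_dim
    while len(tokens) < max_entities:
        tokens.append(pad)
        mask.append(False)                   # False = padding slot
    return tokens, mask

tokens, mask = build_entities([1.0] * 8, [[0.5] * 8] * 3, [[0.2] * 8] * 2)
assert len(tokens) == 16 and sum(mask) == 6
```

The fixed-length output lets a single network handle scenes with varying numbers of teammates and obstacles without retracing the graph.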
3.4.4. The Graph-Gated Transformer (GGT): An Architecture for Emergent Collective Cognition
- Dynamic Tactical Relational Graph (TRG) Construction
- Nodes (V): The nodes of the graph are the entity tokens X generated by the Entity Builder (as described in Section 3.4.2), representing the agent itself, perceived teammates, and obstacles.
- Edges (E): The edges are dynamically generated from a set of maritime domain-specific heuristics that encode crucial tactical relationships. An edge (i, j) from entity i to entity j is created if any of the following conditions are met:
- Collision-Risk Edges: An edge is formed if the predicted Time to Closest Point of Approach (TCPA) between entities i and j falls below a critical safety threshold. This edge type encodes an immediate, high-priority safety relationship, directly reflecting COLREGs principles.
- Spatial Proximity Edges: An edge is formed if the Euclidean distance d(i, j) is below a predefined perception radius. This captures general situational awareness and the agent's local context.
- Cooperative-Intent Edges: For teammate entities, an edge is formed if their current velocity vectors are aligned towards a similar unobserved region of the map, signifying a potential cooperative intent for coverage tasks.
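The first two edge rules above can be sketched as follows; the threshold values and the omission of a distance-at-CPA check are our own simplifications for illustration:

```python
import math

TCPA_THRESHOLD_S = 30.0      # assumed value for the critical TCPA threshold
PERCEPTION_RADIUS_M = 50.0   # assumed value for the perception radius

def tcpa(p_i, v_i, p_j, v_j):
    """Time to closest point of approach for two entities with 2-D
    positions p and velocities v; inf if they are not converging."""
    dp = (p_j[0] - p_i[0], p_j[1] - p_i[1])
    dv = (v_j[0] - v_i[0], v_j[1] - v_i[1])
    dv2 = dv[0] ** 2 + dv[1] ** 2
    if dv2 < 1e-9:
        return math.inf                  # no relative motion
    t = -(dp[0] * dv[0] + dp[1] * dv[1]) / dv2
    return t if t > 0 else math.inf      # separating entities never close

def make_edge(p_i, v_i, p_j, v_j):
    """Edge (i, j) exists if the collision-risk or proximity rule fires."""
    risk = tcpa(p_i, v_i, p_j, v_j) < TCPA_THRESHOLD_S
    near = math.dist(p_i, p_j) < PERCEPTION_RADIUS_M
    return risk or near

# Head-on pair 100 m apart, closing at 4 m/s: TCPA = 25 s < 30 s, so an edge forms
assert make_edge((0, 0), (2, 0), (100, 0), (-2, 0))
# Two stationary entities 100 m apart: no risk, no proximity, no edge
assert not make_edge((0, 0), (0, 0), (100, 0), (0, 0))
```

A production collision-risk rule would also gate on the distance at the closest point of approach (DCPA), since a small TCPA alone does not imply the tracks actually intersect.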
- Graph-Gated Attention Mechanism
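The paper's exact gating formulation is not reproduced here; the sketch below shows the generic idea of using the TRG adjacency to gate single-head attention logits, so that only graph-connected entity pairs can attend to each other:

```python
import numpy as np

def graph_gated_attention(X, A, Wq, Wk, Wv):
    """Single-head attention in which the tactical relational graph A
    (A[i, j] = 1 if edge i -> j exists) gates the attention logits:
    non-adjacent entity pairs are masked out before the softmax."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    logits = np.where(A > 0, logits, -1e9)   # gate: only graph edges attend
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))       # entity tokens from the Entity Builder
A = np.eye(n)                     # self-loops so every row has >= 1 valid edge
A[0, 1] = A[1, 0] = 1             # one collision-risk edge between entities 0 and 1
W = [rng.normal(size=(d, d)) for _ in range(3)]
out = graph_gated_attention(X, A, *W)
assert out.shape == (n, d)
```

Hard masking is the simplest gate; a learned soft gate (e.g., multiplying logits by an edge-conditioned scalar) is an equally plausible reading of "graph-gated" and would be differentiable through the edge features.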
3.4.5. Individual Q-Network
3.5. Centralized Training via QMIX
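The QMIX mixing step (Rashid et al., 2020) can be sketched with state-conditioned hypernetworks whose output weights pass through an absolute value, enforcing the monotonicity constraint dQ_tot/dQ_i >= 0; all layer sizes below are illustrative, not the paper's (see the implementation table for those):

```python
import numpy as np

def qmix_mixing(q_agents, state, hyper_w1, hyper_b1, hyper_w2, hyper_b2):
    """Monotonic mixing of per-agent Q-values into Q_tot. Mixing weights
    come from hypernetworks conditioned on the global state and pass
    through |.| so the mixing is monotone in every agent's Q-value."""
    n = len(q_agents)
    W1 = np.abs(state @ hyper_w1).reshape(n, -1)  # non-negative weights
    b1 = state @ hyper_b1
    hidden = np.maximum(q_agents @ W1 + b1, 0)    # ELU in QMIX; ReLU here
    W2 = np.abs(state @ hyper_w2)
    b2 = state @ hyper_b2
    return float(hidden @ W2 + b2)

rng = np.random.default_rng(1)
n_agents, state_dim, embed = 8, 16, 32
state = rng.normal(size=state_dim)
params = (rng.normal(size=(state_dim, n_agents * embed)),
          rng.normal(size=(state_dim, embed)),
          rng.normal(size=(state_dim, embed)),
          rng.normal(size=state_dim))
q = rng.normal(size=n_agents)
q_tot = qmix_mixing(q, state, *params)

# Monotonicity: raising any one agent's Q-value cannot lower Q_tot
q_up = q.copy(); q_up[3] += 1.0
assert qmix_mixing(q_up, state, *params) >= q_tot
```

Monotonicity is what makes decentralized execution consistent with the centralized critic: each agent greedily maximizing its own Q_i also maximizes Q_tot.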
3.6. Reward Function Design and Training Protocol
4. Experiments
4.1. Experimental Setup
4.1.1. Simulation Environments
- A 10% false negative rate, to emulate phenomena like acoustic shadowing behind other objects or signal attenuation in acoustically challenging water conditions, causing existing obstacles to be missed.
- A 5% false positive rate, to represent the frequent occurrence of ghost echoes resulting from surface/bottom reverberation or dense biological clutter (e.g., schools of fish or kelp beds), which can be mistaken for real obstacles.
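A minimal noise model implementing the two rates above might look as follows; injecting ghost echoes per scanned cell is our own modelling assumption for this sketch:

```python
import random

FALSE_NEG_RATE = 0.10   # acoustic shadowing / attenuation (from the text)
FALSE_POS_RATE = 0.05   # ghost echoes / biological clutter (from the text)

def corrupt_sonar(detections, n_cells, rng):
    """Apply the simulated sonar noise model: drop each true detection
    with probability 0.10 and, for each scanned cell, inject a ghost
    echo with probability 0.05 (cell-based injection is an assumption)."""
    kept = [d for d in detections if rng.random() >= FALSE_NEG_RATE]
    ghosts = [("ghost", c) for c in range(n_cells)
              if rng.random() < FALSE_POS_RATE]
    return kept + ghosts

rng = random.Random(42)
true_obstacles = [("obstacle", i) for i in range(100)]
noisy = corrupt_sonar(true_obstacles, n_cells=200, rng=rng)
n_true_kept = sum(1 for tag, _ in noisy if tag == "obstacle")
assert n_true_kept <= 100   # misses only ever remove detections
```

Training against both dropout and ghost injection forces the policy to treat any single sonar return as unreliable evidence rather than ground truth.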
4.1.2. Agent Configuration
- Vision System: A forward-looking RGB camera with a 64 × 64 pixel resolution. A pre-trained ResNet-18 backbone processes the raw images.
- Sonar Array: A forward-looking sonar system that generates point cloud data of underwater objects. This data is processed by a PointNet architecture.
- Proprioceptive Sensors: A simulated GPS/IMU system providing the agent's state, including position (with additive Gaussian noise), velocity, and remaining energy.
4.1.3. Evaluation Metrics
4.1.4. Baseline Algorithms
4.1.5. Implementation Details
4.2. Results and Analysis
4.2.1. Comparative Performance Analysis: Analysis of Coordination and Safety Metrics
4.2.2. Ablation Analysis: Impact of Graph-Gated Attention
4.2.3. Ablation Studies on Core Components
4.2.4. Analysis of the Attention Mechanism
4.2.5. Sensitivity Analysis of Hyperparameters
4.2.6. Qualitative Analysis of Learned Behaviors
- (a) QMIX Trajectory: The behavior of the QMIX-controlled agents highlights the pitfalls of context-blind decision-making. As the dynamic obstacle approaches, the agents exhibit hesitant, reactive maneuvers. USV-1 makes a late, sharp turn that inadvertently places it in the path of USV-2, forcing USV-2 into a conflicting evasive action and resulting in a near-miss between the teammates. This sequence of late, ambiguous maneuvers directly contravenes the principle of taking 'positive action in ample time' stipulated in COLREGs Rule 8, creating unnecessary risk.
- (b) TransQMIX Trajectory: In stark contrast, the TransQMIX agents demonstrate proactive coordination that is qualitatively consistent with the principles of good seamanship. Well before the obstacle poses an immediate threat, the agents begin a smooth, coordinated evasive maneuver: USV-1 adjusts its course to port while USV-2 makes a complementary adjustment to starboard, creating a wide, predictable safe passage. This mirrors the head-on-encounter protocol of COLREGs Rule 14 ('each shall alter her course to starboard'). More importantly, the action is decisive, early, and clearly communicates intent, preventing ambiguity and ensuring a wide margin of safety, fully aligned with the spirit of Rule 8. This behavior is not a pre-programmed rule but an emergent strategy learned through the Transformer's ability to reason about the future consequences of joint actions.
4.3. Computational Performance
4.3.1. Model Size
4.3.2. Inference Time
- QMIX (Baseline): The average inference time was 0.8 ms.
- TransQMIX (Ours): The average inference time was 4.5 ms.
4.4. Maritime Scenario Case Study: Post-Disaster Harbor Rapid Assessment
4.4.1. Scenario Description and Operational Requirements
- Dynamic Obstacles: Drifting debris and collapsed structures.
- Severe Sensor Degradation: High water turbidity (Secchi depth < 1 m) severely limits optical visibility, while dense acoustic clutter from debris compromises sonar performance.
- Unpredictable Fluid Dynamics: Chaotic currents (0.3–0.8 m/s) and residual wave heights (0.5–1.0 m) are present.
- Mission-Critical Constraints:
- Time-Critical Completion: The entire survey must be completed within 12 h.
- Safety-Critical Operation: Zero collision tolerance due to the high-risk environment.
- Regulatory Compliance: Adherence to COLREGs is required.
4.4.2. Proposed Deployment Configuration
- Compute Module: NVIDIA Jetson AGX Orin (32 GB RAM, 275 TOPS). Our benchmark of 4.5 ms inference time on an RTX 4090 GPU conservatively translates to approximately 15–20 ms on Jetson hardware after INT8 quantization. This is well within the 50–100 ms control loop budget required for stable real-time navigation.
- Sensor Suite:
- Vision: A forward-facing, marine-grade RGB camera (e.g., FLIR Blackfly S).
- Sonar: A multi-beam imaging sonar (e.g., Blueprint Oculus M1200d) with a wide field of view (~130°) for robust submerged hazard detection.
- Proprioception: An RTK-corrected GPS/GNSS, an IMU, and a Doppler Velocity Log (DVL) for dead-reckoning fallback in GPS-denied areas near large metal structures.
- Communication: A primary 4G/5G LTE link for supervisory control, with a VHF radio backup.
4.4.3. Performance Projection and Feasibility Analysis
- Efficiency: The estimated mission time for 95% coverage of the 1.8 km² harbor is 8–10 h. This represents a 2–3× efficiency gain compared to uncoordinated multi-USV operations (~20–24 h) and a significant improvement over traditional single-vessel surveys (3–4 days).
- Safety: The simulated collision rate of 1.8 collisions/episode (complex scenario) projects to <0.3 incidents/mission in the real harbor. This projection is justified by the fact that the GGT’s attention mechanism excels at tracking persistent, large-scale structures (like breakwaters), which are more common in a real harbor than the uniformly dynamic obstacles of the simulation. This is in stark contrast to baseline QMIX, whose 20.7 collisions/episode would translate to an unacceptable ~3–4 high-risk events per mission.
- Adaptation to Real-World Challenges:
- GPS Multipath: Near large metal structures, the system’s multi-modal fusion capability, as validated in our ablation studies, allows for graceful performance degradation, with dead-reckoning via IMU/DVL providing a robust fallback.
- Acoustic Clutter: The PointNet encoder, trained on noisy sonar data, learns to distinguish true obstacles from clutter. The 5% false positive rate observed in simulation translates to a manageable ~15–20 false detections per mission, which are dynamically down-weighted by the GGT’s attention mechanism.
4.4.4. Practical Feasibility and Deployment Roadmap
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, Y.; Sun, Z.; Wan, J.; Li, H.; Yang, D.; Li, Y.; Fu, W.; Yu, Z.; Sun, J. Hybrid Path Planning Method for USV Based on Improved A-Star and DWA. J. Mar. Sci. Eng. 2025, 13, 934.
- Shem-Tov, E.; Sipper, M.; Elyasaf, A. BERT Mutation: Deep Transformer Model for Masked Uniform Mutation in Genetic Programming. Mathematics 2025, 13, 779.
- Li, Y.; Li, L.; Zhu, L.; Zhang, Z.; Guo, Y. Leader-Following-Based Optimal Fault-Tolerant Consensus Control for Air–Marine–Submarine Heterogeneous Systems. J. Mar. Sci. Eng. 2025, 13, 878.
- Rashid, T.; Samvelyan, M.; de Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. J. Mach. Learn. Res. 2020, 21, 1–52.
- Manzini, T.; Murphy, R. Differentiable Boustrophedon Paths That Enable Optimization Via Gradient Descent. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 8778–8783.
- Dai, Y.; Zhang, K.; Peng, M.; Mao, S. Multi-Task Offloading for Digital Twin-Enabled IoT Systems: A Deep Reinforcement Learning Approach. IEEE Trans. Ind. Inform. 2023, 19, 840–850.
- Zhou, Z.; Chen, X.; Li, E.; Zeng, L.; Luo, K.; Zhang, J. Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing. Proc. IEEE 2019, 107, 1738–1762.
- Fossen, T.I. Handbook of Marine Craft Hydrodynamics and Motion Control; John Wiley & Sons: Chichester, UK, 2011.
- Chen, S.; Zhang, Z.; Yang, Y.; Du, Y. STAS: Spatial-Temporal Return Decomposition for Multi-Agent Reinforcement Learning. In Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 17337–17345.
- Xiao, B.; Li, R.; Wang, F.; Peng, C.; Wu, J.; Zhao, Z.; Zhang, H. Stochastic Graph Neural Network-Based Value Decomposition for MARL in Internet of Vehicles. IEEE Trans. Veh. Technol. 2023, 73, 1582–1596.
- Kramar, V.; Dementiev, K.; Kabanov, A. Optimal State Estimation in Underwater Vehicle Discrete-Continuous Measurements via Augmented Hybrid Kalman Filter. J. Mar. Sci. Eng. 2025, 13, 933.
- Bijelic, M.; Gruber, T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing Through Fog Without Seeing Fog: Deep Multi-Modal Sensor Fusion in Unseen Adverse Weather. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7762–7771.
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), Stockholm, Sweden, 10–15 July 2018; pp. 2085–2087.
- Son, K.; Kim, D.; Kang, W.J.; Hostallero, D.; Yi, Y. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; Volume 97, pp. 5887–5896.
- Son, K.; Ahn, S.; Reyes, R.D.; Kim, D.; Kang, W.J. QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning. arXiv 2020, arXiv:2006.12010.
- Cui, Z.; Deng, K.; Zhang, H.; Zha, Z.; Jobaer, S. Deep Reinforcement Learning-Based Multi-Agent System with Advanced Actor–Critic Framework for Complex Environment. Mathematics 2025, 13, 754.
- Wu, N.; Li, J.; Xiong, J. An Integrated Design of Course-Keeping Control and Extended State Observers for Nonlinear USVs with Disturbances. J. Mar. Sci. Eng. 2025, 13, 967.
- Hao, Y.; Song, S.; Huang, B.; Li, J. Distributed Edge-Event Triggered Formation Control for Multiple Unmanned Surface Vessels with Connectivity Preservation. In Proceedings of the 2022 IEEE 17th International Conference on Control & Automation (ICCA), Hefei, China, 27–30 June 2022; pp. 978–983.
- Pina, R.; De Silva, V.; Hook, J.; Kondoz, A. Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 4559–4571.
- Memarian, F.; Goo, W.; Lioutikov, R.; Niekum, S.; Topcu, U. Self-Supervised Online Reward Shaping in Sparse-Reward Environments. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5621–5628.
- Borgianni, L.; Adami, D.; Giordano, S.; Pagano, M. Enhancing Reliability in Rural Networks Using a Software-Defined Wide Area Network. Computers 2024, 13, 113.
- Toyomoto, Y.; Oshima, T.; Oishi, K.; Bando, M.; Ishii, K. Constraint-Driven Multi-USV Coverage Path Generation for Aquatic Environmental Monitoring. IEEE Trans. Control Syst. Technol. 2023, 31, 2792–2799.
- Ni, J.; Gu, Y.; Tang, G. Cooperative Coverage Path Planning for Multi-Mobile Robots Based on Improved K-Means Clustering and Deep Reinforcement Learning. Electronics 2024, 13, 944.
- De Vries, J.A.; Moerland, T.M.; Plaat, A. On Credit Assignment in Hierarchical Reinforcement Learning. arXiv 2022, arXiv:2203.03292.
- Zhang, J.; Zhou, W.; Deng, X. Optimization of Adaptive Observation Strategies for Multi-AUVs in Complex Marine Environments Using Deep Reinforcement Learning. J. Mar. Sci. Eng. 2025, 13, 865.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Ke, H.; Wang, H.; Sun, H. Multi-Agent Deep Reinforcement Learning-Based Partial Task Offloading and Resource Allocation in Edge Computing Environment. Electronics 2022, 11, 2806.
- Hu, S.; Zhu, F.; Chang, X.; Liang, X. UPDeT: Universal Multi-Agent Reinforcement Learning via Policy Decoupling with Transformers. In Proceedings of the 9th International Conference on Learning Representations, Virtual Event, 3–7 May 2021.
- Dai, Y.; Zhang, Y.; Zhou, X.; Wang, Q.; Song, X.; Wang, S. MTGNet: Multi-Agent End-to-End Motion Trajectory Prediction with Multi-Modal Panoramic Dynamic Graph. Appl. Sci. 2025, 15, 5244.
- Xu, Z.; Shen, Y.; Xie, Z.; Liu, Y. Research on Autonomous Underwater Vehicle Path Optimization Using a Field Theory-Guided A* Algorithm. J. Mar. Sci. Eng. 2024, 12, 1015.
| Parameter | Simple Scenario | Complex Scenario |
|---|---|---|
| Size | | |
| Obstacles | 15 random static obstacles | 50 dynamic obstacles |
| USV quantity | Four ships | Eight ships |
| Sensor noise | Sonar false detection rate 10% | |
| Category | Parameter | Value |
|---|---|---|
| MARL Training Parameters | Learning Rate (AdamW) | Initial: 5 × 10−4, with linear decay |
| | Optimizer Betas (β1, β2) | (0.9, 0.999) |
| | Discount Factor (γ) | 0.99 |
| | Replay Buffer Size | 5000 transitions |
| | Batch Size | 32 |
| | Target Network Update Frequency | Every 200 episodes |
| | Epsilon (ε) for ε-greedy | Start: 1.0, End: 0.05, Decay Steps: 50,000 |
| | Gradient Clipping Norm | 10.0 |
| Model Architecture | GGT: Transformer Layers (L) | 4 |
| | GGT: Attention Heads (h) | 8 |
| | GGT: Hidden Dimension | 256 |
| | ResNet-18 Output Dimension | 512 |
| | PointNet Output Dimension | 256 |
| | QMIX Mixing Net Hidden Dimension | 64 |
| | QMIX Hypernetwork Hidden Dimension | 128 |
| Environment Settings | Max Episode Length (Complex) | 600 steps |
| | Number of Agents (Complex) | 8 |
| | Number of Obstacles (Complex) | 50 (dynamic) |
| Computational Setup | GPU | NVIDIA RTX 4090 (24 GB VRAM) (NVIDIA Corporation, Santa Clara, CA, USA) |
| | CPU | AMD Ryzen 9 7950X |
| | Software | PyTorch 2.1, CUDA 12.1 |
| | Operating System | Ubuntu 22.04 LTS |
| Scene | Algorithm | Fraction of Coverage (C) ↑ | Path Efficiency (ζ) ↑ | Synergy (κ) ↑ | ↓ | Number of Collisions ↓ |
|---|---|---|---|---|---|---|
| Simple scenario | IQL | 88.1% | 0.72 | 0.65 | 950 | 5.3 |
| | QMIX | 93.2% | 0.81 | 0.88 | 810 | 1.9 |
| | MAPPO | 91.5% | 0.79 | 0.85 | 840 | 2.4 |
| | STAS | 95.5% | 0.86 | 0.91 | 780 | 1.1 |
| | SGNN-VD | 94.8% | 0.84 | 0.90 | 795 | 1.3 |
| | TransQMIX | 97.8% | 0.89 | 0.95 | 750 | 0.8 |
| | GGT-QMIX (Ours) | 98.5% | 0.93 | 0.97 | 720 | 0.2 |
| Complex scenarios | IQL | 21.4% | 0.25 | 0.18 | 2350 | 51.2 |
| | QMIX | 60.3% | 0.52 | 0.61 | 1980 | 20.7 |
| | MAPPO | 58.7% | 0.50 | 0.55 | 2010 | 22.5 |
| | STAS | 90.1% | 0.82 | 0.88 | 1320 | 3.5 |
| | SGNN-VD | 88.5% | 0.80 | 0.85 | 1380 | 4.2 |
| | TransQMIX | 92.5% | 0.85 | 0.91 | 1250 | 2.1 |
| | GGT-QMIX (Ours) | 95.3% | 0.92 | 0.96 | 1180 | 0.4 |
| Model Variant | Fraction of Coverage (C) ↑ | Path Efficiency (ζ) ↑ | Synergy (κ) ↑ | ↓ | Number of Collisions ↓ |
|---|---|---|---|---|---|
| GGT-QMIX (Ours) | 95.3% | 0.92 | 0.96 | 1180 | 0.4 |
| Standard TransQMIX (w/o Gating) | 92.5% | 0.85 | 0.91 | 1250 | 2.1 |
| w/o TF (Transformer) | 68.3% | 0.61 | 0.69 | 1810 | 17.5 |
| w/o MM (Multi-Modal Fusion) | 74.2% | 0.66 | 0.73 | 1690 | 14.8 |
| QMIX + Attn | 65.1% | 0.58 | 0.67 | 1850 | 18.1 |
| Subsystem | Current TRL | Path to Deployment |
|---|---|---|
| Perception Module | TRL 4 (Lab Validation) | → TRL 6 via hardware-in-the-loop (HIL) testing with real harbor sensor data and domain adaptation. |
| Coordination Algorithm | TRL 4 (Lab Validation) | → TRL 6 via at-sea trials in a controlled harbor environment. |
| System Integration | TRL 3 (Proof-of-Concept) | → TRL 5 via full ROS2 integration on prototype hardware. |
Chang, T.; Ma, Y.; Li, Z.; Huang, S.; Ma, Z.; Xiong, Y.; Huang, S.; Qin, J. Graph-Gated Relational Reasoning for Enhanced Coordination and Safety in Distributed Multi-Robot Systems: A Decentralized Reinforcement Learning Approach. Sensors 2025, 25, 7335. https://doi.org/10.3390/s25237335
