Evolving Collective Intelligence for Unmanned Marine Vehicle Swarms: A Federated Meta-Learning Framework for Cross-Fleet Planning and Control
Abstract
1. Introduction
1.1. The Grand Challenge: The Data Silo Dilemma in Maritime Autonomy
1.2. A New Paradigm: Towards a Global Maritime Cognitive Network
1.3. The FMTL Framework: A Three-Stage Cognitive Development Architecture
1.4. Contributions and Outline
2. Related Work
2.1. Federated Learning: A Key to Unlocking Collaborative Intelligence Under Privacy Constraints
2.2. Transfer Learning and Foundation Models: Building upon Universal Priors
2.3. Meta-Learning: The “Last Mile” of Rapid, Personalized Adaptation
2.4. Synthesis and Our Contribution: The Case for an Integrated FMTL Framework
3. Materials and Methods
3.1. The Federated Meta-Transfer Learning (FMTL) Framework: An Overview
3.1.1. Stage 1: Foundational Pre-Training via Transfer Learning
3.1.2. Stage 2: Collaborative Evolution via Federated Learning
- Resilience to Node Failures (Stragglers):
- Defense Against Malicious Participants: each local update Δk is norm-clipped before aggregation, i.e., divided by max(1, ∥Δk∥2/γ). This limits the impact any single malicious actor can have on the global weight manifold.
3.1.3. Stage 3: Rapid Personalization via Federated Meta-Learning
3.2. System Modeling and Problem Formulation
3.2.1. Kinematic Equations
3.2.2. Hydrodynamic and Stochastic Modeling
3.3. Global Path Planner: Theta* Algorithm
- Standard path cost calculation: c(s) = g(p) + cost(p, s).
- Line-of-sight check: Verify if obstacles exist between s and parent(p).
- Path update:
  - (a) If line-of-sight exists between s and parent(p), calculate the shorter path cost: c′(s) = g(parent(p)) + cost(parent(p), s).
  - (b) If c′(s) < c(s), set the parent of s directly to parent(p) and update g(s) = c′(s).
  - (c) Otherwise, keep p as the parent of s.
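The update steps above can be sketched as follows. This is a minimal illustration of the Theta* vertex relaxation, assuming a caller-supplied `line_of_sight` predicate and Euclidean edge costs; the function and variable names are hypothetical, not the paper's implementation.

```python
import math

def euclid(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def update_vertex(s, p, g, parent, line_of_sight):
    """Theta* path update for successor s reached from p (steps a-c above)."""
    c = g[p] + euclid(p, s)                      # standard cost through p
    gp = parent.get(p)
    if gp is not None and line_of_sight(gp, s):  # (a) any-angle shortcut candidate
        c_short = g[gp] + euclid(gp, s)
        if c_short < c:                          # (b) shortcut is cheaper
            parent[s], g[s] = gp, c_short
            return
    parent[s], g[s] = p, c                       # (c) keep p as parent
```

Skipping the intermediate grid vertex whenever line-of-sight holds is what yields the sparse, kinematically favorable any-angle paths used downstream.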
3.4. Local Decision Agent: Cognitive Navigation via Graph-Gated Transformer PPO (GGT-TPPO)
3.4.1. Dynamic Tactical Relational Graph (TRG) Construction
3.4.2. Graph-Gated Transformer (GGT) for Structured Reasoning
3.4.3. Decision-Making and MDP Formulation
3.4.4. T-PPO Network Architecture
- Own-ship and task states are concatenated and encoded through an MLP (2 layers, 128 neurons each) to produce a 128-dim state feature vector.
- Each obstacle feature vector is mapped to a high-dimensional space through an MLP (2 layers, 64 → 128 neurons) to produce a 128-dim obstacle embedding.
- Transformer Encoder: the obstacle embeddings are processed by a compact Transformer encoder:
  - 2 Transformer layers with 4 attention heads per layer.
  - Model dimension = 128, feed-forward dimension = 512.
  - Layer normalization and residual connections.
  - No positional encoding (the vessel set is unordered).
- Transformer outputs are aggregated via average pooling to form a 128-dim environment context vector.
- The state feature and environment context are concatenated to form the final 256-dim fused representation.
- Both the Actor and Critic networks consist of 3 fully connected layers (256 → output) with ReLU activations.
- The Actor outputs Gaussian distribution parameters for continuous action sampling.
- The Critic outputs a scalar state-value estimate.
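The reason the encoder needs no positional encoding is that average pooling over a permutation-equivariant encoder makes the environment context independent of obstacle ordering. The sketch below demonstrates this invariance with a fixed random linear map standing in for the obstacle encoder (the 6-dim obstacle feature size and all names here are illustrative assumptions, not the paper's exact configuration).

```python
import numpy as np

rng = np.random.default_rng(0)
W_embed = rng.standard_normal((6, 128))       # stand-in for the obstacle encoder

def environment_context(obstacle_features):
    """Embed each obstacle feature vector and average-pool into one 128-dim
    context vector; pooling is order-independent, so no positional encoding
    is needed for the unordered vessel set."""
    embeddings = obstacle_features @ W_embed  # (n_obstacles, 128)
    return embeddings.mean(axis=0)            # (128,)
```

Shuffling the obstacle rows leaves the pooled context unchanged, which is exactly the property the unordered vessel set requires.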
3.5. Meta-Reinforcement Learning for Fast Adaptation
3.5.1. Problem Formulation
3.5.2. Task Distribution Design
- Ego-Vessel Dynamics. Hydrodynamic coefficients are drawn from predefined ranges spanning agile to sluggish vessels. These values are encoded and passed to the policy as a latent dynamics vector, creating an explicit physical context for adaptation.
- Heterogeneous Traffic Behaviors. Other vessels are instantiated from behavior archetypes (e.g., compliant, aggressive, erratic), altering encounter patterns and COLREGs contexts across episodes.
- Environmental Conditions. Currents/sea state magnitudes and directions are randomized to produce distinct disturbance profiles.
- Link to architecture. The latent dynamics vector is concatenated with the perception features (cf. Equation (6)), so meta-learning must learn how to use explicit dynamics information for rapid in-context adjustment.
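The task distribution described above can be sketched as a simple sampler. All ranges, archetype names, and dictionary keys below are illustrative placeholders, not the paper's exact randomization values.

```python
import random

def sample_task(rng=random):
    """Draw one meta-training task covering the three randomized axes:
    ego-vessel dynamics, traffic behaviors, and environmental disturbances."""
    return {
        # ego-vessel hydrodynamics -> encoded into the latent dynamics vector
        "damping_coeffs": [rng.uniform(50.0, 400.0) for _ in range(3)],
        "mass_scale": rng.uniform(0.8, 1.5),
        # heterogeneous traffic behavior archetypes
        "traffic_archetypes": [rng.choice(["compliant", "aggressive", "erratic"])
                               for _ in range(rng.randint(2, 25))],
        # environmental disturbance profile
        "current_speed_mps": rng.uniform(0.0, 1.5),
        "current_dir_deg": rng.uniform(0.0, 360.0),
    }
```

Each sampled dictionary fully specifies one episode's physics and traffic, so successive tasks force the policy to rely on its in-context adaptation mechanism.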
3.5.3. Context-Based Meta-RL (RL2-Style) with GRU Integration
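In RL2-style meta-RL, the policy conditions on a recurrent hidden state that accumulates task evidence from the stream of (state, previous action, previous reward) tuples. The sketch below shows one such recurrent step with a plain NumPy GRU cell; the weight layout and names are illustrative assumptions, not the paper's network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rl2_input(s, a_prev, r_prev):
    """RL2-style context input: current state plus previous action and reward."""
    return np.concatenate([s, a_prev, [r_prev]])

def gru_step(x, h, p):
    """One GRU update; the hidden state h carries task context across steps."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])             # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])             # reset gate
    h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return (1 - z) * h + z * h_cand
```

Because adaptation happens purely through this hidden-state update, the policy can specialize to a new task within a handful of episodes without any gradient step at deployment time.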
3.6. Provable Safety via Control Barrier Function (CBF) Shield
3.6.1. Modeling for Safety Design
3.6.2. Safe Set and CBF Constraint
3.6.3. Quadratic-Program (QP) Safety Filter and Integration
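For a single affine CBF constraint of the form G·a ≤ b, the minimal-intervention QP has a closed-form solution (projection onto a half-space), which the sketch below uses to illustrate the filter's behavior. This is a simplified single-constraint stand-in, assuming the constraint has already been linearized from the CBF condition; real deployments with multiple constraints require a proper QP solver.

```python
import numpy as np

def cbf_qp_filter(a_rl, G, b):
    """Minimal-intervention safety filter for one affine CBF constraint:
    solves  min ||a - a_rl||^2  s.t.  G . a <= b  in closed form."""
    violation = float(G @ a_rl - b)
    if violation <= 0.0:
        return a_rl                                   # policy action already safe
    return a_rl - (violation / float(G @ G)) * G      # smallest correction onto the boundary
```

Safe policy actions pass through untouched; unsafe ones receive the smallest possible correction, which is the "minimal intervention" property the shield relies on.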
3.7. Integrated Framework and Implementation Details
- Global–Local Interface. Theta* provides sparse, kinematically favorable waypoints W. The local T-PPO agent tracks the current waypoint while performing dynamic, COLREGs-compliant avoidance.
- Action Safety. Each policy output is filtered through the CBF-QP to guarantee collision avoidance with minimal intervention.
- Real-Time Optimizations. We employ waypoint pre-culling, Transformer encoding caches for static obstacles, and an asynchronous perception–decision–execution pipeline to achieve sub-10 ms decision cycles (detailed in Section 4).
- Failure Recovery. A disaster recovery routine monitors a composite risk metric (DCPA/TCPA with temporal decay) and can trigger emergency stop/maneuvers and mission abort logic when necessary (see Algorithm 1 notes).
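The DCPA/TCPA quantities that feed the composite risk metric can be computed in closed form under a constant-velocity assumption. The sketch below shows the raw calculation (the temporal-decay weighting of the composite metric is omitted); names are illustrative.

```python
import math

def dcpa_tcpa(p_own, v_own, p_obs, v_obs):
    """Distance and time at the closest point of approach, assuming both
    vessels hold their current velocities."""
    rx, ry = p_obs[0] - p_own[0], p_obs[1] - p_own[1]  # relative position
    vx, vy = v_obs[0] - v_own[0], v_obs[1] - v_own[1]  # relative velocity
    v2 = vx * vx + vy * vy
    # time of closest approach, clamped at 0 for already-diverging tracks
    tcpa = 0.0 if v2 == 0.0 else max(0.0, -(rx * vx + ry * vy) / v2)
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa
```

A small DCPA combined with a small positive TCPA flags an imminent encounter, which is what drives the emergency-maneuver branch of the recovery routine.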
Algorithm 1. Federated Learning Loop for the Global Maritime Cognitive Network
Goal: Learn a powerful global model w by leveraging private data from N fleets.
1: Initialize: Server initializes the global model w0 with foundation model weights.
2: for each communication round t = 1, 2, …, T do
3:   Distribution: Server sends the current global model wt to a set of selected fleets.
4:   for each fleet k in the selected set, in parallel do
5:     // Local Fleet Training
6:     wk ← LocalFleetUpdate(k, wt)   // This step calls Algorithm 3
7:   end for
8:   Aggregation: Server collects all local updates from the fleets.
9:   wt+1 ← Securely aggregate the updates (e.g., using Federated Averaging).
10: end for
11: return final global model wT.

Function LocalFleetUpdate(k, wt):
1: Client k sets its local model weights to wt.
2: Train the local model on its private dataset for E local epochs using the T-PPO Agent Training procedure (Algorithm 3).
3: return the updated local model weights.
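The aggregation step, combined with the norm clipping described in Section 3.1.2, can be sketched as follows. Model weights are flattened to NumPy vectors; uniform weighting and the secure-aggregation protocol are simplifications, and the names are illustrative.

```python
import numpy as np

def clip_update(delta, gamma):
    """Norm-clip one fleet's update: divide by max(1, ||delta||2 / gamma),
    bounding the influence of any single (possibly malicious) fleet."""
    norm = np.linalg.norm(delta)
    return delta / max(1.0, norm / gamma)

def aggregate(w_global, deltas, gamma):
    """One federated-averaging round over clipped per-fleet updates
    (secure aggregation and dataset-size weighting omitted for brevity)."""
    clipped = [clip_update(d, gamma) for d in deltas]
    return w_global + sum(clipped) / len(clipped)
```

Only updates, never raw trajectories, leave each fleet, which is what preserves data privacy across the network.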
4. Training Methodology and Algorithms
Algorithm 2. Theta-TPPO Hybrid Decision Framework (Deployment, with Safety Shield)
Input: map, start, goal. Output: USV trajectory.
1: W ← Theta*(map, start, goal); i ← 0; wp ← W[i]; attempts ← 0
2: while goal not reached do
3:   Perceive own-ship/task/obstacle states; build the encoded state st.
4:   Compute the composite risk metric (DCPA/TCPA with decay).
5:   if risk > CRITICAL then
6:     attempts += 1
7:     if attempts > MAX ATTEMPTS then return MISSION ABORTED
8:     at ← EmergencyManeuver(st)
9:   else
10:    araw ← π(st)            // policy proposal
11:    at ← CBF-QP(st, araw)   // safety filter (this work)
12:  Execute at; update state; advance to the next waypoint if within threshold.
13:  Timeout guard; possibly return MISSION TIMEOUT.
14: end while
15: return MISSION SUCCESS
Algorithm 3. T-PPO Agent Training (Optimization-Aware)
Network: Transformer encoder + GRU context as in Section 3.4.3.
for each episode do
  Reset the environment; clear the dynamic cache.
  for t = 0 … T − 1 do
    Build the encoded state using the cached static context + current dynamic encodings.
    Sample an action from the current policy.
    Step the environment; observe the reward and next state.
    Store the transition in D.
    if update interval reached then
      Compute GAE advantages Â and returns R.
      for each PPO epoch do
        Compute the clipped surrogate and value losses.
        Take a gradient step on the Actor–Critic parameters.
      end for
      Clear D; update caches if needed.
      Sync the old-policy parameters.
  end for
end for
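The GAE computation invoked at each update interval can be sketched as follows; this is the standard backward recursion over one rollout, with names chosen for illustration.

```python
def gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation over one rollout.
    Returns (advantages, returns) for the subsequent PPO epochs."""
    adv = [0.0] * len(rewards)
    running, next_v = 0.0, last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_v - values[t]  # one-step TD residual
        running = delta + gamma * lam * running          # exponentially weighted sum
        adv[t] = running
        next_v = values[t]
    returns = [a + v for a, v in zip(adv, values)]
    return adv, returns
```

With λ = 1 and γ = 1 the recursion reduces to plain Monte-Carlo returns; smaller λ trades variance for bias, which matters in the long, sparse-reward navigation episodes here.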
4.1. The FMTL End-to-End Training and Deployment Pipeline
4.1.1. Stage 1: Foundational Pre-Training (Transfer Learning)
4.1.2. Stage 2: Collaborative Training (Federated Learning)
Algorithm 4. Meta-Training Loop for Meta-Theta-TPPO (RL2-Style)
Goal: Learn an initialization θ that adapts rapidly on new tasks using recurrent context.
Initialize meta-parameters θ for the Actor–Critic (including Transformer + GRU).
repeat (outer loop over meta-iterations)
  Sample a batch of tasks {Ti} ~ p(T).
  for each task Ti in the batch do
    ϕi ← θ.
    Collect rollouts using the recurrent policy (state, previous action, previous reward fed through the GRU).
    Inner update (context-based): update the hidden state across steps;
      optionally perform a small gradient step on ϕi (if hybrid RL2 + gradient).
    Evaluate the adapted policy on held-out rollouts.
  end for
  Aggregate task-level policy/value losses (post-adaptation) across the batch.
  Meta-update θ using PPO-style gradients computed after the adaptation behavior,
    keeping the recurrent context flow intact.
until convergence
return θ.
4.1.3. Stage 3: Deployment and Personalization (Federated Meta-Learning)
4.2. Real-Time Optimization Strategies
- Waypoint pre-culling: Remove waypoints that are too far or occluded to reduce processing overhead.
- Transformer encoding cache: Pre-compute and cache embeddings for static obstacles, with selective refresh only for slow-moving entities.
- Asynchronous perception–decision–execution pipeline: Decouple perception, decision-making, and execution to maximize throughput.
4.3. Implementation Complexity Analysis
- N is the number of static obstacles.
- d is the embedding dimension.
- K is the number of cached dynamic entities.
- τ is the temporal window size.
5. Experiments and Results
5.1. Experimental Setup: Simulating a Federated World
“Sea-Sense” Foundation Dataset: Procedural Generation Protocol
Algorithm 5. Procedural Scenario Generation for “Sea-Sense”
Input: Difficulty Tier (T), Map Size (M), Number of Scenarios (N). Output: Dataset Dpublic.
Initialize Dpublic = ∅
for i = 1 to N do
  # 1. Static Environment
  Generate obstacles Ostatic based on T (Perlin noise or random polygons)
  Validate channel width > 2 × USVwidth
  # 2. Ego Vehicle Initialization
  Sample Start (S) and Goal (G) satisfying dist(S, G) > 0.8 × M
  Generate reference path P using the Theta* algorithm
  # 3. Dynamic Traffic Injection
  Determine num_vessels based on T
  for v = 1 to num_vessels do
    Sample encounter_type (Head-on / Crossing / Overtaking)
    Calculate intercept_point on path P
    Back-propagate the vessel start position to ensure collision risk (TCPA < threshold)
    Assign a hydrodynamic profile (Agile/Sluggish) and behavior policy
  end for
  # 4. Simulation Sanity Check
  Run the simulation for 10 steps
  if immediate collision then discard and retry
  else add to Dpublic
end for
return Dpublic
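The back-propagation step in the traffic-injection loop can be sketched as follows: given an intercept point on the ego path and the time at which the ego vessel will reach it, place the traffic vessel so that constant-velocity motion brings it there at the same instant. Names are illustrative.

```python
import math

def spawn_on_collision_course(intercept, t_intercept, speed, bearing_deg):
    """Back-propagate a traffic vessel's start position from its intercept
    point so that, sailing at `speed` along `bearing_deg`, it reaches the
    intercept exactly at t_intercept (forcing a small TCPA by construction)."""
    vx = speed * math.cos(math.radians(bearing_deg))
    vy = speed * math.sin(math.radians(bearing_deg))
    start = (intercept[0] - vx * t_intercept, intercept[1] - vy * t_intercept)
    return start, (vx, vy)
```

Because the intercept is constructed rather than sampled, every generated vessel is guaranteed to pose a genuine collision risk, which keeps the dataset free of trivially easy encounters.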
5.2. Implementation Details and Computational Infrastructure
5.2.1. Hardware Configuration
- Training Infrastructure:
5.2.2. Software Environment
The complete software stack is as follows:
5.3. Experiment 1: Validating the Foundation Model (The Power of Transfer Learning)
5.4. Experiment 2: Breaking Data Silos with Federated Learning (Core Validation)
5.5. Experiment 3: Rapid Personalization on Unseen Tasks (Validating the Full FMTL Pipeline)
5.6. Comparison with State-of-the-Art Methods (2023–2024)
5.6.1. Baseline Methods
5.6.2. Detailed Analysis
5.6.3. Computational Efficiency Analysis
5.6.4. Ablation Study: Isolating FMTL Components
- Transfer Learning provides a 7.3% absolute (10.3% relative) improvement in cross-domain success rate by establishing universal maritime priors.
- Federated Learning adds 13.0% improvement by leveraging diverse fleet experiences, demonstrating the power of collaborative learning.
- Meta-Learning contributes 3.9% to the final success rate and accelerates adaptation by 6.9× (from 55 to 8 episodes), highlighting its critical role in deployment efficiency.
- The synergistic effect is evident: each stage builds on the previous one, lifting the full FMTL framework to 95.4% and confirming the hypothesis that these paradigms are complementary, not merely additive.
5.6.5. Qualitative Analysis and Provable Safety
5.7. In-Depth Explainability Study: Visualizing the Cognitive Engine (GGT)
5.8. Qualitative Analysis: What Does Adaptation Look Like?
5.9. Evaluation of Provable Safety
- Meta-TPPO with CBF-Shield (our full model).
- Meta-TPPO without Shield (Ablation).
- Vanilla TPPO without Shield (Baseline).
6. Discussion and Future Work
6.1. Governance and Incentives for a Collaborative Ecosystem
6.2. Security and Trustworthiness in a Decentralized Network
6.3. Towards True Lifelong Learning: Adapting to a Changing World
6.4. Mechatronic Viability and Hardware Realism
6.5. Computational and Energy Feasibility in Real-World Operations
6.6. From Simulation to the High Seas: The Sim-to-Real Challenge
6.7. Path to Architectural Simplification
6.8. Limitations and Real-World Deployment Considerations
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Constantinoiu, L.-F.; Bernardino, M.; Rusu, E. Autonomous Shallow Water Hydrographic Survey Using a Proto-Type USV. J. Mar. Sci. Eng. 2023, 11, 799. [Google Scholar] [CrossRef]
- Wang, Z.; Li, G.; Ren, J. Dynamic Path Planning for Unmanned Surface Vehicle in Complex Offshore Areas Based on Hybrid Algorithm. Comput. Commun. 2021, 166, 49–56. [Google Scholar] [CrossRef]
- Xu, X.; Lu, Y.; Liu, X. Intelligent Collision Avoidance Algorithms for USVs via Deep Reinforcement Learning under COLREGs. Ocean Eng. 2020, 217, 107704. [Google Scholar] [CrossRef]
- Feng, Z.; Pan, Z.; Chen, W.; Liu, Y.; Leng, J. USV Application Scenario Expansion Based on Motion Control, Path Following and Velocity Planning. Machines 2022, 10, 310. [Google Scholar] [CrossRef]
- Lyu, H.; Hao, Z.; Li, J. Ship Autonomous Collision-Avoidance Strategies—A Comprehensive Review. J. Mar. Sci. Eng. 2023, 11, 830. [Google Scholar] [CrossRef]
- Gangopadhyay, M.; Arzoo; Vishwakarma, D.K. Federated Learning for Self-Steering USVs. In Proceedings of the 2025 6th International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 25–27 June 2025; pp. 624–628. [Google Scholar]
- Song, B.; Khanduri, P.; Zhang, X.; Yi, J.; Hong, M. FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR. Volume 202, pp. 32304–32330. [Google Scholar]
- Xing, S.; Ning, Z.; Zhou, J.; Liao, X.; Xu, J.; Zou, W. N-FedAvg: Novel Federated Average Algorithm Based on FedAvg. In Proceedings of the 2022 14th International Conference on Communication Software and Networks (ICCSN), Chongqing, China, 10–12 June 2022. [Google Scholar]
- Li, R.; Wang, H.; Lu, Q.; Yan, J.; Ji, S.; Ma, Y. Research on Medical Image Classification Based on Improved FedAvg Algorithm. Tsinghua Sci. Technol. 2025, 30, 2243–2258. [Google Scholar] [CrossRef]
- Hu, B. Financial risk fraud detection method based on improved FedAvg algorithm. In Proceedings of the Second International Conference on Big Data, Computational Intelligence, and Applications (BDCIA 2024), Huanggang, China, 15–17 November 2025; Volume 13550, pp. 950–956. [Google Scholar]
- Ruder, S.; Peters, M.E.; Swayamdipta, S.; Wolf, T. Transfer learning in natural language processing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, Minneapolis, MN, USA, 2–7 June 2019; pp. 15–18. [Google Scholar]
- Öztürk, C.; Taşyürek, M.; Türkdamar, M.U. Transfer learning and fine-tuned transfer learning methods’ effectiveness analyse in the CNN-based deep learning models. Concurr. Comput. Pract. Exp. 2023, 35, e7542. [Google Scholar] [CrossRef]
- Prottasha, N.J.; Sami, A.A.; Kowsher; Murad, S.A.; Bairagi, A.K.; Masud, M.; Baz, M. Transfer learning for sentiment analysis using BERT based supervised fine-tuning. Sensors 2022, 22, 4157. [Google Scholar] [CrossRef]
- Zhang, L.; Wu, J.; Zhang, K.; Wang, Z.; Yan, X.; Liu, P.; Wang, Q.; Fan, L.; Yao, J.; Yang, Y.; et al. Diagnosis of pumping machine working conditions based on transfer learning and ViT model. Geoenergy Sci. Eng. 2023, 226, 211729. [Google Scholar] [CrossRef]
- Stüber, J.; Kopicki, M.; Zito, C. Feature-based transfer learning for robotic push manipulation. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 5643–5650. [Google Scholar]
- Anwar, A.; Raychowdhury, A. Autonomous navigation via deep reinforcement learning for resource constraint edge nodes using transfer learning. IEEE Access 2020, 8, 26549–26560. [Google Scholar] [CrossRef]
- Amoke, D.A.; Li, Y.; Naqvi, S.M. Transfer Learning-Based Vessel Trajectory Classification in AIS Data. In Proceedings of the 2025 25th International Conference on Digital Signal Processing (DSP), Messinia, Greece, 25–27 June 2025; pp. 1–5. [Google Scholar]
- Jin, K.; Zhu, H.; Gao, R.; Wang, J.; Wang, H.; Yi, H.; Shi, C.-J.R. DEMRL: Dynamic estimation meta reinforcement learning for path following on unseen unmanned surface vehicle. Ocean Eng. 2023, 288, 115958. [Google Scholar] [CrossRef]
- Wang, B.; Jiang, P.; Gao, J.; Huo, W.; Yang, Z.; Liao, Y. A lightweight few-hot marine object detection network for unmanned surface vehicles. Ocean Eng. 2023, 277, 114329. [Google Scholar] [CrossRef]
- Song, R.; Gao, S.; Li, Y. A novel approach to multi-USV cooperative search in unknown dynamic marine environment using reinforcement learning. Neural Comput. Appl. 2025, 37, 16055–16070. [Google Scholar] [CrossRef]
- Nantogma, S.; Zhang, S.; Yu, X.; An, X.; Xu, Y. Multi-USV dynamic navigation and target capture: A guided multi-agent reinforcement learning approach. Electronics 2023, 12, 1523. [Google Scholar] [CrossRef]
- Liu, X.; Deng, Y.; Nallanathan, A.; Bennis, M. Federated learning and meta learning: Approaches, applications, and directions. IEEE Commun. Surv. Tutor. 2023, 26, 571–618. [Google Scholar] [CrossRef]
- Xie, Y.; Ma, Y.; Cheng, Y.; Li, Z.; Liu, X. BIT+ TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways. Appl. Sci. 2025, 15, 3446. [Google Scholar] [CrossRef]
- Wang, H.; Tan, A.H.; Nejat, G. Navformer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments. IEEE Robot. Autom. Lett. 2024, 9, 6808–6815. [Google Scholar] [CrossRef]
- Cui, Z.; Guan, W.; Zhang, X.; Zhang, G. Autonomous Collision Avoidance Decision-Making Method for USV Based on ATL-TD3 Algorithm. Ocean Eng. 2024, 312, 119297. [Google Scholar] [CrossRef]
- Fossen, T.I. Handbook of Marine Craft Hydrodynamics and Motion Control; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Liu, C.; Zhang, K.; He, Z.; Lai, L.; Chu, X. Clustering Theta Based Segmented Path Planning Method for Vessels in Inland Waterways. Ocean Eng. 2024, 309, 118249. [Google Scholar] [CrossRef]
- Chen, H.; Zheng, H. Research on Full Coverage Path Planning Algorithm of Mobile Robot Based on Astar Improved Algorithm. In Proceedings of the 2022 7th International Conference on Intelligent Information Technology, Messinia, Greece, 25–27 June 2022; pp. 21–27. [Google Scholar]
- Zhang, Q.; Liu, H.; Wang, Y.; Wang, W.; Bian, C.; Chen, X.; Zhang, G.; Wang, J. GNN-COLREGs: Graph Neural Network-Based COLREGs-Compliant Collision Avoidance for Autonomous Surface Vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8562–8575. [Google Scholar]
- Li, S.; Chen, X.; Kumar, A.; Song, S.-M. ViT-NavAgent: Vision Transformer for Robust Maritime Navigation in Complex Environments. Ocean Eng. 2024, 295, 116845. [Google Scholar]
- Yang, M.; Zhao, L.; Wu, F.; Chen, W.; Zeng, J.; Zhu, Z. FedDRL-USV: Federated Deep Reinforcement Learning Framework for Multi-USV Cooperative Navigation. J. Mar. Sci. Eng. 2024, 12, 445. [Google Scholar]
- Wang, J.; Huang, B.; Zhang, K.; Ye, Y.; Tian, H.; Sun, J. Hybrid-A-PPO: An Efficient Path Planning and Collision Avoidance Framework for Autonomous Marine Vehicles. IEEE Access 2024, 12, 15234–15249. [Google Scholar]














| Symbol | Definition | Dimension/Type |
|---|---|---|
| System Modeling | | |
| η = [x, y, ψ]ᵀ | USV position and heading in inertial frame | ℝ³ |
| ν = [u, v, r]ᵀ | USV linear and angular velocities in body frame | ℝ³ |
| M, C(ν), D(ν) | Inertia, Coriolis, and Damping matrices | ℝ³ˣ³ |
| τ, τenv | Control forces and environmental disturbances | ℝ³ |
| Federated Learning | | |
| wt | Global model weights at communication round t | Vector |
| Fk, Dk | Participating Fleet k and its private dataset | - |
| Δk | Local model weight update (gradient) from Fleet k | Vector |
| RL & Network | | |
| st, at, rt | State, Action, and Reward at timestep t | - |
| πθ, Vϕ | Actor policy network and Critic value network | Neural Networks |
| Gt | Dynamic Tactical Relational Graph | Graph Structure |
| zt | Latent scene embedding from Graph-Gated Transformer | Vector |
| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Surge Damping | Xu | 100 N·s/m | Linear damping coefficient |
| Sway Damping | Yv | 200 N·s/m | Linear damping coefficient |
| Yaw Damping | Nr | 50 N·m·s/rad | Rotational damping coefficient |
| Position Noise Std. Dev. | σpos | 0.1 m | Simulates GPS error |
| Heading Noise Std. Dev. | σψ | 0.5° | Simulates compass error |
| Simulation Timestep | Δt | 0.1 s | Integration interval |
| Component | Weight/Constant | Value | Purpose |
|---|---|---|---|
| Waypoint Reward | – | 0.8 | Encourage goal-directed navigation |
| Safety Reward | – | 2.0 | Prioritize collision avoidance |
| COLREGs Reward | – | 2.5 | Maximize rule compliance |
| Smoothness Penalty | – | 0.1 | Encourage energy-efficient control |
| Progress Constant | – | 1.0 | Scale waypoint progress reward |
| Control Constant | – | 0.01 | Scale control penalty |
| Collision Penalty | – | −1000 | Terminate episode on collision |
| Parameter Category | Parameter Name | Range/Value | Distribution |
|---|---|---|---|
| Environment | Map Dimensions | 1000 m × 1000 m | Fixed |
| | Static Obstacles (Count) | 0∼30 | Uniform Int |
| | Obstacle Size (Radius) | 10 m∼50 m | Gaussian (μ = 20, σ = 10) |
| | Current Velocity | 0∼1.5 m/s | Rayleigh |
| Ego Vessel | Start Position | Edge of Map (Rand) | Uniform |
| | Goal Position | Opposite Edge (>800 m) | Constraint-based |
| Dynamic Traffic | Vessel Count | 2∼25 | Poisson (λ = 12) |
| | Vessel Speed | 5∼20 knots | Uniform |
| | Interaction Type | Head-on, Crossing, Overtaking | Ratio 4:4:2 |
| | Aggressiveness (COLREGs violation probability) | 0∼1.0 | Beta(2, 5) |
| Safety | Min Start Distance | 150 m | Hard Constraint |
| | TCPA Threshold for Spawning | <60 s | Collision-Course Filtering |
| Model | Test Env: Open Ocean (Success Rate %) | Test Env: Port (Success Rate %) | Test Env: Adversarial (Success Rate %) | Average Success Rate (%) |
|---|---|---|---|---|
| Isolated Specialist A | 95.2 ± 2.1 | 41.5 ± 5.8 | 55.1 ± 4.5 | 63.9 |
| Isolated Specialist B | 45.8 ± 6.2 | 94.8 ± 2.5 | 60.3 ± 5.1 | 67.0 |
| Isolated Specialist C | 61.2 ± 4.9 | 65.7 ± 4.2 | 92.5 ± 3.0 | 73.1 |
| Federated Model (Ours) | 96.5 ± 1.8 | 95.5 ± 2.0 | 94.1 ± 2.4 | 95.4 |
| Task | Metric | Meta-Theta-TPPO (Ours) | Vanilla TPPO (Fine-Tuning) | A*+MAML-PPO |
|---|---|---|---|---|
| | ZS-Perf (SR %) | 78.5 ± 4.1 | 42.0 ± 5.5 | 51.5 ± 6.2 |
| | AS (episodes) | 8 ± 2 | 125 ± 15 | 45 ± 8 |
| | Final SR (%) | 94.5 ± 2.0 | 95.0 ± 1.8 | 88.0 ± 3.1 |
| | ZS-Perf (SR %) | 71.0 ± 5.2 | 35.5 ± 6.0 | 45.0 ± 5.8 |
| | AS (episodes) | 12 ± 3 | 150 ± 20 | 60 ± 11 |
| | Final SR (%) | 91.0 ± 2.5 | 92.5 ± 2.3 | 81.5 ± 4.0 |
| Method | Year | Cross-Domain Success Rate (%) | Zero-Shot Performance (%) | Adaptation Speed (Episodes) | COLREGs Compliance Rate (%) | Average Path Length (m) | Inference Time (ms) |
|---|---|---|---|---|---|---|---|
| GNN-COLREGs [29] | 2023 | 88.3 ± 3.2 | 65.2 ± 6.1 | 35 ± 7 | 89.5 ± 4.2 | 2510 ± 105 | 8.2 ± 1.1 |
| ViT-NavAgent [30] | 2024 | 84.7 ± 4.5 | 58.0 ± 7.3 | 45 ± 9 | 82.1 ± 5.8 | 2680 ± 135 | 12.5 ± 1.8 |
| FedDRL-USV [31] | 2024 | 91.5 ± 2.8 | 62.5 ± 5.9 | 55 ± 12 | 88.2 ± 3.9 | 2485 ± 98 | 6.8 ± 0.9 |
| Hybrid-A-PPO [32] | 2024 | 82.1 ± 4.8 | 48.3 ± 6.5 | 80 ± 18 | 79.5 ± 6.2 | 2595 ± 142 | 7.5 ± 1.2 |
| Meta-Theta-TPPO (Ours) | 2025 | 95.4 ± 1.9 | 78.5 ± 4.1 | 8 ± 2 | 93.5 ± 2.5 | 2425 ± 98 | 4.5 ± 0.7 |
| Method | Training Time (GPU-Hours) | Model Parameters (M) | Memory Footprint (GB) | FLOPs per Decision (G) |
|---|---|---|---|---|
| GNN-COLREGs | 156 | 8.5 | 3.2 | 2.8 |
| ViT-NavAgent | 210 | 22.3 | 8.5 | 5.7 |
| FedDRL-USV | 185 | 6.2 | 2.8 | 1.9 |
| Hybrid-A-PPO | 98 | 4.1 | 1.5 | 1.2 |
| Meta-Theta-TPPO (Ours) | 156 | 156 | 3.0 | 2.1 |
| Configuration | Cross-Domain SR (%) | Zero-Shot Performance (%) | Adaptation Speed (Episodes) |
|---|---|---|---|
| Scratch Training (No TL, No FL, No ML) | 71.2 ± 5.8 | 35.0 ± 7.2 | 150 ± 25 |
| +Transfer Learning (TL only) | 78.5 ± 4.9 | 45.8 ± 6.5 | 95 ± 18 |
| +Federated Learning (TL + FL) | 91.5 ± 3.2 | 62.5 ± 5.9 | 55 ± 12 |
| Full FMTL (TL + FL + ML) | 95.4 ± 1.9 | 78.5 ± 4.1 | 8 ± 2 |
| Scenario | Metric | Meta-Theta-TPPO (Ours) | Vanilla TPPO (from Scratch) | A*+DWA | PPO (Flat) |
|---|---|---|---|---|---|
| Mixed | SR (%) | 96.5 ± 1.8 | 97.0 ± 1.5 | 85.0 ± 3.2 | 72.0 ± 4.1 |
| Mixed | CCR (%) | 92.5 ± 3.0 | 93.0 ± 2.8 | 52.0 ± 8.5 | 61.0 ± 7.2 |
| Mixed | APL (m) | 2425 ± 98 | 2410 ± 95 | 2580 ± 120 | 3150 ± 215 |
| Crossing | SR (%) | 94.0 ± 2.5 | 95.0 ± 2.1 | 71.0 ± 4.5 | 65.0 ± 5.2 |
| Crossing | CCR (%) | 93.5 ± 2.5 | 94.5 ± 2.2 | 45.0 ± 6.8 | 58.0 ± 5.9 |
| Agent Configuration | Collision Rate (%) | Average Minimum Distance (m) |
|---|---|---|
| Meta-TPPO with CBF-Shield (Our full model) | 0.0% | + ε (e.g., 50.1 m) |
| Meta-TPPO without Shield (Ablation) | 14.3% | 38.5 m |
| Vanilla TPPO without Shield (Baseline) | 27.8% | 29.2 m |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Ye, Y.; Tian, H.; Yin, Y.; Zhou, Y.; Xiong, Y.; Wang, Z.; Liu, Y.; Nie, Z.; Zhang, Z.; Wang, Y.; et al. Evolving Collective Intelligence for Unmanned Marine Vehicle Swarms: A Federated Meta-Learning Framework for Cross-Fleet Planning and Control. J. Mar. Sci. Eng. 2026, 14, 82. https://doi.org/10.3390/jmse14010082

