Vision-Based Unmanned Aerial Vehicle Swarm Cooperation and Online Point-Cloud Registration for Global Localization in Global Navigation Satellite System-Intermittent Environments
Highlights
- A unified, lightweight framework is introduced that fuses bio-inspired passive vision swarm coordination with real-time monocular point-cloud registration, enabling UAV teams to maintain formation and achieve global localization without GNSS, active ranging, or communication.
- Experiments with drone–drone and drone–ground robot pairs show that online cooperative map alignment improves spatial consistency compared to baseline monocular SLAM, even under heterogeneous viewpoints, sparse maps, and global positioning outage.
- The framework provides a scalable and sensor-efficient strategy for resilient multi-drone autonomy in GNSS-intermittent environments such as tunnels, industrial interiors, underground facilities, and disaster zones.
- By maintaining both coordination coherence (stable, collision-free group motion) and spatial coherence (shared global map alignment), the method enables real-time cooperative navigation for mixed aerial–ground teams operating under strict payload and communication constraints.
Abstract
1. Introduction
- A unified cooperative perception and coordination architecture integrating passive-vision, biologically inspired swarm-keeping with cross-platform monocular point cloud fusion;
- A real-time registration pipeline robust to sparse, drone-led heterogeneous monocular maps, enabling fast alignment between different autonomous vehicles’ vSLAM maps during motion;
- Integrated experimental validation in an indoor GNSS-denied testbed, demonstrating stable formation keeping from passive vision, global map alignment improving spatial coherence, and real-time cooperative localization under extended GNSS outages.
2. Materials and Methods
2.1. Swarm Coordination Using Passive Vision
1. Identify the closest neighbors within each lateral sector.
2. Compare the relative proximity between opposing lateral sectors; for example, the upper sector (neighbor a at distance d_a) versus the lower sector (neighbor b at distance d_b).
3. Select the closest neighbors in the forward sectors.
4. Compare the relative proximity of forward neighbors to a predefined forward preferred distance d_fwd (based on the expected relative speed in the forward direction), using an approach similar to that in Step 1.
5. If any opposing lateral sector pair contains only one neighbor, apply a similar logic using a lateral preferred distance d_lat.
- For opposing lateral sectors containing neighbors in both, generate a speed correction toward the center of the sector with the more distant neighbor. The correction magnitude is fixed and determined through simulation. If the relative sizes (a proxy for distance) are similar, no correction is applied.
- For opposing lateral sectors with only one neighbor, assume a neighbor exists at the preferred lateral distance d_lat and apply the same correction logic.
- Apply the same approach for forward sectors using the preferred forward distance d_fwd.
- If no neighbors exist in either lateral or forward sectors, no correction is generated.
- Sum all corrections and adjust the commanded speed accordingly, as sketched below.
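A minimal Python sketch of this sector-based logic is given below. The constants D_LAT, D_FWD, CORRECTION, and TOLERANCE are illustrative placeholders for the values tuned in simulation, and the sector layout is simplified to one vertical pair, one horizontal pair, and a single forward sector; a missing neighbor is modeled as None.

```python
import numpy as np

# Illustrative placeholders; the actual values are tuned in simulation.
D_LAT = 1.0        # preferred lateral distance d_lat (m), assumed
D_FWD = 1.5        # preferred forward distance d_fwd (m), assumed
CORRECTION = 0.05  # fixed correction magnitude (m/s), assumed
TOLERANCE = 0.10   # proximity band treated as "similar", assumed

def pair_correction(d_a, d_b, preferred):
    """Correction for one opposing sector pair (positive = toward sector a).

    A missing neighbor (None) is replaced by a virtual neighbor at the
    preferred distance, as described in the list above.
    """
    if d_a is None and d_b is None:
        return 0.0                       # no neighbors: no correction
    d_a = preferred if d_a is None else d_a
    d_b = preferred if d_b is None else d_b
    if abs(d_a - d_b) < TOLERANCE:
        return 0.0                       # similar proximity: no correction
    # Steer toward the center of the sector with the more distant neighbor.
    return CORRECTION if d_a > d_b else -CORRECTION

def swarm_keeping_command(v_cmd, d_up, d_down, d_left, d_right, d_forward):
    """Sum all sector corrections and adjust the commanded velocity
    (body axes: x forward, y left, z up)."""
    dv = np.zeros(3)
    dv[2] = pair_correction(d_up, d_down, D_LAT)     # vertical sector pair
    dv[1] = pair_correction(d_left, d_right, D_LAT)  # horizontal sector pair
    if d_forward is not None:                        # forward sector
        if d_forward > D_FWD + TOLERANCE:
            dv[0] = CORRECTION       # neighbor ahead is far: speed up
        elif d_forward < D_FWD - TOLERANCE:
            dv[0] = -CORRECTION      # neighbor ahead is close: slow down
    return np.asarray(v_cmd, dtype=float) + dv
```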
Passive Distance or Depth Estimation
- Detecting all connected components in the binary image and computing their centroids and pixel areas; discarding components with areas outside a predefined valid range;
- Computing pairwise distances between the remaining centroids and retaining the pair with the smallest separation and an orientation angle within an acceptable range;
- Associating the selected centroids with those from the previous frame using nearest-neighbor matching to maintain unique marker identities, and recording their inter-distance;
- Repeating the process for each incoming frame, with no need for external initialization (see the sketch after this list).
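As an illustration, the following Python/OpenCV sketch implements one frame of this marker-tracking loop. The area bounds, marker spacing, focal length, and orientation threshold are assumed values, and the final depth conversion uses a simple pinhole relation (depth ≈ f · L / pixel inter-distance) that stands in for the mapping used by the actual system.

```python
import cv2
import numpy as np

# Assumed parameters for illustration only.
MIN_AREA, MAX_AREA = 5, 500   # valid blob area range (px), assumed
MARKER_SPACING = 0.10         # physical marker separation L (m), assumed
FOCAL_PX = 600.0              # focal length f in pixels, assumed
MAX_ANGLE_DEG = 30.0          # acceptable pair orientation, assumed

def estimate_neighbor_depth(binary, prev_pair=None):
    """One frame of the loop; binary is a uint8 image with markers nonzero.
    Returns (depth_m, (c1, c2)) or (None, prev_pair) when no pair is found."""
    # Step 1: connected components, centroids, and pixel areas.
    n, _, stats, cents = cv2.connectedComponentsWithStats(binary)
    blobs = [cents[i] for i in range(1, n)          # label 0 is background
             if MIN_AREA <= stats[i, cv2.CC_STAT_AREA] <= MAX_AREA]
    if len(blobs) < 2:
        return None, prev_pair
    # Step 2: closest centroid pair with an acceptable orientation angle.
    best = None
    for i in range(len(blobs)):
        for j in range(i + 1, len(blobs)):
            dx, dy = blobs[j] - blobs[i]
            ang = abs(np.degrees(np.arctan2(dy, dx)))
            ang = min(ang, 180.0 - ang)             # fold to [0, 90]
            if ang > MAX_ANGLE_DEG:
                continue
            d = float(np.hypot(dx, dy))
            if best is None or d < best[0]:
                best = (d, blobs[i], blobs[j])
    if best is None:
        return None, prev_pair
    pix_dist, c1, c2 = best
    # Step 3: nearest-neighbor association with the previous frame keeps
    # marker identities consistent across frames.
    if prev_pair is not None and \
            np.linalg.norm(c1 - prev_pair[1]) < np.linalg.norm(c1 - prev_pair[0]):
        c1, c2 = c2, c1
    # Pinhole relation: depth falls off inversely with pixel inter-distance.
    return FOCAL_PX * MARKER_SPACING / pix_dist, (c1, c2)
```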
2.2. vSLAM Theory
2.2.1. Monocular vSLAM
2.2.2. Stereo Calibration and Image Rectification
- Image Distortion Correction: Radial distortion is corrected using the camera’s intrinsic calibration matrix K, obtained through intrinsic calibration [25].
- Feature Matching: The detected features are matched between the two images by comparing their descriptors (e.g., via Hamming distance).
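A short OpenCV sketch of these two steps is shown below; K and dist are assumed outputs of the intrinsic calibration in [25], and ORB [29] is used here because its binary descriptors are directly comparable by Hamming distance.

```python
import cv2

def undistort_and_match(img_left, img_right, K, dist):
    """Correct radial distortion with K/dist, then match ORB features
    between the two views using brute-force Hamming matching."""
    left = cv2.undistort(img_left, K, dist)
    right = cv2.undistort(img_right, K, dist)
    orb = cv2.ORB_create(nfeatures=1000)
    kp_l, des_l = orb.detectAndCompute(left, None)
    kp_r, des_r = orb.detectAndCompute(right, None)
    if des_l is None or des_r is None:
        return left, right, kp_l, kp_r, []          # no features detected
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_l, des_r), key=lambda m: m.distance)
    return left, right, kp_l, kp_r, matches
```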
2.2.3. Pose Calculation
2.3. Online Cooperative Point Cloud Registration
- Initial Alignment: Begin with an initial transformation guess, often the identity matrix or a prior estimate from a coarse alignment method.
- Closest Point Matching: For each point in the source cloud, identify its nearest neighbor (the closest point) in the target cloud.
- Transformation Estimation: Compute the rigid transformation that minimizes the mean squared error (MSE) between the established matched point pairs.
- Apply Transformation: Update the source point cloud’s position and orientation using the estimated transformation.
- Iteration: Repeat Steps 2 through 4 until convergence, defined by a minimal reduction in the MSE or reaching a maximum iteration limit.
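These steps are the classical point-to-point ICP of Besl and McKay [8]; a compact NumPy/SciPy sketch follows, using the closed-form SVD (Kabsch) solution for Step 3. The iteration limit and tolerance are illustrative, and note that plain ICP estimates a rigid transform only, so any scale difference between sparse monocular maps must be resolved separately.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, T_init=None, max_iters=50, tol=1e-6):
    """Point-to-point ICP over (N, 3) point clouds; returns a 4x4 transform."""
    T = np.eye(4) if T_init is None else T_init.copy()   # Step 1
    src = source @ T[:3, :3].T + T[:3, 3]
    tree = cKDTree(target)
    prev_mse = np.inf
    for _ in range(max_iters):
        dists, idx = tree.query(src)                     # Step 2
        matched = target[idx]
        # Step 3: rigid transform minimizing the MSE (Kabsch / SVD).
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t                              # Step 4
        dT = np.eye(4)
        dT[:3, :3], dT[:3, 3] = R, t
        T = dT @ T                        # accumulate global transform
        mse = np.mean(dists ** 2)                        # Step 5
        if abs(prev_mse - mse) < tol:
            break
        prev_mse = mse
    return T
```

Robust variants of this loop (point-to-plane error metrics, correspondence rejection, and sampling strategies) are surveyed in [32].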
3. Results
3.1. Swarm-Keeping Performance
3.1.1. Simulation Test
3.1.2. Flight Test
3.2. Online Point Cloud Registration
3.2.1. First Configuration: Two Drones
3.2.2. Second Configuration: One Ground Vehicle and One Drone
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Tong, P.; Yang, X.; Yang, Y.; Liu, W.; Wu, P. Multi-UAV Collaborative Absolute Vision Positioning and Navigation: A Survey and Discussion. Drones 2023, 7, 261. [Google Scholar] [CrossRef]
- Liu, H.; Fu, Y.; Ma, Y.; Zhang, W. Multimodal Fusion and Dynamic Resource Optimization for Robust Cooperative Localization of Low-Cost UAVs. Drones 2025, 9, 820. [Google Scholar] [CrossRef]
- Lee, I.; Sung, C.; Lee, H.; Nam, S.; Oh, J.; Lee, K.; Park, C. Georeferenced UAV Localization in Mountainous Terrain Under GNSS-Denied Conditions. Drones 2025, 9, 709. [Google Scholar] [CrossRef]
- Yao, F.; Lan, C.; Wang, L.; Wan, H.; Gao, T.; Wei, Z. GNSS-denied geolocalization of UAVs using terrain-weighted constraint optimization. Int. J. Appl. Earth Obs. Geoinf. 2024, 135, 104277. [Google Scholar] [CrossRef]
- Xu, W.; Yang, D.; Liu, J.; Li, Y.; Zhou, M. A Visual Navigation Algorithm for UAV Based on Visual-Geography Optimization. Drones 2024, 8, 313. [Google Scholar] [CrossRef]
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching with Graph Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947. [Google Scholar] [CrossRef]
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8922–8931. [Google Scholar] [CrossRef]
- Besl, P.J.; McKay, N.D. A Method for Registration of 3-D Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
- Kim, G.; Kim, A. Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4802–4809. [Google Scholar] [CrossRef]
- Kim, G.; Choi, S.; Kim, A. Scan Context++: Structural Place Recognition Robust to Rotation and Lateral Variations in Urban Environments. IEEE Trans. Robot. 2021, 38, 1856–1874. [Google Scholar] [CrossRef]
- Garcia, G.; Eskandarian, A. Point Cloud Registration for Visual Geo-referenced Localization between Aerial and Ground Robots. In Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics, Marbella, Spain, 20–22 October 2025; Volume 2, pp. 211–218. [Google Scholar]
- Ballerini, M.; Cabibbo, N.; Candelier, R.; Cavagna, A.; Cisbani, E.; Giardina, I.; Lecomte, V.; Orlandi, A.; Parisi, G.; Procaccini, A.; et al. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proc. Natl. Acad. Sci. USA 2008, 105, 1232–1237. [Google Scholar] [CrossRef]
- Cavagna, A.; Cimarelli, A.; Giardina, I.; Parisi, G.; Santagati, R.; Stefanini, F.; Viale, M. Scale-free correlations in starling flocks. Proc. Natl. Acad. Sci. USA 2010, 107, 11865–11870. [Google Scholar] [CrossRef]
- Reynolds, C.W. Flocks, herds, and schools: A distributed behavioral model. Comput. Graph. 1987, 21, 25–34. [Google Scholar] [CrossRef]
- Garcia, G.; Eskandarian, A. Bio-Inspired UAS Swarm-Keeping based on Computer Vision. In Proceedings of the 2024 International Conference on Unmanned Aircraft Systems (ICUAS), Chania, Crete, Greece, 4–7 June 2024. [Google Scholar]
- Cieslewski, T.; Choudhary, S.; Scaramuzza, D. Data-Efficient Decentralized Visual SLAM. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 2466–2473. [Google Scholar] [CrossRef]
- Carpin, S. Fast and accurate map merging for multi-robot systems. Auton. Robot. 2008, 25, 305–316. [Google Scholar] [CrossRef]
- Sunil, S.; Mozaffari, S.; Singh, R.; Shahrrava, B.; Alirezaee, S. Feature-Based Occupancy Map-Merging for Collaborative SLAM. Sensors 2023, 23, 3114. [Google Scholar] [CrossRef]
- Chen, W.; Wang, X.; Wang, Z.; Lin, X.; Chen, M.; Hu, K. Overview of Multi-Robot Collaborative SLAM from the Perspective of Data Fusion. Machines 2023, 11, 653. [Google Scholar] [CrossRef]
- Vodisch, N.; Cattaneo, D.; Burgard, W.; Valada, A. CoVIO: Online Continual Learning for Visual-Inertial Odometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Vancouver, BC, Canada, 17–24 June 2023; pp. 2464–2473. [Google Scholar] [CrossRef]
- Major, P.F.; Dill, L.M. The three-dimensional structure of airborne bird flocks. Behav. Ecol. Sociobiol. 1978, 4, 111–122. [Google Scholar] [CrossRef]
- Bajec, I.L.; Zimic, N.; Mraz, M. Simulating flocks on the wing: The fuzzy approach. J. Theor. Biol. 2005, 233, 199–220. [Google Scholar] [CrossRef]
- Martin, G.R. What is binocular vision for? A birds’ eye view. J. Vis. 2009, 9, 1–19. [Google Scholar] [CrossRef]
- Yang, L.; Kang, B.; Huang, Z.; Zhao, Z.; Xu, X.; Feng, J.; Zhao, H. Depth Anything V2. arXiv 2024, arXiv:2406.09414. [Google Scholar] [CrossRef]
- Zhang, Z. A Flexible New Technique for Camera Calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
- Jian, B.; Vemuri, B.C. Robust Point Set Registration Using Gaussian Mixture Models. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1633–1645. [Google Scholar] [CrossRef]
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer: London, UK, 2010. [Google Scholar]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Computer Vision—ECCV 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417. [Google Scholar] [CrossRef]
- Rusinkiewicz, S.; Levoy, M. Efficient variants of the ICP algorithm. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, Quebec City, QC, Canada, 28 May–1 June 2001; pp. 145–152. [Google Scholar] [CrossRef]
- Crazyflie 2.1 Plus. Available online: https://www.bitcraze.io/products/crazyflie-2-1-plus/ (accessed on 19 November 2025).
- Wifibot Company. Wifibot Lab V4: 4-Wheel Drive Autonomous Platform. 2025. Available online: https://www.wifibot.com (accessed on 21 November 2025).