Search Results (166)

Search Parameters:
Keywords = UAV visual navigation

59 pages, 1676 KB  
Review
Vision–Language–Action (VLA) Models for Unmanned Aerial Robotics and Bimanual Manipulation: A Review
by Inkyu Sa, Chanoh Park, Hea-Min Lee, Donghee Noh and Ho Seok Ahn
Drones 2026, 10(6), 412; https://doi.org/10.3390/drones10060412 - 26 May 2026
Viewed by 261
Abstract
Vision–Language–Action (VLA) models unify visual perception, natural-language understanding, and action generation within a single foundation model, allowing a robot to follow instructions such as “fold the towel” or “fly to the red building” directly from camera images. Because VLAs inherit world knowledge from internet-scale pre-training, they have become the dominant framework for learning-based manipulation, with bimanual coordination serving as the most demanding testbed: two arms with 7+ degrees of freedom each must move in concert to fold, assemble, and reorient objects. Unmanned aerial robotics faces a structurally similar challenge: a drone must coordinate thrust, attitude, and increasingly gripper commands from visual observations under strict latency and payload constraints. This review covers 183 contributions spanning 2017–2026 and organized along seven dimensions: VLA architectures, training recipes, action representations, bimanual coordination (2022–2026), unmanned aerial vehicle (UAV) navigation and control (2017–2026), language grounding, and cross-cutting concerns including memory and world models. We show that the coordination strategies, training recipes, and action representations developed for bimanual VLAs transfer to unmanned aerial systems and identify fourteen research directions across both domains. Full article

20 pages, 1704 KB  
Article
Digital Twin-Driven Trajectory and Resource Optimization for UAV Swarms in Low-Altitude Urban Logistics and Communication Environments
by Hanyang Tong, Ziyang Song, Zhenyan Zhu and Jinlong Sun
Drones 2026, 10(5), 376; https://doi.org/10.3390/drones10050376 - 14 May 2026
Viewed by 447
Abstract
Unmanned aerial vehicles (UAVs) serve as both communication relays and aerial couriers in modern urban logistics networks. Conventional trajectory optimization methods assume perfect localization and isotropic free-space tracking signal propagation, which limits their effectiveness in urban canyons. To address the positional uncertainty and signal blockage from buildings, we propose a digital twin-driven framework for continuous trajectory and resource optimization in UAV swarms. We model an urban environment containing random high-rise structures, applying a non-line-of-sight (NLoS) uncertainty to reflect realistic communication degradation. The digital twin (DT) architecture utilizes a dual-layer spatial representation that captures a dynamically decaying positional uncertainty radius of the recipient. We define a strict visual localization boundary that initiates deterministic target tracking with a state transition mechanism. To manage the complexity of swarm routing, we apply Density-Based Spatial Clustering of Applications with Noise (DBSCAN), assigning one UAV courier and one logistics transfer station to each cluster. The system executes a continuous re-optimization loop using an adaptive multi-objective Genetic Algorithm. This framework jointly minimizes cumulative outage probability and total flight time while enforcing a signal-to-noise ratio threshold and throughput constraints. This continuous adaptation mechanism mitigates NLoS blockage risks, supporting reliable communication and efficient delivery in Global Navigation Satellite System (GNSS)-degraded and obstacle-dense urban environments. Full article
(This article belongs to the Section Innovative Urban Mobility)
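
The clustering step described in the abstract above can be illustrated with a minimal pure-Python DBSCAN sketch. The recipient coordinates, `eps`, `min_pts`, and the `assign_couriers` helper with its `uav_k` names are hypothetical and chosen only for illustration; the abstract does not give the paper's parameters or implementation.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

def assign_couriers(labels):
    """Map each cluster id to one hypothetical courier name."""
    return {c: f"uav_{c}" for c in sorted(set(labels)) if c != NOISE}

# Hypothetical recipient positions: two dense groups and one outlier.
recipients = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(recipients, eps=1.5, min_pts=3)
couriers = assign_couriers(labels)
```

Points labelled as noise receive no courier in this sketch; how the paper handles isolated recipients is not stated in the abstract.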

24 pages, 6298 KB  
Article
Siamese-ViT: A Local–Global Feature Fusion Method for Real-Time Visual Navigation of UAVs in Real-World Environments
by Yu Cheng, Xixiang Liu, Shuai Chen and Chuan Xu
Remote Sens. 2026, 18(10), 1556; https://doi.org/10.3390/rs18101556 - 13 May 2026
Viewed by 224
Abstract
Visual scene matching navigation (VSMN) for unmanned aerial vehicles (UAVs) offers high precision, high reliability, and autonomy. The biggest challenge lies in the tension between local fine-grained information and global semantics, as well as limited generalization ability in real-world environments. While existing Transformer-based cross-view geolocation methods enhance global context modeling, they still generally suffer from high demands on training data and computational resources, insufficient fusion of local fine-grained information and global semantics, and limited real-time performance in complex real-world environments. To address these problems, we propose a scene matching and localization algorithm based on the Siamese-ViT. For feature extraction, we use the ViT model to extract global features and K-means clustering to aggregate local features; the two are combined into a robust local–global feature representation vector. For feature matching, incremental principal component analysis (IPCA) is used to reduce the dimensionality of the high-dimensional feature space, and a KD-tree is constructed for fast feature retrieval to improve matching efficiency. We validated our algorithm on the University-1652 dataset and a dataset of real-world satellite-drone image pairs. The results show that our Siamese-ViT outperforms other models in both Recall and AP. We also conducted flight experiments in real-world environments, capturing drone images of complex scenes, including farmland, urban buildings, and waterways. The results show that, at a flight altitude of 350 m, our algorithm achieves an average absolute error of 6.2063 m in latitude and 6.7552 m in longitude, and an average horizontal error of 10.1922 m. Our Siamese-ViT therefore demonstrates ideal overall positioning accuracy. Full article

28 pages, 2606 KB  
Article
GRiM-Net: A Two-Stage Cross-View Visual Localization Framework for UAVs
by Yanting Hu and Qinyong Zeng
Remote Sens. 2026, 18(10), 1477; https://doi.org/10.3390/rs18101477 - 8 May 2026
Viewed by 264
Abstract
Autonomous flight of unmanned aerial vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments critically depends on accurate and robust visual localization. To tackle the challenges of cross-view domain discrepancies and real-time high-precision matching, we propose GRiM-Net, a two-stage joint optimization visual localization network. First, a global retrieval module aggregates features and selects the most similar satellite map candidate patches from a pre-built index, efficiently narrowing the search from the global map to a local region. Next, a fine matching module performs pixel-level keypoint detection and description on the query image and candidate patches. Bidirectional matching and weighted homography estimation are then used to map the UAV image center to satellite coordinates, yielding precise geographic positions. Both modules share a backbone with domain-adaptive batch normalization, and joint optimization of global retrieval triplet loss with fine matching keypoint, descriptor, and homography reprojection losses enables synergistic enhancement of feature representations. Ablation and comparison experiments conducted on public urban cross-view benchmarks demonstrate that GRiM-Net can achieve efficient and robust geographic coordinate regression for UAVs, providing a practical localization component for broader navigation systems. Full article
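
The final step in the abstract above, mapping the UAV image center to satellite coordinates through an estimated homography, can be sketched as follows. The matrix `H_example` and the 640 × 480 image size are hypothetical values chosen for illustration, and the result is a position in satellite-image pixels; converting it to geographic coordinates would additionally need the map's georeferencing, which is not described in the abstract.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to get back to pixel units.
    return u / w, v / w

# Hypothetical homography: scale by 2, then shift by (10, 20) pixels.
H_example = [[2.0, 0.0, 10.0],
             [0.0, 2.0, 20.0],
             [0.0, 0.0, 1.0]]
width, height = 640, 480  # assumed UAV image size
center_on_map = apply_homography(H_example, width / 2, height / 2)
```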

20 pages, 2495 KB  
Article
Adaptive UAV Visual Localisation Based on Improved Gradient-Damping Newton Method
by Xunli Zhou, Ancheng Fang, Song Fu, Jiaming Liu, Xiaoge Zhang, Xiong Liao and Jianwei Zhang
Electronics 2026, 15(10), 1974; https://doi.org/10.3390/electronics15101974 - 7 May 2026
Viewed by 348
Abstract
The role of unmanned aerial vehicles (UAVs) in time-sensitive missions such as low-altitude reconnaissance and disaster rescue has gained increasing significance. To address the challenge of visual localisation for UAVs operating in complex terrains under Global Navigation Satellite System (GNSS)-denied environments, this paper proposes an improved adaptive gradient-damped Newton approach to mitigate the trade-off between terrain non-convexity and computational real-time performance. The proposed approach incorporates a terrain-gradient-based dynamic step-size adjustment mechanism that adaptively captures non-linear terrain characteristics in real time and effectively reduces the numerical oscillations typically observed in steep regions when using the standard Newton method. In addition, a tightly coupled vision–geometry framework was developed to constrain cumulative drift during long-range flight. Monte Carlo simulation results demonstrate that the proposed algorithm maintains submeter localisation accuracy while achieving approximately a three-fold improvement in computational efficiency compared with traditional grid-based methods, and a 27.4% increase in convergence speed relative to the standard Newton method. Experiments conducted under high-noise conditions and highly undulating terrains indicate that the approach exhibits strong convergence stability, offering a computationally efficient and robust solution for UAV navigation. Full article
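
To illustrate why damping matters for Newton-type iterations, here is a generic one-dimensional sketch. It is not the paper's terrain-gradient-based rule, which the abstract does not specify: the damping factor `1 / (1 + |step|)` and the test function `atan(x)` are assumptions made purely for illustration. Undamped Newton on `atan(x)` overshoots and diverges from a start of 1.5, while the damped variant converges to the root at 0.

```python
import math

def newton_root(f, df, x0, iters, damped=True):
    """Find a root of f by Newton iteration, optionally damping each step."""
    x = x0
    for _ in range(iters):
        step = f(x) / df(x)  # full Newton step
        # Hypothetical damping rule: shrink large steps, keep small ones almost full.
        alpha = 1.0 / (1.0 + abs(step)) if damped else 1.0
        x -= alpha * step
    return x

def residual(x):
    return math.atan(x)

def residual_slope(x):
    return 1.0 / (1.0 + x * x)

damped_result = newton_root(residual, residual_slope, 1.5, iters=20, damped=True)
plain_result = newton_root(residual, residual_slope, 1.5, iters=5, damped=False)
```

Near the root the damping factor approaches 1, so this rule retains Newton's fast local convergence while limiting oscillation far from the root.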

10 pages, 2099 KB  
Proceeding Paper
Error Correction Using Bayesian GRU Network in Hybrid Visual Inertial Navigation System
by Tarafder Elmi Tabassum, Sorin A. Negru, Ivan Petrunin and Zeeshan Rana
Eng. Proc. 2026, 126(1), 52; https://doi.org/10.3390/engproc2026126052 - 28 Apr 2026
Viewed by 376
Abstract
Vision-based navigation systems (VINS) are increasingly utilised as an alternative to GNSS for UAVs operating in urban environments, but they suffer from performance degradation under visual fault conditions such as illumination variation, rapid motion, texture-less environments, and weather effects. While hybrid architectures incorporating Kalman filters and machine learning (ML) improve performance, they often lack evidence of providing contingency for non-Gaussian error distributions, limiting operational safety. To address these shortcomings, an enhanced hybrid VINS architecture is proposed, featuring a Bayesian GRU-based error correction network (B-GRU) to provide a contingency while compensating for model errors. To the best of the authors’ knowledge, this is the first attempt to estimate uncertainty using a B-GRU compensator while addressing data uncertainty for VINS applications. The system architecture integrates an Error-State Kalman Filter (ESKF) and the B-GRU, compensating for position errors with uncertainty prediction. The proposed approach is validated using datasets from MATLAB incorporated in an Unreal Engine simulated environment, replicating the complex fault conditions. The ML model is trained on various visual failure modes with simulated datasets to adapt to the variability in the signal patterns during flights, and is tested across varied flight paths and lighting scenarios. The results demonstrate that the fusion strategy effectively corrects erroneous measurements arising from corrupted sensor data and imperfect models and achieves an improvement of 78.06% compared to SOTA hybrid VIO on the horizontal axis while capturing complex flight dynamics in an unseen environment. A comparative analysis demonstrates the effectiveness of B-GRU in mitigating failure modes with a predictive error boundary, achieving a 72% improvement in performance compared to the architecture that integrates GRU-based error compensation. This approach represents a step forward in enhancing positioning accuracy and contingency in challenging urban environments. Full article
(This article belongs to the Proceedings of European Navigation Conference 2025)

27 pages, 27650 KB  
Article
GLP-VO: A Hybrid Visual Odometry Framework for Low-Altitude UAV Imaging in Complex Urban Environments
by Yuxuan Xu, Bo Jiang, Longyang Huang, Ruokun Qu and Zhiyuan Wang
Drones 2026, 10(5), 329; https://doi.org/10.3390/drones10050329 - 28 Apr 2026
Viewed by 574
Abstract
Accurate and robust UAV navigation in complex urban environments remains challenging due to dense buildings, dynamic obstacles, and unreliable GPS signals. To address this issue, this paper proposes GLP-VO, a hybrid visual odometry framework that combines geometric structure features with point features. An adaptive weighting strategy is introduced to balance the contributions of different feature types according to matching quality and scene complexity, while geometric constraints are incorporated into the optimization process to improve pose estimation accuracy and stability. Experiments on the TUM RGB-D dataset and real UAV flight sequences verify the effectiveness of the proposed method. GLP-VO achieves the best ATE results in five of the ten evaluated TUM sequences, including 0.91 cm on f1_xyz and 0.62 cm on f3_str_tex_far, and remains competitive on challenging sequences such as f2_360_kidnap with an ATE of 2.26 cm. In the ablation study, the full model reduces ATE and RPE by up to 44.9% and 43.1%, respectively. Moreover, the proposed system runs at approximately 35 FPS on the desktop platform and 11 FPS on the onboard platform, demonstrating a favorable balance between accuracy, robustness, and real-time performance. Full article
(This article belongs to the Special Issue Autonomous Drone Navigation in GPS-Denied Environments)
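
The ATE figures quoted in the abstract above are commonly reported as the root-mean-square of per-pose position errors. A minimal sketch of that computation follows, under the assumption that the estimated and ground-truth trajectories are already time-associated and aligned to a common frame; the alignment step used in standard benchmark tooling is omitted here, and the sample trajectories are made up.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of position errors between two aligned, equally long trajectories."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [math.dist(p, q) ** 2 for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up positions in metres: every estimated pose is off by exactly 1 cm.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimate = [(0.0, 0.0, 0.01), (1.0, 0.0, -0.01), (2.0, 0.01, 0.0)]
ate_m = ate_rmse(estimate, truth)
```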

18 pages, 3878 KB  
Article
Research on Vision-Based Autonomous Landing Fusion Positioning Algorithm for Unmanned Aerial Vehicle
by Hongyuan Zhu, Jing Ni, Nan Yang, Boyang Gao and Xiaoxiong Liu
Machines 2026, 14(5), 460; https://doi.org/10.3390/machines14050460 - 22 Apr 2026
Viewed by 421
Abstract
A deep-learning-based multi-task network for detecting runway lines and runway markings was designed to address the dependence on prior runway-width information in visual autonomous landing of unmanned aerial vehicles (UAVs). By running detection on runway images captured at different positions during flight, the parameters of the runway start line, left and right boundary lines, and runway markings were obtained. On this basis, a runway width estimation model and a visual positioning algorithm based on line features were designed. In standard runway scenarios, the recognition of runway signs provides valuable prior information about the runway width. For simplified runways or cases where signs are missing, we have devised a width estimation model based on the left/right boundary lines. Furthermore, considering the variation in pitch angle during the UAV’s landing process, we have analyzed and refined the width estimation model to ensure its applicability throughout the entire landing process. Additionally, we have developed a visual positioning algorithm that utilizes the runway width and runway line parameters to calculate the relative position between the UAV and the runway. Considering the limitations of a single visual positioning algorithm, we adopt a visual and inertial navigation fusion positioning algorithm to enhance the reliability of landing positioning. To validate our algorithms, we built a visual simulation platform and conducted flight tests. These tests confirm the effectiveness and accuracy of our detection algorithm and width estimation model. Furthermore, by utilizing the estimated runway width and the detected runway line parameters, we have successfully calculated the relative position, further validating the effectiveness of our positioning algorithm. Full article
(This article belongs to the Special Issue Advanced Flight Control and Intelligent Trajectory Planning in UAVs)

9 pages, 4519 KB  
Proceeding Paper
UAV Position Tracking with Ground Cameras
by Andrea Masiero, Paolo Dabove, Vincenzo Di Pietra, Marco Piragnolo, Alberto Guarnieri, Charles Toth, Wioleta Blaszczak-Bak, Jelena Gabela and Kai-Wei Chiang
Eng. Proc. 2026, 126(1), 50; https://doi.org/10.3390/engproc2026126050 - 15 Apr 2026
Viewed by 464
Abstract
The use of Unmanned Aerial Vehicles (UAVs) has become quite popular in several applications during the last few years. Their spread is motivated by their flexibility of use and by their ability to execute several tasks automatically, mostly thanks to the availability of Global Navigation Satellite Systems (GNSSs), which usually allow reliable outdoor localization of aerial vehicles. However, extending automatic task execution to indoor settings, and to other conditions that are challenging for GNSS, requires an alternative positioning system able to compensate for the unreliability or unavailability of GNSS in those cases. To this end, additional sensors are usually considered, and cameras are probably the most popular among them. The most common vision-based positioning system is a camera mounted on a moving platform and used to determine its ego-motion in a dead-reckoning approach, i.e., visual odometry. Although this solution is affordable and does not require the installation of any infrastructure, it enables absolute positioning of the camera, i.e., of the UAV, only if certain landmarks with known positions are visible in the flying area. In contrast, this work considers the use of external cameras installed in the flying area to track the UAV movements. This approach is similar to the one implemented in motion capture systems, where a set of static, calibrated cameras is used to triangulate target positions. Unlike such systems, this work investigates the use of vision and machine learning tools to (i) extract the UAV position from each video frame and (ii) estimate its 3D position. Estimation of the 3D UAV position is performed with a single camera, exploiting machine learning tools in order to avoid the need for camera calibration. Performance analysis is provided for a dataset collected at the Agripolis campus of the University of Padua. Full article
(This article belongs to the Proceedings of European Navigation Conference 2025)

26 pages, 5737 KB  
Article
An Improved PST-Based Visual Pose Estimation Algorithm for UAV Navigation
by Shengxin Yu, Jinfa Xu and Tianhan Yang
Appl. Sci. 2026, 16(7), 3551; https://doi.org/10.3390/app16073551 - 5 Apr 2026
Viewed by 376
Abstract
Vision-based pose estimation has been widely applied in unmanned aerial vehicle (UAV) navigation. However, existing visual pose estimation algorithms are highly sensitive to camera imaging distortion, which degrades estimation accuracy, and often suffer from noticeable jitter between frames in dynamic scenarios. To address these issues, this paper proposes an improved visual pose estimation algorithm built upon the Perspective Similar Triangle (PST) geometric model. Using a planar fiducial marker as the observation target, the single-frame pose estimation problem is reformulated as a hierarchical geometric inference framework, including image point distortion correction, depth recovery based on a planar similar-triangle constraint, and rigid transformation estimation between the camera and world coordinate systems. This formulation improves pose estimation accuracy under distorted imaging conditions. To accommodate distortion variations in practical scenarios, a radial distortion coefficient update method is further designed to adaptively adjust the radial distortion parameters under single-frame observations, ensuring that the distortion model remains consistent with the actual imaging distortion and providing reliable model inputs for distortion correction in pose estimation. In addition, to enhance pose stability in dynamic scenarios, a multi-frame optical center consistency constraint (MOCCC) method is introduced to stabilize the pose estimates. By constraining pose estimation across adjacent frames using the mean optical center over multiple frames as the optimization objective, the proposed method effectively suppresses pose jitter caused by single-frame observation noise. Finally, a three-degree-of-freedom (3-DOF) attitude motion platform is established, and both static and dynamic experimental scenarios are designed to validate the accuracy and stability of the proposed algorithm. Experimental results demonstrate that the proposed algorithm achieves high-accuracy, high-stability pose estimation under imaging distortion and small perturbations, exhibiting good robustness and suitability for practical UAV visual navigation applications. Full article
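
The image point distortion correction mentioned in the abstract above can be illustrated with the common one-coefficient radial model, in which a distorted point equals the undistorted point scaled by `1 + k1 * r^2`. This is a sketch under that assumed model, with a made-up coefficient and point; the paper's own distortion model and its coefficient update method are not detailed in the abstract. The model has no simple closed-form inverse, so the correction is done by fixed-point iteration.

```python
def distort(x, y, k1):
    """Apply one-coefficient radial distortion to a normalized image point."""
    scale = 1.0 + k1 * (x * x + y * y)
    return x * scale, y * scale

def undistort(xd, yd, k1, iters=20):
    """Invert distort() by fixed-point iteration, starting from the distorted point."""
    xu, yu = xd, yd
    for _ in range(iters):
        # Re-estimate the scale from the current guess of the undistorted point.
        scale = 1.0 + k1 * (xu * xu + yu * yu)
        xu, yu = xd / scale, yd / scale
    return xu, yu

k1_example = -0.2            # hypothetical barrel-distortion coefficient
true_point = (0.3, 0.2)      # hypothetical normalized image coordinates
observed = distort(*true_point, k1_example)
recovered = undistort(*observed, k1_example)
```

The iteration converges quickly when `k1 * r^2` is small, as it is here; strongly distorted points near the image border may need more iterations or a different solver.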

25 pages, 36715 KB  
Article
Development of an Autonomous UAV for Multi-Modal Mapping of Underground Mines
by Luis Escobar, David Akhihiero, Jason N. Gross and Guilherme A. S. Pereira
Robotics 2026, 15(3), 63; https://doi.org/10.3390/robotics15030063 - 19 Mar 2026
Viewed by 1776
Abstract
Underground mine inspection is a critical operation for safety and resource management. It presents unique challenges, including confined spaces, harsh environments, and the lack of reliable positioning systems. This paper presents the design, development, and evaluation of an Unmanned Aerial Vehicle (UAV) specifically engineered for supervised autonomous inspection in subterranean scenarios. Key technical contributions include mechanical adaptations for collision tolerance, an optimized sensor-actuator selection for navigation, and the deployment of a mission-governing state machine for seamless autonomous acquisition. Furthermore, we detail the data treatment workflow, employing a multi-modal point cloud registration technique that successfully integrates high-resolution visual-depth scans of critical mine pillars into a comprehensive, globally referenced map derived from Light Detection and Ranging (LiDAR) data of the entire workspace. We show experiments that illustrate and validate our approach in two real-world scenarios: a simulated coal mine used to train mine rescue teams and an operating limestone mine. Full article
(This article belongs to the Special Issue Localization and 3D Mapping of Intelligent Robotics)

30 pages, 11789 KB  
Article
A Multi-Source Data Fusion-Based Method for Safety Monitoring of Construction Workers on Concrete Placement Surfaces
by Jijiang Chen, Zijun Zhang, Xiao Sun, Yanyin Zhou, Yao Zhou, Yingjie Zhao and Jun Shi
Buildings 2026, 16(6), 1165; https://doi.org/10.3390/buildings16061165 - 16 Mar 2026
Viewed by 510
Abstract
Concrete placement surfaces are characterized by intensive construction processes, frequent equipment interactions, and strong spatial dynamics, which make it difficult to identify unsafe actions of construction workers in real time and to accurately quantify and warn about regional safety risks. To address these challenges, this study proposes a safety monitoring method for construction workers operating on complex concrete placement surfaces. First, a coupled risk assessment framework integrating regional hazard levels, unsafe action risks, and worker authorization is established based on trajectory intersection theory (TIT). Subsequently, a multi-source continuous sensing system is developed by integrating global navigation satellite system (GNSS) positioning, inertial measurement unit (IMU)-based human activity recognition (HAR) using a BiLSTM-Attention model, and unmanned aerial vehicle (UAV)-based 3D realistic scene modeling. On this basis, real-time visualization and risk warning of worker trajectories, action states, and spatial risks are achieved through multi-source data fusion and a WebGL-based visualization platform. Field validation results indicate that the proposed system can generate alarm outputs that are consistent with the predefined risk rules within 3 s in typical construction scenarios, demonstrating rule-consistent real-time feasibility and stable system response performance. Full article
(This article belongs to the Section Construction Management, and Computers & Digitization)

23 pages, 2271 KB  
Article
Adaptive Particle Filter-Neural Network Fusion for Cooperative Localization of Multi-UAV Systems in GNSS-Denied Indoor Environments
by Zhongyi Wang, Hao Wang and Shuzhi Liu
Computers 2026, 15(3), 172; https://doi.org/10.3390/computers15030172 - 6 Mar 2026
Viewed by 799
Abstract
Accurate autonomous navigation of unmanned aerial vehicles (UAVs) in complex indoor environments where satellite signals are denied remains a critical challenge. Conventional state estimation methods, such as particle filters, often suffer from particle degeneracy and high computational costs, limiting their robustness and real-time applicability. Here, we introduce an adaptive particle filter-neural network (PF-NN) fusion framework that achieves high-fidelity cooperative localization for multi-UAV systems. Our approach integrates a lightweight neural network that optimizes particle weight allocation by learning from motion consistency, thereby mitigating sample impoverishment. This is coupled with an adaptive resampling strategy that dynamically adjusts the particle population based on the effective sample size, balancing computational load with estimation accuracy. By fusing ultra-wideband (UWB) inter-vehicle ranging with visual landmark observations, the system leverages both global and local constraints to achieve robust state estimation. In simulations involving six UAVs in a complex indoor setting, our algorithm demonstrated superior performance, achieving an average root-mean-square error (RMSE) of 0.437 m. This work provides a robust and efficient solution for multi-UAV cooperative localization, paving the way for reliable autonomous operations in GNSS-denied scenarios such as search-and-rescue and industrial inspection. Full article
(This article belongs to the Special Issue AI in Action: Innovations and Breakthroughs)
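
The effective sample size mentioned in the abstract above is a standard diagnostic for particle degeneracy. The sketch below shows the usual ESS-triggered systematic resampling, which is a simpler mechanism than the paper's adaptive adjustment of the particle population; the `threshold_ratio` of 0.5 and the helper names are assumptions for illustration.

```python
import random

def effective_sample_size(weights):
    """ESS = 1 / sum of squared normalized weights; ranges from 1 to len(weights)."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)

def systematic_resample(particles, weights, rng=random):
    """Draw len(particles) samples using one random offset and evenly spaced targets."""
    n = len(particles)
    step = sum(weights) / n
    offset = rng.uniform(0.0, step)
    resampled, cumulative, i = [], weights[0], 0
    for k in range(n):
        target = offset + k * step
        # Advance to the particle whose cumulative weight covers the target.
        while cumulative < target and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled

def maybe_resample(particles, weights, threshold_ratio=0.5, rng=random):
    """Resample only when the ESS drops below threshold_ratio * N."""
    n = len(particles)
    if effective_sample_size(weights) < threshold_ratio * n:
        return systematic_resample(particles, weights, rng), [1.0 / n] * n, True
    total = sum(weights)
    return list(particles), [w / total for w in weights], False
```

Skipping the resampling step while the weights are still well spread saves computation and avoids discarding particle diversity unnecessarily.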

27 pages, 10846 KB  
Article
A Multimodal Feature Fusion Framework for UAV Positioning in Weak GNSS Environments Using a Priori High-Resolution Satellite Imagery
by Liming He, Zhengqi Zhao, Zhenglin Qu, Ronghua He, Yu Zhang, Haoran Li and Yadong Zhu
Remote Sens. 2026, 18(5), 752; https://doi.org/10.3390/rs18050752 - 2 Mar 2026
Viewed by 934
Abstract
To address the challenges of unmanned aerial vehicle (UAV) navigation in weak global navigation satellite system (GNSS) environments, this study proposes a novel multimodal feature fusion framework for real-time positioning using a priori high-resolution satellite imagery. This framework utilizes georeferenced satellite images as matching sources and employs a “Multimodal features + LightGlue” algorithm to achieve high-precision cross-modal matching. By combining point, line, and plane features for enhanced robustness in low-texture scenarios, the system further integrates LightGlue’s lightweight confidence classifier to accelerate inference while maintaining high accuracy on challenging image pairs. Consequently, the proposed method outperforms LoFTR, RoMa, SuperPoint + SuperGlue, and SuperPoint + LightGlue in matching performance. Experimental results demonstrate that at a flight altitude of 80 m, the average real-time positioning error is 0.73 m, which increases to 6.24 m at 480 m. Factors such as ground object type, seasonal changes, flight altitude, and satellite image scale significantly influence accuracy. This research demonstrates that the visual navigation system meets practical operational needs for real-time UAV positioning in GNSS-deprived environments. Full article

25 pages, 13812 KB  
Article
Robust and Cost-Effective Vision-Based Indoor UAV Localization with RWA-YOLO
by Feifei Wang, Kun Sun and Yuanqing Wang
Sensors 2026, 26(5), 1469; https://doi.org/10.3390/s26051469 - 26 Feb 2026
Viewed by 538
Abstract
Accurate indoor localization for unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments, especially for small-object detection and under low-light conditions. We propose Robust Wavelet-Aware YOLO (RWA-YOLO), a vision-based detection framework that integrates a wavelet-aware attention fusion module with a dual multi-path aggregation mechanism to enhance small-object detection and multi-scale feature representation. UAV-mounted LEDs are utilized to ensure robust visual perception in low-light indoor scenarios. The UAV’s three-dimensional position is estimated through multi-view geometric triangulation without relying on external beacons or artificial markers. Beyond static localization, the system is validated under dynamic flight conditions, demonstrating smooth and temporally coherent trajectory reconstruction suitable for real-time control loops (update rate: 25 FPS). Extensive experiments in real indoor environments achieve centimeter-level localization accuracy (root mean square error: 9.9 mm, 95th percentile error: 13.5 mm), outperforming state-of-the-art vision-based methods and achieving accuracy comparable to or better than representative hybrid ultra-wideband–vision systems reported in the literature. These results confirm the effectiveness, robustness, and real-time capability of RWA-YOLO for indoor UAV navigation in constrained environments. Full article
(This article belongs to the Section Navigation and Positioning)
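
The multi-view geometric triangulation mentioned in the abstract above can be illustrated in its simplest two-view form: each camera contributes a ray from its center through the detected UAV, and the position estimate is the midpoint of the shortest segment between the two rays. The camera centers and ray directions below are made-up values, and the midpoint method is one common choice rather than necessarily the one used in the paper.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the closest points on rays c1 + s*d1 and c2 + t*d2."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    # Closed-form minimizer of |(c1 + s*d1) - (c2 + t*d2)|^2 over s and t.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))

# Hypothetical setup: two cameras 4 m apart, both observing a UAV at (1, 2, 3).
cam_a, ray_a = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)
cam_b, ray_b = (4.0, 0.0, 0.0), (-3.0, 2.0, 3.0)
uav_position = triangulate_midpoint(cam_a, ray_a, cam_b, ray_b)
```

With noisy detections the two rays no longer intersect, and the midpoint gives a reasonable compromise; with more than two views a least-squares formulation is the usual extension.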
