MDPI - Publisher of Open Access Journals

36 pages, 2263 KB

Open Access Article

Probabilistic Evaluation of Measurement Uncertainty and Decision Risk in UAV-Based Dimensional Inspection

by Dmytro Malakhov, Tatiana Kelemenová and Michal Kelemen

Drones 2026, 10(6), 405; https://doi.org/10.3390/drones10060405 - 24 May 2026

Unmanned aerial vehicles (UAVs) are increasingly used for remote dimensional inspection in transportation monitoring and infrastructure control. In such applications, measurement results are often interpreted relative to regulatory thresholds, making the reliability of inspection decisions strongly dependent on measurement uncertainty. This study presents a probabilistic framework for evaluating measurement uncertainty and decision risk in UAV-based dimensional inspection tasks. A measurement model describing uncertainty scaling with observation geometry is formulated, and the probability of exceedance relative to a regulatory limit is derived. The framework integrates probabilistic measurement modeling with a risk-based decision formulation that accounts for false-positive and false-negative inspection outcomes. The resulting integral inspection risk is analyzed for representative sensing modalities commonly used in UAV platforms, including vision-based systems, LiDAR, and radar sensors. The results demonstrate that uncertainty scaling with flight altitude significantly influences exceedance probability and decision reliability. Sensors with lower intrinsic dispersion maintain sharper threshold transitions and therefore provide more stable regulatory decisions. Sensitivity analysis further confirms that moderate variations in measurement uncertainty can substantially affect inspection risk. The proposed framework provides a quantitative tool for evaluating sensing technologies in UAV-based inspection missions and supports the design of reliable drone-assisted dimensional compliance monitoring systems. Full article
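The exceedance probability described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes Gaussian measurement noise centred on the measured value, and the linear altitude scaling in `sigma_at_altitude` (with its reference altitude of 50 m) and the cost weights in `decision_risk` are hypothetical choices made only for illustration.

```python
import math


def sigma_at_altitude(sigma_ref, altitude_m, ref_altitude_m=50.0):
    """Hypothetical scaling: standard uncertainty grows linearly with altitude."""
    return sigma_ref * altitude_m / ref_altitude_m


def exceedance_probability(measured, limit, sigma):
    """P(true value > limit), assuming a Gaussian centred on the measurement."""
    z = (limit - measured) / sigma
    # Upper tail of the standard normal distribution via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def decision_risk(measured, limit, sigma, cost_fp=1.0, cost_fn=1.0):
    """Expected cost of the naive rule 'flag a violation if measured > limit'."""
    p_exceed = exceedance_probability(measured, limit, sigma)
    if measured > limit:
        # Flagged: the decision is wrong if the true value is actually within the limit.
        return cost_fp * (1.0 - p_exceed)
    # Not flagged: the decision is wrong if the true value actually exceeds the limit.
    return cost_fn * p_exceed


# Same reading (3.9 m against a 4.0 m limit) evaluated at two flight altitudes.
low = exceedance_probability(3.9, 4.0, sigma_at_altitude(0.02, 50.0))
high = exceedance_probability(3.9, 4.0, sigma_at_altitude(0.02, 250.0))
```

Under these assumptions the same reading carries a larger exceedance probability at the higher altitude, which is the qualitative effect the abstract attributes to uncertainty scaling.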

28 pages, 5551 KB

Open Access Article

Capacity-Aware Lightweight Object Detection for UAV Remote Sensing: Dynamic Coupling Regularity and the SP-YOLO Model Family

by Shihao Yin and Weiqiang Tang

Appl. Sci. 2026, 16(11), 5249; https://doi.org/10.3390/app16115249 - 23 May 2026

Abstract

Object detection in UAV remote sensing imagery is confronted with three primary challenges: severe scale variation, densely clustered small targets, and constrained computational resources. This work introduces a family of lightweight detection models guided by the “Capacity-Aware Configuration Regularity” and incorporates a Feature-Refinement C2f module to enhance representational efficiency. A dynamic coupling mechanism is identified between detection head capacity and the representational quality of backbone features, which is further validated through systematic ablation studies spanning three parameter magnitudes. Evaluated on the VisDrone2019 benchmark, the proposed model family exhibits a progressive parameter scaling from 1.67 M to 6.15 M. The nano variant achieves 31.7% mAP₅₀ using only 55% of the parameter budget of YOLOv8n, surpassing it by 0.7 percentage points. The small variant, with a parameter budget comparable to YOLOv8n, attains 36.7% mAP₅₀, exceeding it by 5.7 points. The medium variant reaches 43.1% mAP₅₀ with 58% of the parameters of YOLOv8s, outperforming it by 4.1 points. The improvements are pronounced under the stricter mAP₅₀–₉₅ metric, where the small variant outperforms YOLOv8n by 3.3 points and the medium variant surpasses YOLOv8s by 2.8 points, demonstrating robust localization accuracy across a wide range of IoU thresholds. This consistent superiority in the accuracy–efficiency trade-off extends to the DIOR dataset, confirming the robust generalization of the proposed models across diverse remote sensing scenarios. Moreover, the uncovered capacity-matching regularity offers transferable methodological guidance for designing lightweight detection models tailored to resource-constrained platforms. Full article
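The distinction between the mAP at IoU 0.5 and the stricter mAP averaged over IoU 0.5 to 0.95 rests on the intersection-over-union criterion. The sketch below shows the commonly used definition for axis-aligned boxes and the ten thresholds from 0.50 to 0.95 in steps of 0.05. It is an illustration only, not the authors' evaluation code; the function names and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(iou_value):
    """Count how many of the thresholds 0.50, 0.55, ..., 0.95 a prediction satisfies."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(iou_value >= t for t in thresholds)


# A prediction shifted by half its width against a 10 x 10 ground-truth box.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A prediction that counts as correct at the 0.5 threshold can still fail most of the stricter ones, which is why gains under the averaged metric indicate tighter localization.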

(This article belongs to the Section Applied Industrial Technologies)

26 pages, 13961 KB

Open Access Article

A UAV–3DGS–VR Workflow for Scenario-Comparable Immersive Review in Heritage Landscapes

by Xintong Li, Wenqi Sheng, Yixuan Tang, Yingwen Yu and Yuyang Peng

Drones 2026, 10(6), 404; https://doi.org/10.3390/drones10060404 - 23 May 2026

Abstract

Unmanned aerial vehicles (UAVs) are widely used for documentation, surveying, and 3D modeling in the built environment, yet their outputs often remain difficult to reuse for immersive comparison of alternative construction scenarios. This study presents a low-cost UAV-to-3DGS-to-VR workflow for constructing scenario-comparable immersive environments for built-environment review. The workflow combines multi-angle UAV imagery, point-cloud-based geometric anchoring, 3D Gaussian Splatting (3DGS), and Unity-based virtual reality (VR) to transform drone-captured reality into a reusable scene for controlled scenario comparison. The workflow is demonstrated in Middenbeemster, the central town of the Beemster polder World Heritage property. One present-condition scene (M0) and three alternative construction scenarios (M1 to M3) were created within a shared spatial reference. Reconstruction quality was assessed using PSNR and SSIM, and the VR scenes were further evaluated through eye-tracking, head-motion recording, and subjective ranking. The results indicate that the workflow can generate visually reliable and directly comparable immersive scenes from UAV data in this case study. Behavioral and subjective findings showed a consistent pattern, with M1 appearing more compatible than M2 and M3 in this pilot evaluation. The study contributes a pilot UAV-based workflow that links reality capture, immersive scenario comparison, and supplementary behavioral evidence within one process. Full article

(This article belongs to the Topic 3D Documentation of Natural and Cultural Heritage)

20 pages, 58594 KB

Open Access Article

FLKFormer: Frequency-Enhanced Large-Kernel Framework for Object Detection in UAV Imagery

by Yunhao Chen, Wen-Zhun Huang, Zhen Wang, Sihao Zeng and Chen Yang

Remote Sens. 2026, 18(11), 1686; https://doi.org/10.3390/rs18111686 - 22 May 2026

Viewed by 87

Abstract

UAV object detection remains challenging due to large scale variation, dense small objects, frequent occlusion, and complex background interference. Existing CNN-based detectors are often limited by weak small-object representation, while Transformer-based detectors may not adequately preserve local details in dense aerial scenes. This paper proposes a dual-path detection framework that integrates frequency-domain enhancement with large-kernel convolution and Transformer-based global modeling. An FFT Large-Kernel Convolution (FFLKC) module is introduced to enhance high-frequency details and enlarge the effective receptive field. A Transformer pathway with Full-Process Feature Attention (FPFA) is designed to strengthen long-range dependency modeling and semantic representation. A Frequency-Semantic Memory-guided Adaptive Fusion (FMSAF) module is further employed to integrate local detail features and global contextual information. Experiments on UAVDT and VisDrone demonstrate that the proposed method achieves superior overall detection performance and stronger small-object perception than mainstream detectors. The method reaches 58.7 $AP$ and 51.8 $AP_{S}$ on UAVDT, and 39.4 $AP$ and 30.5 $AP_{S}$ on VisDrone. Qualitative and quantitative results verify the effectiveness of the proposed design in improving detection quality under complex UAV backgrounds. Full article

(This article belongs to the Topic Unmanned Vehicles Technology and Embodied Intelligence Systems for Intelligent Transportation)

19 pages, 5072 KB

Open Access Article

MDCL-DETR: Multi-Domain Enhancement and Cross-Layer Feature Fusion for Small Object Detection

by Tianran Hao, Xiao Zhang and Bing Zhou

Sensors 2026, 26(11), 3305; https://doi.org/10.3390/s26113305 - 22 May 2026

Viewed by 148

Abstract

Small object detection in uncrewed aerial vehicle (UAV) imagery is hindered by limited pixels, insufficient detailed information, and strong background interference, leading to weak feature representation and poor contextual modeling. To address these issues, we propose a multi-domain enhancement and cross-layer feature fusion detection Transformer (MDCL-DETR) with progressive feature processing. First, a multi-domain enhancement module (MDEM) based on CSP (cross stage partial) structure is proposed, which fuses spatial and frequency-domain features in a lightweight manner to enhance object detail and global structures while effectively distinguishing object features from background interference. Second, a cross-layer feature extraction module (CLEM) is introduced to aggregate multi-scale features across layers, alleviate information loss caused by downsampling, and preserve spatial details of small objects while integrating high-level contextual semantics. Meanwhile, a gated Mamba fusion module (GMFM) is proposed, which adopts the Mamba architecture for long-range dependency modeling of multi-scale features and integrates a gating mechanism to realize the dynamic weighted fusion of local details and global context, further improving feature discriminability and global modeling capability. Finally, a fine-grained enhancement module (FGEM) is designed, which leverages feature reorganization and adaptive feature extraction to reinforce and compensate fine-grained features. Extensive experimental results validate the effectiveness and generalization of the proposed method, achieving mAP₅₀ scores of 54.1% and 56.2% on the VisDrone2019 and AI-TOD datasets. Full article

(This article belongs to the Section Sensing and Imaging)

25 pages, 14069 KB

Open Access Article

RSMamDet: Efficient UAV Remote Sensing Vehicle Detection via Linear State Space Models and Adaptive Multi-Level Feature Fusion

by Man Wu, Xiaozhang Liu, Xiulai Li and Wenbiao Gan

Drones 2026, 10(5), 396; https://doi.org/10.3390/drones10050396 - 21 May 2026

Viewed by 108

Abstract

Accurate and efficient vehicle detection from unmanned aerial vehicle (UAV) imagery is essential for intelligent transportation, urban monitoring, and public safety, yet this task remains challenging due to high target density, extreme scale variation, complex backgrounds, and stringent onboard computational constraints. Existing DETR-based detectors model global context through self-attention but incur quadratic O(N²) complexity that is prohibitive for high-resolution UAV images, while CNN-based methods lack the long-range contextual awareness needed for dense small-object scenarios. We propose RSMamDet, an efficient end-to-end detection framework built upon RT-DETR that replaces quadratic self-attention with linear O(N) State Space Model scanning. The framework integrates a MobileMamba backbone with a Selective Feature Scanning module for efficient global context modeling, a Dimension-Aware Selective Integration module for adaptive cross-scale feature fusion, a Poly Kernel Inception Network encoder for multi-receptive-field feature enrichment, and an Adaptive Multi-Level Feature Fusion module for content-aware dynamic upsampling, complemented by an Uncertainty-Minimal Composite loss for stable query selection in cluttered aerial scenes. Experiments on DroneVehicle and VisDrone2019 demonstrate that RSMamDet achieves mAP₅₀ of 72.6% and 40.2%, surpassing state-of-the-art methods by 4.1% and 2.2%, respectively, while maintaining real-time inference at 186.2 FPS with only 19.8M parameters and 42.3 GFLOPs, representing a 6.14× reduction in computational cost and a 3.86× reduction in model parameters compared to the strongest baseline. Full article

(This article belongs to the Topic Advances in Autonomous Vehicles, Automation, and Robotics)

24 pages, 2250 KB

Open Access Article

From Generic to Adaptive: Similarity-Adaptive Receptive-Field Cross DETR for Remote-Sensing Object Detection

by Chenyu Lin, Yunzhan Fu, Hang Xu, Xuyang Teng and Tingyu Wang

Remote Sens. 2026, 18(10), 1670; https://doi.org/10.3390/rs18101670 - 21 May 2026

Viewed by 102

Abstract

Object detection in optical remote sensing imagery faces persistent challenges from severe instance overlap, extreme spatial density, and motion or atmospheric blur. These degradations cause conventional detectors to over-mix neighboring instance features and fail to separate closely packed objects. To address these limitations, we propose SARC-DETR, a detection framework that augments the RT-DETR architecture with two complementary plug-in modules: Similarity Adaptive Convolution (SAC) and Receptive Field Cross Convolution (RCC). SAC introduces a reproducing-kernel-Hilbert-space (RKHS) motivated similarity gate that selectively suppresses responses inconsistent with local feature prototypes, thereby reducing cross-instance interference in overlapped and blurred regions. RCC constructs a large directional receptive field through orthogonal strip-based aggregation and content-adaptive fusion, enabling efficient long-range context capture without quadratic complexity overhead. Both modules can be integrated into existing DETR-style detectors without modifying the detection head or training protocol. On VisDrone2019-DET, SARC-DETR improves $\mathrm{AP}^{val}$ from 29.7 to 34.8, $\mathrm{AP}_{50}^{val}$ from 49.5 to 56.2, and $\mathrm{AP}_{S}^{val}$ from 19.2 to 24.8. On DIOR, AP rises from 57.9 to 68.4, and on NWPU VHR-10, from 44.4 to 66.5, demonstrating robust cross-dataset generalization. After structural reparameterization, the additional overhead is less than 0.75 M parameters and 0.36 G FLOPs, confirming deployment suitability for UAV and satellite-based remote sensing applications. Full article

(This article belongs to the Special Issue Deep Learning-Based Interpretation and Processing of Remote Sensing Images)

18 pages, 477 KB

Open Access Systematic Review

Human-Drone Interaction in Older Adults: A Systematic Review

by Agustín Gómez-López, Yuxa Maya-López, Pablo Olivos-Jara and Rafael Morales

Drones 2026, 10(5), 389; https://doi.org/10.3390/drones10050389 - 20 May 2026

Viewed by 255

Abstract

An aging population, increased life expectancy and loneliness among older people constitute a growing challenge, driving interest in technological solutions such as home drones. The aim of this study is to analyze their potential for older adults through a systematic review following PRISMA guidelines, including articles indexed in Web of Science, Scopus, PubMed and the ACM Digital Library up to February 2026 and following the Joanna Briggs Institute (JBI) methodology. A total of 285 records were initially identified and imported into JBI, of which 41 duplicate records were removed, and 231 studies were excluded after screening, resulting in 13 studies meeting the inclusion criteria. The reviewed studies suggest generally favorable perceptions among some older adults regarding the use of drones in the areas of health, support and safety, alongside barriers related to usability, trust and user interaction. Recent studies incorporate practical applications, highlighting the potential applicability of drones in supporting aspects related to autonomy, health and safety among older adults. Overall, the literature, though still limited, shows a shift towards more specific applications, highlighting the potential of drones to support the autonomy, health and safety of older adults, although their implementation remains influenced by factors of acceptance and user experience. Full article
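The record counts reported in this abstract can be checked with a few lines of arithmetic. The variable names below are illustrative and not taken from the review; only the four numbers come from the abstract.

```python
# Record counts as reported in the abstract.
identified = 285
duplicates_removed = 41
excluded_after_screening = 231
reported_included = 13

# Records left for screening once duplicates are removed.
screened = identified - duplicates_removed

# Studies remaining after the screening exclusions.
included = screened - excluded_after_screening

# True when the reported flow is internally consistent.
counts_consistent = included == reported_included
```

The counts are consistent: 285 minus 41 leaves 244 records for screening, and removing 231 of those leaves the 13 included studies.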

26 pages, 9060 KB

Open Access Article

Synergistic Multi-Model Fusion for Efficient–Accurate Multi-Defect Detection in Power Lines

by Linfeng Xi, Tao Shen, Guanglong Zhao, Nan Wang and Zhi Li

Sensors 2026, 26(10), 3185; https://doi.org/10.3390/s26103185 - 18 May 2026

Viewed by 333

Abstract

In unmanned aerial vehicle (UAV)-based power line inspection, multi-scale defects and complex backgrounds challenge the balance between detection accuracy, speed, and model lightweighting, limiting automated grid inspection. This paper proposes a Multi-Scale Mamba Framework (MS-Mamba) for efficient and accurate defect perception. A drone inspection dataset containing 5137 images from 14 defect categories was constructed and divided into training and validation sets with an 8:2 split. To address the large scale variation among defects, the categories are decoupled into macroscopic, mesoscopic, and microscopic groups according to physical attributes and visual scales. As the core perception engine, a lightweight state-space mechanism is designed to balance accuracy and deployability. A spatial resolution-aware hierarchical reconstruction strategy and a dynamic feature selection mechanism are integrated to enhance feature extraction, reduce background redundancy, and improve small-target representation. Compared with the YOLOv5s baseline, MS-Mamba achieves an mAP@0.5 of 0.749, corresponding to a 15.6 percentage-point improvement, while reducing parameters by 0.13 M and computational cost by 1.7 GFLOPs. Ablation studies and visual analyses further confirm fewer missed and false detections in complex backgrounds. The developed end-to-end inspection system was validated through closed-loop engineering tests, demonstrating strong potential for industrial deployment. Full article

(This article belongs to the Topic Advanced Strategies for Smart Grid Reliability and Energy Optimization)

29 pages, 1625 KB

Open Access Article

EfficientIR-Det: Towards Efficient and Accurate DETR for UAV Infrared Object Detection

by Xiang Yang, Hanbin Li and Xiaolan Xie

Sensors 2026, 26(10), 3129; https://doi.org/10.3390/s26103129 - 15 May 2026

Viewed by 125

Abstract

Infrared (IR) object detection on unmanned aerial vehicle (UAV) platforms is fundamentally challenged by low signal-to-noise ratios and extremely tight onboard computational budgets. Conventional CNNs lack sufficient global context, while Transformers suffer from quadratic complexity, hindering real-time deployment. To address these bottlenecks, we propose EfficientIR-Det, a lightweight end-to-end detector featuring a holistic optimization of the backbone, encoder, and sampling mechanisms. Specifically, we design a Partial Star Network (PSN) backbone that achieves implicit high-dimensional feature expansion via element-wise multiplication to amplify weak IR signals with minimal redundancy. Furthermore, a Hierarchical Mamba (HiMamba) encoder leverages selective state-space modeling to provide linear-complexity global enhancement with superior hardware efficiency. To refine cross-scale representations, we introduce an Adaptive Gated Sampling (AGS) module and a Hierarchical Sampling Strategy (HSS) to optimize feature fusion and sampling budget allocation toward dim-small targets. On HIT-UAV, EfficientIR-Det achieves 88.4% mAP@0.5, outperforming the RT-DETR-R18 baseline by 3.3 points while reducing FLOPs and parameters by 48.9% and 44.2%, respectively. On the larger-scale DroneVehicle dataset, it consistently leads with a 74.1% mAP@0.5 and a high inference speed of 140.8 FPS. Our results offer a promising research scheme for robust, real-time infrared perception on edge-constrained UAV platforms. Full article

(This article belongs to the Special Issue Emerging Remote Sensing Techniques and Applications for Object Detection)

24 pages, 4429 KB

Open Access Article

SDP-YOLOv8: A Lightweight Enhancement Algorithm for Small Object Detection in UAV Aerial Photography

by You-Chao Lu, Yi-Han Xu, Wen Zhou and Ding Zhou

Appl. Sci. 2026, 16(10), 4941; https://doi.org/10.3390/app16104941 - 15 May 2026

Viewed by 137

Abstract

To overcome the limitations of existing UAV object detection algorithms—particularly missed detections, false alarms, and the progressive loss of fine-grained features for small objects—this paper proposes SDP-YOLOv8, a lightweight and parameter-efficient enhancement of YOLOv8. The design aims to improve small-object detection accuracy while maintaining a lightweight architecture suitable for deployment on memory-constrained UAV platforms. Four lightweight-oriented modifications are introduced: (1) SCFS, which combines SPD-Conv for low-information-loss downsampling with a C2f block and SimAM attention; (2) DCSPPF, expanding the receptive field via parallel dilated convolutions; (3) a GhostConv-infused Patch Merging upsampling layer for local context enhancement; and (4) an extra small-scale detection head to preserve fine details. On VisDrone2019, experimental results show that SDP-YOLOv8 improved mAP@0.5 by 3.90% and mAP@0.5:0.95 by 2.60%, with a 14.4% reduction in parameters. The model maintains real-time performance (53.5 FPS on an RTX 3090 at FP32 with batch size 1, 38.7 FPS on a Jetson Orin Nano with TensorRT FP16 at batch size 1) and offers a favorable trade-off between detection accuracy, parameter efficiency, and memory footprint, making it a potential candidate for onboard deployment on resource-limited UAVs in aerial monitoring scenarios, pending further validation on diverse datasets and hardware platforms. Full article

30 pages, 21776 KB

Open Access Article

LDSNet: A Lightweight Detail-Sensitive Network for Small Object Detection in Low-Altitude UAV Scenarios

by Tong Tan, Xianrong Peng, Jianlin Zhang, Haorui Zuo, Yao Zhang, Yunhao Wu and Hui Li

J. Imaging 2026, 12(5), 209; https://doi.org/10.3390/jimaging12050209 - 14 May 2026

Viewed by 284

Abstract

Object detection in Unmanned Aerial Vehicle (UAV) imagery faces significant challenges due to the unique aerial perspective. A major bottleneck is the weak feature representation of small objects, which limits both detection accuracy and computational efficiency. To address this issue, we propose a Lightweight Detail-Sensitive Network (LDSNet). Specifically, LDSNet consists of three key components: (1) Lightweight Detail-Sensitive Downsampling (LDSDown), which combines anti-aliasing smoothing with dual-path feature extraction to preserve the spatial details of small objects during downsampling; (2) Shared Recursive Dilated Convolution (SRDC), which uses weight-shared multi-rate dilated convolutions to capture multi-scale context and enlarge the receptive field without introducing extra parameters; and (3) Deeply Decoupled Grouped Head (DGHead), which employs high-ratio grouped convolutions to significantly reduce the computational cost of processing high-resolution inputs. Extensive experiments on the VisDrone2019 and HIT-UAV datasets demonstrate that LDSNet achieves an excellent trade-off between accuracy and efficiency. Compared to the YOLOv11n baseline, LDSNet reduces parameters by 84.6% (from 2.6 M to 0.4 M) and FLOPs by 29.2% (from 6.5 G to 4.6 G), while improving mAP₅₀ by 2.2% on VisDrone2019 and achieving 94.5% on HIT-UAV. Full article

(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)

17 pages, 2705 KB

Open Access Article

A Cooperative Network Management Architecture for Manned–Unmanned Aircraft Teaming Using Network Drones

by Changmin Park and Hwangnam Kim

Electronics 2026, 15(10), 2102; https://doi.org/10.3390/electronics15102102 - 14 May 2026

Viewed by 200

Abstract

Conventional direct communication in Manned–Unmanned Teaming (MUM-T) suffers from fundamental scalability and security limitations. As the number of Unmanned Aerial Vehicles (UAVs) increases, the communication burden on the manned aircraft (MA) grows significantly, while security threats originating from UAVs may directly propagate to the MA. To address these challenges, this paper proposes a hierarchical communication architecture that introduces dedicated Network Drones (NDs) as intermediate communication mediators and trust boundaries between the MA and multiple UAV swarms. In the proposed design, the MA interacts exclusively with NDs, while UAV swarms communicate through ND-mediated links, effectively bounding the number of MA-facing connections and enabling scalable communication. Building on this structured communication model, a message-level Zero-Trust framework is enforced at the MA–ND interface. Each message is evaluated using a multi-dimensional risk model that incorporates authentication consistency, behavioral consistency, content validity, and contextual information, enabling early detection and containment of compromised UAV behavior. Furthermore, the architecture incorporates backup planning mechanisms, including dynamic reassociation and hot-standby operation, to ensure robust communication under ND failure conditions. Experimental results demonstrate that the proposed approach reduces MA-facing communication overhead, stabilizes end-to-end latency, and improves detection performance in terms of false positives and false negatives, while maintaining system robustness under failure scenarios. Full article

(This article belongs to the Special Issue Intelligent Technologies for Vehicular Networks, 2nd Edition)

23 pages, 2910 KB

Open Access Article

MD-YOLO: A Multi-Scale Adaptive and Dual-Attention Enhanced YOLOv11 for Small Object Detection

by Wenyan Zhou and Gu Gong

Electronics 2026, 15(10), 2099; https://doi.org/10.3390/electronics15102099 - 14 May 2026

Viewed by 217

Abstract

Recent YOLO-based object detection methods have demonstrated strong performance in real-time applications due to their efficient end-to-end architecture. However, in complex scenarios such as VisDrone2019, existing methods still face limitations in small object detection and multi-scale feature modeling capability. These performance bottlenecks are not only attributed to model-level constraints, such as the loss of low-level spatial details during progressive downsampling and the insufficient preservation of fine-grained structural information in high-level semantic representations during feature propagation, which consequently limits multi-scale feature representation and fusion, but are also influenced by data-level factors, including long-tailed distributions and spatial distribution bias. To address these limitations, this paper proposes an improved model named MD-YOLO. First, a Multi-scale Adaptive Channel (MAC) module is introduced into the backbone to replace conventional stride-based downsampling, enhancing multi-scale feature representation while preserving fine-grained information. Second, a Dual Attention Feature Fusion (DAFA) module is designed to align features across different resolutions and further enhance fused representations using both channel and spatial attention mechanisms. Furthermore, a high-resolution P2 detection head is incorporated to enhance the detection capability for dense small objects. Experimental results on the VisDrone2019 dataset demonstrate that the proposed method substantially outperforms the YOLOv11s baseline, improving mAP@0.5 from 38.5% to 45.6% and mAP@0.5:0.95 from 22.8% to 27.1%, while maintaining a reasonable computational cost. Full article

(This article belongs to the Special Issue Advanced Technologies and Applications for Computer Vision and Recognition Systems)

24 pages, 6298 KB

Open Access Article

Siamese-ViT: A Local–Global Feature Fusion Method for Real-Time Visual Navigation of UAVs in Real-World Environments

by Yu Cheng, Xixiang Liu, Shuai Chen and Chuan Xu

Remote Sens. 2026, 18(10), 1556; https://doi.org/10.3390/rs18101556 - 13 May 2026

Viewed by 172

Abstract

Visual scene matching navigation (VSMN) for unmanned aerial vehicles (UAVs) offers advantages such as high precision, high reliability, and autonomy. The biggest challenge lies in the tension between local fine-grained information and global semantics, as well as limited generalization ability in real-world environments. While existing Transformer-based cross-view geolocation methods enhance global context modeling capabilities, they still generally face issues such as high demands on training data and computational resources, insufficient fusion of local fine-grained information and global semantics, and limited real-time performance in complex real-world environments. To address these problems, we propose a scene matching and localization algorithm based on the Siamese-ViT. For feature extraction, we use the ViT model to extract global features and K-means clustering to aggregate local features. Combined with the global features extracted by the ViT, a robust local–global feature representation vector is generated. For feature matching, incremental principal component analysis (IPCA) is used to reduce the dimensionality of the high-dimensional feature space, and a KD-tree is constructed for fast feature retrieval to improve matching efficiency. We validated our algorithm on the University-1652 dataset and a dataset of real-world satellite-drone image pairs. The results show that our Siamese-ViT outperforms other models in both Recall and AP. We also conducted flight experiments in real-world environments, capturing drone images of complex scenes, including farmland, urban buildings, and waterways. The results show that, at a flight altitude of 350 m, our algorithm achieves average absolute errors of 6.2063 m in latitude and 6.7552 m in longitude, and an average horizontal error of 10.1922 m. Therefore, our Siamese-ViT demonstrates ideal overall positioning accuracy. Full article

(This article belongs to the Special Issue AI-Enhanced Remote Sensing for High-Precision Positioning and Navigation)

Search Results (1,857)
