Article

Robust Object Detection for UAVs in Foggy Environments with Spatial-Edge Fusion and Dynamic Task Alignment

1 School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
2 Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang 110819, China
3 School of Mechanical Engineering and Automation, Northeastern University, Shenyang 110819, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(1), 169; https://doi.org/10.3390/rs18010169
Submission received: 29 September 2025 / Revised: 18 November 2025 / Accepted: 22 November 2025 / Published: 5 January 2026
(This article belongs to the Special Issue Efficient Object Detection Based on Remote Sensing Images)

Highlights

What are the main findings?
  • We introduce Fog-UAVNet, a lightweight detector with a unified design that fuses edge and spatial cues, adapts its receptive field to fog density, and aligns classification with localization.
  • Across multiple fog benchmarks, Fog-UAVNet consistently achieves higher detection accuracy and more efficient inference than strong baselines, leading to a superior accuracy–efficiency trade-off under foggy conditions.
What are the implications of the main findings?
  • Robust, real-time UAV perception is feasible without large models, enabling practical onboard deployment.
  • The design offers a simple recipe for adverse weather detection and may generalize across aerial scenarios.

Abstract

Robust scene perception in adverse environmental conditions, particularly under dense fog, presents a persistent and fundamental challenge to the reliability of object detection systems. To address this challenge, we propose Fog-UAVNet, a novel lightweight deep-learning architecture designed to enhance unmanned aerial vehicle (UAV) object detection performance in foggy environments. Fog-UAVNet incorporates three key innovations: the Spatial-Edge Feature Fusion Module (SEFFM), which enhances feature extraction by effectively integrating edge and spatial information; the Frequency-Adaptive Dilated Convolution (FADC), which dynamically adjusts to fog density variations and further enhances feature representation under adverse conditions; and the Dynamic Task-Aligned Head (DTAH), which dynamically aligns localization and classification tasks and thus improves overall model performance. To evaluate the effectiveness of our approach, we independently constructed a real-world foggy dataset and synthesized the VisDrone-fog dataset using an atmospheric scattering model. Extensive experiments on multiple challenging datasets demonstrate that Fog-UAVNet consistently outperforms state-of-the-art methods in both detection accuracy and computational efficiency, highlighting its potential for robust visual perception under adverse weather.

1. Introduction

Visual object detection serves as a fundamental component in autonomous perception systems, underpinning high-level tasks such as navigation, interaction, and environmental understanding. Unmanned aerial vehicles (UAVs) have emerged as mobile visual agents operating over wide spatiotemporal scales, offering an efficient means of sensing and analyzing complex scenes [1]. As UAVs are increasingly embedded within Internet of Things (IoT)-driven infrastructures, the demand for accurate, real-time onboard detection under diverse environmental conditions has become a core requirement for scalable autonomous deployment [2]. Reliable perception is essential for autonomous UAVs, especially in dynamic and uncertain environments. Visual object detection enables scene understanding and supports downstream tasks such as planning, navigation, and semantic mapping [3]. However, real-world deployments often confront adverse weather, which degrades contrast, structure, and object visibility.
Among these conditions, fog presents a particularly severe obstacle. As a form of atmospheric degradation, fog reduces image contrast, attenuates structural details, and causes partial occlusions, all of which significantly impair object localization and recognition [4,5]. These degradations undermine the operational reliability of UAV perception systems in real-world deployments [6]. As shown in Figure 1, conventional detectors often miss targets or yield low-confidence predictions under dense fog, whereas specialized frameworks demonstrate improved robustness.
UAV-based object detection remains a significant challenge under dense fog. First, current models often lack the representational flexibility to handle frequency-domain distortions caused by atmospheric scattering, limiting context modeling across spatially heterogeneous visibility. Second, their feature integration pipelines typically overlook the semantic asymmetry between edge-localized and global-context features, degrading boundary perception in low-contrast scenes. Third, conventional inference strategies are largely task-agnostic, failing to address the misalignment between classification and localization under visual degradation. These limitations, common across existing detectors, pose fundamental obstacles to robust UAV perception in low-visibility environments.
To address these challenges, we propose Fog-UAVNet, a lightweight and fog-aware object detection framework tailored for UAV-based IoT applications. It is designed to enhance structural sensitivity, adapt to multi-scale visibility degradation, and maintain task-consistent predictions under adverse conditions. The network integrates structural fusion, receptive field adaptation, and dynamic task alignment to overcome key limitations of existing methods. The main contributions of this work are summarized as follows:
1.
We design Fog-UAVNet, a tailored detection framework for UAV deployment in foggy conditions. It incorporates three novel modules: (i) a Spatial-Edge Feature Fusion Module (SEFFM) for enhancing structural details under low visibility; (ii) a Frequency-Adaptive Dilated Convolution (FADC) for fog-aware receptive field modulation; and (iii) a Dynamic Task-Aligned Head (DTAH) for adaptive multi-task prediction. Together, they enable robust representation learning and task-consistent inference in adverse weather.
2.
We developed the VisDrone-fog dataset by introducing artificial fog into the original VisDrone dataset, creating a specialized resource for training and evaluating UAV detection models under reduced visibility and varying fog densities. The artificial fog was generated with the atmospheric scattering model described by McCartney in 1975, which is grounded in Mie scattering theory, so that the synthesized degradation remains physically plausible. By leveraging the VisDrone-fog dataset, we demonstrate Fog-UAVNet’s effectiveness in handling the complexities of low-visibility environments, ensuring its adaptability across diverse foggy scenarios.
3.
To capture real-world conditions rather than relying solely on synthetic fog, we constructed Foggy Datasets, a real-world dataset that covers diverse fog densities and complex environmental conditions. This dataset enables thorough evaluation of model generalization in practical UAV-based IoT scenarios beyond synthetic assumptions.
4.
We extensively validate Fog-UAVNet across six benchmarks covering clear and foggy conditions (VisDrone, VisDrone-Fog, VOC-Foggy, RTTS, Foggy Datasets, and Foggy Driving), demonstrating consistent gains over state-of-the-art detectors in localization and classification accuracy, with a compact and efficient architecture suited for real-world deployment.
The remainder of the paper is structured as follows. Section 2 reviews related work. Section 3 introduces the proposed method. Section 4 presents experiments and results. Section 5 discusses the results. Section 6 concludes the paper and outlines future work.

2. Related Work

2.1. Object Detection Under Complex and Adverse Conditions

Traditional object detection methods, such as the Viola-Jones detector, rely on handcrafted features and simple classifiers, but often fail in complex environments with occlusion and low visibility [7]. With the rise of deep learning, region-based convolutional neural networks (e.g., R-CNN, Fast R-CNN, Faster R-CNN) [8,9,10] have significantly improved accuracy by generating region proposals and applying deep feature extraction. Single-stage detectors like You Only Look Once (YOLO) and SSD [11,12] balance speed and accuracy, enabling real-time applications. More recently, DETR [13] introduced transformer-based architectures with global self-attention, offering improved scalability and robustness. In line with these developments, recent studies have also explored lightweight and domain-adaptive detection frameworks for practical deployments [14], including sparse-voxel designs such as RailVoxelDet [15], which reduce redundancy under degraded sensing, and physics-guided models like PCG-QNN [16], which integrate physical priors to enhance robustness in fog-degraded UAV scenarios.
However, performance under adverse weather conditions—such as fog, haze, or heavy rain—remains a key challenge due to visibility reduction and image degradation [17]. To address this, researchers have explored integrating dehazing with detection. For example, SHA modules [18] and haze-aware transformer backbones [19] enhance feature-level perception. Architectures like Swin Transformer and attention fusion networks further improve adaptability to uneven degradation [20,21].
Joint frameworks such as IDOD-YOLOV7 [22] and IA-YOLO [23] combine image restoration and detection in end-to-end pipelines, improving performance in foggy scenes. TogetherNet [24] employs multi-task learning to integrate enhancement and detection, while MS-DAYOLO [25] uses multi-scale domain adaptation to achieve weather-invariant features. In addition, curated datasets covering diverse weather scenarios are essential for training robust models capable of generalizing across conditions.

2.2. IoT-Driven UAV Detection in Challenging Environments

UAV detection in IoT scenarios introduces further complexity, demanding lightweight, real-time, and weather-resilient solutions. LWUAVDet [26] addresses this with an efficient E-FPN backbone, Pixel Efficient Head, and online distillation for deployment under limited-resource conditions. Similarly, AS-YOLOV5 [27] focuses on small-object detection via soft pooling and subspace attention, improving performance in dense, cluttered environments like road traffic scenes.
Beyond architectural improvements, benchmarking efforts [28] offer design insights for UAV detection in smart cities, and environment-adaptive fusion paradigms such as PMFN [29] indicate that prior-guided dynamic fusion can stabilize detection under varying environmental conditions, providing transferable insights for haze-affected UAV perception. However, most approaches overlook domain-specific challenges such as fog, where detection remains unreliable.
To bridge this gap, we propose Fog-UAVNet, a detection framework tailored for foggy UAV scenarios. By integrating advanced feature fusion and task-aligned prediction, Fog-UAVNet achieves robust detection under visibility degradation. Its design aligns with IoT applications like smart surveillance, disaster response, and environmental monitoring, where weather resilience is critical.

3. Materials and Methods

3.1. Dataset Description

To support algorithm development for UAV-based object detection in foggy conditions, we primarily use the VisDrone-Fog dataset, specifically designed for aerial scenarios. Additionally, we include VOC-Foggy, VOC-Norm, RTTS, Foggy Datasets, and Foggy Driving to ensure comprehensive evaluation across diverse fog conditions. Dataset statistics, including image and instance counts per class, are summarized in Table 1.
  • VisDrone-Fog Dataset is derived from VisDrone, a large-scale benchmark specifically designed for drone-based object detection and tracking tasks [30]. VisDrone contains a wide variety of annotated images and video sequences captured by drones in diverse environments and under various weather conditions, making it instrumental for training and evaluating UAV detection systems in real-world scenarios and highly relevant for IoT applications such as surveillance, traffic monitoring, and disaster management. Considering that few publicly available datasets target UAV object detection in adverse weather, we extended VisDrone by generating the VisDrone-fog dataset, introducing artificial fog into the original images using an atmospheric scattering model. This model, described by McCartney in 1975 and grounded in Mie scattering theory, divides the light received by optical sensors into two components: light reflected by the target object and the ambient light surrounding the target. The model is defined as
    $I(x) = J(x)\,e^{-\beta d(x)} + A\left(1 - e^{-\beta d(x)}\right),$
    where $I(x)$ represents the observed foggy image, $J(x)$ is the corresponding clear image, $A$ is the global atmospheric light, and $d(x)$ denotes the scene depth. The scattering coefficient is set to $\beta = 0.08$ to ensure a uniform fog density across the dataset.
Depth $d(x)$ is computed as
$d(x) = -0.04 \times \rho + \sqrt{\max(\mathrm{row}, \mathrm{col})},$
where $\rho$ is the Euclidean distance from the current pixel to the central pixel and $\mathrm{row}$ and $\mathrm{col}$ are the numbers of rows and columns of the image, respectively. This formula calculates the distance between the object and the imaging device. Using this model, we generated the foggy images, as illustrated in Figure 2.
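For reproducibility, the following NumPy sketch illustrates how a foggy image can be synthesized from a clear one under the above equations; the function name, the default atmospheric light value, and the clipping step are illustrative choices rather than part of our released pipeline.

```python
import numpy as np

def add_synthetic_fog(clear_img, beta=0.08, atmospheric_light=0.8):
    """Synthesize fog via I(x) = J(x) * exp(-beta * d(x)) + A * (1 - exp(-beta * d(x))).

    clear_img: float32 array in [0, 1] with shape (H, W, 3).
    """
    h, w = clear_img.shape[:2]
    rows, cols = np.mgrid[0:h, 0:w]
    # rho: Euclidean distance of each pixel from the image center.
    rho = np.sqrt((rows - h / 2.0) ** 2 + (cols - w / 2.0) ** 2)
    # Pseudo-depth d(x) derived from the image geometry, as in the text.
    depth = -0.04 * rho + np.sqrt(max(h, w))
    transmission = np.exp(-beta * depth)[..., None]  # t(x) = exp(-beta * d(x))
    foggy = clear_img * transmission + atmospheric_light * (1.0 - transmission)
    return np.clip(foggy, 0.0, 1.0)
```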
Figure 2. Examples from the VisDrone-fog dataset: (Left) original image; (Right) corresponding foggy image generated using the atmospheric scattering model.
Representative examples from the four benchmark datasets used in this study are shown in Figure 3, illustrating the diversity of scenes, fog densities, and object categories considered in our evaluation.
  • VOC-Foggy Dataset is a synthetic benchmark derived from PASCAL-VOC [31] by introducing fog of varying densities [23], simulating visibility degradation in IoT scenarios. It is widely used for benchmarking dehazing and detection algorithms under adverse conditions.
  • VOC-Norm Dataset includes non-foggy images from VOC2007 and VOC2012 trainval splits, serving as a baseline for evaluating model robustness and calibration across clear and degraded environments.
  • RTTS Dataset [32] consists of 4322 real-world foggy images with annotated vehicles and pedestrians, enabling performance evaluation in naturalistic urban fog scenarios, especially for smart city applications.
  • Foggy Driving Dataset [33] comprises 101 annotated images of vehicles and pedestrians in real fog, reflecting typical challenges faced by UAVs in traffic management scenarios.
  • Foggy Datasets is a self-made dataset capturing real-world foggy scenes, designed to enhance UAV detection performance in diverse IoT contexts. Compiled through data scraping and careful annotation, it includes 8700 images with 55,701 instances across key classes such as car, person, bus, bicycle, motorcycle, and truck. This dataset covers various fog densities, times of day, and weather conditions, making it valuable for training models to handle the natural variability of foggy environments. The diversity in scenes from urban to rural areas under different fog conditions makes this dataset suitable for a wide range of UAV-based IoT applications, including urban surveillance, traffic monitoring, disaster management, and environmental monitoring. To promote further research, the Foggy Datasets, including the images and visualizations shown in Figure 4, are publicly available on our project website.
As illustrated by Figure 2, Figure 3 and Figure 4, the benchmark datasets used in this study jointly cover both controlled synthetic scenarios and challenging real-world UAV scenes under various fog conditions. In this work, we adopt VOC-Foggy as a representative dataset with balanced class distribution and good controllability for conducting ablation studies: on the one hand, VOC-Foggy is derived by extending the classical PASCAL VOC dataset and thus features clear category definitions, stable annotation quality, a relatively balanced class distribution, and a moderate dataset scale; on the other hand, its fog density is generated to cover a wide range of degradation levels from light to heavy fog, and it has therefore been widely used as a standard synthetic benchmark for foggy object detection and dehazing algorithms. Based on these properties, using VOC-Foggy as the main platform for ablation studies enables us to quantitatively analyze the incremental contributions of each proposed module under a standard and widely adopted evaluation setting. In contrast, VisDrone-Fog focuses on realistic UAV-based urban scenes with complex layouts and many small objects, and is thus primarily used to assess the robustness and generalization of these modules under practical visibility degradation. Subsequently, we further validate the complete Fog-UAVNet model on RTTS, Foggy Driving, and Foggy Datasets across diverse real-world traffic and surveillance environments, ensuring that the reported performance is not restricted to a single dataset or a specific fog distribution and providing additional evidence of the model’s robustness.

3.2. Experimental Setup and Evaluation Metrics

All experiments were conducted on a workstation equipped with an Intel® Xeon® Platinum 8255C CPU and an NVIDIA GeForce RTX 3090 GPU, using the PyTorch 1.10.0 framework. In all experiments, the networks were trained for 200 epochs with a batch size of 16 using stochastic gradient descent (SGD) with a momentum of 0.9, a weight decay of 0.0005, and an initial learning rate of 0.01. This training configuration is in line with commonly adopted settings in recent object detection works and was further validated by preliminary experiments on our main benchmark, where we observed that a learning rate of 0.01 combined with a weight decay of 0.0005 yielded a favorable trade-off between convergence stability and final accuracy. For the sake of a controlled and fair comparison, the same optimization protocol and hyperparameter configuration were applied to Fog-UAVNet and all reproduced baseline detectors across all datasets, so that the observed performance differences can be attributed primarily to the architectural design rather than to discrepancies in training strategies.
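As a concrete illustration of this shared protocol, the PyTorch snippet below instantiates the stated optimizer settings; the placeholder network stands in for Fog-UAVNet or any baseline detector and is only there to keep the snippet self-contained.

```python
import torch
import torch.nn as nn

# Placeholder network; in practice this is Fog-UAVNet or a reproduced baseline detector.
model = nn.Conv2d(3, 16, kernel_size=3)

EPOCHS, BATCH_SIZE = 200, 16          # training schedule used for all models
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,                          # initial learning rate
    momentum=0.9,
    weight_decay=0.0005,
)
```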
To comprehensively evaluate Fog-UAVNet in fog-aware UAV detection, we adopt a set of metrics covering accuracy, complexity, and efficiency.
  • Average Precision (AP) quantifies detection accuracy via the area under the precision-recall curve. Mean Average Precision (mAP) is averaged across all categories:
    $\mathrm{mAP} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{AP}_i$
  • Mean Average Precision at IoU 0.5 (mAP50) denotes the mean AP at an intersection-over-union (IoU) threshold of 0.5.
  • Parameters represents the total number of trainable parameters.
  • Giga Floating Point Operations (GFLOPs) measures the computational cost.
  • Speed refers to the latency per image during inference.
  • Frames Per Second (FPS) measures the number of frames that a detector can process per second on a given hardware platform and thus directly reflects the model’s throughput and practical real-time capability. In deployment scenarios, especially under resource-constrained UAV settings, a higher FPS at comparable accuracy is crucial for ensuring timely perception and stable downstream control. In this work, we report FPS on a representative embedded device (NVIDIA Jetson Nano), which provides a realistic estimate of the end-side inference speed and highlights the efficiency advantage of the proposed Fog-UAVNet under typical edge-computing constraints.
This metric suite enables a balanced evaluation of detection accuracy and deployment efficiency, ensuring practical relevance in UAV-based foggy environments.
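To make these definitions concrete, the short sketch below averages per-class AP values into mAP and converts a measured per-image latency into FPS; the numeric inputs are placeholders for illustration only, not results from our experiments.

```python
def mean_average_precision(per_class_ap):
    """mAP = (1/n) * sum of AP_i over the n evaluated classes."""
    return sum(per_class_ap) / len(per_class_ap)

def frames_per_second(latency_seconds_per_image):
    """FPS is the reciprocal of the average per-image inference latency."""
    return 1.0 / latency_seconds_per_image

# Placeholder values, for illustration only.
print(mean_average_precision([0.88, 0.90, 0.86]))  # 0.88
print(frames_per_second(0.008))                    # 125.0
```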

3.3. Fog-UAVNet

The overall workflow of our IoT-Driven UAV Detection system is illustrated in Figure 5. The process begins with onboard visual sensing and video transmission under foggy conditions. Each frame is analyzed by Fog-UAVNet to detect key objects, which are subsequently categorized to support downstream tasks such as tracking and anomaly detection. All outcomes are integrated into a comprehensive event report, with all computations performed on the UAV’s onboard unit.
At the core of this system lies the proposed Fog-UAVNet architecture, shown in Figure 6. Designed to achieve robust performance in adverse weather, Fog-UAVNet consists of three main modules: a multi-scale feature extraction backbone, a Frequency-Adaptive Dilated Convolution (FADC) module, and a Dynamic Task-Aligned Detection Head (DTAH).
The backbone begins with our pioneering SEFFM to perform initial feature extraction, followed by convolutional blocks that progressively downsample the input and enrich semantic representations. SEFFM extracts key image features by effectively fusing spatial and edge information, improving feature clarity for subsequent convolutional layers and enhancing model robustness. To cope with spatially varying fog density, the FADC module operates on the intermediate backbone features, unifying local and global information and adaptively adjusting the dilation rate to strengthen the representation of degraded regions. Concurrently, we introduce the Spatial Pyramid Pooling–Fast (SPPF) module proposed in YOLOv5, which captures multi-scale context to facilitate the detection of occluded or low-contrast UAV targets.
For prediction, Fog-UAVNet adopts a top-down fusion pathway that aggregates multi-resolution features via upsampling and concatenation. Final detection is performed by the DTAH module, which realigns the task-specific features aggregated from the preceding SEFFM, FADC, and fusion stages to improve classification and localization, particularly under visibility degradation.
In summary, the adaptive design of Fog-UAVNet—integrating frequency-aware modulation with task-aligned detection—enables resilient visual inference under degraded visibility. Its robustness against fog-induced perturbations and fine-grained discriminative power make it highly applicable to UAV-based IoT scenarios, including traffic monitoring and emergency response.

3.4. Spatial-Edge Feature Fusion Module

Under foggy conditions, blurred edges and ambiguous spatial context severely hinder effective feature extraction in UAV-based perception. Conventional fusion strategies—such as summation or concatenation—often overlook the semantic heterogeneity between edge and spatial features, resulting in degraded representations. While edge cues are essential for boundary localization, spatial features capture global context; directly fusing them without distinction may introduce semantic conflicts, especially under visibility degradation.
To address this, we propose the Spatial-Edge Feature Fusion Module (SEFFM), which decouples and selectively integrates edge and spatial information to enhance representational quality in fog-impaired scenes (Figure 7).
Given an input feature map $F \in \mathbb{R}^{H \times W \times C}$, SEFFM first extracts base features using a standard $3 \times 3$ convolution:
$F' = \mathrm{Conv}_{3\times 3}(F)$
$F'$ is processed in two parallel branches: an edge path using grouped Sobel-based convolutions for boundary cues and a spatial path with zero padding and max pooling for coarse spatial abstraction. Their outputs are concatenated and refined via stacked convolutions:
$F_{\mathrm{concat}} = \mathrm{Concat}\big(\mathrm{SobelConv}(F'),\ \mathrm{MaxPool}(\mathrm{ZeroPad}(F'))\big)$
$F_{\mathrm{output}} = \mathrm{Conv}_{1\times 1}\big(\mathrm{Conv}_{3\times 3}(F_{\mathrm{concat}})\big)$
By explicitly modeling and fusing complementary feature modalities, SEFFM improves semantic consistency and structural discrimination in degraded visibility conditions. Moreover, compared to the traditional case of using convolution for primary feature extraction, our pioneering SEFFM is able to selectively fuse edge and spatial information, which is essential to fusion detection in foggy weather since both edge and structural information are critical in determining detection accuracy under harsh conditions. It is worth noting that although fog blurs object boundaries and reduces local contrast, local intensity variations in the feature space still carry useful information. Therefore, in SEFFM, the Sobel-based edge cues are not used as a single dominant feature source; instead, they are always fused with the more stable MaxPool-based spatial branch and jointly refined by subsequent convolutional layers. This design allows the network to amplify structural cues when meaningful gradient patterns are present, while relying more on coarse spatial context when edges are severely degraded, thereby reducing the risk of unstable behavior in heavy fog conditions. An intuitive visualization of the intermediate feature maps before and after the concatenation of the SobelConv and MaxPool branches is provided in Figure 8, which further illustrates the complementary roles of structural and spatial cues. These facilitate more robust object perception for UAV-based detection tasks in foggy environments.
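A minimal PyTorch sketch of SEFFM consistent with the equations above is given below; the channel layout, the fixed Sobel kernels, and the exact layer sizes are our illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEFFM(nn.Module):
    """Sketch of the Spatial-Edge Feature Fusion Module (channel sizes are illustrative)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.base = nn.Conv2d(in_ch, out_ch, 3, padding=1)            # F' = Conv3x3(F)
        # Edge branch: fixed horizontal/vertical Sobel kernels applied per channel (grouped conv).
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel = torch.stack([sobel_x, sobel_x.t()]).unsqueeze(1)       # (2, 1, 3, 3)
        self.register_buffer("sobel", sobel.repeat(out_ch, 1, 1, 1))   # (2*out_ch, 1, 3, 3)
        # Spatial branch: zero padding followed by max pooling for coarse spatial abstraction.
        self.spatial = nn.Sequential(nn.ZeroPad2d(1), nn.MaxPool2d(3, stride=1))
        # Fusion: stacked 3x3 and 1x1 convolutions over the concatenated branches.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * out_ch, out_ch, 3, padding=1),
            nn.Conv2d(out_ch, out_ch, 1),
        )

    def forward(self, x):
        f = self.base(x)
        edge = F.conv2d(f, self.sobel, padding=1, groups=f.shape[1])   # SobelConv(F')
        spatial = self.spatial(f)                                      # MaxPool(ZeroPad(F'))
        return self.fuse(torch.cat([edge, spatial], dim=1))

# Example: a 640x640 RGB input produces a 64-channel feature map of the same resolution.
out = SEFFM(3, 64)(torch.randn(1, 3, 640, 640))
```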

3.5. Dynamic Task-Aligned Head (DTAH)

Feature misalignment between classification and localization remains a key limitation in dense detectors, especially under fog-induced challenges such as scale variation and low contrast. To address this, we propose the Dynamic Task-Aligned Head (DTAH), which decouples shared features into task-specific representations and dynamically adjusts spatial alignment for consistent prediction. As illustrated in Figure 9, DTAH consists of shared convolutional layers followed by task-specific dynamic branches. Group Normalization [34] is employed to mitigate internal covariate shift and enhance feature stability under varying environmental conditions.
Initial feature processing through the shared convolutional layers enhances the representation of critical detection features, focusing on edges, textures, and other pivotal attributes necessary for differentiating objects in complex scenarios:
$\mathrm{features} = \mathrm{SharedConv}(x)$
Here, SharedConv refers to the operations within the shared convolutional layers, applying filters designed to capture a broad spectrum of features from the input data. In addition, we compute a global context descriptor by applying global average pooling over the spatial dimensions, i.e., $g = \mathrm{GAP}(\mathrm{features}) \in \mathbb{R}^{C \times 1 \times 1}$, which summarizes scene-level statistics under foggy conditions and serves as a guidance signal for task-aware feature routing.
After this initial processing, features are dynamically allocated to dedicated paths for classification and regression tasks:
$\mathrm{cls\_feat},\ \mathrm{reg\_feat} = \mathrm{DynamicTaskDecomposition}(\mathrm{features},\ g)$
DynamicTaskDecomposition determines the relevance and importance of features to a particular task based on the global context, which is then used as a reference for their dynamic assignment. Concretely, a lightweight gating module takes $g$ as input and predicts task-specific channel-wise modulation vectors, $\alpha_{\mathrm{cls}}, \alpha_{\mathrm{reg}} \in (0, 1)^{C}$, which re-weight features to obtain cls_feat and reg_feat:
$\mathrm{cls\_feat} = \alpha_{\mathrm{cls}} \odot \mathrm{features}, \quad \mathrm{reg\_feat} = \alpha_{\mathrm{reg}} \odot \mathrm{features},$
where $\odot$ denotes channel-wise multiplication. In this way, channels receiving higher responses in $\alpha_{\mathrm{cls}}$ or $\alpha_{\mathrm{reg}}$ are effectively prioritized for the classification or regression task, respectively, realizing a soft but explicit task-aware feature allocation. For example, features with high spatial resolution may be preferentially assigned to the regression task to ensure accurate object localization, while features with unique texture patterns may be preferentially used for classification.
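The gating mechanism can be sketched in a few lines of PyTorch; the hidden width of the gate and the two-layer 1x1-convolution design are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DynamicTaskDecomposition(nn.Module):
    """Sketch of GAP-guided channel gating for task-aware feature routing (sizes are assumptions)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                 # g = GAP(features), shape (B, C, 1, 1)
        hidden = max(channels // reduction, 8)

        def make_gate():
            return nn.Sequential(
                nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels, 1), nn.Sigmoid(),   # alpha in (0, 1)^C
            )

        self.gate_cls, self.gate_reg = make_gate(), make_gate()

    def forward(self, features):
        g = self.gap(features)
        alpha_cls, alpha_reg = self.gate_cls(g), self.gate_reg(g)
        # Channel-wise re-weighting softly routes features toward each task.
        return alpha_cls * features, alpha_reg * features

cls_feat, reg_feat = DynamicTaskDecomposition(256)(torch.randn(1, 256, 40, 40))
```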
In the regression pathway, accurate feature alignment is crucial for precise object localization. To this end, a convolution applied to the shared features predicts an offset field and a modulation mask, which are fed into a dynamic deformable convolution module to realize DynamicAlignment on reg_feat. The offsets and mask adapt to the content of the detected features, adjusting the sampling positions and weights of the convolution to achieve the effect of customized convolutional filters.
$\mathrm{aligned\_features} = \mathrm{DynamicAlignment}(\mathrm{reg\_feat})$
DynamicAlignment employs dynamically computed offsets to adjust the convolutional filters, enhancing the accuracy of bounding box predictions.
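One possible realization of DynamicAlignment is a modulated deformable convolution, as sketched below with torchvision's deform_conv2d; the kernel size and the way offsets and masks are predicted are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DynamicAlignment(nn.Module):
    """Sketch of offset- and mask-guided alignment on the regression branch."""

    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        # Predict per-location sampling offsets (2 per kernel tap) and a modulation mask.
        self.offset = nn.Conv2d(channels, 2 * k * k, 3, padding=1)
        self.mask = nn.Conv2d(channels, k * k, 3, padding=1)
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)

    def forward(self, reg_feat):
        offset = self.offset(reg_feat)
        mask = torch.sigmoid(self.mask(reg_feat))
        return deform_conv2d(reg_feat, offset, self.weight,
                             padding=self.k // 2, mask=mask)

aligned = DynamicAlignment(256)(torch.randn(1, 256, 40, 40))
```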
During the inference phase, the system efficiently merges and decodes bounding boxes and class probabilities using an advanced concatenation and decoding mechanism to ensure high detection accuracy:
$y = \mathrm{Concatenate}\big(\mathrm{Decode}(\mathrm{dbox}),\ \sigma(\mathrm{cls})\big)$
Here, the Decode function translates the encoded bounding box coordinates into actual geometrical shapes on the image and the sigmoid function normalizes the output probabilities between 0 and 1, rendering them interpretable as confidence scores.
In summary, DTAH improves task decoupling and spatial adaptability by dynamically allocating features and modulating alignment across detection branches. Compared with traditional methods like YOLOv5 and RetinaNet [35], which use fixed receptive fields and separate task branches, this design significantly enhances the model’s dynamic processing ability. It enables the model to quickly adapt to different task requirements, actively avoid ineffective regions, and focus on key parts of the target in complex environments, thereby improving its performance in challenging scenarios such as foggy conditions.

3.6. Frequency-Adaptive Dilated Convolution (FADC)

To improve feature extraction under visibility-degraded conditions, we introduce the Frequency-Adaptive Dilated Convolution (FADC) module [36]. Unlike conventional dilated convolution with fixed configurations, FADC adaptively adjusts receptive field size and kernel behavior based on the frequency characteristics of input features. This design is particularly beneficial in UAV-based IoT applications, where fog often suppresses spatial detail and disrupts semantic consistency.
Adaptive Kernel (AdaKern) performs preliminary processing on initial features, particularly within heterogeneous IoT environments. By decomposing the convolutional kernel into components tailored to specific frequency ranges, it enables more refined feature extraction. This allows the network to dynamically adjust the weights of the convolutional kernel based on the frequency content of the features, thereby enhancing the accuracy of feature representation:
$K_{\mathrm{adapt}} = \mathrm{Decompose}\big(K,\ \alpha(F_{\mathrm{in}}),\ \beta(F_{\mathrm{in}})\big),$
where $K$ is the original convolution kernel, $\alpha$ and $\beta$ adjust the decomposition based on the identified frequency content, and $K_{\mathrm{adapt}}$ is the adapted kernel used for the final convolution.
The Frequency Selection (FreqSelect) component generates attention weights through AdaKern and applies them across multiple scales of the feature map. This allows the network to emphasize important frequency components at different resolutions, dynamically guiding subsequent convolution operations in foggy environments:
$F_{\mathrm{freq}} = \sum_{k=1}^{K} K_{\mathrm{adapt},k} \ast F_{\mathrm{in}},$
where $K_{\mathrm{adapt},k}$ denotes the convolution kernel generated by the AdaKern module at a specific scale, $\ast$ denotes convolution, and $F_{\mathrm{freq}}$ represents the frequency-enhanced feature map. This process aids in identifying and amplifying crucial spatial details that might be obscured by fog.
The Adaptive Dilation Rate (AdaDR) adjusts the dilation rate to optimize the receptive field for different spatial complexities encountered in foggy scenarios based on the frequency characteristics identified by FreqSelect. This adaptive dilation mechanism leverages F freq to apply attention weights across different dilation rates, enabling the network to focus on fine details when high-frequency content is present and to expand its receptive field for low-frequency content:
$d = \mathrm{AdaptiveFunction}(F_{\mathrm{freq}}),$
$F_{\mathrm{dilated}} = \mathrm{DilatedConv}(F_{\mathrm{freq}},\ d),$
where $d$ is the dynamically adjusted dilation rate and $F_{\mathrm{dilated}}$ is the dilated feature map.
The structure of the complete FADC module is illustrated in Figure 10. By combining frequency-guided dilation and adaptive kernel modulation, FADC enhances spatial representation under fog, improving object localization and recognition in UAV-based IoT deployments.
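The core adaptive-dilation idea can be conveyed with the simplified sketch below, which estimates a crude high-frequency cue and blends convolutions with different dilation rates accordingly; it is a didactic approximation under our own assumptions, not the full FADC implementation of [36].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleAdaptiveDilation(nn.Module):
    """Didactic sketch of frequency-guided dilation selection (not the full FADC of [36])."""

    def __init__(self, channels, rates=(1, 2, 3)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )
        self.select = nn.Linear(1, len(rates))   # maps the frequency cue to per-rate weights

    def forward(self, x):
        # Crude high-frequency cue: residual energy after removing a local average.
        low = F.avg_pool2d(x, 3, stride=1, padding=1)
        hf_cue = (x - low).abs().mean(dim=(1, 2, 3)).unsqueeze(1)     # (B, 1)
        weights = torch.softmax(self.select(hf_cue), dim=1)          # (B, num_rates)
        # Blend the dilated convolutions according to the learned per-rate weights.
        return sum(w.view(-1, 1, 1, 1) * conv(x)
                   for w, conv in zip(weights.unbind(dim=1), self.convs))

out = SimpleAdaptiveDilation(64)(torch.randn(1, 64, 80, 80))
```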

3.7. Embedded Deployment and Optimization

Although personal computers and servers provide strong training and inference capabilities, their large size, high power consumption, and lack of portability limit their suitability for UAV-based foggy detection tasks that require lightweight platforms and real-time processing. Cloud-based solutions can support large-scale data processing, but their inherent communication latency poses clear limitations in delay-sensitive scenarios, such as emergency monitoring and target tracking in UAV–IoT applications.
As summarized in Figure 11, the proposed framework follows a complete pipeline from multi-source foggy dataset construction, network training, and ablation analysis to cross-dataset evaluation and final embedded deployment. In this subsection, we focus on the last stage and describe how Fog-UAVNet is deployed and evaluated on a representative edge device.
In this study, we use an NVIDIA Jetson Nano as a representative embedded platform to evaluate the on-device runtime behavior of Fog-UAVNet under realistic resource constraints. The Jetson Nano features a compact form factor, low power consumption, and a moderate yet efficient compute capability, making it a suitable testbed for edge-side visual analysis in resource-limited environments. As illustrated in Figure 11, the overall workflow starts from constructing multi-source foggy UAV datasets and training Fog-UAVNet on a high-performance computing platform. During training, we initialize the network with pretrained backbone weights to accelerate convergence and improve detection accuracy under fog degradation. After the training and validation on the workstation are completed, the optimized Fog-UAVNet model is exported and deployed onto the Jetson Nano for inference benchmarking.
During deployment, we adopt the TensorRT inference engine to fuse and simplify the network graph, achieving operator-level acceleration and a practical trade-off between accuracy and speed. On the Jetson Nano, we then measure key runtime indicators such as frames per second (FPS) and memory usage to quantify the feasibility of running Fog-UAVNet on embedded hardware. This evaluation strategy balances model performance against hardware constraints and demonstrates that Fog-UAVNet can, in principle, satisfy the real-time requirements of typical foggy UAV scenarios on low-power edge devices.
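The deployment path can be summarized by the sketch below: the trained model is first exported to ONNX and then converted to a TensorRT engine on the device. The file names, input resolution, and the use of FP16 are illustrative assumptions, and the stand-in module only keeps the snippet self-contained.

```python
import torch
import torch.nn as nn

# Stand-in module so the snippet runs as-is; in practice this is the trained Fog-UAVNet.
model = nn.Conv2d(3, 16, kernel_size=3).eval()
dummy = torch.randn(1, 3, 640, 640)   # assumed input resolution

# Step 1: export to ONNX as the intermediate format for TensorRT.
torch.onnx.export(model, dummy, "fog_uavnet.onnx",
                  input_names=["images"], output_names=["preds"], opset_version=12)

# Step 2 (on the Jetson Nano): build a TensorRT engine from the ONNX graph,
# e.g. with the trtexec tool shipped with TensorRT:
#   trtexec --onnx=fog_uavnet.onnx --saveEngine=fog_uavnet.trt --fp16
```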

4. Results

4.1. Ablation Studies

This section presents ablation studies to quantify the individual contributions of the Frequency-Adaptive Dilated Convolution (FADC), Spatial-Edge Feature Fusion Module (SEFFM), and Dynamic Task-Aligned Head (DTAH) within the Fog-UAVNet architecture. Experiments are performed on two fog-specific datasets: VOC-Foggy, generated from PASCAL VOC with synthetic fog, provides a controlled environment for module-wise evaluation; VisDrone-Fog, simulating realistic urban fog scenarios, assesses generalization under complex conditions. As illustrated in Figure 12, this evaluation framework enables a systematic analysis of detection performance and validates the practical relevance of Fog-UAVNet in IoT applications such as smart city surveillance and traffic monitoring.

4.1.1. Ablation Studies on VOC-Foggy

To evaluate the impact of each module on model performance, we conducted ablation experiments on the VOC-Foggy dataset. By incrementally adding each module (FADC, SEFFM, DTAH), we observed its effect on average precision (AP), mean average precision at an IoU threshold of 0.5 (mAP50), frames per second (FPS), the number of parameters, and GFLOPs. The results are shown in Table 2. Introducing the FADC module resulted in a slight increase in mAP from 85.3% to 85.5%, with parameters reduced to 10.96 × 10⁷, GFLOPs increasing to 32.3, and FPS dropping to 400.0, indicating that FADC enhances feature extraction by adapting convolutional kernel dilation rates to varying fog densities. The addition of the SEFFM module further improved mAP to 85.8%, with parameters slightly increasing to 11.13 × 10⁷, GFLOPs increasing to 28.9, and FPS maintained at a relatively high level of 434.7. The SEFFM module efficiently fused spatial and edge information, enhancing feature representation, particularly in low-contrast environments, while maintaining computational efficiency. Incorporating the DTAH module boosted mAP to 86.2%, with FPS rising to 500.0 and parameters reduced to 8.87 × 10⁷, although GFLOPs increased to 33.0. The DTAH module dynamically aligned tasks, optimizing feature fusion and task allocation strategies, resulting in notable improvements in detection performance and computational efficiency; its dynamic resource allocation is particularly effective in multi-object detection scenarios. When combining the SEFFM and DTAH modules, mAP further improved to 86.4%, with parameters slightly increasing to 8.88 × 10⁷, GFLOPs rising to 33.5, and FPS dropping to 384.6. Finally, integrating all modules into the Fog-UAVNet model achieved the highest mAP of 86.6%, with parameters at 8.87 × 10⁷, GFLOPs at 33.0, and FPS maintained at 370.3. Despite the increased computational complexity, the performance gain was substantial, demonstrating the synergistic effect of the combined modules.

4.1.2. Ablation Studies on VisDrone-Fog

We conducted similar ablation studies on the VisDrone-Fog dataset to evaluate the modules’ effects under realistic UAV foggy scenes. The results are presented in Table 3. Introducing the FADC module improved mAP to 37.5%, with parameters slightly increasing to 11.16 × 10⁷, GFLOPs rising to 28.0, and FPS decreasing to 131.6; its adaptability to varying fog densities enhanced feature extraction accuracy, which is crucial for maintaining high detection performance in foggy environments. Adding the SEFFM module raised mAP to 37.3%, with parameters slightly increasing to 11.13 × 10⁷, GFLOPs slightly increasing to 28.9, and FPS decreasing to 116.3. The SEFFM module’s efficient fusion of edge and spatial features improved feature representation, particularly against low-contrast and complex backgrounds, while maintaining computational efficiency. Incorporating the DTAH module substantially increased mAP to 38.7%, with parameters reduced to 8.87 × 10⁷, GFLOPs rising to 33.0, and FPS increasing to 128.2. The DTAH module dynamically aligned tasks, optimizing feature fusion and task allocation strategies, resulting in notable improvements in detection performance and computational efficiency. When all modules were integrated, the Fog-UAVNet model achieved the highest mAP of 39.8%, with parameters at 8.87 × 10⁷, GFLOPs at 33.0, and FPS maintained at 131.6. This shows that despite the increased computational complexity, the synergistic effect of the combined modules significantly enhanced overall performance, demonstrating Fog-UAVNet’s robustness and efficacy under challenging foggy conditions.
Through ablation studies on the VOC-Foggy and VisDrone-Fog datasets, we validated the critical roles of the FADC, SEFFM, and DTAH modules in enhancing detection performance. The FADC module improved adaptability to varying fog densities, enhancing feature extraction accuracy. The SEFFM module effectively fused edge and spatial features, improving detection accuracy in low-contrast environments while maintaining computational efficiency. The DTAH module dynamically aligned tasks, optimizing resource allocation and improving both detection performance and computational efficiency. Integrating all modules, Fog-UAVNet demonstrated superior performance both in UAV foggy scenarios and under foggy conditions in the VOC dataset, with notable improvements in mAP, parameters, GFLOPs, and FPS. These enhancements make Fog-UAVNet particularly suitable for IoT applications such as smart city surveillance and traffic monitoring, showcasing broad application potential and significant practical value.

4.2. Comparison with State-of-the-Art Methods

4.2.1. Comparison on Synthetic Dataset

To demonstrate the efficacy of the Fog-UAVNet model, we conducted extensive comparisons with other state-of-the-art detection models on the VOC-FOG-test dataset, a synthetic dataset tailored for evaluating performance in foggy conditions. The comparison aimed to highlight the robustness and accuracy of our model under challenging visual impairments caused by fog. The results, as summarized in Table 4, show that Fog-UAVNet consistently outperforms other models across most categories, achieving the highest mean average precision (mAP) of 86.60%. This performance underscores the effectiveness of our model in handling adverse weather conditions in UAV applications, particularly beneficial for IoT scenarios where reliable visual data acquisition is crucial. Fog-UAVNet achieved the highest scores in detecting bicycles, cars, and motorbikes, with precision rates of 87.92%, 89.62%, and 86.23%, respectively. These results are significant as they demonstrate the model’s superior ability to discern finer details and maintain detection accuracy despite the presence of fog. Notably, Fog-UAVNet also ranked second in detecting persons and buses, with scores of 84.91% and 84.95%, closely following the top-performing model in these categories. The table also reveals interesting insights into the performance variations among different training strategies and datasets. Models trained on the VOC-Norm dataset, which consists of clean images, generally performed worse under foggy conditions compared to those trained directly on VOC-FOG, highlighting the importance of domain-specific training for adverse weather conditions. Additionally, models employing dehazing strategies like DCP-YOLOXs and FFA-YOLOXs showed improved performance over basic models trained on clean images but still fell short compared to our Fog-UAVNet. This suggests that while dehazing can aid in improving visibility, integrating comprehensive multi-task learning strategies directly tuned for fog conditions, as implemented in Fog-UAVNet, provides a more effective solution. Overall, the comparison validates the advanced capabilities of Fog-UAVNet in navigating the complexities of foggy environments, making it a highly suitable choice for deployment in IoT frameworks that demand high reliability and accuracy, such as in smart city surveillance and disaster management operations.

4.2.2. Comparison on Real-World Datasets

We further validated the performance of our Fog-UAVNet by comparing it with several leading methods on two challenging real-world foggy datasets: the RTTS dataset and the Foggy Driving Dataset. On the RTTS dataset, which contains a wide variety of real-world foggy conditions, Fog-UAVNet demonstrated superior performance across all categories. Detailed results presented in Table 5 show that Fog-UAVNet achieved the highest mean Average Precision (mAP) compared to ten other state-of-the-art detection systems. This performance underscores the effectiveness of Fog-UAVNet’s specialized modules, which are specifically designed to handle the complexities introduced by foggy conditions. Moreover, evaluations using the Foggy Driving Dataset, illustrated in Table 6, consistently show that Fog-UAVNet outperforms competing models. This dataset, focusing on vehicular and pedestrian scenarios in fog, is particularly demanding due to the critical need for high accuracy in real-time applications. The success of Fog-UAVNet on this dataset can be attributed to its innovative Dynamic Task-Aligned Head (DTAH) and Spatial-Edge Feature Fusion Module (SEFFM). These modules enable the model to effectively learn and adapt to varying degrees of visibility and contrast, which are common in adverse weather conditions. The integration of these features allows Fog-UAVNet not only to detect but also to accurately classify objects under challenging visual impairments. These rigorous tests confirm that Fog-UAVNet is not only theoretically sound but also practically effective, making it highly suitable for deployment in IoT scenarios where reliable and accurate object detection is critical, such as in smart city surveillance, traffic monitoring, and disaster response operations.

4.2.3. Comparison on Foggy Dataset

To further validate the effectiveness of Fog-UAVNet, we conducted comparative evaluations on the self-collected Foggy Dataset. As shown in Table 7, Fog-UAVNet consistently achieved the highest mean Average Precision (mAP) across all categories, surpassing state-of-the-art detectors such as YOLOv8s, Cascade R-CNN, and Faster R-CNN.
Specifically, Fog-UAVNet excelled in detecting key object categories like pedestrians, bicycles, and cars, with mAP scores of 96.50%, 90.00%, and 96.25%, respectively. These results underscore the model’s robustness in foggy conditions, where visibility is often severely compromised. The consistent high performance across different object categories indicates that Fog-UAVNet effectively adapts to varying object sizes and shapes, maintaining detection accuracy even in challenging scenarios.
The comparison with other models highlights the limitations of traditional detection systems in foggy conditions. For instance, while models like Cascade R-CNN and Faster R-CNN perform reasonably well under clear conditions, their effectiveness diminishes significantly in the presence of fog.
Overall, the Foggy Dataset results provide further empirical support for Fog-UAVNet’s effectiveness in low-visibility environments, reinforcing its utility for UAV-based object detection in adverse weather scenarios.

4.3. Edge Deployment Evaluation

To evaluate the deployment practicality of the proposed Fog-UAVNet model on embedded platforms, we tested the trained Fog-UAVNet on the NVIDIA Jetson Nano. With its compact size and low power consumption, the Jetson Nano is an ideal platform for resource-constrained environments. The unoptimized model achieved an inference speed of 19.4 FPS on the Jetson Nano, which is suitable for basic applications but insufficient for tasks with higher real-time requirements.
To enhance performance, TensorRT optimization was applied to Fog-UAVNet. As a result, the model achieved a more than threefold increase in frame rate—reaching 78.5 FPS—which is sufficient for real-time UAV onboard applications. Table 8 presents a detailed comparison before and after optimization.

4.4. Object Detection on VisDrone and VisDrone-Fog Datasets

In this section, we analyze the performance of our proposed model, Fog-UAVNet, against state-of-the-art (SOTA) generic object detectors on both the VisDrone and VisDrone-Fog datasets, as outlined in Table 9. The VisDrone dataset, recognized for its complexity and diversity, was chosen for its representation of real-world urban scenarios, providing a rigorous benchmark to assess the generalizability of UAV detection models. The VisDrone-Fog dataset, which we generated by introducing artificial fog into the VisDrone dataset, offers a unique and challenging environment to test our model’s performance in adverse weather conditions. This dataset is critical for demonstrating the effectiveness of Fog-UAVNet under visibility-impairing conditions, where traditional models often struggle. Fog-UAVNet demonstrated strong performance on the VisDrone dataset, achieving a mean Average Precision (mAP) of 39.6%. While the LWUAVDet-S model slightly outperformed Fog-UAVNet with an mAP of 40.3%, our model remained highly competitive, particularly excelling in challenging categories such as bicycles, cars, trucks, and motorcycles. This highlights Fog-UAVNet’s adaptability to various object sizes and types, making it well-suited for diverse urban surveillance applications.
The true strength of Fog-UAVNet, however, is evident on the VisDrone-Fog dataset, where it achieved an mAP of 39.8%, outperforming all other models. This superior performance underscores the effectiveness of the three key modules integrated into Fog-UAVNet. The Frequency-Adaptive Dilated Convolution (FADC) module enhances feature extraction by dynamically adjusting dilation rates based on fog density, ensuring optimal detection accuracy even in low-visibility conditions. The Spatial-Edge Feature Fusion Module (SEFFM) further improves the model’s ability to detect objects in low-contrast environments by effectively combining spatial and edge information. Meanwhile, the Dynamic Task-Aligned Head (DTAH) ensures efficient task alignment and resource allocation, which is crucial for handling the complexities of multi-object detection in UAV operations. Fog-UAVNet was designed to excel in foggy conditions, yet its performance in normal conditions remains highly competitive. This dual capability makes Fog-UAVNet a versatile and reliable tool for a wide range of UAV applications, from smart city surveillance to traffic monitoring. Moreover, the model’s efficient use of computational resources, achieving high detection accuracy without excessive parameters or computational power, demonstrates its practicality for real-time deployments on resource-constrained UAV platforms.
In summary, the results from both the VisDrone and VisDrone-Fog datasets clearly demonstrate that Fog-UAVNet consistently performs at or above the level of existing SOTA models, particularly excelling in challenging foggy environments. This capability positions Fog-UAVNet as a highly effective solution for real-world IoT deployments, where reliability and accuracy are critical.

4.5. Visual Analysis of UAV Object Detection in Foggy Conditions

To demonstrate the effectiveness of our Fog-UAVNet model for UAV object detection under foggy conditions, this section presents a detailed visual comparison among the original images without fog, the original images with fog, the detection results of the baseline YOLOv8 model, and the detection results of the Fog-UAVNet model. These visualizations illustrate the accuracy improvements provided by Fog-UAVNet, highlighting its superior capability to differentiate and accurately identify objects in dense fog. The side-by-side comparisons show that Fog-UAVNet significantly reduces false positives and more accurately identifies the true boundaries and categories of objects, as illustrated in Figure 13.
These visual representations substantiate not only the enhanced detection capabilities of Fog-UAVNet but also its potential applicability in real-world UAV monitoring scenarios under adverse weather conditions. The clarity and precision of the detections underscore the model’s practical value for improving surveillance, traffic monitoring, and disaster response in foggy environments.

5. Discussion

This study addresses three long-standing challenges in UAV target detection under foggy conditions, namely insufficient accuracy, limited robustness, and constraints on lightweight design, by proposing a comprehensive solution. The overall experimental results demonstrate that Fog-UAVNet achieves a practical balance among detection accuracy, generalization capability, and model compactness in foggy UAV scenarios.
At the module level, Fog-UAVNet addresses weaknesses in perception under foggy conditions through a complementary design of its components. The SEFFM module enhances structural cues in low-contrast, blurred regions by integrating Sobel gradients and MaxPool spatial context. The FADC module dynamically adjusts dilation and convolution based on frequency characteristics, strengthening feature robustness against uneven fog density. Meanwhile, the DTAH module aligns classification and localization branches through task-aware decomposition, reducing inter-task interference. This synergy of structural fusion, frequency awareness, and task-level alignment makes Fog-UAVNet well-adapted to foggy UAV scenarios. Ablation results show that variations in mAP across benchmarks stem primarily from scene complexity rather than model failure. On the simpler VOC-Foggy dataset with limited categories and controlled degradation, the model performs well, while on the more challenging VisDrone-Fog—featuring real-world dense targets, occlusion, and heavy fog—Fog-UAVNet still achieves consistent gains over baselines. These results confirm the model’s effectiveness and generalizability in complex fog-degraded environments.
Nonetheless, this study has several limitations that point to directions for future work. The current experiments do not cover performance under varying fog densities (light, moderate, and heavy) or in scenarios involving multiple concurrent degradation factors such as noise. Adapting the model to such complex environments will help enhance its generalizability. Furthermore, although the architecture emphasizes lightweight design, future efforts should focus on evaluating and optimizing its inference power consumption, particularly for deployment on resource-constrained UAV platforms.

6. Conclusions

This work presented Fog-UAVNet, a lightweight and fog-resilient object detection framework tailored for UAV-based IoT applications in low-visibility environments. By incorporating Frequency-Adaptive Dilated Convolution (FADC), Spatial-Edge Feature Fusion Module (SEFFM), and Dynamic Task-Aligned Head (DTAH), the model enhances feature representation and task-specific alignment under fog-induced degradation. To support thorough evaluation, we introduced VisDrone-Fog, a synthetic benchmark with controlled visibility levels, and constructed Foggy Datasets from real-world foggy scenes, offering diverse testing conditions.
Extensive experiments conducted on six benchmarks—VisDrone, VisDrone-Fog, VOC-Foggy, VOC-Norm, RTTS, and Foggy Datasets—demonstrate that Fog-UAVNet consistently surpasses existing state-of-the-art detectors in both accuracy and efficiency, validating its robustness and generalization. Importantly, the model maintains a compact architecture, making it suitable for deployment on resource-constrained UAV platforms.
Future work will focus on architectural refinement and domain-adaptive learning to enhance resilience under extreme fog and broaden applicability to cross-modal UAV perception in complex environmental conditions.

Author Contributions

Q.D. conceived the study. Q.D. and T.H. drafted the manuscript. T.H. performed algorithm development and data analysis. G.W. collected and organized the data. G.W. and L.S. provided technical input. Y.L. assisted with dataset preparation and experimental implementation. G.W. and T.H. handled language editing. All authors have read and agreed to the published version of the manuscript. Q.D. and T.H. contributed equally to this work. We sincerely appreciate the support from our research team and collaborators, whose valuable insights and discussions have significantly contributed to this work.

Funding

This research was funded by the National Key R&D Program of China (Grant No. 2019YFB1405302) and the National Natural Science Foundation of China (Grant No. 61872072).

Data Availability Statement

The datasets used in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Fog-UAVNet: Fog-aware Unmanned Aerial Vehicle Network
UAV: Unmanned Aerial Vehicle
IoT: Internet of Things
SEFFM: Spatial-Edge Feature Fusion Module
FADC: Frequency-Adaptive Dilated Convolution
DTAH: Dynamic Task-Aligned Head
SPPF: Spatial Pyramid Pooling-Fast
FreqSelect: Frequency Selection
AdaDR: Adaptive Dilation Rate
AdaKern: Adaptive Kernel
CNN: Convolutional Neural Network
AP: Average Precision
mAP: Mean Average Precision
mAP50: Mean Average Precision at IoU = 0.5
IoU: Intersection over Union
FPS: Frames Per Second
VOC: Visual Object Classes
VOC-Foggy: Foggy Visual Object Classes (foggy PASCAL VOC)
VOC-Norm: Normal Visual Object Classes (clear PASCAL VOC)
VisDrone-Fog: Foggy VisDrone Dataset
RTTS: Real-world Task-driven Testing Set
SOTA: State of the Art

References

  1. Buzcu, B.; Özgün, M.; Gürcan, G.; Aydoğan, R. Fully Autonomous Trustworthy Unmanned Aerial Vehicle Teamwork: A Research Guideline Using Level 2 Blockchain. IEEE Robot. Autom. Mag. 2024, 31, 78–88. [Google Scholar] [CrossRef]
  2. Chandran, I.; Vipin, K. Multi-UAV networks for disaster monitoring: Challenges and opportunities from a network perspective. Drone Syst. Appl. 2024, 12, 1–28. [Google Scholar] [CrossRef]
  3. Piao, S.; Li, N.; Miao, Z. Research on UAV Vision Landing Target Detection and Obstacle Avoidance Algorithm. In Proceedings of the 2023 42nd Chinese Control Conference (CCC), Tianjin, China, 24–26 July 2023; pp. 4443–4448. [Google Scholar] [CrossRef]
  4. Yang, X.; Mi, M.B.; Yuan, Y.; Wang, X.; Tan, R.T. Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation. In Proceedings of the Computer Vision—ACCV 2022: 16th Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 303–318. [Google Scholar] [CrossRef]
  5. Tahir, N.U.A.; Zhang, Z.; Asim, M.; Chen, J.; ELAffendi, M. Object Detection in Autonomous Vehicles under Adverse Weather: A Review of Traditional and Deep Learning Approaches. Algorithms 2024, 17, 103. [Google Scholar] [CrossRef]
  6. Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef]
  7. Liu, S.; Bai, Y. Multiple UAVs collaborative traffic monitoring with intention-based communication. Comput. Commun. 2023, 210, 116–129. [Google Scholar] [CrossRef]
  8. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  9. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  11. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  12. Dong, Q.; Han, T.; Wu, G.; Qiao, B.; Sun, L. RSNet: Compact-Align Detection Head Embedded Lightweight Network for Small Object Detection in Remote Sensing. Remote Sens. 2025, 17, 1965. [Google Scholar] [CrossRef]
  13. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  14. Dong, Q.; Han, T.; Wu, G.; Sun, L.; Huang, M.; Zhang, F. Industrial device-aided data collection for real-time rail defect detection via a lightweight network. Eng. Appl. Artif. Intell. 2025, 161, 112102. [Google Scholar] [CrossRef]
  15. Chen, Z.; Yang, J.; Chen, L.; Li, F.; Feng, Z.; Jia, L.; Li, P. RailVoxelDet: A Lightweight 3-D Object Detection Method for Railway Transportation Driven by Onboard LiDAR Data. IEEE Internet Things J. 2025, 12, 37175–37189. [Google Scholar] [CrossRef]
  16. Keshun, Y.; Yingkui, G.; Yanghui, L.; Yajun, W. A novel physical constraint-guided quadratic neural networks for interpretable bearing fault diagnosis under zero-fault sample. Nondestruct. Test. Eval. 2025, 1–31. [Google Scholar] [CrossRef]
  17. Butilă, E.V.; Boboc, R.G. Urban Traffic Monitoring and Analysis Using Unmanned Aerial Vehicles (UAVs): A Systematic Literature Review. Remote Sens. 2022, 14, 620. [Google Scholar] [CrossRef]
  18. Ye, T.; Zhang, Y.; Jiang, M.; Chen, L.; Liu, Y.; Chen, S.; Chen, E. Perceiving and Modeling Density for Image Dehazing. In Computer Vision—ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part XIX; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; pp. 130–145. [Google Scholar]
  19. Guo, C.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image Dehazing Transformer with Transmission-Aware 3D Position Embedding. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5802–5810. [Google Scholar] [CrossRef]
  20. Song, Y.; He, Z.; Qian, H.; Du, X. Vision Transformers for Single Image Dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941. [Google Scholar] [CrossRef]
  21. Bai, H.; Pan, J.; Xiang, X.; Tang, J. Self-Guided Image Dehazing Using Progressive Feature Fusion. IEEE Trans. Image Process. 2022, 31, 1217–1229. [Google Scholar] [CrossRef] [PubMed]
  22. Qiu, Y.; Lu, Y.; Wang, Y.; Jiang, H. IDOD-YOLOV7: Image-Dehazing YOLOV7 for Object Detection in Low-Light Foggy Traffic Environments. Sensors 2023, 23, 1347. [Google Scholar] [CrossRef]
  23. Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Pomona, CA, USA, 24–28 October 2022. [Google Scholar]
  24. Wang, Y.; Yan, X.; Zhang, K.; Gong, L.; Xie, H.; Wang, F.L.; Wei, M. TogetherNet: Bridging Image Restoration and Object Detection Together via Dynamic Enhancement Learning. Comput. Graph. Forum 2022, 41, 465–476. [Google Scholar] [CrossRef]
  25. Hnewa, M.; Radha, H. Multiscale Domain Adaptive Yolo For Cross-Domain Object Detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021. [Google Scholar] [CrossRef]
  26. Min, X.; Zhou, W.; Hu, R.; Wu, Y.; Pang, Y.; Yi, J. LWUAVDet: A Lightweight UAV Object Detection Network on Edge Devices. IEEE Internet Things J. 2024, 11, 24013–24023. [Google Scholar] [CrossRef]
  27. Xiong, X.; He, M.; Li, T.; Zheng, G.; Xu, W.; Fan, X.; Zhang, Y. Adaptive Feature Fusion and Improved Attention Mechanism-Based Small Object Detection for UAV Target Tracking. IEEE Internet Things J. 2024, 11, 21239–21249. [Google Scholar] [CrossRef]
  28. Bisio, I.; Haleem, H.; Garibotto, C.; Lavagetto, F.; Sciarrone, A. Performance Evaluation and Analysis of Drone-Based Vehicle Detection Techniques From Deep Learning Perspective. IEEE Internet Things J. 2022, 9, 10920–10935. [Google Scholar] [CrossRef]
  29. Liu, K.; Peng, D.; Li, T. Multimodal Remote Sensing Object Detection Based on Prior-Enhanced Mixture-of-Experts Fusion Network. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–14. [Google Scholar] [CrossRef]
  30. Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7380–7399. [Google Scholar] [CrossRef]
  31. Everingham, M.; Gool, L.V.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  32. Sindagi, V.A.; Oza, P.; Yasarla, R.; Patel, V.M. Prior-Based Domain Adaptive Object Detection for Hazy and Rainy Conditions. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 763–780. [Google Scholar]
  33. Hahner, M.; Dai, D.; Sakaridis, C.; Zaech, J.N.; Gool, L.V. Semantic Understanding of Foggy Scenes with Purely Synthetic Data. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 3675–3681. [Google Scholar] [CrossRef]
  34. Wu, Y.; He, K. Group Normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar] [CrossRef]
  35. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef] [PubMed]
  36. Chen, L.; Gu, L.; Zheng, D.; Fu, Y. Frequency-Adaptive Dilated Convolution for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3414–3425. [Google Scholar]
  37. Zhao, S.; Fang, Y. DCP-YOLOv7: Dark channel prior image-dehazing YOLOv7 for vehicle detection in foggy scene. In Proceedings of the International Conference on Computer Vision and Pattern Analysis, Hangzhou, China, 31 March–2 April 2023. [Google Scholar]
  38. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-One Dehazing Network. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4780–4788. [Google Scholar] [CrossRef]
  39. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11908–11915. [Google Scholar]
  40. Li, C.; Wang, G.; Wang, B.; Liang, X.; Li, Z.; Chang, X. Dynamic Slimmable Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
Figure 1. Comparison of detection performance in dense fog. (a) Baseline result. (b) Fog-UAVNet output, showing improved localization and fewer missed detections under severe visibility degradation.
Figure 3. Representative examples from the four benchmark datasets (VOC-Foggy, VOC-Norm, RTTS, and Foggy Driving), illustrating the diversity of scenes, object categories, and fog densities considered in this study.
Figure 4. (a) Statistical analysis of the Foggy Datasets: number of instances per class, distribution of bounding box sizes, object position heat map, and bounding box aspect distribution. (b) Example images illustrating urban scene diversity under varied foggy conditions.
Figure 5. Workflow of Fog-UAVNet: a UAV-mounted sensor transmits visual input to the onboard unit, where key objects are detected and classified to support tasks such as tracking and anomaly detection, with results compiled into real-time reports.
Figure 6. The architecture of Fog-UAVNet, illustrating the integration of key components like the Frequency-Adaptive Dilated Convolution (FADC) module, Spatial-Edge Feature Fusion Module (SEFFM), and the Dynamic Task-Aligned Detection Head (DTAH). These modules form a unified pipeline tailored for robust UAV perception under foggy conditions.
Figure 7. Architecture of the Spatial-Edge Feature Fusion Module (SEFFM), which integrates edge cues via Sobel-based grouped convolution and spatial context via max pooling. The fused features are refined through stacked convolutions.
Figure 8. Illustration of the effect of concatenating the SobelConv and MaxPool branches in SEFFM. The SobelConv branch emphasizes edge structures, while the MaxPool branch preserves coarse spatial context; their concatenation, followed by two convolution layers, yields a fused feature map that provides more reliable cues for detection in foggy scenes.
Figure 9. Structure of the Dynamic Task-Aligned Head (DTAH). The diagram illustrates the layered architecture consisting of shared convolutional layers, dynamic convolution modules, and the GroupNorm components.
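To complement Figure 9, the following simplified sketch illustrates the head structure at a high level: a shared stack of Conv + GroupNorm layers feeds separate classification and regression branches, each modulated by a lightweight spatial gate. The gate is only a stand-in for the dynamic convolution modules, and all names and layer sizes here are illustrative rather than the exact DTAH implementation.

```python
import torch
import torch.nn as nn


class TaskAlignedHead(nn.Module):
    """Simplified sketch of a task-aligned detection head: shared
    Conv + GroupNorm layers feed two branches, and a lightweight gating map
    (a stand-in for the dynamic convolution in DTAH) modulates each branch so
    that classification and box regression attend to different cues."""

    def __init__(self, in_ch: int, num_classes: int, num_groups: int = 16):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
            nn.GroupNorm(num_groups, in_ch),
            nn.SiLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
            nn.GroupNorm(num_groups, in_ch),
            nn.SiLU(),
        )
        # Per-task spatial gates derived from the shared features.
        self.cls_gate = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())
        self.reg_gate = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())
        self.cls_head = nn.Conv2d(in_ch, num_classes, 1)
        self.reg_head = nn.Conv2d(in_ch, 4, 1)

    def forward(self, x: torch.Tensor):
        feat = self.shared(x)
        cls_logits = self.cls_head(feat * self.cls_gate(feat))
        box_deltas = self.reg_head(feat * self.reg_gate(feat))
        return cls_logits, box_deltas
```

The number of GroupNorm groups is assumed to divide the channel count (for example, 16 groups for a 256-channel feature map).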
Figure 10. Architecture of the proposed Frequency-Adaptive Dilated Convolution (FADC) module, comprising three components: FreqSelect (frequency estimation), AdaDR (dilation adjustment), and AdaKern (kernel decomposition).
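As a rough illustration of the frequency-adaptive principle in Figure 10, the sketch below estimates high-frequency content with a Laplacian response (a simplified stand-in for FreqSelect) and uses the score to softly weight parallel convolutions with different dilation rates (a simplified stand-in for AdaDR); the AdaKern kernel decomposition is omitted. The class name, dilation candidates, and scoring rule are assumptions made for this sketch, not the exact FADC implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrequencyAdaptiveConv(nn.Module):
    """Loose sketch of frequency-adaptive dilation: a Laplacian-based score
    estimates how much high-frequency detail survives the fog, and the score
    weights a blend of parallel convolutions with different dilation rates
    (low detail favors larger dilation / context, high detail favors small
    dilation / fine structure)."""

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
             for d in dilations]
        )
        # Maps the scalar frequency score to soft weights over the branches.
        self.select = nn.Sequential(
            nn.Linear(1, len(dilations)), nn.Softmax(dim=-1)
        )
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("laplacian", lap.view(1, 1, 3, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-image high-frequency score from the channel-averaged Laplacian.
        gray = x.mean(dim=1, keepdim=True)
        hf = F.conv2d(gray, self.laplacian, padding=1).abs()
        score = hf.mean(dim=(1, 2, 3)).unsqueeze(-1)               # (B, 1)
        weights = self.select(score)                               # (B, K)
        outs = torch.stack([b(x) for b in self.branches], dim=1)   # (B, K, C, H, W)
        # Soft blend of the dilated branches, weighted by the frequency score.
        return (weights[:, :, None, None, None] * outs).sum(dim=1)
```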
Figure 11. Overall workflow of the proposed Fog-UAVNet, from multi-source foggy dataset preparation and network training (with ablation and cross-dataset evaluation on a workstation) to embedded deployment and inference on an NVIDIA Jetson Nano edge device.
Figure 12. Ablation results on VOC-Foggy and VisDrone-Fog, showing mAP, FPS, parameters, and GFLOPS for the baseline and its incremental variants with FADC, SEFFM, DTAH, and the full Fog-UAVNet.
Figure 13. Visualization results on the VisDrone-fog dataset. The first column shows the original images without fog, the second column shows the original images with fog, the third column displays the detection results of the YOLOv8 model, and the fourth column presents the detection results from our Fog-UAVNet.
Table 1. Statistics of image counts and object instances per class across benchmark fog-related datasets used in this study.
Dataset | Images | Person | Bicycle | Car | Bus | Motor | Truck | Total
VOC-Foggy-train | 8111 | 13,256 | 1064 | 3267 | 822 | 1052 | - | 19,561
VOC-Foggy-val | 2734 | 4528 | 377 | 1201 | 213 | 325 | - | 6604
RTTS | 4322 | 7950 | 534 | 18,413 | 1838 | 862 | - | 29,577
Foggy Datasets | 8700 | 18,389 | 1227 | 29,521 | 2070 | 1606 | 2888 | 55,701
Foggy Driving | 101 | 181 | 12 | 295 | 21 | 16 | - | 525
Table 2. Ablation Study on VOC-Foggy Dataset: Performance Metrics with Incremental Integration of FADC, SEFFM, and DTAH Improvements.
Model | Person AP (%) | Bicycle AP (%) | Car AP (%) | Motorbike AP (%) | Bus AP (%) | mAP50 (%) | FPS | Parameters (×10⁷) | GFLOPS (G)
baseline | 83.8 | 87.1 | 88.2 | 85.3 | 82.3 | 85.3 | 588.1 | 11.12 | 28.4
+FADC | 84.0 | 86.8 | 88.2 | 84.9 | 83.2 | 85.5 | 400.0 | 10.96 | 32.3
+SEFFM | 84.2 | 87.3 | 88.7 | 85.7 | 83.2 | 85.8 | 434.7 | 11.13 | 28.9
+DTAH | 84.9 | 86.9 | 89.4 | 85.2 | 84.7 | 86.2 | 500.0 | 8.87 | 33.0
+SEFFM + DTAH | 84.3 | 87.9 | 89.3 | 86.2 | 84.7 | 86.4 | 384.6 | 8.88 | 33.5
Fog-UAVNet | 84.9 | 88.5 | 89.6 | 85.9 | 84.9 | 86.6 | 370.3 | 8.87 | 33.0
Table 3. Ablation Study on VisDrone-Fog Dataset: Performance and Computational Metrics.
Model | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning Tricycle | Bus | Motor | mAP50 (%) | FPS | Params (×10⁷) | GFLOPS (G)
baseline | 40.9 | 30.6 | 10.7 | 78.2 | 41.8 | 33.9 | 23.8 | 13.4 | 53.8 | 41.9 | 36.9 | 107.5 | 11.12 | 28.5
+FADC | 40.2 | 31.9 | 10.9 | 78.4 | 40.9 | 34.8 | 25.9 | 14.0 | 56.6 | 41.8 | 37.5 | 131.6 | 11.16 | 28.0
+SEFFM | 40.5 | 31.5 | 11.1 | 78.7 | 43.0 | 32.9 | 25.8 | 13.8 | 53.9 | 41.8 | 37.3 | 116.3 | 11.13 | 28.9
+DTAH | 42.9 | 33.0 | 12.1 | 79.2 | 43.0 | 36.3 | 25.5 | 13.6 | 57.7 | 44.0 | 38.7 | 128.2 | 8.87 | 33.0
+SEFFM + DTAH | 42.5 | 33.3 | 11.9 | 79.3 | 44.1 | 35.7 | 25.8 | 14.1 | 58.3 | 43.3 | 38.8 | 117.6 | 8.88 | 33.5
Fog-UAVNet | 43.6 | 33.9 | 12.3 | 79.9 | 44.6 | 37.0 | 28.5 | 14.7 | 58.7 | 44.4 | 39.8 | 131.6 | 8.87 | 33.0
Table 4. Comparison of Fog-UAVNet with state-of-the-art detection models on the VOC-FOG-test dataset. * denotes that the model was trained with clean images from the VOC-FOG dataset. Red and blue colors are used to indicate the 1st and 2nd ranks, respectively.
Method | Train Dataset | Type | Person | Bicycle | Car | Motor | Bus | mAP50
YOLOv8s | VOC-FOG | Baseline | 83.81 | 87.13 | 86.20 | 84.01 | 82.32 | 85.30
YOLOv8s * | VOC-Norm | Baseline | 79.97 | 67.95 | 74.75 | 58.62 | 83.12 | 72.88
DCP-YOLO * [37] | VOC-Norm | Dehaze | 81.58 | 78.80 | 79.75 | 78.51 | 85.64 | 80.86
AOD-Net * [38] | VOC-Norm | Dehaze | 81.26 | 73.56 | 76.98 | 71.18 | 83.08 | 77.21
Semi-YOLO * | VOC-Norm | Dehaze | 81.15 | 76.94 | 76.92 | 72.89 | 84.88 | 78.56
FFA-Net * [39] | VOC-Norm | Dehaze | 78.30 | 70.31 | 69.97 | 68.80 | 80.72 | 73.62
MS-DAYOLO [25] | VOC-FOG | Dehaze | 82.52 | 75.62 | 86.93 | 81.92 | 90.10 | 83.42
DS-Net [40] | VOC-FOG | Multi-task | 72.44 | 60.47 | 81.27 | 53.85 | 61.43 | 65.89
IA-YOLO [23] | VOC-FOG | I-adaptive | 70.98 | 61.98 | 70.98 | 57.93 | 61.98 | 64.77
TogetherNet [24] | VOC-FOG | Multi-task | 87.62 | 78.19 | 85.92 | 84.03 | 88.75 | 85.90
Fog-UAVNet | VOC-FOG | Multi-task | 84.91 | 87.92 | 89.62 | 86.23 | 84.95 | 86.60
Table 5. Comparison of Fog-UAVNet Performance Against Other Methods on the RTTS Dataset. Red and blue denote the best and second-best results in each column, respectively.
Method | Person | Bicycle | Car | Motor | Bus | mAP50
YOLOv8s | 80.88 | 64.27 | 56.39 | 52.74 | 29.60 | 56.74
AOD-Net | 77.26 | 62.43 | 56.70 | 53.45 | 30.01 | 55.83
GCA-YOLO | 79.12 | 67.10 | 56.41 | 58.68 | 34.17 | 58.64
DCP-YOLO | 78.69 | 67.99 | 55.50 | 57.57 | 33.27 | 58.32
FFA-Net | 77.12 | 66.51 | 64.23 | 40.64 | 23.71 | 52.64
MS-DAYOLO | 74.22 | 44.13 | 70.91 | 38.64 | 36.54 | 57.39
DS-Net | 68.81 | 18.02 | 46.13 | 15.44 | 15.44 | 32.71
IA-YOLO | 67.25 | 35.84 | 42.65 | 17.64 | 37.04 | 37.55
TogetherNet | 82.70 | 57.27 | 75.31 | 55.40 | 37.04 | 61.55
Ours | 79.60 | 56.90 | 79.90 | 51.30 | 56.94 | 63.21
Table 6. Comparison of Fog-UAVNet performance against other methods on the Foggy Driving dataset. Red and blue denote the best and second-best results in each column, respectively.
Methods | Person | Bicycle | Car | Motor | Bus | mAP50
YOLOv8s | 24.36 | 27.25 | 55.08 | 8.04 | 44.79 | 33.06
AOD-Net | 26.15 | 33.72 | 56.95 | 6.44 | 34.89 | 32.51
GCA-YOLO | 27.96 | 34.11 | 56.36 | 6.77 | 34.21 | 33.77
DCP-YOLO | 22.64 | 11.07 | 56.37 | 4.66 | 36.03 | 31.56
FFA-Net | 19.22 | 21.40 | 50.64 | 3.69 | 43.85 | 28.74
MS-DAYOLO | 21.52 | 34.57 | 57.41 | 18.20 | 46.75 | 34.89
DS-Net | 26.74 | 20.54 | 54.16 | 7.14 | 36.11 | 29.74
IA-YOLO | 20.24 | 19.04 | 50.67 | 14.87 | 22.97 | 31.55
TogetherNet | 30.48 | 30.47 | 57.87 | 14.87 | 40.88 | 36.75
Ours | 35.69 | 35.26 | 59.15 | 16.17 | 45.88 | 38.41
Table 7. Comparison of Fog-UAVNet performance against other methods on the Foggy dataset. Red and blue denote the best and second-best results in each column, respectively.
Methods | Person | Bicycle | Car | Bus | Motor | Truck | mAP50
Faster R-CNN | 92.96 | 92.41 | 59.36 | 91.42 | 34.81 | 80.57 | 79.74
Grid R-CNN | 92.64 | 11.07 | 56.37 | 4.66 | 30.63 | 31.56 | 26.56
Dynamic R-CNN | 89.22 | 79.70 | 51.64 | 69.49 | 62.35 | 58.94 | 79.74
YOLOv5 | 93.74 | 74.94 | 92.16 | 87.54 | 87.91 | 83.04 | 86.60
YOLOv6 | 93.54 | 73.34 | 93.27 | 85.57 | 86.97 | 80.95 | 85.23
YOLOv3-tiny | 90.24 | 78.04 | 89.67 | 88.57 | 86.97 | 80.95 | 85.93
Cascade R-CNN | 91.15 | 92.90 | 66.95 | 92.61 | 79.89 | 84.21 | 83.27
Ours | 96.50 | 90.00 | 96.25 | 91.90 | 92.40 | 83.10 | 91.90
Table 8. Jetson Nano Inference Comparison.
Model | Parameters (×10⁷) | FPS (Jetson Nano) | GFLOPs
baseline | 11.12 | 19.4 | 28.5
Fog-UAVNet | 8.87 | 78.5 | 33.0
Table 9. Comparison with the SOTA Generic Object Detectors on VisDrone and VisDrone-fog Test Sets. Note: Red and Blue denote the top-ranked and second-ranked results, respectively, across each evaluation metric.
Model | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning Tricycle | Bus | Motor | mAP50
VisDrone Test Set
SSD | 18.7% | 9.0% | 7.3% | 63.2% | 29.9% | 33.1% | 11.7% | 11.1% | 49.8% | 19.1% | 25.3%
Faster R-CNN | 12.5% | 8.1% | 5.8% | 44.1% | 20.4% | 19.0% | 8.5% | 8.73% | 43.8% | 16.8% | 19.9%
Cascade-RCNN | 22.2% | 14.8% | 7.6% | 54.6% | 31.5% | 21.0% | 14.8% | 8.6% | 33.0% | 21.4% | 23.2%
CenterNet | 22.9% | 11.6% | 7.5% | 61.9% | 19.4% | 24.7% | 13.1% | 14.2% | 42.6% | 18.8% | 23.7%
YOLOv8 | 40.4% | 32.2% | 11.6% | 78.7% | 43.5% | 35.6% | 25.6% | 15.3% | 53.5% | 42.3% | 37.7%
FiFoNet | 45.1% | 35.4% | 12.8% | 79.1% | 40.1% | 34.3% | 22.0% | 12.1% | 48.6% | 42.4% | 37.3%
YOLOv10 | 43.4% | 25.1% | 8.65% | 75.7% | 38.1% | 27.7% | 20.5% | 11.3% | 45.5% | 36.9% | 36.6%
LWUAVDet-S | 44.0% | 34.6% | 14.2% | 78.9% | 45.6% | 35.8% | 25.7% | 15.8% | 56.7% | 43.4% | 40.3%
LW-YOLOv8 | 39.3% | 31.9% | 9.2% | 78.5% | 42.1% | 30.8% | 23.8% | 15.0% | 50.6% | 41.4% | 36.3%
Fog-UAVNet | 43.3% | 33.8% | 14.8% | 79.6% | 45.5% | 36.5% | 26.5% | 15.2% | 56.1% | 43.9% | 39.6%
VisDrone-fog Test Set
SSD | 16.2% | 8.0% | 6.2% | 54.3% | 26.7% | 29.5% | 10.4% | 10.0% | 43.1% | 17.2% | 22.1%
Faster R-CNN | 10.8% | 6.8% | 4.9% | 37.9% | 17.6% | 16.4% | 7.3% | 7.5% | 37.6% | 15.4% | 17.2%
Cascade-RCNN | 19.2% | 12.8% | 6.6% | 46.9% | 27.1% | 18.1% | 13.2% | 7.6% | 28.4% | 18.9% | 20.2%
CenterNet | 20.1% | 10.3% | 6.1% | 53.7% | 16.8% | 21.4% | 11.3% | 12.4% | 36.9% | 16.1% | 20.5%
YOLOv7-tiny | 34.2% | 31.2% | 8.2% | 66.5% | 32.9% | 26.0% | 16.6% | 8.7% | 41.2% | 37.1% | 30.2%
YOLOv8 | 40.5% | 31.5% | 11.1% | 78.5% | 43.4% | 32.1% | 25.8% | 13.5% | 53.2% | 41.9% | 36.8%
FiFoNet | 44.9% | 35.8% | 11.8% | 78.3% | 41.1% | 33.9% | 22.4% | 12.3% | 48.7% | 42.6% | 36.9%
YOLOv10 | 39.1% | 20.3% | 6.9% | 70.5% | 34.2% | 23.4% | 17.6% | 9.2% | 40.8% | 32.0% | 33.5%
LWUAVDet-S | 33.5% | 27.2% | 7.8% | 67.6% | 35.4% | 26.7% | 20.7% | 13.5% | 42.5% | 34.8% | 33.1%
LW-YOLOv8 | 30.5% | 22.2% | 6.9% | 57.1% | 30.2% | 23.7% | 15.1% | 9.5% | 41.8% | 32.8% | 29.9%
Fog-UAVNet | 43.6% | 33.9% | 12.3% | 79.9% | 44.6% | 37.0% | 28.5% | 14.7% | 58.7% | 44.1% | 39.8%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
