When Deep Learning Meets Geometry for Air-to-Ground Perception on Drones

A special issue of Drones (ISSN 2504-446X).

Deadline for manuscript submissions: closed (30 August 2024) | Viewed by 25,341

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Dr. Dongdong Li
Guest Editor
Automatic Target Recognition (ATR) Key Lab, College of Electronic Science and Engineering, National University of Defense Technology (NUDT), Changsha 410073, China
Interests: developing air-to-ground sensing algorithms for drones (e.g., classification, detection, tracking, localization and mapping)

Prof. Dr. Gongjian Wen
Guest Editor
College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
Interests: optimization algorithms; computer vision; image processing; machine vision; pattern recognition; object recognition; feature extraction; 3D reconstruction; pattern matching; image recognition

Dr. Yangliu Kuai
Guest Editor
College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Interests: visual tracking and machine learning

Dr. Runmin Cong
Guest Editor
The Key Laboratory of Machine Intelligence & System Control, School of Control Science and Engineering, Shandong University, Jinan 250100, China
Interests: visual saliency detection and segmentation

Special Issue Information

Dear Colleagues,

Drones are drawing increasing attention as data acquisition and aerial perception platforms for many civilian and military applications. Owing to the success of deep learning in computer vision, drone images are processed in an end-to-end manner to achieve air-to-ground perception (e.g., detection, tracking, recognition). Generally, however, drone images are processed as generic images, ignoring the geometric metadata (e.g., location, altitude, pose) generated by the drone's onboard GPS and IMU sensors. Inspired by Simultaneous Localization and Mapping (SLAM), which utilizes both image data and geometric data, this Special Issue aims at boosting the performance of deep learning-based air-to-ground perception for drones with geometric metadata. We welcome submissions that provide the community with the most recent advancements on this topic.

Topics of interest include, but are not limited to, the following:

  • Air-to-ground object detection for drones
  • Air-to-ground single/multiple object tracking for drones
  • Air-to-ground object localization for drones
  • Air-to-ground monocular visual SLAM for drones

Dr. Dongdong Li
Prof. Dr. Gongjian Wen
Dr. Yangliu Kuai
Dr. Runmin Cong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Drones is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object tracking
  • object localization
  • visual SLAM
  • embedded vision on drones

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)


Research

18 pages, 22836 KiB  
Article
Drone-Based Visible–Thermal Object Detection with Transformers and Prompt Tuning
by Rui Chen, Dongdong Li, Zhinan Gao, Yangliu Kuai and Chengyuan Wang
Drones 2024, 8(9), 451; https://doi.org/10.3390/drones8090451 - 1 Sep 2024
Viewed by 885
Abstract
The use of unmanned aerial vehicles (UAVs) for visible–thermal object detection has emerged as a powerful technique to improve accuracy and resilience in challenging contexts, including dim lighting and severe weather conditions. However, most existing research relies on Convolutional Neural Network (CNN) frameworks, limiting the application of the Transformer’s attention mechanism to mere fusion modules and neglecting its potential for comprehensive global feature modeling. In response to this limitation, this study introduces an innovative dual-modal object detection framework called Visual Prompt multi-modal Detection (VIP-Det) that harnesses the Transformer architecture as the primary feature extractor and integrates vision prompts for refined feature fusion. Our approach begins with the training of a single-modal baseline model to solidify robust model representations, which is then refined through fine-tuning that incorporates additional modal data and prompts. Tests on the DroneVehicle dataset show that our algorithm achieves remarkable accuracy, outperforming comparable Transformer-based methods. These findings indicate that our proposed methodology marks a significant advancement in the realm of UAV-based object detection, holding significant promise for enhancing autonomous surveillance and monitoring capabilities in varied and challenging environments. Full article
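For readers unfamiliar with prompt tuning, its core mechanism can be sketched in a few lines of PyTorch: learnable prompt tokens are prepended to the patch-token sequence of a frozen Transformer encoder, so only the prompts (and any light fusion head) are updated when the second modality is added. This is an illustrative simplification, not the authors' VIP-Det code; the module name, token count, and additive fusion are assumptions.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Minimal sketch of vision-prompt tuning: a frozen Transformer
    encoder whose input sequence is extended with learnable prompts."""
    def __init__(self, encoder: nn.TransformerEncoder, embed_dim=256, n_prompts=8):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the pre-trained backbone
            p.requires_grad = False
        # learnable prompt tokens, shared across images
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)

    def forward(self, tokens):                # tokens: (B, N, C) patch embeddings
        b = tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompts, tokens], dim=1)       # prepend prompts
        x = self.encoder(x)
        return x[:, self.prompts.size(0):]            # drop prompts, keep patch tokens

# toy usage: prompt both modalities, then fuse by simple addition
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
enc = PromptedEncoder(nn.TransformerEncoder(layer, num_layers=2))
rgb, tir = torch.randn(2, 196, 256), torch.randn(2, 196, 256)
fused = enc(rgb) + enc(tir)                           # (2, 196, 256)
```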

17 pages, 7252 KiB  
Article
A Large Scale Benchmark of Person Re-Identification
by Qingze Yin and Guodong Ding
Drones 2024, 8(7), 279; https://doi.org/10.3390/drones8070279 - 21 Jun 2024
Viewed by 1200
Abstract
Unmanned aerial vehicle (UAV)-based Person Re-Identification (ReID) is a novel field. Person ReID is the task of identifying individuals across different frames or views, often in surveillance or security contexts. UAVs enhance person ReID through their mobility, real-time monitoring, and ability to access challenging areas, despite privacy, legal, and technical challenges. To facilitate the advancement and adaptation of existing person ReID approaches to UAV scenarios, this paper introduces a baseline along with two datasets, i.e., LSMS and LSMS-UAV. The datasets have the following key features: (1) LSMS: raw videos captured by a network of 29 cameras deployed across complex outdoor environments; LSMS-UAV: captured by a single UAV. (2) LSMS: videos spanning both winter and spring, encompassing diverse weather conditions and various lighting conditions throughout different times of the day. (3) LSMS: the largest number of annotated identities, comprising 7730 identities and 286,695 bounding boxes; LSMS-UAV: 500 identities and 2000 bounding boxes. Comprehensive experiments demonstrate LSMS’s excellent capability in addressing the domain gap issue when facing complex and unknown environments. The LSMS-UAV dataset verifies that UAV data has strong transferability to traditional camera-based data. Full article
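As background, person ReID benchmarks such as LSMS are typically evaluated with rank-1 accuracy and mean average precision (mAP) computed from learned embeddings. The sketch below shows this standard protocol in simplified form; it omits the usual same-camera filtering and is not the paper's exact evaluation code.

```python
import numpy as np

def reid_rank1_map(query_feats, query_ids, gallery_feats, gallery_ids):
    """Sketch of standard ReID evaluation: rank-1 accuracy and mAP
    from L2-normalized embeddings (single-camera simplification)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                            # cosine similarity, (Q, G)
    rank1, aps = 0.0, []
    for i in range(len(q)):
        order = np.argsort(-sims[i])          # gallery sorted by similarity
        matches = (gallery_ids[order] == query_ids[i]).astype(float)
        rank1 += matches[0]                   # top-1 hit or miss
        hits = np.cumsum(matches)
        precision = hits / (np.arange(len(matches)) + 1)
        if matches.sum() > 0:                 # average precision per query
            aps.append((precision * matches).sum() / matches.sum())
    return rank1 / len(q), float(np.mean(aps))
```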

17 pages, 8147 KiB  
Article
A Dynamic Visual SLAM System Incorporating Object Tracking for UAVs
by Minglei Li, Jia Li, Yanan Cao and Guangyong Chen
Drones 2024, 8(6), 222; https://doi.org/10.3390/drones8060222 - 29 May 2024
Viewed by 1490
Abstract
The capability of unmanned aerial vehicles (UAVs) to capture and utilize dynamic object information assumes critical significance for decision making and scene understanding. This paper presents a method for UAV relative positioning and target tracking based on a visual simultaneous localization and mapping (SLAM) framework. By integrating an object detection neural network into the SLAM framework, this method can detect moving objects and effectively reconstruct the 3D map of the environment from image sequences. For multiple object tracking tasks, we combine the region matching of semantic detection boxes and the point matching of the optical flow method to perform dynamic object association. This joint association strategy can prevent tracking loss due to the small proportion of the object in the whole image sequence. To address the problem of lacking scale information in the visual SLAM system, we recover the altitude data based on a RANSAC-based plane estimation approach. The proposed method is tested on both the self-created UAV dataset and the KITTI dataset to evaluate its performance. The results demonstrate the robustness and effectiveness of the solution in facilitating UAV flights. Full article
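The scale-recovery step described above rests on a standard technique, RANSAC plane fitting. A minimal NumPy sketch is shown below; the threshold values and the camera-at-origin convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ransac_ground_plane(points, n_iters=200, inlier_thresh=0.05):
    """Sketch of RANSAC plane fitting as used for altitude recovery:
    fit n.x + d = 0 to mapped 3D points; the distance from the camera
    origin to the plane then gives the (up-to-scale) altitude."""
    best_inliers, best_plane = 0, None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(normal) < 1e-9:
            continue                          # degenerate (collinear) sample
        normal /= np.linalg.norm(normal)
        d = -normal @ p0
        dist = np.abs(points @ normal + d)    # point-to-plane distances
        inliers = (dist < inlier_thresh).sum()
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane                         # camera at origin: altitude = |d|

# with a known true altitude h (e.g., from a barometer or GPS), the
# metric scale factor of the SLAM map is s = h / |d|
```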

16 pages, 2457 KiB  
Article
Event-Assisted Object Tracking on High-Speed Drones in Harsh Illumination Environment
by Yuqi Han, Xiaohang Yu, Heng Luan and Jinli Suo
Drones 2024, 8(1), 22; https://doi.org/10.3390/drones8010022 - 16 Jan 2024
Cited by 2 | Viewed by 2656
Abstract
Drones have been used in a variety of scenarios, such as atmospheric monitoring, fire rescue, agricultural irrigation, etc., in which accurate environmental perception is of crucial importance for both decision making and control. Among drone sensors, the RGB camera is indispensable for capturing rich visual information for vehicle navigation but encounters a grand challenge in high-dynamic-range scenes, which frequently occur in real applications. Specifically, the recorded frames suffer from underexposure and overexposure simultaneously and degenerate the successive vision tasks. To solve the problem, we take object tracking as an example and leverage the superior response of event cameras over a large intensity range to propose an event-assisted object tracking algorithm that can achieve reliable tracking under large intensity variations. Specifically, we propose to pursue feature matching from dense event signals and, based on this, to (i) design a U-Net-based image enhancement algorithm to balance RGB intensity with the help of neighboring frames in the time domain and then (ii) construct a dual-input tracking model to track the moving objects from intensity-balanced RGB video and event sequences. The proposed approach is comprehensively validated in both simulation and real experiments. Full article
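A common way to feed event signals into such a dual-input model is to accumulate them into polarity-count frames. The sketch below illustrates this generic representation; it is an assumed preprocessing step, not necessarily the authors' exact event encoding.

```python
import numpy as np

def events_to_frame(events, height, width, t0, t1):
    """Sketch: accumulate an event stream of (x, y, t, polarity) tuples
    into a two-channel count image over the window [t0, t1), a common
    input representation for event-assisted trackers."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t0 <= t < t1:
            channel = 0 if p > 0 else 1       # positive / negative polarity
            frame[channel, int(y), int(x)] += 1.0
    if frame.max() > 0:
        frame /= frame.max()                  # normalize for the network input
    return frame
```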

23 pages, 13428 KiB  
Article
Typical Fault Detection on Drone Images of Transmission Lines Based on Lightweight Structure and Feature-Balanced Network
by Gujing Han, Ruijie Wang, Qiwei Yuan, Liu Zhao, Saidian Li, Ming Zhang, Min He and Liang Qin
Drones 2023, 7(10), 638; https://doi.org/10.3390/drones7100638 - 17 Oct 2023
Cited by 2 | Viewed by 2573
Abstract
Given the difficulty of detecting faults at various scales in aerial images from transmission-line UAV inspections under limited computing resources, this paper proposes the TD-YOLO algorithm (YOLO for transmission detection). Firstly, the Ghost module is used to lighten the model’s feature extraction network and prediction network, significantly reducing the number of parameters and the computational effort of the model. Secondly, the spatial and channel attention mechanism scSE (concurrent spatial and channel squeeze and channel excitation) is embedded into the feature fusion network, together with PA-Net (path aggregation network), to construct a feature-balanced network that uses channel weights and spatial weights as guides to balance multi-level and multi-scale features, significantly improving detection when multiple targets of different categories coexist. Thirdly, a loss function, NWD (normalized Wasserstein distance), is introduced to enhance the detection of small targets, and the fusion ratio of NWD and CIoU is optimized to further compensate for the accuracy loss caused by the lightweight design. Finally, a typical fault dataset of transmission lines is built from UAV inspection images for training and testing. The experimental results show that, compared to YOLOv7-Tiny, the proposed TD-YOLO reduces the number of parameters by 74.79% and the computation by 66.92% while increasing the mAP (mean average precision) by 0.71%. TD-YOLO was deployed on a Jetson Xavier NX to simulate the UAV inspection process and ran at 23.5 FPS with good results. This study offers a reference for power line inspection and a possible way to deploy edge computing devices on unmanned aerial vehicles. Full article
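The NWD term referred to above follows the normalized Gaussian Wasserstein distance for tiny objects, where each box (cx, cy, w, h) is modelled as a 2D Gaussian. A hedged PyTorch sketch of the metric and its fusion with CIoU is given below; the constant c and the 0.5 fusion ratio are placeholders, since the paper optimizes the ratio and c is dataset-dependent.

```python
import torch

def nwd(box1, box2, c=12.8):
    """Sketch of the normalized Wasserstein distance (NWD) between
    axis-aligned boxes (cx, cy, w, h), each modelled as a 2D Gaussian
    N(center, diag(w^2/4, h^2/4)); c is a dataset-dependent constant."""
    g1 = torch.stack([box1[..., 0], box1[..., 1],
                      box1[..., 2] / 2, box1[..., 3] / 2], dim=-1)
    g2 = torch.stack([box2[..., 0], box2[..., 1],
                      box2[..., 2] / 2, box2[..., 3] / 2], dim=-1)
    w2 = torch.linalg.norm(g1 - g2, dim=-1)   # 2nd-order Wasserstein distance
    return torch.exp(-w2 / c)                 # in (0, 1], like an IoU score

def fused_box_loss(box1, box2, ciou_loss, ratio=0.5):
    """Blend of CIoU and NWD losses; 0.5 is only a placeholder for the
    fusion ratio that the paper tunes."""
    return ratio * ciou_loss + (1 - ratio) * (1 - nwd(box1, box2))
```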

27 pages, 9078 KiB  
Article
Relative Localization within a Quadcopter Unmanned Aerial Vehicle Swarm Based on Airborne Monocular Vision
by Xiaokun Si, Guozhen Xu, Mingxing Ke, Haiyan Zhang, Kaixiang Tong and Feng Qi
Drones 2023, 7(10), 612; https://doi.org/10.3390/drones7100612 - 29 Sep 2023
Cited by 5 | Viewed by 2670
Abstract
Swarming is one of the important trends in the development of small multi-rotor UAVs. The stable operation of UAV swarms and air-to-ground cooperative operations depend on precise relative position information within the swarm. Existing relative localization solutions mainly rely on passively received external information or expensive and complex sensors, which are not applicable to the application scenarios of small-rotor UAV swarms. Therefore, we develop a relative localization solution based on airborne monocular sensing data to directly realize real-time relative localization among UAVs. First, we apply the lightweight YOLOv8-pose target detection algorithm to realize the real-time detection of quadcopter UAVs and their rotor motors. Then, to improve the computational efficiency, we make full use of the geometric properties of UAVs to derive a more adaptable algorithm for solving the P3P problem. In order to solve the multi-solution problem when less than four motors are detected, we analytically propose a positive solution determination scheme based on reasonable attitude information. We also introduce the maximum weight of the motor-detection confidence into the calculation of relative localization position to further improve the accuracy. Finally, we conducted simulations and practical experiments on an experimental UAV. The experimental results verify the feasibility of the proposed scheme, in which the performance of the core algorithm is significantly improved over the classical algorithm. Our research provides viable solutions to free UAV swarms from external information dependence, apply them to complex environments, improve autonomous collaboration, and reduce costs. Full article
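Since the rotor motors of a quadcopter form a known, symmetric pattern in the body frame, the relative pose can be recovered with a perspective-n-point solver once the motors are detected in the image. The OpenCV sketch below illustrates the idea; the arm length, pixel coordinates, and camera intrinsics are made-up example values, and the authors derive their own adapted P3P solver rather than calling OpenCV.

```python
import numpy as np
import cv2

# Sketch: recover a neighbour quadcopter's relative pose from the pixel
# positions of its four rotor motors, exploiting the known motor layout.
L = 0.12                                      # assumed arm half-length [m]
object_pts = np.array([[ L,  L, 0],           # motor positions in the
                       [ L, -L, 0],           # neighbour's body frame
                       [-L,  L, 0],
                       [-L, -L, 0]], dtype=np.float32)
image_pts = np.array([[412, 310], [455, 327],
                      [382, 342], [428, 361]], dtype=np.float32)  # detector output
K = np.array([[800, 0, 320],                  # illustrative camera intrinsics
              [0, 800, 240],
              [0, 0, 1]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None,
                              flags=cv2.SOLVEPNP_P3P)   # 3 points + 1 to disambiguate
if ok:
    print("relative position [m]:", tvec.ravel())
```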

15 pages, 2345 KiB  
Article
Drone Based RGBT Tracking with Dual-Feature Aggregation Network
by Zhinan Gao, Dongdong Li, Gongjian Wen, Yangliu Kuai and Rui Chen
Drones 2023, 7(9), 585; https://doi.org/10.3390/drones7090585 - 18 Sep 2023
Cited by 5 | Viewed by 1771
Abstract
In the field of drone-based object tracking, the infrared modality can improve the robustness of the tracker in scenes with severe illumination change and occlusion and expand the applicable scenarios of drone object tracking. Inspired by the great achievements of the Transformer structure in RGB object tracking, we design a dual-modality object tracking network based on the Transformer. To better address the problem of visible-infrared information fusion, we propose a Dual-Feature Aggregation Network that utilizes attention mechanisms in both spatial and channel dimensions to aggregate heterogeneous modality feature information. The proposed algorithm achieves better performance than mainstream algorithms on the drone-based dual-modality object tracking dataset VTUAV. Additionally, the algorithm is lightweight and can be easily deployed and executed on a drone edge computing platform. Experiments prove the effectiveness of the algorithm, which effectively extends the scope of drone object tracking. Full article
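To make the fusion idea concrete, the sketch below implements channel- and spatial-attention aggregation of RGB and thermal feature maps in PyTorch. It is an illustrative module in the spirit of the paper, not the authors' Dual-Feature Aggregation Network itself.

```python
import torch
import torch.nn as nn

class DualFeatureAggregation(nn.Module):
    """Sketch of attention-based RGB/thermal fusion: channel attention
    re-weights each modality's channels, spatial attention selects
    where each modality is trusted."""
    def __init__(self, channels=256, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 7, padding=3), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_rgb, f_tir):          # (B, C, H, W) each
        x = torch.cat([f_rgb, f_tir], dim=1)
        x = x * self.channel_mlp(x)           # channel attention
        x = x * self.spatial(x)               # spatial attention
        return self.proj(x)                   # fused (B, C, H, W)

fused = DualFeatureAggregation()(torch.randn(1, 256, 20, 20),
                                 torch.randn(1, 256, 20, 20))
```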

25 pages, 5912 KiB  
Article
Implicit Neural Mapping for a Data Closed-Loop Unmanned Aerial Vehicle Pose-Estimation Algorithm in a Vision-Only Landing System
by Xiaoxiong Liu, Changze Li, Xinlong Xu, Nan Yang and Bin Qin
Drones 2023, 7(8), 529; https://doi.org/10.3390/drones7080529 - 12 Aug 2023
Cited by 3 | Viewed by 1768
Abstract
Due to the low cost, interference resistance, and concealment of vision sensors, vision-based landing systems have received a lot of research attention. However, vision sensors are only used as auxiliary components in visual landing systems because of their limited accuracy. To solve the problem of inaccurate position estimation with vision-only sensors during landing, a novel data closed-loop pose-estimation algorithm with an implicit neural map is proposed. First, we propose a method to estimate the UAV pose based on the runway’s line features, using a flexible coarse-to-fine runway-line-detection method. Then, we propose a mapping and localization method based on the neural radiance field (NeRF), which provides a continuous representation and can correct the initial estimated pose well. Finally, we develop a closed-loop data annotation system based on a high-fidelity implicit map, which significantly improves annotation efficiency. The experimental results show that our proposed algorithm performs well in various scenarios and achieves state-of-the-art accuracy in pose estimation. Full article
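The pose-correction idea, rendering from an implicit map, comparing with the observation, and descending the photometric error, can be demonstrated on a toy problem. The sketch below replaces the NeRF with a tiny frozen MLP over the ground plane and refines only (tx, ty, tz, yaw) for a nadir-pointing camera, so it is a heavily simplified stand-in for the paper's method.

```python
import torch

torch.manual_seed(0)
mlp = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
for p in mlp.parameters():         # the MLP stands in for a trained implicit map
    p.requires_grad_(False)

f, H, W = 100.0, 32, 32            # focal length [px], image size
u, v = torch.meshgrid(torch.arange(W) - W / 2,
                      torch.arange(H) - H / 2, indexing="xy")
pix = torch.stack([u, v], -1).reshape(-1, 2)

def render(pose):                  # pose = (tx, ty, tz, yaw), nadir camera
    tx, ty, tz, yaw = pose
    ground = pix / f * tz          # where each pixel ray hits the ground plane
    c, s = torch.cos(yaw), torch.sin(yaw)
    rot = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    world = ground @ rot.T + torch.stack([tx, ty])
    return mlp(world).squeeze(-1)  # predicted intensity per pixel

with torch.no_grad():              # synthetic observation at the "true" pose
    observed = render(torch.tensor([1.0, -0.5, 10.0, 0.3]))

pose = torch.tensor([0.0, 0.0, 9.0, 0.0], requires_grad=True)  # coarse guess
opt = torch.optim.Adam([pose], lr=0.02)
for _ in range(500):               # photometric pose refinement
    opt.zero_grad()
    loss = ((render(pose) - observed) ** 2).mean()
    loss.backward()
    opt.step()
print(pose.detach())               # should move toward (1.0, -0.5, 10.0, 0.3)
```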

21 pages, 17339 KiB  
Article
TAN: A Transferable Adversarial Network for DNN-Based UAV SAR Automatic Target Recognition Models
by Meng Du, Yuxin Sun, Bing Sun, Zilong Wu, Lan Luo, Daping Bi and Mingyang Du
Drones 2023, 7(3), 205; https://doi.org/10.3390/drones7030205 - 16 Mar 2023
Cited by 1 | Viewed by 2019
Abstract
Recently, the unmanned aerial vehicle (UAV) synthetic aperture radar (SAR) has become a highly sought-after topic for its wide applications in target recognition, detection, and tracking. However, SAR automatic target recognition (ATR) models based on deep neural networks (DNN) are suffering from adversarial examples. Generally, non-cooperators rarely disclose any SAR-ATR model information, making adversarial attacks challenging. To tackle this issue, we propose a novel attack method called Transferable Adversarial Network (TAN). It can craft highly transferable adversarial examples in real time and attack SAR-ATR models without any prior knowledge, which is of great significance for real-world black-box attacks. The proposed method improves the transferability via a two-player game, in which we simultaneously train two encoder–decoder models: a generator that crafts malicious samples through a one-step forward mapping from original data, and an attenuator that weakens the effectiveness of malicious samples by capturing the most harmful deformations. Particularly, compared to traditional iterative methods, the encoder–decoder model can one-step map original samples to adversarial examples, thus enabling real-time attacks. Experimental results indicate that our approach achieves state-of-the-art transferability with acceptable adversarial perturbations and minimum time costs compared to existing attack methods, making real-time black-box attacks without any prior knowledge a reality. Full article
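The two-player training described above can be outlined as follows: a generator maps clean samples to perturbations in one forward pass, while an attenuator tries to undo the most harmful deformations, and the generator must fool a surrogate classifier even after attenuation. This is a schematic PyTorch sketch with toy networks and random stand-in data; the real method uses encoder-decoder architectures on SAR imagery and a more elaborate loss design.

```python
import torch
import torch.nn as nn

def conv_autoencoder():            # toy stand-in for an encoder-decoder
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1), nn.Tanh())

generator, attenuator = conv_autoencoder(), conv_autoencoder()
surrogate = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
for p in surrogate.parameters():   # a fixed surrogate ATR model
    p.requires_grad_(False)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_a = torch.optim.Adam(attenuator.parameters(), lr=1e-4)
ce, eps = nn.CrossEntropyLoss(), 8 / 255

x = torch.rand(4, 1, 64, 64)       # random stand-in "SAR chips"
y = torch.randint(0, 10, (4,))
for step in range(100):
    # generator turn: mislead the surrogate before AND after attenuation
    x_adv = (x + eps * generator(x)).clamp(0, 1)       # one-step attack
    x_att = (x_adv + eps * attenuator(x_adv)).clamp(0, 1)
    loss_g = -ce(surrogate(x_adv), y) - ce(surrogate(x_att), y)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # attenuator turn: weaken the (now fixed) adversarial examples
    x_adv = (x + eps * generator(x)).clamp(0, 1).detach()
    x_att = (x_adv + eps * attenuator(x_adv)).clamp(0, 1)
    loss_a = ce(surrogate(x_att), y)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
```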

18 pages, 41864 KiB  
Article
Special Vehicle Detection from UAV Perspective via YOLO-GNS Based Deep Learning Network
by Zifeng Qiu, Huihui Bai and Taoyi Chen
Drones 2023, 7(2), 117; https://doi.org/10.3390/drones7020117 - 8 Feb 2023
Cited by 32 | Viewed by 5082
Abstract
At present, many special vehicles are engaged in illegal activities such as illegal mining, oil and gas theft, the destruction of green spaces, and illegal construction, which have serious negative impacts on the environment and the economy. The illegal activities of these special vehicles are becoming increasingly rampant because of the limited number of inspectors and the high cost of surveillance. The development of drone remote sensing is playing an important role in enabling efficient and intelligent monitoring of special vehicles. Due to limited onboard computing resources, however, special vehicle object detection still faces challenges in practical applications. In order to balance detection accuracy and computational cost, we propose a novel algorithm named YOLO-GNS for special vehicle detection from the UAV perspective. Firstly, the Single Stage Headless (SSH) context structure is introduced to improve feature extraction and facilitate the detection of small or obscured objects. Meanwhile, the computational cost of the algorithm is reduced following GhostNet, by replacing complex convolutions with linear transforms based on simple operations. To illustrate the performance of the algorithm, thousands of images were collected across a variety of scenes and weather conditions, each containing special vehicles from a UAV view. Quantitative and comparative experiments have also been performed. Compared to other derivatives, the algorithm shows a 4.4% increase in average detection accuracy and a 1.6 FPS increase in detection frame rate. These improvements are considered useful for UAV applications, especially for special vehicle detection in a variety of scenarios. Full article
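The GhostNet substitution mentioned above replaces part of an ordinary convolution's output with cheap depthwise transforms of the remainder. A minimal PyTorch sketch follows; it illustrates the general Ghost module, not the YOLO-GNS implementation.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of the GhostNet idea: a cheap depthwise 'linear' transform
    generates half of the output channels from the other half produced
    by an ordinary convolution, cutting parameters and FLOPs roughly in half."""
    def __init__(self, in_ch, out_ch, kernel=1, cheap_kernel=3):
        super().__init__()
        primary = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(            # depthwise, low-cost transform
            nn.Conv2d(primary, out_ch - primary, cheap_kernel,
                      padding=cheap_kernel // 2, groups=primary, bias=False),
            nn.BatchNorm2d(out_ch - primary), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)   # (B, out_ch, H, W)

out = GhostModule(64, 128)(torch.randn(1, 64, 40, 40))  # -> (1, 128, 40, 40)
```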