Search Results (14)

Search Parameters:
Keywords = UAV object re-identification

21 pages, 1928 KB  
Article
A CNN-Transformer Hybrid Framework for Multi-Label Predator–Prey Detection in Agricultural Fields
by Yifan Lyu, Feiyu Lu, Xuaner Wang, Yakui Wang, Zihuan Wang, Yawen Zhu, Zhewei Wang and Min Dong
Sensors 2025, 25(15), 4719; https://doi.org/10.3390/s25154719 - 31 Jul 2025
Cited by 1 | Viewed by 881
Abstract
Accurate identification of predator–pest relationships is essential for implementing effective and sustainable biological control in agriculture. However, existing image-based methods struggle to recognize insect co-occurrence under complex field conditions, limiting their ecological applicability. To address this challenge, we propose a hybrid deep learning framework that integrates convolutional neural networks (CNNs) and Transformer architectures for multi-label recognition of predator–pest combinations. The model leverages a novel co-occurrence attention mechanism to capture semantic relationships between insect categories and employs a pairwise label matching loss to enhance ecological pairing accuracy. Evaluated on a field-constructed dataset of 5,037 images across eight categories, the model achieved an F1-score of 86.5%, mAP50 of 85.1%, and demonstrated strong generalization to unseen predator–pest pairs with an average F1-score of 79.6%. These results outperform several strong baselines, including ResNet-50, YOLOv8, and Vision Transformer. This work contributes a robust, interpretable approach for multi-object ecological detection and offers practical potential for deployment in smart farming systems, UAV-based monitoring, and precision pest management.
(This article belongs to the Special Issue Sensor and AI Technologies in Intelligent Agriculture: 2nd Edition)
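The abstract's co-occurrence attention idea can be pictured as propagating evidence between labels through a learned label-affinity matrix, so that detecting a prey species raises the scores of its likely predators. The sketch below is an illustrative simplification under assumed names (`CoOccurrenceAttention`, the affinity construction), not the authors' implementation:

```python
import torch
import torch.nn as nn

class CoOccurrenceAttention(nn.Module):
    """Toy co-occurrence attention: reweights per-class logits with a learned
    label-affinity matrix (illustrative sketch, not the paper's module)."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.label_embed = nn.Parameter(torch.randn(num_classes, dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        q = self.proj(feats)                      # (B, D) pooled trunk features
        logits = q @ self.label_embed.t()         # (B, C) raw per-class scores
        affinity = torch.softmax(
            self.label_embed @ self.label_embed.t(), dim=-1)  # (C, C) label prior
        return logits + logits @ affinity         # propagate evidence between labels
```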

27 pages, 23958 KB  
Article
Cross-Scene Multi-Object Tracking for Drones: Leveraging Meta-Learning and Onboard Parameters with the New MIDDTD
by Chenghang Wang, Xiaochun Shen, Zhaoxiang Zhang, Chengyang Tao and Yuelei Xu
Drones 2025, 9(5), 341; https://doi.org/10.3390/drones9050341 - 30 Apr 2025
Cited by 1 | Viewed by 1058 | Correction
Abstract
Multi-object tracking (MOT) is a key intermediate task in many practical applications and theoretical fields, facing significant challenges due to complex scenarios, particularly in the context of drone-based air-to-ground military operations. During drone flight, factors such as high-altitude environments, small target proportions, irregular target movement, and frequent occlusions complicate the multi-object tracking task. This paper proposes a cross-scene multi-object tracking (CST) method to address these challenges. Firstly, a lightweight object detection framework is proposed to optimize key sub-tasks by integrating multi-dimensional temporal and spatial information. Secondly, trajectory prediction is achieved through the implementation of Model-Agnostic Meta-Learning, enhancing adaptability to dynamic environments. Thirdly, re-identification is facilitated using Dempster–Shafer Theory, which effectively manages uncertainties in target recognition by incorporating aircraft state information. Finally, a novel dataset, termed the Multi-Information Drone Detection and Tracking Dataset (MIDDTD), is introduced, containing rich drone-related information and diverse scenes, thereby providing a solid foundation for the validation of cross-scene multi-object tracking algorithms. Experimental results demonstrate that the proposed method improves the IDF1 tracking metric by 1.92% compared to existing state-of-the-art methods, showcasing strong cross-scene adaptability, offering an effective solution for multi-object tracking from a drone’s perspective, and providing theoretical and technical support for related fields.
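To make the Dempster–Shafer step above concrete, here is a minimal, generic implementation of Dempster's rule of combination; the two mass functions (appearance evidence vs. aircraft-state evidence) and the hypothesis sets are illustrative assumptions, not taken from the paper:

```python
def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: fuse two basic belief assignments over a frame of
    discernment. Keys are frozensets of hypotheses; values are masses."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass assigned to contradictory pairs
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Illustrative fusion of appearance evidence with UAV-state (geometry) evidence:
appearance = {frozenset({"id_3"}): 0.7, frozenset({"id_3", "id_5"}): 0.3}
geometry = {frozenset({"id_5"}): 0.4, frozenset({"id_3", "id_5"}): 0.6}
print(dempster_combine(appearance, geometry))
```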

41 pages, 43778 KB  
Review
UAV (Unmanned Aerial Vehicle): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking
by Md. Mahfuzur Rahman, Sunzida Siddique, Marufa Kamal, Rakib Hossain Rifat and Kishor Datta Gupta
Algorithms 2024, 17(12), 594; https://doi.org/10.3390/a17120594 - 23 Dec 2024
Cited by 2 | Viewed by 3282
Abstract
Unmanned Aerial Vehicles (UAVs) have transformed the process of data collection and analysis in a variety of research disciplines, delivering unparalleled adaptability and efficacy. This paper presents a thorough examination of UAV datasets, emphasizing their wide range of applications and progress. UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos. These datasets can be categorized as either unimodal or multimodal, offering a wide range of detailed and comprehensive information. These datasets play a crucial role in disaster damage assessment, aerial surveillance, object recognition, and tracking. They facilitate the development of sophisticated models for tasks like semantic segmentation, pose estimation, vehicle re-identification, and gesture recognition. By leveraging UAV datasets, researchers can significantly enhance the capabilities of computer vision models, thereby advancing technology and improving our understanding of complex, dynamic environments from an aerial perspective. This review aims to encapsulate the multifaceted utility of UAV datasets, emphasizing their pivotal role in driving innovation and practical applications in multiple domains.
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (2nd Edition))

23 pages, 16936 KB  
Article
OMCTrack: Integrating Occlusion Perception and Motion Compensation for UAV Multi-Object Tracking
by Zhaoyang Dang, Xiaoyong Sun, Bei Sun, Runze Guo and Can Li
Drones 2024, 8(9), 480; https://doi.org/10.3390/drones8090480 - 12 Sep 2024
Cited by 5 | Viewed by 3083
Abstract
Compared to images captured from ground-level perspectives, objects in UAV images are often more challenging to track due to factors such as long-distance shooting, occlusion, and motion blur. Traditional multi-object trackers are not well-suited for UAV multi-object tracking tasks. To address these challenges, we propose an online multi-object tracking network, OMCTrack. To better handle object occlusion and re-identification, we designed an occlusion perception module that re-identifies lost objects and manages occlusion without increasing computational complexity. By employing a simple yet effective hierarchical association method, this module enhances tracking accuracy and robustness under occlusion conditions. Additionally, we developed an adaptive motion compensation module that leverages prior information to dynamically detect image distortion, enabling the system to handle the UAV’s complex movements. The results from the experiments on the VisDrone2019 and UAVDT datasets demonstrate that OMCTrack significantly outperforms existing UAV video tracking methods.
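The abstract does not spell out the hierarchical association method; a common two-stage pattern is to match high-confidence detections first and then retry lost tracks against the low-confidence remainder, which often corresponds to partially occluded objects. The sketch below follows that pattern; the thresholds and the greedy IoU matcher are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def hierarchical_associate(tracks, dets, t_high=0.6, t_iou=0.3):
    """Stage 1: greedy IoU matching with confident detections.
    Stage 2: retry unmatched tracks against low-confidence detections."""
    high = [d for d in dets if d["score"] >= t_high]
    low = [d for d in dets if d["score"] < t_high]
    matches, left = [], list(tracks)
    for pool in (high, low):
        for t in list(left):
            best = max(pool, key=lambda d: iou(t["box"], d["box"]), default=None)
            if best is not None and iou(t["box"], best["box"]) >= t_iou:
                matches.append((t, best))
                pool.remove(best)
                left.remove(t)
    return matches, left  # matched pairs and still-lost tracks
```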

23 pages, 36072 KB  
Article
Dynamic Screening Strategy Based on Feature Graphs for UAV Object and Group Re-Identification
by Guoqing Zhang, Tianqi Liu and Zhonglin Ye
Remote Sens. 2024, 16(5), 775; https://doi.org/10.3390/rs16050775 - 22 Feb 2024
Cited by 2 | Viewed by 1689
Abstract
Owing to the swift advancement of Unmanned Aerial Vehicles (UAVs), there is enormous potential for the use of UAVs to ensure public safety. Most research on images captured by UAVs focuses on object detection and tracking tasks, but few studies have addressed the UAV object re-identification task. In addition, in real-world scenarios, objects frequently gather in groups, so re-identifying UAV objects and groups poses a significant challenge. In this paper, a novel dynamic screening strategy based on a feature-graph framework is proposed for UAV object and group re-identification. Specifically, the proposed graph-based feature matching module aims to enhance the transmission of group contextual information by using adjacent feature nodes. Additionally, the designed dynamic screening strategy prunes feature nodes that are not identified as belonging to the same group, reducing the impact of noise (members of other groups). Extensive experiments have been conducted on the Road Group, DukeMTMC Group and CUHK-SYSU-Group datasets to validate our framework, revealing superior performance compared to most methods. Rank-1 accuracy on the CUHK-SYSU-Group, Road Group and DukeMTMC Group datasets reaches 71.8%, 86.4% and 57.8%, respectively. Our method’s performance is also explored on the UAV datasets PRAI-1581 and Aerial Image, the infrared datasets SYSU-MM01 and CM-Group, and the RGB-NIR Scene dataset; the findings demonstrate the robustness and wide applicability of our method.
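The pruning step can be pictured as dropping graph edges between dissimilar members before context is propagated. The following sketch masks an adjacency matrix that way; the cosine-similarity criterion and the threshold `tau` are assumptions, not the paper's exact rule:

```python
import torch
import torch.nn.functional as F

def screen_neighbors(node_feats: torch.Tensor, adj: torch.Tensor, tau: float = 0.5):
    """Keep only edges whose endpoint features are similar, so context flows
    between likely group members while noisy neighbors are pruned."""
    f = F.normalize(node_feats, dim=-1)   # (N, D) unit-norm member features
    sim = f @ f.t()                       # (N, N) cosine similarity
    return adj * (sim >= tau).float()     # masked adjacency matrix
```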

20 pages, 5031 KB  
Article
An Asymmetric Feature Enhancement Network for Multiple Object Tracking of Unmanned Aerial Vehicle
by Jianbo Ma, Dongxu Liu, Senlin Qin, Ge Jia, Jianlin Zhang and Zhiyong Xu
Remote Sens. 2024, 16(1), 70; https://doi.org/10.3390/rs16010070 - 23 Dec 2023
Cited by 12 | Viewed by 1934
Abstract
Multiple object tracking (MOT) in videos captured by unmanned aerial vehicle (UAV) is a fundamental aspect of computer vision. Recently, the one-shot tracking paradigm integrates the detection and re-identification (ReID) tasks, striking a balance between tracking accuracy and inference speed. This paradigm alleviates task conflicts and achieves remarkable results through various feature decoupling methods. However, in challenging scenarios like drone movements, lighting changes and object occlusion, it still encounters issues with detection failures and identity switches. In addition, traditional feature decoupling methods directly employ channel-based attention to decompose the detection and ReID branches, without a meticulous consideration of the specific requirements of each branch. To address the above problems, we introduce an asymmetric feature enhancement network with a global coordinate-aware enhancement (GCAE) module and an embedding feature aggregation (EFA) module, aiming to optimize the two branches independently. On the one hand, we develop the GCAE module for the detection branch, which effectively merges rich semantic information within the feature space to improve detection accuracy. On the other hand, we introduce the EFA module for the ReID branch, which highlights the significance of pixel-level features and acquires discriminative identity embedding through a local feature aggregation strategy. By efficiently incorporating the GCAE and EFA modules into the one-shot tracking pipeline, we present a novel MOT framework, named AsyUAV. Extensive experiments have demonstrated the effectiveness of our proposed AsyUAV. In particular, it achieves a MOTA of 38.3% and IDF1 of 51.7% on VisDrone2019, and a MOTA of 48.0% and IDF1 of 67.5% on UAVDT, outperforming existing state-of-the-art trackers.
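As a rough illustration of coordinate-aware gating of the kind the GCAE module builds on (a generic simplification, not the authors' module), the block below pools along each spatial axis separately so that positional cues, important for small UAV targets, survive the channel gating:

```python
import torch
import torch.nn as nn

class CoordGate(nn.Module):
    """Simplified coordinate-aware gate: encodes position along H and W
    separately before reweighting the feature map (illustrative sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ah = x.mean(dim=3, keepdim=True)   # (B, C, H, 1) pooled along width
        aw = x.mean(dim=2, keepdim=True)   # (B, C, 1, W) pooled along height
        gate = torch.sigmoid(self.fc(ah)) * torch.sigmoid(self.fc(aw))
        return x * gate                    # broadcasts back to (B, C, H, W)
```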

13 pages, 2129 KB  
Article
High-Performance Detection-Based Tracker for Multiple Object Tracking in UAVs
by Xi Li, Ruixiang Zhu, Xianguo Yu and Xiangke Wang
Drones 2023, 7(11), 681; https://doi.org/10.3390/drones7110681 - 20 Nov 2023
Cited by 5 | Viewed by 4423
Abstract
As a result of increasing urbanization, traffic monitoring in cities has become a challenging task. The use of Unmanned Aerial Vehicles (UAVs) provides an attractive solution to this problem, and Multi-Object Tracking (MOT) for UAVs is a key technology for fulfilling it. Traditional detection-based tracking (DBT) methods begin by employing an object detector to retrieve targets in each image and then track them with a matching algorithm. Recently, multi-task learning methods have come to dominate this area, since they can detect targets and extract Re-Identification (Re-ID) features in a computationally efficient way. However, the detection task and the tracking task place conflicting requirements on image features, leading to poor performance of the joint learning model compared to separate detection and tracking methods. The problem is more severe for UAV images due to the irregular motion of a large number of small targets. In this paper, we propose a balanced Joint Detection and Re-ID learning (JDR) network to address the MOT problem in UAV vision. To better handle the non-uniform motion of objects in UAV videos, the Set-Membership Filter is applied, which describes the object state as a bounded set. An appearance-matching cascade is then proposed based on the target state set. Furthermore, a Motion-Mutation module is designed to address the challenges posed by the abrupt motion of the UAV. Extensive experiments on the VisDrone2019-MOT dataset certify that our proposed model, referred to as SMFMOT, outperforms the state-of-the-art models by a wide margin and achieves superior performance in MOT tasks in UAV videos.
(This article belongs to the Special Issue Advances in Perception, Communications, and Control for Drones)
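A set-membership filter, as mentioned in the abstract above, replaces a point estimate with a feasible set guaranteed to contain the state. The toy sketch below propagates an axis-aligned box over position and velocity under a bounded disturbance; the state layout and the `drift` bound are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class IntervalState:
    lo: tuple  # lower bounds of (x, y, vx, vy)
    hi: tuple  # upper bounds of (x, y, vx, vy)

def predict(s: IntervalState, dt: float = 1.0, drift: float = 2.0) -> IntervalState:
    """Shift the position bounds by the velocity bounds and inflate them by a
    bounded disturbance; the true state stays inside the box by construction."""
    lo = (s.lo[0] + dt * s.lo[2] - drift, s.lo[1] + dt * s.lo[3] - drift, s.lo[2], s.lo[3])
    hi = (s.hi[0] + dt * s.hi[2] + drift, s.hi[1] + dt * s.hi[3] + drift, s.hi[2], s.hi[3])
    return IntervalState(lo, hi)
```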

17 pages, 6406 KB  
Article
Improved UAV-to-Ground Multi-Target Tracking Algorithm Based on StrongSORT
by Xinyu Cao, Zhuo Wang, Bowen Zheng and Yajie Tan
Sensors 2023, 23(22), 9239; https://doi.org/10.3390/s23229239 - 17 Nov 2023
Cited by 7 | Viewed by 3347
Abstract
Unmanned aerial vehicles (UAVs) are essential for aerial reconnaissance and monitoring. One of the greatest challenges facing UAVs is vision-based multi-target tracking, and multi-target tracking algorithms that depend on visual data are utilized in a variety of fields. In this study, we present a comprehensive framework for real-time tracking of ground robots in forest and grassland environments. This framework utilizes the YOLOv5n detection algorithm and a multi-target tracking algorithm for monitoring ground robot activities in real-time video streams. We optimized both the detection and re-identification networks to enhance real-time target detection. The StrongSORT tracking algorithm was selected to alleviate the loss of tracked objects due to factors like camera jitter, intersecting and overlapping targets, and smaller target sizes. The YOLOv5n algorithm was used to train the dataset, and the StrongSORT tracking algorithm incorporated the best-trained model weights. Experimental results demonstrate that the algorithm’s performance has greatly improved: the number of ID switches (IDSW) has decreased sixfold, IDF1 has increased by 7.93%, and false positives (FP) have decreased by 30.28%. Additionally, the tracking speed has reached 38 frames per second. These findings validate our algorithm’s ability to fulfill real-time tracking requirements on UAV platforms, delivering dependable solutions for dynamic multi-target tracking on land.
(This article belongs to the Section Vehicular Sensing)

19 pages, 13378 KB  
Article
Automated Identification and Classification of Plant Species in Heterogeneous Plant Areas Using Unmanned Aerial Vehicle-Collected RGB Images and Transfer Learning
by Girma Tariku, Isabella Ghiglieno, Gianni Gilioli, Fulvio Gentilin, Stefano Armiraglio and Ivan Serina
Drones 2023, 7(10), 599; https://doi.org/10.3390/drones7100599 - 25 Sep 2023
Cited by 10 | Viewed by 6296
Abstract
Biodiversity regulates agroecosystem processes, ensuring stability. Preserving and restoring biodiversity is vital for sustainable agricultural production. Species identification and classification in plant communities are key in biodiversity studies. Remote sensing supports species identification. However, accurately identifying plant species in heterogeneous plant areas presents challenges in dataset acquisition, preparation, and model selection for image classification. This study presents a method that combines object-based supervised machine learning for dataset preparation and a pre-trained transfer learning model (EfficientNetV2) for precise plant species classification in heterogeneous areas. The methodology is based on the multi-resolution segmentation of the UAV RGB orthophoto of the plant community into multiple canopy objects, and on the classification of the plants in the orthophoto using the K-nearest neighbor (KNN) supervised machine learning algorithm. Individual plant species canopies are extracted with the ArcGIS training dataset. A pre-trained transfer learning model is then applied for classification. Test results show that the EfficientNetV2 achieves an impressive 99% classification accuracy for seven plant species. A comparative study contrasts the EfficientNetV2 model with other widely used transfer learning models: ResNet50, Xception, DenseNet121, InceptionV3, and MobileNetV2.
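For the transfer-learning step, a minimal Keras sketch along these lines would suffice; the input size, freezing policy, and classification head are assumptions, and the paper's actual training setup may differ:

```python
import tensorflow as tf

# Pre-trained EfficientNetV2 trunk with the ImageNet top removed.
base = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained trunk for the first stage

# New head for the seven plant species classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```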

20 pages, 19077 KB  
Article
Research on the Method of Counting Wheat Ears via Video Based on Improved YOLOv7 and DeepSort
by Tianle Wu, Suyang Zhong, Hao Chen and Xia Geng
Sensors 2023, 23(10), 4880; https://doi.org/10.3390/s23104880 - 18 May 2023
Cited by 19 | Viewed by 3261
Abstract
The number of wheat ears in a field is an important parameter for accurately estimating wheat yield. In a large field, however, it is hard to conduct an automated and accurate count of wheat ears because of their density and mutual overlap. Unlike the majority of deep learning-based studies, which usually count wheat ears from collections of static images, this paper proposes a counting method based directly on multi-object tracking in UAV video, with better counting efficiency. Firstly, we optimized the YOLOv7 model, since target detection is the basis of the multi-object tracking algorithm. Omni-dimensional dynamic convolution (ODConv) was applied to the network structure to significantly improve the feature-extraction capability of the model, strengthen the interaction between dimensions, and improve the performance of the detection model. Furthermore, the global context network (GCNet) and coordinate attention (CA) mechanisms were adopted in the backbone network to make effective use of wheat features. Secondly, this study improved the DeepSort multi-object tracking algorithm by replacing the DeepSort feature extractor with a modified ResNet network structure to better extract wheat-ear-feature information, and the constructed dataset was then trained for the re-identification of wheat ears. Finally, the improved DeepSort algorithm was used to calculate the number of distinct IDs that appear in the video, yielding a method based on the YOLOv7 and DeepSort algorithms for counting wheat ears in large fields. The results show that the mean average precision (mAP) of the improved YOLOv7 detection model is 2.5% higher than that of the original YOLOv7 model, reaching 96.2%. The multiple-object tracking accuracy (MOTA) of the improved YOLOv7–DeepSort model reached 75.4%. Verifying the number of wheat ears captured by the UAV method shows an average L1 loss of 4.2 and an accuracy rate between 95 and 98%; thus, detection and tracking can be performed effectively, and wheat ears can be counted efficiently from the ID values in the video.
(This article belongs to the Section Smart Agriculture)
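Counting by track IDs, as described above, reduces to collecting the distinct identities the tracker emits over the whole video. A minimal sketch, assuming a per-frame output of `(track_id, bbox)` pairs (the format is an assumption, not the paper's interface):

```python
def count_wheat_ears(tracked_frames) -> int:
    """Each element of tracked_frames is one frame's tracker output, a list
    of (track_id, bbox) pairs; the ear count is the number of distinct IDs
    observed across the whole video."""
    ids = set()
    for frame in tracked_frames:
        ids.update(track_id for track_id, _ in frame)
    return len(ids)
```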

14 pages, 1994 KB  
Article
Directional Statistics-Based Deep Metric Learning for Pedestrian Tracking and Re-Identification
by Abdelhamid Bouzid, Daniel Sierra-Sosa and Adel Elmaghraby
Drones 2022, 6(11), 328; https://doi.org/10.3390/drones6110328 - 28 Oct 2022
Cited by 3 | Viewed by 2371
Abstract
Multiple Object Tracking (MOT) is the problem of following the trajectories of multiple objects in a sequence, generally a video. Pedestrians are among the most interesting subjects to track and recognize for purposes such as surveillance and safety. In recent years, Unmanned Aerial Vehicles (UAVs) have been viewed as a viable option for monitoring public areas, as they provide a low-cost method of data collection while covering large and difficult-to-reach areas. In this paper, we present an online pedestrian tracking and re-identification framework based on learning a compact directional statistical distribution (the von Mises–Fisher distribution) for each person ID using a deep convolutional neural network. The distribution characteristics are trained to be invariant to clothing appearance and to transformations including rotation, translation, and background changes. Learning a vMF distribution for each ID helps simultaneously in measuring the similarity between object instances and in re-identifying the pedestrian’s ID. We experimentally validated our framework on a standard, publicly available dataset as a case study.
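On the unit sphere, the vMF density is p(x; μ, κ) ∝ exp(κ μᵀx), so scoring an embedding against each identity's distribution amounts to a concentration-weighted cosine similarity. A minimal sketch follows; the tensor shapes are assumptions, and the κ-dependent normalizer is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def vmf_log_scores(x: torch.Tensor, mu: torch.Tensor, kappa: torch.Tensor) -> torch.Tensor:
    """Unnormalized vMF log-densities kappa_j * mu_j^T x_i for embeddings
    x (N, D) against identity means mu (K, D) with concentrations kappa (K,).
    argmax over the last dimension re-identifies the pedestrian."""
    x = F.normalize(x, dim=-1)
    mu = F.normalize(mu, dim=-1)
    return (x @ mu.t()) * kappa  # (N, K); kappa broadcasts across rows
```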

20 pages, 9922 KB  
Article
One-Shot Multiple Object Tracking in UAV Videos Using Task-Specific Fine-Grained Features
by Han Wu, Jiahao Nie, Zhiwei He, Ziming Zhu and Mingyu Gao
Remote Sens. 2022, 14(16), 3853; https://doi.org/10.3390/rs14163853 - 9 Aug 2022
Cited by 32 | Viewed by 4129
Abstract
Multiple object tracking (MOT) in unmanned aerial vehicle (UAV) videos is a fundamental task and can be applied in many fields. MOT consists of two critical procedures, i.e., object detection and re-identification (ReID). One-shot MOT, which incorporates detection and ReID in a unified network, has gained attention due to its fast inference speed. It significantly reduces the computational overhead by making two subtasks share features. However, most existing one-shot trackers struggle to achieve robust tracking in UAV videos. We observe that the essential difference between detection and ReID leads to an optimization contradiction within one-shot networks. To alleviate this contradiction, we propose a novel feature decoupling network (FDN) to convert shared features into detection-specific and ReID-specific representations. The FDN searches for characteristics and commonalities between the two tasks to synergize detection and ReID. In addition, existing one-shot trackers struggle to locate small targets in UAV videos. Therefore, we design a pyramid transformer encoder (PTE) to enrich the semantic information of the resulting detection-specific representations. By learning scale-aware fine-grained features, the PTE empowers our tracker to locate targets in UAV videos accurately. Extensive experiments on VisDrone2021 and UAVDT benchmarks demonstrate that our tracker achieves state-of-the-art tracking performance.
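Generically, decoupling means deriving task-specific maps from the shared feature instead of feeding both heads the same tensor. The sketch below uses two independent 1×1-conv gates; it is a simplification in the spirit of the FDN, whose internal structure the abstract does not specify:

```python
import torch.nn as nn

class DecoupleHeads(nn.Module):
    """Split a shared backbone feature into detection-specific and
    ReID-specific maps via per-branch gating (illustrative sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.det_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.reid_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        # Each branch sees the shared feature reweighted for its own task.
        return x * self.det_gate(x), x * self.reid_gate(x)
```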

25 pages, 62464 KB  
Article
An Adaptively Attention-Driven Cascade Part-Based Graph Embedding Framework for UAV Object Re-Identification
by Bo Shen, Rui Zhang and Hao Chen
Remote Sens. 2022, 14(6), 1436; https://doi.org/10.3390/rs14061436 - 16 Mar 2022
Cited by 3 | Viewed by 2877
Abstract
With the rapid development of unmanned aerial vehicles (UAVs), object re-identification (Re-ID) based on UAV platforms has attracted increasing attention, and several excellent results have been achieved in traditional scenarios. However, object Re-ID in aerial imagery acquired from UAVs remains a challenging task, mainly because the variable locations and diverse viewpoints of UAV platforms produce greater appearance ambiguity within and between objects. To address these issues, we propose an adaptively attention-driven cascade part-based graph embedding framework (AAD-CPGE) for UAV object Re-ID. The AAD-CPGE aims to optimally fuse node features and their topological characteristics on multi-scale structured graphs of part-based objects, and then adaptively learn the most correlated information to improve Re-ID performance. Specifically, we first apply GCNs to the part-based cascade node-feature graphs and topological-feature graphs to acquire multi-scale structured-graph feature representations. We then design a self-attention-based module for adaptive fusion of node and topological features on the constructed hierarchical part-based graphs. Finally, the learned hybrid graph-structured features with the most discriminative capability are applied to object Re-ID. Experiments on three widely used UAV-based benchmark datasets, with comparisons against state-of-the-art object Re-ID approaches, validate the effectiveness and benefits of our proposed AAD-CPGE Re-ID framework.
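For the GCN step over part graphs, a single symmetric-normalized graph-convolution layer in the standard form of Kipf and Welling (not necessarily the paper's exact variant) looks like this:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution over part nodes:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the part-adjacency
    matrix and H holds per-part features."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.w = nn.Linear(d_in, d_out, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        d_inv_sqrt = torch.diag(a.sum(dim=-1).rsqrt())       # D^{-1/2}
        return torch.relu(d_inv_sqrt @ a @ d_inv_sqrt @ self.w(h))
```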

23 pages, 22566 KB  
Article
Orientation- and Scale-Invariant Multi-Vehicle Detection and Tracking from Unmanned Aerial Videos
by Jie Wang, Sandra Simeonova and Mozhdeh Shahbazi
Remote Sens. 2019, 11(18), 2155; https://doi.org/10.3390/rs11182155 - 16 Sep 2019
Cited by 41 | Viewed by 6879
Abstract
Along with the advancement of lightweight sensing and processing technologies, unmanned aerial vehicles (UAVs) have recently become popular platforms for intelligent traffic monitoring and control. UAV-mounted cameras can capture traffic-flow videos from various perspectives, providing comprehensive insight into road conditions. To analyze traffic flow from remotely captured videos, a reliable and accurate vehicle detection-and-tracking approach is required. In this paper, we propose a deep-learning framework for vehicle detection and tracking from UAV videos for monitoring traffic flow in complex road structures. The approach is designed to be invariant to significant orientation and scale variations in the videos. Detection is performed by fine-tuning a state-of-the-art object detector, You Only Look Once (YOLOv3), using several custom-labeled traffic datasets. Vehicle tracking follows a tracking-by-detection paradigm, in which deep appearance features are used for vehicle re-identification and Kalman filtering is used for motion estimation. The proposed methodology is tested on a variety of real videos collected by UAVs under various conditions, e.g., in late afternoons with long vehicle shadows, at dawn with vehicle lights on, over roundabouts and interchange roads where vehicle directions change considerably, and from viewpoints where vehicles’ appearances undergo substantial perspective distortion. The proposed tracking-by-detection approach runs efficiently at 11 frames per second on color videos of 2720p resolution. Experiments demonstrated that high detection accuracy can be achieved, with an average F1-score of 92.1%. The tracking technique also performs accurately, with an average multiple-object tracking accuracy (MOTA) of 81.3%. The approach further addresses the frequent identity switching seen in state-of-the-art multi-object tracking, with only one identity switch per 305 tracked vehicles.
(This article belongs to the Special Issue Trends in UAV Remote Sensing Applications)
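In such a tracking-by-detection pipeline, the re-identification step amounts to assigning detections to tracks by appearance distance. A minimal sketch using cosine distance and the Hungarian algorithm follows; the `gate` threshold is an assumption, and the Kalman-filter motion gate that the full pipeline would apply on top is omitted for brevity:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats: np.ndarray, det_feats: np.ndarray, gate: float = 0.4):
    """Match detections to tracks by cosine distance between deep appearance
    embeddings, solved globally with the Hungarian algorithm."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T                       # (num_tracks, num_dets)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
```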