Sensors
  • Article
  • Open Access

16 June 2023

Real-Time Vehicle Detection from UAV Aerial Images Based on Improved YOLOv5

College of Intelligent Equipment, Shandong University of Science and Technology, Taian 271019, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Vehicular Sensing

Abstract

Aerial vehicle detection has significant applications in aerial surveillance and traffic control. Images captured by UAVs are characterized by many tiny objects and vehicles obscuring each other, which significantly increases the difficulty of detection. Missed and false detections are widespread problems in research on detecting vehicles in aerial images. Therefore, we customize a model based on YOLOv5 to make it more suitable for detecting vehicles in aerial images. Firstly, we add one additional prediction head to detect smaller-scale objects. Furthermore, to keep the original features involved in the training process of the model, we introduce a Bidirectional Feature Pyramid Network (BiFPN) to fuse feature information from various scales. Lastly, Soft-NMS (soft non-maximum suppression) is employed as the prediction box filtering method, alleviating the missed detections caused by closely spaced vehicles. The experimental findings on the self-made dataset in this research indicate that, compared with YOLOv5s, the mAP@0.5 and mAP@0.5:0.95 of YOLOv5-VTO increase by 3.7% and 4.7%, respectively, and precision and recall are also improved.

1. Introduction

The usage of small, low-altitude UAVs has grown rapidly in recent years [1,2,3,4]. Object detection techniques based on UAVs equipped with vision sensors have attracted much interest in areas such as unmanned vehicles and intelligent transportation systems [5,6,7,8]. UAV-based aerial vehicle detection techniques are less expensive than cameras installed at fixed locations and provide more extensive image views, greater flexibility, and broader coverage. UAVs can monitor road traffic over any range and provide critical information for subsequent intelligent traffic supervision tasks such as traffic flow calculation, unexpected accident detection, and traffic situational awareness. However, the vast percentage of vehicle targets have few feature points and small sizes [9,10], which presents a difficulty for precise and real-time vehicle detection in the UAV overhead view [11].
Existing vehicle detection approaches can be roughly divided into traditional and deep learning-based vehicle detection algorithms. Traditional vehicle detection algorithms must extract features [12,13] manually and then use SVM, AdaBoost, and other machine learning methods for classification. However, this approach is time-consuming and can only extract shallow features, which significantly limits its application to aerial scenes with small targets. In recent years, with the continuous development of deep learning techniques, various artificial intelligence algorithms based on convolutional neural networks have played a great role in different fields, such as autonomous driving [14], optimization of medicine policies [15], and wildlife census [16]. Deep learning-based target detection algorithms have also been extensively applied and mainly comprise two-stage and single-stage algorithms. Two-stage target detection algorithms first extract candidate regions and then perform regression localization and classification of targets; common examples include Fast R-CNN [17], Faster R-CNN [18], and R-FCN [19]. Singh et al. [20] used Fast R-CNN-optimized samples to design a real-time intelligent framework that performs well on vehicle detection tasks with complex backgrounds and many small targets. Nevertheless, the model may not fit well for cases where the object sizes vary widely. The authors of [21] conducted a study on vehicle detection based on Faster R-CNN, and the improved model reduced the latency and enhanced the detection performance for small targets. However, the model requires high computational resources in the detection process. Kong et al. [22] used a parallel RPN network combined with a density-based sample assigner to improve the detection of vehicle-dense areas in aerial images. However, the model structure is complex and requires two stages to complete the detection, which cannot meet the requirement of real-time detection. Since the two-stage detection algorithm requires the pre-generation of many pre-selected boxes, it is highly accurate but slow and cannot meet the needs of real-time detection [23]. The single-stage target detection algorithm directly transforms the localization and classification problem into a regression problem, which gives it an absolute speed advantage and accuracy potential compared with the two-stage approach. The mainstream single-stage target detection algorithms mainly include the YOLO (You Only Look Once) series [24,25,26,27] and the SSD series [28]. Yin et al. [29] obtained outstanding detection performance for small objects by improving the efficiency of SSD in using feature information at different scales. However, the default box needs to be selected manually, which may affect the performance of the model in detecting small targets. Lin et al. [30] detected oriented vehicles in aerial images based on YOLOv4, and the improved model significantly improved the detection performance in scenarios with densely arranged vehicles and buildings. However, further improvement studies are lacking for scenes with small targets. Ammar et al. [31] compared the detection performance of Faster R-CNN, YOLOv3, and YOLOv4 on a UAV aerial vehicle dataset but did not consider the impact of vehicle occlusion, shooting angle, and lighting conditions on the model. Zhang et al. [32] proposed a novel multi-scale adversarial network for improved vehicle detection in UAV imagery. The model performs well on images taken from different perspectives, heights, and imaging conditions; however, the vehicle classification is coarse, with only two categories: large vehicles and small vehicles.
Because of its excellent detection accuracy and quick inference, YOLOv5 [33] is applied extensively in various fields for practical applications. Niu et al. [34] used the Zero-DCE low-light enhancement algorithm to optimize the dataset and combined it with YOLOv5 and AlexNet for traffic light detection. Sun et al. [35] employed YOLOv5 to identify marks added to bolts and nuts, from which the relative rotation angle was calculated to determine whether the bolts were loose. Yan et al. [36] applied an enhanced model based on YOLOv5 to apple detection, which improved the detection speed and reduced the false detection rate for obscured targets.
To reduce the false and missed detection rates of vehicle detection tasks, this paper refines YOLOv5s, a lightweight network in YOLOv5 version 6.1. The details are outlined as follows:
(1)
In this paper, a smaller detection layer is added to the three detection layers of the original network. It makes the network more sensitive to small targets in high-resolution pictures and strengthens the multi-scale detection capability of the network.
(2)
We introduce the BiFPN structure [37] into YOLOv5, which strengthens the feature extraction and fusion process. BiFPN enables the model to utilize deep and shallow feature information more effectively and thus obtain more details about small and occluded objects.
(3)
YOLOv5s adopts the NMS algorithm, which directly deletes the lower-confidence box when two candidate boxes overlap too much, resulting in missed detections. Therefore, we use the Soft-NMS (soft non-maximum suppression) algorithm [38] to attenuate the confidence of overlapping candidate boxes instead, effectively alleviating the missed detections caused by vehicle occlusion; a minimal sketch of the idea follows this list.
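To make the third improvement concrete, below is a minimal NumPy sketch of Soft-NMS with the Gaussian decay of [38]. It is an illustration only: the sigma value, score threshold, and (x1, y1, x2, y2) box format are assumptions for the example, not the exact settings used in YOLOv5-VTO.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Instead of deleting overlapping boxes, decay their scores with a Gaussian penalty."""
    scores = scores.astype(float).copy()
    keep = []
    idxs = np.arange(len(scores))
    while idxs.size > 0:
        top = idxs[np.argmax(scores[idxs])]               # highest-scoring remaining box
        keep.append(top)
        idxs = idxs[idxs != top]
        if idxs.size == 0:
            break
        overlaps = iou(boxes[top], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)  # decay instead of hard removal
        idxs = idxs[scores[idxs] > score_thr]             # drop boxes whose score collapses
    return keep
```

By contrast, standard NMS simply discards every remaining box whose IoU with the selected box exceeds a fixed threshold, which is what causes missed detections for closely spaced vehicles.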

3. Experiments

3.1. Experimental Setup

In our experiments, the operating system was Linux, the CPU was an Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60 GHz, the GPU was an RTX A5000 with 24 GB of memory, and the framework was PyTorch. The experimental settings were based on the official YOLOv5 default parameters, including the adaptive anchor strategy and mosaic data enhancement. The training parameters are set as shown in Table 1.
Table 1. Parameters of training.

3.2. Dataset Description

Vehicles of four categories—car, van, truck, and bus—were selected for training, validation, and testing by collating the open-source dataset VisDrone2019-DET [42]. The number of labels for each category is shown in Figure 8.
Figure 8. Pie chart describing the proportion of instances of labels for each category.
There are ten categories in the VisDrone2019-DET dataset labels, several of which contain few vehicle targets. We therefore carefully selected 3650 photos from the original dataset as the experimental dataset of this paper to increase the training efficiency of the model. Figure 9 shows some of the images in this dataset. A dataset is usually divided into a training set, a validation set, and a test set: the training set is responsible for training the model, the validation set is used to tune the parameters, and the test set is used to evaluate the model. If the data distributions of these three sets differ greatly, the generalization ability of the model in real scenarios may suffer. Therefore, it is essential to allocate the dataset randomly; common ratios are 8:1:1 and 7:2:1. In this work, we randomly partition the dataset roughly according to the proportion of 8:1:1, obtaining 2800 images in the training set, 350 in the validation set, and 500 in the test set.
Figure 9. A few examples of the images in the dataset used in this article.
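As a concrete illustration of the split described above, the following Python sketch randomly partitions a list of image files into the 2800/350/500 subsets. The directory layout, file extension, and output files are assumptions made for the example, not the actual organization of our data.

```python
import random
from pathlib import Path

random.seed(0)                                    # fixed seed for a reproducible split
images = sorted(Path("images").glob("*.jpg"))     # the 3650 selected VisDrone images
random.shuffle(images)

n_train, n_val = 2800, 350                        # sizes used in this paper (test = 500)
splits = {
    "train": images[:n_train],
    "val":   images[n_train:n_train + n_val],
    "test":  images[n_train + n_val:],
}
for name, files in splits.items():
    # one image path per line, as commonly consumed by YOLOv5-style data configs
    Path(f"{name}.txt").write_text("\n".join(str(f) for f in files))
```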

3.3. Data Pre-Processing

We applied adaptive image scaling and mosaic data enhancement to pre-process the dataset. Because many original images have different aspect ratios, they need to be scaled and padded before being fed into the model. If the sides are padded with too many black borders, redundant information is introduced and the training speed is affected. Therefore, we use adaptive image scaling, which adds the smallest possible black border to each original image and thus speeds up model training. Mosaic data enhancement randomly selects four original images, which are scaled, cropped, and arranged, and then stitches them into a new image. This data enhancement method can effectively boost the ability to detect small targets. Figure 10 shows the two types of data preprocessing.
Figure 10. Effect of adaptive image scaling and mosaic data enhancement.
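The sketch below illustrates the idea behind adaptive image scaling: the image is resized with its aspect ratio preserved and then padded only up to the next multiple of the network stride, so as few border pixels as possible are added. The 640-pixel target size, stride of 32, and gray padding value are illustrative defaults rather than the exact settings of our experiments, and this is a simplified version of the preprocessing used by YOLOv5.

```python
import cv2
import numpy as np

def letterbox(img, new_size=640, stride=32, pad_value=114):
    """Resize keeping the aspect ratio, then pad only to the next stride multiple."""
    h, w = img.shape[:2]
    scale = min(new_size / h, new_size / w)        # fit the longer side into new_size
    nh, nw = round(h * scale), round(w * scale)
    resized = cv2.resize(img, (nw, nh))
    ph = (stride - nh % stride) % stride           # minimal padding needed in height
    pw = (stride - nw % stride) % stride           # minimal padding needed in width
    top, left = ph // 2, pw // 2
    canvas = np.full((nh + ph, nw + pw, 3), pad_value, dtype=img.dtype)
    canvas[top:top + nh, left:left + nw] = resized
    return canvas, scale, (left, top)              # scale and offsets map boxes back

# Example: an original 1080p frame becomes a 384x640 padded input.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
padded, scale, (dx, dy) = letterbox(frame)
```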

3.4. Evaluation Metrics

In this study, AP and mAP are used as the evaluation metrics of the model. The average precision takes into account both the precision (P) and recall (R) of the model. FLOPs, the number of parameters, and FPS are used to assess the model's size and speed. The equations for precision, recall, AP, and mAP are as follows.
P = \frac{TP}{TP + FP}
R = \frac{TP}{TP + FN}
AP = \int_{0}^{1} P(R) \, dR
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i
The terms TP, FP, and FN indicate the numbers of objects that were correctly detected, wrongly detected, and missed, respectively. P is the precision, which indicates how many of the vehicles predicted to belong to a certain category actually belong to that category. R is the recall, which shows the proportion of vehicles of a category in the dataset that are correctly detected. Precision thus focuses on the correctness of the detected vehicle categories, while recall pursues the detection of as many vehicles of a particular type as possible. AP is the area under the P-R curve for a single class. Finally, mAP is the average AP over all categories and is a comprehensive measure of detection performance.
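As a purely numerical illustration of these definitions (none of the values below come from our experiments), suppose one category has 80 correct detections, 20 false detections, and 20 missed vehicles; precision and recall are then both 0.8, and AP can be approximated by numerically integrating a P-R curve:

```python
import numpy as np

tp, fp, fn = 80, 20, 20                      # hypothetical counts for one category
precision = tp / (tp + fp)                   # 0.8: fraction of detections that are correct
recall = tp / (tp + fn)                      # 0.8: fraction of ground-truth vehicles found

# AP is the area under the P-R curve; here we integrate an illustrative curve.
recall_pts = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
precision_pts = np.array([1.0, 0.95, 0.90, 0.85, 0.70, 0.50])
ap = np.trapz(precision_pts, recall_pts)     # ~0.83 for this made-up curve

# mAP would then be the mean of the per-class AP values.
print(precision, recall, ap)
```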

4. Results

4.1. Ablation Experiment

Three structures are explored to improve the YOLOv5s algorithm in this paper. The first is the addition of a new detection layer, P2, to enhance the recognition capability for small target vehicles. The second is the introduction of a de-weighted BiFPN to make the feature fusion process more reasonable and effective. The third is the use of Soft-NMS as the prediction box filtering algorithm to improve the detection performance for overlapping and occluded vehicles. We designed the corresponding ablation experiments to verify the effectiveness of YOLOv5 after adding the different modules, and the results are shown in Table 2. As the data in the table show, the number of parameters and the computation of the model increase modestly compared with the baseline model after adding the P2 detection layer. By further introducing BiFPN, however, the number of parameters and calculations are reduced significantly while the accuracy is maintained. The three improvement strategies are combined to produce the improved model, YOLOv5-VTO. While the addition of Soft-NMS reduces the AP of “car” compared to using only P2 and BiFPN, the AP of the remaining categories is improved. Because the model has already achieved excellent detection performance for “car”, we consider that “van”, “truck”, and “bus” are in greater need of a boost in AP. In addition, the substantial improvement in mAP also indicates that the introduction of Soft-NMS plays a great role in enhancing the comprehensive performance of the model. It is also clear in Figure 7 that Soft-NMS does decrease the missed detection of closely arranged vehicles.
Table 2. The comparison of the performance with different modules.
Compared with the benchmark model, the two comprehensive indexes of mAP@0.5 and mAP@0.5:0.95 are improved by 3.7% and 4.7%, respectively, effectively improving the accuracy of aerial vehicle detection. Although there is a small increase in the number of parameters and computation compared with the benchmark model, it is discovered that the modified model can still satisfy the requirements of real-time detection in the following comparative experiments. The ablation experiments demonstrate that the approach used in this paper is excellent in the UAV aerial vehicle detection task, outperforming the base model in scenarios with tiny targets and more overlapping occluded objects.
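Since BiFPN is one of the modules evaluated in the ablation above, a minimal PyTorch sketch of its weighted feature fusion [37] is given here for reference. The two-input node, channel count, and feature-map size are illustrative; the actual node wiring and channel widths of YOLOv5-VTO are not reproduced.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion of same-shaped feature maps."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))   # learnable fusion weights
        self.eps = eps

    def forward(self, *feats: torch.Tensor) -> torch.Tensor:
        w = torch.relu(self.w)                 # keep the weights non-negative
        w = w / (w.sum() + self.eps)           # normalize so they sum to roughly 1
        return sum(wi * f for wi, f in zip(w, feats))

# Example: fuse a shallow and a deep feature map that share the same shape.
fuse = WeightedFusion(num_inputs=2)
p_shallow = torch.randn(1, 128, 80, 80)
p_deep = torch.randn(1, 128, 80, 80)
fused = fuse(p_shallow, p_deep)                # shape: (1, 128, 80, 80)
```

The learned weights let each fusion node favor whichever input scale carries more useful information, which is how BiFPN helps retain the shallow detail needed for small vehicles.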
Throughout the training procedure, the YOLOv5 and the YOLOv5-VTO models in this article use the same dataset and parameter settings. The mAP and loss comparison graphs of the two models are plotted according to the log files saved during the training process, as shown in Figure 11.
Figure 11. Comparison of the training curve between our model and YOLOv5s. (a) The loss function curve of the training set; (b) The loss function curve of the validation set; (c) The mAP curve.
Figure 11c shows that the model obtained a higher mAP after improvement, while Figure 11a,b illustrate that there is no obvious overfitting problem in the training process. Furthermore, compared to the baseline model, the overall loss values of the improved model on the training and validation sets are much lower.
We plotted the P-R curves shown in Figure 12 based on the precision and recall logs generated during the training of the models. Because Soft-NMS is a prediction box optimization method applied in the prediction phase, it is not involved in the model training process. Therefore, Figure 12B is the curve obtained by training the model with only P2 and BiFPN added. The area under the P-R curve indicates the AP of a category, so the closer the curve is to the upper right corner, the better the overall performance of the algorithm.
Figure 12. PR curve comparison: (A) PR curve of YOLOv5s and (B) PR curve of improved YOLOv5s.
It is not difficult to see from the P-R curve that precision and recall have an inverse relationship. This is because when the model pursues a high precision, it becomes more conservative in its predictions: some low-confidence samples are no longer predicted as positives, and recall decreases accordingly. The relative importance of these two metrics differs across scenarios, so a trade-off between precision and recall has to be made according to the needs of the specific problem.
By comparing the PR curves before and after the improvement in Figure 12, we can see that the model’s detection capability is enhanced, especially for the “truck” and “bus” categories. However, compared with the “car” category, the performance of the updated model on the “truck” and “van” categories still needs to be improved; the AP of the best and worst detected categories is 0.902 and 0.579, respectively. The reason is that there are fewer target instances of the “truck” and “van” categories in the dataset than of the “car” category. In addition, “truck” covers many vehicle shapes, resulting in complex and variable feature information, which increases the difficulty of detection. As a result, in the next stage we will continue looking for approaches to boost the detection performance of the model, such as data supplementation and enhancement for the relevant categories.

4.2. Comparative Experiment

We compare YOLOv5-VTO with a series of target detection algorithms, including YOLOv5s, Faster-RCNN, SSD, YOLOv3-tiny, YOLOv7-tiny [43], and EfficientDet-D0, to further evaluate the advantages of the proposed algorithm for the vehicle detection task. All models involved in the comparison were trained and validated using the same dataset, and the experimental data are presented in Table 3.
Table 3. Comparison of detection performance of different algorithms.
The experimental results of the different algorithms in Table 3 show that the YOLOv5-VTO algorithm proposed in this paper achieves the highest mAP among the compared mainstream detection models. Compared with the benchmark YOLOv5s, the proposed model significantly improves mAP@0.5, mAP@0.5:0.95, precision, and recall while keeping the detection speed nearly unchanged.
As a representative of the anchor-free detection models, the EfficientDet algorithm has room for improvement in mAP. On the other hand, the two-stage detection algorithm Faster-RCNN is slower owing to the need to extract feature vectors from candidate regions during the testing phase. The single-stage SSD algorithm achieves a high precision; however, the lack of low-level feature convolution layers in SSD leads to inadequate features being extracted from small target vehicles, resulting in many missed detections and low recall. Given the need for real-time detection, two lightweight versions, YOLOv3-tiny and YOLOv7-tiny, are selected for comparison in this study. According to the experimental data, YOLOv7-tiny achieved good results in terms of recall and FPS, yet the model proposed in this paper still has advantages in several other metrics, such as mAP, especially mAP@0.5:0.95. YOLOv3-tiny lags significantly behind YOLOv5-VTO in all indexes except FPS. Although our model is slower than these two algorithms in terms of detection speed, it may still satisfy the demand for real-time detection.
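For reference, the FPS figures compared above can be obtained with a simple timing loop such as the sketch below; the model object, input resolution, and run counts are placeholders, and measured speed depends on hardware, batch size, and post-processing such as NMS.

```python
import time
import torch

def measure_fps(model, img_size=640, n_warmup=10, n_runs=100, device="cuda"):
    """Average single-image inference throughput in frames per second."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):              # warm-up iterations are not timed
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()           # make sure queued GPU work is finished
        start = time.time()
        for _ in range(n_runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return n_runs / (time.time() - start)
```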
In summary, compared with the other models, the proposed model achieves a remarkable overall performance and a good balance between detection accuracy and detection speed, which verifies its effectiveness.

4.3. Visualizing the Detection Performance of Different Models

To evaluate the model more intuitively, Figure 13 compares the detection results before and after the modification. Figure 13A shows that YOLOv5-VTO reduces the false detection of vehicles. Comparing the results of groups B and C shows that the revised model decreases the rate of missed detection and remains effective even in scenes with insufficient light. From the contrast results of groups D and E, it can be seen that the detection performance of the proposed model for tiny targets is improved. These visualization results show that our model achieves better detection performance for tiny and obscured vehicles in aerial images.
Figure 13. Comparison of YOLOv5s algorithm detection results before and after improvement. (A) Improved model reduces false detections; (B) Mitigates missed detections in low-light scenes; (C) Improved model reduces missed detections; (D) Improved model enhances the detection performance for small targets; (E) Improved model reduces missed detection of mutually obscuring vehicles.

5. Conclusions and Future Works

We propose an enhanced model, YOLOv5-VTO, based on YOLOv5s to improve the detection performance for obscured and tiny vehicles in aerial images. First, a new detection branch, P2, which can discover tiny targets accurately, is added to the three detection layers of the baseline model. Then, the bidirectional feature pyramid network (BiFPN) replaces the PAN structure of the original model to fuse feature information from multiple scales more effectively and to reduce conflicts between features of different scales. Finally, by visualizing the detection results, we find that the Soft-NMS algorithm works well in scenarios where vehicles occlude each other.
The experimental results indicate that the improved algorithm is more effective than the original YOLOv5s algorithm. Further, the detection speed can still reach 30 FPS, which meets the demands of real-time detection. Although Soft-NMS improves the detection of obscured vehicles, it also slightly reduces the AP of some categories, such as “car”. Therefore, our following research will focus on how to mitigate this side effect of introducing Soft-NMS. Furthermore, many factors still limit the detection speed of the model, such as the large number of vehicles in an image, changing lighting conditions, the selection of the anchor box, the setting of the confidence threshold, and the deployment of high-performance hardware devices. Therefore, in future work, we will explore how to satisfy real-time detection applications under such constraints.

Author Contributions

Conceptualization, S.L.; methodology, S.L.; software, X.Y.; validation, X.Y. and X.L.; formal analysis, X.Y.; investigation, S.L.; resources, S.L., X.Y. and X.L.; data curation, S.L. and X.L.; writing—original draft preparation, S.L.; writing—review and editing, S.L., X.Y. and X.L.; visualization, S.L. and Y.Z.; supervision, X.Y., J.W. and X.L.; project administration, S.L., X.Y., X.L., Y.Z. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this paper were derived from the following sources available in the public domain [42]: VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results.

Acknowledgments

We are grateful to the reviewers for their suggestions for this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this paper:
YOLO: You Only Look Once
IoU: Intersection over Union
HOG: Histogram of Oriented Gradients
SIFT: Scale-Invariant Feature Transform
FPN: Feature Pyramid Network
PANet: Path Aggregation Network
UAV: Unmanned Aerial Vehicle
NMS: Non-Maximum Suppression
AP: Average Precision
mAP: Mean Average Precision
SVM: Support Vector Machine
SSD: Single Shot Detector
FPS: Frames Per Second
FLOPs: Floating-Point Operations
CBS: Conv BN SiLU
TP: True Positives
FP: False Positives
FN: False Negatives

References

  1. Xiong, J.; Liu, Z.; Chen, S.; Liu, B.; Zheng, Z.; Zhong, Z.; Yang, Z.; Peng, H. Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method. Biosyst. Eng. 2020, 194, 261–272.
  2. Byun, S.; Shin, I.-K.; Moon, J.; Kang, J.; Choi, S.-I. Road traffic monitoring from UAV images using deep learning networks. Remote Sens. 2021, 13, 4027.
  3. Peng, X.; Zhong, X.; Zhao, C.; Chen, A.; Zhang, T. A UAV-based machine vision method for bridge crack recognition and width quantification through hybrid feature learning. Constr. Build. Mater. 2021, 299, 123896.
  4. Jung, H.K.; Choi, G.S. Improved YOLOv5: Efficient object detection using drone images under various conditions. Appl. Sci. 2022, 12, 7255.
  5. Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Vehicle detection from UAV imagery with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6047–6067.
  6. Ali, B.S. Traffic management for drones flying in the city. Int. J. Crit. Infrastruct. Prot. 2019, 26, 100310.
  7. Srivastava, S.; Narayan, S.; Mittal, S. A survey of deep learning techniques for vehicle detection from UAV images. J. Syst. Architect. 2021, 117, 102152.
  8. Qu, Y.; Jiang, L.; Guo, X. Moving vehicle detection with convolutional networks in UAV videos. In Proceedings of the 2016 2nd International Conference on Control, Automation and Robotics (ICCAR), Hong Kong, China, 28–30 April 2016; pp. 225–229.
  9. Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors 2017, 17, 336.
  10. Qu, T.; Zhang, Q.; Sun, S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimed. Tools Appl. 2017, 76, 21651–21663.
  11. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788.
  12. Xu, Y.; Yu, G.; Wang, Y.; Wu, X.; Ma, Y. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG plus SVM from UAV Images. Sensors 2016, 16, 1325.
  13. Moranduzzo, T.; Melgani, F. Detecting Cars in UAV Images With a Catalog-Based Approach. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6356–6367.
  14. Jin, X.; Li, Z.; Yang, H. Pedestrian detection with YOLOv5 in autonomous driving scenario. In Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31 October 2021; pp. 1–5.
  15. Tutsoy, O. Pharmacological, Non-Pharmacological Policies and Mutation: An Artificial Intelligence Based Multi-Dimensional Policy Making Algorithm for Controlling the Casualties of the Pandemic Diseases. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 9477–9488.
  16. Kellenberger, B.; Marcos, D.; Tuia, D. Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning. Remote Sens. Environ. 2018, 216, 139–153.
  17. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
  18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
  19. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
  20. Singh, C.H.; Mishra, V.; Jain, K.; Shukla, A.K. FRCNN-Based Reinforcement Learning for Real-Time Vehicle Detection, Tracking and Geolocation from UAS. Drones 2022, 6, 406.
  21. Ou, Z.; Wang, Z.; Xiao, F.; Xiong, B.; Zhang, H.; Song, M.; Zheng, Y.; Hui, P. AD-RCNN: Adaptive Dynamic Neural Network for Small Object Detection. IEEE Internet Things J. 2023, 10, 4226–4238.
  22. Kong, X.; Zhang, Y.; Tu, S.; Xu, C.; Yang, W. Vehicle Detection in High-Resolution Aerial Images with Parallel RPN and Density-Assigner. Remote Sens. 2023, 15, 1659.
  23. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
  24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  25. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
  26. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  27. Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
  28. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. Lect. Notes Comput. Sci. 2016, 9905, 21–37.
  29. Yin, Q.; Yang, W.; Ran, M.; Wang, S. FD-SSD: An improved SSD object detection algorithm based on feature fusion and dilated convolution. Signal Process. Image Commun. 2021, 98, 116402.
  30. Lin, T.; Su, C. Oriented Vehicle Detection in Aerial Images Based on YOLOv4. Sensors 2022, 22, 8394.
  31. Ammar, A.; Koubaa, A.; Ahmed, M.; Saad, A.; Benjdira, B. Vehicle Detection from Aerial Images Using Deep Learning: A Comparative Study. Electronics 2021, 10, 820.
  32. Zhang, R.; Newsam, S.; Shao, Z.; Huang, X.; Wang, J.; Li, D. Multi-scale adversarial network for vehicle detection in UAV imagery. ISPRS J. Photogramm. Remote Sens. 2021, 180, 283–295.
  33. Jocher, G. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 8 November 2022).
  34. Niu, C.; Li, K. Traffic Light Detection and Recognition Method Based on YOLOv5s and AlexNet. Appl. Sci. 2022, 12, 10808.
  35. Sun, Y.; Li, M.; Dong, R.; Chen, W.; Jiang, D. Vision-Based Detection of Bolt Loosening Using YOLOv5. Sensors 2022, 22, 5184.
  36. Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5. Remote Sens. 2021, 13, 1619.
  37. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
  38. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS—Improving Object Detection with One Line of Code. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570.
  39. Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
  40. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
  41. Lin, T.; Maire, M.; Belongie, S. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
  42. Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 213–226.
  43. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
