1. Introduction
Remote sensing images, which capture data about ground objects, have become indispensable digital assets with the rapid advancement of remote sensing technology [1]. Target detection provides precise and valuable information for remote sensing image analysis, contributing significantly to research on natural resource distribution, terrain features, ports, and more. In addition, the detection of small and micro-targets such as airplanes, automobiles, and vessels against complex backgrounds in remote sensing images has gradually attracted increasing attention.
In the domain of image detection, Deep Learning (DL) algorithms have become indispensable for precise target detection [2]. These DL-based methods leverage deep neural networks to identify objects within remotely sensed imagery, thereby enhancing detection accuracy and efficiency. In general, they can be classified into two-stage and one-stage approaches. Two-stage methods generate candidate boxes through sampling, use Convolutional Neural Networks (CNNs) for feature extraction and classification, and finally achieve accurate target localization through post-processing operations. For example, the region-based CNN (R-CNN) series of algorithms [
3,
4,
5] is a classical two-stage approach. In contrast, one-stage object detection methods do not generate candidate boxes. Instead, they convert the task of localizing the target bounding box into a regression problem and successfully achieve accurate target localization through regression. Representative one-stage algorithms include the Single Shot MultiBox Detector (SSD) [
6], CenterNet [
7], and the You Only Look Once (YOLO) algorithm series [
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18]. In general, two-stage methods achieve higher detection and localization accuracy, whereas one-stage methods offer faster detection speeds. In recent years, transformer-based architectures such as the Detection Transformer (DETR) [19] have advanced object detection through self-attention mechanisms. Another notable model, the Segment Anything Model (SAM) [20], is an exemplary image segmentation model built on the transformer framework. These architectures streamline the detection pipeline by eliminating handcrafted components and achieve state-of-the-art results. However, transformer-based models are typically slower, which limits their application in real-time monitoring scenarios.
Processing speed is a crucial consideration for devices in real-time systems; hence, one-stage detection methods have attracted increasing attention from researchers, particularly for real-time detection scenarios. Huo et al. [
21] proposed SAFF-SSD for small-object detection. Building upon SSD, SAFF-SSD achieves enhanced feature extraction through the incorporation of a local light transformer block. Betti et al. introduced YOLO-S [
22], a YOLO-like network specifically designed for small-object detection, which demonstrates enhanced performance on small targets. Lai et al. developed a feature extraction component that combines a CNN with multi-head attention to enlarge the receptive field [
23]; the resulting STC-YOLO algorithm performs well for traffic-sign detection. Qu et al.’s approach [
24] introduced a feature fusion strategy based on an attention mechanism. By merging target information from different scales, this strategy enhances the semantic expression of shallow features, consequently improving the tiny-object identification capacity of the algorithm. Our team has also proposed the PDWT-YOLO [
25] algorithm for target detection in unmanned aerial vehicle images, which effectively enhances the capability to detect small objects. Despite these improvements, several issues persist. SAFF-SSD [
21] demonstrates good feature extraction capabilities but lacks a significant speed advantage. YOLO-S [
22] employs a relatively outdated detection network, and STC-YOLO [
23] is primarily applied to traffic sign detection. The algorithm proposed in Ref. [
24] exhibits competent detection performance; however, its detection time is also significantly increased. PDWT-YOLO [
25] is primarily designed for detecting small targets and exhibits a fast detection speed; however, it is not suitable for detecting much smaller targets, such as the targets in AI-TOD [
26].
In the domain of object detection, small objects are typically defined as objects with a pixel area smaller than 32 × 32 pixels [
27]. Objects in remote sensing images are often even smaller, such as in AI-TOD, where the average size of targets is approximately 12.8 pixels. In this paper, objects smaller than 16 pixels are defined as “micro-targets”. Fewer pixels result in less feature information being extracted from the target, which significantly increases the difficulty of detection. Hence, most small-object detection methods are not suitable for micro-targets, prompting some researchers to focus on their detection [
25,
26]. Lu et al. [28] proposed MStrans, a multi-scale transformer-based aerial object detector that effectively tackles the difficulties of detecting micro-instances in aerial images. Ni et al. [29] introduced JSDNet, a network designed for micro-target detection that leverages the geometric Jensen–Shannon divergence. JSDNet incorporates the Swin Transformer to enhance feature extraction for micro-targets and addresses IoU sensitivity through its JSDM module. Nevertheless, MStrans [
28] and JSDNet [
29] do not meet real-time application requirements due to their large model sizes and slow inference speeds.
This paper introduces a real-time detection method specifically for micro-targets by modifying the YOLOv7-tiny network and loss function. The proposed method incorporates three key innovations, as outlined below:
- (1)
Integration of a new loss function: the normalized Wasserstein distance (NWD) [30] is fused with the CIOU loss to replace the use of CIOU alone. The optimal fusion factor between the NWD and CIOU is determined experimentally to mitigate the IoU sensitivity issue for micro-targets; a minimal illustrative sketch of this fusion is given after this list.
- (2)
Utilization of the lightweight Content-Aware Reassembly of Features (CARAFE) operator: CARAFE replaces the original bilinear interpolation upsampling operator. This lightweight operator reassembles features within a predefined region centered at each position through content-aware weighted combinations, thereby strengthening the feature representation of micro-targets; a sketch of such an upsampler is given after this list.
- (3)
Inclusion of a spatial pyramid pooling (SPP) structure in the high-resolution feature map layer: a Cross Stage Partial Spatial Pyramid Pooling (CSPSPP) module is added to the object detection layer with a resolution of 80 × 80 pixels, which improves the algorithm’s ability to capture the multi-scale features of micro-targets; a generic sketch of such a block follows this list.
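To make innovation (1) concrete, the following is a minimal sketch of the fused regression loss, assuming boxes in (cx, cy, w, h) format and torchvision ≥ 0.13, whose `complete_box_iou_loss` is used here as a stand-in for the CIOU term. The helper names, the constant C in the NWD (set to 12.8, the approximate mean object size in AI-TOD), and the exact integration into the detector are illustrative choices rather than the paper's implementation; the 0.25/0.75 fusion weights mirror the best CIOU/NWD ratio reported in Table 4.

```python
import torch
from torchvision.ops import complete_box_iou_loss  # CIoU loss for boxes in (x1, y1, x2, y2)

def nwd_loss(pred, target, c_const=12.8, eps=1e-7):
    """Normalized Wasserstein distance loss for boxes given as (cx, cy, w, h).

    Each box is modelled as a 2-D Gaussian N((cx, cy), diag(w^2/4, h^2/4)); for such
    Gaussians the squared 2-Wasserstein distance equals the squared L2 distance
    between the vectors (cx, cy, w/2, h/2). c_const is dataset dependent; 12.8 is
    used here purely for illustration.
    """
    p = torch.cat([pred[..., :2], pred[..., 2:] / 2.0], dim=-1)
    t = torch.cat([target[..., :2], target[..., 2:] / 2.0], dim=-1)
    w2 = ((p - t) ** 2).sum(dim=-1).clamp(min=eps).sqrt()  # 2-Wasserstein distance
    return 1.0 - torch.exp(-w2 / c_const)                  # loss = 1 - NWD

def cxcywh_to_xyxy(b):
    cx, cy, w, h = b.unbind(-1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

def fused_box_loss(pred, target, ciou_weight=0.25):
    """Weighted fusion of CIoU and NWD (0.25/0.75 follows Table 4)."""
    l_ciou = complete_box_iou_loss(cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target),
                                   reduction="none")
    l_nwd = nwd_loss(pred, target)
    return ciou_weight * l_ciou + (1.0 - ciou_weight) * l_nwd

# Example: a 1-pixel shift of a tiny 4 x 4 box barely changes the NWD term,
# whereas the IoU-based term reacts much more sharply to the same shift.
pred = torch.tensor([[10.0, 10.0, 4.0, 4.0]])
gt = torch.tensor([[11.0, 10.0, 4.0, 4.0]])
print(fused_box_loss(pred, gt))
```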
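For innovation (2), the sketch below illustrates a generic CARAFE-style upsampler in PyTorch. The kernel sizes follow the common CARAFE defaults (k_encoder = 3, k_up = 5), while the channel-compression width `c_mid` is a hypothetical choice; the operator integrated into the proposed network may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFEUpsample(nn.Module):
    """CARAFE-style content-aware upsampler (cf. [40]) -- illustrative sketch.

    A content encoder predicts a k_up x k_up reassembly kernel for every upsampled
    position; each output value is the kernel-weighted sum of the k_up x k_up
    neighbourhood around the corresponding source location.
    """
    def __init__(self, channels, scale=2, k_encoder=3, k_up=5, c_mid=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, c_mid, kernel_size=1)
        self.encoder = nn.Conv2d(c_mid, (scale ** 2) * (k_up ** 2),
                                 kernel_size=k_encoder, padding=k_encoder // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # kernel prediction: one softmax-normalized k x k kernel per output position
        kernels = self.encoder(self.compress(x))      # (b, s^2*k^2, h, w)
        kernels = F.pixel_shuffle(kernels, s)         # (b, k^2, s*h, s*w)
        kernels = F.softmax(kernels, dim=1)
        # content-aware reassembly: gather each k x k input neighbourhood ...
        patches = F.unfold(x, kernel_size=k, padding=k // 2)   # (b, c*k^2, h*w)
        patches = patches.view(b, c * k * k, h, w)
        # ... replicate it to the output resolution (nearest source location) ...
        patches = F.interpolate(patches, scale_factor=s, mode="nearest")
        patches = patches.view(b, c, k * k, s * h, s * w)
        # ... and combine it with the predicted, position-specific weights
        return (patches * kernels.unsqueeze(1)).sum(dim=2)     # (b, c, s*h, s*w)

x = torch.randn(1, 128, 20, 20)
print(CARAFEUpsample(128)(x).shape)   # torch.Size([1, 128, 40, 40])
```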
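For innovation (3), the following is a generic sketch of a CSP-structured spatial pyramid pooling block of the kind commonly used in the YOLO family: one branch passes through parallel max-pooling windows that enlarge the receptive field at several scales, the other is a plain shortcut, and the two are concatenated and fused. All layer widths here are illustrative assumptions, and the exact CSPSPP module used in the proposed network may differ.

```python
import torch
import torch.nn as nn

class ConvBnAct(nn.Module):
    """Conv + BatchNorm + SiLU, the usual building block in YOLO-style networks."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class CSPSPPBlock(nn.Module):
    """CSP-style SPP: pyramid-pooling branch + shortcut branch, concatenated and fused."""
    def __init__(self, c_in, c_out, pool_sizes=(5, 9, 13)):
        super().__init__()
        c_mid = c_out // 2
        self.branch = nn.Sequential(ConvBnAct(c_in, c_mid), ConvBnAct(c_mid, c_mid, k=3))
        self.shortcut = ConvBnAct(c_in, c_mid)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=p, stride=1, padding=p // 2) for p in pool_sizes
        )
        self.reduce = ConvBnAct(c_mid * (len(pool_sizes) + 1), c_mid)
        self.fuse = ConvBnAct(2 * c_mid, c_out)

    def forward(self, x):
        y = self.branch(x)
        y = torch.cat([y] + [pool(y) for pool in self.pools], dim=1)
        y = self.reduce(y)
        return self.fuse(torch.cat([y, self.shortcut(x)], dim=1))

# On the 80 x 80 detection branch the block keeps the spatial resolution unchanged.
feat = torch.randn(1, 128, 80, 80)
print(CSPSPPBlock(128, 128)(feat).shape)   # torch.Size([1, 128, 80, 80])
```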
These improvements collectively address the shortcomings of existing models, enhancing both the accuracy and the efficiency of micro-target detection. Experimental results validate the efficacy of the proposed approach in identifying micro-targets.
This paper is organized into six sections that address micro-target detection.
Section 2 reviews the relevant theoretical background.
Section 3 presents the proposed innovations, details the improved modules, and explains the rationale behind each improvement, including the refinement of the Intersection Over Union (IOU) loss function, the integration of the CARAFE module, and the addition of the CSPSPP module.
Section 4 outlines the experiments, describes the chosen datasets and parameter configurations, and provides an in-depth examination of the results.
Section 5 discusses the results, compares the proposed method with classic algorithms, and highlights the current limitations and directions for future work.
Section 6 presents the conclusions.
Author Contributions
Conceptualization, L.Z. and N.X.; methodology, L.Z.; software, W.G.; validation, W.G.; resources; data curation, L.Z. and N.X.; writing—original draft preparation, L.Z. and W.G.; writing—review and editing, N.X. and P.W.; visualization, P.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the general project of Key R & D Plan of Shanxi Province, high-technology field (grant number 201903D121171) and the National Natural Science Foundation of China (serial number 61976134).
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
We would like to express our gratitude to Xiaodong Yue (Shanghai University) for providing computational resources and support.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AI-TOD | Tiny Object Detection in Aerial Images Dataset |
CARAFE | Content-Aware Reassembly of Features |
CIOU | Complete Intersection Over Union |
CNNs | Convolutional Neural Networks |
CSPSPP | Cross Stage Partial Spatial Pyramid Pooling |
DETR | Detection Transformer |
DL | Deep Learning |
DIOU | Distance Intersection Over Union |
EIOU | Enhanced Intersection Over Union |
ELAN | Efficient Layer Aggregation Network |
Fast R-CNN | Fast Region-Based Convolutional Network |
FPN | Feature Pyramid Network |
CBL | Convolutional Block Layer |
GFLOPs | Giga Floating-Point Operations |
GIOU | Generalized Intersection Over Union |
GWD | Gaussian Wasserstein Distance |
IOU | Intersection Over Union |
MPConv | Max-Pooling Convolution Module |
NMS | Non-Maximum Suppression |
NWD | Normalized Wasserstein Distance |
PAFPN | Path Aggregation Feature Pyramid Network |
PANet | Path Aggregation Network |
R-CNN | Region with CNN Features |
SAM | Segment Anything Model |
SIMD | Satellite Imagery Multi-Vehicles Dataset |
SPP | Spatial Pyramid Pooling |
SSD | Single Shot Multibox Detector |
YOLO | You Only Look Once |
References
- Tong, K.; Wu, Y.; Zhou, F. Recent Advances in Small Object Detection Based on Deep Learning: A Review. Image Vis. Comput. 2020, 97, 103910. [Google Scholar] [CrossRef]
- Ahmed, M.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Survey and performance analysis of deep learning based object detection in challenging environments. Sensors 2021, 21, 5116. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision 2016, Amsterdam, The Netherlands, 11–14 October 2016; Volume 9905, pp. 21–37. [Google Scholar] [CrossRef]
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar] [CrossRef]
- Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 November 2021).
- Chen, Z.; Zhang, F.; Liu, H.; Wang, L.; Zhang, Q.; Guo, L. Real-time detection algorithm of helmet and reflective vest based on improved YOLOv5. J. Real-Time Image Process. 2023, 20, 4. [Google Scholar] [CrossRef]
- Wu, D.; Jiang, S.; Zhao, E.; Liu, Y.; Zhu, H.; Wang, W.; Wang, R. Detection of Camellia oleifera fruit in complex scenes by using YOLOv7 and data augmentation. Appl. Sci. 2022, 12, 11318. [Google Scholar] [CrossRef]
- Jiang, K.; Xie, T.; Yan, R.; Yan, R.; Wen, X.; Li, D.; Jiang, H.; Jiang, N.; Feng, L.; Duan, X.; et al. An attention mechanism-improved YOLOv7 object detection algorithm for hemp duck count estimation. Agriculture 2022, 12, 1659. [Google Scholar] [CrossRef]
- Li, B.; Chen, Y.; Xu, H.; Fei, Z. Fast vehicle detection algorithm on lightweight YOLOv7-tiny. arXiv 2023, arXiv:2304.06002. [Google Scholar] [CrossRef]
- Kulyukin, V.A.; Kulyukin, A.V. Accuracy vs. energy: An assessment of bee object inference in videos from on-hive video loggers with YOLOv3, YOLOv4-Tiny, and YOLOv7-Tiny. Sensors 2023, 23, 6791. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N. End-to-end object detection with transformers. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar] [CrossRef]
- Kirillov, A.; Mintun, E.; Ravi, N.; et al. Segment anything. arXiv 2023, arXiv:2304.02643. [Google Scholar] [CrossRef]
- Huo, B.; Li, C.; Zhang, J.; Xue, Y.; Lin, J. SAFF-SSD: Self-attention combined feature fusion-based SSD for small object detection in remote sensing. Remote Sens. 2023, 15, 3027. [Google Scholar] [CrossRef]
- Betti, A.; Tucci, M. YOLO-S: A lightweight and accurate YOLO-like network for small target detection in aerial imagery. Sensors 2023, 23, 1865. [Google Scholar] [CrossRef]
- Lai, H.; Chen, L.; Liu, W.; Yan, Z.; Ye, S. STC-YOLO: Small object detection network for traffic signs in complex environments. Sensors 2023, 23, 5307. [Google Scholar] [CrossRef]
- Qu, J.; Tang, Z.; Zhang, L.; Zhang, Y.; Zhang, Z. Remote sensing small object detection network based on attention mechanism and multi-scale feature fusion. Remote Sens. 2023, 15, 2728. [Google Scholar] [CrossRef]
- Zhang, L.; Xiong, N.; Pan, X.; Yue, X.; Wu, P.; Guo, C. Improved Object Detection Method Utilizing YOLOv7-Tiny for Unmanned Aerial Vehicle Photographic Imagery. Algorithms 2023, 16, 520. [Google Scholar] [CrossRef]
- Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.S. Tiny object detection in aerial images. In Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021; pp. 3791–3798. [Google Scholar] [CrossRef]
- Chen, X.; Fang, H.; Lin, T.Y. Microsoft COCO captions: Data collection and evaluation server. arXiv 2015, arXiv:1504.00325. [Google Scholar] [CrossRef]
- Lu, G.; He, X.; Wang, Q.; Shao, F.; Wang, J.; Hao, L. MStrans: Multiscale Vision Transformer for Aerial Objects Detection. IEEE Access 2022, 10, 75971–75985. [Google Scholar] [CrossRef]
- Ni, S.; Lin, C.; Wang, H.; Li, Y.; Liao, Y.; Li, N. Learning geometric Jensen-Shannon divergence for tiny object detection in remote sensing images. Front. Neurorobot. 2023, 17, 1273251. [Google Scholar] [CrossRef]
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389. [Google Scholar] [CrossRef]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar] [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar] [CrossRef]
- Wang, C.Y.; Yeh, I.; Liao, H.Y.M. You only learn one representation: Unified network for multiple tasks. arXiv 2021, arXiv:2105.04206. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar] [CrossRef]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar] [CrossRef]
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
- Yang, X.; Zhang, G.; Yang, X.; Zhou, Y.; Wang, W.; Tang, J.; He, T.; Yan, J. Detecting rotated objects as Gaussian distributions and its 3-D generalization. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4335–4354. [Google Scholar] [CrossRef] [PubMed]
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528. [Google Scholar] [CrossRef]
- Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef] [PubMed]
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768. [Google Scholar] [CrossRef]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar] [CrossRef]
- Haroon, M.; Shahzad, M.; Fraz, M.M. Multisized object detection using spaceborne optical imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3032–3046. [Google Scholar] [CrossRef]
Figure 1.
YOLOv7-tiny network architecture.
Figure 2.
IOU calculation.
Figure 3.
Improved network architecture. The green dotted box represents Trick 1, in which the new loss function is used. The orange dotted box is Trick 2, in which Upsample in the structure is replaced by CARAFE. The red dotted box represents Trick 3, in which CBL in the structure is replaced by CSPSPP.
Figure 4.
CARAFE structure.
Figure 5.
CSPSPP structure.
Figure 6.
Sample images from AI-TOD dataset.
Figure 7.
Model training process.
Figure 8.
Comparison of training process curves for baseline and enhanced method.
Figure 9.
Comparison of YOLOv7-tiny and the proposed method for micro-target detection.
Figure 10.
Comparison of YOLOv7-tiny and the proposed method for dense-target detection.
Figure 11.
Comparison of YOLOv7-tiny and the proposed method for micro-target detection against complex backgrounds.
Table 1.
Experiment parameter configuration.
Name | Value |
---|---|
epochs | 800 |
batch_size | 32 |
lr0 | 0.05 |
lrf | 0.1 |
momentum | 0.937 |
img_size | 640 |
Table 2.
Improvement point ablation experiment. The best results are shown in bold.
Method | Loss | CARAFE | CSPSPP | mAP0.5/% | mAP0.5:0.95/% | FPS | Params/M | GFLOPs |
---|---|---|---|---|---|---|---|---|
YOLOv7-tiny | | | | 42.0 | 16.8 | 161 | 6.02 | 13.1 |
Method 1 | √ | | | 46.7 | 17.9 | 156 | 6.03 | 13.1 |
Method 2 | | √ | | 44.5 | 17.5 | 156 | 6.04 | 13.3 |
Method 3 | | | √ | 45.5 | 17.9 | 149 | 6.47 | 18.7 |
Method 4 | √ | √ | | 47.5 | 18.1 | 147 | 6.05 | 13.3 |
Method 5 | | √ | √ | 46.4 | 18.2 | 142 | 6.49 | 18.8 |
Method 6 | √ | √ | √ | 48.7 | 18.9 | 139 | 6.49 | 18.8 |
Table 3.
Comparative experiment results for different networks. The best results are shown in bold.
Method | mAP0.5/% | mAP0.5:0.95/% | Model Size/MB | FPS |
---|---|---|---|---|
SSD-512 [6] | 21.7 | 7.0 | - | - |
RetinaNet [42] | 13.6 | 4.7 | - | - |
CenterNet [7] | 39.2 | 13.4 | - | - |
Faster R-CNN [5] | 26.3 | 11.1 | 236.33 | 16 |
ATSS [43] | 30.6 | 12.8 | 244.56 | 13 |
Cascade R-CNN [44] | 30.8 | 13.8 | 319.45 | 11 |
JSDNet [29] | 52.5 | 21.4 | 88 | 62 |
YOLOv5s [13] | 42.2 | 18.6 | 29.97 | 102 |
PDWT-YOLO [25] | 45.6 | 18.2 | 12.9 | 141 |
YOLOv7-tiny | 42.0 | 16.8 | 12.3 | 161 |
Proposed method | 48.7 | 18.9 | 13.3 | 139 |
Table 4.
Comparison of different CIOU/NWD ratios. The best results are shown in bold.
CIOU | NWD | mAP0.5/% | mAP0.5:0.95/% |
---|---|---|---|
1 | 0 | 42.0 | 16.8 |
0.75 | 0.25 | 44.9 | 17.7 |
0.5 | 0.5 | 46.1 | 17.8 |
0.25 | 0.75 | 46.7 | 17.9 |
0 | 1 | 45.6 | 16.6 |
Table 5.
Comparison of CARAFE hyperparameter experiment results. The best results are shown in bold.
k_encoder | k_up | mAP0.5/% | mAP0.5:0.95/% | GFLOPs |
---|---|---|---|---|
1 | 3 | 44.3 | 17.5 | 13.1 |
3 | 5 | 44.5 | 17.4 | 13.3 |
5 | 7 | 42.8 | 17.5 | 14.6 |
Table 6.
Comparison of results for various CSPSPP positions. The best results are shown in bold.
Position | mAP0.5/% | mAP0.5:0.95/% | GFLOPs |
---|---|---|---|
1 | 45.3 | 17.7 | 18.7 |
2 | 45.5 | 17.9 | 18.7 |
3 | 43.6 | 17.4 | 24.3 |
Table 7.
Comparison of YOLOv7-tiny and proposed algorithm results based on SIMD dataset.
Method | mAP0.5/% | mAP0.5:0.95/% | APS/% | APM/% | APL/% |
---|---|---|---|---|---|
YOLOv7-tiny | 80.2 | 63.3 | 9.8 | 54.8 | 68.2 |
Proposed | 81.7 | 64.1 | 12.6 | 55.6 | 70.1 |