Journal of Marine Science and Engineering (JMSE)
  • Article
  • Open Access

7 May 2023

Underwater-YCC: Underwater Target Detection Optimization Algorithm Based on YOLOv7

1 School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
2 School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
* Authors to whom correspondence should be addressed.

Abstract

Underwater target detection using optical images is a challenging yet promising area that has witnessed significant progress. However, fuzzy distortions and irregular light absorption in the underwater environment often lead to image blur and color bias, particularly for small targets. Consequently, existing methods have yet to yield satisfactory results. To address this issue, we propose the Underwater-YCC optimization algorithm based on You Only Look Once (YOLO) v7 to enhance the accuracy of detecting small targets underwater. Our algorithm utilizes the Convolutional Block Attention Module (CBAM) to obtain fine-grained semantic information by selecting an optimal position through multiple experiments. Furthermore, we employ the Conv2Former as the Neck component of the network for underwater blurred images. Finally, we apply the Wise-IoU, which is effective in improving detection accuracy by assigning multiple weights between high- and low-quality images. Our experiments on the URPC2020 dataset demonstrate that the Underwater-YCC algorithm achieves a mean Average Precision (mAP) of up to 87.16% in complex underwater environments.

1. Introduction

The ocean is the largest repository of resources on Earth, and its related industries, such as marine ranching, are constantly improving due to the rapid development of underwater equipment. A crucial step in resource extraction and utilization is detection. New technologies, such as artificial intelligence, have provided significant impetus to improve detection. While many studies on underwater target detection are based on acoustic detection methods [1], these methods are inadequate for detecting small-sized underwater organisms due to their low sound source level, which can easily be drowned out by background noise. Additionally, the feature diversity in acoustic detection methods may not meet the demand for distinguishing small differences between underwater organisms. For this reason, optical images are more suitable for detecting small targets at close range, as they contain rich features of the target.
However, the complex underwater environment can seriously degrade optical images, and the quality of underwater images is generally poor. The primary reason is the complexity and variability of underwater lighting conditions [2]. Specifically, (i) water attenuates light at wavelength-dependent rates, fastest for red light and slowest for blue light, which gives underwater images a blue-green tone and distorts their colors. (ii) Different colors scatter in water to varying degrees and in different ways, causing loss of fine image details. (iii) Real-life water bodies are often turbid, containing sediment and plankton, which degrade the imaging quality of underwater cameras and blur the images. (iv) Underwater organisms usually attach to mud, sand, and reefs, so they are difficult to distinguish from the background, and the way these organisms are distributed also causes target occlusion. All of these factors pose significant challenges to underwater target detection, and traditional target detection algorithms are often less robust, more costly, and unsuitable for complex underwater environments [3].
Deep learning has demonstrated remarkable success in feature extraction, reducing the impact of errors caused by human factors. Its high speed and generalization make it widely used in many fields [4]. Deep-learning-based target detection algorithms can be broadly classified into two main categories. The first is a two-stage algorithm [5,6,7,8], which generates candidate regions on an image to determine if they contain a target. If a target is detected, the candidate region is classified with bounding box regression. However, the two-stage algorithm involves significant repetitive computation operations [9], leading to slow inference speed.
The second category is the one-stage algorithm, which completes target localization and regression directly on the image. OverFeat [10] was among the earliest one-stage detectors to be developed. Subsequently, the YOLO series [11,12,13,14] has demonstrated strong performance in practical engineering. In recent years, many researchers have applied YOLO networks to underwater target detection projects. Zhao et al. [15] proposed an underwater target detection algorithm, YOLO-UOD, based on YOLOv4-tiny. This algorithm introduced a symmetric FPN-attention module in the Neck architecture to achieve more efficient feature fusion and added a label-smoothing training strategy. This approach demonstrated superior detection performance. Zhang et al. [16] combined MobileNet V2 and depthwise separable convolution to reduce the number of model parameters while using an improved AFFM for better fusion, achieving a balance between time and accuracy for underwater target detection. Li et al. [17] improved the feature extraction capability by embedding the triplet attention mechanism into the Neck structure of YOLOv5 and optimized the detection head to capture small-sized objects. This approach demonstrated good performance in detecting underwater organisms. Zhai et al. [18] added the CBAM module to YOLOv5s to save parameters and computing power. They also increased the number of detection layers in the Head network by increasing the number of up-sampled layers in the Neck structure, thereby improving the accuracy of sea cucumber detection. Liu et al. [19] added CBAM to CSPDarknet53 to enhance the feature extraction of occluded and overlapping targets. Additionally, they used SAGHS to recover underwater images and finally obtained a detection model suitable for occluded underwater targets.
Overall, these studies demonstrate the potential of YOLO-based algorithms for underwater object detection and the importance of optimizing network architectures and training strategies for specific applications.
In this paper, we propose a novel optimization algorithm, termed Underwater-YCC (YOLOv7 with CBAM and Conv2Former, YCC), for improving the accuracy of underwater target detection. Experimental results on the URPC2020 dataset demonstrate that Underwater-YCC outperforms YOLOv7 in terms of detection accuracy. The main innovations are as follows:
  • Underwater data collection poses challenges due to the poor image quality and limited number of learnable samples. To overcome these challenges, this paper adopts data-enhancement methods, including random flipping, stretching, mosaic enhancement, and mixup, to enrich the learnable samples of the model. This approach improves the generalization ability of the model and helps to prevent overfitting.
  • In order to extract more comprehensive semantic information and enhance the feature extraction capability of the model, we incorporate the CBAM attention mechanism into each component of the YOLOv7 architecture. Specifically, we introduce the CBAM attention mechanism into the Backbone, Neck, and Head structures, respectively, to identify the most effective location for the attention mechanism. Our experimental results reveal that embedding the CBAM attention mechanism into the Neck structure yields the best performance, as it allows the model to capture fine-grained semantic information and more effectively detect targets.
  • To enhance the ability of the model to detect objects in underwater images with poor quality, this paper introduces Conv2Former as the Neck component of the network. The Conv2Former model can effectively handle images with different resolutions and extract useful features for fusion, thereby improving the overall detection performance of the network on blurred underwater images.
  • As low-quality underwater images can negatively affect the model’s generalization ability, this paper introduces Wise-IoU as a bounding box regression loss function. This function improves the detection accuracy of the model by weighing the learning of samples of different qualities, resulting in more accurate localization and regression of targets in low-quality underwater images.
The paper is organized as follows. Section 2 focuses on the work related to this algorithm, with emphasis on the data enhancement approach and the YOLOv7 architecture. Section 3 introduces the content of the proposed Underwater-YCC algorithm. In Section 4 the relevant experimental results are analyzed and discussed. Section 5 presents conclusions.

3. Underwater-YCC Algorithm

In this section, the Underwater-YCC target detection algorithm is introduced. The main structure diagram of this algorithm is shown in Figure 14.
Figure 14. The architecture of Underwater-YCC.

3.1. YOLOv7 with CBAM

In the field of target detection, there is no single rule for where an attention mechanism should be added to achieve the best results, and the results vary from location to location. For YOLOv7, we compare three placements, one in each of the three modules Backbone, Neck, and Head. The first is to add the attention mechanism to the Backbone, the part of the network where the features are extracted. Fusing attention at this location can help the network extract more effective information and locate fine-grained features more easily, thus improving the overall performance of the network. The second is to add the attention mechanism to the Neck, where the features are integrated and extracted. When fusing information at different scales, the attention mechanism can help the network fuse more valuable information into the features and thereby refine them. The last is to add the attention mechanism to the Head, which performs feature classification and regression prediction. Here the attention mechanism is applied to the feature maps at each of the three scales before prediction, reconstructing the feature map with attention and ultimately improving the network performance. The three placements are shown in Figure 15.
Figure 15. Left: Incorporate an attention mechanism in the Backbone. Middle: Incorporate an attention mechanism in the Neck. Right: Incorporate an attention mechanism in the Head.
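For reference, the attention computation that is being placed in the Backbone, Neck, or Head can be written compactly. The block below restates the standard CBAM formulation from Woo et al. [23]; the symbols $F$, $M_c$, $M_s$, and the $7 \times 7$ convolution $f^{7\times 7}$ follow that paper and are not notation defined in this article.

```latex
\begin{aligned}
F'   &= M_c(F) \otimes F, \\
F''  &= M_s(F') \otimes F', \\
M_c(F) &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
M_s(F) &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big)
\end{aligned}
```

Here $F$ is the input feature map, $M_c$ is the channel attention map, $M_s$ is the spatial attention map, $\otimes$ is element-wise multiplication with broadcasting, and $\sigma$ is the sigmoid function. Because the output $F''$ has the same shape as $F$, the module can be inserted at any of the three locations without changing the surrounding layers.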

3.2. Neck Improvement Based on Conv2Former

The introduction of the transformer has given a huge boost to the field of computer vision, demonstrating powerful performance in areas such as image segmentation and target detection. A growing number of researchers encode spatial features by convolution, and Conv2Former is one of the most efficient methods of this kind. The structure of Conv2Former [25] is shown in Figure 16. It is a transformer-style convolutional network with a pyramidal structure and a different number of convolutional blocks in each of its four stages. Each stage has a different feature map resolution, and a patch-embedding block between two consecutive stages reduces the resolution. The core of the method is the convolutional modulation operation shown in Figure 17, which uses only depthwise convolutional features as weights to modulate the representation through a Hadamard product, simplifying the self-attention mechanism and making more efficient use of large-kernel convolution. Inspired by TPH-YOLOv5 [26], we replace the ELAN-F convolution block in the Neck of the original YOLOv7 with Conv2Former. Compared with the original structure, Conv2Former better captures global information and contextual semantic information, and thus provides richer features for the fusion operation, which improves network performance.
Figure 16. Overall architecture of Conv2Former.
Figure 17. Left: Transformer; Right: Convolutional modulation.
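The contrast between the two panels of Figure 17 can be summarized in equations. The block below follows the notation of the Conv2Former paper [25], which is an assumption about how the figure should be read rather than something stated in this article: $X$ is the input token map, $W_1$ and $W_2$ are linear ($1 \times 1$) projections, $\mathrm{DConv}_{k \times k}$ is a depthwise convolution with kernel size $k$, and $\odot$ is the Hadamard product.

```latex
\begin{aligned}
\text{Self-attention:}\quad & Z = A V, \qquad A = \mathrm{Softmax}\!\left(Q K^{\top}\right), \\
\text{Convolutional modulation:}\quad & Z = A \odot V, \qquad A = \mathrm{DConv}_{k \times k}(W_1 X), \qquad V = W_2 X
\end{aligned}
```

In self-attention the weight matrix $A$ grows quadratically with the number of spatial positions, whereas in convolutional modulation $A$ has the same shape as $V$ and is produced by a single large-kernel depthwise convolution, so the cost grows linearly with resolution.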

3.3. Introduction of Wise-IoU Bounding Box Loss Function

In the field of target detection, the choice of bounding box loss function directly affects the accuracy of the detection result. The bounding box loss function is used to minimize the error between the position of the detected object and that of the real object so that the predicted box is as close as possible to the real box. As the scenes and datasets encountered in practical underwater work are of poor quality, we propose the use of Wise-IoU as the bounding box loss function, thus balancing the contribution of training samples of varying quality to obtain a more accurate detection result. Wise-IoU [27] adds a dynamic focusing mechanism on top of the traditional IoU loss: each anchor box is weighted according to its quality, which reduces the impact of low-quality samples on the detection results. Wise-IoUv1 with a two-level attention mechanism is first constructed based on the distance metric with the following equations:
$$L_{\text{Wise-IoUv1}} = R_{\text{Wise-IoU}} \, L_{IoU}$$
$$R_{\text{Wise-IoU}} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{*}}\right)$$
An anchor box is represented by $B = [x\ y\ w\ h]$, where the values are the center coordinates and size of the bounding box, and $B_{gt} = [x_{gt}\ y_{gt}\ w_{gt}\ h_{gt}]$ holds the corresponding values of the target box. $W_g$ and $H_g$ are the width and height of the smallest box enclosing both the anchor box and the target box; the superscript $*$ indicates that they are detached from the computational graph so that they contribute no gradient. $R_{\text{Wise-IoU}}$ significantly amplifies the IoU loss $L_{IoU}$ of an ordinary-quality anchor box, and $L_{IoU}$ reduces $R_{\text{Wise-IoU}}$ for a high-quality anchor box. The method used in this paper applies Wise-IoU with the outlier $\beta$ on top of Wise-IoUv1. The outlier $\beta$ describes the quality of an anchor box, with a smaller outlier representing a higher-quality anchor box. A smaller gradient gain is assigned to anchor boxes with larger outliers, preventing low-quality samples from harming the training results. The outlier is defined as follows:
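To make the Wise-IoUv1 definition concrete, the following is a minimal pure-Python sketch for a single anchor box and target box, assuming boxes are given as (center x, center y, width, height) with positive sizes. The function names `corners`, `iou_loss`, and `wise_iou_v1` are illustrative and not taken from the paper's code; in a training framework the enclosing-box terms would additionally be detached from the gradient, which plain Python cannot show.

```python
import math

def corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def iou_loss(box, gt):
    """L_IoU = 1 - IoU for two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = corners(box)
    bx1, by1, bx2, by2 = corners(gt)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box[2] * box[3] + gt[2] * gt[3] - inter
    return 1.0 - inter / union

def wise_iou_v1(box, gt):
    """L_WIoUv1 = R_WIoU * L_IoU, with R_WIoU from the center distance."""
    ax1, ay1, ax2, ay2 = corners(box)
    bx1, by1, bx2, by2 = corners(gt)
    # Width and height of the smallest box enclosing both boxes (W_g, H_g).
    w_g = max(ax2, bx2) - min(ax1, bx1)
    h_g = max(ay2, by2) - min(ay1, by1)
    # Squared distance between the two box centers.
    dist2 = (box[0] - gt[0]) ** 2 + (box[1] - gt[1]) ** 2
    r = math.exp(dist2 / (w_g ** 2 + h_g ** 2))
    return r * iou_loss(box, gt)
```

For a perfectly aligned prediction the distance term is zero, so the penalty factor equals 1 and the loss reduces to the plain IoU loss of zero. As the centers drift apart, the factor grows above 1 and amplifies the IoU loss.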
$$\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \in [0, +\infty)$$
The Wise-IoU used is defined as follows:
$$L_{\text{Wise-IoU}} = r \, L_{\text{Wise-IoUv1}}, \quad r = \frac{\beta}{\delta \alpha^{\beta - \delta}}$$
Here $\delta$ makes $r = 1$ when $\beta = \delta$. The anchor box has the highest gradient gain when its outlier equals a fixed value. According to Equation (7), the criteria for dividing the anchor boxes are dynamic, so Wise-IoU can use the best gradient gain allocation strategy and improve the positioning accuracy of the model.
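The behavior of the gradient gain $r = \beta / (\delta \alpha^{\beta - \delta})$ is easier to see numerically. The sketch below is illustrative only: the function names are hypothetical, and the default values `alpha=1.9` and `delta=3.0` are assumptions chosen for demonstration, since this article does not state the hyperparameters it used.

```python
import math

def outlier(l_iou, mean_l_iou):
    """Outlier beta: the box's IoU loss relative to the running mean IoU loss."""
    return l_iou / mean_l_iou

def gradient_gain(beta, alpha=1.9, delta=3.0):
    """Gradient gain r = beta / (delta * alpha ** (beta - delta)).

    alpha and delta are assumed example values, not the paper's settings.
    """
    return beta / (delta * alpha ** (beta - delta))

def wise_iou(l_iou, l_wiou_v1, mean_l_iou, alpha=1.9, delta=3.0):
    """L_WIoU = r * L_WIoUv1 for one anchor box."""
    beta = outlier(l_iou, mean_l_iou)
    return gradient_gain(beta, alpha, delta) * l_wiou_v1

# Setting dr/dbeta = 0 gives the outlier with the largest gain: beta = 1 / ln(alpha).
peak_beta = 1.0 / math.log(1.9)
```

With these assumed values, `gradient_gain(3.0)` equals 1 because $\beta = \delta$, the gain peaks at `peak_beta`, and it decays toward zero for very large outliers, so the lowest-quality anchor boxes receive little gradient. Anchor boxes with $\beta$ close to zero are also down-weighted, which keeps already well-fitted boxes from dominating training.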

4. Experiments

4.1. Experimental Platform

The experimental environment of this paper is shown in Table 1.
Table 1. Experimental environment and parameters.

4.2. Evaluation Metrics

In this paper, precision, recall, F1 score, and mAP are selected as the metrics to evaluate the performance of the model. A prediction that matches the ground truth is a true positive (TP) if the predicted sample is positive and a true negative (TN) if it is negative. A prediction that does not match the ground truth is a false positive (FP) if the predicted sample is positive and a false negative (FN) if it is negative. Precision, recall, and F1 score are calculated as follows:
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
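As a quick illustration of the three formulas, the sketch below computes them from raw counts. The function name and the example counts are hypothetical and are not results from this paper; returning 0.0 for an empty denominator is a convention chosen here to avoid division by zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 80 correct detections, 20 false alarms, 10 missed targets.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
```

With these counts, precision is 80/100, recall is 80/90, and F1 simplifies to 2TP / (2TP + FP + FN) = 160/190.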
AP is the area under the precision-recall (PR) curve, which summarizes precision over all recall levels, and mAP is the mean of the AP values over all classes. These metrics can be expressed as:
$$AP = \int_0^1 P(R) \, dR$$
$$mAP = \frac{1}{class\_num} \sum_{i=1}^{class\_num} \int_0^1 P_i(R) \, dR$$
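In practice the integral over the PR curve is approximated from a finite list of detections. The sketch below shows one common way to do this, assuming detections are already sorted by descending confidence and matched to ground truth. The monotone precision envelope is a widely used convention and an assumption here, since the article does not say which interpolation its evaluation code used; the function names are illustrative.

```python
def average_precision(is_tp, num_gt):
    """Approximate AP as the area under the PR curve.

    is_tp: list of booleans, one per detection, sorted by descending confidence.
    num_gt: number of ground-truth objects of this class.
    """
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * precisions[i]
    return ap

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For example, three detections scored as hit, miss, hit against two ground-truth objects give recall steps at 0.5 and 1.0 with envelope precisions 1 and 2/3, so AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.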

4.3. Experimental Results and Analysis

The results in this section are obtained experimentally on the URPC2020 dataset. The mislabeled images in this dataset are re-labeled, the overly blurred images are filtered out, and the final experimental results are obtained on the optimized dataset.

4.3.1. Data Augmentation

Experiments were conducted using different data enhancement methods on the original structure of YOLOv7. As Table 2 shows, the mAP of the trained model was only 64.59% when no data enhancement method was used. It increased by 4.91 and 17.38 percentage points after training with mixup and mosaic, respectively, and by 21.08 percentage points when the two enhancement methods were used together. The experimental results show that both data augmentation methods help train the model, and that using them together greatly improves the detection accuracy of the model.
Table 2. Data Augmentation.
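As a reminder of what mixup does, the following is a minimal sketch on tiny grayscale images stored as nested lists. It assumes the detection variant of mixup in which the box labels of both images are kept, and it draws the blending weight from a Beta distribution as in the mixup paper [21]. The function names and the `alpha=1.5` default are hypothetical and are not the settings used in this article.

```python
import random

def mixup(img_a, boxes_a, img_b, boxes_b, lam):
    """Blend two equally sized images pixel by pixel and keep both label sets."""
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    return mixed, boxes_a + boxes_b

def sample_lambda(alpha=1.5, rng=random):
    """Draw the blending weight from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)

# Toy 1x2 images with one labeled box each: (class, cx, cy, w, h).
image, boxes = mixup(
    [[0.0, 100.0]], [("scallop", 0, 0, 1, 1)],
    [[100.0, 0.0]], [("starfish", 1, 0, 1, 1)],
    lam=0.25,
)
```

The blended image contains both scenes at reduced contrast, and the model is asked to detect the objects of both, which is one reason mixup acts as a regularizer when labeled underwater samples are scarce.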

4.3.2. Fusion Attention Mechanism Comparison Test

To find the best combination of model and attention mechanism, CBAM was added to the Backbone, Neck, and Head parts of YOLOv7 in turn. Table 3 shows the experimental results. Adding CBAM improved the recognition accuracy of the network, with the best result being 86.68% at the Neck, where both accuracy and recall were higher than in the original model. The results also show that CBAM does not help in every part of the network. In the Head part, because the model is deeper, the underlying semantic information has already been lost, and applying further attention weighting to the few remaining features yields little benefit, so many metrics decreased. The best embedding results are obtained in the Neck part, where the attentional weighting of the feature maps of different dimensions is more effective at obtaining fine-grained semantic information. This helps the network to focus on the detection target and thus produces the most significant effect.
Table 3. Fusion Attention Mechanism.

4.3.3. Ablation Experiments

In order to verify the effectiveness of each improved method for underwater target detection, the effect of different modules on detection results is analyzed by ablation experiments. Among them, YOLOv7_A adds CBAM to the Neck, YOLOv7_B uses Conv2Former to improve the Neck, YOLOv7_C uses Wise-IoU, YOLOv7_D uses both CBAM and Wise-IoU, and YOLOv7_E uses both Conv2Former and Wise-IoU. Underwater-YCC is the underwater target detection method proposed in this paper.
Table 4 shows that every module improves on the original YOLOv7, indicating that all the enhancement methods used in this paper are effective for underwater detection. (1) The results of the three single-method experiments (a–c) show that each optimization method improves on YOLOv7. Adding Conv2Former improves the mAP of the network by 0.85%, which means that the Conv2Former module can capture the global information of the network well and retain the semantic information. The introduction of CBAM gives the network the ability to acquire more valuable features for fusion. The 0.88% improvement from Wise-IoU means that this method allows the network to focus more on effective features and to weight images of different quality better. (2) The results of experiments (d,e) show that combining Wise-IoU with CBAM and with Conv2Former improves the mAP by 1.17% and 1.26%, respectively, compared to YOLOv7, indicating that this bounding box loss function remains effective when combined with the other optimization methods. (3) Combining the above optimization methods, this paper proposes the Underwater-YCC optimization algorithm, which adds CBAM while using Conv2Former for Neck feature fusion and uses Wise-IoU for bounding box regression loss. This model improves the mAP by 1.49% compared to the original YOLOv7. The results show that the Underwater-YCC method can perform high-quality detection in complex underwater environments.
Table 4. Ablation Experiments.
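The figures quoted in the text can be cross-checked with simple arithmetic. The sketch below assumes that every reported improvement is an absolute difference in mAP percentage points; the implied per-variant mAP values are derived under that assumption and are not copied from Table 4. All variable names are illustrative.

```python
# Figures quoted in the text (mAP in percent, gains in percentage points).
MAP_NO_AUGMENTATION = 64.59
GAIN_MIXUP_AND_MOSAIC = 21.08
MAP_UNDERWATER_YCC = 87.16
GAIN_UNDERWATER_YCC = 1.49

# Two independent routes to the YOLOv7 baseline mAP.
baseline_from_augmentation = round(MAP_NO_AUGMENTATION + GAIN_MIXUP_AND_MOSAIC, 2)
baseline_from_ycc = round(MAP_UNDERWATER_YCC - GAIN_UNDERWATER_YCC, 2)

# Gains over YOLOv7 as stated in the ablation discussion.
reported_gains = {
    "YOLOv7_B (Conv2Former)": 0.85,
    "YOLOv7_C (Wise-IoU)": 0.88,
    "YOLOv7_D (CBAM + Wise-IoU)": 1.17,
    "YOLOv7_E (Conv2Former + Wise-IoU)": 1.26,
    "Underwater-YCC": 1.49,
}

# Implied mAP per variant, derived here under the stated assumption.
implied_map = {
    name: round(baseline_from_ycc + gain, 2)
    for name, gain in reported_gains.items()
}
```

Both routes give a baseline of 85.67%, which supports reading the reported improvements as percentage-point differences rather than relative changes.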
Figure 18 compares the test results of Underwater-YCC with those of YOLOv7. Figure 18a shows the detection result of YOLOv7 and Figure 18b shows the detection result of Underwater-YCC. The figures show that our proposed model detects more targets than the original model and performs better in complex underwater environments.
Figure 18. Comparison of experimental effects, (a) YOLOv7; (b) Underwater-YCC.

4.3.4. Target Detection Network Comparison Experiment Results

Table 5 compares the results of Underwater-YCC with classical target detection algorithms, such as Faster R-CNN [28], YOLOv3, YOLOv5s, YOLOv6 [29], and YOLOv7-Tiny. The results show that although the detection time increases slightly due to the complex structure of the model, Underwater-YCC has higher detection accuracy and is more adaptable to the complex underwater environment.
Table 5. Comparison with classical target detection algorithms.

5. Conclusions

In this study, we addressed the challenges of false and missed detection caused by blurred underwater images and the small size of underwater creatures. To tackle these issues, we proposed an underwater target detection algorithm called Underwater-YCC based on YOLOv7. We tested our algorithm on the URPC2020 dataset, which includes underwater images of echinus, holothurian, scallop, and starfish categories.
Our proposed algorithm leverages various techniques to improve detection accuracy. Firstly, we reorganized and labeled the dataset to better suit our needs. Secondly, we embedded the attention mechanism in the Neck part of YOLOv7 to improve the detection ability of the model. Thirdly, we used Conv2Former to enable the network to obtain features that are more valuable and fuse them efficiently. Lastly, we used Wise-IoU for bounding box regression calculation to effectively avoid the drawbacks caused by the large sample gap.
Experimental results demonstrate that the Underwater-YCC algorithm can achieve improved detection accuracy under the same dataset. Our approach also exhibits robustness in the case of blurring and color bias. However, there is still ample room for improving the whole network structure, and the real-time and lightweight aspects of the underwater target detection technology need to be studied further. The proposed algorithm is promising and may serve as a starting point for future research in the field of underwater target detection.

Author Contributions

Conceptualization, X.C. and M.Y.; Formal analysis, Q.Y., H.Y. and H.W.; Funding acquisition, X.C. and H.W.; Investigation, Q.Y.; Methodology, X.C. and M.Y.; Resources, Q.Y. and H.Y.; Software, M.Y.; Validation, M.Y.; Writing—original draft, X.C. and M.Y.; Writing—review & editing, X.C., M.Y. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key project of National Natural Science Foundation of China, grant number 62031021, and Natural Science Foundation of Shaanxi Province, China, grant number 20JK0532.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data and results supporting the findings of this study can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sarkar, P.; De, S.; Gurung, S. A Survey on Underwater Object Detection. In Intelligence Enabled Research; Springer: Singapore, 2022; pp. 91–104.
  2. Jian, M.; Liu, X.; Luo, H.; Lu, X.; Yu, H.; Dong, J. Underwater image processing and analysis: A review. Signal Process. Image Commun. 2021, 91, 116088.
  3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  4. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
  5. Uijlings, J.R.R.; Van De Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
  6. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
  7. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  8. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
  9. Deng, J.; Xuan, X.; Wang, W.; Li, Z.; Yao, H.; Wang, Z. A review of research on object detection based on deep learning. J. Phys. Conf. Ser. 2020, 1684, 012028.
  10. Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv 2013, arXiv:1312.6229.
  11. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  12. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
  13. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  14. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  15. Zhao, S.; Zheng, J.; Sun, S.; Zhang, L. An Improved YOLO Algorithm for Fast and Accurate Underwater Object Detection. Symmetry 2022, 14, 1669.
  16. Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight underwater object detection based on yolo v4 and multi-scale attentional feature fusion. Remote Sens. 2021, 13, 4706.
  17. Li, Y.; Bai, X.; Xia, C. An Improved YOLOV5 Based on Triplet Attention and Prediction Head Optimization for Marine Organism Detection on Underwater Mobile Platforms. J. Mar. Sci. Eng. 2022, 10, 1230.
  18. Zhai, X.; Wei, H.; He, Y.; Shang, Y.; Liu, C. Underwater Sea Cucumber Identification Based on Improved YOLOv5. Appl. Sci. 2022, 12, 9105.
  19. Liu, Z.; Zhuang, Y.; Jia, P.; Wu, C.; Xu, H.; Liu, Z. A Novel Underwater Image Enhancement and Improved Underwater Biological Detection Pipeline. J. Mar. Sci. Eng. 2022, 10, 1204.
  20. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60.
  21. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
  22. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
  23. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  24. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
  25. Hou, Q.; Lu, C.Z.; Cheng, M.M.; Feng, J. Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition. arXiv 2022, arXiv:2211.11943.
  26. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788.
  27. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051.
  28. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  29. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
