UO-YOLO: Ureteral Orifice Detection Network Based on YOLO and Biformer Attention Mechanism
Abstract
1. Introduction
- Rebuilding the feature-extraction backbone of the detection network with ConvNeXt V2, adapted to the practical clinical scenario.
- Introducing the SCConv convolutional structure, which mitigates the impact of redundant features through spatial and channel reconstruction, to optimize the network structure.
- Introducing an attention mechanism in the feature-fusion stage to improve detection performance.
- Replacing the original loss function with one better suited to the usage scenario, further improving detection performance.
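To make the loss-function change concrete, below is a minimal Python sketch of the Wise-IoU v1 penalty introduced by Tong et al. (arXiv:2301.10051), which the paper's reference list points to: the plain IoU loss is scaled by a distance-based focusing factor computed from the smallest enclosing box. This is an illustration only, not the authors' exact implementation; the full method adds a dynamic focusing mechanism, and in a training framework the enclosing-box denominator is detached from the gradient graph.

```python
import math

def iou_xyxy(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def wiou_v1(pred, gt):
    """WIoU v1 loss: exp(center distance^2 / enclosing-box diagonal^2) * (1 - IoU).

    Far-off predictions are penalized more strongly than a plain IoU loss
    would penalize them, while well-aligned boxes are left almost unchanged.
    """
    # center offsets between predicted and ground-truth boxes
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # dimensions of the smallest box enclosing both
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg + 1e-9))
    return r_wiou * (1.0 - iou_xyxy(pred, gt))
```

A perfectly aligned prediction yields a loss of (essentially) zero, and the loss grows monotonically as same-sized boxes drift apart.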
2. Materials and Methods
2.1. Dataset
2.2. Network Structure
2.2.1. YOLOv5
2.2.2. Backbone
2.2.3. BiFormer
2.2.4. Loss Functions
3. Experiments
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
- Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516. [Google Scholar]
- Zhao, Q.; Sheng, T.; Wang, Y.; Tang, Z.; Chen, Y.; Cai, L.; Ling, H. M2det: A single-shot object detector based on multi-level feature pyramid network. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 9259–9266. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
- Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. Dssd: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
- Li, Z.; Zhou, F. FSSD: Feature fusion single shot multibox detector. arXiv 2017, arXiv:1712.00960. [Google Scholar]
- Terven, J.; Cordova-Esparza, D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv 2023, arXiv:2304.00501. [Google Scholar]
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
- Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. Bam: Bottleneck attention module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Li, J.; Wen, Y.; He, L. SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference On Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (PMLR), online, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
- Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10323–10333. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Nisha, J.S.; Gopi, V.P.; Palanisamy, P. Automated colorectal polyp detection based on image enhancement and dual-path CNN architecture. Biomed. Signal Process. Control. 2022, 73, 103465. [Google Scholar] [CrossRef]
- Luca, M.; Ciobanu, A. Polyp detection in video colonoscopy using deep learning. J. Intell. Fuzzy Syst. 2022, 43, 1751–1759. [Google Scholar] [CrossRef]
- Pacal, I.; Karaman, A.; Karaboga, D.; Akay, B.; Basturk, A.; Nalbantoglu, U.; Coskun, S. An efficient real-time colonic polyp detection with YOLO algorithms trained by using negative samples and large datasets. Comput. Biol. Med. 2022, 141, 105031. [Google Scholar] [CrossRef]
- Kim, D.; Cho, H.C.; Cho, H. Gastric lesion classification using deep learning based on fast and robust fuzzy C-means and simple linear iterative clustering superpixel algorithms. J. Electr. Eng. Technol. 2019, 14, 2549–2556. [Google Scholar] [CrossRef]
- Ding, Z.; Shi, H.; Zhang, H.; Meng, L.; Fan, M.; Han, C.; Zhang, K.; Ming, F.; Xie, X.; Liu, H.; et al. Gastroenterologist-level identification of small-bowel diseases and normal variants by capsule endoscopy using a deep-learning model. Gastroenterology 2019, 157, 1044–1054.e5. [Google Scholar] [CrossRef]
- An, N.S.; Lan, P.N.; Hang, D.V.; Long, D.V.; Trung, T.Q.; Thuy, N.T.; Sang, D.V. BlazeNeo: Blazing fast polyp segmentation and neoplasm detection. IEEE Access 2022, 10, 43669–43684. [Google Scholar] [CrossRef]
- Tang, C.P.; Chang, H.Y.; Wang, W.C.; Hu, W.X. A Novel Computer-Aided Detection/Diagnosis System for Detection and Classification of Polyps in Colonoscopy. Diagnostics 2023, 13, 170. [Google Scholar] [CrossRef]
- Lazo, J.F.; Marzullo, A.; Moccia, S.; Catellani, M.; Rosa, B.; Calimeri, F.; De Momi, E. A Lumen Segmentation Method in Ureteroscopy Images based on a Deep Residual U-Net architecture. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021. [Google Scholar] [CrossRef]
- Gupta, S.; Ali, S.; Goldsmith, L.; Turney, B.; Rittscher, J. Multi-class motion-based semantic segmentation for ureteroscopy and laser lithotripsy. Comput. Med. Imaging Graph. 2022, 101, 102112. [Google Scholar] [CrossRef]
- Black, K.M.; Law, H.; Aldoukhi, A.H.; Roberts, W.W.; Deng, J.; Ghani, K.R. Deep learning computer vision algorithm for detecting kidney stone composition: Towards an automated future. Eur. Urol. Suppl. 2019, 18, e853–e854. [Google Scholar] [CrossRef]
- Zhu, G.; Li, C.; Guo, Y.; Sun, L.; Jin, T.; Wang, Z.; Li, S.; Zhou, F. Predicting stone composition via machine-learning models trained on intra-operative endoscopic digital images. BMC Urol. 2024, 24, 5. [Google Scholar] [CrossRef]
- Elton, D.C.; Turkbey, E.B.; Pickhardt, P.J.; Summers, R.M. A deep learning system for automated kidney stone detection and volumetric segmentation on noncontrast CT scans. Med. Phys. 2022, 49, 2545–2554. [Google Scholar] [CrossRef]
- El Beze, J.; Mazeaud, C.; Daul, C.; Ochoa-Ruiz, G.; Daudon, M.; Eschwège, P.; Hubert, J. Evaluation and understanding of automated urinary stone recognition methods. BJU Int. 2022, 130, 786–798. [Google Scholar] [CrossRef]
- Peng, X.; Liu, D.; Li, Y.; Xue, W.; Qian, D. Real-time detection of ureteral orifice in urinary endoscopy videos based on deep learning. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1637–1640. [Google Scholar]
- Liu, D.; Peng, X.; Liu, X.; Li, Y.; Bao, Y.; Xu, J.; Bian, X.; Xue, W.; Qian, D. A real-time system using deep learning to detect and track ureteral orifices during urinary endoscopy. Comput. Biol. Med. 2021, 128, 104104. [Google Scholar] [CrossRef]
- Ren, S.; Zhou, D.; He, S.; Feng, J.; Wang, X. Shunted self-attention via multi-scale token aggregation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10853–10862. [Google Scholar]
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
- Ultralytics. YOLOv5, PyTorch Implementation of YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 25 May 2024).
| Part | Color Format | Picture Format | Cases | Images |
|---|---|---|---|---|
| Train | RGB | JPG | 84 | 820 |
| Val | RGB | JPG | 20 | 223 |
| Parameter | Description | Value |
|---|---|---|
| fl_gamma | focal loss gamma | 1.5 |
| hsv_h | HSV-Hue augmentation | 0.015 |
| hsv_s | HSV-Saturation augmentation | 0.7 |
| hsv_v | HSV-Value augmentation | 0.4 |
| degrees | image rotation | 0.0 |
| translate | image translation | 0.1 |
| scale | image scale | 0.5 |
| shear | image shear | 0.01 |
| perspective | image perspective | 0.0 |
| flipud | image flip up-down | 0.5 |
| fliplr | image flip left-right | 0.5 |
| mosaic | image mosaic | 0.0 |
| mixup | image mixup | 0.0 |
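For reference, the table above can be mirrored as a plain dictionary in the key style used by YOLOv5's `hyp.*.yaml` files. This is a sketch of the reported settings, not the authors' actual configuration file; note that mosaic and mixup are disabled, which is plausible for endoscopic frames where stitched composites would not resemble real input.

```python
# Training hyperparameters as reported in the paper's table,
# using YOLOv5-style hyp keys (values are the table's, comments paraphrase it).
hyp = {
    "fl_gamma": 1.5,     # focal loss gamma
    "hsv_h": 0.015,      # hue augmentation gain
    "hsv_s": 0.7,        # saturation augmentation gain
    "hsv_v": 0.4,        # value augmentation gain
    "degrees": 0.0,      # rotation (disabled)
    "translate": 0.1,    # translation fraction
    "scale": 0.5,        # scale gain
    "shear": 0.01,       # shear
    "perspective": 0.0,  # perspective warp (disabled)
    "flipud": 0.5,       # vertical flip probability
    "fliplr": 0.5,       # horizontal flip probability
    "mosaic": 0.0,       # mosaic augmentation disabled
    "mixup": 0.0,        # mixup augmentation disabled
}
```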
| Module | Precision | Recall | mAP@0.5 |
|---|---|---|---|
| YOLOv5-p1 | 0.819 | 0.811 | 0.835 |
| YOLOv5-p2 | 0.858 | 0.803 | 0.851 |
| YOLOv5-p3 | 0.838 | 0.798 | 0.864 |
| Ours | 0.928 | 0.756 | 0.896 |
| Module | GFLOPs | Inference Time | Precision | Recall | mAP@0.5 |
|---|---|---|---|---|---|
| YOLOv8 | 78.7 | 3.5 ms | 0.830 | 0.771 | 0.850 |
| YOLOv7 | 105.1 | 4.6 ms | 0.885 | 0.762 | 0.815 |
| YOLOv5 | 47.9 | 3.7 ms | 0.894 | 0.758 | 0.828 |
| Ours | 59.8 | 5.7 ms | 0.928 | 0.756 | 0.896 |
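The comparison table shows a precision/recall trade-off: the proposed model gains precision and mAP@0.5 while recall drops slightly. As a quick sanity check of that trade-off, the F1 score (harmonic mean of precision and recall, F1 = 2PR/(P+R)) can be computed directly from the reported values; this derived metric is our addition, not one reported in the paper.

```python
# Precision/recall pairs taken from the comparison table above.
results = {
    "YOLOv8": (0.830, 0.771),
    "YOLOv7": (0.885, 0.762),
    "YOLOv5": (0.894, 0.758),
    "UO-YOLO (ours)": (0.928, 0.756),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for name, (p, r) in results.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```

Despite the lower recall, the proposed model's F1 still exceeds the three baselines, consistent with the mAP@0.5 ranking in the table.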
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liang, L.; Yuanjun, W. UO-YOLO: Ureteral Orifice Detection Network Based on YOLO and Biformer Attention Mechanism. Appl. Sci. 2024, 14, 5124. https://doi.org/10.3390/app14125124