OrthoDETR: A Streamlined Transformer-Based Approach for Precision Detection of Orthopedic Medical Devices
Abstract
1. Introduction
- (1) We introduce OrthoDETR, an object detection method designed specifically for the efficient and accurate identification of orthopedic medical devices. It addresses the limitations of existing models and copes with the challenges arising from intricate medical imaging data and varied device appearances.
- (2) OrthoDETR extends the core architecture of DETR (Detection Transformer) with several modifications tailored to the orthopedic domain: replacing the ResNet backbone with an MLP-Mixer for feature extraction, refining the multi-head self-attention mechanism to capture context better, and adjusting the loss function to improve model training.
- (3) Through experiments, we demonstrate that OrthoDETR provides considerable improvements in detection speed while incurring only a slight decrease in performance. This makes it a valuable tool for detecting orthopedic medical devices, particularly for fine-grained device management in workflow processes.
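To make the backbone swap in contribution (2) more concrete, the following is a minimal sketch of a single MLP-Mixer block (token mixing followed by channel mixing, each with a residual connection), written in plain Python. It is not the authors' implementation: the helper names, the sizes `S`, `C`, `DS`, `DC`, and the random weights are illustrative assumptions, and biases and the learned layer-norm scale and shift are omitted for brevity.

```python
import math
import random


def gelu(v):
    # tanh approximation of the GELU activation
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matmul(a, b):
    # (n x k) times (k x m) -> (n x m), on nested lists
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_norm(x, eps=1e-6):
    # normalize every row (one patch) over its channels
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def mlp(x, w1, w2):
    # two linear layers with a GELU in between (biases omitted)
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    # x has shape (S patches, C channels)
    # token mixing: transpose so the MLP mixes information across patches
    mixed = transpose(mlp(transpose(layer_norm(x)), tok_w1, tok_w2))
    x = add(x, mixed)
    # channel mixing: the MLP acts on each patch's channel vector
    return add(x, mlp(layer_norm(x), ch_w1, ch_w2))


def rand_matrix(rows, cols, rng, std=1.0):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 4 patches, 8 channels, hidden widths 6 and 16.
rng = random.Random(0)
S, C, DS, DC = 4, 8, 6, 16
x = rand_matrix(S, C, rng)
out = mixer_block(
    x,
    rand_matrix(S, DS, rng, 0.02), rand_matrix(DS, S, rng, 0.02),
    rand_matrix(C, DC, rng, 0.02), rand_matrix(DC, C, rng, 0.02),
)
print(len(out), len(out[0]))  # output keeps the (S, C) shape of the input
```

Because the block contains only matrix multiplications and element-wise operations, its cost grows linearly with the number of patches for channel mixing, which is one plausible reason an all-MLP backbone can be faster than a deep convolutional one.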
2. Related Work
3. Materials and Methods
3.1. Improved DETR Model
3.1.1. ResNet Replacement for MLP-Mixer
3.1.2. Improved Transformer Encoder
3.1.3. Optimization of Loss Functions
4. Results
4.1. Dataset
4.1.1. Data Set Analysis and Annotation Instructions
4.1.2. Rational Assessment of Data Sets
4.1.3. Data Enhancements
4.2. Experimental Setup
4.3. Experimental Results and Analysis
4.3.1. Horizontal Comparative Experiment
4.3.2. Improved Strategy Ablation Experiment
4.3.3. Data Enhanced Ablation Experiments
4.3.4. Example Images and Analysis of Test Results
4.4. Complexity and Cost Analysis of OrthoDETR
4.4.1. Computational Complexity
4.4.2. Memory Usage and Training Cost
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Model | AP50 | AP50:95 | AR50:95 | FPS | Parameters (Millions) | FLOPs (Billion) |
---|---|---|---|---|---|---|
DETR | 0.852 | 0.842 | 0.862 | 20 | 41.5 | 244 |
Faster R-CNN | 0.865 | 0.815 | 0.845 | 24 | 134.0 | 150 |
YOLOv8 | 0.886 | 0.852 | 0.893 | 33 | 64.9 | 139 |
SSD | 0.835 | 0.793 | 0.820 | 28 | 26.3 | 31 |
RetinaNet | 0.861 | 0.812 | 0.847 | 22 | 36.8 | 138 |
OrthoDETR (Ours) | 0.897 | 0.864 | 0.895 | 26 | 39.7 | 123 |
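As a quick reading aid for the comparison table, the short script below computes the relative change of OrthoDETR with respect to the DETR baseline. The numbers are copied from the DETR and OrthoDETR rows; the dictionary keys and the helper `relative_change` are illustrative names, not part of the paper.

```python
# Values copied from the DETR and OrthoDETR rows of the comparison table.
detr = {"ap50": 0.852, "fps": 20, "params_m": 41.5, "flops_b": 244}
ortho = {"ap50": 0.897, "fps": 26, "params_m": 39.7, "flops_b": 123}


def relative_change(new, old):
    """Signed change of `new` relative to `old`, in percent."""
    return 100.0 * (new - old) / old


changes = {key: relative_change(ortho[key], detr[key]) for key in detr}
for key, pct in changes.items():
    print(f"{key}: {pct:+.1f}%")
```

On these figures, OrthoDETR runs at 30% more frames per second than DETR with roughly half the FLOPs (about 49.6% fewer) and about 4.3% fewer parameters.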
| MLP-Mixer Backbone | Improved Transformer | Optimized Loss Function | mAP | FPS |
|---|---|---|---|---|
|  |  |  | 0.718 | 20 |
| √ |  |  | 0.703 | 24 |
|  | √ |  | 0.750 | 23 |
|  |  | √ | 0.739 | 19 |
| √ | √ | √ | 0.756 | 26 |
Model | Average Precision |
---|---|
Baseline Model | 80.0% |
Contrast Enhancement | 82.5% |
Noise Addition | 81.3% |
Brightness Adjustment | 83.2% |
Flipping | 82.4% |
Rotation | 82.1% |
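The five augmentations listed in the table can be sketched on a tiny grayscale image as follows. This is an illustrative sketch only: the paper's actual parameters (brightness offset, contrast factor, noise level, rotation angle) are not given here, so every value and function name below is an assumption. The flip example also mirrors the bounding boxes, since detection labels must follow the image.

```python
import random


def clip(v):
    # keep intensities in the 8-bit range
    return max(0, min(255, int(round(v))))


def adjust_brightness(img, delta):
    return [[clip(p + delta) for p in row] for row in img]


def enhance_contrast(img, factor):
    # stretch each pixel's distance from the mean intensity
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clip(mean + factor * (p - mean)) for p in row] for row in img]


def add_noise(img, sigma, rng):
    # additive Gaussian noise
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def hflip(img, boxes):
    # boxes are (x_min, y_min, x_max, y_max) in pixels, with exclusive maxima
    width = len(img[0])
    flipped = [row[::-1] for row in img]
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def rotate90(img):
    # 90 degree clockwise rotation (the angle is an assumption)
    return [list(row) for row in zip(*img[::-1])]


img = [[0, 50, 100], [150, 200, 250]]
boxes = [(0, 0, 1, 2)]  # hypothetical device box covering the first column
rng = random.Random(0)

brighter = adjust_brightness(img, 10)
sharper = enhance_contrast(img, 1.2)
noisy = add_noise(img, 5.0, rng)
flipped, flipped_boxes = hflip(img, boxes)
rotated = rotate90(img)
print(flipped_boxes)
```

Rotation would need a matching box transform as well; it is left out to keep the sketch short.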
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, X.; Li, H.; Li, J.; Zhou, X. OrthoDETR: A Streamlined Transformer-Based Approach for Precision Detection of Orthopedic Medical Devices. Algorithms 2023, 16, 550. https://doi.org/10.3390/a16120550