ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging
Abstract
1. Introduction
Related Works and Clinical Motivations
2. Materials and Methods
2.1. Datasets and Preprocessing
2.2. Implementation Details
2.3. Model Architecture
2.4. SSFF Module Design
SSFF Module: Feature Alignment and Noise Suppression
- Cross-Scale Feature Fusion: The SSFF module aligns low-level and high-level features to a common spatial resolution using adaptive pooling and nearest-neighbor interpolation. Low-level features retain fine details and textures, while high-level features carry semantic information. Resolving this resolution mismatch before fusion preserves important details that would otherwise be lost.
- Channel Attention Mechanism: After fusion, a channel attention mechanism suppresses redundant or noisy information. It learns inter-channel relationships and produces a per-channel importance weight that rescales the fused features, emphasizing informative channels and attenuating noise and irrelevant responses.
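The align-then-gate flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume three input scales, pool the high-resolution (low-level) map down and upsample the low-resolution (high-level) map up to the middle scale, fuse by channel concatenation, and gate with a squeeze-and-excitation-style attention (global average pooling, one linear layer, sigmoid). The function names and the untrained weights are hypothetical.

```python
import math

# A feature map is a list of channels; each channel is a 2D list (rows x cols) of floats.

def _ceil_div(a, b):
    return -(-a // b)

def adaptive_avg_pool2d(ch, out_h, out_w):
    """Average-pool one channel to (out_h, out_w) using adaptive bins (floor start, ceil end)."""
    in_h, in_w = len(ch), len(ch[0])
    out = []
    for i in range(out_h):
        r0, r1 = (i * in_h) // out_h, _ceil_div((i + 1) * in_h, out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, _ceil_div((j + 1) * in_w, out_w)
            vals = [ch[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def nearest_upsample(ch, out_h, out_w):
    """Nearest-neighbor resize: each output pixel copies the input pixel it falls into."""
    in_h, in_w = len(ch), len(ch[0])
    return [[ch[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
            for i in range(out_h)]

def channel_attention(feats, weight, bias):
    """SE-style gate (assumed form): global average pool -> linear -> sigmoid -> rescale."""
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feats]
    gates = [1.0 / (1.0 + math.exp(-(sum(w * s for w, s in zip(row, squeezed)) + b)))
             for row, b in zip(weight, bias)]
    gated = [[[g * v for v in r] for r in ch] for g, ch in zip(gates, feats)]
    return gated, gates

def ssff_like_fuse(low, mid, high, weight, bias):
    """Hypothetical: align low-level and high-level maps to the mid scale, concatenate, gate."""
    h, w = len(mid[0]), len(mid[0][0])
    aligned = ([adaptive_avg_pool2d(ch, h, w) for ch in low]
               + mid
               + [nearest_upsample(ch, h, w) for ch in high])
    return channel_attention(aligned, weight, bias)

# Toy example: one channel per scale; zero (untrained, hypothetical) weights give gates of 0.5.
low = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]   # 4 x 4
mid = [[[1.0, 1.0], [1.0, 1.0]]]                                   # 2 x 2
high = [[[2.0]]]                                                   # 1 x 1
fused, gates = ssff_like_fuse(low, mid, high, [[0.0] * 3] * 3, [0.0] * 3)
print(gates)     # [0.5, 0.5, 0.5]
print(fused[0])  # [[1.25, 2.25], [5.25, 6.25]]
```

In a trained model the gate weights would be learned, so informative channels receive gates near 1 and noisy ones gates near 0; the zero weights here only make the arithmetic easy to follow.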
2.5. AsDDet Head
2.5.1. Depthwise Separable Convolution (DWConv)
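- Computational complexity ($k^2 C_{in} C_{out} H W$ for a standard $k \times k$ convolution vs. $(k^2 + C_{out}) C_{in} H W$ for a depthwise separable convolution, where $C_{in}$ and $C_{out}$ are the input and output channel counts and $H \times W$ is the output size)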
- Number of learnable parameters
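Both savings follow from factorizing a $k \times k$ convolution into a per-channel $k \times k$ depthwise step and a $1 \times 1$ pointwise step. The sketch below counts multiply-accumulates and weights for both variants, for $C_{in}$ input channels, $C_{out}$ output channels and an $H \times W$ output; stride 1, "same" padding and ignoring bias are assumptions made for illustration. The cost ratio reduces to $1/C_{out} + 1/k^2$.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """(multiply-accumulates, weights) of a k x k standard convolution; bias ignored."""
    params = k * k * c_in * c_out
    return params * h * w, params

def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Depthwise k x k (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 filters that mix channels
    params = depthwise + pointwise
    # With stride 1 and 'same' padding, every weight is applied once per output pixel.
    return params * h * w, params

std_macs, std_params = standard_conv_cost(3, 64, 64, 80, 80)
dw_macs, dw_params = depthwise_separable_cost(3, 64, 64, 80, 80)
print(std_macs, std_params)          # 235929600 36864
print(dw_macs, dw_params)            # 29900800 4672
print(round(dw_macs / std_macs, 4))  # 0.1267, i.e. 1/C_out + 1/k^2
```

For a 3×3 layer with 64 input and 64 output channels on an 80×80 map, the depthwise separable version needs roughly 12.7% of the weights and operations of the standard convolution.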
2.5.2. Distribution Focal Loss (DFL)
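Distribution Focal Loss comes from Generalized Focal Loss (Li et al., in the reference list). As a reference point, the sketch below implements the loss as defined there for a single box side: the distance is predicted as a discrete distribution $S$ over integer bins, and for a continuous target $y$ lying between bins $y_i$ and $y_{i+1}$, $\mathrm{DFL} = -\big((y_{i+1}-y)\log S_i + (y-y_i)\log S_{i+1}\big)$. The four-bin example is arbitrary; the bin count and loss weighting used in ASC-YOLO are not taken from the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dfl_loss(logits, target):
    """Distribution Focal Loss for one box side.

    logits: unnormalized scores over integer bins 0..n.
    target: continuous distance with 0 <= target < n.
    """
    n = len(logits) - 1
    if not 0 <= target < n:
        raise ValueError("target must lie in [0, n)")
    probs = softmax(logits)
    left = int(math.floor(target))
    right = left + 1
    # Cross-entropy towards the two bins bracketing the target, weighted by proximity.
    return -((right - target) * math.log(probs[left])
             + (target - left) * math.log(probs[right]))

def expected_distance(logits):
    """Decode the distribution into a scalar distance: sum_i P(i) * i."""
    return sum(i * p for i, p in enumerate(softmax(logits)))

uniform = [0.0, 0.0, 0.0, 0.0]
peaked = [-10.0, 5.0, 5.0, -10.0]   # mass concentrated on bins 1 and 2
print(dfl_loss(uniform, 1.5))       # log(4), about 1.386
print(dfl_loss(peaked, 1.5))        # close to log(2), about 0.693
print(expected_distance(peaked))    # about 1.5
```

Concentrating probability on the two bins that bracket the target lowers the loss, and the expected value of the distribution recovers a sub-bin distance.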
2.5.3. Decoupling Classification and Regression in AsDDet for Enhanced Small-Target Localization
- Depthwise Convolutions for Classification: The classification branch uses depthwise separable convolutions, which capture local features at a fraction of the computational cost of standard convolutions. This lets the model focus on the fine textures and patterns that are essential for detecting small targets such as fractures or other anomalies in medical images, while making the classification branch lighter and faster without compromising accuracy.
- Shuffle Operations for Regression: The regression branch applies channel shuffle operations to strengthen the model’s modeling of spatial relationships. Rearranging channels across groups increases information exchange between them, helping the model learn the geometric relationships between fracture boundaries and surrounding structures. This helps the model handle complex shapes such as irregular fracture lines and improves localization accuracy, especially for small targets.
- Decoupling for Task Optimization: Decoupling the two tasks allows each branch to specialize: the classification branch focuses on identifying textures and patterns, while the regression branch focuses on box localization. This minimizes cross-task interference and lets the model refine both tasks, yielding more accurate detection of small targets such as fractures, particularly in noisy, challenging images such as medical X-rays.
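Channel shuffle, as popularized by ShuffleNet, permutes channels so that the next grouped operation receives channels from every previous group. The sketch below shows only the permutation itself on channel labels (view as groups × channels-per-group, transpose, flatten); the labels and helper names are hypothetical, and where ASC-YOLO places shuffles inside the AsDDet regression branch is not reproduced.

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style shuffle: view as (groups, per_group), transpose, flatten."""
    n = len(channels)
    if n % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]

def split_groups(channels, groups):
    """Contiguous split, as a grouped convolution would see the channels."""
    per_group = len(channels) // groups
    return [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]

# "a*" channels come from group 0 and "b*" channels from group 1 of a previous grouped layer.
chans = ["a0", "a1", "a2", "b0", "b1", "b2"]
print(split_groups(chans, 2))                      # [['a0', 'a1', 'a2'], ['b0', 'b1', 'b2']]
print(split_groups(channel_shuffle(chans, 2), 2))  # [['a0', 'b0', 'a1'], ['b1', 'a2', 'b2']]
```

Without the shuffle, each group of a following grouped layer would only ever see channels from one earlier group; after it, every group mixes both, which is the cross-channel communication the regression branch relies on.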
2.6. The Loss Function EfficiCIoU
- Aspect Ratio Sensitivity: CIoU constrains aspect-ratio similarity through an arctan term, but this term is insensitive to minor absolute size differences (a prediction with the same aspect ratio as the ground truth incurs no aspect-ratio penalty, whatever its size) and lacks multi-scale adaptability [26].
- Scale-aware Penalty: CIoU applies the same penalty weights to targets of different scales, whereas medical images exhibit significant scale variation (e.g., pediatric wrist bones are only 40% the size of adult bones).
- Center distance penalty: inherited from CIoU.
- Absolute width/height difference penalties: these directly constrain the absolute width and height errors between the predicted and ground-truth boxes, addressing CIoU’s insensitivity to minor size variations.
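To make these limitations concrete, the sketch below computes the standard CIoU components for boxes given as (cx, cy, w, h) and compares a 10% undersized prediction on a large box and on a small box. Because both predictions keep the ground-truth aspect ratio, CIoU’s arctan term $v$ is exactly zero and the loss is identical at both scales. The `wh_penalties` function is a hypothetical stand-in for EfficiCIoU’s width/height terms: it borrows the EIoU form (squared width and height differences normalized by the enclosing box’s width and height), since the exact EfficiCIoU definition is not given in this section, and it only shows that such terms register a size error that $v$ ignores.

```python
import math

def corners(b):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def ciou_terms(p, g, eps=1e-9):
    """CIoU components: IoU, center term rho^2 / c^2, aspect term v, and the CIoU loss."""
    px1, py1, px2, py2 = corners(p)
    gx1, gy1, gx2, gy2 = corners(g)
    inter = max(0.0, min(px2, gx2) - max(px1, gx1)) * max(0.0, min(py2, gy2) - max(py1, gy1))
    iou = inter / (p[2] * p[3] + g[2] * g[3] - inter)
    cw = max(px2, gx2) - min(px1, gx1)   # enclosing-box width
    ch = max(py2, gy2) - min(py1, gy1)   # enclosing-box height
    center = ((p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2) / (cw ** 2 + ch ** 2)
    v = 4 / math.pi ** 2 * (math.atan(g[2] / g[3]) - math.atan(p[2] / p[3])) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return iou, center, v, 1 - iou + center + alpha * v

def wh_penalties(p, g):
    """HYPOTHETICAL EIoU-style terms: (w - w_gt)^2 / C_w^2 and (h - h_gt)^2 / C_h^2."""
    px1, py1, px2, py2 = corners(p)
    gx1, gy1, gx2, gy2 = corners(g)
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    return (p[2] - g[2]) ** 2 / cw ** 2, (p[3] - g[3]) ** 2 / ch ** 2

large_gt, large_pred = (50.0, 50.0, 20.0, 10.0), (50.0, 50.0, 18.0, 9.0)
small_gt, small_pred = (50.0, 50.0, 10.0, 5.0), (50.0, 50.0, 9.0, 4.5)
iou, center, v, loss = ciou_terms(large_pred, large_gt)
print(v)                                            # 0.0: aspect term is blind to the size error
print(ciou_terms(small_pred, small_gt)[3] == loss)  # True: same penalty at both scales
print(wh_penalties(large_pred, large_gt))           # (0.01, 0.01)
```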
Detailed Analysis of EfficiCIoU Loss and Its Role in Fracture Detection
2.7. Contribution Analysis of SSFF and AsDDet Modules
3. Results
3.1. Main Performance Evaluation
3.2. Ablation Study
3.3. Generalization Analysis
3.3.1. Dataset Preparation
3.3.2. Results of the Experiment
4. Discussion
4.1. Generalization Analysis
4.2. Deployment-Oriented Model Selection and Optimization
4.3. Limitations and Prospects
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Görgens, S.; Patel, D.; Keenan, K.; Fishbein, J.; Bullaro, F. Assessing the Variability of Antibiotic Management in Patients with Open Hand Fractures Presenting to the Pediatric Emergency Department. Pediatr. Emerg. Care 2022, 38, 502–505.
- George, M.P.; Bixby, S. Frequently missed fractures in pediatric trauma: A pictorial review of plain film radiography. Radiol. Clin. 2019, 57, 843–855.
- Nagy, E.; Janisch, M.; Hržić, F.; Sorantin, E.; Tschauner, S. A pediatric wrist trauma x-ray data-set (grazpedwri-dx) for machine learning. Sci. Data 2022, 9, 222.
- Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; Springer: Singapore, 2024; pp. 529–545.
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 1–21.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. In Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Volume 37, pp. 107984–108011.
- Cao, K.; Liu, M.; Su, H.; Wu, J.; Zhu, J.; Liu, S. Analyzing the noise robustness of deep neural networks. IEEE Trans. Vis. Comput. Graph. 2020, 27, 3289–3304.
- Liu, J.; Fan, X.; Jiang, J.; Liu, R.; Luo, Z. Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 105–119.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Online, 6–12 December 2020; Volume 33, pp. 21002–21012.
- Hua, J.; Wang, Z.; Zou, Q.; Xiao, J.; Tian, X.; Zhang, Y. Re-decoupling the classification branch in object detectors for few-class scenes. Pattern Recognit. 2024, 153, 110541.
- Wen, N.; Guo, R.; Ma, D.; Ye, X.; He, B. AIoU: Adaptive bounding box regression for accurate oriented object detection. Int. J. Intell. Syst. 2022, 37, 748–769.
- Dibo, R.; Galichin, A.; Astashev, P.; Dylov, D.V.; Rogov, O.Y. Deeploc: Deep learning-based bone pathology localization and classification in wrist X-ray images. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, Yerevan, Armenia, 28–30 September 2023; Springer Nature: Cham, Switzerland, 2023; pp. 199–211.
- Ahmed, A.; Imran, A.S.; Manaf, A.; Kastrati, Z.; Daudpota, S.M. Enhancing Wrist Fracture Detection with YOLO. arXiv 2024, arXiv:2407.12597.
- Ju, R.Y.; Cai, W. Fracture detection in pediatric wrist trauma X-ray images using YOLOv8 algorithm. Sci. Rep. 2023, 13, 20077–20090.
- Chien, C.T.; Ju, R.Y.; Chou, K.Y.; Chiang, J.S. YOLOv9 for fracture detection in pediatric wrist trauma X-ray images. Electron. Lett. 2024, 60, e13248.
- Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605.
- Gao, Y.; Dai, Y.; Liu, F.; Chen, W.; Shi, L. An anatomy-aware framework for automatic segmentation of parotid tumor from multimodal MRI. Comput. Biol. Med. 2023, 161, 107000.
- Xu, Z.; Zhang, X.; Zhang, H.; Liu, Y.; Zhan, Y.; Lukasiewicz, T. EFPN: Effective medical image detection using feature pyramid fusion enhancement. Comput. Biol. Med. 2023, 163, 107149.
- Liu, S.; Huang, D.; Wang, Y. Path aggregation network for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045.
- Huang, H.; Chen, Z.; Zou, Y.; Lu, M.; Chen, C. Channel prior convolutional attention for medical image segmentation. Comput. Biol. Med. 2024, 178, 108784.
- Kanakis, M.; Bruggemann, D.; Saha, S.; Georgoulis, S.; Obukhov, A.; Gool, L.V. Reparameterizing convolutions for incremental multi-task learning without task interference. In Proceedings, Part XX 16, Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 689–707.
- Zhang, X.; Zhou, H.; Ma, N.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1234–1246.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1–15.
- Liu, B.; Liu, X.; Jin, X.; Stone, P.; Liu, Q. Conflict-averse gradient descent for multi-task learning. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021; Volume 34, pp. 18878–18890.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Lin, T.Y.; Goyal, P.; Girshick, R. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.

| Models | Parameters (M) | Recall | mAP@50 | mAP@50-95 | F1-Score | Jaccard Index |
|---|---|---|---|---|---|---|
| YOLOv8 | 3.1 | 0.510 | 0.537 | 0.328 | 0.61 | 0.439 |
| YOLOv10 | 4.3 | 0.520 | 0.540 | 0.338 | 0.621 | 0.45 |
| YOLOv11 | 5.1 | 0.520 | 0.542 | 0.336 | 0.621 | 0.45 |
| RT-DETR | 8.9 | 0.530 | 0.533 | 0.340 | 0.624 | 0.454 |
| Ours | 2.8 | 0.550 | 0.611 | 0.402 | 0.655 | 0.487 |

| Configuration | Parameters (M) | Recall | mAP@50 | mAP@50-95 |
|---|---|---|---|---|
| YOLOv8 | 3.1 | 0.510 | 0.537 | 0.328 |
| +EfficiCIoU | 3.1 | 0.520 | 0.548 | 0.341 |
| +AsDDet Head | 3.2 | 0.520 | 0.563 | 0.342 |
| +SSFF Module | 2.8 | 0.550 | 0.611 | 0.402 |
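The ablation rows are cumulative (each configuration adds one component to the row above), so each component's contribution is its difference from the previous row. The short script below only does that arithmetic on the table values; it introduces no new results. Because the components were added in a fixed order, these increments are order-dependent rather than independent attributions.

```python
# (configuration, mAP@50, mAP@50-95) copied from the ablation table; rows are cumulative.
ablation = [
    ("YOLOv8", 0.537, 0.328),
    ("+EfficiCIoU", 0.548, 0.341),
    ("+AsDDet Head", 0.563, 0.342),
    ("+SSFF Module", 0.611, 0.402),
]

def step_gains(rows):
    """Gain of each configuration over the previous row, rounded to three decimals."""
    return [(name, round(m50 - prev[1], 3), round(m5095 - prev[2], 3))
            for prev, (name, m50, m5095) in zip(rows, rows[1:])]

for name, d50, d5095 in step_gains(ablation):
    print(f"{name}: mAP@50 {d50:+.3f}, mAP@50-95 {d5095:+.3f}")
# +EfficiCIoU: mAP@50 +0.011, mAP@50-95 +0.013
# +AsDDet Head: mAP@50 +0.015, mAP@50-95 +0.001
# +SSFF Module: mAP@50 +0.048, mAP@50-95 +0.060
```

By this arithmetic, the SSFF step accounts for 0.048 of the 0.074 total mAP@50 gain over the YOLOv8 baseline.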

| Models | Parameters (M) | Recall | mAP@50 | mAP@50-95 |
|---|---|---|---|---|
| YOLOv11 | 5.1 | 0.520 | 0.532 | 0.343 |
| RT-DETR | 8.3 | 0.490 | 0.501 | 0.309 |
| ASC-YOLO | 2.9 | 0.550 | 0.602 | 0.412 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Du, S.; Wei, Y. ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging. Appl. Sci. 2025, 15, 9031. https://doi.org/10.3390/app15169031
Du S, Wei Y. ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging. Applied Sciences. 2025; 15(16):9031. https://doi.org/10.3390/app15169031
Chicago/Turabian Style: Du, Shenghong, and Yan Wei. 2025. "ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging" Applied Sciences 15, no. 16: 9031. https://doi.org/10.3390/app15169031
APA Style: Du, S., & Wei, Y. (2025). ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging. Applied Sciences, 15(16), 9031. https://doi.org/10.3390/app15169031