ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a new neural network for detecting fractures in an X-ray dataset. The network consists of several components that the authors say help improve object detection, and a novel loss function is proposed. In general, the method descriptions lack detail and need precise clarification. I have several comments that the authors need to address.
- (All figures) Please expand the figure captions. Full names for the abbreviated terms are required. For example, in the Figure 1 caption, C2f, SPPF, SSFF, and AsDDet need to have their full names.
- (the first paragraph of 2.5) discrete probability distribution regression (DFL) needs to be clearly defined and explained.
- (Figure 3 and 4) It is difficult to understand the flowcharts.
- (Eq 14) It is difficult to understand why arctan is useful.
- The authors should explicitly mention the number of classes used for the fracture detection.
- (Figure 5) The authors should include ground truths for the fractures and explain why they should be diagnosed as fractures. It is difficult to see the presence of fractures because of the presence of the bounding boxes.
- (Table 2) What does mAP@50-95 mean? Why is mAP@50-95 always smaller than mAP@50? Please explain.
- (Figure 5 and 6 captions) The captions have 222222. These are not appropriate expressions.
There are many grammatical errors found in the manuscript.
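The Table 2 question about mAP@50-95 can be illustrated with a small sketch. It assumes the common COCO-style convention, in which the metric is averaged over the ten IoU thresholds 0.50, 0.55, ..., 0.95; the manuscript should confirm that this is the definition it uses. To stay short, the sketch replaces full average precision with a simpler stand-in (the fraction of predictions counted as true positives), and the function names and IoU values are hypothetical.

```python
def iou_thresholds():
    """Ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (COCO-style assumption)."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def tp_fraction(match_ious, threshold):
    """Fraction of predictions whose IoU with the matched ground truth meets the threshold.

    This is a simplified stand-in for average precision, not AP itself.
    """
    if not match_ious:
        return 0.0
    return sum(iou >= threshold for iou in match_ious) / len(match_ious)


def score_at_50_and_50_95(match_ious):
    """Return (score at IoU 0.50, mean score over IoU 0.50:0.05:0.95)."""
    scores = [tp_fraction(match_ious, t) for t in iou_thresholds()]
    return scores[0], sum(scores) / len(scores)


# Hypothetical IoUs between five predicted boxes and their matched ground truths.
example_ious = [0.97, 0.82, 0.63, 0.52, 0.30]
at_50, at_50_95 = score_at_50_and_50_95(example_ious)
```

Because a box that passes a strict threshold also passes every looser one, the per-threshold score can only fall as the threshold rises, so the average over all ten thresholds cannot exceed the value at 0.50.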
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The article lacks originality and scientific soundness.
The methodology section lacks logical flow; the text is hard to read and follow.
Only the YOLO model has been considered in the problem formulation; other models should be considered and evaluated.
The purpose of the AsDDet head is not properly justified.
The article lacks novelty and sufficient experimental analysis.
Apart from this, there are a number of language and formatting issues, which fall well below the standard expected for this journal.
Comments on the Quality of English Language
How is "discrete probability distribution regression" abbreviated to DFL?
At its first occurrence, DFL is cited to [3], in which DFL has a completely different meaning. This undermines the originality of the article.
Check the figure and table captions: one figure name appears within quotation marks, another is shown as 6.22222, etc.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The abstract should mention the classification tasks involved so that readers can gauge the benchmark's complexity. Using a single or local dataset could limit the generalization of the final results.
The introduction should include a list of the main findings of this interesting study. Additionally, a section focused on related works must be developed, as well as a comparative table to position the advantages of this research over the literature.
Table 1 shows the target detection results. The proposed method's improvement is visible; however, the overall results are not reliable enough for medical applications. Please explain how such low metric values can be useful. The authors should include the F1-measure, G-mean, and Jaccard Index.
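For reference, the three requested metrics can be computed from confusion counts as in the minimal sketch below. The function names and the counts are hypothetical and are not results from the manuscript. Note that G-mean needs a true-negative count, which is not naturally defined for box-level detection, so the sketch assumes an image-level (fracture versus no fracture) evaluation for that metric.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard_index(tp, fp, fn):
    """Jaccard index over detections: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def g_mean(tp, fp, fn, tn):
    """Geometric mean of sensitivity and specificity.

    Requires true negatives, so it assumes an image-level evaluation.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts, for illustration only.
tp, fp, fn, tn = 80, 20, 20, 80
metrics = {
    "f1": f1_score(tp, fp, fn),
    "jaccard": jaccard_index(tp, fp, fn),
    "g_mean": g_mean(tp, fp, fn, tn),
}
```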
The paper is carelessly written in some sections. For instance, the captions in Figures 5 and 6 are wrong or incomplete. Besides, all subimages should be carefully described. The paragraph in line 304 should be revised.
All references to figures are wrong, and Figure 6 seems to belong to another database. The quality of the paper is at stake.
The author should include additional datasets to validate the proposed method.
The Conclusions section is missing.
The bibliographical references are outdated. It is highly recommended to use references from the last five years and from high-impact journals.
Comments on the Quality of English Language
The manuscript requires deep proofreading.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The paper focuses on diagnosing human bone fractures using X-ray medical images. It introduces the ASC‑YOLO 5 model, based on YOLO, which incorporates a Semantic‑Scaled Feature Fusion module to enhance multi-scale feature extraction through inter-layer interaction. These methods are applied in automated medical diagnostic systems.
The paper demonstrates scientific novelty and practical significance.
The following comments and suggestions are intended to improve the manuscript:
- There are many abbreviations used throughout the paper, which impede readability. The authors should provide a glossary or legend, clearly defining each term upon its first occurrence.
- The literature review in Section 1 should conclude with a summary of limitations found in existing methods. This would clarify the gaps the current work aims to address.
- The manuscript lacks a clearly stated problem definition and the authors' specific contributions. These should be explicitly formulated.
- Section 2 begins directly with the implementation steps of the proposed method. It would benefit from an introductory schematic diagram that outlines the overall approach before diving into specifics.
- The authors present the formula for the Efficient CIoU (Complete IoU) loss, but have not justified their choice of this metric. A rationale—perhaps comparing it to other bounding box loss functions—should be included.
- Figures 5 and 6 require revision for clarity. The authors should refine these visuals to improve readability and comprehension.
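As background for the request to justify the bounding-box loss, the sketch below implements the standard CIoU loss, not the manuscript's EfficiCIoU variant, whose exact form is not given in this record. It shows the three penalties CIoU combines (overlap, centre distance, aspect ratio) and why arctan appears: it maps an unbounded width/height ratio into a bounded angle, so the aspect-ratio term stays between 0 and 1. The function name and the (x1, y1, x2, y2) box format are assumptions.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """Standard CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Distance term: squared centre distance over squared enclosing-box diagonal.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    enc_w = max(px2, gx2) - min(px1, gx1)
    enc_h = max(py2, gy2) - min(py1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio term: arctan bounds each ratio in (0, pi/2), so v lies in [0, 1).
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


identical = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
disjoint = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
```

A comparison against plain IoU loss is visible in the disjoint case: IoU loss alone would saturate at 1 and give no signal about how far apart the boxes are, whereas the distance term still does.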
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 5 Report
Comments and Suggestions for Authors
This paper introduces ASC‑YOLO, an enhanced YOLO-based model with SSFF and AsDDet modules to boost multi‑scale feature fusion and small‑fracture localization accuracy in radiographs, achieving significant mAP improvements.
- The Introduction should be improved by clearly highlighting the paper’s novelty and concluding with a brief outline of the workflow for the subsequent sections. The last paragraph is too lengthy in the introduction.
- Throughout the paper, figures and tables lack proper explanations in the text and should each be described in detail. Additionally, captions for Figures 5 and 6 are missing, and all other captions need enhancement. Please also verify and correct the figure numbering in the text.
- How does the SSFF module align and fuse low‑level and high‑level features to reduce background interference and enhance multi‑scale feature complementarity? In what way does the AsDDet detection head, which decouples the classification and regression tasks by using depthwise convolutions for classification and Shuffle operations for regression, improve small‑target localization accuracy? How does the Discrete Probability Distribution Regression (DFL) method mitigate quantization errors and enhance robustness compared to traditional linear bounding‑box regression? What are the components of the EfficiCIoU loss function, and how do the added width/height penalty term and adaptive normalization strategy increase sensitivity to irregular fracture shapes?
- How does ASC‑YOLO’s integration of the SSFF module and AsDDet detection head contribute to its 7.4% mAP@50 improvement over YOLOv8 and superior performance across Precision, Recall, mAP@50, and mAP@50-95 on the GRAZPEDWRI-DX dataset? In what ways do the visualization results illustrate ASC‑YOLO’s enhanced accuracy and robustness in detecting small fracture targets amid complex background interference?
- Please explain how each variable was chosen to obtain the reported results. Further results are required.
- Overall, the presentation of the results and the explanations of the methods must be improved. The authors should enhance the paper’s structure and readability and provide clearer explanations of the results.
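The DFL question above can be made concrete with a minimal sketch of Distribution Focal Loss as it is usually defined in the Generalized Focal Loss literature. Whether the manuscript follows exactly this formulation is an assumption. Each box-side offset is predicted as a discrete distribution over integer bins; the decoded offset is the expectation of that distribution, and the loss pushes probability mass onto the two bins that bracket the continuous target. Function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_expected_offset(logits):
    """Decode a continuous offset as the expectation over bin indices 0..n-1."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def dfl_loss(logits, target):
    """Cross-entropy on the two bins bracketing a non-negative target offset."""
    probs = softmax(logits)
    left = min(int(math.floor(target)), len(logits) - 2)  # keep the right bin in range
    right = left + 1
    w_left = right - target   # weight grows as the target nears the left bin
    w_right = target - left   # weight grows as the target nears the right bin
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


uniform_logits = [0.0, 0.0, 0.0, 0.0]
peaked_logits = [-10.0, 5.0, 5.0, -10.0]
decoded = dfl_expected_offset(uniform_logits)
loss_uniform = dfl_loss(uniform_logits, 1.5)
loss_peaked = dfl_loss(peaked_logits, 1.5)
```

Because the decoded value is a weighted average of bins rather than a single bin index, sub-bin positions remain representable, which is the usual argument for reduced quantization error.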
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I have comments that the authors need to address.
- (Figure 1) The input image is provided, which is good. The output is not provided. It needs connections from the three AsDDet blocks to the output.
- (Figure 2) Change "mid-level characteristics" to "mid-level features". In Figure 1, the SSFF module accepts two inputs, but in Figure 2 it accepts three. Please make the figures consistent.
- (2.5.2. Distribution Focal Loss) Citations for DFL are needed.
- (Table 2) When looking at the Configuration column, I understand that the first row uses only YOLOv8, the second row uses YOLOv8 + EfficiCIoU, the third row uses YOLOv8 + EfficiCIoU + AsDDet Head, and the fourth row uses YOLOv8 + EfficiCIoU + AsDDet Head + SSFF module. Model complexity therefore seems to increase as modules are added, yet the number of parameters decreases from top to bottom. Please explain clearly why this happens.
- (3.3.1. Dataset Preparation) The brain tumor dataset appears to be used for generalization purposes only, yet a train/test split is mentioned. Did the authors use the training set to test the models' generalization capability? Please provide a link to the data repository.
- (3.3.1. Dataset Preparation) It states that "negative" means images without brain tumors. However, in the leftmost image of (a) ASC-YOLO in Figure 6, the object inside the bounding box looks like a brain tumor but is labeled as negative. This needs a clear explanation. Also correct the figure label (a) ASC-YOLO to (c) ASC-YOLO in Figure 6.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Provide the input/output size of each block in Figures 1 and 2. The results have to be benchmarked against the literature. The article still needs to be improved in terms of language and punctuation. The article should be enhanced before being considered for publication:
1. Some terms are defined several times, which makes the reader lose focus; for example, Distribution Focal Loss (DFL) is defined many times in the text. Other terms, such as CIoU, are not defined properly.
2. Check for similar repetition in the abbreviations and fix it.
3. The abstract should be improved; for example, a sentence should not start with "And". Spacing between sentences should be added.
4. Spacing between the keywords should be added. The same applies throughout the article, for example at line 50.
5. Check the letter case in headings and in the body text, for example Section 2.3, line 136, the Figure 1 description, and line 313 ("Where:").
6. Make the title of Section 2.5.3 appropriate.
7. Check line 390: is it Fig. 3.1?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have substantially improved the manuscript. Some typographical issues still persist; therefore, a final proofread is required. Please check whether image quality or resolution can be improved.
Comments on the Quality of English Language
The manuscript requires deep proofreading.
Author Response
Comment: The authors have substantially improved the manuscript. Some typographical issues still persist; therefore, a final proofread is required. Please check whether image quality or resolution can be improved.
Response: Thank you for your positive feedback. We have carefully proofread the manuscript once again to correct the remaining typographical issues. In addition, we reviewed all figures and improved their quality and resolution where possible to ensure better readability.
Reviewer 5 Report
Comments and Suggestions for Authors
In line 459, which figure do you mean?
Add a brief overview of the workflow for the subsequent sections at the end of the Introduction. Review all abbreviations to ensure each is defined only once, and check punctuation throughout.
Author Response
Comments: In line 459, which figure do you mean?
Response: Thank you for pointing this out. In line 459, we are referring to the entire Figure 6, which consists of all 12 subfigures.
Comments: Add a brief overview of the workflow for the subsequent sections at the end of the Introduction. Review all abbreviations to ensure each is defined only once, and check punctuation throughout.
Response: Thank you for your helpful suggestion. We have added a brief overview of the workflow for the subsequent sections at the end of the Introduction to guide the reader through the paper. In addition, we carefully reviewed all abbreviations to ensure that each is defined only once, and we thoroughly checked punctuation throughout the manuscript to improve clarity and consistency.
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
All issues raised on the second version have been addressed.
All the very best.