BEMF-Net: A Boundary-Enhanced Multi-Scale Feature Fusion Network
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This article proposes a boundary-enhanced multi-scale feature fusion network (BEMF-Net) for kidney tumor segmentation in endoscopic images. It addresses challenges such as boundary blurring, multi-scale variation, and detail similarity by introducing three novel modules (BSA, MFA, HCA). Some questions should be discussed.
1. The article describes BSA, MFA, and HCA as "proposed", but some of their structures (such as CFP, CCA, and PSA) have similar designs in existing works. I suggest providing a clearer explanation of the differences from, and improvements over, existing methods in the 'Related Works' or 'Methods' section.
2. Training details: The article mentions the use of a multi-scale training strategy (0.75, 1.0, 1.25) instead of data augmentation, but does not specify whether other augmentation methods (such as flipping and rotation) are used. I suggest adding this information.
3. Computational efficiency analysis: Indicators such as model parameter count, inference speed, and memory usage are not reported, although they are important for evaluating suitability for real-time clinical application.
Author Response
Response 1: Thank you for your valuable suggestion. We agree that it is important to clarify the novelty of our proposed modules and how they differ from existing methods. Accordingly, we have substantially revised Sections 3.3, 3.4, and 3.5 of the manuscript to explain more clearly and in more detail how our BSA, MFA, and HCA modules differ from and improve upon related structures (e.g., CFP, CCA, PSA). Specifically:
1. In Section 3.3 (MFA), we now explicitly compare our hierarchical fusion strategy with prior multi-scale approaches, emphasizing the novel integration of CCA, CFP with HFF, and PSA within a unified framework tailored to the boundary ambiguity of renal tumors.
2. In Section 3.4 (HCA), we highlight the dual-branch design that combines Transformer-based global modeling with CNN-based local detail extraction, which distinguishes it from previous single-modality attention mechanisms.
3. In Section 3.5 (BSA), we elaborate on the three-stream decomposition (foreground, background, boundary) and its clinical motivation, distinguishing it from conventional boundary detection modules.
These revisions can be found in the updated manuscript on pages 7–10 (Sections 3.3–3.5).
Comments 2: Training details: The article mentions the use of a multi-scale training strategy (0.75, 1.0, 1.25) instead of data augmentation, but does not specify whether other augmentation methods (such as flipping and rotation) are used. I suggest adding this information.
Response 2: Thank you for pointing this out. We have now supplemented the training details in Section 4.2.1 to state clearly all data augmentation techniques used during training. In addition to multi-scale resizing, we applied random horizontal flipping, random rotation within ±15°, and color jittering to enhance model robustness and reduce overfitting. The revised text reads: “To enhance model robustness and prevent overfitting, we employ a multi-scale training strategy with scaling factors of {0.75, 1.0, 1.25}, along with data augmentation techniques including random horizontal flipping, random rotation within ±15°, and color jittering.” This clarification appears on pages 12–13, Section 4.2.1.
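The augmentation schedule quoted in Response 2 can be summarised as one random parameter draw per image-mask pair. The sketch below is hypothetical: the names `scaled_size`, `AugParams`, and `sample_aug_params` are illustrative, rounding rescaled sizes to multiples of 32 follows a convention used in some public polyp-segmentation training scripts (an assumption here), and the ±20% colour-jitter range is also assumed, since neither detail is stated in the response.

```python
import random
from dataclasses import dataclass

# Values quoted in Responses 2 and 3; everything else is an assumption.
BASE_SIZE = 352                    # training side length (input 3x352x352)
SCALE_RATES = (0.75, 1.0, 1.25)    # multi-scale factors
MAX_ROTATION_DEG = 15.0            # random rotation within +/-15 degrees
JITTER_RANGE = (0.8, 1.2)          # ASSUMED colour-jitter factor range (not stated)


def scaled_size(rate: float, base: int = BASE_SIZE, stride: int = 32) -> int:
    """Rescaled side length, rounded to a multiple of `stride` (assumed convention)."""
    return int(round(base * rate / stride) * stride)


@dataclass
class AugParams:
    """One hypothetical draw of augmentation parameters for an image-mask pair."""
    size: int           # apply to image and mask
    hflip: bool         # apply to image and mask
    angle_deg: float    # apply to image and mask
    brightness: float   # image only
    contrast: float     # image only
    saturation: float   # image only


def sample_aug_params(rng: random.Random) -> AugParams:
    """Draw scale, flip, rotation and colour-jitter parameters."""
    return AugParams(
        size=scaled_size(rng.choice(SCALE_RATES)),
        hflip=rng.random() < 0.5,
        angle_deg=rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG),
        brightness=rng.uniform(*JITTER_RANGE),
        contrast=rng.uniform(*JITTER_RANGE),
        saturation=rng.uniform(*JITTER_RANGE),
    )


if __name__ == "__main__":
    print([scaled_size(r) for r in SCALE_RATES])  # [256, 352, 448]
    print(sample_aug_params(random.Random(0)))
```

Under the rounding assumption the three rates give 256, 352, and 448 pixels; without it they would give 264, 352, and 440. Geometric parameters (size, flip, angle) must be applied identically to an image and its mask, whereas colour jitter applies to the image only. In practice the scale is often fixed per batch so that equally sized tensors can be stacked, while flip, rotation, and jitter are drawn per sample.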
Comments 3: Computational efficiency analysis: Indicators such as model parameter count, inference speed, and memory usage are not reported, although they are important for evaluating suitability for real-time clinical application.

Response 3: We appreciate this constructive suggestion. We have added a computational efficiency analysis in Section 4.2.6 (Table 7), together with a corresponding discussion. The table compares BEMF-Net with several state-of-the-art methods in terms of parameter count (Params) and floating-point operations (FLOPs) at a standard input size of 3×352×352. Because inference speed (FPS) and memory usage are highly platform-dependent, we report FLOPs as a proxy for computational complexity and discuss the trade-off between accuracy and efficiency. BEMF-Net maintains a competitive parameter count while achieving superior segmentation performance, which supports its potential for clinical deployment. The added content can be found on pages 18–19 (Section 4.2.6) and in the Discussion (Section 5.2) on pages 19–20.
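Params and FLOPs figures such as those in Table 7 are usually obtained by summing per-layer counts over the whole network. The sketch below shows only that per-layer arithmetic for a single standard (non-grouped, dilation 1) convolution at the 352×352 input size mentioned above; the layer sizes and the helper names `Conv2dSpec`, `conv_params`, and `conv_macs` are hypothetical and are not taken from BEMF-Net. Some profiling tools report multiply-accumulate operations (MACs) under the label FLOPs, so it helps to state which convention a table uses; one MAC corresponds to two floating-point operations.

```python
from dataclasses import dataclass


@dataclass
class Conv2dSpec:
    """Hypothetical description of a square, non-grouped 2-D convolution."""
    in_ch: int
    out_ch: int
    kernel: int
    stride: int = 1
    padding: int = 0
    bias: bool = True


def conv_output_side(side: int, spec: Conv2dSpec) -> int:
    """Spatial output size of a square convolution (dilation 1)."""
    return (side + 2 * spec.padding - spec.kernel) // spec.stride + 1


def conv_params(spec: Conv2dSpec) -> int:
    """Learnable weights plus optional biases."""
    weights = spec.kernel * spec.kernel * spec.in_ch * spec.out_ch
    return weights + (spec.out_ch if spec.bias else 0)


def conv_macs(spec: Conv2dSpec, in_side: int) -> int:
    """Multiply-accumulate operations for one input image (bias additions ignored)."""
    out_side = conv_output_side(in_side, spec)
    return spec.kernel * spec.kernel * spec.in_ch * spec.out_ch * out_side * out_side


if __name__ == "__main__":
    stem = Conv2dSpec(in_ch=3, out_ch=64, kernel=3, stride=1, padding=1)  # hypothetical layer
    macs = conv_macs(stem, 352)
    print(conv_params(stem))  # 1792
    print(macs, 2 * macs)     # 214106112 428212224 (MACs, FLOPs)
```

Repeating this calculation for every layer at the same input resolution, and summing, gives whole-network totals; operations such as normalization, activations, and attention softmax are counted differently by different tools.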
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript proposes BEMF-Net, a boundary-enhanced multi-scale feature fusion network for renal tumor segmentation in endoscopic images. The topic is relevant and timely, and the paper presents a clear architecture with extensive experiments, including ablation studies and cross-dataset generalization. The results indicate improved performance over several state-of-the-art baselines.
Major Points for Improvement
--- Dataset characterization is insufficient.
The Re-TMRS dataset includes only 10 clinical cases, which is very limited for training a Transformer-based architecture. Please provide additional details on patient diversity, imaging conditions, acquisition equipment, annotation protocol, and potential biases.
--- Reference issues.
Several citations appear as “Error! Reference source not found.” These must be corrected before publication.
--- English editing required.
The manuscript contains grammatical errors, awkward phrasing, and repeated sentences. A professional English revision is recommended.
--- Strengthen the Discussion.
The Discussion section reiterates parts of the Introduction and does not analyze failure cases, computational cost, or architectural limitations. Including these would significantly improve the scientific value.
--- Clarify module motivation.
MFA, BSA, and HCA incorporate known mechanisms (CCA, CFP, PSA, etc.). Please elaborate on why these specific components were combined and how each uniquely addresses the identified challenges.
--- Add statistical significance analysis.
Improvements over competing methods are often small (0.3–1.9%). A statistical test (e.g., Wilcoxon signed-rank test) would strengthen the claims.
--- Improve presentation of Figures.
Some qualitative masks in Figure 6 appear low-resolution. Enhancing clarity would improve comparison.
Minor Points
--- Ensure consistency in formatting mathematical expressions and equation numbering.
--- In Table 4, verify the MAE value for the Kvasir dataset, as it appears inconsistent with model performance trends.
Conclusion
With these revisions, the manuscript would present a clearer and more robust contribution to medical image segmentation. I recommend acceptance after moderate revision.
Comments on the Quality of English Language
Please refer to the comments above.
Author Response
Comments 1: Dataset characterization is insufficient. The Re-TMRS dataset includes only 10 clinical cases, which is very limited for training a Transformer-based architecture. Please provide additional details on patient diversity, imaging conditions, acquisition equipment, annotation protocol, and potential biases.
|
Response 1: Thank you for this important suggestion. We have added a detailed description of the Re-TMRS dataset in Section 4.1, including:
1. Patient demographics (age, gender, tumor types);
2. Imaging equipment and acquisition settings (endoscope type, resolution, lighting conditions);
3. The annotation process (performed by two experienced urologists, with an inter-rater reliability analysis);
4. A discussion of potential biases (e.g., imaging artifacts, occlusions) and how they were addressed.
These additions are found in Section 4.1, pages 11–12.
Comments 2: Reference issues. Several citations appear as “Error! Reference source not found.” These must be corrected before publication.
|
Response 2: We sincerely apologize for this oversight. All reference errors have been corrected, and the reference list has been verified for consistency and completeness. All citations now correctly point to the corresponding entries in the reference list.
Comments 3: English editing required. The manuscript contains grammatical errors, awkward phrasing, and repeated sentences. A professional English revision is recommended.

Response 3: We have thoroughly revised the manuscript for English language and clarity. The revised version has been proofread by a native English speaker and a professional editing service. Sentences have been restructured for better flow, and redundant content has been removed.
Comments 4: Strengthen the Discussion. The Discussion section reiterates parts of the Introduction and does not analyze failure cases, computational cost, or architectural limitations. Including these would significantly improve the scientific value.

Response 4: Thank you for this suggestion. We have significantly expanded the Discussion (Section 5) to include:
1. An analysis of typical failure cases (e.g., severely blurred boundaries, small tumor regions);
2. A computational cost analysis (inference time, memory usage, comparison with other models);
3. The limitations of the proposed architecture and suggestions for future improvement.
These additions are found in Section 5, pages 19–20.
Comments 5: Clarify module motivation. MFA, BSA, and HCA incorporate known mechanisms (CCA, CFP, PSA, etc.). Please elaborate on why these specific components were combined and how each uniquely addresses the identified challenges.

Response 5: Thank you for your valuable suggestion. We agree that it is important to clarify the novelty of our proposed modules and how they differ from existing methods. Accordingly, we have substantially revised Sections 3.3, 3.4, and 3.5 of the manuscript to explain more clearly and in more detail how our BSA, MFA, and HCA modules differ from and improve upon related structures (e.g., CFP, CCA, PSA). Specifically:
1. In Section 3.3 (MFA), we now explicitly compare our hierarchical fusion strategy with prior multi-scale approaches, emphasizing the novel integration of CCA, CFP with HFF, and PSA within a unified framework tailored to the boundary ambiguity of renal tumors.
2. In Section 3.4 (HCA), we highlight the dual-branch design that combines Transformer-based global modeling with CNN-based local detail extraction, which distinguishes it from previous single-modality attention mechanisms.
3. In Section 3.5 (BSA), we elaborate on the three-stream decomposition (foreground, background, boundary) and its clinical motivation, distinguishing it from conventional boundary detection modules.
These revisions can be found in the updated manuscript on pages 7–10 (Sections 3.3–3.5).
Comments 6: Add statistical significance analysis. Improvements over competing methods are often small (0.3–1.9%). A statistical test (e.g., Wilcoxon signed-rank test) would strengthen the claims.

Response 6: We have performed a Wilcoxon signed-rank test on the mDice and mIoU results across all compared methods. The results show that the improvements of our method are statistically significant (p < 0.05). A new subsection, “Statistical Significance Analysis”, has been added to Sections 4.2.2 and 4.2.5, on pages 14 and 16.
Comments 7: Improve presentation of Figures. Some qualitative masks in Figure 6 appear low-resolution. Enhancing clarity would improve comparison.

Response 7: We have regenerated all qualitative segmentation masks at higher resolution and with improved contrast. Figure 6 has been updated accordingly, and all labels are now clearer.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed most of the major comments from the previous review round, including an improved dataset description, clearer motivation of the proposed modules, added statistical significance analysis, and an expanded discussion section.
However, some issues still require attention before final acceptance. In particular, several references in the revised manuscript still appear as “Error! Reference source not found” and must be fully corrected. Minor issues related to English expression and figure readability also remain.
I recommend acceptance after minor revision, conditional on correcting the remaining reference errors and performing a final language and presentation polish.
Comments on the Quality of English Language
Please refer to the comments above.
Author Response
Comments 1: However, some issues still require attention before final acceptance. In particular, several references in the revised manuscript still appear as “Error! Reference source not found” and must be fully corrected.
|
Response 1: We sincerely apologize for this oversight. All reference errors have been corrected, and the reference list has been verified for consistency and completeness; all citations now point to the corresponding entries in the reference list. We have also removed the citations to several Chinese journal articles from the original manuscript, which aligns the reference list more closely with international publishing standards and improves its relevance for a global readership. For all retained references, we have verified the entries and added complete DOIs or stable article links. These steps have fully resolved the “Reference source not found” errors and improve the accessibility and reliability of the citations. The corrections have been applied throughout the manuscript, most notably in the Introduction and Related Work sections. The reference list is now accurate, complete, and properly formatted.
Comments 2: Minor issues related to English expression and figure readability also remain.

Response 2: We agree with the reviewer’s suggestion. We have performed a thorough language polish to improve clarity and fluency, with special attention to grammatical and stylistic issues. The revised version has been proofread by a native English speaker and a professional editing service. In addition, we have optimized the resolution, labels, and layout of all figures to enhance readability.
Author Response File: Author Response.docx
