A Multi-Scale Cross-Layer Fusion Method for Robotic Grasping Detection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This study introduces MCFG-Net, a novel deep learning framework for robotic grasp detection that integrates multi-scale spatial feature enhancement (MSFEM) and a cascade fusion attention module (CFAM) within an encoder–decoder architecture to improve the accuracy and robustness of pixel-level grasp prediction. The model achieves state-of-the-art performance on benchmark datasets (99.6% on Cornell, 95.5% on Jacquard) and demonstrates strong generalization in real-world robotic grasping scenarios.
Here are my comments:
The architectural improvements are evolutionary rather than revolutionary. Many of the techniques used, such as attention modules and multi-scale design, are well established in related work.
The CFAM design is somewhat ad hoc: although dual-path attention is reasonable, there is no theoretical justification or systematic exploration of the design space to show its optimality. Why use three dilation rates? Why not use dynamic fusion or learnable dilation?
The computational overhead of MSFEM and CFAM is not well characterized. Although inference time is briefly discussed, the per-module increase in FLOPs and parameters should be quantified.
The use of GeLU as the activation function is only marginally justified. An empirical comparison with ReLU or Swish in the ablation study would clarify its contribution.
The 3D grasp transformation is described, but implementation details such as camera calibration error, hand-eye matrix precision, and latency in real-world execution are lacking. These details are needed because MCFG-Net is tested on a physical robot.
The introduction does not review recent studies, such as "Machine learning-enhanced soft robotic system inspired by rectal functions to investigate fecal incontinence" and "Predicting flow status of a flexible rectifier using cognitive computing".
No statistical significance analysis or variance reporting is provided. Accuracy is reported to 0.1%, but without confidence intervals or standard deviations, the claims of superiority are weakly supported.
Robustness metrics are missing, for example noise resilience and performance under partial occlusion.
Failure case analysis is too superficial. There is little attempt to quantify failure modes or to apply model introspection (e.g., Grad-CAM) to interpret the attention maps.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
- Add the real-robot success rates (98.5% and 95%) to the abstract.
- In the introduction, explain how MSFEM's same-layer fusion differs from dilated-convolution and ASPP approaches.
- Define the acronyms CBG and TBG in the model architecture section.
- Specify the output channel dimensions after each down-sampling and up-sampling block.
- Use consistent subscripts for T_trc and T_trr and describe each transformation matrix.
- Clarify how the MSFEM channels are partitioned (e.g., 256→4×64) and on what criteria.
- Annotate the pooling scales and convolution kernel sizes in Figure 2.
- Correct the typographical errors in CFAM Equations 6–9 (operators and subscripts).
- Provide a brief algorithmic summary of the CFAM dual-path aggregation.
- In the training details, list the initial learning rate and the warm-up/cosine schedule.
- State the random seed and the hardware/software (driver/CUDA) used, for reproducibility.
- In Tables 1–4, report mean ± σ for accuracy and success-rate metrics over at least three runs.
- Define the "IW" and "OW" splits in the table captions.
- For the real-robot experiments, report calibration accuracy and gripper force parameters.
- Explain how failure cases were counted (drops vs. mis-detections).
- Propose concrete mitigations for failure cases, such as adding tactile feedback or improved planning.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper is generally interesting, mostly properly structured, and within the scope of the journal. However, it does not meet the quality standards of a regular journal paper, so improvements are needed. The authors are invited to consider the following suggestions:
- The title of the paper needs reconsideration. It is advisable to avoid punctuation such as the colon (:), and the abbreviation MCFG adds no merit to the title, so the first part could be omitted. A simpler, clearer title helps to enhance future citations of the paper.
- The contributions of the paper are stated directly at the end of the Introduction section, which is positive. However, they are poorly formulated, unfocused, and not strong enough. They should be separated from the preceding text as a new paragraph (line break) and improved/updated.
- It is advisable to add a short overview of the paper's contents at the end of the Introduction section, such as "The rest of the paper is organized as follows: in Section 2 ...."
- The Conclusions section is weak and too limited. It should be expanded and more directly related to the results, with relevant general conclusions resulting from the study.
- In the Conclusions section, future research directions should be addressed in a separate second paragraph (new line). This is very attractive to interested readers and greatly improves future citations of the paper.
- To highlight the contemporary nature of the paper's subject, the authors are advised to add some journal references from 2025.
- The list of keywords needs reconsideration. It should be expanded to at least 6 keywords and revised to better cover the core subject of the study.
- The authors are invited to avoid first-person expressions (we, our), especially in the abstract, and to convert them to the passive voice.
- Part of the related literature review is presented in Section 1 and part in Section 2. Consider merging them or adopting another, more logical organization.
- The titles of Sections 3 and 4 should be improved to be more informative and clear.
- Section 4 is too long relative to its title and subject, compared with other parts of the paper.
- It is advisable to introduce a Discussion section and to move parts of the experiments section and other sections into it. This would improve the organization of the paper and align it with the standard journal paper structure.
- To make the figures easily readable, avoid abbreviations in figure captions or define them there again (MSFEM, CFAM, etc.).
I regret to advise against publication of the paper in its current form. However, if the authors are willing to consider these remarks and improve the paper, it can be reconsidered for publication, as the study contains valuable results.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I have reviewed the revised manuscript and I am pleased to report that the authors have made substantial improvements in response to my previous comments. The manuscript now addresses the key concerns I raised, and the revisions enhance both the clarity and depth of the work. I recommend that the manuscript be accepted for publication in its current form.
Author Response
We sincerely thank you for your positive evaluation of our revised manuscript and for your valuable comments during the review process. We truly appreciate your time and effort in helping us improve our work.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed all reviews successfully.
Author Response
We sincerely thank you for your positive evaluation of our revised manuscript and for your valuable comments during the review process. We truly appreciate your time and effort in helping us improve our work.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have invested considerable effort in considering and addressing all the remarks and concerns raised, which is well recognized. The responses are well elaborated, and even where corrections are somewhat questionable, the authors defend their position sufficiently well. In my opinion, the paper is significantly improved and could be published in its current form. Only minor remarks remain:
- The contributions of the paper are now well emphasized and could be organized in four bullet points as proposed; only the text of the bullet points could be a bit shorter, to give a focused and sharp expression of the contributions. Just shorten the bullet points slightly, without any essential change.
- Reread the conclusions to improve the style. For example, the second paragraph properly starts with 'Future research directions will focus on:', but then the second point (2) starts with 'future work' again.
- The Table 1 caption starts with an asterisk, but the rest of the caption is not related to the asterisk mark. Possibly start with the definitions of IW and OW, then define the asterisk. Also, instead of 'ours', it might be better to write 'This study'.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf