EMR-YOLO: A Multi-Scale Benthic Organism Detection Algorithm for Degraded Underwater Visual Features and Computationally Constrained Environments
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I would like to thank the authors for their submitted work. The following are my observations and suggested areas for improvement:
- The authors adopted YOLOv8 as the base model without discussing why it is more suitable than other options. A justification for this selection needs to be added.
- Compare the proposed EMR-YOLO model against unmodified baselines such as YOLOv8, YOLOv7, etc.
- Does EMR-YOLO require any hyperparameter tuning?
- Annotate Figures 1, 3, and 4 more clearly.
- Clarify whether you used data augmentation. I suspect you did, but this was not clear in the text.
- The generalizability of EMR-YOLO to different underwater conditions (datasets) was not tested. To report the experiment in sufficient detail, add a threats-to-validity section that discusses the threats to validity faced and any mitigation actions taken to avoid them.
- Were the results averaged over multiple runs? This needs to be clarified.
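As an illustration of the last point, results from repeated training runs are commonly reported as a mean with a standard deviation. The following is only a minimal sketch: the function name `summarize_runs` and the mAP values are hypothetical placeholders and are not taken from the manuscript.

```python
import statistics

def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variability")
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical mAP values from three runs with different random seeds
# (placeholder numbers, not results from the paper).
example_map = [0.80, 0.82, 0.84]
mean_map, std_map = summarize_runs(example_map)
print(f"mAP = {mean_map:.3f} +/- {std_map:.3f} over {len(example_map)} runs")
```

Reporting the number of runs alongside the spread lets a reader judge whether a difference between two models is larger than the run-to-run variation.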
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper "EMR-YOLO: A Multi-scale Benthic Organism Detection Algorithm for Degraded Underwater Visual Features and Computationally Constrained Environments" is devoted to the development of EMR-YOLO for benthic organism detection in complex underwater conditions.
The authors' main goal is to create a highly accurate and resource-efficient detection algorithm that correctly identifies objects despite the degradation of underwater images (turbidity, color distortion, movement), runs in real time on devices with limited computing resources, and adapts to variations in the size and shape of benthic organisms.
The problem addressed is relevant, since a solution can contribute to monitoring marine ecosystems and automating research. The paper takes into account that underwater images suffer from low contrast, color distortion, and blur; that benthic organisms are often barely noticeable, have complex shapes, and blend into the background; and that underwater robotic equipment requires low-resource algorithms (GPU and memory limitations).
The authors present the following contributions as the scientific novelty. Multi-threaded downsampling with cross-layer feature fusion and gradient control is proposed to reduce computational costs without loss of accuracy. A dynamically sparse architecture based on DCNv4 is proposed to improve the detection of small and deformed objects.
The following comments arose while reading the paper.
- The authors use YOLOv8 as a basis, but do not explain why they did not choose newer versions (YOLOv9/YOLOv10) or alternative architectures. There is no comparison of the efficiency of EMR-YOLO when integrated with other SOTA models.
- The studies (Table 3) do not test the contribution of each module (MBFDown, RTRA, EDSHead) separately in combination with different backbones. For example, how does RTRA perform without MBFDown?
- Only one dataset (DUO) is used. No tests on other datasets (e.g., URPC, SUOD) to evaluate generalization.
- The criteria for data filtering are unclear: 26% of the original images are removed as "poor quality", but the quality metrics or the selection method are not described.
- Table 4 compares EMR-YOLO with YOLOv3–v11, but the parameters of the models (e.g., input image size, hyperparameters) are not unified. The difference in FLOPs may be due to different settings rather than architectural advantages.
- No comparison with state-of-the-art methods for underwater detection (e.g., UWNet, UPDET). The "superiority over SOTA" claim requires more context.
- "Efficiency for embedded devices" is claimed, but there is no data on inference rate (FPS) on GPU/CPU or power consumption. The tables only contain FLOPs and parameters, which is not enough to assess real-world performance.
- RTRA (Section 3.2) does not specify the size of the S×S regions. This is critical for reproducibility.
- MBFDown (Figure 2) uses 3×3 convolution after MaxPool, but does not explain the choice of such operations (why not Depthwise Conv?).
- Formulas (7)–(9) for EDSHead contain undefined elements (e.g., the functions a1,β1 in (9) are described superficially).
- Figure 7 ("real results") does not contain examples with false positives/false negatives, which is important for assessing model errors.
- Section 4.6 states that YOLOv7 is better at detecting multiscale objects than EMR-YOLO. This contradicts the main thesis about the superiority of the proposed method.
- It is stated that "data is contained in the paper", but the DUO dataset is not publicly available (no link to the repository). This prevents verification of the results.
- Patents are mentioned (Section 7), but no details are disclosed.
These criticisms do not negate the significance of the work (the innovative MBFDown and RTRA modules), but unless these issues are addressed, the paper lacks scientific rigor.
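To make the comment on inference rate concrete, throughput is usually measured by timing repeated forward passes after a few warm-up calls. The sketch below is only illustrative: `measure_fps` and `dummy_infer` are hypothetical names, and the dummy function stands in for a real detector, which the manuscript's own code would supply.

```python
import time

def measure_fps(infer, sample, warmup=5, iters=50):
    """Estimate frames per second for a callable that processes one input."""
    for _ in range(warmup):
        infer(sample)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(iters):
        infer(sample)
    elapsed = time.perf_counter() - start
    return iters / elapsed

def dummy_infer(x):
    # Stand-in for a detector forward pass (assumption for illustration only).
    return sum(v * v for v in x)

fps = measure_fps(dummy_infer, list(range(1000)))
print(f"{fps:.1f} FPS")
```

On a GPU, execution is typically asynchronous, so the device generally has to be synchronized before each clock reading for the timing to be meaningful. Input size and batch size should be reported together with the FPS figure.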
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors proposed the method "EMR-YOLO: A Multi-scale Benthic Organism Detection Algorithm for Degraded Underwater Visual Features and Computationally Constrained Environments" to address the BOD problem.
The proposed research could be improved, and reviewed more appropriately, once the following comments, several of which call for major revision, have been fully addressed:
- In the abstract, the authors should clearly state the results of their proposed model compared with the state of the art.
- In the description of Figure 2, the authors do not state what problem they are solving or how it relates to the state of the art. Merely describing how the dimensions are obtained at each level does not make for sound research.
- Likewise, the authors present Figure 1 without explaining it or stating exactly how it affects the subsequent modules shown in Figures 1-4 of the proposed system. (Major revision)
- Many terms in the equations and figures are not well explained. Exactly how they influence the system design should be made clear.
- The authors' comparison of the proposed EMR-YOLO with YOLOv8 and EDSHead should clearly demonstrate how the outcomes of the compared methods bear on the proposed method. (Major revision)
- It is noted that the EDSHead method is developed only in the latter part of the manuscript. This development should be detailed in the model description, including how it affects the proposed methodology and solves the BOD problem. It would help if the authors also presented it as clear algorithms. (Major revision)
- Again, the authors compare their proposed scheme with MLCA [35], GAM [36], CA [37], etc. If these comparisons cannot be accounted for by detailing how they relate to the proposed scheme, given that the authors adopted RTRA, they should not be included unless they are clearly justified. (Major revision)
The references should be updated accordingly once all the above major revisions have been completed in the revised manuscript.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I thank the authors for their detailed and clear answers. I believe that the article can be recommended for publication.
Author Response
Comments: I thank the authors for their detailed and clear answers. I believe that the article can be recommended for publication.
Response: We would like to express our sincere gratitude to you and the reviewers for your valuable time and professional insights regarding our manuscript. We are very pleased to learn that the reviewers acknowledged our detailed responses and revisions based on the first-round comments and provided a favorable recommendation for publication. Thank you once again for your kind support and constructive feedback throughout the review process.
Reviewer 3 Report
Comments and Suggestions for Authors
As far as I can see, some of the minor revisions were not properly addressed by the authors, and some major revisions, including comments 5 and 6, have still not been properly addressed.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
The authors propose the EMR-YOLO scheme, which was developed from YOLOv8 and optimized with newly developed modules: MBFDown, EDSHead, and RTRA.
- The authors described these three modules well, but they should clearly show what these developments have enhanced in YOLOv8.
- I expected the limitations of YOLOv8 and their optimization to be made clear on the basis of the three modules (MBFDown, EDSHead, and RTRA), clearly demonstrating how these developments have optimized YOLOv8.
- The authors should clearly show the comparison of their model with YOLOv8 on whatever performance metrics they use to assess the proposed EMR-YOLO scheme, such as the AP metrics.
- The authors also conducted ablation studies with the three proposed modules. However, Figure 6 compares precision against recall for EDSHead, EDSHead+RTRA, and the proposed EMR-YOLO scheme, when these should have been compared with YOLOv8. It is unclear why the proposed EMR-YOLO scheme is compared with the modules developed within it, such as EDSHead and EDSHead+RTRA. This makes the presentation of the research confusing.
- Since the proposed model contains some good information, I suggest that the presentation be made clearer. The main focus should be on establishing the limitations of YOLOv8 and on how the three proposed modules (MBFDown, EDSHead, and RTRA) optimize it.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf