Noise-Perception Multi-Frame Collaborative Network for Enhanced Polyp Detection in Endoscopic Videos
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thanks for sharing an interesting study. My comments are below.
1. Spell out STFT in full and describe it in more detail.
2. Section 2.3 is too short. More relevant work should be included.
3. In Figure 3(c), I suggest using LoFre and HiFre maps instead to make it more readable.
4. "attention mechanisms like SE and CBAM [38] [39] to suppress noise and artifacts effectively, enhancing feature selectivity and accuracy." In the discussion, explain how the attention mechanism suppresses noise.
5. "the latter on global features such as shapes and contours." Contours and the detailed aspects of shapes are high-frequency features because they involve rapid changes in pixel values. Please revise.
6. The use of multi-head attention is not shown in the figures.
7. RDN outperforms this work, particularly in recall. A careful discussion of the limitations of this work in comparison with the RDN study is required.
8. An overview of the study method in the form of a figure should be added to improve readability.
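The point in comment 5 can be illustrated with a minimal sketch. It assumes an ideal circular FFT mask, a synthetic disc image, and a hypothetical `cutoff` parameter; none of these come from the manuscript, and this is not the authors' decomposition. The sketch splits an image into low- and high-frequency parts and compares the high-frequency energy near the contour with the energy far from it.

```python
import numpy as np


def split_frequencies(image, cutoff):
    """Split a 2-D image into low- and high-frequency parts.

    Uses an ideal circular low-pass mask in the Fourier domain.
    `cutoff` is a radius in frequency bins (hypothetical parameter).
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    low_mask = radius <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low  # everything the low-pass filter removed
    return low, high


# Synthetic stand-in for a polyp: a bright disc on a dark background.
size, disc_radius = 64, 16
yy, xx = np.ogrid[:size, :size]
dist = np.hypot(yy - size // 2, xx - size // 2)
disc = (dist <= disc_radius).astype(float)

low, high = split_frequencies(disc, cutoff=8)

# Compare high-frequency energy near the contour with energy far from it.
near_contour = np.abs(dist - disc_radius) <= 2
far_from_contour = np.abs(dist - disc_radius) > 8
edge_energy = np.mean(high[near_contour] ** 2)
flat_energy = np.mean(high[far_from_contour] ** 2)

print(f"high-frequency energy near contour: {edge_energy:.4f}")
print(f"high-frequency energy elsewhere:    {flat_energy:.4f}")
```

In this toy setting the high-frequency residual concentrates around the disc boundary, which is consistent with treating contours as high-frequency content.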
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have submitted an interesting, well-written manuscript in good English. Potential readers will appreciate that the manuscript is accompanied by source code. I see no major problems with this manuscript. After reading it, I found only minor issues and points where the authors could further improve its quality.
1) It would be helpful to describe the programming languages, libraries, and computer configuration used.
2) Since deep learning involves a lot of experiments, the publication of training curves would be nice.
3) In Section 4.1, the main characteristics of the applied databases could be summarized in a table. This way, the readers could better see the properties of the databases.
4) Every equation should end with "." or ",".
5) In the table captions, please indicate what the bold and underlined entries mean.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have presented a noise and artifact suppression framework for polyp detection in endoscopic videos. Three novel methodologies are implemented:
1. High-Low Frequency Feature Fusion (HFLF) framework that improves the model’s sensitivity to high-frequency details to better combat the noisy conditions.
2. STFT-LSTM (Long Short-Term Memory) Polyp Detection (SLPD) module, which combines STFT and LSTM networks that capture dynamic information between frames in video sequences.
3. Image Augmentation Polyp Detection (IAPD) module, which applies various augmentation strategies (e.g., blur, noise, scaling, and rotation) to improve performance on low-quality images.
The combination of these three methodologies yields better polyp detection performance than counterpart methods in terms of F1 score, F2 score, accuracy, and recall. The results and discussion, supported by output images, help the reader understand the article better.
While the graphical explanation (using the figures) of the three methods (HFLF, SLPD, and IAPD) is excellent, the article would be greatly enriched if they were supported by the corresponding mathematical equations in detail.
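For readers less familiar with the F1 and F2 scores mentioned above, the following is a minimal sketch of the standard F-beta definition. The function name `f_beta` and the precision/recall values are hypothetical and chosen only for illustration; they are not results from the manuscript.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 gives F1; beta = 2 gives F2, which weights recall more heavily.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical detectors with mirrored precision and recall.
high_recall = {"precision": 0.6, "recall": 0.9}
high_precision = {"precision": 0.9, "recall": 0.6}

f1_a = f_beta(high_recall["precision"], high_recall["recall"], beta=1)
f1_b = f_beta(high_precision["precision"], high_precision["recall"], beta=1)
f2_a = f_beta(high_recall["precision"], high_recall["recall"], beta=2)
f2_b = f_beta(high_precision["precision"], high_precision["recall"], beta=2)

print(f"F1: high-recall {f1_a:.3f}, high-precision {f1_b:.3f}")
print(f"F2: high-recall {f2_a:.3f}, high-precision {f2_b:.3f}")
```

The two hypothetical detectors tie on F1, but F2 favors the high-recall one, which is why a recall gap between methods can matter more under F2 than under F1.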
Author Response
Please see the attachment.
Author Response File: Author Response.pdf