Article
Peer-Review Record

High-Dimensional Attention Generative Adversarial Network Framework for Underwater Image Enhancement

Electronics 2025, 14(6), 1203; https://doi.org/10.3390/electronics14061203
by Shasha Tian 1, Adisorn Sirikham 1,*, Jessada Konpang 1 and Chuyang Wang 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 February 2025 / Revised: 15 March 2025 / Accepted: 17 March 2025 / Published: 19 March 2025
(This article belongs to the Special Issue Artificial Intelligence in Graphics and Images)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

To address the issues of color distortion, underexposure, and blurring in underwater images, the paper proposes a high-dimensional attention generative adversarial network framework (HDAGAN) for underwater image enhancement. The proposed method produces visually more pleasing enhancement results. Overall, the paper is meaningful and interesting. However, there are still several areas that need further clarification and attention. Please address the following points:

1. Chapter Structure Review: Please review the structure of Sections 2 and 3 to ensure that it conforms to the journal's "Instructions for Authors" writing guidelines.

2. Lack of Experimental or Citation Support for Some Claims: Certain claims in the paper lack supporting experimental evidence, citations, or detailed descriptions. For example:

“Due to the inherent characteristics of underwater images, the distribution of key features across different color channels and pixel positions is inconsistent. Consequently, varying receptive fields are required to effectively capture scene-specific information” and “By cascading two distinct channel attention mechanisms, the CARM module effectively filters out irrelevant feature information from multi-channel inputs while retaining channel features that are conducive to image encoding”. Please carefully check for other similar issues throughout the paper.

3. Details of the Strengthen Operate Subtract Module: The description of the Strengthen Operate Subtract Module (SOSM) is insufficient. Please provide more detailed information.

4. Incomplete Introduction of Loss Functions: The introduction of the loss functions is incomplete. For example, the paper lacks a description of the parameterization of the loss functions, and the selection of comparison variables in Figure 6 is inappropriate.

5. Non-Standard Figures in Figures 10 and 11: There is a significant issue with Figures 10 and 11, as the images appear to be improperly cropped, which is disappointing.

6. Missing Recent References: The paper lacks the following recent references, which should be added:

*PODB: A learning-based polarimetric object detection benchmark for road scenes in adverse weather conditions.

*FDNet: Fourier transform guided dual-channel underwater image enhancement diffusion network. 

*Review of polarimetric image denoising.

Please revise and improve the paper according to the above comments.

Author Response

Comments 1: Chapter Structure Review: Please review the structure of Sections 2 and 3 to ensure that it conforms to the journal's "Instructions for Authors" writing guidelines.

Response 1: Thank you for pointing this out. We have carefully reviewed the structure of Sections 2 and 3 to ensure full compliance with the journal’s "Instructions for Authors." Upon revision, we confirmed that the logical flow, subsection hierarchy, and formatting (e.g., headings, equations, figures, and tables) align with the journal’s guidelines.

Comments 2: Lack of Experimental or Citation Support for Some Claims: Certain claims in the paper lack supporting experimental evidence, citations, or detailed descriptions. For example: “Due to the inherent characteristics of underwater images, the distribution of key features across different color channels and pixel positions is inconsistent. Consequently, varying receptive fields are required to effectively capture scene-specific information” and “By cascading two distinct channel attention mechanisms, the CARM module effectively filters out irrelevant feature information from multi-channel inputs while retaining channel features that are conducive to image encoding”. Please carefully check for other similar issues throughout the paper.

Response 2: We agree with this comment. Thank you for your constructive comments and valuable suggestions, which have greatly improved the quality and rigor of this paper. We have added new sections, 3.1.1 (Channel Attention) and 3.1.2 (Pixel Attention), in which we provide comprehensive explanations of the channel attention and pixel attention mechanisms in CARM, supported by illustrative figures and mathematical formulations. In particular, we highlight the attention weight allocation strategies proposed in our paper, as detailed on pages 8 and 9.
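For readers unfamiliar with these mechanisms, the following is a minimal PyTorch-style sketch of a channel attention block cascaded with a pixel attention block, as commonly used in image-enhancement networks. The layer layout, reduction ratio, and module names are illustrative assumptions for this example only and are not the exact CARM design described in the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global spatial average per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                               # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))                # re-weight each channel

class PixelAttention(nn.Module):
    """Spatial (per-pixel) attention map shared across channels (illustrative)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
            nn.Sigmoid(),                               # one weight per spatial position
        )

    def forward(self, x):
        return x * self.attn(x)                         # re-weight each pixel

class CascadedAttention(nn.Module):
    """Channel attention followed by pixel attention, cascaded in sequence."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.pa = PixelAttention(channels)

    def forward(self, x):
        return self.pa(self.ca(x))

if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 128)                # dummy feature map
    out = CascadedAttention(64)(feats)
    print(out.shape)                                    # torch.Size([1, 64, 128, 128])
```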

Comments 3: Details of the Strengthen Operate Subtract Module: The description of the Strengthen Operate Subtract Module (SOSM) is insufficient. Please provide more detailed information.

Response 3: Agree. Thank you for pointing this out. We have accordingly revised Section 3.2 ("Strengthen Operate Subtract Module") to emphasize this point. Specifically, we added a visual illustration (Fig. 4) of the SOSM and provided detailed explanations using mathematical formulas to clarify its mechanism. These revisions can be found in Section 3.2: page 9, paragraph 2.

Comments 4: Incomplete Introduction of Loss Functions: The introduction of the loss functions is incomplete. For example, the paper lacks a description of the parameterization of the loss functions, and the selection of comparison variables in Figure 6 is inappropriate.

Response 4: We agree with this comment and are deeply thankful for your insights. To supervise the adversarial training of the conditional generative adversarial network and preserve the integrity of image detail features, we have incorporated additional essential loss functions into the objective function: the adversarial loss, the global similarity loss, and the content perception loss. The newly added content can be found on page 11.
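As a rough illustration of how such a composite objective is usually assembled, the sketch below combines an adversarial term, a global similarity (L1) term, and a VGG-feature content term. The weighting coefficients and the choice of VGG-19 layers are hypothetical placeholders for this example, not the parameterization used in the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical weights; the paper's actual values may differ.
LAMBDA_L1, LAMBDA_CONTENT = 100.0, 10.0

class VGGContentLoss(nn.Module):
    """Content-perception loss: L1 distance between frozen VGG-19 feature maps."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)                     # frozen feature extractor
        self.vgg = vgg

    def forward(self, fake, real):
        return nn.functional.l1_loss(self.vgg(fake), self.vgg(real))

def generator_loss(d_fake_logits, fake, real, content_loss_fn):
    """Adversarial + global similarity (L1) + content-perception terms."""
    adv = nn.functional.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits)) # encourage fooling the discriminator
    sim = nn.functional.l1_loss(fake, real)             # global similarity (pixel-level)
    content = content_loss_fn(fake, real)               # high-level feature match
    return adv + LAMBDA_L1 * sim + LAMBDA_CONTENT * content
```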

Comments 5: Non-Standard Figures in Figures 10 and 11: There is a significant issue with Figures 10 and 11, as the images appear to be improperly cropped, which is disappointing.

Response 5: We agree with this comment. We have accordingly revised Figures 10 and 11 in Section 4.5 ("Ablation Study") to address the inadvertent cropping errors in their original versions. The figures have been corrected to ensure the clarity and completeness of the visual results. These revisions can be found on pages 18, 19, and 20.

Comments 6: Missing Recent References: The paper lacks the following recent references, which should be added:

*PODB: A learning-based polarimetric object detection benchmark for road scenes in adverse weather conditions.

*FDNet: Fourier transform guided dual-channel underwater image enhancement diffusion network. 

*Review of polarimetric image denoising.

Response 6: Thank you for your feedback. We agree with this comment. Therefore, we have incorporated the suggested references into the manuscript to enhance the academic impact, credibility, and dissemination value of our work. These additions can be found in References 24, 43, and 44.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes the HDAGAN framework, a generative adversarial network (GAN)-based framework with an encoder-decoder structure for underwater image enhancement.


There are several methodological questions.

  1. The integration mechanism of channel and pixel attention in CARM lacks detailed explanation and theoretical support (e.g., attention weight allocation strategy).  
  2. The authors only validate the combination of L1 + L2 + adversarial loss. What about other losses (e.g., perceptual loss, color consistency loss) relevant to underwater scenarios?
  3. The data splitting is opaque; for example, details on train and test splits (e.g., scene or depth stratification) are unclear, raising data leakage concerns.
  4. The proposed method should compare with recent Transformer-based UIE methods (e.g., Uformer, SwinIR) for the breadth of experimental validation.
  5. The HDAGAN framework should be tested on extreme scenarios (e.g., high turbidity, extremely low light), which is important and necessary.  
  6. No failure cases (e.g., poor enhancement on certain images) are shown, weakening credibility.  

Author Response

Comments 1: The integration mechanism of channel and pixel attention in CARM lacks detailed explanation and theoretical support (e.g., attention weight allocation strategy).

Response 1: We agree with this comment. Thank you for your constructive comments and valuable suggestions, which have greatly improved the quality and rigor of this paper. We have added new sections, 3.1.1 (Channel Attention) and 3.1.2 (Pixel Attention), in which we provide comprehensive explanations of the channel attention and pixel attention mechanisms in CARM, supported by illustrative figures and mathematical formulations. In particular, we highlight the attention weight allocation strategies proposed in our paper, as detailed on pages 8 and 9.

Comments 2: The authors only validate the combination of L1 + L2 + adversarial loss. What about other losses (e.g., perceptual loss, color consistency loss) relevant to underwater scenarios?

Response 2: We agree with this comment and are deeply thankful for your insights. To supervise the adversarial training of the conditional generative adversarial network and preserve the integrity of image detail features, we have incorporated additional essential loss functions into the objective function: the adversarial loss, the global similarity loss, and the content perception loss. The newly added content can be found on page 11.

Comments 3: The data splitting is opaque; for example, details on train and test splits (e.g., scene or depth stratification) are unclear, raising data leakage concerns.

Response 3: Thank you for your feedback. We agree with this comment. Therefore, in Section 4.3, we have newly incorporated the EUVP dataset to validate the performance of HDAGAN in diverse environments and comprehensively assess the adaptability of our model. Additionally, we provide detailed descriptions of the training and testing phases for all three datasets (UIEB, URPC, and EUVP). These updates can be found on page 14.

Comments 4: The proposed method should compare with recent Transformer-based UIE methods (e.g., Uformer, SwinIR) for the breadth of experimental validation.

Response 4: Agree. We have accordingly modified Section 4.4 ("Analysis of Experimental Results") to broaden the scope of experimental models. Specifically, we added a Transformer-based UIE method (the URSCT-SESR model) and included the corresponding results in Figures 8, 9, and 10 and Table 3. These revisions can be found on pages 14-18.

Comments 5: The HDAGAN framework should be tested on extreme scenarios (e.g., high turbidity, extremely low light), which is important and necessary.

Response 5: Thank you for your feedback. We agree with this comment. Therefore, in Section 4.3, we have introduced the EUVP dataset, which includes both supervised and unsupervised subsets specifically designed for underwater image enhancement. The EUVP dataset contains abundant samples under extremely low-light conditions (e.g., dark underwater environments), addressing extreme low-illumination scenarios. Additionally, the URPC dataset, associated with underwater robot grasping competitions, covers more challenging scenarios such as turbid waters, where low visibility is a critical issue. These updates can be found on page 14. Furthermore, we have elaborated on the channel attention and pixel attention mechanisms of the CARM module, particularly emphasizing the attention weight allocation strategy. This refinement helps the HDAGAN framework achieve robust performance in extreme scenarios (e.g., high turbidity, extremely low light). These updates can be found on pages 8 and 9.

Comments 6: No failure cases (e.g., poor enhancement on certain images) are shown, weakening credibility.

Response 6: Agree. We have accordingly revised the manuscript to address the limitations of the HDAGAN model in certain image enhancement scenarios. During the revision period, we improved the enhancement performance through minor architectural adjustments. We acknowledge that further optimization is required. Future work will focus on extending training cycles and refining the temporal dynamic fusion mechanism to achieve more robust results.

Reviewer 3 Report

Comments and Suggestions for Authors
  1. Please add more visual illustrations of the Strengthen Operate Subtract Module (SOSM) to help readers better understand its function.
  2. The experimental results should include a comparison of the FLOPs and parameters between the proposed model and other methods.
  3. The paper was tested only on the UIEB and URPC datasets, which cover some underwater scenarios but are insufficient for fully evaluating the model’s adaptability. We suggest incorporating additional datasets, such as EUVP, SQUID, or UFO-120, to validate HDAGAN’s performance across diverse environments. Additionally, please verify the full name of the URPC dataset. Public sources indicate it should be "Underwater Robot Picking Contest", not "Underwater Vehicle Professional Competition". If incorrect, we recommend correcting it and providing a proper reference to ensure accuracy.
  4. The paper uses cGAN, L1, and L2 loss to enhance image quality, ensuring realism and structural consistency. However, L1/L2 loss mainly focuses on pixel similarity and may miss high-level semantics and fine textures. We suggest adding VGG Perceptual Loss or Feature Matching Loss to further improve visual quality and detail preservation.
Comments on the Quality of English Language

Could be improved.

Author Response

Comments 1: Please add more visual illustrations of the Strengthen Operate Subtract Module (SOSM) to help readers better understand its function.

Response 1: Agree. Thank you for pointing this out. We have accordingly revised Section 3.2 ("Strengthen Operate Subtract Module") to emphasize this point. Specifically, we added a visual illustration (Fig. 4) of the SOSM and provided detailed explanations using mathematical formulas to clarify its mechanism. These revisions can be found in Section 3.2: page 9, paragraph 2.

Comments 2: The experimental results should include a comparison of the FLOPs and parameters between the proposed model and other methods.

Response 2: Agree. We have accordingly revised Section 4.5.1 to emphasize this point. Specifically, we added a comparative analysis of FLOPs and parameters between our model and other methods, demonstrating its efficiency and competitive performance. The newly added content can be found on page 18.
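For context, parameter counts of the kind reported in Section 4.5.1 are typically obtained as in the short sketch below. The model here is a small stand-in, not HDAGAN, and the FLOPs profiler mentioned in the comment is only one possible tool; the exact measurement setup in the paper may differ.

```python
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in model for illustration only (not the actual HDAGAN architecture).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, 3, padding=1),
)
print(f"Params: {count_parameters(model) / 1e6:.3f} M")

# FLOPs are commonly estimated with a profiling library, e.g.:
#   from thop import profile
#   flops, params = profile(model, inputs=(torch.randn(1, 3, 256, 256),))
```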

Comments 3: The paper was tested only on the UIEB and URPC datasets, which cover some underwater scenarios but are insufficient for fully evaluating the model’s adaptability. We suggest incorporating additional datasets, such as EUVP, SQUID, or UFO-120, to validate HDAGAN’s performance across diverse environments. Additionally, please verify the full name of the URPC dataset. Public sources indicate it should be "Underwater Robot Picking Contest", not "Underwater Vehicle Professional Competition". If incorrect, we recommend correcting it and providing a proper reference to ensure accuracy.

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, in Section 4.3, we have introduced the EUVP dataset to validate the performance of HDAGAN across diverse environments and comprehensively evaluate the adaptability of our model. Additionally, we provide detailed descriptions of the training and testing procedures for all three datasets: UIEB, URPC, and EUVP. Furthermore, we sincerely appreciate your correction regarding the URPC dataset; we have revised its full name to "Underwater Robot Picking Contest" to ensure accuracy. These updates can be found on page 14.

Comments 4: The paper uses cGAN, L1, and L2 loss to enhance image quality, ensuring realism and structural consistency. However, L1/L2 loss mainly focuses on pixel similarity and may miss high-level semantics and fine textures. We suggest adding VGG Perceptual Loss or Feature Matching Loss to further improve visual quality and detail preservation.

Response 4: We agree with this comment and are deeply thankful for your insights. To supervise the adversarial training of the conditional generative adversarial network and preserve the integrity of image detail features, we have incorporated additional essential loss functions into the objective function: the adversarial loss, the global similarity loss, and the content perception loss. The newly added content can be found on page 11.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors properly revised the manuscript according to the comments. I think the current version can be considered for acceptance.

Author Response

We sincerely appreciate your positive feedback on our revised manuscript. Thank you for recognizing and accepting the revisions.

We confirm that all previous comments have been thoroughly addressed, and the manuscript has been meticulously polished to align with the journal’s standards.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have addressed the reviewer's comments well. This reviewer recommends its publication.

Comments on the Quality of English Language

Could be improved. 

Author Response

We sincerely appreciate the reviewer's positive feedback and recommendation for publication. We are grateful that our revisions addressed the comments satisfactorily. Thank you for your valuable input, which has significantly enhanced the quality of our work.
