DAE-GAN: Underwater Image Super-Resolution Based on Symmetric Degradation Attention Enhanced Generative Adversarial Network

Abstract: Underwater images often exhibit detail blurring and color distortion due to light scattering, impurities, and other influences, obscuring essential textures and details. This presents a challenge for existing super-resolution techniques in identifying and extracting effective features, making high-quality reconstruction difficult. This research aims to innovate underwater image super-resolution technology to tackle this challenge. Initially, an underwater image degradation model was created by integrating random subsampling, Gaussian blur, mixed noise, and suspended particle simulation.


Introduction
Underwater imaging plays a pivotal role in marine science and engineering applications, offering significant value to oceanographic research, ecological monitoring, exploration of marine resources, and maintenance of underwater equipment [1]. It not only enhances the monitoring capabilities of marine life and coral reefs but also plays a central role in the precise localization, detection, and identification of underwater targets. However, the complexity of the underwater environment leads to loss of detail, reduced contrast, color distortion, blurred images, and increased noise in underwater images. Consequently, super-resolution technology for underwater images becomes critical. This technology compensates for various quality deficiencies in low-resolution images by reconstructing high-resolution images, thereby significantly enhancing image quality.
Single image super-resolution (SISR) is a well-established challenge within the realm of computer vision and image processing. The goal is to reconstruct a high-resolution image from a provided low-resolution input. The application of deep learning techniques has markedly improved the capabilities in this super-resolution (SR) task [2]; numerous methods based on convolutional neural networks (CNNs) have been suggested [3][4][5][6][7] and have almost dominated this field in recent years. Subsequently, super-resolution methods based on generative adversarial networks (GANs) [8][9][10][11] have garnered attention. These methods enhance the quality of generated images through adversarial training, particularly in terms of restoring image details and textures. Techniques based on GANs have emerged as a significant branch within the field of image super-resolution, demonstrating their broad potential across multiple application domains. Recently, due to advancements in natural language processing, the Transformer [12] has captured the interest of the computer vision community. After making rapid progress on high-level vision applications [13], methods based on the Transformer architecture are now being applied to low-level vision tasks, including super-resolution [14][15][16][17].
Underwater image super-resolution technology faces unique challenges. Firstly, the degradation process from high-resolution (HR) to low-resolution (LR) images underwater is often unknown, deviating from the specific methods (such as bilinear or nearest-neighbor interpolation) typically used to generate paired HR-LR training data. Consequently, when models generate high-resolution images from low-resolution inputs, their performance often falls short of expectations. Secondly, in the complex underwater environment, the abundance of impurities, suspended particles, and the optical properties of water significantly affect image quality, leading to distortion and a considerable amount of noise in underwater images. These factors interfere with the basic structure of images and, to a certain extent, obscure important texture and detail information [18,19]. This makes it difficult for networks to extract and recognize useful features from underwater images, thereby impacting the quality and accuracy of super-resolution reconstruction.
Building upon an in-depth analysis and leveraging insights from the super-resolution domain, specifically ESRGAN [20], this study has developed and optimized a specialized underwater image super-resolution network tailored to address the unique challenges of underwater imaging. Firstly, a novel approach was designed to simulate the degradation process of underwater images, enabling the network to better learn the mapping relationship between HR and LR images, thereby enhancing the quality of underwater image reconstruction. Secondly, a series of innovative adjustments and optimizations were made to the model. A significant improvement involves the integration of an adaptive residual attention module within the dense residual blocks of the model, aimed at bolstering the network's ability to recognize and extract key features in underwater images [21]. Furthermore, a suite of targeted design optimizations was implemented, involving adjustments to the network's loss function and improvements to the configuration of convolutional layers, along with the introduction of spectral normalization (SN) layers to enhance the model's stability and generalization capacity. These comprehensive improvement strategies work in synergy to elevate the model's performance in processing underwater images.
The key contributions of this paper are outlined as follows:
• We propose a method to simulate the actual degradation process for underwater images, enabling the network to better learn the mapping between high-resolution and low-resolution images, thereby enhancing the quality of reconstructed images.
• The adaptive residual attention module designed for underwater images automatically assesses feature importance using an energy function and, when integrated into dense residual blocks, enhances the precision of key feature extraction and the effectiveness of super-resolution reconstruction.
• Experimental results demonstrate that our approach achieves high PSNR values while maintaining low LPIPS scores, two outcomes traditionally seen as opposing.

Deep Networks for Image Super-Resolution
Since SRCNN first applied deep convolutional neural networks to the image SR task and obtained superior performance over conventional SR methods, a variety of deep learning models [22][23][24] have been developed to further elevate image reconstruction quality. For instance, many methods apply more elaborate convolution module designs, such as residual blocks [25][26][27] and dense blocks [28], to boost the representational power of the models. Some studies also investigate alternative architectures like recursive neural networks and graph neural networks [29]. To enhance perceptual quality, [30] employs adversarial learning to generate more realistic results. By integrating attention mechanisms, [31][32][33] achieve further improvement in terms of reconstruction fidelity. Recently, a new wave of Transformer-based networks [34] has been introduced, consistently setting new benchmarks in the SR field and demonstrating the robust representational capabilities of the Transformer.

Degradation Models
In current super-resolution research, many networks still rely on simple interpolation methods [35] or traditional degradation models [36], which often struggle to accurately simulate the complex degradation phenomena present in the real world. Underwater images are typically affected by a variety of complex factors, including the scattering and absorption of light, the suspension of particulate matter, and dynamic blurring caused by water flow, all of which contribute to a decline in image quality. To effectively address this issue, we have designed a degradation process specifically tailored for the underwater environment. By simulating the unique degradation characteristics of underwater settings, our method ensures that the processed high-resolution images more closely resemble the properties of real underwater images.

Attention-Based Image Super-Resolution
Attention mechanisms enhance image reconstruction quality and model adaptability to the diversity and complexity of underwater images by highlighting critical features and extracting detailed information. Thus, they have become essential for improving underwater image super-resolution. For instance, RCAN [37] enhances network performance through channel attention mechanisms; SAN [38] leverages second-order channel attention to strengthen feature correlation learning; NLSN demonstrates the potential of attention mechanisms in addressing non-local dependencies; and SwinIR employs self-attention mechanisms from transformers. Moreover, CAL-GAN [39] effectively improves the super-resolution quality of photorealistic images by adopting a content-aware local generative adversarial network strategy, while DAT achieves efficient feature aggregation by merging features across spatial and channel dimensions.

A Practical Degradation Model
The SISR degradation model [40] can be mathematically formulated as

x = (y ⊗ k) ↓_s,

where y denotes the HR image; x denotes the LR image; k is the blur kernel; ⊗ denotes the convolution operator; and ↓_s denotes the sub-sampling operator with stride s.
Underwater imaging is subject to unique degradation factors distinct from those affecting conventional images, rendering traditional models inadequate for underwater image restoration. To address this, we have devised a degradation model specifically for underwater scenes, concentrating on unique aquatic factors to minimize computational overhead and enhance processing efficiency. The degradation dynamics of underwater images are captured by the following formula:

x = (y ⊗ k) ↓_s + n + p,

where n denotes the added noise and p denotes the suspended particles in the underwater environment. As shown in Figure 1, the degradation model employs a first-order degradation process, and the detailed choices included in each degradation step are listed. In our previous experiments, we tested different sequences of degradation steps and found that their impact on the final results was negligible.
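The first-order degradation x = (y ⊗ k) ↓_s + n + p can be sketched roughly in NumPy. This is an illustrative sketch only: the box-blur kernel, Gaussian noise, and single-pixel particle spots below are simple placeholders, not the exact distributions described in this work.

```python
import numpy as np

def box_blur(img, k=3):
    """Convolve with a k x k averaging kernel (placeholder for the blur step)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

def degrade(y, s=2, noise_std=0.05, n_particles=3, rng=None):
    """First-order degradation: blur -> subsample with stride s -> add noise and particle spots."""
    rng = np.random.default_rng(rng)
    x = box_blur(y)[::s, ::s]                      # (y ⊗ k) ↓_s
    x = x + rng.normal(0.0, noise_std, x.shape)    # n: additive noise
    for _ in range(n_particles):                   # p: bright spots from suspended particles
        ci, cj = rng.integers(0, x.shape[0]), rng.integers(0, x.shape[1])
        x[ci, cj] += 0.5
    return np.clip(x, 0.0, 1.0)

lr = degrade(np.ones((8, 8)) * 0.5, s=2)
print(lr.shape)  # (4, 4)
```

Each step can be swapped independently for the stochastic choices discussed in the following subsections.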


Resize
Downsampling is a fundamental operation for generating low-resolution images in the realm of super-resolution [41]. Broadening our scope, we evaluate both downsampling and upsampling, that is, the resizing procedure. Various algorithms for resizing exist, including nearest-neighbor interpolation, area resizing, bilinear interpolation, and bicubic interpolation. Each method introduces its own distinctive effect, with some leading to blurriness, while others may produce overly sharp images accompanied by overshoot artifacts [42].
To encompass a richer array of complex resizing effects, we incorporate a stochastic selection of resizing techniques from the methods mentioned. Due to the misalignment complications presented by nearest-neighbor interpolation, we set this method aside in favor of area, bilinear, and bicubic techniques.
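For integer scale factors, the area resize mentioned above reduces to block averaging; a minimal NumPy sketch of that one case (not the library implementation used in practice) is:

```python
import numpy as np

def area_downsample(img, s):
    """Area (box) resize for an integer factor s: average each s x s block."""
    h, w = img.shape[0] // s * s, img.shape[1] // s * s
    img = img[:h, :w]
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(area_downsample(x, 2))  # [[ 2.5  4.5] [10.5 12.5]]
```

In a full pipeline one would randomly pick among area, bilinear, and bicubic resizing per sample, as the text describes.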

Noise
The refractive index variation of air particles is mainly attributed to scattering, denoted n_scatter, with chromatic dispersion, n_chromatic, being secondary and often overlooked. The wave-based index, n_wave, is essential for accurate long-distance light transmission. To accurately simulate these influences, a comprehensive stochastic noise model has been constructed:

n = α · n_scatter + β · n_chromatic + γ · n_wave.

In the proposed model, the weighting coefficients α, β, and γ quantify the relative contributions of the distinct noise sources, adhering to the normalization condition α + β + γ = 1. This ensures precise modulation of each noise component within the comprehensive noise framework. Additionally, γ is restricted to 0 ≤ γ ≤ 0.1, permitting nuanced adjustment of the noise influence in a primarily linear domain, which is crucial for accurate noise behavior analysis. Through weighted fusion, we can better control the impact of noise on the system, improving system performance and stability.
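The weighted noise fusion can be sketched as follows. The three component generators here are illustrative stand-ins (independent Gaussian fields), since the exact forms of n_scatter, n_chromatic, and n_wave are not specified above; only the weighting and constraints come from the model.

```python
import numpy as np

def mixed_noise(shape, alpha=0.6, beta=0.35, gamma=0.05, rng=None):
    """n = alpha*n_scatter + beta*n_chromatic + gamma*n_wave,
    with alpha + beta + gamma = 1 and 0 <= gamma <= 0.1.
    Each component is modeled here as a zero-mean Gaussian field (an assumption)."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9 and 0.0 <= gamma <= 0.1
    rng = np.random.default_rng(rng)
    n_scatter = rng.normal(0, 1, shape)
    n_chromatic = rng.normal(0, 1, shape)
    n_wave = rng.normal(0, 1, shape)
    return alpha * n_scatter + beta * n_chromatic + gamma * n_wave

n = mixed_noise((16, 16), rng=0)
print(n.shape)  # (16, 16)
```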

Blur
To simulate the uniform blurring effects caused by less-than-ideal lighting conditions or environmental particulates, an isotropic Gaussian kernel is utilized. This kernel is founded on a two-dimensional normal distribution, with the standard deviation being equal in all directions, thus ensuring the uniformity of the blur effect. The isotropic Gaussian kernel k can be represented as

k(i, j) = (1/N) exp(−(i² + j²) / (2σ²)),

where (i, j) denotes the spatial coordinates, N is the normalization factor ensuring the sum of all weights equals 1, and σ represents the standard deviation [43].
During experimentation, kernels of various dimensions (3 × 3, 5 × 5, 7 × 7, and 9 × 9) were implemented to replicate blurring effects across different area widths. The standard deviation was varied from 1 to 3 to span blur intensities ranging from slight to severe.
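The isotropic kernel follows directly from its definition; a short sketch that generates it for any of the sizes and standard deviations listed above:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Isotropic Gaussian kernel k(i, j) = (1/N) exp(-(i^2 + j^2) / (2*sigma^2)),
    where N normalizes the weights so they sum to 1."""
    r = size // 2
    i, j = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(i**2 + j**2) / (2.0 * sigma**2))
    return k / k.sum()

k = gaussian_kernel(5, 1.5)
print(round(k.sum(), 6))  # 1.0
```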

Suspended Particles
In underwater imaging, image quality is notably impacted by suspended particulates that scatter incident light, producing distinct light spots. This research utilizes random field theory [44] to quantify scattering in this heterogeneous medium. The method allows simulation of the stochastic interactions between light and particles, generating statistically characterized scatter patterns. The kernel, as a function of spatial coordinates, is defined as

I(x, y) = A exp(−((x − x₀)² + (y − y₀)²) / (2σ²)),

where the coordinates (x₀, y₀) denote the centroid of the osculating circle located within the central segment of the elliptical distribution. The parameter A signifies the amplitude of the distribution, providing an index of its density, whereas the standard deviation σ conveys the extent of its dispersion.
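A single particle light spot of this form can be sketched directly from the definition; the specific amplitude and spread values are illustrative:

```python
import numpy as np

def particle_spot(shape, x0, y0, A=1.0, sigma=2.0):
    """Light spot from a suspended particle:
    I(x, y) = A * exp(-((x - x0)^2 + (y - y0)^2) / (2 * sigma^2))."""
    x, y = np.mgrid[0:shape[0], 0:shape[1]]
    return A * np.exp(-((x - x0)**2 + (y - y0)**2) / (2.0 * sigma**2))

spot = particle_spot((9, 9), 4, 4, A=0.8)
print(round(spot[4, 4], 6))  # 0.8
```

Summing several such spots at random centroids gives the particle term p in the degradation formula.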

Validation of the Degradation Model Efficacy
To accurately evaluate the effectiveness of the designed degradation model, this study quantified the degree of degradation by calculating the standard deviation (STD) of texture and noise in image samples. This approach considers the common noise and texture distortion characteristics of underwater images. By analyzing the standard deviations, it is possible to quantitatively assess how different models simulate underwater environments, thereby identifying the most suitable model for super-resolution reconstruction. The analysis results, as illustrated in Figure 2, revealed that the degraded image had a texture STD of 19.83 and a noise STD of 15.05, figures that align more closely with the characteristic high noise levels and lower texture clarity found in underwater imaging environments. In contrast, the texture STD of the image subjected to direct downsampling significantly increased to 78.95, while the noise STD decreased to 5.51. In summary, the texture and noise characterizations of the degraded image further affirm the suitability and superiority of our model for processing underwater images. To thoroughly validate the degradation model's capability to mimic authentic underwater image characteristics, we expanded our sample size and extracted a total of 100 images from the USR-248 dataset in increments of 5 for a comprehensive evaluation. The study involved a comparative analysis of images synthesized by the model against those produced by standard downsampling, focusing on the standard deviations of noise and texture. The analysis organized texture deviation in descending order and noise deviation in ascending order to establish trends. As shown in Figure 3, images processed by the degradation model demonstrate a higher standard deviation in noise and a lower one in texture, aligning more closely with the inherent properties of underwater images.

Network Architecture

Given a low-resolution input, we first exploit one convolution layer to extract the shallow feature, wherein C_in and C denote the channel numbers of the input and the intermediate feature. Then, a series of attention enhanced residual dense blocks (AERDB) and one 3 × 3 convolution layer are utilized to perform the deep feature extraction, with a total of 7 blocks ultimately employed in the final experiment. The final feature reconstruction layer ultimately generates high-resolution images. The discriminator is composed of feature extraction and activation layers followed by a classifier. It processes generated and real high-resolution images, uses convolutional layers and spectral normalization to stabilize training, and outputs a scalar authenticity score via a sigmoid function after a fully connected layer.

Attention Enhanced Residual Dense Block
As shown in Figure 5, each AERDB fuses the residual in residual dense block (RRDB) with the adaptive residual attention module (ARAM), the latter focusing on key features through its unique energy function, thereby enhancing the overall performance of the module. It is important to highlight that this structure exhibits symmetry, with both the main architecture and the branching structure being symmetric. This symmetry ensures balanced processing and optimization across the entire network, contributing to its effectiveness in image super-resolution. Inspired by SimAM [45], the ARAM implements a unique energy function informed by neuroscience to modulate neuronal activity within feature maps, thereby boosting the model's ability to capture details. This function accounts for the spatial arrangement and activity patterns of neurons, enabling the network to process visual information with greater precision. The energy function is


e_t(w_t, b_t, y, x_i) = (y_t − t̂)² + (1/(M − 1)) Σ_i (y_o − x̂_i)²,

where t̂ = w_t t + b_t and x̂_i = w_t x_i + b_t are linear transformations of t and x_i; t is the target neuron and x_i denotes the other neurons within a single channel of the input feature map X ∈ R^(H×W×C); i indexes the spatial dimension, and M = H × W is the number of neurons on the channel; w_t and b_t are the weight and bias of the transformation. All variables are scalars, and the function attains its minimum when t̂ aligns with y_t and every x̂_i aligns with y_o, where y_t and y_o represent two distinct scalar values. Minimizing this function is tantamount to ascertaining the linear separability of the target neuron t from its peers in the channel. To streamline this process, binary labels (1 and −1) are assigned to y_t and y_o. Incorporating a regularizer into the equation, the final energy function is formulated as

e_t = (1/(M − 1)) Σ_i (−1 − (w_t x_i + b_t))² + (1 − (w_t t + b_t))² + λ w_t².

Solving this energy function for each channel with iterative algorithms such as SGD can, in theory, be computationally intensive. Fortunately, a closed-form solution allows for efficient calculation:

w_t = −2(t − μ_t) / ((t − μ_t)² + 2σ_t² + 2λ),  b_t = −(1/2)(t + μ_t) w_t.

In the model, μ_t and σ_t², representing the mean and variance over all neurons, can be challenging to compute channel-wise. Therefore, we utilize the global mean and variance as proxies. Under this assumption, these statistics are computed once over all neurons on a channel and reused, thereby alleviating the model's computational load. Consequently, the minimized energy function can be succinctly expressed by the following formula:

e*_t = 4(σ̂² + λ) / ((t − μ̂)² + 2σ̂² + 2λ).

A smaller e*_t distinguishes neurons with significant characteristic differences, which is beneficial for feature extraction; therefore, the importance of each neuron can be obtained by 1/e*_t. In this approach, the model not only learns the intensity of each pixel but also the inter-pixel relationships. As shown in Figure 5, to incorporate SimAM into the RRDB module, a SimAM unit is placed right after the output of each dense block. Specifically, the feature map output from RRDB's dense block is fed into SimAM, which then calculates attention weights for every neuron within it, with the weighted output serving as the input for the following layer. Through this process, SimAM adaptively emphasizes features with higher variability or greater importance to the reconstruction task by minimizing its energy function, while suppressing information that contributes less to the current task. This adaptive adjustment strategy not only improves RRDB's ability to discriminate features but also enhances the network's generalization capabilities in complex scenarios.
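Following the closed-form minimal energy above, a SimAM-style weighting for a single channel can be sketched in NumPy. As described, the global mean and variance stand in for the per-neuron statistics; the sigmoid applied to 1/e*_t follows the original SimAM formulation.

```python
import numpy as np

def simam_weight(x, lam=1e-4):
    """Per-neuron importance for one H x W channel:
    e*_t = 4*(var + lam) / ((t - mean)^2 + 2*var + 2*lam);
    the output is x scaled by sigmoid(1 / e*_t)."""
    mu = x.mean()
    var = x.var()
    e_inv = ((x - mu)**2 + 2 * var + 2 * lam) / (4 * (var + lam))  # 1 / e*_t
    return x * (1.0 / (1.0 + np.exp(-e_inv)))                      # x * sigmoid(1/e*_t)

feat = np.random.default_rng(0).normal(size=(8, 8))
out = simam_weight(feat)
print(out.shape)  # (8, 8)
```

Neurons far from the channel mean receive weights closer to 1, while near-mean neurons are attenuated, which matches the 1/e*_t importance rule above.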
Grad-CAM [46] is applied for visualizing feature activation to evaluate the performance differences between the RRDB module used alone and the RRDB module integrated with an attention mechanism. As shown in Figure 6, the RRDB module with integrated attention mechanism enhances the network's focus on key areas during image processing. This focused attention leads to more pronounced activation of critical features, rather than a uniform distribution of attention across the entire image. Such improvements enhance the model's ability to recognize important information in images, significantly benefiting the accuracy and efficiency of deep learning models in image processing tasks.

Figure 6. Feature activation comparison using Grad-CAM, illustrating SimAM's impact on emphasizing key areas in underwater images for various marine species (sea slug, sea turtle, stingray, and clownfish).
On the heatmap, it may be difficult to clearly observe significant differences between different models. Therefore, to assess the model's performance more accurately, we introduced attention scores as an additional metric. Attention scores quantify the degree of focus of deep neural networks on different regions of the image, serving as a measure of model performance. Higher scores indicate a greater focus on key areas, improving the recognition and understanding of critical information within the image. Thus, by utilizing attention scores, we can comprehensively evaluate the model's performance. The specific scores are shown in Table 1.
Table 1. The attention scores when using the RRDB module alone and when integrating the RRDB module with an attention mechanism, where higher scores indicate a greater focus on key areas by the model.

Generator
This study has innovatively enhanced the generator architecture of ESRGAN by integrating an attention mechanism within its RRDB modules. Moreover, the architecture, initially designed for 4× upscaling, has been expanded to support super-resolution at higher scale factors. These improvements not only bolster the network's capability for detail processing but also enhance its versatility across different magnification rates.

Discriminator with Spectral Normalization
To ensure overall model stability, this study incorporates spectral normalization into the discriminator architecture. The spectral normalization layer constrains the spectral norm of the weights during training, preventing the weights from growing indefinitely. This effectively ensures the stability of the model and significantly improves its resistance to interference.
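Spectral normalization divides each weight matrix by its largest singular value, which is typically estimated by power iteration; a minimal NumPy sketch of that idea follows (in practice, PyTorch provides this via torch.nn.utils.spectral_norm, which also caches the iteration vectors between steps).

```python
import numpy as np

def spectral_normalize(W, n_iter=30, rng=None):
    """Estimate the spectral norm of W by power iteration and rescale W so its norm is ~1."""
    rng = np.random.default_rng(rng)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the largest singular value
    return W / sigma

W = np.random.default_rng(1).normal(size=(16, 8))
W_sn = spectral_normalize(W)
print(round(float(np.linalg.svd(W_sn, compute_uv=False)[0]), 4))  # ≈ 1.0
```

Keeping every layer's spectral norm near 1 bounds the discriminator's Lipschitz constant, which is what stabilizes adversarial training.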

Loss Function
The loss function used by the network in this study is defined as

L = λ1 L_recons + λ2 L_percep + λ3 L_adv,

where L_recons is the pixel-wise reconstruction loss, L_percep is the perceptual loss measuring the feature distance in VGG feature space, and L_adv denotes the adversarial loss. The coefficients λ1, λ2, and λ3 are balancing parameters, set to 0.1, 1, and 0.005, respectively.

To train our model, two open-source datasets were selected: USR-248 [47] and UFO120 [48]. The USR-248 dataset is the first dataset designed for the super-resolution reconstruction of underwater images, containing 1060 pairs of underwater images for training and 248 pairs for testing. The UFO120 dataset consists of 1500 training samples and 120 testing samples. The low-resolution images in both datasets are created through artificial simulation and deformation. All samples were processed according to standard procedures for optical and spatial image degradation and combined with manually labeled saliency mappings to generate data pairs. Additionally, to validate the effectiveness of the model, two more datasets, EUVP and SQUID, were used as test datasets. We trained the model separately on the USR-248 and UFO120 datasets, and then used the model trained on USR-248 to test on EUVP and SQUID. The specific information of the datasets used in our experiments is shown in Table 2.

The model in this study was developed within the PyTorch framework and trained using the Adam optimizer, with the hyperparameters β1 and β2 set to 0.9 and 0.99, respectively. The initial learning rate was set to 2 × 10⁻⁴ and halved after 200k iterations, with the entire training process spanning 400k iterations. Training used input image patches of 64 × 64 pixels with a batch size of 32. To improve training effectiveness, symmetric image flipping was used as an augmentation technique, exposing the model to a wider range of image variations and enhancing generalization performance and adaptability to various underwater conditions. All training was performed on NVIDIA RTX 3090 GPUs with CUDA acceleration; the machine had 32 GB of RAM and an i7-13700K CPU.
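The weighted total loss can be sketched as below. The perceptual and adversarial terms are passed in as placeholder scalars, since they depend on a VGG network and a discriminator; modeling the reconstruction term as an L1 pixel loss is an assumption, as the text only says "pixel-wise".

```python
import numpy as np

def total_loss(sr, hr, l_percep, l_adv, lam1=0.1, lam2=1.0, lam3=0.005):
    """L = lam1 * L_recons + lam2 * L_percep + lam3 * L_adv,
    with L_recons taken here as a mean L1 pixel loss (an assumption)."""
    l_recons = np.abs(sr - hr).mean()
    return lam1 * l_recons + lam2 * l_percep + lam3 * l_adv

loss = total_loss(np.zeros((4, 4)), np.ones((4, 4)), l_percep=0.5, l_adv=2.0)
print(round(loss, 3))  # 0.61
```

With these coefficients, the perceptual term dominates, the pixel term acts as a regularizer, and the adversarial term contributes a small push toward realistic textures.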

Evaluation Metrics
This study employs PSNR, the structural similarity index measure (SSIM), UIQM, and the learned perceptual image patch similarity (LPIPS) to evaluate underwater image super-resolution. PSNR and SSIM gauge the signal-to-noise ratio and visual similarity of reconstructed images. UIQM addresses quality degradation from underwater scattering and absorption. LPIPS, using deep learning, assesses perceptual image quality, aligning evaluation with human visual observation. Specifically, PSNR and SSIM are computed on the Y channel in YCbCr space.
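PSNR on the Y channel can be computed as below; the BT.601 luma weights are the customary choice for this conversion, though the exact coefficients are not stated in the text.

```python
import numpy as np

def psnr_y(img1, img2, max_val=255.0):
    """PSNR computed on the luma (Y) channel of two 8-bit RGB images (BT.601 weights assumed)."""
    w = np.array([0.299, 0.587, 0.114])
    y1 = img1.astype(float) @ w
    y2 = img2.astype(float) @ w
    mse = np.mean((y1 - y2) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

a = np.full((8, 8, 3), 100, dtype=np.uint8)
b = np.full((8, 8, 3), 110, dtype=np.uint8)
print(round(psnr_y(a, b), 2))  # 28.13
```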

Quantitative Results
Table 3 shows the quantitative comparison of our method against other methods on the USR-248 dataset (Deep WaveNet [49], RDLN [50], etc.), while Table 4 presents the quantitative comparison on the UFO120 dataset (SRDRM [51], AMPCNet [52], HNCT [53], URSCT [54], etc.). These outcomes are derived from the average performance metrics across all test samples. Notably, DAE-GAN achieved significant improvements in both SSIM and PSNR metrics and also performed admirably with respect to the UIQM and LPIPS metrics. It is imperative to underscore that a lower LPIPS score indicates superior image quality. Upon evaluating the DAE-GAN approach against other state-of-the-art methods, it emerges as the clear leader on the USR-248 dataset, particularly excelling at the 2× scale with a PSNR of 29.95 dB and an SSIM of 0.85. Its prowess extends to superior image quality metrics, outshining competitors with the lowest LPIPS score, indicative of higher image fidelity at the 4× scale. As magnification increases to 8×, DAE-GAN consistently upholds its exceptional performance, achieving a PSNR of 23.83 dB and an SSIM of 0.64, reinforcing its robustness in enhancing image resolution and quality across scales. Further analysis on the UFO120 dataset corroborates DAE-GAN's superior capabilities. It leads with a notable margin, particularly at 2× magnification, where it achieves a PSNR of 26.26 dB and a remarkable SSIM of 0.80, surpassing other methodologies. Even at higher magnifications of 3× and 4×, DAE-GAN maintains its supremacy, reflected in its consistent scores, notably a top-tier LPIPS of 0.25 at 3× and 0.30 at 4× magnification.

Qualitative Results
To conduct a comprehensive evaluation of DAE-GAN's performance, this study visually compares the effectiveness of various methods. Figures 7 and 8 illustrate the reconstruction results of our method at a 4× scale using a single network for arbitrary-scale SR, clearly showing that DAE-GAN excels in restoring image clarity and texture details, particularly at the edges. These visual results underscore the significant advantages of DAE-GAN in enhancing image quality, demonstrating its efficacy in the task of refined image restoration. This paper employs symmetrical images for visualizing results, clearly demonstrating the DAE-GAN model's precision in symmetry reconstruction. These visualizations are both appealing and assist in the detailed evaluation of the reconstruction process.

Model Performance Evaluation on Test Datasets
To rigorously ascertain the efficacy and robustness of the DAE-GAN model proposed in this research, an extensive battery of tests was executed on the EUVP and SQUID datasets. As shown in Table 5, DAE-GAN demonstrates solid overall performance in the quantitative evaluation of 4× super-resolution, underscoring its effectiveness in tackling the challenges of underwater image super-resolution from various dimensions. Upon closer examination, the model displays the best performance across all assessment metrics on the SQUID dataset. On the EUVP dataset, it scores slightly lower on the LPIPS index than CAL-GAN and is slightly outperformed on the PSNR index by BSRDM [55]; however, it secures the top results in all other relevant evaluation metrics.

Ablation Study
In this section, we examine the importance of every fundamental element within our suggested approach. Through a series of exhaustive ablation experiments on the USR-248 dataset, this study comprehensively evaluates and confirms the performance and effectiveness of the proposed degradation model (DM) and ARAM when applied independently and in conjunction, as shown in Table 6. We also employed an innovative training strategy: every other image is horizontally flipped and rotated 180 degrees, augmenting the dataset to 1.5 times its original size. This approach helps the model better grasp symmetry features, enhancing super-resolution performance. Introducing symmetry enables more accurate capture and reconstruction of symmetric structures, which are particularly common in underwater imagery. This strategy enriches dataset diversity and strengthens the model's ability to recognize and reconstruct various types of symmetric structures underwater, ultimately improving generalization and robustness. The remaining experimental conditions are consistent with the previous experiments; the results are shown in Table 7.
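The augmentation strategy described above can be sketched as follows. We read "flipping every other image and rotating them 180 degrees" as producing one extra image per selected sample, which yields the stated 1.5× dataset size; this interpretation, and the helper's name, are our assumptions.

```python
import numpy as np

def augment_with_symmetry(images):
    """Augment a dataset to 1.5x its size: every other image additionally appears
    horizontally flipped and then rotated 180 degrees. Note that the composition
    of a horizontal flip and a 180-degree rotation equals a vertical flip."""
    augmented = list(images)
    for i, img in enumerate(images):
        if i % 2 == 0:                      # every other image
            flipped = np.fliplr(img)        # horizontal flip
            rotated = np.rot90(flipped, 2)  # 180-degree rotation
            augmented.append(rotated)
    return augmented
```

For a dataset of 1000 images, this yields 1500 training samples, with the extra 500 exposing the network to mirrored and inverted versions of the same symmetric structures.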

Conclusions
In this work, an innovative generative adversarial network architecture, termed DAE-GAN, is introduced with the aim of enhancing the super-resolution processing of underwater images. To more accurately reflect the inherently complex and irregular degradation phenomena present in underwater environments, a specialized degradation model was designed and placed at the forefront of the super-resolution network. This model not only simulates the unique degradation process of underwater images but also provides more realistic input conditions for subsequent super-resolution reconstruction. To effectively capture the delicate features in underwater images, an adaptive residual attention module and dense residual blocks were integrated, boosting the network's sensitivity to details and its feature extraction capability. Extensive experiments conducted on multiple datasets, with evaluations at different magnification scales, demonstrate not only a significant improvement in visual effects but also outstanding performance across multiple objective evaluation metrics. These achievements indicate the potential and practical value of DAE-GAN in the field of underwater image super-resolution. Furthermore, this approach provides a fresh avenue for the enhancement of underwater visual technology and carries substantial implications for the progression of underwater image processing methodologies. Future research directions may involve exploring super-resolution in more complex underwater environments, such as deep-sea or multimodal underwater imagery. Potential enhancements could entail the design of novel neural network architectures tailored to deep-sea characteristics and the development of multimodal fusion algorithms. Nevertheless, these avenues also present challenges, including the computational resource requirements for handling large-scale datasets and the high costs associated with data acquisition.

Figure 1. Overview of the degradation model, where each degradation process employs the classical degradation model.
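The classical pipeline named in this caption can be rendered as a toy numpy sketch: Gaussian blur, then subsampling, then mixed Gaussian and salt-and-pepper noise. All parameter values here are illustrative assumptions, and plain strided subsampling stands in for the paper's random subsampling.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.5):
    """1D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def degrade(hr, scale=4, sigma=1.5, noise_std=5.0, sp_amount=0.01, rng=None):
    """Toy degradation: blur -> subsample -> mixed noise.
    Operates on a single-channel float image with values in [0, 255]."""
    rng = np.random.default_rng(rng)
    k = gaussian_kernel(sigma=sigma)
    # Separable Gaussian blur: filter rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, hr)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    lr = blurred[::scale, ::scale]                   # subsample by striding
    lr = lr + rng.normal(0.0, noise_std, lr.shape)   # additive Gaussian noise
    lr[rng.random(lr.shape) < sp_amount / 2] = 255.0 # salt
    lr[rng.random(lr.shape) < sp_amount / 2] = 0.0   # pepper
    return np.clip(lr, 0.0, 255.0)
```

Chaining the steps in this order matters: noise added after subsampling survives at the low-resolution grid, which is what distinguishes the degraded inputs from plain bicubic downsampling.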

Figure 2. Comparative analysis of texture and noise in degraded versus directly downsampled underwater images.

Figure 3. Comparative analysis of noise and texture standard deviations between the degradation model and direct downsampling, with the red line representing the degradation model and the blue line denoting direct downsampling.
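A patch-wise standard deviation, as compared in Figure 3, can be computed with a minimal helper like the one below. This is our own crude proxy for local texture/noise strength, not the paper's exact measurement procedure.

```python
import numpy as np

def local_std_map(img, patch=8):
    """Standard deviation over non-overlapping patch x patch blocks of a
    single-channel image; higher values indicate stronger local texture or noise."""
    h = img.shape[0] // patch * patch
    w = img.shape[1] // patch * patch
    blocks = img[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return blocks.std(axis=(1, 3))
```

Applying this to a degraded image and to a directly downsampled one, then averaging each map, gives the kind of per-image summary statistic plotted against the two curves in the figure.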

3.2. Network Architecture
3.2.1. Overall Structure
As shown in Figure 4, our proposed symmetric degradation-aware and attention enhanced generative adversarial network (DAE-GAN) is structured into three main components: the degradation model, the generator, and the discriminator. The degradation model converts high-resolution images to low-resolution ones to mimic underwater conditions. The generator comprises a tripartite architecture, encompassing modules for shallow feature extraction, deep feature extraction, and image reconstruction. Specifically, for a given low-resolution input I_LR ∈ R^(H×W×C_in), we first exploit one convolution layer to extract the shallow feature F_0 ∈ R^(H×W×C), where C_in and C denote the channel numbers of the input and the intermediate feature. Then, a series of attention enhanced residual dense blocks (AERDB) and one 3 × 3 convolution layer are utilized to perform deep feature extraction, with a total of 7 blocks ultimately employed in the final experiment. The final feature reconstruction layer generates the high-resolution image. The discriminator is composed of feature extraction and activation layers followed by a classifier. It processes generated and real high-resolution images, uses convolutional layers and spectral normalization to stabilize training, and outputs a scalar authenticity score via a sigmoid function after a fully connected layer.
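The three-part structure described above can be sketched as a heavily simplified PyTorch skeleton. The internals of the AERDB (dense blocks plus ARAMs, Figure 5) and the discriminator's exact layer stack are not fully specified here, so a plain two-conv residual block and an abbreviated critic stand in for them; only the overall wiring (shallow conv, 7 deep blocks plus a 3 × 3 conv, pixel-shuffle reconstruction, spectral-normalized discriminator ending in a sigmoid) follows the text.

```python
import torch
import torch.nn as nn

class AERDB(nn.Module):
    """Placeholder for the attention enhanced residual dense block."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(c, c, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self, c_in=3, c=64, n_blocks=7, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(c_in, c, 3, padding=1)           # shallow feature extraction
        self.deep = nn.Sequential(*[AERDB(c) for _ in range(n_blocks)],
                                  nn.Conv2d(c, c, 3, padding=1))  # deep feature extraction
        self.reconstruct = nn.Sequential(                         # image reconstruction
            nn.Conv2d(c, c_in * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
    def forward(self, lr):
        f0 = self.shallow(lr)
        return self.reconstruct(f0 + self.deep(f0))

class Discriminator(nn.Module):
    def __init__(self, c_in=3, c=64):
        super().__init__()
        sn = nn.utils.spectral_norm  # stabilizes adversarial training
        self.net = nn.Sequential(
            sn(nn.Conv2d(c_in, c, 3, stride=2, padding=1)), nn.LeakyReLU(0.2, inplace=True),
            sn(nn.Conv2d(c, c, 3, stride=2, padding=1)), nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, 1), nn.Sigmoid(),  # scalar authenticity score in [0, 1]
        )
    def forward(self, img):
        return self.net(img)
```

A forward pass through `Generator` turns an H×W low-resolution input into a 4H×4W output, which the `Discriminator` then scores against real high-resolution images during adversarial training.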

Figure 4. The overall architecture of DAE-GAN, with a symmetric structure in both the generator and discriminator components.

Figure 5. Schematic representation of the basic block, accentuating the positional relationship between the dense blocks and ARAMs.


Figure 6. Feature activation comparison using Grad-CAM, illustrating the SimAM's impact on emphasizing key areas in underwater images for various marine species.

Figure 7. Qualitative comparison of different methods on ×4 super-resolution for the USR-248 dataset. Regions for comparison are highlighted with red boxes in the original high-resolution images. Zoom in for the best view. Higher PSNR indicates better quality; lower LPIPS indicates a closer resemblance to the original.

Figure 8. Qualitative comparison of different methods on ×4 super-resolution for the UFO120 dataset. Regions for comparison are highlighted with red boxes in the original high-resolution images. Zoom in for the best view. Higher PSNR indicates better quality; lower LPIPS indicates a closer resemblance to the original.


Table 1. The attention scores when using the RRDB module alone and when integrating the RRDB module with an attention mechanism, where higher scores indicate a greater focus on key areas by the model.

Table 2. Detailed information on training and validation datasets.

Table 3. Experimental evaluation on the USR-248 dataset, offering a quantitative comparison at magnification factors ×2, ×4, and ×8 with other methods, utilizing four metrics: PSNR (dB)↑, SSIM↑, UIQM↑, and LPIPS↓. The best results are highlighted in red and the second-best in blue.

Table 4. Experimental evaluation on the UFO120 dataset, offering a quantitative comparison at magnification factors ×2, ×3, and ×4 with other methods, utilizing four metrics: PSNR (dB)↑, SSIM↑, UIQM↑, and LPIPS↓. The best results are highlighted in red and the second-best in blue.

Table 5. Experimental evaluation on the EUVP and SQUID test datasets, offering a quantitative comparison exclusively at a magnification factor of ×4 against other methods, utilizing four metrics: PSNR (dB)↑, SSIM↑, UIQM↑, and LPIPS↓. The best results are highlighted in red and the second-best in blue.

Table 6. Ablation study on the impact of degradation models and ARAM for ×4 super-resolution on the USR-248 dataset. The best results are highlighted in red and the second-best in blue.

Table 7. Ablation study on the impact of image flipping for ×4 super-resolution on the USR-248 dataset, investigating the effect of image flipping on symmetry features in the super-resolution task.