Article

A Method for Underwater Image Enhancement Utilizing Polarization Inspired by the Mantis Shrimp’s Multi-Dimensional Visual Imaging Mechanism

1 Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry of Education, Xiamen University, Xiamen 361005, China
2 School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2026, 14(6), 582; https://doi.org/10.3390/jmse14060582
Submission received: 11 February 2026 / Revised: 18 March 2026 / Accepted: 19 March 2026 / Published: 21 March 2026

Abstract

Optical attenuation caused by absorption and scattering in turbid water significantly degrades underwater image quality, making reliable underwater imaging a challenging problem. Underwater polarization imaging has attracted increasing attention because of its ability to suppress scattered light and provide additional polarization cues. However, existing polarization-based enhancement approaches often adapt conventional underwater image enhancement strategies, and the multi-dimensional characteristics of polarization information are not always fully utilized, which may limit detail restoration in complex underwater environments. To address this issue, this paper proposes a bio-inspired underwater polarization image enhancement framework motivated by the polarization vision mechanism of marine organisms. Specifically, a two-stage architecture consisting of a Polarization Adversarial Network (PAN) and a Polarization Enhancement Network (PEN) is designed. The PAN incorporates a Bionic Antagonistic Module (BAM) to exploit complementary information among polarization channels, while Salient Feature Extraction (SFE) is introduced to reduce redundant feature interference. The subsequent PEN integrates a frequency-aware Mamba-based structure to enhance feature representation and improve detail reconstruction. Experiments on simulated underwater polarization datasets indicate that the proposed framework can effectively suppress backscattering and improve structural detail visibility in challenging underwater scenes, demonstrating competitive performance compared with representative traditional and learning-based methods.

1. Introduction

Autonomous underwater vehicles (AUVs) have been widely applied in fields such as marine resource development, archaeological exploration, and military reconnaissance [1,2,3], owing to their high degree of intelligence and flexible operational capabilities. Within marine information acquisition systems, underwater optical imaging has become a core means of obtaining environmental and target information because of its intuitive visual presentation and rich detail. However, absorption by water molecules and scattering by suspended particles in the aquatic environment cause energy attenuation and optical path distortion during light propagation. Under complex hydrological conditions in particular, optical signals cannot achieve lossless long-distance transmission. This degradation manifests directly as diminished imaging contrast and blurred details, severely constraining both the quality of information acquisition and the application efficiency of underwater vision systems.
To address degraded underwater image quality, researchers have developed numerous image restoration methods, including model-free [4,5,6,7], model-based [8,9,10,11], and data-driven approaches [12,13,14,15]. Model-free methods mostly improve the visual quality of underwater images by adjusting pixel values. However, because they ignore the physical causes and intrinsic mechanisms of underwater image degradation, they often introduce color distortion or amplify noise while enhancing local contrast. Model-based methods recover image information by constructing physical models of the underwater imaging process with key parameters. Nevertheless, these methods depend heavily on how accurately the model fits the real underwater environment; idealized models often fail to match dynamically changing real scenes, leading to biased parameter estimates in complex aquatic environments. Data-driven methods rely on large amounts of paired data to train networks, allowing models to learn the mapping from degraded to pristine states. However, strictly aligned image pairs before and after degradation are extremely difficult to obtain in natural scenes, making training data costly to acquire and unlikely to cover the complex, variable degradation patterns of real scenarios, which greatly limits generalization in practical applications. Faced with the diversity and dynamics of degradation mechanisms in complex water environments, researchers have therefore turned to other optical enhancement methods.
As one of the fundamental properties of light, polarization contains rich information about the imaging scene. Because it is shaped by physical characteristics such as shape, shading, and roughness, it can provide information largely uncorrelated with other physical properties [16]. Fully exploiting the differences and uniqueness of polarization information in underwater scattered light fields can effectively remove water scattering, improve imaging quality, and increase imaging distance. Biological research shows that many marine creatures also possess polarization vision, such as octopuses, fiddler crabs, and mantis shrimps [17], demonstrating that polarization vision can effectively cope with highly scattering, turbid water environments and offering a new perspective for underwater optical imaging.
Inspired by the multi-channel visual imaging mechanism of the peacock mantis shrimp’s polarization-sensitive compound eyes, a novel underwater polarization image enhancement method is proposed. By simulating the physiological mechanisms with which the peacock mantis shrimp processes information, the method effectively resists various visual interferences and maintains highly robust visual processing in complex underwater scenes, as the experimental results demonstrate. The main contributions of this paper are as follows:
  • Drawing inspiration from the multichannel antagonistic imaging structure and frequency division sensing function of retinal cells in the compound eyes of the marine organism mantis shrimp, this study develops a bio-inspired underwater polarization image enhancement method. The method is dedicated to effectively restoring the details of underwater polarization images and improving the visual discriminability of the images;
  • To maximally retain the information provided by mutually orthogonal polarization sub-images, by simulating the polarization vision formation mechanism and the microvillus structure of the mantis shrimp retina, this study proposes the PAN (Polarization Adversarial Network). The BAM (Bionic Antagonistic Module) is designed for the fusion stage of this network. Following the design concept of “suppression–counter-suppression”, this module simulates the game strategy between orthogonal channels during imaging. It determines priority perception regions through SFE (Salient Feature Extraction) and maximizes the retention of the complementary features of each channel in an unsupervised paradigm, laying a foundation for the subsequent extraction of polarization features submerged in heavy noise;
  • After processing by the polarization adversarial network, a novel frequency-domain Mamba-based PEN (Polarization Enhancement Network) is established. The design takes cues from the channel division mechanism of retinal cells and incorporates unique conclusions derived from frequency-domain observations. Equipped with elaborately designed GAMBs (Global Aware Mamba Blocks), PEN leverages the strong long-range information capture capability of Mamba networks to effectively extract low-frequency information, including color and contour details. It further integrates residual block-based extraction of high-frequency details, thereby achieving robust enhancement of turbid underwater targets. Experimental results verify that the proposed overall network exhibits remarkable performance and efficiency in addressing the turbidity problem of underwater polarization images.

2. Related Work

2.1. Underwater Optical Image Enhancement

Underwater optical imaging offers superior performance over other detection methods in terms of detail rendering, real-time operation, intuitiveness, and potential for multi-spectral data fusion, so its role in underwater tasks has steadily grown. Current underwater image enhancement methods can typically be divided into three types: model-free, model-based, and data-driven.

2.1.1. Model-Free Methods

Model-free methods employ statistical features or prior knowledge of the image itself, mainly including enhancements based on Histogram Equalization (HE) [18] and corrections based on Retinex theory [19], as well as variants of these two methods, such as Contrast Limited Adaptive Histogram Equalization (CLAHE) [4], Global Histogram Stretching [20], Retinex variational model inspired by hyper-Laplacian reflectance priors [21], Multi-Scale Retinex (MSR) [22], etc. In fact, these methods can enhance the contrast in the image by adjusting the statistical distribution of the pixel values. However, when water turbidity increases and the illumination distribution becomes uneven, this “superficial optimization” shows significant shortcomings. The global/local statistical features or brightness/reflectance prior knowledge they rely on can fail due to drastic environmental light fluctuations, making it difficult to stably recover target details and contrast, ultimately leading to significant deviations between enhancement and real scenes.

2.1.2. Model-Based Methods

Model-based methods start from the physical mechanism of underwater imaging [23], combine prior analysis of the underwater environment, and establish mathematical models of light propagation in water to restore clear images. Among them, the Jaffe–McGlamery model [24] laid the foundation for underwater optical imaging models, describing the effects of light attenuation and backscattering on images. UDCP proposed by Drews et al. [25] introduced the Dark Channel Prior (DCP) used in atmospheric environments on underwater images, estimating image transmission and ambient light by analyzing the dark channel prior of underwater images. Song et al. [26] proposed a rapid scene depth estimation model using an underwater light attenuation prior to estimate background light and a transmission map, effectively removing blur and chromatic aberration in underwater images. Liu et al. [27] proposed an underwater image enhancement method based on an adaptive attenuation-curve prior, estimating the transmission map through the distribution of pixels on the attenuation curve. Hou et al. [11] designed a variational framework based on the Illumination Channel Sparsity Prior (ICSP), effectively correcting non-uniform illumination via HSI brightness statistics and employing guided transmission estimation with ADMM optimization to achieve high-quality restoration. Model-based methods typically consider the absorption and scattering characteristics of water, preserving physical constraints between images during degradation. However, the precision of model establishment and prior conditions severely affects image restoration quality, thus limiting the stability and universality of enhancement.

2.1.3. Data-Driven Methods

Data-driven methods leverage the powerful feature extraction and mapping capabilities of deep learning, training networks with large amounts of data to achieve underwater image enhancement. Chen et al. [28] proposed an underwater perception network based on multi-scale feature fusion, which restores images by extracting, fusing, and attentively reconstructing image features. Yan et al. [29] combined GAN networks with underwater imaging models, using generative adversarial networks to directly estimate background light, transmission maps, scene depth, and attenuation coefficients, improving network interpretability. Saini et al. [30] further constrained the performance of GANs on underwater images using refinement networks, preserving more key features. Yang et al. [31] designed a Transformer-based dual-branch network that jointly trains image enhancement and depth estimation to improve image quality. Wang et al. [32] introduced reinforcement learning into underwater environments, addressing the lack of paired data, missing depth information, and difficulties in physical model parameter tuning through three progressively advanced stages. In addition, drawing inspiration from the biological visual system provides another unique approach to addressing underwater image degradation. Li et al. [33] simulated the functional modules of various cells in the biological visual system, enhancing the interpretability of the network and effectively balancing brightness and correcting color distortion. However, data-driven methods mostly rely on large amounts of high-quality training data, and their generalization capability and computational efficiency still need further improvement.

2.2. Underwater Polarization Image Enhancement

Recently, underwater polarization imaging technology has developed significantly with advances in basic polarization theory and polarization camera technology. In early-stage research [34], the application of polarization images was mainly based on basic polarization parameters such as the degree of polarization (DoP) and angle of polarization (AoP). By analyzing the differences in polarization characteristics between target light and scattered light, preliminary separation of scattered light was achieved. While differential imaging offers a route to polarization information [35], these methods are often insufficiently robust for challenging conditions: originally developed for hazy weather, they fail to account for the stronger attenuation of underwater environments.
Subsequently, researchers have proposed various methods that combine polarization characteristics with underwater imaging models to obtain high-quality underwater polarization images. Schechner et al. [36] first proposed an underwater passive polarization imaging model, combining polarization with underwater imaging models. Later, Guan et al. [37] further proposed using Stokes vectors, with model analysis showing that the background can be completely eliminated under common-mode suppression conditions, thereby improving imaging performance. Huang et al. [38] extended the underwater passive polarization imaging model by proposing a novel version that incorporates the degree of polarization of the target information light; this model integrates intermediate maps and polarization orthogonal differential signals to derive the optimal scene transmittance. Han et al. [39] proposed a method based on optical correlation to find the optimal image pair with polarization characteristics. Wang et al. [40] used median filtering of polarization images to estimate background scattered light and used the bright channel method to estimate background light at infinity, enhancing the contour retention of underwater targets. Model-based methods typically consider the physical relationships between multi-polarization-angle images, have real-time imaging capability, and handle Gaussian noise well, but they perform poorly under the complex noise conditions encountered in practice.
Benefiting from the powerful nonlinear feature learning capability of deep learning technology, researchers have attempted to use deep learning technology to achieve polarization imaging in turbid underwater environments. Hu et al. [41] designed a densely connected neural network for training to learn rich multi-layer feature information. Zhang et al. [42] used one grayscale image and four polarization images as input, hoping to restore images in turbid water by adding polarization images to the dataset. Qi et al. [43] used the CycleGAN architecture for unpaired polarized underwater image restoration, introducing unsupervised learning. Guan et al. [44] introduced a loss function based on SROCC (Spearman’s Rank Correlation Coefficient) to constrain polarization image network training, avoiding prior limitations and more reasonably obtaining the rich information contained in polarization images. Liu et al. [45] decomposed polarization images into high- and low-frequency layers via Fourier transform and processed them separately in two sub-networks to achieve noise elimination. These methods demonstrate the potential of deep learning in underwater polarization image restoration. However, most existing approaches mainly treat polarization images as multi-channel inputs or rely on feature-level fusion, while the intrinsic relationships among polarization components are not explicitly modeled. As a result, the multi-dimensional characteristics of polarization information are not fully exploited, which may limit detail reconstruction in complex underwater environments. In addition, recent works have begun to investigate domain generalization and adaptive restoration strategies. For example, Tian et al. [46] proposed a domain-adversarial learning framework to improve generalization across different water types, and Zhang et al. [47] introduced an adaptive partition strategy for polarization image restoration in complex scenes. Nevertheless, effectively integrating polarization characteristics into deep network architectures remains an open problem.

3. Methods

3.1. Visual Imaging Mechanism of Mantis Shrimp

Crustaceans have inhabited the ocean for hundreds of millions of years, evolving diverse eye structures to adapt to the complex and changeable marine environment. Among them, the mantis shrimp possesses one of the most complex visual systems in nature.

3.1.1. Visual Information Reception Mechanism

Distinct from human binocular vision, the peacock mantis shrimp receives light signals with compound eyes mounted on movable eye stalks that can rotate independently. Each compound eye consists of three morphologically distinct regions: a ventral Peripheral Region (vPR), a dorsal Peripheral Region (dPR), and a narrow linear Mid-Band (MB) region. Its structure is shown in Figure 1. The outermost layer is the cornea, the first refractive medium for light entering the compound eye, which provides preliminary focusing and protection. The crystalline cone, as the second refractive medium, then helps the mantis shrimp precisely focus light onto the photoreceptor cells (R1 to R8), achieving refractive adjustment for both distant and near vision. The photoreceptor cells bear fine microvilli, and mutually orthogonal microvilli interweave to form ring-like structures, as shown in Figure 2. The microvilli of the R1, R4, and R5 cells are oriented in one direction, perpendicular to those of the R2, R3, R6, and R7 cells. The distal R8 cell has microvilli in both directions, which are also mutually orthogonal and offset by 45 degrees from the two directions formed by the aforementioned R1–R7 cells [48]. This configuration yields polarization angles of 0°, 45°, 90°, and 135°. Receptor sensitivity is highest when the polarization angle of incident light is parallel to the long axis of the microvilli, which is the principle behind its polarization vision. Thanks to the compound eye structure and microvilli, the mantis shrimp can acquire visual information at different polarization angles and integrate high-quality visual information from the various angles. We refer to this mechanism as the “antagonistic” mechanism.

3.1.2. Photosensitive Signal Processing Mechanism

Whereas humans have only three types of photoreceptor cells corresponding to the three primary colors red, green, and blue, the mantis shrimp has 16 different types of photoreceptors. These provide rich band selection, allowing more spectral signals to be converted into neural signals. Research shows that the R8 cells in the mantis shrimp’s eyes are sensitive to ultraviolet light, with sensitivity peaks at 310, 320, 330, 340, and 380 nm [49], while the R1–R7 cell groups are mainly responsible for color channels from 400 to 700 nm. In other words, when processing light signals, different retinal cells have a distinct division of labor, independently processing low-frequency and high-frequency components and finally combining them into the visual signal transmitted to the brain.

3.2. Overall Framework

Inspired by the aforementioned visual information reception mechanism and photosensitive signal processing mechanism of the mantis shrimp, a two-stage underwater polarization feature enhancement network is proposed in this section. Figure 3 shows the overall framework of the proposed method.
The first-stage PAN functionally maps to the visual information reception mechanism. First, mutually orthogonal polarization antagonistic pairs are input into the generator, here taking 0°–90° as an example (similarly for 45°–135°), corresponding to $P_h$–$P_v$ ($P_d$–$P_f$), generating a rough output image. Then, the SFE is used to extract salient masks $Mask_{hv}$ ($Mask_{df}$) from the polarization antagonistic image pairs to determine which parts of the information are to be retained in each pair. In the discriminator part, we multiply the image output by the generator with the masks corresponding to $P_h$, $P_v$ ($P_d$, $P_f$) to obtain $G_h$, $G_v$ ($G_d$, $G_f$), which are then input into two discriminators along with the inputs $P_h$, $P_v$ for discrimination. The discriminators enhance our model features through adversarial games with the generator, corresponding to the antagonistic mechanism during visual reception. The intention of this design is as follows: since polarization images carry different information features for the same scene, instead of designing a series of new image optimization rules for the entire image, we leverage this property and compare the output image with the advantageous parts of the input polarization antagonistic pairs for screening. This part of the network is inspired by the cell structure used by the peacock mantis shrimp to receive light intensity signals. Its task is not actually de-turbidity but to make the two mutually orthogonal polarization pairs retain as much information as possible, providing higher quality polarization features for subsequent image de-turbidity.
The second-stage PEN echoes the information processing logic of the photosensitive signal processing mechanism. Once the generator can produce sufficiently high-quality polarization feature maps, we concatenate the outputs $Output_{hv}$ and $Output_{df}$ channel-wise as the input to this stage. The polarization enhancement network extracts image features at different scales through up- and down-sampling. For each layer of features, a Mamba-based Frequency Enhance Block (MFEB) is used for enhancement. Specifically, a wavelet transform first decomposes the features into low-frequency and high-frequency passbands. For the low-frequency passband, a self-designed Global Aware Mamba Block (GAMB) strengthens global information; for the high-frequency passband, CNN residual modules extract local details. Through targeted enhancement of the respective frequency bands, impurities in the low-frequency information can be effectively removed, preserving better contour textures. Finally, an inverse wavelet transform completes the de-turbidity of the image. This stage processes polarization images in separate frequency domains, mirroring the independent processing of low-frequency and high-frequency photosensitive signals. The polarization image quality is significantly enhanced, achieving our final image clarification goal, while the underwater color cast is also corrected to some extent. While preserving the integrity of key information, visual effects and physical features are synergistically optimized.

3.3. PAN

3.3.1. SFE

Photoreceptor cells in the retina are not uniformly distributed but form high-density aggregations in the macular region—cone cells accurately capture effective light reflected by targets through selective responses to specific wavelengths. Meanwhile, photoreceptor cells form hierarchical connections with bipolar cells and ganglion cells, suppressing background noise signals through lateral inhibition effects, making neural impulses more focused on visual information in the target area. This signal screening–noise suppression–target enhancement mechanism implemented at the cellular level is essentially an efficient information focusing strategy formed through biological evolution, ensuring priority perception of key targets in complex underwater environments. Inspired by this, we designed the SFE to evaluate the retention degree of each piece of information. It consists of no-reference image quality assessment (NR-IQA), entropy (EN), and luminance contrast (LC). For no-reference image evaluation, we use Natural Image Quality Evaluator (NIQE) for assessment, which can measure whether image quality declines due to specific types of distortions, such as blur, compression, and other forms of noise. EN is used to calculate the amount of information carried by the input image, and LC is used to evaluate the illumination quality of the image, constraining over-bright or over-dark areas. NIQE, EN, and LC are all normalized to a unified scale using min–max normalization.
The overall module design is shown in Figure 4.
The definitions of NIQE, EN, and LC are as follows:
$$\mathrm{NIQE} = \sqrt{\left( V_I - V_{train} \right)^{T} \Sigma_{train}^{-1} \left( V_I - V_{train} \right)},$$
where $V_I$ is the mean feature vector of the distorted image to be tested, $V_{train}$ is the mean feature vector of natural images obtained from training, and $\Sigma_{train}$ is the covariance matrix of the training-set features.
$$\mathrm{EN} = - \sum_{l=0}^{L-1} p_l \log_2 p_l,$$
where $L$ denotes the total number of gray levels in the image with $L = 256$, $l = 0, 1, \ldots, L-1$ is the gray-level index, and $p_l$ represents the probability of occurrence of the $l$-th gray level. EN measures the information content of an image: a higher entropy value indicates greater information content, corresponding to richer complexity and finer details.
$$\mathrm{LC}(I_k) = \left\lVert I_k - \bar{I}_{i,j} \right\rVert,$$
where $\lVert \cdot \rVert$ represents the color distance metric, with a value range of (0, 255), and $\bar{I}_{i,j}$ represents the average brightness value of the corresponding image.
The mask $Mask_{hv}$ generated by the SFE (similarly for $Mask_{df}$) can be expressed as:
$$Mask_{hv} = \frac{\omega_h}{\omega_h + \omega_v},$$
$$\omega_h = \frac{\mathrm{LC}_{P_h}}{\mathrm{LC}_{P_h} + \mathrm{LC}_{P_v}} \left( \mathrm{NIQE}_{P_h} + \mathrm{EN}_{P_h} \right),$$
$$\omega_v = \frac{\mathrm{LC}_{P_v}}{\mathrm{LC}_{P_h} + \mathrm{LC}_{P_v}} \left( \mathrm{NIQE}_{P_v} + \mathrm{EN}_{P_v} \right),$$
$$\widehat{Mask_{hv}} = 1 - Mask_{hv}.$$
When the value of $Mask_{hv}$ is greater than 0.5, it marks an area where $P_h$ needs to be retained, shown in white; when the value of $Mask_{hv}$ is less than 0.5, it marks an area where $P_v$ needs to be retained, shown in black. It is worth noting that through this design, we no longer use only the $Fusion_{hv}$ generated by the generator as the pseudo-image; instead, we use $Mask_{hv}$ to reconstruct $Fake_h$ and $Fake_v$, specifically:
$$Fake_h = Mask_{hv} \times P_h + \widehat{Mask_{hv}} \times Fusion_{hv},$$
$$Fake_v = \widehat{Mask_{hv}} \times P_v + Mask_{hv} \times Fusion_{hv}.$$
The intention of this design is to enable the discriminator to still retain the polarization features required by $P_h$ and $P_v$.
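To make the mask construction concrete, the following sketch assembles $\omega_h$, $\omega_v$, $Mask_{hv}$, $Fake_h$, and $Fake_v$ from the three score maps. It assumes the NIQE, EN, and LC scores have already been computed region-wise (so that they form spatial maps) and min–max normalized, as described above; the function names are ours, and the code works on NumPy arrays or PyTorch tensors alike.

```python
def sfe_mask(niqe_h, niqe_v, en_h, en_v, lc_h, lc_v):
    """Assemble Mask_hv from normalized NIQE, EN, and LC score maps."""
    w_h = lc_h / (lc_h + lc_v) * (niqe_h + en_h)   # omega_h
    w_v = lc_v / (lc_h + lc_v) * (niqe_v + en_v)   # omega_v
    mask_hv = w_h / (w_h + w_v)                    # Mask_hv in [0, 1]
    return mask_hv, 1.0 - mask_hv                  # mask and its complement

def reconstruct_fakes(p_h, p_v, fusion_hv, mask_hv, mask_hv_hat):
    # Fake_h keeps P_h in its salient regions and the fused image elsewhere;
    # Fake_v is the complementary composition fed to the second discriminator.
    fake_h = mask_hv * p_h + mask_hv_hat * fusion_hv
    fake_v = mask_hv_hat * p_v + mask_hv * fusion_hv
    return fake_h, fake_v
```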

3.3.2. BAM

Microvilli are distributed on the surface of photoreceptor cells. These microvilli can directly convert polarized light signals into neural electrical signals through selective absorption of light in different polarization directions, forming precise polarization vision. Photoreceptor cells with mutually orthogonal microvilli directions are stacked to form ring structures. Orthogonal channels achieve a dynamic balance of suppression and counter-suppression through interactions of cell membrane potentials. This antagonistic mechanism can both amplify effective polarization information and suppress noise interference, ensuring efficient transmission of visual signals. To mimic this cell-level antagonistic processing mechanism, we designed the BAM. Through adversarial learning between orthogonal polarization image pairs in the generator and discriminator, it simulates the orthogonal polarization perception characteristics of microvilli and the suppression–counter-suppression dynamic balance of ring cell structures, ultimately achieving maximum retention of polarization information during the fusion process.
The network architecture of the generator is shown in Figure 5. First, the polarization antagonistic pair images are concatenated in the channel dimension, passed through a 5 × 5 HA [50] convolutional layer, and then through three 3 × 3 HA convolutional layers. For each convolutional layer, the stride is set to 1 to not change the feature map size, and residual connections are applied to improve the model’s ability to learn complex features and training efficiency. After all convolutional layers, Batch Norm and Leaky ReLU activation functions are used to retain information and enhance nonlinear expression capability, avoiding gradient disappearance. For the last layer, we use a 3 × 3 HA convolution and Tanh activation function. The input and output channels of the five layers are 6:16, 16:16, 32:16, 48:16, and 64:3, respectively.
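The channel plan above can be realized in several ways; one reading consistent with the growing input widths (16, 32, 48, 64) is dense concatenation of earlier feature maps. The following PyTorch sketch adopts that reading, with standard Conv2d layers standing in for the HA convolutions of Ref. [50]; it is an illustration of the stated layer plan, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PANGenerator(nn.Module):
    """Sketch of the PAN generator following the stated channel plan
    6:16, 16:16, 32:16, 48:16, 64:3. Standard Conv2d layers stand in for
    the HA convolutions, and the skip connections are realized as dense
    concatenations, which reproduces the growing input widths."""

    def __init__(self):
        super().__init__()
        def block(c_in, c_out, k):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.b1 = block(6, 16, 5)     # 5x5 layer on the concatenated pair
        self.b2 = block(16, 16, 3)
        self.b3 = block(32, 16, 3)
        self.b4 = block(48, 16, 3)
        self.out = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, p_h, p_v):
        x = torch.cat([p_h, p_v], dim=1)                     # 3 + 3 = 6 channels
        f1 = self.b1(x)
        f2 = self.b2(f1)
        f3 = self.b3(torch.cat([f1, f2], dim=1))             # 32 -> 16
        f4 = self.b4(torch.cat([f1, f2, f3], dim=1))         # 48 -> 16
        return self.out(torch.cat([f1, f2, f3, f4], dim=1))  # 64 -> 3
```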
The loss function consists of three parts: adversarial loss $L_{adv}$, gradient loss $L_{grad}$, and mean squared error loss $L_{mse}$. The specific formula is as follows:
$$L_G = \alpha L_{adv} + \beta L_{grad} + \gamma L_{mse},$$
where $\alpha$, $\beta$, and $\gamma$ are weight coefficients set to 1. $L_{adv}$ represents the adversarial loss between the generator and the two discriminators, encouraging the generator to generate more “realistic” samples, defined as:
$$L_{adv} = \frac{1}{N} \sum_{i=1}^{N} \left[ \left( 1 - D_1(Fake_h) \right)^2 + \delta \left( 1 - D_2(Fake_v) \right)^2 \right],$$
where $N = H \cdot W$ represents the image size, and $\delta$ is a weight coefficient set to 1. The definitions of $Fake_h$ and $Fake_v$ can be found in Section 3.3.1.
$L_{grad}$ is defined as:
$$L_{grad} = \frac{1}{N} \sum_{i=1}^{N} \left( \left\lVert \nabla Fusion_{hv} - \nabla P_h \right\rVert^2 + \varepsilon \left\lVert \nabla Fusion_{hv} - \nabla P_v \right\rVert^2 \right),$$
where $Fusion_{hv}$ represents the fused image of the $P_h$–$P_v$ antagonistic pair generated by the generator, $\nabla$ represents the gradient operator, and $\varepsilon$ is a weight coefficient set to 0.5 here.
$L_{mse}$ is defined as:
$$L_{mse} = \frac{1}{N} \sum_{i=1}^{N} \left( \left\lVert Fusion_{hv} - P_h \right\rVert^2 + \epsilon \left\lVert Fusion_{hv} - P_v \right\rVert^2 \right),$$
where $\epsilon$ is set to 0.5 here. $L_{grad}$ and $L_{mse}$ are used to force the fused image to retain a large amount of information from the source images.
There are two discriminators, D1 and D2, in our network, and their architectures are identical, as shown in Figure 6. The input first passes through four 3 × 3 HA convolutional layers with a stride of 2, which together with the final classification stage form a five-layer network. Batch Norm is applied in the second, third, and fourth convolutional layers, and Leaky ReLU activation functions are used in the first four layers. The last layer uses adaptive average pooling, followed by classification through a fully connected layer. The input and output channels of the five layers are 3:32, 32:64, 64:128, 128:256, and 256:1, respectively.
The design of the loss functions is as follows:
$$L_{D_1} = \frac{1}{N} \sum_{i=1}^{N} \left[ \left( 1 - D_1(P_h) \right)^2 + \left( D_1(Fake_h) \right)^2 \right],$$
$$L_{D_2} = \frac{1}{N} \sum_{i=1}^{N} \left[ \left( 1 - D_2(P_v) \right)^2 + \left( D_2(Fake_v) \right)^2 \right],$$
where $L_{D_1}$ and $L_{D_2}$ represent the losses of the two discriminators $D_1$ and $D_2$, respectively, used to force the discriminators to “see through” the fake images generated by the generator.
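As a consolidated reference, the sketch below implements the generator and discriminator objectives exactly as written above, with the reported weights as defaults. Finite differences stand in for the gradient operator, per-pixel sums are replaced by means (equivalent up to the $1/N$ factor), and D1 and D2 are assumed to be callables returning per-image scores on PyTorch tensors.

```python
def generator_loss(fusion_hv, fake_h, fake_v, p_h, p_v, D1, D2,
                   alpha=1.0, beta=1.0, gamma=1.0, delta=1.0,
                   eps_grad=0.5, eps_mse=0.5):
    """L_G = alpha * L_adv + beta * L_grad + gamma * L_mse."""
    def grads(img):
        # Finite-difference approximation of the gradient operator
        return img[..., :, 1:] - img[..., :, :-1], img[..., 1:, :] - img[..., :-1, :]

    def grad_dist(a, b):
        ax, ay = grads(a)
        bx, by = grads(b)
        return ((ax - bx) ** 2).mean() + ((ay - by) ** 2).mean()

    l_adv = ((1 - D1(fake_h)) ** 2).mean() + delta * ((1 - D2(fake_v)) ** 2).mean()
    l_grad = grad_dist(fusion_hv, p_h) + eps_grad * grad_dist(fusion_hv, p_v)
    l_mse = ((fusion_hv - p_h) ** 2).mean() + eps_mse * ((fusion_hv - p_v) ** 2).mean()
    return alpha * l_adv + beta * l_grad + gamma * l_mse

def discriminator_losses(p_h, p_v, fake_h, fake_v, D1, D2):
    """Least-squares objectives L_D1 and L_D2 for the two discriminators."""
    l_d1 = ((1 - D1(p_h)) ** 2).mean() + (D1(fake_h.detach()) ** 2).mean()
    l_d2 = ((1 - D2(p_v)) ** 2).mean() + (D2(fake_v.detach()) ** 2).mean()
    return l_d1, l_d2
```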

3.4. PEN

3.4.1. Analysis of Frequency Domain Characteristics

To further determine what kind of enhancement methods are suitable for photosensitive signals in the high- and low-frequency domains, we first analyze underwater images in the frequency domain. As shown in Figure 7, we use 2D wavelet transform to decompose the turbid underwater image (Turbid) and the clear GroundTruth (Clear) into corresponding high- and low-frequency subbands. Then, we exchange the high- and low-frequency subbands of the turbid image and the clear image to generate a synthetic turbid image (SynTurbid) containing the low-frequency subband of the turbid image and the high-frequency subband of the clear image, and a synthetic clear image (SynClear) containing the low-frequency subband of the clear image and the high-frequency subband of the turbid image.
First, observing the frequency distribution maps of the two input images, it can be found that the RGB channel distribution of the clear image is relatively uniform, while the red channel of the turbid image is almost completely attenuated. After exchanging the high-frequency subbands with the turbid image, the RGB channel distribution of the clear image remains uniform, not much different from the clear image itself, but the peak frequency of the contained pixel values is significantly reduced. After exchanging high-frequency information with the clear image, the attenuation of the red channel of the turbid image is significantly less severe. Although the changes in the cyan channel distribution are not as obvious as those in the red channel, each component has significantly increased.
Then, observing the exchanged synthetic images from a visual perspective, it can be seen that because the turbid image receives high-frequency information from the clear image, much edge information submerged in background noise is restored, but these edges are very harsh and do not meet human visual requirements. The clear image loses the originally detail-rich high-frequency information. Although the general outline of the image can be recognized, it presents a “hazy and blurry” state, as if the resolution has been reduced.
Through the above observations, we can draw the following conclusion: in the wavelet domain, most of the image information exists in the low-frequency subbands. This low-frequency information contains the overall structure and illumination information of the image, dominating category judgment, such as the shape outline of objects, while sparse texture details exist in the high-frequency subbands. This high-frequency information often accounts for a small proportion in the image and is very sensitive to noise, but it determines the quality of imaging and the clarity of image details.
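The subband-swapping experiment described above is straightforward to reproduce with a single-level 2D discrete wavelet transform; a minimal sketch using the PyWavelets package follows (single-channel inputs; apply per channel for RGB).

```python
import pywt

def swap_highfreq(turbid, clear, wavelet="haar"):
    """Rebuild each image with the other's high-frequency subbands after a
    single-level 2D DWT. Inputs are 2D arrays (one channel)."""
    tL, (tH, tV, tD) = pywt.dwt2(turbid, wavelet)
    cL, (cH, cV, cD) = pywt.dwt2(clear, wavelet)
    # SynTurbid: turbid low-frequency band + clear high-frequency bands
    syn_turbid = pywt.idwt2((tL, (cH, cV, cD)), wavelet)
    # SynClear: clear low-frequency band + turbid high-frequency bands
    syn_clear = pywt.idwt2((cL, (tH, tV, tD)), wavelet)
    return syn_turbid, syn_clear
```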

3.4.2. Network Structure Design

The retinal cells of the peacock mantis shrimp show a highly specialized division of labor mode in optical signal processing. Based on the aforementioned discovery, we use a multi-scale feature extraction structure [51] as the basic framework to build a polarization enhancement network. To imitate the frequency response characteristics of different cell groups, we carefully designed a new MFEB for feature enhancement at each scale. This module performs parallel optimization of high- and low-frequency branches for images. Figure 8 shows the network details.
For the low-frequency domain containing more content information, we use 2D-SSM [52], which is better suited to the image field, to build the Mamba branch and achieve accurate capture of global content information in the low-frequency domain. For this purpose, GAMB is specially designed in this branch. First, the features are projected through a linear layer; the main line then uses the sequence modeling capability of 2D-SSM to maintain attention over a global receptive field. The branch applies ReLU activation and is multiplied element-wise with the main line. Finally, a linear layer performs feature reorganization. This design avoids excessive attention to irrelevant noise, while its computational complexity grows linearly with sequence length rather than quadratically as in Transformers, effectively balancing performance and computational cost and functionally echoing the efficiency and energy-saving characteristics of the biological visual system.
In the high-frequency domain containing more edge information, 3 × 3 convolutional layers combined with residual connection methods are used for capture. Each convolutional layer is followed by a ReLU activation function to improve the model’s ability to learn complex features and training efficiency.
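The following PyTorch sketch outlines the MFEB's two branches under the description above. The 2D-SSM cannot be reproduced here without a Mamba dependency, so a depthwise convolution serves as a runnable placeholder for it; the gated linear structure of the GAMB and the residual high-frequency branch follow the text.

```python
import torch
import torch.nn as nn

class MFEB(nn.Module):
    """Sketch of the Mamba-based Frequency Enhance Block. Inputs are the
    subbands of a single-level 2D wavelet transform: `low` is the LL band
    (B, C, H, W) and `high` stacks the LH/HL/HH bands (B, 3C, H, W)."""

    def __init__(self, c):
        super().__init__()
        # GAMB components: linear in-projection, global main line, gated branch
        self.proj_in = nn.Linear(c, c)
        self.proj_out = nn.Linear(c, c)
        self.ssm2d = nn.Conv2d(c, c, 3, padding=1, groups=c)  # 2D-SSM placeholder
        self.gate = nn.Sequential(nn.Linear(c, c), nn.ReLU())
        # Residual CNN branch for high-frequency detail
        self.high_branch = nn.Sequential(
            nn.Conv2d(3 * c, 3 * c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(3 * c, 3 * c, 3, padding=1),
        )

    def gamb(self, low):
        b, c, h, w = low.shape
        tokens = low.flatten(2).transpose(1, 2)               # (B, HW, C)
        x = self.proj_in(tokens)
        main = self.ssm2d(x.transpose(1, 2).reshape(b, c, h, w))
        main = main.flatten(2).transpose(1, 2)                # back to tokens
        out = self.proj_out(main * self.gate(x))              # element-wise gating
        return out.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, low, high):
        # Enhanced LL band plus residually refined high-frequency bands;
        # the caller recombines them with the inverse wavelet transform.
        return self.gamb(low), high + self.high_branch(high)
```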
The loss function of the network consists of content loss $L_{content}$, edge loss $L_{SSIM}$, and perceptual loss $L_{TV}$:
$$L_{FU} = \lambda L_{content} + \mu L_{SSIM} + \nu L_{TV},$$
where $\lambda$ and $\mu$ are set to 0.95, while $\nu$ is set to 1. The content loss $L_{content}$ constrains the absolute error at the pixel level, maintaining the overall brightness and contrast of the image. The edge loss $L_{SSIM}$ measures the structural similarity of the image, focusing on the preservation of texture and details. The perceptual loss $L_{TV}$ avoids meaningless pixel fluctuations in the generated image, keeping the image smooth in local areas while preserving necessary edge information. Specifically:
$$L_{content} = \left\lVert Output - GT \right\rVert_{1},$$
$$L_{SSIM} = 1 - \mathrm{SSIM}(Output, GT),$$
$$L_{TV} = \frac{1}{N} \left[ \sum_{i=1}^{H-1} \sum_{j=1}^{W} \left( x_{i+1,j} - x_{i,j} \right)^2 + \sum_{i=1}^{H} \sum_{j=1}^{W-1} \left( x_{i,j+1} - x_{i,j} \right)^2 \right],$$
where $N = H \cdot W$ represents the image size.
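A compact sketch of $L_{FU}$ with the stated weights follows; `ssim_fn` is assumed to be any differentiable SSIM implementation (for example, from the pytorch-msssim package), since SSIM itself is not re-derived here.

```python
import torch.nn.functional as F

def pen_loss(output, gt, ssim_fn, lam=0.95, mu=0.95, nu=1.0):
    """L_FU = lam * L_content + mu * L_SSIM + nu * L_TV."""
    l_content = F.l1_loss(output, gt)                # pixel-level L1 term
    l_ssim = 1.0 - ssim_fn(output, gt)               # structural term
    n = output.shape[-2] * output.shape[-1]          # N = H * W
    tv_h = ((output[..., 1:, :] - output[..., :-1, :]) ** 2).sum()
    tv_w = ((output[..., :, 1:] - output[..., :, :-1]) ** 2).sum()
    return lam * l_content + mu * l_ssim + nu * (tv_h + tv_w) / n
```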

4. Results

4.1. Experimental Settings

4.1.1. Datasets

Given the absence of publicly available authoritative underwater polarization datasets to date, this study adopts a generated dataset for network training. Specifically, the classic semantic segmentation dataset Foggy Cityscapes-DBF [53] is selected as the base dataset, and an underwater active imaging model is employed to simulate underwater images.
$$I = I_{natural} + I_{artificial},$$
where $I_{natural}$ represents imaging under natural light, and $I_{artificial}$ represents imaging with artificial light sources:
$$I_{natural} = R \cdot t(x, y) + A \cdot \left( 1 - t(x, y) \right),$$
$$I_{artificial} = R \cdot f(x, y) \cdot t(x, y) \cdot E_A,$$
where $t(x, y) = e^{-\beta \cdot z}$ is the transmittance (transmission map) of the underwater propagation process, $\beta = \{\beta(c)\}$ is the scattering coefficient, $z = \{z(x, y)\}$ represents the distance between the pixel $(x, y)$ and the camera, and $f(x, y) = 1 - \frac{(x - x_0)^2 + (y - y_0)^2}{r}$ is the range function, representing the illumination influence range of the artificial light source. $(x_0, y_0)$ is the illumination center of the artificial light source; around this center, the illumination gradually decreases. This paper sets the center point to the image center. $E_A$ is the illumination intensity of the artificial light source. Figure 9 shows the generated natural-light underwater images and the corresponding artificial-light-source underwater images.
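A NumPy sketch of this data-generation model follows. The clamped quadratic falloff used for $f(x, y)$ is one plausible reading of the range function; all parameter names mirror the equations above, and `beta` is an array of three per-channel coefficients.

```python
import numpy as np

def simulate_underwater(R, z, beta, A, E_A, center, r):
    """I = I_natural + I_artificial for a clear scene R (H, W, 3), distance
    map z (H, W), per-channel scattering coefficients beta (3,), ambient
    light A, source intensity E_A, and illumination center/radius."""
    H, W, _ = R.shape
    t = np.exp(-beta[None, None, :] * z[..., None])    # transmission map t(x, y)

    yy, xx = np.mgrid[0:H, 0:W]
    x0, y0 = center
    f = 1.0 - ((xx - x0) ** 2 + (yy - y0) ** 2) / r    # range-function falloff
    f = np.clip(f, 0.0, 1.0)[..., None]

    I_nat = R * t + A * (1.0 - t)                      # natural-light component
    I_art = R * f * t * E_A                            # artificial-light component
    return I_nat + I_art
```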
Then, the polarization imaging formula from Ref. [54] is used to generate simulated underwater polarization images, as shown in Figure 10. Table 1 presents the ranges of values for the required random parameters.
In the exploration of improving the model’s adaptability to real-world scenarios, this study further simulates and constructs polarization images based on the real underwater dataset SUIM [55], aiming to verify the generalization capability of the proposed method in underwater scenes and make the simulated polarization dataset more consistent with the actual characteristics of real underwater environments. The SUIM dataset provides high-precision semantic segmentation maps, which can clearly distinguish different regions including underwater targets, water bodies, and seabed backgrounds. Taking advantage of this unique feature, polarization degree parameters with differentiated values can be randomly generated in a targeted manner according to the actual physical properties of each region. This effectively avoids the scene distortion problem caused by the global homogenization of polarization parameters. On this basis, the reference-free polarization dataset constructed via Malus’ Law [56] is used for subsequent verification of the model’s generalization performance. This ensures that the proposed network can still stably achieve enhancement effects in unseen real underwater scenarios. The effect of the underwater reference-free polarization dataset is illustrated in Figure 11.
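For the analyzer-image synthesis itself, one common Malus-type relation maps total intensity $I$, degree of polarization $p$, and angle of polarization $\phi$ to the four analyzer orientations; the sketch below uses that generic relation (not necessarily the exact formula of Ref. [56]), and the region-wise DoP values can be drawn per SUIM segmentation class as described.

```python
import numpy as np

def polarization_subimages(I, dop, aop):
    """Generate the 0/45/90/135-degree analyzer images from total intensity
    I, per-pixel degree of polarization `dop`, and angle of polarization
    `aop` (radians): I(theta) = I/2 * (1 + dop * cos(2 * (theta - aop)))."""
    angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
    return [0.5 * I * (1.0 + dop * np.cos(2.0 * (theta - aop))) for theta in angles]
```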

4.1.2. Experimental Details

Experiments were conducted on a total of 2000 sets of simulated underwater polarization data. To ensure fair evaluation, the dataset was randomly divided into independent training and testing subsets, with no overlap between the two sets. All input images were resized to a spatial resolution of 256 × 256 pixels before being fed into the network. The network was implemented using the PyTorch (version 2.2.2) framework and optimized with the Adam optimizer, using a fixed learning rate of 0.0001 and a batch size of 4. During the data generation process, the parameters used for simulating underwater polarization images were sampled within predefined ranges. To further enhance the diversity of the simulated data and improve the generalization ability of the network, small, random perturbations were introduced to the parameter values so that even samples generated under the same parameter range exhibited slight variations. The training process was conducted for 400 iterations per stage under the above settings. To reduce the influence of random initialization and ensure the reliability of the results, we conducted multiple independent training runs using different random seeds and observed consistent performance trends across runs. All experiments were carried out on a workstation equipped with an NVIDIA GeForce RTX 4090Ti GPU.
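For reproducibility, the reported optimizer settings translate directly into PyTorch; the skeleton below uses a stand-in module and placeholder objective only to make the configuration concrete.

```python
import torch
import torch.nn as nn

# Reported settings: Adam, fixed lr 1e-4, batch size 4, 256x256 inputs,
# 400 iterations per stage. `net` is a stand-in for PAN or PEN.
net = nn.Conv2d(3, 3, 3, padding=1)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
x = torch.rand(4, 3, 256, 256)           # one synthetic batch at 256x256
loss = nn.functional.l1_loss(net(x), x)  # placeholder objective
opt.zero_grad(); loss.backward(); opt.step()
```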

4.1.3. Comparison Methods

To compare the performance of the proposed method with existing underwater image enhancement methods, this study evaluated a total of nine image enhancement approaches. Among them, three were model-free underwater image enhancement methods, namely MLLE [57], CBLA [8] and HFM [7]. Two belonged to physical model-driven underwater image enhancement methods, including UDCP [25] and ICSP [11]. Another one was the traditional polarization imaging enhancement method, PDS [58]. The deep learning-based methods covered two general underwater image enhancement approaches, UDNET [59] and SINET [60], as well as one polarization image dehazing method, Polar Dehaze [61]. To ensure fairness of comparison, all comparative methods adopted the optimal parameter configurations reported in their original publications, and the testing procedures were kept consistent with those of the method proposed in this study.

4.2. Qualitative and Quantitative Analysis

4.2.1. Qualitative Analysis

Qualitative evaluation results of the proposed method and nine comparative methods on the generated dataset are presented in Figure 12. Among the comparative methods, UDCP not only fails to eliminate scattering but also causes more severe cyan color cast. CBLA tends to introduce red artifacts. Although the PDS enhancement method effectively improves image sharpness, it completely neglects color information. MLLE and HFM induce yellow noise, with HFM performing more aggressively in enhancing deep blue background regions. ICSP shows limited effectiveness in correcting image color cast. Compared with traditional methods, the deep learning-based UDNET and SINET can better enhance local object contours (e.g., the contours of vehicles in the images), yielding more favorable visual experience. However, their enhancement effects are insignificant in severely scattered regions. Polar Dehaze, a method designed for polarization image dehazing, generates numerous artifacts. This method is suitable for atmospheric dehazing scenarios and thus exhibits obvious limitations in underwater environments with stronger scattering and turbidity. The performance of all methods on the no-reference dataset is illustrated in Figure 13. The proposed method effectively corrects color cast and blurriness in underwater scenes. Clearly disturbed by non-uniform noise, Polar Dehaze introduces additional low-light noise. ICSP improves the overall brightness of images at the cost of detailed information. Benefiting from the incorporation of polarization information, PDS achieves effective contour enhancement yet suffers from severe color information loss, leading to color imbalance. UDNET yields remarkable enhancement effects on seabed backgrounds but exhibits negligible improvement in the quality of target regions within the images. SINET excessively corrects the red channel, which results in the introduction of red artifacts in low-light areas. In contrast, UDCP presents more prominent red artifacts along with brightness loss. HFM shows obvious yellow color cast, which is particularly noticeable in images without yellow target objects. MLLE significantly enhances image contrast but causes local over-enhancement, thus impairing human visual comfort.

4.2.2. Quantitative Analysis

Quantitative evaluation results on the reference dataset and no-reference dataset are presented in Table 2. For evaluation metrics, both full-reference and no-reference metrics are adopted for quantitative assessment. Full-reference metrics include Peak Signal-to-Noise Ratio (PSNR) and Structure Similarity Index Measure (SSIM), while no-reference metrics consist of Natural Image Quality Evaluator (NIQE) for natural image quality and Underwater Color Image Quality Evaluation (UCIQE) for underwater image quality. A higher PSNR value indicates lower image distortion, and a higher SSIM score implies that the restored image is more similar to the ground truth (GT) image in terms of structural and texture features. In contrast, a lower NIQE value corresponds to better reconstructed image quality, whereas the opposite applies to UCIQE. To mitigate training randomness, experiments were repeated with different random seeds, and the reported results are the mean ± std of all metrics, ensuring reliable assessment of model performance and stability. As shown in Table 2, the proposed method achieves competitive PSNR and SSIM performance compared with representative approaches, and it achieves favorable NIQE and UCIQE scores on both datasets. This confirms enhanced visual quality in structure, color, and sharpness and stable performance across diverse underwater environments. Unfortunately, the proposed method does not consistently achieve the highest UCIQE score in some cases, which is mainly attributed to UCIQE’s focus on color contrast and chromatic distribution. In contrast, our framework prioritizes structural restoration and detail enhancement to improve scene visibility and suppress backscattering, so structural fidelity improvement may not always correspond to the highest UCIQE value. These results demonstrate that the images enhanced by the proposed method exhibit more comprehensive performance in sharpness, which is more consistent with the perception mechanism of the human visual system. The performance on the no-reference test dataset also reflects that the proposed method possesses generalization capability across different underwater environments.

4.3. Ablation Study

To verify the individual effectiveness of the key components in the proposed framework for underwater polarization image enhancement, ablation experiments were conducted by removing or replacing several modules in the network. Specifically, four ablation variants were designed: w/o PAN, w/o SFE, w/o MFEB, and w/o Mamba. For the w/o PAN configuration, the four polarization sub-images were directly used as inputs to the frequency-domain Mamba-based polarization enhancement network during training, without the polarization antagonistic processing stage. For w/o SFE, the Salient Feature Extraction (SFE) module was disabled by applying uniform masks (all-black or all-white masks) such that the network no longer relied on the feature selection mechanism provided by SFE during training. For w/o MFEB, the MFEB module proposed in this study is replaced with a simple two-layer Conv–ReLU network to evaluate the contribution of the designed feature extraction structure. For w/o Mamba, the original GAMB (Mamba-based) module is replaced with a lightweight Conv–ReLU block, allowing the influence of the Mamba-based architecture to be assessed while keeping the overall network structure largely consistent.
The qualitative comparison results are shown in Figure 14, and the quantitative evaluation results are summarized in Table 3. Experimental results demonstrate that the complete method framework achieves superior performance in global perception and detail contour preservation. After removing PAN, the absence of polarization antagonistic constraints leads to local overexposure in images, resulting in the loss of details in highlight regions and severe degradation of edge features. When MFEB is removed, the network loses the capability for fine-grained extraction and enhancement of global information, which reduces the de-scattering effect in certain regions and even impairs the method’s ability to eliminate color deviation in some cases. Removing the SFE module weakens the network’s ability to selectively emphasize informative features, which leads to partial pixelated recovery artifacts in certain image regions. In addition, the capability of correcting color deviation on the no-reference dataset becomes relatively weaker. When the Mamba-based module is replaced, the restored images exhibit a noticeable decrease in sharpness, and the color restoration becomes less uniform across different regions. In terms of quality evaluation metrics, the full model attains the optimal performance in both PSNR and SSIM, along with the best results in NIQE and suboptimal results in certain aspects of UCIQE. In conclusion, each module designed in this framework is indispensable for the enhancement of underwater polarization images.

5. Conclusions

Inspired by the multi-dimensional visual perception mechanism of mantis shrimp, this study focuses on the problem of underwater image enhancement in polarization imaging scenarios, aiming to improve the visual quality of underwater targets and scenes under complex scattering conditions. To better exploit polarization cues and enhance structural details, a bio-inspired framework based on a two-stage PAN→PEN architecture is proposed. The PAN is designed to extract complementary information from orthogonal polarization channels and retain polarization-related features, while the PEN further performs reconstruction and enhancement through frequency decomposition and Mamba-based feature modeling to mitigate underwater backscattering. Experimental results on simulated underwater polarization datasets indicate that the proposed framework can provide visually improved restoration results compared with several representative methods, showing effective detail preservation and enhanced scene visibility in challenging underwater conditions.
However, the training and evaluation in this work mainly rely on synthetically generated underwater polarization data, and a gap may still exist between simulated data and real underwater polarization measurements. In addition, the proposed model achieves an inference time of approximately 0.235 s per image on the experimental platform, which may not yet be optimal for real-time applications but remains acceptable for many non-real-time underwater image processing tasks. Future work will focus on further investigating the synthetic-to-real domain gap, analyzing potential failure cases, and exploring model lightweighting strategies to improve computational efficiency while maintaining restoration performance.

Author Contributions

Conceptualization, Q.L. and F.Y.; Methodology, Q.L.; Software, Q.L. and H.Y.; Validation, Q.L., C.C. and H.Y.; Formal Analysis, Q.L. and C.L.; Investigation, Q.L. and Y.H.; Resources, R.L. and F.Y.; Data Curation, Q.L., R.L. and Y.H.; Writing—Original Draft, Q.L. and R.L.; Writing—Review and Editing, Q.L., C.L. and F.Y.; Visualization, Q.L. and Y.H.; Supervision, C.C. and F.Y.; Project Administration, Y.H. and H.Y.; Funding Acquisition, F.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (General Program) under Grant 62371404.

Data Availability Statement

The data are available in a publicly accessible repository. Cityscapes-DBF Dataset, Name: Semantic Understanding of Urban Street Scenes (URL: https://www.cityscapes-dataset.com/, accessed on 28 October 2025); SUIM Dataset, Name: Minnesota Interactive Robotics and Vision Laboratory Repository (URL: https://irvlab.cs.umn.edu/resources/suim-dataset, accessed on 1 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PAN: Polarization Adversarial Network
SFE: Salient Feature Extraction
BAM: Bionic Antagonistic Module
PEN: Polarization Enhancement Network
MFEB: Mamba-based Frequency Enhance Block
GAMB: Global Aware Mamba Block

References

  1. Yoerger, D.R.; Jakuba, M.; Bradley, A.M.; Bingham, B. Techniques for deep sea near bottom survey using an autonomous underwater vehicle. In Robotics Research; Thrun, S., Brooks, R., Durrant-Whyte, H., Eds.; Springer Tracts in Advanced Robotics; Springer: Berlin/Heidelberg, Germany, 2007; Volume 28, pp. 416–429. [Google Scholar]
  2. Allotta, B.; Costanzi, R. The ARROWS project: Adapting and developing robotics technologies for underwater archaeology. IFAC-PapersOnLine 2015, 48, 194–199. [Google Scholar] [CrossRef]
  3. Ioannou, G. The ARROWS project: Underwater Inspection and Monitoring: Technologies for Autonomous Operations. IEEE Aerosp. Electron. Syst. Mag. 2024, 39, 4–16. [Google Scholar] [CrossRef]
  4. Zuiderveld, K. Contrast limited adaptive histogram equalization. In Graphics Gems; Glassner, A., Ed.; Morgan Kaufmann: San Francisco, CA, USA, 1994; pp. 474–485. [Google Scholar]
  5. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2018, 27, 379–393. [Google Scholar] [CrossRef]
  6. Zhou, J.; Sun, J.; Zhang, W.; Lin, Z. Multi-view underwater image enhancement method via embedded fusion mechanism. Eng. Appl. Artif. Intell. 2023, 121, 105946. [Google Scholar] [CrossRef]
  7. An, S.; Xu, L.; Deng, Z.; Zhang, H. HFM: A hybrid fusion method for underwater image enhancement. Eng. Appl. Artif. Intell. 2024, 127, 107219. [Google Scholar] [CrossRef]
  8. Jha, M.; Bhandari, A.K. CBLA: Color-balanced locally adjustable underwater image enhancement. IEEE Trans. Instrum. Meas. 2024, 73, 1–11. [Google Scholar] [CrossRef]
  9. Xie, J.; Hou, G.; Wang, G.; Pan, Z. A variational framework for underwater image dehazing and deblurring. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 3514–3526. [Google Scholar] [CrossRef]
  10. Xiao, F.; Yuan, F.; Huang, Y.; Cheng, E. Turbid Underwater Image Enhancement Based on Parameter-Tuned Stochastic Resonance. IEEE J. Ocean. Eng. 2023, 48, 127–146. [Google Scholar] [CrossRef]
  11. Hou, G.; Li, N.; Zhuang, P.; Li, K.; Sun, H.; Li, C. Non-uniform illumination underwater image restoration via illumination channel sparsity prior. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 799–814. [Google Scholar] [CrossRef]
  12. Wang, Z.; Shen, L.; Xu, M.; Yu, M.; Wang, K.; Lin, Y. Domain adaptation for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 1442–1457. [Google Scholar] [CrossRef]
  13. Zhao, C.; Cai, W.; Dong, C.; Hu, C. Wavelet-based fourier information interaction with frequency diffusion adjustment for underwater image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–24 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 8281–8291. [Google Scholar]
  14. Zhao, C.; Cai, W.; Dong, C.; Zeng, Z. Toward sufficient spatial-frequency interaction for gradient-aware underwater image enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 3220–3224. [Google Scholar]
  15. Guo, Y.; Li, H.; Zhuang, P. Underwater image enhancement using a multiscale dense generative adversarial network. IEEE J. Ocean. Eng. 2020, 45, 862–870. [Google Scholar] [CrossRef]
  16. Atkinson, G.A.; Hancock, E.R. Polarization-based surface reconstruction for underwater imaging: Physical properties and shape recovery. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 657–670. [Google Scholar]
  17. Daly, I.; How, M.; Partridge, J.; Marshall, N.J.; Cronin, T.W. Dynamic polarization vision in mantis shrimps. Nat. Commun. 2016, 7, 12140. [Google Scholar] [CrossRef] [PubMed]
  18. Li, C.-Y.; Guo, J.-C.; Cong, R.-M.; Pang, Y.-W.; Wang, B. Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior. IEEE Trans. Image Process. 2016, 25, 5664–5677. [Google Scholar] [CrossRef]
  19. Fu, X.; Zhuang, P.; Huang, Y.; Liao, Y.; Zhang, X.-P.; Ding, X. A Retinex-based enhancing approach for single underwater image. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 4572–4576. [Google Scholar]
  20. Huang, D.; Wang, Y.; Song, W.; Sequeira, J.; Mavromatis, S. Shallow-water image enhancement using relative global histogram stretching based on adaptive parameter acquisition. In Proceedings of the International Conference on Multimedia Modeling (MMM), Bangkok, Thailand, 4–6 January 2018; Springer: Cham, Switzerland, 2018; pp. 453–465. [Google Scholar]
  21. Zhuang, P.; Wu, J.; Porikli, F.; Li, C. Underwater image enhancement with hyper-Laplacian reflectance priors. IEEE Trans. Image Process. 2022, 31, 5442–5455. [Google Scholar] [CrossRef]
  22. Zhang, L.; Peng, T. Underwater image enhancement based on improved adaptive MSRCR and Gamma function. In Proceedings of the 2nd International Conference on Cloud Computing, Big Data Application and Software Engineering (CBASE), Wuhan, China, 24–26 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 246–252. [Google Scholar]
  23. McGlamery, B. A computer model for underwater camera systems. In Proceedings of Ocean Optics VI, San Diego, CA, USA, 14–16 October 1980; SPIE: Bellingham, WA, USA, 1980; Volume 208, pp. 221–231. [Google Scholar]
  24. Jaffe, J.S. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Ocean. Eng. 1990, 15, 101–111. [Google Scholar] [CrossRef]
  25. Drews, P.; Nascimento, E.; Moraes, F.; Botelho, S.; Campos, M. Transmission estimation in underwater single images. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV), Sydney, Australia, 1–8 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 825–830. [Google Scholar]
  26. Song, W.; Wang, Y.; Huang, D.; Tjondronegoro, D. A Rapid Scene Depth Estimation Model Based on Underwater Light Attenuation Prior for Underwater Image Restoration. In Proceedings of the Pacific-Rim Conference on Multimedia (PCM), Hefei, China, 21–22 September 2018; Springer: Cham, Switzerland, 2018; pp. 678–688. [Google Scholar]
  27. Liu, K.; Liang, Y. Underwater image enhancement method based on adaptive attenuation-curve prior. Opt. Express 2021, 29, 10321–10345. [Google Scholar] [CrossRef]
  28. Chen, R.; Cai, Z.; Cao, W. MFFN: An underwater sensing scene image enhancement method based on multiscale feature fusion network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4205612. [Google Scholar] [CrossRef]
  29. Yan, H. UW-CycleGAN: Model-driven CycleGAN for underwater image restoration. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4207517. [Google Scholar]
  30. Saini, K.; Bharti, S.; Kumar, A. Enhancement of underwater images using GAN and Refinement network. In Proceedings of the International Conference on Pervasive Computational Technologies (ICPCT), New Delhi, India, 15–17 March 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 588–593. [Google Scholar]
  31. Yang, G.; Kang, G.; Lee, J.; Cho, Y. Joint-ID: Transformer-Based Joint Image Enhancement and Depth Estimation for Underwater Environments. IEEE Sens. J. 2024, 24, 3113–3122. [Google Scholar]
  32. Wang, W.; Zhang, W.; Bai, L.; Ren, P. Metalantis: A Comprehensive Underwater Image Enhancement Framework. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–19. [Google Scholar] [CrossRef]
  33. Li, C.; Zheng, X.; Xiao, F.; Yuan, F. BRIUIE: A Bio-Retina Inspired Underwater Image Enhancement Framework. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 10000–10016. [Google Scholar] [CrossRef]
  34. Liang, J.; Ren, L.-Y.; Ju, H.-J.; Qu, E.-S.; Wang, Y.-L. Visibility enhancement of hazy images based on a universal polarimetric imaging method. J. Appl. Phys. 2014, 116, 173107. [Google Scholar] [CrossRef]
  35. Walker, J.G.; Chang, P.C.Y.; Hopcraft, K.I. Visibility depth improvement in active polarization imaging in scattering media. Appl. Opt. 2000, 39, 4933–4941. [Google Scholar] [CrossRef] [PubMed]
  36. Schechner, Y.Y.; Karpel, N. Recovery of underwater visibility and structure by polarization analysis. IEEE J. Ocean. Eng. 2005, 30, 570–587. [Google Scholar] [CrossRef]
  37. Guan, G.; Zhu, I.P.; Tian, H.; Li, J.; Wang, Z. Real-time polarization difference underwater imaging based on Stokes vector. Acta Phys. Sin. 2015, 64, 224203. [Google Scholar] [CrossRef]
  38. Huang, B.; Liu, T.G.; Hu, H.F.; Zhang, Y.; Li, X. Underwater image recovery considering polarization effects of objects. Opt. Express 2016, 24, 9826–9838. [Google Scholar] [CrossRef]
  39. Han, P.L.; Liu, F.; Wei, Y.; Zhang, J.; Li, H. Optical correlation assists to enhance underwater polarization imaging performance. Opt. Lasers Eng. 2020, 134, 106256. [Google Scholar] [CrossRef]
  40. Wang, Z.; Yu, H.; Peng, J. Underwater image recovery method considering target polarization characteristics. In Proceedings of the 4th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Xiamen, China, 25–27 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 123–127. [Google Scholar]
  41. Hu, H.F.; Zhang, Y.B.; Li, X.B.; Wang, Z.; Liu, T. Polarimetric underwater image recovery via deep learning. Opt. Lasers Eng. 2020, 133, 106152. [Google Scholar] [CrossRef]
  42. Zhang, R.; Gui, X.; Cheng, H.; Chu, J. Underwater image recovery utilizing polarimetric imaging based on neural networks. Appl. Opt. 2021, 60, 8419–8425. [Google Scholar] [CrossRef]
  43. Qi, P.; Wang, Y.; Li, J.; Zhang, H.; Chen, W. U2R-pGAN: Unpaired underwater-image recovery with polarimetric generative adversarial network. Opt. Lasers Eng. 2022, 157, 107112. [Google Scholar] [CrossRef]
  44. Guan, B.; Wang, Y.; Yin, J.; Li, H.; Zhang, J. Underwater Image Enhancement Method Based on Polarized Images Fusion and Quality Evaluation. In Proceedings of the 24th International Conference on Computer and Information Science (ICIS), Jeju Island, Republic of Korea, 20–22 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 127–132. [Google Scholar]
  45. Liu, H.; Wang, Z.; Li, J.; Zhang, H.; Chen, W. PID2Net: A Neural Network for Joint Underwater Polarimetric Images Descattering and Denoising. IEEE Sens. J. 2024, 24, 27803–27814. [Google Scholar] [CrossRef]
  46. Tian, F.; Xue, J.; Shi, Z. Polarimetric image recovery method with domain-adversarial learning for underwater imaging. Sci. Rep. 2025, 15, 3922. [Google Scholar] [CrossRef] [PubMed]
  47. Zhang, S.; Li, R.; Fan, Y. Adaptive partition based underwater polarization image restoration method for complex objects. Sci. Rep. 2025, 15, 40410. [Google Scholar] [CrossRef] [PubMed]
  48. Xu, M.; Sun, X.; Cao, Y.; Zhang, Y.; Gao, X. The visual system characteristics and research progress of mantis shrimp. Opt. Instrum. 2021, 43, 79–88. [Google Scholar]
  49. Marshall, J.; Oberwinkler, J. The colourful world of the mantis shrimp. Nature 1999, 401, 873–874. [Google Scholar] [CrossRef]
  50. Yang, P.; Wu, H.; He, C.; Luo, S. Underwater image restoration for seafloor targets with hybrid attention mechanisms and conditional generative adversarial network. Digit. Signal Process. 2023, 134, 103900. [Google Scholar] [CrossRef]
  51. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  52. Zhao, L.; Lu, S.-P.; Chen, T.; Yang, Z.; Shamir, A. Deep symmetric network for underexposed image enhancement with recurrent attentional learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 12075–12084. [Google Scholar]
  53. Sakaridis, C.; Dai, D.; Hecker, S.; Gool, L.V. Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 687–704. [Google Scholar]
  54. Xue, C.; Liu, Q.; Huang, Y.; Cheng, E.; Yuan, F. A Dual-Branch Autoencoder Network for Underwater Low-Light Polarized Image Enhancement. Remote Sens. 2024, 16, 1134. [Google Scholar] [CrossRef]
  55. Islam, M.J.; Edge, C.; Xiao, Y.; Luo, P.; Mehtaz, M.; Morse, C.; Enan, S.S.; Sattar, J. Semantic Segmentation of Underwater Imagery: Dataset and Benchmark. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 1769–1776. [Google Scholar]
  56. Hecht, E. Optics, 4th ed.; Addison-Wesley: San Francisco, CA, USA, 2002. [Google Scholar]
  57. Zhang, W.; Zhuang, P.; Sun, H.; Li, G.; Kwong, S.; Li, C. Underwater Image Enhancement via Minimal Color Loss and Locally Adaptive Contrast Enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010. [Google Scholar] [CrossRef]
  58. Shen, L.; Reda, M.; Zhang, X.; Zhao, Y.; Kong, S.G. Polarization-Driven Solution for Mitigating Scattering and Uneven Illumination in Underwater Imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4202615. [Google Scholar] [CrossRef]
  59. Saleh, A.; Sheaves, M.; Jerry, D.; Azghadi, M.R. Adaptive deep learning framework for robust unsupervised underwater image enhancement. Expert Syst. Appl. 2025, 268, 126314. [Google Scholar] [CrossRef]
  60. Panda, G.; Kundu, S.; Bhattacharya, S.; Routray, A. SINET: Sparsity-driven Interpretable Neural Network for Underwater Image Enhancement. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Tokyo, Japan, 19–23 May 2025; pp. 1–5. [Google Scholar]
  61. Zhou, C.; Teng, M.; Han, Y.; Xu, C.; Shi, B. Learning to Dehaze with Polarization. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Virtual, 6–14 December 2021; pp. 11487–11500. [Google Scholar]
Figure 1. Eye structure of the peacock mantis shrimp.
Figure 2. Photoreceptor cell details and their microvilli ring structure.
Figure 3. Overall framework of the proposed method.
Figure 4. Details of the saliency module.
Figure 5. Generator network details.
Figure 6. Discriminator network details.
Figure 7. Exchange of high- and low-frequency subbands between turbid and clear images in the wavelet transform domain.
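For readers who want to experiment with the subband-exchange idea illustrated in Figure 7, the sketch below shows one plausible realization using a single-level 2-D discrete wavelet transform from PyWavelets. It is an illustration under stated assumptions, not the authors' exact procedure: the wavelet basis ("haar"), the single decomposition level, and the helper name exchange_subbands are all choices made here for clarity.

# Minimal sketch (assumed, not the paper's implementation): swap the
# low-frequency approximation and high-frequency detail subbands between
# a turbid and a clear grayscale image in the wavelet domain.
import numpy as np
import pywt

def exchange_subbands(turbid: np.ndarray, clear: np.ndarray, wavelet: str = "haar"):
    """Exchange wavelet subbands between a turbid and a clear image."""
    cA_t, details_t = pywt.dwt2(turbid, wavelet)  # low / high subbands, turbid
    cA_c, details_c = pywt.dwt2(clear, wavelet)   # low / high subbands, clear
    # Turbid low frequencies + clear high frequencies, and the converse.
    turbid_low_clear_high = pywt.idwt2((cA_t, details_c), wavelet)
    clear_low_turbid_high = pywt.idwt2((cA_c, details_t), wavelet)
    return turbid_low_clear_high, clear_low_turbid_high

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.random((256, 256)), rng.random((256, 256))  # stand-in images
    x, y = exchange_subbands(a, b)
    print(x.shape, y.shape)  # (256, 256) (256, 256)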
Figure 8. Details of the polarization enhancement network based on the frequency-domain Mamba design.
Figure 9. Examples of underwater optical images: natural-light imaging (top row) versus artificial-light-source imaging (bottom row) in five different scenes. (a) Scene 1; (b) Scene 2; (c) Scene 3; (d) Scene 4; (e) Scene 5.
Figure 10. Underwater polarization dataset examples. From left to right: (a) 0°; (b) 45°; (c) 90°; (d) 135°. Each column corresponds to one polarization angle; each row corresponds to a different scene.
Figure 11. Examples of the underwater reference-free polarization dataset, simulating (a) 0°, (b) 45°, (c) 90°, and (d) 135°.
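The four polarization angles shown in Figures 10 and 11 form the standard set from which the linear Stokes parameters are derived. As a reference point, the sketch below follows the textbook formulation [56] rather than code from the paper; the function name and the epsilon guard against division by zero are illustrative.

# Standard-optics sketch: linear Stokes parameters and degree/angle of
# linear polarization from four intensity images at 0/45/90/135 degrees.
import numpy as np

def stokes_from_four_angles(i0, i45, i90, i135, eps=1e-8):
    """Compute S0, S1, S2, DoLP, and AoP from four polarization images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)          # total intensity
    s1 = i0 - i90                                # horizontal vs. vertical
    s2 = i45 - i135                              # +45 vs. -45 diagonal
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)   # degree of linear polarization
    aop = 0.5 * np.arctan2(s2, s1)               # angle of polarization (rad)
    return s0, s1, s2, dolp, aop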
Figure 12. Comparison of the proposed method with state-of-the-art methods on the generated reference dataset. (a) Input; (b) MLLE; (c) CBLA; (d) HFM; (e) UDCP; (f) ICSP; (g) PDS; (h) UDNET; (i) SINET; (j) Polar Dehaze; (k) Ours; (l) GT.
Figure 13. Comparison of the proposed method with state-of-the-art methods on the no-reference underwater dataset. (a) Input; (b) MLLE; (c) CBLA; (d) HFM; (e) UDCP; (f) ICSP; (g) PDS; (h) UDNET; (i) SINET; (j) Polar Dehaze; (k) Ours.
Figure 14. Results of the ablation experiments. (a) Input; (b) w/o PAN; (c) w/o MFEB; (d) w/o SFE; (e) w/o Mamba; (f) full model. The top two rows are from the reference dataset; the bottom two rows are from the no-reference dataset. The red box marks the region of interest, and the inset shows a magnified view of this area.
Table 1. The random parameter values of the dataset.

Parameter                            Value Range   Floating Coefficient
Ambient light assumption (R)         (0.0, 0.1)    0.1
Ambient light assumption (G)         (0.4, 0.6)    0.1
Ambient light assumption (B)         (0.8, 1.0)    0.1
Scattering coefficient (R)           (0.0, 0.1)    0.1
Scattering coefficient (G)           (0.4, 0.6)    0.1
Scattering coefficient (B)           (0.8, 1.0)    0.1
Target degree of polarization        (0.3, 0.8)    0.02
Background degree of polarization    (0.15, 0.2)   0.02
E_A                                  (0.8, 2.0)    0
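To make the role of Table 1 concrete, the snippet below sketches one way the per-image simulation parameters could be drawn. It assumes the floating coefficient acts as the amplitude of a small uniform perturbation around each sampled value; the authors' exact sampling scheme is not specified here, so names such as sample and the dictionary keys are illustrative only.

# Hedged sketch: draw one set of simulation parameters from the ranges in
# Table 1, treating the "floating coefficient" as extra uniform jitter.
import numpy as np

rng = np.random.default_rng(42)

def sample(lo: float, hi: float, float_coef: float) -> float:
    """Uniform draw from [lo, hi], plus a small uniform 'float' on top."""
    return rng.uniform(lo, hi) + rng.uniform(-float_coef, float_coef)

params = {
    # Ambient light assumption, per color channel (floating coefficient 0.1)
    "ambient_light": {"R": sample(0.0, 0.1, 0.1),
                      "G": sample(0.4, 0.6, 0.1),
                      "B": sample(0.8, 1.0, 0.1)},
    # Scattering coefficient, per color channel (floating coefficient 0.1)
    "scattering":    {"R": sample(0.0, 0.1, 0.1),
                      "G": sample(0.4, 0.6, 0.1),
                      "B": sample(0.8, 1.0, 0.1)},
    "target_dop":     sample(0.3, 0.8, 0.02),   # target degree of polarization
    "background_dop": sample(0.15, 0.2, 0.02),  # background degree of polarization
    "E_A":            sample(0.8, 2.0, 0.0),    # floating coefficient 0: no jitter
}
print(params)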
Table 2. Quantitative evaluation results of different underwater image enhancement methods (red: best, blue: second, green: third; ↑: higher is better, ↓: lower is better).

Dataset  Metric    MLLE     CBLA    HFM      UDCP     ICSP    PDS      UDNET    SINET    Polar Dehaze  Ours
Ref      PSNR ↑    12.3984  9.8156  12.3459  10.2754  9.2657  10.8346  15.5976  10.3157  13.4454       28.0548 ± 0.01
         SSIM ↑    0.5293   0.4459  0.5872   0.3922   0.5684  0.4017   0.6041   0.5257   0.6112        0.8924 ± 0.002
         NIQE ↓    3.5745   3.1028  3.5671   3.2763   3.2681  2.9683   2.8316   3.6568   2.8764        2.8735 ± 0.001
         UCIQE ↑   0.5973   0.4726  0.5189   2.0051   1.9560  1.8246   0.7624   0.4873   1.2244        1.9128 ± 0.004
No-ref   NIQE ↓    3.4128   5.6764  3.4128   5.6411   6.8726  7.1385   6.4826   6.9571   4.3126        3.9396 ± 0.003
         UCIQE ↑   3.1673   2.1219  3.1673   5.9834   3.1927  1.5973   2.4726   2.8416   1.2497        3.9736 ± 0.006
Table 3. Comparisons between the model without the polarization adversarial mechanism (w/o PAN), without the frequency-domain division mechanism (w/o MFEB), without the salient feature extraction module (w/o SFE), without Mamba (w/o Mamba), and the full model (red: best, blue: second; ↑: higher is better, ↓: lower is better).

Dataset  Metric    w/o PAN  w/o MFEB  w/o SFE  w/o Mamba  Full Model
Ref      PSNR ↑    28.0527  27.2094   27.3692  27.5758    28.0548
         SSIM ↑    0.8084   0.8810    0.8897   0.8463     0.8924
         NIQE ↓    2.6043   2.8744    2.8894   2.9932     2.8735
         UCIQE ↑   0.9911   1.2867    0.9683   0.8752     1.0128
No-ref   NIQE ↓    4.0816   3.8472    4.0739   4.2116     3.9396
         UCIQE ↑   3.9563   3.8647    3.7749   3.8573     3.9736
