Polarization-Enhanced Multi-Target Underwater Salient Object Detection

Song, Jiayi; Zhao, Peikai; Li, Jiangtao; Zhu, Liming; Chew, Khian-Hooi; Chen, Rui-Pin

doi:10.3390/photonics12070707

Open Access Article

by Jiayi Song, Peikai Zhao, Jiangtao Li, Liming Zhu *, Khian-Hooi Chew and Rui-Pin Chen *

Key Laboratory of Optical Field Manipulation of Zhejiang Province, Department of Physics, Zhejiang Sci-Tech University, Hangzhou 310018, China

* Authors to whom correspondence should be addressed.

Photonics 2025, 12(7), 707; https://doi.org/10.3390/photonics12070707

Submission received: 7 May 2025 / Revised: 10 July 2025 / Accepted: 11 July 2025 / Published: 12 July 2025

(This article belongs to the Section New Applications Enabled by Photonics Technologies and Systems)

Abstract

Salient object detection (SOD) plays a critical role in underwater exploration systems. Traditional SOD approaches encounter notable constraints in underwater image analysis, primarily stemming from light scattering and absorption effects induced by suspended particulate matter in complex underwater environments. In this work, we propose a deep learning-based multimodal method guided by multi-polarization parameters that integrates polarization de-scattering mechanisms with the powerful feature learning capability of neural networks to achieve adaptive multi-target SOD in an underwater turbid scattering environment. The proposed polarization-enhanced salient object detection network (PESODNet) employs a multi-polarization-parameter-guided, material-aware attention mechanism and a contrastive feature calibration unit, significantly enhancing its multi-material, multi-target detection capabilities in underwater scattering environments. The experimental results confirm that the proposed method achieves substantial performance improvements in multi-target underwater SOD tasks, outperforming state-of-the-art models of salient object detection in detection accuracy.

Keywords:

deep learning; polarization imaging; underwater salient object detection

1. Introduction

Salient object detection (SOD) emulates biological visual perception mechanisms to localize and segment the most visually conspicuous regions within images. As a critical component in computer vision systems, SOD enables diverse downstream applications including object tracking [1,2], medical image analysis [3,4], action recognition [5,6], and semantic segmentation [7,8]. Early-stage SOD methodologies mainly relied on hand-crafted feature extraction [9], which incurred excessive temporal and human resource costs. The advent of deep learning architectures has revolutionized this field, with FCN [10] and UNet [11] emerging as fundamental frameworks due to their hierarchical feature learning capabilities, achieving significant performance breakthroughs. Nevertheless, these deep learning-based methods remain constrained to scattering-free RGB imagery, while continuing to struggle in turbid underwater environments where light–particle interactions induce scattering and absorption effects that substantially degrade detection accuracy. This challenge is further compounded in multi-target scenarios, where the superposition of inter-target scattering interference and the differences in target materials notably increase the complexity of SOD, thereby imposing stricter requirements on SOD performance. While the surface plasmon effects of 2D materials have been exploited for micro-scale detection in optical biosensing [12], there is growing interest in leveraging polarization-dependent optical properties for adaptive SOD in complex scattering scenes such as turbid underwater scenes.

Recent advancements in polarization-based dehazing technology have achieved notable progress in turbid underwater environments, as polarimetric imaging provides additional information for mitigating the scattering and absorption effects caused by suspended particles [13,14]. Additionally, polarization imaging delivers richer target surface information, improving material identification through the differing evolution of the polarization components [15,16]. Wang et al. integrated polarization characteristic dispersion with neural networks to enhance target detection in turbid environments [17]. Gao et al. introduced a de-scattering neural network, MTM-Net [18], to improve the performance of underwater imaging based on the physical Mueller matrix dehazing model. Hu et al. proposed a 3D-CNN framework [19] utilizing polarization information to improve the image quality in high turbidity. These investigations demonstrate that polarization characteristics can significantly suppress particulate scattering interference, enabling high-quality de-scattering imaging and object observation in turbid underwater environments. In a recent study, Yang et al. [20] proposed a lightweight MobileNetV2 backbone integrated with a Structural Feature Learning Module (StFLM) and a Semantic Feature Learning Module (SeFLM). Their approach demonstrates promising performance in turbid conditions by utilizing dual inputs of DoLP and S0. The architecture balances model efficiency and detection speed through spatial attention mechanisms and channel-wise feature recalibration, showing potential for underwater applications.
However, further exploration is needed to incorporate additional polarization information, such as the AoP (angle of polarization), and to develop more targeted network designs, especially to address the challenges of multi-material and multi-target tasks in high-turbidity environments where scattering interference is intensified, inter-target interactions become more complex, and the distinct polarization characteristics of different materials are critical for accurate discrimination.

In this work, we propose a deep learning-based multi-modal polarimetric SOD method that integrates polarimetric de-scattering mechanisms with the powerful feature learning capabilities of neural networks to achieve adaptive de-scattering multi-target SOD. The proposed polarization-enhanced SOD network (PESODNet) integrates a material-aware channel attention (MACA) module, which enhances feature discrimination of multi-material targets, with a dynamic scattering suppression unit (DSSU), which adaptively mitigates scattering-induced noise. The experimental results demonstrate the excellent performance of the proposed PESODNet across multiple evaluation indicators for multi-target underwater SOD tasks, particularly in highly turbid underwater environments.

2. Physical Basis and Architecture of the Polarization-Enhanced De-Scattering Multi-Target SOD Network

The physical properties and surface structural characteristics of a target are critical determinants for its accurate detection and recognition. The polarized reflectance characteristics of target surfaces are fundamentally characterized by the polarized bidirectional reflectance distribution function model [21]. The relationship between incident and reflected light can be modeled by the Jones matrix [22]:

\begin{pmatrix} E_{s}^{r} \\ E_{p}^{r} \end{pmatrix} = \begin{pmatrix} \cos \varphi_{r} & \sin \varphi_{r} \\ -\sin \varphi_{r} & \cos \varphi_{r} \end{pmatrix} \begin{pmatrix} r_{ss} & 0 \\ 0 & r_{pp} \end{pmatrix} \begin{pmatrix} \cos \varphi_{i} & -\sin \varphi_{i} \\ \sin \varphi_{i} & \cos \varphi_{i} \end{pmatrix} \begin{pmatrix} E_{s}^{i} \\ E_{p}^{i} \end{pmatrix},

(1)

where the superscripts r and i of E correspond to reflected and incident light, respectively. The subscripts s and p indicate s and p waves with polarization perpendicular and parallel to the incidence plane, respectively. The angle φ_i is defined as the dihedral angle between the incident plane (spanned by the incident light direction and the macro-scale normal vector of the target surface) and the micro-scale incident plane (spanned by the incident light direction and the micro-scale normal vector of the microsurface), whereas the angle φ_r analogously denotes the dihedral angle between the reflected plane (formed by the reflected light direction and the macro-scale normal) and the micro-scale reflected plane (formed by the reflected light direction and the micro-scale normal). The Fresnel reflection coefficients r_ss and r_pp, which characterize the polarization-dependent reflectance at the material interface, are expressed as follows [23]:

r_{ss} = \frac{\cos \theta_{i} - \sqrt{n^{2} - \sin^{2} \theta_{i}}}{\cos \theta_{i} + \sqrt{n^{2} - \sin^{2} \theta_{i}}},

(2)

r_{pp} = \frac{n \cos \theta_{i} - \sqrt{1 - \sin^{2} \theta_{i} / n^{2}}}{n \cos \theta_{i} + \sqrt{1 - \sin^{2} \theta_{i} / n^{2}}},

(3)

where θ_i is the incident polar angle and n = n₂/n₁ is the relative refractive index, with n₁ and n₂ denoting the refractive indices of the surrounding medium and the object, respectively. According to the polarimetric reflectance characteristics of the target surface, the angle of polarization (AoP) and degree of linear polarization (DoLP) are mathematically formulated as follows [24,25,26]:

\mathrm{AoP} = \frac{1}{2} \arctan \left( \frac{S_{2}}{S_{1}} \right) = -\varphi_{r},

(4)

\mathrm{DoLP} = \frac{\sqrt{S_{1}^{2} + S_{2}^{2}}}{S_{0}} = \frac{|r_{ss}^{2} - r_{pp}^{2}|}{r_{ss}^{2} + r_{pp}^{2}},

(5)

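where S₀ = I₀ + I₉₀, S₁ = I₀ − I₉₀, and S₂ = 2I₄₅ − (I₀ + I₉₀) are the Stokes parameters, and I₀, I₄₅, and I₉₀ are the intensities measured through linear polarizers oriented at 0°, 45°, and 90°. S₀ denotes the total intensity, S₁ the difference between the horizontal and vertical linear polarization components, and S₂ the difference between the 45° and 135° linear polarization components.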
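The Stokes definitions above and the first equalities of Equations (4) and (5) can be sketched per pixel as follows. This is a minimal NumPy illustration, not the authors' code: the function names and the small constant `eps` are assumptions, and `arctan2` is swapped in for the plain arctangent of Equation (4) so that pixels with S₁ = 0 do not cause a division by zero (the two agree when S₁ > 0).

```python
import numpy as np

def stokes_from_intensities(i0, i45, i90):
    """Linear Stokes parameters from intensity images at 0, 45 and 90 degrees."""
    s0 = i0 + i90                    # total intensity
    s1 = i0 - i90                    # horizontal minus vertical component
    s2 = 2.0 * i45 - (i0 + i90)      # 45-degree minus 135-degree component
    return s0, s1, s2

def aop_dolp(s0, s1, s2, eps=1e-8):
    """Angle of polarization (radians) and degree of linear polarization."""
    # arctan2 replaces arctan(S2/S1) to stay defined at S1 = 0 (assumption of this sketch).
    aop = 0.5 * np.arctan2(s2, s1)
    # eps guards against division by zero in dark pixels (assumption of this sketch).
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    return aop, dolp
```

Calling `aop_dolp(*stokes_from_intensities(i0, i45, i90))` on three registered intensity images returns AoP and DoLP maps of the same shape as the inputs.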

As seen from Equations (2)–(5), the AoP is determined by the angle φ_r between the macro-scale reflection plane of the target surface and that of the microsurface, whereas the DoLP depends on the relative refractive index n between the object and the medium and on the incident angle θ_i. Since n differs between materials, it can be regarded as a key criterion for detecting and identifying multiple targets made of different materials, and the physical properties of the targets can therefore be characterized by the polarization parameters AoP and DoLP in polarization imaging data. In particular, in a turbid scattering underwater environment, polarization characteristics have proven highly effective for de-scattering imaging, whereas optical intensity imaging degrades severely and loses the detailed information of the target [27].
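To make the dependence of the DoLP on n and θ_i concrete, the following sketch evaluates Equations (2), (3), and the Fresnel form of Equation (5) directly. It assumes a real relative index n ≥ 1 (no total internal reflection) and angles in radians; the function names are illustrative and any index values passed in are the caller's choice, not measurements from this work.

```python
import math

def fresnel_coefficients(theta_i, n):
    """Amplitude reflection coefficients r_ss and r_pp of Equations (2) and (3).

    theta_i: incident polar angle in radians; n: relative refractive index n2/n1.
    Assumes real n >= 1 so that both square roots stay real.
    """
    cos_t = math.cos(theta_i)
    sin2_t = math.sin(theta_i) ** 2
    root_s = math.sqrt(n * n - sin2_t)
    root_p = math.sqrt(1.0 - sin2_t / (n * n))
    r_ss = (cos_t - root_s) / (cos_t + root_s)
    r_pp = (n * cos_t - root_p) / (n * cos_t + root_p)
    return r_ss, r_pp

def dolp_from_fresnel(theta_i, n):
    """Degree of linear polarization from the Fresnel form of Equation (5)."""
    r_ss, r_pp = fresnel_coefficients(theta_i, n)
    return abs(r_ss ** 2 - r_pp ** 2) / (r_ss ** 2 + r_pp ** 2)
```

Two checks follow from the formulas themselves: at normal incidence r_ss² = r_pp², so the DoLP vanishes, and at the Brewster angle θ_i = arctan(n) the coefficient r_pp is zero, so the DoLP reaches 1. Between these limits, surfaces with different n give different DoLP values at the same viewing angle, which is the property the material-aware design relies on.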

Herein, a polarization-enhanced SOD network (PESODNet) is constructed with an encoder–decoder structure, as shown in Figure 1a. The encoder leverages MobileNetV3 (MN) [28] as the backbone to extract multi-modal optical features. The polarization images AoP, DoLP, and the RGB image S0 are adopted as the inputs to the network. The RGB stream captures the color and brightness features, and the polarization streams extract the complementary polarization features. MobileNetV3 (MN) extracts features at five distinct levels for each stream. The Pyramid Pooling Module (PPM) [29] is added at the end of each stream, allowing the network to capture both local and global context information. The MACA module fuses features from the AoP and DoLP streams at multiple levels. By leveraging the differences of polarization characteristics reflected by various materials, the MACA module generates material encoding to enhance the network’s feature responsiveness for the identification of materials such as metals, plastics, and wood. Subsequently, the DSSU dynamically suppresses scattering noise by integrating the polarization de-scattering function and S0-guided Sobel gradient operators [30]. The final saliency map is obtained through the decoder with cascaded CBR modules.

The MACA module involves three steps: input feature preprocessing, material encoding generation, and material-aware response enhancement, as shown in Figure 1b. Specifically, the MACA module receives feature inputs from the AoP and DoLP streams and applies global average pooling to extract the global statistics AoP_avg and DoLP_avg. After concatenation, AoP_avg and DoLP_avg are fed into a lightweight multi-layer perceptron (MLP) to generate the material encoding vector. Material-aware channel attention weights are obtained by applying the sigmoid function to the material encoding vector:

\omega_{c} = \mathrm{Sigmoid}(\mathrm{MLP}(\mathrm{Concat}(\mathrm{DoLP}_{avg}, \mathrm{AoP}_{avg}))).

(6)

The CBR (Convolution + BatchNorm + ReLU) module projects the DoLP and AoP features into a higher-dimensional feature space; the projected features then undergo channel-wise multiplication with the weights derived from Equation (6), generating the feature representation F_mp with reinforced material perception sensitivity:

F_{mp} = \mathrm{CBR}(\mathrm{Concat}(\mathrm{DoLP}, \mathrm{AoP})) \odot \omega_{c}.

(7)
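The data flow of Equations (6) and (7) can be sketched with plain NumPy arrays. This is an illustrative forward pass under stated assumptions rather than the authors' implementation: the MLP is taken to be two linear layers with a ReLU in between, the CBR block is approximated by a 1 × 1 convolution followed by ReLU with batch normalization omitted, and all weight names (`w1`, `w2`, `w_proj`) and shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def maca(dolp_feat, aop_feat, w1, w2, w_proj):
    """Material-aware channel attention sketch.

    dolp_feat, aop_feat: (C, H, W) feature maps from the two polarization streams.
    w1: (hidden, 2C) and w2: (C_out, hidden) form the assumed two-layer MLP.
    w_proj: (C_out, 2C) is a 1x1 convolution standing in for the CBR block.
    """
    # Global average pooling gives the per-channel statistics DoLP_avg and AoP_avg.
    dolp_avg = dolp_feat.mean(axis=(1, 2))
    aop_avg = aop_feat.mean(axis=(1, 2))
    # Equation (6): material encoding from the MLP, squashed to attention weights.
    hidden = np.maximum(w1 @ np.concatenate([dolp_avg, aop_avg]), 0.0)
    omega_c = sigmoid(w2 @ hidden)
    # CBR stand-in: 1x1 convolution + ReLU on the concatenated features.
    stacked = np.concatenate([dolp_feat, aop_feat], axis=0)
    projected = np.maximum(np.einsum("oc,chw->ohw", w_proj, stacked), 0.0)
    # Equation (7): channel-wise reweighting of the projected features.
    return projected * omega_c[:, None, None], omega_c
```

Because `omega_c` lies in (0, 1), each output channel is scaled down by an amount that depends only on the globally pooled polarization statistics, which is what lets the weights act as a per-image material code.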

The DSSU module adaptively suppresses scattering noise by recalibrating material-aware response features F_mp through S0-guided local gradient information, as shown in Figure 1c. Specifically, F_mp and S0 are fused along the channel dimension to constitute the feature F_raw pending recalibration. The spatial scattering levels are subsequently estimated via a lightweight CNN architecture:

P_{scatter} = \mathrm{Sigmoid}(\mathrm{Conv}_{1 \times 1}(F_{raw})),

(8)

where the Conv_1×1 reduces channel depth to 1, with its output constrained to the [0, 1] interval via sigmoid activation.

Simultaneously, the Sobel threshold in the DSSU is treated as a hyperparameter and selected through extensive experiments across different turbidity levels (0–72 NTU), with the optimal setting of 0.2 validated to maintain consistent performance. The gradient feature G_S₀ obtained from Sobel gradient operators is subsequently subjected to normalization and non-linear transformation to compute the gradient compensation coefficient:

\alpha = \mathrm{ReLU}(G_{S_0} / \max(G_{S_0})),

(9)

where α∈[0, 1] approaches 1 in high-gradient regions (object edges) and converges to 0 in low-gradient regions (scattering background areas).

The DSSU ultimately modulates feature weights via a dynamic suppression mechanism that combines the spatial scattering intensity estimate P_scatter with the gradient compensation coefficient α, as formalized by the following:

F_{rc} = F_{raw} \cdot (1 - P_{scatter}) + F_{raw} \odot \alpha \cdot P_{scatter}.

(10)
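Equations (8)–(10) can be traced end to end in a short NumPy sketch. Several details here are assumptions made for illustration: the 1 × 1 convolution is a single weight vector `w_conv` over channels, the Sobel operator uses edge-replicated borders, `eps` protects the normalization, and the Sobel threshold mentioned above is left out because the text does not specify how it enters the computation.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a 2D image with 3x3 Sobel kernels (edge-replicated borders)."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def dssu(f_mp, s0, w_conv, b_conv=0.0, eps=1e-8):
    """Dynamic scattering suppression sketch.

    f_mp: (C, H, W) material-aware features; s0: (H, W) intensity image.
    w_conv: (C + 1,) weights of the assumed 1x1 convolution; b_conv: its bias.
    """
    # Fuse F_mp and S0 along the channel dimension to obtain F_raw.
    f_raw = np.concatenate([f_mp, s0[None]], axis=0)
    # Equation (8): per-pixel scattering level in (0, 1).
    p_scatter = 1.0 / (1.0 + np.exp(-(np.tensordot(w_conv, f_raw, axes=1) + b_conv)))
    # Equation (9): normalized Sobel gradient of S0 as the compensation coefficient.
    g = sobel_magnitude(s0)
    alpha = np.maximum(g / (g.max() + eps), 0.0)
    # Equation (10): keep features at edges, attenuate them in flat scattering regions.
    f_rc = f_raw * (1.0 - p_scatter) + f_raw * alpha * p_scatter
    return f_rc, p_scatter, alpha
```

Equation (10) can be rewritten as F_rc = F_raw · (1 − P_scatter · (1 − α)): where α is close to 1 (object edges) the features pass through unchanged regardless of the estimated scattering, and where α is close to 0 they are attenuated by the factor (1 − P_scatter).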

In summary, the MACA module fuses the polarization features (AoP and DoLP), generates a material encoding through an MLP, and applies channel attention weights to reinforce material-aware features, exploiting the physical relationship between the DoLP and the material properties (see Equations (2)–(5)). The DSSU integrates these fused features with S₀-guided Sobel gradients to estimate scattering levels, dynamically suppressing noise while preserving edges through the gradient compensation coefficient. Through the collaborative fusion mechanism of the MACA and DSSU modules, polarization characteristics enhance RGB features by boosting the model’s material recognition capability and anti-scattering ability. The decoder receives the multi-scale features from the DSSU module via skip connections and employs multiple layers of CBR blocks for upsampling, enabling effective cross-layer feature fusion. This process preserves high-level semantic information while integrating the structural details of lower layers, thereby enhancing the integrity and richness of the features. Through this stepwise refinement, the decoder reconstructs the final prediction from abstract features so that it aligns closely with the ground truth labels.

The framework outputs the final saliency map y′. The standard binary cross-entropy loss L_bce [31] is utilized for optimization:

L_{bce} = -\frac{1}{N} \sum_{k=1}^{N} \left[ G_{k} \log(\mathrm{sigmoid}(y'_{k})) + (1 - G_{k}) \log(1 - \mathrm{sigmoid}(y'_{k})) \right],

(11)

where N is the number of pixels and G denotes the ground truth (GT), a precisely annotated binary mask in which each pixel is encoded as 1 if it belongs to a salient foreground object and 0 if it belongs to the background. L_bce provides efficient optimization and effective binary classification. It automatically balances class weights when saliency/non-saliency regions are imbalanced.
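For reference, the standard binary cross-entropy on raw network outputs, in which both terms carry a logarithm, can be evaluated without overflow by rewriting the two log-sigmoid terms as softplus functions. The sketch below uses only the Python standard library and operates on flattened lists; the function name is illustrative, and deep learning frameworks provide equivalent built-in losses.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy between raw outputs y' and binary labels G.

    Uses log(sigmoid(y)) = -softplus(-y) and log(1 - sigmoid(y)) = -softplus(y),
    which stays finite for large |y|.
    """
    total = 0.0
    for y, g in zip(logits, targets):
        softplus_pos = max(y, 0.0) + math.log1p(math.exp(-abs(y)))  # softplus(y)
        softplus_neg = softplus_pos - y                              # softplus(-y)
        total += g * softplus_neg + (1.0 - g) * softplus_pos
    return total / len(logits)
```

A logit of 0 with label 1 gives a loss of log 2, the value expected for a prediction of 0.5, while confident correct predictions drive the loss toward 0.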

3. Experiment Results and Discussion

The underwater imaging experiments for multi-target SOD were conducted with the setup illustrated in Figure 2. A random number of targets were positioned within a transparent container (acrylic tank, 65 × 30 × 40 cm³) filled with water. Scattering environments with varying levels of turbidity were generated by adding different amounts of skim milk to the container. Nephelometric Turbidity Units (NTU) were employed as the metric for quantifying the degree of scattering in turbid underwater environments [32]. The light reflected from the targets was captured by a commercial focal-plane division polarization camera (LUCID, PHX050S-QC, Lucid Vision Labs Inc., Richmond, BC, Canada). The camera has a resolution of 2048 × 2448 pixels, and its pixel array consists of macroblocks of pixels sensitive to four different linear polarization components (90°, 45°, 135°, and 0°). This configuration, common in polarization imaging systems, allows the Stokes parameters to be computed and the key polarization metrics (AoP and DoLP) to be derived via Equations (4) and (5), thereby providing the polarization information required for material discrimination and scattering suppression in underwater environments [17,18]. Since the four linear polarization components (0°, 45°, 90°, and 135°) are detected directly by the polarization-sensitive pixel array, they were used to calculate the AoP and DoLP. A total of 1000 groups of polarization images at various turbidity levels were collected as the original dataset; each group included an intensity image S0, AoP and DoLP images computed via Equations (4) and (5), and a manually annotated ground truth (GT) image, as shown in Figure 3.
The dataset was augmented through a combination of geometric transformations (random cropping, rotation, and horizontal/vertical flipping), intensity adjustments (brightness and contrast variations), and noise injection (Gaussian blur and Poisson noise), and subsequently partitioned into training and validation subsets at an 8:2 ratio. The resulting training dataset comprised 200,000 images with a resolution of 448 × 448 pixels. Training used the AdamW optimizer with an initial learning rate of 1 × 10⁻⁴, scheduled via a polynomial decay strategy (‘poly’ LR scheduler) over 600 epochs with a batch size of 16, and was run on NVIDIA RTX 3090 Ti GPUs.
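A polynomial ('poly') learning-rate schedule of the kind mentioned above is commonly written as lr(t) = lr₀ · (1 − t/T)^p. The sketch below uses the stated initial rate of 1 × 10⁻⁴; the exponent `power=0.9`, the final rate `end_lr=0.0`, and whether the step counts epochs or iterations are assumptions, since the text does not give them.

```python
def poly_lr(step, total_steps, base_lr=1e-4, power=0.9, end_lr=0.0):
    """Polynomial learning-rate decay from base_lr to end_lr.

    power and end_lr are assumed values; the source only names the 'poly' strategy.
    """
    frac = min(step, total_steps) / total_steps   # training progress in [0, 1]
    return (base_lr - end_lr) * (1.0 - frac) ** power + end_lr
```

With epoch-level stepping, `poly_lr(epoch, 600)` would start at the initial rate and decay smoothly to `end_lr` at epoch 600.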

Comparison and ablation experiments were conducted to verify the validity of the proposed PESODNet in environments of distinct turbidity (0 NTU, 36 NTU, and 72 NTU), as shown in Figure 4. In the comparison experiments, the proposed method was compared with classical networks (UNet and FCN) trained exclusively on S0. In the ablation experiments, three PESODNet variants with the relevant modules removed were compared: (1) Baseline (S0 stream only), (2) PES-M (without MACA), and (3) PES-D (without DSSU), to systematically evaluate the contributions of the core modules and the multi-stream architecture.

In the comparison study, the PESODNet demonstrates superior segmentation accuracy compared to the intensity-based methods (FCN and UNet) across varying turbidity conditions (0 NTU, 36 NTU, and 72 NTU), as shown in Figure 4. In clear water (0 NTU), FCN and UNet achieve generally acceptable object detection but exhibit subtle edge inaccuracies. At this turbidity level, the minimal scattering interference reduces reliance on the PESODNet’s polarization-based mechanisms (i.e., the dynamic scattering suppression unit (DSSU) and material-aware channel attention (MACA) modules), so the visual discrepancy among the methods remains small. However, the PESODNet still shows advantages in fine-grained detail, such as sharper target boundaries and fewer segmentation voids, as shown in Figure 4a. In moderate turbidity (36 NTU), FCN produces incomplete segmentation of multiple objects, with unsegmented regions appearing as black holes, and UNet fails over a large area at the edge of the image. Under extreme scattering (72 NTU), both FCN and UNet yield unacceptable segmentation results, with severely incomplete edges and failed segmentation boundaries for all objects.

The ablation study evaluated three variants of the PESODNet: the baseline, PES-M, and PES-D. In clear water (0 NTU), all variants maintain relatively clear segmentation edges, and the PESODNet achieves higher edge precision. In moderate turbidity (36 NTU), the PES-M and PES-D exhibit edge blurring and minor internal voids, while the baseline shows more severe degradation. In extreme 72 NTU turbidity, the PES-M and PES-D produce serrated edges and structural separations, whereas the baseline suffers from large-area segmentation failures, as shown in Figure 4.

In contrast, the PESODNet maintains structural integrity in its segmentation results across varying turbidity conditions, with higher edge consistency and better overlap with ground-truth boundaries, as shown in Figure 4. These results validate the PESODNet’s superiority in multi-target SOD under turbid underwater environments. The polarization feature fusion and scattering suppression mechanisms of the PESODNet enable stable segmentation performance and strong robustness even under highly scattering interference.

To quantify the performance of these methods, the Mean Absolute Error (MAE) [33], S-measure (S) [34], maximum F-measure (F) [35], and maximum E-measure (E) [36] were adopted as quantitative performance evaluation metrics. The MAE evaluates the pixel-wise average error between the predicted image P and the GT:

\mathrm{MAE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} |P(i, j) - GT(i, j)|,

with lower values indicating superior model performance. The S-measure calculates the structural similarity between the predicted image P and the GT:

S = (1 - \alpha) \times S_{r} + \alpha \times S_{o},

where S_r is the region-based structural similarity, S_o is the object-based structural similarity, and α is empirically set to 0.5 (this weighting factor is distinct from the gradient compensation coefficient of Equation (9)). The F-measure is the weighted harmonic mean of precision and recall, which comprehensively evaluates the model’s performance:

F_{\beta} = \frac{(1 + \beta^{2}) \times \mathrm{Precision} \times \mathrm{Recall}}{\beta^{2} \times \mathrm{Precision} + \mathrm{Recall}},

where the value of β is empirically set to 0.3. The E-measure is an evaluation metric based on the enhanced alignment between the predicted and ground-truth maps:

E_{\phi} = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} \phi_{FM}(i, j),

where φ_FM is the enhanced alignment matrix [36]. Higher values of the S-, E-, and F-measures indicate better model performance. The corresponding evaluation metrics for the various SOD methods are shown in Table 1.
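Two of these metrics are simple enough to sketch directly. The NumPy code below computes the MAE and a maximum F-measure obtained by sweeping a binarization threshold over the prediction, using the standard weighted harmonic mean (1 + β²)·P·R / (β²·P + R). The function names, the 255-step threshold sweep, and `eps` are assumptions of this sketch; the weight is passed explicitly as `beta_sq` because the text quotes 0.3 for β whereas many SOD benchmarks use β² = 0.3, so the choice is left to the caller.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(pred.astype(float) - gt.astype(float))))

def max_f_measure(pred, gt, beta_sq, num_thresholds=255, eps=1e-8):
    """Maximum F-measure over evenly spaced binarization thresholds in [0, 1)."""
    gt_mask = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        binary = pred > t
        tp = np.logical_and(binary, gt_mask).sum()
        precision = tp / (binary.sum() + eps)
        recall = tp / (gt_mask.sum() + eps)
        f = (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall + eps)
        best = max(best, float(f))
    return best
```

A prediction identical to the mask gives an MAE of 0 and an F-measure of essentially 1, while an inverted prediction gives an MAE of 1 and an F-measure of 0, which brackets the range in which the reported values should fall.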

The experimental results demonstrate the consistent performance superiority of the PESODNet architecture across all tested turbidity conditions. The ablation study reveals a performance hierarchy among the network variants. Removing either key module (MACA or DSSU) leads to measurable performance degradation, as shown in Table 1. Notably, the PES-M and PES-D variants still maintain a competitive advantage over traditional networks, particularly in high-turbidity scenarios. To investigate the interplay between the MACA and DSSU modules, we introduce the variant PES-MD, which excludes both modules. The PES-MD performs worse than the PESODNet, PES-M, and PES-D, yet outperforms traditional baselines such as FCN and UNet. This confirms the synergistic effect of the two modules: the MACA enhances material-related features while mitigating the impact of scattering-induced noise, thereby reducing the burden on the DSSU; in turn, the DSSU’s noise suppression reinforces the MACA module’s feature responses. Through the dual mechanisms of feature enhancement and noise suppression, the PESODNet achieves superior performance and robustness. The comparative experiments show that the PESODNet remains robust as water turbidity increases, while the other methods degrade progressively, as shown in Table 1. The performance gap between the PESODNet and the conventional networks (UNet and FCN) widens substantially under challenging conditions, demonstrating the effectiveness of the specialized polarization feature fusion and scattering suppression mechanisms. These results collectively validate the synergistic operation of the PESODNet’s core components and its superior capability in handling SOD in turbid underwater environments compared to existing approaches.
This validates that polarization features provide a critical foundation for robust detection in turbid environments by mitigating the scattering-induced noise that undermines traditional RGB-based approaches. Our approach innovatively integrates the physics model of polarization parameter-dependent material recognition with deep learning technology to address underwater multi-target salient object detection, utilizing specialized modules to suppress scattering effects.

The inference time, the number of model parameters, and floating-point operations (FLOPs) were adopted as the key metrics for evaluating the efficiency of deep neural networks. The MobileNetV3 backbone serves as a lightweight foundation, with the MACA module adding approximately 0.2 M parameters and 0.05 GFLOPs via global pooling, a compact MLP, and channel attention, while the DSSU contributes ~0.15 M parameters and 0.03 GFLOPs through 1 × 1 convolutions and gradient operations. On an NVIDIA RTX 3090, the PESODNet achieves around 28 FPS for 448 × 448 images, outperforming UNet.

At the current stage, we focus on validating the feasibility of the polarization-based anti-scattering enhanced SOD mechanism in a controlled lab environment. Complex factors such as dynamic lighting and organic particles in natural underwater scenes remain to be addressed in future work. Dynamic lighting, such as sunlight attenuation, may lead to less accurate scattering noise suppression in the DSSU due to intensity fluctuations in the S₀ channel. However, the polarization parameters (AoP and DoLP) employed in the PESODNet are less sensitive to such brightness variations, as they characterize material properties rather than intensity. Organic particles in natural waters exhibit more complex scattering properties than the milk particles in our controlled setup. However, the multi-polarization parameter guidance in the PESODNet, specifically the adaptive feature fusion of the AoP and DoLP in the MACA module, mitigates such impacts. Since the MACA module encodes material differences based on polarization characteristics, it can capture the distinct polarization signatures of multi-material targets even under complex scattering. Additionally, the dynamic suppression mechanism in the DSSU adapts to varying scattering levels through S₀-guided gradients, partially compensating for model mismatches caused by organic particle complexity and maintaining discriminability for multi-material targets. Depth-dependent color shifts affect the color consistency of the S0 channel; however, the polarization parameters AoP and DoLP help the model maintain stable detection accuracy because their material-encoding properties are independent of spectral variations. Future research will further address the complexities of real-world underwater environments, such as dynamic lighting, organic particulate scattering, and depth-related color shifts, to enhance the model’s stability in practical underwater scenarios.

4. Conclusions

We propose a deep learning-based multi-modal method guided by multi-polarization parameters that integrates the polarization de-scattering mechanisms with the adaptive-feature learning capabilities of neural networks to achieve adaptive de-scattering SOD. The proposed polarization-enhanced SOD network (PESODNet) enhances multi-target SOD in underwater scattering environments by combining a multi-polarization-parameter-guided material-aware attention mechanism and a dynamic scattering suppression unit, effectively addressing the task of multi-target SOD. The experimental results confirm that the proposed method exhibits significant enhancements in detection accuracy and environmental robustness compared to state-of-the-art networks, especially under highly turbid underwater conditions. These results provide a novel approach for multi-target SOD and associated tasks in turbid underwater environments.

Author Contributions

Conceptualization, R.-P.C.; methodology, R.-P.C. and J.S.; validation, R.-P.C. and L.Z.; investigation, J.S., P.Z. and R.-P.C.; data curation, J.S., P.Z., J.L., L.Z. and R.-P.C.; supervision, R.-P.C.; writing—original draft preparation, J.S.; writing—review and editing, R.-P.C. and K.-H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grants 12474300 and 62405281.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hong, S.; You, T.; Kwak, S.; Han, B. Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6 July 2015. [Google Scholar]
Zhou, Z.; Pei, W.; Li, X.; Wang, H.; Zheng, F.; He, Z. Saliency-Associated Object Tracking. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
Wu, Y.-H.; Gao, S.-H.; Mei, J.; Xu, J.; Fan, D.-P.; Zhang, R.-G.; Cheng, M.-M. JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation. IEEE Trans. Image Process. 2021, 30, 3113–3126. [Google Scholar] [CrossRef] [PubMed]
Andrushia, A.D.; Sagayam, K.M.; Hien, D.; Pomplun, M.; Quach, L. Visual-Saliency-Based Abnormality Detection for MRI Brain Images-Alzheimer’s Disease Analysis. Appl. Sci. 2021, 11, 9199. [Google Scholar] [CrossRef]
Rapantzikos, K.; Avrithis, Y.; Kollias, S. Dense Saliency-Based Spatiotemporal Feature Points for Action Recognition. In Proceedings of the CVPR: 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: New York, NY, USA, 2009. [Google Scholar]
Wang, X.; Qi, C. Detecting Action-Relevant Regions for Action Recognition Using a Three-Stage Saliency Detection Technique. Multimed. Tools Appl. 2020, 79, 7413–7433. [Google Scholar] [CrossRef]
Hoyer, L.; Munoz, M.; Katiyar, P.; Khoreva, A.; Fischer, V. Grid Saliency for Context Explanations of Semantic Segmentation. In Proceedings of the Advances in Neural Information Processing Systems 32 (NIPS 2019), La Jolla, CA, USA, 8–14 December 2019. [Google Scholar]
Zeng, Y.; Zhuge, Y.; Lu, H.; Zhang, L. Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019.
Itti, L.; Koch, C.; Niebur, E. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259.
Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015.
Lei, Z.-L.; Guo, B. 2D Material-Based Optical Biosensor: Status and Prospect. Adv. Sci. 2022, 9, 2102924.
Zhao, Y.; He, W.; Ren, H.; Li, Y.; Fu, Y. Polarization Descattering Imaging through Turbid Water without Prior Knowledge. Opt. Lasers Eng. 2022, 148, 106777.
Liu, F.; Han, P.; Wei, Y.; Yang, K.; Huang, S.; Li, X.; Zhang, G.; Bai, L.; Shao, X. Deeply Seeing through Highly Turbid Water by Active Polarization Imaging. Opt. Lett. 2018, 43, 4903–4906.
Liu, K.; Liang, Y. Enhancement of Underwater Optical Images Based on Background Light Estimation and Improved Adaptive Transmission Fusion. Opt. Express 2021, 29, 28307–28328.
Nunes-Pereira, E.J.; Peixoto, H.; Teixeira, J.; Santos, J. Polarization-Coded Material Classification in Automotive LIDAR Aiming at Safer Autonomous Driving Implementations. Appl. Opt. 2020, 59, 2530–2540.
Wang, G.; Gao, J.; Xiang, Y.; Li, Y.; Chew, K.-H.; Chen, R.-P. Deep Learning-Driven Underwater Polarimetric Target Detection Based on the Dispersion of Polarization Characteristics. Opt. Laser Technol. 2024, 174, 110549.
Gao, J.Y.; Wang, G.; Chen, Y.; Wang, X.; Li, Y.; Chew, K.; Chen, R. Mueller Transform Matrix Neural Network for Underwater Polarimetric Dehazing Imaging. Opt. Express 2023, 31, 27213–27222.
Liu, H.; Li, X.; Cheng, Z.; Liu, T.; Zhai, J.; Hu, H. Polarization Maintaining 3-D Convolutional Neural Network for Color Polarimetric Images Denoising. IEEE Trans. Instrum. Meas. 2023, 72, 1–9.
Yang, X.; Li, Q.; Yu, D.; Gao, Z.; Huo, G. Polarization Spatial and Semantic Learning Lightweight Network for Underwater Salient Object Detection. J. Electron. Imaging 2024, 33, 33010–33017.
Zhang, Y.; Zhang, Y.; Zhao, H.; Wang, Z. Improved Atmospheric Effects Elimination Method for pBRDF Models of Painted Surfaces. Opt. Express 2017, 25, 16458–16475.
Priest, R.; Germer, T. Polarimetric BRDF in the Microfacet Model: Theory and Measurements. Proc. 2000 Mil. Sens. Symp. Spec. Group Passiv. Sens. 2002, 1, 169–181.
Butler, S.D.; Nauyoks, S.E.; Marciniak, M.A. Comparison of Microfacet BRDF Model Elements to Diffraction BRDF Model Elements. In Proceedings of the Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXI, Baltimore, MD, USA, 21 May 2015; SPIE: Bellingham, WA, USA, 2015.
Sun, R.; Sun, X.; Chen, F.; Pan, H.; Song, Q. An Artificial Target Detection Method Combining a Polarimetric Feature Extractor with Deep Convolutional Neural Networks. Int. J. Remote Sens. 2020, 41, 4995–5009.
Li, Y.; Wang, S. Underwater Object Detection Technology Based on Polarization Image Fusion. In Proceedings of the 5th International Symposium on Advanced Optical Manufacturing and Testing Technologies: Optoelectronic Materials and Devices for Detector, Imager, Display, and Energy Conversion Technology, Dalian, China, 22 October 2010; SPIE: Bellingham, WA, USA, 2010.
Qiu, T.; Zhang, Y.; Li, J.; Yang, W. Target Information Enhancement Using Polarized Component of Infrared Images. In Proceedings of the International Symposium on Optoelectronic Technology and Application 2014: Infrared Technology and Applications, Beijing, China, 20 November 2014; SPIE: Bellingham, WA, USA, 2014.
Ren, Q.; Xiang, Y.; Wang, G.; Gao, J.; Wu, Y.; Chen, R.-P. The Underwater Polarization Dehazing Imaging with a Lightweight Convolutional Neural Network. Optik 2022, 251, 168381.
Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017.
Chien, Y. Pattern Classification and Scene Analysis. IEEE Trans. Autom. Control. 1974, 19, 462–463.
Lin, Z.; Pan, J.; Zhang, S.; Wang, X.; Xiao, X.; Huang, S.; Xiao, L.; Jiang, J. Understanding the Ranking Loss for Recommendation with Sparse User Feedback. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 24 August 2024; Association for Computing Machinery: New York, NY, USA, 2024.
Liu, F.; Zhang, S.; Han, P.; Chen, F.; Zhao, L.; Fan, Y.; Shao, X. Depolarization Index from Mueller Matrix Descatters Imaging in Turbid Water. Chin. Opt. Lett. 2022, 20, 022601.
Perazzi, F.; Krähenbühl, P.; Pritch, Y.; Hornung, A. Saliency Filters: Contrast Based Filtering for Salient Region Detection. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012.
Fan, D.-P.; Cheng, M.-M.; Liu, Y.; Li, T.; Borji, A. Structure-Measure: A New Way to Evaluate Foreground Maps. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
Achanta, R.; Hemami, S.; Estrada, F.; Susstrunk, S. Frequency-Tuned Salient Region Detection. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
Fan, D.-P.; Gong, C.; Cao, Y.; Ren, B.; Cheng, M.-M.; Borji, A. Enhanced-Alignment Measure for Binary Foreground Map Evaluation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018.

Figure 1. (a) Schematic of the proposed PESODNet; (b) structure of MACA; and (c) structure of DSSU.

Figure 2. Experimental setup for multi-target underwater SOD based on polarization imaging.

Figure 3. Intensity (S0), AoP, DoLP, and ground truth (GT) images from the multi-target underwater polarization SOD dataset at different turbidity levels: (a) 0 NTU; (b) 36 NTU; and (c) 72 NTU.

Figure 4. Comparison of the detection results of different methods at three underwater turbidity levels: (a) 0 NTU; (b) 36 NTU; and (c) 72 NTU.

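Table 1. Performance of the different methods on multiple metrics at three turbidity levels (MAE: mean absolute error, lower is better; S, F, and E: S-measure, F-measure, and E-measure, higher is better).

Turbidity	0 NTU				36 NTU				72 NTU
Method	MAE	S	F	E	MAE	S	F	E	MAE	S	F	E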
FCN	0.092	0.715	0.675	0.825	0.098	0.695	0.648	0.802	0.115	0.645	0.595	0.785
UNet	0.072	0.825	0.805	0.898	0.078	0.792	0.767	0.872	0.095	0.725	0.700	0.825
Baseline	0.063	0.815	0.795	0.895	0.068	0.782	0.752	0.872	0.088	0.715	0.675	0.825
PES-MD	0.037	0.912	0.902	0.945	0.041	0.902	0.892	0.932	0.050	0.875	0.855	0.915
PES-D	0.031	0.928	0.918	0.958	0.037	0.912	0.902	0.942	0.045	0.885	0.865	0.925
PES-M	0.029	0.932	0.922	0.962	0.034	0.922	0.912	0.952	0.042	0.895	0.875	0.935
PESODNet	0.014	0.965	0.955	0.982	0.017	0.952	0.942	0.972	0.022	0.935	0.925	0.965
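
For readers unfamiliar with the MAE and F columns of Table 1, the following is a minimal sketch of how these two metrics are commonly computed for a saliency map. It assumes the usual definitions: MAE as the mean absolute pixel difference, and the F-measure with a weighting of 0.3 and a fixed binarization threshold of 0.5. The function names, the threshold, the weighting, and the toy values are illustrative assumptions, not the evaluation code or settings used for Table 1.

```python
def mean_absolute_error(pred, gt):
    """Mean absolute difference between a saliency map and its ground truth.

    Both inputs are 2-D lists of equal shape with values in [0, 1].
    """
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of a thresholded saliency map against a binary ground truth.

    threshold and beta_sq are assumed values, not settings from the paper.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x3 saliency map and mask (made-up values, not data from the paper).
pred = [[0.9, 0.8, 0.1],
        [0.2, 0.7, 0.0]]
gt = [[1, 1, 0],
      [0, 0, 0]]

mae = mean_absolute_error(pred, gt)
fm = f_measure(pred, gt)
print(f"MAE = {mae:.3f}, F = {fm:.3f}")
```

In this toy case one background pixel is predicted as salient, so precision falls below 1 while recall stays at 1; MAE additionally penalizes the soft errors on the correctly classified pixels.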

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
