3.1. Experimental Setup
This experiment uses a MER2-550-POL polarization camera equipped with a global-shutter Sony IMX264MZR CMOS sensor. The camera simultaneously captures polarization images at 0°, 45°, 90°, and 135° with a resolution of 2448 × 2048 pixels. Image data are transferred over a USB 3.0 interface, and the third-party Galaxy SDK is used to configure the camera and acquire the four angle images.
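For reference, the intensity and DOLP images used throughout this section are conventionally derived from the four angle images through the linear Stokes parameters. The sketch below assumes the four images are already co-registered floating-point arrays of equal shape (for example, as exported through the SDK) and uses the textbook definitions S0 = (I0 + I45 + I90 + I135)/2, S1 = I0 − I90, S2 = I45 − I135, and DOLP = √(S1² + S2²)/S0. The function name is illustrative, and the paper's own preprocessing (demosaicing, normalization) may differ.

```python
import numpy as np


def stokes_and_dolp(i0, i45, i90, i135, eps=1e-8):
    """Return (S0, S1, S2, DOLP) from four polarizer-angle images.

    Textbook linear-Stokes definitions; assumes co-registered images of equal shape.
    """
    i0, i45, i90, i135 = (np.asarray(a, dtype=np.float64) for a in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # +45° vs. -45° component
    # Guard against division by zero in dark pixels, then clip small numeric overshoots.
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
    return s0, s1, s2, np.clip(dolp, 0.0, 1.0)


if __name__ == "__main__":
    ones = np.ones((2, 2))
    # Fully polarized at 0 degrees: Malus's law gives I0=I, I45=I135=I/2, I90=0.
    _, _, _, dolp = stokes_and_dolp(100 * ones, 50 * ones, 0 * ones, 50 * ones)
    print(dolp)  # all ones
```

For fully linearly polarized light the DOLP is 1; for unpolarized light (all four angle images equal) it is 0.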
To validate the proposed fusion method, we used self-collected images together with two public underwater image datasets, U2PNet [35] and UPBD [36], as data sources. Eight existing methods were selected for comparison; they are listed in Table 1. Methods 1 and 2 are polarization image fusion methods designed for general scenes; method 3 is an underwater polarization image fusion method; methods 4, 5, and 6 are underwater visible-light enhancement methods; and methods 7 and 8 (PAPIF and CPIFuse) are state-of-the-art deep-learning-based polarization image fusion networks, included to evaluate our method against contemporary data-driven architectures. Following references [37,38,39,40], five metrics covering four dimensions (information content, clarity, structure, and fidelity) were adopted. Entropy (EN) evaluates the information richness of the fused image; standard deviation (SD) measures its overall contrast; average gradient (AG) and edge intensity (EI) quantify detail sharpness and edge-feature preservation, respectively; and spatial frequency (SF) reflects the overall detail activity of the image.
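Exact formulas for these metrics vary slightly across the fusion literature, and the definitions from references [37,38,39,40] are not reproduced in this section. The following NumPy sketch uses common textbook forms as an assumption: EN from a 256-bin gray-level histogram, SD as the pixel standard deviation, AG from forward differences, EI as the mean Sobel gradient magnitude, and SF from row and column first differences. All function names are illustrative.

```python
import numpy as np


def entropy(f):
    """EN: Shannon entropy (bits) of a 256-bin gray-level histogram (8-bit range assumed)."""
    hist, _ = np.histogram(np.asarray(f, dtype=np.float64), bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def std_dev(f):
    """SD: standard deviation of pixel values (global contrast)."""
    return float(np.std(np.asarray(f, dtype=np.float64)))


def average_gradient(f):
    """AG: mean of sqrt((d_row^2 + d_col^2) / 2) over forward differences."""
    f = np.asarray(f, dtype=np.float64)
    d_row = f[1:, :-1] - f[:-1, :-1]
    d_col = f[:-1, 1:] - f[:-1, :-1]
    return float(np.mean(np.sqrt((d_row ** 2 + d_col ** 2) / 2.0)))


def edge_intensity(f):
    """EI: mean Sobel gradient magnitude (edge-replicated borders)."""
    p = np.pad(np.asarray(f, dtype=np.float64), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return float(np.mean(np.hypot(gx, gy)))


def spatial_frequency(f):
    """SF: sqrt(RF^2 + CF^2) from row and column first differences."""
    f = np.asarray(f, dtype=np.float64)
    rf = np.sqrt(np.mean((f[:, 1:] - f[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((f[1:, :] - f[:-1, :]) ** 2))
    return float(np.hypot(rf, cf))


def evaluate(f):
    """Return all five metrics for a single-channel fused image."""
    return {"EN": entropy(f), "SD": std_dev(f), "AG": average_gradient(f),
            "EI": edge_intensity(f), "SF": spatial_frequency(f)}
```

Normalization conventions (for example, averaging over M·N versus (M−1)(N−1) pixels) change the absolute values, so numbers from this sketch are not directly comparable with the tables in this paper.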
To ensure methodological transparency and reproducibility, the strategies used to determine all key hyperparameters are summarized in Table 2. Rather than relying on fixed empirical constants, our framework determines these parameters either through mathematical derivation (e.g., the
rule) or through quantitative ablation studies. This parameter selection strategy is intended to balance detail enhancement against noise suppression across diverse underwater scenarios.
3.2. Experimental Results and Analysis
To address the low contrast and blurred details caused by scattering, refraction, and related effects in underwater imaging, we selected a steel ruler, stones, plastic bottles, agate, and algae balls as test objects, representing metal materials, rough minerals, plastic waste, ceramics, and biological attachments, respectively. In each scene, (a,b) denote the intensity image and the degree of linear polarization (DOLP) image; (c–h) correspond to the fused images obtained by the six reference algorithms (methods 1 to 6); (i,j) correspond to the two deep-learning baselines (PAPIF and CPIFuse); and (k) represents the fused image generated by the proposed algorithm.
In Figure 5, the intensity image is too dark overall, and its edge details are indistinct. Fusion with the DOLP image yields a result with more balanced detail and brightness. Images (c,d,h) are too dark overall and fail to capture fine details; image (f) exhibits severe distortion; image (e) is noticeably degraded and blurred; and image (g) increases brightness but tends to over-amplify background noise. Among the deep-learning methods, PAPIF (i) extracts the salient polarization edges but also amplifies background granular noise, exhibiting typical domain-shift artifacts in degraded underwater scenes. CPIFuse (j) produces a visually soft result that fails to transfer the high-frequency polarization textures of the steel ruler, leaving its edge details blurred. In contrast, the fused image generated by the proposed algorithm (k) achieves the best balance: it preserves the sharp polarization characteristics of the steel ruler while suppressing background speckle noise, yielding clear, natural, and artifact-free details.
In Figure 6, images (c,d,h) suffer from low overall contrast, blurred object edges, and substantial loss of detail. Image (e) exhibits severe processing artifacts along with local overexposure. Image (f) introduces noticeable structural distortion during fusion. Image (g) improves the contrast of the target region to some extent, but the background still appears relatively flat. Among the deep-learning approaches, PAPIF (i) treats the ubiquitous underwater speckle noise as salient features, leading to severe granular noise amplification across both the stones and the background. CPIFuse (j), on the other hand, over-smooths the image, losing the high-frequency crack details clearly captured in the DOLP image. In contrast, the proposed algorithm (k) achieves a better balance among luminance distribution, detail clarity, and structural integrity, highlighting the surface textures and cracks of the stones while keeping the background clean and noise-suppressed.
In Figure 7, images (c,d) appear grayish overall, with limited contrast improvement after fusion; the text edges on the bottle body are blurred, and the detail hierarchy is unclear. Image (e) is noticeably over-enhanced in brightness, leading to saturated highlights and partial loss of textual information. Image (f) is generally dark; although it suppresses some background interference, it does not fully retain the information in the target region. Image (g) enhances the surface texture and text legibility of the plastic bottle to a certain extent, but is still accompanied by strong background noise and local luminance inhomogeneity. Image (h) strongly reinforces edge information but simultaneously amplifies noise and artifacts. Among the deep-learning models, PAPIF (i) fails to distinguish salient text features from underwater scattering, severely amplifying granular speckle noise across the entire image. CPIFuse (j) suppresses noise effectively but smooths the image excessively, leaving the high-frequency text details on the plastic bottle blurred and illegible. In contrast, the proposed algorithm (k) presents the text and pattern structure on the bottle surface more clearly while suppressing underwater scattering and reflection interference, avoiding both the noise amplification and the over-smoothing seen in the data-driven approaches.
In Figure 8, images (c,d,h) have insufficient overall contrast, a dark appearance, and severe loss of detail. Image (e) is significantly degraded after processing, and its underexposure makes the scene difficult to discern. Image (f) introduces noticeable geometric distortion or structural deformation during fusion, compromising the accuracy of the object’s shape. Image (g) gives relatively better results among the traditional methods by preserving some polarization characteristics of the target. Among the deep-learning baselines, PAPIF (i) introduces prominent granular noise into the background and fails to separate the target’s textures cleanly from water scattering. CPIFuse (j) suffers from a severe loss of global contrast, producing a washed-out appearance that obscures the fine concentric banding of the agate slice. In contrast, the proposed algorithm (k) achieves the best visual quality: it effectively preserves the polarization characteristics of the target, revealing the crisp concentric textures of the agate slice while maintaining a clean, high-contrast background.
In Figure 9, images (c,d) appear dark overall, with limited contrast improvement after fusion; the edges and internal texture of the algae ball are not effectively restored. Image (e) is obviously over-enhanced, with excessively bright background areas. Image (f) introduces noticeable artifacts and structural distortion near the target edges. Image (g) enhances the brightness and contrast of the algae ball to some extent, but strong scattering interference remains in the background. Image (h) is generally too dark, so the target is once again submerged in the dark background. Among the deep-learning models, PAPIF (i) amplifies background scattering as if it were structural features, yielding a heavily noise-contaminated image with poor overall visual quality. CPIFuse (j) produces a heavily smoothed and overly dark result that fails to restore the fine, hairy surface texture of the algae ball. In contrast, the proposed algorithm (k) clearly presents the contour structure and surface texture of the algae ball, achieving the highest visual fidelity while keeping the background clean.
Visually, the proposed method delivers more stable fusion performance under complex underwater imaging conditions. It avoids excessive darkness, overexposure, and structural distortion while enhancing the contrast and detail clarity of target regions and suppressing underwater scattering and fusion artifacts. It also preserves the polarization characteristics of targets, making object contours, textures, and text more distinct and recognizable. Overall, its visual quality exceeds that of the compared methods.
To objectively evaluate the image quality obtained by each algorithm, Table 3, Table 4, Table 5, Table 6, and Table 7 present the objective evaluation metrics for the five scenes, respectively.
As these tables show, the fused images obtained by the proposed algorithm rank first or second on most evaluation metrics. To further validate the applicability of the algorithm to underwater scenes, we conducted experiments on two publicly available underwater datasets, U2PNet and UPBD, with results given in Table 8 and Table 9. Compared with algorithms 3 to 8, our algorithm consistently achieves superior metrics. The excessively high AG and SF values of algorithms 1 and 2 are attributable to severe distortion in their output images, which produces amplified edges.
To further validate the algorithm’s performance in detail recovery and noise suppression, we evaluated a standard resolution chart scene from the UPBD dataset (Figure 10). The source images have clear limitations: the low contrast of the intensity image blurs the dense micro-scales and high-frequency lines, while the degree of linear polarization (DOLP) image, despite carrying distinctive physical edge information, is severely degraded by background speckle noise that submerges the useful structural information. None of the baseline methods achieves a satisfactory balance. The fusion results of Algorithms 1 and 2 are overly dark and lack sharpness. Algorithms 3 and 4 perform poorly in noise control: the former amplifies background noise into a grainy texture, while the latter loses high-frequency details through over-smoothing. Algorithm 5 suffers from local overexposure, causing bright lines to blend together, and Algorithm 6 introduces unnatural halo artifacts and structural distortions around the geometric blocks. The deep-learning baselines likewise fail on this demanding high-frequency content. PAPIF (i) misinterprets speckle noise as features, covering the chart’s clean background with pervasive granular noise. CPIFuse (j) excessively smooths the dense lines, merging the critical micro-scales and thereby defeating the purpose of a resolution test. In contrast, the proposed algorithm (k) achieves the best visual balance. Its multiscale denoising module based on local information entropy filters out polarization noise to present a clean background while recovering and sharpening the dense high-frequency lines. The result avoids overexposure, over-smoothing, and noise amplification, demonstrating strong detail preservation and visual enhancement.
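The denoising module itself is not specified in this section. As a point of reference only, the sketch below shows one common way to compute a local information entropy map, the quantity the module is described as being based on: the gray levels inside a sliding window around each pixel are quantized into a histogram, and the Shannon entropy of that histogram is recorded. The window size, bin count, and function name are assumptions, not the paper's settings. Flat regions give low entropy, while strongly textured or noisy regions give high entropy; how the paper combines such a map across scales is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def local_entropy(img, win=7, bins=16):
    """Shannon entropy (bits) of the quantized gray levels in a win x win window.

    win must be odd; pixel values are assumed to lie in [0, 255].
    """
    img = np.asarray(img, dtype=np.float64)
    q = np.clip((img * bins / 256.0).astype(int), 0, bins - 1)  # quantize to bin indices
    r = win // 2
    qp = np.pad(q, r, mode="reflect")  # keep output the same size as the input
    n = float(win * win)
    ent = np.zeros(img.shape, dtype=np.float64)
    for b in range(bins):
        # Count pixels of bin b inside every window position.
        counts = sliding_window_view(qp == b, (win, win)).sum(axis=(-2, -1))
        p = counts / n
        nz = p > 0
        ent[nz] -= p[nz] * np.log2(p[nz])
    return ent


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smooth = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
    noisy = rng.uniform(0.0, 255.0, (64, 64))
    print(local_entropy(smooth).mean(), local_entropy(noisy).mean())
```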
To further validate detail recovery and noise suppression in a complex real-world underwater scene, we selected a complex target scene from the U2PNet scene variation set for comparative analysis, as shown in Figure 11. In the source images, the intensity image suffers from low overall contrast and distinct haziness due to underwater scattering, which blurs the complex reticular textures on the coral surface. Meanwhile, the DOLP image is severely contaminated by strong background speckle noise, which almost completely submerges its physical structural information. Among the compared methods, Algorithms 1 and 2 fail to suppress the polarization noise; their fusion results exhibit strong grayish graininess and lack sharpness. Algorithm 3 severely distorts brightness, producing massive overexposure (whitewashing) that destroys all useful structural information. Algorithm 4 over-smooths the image and loses high-frequency details during denoising, leaving the image excessively dark. Algorithm 5 exhibits obvious local overexposure, causing the textures in the brighter areas of the coral to blend together and lose depth. Algorithm 6 increases contrast but severely compresses shadow details and introduces unnatural structural artifacts around object edges. The deep-learning baselines also fail in this complex environment. PAPIF (i) amplifies the speckle noise, burying the coral under severe granular artifacts. CPIFuse (j) applies excessive smoothing, washing out the intricate reticular textures. In contrast, the proposed algorithm (k) achieves the best visual balance. Its entropy-based multiscale denoising module filters out the destructive polarization noise to maintain a smooth background and recovers and sharpens the fine textures on the coral surface without overexposure, over-smoothing, or artifacts. This demonstrates its detail preservation and visual enhancement capabilities in real-world underwater environments.
To provide an objective assessment of the algorithm’s ability to recover edge resolution and restore low-polarization targets, we used the coral reef and resolution chart data from the public U2PNet and UPBD datasets for comparative analysis. The corresponding metrics for the two images are presented in Table 10 and Table 11. Although some comparative algorithms (e.g., Algorithms 1, 2, 3, and 5) achieve abnormally high values in metrics such as average gradient (AG), edge intensity (EI), spatial frequency (SF), or information entropy (EN), this is because these methods fail to filter out the highly destructive speckle noise or introduce severe overexposure and halo artifacts during fusion. Gradient- and frequency-based formulas register these noise particles and unnatural pixel fluctuations as “rich edge details,” artificially inflating the objective scores. This explains why these images have very poor subjective visual quality despite scoring well on certain objective metrics. In contrast, by eliminating background noise and avoiding overexposure and artifacts, the proposed algorithm achieves balanced objective metrics that reflect genuine structure, consistent with both physical structure recovery and human visual perception.
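The inflation effect described above can be reproduced on synthetic data. In this toy sketch (not the paper's data or code), a smooth intensity ramp is corrupted with additive Gaussian noise as a stand-in for speckle. Because AG and SF only measure the magnitude of pixel-to-pixel differences, both rise sharply even though no real structure has been added. The metric definitions are common textbook forms (an assumption), and the noise level, image size, and function names are arbitrary choices for illustration.

```python
import numpy as np


def average_gradient(f):
    """AG: mean of sqrt((d_row^2 + d_col^2) / 2) over forward differences."""
    f = np.asarray(f, dtype=np.float64)
    d_row = f[1:, :-1] - f[:-1, :-1]
    d_col = f[:-1, 1:] - f[:-1, :-1]
    return float(np.mean(np.sqrt((d_row ** 2 + d_col ** 2) / 2.0)))


def spatial_frequency(f):
    """SF: sqrt(RF^2 + CF^2) from row and column first differences."""
    f = np.asarray(f, dtype=np.float64)
    rf = np.sqrt(np.mean((f[:, 1:] - f[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((f[1:, :] - f[:-1, :]) ** 2))
    return float(np.hypot(rf, cf))


def noise_inflation_demo(noise_std=10.0, size=64, seed=0):
    """Compare AG/SF of a smooth ramp before and after adding synthetic noise."""
    clean = np.tile(np.linspace(0.0, 255.0, size), (size, 1))  # smooth horizontal ramp
    rng = np.random.default_rng(seed)
    noisy = np.clip(clean + rng.normal(0.0, noise_std, clean.shape), 0.0, 255.0)
    return {
        "AG": (average_gradient(clean), average_gradient(noisy)),
        "SF": (spatial_frequency(clean), spatial_frequency(noisy)),
    }


if __name__ == "__main__":
    for name, (c, n) in noise_inflation_demo().items():
        print(f"{name}: clean={c:.2f}, noisy={n:.2f}")
```

The noisy image contains strictly less meaningful structure than the clean ramp, yet it scores higher on both metrics, which is why these scores need to be read alongside visual inspection.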
Experimental results indicate that the proposed algorithm effectively enhances the detail clarity and edge preservation of fused images across different scenarios while retaining their overall structural characteristics. It achieves superior performance in both subjective visual quality and objective evaluation metrics.
3.4. Ablation Study and Parameter Analysis
To comprehensively evaluate the design choices within our proposed framework, this section presents a two-part ablation study. First, we perform a quantitative parameter analysis to determine the optimal values for the filtering scale and enhancement coefficients. Second, we conduct a module-level structural ablation to validate the indispensability of our core architectural innovations.
3.4.1. Parameter Optimization Analysis
To justify the empirical parameters used in our framework, we conducted a quantitative ablation study on two critical variables: the standard deviation of the Gaussian low-pass filter and the detail enhancement coefficient. We used the representative “Underwater Agate Slice” scene (Scene 4) for this analysis and tracked five objective metrics: information entropy (EN), standard deviation (SD), average gradient (AG), edge intensity (EI), and spatial frequency (SF). The quantitative results are summarized in Table 13 and Table 14.
Selection of the Gaussian filter standard deviation: With the detail enhancement coefficient held fixed, increasing the standard deviation from 1.0 to 2.0 yields significant structural gains, improving AG by 3.14%, EI by 3.01%, and SF by 5.31%. This indicates that a sufficiently large standard deviation effectively isolates granular polarization speckle noise into the high-frequency band. However, as the standard deviation increases further to 2.5 and 3.0, the improvement in structural metrics enters a stage of diminishing returns, while information entropy (EN) decreases monotonically (from 7.4804 to 7.4647). Larger values also require larger Gaussian kernels, which directly increases computational cost. A standard deviation of 2.0 therefore serves as the inflection point that balances structural enhancement, information fidelity, and computational overhead.
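The computational side of this trade-off can be made concrete. The sketch below is an assumption-based illustration, not the paper's implementation: it splits an image into a Gaussian low-frequency layer and a residual high-frequency layer, truncating the kernel at 3σ, where σ is the Gaussian standard deviation (a common but not universal choice; SciPy's `gaussian_filter`, for instance, truncates at 4σ by default). Under this truncation the kernel grows from 7 taps at σ = 1.0 to 13, 17, and 19 taps at σ = 2.0, 2.5, and 3.0, so the per-pixel cost of the separable filter grows roughly linearly with σ. A helper for the percentage changes quoted above is included; all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalized 1-D Gaussian kernel with radius ceil(truncate * sigma)."""
    radius = int(np.ceil(truncate * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def _conv_valid(v, k):
    return np.convolve(v, k, mode="valid")


def gaussian_lowpass(img, sigma):
    """Separable Gaussian blur with reflect padding; output has the input's shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    p = np.pad(np.asarray(img, dtype=np.float64), r, mode="reflect")
    rows = np.apply_along_axis(_conv_valid, 1, p, k)     # blur along x
    return np.apply_along_axis(_conv_valid, 0, rows, k)  # blur along y


def decompose(img, sigma=2.0):
    """Split img into a low-frequency base and a high-frequency residual."""
    low = gaussian_lowpass(img, sigma)
    high = np.asarray(img, dtype=np.float64) - low
    return low, high


def relative_change(old, new):
    """Percentage change, as used for the metric gains in the parameter study."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    for s in (1.0, 2.0, 2.5, 3.0):
        print(s, len(gaussian_kernel_1d(s)))  # 7, 13, 17, 19 taps
```

By construction the two layers sum back to the input, so the choice of σ only moves content between the base and the residual; it does not discard information at this stage.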
Selection of the detail enhancement coefficient: This coefficient controls how strongly DOLP details are injected into the low-frequency component. With the Gaussian standard deviation held fixed, the most aggressive setting yields the maximum AG, EI, and SF, but also the lowest EN and the highest SD, indicating overly aggressive enhancement of local fluctuations that indiscriminately amplifies fragmented textures and noise. At the other end of the range, increasing the coefficient to 1.0 causes a noticeable drop in AG, EI, and SF, making the image visually softer and weaker in detail. The adopted default setting is a robust middle ground: structurally, it remains close to the most aggressive setting (AG and EI decrease only slightly, and SF decreases by only 2.34%), while keeping EN and SD more stable. This moderate injection strength provides good detail recovery without pushing the image into a noisy, over-sharpened state, which supports stable performance across scenarios.
3.4.2. Structural Ablation of Key Modules
To broaden the ablation study and validate the necessity of the proposed architectural components, we conducted an additional module-level structural ablation. We compared the full framework against two structural variants on an additional underwater steel ruler scene, distinct from the scenes evaluated in previous sections. This scene, characterized by severe speckle noise and complex edge structures, was selected to isolate and visualize the individual contributions of the noise suppression and artifact avoidance mechanisms. The variants are: (1) Removed IEB-MSD: the Information Entropy-Based Multiscale Denoising (IEB-MSD) module is replaced by bilateral filtering, disabling entropy-guided noise discrimination. (2) Removed Soft-Mask: the low-frequency structure-guided soft mask in the high-frequency fusion stage is removed, and a conventional blind “take-the-maximum-absolute-value” fusion rule is applied instead.
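To illustrate the difference between the two high-frequency fusion rules compared in variant (2), the sketch below contrasts the conventional take-the-maximum-absolute-value rule with a soft, mask-weighted blend. This section does not specify how the paper's structure-guided soft mask is computed, so `structure_mask` here is a hypothetical stand-in (the normalized gradient magnitude of the low-frequency layer); only the max-abs rule is a standard, well-defined operation. In a flat region, an isolated noise spike in the DOLP detail layer wins the max-abs comparison, whereas a mask near zero keeps it out of the fused result.

```python
import numpy as np


def fuse_max_abs(h_int, h_dolp):
    """Conventional rule: keep whichever high-frequency coefficient is larger in magnitude."""
    return np.where(np.abs(h_int) >= np.abs(h_dolp), h_int, h_dolp)


def structure_mask(low, eps=1e-8):
    """HYPOTHETICAL soft mask in [0, 1]: normalized gradient magnitude of the low-frequency layer."""
    gy, gx = np.gradient(np.asarray(low, dtype=np.float64))
    g = np.hypot(gx, gy)
    return g / (g.max() + eps)


def fuse_soft(h_int, h_dolp, mask):
    """Soft blend: mask weights the DOLP detail, (1 - mask) weights the intensity detail."""
    m = np.clip(mask, 0.0, 1.0)
    return (1.0 - m) * h_int + m * h_dolp


if __name__ == "__main__":
    h_int = np.zeros((3, 3))
    h_dolp = np.zeros((3, 3))
    h_dolp[1, 1] = 50.0                       # isolated speckle-like spike
    flat_low = np.full((3, 3), 120.0)         # flat low-frequency structure
    print(fuse_max_abs(h_int, h_dolp)[1, 1])  # 50.0: the spike survives
    print(fuse_soft(h_int, h_dolp, structure_mask(flat_low))[1, 1])  # 0.0: suppressed
```

The hard rule decides per pixel on magnitude alone, so any large noise coefficient is always selected; a soft weight tied to structure lets low-structure regions fall back to the intensity detail instead.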
The quantitative metrics for these variants are summarized in Table 15, and the corresponding visual results are presented in Figure 12.
As demonstrated by the visual and quantitative results, removing the IEB-MSD module leads to a severe loss of high-frequency details. Traditional filters fail to distinguish between valid polarization textures and noise, resulting in a blurry fused image with the lowest Average Gradient (AG = 31.5024) and Spatial Frequency (SF = 36.6565).
Conversely, the “Removed Soft-Mask” variant exposes the limitations of blind fusion. Although this variant yields abnormally high objective metrics (e.g., AG = 79.2538, EI = 349.4619), visual inspection shows that these inflated scores are driven by severe amplification of background granular noise and unnatural edge halo artifacts, which gradient-based metrics register as high gradients. Without the structural guidance of the soft mask, conflicting high-frequency components clash, severely degrading visual naturalness.
The full proposed architecture successfully bridges this gap. It effectively suppresses artifact inflation while maintaining genuine structural sharpness, ensuring that robust noise suppression and precise, artifact-free detail restoration are achieved simultaneously.