Article

Semi-Supervised Underwater Image Enhancement Method Using Multimodal Features and Dynamic Quality Repository

1 South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences/Key Laboratory for Sustainable Utilization of Open-Sea Fishery, Ministry of Agriculture and Rural Affairs, Guangzhou 510300, China
2 College of Engineering Science and Technology, Shanghai Ocean University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(6), 1195; https://doi.org/10.3390/jmse13061195
Submission received: 27 May 2025 / Revised: 15 June 2025 / Accepted: 17 June 2025 / Published: 19 June 2025

Abstract

Clear underwater images are crucial for smart aquaculture, so degraded underwater images must be restored. Although underwater image restoration techniques have achieved remarkable results in recent years, the scarcity of labeled data poses a significant challenge to continued advancement. Semi-supervised learning is well suited to exploiting unlabeled data. In this study, we proposed a semi-supervised underwater image enhancement method, MCR-UIE, which utilized multimodal contrastive learning and a dynamic quality reliability repository to leverage unlabeled data during training. The approach applied multimodal feature contrast regularization to prevent overfitting to incorrect labels, and introduced a dynamic quality reliability repository that retained the teacher's best outputs as pseudo ground truth, thereby improving the robustness and generalization of the model in pseudo-label generation and unlabeled data learning. Extensive experiments conducted on the UIEB and LSUI datasets demonstrated that the proposed method consistently outperformed existing traditional and deep learning-based approaches in both quantitative and qualitative evaluations. Furthermore, its successful application to images captured from deep-sea cage aquaculture environments validated its practical value. These results indicated that MCR-UIE held strong potential for real-world deployment in intelligent monitoring and visual perception tasks in complex underwater scenarios.

1. Introduction

Driven by the expansion of marine resource exploitation and the advancement of oceanographic research, underwater images have progressively gained importance in fields such as marine surveillance, underwater archeology, and aquaculture. Nevertheless, the unique optical properties of underwater environments, such as light absorption, scattering, and refraction, often lead to severe image degradation, including color distortion, low contrast, and detail blurring. These degradations significantly impair both the visual perception and the effectiveness of subsequent image analysis. Underwater image enhancement seeks to address these challenges by improving visual quality, thereby facilitating human interpretation and automated processing [1,2]. Through enhancement processing, the sharpness, contrast, and color fidelity of images can be increased, which plays a vital role in tasks encompassing underwater target detection, recognition, and tracking.
Researchers have explored various enhancement techniques, each with its own advantages and limitations. Currently, underwater image enhancement approaches are typically classified into three main categories: physical model-based approaches [3,4], non-physical model-based approaches [5,6], and deep learning-based approaches [7,8,9].
Physical model-based: These methods describe the degradation process of underwater images through mathematical deduction, which mainly involves establishing an underwater optical imaging model, estimating the model parameters, and restoring the underwater image by inverting the model. The physical parameters are obtained in two ways: extraction with polarization imaging equipment or derivation from prior knowledge. However, because the parameter estimation process can hardly account for all water environments and shooting conditions, these algorithms lack generality.
Non-physical model-based: These approaches directly adjust the pixel values of the entire image by constructing functions that modify color and contrast, thereby enhancing subjective visual perception. They are relatively simple, have low computational complexity, and are easy to implement and apply. However, because they ignore the physical imaging process and optical properties, they adapt poorly to changing underwater environments and scenes; noise and color deviation are easily introduced during processing, and problems such as vignetting and oversaturation can degrade image quality and clarity.
Underwater image enhancement methods grounded in physical models must be evaluated and tuned for specific conditions in practical applications, which limits their robustness. Approaches grounded in non-physical models likewise suffer from poor robustness, limited processing effects, and the introduction of artifacts and noise, and are best regarded as auxiliary methods. With the continuous development of image processing technology, deep learning-based approaches have gradually addressed some of the shortcomings of traditional methods.
Despite their respective advantages, both physical and non-physical methods suffer from generalization issues when applied to real-world underwater environments. Recent advances in computer vision have spurred the development of deep learning-based methods [10,11,12], which leverage powerful neural architectures, including convolutional neural networks (CNNs), generative adversarial networks (GANs), and autoencoders, to learn mappings from degraded to enhanced images. These approaches have shown promising results, especially when trained on large annotated datasets. However, their performance is highly dependent on the availability of labeled data, which is costly and labor-intensive to obtain in underwater domains.
In contrast, while unlabeled underwater images are relatively easy to collect, the primary challenge lies in how to utilize them effectively to train robust and generalizable models [13]. Existing supervised underwater image enhancement (UIE) methods heavily depend on paired data or high-quality reference images, which are difficult to obtain in real-world underwater environments. Moreover, conventional semi-supervised learning methods, although promising, often fall short in underwater applications due to their reliance on static pseudo-labels and heuristic confidence mechanisms that cannot adapt to sample-wise quality variations or unknown degradations.
To address these limitations, we introduce a semi-supervised learning framework specifically tailored for underwater image enhancement, aiming to improve the model’s generalization to diverse and unseen real-world underwater scenarios. Our method is built upon the mean teacher paradigm [14,15], which leverages an exponential moving average (EMA) of the student model to form the teacher network. The teacher provides pseudo-labels for the unlabeled data, and a consistency loss is used to guide the student’s training, enabling the model to benefit from both labeled and unlabeled samples.
However, directly applying the mean teacher method to underwater image enhancement poses several critical challenges. (1) The teacher model, especially in the early training stages, is not guaranteed to outperform the student, leading to unreliable pseudo-labels that can misguide the student and hinder convergence. (2) The use of a conventional pixel-wise consistency loss (typically L1 loss) can be overly strict, causing the model to overfit noisy pseudo-labels and suffer from confirmation bias. These issues highlight a scientific gap: understanding how to integrate pseudo-label selection with image quality awareness in a dynamically evolving underwater learning scenario.
To this end, we propose a novel dynamic quality reliability repository (DQR), which continuously tracks and stores high-quality outputs from the teacher model using an NR-IQA metric (MUSIQ). This allows the student to be guided by only the most reliable pseudo-labels, effectively filtering out noisy supervision and stabilizing the semi-supervised process. Furthermore, to alleviate overfitting and enforce a more flexible learning objective, we introduce a multimodal contrastive loss that leverages complementary modality cues, such as VGG features, edge information, color distributions, and local texture regions, to provide gradient-level supervision. This auxiliary contrastive regularization improves representation robustness and is especially beneficial when working with unlabeled, degraded underwater images.
Taken together, our proposed approach directly addresses the shortcomings of previous semi-supervised UIE methods by integrating dynamic reliability assessment and multimodal regularization. It offers a principled strategy to fully exploit large-scale unlabeled underwater data, filling an important methodological gap and improving model generalization across a wide range of underwater environments.
The primary contributions of this work are summarized as follows: (1) We proposed MCR-UIE, a semi-supervised underwater image enhancement framework that leveraged multimodal loss and a dynamic quality reliability repository to effectively utilize unlabeled data, thereby improving the generalization capability of the trained model on real-world images. (2) To ensure the reliability of the pseudo-labels, we constructed a dynamic quality reliability repository that archived the best outputs produced by the teacher model. (3) We adopted multimodal contrastive loss as a regularization technique to alleviate confirmation bias during training. (4) Extensive experimental results demonstrated the effectiveness and robustness of the proposed approach.
The remainder of this manuscript is organized as follows: Section 2 reviews the related work. In Section 3, we introduce the proposed semi-supervised underwater image enhancement method, which incorporates multimodal contrastive loss and the dynamic quality reliability repository. Section 4 presents the enhanced experimental and analytical results. Finally, the conclusions are summarized in Section 5.

2. Related Works

2.1. Underwater Image Enhancement Methods

Traditional underwater image enhancement approaches are generally categorized into physical and non-physical model-based methods. Physical model-based approaches [3,4] aim to describe the image degradation process by estimating unknown parameters in underwater imaging models. These parameters typically include the transmission map and ambient light, which are derived using handcrafted priors and assumptions based on optical principles. In contrast, non-physical model-based approaches directly improve image quality by adjusting pixel intensities or contrast through algorithmic design. Typical techniques include CLAHE [6], Retinex [5], Fusion [16], and MMLE [17]. Although these traditional methods have achieved reasonable performance in relatively simple underwater scenes, they often struggle to handle complex and dynamic real-world environments. Their limitations become evident when facing the varying lighting conditions, turbidity levels, and color distortions that are common in practical underwater applications.
Early deep learning-based underwater image enhancement approaches [18,19] commonly relied on physical imaging models. These methods typically trained neural networks to estimate parameters such as transmission maps and ambient light. However, the challenge of accurately estimating these parameters often led to suboptimal restoration performance, particularly in complex underwater environments.
To overcome these limitations, recent research has shifted towards purely data-driven approaches that dispense with explicit imaging models. These approaches aim to learn a direct mapping from degraded to enhanced images using supervised learning on paired datasets. For example, some frameworks employ feature fusion strategies that integrate outputs from multiple traditional enhancement methods to guide the restoration process [20]. Others incorporate prior-inspired modules, such as spatial encoders and transmission-guided decoders, to refine structural and color representations [21]. Additionally, GAN-based architectures have also been introduced to achieve efficient, real-time image enhancement [22].

2.2. Semi-Supervised Approaches

In recent times, semi-supervised learning has become an effective strategy in computer vision by enabling the joint utilization of labeled and unlabeled data. Several representative approaches have been proposed, including mean teacher [14], virtual adversarial learning [23], and Fixmatch [24]. Among these approaches, the mean teacher method, grounded in consistency regularization, has shown remarkable effectiveness in image classification tasks. Its effectiveness has subsequently inspired its application in other areas such as semantic segmentation [25] and image restoration [26].
Despite the increasing popularity of semi-supervised learning in various vision-related tasks, its potential in underwater image restoration remains largely unexplored. A preliminary study [27] attempted to apply a semi-supervised strategy by jointly optimizing supervised and unsupervised losses within a single network. Building upon this idea, our work proposes a more systematic framework that incorporates several key components tailored for underwater scenarios. Specifically, we adopt the mean teacher mechanism and further enhance it with a dynamic quality reliability repository to filter pseudo-labels, as well as a multimodal contrastive loss that promotes better feature representation and mitigates confirmation bias.

2.3. Contrastive Learning

Contrastive learning represents a powerful paradigm in self-supervised representation learning [28]. It facilitates the acquisition of meaningful visual features by enforcing similarity between semantically related samples while pushing apart dissimilar ones. In the domain of image restoration, previous studies primarily focus on constructing contrastive pairs and designing appropriate feature projections. For instance, some approaches [29,30] consider clean images as positive samples and degraded ones as negatives, projecting them into a learned embedding space using networks such as VGG [31]. However, these implementations typically rely on paired ground truth and apply the contrastive loss in a supervised manner, limiting their applicability to unlabeled data.
To date, contrastive learning has seen limited use in underwater image restoration. A prior work [32] incorporated contrastive loss as a regularization term to improve performance within a supervised learning framework, but it still depended on labeled data. In contrast, this study presents a systematic approach to utilizing unlabeled data through multimodal contrastive learning. By designing contrastive objectives that leverage information from multiple modalities, our method enables the network to learn more robust features without requiring ground truth, thereby enhancing its generalization to complex real-world underwater scenes.

3. Methods

3.1. The Network Structure of MCR-UIE

Semi-supervised learning is intended to leverage both labeled and unlabeled data to improve model generalization and learning efficiency. In the context of underwater image restoration, we formally define the problem as follows: Let the labeled dataset be denoted as $D_L = \{(x_i^l, y_i^l) \mid x_i^l \in I_l^{LQ},\, y_i^l \in I_l^{HQ}\}_{i=1}^{N}$, where $x_i^l$ and $y_i^l$ represent the degraded underwater image and its corresponding clean ground truth, respectively, sampled from the low-quality set $I_l^{LQ}$ and the high-quality set $I_l^{HQ}$. Similarly, the unlabeled dataset is defined as $D_U = \{x_i^u \mid x_i^u \in I_u^{LQ}\}_{i=1}^{M}$, where each $x_i^u$ is an underwater image from the unlabeled degraded set $I_u^{LQ}$. It is important to note that the labeled and unlabeled images are disjoint, i.e., $D_L \cap D_U = \emptyset$. The overall objective is to learn a restoration mapping function over the combined dataset $D = D_L \cup D_U$, such that any degraded underwater image $x$ can be effectively transformed into its clean version $y$.
Our semi-supervised learning framework adopts the standard architecture commonly used in semi-supervised settings [14,24], as presented in Figure 1. Specifically, the proposed MCR-UIE consists of two networks with identical architectures, referred to as the teacher and student networks. The key distinction between them lies in their parameter update strategy: the student network is trained via gradient descent, while the teacher network is updated using the exponential moving average of the student's weights during training.
The teacher network's parameters, denoted as $\theta_t$, are refined using the EMA of the student network's parameters $\theta_s$, following the update rule:
$\theta_t = \lambda \theta_t + (1 - \lambda)\theta_s$
where $\lambda \in (0, 1)$ is a momentum coefficient that controls the update speed. This strategy enables the teacher model to accumulate knowledge from the student network over time, effectively aggregating its parameters after each training step. As highlighted in [33], such temporal weight averaging not only helps to stabilize the training process but also leads to better generalization performance compared to standard gradient descent.
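As a concrete illustration, the EMA update takes only a few lines of PyTorch. The following is a minimal sketch under the assumption of a generic `nn.Module` enhancement network; it is not the authors' released code, and the momentum value of 0.999 is an illustrative default.

```python
import copy
import torch
import torch.nn as nn

def make_teacher(student: nn.Module) -> nn.Module:
    """The teacher starts as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_teacher(teacher: nn.Module, student: nn.Module, momentum: float = 0.999):
    """EMA update after each training step: theta_t <- m * theta_t + (1 - m) * theta_s."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.copy_(b_s)  # copy BatchNorm running statistics and other buffers
```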
The student network’s parameters are refined via gradient descent. Typically, the optimization objective of the student model is defined by minimizing the following loss function:
$L_{total} = L_{sup} + \lambda L_{unsup}$ (2)
where $L_{sup} = \sum_{i=0}^{N} \left\| f_{\theta_s}(x_i^l) - y_i^l \right\|_1$ denotes the supervised loss, and $L_{unsup} = \sum_{i=0}^{M} \left\| f_{\theta_s}(\phi_s(x_i^u)) - f_{\theta_t}(\phi_t(x_i^u)) \right\|_1$ represents the unsupervised student–teacher consistency loss. Here, $\|\cdot\|_1$ denotes the L1 distance, and $\phi_s$ and $\phi_t$ refer to the data augmentation functions applied to the student and teacher inputs, respectively.
In principle, as the teacher network generally yields superior performance, the unsupervised loss $L_{unsup}$ provides effective guidance for training the student model on unlabeled samples. Accordingly, the teacher's output $\hat{y}_i^u = f_{\theta_t}(\phi_t(x_i^u))$ is referred to as a pseudo-label. However, it is important to note that the teacher's predictions are not always more accurate than those of the student. Inaccurate pseudo-labels may introduce noise and negatively impact the learning process of the student network.
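For reference, the baseline mean-teacher objective in Equation (2) can be sketched as follows; the network, augmentation functions, and weighting are placeholders rather than the exact training code.

```python
import torch
import torch.nn.functional as F

def mean_teacher_step(student, teacher, x_l, y_l, x_u, aug_weak, aug_strong, lam=0.2):
    """L_total = L_sup + lambda * L_unsup, both measured with the L1 distance."""
    # Supervised term on labeled pairs.
    l_sup = F.l1_loss(student(x_l), y_l)
    # Consistency term: the teacher's output on a weakly augmented view
    # acts as the pseudo-label for the student's strongly augmented view.
    with torch.no_grad():
        pseudo_label = teacher(aug_weak(x_u))
    l_unsup = F.l1_loss(student(aug_strong(x_u)), pseudo_label)
    return l_sup + lam * l_unsup
```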

3.2. Dynamic Quality and Reliability Repository

To tackle the aforementioned problem, we utilize the most confident outputs from the teacher network as pseudo-labels. Similar approaches have been employed in image classification and semantic segmentation tasks [25]; output reliability is typically assessed based on prediction entropy or confidence scores. However, directly extending these approaches to image restoration tasks is non-trivial due to several unique challenges. Specifically, as a regression problem, underwater image restoration requires the accurate recovery of fine textures and the effective removal of color casts, which cannot be reliably evaluated using classification-based confidence metrics.
To mitigate the issue of unreliable pseudo-labels, we design a reliable repository that dynamically stores the most trustworthy outputs generated by the teacher network during training. Initially, the repository $B_U$ is empty. At each iteration, we evaluate the current output of the teacher model by comparing it with both the corresponding output of the student model and the existing pseudo-label in the repository. If the teacher's current prediction exhibits better quality, it replaces the previous one in $B_U$. As training proceeds, the repository gradually accumulates a set of reliable pseudo-labels, denoted as $B_U = \{y_i^b\}_{i=1}^{M}$. In this way, we construct the updated pseudo-labeled dataset $D = D_U \cup B_U = \{x_i^u, y_i^b\}_{i=1}^{M}$, where each unlabeled image is associated with its most reliable pseudo-label. This mechanism ensures that the unsupervised consistency loss $L_{unsup}$ is computed using high-quality targets, thereby reducing the adverse effects of noisy supervision. The revised loss function can be formulated as follows:
$L_{unsup} = \sum_{i=0}^{M} \left\| f_{\theta_s}(\phi_s(x_i^u)) - y_i^b \right\|_1$
Intuitively, one might consider using non-reference image quality assessment (NR-IQA) metrics. However, as highlighted in [3,20], widely used metrics such as UCIQE [34] and UIQM [35] do not reliably capture the quality of restored underwater images. Consequently, relying on these metrics to construct our dynamic quality and reliability repository could lead to suboptimal results. To address this issue, we perform an empirical evaluation of multiple NR-IQA metrics to identify the most suitable one for assessing the quality of underwater images. We observe that the deep learning-based MUSIQ [36] metric best aligns with the monotonicity law. To justify the use of MUSIQ as the reliability criterion in our dynamic quality repository, we conduct a comparative study on seven commonly used NR-IQA methods over the EUVP benchmark, which covers a wide range of underwater scenarios. As shown in Figure 2, the evaluation highlights that deep learning-based metrics, particularly MUSIQ, exhibit better monotonicity and alignment with visual quality as compared to traditional handcrafted metrics such as BRISQUE and NIQE. MUSIQ consistently provides more stable and perceptually meaningful scores across varying underwater degradations. Based on this observation, we adopt MUSIQ to estimate the reliability of the network outputs, guiding the update of pseudo-labels in our dynamic quality reliability repository.
The construction steps of the dynamic quality reliability repository are detailed in Algorithm 1, with corresponding explanations provided for each step. (1) Obtain the predictions of the teacher and the student: we compute the predictions of the teacher and student models for unlabeled samples; the teacher's predictions generate candidate pseudo-labels, while the student's predictions are used for comparative judgment. (2) Segment local areas: we divide each prediction into multiple small blocks so that image quality is evaluated more finely, avoiding cases where a poor local region skews the overall score. (3) Calculate local quality scores and entropy metrics: we compute the NR-IQA score of each local area, use an entropy metric to measure prediction uncertainty, and then combine the local scores into a weighted global score; since smaller entropy indicates a more confident prediction, the entropy term is given a negative weight. (4) Update the reliable sample repository: if the teacher model's quality score is higher than both the student's score and that of the existing pseudo-label in the repository, the pseudo-label is considered more reliable and replaces the stored entry.
We adopt an online update mechanism for the dynamic quality reliability repository, where the repository is dynamically refreshed during each training iteration. For every unlabeled input image, we generate enhancement results from both the teacher and student branches and assess their quality using a composite reliability score that integrates the MUSIQ score and entropy-based confidence. A new prediction from the teacher branch is allowed to replace an existing sample in the repository only when it achieves a higher quality score than both the corresponding student output and the current repository entry. This selective replacement ensures that only more reliable and higher quality pseudo-labels are preserved, allowing the DQR to evolve towards greater consistency and trustworthiness over time.
To ensure stable convergence in this feedback-based design, we apply a progressive warm-up strategy to the consistency regularization coefficient ρ , which gradually increases its influence throughout training. In the early stages, when pseudo-labels may still be noisy, this scheduling helps prevent unstable updates. As the network matures, fewer replacements occur, and the repository contents stabilize, thereby promoting convergence. This interplay between dynamic updating and progressive regularization contributes to the robustness and effectiveness of our semi-supervised enhancement framework.
Algorithm 1: Update of dynamic quality reliability repository
Require: NR-IQA method $\Psi(\cdot)$, entropy metric $H(\cdot)$, local region split function $split(\cdot)$;
Initialize: $B_U = \emptyset$;
Sample a batch of unlabeled images $\{x_i^u\}_{i=1}^{b}$ from $D_U$;
for each $x_i^u$ do
  Get teacher's prediction: $\hat{y}_i^u = f_{\theta_t}(\phi_t(x_i^u))$;
  Get student's prediction: $\tilde{y}_i^u = f_{\theta_s}(\phi_s(x_i^u))$;
  Compute enhanced quality scores for $\hat{y}_i^u$, $\tilde{y}_i^u$, and $y_i^b \in B_U$;
  Split each prediction into local regions $\{r_k(\hat{y}_i^u)\}$, $\{r_k(\tilde{y}_i^u)\}$, $\{r_k(y_i^b)\}$ using $split(\cdot)$;
  Compute NR-IQA scores of each region for the teacher prediction: $z_t^k = \Psi(r_k(\hat{y}_i^u))$;
  Compute NR-IQA scores of each region for the student prediction: $z_s^k = \Psi(r_k(\tilde{y}_i^u))$;
  Compute NR-IQA scores of each region for the existing repository sample: $z_b^k = \Psi(r_k(y_i^b))$;
  Aggregate regional scores with a weighted mean to obtain global scores:
    $z_t = 0.8 \times \mathrm{mean}(\{z_t^k\}) - 0.2 \times H(\hat{y}_i^u)$
    $z_s = 0.8 \times \mathrm{mean}(\{z_s^k\}) - 0.2 \times H(\tilde{y}_i^u)$
    $z_b = 0.8 \times \mathrm{mean}(\{z_b^k\}) - 0.2 \times H(y_i^b)$
  if $z_t > z_s$ and $z_t > z_b$ then
    Replace the $y_i^b$ in $B_U$ with $\hat{y}_i^u$;
  end if
end for
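A compact Python sketch of the repository update in Algorithm 1 is given below. The 0.8/0.2 weighting, the entropy penalty, and the replacement rule follow the algorithm; the histogram-based entropy estimate, the 64 × 64 patch split, the dictionary repository, and the `musiq_score` callable (standing in for $\Psi$, e.g., a pretrained MUSIQ model) are illustrative assumptions.

```python
import torch

def entropy(img: torch.Tensor, bins: int = 64) -> float:
    """Shannon entropy of the gray-level histogram, used as an uncertainty proxy."""
    gray = img.mean(dim=0).flatten()
    hist = torch.histc(gray, bins=bins, min=0.0, max=1.0)
    p = hist / hist.sum().clamp_min(1e-8)
    return float(-(p * (p + 1e-8).log()).sum())

def reliability(img: torch.Tensor, musiq_score, patch: int = 64) -> float:
    """0.8 * mean regional NR-IQA score - 0.2 * entropy, as in Algorithm 1 (img is CHW in [0, 1])."""
    _, h, w = img.shape
    regions = [img[:, i:i + patch, j:j + patch]
               for i in range(0, h, patch) for j in range(0, w, patch)]
    local = sum(musiq_score(r) for r in regions) / len(regions)
    return 0.8 * local - 0.2 * entropy(img)

def update_repository(bank: dict, idx: int, y_teacher, y_student, musiq_score):
    """Replace the stored pseudo-label only if the teacher's output beats both
    the student output and the current repository entry."""
    z_t = reliability(y_teacher, musiq_score)
    z_s = reliability(y_student, musiq_score)
    z_b = reliability(bank[idx], musiq_score) if idx in bank else float("-inf")
    if z_t > z_s and z_t > z_b:
        bank[idx] = y_teacher.detach().cpu()
```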

3.3. Multimodal Contrastive Loss

In most cases, numerous mean teacher-based approaches utilize the L1 distance as the consistency loss, as illustrated in Equation (2). The simple consistency loss may cause the student model to overfit incorrect predictions, thereby introducing confirmation bias. To alleviate this problem, we propose the integration of contrastive loss during training. Contrastive learning, a prominent approach in self-supervised learning [28], drives the model to differentiate between positive and negative sample pairs by pulling together their representations in the former case and pushing them apart in the latter. In our context, the positive samples correspond to the pseudo-labels, while the negative samples are the corresponding degraded underwater images. However, traditional contrastive learning typically performs a global comparison of features, which may be insufficient for capturing the complex characteristics of underwater images. To address this limitation, we aim to extend it into a comprehensive contrastive loss that integrates multimodal information, multi-scale feature representations, and adaptive feature selection. This enhancement is designed to significantly improve the robustness and generalization ability of the model, particularly in the generation of pseudo-labels and the effective utilization of unlabeled data.
To optimize the robustness and generalization of the student model during unsupervised learning, we introduce a multimodal contrastive loss that integrates VGG features, edge cues, color information, and local region features. This multimodal contrastive strategy enables the model to distinguish subtle variations in the image content, thereby improving pseudo-label reliability.

3.3.1. VGG Feature Contrastive Loss

Let $a_{vgg}^i$, $p_{vgg}^i$, and $n_{vgg}^i$ represent the anchor, positive, and negative VGG features at the $i$-th layer, respectively. The distances are then computed as follows: anchor–positive distance: $d_{ap}^i = \|a_{vgg}^i - p_{vgg}^i\|_1$, and anchor–negative distance: $d_{an}^i = \|a_{vgg}^i - n_{vgg}^i\|_1$. The contrastive loss at layer $i$ is:
$L_{vgg}^i = \dfrac{d_{ap}^i}{d_{an}^i + \epsilon}$
If the negative sample is harder (i.e., $d_{an}^i < d_{ap}^i$), a hard sample weight $w_{hard}$ is applied:
$L_{vgg}^i = L_{vgg}^i \times w_{hard}$
To emphasize discriminative layers, we introduce static weights $w^i$ and dynamically computed complexity-aware weights $w_{dynamic}^i$. The final VGG-based contrastive loss is as follows:
$L_{vgg} = \sum_i w^i \times L_{vgg}^i \times w_{dynamic}^i$

3.3.2. Edge Feature Contrastive Loss

Let $a_{edge}$, $p_{edge}$, and $n_{edge}$ denote the edge features of the anchor, positive, and negative images, respectively. The loss is defined as $L_{edge} = \dfrac{\|a_{edge} - p_{edge}\|_1}{\|a_{edge} - n_{edge}\|_1 + \epsilon}$. If a hard negative is detected ($d_{an}^{edge} < d_{ap}^{edge}$), then apply $L_{edge} = L_{edge} \times w_{hard}$, and finally:
$L_{edge} = L_{edge} \times w_{dynamic}^{edge}$

3.3.3. Color Feature Contrastive Loss

Let $a_{color}$, $p_{color}$, and $n_{color}$ be the color features of the anchor, positive, and negative samples. We define $L_{color} = \dfrac{\|a_{color} - p_{color}\|_1}{\|a_{color} - n_{color}\|_1 + \epsilon}$. With hard sample adjustment ($d_{an}^{color} < d_{ap}^{color}$): $L_{color} = L_{color} \times w_{hard}$, and finally:
$L_{color} = L_{color} \times w_{dynamic}^{color}$

3.3.4. Local Region Contrastive Loss

The image is divided into four local regions. For each region $j$, the anchor, positive, and negative local features are $a_{local}^j$, $p_{local}^j$, and $n_{local}^j$. The region-wise contrastive loss is $L_{local}^j = \dfrac{\|a_{local}^j - p_{local}^j\|_1}{\|a_{local}^j - n_{local}^j\|_1 + \epsilon}$. If $d_{an}^{local,j} < d_{ap}^{local,j}$, apply $L_{local}^j = L_{local}^j \times w_{hard}$. The overall local contrastive loss is averaged:
$L_{local} = \sum_{j=1}^{4} 0.25 \times L_{local}^j$
The total loss combines all components:
$L_{mcr} = L_{vgg} + L_{edge} + L_{color} + L_{local}$
This comprehensive loss function enables robust and fine-grained representation learning from unlabeled underwater images, effectively suppressing confirmation bias and improving pseudo-label quality.
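The sketch below illustrates the structure of this loss. The ratio-form contrastive term, the hard-negative weighting, and the four modalities mirror the formulation above, while the Sobel edge operator, the channel-mean color statistics, the quadrant split, and the equal modality weights (the full method additionally applies the static and variance-based dynamic weights described next) are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_term(anchor, positive, negative, w_hard=2.0, eps=1e-7):
    """Ratio-form term: pull the output toward the pseudo-label (positive)
    and away from the degraded input (negative); up-weight hard negatives."""
    d_ap = F.l1_loss(anchor, positive)
    d_an = F.l1_loss(anchor, negative)
    loss = d_ap / (d_an + eps)
    if d_an < d_ap:  # hard negative: the degraded input is still closer than the target
        loss = loss * w_hard
    return loss

def sobel_edges(x):
    """Simple per-channel Sobel gradients as a stand-in edge descriptor."""
    k = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    kx = k.view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    ky = k.t().view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=x.size(1))
    gy = F.conv2d(x, ky, padding=1, groups=x.size(1))
    return gx.abs() + gy.abs()

def quadrants(x):
    """Split a batch of images (N, C, H, W) into four local regions."""
    h, w = x.shape[-2] // 2, x.shape[-1] // 2
    return [x[..., :h, :w], x[..., :h, w:], x[..., h:, :w], x[..., h:, w:]]

def multimodal_contrastive(output, pseudo_label, degraded, vgg):
    """L_mcr = L_vgg + L_edge + L_color + L_local."""
    # VGG modality: per-layer terms (the paper also applies static/dynamic layer weights).
    fa, fp, fn = vgg(output), vgg(pseudo_label), vgg(degraded)  # lists of feature maps
    l_vgg = sum(contrastive_term(a, p, n) for a, p, n in zip(fa, fp, fn)) / len(fa)
    # Edge modality.
    l_edge = contrastive_term(sobel_edges(output), sobel_edges(pseudo_label),
                              sobel_edges(degraded))
    # Color modality: per-channel means as a coarse color descriptor.
    l_color = contrastive_term(output.mean(dim=(2, 3)), pseudo_label.mean(dim=(2, 3)),
                               degraded.mean(dim=(2, 3)))
    # Local modality: average over the four image quadrants.
    l_local = sum(contrastive_term(a, p, n) for a, p, n in
                  zip(quadrants(output), quadrants(pseudo_label), quadrants(degraded))) / 4
    return l_vgg + l_edge + l_color + l_local
```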
To enhance the adaptiveness of our contrastive loss to varying feature complexities across different layers and modalities, we introduce a dynamic weighting scheme based on feature variance. The intuition is that feature maps with higher variance often contain richer structural or semantic information, and should contribute more significantly to the contrastive learning process.
For the $i$-th feature layer or modality, we define the dynamic weight $w_{dynamic}^i$ as in Equation (9):
$w_{dynamic}^i = \dfrac{\mathrm{Var}(f^i)}{\sum_j \mathrm{Var}(f^j)}, \quad \mathrm{Var}(f^i) = \dfrac{1}{N}\sum_{k=1}^{N}\left(f_k^i - \bar{f}^i\right)^2$ (9)
Here, $f^i$ denotes the feature map of the $i$-th layer, $f_k^i$ is the $k$-th pixel (or feature vector), $\bar{f}^i$ is the mean feature value, and $N$ is the total number of pixels in the feature map. This normalization ensures that the weights across all layers or modalities sum to 1, stabilizing training. By integrating this variance-based dynamic weighting, the model can prioritize feature levels that contain more discriminative information, leading to more effective pseudo-label learning and better generalization in complex underwater environments.
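Under the same assumptions as the previous sketch, the variance-based weighting of Equation (9) reduces to a normalization over per-layer feature variances:

```python
import torch

def dynamic_weights(features):
    """w_dynamic_i = Var(f_i) / sum_j Var(f_j): higher-variance (richer) feature
    maps receive larger weights in the contrastive objective."""
    variances = torch.stack([f.var() for f in features])
    return variances / variances.sum().clamp_min(1e-8)

# Usage with the VGG branch of the previous sketch:
# w = dynamic_weights(fa)
# l_vgg = sum(w_i * contrastive_term(a, p, n) for w_i, a, p, n in zip(w, fa, fp, fn))
```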
Building on Equation (2), the overall training objective is composed of a supervised loss and a refined unsupervised loss, as detailed below.
To optimize the effectiveness of the supervised loss beyond the standard L1 formulation presented in Equation (2), we adopt a more comprehensive objective inspired by [18], which incorporates not only the pixel-wise L1 loss but also a perceptual loss $L_{per}$ and a gradient penalty term $L_{grad}$, thereby encouraging the restoration of finer textures and more accurate structural details.
$L_{sup}' = L_{sup} + \alpha_1 L_{per} + \alpha_2 L_{grad}$
For the unsupervised component, we refine the original $L_{unsup}$ by formulating it as a combination of the proposed reliable teacher–student consistency loss and the multimodal contrastive loss, aiming to enhance the stability of pseudo-label learning and improve the model's generalization on unlabeled data.
$L_{unsup}' = L_{unsup} + \beta L_{mcr}$
Finally, the overall optimization objective is reformulated as follows, consistent with the structure of Equation (2):
$L_{overall} = L_{sup}' + \rho L_{unsup}'$
where $L_{sup}'$ denotes the enhanced supervised loss incorporating perceptual and gradient components, and $L_{unsup}'$ represents the improved unsupervised loss that combines reliable consistency and contrastive constraints.
We adopt an initial learning rate of $2 \times 10^{-4}$ and train the model for 200 epochs. The learning rate is decayed by a factor of 0.1 after 100 epochs to facilitate stable convergence. During training, all images are uniformly cropped to a 256 × 256 resolution. For data augmentation, we apply standard geometric transformations (resizing, random cropping, and rotation) to the labeled data. Regarding the unlabeled data, the teacher branch receives weakly augmented inputs (resizing only), while the student branch is exposed to strong augmentations, including resizing, color jittering, Gaussian blur, and grayscale conversion, to encourage consistency under perturbations. The loss function comprises several components, whose weights are empirically set as follows: $\alpha_1 = 0.3$, $\alpha_2 = 0.1$, and $\beta = 1$. Additionally, the consistency regularization coefficient $\rho$ is gradually increased during training using an exponential schedule [37]: $\rho_t = 0.2 \times e^{-5(1 - t/200)^2}$, where $t$ denotes the training epoch. This progressive scheduling helps stabilize training in the early stages by controlling the influence of the unsupervised loss component.
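For illustration, the ramp-up of $\rho$ amounts to the following one-line schedule; the constants are taken from the text, while the epoch indexing is an assumption.

```python
import math

def consistency_weight(epoch: int, total_epochs: int = 200, rho_max: float = 0.2) -> float:
    """rho_t = 0.2 * exp(-5 * (1 - t / total)^2): close to zero early, reaching 0.2 at the end."""
    return rho_max * math.exp(-5.0 * (1.0 - epoch / total_epochs) ** 2)

# consistency_weight(0) ~ 0.0013, consistency_weight(100) ~ 0.057, consistency_weight(200) = 0.2
```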

4. Experimental Results

4.1. Datasets and Settings

4.1.1. Software Configuration

The proposed approach was developed using the PyTorch (1.11.0 + cu113) framework and executed on an NVIDIA RTX 4090 D GPU. To accelerate convergence and minimize training duration, the AdamP optimizer [38] was employed, owing to its efficiency in reaching optimal solutions.

4.1.2. Introduction to Dataset

The training dataset comprises 1600 labeled image pairs and 1600 unlabeled images. The labeled pairs are randomly selected in an equal proportion from the dataset proposed in [39] and the UIEB dataset [20]. Specifically, ref. [39] provides a collection of synthetically generated underwater images captured in indoor scenes, while UIEB [20] contains 890 real-world underwater images accompanied by corresponding ground truth references. The unpaired subset of the EUVP benchmark [22] serves as the source of unlabeled images, which includes diverse underwater scenes with varying water types and illumination conditions. To evaluate performance, the test set incorporates both full-reference and no-reference benchmark datasets, comprising 89 images from the UIEB and 500 images from the LSUI [40].

4.1.3. Evaluation Metrics

To evaluate model performance, we utilize a set of commonly adopted image quality assessment metrics. In particular, peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root mean square error (RMSE) are selected as full-reference metrics. A higher PSNR signifies superior image fidelity, whereas SSIM values approaching 1 denote stronger structural resemblance to the ground truth. Conversely, a lower RMSE implies a reduced restoration error. Additionally, two no-reference evaluation metrics—underwater image quality measure (UIQM) and underwater color image quality evaluation (UCIQE)—are applied to assess the perceptual quality of the enhanced underwater images. Elevated UIQM and UCIQE scores reflect improved visual appeal and color restoration accuracy.
The calculation formulas for the evaluation metrics are presented as follows: The definitions of PSNR, SSIM, and RMSE are provided in Equations (13), (14), and (15), respectively.
$\mathrm{PSNR} = 10 \log_{10}\!\left(\dfrac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$ (13)
$\mathrm{SSIM} = \dfrac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$ (14)
$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\dfrac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[y_e(i,j) - y(i,j)\right]^2}$ (15)
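These full-reference metrics follow directly from the definitions above; the NumPy sketch below evaluates them for illustration. The SSIM here is computed globally with the commonly used constants $c_1 = (0.01\,\mathrm{MAX})^2$ and $c_2 = (0.03\,\mathrm{MAX})^2$ (an assumption, since the text does not specify them), whereas standard SSIM averages the same expression over local windows.

```python
import numpy as np

def rmse(y_enh: np.ndarray, y_ref: np.ndarray) -> float:
    """Equation (15): root of the mean squared error over all pixels."""
    diff = y_enh.astype(np.float64) - y_ref.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(y_enh: np.ndarray, y_ref: np.ndarray, max_val: float = 255.0) -> float:
    """Equation (13): 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((y_enh.astype(np.float64) - y_ref.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    """Equation (14) evaluated over the whole image."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```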
UIQM is a no-reference metric designed to evaluate the perceptual quality of underwater images by integrating three aspects: colorfulness, sharpness, and contrast. The overall UIQM is calculated as a weighted combination of these components, as shown in Equation (16):
$\mathrm{UIQM} = c_1 \times \mathrm{UICM} + c_2 \times \mathrm{UISM} + c_3 \times \mathrm{UIConM}$ (16)
Among them, the commonly used weight coefficients are $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$. UCIQE is a no-reference metric used to assess the perceptual quality of underwater images. It primarily considers three visual attributes: colorfulness, contrast, and saturation. The UCIQE score is computed as a weighted linear combination of the standard deviation of chroma ($\omega_c$), the contrast of luminance ($con_l$), and the mean of saturation ($\mu_s$), as defined in Equation (17):
$\mathrm{UCIQE} = c_1 \times \omega_c + c_2 \times con_l + c_3 \times \mu_s$ (17)
where $c_1 = 0.4680$, $c_2 = 0.2745$, and $c_3 = 0.2576$ are the empirically determined weighting coefficients.
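Given the component measures, both no-reference scores reduce to the weighted sums above; the sketch below assumes that UICM, UISM, UIConM and the chroma/luminance/saturation statistics are provided by existing routines.

```python
def uiqm(uicm: float, uism: float, uiconm: float) -> float:
    """Equation (16) with the commonly used weights."""
    return 0.0282 * uicm + 0.2953 * uism + 3.5753 * uiconm

def uciqe(chroma_std: float, luminance_contrast: float, saturation_mean: float) -> float:
    """Equation (17) with the empirically determined weights."""
    return 0.4680 * chroma_std + 0.2745 * luminance_contrast + 0.2576 * saturation_mean
```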

4.2. Enhanced Experiments on Public Datasets

We first conduct enhancement experiments on the UIEB test set, which contains 89 real-world underwater images, each resized to 256 × 256 pixels for evaluation. To validate the effectiveness of our approach, we compare it against several representative underwater image enhancement approaches, including both traditional and deep learning-based methods: NLD [41], CLAHE [6], DCP [42], UDCP [43], UNet [44], UWNet [45], CycleGAN [46], and FUnIE-GAN [22]. A selection of enhanced images is shown in Figure 3 for visual comparison. Specifically, (a) is the input image; (b–j) are the results of NLD, CLAHE, DCP, UDCP, UNet, UWNet, CycleGAN, FUnIE-GAN, and our MCR-UIE, respectively; and (k) is the ground truth.
As shown in Figure 3, deep learning-based approaches clearly outperform traditional enhancement approaches in terms of visual quality. This superiority can be attributed to the ability of deep learning models to automatically learn complex representations from data. These models effectively capture subtle textures, edges, and color distributions in underwater scenes. In contrast, traditional approaches typically rely on hand-crafted features and heuristic adjustments, such as enhancing brightness or contrast, which are insufficient to fully recover the degraded content and intricate details present in underwater images.
To further demonstrate the efficacy of our proposed approach, additional enhancement experiments were performed on the LSUI dataset. The LSUI test set contains 500 underwater images, each resized to 256 × 256 pixels for evaluation. A selection of enhanced images is shown in Figure 4 for visual comparison. Specifically, (a) is the input image; and (b–j) are the outputs of NLD, CLAHE, DCP, UDCP, UNet, UWNet, CycleGAN, FUnIE-GAN, and our MCR-UIE, respectively.
To better visualize the enhancement effects, a representative region was selected and zoomed in on, presented in the lower right corner of Figure 4. Upon closer inspection, it is evident that the LSUI test set more closely resembles real underwater scenes in terms of visual characteristics. As with previous results, deep learning-based approaches continue to outperform traditional approaches. Notably, the CycleGAN method produces images with relatively high color saturation, which may lead to unrealistic results. In contrast, our proposed approach achieves superior visual performance, preserving natural color tones and fine details more effectively than the other compared methods. While subjective visual comparisons provide an intuitive understanding of enhancement quality, they are insufficient for a comprehensive evaluation. Therefore, we further assess the performance using objective evaluation metrics, which include both full-reference and no-reference metrics. The final reported numbers represent the average values across the entire test set. Table 1 presents the full-reference evaluation results on the UIEB and LSUI datasets after image enhancement, where the best and second-best results are marked in red and blue, respectively.
As presented in Table 1, the proposed MCR-UIE approach significantly outperforms traditional underwater image enhancement techniques, including NLD, CLAHE, DCP, and UDCP. On the UIEB dataset, MCR-UIE improves the PSNR by 44.3% and SSIM by 20.2% compared to the best-performing traditional method (UDCP). Similarly, on the LSUI dataset, MCR-UIE achieves a 46.3% increase in PSNR and a 14.4% improvement in SSIM over UDCP. Additionally, MCR-UIE reduces the RMSE by 46.0% on UIEB and 55.4% on LSUI, demonstrating its superior ability to recover image details and reduce distortion.
Compared to deep learning-based methods which encompass UNet, UWNet, CycleGAN, and FUnIE-GAN, MCR-UIE also achieves consistent performance gains. On the UIEB dataset, MCR-UIE surpasses CycleGAN in PSNR and SSIM by 9.1% and 7.0%, respectively, while reducing RMSE by 20.3%. On the LSUI dataset, MCR-UIE improves PSNR by 11.0%, SSIM by 10.3%, and lowers RMSE by 21.9% compared to CycleGAN. These results confirm that our approach not only effectively restores underwater image quality but also exhibits a strong generalization capability across different datasets.
To provide a more intuitive comparison of performance differences among the methods, Figure 5 presents the box plots of the PSNR and RMSE metrics corresponding to the results in Table 1. These plots illustrate the distribution, central tendency, and variability of each method’s performance across the test set, enabling a clearer visual interpretation of their relative effectiveness.
Table 2 reports the results of no-reference quality evaluation metrics, including UIQM and UCIQE, on the UIEB and the LSUI datasets. These metrics are designed to assess image quality in the absence of ground truth by evaluating attributes such as colorfulness, contrast, and sharpness. From the results, we observe that deep learning-based approaches generally outperform traditional enhancement approaches. Notably, UNet achieves the highest UIQM score (3.075) on the LSUI dataset, indicating its strong ability to enhance image contrast and sharpness in certain scenes. However, its UCIQE performance remains relatively modest (0.532), suggesting potential issues in maintaining consistent color balance.
Our proposed MCR-UIE method obtains a UIQM of 2.881 on UIEB and 3.000 on LSUI, ranking among the top-performing methods across both datasets. Although its UIQM is slightly lower than that of UNet and FUnIE-GAN, MCR-UIE shows a more stable and balanced performance, with UCIQE scores of 0.606 (UIEB) and 0.572 (LSUI) that are consistently high across datasets. In contrast, some methods (e.g., CycleGAN and UWNet) exhibit strong UIQM but relatively poor UCIQE, indicating possible color over-enhancement or inconsistency.
In summary, MCR-UIE achieves a strong trade-off between sharpness, color fidelity, and contrast, producing visually pleasing results while maintaining generalization ability across different underwater environments. This is also consistent with the qualitative results shown in Figure 4. To provide a more intuitive comparison of the results presented in Table 2, Figure 6 shows a bar chart to highlight the differences in image quality enhancement across different metrics.
To comprehensively evaluate the performance of different underwater image enhancement methods, we analyze both full-reference metrics (Table 1) and no-reference metrics (Table 2) across the UIEB and LSUI datasets. From the perspective of full-reference metrics including PSNR, SSIM, and RMSE, our proposed MCR-UIE achieves the best overall performance.
In terms of no-reference evaluation metrics, MCR-UIE also performs competitively. However, it is worth noting that although FUnIE-GAN performs well on no-reference metrics, its performance on full-reference metrics (e.g., PSNR = 19.524 dB on UIEB) is relatively low, suggesting that its visual enhancement may not be structurally accurate. In contrast, MCR-UIE achieves a strong balance between full-reference and no-reference metrics, with consistently high UIQM (2.881/3.000) and UCIQE (0.606/0.572) values across both datasets. This indicates that MCR-UIE not only preserves structural fidelity but also enhances visual perception quality effectively.
Overall, the proposed MCR-UIE framework exhibits excellent generalization, stable enhancement quality, and balanced performance from both subjective and objective evaluation perspectives, surpassing traditional and recent deep learning-based underwater image enhancement methods. On widely recognized public benchmarks such as UIEB and LSUI, our method achieves notable improvements in visual clarity, color fidelity, and contrast restoration, validating its effectiveness under diverse underwater conditions and degradation types. These results confirm that the proposed multimodal contrastive regularization and dynamic quality reliability strategy are highly effective in guiding the network toward producing perceptually pleasing and semantically faithful outputs.
Given its strong performance on benchmark datasets and its robustness in real-world scenarios, MCR-UIE holds great promise for practical deployment in underwater visual applications, especially in aquaculture environments. Specifically, as detailed in Section 4.3, we conduct underwater image enhancement experiments using images collected from deep-sea aquaculture cages. The model demonstrates a strong adaptability and enhancement capability in complex oceanic environments, characterized by low visibility, high turbidity, and dynamic illumination. These experiments not only confirm the real-world utility of MCR-UIE but also pave the way for its integration into intelligent monitoring systems in aquaculture, such as fish detection, behavior analysis, and health condition assessment.

4.3. Enhanced Experiments on Deep-Sea Cage Dataset

To further validate the effectiveness and generalization ability of the proposed MCR-UIE method in real-world scenarios, we deployed our model in the context of deep-sea cage aquaculture, where underwater images are typically degraded by low illumination, high turbidity, and non-uniform color distortion caused by complex and dynamic oceanic environments.
We conducted two sets of underwater image enhancement experiments using images captured from the same large-scale deep-sea aquaculture cage on 16 April and 17 April 2025, respectively. These images were taken during the routine monitoring of Trachinotus ovatus in deep-sea cages and present significant visual challenges that conventional enhancement methods struggle to address.
The first test set, collected on 16 April, consisted of 350 underwater images mainly characterized by a bluish-green color cast. Four representative images from this test set were selected to visually compare the enhancement results before and after applying MCR-UIE, as illustrated in Figure 7. The second test set, collected on 17 April, included 380 images with a yellowish-green background. Similarly, four enhanced image samples were chosen from this set to show the improvements achieved by our model, as demonstrated in Figure 8. These color differences reflect environmental variations such as water turbidity, light penetration, and biological activity.
We employed two widely used no-reference image quality metrics, UIQM and UCIQE, to assess the visual quality of the degraded input images. The first test set yielded an average UIQM score of 1.810 and a UCIQE score of 0.456. The second test set achieved slightly higher values, with an average UIQM of 1.905 and UCIQE of 0.485.
Interestingly, while the second test set yielded higher objective scores, subjective evaluation suggested that the first test set produced more visually pleasing results, with clearer textures, more natural color reproduction, and better overall visual appeal. This discrepancy suggests that current NR-IQA metrics may not fully capture perceptual quality in complex underwater scenes. For instance, the yellowish-green cast in the second set may have led to artificially higher metric scores due to increased global contrast or saturation, even though human observers preferred the more naturally enhanced results from the first set.
These findings highlight the limitations of relying solely on objective metrics for underwater image evaluation. They also emphasize the need to integrate both quantitative and qualitative assessments when validating enhancement models in practical deployments.
In summary, the results from these two real-world test sets demonstrate that MCR-UIE can significantly improve the visual quality of underwater images captured under varying environmental conditions in deep-sea aquaculture. The model effectively suppresses color casts, enhances texture details, and boosts the overall image contrast. This confirms its practical potential for deployment in intelligent aquaculture monitoring systems, supporting downstream tasks such as fish detection, length measurement, and behavioral analysis.
However, we acknowledge that the current evaluation in practical deep-sea aquaculture scenarios primarily relies on qualitative comparisons and NR-IQA metrics, due to the absence of paired ground-truth images. Acquiring such reference data in real-world underwater environments is inherently challenging because of dynamic lighting conditions, uncontrollable turbidity, and the non-rigid nature of underwater scenes.
Moving forward, potential strategies may include the use of synthetic datasets that simulate the degradation characteristics of deep-sea cages, or indirect validation through downstream tasks such as fish detection, tracking, and body length estimation. Additionally, expert visual assessments can provide a complementary perspective when reference data are unavailable. These approaches could help mitigate the limitations of no-reference evaluation and enhance the robustness of model validation in real-world aquaculture applications.

4.4. Ablation Experiments

To evaluate the effectiveness of MCR-UIE, we perform a series of ablation experiments to investigate the contributions of its key components. The following variants are examined: (a) Semi-base: a baseline semi-supervised framework employing the consistency loss. (b) Semi-base + DQR: extends the Semi-base model by incorporating the dynamic quality reliability repository, while excluding the multimodal contrastive loss. (c) Semi-base + MCL1: adds the VGG feature contrastive loss to the Semi-base model, without utilizing the DQR repository. (d) Semi-base + MCL2: adds the edge feature contrastive loss to the Semi-base model, without utilizing the DQR repository. (e) Semi-base + MCL3: adds the color feature contrastive loss to the Semi-base model, without utilizing the DQR repository. (f) Semi-base + MCL4: adds the local region contrastive loss to the Semi-base model, without utilizing the DQR repository. (g) Semi-base + MCL: adds the full multimodal contrastive loss to the Semi-base model, without utilizing the DQR repository. (h) MCR-UIE: the complete proposed method, integrating both the DQR repository and the multimodal contrastive loss.
The qualitative comparisons are illustrated in Figure 9, with particular emphasis on the results of Semi-base + MCL and Semi-base + DQR. In addition, the quantitative results are given in Table 3.
(1) For Semi-base + MCL, due to the absence of reliable positive samples, the contrastive loss compels the network to differentiate excessively from the negative samples (i.e., the input images), which unfortunately leads to over-enhancement artifacts.
(2) To further dissect the impact of each modality within the multimodal contrastive loss, we additionally conduct experiments using individual contrastive branches, including the VGG feature contrastive loss (MCL1), edge feature contrastive loss (MCL2), color feature contrastive loss (MCL3), and local region contrastive loss (MCL4). The results show that among the single-modality variants, MCL3 achieves the most competitive performance, particularly on the LSUI dataset, suggesting its strong contribution to real-world color restoration. Conversely, MCL4 leads to relatively poor performance, possibly due to its sensitivity to local distortions and instability in degraded underwater scenes. When combining all four modalities into a unified MCL framework, we observe consistent improvements across both datasets. The full MCL configuration, which integrates all modalities, achieves better results than any single component, confirming the complementary benefits of multimodal representations.
(3) In contrast, Semi-base + DQR lacks the contrastive regularization mechanism, and although it benefits from the dynamic quality reliability repository, the restored images still exhibit noticeable color distortions and remain visually similar to the degraded inputs.
These observations validate the complementary effectiveness of both the dynamic quality reliability repository and multimodal contrastive regularization in improving restoration quality.

4.5. Deployment Feasibility

To assess the practical deployment potential of the proposed MCR-UIE model, we evaluate its inference speed on two underwater image enhancement benchmarks used in this experiment. The model contains approximately 1.68 million parameters (1,675,281) and is designed to be lightweight and computationally efficient.
We first conduct inference tests on the UIEB dataset, which comprises 89 underwater images with a resolution of 256 × 256 pixels. The total inference time on an NVIDIA RTX 4090D GPU is 2.827 s, yielding an average inference time of 31.76 ms per image, corresponding to approximately 31.48 frames per second (FPS). Additionally, on the LSUI test set, which includes 500 images of the same resolution, the total inference time is 13.106 s, resulting in an average of 26.20 ms per image, or about 38.15 FPS.
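The reported latencies are simply the total inference time divided by the image count (e.g., 2.827 s / 89 ≈ 31.76 ms ≈ 31.48 FPS); a minimal timing sketch, assuming a hypothetical `model` and `loader` and a CUDA device, is shown below.

```python
import time
import torch

@torch.no_grad()
def benchmark(model, loader, device: str = "cuda"):
    """Return (average milliseconds per image, frames per second) over a test loader."""
    model.eval().to(device)
    torch.cuda.synchronize()
    start, count = time.time(), 0
    for batch in loader:          # batch: (N, 3, 256, 256) tensors
        model(batch.to(device))
        count += batch.size(0)
    torch.cuda.synchronize()
    total = time.time() - start
    return total / count * 1000.0, count / total
```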
These results demonstrate that the proposed method achieves near real-time performance, indicating its strong potential for real-world applications that require timely underwater image enhancement. In particular, such efficiency suggests its practical applicability in aquaculture monitoring systems, where high-throughput and low-latency visual processing are essential.

4.6. Analysis of Limitations

To provide a more comprehensive understanding of the limitations of the proposed MCR-UIE method, we present three representative cases, as shown in Figure 10. These examples illustrate how the model may underperform or introduce perceptual distortions under specific conditions.
Figure 10a: This image features a high-resolution underwater scene with a clear blue background and a large number of fish. After enhancement, the output shows negligible changes. This suggests that the model is overly conservative in scenes that deviate from the typical degradation patterns present in the training data. The low degree of visible enhancement may result from the pseudo-label filtering mechanism and the consistency constraint, which prevent aggressive adjustments when degradation is not apparent. However, in practical applications, such scenes may still benefit from subtle contrast or clarity improvements, indicating a gap between perceptual enhancement and model behavior.
Figure 10b: The input contains an object with alternating green and red bands—an uncommon color pattern in typical underwater datasets. After enhancement, these colors shift to blue and orange. This semantic color distortion reflects the model's difficulty in preserving object colors that fall outside its learned color priors. Since the model is trained primarily on degraded natural underwater scenes without explicit object or semantic supervision, it may misinterpret such color combinations as artifacts and apply misguided corrections, leading to visually unrealistic results.
Figure 10c: In this case, the input image includes a bright yellow rubber glove, which is transformed to have an orange hue after enhancement. The color shift suggests that the model mistakenly treats vivid artificial objects as being affected by underwater color distortion. Without an understanding of object semantics or color constancy, the enhancement process adjusts the hue toward what it believes is a more "natural" underwater tone, resulting in a loss of color fidelity for human-made objects.
These cases highlight two major limitations: a lack of semantic understanding, which causes the model to misinterpret unusual colors as degradations and incorrectly adjust them, and limited adaptability to high-quality or complex scenes, where the model’s enhancement behavior becomes conservative or misaligned with perceptual needs.
To address these issues, future work could consider integrating semantic segmentation priors, human-object-aware loss functions, or expanding the training dataset to include a greater diversity of object colors and scene types. In particular, increasing the diversity of the unlabeled dataset, which currently consists solely of EUVP samples, would mitigate potential domain shift effects and allow the model to better adapt to underwater environments not represented in the original benchmark. This expanded dataset could incorporate varied water types, lighting conditions, and object appearances, thereby helping the model distinguish between actual degradation and semantically meaningful color variations. Together, these enhancements would improve the model’s robustness and generalization for real-world deployment scenarios such as deep-sea aquaculture monitoring.

5. Conclusions

In this study, we propose a novel semi-supervised underwater image enhancement approach, termed MCR-UIE, which integrates multimodal contrastive learning (MCL) and a dynamic quality reliability repository (DQR) to fully exploit both labeled and unlabeled data. The MCL imposes contrastive constraints across multiple modalities, such as perceptual, edge, color, and local-region features, to enhance feature discrimination and robustness. Meanwhile, the DQR maintains high-quality pseudo-labels by dynamically selecting the best teacher outputs according to a learned reliability criterion, thereby mitigating confirmation bias during training. Extensive experiments on multiple underwater image enhancement benchmarks demonstrate that MCR-UIE consistently outperforms existing advanced methods, achieving notable improvements in both full-reference and no-reference evaluation metrics. In future work, we plan to extend this semi-supervised framework to other low-level vision tasks and explore more efficient memory-aware training strategies to further improve scalability and performance.
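As a purely illustrative sketch of the multimodal contrastive idea, and not the exact losses used in MCR-UIE, the snippet below contrasts the enhanced output against the pseudo-label (positive) and the degraded input (negative) in three hand-crafted feature spaces: edge, color statistics, and a local-region crop. A perceptual branch based on pretrained VGG features would be added in the same way but is omitted to keep the example self-contained; all function names and weights here are assumptions.

```python
import torch
import torch.nn.functional as F


def edge_feat(x: torch.Tensor) -> torch.Tensor:
    """Edge modality: per-channel Sobel gradient magnitude."""
    k = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    kx = k.view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    ky = kx.transpose(-1, -2)
    gx = F.conv2d(x, kx, padding=1, groups=x.size(1))
    gy = F.conv2d(x, ky, padding=1, groups=x.size(1))
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)


def color_feat(x: torch.Tensor) -> torch.Tensor:
    """Color modality: per-channel mean and standard deviation."""
    return torch.cat([x.mean(dim=(2, 3)), x.std(dim=(2, 3))], dim=1)


def local_feat(x: torch.Tensor, size: int = 64) -> torch.Tensor:
    """Local-region modality: a fixed central crop."""
    h, w = x.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return x[..., top:top + size, left:left + size]


def contrastive_term(anchor, positive, negative, eps=1e-7):
    """Ratio form: small when the anchor is close to the positive and far from the negative."""
    return F.l1_loss(anchor, positive) / (F.l1_loss(anchor, negative) + eps)


def mcl_loss(output, pseudo_label, degraded, weights=(1.0, 1.0, 1.0)):
    """Sum the contrastive terms over the illustrative modalities."""
    feats = (edge_feat, color_feat, local_feat)
    return sum(w * contrastive_term(f(output), f(pseudo_label), f(degraded))
               for w, f in zip(weights, feats))
```

In a semi-supervised setting such a regularizer could be combined with a pixel-wise term on the pseudo-label, e.g., `F.l1_loss(output, pseudo_label) + 0.1 * mcl_loss(output, pseudo_label, degraded)`, where the 0.1 weight is again only a placeholder.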

Author Contributions

Conceptualization, M.D. and G.L.; methodology, M.D.; software, M.D.; validation, G.L. and H.L.; formal analysis, Q.H.; writing—original draft preparation, M.D.; writing—review and editing, M.D. and X.H.; visualization, Y.H.; supervision, Q.H.; project administration, Y.H. and X.H.; funding acquisition, M.D., G.L., H.L. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the earmarked fund for CARS-47, the Central Public-interest Scientific Institution Basal Research Fund, CAFS (No. 2023TD97), and the Central Public-interest Scientific Institution Basal Research Fund, South China Sea Fisheries Research Institute, CAFS (No. 2023RC01, No. 2022TS06, No. 2024TS07 and No. 2024TS08).

Data Availability Statement

The datasets used in this study are available from the corresponding author upon reasonable request. They are not publicly released in order to prevent potential misuse.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Illustration of our MCR-UIE framework. (a) The network structure of the proposed MCR-UIE; (b) the Asymmetric Illumination-Aware Multi-Scale Network.
Figure 2. Comparison of seven NR-IQA metrics on the EUVP benchmark. MUSIQ shows the best consistency with visual quality perception, validating its suitability for underwater pseudo-label selection.
Figure 3. Visual comparison of enhancement effects on UIEB dataset. (a) Input; (b) NLD; (c) CLAHE; (d) DCP; (e) UDCP; (f) UNet; (g) UWNet; (h) CycleGAN; (i) FUnIE-GAN; (j) MCR-UIE; (k) ground truth.
Figure 4. Visual comparison of enhancement effects on LSUI dataset. (a) Input; (b) NLD; (c) CLAHE; (d) DCP; (e) UDCP; (f) UNet; (g) UWNet; (h) CycleGAN; (i) FUnIE-GAN; (j) MCR-UIE.
Figure 5. Box plots of performance metrics for different enhancement methods. (a) PSNR of UIEB, (b) RMSE of UIEB, (c) PSNR of LSUI, (d) RMSE of LSUI.
Figure 6. Bar charts of performance metrics for different enhancement methods. (a) UIQM of UIEB, (b) UIQM of LSUI, (c) UCIQE of UIEB, (d) UCIQE of LSUI.
Figure 7. Example 1 of using MCR-UIE to enhance underwater images of deep-sea cage aquaculture. (a–d) Raw image captured in aquaculture environment; (e–h) enhanced output by our method.
Figure 8. Example 2 of using MCR-UIE to enhance underwater images of deep-sea cage aquaculture. (a–d) Raw image captured in aquaculture environment; (e–h) enhanced output by our method.
Figure 9. Visual comparison of ablation results using representative images from the UIEB and LSUI datasets. (a) Input; (b) Semi-base; (c) Semi-base + DQR; (d) Semi-base + MCL1; (e) Semi-base + MCL2; (f) Semi-base + MCL3; (g) Semi-base + MCL4; (h) Semi-base + MCL; (i) MCR-UIE.
Figure 10. Challenging cases that reflect the limitations of MCR-UIE. (a) Example 1; (b) example 2; (c) example 3.
Table 1. Performance metrics using different enhancement approaches (PSNR, SSIM, RMSE).

Method | UIEB PSNR ↑ (dB) | UIEB SSIM ↑ | UIEB RMSE ↓ | LSUI PSNR ↑ (dB) | LSUI SSIM ↑ | LSUI RMSE ↓
NLD | 16.416 | 0.708 | 41.261 | 14.629 | 0.694 | 49.862
CLAHE | 16.812 | 0.751 | 39.182 | 14.713 | 0.744 | 48.154
DCP | 16.526 | 0.713 | 41.558 | 14.025 | 0.694 | 52.976
UDCP | 17.478 | 0.752 | 36.284 | 15.613 | 0.756 | 43.976
UNet | 14.668 | 0.706 | 50.550 | 16.851 | 0.772 | 38.738
UWNet | 17.771 | 0.759 | 36.146 | 18.782 | 0.783 | 31.139
CycleGAN | 21.723 | 0.795 | 22.694 | 20.570 | 0.784 | 25.124
FUnIE-GAN | 19.524 | 0.784 | 27.584 | 17.948 | 0.777 | 32.951
MCR-UIE | 23.698 | 0.851 | 18.089 | 22.835 | 0.865 | 19.612
Table 2. Performance metrics using different enhancement methods (UIQM, UCIQE). The best and second-best results are marked in red and blue, respectively.

Method | UIQM ↑ (UIEB) | UIQM ↑ (LSUI) | UCIQE ↑ (UIEB) | UCIQE ↑ (LSUI)
NLD | 2.518 | 2.540 | 0.600 | 0.571
CLAHE | 2.665 | 2.515 | 0.562 | 0.523
DCP | 2.386 | 2.410 | 0.602 | 0.558
UDCP | 2.829 | 2.821 | 0.601 | 0.559
UNet | 2.810 | 3.075 | 0.573 | 0.532
UWNet | 2.849 | 2.905 | 0.531 | 0.498
CycleGAN | 2.850 | 2.997 | 0.604 | 0.508
FUnIE-GAN | 3.033 | 3.069 | 0.614 | 0.586
MCR-UIE | 2.881 | 3.000 | 0.606 | 0.572
Table 3. Ablation studies on UIEB and LSUI datasets with PSNR, SSIM, and RMSE. Bold metrics represent the best results.

Method | UIEB PSNR ↑ | UIEB SSIM ↑ | UIEB RMSE ↓ | LSUI PSNR ↑ | LSUI SSIM ↑ | LSUI RMSE ↓
Semi-base | 22.985 | 0.847 | 19.503 | 21.982 | 0.850 | 22.003
Semi-base + DQR | 23.201 | 0.848 | 19.013 | 22.285 | 0.861 | 20.892
Semi-base + MCL1 | 21.902 | 0.837 | 22.205 | 21.165 | 0.785 | 26.121
Semi-base + MCL2 | 22.282 | 0.840 | 21.145 | 21.784 | 0.845 | 22.309
Semi-base + MCL3 | 22.562 | 0.844 | 20.355 | 22.030 | 0.853 | 21.817
Semi-base + MCL4 | 21.898 | 0.836 | 22.670 | 19.826 | 0.835 | 30.048
Semi-base + MCL | 22.759 | 0.838 | 20.977 | 22.356 | 0.849 | 23.151
MCR-UIE | 23.698 | 0.851 | 18.089 | 22.835 | 0.865 | 19.612
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
