Article

Delving into Underwater Image Utility: Benchmark Dataset and Prediction Model

1 School of Information Science and Engineering, Ningbo University, Ningbo 315211, China
2 College of Science and Technology, Ningbo University, Ningbo 315300, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(11), 1906; https://doi.org/10.3390/rs17111906
Submission received: 1 April 2025 / Revised: 21 May 2025 / Accepted: 26 May 2025 / Published: 30 May 2025
(This article belongs to the Special Issue Advanced Techniques for Water-Related Remote Sensing (Second Edition))

Abstract

High-quality underwater images are essential for both human visual perception and machine analysis in marine vision applications. Although significant progress has been achieved in Underwater Image Quality Assessment (UIQA), almost all existing UIQA methods focus on visual perception-oriented image quality and cannot be used to gauge the utility of underwater images in machine vision applications. To address this issue, in this work, we focus on the problem of automatic underwater image utility assessment (UIUA). On the one hand, we first construct a large-scale Object Detection-oriented Underwater Image Utility Assessment (OD-UIUA) dataset, which includes 1200 raw underwater images, the corresponding 12,000 enhanced results produced by 10 representative underwater image enhancement (UIE) algorithms, and 13,200 underwater image utility scores (UIUSs) for all raw and enhanced underwater images in the dataset. On the other hand, based on this newly constructed OD-UIUA dataset, we train a deep UIUA network (DeepUIUA) that can automatically and accurately predict UIUSs. To the best of our knowledge, this is the first benchmark dataset for UIUA and also the first model focusing on the specific UIUA problem. We comprehensively compare the performance of our proposed DeepUIUA model with that of 14 state-of-the-art no-reference image quality assessment (NR-IQA) methods by using the OD-UIUA dataset as the benchmark. Extensive experiments show that our proposed DeepUIUA model outperforms the existing NR-IQA methods in assessing UIUSs. The OD-UIUA dataset and the source code of our DeepUIUA model will be released.

1. Introduction

High-quality underwater images are essential for marine scientific research, resource development, environmental monitoring, and other applications [1,2,3,4]. However, due to light absorption and scattering effects in underwater environments, raw underwater images usually suffer from various degradation issues such as color cast, blur, low light, and reduced contrast, which not only affect image quality but also pose great challenges for underwater machine vision tasks [5,6,7,8,9,10]. In order to improve the quality of underwater images, researchers have developed a variety of underwater image enhancement (UIE) algorithms [11,12,13], which have achieved impressive results in restoring image colors and improving contrast and details. However, the effectiveness of these UIE methods in improving underwater machine analysis performance remains unclear. In addition, almost all existing Underwater Image Quality Assessment (UIQA) methods [14,15,16,17,18] focus on measuring human visual perception-oriented image quality and cannot be directly used to measure the utility of underwater images in machine vision tasks, i.e., underwater image utility assessment (UIUA). Thus, how to fairly gauge the utility of underwater images in machine vision applications remains a challenging and unsolved problem [19,20].
Similar to UIQA, which focuses on measuring human visual perception-oriented image quality, UIUA aims to assess the utility of underwater images in specific machine vision-related tasks [21]. A comprehensive underwater image dataset with reliable utility score annotations plays an indispensable role for establishing reliable UIUA metrics, just as a well-constructed dataset for UIQA [22]. Such a dataset should include relevant data that can help in accurately assessing the utility of underwater images in complex underwater scenarios, which parallels the role of benchmark datasets in the development of UIQA.
Existing UIQA datasets [23,24] contain raw underwater images, enhanced counterparts, and human visual perception-oriented quality scores. These datasets have facilitated significant advances in the development of UIQA metrics, allowing more accurate evaluation of visual quality and also driving significant progress in the design of more advanced UIE algorithms [25]. However, these datasets do not provide annotations tailored to assessing the utility of underwater images in machine vision applications. The lack of underwater image utility score annotations restricts researchers from measuring the practical value of underwater images for real-world machine vision tasks. Clearly, a large-scale benchmark dataset is essential for performing a fair and comprehensive performance evaluation of UIUA metrics [26]. This kind of dataset should include raw underwater images with diverse underwater scenes and typical degradation types, corresponding enhanced results generated by representative UIE algorithms, and the corresponding utility scores. Although collecting raw underwater images and producing representative enhanced results is relatively straightforward, ensuring the reliability of utility scores continues to be a crucial challenge.
In addition to the absence of a benchmark dataset for UIUA, a dedicated objective UIUA metric is also lacking at the current stage. Although significant progress has been achieved in UIQA during the past decade, the focus of UIQA is different from that of UIUA. For the classical UIQA task, the most critical issue is to extract visual quality-aware features that can well reflect subjective perception of humans in terms of different image attributes. Meanwhile, for the UIUA task, the goal is to extract task-oriented features from the input images. Although one can resort to the end-to-end deep learning technique to directly train a UIUA network, the feature extraction module should still be carefully designed so as to increase the interpretability of the network architecture. This gap between perception-oriented and task-oriented features makes existing UIQA metrics unable to achieve the desired performance on the UIUA task.
In order to address the above issues, in this work, we first construct a large-scale Object Detection-oriented UIUA dataset (OD-UIUA) that contains diverse underwater images and the corresponding utility scores, and then propose an effective deep UIUA network (DeepUIUA) that can predict the utility of underwater images well. In summary, the main contributions of this paper are as follows:
  • To the best of our knowledge, we are the first to construct a large-scale Object Detection-oriented Underwater Image Utility Assessment (OD-UIUA) dataset for the task of UIUA. The OD-UIUA dataset comprises 1200 raw underwater images, corresponding 12,000 enhanced images generated by 10 representative UIE algorithms, and 13,200 utility scores for all images in the dataset. The construction of such a dataset not only provides a fair and reliable benchmark platform for the UIUA task but also inspires further innovations of machine vision-oriented UIE algorithms.
  • Unlike vanilla UIQA methods that mainly focus on visual perception-oriented image quality evaluation, we train a deep UIUA network (DeepUIUA) specifically tailored to assess the utility of underwater images for machine vision tasks (in particular, the object detection task). DeepUIUA is capable of accurately capturing object region-oriented features to facilitate accurately predicting the utility score of underwater images. Extensive experiments demonstrate that our proposed DeepUIUA model achieves superior predictive accuracy compared with the existing state-of-the-art no-reference image quality assessment (NR-IQA) methods in assessing the utility of underwater images.
The remaining parts of this paper are organized as follows: Section 2 reviews materials and methods. Section 3 presents the results. Section 4 introduces the discussion. Finally, Section 5 draws the conclusion of this paper.

2. Materials and Methods

2.1. Related Works

2.1.1. Underwater Image Dataset

The collection and analysis of underwater image data are of great importance to marine scientific research. High-quality underwater image datasets not only provide researchers with accurate and reliable data but also expand the use of underwater image technology in various application scenarios [27]. For this reason, many researchers have proposed various underwater image datasets in recent years. For example, Liu et al. [28] built a multi-view underwater imaging system, collected more than 4000 real underwater images, and divided them into three subsets, each targeting a different enhancement challenge: the Underwater Image Quality Set (UIQS), the Underwater Color Cast Set (UCCS), and the Underwater Higher-Level Task-driven Set (UHTS). Li et al. [29] also compiled a testing dataset by collecting 45 underwater images from existing datasets and real underwater scenes. However, a common issue faced by these datasets is the lack of corresponding reference images. To solve this problem, Li et al. [30] constructed an underwater image enhancement dataset containing 950 images, each enhanced by 12 different algorithms; volunteers then selected the best enhancement result as the reference image, yielding 890 raw underwater images with references and 60 challenging images. To advance perceptual image enhancement research, Islam et al. [31] gathered over 20,000 underwater images using 7 different camera devices, comprising more than 12,000 paired images and 8000 unpaired images. More recently, Jiang et al. [23] built a subjectively annotated quality evaluation dataset for UIE, which contains 100 raw underwater images and their 1000 enhanced versions processed by 10 UIE algorithms; each enhanced image carries a subjective ranking score, providing ground truth for underwater image enhancement quality evaluation. Furthermore, Liu et al. [32] created a large-scale UIQA database by collecting 5369 underwater images from existing underwater computer vision studies, covering a broad range of underwater scenes and assigning subjective quality scores to each image.
Although these datasets contribute significantly to UIQA, there remains a notable gap: lacking large-scale datasets with utility scores. This limitation hinders the broader application of underwater images in machine vision tasks, highlighting the urgent need for further research to fill this gap.

2.1.2. Underwater Image Enhancement

Underwater image enhancement significantly improves visual information quality and plays a crucial role in various underwater research fields [33,34]. Existing underwater image enhancement algorithms can be broadly divided into two categories: traditional methods based on handcrafted feature extraction and deep learning-based algorithms. For instance, Fu et al. [35] proposed a two-branch deep learning network designed to compensate for global color cast and local contrast degradation and further enhance image quality through compressed-histogram equalization. Chen et al. [36] used a convolutional neural network to estimate backward scattering and direct transmission, generating enhanced underwater images via a reconstruction module. Naik et al. [37] developed a lightweight deep neural network that maintains the image enhancement effect while reducing the model's computational requirements. Zhang et al. [38] improved image quality by minimizing color loss and applying locally adaptive contrast enhancement, combining color shift and maximum decay mapping for local color correction, followed by contrast enhancement using integral and squared integral mapping. Xiao et al. [39] proposed a lightweight network guided by statistical methods, using a dual-statistic white balance module for color distortion correction and adaptive contrast enhancement through a multi-color space stretching module. Guo et al. [40] investigated the impact of ranked UIQA methods, developing a loss function to guide the image enhancement effect. Li et al. [41] designed a quality-discriminative network containing a Siamese multilayered inference structure, improving the generation effect by comparing the qualities of multiple enhancement candidates through a comparative learning framework. Jiang et al. [42] introduced a two-stage lightweight UIE network. The first stage decomposes the complex underwater degradation into smaller subproblems, and the second stage boosts detail perception using multi-branch color enhancement and pixel attention modules. Zhang et al. [43] performed color correction through decay mapping, combined maximum information entropy and fast integrated optimization strategies to enhance the global and local contrast of the image, and finally fused the high- and low-frequency components of different-scale images through a weighted wavelet fusion strategy to generate high-quality underwater images. Rao et al. [44] proposed a two-stage enhancement approach that performs color compensation through a probabilistic color compensation network and then performs image enhancement. This network improves color performance and enhances model robustness by estimating color probability distributions through multi-scale texture and color feature fusion.
While these methods perform well in enhancing visual visibility, they primarily focus on human visual perception, leaving challenges in optimizing the utility of underwater images for machine vision applications. However, some researchers are now beginning to study the relationship between underwater image enhancement and object detection performance. Wang et al. [45] conducted experimental research on the impact of underwater image enhancement algorithms on object detectors, and the results showed that underwater image enhancement can suppress object detection performance, especially when detecting difficult samples. Saleem et al. [3] conducted a comparative analysis of detection performance before and after enhancing a single image and found that most enhanced images performed equally or better than the raw image when evaluated separately, but excessive enhancement can reduce detection performance.

2.1.3. Underwater Image Quality Assessment

In recent years, UIQA has gradually become a hot research topic [46]. Yang et al. [47] introduced UCIQE, the first metric for assessing underwater color image quality, which combines chromaticity, saturation, and contrast to quantify color cast, blur, and low-contrast issues in underwater images. Panetta et al. [48] proposed UIQM, which is inspired by the human visual system (HVS), to evaluate underwater image quality by measuring color, sharpness, and contrast. Wang et al. [49] proposed CCF, a metric based on the principles of underwater imaging, combining a color index, a contrast index, and a fog density index. Yang et al. [50] improved existing handcrafted colorimetry, contrast, and sharpness measures and proposed a new quality evaluation index, FDUM, for underwater images. Jiang et al. [23] found that unsatisfactory color bias and luminance lead to the degradation of visual quality, extracted quality-aware features from the chromaticity and luminance components, and proposed NUIQ by training an SVM model. To evaluate underwater image enhancement quality, Guo et al. [51] combined the underwater imaging model and visual attributes to construct UWEQM, a metric for underwater image enhancement quality evaluation. Zheng et al. [24] proposed UIF, combining naturalness, clarity, and structural measures, and used a saliency pooling strategy to improve the effectiveness of underwater image enhancement quality evaluation. However, previous IQA methods mainly focus on intrinsic image distortions, overlooking artificial distortions introduced by enhancement algorithms. To address this, Wang et al. [52] proposed GLCQE, which measures the quality of enhanced images by constructing the optimal enhancement direction and building on luminance and chromaticity modules. Liu et al. [53] proposed UIQI, a metric that comprehensively analyzes luminance, chromaticity, sharpness, contrast, fog density, noise, color cast, and color depth. In addition, Liu et al. [32] used a deep learning approach to evaluate underwater image quality from both global and local perspectives, introducing a benchmark dataset and combining attention mechanisms with a vision transformer to better characterize image quality.
However, most of these UIQA methods are designed based on the characteristics of the HVS and focus on the human visual perception-oriented image quality issue [54]. Although these methods fulfill the needs of human vision, they have significant shortcomings in assessing the utility of underwater images in machine vision applications. Therefore, how to develop UIQA methods that can accurately assess the utility of underwater images has become a critical issue that needs to be addressed.

2.2. Construction of OD-UIUA Dataset

By systematically reviewing previous works, we found that the field of UIUA still lacks large-scale datasets with utility scores, and the UIQA methods cannot be used to gauge the utility of underwater images in machine vision applications. In this section, we provide a detailed description of the OD-UIUA dataset, covering raw underwater image collection, enhanced image generation, and utility score generation.
Overall, our constructed OD-UIUA dataset contains 13,200 underwater images and the corresponding underwater image utility scores (UIUSs). The 13,200 underwater images consist of 1200 raw underwater images and 12,000 enhanced images. Section 2.2.1 describes the collection process for the 1200 raw underwater images. Section 2.2.2 details the generation process of the 12,000 enhanced images. Section 2.2.3 describes the acquisition of UIUSs. To provide a comprehensive understanding of the dataset, Section 2.2.4 presents the statistical characteristics and utility analysis of the OD-UIUA dataset.

2.2.1. Raw Underwater Image Collection

To reflect real-world underwater application scenarios for machine vision tasks, we follow the key principle that the collected raw underwater images should cover a diversity of biological categories and the typical underwater degradation types. Specifically, to cover a diversity of biological categories, we select the images containing typical marine organisms such as different types of echini, starfish, holothurians, and scallops. To cover typical underwater degradation types, we select images containing different degrees of color cast, low light, reduced contrast, etc. Based on this principle, we select 1200 raw underwater images from the existing underwater object detection dataset DUO [55]. Sample images in Figure 1 illustrate practical underwater scenarios with biodiversity and typical degradation types.

2.2.2. Enhanced Image Generation

To enable the evaluation of underwater enhanced results with different qualities, we generate 1200 × 10 = 12,000 enhanced results via ten representative UIE methods published in the last five years. The ten UIE methods comprise eight deep learning-based methods and two traditional methods. The eight deep learning-based methods are GLCHE [35], DLIFM [36], Shallow-UWNet [37], USLN [39], NU2Net [40], CLUIE-Net [41], FiveAPlus-Net [42], and P2CNet [44]. The two traditional methods are MLLE [38] and WWPF [43]. As shown in Figure 2, the five representative raw underwater images in column (a) and their corresponding ten enhanced results in columns (b) to (k) illustrate underwater images of different qualities.
In order to construct the UIUA dataset, it is necessary to obtain the utility scores of all images. In the following section, we describe the utility testing for obtaining the utility scores of these images.

2.2.3. Utility Score Generation

To obtain the UIUSs of the 13,200 images and ensure that the utility scores are fair and reliable, the UIUS of each image is computed as the average of five mean Average Precision (mAP) values generated by five classical object detectors, including three single-stage detectors (RetinaNet [56], FCOS [57], and TOOD [58]) and two two-stage detectors (Faster R-CNN [59] and Cascade R-CNN [60]). The calculation of the UIUS can be expressed by the following equation:
UIUS = \frac{1}{5} \sum_{i=1}^{5} \mathrm{mAP}_i
where $UIUS$ represents the utility score of a single image in the OD-UIUA dataset, and $\mathrm{mAP}_i$ represents the mAP value generated by the $i$-th object detector.
To obtain reasonable mAP values for underwater images, we construct a new underwater image dataset and retrain all five detectors. Figure 3 shows the process flowchart for constructing the datasets and generating UIUSs. This new dataset is divided into 11 subsets: 1 subset for the raw underwater images and 10 other subsets for the enhanced underwater images processed by the 10 representative UIE algorithms described in Section 2.1.2. The raw subset contains 7717 images selected from the DUO dataset after eliminating duplicate images and images without objects. Among these 7717 images, the 1200 raw underwater images selected in Section 2.2.1 are regarded as the testing set, and the remaining 6517 images are regarded as the training set. The other 10 enhanced subsets contain 7717 × 10 = 77,170 images generated by the 10 UIE algorithms from the 7717 raw images. In addition, each of the 5 detectors is trained on each training subset individually; i.e., every detector is trained on the above-mentioned 11 subsets to obtain 11 pre-trained models, yielding a total of 5 × 11 = 55 pre-trained models. These models are used to assess object detection performance (i.e., to generate mAP values) on the different testing subsets, providing a comprehensive picture of how UIE affects detection accuracy. Finally, the UIUS of each image is calculated by averaging the mAP values produced by the five detectors trained on its corresponding subset. A sketch of this scoring protocol is given below.
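For clarity, the following is a minimal sketch of the scoring protocol, assuming a hypothetical helper `evaluate_per_image_map` that returns the per-image mAP of a detector pre-trained on a given subset; the actual detector training and evaluation are carried out with MMDetection (see Section 3.1.1).

```python
import numpy as np

DETECTORS = ["RetinaNet", "FCOS", "TOOD", "FasterRCNN", "CascadeRCNN"]
SUBSETS = ["raw"] + [f"enhanced_{k}" for k in range(1, 11)]  # 11 subsets in total


def evaluate_per_image_map(detector, subset, image_id):
    """Hypothetical wrapper: per-image mAP of `detector`, pre-trained on the
    training split of `subset`, evaluated on one test image of that subset."""
    raise NotImplementedError


def compute_uius(subset, image_id):
    # UIUS = average of the five per-image mAP values (see the equation above).
    maps = [evaluate_per_image_map(d, subset, image_id) for d in DETECTORS]
    return float(np.mean(maps))
```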
The OD-UIUA dataset has been constructed after finishing the above utility annotation. In the next section, we analyze the OD-UIUA dataset in detail and explore its statistical properties and utility assessment.

2.2.4. Dataset Analysis

To deeply understand the OD-UIUA dataset, we conduct a comprehensive analysis focusing on the proportions of different object categories, the number of objects, category diversity, the UIUS distribution, and the impact of UIE algorithms on UIUS. The detailed results of these analyses are presented below:
First, to show the basic composition of the OD-UIUA dataset, we analyze its biological content in terms of the proportions of different object categories. Figure 4 shows the proportions of different object categories in the OD-UIUA dataset. Specifically, the proportions of echini, starfish, holothurians, and scallops are 68.1%, 18.9%, 10.6%, and 2.4%, respectively.
Second, to reflect the complexity of the scenes in the OD-UIUA dataset, we analyze the distributions of the number of objects and the number of categories per image. Figure 5a shows the proportion of images by number of objects per image. Specifically, the proportions of images containing 1–2, 3–5, 6–10, and ≥11 objects are 12.7%, 27.1%, 29.1%, and 31.1%, respectively. Figure 5b shows the proportion of images by number of categories per image. Specifically, the proportions of images containing 1, 2, and ≥3 categories are 28.7%, 44.6%, and 26.7%, respectively. Images containing different categories of organisms not only improve the learning of relationships between categories but also promote the generalization ability of UIUA methods.
Subsequently, we further analyze the distribution of UIUS in the OD-UIUA dataset. As illustrated in Figure 6, the UIUSs cover the entire horizontal axis, and this distribution not only intuitively reflects the diversity of the OD-UIUA dataset but also indicates that the UIUA methods trained on the OD-UIUA dataset can effectively evaluate samples with different utility levels.
In addition, analyzing UIUSs has further value, for example, in revealing the impact of different UIE algorithms on object detection performance. We analyze this point next.
We present the 11 mean UIUSs of all images in each subset (constructed in Section 2.2.3) in Figure 7, which shows that the mean UIUSs of all 10 enhanced subsets are lower than the mean UIUS of the raw subset. This suggests that existing UIE algorithms have a negative overall effect on object detection, even though they improve the visual quality of the enhanced images. Does this mean that every enhanced image will degrade object detection performance?
To answer the above question, we present the UIUSs of 100 consecutive raw underwater images and their corresponding enhanced images, as shown in Figure 8. The 100 raw underwater images are selected from the OD-UIUA dataset and sorted by their UIUSs from low to high. The results indicate that the 10 UIE algorithms can either increase or decrease the UIUS, which reveals the uncertainty of the enhancement effect on object detection performance. This uncertainty, combined with the conclusion from Figure 7, shows that although UIE algorithms have a negative effect on the overall object detection performance, an individual UIE algorithm can still have a positive effect on the object detection performance of a single image. Meanwhile, these observations imply that no single UIE algorithm can improve object detection performance across all types of underwater images.
The above analysis presents a new challenge to UIQA methods. Almost all existing UIQA methods focus on the visual perception-oriented image quality issue and cannot be used to gauge the utility of underwater images in machine vision applications. Therefore, it is urgent to propose a UIUA method that can objectively gauge the utility of underwater images. In the next section, we will introduce our proposed UIUA method.

2.3. Proposed DeepUIUA

Based on the OD-UIUA dataset, we propose a UIUA method, called DeepUIUA, to assess the object detection-oriented utility of underwater images. DeepUIUA not only aims to assess the utility of underwater images but also to encourage further research on UIUA.
Existing UIQA methods usually focus on the degree of degradation or the type of distortion of the entire image. However, these are not the main factors determining image utility. In contrast, we argue that a UIUA method should focus on the quality of the object regions so that the utility score can accurately reflect the utility of the underwater image. To achieve this, DeepUIUA employs a two-stage design that enables the UIUA model to focus on the quality of object regions. The architecture of the proposed DeepUIUA is shown in Figure 9.

2.3.1. Stage 1: Object Region-Oriented Feature Capturing Network

In the first stage, we obtain a backbone network that focuses on object regions when capturing features by following the architecture of YOLOX [61], which consists of a backbone network, a neck layer, and a detection head. These components are introduced below.
Backbone: To enhance the feature extraction capability of the backbone network, we replace the backbone network of YOLOX with ResNet50 [62] in this paper due to the following two reasons: First, ResNet50, a classical convolutional neural network, is widely used in visual tasks by virtue of its excellent feature extraction ability, and choosing it as the backbone network can ensure the reliability of the model. Second, ResNet50 provides abundant pre-trained weights, which can effectively improve the initial performance of the model.
Neck: To cope with the common problems of uneven illumination and color shift in underwater environments and to improve the ability of the backbone network to focus on underwater objects of different scales, we employ a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN) at the neck layer to extract and fuse multi-scale features. The FPN enhances the recognition of blurred underwater objects by integrating high-level semantic information through a top-down structure, while the PAN increases the expressiveness of features at different scales through bottom-up path aggregation, allowing the detection network to remain stable when dealing with underwater objects of different sizes and shapes in complex underwater scenes. As shown in Figure 9, we extract features $F_1$, $F_2$, and $F_3$ from layers 4, 3, and 2 of the backbone network and obtain the first set of multi-scale features $F_1'$, $F_2'$, and $F_3'$ through the FPN. The first set of multi-scale features is then passed to the PAN to produce the second set of multi-scale features $F_1''$, $F_2''$, and $F_3''$, which are subsequently fed into the detection head. The process of multi-scale feature extraction and fusion can be expressed with the following equations (a minimal code sketch of this neck follows the equations):
F_1', F_2', F_3' = \mathrm{FPN}(F_1, F_2, F_3)
F_1'', F_2'', F_3'' = \mathrm{PAN}(F_1', F_2', F_3')
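The following is a minimal sketch of such an FPN + PAN neck, assuming a ResNet50 backbone whose layers 2, 3, and 4 output 512, 1024, and 2048 channels; the exact convolution widths and wiring of the YOLOX-style neck used in our model may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FPNPANNeck(nn.Module):
    """Minimal FPN + PAN neck over three backbone stages. Channel widths assume
    a ResNet50 backbone whose layers 2/3/4 output 512/1024/2048 channels."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels
        )
        self.down = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
            for _ in in_channels[:-1]
        )

    def forward(self, f_high, f_mid, f_low):
        # f_high: highest-resolution feature (layer 2); f_low: lowest (layer 4).
        # Top-down path (FPN): propagate semantics from deep to shallow levels.
        p_low = self.lateral[2](f_low)
        p_mid = self.lateral[1](f_mid) + F.interpolate(p_low, size=f_mid.shape[-2:], mode="nearest")
        p_high = self.lateral[0](f_high) + F.interpolate(p_mid, size=f_high.shape[-2:], mode="nearest")
        p_high, p_mid, p_low = [self.smooth[i](p) for i, p in enumerate((p_high, p_mid, p_low))]
        # Bottom-up path (PAN): re-aggregate localization cues from shallow to deep.
        n_high = p_high
        n_mid = p_mid + self.down[0](n_high)
        n_low = p_low + self.down[1](n_mid)
        return n_high, n_mid, n_low


# Example: feature maps for a 224 x 224 input from ResNet50 layers 2, 3, and 4.
neck = FPNPANNeck()
f2, f3, f4 = torch.randn(1, 512, 28, 28), torch.randn(1, 1024, 14, 14), torch.randn(1, 2048, 7, 7)
print([tuple(o.shape) for o in neck(f2, f3, f4)])  # [(1, 256, 28, 28), (1, 256, 14, 14), (1, 256, 7, 7)]
```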
Head: The detection head uses the multi-scale features from the neck layer to perform object classification, bounding box regression, and confidence estimation. Due to the multi-scale characteristic of the features, the detection head can identify both large and small objects, thereby improving the accuracy of detecting objects of different sizes in an underwater environment.

2.3.2. Stage 2: Multi-Scale Quality Assessment Network

In the second stage, the predicted UIUSs of underwater images are generated by incorporating the pre-trained backbone network from the first stage and the quality prediction module. The following part will introduce the backbone and the quality prediction module.
Backbone: To ensure that the predicted scores reflect the object detection performance of underwater images more accurately, we use the ResNet50 model trained in the first stage as the backbone network in this stage so that the extracted image features are focused on underwater object regions. The input image $I$ is processed by the pre-trained backbone network $F_{pt\_backbone}$ to extract features $F_a$, $F_b$, and $F_c$ from layers 2, 3, and 4. The feature extraction can be expressed with the following equation:
F_a, F_b, F_c = F_{pt\_backbone}(I)
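As an illustration, multi-scale features from layers 2, 3, and 4 of a ResNet50 can be obtained with torchvision's feature-extraction utility as sketched below; in DeepUIUA the ResNet50 weights come from the stage-1 detection training, whereas a randomly initialized model is used here only to show the tensor shapes.

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Randomly initialized ResNet50 for illustration; in DeepUIUA the weights are
# loaded from the stage-1 detection-trained backbone.
backbone = resnet50()
extractor = create_feature_extractor(
    backbone, return_nodes={"layer2": "Fa", "layer3": "Fb", "layer4": "Fc"}
)

x = torch.randn(1, 3, 224, 224)  # stage-2 inputs are resized to 224 x 224
feats = extractor(x)
print({k: tuple(v.shape) for k, v in feats.items()})
# {'Fa': (1, 512, 28, 28), 'Fb': (1, 1024, 14, 14), 'Fc': (1, 2048, 7, 7)}
```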
Quality Prediction Module: The quality prediction module utilizes the multi-scale features of underwater images for quality prediction. Shallow features have rich detail information and higher resolution, which can accurately reflect the quality of small objects, while deep features have a larger receptive field and richer semantic information, which can accurately reflect the quality of large objects. The lightweight and concise detection head in the object detection model is able to detect objects of different sizes through feature maps of different scales, which is also important for UIUA. Therefore, the design of our DeepUIUA draws on this idea to accurately assess the utility of underwater objects of different sizes through feature maps of different scales. Specifically, we extract the multi-scale features $F_a$, $F_b$, and $F_c$ from layers 2, 3, and 4 of the backbone network and input them into the quality prediction module for prediction. These three multi-scale features are passed through average pooling layers $F_{Avgpool}$ followed by three scale-specific fully connected layers $F_{FC}$ to produce three output values. These values are concatenated to form a fused feature vector $\hat{F}$. Then, $\hat{F}$ is fed into a fully connected layer $FC$, and finally, the utility score $q$ of the image is obtained, as represented by the following formulas (a minimal module sketch follows the formulas):
\hat{F} = \mathrm{Concat}\big(F_{FC}(F_{Avgpool}(F_a, F_b, F_c))\big)
q = FC(\hat{F})
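A minimal sketch of such a multi-scale quality prediction head is given below, assuming the ResNet50 channel widths from the previous sketch; the actual layer sizes in DeepUIUA may differ.

```python
import torch
import torch.nn as nn


class MultiScaleQualityHead(nn.Module):
    """Each scale is globally average-pooled and mapped to one value by its own
    FC layer; the three values are concatenated into the fused vector F_hat and
    regressed to the utility score q by a final FC layer."""

    def __init__(self, channels=(512, 1024, 2048)):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.per_scale_fc = nn.ModuleList(nn.Linear(c, 1) for c in channels)
        self.fc = nn.Linear(len(channels), 1)

    def forward(self, fa, fb, fc):
        outs = []
        for feat, fc_layer in zip((fa, fb, fc), self.per_scale_fc):
            v = self.pool(feat).flatten(1)   # (B, C) pooled descriptor
            outs.append(fc_layer(v))         # (B, 1) per-scale output value
        fused = torch.cat(outs, dim=1)       # (B, 3) fused feature vector F_hat
        return self.fc(fused).squeeze(1)     # (B,) predicted utility score q


head = MultiScaleQualityHead()
fa, fb, fc = torch.randn(2, 512, 28, 28), torch.randn(2, 1024, 14, 14), torch.randn(2, 2048, 7, 7)
print(head(fa, fb, fc).shape)  # torch.Size([2])
```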
Total Loss: In order to optimize the performance of the model, we minimize the $L_1$ loss function between the predicted UIUSs and the baseline UIUSs. The $L_1$ loss function is defined as follows:
L_1 = \sum_{i=1}^{N} \left| q_i - s_i \right|
where $N$ is the total number of training images, and $q_i$ and $s_i$ denote the predicted UIUS and the baseline UIUS of the $i$-th training image, respectively.
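In PyTorch, this corresponds to nn.L1Loss with a sum reduction over the mini-batch (a mean reduction would only rescale the gradients); the values below are purely illustrative.

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss(reduction="sum")       # sum of absolute errors, matching the formula above
pred = torch.tensor([0.45, 0.62, 0.80])      # predicted UIUSs (illustrative values)
target = torch.tensor([0.50, 0.60, 0.75])    # baseline UIUSs (illustrative values)
loss = criterion(pred, target)               # = |0.45-0.50| + |0.62-0.60| + |0.80-0.75| = 0.12
```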

3. Results

In this section, we first detail the experimental protocol, and then compare the performance of DeepUIUA with that of the SOTA NR-IQA methods.

3.1. Experimental Protocol

3.1.1. Utility Score Generation Details

As stated, the UIUS of each image is obtained by averaging the mean Average Precision (mAP) values generated by applying the five object detectors RetinaNet [56], FCOS [57], TOOD [58], Faster R-CNN [59], and Cascade R-CNN [60] to each image. Specifically, these object detectors are implemented on the MMDetection [63] platform. We use the same SGD optimizer with the same parameters for all detectors during training. All momentum values are set to 0.9, and all weight decay values are set to $1 \times 10^{-4}$.
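For reference, these shared settings correspond to an MMDetection-style optimizer configuration such as the following sketch; the learning rate and schedule are not specified here, and the values shown are placeholders.

```python
# MMDetection (2.x) style config fragment; lr and schedule values are placeholders.
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=1e-4)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', step=[8, 11])  # placeholder schedule
```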

3.1.2. Training Details of DeepUIUA

In the first stage, to train the object detection network, we utilize images from the raw subset mentioned in Section 2.2.3 and their corresponding object detection labels from the DUO dataset. In addition, to ensure that the network sufficiently exploits image details, the original resolutions of the raw subset images are maintained during this stage. In the second stage, to train the quality prediction network, we utilize images from the OD-UIUA dataset and their corresponding UIUSs. Meanwhile, we uniformly resize the input images to a resolution of 224 × 224. The proposed DeepUIUA is implemented in the PyTorch framework. The Adam optimizer with an initial learning rate of $1 \times 10^{-4}$ and a training batch size of 32 is used for training the proposed DeepUIUA on a server with an NVIDIA RTX 3090 GPU.
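Under these settings, the stage-2 training loop can be sketched as follows; `extractor` and `head` refer to the hypothetical modules sketched earlier, `train_set` is assumed to yield (image, UIUS) pairs already resized to 224 × 224, and the number of epochs is an assumption since it is not reported.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def train_stage2(extractor, head, train_set, num_epochs=50, device="cuda"):
    """Stage-2 training sketch: Adam, lr 1e-4, batch size 32, 224 x 224 inputs.
    num_epochs is an assumption (not reported)."""
    loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)
    params = list(extractor.parameters()) + list(head.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-4)
    criterion = nn.L1Loss(reduction="sum")
    extractor.to(device).train()
    head.to(device).train()

    for _ in range(num_epochs):
        for images, scores in loader:
            images, scores = images.to(device), scores.float().to(device)
            feats = extractor(images)                         # {'Fa':..., 'Fb':..., 'Fc':...}
            pred = head(feats["Fa"], feats["Fb"], feats["Fc"])
            loss = criterion(pred, scores)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return extractor, head
```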

3.1.3. Evaluation Metrics

To evaluate the prediction performance of DeepUIUA, we use four widely adopted statistical metrics: the Spearman Rank Correlation Coefficient (SRCC), Kendall Rank Correlation Coefficient (KRCC), Pearson Linear Correlation Coefficient (PLCC), and Root Mean Square Error (RMSE). Higher values of SRCC, KRCC, and PLCC and lower values of RMSE indicate better predictive performance.
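These four metrics can be computed directly from the predicted and ground-truth UIUSs, for example with SciPy as sketched below; note that PLCC and RMSE are sometimes reported after a nonlinear logistic mapping of the predictions, which is omitted here for brevity.

```python
import numpy as np
from scipy import stats


def iqa_metrics(pred, gt):
    """SRCC, KRCC, PLCC, and RMSE between predicted and ground-truth UIUSs."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    srcc, _ = stats.spearmanr(pred, gt)
    krcc, _ = stats.kendalltau(pred, gt)
    plcc, _ = stats.pearsonr(pred, gt)
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    return srcc, krcc, plcc, rmse


# Example usage with illustrative values.
print(iqa_metrics([0.30, 0.55, 0.72, 0.81], [0.28, 0.60, 0.70, 0.85]))
```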

3.2. Performance Comparison with the State-of-the-Art Methods

We compare DeepUIUA with 14 SOTA deep learning-based NR-IQA methods, including CNNIQA [64], IQA-CNN+ [65], WaDIQaM-NR [66], DBCNN [67], HyperIQA [68], MetaIQA [69], TReS [70], DACNN [71], VCRNet [72], DEIQT [73], StairIQA [74], TTA-IQA [75], TOPIQ [76], and ATUIQP [32]. The first 13 methods focus on natural image visual quality assessment, and the last method, ATUIQP, is specifically designed for underwater image visual quality assessment. For the evaluation experiments, the OD-UIUA dataset is divided into 6 groups. Each group contains 2200 images: 200 raw underwater images and their 2000 corresponding enhanced images. The performance of each model is evaluated via cross-validation: in each round of experiments, one group is used as the testing set, and the remaining five groups are used as the training set. We conduct training and testing on each group in turn, and the final evaluation metrics are the averages across the 6 experiments. Table 1 presents the comparison results between DeepUIUA and these 14 SOTA NR-IQA methods.
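The group-wise split can be sketched as follows, under the assumption that the raw images are partitioned into six groups of 200 and that each raw image carries its ten enhanced versions into the same group, so training and testing folds share no content; the training and evaluation helpers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
raw_ids = np.arange(1200)
groups = np.array_split(rng.permutation(raw_ids), 6)   # 6 groups of 200 raw images

fold_metrics = []
for k in range(6):
    test_raw = set(groups[k].tolist())
    train_raw = set(raw_ids.tolist()) - test_raw
    # Each raw id brings its raw image plus 10 enhanced counterparts into its fold.
    # model = train_deepuiua(train_raw)            # hypothetical helper
    # pred, gt = predict_uius(model, test_raw)     # hypothetical helper
    # fold_metrics.append(iqa_metrics(pred, gt))
final_scores = np.mean(fold_metrics, axis=0) if fold_metrics else None  # averaged SRCC/KRCC/PLCC/RMSE
```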
Three key conclusions can be drawn from the results in Table 1. First, the performance of NR-IQA methods with pre-trained initialization remarkably surpasses that of NR-IQA methods without pre-trained initialization. Specifically, the NR-IQA methods that integrate pre-trained initialization, such as StairIQA, HyperIQA, and DEIQT, outperform the NR-IQA methods without pre-trained initialization, such as CNNIQA, WaDIQaM-NR, and ATUIQP, across all metrics. This is because pre-trained initialization provides the network with prior knowledge, which enables it to better understand and adapt to the features of underwater images. Second, ATUIQP, which is specifically designed for UIQA, performs worst among all compared methods. Finally, our proposed DeepUIUA achieves the best performance compared with the existing SOTA NR-IQA methods. The combination of the pre-training technique and multi-scale features designed for underwater machine vision tasks enables DeepUIUA to accurately assess the utility of an underwater image.
As shown in Figure 10, to intuitively visualize the consistency between UIUSs and DeepUIUA prediction scores, we select six underwater images representing different levels of utility from the OD-UIUA dataset. As the UIUS increases, the localization accuracy and detection confidence are clearly improved. For example, comparing image (a) with image (f), image (a), with a UIUS of 0.341, results in poor object localization accuracy and a low confidence level of 50–65%, as marked by the boxes. In contrast, image (f), with a UIUS of 0.807, shows accurate object localization with a high confidence level of above 80%. In addition, the quality scores predicted by our proposed DeepUIUA are highly consistent with the UIUSs, further demonstrating the effectiveness of DeepUIUA for UIUA.
To evaluate the computational complexity of each NR-IQA model, we conduct a systematic comparative analysis of the parameter counts, FLOPs, and running times of the existing NR-IQA methods. To ensure a fair comparison, all NR-IQA methods are tested on a 224 × 224 image, and the running times are measured on an NVIDIA RTX 4090 GPU. The detailed results are shown in Table 2. From the table, we can see that although IQA-CNN++ performs best in terms of computational complexity, it performs poorly in terms of the quality assessment metrics. In contrast, although our proposed method does not have an advantage in computational complexity, it achieves state-of-the-art performance in terms of the quality assessment metrics. In summary, our proposed method strikes the best balance between performance and computational complexity.
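A minimal sketch of such a measurement is given below, covering the parameter count and the average forward time on a 224 × 224 input; FLOP counting would additionally require a profiler (e.g., fvcore or ptflops), which is omitted here.

```python
import time
import torch


def complexity_report(model, device="cuda", repeats=100):
    """Parameter count and average forward time (ms) on a 224 x 224 input."""
    model = model.to(device).eval()
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(1, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(10):                      # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(repeats):
            model(x)
        torch.cuda.synchronize()
        avg_ms = (time.perf_counter() - start) * 1000 / repeats
    return n_params, avg_ms
```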

4. Discussion

4.1. Ablation Study

4.1.1. Ablation of the Proposed Components

To validate the effectiveness of the components in DeepUIUA, we conduct a series of ablation experiments. The experimental results are summarized in Table 3. Specifically, "BL" (Baseline) refers to the generic pre-trained ResNet50 network without training on underwater object detection data, which serves as the baseline for the subsequent addition of components. "PT" refers to the Pre-Trained backbone network obtained from the first stage of DeepUIUA, and "MSQP" refers to the Multi-Scale Quality Prediction module in the second stage of DeepUIUA. Our proposed network relies on these two components to improve underwater image utility assessment, so an ablation study is required to validate their effectiveness. The four ablation models, AM1–AM4, represent BL only, BL+PT, BL+MSQP, and BL+PT+MSQP (i.e., the complete stage-2 network), respectively.
From the table, we can see that adding any component to the baseline effectively improves the performance of the model. Specifically, by comparing the ablation model AM1 with AM2 and AM3, we observe that both PT and MSQP improve the performance of the model, with PT showing a more remarkable impact. For example, compared with model AM1, the PLCC values of models AM2 and AM3 increase by 0.0461 and 0.0193, respectively. Comparing the ablation model AM4 with AM2 and AM3, the combination of PT and MSQP further improves the overall performance of the model. For example, compared with models AM2 and AM3, the PLCC value of model AM4 increases by 0.0341 and 0.0609, respectively. To sum up, both PT and MSQP, which combine the pre-training technique and multi-scale features, play a crucial role in improving the performance of the UIUA model.

4.1.2. Visual Explanation

To deeply understand the merits of simultaneously using PT and MSQP from another view, we present the visual activations on six representative samples from the OD-UIUA dataset, as shown in Figure 11. These visual activations are the Grad-CAM maps of the features output by the four ablation models. The six representative samples in column (a) include two raw underwater images in the first and fourth rows and their corresponding four enhanced images in the remaining rows. The Grad-CAM activation maps of these six samples for AM1–AM4 are shown in columns (b)–(e). Specifically, comparing the images in the first three rows of columns (b) and (c) and the images in the last three rows of columns (b) and (d), the activation in echinus regions when using PT alone or MSQP alone is stronger than that of BL alone. Meanwhile, comparing all the images in columns (b)–(d) with those in column (e), the activation on echinus regions when using PT and MSQP simultaneously is stronger than that of the other models, which indicates that simultaneously using PT and MSQP effectively guides the network to focus on underwater objects and provides these critical features for prediction. Thus, our proposed DeepUIUA, which combines PT and MSQP, is highly suitable for analyzing the utility of underwater images.

4.2. Limitations

Based on the previous experimental verification, the effectiveness of the DeepUIUA method has been fully demonstrated. However, in practical applications, there is still room for improvement in particular situations. In the three typical images shown in Figure 12, there is a notable disparity between the UIUS and the predicted score, which may be due to the impact of various degradation and distortion factors in the underwater environment, resulting in biased predictions. In addition, due to the complexity and diversity of underwater environments, the current OD-UIUA dataset is limited in scale. In future research, we intend to expand our exploration of different underwater environments and biological species by increasing the size of the dataset and further optimizing the assessment algorithm. At the same time, based on the evaluation results, we will implement targeted improvements to underwater image enhancement and object detection techniques, thereby promoting the overall progress of underwater image processing technology.

5. Conclusions

In this work, we explore UIUA from two aspects. First, we construct the large-scale OD-UIUA dataset, which contains 1200 raw underwater images, the corresponding 12,000 enhanced results produced by 10 representative UIE algorithms, and 13,200 utility scores for all images. The proposed OD-UIUA dataset not only provides a fair and reliable benchmark for the UIUA task but also promotes the further development of machine vision-oriented UIE algorithms. Second, we propose a two-stage UIUA model, DeepUIUA, which accurately predicts utility scores by fully exploiting the merits of the pre-training technique and multi-scale features. Extensive performance comparisons with 14 SOTA NR-IQA methods on the OD-UIUA dataset demonstrate the superiority of the proposed DeepUIUA, which exceeds the suboptimal methods by more than 10% in multiple evaluation metrics and is expected to play a significant role in future machine vision applications.

Author Contributions

Methodology, J.L.; investigation, Y.L.; supervision, Q.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of China (62271277), in part by the Natural Science Foundation of Zhejiang (LR22F020002), and in part by the Natural Science Foundation of Ningbo (2022J081).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, Z.; He, Z.; Jin, C.; Luo, T.; Chen, Y. Joint Luminance-Saliency Prior and Attention for Underwater Image Quality Assessment. Remote Sens. 2024, 16, 3021. [Google Scholar] [CrossRef]
  2. Li, F.; Li, W.; Zheng, J.; Wang, L.; Xi, Y. Contrastive Feature Disentanglement via Physical Priors for Underwater Image Enhancement. Remote Sens. 2025, 17, 759. [Google Scholar] [CrossRef]
  3. Saleem, A.; Awad, A.; Paheding, S.; Lucas, E.; Havens, T.C.; Esselman, P.C. Understanding the influence of image enhancement on underwater object detection: A quantitative and qualitative study. Remote Sens. 2025, 17, 185. [Google Scholar] [CrossRef]
  4. Jaffe, J.S. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Ocean. Eng. 1990, 15, 101–111. [Google Scholar] [CrossRef]
  5. Li, L.; Li, Y.; Wang, H.; Yue, C.; Gao, P.; Wang, Y.; Feng, X. Side-Scan Sonar Image Generation Under Zero and Few Samples for Underwater Target Detection. Remote Sens. 2024, 16, 4134. [Google Scholar] [CrossRef]
  6. Hao, Y.; Yuan, Y.; Zhang, H.; Zhang, Z. Underwater Optical Imaging: Methods, Applications and Perspectives. Remote Sens. 2024, 16, 3773. [Google Scholar] [CrossRef]
  7. Wen, X.; Wang, J.; Cheng, C.; Zhang, F.; Pan, G. Underwater side-scan sonar target detection: YOLOv7 model combined with attention mechanism and scaling factor. Remote Sens. 2024, 16, 2492. [Google Scholar] [CrossRef]
  8. Esmaeilzehi, A.; Ou, Y.; Ahmad, M.O.; Swamy, M.N.S. DMML: Deep Multi-Prior and Multi-Discriminator Learning for Underwater Image Enhancement. IEEE Trans. Broadcast. 2024, 70, 637–653. [Google Scholar] [CrossRef]
  9. Qiao, N.; Dong, L.; Sun, C. Adaptive deep learning network with multi-scale and multi-dimensional features for underwater image enhancement. IEEE Trans. Broadcast. 2022, 69, 482–494. [Google Scholar] [CrossRef]
  10. Song, W.; Wang, Y.; Huang, D.; Liotta, A.; Perra, C. Enhancement of underwater images with statistical model of background light and optimization of transmission map. IEEE Trans. Broadcast. 2020, 66, 153–169. [Google Scholar] [CrossRef]
  11. Kang, Y.; Jiang, Q.; Li, C.; Ren, W.; Liu, H.; Wang, P. A Perception-Aware Decomposition and Fusion Framework for Underwater Image Enhancement. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 988–1002. [Google Scholar] [CrossRef]
  12. Jiang, Q.; Kang, Y.; Wang, Z.; Ren, W.; Li, C. Perception-Driven Deep Underwater Image Enhancement Without Paired Supervision. IEEE Trans. Multimed. 2024, 26, 4884–4897. [Google Scholar] [CrossRef]
  13. Liu, Y.; Jiang, Q.; Wang, X.; Luo, T.; Zhou, J. Underwater Image Enhancement with Cascaded Contrastive Learning. IEEE Trans. Multimed. 2024, 27, 1512–1525. [Google Scholar] [CrossRef]
  14. Liao, X.; Wei, X.; Zhou, M.; Kwong, S. Full-Reference Image Quality Assessment: Addressing Content Misalignment Issue by Comparing Order Statistics of Deep Features. IEEE Trans. Broadcast. 2024, 70, 305–315. [Google Scholar] [CrossRef]
  15. Zhou, T.; Tan, S.; Zhou, W.; Luo, Y.; Wang, Y.G.; Yue, G. Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment. IEEE Trans. Broadcast. 2024, 70, 833–843. [Google Scholar] [CrossRef]
  16. Hu, B.; Zhao, T.; Zheng, J.; Zhang, Y.; Li, L.; Li, W.; Gao, X. Blind Image Quality Assessment with Coarse-Grained Perception Construction and Fine-Grained Interaction Learning. IEEE Trans. Broadcast. 2024, 70, 533–544. [Google Scholar] [CrossRef]
  17. Zhou, M.; Lang, S.; Zhang, T.; Liao, X.; Shang, Z.; Xiang, T.; Fang, B. Attentional Feature Fusion for End-to-End Blind Image Quality Assessment. IEEE Trans. Broadcast. 2023, 69, 144–152. [Google Scholar] [CrossRef]
  18. Zhou, M.; Wang, H.; Wei, X.; Feng, Y.; Luo, J.; Pu, H.; Zhao, J.; Wang, L.; Chu, Z.; Wang, X.; et al. HDIQA: A Hyper Debiasing Framework for Full Reference Image Quality Assessment. IEEE Trans. Broadcast. 2024, 70, 545–554. [Google Scholar] [CrossRef]
  19. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2017, 27, 379–393. [Google Scholar] [CrossRef]
  20. Wang, Y.; Song, W.; Fortino, G.; Qi, L.Z.; Zhang, W.; Liotta, A. An experimental-based review of image enhancement and image restoration methods for underwater imaging. IEEE Access 2019, 7, 140233–140251. [Google Scholar] [CrossRef]
  21. Jiang, Q.; Liu, Z.; Gu, K.; Shao, F.; Zhang, X.; Liu, H.; Lin, W. Single image super-resolution quality assessment: A real-world dataset, subjective studies, and an objective metric. IEEE Trans. Image Process. 2022, 31, 2279–2294. [Google Scholar] [CrossRef] [PubMed]
  22. Jiang, Q.; Yi, X.; Ouyang, L.; Zhou, J.; Wang, Z. Towards dimension-enriched underwater image quality assessment. IEEE Trans. Circuits Syst. Video Technol. 2024, 35, 1385–1398. [Google Scholar] [CrossRef]
  23. Jiang, Q.; Gu, Y.; Li, C.; Cong, R.; Shao, F. Underwater Image Enhancement Quality Evaluation: Benchmark Dataset and Objective Metric. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5959–5974. [Google Scholar] [CrossRef]
  24. Zheng, Y.; Chen, W.; Lin, R.; Zhao, T.; Le Callet, P. UIF: An objective quality assessment for underwater image enhancement. IEEE Trans. Image Process. 2022, 31, 5456–5468. [Google Scholar] [CrossRef]
  25. Cheng, J.; Wu, Z.; Wang, S.; Demonceaux, C.; Jiang, Q. Bidirectional collaborative mentoring network for marine organism detection and beyond. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6595–6608. [Google Scholar] [CrossRef]
  26. Jiang, Q.; Liu, Z.; Wang, S.; Shao, F.; Lin, W. Toward top-down just noticeable difference estimation of natural images. IEEE Trans. Image Process. 2022, 31, 3697–3712. [Google Scholar] [CrossRef] [PubMed]
  27. Jiang, Q.; Gu, Y.; Wu, Z.; Li, C.; Xiong, H.; Shao, F.; Wang, Z. Deep Underwater Image Quality Assessment with Explicit Degradation Awareness Embedding. IEEE Trans. Image Process. 2025, 34, 1297–1310. [Google Scholar] [CrossRef] [PubMed]
  28. Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-World Underwater Enhancement: Challenges, Benchmarks, and Solutions Under Natural Light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875. [Google Scholar] [CrossRef]
  29. Li, H.; Li, J.; Wang, W. A fusion adversarial underwater image enhancement network with a public test dataset. arXiv 2019, arXiv:1906.06819. [Google Scholar]
  30. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  31. Islam, M.J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  32. Liu, Y.; Zhang, B.; Hu, R.; Gu, K.; Zhai, G.; Dong, J. Underwater Image Quality Assessment: Benchmark Database and Objective Method. IEEE Trans. Multimed. 2024, 26, 7734–7747. [Google Scholar] [CrossRef]
  33. Zhou, J.; Wang, S.; Lin, Z.; Jiang, Q.; Sohel, F. A pixel distribution remapping and multi-prior retinex variational model for underwater image enhancement. IEEE Trans. Multimed. 2024, 26, 7838–7849. [Google Scholar] [CrossRef]
  34. Jiang, Q.; Mao, Y.; Cong, R.; Ren, W.; Huang, C.; Shao, F. Unsupervised decomposition and correction network for low-light image enhancement. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19440–19455. [Google Scholar] [CrossRef]
  35. Fu, X.; Cao, X. Underwater image enhancement with global–local networks and compressed-histogram equalization. Signal Process. Image Commun. 2020, 86, 115892. [Google Scholar] [CrossRef]
  36. Chen, X.; Zhang, P.; Quan, L.; Yi, C.; Lu, C. Underwater Image Enhancement based on Deep Learning and Image Formation Model. arXiv 2021, arXiv:2101.00991. [Google Scholar]
  37. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed Model for Underwater Image Enhancement (Student Abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 15853–15854. [Google Scholar]
  38. Zhang, W.; Zhuang, P.; Sun, H.H.; Li, G.; Kwong, S.; Li, C. Underwater Image Enhancement via Minimal Color Loss and Locally Adaptive Contrast Enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010. [Google Scholar] [CrossRef]
  39. Xiao, Z.; Han, Y.; Rahardja, S.; Ma, Y. USLN: A statistically guided lightweight network for underwater image enhancement via dual-statistic white balance and multi-color space stretch. arXiv 2022, arXiv:2209.02221. [Google Scholar]
  40. Guo, C.; Wu, R.; Jin, X.; Han, L.; Zhang, W.; Chai, Z.; Li, C. Underwater ranker: Learn which is better and how to be better. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 702–709. [Google Scholar]
  41. Li, K.; Wu, L.; Qi, Q.; Liu, W.; Gao, X.; Zhou, L.; Song, D. Beyond single reference for training: Underwater image enhancement via comparative learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 2561–2576. [Google Scholar] [CrossRef]
  42. Jiang, J.; Ye, T.; Bai, J.; Chen, S.; Chai, W.; Jun, S.; Liu, Y.; Chen, E. Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement. arXiv 2023, arXiv:2305.08824. [Google Scholar]
  43. Zhang, W.; Zhou, L.; Zhuang, P.; Li, G.; Pan, X.; Zhao, W.; Li, C. Underwater image enhancement via weighted wavelet visual perception fusion. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2469–2483. [Google Scholar] [CrossRef]
  44. Rao, Y.; Liu, W.; Li, K.; Fan, H.; Wang, S.; Dong, J. Deep color compensation for generalized underwater image enhancement. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2577–2590. [Google Scholar] [CrossRef]
  45. Wang, Y.; Guo, J.; He, W.; Gao, H.; Yue, H.; Zhang, Z.; Li, C. Is underwater image enhancement all object detectors need? IEEE J. Ocean. Eng. 2023, 49, 606–621. [Google Scholar] [CrossRef]
  46. Zhou, J.; Liu, C.; Zhang, D.; He, Z.; Sohel, F.; Jiang, Q. RSUIA: Dynamic No-Reference Underwater Image Assessment via Reinforcement Sequences. IEEE Trans. Multimed. 2025. early access. [Google Scholar] [CrossRef]
  47. Yang, M.; Sowmya, A. An underwater color image quality evaluation metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef]
  48. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Ocean. Eng. 2015, 41, 541–551. [Google Scholar] [CrossRef]
  49. Wang, Y.; Li, N.; Li, Z.; Gu, Z.; Zheng, H.; Zheng, B.; Sun, M. An imaging-inspired no-reference underwater color image quality assessment metric. Comput. Electr. Eng. 2018, 70, 904–913. [Google Scholar] [CrossRef]
  50. Yang, N.; Zhong, Q.; Li, K.; Cong, R.; Zhao, Y.; Kwong, S. A reference-free underwater image quality assessment metric in frequency domain. Signal Process. Image Commun. 2021, 94, 116218. [Google Scholar] [CrossRef]
  51. Guo, P.; Liu, H.; Zeng, D.; Xiang, T.; Li, L.; Gu, K. An underwater image quality assessment metric. IEEE Trans. Multimed. 2022, 25, 5093–5106. [Google Scholar] [CrossRef]
  52. Wang, Z.; Shen, L.; Wang, Z.; Lin, Y.; Jin, Y. Generation-based joint luminance-chrominance learning for underwater image quality assessment. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1123–1139. [Google Scholar] [CrossRef]
  53. Liu, Y.; Gu, K.; Cao, J.; Wang, S.; Zhai, G.; Dong, J.; Kwong, S. UIQI: A comprehensive quality evaluation index for underwater images. IEEE Trans. Multimed. 2023, 26, 2560–2573. [Google Scholar] [CrossRef]
  54. Jin, J.; Jiang, Q.; Wu, Q.; Xu, B.; Cong, R. Underwater Salient Object Detection via Dual-stage Self-paced Learning and Depth Emphasis. IEEE Trans. Circuits Syst. Video Technol. 2024, 35, 2147–2160. [Google Scholar] [CrossRef]
  55. Liu, C.; Li, H.; Wang, S.; Zhu, M.; Wang, D.; Fan, X.; Wang, Z. A dataset and benchmark of underwater object detection for robot picking. In Proceedings of the 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shenzhen, China, 5–9 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
  56. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  57. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1922–1933. [Google Scholar] [CrossRef] [PubMed]
  58. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. Tood: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE Computer Society: Washington, DC, USA, 2021; pp. 3490–3499. [Google Scholar]
  59. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  60. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
  61. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  62. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  63. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open mmlab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155. [Google Scholar]
  64. Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740. [Google Scholar]
  65. Kang, L.; Ye, P.; Li, Y.; Doermann, D. Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 2791–2795. [Google Scholar]
  66. Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 2017, 27, 206–219. [Google Scholar] [CrossRef]
  67. Zhang, W.; Ma, K.; Yan, J.; Deng, D.; Wang, Z. Blind Image Quality Assessment Using a Deep Bilinear Convolutional Neural Network. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 36–47. [Google Scholar] [CrossRef]
  68. Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3667–3676. [Google Scholar]
  69. Zhu, H.; Li, L.; Wu, J.; Dong, W.; Shi, G. MetaIQA: Deep meta-learning for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14143–14152. [Google Scholar]
  70. Golestaneh, S.A.; Dadsetan, S.; Kitani, K.M. No-reference image quality assessment via transformers, relative ranking, and self-consistency. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 1220–1230. [Google Scholar]
  71. Pan, Z.; Zhang, H.; Lei, J.; Fang, Y.; Shao, X.; Ling, N.; Kwong, S. DACNN: Blind image quality assessment via a distortion-aware convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7518–7531. [Google Scholar] [CrossRef]
  72. Pan, Z.; Yuan, F.; Lei, J.; Fang, Y.; Shao, X.; Kwong, S. VCRNet: Visual compensation restoration network for no-reference image quality assessment. IEEE Trans. Image Process. 2022, 31, 1613–1627. [Google Scholar] [CrossRef]
  73. Qin, G.; Hu, R.; Liu, Y.; Zheng, X.; Liu, H.; Li, X.; Zhang, Y. Data-efficient image quality assessment with attention-panel decoder. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2091–2100. [Google Scholar]
  74. Sun, W.; Min, X.; Tu, D.; Ma, S.; Zhai, G. Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training. IEEE J. Sel. Top. Signal Process. 2023, 17, 1178–1192. [Google Scholar] [CrossRef]
  75. Roy, S.; Mitra, S.; Biswas, S.; Soundararajan, R. Test time adaptation for blind image quality assessment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 16742–16751. [Google Scholar]
  76. Chen, C.; Mo, J.; Hou, J.; Wu, H.; Liao, L.; Sun, W.; Yan, Q.; Lin, W. Topiq: A top-down approach from semantics to distortions for image quality assessment. IEEE Trans. Image Process. 2024, 33, 2404–2418. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Raw underwater image samples of the OD-UIUA dataset. These images not only include a variety of biological categories but also cover typical degraded scenes with different degrees of color cast, blur, low light, reduced contrast, etc.
Figure 2. Raw underwater images and corresponding enhanced results by 10 representative UIE algorithms: (a) raw images, (b) CLUIE-Net [41], (c) DLIFM [36], (d) FiveAPlus-Net [42], (e) GLCHE [35], (f) MLLE [38], (g) P2CNet [44], (h) Shallow-UWNet [37], (i) USLN [39], (j) NU2Net [40], and (k) WWPF [43].
Figure 3. Flowchart of the process from dataset construction to UIUS generation.
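As a rough illustration of the pipeline in Figure 3, one plausible way to turn detection performance into a utility score is to run several pretrained object detectors on an image and average a per-image accuracy measure computed against the shared object annotations. The sketch below is conceptual only and is not the procedure used in the paper; `detectors` and `per_image_ap` are hypothetical placeholders rather than functions from any specific library.

```python
# Conceptual sketch only (not the paper's implementation): aggregate per-image
# detection accuracy from several detectors into a single utility score.
from typing import Callable, List, Sequence


def utility_score(image, gt_boxes: Sequence,
                  detectors: List[Callable],
                  per_image_ap: Callable) -> float:
    """Average a per-image AP-style measure over multiple detectors.

    Each element of `detectors` is a callable returning predicted boxes for an
    image; `per_image_ap` is a hypothetical helper that scores predictions
    against the ground-truth annotations and returns a value in [0, 1].
    """
    scores = [per_image_ap(detect(image), gt_boxes) for detect in detectors]
    return sum(scores) / len(scores)
```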
Figure 4. The proportions of different object categories in the OD-UIUA dataset.
Figure 5. Distributions of the number of objects and categories per image in the OD-UIUA dataset: (a) proportion of images with a given number of objects and (b) proportion of images with a given number of categories.
Figure 6. The distribution of UIUS.
Figure 7. The UIUS of subsets obtained by different UIE methods. D1: CLUIE-Net [41], D2: DLIFM [36], D3: FiveAPlus-Net [42], D4: GLCHE [35], D5: MLLE [38], D6: P2CNet [44], D7: Raw, D8: Shallow-UWNet [37], D9: USLN [39], D10: NU2Net [40], and D11: WWPF [43].
Figure 8. UIUS comparison between raw images and enhanced images.
Figure 9. Framework of the proposed DeepUIUA.
Figure 10. Underwater sample images at different quality levels, shown with their labeled UIUSs and the UIUSs predicted by DeepUIUA (UIUS/DeepUIUA score).
Figure 11. The activations of feature sets extracted by different ablation models under different images and enhancement methods: (a) input image, (b) AM1, (c) AM2, (d) AM3, and (e) AM4.
Figure 12. Some failure cases of DeepUIUA (UIUS/DeepUIUA score).
Table 1. Performance comparison of 14 state-of-the-art methods and the proposed DeepUIUA on the OD-UIUA dataset.

Comparison Method     PLCC     SRCC     KRCC     RMSE
CNN-IQA [64]          0.4625   0.4450   0.3052   0.1536
IQA-CNN++ [65]        0.4529   0.4435   0.3037   0.1548
WaDIQaM-NR [66]       0.4112   0.3729   0.2537   0.1632
DBCNN [67]            0.4788   0.4647   0.3210   0.1569
HyperIQA [68]         0.5301   0.5036   0.3517   0.1470
MetaIQA [69]          0.5070   0.4907   0.3423   0.1538
TRes [70]             0.3444   0.3410   0.2314   0.1895
DACNN [71]            0.4413   0.4117   0.2817   0.1559
VCRNet [72]           0.4580   0.4279   0.2947   0.1557
DEIQT [73]            0.5287   0.4912   0.3416   0.1549
StairIQA [74]         0.5307   0.4971   0.3486   0.1474
TTA-IQA [75]          0.3617   0.3270   0.2220   0.1803
TOPIQ [76]            0.5179   0.4891   0.3407   0.1528
ATUIQP [32]           0.2512   0.2214   0.1476   0.1737
Ours                  0.6004   0.5595   0.3963   0.1386
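For reference, the four criteria in Table 1 can be computed from predicted and labeled utility scores as follows. This is a minimal sketch using NumPy/SciPy rather than the authors' evaluation code, and it omits the nonlinear logistic regression that IQA studies often apply to predictions before computing PLCC and RMSE.

```python
# Minimal sketch (not the authors' code): PLCC, SRCC, KRCC, and RMSE between
# predicted utility scores and labeled UIUSs.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau


def evaluate_uiua(pred, gt):
    """Return (PLCC, SRCC, KRCC, RMSE) for two 1-D score arrays."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    plcc, _ = pearsonr(pred, gt)        # linear correlation
    srcc, _ = spearmanr(pred, gt)       # rank-order correlation
    krcc, _ = kendalltau(pred, gt)      # pairwise rank agreement
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    return plcc, srcc, krcc, rmse


# Example usage with dummy scores:
# plcc, srcc, krcc, rmse = evaluate_uiua([0.61, 0.42, 0.80], [0.58, 0.40, 0.77])
```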
Table 2. Parameters, FLOPs, and running times of different NR-IQA methods.

Comparison Method     Params (MB)   FLOPs (G)   Running Times (s)
CNN-IQA [64]          0.35          0.73        0.011
IQA-CNN++ [65]        0.05          0.08        0.007
WaDIQaM-NR [66]       3.28          4.98        0.038
DBCNN [67]            16.50         15.31       0.116
HyperIQA [68]         4.34          27.38       0.015
MetaIQA [69]          1.83          13.24       0.034
TRes [70]             8.39          34.46       0.053
DACNN [71]            0.33          2.90        0.143
VCRNet [72]           10.27         11.41       0.043
DEIQT [73]            4.26          22.77       0.019
StairIQA [74]         5.11          30.49       0.015
TTA-IQA [75]          8.30          34.40       0.043
ATUIQP [32]           4.87          28.10       0.019
Ours                  25.61         4.13        0.106
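Complexity figures of the kind reported in Table 2 can be approximated with standard PyTorch tooling. The sketch below uses the third-party thop profiler, a 224 × 224 RGB input, and GPU timing with warm-up; these are assumptions for illustration, since the exact profiling setup behind Table 2 is not described here.

```python
# Minimal sketch (assumed tooling, not necessarily the authors' setup):
# parameter count, FLOPs (MACs), and average running time of an NR-IQA model.
import time
import torch
from thop import profile  # third-party profiler: pip install thop


def profile_model(model, input_size=(1, 3, 224, 224), device="cuda", runs=50):
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    macs, params = profile(model, inputs=(x,), verbose=False)
    with torch.no_grad():
        for _ in range(10):            # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        torch.cuda.synchronize()
        avg_time = (time.time() - start) / runs
    # millions of parameters, GFLOPs (MAC count), and seconds per image
    return params / 1e6, macs / 1e9, avg_time
```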
Table 3. Composition and performance comparison of ablation models.

Model   BL   PTM   SQP   PLCC     SRCC     KRCC     RMSE
AM1                      0.5202   0.4951   0.3459   0.1488
AM2                      0.5663   0.5263   0.3701   0.1443
AM3                      0.5395   0.5110   0.3570   0.1481
AM4                      0.6004   0.5595   0.3963   0.1386
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
