Improved Binary Classification of Underwater Images Using a Modified ResNet-18 Model

Mehrunnisa; Leszczuk, Mikolaj; Juszka, Dawid; Zhang, Yi

doi:10.3390/electronics14152954

¹ AGH University of Krakow, 30-059 Krakow, Poland

² Xi’an Jiaotong University, Xi’an 710049, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(15), 2954; https://doi.org/10.3390/electronics14152954

Submission received: 28 May 2025 / Revised: 8 July 2025 / Accepted: 21 July 2025 / Published: 24 July 2025

(This article belongs to the Special Issue Recent Advances and Applications of Machine Learning in Pattern Recognition)

Abstract

In recent years, the classification of underwater images has become an active area of research in computer vision because of its applications in marine sciences, aquatic robotics, and sea exploration. Underwater imaging is pivotal for the evaluation of marine ecosystems, the analysis of biological habitats, and the monitoring of underwater infrastructure. Extracting useful information from underwater images is highly challenging due to factors such as light distortion, scattering, poor contrast, and complex foreground patterns. These difficulties cause traditional image processing and machine learning techniques to struggle, which degrades classification performance. Recently, deep learning techniques, especially convolutional neural networks (CNNs), have emerged as influential tools for underwater image classification, delivering noteworthy improvements in accuracy and performance despite these challenges. In this paper, we propose a modified ResNet-18 model for the binary classification of underwater images into raw and enhanced images. In the proposed model, we add new layers, namely linear, rectified linear unit (ReLU), and dropout layers, arranged in a block that is repeated three times to enhance feature extraction and improve learning. This enables the model to learn the complex patterns present in the images in more detail, which improves classification. With these added layers, the proposed model addresses complexities such as noise, distortion, varying illumination conditions, and complex patterns by learning robust features from underwater image datasets. To handle the class imbalance present in the dataset, we apply a data augmentation technique. The proposed model achieved 96% accuracy, 99% precision, 92% sensitivity, 99% specificity, a 95% F1-score, and a 96% Area under the Receiver Operating Characteristic Curve (AUC-ROC) score. These results demonstrate the strength and reliability of the proposed model in handling the challenges posed by underwater imagery, making it a favorable solution for advancing underwater image classification tasks.

Keywords: ResNet-18; CNN; ReLU; dropout

1. Introduction

In recent years, the classification of underwater images has attracted considerable attention from the computer vision research community because of its applications in marine sciences, aquatic robotics, and sea exploration. Underwater imaging plays an important role in the analysis and evaluation of many kinds of underwater objects. This includes the evaluation of marine ecosystems, the analysis of biological habitats, and the inspection of offshore oil and gas pipelines and underwater optical fibers [1]. Such imaging is predominantly carried out with specialized cameras designed for underwater environments, which are commonly encapsulated in waterproof enclosures to prevent water damage [2]. Classifying objects in underwater images is difficult because lighting conditions, color distortion, and scattering degrade the images and reduce their contrast. These effects make it hard to extract useful information about foreground objects using classical digital image processing and classification techniques [3].

In recent years, deep learning-based methods have become influential tools for overcoming these challenges, providing state-of-the-art solutions for underwater image classification tasks. The emergence of deep learning has transformed image analysis by enabling models to learn complex feature representations from raw data. For underwater images, deep learning models, including convolutional neural networks (CNNs), have shown remarkable performance in the classification and recognition of objects in the presence of noise, distortion, and low visibility [4,5].

The CNN-based models can learn robust features from large datasets of underwater images, which can lead to notable improvements in classification accuracy. Despite these improvements, the classification task of underwater images employing deep learning techniques still suffers from various obstacles. The limited availability of high-quality annotated datasets, combined with the inherent variability of underwater conditions, often results in the poor generalization of trained models. Furthermore, underwater images suffer from issues like blur, non-uniform illumination, and degraded color fidelity, which complicates feature extraction. Therefore, designing deep learning models that can adapt to these underwater-specific challenges is an area of ongoing research [6,7].

Recent advances in neural network architectures, such as ResNet, EfficientNet, and attention-based models, have further improved the capabilities of deep learning in underwater image classification. These models introduce more efficient feature extraction mechanisms, allowing better handling of the noise and distortions typical of underwater imagery. Coupling these architectures with advanced training techniques, such as adversarial learning and data augmentation, has shown great promise in this domain. In this paper, we focus on the binary classification of underwater images using ResNet-18, a CNN-based architecture. We use a variety of underwater images to train our modified ResNet-18 model. The proposed model performs well in the presence of noise, varying illumination, distortion, and complex patterns. We conducted various experiments to fine-tune and train the proposed model on a training set and to evaluate its performance on a test set. Moreover, distinguishing between raw and enhanced underwater images is crucial in many practical applications. Enhanced images, often produced using Underwater Image Enhancement (UIE) algorithms, are typically used in tasks such as object detection, habitat analysis, and infrastructure monitoring [8]. In large-scale or automated image processing pipelines, it is not always clear whether an image has already been enhanced. An automatic classification of images into raw or enhanced categories can prevent redundant processing, improve pipeline efficiency, and support consistent dataset management. Thus, a reliable binary classifier is essential for maintaining operational accuracy and integrity in underwater image analysis systems.

The main contributions of this paper are as follows:

We propose a modified ResNet-18 model to perform the binary classification of underwater images.
Three new layers, namely linear, ReLU, and dropout, are added to the modified ResNet-18 model after the frozen layers. These three layers form a block that is repeated three times.
These added layers enable the model to learn the complex patterns present in underwater images, obtain informative features, and subsequently classify the images well.
The additional layers also help the model cope with complexities such as noise, distortion, and varying illumination conditions.
We apply data augmentation to address the issue of class imbalance.

The remaining sections are organized as follows: Section 2 reviews the related literature. Section 3 gives the methodological background of the ResNet-18 architecture. Our proposed modified ResNet-18 model is described and discussed in detail in Section 4, along with a model overview. The experimental results and their analysis are given in Section 5. Finally, this paper is concluded in Section 9.

2. Literature Review

Underwater imaging and its classification have attracted substantial attention from the computer vision research community in recent years because of their importance in marine ecology, biodiversity conservation, and underwater exploration. Researchers have applied a variety of deep learning techniques [4,5] to deal with challenges such as low visibility, color distortion, and limited labeled datasets. The use of transfer learning [4,6,9,10,11], convolutional neural networks (CNNs) [7,11,12,13,14,15,16,17,18], and image enhancement methods [3,12,19,20,21,22,23,24,25,26,27,28,29,30,31] has improved the classification accuracy of underwater species and objects. In this section, we review methodologies ranging from transfer learning with few-shot learning to deep residual networks and YOLOv3 [31] architectures, and we provide insights into how these approaches handle the complexities of underwater environments.

The authors in [4] use transfer learning with few-shot learning to optimize underwater sonar image classification with limited labeled data. They fine-tuned pre-trained deep neural networks and then applied meta-learning techniques to obtain better accuracy. The work presented in [19] proposed a method for the classification of underwater images that combines image enhancement and information quality evaluation to improve the results; the enhancement step improved the visibility of the underwater images. Fu et al. [12] introduced a deep learning-based model for underwater image enhancement that incorporates water body pre-classification: the method first classifies the type of water and then applies enhancement techniques to improve image quality.

A detailed survey of deep learning techniques applied to underwater image classification is presented in [3]. The work in [1] analyzes and compares the performance of deep learning models (VGG-16, EfficientNetB0, SimCLR) in the unsupervised classification of underwater images, using clustering and dimensionality reduction techniques to enhance accuracy and generalization. That study aims to identify the most effective approaches for improving underwater image analysis. Mahmood et al. [20] proposed ResFeats, a technique for underwater image classification that uses residual network-based features. Their framework leverages the strengths of residual networks to optimize feature extraction from underwater images.

In [21], a method was proposed to enhance underwater images and videos by combining multiple image sources using fusion methods. The proposed framework improved visibility and color fidelity by incorporating both original and processed images. The model presented in [32] enhances underwater images through adaptive color correction and data-driven Retinex decomposition, while a hierarchical U-shaped transformer network improves contrast and reduces blur; this approach outperforms existing methods on benchmark datasets. Another approach [22] enhances underwater images by integrating wavelength compensation and dehazing techniques, addressing the color distortion and reduced visibility caused by light absorption and scattering. One approach integrates a light field module and a sketch module to improve color representation and preserve structural details [33], while another combines Transformer and CNN in a parallel fusion design to capture both local and global features, leading to superior PSNR, SSIM, and detection performance [34]. These methods demonstrate significant improvements over traditional techniques in both visual quality and computational efficiency.

In terms of datasets, a method was proposed in [23] to automate the annotation of coral reef survey images using computer vision techniques. The system developed in this work can detect and classify various marine species within underwater images, facilitating efficient data collection and analysis for ecological studies. The method applied deep learning algorithms to improve annotation accuracy and reduce manual effort.

Raveendran et al. [24] provided a comprehensive review of underwater image enhancement techniques, discussing recent trends, challenges, and applications, and classified the existing techniques into traditional and deep learning-based techniques. A detailed review of advanced techniques in underwater image processing, focused on restoration and enhancement, was provided in [25]. It discussed methodologies for resolving challenges such as color distortion, low visibility, and light attenuation in underwater images. The method presented in [26] is a deep learning-based approach to the classification of lake zooplankton using image data. The authors employ convolutional neural networks (CNNs) to automate the identification and categorization of zooplankton species, addressing the limitations of traditional manual classification methods.

Prasetyo et al. [27] proposed a multilevel residual network VGGNet for the classification of fish species. Combining residual connections improved feature extraction and mitigated the vanishing gradient problem. Another approach [7], which integrates computer vision and deep learning methods, was proposed for underwater gesture recognition and worked well in challenging underwater conditions.

The cross-pooled FishNet technique [9], based on transfer learning, was proposed for the classification of fish species in images. It utilized pre-trained deep learning models to improve classification performance and reduce the need for large labeled datasets. In [14], fish species were identified using a CNN-based model trained on synthetic data generated with computer graphics, which effectively augmented the training data and improved classification accuracy. Gori et al. [15] proposed a method to recognize fish species by applying deep learning techniques for image classification. It uses various CNN architectures to improve identification accuracy and other evaluation measures. Another approach, proposed by Guo et al. [35], focused on the identification of underwater sea cucumbers using deep residual networks (ResNets) to improve classification accuracy. Its model architecture addressed various challenges posed by underwater imaging conditions.

There also exists a review for the classification of plankton within ocean ecosystems [36] using computer vision techniques. It reviewed various techniques and algorithms utilized for effective plankton identification and analysis, emphasizing their significance for ecological monitoring. A CNN-based model [16] was implemented by improving the squeeze and excitation blocks for the biometric classification of temperate fish species. Their proposed architecture improved the feature representation and classification accuracy for fish identification tasks. Wang et al. [37] introduced a deep encoding–decoding network architecture for underwater object recognition to improve classification accuracy in challenging underwater environments. This novel architecture extracted features and then effectively reconstructed the images. It addressed various issues such as visibility loss and distortion in a better way.

Another approach equipped with an underwater drone panoramic camera was proposed for automatic recognition of fish using deep learning techniques [2]. This system captured underwater images and analyzed them to enhance fish identification in real time. A methodology presented in [10] transferred the knowledge from the deep learning model for object recognition in low-quality underwater videos. The proposed method uses pre-trained models to improve recognition performance despite the challenges posed by poor visibility and distortion. Szymak et al. [38] proposed a model for underwater object recognition using deep learning techniques. Various neural network architectures were utilized for object classification, dealing with various challenges related to underwater image quality. Another architecture was proposed [11] using the transfer learning technique with the deep CNN model to recognize live fish underwater. The authors of this methodology showed how pre-trained models can be fitted to enhance classification accuracy in challenging underwater conditions.

A deep learning-based technique for the accurate and efficient identification of coral reef fish in underwater images was also proposed by Villon et al. [39]. They applied CNN-based models to improve classification accuracy and performance, addressing challenges in traditional identification methods. Machine learning and deep learning techniques were also applied [40] to identify Posidonia meadows in underwater images. The authors proposed this methodology and compared various models, including convolutional neural networks, to improve the accuracy of underwater vegetation detection. Jin et al. [41] studied the use of deep learning-based architectures for underwater image recognition with small sample-sized datasets. They presented a model to enhance classification accuracy by applying data augmentation and various pre-trained networks. Rathi et al. [17] applied another approach for the classification of underwater fish species using convolutional neural networks and deep learning techniques. Their designed model performed effectively to classify fish species in challenging underwater environments.

In some methodologies for underwater image classification, data augmentation is used alongside a CNN-based model, as in [18]. By enlarging the underwater image dataset, the model trained well and also performed well in challenging environments. Another deep learning-based architecture, DeepFish, was proposed [28] for the precise recognition of live fish underwater. The authors designed a model that effectively addresses the challenges of underwater image quality and fish species variability. Salman et al. [29] presented a deep learning-based approach to the classification of fish species in challenging underwater environments, where the challenges include variable lighting, occlusion, and the varying poses of fish. Convolutional neural networks are used to extract and learn features from underwater images, and the model performed well under these conditions.

Advanced image processing and deep learning techniques are used in [30] to recognize fish in low-resolution underwater images. This work addresses challenges caused by poor image quality, including blurriness and noise, which often hinder accurate species identification; the proposed CNN model improves recognition performance under these conditions. A deep learning model based on the YOLOv3 architecture [31], which was originally designed for real-time object detection [42], was developed for underwater object recognition. YOLOv3 is applied to handle specific challenges in underwater environments, such as low visibility and image distortion. In terms of datasets, an extended version of the marine underwater environment database was designed [43] to support research on underwater image processing and object recognition. Detailed descriptions of the database were provided, including diverse underwater scenes and various marine species.

Challenges of real-world underwater image enhancement under natural light conditions, such as color distortion and low visibility, are addressed in [44,45]. The authors reviewed existing benchmark datasets and evaluation metrics for assessing enhancement algorithms. In addition, they proposed solutions using deep learning techniques to improve image quality in underwater environments. Another approach for the classification of underwater objects was proposed in [6]. The authors addressed the scarcity of labeled sonar data by generating synthetic data, which they used to further train their deep learning models, and applied transfer learning to improve classification accuracy on the sonar image dataset. Lu et al. [46] introduced FDCNet, a filtering convolutional network designed for the classification of marine organisms in underwater environments. They added a novel filter layer to improve the model’s ability to handle underwater image noise and distortion and thereby improve classification.

A technique for adaptive foreground extraction was proposed to improve fish classification in underwater imagery using deep learning models [5]. Separating fish from complex underwater backgrounds enhances the performance of classification models. Their proposed method works well under varying environmental conditions, such as lighting and water clarity, to better segment fish in the images.

Several recent studies have investigated advanced techniques for underwater object detection and classification using deep learning. Pachaiyappan et al. proposed an approach that integrates diffusion models with the Convolutional Block Attention Module (CBAM) and the modified sweep transformer block (MSTB) to enhance underwater image quality. Their method effectively addresses challenges such as water turbidity and variable lighting conditions, resulting in improved accuracy for object detection and classification [47]. Similarly, Roy and Talukder developed a deep learning-based underwater object detection system, integrating these models with autonomous robots for real-time image analysis [48]. Their study emphasizes the importance of efficient processing for underwater exploration and maintenance, although specific performance metrics were not detailed.

The classification of underwater images has been approached using various methods, including transfer learning models [4,9], CNN-based classification [3,20], and image enhancement techniques [21,22]. However, these methods often face limitations such as dataset dependency, inadequate feature extraction, and poor integration of enhancement and classification processes. YOLO-based models [2,31] improve processing speed but struggle with high noise levels and low-resolution images, which affects classification accuracy. In contrast, our proposed model is optimized for underwater conditions, ensuring improved robustness and accuracy. Unlike previous methods that focus on either enhancement or deep learning for classification, our approach offers improved feature extraction and classification in multiple challenging underwater conditions.

3. Methodological Background of ResNet-18 Architecture

In this section, the existing ResNet-18 architecture is described in detail.

ResNet-18 Architecture

ResNet-18 [49,50] is one of the deep learning-based models introduced in the literature. Deeper neural networks are harder to train, and the difficulty grows as layers are added. If depth is increased without care, the network may suffer from the degradation and vanishing gradient problems, in which adding layers no longer improves model performance. The ResNet architecture was introduced to overcome this issue. It provides a residual learning framework, as shown in Figure 1, that makes such networks easier to train than earlier models. Its layers are reformulated to learn residual functions, which eases network optimization during backpropagation and allows good accuracy at increased depth. This strategy has made it prominent in the literature from a performance point of view. In this architecture, the degradation problem is handled by explicitly letting the stacked layers fit a residual mapping, as given in Equation (1).

$F(x) = H(x) - x$

(1)

Here, $H(x)$ denotes the desired mapping, and the stacked non-linear layers are trained to fit the residual $F(x)$ instead. This formulation was originally proposed by He et al. [51]. The original mapping is then recast as given in Equation (2).

$F(x) + x$

(2)

This mapping is expressed with reference to the layer input. The key idea of residual mapping is that a referenced mapping is simpler to optimize than an unreferenced one. $F(x) + x$ is realized using “shortcut connections” that carry the input past the stacked layers.
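One way to see why the referenced mapping is easier to optimize, offered here as an illustrative derivation rather than as part of the original formulation, is to differentiate the block output with respect to its input. Writing the block output as $y = F(x) + x$, with $F$ the residual function of Equation (1), and letting $L$ denote the training loss (a symbol assumed here), and ignoring any activation applied after the addition:

```latex
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial x}
  = \frac{\partial L}{\partial y}\left(\frac{\partial F(x)}{\partial x} + I\right)
```

The identity term $I$ passes the gradient to earlier layers unchanged even when $\partial F(x) / \partial x$ is small, which is the usual explanation for why residual networks are less affected by vanishing gradients.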

These shortcut connections implement “identity mapping”: the output of a shortcut connection is added to the output of the stacked layers. The benefit of identity mappings is that they add no additional computational overhead. The overall ResNet architecture shows better accuracy with an increase in its depth. Several versions of the ResNet architecture exist in the literature with different numbers of layers, including 18-, 34-, 50-, 101-, and 152-layer architectures. In this work, we use the ResNet-18 architecture for underwater image classification. The number of layers and parameters of the various architectures are given in Table 1.
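The shortcut idea can be illustrated with a minimal sketch in plain Python. It is not the ResNet-18 implementation: the vectors stand in for feature maps, the two residual functions (`zero_residual`, `halve`) are hypothetical stand-ins for the stacked convolutional layers, and the ReLU applied after the addition follows the common residual block design and is an assumption here.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return relu(F(x) + x), where residual_fn plays the role of F.

    The shortcut adds the input unchanged, so it needs no extra weights.
    """
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires matching dimensions")
    return relu([f + a for f, a in zip(fx, x)])


def zero_residual(x: Vector) -> Vector:
    """Hypothetical stacked layers that have learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def halve(x: Vector) -> Vector:
    """Hypothetical stand-in for learned layers: F(x) = -0.5 * x."""
    return [-0.5 * a for a in x]


features = [1.0, 2.0, 0.5]
print(residual_block(features, zero_residual))  # [1.0, 2.0, 0.5]
print(residual_block(features, halve))          # [0.5, 1.0, 0.25]
```

When the residual function outputs zero, the block reduces to the identity, which is why adding such a block to a network should not make the mapping harder to represent.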

4. Model Overview

In this section, we have given a complete description of our proposed ResNet-18 model, which is designed specifically for our binary classification problem of underwater image datasets.

4.1. Model Selection

Recent studies have compared ResNet-18, ResNet-50, EfficientNet, and Vision Transformers (ViT) in various image classification tasks, illustrating their respective strengths and weaknesses. ResNet-18 has been identified as an efficient model due to its lightweight architecture and low computational cost [52]. Although ResNet-50 offers deeper feature extraction, it incurs higher computational demands, making it less practical for resource-limited environments [51].

EfficientNet provides high accuracy with optimized efficiency, but requires extensive tuning and more computational resources [53]. ViTs are superior in learning complex patterns, but require large-scale datasets and significant computational power [54].

Given these insights, ResNet-18 emerges as the optimal choice for our binary underwater image classification, effectively balancing accuracy, efficiency, and computational feasibility. It performs well on small to moderate datasets, requires less computational power, and can achieve high accuracy with appropriate enhancement and training. The evaluation results presented in Section 5.5 further validate the effectiveness of ResNet-18 for this application.

Table 2 clearly compares the models on computational costs, strengths, and weaknesses, strengthening our choice of ResNet-18.

4.2. Proposed ResNet-18 Model

In this section, a brief overview of our proposed binary classification model is given.

The classification of underwater images is a challenging task because of the complexities of underwater image datasets discussed in Section 1. Class imbalance, that is, an unequal number of samples in each class, creates a further obstacle to accurate classification. To overcome these challenges, we propose a modified version of the ResNet-18 model in which some layers are frozen and several new layers are added. A detailed overview of the designed model is shown in Figure 2. The initial layers of the ResNet-18 model are frozen and used as is: their learned weights are not modified and are reused through transfer learning. After the frozen layers, we add new layers. A linear layer of 512 × 512 is added, followed by a ReLU activation and a dropout layer with a rate of 50 percent. This block of three layers is then repeated with a linear layer of 512 × 256, and again with a linear layer of 256 × 128, each followed by the same ReLU and dropout. Finally, a linear layer of 128 × 2 is added. The model classifies images into two classes, raw and enhanced. The model is trained on the SAUD [8] dataset, which consists of a variety of underwater images but is highly unbalanced because it contains fewer raw images than enhanced images. Each raw image in the dataset was enhanced into 10 different forms using UIE algorithms, which substantially increased the number of images in the enhanced image class. To address this class imbalance, data augmentation and downsampling are applied to balance the ratio of samples in each class. The balanced dataset is then passed to the proposed ResNet-18 architecture for training. The parameters and hyper-parameters of the proposed architecture, including batch size, learning rate, number of epochs, optimizer, and loss function, are fine-tuned during training for the problem at hand. The performance of the model is then evaluated on the test dataset.
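The added layers can be sketched in plain Python to make the dimensions concrete. This is an illustration under stated assumptions, not the authors' implementation: the 512-dimensional input stands in for the pooled output of the frozen ResNet-18 layers, each linear layer is assumed to have a bias, the weight initialization is arbitrary, and all names (`HEAD_LAYERS`, `init_head`, `head_forward`) are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Params = List[Tuple[List[Vector], Vector]]

# Linear layer sizes (inputs, outputs) as listed in the text.
HEAD_LAYERS: List[Tuple[int, int]] = [(512, 512), (512, 256), (256, 128), (128, 2)]
DROPOUT_RATE = 0.5


def count_head_parameters(layers: List[Tuple[int, int]]) -> int:
    """Weights plus biases of every linear layer (biases are an assumption)."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


def init_head(layers: List[Tuple[int, int]], rng: random.Random) -> Params:
    """Small random weights and zero biases (arbitrary illustrative choice)."""
    return [
        ([[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in layers
    ]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    return [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weights, bias)]


def head_forward(x: Vector, params: Params, rng: random.Random, training: bool) -> Vector:
    """Linear -> ReLU -> dropout for the first three layers, then a final linear layer."""
    for weights, bias in params[:-1]:
        x = [max(0.0, a) for a in linear(x, weights, bias)]
        if training:
            # Inverted dropout: drop each unit with probability DROPOUT_RATE
            # and rescale the survivors so the expected activation is unchanged.
            keep = 1.0 - DROPOUT_RATE
            x = [a / keep if rng.random() < keep else 0.0 for a in x]
    weights, bias = params[-1]
    return linear(x, weights, bias)


rng = random.Random(0)
params = init_head(HEAD_LAYERS, rng)
backbone_features = [rng.random() for _ in range(512)]  # stand-in for frozen-layer output
logits = head_forward(backbone_features, params, rng, training=False)
print(len(logits), count_head_parameters(HEAD_LAYERS))  # 2 427138
```

Under these assumptions the added layers contribute 427,138 trainable parameters, which is small compared with the frozen backbone, so only a modest part of the network is updated during training.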

4.3. Proposed Classification Model

In this section, we describe our proposed model step by step. We use a pre-trained ResNet-18 model for the binary classification of our dataset.

Let $D B = (X, Y)$ be a labeled dataset of underwater images, where $X = (x_{1}, x_{2}, x_{3}, \dots, x_{n})$ represents n samples and $Y = (y_{1}, y_{2}, y_{3}, \dots, y_{n})$ represents the labels corresponding to the samples in X.

Let the augmentation technique *flip* be applied to the dataset $D B$.

The augmented dataset is represented as $D B^{'}$.

Similarly, X and Y after augmentation are represented as $X^{'}$ and $Y^{'}$, respectively.

Each sample $x_{i}$ in $X^{'}$ is represented as a vector of characteristics of dimension d: $x_{i} = (f_{1}, f_{2}, f_{3}, \dots, f_{d})$. Assume that $x_{i}^{j}$ represents a sample that belongs to class j, where $j = 0, 1$.

From each class j, 90% of the samples are selected for training and the remaining 10% of the samples are chosen for testing purposes to evaluate the performance of the proposed model.
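The data preparation described above (flip augmentation, class balancing, and a per-class 90/10 split) can be sketched as follows. This is a minimal illustration with toy data, not the authors' pipeline: the text does not say whether the flip is horizontal or vertical, so a horizontal flip is assumed, and the image sizes, class sizes, label assignment (0 for raw, 1 for enhanced), and function names are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Image = List[List[int]]  # toy grayscale image stored as rows of pixel values


def horizontal_flip(image: Image) -> Image:
    """Mirror an image left to right (assumed flip direction)."""
    return [row[::-1] for row in image]


def balance_classes(raw: List[Image], enhanced: List[Image],
                    rng: random.Random) -> Dict[int, List[Image]]:
    """Augment the smaller raw class by flipping, then downsample to equal sizes."""
    raw_augmented = raw + [horizontal_flip(img) for img in raw]
    target = min(len(raw_augmented), len(enhanced))
    return {0: rng.sample(raw_augmented, target), 1: rng.sample(enhanced, target)}


def stratified_split(by_class: Dict[int, List[Image]], train_fraction: float,
                     rng: random.Random) -> Tuple[list, list]:
    """Split each class separately so both subsets keep the class ratio."""
    train, test = [], []
    for label, samples in by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train += [(img, label) for img in shuffled[:cut]]
        test += [(img, label) for img in shuffled[cut:]]
    return train, test


rng = random.Random(42)
raw_images = [[[i, i + 1], [i + 2, i + 3]] for i in range(10)]   # 10 toy raw images
enhanced_images = [[[i, 0], [0, i]] for i in range(100)]         # 100 toy enhanced images
balanced = balance_classes(raw_images, enhanced_images, rng)
train_set, test_set = stratified_split(balanced, 0.9, rng)
print(len(train_set), len(test_set))  # 36 4
```

With 10 raw and 100 enhanced toy images, flipping doubles the raw class to 20, the enhanced class is downsampled to 20, and each class contributes 18 training and 2 test samples.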

Suppose $X_{train}^{'}$ denotes the training set and $X_{test}^{'}$ denotes the test set.

Similarly, $Y_{train}^{'}$ and $Y_{test}^{'}$ represent the labels of the training and test datasets, respectively.

After splitting the dataset, the augmented training dataset $X_{train}^{'}$ is passed to the proposed model, which consists of the following steps:

1. The designed model takes an input image $x_{i}$ of size $h \times w \times ch$ in the input layer of the ResNet-18 model.
2. The input feature map f of the image $x_{i}$, in the input layer l, is passed to the frozen layers of the ResNet-18 model.
3. This input feature map f is convolved with the learned weights of the frozen layers, layer by layer.
4. The ReLU activation function is applied to the convolved feature maps in each frozen layer, as given in Equations (3) and (4).

$g(\beta) = \begin{cases} 0 & \text{if } \beta < 0 \\ \beta & \text{if } \beta \geq 0 \end{cases}$

(3)

$g(\beta) = \max(0, \beta)$

(4)
5. The convolved feature maps are then passed to the newly added layers: the linear, ReLU, and dropout layers. These added layers are described in Section 4.2.
6. The weights of the newly added linear layers are updated and optimized during the backpropagation of the ResNet-18 model using the optimizer function.
7. Before the last classification layer, a fully connected (FC) layer is added to reduce the output size to a single value.
8. The output of the FC layer is then passed to the last layer, the classification layer, which uses the sigmoid activation function for binary classification.
9. Several iterations are performed on the training dataset $X_{train}^{'}$ to train the ResNet-18 model.
10. Steps 1 to 9 are performed for all samples $x_{i}$ in the training dataset $X_{train}^{'}$.
11. During backpropagation, the Adaptive Moment Estimation (ADAM) optimizer is used to minimize the loss of the model, as given in Equations (5) and (6).

$w_{t_{s + 1}} = w_{t_{s}} - \alpha m_{s}$

(5)

where

$m_{s} = \beta m_{s - 1} + (1 - \beta) \left[\frac{\partial L}{\partial w_{t_{s}}}\right]$

(6)
12. The Binary Cross-Entropy loss function is applied to calculate the loss of the model, as given in Equation (7):

$CE_{bin} = - \frac{1}{N} \sum_{j = 1}^{N} [y_{j} \log ({\hat{y}}_{j}) + (1 - y_{j}) \log (1 - {\hat{y}}_{j})]$

(7)
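The formulas in the steps above can be written out directly. The following is a minimal sketch in plain Python and not the authors' training code. It follows Equations (5) and (6) as written, that is, only the first-moment (momentum) part of the update. Standard Adam additionally rescales the step by a second-moment estimate and applies bias correction, both of which are omitted here. The default `beta=0.9` is an assumption, and `alpha=1e-4` mirrors the learning rate reported in Section 5.4.

```python
import math


def relu(beta):
    """Equations (3)-(4): g(beta) = max(0, beta)."""
    return max(0.0, beta)


def sigmoid(z):
    """Logistic function used by the final classification layer."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Equation (7): mean binary cross-entropy over N samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def momentum_step(w, m_prev, grad, alpha=1e-4, beta=0.9):
    """Equations (5)-(6): update the moment, then the weight."""
    m = beta * m_prev + (1.0 - beta) * grad  # Equation (6)
    return w - alpha * m, m                  # Equation (5)


# An uninformative prediction of 0.5 for both classes gives a loss of ln(2).
print(binary_cross_entropy([1, 0], [0.5, 0.5]))

# One update of a single weight with gradient 2.0.
w_new, m_new = momentum_step(w=1.0, m_prev=0.0, grad=2.0, alpha=0.1, beta=0.9)
print(w_new, m_new)
```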

5. Experimental Results and Analysis

In this section, we have discussed the experiments performed to evaluate the performance of the proposed model to classify underwater images. The details of these experiments are given in the following sub-sections.

5.1. Dataset

To train and evaluate our proposed ResNet-18 model, we used the SAUD [8] dataset, which covers a wide variety of underwater content, including different underwater environments, plants, many kinds of fish, statues, shipwrecks, and other underwater species, as shown in Figure 3. The raw images were processed into enhanced images using 10 different UIE algorithms, as shown in Figure 4. The dataset has two classes, raw and enhanced, which makes it suitable for binary classification, as shown in Figure 5. The raw and enhanced images were manually labeled as class 0 and class 1, respectively. A brief description of the dataset is given in Table 3: class 0 (raw) contains 100 images and class 1 (enhanced) contains 1000 images. Table 3 shows that the dataset is highly unbalanced. To train our proposed ResNet-18 model, we transformed it into a balanced dataset using augmentation and downsampling techniques, as described in Section 5.2.

5.2. Data Augmentation

To resolve the issue of an unbalanced dataset, we applied data augmentation to the available dataset. The images of both classes are augmented by applying a flip operation. This increases the number of samples in the raw image class to 3960 and in the enhanced image class to 7652, as shown in Table 4. However, the two classes are still imbalanced after augmentation. To fix this, we further applied downsampling only to the enhanced class, which holds the majority of the samples, to reduce its size to that of the raw image class. The images to be removed are selected at random. After downsampling, the number of samples in the enhanced class becomes 3960, which is equal to the number of samples in the raw image class. The balanced dataset is given in Table 5.
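The two balancing operations can be illustrated with a short sketch. It is not the authors' pipeline: the text does not say which flip direction was used, so both a horizontal and a vertical flip are shown, and images are represented as plain lists of pixel rows. The function names and the fixed seed are illustrative choices.

```python
import random


def horizontal_flip(image):
    """Mirror an image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Turn an image given as a list of pixel rows upside down."""
    return image[::-1]


def downsample(samples, target_size, seed=0):
    """Randomly keep `target_size` samples, without replacement."""
    rng = random.Random(seed)
    return rng.sample(samples, target_size)


# Tiny 2 x 3 "image" to show what a flip does.
tiny = [[1, 2, 3],
        [4, 5, 6]]
print(horizontal_flip(tiny))  # [[3, 2, 1], [6, 5, 4]]
print(vertical_flip(tiny))    # [[4, 5, 6], [1, 2, 3]]

# Class sizes after augmentation, as reported in the text (Table 4).
raw_ids = list(range(3960))
enhanced_ids = list(range(7652))

# Downsample only the majority (enhanced) class to the minority size.
enhanced_balanced = downsample(enhanced_ids, len(raw_ids))
print(len(raw_ids), len(enhanced_balanced))  # 3960 3960
```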

5.3. Evaluation Metrics

The following evaluation measures are used to evaluate the performance of the proposed ResNet-18 model [47].

Accuracy: Accuracy measures how often the classifier is correct. It is the fraction of correctly classified instances among all instances.

$Accuracy = \frac{TN + TP}{TN + TP + FN + FP}$

(8)
Sensitivity: Sensitivity (recall) measures the ability of the model to correctly detect samples of the positive class. It is the fraction of actual positive samples that are predicted as positive.

$Sensitivity (Recall) = \frac{TP}{TP + FN}$

(9)
Precision: Precision (positive predictive value) is the fraction of samples predicted as positive that are actually positive.

$Precision = \frac{TP}{TP + FP}$

(10)
F1-Score: It summarizes the predictive performance of a model by combining precision and recall. It is the harmonic mean of precision and recall.

$F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$

(11)
AUC-ROC Score: The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at various classification thresholds. The AUC-ROC score is the area under this curve.
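Unlike the other measures, the AUC-ROC score is computed from the predicted scores rather than from the confusion matrix. One way to compute it, sketched below, uses the fact that the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function `roc_auc` and the toy data are illustrative and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of a few hundred images; a rank-based formulation would be preferable for larger sets.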

5.4. Experimental Setup

All experiments are conducted in Google Colab using the Jupyter Notebook version 6 to evaluate the performance of our proposed ResNet-18 model. The balanced dataset, as shown in Table 5, was prepared by rescaling the input images to 512 × 512 to make them acceptable to the ResNet-18 model during training. All images are normalized to keep their values between 0 and 1. After that, the whole dataset at hand is divided into a 90:10 ratio such that 90% of samples are selected as training data from each class and the remaining 10% are kept as test set from each individual class. The division of the dataset between the train and the test set is shown in Table 6.
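A per-class 90:10 split of this kind can be sketched as follows. The class sizes (3960 samples each) are taken from Section 5.2. The function name `stratified_split`, the fixed seed, and the use of index lists instead of image files are assumptions made for illustration.

```python
import random


def stratified_split(labels, train_fraction=0.9, seed=0):
    """Split sample indices so each class keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random selection within the class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Balanced dataset: 3960 raw (class 0) and 3960 enhanced (class 1) samples.
labels = [0] * 3960 + [1] * 3960
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 7128 792
```

With these sizes the rule yields 3564 training and 396 test samples per class; the exact counts used in the experiments are those reported in Table 6.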

Various experiments were performed to fine-tune the parameters and hyperparameters of our proposed ResNet-18 model. The details of these parameter settings are shown in Table 7. After conducting various experiments, a batch size of 32 was selected, as the proposed model performed very well with this batch size. To train our proposed model, we executed it for a total of 1000 epochs with early stopping criteria.

To improve the performance of our model and reduce the loss value, we used the ADAM optimizer during backpropagation. During the experiments, we observed that the proposed model performed very well with the ADAM optimizer and a learning rate of 0.0001.

To calculate the loss, we applied the Binary Cross-Entropy loss function, as given in Equation (12).

$CE_{bin} = - \frac{1}{N} \sum_{j = 1}^{N} [y_{j} \log ({\hat{y}}_{j}) + (1 - y_{j}) \log (1 - {\hat{y}}_{j})]$

(12)

The ReLU activation function was used in the activation layers of the proposed model. During training, our proposed model stopped at epoch 83 because of the early stopping criterion, indicating that the model was no longer improving after this epoch. The patience was set to 10 during training. All these parameters were set after performing several experiments, and the model performed very well with these settings. To evaluate the performance of our proposed model, various evaluation measures such as accuracy, sensitivity, precision, specificity, F1-score, and AUC-ROC score were used.
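The early stopping rule can be sketched in a few lines. This is an illustrative version only: the text does not state which quantity is monitored, so a validation loss is assumed, and the loss values below are synthetic and chosen so that the rule triggers at epoch 83 with a patience of 10. The names `EarlyStopping` and `run` are hypothetical.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(losses, max_epochs=1000, patience=10):
    """Return the 1-based epoch at which training ends."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(losses), max_epochs)


# Synthetic losses: improve for 73 epochs, then plateau.
synthetic = [1.0 / e for e in range(1, 74)] + [1.0 / 73] * 50
print(run(synthetic))  # 83
```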

5.5. Results and Discussion

In this section, we give the experimental results obtained from our proposed model and discuss them in detail.

Figure 6 presents the confusion matrix of our classification model. Color coding indicates classification frequencies, with brighter colors (yellow/green) representing higher values and darker shades (purple) indicating lower values. The matrix shows that 396 samples were correctly classified as class 0 and 361 as class 1, highlighted in bright yellow/green. There were 4 false positives and 31 false negatives, represented by the darker shades. Overall, this indicates strong performance. Our proposed model achieved an accuracy of 96%, which shows that it learned its weights effectively during training and captured the complex patterns present in these images. The other evaluation measures, namely sensitivity, specificity, precision, F1-score, and AUC-ROC score, are reported in Table 8 and Table 9. They suggest that the proposed model does not suffer from overfitting or underfitting and is not biased toward a single class. As shown in Table 9, the model obtained precision = 99%, sensitivity = 92%, specificity = 99%, F1-score = 95%, and AUC-ROC score = 96%. These results indicate that the proposed model distinguishes raw from enhanced images reliably despite their complexity.
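The reported percentages can be cross-checked against the confusion matrix counts using Equations (8) to (11). The sketch below assumes that class 1 (enhanced) is the positive class, so that TP = 361, TN = 396, FP = 4, and FN = 31. The specificity formula TN / (TN + FP) is the standard definition and is not given as a numbered equation in the text.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation measures from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # recall, Equation (9)
    specificity = tn / (tn + fp)          # standard definition
    precision = tp / (tp + fp)            # Equation (10)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Counts read from the confusion matrix described above.
m = binary_metrics(tp=361, tn=396, fp=4, fn=31)
for name, value in m.items():
    print(f"{name:12s} {value:.4f}  (~{round(value * 100)}%)")
```

Rounded to whole percentages, these counts give 96% accuracy, 92% sensitivity, 99% specificity, 99% precision, and a 95% F1-score, in agreement with the values quoted from Table 9.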

In addition, the training and validation loss curves are shown in Figure 7. These curves show that the proposed model learns effectively without overfitting, since the gap between the two curves remains small.

The ROC curve of the proposed ResNet-18 model on the test dataset is shown in Figure 8. The curve shows that the proposed model achieves a good true positive rate. This is notable because the dataset is complex: the difference between raw and enhanced images is often small.

6. Comparison of Our Proposed Model with Other Deep Learning Models

In this section, we compare our proposed model with two state-of-the-art deep learning architectures, EfficientNet and ViT. Using the PyTorch 2.4 framework and the torchvision library, both models were trained on the same dataset [8] with an 80/20 split between training and test sets. Training used the Adam optimizer with a learning rate of 0.0001 and a batch size of 32, for up to 1000 epochs, where each epoch is one complete pass through the training data. During each epoch, the optimizer adjusts the weights to reduce the loss and improve accuracy. Performance is assessed using the same set of metrics. To handle class imbalance and avoid overfitting, we applied data augmentation during the training of both models (EfficientNet and ViT). Each image was resized to 224 × 224 pixels and passed through several transformations, including random horizontal and vertical flips, 90-degree rotations, and transpositions. We also added geometric variations by randomly shifting, scaling, and rotating images. All images were then normalized using the standard ImageNet mean and standard deviation and converted to tensor format. These augmentations improved generalization and reduced overfitting. Table 10 compares the performance of our proposed model with the two benchmark models, EfficientNet and ViT, and shows that our model performs better. The proposed ResNet-18 model was specifically tailored and fine-tuned to address the unique challenges of underwater image classification. This included handling class imbalance through extensive data augmentation and strategic downsampling, which helped ensure a more balanced class distribution during training.

Additionally, we modified the ResNet-18 architecture by freezing its initial layers to preserve valuable pre-trained features, while introducing multiple custom fully connected layers with ReLU activation and dropout. These enhancements allowed the model to learn domain-specific features effectively while mitigating the risk of overfitting. As a result, the proposed model consistently outperformed both EfficientNet and ViT in all evaluation metrics, demonstrating the effectiveness of combining transfer learning, architectural refinements, and targeted preprocessing for this complex task.
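For reference, the ImageNet normalization mentioned in the preprocessing for EfficientNet and ViT can be sketched as follows. The channel means and standard deviations are the values commonly used with ImageNet-pretrained models; the text does not list them, so treating them as the ones used here is an assumption, as is the 8-bit input range. The function `normalize_pixel` is illustrative.

```python
# Per-channel (R, G, B) statistics commonly used with ImageNet-pretrained
# models; assumed here, since the text does not list the exact values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(normalize_pixel((0, 0, 0)))        # darkest possible pixel
print(normalize_pixel((255, 255, 255)))  # brightest possible pixel
```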

7. Evaluating the Performance of Our Proposed Model on Marine Snow-Affected Underwater Images

Underwater images affected by marine snow are characterized by suspended particles, including organic detritus, plankton, and other microscale debris distributed throughout the water. These particles create visual noise known as “marine snow”, which degrades the visual quality of the image and makes it difficult for computer vision systems to detect and classify images accurately. To assess the performance of our model under these conditions, we trained it on the datasets in [55,56]. The model is trained with an 80/20 split between training and test sets over 1000 epochs, allowing it to learn complex patterns in the image data. Throughout the training process, performance metrics such as accuracy, precision, recall, specificity, F1-score, and AUC-ROC are tracked to evaluate and monitor model improvements over time.

The PHISMID dataset [55] contains 800 images in total: 400 original images and 400 images in which marine snow effects were artificially added to the originals. All images have a resolution of 384 × 384 pixels.

Next, we trained our model on the MSRB dataset [56], which contains the original images and the images that contain small marine snow artifacts (MSR-1). Each subdataset consists of 2700 images, and the maximum width/height of the artifacts in MSR-1 is restricted to six pixels.

Table 11 shows the evaluation metrics for both datasets, which highlight the effectiveness of our proposed model in classifying images affected by marine snow.

8. Performance Comparison of Variants of ResNet-18 on Different Datasets

In this section, we provide a detailed comparison of the performance of ResNet-18 and its variants on different datasets. The results are shown in Table 12. These results show that ResNet-18 performs strongly on various kinds of underwater image datasets. All of these datasets contain a variety of classes but relatively little variation within them. Datasets such as fish and DeepShip are less complex. However, the Danjiangkou and ShipsEar datasets are more complex, as their intricate features lead to lower sensitivity and F1-score for a few classes. This shows that the homogeneity of a dataset may have a great influence on the performance of the model.

Table 13 gives a further comparison of various models on different underwater image datasets. Several models perform strongly on these datasets. In particular, MLR-based models, such as MLR-VGG16 and MLR-VGG19, perform better than traditional VGG models on the F4K datasets. These results demonstrate the effectiveness of advanced models, such as the MLR-based and FDA-based models. They also indicate that the characteristics of each dataset have a great influence on the performance of the model.

9. Conclusions

In this paper, we introduced a modified version of the ResNet-18 model for the binary classification of underwater images into raw and enhanced ones. The proposed model performs well in the presence of noise, distortion, varying illumination, and the complex patterns found in underwater images, which shows that it learns robust and informative features despite these challenges. In our experiments, the model reached an accuracy of 96% on the test set. From these findings, we conclude that the proposed model is reliable in handling the challenges present in underwater image datasets. The proposed technique offers an encouraging direction for advancing underwater image classification in the presence of complex patterns and other challenges.

10. Future Work

Although this study demonstrates the effectiveness of a modified ResNet-18 model for binary classification of underwater images, it also leads to several broader research scopes for further exploration and improvement.

Expanding dataset diversity: The current work uses a carefully curated dataset. Future work can focus on expanding this dataset with a variety of different types of underwater environments, lighting conditions, and object categories. Incorporating semi-supervised learning approaches can also improve the model adaptability to unseen underwater scenes.
Hybrid Transformer-CNN architectures: Although this study focused on CNN-based models, future work may explore combining CNNs with Transformer architectures to capture both local and global image features. Such hybrid models could improve performance in challenging underwater conditions by enhancing preprocessing (e.g., attention-guided enhancement) and boosting classification robustness across various image degradations.
Multi-Class Classification and Fine-Grained Recognition: Our existing model uses binary classification (raw and enhanced image). A logical extension would be to develop a multiclass classifier that can differentiate various underwater scenes, different objects, and different degradation levels.
Enhancing Robustness Against Environmental Factors: Underwater images suffer from varying environmental conditions such as scattering in lights, colors, turbidity, and depth randomization. Future models could incorporate domain adaptation, adversarial training, or self-supervised learning techniques to improve resilience in real-world underwater conditions.
Integration with Multi-Modal Data: Combining underwater optical images with sonar, LiDAR, and acoustic signals could improve the understanding of the scene. Multimodal learning approaches could significantly improve the performance of underwater classification.
Statistical robustness through repeated trials and cross-validation: Future work will involve conducting multiple training runs with different random initializations and employing k-fold cross-validation. This will allow us to report evaluation metrics (for example, accuracy, F1-score, AUC-ROC) with standard deviations or confidence intervals, thereby strengthening the assessment of the stability and generalization performance of the model.

Author Contributions

Conceptualization, M.; Methodology, M.; Software, M.; Validation, M.; Formal analysis, M.; Investigation, M.; Writing—original draft, M.; Writing—review & editing, M.; Visualization, M.; Supervision, M.L., D.J. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Polish Ministry of Science and Higher Education with subsidy funds from the Faculty of Computer Science, Electronics and Telecommunications of AGH University, and in part by the National Natural Science Foundation of China under Grant 62271384.

Data Availability Statement

The data presented in this study are available on request from the corresponding author as the data are part of an ongoing project and may be used in future related research.

Conflicts of Interest

The authors declare no conflict of interest.

References

Saleem, M.; Juszka, D.; Zhang, Y.; Leszczuk, M. Comparative Performance Analysis of Deep Learning Architectures in Underwater Image Classification. Przegląd Telekomun. i Wiadomości Telekomun. 2024, XCVII, 187.
Meng, L.; Hirayama, T.; Oyanagi, S. Underwater-drone with panoramic camera for automatic fish recognition based on deep learning. IEEE Access 2018, 6, 17880–17886.
Mittal, S.; Srivastava, S.; Jayanth, J.P. A survey of deep learning techniques for underwater image classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 6968–6982.
Chungath, T.T.; Nambiar, A.M.; Mittal, A. Transfer learning and few-shot learning based deep neural network models for underwater sonar image classification with a few samples. IEEE J. Ocean. Eng. 2023, 49, 294–310.
Seese, N.; Myers, A.; Smith, K.; Smith, A.O. Adaptive foreground extraction for deep fish classification. In Proceedings of the 2016 ICPR 2nd Workshop on Computer Vision for Analysis of Underwater Imagery (CVAUI), Cancun, Mexico, 4 December 2016; pp. 19–24.
Huo, G.; Wu, Z.; Li, J. Underwater object classification in sidescan sonar images using deep transfer learning and semisynthetic training data. IEEE Access 2020, 8, 47407–47418.
Martija, M.A.M.; Dumbrique, J.I.S.; Naval, P.C., Jr. Underwater gesture recognition using classical computer vision and deep learning techniques. J. Image Graph. 2020, 8, 9–14.
Jiang, Q.; Gu, Y.; Li, C.; Cong, R.; Shao, F. Underwater Image Enhancement Quality Evaluation: Benchmark Dataset and Objective Metric. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5959–5974.
Mathur, M.; Vasudev, D.; Sahoo, S.; Jain, D.; Goel, N. Crosspooled FishNet: Transfer learning based fish species classification model. Multimed. Tools Appl. 2020, 79, 31625–31643.
Sun, X.; Shi, J.; Liu, L.; Dong, J.; Plant, C.; Wang, X.; Zhou, H. Transferring deep knowledge for object recognition in low-quality underwater videos. Neurocomputing 2018, 275, 897–908.
Tamou, A.B.; Benzinou, A.; Nasreddine, K.; Ballihi, L. Transfer learning with deep convolutional neural network for underwater live fish recognition. In Proceedings of the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), Sophia Antipolis, France, 12–14 December 2018; pp. 204–209.
Fu, Y.; Yu, J.; Liu, H.; Guo, B.; Yu, X.; Yu, L.; Zhu, Y. Underwater image enhancement based on deep learning water body pre-classification. In Proceedings of the Sixth Conference on Frontiers in Optical Imaging and Technology: Applications of Imaging Technologies, Nanjing, China, 22–24 October 2023; Volume 13157, pp. 290–304.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
Allken, V.; Handegard, N.O.; Rosen, S.; Schreyeck, T.; Mahiout, T.; Malde, K. Fish species identification using a convolutional neural network trained on synthetic data. ICES J. Mar. Sci. 2019, 76, 342–349.
Gori, A.; Kapadnis, A.; Patil, R.; Patel, D.; Nikumbh, D. Fish Species Recognition using Deep Learning Techniques. In Proceedings of the 2023 International Conference on Advanced Computing Technologies and Applications (ICACTA), Mumbai, India, 6–7 October 2023; pp. 1–6.
Olsvik, E.; Trinh, C.M.; Knausgård, K.M.; Wiklund, A.; Sørdalen, T.K.; Kleiven, A.R.; Jiao, L.; Goodwin, M. Biometric fish classification of temperate species using convolutional neural network with squeeze-and-excitation. In Advances and Trends in Artificial Intelligence. From Theory to Practice, Proceedings of the 32nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2019, Graz, Austria, 9–11 July 2019; Proceedings 32; Springer: Berlin/Heidelberg, Germany, 2019; pp. 89–101.
Rathi, D.; Jain, S.; Indu, S. Underwater fish species classification using convolutional neural network and deep learning. In Proceedings of the 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR), Bangalore, India, 27–30 December 2017; pp. 1–6.
Xu, Y.; Zhang, Y.; Wang, H.; Liu, X. Underwater image classification using deep convolutional neural networks and data augmentation. In Proceedings of the 2017 IEEE international conference on signal processing, communications and computing (ICSPCC), Xiamen, China, 22–25 October 2017; pp. 1–5.
Xiao, S.; Shen, X.; Zhang, Z.; Wen, J.; Xi, M.; Yang, J. Underwater image classification based on image enhancement and information quality evaluation. Displays 2024, 82, 102635.
Mahmood, A.; Bennamoun, M.; An, S.; Sohel, F.; Boussaid, F. ResFeats: Residual network based features for underwater image classification. Image Vis. Comput. 2020, 93, 103811.
Ancuti, C.; Ancuti, C.O.; Haber, T.; Bekaert, P. Enhancing underwater images and videos by fusion. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 81–88.
Chiang, J.Y.; Chen, Y.C. Underwater image enhancement by wavelength compensation and dehazing. IEEE Trans. Image Process. 2011, 21, 1756–1769.
Beijbom, O.; Edmunds, P.J.; Kline, D.I.; Mitchell, B.G.; Kriegman, D. Automated annotation of coral reef survey images. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 1170–1177.
Raveendran, S.; Patil, M.D.; Birajdar, G.K. Underwater image enhancement: A comprehensive review, recent trends, challenges and applications. Artif. Intell. Rev. 2021, 54, 5413–5467.
Schettini, R.; Corchs, S. Underwater image processing: State of the art of restoration and image enhancement methods. EURASIP J. Adv. Signal Process. 2010, 2010, 1–14.
Kyathanahally, S.P.; Hardeman, T.; Merz, E.; Bulas, T.; Reyes, M.; Isles, P.; Pomati, F.; Baity-Jesi, M. Deep learning classification of lake zooplankton. Front. Microbiol. 2021, 12, 746297.
Prasetyo, E.; Suciati, N.; Fatichah, C. Multi-level residual network VGGNet for fish species classification. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 5286–5295.
Qin, H.; Li, X.; Liang, J.; Peng, Y.; Zhang, C. DeepFish: Accurate underwater live fish recognition with a deep architecture. Neurocomputing 2016, 187, 49–58.
Salman, A.; Jalal, A.; Shafait, F.; Mian, A.; Shortis, M.; Seager, J.; Harvey, E. Fish species classification in unconstrained underwater environments based on deep learning. Limnol. Oceanogr. Methods 2016, 14, 570–585.
Sun, X.; Shi, J.; Dong, J.; Wang, X. Fish recognition from low-resolution underwater images. In Proceedings of the 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Datong, China, 15–17 October 2016; pp. 471–476.
Yang, H.; Liu, P.; Hu, Y.; Fu, J. Research on underwater object recognition based on YOLOv3. Microsyst. Technol. 2021, 27, 1837–1844.
Zhang, Y.; Chandler, D.M.; Leszczuk, M. Retinex-based underwater image enhancement via adaptive color correction and hierarchical U-shape transformer. Opt. Express 2024, 32, 24018–24040.
Yeh, C.H.; Lai, Y.W.; Lin, Y.Y.; Chen, M.J.; Wang, C.C. Underwater Image Enhancement Based on Light Field-Guided Rendering Network. J. Mar. Sci. Eng. 2024, 12, 1217.
Liu, X.; Chen, Z.; Xu, Z.; Zheng, Z.; Ma, F.; Wang, Y. Enhancement of Underwater Images through Parallel Fusion of Transformer and CNN. J. Mar. Sci. Eng. 2024, 12, 1467.
Guo, X.; Zhao, X.; Liu, Y.; Li, D. Underwater sea cucumber identification via deep residual networks. Inf. Process. Agric. 2019, 6, 307–315.
Lumini, A.; Nanni, L. Ocean ecosystems plankton classification. In Recent Advances in Computer Vision: Theories and Applications; Springer: Cham, Switzerland, 2019; pp. 261–280.
Wang, X.; Ouyang, J.; Li, D.; Zhang, G. Underwater object recognition based on deep encoding-decoding network. J. Ocean Univ. China 2019, 18, 376–382.
Szymak, P. Recognition of underwater objects using deep learning in Matlab. In Proceedings of the 2018 International Conference on Applied Mathematics & Computational Science (ICAMCS. NET), Budapest, Hungary, 6–8 October 2018; pp. 53–535.
Villon, S.; Mouillot, D.; Chaumont, M.; Darling, E.S.; Subsol, G.; Claverie, T.; Villéger, S. A deep learning method for accurate and fast identification of coral reef fishes in underwater images. Ecol. Inform. 2018, 48, 238–244.
Gonzalez-Cid, Y.; Burguera, A.; Bonin-Font, F.; Matamoros, A. Machine learning and deep learning strategies to identify posidonia meadows in underwater images. In Proceedings of the OCEANS 2017-Aberdeen, Aberdeen, UK, 19–21 June 2017; pp. 1–5.
Jin, L.; Liang, H. Deep learning for underwater image recognition in small sample size situations. In Proceedings of the OCEANS 2017-Aberdeen, Aberdeen, UK, 19–21 June 2017; pp. 1–4.
Liawatimena, S.; Abdurachman, E.; Trisetyarso, A.; Wibowo, A.; Ario, M.K.; Edbert, I.S. Fish classification system using YOLOv3-ResNet18 model for mobile phones. CommIT (Commun. Inf. Technol.) J. 2023, 17, 71–79.
Jian, M.; Qi, Q.; Yu, H.; Dong, J.; Cui, C.; Nie, X.; Zhang, H.; Yin, Y.; Lam, K.M. The extended marine underwater environment database and baseline evaluations. Appl. Soft Comput. 2019, 80, 425–437.
Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875.
Hong, J.; Fulton, M.; Sattar, J. Trashcan: A semantically-segmented dataset towards visual detection of marine debris. arXiv 2020, arXiv:2007.08097.
Lu, H.; Li, Y.; Uemura, T.; Ge, Z.; Xu, X.; He, L.; Serikawa, S.; Kim, H. FDCNet: Filtering deep convolutional network for marine organism classification. Multimed. Tools Appl. 2018, 77, 21847–21860.
Pachaiyappan, P.; Chidambaram, G.; Jahid, A.; Alsharif, M.H. Enhancing Underwater Object Detection and Classification Using Advanced Imaging Techniques: A Novel Approach with Diffusion Models. Sustainability 2024, 16, 7488.
Roy, J.; Talukder, K.H. Under Water Objects Detection and Classification using Deep Learning Technique. In Proceedings of the 2024 International Conference on Advances in Computing, Communication, Electrical, and Smart Systems (iCACCESS), Dhaka, Bangladesh, 8–9 March 2024; pp. 1–5.
Jiang, Z.; Zhao, C.; Wang, H. Classification of underwater target based on S-ResNet and modified DCGAN models. Sensors 2022, 22, 2293.
Yao, Q.; Wang, Y.; Yang, Y. Underwater acoustic target recognition based on data augmentation and residual CNN. Electronics 2023, 12, 1206.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
Nauen, T.C.; Palacio, S.; Raue, F.; Dengel, A. Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers. arXiv 2023, arXiv:2308.09372.
Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
Liu, J.; Sun, J.; Zhou, X. Comparison of ResNet-50 and vision transformer models for trash classification. In Proceedings of the Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), Online Conference, 11–13 November 2022; Volume 12610, pp. 486–491.
Kaneko, R.; Ueda, T.; Higashi, H.; Tanaka, Y. Physics-Inspired Synthesized Underwater Image Dataset. arXiv 2024, arXiv:2404.03998.
Kaneko, R.; Sato, Y.; Ueda, T.; Higashi, H.; Tanaka, Y. Marine snow removal benchmarking dataset. In Proceedings of the 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taipei, Taiwan, 31 October–3 November 2023; pp. 771–778.
Irfan, M.; Jiangbin, Z.; Ali, S.; Iqbal, M.; Masood, Z.; Hamid, U. DeepShip: An underwater acoustic benchmark dataset and a separable convolution based autoencoder for classification. Expert Syst. Appl. 2021, 183, 115270.
Hong, F.; Liu, C.; Guo, L.; Chen, F.; Feng, H. Underwater acoustic target recognition with resnet18 on shipsear dataset. In Proceedings of the 2021 IEEE 4th International Conference on Electronics Technology (ICET), Chengdu, China, 7–10 May 2021; pp. 1240–1244.
Li, J.; Wang, B.; Cui, X.; Li, S.; Liu, J. Underwater acoustic target recognition based on attention residual network. Entropy 2022, 24, 1657.
Yao, Y.; Zeng, X.; Wang, H.; Liu, J. Research on underwater acoustic target recognition method based on densenet. In Proceedings of the 2022 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Xi’an, China, 15–17 July 2022; pp. 114–118.
Yang, J.; Cai, M.; Yang, X.; Zhou, Z. Underwater image classification algorithm based on convolutional neural network and optimized extreme learning machine. J. Mar. Sci. Eng. 2022, 10, 1841. [Google Scholar] [CrossRef]

Figure 1. Framework of the residual learning block [51].

Figure 2. Proposed classification process using ResNet-18 model.

Figure 3. Samples of raw images before enhancements.

Figure 4. Raw image enhanced in multiple ways using UIE algorithms.

Figure 5. Comparison of raw and enhanced images, highlighting improvements in visibility, contrast, and color correction.

Figure 6. Confusion matrix showing the performance of the proposed ResNet-18 model for binary classification of underwater images, highlighting correct and misclassified cases.

Figure 7. (a) Training loss and (b) validation loss curves of the proposed model, illustrating the convergence behavior and model generalization.

Figure 8. ROC curve of the proposed model, demonstrating classification performance on the test dataset.

Table 1. Various versions of ResNet with different numbers of layers.

Layer Name	Output Size	18-Layer	34-Layer	50-Layer	101-Layer	152-Layer
conv1	112 × 112	7 × 7, 64, stride 2
conv2_x	56 × 56	3 × 3 max pool, stride 2
		$[\begin{matrix} 3 \times 3, 64 \\ 3 \times 3, 64 \end{matrix}] \times 2$	$[\begin{matrix} 3 \times 3, 64 \\ 3 \times 3, 64 \end{matrix}] \times 3$	$[\begin{matrix} 1 \times 1, 64 \\ 3 \times 3, 64 \\ 1 \times 1, 256 \end{matrix}] \times 3$	$[\begin{matrix} 1 \times 1, 64 \\ 3 \times 3, 64 \\ 1 \times 1, 256 \end{matrix}] \times 3$	$[\begin{matrix} 1 \times 1, 64 \\ 3 \times 3, 64 \\ 1 \times 1, 256 \end{matrix}] \times 3$
conv3_x	28 × 28	$[\begin{matrix} 3 \times 3, 128 \\ 3 \times 3, 128 \end{matrix}] \times 2$	$[\begin{matrix} 3 \times 3, 128 \\ 3 \times 3, 128 \end{matrix}] \times 4$	$[\begin{matrix} 1 \times 1, 128 \\ 3 \times 3, 128 \\ 1 \times 1, 512 \end{matrix}] \times 4$	$[\begin{matrix} 1 \times 1, 128 \\ 3 \times 3, 128 \\ 1 \times 1, 512 \end{matrix}] \times 4$	$[\begin{matrix} 1 \times 1, 128 \\ 3 \times 3, 128 \\ 1 \times 1, 512 \end{matrix}] \times 8$
conv4_x	14 × 14	$[\begin{matrix} 3 \times 3, 256 \\ 3 \times 3, 256 \end{matrix}] \times 2$	$[\begin{matrix} 3 \times 3, 256 \\ 3 \times 3, 256 \end{matrix}] \times 6$	$[\begin{matrix} 1 \times 1, 256 \\ 3 \times 3, 256 \\ 1 \times 1, 1024 \end{matrix}] \times 6$	$[\begin{matrix} 1 \times 1, 256 \\ 3 \times 3, 256 \\ 1 \times 1, 1024 \end{matrix}] \times 23$	$[\begin{matrix} 1 \times 1, 256 \\ 3 \times 3, 256 \\ 1 \times 1, 1024 \end{matrix}] \times 36$
conv5_x	7 × 7	$[\begin{matrix} 3 \times 3, 512 \\ 3 \times 3, 512 \end{matrix}] \times 2$	$[\begin{matrix} 3 \times 3, 512 \\ 3 \times 3, 512 \end{matrix}] \times 3$	$[\begin{matrix} 1 \times 1, 512 \\ 3 \times 3, 512 \\ 1 \times 1, 2048 \end{matrix}] \times 3$	$[\begin{matrix} 1 \times 1, 512 \\ 3 \times 3, 512 \\ 1 \times 1, 2048 \end{matrix}] \times 3$	$[\begin{matrix} 1 \times 1, 512 \\ 3 \times 3, 512 \\ 1 \times 1, 2048 \end{matrix}] \times 3$
	1 × 1	Average pool, 1000-d fc, softmax
FLOPs		$1.8 \times 10^{9}$	$3.6 \times 10^{9}$	$3.8 \times 10^{9}$	$7.6 \times 10^{9}$	$11.3 \times 10^{9}$
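
As a quick consistency check on Table 1, the depth in each model name can be recovered from the block counts: one layer for conv1, two convolutions per basic block (18- and 34-layer variants) or three per bottleneck block (50-, 101-, and 152-layer variants), and one final fully connected layer. The sketch below is illustrative only; the names STAGE_REPEATS, CONVS_PER_BLOCK, and weighted_layer_count are hypothetical, and the count follows the usual convention of leaving out the 1 × 1 projection convolutions on shortcut connections.

```python
# Repeats of each residual stage (conv2_x .. conv5_x) as listed in Table 1.
STAGE_REPEATS = {
    18: (2, 2, 2, 2),
    34: (3, 4, 6, 3),
    50: (3, 4, 6, 3),
    101: (3, 4, 23, 3),
    152: (3, 8, 36, 3),
}

# Convolutions per block: 2 for the basic block, 3 for the bottleneck block.
CONVS_PER_BLOCK = {18: 2, 34: 2, 50: 3, 101: 3, 152: 3}


def weighted_layer_count(depth: int) -> int:
    """Count conv1, every convolution in the residual stages, and the fc layer."""
    stage_convs = CONVS_PER_BLOCK[depth] * sum(STAGE_REPEATS[depth])
    return 1 + stage_convs + 1


for depth in STAGE_REPEATS:
    # Each computed count should equal the depth in the model name.
    print(depth, weighted_layer_count(depth))
```

For ResNet-18 this gives 1 + 2 × (2 + 2 + 2 + 2) + 1 = 18, which is where the model name comes from.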

Table 2. Comparison of deep learning architectures.

Model	Cost	Strengths	Weaknesses
ResNet-18	Low	Fast, efficient	Limited features
ResNet-50	High	Strong extraction	High computation
EfficientNet	High	Optimized accuracy	Resource-intensive
ViT	Very High	High accuracy	Needs large data

Table 3. A brief description of the dataset (before augmentation).

Class No.	Class Label	Number of Instances
0	Raw Image	100
1	Enhanced Image	1000
Total Size		1100

Table 4. Underwater image dataset description after applying data augmentation (flip).

Class No.	Class Label	Number of Instances
0	Raw Image	3960
1	Enhanced Image	7652
Total Size		11,612
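
The flip augmentation named in the caption of Table 4 can be illustrated with a minimal sketch. The table only states that flipping was used, so which flips were applied and how many variants were generated per image are assumptions here; the function names are hypothetical, and a nested list stands in for an image array.

```python
def horizontal_flip(image):
    """Mirror left-to-right: reverse the pixel order within every row."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror top-to-bottom: reverse the order of the rows."""
    return [list(row) for row in image[::-1]]


def augment_with_flips(image):
    """Return the original plus three flipped variants (an assumed scheme)."""
    return [
        image,
        horizontal_flip(image),
        vertical_flip(image),
        horizontal_flip(vertical_flip(image)),
    ]


# A tiny 2 x 3 "image" whose pixel values make the flips easy to follow.
sample = [[1, 2, 3],
          [4, 5, 6]]
for variant in augment_with_flips(sample):
    print(variant)
```

In practice the same operations are usually applied to image tensors through an image-processing library rather than to nested lists.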

Table 5. Balanced dataset after applying downsampling on the enhanced image class in the augmented dataset.

Class No.	Class Label	Number of Instances
0	Raw Image	3960
1	Enhanced Image	3960
Total Size		7920

Table 6. Dataset distribution into train and test sets.

Class	Total	Train (90%)	Test (10%)
Raw Image	3960	3564	396
Enhanced Image	3960	3564	396
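
The class counts in Tables 5 and 6 follow from two steps: downsampling the enhanced class to the size of the raw class, then a 90/10 split within each class. The sketch below reproduces only the arithmetic. Random sampling, the per-class split, the seed, and the placeholder file names are assumptions, since the tables do not specify the selection procedure.

```python
import random


def downsample(items, target_size, seed=0):
    """Randomly keep target_size items (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(items, target_size)


def split_train_test(items, test_fraction=0.10, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder file names standing in for the augmented images of Table 4.
raw = [f"raw_{i}.png" for i in range(3960)]
enhanced = [f"enhanced_{i}.png" for i in range(7652)]

# Table 5: reduce the enhanced class from 7652 to 3960 images.
enhanced_balanced = downsample(enhanced, len(raw))

# Table 6: split each class separately so both sets stay balanced.
raw_train, raw_test = split_train_test(raw)
enh_train, enh_test = split_train_test(enhanced_balanced)

print(len(raw_train), len(raw_test))
print(len(enh_train), len(enh_test))
```

Splitting each class separately keeps the train and test sets balanced at 3564 and 396 images per class.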

Table 7. Experimental parameter settings for ResNet-18 model training.

Parameter	Value
Optimizer	Adam
Loss Function	Binary Cross-Entropy Loss
Activation Function	ReLU
Learning Rate	0.0001
Batch Size	32
No. of Epochs	1000
Termination Epoch (Early Stopping)	83
Patience	10
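
The interaction between the epoch limit, the patience value, and the termination epoch in Table 7 can be sketched as follows. This is a minimal illustration, assuming that validation loss is the monitored quantity (the table does not say which metric was tracked); the EarlyStopping class, the run helper, and the synthetic loss sequence are hypothetical.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def run(val_losses, max_epochs=1000, patience=10):
    """Return the 1-based epoch at which training ends."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)


# Synthetic losses: five improving epochs followed by a plateau.
losses = [1.0, 0.8, 0.6, 0.5, 0.4] + [0.45] * 20
print(run(losses))
```

With a patience of 10, training ends ten epochs after the last improvement, so the epoch limit of 1000 acts only as an upper bound.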

Table 8. Evaluation measures of the base ResNet-18 model.

Evaluation Measure	Result
Accuracy	92%
Precision (PR)	94%
Sensitivity/Recall (SE)	90%
Specificity (SP)	92%
F1-Score	91%
AUC-ROC Score	92%

Table 9. Evaluation measures of the proposed ResNet-18 model.

Evaluation Measure	Result
Accuracy	96%
Precision (PR)	99%
Sensitivity/Recall (SE)	92%
Specificity (SP)	99%
F1-Score	95%
AUC-ROC Score	96%
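
The measures reported in Tables 8 and 9 are standard functions of the four confusion-matrix counts. The sketch below shows how they relate to each other; the counts passed in are made up for illustration and are not the values behind Figure 6, and the function name binary_metrics is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity (SE)
    specificity = tn / (tn + fp)     # SP
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only (not taken from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same F1 formula can be applied directly to the rounded values in Table 9: 2 × 0.99 × 0.92 / (0.99 + 0.92) ≈ 0.954, which agrees with the reported 95%.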

Table 10. Comparison of EfficientNet and ViT with our proposed model on the same dataset.

	EfficientNet	ViT	Proposed ResNet-18
Accuracy	91%	94%	96%
Precision	92%	98%	99%
Sensitivity/Recall	90%	92%	92%
Specificity	90%	97%	99%
F1-Score	90%	95%	95%
AUC-ROC	96%	94%	96%

Table 11. Evaluation of our proposed model on the PHISMID [55] and MSRB [56] datasets.

	PHISMID	MSRB
Accuracy	96%	95%
Precision	95%	95%
Sensitivity/Recall	97%	96%
Specificity	94%	91%
F1-Score	96%	93%
AUC-ROC	98%	96%

Table 12. Comparison of the performance of ResNet-18-based models on various underwater datasets.

Sr No.	Reference	Year	Techniques Used	Dataset	No. of Classes	Results
1	[42]	2023	YOLOv3-ResNet-18	Fish Dataset [42]	4	Class 1: Accuracy = 0.97, PR = 0.98, SE = 0.92, F1 = 0.94
						Class 2: Accuracy = 0.98, PR = 0.94, SE = 0.96, F1 = 0.95
						Class 3: Accuracy = 0.98, PR = 0.95, SE = 0.96, F1 = 0.95
						Class 4: Accuracy = 0.99, PR = 0.99, SE = 0.99, F1 = 0.99
2	[49]	2022	S-ResNet	Danjiangkou Reservoir [49]	5	With ResNet-18: Accuracy = 0.93
						With S-ResNet:
						Overall Accuracy = 0.92
						Class 1: PR = 0.81, SE = 0.91, F1 = 0.86
						Class 2: PR = 0.94, SE = 0.97, F1 = 0.96
						Class 3: PR = 0.95, SE = 0.90, F1 = 0.93
						Class 4: PR = 0.95, SE = 0.80, F1 = 0.87
						Class 5: PR = 0.90, SE = 1.00, F1 = 0.95
3	[50]	2023	ResNet-18	DeepShip [57]	4	Accuracy = 0.92, PR = 0.92, SE = 0.92, F1 = 0.92
4	[58]	2021	ResNet-18	ShipsEar [58]	5	Overall Accuracy = 0.94
						Class 1: PR = 0.97, SE = 0.95, F1 = 0.96
						Class 2: PR = 0.95, SE = 0.94, F1 = 0.95
						Class 3: PR = 0.92, SE = 0.91, F1 = 0.92
						Class 4: PR = 0.95, SE = 0.96, F1 = 0.95
						Class 5: PR = 0.94, SE = 0.94, F1 = 0.94
5	[59]	2022	A ResNet	DeepShip [57]	4	Accuracy = 0.99, PR = 0.99, SE = 0.99, F1 = 0.99
5	[59]	2022	A ResNet	ShipsEar [58]	5	Accuracy = 0.98, PR = 0.98, SE = 0.98, F1 = 0.98
6	[60]	2022	Res-DenseNet	Lake dataset [60]	4	Accuracy = 0.97

Table 13. Comparison of the performance of various models on underwater image datasets.

Sr No.	Reference	Year	Dataset	Techniques Used	Results
1	[27]	2022	F4K	VGG16	82.44
				VGG19	88.07
				ResNet50	88.12
				InceptionV3	85.66
				Xception	83.70
				MLR-VGG16	96.25
				MLR-VGG19	97.09
2	[27]	2022	Fish-gres	VGG16	89.83
				VGG19	87.51
				ResNet50	97.84
				InceptionV3	93.22
				Xception	93.53
				MLR-VGG16	98.46
				MLR-VGG19	97.84
3	[61]	2022	F4K	DenseNet201	Acc = 0.90, PR = 0.90, SE = 0.86, F1 = 0.82
				FCMFDA-ELM	Acc = 0.99, PR = 0.99, SE = 0.99, F1 = 0.99
				FDA-ELM	Acc = 0.96, PR = 0.96, SE = 0.96, F1 = 0.95
				STOA-ELM	Acc = 0.96, PR = 0.96, SE = 0.96, F1 = 0.99
				WOA-ELM	Acc = 0.98, PR = 0.98, SE = 0.98, F1 = 0.97
				MFO-ELM	Acc = 0.98, PR = 0.98, SE = 0.98, F1 = 0.97
				ELM	Acc = 0.94, PR = 0.93, SE = 0.94, F1 = 0.93
4	[61]	2022	URPC	DenseNet201	Acc = 0.89, PR = 0.87, SE = 0.89, F1 = 0.88
				FCMFDA-ELM	Acc = 0.98, PR = 0.97, SE = 0.97, F1 = 0.97
				FDA-ELM	Acc = 0.94, PR = 0.93, SE = 0.92, F1 = 0.93
				STOA-ELM	Acc = 0.96, PR = 0.95, SE = 0.95, F1 = 0.95
				WOA-ELM	Acc = 0.94, PR = 0.94, SE = 0.93, F1 = 0.94
				MFO-ELM	Acc = 0.94, PR = 0.94, SE = 0.93, F1 = 0.93
				ELM	Acc = 0.93, PR = 0.93, SE = 0.92, F1 = 0.93

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mehrunnisa; Leszczuk, M.; Juszka, D.; Zhang, Y. Improved Binary Classification of Underwater Images Using a Modified ResNet-18 Model. Electronics 2025, 14, 2954. https://doi.org/10.3390/electronics14152954

