1. Introduction
The ocean covers the majority of the Earth’s surface and harbors abundant natural resources, making underwater exploration highly important. Various underwater technologies, including autonomous underwater vehicles (AUVs), rely heavily on visual data for navigation, mapping, and environmental analysis, making underwater imaging a critical component of ocean exploration. As a result, underwater image processing has become a vital technology across multiple domains, including marine biology, oceanography, and underwater robotics. However, acquiring and processing high-quality underwater images is exceptionally challenging due to the complex and uncontrollable nature of the underwater environment. Common issues such as light attenuation, color distortion, low contrast, blurred details, and noise introduced by artificial lighting significantly degrade image quality. These problems not only hinder human visual perception but also reduce the effectiveness of automated analysis. Moreover, the unique optical properties of underwater imaging, including selective light absorption and scattering, mean that traditional in-air imaging methods and conventional enhancement algorithms are insufficient. Specially designed algorithms for underwater image enhancement, correction, and analysis are therefore essential for addressing these challenges and advancing underwater exploration.
Classic image enhancement methods in underwater settings rely on the physical modeling of light propagation to approximate and reverse image degradation [1] or employ priors such as dark-channel estimation [2] and color information [3] to improve visual quality. Beyond enhancement, conventional approaches to underwater object detection, segmentation, and tracking have relied primarily on handcrafted features and heuristic rules [4,5]. Although these approaches can improve underwater image quality, they typically depend on fixed assumptions that may not generalize well to the diverse and dynamic conditions encountered in underwater environments. In recent years, deep learning-based methods have emerged as powerful alternatives for underwater image processing. These methods leverage large amounts of data and neural network architectures to automatically learn representations and achieve impressive performance. For example, several studies have applied convolutional neural networks, transformers, and state-space models to enhance and restore underwater images [6,7,8,9,10]. In addition, deep learning techniques have been widely adopted for underwater object detection and segmentation [11,12,13,14,15], demonstrating strong performance in various underwater environments. Compared to traditional methods, deep learning-based approaches offer improved robustness, leading to enhanced image detail, higher accuracy, and better adaptability to complex underwater conditions.
This book, Application of Deep Learning in Underwater Image Processing, presents nine innovative approaches that leverage deep learning techniques to address key challenges in underwater image processing. In addition to a quantitative study on underwater image enhancement, it covers related topics such as image restoration, underwater object detection and segmentation, and sonar image analysis. The contributing researchers have conducted in-depth analyses and integrated advanced techniques, including Generative Adversarial Networks (GANs) and the You Only Look Once (YOLO) series, to further advance deep learning applications in underwater image processing. Furthermore, the book provides a thorough comparative analysis of conventional and deep learning-based methods. In summary, it serves as a comprehensive resource exploring recent breakthroughs in deep learning for underwater imaging, offering both practical tools and conceptual frameworks for professionals engaged in marine science, engineering, and computer vision research.
2. An Overview of Published Articles
Awan et al. (contribution 1) address the challenging problem of underwater image degradation caused by color distortion and contrast loss due to light attenuation and scattering. Existing methods typically apply a linear transformation for color compensation followed by image enhancement. However, the authors observed that such linear compensation may fail to enhance images across a variety of underwater scenes. To address this problem, they propose a dual-pathway framework that uses a classifier to categorize underwater images as Type I or Type II based on their color characteristics: Type I images benefit from a linear transformation, whereas Type II images decline in quality when one is applied. Depending on the classification, images are then processed by either the Deep Line Model or the Deep Curve Model, which perform linear and nonlinear transformations, respectively. The framework demonstrates superior performance in restoring underwater images on benchmark datasets. Future research directions include refining the classifier to handle ambiguous cases, extending the model to more nuanced image categories, and integrating richer performance metrics for deeper insights into image quality restoration.
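To make the routing idea concrete, the following minimal PyTorch sketch shows how a classifier could dispatch images to a linear or a curve-based enhancement pathway. The module names mirror the paper’s terminology, but the layer configurations, the quadratic curve form, and the classifier interface are illustrative assumptions rather than the authors’ implementation.

```python
import torch
import torch.nn as nn

class DeepLineModel(nn.Module):
    """Hypothetical linear pathway: predicts per-channel gain a and
    bias b, then applies y = a * x + b."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 6))

    def forward(self, x):
        a, b = self.net(x).view(-1, 6, 1, 1).chunk(2, dim=1)
        return torch.clamp(a * x + b, 0.0, 1.0)

class DeepCurveModel(nn.Module):
    """Hypothetical nonlinear pathway: predicts a per-pixel parameter
    alpha and applies the quadratic curve y = x + alpha * x * (1 - x)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        alpha = self.net(x)
        return torch.clamp(x + alpha * x * (1.0 - x), 0.0, 1.0)

class DualPathwayEnhancer(nn.Module):
    """Routes each image to the line or curve model based on a
    classifier assumed to output P(Type I) with shape (N, 1)."""
    def __init__(self, classifier):
        super().__init__()
        self.classifier = classifier
        self.line, self.curve = DeepLineModel(), DeepCurveModel()

    def forward(self, x):
        is_type1 = (self.classifier(x) > 0.5).float().view(-1, 1, 1, 1)
        return is_type1 * self.line(x) + (1.0 - is_type1) * self.curve(x)
```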
Fu et al. (contribution 2) tackle the challenges of underwater image degradation caused by absorption, scattering, and complex lighting conditions. Existing GAN-based methods often suffer from instability during training, resulting in suboptimal enhancement. To address these issues, the authors propose the Multi-scale Evolutionary Generative Adversarial Network (MEvo-GAN), which integrates genetic algorithms into GANs to enhance underwater images. The MEvo-GAN framework employs a multi-path generator architecture to extract features at different spatial scales, improving the network’s ability to recover fine textures and global structures from degraded underwater images. In addition, the authors integrate an evolutionary algorithm composed of variation, evaluation, and selection modules. These components guide generator training by simulating natural selection: multiple offspring models are generated, and the best-performing candidates are chosen based on quality metrics. The experimental results show that MEvo-GAN outperforms existing methods in restoring underwater images on benchmark datasets.
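The evolutionary step can be summarized in a short sketch: offspring generators are produced under different adversarial objectives (variation), scored (evaluation), and the fittest survives (selection). The training burst length, mutation objectives, and fitness function below are hypothetical placeholders, not MEvo-GAN’s exact procedure.

```python
import copy
import torch

def evolve_generator(generator, discriminator, batch, mutations,
                     fitness_fn, steps=1, lr=1e-4):
    """One evolutionary round: variation -> evaluation -> selection.
    `mutations` is a list of adversarial loss callables; `fitness_fn`
    scores a candidate generator (higher is better)."""
    best_g, best_score = generator, float("-inf")
    for adversarial_loss in mutations:
        # Variation: clone the parent and train briefly under one objective.
        offspring = copy.deepcopy(generator)
        opt = torch.optim.Adam(offspring.parameters(), lr=lr)
        for _ in range(steps):
            loss = adversarial_loss(discriminator(offspring(batch)))
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Evaluation: score the offspring with the fitness metric.
        score = fitness_fn(offspring, discriminator, batch)
        # Selection: keep the best-performing candidate.
        if score > best_score:
            best_g, best_score = offspring, score
    return best_g
```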
Gayá-Vilar et al. (contribution 3) address the challenge of efficiently monitoring cold-water coral habitats in deep-sea environments. These habitats are difficult to assess due to limited visibility, light attenuation, and complex backgrounds, and traditional object detection methods are limited in terms of computational efficiency and real-time performance. To overcome these limitations, the authors applied the YOLOv8l-seg model to detect and segment coral species in underwater images. The experimental results demonstrate that YOLOv8l-seg is effective in monitoring cold-water coral species. Furthermore, the study revealed that coral distribution is highly uneven and strongly influenced by environmental conditions.
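For readers who wish to reproduce this kind of pipeline, the Ultralytics API makes fine-tuning and inference with YOLOv8l-seg straightforward. In the sketch below, coral.yaml and the image file are hypothetical placeholders for a dataset configuration and a survey frame; they are not the authors’ actual data.

```python
from ultralytics import YOLO

# Load pretrained segmentation weights and fine-tune on a coral dataset
# described by a (hypothetical) YAML configuration file.
model = YOLO("yolov8l-seg.pt")
model.train(data="coral.yaml", epochs=100, imgsz=640)

# Run inference on a survey frame and inspect detections and masks.
results = model("dive_frame.jpg")
for r in results:
    print(r.boxes.cls, r.boxes.conf)  # predicted classes and confidences
    masks = r.masks                   # per-instance segmentation masks
```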
Jiang et al. (contribution 4) propose an improved YOLOv8 model for identifying and classifying marine debris. Marine debris has caused significant environmental damage, underscoring the necessity of cleanup; however, its identification and classification are hindered by low underwater visibility, which slows down cleanup operations. To overcome these limitations, the authors integrate the Clo block transformer module into the YOLOv8 backbone network. This enhancement improves the extraction of both high- and low-frequency features from underwater debris images, thereby sharpening the perception of crucial image information, particularly for small, indistinct targets. Furthermore, the authors introduce a coarse-to-fine spatial and channel reconstruction module to reduce spatial and channel redundancy and enhance feature representation, handling the confusion caused by suspended matter and varying light intensities underwater. The experimental results show that the improved model outperforms the original YOLOv8 in marine debris detection tasks.
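The notion of treating high- and low-frequency content separately can be illustrated with a simple PyTorch block: a smoothed branch carries low-frequency context, while the residual carries high-frequency detail such as the edges of small targets. This is a generic sketch of the idea, not the authors’ transformer module.

```python
import torch.nn as nn
import torch.nn.functional as F

class FrequencySplitBlock(nn.Module):
    """Illustrative block: average pooling isolates low-frequency
    context; the residual (input minus smoothed input) carries
    high-frequency detail. Each part gets its own projection."""
    def __init__(self, channels):
        super().__init__()
        self.low_proj = nn.Conv2d(channels, channels, 1)
        self.high_proj = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        low = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        high = x - low  # high-frequency residual (edges, fine texture)
        return self.low_proj(low) + self.high_proj(high) + x
```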
Zheng et al. (contribution 5) propose Multi-gradient Feature Fusion YOLOv7 (MFF-YOLOv7) to address the challenges of target detection in underwater sonar images, including the complex underwater environment, low-quality sonar image data, and limited sample sizes. MFF-YOLOv7 makes several key modifications to the YOLOv7 model. First, its spatial pyramid pooling, channel shuffling, and pixel-level convolution are replaced with a multi-scale information fusion module that enhances multi-scale feature processing and reduces missed detections across target sizes. In addition, the authors introduce recurrent feature aggregation convolution to improve feature extraction and adaptability to noisy sonar images, allowing the model to better learn and represent target features. Furthermore, a spatial and channel synergistic attention mechanism is integrated to help the model focus on crucial features, boosting recognition accuracy and robustness in challenging underwater environments. The experimental results show that MFF-YOLOv7 achieves higher accuracy than other object detection approaches.
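The multi-scale fusion idea can be sketched as parallel convolutions with different receptive fields whose outputs are merged by a pointwise convolution, so that targets of various sizes contribute features. The branch count and dilation rates below are illustrative assumptions, not MFF-YOLOv7’s exact design.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Illustrative fusion module: parallel dilated convolutions see
    different receptive fields; a pointwise convolution merges them."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in dilations)
        self.fuse = nn.Conv2d(len(dilations) * out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```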
Yang et al. (contribution 6) address the challenges of target recognition in side-scan sonar (SSS) images, which suffer from distortion and noise that blur details and cause feature loss. Existing models are often difficult to deploy on edge devices due to their high computational complexity and resource consumption. To address these issues, the authors propose SS-YOLO, a lightweight deep learning model for SSS target detection that pursues both compactness and enhanced accuracy. SS-YOLO improves the YOLOv8 model by replacing the complex convolutional layer in the coarse-to-fine module with a combination of partial convolution and pointwise convolution, reducing redundant computation and memory access. Additionally, the authors integrate an adaptive scale spatial fusion module that uses 3D convolution to combine multi-scale feature maps, maximizing the extraction of invariant features and mitigating information loss. They also add an improved multi-head self-attention mechanism to the detection head, enhancing the model’s ability to focus on important features at low computational cost. Furthermore, the authors construct a new side-scan sonar dataset by combining self-collected and public data and expanding it through augmentation to overcome limited sample sizes. The experimental results demonstrate that SS-YOLO outperforms the original YOLOv8 model in terms of accuracy while maintaining lower model complexity.
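The partial-plus-pointwise convolution strategy can be sketched as follows: only a fraction of the channels pass through the spatial convolution, the remaining channels are carried over unchanged, and a cheap 1x1 convolution then mixes all channels. The channel ratio is an assumed hyperparameter, not SS-YOLO’s exact setting.

```python
import torch
import torch.nn as nn

class PartialConvBlock(nn.Module):
    """Illustrative block: a 3x3 convolution touches only `ratio` of
    the channels; the rest pass through untouched, and a 1x1
    convolution then mixes information across all channels."""
    def __init__(self, channels, ratio=0.25):
        super().__init__()
        self.conv_ch = max(1, int(channels * ratio))
        self.partial = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        head, tail = torch.split(
            x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return self.pointwise(torch.cat([self.partial(head), tail], dim=1))
```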
Lu et al. (contribution 7) address the challenges of underwater object detection caused by the limitations of sonar imaging, such as noise, low resolution, and the lack of texture and color information. To overcome these challenges and improve detection accuracy, the authors developed AquaYOLO, an enhanced version of YOLOv8. YOLOv8 consists of a backbone, a neck, and a prediction head: the backbone performs initial feature extraction, the neck refines and captures detailed features, and the prediction head identifies and localizes objects. The authors replace traditional convolutional layers in the backbone with residual blocks for improved feature extraction. In addition, they propose a dynamic selection aggregation module in the neck to dynamically fuse multi-layer features and enhance feature correlation. The experimental results show that AquaYOLO achieves superior performance on a custom sonar image dataset.
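A minimal residual block of the kind that can stand in for a plain convolutional layer in a detector backbone looks as follows; the skip connection lets the block refine features without discarding the original signal. This is a generic sketch, not AquaYOLO’s exact block.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic residual block: two conv-BN stages refine the features,
    and the identity skip preserves the original signal."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.body(x) + x)
```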
Lin et al. (contribution 8) address the challenges of underwater image degradation stemming from color attenuation, scattering, and noise from artificial illumination. In contrast to traditional GAN-based restoration models, which often require paired data or suffer from color inconsistencies, the authors propose a Dual-CycleGAN framework with dynamic guidance for robust underwater image restoration. The framework comprises two collaboratively trained CycleGANs: a Light Field CycleGAN, which generates enhanced light-field guidance images, and a Restoration CycleGAN, which performs the actual restoration. Integrating light-field information guides the restoration process and significantly improves color fidelity and structural detail. Training is supported by a comprehensive set of loss functions, including adversarial, cycle-consistency, perceptual, identity, patch-based contrast quality index, intermediate-output, and color-balance losses, which together ensure training stability and perceptual quality. Dual-CycleGAN achieves state-of-the-art performance with reduced computational complexity.
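In practice, such multi-term objectives are combined as a weighted sum. The sketch below shows this pattern with hypothetical weights; the actual balancing coefficients used for Dual-CycleGAN are not specified here.

```python
def total_loss(terms, weights):
    """Weighted sum of named loss tensors, e.g. terms/weights keyed by
    'adversarial', 'cycle', 'perceptual', 'identity', 'pcqi',
    'intermediate', 'color_balance'."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical usage with precomputed loss tensors and assumed weights:
# loss = total_loss(
#     {"adversarial": l_adv, "cycle": l_cyc, "perceptual": l_per,
#      "identity": l_idt, "color_balance": l_col},
#     {"adversarial": 1.0, "cycle": 10.0, "perceptual": 0.5,
#      "identity": 5.0, "color_balance": 1.0})
```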
Ma et al. (contribution 9) conducted a comparative study of a traditional physical-model-based method and four deep learning-based approaches for underwater image enhancement. The traditional method employed techniques such as color balance correction, LAB color space decomposition, adaptive histogram equalization, bilateral filtering, and Laplacian pyramid decomposition. The deep learning-based methods included Water-Net, UWCNN, UWCycleGAN, and the U-shape Transformer. Based on evaluations on the UIEB dataset, the authors conclude that traditional methods are more effective in shallow, stable water environments, while deep learning-based methods are better suited to diverse and dynamic underwater conditions. For future work, the authors suggest incorporating physical priors into deep learning architectures, developing lightweight models, and exploring adaptive enhancement strategies.
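Comparative studies of this kind typically report full-reference metrics on paired datasets such as UIEB. The sketch below shows a minimal PSNR/SSIM evaluation loop using scikit-image; the pairs argument is assumed to yield (enhanced, reference) uint8 RGB image arrays.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pairs):
    """Average PSNR/SSIM over an iterable of (enhanced, reference)
    uint8 RGB arrays, as in full-reference evaluation on UIEB."""
    psnrs, ssims = [], []
    for enhanced, reference in pairs:
        psnrs.append(peak_signal_noise_ratio(reference, enhanced))
        ssims.append(structural_similarity(reference, enhanced,
                                           channel_axis=-1))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```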