Article

Detection Method of Marine Biological Objects Based on Image Enhancement and Improved YOLOv5S

1 College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
2 School of Automation Engineering, Northeast Electric Power University, Jilin 132012, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(10), 1503; https://doi.org/10.3390/jmse10101503
Submission received: 7 September 2022 / Revised: 5 October 2022 / Accepted: 13 October 2022 / Published: 16 October 2022
(This article belongs to the Special Issue Advances in Ocean Monitoring and Modeling for Marine Biology)

Abstract:
Marine biological object detection is of great significance for the exploration and protection of underwater resources. Machine learning has achieved some success in the visual inspection of specific objects, but owing to the complex underwater imaging environment, these object detection methods suffer from problems such as low accuracy and poor real-time performance. To solve these problems, this paper proposes a detection method for marine biological objects based on image enhancement and an improved YOLOv5S. Contrast-limited adaptive histogram equalization is adopted to address underwater image distortion and blur, and we put forward an improved YOLOv5S to raise the accuracy and real-time performance of object detection. Compared with YOLOv5S, the improved model adds coordinate attention and adaptive spatial feature fusion, which accurately locate the targets of interest and fully fuse features of different scales. In addition, soft non-maximum suppression replaces non-maximum suppression to improve the detection of overlapping objects. The experimental results show that the contrast-limited adaptive histogram equalization algorithm effectively improves underwater image quality and detection accuracy. Compared with the original model (YOLOv5S), the proposed algorithm achieves higher detection accuracy: AP50 reaches 94.9% at a detection speed of 82 frames per second, a high level of real-time performance.

1. Introduction

Oceans cover about 70 percent of the Earth's surface; they are rich in natural resources and play an important part in the existence and development of mankind [1]. With the continuous development of marine information technology and the increasing demands of human life, the exploration and development of marine resources have recently become a focus for many countries [2]. Marine biological detection technology is regarded as a critical link in the development of marine resources. It is of great significance for rescue missions, artificial structure inspection, ecological monitoring and marine life tracking [3]. In practical applications, the complexity of the underwater environment increases the difficulty of marine biological detection, so research on marine object detection methods is particularly important.
The underwater environment poses two main difficulties: (i) wavelength-selective attenuation reduces and distorts the contrast between objects and backgrounds [4]; (ii) polarization and atomization effects may cause false positives [5]. These difficulties cause most underwater object detection methods to perform poorly [6]. The key technology of underwater object detection based on optical images usually comprises image preprocessing and object detection. Underwater image preprocessing methods are divided into underwater image enhancement methods and underwater image restoration methods [7]. Underwater image enhancement methods use only computer graphics techniques to improve image clarity; the algorithm does not need to consider the specific physical imaging process. Underwater image restoration methods aim to invert the underwater imaging model to obtain the restored underwater image [8], so their algorithms depend on an underwater image degradation model. Marine biological detection algorithms are generally divided into two categories: traditional object detection algorithms and convolutional neural network-based object detection algorithms. Traditional object detection consists of three parts: image preprocessing, image feature extraction and classification. In practical applications, the complexity of the underwater environment increases the difficulty of marine biological detection [9].
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction [10]. With the rapid development of deep learning, more powerful tools have been introduced to solve the problems of traditional architectures. It is therefore widely used in vision tasks such as image and video semantic segmentation [11], object detection [12] and disease prediction [13]. Along with the great progress of computer vision technology, multiple studies have gradually introduced deep learning into underwater object classification and detection, which provides a new direction for marine biological detection. Object detection algorithms based on deep learning are currently the mainstream detection algorithms [14].
Regardless of whether traditional methods or deep learning-based algorithms are used, the detection effect is usually poor for images with low quality, blurred backgrounds or complex backgrounds. The detection of underwater biological objects requires both high accuracy and strong real-time performance, and common detection algorithms struggle to balance the two. In this paper, candidate object detection models were selected through a literature review, and a large number of simulation experiments were carried out on them. YOLOv5S was found to have clear advantages in both accuracy and speed; thus, YOLOv5S is chosen as the base method for marine biological object detection.
To give YOLOv5S a better detection effect, this paper first addresses the problems of low underwater image quality, background blurring and background complexity. By comparison with other traditional unsupervised preprocessing algorithms, contrast-limited adaptive histogram equalization (CLAHE) is selected for ocean image enhancement to improve image brightness, correct the blue-green color cast, and enhance image contrast. At the same time, to give YOLOv5S better model capability, a coordinate attention (CA) mechanism and adaptive spatial feature fusion (ASFF) are added to YOLOv5S, which fully fuse features of different scales and thus improve the performance of the detection model. Soft non-maximum suppression (Soft-NMS) is used instead of non-maximum suppression (NMS) to enhance the detection of overlapping objects.
The rest of this paper is organized as follows: Section 2 briefly introduces the related work. Section 3 describes the image preprocessing methods in detail. Section 4 introduces the YOLOv5S object detection model and the improvement strategies. Section 5 presents the experimental research and the analysis of the experimental results on the test dataset. Section 6 concludes the work. The result is a new method based on image enhancement and an improved YOLOv5S, applied to marine biology detection, that achieves detection with excellent accuracy and outstanding real-time performance.

2. Related Work

Due to the special underwater imaging environment, underwater image processing is of great significance for marine biological object detection. Most underwater images suffer from blur and color distortion, which are caused by light scattering and absorption. When light propagates in water, it scatters when it meets suspended particles: forward scattering blurs the details of the image, while backscattering causes foggy blur and reduces the image contrast [15]. In addition, because light is selectively absorbed by water, red light with its longer wavelength attenuates the fastest and shorter-wavelength blue light propagates the farthest, so underwater images often appear blue-green. These two characteristics of underwater images have a great impact on object detection; therefore, enhancing and restoring the original underwater image is essential. Conventional unsupervised image processing methods can generally be divided into two categories: non-physical-model enhancement methods and physical-model-based restoration methods [16]. Underwater image enhancement methods do not need to model the physical imaging process or obtain the optical parameters of the water body in advance [17]. They directly apply image processing techniques to improve image clarity, mainly including histogram-based enhancement algorithms, Retinex-based enhancement algorithms, filtering and information-processing-based enhancement algorithms, image-fusion-based enhancement algorithms and convolutional neural network (CNN)-based enhancement algorithms [18]. Pizer et al. [19] proposed adaptive histogram equalization (AHE), which redistributes image brightness by computing local histograms, changing the image contrast and improving local contrast. To improve on AHE, Zuiderveld [20] proposed contrast-limited adaptive histogram equalization. Jobson et al. proposed a multi-scale Retinex enhancement algorithm [21] and a multi-scale Retinex with color restoration (MSRCR) enhancement algorithm [22]. Prior knowledge is often exploited in underwater image restoration, as in the dark channel prior algorithm proposed by He et al. [23]. Beyond the dark channel prior, Carlevaris-Bianco et al. [24] proposed a new prior that computes the transmittance map from the difference in attenuation between color channels. Wang et al. [25] proposed the adaptive attenuation-curve prior (AACP) method, and Shin et al. [26] proposed a common convolutional architecture that learns the transmission map and background light of an underwater image simultaneously to realize underwater image restoration. To further improve the restoration effect, Wang et al. [27] proposed a parallel convolutional neural network with two branches that estimate the transmission map and the background light; these are then fed into the underwater imaging model to restore the underwater image.
The development of marine biological detection has paralleled that of general object detection. Traditional underwater object detection methods are mostly based on SIFT [28], HOG [29] and shape features [30]. With the development of deep learning, CNN-based object detection methods have greatly improved detection accuracy. CNN-based algorithms fall into two major classes: two-stage and one-stage detection [31]. Two-stage detection algorithms divide the problem into two phases: the first generates candidate regions, and the second classifies and refines them. Such algorithms include R-CNN [32], Fast R-CNN [33], Faster R-CNN [34] and Cascade R-CNN [35]. Two-stage detectors achieve the most accurate results, but their prediction speed is slow and their hardware requirements are high, which makes real-time requirements difficult to meet. One-stage object detection algorithms detect and classify simultaneously and directly output the classification probabilities and location coordinates of the targets. Typical models, such as the YOLO series [36,37,38,39], SSD [40], RetinaNet [41], FreeAnchor [42], FSAF [43] and FCOS [44], are slightly less accurate than two-stage models, but their greatest advantage is much higher speed and real-time predictive power. Yang et al. [45] used two representative detection algorithms, YOLOv3 and Fast R-CNN, one from each major class, for underwater object detection. Song et al. [46] proposed an automatic real-time underwater object detection method based on an improved convolutional neural network. Salman et al. [47] proposed an automatic method to detect and locate fish instances in unconstrained underwater video using a deep R-CNN network. Cao et al. [48] proposed a real-time, lightweight multi-scale object detector for underwater live crabs. Liu et al. [49] developed a novel method for capturing feature information by adding a convolutional block attention module to the YOLOv5 backbone network, with a self-adaptive global histogram stretching algorithm designed to eliminate degradation problems. Ref. [50] proposed an improved YOLOv3 real-time detection model, YOLOv3-DPFIN, to achieve accurate detection of multi-level sonar objects. Sung et al. [51] proposed a CNN-based method to detect and remove crosstalk noise in forward-scan sonar images. Hu et al. [52] proposed ShrimpNet, a CNN for shrimp recognition, but ShrimpNet has few layers and insufficient ability to extract deep features, so it struggles to detect shrimp of different sizes. Because underwater biological detection mostly involves small objects, features must be extracted deeply and fused thoroughly. Ref. [53] proposed YOLO-SC (YOLO-Submarine Cable), an improved detection method based on the YOLO-V3 algorithm, built a testing environment for submarine cables, and created a submarine cable image dataset.

3. Image Enhancement Algorithm

CLAHE is an image enhancement algorithm derived from adaptive histogram equalization (AHE). AHE changes image contrast by computing local histograms and redistributing brightness; it can recover more image detail and improve the local contrast of the image. The CLAHE mechanism is as follows.

3.1. Contrast Ratio

Contrast refers to the measurement of different brightness levels between the brightest white and the darkest black in the light and dark areas of an image. The image contrast is measured by the grayscale range, which can be obtained by observing the grayscale histogram. The larger the grayscale range, the higher the contrast. The most commonly used quantitative measurement method is Michelson contrast. Michelson contrast is defined by Formula (1).
$$C = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}} \qquad (1)$$

In Formula (1), $I_{\max}$ represents the brightest luminance and $I_{\min}$ the darkest luminance.
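As a minimal illustration of Formula (1), the following Python sketch (a hypothetical helper, not from the paper) computes the Michelson contrast of a grayscale image stored as a NumPy array:

```python
import numpy as np

def michelson_contrast(gray: np.ndarray) -> float:
    """Michelson contrast of a grayscale image, per Formula (1)."""
    i_max, i_min = float(gray.max()), float(gray.min())
    if i_max + i_min == 0:  # all-black image: avoid division by zero
        return 0.0
    return (i_max - i_min) / (i_max + i_min)

# A uniform image has contrast 0; a pure black-and-white image has contrast 1.
print(michelson_contrast(np.array([[0, 255], [0, 255]], dtype=np.uint8)))  # 1.0
```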

3.2. Histogram Equalization

Histogram equalization is often used to increase the global contrast of an image, especially when the useful data of the image occupy a narrow range of gray values. It redistributes brightness more evenly over the histogram, enhancing areas of low contrast by effectively spreading out the most frequently occurring intensity values.
Suppose the input image is $I$ with height $H$ and width $W$. Let $\mathrm{Hist}_I$ denote the gray histogram of $I$, where $\mathrm{Hist}_I(k)$ is the number of pixels with gray value equal to $k$, $k \in [0, 255]$. Global histogram equalization maps image $I$ so that the gray histogram $\mathrm{Hist}_O$ of the output image $O$ is equalized, i.e., the number of pixels at each gray level is approximately equal, as per Formula (2):

$$\mathrm{Hist}_O(k) \approx \frac{H \times W}{256}, \quad k \in [0, 255] \qquad (2)$$

Then, for any gray level $p$ ($0 \le p \le 255$), a $q$ ($0 \le q \le 255$) can always be found such that $\sum_{k=0}^{p} \mathrm{Hist}_I(k) = \sum_{k=0}^{q} \mathrm{Hist}_O(k)$, where $\sum_{k=0}^{p} \mathrm{Hist}_I(k)$ and $\sum_{k=0}^{q} \mathrm{Hist}_O(k)$ are the cumulative histograms of $I$ and $O$, respectively. Because $\mathrm{Hist}_O(k) \approx \frac{H \times W}{256}$, Formula (3) is obtained:

$$\sum_{k=0}^{p} \mathrm{Hist}_I(k) \approx (q + 1) \times \frac{H \times W}{256} \qquad (3)$$

Simplifying Formula (3) yields Formula (4):

$$q \approx \frac{\sum_{k=0}^{p} \mathrm{Hist}_I(k)}{H \times W} \times 256 - 1 \qquad (4)$$

This gives a mapping from an input pixel with gray level $p$ to an output pixel with gray level $q$, from which Formula (5) follows:

$$O(r, c) = \frac{\sum_{k=0}^{I(r, c)} \mathrm{Hist}_I(k)}{H \times W} \times 256 - 1 \qquad (5)$$

where $I(r, c)$ is the gray value at row $r$ and column $c$ of input image $I$, and $O(r, c)$ is the gray value at the corresponding position of output image $O$, with $0 \le r < H$ and $0 \le c < W$.
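A minimal NumPy sketch of the mapping in Formula (5) is given below; the function name is ours, and the rounding and clipping details are implementation choices rather than part of the derivation:

```python
import numpy as np

def equalize_gray(image: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image (H, W),
    implementing the mapping O(r, c) of Formula (5)."""
    h, w = image.shape
    hist, _ = np.histogram(image, bins=256, range=(0, 256))  # Hist_I(k)
    cdf = np.cumsum(hist)                                    # cumulative histogram
    # q ~= 256 * CDF(p) / (H * W) - 1, clipped to the valid gray range
    mapping = np.clip(np.round(cdf * 256.0 / (h * w)) - 1, 0, 255).astype(np.uint8)
    return mapping[image]                                    # O(r, c) = mapping[I(r, c)]
```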

3.3. Local Processing

As described above, histogram equalization processes the image globally, but directly equalizing the whole image is unreasonable when its parts differ significantly: for some images, global histogram equalization introduces many obvious noise points and even weakens image details, paradoxically making the image appear worse. The adaptive histogram equalization (AHE) algorithm was developed to address this. Under histogram equalization, the gray value $I(r, c)$ of the pixel at row $r$ and column $c$ is mapped by Formula (5); AHE builds on this by dividing the image into blocks that are processed separately, so that each block is equalized with its own statistical distribution function.

3.4. Contrast Limit

AHE applies no transitional treatment at the edges between adjacent blocks and tends to over-amplify the image. Without noise, the gray histogram of each small block is confined to a small grayscale range; if noise is present, however, it is amplified when each block's histogram is equalized. To avoid this, contrast-limited adaptive histogram equalization was proposed on the basis of AHE: if the histogram exceeds the preset contrast limit, it is clipped, the clipped portion is evenly redistributed over the remaining bins, and the histogram is thereby reconstructed.
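For reference, CLAHE is available in OpenCV; the sketch below applies it to the lightness channel in LAB space to limit color shifts. The paper does not specify its exact channel strategy or clip limit, so the function and parameter values here are illustrative assumptions:

```python
import cv2

def clahe_enhance(bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to the L channel of a BGR image.

    clip_limit plays the role of the preset contrast limit described
    above; tile_grid sets the block division used by AHE/CLAHE.
    """
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

# Usage: enhanced = clahe_enhance(cv2.imread("underwater.jpg"))
```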

4. The Improved YOLOv5S Object Detection Model

YOLOv5 is a one-stage object detection model based on convolutional neural networks, available in four variants: YOLOv5S, YOLOv5M, YOLOv5L and YOLOv5X. Considering model size and real-time performance, YOLOv5S is used as the detection network in this paper. At the same time, to give YOLOv5S better model capability, a coordinate attention (CA) mechanism and adaptive spatial feature fusion (ASFF) are added, which fully fuse features of different scales and thus improve the performance of the detection model. Soft non-maximum suppression (Soft-NMS) is used instead of non-maximum suppression (NMS) to enhance the detection of overlapping objects.

4.1. YOLOv5S

YOLOv5S is the model in the YOLOv5 series with the smallest depth and the smallest feature-map width. It meets our accuracy requirements and is much faster than the other variants, which is why it was selected. The YOLOv5S model structure is shown in Figure 1.
As can be seen from Figure 1, the YOLOv5 model mainly consists of three parts: the BACKBONE, the PAFPN and the HEAD. A classification network with outstanding performance is generally used as the BACKBONE to extract common feature representations. Compared with the previous YOLO series, YOLOv5 retains the CSPDarknet53 structure and adds Focus and SPP structures to form the skeleton. The neck network of YOLOv5 still uses the FPN+PAN structure, named PAFPN, and borrows the CSP-Net structure to enhance the network's feature fusion ability. The HEAD produces the object detection outputs; different detection algorithms have different output dimensions, but they usually include classification and regression parts. Furthermore, YOLOv5 replaces the Smooth L1 loss with the GIoU loss to further improve detection accuracy.

4.2. Improvement Strategy

In this paper, three improvement measures are proposed for YOLOv5S:
(1) ASFF is added before the detection layer to achieve the cross-layer fusion of the feature layer, make full use of the features of different scales, and improve the model detection effect.
Many models use an FPN to capture multi-level features, taking advantage of the semantic information of high-level features and the fine-grained detail of low-level features; however, models such as YOLOv3 and RetinaNet fuse levels through direct connections or element-wise additions, which do not take full advantage of features at different scales. ASFF adaptively learns spatial weights for fusing feature maps at different scales and achieves a better fusion effect. In this paper, ASFF is added after the PAFPN layer for further feature fusion. The ASFF structure is shown in Figure 2.
The features of each layer can be fused according to Formula (6):
$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{l_1} + \beta_{ij}^{l} \cdot x_{ij}^{l_2} + \gamma_{ij}^{l} \cdot x_{ij}^{l_3} \qquad (6)$$

where $x_{ij}$ stands for input features; for example, $x_{ij}^{l_1}$ represents the feature at location $(i, j)$ of the level-1 feature map resized to level $l$. $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$ and $\gamma_{ij}^{l}$ are the weights at the corresponding positions of the feature maps, satisfying $\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1$; these three parameters are learned adaptively by the network. $y_{ij}$ represents the output feature, and $y_{ij}^{l}$ the output at location $(i, j)$ of the $l$-th ASFF layer.
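A minimal PyTorch sketch of this fusion rule is given below. It assumes the three input maps have already been resized and projected to a common shape (B, C, H, W), and it uses 1×1 convolutions plus a softmax to produce the per-position weights, which enforces the constraint of Formula (6) that the weights sum to 1; the class name and layer choices are ours, simplified from the original ASFF design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFFuse(nn.Module):
    """Adaptive spatial feature fusion for one output level, per Formula (6)."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per level yields a scalar weight logit at each position.
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)
        )

    def forward(self, x1, x2, x3):
        levels = (x1, x2, x3)  # each already resized to (B, C, H, W)
        logits = torch.cat(
            [conv(x) for conv, x in zip(self.weight_convs, levels)], dim=1
        )  # (B, 3, H, W)
        weights = F.softmax(logits, dim=1)  # alpha + beta + gamma = 1 per (i, j)
        return sum(weights[:, n:n + 1] * levels[n] for n in range(3))
```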
(2) Coordinate attention is introduced after the PAFPN layer. An attention mechanism, widely used in deep neural networks, lets the model pick out the content that deserves more attention. Compared with previous attention methods for lightweight networks, coordinate attention has two advantages. First, it captures both direction-aware and position-aware information while extracting cross-channel information, which lets the model locate and identify objects of interest more accurately. Second, coordinate attention is flexible and lightweight and can easily be plugged into classical modules, bringing large benefits to downstream tasks. CA uses precise position information to encode channel relationships and long-range dependencies. Similar to the SE module, CA proceeds in two steps: coordinate information embedding first, then coordinate attention generation. Its structure is described in detail in Figure 3.
The CA module decomposes channel attention into two parallel 1D feature encoding processes, so the attention maps it generates effectively integrate spatial coordinate information. Specifically, each channel of the input X is encoded along the horizontal and vertical coordinates through pooling kernels of dimensions (H, 1) and (1, W), respectively. The output of channel $c$ at height $h$ is therefore given below.
$$z_c^{h}(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i) \qquad (7)$$

Similarly, the output of the $c$-th channel at width $w$ is given below:

$$z_c^{w}(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) \qquad (8)$$
These two transformations aggregate features along the two spatial directions, yielding a pair of direction-aware feature maps. Direction-specific information is then embedded into these two feature maps, which are encoded into two attention maps, each capturing long-range dependencies of the input feature map along one spatial direction. The representation of the feature map is finally enhanced by multiplying the input feature map by the two attention maps, which preserve the position information.
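The following PyTorch sketch follows the two-step description above (pooling per Formulas (7) and (8), then attention generation). The reduction ratio, activation function and omission of normalization layers are our assumptions for brevity, not details reported in the paper:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate attention: pool along H and W, encode jointly, then
    re-weight the input with two direction-aware attention maps."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        hidden = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (H, 1) pooling, Formula (7)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (1, W) pooling, Formula (8)
        self.conv1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        _, _, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w  # broadcasted direction-aware re-weighting
```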
(3) Soft-NMS is used to improve the detection of overlapping objects; the specific steps are shown in Algorithm 1. Non-maximum suppression (NMS) plays a significant role in object detection: detection boxes with high confidence are kept, while false boxes with low confidence are suppressed. When the model output is parsed into object boxes, there are generally many boxes, depending on the number of anchors, and many duplicates lie on the same object; NMS removes these duplicates to obtain the real object boxes. The NMS algorithm first sorts the candidate boxes by score from high to low, then selects the box M with the highest score and suppresses all other boxes that overlap significantly with M. The remaining boxes are processed recursively by the same procedure. By this principle, a genuinely present object whose box overlap exceeds the preset threshold may go undetected: when two object boxes are close together, the box with the lower score is deleted because the overlap is too large.
The biggest problem of NMS is that it forces the scores of neighboring low-confidence detection boxes to zero, so when objects of the same kind are dense and occlude one another, detections are easily missed. Soft-NMS addresses this shortcoming: instead of directly deleting a bounding box whose IoU with the selected box exceeds the threshold, it reduces that box's confidence score. This improves the model's ability to detect overlapping, occluded objects without increasing the computational cost. The improved YOLOv5S structure is shown in Figure 4.
Algorithm 1: Soft non-maximum suppression.
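Algorithm 1 is reproduced in the original article as a figure; the Python sketch below illustrates the Gaussian score-decay form of Soft-NMS described above. The function names and hyperparameter values are ours, for illustration only:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of boxes that overlap the
    current top box instead of deleting them outright."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    keep, candidates = [], list(range(len(scores)))
    while candidates:
        top = max(candidates, key=lambda i: scores[i])  # highest-scoring box M
        keep.append(top)
        candidates.remove(top)
        for i in candidates:
            scores[i] *= np.exp(-iou(boxes[top], boxes[i]) ** 2 / sigma)
        candidates = [i for i in candidates if scores[i] > score_thresh]
    return keep  # indices of retained boxes, in descending score order
```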

5. Experimental Research and Result Analysis

5.1. Dataset Establishment

(A) Dataset. This paper uses the underwater biological image dataset, which was officially supplied by the China Underwater Vehicle (Zhanjiang) Competition in 2020. The dataset contains 5534 images of holothurian, echinus, starfish and scallops, as shown in Figure 5.
(B) Dataset image enhancement. Because many images in the original dataset have low contrast and color distortion, some poor-quality images were selected for enhancement using the CLAHE underwater image enhancement algorithm. The enhanced images are compared with the originals in Figure 6: the RGB three-channel histogram of each enhanced image is stretched, and both image contrast and image quality are improved. The enhanced images and the original images together constitute the dataset of this paper, totaling 8938 images.

5.2. Model Training

Transfer learning was used as the training method: the model was pretrained on the standard COCO dataset and then trained for 100 epochs on the enhanced sample dataset described above. In addition to the image enhancement described above, cutout, random cropping, mosaic, flipping and rotation were used during training preprocessing. The optimizer was SGD, the initial learning rate was set to 0.001, the input image shape was set to 640 × 640 and the batch size was set to 16; the learning rate was decreased to 10% of its value at epochs 75 and 90. The dataset was divided with (train + val):test = 9:1 and train:val = 9:1, giving 7241 images in the training set, 804 images in the cross-validation set and 893 images in the test set. The test set was independent of the validation set, and test images were not used during training.
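The paper does not publish its training script; as a rough sketch, the optimizer and step schedule described above could be expressed in PyTorch as follows (the stand-in module, momentum value and elided loop body are our assumptions):

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # stand-in for the YOLOv5S network
# SGD with the reported initial learning rate of 0.001; momentum is assumed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Decay the learning rate to 10% of its value at epochs 75 and 90.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[75, 90], gamma=0.1
)

for epoch in range(100):
    # ... one pass over the 7241-image training set with batch size 16,
    # calling optimizer.step() per batch ...
    scheduler.step()
```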

5.3. Experimental Results

5.3.1. Validation of Image Enhancement

The CLAHE, RGHS and ACE image enhancement algorithms and the DCP image restoration method based on a physical model were compared in this paper. The image processing effects are shown in Figure 7.
From Figure 7, it can be seen that the contrast enhancement after ACE and CLAHE processing is better than that of the other methods, but some color deviation remains after ACE processing; after DCP processing, the image contrast is reduced and the bluish-green cast becomes more serious. After RGHS treatment, the brightness of the image improves, but some images are distorted and show a reddish color.
To verify the impact of different image enhancement algorithms on detection results, the images enhanced by each algorithm were used to train the YOLOv5S model, which was then tested on the test set. The experimental results are shown in Table 1. Compared with the original images, the detection accuracy of the model improves after CLAHE, RGHS, ACE and DCP preprocessing, especially for holothurian: the CLAHE algorithm improved the holothurian AP50 by 3.6%, the ACE algorithm by 2.5%, the RGHS algorithm by 2.3% and the DCP algorithm by 1.9%. Overall, the CLAHE algorithm achieved the best detection results: compared with the original images, AP50 improved by 2.6% and AP by 2.5%, which fully demonstrates the effectiveness of the CLAHE algorithm for underwater image enhancement.

5.3.2. Validation of Improved Algorithm

In this section, the influence of the improvement strategies on the model is verified in terms of detection AP. The experimental results are shown in Table 2. The detection accuracy of the model improved by the three methods is significantly higher than that of the original YOLOv5S. After adding ASFF, AP50 increased by 1.9% and AP by 5.6%; after further adding Soft-NMS, AP50 increased by 0.8% and AP by 1.4%; finally, after adding the CA module, AP50 increased by 1.3% and AP by 2.1%. Combining the three improvements, the AP50 of the improved YOLOv5S is 4% higher than that of YOLOv5S, and the AP is 9.1% higher.
The P-R diagrams of the models with ASFF, Soft-NMS and CA added in sequence are shown in Figure 8. As can be seen from the figure, the detection accuracy for holothurians, starfish and scallops improves significantly, especially for holothurians: ASFF alone remarkably improved the holothurian AP50, by 4.3%, and the other two improvements further enhanced detection ability. Together, the three improvements raised the AP50 of the improved YOLOv5S by 7.2% for holothurians, 5.9% for scallops and 2.6% for starfish. The final detection accuracies for holothurians, echinus, scallops and starfish are 94.6%, 93.0%, 96.0% and 95.8%, respectively.

5.3.3. Comparison of Detection Effects of Different Models

To evaluate the detection performance of the improved YOLOv5S, two-stage architectures (Faster R-CNN, Cascade R-CNN) and one-stage architectures (RetinaNet, FreeAnchor, FSAF, FCOS), all with a ResNet-50 backbone, together with the YOLO series, are selected for comparison. The experimental results on the test dataset are shown in Table 3. The results show that the YOLO series has obvious advantages in marine biological detection: whether two-stage or one-stage, the other architectures have no precision advantage over the YOLO series, while in terms of speed the YOLO series is much faster. In particular, the improved YOLOv5S not only surpasses the other one-stage and two-stage models in AP50 but also runs 3–5 times faster. Within the YOLO series, the AP50 of the improved YOLOv5S is 94.9% and the AP is 62.8%, which are 4% and 9.1% higher than YOLOv5S before improvement, while the detection rate in frames per second is only slightly lower than before. The detection accuracy of the improved YOLOv5S is equivalent to that of YOLOv3, with a model volume of only 7% of YOLOv3's and a detection speed more than twice as fast (Table 3). YOLOv4 performs poorly on the dataset in this paper; compared with YOLOv4, the improved YOLOv5S has great advantages in accuracy and speed, with a model volume of only 8% of YOLOv4's. Although the detection accuracy of YOLOv5M is slightly better than that of the improved YOLOv5S, the improved YOLOv5S model is only half the volume of YOLOv5M and runs faster. Overall, the improved YOLOv5S achieves good detection results; in particular, the AP has increased by 9.1% over the unimproved model. This is because adding CA lets the model extract more important and relevant features, while adding ASFF fuses feature maps of different scales across layers so that the network obtains more informative feature maps and regresses more accurate detection boxes. Most objects in this paper's dataset are dense small objects with serious overlapping occlusion; the addition of Soft-NMS improves the detection of overlapping and occluded objects, which is another important reason for the sharp increase in AP.
Figure 9 compares the detection effects of the different models. As the figure shows, the YOLOv3 model produces some false and missed detections: a starfish is misdetected and a holothurian is missed in Figure 9b. YOLOv5S also misses detections; the holothurian is missed in Figure 9b. YOLOv4, YOLOv5M and the improved YOLOv5S all detect the objects completely, but the improved YOLOv5S regresses boxes more accurately and with higher confidence.

6. Conclusions

To realize the rapid detection of underwater biological images with low definition, low contrast and low quality, an improved model based on YOLOv5S is proposed for effective real-time detection of underwater objects. In this paper, the CLAHE algorithm is used to enhance the underwater dataset. Within the model, ASFF is used to fuse feature maps more fully, the CA attention mechanism is introduced, and Soft-NMS is used to strengthen the model's ability to detect overlapping objects. Comparative experiments between the improved YOLOv5S and other models show that its detection accuracy is greatly increased over the original YOLOv5S: the AP reached 62.8%, an increase of 9.1%, and AP50 reached 94.9%, an increase of 4%. The model volume is only 20.4 megabytes, the detection speed has great advantages over other models, and the FPS reaches 82. The improved YOLOv5S model has strong extensibility: it can be applied to the detection and identification of other marine organisms, promotes the identification and detection of marine biological resources, and lays a foundation for future deployment on mobile devices.

Author Contributions

Conceptualization, P.L. and W.R.; methodology, P.L., Y.F. and Z.C.; software, Y.F. and Z.C.; writing—original draft preparation, P.L. and Y.F.; writing—review and editing, Z.L. and W.R.; funding acquisition, Z.L. and W.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Nature Science Foundation of China (51879060), the Natural Science Foundation of Heilongjiang Province (LH2021F017), the Fundamental Research Funds for the Central Universities (3072022TS0402, 3072022JC0404) and the Research Initiation Project of Northeast Electric Power University (BSJXM-2021022).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AHE: Adaptive histogram equalization
CLAHE: Contrast-limited adaptive histogram equalization
CA: Coordinate attention
ASFF: Adaptive spatial feature fusion
NMS: Non-maximum suppression
Soft-NMS: Soft non-maximum suppression
FPS: Frames per second
AP: Average precision
CNN: Convolutional neural network

References

  1. McLellan, B.C. Sustainability assessment of deep ocean resources. Procedia Environ. Sci. 2015, 28, 502–508. [Google Scholar] [CrossRef] [Green Version]
  2. Lu, H.; Dong, W.; Li, Y.; Li, J.; Humar, I. CONet: A Cognitive Ocean Network. Wirel. Commun. IEEE 2019, 26, 90–96. [Google Scholar] [CrossRef] [Green Version]
  3. Lu, H.; Uemura, T.; Wang, D.; Zhu, J.; Huang, Z.; Kim, H. Deep-Sea Organisms Tracking Using Dehazing and Deep Learning. Mob. Netw. Appl. 2018, 25, 1008–1015. [Google Scholar] [CrossRef]
  4. Zhou, J.C.; Zhang, D.H.; Zhang, W.S. Classical and state-of-the-art approaches for underwater image defogging: A comprehensive survey. Front. Inf. Technol. Electron. Eng. 2020, 21, 1745–1769. [Google Scholar] [CrossRef]
  5. Kuanar, S.; Mahapatra, D.; Bilas, M.; Rao, K.R. Multi-path dilated convolution network for haze and glow removal in nighttime images. Vis. Comput. 2021, 38, 1121–1134. [Google Scholar] [CrossRef]
  6. Zhou, J.; Liu, Z.; Zhang, W.; Zhang, W. Underwater image restoration based on secondary guided transmission map. Multimed. Tools Appl. 2021, 80, 7771–7788. [Google Scholar]
  7. Wang, Y.; Song, W.; Fortino, G.; Qi, L.; Zhang, W.; Liotta, A. An Experimental-based Review of Image Enhancement and Image Restoration Methods for Underwater Imaging. IEEE Access 2019, 2019, 140233–140251. [Google Scholar] [CrossRef]
  8. Wang, R.; Wang, Y.; Zhang, J.; Fu, X. Review on Underwater Image Restoration and Enhancement Algorithms. In ICIMCS ’15: Proceedings of the 7th International Conference on Internet Multimedia Computing and Service; Association for Computing Machinery: New York, NY, USA, 2015; p. 6. [Google Scholar]
  9. Mliki, H.; Dammak, S.; Fendri, E. An improved multi-scale face detection using convolutional neural network. Signal Image Video Process. 2020, 14, 1345–1353. [Google Scholar] [CrossRef]
  10. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  11. Garcia-Garcia, A.; Orts, S.; Oprea, S.; Villena-Martinez, V.; Martinez-Gonzalez, P.; Rodríguez, J.G. A survey on deep learning techniques for image and video semantic segmentation. Appl. Soft Comput. 2018, 70, 41–65. [Google Scholar] [CrossRef]
  12. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Networks Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef] [PubMed]
  13. Dimitri, G.M.; Spasov, S.; Duggento, A.; Passamonti, L.; Lió, P.; Toschi, N. Multimodal and multicontrast image fusion via deep generative models. Inf. Fusion 2022, 88, 146–160. [Google Scholar] [CrossRef]
  14. Wang, X.; Yang, J. Marathon athletes number recognition model with compound deep neural network. Signal Image Video Process. 2020, 14, 1379–1386. [Google Scholar] [CrossRef]
  15. Liang, Z.; Wang, Y.; Ding, X.; Mi, Z.; Fu, X. Single Underwater Image Enhancement by Attenuation Map Guided Color Correction and Detail Preserved Dehazing. Neurocomputing 2021, 425, 160–172. [Google Scholar] [CrossRef]
  16. Han, F.; Yao, J.; Zhu, H.; Wang, C. Underwater Image Processing and Object Detection Based on Deep CNN Method. J. Sens. 2020, 2020, 1–20. [Google Scholar] [CrossRef]
  17. Jian, M.; Liu, X.; Luo, H.; Lu, X.; Dong, J. Underwater image processing and analysis: A review. Signal Process. Image Commun. 2021, 91, 116088. [Google Scholar] [CrossRef]
  18. Anwar, S.; Li, C. Diving deeper into underwater image enhancement: A survey. Signal Process. Image Commun. 2020, 89, 115978. [Google Scholar] [CrossRef]
  19. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368. [Google Scholar] [CrossRef]
  20. Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization—ScienceDirect. In Graphics Gems; Elsevier: Amsterdam, The Netherlands, 1994; pp. 474–485. [Google Scholar]
  21. Jobson, D.J.; Rahman, Z.U.; Woodell, G.A. Properties and performance of a center/surround retinex. IEEE Trans. Image Process. 1997, 6, 451–462. [Google Scholar] [CrossRef]
  22. Jobson, D.J.; Rahman, Z.U.; Woodell, G.A. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 1997, 6, 965–976. [Google Scholar] [CrossRef] [Green Version]
  23. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar]
  24. Carlevaris-Bianco, N.; Mohan, A.; Eustice, R.M. Initial results in underwater single image dehazing. In Oceans 2010 Mts/IEEE Seattle; IEEE: Piscataway, NJ, USA, 2010; pp. 1–8. [Google Scholar]
  25. Wang, Y.; Liu, H.; Chau, L.P. Single underwater image restoration using adaptive attenuation-curve prior. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 65, 992–1002. [Google Scholar] [CrossRef]
  26. Shin, Y.S.; Cho, Y.; Pandey, G.; Kim, A. Estimation of ambient light and transmission map with common convolutional architecture. In OCEANS 2016 MTS/IEEE Monterey; IEEE: Piscataway, NJ, USA, 2016; pp. 1–7. [Google Scholar]
  27. Wang, K.; Hu, Y.; Chen, J.; Wu, X.; Zhao, X.; Li, Y. Underwater image restoration based on a parallel convolutional neural network. Remote Sens. 2019, 11, 1591. [Google Scholar] [CrossRef] [Green Version]
  28. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  29. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
  30. Priyadharsini, R.; Sharmila, T.S. Object detection in underwater acoustic images using edge based segmentation method. Procedia Comput. Sci. 2019, 165, 759–765. [Google Scholar] [CrossRef]
  31. Cheng, R. A survey: Comparison between Convolutional Neural Network and YOLO in image identification. J. Physics: Conf. Ser. 2020, 1453, 012139. [Google Scholar] [CrossRef] [Green Version]
  32. Pedersen, M.; Bruslund Haurum, J.; Gade, R.; Moeslund, T.B. Detection of marine animals in a new underwater dataset with varying visibility. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–7 June 2019; pp. 18–26. [Google Scholar]
  33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
  34. Siddiqui, S.A.; Salman, A.; Malik, M.I.; Shafait, F.; Mian, A.; Shortis, M.R.; Harvey, E.S. Automatic fish species classification in underwater videos: Exploiting pre-trained deep neural network models to compensate for limited labelled data. ICES J. Mar. Sci. 2018, 75, 374–389. [Google Scholar] [CrossRef]
  35. Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498. [Google Scholar] [CrossRef] [Green Version]
  36. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  37. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  38. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1804–02767. [Google Scholar]
  39. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2022, 1–33. [Google Scholar] [CrossRef]
  40. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  41. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  42. Zhang, X.; Wan, F.; Liu, C.; Ji, R.; Ye, Q. Freeanchor: Learning to match anchors for visual object detection. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 147–155. [Google Scholar]
  43. Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 840–849. [Google Scholar]
  44. Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
  45. Yang, H.; Liu, P.; Hu, Y.; Fu, J. Research on underwater object recognition based on YOLOv3. Microsyst. Technol. 2021, 27, 1837–1844. [Google Scholar] [CrossRef]
  46. Song, Y.; He, B.; Liu, P. Real-time object detection for AUVs using self-cascaded convolutional neural networks. IEEE J. Ocean. Eng. 2019, 46, 56–67. [Google Scholar] [CrossRef]
  47. Salman, A.; Siddiqui, S.A.; Shafait, F.; Mian, A.; Shortis, M.R.; Khurshid, K.; Ulges, A.; Schwanecke, U. Automatic fish detection in underwater videos by a deep neural network-based hybrid motion learning system. ICES J. Mar. Sci. 2020, 77, 1295–1307. [Google Scholar] [CrossRef]
  48. Cao, S.; Zhao, D.; Liu, X.; Sun, Y. Real-time robust detector for underwater live crabs based on deep learning. Comput. Electron. Agric. 2020, 172, 105339. [Google Scholar] [CrossRef]
  49. Liu, Z.; Zhuang, Y.; Jia, P.; Wu, C.; Xu, H.; Liu, Z. A Novel Underwater Image Enhancement Algorithm and an Improved Underwater Biological Detection Pipeline. J. Mar. Sci. Eng. 2022, 10, 1204. [Google Scholar] [CrossRef]
  50. Kong, W.; Hong, J.; Jia, M.; Yao, J.; Cong, W.; Hu, H.; Zhang, H. YOLOv3-DPFIN: A dual-path feature fusion neural network for robust real-time sonar target detection. IEEE Sens. J. 2019, 20, 3745–3756. [Google Scholar] [CrossRef]
  51. Sung, M.; Cho, H.; Kim, T.; Joe, H.; Yu, S.C. Crosstalk removal in forward scan sonar image using deep learning for object detection. IEEE Sens. J. 2019, 19, 9929–9944. [Google Scholar] [CrossRef]
  52. Hu, W.C.; Wu, H.T.; Zhang, Y.F.; Zhang, S.H.; Lo, C.H. Shrimp recognition using ShrimpNet based on convolutional neural network. J. Ambient. Intell. Humaniz. Comput. 2020, 1–8. [Google Scholar] [CrossRef]
  53. Li, Y.; Zhang, X.; Shen, Z. YOLO-Submarine Cable: An Improved YOLO-V3 Network for Object Detection on Submarine Cable Images. J. Mar. Sci. Eng. 2022, 10, 1143. [Google Scholar] [CrossRef]
Figure 1. Network architecture of YOLOv5S.
Figure 2. Structure diagram of ASFF.
Figure 3. CA module architecture.
Figure 4. The improved YOLOv5S structure diagram. Compared with the original model, it adds CA in the feature extraction section and ASFF in the detection head section.
Figure 5. Dataset images of marine life (holothurian, echinus, scallops and starfish). Unlike ordinary images, underwater images show chromatic aberration and low contrast, resulting in poor image quality.
Figure 6. Comparison between original (top) and enhanced (bottom) images. The grayscale histogram of the processed image covers a larger grayscale range.
Figure 7. Visual effects of different image processing algorithms. (a–d) show underwater images from four typical perspectives.
Figure 8. Test P-R diagrams with the improved methods added successively. The accuracy of the model increases as each improved module is added.
Figure 9. Comparison of test results of various models. (a–c) are three images in which the differences in effect are obvious.
Table 1. Comparison of image processing algorithm experimental results.

| Method | Holothurian AP50 (%) | Echinus AP50 (%) | Scallops AP50 (%) | Starfish AP50 (%) | AP50 (%) | AP (%) |
|---|---|---|---|---|---|---|
| IMAGE (original) | 83.8 | 93.0 | 87.5 | 92.3 | 88.3 | 51.2 |
| CLAHE | 87.4 | 93.0 | 90.1 | 93.2 | 90.9 | 53.7 |
| DCP | 85.7 | 93.2 | 89.2 | 92.0 | 89.5 | 52.8 |
| ACE | 86.3 | 92.7 | 89.7 | 93.7 | 90.2 | 53.1 |
| RGHS | 86.1 | 92.9 | 88.9 | 93.0 | 89.7 | 53.4 |
Table 2. Comparison of experimental results of improved algorithms.

| Model | ASFF | Soft-NMS | CA | AP50 (%) | AP (%) |
|---|---|---|---|---|---|
| YOLOv5S | | | | 90.9 | 53.7 |
| YOLOv5S | ✓ | | | 92.8 | 59.3 |
| YOLOv5S | ✓ | ✓ | | 93.6 | 60.7 |
| YOLOv5S | ✓ | ✓ | ✓ | 94.9 | 62.8 |
Table 3. Comparison of experimental results of different models on the test dataset.

| Model | AP50 (%) | AP (%) | Size | FPS |
|---|---|---|---|---|
| Faster R-CNN | 90.8 | 61.7 | 41.2 M | 23 |
| Cascade R-CNN | 90.4 | 63.8 | 68.4 M | 18 |
| FreeAnchor | 91.8 | 63.1 | 36.1 M | 26 |
| RetinaNet | 89.9 | 58.1 | 36.17 M | 26 |
| FCOS | 85.8 | 48.0 | 31.8 M | 17 |
| FSAF | 89.6 | 57.4 | 36.2 M | 26 |
| YOLOv3 | 93.8 | 63 | 283.4 M | 34 |
| YOLOv4 | 92.7 | 59.2 | 256.3 M | 47 |
| YOLOv5S | 90.9 | 53.7 | 14.2 M | 83 |
| Improved YOLOv5S | 94.9 | 62.8 | 20.4 M | 82 |
| YOLOv5M | 94.5 | 65.5 | 42.5 M | 59 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
