LIFRNet: A Novel Lightweight Individual Fish Recognition Method Based on Deformable Convolution and Edge Feature Learning

Yin, Jianhao; Wu, Junfeng; Gao, Chunqi; Jiang, Zhongai

doi:10.3390/agriculture12121972


by Jianhao Yin ^1,2, Junfeng Wu ^1,2,*, Chunqi Gao ^1,2 and Zhongai Jiang ^1,2

¹ College of Information Engineering, Dalian Ocean University, Dalian 116023, China

² Key Laboratory of Environment Controlled Aquaculture, Dalian Ocean University, Ministry of Education, Dalian 116023, China

* Author to whom correspondence should be addressed.

Agriculture 2022, 12(12), 1972; https://doi.org/10.3390/agriculture12121972

Submission received: 24 September 2022 / Revised: 31 October 2022 / Accepted: 14 November 2022 / Published: 22 November 2022

(This article belongs to the Section Digital Agriculture)


Abstract:

With the continuous development of industrial aquaculture and artificial intelligence technology, the use of automation and intelligence in aquaculture is becoming increasingly common, and the related technology is developing ever faster. Individual fish recognition could provide key technical support for fish growth monitoring, bait feeding and density estimation, and also provide strong data support for fish precision farming. However, individual fish recognition faces significant hurdles due to the complexity of the underwater environment, the high visual similarity of individual fish and the real-time requirements of the process. In particular, the complex and changeable underwater environment makes it extremely difficult to detect individual fish and extract their biological features. In view of the above problems, this paper proposes an individual fish recognition method based on a lightweight convolutional neural network (LIFRNet). The proposed method can extract the visual features of underwater moving fish accurately and efficiently and assign each fish unique identity information. The method consists of three parts: the underwater fish detection module, the underwater individual fish recognition module and the result visualization module. In order to improve the accuracy and real-time performance of recognition, this paper proposes a lightweight backbone network for fish visual feature extraction. This research constructed a dataset for individual fish recognition (DlouFish), in which the fish were manually sorted and labeled. The dataset contains 6950 images of 384 individual fish. Simulation experiments were carried out on the DlouFish dataset. Compared with YOLOV4-Tiny and YOLOV4, the accuracy of the proposed method in fish detection increased by 5.12% and 3.65%, respectively. Additionally, the accuracy of individual fish recognition reached 97.8%.

Keywords:

individual fish recognition; deformable convolution; lightweight; deep learning

1. Introduction

With the continuous development of aquaculture and the expansion of the farming scale, the old farming model of relying on experience, great effort and the weather has become increasingly inappropriate for the needs of current agricultural production and management. The aquaculture industry is gradually transforming from extensive production to factory, large-scale and intelligent production. Because of the continuous expansion of the aquaculture scale and aquaculture categories, it is of great significance to effectively obtain and analyze some important information generated in the production process, which is also very important for reducing the risk of aquaculture, improving the economic benefits of enterprises and reducing the labor intensity of employees.

In the process of fishery production, an accurate and real-time grasp of the body length, weight information, health status and behavior status of farmed fish could provide key analysis data for bait feeding, water quality management, disease control, feed formula management, etc., and then provide important data support for production management decision making. However, the acquisition of these data requires the realization of the individual recognition of fish. This means that, on the basis of the identified fish species, it is necessary to further determine the individual fish in order to bind the growth characteristics of the individual to the specific individual. To achieve this goal, people have tried a variety of different methods, including placing tags on fish bodies, using RFID technology and implanting tracking devices. However, these methods are highly invasive, which could cause great damage to individual fish and have high implementation costs, so these methods are difficult to popularize widely.

There is no doubt that non-contact biometric feature extraction is an ideal way to solve individual fish recognition. Due to its poor robustness and generalization, traditional computer vision technology could not meet the actual production needs of individual fish recognition. However, deep learning and artificial intelligence technology have developed rapidly in recent years, and their combination with many fields has shown great technical advantages and application value, which fully proves their effectiveness and practicality. The use of artificial intelligence and deep learning technology has provided good applications in the fields of attitude estimation, object detection, autopilot and target recognition, which makes it possible to use it to solve the individual recognition problem of fish. In particular, large-scale deep learning algorithms have achieved high accuracy in face recognition and have been widely used. With the deepening of research, we find that there are many challenges in solving the problem of individual fish recognition by using computer vision technology. Unlike other fields, accurate and real-time individual fish recognition has its own characteristics. It is not feasible to apply the mature technology for individual fish recognition directly. Underwater real-time individual fish recognition faces many new situations and challenges (Figure 1):

Complex underwater environment: The primary difficulty of the underwater recognition task is that individual fish recognition must cope with a complex and changeable underwater environment. Because underwater lighting conditions are poor, the quality of underwater video and image data is lower than that of data obtained under normal conditions. In addition, some water bodies show large color variations and turbid water, and contain many interfering non-fish targets such as algae. These factors make it difficult to obtain the visual characteristics of fish effectively, so accurately recognizing individual fish is a great challenge.
Serious occlusion between fish: The majority of underwater fish activity occurs in groups. Many fish swim fast and are small in size, and individual fish occlude one another. However, individual recognition needs to accurately separate each fish from other fish and from the surrounding environment, and then extract the visual feature information of its torso. It is therefore very difficult to extract this information effectively when individual fish are severely occluded.
High visual similarity between individual fish: Different fish species have unique visual characteristics. However, the visual differences between individual fish of the same species are very small. Some individual fish are challenging to distinguish directly with the naked eye, so the recognition algorithm must precisely capture the tiny visual variations between individuals.

Therefore, this paper proposes an individual fish recognition method (LIFRNet) based on a lightweight convolution neural network. The main contributions of this paper are as follows:

In the fish detection part, the CBAM attention mechanism module is added. By attending to both the channel and spatial dimensions, it greatly improves the accuracy of object detection at the cost of a small number of additional parameters;
In the fish recognition part, we use the combination of a 1 × 1 convolution and a BN layer to learn the edge features of fish, use deformable convolution, which is better suited to the swimming posture of fish, and use the Mish activation function instead of Relu to obtain a smaller intra-class distance and a larger inter-class distance;
We collect and label an individual fish recognition dataset (DlouFish), which contains a total of 6950 images of 384 individual fish, numbered by individual, to facilitate the feature extraction, training and prediction of underwater individual fish using deep neural networks.

The structure of this paper is as follows: the second part of the paper mainly introduces the related work of individual fish recognition, and the third part mainly introduces our proposed lightweight convolutional neural network for individual fish recognition. The fourth part mainly introduces the results of the simulation experiments, and the fifth part summarizes the proposed work and prospects for the future work.

2. Related Work

As applications of individual animal recognition have widened, the topic has attracted increasing attention from experts and scholars and has become a research hotspot in academia and industry. Researchers have analyzed the characteristics of different animals in a targeted manner, designed different methods to recognize individual animals and constructed different datasets to train and test individual recognition algorithms. To fill the gap in datasets for the northeast tiger, Li et al. [1] published a dataset of over 8000 video clips of 92 northeast tigers in 2019. Based on this dataset, Liu et al. [2] used a pedestrian re-identification method, extracted features from various body parts of tigers, and used a partial pose-guided global learning approach to complete the re-identification of northeast tigers. In order to increase the precision and speed of cow detection in actual production scenarios, Xu et al. [3] used a facial recognition framework combining Retinaface and ArcFace. The automatic cow recognition technique proposed by Li et al. [4] used Zernike moments as a cow feature extractor, followed by linear discriminant analysis on the collected features and a support vector machine approach for individual cow recognition. Ghosh et al. [5] analyzed the performance of various deep CNN-based models, using an identical set of hyperparameters and trained end-to-end on a pig breed dataset and a goat breed dataset, respectively. The experimental results showed that MobileNetV2 was the best deep CNN model for goat breed classification and InceptionV3 was the best model for pig breed classification.

In 2016, Villon et al. [6] compared two approaches to detecting and identifying fish, deep learning and SVM classifiers, and discussed their advantages and disadvantages. With the improvement of computing power and the arrival of the age of big data, the use of deep learning for fish detection and classification has become a major trend. Tamou et al. [7] used the pre-trained AlexNet network to extract features from the foreground fish images of an available underwater dataset and then used an SVM classifier for classification. By combining the convolutional neural network AlexNet with transfer learning, they realized the automatic classification of fish species and obtained an accuracy of 99.45% on the Fish Recognition Ground Truth dataset. Blount et al. [8] proposed the Flukebook platform in the field of underwater species recognition, which combines photo algorithms with data management and infrastructure for whales and dolphins. Flukebook was trained on 15 different species to form 37 species-specific recognition processes, and then applied to cetacean photo recognition through ongoing collaboration between computer vision researchers, software engineers and biologists. Zhao et al. [9] proposed a composite fish detection framework based on composite backbone networks and enhanced path aggregation networks, which improves ResNet, strengthens the feature information output for identified objects and makes better use of that feature information. Nixon [10] proposed a neural network capable of identifying, categorizing and counting 11 fish species to track the reproductive activity of fish populations, using YOLOv4 and Darknet as the infrastructure and architecture. Based on the spot features on shark skin, Arzoumanian et al. [11] developed a shark feature recognition library and employed feature matching for individual shark recognition.

Compared with terrestrial animal recognition, it is more challenging to train a high-performance model for underwater species recognition because underwater imagery is noisy. To address this issue, Kaur et al. [12] proposed the Atmospheric-Light-Enhancement Algorithm (ALE), which includes a preprocessing step for underwater images that acts on the intensity, contrast, and sharpness of the object to improve the visualization quality. Zhang et al. [13] introduced AdvFish, a deep adversarial learning framework that applies adaptive perturbation methods to adversarially perturbed images in order to train precise deep neural network fish recognition models from noisy large-scale underwater photos. Deep et al. [14] employed single-image super-resolution approaches to tackle the issue of limited discriminative information in low-resolution images, together with deep learning methods to explicitly learn discriminative features from relatively low-resolution images.

Syreen et al. [15] proposed the Iterative Grouped Evolution Network (IGCN) to divide all candidate areas into fish and non-fish entities. Their approach uses a hybrid fusion of optical flow and VGG16 at level one. On the LifeCLEF 2015 fish dataset, a detection accuracy of 94.05% was achieved. Villon et al. [16] used GoogLeNet to extract features and adopted the Softmax classification method to detect reef fish. Rauf et al. [17] proposed a CNN architecture containing 32 deep layers, which can better obtain valuable features from images to complete fish species recognition. They also provided a dataset named Fish Pak, including 915 images of six different kinds of fish. Xu et al. [18] applied the YOLO deep learning model to three different datasets recorded at real-world waterpower sites to identify fish in underwater videos, with a mAP score of 0.5392. Petrellis [19] proposed fish morphological feature recognition based on deep learning technology. First, the fish and background were separated by object detection and image segmentation, and then the size of the fish and the positions of key points were measured by aligning landmarks. The accuracy of fish size estimation was 80–91%. Rosales et al. [20] created a fish detector using Faster R-CNN to locate fish. The model achieved a mini-batch accuracy of 99.95% with an RPN mini-batch accuracy of 100%. Jalal et al. [21] combined optical flow and a Gaussian mixture model with a deep neural network to obtain temporal information and identify fish moving freely against the background, thus proposing a method to classify fish in unconstrained underwater video. Classification accuracies of 91.64% and 78.8% were achieved on the LifeCLEF 2015 and UWA datasets, respectively. Hossain et al. [22] proposed an automatic monitoring system for marine organisms, which uses GMM background subtraction for detection and the Pyramid Histogram Of visual Words (PHOW) feature with an SVM classifier for classification. On the CLEF 2015 dataset, still images can be classified with an accuracy of 91.7%. Ben Tamou et al. [23] used a transfer learning framework and proposed a method of targeted data augmentation based on training loss curves. Additionally, they proposed a hierarchical CNN classification method that classifies fish first at the family level and then at the species level. On the LifeClef 2015 Fish dataset, 81.53% accuracy was achieved. Ben Tamou et al. [24] later achieved 81.83% accuracy on the LifeClef 2015 Fish dataset using a new incremental learning strategy to train the network: they first learned difficult species and then learned new species using knowledge distillation to complete the classification of live fish species in underwater images.

However, the purpose of some existing underwater fish recognition work is often to classify fish by category and not to accurately recognize different individual fish in the same species. At the same time, there are still some problems in the field of individual fish recognition, such as the lack of data resources and a large number of model parameters. In addition, the complexity and uncertainty of the underwater environment has led to a decrease in recognition accuracy.

In response to the above issues, a lightweight convolution neural network based individual fish recognition method (LIFRNet) is proposed. The schematic diagram of individual fish recognition is shown in Figure 2.

3. The Proposed Work

With the continuous development of computer vision technology, biometric technology has been widely used; especially, face recognition technology has been widely used in public security, cash payment and identity recognition, in addition to other fields. However, research on the individual recognition of underwater fish is still emerging. Individual fish recognition has great differences from other biometric recognition. First of all, individual fish recognition is primarily conducted using underwater equipment. Because of the complexity of the underwater environment, the water quality is frequently subpar, which results in poor image data and obscure fish biometric features, which severely hamper the extraction of fish biometric features. Secondly, the biological characteristics of some species of fish have only very small differences, and the similarity between biological individuals is very high. The feature extraction method must capture the small differences of biological individuals to accurately distinguish different individuals. Therefore, in view of the various problems in individual fish recognition, we must design individual recognition algorithms in a targeted manner.

Committed to solving the problems of difficulty in extracting biological features, high visual similarity between individual fish and high real-time requirements in individual fish recognition, this paper designs a lightweight underwater fish real-time individual recognition method: LIFRNet (Figure 3). LIFRNet consists of three parts, namely, the underwater fish object detection module, individual recognition module and visualization module. The underwater fish object detection module can detect the fish in the data stream in real time and separate the individual fish from the surrounding environment. The individual recognition module can extract the biological features of the detected single fish target and obtain a feature map with fish body features. The final visualization module uses the optimal weights obtained by multiple iterations of training to visually identify the fish school and output the results.

3.1. Underwater Fish Detection Module

After extensive experiments, we found that existing object detection methods are mainly designed to detect objects in normal environments. Their results are not satisfactory for blurry, low-frame-rate and small objects, so they cannot be directly applied to the recognition of underwater individual fish. This paper proposes the YOLO-CBAM method for underwater fish object detection. The method uses YOLOV4-Tiny as the main framework and makes targeted improvements according to the characteristics of individual fish recognition. In repeated experiments, we found that, although YOLOV4-Tiny has a lighter network structure than the ordinary YOLO algorithm, its detection accuracy drops significantly. In order to retain the advantages of the YOLOV4-Tiny network, namely its small number of parameters and fast detection speed, while further improving its detection ability, this paper optimizes the backbone network of YOLOV4-Tiny according to the characteristics of individual fish recognition (Figure 4). The algorithm integrates the convolutional block attention module (CBAM) [25] into the object detection backbone network, so that the network can adaptively focus on the more important parts of the image, even in blurry image data, and learn the visual characteristics of underwater fish objects. The structure of the CBAM module is shown in Figure 5.

The visual attention mechanism module used in this paper consists of two sub-modules: the channel attention module and the spatial attention module. The spatial attention module computes a weight for each spatial location, so it can find the most important parts of the image for learning. The channel attention module models the importance of each feature channel and can weight the channels differently according to the task. In this paper, the attention mechanism and YOLOV4-Tiny are integrated, which not only keeps the network model lightweight, but also improves the accuracy of the model to a certain extent. In the feature extraction network, CBAM is added to the two feature layers extracted by the backbone network and to the up-sampled result.

The implementation of the channel attention mechanism is shown in Figure 6. First, global max pooling and global average pooling are performed on the feature layer, and each pooled result is processed by a shared fully connected layer. Then, the two results are added and passed through the Sigmoid function to obtain the weight of each channel in the feature layer. Finally, the weights are multiplied by the original input feature layer to generate the input features required by the spatial attention module. The spatial attention mechanism takes the maximum and the average over the channels at each feature point of the input feature layer, stacks the two results and uses a 7 × 7 convolution to reduce the number of channels to 1. After processing by the Sigmoid function, the weight of each feature point is obtained, and finally, the weights are multiplied by the input feature layer to obtain the final features (Figure 7). Figure 8 shows the fish object detection results of the proposed method. In Figure 8a, YOLOV4-Tiny detected only two individual fish, whereas the proposed method detected all individual fish.
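
The two attention steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the shared fully connected layer is assumed to be a two-layer perceptron with reduction ratio `r`, and the function names, the `[C][H][W]` nested-list layout and the random weights are all hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x, w1, w2):
    """Return one weight in (0, 1) per channel of x ([C][H][W])."""
    def shared_mlp(vec):
        # Assumed form of the shared layer: C -> C/r (ReLU) -> C.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    flat = [[v for row in ch for v in row] for ch in x]
    max_pool = [max(ch) for ch in flat]             # global max pooling
    avg_pool = [sum(ch) / len(ch) for ch in flat]   # global average pooling
    summed = [a + b for a, b in zip(shared_mlp(max_pool), shared_mlp(avg_pool))]
    return [sigmoid(s) for s in summed]


def spatial_attention(x, kernel):
    """Return one weight in (0, 1) per spatial position of x ([C][H][W])."""
    n_ch, height, width = len(x), len(x[0]), len(x[0][0])
    # Channel-wise max and mean at every feature point, stacked as 2 maps.
    maps = [
        [[max(x[c][i][j] for c in range(n_ch)) for j in range(width)]
         for i in range(height)],
        [[sum(x[c][i][j] for c in range(n_ch)) / n_ch for j in range(width)]
         for i in range(height)],
    ]
    k = len(kernel[0])   # 7 for the 7 x 7 convolution in the text
    pad = k // 2         # zero padding keeps the H x W size
    weights = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[m][di][dj] * maps[m][ii][jj]
            row.append(sigmoid(acc))
        weights.append(row)
    return weights


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    ch = channel_attention(x, w1, w2)
    scaled = [[[v * ch[c] for v in row] for row in x[c]] for c in range(len(x))]
    sp = spatial_attention(scaled, kernel)
    return [[[v * sp[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(scaled[c])] for c in range(len(scaled))]


# Hypothetical toy input: 4 channels, 5 x 5 feature map, reduction ratio 2.
random.seed(0)
C, H, W, r = 4, 5, 5, 2
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.uniform(-0.1, 0.1) for _ in range(7)] for _ in range(7)]
          for _ in range(2)]
out = cbam(x, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

Because both sets of weights lie in (0, 1), the module only rescales the input feature layer and leaves its shape unchanged, which is why it can be dropped into an existing backbone.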

3.2. Individual Fish Recognition Module

The effective detection of individual fish in images is the premise of individual fish recognition. After the individual fish in the images have been accurately detected, effectively extracting their visual feature information is the core problem that must be solved to achieve accurate individual fish recognition. The visual differences between individuals of the same fish species are small, and the recognition network needs to accurately capture the subtle feature differences between individuals. In order to adapt to the characteristics of individual fish recognition, this paper designs a lightweight individual fish recognition network structure based on deformable convolution, as shown in Figure 9.

In this network, the distance calculation compares the features of the input pictures to measure the similarity of individual fish. The smaller the distance, the higher the similarity between individual fish; the larger the distance, the lower the similarity. After repeated tests with different input images, we found that the distances between images of the same individual fish were much less than 1, while the distances between different individual fish remained at 1.4–1.7. Therefore, we use 1 as the cut-off point to determine whether two images show the same individual fish. If the resulting value is less than 1, the model considers the input pictures to show the same individual fish; if it is greater than 1, it considers them to show different individual fish. In this way, the task of recognizing underwater individual fish is completed.
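
The decision rule can be sketched as follows. The text does not name the distance metric, so this sketch assumes Euclidean distance between L2-normalized feature vectors; the function names and the toy three-dimensional features are hypothetical. Under that assumption, distances lie in [0, 2], a distance of 1 corresponds to an angle of 60° between the features, and orthogonal features are √2 ≈ 1.41 apart, which would be consistent with the 1.4–1.7 range reported for different individuals.

```python
import math

THRESHOLD = 1.0  # cut-off point reported in the text


def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def euclidean_distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def same_individual(feat_a, feat_b, threshold=THRESHOLD):
    """Return (decision, distance) for two feature vectors.

    Assumption: features are compared after L2 normalization.
    """
    d = euclidean_distance(l2_normalize(feat_a), l2_normalize(feat_b))
    return d < threshold, d


# Toy features standing in for the 512-dimensional network output.
print(same_individual([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))  # nearly parallel
print(same_individual([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```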

3.2.1. Backbone Network

This work focuses on making the model more lightweight while improving its capacity to identify fish bodies. We made three improvements to the MobileNetV1 [26] backbone network for the individual recognition module, concerning the convolution kernel, the activation function and the average pooling layer. The improved backbone network is shown in Figure 10. A picture 112 × 112 × 3 in size is used as input, and a 1 × 1 × 512 feature vector is obtained at the output of the network. The network has 29 layers: 14 layers of 3 × 3 deformable convolution, 13 layers of 3 × 3 depthwise separable deformable convolution, 1 layer of 1 × 1 standard convolution and 1 fully connected layer. In Figure 10, dark blue represents the normal deformable convolution and light purple represents the depthwise separable deformable convolution.

3.2.2. Deformable Convolution

The trajectory and pose of the fish body swimming in the water vary depending on the characteristics of fish activities underwater, but the standard convolution kernel can only sample the input feature map at a fixed position, which is weak in generalization ability and poorly adaptable to unknown changes.

This paper uses 3 × 3 deformable convolution [27] in place of standard convolution to address this issue. Deformable convolution differs from standard convolution by adding a learnable offset (direction parameter) to each sampling position of the kernel, allowing the convolution kernel to extend over a wider range during training. This allows the convolution kernel to adapt its shape to the actual situation and extract input information more effectively without increasing the number of parameters. A comparison of the sampling positions on fish body images before and after the addition of deformable convolution is given in Figure 11.
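
To make the sampling idea concrete, the sketch below computes a single 3 × 3 deformable convolution response in plain Python. It is an illustration only: the offsets are passed in by hand, whereas in the usual formulation they are predicted from the input by an additional convolution, and the function names and the toy image are hypothetical. Fractional sampling positions are read with bilinear interpolation.

```python
import math


def bilinear_sample(img, y, x):
    """Sample img ([H][W]) at a fractional position; zero outside the image."""
    height, width = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deformable_conv_at(img, kernel, offsets, cy, cx):
    """3 x 3 deformable convolution response at output position (cy, cx).

    offsets holds one (dy, dx) pair per kernel tap, in row-major order.
    """
    out = 0.0
    tap = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[tap]
            out += kernel[i + 1][j + 1] * bilinear_sample(img, cy + i + dy, cx + j + dx)
            tap += 1
    return out


img = [[float(row * 5 + col) for col in range(5)] for row in range(5)]
ones = [[1.0] * 3 for _ in range(3)]

# Zero offsets reduce to the standard fixed-grid convolution.
standard = deformable_conv_at(img, ones, [(0.0, 0.0)] * 9, 2, 2)
# Shifting every tap half a pixel to the right moves the sampling grid.
shifted = deformable_conv_at(img, ones, [(0.0, 0.5)] * 9, 2, 2)
print(standard, shifted)
```

With all offsets set to zero the result equals the ordinary convolution, so the standard kernel is the special case in which the sampling grid is not deformed.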

3.2.3. Edge Feature Learning

Image edge refers to the collection of pixels whose gray level changes are discontinuous around them. Edges widely exist between objects and backgrounds, and between objects. Therefore, edges are important features of image segmentation, image understanding and image recognition. In the task of underwater individual fish recognition, the changes of light and background environment often lead to occlusion and blurring of the body features of fish individuals. At this time, it is difficult to accurately complete the recognition task using the main features of the body of fish. At the same time, individuals of the same species usually have similar trunk characteristics, which also brings challenges to the recognition task.

By comparing a great number of fish pictures, we found that the mouth, tail, fins and other parts of the fish have distinctive characteristics. As shown in Figure 12, the trunk features of the two fish are similar, and the texture features of the mouth and tail play a key role in distinguishing different individuals. Therefore, we believe that when body features cannot be used for effective recognition, the learning of fish body edge features is particularly important.

In order to improve the network’s ability to learn edge features, we added a 1 × 1 standard convolution and discarded the pooling layer commonly used in convolutional neural networks. Although the pooling layer performs downsampling and helps prevent overfitting, it reduces the network’s ability to learn edge features. The reason is that, in a feature map, although the receptive ranges of the center point and a corner point are the same size, the receptive area of the center point covers the complete information of the whole picture, while the receptive area of a corner point covers only part of the picture. The weight of each point should therefore be different, but the pooling layer gives them all the same weight [28]. Consequently, when identifying fish bodies with fuzzy trunk features and similar texture features that can only be distinguished by detailed information, the disadvantages of the pooling layer are further amplified.

The 1 × 1 convolution was first used in the Network in Network technique [29], and its calculation method is the same as the other convolution kernels; the only difference is the size. The authors concluded that the operation of 1 × 1 convolution + Relu can increase the nonlinearity of the network, thus improving the nonlinear fitting ability of the network and the classification effect of the network without increasing the number of network parameters.

In this paper, a 1 × 1 convolution kernel with 512 channels is added after the 7 × 7 convolutional kernel to replace the average pooling layer. Since 1 × 1 convolution is performed on the channels, the correlation information between channels can be extracted. The 1024 channels are linearly combined across channels to 512 channels, thus increasing cross-channel information interaction and reducing computational load.
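
The cross-channel linear combination can be illustrated with a short sketch. A 1 × 1 convolution applies the same channel-mixing matrix at every spatial position, so it changes the number of channels but not the spatial size. The function names and the toy tensor below are hypothetical, and the bias term is assumed to be absent.

```python
def conv1x1(x, weight):
    """x: [C_in][H][W], weight: [C_out][C_in] -> [C_out][H][W]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weight))]


def conv1x1_params(c_in, c_out, bias=False):
    """Number of learnable parameters of a 1 x 1 convolution."""
    return c_in * c_out + (c_out if bias else 0)


# Toy example: 2 input channels mixed into 1 output channel.
x = [[[1.0, 2.0]], [[3.0, 4.0]]]
y = conv1x1(x, [[0.5, 0.5]])
print(y)                          # spatial size 1 x 2 is preserved
print(conv1x1_params(1024, 512))  # weights for the 1024 -> 512 reduction
```

For the 1024-to-512 reduction described above, this amounts to 1024 × 512 = 524,288 weights when no bias is used.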

In addition, we added a BN layer and the Mish activation function. With this combination, the advantage of the pooling layer in preventing overfitting is retained to the greatest extent, and model learning is accelerated. The nonlinearity can also be greatly increased without losing the resolution of the feature map.

3.2.4. Mish Activation Function

The generalization ability and adaptability of the network can be significantly enhanced with the use of an appropriate activation function. Relu6 serves as the network’s activation function in the conventional MobileNetV2 network to ensure good numerical resolution, even at low precision [30]. However, the issue of neuron death is not resolved by Relu or Relu6. The gradient of these functions is zero when the input is negative, so the affected neurons cannot learn through backpropagation.

In order to avoid such issues in the individual fish recognition network, the Mish function was adopted as the activation function in this paper instead of Relu6 [31]. Mish is a self-regularized non-monotonic function whose smoothness allows information to propagate better through the neural network, resulting in higher accuracy and stronger generalization. The Mish function is given in Equation (1) and its graph is shown in Figure 13.

y = x tanh(ln(1 + exp(x))),

(1)

As seen in Figure 14, when the input value is negative, it is not truncated as with Relu and Relu6; instead, a small gradient is allowed to flow in order to preserve the flow of information, successfully resolving the issue of neuron death. The Mish function also avoids the gradient saturation issue for positive inputs because it is unbounded above. It was found that the Mish activation function is 0.494% better than Swish and 1.671% better than Relu.
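
Equation (1) and the behavior for negative inputs can be checked numerically with a few lines of Python. This is a small sketch using only the standard library; the helper names and the central-difference gradient estimate are choices made for illustration.

```python
import math


def softplus(x):
    # Numerically stable form of ln(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    # Equation (1): y = x * tanh(ln(1 + exp(x)))
    return x * math.tanh(softplus(x))


def relu6(x):
    return min(max(x, 0.0), 6.0)


def numerical_grad(f, x, eps=1e-6):
    # Central-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


for v in (-3.0, -1.0, 0.0, 1.0, 10.0):
    print(v, round(mish(v), 4), relu6(v),
          round(numerical_grad(mish, v), 4), numerical_grad(relu6, v))
```

At a negative input such as -1, Relu6 returns 0 with zero gradient, while Mish returns a small negative value with a non-zero gradient, so the neuron can still be updated.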

3.2.5. Loss Function

Traditional target recognition tends to be treated as a classification problem: the samples are labeled with categories and the results are given by Softmax. The Softmax loss [32] is shown in Equation (2). However, as the dataset expands and the categories change, the model must be retrained. For this type of problem, Deng et al. [33] proposed Arcface loss based on Softmax loss to improve the inter-class separability while reducing the intra-class distance, as shown in Equation (3).

L_{Softmax} = - \frac{1}{N} \sum_{i = 1}^{N} \log \frac{e^{W_{y_{i}}^{T} f_{i} + b_{y_{i}}}}{\sum_{j = 1}^{n} e^{W_{j}^{T} f_{i} + b_{j}}};

(2)

L_{arcface} = - \frac{1}{N} \sum_{i = 1}^{N} \log \frac{e^{S \cos (θ_{y_{i}} + m)}}{e^{S \cos (θ_{y_{i}} + m)} + \sum_{j \neq y_{i}} e^{S \cos θ_{j}}} .

(3)

Specifically, Arcface loss fixes the bias b in Softmax loss to 0 and rewrites the logit W_{j}^{T} f_{i} as ∥W_{j}∥ · ∥f_{i}∥ cos θ_{j} using the dot product, where θ_{j} represents the angle between the weights W_{j} and the features f_{i}. After normalization sets ∥W_{j}∥ = ∥f_{i}∥ = 1, the prediction depends only on the angle θ_{j} between the features f_{i} and the weights W_{j}. The features are then multiplied by the constant S, so that the learned features are distributed on a hypersphere of radius S. Finally, the angular margin penalty m is added to the target-class angle θ_{y_{i}} to increase the inter-class distance and reduce the intra-class distance.
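The steps above (zero bias, normalization, scaling by S, margin m on the target angle) can be sketched for a single sample as follows. This is a minimal plain-Python illustration of Equation (3), not the training code used in the paper; the function names and the default values of `s` and `m` are assumptions chosen only for the example.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_loss(feature, class_weights, label, s=30.0, m=0.5):
    """Single-sample Arcface loss following Equation (3).

    s and m are illustrative defaults, not values reported in the paper.
    """
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        w_hat = l2_normalize(w)
        cos_theta = sum(a * b for a, b in zip(w_hat, f))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        theta = math.acos(cos_theta)
        if j == label:
            theta += m  # angular margin on the target class only
        logits.append(s * math.cos(theta))
    # Cross-entropy with log-sum-exp for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two toy class centers
    print(arcface_loss([0.6, 0.4], weights, label=0, m=0.0))
    print(arcface_loss([0.6, 0.4], weights, label=0, m=0.5))
```

With the margin enabled, the same feature incurs a larger loss, which is what pushes features of one class closer to their own class center during training. The sketch does not handle the case where θ + m exceeds π, which practical implementations treat separately.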

In this paper, Arcface loss is used as the loss function of LIFRNet, and the loss converges to 0.001 after 300 epochs.

3.3. Real-Time Visualization Module

In the real-time visualization module, LIFRNet integrates the optimal weights of the detection and recognition modules to display individual fish information in real time. The two sides of a fish body have different texture features. Therefore, we treat the two sides of the same fish body as two different fish when training the recognition module. In the visualization module, however, we want each fish to have a single identity. Therefore, we propose two solutions:

(1): Cameras are installed on different sides of the water. Each camera only recognizes fish swimming in the same direction, and then summarizes the information to obtain accurate fish information.
(2): Recode the individual fish information, and use the numbers 1 and 2 as the basis to distinguish different sides of the same fish. The individual fish information is obtained through only one camera.

In actual underwater fish recognition, we found that the fish swam over a wide range, with irregular postures and frequent changes of direction, so we chose the recoding method to visualize the fish information. The rendering of the visualization module is shown in Figure 14. A fish whose head points toward the right is given the code 1, for example, Fish_5_1; a fish whose head points toward the left is given the code 2, for example, Fish_13_2. This prevents the texture features of different sides from affecting the training of the model, and when the same fish swims past the underwater camera in different directions, we can still intuitively and accurately grasp the fish information through the real-time visualization module.
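The recoding scheme can be illustrated with a short sketch. The functions below are hypothetical helpers (the paper does not publish its labeling code); they only show how a side code of 1 or 2 can be appended for display and stripped again to recover one identity per fish.

```python
def display_label(fish_id: int, heading: str) -> str:
    # Side code from the text: 1 = head toward the right, 2 = head toward the left.
    side_code = {"right": 1, "left": 2}[heading]
    return f"Fish_{fish_id}_{side_code}"


def fish_identity(label: str) -> int:
    # Drop the side code so both sides map back to one individual.
    _, fish_id, _ = label.split("_")
    return int(fish_id)


if __name__ == "__main__":
    print(display_label(5, "right"))
    print(display_label(13, "left"))
    print(fish_identity("Fish_13_2"))
```

Keeping the side code in the label lets the recognition network train on each side as a separate class while the displayed identity stays unique per fish.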

After the detection and recognition modules have been trained, LIFRNet integrates their functions to form the real-time visualization module, which detects and identifies individual fish in real time through underwater cameras. In aquaculture, this enables personnel to check individual fish information at any time and adjust their aquaculture strategies according to the actual situation, achieving the goal of precise aquaculture.

4. Experiment Result and Discussion

4.1. The Dataset

In the field of underwater fish recognition, there is no publicly accessible dataset for individual fish recognition. Therefore, we built a dataset for individual fish recognition (DlouFish), as shown in Figure 15, through extensive collection and collation. The dataset consists of 6950 labeled individual fish photographs, numbered by individual. It contains 2100 images of koi, 1850 images of puffer fish, 1800 images of clown fish and 1200 images of grass carp. The images were collected from the internet and from our own photography. Because underwater reference objects are blurry, it was difficult to identify a fish body from the background environment. Therefore, we extracted frames from the videos, manually labeled the identity of each fish body according to the continuity of the video, and shuffled the frames to build the dataset.

We divided the dataset into a training set and a test set at a ratio of 9:1. Both sets included different kinds of fish bodies, such as koi (brocade carp) with obvious patterns and puffer fish with high similarity, and the lighting conditions also varied considerably. The purpose of this was to improve the learning ability of the model during training and verify the generalization of the model during testing.

In order to facilitate the analysis of the experimental results, we formulated naming rules for the images in the dataset. Each name is the individual fish number followed by the image number. For example, the image named “000101” is the first picture of the individual fish with ID number 1, and the image named “001111” is the 11th picture of the individual fish with ID number 11. The advantage of this rule is that we can tell at a glance from the names whether two images show the same individual fish.
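The naming rule lends itself to a small parser. The sketch below assumes, based on the two examples in the text, that a name is a four-digit individual number followed by a two-digit image number; the function names are illustrative.

```python
def parse_image_name(name: str) -> tuple[int, int]:
    # Assumed layout: 4-digit individual ID followed by a 2-digit image index.
    if len(name) != 6 or not name.isdigit():
        raise ValueError(f"unexpected image name: {name!r}")
    return int(name[:4]), int(name[4:])


def same_individual(name_a: str, name_b: str) -> bool:
    # Two images show the same fish when their individual IDs match.
    return parse_image_name(name_a)[0] == parse_image_name(name_b)[0]


if __name__ == "__main__":
    print(parse_image_name("000101"))
    print(parse_image_name("001111"))
    print(same_individual("001403", "001413"))
    print(same_individual("000401", "000601"))
```

Under this reading, the image pairs listed in the result tables can be classified as same-fish or different-fish pairs directly from their names.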

4.2. Experimental Setup

In this research, the experiments were conducted using the PyTorch framework under Ubuntu 20.04 on a GeForce RTX 3090Ti GPU. The loss function was Arcface, the optimizer was Adam with a momentum of 0.9, the batch size was 64, the initial learning rate was 0.001 and the minimum learning rate was 0.0001. The evaluation metric was mAP, the learning rate decay schedule was step and training ran for 300 epochs.
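A step schedule bounded by the reported initial and minimum learning rates might look like the sketch below. Only the values 0.001, 0.0001 and 300 epochs come from the text; the step interval (`step_size`) and decay factor (`gamma`) are not reported and are purely hypothetical here.

```python
def step_lr(epoch: int, base_lr: float = 1e-3, min_lr: float = 1e-4,
            step_size: int = 100, gamma: float = 0.1) -> float:
    # step_size and gamma are hypothetical; the paper reports only the
    # initial rate, the minimum rate, the "step" schedule and 300 epochs.
    lr = base_lr * gamma ** (epoch // step_size)
    return max(lr, min_lr)  # never decay below the reported minimum


if __name__ == "__main__":
    for epoch in (0, 99, 100, 200, 299):
        print(epoch, step_lr(epoch))
```

With these assumed settings the rate drops once, at epoch 100, and is then held at the minimum for the remaining epochs.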

4.3. Performance Comparison of YOLOV4-Tiny Incorporating Different Attention Mechanisms

In this research, mainstream attention modules from recent years, including SE [34], ECA [35] and CBAM, were added to YOLOV4-Tiny and compared with the original YOLOV4-Tiny and YOLOV4 [36]. The experimental results (Table 1) show that incorporating an attention module significantly increases the accuracy of the model. On our DlouFish dataset, YOLOV4-Tiny with CBAM improved on the original YOLOV4 by 3.65% in mAP, while its parameter count was roughly one-tenth of YOLOV4's (5.96 M versus 63.9 M). Among the models with similar parameter counts, the one we used performed best, achieving an mAP of 88.6%.
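The figures quoted here follow directly from Table 1. The short script below simply recomputes them from the table values; the variable names are illustrative.

```python
# mAP (%) and parameter counts (millions) copied from Table 1.
table1 = {
    "YOLOV4-Tiny": (83.48, 5.87),
    "YOLOV4-Tiny + SE": (87.39, 5.92),
    "YOLOV4-Tiny + ECA": (85.34, 5.87),
    "YOLOV4-Tiny + CBAM": (88.60, 5.96),
    "YOLOV4": (84.95, 63.9),
}

# Model with the highest mAP.
best = max(table1, key=lambda name: table1[name][0])
gain_over_tiny = round(table1[best][0] - table1["YOLOV4-Tiny"][0], 2)
gain_over_yolov4 = round(table1[best][0] - table1["YOLOV4"][0], 2)
param_ratio = table1["YOLOV4"][1] / table1[best][1]

if __name__ == "__main__":
    print(best, gain_over_tiny, gain_over_yolov4, round(param_ratio, 2))
```

The two gains match the 5.12% and 3.65% improvements stated in the abstract, and the parameter ratio shows that YOLOV4 is a little more than ten times larger than the CBAM variant.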

4.4. Performance Comparison and Analysis

4.4.1. Comparison and Analysis of Improved Network Performance

This research used deep convolutional networks to learn the differences between fish visual features for individual fish recognition. Therefore, when the network compares pictures of the same individual fish, the distance it returns should be as small as possible; that is, the similarity is high. When the network compares pictures of different individual fish, the distance should be as large as possible; that is, the similarity is low. The performance of LIFRNet in recognizing different individual fish is shown in Table 2.
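The text does not state which distance is used. A common choice for embeddings trained with Arcface, assumed here purely for illustration, is the Euclidean distance between L2-normalized feature vectors, which ranges from 0 (same direction) to 2 (opposite directions). The sketch below uses made-up toy vectors.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def embedding_distance(a, b) -> float:
    # Assumed metric: Euclidean distance between unit-length embeddings.
    a_hat, b_hat = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_hat, b_hat)))


if __name__ == "__main__":
    print(embedding_distance([1.0, 2.0], [2.0, 4.0]))   # same direction
    print(embedding_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
    print(embedding_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite
```

Under this assumption, small values indicate two pictures of the same fish and values well above 1 indicate different fish, which is consistent with the magnitudes listed in Table 2.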

4.4.2. Analysis of Experimental Results of Adding Deformable Convolution Using Different Methods

We adopted two different methods when using deformable convolution. The first (Method 1) was to add a 3 × 3 deformable convolution, plus a BN layer with the Mish activation function, to the 1 × 1 convolution while leaving the standard convolutions in the backbone network unchanged. This increases the number of network layers and, at the same time, allows the activation function to play a bigger role. The second (Method 2) was to replace all 3 × 3 standard convolutions with 3 × 3 deformable convolutions. The experimental results of these two methods are shown in Table 3.

The results demonstrate that adding a deformable convolution (Method 1) increases the distance between distinct fish bodies and therefore has a better overall effect than Method 2, while the two methods are essentially identical when recognizing the same fish body. However, Method 1 has the unintended consequence of dramatically increasing the number of parameters: the original network had 4,231,976 parameters, and Method 1 adds 2,359,296 (Table 3), an increase of 55.75 percent.
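The 55.75 percent figure can be reproduced from the parameter counts given in the text and in Table 3, as in this short check (variable names are illustrative):

```python
original_params = 4_231_976  # original network, from the text
added_params = 2_359_296     # extra parameters of Method 1, from Table 3

# Relative growth and resulting network size.
increase_pct = 100.0 * added_params / original_params
total_params = original_params + added_params

if __name__ == "__main__":
    print(f"{increase_pct:.2f}% more parameters, {total_params:,} in total")
```

Method 2 replaces existing convolutions in place, so its corresponding increase in Table 3 is 0.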

In this research, we finally chose Method 2, which replaces standard convolution with deformable convolution without increasing the number of parameters, for the following reasons:

(1): Our goal is to create a lightweight solution for individual fish recognition, which is also why YOLOV4-Tiny with a fused CBAM attention mechanism, instead of YOLOV4, is used for object detection.
(2): While adding deformable convolution increases the distance between different fish, this is not particularly helpful in practice. The reason is that we manually set a threshold when the network returns a prediction result; when the distance is larger than the threshold, the predicted image is deleted from the list of candidates, so the extra separation does not affect the recognition accuracy.
(3): When recognizing the same fish, there is almost no difference between the two methods, which means that when the distance is less than the threshold, the two methods have the same effect on the recognition accuracy.
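The threshold argument can be made concrete with a small sketch. The distances below are taken from Table 3 for images compared against image 000601; the threshold value of 1.0 and the function name are hypothetical, since the paper does not report the threshold it used.

```python
def filter_candidates(distances: dict[str, float], threshold: float) -> list[str]:
    # Keep candidate images within the threshold, nearest first.
    kept = [(d, name) for name, d in distances.items() if d <= threshold]
    return [name for d, name in sorted(kept)]


# Distances to image 000601 under each method (values from Table 3).
method_1 = {"000602": 0.018, "000401": 1.545}
method_2 = {"000602": 0.019, "000401": 1.427}
THRESHOLD = 1.0  # hypothetical value, not reported in the paper

if __name__ == "__main__":
    print(filter_candidates(method_1, THRESHOLD))
    print(filter_candidates(method_2, THRESHOLD))
```

Both methods keep the same-fish image and discard the different-fish image, so the larger separation obtained with Method 1 does not change the outcome once a threshold of this kind is applied.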

4.4.3. Analysis of Experimental Results of Different Background Environment

In this experiment, we performed background elimination for different pictures of individual fish. The effect of the background environment on the distance of fish similarity was tested by this method when the model recognized different individual fish of the same category. The experimental results are shown in Figure 16.

The experimental results showed that the similarity distance of individual fish changed slightly when we performed background elimination on one of the pictures. The similarity distance became smaller, but the value change was extremely small, which means that the difference in the background environment has a slight effect on the recognition ability of the model. When we eliminated the backgrounds of the two pictures at the same time, the similarity distance value was almost the same as the value without background elimination. This indicates that the difference in background environment color had only a minimal effect on the model. The model focuses more on the extraction of texture features from the individual fish than on learning the features of the pictures from the background environment.

4.4.4. Analysis of Experimental Results under Different Backbone Networks

In the following part of our research, we compared Resnet50 [37], Iresnet50 [38], Mobilefacenet, MobilenetV2 and our proposed method. The experimental results are shown in Table 4 and Table 5.

It can be seen that our method yields smaller distances when testing the same fish, with an average decrease of 0.284, and larger distances when recognizing different individuals.
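The text does not say how the average decrease of 0.284 was computed. One reading that reproduces it from Table 4, offered here as an assumption, is to average the four baseline distances for each same-fish pair, subtract the distance of the proposed method, and then average over the two pairs.

```python
# Same-fish distances from Table 4:
# pair -> (Resnet50, Iresnet50, Mobilefacenet, MobilenetV2, our method)
same_fish = {
    ("001403", "001413"): (0.445, 0.317, 0.326, 0.349, 0.078),
    ("000102", "000103"): (0.471, 0.242, 0.298, 0.305, 0.041),
}

decreases = []
for *baselines, ours in same_fish.values():
    # Mean baseline distance minus the distance of the proposed method.
    decreases.append(sum(baselines) / len(baselines) - ours)

mean_decrease = sum(decreases) / len(decreases)

if __name__ == "__main__":
    print(round(mean_decrease, 4))
```

The result is approximately 0.2846, which agrees with the reported value to three decimal places when truncated.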

In addition, the Resnet50 network, which has about six times as many parameters as ours, is 5.86% less accurate. On the DlouFish dataset, our Acc 1 reached 97.8%.

5. Conclusions

In this paper, we proposed a lightweight algorithm for individual fish recognition that can lessen the negative effects of fish swimming irregularly and the complex underwater environment. We also constructed and labeled a fish recognition dataset (DlouFish), which contains 6950 images of 384 fish and is numbered by the individual, to fill the dataset gap in the field of underwater live fish recognition. The experimental results demonstrate that the algorithm suggested in this study performs both fish detection and fish recognition tasks with considerably higher accuracy and is capable of handling the underwater fish recognition challenge. We will keep working on underwater object detection in our upcoming studies and enhance the performance of the model in more difficult environments.

Author Contributions

Methodology, J.Y. and J.W.; conceptualization, J.Y. and J.W.; resources, Z.J.; data curation, J.Y. and C.G.; writing—original draft preparation, J.Y. and J.W.; writing—review and editing, J.Y. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research Projects in Liaoning Province (2020JH2/10100043), National Natural Science Foundation of China (31972846), Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University) Ministry of Education (202205) and National Key Research and Development Program of China (2021YFB2600200).

Institutional Review Board Statement

Not applicable; non-contact photography of individual fish through digital cameras and underwater cameras.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because part of the data is provided by a cooperative enterprise.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, S.; Li, J.; Tang, H.; Qian, R.; Lin, W. ATRW: A benchmark for Amur tiger re-identification in the wild. arXiv 2019, arXiv:1906.05586.
2. Liu, C.; Zhang, R.; Guo, L. Part-pose guided amur tiger re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
3. Xu, B.; Wang, W.; Guo, L.; Chen, G.; Li, Y.; Cao, Z.; Wu, S. CattleFaceNet: A cattle face identification approach based on RetinaFace and ArcFace loss. Comput. Electron. Agric. 2022, 193, 106675.
4. Li, W.; Ji, Z.; Wang, L.; Sun, C.; Yang, X. Automatic individual identification of Holstein dairy cows using tailhead images. Comput. Electron. Agric. 2017, 142, 622–631.
5. Ghosh, P.; Mustafi, S.; Mukherjee, K.; Dan, S.; Roy, K.; Mandal, S.N.; Banik, S. Image-Based Identification of Animal Breeds Using Deep Learning. In Deep Learning for Unmanned Systems; Springer: Cham, Switzerland, 2021; pp. 415–445.
6. Villon, S.; Chaumont, M.; Subsol, G.; Villéger, S.; Claverie, T.; Mouillot, D. Coral Reef Fish Detection and Recognition in Underwater Videos by Supervised Machine Learning: Comparison between Deep Learning and HOG+SVM Methods. In International Conference on Advanced Concepts for Intelligent Vision Systems; Springer: Cham, Switzerland, 2016; pp. 160–171.
7. Tamou, A.B.; Benzinou, A.; Nasreddine, K.; Ballihi, L. Transfer Learning with deep Convolutional Neural Network for Underwater Live Fish Recognition. In Proceedings of the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), Sophia Antipolis, France, 12–14 December 2018; pp. 204–209.
8. Blount, D.; Gero, S.; Van Oast, J.; Parham, J.; Kingen, C.; Scheiner, B.; Stere, T.; Fisher, M.; Minton, G.; Khan, C.; et al. Flukebook: An open-source AI platform for cetacean photo identification. Mamm. Biol. 2022, 1–19.
9. Zhao, Z.; Liu, Y.; Sun, X.; Liu, J.; Yang, X.; Zhou, C. Composited FishNet: Fish Detection and Species Recognition from Low-Quality Underwater Videos. IEEE Trans. Image Process. 2021, 30, 4719–4734.
10. Nixon, D. Computer vision neural network using YOLOv4 for underwater fish video detection in Roatan, Honduras. In Proceedings of the 2021 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), Soyapango, El Salvador, 16–17 December 2021.
11. Arzoumanian, Z.; Holmberg, J.; Norman, B. An astronomical pattern-matching algorithm for computer-aided identification of whale sharks Rhincodon typus. J. Appl. Ecol. 2005, 42, 999–1011.
12. Kaur, M.; Vijay, S. Underwater images quality improvement techniques for feature extraction based on comparative analysis for species classification. Multimedia Tools Appl. 2022, 81, 19445–19461.
13. Zhang, Z.; Du, X.; Jin, L.; Wang, S.; Wang, L.; Liu, X. Large-scale underwater fish recognition via deep adversarial learning. Knowl. Inf. Syst. 2022, 64, 353–379.
14. Deep, B.V.; Dash, R. Underwater fish species recognition using deep learning techniques. In Proceedings of the 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 7–8 March 2019.
15. Syreen, R.F.; Merriliance, K. Bi-Level Fish Detection Using novel Iterative Grouped Convolutional Neural Network. Des. Eng. 2021, 16652–16665.
16. Villon, S.; Mouillot, D.; Chaumont, M.; Darling, E.S.; Subsol, G.; Claverie, T.; Villéger, S. A Deep learning method for accurate and fast identification of coral reef fishes in underwater images. Ecol. Inform. 2018, 48, 238–244.
17. Rauf, H.T.; Lali, M.I.U.; Zahoor, S.; Shah, S.Z.H.; Rehman, A.U.; Bukhari, S.A.C. Visual features based automated identification of fish species using deep convolutional neural networks. Comput. Electron. Agric. 2019, 167, 105075.
18. Xu, W.; Matzner, S. Underwater fish detection using deep learning for water power applications. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 12–14 December 2018.
19. Petrellis, N. Fish morphological feature recognition based on deep learning techniques. In Proceedings of the 2021 10th International Conference on Modern Circuits and Systems Technologies (MOCAST), Thessaloniki, Greece, 5–7 July 2021.
20. Rosales, M.A.; Palconit, M.G.B.; Almero, V.J.D.; Concepcion, R.S.; Magsumbol, J.-A.V.; Sybingco, E.; Bandala, A.A.; Dadios, E.P. Faster R-CNN based Fish Detector for Smart Aquaculture System. In Proceedings of the 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Manila, Philippines, 28–30 November 2021; pp. 1–6.
21. Jalal, A.; Salman, A.; Mian, A.; Shortis, M.; Shafait, F. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 2020, 57, 101088.
22. Hossain, E.; Alam, S.M.S.; Ali, A.A.; Amin, M.A. Fish activity tracking and species identification in underwater video. In Proceedings of the 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), Dhaka, Bangladesh, 13–14 May 2016; pp. 62–66.
23. Ben Tamou, A.; Benzinou, A.; Nasreddine, K. Targeted Data Augmentation and Hierarchical Classification with Deep Learning for Fish Species Identification in Underwater Images. J. Imaging 2022, 8, 214.
24. Ben Tamou, A.; Benzinou, A.; Nasreddine, K. Live Fish Species Classification in Underwater Images by Using Convolutional Neural Networks Based on Incremental Learning with Knowledge Distillation Loss. Mach. Learn. Knowl. Extr. 2022, 4, 753–767.
25. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
26. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
27. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
28. Chen, S.; Liu, Y.; Gao, X.; Han, Z. MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices. In Chinese Conference on Biometric Recognition; Springer: Berlin/Heidelberg, Germany, 2018; pp. 428–438.
29. Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
30. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
31. Misra, D. Mish: A self regularized non-monotonic neural activation function. arXiv 2019, arXiv:1908.08681.
32. Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-margin softmax loss for convolutional neural networks. arXiv 2016, arXiv:1612.02295.
33. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
34. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
35. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. Supplementary material for ‘ECA-Net: Efficient channel attention for deep convolutional neural networks’. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
36. Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
38. Duta, I.C.; Liu, L.; Zhu, F.; Shao, L. Improved residual networks for image and video recognition. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021.

Figure 1. Individual fish images taken from underwater. They contain extreme occlusion between individuals, chromatic aberration variation, a complex undersea environment and a high degree of visual similarity among fish species.

Figure 2. Schematic diagram of individual fish recognition.

Figure 3. LIFRNet network structure.

Figure 4. Enhanced feature extraction network with CBAM.

Figure 5. CBAM module structure schematic.

Figure 6. Schematic diagram of the structure of the channel attention model.

Figure 7. Schematic diagram of the structure of the spatial attention model.

Figure 8. Performance of individual fish detection with CBAM: (a) is the performance of normal detection method; (b) is the performance of detection method with CBAM.

Figure 9. Schematic diagram of the network structure of the individual fish recognition module.

Figure 10. The network architecture used for the recognition network. Each intermediate tensor is labeled with its filter size, channels and stride. Activation layers and batch normalization layers are inserted after each convolution but are not pictured here.

Figure 11. Standard convolution and deformable convolution: (a) is standard convolution and (b) is deformable convolution. The circle in the figure represents the change of the convolution range.

Figure 12. The fish with similar body features and distinct edge point features.

Figure 13. Mish activation function.

Figure 14. Visualization result.

Figure 15. Example images of DlouFish dataset: (a) different individual fish; (b) the percentage of the number of various classes in the dataset.

Figure 16. Experimental results of different background environments.

Table 1. Accuracy comparison of the incorporation of different attention mechanisms.

Network	YOLOV4-Tiny	YOLOV4-Tiny + SE	YOLOV4-Tiny + ECA	YOLOV4-Tiny + CBAM	YOLOV4
mAP	83.48	87.39	85.34	88.60	84.95
Parameters	5.87 M	5.92 M	5.87 M	5.96 M	63.9 M

Table 2. The effects of different stages of improvement on the recognition effect.

Fish ID	Distance (Original Network)	Distance (Modification of Activation Function)	Distance (Modification of Activation Function + Average Pooling Layer)	Distance (Modification of Activation Function + Average Pooling Layer + Deformable Convolution)
001403 001413	0.214	0.091	0.085	0.078
000102 000103	0.172	0.056	0.047	0.041
000401 000601	1.411	1.471	1.496	1.545
000103 000201	1.455	1.465	1.467	1.537
001804 001801	0.027	0.025	0.049	0.037
001804 002001	1.062	1.306	1.369	1.443

Table 3. The influence of deformable convolution on individual fish recognition.

Fish ID	Method 1	Method 2
000601 000602	0.018	0.019
000103 000201	1.628	1.537
000401 000601	1.545	1.427
Increased number of parameters	2,359,296	0

Table 4. Comparison of fish distance performance in different networks.

Category	Image Pair	Resnet50	Iresnet50	Mobilefacenet	MobilenetV2	Our Method
The same fish distance	001403 001413	0.445	0.317	0.326	0.349	0.078
The same fish distance	000102 000103	0.471	0.242	0.298	0.305	0.041
The different fish distance	001501 001601	1.253	1.368	1.413	1.317	1.486
The different fish distance	001101 001201	1.404	1.529	1.582	1.446	1.601

Table 5. Performance comparison of different networks.

Network	Parameter Quantity	Acc 1
Resnet50	25.6 M	91.94%
Iresnet50	25.56 M	94.65%
Mobilefacenet	0.99 M	95.21%
MobilenetV2	3.4 M	93.91%
Our method	4.23 M	97.8%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
