An Improved Tuna-YOLO Model Based on YOLO v3 for Real-Time Tuna Detection Considering Lightweight Deployment

A real-time tuna detection network that runs on mobile devices is a practical tool for accurate tuna catch statistics. However, most object detection models have a large number of parameters, and ordinary mobile devices have difficulty sustaining real-time detection. Based on YOLOv3, this paper proposes Tuna-YOLO, a lightweight object detection network for mobile devices. Firstly, after comparing the performance of various lightweight backbone networks, MobileNet v3 was used as the backbone to reduce the number of parameters and calculations. Secondly, the SENet module was replaced with a CBAM attention module to further improve the feature extraction ability for tuna. Then, knowledge distillation was used to make Tuna-YOLO detect more accurately. We created a small dataset by extracting frames from electronic surveillance videos of fishing boats and labeled the data. After data annotation, the K-means algorithm was used to obtain nine better anchor boxes on the basis of the label information, which were used to improve the detection precision. In addition, we compared the detection performance of Tuna-YOLO and three versions of YOLO v5-6.1 (s/m/l) after image enhancement. The results show that Tuna-YOLO reduces the parameter size of YOLOv3 from 234.74 MB to 88.45 MB, increases the detection precision from 93.33% to 95.83%, and increases the detection speed from 10.12 fps to 15.23 fps. Its performance is also better than the three versions of YOLO v5-6.1 s/m/l. Tuna-YOLO provides a basis for the subsequent deployment of algorithms to mobile devices and for real-time catch statistics.


Introduction
Tuna fisheries are known as "golden fisheries", and five regional fishery management organizations in three oceans manage them [1-3]. Due to the depletion of several tuna stocks, stock assessments have been carried out by these regional fishery management organizations, and the resource status of important tuna stocks has been closely monitored, both of which depend on the relevant fishery data and scientific observer data submitted by flag states [4-6]. In traditional fishery management, collecting these data is a time-consuming and costly task. Meanwhile, artificial intelligence technologies and deep learning algorithms are gradually replacing part of human labor; in tuna fisheries, they are gradually taking over the work of human observers. Therefore, scientists use computer vision techniques based on deep learning to classify tuna species and estimate tuna sizes in order to obtain more accurate data [7]. In addition, electronic observers will probably replace human observers in the near future.
Low detection precision in tuna longline fisheries usually results from the large number of species with different shapes and from complex scenarios [8,9]. Strachan et al. used an image binarization algorithm to differentiate the fish contour from the background. Specifically, in this study, due to the special lighting environment, three preprocessing algorithms were sequentially used to enhance the images, which improved the quality of the original images and the detection performance in terms of three evaluation indexes. In order to effectively reduce the number of parameters and improve the detection and classification accuracy, Tuna-YOLO was built on YOLOv3: the Darknet-53 backbone was replaced by MobileNet v3, and a CBAM attention module was added. As the teacher network, the vanilla YOLO v3 used knowledge distillation on the backbone to guide the training of Tuna-YOLO. The ablation test proved that the detection performance and speed can be further improved without any increase in calculation. Tuna-YOLO can provide technical support for the replacement of manual observers by electronic observers in the future.

Image Dataset Resource
All of the image data were from Liancheng Overseas Fishery (Shenzhen) Co., Ltd., and all the fish were put on the deck for shooting to make statistics of the catch. This study selected the feature-diverse Thunnus obesus, Thunnus albacares, Makaira mazara and Xiphias gladius in a complex environment as the four kinds of detection targets. Furthermore, the dataset was divided into a training set, a test set and a validation set at a ratio of 8:1:1. The biological characteristic information of the four species is shown in Table 1. In order to reduce the risk of data leakage, we avoided the "late split" operation when performing image augmentation, i.e., the dataset was split before augmentation, to prevent a false impression of excellent detection performance (a minimal splitting sketch follows Table 1).
Table 1. Biological characteristic information of four species.

Xiphias gladius | 210 | Brown-black back and body, small dorsal fin, no gill and pelvic fin, long and thin snout takes up one third of total length
Thunnus obesus | 320 | Long and thin pectoral fins, big eyes, gray belly, pectoral fins blue-black above, brown below
Thunnus albacares | 200 | Mid-long pectoral fins, long second dorsal fins, blue-black back, gray abdomen, other fins are yellow
Makaira mazara | 150 | Long body, strong front body, prominent snout like a sword, two raised crests on both sides of caudal peduncle
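To make the leakage-avoiding split concrete, the sketch below divides the raw frames 8:1:1 before any augmentation is applied, so that augmented copies of the same frame never end up in different subsets. It is a minimal illustration only; `image_paths` and `augment()` are hypothetical names, not functions from this study.

```python
import random

def split_dataset(image_paths, seed=0):
    """Split raw (un-augmented) frames 8:1:1 into train/test/validation."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)  # deterministic shuffle
    n = len(paths)
    n_train, n_test = int(0.8 * n), int(0.1 * n)
    train = paths[:n_train]
    test = paths[n_train:n_train + n_test]
    val = paths[n_train + n_test:]
    return train, test, val

# Augmentation is then applied only inside each subset, e.g.:
# train_aug = [augment(p) for p in train]  # augment() is a placeholder
```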


Experiment Set
The Tuna-YOLO was evaluated using the above dataset. The training process made use of a warm-up strategy, learning rate decay [26], L2 regularization [27] and data preprocessing techniques [28]. The maximum learning rate was 0.1 and was gradually decreased. Each network was trained for 200 epochs. PyTorch 1.8.1 [29] was used to conduct all experiments on an NVIDIA RTX 3070 graphics card.
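As an illustration of this schedule, the following PyTorch sketch combines a linear warm-up with a gradually decaying learning rate starting from the maximum of 0.1, and applies L2 regularization through weight decay. The warm-up length, momentum, weight-decay coefficient and the cosine form of the decay are assumptions; the paper does not specify them.

```python
import math
import torch

model = torch.nn.Conv2d(3, 16, 3)  # stand-in for the detection network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # maximum learning rate (from the paper)
    momentum=0.9,       # assumed value
    weight_decay=5e-4,  # L2 regularization, assumed coefficient
)

warmup_epochs, total_epochs = 5, 200  # 200 epochs per the paper; warm-up length assumed

def lr_lambda(epoch: int) -> float:
    if epoch < warmup_epochs:  # linear warm-up
        return (epoch + 1) / warmup_epochs
    # one common form of "gradually decreasing": cosine decay
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1 + math.cos(math.pi * t))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... forward/backward passes and optimizer.step() per batch ...
    scheduler.step()
```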

Evaluation Index

Image Enhancement
Because of the low-resolution monitoring equipment and the lack of light, all videos taken from tuna vessels were of low definition, and it was difficult to detect the target, which would affect the accuracy of tuna species and size detection. Distinguishing tuna relies on the fact that different tuna species have different local feature attributes; without clear local feature information, the recognition error rate would become higher. Therefore, image augmentation was used to optimize the texture features of these videos [30,31]. Firstly, saturation adjustment and histogram equalization were used to improve the overall quality of the images. Then, the image brightness was increased by gamma correction. Finally, the improved multi-scale Retinex algorithm was selected to further improve the image quality.
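The following OpenCV sketch mirrors the four-step pipeline above (saturation adjustment, histogram equalization, gamma correction, multi-scale Retinex). The saturation gain, gamma value and Retinex scales are assumed values, and the plain multi-scale Retinex shown here stands in for the paper's improved variant.

```python
import cv2
import numpy as np

def enhance_frame(bgr: np.ndarray) -> np.ndarray:
    """Sketch of the enhancement pipeline; all numeric values are assumptions."""
    # 1. Saturation adjustment in HSV space.
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * 1.2, 0, 255)  # assumed gain 1.2
    img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

    # 2. Histogram equalization on the luminance channel.
    ycc = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycc[..., 0] = cv2.equalizeHist(ycc[..., 0])
    img = cv2.cvtColor(ycc, cv2.COLOR_YCrCb2BGR)

    # 3. Gamma correction to raise brightness (gamma < 1 brightens).
    gamma = 0.7  # assumed value
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    img = cv2.LUT(img, table)

    # 4. Multi-scale Retinex: log(image) minus log of Gaussian-blurred image,
    # averaged over several scales (the paper's improved variant differs).
    f = img.astype(np.float32) + 1.0
    msr = np.zeros_like(f)
    for sigma in (15, 80, 250):  # commonly used scales, assumed here
        msr += np.log(f) - np.log(cv2.GaussianBlur(f, (0, 0), sigma) + 1.0)
    msr /= 3
    return cv2.normalize(msr, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```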

Improved Lightweight Tuna-YOLO Network Architecture
Vanilla YOLO v3 consists of three parts, i.e., backbone, neck and prediction. In the backbone, Darknet53 extracts feature information by convolution calculation; the other two parts then select a certain pixel in the image as the center point and a suitable loss function according to the prior box distribution. To make the loss value converge as quickly as possible, the sizes of the prior boxes and the starting position of the detection frame in the network training were fine-tuned to minimize the loss function, converting the detection problem into a regression problem [32].
The proposed Tuna-YOLO employed MobileNet v3 as the backbone. MobileNet v3 combines the advantages of depthwise separable convolution [33], linear bottleneck inverted residuals [34], the NetAdapt algorithm [35] and the SENet [36] structure. However, SENet is not well suited to object detection because of its global characteristic: local features are necessary for object detection because of the complexity of scenes, e.g., different targets in similar backgrounds or the same targets in different backgrounds. Therefore, the SENet structure was replaced with the CBAM attention mechanism to improve the network's ability to understand local feature information. The structure of Tuna-YOLO is shown in Figure 1.
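For reference, a minimal CBAM block (channel attention followed by spatial attention) can be sketched in PyTorch as follows. Exactly where Tuna-YOLO inserts it inside each MobileNet v3 bneck, in place of the SE block, follows Figure 1; the reduction ratio and kernel size below are the defaults of the original CBAM design, not values reported in this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM sketch: channel attention, then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ca = torch.sigmoid(
            self.mlp(x.mean((2, 3), keepdim=True)) +
            self.mlp(x.amax((2, 3), keepdim=True))
        )
        x = x * ca  # re-weight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa  # re-weight spatial locations
```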

In the Tuna-YOLO network, the design of the anchor boxes was essential for the fitting degree, accuracy and real-time detection efficiency after network model training. In order to simulate the real lengths and widths of the ground truth bounding boxes, the K-means clustering algorithm was used to cluster nine anchor boxes according to the labels. The distribution of all ground truth bounding boxes with label information [37] is shown in Figure 2. The positions of the annotation boxes are basically in the center of the image, and the distribution of the annotation boxes is relatively consistent. It can be seen from the size statistics of the annotation boxes that the targets are mainly large-sized objects, which meets the characteristics of the sample types in the dataset and is conducive to the subsequent analysis of the detection algorithm and to improving the model precision. In total, nine sizes were obtained from clustering: (16, 23), (32, 45), (34, 26), (39, 68), (74, 48), (82, 123), (136, 98), (187, 231) and (386, 334).

Knowledge Distillation
The calculations and parameter amounts of the network were reduced significantly after adopting the lightweight design, but so was the detection accuracy. To address this problem, knowledge distillation (KD), a joint training method that transfers "knowledge", was employed to improve the detection accuracy. The KD structure is shown in Figure 3. KD imitates distillation in chemistry, using the softmax function with a temperature parameter to "distill" the logit output of a complex, large network, so as to generate more information about the categories. This part of the information is called "dark knowledge". The additional information guides a simple, small network to learn more knowledge, and the two networks are called the teacher network and the student network, respectively.

To diversify the information distribution output by the teacher network, we used the temperature parameter $T$ to obtain the soft prediction output by distilling the logit outputs of the teacher network and the student network. The same dataset was used because the soft prediction output implied the information of the negative samples. With the help of the softmax activation function, the teacher network's class prediction probability distribution could be regarded as the soft target. Similarly, this method was used to obtain not only the soft prediction output but also the hard prediction output from the student network. The soft prediction output and the soft target were used to calculate a loss value with the loss function $L_{soft}$, which was one part of the total loss. $L_{soft}$ was defined as:

$$L_{soft} = -\sum_{i=1}^{N} P_i^T \log Q_i^T \qquad (1)$$

where $P_i^T$ is the i-th soft target at temperature $T$, $Q_i^T$ is the i-th soft prediction output at temperature $T$, and $N$ is the total number of samples ($N = 27$ in this paper).
Figure 3. The KD structure.

Firstly, a "teacher" network whose depth and width were much larger than those of MobileNet v3 was grafted onto the original YOLO v3 structure and trained to reach a good performance. Then, a relatively simple student network, MobileNet v3, was built and trained with the "dark knowledge" of the superior teacher network, so that the detection performance of the student network approached that of the teacher network, which is another kind of knowledge transfer.
The hard prediction output and the ground truth were used to calculate a loss value with the loss function $L_{hard}$. $L_{hard}$ was defined as:

$$L_{hard} = -\sum_{i=1}^{N} C_i \log Q_i^1 \qquad (2)$$

where $C_i$ is the i-th hard target (the ground truth label), $Q_i^1$ is the i-th hard prediction output at temperature $T = 1$, and $N = 27$ in this paper. The total loss function was defined as the weighted sum of the two terms:

$$L_{total} = \alpha L_{soft} + (1 - \alpha) L_{hard} \qquad (3)$$

where $\alpha$ is the weighting coefficient. In this paper, DenseNet201-YOLOv3 and the backbone of the improved Tuna-YOLO were selected as the teacher network and the student network, respectively, as a way to improve the detection performance and increase the mAP of Tuna-YOLO.
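In code, the combined objective can be sketched as below. PyTorch's KL divergence is used for the soft term, which differs from the cross-entropy form of Equation (1) only by a constant with respect to the student; the temperature T, weight alpha and the T*T gradient-scaling factor are standard KD choices, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            targets: torch.Tensor,
            T: float = 4.0, alpha: float = 0.5) -> torch.Tensor:
    """Sketch of the distillation objective; T and alpha are assumed values."""
    # Soft term: match teacher and student distributions at temperature T.
    # The T*T factor keeps gradients comparable to the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```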

Methods
To test the enhancement results of the different augmentation algorithms mentioned in Section 2.3.1, the images before and after augmentation were compared according to the combination and splitting of the algorithms, and three indexes were used to evaluate the quality of the images [38], i.e., standard deviation, mean gradient and information entropy.
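These three indexes are straightforward to compute; the sketch below evaluates them on a grayscale image with OpenCV and NumPy. The Sobel-based average-gradient formula is one common definition and is an assumption here.

```python
import cv2
import numpy as np

def image_quality_indexes(path: str) -> dict:
    """Compute standard deviation, mean gradient and information entropy."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float64)

    # Standard deviation: spread of gray levels (higher = more contrast).
    std_dev = img.std()

    # Mean gradient: average magnitude of local intensity change (sharpness).
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1)
    mean_gradient = np.sqrt((gx ** 2 + gy ** 2) / 2).mean()

    # Information entropy: -sum(p * log2 p) over the gray-level histogram.
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()

    return {"std": std_dev, "mean_gradient": mean_gradient, "entropy": entropy}
```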
In addition, the network performance and computation speed were synthetically compared between the Tuna-YOLO after knowledge distillation and other models, such as DenseNet121-YOLOv3, DenseNet169-YOLOv3 and DenseNet201-YOLOv3. In particular, the speed of these models was evaluated in terms of parameters, floating-point operations (FLOPs) and fps, and the detection performance was evaluated in terms of precision, recall rate and mAP [41]. The closer the mAP value is to 1, the better the predictive performance of the network model. Generally, these three indexes evaluate the detection performance of the network to varying degrees; the mAP reflects the detection accuracy on the basis of IoU, so it is the most important evaluation index. The performance of class prediction can be directly explored from the confusion matrix. The prediction outcomes include four types: True Positive, False Negative, False Positive and True Negative, which reflect the relationship between the predicted class and the real class, as described in Table 2. Precision represents the rate of positive predictions that are correct, defined as:

$$P = \frac{X_{TP}}{X_{TP} + X_{FP}} \qquad (4)$$

The recall rate represents the rate of positive samples that are predicted positively, defined as:

$$R = \frac{X_{TP}}{X_{TP} + X_{FN}} \qquad (5)$$

where $X_{TP}$ represents the number of positive samples that are correctly classified, $X_{FP}$ represents the number of samples that are incorrectly classified as positive, $X_{FN}$ represents the number of samples that are wrongly classified as negative and $X_{TN}$ represents the number of negative samples that are correctly classified. The mAP was defined as the mean of the per-class average precisions:

$$mAP = \frac{1}{m} \sum_{i=1}^{m} \int_{0}^{1} P_i(R) \, dR, \quad IoU \ge \gamma \qquad (6)$$

where $\gamma$ is the IoU threshold and $m$ is the number of classes; $\gamma = 0.5$ and $m = 4$ in this paper.
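Equations (4)-(6) translate directly into code; the small sketch below computes precision and recall from the confusion-matrix counts and averages per-class APs into mAP. The per-class AP values themselves are assumed to come from an upstream PR-curve integration.

```python
def precision(x_tp: int, x_fp: int) -> float:
    # Equation (4): correct positives among all positive predictions.
    return x_tp / (x_tp + x_fp)

def recall(x_tp: int, x_fn: int) -> float:
    # Equation (5): correct positives among all positive samples.
    return x_tp / (x_tp + x_fn)

def mean_average_precision(ap_per_class):
    # Equation (6): mean of per-class APs at a fixed IoU threshold
    # (gamma = 0.5 and m = 4 classes in this paper).
    return sum(ap_per_class) / len(ap_per_class)
```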

Comparison of Different Image Augmentations
Figure 4 shows the images before and after augmentation. Table 3 shows their respective values of standard deviation, mean gradient and information entropy. The improved Retinex algorithm achieved the best results on the three evaluation indexes (Table 3). In order to verify the superiority of the improved multi-scale Retinex image augmentation algorithm in network model performance, a comparison of the mAP was conducted among Tuna-YOLO, YOLOv3 and DenseNet201 after training. The best mAP was based on the improved multi-scale Retinex algorithm (Table 4).


Comparison of Detection between Tuna-YOLO and Other Lightweight Network
To verify the detection performance of Tuna-YOLO after knowledge distillation, the same dataset from Liancheng Overseas Fishery (Shenzhen, China) Co., Ltd. was used: the videos were split into frames and annotated, and the images were input into all the lightweight networks. All lightweight networks were deployed on the baseline of YOLOv3 for the experiments. The results are shown in Table 5.
Compared with the YOLOv3 based on DarkNet53, the improved Tuna-YOLO reduced Params by 62.3%, FLOPs by 73.5% and mAP by 1.8%, and increased fps by 50.5%. Compared with the other lightweight networks, Tuna-YOLO had obvious advantages in terms of fps and mAP; the improvement in fps facilitates the real-time detection of tuna on mobile devices. Compared with the original YOLOv3, the knowledge-distilled Tuna-YOLO improved mAP by 7.67%. In general, YOLO v5 is more suitable for small object detection, and its detection performance improves as the number of network parameters increases. The mAP of YOLOv5-6.1 l is higher than that of the knowledge-distilled Tuna-YOLO when using the maximum number of parameters, but it still fails to meet our requirements for detection speed (Table 6). The comparison curves, consisting of the PR curve, F1 score curve, precision curve and recall curve, are shown in Figure 5. The model performance of Tuna-YOLO after knowledge distillation is significantly improved compared with the original YOLOv3 (Figure 5).
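For reproducibility, Params and fps can be measured as in the sketch below; the model here is a stand-in, the 416 x 416 input size is an assumption, and a CUDA device is required as written. FLOPs are typically counted with a profiler such as thop, which is not shown.

```python
import time
import torch

model = torch.nn.Conv2d(3, 16, 3).cuda().eval()  # stand-in for the detector
params_mb = sum(p.numel() for p in model.parameters()) * 4 / 2**20  # float32 MB

x = torch.randn(1, 3, 416, 416).cuda()  # assumed input resolution
with torch.no_grad():
    for _ in range(10):  # warm-up iterations before timing
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
fps = 100 / (time.time() - t0)
print(f"Params: {params_mb:.2f} MB, speed: {fps:.2f} fps")
```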

Validation Results of the Network Model
The Tuna-YOLO after knowledge distillation and the original YOLOv3 were used to detect the targets in the electronic monitoring videos frame by frame. The detection precision for the various species is shown in Table 7, the comparison of the detection results is shown in Figure 6, and the comparison of the confusion matrices is shown in Figure 7. The precision of Tuna-YOLO was higher than that of the original YOLOv3 (Table 7).


The Advantages of Improved Tuna-YOLO
The improved Tuna-YOLO based on YOLOv3 is suitable for tuna detection because YOLOv3 performs better than Faster-RCNN and SSD in terms of speed and accuracy [42-44]. On the basis of YOLOv3, Tuna-YOLO has higher detection accuracy and a simpler network structure [45]. Jiang et al. [46] and Wang et al. [47] also optimized the original network for detection accuracy, but their models were practically difficult to deploy on mobile devices because of the large number of parameters. Jiang et al. [48] integrated the ideas of dense connections, residual connections and group convolution; the mAP indicators on the mini-RD and the SAR ship detection dataset (SSDD) reached 83.21% and 85.46%, respectively. Furthermore, compared with the different YOLO v5 versions, Tuna-YOLO after knowledge distillation is also superior in comprehensive performance. Tuna-YOLO borrowed the ideas of YOLOv3 and replaced the DarkNet53 backbone of YOLOv3 with MobileNet v3 equipped with the CBAM attention module, which reduced the parameter amount of the network by 62.3%. Given that the parameter decrease would inevitably lead to a decrease in mAP, knowledge distillation was used to transfer knowledge from the teacher network to the student network. The detection precision after knowledge distillation was improved by 6.41%, which solved the problem of low detection accuracy with reduced model parameters and hence realized real-time detection.

The Comparison of Performance between Tuna-YOLO and Other Models
In this study, the mAP of Tuna-YOLO reached 85.74%, the speed reached 15.23 fps and the precision reached 95.83%, which were at a relatively high level. Betti [49] presented YOLO-S, whose architecture exploited a small feature extractor based on Darknet20, as well as skip connections (via both bypass and concatenation) and a reshape-passthrough layer, to avoid the vanishing gradient problem, promote feature reuse across the network and combine low-level positional information with more meaningful high-level information. Muksit et al. [50] proposed YOLO-Fish, which enhanced YOLOv3 by fixing the issue of upsampling step sizes to reduce the misdetection of tiny fish, and by adding spatial pyramid pooling (SPP) to give the model the capability to detect fish in dynamic environments. Kazim et al. [51] improved YOLOv3 by increasing the number of detection scales from three to four, applying K-means clustering to optimize the anchor boxes, and using a novel transfer learning technique and an improved loss function to increase model performance. Gupta et al. [52] presented YOLO Fish, which used hierarchical techniques in both the classification step and the dataset, with an mAP of 91.8%; however, its speed was only 3.79 fps. Wang et al. [53] proposed the FML-Centernet model to detect fish in a river; this network improved the efficiency of detection by tuning the ratio of positive and negative samples and optimizing the loss function, reaching an mAP of 85% at 10.12 fps. Li et al. [54] proposed an improved fish recognition network, YOLO-V3-Tiny-MobileNet, by optimizing the MobileNet and YOLO-V3-Tiny network models, which had shallow feature extraction layers and insufficient extraction capability; the recognition precision and accuracy of the model were 79.3% and 86.5%, respectively. Xu et al. [55] proposed a detection network (YOLOv3-Corn) for corn leaf diseases and insect pests; by modifying the feature fusion layers of the network, a new head (104 x 104) was constructed to improve the detection accuracy, which reached 84.34% at 8.7 fps. Table 8 shows the specific results of the comparison.

Model | mAP (%) | fps
Tuna-YOLO (this study) | 85.74 | 15.23
YOLO-Fish [50] | 76.56 | 7.6
Improved YOLO v3 [51] | 91.3 | 5.9
YOLO Fish [52] | 91.8 | 3.79
FML-Centernet [53] | 85 | 10.12
YOLO-V3-Tiny-MobileNet [54] | 86.5 |

Conclusions
An improved real-time lightweight detection network, Tuna-YOLO, was proposed for tuna detection based on the YOLOv3 network. It uses a lightweight backbone design that combines the CBAM attention mechanism module with the MobileNet v3 network structure, and knowledge distillation was applied to Tuna-YOLO to improve the accuracy of the model. The experimental results showed that Tuna-YOLO was more streamlined after model compression, which enables the real-time detection of tuna on mobile devices by increasing the detection speed and provides potential for the replacement of human observers with electronic observers.


Figure 1.
Figure 1. The structure of Tuna-YOLO. The improved MobileNet v3's 6th, 12th and 15th layer bneck structures were used as branches to combine with the Neck part of YOLO v3.

Figure 2.
Figure 2. The distribution of all ground truth bounding boxes. (a) The length and width information of all ground truth bounding boxes and (b) the real size and shape of all ground truth bounding boxes.


Figure 4.
Figure 4. Effect of different data augmentations. (a-e), respectively, represent the original image, saturation adjustment, histogram equalization, gamma correction and the improved multi-scale Retinex algorithm.


Figure 5.
Figure 5. Comparison of network model training results. (a-h), respectively, represent the PR curve, F1 score curve, precision curve and recall curve.

Figure 6.
Figure 6. Comparison of test results. (a,c,e,g), respectively, represent the detection result images by the original YOLOv3, and (b,d,f,h) are the detection results by Tuna-YOLO.

Figure 7.
Figure 7. The confusion matrices of YOLO v3 and Tuna-YOLO. (a,b), respectively, represent the confusion matrices of the original YOLO v3 and of Tuna-YOLO.

Table 1.
Biological characteristic information of four species.

Table 2.
The distribution of classification results.

Table 3.
Comparison of image quality between before and after image augmentation.

Table 4.
mAP of network model in different augmentation algorithms.

Table 5.
Performance comparison of different lightweight networks.

Table 6.
Comparison of model performance.

Table 7.
Precision values of tunas on Tuna-YOLO and original YOLOv3.

Table 8.
Comparison with different algorithms based on the YOLO.