Identification of Large Yellow Croaker under Variable Conditions Based on the Cycle Generative Adversarial Network and Transfer Learning

Variable-condition fish recognition is a form of cross-scene, cross-camera fish re-identification (re-ID). Fish images collected under different culture conditions follow different domain distributions, so the available training data cannot be used effectively in a new recognition scenario. To address this problem, we propose a method for identifying large yellow croaker based on the CycleGAN (cycle generative adversarial network) and transfer learning. Source and target sample sets were built from large yellow croaker images acquired in a controlled scene and under actual farming conditions, respectively. The CycleGAN served as the basic framework for translating images from the source domain to the target domain, thereby augmenting the target-domain data. In particular, an identity foreground loss (IDF) was used to refine the identity-loss criterion, and the maximum mean discrepancy (MMD) was used to reduce the distribution gap between the source and target domains. Finally, transfer learning was performed with the expanded samples to identify large yellow croaker under varying conditions. The proposed method achieved good identification results in both the controlled scene and the actual culture scene, with average recognition accuracies of 96.9% and 94%, respectively. These results provide technical support for subsequent fish behavior tracking and phenotype measurement.


Introduction
The large yellow croaker (Larimichthys crocea) is a marine migratory fish of the Northwest Pacific [1]. Because of its high economic value, it has become one of the most commercially important marine species in China's aquaculture production [2]. Accurate identification of large yellow croaker under variable conditions is important for improving high-throughput detection of fish phenotypes in genetic breeding and aquaculture production [3]. Sampling methods [4,5], illumination [6], and farming environments [7,8] all vary. As a result, images obtained in different farming scenarios follow different domain distributions. This limits data interoperability and makes identification technology harder to apply industrially. Recent advances in transfer learning [9,10] and person re-ID [11,12] offer a possible route to accurate identification of fish under variable working conditions.
With the progress of information technologies such as artificial intelligence and deep learning, techniques for identifying production objects, diseases, and behaviors in agriculture have developed continuously and are widely used across the industry [13,14]. However, compared with static objects such as rice and other plants [15,16] and large land animals such as cattle and sheep [17,18], recognition of free-swimming underwater targets has developed slowly. Most studies have focused on scenarios where the working conditions are fixed and training data are easy to obtain [19]. To address this, transfer learning has been introduced into fish identification. For example, Zhang et al. [6] proposed a transfer learning method based on a residual network to identify unconstrained swimming fish. Yuan et al. [20] used a metric learning network with a residual structure to achieve 5-way, 15-shot fish recognition with an accuracy above 90%. Methods based on small samples and transfer learning can effectively improve fish identification accuracy. However, they are limited when applied to unconstrained swimming fish under actual farming conditions, where backgrounds and postures vary considerably. There are two main reasons. (1) Changes in sampling devices and scenes create a difference in domain distribution between the source and target domains, so the available training data cannot be used effectively in the new recognition domain. (2) Changes in swimming posture disperse the target features, and a single data source cannot cover the whole feature space, which reduces the algorithm's adaptability to different features.
Re-identification (re-ID), in its cross-domain form, maps images from different source domains into the feature space of the target domain through domain-to-domain image translation, thereby augmenting the data. It is mainly used to overcome the limitations of supervised methods in real scenes and has made significant progress in pedestrian re-identification. For example, Wang et al. [21] used attribute features to transfer a model to an unlabeled dataset. Deng et al. [22] embedded a Siamese (twin) network into the CycleGAN [23] to translate images from the source domain to the target domain. Ye et al. proposed RACE (robust anchor embedding) [24] and DGM (dynamic graph co-matching) [25] for video-based unsupervised person re-identification. Tang et al. [26] used the CycleGAN and MMD to better preserve pedestrian identity information and narrow the distribution gap between domains.
Inspired by this image domain transfer approach, we propose a large yellow croaker recognition method based on the cycle generative adversarial network and transfer learning. In this study, large yellow croaker images collected in a controlled environment served as source samples, providing base image samples for fish recognition in different scenes. Images of large yellow croaker in the scene to be identified served as target-domain samples. The CycleGAN was adopted as the base model for transfer from the source domain to the target domain. The foreground mask self-evaluation method [27] was used to improve how the model evaluates identity loss, and MMD was introduced as a loss function to pull the distributions of the two domains closer. The expanded sample set was then used for transfer learning. This reduced the effect of uneven sample distribution on recognition accuracy and enabled recognition of free-swimming large yellow croaker. Finally, ablation and comparison experiments were used to verify the effectiveness of the proposed method.

Method Overview
In this study, the re-ID approach was mainly used for three purposes: to unify the style of fish images obtained from different scenes, to increase the number of target samples in the application scene, and to improve the adaptability of the algorithm. To preserve identity information and align the distributions of different domains, we embedded a foreground mask loss and an MMD layer in the CycleGAN for domain-to-domain image translation. Transfer learning has advantages in feature reuse. However, because pre-training samples are unevenly distributed, model performance varies significantly across recognition tasks. To strengthen the pre-trained model's ability to recognize fish features, we therefore improved the knowledge transfer process by expanding the fish dataset. The overall framework of the proposed method is shown in Figure 1. In our algorithm, the source domain (a specific rearing scenario) and the target domain (the ship farming scenario) were input into the CycleGAN to generate fake target-domain and fake source-domain images. In a large water body, fish are relatively sparse, so background images without fish are easy to obtain, and common foreground extraction algorithms can separate foreground from background fairly accurately. During translation, the identity loss was therefore computed from changes in the foreground image, which pulled the identity information of the fake source domain toward that of the target domain. At the same time, the distribution of the fake target domain was pulled toward the target domain. After translation, the labeled source-domain fish images were converted into target-domain images, expanding the fish sample set for the culture scene.
Finally, the transfer model was trained with the expanded data to further improve target recognition accuracy.
Figure 1. The framework of our method. It has two components: a data transfer layer and a knowledge transfer layer. The data transfer layer achieves sample expansion in the target domain and comprises the CycleGAN, IDF, and the maximum mean discrepancy. The CycleGAN transfers images from the source domain to the target domain. IDF constrains the CycleGAN to retain fish identity information during transfer. The maximum mean discrepancy narrows the distribution gap between the source and target domains during transfer. The knowledge transfer layer improves the model's ability to recognize fish characteristics, and the amplified data increase the effectiveness of knowledge transfer.
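The data-transfer step described above can be sketched as follows: labeled source images are translated into the target style, keep their labels, and are merged with the real target samples. This is a minimal illustration under several assumptions. Images are flattened lists of pixel values, and a trained source-to-target generator is available as a plain function. `expand_target_set` and the toy `darken` generator are hypothetical names, not part of the original implementation.

```python
from typing import Callable, List, Tuple

Image = List[float]          # assumption: a flattened grayscale image
Sample = Tuple[Image, int]   # (image, identity/class label)


def expand_target_set(
    source_labeled: List[Sample],
    target_labeled: List[Sample],
    g_source_to_target: Callable[[Image], Image],
) -> List[Sample]:
    """Build an expanded target-domain training set (hypothetical helper)."""
    # Translate each labeled source image into the target style and keep its label.
    # Reusing the label is only valid if the generator preserves fish identity.
    translated = [(g_source_to_target(img), label) for img, label in source_labeled]
    # Expanded set = real target samples followed by translated source samples.
    return list(target_labeled) + translated


def darken(img: Image) -> Image:
    """Toy stand-in for a trained source-to-target generator G."""
    return [0.5 * p for p in img]


if __name__ == "__main__":
    source = [([1.0, 0.0], 1), ([0.2, 0.4], 0)]
    target = [([0.3, 0.3], 1)]
    expanded = expand_target_set(source, target, darken)
    print(len(expanded), [label for _, label in expanded])  # 3 [1, 1, 0]
```

Keeping the generator as an injected function separates the data-transfer layer from the knowledge-transfer layer. The expanded list can then feed any classifier.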

CycleGAN-Based Translation
The CycleGAN is an image translation model based on the generative adversarial network (GAN) and consists of two pairs of generators and discriminators. G is the mapping function from the source domain to the target domain, and Ĝ is the mapping function from the target domain to the source domain. D_S and D_T are the style discriminators for the source and target domains, respectively, and S and T denote the source and target domains. The CycleGAN translates images between the two domains by minimizing a loss function, which achieves bidirectional conversion between domains. The objective function consists of three parts: adversarial loss, cycle consistency loss, and identity loss. The adversarial loss aims to make the generated images indistinguishable from real images of the target domain. The discriminator is trained to maximize its probability of telling real images from generated ones, while the generator is trained to fool it. This competition improves the quality and realism of the translated images. Applying the adversarial loss to the two mapping functions, the objectives are expressed as:

$$
\begin{aligned}
\mathcal{L}_{GAN}(G, D_T, S, T) &= \mathbb{E}_{t \sim T}[\log D_T(t)] + \mathbb{E}_{s \sim S}[\log(1 - D_T(G(s)))] \\
\mathcal{L}_{GAN}(\hat{G}, D_S, T, S) &= \mathbb{E}_{s \sim S}[\log D_S(s)] + \mathbb{E}_{t \sim T}[\log(1 - D_S(\hat{G}(t)))]
\end{aligned} \tag{1}
$$

where s and t are a source-domain image and a target-domain image, respectively. Adversarial loss alone cannot capture the full diversity of the target domain. The generator may therefore produce limited or repetitive output, and a correct mapping from an individual input s to the desired output t cannot be guaranteed. The CycleGAN therefore uses a cycle consistency loss so that the learned mapping functions are cycle-consistent. This loss minimizes the difference between each input image and its reconstruction after a round trip through both generators. Doing so improves the generator's ability to retain the content of the original image and thereby improves translation accuracy. The cycle consistency loss is expressed as:

$$
\mathcal{L}_{cyc}(G, \hat{G}) = \mathbb{E}_{s \sim S}\left[\lVert \hat{G}(G(s)) - s \rVert_1\right] + \mathbb{E}_{t \sim T}\left[\lVert G(\hat{G}(t)) - t \rVert_1\right] \tag{2}
$$
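The two CycleGAN terms described in this section can be illustrated with the toy sketch below. It assumes that images are flattened lists of floats, that the generators are plain Python functions, and that the discriminator returns a probability in (0, 1). It uses the log-likelihood form of the adversarial objective and a per-pixel mean L1 distance for the cycle term. Practical CycleGAN implementations often substitute a least-squares adversarial loss. None of these function names come from the paper.

```python
import math
from typing import Callable, List, Sequence

Image = List[float]
Generator = Callable[[Image], Image]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two images of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def adversarial_objective(d_t: Callable[[Image], float],
                          real_t: List[Image], fake_t: List[Image]) -> float:
    """E_t[log D_T(t)] + E_s[log(1 - D_T(G(s)))]; D_T maximizes it, G minimizes it."""
    real = sum(math.log(d_t(t)) for t in real_t) / len(real_t)
    fake = sum(math.log(1.0 - d_t(f)) for f in fake_t) / len(fake_t)
    return real + fake


def cycle_consistency_loss(g: Generator, g_hat: Generator,
                           source: List[Image], target: List[Image]) -> float:
    """E_s[|G_hat(G(s)) - s|_1] + E_t[|G(G_hat(t)) - t|_1] with per-pixel mean L1."""
    forward = sum(l1(g_hat(g(s)), s) for s in source) / len(source)
    backward = sum(l1(g(g_hat(t)), t) for t in target) / len(target)
    return forward + backward


def brighten(img: Image) -> Image:
    """Toy G: S -> T."""
    return [p + 0.5 for p in img]


def darken(img: Image) -> Image:
    """Toy G_hat: T -> S, the exact inverse of brighten."""
    return [p - 0.5 for p in img]
```

With `brighten` and `darken` as exact inverses, the cycle loss is zero. Replacing `darken` with a mapping that subtracts only 0.4 leaves a reconstruction error of about 0.1 in each direction.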

Identity Foreground Loss
As part of the CycleGAN loss function, the identity loss forces the generator to preserve the characteristics of an input image that already belongs to its output domain, rather than altering them. The CycleGAN identity loss is computed over global image features only. It does not consider the effect of background noise on identity information, which can corrupt identity during style transfer. Under actual farming conditions, water turbidity causes light absorption, scattering, and diffraction, which reduce the feature difference between foreground fish and background noise in the image. As a result, fish identity becomes unclear after style conversion, and the risk of misidentification increases. To solve this problem and preserve fish identity as far as possible, we introduced a foreground constraint into the identity loss to evaluate the changes in fish before and after translation. Under actual farming conditions, the aquaculture water volume is large and fish are relatively dispersed, so the background difference method can easily be used to obtain fish foreground images [27]. The fish foreground images were therefore used as constraints, and Formula (4) was used to calculate the fish identity loss.
where M(s) and M(t) represent the foreground masks of fish images with a specific pose, and ⊙ denotes the XNOR ("same-or") logical operation.
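Formula (4) is not reproduced in this text, so the sketch below shows only one plausible reading of the identity foreground loss. The foreground mask comes from background differencing with a hypothetical threshold `tau`. The identity terms are restricted to fish pixels by multiplying with that binary mask. The text describes ⊙ as a logical operation, so treating it as an element-wise mask product is our assumption, as are all function names.

```python
from typing import Callable, List, Sequence

Image = List[float]
Generator = Callable[[Image], Image]


def foreground_mask(frame: Sequence[float], background: Sequence[float],
                    tau: float = 0.1) -> List[int]:
    """Background difference: 1 where |frame - background| > tau (assumed fish), else 0."""
    return [1 if abs(f - b) > tau else 0 for f, b in zip(frame, background)]


def masked_l1(x: Sequence[float], y: Sequence[float], mask: Sequence[int]) -> float:
    """Mean absolute difference restricted to foreground pixels."""
    n_fg = max(sum(mask), 1)  # avoid division by zero when no fish is detected
    return sum(m * abs(a - b) for a, b, m in zip(x, y, mask)) / n_fg


def identity_foreground_loss(g: Generator, g_hat: Generator,
                             s: Image, t: Image,
                             m_s: Sequence[int], m_t: Sequence[int]) -> float:
    """Hypothetical IDF form: the identity mappings must not change fish pixels."""
    # G maps S -> T, so a real target image t should pass through unchanged on M(t).
    # G_hat maps T -> S, so a real source image s should pass through unchanged on M(s).
    return masked_l1(g(t), t, m_t) + masked_l1(g_hat(s), s, m_s)
```

Under this form, a generator that repaints only the background incurs no identity penalty, while any change to fish pixels is penalized. That is the behavior the section argues for in turbid water.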

Maximum Mean Discrepancy
For large yellow croaker images collected under different working conditions, the CycleGAN transferred only the background style of each image from the source domain to the target domain and ignored distribution differences within the domains. These distribution differences provide different reference features for target recognition, which matters greatly when the target's characteristics vary. The maximum mean discrepancy is mainly used to test whether two datasets follow similar distributions. In style transfer, it is used to minimize the difference between two feature distributions. We therefore used the maximum mean discrepancy to measure the distribution difference between sampling scenarios and thereby address fish sample enhancement.
where k is the kernel function, m and n are the numbers of samples in the source and target domains, respectively, and i and j index the samples in each domain. As shown in Formula (6), a Gaussian kernel was chosen in this paper to compute the inner products between feature maps.
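The MMD term can be sketched with the standard biased empirical estimator MMD² = (1/m²)Σ k(s_i, s_i') + (1/n²)Σ k(t_j, t_j') − (2/mn)Σ k(s_i, t_j). Here the kernel is a sum of Gaussian kernels over the five bandwidths listed in the implementation details. Formulas (5) and (6) are not reproduced here, so the estimator form and the kernel exp(−‖x − y‖²/(2σ²)) are assumptions. In the actual method, the inputs would be foreground feature vectors rather than the toy points used below.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
SIGMAS = (0.25, 0.5, 1.0, 2.0, 4.0)  # bandwidths listed in the implementation details


def gaussian_kernel(x: Vector, y: Vector, sigmas=SIGMAS) -> float:
    """Sum of Gaussian kernels exp(-|x - y|^2 / (2 sigma^2)) over several bandwidths."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(math.exp(-d2 / (2.0 * s * s)) for s in sigmas)


def mmd2(source: List[Vector], target: List[Vector], sigmas=SIGMAS) -> float:
    """Biased empirical estimate of the squared maximum mean discrepancy."""
    m, n = len(source), len(target)
    k_ss = sum(gaussian_kernel(a, b, sigmas) for a in source for b in source) / (m * m)
    k_tt = sum(gaussian_kernel(a, b, sigmas) for a in target for b in target) / (n * n)
    k_st = sum(gaussian_kernel(a, b, sigmas) for a in source for b in target) / (m * n)
    return k_ss + k_tt - 2.0 * k_st
```

Identical sample sets give an estimate of zero, and the estimate grows as the two sets move apart. Minimizing it during translation pulls the fake target distribution toward the real one. Summing several bandwidths avoids tuning a single σ.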

Full Objective Function
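By combining the CycleGAN, the foreground mask loss, and the maximum mean discrepancy, the full objective of CGAN-TM is:

$$
\mathcal{L}(G, \hat{G}, D_S, D_T) = \mathcal{L}_{GAN}(G, D_T, S, T) + \mathcal{L}_{GAN}(\hat{G}, D_S, T, S) + \lambda_1 \mathcal{L}_{cyc}(G, \hat{G}) + \lambda_2 \mathcal{L}_{IDF} + \lambda_3 \mathcal{L}_{MMD} \tag{7}
$$

where λ1 weights the cycle consistency loss, and λ2 and λ3 control the weights of the foreground mask loss and the maximum mean discrepancy during translation, respectively. A detailed analysis of parameter sensitivity is presented in Section 4.7.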

Transfer Learning
Under actual farming conditions, it is difficult to build a sufficient field sample set every time the farming environment changes, so identifying large yellow croaker becomes a small-sample recognition problem. To keep model integration simple, this study adopted VGG-16 as the base transfer learning framework and pre-trained it on the CIFAR-10 dataset (a public dataset with 10 categories and 60,000 images). A new training set composed of the original small-sample data and the translated data was used to fine-tune the pre-trained parameters, and the fine-tuned model was used for fish target recognition.

Datasets and Evaluation Protocol
To evaluate the effectiveness of the proposed method, we constructed two image datasets: a source domain and a target domain. Source-domain samples were collected in a recirculating aquaculture system with a controlled sampling environment. Target-domain samples were collected in an actual farming environment on an aquaculture ship. We treated the source domain and the target domain as separate identification scenes and evaluated the method on each.
Source-domain images: A total of 360 large yellow croakers of different sizes were placed in a temporary rearing tank. An underwater camera was positioned 40 cm below the water surface and kept parallel to it, and sampling ran continuously for 24 h. A total of 600 images of large yellow croakers in different swimming states were selected to construct the source sample set, with 480 images used for training and 120 for testing.
Target-domain images: Actual farmed-fish images were collected in tank No. 1 of the "Guoxin 1" aquaculture ship. The tank is 15 m deep and 8 m in diameter and holds approximately 10,000 large yellow croakers. A sliding rail was used to sample continuously at a depth of 4 m for 1 h, which avoided collisions with fish and disturbance from circulating water during sampling. A total of 300 images of large yellow croakers were obtained, of which 240 were used for training and the remaining 60 for testing.
We used VGG-16 as the core framework to verify the effect of fish image transfer and the effectiveness of transfer learning in each domain. Recall, specificity, and mean average precision (mAP) were used to evaluate data transfer performance in the source and target domains. Recall and mAP were used to evaluate transfer learning.
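For reference, the evaluation metrics named above can be computed as shown below. This is a sketch. Recall and specificity come from binary confusion counts. AP uses the ranked-retrieval definition common in re-ID work: the mean of the precision values at each rank that holds a correct match, assuming every relevant item appears in the ranked list. The paper does not state exactly how its mAP was computed, so this definition is an assumption.

```python
from typing import List, Sequence


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were recognized: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp) if tn + fp else 0.0


def average_precision(ranked_relevance: Sequence[int]) -> float:
    """AP of one ranked list: mean precision@k over the ranks k holding a relevant item."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings: List[Sequence[int]]) -> float:
    """mAP: the AP averaged over queries (or classes)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranking with relevance pattern [1, 0, 1] has AP = (1/1 + 2/3) / 2 = 5/6.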

Implementation Details
Our method was implemented in the PyTorch framework. In the CycleGAN, the foreground mask loss replaced the original identity loss. The MMD loss was computed with five Gaussian kernels with different σ values (0.25, 0.5, 1, 2, 4) and trained jointly with the CycleGAN. In Equation (7), λ1, λ2 and λ3 were set to 10, 5 and (0.6, 0.8), respectively.
To reduce the complexity of the model framework, VGG-16, consistent with the CycleGAN setup, was selected as the transfer learning backbone and pre-trained on the CIFAR-10 dataset. The model was optimized with SGD, with momentum set to 0.9, weight decay set to 0.0005, and the learning rate set to 0.0002. In the transfer learning stage, the original data, the generated fake data, and the amplified data were each used for transfer learning. Because of the small number of model parameters, freezing specific convolutional layers had no obvious effect on training time, so all weight parameters were updated during transfer learning. The learning rate of the fully connected layer was set to 0.01, the output dimension to 2, the batch size to 16, and the number of epochs to 60.
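The optimizer settings above can be made concrete with a pure-Python version of the update rule that PyTorch's `torch.optim.SGD` applies when dampening is 0 and Nesterov momentum is off. The settings are momentum 0.9, weight decay 0.0005, learning rate 0.0002 for the pretrained layers, and 0.01 for the fully connected layer. In PyTorch itself, the two learning rates would normally be expressed as two parameter groups passed to `torch.optim.SGD` together with `momentum=0.9` and `weight_decay=5e-4`. The "backbone"/"classifier" split below is our assumption about how the rates were assigned.

```python
from typing import Dict, List

# Hypothetical split of the two learning rates reported for transfer learning.
LEARNING_RATES: Dict[str, float] = {"backbone": 0.0002, "classifier": 0.01}


def sgd_step(params: List[float], grads: List[float], bufs: List[float],
             lr: float, momentum: float = 0.9, weight_decay: float = 5e-4) -> None:
    """One in-place SGD update with L2 weight decay and heavy-ball momentum."""
    for i, (p, g) in enumerate(zip(params, grads)):
        d_p = g + weight_decay * p          # weight decay is added to the gradient
        bufs[i] = momentum * bufs[i] + d_p  # momentum buffer; a zero start matches PyTorch's first step
        params[i] = p - lr * bufs[i]        # parameter update


if __name__ == "__main__":
    weights, velocity = [1.0], [0.0]
    for _ in range(2):
        sgd_step(weights, [0.5], velocity, lr=LEARNING_RATES["classifier"])
    print(weights)
```

Giving the new 2-way classifier a learning rate 50 times larger lets it adapt quickly, while the pretrained VGG-16 weights change only gently.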
The experiments were run on an NVIDIA RTX A5000 GPU under Windows 10 with PyTorch 1.0. Several randomly selected generated images are shown in Figure 2.

Figure 2. Randomly selected generated images (panels include "Input" and "Cycle").

Evaluation
The main goal of fish data domain transfer was to expand the training samples, whereas the goal of transfer learning was to improve the fish recognition rate for a given number of samples. To verify the validity of the algorithm, we evaluated domain transfer both from the source domain to the target domain and from the target domain to the source domain.

Performance of Direct Transfer
Because of the insufficient number of samples, the model performed poorly in both the source domain and the target domain. As shown in Table 1, it achieved recall rates of 52% and 24.24% and mAP values of 56.17% and 57.58% in the source and target domains, respectively. When the sample set was expanded by directly transferring images between the source and target domains, performance improved only slightly. For example, the recall rate with data transferred directly to the target domain was 30.3%. Moreover, because of the poor quality of the target-domain data, the recall rate decreased by 4.5% after direct transfer to the source domain, a significant drop in performance. The main reason was that the source-domain and target-domain samples were collected under different settings and therefore follow different domain distributions.

Effectiveness of the CycleGAN
Source and target datasets are often collected in different environments, and the CycleGAN can efficiently translate images into the style of another dataset. We therefore used the CycleGAN to convert source-domain and target-domain images into each other's style, obtaining fake source and fake target data. The fake data were combined with the original training data for training. As shown in Table 1, after the pseudo-training samples were added, the model's recall rate and mAP value in the source domain increased by 10% and 5%, respectively. In the target domain, however, recall dropped to 18.2% and mAP dropped to 61.65%. There were two main causes: the target-domain samples were of poor quality, and the CycleGAN translation is unsupervised. As a result, the generated images contained considerable noise, and the translation did not take the distributions of the different datasets into account.

Necessity of Identity Foreground Loss
To enhance the transfer of fish feature information, we introduced the identity foreground loss (IDF) into the CycleGAN. As shown in Figure 2, IDF supervises identity transfer. This reduces the interference of similar background features with foreground transfer and suppresses noise during image generation, which ultimately improves the transfer model's performance in fish sample expansion. As shown in Table 1, CycleGAN + IDF increased the source-domain recall rate to 60% and the mAP value to 80%. However, the target-domain recall rate dropped to 12.1% and the mAP value to 58.9%. Figure 2 shows that, because of the poor image quality in the target domain, the difference between foreground and background was reduced. Moreover, CycleGAN + IDF considers only the image differences between the two domains and ignores the image differences within a given domain. This weakened transfer from the source domain to the target domain and caused an obvious loss of fish features in the generated images.

Importance of Maximum Mean Discrepancy
We embedded the MMD into the CycleGAN with IDF to narrow the distribution gap by reducing the maximum mean discrepancy between foregrounds in different domains. As shown in Table 1, after transfer from the target domain to the source domain, the model's recall rate and mAP value increased to 65% and 81%, respectively. After transfer from the source domain to the target domain, they increased to 24.25% and 65.75%, respectively. These results show that embedding the MMD loss in the CycleGAN reduces the distribution differences between foreground samples from different domains, which makes fish feature extraction more effective across datasets. However, Figure 2 shows that, for low-quality image data, images generated after adding only the MMD loss still lost some local identity features.

Practicability of Our Method
We verified the practicability of the proposed method by transferring from the source domain to the target domain and from the target domain to the source domain. With CycleGAN, IDF, and MMD combined, the final identification results achieved the highest recall and mAP, reaching 77.5%, 88.75%, 69.5%, and 84.95%, respectively. These values are 30%, 15%, 39.2%, and 19.8% higher than with direct transfer. Only 300 target-set samples were used, so these results further demonstrate the method's practicability in real applications.

Parameter Sensitivity
In this study, three parameters, λ1, λ2 and λ3, control the relative importance of the three loss terms. We evaluated their influence on mutual transfer between the source and target domains. λ1 is the original CycleGAN parameter, and a value of 10 has been reported as the optimal choice in the literature [23,26]. Because the foreground mask loss replaces the CycleGAN identity loss, λ2 could follow the original identity-loss setting. λ3 is the key parameter controlling the weight of the MMD loss. This section therefore mainly compares the sensitivity of λ2 and λ3, and the results are shown in Tables 2 and 3. Compared with λ2 = 0 and λ3 = 0, both the foreground identity loss and the MMD loss are clearly effective. Table 3 shows that the foreground identity loss had a positive effect when the target domain was transferred to the source domain. However, the poor image quality in the target domain left the features of the target to be recognized indistinct, so transfer from the source domain to the target domain was poor. Table 4 shows that the MMD loss had a significant impact on recognition when its weight was small. When its weight was large, recognition changed only slowly with the weight. Data quality and domain distributions differ between datasets, so λ2 and λ3 should be chosen carefully for each dataset.

Comparison with State-of-the-Art Methods
We compared the proposed method with state-of-the-art methods, including interdomain comparative transfer [27] and multi-domain joint transfer [28,29]. The experimental results are shown in Table 4. PTGAN (person transfer generative adversarial network) mainly considers domain differences between datasets and ignores the loss of identity information caused by intra-domain deformation. This resembles the ablation setting in this study that adds only the IDF loss, and it likewise performs poorly. CamStyle (camera style) uses label smooth regularization (LSR) to reduce the overfitting risk caused by noisy generated samples and performs well in the target domain. However, it does not account for identity loss, so feature loss in the transferred samples seriously degrades recognition in the source domain. StarGAN uses a mask vector to handle feature differences across datasets and improve transfer among features. However, its transfer recognition performance is poor for underwater free-swimming fish, especially in transfer learning when fish features in the acquired images are severely degraded. In the target domain, recall and mAP were 24.03% and 66.95%, respectively. After transfer from the target domain to the source domain, mAP reached 88.39%. Compared with these methods, our method preserves identity information during translation by introducing the IDF loss, which suppresses background noise to some extent. Meanwhile, the MMD layer learns the distribution of the unlabeled dataset, which reduces the distribution difference between foreground samples from different domains.

Effectiveness of Transfer Learning
Table 5 shows that recognition accuracy with the original data was higher than with the fake data. Accuracy with the amplified data was the highest, with recall reaching 96.5% and 87%, respectively. Overall, fish recognition accuracy was higher in the source domain than in the target domain. The main reason is that the low image quality of the target domain caused more identity information to be lost during data transfer. Comparing overall recognition accuracy with fish recognition accuracy, we found that the fake source-domain data had low recall but high mAP. This indicates that the background recognition rate was high and further suggests that data transfer effectively separated background and foreground features. Overall, transfer learning effectively improved target recognition accuracy. After amplification, recognition accuracy in the source domain and the target domain reached 96.9% and 94%, respectively, reflecting the effectiveness of combining data amplification with transfer learning.

Conclusions
In this paper, we proposed an improved CycleGAN and transfer learning method to recognize the large yellow croaker (Larimichthys crocea) in a factory-ship farming scene. Variable-scene recognition still faces several problems. For example, the distributions of different datasets cannot be pulled closer during translation, and large numbers of training samples are difficult to obtain under production conditions. To solve the first problem, we introduced the foreground ID loss and the maximum mean discrepancy into the CycleGAN framework. To address the second problem and make the technology more practical, we used transfer learning to improve recognition accuracy. Extensive experiments validated the effectiveness of our method. Compared with state-of-the-art methods, the improved CycleGAN achieves competitive performance with a simple framework. The final test results show that data amplification by domain transfer can improve the recognition accuracy of small-sample transfer learning.