Lightweight Pig Face Feature Learning Evaluation and Application Based on Attention Mechanism and Two-Stage Transfer Learning

Abstract: With the advancement of machine vision technology, pig face recognition has garnered significant attention as a key component in the establishment of precision breeding models. In order to explore non-contact individual pig recognition, this study proposes a lightweight pig face feature learning method based on attention mechanism and two-stage transfer learning. Using a combined approach of online and offline data augmentation, both the self-collected dataset from


Introduction
Pork constitutes the primary segment of China's meat consumption market. In 2018, China faced its first outbreak of African swine fever virus in pigs [1]. This highly contagious virus is characterized by an alarming mortality rate approaching 100%, and there is currently no vaccine available for its prevention or treatment. Individual monitoring of pigs enables the swift isolation of infected animals, safeguarding the welfare of the overall herd. The advent of the intelligentization era has propelled the popularity of machine vision technology, particularly in image recognition, with widespread applications in medicine, the military, education, and beyond [2][3][4]. Nevertheless, technological research in the animal breeding industry remains somewhat limited. Facial recognition technology, a pivotal research avenue in computer vision, has garnered substantial attention in recent years [5,6]. As animal husbandry advances and modern agriculture undergoes transformation, there is an escalating need for the effective management and monitoring of pig herds. To establish a scalable, standardized, and precise pig farming model, the integration of computer technology for the automatic and accurate identification of individual pigs is indispensable [7][8][9]. Consequently, addressing issues pertaining to the accuracy, robustness, and real-time performance of pig facial recognition calls for urgent and in-depth research and innovation.
Moreover, the utilization of pig facial recognition technology in agriculture spans various dimensions. This technology enables the monitoring of individual pig facial features, facilitating the swift detection of collective health issues within pig herds. This, in turn, allows for timely intervention measures and reduces the risk of disease transmission. Pig management systems leveraging facial recognition can precisely identify each pig and tailor personalized feeding plans based on individual features and historical feeding data. This optimization leads to improved feed utilization, enhanced growth efficiency, and minimized feed wastage. Pig facial recognition technology is also instrumental in monitoring pig behavior, encompassing activity levels and social interactions. This insight aids in assessing the health and welfare of each pig. The analysis of facial expressions further offers the potential to identify the emotional states of pigs, providing valuable insights into their emotional well-being and responses to environmental changes [10][11][12][13][14]. In conclusion, the incorporation of pig facial recognition technology in agriculture has the potential to elevate production efficiency, enhance animal welfare, and offer comprehensive, real-time information for farm management.
Traditional animal breeding commonly relies on two identification methods for individual animals [15][16][17][18]. One approach uses RFID, an advanced technique that requires implanting chips into animals. However, this method poses potential health risks to the animals, and RFID systems struggle to collect information from multiple animals simultaneously. The other method employs physical tags, such as ear tags and branding, but these can cause stress and have a notable impact on animal health. Moreover, interactions such as biting and rubbing among animals can result in the loss of these physical tags.
To overcome the limitations associated with traditional contact-based identification methods, the field of individual animal recognition has witnessed the emergence of various non-contact identification approaches grounded in deep learning and computer vision [19][20][21][22]. These approaches often involve models incorporating convolutional neural networks (CNNs) and attention mechanisms, enabling swift identification of individual animals through the use of cameras and computing devices. This, in turn, enhances the accuracy and robustness of animal recognition to a certain extent. Wada et al. [23] proposed a feature space method for pig facial recognition, achieving a recognition rate of 97.9% for 16 categories by manually cropping facial images to isolate critical areas. However, this method falls short of achieving automated pig facial feature extraction. In a study by Kashiha et al. [24], Fourier descriptors with rotation and translation invariance were utilized to preserve pattern features. By drawing unique patterns on 10 pigs, they achieved a recognition accuracy of 88.7%. Si Yongsheng and colleagues [25] enhanced the YOLOv4 model by integrating the RFB-s structure into its backbone network and refining the NMS algorithm, thereby improving the recognition accuracy for cows with similar body patterns. Wang Rong and team [26] developed an open-set pig face recognition method with an integrated attention mechanism capable of identifying pigs not present in the training dataset. Xie Qiuju and associates [27] created an enhanced DenseNet-CBAM model for pig face recognition, combining DenseNet with CBAM to better integrate pig facial features. Huang Luwen's group [28] proposed DWT-GoatNet, a sheep face recognition model merging frequency- and spatial-domain characteristics, which is effective under varying lighting conditions, such as daytime and nighttime. Mark F. Hansen and colleagues [29] employed a convolutional neural network (CNN) for non-invasive individual pig identification, achieving high recognition accuracy. Marsot and team [30] introduced an adaptive pig face recognition method, demonstrating that a neural network can classify pigs by extracting their facial features. Xiao Jianxing and colleagues [31] utilized an enhanced Mask R-CNN and an SVM classifier for image segmentation and back feature extraction of cows, enabling non-contact and stress-free individual recognition. Wang Zhenyao et al. [32] introduced a two-stage method based on triplet margin loss for pig face recognition. In the first stage, the EfficientDet-D0 model is employed for pig face detection. The second stage focuses on pig face classification, utilizing six backbone classification models (ResNet-18, ResNet-34, DenseNet-121, SENSE-3, AlexNet, and VGGNet-19) and proposing an enhanced approach based on triplet margin loss. The established two-stage model achieves an average precision of 91.35% for the recognition of 28 pig faces. Xu Beibei et al. [33] combined the lightweight RetinaFace-MobileNet with ArcFace in their proposed CattleFaceNet, achieving a recognition accuracy of 91.3% and a processing speed of 24 frames per second (FPS), surpassing other methods in terms of efficiency.
In summary, despite advancements in pig face recognition research, several challenges persist: (1) Complex feature extraction algorithms are limited to specific tasks, hindering generalizability. (2) Data environments are simple, permitting the recognition of individual pigs only in uncomplicated settings. The data often originate from the same shooting scenes, overlooking the impact of factors such as shooting perspective, lighting, and pose variations. This limitation restricts practical application in real-world scenarios. (3) The developed models are often large and fail to meet lightweight criteria, restricting their practical use in pig face recognition.
Therefore, the pig face recognition model presented in this study, employing an attention mechanism and two-stage transfer learning, focuses on crucial feature learning and extracts multidimensional feature information from images. Moreover, it updates the weight parameters based on the variance in the objectives, establishing a connection between feature information and task objectives. The trained two-stage model offers several advantages, including high accuracy, a lightweight design, and easy deployment. Capable of identifying individual pigs in complex cross-domain scenarios, this model serves as a reference for implementing non-contact individual pig identification in precision breeding.

Data Collection and Filtering
This study employs two distinct datasets. The first dataset was gathered from the pig breeding farm at Shanxi Agricultural University, situated in Taigu County, Jinzhong City, Shanxi Province. Figure 1 illustrates the natural environment of the pigsty. The breeding facility comprises three breeds of breeding pigs: Jin-Fen White, Horse-Body, and Hybrid pigs (Jin-Fen White × Horse-Body). We selected the facial features of fattening pigs from these three breeds (70-180 days old) as the subjects of our study. At three specific time intervals (8:00-9:00 in the morning, 12:00-1:00 at noon, and 4:00-5:00 in the afternoon), we conducted multi-angle, multi-scenario image and video captures of the fattening pigs within the pig breeding farm. To ensure the practicality and authenticity of the research, no deliberate cleaning treatment was applied during the collection of facial data from the pigs.
Another portion of the experimental data is derived from the public dataset of the JD Finance Pig Face Recognition Competition. This dataset consists of 30 videos corresponding to 30 pigs, with each pig's video lasting 1 min. Using OpenCV, we performed frame-by-frame processing on the captured video stream, extracting one image every 3 frames. In total, 41,313 photos were collected. Figure 2 illustrates the pig facial sample data collected in a complex environment.
To address the challenge of excessive similarity among adjacent images in the split-frame intercepted dataset, we employed the Perceptual Hashing algorithm (p-hash) to filter out highly similar images. The p-hash algorithm generates a unique "fingerprint" string for each image, enabling the comparison of these fingerprints to determine image similarity: the closer the fingerprints, the more similar the images. This process involves transforming the image using the Discrete Cosine Transform (DCT) to acquire the mean value of the DCT coefficients and calculating image similarity based on these transform-domain features. Moreover, even after the removal of highly similar images, the dataset still contains unusable data, such as images where the pig face is obscured, the image quality is poor, or the pig face is absent. Consequently, these data also require manual filtering.
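The p-hash filtering step described above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the exact implementation used in this study; the 32 × 32 input size, the 8 × 8 low-frequency block, and the mean threshold are conventional choices assumed here.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix (rows indexed by frequency k).
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def phash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    # `gray` is a square 2D float array, e.g. a frame resized to 32x32.
    n = gray.shape[0]
    d = dct_matrix(n)
    coeffs = d @ gray @ d.T                 # 2D DCT of the image
    low = coeffs[:hash_size, :hash_size]    # keep low-frequency block
    bits = low > low.mean()                 # threshold at the mean coefficient
    return bits.flatten()                   # 64-bit fingerprint

def hamming(h1: np.ndarray, h2: np.ndarray) -> int:
    # Number of differing fingerprint bits: smaller means more similar.
    return int(np.count_nonzero(h1 != h2))
```

Two frames whose fingerprints lie within a few bits of Hamming distance would then be treated as near-duplicates, and one of them discarded.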

Division of the Dataset
Following data organization, 59 pigs were identified as suitable for pig face feature learning (29 from Shanxi Agricultural University's breeding farm and 30 from the "JD Finance" Global Data Explorer Competition public dataset). Dataset A, used to train the model, consisted of 1273 images of pigs numbered 1-21 from Shanxi Agricultural University and 1418 images of 30 pigs from the JD dataset. Dataset B, used for model testing, included 457 images of pigs numbered 22-29 from Shanxi Agricultural University.


YOLOv8 Network
YOLOv8 [34,35] stands as the latest innovation in the YOLO series of object detection algorithms. It was developed by Ultralytics (Figure 5). In comparison to its predecessors, YOLOv8 exhibits significant enhancements in both speed and accuracy, making it well-suited for real-time object detection applications.

In its backbone architecture, YOLOv8 replaces YOLOv5's C3 module with the C2f module, thereby enhancing the capture of gradient flow information while maintaining a lightweight design. It dynamically adjusts channel numbers for different model scales, resulting in a significant improvement in model performance. In the Neck section, YOLOv8 eliminates the two convolutional connection layers in the PAN-FPN upsampling stage, choosing instead to directly input feature outputs from the various backbone stages into the upsampling process. Simultaneously, the C3 module is replaced with the C2f module. Figure 6 provides a structural comparison diagram of C3 and C2f.
In YOLOv8, a significant modification is observed in the Head component, which transitions from a coupled head to a decoupled head structure. This new structure comprises two distinct branches, classification and regression, each representing different features. Consequently, within the decoupled head, the number of channels differs between the classification (Cls) branch and the box regression branch. Since the anchor box depends on the dataset, an insufficient dataset cannot accurately represent the data's feature distribution. As a result, YOLOv8 employs the TaskAlignedAssigner as its matching strategy for positive and negative samples.
YOLOv8's loss function updates the gradient loss and consists of two components: classification loss (VFL Loss) and regression loss (CIOU Loss + DFL). The VFL, aiming to emphasize positive samples, utilizes Binary Cross-Entropy (BCE) for positive samples and Focal Loss (FL) for negative samples to modulate the loss. CIOU Loss calculates the Intersection over Union (IOU) between the predicted and target boxes, while DFL enables the network to rapidly concentrate on the distribution of the target's proximity. The VFL is represented in Formula (1).
In the aforementioned equation, 'q' represents the label. For positive samples, 'q' denotes the Intersection over Union (IOU) between the bounding box (bbox) and ground truth (GT), whereas for negative samples, 'q' is assigned a value of 0.
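For reference, the varifocal loss takes the following form in the original VarifocalNet formulation, which YOLOv8 adopts; this is a reconstruction, as Formula (1) itself is not reproduced above. Here, p is the predicted score, q is the label defined above, and α and γ are the focal modulation hyperparameters:

```latex
\mathrm{VFL}(p,q) =
\begin{cases}
-q\left(q\log p + (1-q)\log(1-p)\right), & q > 0 \\
-\alpha p^{\gamma}\log(1-p), & q = 0
\end{cases}
```

The q > 0 branch is the BCE term weighted by the target IoU (emphasizing positive samples), while the q = 0 branch is the focal-loss modulation for negatives, consistent with the description above.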

Coordinate Attention
CA represents an efficient location-based attention mechanism [36]. Addressing the limitation of SE, which models channel relationships but overlooks location information, CA enhances this aspect. CA considers not just the inter-channel relationships but also incorporates direction-related location information. Additionally, CA enhances model accuracy with minimal computational overhead. Furthermore, the CA module's flexibility and lightweight design allow for easy integration into the core modules of lightweight networks. Integrating CA into the YOLOv8 model enables the acquisition of cross-channel pig face features and long-range spatial dependencies. This approach also captures direction-aware and precise pig face positioning, thereby enhancing the focus on pig face feature learning and the extraction of multi-dimensional feature information within images. Figure 7 illustrates the structure of the CA module.
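The CA forward pass can be sketched as follows. This is a simplified NumPy illustration under assumed weight shapes: it keeps the two direction-aware pooling paths and the shared reduction step, but omits batch normalization and treats the 1 × 1 convolutions as plain matrix multiplications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w1, w_h, w_w):
    # x: feature map of shape (C, H, W).
    # w1: (C_mid, C) shared reduction weights (the 1x1 conv);
    # w_h, w_w: (C, C_mid) expansion weights for each spatial direction.
    c, h, w = x.shape
    pool_h = x.mean(axis=2)                       # (C, H): pool along width
    pool_w = x.mean(axis=1)                       # (C, W): pool along height
    y = np.concatenate([pool_h, pool_w], axis=1)  # joint encoding, (C, H+W)
    y = np.maximum(w1 @ y, 0)                     # shared reduction + ReLU
    y_h, y_w = y[:, :h], y[:, h:]                 # split back per direction
    a_h = sigmoid(w_h @ y_h)                      # (C, H) attention along height
    a_w = sigmoid(w_w @ y_w)                      # (C, W) attention along width
    return x * a_h[:, :, None] * a_w[:, None, :]  # reweight the feature map
```

Because each attention factor lies in (0, 1) and is indexed by both channel and one spatial coordinate, the module captures cross-channel relationships together with direction-aware positional information, as described above.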

Transfer Learning
Transfer learning, a machine learning technique [37][38][39], involves transferring existing knowledge from a source domain to a target domain in order to acquire new knowledge. The essence of transfer learning lies in identifying commonalities between the source and target domains. This approach can mitigate issues such as data scarcity, low training efficiency, and accuracy reduction caused by overfitting. The effectiveness of transfer learning is contingent upon the similarity among the source domain, target domain, and the specific task at hand. In this study, Dataset A serves as the source domain, while Dataset B is utilized as the target domain. The workflow diagram of this study is presented in Figure 8.
The study employs a two-stage transfer learning approach. The first stage involves training the model using the PASCAL VOC2012 dataset [40], an advanced version of the VOC2007 dataset. The VOC2012 dataset comprises 20 classes, with 11,530 images for training and validation and 27,450 annotated objects in regions of interest. PASCAL VOC is renowned for its standardized datasets in image recognition and classification. The model was initialized using pre-trained weights derived from training on the VOC2012 dataset. The second stage focuses on training with the collected pig face dataset: transfer learning is applied to the weight file from the first stage, followed by model testing.

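The two-stage pattern (pre-train on a large source dataset, transfer the weights, then fine-tune on the target data with an initial freeze phase) can be illustrated with a toy linear model. This sketch is purely conceptual and is not the YOLOv8 pipeline used in the study; all names, dimensions, and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def train_linear(x, y, w=None, lr=0.1, epochs=200, freeze_mask=None):
    """Least-squares training by gradient descent. `freeze_mask` zeroes the
    gradient of frozen weights, mimicking a freeze-training phase."""
    n, d = x.shape
    if w is None:
        w = np.zeros(d)
    for _ in range(epochs):
        grad = x.T @ (x @ w - y) / n
        if freeze_mask is not None:
            grad = grad * freeze_mask       # frozen entries receive no update
        w -= lr * grad
    return w

# Stage 1: "pre-train" on a large source task (stands in for VOC2012).
x_src = rng.normal(size=(500, 8))
w_true = rng.normal(size=8)
y_src = x_src @ w_true + 0.01 * rng.normal(size=500)
w_pre = train_linear(x_src, y_src)

# Stage 2: fine-tune on a small, related target task (stands in for the pig
# face dataset), initialized from the stage-1 weights. Half the weights are
# frozen at first, then everything is unfrozen.
x_tgt = rng.normal(size=(30, 8))
y_tgt = x_tgt @ (w_true + 0.1) + 0.01 * rng.normal(size=30)
mask = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # freeze first half
w_ft = train_linear(x_tgt, y_tgt, w=w_pre.copy(), freeze_mask=mask, epochs=100)
w_ft = train_linear(x_tgt, y_tgt, w=w_ft, epochs=100)   # unfreeze and finish
```

The design choice mirrors the study's workflow: because the source and target tasks share structure, initializing from the pre-trained weights puts the fine-tuning phase close to a good solution, which is why transfer works even with a small target dataset.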

Training Model Parameters
The model underwent training with fp16 mixed precision on the training set for 100 epochs. The initial 50 epochs constituted freeze training. Key parameters included a learning rate of 0.01, an SGD optimizer with a momentum factor of 0.937, a weight decay coefficient of 0.0005, and a batch size of 16. The learning rate was modulated using the cosine annealing decay strategy.
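The cosine annealing decay strategy mentioned above can be sketched as follows. `lr_max = 0.01` matches the stated initial learning rate; the choice of `lr_min = 0` and the per-epoch granularity are assumptions for illustration.

```python
import math

def cosine_annealing_lr(epoch: int, total_epochs: int,
                        lr_max: float = 0.01, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate for a given 0-indexed epoch:
    starts at lr_max and follows a half cosine down to lr_min."""
    cos = 1 + math.cos(math.pi * epoch / max(total_epochs - 1, 1))
    return lr_min + 0.5 * (lr_max - lr_min) * cos
```

The schedule decays slowly at first, fastest mid-training, and flattens near the end, which tends to stabilize the final epochs of fine-tuning.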

Test Platform
This study utilized a desktop computer as the platform for model training, validation, and testing. The configuration of this test platform is detailed in Table 1.

Evaluation Metrics
The trained models underwent testing on an identical test set. To precisely evaluate the performance of various pig face feature learning models, metrics such as precision (P), recall (R), average precision (AP), mean average precision (mAP), and the harmonic mean (F1) were computed from the classification results. The corresponding formulas are detailed in Equations (2)-(6).
In Formulas (2)-(5), TP represents the number of positive samples correctly identified, FP denotes the number of negative samples incorrectly identified as positive, FN indicates the number of positive samples misclassified as negative, and N reflects the total number of detected sample categories. Higher values of precision (P), recall (R), and the harmonic mean (F1) signify superior performance of the recognition model.
In Formula (6), 'n' denotes the number of epochs and 'Pn' represents the precision achieved after the nth epoch. The average precision (AP) is calculated by summing and averaging the precision values obtained from each epoch. AP, the average of precision values at various recall levels, corresponds to the area under the precision-recall (P-R) curve, bounded by the vertical and horizontal axes. A higher AP value indicates superior performance of the recognition model.
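The metrics described above can be computed as in the following sketch. The precision, recall, and F1 definitions are standard; the all-points interpolation used for AP is an assumption, since the exact computation behind Equations (2)-(6) is not reproduced here.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    # P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(recalls, precisions):
    """Area under the P-R curve (all-points interpolation)."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make the precision envelope monotonically non-increasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(aps):
    # mAP: mean of the per-class AP values over the N categories.
    return float(np.mean(aps))
```

A detector with perfect precision at every recall level thus attains AP = 1.0, and mAP averages this quantity over the detected categories.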

YOLOv8 Model Compared with Other Models
This study compares the results of six algorithms: EfficientDet [41,42], SSD-MobileNetV2 [43,44], YOLOv5 [45,46], YOLOv7-tiny [47,48], Swin Transformer [49][50][51], and YOLOv8. Firstly, these six algorithms are all considered lightweight, which is crucial for scenarios involving embedded devices or limited computational resources. Secondly, they represent different design philosophies and performance characteristics in the field of object detection. For instance, EfficientDet is renowned for its high performance and efficient design, SSD-MobileNetV2 for its lightweight structure, and YOLOv5 and YOLOv7-tiny for their balance between speed and accuracy, while YOLOv8, as the latest version in the YOLO series, allows insights into the performance improvements of the latest technology through comparisons with its predecessors. Additionally, the Swin Transformer is an emerging model based on the transformer architecture, showcasing promising features in image processing. By comparing these algorithms, a comprehensive understanding of their performance differences in pig face feature learning is achieved, providing deeper insights for research and applications in pig face feature learning.
To evaluate the effectiveness of feature learning in YOLOv8 on the pig face dataset, the models (EfficientDet, SSD-MobileNetV2, YOLOv5, YOLOv7-tiny, YOLOv8, and Swin Transformer) were trained using pre-trained weights on identical datasets and test environments. During the training of each model, the changes in mean average precision (mAP) and loss values on the validation set were carefully examined and compared.
As the number of iterations increases, the mean average precision (mAP) of the six models on the validation set shows an upward trend, while the loss values demonstrate a downward trend, eventually stabilizing within a specific range (Figure 9). The mAP of the YOLOv8 model on the validation set stabilizes at 98% (Figure 9a), surpassing the other five models. This indicates that the YOLOv8 model exhibits superior learning of pig face features and achieves a higher recognition rate. The comparative evaluation of the learning effect on pig face features by different models is presented in Table 2.
Compared to other models, YOLOv8's F1 score exceeds that of EfficientDet, SSD, YOLOv5, and YOLOv7-tiny by 0.01, 5.13, 0.52, and 0.28 percentage points, respectively, as shown in Table 2, indicating its superior comprehensive performance. The F1 score of YOLOv8 is 0.84 lower than that of the Swin Transformer. However, the Swin Transformer model has 27.54 million parameters, while the YOLOv8 model has 11.15 million. In comparison, the YOLOv8 model aligns more closely with the requirements of lightweight models. Simultaneously, the mAP@0.5 of the YOLOv8 model for pig face feature learning stands at 97.73%, surpassing EfficientDet, SSD, YOLOv5, YOLOv7-tiny, and Swin Transformer by 0.32, 1.23, 1.56, 0.43, and 0.14 percentage points, respectively. Additionally, the mAP@0.9 of YOLOv8 is 79.37%, which is 9.64, 56.7, 13.33, 9.33, and 5.05 percentage points higher than the aforementioned models. When the Intersection over Union (IoU) threshold is set to 0.9, YOLOv8 continues to outperform the other five models in terms of average precision. This superior performance is attributed to the replacement of the C3 structure with the C2f structure in YOLOv8's backbone and neck, offering richer gradient flow, and to the adjustment of channel numbers, which collectively enhance the model's performance. Furthermore, YOLOv8 employs a dynamic allocation strategy for positive and negative samples, proving to be more flexible and efficient than the static allocation strategies used in other models. This strategy helps the model better utilize sample information, thereby enhancing training efficacy. Overall, the YOLOv8 model demonstrates higher robustness and superior generalization capabilities for pig face feature learning.
Comparative analysis reveals that the model trained using pre-trained weights exhibits higher precision and superior efficacy (Table 3). Within the same dataset, models trained with pre-trained weights demonstrated mAP values 1.8-11.45% higher than those trained solely with the backbone network, as shown in Table 3. This is attributed to the pre-trained model being trained on the PASCAL VOC2012 dataset, which encompasses 20 classes and over 30,000 labeled objects, thereby equipping the model with extensive feature knowledge. Consequently, when tasked with learning pig face features, this model, after fine-tuning on the new target, demonstrates enhanced recognition ability and precision, exemplifying the benefits of transfer learning.

Impact of CA on the Model
To assess the impact of incorporating various attention mechanisms on the performance of pig face feature learning models, SE, CBAM, ECA, and CA were integrated at identical positions in the YOLOv8 architecture and subsequently trained and tested on the same pig face dataset. The test results are presented in Table 4. Comparative analysis of models with different attention mechanisms revealed that only the YOLOv8-CA model exhibited a 0.3 percentage point increase in mean average precision (mAP), whereas the YOLOv8-CBAM, YOLOv8-ECA, and YOLOv8-SE models all demonstrated decreases in mAP. The superior performance of the CA mechanism is attributed to its focus on cross-channel pig face feature information and acquisition of long-range spatial dependencies, coupled with its ability to capture direction-aware and precise pig face positional information. In contrast, the SE mechanism primarily concentrates on establishing interdependencies among pig face features across channels while neglecting their locational information. Conversely, the CBAM mechanism merges channel and spatial attention yet overlooks long-range dependencies.
Figure 10 displays the precision-recall (PR) maps for models incorporating various attention mechanisms.
The precision-recall (PR) curve of YOLOv8-CA encompasses those of the other three attention mechanisms, signifying that YOLOv8-CA outperforms YOLOv8-CBAM, YOLOv8-ECA, and YOLOv8-SE, as illustrated in Figure 10. Notably, the area under the YOLOv8-CA curve is substantially larger than that of the YOLOv8 curve. This indicates that incorporating the CA mechanism into YOLOv8 enhances the model's performance and increases its average precision in recognition tasks.

Analysis of Transfer Learning Results
In trials involving eight pigs (Pigcs1-Pigcs8), a comparison was conducted among the backbone network (Nopre-pigcs), pre-trained weights (pigcs), and our two-stage trained weights (pre-pigcs). The mAP values were 84.81% for Nopre-pigcs, 92.60% for pigcs, and 95.73% for pre-pigcs, respectively. The two-stage model exhibited a 10.92% and 3.13% higher mAP value compared to the backbone network and pre-trained weights models, respectively, indicating enhanced recognition precision. Figure 11 displays the AP values for each pig when tested with the different models. Pig cs3 exhibited the largest increase in AP value, from 85.5% to 95.6%, while cs8 showed the smallest increase, with all AP values at 87.65%. The two-stage model demonstrated notable stability and generalizability.

Analysis of Transfer Learning Results
Figure 12 illustrates the confusion matrix of the two-stage model, with axis labels representing different pig IDs and matrix entries reflecting the count of recognized pig faces. For instance, a "1" in the matrix's third column and second row signifies that the model once misidentified cs7 as cs3. The test results indicate that the model accurately recognizes most pig faces. Cs7 exhibited the poorest recognition performance, being most frequently misidentified as other pigs. Compared to the two-stage model, the other models exhibited errors in pig face feature learning, including misidentifying pig cs2 as cs1 (top of Figure 13), failing to detect pig cs3 (middle of Figure 13), and mistaking pig cs7 for cs8 (bottom of Figure 13). Factors such as unwashed pig faces or side views may contribute to misdetections and non-detections. The two-stage model significantly mitigates these issues, enhancing recognition accuracy for pigs with facial obstructions or side profiles.
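The mapping from per-image predictions to such a confusion matrix can be sketched in a few lines. The IDs and predictions below are hypothetical and chosen only to reproduce a single cs7-as-cs3 miscount like the one described for Figure 12.

```python
from collections import Counter

def confusion_matrix(true_ids, pred_ids, labels):
    """Build a confusion matrix like Figure 12's: rows are true pig IDs,
    columns are predicted IDs, entries count recognitions."""
    counts = Counter(zip(true_ids, pred_ids))
    return [[counts[(t, p)] for p in labels] for t in labels]

# hypothetical predictions for a handful of test images, illustration only
labels = ["cs1", "cs2", "cs3", "cs7"]
true_ids = ["cs7", "cs7", "cs7", "cs2", "cs3"]
pred_ids = ["cs7", "cs3", "cs7", "cs2", "cs3"]
m = confusion_matrix(true_ids, pred_ids, labels)
# the entry in row "cs7", column "cs3" records the one misidentification
```

Off-diagonal entries localize exactly which pigs the model confuses, which is what singles out cs7 as the weakest class in the real matrix.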


Conclusions and Future Directions
This study employs pig face photos from pig farms and videos from JD Finance's pig face recognition competition as datasets. These datasets undergo screening via d-hash and are enhanced using both online and offline data augmentation methods. Various models and transfer learning approaches are employed to evaluate the learning efficacy of pig face features, leading to the following conclusions.
Pig facial recognition technology has exhibited significant potential across diverse applications in the livestock industry, including precision breeding, disease detection, automated feeding systems, and individual behavioral analysis. The precise identification of individual pigs facilitates meticulous breeding management, thereby elevating breeding efficiency and overall profitability. Additionally, the application of pig facial recognition in disease detection allows for early identification, effectively curbing the spread of diseases within the livestock population. The implementation of automated feeding systems, integrating pig facial recognition for personalized feeding plans, contributes to heightened feeding efficiency and a reduction in feed wastage. Moreover, the technology's capability to analyze individual behaviors offers robust support for behavioral studies and welfare assessments, actively advancing the digital transformation and intelligent management of the livestock industry.
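The d-hash screening step mentioned above can be sketched as follows. Block averaging stands in for the interpolated resize a production pipeline would typically use, and the images are synthetic gradients; near-duplicate frames hash to nearly identical bit strings, so a small Hamming distance flags them for removal.

```python
import numpy as np

def dhash(gray, hash_size=8):
    """Difference hash (d-hash) of a grayscale image array.

    The image is shrunk to (hash_size, hash_size + 1) by block averaging,
    then each bit records whether brightness increases left-to-right."""
    rows = np.array_split(np.asarray(gray, dtype=float), hash_size, axis=0)
    small = np.array([[c.mean() for c in np.array_split(r, hash_size + 1, axis=1)]
                      for r in rows])
    bits = small[:, 1:] > small[:, :-1]  # (hash_size, hash_size) booleans
    return bits.flatten()

def hamming(a, b):
    """Number of differing bits; near-duplicates have a small distance."""
    return int(np.count_nonzero(a != b))

img1 = np.tile(np.arange(64.0), (64, 1))  # horizontal gradient
img2 = img1 + 3.0                          # same frame, brightness shifted
img3 = img1.T                              # structurally different frame
hd_same = hamming(dhash(img1), dhash(img2))
hd_diff = hamming(dhash(img1), dhash(img3))
```

Because d-hash compares adjacent-pixel gradients rather than absolute brightness, a globally brightened copy of a frame produces the identical hash, which is what makes it suitable for duplicate screening.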
In future research, our primary focus will center on the seamless integration of pig facial feature learning technology with performance data and DNA genetic information. This involves creating a distinct identity for each pig using pig facial feature learning technology, establishing a comprehensive database management system, and gathering diverse data, including health status, reproductive performance, and DNA sequence data. By amalgamating these data sources, we aim to ensure accurate tracking of performance data and genetic information for each pig, providing a more reliable means for confirming individual pig identities. This approach not only assists in the precise selection of pigs with desirable genetic traits but also offers novel perspectives and tools for animal breeding and health management, enhancing the scientific foundation for decision making.
Additionally, to address common occlusion challenges in practical scenarios, we plan to explore and introduce innovative methods and models. Combining multimodal data, such as visible light images, infrared images, and depth images, will enhance the robustness of recognition models against occlusion. Simultaneously, leveraging generative adversarial networks to generate the missing parts of occluded facial features, coupled with applying self-attention mechanisms to learn the correlation between different facial regions, is expected to significantly improve recognition performance under occlusion. The comprehensive application of these methods, alongside the latest deep learning technologies, aims to achieve outstanding results for pig facial recognition in complex scenarios involving occlusion.
Finally, lightweight models often achieve efficient inference by reducing parameter quantity and model complexity, potentially at the cost of accuracy. Especially in complex real-world scenarios, lightweight models may not adapt as well as larger models to factors such as occlusion, lighting changes, and diverse environments. This is particularly crucial in pig facial recognition, where environments like farms may have varying conditions. We will investigate methods such as optimizing model structures, improving training strategies, and introducing adaptive mechanisms to allow models to dynamically adjust their parameters in response to changing environments. These ongoing research efforts and innovations aim to ensure that pig facial recognition technology operates stably and efficiently in various application scenarios, achieving intelligent and precise management in the breeding industry and making significant contributions to animal welfare and public health safety. Furthermore, it injects new vitality into the sustainable development of the agricultural industry chain in the era of digitalization in animal husbandry.
(a) Pig face with back light; (b) pig face with no light; (c) shaded pig face; (d) pig face from the side; (e) unwashed pig face.

Figure 1. The natural environment of the pigsty.
Dataset A, used to train the model, consisted of 1273 images of pigs numbered 1-21 from Shanxi Agricultural University and 1418 images of 30 pigs from the JD dataset. Dataset B, used for model testing, included 457 images of pigs numbered 22-29 from Shanxi Agricultural University. Datasets A and B were each divided into training, validation, and test sets in a 7:1:2 ratio. The training and validation sets were utilized for model training and parameter adjustment, while the test set evaluated the model's performance. The training set data were labeled using LabelImg (1.8.1) software, and the annotated data were stored in the VOC dataset format.
2.3. Data Augmentation
Data augmentation plays a pivotal role in improving model generalization, alleviating overfitting, expanding data volume, and bolstering model robustness. Common data augmentation techniques encompass rotation, flipping, and brightness adjustment. Rotation enables the model to learn features that remain effective at various angles, thereby enhancing invariance to rotational transformations and diminishing the risk of overfitting to training data. Flipping assists the model in comprehending and learning the symmetry of images, allowing it to adapt to different perspectives and orientations encountered in real-world scenarios. Brightness adjustment simulates diverse lighting conditions, aiding the model in better adapting to and managing dynamic lighting changes in real-world situations. Moreover, augmenting training data through data augmentation effectively elevates model performance. Particularly in scenarios with relatively small datasets, these positive effects render data augmentation a crucial strategy for bolstering the robustness and generalization performance of deep learning models.
Data augmentation can be categorized into online and offline methods. To ensure effective model training, this study employs a combination of both. Online data augmentation augments data concurrently with model training; its advantage lies in not requiring synthesized data to be stored, thereby saving storage space and offering high flexibility. Offline data augmentation enhances and generates images prior to training the model. In this study, the training set of dataset A, featuring pig faces numbered 1-21 from Shanxi Agricultural University, was enriched using offline data augmentation techniques such as rotation, flipping, and brightness enhancement (Figures 3 and 4).
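A minimal sketch of the offline augmentation described above (rotation, flipping, and brightness adjustment), assuming images are plain NumPy arrays rather than files on disk; the rotation angles and brightness range are illustrative choices, not the study's exact parameters.

```python
import numpy as np

def augment(img, rng):
    """One offline augmentation pass: rotation, optional flip, brightness.
    `img` is an H x W x 3 uint8 array; each call yields one new variant."""
    out = np.rot90(img, k=int(rng.integers(0, 4)))  # rotate 0/90/180/270 degrees
    if rng.random() < 0.5:
        out = np.fliplr(out)                        # horizontal flip
    factor = rng.uniform(0.7, 1.3)                  # brightness adjustment
    return np.clip(out.astype(float) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in pig face
variants = [augment(img, rng) for _ in range(5)]  # 1 image -> 5 extra samples
```

Saving each variant to disk before training is what distinguishes this offline scheme from online augmentation, which would apply the same transforms on the fly inside the data loader.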

Figure 3. Effects of data augmentation.


Figure 4. Comparison of the number of images before and after data augmentation.

The VOC2012 dataset comprises 20 classes, with 11,530 images for training and validation and 27,450 annotated objects in regions of interest. The PASCAL VOC challenge is renowned for its standardized datasets in image recognition and classification. In the first stage, the model was initialized using pre-trained weights derived from training on the VOC2012 dataset. The second stage focuses on training on the collected pig face dataset: transfer learning is applied to the weight file from the first stage, followed by model testing.
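The two-stage idea can be illustrated with a deliberately tiny stand-in model, since reproducing the YOLOv8 pipeline is out of scope here: a linear regressor is "pre-trained" on a source task, and its weights seed fine-tuning on a related target task. Everything below (the data, the two tasks, the step budgets) is synthetic; the point is only that transferred weights reach a lower error than training from scratch under the same small fine-tuning budget.

```python
import numpy as np

def train_linear(X, y, w_init, lr=0.1, steps=200):
    """Fit y ~ X @ w by gradient descent, starting from w_init.
    Starting from transferred weights stands in for loading a
    pre-trained checkpoint before fine-tuning."""
    w = w_init.astype(float).copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y_source = X @ w_true + rng.normal(scale=0.1, size=200)          # "VOC" stage
y_target = X @ (w_true + 0.2) + rng.normal(scale=0.1, size=200)  # related task

# stage 1: pre-train on the source task
w_stage1 = train_linear(X, y_source, np.zeros(5))
# stage 2: fine-tune the transferred weights on the target task (few steps)
w_transfer = train_linear(X, y_target, w_stage1, steps=20)
# baseline: train the target task from scratch with the same small budget
w_scratch = train_linear(X, y_target, np.zeros(5), steps=20)

err_transfer = np.mean((X @ w_transfer - y_target) ** 2)
err_scratch = np.mean((X @ w_scratch - y_target) ** 2)
```

The transferred initialization starts much closer to the target-task optimum, so the limited fine-tuning budget suffices, mirroring the paper's mAP gap between pre-pigcs and Nopre-pigcs.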


Figure 9. Curves of mAP and loss on the validation set during training of different models: (a) curve of mAP; (b) curve of loss.

Figure 10. PR curves of the models with different attention mechanisms introduced.


Figure 11. AP values for each pig using different methods.



Figure 12. Confusion matrix of the two-stage model (axis labels are pig IDs; entries are counts of recognized pig faces).


Figure 13. False recognition results of pig faces compared with the two-stage model ((a,c,e) show correct recognition results from the two-stage model; (b,d,f) show false recognition results from other models).

Table 2. Evaluation of the learning effect of pig face features by different models.


Table 3. Comparison of the effects of using the backbone network versus pre-trained weights on pig face feature learning.

Table 4. Comparison of parameters of pig face feature learning results with different attention mechanism models.

• The pig face feature recognition model built on YOLOv8 demonstrates an impressive recognition accuracy of 97.73% on the test set, with 11.155 million parameters, an average detection speed of 13.032 milliseconds, and a model size of 42.7 MB. YOLOv8 stands out for its concise parameter count, minimal computational requirements, compact model size, and exceptional performance, making it a valuable benchmark for upcoming endeavors in lightweight, non-contact, intelligent pig face feature recognition;
• Without modifying the model structure, four attention mechanisms (CA, CBAM, SE, and ECA) were integrated. Among these, CA emerged as the most effective in augmenting the pig face feature recognition model's performance: its inclusion led to a 0.3 percentage point enhancement in the model's mAP value on the test set. These findings indicate that the attention module, particularly CA, intensifies the focus on crucial pig face features, thereby enhancing the model's efficiency in feature recognition;
• This study introduces a two-stage transfer learning approach to enhance the accuracy of pig face recognition models in challenging environments. Initially, a YOLOv8 network was pre-trained on the Pascal VOC2012 dataset. Subsequently, the feature extraction network underwent fine-tuning on dataset A, resulting in a task-specific pre-trained network enriched with features pertinent to pig face recognition. Finally, this task-specific network underwent additional fine-tuning on dataset B, yielding the final pig face recognition model. The refined model demonstrated precise recognition of the eight test pigs, achieving an mAP value of 95.73%. Particularly noteworthy is the improvement in the AP value for cs4 from 85.5% to 95.6%, showcasing robust performance in the presence of noise, occlusion, and other disturbances.