Article

Lightweight Domestic Pig Behavior Detection Based on YOLOv8

School of Information Science and Engineering, Shandong Agricultural University, Tai’an 271018, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 6340; https://doi.org/10.3390/app15116340
Submission received: 19 April 2025 / Revised: 28 May 2025 / Accepted: 29 May 2025 / Published: 5 June 2025

Abstract

Domestic pig farming in China is growing in both scale and prevalence, and behavioral assessment of these pigs is essential for enhancing production efficiency. Existing behavior recognition techniques for domestic pigs are computationally intensive, making them challenging to deploy on edge devices. This study introduces a lightweight method for identifying domestic pig behavior, YOLOv8-PigLite, derived from YOLOv8. First, a novel two-branch bottleneck module is developed within the C2f module, incorporating average pooling and depthwise convolution (DWConv) in one branch and max pooling with DWConv in the other, to strengthen multi-scale feature representation. Next, Grouped Convolution is integrated into the convolution module, together with the SE module, to further reduce the recognition error rate. Finally, BiFPN replaces the original FPN in the neck network, streamlining the neck and enhancing its feature-processing capability. Test results show that, compared with the original YOLOv8n model, precision, recall, and mean average precision at IoU 0.50 (mAP@50) remain essentially unchanged, while the parameter count and floating-point operations are reduced by 59.80% and 39.50%, respectively. In addition, the FPS increases by 32.61%, and the model's generalizability has been validated on public datasets.

1. Introduction

Pork is the primary source of animal protein in China, and its quality is closely linked to the nation's health. It is therefore essential to assess the behavior of domestic pigs to guarantee the quality and safety of pork. Traditional manual husbandry, a complex and long-established practice, faces significant obstacles because it is labor-intensive and insufficiently mechanized [1]. Consequently, developing a model for recognizing domestic pig behavior is of great practical significance.
Target detection algorithms can remedy these deficiencies; current techniques are classified into traditional and deep learning-based categories. Conventional target detection algorithms depend predominantly on manually crafted features and traditional machine learning methods, scanning the image with sliding windows or region segmentation and then describing the target's appearance with hand-extracted features. Such algorithms have limited generalization capability, and the sliding-window approach is computationally inefficient [2].
Deep learning algorithms for target detection fall into two categories: two-stage and one-stage [3]. Two-stage algorithms, such as R-CNN [4], Fast R-CNN [5], and Faster R-CNN [6], employ region proposal methods to identify candidate regions, which are then used for classification and bounding-box regression, achieving high accuracy at reduced speed. One-stage algorithms (e.g., YOLO [7] and SSD [8]) predict target categories and locations directly from the image, omitting the region proposal stage, which improves speed and suits real-time applications.
Deep learning-based recognition of domestic pig behavior has attracted considerable interest from researchers in recent years. Multi-class pig behavior recognition has been a prominent research area owing to its adaptability and its ability to accurately recognize a variety of common pig actions and behaviors [9]. Wang et al. [10] proposed an instance classification method inspired by the image grid concept in YOLO, transforming image instance segmentation into a classification problem. Dong et al. [11] introduced a method for identifying pig behavior from the animal's posture and temporal characteristics: OpenPose was integrated with YOLOv5s to detect the pigs and estimate their postures at 20 key body points, from which the skeleton is extracted to identify three behaviors: standing, walking, and lying. Tu et al. [12] built a system for behavior recognition and tracking of group-housed pigs that combines the YOLOv5s detection model with an enhanced DeepSort algorithm; using the YOLOv5s classification results, the enhanced DeepSort algorithm performs behavior identification during tracking, alleviating the frequent identity switches caused by pigs overlapping and occluding one another. Shao et al. [13] employed YOLOv5 to isolate individual pigs from herd images, combined it with the DeepLab v3+ semantic segmentation technique to delineate pig outlines, and identified standing, prone, side-lying, and exploratory behaviors with a depthwise separable convolutional network. Ji et al. [14] used an enhanced YOLOX to identify pig postural behaviors and applied a streamlined copy-paste and label-smoothing technique to address the category imbalance caused by the scarcity of sitting examples in the dataset.
Modern deep learning-based behavior recognition methods have improved in accuracy, but they typically rely on heavy, resource-hungry models that are difficult to run in real time on smaller swine farms [15]. This paper presents a lightweight pig behavior recognition method, YOLOv8-PigLite, derived from YOLOv8 and building on the high detection accuracy of the YOLOv8n model and its effectiveness on multi-scale targets. We introduce a two-branch bottleneck module within the C2f module, in which one branch uses average pooling with depthwise convolution (DWConv) and the other uses max pooling with DWConv, to enhance multi-scale feature representation. We also replace the standard convolution in the convolution module with Grouped Convolution and integrate the SE module to further reduce the recognition error rate. Furthermore, we adopt BiFPN in place of the original FPN in the neck network, streamlining the network structure and strengthening feature processing, yielding a lightweight, high-precision, real-time behavior recognition method for farmed pigs with strong robustness and generalization ability.

2. Dataset Preparation

To validate the model's efficacy, this study uses the Pig Behavior Recognition Dataset (PBRD) [16]. The dataset comprises 11,862 images in six categories: Lying, Sleeping, Investigating, Eating, Walking, and Mounted. The training set comprises 9344 images, and the validation set contains 2518 images. The category-wise sample distribution of the PBRD dataset is shown in Figure 1a.
To verify the superiority of the proposed algorithm in pig behavior recognition, two publicly available datasets from Roboflow, Comportamentos and Behavior_Pig, were integrated to form a new external dataset named Comp. This dataset contains a total of 8103 images covering six behavior categories: Lying, Sleeping, Eating, Walking, Mounted, and Drinking. The category-wise sample distribution of the Comp dataset is illustrated in Figure 1b.

3. Principles of Algorithms

3.1. YOLOv8

The YOLO (You Only Look Once) family of algorithms occupies an important place in target detection and has attained considerable success in computer vision in recent years. YOLOv8, released as open source by Ultralytics on 10 January 2023 [17], is the next major enhancement after YOLOv5 and currently supports image classification, object detection, and instance segmentation tasks. YOLOv8 combines the advantages of the YOLOX, YOLOv6, YOLOv7, and PP-YOLOE algorithms, retaining the real-time efficiency of the series while improving accuracy, speed, and ease of use.
YOLOv8 substitutes the C3 module of YOLOv5 with the C2f module in the backbone network, enhancing feature extraction through the bottleneck block and SPPF module. The neck adheres to the PANet framework by dynamically modifying feature fusion weights and decreasing the channel count in certain convolutional layers to minimize computational load and enhance the contribution ratio of features across various layers. The head transitions to an anchor-free design and a decoupled head structure, integrated with dynamic label assignment (Task-Aligned Assigner) [18], to minimize weight and computational demands while ensuring real-time performance. Therefore, the method is appropriate for the behavioral recognition of domestic pigs.

3.2. YOLOv8-PigLite

3.2.1. C2f-DualDW

The YOLOv8n network architecture contains numerous C2f modules that markedly enhance detection accuracy and real-time performance through efficient multi-level feature extraction and fusion, computations optimized to improve gradient flow, and configurations adaptable to various model sizes; the efficacy of the entire network is therefore closely tied to the feature-learning capability of the C2f modules [19]. The Standard Convolution in C2f operates simultaneously on the spatial dimensions (width and height) and the channel dimension of the input feature map, which demands substantial computational resources and makes it difficult to guarantee inference speed on resource-constrained devices (e.g., mobile). In this work, we propose a novel bottleneck module, DualDW, to replace the bottleneck structure in the original C2f module (Figure 2), thereby decreasing the number of model parameters and the computational load while preserving model performance.
After the split, the DualDW module applies 2D pooling to the feature map of shape [H, W, C/2], downsampling the spatial dimensions (width and height) to produce a feature map of shape [H/2, W/2, C/2]. Max 2D pooling (MaxPool) reduces dimensionality efficiently and highlights salient features, but it discards contextual detail and models background information poorly. Conversely, average 2D pooling (AvgPool) preserves overall regional information and smooths the data, at the cost of blurring critical details. We therefore adopt a dual-branch design in which the two parallel branches use average pooling and max pooling, respectively, enhancing the DualDW module's capacity for multi-scale feature representation.
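To make the dual-branch idea concrete, the following hypothetical PyTorch sketch pairs an average-pooling branch and a max-pooling branch, each followed by a depthwise convolution, and then fuses and upsamples the result. This is one plausible arrangement under stated assumptions (the 1x1 fusion and nearest-neighbor upsampling are our additions for illustration); the authors' exact DualDW layout is the one defined in Figure 2.

```python
import torch
import torch.nn as nn

class DualDWBottleneck(nn.Module):
    """Hypothetical sketch of a dual-branch pooling + DWConv bottleneck."""
    def __init__(self, channels: int):
        super().__init__()
        self.avg_branch = nn.Sequential(
            nn.AvgPool2d(kernel_size=2),  # smooths and keeps regional context, [H/2, W/2, C]
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),  # DWConv
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )
        self.max_branch = nn.Sequential(
            nn.MaxPool2d(kernel_size=2),  # keeps salient responses, [H/2, W/2, C]
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),  # DWConv
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )
        # Assumed fusion step: 1x1 conv over the concatenated branches, then upsample
        # back to the input resolution so the block can replace the C2f bottleneck.
        self.fuse = nn.Conv2d(2 * channels, channels, 1, bias=False)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        y = torch.cat([self.avg_branch(x), self.max_branch(x)], dim=1)
        return self.up(self.fuse(y))

x = torch.randn(1, 32, 80, 80)            # e.g. the C/2 split of a C2f block
print(DualDWBottleneck(32)(x).shape)      # torch.Size([1, 32, 80, 80])
```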
The DualDW module replaces the conventional Standard Convolution with Depthwise Convolution (DWConv). DWConv is less computationally demanding than Standard Convolution, facilitates real-time processing, and is well suited to lightweight models. As shown in Figure 3, when an H × W × C_in feature map is input, Standard Convolution applies k × k kernels, each spanning all channels of the input feature map and producing one output channel; C_out such kernels yield an H′ × W′ × C_out output feature map [20]. After the Standard Convolution, the parameter count P1 and the computational cost C1 are given by Equation (1):
$P_1 = k \times k \times C_{in} \times C_{out}$
$C_1 = H' \times W' \times k \times k \times C_{in} \times C_{out}$ (1)
DWConv is a channel-wise convolution of the input feature map [21], wherein each input channel is convolved with its own dedicated kernel to produce one output channel, and the C_in per-channel outputs are stacked to form the output. After the depthwise convolution, the parameter count P2 and the computational cost C2 are given by Equation (2):
$P_2 = k \times k \times C_{in}$
$C_2 = H' \times W' \times k \times k \times C_{in}$ (2)
The computational load and parameter count of DWConv are only 1/C_out of that of Standard Convolution, which reduces the model’s computational and parametric requirements while maintaining its feature expression capacity.
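The savings expressed by Equations (1) and (2) can be checked directly in PyTorch; the following minimal sketch counts the parameters of a standard convolution and a depthwise convolution. The channel sizes are illustrative and are not the actual YOLOv8-PigLite widths.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not the actual YOLOv8-PigLite channel widths).
C_in, C_out, k = 64, 64, 3

# Standard convolution: every kernel spans all C_in input channels.
std_conv = nn.Conv2d(C_in, C_out, kernel_size=k, padding=1, bias=False)

# Depthwise convolution (DWConv): one kernel per input channel (groups=C_in).
dw_conv = nn.Conv2d(C_in, C_in, kernel_size=k, padding=1, groups=C_in, bias=False)

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

p1 = count_params(std_conv)   # k*k*C_in*C_out = 3*3*64*64 = 36864, Equation (1)
p2 = count_params(dw_conv)    # k*k*C_in       = 3*3*64    = 576,   Equation (2)
print(p1, p2, p1 / p2)        # the ratio equals C_out, as stated above
```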

3.2.2. GSConvSE Convolution Module

The convolutional layers are the principal source of parameters and computation in the YOLOv8n model, and the convolution operations in the YOLOv8n convolution module use Standard Convolution, which leads to costly full-channel interactions on memory-constrained devices. The pig behavior dataset includes complex and varied surroundings and a range of activities; some, such as mounting (one pig resting its forelimbs on another) and sleeping (lying on its side or stomach), are clearly differentiated and easy to recognize. Others, such as walking and investigating, both involve the pig standing: the legs move during walking, whereas the head is used for sniffing during investigating, so distinguishing them requires careful observation of local features together with the overall context. These properties of the dataset require the model to be computationally efficient while retaining strong feature extraction capability. Motivated by the MobileNet [22] and ShuffleNet [23] architectures, this paper presents a novel convolution module, GSConvSE, which reduces the parameter count and computational load through Grouped Convolution while employing the SE mechanism to integrate global contextual information, compensating for the limited information exchange between groups. Figure 4 illustrates its structure.
Grouped Convolution follows the split-transform-merge paradigm [24], as shown in Figure 5: the input is partitioned into several groups, information exchange across groups is restricted, and the group outputs are finally merged, thereby reducing the parameter count and computational burden. The computational and parameter costs of a conventional convolution processing a feature map of dimensions H × W × C_in are given in Equation (1). Grouped Convolution partitions the input channels into G groups, and the channels in each group convolve only with the kernels of that group to produce the corresponding output channels. Each group of the input feature map comprises C_in/G channels, each group of the output feature map comprises C_out/G channels, and there are G groups of convolution kernels in total; after Grouped Convolution, the parameter count P3 and the computational cost C3 are given by Equation (3). The formulas indicate that the computational and parameter costs of Grouped Convolution are only 1/G of those of conventional convolution.
$P_3 = \dfrac{k \times k \times C_{in} \times C_{out}}{G}$
$C_3 = \dfrac{H' \times W' \times k \times k \times C_{in} \times C_{out}}{G}$ (3)
Grouped Convolution can significantly reduce the computation and number of parameters, but the lack of direct interaction between groups leads to a lack of global contextual information. The SE mechanism [25] compensates for the lack of inter-channel information interaction by introducing a global channel-dependent perspective to the output of the Grouped Convolution module through global average pooling (Squeeze) and channel weight learning (Excitation). At the same time, the computational overhead of SE is extremely low, much smaller than that of the fully connected layer, which is suitable for lightweight design.
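As a rough illustration of the idea described above, the following hypothetical PyTorch sketch composes a grouped convolution with batch normalization, SiLU activation, and an SE block. The layer ordering, group count, and SE reduction ratio are assumptions for illustration; the exact GSConvSE configuration is the one shown in Figure 4.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pooling + channel weight learning."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # Squeeze: global average pooling -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # Excitation: per-channel weights in (0, 1)
        return x * w                     # reweight channels with global context

class GroupedConvSE(nn.Module):
    """Hypothetical GSConvSE-style block: grouped conv + BN/SiLU + SE reweighting."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, stride: int = 1, groups: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride, padding=k // 2,
                              groups=groups, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU(inplace=True)
        self.se = SEBlock(c_out)  # compensates for missing cross-group interaction

    def forward(self, x):
        return self.se(self.act(self.bn(self.conv(x))))

x = torch.randn(1, 64, 80, 80)
print(GroupedConvSE(64, 128, groups=4)(x).shape)  # torch.Size([1, 128, 80, 80])
```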

3.2.3. BiNeck Network

In YOLOv8, the neck module retains the PAN-FPN architecture; however, the singular feature transfer of PAN-FPN presents limitations that fail to adequately address the intricate requirements of pig behavior recognition tasks. The pig dataset encompasses both close-up images of individual pigs (characterized by large targets and abundant details) and distant images of groups of pigs (featuring small targets susceptible to occlusion), necessitating the accommodation of multi-scale variations.
The Feature Pyramid Network (FPN) enriches low-level features by propagating semantic information from higher layers in a top-down path, but it passes little localization information back to those higher layers (Figure 6a) [26]. As shown in Figure 6b, the Path Aggregation Network (PANet) extends FPN with an additional bottom-up path that emphasizes low-level localization details, yet it still under-exploits high-level semantic information. The Neural Architecture Search Feature Pyramid Network (NAS-FPN) uses Neural Architecture Search (NAS) to discover an effective feature pyramid topology automatically (Figure 6c), but the architecture search demands substantial computational resources and time, making it impractical for real-world applications. To address the multi-scale and lightweight demands of pig behavior detection, we adopt the Bidirectional Feature Pyramid Network (BiFPN). BiFPN fuses features bidirectionally with learnable weights that balance the contributions of high-level and low-level information, improving the detection of fine-grained features of close-up pigs and of small, distant targets while remaining computationally efficient.
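As an illustration of the bidirectional weighted fusion that BiFPN performs, the sketch below implements the fast normalized fusion step from the original BiFPN design. The feature shapes and the two-input case are illustrative assumptions; this is not the exact fusion node used in the BiNeck.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion of N same-shaped feature maps.

    Learnable non-negative weights decide how much each input level contributes,
    which is the bidirectional re-weighting described above.
    """
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)              # keep weights non-negative
        w = w / (w.sum() + self.eps)    # fast normalized fusion
        return sum(wi * f for wi, f in zip(w, feats))

# Illustrative: fuse a lateral feature with an upsampled higher-level feature.
p4 = torch.randn(1, 64, 40, 40)                                      # lateral feature
p5_up = F.interpolate(torch.randn(1, 64, 20, 20), scale_factor=2)    # upsampled top level
fused = WeightedFusion(n_inputs=2)([p4, p5_up])
print(fused.shape)  # torch.Size([1, 64, 40, 40])
```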

3.3. Experimental Platform and Model Testing Metrics

3.3.1. Experimental Setting

The operating system used in this experiment was Ubuntu 20.04, and the hardware comprised an Intel® Xeon® Platinum 8362 CPU and an RTX 3090 GPU with 24 GB of memory. The programming language was Python 3.8, and the deep learning framework was PyTorch 2.0.0 with CUDA 11.8 support enabled; all comparison algorithms were executed in the same environment. The training parameters were set in accordance with the experimental conditions, as presented in Table 1.
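A minimal sketch of how such a run could be launched with the Ultralytics training API under the Table 1 settings is shown below. The dataset YAML path is hypothetical, and the sketch loads the stock YOLOv8n weights rather than the modified YOLOv8-PigLite architecture.

```python
from ultralytics import YOLO

# Hypothetical dataset config path; replace with the actual PBRD data YAML.
DATA_YAML = "pbrd.yaml"

model = YOLO("yolov8n.pt")   # baseline weights; a custom model YAML would define YOLOv8-PigLite
results = model.train(
    data=DATA_YAML,
    epochs=150,      # number of iterations (epochs), Table 1
    imgsz=640,       # image size
    batch=64,        # training batch size
    lr0=0.01,        # initial learning rate
    momentum=0.937,  # momentum
    device=0,        # single RTX 3090
)
```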

3.3.2. Evaluation Indicators

After training, the model is evaluated to confirm that it achieves the desired detection performance. This experiment uses precision, recall, and mean average precision at IoU 0.50 (mAP50) as metrics of detection accuracy, while the number of parameters and GFLOPs are used to evaluate model complexity.
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (4)
Equation (4) represents the mathematical formulation for the precision rate, whereby TP (true positives) denotes the count of targets accurately predicted inside the category, and FP (false positives) signifies the count of targets for which the prediction is erroneous.
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (5)
Equation (5) represents the mathematical formulation of the recall rate, whereas FN (false negatives) denotes the quantity of true targets that remain undetected.
$AP = \sum_{k=1}^{N} (R_k - R_{k-1}) P_k$ (6)
$\mathrm{mAP@50} = \dfrac{1}{C} \sum_{c=1}^{C} AP_c$ (7)
Equation (6) is a weighted estimate of the area under the precision-recall curve, and Equation (7) is the mean average precision computed at an intersection-over-union (IoU) threshold of 0.50. Here, N is the number of ranked predictions, R_k and P_k are the recall and precision over the top k ranked predictions, AP_c is the average precision of class c at IoU ≥ 0.5, and C is the number of classes.
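The following sketch evaluates Equations (4) through (7) on illustrative counts and a toy list of ranked predictions; the numbers are placeholders and do not come from the experiments.

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int):
    """Equations (4) and (5)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def average_precision(correct: np.ndarray, n_gt: int) -> float:
    """Equation (6): sum over ranked predictions of (R_k - R_{k-1}) * P_k.

    correct[k] is 1 if the k-th ranked prediction matches a ground-truth box
    at IoU >= 0.5, else 0; n_gt is the number of ground-truth boxes.
    """
    tp_cum = np.cumsum(correct)
    ranks = np.arange(1, len(correct) + 1)
    p_k = tp_cum / ranks                         # precision at rank k
    r_k = tp_cum / n_gt                          # recall at rank k
    r_prev = np.concatenate(([0.0], r_k[:-1]))
    return float(np.sum((r_k - r_prev) * p_k))

# Illustrative values only.
print(precision_recall(tp=90, fp=4, fn=5))
ap_per_class = [average_precision(np.array([1, 1, 0, 1, 0]), n_gt=4) for _ in range(6)]
print(sum(ap_per_class) / len(ap_per_class))     # Equation (7): mAP@50 over C classes
```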

4. Experiments and Analysis

4.1. Confusion Matrix

The confusion matrix is a tool used to evaluate how well a classification model works by comparing its predictions to the real results, with columns showing predicted categories and rows showing actual categories [27].
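As a minimal illustration of this layout, the sketch below builds a confusion matrix with scikit-learn for the six PBRD behavior classes; the true and predicted labels are made-up placeholders, not model outputs.

```python
from sklearn.metrics import confusion_matrix

# Six behavior classes from the PBRD dataset; predictions here are illustrative only.
labels = ["Lying", "Sleeping", "Investigating", "Eating", "Walking", "Mounted"]
y_true = ["Sleeping", "Sleeping", "Lying", "Eating", "Walking", "Mounted", "Investigating"]
y_pred = ["Sleeping", "Sleeping", "Lying", "Investigating", "Walking", "Mounted", "Eating"]

# Rows correspond to actual categories, columns to predicted categories.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```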
Figure 7 presents the confusion matrix analysis of the proposed YOLOv8-PigLite model for the pig behavior classification task. Figure 7a displays the absolute prediction counts, indicating that the model demonstrates excellent classification performance for behaviors such as Sleeping (with 1900 correct predictions), Mounted (434), and Lying (466), showcasing its strong ability to accurately distinguish these categories. However, some misclassifications are observed in categories such as Investigating and Eating, where visual similarity and complex image backgrounds occasionally cause samples to be confused with the background or other behavior types. Figure 7b shows the normalized confusion matrix, further emphasizing the model’s high detection accuracy across most categories. These results confirm the effectiveness and reliability of the YOLOv8-PigLite model.

4.2. Experimental Results

The training loss curves of YOLOv8n and the improved YOLOv8-PigLite are compared to assess convergence. Figure 8 shows that the enhanced algorithm converges faster than the original model, demonstrating that the proposed method not only reduces model complexity and improves detection performance but also offers a notable advantage in convergence speed.
To assess the detection efficacy of the lightweight YOLOv8-PigLite method, we present a series of plots of loss and performance metrics during the training and validation phases. This study tracks three loss terms: bounding-box loss (box_loss), classification loss (cls_loss), and distribution focal loss (dfl_loss). The bounding-box loss measures the positional difference between predicted and reference bounding boxes, the classification loss measures the discrepancy between predicted and actual categories, and the distribution focal loss complements the box loss by refining the probability distribution of the bounding-box position. The first three columns of Figure 9 show a pronounced change in the loss values by the 50th iteration. After 150 iterations, box_loss decreases to 0.41 and cls_loss to 0.24, with a comparable trend on the validation set, indicating the model's efficacy in target localization and classification. The loss function then oscillates within a narrow range, indicating convergence and improved outcomes.
The evaluation metrics in the final two columns of Figure 9 show that, at the onset of training, precision, recall, and mAP fluctuate to varying degrees, mainly because of the random initialization of model parameters and the heterogeneity of the training data. As iterations progress, these metrics improve consistently: precision rises from 0.01 to 0.96, recall advances from 0.09 to 0.95, and mAP@50 reaches 0.98, confirming the model's accuracy and robustness in detecting domestic pig behavior.

4.3. Heat Map Analysis

To further analyze the model's detection performance and visually demonstrate the improvements, three behavioral images of domestic pigs were selected for heat map visualization. A heat map is a visualization tool that presents the distribution of the data, the regions the model attends to, and the importance of features through color. Each data point is assigned a specific color, with the shade reflecting the data intensity at that location; warm colors (e.g., red) represent high-intensity areas and cool colors (e.g., blue) represent low-intensity areas. In this paper, Gradient-weighted Class Activation Mapping (Grad-CAM) is used to obtain the model's heat maps [28], revealing the model's focus patterns on the behavioral characteristics of domestic pigs.
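For reference, a compact generic Grad-CAM sketch is given below. It assumes a classification-style model whose forward pass returns class scores of shape (1, num_classes), so it illustrates the principle only; for a detection head such as YOLOv8's, the target score must instead be picked from the relevant detection output.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    """Minimal Grad-CAM sketch: weight activations by averaged gradients, then ReLU."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    scores = model(x)                   # assumed shape (1, num_classes)
    scores[0, class_idx].backward()     # gradient of the target class score
    h1.remove(); h2.remove()

    act, grad = activations[0], gradients[0]           # (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)      # channel importance weights
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam                                          # (1, 1, H, W) heat map in [0, 1]
```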
The heatmap comparison in Figure 10 demonstrates that, compared to YOLOv8n, the improved YOLOv8-PigLite model exhibits significant enhancements in attention distribution. In the first column images, the improved model concentrates attention on the pig’s head and limbs, producing clearer contours that markedly improve the recognizability of pig behaviors. In the second column images, where targets are smaller and inverted, increasing recognition difficulty, the improved model shows a substantial increase in high-intensity red regions, effectively enhancing target visibility and recognition accuracy. In the third column images, for mounting behaviors, YOLOv8-PigLite focuses on the pig’s head and limbs, enabling the precise identification of behaviors such as “one pig placing its forelimbs on another’s back.” In contrast, YOLOv8n’s heatmaps reveal a more scattered attention distribution, with non-critical areas (e.g., the pig’s back) exhibiting high activation, indicating significant interference from background noise during feature extraction.
Overall, the YOLOv8-PigLite heatmaps show a notable improvement in the boundary clarity of activation regions and more concentrated coverage of high-intensity red areas, robustly validating the model’s superior performance in perceiving behavioral features. The improved model effectively suppresses responses in non-critical areas, significantly reducing background noise interference, as evidenced by the increased proportion of cooler (low-intensity) regions in the heatmaps, thereby enhancing the purity of feature extraction. Furthermore, the improved boundary clarity in the heatmaps reflects substantial advancements in YOLOv8-PigLite’s spatial focus and information discrimination capabilities. In summary, the heatmap results not only visually demonstrate the enhanced detection performance of the improved model but also confirm its ability to achieve synergistic optimization of detection accuracy and inference efficiency under a lightweight design, providing a solid visual foundation for further model optimization.

4.4. Contrast Experiment with Mainstream Algorithm

To further validate the superiority of the YOLOv8-PigLite algorithm, a comparison was conducted with two-stage (Faster R-CNN) and single-stage deep learning-based target detection techniques (SSD, YOLOv3-tiny, YOLOv5n, YOLOv8n, and YOLOv9s) on domestic pig datasets.
The test results in Table 2 show that the YOLOv8-PigLite algorithm proposed in this paper clearly outperforms the Faster R-CNN and SSD algorithms in accuracy, parameter count, and floating-point computation. Although YOLOv8-PigLite's recall of 95.2% is marginally lower, the difference is not statistically significant. YOLOv8-PigLite also surpasses the YOLOv3-tiny, YOLOv5n, and YOLOv9s algorithms in precision, recall, mean average precision, parameter count, and floating-point operations. Compared with the baseline YOLOv8n, YOLOv8-PigLite reduces the number of parameters and GFLOPs by 59.80% and 39.50%, respectively. The proposed method exhibits the lowest parameters and GFLOPs among the seven algorithms, at 1.21 M and 4.9 GFLOPs, while approaching the best accuracy metrics, indicating excellent accuracy and a lightweight design.

4.5. Statistical Analysis

To evaluate the statistical significance of the performance improvement, we ran five independent training and testing sessions for each model on the Comp dataset split. The statistical analysis shows that the precision, recall, and mAP measures follow a normal distribution; we therefore conducted paired t-tests on these metrics [29]. The t-test is a commonly used statistical significance test designed to assess whether the difference between the means of two groups is statistically significant. In significance testing, the p-value is the probability of observing the current result, or a more extreme one, under the null hypothesis (i.e., no difference between the means of the two groups). The significance level (α) is typically set at 0.05 as the threshold for statistical significance; a p-value greater than 0.05 indicates that the observed difference is not statistically significant.
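A paired t-test of this kind can be run with SciPy as sketched below; the five per-run precision values are illustrative placeholders, not the exact measurements behind Table 3.

```python
from scipy import stats

# Precision (%) from five independent runs; illustrative placeholder values only.
yolov8n_precision = [95.8, 96.1, 96.3, 95.9, 96.2]
piglite_precision = [96.2, 96.5, 96.4, 96.1, 96.7]

# Paired t-test: the runs are matched by training/testing session.
t_stat, p_value = stats.ttest_rel(piglite_precision, yolov8n_precision)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# p < 0.05 would indicate the difference in means is statistically significant.
```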
Table 3 compares the performance metrics of YOLOv8n and the improved model YOLOv8-PigLite, including precision, recall, and mAP@50%, using a t-test. The results show that the mean precision of YOLOv8-PigLite increased from 96.06 ± 0.30% to 96.38 ± 0.19%, with a p-value of 0.0054 (p < 0.05), indicating that this improvement is statistically significant. Conversely, the recall decreased from 94.60 ± 0.34% to 94.38 ± 0.33%, with a p-value of 0.0402 (p < 0.05), suggesting that this decline is also statistically significant. For the mAP@50% metric, the p-value of 0.6974 (>0.05) for both models indicates no significant difference. Additionally, YOLOv8-PigLite reduced the parameter count from 3.01 M to 1.21 M and GFLOPs from 8.1 to 4.9, markedly enhancing resource efficiency. In summary, the improved model demonstrates a significant increase in precision, a significant decrease in recall, and a notable advantage in resource efficiency.

4.6. Ablation Experiments

Ablation tests were conducted to illustrate the enhancement effect of C2f-DualDW, GSConvSE, and BiNeck on the lightweight algorithm described in this research.
Table 4 shows the following: Group 1 uses the YOLOv8n algorithm as the benchmark model for comparison with the subsequent experiments. Group 2 modifies only the C2f module in the backbone network, yielding a small accuracy gain while the parameters and floating-point computations are reduced by 26.57% and 24.69%, respectively. Group 3 additionally improves the convolution module on top of Group 2, maintaining comparable accuracy with further reductions in parameters and floating-point computations of 19.00% and 14.75%, respectively. Group 4 further optimizes the neck network on top of Group 3, achieving a slight accuracy improvement alongside reductions in parameters and floating-point computations of 32.40% and 5.76%, respectively. The analysis indicates that each enhancement reduces the computational demands and complexity of the model.
In addition, to further validate the effectiveness of the algorithm, we use the fixed frame rate method [30] to compare the frame rates (frames per second, FPS) of the benchmark algorithm and the improved algorithm, as shown in Equation (8).
$\mathrm{FPS} = \dfrac{1000}{t}$ (8)
Here, t is the time required to process one frame, measured in milliseconds. Table 4 shows that the improved algorithm, YOLOv8-PigLite, raises the FPS by 32.61% compared with the baseline, demonstrating that the improved algorithm is more efficient.
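A minimal sketch of measuring per-frame latency and converting it to FPS via Equation (8) is shown below; the warm-up and run counts are assumptions, and the model argument stands for any PyTorch detector.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, imgsz=640, n_warmup=20, n_runs=200, device="cuda"):
    """Measure per-frame latency t (ms) and return FPS = 1000 / t (Equation (8))."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, imgsz, imgsz, device=device)
    for _ in range(n_warmup):          # warm-up runs excluded from timing
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # make sure queued GPU work has finished
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t_ms = (time.perf_counter() - start) * 1000.0 / n_runs
    return 1000.0 / t_ms
```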

4.7. Model Generalization Test

To further evaluate the advancement of the proposed algorithm in pig behavior recognition, the publicly available Comp dataset was used for additional experiments. This dataset extends the behavior categories of the current study by including the Drinking class, thereby covering more key pig behaviors and validating the generalization capability of the algorithm. The experimental results are presented in Table 5.
The proposed method, YOLOv8-PigLite, matches or exceeds the benchmark model, YOLOv8n, on the Comp dataset despite its lightweight design. Specifically, on the PBRD dataset the precision (P) shows a modest improvement while the recall (R) and mean average precision at IoU 0.50 (mAP@50) remain largely stable; on the Comp dataset, the metrics of YOLOv8-PigLite are comparable to or slightly better than those of YOLOv8n. The experimental findings demonstrate the superiority of the proposed algorithm in domestic pig behavior recognition, covering a broader range of key behaviors while maintaining or enhancing performance with a lightweight model, which supports efficient and precise identification of domestic pig behavior in practical applications.

5. Conclusions

This study addresses the challenges of excessive model parameters and high computational complexity in pig behavior recognition by proposing a lightweight algorithm based on YOLOv8n, named YOLOv8-PigLite. The algorithm introduces optimized improvements to the original model’s C2f module and convolution operations, and significantly reduces parameters in the neck network, thereby lowering overall computational cost while effectively maintaining detection accuracy. Additionally, a dual-branch structure and SE attention modules are incorporated to enhance the model’s ability to capture behavioral features, further reducing misclassification rates. Experimental results demonstrate that YOLOv8-PigLite achieves a significant reduction in both parameter count and floating-point operations compared to the baseline and other classical detection models, while maintaining strong generalization ability and recognition accuracy on a public pig behavior dataset. The model performs comparably to the original YOLOv8 in terms of precision and recall, while offering improved efficiency and stability. With its lightweight design and ease of deployment, YOLOv8-PigLite is particularly suitable for small- and medium-sized farms with limited computing resources. It enables real-time recognition and early warning of abnormal pig behaviors, effectively alleviating the inefficiencies of traditional manual monitoring, and holds substantial practical value and application potential.

Author Contributions

Methodology, K.Z.; software, Y.Z.; formal analysis, Y.Z.; investigation, K.Z.; writing—original draft preparation, K.Z.; writing—review and editing, H.X.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by funding from Shandong Agricultural University through a technical development contract titled “Development of Machine Learning-Based Swine Growth Prediction Technology” (Contract Registration No. 2024370901156798). All associated costs were fully covered by the funding provided under this contract.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the first author upon request. The source codes generated are available in the repository [https://github.com/nnz119/pig-lite] (accessed on 29 May 2025).

Acknowledgments

This research was conducted at Shandong Agricultural University. We express our sincere gratitude to Shandong Agricultural University for providing essential resources, including laboratory facilities and computational support, which greatly contributed to the completion of this study.

Conflicts of Interest

The authors declare no conflicts of interest. This paper represents the opinions of the authors and does not necessarily represent the position or opinions of Shandong Agricultural University.

References

  1. Dang, Y.; Wang, F.; Tian, J.; Xie, Z. Research on feature selection methods for pig posture recognition. Jiangsu Agric. Sci. 2016, 44, 448–451. [Google Scholar] [CrossRef]
  2. Song, W. Pig Target Detection and Status Analysis Based on Deep Learning. Master’s Thesis, Northeast Agricultural University, Harbin, China, 2019. [Google Scholar]
  3. Zheng, C.; Zhu, X.; Yang, X.; Wang, L.; Tu, S.; Xue, Y. Automatic recognition of lactating sow postures from depth images by deep learning detector. Comput. Electron. Agric. 2018, 147, 51–63. [Google Scholar] [CrossRef]
  4. Jie, L.; Xiaohui, H.; Jingbo, G. Lightweight field cotton grade detection based on YOLOv8. Comput. Eng. (accepted, in press). [CrossRef]
  5. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar] [CrossRef]
  6. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
  9. Liu, F.; Wu, W.; Liu, X.; Xinran, W.; Yaping, F.; Guoliang, L.; Xiaoyong, D. Research progress of computer vision and deep learning in pig recognition. J. Huazhong Agric. Univ. 2023, 42, 47–56. [Google Scholar] [CrossRef]
  10. Wang, X.L.; Kong, T.; Shen, C.H.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 649–665. [Google Scholar]
  11. Dong, L.; Meng, X.; Pan, M.; Yi, Z.; Yubin, L.; Xiang, G.; Honggang, L. Pig behavior recognition method based on posture and temporal features. Trans. Chin. Soc. Agric. Eng. 2022, 38, 148–157. [Google Scholar] [CrossRef]
  12. Tu, S.; Liu, X.; Liang, Y.; Zhang, Y.; Huang, L.; Tang, Y. Group-housed pig behavior recognition and tracking method based on improved DeepSORT. Trans. Chin. Soc. Agric. Mach. 2022, 53, 345–352. [Google Scholar]
  13. Shao, H.M.; Pu, J.Y.; Mu, J. Pig-posture recognition based on computer vision: Dataset and exploration. Animals 2021, 11, 1295. [Google Scholar] [CrossRef] [PubMed]
  14. Ji, H.Y.; Yu, J.H.; Lao, F.D.; Zhuang, Y.R.; Wen, Y.B.; Teng, G.H. Automatic position detection and posture recognition of grouped pigs based on deep learning. Agriculture 2022, 12, 1314. [Google Scholar] [CrossRef]
  15. Ge, S.; Ji, H.; Zhan, Y.; Li, X.; Zheng, W.; Wang, T. Lightweight pig posture recognition method based on improved YOLOv5s. J. China Agric. Univ. 2025, 30, 179–189. [Google Scholar]
  16. Zhang, N.N. Pig Behavior Recognition Dataset. Mendeley Data V1. 2025. [Google Scholar] [CrossRef]
  17. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  18. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned One-stage Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar] [CrossRef]
  19. Wei, J.; Wanhu, W.; Junjie, Y. AEM-YOLOv8s: Small Target Detection Algorithm for UAV Aerial Images. Comput. Eng. Appl. 2024, 60, 191–202. [Google Scholar]
  20. Jingtao, G.; Feng, L.; Huiting, Z.; Biao, Y.; Dayang, L. Strawberry target detection method based on transfer learning and lightweight YOLOv5s. J. Chin. Agric. Mech. 2025, 46, 253–260. [Google Scholar]
  21. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  22. Ye, D.H.; Zikic, D.; Glocker, B.; Criminisi, A.; Konukoglu, E. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  23. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar] [CrossRef]
  24. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
  25. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
  26. Liu, J.Y.; Zhang, M.F. Automatic extraction of image water depth based on YOLOv5 algorithm. J. Fujian Norm. Univ. Nat. Sci. Ed. 2023, 39, 86–92. [Google Scholar]
  27. Wang, X.; Huang, J.; Tan, W.; Shen, Z. Object detection based on deep feature enhancement and path aggregation optimization. Comput. Sci. 2025, 24, 1–18. [Google Scholar]
  28. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  29. Yang, Q.; Yang, G.; Zhong, S. Hyperparameter optimization method for YOLO model based on orthogonal optimization strategy. Sci. Technol. Eng. 2025, 25, 1573–1579. [Google Scholar]
  30. Jiang, X.; Wang, R.; Ma, Y. Insulator defect detection based on lightweight improved RT-DETR edge deployment algorithm. Trans. Electr. Eng. 2025, 40, 842–854. [Google Scholar]
Figure 1. Category-wise sample distribution in PBRD and Comp datasets. (a) Category-wise sample distribution in PBRD datasets. (b) Category-wise sample distribution in Comp datasets.
Figure 2. C2f-DualDW structure.
Figure 3. The principle of Standard Convolution and DWConv. (a) Standard Convolution. (b) DWConv.
Figure 4. GSConvSE module structure.
Figure 5. The principle of Standard Convolution and Grouped Convolution. (a) Standard Convolution. (b) Grouped Convolution.
Figure 6. Different Feature Pyramid Network designs. (a) FPN. (b) PANet. (c) NAS-FPN. (d) BiFPN.
Figure 7. Confusion matrix results of the YOLOv8-PigLite model for pig behavior recognition. (a) Absolute confusion matrix. (b) Normalized confusion matrix.
Figure 8. Comparison of training effects at each stage of the model.
Figure 9. Plot of training effect at each stage of the model.
Figure 10. Comparison of Grad-CAM visualization results of the models before and after improvement.
Table 1. Training parameter settings.
Parameter                         Value
Training batch size               64
Number of iterations (epochs)     150
Image size (Image_size)           640
Initial learning rate             0.01
Momentum                          0.937
Table 2. Comparative tests of different algorithms.
Model            P %    R %    mAP@50   Params/M   GFLOPs
Faster R-CNN     44.0   99.6   97.3     136.75     369.79
SSD              72.9   99.2   97.7     24.01      61.06
YOLOv8n          96.0   95.5   98.6     3.01       8.1
YOLOv5n          92.6   90.4   96.1     2.50       7.1
YOLOv3-tiny      89.4   93.0   95.7     12.13      18.9
YOLOv9s          94.3   95.0   97.9     6.20       22.1
YOLOv8-PigLite   96.3   95.2   98.5     1.21       4.9
Table 3. t-test analysis of YOLOv8n vs. YOLOv8-PigLite performance on key metrics.
Model            Precision (%)   Recall (%)     mAP@50 (%)     Params (M)   GFLOPs
YOLOv8n          96.06 ± 0.30    94.60 ± 0.34   98.60 ± 0.30   3.01         8.1
YOLOv8-PigLite   96.38 ± 0.19    94.38 ± 0.33   98.66 ± 0.13   1.21         4.9
p                0.0054          0.0402         0.6974         -            -
“-” indicates “no data”.
Table 4. Results of ablation experiments.
Model          P %    R %    mAP@50   Params/M   GFLOPs   FPS
YOLOv8n        96.0   95.5   98.6     3.01       8.1      46.45
+C2f-DualDW    96.2   96.5   98.5     2.21       6.1      58.8
+GSConvSE      95.9   96.2   98.6     1.79       5.2      59.14
+BiNeck        96.3   95.2   98.5     1.21       4.9      61.6
Table 5. External dataset testing.
Dataset   YOLOv8n                    YOLOv8-PigLite
          P %    R %    mAP@50       P %    R %    mAP@50
PBRD      96.0   95.5   98.6         96.3   95.2   98.5
Comp      95.7   96.2   98.4         96.2   96.2   98.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
