An Enhanced YOLOv8n-Based Approach for Pig Behavior Recognition

Guo, Jianjun; Xu, Yudian; Lin, Lijun; Zhang, Beibei; Zhou, Piao; Luo, Shangwen; Zhuo, Yuhan; Ji, Jingyu; Luo, Zhijie; Cheng, Guangming

doi:10.3390/computers15040230


by Jianjun Guo ¹,†, Yudian Xu ¹,†, Lijun Lin ¹,†, Beibei Zhang ¹, Piao Zhou ¹, Shangwen Luo ¹, Yuhan Zhuo ¹, Jingyu Ji ¹, Zhijie Luo ¹,* and Guangming Cheng ²,*

¹ College of Artificial Intelligence, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China

² China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou 511300, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Computers 2026, 15(4), 230; https://doi.org/10.3390/computers15040230

Submission received: 10 March 2026 / Revised: 30 March 2026 / Accepted: 6 April 2026 / Published: 8 April 2026

(This article belongs to the Section AI-Driven Innovations)


Abstract

Pig behavior statistics can reflect their health status. Conventional approaches depend on manual observation to derive behavioral information from video recordings, a process that demands substantial time and human effort. To overcome these limitations in indoor intensive farming environments, this study introduces an effective approach for recognizing pig behaviors, employing an enhanced YOLOv8n architecture. The approach utilizes advanced object detection algorithms to automatically identify pig behaviors, including stand, lie, eat, fight, and tail-bite, from overhead video footage of the enclosure. First, images of daily pig behaviors are collected using cameras to build a pig behavior dataset. To boost detection accuracy, the SE attention mechanism is embedded within the feature extraction backbone of the YOLOv8n network to enhance its representational capacity, strengthening the model’s capacity to grasp overarching contextual information and improve the expressiveness of extracted features. The GIoU loss function is employed during training to reduce computational cost and accelerate model convergence. Moreover, integrating Ghost convolution into the backbone significantly reduces both computational complexity and the total number of parameters. The experimental findings reveal that the optimized YOLOv8n model contains just 1.71 million parameters, marking a 42.93% reduction relative to the baseline model. Its floating-point operations total 5.0 billion, indicating a 38.27% decrease, while the mean average precision (mAP@50) reaches 96.8%, surpassing the original by 2.6 percentage points. Compared with other widely used YOLO-based object detection frameworks, the proposed approach achieves notably higher accuracy while requiring significantly lower computational resources and model complexity.

Keywords: live pigs; behavior recognition; improved YOLOv8n; GIoU; ghost convolution; SE attention mechanism


1. Introduction

In modern smart agriculture, animal behavior recognition technology plays a crucial role in enhancing agricultural efficiency and ensuring animal welfare [1]. Traditional animal behavior monitoring primarily relies on the experience of livestock keepers, who observe the animals’ daily activities to determine potential signs of illness. This method is inefficient due to its high time and labor demands and fails to support continuous, real-time surveillance. Moreover, in high-density farming environments, frequent contact between humans and animals increases the risk of disease transmission and poses potential threats to animal health. Some farms have adopted radio-frequency identification (RFID) technology by equipping pigs with smart ear tags. These tags interact with RFID signals in feeding areas to detect feeding behavior [2]. However, such invasive methods may not only raise farming costs but also induce additional stress in animals. At the same time, several studies have identified actions such as locomotion, feeding, resting, and upright postures in dairy cows [3]. Other research has collected acceleration data on cattle and sheep behavior using acceleration sensors and employed machine-learning algorithms [4,5] to classify five distinct behaviors: sitting, standing, walking, grazing, and ruminating. Nonetheless, these contact-based methods also have limitations. In contrast, non-contact behavior recognition based on computer vision technology offers an effective solution to the above issues and provides strong support for the development of intelligent livestock farming [6].

According to relevant studies, changes in animal behavior are closely associated with their health status [7,8]. When animals fall ill, they typically exhibit reduced food intake and lower activity levels [9] and may also lie for extended periods [10]. Moreover, when the illness affects internal abdominal organs, many animals tend to adopt a dog-sitting posture to relieve intra-abdominal pressure. Climbing behavior may increase the risk of skin abrasions and elevate pressure on the hind limbs, potentially leading to lameness. Signs of estrus in animals can often be identified through mounting and fence-crossing behaviors between individuals [11]. Therefore, by monitoring animal behaviors, it is possible to detect potential health issues early and thereby effectively improve overall animal welfare.

With technological advancements, researchers both in China and abroad have adopted various methods to monitor animal behavior. Kashiha et al. [12] installed CCD cameras and water meters above pig pens to monitor drinking behavior in pigs. Through image contour analysis, they precisely quantified the distance between the pig’s head and ears and the drinker, along with the time spent by the pig at the drinker, allowing for the accurate identification of drinking behaviors. Yang et al. [13] utilized the Faster R-CNN to locate each pig within the pen and associate the pig’s head with its body. They then analyzed pixel occupancy in the feeding area to identify feeding behaviors, achieving an impressive 99.6% accuracy in feeding recognition. Nasirahmadi et al. [14] employed RGB cameras to extract pig images via background subtraction and applied a Support Vector Machine (SVM) classifier to distinguish between lateral and sternal recumbent postures, achieving identification accuracies of 94.4% and 94%, respectively. Subsequently, Nasirahmadi et al. [15] employed elliptical fitting methods to track pigs and established the positions of the head, tail, and sides by identifying the points where the major and minor axes intersected. By calculating Euclidean distances between heads and other features such as tail position and ellipse axis lengths, pig mounting behavior was successfully identified. Wang et al. [16] introduced an efficient cattle behavior recognition framework utilizing the YOLOv5s model, enhanced with attention modules, inverted residual blocks, and depthwise separable convolutions inspired by EfficientNetV2. Incorporating these elements into the backbone and feature enhancement stages led to an achieved mAP of 87.7%. Shang et al. [17] combined an improved Squeeze-and-Excitation (SE) attention mechanism with Convolutional Block Attention Module (CBAM), optimizing the MobileNetV3 model and integrating it with trajectory recognition algorithms to analyze and determine cattle tracks and behaviors, achieving a maximum accuracy of 95.17%. Guo et al. [18] introduced an enhanced version of the YOLOv4 framework tailored for identifying and analyzing behavioral patterns in meat pigeons. Using an Adaptively Spatial Feature Fusion (ASFF) module, the model significantly improved detection accuracy in complex environments, with mAP@50 and mAP@75 increasing by 14.73% and 14.97%, respectively, compared to the original YOLOv4. Ge et al. [19] addressed issues such as low pig posture recognition accuracy, high model complexity, and slow detection speed by proposing a performance-aware global channel pruning algorithm to optimize the lightweight YOLOv5s model. By identifying and eliminating redundant or low-contribution connections and performing tuning compensation, a more efficient model was achieved. Guo et al. [20] proposed a resource-efficient pig facial recognition architecture, termed RKNet-HAM, which leverages a fourth-order Runge–Kutta integration approach in combination with a Hybrid Attention Mechanism (HAM) to enhance feature representation. By focusing on semantic information in low-resolution pig face images, the model achieved 99.26% accuracy and strong generalizability, with a compact model size of only 1.52 MB, making it suitable for deployment on embedded devices. Based on this model, Guo et al. [21] developed a digital unmanned pig farming system, effectively addressing issues of low production efficiency, manual feces cleaning, bulky equipment, high power consumption, and low sensing data accuracy in traditional pig farms, thereby improving farming efficiency, reducing labor costs, and increasing economic benefits. Chen et al. [22] proposed a multimodal method combining visual and sensor technologies, using different processing strategies for different scenarios. The average accuracy of pig behavior recognition using this multimodal approach reached 88.82%, significantly outperforming unimodal methods, and greatly enhancing the accuracy and reliability of behavior recognition in complex environments.

Recently, the YOLO series has undergone rapid evolution, continuously pushing the boundaries of real-time object detection. For instance, Wang et al. [23] introduced YOLOv9 with Programmable Gradient Information (PGI), effectively overcoming data loss in deep networks and achieving superior parameter utilization on the MS COCO dataset. Subsequently, Wang et al. [24] proposed YOLOv10, which used an NMS-free (Non-Maximum Suppression) training strategy and reported a reduction in end-to-end inference latency of nearly 46% compared to previous generations, without sacrificing accuracy. The latest iterations, YOLO11 [25] and YOLOv12 [26], integrate advanced architectural components like C3k2 blocks and C2PSA attention modules, further pushing the state-of-the-art accuracy-speed boundaries on standard benchmarks.

Despite notable progress in animal behavior recognition, significant challenges remain in achieving real-time monitoring and high-precision identification. To further enhance the accuracy and efficiency of animal behavior recognition, this study proposes an improved method based on the YOLOv8n model. By incorporating advanced attention mechanisms and optimizing the backbone network architecture, the proposed approach aims to achieve more precise animal behavior monitoring. This method is expected to improve detection accuracy while reducing computational complexity and the number of model parameters, thereby providing stronger technical support for intelligent livestock farming.

2. Data Collection and Preprocessing

2.1. Data Collection

The experimental data were gathered in August 2023 at a modern pig farming facility situated in Yunfu City, Guangdong Province. To ensure the model’s adaptability to real-world farming challenges, long-duration video recordings were captured via high-definition surveillance cameras mounted 3.0 m directly above the pens. Unlike studies conducted under constant illumination, our data collection spanned various time periods to encompass diverse indoor lighting conditions, ranging from intense natural sunlight during midday to low-light environments during early morning and overcast periods. The pig pens varied in size, typically housing 8–20 pigs of similar body size, resulting in a relatively high-density rearing environment with frequent physical occlusions. This multi-scenario acquisition strategy aimed to construct a pig behavior dataset with enhanced environmental diversity, thereby improving model robustness and generalization across varying light intensities and complex social interactions. Representative images reflecting these diverse acquisition conditions are presented in Figure 1.

2.2. Dataset Construction and Analysis

To establish a high-quality dataset with robust generalization capabilities and to strictly eliminate the risk of data leakage, this study implemented a rigorous data acquisition and preprocessing pipeline:

(1): Video-level Splitting Strategy: To evaluate the model’s genuine generalization performance on unseen data, the 20 source videos were pre-partitioned into training, validation, and testing sets according to a strict 8:1:1 ratio at the video level. This ensures that the test set contains entirely novel scenes and pig individuals never encountered during training.
(2): Multi-frequency Frame Sampling: Given the high spatial-temporal correlation between consecutive frames, we adopted a class-specific temporal sampling strategy to balance the dataset. General Sampling: For common behaviors (stand and lie), a sparse sampling interval of 5 s was used to reduce redundancy. Targeted Sampling for Rare Behaviors: To mitigate the natural scarcity of aggressive interactions, segments containing fight and tail-bite were sampled at a shorter interval of 1.5 s. This targeted approach increased the number of instances of rare behaviors, enabling the model to learn the fine-grained morphological features of mouth-to-tail and head-to-head contact. A total of 2500 high-quality images were finalized.
(3): Training Set Augmentation: During each epoch, stochastic transformations, including horizontal flipping, rotation, brightness adjustment, and Mixup, were applied to the training images. This ensures that the model encounters a stochastic variety of samples in every iteration. A representative result of the augmentation process is presented in Figure 2.
(4): Annotation and Quality Control: The extracted images were manually annotated using the LabelImg tool, covering five distinct behaviors: stand, lie, eat, fight, and tail-bite. To ensure scientific rigor and inter-observer reliability, a two-stage verification protocol was implemented. First, three trained researchers performed the primary labeling based on strict morphological criteria. Second, a cross-review was conducted in which annotators swapped datasets to identify and correct discrepancies. Table 1 provides an overview of the classification criteria for pig behaviors along with the associated dataset distribution.
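Steps (1) and (2) above can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: the video identifiers, the random seed, and the helper names `split_videos` and `sample_times` are hypothetical, while the 8:1:1 ratio, the 20 source videos, and the 5 s and 1.5 s sampling intervals come from the text.

```python
import random


def split_videos(video_ids, ratios=(8, 1, 1), seed=0):
    """Partition whole videos into train/val/test so no video spans two sets."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def sample_times(duration_s, interval_s):
    """Timestamps (in seconds) at which frames are extracted from one clip."""
    steps = int(duration_s // interval_s)
    return [i * interval_s for i in range(steps + 1)]


videos = [f"video_{i:02d}" for i in range(20)]  # hypothetical identifiers
train, val, test = split_videos(videos)
print(len(train), len(val), len(test))

common = sample_times(15, 5)    # stand / lie segments: one frame every 5 s
rare = sample_times(6, 1.5)     # fight / tail-bite segments: one frame every 1.5 s
```

Splitting before frame extraction is what prevents near-duplicate frames from the same video from appearing in both the training and test sets.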

Figure 2. Image enhancement examples. Note: (a) Original Image; (b) Vertical Flip; (c) Scaling; (d) Horizontal Flip; (e) Rotation; (f) Random Cropping; (g) Color Space Transformation; (h) Translation; (i) Mosaic Augmentation.

3. Research Methodology

3.1. YOLO Object Detection Algorithm

Object detection models can generally be classified into two categories: one-stage and two-stage models. One-stage models directly predict both object classes and locations from the entire image in a single process. This approach reduces both training time and model complexity, offering fast detection speeds, although it often sacrifices accuracy. Prominent examples of one-stage models include the YOLO series, SSD, and RetinaNet. In contrast, two-stage models first generate candidate object regions, which are then refined to produce the final detection results. While this process enhances accuracy, it typically comes at the cost of slower detection speeds. Well-known two-stage models include R-CNN, Fast R-CNN, and Mask R-CNN.

YOLOv8n is an object detection model released in 2023 by the Ultralytics team [27]. It has garnered widespread attention due to its fast and accurate object recognition capabilities. Based on the YOLOv5 architecture [28], YOLOv8n incorporates various enhancements and integrates advantages from multiple object detectors. The structural diagram of YOLOv8n is shown in Figure 3.

The backbone architecture functions as the core framework of the model, tasked with deriving meaningful representations from the input image. In YOLOv8n, this component primarily adopts a CSPDarknet-like architecture, similar to that in YOLOv5 [29], and introduces cross-stage connections between different network stages. On one hand, by directly linking partial feature maps to deeper layers, the model enhances its information transmission capability; on the other hand, it improves feature reuse efficiency. As shown in Figure 3, the model alternates between stacking convolutional layers (Conv) and C2f modules for feature extraction. Positioned at the terminal stage of the backbone, the Spatial Pyramid Pooling-Fast (SPPF) module utilizes pooling at various scales to effectively capture spatial features, thereby markedly enhancing the model’s capability to identify objects across a range of sizes.

Positioned between the backbone and the detection head, the intermediate fusion module merges and optimizes features across multiple scales. It upsamples feature maps generated by the backbone to increase spatial resolution, then aggregates multi-scale information via sequential concatenation and C2f layers, resulting in richer, more informative feature representations.

The detection head, located at the top of the model, performs the final object recognition and localization. The features derived from the backbone and neck are transmitted to the head, where object features are decoded, and detection results are generated. Unlike YOLOv5’s coupled head design, YOLOv8n adopts a mainstream decoupled head architecture [30], transitioning from an anchor-based to an anchor-free paradigm [31]. Instead of relying on predefined anchor boxes, it directly predicts object positions and categories on the feature maps. This anchor-free approach reduces dependence on manually tuned hyperparameters, leading to a more streamlined and training-efficient model.

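Additionally, the model integrates Distribution Focal Loss (DFL) [32] for bounding-box regression. Rather than predicting a single value for each box-edge offset, DFL learns a discrete probability distribution over candidate offsets, which sharpens localization when object boundaries are ambiguous.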

Among various YOLO iterations, YOLOv8n currently provides an optimal balance between precision and efficiency for livestock monitoring. However, the continuous evolution of the Ultralytics ecosystem toward YOLO11 and future architectures highlights a shift toward self-attention mechanisms and NMS-free training. While these advancements offer a clear roadmap for the field, YOLOv8n remains the most stable baseline for deployment on resource-constrained edge devices in pig farms.

3.2. Improved Method for Pig Behavior Recognition Based on YOLOv8n

Although YOLOv8n, as a widely adopted object detection algorithm, incorporates various advanced optimization strategies, its feature extraction process may still be influenced by noise in complex environments, leading to suboptimal feature representations. This limitation can result in false positives or missed detections in pig behavior recognition. To overcome this limitation, this study introduces an enhanced YOLOv8n-based detection framework that improves resilience and feature extraction precision, thereby significantly boosting recognition performance in pig behavior analysis.

3.2.1. SE Attention Mechanism

A key component of Convolutional Neural Networks (CNNs) is the convolutional operation, which enables the model to capture both spatial and channel-wise features within localized receptive regions at each layer, thereby constructing rich feature representations. The attention mechanism reduces interference from background noise, allowing the neural network to place greater emphasis on the important feature regions of the target object, which has led to its widespread application in tasks such as object recognition [33,34,35]. Traditional attention mechanisms primarily focus on determining the importance weights of targets across various spatial domains or feature maps [36,37], often neglecting attention along the channel dimension. This constraint limits the comprehensive extraction of global features and increases computational complexity. To address this issue, an enhanced Squeeze-and-Excitation (SE) attention mechanism is introduced. By enhancing feature extraction, this approach improves the accuracy and efficiency of pig behavior recognition. In terms of model architecture, the SE module is integrated into the YOLOv8n framework. The SE module learns dynamic channel-wise weights, enabling the network to adaptively recalibrate the importance of each feature channel based on task-specific relevance. This strategy allows the model to concentrate more effectively on informative channels, thus improving feature discrimination. Consequently, the model demonstrates better overall performance and predictive accuracy while capturing more comprehensive global features. Notably, this mechanism allows even the early layers to access a broader receptive field. Moreover, the inclusion of the SE module incurs only a marginal increase in model complexity and computational cost, providing a favorable balance between accuracy and efficiency and ensuring scalability across various application scenarios. Figure 4 illustrates the SE module architecture [38].

The main steps of the SE module are as follows:

Squeeze: Global average pooling is applied to the input feature map, reducing the spatial dimensions of each channel to a single scalar. This operation produces a channel descriptor that encodes the global spatial context of each channel, as shown in Equation (1).

F_{sq} (\cdot) = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} F_{tr} (i, j)

(1)

In this context, F_{sq} represents the compressed features, and F_{tr} represents the input feature map, where H and W represent its height and width, respectively.

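Excitation: The channel descriptor is passed through two fully connected layers, with a ReLU activation after the first and a Sigmoid activation after the second, to produce a weight for each channel, as shown in Equation (2).

F_{ex} (\cdot) = \sigma (W_{2} \delta (W_{1} F_{sq} (\cdot)))

(2)

In this context, F_{ex} denotes the features following excitation, while W_{1} and W_{2} refer to the weights of the two fully connected layers. Additionally, \delta and \sigma represent the ReLU and Sigmoid activation functions, respectively.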

Recalibration: The computed weights are subsequently applied to the initial feature map, enabling selective modulation of each channel, as depicted in Equation (3).

F_{scale} (\cdot) = F_{tr} (\cdot) \otimes F_{ex} (\cdot)

(3)

Here, F_{scale} represents the features after recalibration, and \otimes denotes element-wise multiplication. This approach allows the network to emphasize informative features while attenuating less relevant ones. The integration of the SE module enhances the model’s ability to prioritize salient information, thus contributing to improved detection accuracy. Moreover, the SE module introduces only minimal computational overhead and does not substantially increase model complexity [39,40].
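The three SE steps in Equations (1) to (3) can be traced on a toy input. The sketch below is a dependency-free illustration under simplifying assumptions: bias terms are omitted, the feature map is a nested Python list of shape C × H × W, and the weight matrices `w1` and `w2` are made-up values rather than trained parameters. It is not the module used in the paper's implementation.

```python
import math


def se_block(x, w1, w2):
    """Apply squeeze, excitation, and recalibration to a C x H x W feature map.

    x  : list of C channels, each an H x W list of lists
    w1 : (C / r) x C weights of the first fully connected layer
    w2 : C x (C / r) weights of the second fully connected layer
    """
    # Squeeze: global average pooling per channel, Equation (1).
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> Sigmoid, Equation (2).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibration: scale every value of a channel by its weight, Equation (3).
    scaled = [[[v * s for v in row] for row in ch] for ch, s in zip(x, weights)]
    return scaled, weights


# Toy input: two 2 x 2 channels, reduction ratio r = 2 (illustrative values).
x = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, -1.0]]
w2 = [[1.0], [-1.0]]
scaled, weights = se_block(x, w1, w2)
print(weights)  # first channel is weighted more strongly than the second
```

Because the squeeze step collapses each channel to one scalar, the two fully connected layers add only on the order of 2C²/r parameters, which is why the module's overhead stays small.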

3.2.2. C3Ghost Convolutional Module

The C3 convolutional module, used in YOLOv8n, integrates depthwise separable and dilated convolutional layers, thereby reducing parameter count and computational burden. However, this structure may degrade feature map resolution, thereby compromising the extraction of fine-grained information and reducing the effectiveness of feature fusion. These limitations can hinder model training and impact overall stability. To mitigate these drawbacks, this study incorporates the Ghost module from GhostNet, proposed by Han et al. [41]. The Ghost module is a novel, lightweight convolutional structure that achieves parameter and computation reduction by producing compressed feature representations through linear transformations of intrinsic feature maps, while preserving or even enhancing detection accuracy. Accordingly, the standard C3 module is substituted with the C3Ghost variant to facilitate improved performance. The conventional convolutional structure is illustrated in Figure 5.

Y = X * f + b

(4)

where * denotes the convolution operation, X represents the input feature map, f denotes the n convolution kernels of size k × k with c channels, b represents the bias, and Y denotes the output feature map obtained through convolution operations.

The GhostConv module is illustrated in Figure 6.

The Ghost module first employs a 1 × 1 convolution kernel to halve the channel count of the input feature map. It then generates Ghost feature maps through grouped convolution operations. Finally, the Ghost feature maps are merged with the original feature map to produce an enhanced feature map. In this way, the C3Ghost module can significantly lower computational cost and parameter count while preserving model performance, thus improving the model’s efficiency and practical applicability. The formula for GhostConv is as follows:

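Y^{'} = X^{'} * f^{'}

(5)

y_{i, j} = \Phi_{i, j} (y_{i}^{'}), \quad i \in [1, m], \; j \in [1, s]

(6)

where * denotes the convolution operation, Y^{'} denotes the m intrinsic feature maps produced by the primary convolution of the input X^{'} with kernels f^{'}, y_{i}^{'} is the i-th intrinsic feature map, and \Phi_{i, j} is the j-th linear operation applied to y_{i}^{'} to generate the ghost feature map y_{i, j}.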
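The parameter saving implied by Equations (5) and (6) can be checked with a short calculation. The sketch below follows the counting argument in the GhostNet paper [41] and ignores bias terms; the layer sizes in the example, the cheap-operation kernel size `d`, and the function names are illustrative assumptions, not values reported in this study.

```python
def conv_params(c_in, n_out, k):
    """Weights in a standard convolution: n kernels of size k x k over c channels."""
    return n_out * c_in * k * k


def ghost_params(c_in, n_out, k, s=2, d=3):
    """Weights in a Ghost module producing n_out maps.

    s : number of maps derived from each intrinsic map (including itself)
    d : kernel size of the cheap per-map (depthwise) linear operations
    """
    m = n_out // s                # intrinsic maps from the primary convolution
    primary = m * c_in * k * k    # Equation (5)
    cheap = (s - 1) * m * d * d   # Equation (6): one d x d filter per ghost map
    return primary + cheap


standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3, s=2, d=3)
print(standard, ghost, round(standard / ghost, 2))
```

For these example sizes the ratio is close to s = 2, which matches the intuition that half of the output maps are produced by inexpensive per-map operations instead of full convolutions.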

3.2.3. GIoU Loss Function

Intersection over Union (IoU) is a widely adopted metric in object detection tasks, commonly used to evaluate the overlap between the predicted bounding box and the ground truth. However, when the two boxes do not intersect, the IoU score is 0 regardless of how far apart they are, so an IoU-based loss provides no gradient to guide the prediction toward the target, limiting the model’s learning capacity.

The YOLOv8n model employs Complete IoU (CIoU) as its loss function. While CIoU enhances IoU by including the distance between non-overlapping regions to more accurately capture the extent of overlap, it requires complex calculations for assessing the aspect ratio of bounding boxes. This complexity increases the computational cost during training, slows model convergence, and does not address the issue of balancing sample difficulty.

Therefore, this paper introduces the Generalized Intersection over Union (GIoU) loss function to replace the CIoU function. Introduced in 2019, GIoU improves IoU by considering non-overlapping areas, enabling it to more comprehensively assess the degree of overlap between the predicted bounding box and the ground truth. GIoU is defined as follows, and the corresponding loss is 1 - GIoU:

GIoU = IoU - \frac{\text{Area of smallest convex hull} - \text{Area of union}}{\text{Area of smallest convex hull}}

(7)

Here, the Area of Intersection refers to the overlapping region between two bounding boxes, the Area of Union denotes the combined area covered by both boxes, while the Area of the Smallest Convex Hull represents the area of the minimum convex polygon that completely encloses the two boxes.
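Equation (7) can be illustrated with a small worked example. The sketch below assumes axis-aligned boxes in (x1, y1, x2, y2) format, for which the smallest convex hull reduces to the smallest enclosing rectangle; the function names and box coordinates are illustrative and not taken from the paper's code.

```python
def giou(box_a, box_b):
    """GIoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Area of intersection (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Area of union.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area of the smallest enclosing rectangle.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    iou = inter / union
    return iou - (hull - union) / hull


def giou_loss(box_a, box_b):
    """Loss form used for regression: 1 - GIoU, ranging from 0 to 2."""
    return 1.0 - giou(box_a, box_b)


near = giou((0, 0, 1, 1), (2, 0, 3, 1))  # disjoint boxes, gap of 1
far = giou((0, 0, 1, 1), (4, 0, 5, 1))   # disjoint boxes, gap of 3
print(near, far)  # IoU is 0 in both cases, but GIoU is lower for the farther pair
```

Unlike plain IoU, the score keeps changing as non-overlapping boxes move apart, so the loss still carries a useful training signal in that case.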

By introducing the GIoU loss function, the model can more effectively optimize bounding box localization, demonstrating enhanced robustness, particularly when dealing with irregularly shaped or highly variable-sized objects. Compared with the CIoU loss, GIoU offers a more comprehensive evaluation metric that improves the model’s detection accuracy and generalization, while reducing computational complexity and accelerating convergence.

3.3. Improved YOLOv8n-Based Model for Pig Behavior Recognition

In this work, the original CIoU loss function in the YOLOv8n model was substituted with the GIoU loss, leading to a notable acceleration of convergence and a substantial reduction in computational cost, while maintaining comparable accuracy. Furthermore, the backbone feature-extraction network of YOLOv8n was augmented with the Squeeze-and-Excitation (SE) attention mechanism. This mechanism dynamically adjusts channel-specific weights, enabling the network to concentrate more effectively on informative channels and thereby strengthening feature discrimination. Although this adjustment results in a modest increase in the parameter count, it contributes to improved detection performance.

To further lighten the model, the traditional C3 modules in the backbone were replaced with C3Ghost modules, and the standard Conv modules were substituted with GhostConv modules. The C3Ghost architecture reduces the number of convolutions and generates richer feature representations with fewer parameters, thereby maintaining model accuracy. Figure 7 presents the structural design of the enhanced YOLOv8n model.

4. Experimental Results and Analysis

4.1. Experimental Environment

The experiments were conducted on a system running Windows 11, equipped with a 12th-generation Intel Core i5-12400F CPU and an NVIDIA GeForce RTX 4060 graphics card. CUDA version 11.8 was utilized for GPU acceleration. The deep-learning tasks were implemented using PyTorch 1.12.1, with Python version 3.9.7 serving as the programming environment.

The input image resolution was 1280 × 720 pixels. A batch size of 16 was used during training, and the model was trained for 100 epochs with the Adaptive Moment Estimation (Adam) optimizer.

To ensure a strictly fair comparison, all baseline models and comparative experiments were implemented and trained from scratch under an identical experimental protocol. This includes using the same dataset partitions, input resolution, data augmentation strategies, optimizer settings, and total training epochs.

4.2. Evaluation Metrics

The study uses several assessment metrics to evaluate the algorithm’s recognition performance, including Precision, Recall, Number of Parameters (Params), mAP@50, Floating Point Operations (FLOPs), and Frames Per Second (FPS).

Precision represents the ratio of correctly predicted positive instances to the total number of instances labeled as positive by the model. The higher the precision, the fewer false detections the algorithm produces. Recall is the fraction of correctly identified positive samples among all positive samples, and it indicates how well the model captures the true positives. The higher the recall, the fewer targets are missed. mAP@50 is a commonly used evaluation metric in object detection: it is the mean of the per-class average precision computed at an IoU threshold of 0.5. A higher mAP reflects superior detection performance. Params serves as an indicator of the model’s structural complexity. A reduced parameter count generally indicates a more compact model that demands fewer computational resources. FLOPs represent the total number of floating-point operations needed for inference, serving as an important metric to evaluate computational complexity. FPS measures the inference speed, i.e., the number of frames processed per second. FPS is influenced not only by the efficiency of the algorithm but also by the performance of the hardware used.
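As a brief illustration of the first two metrics, precision and recall can be computed from counts of true positives (TP), false positives (FP), and false negatives (FN). The counts in the example are made up for demonstration and are not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts at a fixed IoU threshold."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # found among ground truth
    return precision, recall


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed pigs.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)
```

A detection counts as a true positive here only if its IoU with a ground-truth box of the same class meets the chosen threshold, which is 0.5 for mAP@50.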

4.3. Experimental Results Analysis

4.3.1. Ablation Study Results and Analysis

To evaluate the impact of different attention mechanisms on the performance of the recognition algorithm, a series of comparative experiments was conducted using several commonly adopted attention modules. The experimental results are shown in Table 2.

As seen from the table, the SE attention mechanism achieved the highest mAP@50 compared to other attention mechanisms.

The attention visualization of the original YOLOv8n model and the model improved by integrating the SE module is illustrated in Figure 8b. The results demonstrate that after incorporating the SE attention mechanism, the model focused more effectively on the pig’s torso region while reducing attention to irrelevant areas of the image. This indicates that the model can more efficiently concentrate on target regions, decreasing the influence of background noise on recognition outcomes. Consequently, this verifies the effectiveness of integrating the SE module, thereby enhancing the model’s reliability and interpretability.

To assess the contribution of each proposed enhancement and its influence on overall model performance, ablation studies were systematically conducted. The corresponding outcomes are summarized in Table 3.

Considering the constraints on computational power in real-world farming environments, several optimizations were integrated into the YOLOv8n model. First, the GIoU loss was adopted to replace the original loss function. While this modification does not alter the model’s parameter count or inference FLOPs, it significantly improves the optimization landscape during training. The GIoU loss enhances the model’s localization precision and generalization capability by simplifying gradient calculations and accelerating convergence, thereby boosting overall detection performance. The superior optimization behavior is empirically demonstrated by the convergence curves of the loss functions during the training phase in Figure 9.

Incorporating the SE attention mechanism alone (Model 3) improved precision from 93.4% to 94.2%, recall from 92.1% to 94.0%, and mAP from 94.2% to 95.1%, while increasing the parameter count only slightly, from 3,006,233 to 3,016,985. This improvement is due to the SE module’s capacity to direct the network in extracting relevant target information, thereby boosting the model’s ability to identify subtle features.
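The squeeze, excitation, and scaling steps of an SE block can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the weights `w1` and `w2`, the tiny two-channel feature map, and the omission of bias terms are hypothetical choices made for readability, and the sketch is not the module implementation used in the improved network.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply squeeze-and-excitation to a list of C channels (each an H x W list).

    w1: hidden x C weight matrix of the first fully connected layer.
    w2: C x hidden weight matrix of the second fully connected layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]

    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Hypothetical 2-channel, 2 x 2 feature map and hand-picked weights.
features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[1.0, 0.0]]        # hidden size 1 (reduction of 2 channels to 1)
w2 = [[0.0], [100.0]]    # channel 0 gets a neutral gate, channel 1 is kept
scaled, gates = se_block(features, w1, w2)
print(gates)
```

The gates lie in (0, 1), so channels judged less informative are attenuated rather than removed, which is consistent with the reduced attention to background regions described for Figure 8.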

Replacing the C3 convolution module in the backbone with the C3Ghost module reduced the parameter count from 3,006,233 to 1,714,661 and the FLOPs from 8.1 × 10⁹ to 5.0 × 10⁹. As Table 3 shows, applied alone (Model 4) this substitution raised recall from 92.1% to 92.8% but slightly lowered precision (93.4% to 93.2%) and mAP (94.2% to 94.0%). The SE attention mechanism applied alone (Model 3) reached an mAP of 95.1%, below the 96.8% of the complete set of improvements (Model 8). However, combining SE with GIoU (Model 5) or with C3Ghost (Model 7) increased mAP to 95.6% and 96.0%, respectively. This could be due to the attention mechanism’s capacity to focus on important information with higher weights while suppressing irrelevant features, and to adjust attention weights dynamically during learning.
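The source of the parameter saving can be illustrated with a simple weight count. The sketch below compares a standard convolution with a Ghost convolution in the sense of GhostNet, where a primary convolution produces `c_out / s` intrinsic feature maps and cheap depthwise `d × d` operations generate the remaining ones. The channel widths, the ratio `s = 2`, and the kernel sizes are assumed values chosen for illustration; they are not the layer configuration of the improved model, and bias and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, s=2, d=3):
    """Weights of a Ghost convolution with ratio s and d x d cheap operations."""
    intrinsic = c_out // s                      # maps from the primary convolution
    primary = conv_params(c_in, intrinsic, k)   # ordinary convolution, fewer outputs
    cheap = intrinsic * (s - 1) * d * d         # one depthwise filter per generated map
    return primary + cheap


# Assumed layer shape for illustration only.
standard = conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3, s=2, d=3)
print(standard, ghost, round(standard / ghost, 2))
```

With `s = 2` the count falls to roughly half, since the depthwise term is small compared with the primary convolution.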

In conclusion, the ablation experiments demonstrate that combining the GIoU loss function, SE attention mechanism, and C3Ghost convolution module effectively optimizes the YOLOv8n model, striking a balance between reducing model complexity and enhancing recognition accuracy. Compared with the original YOLOv8n model, the enhanced version achieves a 42.93% reduction in parameters and reduces FLOPs to 61.73% of the initial value, while reaching 96.4% precision, 96.1% recall, and 96.8% mAP, thereby achieving lightweight optimization without compromising detection performance.
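The relative changes quoted above can be reproduced from the first and last rows of Table 3. The following short calculation is a sketch that uses only the values reported in that table; the helper name `pct_reduction` is an illustrative choice.

```python
# Rows 1 (baseline) and 8 (all three modifications) of Table 3.
baseline = {"params": 3_006_233, "flops": 8.1e9, "map": 94.2}
improved = {"params": 1_715_173, "flops": 5.0e9, "map": 96.8}


def pct_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


param_cut = pct_reduction(baseline["params"], improved["params"])
flop_cut = pct_reduction(baseline["flops"], improved["flops"])
flop_ratio = 100.0 * improved["flops"] / baseline["flops"]
map_gain = improved["map"] - baseline["map"]

print(f"parameter reduction: {param_cut:.2f}%")
print(f"FLOPs reduction:     {flop_cut:.2f}%")
print(f"FLOPs remaining:     {flop_ratio:.2f}%")
print(f"mAP gain:            {map_gain:.1f} percentage points")
```

The parameter reduction computed from the Table 3 counts agrees with the reported 42.93% to within a few hundredths of a percentage point, and the FLOPs figures match the reported 38.27% decrease and 61.73% ratio.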

4.3.2. Comparison of Experimental Results Across Different Models

To compare the performance of the improved YOLOv8n model with other models, mainstream one-stage object detection algorithms, including YOLOv5n, YOLOv8n, YOLOv9n, YOLOv10n, YOLO11n, as well as the most recently proposed YOLOv12n model, were selected for comparative experiments. The experimental results are shown in Table 4.

As shown in Table 4, the YOLOv5n model has the lowest mAP (93.7%) together with the highest parameter count and FLOPs. Although the YOLOv9n model achieves an mAP of 95.8%, its frame rate is low (32.15 FPS), limiting its applicability in scenarios with strict real-time requirements. The YOLOv10n and YOLO11n models reach 86.21 and 84.03 FPS, respectively, but still fall short of the improved YOLOv8n (126.20 FPS) in both speed and accuracy. Although YOLOv12n strikes a balance between accuracy (96.1%) and speed (120.53 FPS), its parameter count is 47.6% higher than that of the improved YOLOv8n, and its computational cost (5.80 × 10⁹ FLOPs) is 16% larger. This indicates that it remains less lightweight than the enhanced model.
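The relative overheads quoted for YOLOv12n follow directly from Table 4. The sketch below recomputes them and checks which model is best on each column; the dictionary layout and the helper name `excess_pct` are illustrative assumptions, while the numbers are those reported in Table 4.

```python
# (mAP %, FPS, parameters, FLOPs) as reported in Table 4.
TABLE4 = {
    "YOLOv5n": (93.7, 70.92, 7.00e6, 15.8e9),
    "YOLOv8n": (94.2, 117.90, 3.00e6, 8.10e9),
    "YOLOv9n": (95.8, 32.15, 2.62e6, 10.7e9),
    "YOLOv10n": (95.2, 86.21, 2.70e6, 8.20e9),
    "YOLO11n": (95.5, 84.03, 2.60e6, 6.30e9),
    "YOLOv12n": (96.1, 120.53, 2.51e6, 5.80e9),
    "Improved YOLOv8n": (96.8, 126.20, 1.70e6, 5.00e9),
}


def excess_pct(value, reference):
    """How much larger `value` is than `reference`, in percent."""
    return 100.0 * (value - reference) / reference


ours = TABLE4["Improved YOLOv8n"]
v12 = TABLE4["YOLOv12n"]
param_excess = excess_pct(v12[2], ours[2])
flop_excess = excess_pct(v12[3], ours[3])

best_map = max(TABLE4, key=lambda m: TABLE4[m][0])
best_fps = max(TABLE4, key=lambda m: TABLE4[m][1])
fewest_params = min(TABLE4, key=lambda m: TABLE4[m][2])
fewest_flops = min(TABLE4, key=lambda m: TABLE4[m][3])

print(f"YOLOv12n parameters: +{param_excess:.1f}%")
print(f"YOLOv12n FLOPs:      +{flop_excess:.1f}%")
print(best_map, best_fps, fewest_params, fewest_flops)
```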

The experimental results demonstrate that the optimized YOLOv8n model achieves superior recognition accuracy while maintaining a lightweight architecture. Given its low parameter count and reduced FLOPs, the model exhibits significant potential for real-time deployment on resource-constrained edge devices.

Recognition results for different daily behaviors of pigs can provide feedback on the model’s ability to identify pig features. Significant differences in the performance of YOLO models on pig behavior recognition tasks reflect their varying abilities to recognize target features. In this study, the mAP values of the YOLOv5n, YOLOv8n, YOLOv9n, YOLOv10n, YOLO11n, and YOLOv12n models were compared for recognizing five pig behaviors: stand, lie, eat, fight, and tail-bite. The outcomes are presented in Table 5.

Based on the experimental results presented in Table 5, the model developed in this study demonstrates substantial advantages in recognizing pig behavior, particularly in detecting lie and tail-bite behaviors, for which it achieves AP values of 98.5% and 96.8%, respectively. The AP for stand behavior is lower, at 91.0%, which can be attributed to the difficulty of distinguishing stand from lie postures at certain angles, leading to potential confusion between the two.

The model effectively identifies critical moments in aggressive interactions based on spatial proximity and orientation. However, a limitation of this frame-based approach is its inability to distinguish between brief accidental contact and sustained aggressive behavior, as it lacks temporal contextual information.

Compared with existing models such as YOLOv5n, YOLOv8n, and YOLOv9n, the proposed model achieves higher average precision on all five behaviors. According to Table 5, the gains range from 0.5 percentage points (stand, relative to YOLOv9n) to 6.3 percentage points (tail-bite, relative to YOLOv5n). Specifically, the model presented in this study outperforms YOLOv9n and YOLOv10n, particularly in recognizing eat and lie behaviors.

Moreover, the YOLOv12n model demonstrates strong performance, particularly in detecting lie, eat, and tail-bite behaviors, with AP values of 97.2%, 94.0%, and 94.2%, respectively. The enhanced YOLOv8n model nevertheless remains highly competitive: it is 0.5 percentage points lower on stand (91.0% versus 91.5%) and equal on fight (92.5%), but higher on lie, eat, and tail-bite, reaching 98.5% and 96.8% on lie and tail-bite, respectively.

In conclusion, the model proposed in this study presents distinct advantages over existing models in recognizing a broad spectrum of pig behaviors, particularly in the detection of lie, eat, fight, and tail-bite behaviors. Its overall high accuracy underscores its potential for practical application in similar standardized indoor enclosures, providing a more reliable and precise solution for monitoring pig behavior.

4.3.3. Recognition Result Analysis

The recognition results of the different object detection models on pig behavior are shown in Figure 10. Under various lighting conditions and in complex scenarios such as crowding and occlusion, the improved pig behavior recognition algorithm performs exceptionally well. It significantly outperforms the other object detection models in recognizing pig behaviors, with high confidence in the recognition results and no missed detections. The detection results of the YOLOv5n model are weaker than those of the other models, making it unsuitable for pig target detection. YOLOv9n performs poorly in detecting eat behavior, ranking last in the comparison experiment. Moreover, its frame rate is the lowest of all models, which prevents it from meeting real-time detection needs. YOLO11n performs poorly in recognizing pig stand behavior, with insufficient precision. Both the YOLOv8n and YOLO11n models exhibit weaker overall performance in pig behavior recognition than the improved YOLOv8n model. YOLOv12n achieves an mAP of 96.1%, nearly on par with the enhanced YOLOv8n, and both models demonstrate comparable robustness. However, in terms of computational efficiency, the real-time frame rate of YOLOv12n is 120.53 FPS, slightly lower than the 126.20 FPS of the improved YOLOv8n. In addition, its parameter count and FLOPs exceed those of the improved YOLOv8n by 47.6% and 16.0%, respectively.

To evaluate the generalization capability and robustness of the improved YOLOv8n model, a performance validation was conducted on an entirely independent test set consisting of 818 instances. To eliminate potential data leakage, a strict video-based splitting strategy was implemented, ensuring that no frames from the same video sequence appeared in both the training and testing sets. As depicted in Figure 11, the model achieved a robust mAP of 94.8% on this unseen dataset, confirming its reliable performance in practical scenarios. A granular analysis of the normalized confusion matrix (Figure 11) reveals the following:

Robust Static Behavior Recognition: Despite the rigorous data partitioning, the model maintained high recognition accuracy for common static behaviors. ‘Lie’ and ‘Eat’ behaviors achieved accuracies of 0.97 and 0.95, respectively, indicating that the model effectively captured the distinctive morphological features of these behaviors.

Analysis of Morphological Confusion: A detailed examination of the inter-class error rates reveals a subtle confusion between ‘Stand’ and ‘Lie’, with 7% of ‘Stand’ instances misclassified as ‘Lie’. This primarily stems from the morphological similarity between the two postures during transitional moments, such as when a pig is rising or lying down, when captured from an overhead perspective.

Resilience in Interactive Behavior Detection: The proposed model demonstrated strong generalization to complex and sparse behaviors. ‘Fight’ and ‘Tail-bite’ reached accuracies of 0.93 and 0.97, respectively. The occasional confusion (4%) between ‘Fight’ and ‘Eat’ typically occurs in high-density group scenes near feeders, where intense competition mimics the visual patterns of aggressive interactions. These findings offer valuable insights for integrating temporal features to further distinguish these dynamic interactions in future research.
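The per-class accuracies and inter-class error rates discussed above are read from a normalized confusion matrix. As a minimal sketch of how such a matrix is obtained, the code below row-normalizes raw counts so that each true class sums to 1. The three-class layout and all counts are hypothetical illustration values, not the data behind Figure 11, and the convention that rows hold the true class is an assumption.

```python
def normalize_rows(counts):
    """Divide each row by its total so that every true class sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Hypothetical counts: rows are true classes, columns are predicted classes.
CLASSES = ["stand", "lie", "eat"]
COUNTS = [
    [45, 4, 1],   # true stand
    [2, 46, 2],   # true lie
    [1, 3, 46],   # true eat
]

norm = normalize_rows(COUNTS)

# Diagonal entries give the fraction of each true class recognized correctly.
per_class_acc = {name: norm[i][i] for i, name in enumerate(CLASSES)}

# Off-diagonal entries give inter-class error rates, e.g. stand predicted as lie.
stand_as_lie = norm[0][1]

print(per_class_acc)
print(f"stand misclassified as lie: {stand_as_lie:.2f}")
```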

In summary, the comparative analysis demonstrates that the enhanced YOLOv8n-based model provides a robust and efficient solution for frame-level pig behavior identification. By achieving a superior balance between detection precision and computational efficiency, the model stands out for its minimal parameter count and FLOPs while maintaining high average precision across diverse behaviors. Our results indicate that the model remains resilient to partial occlusions, significantly reducing the occurrence of missed detections. While the current inference speed was validated on desktop-grade hardware, the lightweight architectural footprint highlights its strong potential for future deployment on resource-constrained edge devices within indoor intensive pig farming environments.
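The video-based splitting strategy used for the independent test set can be sketched as follows. The idea is to assign whole videos, rather than individual frames, to either the training or the test partition. The function name `split_by_video`, the 20% test fraction, the fixed seed, and the frame identifiers are hypothetical choices for illustration; the actual partitioning procedure of this study may differ in its details.

```python
import random
from collections import defaultdict


def split_by_video(frames, test_fraction=0.2, seed=0):
    """Split (video_id, frame_name) pairs so that no video spans both sets."""
    by_video = defaultdict(list)
    for video_id, frame_name in frames:
        by_video[video_id].append(frame_name)

    # Shuffle the video identifiers, not the frames, with a fixed seed.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    # Reserve whole videos for testing (at least one).
    n_test = max(1, round(len(video_ids) * test_fraction))
    test_ids = set(video_ids[:n_test])

    train = [(v, f) for v, f in frames if v not in test_ids]
    test = [(v, f) for v, f in frames if v in test_ids]
    return train, test


# Hypothetical example: 5 videos with 4 extracted frames each.
frames = [(f"video_{v}", f"frame_{i:03d}.jpg") for v in range(5) for i in range(4)]
train, test = split_by_video(frames)
print(len(train), len(test))
```

Because consecutive frames of one video are highly correlated, splitting at the frame level would let near-duplicates of training images appear in the test set, which is the leakage that this grouping avoids.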

5. Conclusions

(1): To address the challenges posed by limited computational resources in real-world farming applications and the need for real-time performance, the YOLOv8n model was enhanced by incorporating the GhostNet architecture and an SE attention mechanism module. Additionally, the bounding box loss function was modified from CIoU to GIoU, reducing the model’s computational complexity. As a result, the proposed YOLOv8n-based pig behavior recognition model exhibits a 42.93% reduction in parameters and a 38.27% decrease in floating-point operations. With a mean average precision (mAP@50) of 96.8%, the model shows a 2.6 percentage point improvement over the original YOLOv8n, thereby achieving a balance between lightweight design and enhanced recognition accuracy.
(2): The improved pig behavior recognition model outperforms advanced network architectures, including YOLOv9n, YOLOv10n, YOLO11n, and YOLOv12n, in terms of detection accuracy. Compared with widely used object detection models, this approach demonstrates robust performance in recognizing pig behaviors, even in challenging environments such as crowded pigpens and poor lighting. While achieving an inference speed of 126.20 FPS on an NVIDIA RTX 4060 GPU, the model’s small architectural footprint (only 1.7 MB) makes it an ideal candidate for real-time deployment on edge devices. Nevertheless, it should be noted that the performance metrics reported in this study are derived from single training runs. Due to computational resource constraints and our primary focus on architectural optimization for practical applications, we did not include variance or confidence intervals based on multiple random seeds. To ensure the reliability of the comparative results, a strictly identical experimental environment was maintained for all models, including hardware configurations, hyperparameter settings, and data partitioning. While this approach may involve minor stochastic variability, the consistent improvements observed across different behavior categories suggest that the proposed modifications are effective. Future research will incorporate multi-run statistical validation to further quantify the model’s stochastic stability.
(3): The enhanced YOLOv8n model proposed in this study achieves significant improvements in detection accuracy and computational efficiency for pig behavior recognition. However, certain limitations persist. One challenge is the model’s difficulty distinguishing between stand and lie behaviors from certain angles, due to the similarity of their visual features. While the model performs well at detecting large-scale behaviors, it struggles to identify small-scale features. Additionally, the dataset used in this study is restricted to a single facility with pigs of similar body size. This narrow acquisition domain limits the model’s performance evaluation under domain shifts. While the model achieves robust in-domain performance, testing on unseen pens, under different lighting conditions, across various age groups, and on different farms is necessary to fully validate its generalizability across settings. Therefore, for broader practical applications, the model must be further adapted to cope with complex environmental changes.
(4): Future research will focus on several key directions to further enhance the performance and applicability of the model. Firstly, multimodal data fusion will be explored to expand perception capabilities by incorporating additional sources, such as audio and temperature. Secondly, architectural innovations from the latest iterations, including the attention-centric C2PSA modules in YOLO12, the NMS-free training protocols featured in YOLO11, and the performance benchmarking framework introduced in YOLO26 [42], will be integrated to better resolve behavioral overlaps in high-density environments. Thirdly, the temporal dynamics of pig behaviors will be captured through sequence analysis, employing methods such as temporal convolutional networks or recurrent neural networks to distinguish transitional postures. Fourthly, efforts will continue to optimize the model’s weight for efficient deployment on resource-constrained edge devices. Moreover, the potential for extending this method to other livestock species will be investigated to advance intelligent agricultural technologies. Finally, practical considerations, including user feedback and real-time system integration, will be prioritized to enhance the system’s operational effectiveness.

Author Contributions

J.G.: Data curation, Conceptualization. Y.X.: Validation, Software, Methodology, Data curation. L.L.: Supervision, Funding acquisition. B.Z.: Writing—review and editing. P.Z.: Writing—review and editing. S.L.: Formal analysis. Y.Z.: Resources. J.J.: Writing—review and editing. Z.L.: Supervision, Project administration, Funding acquisition. G.C.: Writing—review and editing, Datasets Construction and Quality Evaluation. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported partly by Special Projects in Key Fields of Ordinary Universities in Guangdong Province under Grant 2025ZDZX4025, Guangdong Province Rural Science and Technology Commissioner under Grant KTP20240590, KTP20240597, Guangdong Province Graduate Education Innovation Program Project under Grant 2024ANLK_049, Innovative projects with distinctive features in ordinary universities in Guangdong Province under Grant 2023KTSCX048, Guangzhou Rural Science and Technology Commissioner Special Project under Grant 2024E04J0106, Yunfu 2023 provincial science and technology innovation strategy and rural revitalization strategy project under Grant 2023020101, Yunfu City’s 2025 Provincial Science and Technology Support “Hundred, Thousand, and Million Project” under Grant 2025020206.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The authors thank the funders listed in the Funding section.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Hao, W.; Han, W.; Han, M.; Li, F. A novel improved yolov3-sc model for individual pig detection. Sensors 2022, 22, 8792.
2. Marcon, M.; Brossard, L.; Quiniou, N. Precision feeding based on individual daily body weight of group-housed pigs with an automatic feeder developed to allow for restricting feed allowance. Precis. Livest. Farming 2015, 15, e601.
3. Tran, D.; Nguyen, T.N.; Khanh, P.C.P.; Tran, D. An iot-based design using accelerometers in animal behavior recognition systems. IEEE Sens. J. 2021, 22, 17515–17528.
4. Hou, S.; Wang, T.; Qiao, D.; Xu, D.J.; Wang, Y.; Feng, X.; Khan, W.A.; Ruan, J. Temporal-Spatial Fuzzy Deep Neural Network for the Grazing Behavior Recognition of Herded Sheep in Triaxial Accelerometer Cyber-Physical Systems. IEEE Trans. Fuzzy Syst. 2024, 33, 338–349.
5. Dai, X.; Wu, J.; Cheng, G.; Yang, L.; Wang, Y.; Li, Z.; Han, S. Applications and challenges of wearable devices in livestock and poultry health management. J. Nanjing Agric. Univ. 2025, 48, 766–780.
6. Nasirahmadi, A.; Edwards, S.A.; Sturm, B. Implementation of machine vision for detecting behaviour of cattle and pigs. Livest. Sci. 2017, 202, 25–38.
7. Mellor, D.J. Updating animal welfare thinking: Moving beyond the “Five Freedoms” towards “a Life Worth Living”. Animals 2016, 6, 21.
8. Zhu, Z.; Wang, H.; Li, B.; Zhao, W.; Zhu, J.; Jia, N.; Zhao, Y. Research progress of deep learning in recognition of typical behaviors of livestock and poultry. J. Agric. Sci. Technol. 2024, 26, 110–124.
9. Munsterhjelm, C.; Heinonen, M.; Valros, A. Effects of clinical lameness and tail biting lesions on voluntary feed intake in growing pigs. Livest. Sci. 2015, 181, 210–219.
10. Krsnik, B.; Yammine, R.; Pavičić, Ž.; Balenović, T.; Njari, B.; Vrbanac, I.; Valpotić, I. Experimental model of enterotoxigenic Escherichia coli infection in pigs: Potential for an early recognition of colibacillosis by monitoring of behavior. Comp. Immunol. Microbiol. Infect. Dis. 1999, 22, 261–273.
11. Rydhmer, L.; Zamaratskaia, G.; Andersson, H.K.; Algers, B.; Guillemet, R.; Lundström, K. Aggressive and sexual behaviour of growing and finishing pigs reared in groups, without castration. Acta Agric. Scand Sect. A 2006, 56, 109–119.
12. Kashiha, M.; Bahr, C.; Haredasht, S.A.; Ott, S.; Moons, C.P.; Niewold, T.A.; Ödberg, F.O.; Berckmans, D. The automatic monitoring of pigs water use by cameras. Comput. Electron. Agric. 2013, 90, 164–169.
13. Yang, Q.; Xiao, D.; Lin, S. Feeding behavior recognition for group-housed pigs with the Faster R-CNN. Comput. Electron. Agric. 2018, 155, 453–460.
14. Nasirahmadi, A.; Sturm, B.; Olsson, A.; Jeppsson, K.; Müller, S.; Edwards, S.; Hensel, O. Automatic scoring of lateral and sternal lying posture in grouped pigs using image processing and Support Vector Machine. Comput. Electron. Agric. 2019, 156, 475–481.
15. Nasirahmadi, A.; Hensel, O.; Edwards, S.A.; Sturm, B. Automatic detection of mounting behaviours among pigs using image analysis. Comput. Electron. Agric. 2016, 124, 295–302.
16. Wang, R.; Gao, R.; Li, Q.; Zhao, C.; Ma, W.; Yu, L.; Ding, L. A lightweight cow mounting behavior recognition system based on improved YOLOv5s. Sci. Rep. 2023, 13, 17418.
17. Shang, C.; Wu, F.; Wang, M.; Gao, Q. Cattle behavior recognition based on feature fusion under a dual attention mechanism. J. Vis. Commun. Image Represent. 2022, 85, 103524.
18. Guo, J.; He, G.; Xu, L.; Liu, T.; Feng, D.; Liu, S. Behavior detection model of meat pigeons based on improved YOLOv4. Trans. Chin. Soc. Agric. Mach. 2023, 54, 347–355.
19. Ge, S.; Ji, H.; Zhan, Y.; Li, X.; Zheng, W.; Wang, T. Lightweight pig posture recognition method based on improved YOLOv5s. J. China Agric. Univ. 2025, 30, 179–189.
20. Guo, J.; Kong, Y.; Lin, L.; Xu, L.; Feng, D.; Cao, L.; Chen, J.; Ye, J.; Ye, S.; Yao, Z. Lightweight network based on Fourth order Runge-Kutta scheme and Hybrid Attention Module for pig face recognition. Comput. Electron. Agric. 2024, 223, 109099.
21. Guo, J.; Kong, Y.; Liu, S.; Liu, T.; Cao, L.; Liu, Y. Construction and application of a digital unmanned pig farming system. J. Huazhong Agric. Univ. 2024, 43, 288–296.
22. Chen, H.; Yin, L.; Yang, M.; Zhang, S.; Lin, J. Multimodal pig behavior recognition based on vision and sensor fusion. Trans. Chin. Soc. Agric. Eng. 2025, 41, 194–203.
23. Wang, C.; Yeh, I.; Mark Liao, H. Yolov9: Learning what you want to learn using programmable gradient information. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–21.
24. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
25. Jocher, G.; Qiu, J. Ultralytics YOLO11. Available online: https://docs.ultralytics.com/zh/models/yolo11 (accessed on 14 May 2025).
26. Tian, Y.; Ye, Q.; Doermann, D. Yolov12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524.
27. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
28. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D. ultralytics/yolov5: v6.2 - YOLOv5 classification models, Apple M1, reproducibility, ClearML and Deci.ai integrations. Zenodo 2022.
29. Wang, C.; Liao, H.; Wu, Y.; Chen, P.; Yeh, I. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: Piscataway, NJ, USA, 2020; pp. 390–391.
30. Song, G.; Liu, Y.; Wang, X. Revisiting the sibling head in object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11563–11572.
31. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
32. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012.
33. Woo, S.; Park, J.; Lee, J.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
34. Yang, L.; Zhang, R.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 18–24 July 2021; pp. 11863–11874.
35. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1971–1980.
36. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500.
37. Fang, P.; Hao, H.; Li, T.; Wang, H. Instance segmentation of broiler image based on attention mechanism and deformable convolution. Trans. Chin. Soc. Agric. Mach. 2021, 52, 257–265.
38. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
39. Hou, M.; Hao, W.; Dong, Y.; Ji, Y. A detection method for the ridge beast based on improved YOLOv3 algorithm. Herit. Sci. 2023, 11, 167.
40. Mpofu, J.B.; Li, C.; Gao, X.; Su, X. Optimizing motion detection performance: Harnessing the power of squeeze and excitation modules. PLoS ONE 2024, 19, e308933.
41. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1577–1586.
42. Sapkota, R.; Cheppally, R.H.; Sharda, A.; Karkee, M. YOLO26: Key architectural enhancements and performance benchmarking for real-time object detection. arXiv 2025, arXiv:2509.25164.

Figure 1. Dataset images of pigs after frame extraction.

Figure 3. YOLOv8n network model structure.

Figure 4. Squeeze-and-excitation module.

Figure 5. Traditional convolution.

Figure 6. GhostConv module.

Figure 7. Improved YOLOv8n network model structure.

Figure 8. Comparison of heatmap visualizations for the original YOLOv8n model and the improved YOLOv8n model.

Figure 9. Convergence curves of loss functions during the training phase.

Figure 10. Comparative visualization of pig behavior detection results across different environmental conditions and YOLO architectures. Note: (a–g) display the recognition results of (a) YOLOv5n, (b) YOLOv8n, (c) YOLOv9n, (d) YOLOv10n, (e) YOLO11n, (f) YOLOv12n, and (g) the proposed improved YOLOv8n, respectively. The detection results are categorized by behavior, with different-colored bounding boxes representing specific categories: Stand (pink), Lie (red), Eat (orange), Fight (green), and Tail-bite (yellow). The labels on the left side of the composite figure indicate the experimental conditions: Bright lighting, Dim lighting, Crowded scene, and Occlusion.

Figure 11. Confusion matrices of the improved YOLOv8n model on the independent test set.

Table 1. Criteria for judging the behavior of live pigs.

Behavior Category	Label	No. of Instances	Judgment Criteria for Annotation
stand	stand	6340	All four hooves are in contact with the ground; the spine is horizontal or slightly arched.
lie	lie	7850	The limbs are tucked or extended; a large area of the torso is touching the floor.
eat	eat	5380	The snout is inside or significantly overlapping the feeding trough area.
fight	fight	2520	Head-to-head or head-to-body contact; involves pushing or biting postures between individuals.
tail-bite	tail-bite	2250	The attacker’s snout is in direct contact with or extremely close to the tail region of another pig.

Table 2. Comparison of different related attention methods.

AcMix	CBAM	ECA	Triplet	SE	mAP@50 (%)
√					95.9
	√				95.7
		√			95.9
			√		96.0
				√	96.8

Note: “√” indicates that the model includes this module.

Table 3. Ablation experiment results.

Model ID	GIoU	SE	C3Ghost	Parameters	FLOPs	Precision (%)	Recall (%)	mAP (%)
1	—	—	—	3,006,233	8.1 × 10⁹	93.4	92.1	94.2
2	√	—	—	3,006,233	8.1 × 10⁹	93.8	93.4	94.8
3	—	√	—	3,016,985	8.1 × 10⁹	94.2	94.0	95.1
4	—	—	√	1,714,661	5.0 × 10⁹	93.2	92.8	94.0
5	√	√	—	3,016,985	8.1 × 10⁹	94.8	94.5	95.6
6	√	—	√	1,714,661	5.0 × 10⁹	94.0	94.2	95.2
7	—	√	√	1,715,173	5.0 × 10⁹	95.1	95.4	96.0
8	√	√	√	1,715,173	5.0 × 10⁹	96.4	96.1	96.8

Note: The experiments are based on the YOLOv8n model as the baseline. “—” indicates that the model does not include the module, while “√” indicates that the model includes the module.

Table 4. Comparison of experiment results of different models.

Model ID	mAP (%)	FPS (frames·s⁻¹)	Parameters	FLOPs
YOLOv5n	93.7	70.92	7.00 × 10⁶	15.8 × 10⁹
YOLOv8n	94.2	117.90	3.00 × 10⁶	8.10 × 10⁹
YOLOv9n	95.8	32.15	2.62 × 10⁶	10.7 × 10⁹
YOLOv10n	95.2	86.21	2.70 × 10⁶	8.20 × 10⁹
YOLO11n	95.5	84.03	2.60 × 10⁶	6.30 × 10⁹
YOLOv12n	96.1	120.53	2.51 × 10⁶	5.80 × 10⁹
Improved YOLOv8n	96.8	126.20	1.70 × 10⁶	5.00 × 10⁹

Table 5. Comparison of the average precision of different models for each behavior.

Model ID	Stand	Lie	Eat	Fight	Tail-Bite
YOLOv5n	88.5	95.1	91.2	89.4	90.5
YOLOv8n	89.2	96.4	93.5	90.8	92.1
YOLOv9n	90.5	96.8	94.4	91.5	91.8
YOLOv10n	87.8	96.5	93.2	90.0	91.4
YOLO11n	90.1	97.0	92.1	91.2	90.8
YOLOv12n	91.5	97.2	94.0	92.5	94.2
Improved YOLOv8n	91.0	98.5	95.2	92.5	96.8

Note: Values are expressed in percentages (%).
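The per-behavior differences between the improved model and any comparison model can be derived directly from Table 5. The sketch below is a small worked calculation over the tabulated values; the helper name `gains_over` is an illustrative choice, and the results are differences in percentage points.

```python
BEHAVIORS = ["stand", "lie", "eat", "fight", "tail-bite"]

# Per-behavior values (%) as reported in Table 5.
TABLE5 = {
    "YOLOv5n": [88.5, 95.1, 91.2, 89.4, 90.5],
    "YOLOv8n": [89.2, 96.4, 93.5, 90.8, 92.1],
    "YOLOv9n": [90.5, 96.8, 94.4, 91.5, 91.8],
    "YOLOv10n": [87.8, 96.5, 93.2, 90.0, 91.4],
    "YOLO11n": [90.1, 97.0, 92.1, 91.2, 90.8],
    "YOLOv12n": [91.5, 97.2, 94.0, 92.5, 94.2],
    "Improved YOLOv8n": [91.0, 98.5, 95.2, 92.5, 96.8],
}


def gains_over(model):
    """Improved YOLOv8n minus `model`, per behavior, in percentage points."""
    ours = TABLE5["Improved YOLOv8n"]
    theirs = TABLE5[model]
    return {b: round(o - t, 1) for b, o, t in zip(BEHAVIORS, ours, theirs)}


baseline_gains = gains_over("YOLOv8n")
largest = max(baseline_gains, key=baseline_gains.get)

for behavior, gain in baseline_gains.items():
    print(f"{behavior:10s} {gain:+.1f}")
print("largest gain over the baseline:", largest)
```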
