An Improved Pig Counting Algorithm Based on YOLOv5 and DeepSORT Model

Pig counting is an important task in pig sales and breeding supervision. Currently, manual counting is inefficient and costly and makes statistical analysis difficult. In response to the difficulties of detecting partial pig body parts, tracking loss due to rapid movement, and large counting deviations in pig video tracking and counting research, this paper proposes an improved pig counting algorithm (Mobile Pig Counting Algorithm with YOLOv5xpig and DeepSORTpig (MPC-YD)) based on the YOLOv5 + DeepSORT model. The algorithm improves the detection rate of pig body parts by adding two different sizes of SPP networks and using SoftPool instead of MaxPool operations in YOLOv5x. In addition, the algorithm adds a pig reidentification network, a pig tracking method based on spatial state correction, and a pig counting method based on frame number judgment to the DeepSORT algorithm to improve pig tracking accuracy. Experimental analysis shows that the MPC-YD algorithm achieves an average precision of 99.24% in pig object detection and an accuracy of 85.32% in multitarget pig tracking. In the aisle environment of the slaughterhouse, the MPC-YD algorithm achieves a correlation coefficient (R²) of 98.14% in pig counting from video, and it achieves stable pig counting in a breeding environment. The algorithm has a wide range of application prospects.


Introduction
The number of pigs is crucial information for pig sales and breeding management, and it is extremely important for farmers. Farmers not only need to calculate their own interests based on the number of pigs during the sales process but also need to have a clear understanding of the number of pigs during the breeding process in order to adjust their breeding management plans. However, currently, pig counting is mainly a manual task, which not only has disadvantages such as low efficiency, high cost, and a small counting range, but also can lead to animal welfare issues due to personnel impatience during counting. Accurate pig counting algorithms can ensure that farmers' interests are not infringed upon in pig sales, improve breeding efficiency, reduce labor costs, and lower the occurrence rate of animal welfare issues [1]. Therefore, pig farmers have a strong demand for a high-precision and high-reliability automatic pig counting method.
In recent years, video image processing methods have gradually shifted from traditional algorithms to deep learning algorithms. Deep learning algorithms, with their high accuracy, faster running speed, and stronger generalization, can be used to solve problems such as image object detection, image instance segmentation, and multiobject tracking from video. Popular image object detection algorithms currently include the Faster R-CNN series and YOLO series [2][3][4]. Image instance segmentation algorithms include Mask R-CNN, Mask Scoring R-CNN, and PointINS [5][6][7]; video multiobject tracking algorithms include MHT-DAM, SORT, and DeepSORT [8][9][10]. In the agricultural field, issues such as low efficiency, high cost, low intelligence, and a shortage of labor are becoming increasingly prominent. The emergence of these algorithms provides new ideas for improving agricultural production efficiency [11].
With the development of computer vision technology, an increasing number of computer vision algorithms are being applied in the field of precision agriculture, such as agricultural object detection [12], plant disease and pest recognition [13], animal behavior recognition [14], agricultural object segmentation [15], animal weight measurement [16], and agricultural object tracking [17]. These cases demonstrate the broad application prospects of using computer vision algorithms to solve agricultural problems. Although using computer vision algorithms to solve counting problems is not currently the mainstream research focus, there has been a commercial demand for algorithms to count objects in video images to reduce the manual counting workload over the past few decades [18,19]. For example, Lins et al. [20] designed a method for counting fish based on image density level classification and local regression. Zhao et al. [21] used the DeepLabV3+ network to achieve a recall rate of 91% for counting the number of seeds per silique in long-horned fruit. Gao et al. [22] created an automatic apple counting method using the YOLOv4-tiny detection network and a single-object tracking algorithm. Kestur et al. [23] proposed a target detection network called MangoNet based on deep semantic segmentation, which is used for mango detection and counting. Although these methods for image or video counting are not specifically designed for counting pigs, they provide a useful research foundation and technical solutions for pig counting.
Recently, some scholars have made good progress in the research of pig counting algorithms [24]. For example, Oczak et al. [25] estimated the number of piglets in a pen by extracting three parameters: the number of detected objects, the area, and the perimeter of all objects from segmented images of farrowing sows. Huang et al. [26] proposed a two-stage center clustering network (CClusnet) to solve the problem of partial occlusion during piglet counting. Jensen et al. [27] used a convolutional neural network (CNN) with a single linear output node to estimate the number of pigs in a given area of a pig pen. However, these researchers only counted pigs through images and have not yet focused on the video tracking and counting of moving pigs. Although Chen tracked and counted pigs using a camera mounted on a patrol robot on the roof [28], the number of pigs could only be estimated through the space perception time response filter (STRF), and it was still difficult to effectively solve problems such as pig adhesion, overlap, and occlusion. Therefore, a more accurate, widely applicable, and easily deployable pig counting algorithm that focuses on video tracking is currently lacking. As a popular tracking algorithm, DeepSORT has been successfully applied to counting fish [29], sheep [30], and birds [31]. Although the structures of counting algorithms are very similar, different targets have different motion patterns, appearance characteristics, densities, and shooting angles. Therefore, for application to counting pigs in farming environments, optimization is required based on pig characteristics and the farming environment.
To address the problems of weak applicability, low efficiency, and easy tracking loss in the current process of automatic pig counting, this paper proposes an improved pig counting algorithm based on a YOLOv5 + DeepSORT [32] model, called the Mobile Pig Counting Algorithm with YOLOv5xpig and DeepSORTPig (MPC-YD). To address the difficulty of recognizing local pig body parts, the MPC-YD algorithm includes a YOLOv5xpig object detection network. In addition, the MPC-YD algorithm also introduces a pig reidentification network, a pig tracking method based on spatial state correction, and a pig counting method based on the number of frames in the DeepSORT algorithm. First, the data acquisition and dataset preparation techniques of the algorithm are introduced, and then the structure of the algorithm is described in detail, followed by an analysis of the performance and results of the algorithm.

Data Collection and Dataset Production
The data used in this study were collected from a commercial pig slaughterhouse located in Heyuan city, Guangdong Province, where a total of 1892 pig videos were collected. The experimental data collection scenario is shown in Figure 1. In order to ensure the applicability of the algorithm and animal welfare, the experimental data collection channel was one of the channels actually used in the slaughterhouse production, with a width of 1.5 m, and all pigs passed through the channel at a normal speed (v ≤ 3 m/s). The experimental collection equipment used included a Hikvision TB-1217A-3/PA camera (manufactured by Hikvision Digital Technology Co., Ltd., Hangzhou, Zhejiang, China) at a height of 2.8 m from the ground, with a video frame rate of 25 fps and a resolution of 2688 × 1520. The data collection dates were 1-7 February 2021 and 25-29 March 2022, during the period of 8:00-16:00.

In order to implement the pig counting algorithm, we needed to create a pig detection dataset and a pig tracking dataset. Therefore, we followed three steps to create the datasets.

Design of MPC-YD Algorithm
Aiming to address the current difficulties faced by the YOLOv5x algorithm in detecting partial pig body parts (such as half a pig head or half a pig leg) and the problem of easy loss of pig tracking by the DeepSORT algorithm, this paper proposes an MPC-YD pig counting algorithm. The algorithm improves the detection rate of local body parts of pigs by adding two different sizes of SPP networks and using SoftPool instead of the MaxPool operation in YOLOv5x. Additionally, this algorithm includes a pig reidentification network, a spatiotemporal-state-based pig tracking correction method, and a frame-based pig counting method based on DeepSORT to improve the accuracy of pig tracking and counting.

MPC-YD Algorithm Architecture
The MPC-YD pig counting algorithm is mainly divided into two parts: the YOLOv5xpig pig target detection algorithm and the DeepSORTpig pig target tracking and counting algorithm. The overall process is shown in Figure 3. The YOLOv5xpig pig target detection algorithm is responsible for detecting the pigs in each frame of the pig video and passing the detection results to the DeepSORTpig algorithm. After obtaining the pig target detection results in the current image, the DeepSORTpig algorithm first reads the detection results of the previous frame and predicts the pig's motion trend in the current image. Then, the pig reidentification network is used to determine the pig association relationship between the prediction results of the previous frame and the target detection results of the current image. The ID is then assigned based on the association relationship, and the ID is corrected using a spatiotemporal-state-based pig tracking correction method (IDFind module in Figure 3). Finally, pig counting is achieved using the frame-based counting method (ID-Match module in Figure 3).


YOLOv5xpig Pig Counting Object Detection Algorithm
The high-precision pig object detection algorithm is an important prerequisite to ensure the accuracy of pig counting, as low-precision pig object detection algorithms can result in multiple IDs being assigned to the same pig during tracking. Considering that the YOLO model has relatively higher detection speed, is easier to deploy and apply in commercial settings than other algorithms in the field of object detection, and has more community support, we chose to adopt the YOLO model for object detection. YOLOv5 is a popular object detection network that includes four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Although YOLOv5x has the lowest running speed, it has the highest detection accuracy. Because the accuracy of pig counting is more important than speed in this task, the pig object detection network YOLOv5xpig was improved based on YOLOv5x to ensure the accuracy of pig counting. During the pig counting process, it is difficult to avoid the appearance of partial pig body parts (half a pig head or half a pig leg) because all pigs exist in situations where their bodies gradually appear in the camera's shooting range or gradually disappear from the camera's shooting range. A spatial pyramid pooling network (SPP-net) [33] refers to a type of neural network module that is designed to address challenges related to complex shape, large size differences, and low-texture features in object detection tasks. We added two SPP-net modules with different feature block sizes to the YOLOv5x network in the YOLOv5xpig pig object detection network to improve the detection of partial pig body parts. The SPP1 feature block sizes are 1 × 1, 11 × 11, 13 × 13, and 15 × 15, while the SPP2 feature block sizes are 1 × 1, 3 × 3, 5 × 5, and 7 × 7. The improved YOLOv5xpig structure is shown in Figure 4. By using two SPP-net modules with different feature block sizes, YOLOv5xpig obtains more feature information for detection with its three different detection heads. 
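As an illustration of the idea (not the authors' implementation), a YOLOv5-style SPP block applies stride-1 pooling at several kernel sizes to the same feature map and concatenates the results along a channel axis, so every scale contributes features at full spatial resolution. The numpy sketch below uses the SPP2 feature block sizes (1 × 1, 3 × 3, 5 × 5, 7 × 7):

```python
import numpy as np

def max_pool_same(x, k):
    # stride-1 max pooling with "same" padding on a 2D feature map
    p = k // 2
    xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k))
    return windows.max(axis=(-1, -2))

def spp(x, kernels=(1, 3, 5, 7)):
    # pool at each kernel size and stack along a new channel axis,
    # so features from all scales are available to the detection heads
    return np.stack([max_pool_same(x, k) for k in kernels])

feature_map = np.arange(16, dtype=float).reshape(4, 4)
out = spp(feature_map)
print(out.shape)  # (4, 4, 4): four pooled copies, spatial size preserved
```

The SPP1 branch would use the same operation with kernel sizes (1, 11, 13, 15); in the real network, the pooling acts per channel on C × H × W tensors rather than on a single 2D map.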
Furthermore, to reduce the loss of feature information from partial pig body parts caused by pooling operations, YOLOv5xpig uses SoftPool [34] instead of the MaxPool operation in SPP-net, as SoftPool can better preserve detailed information in the feature map and prevent overfitting. The core idea of SoftPool is to use the SoftMax function to calculate activation weights w_i for a feature region S of size C × H × W, where a_i is the activation value in the feature region S. The formula is as follows:
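Written in the standard SoftPool form consistent with the definitions above:

```latex
w_i = \frac{e^{a_i}}{\sum_{j \in S} e^{a_j}}
```

Each activation thus receives a weight proportional to its exponential, so larger activations dominate the pooled output without discarding the rest of the region, as MaxPool does.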


Pig Counting and Reidentification Network
The reidentification network in the original DeepSORT algorithm was designed for pedestrians, but there are significant differences in the way pigs and humans walk. Therefore, a ratio of 1:2 is often used for pedestrian feature extraction, while a ratio of 2:1 is used for pig images to better extract pig features. Additionally, the DeepSORT algorithm uses the appearance of targets to improve tracking accuracy, which requires using a pig reidentification network to match the detected pigs in the YOLOv5xpig algorithm detection results. Although there is not much difference in appearance between pigs at first glance, in reality, there are significant differences in the characteristics among individual pigs, such as coat color, pattern, and body type, as shown in Figure 5. In order to improve the pig tracking matching rate, we designed a pig reidentification network (pig reidentification model) based on the characteristics of pig image sizes, as shown in Table 1. The pig reidentification model mainly includes 1 convolutional layer, 1 max pooling layer, 9 residual layers, and 1 average pooling layer. Finally, the output feature dimension of the pig reidentification network is increased from 128 to 512 compared with the original network.
Finally, after standard summation of all activation weights w_i and activation values a_i within feature region S, the output value ã of the SoftPool operation can be obtained, as shown in the following formula. In addition, we used the CIOU loss to measure the loss of the bounding boxes, with a formula as follows: where ρ is the distance between the centers of the predicted and ground truth boxes, c is the diagonal length of the minimum bounding rectangle of the predicted and ground truth boxes, v is the normalized difference in the aspect ratio between the predicted and ground truth boxes, α is the influence factor of v, and IOU is the ratio of the intersection to the union of the predicted and ground truth boxes.
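For reference, the standard published forms of the SoftPool output and the CIOU loss, consistent with the symbol definitions above, are:

```latex
\tilde{a} = \sum_{i \in S} w_i \, a_i
```

```latex
\mathcal{L}_{CIOU} = 1 - IOU + \frac{\rho^2\!\left(b, b^{gt}\right)}{c^2} + \alpha v,
\qquad
v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2,
\qquad
\alpha = \frac{v}{(1 - IOU) + v}
```

Here b and b^{gt} are the centers of the predicted and ground truth boxes, and w, h (w^{gt}, h^{gt}) are their widths and heights.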


Pig Tracking Method Based on Spatial State Correction
DeepSORT is an improvement upon the SORT object tracking algorithm, which defines tracking scenarios and cascade matching strategies using an eight-dimensional spatial state (u, v, γ, h, u̇, v̇, γ̇, ḣ) to achieve higher object tracking accuracy. Although DeepSORT has good tracking performance, rapid movements, mutual compression, or posture changes of pigs may cause the algorithm to fail in tracking them. In the task of counting pigs in corridors, new complete pigs cannot suddenly appear in the center of the video screen: they can only appear at the two ends of the screen. Therefore, if a new complete pig's ID appears in the center of the video, the ID must be incorrect and needs to be corrected. Based on these practical situations, we designed a pig tracking algorithm based on spatial state correction on the basis of the DeepSORT algorithm. The algorithm is implemented through the IDFind module, shown in Figure 3. The specific steps are as follows: (1) Read all pig IDs from the current and previous frames, and use the positions and IDs of pigs in both frames to identify new and lost pigs.
(2) Determine whether a new pig ID needs to be corrected: calculate R using Formula (4); if R > 0, there is a complete pig that is not close to the edge of the image, and the pig ID needs to be corrected.

where γ is the aspect ratio of the detection box, h is the height of the detection box, v represents the x value of the detection center (x, y), 120,000 is the maximum detection area for pig images, and 30,000 is one-quarter of the maximum detection area for pig images. The maximum detection area for pig images can be obtained with Formula (5) below. We consider that a pig has entered the image only when more than one-quarter of its body is visible. In practical use, the parameter 30,000 can be adjusted based on the specific situation. (3) Read all lost pig information, use the Euclidean distance to find the lost pig closest to the new pig, and replace the new pig's ID with the lost pig's ID.
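The correction rule in steps (1)-(3) can be sketched as follows. This is an illustrative re-implementation, not the authors' code: the edge margin value and the helper names are assumptions, while the area thresholds come from the text.

```python
import math

MAX_AREA = 120000   # maximum pig detection area given in the text
EDGE_MARGIN = 50    # pixels from the image border treated as "edge" (assumed)

def needs_correction(gamma, h, cx, img_w):
    """A new ID is suspect when the pig is complete (box area > MAX_AREA / 4)
    yet its center x is not near either end of the aisle image."""
    area = gamma * h * h          # box width = gamma * h, so area = gamma * h**2
    complete = area > MAX_AREA / 4
    interior = EDGE_MARGIN < cx < img_w - EDGE_MARGIN
    return complete and interior

def correct_id(new_center, lost_pigs):
    # reassign the nearest lost pig's ID (Euclidean distance between centers)
    nx, ny = new_center
    nearest = min(lost_pigs, key=lambda p: math.hypot(nx - p[1][0], ny - p[1][1]))
    return nearest[0]

# a complete pig appearing mid-frame gets its ID corrected to nearby lost pig 4
assert needs_correction(gamma=2.0, h=160.0, cx=1300, img_w=2688)
assert correct_id((1300, 700), [(4, (1320, 690)), (7, (100, 100))]) == 4
```

The sign convention matches the text's R > 0 test: `needs_correction` returning True plays the role of R > 0.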

Pig Counting Method Based on Frame Number and Detection Area Judgment
When pigs stay at the edge of a video for a long time, they may have fewer visible features or frequently enter and exit the video edge, which makes it easy for the DeepSORT algorithm to double-count pigs at the edge. To address the issue of pigs at the video edge being prone to incorrect counting, this paper proposes a pig counting algorithm based on frame number and detection area judgment. This method is implemented through the ID-Match module, as shown in Figure 3. The principle of this method is that when a pig passes through the aisle, the pixel area of its body in the video first gradually increases and then decreases. The steps of the pig counting method based on frame number judgment are as follows: (1) In order to record all pig information for counting calibration and judgment, a two-dimensional list List(id, n, γ, h, v) is used to save all pig information in chronological order after correction based on spatial state. Here, id represents the pig number, n represents the total number of times the pig has appeared in different frames of the current video, (γ, h) represent the aspect ratio and detection height of the pig bounding box, and v represents the x value of the detection center (x, y). (2) A judgment value N is calculated to determine whether the current pig needs to be counted. If N > 0, it is considered a valid count (the same pig has appeared in multiple frames with a complete body), and pig counting is performed. Then, the information of the counted pig is recorded using List1(id, n, v).
where γ is the aspect ratio of the detection box, h is the height of the detection box, n represents the total number of times the pig appears in different frames in the current video, z is the average frame number of pigs in the target tracking dataset, and s is the maximum image area of pigs in the target tracking dataset. (4) The change process of the v value in List and List1 is used to determine whether the counted pig has turned back. If a pig disappears from view and then turns back, the counting result needs to be adjusted according to the number of times that pig has turned back to ensure counting accuracy.

Experimental Environment and Training Parameters
The experiments in this study were conducted on a Windows 10 operating system with hardware consisting of an Intel Core i7-9700 CPU @ 3.00 GHz and an NVIDIA GeForce RTX 2080 Ti GPU. The main operating environment included PyTorch 1.6, Python 3.8, and CUDA 11.4. The training parameters for YOLOv5xpig and DeepSORTpig are shown in Table 2.

Evaluation Metrics for the MPC-YD Pig Counting Algorithm
The evaluation metrics in this study were divided into three parts: pig object detection, pig object tracking, and pig counting. For pig object detection, metrics including model size (Ms), mean average precision (including mAP:0.5 and mAP:0.95), recall (R), and FPS were used to evaluate the detection performance of the model.
TP represents the number of correctly identified pigs, FP represents the number of image regions incorrectly identified as pigs, FN represents the number of pig regions that are missed, and AP represents the area under the precision-recall curve (PR curve).
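In terms of these quantities, the standard definitions of precision, recall, and average precision are:

```latex
P = \frac{TP}{TP + FP},
\qquad
R = \frac{TP}{TP + FN},
\qquad
AP = \int_0^1 P(R)\, dR
```

mAP:0.5 averages AP at an IOU threshold of 0.5, while mAP:0.95 averages AP over IOU thresholds from 0.5 to 0.95.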
For pig target tracking, accuracy (Acc), multiobject tracking accuracy (MOTA), and multiobject tracking precision (MOTP) were used as indicators to evaluate the model. Acc was used to evaluate the model's accuracy in recognizing pigs, MOTA was used to evaluate the accuracy of the tracker's continuous tracking results, and MOTP was used to evaluate the accuracy of the tracker's predicted target positions.
In Equation (10), TP represents the number of correctly identified pigs, FP represents the number of image regions incorrectly identified as pigs, TN represents the number of correctly identified nonpig regions, and FN represents the number of pig regions that are missed. In Equation (11), t denotes the tth frame of the image, m_t represents the number of missed detections in the tth frame, fp_t the number of false positives, mme_t the number of mismatches, and g_t the number of correct detections in the tth frame. In Equation (12), i refers to the matching result number, d_t^i represents the distance error between the matched target in the tth frame and its predicted correct position, and c_t represents the number of matches in the tth frame.
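For reference, the standard forms of Equations (10)-(12), consistent with the symbol definitions here, are:

```latex
Acc = \frac{TP + TN}{TP + TN + FP + FN}
\tag{10}
```

```latex
MOTA = 1 - \frac{\sum_t \left(m_t + fp_t + mme_t\right)}{\sum_t g_t}
\tag{11}
```

```latex
MOTP = \frac{\sum_{i,t} d_t^i}{\sum_t c_t}
\tag{12}
```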
For pig counting results, mean absolute error (MAE) and the coefficient of determination (R²) were used as indicators to evaluate the accuracy of the counting. The relevant calculation formulas are as follows: where y_i represents the true value, ŷ_i represents the predicted value, and ȳ represents the mean of all true values.
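Both metrics are straightforward to compute; a minimal self-contained sketch (the counts below are illustrative, not experimental data):

```python
def mae(y_true, y_pred):
    # mean absolute error between true and predicted counts
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # coefficient of determination: 1 - SS_res / SS_tot
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

true_counts = [10, 12, 15, 20]
pred_counts = [10, 13, 15, 19]
assert mae(true_counts, pred_counts) == 0.5
assert 0.96 < r2(true_counts, pred_counts) < 0.97
```

R² approaches 1 as predictions approach the true counts; MAE is in units of pigs.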

Training of the MPC-YD Pig Counting Algorithm
The training curve of YOLOv5xpig with mAP of 0.95 is shown in Figure 6. From the graph, it can be observed that the model detection accuracy rapidly improves before 100 iterations, while the improvement rate gradually slows down between 100 and 200 iterations and starts to converge. Finally, at 500 iterations, the model detection accuracy tends to stabilize, and the model achieves the desired performance.

The training effect of the DeepSORTpig algorithm on pig reidentification using the pig target tracking dataset is shown in Figure 7. From the graph, it can be observed that the model error rapidly decreases before 20 iterations, albeit with large fluctuations. This is because there are significant differences in image size, shape, and pig posture in the pig target tracking dataset (Figure 2). From the loss curve and error curve, it can be seen that the model basically converges after 40 iterations; at this point, the model accuracy (Acc) reached 95.32%, indicating that it can accurately extract pigs' phenotype features.
Figure 7. Training of the pig reidentification network, including the loss curve and error curve training results.

Analysis of Pig Object Detection Accuracy
In order to analyze the performance of the pig object detection algorithm, we compared YOLOv5xpig with other networks (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, and Faster R-CNN), as shown in Table 3. From Table 3, it can be seen that although the YOLOv5xpig model is larger, its average precision (mAP:0.5 and mAP:0.95) is higher than that of YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, and Faster R-CNN. Moreover, the average precision (mAP:0.95) of YOLOv5xpig is 7.07% higher than that of YOLOv5x, and the average precision (mAP:0.5) is 3.12% higher than that of YOLOv5x. In addition, although the 33 FPS of YOLOv5xpig is relatively low, it is nevertheless higher than the 25 fps frame rate of the camera used in this study, which fully meets production requirements. The YOLOv5xpig algorithm can better detect local body parts of pigs than YOLOv5x, as shown in Figure 8. From Figure 8, it can be seen that YOLOv5xpig can detect piglet heads, which YOLOv5x cannot. Furthermore, because YOLOv5x cannot detect the piglet heads, it cannot identify pig No. 3 in the video, whereas pig No. 3 can be successfully tracked using YOLOv5xpig. Therefore, Figure 8 shows that the YOLOv5xpig algorithm enables the MPC-YD algorithm to track pigs with only local body features, indicating that improving the detection of local body parts of pigs is very meaningful for improving pig tracking performance.


Analysis of Pig Object Tracking Accuracy
In order to verify the improved DeepSORT algorithm's performance in pig tracking, we compared the improved DeepSORT algorithm with the original algorithm, as shown in Table 4. From Table 4, it can be seen that the MOTA of the improved DeepSORT algorithm improved by 1.2% compared with the original algorithm, while the MOTP improved by only 0.34%. This is because the pig tracking method based on spatial state correction added in this study reduces the probability of target loss during pig tracking, thus improving the continuity of pig tracking and resulting in a greater improvement in MOTA. The comparison of the DeepSORTpig algorithm with and without the pig tracking method based on spatial state correction is shown in Figure 9. From Figure 9, it can be seen that the original algorithm lost track of pig No. 4 due to its rapid passage through the channel, resulting in pig No. 4 being misidentified as pig No. 5. However, according to the rules of the pig tracking method based on spatial state correction, if the bounding box area of pig No. 4 in the previous frame is large and the center of the bounding box is not on the edge of the image, it can be considered that pig No. 4 has not disappeared from the current image. The tracking of pig No. 4 can then be corrected based on the detection box and ID assignment. Finally, pig No. 4 is successfully maintained in the correct tracking state.
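The spatial-state-correction rule described above can be sketched as a simple plausibility check on a lost track's last bounding box. This is a minimal illustrative sketch, not the paper's implementation; the function name and the `MIN_AREA` and `EDGE_MARGIN` thresholds are assumptions introduced for the example.

```python
# Hedged sketch of the spatial-state-correction rule: a track that loses its
# match is kept alive (and re-matched to a detection) if its last box was
# large and its center was not on the image edge. Thresholds are assumed.

EDGE_MARGIN = 20   # px: distance from the border treated as "edge" (assumed)
MIN_AREA = 5000    # px^2: boxes smaller than this may truly have left (assumed)

def is_track_plausibly_present(prev_box, img_w, img_h,
                               min_area=MIN_AREA, edge_margin=EDGE_MARGIN):
    """Return True if a lost track should be corrected rather than deleted.

    prev_box: (x1, y1, x2, y2) of the track's last confirmed bounding box.
    """
    x1, y1, x2, y2 = prev_box
    area = max(0, x2 - x1) * max(0, y2 - y1)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    near_edge = (cx < edge_margin or cx > img_w - edge_margin or
                 cy < edge_margin or cy > img_h - edge_margin)
    # Large box away from the edge: the pig probably has not left the frame.
    return area >= min_area and not near_edge
```

Under this rule, a pig like No. 4, whose last box was large and centered in the channel, keeps its ID and is re-associated instead of spawning a new track.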

Although the pig counting method based on frame number cannot improve the accuracy of pig tracking by DeepSORTpig, it can improve the accuracy of pig counting, as shown in Figure 10. Due to the busy state of the slaughterhouse, pig No. 4, shown in Figure 10a, remained at the edge of the video and was constantly squeezed by other pigs, causing its ID to change from 4 to 26. This also resulted in the original counting algorithm's count changing from 19 to 24, while the actual correct count should have changed from 15 to 17. However, by introducing a pig counting algorithm based on the number of frames and detection area, as shown in Figure 10b, even though the ID of pig No. 4 changed from 4 to 24, this algorithm only counted pigs No. 21 and No. 23, resulting in the correct count of 17. This successfully prevented the occurrence of duplicate pig counting due to pig ID changes caused by squeezing and prolonged stay.
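One plausible reading of the frame-number and detection-area counting rule is that a track ID contributes to the count only after it has persisted for a minimum number of frames, with very small boxes ignored; a lingering pig whose ID changes late therefore does not immediately inflate the count. The sketch below is illustrative only: the class name and the `min_frames` and `min_area` thresholds are assumptions, not the paper's values.

```python
# Illustrative sketch of counting gated by frame persistence and detection
# area. A track ID is counted once, and only after it has been seen for
# `min_frames` frames with a sufficiently large box. Thresholds are assumed.

from collections import defaultdict

class FrameCountCounter:
    def __init__(self, min_frames=25, min_area=4000):
        self.frames_seen = defaultdict(int)  # frames each track ID has persisted
        self.counted_ids = set()             # IDs already added to the count
        self.min_frames = min_frames
        self.min_area = min_area

    def update(self, tracks):
        """tracks: iterable of (track_id, (x1, y1, x2, y2)) for one frame.
        Returns the running pig count after this frame."""
        for tid, (x1, y1, x2, y2) in tracks:
            if (x2 - x1) * (y2 - y1) < self.min_area:
                continue  # ignore small boxes squeezed against the image edge
            self.frames_seen[tid] += 1
            if (self.frames_seen[tid] >= self.min_frames
                    and tid not in self.counted_ids):
                self.counted_ids.add(tid)
        return len(self.counted_ids)
```

With such gating, a freshly reassigned ID (like pig No. 4 becoming No. 24) must persist before it could be counted, which suppresses duplicate counts from brief ID switches.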

Analysis of Pig Counting Results
In order to verify the counting accuracy in this experiment, a total of 1695 pigs in 94 videos were counted, producing an MAE of 1.03 and an R2 of 98.39%, as shown in Figure 11. From Figure 11, it can be seen that the error probability of the pig counting algorithm increases with the number of pigs, but the maximum error did not exceed eight pigs. This is because, as the number of pigs increases, crowding between pigs becomes more complex. At the same time, a 3D scatter plot of pig counting was established using the pig video time, actual number, and algorithmic counting results, as shown in Figure 12. From Figure 12, it can be seen that the probability of counting errors increases with increases in pig numbers and video duration. Moreover, when the video duration is short but the number of pigs is large, errors occur more easily, as crowding at the channel exits is also more likely, which increases the complexity of pig tracking due to the large number of pigs squeezing each other. However, considering Figures 11 and 12, it can be seen that when the number of pigs is no more than 10, the algorithm produced counting errors in only 4 videos out of 41, with an MAE of 0.14. The error values of these four videos were one, one, one, and three, which meets the requirements for verifying the accuracy of manual counting for small slaughterhouses.
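For reference, the two reported metrics can be computed from per-video ground-truth and predicted counts as below. This is a minimal sketch with invented example counts; it only shows the standard MAE and coefficient-of-determination formulas, not the paper's data.

```python
# Standard MAE and R^2 over per-video pig counts. The `truth`/`pred` values
# below are invented for illustration, not the paper's measurements.

def mae(y_true, y_pred):
    """Mean absolute error over paired counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

truth = [10, 15, 20, 25, 30]   # ground-truth counts per video (invented)
pred = [10, 14, 20, 26, 30]    # algorithm counts per video (invented)
print(mae(truth, pred))        # -> 0.4
print(r_squared(truth, pred))  # -> 0.992
```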

Figure 11. Pig counting two-dimensional image result. The yellow dots represent the counting results that match the ground truth, while darker dots indicate larger discrepancies between the counting results and the ground truth.

Figure 12. Pig counting three-dimensional image result. The yellow dots represent the counting results that match the ground truth, while darker dots indicate larger discrepancies between the counting results and the ground truth.


Counting Test in Different Breeding Environments
In order to verify the generalization of the counting algorithm, we tested pig breeding videos with different numbers of pigs for a duration of 30 s, as shown in Figure 13. From Figure 13, it can be seen that the algorithm correctly counted two, three, four, and five pigs, indicating that the algorithm can be used for counting in conventional breeding environments and has good generalization. In addition, because pigs in breeding environments can remain in the video for a long time, unlike in the aisle of a slaughterhouse, it was necessary to configure the MAX_AGE parameter so that the ID deletion operation is not executed and the MAX_AGE counter is reset to zero when triggered, in order to prevent pig tracking from being lost when a pig reaches the maximum frame number during counting in breeding environments.
Figure 13. Pig counting result in breeding environment.
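The MAX_AGE adjustment described above can be sketched as follows. In DeepSORT, a track is normally deleted once it goes `max_age` consecutive frames without a matched detection; in a breeding pen, the counter is reset instead, so a long-staying pig never loses its ID. The class and method names below are illustrative stand-ins, not the actual DeepSORT source.

```python
# Hedged sketch of the MAX_AGE handling for breeding environments. Class and
# attribute names are illustrative; only the reset-instead-of-delete logic
# follows the rule described in the text.

class BreedingTrack:
    def __init__(self, track_id, max_age=70):
        self.track_id = track_id
        self.max_age = max_age
        self.time_since_update = 0  # frames since the last matched detection
        self.deleted = False

    def mark_missed(self, breeding_mode=True):
        """Called on each frame in which this track has no matched detection."""
        self.time_since_update += 1
        if self.time_since_update > self.max_age:
            if breeding_mode:
                # Breeding pen: reset the age counter instead of deleting the
                # ID, so a pig that stays in view indefinitely is never lost.
                self.time_since_update = 0
            else:
                # Slaughterhouse aisle: keep the original deletion behavior,
                # since pigs pass through and leave the frame.
                self.deleted = True
```

This keeps the slaughterhouse configuration unchanged while making long-duration tracking in pens robust to the MAX_AGE trigger.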


Discussion
Although the MAE of the pig counting algorithm in this paper could reach 0.14 when the number of pigs was no more than 10, the application and promotion of this algorithm still has a lot of room for improvement, as the number of pigs in each batch entering the slaughterhouse cannot be fixed at below 10. Currently, the strategy for promoting the application of this algorithm is to split large groups of pigs into multiple groups of fewer than 10 pigs for multiple counting sessions, which ensures the accuracy of the pig counting. The counting effect of this algorithm under a smooth channel is shown in Video S1. In addition, when counting in a farming environment, the counting range of this algorithm is limited to no more than five pigs. If the number of pigs exceeds five, the small size of the farming troughs may result in a higher probability of counting errors due to pigs blocking each other while eating. Nevertheless, despite this limitation, the algorithm still possesses the ability to stably track pigs, providing important technical support for studying pig behavior patterns in farming environments. Therefore, in addition to addressing the existing shortcomings, the next step in improving this algorithm is to analyze pig behavior patterns by tracking them.

Conclusions
To solve the problem of low efficiency in manual pig counting and easy loss of pig tracking in video, we developed an MPC-YD pig tracking and counting method. In this algorithm, we introduced a method of embedding two SPP-nets with different feature block sizes into the YOLOv5x network to improve pig target detection accuracy, addressing the problem of difficult detection of local pig body parts. Furthermore, we added a pig reidentification network, a pig tracking method based on spatial state correction, and a pig counting method based on frame number judgment to the DeepSORT algorithm to reduce pig counting errors. Finally, the pig counting coefficient of determination (R2) of this model reached 98.14%, and it also demonstrated good tracking and counting effects in breeding environments.
It is worth mentioning that all the data used for training and validation in this method were collected from real production environments, and only one camera needs to be installed above the pig pen aisle to implement this method. Therefore, this method is easy to promote and apply and can provide technical support for intelligent pig counting in both breeding and slaughterhouse environments.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because they would reveal the operation of the slaughterhouse.