Review

Broiler Behavior Detection and Tracking Method Based on Lightweight Transformer

1 College of Engineering, South China Agricultural University, Guangzhou 510642, China
2 Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China
3 School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
4 Key Laboratory of Key Technology on Agricultural Machine and Equipment, South China Agricultural University, Ministry of Education, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 3333; https://doi.org/10.3390/app15063333
Submission received: 10 February 2025 / Revised: 11 March 2025 / Accepted: 14 March 2025 / Published: 18 March 2025
(This article belongs to the Special Issue Big Data and AI for Food and Agriculture)

Abstract

Detecting the daily behavior of broiler chickens allows early detection of irregular activity patterns and, thus, of problems in the flock. To address the slow detection speed, low accuracy, and poor generalization ability of traditional detection models in real breeding environments, we propose a chicken behavior detection method called FCBD-DETR (Faster Chicken Behavior Detection Transformer). The FasterNet network, based on partial convolution (PConv), replaces the ResNet18 backbone to reduce the computational complexity of the model and to increase its detection speed. In addition, we propose a new cross-scale feature fusion network to optimize the neck of the original model. These improvements led to a 78% decrease in the number of parameters and a 68% decrease in GFLOPs. The experimental results show that the proposed model surpasses traditional networks in the speed, accuracy, and generalization ability of broiler behavior detection: (1) the detection speed improves from 49.5 to 68.5 frames per second, 22.6 and 10.9 frames higher than Yolov7 and Yolov8, respectively; (2) mAP0.5 reaches 99.4%, and mAP0.5:0.95 increases from 84.9% to 88.4%; (3) combined with a multi-object tracking algorithm, the model successfully performs flock counting, behavior recognition, and individual tracking.

1. Introduction

According to the Agricultural Outlook 2023–2032 published by the Food and Agriculture Organization of the United Nations, poultry is projected to dominate meat production, accounting for half of all new meat production in the next decade, with broiler chickens playing a significant role in this sector [1], p. 193. Consequently, there is an increasing focus on the welfare of broiler breeding chickens [2]. Welfare quality criteria indicate that animal health is a crucial indicator for assessing effective welfare measures. For instance, positive behaviors, such as feeding, serve as indicators of health; their absence may suggest a negative emotional state [3,4]. Furthermore, prolonged lying is often indicative of discomfort in birds, as well as issues related to leg health and disease [5,6]. For example, lameness in broiler chickens typically leads to reduced activity levels, resulting in extended periods of lying [7]. Therefore, monitoring and analyzing the feeding, drinking, standing, and lying behaviors of broilers can facilitate the tracking of flock health status and enable the early detection and prevention of leg diseases, thereby minimizing significant losses within the flock [8,9].
Traditional disease detection in poultry flocks is typically performed by farmers and veterinarians through visual observation of poultry behavior and vocalizations [10,11]. However, in large-scale production environments, this method often fails to provide timely detection [12], potentially leading to significant losses for the entire flock [13]. Additionally, studies examining the effects of varying light intensities, colors, and feeding methods on chicken behavior [14,15,16] frequently rely on video recordings to manually quantify behavioral changes, an approach that is not only time-consuming and labor-intensive but also prone to statistical errors. To address the aforementioned issues, researchers have continued to explore more effective behavior detection methods. For instance, Li et al. (2017) [17] used a UHF RFID system to monitor the feeding and nesting behaviors of individual hens, Aydin (2017) [18] assessed lameness in broiler chickens using image processing algorithms, and Li et al. (2018) [19] employed a UHF RFID system to continuously observe the feeding behavior of individual broilers. Furthermore, owing to the ease of camera placement and minimal interference with the animals [20], deep learning-based animal behavior detection has emerged as a non-invasive, efficient, and cost-effective solution, widely applied to study the behavior, movement, and activities of animals in group environments. For example, Xiao et al. (2023) [21] used the YOLOX algorithm to detect key behaviors (eating, drinking, resting, and stress) in Magang geese. Wang (2020) [22] used the YOLOv3 algorithm to identify the behavior of caged laying hens. Qi et al. (2024) [23] used a lightweight YOLOv4 network to detect dead chickens in a large-scale caged layer environment. Gu et al. (2022) [24] applied the YOLOv5 detection algorithm to identify the daily behaviors of laying ducks, and Guo et al. (2022) [25] utilized a CNN model to detect eating, drinking, standing, and resting behaviors in broilers of various ages. However, broiler chickens exhibit more dynamic behaviors than laying hens, with frequent changes in morphology and orientation, and higher stocking densities present greater challenges for individual behavior detection, which has often limited studies of broiler behavior to either a single behavior of an individual or a single behavior of multiple individuals. This study therefore focuses on the detection and tracking of multiple behaviors in group-housed broilers to provide a foundation for monitoring broiler health and welfare and for advancing research on broiler behavior.
In practical agricultural applications, complex scenes and changes in target scale pose challenges for object detection. Traditional CNN-based detectors have limited ability to adapt to individual differences [26], whereas Transformer-based detectors capture global information better [27] and improve generalization [28]. Combining a CNN backbone with a Transformer detector has therefore become increasingly popular [29], as it both simplifies the pipeline and improves performance. Given these advantages, a Transformer detector is well suited to broiler behavior detection. RT-DETR, the first real-time end-to-end detector, achieves better accuracy and speed than YOLO on the COCO dataset. However, its computational requirements (57 GFLOPs, 20 M parameters) pose challenges for deployment and application in broiler farming scenarios.
In summary, to address these challenges and promote the development of the broiler industry and related research, this paper proposes a broiler behavior detection model that offers a faster detection speed, higher accuracy, enhanced generalization ability, and improved deployment efficiency through the optimization of RT-DETR. The main contributions of this paper are summarized as follows:
  • We propose replacing the original ResNet and HG series backbone networks with a lightweight backbone network, FasterNet. By implementing PConv, FasterNet minimizes redundant computation and memory access, thereby enhancing the efficiency of spatial feature extraction. This approach reduces the model's demand on device computing power and improves detection speed.
  • To enhance information interaction within the model, we propose a novel cross-scale feature fusion network that includes a feature enhancement module (DDConv) and a feature fusion module (MSDCC). The feature enhancement module expands the convolution kernel, thereby increasing the receptive field of the feature prediction layer. Concurrently, the feature fusion module improves the model’s capacity to capture broader contextual information through residual connections, resulting in a richer gradient flow that compensates for the inherent limitations of feature extraction. This enhancement enables the model to more accurately comprehend the input data and generate more informative and discriminative representations, ultimately improving classification accuracy and generalization ability.
  • The proposed combination of FCBD-DETR and Bytetrack [30] effectively facilitates the recognition, tracking, and counting of chicken behaviors. The FCBD-DETR model demonstrates outstanding performance across various downstream tasks. The implementation of these technologies can enhance the management efficiency of chicken farms, optimize resource allocation, and promote the sustainable development of smart farming practices.

2. Materials and Methods

2.1. Data Collection

From 25 May to 14 June 2023, a total of 100 yellow-striped broilers were raised at the Chick Feeding Test Site of the Institute of Animal Husbandry, Guangdong Academy of Agricultural Sciences. The rearing density was set at 0.04 m² per broiler, with 1 trough, 4 dropper heads, and 25 chickens per cage. A dimmable LED bulb was installed on the ceiling, maintaining a light intensity of 10 lx. Lighting was scheduled to be on from 01:00 to 21:00. The broilers were placed on wood chips on the floor and had unrestricted access to food and water.
We recorded the behavior of broiler chickens in real time using 24 h video capture. Given that the size and shooting angle of the target broiler chickens can interfere with the effectiveness of the behavior recognition model, we aimed to further evaluate the model's generalization and robustness, as well as enhance its resistance to interference. During the experiment, we continuously captured video data of broiler chickens at two different farming densities over a span of 20 days from two distinct heights and angles. These data were stored as MP4 files on an external hard drive. The camera employed was a Logitech C1000e 4K webcam, with the video resolution set at 1280 × 720 pixels and a sampling rate of 24 frames per second. From this 20-day video data, we created two datasets corresponding to the different heights and angles, with the main information of each dataset presented in Table 1 and Figure 1.

2.2. Methods

In this study, we constructed a chicken behavior recognition dataset using images extracted from 20 days of video data. After eliminating redundant and feature-blurred images, we selected 2000 images from the chicken behavior data collected between 7 June and 14 June to create Dataset A. Additionally, we selected 400 images from the chicken behavior data collected between 1 June and 7 June to form Dataset B. This temporal division allows us to evaluate the stability of the model’s recognition performance when confronted with chickens of varying ages and growth stages. Dataset A is divided into a training set, a validation set, and test set A in a ratio of 6:2:2, which is employed to evaluate the model’s training results. Dataset B serves as test set B and is specifically designed to assess the model’s discriminative power across various contexts. The diversity of the data samples is directly related to the model’s generalization ability and robustness, significantly impacting its training efficacy. To enhance sample diversity, data augmentation techniques were applied to the images in Dataset A, expanding the training set to 4500 images and the validation and test sets to 1875 images each. These techniques included random angle rotation, random flipping, random occlusion, random brightness adjustment, and noise addition, as illustrated in Figure 2. Dataset B remains unprocessed.
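As an illustration of the augmentation operations listed above, the following is a minimal sketch using torchvision; the magnitudes and probabilities are assumptions rather than the settings used in this study, and in a detection setting the bounding boxes must be transformed together with the images.

```python
import torch
from torchvision import transforms

def add_gaussian_noise(img, std=0.02):
    # Additive Gaussian noise on a tensor image in [0, 1].
    return torch.clamp(img + torch.randn_like(img) * std, 0.0, 1.0)

# Illustrative pipeline covering the listed operations: random angle rotation,
# random flipping, random brightness adjustment, noise addition, and random
# occlusion (via RandomErasing). Parameter values are placeholders.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3),
    transforms.ToTensor(),
    transforms.Lambda(add_gaussian_noise),
    transforms.RandomErasing(p=0.3, scale=(0.02, 0.1)),
])
```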

2.3. Data Definition and Annotation

In this study, images of broiler chickens were manually annotated using the LabelImg annotation tool, with each annotation box representing the smallest bounding rectangle that captures the observed behavior of the broiler. The annotated behaviors include drinking, eating, standing, and lying down. Table 2 presents the criteria for classifying these behaviors, while Figure 3 provides a schematic representation of the classification. Figure 4 illustrates the number of examples and the size and location distribution of the labeled boxes for each category in the training set. For details of the other datasets, refer to Table 3.
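By default, LabelImg saves annotations as Pascal VOC-style XML; the sketch below shows one way such a file could be read into (label, box) pairs. The file layout and class names ("drink", "feed", "stand", "lying") are assumptions for illustration, not the project's actual annotation schema.

```python
import xml.etree.ElementTree as ET

def load_annotations(xml_path):
    # Parse one Pascal VOC-style XML file produced by LabelImg into
    # (behavior label, bounding box) pairs.
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        label = obj.find("name").text  # e.g. "drink", "feed", "stand", "lying"
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(bb.find(tag).text))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, (xmin, ymin, xmax, ymax)))
    return boxes
```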

2.4. RT-DETR

RT-DETR provides a robust framework, as illustrated in Figure 5. Its backbone network employs a convolutional neural network (CNN) to extract outputs from three scale layers, specifically S3, S4, and S5, with S5 serving as the input for the Transformer encoder AIFI. The mathematical procedure is detailed in Equations (1) and (2). S3, S4, and the encoded F5 are then converted into an image feature sequence through the cross-scale feature fusion module (CCFM), with the corresponding mathematical process presented in Equation (3). Subsequently, a fixed number of image features are selected as the initial object queries for the decoder via IoU-aware query selection. Ultimately, the decoder, together with auxiliary prediction heads, iteratively optimizes the object queries to produce bounding boxes and confidence scores. This approach transforms the detection task into an unordered sequence output, significantly mitigating the influence of variations in size, shape, and texture among target individuals on the model's detection performance. Therefore, adopting RT-DETR as the benchmark model for broiler behavior detection is a viable strategy.
$Q = K = V = \mathrm{Flatten}(S_5)$ (1)
$F_5 = \mathrm{Reshape}(\mathrm{Attn}(Q, K, V))$ (2)
$\mathrm{Output} = \mathrm{CCFM}(\{S_3, S_4, F_5\})$ (3)
where Attn denotes multi-head self-attention and Reshape refers to restoring the feature shape to that of S5, which is the inverse operation of Flatten.
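As a minimal illustration of Equations (1) and (2), the sketch below flattens an S5 feature map into a token sequence, applies multi-head self-attention with Q = K = V, and reshapes the result back to the spatial layout of S5. The positional encoding and feed-forward sublayer of the actual AIFI encoder are omitted, and the channel/head sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AIFISketch(nn.Module):
    # Flatten -> self-attention (Q = K = V) -> Reshape, as in Equations (1)-(2).
    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, s5):                           # s5: (B, C, H, W)
        b, c, h, w = s5.shape
        tokens = s5.flatten(2).permute(0, 2, 1)      # Flatten: (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)   # Attn(Q, K, V)
        return out.permute(0, 2, 1).reshape(b, c, h, w)  # back to the shape of S5
```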
However, RT-DETR still faces several challenges in broiler behavior detection, including the following:
(1)
Compared to other lightweight detection models, the Transformer requires the storage of large-scale key-value pairs due to its fully connected attention mechanism. This requirement can lead to memory limitations and slow detection speeds, particularly given the restricted computing resources available at actual breeding sites.
(2)
The single-neck feature structure of RT-DETR often results in the loss of contextual semantic information. Chickens possess similar appearance characteristics and tend to congregate, which poses significant challenges for distinguishing individual behaviors. This is especially problematic during detection and tracking, where mutual occlusion between individuals can easily result in target loss.
(3)
The varying shooting equipment utilized by different breeding institutions and researchers leads to significant discrepancies in the acquired images. Additionally, slight variations in shooting angles and heights can cause substantial changes in the image background. Consequently, the model must demonstrate strong generalization capabilities to effectively manage variable scenes and target features.

2.5. Modified FCBD-DETR

To address the challenges faced by RT-DETR in detecting broiler behavior, this study proposes an efficient multi-scale FCBD-DETR model for chicken behavior detection. The structure of the proposed model is illustrated in Figure 6. Enhancements to the original network structure include the use of FasterNet as the backbone network, which reduces redundant computation and memory access, thereby addressing the issue of slow detection speed. In the hybrid encoder section, we have replaced the conventional convolution (Conv) and Fusion modules with our developed DDConv and MSDCC modules, respectively. DDConv retains a portion of the shallow image information while acquiring deeper semantic features through channel splitting, which deepens the convolution layer. This facilitates the capture of global information, resulting in more detailed and comprehensive image features, and provides deeper, more advanced, and richer semantic features for the Transformer, thereby enhancing the network's ability to learn and distinguish various features. A more detailed explanation of these improved modules is provided below.

2.5.1. FasterNet

To deploy an accurate broiler behavior detection model in chicken farms with limited computing resources, we utilize FasterNet to minimize the computational load. Chen et al. (2023) [31] developed FasterNet by leveraging the low-FLOP characteristics of PConv. The architecture comprises four stages of FasterNet blocks, with each block containing one PConv layer and two 1 × 1 Conv layers. The structure of PConv is illustrated in Figure 7, and its FLOPs are given by Equation (4). Conventional convolution is applied only to c_p channels of the feature map with input dimensions h × w × c, leaving the other channels unchanged. Consequently, when c_p accounts for 1/4 of the channels, the FLOPs are merely 1/16 of those of conventional convolution. The batch normalization (BN) layer is employed to accelerate training and enhance accuracy, while the ReLU layer serves as the activation function to expedite training and mitigate the vanishing gradient problem, resulting in lower latency while preserving feature diversity. To balance detection accuracy and scalability, we select FasterNet-t0 as the backbone network of our model. This choice effectively reduces the number of FLOPs, which in turn decreases memory access and latency. These enhancements lower the detection time per image, yielding higher FPS values and enabling efficient detection without reliance on powerful hardware. Equation (4) is as follows:
$f_{\mathrm{FLOPs}} = h \times w \times c_p^2 \times k^2$ (4)
where $f_{\mathrm{FLOPs}}$ represents the number of floating-point operations, h and w represent the height and width of the feature map, respectively, $c_p$ represents the number of channels involved in the convolution, and k represents the size of the convolution kernel.
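A minimal sketch of partial convolution as described above: a regular convolution is applied to only the first c_p channels while the remaining channels pass through untouched, so that with c_p = c/4 the layer's FLOPs are roughly 1/16 of a full convolution. This is an illustrative re-implementation, not the FasterNet source code.

```python
import torch
import torch.nn as nn

class PConvSketch(nn.Module):
    # Partial convolution: convolve the first c_p channels, pass the rest through.
    def __init__(self, channels, partial_ratio=0.25, kernel_size=3):
        super().__init__()
        self.cp = int(channels * partial_ratio)
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):                      # x: (B, C, H, W)
        x1, x2 = x[:, :self.cp], x[:, self.cp:]
        return torch.cat((self.conv(x1), x2), dim=1)
```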

2.5.2. DDConv and MSDCC Module

It is unrealistic to build a powerful model solely by increasing the width and depth of the network to address the issues of poor detection and generalization capabilities. Instead, we aim to adopt a lightweight design that reduces the computational burden while enhancing network performance. Depthwise separable convolution (DWConv) is a widely recognized lightweight optimization method for convolutional networks, primarily consisting of two processes, namely depthwise convolution and pointwise convolution (PWConv). Depthwise convolution operates by convolving each channel independently with a single convolution kernel. Compared to conventional convolution, this approach effectively minimizes redundant calculations and floating point operations (FLOPs). However, the exclusive reliance on channel-wise convolution hampers the information connectivity between channels, leading to suboptimal feature learning and reduced accuracy. Consequently, PWConv is typically employed to increase the number of channels and facilitate linear combinations among them, thereby enhancing the integration of information across channels and improving the model’s expressive capacity. The memory access patterns of DWConv and conventional convolution are illustrated in Equations (5) and (6). It is observed that when using PWConv to expand the number of channels, the condition c′ > c arises, resulting in higher computational costs compared to standard convolution. This indicates that the inclusion of pointwise convolution in depthwise separable convolution necessitates greater memory resources. To address the high computational load associated with pointwise convolution, we draw on the shuffle concept proposed in previous studies [32,33] and design a dual-depth fused convolution block, which we refer to as DDConv. Its structure is shown in Figure 8. Equations (5) and (6) are as follows:
$h \times w \times 2c' + k^2 \times c' \approx h \times w \times 2c'$ (5)
$h \times w \times 2c + k^2 \times c^2 \approx h \times w \times 2c$ (6)
where h and w denote the height and width of the feature map, respectively, c denotes the number of channels for ordinary convolution, and c’ denotes the number of channels for pointwise convolution.
We first apply an ordinary convolution to reduce the number of channels to one-quarter of the original. The initial depthwise convolution uses a 5 × 5 kernel and is followed by a mixed-strategy shuffle that rearranges the channels within each group. This shuffling interleaves channels from different groups, enhancing the information flow between the depthwise convolution channels and the ordinary convolution channels and thereby improving the model's feature representation capability. A second depthwise convolution is then applied to the shuffled features to extract deeper features, which are concatenated with the shuffled features to produce the final output, as sketched below. This process integrates the feature information of the ordinary convolution into the two depthwise convolution branches, further reducing the computational load and strengthening the information connection between channels. DDConv thus retains a small portion of the initial semantic information while acquiring broader and deeper semantic features, ultimately enhancing the model's feature extraction capability and improving detection accuracy.
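The following is a rough sketch of this idea under our reading of the description (channel reduction, a 5 × 5 depthwise convolution, a channel shuffle, a second 5 × 5 depthwise convolution, and concatenation); the exact channel splits and layer order of the DDConv block in Figure 8 may differ.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # Interleave channels from different groups to mix their information.
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class DDConvSketch(nn.Module):
    # Illustrative dual-depthwise block: 1x1 reduction, 5x5 depthwise conv,
    # channel shuffle, second 5x5 depthwise conv, then concatenation.
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        mid = out_ch // 2                      # assumes an even output channel count
        self.reduce = nn.Conv2d(in_ch, mid, 1, bias=False)
        self.dw1 = nn.Conv2d(mid, mid, 5, stride=stride, padding=2, groups=mid, bias=False)
        self.dw2 = nn.Conv2d(mid, mid, 5, padding=2, groups=mid, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = channel_shuffle(self.dw1(self.reduce(x)))   # first depthwise pass + shuffle
        z = self.dw2(y)                                 # second, deeper depthwise pass
        return self.act(self.bn(torch.cat((y, z), dim=1)))
```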
Aggregation often results in significant occlusions within high-density chicken rearing environments. The feature fusion module (Fusion, as shown in Figure 9) of the original model possesses a singular structure, which leads to a deficiency in contextual semantic information, resulting in missed or false detections. To address this issue, a multi-scale feature fusion module, termed MSDCC, is proposed. Figure 9 illustrates both the MSDCC and the original Fusion structure. MSDCC is designed to extract diverse features from multi-layer inputs and to effectively fuse these features. The incorporation of two residual connections enhances the module’s capacity to capture broader contextual information, thereby facilitating richer gradient flows. This enhancement mitigates the limitations associated with feature extraction and enables the model to interpret input data with greater accuracy, ultimately yielding a more informative and discriminative representation. The proposed method significantly enhances both classification accuracy and generalization ability.
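As a very rough sketch of this residual-fusion pattern, the block below concatenates and projects two same-sized feature maps, refines them with a depthwise 5 × 5 branch (standing in for the DDConv block) and a 3 × 3 branch, and adds two residual connections; the real MSDCC module in Figure 9 differs in its internal layout.

```python
import torch
import torch.nn as nn

class MSDCCSketch(nn.Module):
    # Illustrative fusion block with two residual connections.
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, 1, bias=False)
        self.dw = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x_low, x_high):              # both: (B, C, H, W)
        fused = self.proj(torch.cat((x_low, x_high), dim=1))
        y = fused + self.dw(fused)                 # first residual connection
        return y + self.refine(y)                  # second residual connection
```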

2.6. Evaluation Metrics and Experimental Environment

2.6.1. Evaluation Metrics

In this paper, five metrics are employed to evaluate the model: model parameters (Params), giga floating-point operations (GFLOPs), and frames per second (FPS), which reflect the model's storage requirements, computational complexity, and detection speed, respectively; and mAP50 (mean average precision at an IoU threshold of 0.5) and mAP50-95 (mean average precision across IoU thresholds from 0.5 to 0.95), which assess the model's detection performance. Equations (7) and (8) are as follows:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (7)
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (8)
where TP is the number of positive samples predicted as positive samples, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative samples. Equations (9) and (10) are as follows:
$AP = \int_{0}^{1} \mathrm{Precision}(\mathrm{Recall}) \, d\,\mathrm{Recall}$ (9)
$mAP = \dfrac{1}{N} \sum_{i=1}^{N} AP_i$ (10)
where AP [35] is the average precision, mAP is the mean average precision, and N is the number of categories.
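A small sketch of how these metrics can be evaluated in practice; the AP function uses the common precision-envelope approximation of the integral in Equation (9) and is illustrative rather than the exact evaluation code used in this study.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    # Equations (7) and (8).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    # Approximate the integral of Equation (9) over the precision-recall curve
    # (points sorted by increasing recall), using a monotone precision envelope.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    # Equation (10): average AP over the N behavior classes.
    return sum(ap_per_class) / len(ap_per_class)
```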

2.6.2. Experimental Environment

This experimental environment is based on a 64-bit Windows 11 operating system, equipped with 16 GB of RAM and a 12th Gen Intel(R) Core(TM) i5-12400F processor operating at 2.50 GHz. The graphics card utilized is an NVIDIA RTX 3060, which features 12 GB of video memory. GPU acceleration is provided by CUDA 11.6, and the training is conducted using the deep learning framework PyTorch 1.13.1. The training parameter settings are detailed in Table 4.

3. Results and Discussion

3.1. Backbone Improvements

So far, the official RT-DETR release provides seven base models. Notably, the -R101, -X, and -H models possess approximately twice the number of parameters and floating-point operations of the -R18, -R34, -R50, and -L models, which gives the latter a considerable advantage in deployment efficiency. This study aims to enhance the deployment efficiency of the RT-DETR algorithm while preserving high detection performance. We trained models on dataset A using the -R18, -R34, -R50, -L, and FasterNet backbones, and subsequently compared their performance on the two test sets, as detailed in Table 5. In selecting the backbone network, we prioritized the influence of the number of parameters and complexity on model efficiency, ultimately choosing FasterNet for its fewer parameters and lower computational requirements. As indicated in Table 5, FasterNet achieves relatively low mAP50 and mAP50-95 scores on the two test sets; however, it comprises only 10.8 M parameters and 28.5 GFLOPs of computation, significantly lower than the other backbones. In contrast, while ResNet50 demonstrates high accuracy across the test sets, it contains 36 M parameters and 136 GFLOPs, which restricts its applicability in low-resource settings. Therefore, employing FasterNet as the backbone network is advantageous for reducing model complexity, even if it results in a slight decrease in accuracy. The following sections therefore concentrate on optimizing the network architecture to further enhance the model's generalization ability and accuracy, aiming to achieve a better balance between performance and resource constraints and ultimately providing an efficient solution for object detection in embedded scenarios.

3.2. DDConv and MSDCC Ablation Experiments

It has long been established that the performance of a model can be enhanced by increasing its depth, width, input resolution, bottleneck ratio, and group width. Ding et al. (2022) [36] demonstrated that expanding the effective receptive field of a convolutional neural network (CNN) by enlarging its convolution kernels can also improve performance; however, this approach results in a steep increase in the number of parameters and floating-point operations. Chollet (2017) and Howard et al. (2017) [37,38] showed that this drawback can be significantly mitigated through depthwise convolution (DW). Building on this concept, we propose DDConv, which expands the receptive field of the model by applying depthwise convolution with a 5 × 5 kernel twice after channel splitting. Figure 10 presents a schematic representation of the receptive field before and after the improvement. In each image, the extent of the dark (green) region indicates the area influencing the output pixel; a darker region signifies that the corresponding input pixel contributes more to the network output. Consequently, a more widely distributed dark area reflects a larger effective receptive field (ERF) [39]. DDConv therefore exhibits a larger receptive field than the original convolution. Additionally, Table 6 reports the high-contribution area ratio for different thresholds t before and after the improvement.
The threshold t represents the percentage of the highest contribution scores considered when calculating the high-contribution area ratio r. For example, a threshold of t = 20% means that only the pixels with the top 20% of contribution scores are included when determining the area ratio r. In Table 6, the effective receptive field of the optimized downsampling layer demonstrates a significant enhancement at low t values (20% and 30%), achieving 2.3% and 4.4%, respectively, a substantial improvement over the original downsampling layer's 1% and 2.3%. These results indicate that the new module enhances the model's capacity to capture local information. Furthermore, comparable performance (10.4%) is sustained at the medium t value (50%), effectively preserving robust global feature extraction capabilities. Although there is a slight reduction in the receptive field at an extremely high t value (99%), the model exhibits a more balanced capacity to capture varying levels of information, which is crucial for chicken detection scenarios that require simultaneous processing of local details and global semantic information. Therefore, we selected DDConv as the downsampling layer and further optimized the model by designing MSDCC.
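The measurement behind Figure 10 and Table 6 can be sketched as follows: the absolute input gradient of a central output activation serves as a per-pixel contribution map, and r is taken as the smallest fraction of pixels whose cumulative contribution reaches t of the total, in the spirit of [36,39]. The feature-extractor interface and aggregation details below are assumptions, not the paper's exact procedure.

```python
import numpy as np
import torch

def erf_contribution_map(feature_extractor, image):
    # image: (1, 3, H, W); feature_extractor is assumed to return a (1, C, h, w) map.
    image = image.clone().requires_grad_(True)
    feat = feature_extractor(image)
    feat[0, :, feat.shape[2] // 2, feat.shape[3] // 2].sum().backward()
    return image.grad.abs().sum(dim=1)[0]          # (H, W) contribution map

def high_contribution_area_ratio(contrib, t=0.2):
    # Smallest fraction of input pixels whose cumulative contribution reaches t of the total.
    scores = np.sort(contrib.detach().cpu().numpy().ravel())[::-1]
    cumulative = np.cumsum(scores)
    k = int(np.searchsorted(cumulative, t * cumulative[-1])) + 1
    return k / scores.size
```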
Table 7 demonstrates that the introduction of MSDCC results in lower FLOPs and a reduced number of parameters. On test set A and test set B, the mAP50 and mAP50-95 metrics of the proposed model outperform those of the other variants. This not only indicates the high accuracy and robustness of the proposed model on these test sets but also further validates the effectiveness of DDConv and MSDCC in enhancing the model's performance.

3.3. Comparison Experiments of Different Models

To further validate the superiority of the proposed model, a comparative experiment on chicken behavior detection was conducted against traditional object detection networks. The experimental results are presented in Table 8. The proposed method achieves an mAP50 of 99.4% on test set A, an improvement of 1.2%, 0.8%, 0.1%, 0.3%, and 0.8% over SSD, Faster-RCNN, Yolov7, Yolov8m, and the original model, respectively. Its computational complexity is only 13.3%, 9%, 17.7%, 23.2%, and 58.6% of these five algorithms, while the number of parameters is reduced by 63.0%, 93.5%, 75.9%, 65.6%, and 55.8%, respectively. Furthermore, the proposed model exhibits excellent detection speed, achieving an FPS of 68.5, higher than all of the compared models except the lightweight SSD. Compared to the commonly used object detection algorithms listed in the table, the proposed algorithm demonstrates substantial advantages in accuracy, computational complexity, number of parameters, and detection speed.

3.4. Fine-Tuning Experiment

Although FCBD-DETR demonstrates superior performance compared to other models, particularly in operational efficiency, it still exhibits certain limitations. Specifically, the detection accuracy of the proposed model on test set B is lower than that on test set A. To meet practical application requirements, we therefore conducted additional experiments on test set B to boost model performance. Bommasani et al. (2021) [40] proposed that fine-tuning pre-trained weights can improve a model's adaptability to downstream tasks. Consequently, fine-tuning experiments were performed on test set B to further augment the model's detection capabilities.
The objective of the fine-tuning experiment was to determine whether optimal model weights can be achieved by utilizing existing pre-trained weights to enhance the object detection of chicken behaviors, while minimizing the usage of new training data and reducing the number of training cycles. The specific methodology is as follows: Initially, from the 400 images in test set B, 50 images are randomly selected to form the fine-tuning validation set, while another 50 images constitute the fine-tuning test set. The remaining 300 images are divided into 3 fine-tuning training sets, containing 100, 200, and 300 images, respectively. Subsequently, each training set is trained for 100, 150, and 200 epochs, utilizing the pre-trained weights accordingly. Finally, the fine-tuning test set is employed to evaluate the model’s detection performance under varying amounts of training data and training durations.
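A minimal sketch of this protocol under assumed interfaces: the weights pre-trained on dataset A are loaded and training continues on the small fine-tuning set for a limited number of epochs. `build_model`, `finetune_loader`, and the checkpoint path are placeholders, not names from the paper's code, and the loss interface is an assumption.

```python
import torch

def finetune(build_model, finetune_loader, pretrained_path="fcbd_detr_datasetA.pth",
             epochs=100, lr=1e-4, device="cuda"):
    # Start from weights pre-trained on dataset A, then adapt to the new scene.
    model = build_model().to(device)
    model.load_state_dict(torch.load(pretrained_path, map_location=device))
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=1e-4)
    model.train()
    for _ in range(epochs):
        for images, targets in finetune_loader:
            loss = model(images.to(device), targets)   # assume the model returns its training loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```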
Figure 11 illustrates the results of the fine-tuning experiments conducted with RT-DETR and FCBD-DETR. The results indicate that, even without pre-trained weights, our model achieves a clear improvement in mAP at both 100 and 150 epochs. Notably, when constrained to 100 training images, FCBD-DETR still performs well when pre-trained weights are utilized. After 200 epochs with 300 training images, the model achieves an mAP50 of 97.6%, in contrast to 95.0% without pre-trained weights. The fine-tuning experiments reveal that incorporating pre-trained weights allows FCBD-DETR to converge faster and to achieve favorable results even when the quantity of newly labeled data is limited or the training duration is shortened, thereby significantly reducing the training cost.
Figure 12 illustrates the visualization results of FCBD-DETR before and after fine-tuning on test set B. Prior to fine-tuning, the model exhibited issues such as missed detections, false detections, and inaccurate positioning. However, following fine-tuning, the model demonstrated significant improvements in new scenarios, effectively enhancing its ability to detect chicken behavior. Specifically, fine-tuning based on pre-trained weights enables the model to adapt quickly to new backgrounds at a low cost, thereby facilitating effective object-background differentiation. Consequently, FCBD-DETR shows considerable potential for adaptive and accurate detection of chicken behavior in complex breeding environments.
Through these experiments, we not only verify the adaptability of the FCBD-DETR model across various environments but also demonstrate the significant role of the fine-tuning method based on pre-trained weights in enhancing the model’s generalization ability. This research serves as an essential reference for the advancement and application of chicken behavior detection technology in the future of intelligent agriculture.

4. Chicken Behavior Tracking Extension Application

Smart technology plays an indispensable role in modern animal husbandry [41]. Among these technologies, object detection algorithms serve as a core component, providing essential support for downstream tasks, such as object tracking, behavior analysis, and counting [42]. Previous experimental results indicate that the proposed FCBD-DETR model demonstrates excellent performance in detecting chicken targets within complex environments, exhibiting strong generalization capabilities and robustness. Consequently, this study offers valuable technical support for smart agriculture and establishes a solid foundation for related tasks aimed at enhancing animal welfare.
The accuracy of object detection serves as the foundation for chicken counting. In this study, we further applied the FCBD-DETR model to the chicken counting task, as illustrated in Figure 13. The accurate counting results are derived from the number of detected chicken bounding boxes, emphasizing the direct influence of object detection accuracy on the reliability of the counting outcomes. This influence is particularly critical in occluded and densely populated environments, where issues, such as missed detections and false detections, can significantly undermine counting accuracy. With the aid of intelligent counting technology, farmers can monitor the number, density, and distribution of chickens in real time. If certain areas become overly crowded, it may lead to a deterioration of the feeding environment and provoke stress reactions among the chickens. By identifying these congested areas, farmers can adjust the breeding space and density in a timely manner, thereby enhancing the environment and improving the welfare and quality of life for the chickens.
The accurate detection of chicken behavior is crucial for studying their health, habits, and welfare. In this study, the FCBD-DETR model was further applied to the behavior counting task and integrated with ByteTrack for tracking chickens. The experimental results demonstrate that the model can consistently and accurately assign an ID to each walking chicken while effectively locating and tracking them. Even when chickens move quickly and their positions change rapidly, their locations can still be accurately detected and tracked, allowing for the corresponding routes to be drawn. The movements of chickens—whether walking, eating, lying down, or drinking—can be continuously monitored. Notably, even in dense clusters, individual chickens can be accurately identified and tracked during walking and lying down. The tracking process is visualized in Figure 14. Therefore, the FCBD-DETR model provides significant technical support for smart chicken tracking, enabling farmers to better monitor and manage the behaviors and activities of their flocks. As illustrated in Figure 15, statistics on specific behaviors are obtained by filtering out irrelevant detection results. This technology allows farmers to accurately identify the frequency of undesirable behaviors within the chicken flock and offers an effective behavioral statistical model for researchers, thereby minimizing the stress and errors associated with close contact and manual counting.
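The sketch below illustrates how per-frame detections can drive this kind of tracking and behavior counting. The `detect` call and the `tracker.update` interface stand in for FCBD-DETR inference and a ByteTrack-style tracker; their signatures are illustrative and do not reproduce the actual Bytetrack [30] API.

```python
from collections import Counter

def count_behaviors(video_frames, detect, tracker, conf_threshold=0.5):
    # Count each behavior once per tracked individual across the video.
    behavior_counts = Counter()
    seen = set()                                    # (track_id, behavior) pairs already counted
    for frame in video_frames:
        detections = [d for d in detect(frame) if d.score >= conf_threshold]
        tracks = tracker.update(detections, frame)  # assigns a persistent ID to each chicken
        for trk in tracks:
            key = (trk.track_id, trk.behavior)      # behavior = class label of the matched detection
            if key not in seen:
                seen.add(key)
                behavior_counts[trk.behavior] += 1
    return behavior_counts
```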

5. Conclusions

To address the insufficient generalization ability, limited robustness, and low efficiency of existing models in detecting broiler behavior within complex agricultural environments, this paper proposes a detection model, FCBD-DETR, based on FasterNet and MSDCC.
The experimental results demonstrate that, in comparison to the original model, the proposed model reduces computational complexity from 31.2 GFLOPs to 18.3 GFLOPs, decreases the number of parameters from 20.1 million to 8.9 million, and improves the FPS value from 49.5 to 68.5, thereby significantly enhancing detection efficiency.
The FCBD-DETR model demonstrates superior performance compared to traditional mainstream detection models across two distinct scenarios. Furthermore, fine-tuning experiments indicate that our model can significantly decrease the costs associated with labeling and training new data.
In general, compared to other models, FCBD-DETR demonstrates significant advantages in speed, parameter count, and computational complexity. However, its generalization capability in unfamiliar environments still requires improvement. Future research should focus on enhancing the model’s robustness and adaptability, particularly when addressing more complex or unknown agricultural settings. This enhancement will facilitate its extension and application to new farming scenarios.

Author Contributions

Concept, editing, project management: H.Q.; Methodology, investigation, resources, data planning, writing and manuscript preparation, visualization: Z.C.; Software: G.L.; Verification: R.C.; Formal analysis: J.J.; Project management: X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Qi Haixia’s Guangdong University Characteristic Innovation Project [2024KTSCX102] and Luo Xiwen’s University Special Discipline Construction Project [2023B10564003].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. OECD; FAO. OECD-FAO Agricultural Outlook 2023–2032; OECD Publishing: Paris, France, 2023. Available online: https://www.fao.org/3/cc6361en/cc6361en.pdf (accessed on 15 October 2024).
  2. Welfare Quality®. Welfare Quality® Assessment Protocol for Poultry (Broilers, Laying Hens); Welfare Quality® Consortium: Lelystad, The Netherlands, 2009.
  3. Appleby, M.C.; Mench, J.A.; Hughes, B.O. Poultry Behaviour and Welfare; CABI: Wallingford, UK, 2004.
  4. Boissy, A.; Manteuffel, G.; Jensen, M.B.; Moe, R.O.; Spruijt, B.; Keeling, L.J.; Winckler, C.; Forkman, B.; Dimitrov, I.; Langbein, J.; et al. Assessment of positive emotions in animals to improve their welfare. Physiol. Behav. 2007, 92, 375–397.
  5. Bradshaw, R.; Kirkden, R.D.; Broom, D.M. A review of the aetiology and pathology of leg weakness in broilers in relation to welfare. Avian Poult. Biol. Rev. 2002, 13, 45–103.
  6. Tablante, N.L. Common Poultry Diseases and Their Prevention; University of Maryland Extension: College Park, MD, USA, 2013; pp. 1–53.
  7. Weeks, C.A.; Danbury, T.D.; Davies, H.C.; Hunt, P.; Kestin, S.C. The behaviour of broiler chickens and its modification by lameness. Appl. Anim. Behav. Sci. 2000, 67, 111–125.
  8. Morris, T. Review of Poultry Production Systems: Behaviour, Management and Welfare, by M.C. Appleby, B.O. Hughes and H.A. Elson; CAB International: Wallingford, UK, 1992. J. Agric. Sci. 1993, 120, 420–421.
  9. Kashiha, M.; Pluk, A.; Bahr, C.; Vranken, E.; Berckmans, D. Development of an early warning system for a broiler house using computer vision. Biosyst. Eng. 2013, 116, 36–45.
  10. Huang, J.; Wang, W.; Zhang, T. Method for detecting avian influenza disease of chickens based on sound analysis. Biosyst. Eng. 2019, 180, 16–24.
  11. Zhuang, X.; Zhang, T. Detection of sick broilers by digital image processing and deep learning. Biosyst. Eng. 2019, 179, 106–116.
  12. Zhuang, X.; Bi, M.; Guo, J.; Wu, S.; Zhang, T. Development of an early warning algorithm to detect sick broilers. Comput. Electron. Agric. 2018, 144, 102–113.
  13. Rushton, J.; Viscarra, R.; Bleich, E.G.; McLeod, A. Impact of avian influenza outbreaks in the poultry sectors of five South East Asian countries (Cambodia, Indonesia, Lao PDR, Thailand, Viet Nam): Outbreak costs, responses and potential long term control. Worlds Poult. Sci. J. 2005, 61, 491–514.
  14. Blatchford, R.A.; Archer, G.S.; Mench, J.A. Contrast in light intensity, rather than day length, influences the behavior and health of broiler chickens. Poult. Sci. 2012, 91, 1768–1774.
  15. Franco, B.R.; Shynkaruk, T.; Crowe, T.; Fancher, B.; French, N.; Gillingham, S.; Schwean-Lardner, K. Light color and the commercial broiler: Effect on behavior, fear, and stress. Poult. Sci. 2022, 101, 102052.
  16. Girard, M.T.E.; Zuidhof, M.J.; Bench, C.J. Feeding, foraging, and feather pecking behaviours in precision-fed and skip-a-day-fed broiler breeder pullets. Appl. Anim. Behav. Sci. 2017, 188, 42–49.
  17. Li, L.; Zhao, Y.; Oliveira, J.; Verhoijsen, W.; Liu, K.; Xin, H. A UHF RFID system for studying individual feeding and nesting behaviors of group-housed laying hens. Trans. ASABE 2017, 60, 1337–1347.
  18. Aydin, A. Using 3D vision camera system to automatically assess the level of inactivity in broiler chickens. Comput. Electron. Agric. 2017, 135, 4–10.
  19. Li, G.; Zhao, Y.; Hailey, R.; Zhang, N.; Liang, Y.; Purswell, J.L. Radio-frequency identification (RFID) system for monitoring specific behaviors of group housed broilers. In Proceedings of the 10th International Livestock Environment Symposium (ILES X), Omaha, NE, USA, 25–27 September 2018.
  20. Van Der Stuyft, E.; Schofield, C.P.; Randall, J.; Wambacq, P.; Goedseels, V. Development and application of computer vision systems for use in livestock production. Comput. Electron. Agric. 1991, 6, 243–265.
  21. Xiao, D.; Zeng, R.; Zhou, M.; Huang, Y.; Wang, W. Monitoring key behaviors of herd Magang geese based on DH-YoloX. Trans. Chin. Soc. Agric. Eng. 2023, 39. Available online: https://link.cnki.net/urlid/11.2047.S.20230303.1347.004 (accessed on 15 October 2024).
  22. Wang, J. Research on Key Techniques for Behavior Recognition of Caged Breeder Chickens. Ph.D. Dissertation, Agricultural University of Hebei, Baoding, China, 2020.
  23. Qi, H.; Li, C.; Huang, G. Dead chicken detection algorithm based on lightweight YOLOv4. J. Chin. Agric. Mech. 2024, 45, 195–201.
  24. Gu, Y.; Wang, S.; Yan, Y.; Tang, S.; Zhao, S. Identification and analysis of emergency behavior of cage-reared laying ducks based on YoloV5. Agriculture 2022, 12, 485.
  25. Guo, Y.; Aggrey, S.E.; Wang, P.; Oladeinde, A.; Chai, L. Monitoring behaviors of broiler chickens at different ages with deep learning. Animals 2022, 12, 3390.
  26. Lin, A.; Chen, B.; Xu, J.; Zhang, Z.; Lu, G.; Zhang, D. DS-TransUNet: Dual Swin Transformer U-Net for medical image segmentation. IEEE Trans. Instrum. Meas. 2022, 71, 4005615.
  27. Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318.
  28. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419.
  29. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Computer Vision—ECCV 2020: Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
  30. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object tracking by associating every detection box. In Computer Vision—ECCV 2022: Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 1–21.
  31. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, don't walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031.
  32. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
  33. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
  34. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742.
  35. Padilla, R.; Netto, S.L.; Da Silva, E.A. A survey on performance metrics for object-detection algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 237–242.
  36. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31×31: Revisiting large kernel design in CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975.
  37. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
  38. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  39. Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. In Advances in Neural Information Processing Systems 29, Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2016.
  40. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258.
  41. Wu, D.; Cui, D.; Zhou, M.; Ying, Y. Information perception in modern poultry farming: A review. Comput. Electron. Agric. 2022, 199, 107131.
  42. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
Figure 1. Examples of datasets.
Figure 2. Dataset augmentation example.
Figure 3. Schematic representation of broiler behavior categories.
Figure 4. Training data analysis: object instance counts and bounding box properties.
Figure 5. RT-DETR network structure.
Figure 6. Improved RT-DETR network structure.
Figure 7. Structure of the PConv block.
Figure 8. DDConv structure diagram, where c1 is the number of input channels and c2 is the number of output channels.
Figure 9. Network structure diagram of the Fusion and MSDCC approaches: (a) RepBlock consists of RepVGG blocks with ReLU activation [34]. (b) DDConv is the structure we developed, as shown in Figure 8.
Figure 10. The effective receptive field (ERF) of ResNet18/FasterNet/FasterNet + DDConv. A more widely distributed dark area indicates a larger ERF.
Figure 11. The fine-tuning experiment results for different numbers of training images and epochs.
Figure 12. Visualization of FCBD-DETR detection results on test set B.
Figure 13. Chicken counting based on FCBD-DETR.
Figure 14. Tracking visualization based on FCBD-DETR and Bytetrack.
Figure 15. Chicken tracking and classification counting based on FCBD-DETR.
Table 1. Detailed shooting information of datasets A and B.
Data Set | Stocking Density | Shooting Height | Shooting Angle
A | 0.04 m²/individual | 1.5 m | Camera tilted down 45°, directly in front of the coop
B | 0.04 m²/individual | 1 m | Camera tilted down 45°, from the side of the coop
Table 2. Classification criteria for broiler behavior.
Behavior Name | Standard of Judgment | Sketch in Figure 3
Lying | Belly on the ground | (a)
Feed | Head into the trough | (b)
Stand | Legs visible | (c)
Drink | Head in the sink | (d)
Table 3. Instance distribution per dataset.
Dataset | Lying Instances | Feed Instances | Stand Instances | Drink Instances
Validation set | 12,604 | 5334 | 5839 | 2307
Test set A | 12,627 | 5464 | 5709 | 2208
Test set B | 2586 | 641 | 1177 | 122
Table 4. Training parameters.
Parameters | Settings | Parameters | Settings
Optimizer | SGD | lrf | 1.0
Epoch | 300 | weight_decay | 0.0001
Batch-size | 4 | momentum | 0.9
Workers | 4 | Warmup_epochs | 2000
imgs | 640 | Warmup_momentum | 0.8
lr0 | 0.0001 | Close_mosaic | 10
Table 5. Performance of models under different backbones.
Backbone | Test Set A mAP50 | Test Set A mAP50-95 | Test Set B mAP50 | Test Set B mAP50-95 | Params (M) | FLOPs (G)
ResNet18 | 98.6 | 84.9 | 31.2 | 20.1 | 20 | 60
ResNet34 | 99.4 | 88.6 | 31.0 | 18.4 | 31 | 92
ResNet50 | 99.2 | 89.0 | 37.7 | 23.5 | 36 | 136
HGNetv2 | 99.3 | 88.4 | 32.2 | 19.1 | 32 | 110
FasterNet | 98.3 | 84.3 | 27.6 | 17.4 | 10.8 | 28.5
Table 6. High-contribution area ratio (r) for different models with varying thresholds (t). A larger r suggests a smoother distribution of high-contribution pixels, hence a larger ERF.
Model | t = 20% | t = 30% | t = 50% | t = 99%
RT-DETR-ResNet18 | 0.3% | 0.5% | 1.3% | 77.9%
RT-DETR-Fasternet | 1% | 2.3% | 10.4% | 95.9%
RT-DETR-Fasternet-DDConv | 2.3% | 4.4% | 10.4% | 88.7%
Table 7. MSDCC ablation experiments.
Model | Test Set A mAP50 | Test Set A mAP50-95 | Test Set B mAP50 | Test Set B mAP50-95 | Params (M) | FLOPs (G)
RT-DETR-ResNet18 | 98.6 | 84.9 | 31.2 | 20.1 | 20 | 60
RT-DETR-Fasternet | 98.3 | 84.3 | 27.6 | 17.4 | 10.8 | 28.5
RT-DETR-Fasternet + DDConv | 99.0 | 86.7 | 25.6 | 15.7 | 9.9 | 26.7
Ours (RT-DETR-Fasternet + DDConv + MSDCC) | 99.4 | 88.4 | 31.6 | 19.1 | 8.9 | 18.7
Table 8. Comparison of performance between different models.
Models | Test Set A mAP50 (%) | Test Set B mAP50 (%) | GFLOPs | Params (M) | FPS
SSD | 98.2 | 26.5 | 137.2 | 24.0 | 123.5
Faster-RCNN | 98.6 | 30.5 | 200.9 | 136.75 | 11.8
Yolov7 | 99.3 | 30.4 | 103.2 | 36.93 | 45.9
Yolov8m | 99.1 | 27.4 | 78.7 | 25.8 | 57.6
RT-DETR-R18 | 98.6 | 31.2 | 31.2 | 20.1 | 49.5
Ours | 99.4 | 31.6 | 18.3 | 8.9 | 68.5

Citation: Qi, H.; Chen, Z.; Liang, G.; Chen, R.; Jiang, J.; Luo, X. Broiler Behavior Detection and Tracking Method Based on Lightweight Transformer. Appl. Sci. 2025, 15, 3333. https://doi.org/10.3390/app15063333

