1. Introduction
The broiler chicken industry plays a critical role in driving economic growth, ensuring market supply, and increasing farmers’ incomes [1,2]. With the rapid development of broiler production, the implementation of precision farming management has become increasingly important [3]. In modern broiler rearing and breeding, comprehensive collection of individual data facilitates the formulation of scientific breeding strategies, disease prevention, and the mitigation of potential risks in farms [4,5]. Accurate recognition of individual broilers is a prerequisite for obtaining detailed individual information. By employing precise individual recognition technology, producers can monitor the health and welfare of each broiler in real time, thereby devising personalized management plans and enhancing overall farming welfare [6]. Furthermore, individual broiler recognition plays an irreplaceable role in behavioral studies and full-process product traceability [7,8]. In recent years, with the widespread adoption of sustainable intensive farming models, the demand for accurate individual recognition of every broiler has become increasingly urgent.
Currently, radio frequency identification (RFID) technology has become one of the most widely used methods for individual identification in broilers. Gao’s team developed an intelligent weighing system based on RFID to accurately capture individual broiler weights [9]. Similarly, van der Sluis and colleagues constructed a tracking system using RFID tags that enables real-time monitoring of broilers’ behaviors throughout their lifecycle [10]. In addition, Collet employed RFID technology to collect positional data of individual broilers, thereby achieving precise quantification of free-range behavior [11]. However, despite its widespread application in broiler identification, RFID technology faces numerous challenges in large-scale farms. Primarily, RFID systems are susceptible to interference from environmental factors, especially nearby electronic devices and metallic components, which can reduce chip recognition accuracy [12]. Moreover, the high cost of sensor deployment and maintenance, along with stringent requirements for power consumption and device performance stability, further limits its application. More importantly, the use of RFID technology may have adverse impacts on broiler welfare [13]. Compared with traditional methods, contactless identification approaches based on broiler facial features offer the advantages of non-invasiveness, higher stability, and improved recognition accuracy. Consequently, the advancement of facial recognition techniques for broiler chickens holds great promise for facilitating precision poultry farming, enhancing animal welfare, and meeting the growing demands for intelligent and efficient management in modern agricultural systems.
With continuous advances in computer vision technology, an increasing number of researchers have adopted non-contact methods to conduct studies on broiler target detection, pose estimation, and growth and health monitoring [14,15]. For example, Ma et al. constructed an advanced chicken face detection network based on Generative Adversarial Networks (GANs) and Masked Autoencoders (MAEs), enhancing small-object features through adversarial generation and self-encoding mechanisms [16]; Yin et al. integrated multi-object tracking with single-shot detection to enable long-term behavioral monitoring and individual tracking of broilers [17]; Yang et al. utilized deep learning models to analyze movement trajectories within broiler flocks, achieving accurate detection and classification of motion states, which supports activity assessment and early health warning [18]; Narisi et al. estimated the feeding time of individual broilers using convolutional neural networks (CNNs) combined with image processing techniques, validating the approach’s potential for optimizing feed efficiency [19]. However, research on broiler face recognition remains in its infancy and lags significantly behind the facial recognition technologies developed for larger livestock such as pigs and cattle. For instance, Hansen’s team developed a CNN-based pig face recognition method that achieved 96.7% accuracy in farm environments [20], while Weng and colleagues proposed the TB-CNN model, which attained 99.85% accuracy in multi-angle cattle face recognition [21]. In contrast, broiler face recognition has yet to achieve similar breakthroughs, and no dedicated studies on this topic have been reported. This technological gap mainly stems from three specific challenges: first, the relatively small facial area and subtle features of broilers demand higher recognition accuracy [16]; second, mutual occlusion among broilers further complicates facial feature extraction [22]; and third, recognition errors in dynamic multi-target scenarios tend to escalate as group size increases [23]. Moreover, due to the limited computational resources of embedded systems, CNN-based deep learning algorithms must optimize memory management, reduce weight parameters, and lower computational costs to meet the requirements of embedded deployment [24,25].
Given the urgent need for lightweight, high-precision solutions, YOLO-based architectures emerge as promising candidates due to their inherent advantages in speed-accuracy trade-offs and embedded deployment feasibility [26]. Modern YOLO variants demonstrate exceptional deployment efficiency, achieving ultra-high frame rates through one-stage inference on conventional hardware while maintaining detection performance, making them particularly suitable for real-time monitoring requirements [27]. For instance, the YOLO Granada framework, derived from YOLOv5, reduces parameter counts and FLOPs to approximately 55% of the original model through network pruning and lightweight architectural design, concurrently enhancing inference speed by 17% [28]. The LSR YOLO model optimized for sheep face detection employs ShuffleNetv2/Ghost modules to replace standard convolutions, achieving a compact model size of ~9.5 MB specifically tailored for edge devices [29]. On resource-constrained platforms like Jetson Nano, Tiny YOLOv3 achieves ~9 FPS (≈110 ms/frame) [30], while the Ag YOLO optimized for low-power NCS2 accelerators attains 36.5 FPS with a 12-fold parameter reduction [31]. The latest YOLOv11 architecture incorporates cross-stage spatial attention modules and network reconfiguration designs, enhancing multi-scale feature representation through improved contextual feature focusing without compromising real-time performance [32]. Multiple optimization strategies, including network pruning, lightweight module substitution, and attention mechanism enhancement, have been systematically validated to improve deployment feasibility in embedded systems while maintaining computational efficiency.
To address the array of challenges in individual broiler identification, this study proposes a broiler face recognition model named YOLO-IFSC. The model is specifically optimized based on YOLOv11n to tackle the aforementioned issues. First, by refining the network structure, YOLO-IFSC significantly improves recognition accuracy for small targets and subtle features, thereby meeting the high-precision demands imposed by the small facial area. Second, the model enhances its robustness against occlusion, effectively reducing the interference in facial feature extraction caused by occluded regions. Additionally, YOLO-IFSC employs multi-scale feature fusion techniques to resolve multi-target recognition challenges in group environments, and its lightweight design reduces computational cost and memory usage, thereby meeting the practical requirements of embedded devices. The main improvements to YOLOv11n are as follows:
Inception-F Module Replacement: A multi-branch dynamic fusion module is introduced, which optimizes multi-scale feature extraction through parallel pathways and dynamic weighting mechanisms [33]. This enhancement increases the model’s adaptability to features at various granularities while effectively reducing both the computational load and parameter count in the broiler face recognition task, thereby improving overall computational efficiency (a minimal illustrative sketch of this multi-branch design is given after this list).
C2f-Faster Module Replacement: An efficient feature compression module based on partial convolution is adopted [34]. By leveraging selective channel computation to reduce redundancy, this module strikes an excellent balance between computational cost and feature representation capability.
SPPELANF Module Replacement: A multi-scale pyramid fusion module is introduced, which combines cross-level feature integration with dynamic pooling strategies to enhance the model’s ability to perceive complex spatial relationships [35].
CBAM Module Replacement: A dual-domain attention-guided module is utilized, which adaptively weights both the channel and spatial dimensions, effectively enhancing the feature response in key regions [36].
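To make the multi-branch, dynamically weighted fusion idea behind Inception-F concrete, the following is a minimal PyTorch sketch. The five-branch composition, the kernel sizes, and the softmax-normalized scalar gates are illustrative assumptions chosen to mirror the description above (five parallel paths with dynamic weighting), not the exact published module.

```python
import torch
import torch.nn as nn

class MultiBranchFusion(nn.Module):
    """Illustrative multi-branch block with learned dynamic weighting.

    Five parallel paths (1x1, 3x3, 5x5, pooled, pointwise projection) are fused by
    softmax-normalized scalar gates; branch count and kernel sizes are assumptions
    made for illustration only.
    """
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 1, bias=False),                  # R1: pointwise
            nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),       # R2: local context
            nn.Conv2d(c_in, c_out, 5, padding=2, bias=False),       # R3: larger receptive field
            nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),     # R4: pooled context
                          nn.Conv2d(c_in, c_out, 1, bias=False)),
            nn.Conv2d(c_in, c_out, 1, bias=False),                  # R5: residual-style projection
        ])
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
        # One learnable gate per branch, normalized by softmax at run time.
        self.gates = nn.Parameter(torch.zeros(len(self.branches)))

    def forward(self, x):
        w = torch.softmax(self.gates, dim=0)
        y = sum(w[i] * b(x) for i, b in enumerate(self.branches))
        return self.act(self.bn(y))

if __name__ == "__main__":
    block = MultiBranchFusion(64, 64)
    print(block(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```

In this toy form, the gates start uniform and are learned end to end, so the network can emphasize whichever feature granularity is most informative for the small facial regions of broilers.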
3. Results
3.1. Quantitative Evaluation of Different Variants of the Inception-F Module in Multi-Scale Feature Extraction
To quantitatively evaluate the contribution of the five parallel paths (R1–R5) in the Inception-F module to multi-scale feature extraction, we first constructed three ablation variants: Inception-A (containing only R1–R3, as shown in Figure 5), Inception-B (containing R1–R4), and the complete Inception-F (containing R1–R5). Each variant was used, in turn, to replace the P3, P5, and P7 layers of the backbone network, and the experimental results are summarized in Table 3.
Experimental results demonstrate that Inception-F achieves significant improvements in precision, F1 score, and mean average precision (mAP) compared to the two ablation variants. Specifically, relative to the baseline model (YOLOv11n), Inception-F increases precision by 2.2 percentage points, F1 score by 0.8 percentage points, and mAP by 2.0 percentage points. Meanwhile, its parameter count and computational cost rise only marginally over those of Inception-A and Inception-B and remain substantially lower than the baseline. In contrast, the recall rates of Inception-A and Inception-B decrease by 1.3% and 0.9%, respectively, failing to balance overall performance. Considering both detection performance and model compactness, the full five-path dynamic weighted fusion variant (Inception-F) represents the optimal solution.
3.2. Comparative Experiments of the SPPELANF Module and Alternative Designs
To validate the effectiveness of the proposed parallel ELAN pooling with cross-layer FPN fusion strategy in the SPPELANF module, we compared it against two alternative configurations: the serial, same-scale max-pooling module (SPPF) from the baseline YOLOv11n and the parallel ELAN pooling module without FPN fusion (SPPELAN). Under the same training protocol, each design replaced the P9 layer of the backbone and was evaluated using identical performance metrics and complexity quantification methods. The results are summarized in Table 4.
Experimental results indicate that SPPELANF improves precision by 1.6 percentage points over the baseline and maintains a near-optimal F1 score (87.2%) and mAP (90.4%), while its parameter count and FLOPs differ minimally from those of SPPELAN. Although SPPELAN slightly outperforms SPPELANF in recall and mAP, its precision and F1 score are inferior, and the lack of cross-layer fusion leads to insufficient contextual consistency of features. In summary, the SPPELANF module achieves the optimal balance between detection performance and model compactness through parallel multi-scale extraction and cross-layer dynamic fusion.
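As an illustration of the difference between the baseline’s serial pooling and a parallel pooling design, the sketch below contrasts an SPPF-style module (three cascaded 5 × 5 max-pools) with a parallel multi-kernel pooling branch. This is an approximation for exposition only; it omits the ELAN aggregation and the cross-layer FPN fusion that distinguish SPPELAN and SPPELANF, and the kernel sizes in `ParallelPooling` are assumptions.

```python
import torch
import torch.nn as nn

class SPPFSerial(nn.Module):
    """Baseline-style SPPF: three cascaded 5x5 max-pools, intermediate outputs concatenated."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1, bias=False)
        self.cv2 = nn.Conv2d(c_mid * 4, c_out, 1, bias=False)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

class ParallelPooling(nn.Module):
    """Parallel pooling branch: pools of different kernel sizes applied side by side."""
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1, bias=False)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)
        self.cv2 = nn.Conv2d(c_mid * (len(kernels) + 1), c_out, 1, bias=False)

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))

if __name__ == "__main__":
    x = torch.randn(1, 128, 20, 20)
    print(SPPFSerial(128, 128)(x).shape, ParallelPooling(128, 128)(x).shape)
```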
3.3. Comparison of Inception-F Module Insertion Locations
To evaluate the impact of the Inception-F module on the performance of the YOLOv11n model, as shown in Figure 4, we inserted it into the P3, P5, and P7 layers of the backbone network (Position 1) and the P17 and P20 layers of the neck network (Position 2). Position 1 + 2 refers to the insertion at both positions. The results are shown in Table 5.
The experimental results show that the Inception-F module effectively improves mAP while reducing the model’s parameter count and FLOPs. At Position 1, precision increased by 2.2% and mAP improved by 2.0%. At Position 2, precision decreased by 0.3%, but mAP increased by 1.6%. Although precision at Position 1 + 2 decreased by 1%, mAP significantly improved by 3.6%, reaching 91.4%. The parameter count of the Position 1 + 2 model decreased by 23.2%, and FLOPs were reduced by 16.6%. These results indicate that the Inception-F module not only improves mAP but also contributes to lightweight design and computational efficiency. Despite fluctuations in precision at different positions, considering the significant mAP improvement and the notable lightweight effect, Position 1 + 2 was chosen as the final solution.
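A minimal sketch of how such positional experiments can be run programmatically is given below: modules at chosen layer indices of a backbone are swapped for replacement blocks. The toy backbone, the index values, and the replacement block used in the example are hypothetical stand-ins, not the actual YOLOv11n layer definitions.

```python
import torch
import torch.nn as nn

def replace_layers(layers: nn.ModuleList, positions, factory):
    """Swap the modules at the given indices for blocks built by factory(old_module).

    Illustrative helper for positional experiments; the indices and factory
    used below are assumptions, not the exact model configuration.
    """
    for idx in positions:
        layers[idx] = factory(layers[idx])
    return layers

if __name__ == "__main__":
    # Toy backbone standing in for layers P0-P8; real indices depend on the YOLOv11n config.
    backbone = nn.ModuleList(nn.Conv2d(16, 16, 3, padding=1) for _ in range(9))
    # "Position 1": replace layers 3, 5, and 7 with a hypothetical substitute block.
    replace_layers(backbone, [3, 5, 7],
                   lambda old: nn.Sequential(
                       nn.Conv2d(old.in_channels, old.out_channels, 3, padding=1, bias=False),
                       nn.BatchNorm2d(old.out_channels),
                       nn.SiLU()))
    x = torch.randn(1, 16, 64, 64)
    for m in backbone:
        x = m(x)
    print(x.shape)  # torch.Size([1, 16, 64, 64])
```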
3.4. Comparison of C2f-Faster Module Insertion Locations
To evaluate the impact of the C2f-Faster module on the YOLOv11n model, we inserted it into the P2 and P4 layers of the backbone network (Position 1) and the P13 and P19 layers of the neck network (Position 2), and also considered inserting the module at both positions. The experimental results show that introducing the C2f-Faster module significantly reduced the model’s parameter count and FLOPs, while also improving mAP. The results are shown in Table 6.
Specifically, after inserting the module at Position 1, FLOPs decreased by 4.5%, mAP improved by 2.1%, and the parameter count was reduced by 1.5%. At Position 2, FLOPs decreased by 7.5% and mAP improved by 2.5%. Notably, when the C2f-Faster module was inserted at both Position 1 and Position 2, the reduction in FLOPs was the same as at Position 2, but the accuracy dropped by 1.5%. Position 2, by contrast, maintained a high mAP (90.3%) and precision (90.5%) while significantly reducing computational load and memory usage. Considering the balance between model accuracy and computational resources, Position 2 was chosen as the optimal solution, especially for applications with limited computational resources.
3.5. Ablation Experiments
To systematically evaluate the impact of different modules on the performance of the YOLOv11n model, we designed several variant models and conducted ablation experiments. All experiments were conducted under the same dataset and training parameters to ensure that the comparison results accurately reflect the contribution of each improved module to object recognition performance. Additionally, we assessed the lightweight effects of each model to verify their feasibility on resource-limited embedded devices. As shown in Figure 4, YOLOv11-I represents the introduction of the Inception-F module, applied to the P3, P5, and P7 layers of the backbone network and the P17 and P20 layers of the neck network; YOLOv11-F represents the introduction of the C2f-Faster module, replacing the P13 and P19 layers of the neck network; YOLOv11-S represents the introduction of the SPPELANF module, replacing the SPPF module in the P9 layer of the backbone network; YOLOv11-C represents the introduction of the CBAM module, replacing the C2PSA module in the P10 layer of the backbone network. YOLOv11-IF combines the Inception-F and C2f-Faster modules, inheriting the improvements of YOLOv11-I and YOLOv11-F; YOLOv11-IFS further combines the SPPELANF module on this basis, inheriting the improvements of YOLOv11-I, YOLOv11-F, and YOLOv11-S; and the YOLO-IFSC model introduces all four modules (Inception-F, C2f-Faster, SPPELANF, CBAM) simultaneously, inheriting all the optimizations of YOLOv11-I, YOLOv11-F, YOLOv11-S, and YOLOv11-C. Through the comparison of these different variants, this study aims to explore the independent contributions of each module, the synergistic effects of module combinations, and their enhancements in model accuracy, recall rate, mAP, and lightweight design, particularly focusing on the adaptability and real-time processing capability for embedded devices. The results are shown in Table 7.
YOLOv11-I significantly improved recall by 3.3% and mAP by 3.6%, validating the effectiveness of the multi-scale feature fusion mechanism in complex scenarios. Although precision decreased by 1%, this suggests that the trade-off between feature richness and classification-confidence calibration still requires optimization. The performance improvement is attributed to the design of the five parallel paths and the dynamic weighting mechanism, which effectively extract and integrate key features from multi-scale information. YOLOv11-F achieved a mAP of 90.3% with a 2.3% improvement in precision, indicating the advantages of PConv in enhancing local feature extraction; although recall decreased slightly, the precision improvement effectively compensated for this gap. YOLOv11-S improved mAP by 2.6%, benefiting from the combination of Spatial Pyramid Pooling (SPP) and Feature Pyramid Networks (FPNs), which enhanced cross-scale information extraction and improved the processing accuracy for features at different scales. YOLOv11-C achieved a 1.7% improvement in precision, a 3.3% increase in recall, and a 2.4% improvement in F1 score; this gain is attributed to the CBAM module’s adaptive weighting in both the channel and spatial dimensions, which effectively focuses on key features and enhances the model’s recognition ability and accuracy. The YOLOv11-IF model demonstrated a clear lightweight advantage, with a 25.9% reduction in the number of parameters and a 19.6% reduction in FLOPs while maintaining a mAP of 90.3%, showing that multi-module collaboration can balance computational efficiency and recognition accuracy. Further integration of the SPPELANF module, forming YOLOv11-IFS, achieved a mAP of 91.2%, a 3.4% improvement over the baseline model, under the constraints of 1.79 M parameters and 5.2 G FLOPs. Finally, the integrated YOLO-IFSC model, with all four modules working in concert, achieved a markedly lightweight design while maintaining excellent recognition performance (mAP50 = 91.5%, F1 = 87.3%): the number of parameters decreased to 1.55 M (a 40.8% reduction from the baseline), and FLOPs dropped to 5.0 G (a 24.2% reduction). The synergistic effect of all modules significantly enhanced the model’s comprehensive adaptability to complex scenarios, including multi-target interactions, dynamic scale variations, and robustness against environmental interference.
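For reference, the CBAM mechanism credited above for the precision and recall gains of YOLOv11-C follows the standard formulation of Woo et al. [36]: channel attention from shared-MLP processing of global average- and max-pooled descriptors, followed by spatial attention from a 7 × 7 convolution over channel-wise average and max maps. The sketch below uses the commonly cited defaults (reduction ratio 16, 7 × 7 kernel), which may differ from the exact settings used in YOLO-IFSC.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module (Woo et al., 2018): channel attention
    followed by spatial attention. Reduction ratio and kernel size follow the
    commonly used defaults (r=16, 7x7)."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over global average- and max-pooled descriptors.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

if __name__ == "__main__":
    print(CBAM(64)(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```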
Figure 9 shows the precision and mAP curves for the ablation experiments on the validation set. According to the experimental results, YOLO-IFSC not only achieves the highest mAP value but also converges within roughly the first 100 epochs of training, significantly outperforming the other variants. In terms of precision, all models exhibit smooth curves with minimal fluctuation throughout the 500-epoch training process. This indicates that the synergistic effect of the modules contributes substantially to the improvement of recognition performance, particularly in mAP, and also confirms the stability of the training process, providing solid empirical support for real-time object recognition tasks.
3.6. YOLO-IFSC Network Training Results
As shown in Figure 10, during the 500 epochs of training, both the training loss and validation loss showed a synchronized optimization trend: they rapidly decreased within the first 100 epochs, decelerated between 100 and 300 epochs, and stabilized from 300 to 500 epochs. The box_loss, cls_loss, and dfl_loss on the validation set all exhibited high stability, indicating that the model did not overfit during training and has good generalization ability. The changes in precision, recall rate, and mAP@0.5 followed the expected growth pattern, with mAP@0.5 stabilizing after about 100 epochs. Although recall rate showed some fluctuation in the early stages, overall metrics continued to improve, with no abnormal fluctuations or performance degradation, fully verifying the stability and reliability of the model’s recognition performance.
3.7. Progressive Occlusion Experiment
To quantitatively assess the model’s robustness under varying occlusion conditions, we manually synthesized three occlusion scenarios within the annotated broiler face bounding boxes using Photoshop’s rectangle tool: Level 1 (mild occlusion, covering ≤10% of the bounding box area and restricted to the periocular region); Level 2 (moderate occlusion, covering 20% ± 3% of the bounding box area, with the rectangular region spanning the eyes and lateral cheek regions and allowed to extend horizontally beyond the bounding box to realistically simulate large external occluders such as feeders or cage bars); and Level 3 (severe occlusion, covering 50% ± 5% of the bounding box area, encompassing the central and peripheral facial regions and permitting up to 5% of the occluder to exceed the bounding box to mimic more extreme physical obstructions).
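Although the occlusions in this experiment were drawn manually in Photoshop, the procedure can be approximated programmatically; the sketch below paints a filled rectangle covering a target fraction of a face bounding box. The random placement, occluder color, and helper name `occlude_bbox` are illustrative assumptions, since the manual procedure targeted specific facial regions rather than random locations.

```python
import numpy as np

def occlude_bbox(image: np.ndarray, bbox, area_fraction: float,
                 rng: np.random.Generator, color=(40, 40, 40)) -> np.ndarray:
    """Draw a filled rectangle covering ~area_fraction of the bounding box.

    bbox = (x1, y1, x2, y2) in pixels. Illustrative analogue of the manual
    Photoshop occlusion used in the experiment; placement here is random,
    whereas the manual procedure targeted specific facial regions.
    """
    x1, y1, x2, y2 = bbox
    bw, bh = x2 - x1, y2 - y1
    target = area_fraction * bw * bh
    # Keep the occluder's aspect ratio close to the box's, sized to the target area.
    occ_w = int(np.clip(np.sqrt(target * bw / bh), 1, bw))
    occ_h = int(np.clip(target / occ_w, 1, bh))
    ox = x1 + rng.integers(0, max(bw - occ_w, 1))
    oy = y1 + rng.integers(0, max(bh - occ_h, 1))
    out = image.copy()
    out[oy:oy + occ_h, ox:ox + occ_w] = color
    return out

# Example: Level 1 (~10%), Level 2 (~20%), Level 3 (~50%) occlusion of one face box.
rng = np.random.default_rng(0)
img = np.full((480, 640, 3), 255, dtype=np.uint8)
for frac in (0.10, 0.20, 0.50):
    occluded = occlude_bbox(img, (100, 100, 260, 240), frac, rng)
```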
The XGrad-CAM visualization results are shown in Figure 11. Compared to YOLOv11n, YOLO-IFSC exhibited more concentrated feature activation in the broiler face region. Quantitative analysis further verified the superior performance of YOLO-IFSC: at Level 1 and Level 2, zero-error recognition was achieved, and high confidence scores were maintained across all occlusion levels. In contrast, YOLOv11n produced identity recognition errors at Level 2, demonstrating lower robustness. These improvements were achieved without compromising real-time recognition efficiency (36.6 FPS), validating the effectiveness of the proposed architecture’s synergistic components (multi-scale dynamic fusion, partial convolution kernel optimization, cross-layer feature integration, and the dual-domain attention mechanism) in enhancing occlusion robustness and accuracy, making it well suited to broiler face recognition tasks.
3.8. Comparison with Different Models
To rigorously benchmark YOLO-IFSC on broiler face recognition, we compared it against four families of state-of-the-art detectors. First, lightweight single-stage models (YOLOv3-tiny, YOLOv5s, YOLOv8n, YOLOv9s, YOLOv10n, and YOLOv11s) were chosen for their minimal parameter counts and high inference throughput. Second, classic two-stage frameworks (Faster R-CNN and Cascade R-CNN) were included to represent architectures that prioritize robust feature extraction and detection accuracy. Third, transformer-based detectors, namely DETR and RT-DETR, were evaluated for their use of global self-attention in end-to-end set prediction and their incorporation of deformable attention mechanisms to accelerate inference. Finally, RTMDet-tiny was assessed as a hybrid convolution–transformer design, fusing efficient convolutional blocks with attention mechanisms. The comparative results are presented in Table 8. This comprehensive evaluation highlights YOLO-IFSC’s superior trade-off between compactness and performance, underscoring its suitability for deployment in resource-constrained, real-time poultry-monitoring applications.
From the perspective of detection accuracy, YOLO-IFSC achieved an outstanding mAP@0.5 of 91.5%, significantly outperforming all lightweight models (e.g., YOLOv8n at 88.4% and YOLOv10n at 88.0%), representing improvements of 3.1% and 3.5%, respectively; compared to two-stage detectors Cascade R-CNN (89.4%) and Faster R-CNN (85.9%), YOLO-IFSC maintains high precision while incurring lower computational and storage overhead (only 1.55 M parameters and 5.0 GFLOPs), demonstrating a superior balance between model compactness and performance. Among transformer-based methods, DETR and RT-DETR achieve mAPs of 87.7% and 87.3% but require 41 M and 19.9 M parameters and 96 GFLOPs and 57 GFLOPs of computation, respectively—substantially higher than YOLO-IFSC, and thus less suitable for embedded deployment. Notably, RTMDet-tiny, as a hybrid lightweight–transformer architecture, attains a mAP@0.5 of 90.5%, approaching YOLO-IFSC’s accuracy, but exhibits inferior efficiency with an inference speed of only 29.1 FPS (versus YOLO-IFSC’s 36.6 FPS) and 4.88 M parameters with 8.0 GFLOPs—both exceeding those of our model and indicating higher deployment barriers. Overall, YOLO-IFSC delivers state-of-the-art detection accuracy and real-time performance under extremely low resource consumption, fully validating its applicability and promotion potential in resource-constrained environments.
3.9. Model Performance Evaluation on Embedded Platform
To assess the real-world deployment performance of the proposed YOLO-IFSC model on a resource-constrained embedded platform, this study utilized the NVIDIA Jetson Orin NX Super Developer Kit (NVIDIA Corporation, Santa Clara, CA, USA), which is equipped with an 8-core Arm Cortex-A78AE v8.2 CPU, a 1024-core Ampere GPU, and 16 GB of LPDDR5 memory. During testing, the model continuously received input data at a rate of 30 FPS. The system recorded key runtime metrics in real time, including end-to-end inference latency, peak GPU utilization, power consumption, and peak memory increase. The mean Average Precision (mAP) was subsequently computed offline to comprehensively evaluate the model’s detection accuracy and real-time responsiveness. The experimental results are summarized in Table 9.
YOLO-IFSC and its FP16 quantized version demonstrated superior performance over the baseline model YOLOv11n in terms of inference speed, resource consumption, and detection accuracy. Specifically, YOLO-IFSC achieved a high detection accuracy (mAP50 of 91.5%) with an end-to-end inference latency of 27.5 ms, while maintaining a peak GPU utilization of 71%, power consumption of 12.1 W, and memory increase of 1198 MB. After quantization, YOLO-IFSC_FP16 further reduced the inference latency to 17.4 ms, with power consumption and GPU utilization decreased to 10.7 W and 60%, respectively, and memory usage reduced to 933 MB, while detection accuracy only slightly dropped to 91.3%. In contrast, YOLOv11n and its FP16 variant exhibited inferior performance in terms of accuracy, inference speed, and resource efficiency. Additionally, the peak temperature remained well below the Jetson Orin NX’s thermal design limit of 85 °C, ensuring thermal stability and sustained high-performance operation during extended runtime. These findings demonstrate YOLO-IFSC’s exceptional balance of high accuracy, real-time responsiveness, and minimal resource demands, affirming its strong suitability for embedded deployment.
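For readers reproducing the latency portion of this evaluation, the sketch below shows a minimal PyTorch timing loop of the kind used to estimate end-to-end inference latency, assuming a CUDA-capable device and an already loaded detection model. It approximates the FP16 case with `model.half()`, whereas the reported FP16 results would normally come from an optimized engine export, and it does not capture the power, GPU-utilization, or memory figures in Table 9, which require platform tools such as tegrastats on the Jetson.

```python
import time
import torch

def measure_latency(model: torch.nn.Module, input_shape=(1, 3, 640, 640),
                    iters=200, warmup=20, half=False, device="cuda"):
    """Average inference latency (ms) of a loaded detection model on one device.

    Minimal timing loop only; power, GPU utilization, and memory are not measured here.
    """
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)
    if half:
        model, x = model.half(), x.half()
    with torch.no_grad():
        for _ in range(warmup):            # warm-up runs excluded from timing
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / iters

# Hypothetical usage with any loaded torch detection model `net`:
# print(f"FP32: {measure_latency(net):.1f} ms, FP16: {measure_latency(net, half=True):.1f} ms")
```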
4. Discussion
Precision farming has raised an urgent demand for individual broiler recognition, and broiler face recognition, as a potential solution, plays a significant role in ensuring animal welfare and improving intelligent farming management. However, research on broiler face recognition is still in its early stages. This study proposes for the first time a lightweight, non-contact broiler face recognition model—YOLO-IFSC.
Ablation studies indicate that the strategy adopted in this study is both rational and effective. Firstly, large convolution kernels (e.g., the 5 × 5 convolution in Inception-F) expand the network’s receptive field, facilitating the capture of broader contextual information and enabling more complete modeling of shape features [49]. For example, models such as RepLKNet employ large kernels (up to 31 × 31) to construct an expanded receptive field [45], significantly enhancing object shape representation. Secondly, the introduction of partial convolution (PConv) helps eliminate redundant computation by applying convolution to only a subset of input channels while leaving the remaining channels untouched. In broiler leg disease detection work, incorporating PConv into the C2f module not only reduced computational load but also extracted spatial features more accurately [34], which is consistent with our conclusions when applying this strategy to broiler face recognition. The inclusion of CBAM is also well supported: Woo et al. demonstrated that CBAM, as a lightweight module, can significantly improve the classification and detection performance of various CNNs [36]. Overall, the above strategies have been validated as effective through ablation experiments, and multiple studies support their design rationale, demonstrating the soundness of the methods employed in this work. Comparative experiments with other lightweight models further substantiate that our model achieves superior inference speed and accuracy. We additionally conducted model deployment experiments on the Jetson Orin NX platform to verify the application potential of YOLO-IFSC in real-world edge computing environments. The results showed that, under FP16 quantization, the model achieved an end-to-end inference latency of 17.4 ms with power consumption kept below 11 W, demonstrating favorable real-time performance and energy efficiency. This performance level is comparable to that of existing lightweight detection models (such as YOLOv8n_FP16) on the same platform [50], and the model still maintained 91.3% mAP under embedded conditions, indicating that the proposed architecture not only ensures accuracy but also exhibits strong deployment adaptability and engineering feasibility.
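As a concrete illustration of the partial-convolution idea discussed above, the sketch below shows a FasterNet-style PConv layer that convolves only the first fraction of the channels and passes the rest through unchanged. The 1/4 channel fraction and 3 × 3 kernel are the commonly used FasterNet defaults, not necessarily the values used inside C2f-Faster.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only the first 1/n_div of the channels and
    leave the remaining channels untouched (FasterNet-style sketch)."""
    def __init__(self, channels: int, n_div: int = 4, k: int = 3):
        super().__init__()
        self.c_conv = channels // n_div
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, k, padding=k // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_conv, x.shape[1] - self.c_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

if __name__ == "__main__":
    print(PConv(64)(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```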
However, this study has certain limitations. First, the current dataset primarily covers WOD168 broilers, and the model’s cross-breed generalization ability urgently needs to be verified on multi-breed data. Second, as shown in Figure 11, in severe occlusion (Level 3) scenarios some broiler identities were not accurately recognized, reflecting that there is still room for improvement in the model’s global dependency modeling. Moreover, variations in the growth stages of broiler chickens lead to facial feature drift, and the current model still requires improvement in its adaptability to different age groups. To address these issues, future work will focus on three directions: first, introducing the long-range attention mechanism of the Vision Transformer [51] to enhance the model’s ability to capture contextual information in occluded areas; second, constructing a diversified dataset covering multiple breeds to comprehensively improve the model’s generalization performance; and third, adopting an online incremental learning strategy to address feature drift across different growth stages. This study fills a critical technological gap in the niche domain of broiler facial recognition, representing a pivotal step toward individualized management in intensive farming environments. For the first time, YOLO-IFSC enables robust individual identification of broilers in high-density settings, laying a solid technical foundation for intelligent farm management, precision feeding, animal health and welfare monitoring, and full-process traceability from chick to market-ready product. Additionally, as the current dataset was collected in collaboration with partner enterprises and has not yet been made publicly available, it is planned to release the dataset after the project is completed to promote further research.