1. Introduction
As the public increasingly focuses on healthy diets, protein-rich foods have become a preferred choice, and poultry products meet this demand well. Poultry is not only rich in protein but also relatively low in fat, making it a widely recognized, nutritionally balanced food option that has won growing consumer favor. Data from 2022 indicate that China's poultry output reached 16.14 billion birds, a year-on-year increase of 400 million (2.5%). The country's poultry-meat production reached 24.43 million tons (up 630,000 tons, or 2.6%), while egg production stood at 34.56 million tons (a rise of 480,000 tons, or 1.4%). As of year-end 2022, China's live poultry inventory totaled 6.77 billion, a slight 0.2% decrease year-on-year [1]. These figures demonstrate the scale of the poultry industry's expansion and its momentum in adapting to market demand.
As an integral part of the agricultural sector, traditional poultry farming has long provided abundant food resources and significant economic benefits [2]. However, it also faces serious challenges and limitations that cannot be ignored. In this conventional breeding model, the entire process relies heavily on the personal experience and subjective judgment of farmers; the lack of scientific management and meticulous operation inevitably leads to wasted resources and low production efficiency. Feed use is a case in point: imprecise control often causes unnecessary waste, which not only increases breeding costs but also imposes an additional burden on the environment [3]. More importantly, such feed mismanagement can create unhygienic conditions in poultry houses, as spilled feed attracts pests and promotes bacterial growth. This hygiene issue is directly linked to the disease-prevention challenges of poultry farming [4]. Currently, the detection of poultry diseases relies mainly on manual observation, which is prone to misdiagnosis or missed diagnosis, and the unsanitary conditions caused by feed spillage may further exacerbate disease risks. These production challenges not only reduce farm efficiency, but also directly affect the safety and quality of poultry products reaching consumers. For instance, poor hygiene and disease outbreaks can lead to contaminated meat or eggs, while the lack of precise monitoring makes proper traceability difficult to ensure. As consumers' expectations for food safety and quality continue to rise, the traditional poultry farming model can no longer meet the market demand for high-quality, traceable products.
Smart agriculture integrates advanced information and communication technology with modern agricultural equipment, significantly raising the level of automation and intelligence in agricultural production and effectively overcoming many of the challenges faced by traditional animal husbandry. By deploying environmental monitoring devices, such as temperature and humidity sensors and ammonia detectors, farmers can monitor the breeding environment in real time, ensuring optimal conditions for animal growth [5]. Furthermore, by attaching electronic tags to the animals, growth indicators and environmental factors can be recorded and tracked in real time [6]. Consumers can quickly access comprehensive information about an animal, from birth to slaughter, simply by scanning the relevant barcode or QR code. Behind all these advancements, artificial intelligence plays a central role in driving the continuous progress of smart agriculture, laying a solid foundation for in-depth research and widespread application in animal husbandry.
The rapid advancement of artificial intelligence has been a prominent topic in recent years, particularly with the rise of deep learning and the widespread application of neural networks. Researchers have achieved groundbreaking results in natural language processing (NLP) using deep learning techniques [7]. For instance, pre-trained models such as BERT, GPT-4, and T5 have demonstrated impressive performance across various NLP tasks, including text generation, machine translation, and question answering [8,9,10]. In computer vision, object detectors such as the YOLO series can efficiently detect multiple objects in images, playing a crucial role in industrial automation, traffic supervision, and autonomous driving [11]. Semantic segmentation, which assigns each pixel in an image to a specific semantic category, is another area garnering attention; convolutional networks such as DeepLab, UNet, and SegNet have driven performance improvements in segmentation tasks and show broad application prospects in fields such as medical image analysis and urban planning [12,13,14]. Recently, Umirzakova et al. [15] proposed MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a framework that enhances both image and video captioning by leveraging a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder, achieving state-of-the-art performance on standard datasets.
While artificial intelligence is advancing rapidly, its application in smart agriculture largely remains at the research stage. Wu et al. [16] used an improved ResNet-50 deep learning algorithm to identify the gender of chickens, addressing the inefficiencies of traditional manual observation. Recent studies have further demonstrated the potential of deep learning in poultry behavior monitoring and farm management. Paneru et al. [17] developed specialized YOLOv8 models for automated detection of dustbathing behavior in cage-free laying hens, achieving over 93% precision in tracking this welfare-related behavior across different growth phases. Similarly, Yang et al. [18] proposed a Depth Anything Model (DAM) that enables accurate monitoring of poultry drinking behavior and detection of floor eggs using only RGB images, with 92.3% accuracy in behavior recognition and significant improvements in farm operation efficiency.
In poultry farming, traditional object detection technologies have made some progress but still have many limitations. First, poultry present a high degree of diversity and complexity in actual breeding environments [19]; models therefore often exhibit insufficient generalization when dealing with different poultry types, environments, and species, which increases the difficulty of operation and maintenance. Second, traditional object detection methods typically require a large amount of annotated data for training, which is challenging and expensive to obtain in real poultry farming scenarios and may fail to meet stringent real-time requirements [20]. Furthermore, for some rare breeds, it is particularly difficult to obtain enough samples because of their limited numbers and the restrictions imposed by protection and management. An object detection method that maintains high performance with scarce samples is therefore of significant importance for raising the level of intelligence in poultry farming.
Few-shot learning has emerged as a crucial research direction in machine learning, aiming to address the significant performance degradation that traditional methods suffer when data are scarce [21]. This challenge is especially prominent in specialized domains such as healthcare and agriculture, where acquiring a substantial amount of labeled data can be time-consuming and expensive. Few-shot learning designs more efficient algorithms and models that perform well even when only a limited number of samples are available. This approach substantially reduces the complexity and cost of data preparation and enhances a model's generalization, especially when encountering unseen categories or scenarios. With the rapid advancement of deep learning, few-shot learning has made remarkable progress in fields such as image recognition, segmentation, and speech recognition, becoming a significant driving force behind continuous innovation in artificial intelligence [22,23,24].
While few-shot learning has shown promising results in various vision tasks, its performance in agricultural settings, particularly poultry detection, remains suboptimal. This is primarily due to the unique challenges of such environments, including high intra-class variance among birds (e.g., differences in breed, size, and plumage), crowded scenes with frequent occlusions, and significant variations in lighting and background conditions. Most existing few-shot object detection (FSOD) methods are designed for general object categories and fail to account for these domain-specific complexities. As a result, there is a clear need for tailored solutions that can adapt to the visual diversity and environmental variability inherent in agricultural monitoring tasks.
To overcome these practical challenges, this paper integrates few-shot object detection with poultry farming. As a novel paradigm in object detection, FSOD can accurately detect and recognize poultry with only a small number of annotated samples, effectively addressing the complexity of the breeding environment and the diversity of animal species [25]. Using only one to five images, few-shot object detection can detect poultry without large-scale datasets, saving time and resources in data collection and annotation while enhancing the model's generalization and robustness. This paper applies few-shot object detection to two common types of poultry, chickens and ducks, using ducks as the base class for training. Chickens are then introduced as a novel class for few-shot learning and testing. To demonstrate the model's generalization ability, goldfish are also tested as a novel class. The ultimate goal is to achieve efficient and accurate animal detection with a limited number of samples.
This study employs FSCE (Few-Shot Object Detection via Contrastive Proposal Encoding), an approach tailored for few-shot object detection, as its foundational model [26]. However, poultry farming is characterized by diverse terrains, variable climates, and myriad breeding practices, which makes object detection challenging. The tendency of the animals to congregate in dense clusters exacerbates the issue: standard object detection methods struggle with multiple overlapping detection boxes against highly variable backgrounds. This complexity underscores the need for a robust model capable of handling the unique challenges of the poultry farming landscape.
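To give a sense of the contrastive idea at the core of FSCE, the following is a simplified sketch of a supervised contrastive loss over proposal embeddings; it omits FSCE's IoU-based reweighting for brevity, and the function name and temperature value are illustrative rather than taken from the FSCE codebase.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, tau=0.2):
    """Simplified supervised contrastive loss over proposal embeddings.

    Same-class proposals are pulled together and different classes pushed
    apart in embedding space; FSCE's contrastive proposal encoding follows
    this spirit, with an additional IoU-based reweighting omitted here.
    """
    z = F.normalize(embeddings, dim=1)                 # (N, D) unit vectors
    sim = z @ z.t() / tau                              # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)  # positive-pair mask
    mask.fill_diagonal_(False)                         # exclude self-pairs

    # Softmax over all other proposals for each anchor (self masked out).
    logits = sim - torch.eye(len(z), device=z.device) * 1e9
    log_prob = F.log_softmax(logits, dim=1)

    # Average log-probability of positive pairs per anchor.
    pos_counts = mask.sum(1).clamp(min=1)
    loss = -(log_prob * mask).sum(1) / pos_counts
    return loss[mask.any(1)].mean()                    # anchors with >=1 positive
```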
To improve the performance and stability of our model in the intricate settings of poultry farming, we integrate the Sharpness-Aware Minimization (SAM) technique into our framework [27]. Rather than minimizing the loss at a single point, SAM seeks parameters that lie in neighborhoods of uniformly low loss, steering optimization away from sharp minima. This yields more stable convergence, which is especially critical in environments with complex agricultural backgrounds and densely populated animal scenes.
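As a concrete illustration, below is a minimal PyTorch sketch of the SAM two-step update; the function and variable names (sam_step, base_optimizer, rho) are illustrative, not taken from our implementation.

```python
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One Sharpness-Aware Minimization update (illustrative sketch).

    Step 1: ascend to the worst-case weights within an L2 ball of radius rho.
    Step 2: descend using the gradient computed at those perturbed weights.
    """
    # First forward/backward pass: gradients at the current weights.
    loss_fn(model(inputs), targets).backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)

    # Epsilon-ascent: w <- w + rho * grad / ||grad||.
    epsilons = []
    with torch.no_grad():
        for p in params:
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)
            epsilons.append(e)
    model.zero_grad()

    # Second forward/backward pass at the perturbed weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Restore the original weights, then take the base optimizer step.
    with torch.no_grad():
        for p, e in zip(params, epsilons):
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```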
Furthermore, recognizing that poultry tend to cluster, producing numerous overlapping detection boxes, we adopt the Soft-NMS algorithm as a pivotal enhancement to our model. Traditional Non-Maximum Suppression (NMS), while effective in many contexts, falls short in densely populated scenes: it indiscriminately discards all but the highest-scoring overlapping box, often eliminating valid detections [28]. In contrast, Soft-NMS takes a more nuanced approach [29]: by decaying the scores of overlapping boxes rather than removing them outright, it substantially improves the retention of accurate detection boxes for partially occluded animals. This adaptation not only addresses the inherent limitations of traditional NMS in complex detection scenarios, but also markedly enhances our model's ability to identify poultry accurately.
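To make the score-decay mechanism concrete, the following is a minimal NumPy sketch of the Gaussian variant of Soft-NMS; the helper names and default values are illustrative, assuming boxes in [x1, y1, x2, y2] format.

```python
import numpy as np

def _iou(box, others):
    """IoU between one box and an array of boxes ([x1, y1, x2, y2])."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_others = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_box + area_others - inter + 1e-12)

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        # Select the highest-scoring remaining detection.
        top = max(remaining, key=lambda i: scores[i])
        remaining.remove(top)
        keep.append(top)
        if not remaining:
            break
        # Decay neighbours by exp(-IoU^2 / sigma) rather than removing them.
        overlaps = _iou(boxes[top], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop detections whose score fell below the threshold.
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return keep, scores
```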
In summary, the contributions of this paper are as follows:
We have innovatively introduced few-shot object detection technology into smart agriculture, aiming to enhance the practical performance of models in poultry farming.
To address the issue of model performance fluctuations caused by non-smooth loss in poultry farming environments, we have incorporated the Sharpness-Aware Minimization (SAM) model. This optimization enhances the smoothness of the loss function during model training, leading to improved stability and accuracy of the model in complex backgrounds.
To tackle the challenges posed by the high-density (average occlusion rate > 10%) congregation of farm animals, we have adopted Soft-NMS as a replacement for the traditional NMS method, reducing the wrongful deletion of occluded targets and significantly improving the model’s detection performance in dense scenarios.
3. Results
3.1. Experimental Settings
In this study, the SSNFNet model is trained and validated on the poultry dataset. Ducks are designated as the base class, with chickens and fish as novel categories; the fish dataset is specifically chosen to represent aquatic life and to test the robustness of SSNFNet in diverse environments. The dataset consists of 500 images of ducks and 60 images each of chickens and fish. We use Faster R-CNN [32] with a ResNet-101 [33] backbone and a Feature Pyramid Network [35] as the detection model. Training proceeds in two stages: first, a base detector is trained on the duck dataset; second, the model is fine-tuned to adapt to and accurately identify the features of the new categories, chickens and fish. For each few-shot scenario, a small support set of K randomly selected images (where K is the number of shots, e.g., one-shot, two-shot) was assembled; each image in this support set depicts a single isolated object with no occlusion. All remaining images then served as the query set to verify the model's generalization under practical evaluation conditions. This phased training approach is designed to enhance the model's adaptability and accuracy on new categories. To demonstrate the stability of our method, each experiment was run three times using different seeds (2022–2024), and the mean and variance of the results were calculated.
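As an illustration of this sampling protocol, the following sketch builds a K-shot support set and the complementary query set under a fixed seed; the function name and data layout are hypothetical, not our actual data-pipeline code.

```python
import random

def build_few_shot_split(image_ids, k, seed):
    """Sample a K-shot support set; the remaining images form the query set.

    Mirrors the protocol described above: K images per shot setting
    (1, 2, 3, or 5), chosen at random under a fixed seed.
    """
    rng = random.Random(seed)
    support = rng.sample(image_ids, k)
    query = [i for i in image_ids if i not in support]
    return support, query

# Example: a three-shot split of the 60 chicken images under seed 2022.
chicken_ids = list(range(60))
support, query = build_few_shot_split(chicken_ids, k=3, seed=2022)
```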
For image processing, this study adopted a multi-scale input strategy to enhance the generalization ability of SSNFNet. During training, the minimum input size was sampled from a range of values: 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, and 800. This multi-size input approach helps the model learn to handle images of varying sizes. For testing, a consistent minimum input size of 800 was used to maintain uniformity in performance evaluation. Additionally, image cropping was enabled to accommodate different image sizes and aspect ratios, ensuring robust performance under various conditions. Through this setup, we aim to comprehensively evaluate SSNFNet on poultry datasets, especially those with significant differences in morphology, behavior, and habitat.
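Since FSCE builds on the Detectron2 framework, these multi-scale settings can be expressed with standard Detectron2 config keys, as in the following sketch; this is an illustration consistent with the sizes listed above, not our released configuration.

```python
from detectron2.config import get_cfg

# Illustrative Detectron2-style input settings matching Section 3.1.
cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
cfg.INPUT.MIN_SIZE_TEST = 800
cfg.INPUT.CROP.ENABLED = True  # random cropping for varied sizes and aspect ratios
```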
The experiments were conducted using the PyTorch 1.10.1 framework on a machine equipped with four NVIDIA RTX 2080 Ti GPUs (NVIDIA, Santa Clara, CA, USA; 11 GB of memory each), an Intel(R) Xeon(R) Platinum 8255C CPU (Intel, Santa Clara, CA, USA) running at 2.50 GHz with 48 virtual cores, and 40 GB of system memory (RAM). It is important to note that the accuracy of few-shot learning can be significantly influenced by the number of GPUs used during training. All few-shot object detection models were therefore trained and evaluated using the same four-GPU configuration to ensure the consistency, accuracy, and reliability of the experimental results.
In terms of training efficiency, each training epoch took approximately 10 s, and the models were trained for 200 epochs, so the full training process for each model required roughly half an hour. For inference, the model processed 126 images in 4 s on an NVIDIA RTX 3060 GPU, an approximate rate of 31 frames per second (FPS). We further validated these results through preliminary deployment experiments on Jetson Nano hardware at 800 × 600 resolution, where the model maintained detection accuracy while achieving 8 FPS, comparable to FSCE (8 FPS) and superior to HTRPN (6 FPS) on identical hardware. Eight detections per second provides sufficient responsiveness for real-world detection applications, and our design introduces no additional computational burden during inference compared to baseline methods. The original model has a size of 38.2 MB and requires approximately 4 GB of memory for inference. For deployment on edge devices, the model can be quantized, reducing its size to around 19 MB and its memory usage to approximately 2 GB, making it more suitable for resource-constrained environments.
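We do not prescribe a specific quantization procedure here; as one plausible route, the following sketch applies PyTorch's post-training dynamic quantization to a stand-in module, converting Linear weights to int8, which roughly halves on-disk size and is consistent with the 38.2 MB to ~19 MB reduction reported above.

```python
import torch
import torch.nn as nn

# A tiny stand-in module; in practice the trained detector would be loaded
# from its checkpoint. The quantization call itself is standard PyTorch.
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 4))

# Post-training dynamic quantization: int8 weights for Linear layers.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "ssnfnet_int8_demo.pth")
```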
To comprehensively evaluate the performance of the SSNFNet model in object detection tasks, we chose AP50 as our primary metric. AP50, a crucial indicator in object detection, measures a model's detection accuracy: a prediction is considered correct when the Intersection over Union (IoU) between the predicted and ground-truth bounding boxes reaches or exceeds 50%. The IoU is calculated as follows:

$$ \mathrm{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}} $$

Here, "Area of Intersection" is the area of overlap between the predicted and ground-truth bounding boxes, while "Area of Union" is the total area covered by both boxes, i.e., the sum of their individual areas minus the area of intersection. IoU ranges from 0 to 1, where 0 indicates no overlap and 1 indicates perfect overlap. This metric is widely used to evaluate the performance of object detection models, especially their localization precision.
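For reference, the following is a minimal sketch of this computation for two axis-aligned boxes in [x1, y1, x2, y2] format (illustrative code, not from our implementation).

```python
def compute_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Example: a prediction shifted 20 px from a 100x100 ground-truth box.
pred, gt = [20, 0, 120, 100], [0, 0, 100, 100]
print(compute_iou(pred, gt))  # ~0.667 -> counted as correct under AP50
```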
AP50 provides a quantitative measure of model performance under a more lenient condition than higher IoU thresholds, which helps assess the practicality and flexibility of the model in real-world applications. Especially in challenging few-shot learning scenarios, AP50 offers a clear standard for assessing recognition accuracy. Moreover, because AP50 is widely used in object detection, it makes our results directly comparable to, and a useful reference for, related work.
3.2. Comparative Experiments
We conducted a series of training and testing sessions on the poultry farming dataset using popular few-shot object detection models: TFA w/cos [34], FSCE [26], Meta R-CNN [36], DeFRCN [37], and HTRPN [38]. The experimental results are shown in Table 1 and Table 2. Although HTRPN occasionally surpassed FSCE in efficacy, its performance was significantly influenced by the number of GPUs, leading to considerable fluctuations in accuracy. Given this observation, we ultimately chose FSCE as the basis for our research: it demonstrated superior stability and accuracy, making it more suitable for our application needs in the poultry farming domain.
In our evaluation across one-shot, two-shot, three-shot, and five-shot settings, SSNFNet sets a new benchmark for few-shot object detection on this task. As detailed in Table 1 and Table 2, SSNFNet significantly surpasses existing models, and its strong performance on aquatic species such as fish underscores its versatility. Taking Table 2 as an example, in the three-shot setting SSNFNet achieved detection accuracies (mAP50) of 82.75% for chickens and 66.00% for fish, for an average accuracy of 74.38%. In the five-shot setting, SSNFNet raised the average accuracy to 81.93%, well beyond other few-shot detection models.
Beyond standard mAP50 comparisons, we evaluated detection robustness under varying localization precision requirements. Specifically, we report AP scores at finer-grained IoU thresholds, from AP50 to AP95, in Table 3. This analysis shows that SSNFNet not only achieves superior performance at conventional thresholds, but also maintains strong accuracy under stringent localization constraints (e.g., AP75 and AP95), highlighting its regression capability and detection stability. Moreover, to further validate the effectiveness of our method from different perspectives, we conducted additional evaluations on a challenging detection scenario with severe occlusions (the fish dataset), beyond the commonly used mAP metric. Specifically, we selected F1 score, recall, and precision as key evaluation metrics to provide a more comprehensive picture of detection performance; their definitions are given below. As shown in Table 4, all metrics improve consistently as the number of shots increases, indicating that our method offers stable and comprehensive gains in complex detection scenarios.
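For completeness, these metrics follow their standard definitions; we assume the conventional matching rule in which a detection counts as a true positive (TP) when its IoU with a ground-truth box is at least 0.5, with FP and FN denoting false positives and false negatives:

$$ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} $$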
Moreover, a comparative analysis with well-known traditional object detection frameworks, including RetinaNet [39], CenterNet [40], FCOS [41], EfficientDet [42], YOLOv5 [43], YOLOv7 [44], and YOLOv8 [45], further highlights SSNFNet's adaptability and performance. The results, presented in Table 1 and Table 2, show SSNFNet's consistent advantage over both traditional and few-shot object detection models across all few-shot settings tested. This superiority is of practical consequence, especially in the agricultural domain, where acquiring extensive labeled datasets is often prohibitive. SSNFNet's ability to deliver high accuracy with minimal training samples opens new possibilities for efficient, scalable deployment in varied and resource-constrained environments.
To illustrate the capability of our SSNFNet model, we provide visualizations, focusing on the five-shot scenario. As depicted in Figure 7, SSNFNet not only detects individual targets precisely, but also identifies a higher number of objects within the same image. This blend of high precision and the capability to handle high-density scenes underscores SSNFNet's utility for applications demanding meticulous attention to detail and high fidelity in object recognition.
To further validate the stability of our method, we extended the experimental setup from three random seeds (2022–2024) to five seeds (2021–2025) and visualized the results, as shown in Figure 8. Our method maintains consistent performance in both mean and variance, which not only confirms its robustness, but also highlights its potential for reliable deployment in open-world scenarios.
We also analyzed failure cases, as shown in Figure 9. Under dim lighting and extreme occlusion, the model tends to merge multiple overlapping fish into a single detection. Moreover, when a fish is extensively occluded, it is often missed entirely. These issues largely explain why the model performs worse on the fish dataset than on the chicken dataset.
3.3. Ablation Experiments
In the ablation study section of this paper, we systematically explore the effects of Soft-NMS and Sharpness-Aware Minimization (SAM) on enhancing the performance of our proposed SSNFNet model. All ablation experiments were conducted on a poultry breeding dataset under a five-shot setting, aiming to precisely evaluate the contributions of these techniques to the task of few-shot object detection.
We conducted four sets of experiments. The baseline FSCE model, without Soft-NMS or SAM, achieved an average accuracy of 73.66%. Introducing Soft-NMS without SAM raised the detection accuracy for chickens to 81.75%, with 66.56% for fish, improving the average accuracy to 74.16%. These results indicate that Soft-NMS enhances poultry detection through its more nuanced handling of overlapping detections.
Table 5 presents the results of our model ablation experiments.
The integration of SAM without Soft-NMS led to a notable improvement in fish detection accuracy, which increased to 72.90%, and chicken detection also improved to 85.52%. The average accuracy improved to 79.21%. This indicates that the SAM component significantly contributes to the model’s robustness, particularly in complex aquatic environments, where the background may introduce non-smooth loss challenges.
Our complete SSNFNet model, which incorporates both Soft-NMS and SAM, achieved the highest accuracies of 87.12% for chickens and 76.74% for fish, with an overall average accuracy of 81.93%. The combined effect of Soft-NMS and SAM in SSNFNet demonstrates a synergistic improvement, confirming that the integration of these two components can effectively enhance the model’s detection performance across different species in agricultural breeding scenarios.
3.4. Parameter Analysis
We also conducted parameter experiments; a detailed analysis underscored the importance of fine-tuning hyperparameters for optimal model performance. When testing one hyperparameter, all others were kept constant. By adjusting the rho value in SAM, the use of SAM-adaptive, the application of SAM-nesterov, the sigma value in Soft-NMS, and the score threshold of Soft-NMS, we observed significant differences in model performance across parameter combinations.
The rho parameter in SAM dictates the model's sensitivity to the local geometry of the loss function; a lower rho value perturbs the weights more gently, which in our experiments improved accuracy across categories. The SAM-adaptive parameter determines whether the optimizer scales the perturbation adaptively to fit the data characteristics. Enabling SAM-nesterov applies Nesterov-accelerated gradient (NAG) momentum, helping the model converge faster to better solutions. Soft-NMS adjusts the strictness of suppression through the sigma parameter, softening the penalty on highly overlapping boxes by reweighting their scores. The score threshold of Soft-NMS specifies the minimum score for retaining detections, and its adjustment balances the quantity and quality of detection outcomes.
As shown in Table 6, the model performs best on the chicken and fish datasets when SAM's rho is 0.01, SAM-adaptive is false, SAM-nesterov is true, the Soft-NMS sigma is 0.6, and the Soft-NMS score threshold is 0.9, achieving an average accuracy of 81.93%. Note that the optimal settings for these parameters may vary across datasets.
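For convenience, these best-performing settings can be summarized as follows; the key names are ours, chosen for illustration rather than taken from the actual configuration files.

```python
# Illustrative summary of the best settings reported in Table 6.
best_hparams = {
    "sam_rho": 0.01,               # perturbation radius of SAM
    "sam_adaptive": False,         # no element-wise (adaptive) scaling
    "sam_nesterov": True,          # Nesterov momentum in the base optimizer
    "soft_nms_sigma": 0.6,         # width of the Gaussian score decay
    "soft_nms_score_thresh": 0.9,  # minimum score retained after decay
}
```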
4. Discussion
4.1. Contribution to Intelligent Poultry Farming
This study introduces a specialized few-shot object detection model called SSNFNet for the poultry farming sector, achieving significant results. This approach allows for accurate identification and tracking of poultry, even with limited data samples, substantially improving the capability of farms to diagnose diseases early, monitor for abnormal behaviors, and optimize production efficiency. Automated monitoring systems enable real-time observation of poultry health and behavioral patterns, facilitating prompt detection and response to health issues, minimizing disease spread and impact. Moreover, the application of this technology aids in refining feeding strategies and living conditions, enhancing resource utilization, reducing waste, and ultimately increasing the overall productivity of poultry farming. This research not only provides an innovative method for intelligent poultry cultivation, but also offers valuable insights for practices in precision agriculture.
4.2. Comparison of Methods
In the domain of poultry farming, traditional object detection technologies face significant challenges due to their reliance on extensive annotated datasets for training models to achieve desired accuracy and generalization capabilities. This process often demands the collection and annotation of thousands of images, covering a variety of poultry species, behaviors, and potential farming environmental conditions, which introduces considerable upfront costs and time investment. However, the introduction of SSNFNet—a breakthrough based on few-shot object detection technology—marks a significant advancement. SSNFNet dramatically reduces the dependency on extensive annotated data, enabling the recognition of new species or behaviors with training on just a handful of images. This not only accelerates the deployment of models significantly, but also reduces costs markedly. Importantly, it elegantly addresses the complexity and variability of poultry farming environments, offering a more flexible and efficient solution.
In contrast, traditional object detection methods, when confronted with new scenarios or environmental changes, may require re-collecting extensive data and re-training the model, a process that is both time-consuming and costly. SSNFNet, by comparison, can quickly learn and adapt from a few examples. This flexibility and adaptability allow SSNFNet to swiftly accommodate new poultry breeds or previously unencountered behavior patterns, demonstrating superior generalization. In this way, SSNFNet provides a reliable tool for intelligent poultry farming, supporting ongoing monitoring and management in an ever-changing environment.
This contribution not only signifies technological progress, but also paves the way for new possibilities in the digital transformation of the poultry farming industry, establishing a robust foundation for future advancements. By diminishing reliance on extensive datasets, SSNFNet accelerates innovation and provides substantial support for animal welfare and disease management.
4.3. Limitations and Future Developments
Our study has made certain progress in exploring the application of few-shot object detection in the field of poultry farming, yet there are still some limitations, which also point the way for future research.
Firstly, regarding the generalizability and dataset scope of SSNFNet, our current dataset, while centered on chickens, ducks, and goldfish, encompasses notable environmental and acquisitional variability. Data were gathered under diverse lighting conditions at different times of day, with various acquisition devices that introduced sensor noise, and across distinct farming environments; for example, the chicken data came from three separate farms with different housing conditions. These variations allowed our experiments to assess the model's capability to extract robust deep semantic features, a critical aspect of effective few-shot learning, and SSNFNet demonstrated commendable robustness in handling them. However, we acknowledge that the geographic spread of our collection sites and the diversity of specific environmental factors (e.g., unseen housing structures, extreme weather conditions, or a far wider range of animal species) are not exhaustive. This could introduce biases if the model is deployed in regions or conditions starkly different from those in our training and testing data, potentially limiting its out-of-the-box generalizability to such novel settings. Investigating the model's operational boundaries under increasingly complex and varied scenarios will be a key focus of subsequent work.
Secondly, real-world deployment of SSNFNet in operational farm settings presents practical constraints that were not exhaustively addressed in this study. For instance, variations in sensor types, quality, age, and calibration across different farms, or even over time on the same farm, can lead to inconsistencies in input data, potentially affecting model performance. More critically, occlusion issues, where animals are partially or fully hidden by other animals, feeding troughs, drinkers, or structural elements of their enclosures, are frequent occurrences in densely populated poultry environments. Such occlusions can significantly impair the model’s detection accuracy and reliability. Future work should explicitly investigate strategies to enhance model robustness against these sensor variations and develop more sophisticated methods for handling occluded objects in real-world agricultural settings.
Thirdly, our proposed SSNFNet model adopts a two-stage training approach, where the first stage relies on a larger initial dataset to train the foundational network. This reliance limits the model’s adaptability in extreme few-shot scenarios where such initial large datasets are unavailable. Moving forward, we will explore more efficient few-shot learning strategies to reduce dependence on large-scale initial datasets, potentially investigating one-stage architectures or meta-learning approaches that are inherently more data-efficient.
Fourthly, the performance of the model is to some extent affected by computational resources, especially the number of available GPUs: we observed a significant improvement in performance as the number of GPUs increased. This suggests that, although the model's design considers computational efficiency, its performance advantage might diminish in resource-constrained situations, such as on lower-resource farms or edge computing devices. Future work will therefore focus on optimizing the model's structure, exploring model compression techniques, and refining the training process to achieve high performance even with limited computational resources, enhancing its accessibility and applicability in diverse farming contexts.
In summary, the journey ahead remains full of opportunities for innovation. Despite the constraints brought about by data scarcity, deployment complexities, and limited computational resources, which test the resilience of our methodologies, they also inspire creative solutions and development strategies. As we progress, the integration of more effective learning methods and the optimization of model architectures hold the promise of ushering in a new era of precision and flexibility in agricultural technology.