1. Introduction
According to the Food and Agriculture Organization of the United Nations (FAO), over 1600 chicken breeds are recognized globally, comprising about 63% of all poultry breeds [1,2]. These breeds exhibit significant diversity in morphology, physiology, and behavior [3], which enables farmers to select breeds that meet specific production requirements. The varying breeding challenges and distinct economic value associated with different chicken breeds necessitate precise breed identification for effective farm management. For instance, the broiler chicken industry typically selects breeds with rapid growth and superior meat quality [4], while the laying hen industry prefers breeds known for high egg production rates and robust adaptability [5]. Such precise selection is crucial for large-scale breeding operations to avoid hybridization at the farm level [6]. Additionally, gender determination is vital for managing poultry populations, as it ensures an optimal gender ratio within flocks, which is essential for improving production efficiency, promoting healthy growth, and increasing economic returns [2]. Therefore, precise breed and gender identification is fundamental to boosting the overall efficiency of the poultry farming industry [7].
The diversity of poultry breeds and the high similarity among certain breeds create significant challenges for effective classification, and the nuanced variations in physical characteristics and behavioral traits further complicate accurate identification. Traditional manual classification methods require extensive experience and suffer from inefficiency and high costs, making them unsuitable for modern large-scale breeding operations [8,9]. In addition, prolonged close contact between breeders and chickens poses a significant biosecurity risk [10].
In recent years, with the development of artificial intelligence and computer technology, an increasing number of modern technologies have been applied to agricultural production [11,12,13,14]. These applications have significantly accelerated the intelligentization of related industrial tasks [15,16]. By fine-tuning the YOLOv4 (You Only Look Once version 4) object detection algorithm, Gupta et al. [17] achieved recognition of multiple cow breeds with an accuracy of 81.07%. Jwade et al. [18] applied transfer learning to the VGG-19 model, fine-tuning it for downstream tasks, and established a sheep breed classifier with an average accuracy of 95.8%. For poultry breed and gender classification, Rahman et al. [19] constructed a dataset containing images of four major pigeon breeds and analyzed it with a convolutional neural network (CNN)-based architecture and transfer learning, achieving a final recognition accuracy of 95.33% during testing. Furthermore, Wu et al. [20] introduced SE attention to enhance the residual unit of ResNet-50, combining it with an improved activation function and optimizer to boost model performance, and reported an accuracy of 98.42% for chicken breed detection. To provide a comprehensive comparison of the advantages and disadvantages of existing intelligent methods, several major approaches are summarized in Table 1.
After a systematic analysis of the model's potential advantages for this task and of the experimental results, we developed a new image classification system based on the Swin Transformer [21] to address the above problems, aiming to identify chicken breed and gender accurately and efficiently. Our dataset was collected in real production scenarios, and we compared various classic image classification models and integrated other computer vision techniques, such as object detection and data augmentation, to enhance the performance of the classification system.
As typical black-box models, deep learning models have decision mechanisms that are often difficult to understand intuitively [22]. The emergence of interpretability techniques such as Grad-CAM (Gradient-weighted Class Activation Mapping) [23], SHAP (SHapley Additive exPlanations) [24], and LIME (Local Interpretable Model-Agnostic Explanations) [25] has alleviated this problem to some extent. These methods offer significant advantages, including enhanced transparency by revealing which parts of the input data influence the model's decisions, increased trustworthiness by allowing us to understand and verify the model's behavior, and the ability to identify and mitigate potential biases or errors within the model. Consequently, they can assist the application of deep learning techniques in agriculture. We also applied interpretability analysis to the proposed classification system, and the results confirm, to some extent, the effectiveness of the system for classification.
This study presents a novel image classification system for chicken breed and gender classification, utilizing deep learning models and computer vision techniques. The main contributions include the creation of a high-resolution image dataset containing multiple native Chinese chicken breeds, the incorporation of the Swin Transformer model with target detection, and data augmentation to improve classification performance. Additionally, interpretability analysis methods, such as Grad-CAM, were employed to improve the model's transparency and trustworthiness. The complete source code is available on GitHub (https://github.com/quietbamboo/breed-classification, accessed on 15 June 2025). The proposed system outperforms traditional models in real-world agricultural scenarios, offering an efficient and reliable solution for chicken breed and gender classification, with potential benefits for preserving local breeds and advancing the poultry industry.
2. Materials and Methods
2.1. Materials
2.1.1. Data Collection
In this study, we developed a high-resolution image dataset comprising 10,482 images from 13 commonly recognized native Chinese chicken breeds, each with a resolution of 2160 × 3840 pixels. Furthermore, we considered breed and gender as separate distinguishing factors for classification. Each breed was categorized into two distinct groups based on gender, resulting in a total of 26 categories across the 13 breeds. During image collection, approximately 30 individual chickens of each gender were photographed for each breed, with the number of images collected per individual being consistent. However, there were variations in the number of individuals across some of the breeds, resulting in a dataset that was not balanced. Additionally, the backgrounds, angles, lighting conditions, and postures of the chickens were randomly set for each image, aiming to introduce diversity and ensure the algorithm’s applicability in real-world scenarios.
Table 2 provides a detailed illustration of the distribution of individuals and image samples in this dataset.
2.1.2. Dataset Splitting
In this study, as illustrated in Figure 1a, the dataset was divided into training, validation, and test sets in a ratio of 64:16:20 to accurately evaluate the models' performance. Notably, because multiple images collected from the same chicken can be highly similar, all images from a single individual were treated as a single unit during data partitioning to prevent data leakage; that is, images in different subsets were never collected from the same chicken. This division ensured that each subset represented the overall data distribution and included images from each category, thereby facilitating comprehensive training and rigorous evaluation of the model. Additionally, fixed random seeds were used during splitting so that the dataset was partitioned consistently, ensuring that each experiment was conducted under identical data conditions and guaranteeing the fairness of comparative analyses.
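To make the individual-level partitioning concrete, the following sketch shows one way to perform such a grouped 64:16:20 split with a fixed seed using scikit-learn; the helper name, the list-based inputs, and the use of GroupShuffleSplit are illustrative assumptions rather than the authors' actual implementation.

```python
# Sketch: split images 64/16/20 while keeping all images of one chicken in a single subset.
from sklearn.model_selection import GroupShuffleSplit

def split_by_individual(image_paths, labels, individual_ids, seed=123):
    """Return train/val/test index lists; `individual_ids` groups images by chicken."""
    # First split off the 20% test set at the individual level.
    outer = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
    trainval_idx, test_idx = next(outer.split(image_paths, labels, groups=individual_ids))

    # Then split the remaining 80% into 64% train / 16% validation (20% of the remainder).
    inner = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
    sub_groups = [individual_ids[i] for i in trainval_idx]
    train_sub, val_sub = next(inner.split(trainval_idx, groups=sub_groups))

    train_idx = [trainval_idx[i] for i in train_sub]
    val_idx = [trainval_idx[i] for i in val_sub]
    return train_idx, val_idx, list(test_idx)
```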
2.2. Preprocessing
Images captured in real-world scenarios frequently exhibit various uncertainties, such as complex backgrounds, lighting variations, different shooting angles, and scale changes. These factors may significantly impact the results of experiments. To address these issues, a series of preprocessing operations, including object detection and multiple data augmentation techniques, were implemented to standardize the characteristics and quality of images within the dataset, as shown in Figure 1b.
2.2.1. Object Detection
As shown in Figure 2, the images in the dataset were derived from real breeding environments, which may have resulted in the presence of distracting elements such as other chickens or the background of the coop. Therefore, we used an object detection approach based on YOLOv8 (https://github.com/ultralytics/ultralytics, accessed on 10 September 2024) to accurately isolate the target subject. This algorithm is particularly adept at real-time object detection in complex images [26] due to its rapid processing speed and high accuracy.
Relevant hyperparameters were iteratively optimized over multiple epochs, and appropriate data augmentation parameters were selected to further enhance model training. Upon completion of training, the object detection model was employed to extract targets from images across the entire dataset. Specifically, the bounding rectangle of the chicken in the original image was inferred by the trained detector, and a new image was then cropped based on this rectangle, effectively excluding complex background information. Additionally, to meet the requirement of square inputs for subsequent layers, edge zero-padding was applied to the cropped images, ensuring no distortion of the chicken's features. This approach also improves computational efficiency, as padding with zero values prevents activation of units in the following layers, reducing unnecessary computations [27].
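As an illustration of this cropping step, the sketch below uses the Ultralytics YOLOv8 API to detect the chicken, crop its bounding box, and zero-pad the crop to a square; the weight file name and confidence threshold are assumptions, not the values used in the study.

```python
# Sketch: detect the chicken, crop the bounding box, and zero-pad to a square image.
import cv2
import numpy as np
from ultralytics import YOLO

detector = YOLO("chicken_detector.pt")  # assumed: weights fine-tuned on the annotated boxes

def crop_and_pad(image_path, conf=0.5):
    image = cv2.imread(image_path)
    result = detector(image, conf=conf, verbose=False)[0]
    if len(result.boxes) == 0:
        return None  # no chicken detected; handle upstream

    # Take the highest-confidence box (xyxy pixel coordinates).
    best = int(result.boxes.conf.argmax())
    x1, y1, x2, y2 = result.boxes.xyxy[best].cpu().numpy().astype(int)
    crop = image[y1:y2, x1:x2]

    # Zero-pad the shorter side so the crop becomes square without distorting the chicken.
    h, w = crop.shape[:2]
    side = max(h, w)
    square = np.zeros((side, side, 3), dtype=crop.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = crop
    return square
```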
2.2.2. Data Augmentation
We applied a series of data augmentation techniques to increase the variety and diversity of the training dataset, which helped the model generalize better and reduced the risk of overfitting [28]. These techniques included random cropping and resizing (crop scale range: 0.6–1.0), random horizontal flipping (flip probability: 0.5), random rotation (rotation angle range: ±180°), and adjustments to brightness, contrast, saturation, and hue (jitter range for the four image attributes: 0.8–1.2). Random cropping and resizing adjusts each image to a predetermined pixel dimension while preserving the flexibility of essential features, thereby enhancing the model's capacity to capture significant feature information. Random horizontal flipping and random rotation introduce variability by flipping images horizontally and rotating them at random angles, respectively, thereby reducing the model's dependency on specific image orientations or configurations. Adjustments to brightness, contrast, saturation, and hue simulate different lighting and color conditions, aiding the model's performance in diverse real-world environments.
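A minimal torchvision version of this augmentation pipeline, using the parameters listed above, could look as follows; note that torchvision's ColorJitter requires hue factors within [-0.5, 0.5], so the hue value below is an illustrative assumption rather than the 0.8–1.2 range applied to the other three attributes.

```python
# Sketch: training-time augmentation pipeline with the parameters described above.
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(512, scale=(0.6, 1.0)),   # random crop and resize
    transforms.RandomHorizontalFlip(p=0.5),                # random horizontal flip
    transforms.RandomRotation(degrees=180),                # rotation in [-180, 180] degrees
    transforms.ColorJitter(brightness=(0.8, 1.2),
                           contrast=(0.8, 1.2),
                           saturation=(0.8, 1.2),
                           hue=0.1),                       # assumed small hue jitter
    transforms.ToTensor(),
])
```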
2.3. Model
In recent years, deep learning networks have been extensively applied in various image processing tasks due to their unique structures and outstanding performance. This study compares eight state-of-the-art (SOTA) image classification models: AlexNet [29], VGG-16 [30], ResNet-50 [31], MobileNetV3 [32], RegNet [33], Vision Transformer (ViT) [34], Swin Transformer [21], and YOLOv8. AlexNet is one of the earliest convolutional neural networks and marked a major breakthrough in image recognition in 2012 by adopting ReLU activation and multi-GPU training. VGG-16 builds on this by stacking small 3 × 3 convolutional filters, enabling deeper networks with improved feature extraction. However, increasing depth led to the problem of vanishing or exploding gradients [35,36], which ResNet-50 solved by introducing residual connections, allowing deeper architectures to train effectively. MobileNetV3 is a lightweight network optimized through neural architecture search for efficient deployment on mobile and edge devices. RegNet was introduced in 2020 and provides a systematic and scalable framework for model design, balancing accuracy and efficiency across various settings. That same year, Vision Transformer (ViT) pioneered the use of self-attention mechanisms on image patches, capturing global context and showing strong performance on large-scale datasets. Building on ViT, Swin Transformer introduces a hierarchical design with shifted windows to enable efficient self-attention over local regions. This reduces computation and allows effective processing of high-resolution images. As a result, it scales well across tasks from classification to detection, segmentation, and 3D scene understanding [37,38,39]. YOLOv8 integrates advanced detection techniques, such as anchor-free prediction and improved feature pyramids, delivering both high speed and accuracy for real-time vision applications.
We chose these specific models to ensure that our comparison reflects a wide range of image classification methods while maintaining a balance between model novelty, efficiency, and architectural diversity. Other well-known models, such as DenseNet, while very effective, were not included because the selected models generally cover their design principles. These eight models are evaluated in the classifier component, as illustrated in Figure 1c, to develop an effective system for chicken breed and gender classification.
This study aims to identify the most effective and comprehensive model for breed and gender classification in chickens. To this end, a series of optimization strategies was employed to enhance model performance and better adapt to the classification task. First, the output layer of each model was adjusted to match the number of target categories in this task. Next, we conducted extensive comparative experiments on the key hyperparameters, as these parameters significantly impact model performance [40]. All models were fine-tuned to their optimal hyperparameter settings to the best of our ability before comparison; the tuned parameters included input size, batch size, learning rate (lr), optimizer, and the number of training epochs. Second, because pre-training can usually accelerate model convergence and significantly affect the final performance of models [29,41], we performed ablation experiments to assess the contribution of pre-trained weights. Similarly, we designed comparative experiments to evaluate the effectiveness of the preprocessing techniques used. In addition, an early stopping mechanism was introduced to prevent overfitting: training was terminated if the validation accuracy did not improve over 100 consecutive epochs, indicating that the model had likely reached its optimal performance. Finally, we applied visualization techniques to the two best-performing models to analyze and interpret their classification processes, providing further insights into their decision-making mechanisms.
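The early stopping rule can be expressed as a simple training-loop sketch; train_one_epoch and evaluate are assumed helper functions, and the loop below is illustrative rather than the authors' actual code.

```python
# Sketch: stop training once validation accuracy has not improved for `patience` epochs.
def fit(model, train_loader, val_loader, optimizer, criterion,
        max_epochs=1000, patience=100):
    best_acc, epochs_without_improvement = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_loader, optimizer, criterion)  # assumed helper
        val_acc = evaluate(model, val_loader)                       # assumed helper
        if val_acc > best_acc:
            best_acc, epochs_without_improvement = val_acc, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation accuracy plateaued for `patience` consecutive epochs
    return best_acc
```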
After completing hyperparameter tuning, we identified a configuration set that likely leads to optimal model performance. Specifically, all models were trained using the cross-entropy loss function and the Adam optimizer, with a random seed fixed at 123, a batch size of 32, a learning rate of , and a maximum of 1000 training epochs. In addition, we explored the use of a criterion with weighted coefficients and designed a comparative experiment to evaluate its effect on the model's performance. All models were initialized with ImageNet-1K pre-trained weights released by the original authors. Batch normalization was applied in all models to stabilize training by normalizing the activations of each layer, improving convergence. However, dropout was not used, as its regularization effect was not deemed necessary given the characteristics of the dataset. For the Swin Transformer model, the configuration included a patch size of 4, an embedding dimension of 96, layer depths of (2, 2, 6, 2), attention heads of (3, 6, 12, 24), a window size of 7, an MLP ratio of 4, and a stochastic depth rate of 0.1.
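For illustration, the shared configuration could be set up as follows, assuming the Swin Transformer is instantiated through the timm library; the variant name is an assumption matching the configuration listed above (patch size 4, window 7, embedding dimension 96, depths (2, 2, 6, 2)), and the learning rate shown is an assumed placeholder.

```python
# Sketch: shared training configuration (seed, loss, optimizer) and Swin instantiation.
import torch
import timm

torch.manual_seed(123)  # fixed random seed used for all experiments

model = timm.create_model(
    "swin_tiny_patch4_window7_224",   # assumed timm variant matching the listed config
    pretrained=True,                  # ImageNet pre-trained weights
    num_classes=26,                   # 13 breeds x 2 genders
    drop_path_rate=0.1,               # stochastic depth rate
)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate value is assumed
```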
2.4. Evaluation Metrics
Four evaluation metrics suitable for multi-class classification tasks were selected to comprehensively evaluate the model's performance. The calculation formulas are as follows. In these formulas, TP (True Positive) represents the number of samples correctly predicted as positive, while TN (True Negative), FP (False Positive), and FN (False Negative) are defined analogously. Additionally, $n$ is the total number of sample categories, and $w_i$ is the weight of the $i$-th category, calculated by dividing the actual number of samples in that category by the total number of test samples.
Accuracy (ACC) represents the proportion of samples whose predicted results match the actual labels; the recall rate indicates the proportion of samples correctly identified as class $i$ out of the total actual samples of that class. The calculation formula is Equation (1). In the formula, $M_{ii}$ and $M_{ij}$ are elements of the confusion matrix, representing the number of correct predictions for category $i$ and the number of samples whose actual category is $i$ and predicted category is $j$, respectively.
Weighted Recall (WR) is the recall rate weighted by the proportion of each category in the total sample, and can be calculated using Equations (2) and (3). Equations (1) and (3) yield equivalent results, meaning that ACC and WR have the same value in the current multi-class task. Precision indicates the proportion of correctly predicted positive samples out of all predicted positive samples, and Weighted Precision (WP) is the precision weighted by the category weights, as shown in Equations (4) and (5).
The F1 score is the harmonic mean of precision and recall, used to measure the balanced classification performance of a model for a single category. The Weighted F1 score [42] is an overall metric obtained by weighting the F1 scores of each category by their sample size, used to measure model performance on imbalanced datasets. The calculation formulas are given in Equations (6) and (7).
The Matthews Correlation Coefficient (MCC) [43] is a composite indicator for evaluating overall model performance, considering all four entries of the confusion matrix (TN/TP/FN/FP); its calculation formula is shown in Equation (8). In the formula, $c$ represents the number of all correct predictions; $s$ represents the total number of samples in the test set; $p_i$ represents the total number of samples predicted to be of class $i$; and $t_i$ represents the total number of samples actually of class $i$.
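For reference, the quantities defined in this subsection correspond to the following standard formulas; these are reconstructions based on the definitions above, and the symbol choices ($M_{ij}$, $w_i$, $p_i$, $t_i$) are assumed rather than taken from the original display equations.

$$\text{ACC} = \frac{\sum_{i=1}^{n} M_{ii}}{\sum_{i=1}^{n}\sum_{j=1}^{n} M_{ij}} \quad (1) \qquad \text{Recall}_i = \frac{M_{ii}}{\sum_{j=1}^{n} M_{ij}} \quad (2) \qquad \text{WR} = \sum_{i=1}^{n} w_i\,\text{Recall}_i \quad (3)$$

$$\text{Precision}_i = \frac{TP_i}{TP_i + FP_i} \quad (4) \qquad \text{WP} = \sum_{i=1}^{n} w_i\,\text{Precision}_i \quad (5)$$

$$F1_i = \frac{2\,\text{Precision}_i\,\text{Recall}_i}{\text{Precision}_i + \text{Recall}_i} \quad (6) \qquad \text{Weighted F1} = \sum_{i=1}^{n} w_i\,F1_i \quad (7)$$

$$\text{MCC} = \frac{c\,s - \sum_{i=1}^{n} p_i\,t_i}{\sqrt{\left(s^2 - \sum_{i=1}^{n} p_i^2\right)\left(s^2 - \sum_{i=1}^{n} t_i^2\right)}} \quad (8)$$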
2.5. Development Environment
The development and testing platform for the algorithms involved in this study was a high-performance computing server equipped with an AMD EPYC 7742 64-core Central Processing Unit (CPU), 86 GB of runtime memory, 350 GB of hard disk capacity, an NVIDIA A800 Graphics Processing Unit (GPU), and Ubuntu 22.04. The programming environment included OpenCV 4.9.0, Python 3.8.0, Torch 1.8.0, Torchvision 0.9.0, and Ultralytics 8.2.27. The labeling software used for annotating the chickens' bounding boxes in the images was anylabeling (https://github.com/vietanhdev/anylabeling, accessed on 1 August 2024).
3. Results and Discussion
3.1. Swin Transformer Achieves Superior Accuracy and Robustness in Comparative Model Evaluation
In this study, we compared eight representative deep learning image classification models, AlexNet, VGG-16, ResNet-50, MobileNetV3, RegNet, ViT, YOLOv8, and Swin Transformer, aiming to develop a more accurate and efficient system for chicken breed and gender classification. In the experiments, the key hyperparameters mentioned in Section 2.3 were optimized for each model, and the best results obtained after tuning are shown in Table 3.
The evaluation results indicate that, after targeted tuning, all models achieved test accuracies above 90%. These results validate the effectiveness of the data collected in real scenarios and demonstrate the feasibility of using deep learning models for this task. The comprehensive performance of the Swin Transformer was better than that of the other models. Specifically, the Swin Transformer achieved test results of 99.38% for ACC, 99.39% for WP, 99.37% for Weighted F1, and 99.34% for MCC. The Swin Transformer demonstrated a significant improvement over ViT across multiple accuracy metrics. Notably, the ACC of the Swin Transformer surpassed that of ViT by 0.72%. We hypothesize that this improvement is primarily due to the Swin Transformer's shifted-window self-attention mechanism, which better captures both local details and global context in images, leading to more precise handling of complex image tasks.
Furthermore, compared to CNN-based models, the Swin Transformer also exhibited superior performance across various metrics. For instance, the ACC of the Swin Transformer was 0.96% higher than that of ResNet-50. This performance enhancement can likely be attributed to the Swin Transformer's multi-scale feature fusion strategy, which effectively integrates local and global feature information, thereby improving its ability to process visual features at different scales. To further illustrate the training dynamics of the Swin Transformer, Figure A1 (see Appendix A) presents the accuracy and loss curves for each training epoch, indicating stable convergence and continued performance improvement. In addition, Figure A2 (see Appendix A) presents the confusion matrix of the Swin Transformer on the test set, showing strong diagonal dominance. This reflects consistently high accuracy across all chicken breed and gender categories, with no evident class bias. These results confirm the model's robustness, fairness, and strong generalization in fine-grained classification.
In consideration of the needs of real-world agricultural scenarios, we also report GPU inference time, CPU inference time, maximum GPU memory usage, and model parameter count. The comparison shows that the Swin Transformer was at a disadvantage in terms of parameter count, inference time, and maximum GPU usage. Despite these drawbacks, the developed classification system must operate in real-world agricultural scenarios, where misclassification of breed and gender could result in severe consequences, such as improper allocation of resources. We therefore selected the Swin Transformer for this task due to its excellent overall performance.
In the current farm environment, we have sufficient computing resources to support its use. When resources are limited, however, opting for a lightweight model such as MobileNetV3 or reducing the image resolution would be a good alternative. All models were trained within 10 h and are suitable for deployment in practical applications. To leverage the full potential of the Swin Transformer, the model used in the experiments is the Large version, pre-trained on images with a resolution of 384 × 384 pixels and 22,000 categories; the images are divided into 4 × 4 patches, and the window size for attention computation is set to 12 × 12. Additionally, to support a more comprehensive performance comparison, Table A1 (see Appendix A) presents the results of the image classification models evaluated on input images resized to 512 × 512 pixels, enabling further analysis of resolution effects and model differences.
3.2. Target Detection and Data Augmentation Enhance Performance in Chicken Breed and Gender Classification
In this study, we conducted several experiments to analyze the impact of target detection and data augmentation on the chicken breed and gender classification task, aiming to explore the effectiveness of these preprocessing techniques in improving model performance. The results of these experiments are shown in Table 4.
The effectiveness of the target detection module was validated by replacing the original data, which included complex environmental information, with data identified and cropped through target detection as the model input. As illustrated in Table 4, the accuracy reached 98.81%, a 1.64% increase after applying target detection. This improvement can likely be attributed to the ability of target detection to better localize the relevant regions of interest within the images. By focusing on these critical areas, the model can reduce noise and irrelevant background information, leading to more accurate predictions.
We incorporated the data augmentation process described above into the data preprocessing stage to evaluate the effectiveness of the data augmentation module. As shown in Table 4, the accuracy reached 97.84%, a 0.67% improvement after applying data augmentation. This improvement is likely due to the ability of data augmentation to increase the diversity of the training dataset by introducing transformations such as random cropping, rotation, flipping, and color adjustment. By exposing the model to a wider range of variations in the training data, data augmentation helps the model generalize better to unseen data, thereby reducing overfitting and improving overall predictive accuracy. When both target detection and data augmentation were applied in preprocessing, a final accuracy of 99.38% was achieved. These results demonstrate that applying target detection and data augmentation to the chicken breed and gender classification task is both effective and reliable. In addition, to verify the effectiveness of the weighted-coefficient strategy, we designed a similar comparative experiment. As indicated in Table 4, the accuracy decreased slightly from 99.38% to 99.19% after incorporating the weighted coefficients. The possible reason is that, although the categories are not perfectly balanced, the imbalance is not severe enough to necessitate special handling, so the effect of the weighting coefficients on the model's performance is limited in this case.
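As a sketch of this weighted-coefficient variant, class weights can be passed directly to PyTorch's cross-entropy criterion; the inverse-frequency weighting shown here is an assumption, since the exact coefficient scheme is not specified above.

```python
# Sketch: build a class-weighted cross-entropy criterion from the training labels.
import torch

def weighted_criterion(train_labels, num_classes=26):
    counts = torch.bincount(torch.as_tensor(train_labels), minlength=num_classes).float()
    weights = counts.sum() / (num_classes * counts)  # assumed inverse-frequency weights
    return torch.nn.CrossEntropyLoss(weight=weights)
```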
3.3. Appropriate Image Input Size Improves the Accuracy of the Classification Model
The input size of the model represents the resolution of the images used for training and validation, which often has a significant impact on the model's performance [44,45]. Higher-resolution images can provide more information but also increase computational complexity and the impact of noise. Currently, most image classification models adopt an input size of 224 × 224 for training and validation. However, this standardized size may limit performance on tasks that require capturing fine-grained image details: in such tasks, the differences between samples are subtle, these details are crucial for classification, and smaller image sizes might lead to their loss. To evaluate the potential impact of this issue, we conducted tests using three different input sizes: 224 × 224, 512 × 512, and 1024 × 1024. For each model, three sets of comparative experiments were conducted to evaluate the effect of input size on performance and to determine the optimal size. The results are presented in Table 5.
Table 5 shows that increasing the input size from 224 × 224 to 512 × 512 improves the classification accuracy across different models to varying degrees. Notably, Swin Transformer achieved the highest accuracy of 99.38% after an increase of 1.49%. This general improvement in accuracy may be attributed to the higher resolution of input images providing more detailed information, enabling the models to make more precise predictions. However, when the input size was further increased to 1024 × 1024, the models’ performance diverged. Specifically, ResNet-50, MobileNetV3, and YOLOv8 continued to improve their accuracy, reaching 98.42%, 97.79%, and 98.27%, respectively. In contrast, other models experienced varying degrees of accuracy decline with this adjustment. In summary, Swin Transformer performed best at a resolution of 512 × 512, achieving 99.38% accuracy, and maintained high accuracy even at 1024 × 1024, demonstrating good adaptability to different resolutions.
The evaluation results indicate that subtle differences between similar samples may be crucial for the accurate classification of chicken breed and gender. This also partially explains why the classification accuracy improves with an increase in input size. However, the results also demonstrate that an excessively large input size may amplify the impact of noise or features with limited generalization capability, leading to a decline in model performance. As the classification system proposed in this study was developed using Swin Transformer, the input size was set to 512 × 512 to better meet the requirements of this specific task.
3.4. Application of Pre-Trained Weights Enhances Classification Model Performance
Benefiting from the rich visual knowledge learned from large image datasets, pre-trained models that are fine-tuned with task-specific data tend to perform better on downstream tasks than models without pre-trained weights [35]. Additionally, utilizing pre-trained weights can enhance model robustness and expedite convergence [41]. Therefore, to further explore the impact of pre-trained weights on model performance, we compared the performance of the eight image classification models with and without pre-trained weights. The results are illustrated in Figure 3.
The results illustrated in the bar chart indicate that the application of pre-trained weights improved the prediction accuracy across the eight models used in the experiment to varying degrees. Notably, the lightweight network MobileNet-V3 experienced a 10.41% increase in accuracy. Overall, these results suggest that pre-trained weights can effectively enhance the performance of various models in the current task. The improvement in pre-trained model performance is likely attributable to the extensive visual features acquired from large-scale image datasets during the pre-training phase, enabling the models to adapt more effectively to the specific task. Consequently, we opted to use pre-trained weights to further enhance the model’s classification performance.
3.5. Swin Transformer Exhibits Superior Performance in ROI Visualization, Supporting Previous Experimental Results
In this study, we applied multiple deep learning models to the classification of chicken breeds and genders. Many models demonstrated exceptional performance, comparable to human experts specialized in chicken breeds. Additionally, the evaluation results revealed that all eight models showed accuracy improvements after increasing the image input size from 224 × 224 to 512 × 512. As typical black-box models, however, deep learning models produce predictions that are often difficult to interpret intuitively.
Therefore, an interpretability analysis was conducted to provide an intuitive understanding of the models' classification behavior and of the effect of image input size on model performance. This analysis visualized the feature information extracted by the deep learning models, highlighting the contribution of different body parts of the chickens to the models' decisions. The Grad-CAM [23] technique was used to visualize the ROI (region of interest) identified by the model, with the results shown in Figure 4. The varying shades of color indicate the extent of each region's contribution to the classification decision: redder colors denote greater contributions, while bluer colors denote lower ones.
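For readers wishing to reproduce this kind of visualization, the sketch below implements Grad-CAM with forward/backward hooks; it is illustrated on ResNet-50's last convolutional stage for simplicity (an assumption, since the figures in this study are generated for RegNet and Swin Transformer, and transformer backbones additionally require reshaping the token sequence into a 2-D grid before pooling).

```python
# Sketch: Grad-CAM heatmap via hooks on a convolutional backbone (ResNet-50 shown).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True).eval()
activations, gradients = {}, {}

# Cache the feature maps and their gradients flowing through the chosen layer.
model.layer4.register_forward_hook(lambda m, i, o: activations.update(value=o.detach()))
model.layer4.register_full_backward_hook(lambda m, gi, go: gradients.update(value=go[0].detach()))

def grad_cam(image_tensor, class_idx=None):
    """image_tensor: normalized (1, 3, H, W) input; returns an (H, W) heatmap in [0, 1]."""
    logits = model(image_tensor)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    # Weight each channel by its average gradient, combine, apply ReLU, and upsample.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image_tensor.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]
```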
The ROIs of RegNet and Swin Transformer were visualized for image input sizes of 224 × 224 and 512 × 512, and four sets of comparative experiments were conducted on multi-class samples. The results show that all four model–resolution configurations exhibit marked differences in the ROIs for different chicken breeds and genders, closely related to the regions humans focus on when classifying. This finding confirms the effectiveness of these models in recognizing visual features specific to breeds and genders. The models extract more, and more accurate, key features at an input size of 512 × 512 than at 224 × 224, indicating that appropriately increasing the input resolution supplies the model with finer features and thus facilitates more accurate and effective classification to some extent. Additionally, at both input resolutions, the Swin Transformer demonstrates more accurate and effective ROIs on the chickens than RegNet, which is consistent with the results illustrated in Table 5.
Furthermore, the ROI results for the fourth group of samples in Figure 4 indicate that after increasing the model's input size, local features are further refined, while background information also exerts some influence on the model's classification behavior. A possible reason is that, although the influence of the background was somewhat mitigated by the individual-level grouping used to divide the dataset and by the target detection strategy, the impact of the background on image classification remains non-negligible for some samples. Therefore, we aim to explore techniques such as image segmentation to further isolate the effect of the background in future work.
4. Conclusions
This study constructed a high-resolution image dataset of 13 common Chinese native chicken breeds based on real production scenarios. An innovative classification system was developed using the Swin Transformer, incorporating optimization strategies such as data preprocessing, object detection, and data augmentation to maximize performance. With these strategies, the system achieved superior results, with the Swin Transformer outperforming the other baseline models and demonstrating its suitability for practical classification tasks. Although the model has some limitations in computational cost and inference speed, these differences do not significantly affect its overall effectiveness. Furthermore, interpretability analysis was applied to better understand the model's decision-making mechanisms, revealing that the model focused on distinct regions specific to each breed, further validating the reliability of its classification decisions. In conclusion, the developed system offers an effective solution for enhancing the economic benefits of the poultry industry and preserving local breeds, providing a valuable tool for the precise classification of poultry breeds and genders.