1. Introduction
One of the most exciting and promising areas in artificial intelligence (AI) is computer vision, which gives machines the ability to see, interpret, and understand the world around them. This area is growing rapidly, with applications across a variety of industries, including robotics, self-driving cars, health analytics, livestock tracking and monitoring, and security [1].
However, much work remains to develop an intelligent classifier that can identify similar-looking livestock. Here, we correctly identify goats or sheep using an intelligent classifier based on computer vision. We reviewed recent research in this field to find a suitable algorithm for developing such a classifier. Chen et al. [2] analyzed the process of image segmentation, recognition, and analysis of cattle behaviors through computer vision and modern deep learning models.
Moreover, an assessment was conducted on the progression of notable investigations within this domain, including the development of resilient algorithms for identifying and detecting cattle behaviors across various stages of growth. They then quantified the cattle's behavioral recognition results and built a powerful monitoring system for their growth, health, and wellbeing [3]. Hossain et al. [1] conducted a systematic examination of diverse machine learning algorithms, such as Support Vector Machine (SVM), k-Nearest Neighbor (KNN), and Artificial Neural Network (ANN), within the context of cattle identification. Additionally, they investigated the suitability of Convolutional Neural Network (CNN), Residual Network (ResNet), Inception, You Only Look Once (YOLO), and Faster R-CNN models to enhance livestock-monitoring systems.
On the other hand, management is crucial for monitoring cattle growth to improve production and welfare, which can be particularly difficult with novel breeds. A meta-learning system that uses machine learning methods has proven successful in detecting irregularities in the weight gain of cattle during the fattening process, leading to continuous improvement in its effectiveness. This was illustrated in a study carried out at the "El Rosario" farm in Monteria, Colombia, where an R² value of 90.8% was achieved [4].
The utilization of a machine learning ensemble technique is crucial for incorporating spatial variance in the adaptability of livestock to drought conditions, thereby improving the statistical models devised to anticipate drought-linked yield reductions [5]. This could potentially enhance the predictive precision of deep learning models, enabling the integration of all the aforementioned deep learning models into an ensemble to improve accuracy in cattle identification. The results of this work have the potential to improve livestock comfort by offering a more efficient and accurate method for identifying and managing animals. Through the use of deep learning models such as YOLOv8, which accurately distinguishes between sheep and goat images based on distinct features, the system ensures minimal physical intervention, reducing stress among the animals. Additionally, this automated system can be adapted to account for various environmental factors and animal characteristics, such as different diets for different stages of life (juvenile, adult, gestation, and lactation), as well as breed-specific needs [6].
Separating sheep and goats can offer several advantages, depending on the management goals and specific conditions of the farm. Sheep and goats, while both small ruminants, have different nutritional, behavioral, and health needs. Goats, for example, are more selective browsers and tend to seek out better food, sometimes leading to competition with sheep if food is scarce. This competition could result in stronger animals dominating the resources, impacting the weaker ones. Additionally, the reproductive cycles and nutritional requirements of sheep and goats differ, especially during gestation and lactation, necessitating different management strategies for optimal production. Health-wise, although disease transmission between the two species is uncommon, separating them can help in tailoring disease prevention and treatment strategies based on species-specific susceptibilities [7].
The aim of this work is to automatically identify individual sheep or goats based on their physical characteristics, including muzzle pattern, coat pattern, or ear pattern. This can be achieved by taking pictures of goats or sheep with a camera and then using computer vision algorithms to isolate and analyze their characteristics [8].
Through this project, herd management can be carried out intelligently and without harming the livestock. This research is very useful for farmers who want to track and monitor the health and welfare of their livestock, and it can also be used to count sheep and goats in drone footage. The efficiency and precision of farming, as well as the quality of life of farmers, can be radically improved.
Literature Review
The advancement of the identification system was heavily dependent on the significance of cattle muzzles and fur patterns. It is worth mentioning that commonly utilized methods for feature extraction, including Local Binary Pattern (LBP), Speeded Up Robust Features (SURF), Scale-Invariant Feature Transform (SIFT), and Inception or Convolutional Neural Network (CNN), were recognized [9].
The Convolutional Neural Network (CNN) model was utilized to extract the characteristics of cattle from a rear perspective, and the Long Short-Term Memory (LSTM) model was utilized to capture their temporal details. The objective of this methodology was to develop a sophisticated identification system, as emphasized in the research [10]. Qiao et al. presented a one-shot learning-based approach with pseudo-labeling for cattle video segmentation in smart livestock farming. The method uses an Xception-based Fully Convolutional Network (Xception-FCN) to extract features and a pseudo-labeling module to improve segmentation accuracy by leveraging unlabeled data. The system was evaluated on a challenging feedlot cattle video dataset, achieving an 88.7% mean intersection-over-union score and 80.8% contour accuracy. These results demonstrate that the proposed approach outperforms state-of-the-art methods, providing a reliable solution for livestock video segmentation [11].
Ahmed et al. proposed a deep transfer learning-based animal face identification system using a hybrid approach that automates the identification of livestock animals. The system leverages YOLOv7 for detecting faces and muzzles and applies the SIFT algorithm to extract key features, which are then stored in a database for future matching. The method was tested on a dataset, using FLANN to match extracted features against the database, achieving an impressive 99.7% accuracy in muzzle detection and 100% accuracy in livestock identification. This demonstrates the effectiveness of the system for real-time livestock identification, with practical applications in modern agriculture [12].
Md Sultan Mahmud et al. conducted a detailed examination of the use of deep learning in livestock identification and health monitoring. They identified the Convolutional Neural Network (CNN) as the most versatile and comprehensive model, which integrates with Long Short-Term Memory (LSTM) networks, Mask region-based Convolutional Neural Network (Mask-RCNN), and Faster-RCNN [13].
This research also observed that ResNet is the most widely used pre-trained model for automated systems in smart animal husbandry. Aburasain, R.Y. et al. configured, trained, and applied SSD-500 and YOLOv3 for cattle group identification in images captured by drones, achieving the highest accuracy [14].
Santosh Kumar et al. also used a deep belief convolutional neural network for the identification of individual animals and groups of animals using features such as muzzle point image patterns [15]. Peiyuan Jiang et al. discussed the similarities, differences, and disadvantages of YOLO versions and convolutional neural network (CNN) algorithms for computer vision; the YOLO algorithm for livestock identification is still being improved [16].
Furthermore, with the utilization of YOLO (You Only Look Once) and subsequent advancements in its architecture, there has been a discernible improvement in the precision of object detection, occasionally even outperforming traditional two-stage object detection systems. Arunabha M. Roy et al. highlighted that YOLO achieves an accuracy of 63.4, while Fast-RCNN reaches 70 [17]. However, it is crucial to note that the inference speed of YOLO is approximately 300 times faster. Additionally, the authors conducted a thorough examination of single-stage object detectors, with a specific focus on YOLOs, including their regression formulation, advancements in architecture, and performance metrics [17]. The researchers also summarized the comparative analyses conducted between two-stage and one-stage object detection models, as well as between various versions of YOLO and their respective applications [18].
Chen, W. et al. proposed a real-time YOLO face detector, which maintains the high speed of the original YOLO method and is one of the best face detectors, offering a balance between accuracy and speed [2]. Shubham Shinde et al. showcased the efficacy and efficiency of YOLO as a rapid detection and localization technique on the Liris Human Activity dataset [19]. Xingyu Jiang et al. explained and developed several applications to demonstrate the importance of multimodal image matching; further innovations in multimodal imaging can be found in [20].
Both supervised and self-supervised learning have been used for image identification, image segmentation, and image classification. Generally, supervised learning is used when a labeled dataset is available; if not, self-supervised learning is chosen. Kriti Ohri and Mukesh Kumar explored self-supervised learning, in which the data itself provides strong signals of interest that enable learners to perceive the relationships within the data without external labels [21]. Syed Sahil Abbas Zaidi et al. also used CNNs for object detection and classification and compared their performance using various metrics [22].
Huang, Y. et al. identified astrocytes, which have complex morphological structures containing glial fibrillary acidic protein (GFAP), in the central nervous system (CNS) using YOLOv5, an advanced deep learning model for object detection and classification [23]. Lawal, O.M. successfully used modified YOLOv3 models, YOLODenseNet and YOLOMixNet, for tomato detection during robotic harvesting [24].
Kganyago et al. presented a summary of the latest developments in remote sensing technologies and machine learning models for estimating biochemical and biophysical parameters in precision agriculture. Future deep learning research should develop adaptive models to address the challenges facing agriculture [25].
The results of the literature review of the main references related to our research work are summarized in Table 1.
2. Methods
Figure 1 shows the development steps of an intelligent classifier for distinguishing images of goats and sheep. The performance of an intelligent classifier depends on several factors, such as the quality of the dataset, the complexity of the models, and the amount of training data available.
Here, we used various goat and sheep datasets available on the internet, including Kaggle datasets. In our dataset, we tried to include a variety of images of goats and sheep of all possible colors and breeds. There is a mix of images taken in the morning, at noon, and in the evening, as well as images taken under different lighting conditions. We cropped the goat and sheep faces from the full images, capturing the shape of the face, eyes, nose, ears, horns, and facial hair [26].
It is possible to achieve high accuracy in detecting goat and sheep images. In computer vision, image capture, image processing, feature extraction using a convolutional neural network, and classification using a dense network are used to accurately identify goat and sheep images. Initially, 4759 images of size 64 × 64 × 3 were collected from the World Wide Web.
Using Keras on the Google Colab platform, the architecture of the CNN model incorporates two convolutional layers, each succeeded by a MaxPooling layer. For this experiment, we split the image set into a training set (78%) and a test set (22%).
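A minimal sketch of this split in Keras is shown below; the paper does not give its data-loading code, so the directory layout, seed, and batch size are assumptions:

```python
import tensorflow as tf

# Hypothetical layout: dataset/goat/*.jpg and dataset/sheep/*.jpg.
# validation_split=0.22 reproduces the 78%/22% train/test proportions above.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset", validation_split=0.22, subset="training", seed=42,
    image_size=(200, 200), batch_size=32)  # resized to the model input (Section 3)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset", validation_split=0.22, subset="validation", seed=42,
    image_size=(200, 200), batch_size=32)
```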
The results of this model were not encouraging, so we turned to the well-known pre-trained VGG16 model and adapted it accordingly on the Google Colab platform. Its performance is better than that of the previously developed CNN model. Finally, we used the Roboflow framework to design and deploy the deep learning model of our classifier. We increased the total number of images to 35,204 using a data augmentation tool and divided them into a training dataset (88%), a validation dataset (8%), and a test dataset (4%).
This classifier performs better than the previous two models and can correctly and quickly classify real images of goats and sheep.
In order to evaluate the efficacy of the intelligent classifier, various performance measures, including precision, recall, and F1 score, were utilized. These metrics are defined by the following formulas:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP and TN represent the numbers of correctly classified positive and correctly rejected negative samples, while FP and FN represent the numbers of negative and positive samples that were misclassified, respectively [27].
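The per-class scores, along with the macro and weighted averages reported in the tables below, follow the layout of a standard classification report; a minimal sketch, assuming scikit-learn is used for evaluation (the paper does not name its evaluation tooling):

```python
# Minimal sketch: per-class precision/recall/F1 plus macro and weighted
# averages, in the format of the result tables below. The labels and
# predictions here are hypothetical.
from sklearn.metrics import classification_report

y_true = ["goat", "goat", "sheep", "sheep", "sheep"]
y_pred = ["goat", "sheep", "sheep", "sheep", "goat"]
print(classification_report(y_true, y_pred))
```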
After a thorough literature review of recent research, we found that the CNN classifier is one of the best classifiers for identifying goat and sheep images, because convolutional networks have the advantages of parameter sharing and sparsity of connections over fully connected networks. The architecture of the CNN classifier is depicted in Figure 2.
There are different CNN models, such as LeNet-5, AlexNet, VGG16, VGG19, and ResNet, that are already available for computer vision. The LeNet-5 model is trained on 32 × 32 × 1 grayscale images to recognize the digits 0 to 9 and has about 60,000 parameters. AlexNet is roughly 1000 times larger than LeNet-5, with 60 million parameters, and is trained on 227 × 227 × 3 color images. This model requires multiple GPUs for computation [28].
The VGG16 architecture has 138 million parameters across 16 layers, and its structure is quite uniform. ResNet is a very deep model that can exceed 100 layers, with skip connections that perform identity mapping and merge with the layer outputs through addition operations [29].
Which CNN model to use depends entirely on the nature of the problem and the size of the dataset [30]. Upon conducting an extensive review of existing literature, a convolutional neural network (CNN) model was formulated, consisting of two convolutional layers, each accompanied by a subsequent MaxPooling layer. This architecture was finalized with a flatten layer followed by two dense layers, the final one using the Sigmoid activation function (see Table 2). Despite efforts to optimize this model using the Keras tuner, the achieved performance fell short of expectations. Subsequently, various pre-trained models, including VGG16, MobileNet, Xception, ResNet50, ResNet101, ResNet152, and EfficientNet, all pre-trained on the ImageNet dataset, were explored. These pre-trained models were trained using a limited dataset of only 250 images sized 200 × 200 × 3 over 25 epochs, and their performance was assessed. The results indicated only a marginal improvement compared to our initially proposed model.
Recognizing the unique challenges posed by our dataset, particularly the absence of goat and sheep classes among the standard 1000 classes of the ImageNet dataset, we proceeded to fine-tune the top layers and hyperparameters of the VGG16, ResNet, MobileNet, Xception, Inception, and EfficientNet pre-trained models. Evaluation of the outcomes from each model revealed that EfficientNet outperformed the others. Detailed results and analysis of all models are elaborated upon in the subsequent section.
The TensorFlow framework with the Keras API on Google Colab was used to perform image processing tasks and build the intelligent classifiers. Additionally, we used Python 3.6 with the OpenCV library for the data augmentation program. All programs and analyses were carried out on an HP laptop (Hewlett-Packard, Greater Noida, India) with an Intel(R) Core(TM) i5-4210M CPU @ 2.60 GHz (2 cores), 16 GB RAM, and Windows 10. All models were trained, validated, and tested on Google Colab with a T4 GPU, except the YOLOv8 classifier, which was built, trained, and deployed on the Roboflow computer vision platform.
3. Results
Following an extensive examination of the available literature, a Convolutional Neural Network (CNN) model was developed, comprising two convolutional layers, each succeeded by a MaxPooling layer. The input image dimensions were configured at 200 × 200 × 3, with a 3 × 3 filter size, utilizing the Rectified Linear Unit (ReLU) as the activation function. The first convolutional layer contains 32 filters, and the MaxPooling layer positioned immediately after it employs a 2 × 2 filter size. The second convolutional layer is furnished with 64 filters, also with a 3 × 3 filter size and ReLU activation, and the MaxPooling layer connected to it likewise features a 2 × 2 filter size. Furthermore, the model structure incorporates a flatten layer succeeded by two dense layers: the first dense layer encompasses 384 neurons with ReLU activation, whereas the second comprises a single neuron with the Sigmoid activation function. A detailed illustration of this model's architecture is provided in Table 2.
The model is configured with a learning rate of 0.0001 and uses the Adam optimizer for training. This architecture effectively extracts features from input data through the convolutional and pooling layers and then performs classification based on these features using the dense layers.
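A minimal Keras sketch consistent with this description follows; the layer sizes, activations, optimizer, and learning rate are as stated above, while the loss function is an assumption for the single-neuron Sigmoid output:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two Conv2D + MaxPooling2D blocks, a Flatten layer, and two Dense layers,
# mirroring the architecture summarized in Table 2.
model = keras.Sequential([
    layers.Input(shape=(200, 200, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(384, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary goat-vs-sheep output
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",  # assumed loss for the sigmoid output
    metrics=["accuracy"],
)
```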
Figure 3 shows the behavior of the CNN model during training and validation, with Figure 3a depicting the accuracy metrics and Figure 3b the losses incurred during these steps.
Table 3 presents the performance metrics of a CNN model for classifying “Goat” and “Sheep,” with precision, recall, and F1-score represented as percentages. For the “Goat” category, the model achieved a precision of 67%, a recall of 94%, and an F1-score of 78%, based on 17 samples. For the “Sheep” category, it obtained a precision of 91%, a recall of 56%, and an F1-score of 69%, with 18 samples. The overall accuracy of the model is 74% across 35 total samples. The macro average of the metrics is 79% precision, 75% recall, and 74% F1-score, while the weighted averages are 79% precision, 74% recall, and 73% F1-score, accounting for the different sample sizes. The CNN model shows better precision with sheep but higher recall for goats.
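As a quick arithmetic check, the macro average is the unweighted mean of the per-class scores, whereas the weighted average weights each class by its number of samples; for the precision values above:

$$\text{Macro precision} = \frac{67\% + 91\%}{2} = 79\%, \qquad \text{Weighted precision} = \frac{17 \times 67\% + 18 \times 91\%}{35} \approx 79\%$$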
Table 4 outlines the architecture of the tuned pre-trained VGG16 model, detailing the layers, output shapes, and the number of parameters for each layer. The model starts with an input layer (input_1) that processes images of shape (200, 200, 3) with 0 parameters. This is followed by two convolutional layers (block1_conv1 and block1_conv2), both with 64 filters, leading to an output shape of (200, 200, 64), with 1792 and 36,928 parameters, respectively. A max-pooling layer (block1_pool) reduces the output size to (100, 100, 64) with no parameters.
Next, two convolutional layers (block2_conv1 and block2_conv2) increase the filters to 128, producing an output of (100, 100, 128) and requiring 73,856 and 147,584 parameters, respectively, followed by another max-pooling layer that reduces the output to (50, 50, 128). Three convolutional layers in block3 increase the filters to 256, producing an output of (50, 50, 256), with the layers requiring 295,168, 590,080, and 590,080 parameters, followed by max-pooling that reduces the output to (25, 25, 256).
In block4, three convolutional layers increase the filters to 512, producing an output of (25, 25, 512), with 1,180,160, 2,359,808, and 2,359,808 parameters, respectively, followed by max-pooling reducing the output to (12, 12, 512). The same configuration applies to block5, resulting in an output of (12, 12, 512) and 2,359,808 parameters for each convolutional layer. A final max-pooling reduces the output to (6, 6, 512), and the VGG16 block contains 14,714,688 total parameters. After flattening the output to (None, 18,432), a dense layer with 384 units is added with 7,078,272 parameters, followed by a dropout layer. Finally, a dense layer outputs one unit with 385 parameters.
The model has a total of 14,714,688 parameters (56.13 MB), of which 7,079,424 are trainable (27.01 MB), and 7,635,264 are non-trainable (29.13 MB). The learning rate is set to 0.0001, and the optimizer used is Adam.
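A hedged Keras sketch consistent with Table 4 is given below; the split between trainable and non-trainable parameters reported above suggests most of the convolutional base was frozen, but the exact freezing policy and the dropout rate are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# VGG16 convolutional base pre-trained on ImageNet, without its original
# classifier head; 200x200x3 inputs yield a (6, 6, 512) final feature map.
base = keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(200, 200, 3))
base.trainable = False  # assumption: convolutional base frozen

model = keras.Sequential([
    base,
    layers.Flatten(),                       # (6, 6, 512) -> 18,432 features
    layers.Dense(384, activation="relu"),   # 18,432*384 + 384 = 7,078,272 params
    layers.Dropout(0.5),                    # dropout rate is an assumption
    layers.Dense(1, activation="sigmoid"),  # 385 params
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```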
The results of the tuned VGG16 pre-trained model are presented in Figure 4 and Table 5. Figure 4a shows the accuracy metrics for the VGG16 model during training and validation, whereas Figure 4b shows the corresponding loss metrics.
Table 5 presents the performance of the VGG16 model for classifying "Goat" and "Sheep". For the "Goat" category, the model achieved a precision of 76%, a recall of 94%, and an F1-score of 84%, based on 17 samples. For the "Sheep" category, the precision was 93%, the recall was 72%, and the F1-score was 81%, with 18 samples. The overall accuracy of the model is 83%, based on 35 total samples. The macro averages for precision, recall, and F1-score are 85%, 83%, and 83%, respectively, and the weighted averages for these metrics are also 85%, 83%, and 83%, accounting for the number of samples in each category.
Next, the results of the pre-trained EfficientNet model are depicted in Figure 5 and Table 6. Figure 5a shows the accuracy metrics for the EfficientNet model during training and validation, and Figure 5b shows the corresponding loss metrics. In Table 6, we present the performance of the EfficientNet model for classifying "Goat" and "Sheep". For the "Goat" category, the model achieved a precision of 89%, a recall of 94%, and an F1-score of 91%, based on 17 samples. For the "Sheep" category, the model reached a precision of 94%, a recall of 89%, and an F1-score of 91%, with 18 samples.
The overall accuracy of the model is 91%, based on 35 total samples. The macro average for precision, recall, and F1-score is 92%, 92%, and 91%, respectively. Similarly, the weighted averages for precision, recall, and F1-score are 92%, 91%, and 91%, accounting for the number of samples in each category.
Table 7 provides performance metrics for a ResNet50 model, which is used to classify “Goat” and “Sheep.” For the “Goat” category, the model achieved a precision of 84%, a recall of 94%, and an F1-score of 89%, based on 17 samples. For the “Sheep” category, the precision was 94%, recall was 83%, and the F1-score was 88%, with 18 samples. The overall accuracy of the model is 89%, based on a total of 35 samples. The macro average of precision, recall, and F1-score is 89% across both categories. Similarly, the weighted average for precision, recall, and F1-score is also 89%, accounting for the distribution of samples in each category.
The next model is ResNet101, with training and validation accuracy and losses shown in Figure 7.
Table 8 presents the performance metrics for the ResNet101 model used to classify "Goat" and "Sheep". For the "Goat" category, the model achieved a precision of 81%, a recall of 100%, and an F1-score of 89%, based on 17 samples. For the "Sheep" category, the model reached a precision of 100%, a recall of 78%, and an F1-score of 88%, based on 18 samples. The overall accuracy of the model is 89%, calculated over 35 total samples. The macro averages of precision, recall, and F1-score are 90%, 89%, and 88%, respectively, across both categories. Similarly, the weighted averages of these metrics, taking into account the number of samples per category, are 91% precision, 89% recall, and 88% F1-score. The model shows perfect precision for sheep and perfect recall for goats.
Additionally, the performance of the ResNet152 model presented in Table 9 shows 93% precision for "Sheep" and 94% recall for "Goat" images, indicating few false positives.
Table 10 shows the training and classification time per step of the models used in this research. The bold values show the lowest training and classification time.
Roboflow YOLO
Finally, the computer vision pipeline provided by Roboflow with YOLO (You Only Look Once) was tested. This pipeline consists of collecting images, organizing the image datasets, annotating images, augmenting the images to increase the dataset size, training the model, managing the model, and deploying it. The classifier built with this framework has a validation accuracy of 95.8%. Snapshots of the goat and sheep images are shown in Figure 8 and Figure 9, respectively.
The YOLO algorithm, commonly used for object detection, is a real-time system that identifies objects in images or video frames. Here it is used through Roboflow, an end-to-end platform for building, training, and deploying computer vision models.
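For reference, a comparable YOLOv8 classification model can also be trained locally with the ultralytics Python package; this is a sketch only, since our classifier was built, trained, and deployed on Roboflow, and the weight file, dataset path, and settings below are illustrative:

```python
from ultralytics import YOLO

# YOLOv8 nano classification weights; the dataset path points to a folder
# with one subdirectory per class (goat/, sheep/) -- both are assumptions.
model = YOLO("yolov8n-cls.pt")
model.train(data="goat_sheep_dataset", epochs=25, imgsz=224)
results = model("example_goat.jpg")  # run inference on a single image
```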
The total number of images for this classifier was initially 3699. After augmenting the images through horizontal and vertical flips, clockwise and counterclockwise rotation, upside-down flips, cropping, rotation between certain angles, horizontal and vertical shearing, adjustments to hue, saturation, brightness, and exposure (each within a specific range), blur (up to 2.5 px), noise (up to a percentage of pixel values), and cutout (with 3 boxes of a specified size), the dataset size increased to a total of 35,204 images. Of these, the training set consists of 30,816 images, the validation set contains 2924 images, and the test set includes 1464 images, as shown in Figure 10.
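The Methods section notes that a Python/OpenCV program handled part of the augmentation; a minimal sketch of the flip and rotation transforms listed above, with hypothetical file names:

```python
import cv2

img = cv2.imread("goat_0001.jpg")  # hypothetical input image

# Horizontal, vertical, and upside-down (180-degree) flips.
h_flip = cv2.flip(img, 1)
v_flip = cv2.flip(img, 0)
ud_flip = cv2.flip(img, -1)

# Rotation about the image center; the 15-degree angle is illustrative.
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

for name, out in [("hflip", h_flip), ("vflip", v_flip),
                  ("udflip", ud_flip), ("rot15", rotated)]:
    cv2.imwrite(f"goat_0001_{name}.jpg", out)
```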
Figure 11 shows a comparison of the deep learning models discussed in this research work.