Article

Determination of the Live Weight of Farm Animals with Deep Learning and Semantic Segmentation Techniques

Erdal Guvenoglu
Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Maltepe University, 34857 Istanbul, Turkey
Appl. Sci. 2023, 13(12), 6944; https://doi.org/10.3390/app13126944
Submission received: 5 May 2023 / Revised: 4 June 2023 / Accepted: 7 June 2023 / Published: 8 June 2023
(This article belongs to the Special Issue Applied Computer Vision in Industry and Agriculture)

Abstract

In cattle breeding, regularly weighing the animals and recording their weights is important both for the performance of the enterprise and for the health of the animals. However, weighing is a laborious task, and for this reason it is often performed irregularly or not at all. In this study, we attempted to estimate the weights of cattle by using stereo vision and semantic segmentation methods from the field of computer vision together. Images of 85 animals were taken from different angles with a stereo setup consisting of two identical cameras. The distances of the animals to the camera plane were calculated by stereo distance calculation, and the areas covered by the animals in the images were determined by semantic segmentation. Using all these data, different artificial neural network models were trained. The results of the study show that when stereo vision and semantic segmentation methods are used together, live animal weights can be predicted successfully.

1. Introduction

Livestock farming has become an important industrial sector as well as a side occupation for people engaged in agriculture in rural areas. Thanks to practices such as cooperatives, producer unions, registered breeding, artificial insemination, and livestock subsidies, the livestock sector has gained an increasingly important place in the national economy. The weight of the animals raised on cattle breeding farms must be determined and followed regularly, since the profitability of the enterprise depends on the regular follow-up of live weight [1].
The most common method of measuring the live weight of farm animals is traditional measurement with a scale. Although this direct approach is very accurate, it comes with various difficulties and limitations. Firstly, the animals must be moved to the weighing site, which can be time-consuming and laborious, especially on farms with a large number of animals. Secondly, the whole operation separates the animals from their natural environment, which causes stress and therefore negatively affects their health and milk yield. Due to these drawbacks of direct measurement, a variety of indirect measurement approaches have been proposed in the literature [2]. In indirect measurement, the true value of animal live weight is estimated by a regression model trained on features extracted from measurements obtained from sensors such as 2D [3] and 3D cameras [4], thermal cameras [5], and ultrasonic sensors [6].
In this study, we consider the determination of the live weight of farm animals as a computer vision and a regression problem. First, we obtain the images of farm animals using a stereo setup. Then, applying deep learning-based semantic segmentation techniques, we extract distance and size data from images to feed into a regression model. Finally, we obtain the weight estimates from the regression model as a proxy for the actual weights of the animals. The main motivation for our study was to apply state-of-the-art image processing techniques using modern deep learning approaches to propose an effective solution to the problem considered. The main contributions and novelty of our study can be summarized as follows:
  • We propose an effective indirect measurement method for determining the live weight of farm animals based on stereo vision and state-of-the-art semantic segmentation techniques using deep learning.
  • Our method is particularly important in that animals’ body measurements are taken without the need for separating them from their natural environments and thus not adversely affecting their health and milk yield.
  • We propose a very simple yet effective system and setup composed of relatively cheaper hardware that is accessible and affordable for many farms of small to large scale.
  • We investigate and compare the performances of three different Artificial Neural Network (ANN) architectures in estimating live animal weight.
The rest of this paper is organized as follows. The related work is reviewed in Section 2. In Section 3, we present the materials and methods used in the study. We present our experimental results and discussion in Section 4 and Section 5, respectively. Finally, in Section 6, we conclude the paper.

2. Related Work

In this section, we provide essential background on livestock weight estimation with a review of significant past research. Our focus in this review is on the work with indirect measurement approaches based on image processing techniques. We also summarize them in Table 1.
There are several studies in the literature that are based on image processing techniques on 2D images. In a study by Weber et al., the live body weight of cattle was estimated using dorsal area images taken from above using a kind of fence system [7]. Their system first performs segmentation and then generates a convex hull around the segmented area to obtain features to feed a Random Forest-based regression model. Tasdemir and Ozkan performed a study where they predicted the live weight of cows using an ANN-based regression model [8]. They determined various body dimensions such as wither height, hip height, body length, and hip width applying photogrammetric techniques on images of cows captured from various angles. Wang et al. developed an image processing-based system to estimate the body weight of pigs [9]. Their main approach was to process images captured from above to extract features such as area, convex area, perimeter, and so on. Then, using these features, they trained an ANN-based regression model for weight prediction. A Fuzzy Rule-Based System was also utilized in cattle weight estimation by Anifah and Haryanto [10]. They obtained 2D side images of cattle from a very close distance of 1.5 m. After applying the Gabor filter to the images, they obtained body length and circumference as features. Finally, they designed a fuzzy logic system to estimate body weight.
Three-dimensional imaging techniques also found application in body weight estimation systems. Hansen et al. used a 3D Kinect-like depth camera to obtain the views of cows from above as they passed along a fence [11]. Applying thresholding, they obtained the segmented area of cows to reach a body weight estimate. In another study where a 3D Kinect camera was used, Fernandes et al. processed images taken from above of pigs by applying two segmentation steps [12]. Then, they extracted features from segmented images such as body area, volume, width, and height to feed a linear regression model to obtain the weight estimate. In a similar study, Cominotte et al. developed a system to capture images of cattle using a 3D Kinect camera [13]. They trained and compared a number of linear and non-linear regression models by feeding them with features extracted from segmented images. In a study by Martins et al., a 3D Kinect camera was used to capture images of cows from lateral and dorsal perspectives [14]. They used several measurements obtained from these images to run a Lasso regression model to estimate body weight. Nir et al. used a 3D Kinect camera as well to take images of dairy heifers to estimate height and body mass [15]. Their approach was to fit an ellipse to the body image to calculate some features. Then, they used these features to train various linear regression models. Song et al. created a system to estimate the body weight of cows using a 3D camera system [16]. Similar to previous studies, they extracted morphological features from 3D images such as hip height, hip width, and rump length. Combining these features with some other cow data such as days in milk, age, and parity, they trained multiple linear regression models. Another study that employed a 3D Kinect camera is the one conducted by Pezzuolo et al. [17]. They captured body images of pigs using two cameras from top and side, and then extracted body dimensions from images such as heart girth, length, and height using image processing techniques. They developed linear and non-linear regression models based on these dimensions to predict weight.
Advanced scanning devices were also introduced in body weight estimation studies. Le Cozler et al. used a 3D full-body scanning device to obtain very detailed body images of cows [18]. Then, they computed body measures from these 3D images such as volume, area, and other morphological traits. Using these measures, they trained and compared several regression models. Stajnko et al. developed a system to make use of thermal camera images of cows to extract body features and then used them in several linear regression models to estimate body weight [19].
Stereo vision techniques are also used in the determination of live animal weight. Shi et al. developed a regression model to analyze and estimate the body size and live weight of farm pigs under indoor conditions in a farm [20]. Their system was based on a binocular stereo vision system and a special fence system through which animals passed for taking the measurements. They segmented the images obtained from the stereo system using a depth threshold and predicted the body length and withers height, then the body weight. Some other notable studies using stereo vision are by Nishide et al. and Yamashita et al. [21,22].
Deep learning-based approaches are very popular today due to their success in image-processing applications. Deep learning is a special form of neural network algorithm. Although it has achieved the most advanced results in many fields, its use in determining the weight of livestock is limited [23]. There are studies that apply deep learning algorithms and determine the weight of pigs [24,25].
When we examine the prior research on the estimation of the live body weight of farm animals such as pigs, cattle, cows, and heifers, a common approach to capturing images is to force the animals into special boxes or fences, or to make them pass through a special passage. This operation is very similar to traditional weight measurement with scales; it likewise requires the separation of animals from their natural environment and causes stress-related problems affecting their health and milk yield [3]. Our proposed approach is superior in that the animals' pictures are taken in their natural environment without the need for a special measurement station. Additionally, our approach is totally contactless, and pictures do not need to be taken from very close proximity, unlike in previous studies. Another advantage of our proposed approach is that it provides a simpler structure and setup composed of relatively cheap hardware that is accessible and affordable for farms of small to large scale. Last but not least, we employ modern, state-of-the-art deep learning-based image processing techniques in our system, which is one of the few such studies.

3. Materials and Methods

3.1. Overview of the Proposed Method

Our proposed system is composed of a number of steps performing various tasks from raw data collection to model training. These steps are presented in Figure 1 as block components and they are described in their respective subsections.

3.2. Data Collection

In the study, a stereo setup was prepared to obtain animal images. The stereo mechanism is used to capture digital images with stereo vision techniques used in computer vision and to obtain some inferences from these images. The setup used in the study is shown in Figure 2.
During the data collection phase, 85 animals were photographed from the side and from the back with this setup, yielding 170 stereo pairs (340 images in total). Using stereo vision techniques on these images, the distance of each animal to the camera plane was calculated.
Architectural components of the stereo setup are given in Figure 2 and their relationships are presented in Figure 3. At the heart of the system is a Raspberry Pi 4 microcomputer with 4 GB of RAM, where the Python code we developed runs to capture animal images. It is powered by a mobile power supply. Two Microsoft Lifecam Studio Webcams are connected to it via two USB ports. A mobile phone with Android OS acts as a monitor and it is connected to Raspberry Pi via Video Capture USB 2.0 to HDMI converter. Finally, a wireless mini integrated keyboard and touchpad are used to control the device.
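As an illustration of this capture step, the snippet below shows how a minimal dual-webcam capture script might look in Python with OpenCV. The device indices, file names, and function are our assumptions for illustration; the original capture code is not reproduced in the paper.

```python
# Minimal dual-webcam stereo capture sketch (illustrative, not the
# author's original script). Device indices depend on the USB ports.
import cv2

LEFT_CAM, RIGHT_CAM = 0, 2  # hypothetical /dev/video indices

def capture_stereo_pair(pair_id: int) -> None:
    """Grab one frame from each webcam and save the pair to disk."""
    cap_l = cv2.VideoCapture(LEFT_CAM)
    cap_r = cv2.VideoCapture(RIGHT_CAM)
    try:
        ok_l, frame_l = cap_l.read()
        ok_r, frame_r = cap_r.read()
        if not (ok_l and ok_r):
            raise RuntimeError("A camera failed to deliver a frame")
        cv2.imwrite(f"animal_{pair_id:03d}_left.png", frame_l)
        cv2.imwrite(f"animal_{pair_id:03d}_right.png", frame_r)
    finally:
        cap_l.release()
        cap_r.release()

if __name__ == "__main__":
    capture_stereo_pair(1)
```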

3.3. Stereo Vision and Image Correction

Stereo vision is a technique used to calculate the distance and position of a point viewed by two cameras whose relative positions and projections are known. A single camera performs a mapping from the 3D world to a 2D image [26,27,28]. The geometry of a stereo setup consisting of two identical cameras is shown in Figure 4.
Here, $O_l$ and $O_r$ are the focal points of the two cameras, $f$ is the focal length of both cameras, $P$ is any point in space, $Z$ is the distance of this point to the camera plane, and $T$ is the translation between the two camera centers. $x_l$ and $x_r$ are the projections of the point $P$ onto the two image planes. This geometry creates the similar triangles $P x_l x_r$ and $P O_l O_r$. The value of $Z$ can be easily calculated using Equation (1) and the similarity theorem.
$$\frac{T - (x_l - x_r)}{Z - f} = \frac{T}{Z} \quad\Rightarrow\quad Z = \frac{fT}{x_l - x_r} = \frac{fT}{d} \tag{1}$$
In Figure 4, the difference $x_l - x_r$ is denoted by the variable $d$; in stereo vision, this value is also called the disparity. In order to increase the accuracy of the stereo calculation, stereo calibration is required. Stereo calibration yields the rotation matrix $R$, which defines the relative rotation between the coordinate systems of the two cameras, and the translation vector $T$, which defines the translation between the two camera centers. Using the matrices obtained from a correctly performed calibration, correction or rectification can be applied to the stereo images. Stereo rectification aligns the image pair so that corresponding points lie on the same image rows, which makes the stereo distance calculation cheaper and more reliable [28]. Example images obtained with the help of the stereo setup are shown in Figure 5.
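To make the calibration-and-rectification step concrete, the following Python/OpenCV sketch row-aligns a stereo pair, assuming the per-camera intrinsics ($K_1$, $D_1$, $K_2$, $D_2$) and the stereo extrinsics ($R$, $T$) have already been estimated from a chessboard calibration; the function and variable names are ours, not the paper's.

```python
# Stereo rectification sketch with OpenCV (illustrative). K1/D1 and K2/D2
# are the camera matrices and distortion coefficients; R and T come from
# cv2.stereoCalibrate on chessboard images.
import cv2

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    h, w = img_l.shape[:2]
    # Rectification transforms that put corresponding points on the same rows.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return rect_l, rect_r
```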

3.4. Deep Learning and Semantic Segmentation

Deep learning, a sub-branch of machine learning, is used in many different fields. Deep learning algorithms offer better results than traditional machine learning algorithms when more data are provided. Therefore, deep learning-based object segmentation approaches such as Mask R-CNN [29] can also be used for tasks such as weight estimation. Semantic segmentation determines object boundaries by classifying each pixel in the image as belonging to a class. Various semantic segmentation models have been introduced over time: the Fully Convolutional Network [30], which is based on deep learning; U-Net [31], which takes its name from its architecture and is used especially in medical problems; and Deeplab v3+ [32], which achieved the highest segmentation success on the PASCAL VOC 2012 dataset in 2018. In this study, the Deeplab v3+ model, trained on the PASCAL VOC 2012 dataset, was used to perform segmentation on the rectified images; in this way, the areas covered by the animals in the stereo images were determined. The Deeplab v3+ architecture is shown in Figure 6.
The segmentation results produced by this model on the captured images are shown in Figure 7 and Figure 8.
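As a sketch of this segmentation step, the snippet below runs a pretrained segmentation network over an image and keeps the pixels labeled with the PASCAL VOC "cow" class (id 10). Note that torchvision ships DeepLabV3 rather than Deeplab v3+, so this is a close stand-in for the model used in the study rather than the exact one.

```python
# Cow-pixel segmentation sketch with torchvision's DeepLabV3 (a stand-in
# for Deeplab v3+). VOC label id 10 corresponds to "cow".
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import (
    deeplabv3_resnet101, DeepLabV3_ResNet101_Weights)

model = deeplabv3_resnet101(weights=DeepLabV3_ResNet101_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def cow_mask(path: str) -> torch.Tensor:
    """Return a boolean (H, W) mask of pixels classified as 'cow'."""
    img = Image.open(path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        logits = model(batch)["out"]           # shape (1, 21, H, W)
    labels = logits.argmax(dim=1).squeeze(0)   # per-pixel class ids
    return labels == 10
```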

3.5. Dataset Creation

After the segmentation of all images was completed, the number of pixels occupied by each animal was counted in the segmentation maps of the left- and right-camera images. The distance of each animal to the camera plane was then calculated with the stereo technique using these segmentation maps. To do so, the pixel position of the animal's left border on the X-axis was determined in each image of the pair. The disparity ($d$) value was calculated by subtracting the border position in the right-camera segmentation map from the border position in the left-camera segmentation map. Figure 9 shows the pixel counts of the areas covered by a sample animal in a stereo image pair and the X-axis position of the animal's left border in both images.
The stereo distance calculation for a single animal is conducted as follows. As seen in Figure 9, subtracting the $X_R$ (right camera view) value from the $X_L$ (left camera view) value gives a disparity ($d$) of 23 pixels. With this value, the distance can be easily calculated using the focal length ($f$) obtained from the stereo calibration matrices and the translation ($T$) between the cameras obtained from the translation matrix. As a result of the camera calibration process, the focal length was obtained as 646.45 (in pixel units, so that $fT/d$ yields a distance in centimeters), and the translation $T$ for our setup is 9.92 cm. The stereo distance for the example animal in Figure 9 was obtained as in Equation (2).
$$Z = \frac{fT}{d} = \frac{646.45 \times 9.92}{23} = 278.82 \text{ cm} \tag{2}$$
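The computation above can be expressed compactly in code. The sketch below derives the pixel areas, disparity, and stereo distance from a pair of boolean segmentation maps; the constants follow the calibration values quoted above, while the function name and array handling are our own illustration, not the paper's code.

```python
# Disparity and depth from a pair of binary segmentation maps (sketch).
import numpy as np

F_PX = 646.45   # focal length from calibration (value reported above)
T_CM = 9.92     # baseline between the two cameras, in cm

def animal_features(mask_l: np.ndarray, mask_r: np.ndarray):
    """Pixel areas, disparity, and stereo distance for one animal."""
    area_l = int(mask_l.sum())                     # pixels in left map
    area_r = int(mask_r.sum())                     # pixels in right map
    # Leftmost occupied column (the animal's left border on the X-axis).
    x_left_l = int(np.where(mask_l.any(axis=0))[0][0])
    x_left_r = int(np.where(mask_r.any(axis=0))[0][0])
    d = x_left_l - x_left_r                        # disparity, in pixels
    z_cm = F_PX * T_CM / d                         # Z = fT/d; 23 px -> ~278.8 cm
    return area_l, area_r, d, z_cm / 100.0         # distance in meters
```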
After calculating the distance values, the number of pixels occupied by each animal in the images was also determined. Using all these values, a dataset consisting of 85 rows was created for the 85 animals. The created dataset is shown in Table 2 and Table 3; the distances are given in meters.

3.6. Model Training

In the images obtained, the number of pixels in the area occupied by the animal, that is, the segmentation data, is not meaningful on its own. Even a light animal will occupy a large area in the image if it is viewed close to the camera plane, and the opposite is also possible. Pixel counts are directly proportional to weight, while the disparity value is inversely proportional to weight: an increase in disparity means that the animal is viewed from a point close to the camera plane. Because the stereo camera setup is calibrated, distance-related errors are eliminated, and the pixel counts are made meaningful by taking the stereo distance variable into account. When the prepared dataset is examined, it can be seen that the features are on very different scales: the pixel counts are in the thousands, the stereo distances are a few meters, and the disparity values range between 5 and 30 pixels. Training a neural network with such inputs may take a long time, and the network may not be sufficiently successful. Features on such different scales should be brought close to each other by normalization, mainly because these features are multiplied by the model weights. Data normalization also shortens training time by transforming the raw data into a specific range, and it is extremely useful in modeling applications where the inputs are on very different scales [33]. In this study, the Z-score normalization technique was used. The Z-score is calculated by Equation (3), where $\mu$ represents the arithmetic mean of the data, $\sigma$ the standard deviation, $Z_k$ the raw value, and $Z_k'$ its normalized value.
$$Z_k' = \frac{Z_k - \mu}{\sigma} \tag{3}$$
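As a small illustration, the normalization can be applied per feature column (the example pixel counts are taken from Table 2):

```python
# Z-score normalization of one feature column (illustrative).
import numpy as np

def zscore(column: np.ndarray) -> np.ndarray:
    """Scale a column to zero mean and unit standard deviation."""
    return (column - column.mean()) / column.std()

pixels = np.array([28359.0, 23355.0, 53850.0])  # sample values from Table 2
print(zscore(pixels))  # now on a scale comparable to distances/disparities
```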
In the study, three different artificial neural networks were trained after the data obtained from the images taken from different directions were normalized. The first network (ANN-1) is trained with image data taken from the side, the second network (ANN-2) from the back, and the third network (ANN-3) from both directions. A total of 90% of the dataset is reserved for training artificial neural networks and 10% for testing. The architecture of artificial neural networks used in the proposed system is shown in Table 4.
The ANN-1 and ANN-2 networks are fully connected networks with a three-element input layer, two hidden layers of 64 nodes each, and a single-element output layer; each has 4488 parameters. The ReLU function is used as the activation function, the mean absolute error is used as the loss function, the Adam optimizer is used for optimization, and a constant learning rate of $10^{-3}$ is used. A total of 1000 training epochs was found to be sufficient. The ANN-3 network has the same structure as the other networks but is trained on the entire dataset; therefore, it has 8 inputs and a total of 4818 parameters.
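The following Keras sketch shows how such a network could be defined with the stated hyperparameters (two 64-node ReLU hidden layers, MAE loss, Adam with a learning rate of $10^{-3}$, 1000 epochs). The exact training code is not given in the paper, so this is our reconstruction under those assumptions.

```python
# ANN regressor sketch following the hyperparameters stated above.
import tensorflow as tf

def build_ann(n_inputs: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),                  # live weight in kg
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mean_absolute_error")
    return model

ann1 = build_ann(3)   # ANN-1/ANN-2 use 3 features; ANN-3 uses all 8
# ann1.fit(X_train, y_train, epochs=1000, verbose=0)
```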

3.7. Recommended Method for Weight Prediction

The proposed system is a hybrid one that performs weight estimation using semantic segmentation and stereo distance data together. The basic operation steps of this system for weight prediction are shown in Figure 10.

4. Results

In this section, we present the prediction performances of the neural networks trained in a comparative manner. The performance levels of the networks are shown in Figure 11.
The success rate of the ANN-1 network is higher than that of the ANN-2 network. The reason is that images taken from the back cannot reveal the general body dimensions of the animal. The ANN-3 network, in turn, performs better than the other two networks because it was trained with data from images taken from both angles. A randomly selected 10% of the images was held out from training and used to test the estimated animal weights. Weight estimation was carried out separately for the three proposed networks, and the results are shown in Table 5, Table 6 and Table 7.
As seen in Table 5, the errors of the estimations made by the ANN-1 network on the test data remain within approximately ±50 kg. Note that ANN-1 is trained only with data obtained from the side. In Table 6, the errors of the estimations made by the ANN-2 network, which was trained only with photographs taken from the back, also remain within approximately ±50 kg; however, the errors increase dramatically for the animals with ID numbers 36, 70, and 81. This significantly reduces the accuracy of the network trained with images taken from behind, again because images taken from the back cannot reveal the general body dimensions of the animal. Table 7 shows the results obtained from the ANN-3 network trained with the entire dataset. In most cases, the predictions were made with a margin of error of approximately ±20 kg, which is much more successful than the first two networks. The animal in prediction number 36, which has a high error, was photographed very close to the camera plane; images taken this close cause serious errors because such cases are not adequately represented in the dataset. For this reason, it is more appropriate to take the images at reasonable distances, not very close to the camera plane.
In this study, the K-fold cross-validation technique was used to test the validity of the proposed method and the accuracy of the results obtained. K-fold cross-validation is a method of splitting the dataset for training and evaluating models [34,35]. The method generates random folds, each representing a combination of a training subset and a test subset for training and validating the machine learning model. A certain accuracy value is obtained for each fold; for example, in 10-fold cross-validation, the overall accuracy is estimated by averaging the accuracy values produced by all 10 folds. For any dataset with a given number of samples, many combinations of training and test sets can be generated, so each part of the data is used for both training and testing. The structure of the K-fold cross-validation method for K = 10 is given in Figure 12.
Training and testing the model K times can take a long time and can be computationally costly for large datasets; on the other hand, it provides a reliable result. In this study, the K value was set to 10, and validity tests were carried out. At each of the K steps, the images were partitioned into disjoint training and test sets, so the validity tests were performed on different images every time. The validity of the ANN-3 architecture, which was trained using both side and rear images, was tested in this way, and the results obtained are given in Table 8. When the predicted values obtained at each K step are compared with the actual values in Table 8, it can be concluded that the proposed model is quite successful. Nevertheless, 85 animals are probably not enough to train a neural network thoroughly. In addition, the weights of the animals photographed with the stereo device are concentrated around 400 kg; therefore, the estimates made by the networks are generally more successful for animals weighing around 400 kg. Another weakness of the dataset is that the animal images were generally taken from 6 to 8 m away. During the image acquisition phase, it was mostly not possible to take images from closer distances, such as 2–3 m, without frightening the animals, even though stereo vision works more successfully at those distances than at 6–8 m. Considering all of this, more successful results can be obtained by training a network with more animal images whose weights are normally distributed. In order to train a neural network successfully, the dataset must be large enough, that is, it must consist of a sufficient number of observations [36], and all known variations of the problem domain should be included; adequate data are necessary to obtain a robust and reliable network [37,38]. For example, the third neural network, trained with data from images taken from both the side and the back, reduced the weight estimation errors to within approximately ±20 kg.
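For reference, a minimal 10-fold cross-validation loop of the kind described above might look as follows; we use scikit-learn's KFold and a small MLP as a stand-in for the Keras networks, so this is an illustration rather than the exact procedure used in the study.

```python
# 10-fold cross-validation sketch (illustrative stand-in).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

def cross_validate(X: np.ndarray, y: np.ndarray, k: int = 10) -> float:
    """Return the mean absolute error averaged over k folds."""
    fold_mae = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_mae.append(np.mean(np.abs(pred - y[test_idx])))
    return float(np.mean(fold_mae))  # overall score = average over folds
```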

5. Discussion

In this study, we attempted to estimate live animal weight by using stereo vision and semantic segmentation methods from the literature together. Within the scope of the study, a stereo vision device was prepared, and stereo images of 85 cattle whose weights were known beforehand were obtained with this setup. Segmentation maps of the animals in these images were created with the Deeplab v3+ deep learning model, one of the semantic segmentation models.
Using the segmentation maps, the number of pixels covered by each animal in the image and their distance to the camera plane were calculated using the stereo distance calculation technique. A dataset was created by combining these obtained data. The dataset was created from the data obtained from photographs of animals taken from two different angles, from the side, and from the back.
Using this dataset, three architecturally similar artificial neural networks were trained. When the trained networks were compared, the third network, trained with the whole dataset, was significantly more successful than the first two. At this point, it is clear that neural networks trained with datasets created from images taken from more angles will be more successful. For example, top-view images of cattle contain important information about the animal's body structure, so networks trained with a dataset that also includes top-view data, if available, can be expected to perform better.
In addition, neural networks can be expected to make more successful predictions if the quality and quantity of the dataset are increased. Although rare, dramatically incorrect estimations were observed, particularly for animals that were underrepresented in the dataset, that were light in weight, or whose stereo distance was very different from the rest of the dataset. Therefore, creating a more comprehensive and homogeneously distributed dataset would significantly increase the performance of the models.
Moreover, characteristics such as the breed and sex of the animals directly affect their weight. For example, two animals of different breeds with exactly the same body dimensions will still differ in weight. At this point, a deep learning method that recognizes the breed and sex of the animal could be developed, and weight estimation performance could be increased by training a separate model for each breed.

6. Conclusions

In this study, we considered the problem of live weight prediction of farm animals from a computer vision perspective. We applied state-of-the-art stereo vision and deep learning-based semantic segmentation using a setup we created consisting of a Raspberry Pi 4 microcomputer and two identical cameras. We used this setup to capture images of 85 farm animals from different angles. Applying stereo distance computation and semantic segmentation, we created a dataset to train various ANN models. The test results of the trained ANNs suggest that our proposed system achieves good weight prediction performance. The most significant feature of the system is that, unlike traditional systems, it does not require the separation of animals from their natural environment to measure their weight. This is particularly important because the separation is known to cause stress and to negatively affect health and milk yield. Therefore, our system provides convenient, contact-free weight measurement with small measurement error. The main limitation of our study is the number of images captured from real farm environments; more accurate predictions would be possible if more data were available for training the ANNs.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data can be obtained from the corresponding author upon reasonable request.

Acknowledgments

I would like to thank Volkan Tunali for his invaluable suggestions for the research.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Kaya, M. Laktasyondaki Holştayn Ineklerde Canlı Ağırlık ve Beden Kondisyon Skorunun Sayısal Görüntü Analizi Yöntemi ile Belirlenebilirliği. Ph.D. Thesis, Aydın Adnan Menderes University, Aydın, Turkey, 2019.
  2. Dang, C.; Choi, T.; Lee, S.; Lee, S.; Alam, M.; Park, M.; Han, S.; Lee, J.; Hoang, D. Machine Learning-Based Live Weight Estimation for Hanwoo Cow. Sustainability 2022, 14, 12661.
  3. Wang, Z.; Shadpour, S.; Chan, E.; Rotondo, V.; Wood, K.M.; Tulpan, D. ASAS-NANP SYMPOSIUM: Applications of machine learning for livestock body weight prediction from digital images. J. Anim. Sci. 2021, 99, skab022.
  4. Na, M.H.; Cho, W.H.; Kim, S.K.; Na, I.S. Automatic weight prediction system for Korean cattle using Bayesian ridge algorithm on RGB-D image. Electronics 2022, 11, 1663.
  5. Vindis, P.; Brus, M.; Stajnko, D.; Janzekovic, M. Non invasive weighing of live cattle by thermal image analysis. In New Trends in Technologies: Control, Management, Computational Intelligence and Network Systems; IntechOpen: London, UK, 2010.
  6. Wang, Q. A Body Measurement Method Based on the Ultrasonic Sensor. In Proceedings of the 2018 IEEE International Conference on Computer and Communication Engineering Technology (CCET), Beijing, China, 18–20 August 2018; pp. 168–171.
  7. Weber, V.A.M.; de Lima Weber, F.; da Silva Oliveira, A.; Astolfi, G.; Menezes, G.V.; de Andrade Porto, J.V.; Rezende, F.P.C.; de Moraes, P.H.; Matsubara, E.T.; Mateus, R.G.; et al. Cattle weight estimation using active contour models and regression trees Bagging. Comput. Electron. Agric. 2020, 179, 105804.
  8. Tasdemir, S.; Ozkan, I.A. ANN approach for estimation of cow weight depending on photogrammetric body dimensions. Int. J. Eng. Geosci. 2019, 4, 36–44.
  9. Wang, Y.; Yang, W.; Winter, P.; Walker, L. Walk-through weighing of pigs using machine vision and an artificial neural network. Biosyst. Eng. 2008, 100, 117–125.
  10. Anifah, L.; Haryanto. Decision Support System Two Dimensional Cattle Weight Estimation using Fuzzy Rule Based System. In Proceedings of the 2021 3rd East Indonesia Conference on Computer and Information Technology (EIConCIT), Surabaya, Indonesia, 9–11 April 2021; pp. 374–378.
  11. Hansen, M.F.; Smith, M.L.; Smith, L.N.; Jabbar, K.A.; Forbes, D. Automated monitoring of dairy cow body condition, mobility and weight using a single 3D video capture device. Comput. Ind. 2018, 98, 14–22.
  12. Fernandes, A.F.; Dórea, J.R.; Fitzgerald, R.; Herring, W.; Rosa, G.J. A novel automated system to acquire biometric and morphological measurements and predict body weight of pigs via 3D computer vision. J. Anim. Sci. 2019, 97, 496–508.
  13. Cominotte, A.; Fernandes, A.; Dorea, J.; Rosa, G.; Ladeira, M.; van Cleef, E.; Pereira, G.; Baldassini, W.; Neto, O.M. Automated computer vision system to predict body weight and average daily gain in beef cattle during growing and finishing phases. Livest. Sci. 2020, 232, 103904.
  14. Martins, B.; Mendes, A.; Silva, L.; Moreira, T.; Costa, J.; Rotta, P.; Chizzotti, M.; Marcondes, M. Estimating body weight, body condition score, and type traits in dairy cows using three dimensional cameras and manual body measurements. Livest. Sci. 2020, 236, 104054.
  15. Nir, O.; Parmet, Y.; Werner, D.; Adin, G.; Halachmi, I. 3D Computer-vision system for automatically estimating heifer height and body mass. Biosyst. Eng. 2018, 173, 4–10.
  16. Song, X.; Bokkers, E.; van der Tol, P.; Koerkamp, P.G.; Van Mourik, S. Automated body weight prediction of dairy cows using 3-dimensional vision. J. Dairy Sci. 2018, 101, 4448–4459.
  17. Pezzuolo, A.; Guarino, M.; Sartori, L.; González, L.A.; Marinello, F. On-barn pig weight estimation based on body measurements by a Kinect v1 depth camera. Comput. Electron. Agric. 2018, 148, 29–36.
  18. Le Cozler, Y.; Allain, C.; Xavier, C.; Depuille, L.; Caillot, A.; Delouard, J.; Delattre, L.; Luginbuhl, T.; Faverdin, P. Volume and surface area of Holstein dairy cows calculated from complete 3D shapes acquired using a high-precision scanning system: Interest for body weight estimation. Comput. Electron. Agric. 2019, 165, 104977.
  19. Stajnko, D.; Brus, M.; Hočevar, M. Estimation of bull live weight through thermographically measured body dimensions. Comput. Electron. Agric. 2008, 61, 233–240.
  20. Shi, C.; Teng, G.; Li, Z. An approach of pig weight estimation using binocular stereo system based on LabVIEW. Comput. Electron. Agric. 2016, 129, 37–43.
  21. Nishide, R.; Yamashita, A.; Takaki, Y.; Ohta, C.; Oyama, K.; Ohkawa, T. Calf robust weight estimation using 3D contiguous cylindrical model and directional orientation from stereo images. In Proceedings of the Ninth International Symposium on Information and Communication Technology, Danang City, Vietnam, 6–7 December 2018; pp. 208–215.
  22. Yamashita, A.; Ohkawa, T.; Oyama, K.; Ohta, C.; Nishide, R.; Honda, T. Estimation of calf weight from fixed-point stereo camera images using three-dimensional successive cylindrical model. In Proceedings of the 5th IIAE International Conference on Intelligent Systems and Image Processing, Kitakyushu, Japan, 27–31 March 2017; pp. 247–254.
  23. Dohmen, R.; Catal, C.; Liu, Q. Image-based body mass prediction of heifers using deep neural networks. Biosyst. Eng. 2021, 204, 283–293.
  24. Cang, Y.; He, H.; Qiao, Y. An intelligent pig weights estimate method based on deep learning in sow stall environments. IEEE Access 2019, 7, 164867–164875.
  25. Suwannakhun, S.; Daungmala, P. Estimating pig weight with digital image processing using deep learning. In Proceedings of the 2018 14th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), Las Palmas de Gran Canaria, Spain, 26–29 November 2018; pp. 320–326.
  26. Elnashef, B.; Filin, S. Target-free calibration of flat refractive imaging systems using two-view geometry. Opt. Lasers Eng. 2022, 150, 106856.
  27. Lu, B.; He, Y.; Wang, H. Stereo disparity optimization with depth change constraint based on a continuous video. Displays 2021, 69, 102073.
  28. Shete, P.P.; Sarode, D.M.; Bose, S.K. A real-time stereo rectification of high definition image stream using GPU. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Delhi, India, 24–27 September 2014; pp. 158–162.
  29. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  30. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  31. Punn, N.S.; Agarwal, S. Modality specific U-Net variants for biomedical image segmentation: A survey. Artif. Intell. Rev. 2022, 55, 5845–5889.
  32. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373.
  33. Xu, A.; Chang, H.; Xu, Y.; Li, R.; Li, X.; Zhao, Y. Applying artificial neural networks (ANNs) to solve solid waste-related issues: A critical review. Waste Manag. 2021, 124, 385–402.
  34. Gunasegaran, T.; Cheah, Y.N. Evolutionary cross validation. In Proceedings of the 2017 8th International Conference on Information Technology (ICIT), Amman, Jordan, 17–18 May 2017; pp. 89–95.
  35. Wong, T.T.; Yang, N.Y. Dependency Analysis of Accuracy Estimates in k-Fold Cross Validation. IEEE Trans. Knowl. Data Eng. 2017, 29, 2417–2427.
  36. Alwosheel, A.; van Cranenburgh, S.; Chorus, C.G. Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. J. Choice Model. 2018, 28, 167–182.
  37. Cömert, Z.; Kocamaz, A. A Study of Artificial Neural Network Training Algorithms for Classification of Cardiotocography Signals. Bitlis Eren Univ. J. Sci. Technol. 2017, 7, 93–103.
  38. Basheer, I.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods 2000, 43, 3–31.
Figure 1. General block diagram of the study.
Figure 2. The stereo vision mechanism used in the study.
Figure 3. Architectural components of the stereo setup.
Figure 4. Diagram of the stereo camera system.
Figure 5. Rectified and unrectified stereo images: (a,e) Original image taken from left camera. (b,f) Original image taken from right camera. (c,g) Rectified left camera view. (d,h) Rectified right camera view.
Figure 6. Deeplab v3+ architecture.
Figure 7. Segmentation results in stereo images taken from the side: (a) Left camera input image; (b) Left camera segmentation map; (c) Left camera segmentation overlay; (d) Right camera input image; (e) Right camera segmentation map; (f) Right camera segmentation overlay.
Figure 8. Segmentation results in stereo images taken from the rear: (a) Left camera input image; (b) Left camera segmentation map; (c) Left camera segmentation overlay; (d) Right camera input image; (e) Right camera segmentation map; (f) Right camera segmentation overlay.
Figure 9. Disparity value and pixel count of the animal in the image: (a) Left camera segmentation map; (b) Right camera segmentation map.
Figure 10. Basic operation steps for weight prediction.
Figure 11. Loss graphs of artificial neural networks: (a) Loss graph of ANN-1 network; (b) Loss graph of ANN-2 network; (c) Loss graph of ANN-3 network.
Figure 12. General structure of the k-fold cross-validation method.
Table 1. Summary of the previous studies.

| Reference | Animal Type | Image Type | Method | Environment |
|---|---|---|---|---|
| [7] | cattle | 2D | segmentation + convex hull, random forest regression | fence system |
| [8] | cow | 2D | ANN regression | - |
| [9] | pig | 2D | ANN regression | - |
| [10] | cattle | 2D | Gabor filter, fuzzy logic | - |
| [11] | cow | 3D | segmentation | fence system |
| [12] | pig | 3D | segmentation, linear regression | - |
| [13] | cattle | 3D | segmentation, linear and non-linear regression | - |
| [14] | cow | 3D | Lasso regression | fence system |
| [15] | heifer | 3D | ellipse fitting, linear regression | narrow passage |
| [16] | cow | 3D | linear regression | - |
| [17] | pig | 3D | linear and non-linear regression | - |
| [18] | cow | 3D full-body scan | linear regression | special scanning station |
| [19] | cow | thermal | linear regression | - |
| [20] | pig | stereo vision | least squares regression | fence system |
| [21] | calf | stereo vision | linear regression | - |
| [22] | calf | stereo vision | linear regression | - |
| [23] | heifer | 2D | deep learning-based image processing and regression | - |
| [24] | pig | 3D | deep learning-based image processing and regression | - |
| [25] | pig | 2D | deep learning-based image processing and regression | - |
Table 2. Dataset created using semantic segmentation and stereo images.

| ID | Left Side Shooting (Pixel) | Right Side Shooting (Pixel) | Side Pixel Difference (Pixel) | Side Distance (m) | Left Back Shooting (Pixel) | Right Back Shooting (Pixel) | Back Pixel Difference (Pixel) | Back Distance (m) | Real Weight (kg) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 28,359 | 28,270 | 11 | 5.83 | 13,487 | 11,858 | 16 | 4.00 | 448 |
| 2 | 23,355 | 23,495 | 10 | 6.41 | 6915 | 7563 | 10 | 6.41 | 408 |
| 3 | 53,850 | 54,433 | 23 | 2.78 | 25,761 | 23,903 | 22 | 2.91 | 464 |
| 4 | 22,388 | 22,388 | 10 | 6.41 | 20,769 | 21,169 | 18 | 3.56 | 453 |
| 5 | 32,924 | 33,902 | 16 | 4.00 | 14,208 | 12,965 | 15 | 4.27 | 399 |
| 6 | 27,173 | 27,186 | 14 | 4.58 | 11,872 | 10,927 | 12 | 5.34 | 385 |
| 7 | 16,891 | 16,602 | 8 | 8.01 | 15,219 | 18,846 | 16 | 4.00 | 421 |
| 8 | 55,587 | 54,904 | 18 | 3.56 | 14,044 | 14,221 | 20 | 3.20 | 503 |
| 9 | 28,354 | 28,732 | 9 | 7.12 | 5688 | 5465 | 7 | 9.16 | 529 |
| 10 | 60,146 | 59,403 | 34 | 1.88 | 9819 | 10,676 | 24 | 2.67 | 291 |
| 11 | 24,671 | 25,601 | 14 | 4.58 | 8627 | 9039 | 12 | 5.34 | 337 |
| 12 | 35,293 | 35,112 | 13 | 4.93 | 17,373 | 17,537 | 18 | 3.56 | 490 |
| 13 | 7756 | 8485 | 6 | 10.69 | 7732 | 7686 | 13 | 4.93 | 259 |
| 14 | 37,897 | 38,291 | 14 | 4.58 | 16,978 | 17,075 | 21 | 3.05 | 470 |
| 15 | 15,376 | 16,865 | 8 | 8.01 | 20,588 | 19,499 | 15 | 4.27 | 446 |
| 16 | 45,773 | 46,440 | 17 | 3.77 | 28,808 | 30,308 | 25 | 2.56 | 519 |
| 17 | 17,356 | 17,062 | 9 | 7.12 | 16,494 | 16,498 | 10 | 6.41 | 474 |
| 18 | 64,575 | 63,646 | 27 | 2.37 | 10,830 | 10,623 | 20 | 3.20 | 388 |
| 19 | 22,841 | 23,861 | 9 | 7.12 | 7100 | 6923 | 9 | 7.12 | 449 |
| 20 | 21,445 | 20,735 | 9 | 7.12 | 10,806 | 10,364 | 10 | 6.41 | 453 |
| 21 | 17,529 | 16,663 | 8 | 8.01 | 12,026 | 11,422 | 13 | 4.93 | 405 |
| 22 | 14,031 | 14,031 | 7 | 9.16 | 8818 | 9227 | 11 | 5.83 | 376 |
| 23 | 10,215 | 10,117 | 6 | 10.69 | 9339 | 8508 | 7 | 9.16 | 395 |
| 24 | 22,478 | 22,313 | 9 | 7.12 | 7753 | 9370 | 10 | 6.41 | 445 |
| 25 | 26,216 | 25,909 | 13 | 4.93 | 8568 | 9541 | 12 | 5.34 | 367 |
| 26 | 16,398 | 16,649 | 8 | 8.01 | 2442 | 2194 | 2 | 32.06 | 429 |
| 27 | 17,876 | 17,985 | 10 | 6.41 | 12,635 | 10,864 | 9 | 7.12 | 413 |
| 28 | 38,094 | 38,575 | 14 | 4.58 | 26,909 | 29,423 | 25 | 2.56 | 515 |
| 29 | 65,217 | 63,783 | 26 | 2.46 | 38,327 | 38,258 | 28 | 2.29 | 513 |
| 30 | 17,561 | 18,930 | 11 | 5.83 | 10,185 | 9289 | 12 | 5.34 | 329 |
| 31 | 42,775 | 42,389 | 18 | 3.56 | 5799 | 6216 | 10 | 6.41 | 395 |
| 32 | 22,861 | 22,793 | 8 | 8.01 | 8307 | 8269 | 8 | 8.01 | 518 |
| 33 | 21,796 | 21,849 | 8 | 8.01 | 8665 | 8652 | 9 | 7.12 | 491 |
| 34 | 17,212 | 16,648 | 8 | 8.01 | 9280 | 9070 | 9 | 7.12 | 418 |
| 35 | 39,769 | 39,128 | 18 | 3.56 | 22,879 | 21,255 | 24 | 2.67 | 414 |
| 36 | 26,027 | 26,613 | 13 | 4.93 | 18,094 | 17,368 | 16 | 4.00 | 417 |
| 37 | 72,247 | 71,788 | 30 | 2.13 | 45,864 | 45,521 | 59 | 1.08 | 423 |
| 38 | 53,922 | 53,663 | 20 | 3.20 | 12,417 | 13,487 | 20 | 3.20 | 444 |
| 39 | 23,689 | 23,606 | 15 | 4.27 | 12,118 | 12,222 | 14 | 4.58 | 326 |
| 40 | 16,112 | 16,544 | 9 | 7.12 | 5726 | 5370 | 5 | 12.82 | 389 |
| 41 | 52,820 | 51,293 | 25 | 2.56 | 18,313 | 18,414 | 23 | 2.78 | 384 |
| 42 | 47,105 | 47,196 | 20 | 3.20 | 25,168 | 25,835 | 22 | 2.91 | 468 |
| 43 | 22,610 | 22,732 | 11 | 5.83 | 3485 | 3543 | 6 | 10.69 | 352 |
| 44 | 85,009 | 84,174 | 50 | 1.28 | 15,174 | 14,346 | 25 | 2.56 | 304 |
| 45 | 44,624 | 44,198 | 20 | 3.20 | 10,000 | 10,126 | 11 | 5.83 | 418 |
| 46 | 47,577 | 47,492 | 18 | 3.56 | 7215 | 6680 | 10 | 6.41 | 444 |
| 47 | 27,168 | 27,367 | 10 | 6.41 | 5648 | 5795 | 7 | 9.16 | 472 |
| 48 | 43,968 | 44,156 | 20 | 3.20 | 28,421 | 29,072 | 25 | 2.56 | 447 |
| 49 | 27,869 | 29,934 | 18 | 3.56 | 7089 | 6826 | 4 | 16.03 | 446 |
| 50 | 17,976 | 19,213 | 9 | 7.12 | 11,372 | 10,612 | 7 | 9.16 | 484 |
| 51 | 28,860 | 29,024 | 11 | 5.83 | 7773 | 7242 | 8 | 8.01 | 475 |
| 52 | 44,589 | 44,200 | 18 | 3.56 | 6112 | 5523 | 10 | 6.41 | 406 |
| 53 | 35,882 | 36,441 | 14 | 4.58 | 5838 | 5705 | 9 | 7.12 | 429 |
| 54 | 15,853 | 16,071 | 7 | 9.16 | 7081 | 7606 | 7 | 9.16 | 443 |
| 55 | 26,192 | 25,951 | 11 | 5.83 | 8405 | 7094 | 10 | 6.41 | 419 |
| 56 | 85,252 | 84,638 | 36 | 1.78 | 26,276 | 28,467 | 37 | 1.73 | 413 |
| 57 | 39,223 | 38,667 | 17 | 3.77 | 9791 | 9380 | 14 | 4.58 | 396 |
| 58 | 33,084 | 32,123 | 10 | 6.41 | 4714 | 4567 | 9 | 7.12 | 503 |
| 59 | 37,999 | 37,285 | 15 | 4.27 | 15,396 | 15,864 | 18 | 3.56 | 450 |
| 60 | 34,845 | 34,814 | 14 | 4.58 | 5951 | 5853 | 12 | 5.34 | 397 |
| 61 | 73,264 | 72,385 | 25 | 2.56 | 4873 | 4534 | 10 | 6.41 | 451 |
| 62 | 67,691 | 68,701 | 40 | 1.60 | 6715 | 7211 | 30 | 2.13 | 258 |
Table 3. Dataset created using semantic segmentation and stereo images (cont.)

| ID | Left Side Shooting (Pixel) | Right Side Shooting (Pixel) | Side Pixel Difference (Pixel) | Side Distance (m) | Left Back Shooting (Pixel) | Right Back Shooting (Pixel) | Back Pixel Difference (Pixel) | Back Distance (m) | Real Weight (kg) |
|---|---|---|---|---|---|---|---|---|---|
| 63 | 12,482 | 12,607 | 7 | 9.16 | 3774 | 3962 | 3 | 21.37 | 410 |
| 64 | 24,641 | 24,474 | 12 | 5.34 | 21,629 | 21,273 | 19 | 3.37 | 423 |
| 65 | 20,079 | 19,978 | 8 | 8.01 | 5726 | 5492 | 6 | 10.69 | 458 |
| 66 | 17,766 | 17,845 | 8 | 8.01 | 13,943 | 13,465 | 11 | 5.83 | 462 |
| 67 | 16,253 | 16,783 | 7 | 9.16 | 6386 | 6398 | 7 | 9.16 | 436 |
| 68 | 18,592 | 18,507 | 8 | 8.01 | 6002 | 6056 | 8 | 8.01 | 409 |
| 69 | 19,406 | 19,573 | 25 | 2.56 | 5575 | 5958 | 26 | 2.46 | 133 |
| 70 | 45,515 | 44,650 | 18 | 3.56 | 17,590 | 17,739 | 16 | 4.00 | 481 |
| 71 | 13,850 | 13,713 | 22 | 2.91 | 6753 | 6239 | 18 | 3.56 | 131 |
| 72 | 26,396 | 27,250 | 18 | 3.56 | 13,920 | 12,953 | 18 | 3.56 | 298 |
| 73 | 37,666 | 37,200 | 14 | 4.58 | 8633 | 8590 | 14 | 4.58 | 438 |
| 74 | 24,942 | 24,704 | 10 | 6.41 | 11,840 | 12,272 | 14 | 4.58 | 445 |
| 75 | 28,717 | 28,191 | 12 | 5.34 | 6477 | 6478 | 6 | 10.69 | 460 |
| 76 | 20,346 | 19,837 | 12 | 5.34 | 4541 | 4541 | 10 | 6.41 | 283 |
| 77 | 35,896 | 36,164 | 16 | 4.00 | 9569 | 8195 | 12 | 5.34 | 398 |
| 78 | 30,093 | 30,029 | 12 | 5.34 | 11,402 | 9606 | 12 | 5.34 | 450 |
| 79 | 24,082 | 23,725 | 14 | 4.58 | 7662 | 7535 | 14 | 4.58 | 300 |
| 80 | 16,450 | 16,536 | 9 | 7.12 | 4717 | 4537 | 9 | 7.12 | 312 |
| 81 | 13,613 | 13,582 | 7 | 9.16 | 7379 | 7372 | 10 | 6.41 | 357 |
| 82 | 40,195 | 40,141 | 22 | 2.91 | 3831 | 3852 | 10 | 6.41 | 294 |
| 83 | 21,014 | 20,703 | 8 | 8.01 | 9017 | 8616 | 10 | 6.41 | 465 |
| 84 | 18,571 | 18,758 | 8 | 8.01 | 5314 | 5374 | 5 | 12.82 | 453 |
| 85 | 16,099 | 16,051 | 7 | 9.16 | 9067 | 9352 | 11 | 5.83 | 417 |
Table 4. Properties of artificial neural networks used in the proposed system.

| Architecture | Num. of Elements in Input Layer | Num. of Nodes in the First Hidden Layer | Num. of Nodes in the Second Hidden Layer | Num. of Elements in Output Layer | Total Num. of Parameters in the Network |
|---|---|---|---|---|---|
| ANN-1 | 3 | 64 | 64 | 1 | 4488 |
| ANN-2 | 3 | 64 | 64 | 1 | 4488 |
| ANN-3 | 8 | 64 | 64 | 1 | 4818 |
Table 5. Prediction values on the test dataset of ANN-1 network.

| ID | Left Side Shooting (Pixel) | Right Side Shooting (Pixel) | Side Pixel Difference (Pixel) | Real Weight (kg) | Estimated Weight (kg) | Difference (kg) |
|---|---|---|---|---|---|---|
| 9 | 60,146 | 59,403 | 34 | 291 | 275.08 | 15.17 |
| 21 | 14,031 | 14,031 | 7 | 376 | 418.70 | -42.70 |
| 36 | 72,247 | 71,788 | 30 | 423 | 421.77 | 1.22 |
| 44 | 44,624 | 44,198 | 20 | 418 | 396.78 | 21.21 |
| 47 | 43,968 | 44,156 | 20 | 447 | 394.97 | 52.02 |
| 64 | 20,079 | 19,978 | 8 | 458 | 446.57 | 11.42 |
| 67 | 18,592 | 18,507 | 8 | 409 | 434.73 | -25.73 |
| 70 | 13,850 | 13,713 | 22 | 131 | 139.36 | -8.36 |
| 81 | 40,195 | 40,141 | 22 | 294 | 324.35 | -30.35 |
Table 6. Prediction values on the test dataset of ANN-2 network.

| ID | Left Back Shooting (Pixel) | Right Back Shooting (Pixel) | Back Pixel Difference (Pixel) | Real Weight (kg) | Estimated Weight (kg) | Difference (kg) |
|---|---|---|---|---|---|---|
| 9 | 9819 | 10,676 | 24 | 291 | 302.66 | -11.66 |
| 21 | 8818 | 9227 | 11 | 376 | 415.95 | -39.95 |
| 36 | 45,864 | 45,521 | 59 | 423 | 632.53 | -209.53 |
| 44 | 10,000 | 10,126 | 11 | 418 | 422.55 | -4.55 |
| 47 | 28,421 | 29,072 | 25 | 447 | 492.18 | -45.18 |
| 64 | 5726 | 5492 | 6 | 458 | 441.39 | 16.60 |
| 67 | 6002 | 6056 | 8 | 409 | 425.11 | -16.11 |
| 70 | 6753 | 6239 | 18 | 131 | 332.30 | -201.30 |
| 81 | 3831 | 3852 | 10 | 294 | 391.84 | -97.84 |
Table 7. Prediction values on the test dataset of ANN-3 network.

| ID | Left Side Shooting (Pixel) | Right Side Shooting (Pixel) | Side Pixel Difference (Pixel) | Side Distance (m) | Left Back Shooting (Pixel) | Right Back Shooting (Pixel) | Back Pixel Difference (Pixel) | Back Distance (m) | Real Weight (kg) | Estimated Weight (kg) | Difference (kg) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 60,146 | 59,403 | 34 | 1.88 | 9819 | 10,676 | 24 | 2.67 | 291 | 281.71 | 9.29 |
| 21 | 14,031 | 14,031 | 7 | 9.16 | 8818 | 9227 | 11 | 5.83 | 376 | 389.12 | -13.12 |
| 36 | 72,247 | 71,788 | 30 | 2.13 | 45,864 | 45,521 | 59 | 1.08 | 423 | 566.15 | -143.15 |
| 44 | 44,624 | 44,198 | 20 | 3.20 | 10,000 | 10,126 | 11 | 5.83 | 418 | 389.92 | 28.08 |
| 47 | 43,968 | 44,156 | 20 | 3.20 | 28,421 | 29,072 | 25 | 2.56 | 447 | 438.57 | 8.43 |
| 64 | 20,079 | 19,978 | 8 | 8.01 | 5726 | 5492 | 6 | 10.68 | 458 | 456.09 | 1.91 |
| 67 | 18,592 | 18,507 | 8 | 8.01 | 6002 | 6056 | 8 | 8.01 | 409 | 430.47 | -21.47 |
| 70 | 13,850 | 13,713 | 22 | 2.91 | 6753 | 6239 | 18 | 3.56 | 131 | 130.80 | 0.20 |
| 81 | 40,195 | 40,141 | 22 | 2.91 | 3831 | 3852 | 10 | 6.41 | 294 | 272.02 | 21.98 |
Table 8. Average real and estimated weights (kg) obtained in each fold of the 10-fold cross-validation of the ANN-3 network.

| K | Real Weight | Estimated Weight |
|---|---|---|
| 1 | 399.8889 | 403.3733 |
| 2 | 403.8889 | 395.3233 |
| 3 | 390.6667 | 407.6100 |
| 4 | 378.5556 | 381.4744 |
| 5 | 451.0000 | 447.5600 |
| 6 | 436.0000 | 431.0100 |
| 7 | 434.5000 | 436.4238 |
| 8 | 384.8750 | 382.3738 |
| 9 | 380.2500 | 410.3075 |
| 10 | 459.8750 | 468.5738 |
| Average | 411.9500 | 416.4030 |