Article

Determination of Non-Digestible Parts in Dairy Cattle Feces Using U-NET and F-CRN Architectures

Cevher Özden, Mutlu Bulut, Demet Çanga Boğa and Mustafa Boğa
1 Computer Engineering, Akdeniz University, 07040 Antalya, Türkiye
2 Department of Agricultural Engineering, Çukurova University, 01120 Adana, Türkiye
3 Department of Chemistry and Chemical Processing, Osmaniye Korkut Ata University, 80050 Osmaniye, Türkiye
4 Bor Vocational School, Niğde Ömer Halisdemir University, 51700 Niğde, Türkiye
* Author to whom correspondence should be addressed.
Vet. Sci. 2023, 10(1), 32; https://doi.org/10.3390/vetsci10010032
Submission received: 14 November 2022 / Revised: 20 December 2022 / Accepted: 29 December 2022 / Published: 1 January 2023


Simple Summary

This study employs Fully Convolutional Regression Network (FCRN) and U-Shaped Convolutional Network (U-Net) architectures tailored to a dataset of dropping images of dairy cows collected from three different private dairy farms in Niğde. The main purpose of this study is to detect the number of undigested grains in dropping images in order to give useful feedback to farmers. It is a novel study that applies two different regression neural networks to object counting in dropping images. To our knowledge, it is the first study that counts objects in dropping images and thereby provides information on how effectively dairy cows digest their daily rations.

Abstract

Thanks to advancements in image processing technology, deep learning algorithms can now be used to identify, locate, and count items in an image. The successful application of image processing technology in different fields has attracted much attention in agriculture in recent years. This research was conducted to ascertain the number of indigestible cereal grains in animal feces using an image processing method. A regression-based approach to object counting was used to predict the number of cereal grains in the feces. For this purpose, we developed two different neural network architectures based upon Fully Convolutional Regression Networks (FCRN) and U-Net. The images used in the study were obtained from three different dairy cattle enterprises operating in Niğde Province. The dataset consists of 277 distinct dropping images of dairy cows on the farms. According to the findings of the study, both models yielded quite acceptable prediction accuracy, with U-Net providing slightly better predictions: an MAE value of 16.69 in the best case, compared to an MAE of 23.65 for FCRN with the same batch size.

1. Introduction

In recent years, with the rapid advancement of image processing technology, counting and locating objects in a certain area of an image has become very popular and is successfully applied in many sectors [1]. Today, locating and counting objects in an image is increasingly used in areas such as city management [2], regulation of city traffic [3], parking lot layout [4], farm management [5], and counting fruits in a garden [6].
Object counting techniques try to determine the number of objects in an image or video segment as accurately as possible. In contrast, object detection identifies objects of the same type in an image and pinpoints their locations [7]. Typical object counting applications calculate the number of cars on a road or in a parking lot [8,9,10,11], the number of people in a crowded area [12,13,14], the number of goats and sheep on a livestock farm [15,16], or the number of apples in a cultivated area [17].
Object detection and counting is a typical computer vision task. Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and other techniques are among the many available methods for solving it [18]. A CNN automatically extracts the features required to solve the problem, which is one of the most important properties of convolutional networks; in traditional machine learning methods, feature extraction is not done automatically and requires expert knowledge. CNN architectures can be trained with large datasets, and GPU support can be provided [19]. Thanks to such advantages, artificial intelligence continues to advance day by day alongside improvements in convolutional neural network methods.
Object counting techniques are generally grouped under four headings: object detection and counting, counting with trajectory clustering, counting with global regression, and counting based on object density estimation [20]. Object detection and counting based on a basic threshold rule is one of the most widely used methods. Its most important advantage is very fast counting, while its disadvantage is that counting performance decreases when several objects come close to each other [21,22,23]. The trajectory clustering technique is used to count crowded moving objects [24,25], but it only applies when objects can be tracked along the desired trajectories. The global regression method uses rapid training and testing procedures and therefore gives good estimation results [26]. In density-based counting, the number of objects across the entire image is calculated as the integral of a density map; unlike the other methods, this approach also yields a density map that describes the distribution of objects throughout the image [27].
The deep learning approach to object counting in image processing amounts to a regression. Based on the training data, these networks adopt an architecture specified by the number of potential output units [28] and have a single output unit that produces a real value as close as possible to the total number of objects to be counted [29,30]. Both settings can be treated as high-dimensional nonlinear regression models, where the number of estimators to be considered is inversely related to the number of input pixels. Recently, with version updates of deep learning libraries, the object counting APIs (Application Programming Interfaces) of the TensorFlow and YOLO libraries have offered practical uses [31]. Given that CNN models are capable of classifying and detecting items from a wide range of classes, counting without localizing is not difficult for these models. In fact, visualizations of regression-based models' feature maps have shown that these maps contain information on object positions.
The successful application of image processing technology in different fields has attracted much attention in agriculture in recent years. In this study, we focused on detecting the amount of indigestible cereal grains in animal feces using image processing methods. In animal nutrition, the digestion and absorption of feed ingredients are important for the animal's performance and for making the most of the ingredients in the ration. For this purpose, fecal separators should be used and undigested substances in feces should be checked from time to time under farm conditions. Washing the feces with such a system helps to identify problems related to the processing of the feed and unfavorable conditions in the animal's rumen. Since the use of dung washing screens (fecal separators) is time consuming and requires experienced staff, such applications are difficult to implement on every farm.
In recent studies on farm animals, researchers have focused on deep learning algorithms, tested them experimentally, and concluded that they work with a high percentage of success. Next-generation methods such as Mask R-CNN, Faster R-CNN, YOLO (v3 and v4), and DeepLab v3 have been used for deep learning through image processing [32]. In addition, network architectures such as ResNet, Inception, Xception, and VGG16 have shown successful results in studies on farm animals. The high success rate of deep learning algorithms improves predictive ability in different areas of livestock production and enables more efficient planning. Understanding the working principles of new-generation image processing techniques and choosing appropriate algorithms can save time and increase economic efficiency in livestock activities [32,33,34]. The main contributions of this study to the literature are as follows: (i) Counting objects using image processing technology has previously been successfully applied to car counting, people counting, and fruit counting. However, the detection of indigestible substances in animal feces has not yet been addressed in the literature. In this study, we build upon highly advanced CNN architectures that generate a heat map for counting non-digestible grains in feces in one pass. (ii) The effect of processing stool images on the success of the heat map prediction of indigestible grains has been demonstrated by detailed experiments. (iii) Under farm conditions, fecal separators should be used and undigested materials in feces should be checked periodically; the use of dung washing screens (fecal separators) is a time-consuming and difficult task on the farm. With the proposed method, we estimated the number of indigestible grains in feces with an MAE of 16.69. The findings of this paper indicate that the object counting method has significant potential for solving the daily tasks of dairy cow farms.

2. Materials and Methods

2.1. Material

The data used in this study were collected from 3 different private farms, each with an average of 100 milking cows, located in Niğde Province. The dairy cattle were fed a daily total mixed ration (full feeding) with 45% roughage and 55% concentrate feed. The roughage consisted of different proportions of alfalfa, wheat straw, and beet pulp; the concentrate feed consisted of different ratios of milk feed (19% crude protein, 2700 kcal energy), barley paste, and soybean meal. The dataset consists of 277 distinct dropping images of dairy cows. The images were taken with a cellphone camera under different lighting and weather conditions, and all images were resized to 256 × 256 pixels. Due to the relatively low number of images, data augmentation and upsampling methods were employed in the network architectures. Some of the original images were very large with high pixel counts; these were cropped into smaller chunks to oversample the dataset. In addition, horizontal and vertical flips were applied for data augmentation. A total of 80% of the images were used during the training phase, and the rest were used for testing purposes.
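The sketch below illustrates the kind of preprocessing described above (resizing to 256 × 256, flip-based augmentation, and an 80/20 split). It is not the authors' pipeline; the folder name "droppings", the file pattern, and the random seed are assumptions made only for this example.

```python
import random
from pathlib import Path
import numpy as np
from PIL import Image

random.seed(42)
paths = sorted(Path("droppings").glob("*.jpg"))       # hypothetical image folder
random.shuffle(paths)
split = int(0.8 * len(paths))                          # 80% training, 20% testing
train_paths, test_paths = paths[:split], paths[split:]

def load_augmented(path):
    """Load an image, resize it to 256x256, and add horizontal/vertical flips as extra samples."""
    arr = np.asarray(Image.open(path).convert("RGB").resize((256, 256)))
    return [arr, arr[:, ::-1].copy(), arr[::-1, :].copy()]   # original + horizontal flip + vertical flip

train_images = [a for p in train_paths for a in load_augmented(p)]
```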
Supervised learning methods require labeled data for the learning phase; therefore, all images had to be properly annotated. Most recent networks, such as YOLO, require bounding boxes that surround the objects to be detected, which is a quite labor-intensive annotation process, especially for small objects such as fruits or grains. Another approach used in the literature is regression-based estimation of the object count instead of localizing and detecting each object. In this approach, the problem is handled as the estimation of a continuous density function whose integral over any region of the image yields the object count. We employed the second approach in this study. For this purpose, each undigested grain in each image was marked with a red dot. In this way, the burden of detecting and localizing each object instance with a bounding box is avoided, and counting is approached as a regression task rather than a classification task. The data and the code are shared in a public GitHub repository at https://github.com/cevher/Droppings (accessed on 14 November 2022).
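As an illustration of how such dot annotations can be converted into point coordinates, the hedged sketch below assumes (our assumption, not a detail stated above) that the annotation images contain saturated red dots on otherwise natural-colored photographs; the actual annotation format in the shared repository may differ.

```python
import numpy as np
from PIL import Image
from scipy import ndimage

def dot_coordinates(annotated_path: str) -> np.ndarray:
    """Return one (row, col) coordinate per red annotation dot."""
    rgb = np.asarray(Image.open(annotated_path).convert("RGB")).astype(int)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    red_mask = (r > 200) & (g < 80) & (b < 80)           # "red enough" threshold (assumed)
    labels, n_dots = ndimage.label(red_mask)             # merge multi-pixel dots into single blobs
    centers = ndimage.center_of_mass(red_mask, labels, range(1, n_dots + 1))
    return np.asarray(centers)                           # shape (n_dots, 2)
```

The number of rows returned equals the annotated grain count for that image.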

2.2. Method

Convolutional Neural Networks are widely used for image classification and detection tasks and have achieved great success in the recent decade. They have also been used for object counting tasks. The common way they work is to first detect the objects in an image and then count the found instances, which is an effective method but requires bounding box annotations as labels during training. Bounding box labels are quite hard to obtain, especially for small, crowded objects such as the grains in droppings. The other method used in the literature for this purpose is to count objects by estimating density heat maps, much like a regression estimation. In this type of learning, dot-annotated density maps are used as labels. A dot annotation only gives information about the location of the object and is much less labor intensive than a bounding box annotation. As described by Lempitsky and Zisserman (2010) [35], the main idea is the following: given an image I, the goal is to obtain a density function F as a real-valued function of the pixels in the image. Then, to answer a query about the number of objects in image I, the count is estimated by integrating F over the entire image; if F is integrated over a subregion of the image, it gives the number of objects in that subregion [35]. During training, the mechanism works as follows: each dot annotation is represented by a Gaussian, and a density surface D(x) is formed by the superposition of these Gaussians. The task is to regress this density surface from the corresponding image I(x). This is achieved by training a CNN using the mean squared error between the output heat map and the target density surface as the regression loss function. The trained network then predicts the density heat map D(x) for a given input image I(x) [36]. Fully Convolutional Regression Networks and their derivative U-Net are the two most common network architectures proposed in the literature for this purpose. Both are used for image segmentation in various fields such as biomedical image processing, cell counting, car counting, and pedestrian counting. Another approach could be to use pretrained models such as ResNet or VGG variants. However, such models have not been trained on feces images, and changing the weights of a pretrained model would require thousands of images. Considering that we only have 277 images in the feces dataset, it would be impossible to make a reasonable change in the weights. For this reason, transfer learning is not considered in this paper.
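A minimal PyTorch sketch of this counting-by-integration idea follows. Here `model` stands for any density-regression network such as the FCRN or U-Net described below, and the tensor shapes are assumptions for illustration only.

```python
import torch

@torch.no_grad()
def predicted_count(model: torch.nn.Module, image: torch.Tensor) -> float:
    """image: (1, C, H, W) tensor; returns the estimated number of objects in the whole image."""
    density = model(image)            # (1, 1, H, W) estimated density map D(x)
    return density.sum().item()       # integral of D over the entire image

def region_count(density: torch.Tensor, top: int, left: int, h: int, w: int) -> float:
    """Integrating the density over a sub-region gives the count inside that region."""
    return density[..., top:top + h, left:left + w].sum().item()
```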
In order to create labels for the training images, each pixel in an image is associated with a real-valued feature vector. Each training image is annotated with a collection of 2D points, whose total number indicates how many items the researchers have labeled. Thus, the density functions are real-valued functions over the pixel grid. Applying a convolution with a Gaussian kernel to each annotated input image yields a density map known as the ground truth density function, whose integrals over image regions correspond to the item counts. The Gaussian kernel function is defined as follows:
$$G(x, y) = A \exp\left(-\left(a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2\right)\right)$$

$$G'(x, y) = \frac{G(x, y)}{\max_{x, y} G(x, y)}$$
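A hedged sketch of this construction is given below. Instead of the max-normalized kernel above, it uses the common convention of a unit-integral Gaussian per dot (so that the map sums to the annotated count); the kernel width sigma and the 256 × 256 map size are assumptions, not values reported in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(dots: np.ndarray, shape=(256, 256), sigma: float = 2.0) -> np.ndarray:
    """dots: array of (row, col) annotation points; returns an H x W ground-truth density map."""
    canvas = np.zeros(shape, dtype=np.float32)
    for r, c in dots:
        canvas[int(r), int(c)] += 1.0              # one unit of "mass" per annotated grain
    return gaussian_filter(canvas, sigma=sigma)    # spread each unit into a small Gaussian

# density_map(dots).sum() recovers (approximately) the number of annotated grains,
# and its integral over any sub-region gives the count in that region.
```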
The goal is to learn the linear transformation of the feature representation that approximates the density function at each pixel from a set of training photos with their ground truth densities. Two distinct deep learning architectures were employed in the study to achieve this goal. The first is the Fully Convolutional Regression Network (FCRN). Different variants of this network architecture have been successfully used for solving problems such as cell counting in microscopy images. The FCRN architecture used in this study was developed upon VGG-net (Figure 2). Small kernel sizes (3 × 3 and 5 × 5 pixels) were preferred and, in this way, the number of feature maps could be increased to compensate for the loss of spatial information due to the max pooling applied after every two convolutional layers in the architecture. This fine-tuning enabled the network to transfer spatial information to the higher layers of the network. The overall FCRN architecture consisted of two main units: Conv units and UpConv units. Each Conv unit included a convolution layer, followed by a Rectified Linear Unit (ReLU) and max pooling; in contrast, each UpConv unit consisted of a deconvolution, a ReLU, and a convolution layer. A fully connected layer was implemented as a convolution after the 4th Conv unit.
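The following PyTorch sketch shows an FCRN-style density regressor consistent with the layer list in Appendix A (three Conv units with max pooling, a fourth Conv unit acting as the convolutional "fully connected" layer, and three UpConv units restoring the 256 × 256 resolution). It is an illustration rather than the published model; 3 × 3 kernels and bias-free convolutions are assumptions made to match the parameter counts in Appendix A.

```python
import torch.nn as nn

def conv_bn_relu(cin, cout, k=3):
    # bias-free convolution followed by BatchNorm and ReLU, as suggested by Appendix A parameter counts
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class FCRN(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(
            conv_bn_relu(3, 32), nn.MaxPool2d(2),       # 256 -> 128
            conv_bn_relu(32, 64), nn.MaxPool2d(2),      # 128 -> 64
            conv_bn_relu(64, 128), nn.MaxPool2d(2),     # 64 -> 32
            conv_bn_relu(128, 512))                     # 4th Conv unit ("fully connected" as convolution)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), conv_bn_relu(512, 128),   # 32 -> 64
            nn.Upsample(scale_factor=2), conv_bn_relu(128, 64),    # 64 -> 128
            nn.Upsample(scale_factor=2), conv_bn_relu(64, 1))      # 128 -> 256, 1-channel density map

    def forward(self, x):                               # x: (N, 3, 256, 256)
        return self.up(self.down(x))                    # (N, 1, 256, 256) estimated density map
```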
The second deep learning architecture is the U-Net architecture, which was built upon Fully Convolutional Networks (Figure 3). This type of network is well suited to working with a small number of training images and gives very precise segmentations. The architecture consists of two main parts: contracting and expanding paths. The contracting part has a typical convolutional network architecture, consisting of the repeated application of two 3 × 3 convolution layers, each followed by a ReLU, and max pooling; this operation downsamples the given input images. The expanding part of the architecture consists of an upsampling operation, followed by a 2 × 2 up-convolution layer. The final layer of the network is a 1 × 1 convolution that maps the feature vector to the output. The network is made of 23 convolutional layers in total.
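Below is a compact U-Net-style sketch following the description above, with skip connections corresponding to the "ConvCat" concatenations of Appendix B and a fixed width of 64 channels as in Appendix B. The published model (see the shared repository) may differ in detail; this is only an illustration.

```python
import torch
import torch.nn as nn

def double_conv(cin, cout):
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1, bias=False), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1, bias=False), nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class SmallUNet(nn.Module):
    def __init__(self, width=64):
        super().__init__()
        self.enc1 = double_conv(3, width)
        self.enc2 = double_conv(width, width)
        self.enc3 = double_conv(width, width)
        self.bottom = double_conv(width, width)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2)
        self.dec3 = double_conv(2 * width, width)       # input doubled by the skip concatenation
        self.dec2 = double_conv(2 * width, width)
        self.dec1 = double_conv(2 * width, width)
        self.head = nn.Conv2d(width, 1, 1)              # final 1x1 convolution -> density map

    def forward(self, x):
        e1 = self.enc1(x)                               # 256 x 256
        e2 = self.enc2(self.pool(e1))                   # 128 x 128
        e3 = self.enc3(self.pool(e2))                   # 64 x 64
        b = self.bottom(self.pool(e3))                  # 32 x 32
        d3 = self.dec3(torch.cat([self.up(b), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return self.head(d1)                            # (N, 1, 256, 256)
```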
In the implementation phase, samples of 256 × 256 pixels were obtained from the center part of each dropping in the images; thus, we increased the number of input images available for training to some degree. Each sample image was normalized, and orthogonal initialization was used for the convolutional layers. Stochastic Gradient Descent in the PyTorch framework was used for optimization, and the Mean Squared Error (MSELoss) function in PyTorch was used as the loss function in both networks. The learning rate was set to 0.01 and decreased by a factor of 10 every 10 epochs. Momentum and weight decay were set to 0.8 and 0.005, respectively. Each network was trained for 200 epochs. Dropout was not employed. Both the U-Net and FCRN architectures employed three downsampling and three upsampling convolutional blocks with fixed 3 × 3 and 5 × 5 filter sizes. Each block in U-Net consisted of two convolutional layers, whereas FCRN started with 1 convolutional layer and the number of convolutional layers was increased by 1 for each subsequent block in order to compensate for the loss of high-resolution information due to pooling. Both models were trained on a randomly selected 80% of the image data. Upon finishing the training phase, the networks were saved as .pth files and tested on the unseen 20% of the image data. The evaluation was performed on a computer with an Intel(R) Core(TM) i7-7500U CPU at 2.70 GHz. The results are reported as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).
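A training-loop sketch using the reported settings (SGD with learning rate 0.01 decreased by a factor of 10 every 10 epochs, momentum 0.8, weight decay 0.005, MSELoss, 200 epochs, model saved as a .pth file) is shown below; the data loader, device, and output file name are placeholders rather than details from the paper.

```python
import torch
from torch import nn, optim

def train(model, train_loader, device="cpu", epochs=200):
    """Fit a density-regression network; train_loader yields (image, target_density) batches."""
    model.to(device)
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.8, weight_decay=0.005)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # lr / 10 every 10 epochs
    for epoch in range(epochs):
        for images, target_density in train_loader:     # (N, 3, 256, 256), (N, 1, 256, 256)
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), target_density.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()
    torch.save(model.state_dict(), "model.pth")          # saved as a .pth file, as in the paper
```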
Let $x_i$ and $y_i$ denote the actual and predicted grain counts for the $i$-th image, and let $N$ be the number of images:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i}\left|x_i - y_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i}\left(x_i - y_i\right)^2}$$
The most popular metrics for evaluating accuracy on continuous variables are MAE and RMSE. Both metrics are indifferent to the direction of the error and range between 0 and ∞; lower scores mean better prediction.
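Both metrics can be computed directly from per-image counts, for example the sums of the predicted density maps; a minimal implementation of the formulas above:

```python
import numpy as np

def mae_rmse(actual_counts, predicted_counts):
    """Mean Absolute Error and Root Mean Square Error over per-image counts."""
    x = np.asarray(actual_counts, dtype=float)
    y = np.asarray(predicted_counts, dtype=float)
    mae = np.mean(np.abs(x - y))
    rmse = np.sqrt(np.mean((x - y) ** 2))
    return mae, rmse

# Example with made-up counts: mae_rmse([12, 30, 7], [10.5, 33.2, 6.1]) -> (≈1.87, ≈2.11)
```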

3. Results

The two network architectures were applied to the dataset containing photos of dairy cow droppings. Training sets made up 80% of the dataset, and validation sets made up 20%. The large images were randomly divided into separate sub-images to increase the amount of data. Each patch obtained in this manner was normalized by subtracting its mean and dividing by its standard deviation.
In RMSE, the errors are squared before being averaged, which is useful when large errors are particularly undesirable. Both of our models were trained for 100 epochs with different batch sizes (1, 8, and 16). The obtained results are summarized for each model with respect to their MAE and RMSE scores.
From Table 1, it can be seen that the U-Net architecture achieved better MAE and RMSE values than FCRN for all batch sizes. The lowest MAE value was achieved by U-Net trained with a batch size of one. Sample results obtained with the two architectures are given in Figure 4.
Some examples of the results of both networks are given in Figure 4. Input images and ground-truth dot annotations are provided to the networks during training, and the networks produce estimated density maps, based on the located object centers, corresponding to the given input images. As can be seen, both networks closely detected the dense areas of undigested grains. The model summaries for FCRN and U-Net are presented in Appendix A and Appendix B, respectively.

4. Discussion

Counting small, crowded objects of the same type in an image or video footage is a very difficult and time-consuming task. Artificial intelligence algorithms have been used in recent years to overcome this problem. In this study, we used Convolutional Regression Networks, which have been applied in recent years for counting objects in an image, because of their relatively easier labelling method. Different CRN architectures have been used in various studies in fields such as biology, remote sensing, and surveillance. We believe the method used in our study is new in the field of animal husbandry, where the need for such applications has increased recently; for this reason, we expect that interest in this subject and the number of such studies may grow in the future. Due to the lack of studies on the determination of indigestible products in animal feces, however, a direct comparison could not be made in this study. Nevertheless, some studies using this architecture are briefly summarized for comparison purposes in the following paragraph.
The results of the study are quite acceptable when compared to other studies in the literature that implement different regression network architectures on various datasets. One of them is the study of Oñoro-Rubio and López-Sastre (2016) [37]. They proposed the Counting Convolutional Neural Network and Hydra Convolutional Neural Network models and tested them on the TRANCOS dataset, a publicly available dataset of 1244 images of different traffic scenes; their MAE results ranged between 10.99 and 25.99. A recent study was carried out by Kilic and Ozturk [38], who proposed a new architecture called the Heatmap Learner Convolutional Neural Network (HLCNN) built upon VGG-16. They conducted their tests on the CARPK and PUCPR+ datasets, which are publicly available car images captured in parking lots, and compared their results with those of previous studies on the same datasets. They reported state-of-the-art results of 2.12 and 2.52 MAE, while the MAE results of previous studies ranged between 4.45 and 156.00. Pedestrian and car datasets have been heavily researched in most of the previous studies. Xie et al. (2018) [36] used FCRNs to count microscopic cells and showed that they made quite good predictions of cell counts in real microscope images without fine-tuning; it was also stated that better results can be obtained by fine-tuning the network to the images. Haider et al. (2021) [39] used the Fully Convolutional Regression Network architecture for human detection from aerial thermal camera images. Two well-constructed models, FCN and U-Net, were chosen to compare detection performance. It was found that the method used in that study can detect people in crowded images with high precision and that it provides better results compared to other recent techniques that detect people using thermal imaging; the precision estimate of the FCRN architecture proposed in that study was 99%. Kılıç and Öztürk (2021) [38] estimated the exact number of cars in aerial images using a convolutional neural network architecture developed by adding three convolutional adaptation layers to determine the heat maps of the cars to be counted. In their experiments, it was shown that the developed model gives very successful results in locating the cars, and it was emphasized that the Heatmap CNN yielded more successful results compared to existing object counting techniques.
The current paper is the first study implemented on dropping images. Despite this, the results we achieved are quite close to those of previous studies on other image datasets. When the literature is examined, studies on object counting fall into two main groups: counting by detection and counting by regression. Detection-based systems use CNN models to extract the properties of sample object proposals and then estimate the class probabilities and bounding box regressions of those proposals. These methods are defined as two-stage object detection, and the best-known methods belong to the R-CNN family [40,41]. While two-stage methods are somewhat successful in detecting objects, single-shot methods such as YOLO and SSD are faster [42,43].
Object detection is considered a useful application of deep neural networks because it achieves high visual detection sensitivity. High-dimensional images are processed with the help of convolutional neural networks, which are robust against noise and missing data [44]. Deep convolutional neural networks have been used in many studies for agricultural purposes. Kamilaris and Prenafeta-Boldú (2018) [45] identified 40 papers using the terms deep learning and agriculture and evaluated these articles with a bibliographic analysis; they reported that deep learning methods achieved higher detection sensitivity than previous image processing methods. Mirhaji et al. (2021) [46] carried out a study on oranges using image processing technology under various lighting conditions; the YOLOv4 network was used, and almost 90% success was achieved. In a similar study, Koirala et al. (2019) [47] used R-CNN and the YOLO series, achieving 80% success in detecting fruits on trees.

5. Conclusions

In this paper, two different architectures were used for the detection of indigestible substances in the feces of dairy cows. A heatmap-generating CNN architecture was used to count the indigestible fractions in the feces of dairy cows in a single shot. The effect of processing the feces images on the success of the heatmap estimation of indigestible fractions is shown. Under farm conditions, indigestible materials should be checked during the day with fecal sieves. It is quite difficult to collect fecal data in livestock enterprises; in addition, the difficulties in processing and evaluating the obtained data and the inhomogeneity of the data limit the number of such studies, and these conditions are the fundamental limitations of the present study. However, it is important in animal feeding to visually determine the indigestible parts of feces. For this purpose, fecal sieves are widely used in Turkey, but it takes an average of 5–7 min to sift the feces of each animal, which is challenging work. In practice, only a few fecal samples per farm can be examined, which does not cover the entire farm and prevents a clear picture of the whole herd from forming. For this reason, by using the FCRN architecture in our study, it will be possible to evaluate the feces of a whole farm in a short time, obtain faster results, and overcome the problem of determining the indigestible parts of the feces. Providing fast and accurate results in a short time with an artificial intelligence method will lead to more profitable livestock farming and greater performance increases on the farm, which are the main contributions of the current study to the dairy industry. The method will enable the detection of problems in the processing and preparation of feeds, so that any problem can be eliminated on short notice. In addition, with this method, negative effects on the animal's digestive system can be determined, and it would be possible to take nutritional measures through the ration. Such timely intervention will increase the profitability of the farm by preventing health problems in time and could increase performance. These results show that the object counting method has significant potential for solving the daily tasks of dairy farms more effectively.

Author Contributions

Conceptualization, C.Ö., M.B. (Mutlu Bulut), D.Ç.B. and M.B. (Mustafa Boğa); methodology, C.Ö.; software, C.Ö.; resources, M.B. (Mutlu Bulut) and D.Ç.B.; writing—original draft preparation, M.B. (Mutlu Bulut) and D.Ç.B.; writing—review and editing, C.Ö. and M.B. (Mustafa Boğa); supervision, M.B. (Mustafa Boğa); project administration, C.Ö. and M.B. (Mustafa Boğa); funding acquisition: there is no funding. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Images of animal feces were used as material in the study. Since this material is not within the scope of experimental animals, it does not require ethical review and approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and the codes are shared in this public Github repository (https://github.com/cevher/Droppings). Accessed on 14 November 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Model Summary for FCRN

Layer (Type) | Output Shape | Param #
Conv2d-1 | [−1, 32, 256, 256] | 864
BatchNorm2d-2 | [−1, 32, 256, 256] | 64
ReLU-3 | [−1, 32, 256, 256] | 0
MaxPool2d-4 | [−1, 32, 128, 128] | 0
Conv2d-5 | [−1, 64, 128, 128] | 18,432
BatchNorm2d-6 | [−1, 64, 128, 128] | 128
ReLU-7 | [−1, 64, 128, 128] | 0
MaxPool2d-8 | [−1, 64, 64, 64] | 0
Conv2d-9 | [−1, 128, 64, 64] | 73,728
BatchNorm2d-10 | [−1, 128, 64, 64] | 256
ReLU-11 | [−1, 128, 64, 64] | 0
MaxPool2d-12 | [−1, 128, 32, 32] | 0
Conv2d-13 | [−1, 512, 32, 32] | 589,824
BatchNorm2d-14 | [−1, 512, 32, 32] | 1024
ReLU-15 | [−1, 512, 32, 32] | 0
Upsample-16 | [−1, 512, 64, 64] | 0
Conv2d-17 | [−1, 128, 64, 64] | 589,824
BatchNorm2d-18 | [−1, 128, 64, 64] | 256
ReLU-19 | [−1, 128, 64, 64] | 0
Upsample-20 | [−1, 128, 128, 128] | 0
Conv2d-21 | [−1, 64, 128, 128] | 73,728
BatchNorm2d-22 | [−1, 64, 128, 128] | 128
ReLU-23 | [−1, 64, 128, 128] | 0
Upsample-24 | [−1, 64, 256, 256] | 0
Conv2d-25 | [−1, 1, 256, 256] | 576
BatchNorm2d-26 | [−1, 1, 256, 256] | 2
ReLU-27 | [−1, 1, 256, 256] | 0
Total params: 1,348,834
Trainable params: 1,348,834

Appendix B. Model Summary for U-Net

Layer (Type) | Output Shape | Param #
Conv2d-1 | [−1, 64, 256, 256] | 1728
BatchNorm2d-2 | [−1, 64, 256, 256] | 128
ReLU-3 | [−1, 64, 256, 256] | 0
Conv2d-4 | [−1, 64, 256, 256] | 36,864
BatchNorm2d-5 | [−1, 64, 256, 256] | 128
ReLU-6 | [−1, 64, 256, 256] | 0
Conv2d-7 | [−1, 64, 128, 128] | 36,864
BatchNorm2d-8 | [−1, 64, 128, 128] | 128
ReLU-9 | [−1, 64, 128, 128] | 0
Conv2d-10 | [−1, 64, 128, 128] | 36,864
BatchNorm2d-11 | [−1, 64, 128, 128] | 128
ReLU-12 | [−1, 64, 128, 128] | 0
Conv2d-13 | [−1, 64, 64, 64] | 36,864
BatchNorm2d-14 | [−1, 64, 64, 64] | 128
ReLU-15 | [−1, 64, 64, 64] | 0
Conv2d-16 | [−1, 64, 64, 64] | 36,864
BatchNorm2d-17 | [−1, 64, 64, 64] | 128
ReLU-18 | [−1, 64, 64, 64] | 0
Conv2d-19 | [−1, 64, 32, 32] | 36,864
BatchNorm2d-20 | [−1, 64, 32, 32] | 128
ReLU-21 | [−1, 64, 32, 32] | 0
Conv2d-22 | [−1, 64, 32, 32] | 36,864
BatchNorm2d-23 | [−1, 64, 32, 32] | 128
ReLU-24 | [−1, 64, 32, 32] | 0
Upsample-25 | [−1, 64, 64, 64] | 0
ConvCat-26 | [−1, 128, 64, 64] | 0
Conv2d-27 | [−1, 64, 64, 64] | 73,728
BatchNorm2d-28 | [−1, 64, 64, 64] | 128
ReLU-29 | [−1, 64, 64, 64] | 0
Conv2d-30 | [−1, 64, 64, 64] | 36,864
BatchNorm2d-31 | [−1, 64, 64, 64] | 128
ReLU-32 | [−1, 64, 64, 64] | 0
Upsample-33 | [−1, 64, 128, 128] | 0
ConvCat-34 | [−1, 128, 128, 128] | 0
Conv2d-35 | [−1, 64, 128, 128] | 73,728
BatchNorm2d-36 | [−1, 64, 128, 128] | 128
ReLU-37 | [−1, 64, 128, 128] | 0
Conv2d-38 | [−1, 64, 128, 128] | 36,864
BatchNorm2d-39 | [−1, 64, 128, 128] | 128
ReLU-40 | [−1, 64, 128, 128] | 0
Upsample-41 | [−1, 64, 256, 256] | 0
ConvCat-42 | [−1, 128, 256, 256] | 0
Conv2d-43 | [−1, 64, 256, 256] | 73,728
BatchNorm2d-44 | [−1, 64, 256, 256] | 128
ReLU-45 | [−1, 64, 256, 256] | 0
Conv2d-46 | [−1, 64, 256, 256] | 36,864
BatchNorm2d-47 | [−1, 64, 256, 256] | 128
ReLU-48 | [−1, 64, 256, 256] | 0
Conv2d-49 | [−1, 1, 256, 256] | 64
Total params: 593,408
Trainable params: 593,408

References

1. Dai, Z.; Song, H.; Wang, X.; Fang, Y.; Yun, X.; Zhang, Z.; Li, H. Video-Based Vehicle Counting Framework. IEEE Access 2019, 7, 64460–64470.
2. Wang, Y.; Zhang, X. Autonomous Garbage Detection for Intelligent Urban Management. In Proceedings of the MATEC Web of Conferences, Shanghai, China, 12–14 October 2018; Volume 232, pp. 1–5.
3. Boukerche, A.; Hou, Z. Object Detection Using Deep Learning Methods in Traffic Scenarios. ACM Comput. Surv. (CSUR) 2021, 54, 1–35.
4. Chen, H.; Zhang, L.; Ma, J.; Zhang, J. Target Heat-Map Network: An End-to-End Deep Network for Target Detection in Remote Sensing Images. Neurocomputing 2019, 331, 375–387.
5. Choi, D.; Lee, W.S.; Schueller, J.K.; Ehsani, R.; Roka, F.; Diamond, J. A Performance Comparison of RGB, NIR, and Depth Images in Immature Citrus Detection Using Deep Learning Algorithms for Yield Prediction. In Proceedings of the ASABE Annual International Meeting, Spokane, WA, USA, 16–19 July 2017.
6. Khan, R.; Debnath, R. Multi Class Fruit Classification Using Efficient Object Detection and Recognition Techniques. Int. J. Image Graph. Signal Process. 2019, 11, 1–18.
7. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. arXiv 2019, arXiv:1905.05055.
8. Dzulkurnain, Z.; Mahamad, A.K.; Saon, S.; Ahmadon, M.A.; Yamaguchi, S. Internet of Things (IoT) Based Traffic Management & Routing Solution for Parking Space. Indones. J. Electr. Eng. Comput. Sci. 2019, 15, 336–345.
9. Ciampi, L.; Gennaro, C.; Carrara, F.; Falchi, F.; Vairo, C.; Amato, G. Multi-Camera Vehicle Counting Using Edge-AI. Expert Syst. Appl. 2022, 207, 117929.
10. Fachrie, M. A Simple Vehicle Counting System Using Deep Learning with YOLOv3 Model. J. RESTI (Rekayasa Sist. Dan Teknol. Inf.) 2020, 4, 462–468.
11. Di Mauro, D.; Furnari, A.; Patanè, G.; Battiato, S.; Farinella, G.M. Estimating the Occupancy Status of Parking Areas by Counting Cars and Non-Empty Stalls. J. Vis. Commun. Image Represent. 2019, 62, 234–244.
12. Pervaiz, M.; Ghadi, Y.Y.; Gochoo, M.; Jalal, A.; Kamal, S.; Kim, D.S. A Smart Surveillance System for People Counting and Tracking Using Particle Flow and Modified Som. Sustainability 2021, 13, 5367.
13. Ren, P.; Wang, L.; Fang, W.; Song, S.; Djahel, S. A Novel Squeeze YOLO-Based Real-Time People Counting Approach. Int. J. Bio-Inspired Comput. 2020, 16, 94–101.
14. Nogueira, V., Jr.; Oliveira, H.; Silva, J.A.; Vieira, T.; Oliveira, K. RetailNet: A Deep Learning Approach for People Counting and Hot Spots Detection in Retail Stores. In Proceedings of the 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil, 28–30 October 2019.
15. Sarwar, F.; Griffin, A.; Periasamy, P.; Portas, K.; Law, J. Detecting and Counting Sheep with a Convolutional Neural Network. In Proceedings of the AVSS 2018—2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance, Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
16. Li, Z.; Wang, J.; Tang, H.; Huang, C.; Yang, F.; Chen, B.; Wang, X.; Xin, X.; Ge, Y. Predicting Grassland Leaf Area Index in the Meadow Steppes of Northern China: A Comparative Study of Regression Approaches and Hybrid Geostatistical Methods. Remote Sens. 2016, 8, 632.
17. Jarvinen, T.D.; Choi, D.; Heinemann, P.; Baugher, T.A. Multiple Object Tracking-by-Detection for Fruit Counting on an Apple Tree Canopy. In Proceedings of the 2018 ASABE Annual International Meeting, Detroit, MI, USA, 31 July 2018.
18. Javare, P.; Khetan, D.; Kamerkar, C.; Gupte, Y.; Chachra, S.; Joshi, U. Using Object Detection and Data Analysis for Developing Customer Insights in a Retail Setting. SSRN Electron. J. 2020.
19. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions; Springer International Publishing: Cham, Switzerland, 2021; Volume 8, ISBN 4053702100444.
20. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China. Remote Sens. 2020, 12, 236.
21. Wang, M.; Wang, X. Automatic Adaptation of a Generic Pedestrian Detector to a Specific Traffic Scene. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3401–3408.
22. Arteta, C.; Lempitsky, V.; Noble, J.A.; Zisserman, A. Learning to Detect Cells Using Non-Overlapping Extremal Regions. In Proceedings of the MICCAI International Conference on Medical Image Computing and Computer-Assisted Intervention, Nice, France, 1–5 October 2012; Volume 15, pp. 348–356.
23. Huang, G.; Laradji, I.; Vazquez, D.; Lacoste-Julien, S.; Rodriguez, P. A Survey of Self-Supervised and Few-Shot Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 1–20.
24. Rabaud, V.; Belongie, S. Counting Crowded Moving Objects. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 1, pp. 705–711.
25. Li, Y.; Liu, H.; Zheng, X.; Han, Y.; Li, L. A Top-Bottom Clustering Algorithm Based on Crowd Trajectories for Small Group Classification. IEEE Access 2019, 7, 29679–29698.
26. Sindagi, V.A.; Patel, V.M. A Survey of Recent Advances in CNN-Based Single Image Crowd Counting and Density Estimation. Pattern Recognit. Lett. 2018, 107, 3–16.
27. Chen, K.; Gong, S.; Xiang, T.; Loy, C.C. Cumulative Attribute Space for Age and Crowd Density Estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2467–2474.
28. Ribera, J.; Chen, Y.; Boomsma, C.; Delp, E.J. Counting Plants Using Deep Learning. In Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, QC, Canada, 14–16 November 2017; pp. 1344–1348.
29. Aich, S.; Stavness, I. Leaf Counting with Deep Convolutional and Deconvolutional Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, Venice, Italy, 22–29 October 2017; pp. 2080–2089.
30. Aich, S.; Josuttes, A.; Ovsyannikov, I.; Strueby, K.; Ahmed, I.; Duddu, H.S.; Pozniak, C.; Shirtliffe, S.; Stavness, I. DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, 12–15 March 2018; pp. 323–332.
31. Noman, M.; Stankovic, V.; Tawfik, A. Object Detection Techniques: Overview and Performance Comparison. In Proceedings of the 2019 IEEE 19th International Symposium on Signal Processing and Information Technology, ISSPIT 2019, Ajman, United Arab Emirates, 10–12 December 2019.
32. Borges Oliveira, D.A.; Ribeiro Pereira, L.G.; Bresolin, T.; Pontes Ferreira, R.E.; Reboucas Dorea, J.R. A Review of Deep Learning Algorithms for Computer Vision Systems in Livestock. Livest. Sci. 2021, 253, 104700.
33. Koltes, J.E.; Cole, J.B.; Clemmens, R.; Dilger, R.N.; Kramer, L.M.; Lunney, J.K.; McCue, M.E.; McKay, S.D.; Mateescu, R.G.; Murdoch, B.M.; et al. A Vision for Development and Utilization of High-Throughput Phenotyping and Big Data Analytics in Livestock. Front. Genet. 2019, 10, 1197.
34. Barkema, H.W.; von Keyserlingk, M.A.G.; Kastelic, J.P.; Lam, T.J.G.M.; Luby, C.; Roy, J.P.; LeBlanc, S.J.; Keefe, G.P.; Kelton, D.F. Invited Review: Changes in the Dairy Industry Affecting Dairy Cattle Health and Welfare. J. Dairy Sci. 2015, 98, 7426–7445.
35. Lempitsky, V.; Zisserman, A. Learning To Count Objects in Images. In Proceedings of NIPS'10: The 23rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Volume 1, pp. 1324–1332.
36. Xie, W.; Noble, J.A.; Zisserman, A. Microscopy Cell Counting and Detection with Fully Convolutional Regression Networks. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2018, 6, 283–292.
37. Oñoro-Rubio, D.; López-Sastre, R.J. Towards Perspective-Free Object Counting with Deep Learning. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; Volume 9911, pp. 615–629. ISBN 978-3-319-46477-0.
38. Kilic, E.; Ozturk, S. An Accurate Car Counting in Aerial Images Based on Convolutional Neural Networks. J. Ambient Intell. Humaniz. Comput. 2021.
39. Haider, A.; Shaukat, F.; Mir, J. Human Detection in Aerial Thermal Imaging Using a Fully Convolutional Regression Network. Infrared Phys. Technol. 2021, 116, 103796.
40. Drid, K.; Allaoui, M.; Kherfi, M.L. Object Detector Combination for Increasing Accuracy and Detecting More Overlapping Objects. In International Conference on Image and Signal Processing; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12119, pp. 290–296.
41. Carranza-García, M.; Torres-Mateo, J.; Lara-Benítez, P.; García-Gutiérrez, J. On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data. Remote Sens. 2021, 13, 89.
42. Zhou, T.; Yu, Z.; Cao, Y.; Bai, H.; Su, Y. Study on an Infrared Multi-Target Detection Method Based on the Pseudo-Two-Stage Model. Infrared Phys. Technol. 2021, 118, 103883.
43. Patel, D. Single Shot Detector for Object Detection Using an Ensemble of Deep Learning and Statistical Modelling for Robot Learning Applications; Laurentian University: Sudbury, ON, Canada, 2021. Available online: https://zone.biblio.laurentian.ca/handle/10219/3753 (accessed on 14 November 2022).
44. Liang, M.; Hu, X. Recurrent Convolutional Neural Network for Object Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3367–3375.
45. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90.
46. Mirhaji, H.; Soleymani, M.; Asakereh, A.; Abdanan Mehdizadeh, S. Fruit Detection and Load Estimation of an Orange Orchard Using the YOLO Models through Simple Approaches in Different Imaging and Illumination Conditions. Comput. Electron. Agric. 2021, 191, 106533.
47. Koirala, A.; Walsh, K.B.; Wang, Z.; McCarthy, C. Deep Learning—Method Overview and Review of Use for Fruit Detection and Yield Estimation. Comput. Electron. Agric. 2019, 162, 219–234.
Figure 1. Flowchart for model implementation.
Figure 2. Fully Convolutional Regression Network (FCRN) architecture.
Figure 3. U-Net convolutional architecture.
Figure 4. Comparison of the two architectures on sample results. (a) Input images, (b) ground-truth dot annotations, (c) estimated density (heat) maps. FCRN: Fully Convolutional Regression Network; U-Net: U-Shaped Convolutional Network.
Table 1. FCRN and U-Net results.

Network | Batch Size | MAE | RMSE
FCRN | 1 | 23.65 | 36.69
FCRN | 8 | 37.78 | 45.76
FCRN | 16 | 158.71 | 201.46
U-Net | 1 | 16.69 | 22.48
U-Net | 8 | 24.42 | 34.64
U-Net | 16 | 144.28 | 198.34

MAE: Mean Absolute Error; RMSE: Root Mean Square Error. Lower scores mean better prediction. FCRN: Fully Convolutional Regression Network; U-Net: U-Shaped Convolutional Network.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

