Cow Rump Identification Based on Lightweight Convolutional Neural Networks

Abstract: Individual identification of dairy cows based on computer vision technology shows strong performance and practicality. Accurate identification of each dairy cow is a prerequisite for applying artificial intelligence technology in smart animal husbandry. Like the back and head, the rump of a dairy cow carries many features that are important for individual recognition. In this paper, we propose a non-contact cow rump identification method based on convolutional neural networks. First, rump image sequences of the cows were collected while they were feeding. Then, an object detection model was applied to detect the cow rump in each frame. Finally, a fine-tuned convolutional neural network model was trained to identify cow rumps. An image dataset containing 195 different cows was created to validate the proposed method. The method achieved an identification accuracy of 99.76%, which is better than other related methods and shows good potential for the actual production environment of cow husbandry. The model is also light enough to be deployed on an edge-computing device.


Introduction
Individual identification is a tool that can be used to manage the development and the diseases of dairy cows [1]. In modern precision dairy farming, the individual cow receives more attention than the herd. In addition, automatic individual cow identification is a fundamental ingredient of fields such as intelligent milking and automatic behavior and health monitoring [2,3]. In this paper, we propose a cow identification method focused on the rump, which can be applied with less labor to intelligent analysis and individualized behavior detection tasks such as lameness detection, body condition scoring, and individual localization [4][5][6]. Furthermore, cow identification based on other viewing angles, and other systems for cows, can take it as a reference.
In general, animal identification can be accomplished by numerous methods, which can be divided into mechanical, electronic, and biometric [7]. Take the ear brand, a traditional mechanical method, as an example: its surface information can easily be read by people. However, it is slow and hard to automate, and the brands tend to be stolen, removed, or duplicated, which causes some inevitable issues [8]. Electronic methods such as sensor-based systems have become widespread on farms; they include small passive RFID ear tags [9][10][11], active RFID ear tags [12], and wired, wireless, or hybrid digital device networks, radar, etc. [13]. These methods have gained popularity over the past few years, but they have some restrictions. For example, ear tags may cause stress to the cows and may be lost or damaged over time. Additionally, the reading distance is limited [14]. The development cost of the local position measurement system based on radar technology [15] is too high. To overcome these restrictions, technology based on computer vision has drawn many researchers' attention due to its low cost and non-contact nature. It is therefore necessary to apply computer vision technology to individual cow identification.
Nowadays, with the rapid growth of computer vision technology, many tasks in dairy farming have become far more automated than before. In [16], an imaging system based on deep learning was developed to detect the feeding behaviors of dairy cows. The authors of [17] proposed an improved single shot multi-box detector method to score the body condition of cows. In [18], the authors achieved lameness detection based on YOLOv3 and a relative step size characteristic vector. These related studies show that non-contact methods have improved the automation and the accuracy of precision dairy farming.
With regard to the identification of dairy cows, current research focuses on the following aspects: muzzle, face, iris, body, and gait, as shown in Figure 1a-f. The authors of [19] proposed a visual animal biometric system based on muzzle point image patterns: a Gaussian pyramid was applied to filter the noise from the muzzle images, SIFT and SURF were used as feature extraction and representation algorithms, and the matching similarity score based on the key points of the muzzle point image was used to evaluate the identification accuracy of cattle. The authors of [20] proposed a cow face representation model based on local binary pattern (LBP) texture features. After the cow face images were obtained, each image was divided into multiple regions, each region was described using a local binary pattern, and these descriptors were combined into a histogram of the facial image to realize cow identification. The authors of [21] developed a cow identification system based on iris analysis that includes iris imaging, iris detection, and identification. First, a clear iris was selected by comparing the captured iris image sequence, the image was segmented by edge detection, and the contour of the iris was fitted to an elliptical shape. Then, the iris image was normalized, and the local and global characteristics were extracted to complete the individual identification of dairy cows based on the iris images. In [22], a vision system to extract body images and identify cows was proposed. The FAST, SIFT, and FLANN methods were used for feature extraction, description, and matching, respectively. However, in a large herd, extracting only the side body image of the cow is not enough to ensure identification accuracy. The authors of [23] proposed a cow identification method based on the L component of the Lαβ color space to identify the cow's side.
In [24], the authors proposed a cow identification method based on three-dimensional video analysis using RGB-D cameras. First, the ICP algorithm was used to align the 3D point cloud data; then, the gait information was extracted based on the average gait contour; finally, the extracted texture features of the cow coat were linearly combined to realize the identification of individual cows. However, cows perform non-periodic head movements when walking, which reduces the identification accuracy. In actual applications, using the side of the cow places strict requirements on the shooting angle, and occlusion may affect the results; using the back of the cow may complicate the deployment of the experimental device; using the rump of the cow reduces the complexity of image collection. Additionally, subsequent tasks such as body condition scoring and type classification both use the characteristics of the whole or part of the rump area, so individual identification based on the rump can serve as the basis for future research. Therefore, in this paper, we implement individual cow identification based on cow rump images through a fine-tuned convolutional neural network. First, we use a camera placed behind the cow to obtain upright standing images of the cow rump; then, we use the SSD [25] model to perform real-time cow detection; finally, we fine-tune a convolutional neural network to achieve individual cow identification.
The sections of this paper are arranged as follows. Section 2 describes the data collection and the detailed process of the method we adopted. The experimental results are summarized and discussed in Section 3. Section 4 gives the final conclusion.

Image Acquisition
In this experiment, the rump image sequences of the cows were collected. The image acquisition was performed in a relatively natural environment, where other objects such as walls and iron railings may make detection and identification more difficult. The images reflect many common characteristics of the actual breeding environment of dairy cows, so they allow the identification performance to be evaluated objectively.
The experimental images were collected in the cowsheds of Shanghe Ranch and Nestle Dairy Cow Breeding Training Center in Harbin, Heilongjiang, in July 2018 and September 2018, respectively. The experimental subjects were Holstein cows. The dairy cows were housed in barns with sand beds, with a fan and a sprayer to reduce the temperature around them. Self-locking neck clips were installed at the feeding line. The layout of the barn is shown in Figure 2. The "∆" is the initial position of the camera, 3.5 m away from the feeding line. The neck clip clamps the cow's neck when the cow is eating. An Intel Realsense D435 camera was used to collect the image sequences of the cow rump along the camera movement route shown in this figure.

Experimental Data
the object detection model, then the detection results were assigned to the corresponding cow categories, forming the rump images to be identified.
After detecting the rump object, we extracted 3057 rump images of 195 cows as input images for subsequent individual identification. To simplify the experiment, the 195 cows were numbered from 0 to 194. The rump images of each cow were randomly divided into a training set and a validation set at a ratio of 7:3. In the end, we obtained 2140 images in the training set and 917 images in the validation set.
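The per-cow 7:3 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_per_cow`, the file names, the fixed seed, and the rounding rule for the cut point are all assumptions.

```python
import random


def split_per_cow(images_by_cow, train_ratio=0.7, seed=0):
    """Randomly split each cow's images into training and validation lists.

    images_by_cow maps a cow number (0..194 in the paper) to a list of
    image paths. The rounding of the cut point is an assumption.
    """
    rng = random.Random(seed)
    train, val = [], []
    for cow_id, images in sorted(images_by_cow.items()):
        shuffled = list(images)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train += [(path, cow_id) for path in shuffled[:cut]]
        val += [(path, cow_id) for path in shuffled[cut:]]
    return train, val


# Hypothetical toy data: 3 cows with 10 images each.
toy = {cow: [f"cow{cow}_{k}.png" for k in range(10)] for cow in range(3)}
train_set, val_set = split_per_cow(toy)
print(len(train_set), len(val_set))
```

Splitting within each cow, rather than over the pooled images, keeps every cow represented in both sets, which matters when some cows have only a few images.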

Individual Identification
Recently, convolutional neural networks (CNN) have had great achievements in visual recognition/classification tasks by learning the deep features of the original images [26][27][28][29].
Since the cow rump carries relatively little texture information, it is difficult to manually define discriminative features from these cow images using traditional algorithms. Therefore, in order to take advantage of the deep features of the cow rump images, this paper proposes an individual cow identification method based on convolutional neural networks. The flowchart of this identification method is shown in Figure 3. The SSD object detection model was used to extract the cow rump object in each frame as the image to be identified, and then the image dataset was used to train a light convolutional neural network model to complete the individual identification of dairy cows. The method is described in detail below.

Object Detection
In order to detect the cow in each image, we used the SSD object detection model, a deep learning framework for object detection that converts the two stages of selecting proposal regions and classification into a single-stage regression problem; its efficient backbone network outputs sparse detection results, which enables real-time detection of the trained object categories. The SSD model is pre-trained on the cow category and meets the real-time detection requirements for cows in actual production applications. Therefore, we used the SSD object detection model to detect the cow rump. We use Equation (1) to select the rump images of interest as the images to be identified.
where $COW$ denotes the set of images to be identified, $A_{det_i}$ denotes the area of the object detection result $det_i$, $A_t$ denotes an area threshold, $R_{det_i}$ represents the ratio of the width to the height of the object detection result $det_i$, $R_{min}$ and $R_{max}$ denote the minimum and maximum thresholds of the aspect ratio, respectively, $label_{det_i}$ denotes the object name of $det_i$, and $n$ denotes the number of object detection results for the entire image sequence.
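Equation (1) itself does not appear in this text. Reading it off the definitions in this paragraph and the selection rule described in the following paragraph, one form consistent with both would be the following. This is a reconstruction rather than the authors' typeset equation, and the strictness of the inequalities is an assumption.

```latex
\begin{equation}
COW = \left\{ det_i \;\middle|\; label_{det_i} = \text{cow},\;
      A_{det_i} > A_t,\;
      R_{min} < R_{det_i} < R_{max},\;
      i = 1, \dots, n \right\}
\tag{1}
\end{equation}
```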
During the experiment, we found empirically that when $A_t$ was set to 0.1 × (1280 × 720) and $R_{min}$ and $R_{max}$ were set to 0.35 and 0.7, respectively, the prominent rumps of interest in each cow image were extracted. Our experiment selected, as the images to be identified, the detection results that were labeled cow, had an area larger than $A_t$, and had an aspect ratio $R_{det_i}$ between the thresholds $R_{min}$ and $R_{max}$. The object detection time for a single image can reach 20 ms. Figure 4 shows an example of rump object detection. This figure shows that the method detected the position of the rump object accurately. Some false detections were filtered out by Equation (1). As shown in Figure 4a, the two cows on the left were mistakenly detected as one cow object due to the camera angle; such detection results can be eliminated using the aspect ratio.
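The selection rule above (label, area threshold, aspect-ratio window) can be sketched as a simple filter over detection boxes. The thresholds are the ones given in the text; the `Detection` class, its field names, the strict inequalities, and the sample boxes are hypothetical.

```python
from dataclasses import dataclass

FRAME_W, FRAME_H = 1280, 720
AREA_T = 0.1 * FRAME_W * FRAME_H  # area threshold from the text: 92160 px^2
R_MIN, R_MAX = 0.35, 0.7          # width/height thresholds from the text


@dataclass
class Detection:
    """Hypothetical container for one detector output (pixel units)."""
    label: str
    x: float
    y: float
    width: float
    height: float


def is_rump_of_interest(det):
    """Return True if the detection passes the label, area, and ratio checks."""
    if det.label != "cow" or det.height <= 0:
        return False
    area = det.width * det.height
    ratio = det.width / det.height
    return area > AREA_T and R_MIN < ratio < R_MAX


def select_rumps(detections):
    """Keep only the detections that should be passed to identification."""
    return [d for d in detections if is_rump_of_interest(d)]


detections = [
    Detection("cow", 400, 100, 300, 550),     # large, ratio ~0.55: kept
    Detection("cow", 50, 150, 600, 500),      # ratio 1.2 (two cows merged): dropped
    Detection("cow", 900, 300, 100, 200),     # area 20000, too small: dropped
    Detection("person", 700, 100, 300, 550),  # wrong label: dropped
]
kept = select_rumps(detections)
print(len(kept))
```

The second sample box illustrates the merged-cows case: a box spanning two animals is wider than it is tall, so its aspect ratio falls outside the window.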

Cow Identification Model Based on Convolutional Neural Networks
For the base network of the individual identification model, the Mobilenet v2 [30] network, which had the highest accuracy and the smallest model size in the subsequent individual identification experiments, was selected. Mobilenet v2 is a typical light convolutional neural network model that also achieves high classification performance on ImageNet. In the individual identification method proposed in this paper, the Mobilenet v2 network was fine-tuned to learn the deep features of the cow rump. The cow rump images were used as the input of the network, the weights pre-trained on ImageNet were taken as the initial parameters, and the new model was fine-tuned by updating the parameters of the last two layers of Mobilenet v2. The Mobilenet v2 model structure is shown in Table 1. Layer1 was a convolution layer, and Layer2 to Layer7 were bottleneck depth-separable convolution layers, shown in Figure 5, with the ReLU6 activation function used after each bottleneck depth-separable convolution layer. FC8 was a fully connected layer, Layer9 was an average pooling layer, and the last layer was the final fully connected layer.
In order to explore the influence of over-fitting or under-fitting on the experimental results, both the original dataset and a data-augmented control group were used in the experiments. Horizontal flipping and random cropping were applied, and the original dataset was augmented 20-fold. For model training, the images in the dataset were input into the convolutional neural network, and each input image was converted into a 224 × 224 RGB image. After multiple convolution and pooling operations, the prediction result was generated by the last fully connected layer of the network. There were 195 cows in the dataset of this experiment.
The purpose of the experiment was to identify 195 cows. Therefore, the last fully connected layer of the convolutional neural network was made up of 195 neurons, and its outputs were fed into a 195-dimensional softmax layer, which produces a distribution over the 195 category labels.
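To make the 195-way output concrete, the sketch below turns 195 raw scores from the last fully connected layer into a probability distribution and picks the most likely cow. The score values are invented for illustration; only the class count comes from the text.

```python
import math

NUM_COWS = 195  # one output neuron per cow, as stated in the text


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores: every cow scores 0 except cow number 42.
logits = [0.0] * NUM_COWS
logits[42] = 5.0

probs = softmax(logits)
predicted_cow = max(range(NUM_COWS), key=probs.__getitem__)
print(predicted_cow, round(probs[predicted_cow], 3))
```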
Mobilenet v2 uses inverted residuals and linear bottlenecks to greatly reduce the number of model parameters. First, the rump image and the corresponding cow category label were input to the network and passed to the first convolution layer, conv1. This layer has 32 convolution kernels of size 3 × 3, and the convolution was performed with a stride of 2 pixels. The original 224 × 224 × 3 image was converted into a 112 × 112 × 32 feature map, and the ReLU6 activation function was then applied to increase the nonlinearity of the neural network model. After this operation, 17 bottleneck depth-separable convolution layers were used, and the size of the obtained feature map was 7 × 7 × 320. Next came the fully connected layer with 1280 neurons, followed by an average pooling layer, and finally a 195-dimensional softmax layer. The output represents the probability that the input cow belongs to each category, which completes the preliminary prediction for the input image.
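The feature-map sizes quoted above follow from the usual convolution output-size formula. The sketch below checks the 224 to 112 step of conv1 and the overall 224 to 7 reduction. The 3 × 3 kernel, the padding of 1, and the count of five stride-2 stages are assumptions taken from the standard MobileNetV2 definition, not values stated in this text.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# conv1: 224 x 224 input, stride 2 (kernel 3 and padding 1 are assumed).
after_conv1 = conv_output_size(224, kernel=3, stride=2, padding=1)

# Standard MobileNetV2 has five stride-2 stages in total (conv1 plus four
# bottleneck stages), so the spatial size is halved five times.
sizes = [224]
for _ in range(5):
    sizes.append(conv_output_size(sizes[-1], kernel=3, stride=2, padding=1))

print(after_conv1, sizes)
```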
Next, after the error between the prediction result and the actual category was calculated, the stochastic gradient descent method was used to minimize the loss function through back propagation of the error, updating the parameters of the last two layers and completing the fine-tuning of the network. After the model converged, a set of optimal network parameters was obtained, which constitutes the trained model.
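The update step can be illustrated with a toy stand-in for the fine-tuned layers: the frozen part of the network is represented by a fixed feature vector, and only a small linear softmax classifier is updated by stochastic gradient descent. The cross-entropy loss, the single trainable layer, the feature values, and the class count of 3 are simplifying assumptions; the paper updates the last two layers of Mobilenet v2 over 195 classes.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, bias, features, label, lr):
    """One SGD update on one sample; returns the loss before the update."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    loss = -math.log(probs[label])  # cross-entropy for the true class
    for k, row in enumerate(weights):
        # Gradient of the loss with respect to logit k.
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * grad_logit * x
        bias[k] -= lr * grad_logit
    return loss


# Hypothetical frozen-backbone features for one image of cow number 2.
features = [0.5, -1.0, 2.0, 0.1]
label = 2
weights = [[0.0] * len(features) for _ in range(3)]
bias = [0.0] * 3

losses = [sgd_step(weights, bias, features, label, lr=0.001)
          for _ in range(100)]
print(losses[0] > losses[-1])
```

Because the earlier layers are held fixed, only the classifier's weights and biases change, which is what keeps fine-tuning cheap compared with training the whole network.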
Finally, in the validation stage, the cow images in the validation set were input into the model for prediction, and the final identification result was obtained.

Experimental Results and Analysis
The hardware configuration for the experiment is as follows: the operating system is Windows 10, the CPU is Intel Core i7-7800X 3.5 GHz, the memory is 64 GB, and the graphics card is NVIDIA GTX 1080Ti 11 GB. The code is implemented using Python based on TensorFlow [31].
In this experiment, the original and augmented datasets were used to fine-tune the convolutional neural network to complete the end-to-end identification of the cow rump. Accuracy was used as the evaluation indicator for individual identification, that is, the percentage of correctly predicted samples in the validation set. In terms of experimental parameters, the model was trained with a fixed learning rate of 0.001, and the batch size was set to 30. For the base network of the model, this experiment compared the results of five typical convolutional neural network models. The identification accuracy of the fine-tuned base networks on the different datasets is shown in Table 2. The experimental results show that the accuracies of the base networks are relatively close without data augmentation. As the amount of experimental data increases, the identification accuracy of every network increases, and the accuracies of the network models remain similar. However, the model trained with Mobilenet v2 is clearly smaller than the other networks; one possible reason is that Mobilenet v2 uses a small number of model parameters. In the end, the Mobilenet v2 model trained on the 20-times augmented dataset achieved the highest identification accuracy of 99.76%. In terms of inference, the inference time reaches 31.17 ms, which is competitive thanks to its network structure. Therefore, this study can achieve real-time individual identification. Although the accuracy is high, a small number of samples were still misidentified, mostly because of blurred images or lighting conditions. Some misidentified samples are shown in Figure 6, where (a) shows an image under glare and (b) shows a blurred image. To the best of our knowledge, there is almost no prior work that uses the cow rump for individual identification.
Therefore, in this paper, the cow rump identification method was compared with methods based on other related views. The comparison of identification accuracy is shown in Table 3. The table shows that the numbers of object categories in the other related methods are relatively small. The work of [21], based on iris analysis, achieved an identification accuracy of 98.33%, but iris image acquisition is difficult. Furthermore, their method was evaluated on only six cows, and the identification accuracy may be affected as the amount of experimental data increases. The authors of [23] proposed a SIFT-based method to identify the cow's side, which also achieved an identification accuracy of 98.33%. However, the traditional SIFT-based method is strongly affected by the environment, and it is computationally expensive and time-consuming, so real-time identification is difficult to realize in actual production environments. Moreover, in our previous research [32], we proposed a cow identification method based on the fusion of deep part features of the cow's side. That method achieved an identification accuracy of 98.36% on a dataset containing 93 cows, which is 0.03 percentage points higher than the work of [23]. The proposed cow rump identification method improved on the face-based and body-based work by 4.46% and 2.04%, respectively, which shows its advantage in accuracy.
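The accuracy figures compared in this section are percentages of correctly predicted validation samples. A minimal sketch of that metric is given below; the function name and the toy labels are hypothetical and do not reproduce any result from the tables.

```python
def accuracy(predicted, actual):
    """Percentage of samples whose predicted cow matches the true cow."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical toy validation set: 8 images, one misidentified.
actual_ids = [0, 1, 2, 3, 4, 5, 6, 7]
predicted_ids = [0, 1, 2, 3, 4, 5, 6, 0]
print(accuracy(predicted_ids, actual_ids))
```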

Conclusions
In this work, a non-contact cow rump identification method based on convolutional neural networks was proposed. In this method, the SSD object detection model was first applied to detect the cow rump in the rump image sequences of feeding cows, and then a light convolutional neural network model was trained to identify cow rumps. To validate the proposed method, an image dataset containing 195 different cows was created, and the relevant individual identification experiments were performed on this dataset. The proposed cow rump identification method achieved an accuracy of 99.76%, which is higher than that of other methods based on related views. Moreover, the model can detect and classify 120 images per second, so it can conduct real-time detection and identification; at the same time, it is light enough to be deployed on an edge-computing device. The experimental results also demonstrated the potential of our method for individualized behavior detection and intelligent analysis of dairy cows.