Computer Vision for Detection of Body Posture and Behavior of Red Foxes

Simple Summary: Monitoring animal behavior provides an indicator of their health and welfare. For this purpose, video surveillance is an important method to get an unbiased insight into behavior, as animals often show different behavior in the presence of humans. However, manual analysis of video data is costly and time-consuming. For this reason, we present a method for automated analysis using computer vision—a method for teaching the computer to see like a human. In this study, we use computer vision to detect red foxes and their body posture (lying, sitting, or standing). With these data we are able to monitor the animals, determine their activity, and identify their behavior.

Abstract: The behavior of animals is related to their health and welfare status. The latter plays a particular role in animal experiments, where continuous monitoring is essential for animal welfare. In this study, we focus on red foxes in an experimental setting and study their behavior. Although animal behavior is a complex concept, it can be described as a combination of body posture and activity. To measure body posture and activity, video monitoring can be used as a non-invasive and cost-efficient tool. While it is possible to analyze the video data resulting from the experiment manually, this method is time-consuming and costly. We therefore use computer vision to detect and track the animals over several days. The detector is based on a neural network architecture. It is trained to detect red foxes and their body postures, i.e., 'lying', 'sitting', and 'standing'. The trained algorithm has a mean average precision of 99.91%. The combination of activity and posture results in nearly continuous monitoring of animal behavior. Furthermore, the detector is suitable for real-time evaluation. In conclusion, evaluating the behavior of foxes in an experimental setting using computer vision is a powerful tool for cost-efficient real-time monitoring.


Introduction
Animal welfare is becoming increasingly important in animal experimentation and husbandry, and is often defined by the Five Freedoms concept [1], the Five Domains concept, or the complementary use of both [2,3]. Thus, due to its multidimensional character, it is influenced by many factors [4]. Different approaches exist for its assessment based on animal movements; however, the automated deduction of animal behavior patterns by a combined evaluation of the detection of body postures and the determination of activity levels has been missing so far. For this purpose, we demonstrate the application of deep learning for the detection, tracking, activity, and behavior determination of red foxes (Vulpes vulpes) during an experimental study, which was conducted to measure the long-term immunogenicity and efficacy of an oral rabies vaccine in these animals [34]. To this end, we used video surveillance data of the foxes generated as part of this experimental study. In particular, a convolutional neural network (CNN) (YOLOv4) is trained for red fox body posture detection, i.e., 'standing', 'lying', and 'sitting'. The results of this detection can be used to infer different activity levels [33], which can then be used in combined evaluation with the detected posture to determine different behaviors. The presented technique can be applied to detect posture patterns of animals, including behavior determination.

Experimental Setup
The animals considered here are red foxes (Vulpes vulpes) of the fur color variant 'silver fox' [35]. The experimental study with 23 foxes was conducted over 450 days at the Friedrich-Loeffler-Institut (FLI), Greifswald, Insel Riems, Germany [34]. During that time the foxes were kept separately in cages, sized 3.18 m × 1.4 m × 1.75 m (length × width × height), each equipped with a platform of 0.92 m × 1.4 m (length × width) at a height of 0.8 m above the bottom of the cage. In addition, each cage was equipped with a hut. In the study, a novel oral vaccination regime against rabies was tested. This required animal blood to be sampled at regular intervals. To this end, the foxes were anesthetized [34]. All invasive procedures, i.e., blood sampling, infection, transponder application, and euthanasia, were conducted under anesthesia by applying 0.5-1 mL of Zoletil (Virbac, France; 1 mL contains 50 mg tiletamine and 50 mg zolazepam). Anesthesia takes effect after 7 to 10 min, at which point the manipulations can begin. Zoletil does not require an antidote, because it is a short-acting anesthetic. The increased activity after anesthesia may reflect the recovery phase. It is known that in canids, the half-life of tiletamine is 1.2 h and that of zolazepam is 1 h. During the recovery of canids from anesthesia with Zoletil®, tiletamine therefore acts longer than zolazepam, and excitation states and increased movement can occur [36]. Housing and maintenance of the animals complied with national and European legislation and guidelines for the veterinary care of laboratory animals [37]. Food and water were provided according to the species-specific requirements and were also individually adjusted. The foxes received enrichment (such as a ball or Kong toy) at irregular intervals to improve animal welfare, which was not considered in our study. The animals were visually inspected daily and the cages were cleaned on a regular basis.
The availability of external monitoring, including recording by video cameras, was one of the requirements for approval. Therefore, every fox was monitored via two cameras (ABUS IR HD TVIP61500, ABUS, Wetter, Germany). Each cage was equipped with two cameras hung on the opposite narrow sides of the cage. Together, 33 TB of video data was recorded discontinuously (due to memory requirements), on 73 different days. We have used image data that was extracted from video data of all foxes for training the algorithm. For the exemplary application shown below, we restrict ourselves to analyzing video data of 1 single red fox on 6 different days, spanning a period of 11 days. Table 1 shows the evaluated video data and the times of events such as anesthesia or cage cleaning.
On day 3, anesthesia is administered at 09:56. The red fox shows normal behavior until this time, then lies for about 1 h, followed by a wake-up phase (approximately 11:00 to 11:30) in which the red fox repeatedly crawls a bit, then repeatedly starts to stand up, walks a few steps, and sits down or lies down again. Then follows a phase in which the red fox walks around the cage very excitedly; this lasts until about 14:15. From then on, the sitting and lying times between walking become longer and longer, until the red fox lies down at 16:01 and sleeps until 17:08.

Ethical Approval
The animal experiment was authorized by the local authority in Mecklenburg-Western Pomerania (Landesamt für Landwirtschaft, Lebensmittelsicherheit und Fischerei Mecklenburg-Vorpommern, # FLI-7221.3-1-087/16) and conducted in accordance with national and European legislation and guidelines for the veterinary care of laboratory animals [37].

Image and Video Data
The resolution of the video data was 1280 pixels (horizontal) × 720 pixels (vertical), with a frame rate of 15 frames per second (fps). An image set was created by extracting single frames from the video data. For the image set, videos of all 23 red foxes with different body postures were used. No adjacent frames were extracted, and frames with different illumination conditions (night and day) were used. The image set consisted of 8913 images. The red fox on each image of the image set was manually labeled using the software LabelImg [38] and attributed to one of the three classes 'sitting', 'standing', and 'lying'. Lying was defined as lying prone, lying on the side, lying curled up, or lying on the back. Standing was defined as a quadruped position. Sitting was regarded as an intermediate posture, i.e., the two hind legs not upright and the two forelegs upright on the floor.
The image set was split into a training set (80%—7129 frames) and a test set (20%—1784 frames) (Table 2), maintaining the relation of the three labeled classes. The training set was used to train the YOLOv4 object detection algorithm, and the test set to evaluate the trained model for red fox posture detection. Table 2. Splitting of the image set (8913 frames) into a training and a test set in the ratio 80% to 20%. Also shown is the number of frames per subset for each of the behavioral postures 'lying', 'sitting', and 'standing'.
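The stratified 80/20 split described above—keeping the class ratio of 'lying', 'sitting', and 'standing' equal in both subsets—can be sketched in Python. Function and variable names are illustrative, not taken from the study's code:

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=42):
    """Split a list of (image_name, class) pairs into a training and a
    test set while preserving the per-class ratio."""
    by_class = defaultdict(list)
    for name, cls in labels:
        by_class[cls].append(name)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for cls, names in by_class.items():
        rng.shuffle(names)
        cut = int(len(names) * train_frac)  # 80% of each class to training
        train.extend(names[:cut])
        test.extend(names[cut:])
    return train, test
```

Because the split is applied per class, each posture contributes the same 80/20 proportion to the two subsets, as in Table 2.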


Environment Configuration
The processor used in the study was an Intel Xeon E5-2667 v4 with 3.20 GHz and 377 GB RAM, together with an NVIDIA K80 (2 GPUs, 24 GB video RAM). The operating system was CentOS 7. The algorithm was developed using a Jupyter notebook [39] and Python 3.6.8 [40].

Automatic Evaluation: Red Fox Detection and Posture Classification
The detection of red fox postures was implemented using the deep learning algorithm YOLOv4. YOLO is a one-stage object detection algorithm for real-time object detection based on CNNs [30,41]. In this study, we used version 4 of YOLO, which consists of a 'head', a 'neck', and a 'backbone' [42]. The head implements the object detection [42] and is the YOLOv3 algorithm [41]. The neck collects feature maps from different stages and is based on a path aggregation network (PAN) and spatial pyramid pooling (SPP) [42]. The backbone is used for training and feature extraction; it is CSPDarknet53, implemented in the open-source Darknet neural network framework [41,42]. YOLOv4 is a state-of-the-art detector, which is faster and more accurate than other available detectors [42]. The training of the red fox posture detector based on YOLOv4 was performed with the parameters from Table 3 and the training set (Table 2).
In this study, we have limited the detectable classes for postures exclusively to three different postures, i.e., 'sitting', 'lying', and 'standing'. A detailed description of the training procedure is given in the Appendix A.
Intersection over Union (IoU) was used to determine whether a detection was a true positive or a false positive (see Equation (1)). A detection with IoU ≥ 0.5 was counted as true positive, and one with IoU < 0.5 as false positive. If an image was labeled but the model did not detect anything, it was counted as false negative. The following values were computed:

IoU = area(BB_p ∩ BB_gt) / area(BB_p ∪ BB_gt), (1)
precision = TP / (TP + FP), (2)
recall = TP / (TP + FN), (3)
mAP = (1/C) Σ_{c=1}^{C} AP_c, (4)

where IoU: Intersection over Union; BB_p: predicted bounding box (BB) from the model; BB_gt: ground-truth bounding box (i.e., manually labeled); AP: average precision; C: number of classes; TP: number of true positives; FP: number of false positives; FN: number of false negatives. The AP is a measure of the detection accuracy of the model (for more details see [33,43]). Here we used the 11-point interpolated AP [43]:

AP = (1/11) Σ_{r ∈ {0, 0.1, …, 1}} p_interp(r), (5)

with

p_interp(r) = max_{r̃ ≥ r} p(r̃), (6)

where p(r̃) is the precision at recall r̃. Equation (6) results in a smoothing of the precision–recall curve.
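The IoU and 11-point interpolated AP can be sketched as follows. This is a minimal illustration, not the study's evaluation code; boxes are assumed to be given as corner coordinates (x_min, y_min, x_max, y_max):

```python
def iou(bb_p, bb_gt):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(bb_p[0], bb_gt[0]), max(bb_p[1], bb_gt[1])
    ix2, iy2 = min(bb_p[2], bb_gt[2]), min(bb_p[3], bb_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (bb_p[2] - bb_p[0]) * (bb_p[3] - bb_p[1])
    area_gt = (bb_gt[2] - bb_gt[0]) * (bb_gt[3] - bb_gt[1])
    return inter / (area_p + area_gt - inter)  # union = sum - intersection

def ap_11_point(pr_pairs):
    """11-point interpolated AP from a list of (recall, precision) pairs.

    For each recall level r in {0, 0.1, ..., 1}, take the maximum precision
    at any recall >= r (Equation (6)), then average over the 11 levels."""
    total = 0.0
    for r in [i / 10 for i in range(11)]:
        p_at = [p for rec, p in pr_pairs if rec >= r]
        total += max(p_at) if p_at else 0.0
    return total / 11
```

A detection would then be counted as true positive when `iou(...) >= 0.5`, matching the threshold stated above.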
For each image f, the trained object detection algorithm returns whether a red fox is in the image and, if so, the class ('lying', 'sitting', 'standing'), the confidence of the detection, and the bounding box: center position (x_f, y_f), width, and height, each standardized between 0 and 1.

Automatic Evaluation: Activity Analysis
The bounding boxes were used to measure the activity level of the red fox [33]. To this end, the movement of the center of the bounding box between two consecutive frames was determined. This movement of the center of the BB corresponds to the distance covered by the red fox between two consecutive frames. As the activity measure, the mean of the norms of these movement vectors was computed over a time period t containing F frames. The maximum of the mean vector norm for different periods and kinds of movement behavior can be used to determine thresholds for different activity levels. Three activity levels were considered: (i) Highly active: considerable movement of the bounding box (BB), i.e., the localization of the red fox changes, e.g., walking or running; (ii) Active: slight movement of the BB, i.e., the localization of the red fox does not change, but there is some movement inside the BB, e.g., rotation or minimal movements such as scratching or stretching; (iii) Inactive: no movement of the BB, i.e., the red fox does not move, e.g., lying, sitting, or standing still.
For all three postures (lying, sitting, standing), highly active, active, and inactive variants are possible.
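The thresholding of the mean vector norm into the three activity levels can be sketched as follows. The bounding-box centers are assumed to be normalized to 0..1, and the threshold values are illustrative placeholders, not the calibrated values from the study:

```python
import math

def activity_level(centers, thresh_high=0.02, thresh_active=0.002):
    """Classify activity from bounding-box centers of consecutive frames.

    centers: list of (x, y) tuples, coordinates normalized to 0..1.
    Thresholds on the mean per-frame displacement are placeholders."""
    if len(centers) < 2:
        return "inactive"
    # norm of the movement vector between each pair of consecutive frames
    norms = [math.hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(centers, centers[1:])]
    mean_norm = sum(norms) / len(norms)
    if mean_norm >= thresh_high:
        return "highly active"
    if mean_norm >= thresh_active:
        return "active"
    return "inactive"
```

In the study, thresholds of this kind were derived from the maximum mean vector norm observed for known kinds of movement behavior.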

Automatic Evaluation: Behavior Analysis
For behavioral analysis the activity levels were considered along with body posture to draw conclusions about the behavior of the respective red fox. Therefore, a behavior was assigned to each possible combination of a body posture and activity level (see below).

Workflow for Automated Video Evaluation
For the evaluation of the videos, we used the trained red fox detector as described above. The video analysis was implemented as follows:
1. Red fox posture detection on each frame;
2. Activity analysis using the BB values for the activity level determination;
3. Behavior analysis using the posture and activity level for the behavior classification.
The evaluation workflow is depicted in Figure 1. For the joint evaluation of both cameras of the same animal, each video was evaluated separately for the same period. It is possible that the evaluations of the two cameras differ; e.g., if the fox is not completely visible to camera 1 because the legs are in the blind spot, the fox is classified as lying for camera 1, while camera 2 (seeing the complete fox) classifies it as standing. In case of mismatches between the cameras, the larger vector norm and the body posture with the higher confidence were chosen. To avoid single false classifications of the body posture, a sliding window of 5 s duration was used to select the most frequently occurring body posture.
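The mismatch resolution between the two cameras and the 5 s sliding-window smoothing can be sketched as follows. The function names are hypothetical, and detections are assumed to be given as (posture, confidence, vector norm) tuples:

```python
from collections import Counter

def fuse_cameras(det1, det2):
    """Resolve a mismatch between the two cameras: keep the posture with
    the higher confidence and the larger vector norm, per the rule above."""
    posture = det1[0] if det1[1] >= det2[1] else det2[0]
    norm = max(det1[2], det2[2])
    return posture, norm

def smooth_postures(postures, window=5):
    """Majority vote over a sliding window (5 s at one value per second)
    to suppress single misclassifications."""
    half = window // 2
    out = []
    for i in range(len(postures)):
        lo, hi = max(0, i - half), min(len(postures), i + half + 1)
        out.append(Counter(postures[lo:hi]).most_common(1)[0][0])
    return out
```

With this smoothing, an isolated one-second 'standing' inside a run of 'lying' values is overruled by the surrounding majority.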

Model Training and Evaluation
The performance of the model was evaluated by comparing the labels of the manually labeled images (test set) with the results of the automated detection of the trained model. The results for the three classes are shown in Table 4, and the overall performance is shown in Table 5. The precision and recall of the model are 98.61% and 95.12%, respectively; the average IoU is 0.91, the mAP is 99.91%, and the detection speed reaches 73.31 ms per frame. Figure 2 shows examples of red fox detection for each of the three postures 'sitting' (Figure 2a-d), 'standing' (Figure 2e-h), and 'lying' (Figure 2i-l) for day and night scenes.

For example, day 1 represents the normal activity of the red fox: more active and inactive phases alternate throughout the day. The period during which the animal caretakers entered the room and cleaned the cages was associated with increased red fox activity. This can be seen particularly well on day 1, from 11:30 to 12:00, where the largest amount of 'highly active' coincides with the presence of an animal caretaker in the room (11:30 to 11:55). The anesthetic phase, i.e., anesthesia and the subsequent recovery phase, on day 3 is reflected in the active phase on that day. Activity increased suddenly with the beginning of the wake-up phase at 11:00. From 11:00 to 14:00, the activity level was almost exclusively 'highly active', corresponding to the fox walking around in the cage after the wake-up phase. From 14:00 to 16:00, the proportion of 'highly active' decreased and the red fox showed the activity level 'inactive' for longer stretches. This was followed by a complete hour (16:00 to 17:00) at the activity level 'inactive', which coincided with the sleeping phase from 16:01 to 17:09.

Posture Detection
The trained model was used to classify the postures 'sitting', 'lying', and 'standing'. Figure 4 shows the posture overviews for all recorded days during the observation period of 11 days. In all plots, periods when an animal caretaker was in the room were clearly recognizable, e.g., from 11:30 to 11:55 on day 1 (Figure 4a), and also on day 7 (Figure 4e); the red fox was only standing or sitting, but hardly lying. The anesthesia on day 3 with the subsequent wake-up phase (described in Section 3.1.1) is also reflected in the classified postures. On day 11, there was a phase, lasting from about 10:00 to 12:00, in which the red fox almost exclusively showed the postures 'sitting' and 'standing'. The classified postures agree with the manually analyzed video data for this period, i.e., the red fox mostly sat or stood from 10:00 onwards and lay down at 12:20. Another remarkable half hour was on day 5 from 21:00 to 21:30. Here, the body posture detection shows that the red fox only used the body postures 'sitting' and 'standing'. The classified postures are consistent with the video.
The classified postures were used to determine the number of body posture changes. The numbers of changes for day 1, summed up per half hour, are shown in Figure 5. Three arbitrarily selected half-hour videos were manually analyzed and compared with the determined number of changes. For this purpose, one video with many changes (11:00 to 11:30, 98 changes), one with few changes (07:30 to 08:00, 5 changes), and one with no changes (17:00 to 17:30, zero changes) were selected from all the videos of day 1. In all three videos, the automatically detected numbers of changes matched the numbers seen during manual inspection of the video and, in addition, the time points of the changes matched. Figure 6 shows a timeline for each of the periods with posture changes. The five changes during the period from 07:30 to 08:00 are illustrated in Figure 6a, where each vertical line represents a posture change. A second timeline (Figure 6b) depicts all 98 posture changes recorded between 11:00 and 11:30. Most of the changes occurred between the postures 'sitting' and 'standing'.
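Counting posture changes from the per-second posture sequence reduces to counting transitions between consecutive values; a minimal sketch, with an illustrative function name:

```python
def count_posture_changes(postures):
    """Count transitions in a posture sequence and return the total
    together with the (from, to) pair of each change."""
    changes = [(a, b) for a, b in zip(postures, postures[1:]) if a != b]
    return len(changes), changes
```

Summing the returned counts per half-hour bin yields an overview like Figure 5, and the list of (from, to) pairs shows between which postures the changes occurred.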

Behavior Detection
To determine the behavior, all combinations of posture ('lying', 'sitting', and 'standing') and activity level ('inactive', 'active', and 'highly active') were considered and a behavior assigned to each combination, i.e., 'highly active standing', 'active standing', 'standing still', 'active lying', 'lying motionless', 'active sitting', and 'sitting still'. This is shown as a decision tree in Figure 7. With the decision tree (Figure 7) it is possible to generate a continuous behavior overview with a resolution of one value per second (see Section 2.7). For example, if a red fox is classified as 'standing' in an image, the combined view with the determined activity level provides the behavior of the fox. If the activity level is 'highly active', the behavior 'highly active standing' can be inferred. If the corresponding activity level is 'active', the fox shows 'active standing'. The last possible combination is the activity level 'inactive', which means that the behavior of the fox is 'standing still'.
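The decision tree of Figure 7 can be sketched as a lookup table. Note that the text names seven behaviors for nine combinations; mapping the 'highly active' sitting and lying combinations to 'active sitting' and 'active lying' is our assumption, not stated in the source:

```python
# Assumed mapping of (posture, activity level) -> behavior, mirroring the
# decision tree in Figure 7. The collapsing of 'highly active' sitting/lying
# into the 'active' behaviors is an assumption.
BEHAVIOR = {
    ("standing", "highly active"): "highly active standing",
    ("standing", "active"):        "active standing",
    ("standing", "inactive"):      "standing still",
    ("sitting",  "highly active"): "active sitting",
    ("sitting",  "active"):        "active sitting",
    ("sitting",  "inactive"):      "sitting still",
    ("lying",    "highly active"): "active lying",
    ("lying",    "active"):        "active lying",
    ("lying",    "inactive"):      "lying motionless",
}

def behavior(posture, activity):
    """Return the behavior for one second of posture + activity level."""
    return BEHAVIOR[(posture, activity)]
```

Applying this lookup once per second yields the continuous behavior overview described above.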
The determined behavior of the red fox is exemplarily represented by two timelines over half an hour each in Figure 8. The first half hour (Figure 8a) shows the behavior of a sleeping fox. It can be seen that the animal was 'lying motionless' for almost the entire observation period. Only at two short moments was there movement during sleep. The behavior coincides with the real behavior as determined by visual inspection of the video data, where it became evident that the fox slept curled up and changed its lying position only twice. In the second timeline (Figure 8b), one can see that the red fox showed all the defined behaviors. Here, visual inspection of the video showed that the red fox changed its location very often, and changed frequently between the body postures 'sitting', 'standing', and 'lying'.

Discussion
We were able to train a classifier with very high precision for posture detection, and we can determine animal behavior in detail at a high temporal resolution. Our results suggest that the presented method may be useful for monitoring animals, in particular their posture, behavior, activity, and, additionally, posture changes. In our setting, the results show that the model achieves high performance for posture detection. Moreover, the detection speed is sufficient for real-time detection at the 5 fps used in this study. Furthermore, the animals do not need to be equipped with a sensor or collar, as was the case in other studies (e.g., [20,44]). This non-invasive approach is a major advantage of computer vision [45].
Activity determination with the new posture detection method works in the same way as in our previous study on red fox detection [33]. The activity overview can also be used to detect events such as anesthesia or the presence of an animal caretaker in the room. In particular, the increased activity in the presence of humans illustrates their influence on animal behavior and the advantage of video observation for obtaining an, at least in this respect, unbiased insight. Changes in movement patterns can provide information relevant to animal welfare and health [44,46]; especially over long periods, these data can be obtained automatically by the proposed approach.
In this study, three major postures of red foxes were investigated: 'lying', 'sitting', and 'standing'. The posture 'lying' comprises all lying postures shown by the red fox, including 'lying curled up', 'lying on the side', 'lying on the back', and 'lying prone'. The red fox also shows other body postures, such as 'standing on the hind legs' or 'standing with the front end lowered'. For these rare postures, it is difficult to create a large enough image set for training; e.g., only 25 images could be labeled with the posture 'standing front end lowered' from the total extracted image set. The classified postures are also suitable for determining periods with special events. Furthermore, in contrast to pure activity monitoring, periods with unremarkable activity but a noticeable distribution of postures were detected. For example, in the period from 09:00 to 09:30 on day 4, the red fox shows almost only 'sitting' and 'standing' postures (Figure 4c), but no particularly striking values regarding activity levels. Thus, the duration of the different postures could be used as an indicator of animal behavior. As an example, a reduction of lying time is a typical behavioral change before calving in dairy cows [47]. Similarly, the number of posture changes is an indicator of behavioral changes: in dairy cows, an increase in posture changes is another typical behavioral change before calving [48].
Analysis of the behavior as a result of the combined evaluation of activity and posture yields a detailed overview of the behaviors under consideration, i.e., 'highly active standing', 'active standing', 'standing still', 'active lying', 'lying motionless', 'active sitting', and 'sitting still'. The use of the developed decision tree provided good insights into detailed behavior. Even though we did not provide a detailed paired analysis here, given the high accuracy of the used algorithm (as discussed in Section 3.1), the method is capable of reproducing the manual results. However, different kinds of behavior that show the same combination of activity level and posture cannot be distinguished. For example, the behaviors 'lying motionless-sleeping' and 'lying motionless-awake' have the same combination and the decision tree provides 'lying motionless'. However, sleeping and resting phases could be used to refine these categories, which may be relevant as indicators of animal welfare [6]. There is a further limitation in assessing welfare based on behavior, e.g., 'highly active standing' could be due to the fox's interaction with enrichment (indicating good welfare) and also pacing (indicating poor welfare). Thus, the number of determinable behaviors is a limitation [45]. This limitation can be minimized, for example, by training a model with more postures.
Our proposed method is based on single snapshots, as opposed to estimating behavior from long time windows, and shows high precision and accuracy here (correct detection of behavior compared with manual observation). In order to estimate the accuracy of long-term evaluations, a long-term study applying this method to a large amount of video data might be part of future research.
We showed that computer vision systems are useful to generate activity, posture, and behavior overviews and the number of posture changes.

Conclusions
The aim of this study was to investigate the potential of computer vision for the detection of red foxes and the classification of their body postures to use the results for activity, posture, and behavior analysis. On the basis of the YOLOv4 algorithm, a model for the detection of red foxes was realized that classifies the body postures 'lying', 'sitting', and 'standing'. Along with the subsequent analysis of the posture changes, activity levels, and behavior of the red foxes, this study provides a method for seamless monitoring.
The generated daily, weekly, or monthly overviews (activity, posture, behavior) can be used to monitor animal activities, posture, and behavior, and may thus help to establish indicators for animal welfare.

Institutional Review Board Statement:
The data used in this paper is a subset of data generated in an experimental study with red foxes conducted over 450 days [34]. Animal housing and maintenance were in accordance with national and European legislation and followed the guidelines for the veterinary care of laboratory animals. The experimental study was approved by the local authority in Mecklenburg-Western Pomerania (Landesamt für Landwirtschaft, Lebensmittelsicherheit und Fischerei Mecklenburg-Vorpommern, # FLI-7221.3-1-087/16) and was conducted at the Friedrich-Loeffler-Institut (FLI), Greifswald, Insel Riems, Germany.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data sets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Each line of a label file has the format <object-class> <BB x_center> <BB y_center> <BB width> <BB height>, with <object-class> an integer between 0 and number of classes − 1, and <BB x_center>, <BB y_center>, <BB width>, and <BB height> float values in (0, 1], relative to the image width and height. Thus, the directory with the images contains a .txt file with the same name for each image. Create the files train.txt and test.txt: split the image set into a training and a test set and save the file names of the images, with the full path relative to the darknet directory, in the respective file (one file name per line).

5. Create a file obj.data containing the number of classes and the paths to train.txt, obj.names, and the backup folder:

   classes = 3
   train = data/train.txt
   names = data/obj.names
   backup = backup/

6. To start the training, run:

   ./darknet detector train obj.data yolo-obj.cfg yolov4.conv.137

   The training can take several hours. During training, the trained weights are saved in the backup/ directory: yolo-obj_xxxx.weights every 1000 iterations and yolo-obj_last.weights every 100 iterations. After training, the final weights file yolo-obj_final.weights is also stored there.

7.
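The label-file format described above (one line per object, all coordinates relative to the image size) can be sketched as a small converter. The function name and the pixel-corner input convention are assumptions for illustration:

```python
def yolo_label_line(cls_id, box, img_w, img_h):
    """Convert a pixel bounding box (x_min, y_min, x_max, y_max) into one
    line of a YOLO .txt label file:
    '<object-class> <x_center> <y_center> <width> <height>',
    with all coordinates relative to the image width and height."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w   # relative center x
    yc = (y_min + y_max) / 2 / img_h   # relative center y
    w = (x_max - x_min) / img_w        # relative width
    h = (y_max - y_min) / img_h        # relative height
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 1280 × 720 frame (the resolution used in this study), a box covering the upper-left quadrant would be written as class id followed by 0.25 0.25 0.5 0.5.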