Combination of Sensor Data and Health Monitoring for Early Detection of Subclinical Ketosis in Dairy Cows

Subclinical ketosis is a metabolic disease in early lactation. It contributes to economic losses because of reduced milk yield and may promote the development of secondary diseases. Thus, early detection is desirable, as it enables the farmer to initiate countermeasures. To support early detection, we examine different types of data recordings and use them to build a flexible algorithm that predicts the occurrence of subclinical ketosis. This approach shows promising results and can be seen as a step toward automatic health monitoring in farm animals.


Introduction
Subclinical ketosis (SCK) is a common metabolic disease of dairy cows in early lactation, characterised by an increased concentration of ketone bodies in the absence of clinical signs of disease [1]. Analysing the concentration of β-hydroxybutyrate (BHB) in blood is the recommended reference test for detecting ketosis in dairy cows [2]. A commonly used threshold to define SCK is a blood BHB concentration >1.2 mmol/L [3,4]. To detect SCK in dairy cows, various hand-held devices are commercially available, which were recently evaluated for use on farms [5,6]. The occurrence of SCK in dairy cows is associated with an increased risk of sequelae (e.g., clinical ketosis, displaced abomasum, metritis), decreased milk yield and impaired reproductive performance [3,7,8], affecting the economics of a dairy farm [9]. Major risk factors for the occurrence of ketosis are an excessive body condition score (BCS) before calving, an increased colostrum volume at first milking and an advanced parity [10]. Recent studies showed that subclinical and clinical diseases are associated with distinct animal behaviours, e.g., rumination as well as standing and lying times [11,12]. Nowadays, more and more farmers rely on sophisticated sensor technologies for continuous, automated real-time monitoring of animal behaviours and health status [13,14]. The aim of this study was to predict the ketosis status of dairy cows within the first two weeks of lactation, based on 12 input variables, derived inter alia from the accelerometer system SMARTBOW (Smartbow GmbH, Weibern, Austria). The prediction is made using a flexible classification algorithm that combines time-series-based acceleration data with other inputs and is specifically designed to cope with possibly different availability of data. In Sections 2.1-2.3, we discuss the different types of recorded data and how they were assessed.

Accelerometer
In this study, we used the accelerometer system SMARTBOW (Smartbow GmbH, Weibern, Austria), which was recently evaluated for monitoring rumination [17,18] and detecting estrus events [19]. The system includes ear-tags (size 52 × 36 × 17 mm, weight 34 g) equipped with a three-dimensional accelerometer, receivers (SMARTBOW WallPoint) installed in the barn, and a local server (SMARTBOW Farmserver). Recorded data were sent wirelessly and in real time from the ear-tags to the receivers and transmitted to the local server, where they were processed by specific algorithms. In this study, 10 Hz sensors were used to measure the acceleration of head and ear movements of the animal in three axes with a range from −2 to +2 g. All cows were equipped with the sensor-based ear-tags approximately three weeks before the expected day of calving. The raw recordings were transformed into the seven data-streams that we were provided with. We inspect two time frames: the week before the day of calving and the week after the day of calving. These seven data-streams represent the minutes per hour spent:
•
lying/not lying,
•
ruminating/not ruminating,
•
inactive/active/highly active.
The time spent in the states of each category above adds up to 60 min; thus, we converted the data into five-dimensional data-sets for every individual, namely minutes per hour spent lying, ruminating, inactive, active, and highly active.
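As a minimal sketch of this conversion (function and variable names are our own illustration, not the system's API), one hourly record can be collapsed into the five-dimensional representation as follows:

```python
# Sketch: collapse the three behavioural categories (lying/not lying,
# ruminating/not ruminating, inactive/active/highly active) into a
# five-dimensional minutes-per-hour vector. Since the states within
# each category add up to 60 min, one value per binary category and
# all three activity states suffice.

def to_five_dim(lying, ruminating, inactive, active, highly_active):
    """Each argument: minutes per hour spent in that state (0..60)."""
    assert inactive + active + highly_active == 60  # category must sum to 60
    return [lying, ruminating, inactive, active, highly_active]

# One hourly record: 40 min lying, 25 min ruminating,
# 30 min inactive, 20 min active, 10 min highly active.
vec = to_five_dim(40, 25, 30, 20, 10)
```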
An example of these five data-streams for both the pre-partal and post-partal time frame can be seen in Figure 1 below. Moreover, we present the average time spent in the respective behaviours for sick and healthy cows after calving in Figure 2. For a thorough examination, a decision on how to handle missing data, and some results based on the presented sensor data, we refer to [20]. A summary of the data can be found in Table 1.

Health Data
The second type of data we consider is health data, either directly recorded on the farm site or based on previous calvings. Measurements that are either ordinal or nominal are transformed into metric features. In the following list, the assessed features are described in detail:

•
Body Condition Score (BCS): A total of three measurements were made: 8 weeks before calving (−8 w), 3 weeks before calving (−3 w) and on the day of calving (D0). The distribution of these three features is visualised in Figure 3.

•
Back Fat Thickness (BFT): As described above, three measurements were made. The distribution of the BFT is visualised in Figure 4.
•
Non-Esterified Fatty Acids (NEFA, Feature 7): We used the maximal NEFA value of all measurements as described in Section 2.1. This feature is visualised in Figure 5.
•
305-day milk-yield equivalent (Feature 8): A measure that standardises the milk yield of the previous lactation. Its impact on different diseases can be found in [21].
•
Maximum fat/protein ratio (Feature 9): The maximum observed fat/protein ratio during the previous lactation.
•
Parity (Feature 10): We distinguished between primi- and multiparous cows and transformed these categories as follows: primiparous → −1, multiparous → 1.
Features 8, 9 and 10 are depicted in Figure 6.

•
Location features: The following features are based on the locations the animals spent their time in during the last two weeks before calving. We distinguished between three functional areas, namely cubicles (FA 1), feed alley (FA 2), and passageways (FA 3). These nine features are depicted in Figure 7.
•
Heat stress (Feature 20): This feature describes the number of hours in the last week before calving during which the animal was exposed to a temperature-humidity index (THI) of 72 or higher; a THI ≥ 72 is defined by the Austrian Chamber of Agriculture as "moderate heat-stress", based on [22,23].
The distribution of this feature is shown in Figure 8 below. The histograms above already give a first overview of the extent to which the respective features differ between healthy and diseased individuals. A statistical comparison of all 20 features is discussed in Section 3.1.
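Two of the transformations above, the parity encoding and the heat-stress feature, can be sketched as follows (record layout and names are our own illustration, not the study's code):

```python
# Sketch of two feature transformations from the list above.

def encode_parity(parity):
    """Nominal -> metric, as described: primiparous -> -1, multiparous -> +1."""
    return -1 if parity == "primiparous" else 1

def heat_stress_hours(thi_per_hour, threshold=72):
    """Feature 20: hours in the last week before calving with THI >= 72
    ("moderate heat-stress"). thi_per_hour: one THI value per hour."""
    return sum(1 for thi in thi_per_hour if thi >= threshold)

hours = heat_stress_hours([70, 73, 75, 71, 80])   # 3 hours at or above 72
parity_feat = encode_parity("primiparous")        # -1
```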

Mathematical Section
This section covers the mathematical and algorithmic aspects for classifying the animals' health status, i.e., we define the central elements that constitute our classification algorithm. As we are provided with two different types of input, namely time series and features, we utilise methods for both types.
Time Series Classification (TSC) is a non-trivial task that is the subject of numerous publications. The state of the art in TSC are ensemble methods based on transformations of the original series and flexible distance measures. Simple approaches, such as a Nearest Neighbour algorithm with a suitable similarity measure, still yield competitive results [24,25]. Deep learning approaches have shown promise but are still outperformed by distance-based methods [26]. As simple methods are desirable both for their simplicity and their competitive performance, we designed a flexible measure to quantify the similarity between two time series, which builds the basis for the first step in classification. Letting a, b ∈ R^n, f : R × R → R and g : {1, . . . , n}^2 → R with n ∈ N, we define the function D_1 as
D_1(a, b; f, g, p) = ( Σ_{i=1}^n Σ_{j=1}^n g(i, j) · f(a_i, b_j) )^{1/p}.    (1)
Unfortunately, we can show that heavy limitations have to be imposed on the parameters to guarantee metric properties:
Theorem 1. Let a = (a_1, . . . , a_n), b = (b_1, . . . , b_n) ∈ R^n and let δ_ij be Kronecker's delta. Consider the function D_1(a, b; f, g, p) defined as in (1). With the choice f(x, y) = |x − y|^p and g(i, j) = δ_ij, D_1 is a metric on R^n for p ≥ 1 and is equivalent to the n-dimensional Minkowski distance [27]. More generally, functions of the form g(i, j) = h_i δ_ij with weights h_i that fulfil either ∀i: h_i ∈ R^+ or ∀i: h_i ∈ R^− are the only possible choices of g for D_1 being a metric on R^n.
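Assuming D_1 has the double-sum form D_1(a, b; f, g, p) = (Σ_{i,j} g(i, j) f(a_i, b_j))^{1/p} (consistent with the reduction stated in Theorem 1), a minimal numerical check that the choices f(x, y) = |x − y|^p and g(i, j) = δ_ij recover the Minkowski distance:

```python
# Sketch of a distance of the assumed form
#   D_1(a, b; f, g, p) = (sum_{i,j} g(i, j) * f(a_i, b_j))^(1/p).
# With g(i, j) = delta_ij and f(x, y) = |x - y|^p this coincides with the
# n-dimensional Minkowski distance, as stated in Theorem 1.

def d1(a, b, f, g, p):
    n = len(a)
    total = sum(g(i, j) * f(a[i], b[j]) for i in range(n) for j in range(n))
    return total ** (1.0 / p)

p = 2                                     # p = 2 gives the Euclidean case
f = lambda x, y: abs(x - y) ** p
delta = lambda i, j: 1.0 if i == j else 0.0   # Kronecker delta

a, b = [0.0, 3.0], [4.0, 3.0]
dist = d1(a, b, f, delta, p)              # Euclidean distance: 4.0
```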

Proof of Theorem 1. See Appendix A.
Although the metric properties are of interest from a mathematical point of view and are needed for some search speed-up algorithms [28], we can nevertheless utilise this function in a learning approach. As the general formulation of DIMA above allows for a variety of parameter settings, we assume that the function can be adjusted to build a central element for many other time series classification tasks. In our experiment, we decided on the parameters f(x, y) = |x − y| and p = 1. The function g(i, j) is constructed as follows: given a training set of uni-variate data with two classes, healthy and sick, with n_h and n_s elements in the respective classes, we learn a matrix G from the class-wise training series and interpolate it with the identity, G(λ) = λ I_{n×n} + (1 − λ) G, where I_{n×n} denotes the n × n identity matrix. Please note that, in the case of λ = 1, D_1 reduces to the Manhattan distance with our choices. In Figure 9 below, we show an exemplary visualisation of the matrix G(0) = G for our two considered time frames.
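The construction of the data-driven matrix G is not reproduced here; as a hedged sketch, assuming G(λ) interpolates linearly between a learned matrix G(0) and the identity (consistent with D_1 reducing to the Manhattan distance at λ = 1 for f(x, y) = |x − y| and p = 1), the reduction can be checked numerically with an arbitrary symmetric placeholder for G(0):

```python
# Sketch: G(lam) = lam * I + (1 - lam) * G0, where G0 stands in for the
# matrix learned from the training data (its construction is not
# reproduced here; this G0 is an arbitrary symmetric placeholder).

def d1(a, b, G):
    """D_1 with f(x, y) = |x - y| and p = 1 for a weight matrix G."""
    n = len(a)
    return sum(G[i][j] * abs(a[i] - b[j]) for i in range(n) for j in range(n))

def G_lambda(lam, G0):
    n = len(G0)
    return [[lam * (1.0 if i == j else 0.0) + (1.0 - lam) * G0[i][j]
             for j in range(n)] for i in range(n)]

G0 = [[0.5, 0.2, 0.0],
      [0.2, 0.5, 0.2],
      [0.0, 0.2, 0.5]]                    # placeholder for the learned matrix
a, b = [1.0, 2.0, 3.0], [2.0, 0.0, 3.0]

manhattan = sum(abs(x - y) for x, y in zip(a, b))   # 3.0
assert d1(a, b, G_lambda(1.0, G0)) == manhattan     # lambda = 1 -> Manhattan
```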

Machine Learning in Animal Disease Detection
Machine learning approaches have been used extensively in animal behaviour and disease assessment over the last few years. Take as an example [29], where the authors applied different methods for the detection of subacute ruminal acidosis in dairy cows; k-Nearest Neighbours showed the best results, outperforming deep learning methods and decision trees. In [30], the authors test deep learning architectures for early detection of respiratory disease in pigs and compare them with classical time series regression approaches; their results do not show any significant differences in the performance measures of the presented methods. The authors of [31] used a wearable device and a one-class support vector machine algorithm to detect lameness in dairy cows.
Having introduced the first main element of our approach in the last sub-section, we continue by briefly describing two further central elements: we utilise Nearest Centroid Classification (NCC) [32] with the function D_1 and G(λ) as above for TSC, and a Naive Bayes classifier [33] for the feature-based classification. We decided on the NCC algorithm as it is simple and inherently avoids bias based on class frequencies. Moreover, we utilised the well-known Naive Bayes algorithm as it can easily be adjusted to handle missing features both in the learning step and during testing: parameters can be learned on reduced examples, and the probability terms of missing data-points are simply omitted at test time. Thus, our algorithm can be employed with anywhere from 1 to 20 available features. In the case of all features missing, one could either return an indecisive result, indicating that the next steps are up to farm management, or classify solely based on ear-tag data. We omit the description of these two algorithms; details can be found in the respective citations. Moreover, we decided on a feature-selection step using Relief [34].
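A minimal sketch of the described missing-feature handling for a Gaussian Naive Bayes classifier (class parameters, priors and the feature vector below are invented for illustration): likelihood terms of absent features are simply omitted from the log-score.

```python
import math

# Sketch: Gaussian naive Bayes where missing (None) features are omitted
# from the log-score, as described in the text. All numbers are
# illustrative, not learned from the study's data.

def gauss_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def nb_log_score(x, prior, params):
    """x: feature vector with None for missing values.
    params: per-feature (mu, sigma) for this class."""
    score = math.log(prior)
    for xi, (mu, sigma) in zip(x, params):
        if xi is not None:                    # omit missing features
            score += gauss_logpdf(xi, mu, sigma)
    return score

# Two classes with hand-picked parameters for two features.
params = {"healthy": [(0.0, 1.0), (5.0, 1.0)],
          "sick":    [(3.0, 1.0), (5.0, 1.0)]}
priors = {"healthy": 0.85, "sick": 0.15}

x = [2.8, None]                               # second feature missing
label = max(priors, key=lambda c: nb_log_score(x, priors[c], params[c]))
```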
We describe the tailor-made algorithm in the following section.

Proposed Algorithm
The first step is to split the data into stratified sets for 10-fold cross validation. About 90% constitute the training data, on which the parameters are learned, while the resulting algorithm is used to classify the remaining ∼10%. The results are aggregated over the folds. Thus, we repeat the following steps 10 times: 1. For every data-stream of the sensor data, we learn an optimal parameter λ such that the balanced error of a leave-one-out inner cross validation is minimised using an NCC with distance function D_1 and G(λ). In case of ties, the highest value of λ is chosen. 2. Using these five (possibly different) parameters, we assume an animal to be sick or healthy if the five trained NCCs from Step 1 decided at least 4 out of 5 times for that class label. The estimated labels are compared with the actual ones and the outcomes are aggregated into the results presented in Section 3.2.
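The 4-out-of-5 voting rule of Step 2 can be sketched as follows; examples without a sufficiently strong majority are left undecided here (names are illustrative):

```python
# Sketch of the 4-out-of-5 voting rule in Step 2: an animal only gets a
# definite label when at least four of the five per-stream NCCs agree.

def vote(labels, min_agree=4):
    """labels: the five NCC predictions, e.g. 'sick'/'healthy'."""
    for cls in set(labels):
        if labels.count(cls) >= min_agree:
            return cls
    return None                                # no 4-of-5 majority

decision = vote(["sick", "sick", "sick", "healthy", "sick"])      # 'sick'
undecided = vote(["sick", "sick", "healthy", "healthy", "sick"])  # None
```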

Statistical Comparison
In this section, we compare the features defined in Section 2.3 using significance tests. We employ a two-sided Mann-Whitney U test at a significance level of 0.05. We opted for a non-parametric test, as our data-set violates assumptions such as normality. Using the Bonferroni correction for multiple hypothesis tests, we arrive at a threshold for significant results of 0.05/20 = 0.0025. The mean ± standard deviation (µ ± σ) of each feature for both sick and healthy animals can be found in Table 2 below, together with the exact p-values of each individual comparison, where a * indicates a significant difference. Table 2 shows that 7 of our features do not differ statistically significantly between healthy and sick animals, while 13, or 65%, do. We observe that the BCS before calving is higher for diseased animals, with a significant difference three weeks before calving, a finding that supports the results in [10]. In addition, in accordance with [10], the prevalence of SCK in multiparous animals is slightly higher, although the difference is not significant. Our results corroborate the findings in [35], where the authors also found a significant difference in NEFA concentration between healthy and ketotic cows.
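As a brute-force illustration of the test statistic and the Bonferroni threshold (a statistics library would also supply the p-value; the sample values below are made up):

```python
# Sketch: the Mann-Whitney U statistic computed by brute force, plus the
# Bonferroni-corrected significance threshold used for the 20 features.

def mann_whitney_u(x, y):
    """U statistic of sample x versus sample y (ties counted as 0.5)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

alpha, n_tests = 0.05, 20
threshold = alpha / n_tests                 # Bonferroni: 0.0025

u = mann_whitney_u([1.2, 1.5, 1.9], [0.8, 1.0, 1.4])   # made-up samples
```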
All features based on location show significant differences, a result we interpret as indicating that animals with a higher risk of SCK tend to move less, i.e., they vary their location less frequently than healthy cows.
In [36], the authors concluded that heat stress increases the ketosis risk in mid-lactation. Our results point to the hypothesis that even pre-partal heat stress may influence the development of SCK, as the average time spent under heat stress is significantly higher in diseased cows than in healthy ones. In contrast, the authors of [37] calculated a 1.6 times higher risk of clinical ketosis in early lactation if the THI was lower than 83 on the day of calving in comparison to hotter days.
As we discussed the topic of possibly missing data, we briefly state the percentages of missing features below in Table 3.

Classification Results
As we assumed the location features to be rather specific, we evaluated our algorithm a total of four times, including the data as follows: 1. Sensor data before calving, location features included; 2. Sensor data before calving, location features not included; 3. Sensor data after calving, location features included; 4. Sensor data after calving, location features not included. Applying our algorithm using a tenfold cross validation yielded the following results. We start by stating the confusion matrices for the pre-partal sensor data with location features included (left) and without location features (right). Based on these confusion matrices, we can calculate several common performance measures, which we present in Table 4 below; the best values in each column are boldfaced. The results, which clearly show better performance for the post-partal time frame, are consistent with the findings in [20]. Although using pre-partal data seems desirable with respect to early detection, the results indicate that the difference between classes is more distinct post partum, as all considered performance measures are highest for Experiment 4.
Moreover, when comparing the results with respect to the inclusion of location data, they are slightly better when excluding the location data for both time frames, which is surprising given the statistically significant differences between healthy and sick cows described in Section 3.1.
The percentage of correctly identified sick animals (sensitivity) varies from 63.2% to 67.0%, while the ratio of correctly identified healthy animals (specificity) varies from 59.3% to 73.6%. The negative predictive value is very high for all experiments (0.91-0.92) but should be treated with caution, as it is strongly affected by an imbalanced data structure; as described in Table 1, we are dealing with an imbalanced class distribution.
Looking at the results from Experiment 4, more than two-thirds of the sick animals were detected correctly, while nearly 75% of the healthy animals were labelled correctly. Due to the imbalanced prevalence of class labels, a "sick" prediction is correct only about 32% of the time, as can be seen from the precision, which still leaves room for improvement.
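The performance measures discussed here follow directly from a confusion matrix; the counts below are illustrative, not the study's results:

```python
# Sketch: performance measures from a confusion matrix. The counts
# (tp, fn, fp, tn) are illustrative placeholders.

def measures(tp, fn, fp, tn):
    return {
        "sensitivity": tp / (tp + fn),   # share of sick animals found
        "specificity": tn / (tn + fp),   # share of healthy animals found
        "precision":   tp / (tp + fp),   # how often "sick" is correct
        "npv":         tn / (tn + fn),   # negative predictive value
    }

m = measures(tp=20, fn=10, fp=45, tn=125)
```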
The algorithm proposed in Section 2.4 can easily be adjusted to classify more individuals as sick, which is useful, e.g., when the algorithm is intended as a first "selection" step. For that, we can learn a threshold in the Naive Bayes classifier such that a certain percentage of "sick" training examples is classified correctly. Alternatively, we can pass on from the first step only those data-sets for which all five NCCs decided on the same class label, which leaves possibly more examples without a definitive decision.
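The first adjustment can be sketched as learning a score threshold on the training examples; the scores and the target percentage below are hypothetical, not outputs of the study's classifier:

```python
import math

# Sketch: learn a decision threshold on hypothetical "sick" scores so
# that a target share of sick training examples is classified correctly.

def learn_threshold(sick_scores, target_recall):
    """Largest threshold t with P(score >= t | sick) >= target_recall."""
    ordered = sorted(sick_scores, reverse=True)
    k = math.ceil(target_recall * len(ordered))
    return ordered[k - 1]

scores = [0.9, 0.7, 0.4, 0.2]                     # hypothetical sick scores
t = learn_threshold(scores, target_recall=0.75)   # -> 0.4
caught = sum(s >= t for s in scores)              # 3 of 4 sick examples
```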

Parameters and Relevant Features Learned
As we estimated the classification quality in a cross validation scheme, we repeatedly selected possibly different subsets of features. We state the results, as we assume these choices reflect the relevance of the respective features. As we distinguish between two experiments each, where we either included or excluded location features, we have a total of 20 feature-selection procedures for both scenarios. Figure 10 shows that the number of times a feature was chosen varies greatly, ranging from 3 up to 19. NEFA was chosen in 19 out of 20 splits, an indicator of its relevance for detecting SCK. Moreover, all BCS and BFT values were chosen in more than a third of all splits. The three least chosen features are all based on location. Figure 11 shows that, also when excluding the location features, the number of times a feature was chosen varies greatly, from 3 up to 16. NEFA was chosen in 16 out of 20 splits, followed by THI, milk yield and BCS −3 w. Based on this observation, parity and the maximum fat/protein ratio can be considered of low relevance.

Discussion
In this article, we presented results from a study to identify indicators of subclinical ketosis in dairy cows around calving. Moreover, we constructed an algorithm that aims to estimate the health status.
We included a statistical comparison of different parameters based on milk yield and components, animal movements within the barn, ambient temperature, and visual observation. The results showed significant differences between healthy cows and those suffering from SCK in 13 of the examined parameters. A literature review showed that our results partly corroborate the conclusions of other studies.
In a second step, we introduced a flexible machine learning approach that combines elements of TSC with classical feature-based algorithms. It is designed to be simple, interpretable, and flexible with respect to data availability. The results show that our approach is a promising first step toward automatic recognition of diseases in dairy cows.
Future work will include more elaborate machine learning approaches to tackle the problem of early detection of SCK.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
Proof. We want to prove the statements (A1) and (A2). Let us begin with (A1): We plug our special choice for f and g into the definition of D_1. Since the last expression (A3) is equivalent to the n-dimensional Minkowski distance, the first part is proven. Let us continue with the second statement (A2): First, we observe that (A2) is equivalent to the implications (A4) and (A5). The proof of (A4) is trivial for n = 1, as every statement with a universal quantifier (∀) over an empty set is true. The proof of (A5) is also trivial for n = 1, since g(1, 1) cannot be zero, as this would imply D_1(a, b) = 0 for all a, b. Therefore, let us continue proving (A4) by assuming n ≥ 2: Suppose now that D_1 is a metric on R^n, which means that the metric conditions have to hold. First, we derive some properties of f. We start with the observation that f(x, x) has to be 0 for all x ∈ R: Let a = (x, . . . , x)^T ∈ R^n, x ∈ R. From the identity-of-indiscernibles property of a metric, we see that f(x, x) = 0 must hold. Next, we can easily show that (A7) has to hold under the assumption that D_1 is a metric. Therefore, let a = (x, . . . , x)^T, b = (y, . . . , y)^T ∈ R^n, x, y ∈ R, x ≠ y.
We know that D_1(a, b)^p > 0 has to hold. Assume now that (A7) is false: seeing the resulting contradiction, we arrive at the conclusion that (A7) has to be true. Our next step is to prove that f(x, y) = f(y, x) for all x, y ∈ R, and that (A9) has to hold. Therefore, we inspect the symmetry condition of our metric using the same a, b as above. Using the fact that (A7) holds, we can simplify the expression. Next, we aim for (A10): Since a ≠ b, we know from the metric properties that D_1(a, b) > 0. Using (A7), we can derive that f(x, y) ≠ 0 for all x, y ∈ R, x ≠ y. We can use these proven properties of f to finally show that g has to have the aforementioned properties. Therefore, let 1 ≤ m̄, n̄ ≤ n with m̄ ≠ n̄. We define four vectors a, b, c, d ∈ R^n. Using the metric properties, the symmetry property (A9), the fact (A10) and renaming the indices, we obtain (A16) and (A17). For the case n = 2, we are done with the proof of (A4), as in this case both (A16) and (A17) read as g(1, 2) = 0 and g(2, 1) = 0.
Assume now that n ≥ 3: Using the symmetry property (A9) and the fact (A10), we can pull out a factor f(x, y) in all sums and divide by it. Moreover, we rename the indices and obtain (A18). Using the facts (A16) and (A17) stated in (A19), we subtract the first and fourth expression of (A19) from the left-hand side of (A18), and the second and third expression of (A19) from the right-hand side of (A18), which gives us
g(n̄, n̄) − g(n̄, m̄) + g(m̄, m̄) − g(n̄, m̄) = g(m̄, m̄) − g(m̄, n̄) + g(n̄, n̄) − g(m̄, n̄),
which simplifies to
g(n̄, m̄) = g(m̄, n̄) ∀ m̄ ≠ n̄, 1 ≤ m̄, n̄ ≤ n. (A20)
We define a = (x, x, . . . , x)^T as above and two more vectors b and c:
b = (x, . . . , x, y, x, . . . , x)^T with y at position i,
c = (x, . . . , x, y, x, . . . , x)^T with y at position j.
We now prove (A5) by showing that the opposite assumption leads to a contradiction. Assume therefore that (A5)' holds:
∃ i, j, i ≠ j : g(i, i) ≤ 0 ∧ g(j, j) ≥ 0.
As we assume D_1 to be a metric, we know that D_1(a, b)^p > 0 ∧ D_1(a, c)^p > 0. Using (A7) and the simplified form of D_1 (A24), we see that the following statement has to hold:
f(x, y) g(i, i) > 0 ∧ f(x, y) g(j, j) > 0. (A25)
Inspecting (A25), we see that neither g(i, i) = 0 nor g(j, j) = 0 can fulfil these inequalities; thus, we assume g(i, i) < 0 ∧ g(j, j) > 0. Using these assumptions and dividing by g(i, i) and g(j, j), respectively, we arrive at:
f(x, y) < 0 ∧ f(x, y) > 0. (A26)
As we arrived at the contradictory result (A26) by assuming (A5)' to hold, we conclude that (A5) has to hold, and therefore we have proven (A2).